Downloading dbGAP

The working directory where I did this for GTEx:

# working directory: /data/aryee/caleb/mitoRNA2/gtex_mito

This vignette is useful for working with protected-access data from dbGAP. In this instance, I wanted to download raw sequencing data (.fastq files) from the GTEx project for further downstream analysis. This webpage provides an overview of some of the hiccups that I ran into trying to make this happen.

First, I downloaded the dbGAP repository key (an .ngc file).

Next, I moved the .ngc file into the working directory where I planned to download files. Using vdb-config, an executable from SRA Toolkit, I imported the key, pointing it at the current directory:

vdb-config --import prj_16592.ngc `pwd`

I then ran the following shell script, passing it a file in which each line contained one SRR ID (over 9,400 in total). The script downloads the files sequentially and, after each download, submits a job that runs a second shell script to split, align, and genotype the mitochondria (and collect summary statistics) for that SRR file.

#!/bin/bash

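# $1 is a file with one SRR ID per line; flatten it into a space-separated list
SRR_IDS=$(tr "\n" " " < "$1")

for SRR_ID in $SRR_IDS
do
    echo "$SRR_ID"
    # Download the .sra file over Aspera (--ascp-path takes 'ascp binary|key file')
    prefetch --ascp-path '/apps/lab/aryee/aspera-connect-3.5.6/bin/ascp|/apps/lab/aryee/aspera-connect-3.5.6/etc/asperaweb_id_dsa.openssh' --max-size 100G --ascp-options "-T -k 2 -l 400M" "$SRR_ID"
    # Submit the per-run worker script to the "big" queue
    bsub -q big -o /dev/null sh doOne.sh "$SRR_ID"
done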
done
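
With more than 9,400 IDs in the input file, a stray blank line, a Windows line ending, or a duplicated ID can waste a download. The following is a minimal sketch of a pre-filter for the ID list, not part of the original pipeline. The function name read_srr_ids and the file name srr_ids.txt are hypothetical, and the pattern assumes accessions of the form SRR, ERR, or DRR followed by digits.

```shell
#!/bin/sh
# Hypothetical helper: print the well-formed, de-duplicated run accessions
# found in the list file given as $1, one per line, in their original order.
read_srr_ids() {
    # tr removes carriage returns, spaces, and tabs;
    # grep keeps only lines that look like a run accession;
    # awk drops repeated IDs while preserving the original order.
    tr -d '\r \t' < "$1" | grep -E '^[SED]RR[0-9]+$' | awk '!seen[$0]++'
}

# Example use (srr_ids.txt is a placeholder name):
#   read_srr_ids srr_ids.txt > srr_ids.clean.txt
```

The cleaned file can then be passed to the download script in place of the raw list. Malformed lines are dropped silently here; printing them to stderr instead would make typos easier to spot.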

The per-SRA worker script, doOne.sh, looked like this:

cat doOne.sh

# Expects sra/, mito_bam/, mito_reads/ and starout/ to exist in the working directory
SRR="$1"

# SRA -> .fastq
fastq-dump --split-files --gzip "sra/${SRR}.sra"

# Align and extract mitochondria
STAR --genomeDir /data/aryee/pub/genomes/star/hg19_chr/ --readFilesIn "${SRR}_1.fastq.gz" "${SRR}_2.fastq.gz" --readFilesCommand zcat --outFileNamePrefix "${SRR}"
# Keep the SAM header, then append only the reads aligned to chrM
samtools view -H "${SRR}Aligned.out.sam" > "${SRR}.sam"
awk '$3 == "chrM" {print $0}' "${SRR}Aligned.out.sam" >> "${SRR}.sam"
# Convert to a sorted, indexed BAM and record the number of mitochondrial reads
samtools view -Sb "${SRR}.sam" | samtools sort > "mito_bam/${SRR}.mito.bam" && samtools index "mito_bam/${SRR}.mito.bam"
samtools view "mito_bam/${SRR}.mito.bam" | wc -l > "mito_reads/${SRR}.mitoreads.txt"
# Keep the STAR alignment summary
mv "${SRR}Log.final.out" "starout/${SRR}Log.final.out"

# Remove the download and intermediate files
rm "sra/${SRR}.sra"
rm "${SRR}_1.fastq.gz"
rm "${SRR}_2.fastq.gz"
rm "${SRR}.sam"
rm "${SRR}Aligned.out.sam"
rm "${SRR}Log.progress.out"
rm "${SRR}SJ.out.tab"
rm "${SRR}Log.out"
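
The worker script builds the mitochondrial SAM in two steps (header first, then the chrM records) and counts reads from the resulting BAM. As a rough sketch of the same filtering logic in one pass, the two awk helpers below work directly on a SAM text file. The function names filter_chrM_sam and count_chrM_reads are hypothetical and were not part of the original script. They assume tab-separated SAM input in which header lines start with "@" and the reference name is in column 3.

```shell
#!/bin/sh
# Hypothetical helper: print the SAM header plus every alignment record whose
# reference name (column 3) is chrM, reading the input file only once.
filter_chrM_sam() {
    awk -F '\t' '/^@/ || $3 == "chrM"' "$1"
}

# Hypothetical helper: count the chrM alignment records (header lines excluded).
count_chrM_reads() {
    awk -F '\t' '!/^@/ && $3 == "chrM" {n++} END {print n+0}' "$1"
}

# Example use inside a worker script (paths follow the naming used above):
#   filter_chrM_sam "${SRR}Aligned.out.sam" > "${SRR}.sam"
#   count_chrM_reads "${SRR}Aligned.out.sam" > "mito_reads/${SRR}.mitoreads.txt"
```

Counting from the SAM text should give the same number as counting records in the sorted BAM, since sorting does not add or drop alignments, but this was not checked against the original outputs.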