Downloading SRA files

Caleb Lareau, 11 January 2019

I've found two main ways to retrieve data from SRA quickly via Aspera.

First way

The SRA maintains an FTP server that hosts most, but not all, of its data. You can download the .sra file directly from this FTP server using Aspera's ascp client:

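# Aspera Connect install directory (no trailing slash)
ASPERA=/apps/lab/aryee/aspera-connect-3.5.6
SRR_ID="SRR6806716"

# Download the .sra file; ${SRR_ID:0:6} (bash) is the six-character directory prefix, e.g. SRR680
"$ASPERA/bin/ascp" -i "$ASPERA/etc/asperaweb_id_dsa.openssh" -T -k 2 -l 400M "anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/${SRR_ID:0:6}/${SRR_ID}/${SRR_ID}.sra" .
# Convert to gzipped FASTQ files as an LSF job
bsub -q normal fastq-dump --split-files --gzip "${SRR_ID}.sra"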
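
The remote path in the ascp command is assembled from the run accession: the first six characters of the ID name an intermediate directory, followed by the full ID. Here is a minimal sketch of that construction. The function name sra_remote_path is hypothetical, and the "SRR" directory is hard-coded exactly as in the command above. It uses cut instead of the bash-only ${SRR_ID:0:6} expansion so that it also runs under plain sh.

```shell
# Hypothetical helper: print the FTP-side path for a run accession,
# following the directory layout used in the ascp command above.
sra_remote_path() {
    srp_id="$1"
    # First six characters of the accession, e.g. SRR680 for SRR6806716
    srp_prefix=$(printf '%s' "$srp_id" | cut -c1-6)
    printf '%s\n' "/sra/sra-instant/reads/ByRun/sra/SRR/${srp_prefix}/${srp_id}/${srp_id}.sra"
}

sra_remote_path "SRR6806716"
```

The printed path can be appended to anonftp@ftp.ncbi.nlm.nih.gov: to form the source argument for ascp.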

Second way

I’ve run into cases where an SRA file is missing from the FTP server and cannot be downloaded that way. The help desk told me that prefetch may be a more stable route, but the command needs some tinkering to download large datasets via Aspera. This is how to do it:

SRR_ID="SRR6806716"
prefetch $SRR_ID --max-size 50G -a "/apps/lab/aryee/aspera-connect-3.5.6/bin/ascp|/apps/lab/aryee/aspera-connect-3.5.6/etc/asperaweb_id_dsa.openssh"
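
The value passed to -a is the path to the ascp binary and the path to the key file, joined by a "|". Here is a small sketch that builds this string from a single install directory, assuming the same layout as above (bin/ascp and etc/asperaweb_id_dsa.openssh). The function name ascp_arg is hypothetical.

```shell
# Hypothetical helper: build the "ascp-binary|private-key-file" string
# that prefetch expects for -a, from an Aspera Connect install directory.
ascp_arg() {
    # Drop a single trailing slash so the joined paths have no "//"
    aa_dir="${1%/}"
    printf '%s|%s\n' "${aa_dir}/bin/ascp" "${aa_dir}/etc/asperaweb_id_dsa.openssh"
}

ASPERA=/apps/lab/aryee/aspera-connect-3.5.6
ascp_arg "$ASPERA"

# Usage (needs sra-tools and network access, so it is left commented out):
# prefetch "$SRR_ID" --max-size 50G -a "$(ascp_arg "$ASPERA")"
```

Keeping the install directory in one variable means only one line changes when Aspera Connect is upgraded or installed elsewhere.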

Note: prefetch places files in a default local directory, which you can change through the SRA Toolkit configuration (for example, interactively with vdb-config -i).