Accessing sequence data

Intro

You will often need to access sequence data, either from public repositories such as the Sequence Read Archive (SRA) or from your own recent sequencing runs on Illumina's Basespace cloud service. The steps below outline how to do this from the command line so that the data downloads directly to our Linux server, where it is convenient to use with our software.

What you’ll need

Getting data from SRA

If you just want to get data directly from Illumina's Basespace, you can skip this step!

#navigate to the folder where you want the files to be downloaded
grabseqs sra -t 24 -m metadata.csv -r 3 PRJ#######
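
The command above only works once the PRJ####### placeholder is replaced with a real BioProject accession. As a minimal sketch, the hypothetical helpers below (is_bioproject_accession and fetch_sra_project are illustrative names, not part of grabseqs) check the accession before running the same grabseqs command inside a chosen folder. The sketch assumes accessions start with PRJNA, PRJEB, or PRJDB followed by digits; other prefixes would need to be added.

```shell
# Succeed only if the argument looks like a BioProject accession.
# Assumption: only the PRJNA, PRJEB, and PRJDB prefixes are accepted here.
is_bioproject_accession() {
  case "$1" in
    PRJNA*|PRJEB*|PRJDB*) ;;
    *) return 1 ;;
  esac
  # Everything after the five-character prefix must be one or more digits.
  case "${1#PRJ??}" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# Hypothetical wrapper: $1 = accession, $2 = folder to download into.
fetch_sra_project() {
  if ! is_bioproject_accession "$1"; then
    echo "Not a BioProject accession: $1" >&2
    return 1
  fi
  mkdir -p "$2" || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$2" && grabseqs sra -t 24 -m metadata.csv -r 3 "$1" )
}
```

Calling fetch_sra_project with an accession and a folder runs the download there; an unedited placeholder such as PRJ####### is rejected before grabseqs is called.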

Accessing data on Basespace

  • Connect to our Linux server using your terminal, or by launching RStudio Server in the browser and opening its terminal window.
  • Use the Illumina Basespace command-line tools and the bs auth command to authenticate
bs auth
# You should see a message similar to the one below. Open a new browser tab and navigate to the URL shown in your terminal. You may be asked to log into Basespace.
Please go to this URL to authenticate:  https://basespace.illumina.com/oauth/device?code=NrZBs
# Once you've done that, you should see a message in your terminal that welcomes you.
Welcome, Daniel Beiting
  • Once authenticated, you will be able to download data directly from Basespace to our Linux server.
  • Navigate to your sequencing project on Basespace and locate the project ID in your browser URL
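
Before starting a download, it can help to confirm that the bs tool is installed and that authentication succeeded. The sketch below is one way to do that; require_tool and check_basespace_login are hypothetical helper names, and it assumes the installed version of the Basespace CLI provides the bs whoami subcommand.

```shell
# Succeed only if the named command is available on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical check: confirm bs is installed, then ask it who is logged in.
# Assumption: this version of the CLI supports "bs whoami".
check_basespace_login() {
  if ! require_tool bs; then
    echo "bs not found on PATH; is the Basespace CLI installed?" >&2
    return 1
  fi
  bs whoami
}
```

If check_basespace_login fails, run bs auth again. Depending on the CLI version, bs list projects may also print project names and IDs, as an alternative to reading the ID from the browser URL.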

Downloading fastq files from Basespace

Use the project ID and the bs download command to get the data.

#specify where you want the data to download using the -o option
#NOTE: you need to have privileges to write to the folder you choose
bs download project -i 177233056 -o /publicData/myProjectFolder/
  • Once the download begins, you can monitor its progress in the terminal window.
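
Because the download fails if you cannot write to the output folder, a quick check beforehand can save time. This is a minimal sketch; can_write_to and download_project are hypothetical helper names, and the bs download command is the same one shown above.

```shell
# Succeed only if the path is an existing directory the current user can write to.
can_write_to() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Hypothetical wrapper: $1 = project ID, $2 = output folder.
download_project() {
  if ! can_write_to "$2"; then
    echo "Cannot write to $2; choose another folder or ask for access." >&2
    return 1
  fi
  bs download project -i "$1" -o "$2"
}
```

For example, download_project 177233056 /publicData/myProjectFolder/ only starts the download if that folder exists and is writable.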

Downloading raw .bcl files from Basespace

In some cases you will want the unprocessed .bcl files that Illumina's Basespace has not yet converted to fastq. This is often useful for single-cell sequencing experiments. In these cases, you download the run rather than the project.

#specify where you want the data to download using the -o option
sudo bs download run -i 177233056 -o /publicData/myRunFolder/
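
The project and run downloads differ only in the word after bs download, so it is easy to mix them up. As a rough sketch, the hypothetical helper below (bs_download_command is an illustrative name) prints the command for review instead of running it. It assumes project and run IDs are purely numeric, as in the examples here.

```shell
# Print the bs download command for a project or a run without executing it.
# $1 = "project" or "run", $2 = numeric ID, $3 = output folder.
bs_download_command() {
  case "$1" in
    project|run) ;;
    *) echo "First argument must be 'project' or 'run'" >&2; return 1 ;;
  esac
  # Assumption: IDs contain digits only.
  case "$2" in
    ''|*[!0-9]*) echo "ID must be numeric" >&2; return 1 ;;
  esac
  printf 'bs download %s -i %s -o %s\n' "$1" "$2" "$3"
}
```

Printing the command rather than running it leaves you free to check it first and to prefix it with sudo where the output folder requires it.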