fastq_screen: QC analysis of sequencing data

Check for contamination with fastq_screen
Filtering reads with fastq_screen

Check for contamination with fastq_screen

A common question is whether a data file contains reads from sources other than the intended sample. If you suspect a particular kind of contamination (e.g. another host, a plasmid, rRNA, or a bacterium commonly used in the lab), you can run fastq_screen to check a subsample of reads from your raw fastq file against a set of reference genomes. Although this has nothing to do with Sourmash or MinHash sketches per se, it can be a useful way to confirm findings from Sourmash, or to check for organisms that are not well represented in the SBT reference databases provided above.

You can read more about fastq_screen in this recent paper

fastq_screen uses bowtie2 to align reads to the references, so we’ve provided a set of reference genomes on our server against which you can screen your reads.

We’ve taken care of configuring fastq_screen so that it knows where to find bowtie2 and where to look for the reference genomes. This information is laid out in the fastq_screen configuration file at /usr/local/bin/fastq_screen/fastq_screen.conf
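To see which references are configured without opening (or touching) the configuration file, a read-only listing is enough. The sketch below assumes the usual fastq_screen.conf layout, in which each reference is declared on a line of the form DATABASE, name, bowtie2 index path, separated by whitespace. list_databases is a hypothetical helper name, not part of fastq_screen.

```shell
# Path taken from the text above
CONF=/usr/local/bin/fastq_screen/fastq_screen.conf

# Hypothetical helper: print "position<TAB>name<TAB>index path" for every
# active DATABASE line in a fastq_screen configuration file.
# Commented-out entries (e.g. "#DATABASE ...") are skipped because their
# first field is not exactly "DATABASE".
list_databases() {
    awk '$1 == "DATABASE" { n++; printf "%d\t%s\t%s\n", n, $2, $3 }' "$1"
}

# Read-only: this never writes to the configuration file
if [ -r "$CONF" ]; then
    list_databases "$CONF"
else
    echo "Config not readable: $CONF" >&2
fi
```

The position printed in the first column is the order in which the genomes appear in the file, which is worth noting down if you later use the --filter option.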

# make sure you're in the directory where your fastq files are
fastq_screen --threads 24 --outdir fastq_screen *gz
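After the run finishes, it can be worth confirming that every input file produced a report. The sketch below assumes that fastq_screen names its text report after the input file with the .gz and .fastq (or .fq) extensions removed and _screen.txt appended, and that reports land in the fastq_screen output directory used above. report_name is a hypothetical helper; adjust it if your output files are named differently.

```shell
# Hypothetical helper: map an input FASTQ name to the text report name
# that fastq_screen is assumed to write for it.
report_name() {
    base=$(basename "$1")
    base=${base%.gz}       # drop compression suffix, if present
    base=${base%.fastq}    # drop .fastq ...
    base=${base%.fq}       # ... or .fq
    printf '%s_screen.txt\n' "$base"
}

# Check each input in the current directory against the output directory
for fq in *gz; do
    [ -e "$fq" ] || continue   # the glob matched nothing
    report="fastq_screen/$(report_name "$fq")"
    if [ -s "$report" ]; then
        echo "OK       $fq -> $report"
    else
        echo "MISSING  $fq -> $report"
    fi
done
```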

🔥

DO NOT modify the fastq_screen.conf file. If you want to screen against a bowtie2 reference genome that is not listed below, please contact us for help.

The reference genomes listed below have already been added to the configuration file.

Filtering reads with fastq_screen

Depending on the results you get from fastq_screen above, you may want to filter reads based on whether they align to a particular reference genome of interest. This is particularly useful for removing host reads that contaminate a metagenomic sample, for example. To do this, use the --tag and --filter options of fastq_screen. See the fastq_screen documentation to understand how to use --tag and --filter properly.

First, tag each read in each fastq file with the genome(s) to which it aligns (from the available references described above):

fastq_screen --tag sampleX.fastq.gz

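Next, filter based on the tags assigned above. The argument to --filter is a string with one character per reference genome, in the order the genomes appear in the configuration file: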

fastq_screen --filter 1000 sampleX.fastq.gz
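A filter string such as 1000 only makes sense relative to the position of each genome in the configuration file, so it can help to build it from that position rather than typing it by hand. The sketch below is a hypothetical helper (build_filter is not part of fastq_screen) that places one code at a chosen genome position and - everywhere else. It assumes the codes described in the fastq_screen documentation: 0 for "read does not map", 1 for "maps uniquely", 3 for "maps one or more times", and - for "ignore this genome". Please confirm these against fastq_screen --help before relying on them.

```shell
# Hypothetical helper: build a --filter string.
#   $1 = number of reference genomes in the configuration file
#   $2 = 1-based position of the genome of interest
#   $3 = filter code to apply to that genome (assumed: 0, 1, 2 or 3)
# Every other genome gets "-" (assumed to mean "do not filter on this genome").
build_filter() {
    n_genomes=$1
    position=$2
    code=$3
    out=""
    i=1
    while [ "$i" -le "$n_genomes" ]; do
        if [ "$i" -eq "$position" ]; then
            out="${out}${code}"
        else
            out="${out}-"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Four configured genomes, genome of interest listed first, code 0
build_filter 4 1 0   # prints 0---
```

Under the assumptions above, 0--- would keep reads that do not align to the first genome regardless of the others, which is the shape of filter you would want for host removal when the host is listed first. By contrast, 1000 would keep only reads that map uniquely to the first genome and to none of the other three.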