You can see how we use fastq_screen to quickly look for organisms in our raw fastq files by checking out our protocol here
FastQ Screen - Map sequences against multiple genomes
www.bioinformatics.babraham.ac.uk/projects/fastq_screen
Contact: steven.wingett@babraham.ac.uk
Synopsis
fastq_screen [OPTIONS]... [FastQ FILE]...
Function
FastQ Screen is intended to be used as part of a QC pipeline.
It allows you to take a sequence dataset and search it
against a set of bowtie databases. It will then generate
both a text and a graphical summary of the results to see if
the sequence dataset contains the kind of sequences you expect.
Options
--add_genome <text> Edits the file 'fastq_screen.conf' (in the folder where
this script is saved) to add a new genome. Specify the
additional genome as a comma separated list:
'Database name','Genome path and basename','Notes'
--aligner <func> Specify the aligner to use for the mapping. Valid
arguments are 'bowtie', bowtie2' (default) or 'bwa'.
Bowtie maps with parameters -k 2, Bowtie 2 with
parameters -k 2 --very-fast-local and BWA with mem -a.
Local aligners such as BWA or Bowtie2 will be better
at detecting the origin of chimeric reads.
--bisulfite Process bisulfite libraries. The path to the
bisulfite aligner (Bismark) may be specified in the
configuration file. Bismark runs in non-directional
mode. Either conventional or bisulfite libraries may
be processed, but not both simultaneously. The
--bisulfite option cannot be used in conjunction with
--bwa.
--bowtie <text> Specify extra parameters to be passed to Bowtie.
These parameters should be quoted to clearly
delimit bowtie parameters from fastq_screen
parameters. You should not try to use this option
to override the normal search or reporting options
for bowtie which are set automatically but it might
be useful to allow reads to be trimmed before
alignment etc.
--bowtie2 <text> Specify extra parameters to be passed to Bowtie 2.
These parameters should be quoted to clearly
delimit Bowtie 2 parameters from FastQ Screen
parameters. You should not try to use this option
to override the normal search or reporting options
for bowtie which are set automatically but it might
be useful to allow reads to be trimmed before
alignment etc.
--bwa <text> Specify extra parameters to be passed to BWA.
These parameters should be quoted to clearly
delimit BWA parameters from FastQ Screen
parameters. You should not try to use this option
to override the normal search or reporting options
for BWA which are set automatically but it might
be useful to allow reads to be trimmed before
alignment etc.
--conf <path> Manually specify a location for the configuration.
--filter <text> Produce a FASTQ file containing reads mapping to
specified genomes. Pass the argument a string of
characters (0, 1, 2, 3, -), in which each character
corresponds to a reference genome (in the order the
reference genome occurs in the configuration file).
Below gives an explanation of each character.
0: Read does not map
1: Read maps uniquely
2: Read multi-maps
3: Read maps (one or more times)
4: Passes filter 0 or filter 1
5: Passes filter 0 or filter 2
-: Do not apply filter to this genome
Consider mapping to three genomes (A, B and C), the
string '003' produces a file in which reads do not
map to genomes A or B, but map (once or more) to
genome C. The string '--1' would generate a file in
which reads uniquely map to genome C. Whether reads
map to genome A or B would be ignored.
A read needs to pass all the genome filters to be
considered valid (unless --pass specified).
When --filter is used in conjuction with --tag, FASTQ
files shall be mapped, tagged and then filtered. If
the --tag option is not selected however, the input
FASTQ file should have been previously tagged.
--force Do not terminate if output files already exist,
instead overwrite the files.
--get_genomes Download pre-indexed Bowtie2 genomes for a range of
commonly studied species and sequences. If used with
--bisulfite, Bismark bisulfite Bowtie2 indices will
be downloaded instead.
--help Print program help and exit.
--illumina1_3 Assume that the quality values are in encoded in
Illumina v1.3 format. Defaults to Sanger format
if this flag is not specified.
--inverse Inverts the --filter results i.e. reads that pass
the --filter parameter will not pass when
--filter --inverse are specified together, and vice
versa.
--nohits Writes to a file the sequences that did not map to
any of the specified genomes. This option is
equivalent to specifying --tag --filter 0000 (number
of zeros corresponds to the number of genomes
screened). By default the whole input file will be
mapped, unless overridden by --subset.
--outdir <text> Specify a directory in which to save output files.
If no directory is specified then output files
are saved in the current working directory.
--pass <int> Used in conjunction with --filter. By default all
genome filters must be passed for a read to pass
the --filter option. However, a minimum number
of genome filters may be specified that a read
needs pass to be considered to pass the --filter
option. (--pass 1 effecitively acts as an OR
boolean operator for the genome filters.)
--quiet Suppress all progress reports on stderr and only
report errors.
--subset <int> Don't use the whole sequence file, but create a
temporary dataset of this specified number of
reads. The dataset created will be of
approximately (within a factor of 2) of this size.
If the real dataset is smaller than twice the
specified size then the whole dataset will be used.
Subsets will be taken evenly from throughout the
whole original dataset. By Default FastQ Screen
runs with this parameter set to 100000. To process
an entire dataset however, adjust --subset to 0.
--tag Label each FASTQ read header with a tag listing to
which genomes the read did, or did not align. The
first read in the output FASTQ file will list the
full genome names along with a score denoting
whether the read did not align (0), aligned
uniquely to the specified genome (1), or aligned
more than once (2). In subsequent reads the
genome names are omitted and only the score is
printed, in the same order as the first line.
This option results in the he whole file being
processed unless overridden explicitly by the user
with the --subset parameter
--threads <int> Specify across how many threads bowtie will be
allowed to run. Overrides the default value set
in the configuration file
--top <int>/<int,int> Don't use the whole sequence file, but create a
temporary dataset of the specified number of
reads taken from the top of the original file. It is
also possible to specify the number of lines to skip
before beginning the selection e.g.
--top 100000,5000000 skips the first five million
reads and selects the subsequent one hundred thousand
reads. While this option is usually faster than
comparable --subset operations, it does not prevent
biases arising from non-uniform distribution of
reads in the original FastQ file. This option should
only be used when minimising processing time is of
highest priority.
--version Print the program version and exit.
2019 Babraham Institute, Cambridge, UK