Fastp

Usage: emirge.py DIR <required_parameters> [options]

This version of EMIRGE (emirge.py) attempts to reconstruct rRNA SSU genes from
Illumina metagenomic data.
DIR is the working directory to process data in.
Use --help to see a list of required and optional arguments

Additional information:
https://groups.google.com/group/emirge-users
https://github.com/csmiller/EMIRGE/wiki

If you use EMIRGE in your work, please cite these manuscripts, as appropriate.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.

Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.


Options:
  -h, --help            show this help message and exit

  Required flags:
    These flags are all required to run EMIRGE, and may be supplied in any
    order.

    -1 reads_1.fastq[.gz]
                        path to fastq file with \1 (forward) reads from
                        paired-end sequencing run, or all reads from single-
                        end sequencing run.  File may optionally be gzipped.
                        EMIRGE expects ASCII-offset of 64 for quality scores.
                        (Note that running EMIRGE with single-end reads is
                        largely untested.  Please let me know how it works for
                        you.)
    -f FASTA_DB, --fasta_db=FASTA_DB
                        path to fasta file of candidate SSU sequences
    -b BOWTIE_DB, --bowtie_db=BOWTIE_DB
                        precomputed bowtie index of candidate SSU sequences
                        (path to appropriate prefix; see --fasta_db)
    -l MAX_READ_LENGTH, --max_read_length=MAX_READ_LENGTH
                        length of longest read in input data.

  Required flags for paired-end reads:
    These flags are required to run EMIRGE when you have paired-end reads
    (the standard way of running EMIRGE), and may be supplied in any
    order.

    -2 reads_2.fastq    path to fastq file with \2 (reverse) reads from
                        paired-end run.  File must be unzipped for mapper.
                        EMIRGE expects ASCII-offset of 64 for quality scores.
    -i INSERT_MEAN, --insert_mean=INSERT_MEAN
                        insert size distribution mean.
    -s INSERT_STDDEV, --insert_stddev=INSERT_STDDEV
                        insert size distribution standard deviation.

  Optional parameters:
    Defaults should normally be fine for these options in order to run
    EMIRGE

    -n ITERATIONS, --iterations=ITERATIONS
                        Number of iterations to perform.  It may be necessary
                        to use more iterations for more complex samples
                        (default=40)
    -a PROCESSORS, --processors=PROCESSORS
                        Number of processors to use in the mapping steps.  You
                        probably want to raise this if you have the
                        processors. (default: 1)
    -m MAPPING, --mapping=MAPPING
                        path to precomputed initial mapping (bam file).  If
                        not provided, and initial mapping will be run for you.
    -p SNP_FRACTION_THRESH, --snp_fraction_thresh=SNP_FRACTION_THRESH
                        If fraction of variants in a candidate sequence
                        exceeds this threhold, then split the candidate into
                        two sequences for next iteration.  See also
                        --variant_fraction_thresh. (default: 0.04)
    -v VARIANT_FRACTION_THRESH, --variant_fraction_thresh=VARIANT_FRACTION_THRESH
                        minimum probability of second most probable base at a
                        site required in order to call site a variant.  See
                        also --snp_fraction_thresh.  (default: 0.1)
    -j JOIN_THRESHOLD, --join_threshold=JOIN_THRESHOLD
                        If two candidate sequences share >= this fractional
                        identity over their bases with mapped reads, then
                        merge the two sequences into one for the next
                        iteration.  (default: 0.97; valid range: [0.95, 1.0] )
    -c MIN_DEPTH, --min_depth=MIN_DEPTH
                        minimum average read depth below which a candidate
                        sequence is discarded for next iteration(default: 3)
    --nice_mapping=NICE_MAPPING
                        If set, during mapping phase, the mapper will be
                        "niced" by the Linux kernel with this value (default:
                        no nice)
    --phred33           Illumina quality values in fastq files are the (fastq
                        standard) ascii offset of Phred+33.  This is the new
                        default for Illumina pipeline >= 1.8. DEFAULT is still
                        to assume that quality scores are Phred+64
    -e SAVE_EVERY, --save_every=SAVE_EVERY
                        every SAVE_EVERY iterations, save some information
                        about the program's state.  This is solely for
                        debugging information, and is NOT required to resume a
                        run (see --resume_from below).  (default=none)

  Resuming iterations:
    These options allow you to resume iterations from a previously
    completed EMIRGE iteration.  This requires that directories for the
    iteration to resume from and the previous iteration both be present.
    It is STRONGLY recommended that other options set on the command line
    be identical to the original run.  Note that EMIRGE does not check
    this for you!

    -r RESUME_FROM, --resume_from=RESUME_FROM
                        Resume iterations from COMPLETED iteration specified.
                        Requires that the iteration and previous iteration
                        fully completed, i.e. a priors file, bam file, and
                        fasta file are all present in the iteration directory.