Usage: emirge.py DIR <required_parameters> [options]

This version of EMIRGE (emirge.py) attempts to reconstruct rRNA SSU genes
from Illumina metagenomic data.

DIR is the working directory to process data in.

Use --help to see a list of required and optional arguments.

Additional information:
https://groups.google.com/group/emirge-users
https://github.com/csmiller/EMIRGE/wiki

If you use EMIRGE in your work, please cite these manuscripts, as appropriate.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community
short read sequencing data.
Genome Biology 12: R44. doi:10.1186/gb-2011-12-5-r44.

Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity
in Subsurface Sediments.
PLoS ONE 8: e56018. doi:10.1371/journal.pone.0056018.

Options:
  -h, --help            show this help message and exit

  Required flags:
    These flags are all required to run EMIRGE, and may be supplied in any
    order.

    -1 reads_1.fastq[.gz]
                        path to fastq file with \1 (forward) reads from
                        paired-end sequencing run, or all reads from
                        single-end sequencing run. File may optionally be
                        gzipped. EMIRGE expects ASCII-offset of 64 for quality
                        scores unless --phred33 is given. (Note that running
                        EMIRGE with single-end reads is largely untested.
                        Please let me know how it works for you.)
    -f FASTA_DB, --fasta_db=FASTA_DB
                        path to fasta file of candidate SSU sequences
    -b BOWTIE_DB, --bowtie_db=BOWTIE_DB
                        precomputed bowtie index of candidate SSU sequences
                        (path to appropriate prefix; see --fasta_db)
    -l MAX_READ_LENGTH, --max_read_length=MAX_READ_LENGTH
                        length of longest read in input data.

  Required flags for paired-end reads:
    These flags are required to run EMIRGE when you have paired-end reads (the
    standard way of running EMIRGE), and may be supplied in any order.

    -2 reads_2.fastq    path to fastq file with \2 (reverse) reads from
                        paired-end run. File must be unzipped for mapper.
                        EMIRGE expects ASCII-offset of 64 for quality scores
                        unless --phred33 is given.
    -i INSERT_MEAN, --insert_mean=INSERT_MEAN
                        insert size distribution mean.
    -s INSERT_STDDEV, --insert_stddev=INSERT_STDDEV
                        insert size distribution standard deviation.

  Optional parameters:
    Defaults should normally be fine for these options in order to run
    EMIRGE.

    -n ITERATIONS, --iterations=ITERATIONS
                        Number of iterations to perform. It may be necessary
                        to use more iterations for more complex samples.
                        (default: 40)
    -a PROCESSORS, --processors=PROCESSORS
                        Number of processors to use in the mapping steps. You
                        probably want to raise this if you have the
                        processors. (default: 1)
    -m MAPPING, --mapping=MAPPING
                        path to precomputed initial mapping (bam file). If not
                        provided, an initial mapping will be run for you.
    -p SNP_FRACTION_THRESH, --snp_fraction_thresh=SNP_FRACTION_THRESH
                        If the fraction of variant sites in a candidate
                        sequence exceeds this threshold, split the candidate
                        into two sequences for the next iteration. See also
                        --variant_fraction_thresh. (default: 0.04)
    -v VARIANT_FRACTION_THRESH, --variant_fraction_thresh=VARIANT_FRACTION_THRESH
                        minimum probability of the second most probable base
                        at a site required to call the site a variant. See
                        also --snp_fraction_thresh. (default: 0.1)
    -j JOIN_THRESHOLD, --join_threshold=JOIN_THRESHOLD
                        If two candidate sequences share >= this fractional
                        identity over their bases with mapped reads, merge
                        the two sequences into one for the next iteration.
                        (default: 0.97; valid range: [0.95, 1.0])
    -c MIN_DEPTH, --min_depth=MIN_DEPTH
                        minimum average read depth below which a candidate
                        sequence is discarded for the next iteration.
                        (default: 3)
    --nice_mapping=NICE_MAPPING
                        If set, during the mapping phase, the mapper will be
                        "niced" by the Linux kernel with this value.
                        (default: no nice)
    --phred33           Illumina quality values in fastq files are the (fastq
                        standard) ASCII offset of Phred+33. Phred+33 is the
                        default for Illumina pipeline >= 1.8.
                        However, EMIRGE's DEFAULT is still to assume that
                        quality scores are Phred+64.
    -e SAVE_EVERY, --save_every=SAVE_EVERY
                        Every SAVE_EVERY iterations, save some information
                        about the program's state. This is solely for
                        debugging and is NOT required to resume a run (see
                        --resume_from below). (default: none)

  Resuming iterations:
    These options allow you to resume iterations from a previously completed
    EMIRGE iteration. This requires that the directories for both the
    iteration to resume from and the previous iteration be present. It is
    STRONGLY recommended that other options set on the command line be
    identical to the original run. Note that EMIRGE does not check this for
    you!

    -r RESUME_FROM, --resume_from=RESUME_FROM
                        Resume iterations from the COMPLETED iteration
                        specified. Requires that this iteration and the
                        previous iteration both fully completed, i.e. a priors
                        file, bam file, and fasta file are all present in the
                        iteration directory.
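The quality-score offset matters because EMIRGE assumes Phred+64 unless --phred33 is given. As a minimal illustration of what the offset means (not EMIRGE's own code), the hypothetical helper decode_qualities below turns a FASTQ quality string into numeric Phred scores by subtracting the ASCII offset. Decoding a Phred+33 file with offset 64 yields scores 31 too low, often negative, which the helper reports as an error.

```python
def decode_qualities(qual: str, offset: int = 64) -> list[int]:
    """Convert a FASTQ quality string to numeric Phred scores.

    Hypothetical helper for illustration only. offset=64 matches EMIRGE's
    default (Phred+64); offset=33 corresponds to the --phred33 option.
    """
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        # A negative score means the string was likely written with a
        # smaller offset (e.g. Phred+33) than the one being assumed.
        raise ValueError(f"negative score with offset {offset}; "
                         "file may use a different encoding")
    return scores


# 'h' is ASCII 104 and 'B' is ASCII 66, so under Phred+64 they encode Q40 and Q2.
print(decode_qualities("hhB", offset=64))  # [40, 40, 2]
# 'I' is ASCII 73 and '#' is ASCII 35, so under Phred+33 they encode Q40 and Q2.
print(decode_qualities("II#", offset=33))  # [40, 40, 2]
```

The same characters decode to different scores depending on the offset, which is why the encoding of the input fastq files should match the option used.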
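The descriptions of -v, -p, -j and -c together describe per-iteration decisions about each candidate sequence. The sketch below restates those rules in Python under simplifying assumptions: each site is given as a dict of base probabilities, "fraction of variants" is read as the fraction of sites called variant, and identity and mean depth are passed in precomputed. The function names are hypothetical, and this is not EMIRGE's implementation, which derives these quantities from its own probabilistic model and read mappings.

```python
from typing import Dict, List

# Defaults taken from the option descriptions above.
VARIANT_FRACTION_THRESH = 0.1   # -v / --variant_fraction_thresh
SNP_FRACTION_THRESH = 0.04      # -p / --snp_fraction_thresh
JOIN_THRESHOLD = 0.97           # -j / --join_threshold
MIN_DEPTH = 3.0                 # -c / --min_depth


def call_variant(base_probs: Dict[str, float],
                 thresh: float = VARIANT_FRACTION_THRESH) -> bool:
    """Hypothetical: a site is a variant if its second most probable base
    has probability >= thresh."""
    ranked = sorted(base_probs.values(), reverse=True)
    return len(ranked) > 1 and ranked[1] >= thresh


def should_split(sites: List[Dict[str, float]],
                 thresh: float = SNP_FRACTION_THRESH) -> bool:
    """Hypothetical: split a candidate if the fraction of variant sites
    exceeds thresh."""
    if not sites:
        return False
    n_variant = sum(call_variant(s) for s in sites)
    return n_variant / len(sites) > thresh


def should_merge(identity: float, thresh: float = JOIN_THRESHOLD) -> bool:
    """Hypothetical: merge two candidates sharing >= thresh fractional identity."""
    return identity >= thresh


def should_discard(mean_depth: float, min_depth: float = MIN_DEPTH) -> bool:
    """Hypothetical: discard a candidate whose average read depth is below min_depth."""
    return mean_depth < min_depth


site_fixed = {"A": 0.98, "C": 0.01, "G": 0.01, "T": 0.0}  # not a variant
site_mixed = {"A": 0.7, "G": 0.3, "C": 0.0, "T": 0.0}     # variant (0.3 >= 0.1)
# 1 variant site out of 20 is 5%, which exceeds the 4% default.
print(should_split([site_fixed] * 19 + [site_mixed]))  # True
```

Keeping each threshold in its own predicate mirrors how the help text presents them as independent options, even though in practice they interact (see the cross-references between -p and -v).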