List of software and uses

Software	Version	Website	Citation	Category	How to run	When to use	Reference files or databases
✅ anvi’o	7.1	merenlab.org	Community-led, integrated, reproducible multi-omics with anvi’o and Anvi’o: an advanced analysis and visualization platform for ‘omics data	metagenomics, visualization, MAGs	`conda activate anvio-7.1`	When you're ready to dive into Metagenome-Assembled Genomes (MAGs).
✅ Amazon Web Services Command Line Interface (AWS CLI)	2.12.6	docs.aws.amazon.com	unpublished	utility	`aws`	When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/
✅ BaseSpace Sequence Hub CLI tool suite	1.5.3	developer.basespace.illumina.com	unpublished	NGS tools	`bs`
✅ bcftools	1.18	samtools.github.io	Twelve years of SAMtools and BCFtools	NGS tools	`conda activate bcftools` then `bcftools`
✅ bcl2fastq	2.20.0.422	support.illumina.com	unpublished	NGS tools	`bcl2fastq -R $rundirectory -o $outdirectory --sample-sheet $samplesheet.csv --no-lane-splitting`
✅ bcl-convert	4.2.7	emea.support.illumina.com	unpublished	NGS tools	`bcl-convert`
✅ bedtools	2.31.0	bedtools.readthedocs.io	BEDTools: a flexible suite of utilities for comparing genomic features	NGS tools	`bedtools`	Anytime you want to calculate genomic metrics from sequence data (e.g. coverage)
✅ BLAST	2.12.0	blast.ncbi.nlm.nih.gov	Basic Local Alignment Search Tool	sequence search	`blastn`, `blastp`, or `blastx`
✅ bowtie2	2.5.1	bowtie-bio.sourceforge.net	Ultrafast and memory-efficient alignment of short DNA sequences to the human genome and Fast gapped-read alignment with Bowtie2	read alignment	`bowtie2`	One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood.	prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species
✅ BWA	0.7.17-r1188	bio-bwa.sourceforge.net	Fast and accurate short read alignment with Burrows–Wheeler transform	read alignment	`bwa`	I don't use this directly, but other programs use it for alignment
✅ CellxGene gateway	0.3.11	github.com	unpublished	single cell	`cellxgene-gateway`	Allows us to host a cellxgene instance that works with multiple datasets	`/data/reference_db/cellxgene_data`
✅ CellRanger	7.1.0	support.10xgenomics.com		single cell	`cellranger`	If you want to preprocess single cell genomic data from the 10x platform
✅ CellRanger-arc	2.0.2	support.10xgenomics.com		single cell	`cellranger-arc`	If you want to preprocess single cell genomic data from the 10x platform
✅ CheckM	1.1.3	ecogenomics.github.io	CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes	metagenomics, QA/QC	`conda activate checkm` then `checkm lineage_wf`	If you have a bacterial genome assembly and want to check the quality of the assembly	`/data/reference_db/checkm`
✅ Clust	1.18.0	github.com	Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data	transcriptomics	`clust`	If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments.
✅ deeptools	3.5.2	deeptools.readthedocs.io	deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis	visualization, NGS tools	`deeptools`
❓ DIAMOND	2.1.8	www.diamondsearch.org	Fast and sensitive protein alignment using DIAMOND	multiple sequence alignment	`diamond`	If you have a bunch of protein or translated DNA sequences that you want to align against a protein reference database.	DIAMOND-formatted databases for UniRef90 and UniRef50 live in `/data/reference_db/uniref`
✅ Docker	24.0.2, build cb74dfc	www.docker.com	unpublished	containerized software	`docker run [OPTIONS] IMAGE [COMMAND] [ARG...]`
✅ Dorado	0.3.1+bb8c5ee	github.com	unpublished	nanopore, basecalling, GPU	`dorado`
✅ Fastp	0.23.4	github.com	fastp: an ultra-fast all-in-one FASTQ preprocessor	QA/QC	`fastp`
✅ FastQC	0.12.1	www.bioinformatics.babraham.ac.uk	unpublished	QA/QC	`fastqc`	The preferred choice for rapid quality control assessment of raw reads in a fastq file
✅ FastQ Screen	0.15.3	www.bioinformatics.babraham.ac.uk	FastQ Screen: A tool for multi-genome mapping and quality control	decontamination, QA/QC	`fastq_screen`	Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood.
✅ Filezilla	3.63.0	filezilla-project.org	unpublished	utility	`filezilla`
✅ Filtlong	0.2.1	github.com	unpublished	QA/QC, nanopore	`filtlong`	If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality
✅ Freebayes	1.3.6	github.com	Haplotype-based variant detection from short-read sequencing	variants, SNPs/INDELs	`freebayes`	For variant calling
✅ GATK	4.4.0.0	gatk.broadinstitute.org		variants	`gatk`	When you are working with SNPs/variants
❓ Grabseqs	0.7.0	github.com	grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories	public data	`grabseqs`	A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe
✅ GTDB-Tk	2.1.1	ecogenomics.github.io	GTDB-Tk2: memory friendly classification with the genome taxonomy database and GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database	metagenomics, classification	`conda activate gtdb`; you may also need to run `export GTDBTK_DATA_PATH=/data/reference_db/GTDB-Tk/release214` to make sure your environment ‘sees’ the reference database		`/data/reference_db/GTDB-Tk`
✅ Humann3	3.7	huttenhower.sph.harvard.edu	Species-level functional profiling of metagenomes and metatranscriptomes	metagenomics, functional profiling		When you have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that Humann3 uses DIAMOND, MinPath and Bowtie2 under the hood	Humann: How to Set up / Run on Paired Files
✅ htop	3.2.2	htop.dev	unpublished	utility	`htop`
❓ iRep	1.10	github.com	Measurement of bacterial replication rates in microbial communities	metagenomics	`iRep` or `bPTR`
✅ Jellyfish	2.3.1	genome.umd.edu	A fast, lock-free approach for efficient parallel counting of occurrences of k-mers	NGS tools	`jellyfish`	For rapid/efficient counting of k-mers in DNA
✅ Kallisto	0.50.1	pachterlab.github.io	Near-optimal probabilistic RNA-seq quantification	transcriptomics, read alignment	`kallisto`	Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome	prebuilt kallisto indexes for a few species in /data/reference_db/kallisto
✅ Kallisto-BUStools	0.27.3	www.kallistobus.tools	Near-optimal probabilistic RNA-seq quantification and Modular, efficient and constant-memory single-cell RNA-seq preprocessing	single cell	`kb`	A great alternative to CellRanger for preprocessing single cell data from the 10x platform.	prebuilt kallisto indexes for a few species in /data/reference_db/kallisto
✅ KneadData	0.12.0	huttenhower.sph.harvard.edu	unpublished	decontamination	`kneaddata`	If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood
✅ Kraken2	2.0.7-beta	ccb.jhu.edu	Kraken: ultrafast metagenomic sequence classification using exact alignments and Improved metagenomic analysis with Kraken 2	metagenomics, classification	`conda activate kraken` then `kraken2`		`/data/reference_db/kraken2db_standard/`
Krakenuniq	0.5.8	github.com	KrakenUniq: confident and fast metagenomics classification using unique k-mer counts	metagenomics, classification	`conda activate kraken` then `krakenuniq`
✅ MACS3	3.0.0a6	macs3-project.github.io	Model-based Analysis of ChIP-Seq (MACS) and Improved peak-calling with MACS2	Epigenetics	`conda activate macs3` then `macs3`	Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome
✅ marker_alignments	0.4.2	github.com	Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes	metagenomics, classification	`marker_alignments`	If you want to find microbial eukaryotes in metagenomic data	The EukDetect database used by this program lives in `/data/reference_db/eukdetect`
✅ Mastiff	0.0.3	github.com	unpublished	metagenomics, public data	`mastiff`
✅ MaxBin2	2.2.7	sourceforge.net	MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm	assembly	`run_MaxBin.pl`
✅ MEGAHIT	1.2.9	github.com	An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph and A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices	assembly, metagenomics	`megahit`	If you want to assemble genomes from metagenomic data
✅ MetaPhlAn4	4.06	huttenhower.sph.harvard.edu	Metagenomic microbial community profiling using unique clade-specific marker genes and Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4	metagenomics, classification	`conda activate biobakery` and then `metaphlan`		`/data/reference_db/biobakery`
✅ Micro	2.0.11	micro-editor.github.io	unpublished	utility	`micro`	Anytime you need to edit a text file in the terminal. It’s far better than vim or nano!
✅ MosDepth	0.3.4	github.com	Mosdepth: quick coverage calculation for genomes and exomes	NGS tools	`mosdepth` and `plot-dist.py` for plotting
✅ MultiQC	1.14	multiqc.info	unpublished	QA/QC	`multiqc`	Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report
✅ Nextflow	23.04.2.5870	www.nextflow.io	unpublished	workflow management	`nextflow`	If you want to set up an automated workflow on our server
✅ nf-core	2.10	nf-co.re	The nf-core framework for community-curated bioinformatics pipelines	workflow management	`nf-core`
✅ nvitop	1.1.2	github.com	unpublished	utility, GPU	`pipx run nvitop`
✅ nvtop	3.0.1	github.com	unpublished	utility, GPU	`nvtop`
✅ Picard tools	3.0.0	broadinstitute.github.io	unpublished	NGS tools	`java -jar /usr/local/bin/picard.jar`	One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow
✅ Plink	1.07	www.cog-genomics.org	Second-generation PLINK: rising to the challenge of larger and richer datasets	comparative genomics	`plink`	Used for GWAS and other popgen analyses
✅ Porechop	0.2.4 (no longer maintained/supported)	github.com	unpublished	QA/QC, nanopore	`porechop`	When you have Nanopore reads and you want to trim off the adapter sequence
✅ Prokka	1.14.6	github.com	Prokka: rapid prokaryotic genome annotation	annotation	`prokka`	Great for quickly (and accurately) annotating a bacterial genome
✅ QIIME2	2023.5.1	qiime2.org	Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2	16S	`source activate qiime2-2023.5` and then `qiime`	Anytime you want to figure out microbial community composition from 16S data
✅ QIIME1	1.9.1	qiime.org	QIIME allows analysis of high-throughput community sequencing data	16S	`source activate qiime1` and then `qiime`
✅ Rosella	0.4.2	rhysnewell.github.io	unpublished	metagenomics, binning, MAGs
✅ rust	1.26.0	www.rust-lang.org		programming language	`rustup` or `cargo` or `rustc`
✅ samtools	1.16.1	samtools.sourceforge.net	The Sequence Alignment/Map format and SAMtools	NGS tools	`samtools`	A powerful suite of tools for working with alignment files (bam, sam, etc)
✅ seqtk	1.3-r106	github.com		working with fasta/fastq	`seqtk`	I use this anytime I want to quickly subsample a fastq file
✅ seqKit	2.3.0	bioinf.shenwei.me	SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation	working with fasta/fastq	`seqkit`	Anytime you need to manipulate a fasta or fastq file. Some overlap in functionality with seqtk
✅ snpEff	5.1d	pcingola.github.io	unpublished	SNPs/INDELs	`java -jar /usr/local/bin/snpEff/snpEff.jar` for snpEff and `java -jar /usr/local/bin/snpEff/SnpSift.jar` for snpSift
✅ SPAdes	3.15.4	cab.spbu.ru	SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing	assembly, metagenomics	`spades.py [options] -o <output_dir>`	If you have metagenomic sequencing data and want to assemble microbial genomes de novo
✅ SWGA2	1.0.0	github.com	A fast machine-learning-guided primer design pipeline for selective whole genome amplification	metagenomics	`conda activate swga2` then `soapswga`	When you want to design primers for carrying out selective whole genome amplification (SWGA)
✅ Sourmash	4.8.2	sourmash.readthedocs.io	sourmash: a library for MinHash sketching of DNA	metagenomics, classification	`sourmash`	Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of RefSeq or GenBank, to understand what organisms might be present in the data.
✅ Spaceranger	2.1.1	www.10xgenomics.com	unpublished	single cell	`spaceranger`	When you have spatial gene expression from the Visium 10x platform
✅ SRA toolkit	3.0.5	github.com		public data	`fasterq-dump`, `sam-dump`, and more
✅ STAR	2.7.10b	github.com	STAR: Ultrafast Universal RNA-seq Aligner	read alignment	`STAR` (all caps) or `STARlong` (for aligning long reads)	Very fast and popular base-wise aligner	prebuilt STAR indexes for several species present in /data/reference_db/star
✅ StrainPhlAn	4.0.6	segatalab.cibio.unitn.it	Microbial strain-level population structure and genetic diversity from metagenomes	metagenomics, classification	`conda activate biobakery` and then `strainphlan`
✅ Sunbeam	4.1.0	sunbeam.readthedocs.io	Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments	metagenomics, classification	`conda activate sunbeam4.1.0`		Sunbeam: How to Set-up / Run
✅ Trimmomatic	0.39	www.usadellab.org	Trimmomatic: a flexible trimmer for Illumina sequence data	QA/QC	`java -jar /usr/local/bin/Trimmomatic/trimmomatic-0.39.jar`	Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length
✅ Unicycler	0.5.0	github.com	Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads	assembly	`unicycler`	If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly
✅ VCFtools	0.1.17	github.com		variants	`vcftools`
✅ velocyto	0.17	velocyto.org	RNA velocity of single cells	single cell	`velocyto`
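
Many of the entries above are run by typing a single command name, while others only become available after a conda environment is activated. The following is a minimal sketch, assuming a bash shell, for checking which commands are currently on your PATH. The function name `check_tools` is hypothetical and is not part of any tool listed here; the command names in the example call are taken from the "how to run" column.

```shell
#!/usr/bin/env bash
# check_tools (hypothetical helper): report whether each named command is on PATH.
# Prints one "found<TAB>name" or "missing<TAB>name" line per argument and
# returns non-zero if at least one command is missing.
check_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found\t%s\n' "$tool"
        else
            printf 'missing\t%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call with a few commands from the table. Tools that live inside a
# conda environment will show as missing until that environment is activated.
check_tools bowtie2 samtools kallisto multiqc \
    || echo "some tools are not on PATH (a conda environment may need activating)"
```

Running the same call again after something like `conda activate bcftools` should show whether activation changed which commands are visible.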