Software | Version | Website | Citation | Category | How to run | When to use | Reference files or databases | |
---|---|---|---|---|---|---|---|---|
anvi'o | 7.1 | merenlab.org | Community-led, integrated, reproducible multi-omics with anvi'o and Anvi'o: an advanced analysis and visualization platform for 'omics data | metagenomics, visualization, MAGs |
| When you're ready to dive into Metagenome Assembled Genomes (MAGs). | ||
Amazon Web Services Command Line Interface (AWS CLI) | 2.12.6 | docs.aws.amazon.com | unpublished | utility |
| When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/ | ||
BaseSpace Sequence Hub CLI tool suite | 1.5.3 | developer.basespace.illumina.com | unpublished | NGS tools |
| |||
bcftools | 1.18 | samtools.github.io | Twelve years of SAMtools and BCFtools | NGS tools | | |||
bcl2fastq | 2.20.0.422 | support.illumina.com | unpublished | NGS tools | bcl2fastq -R $rundirectory -o $outdirectory --sample-sheet $samplesheet.csv --no-lane-splitting | |||
bcl-convert | 4.2.7 | emea.support.illumina.com | unpublished | NGS tools | bcl-convert | |||
bedtools | 2.31.0 | bedtools.readthedocs.io | BEDTools: a flexible suite of utilities for comparing genomic features | NGS tools |
| Anytime you want to calculate genomic metrics from sequence data (e.g. coverage) | ||
BLAST | 2.12.0 | blast.ncbi.nlm.nih.gov | Basic Local Alignment Search Tool | sequence search |
| |||
bowtie2 | 2.5.1 | bowtie-bio.sourceforge.net | Ultrafast and memory-efficient alignment of short DNA sequences to the human genome and Fast gapped-read alignment with Bowtie2 | read alignment |
| One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood. | prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species | |
BWA | 0.7.17-r1188 | bio-bwa.sourceforge.net | Fast and accurate short read alignment with Burrows–Wheeler transform | read alignment |
| I don't use this directly, but it is used by other programs for alignment | ||
CellxGene gateway | 0.3.11 | github.com | unpublished | single cell |
| Allows us to host a cellxgene instance that works with multiple datasets |
| |
CellRanger | 7.1.0 | support.10xgenomics.com | single cell |
| If you want to preprocess single cell genomic data from the 10x platform | |||
CellRanger-arc | 2.0.2 | support.10xgenomics.com | single cell |
| If you want to preprocess single cell genomic data from the 10x platform | |||
CheckM | 1.1.3 | ecogenomics.github.io | CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes | metagenomics, QA/QC |
| If you have a bacterial genome assembly and want to check the quality of the assembly |
| |
Clust | 1.18.0 | github.com | Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data | transcriptomics |
| If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments. | ||
deeptools | 3.5.2 | deeptools.readthedocs.io | deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis | visualization, NGS tools |
| |||
DIAMOND | 2.1.8 | www.diamondsearch.org | Fast and sensitive protein alignment using DIAMOND | multiple sequence alignment |
| If you have a bunch of protein AA or translated DNA sequences that you want to align. | diamond formatted databases for UniRef90 and UniRef50 live in | |
Docker | 24.0.2, build cb74dfc | www.docker.com | unpublished | containerized software |
| |||
Dorado | 0.3.1+bb8c5ee | github.com | unpublished | nanopore, basecalling, GPU |
| |||
Fastp | 0.23.4 | github.com | fastp: an ultra-fast all-in-one FASTQ preprocessor | QA/QC |
| |||
FastQC | 0.12.1 | www.bioinformatics.babraham.ac.uk | unpublished | QA/QC |
| The preferred choice for rapid quality control assessment of raw reads in a fastq file | ||
FastQ Screen | 0.15.3 | www.bioinformatics.babraham.ac.uk | FastQ Screen: A tool for multi-genome mapping and quality control | decontamination, QA/QC |
| A simple tool for figuring out whether your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood. | ||
Filezilla | 3.63.0 | filezilla-project.org | unpublished | utility |
| |||
Filtlong | 0.2.1 | github.com | unpublished | QA/QC, nanopore |
| If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality | ||
Freebayes | 1.3.6 | github.com | Haplotype-based variant detection from short-read sequencing | variants, SNPs/INDELs |
| For variant calling | ||
GATK | 4.4.0.0 | gatk.broadinstitute.org | variants |
| When you are working with SNPs/variants | |||
Grabseqs | 0.7.0 | github.com | grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories | public data |
| A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe | ||
GTDB-Tk | 2.1.1 | ecogenomics.github.io | GTDB-Tk2: memory friendly classification with the genome taxonomy database and GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database | metagenomics, classification |
|
| ||
Humann3 | 3.7 | huttenhower.sph.harvard.edu | Species-level functional profiling of metagenomes and metatranscriptomes | metagenomics, functional profiling | You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that Humann3 uses DIAMOND, MinPath and Bowtie2 under the hood. | Humann: How to Set up / Run on Paired Files | ||
htop | 3.2.2 | htop.dev | unpublished | utility |
| |||
iRep | 1.10 | github.com | Measurement of bacterial replication rates in microbial communities | metagenomics |
| |||
Jellyfish | 2.3.1 | genome.umd.edu | A fast, lock-free approach for efficient parallel counting of occurrences of k-mers | NGS tools |
| For rapid/efficient counting of kmers in DNA | ||
Kallisto | 0.50.1 | pachterlab.github.io | Near-optimal probabilistic RNA-seq quantification | transcriptomics, read alignment |
| Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | |
Kallisto-BUStools | 0.27.3 | www.kallistobus.tools | Near-optimal probabilistic RNA-seq quantification and Modular, efficient and constant-memory single-cell RNA-seq preprocessing | single cell |
| A great alternative to CellRanger for preprocessing single cell data from the 10x platform. | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | |
KneadData | 0.12.0 | huttenhower.sph.harvard.edu | unpublished | decontamination |
| If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood | ||
Kraken2 | 2.0.7-beta | ccb.jhu.edu | Kraken: ultrafast metagenomic sequence classification using exact alignments and Improved metagenomic analysis with Kraken 2 | metagenomics, classification |
|
| ||
Krakenuniq | 0.5.8 | github.com | KrakenUniq: confident and fast metagenomics classification using unique k-mer counts | metagenomics, classification |
| |||
MACS3 | 3.0.0a6 | macs3-project.github.io | Model-based Analysis of ChIP-Seq (MACS) and Improved peak-calling with MACS2 | Epigenetics |
| Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome | ||
marker_alignments | 0.4.2 | github.com | Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes | metagenomics, classification |
| If you want to find microbial eukaryotes in metagenomic data | the EukDetect database used by this program lives in | |
Mastiff | 0.0.3 | github.com | unpublished | metagenomics, public data |
| |||
MaxBin2 | 2.2.7 | sourceforge.net | MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm | assembly |
| |||
MEGAHIT | 1.2.9 | github.com | An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph and A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices | assembly, metagenomics |
| If you want to assemble genomes from metagenomic data | ||
MetaPhlAn4 | 4.06 | huttenhower.sph.harvard.edu | Metagenomic microbial community profiling using unique clade-specific marker genes and Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4 | metagenomics, classification |
|
| ||
Micro | 2.0.11 | micro-editor.github.io | unpublished | utility |
| Anytime you need to edit a text file in the terminal... it's far better than vim or nano! | ||
MosDepth | 0.3.4 | github.com | Mosdepth: quick coverage calculation for genomes and exomes | NGS tools |
| |||
MultiQC | 1.14 | multiqc.info | unpublished | QA/QC |
| Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report | ||
Nextflow | 23.04.2.5870 | www.nextflow.io | unpublished | workflow management |
| If you want to set up an automated workflow on our server | ||
nf-core | 2.10 | nf-co.re | The nf-core framework for community-curated bioinformatics pipelines | workflow management |
| |||
nvitop | 1.1.2 | github.com | unpublished | utility, GPU |
| |||
nvtop | 3.0.1 | github.com | unpublished | utility, GPU |
| |||
Picard tools | 3.0.0 | broadinstitute.github.io | unpublished | NGS tools |
| One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow | ||
Plink | 1.07 | www.cog-genomics.org | Second-generation PLINK: rising to the challenge of larger and richer datasets | comparative genomics |
| Used for GWAS and other popgen analyses | ||
Porechop | 0.2.4 (no longer maintained/supported) | github.com | unpublished | QA/QC, nanopore |
| When you have Nanopore reads and you want to trim off the adapter sequence | ||
Prokka | 1.14.6 | github.com | Prokka: rapid prokaryotic genome annotation | annotation |
| Great for quickly (and accurately) annotating a bacterial genome | ||
QIIME2 | 2023.5.1 | qiime2.org | Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2 | 16S |
| Anytime you want to figure out microbial community composition from 16S data | ||
QIIME1 | 1.9.1 | qiime.org | QIIME allows analysis of high-throughput community sequencing data | 16S |
| |||
Rosella | 0.4.2 | rhysnewell.github.io | unpublished | metagenomics, binning, MAGs | ||||
rust | 1.26.0 | www.rust-lang.org | programming language |
| ||||
samtools | 1.16.1 | samtools.sourceforge.net | The Sequence Alignment/Map format and SAMtools | NGS tools |
| A powerful suite of tools for working with alignment files (bam, sam, etc) | ||
seqtk | 1.3-r106 | github.com | working with fasta/fastq |
| I use this anytime I want to quickly subsample a fastq file | |||
seqKit | 2.3.0 | bioinf.shenwei.me | SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation | working with fasta/fastq |
| Anytime you need to manipulate a fasta/fastq file. Some overlap in functionality with seqtk | ||
snpEff | 5.1d | pcingola.github.io | unpublished | SNPs/INDELs |
| |||
SPAdes | 3.15.4 | cab.spbu.ru | SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing | assembly, metagenomics |
| If you have metagenomic sequencing data and want to assemble microbial genomes de novo | ||
SWGA2 | 1.0.0 | github.com | A fast machine-learning-guided primer design pipeline for selective whole genome amplification | metagenomics |
| When you want to design primers for carrying out selective whole genome amplification (SWGA) | ||
Sourmash | 4.8.2 | sourmash.readthedocs.io | sourmash: a library for MinHash sketching of DNA | metagenomics, classification |
| Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data. | ||
Spaceranger | 2.1.1 | www.10xgenomics.com | unpublished | single cell |
| When you have spatial gene expression data from the 10x Visium platform | ||
SRA toolkit | 3.0.5 | github.com | public data |
| ||||
STAR | 2.7.10b | github.com | STAR: Ultrafast Universal RNA-seq Aligner | read alignment |
| Very fast and popular base-wise aligner | prebuilt STAR indexes for several species present in /data/reference_db/star | |
StrainPhlAn | 4.0.6 | segatalab.cibio.unitn.it | Microbial strain-level population structure and genetic diversity from metagenomes | metagenomics, classification |
| |||
Sunbeam | 4.1.0 | sunbeam.readthedocs.io | Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments | metagenomics, classification |
| Sunbeam: How to Set-up / Run | ||
Trimmomatic | 0.39 | www.usadellab.org | Trimmomatic: a flexible trimmer for Illumina sequence data | QA/QC |
| Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length | ||
Unicycler | 0.5.0 | github.com | Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads | assembly |
| If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly | ||
VCFtools | 0.1.17 | github.com | variants |
| ||||
velocyto | 0.17 | velocyto.org | RNA velocity of single cells | single cell |
| |||
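
The bcl2fastq row above is the only entry with a filled-in "how to run" cell. As a minimal sketch, that one-liner can be wrapped in a small shell function that prints the command for review before anything is run. The function name `build_bcl2fastq_cmd` and the example paths are hypothetical placeholders; the flags are copied from the table, except that the sample sheet is passed here as a full file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcl2fastq): assembles the command shown in
# the table and prints it, so paths can be checked before running it for real.
build_bcl2fastq_cmd() {
  local rundirectory="$1"   # Illumina run folder
  local outdirectory="$2"   # where the fastq files should be written
  local samplesheet="$3"    # sample sheet file, including the .csv extension
  echo "bcl2fastq -R ${rundirectory} -o ${outdirectory} --sample-sheet ${samplesheet} --no-lane-splitting"
}

# Placeholder paths for illustration only; substitute real locations.
build_bcl2fastq_cmd /path/to/run /path/to/output /path/to/SampleSheet.csv
```

Once the printed command looks right, it can be pasted into the terminal or run with `eval`, assuming the paths contain no spaces.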