Software we use

We have a lot of software already installed on the server, covering applications that range from QC analysis and preprocessing of raw sequence data to transcriptome analysis of RNA-seq data, 16S and shotgun metagenomics pipelines, WGS tools, and more. If you have an account on our cluster, you already have access to all of the software below, so get started!

If you’re looking for a piece of software and don’t find it below, just reach out to Dan Beiting to inquire about getting it installed.
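
Before asking for an install, it can help to check whether a command is already on your PATH. The sketch below is one way to do that; the helper name `have_tool` is hypothetical, and the tool names are taken from the "how to run" column of the list. Tools that live in a conda environment will only show up after that environment is activated.

```shell
# Return success (exit status 0) if the named command is on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of a few commands from the list below.
for tool in fastqc multiqc kallisto samtools; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not on PATH"
    fi
done
```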

NOTE: When available, I’ve included the appropriate publication/reference for each piece of software. Please cite the authors if you use their software! If a piece of software has not been published, cite the GitHub repo or software website.

List of software and uses

Software
Version
Website
Citation
Category
How to run
When to use
Reference files or databases
anvi’o

8.0

merenlab.org

Community-led, integrated, reproducible multi-omics with anvi’o and Anvi’o: an advanced analysis and visualization platform for ‘omics data

metagenomics, visualization, MAGs

conda activate anvio-7.1

When you're ready to dive into Metagenome Assembled Genomes (MAGs).

Amazon Web Services Command Line Interface (AWS CLI)

2.12.6

docs.aws.amazon.com

unpublished

utility

aws

When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/

BaseSpace Sequence Hub CLI tool suite

1.5.3

developer.basespace.illumina.com

unpublished

NGS tools

bs

bcftools

1.18

samtools.github.io

Twelve years of SAMtools and BCFtools

NGS tools

conda activate bcftools then bcftools

bcl2fastq

2.20.0.422

support.illumina.com

unpublished

NGS tools

bcl2fastq -R $rundirectory -o $outdirectory --sample-sheet $samplesheet.csv --no-lane-splitting

bcl-convert

4.2.7

emea.support.illumina.com

unpublished

NGS tools

bcl-convert

bedtools

2.31.0

bedtools.readthedocs.io

BEDTools: a flexible suite of utilities for comparing genomic features

NGS tools

bedtools

Anytime you want to calculate genomic metrics from sequence data (e.g. coverage)

BLAST

2.12.0

blast.ncbi.nlm.nih.gov

Basic Local Alignment Search Tool

sequence search

blastn, blastp, or blastx

bowtie2

2.5.1

bowtie-bio.sourceforge.net

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome and Fast gapped-read alignment with Bowtie2

read alignment

bowtie2

One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood.

prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species

BWA

0.7.17-r1188

bio-bwa.sourceforge.net

Fast and accurate short read alignment with Burrows–Wheeler transform

read alignment

bwa

I don't use this directly, but other programs use it for alignment

CellxGene gateway

0.3.11

github.com

unpublished

single cell

cellxgene-gateway

Allows us to host a cellxgene instance that works with multiple datasets

/data/reference_db/cellxgene_data

CellRanger

9

support.10xgenomics.com

single cell

cellranger

If you want to preprocess single cell genomic data from the 10x platform

CellRanger-arc

2.0.2

support.10xgenomics.com

single cell

cellranger-arc

If you want to preprocess single cell genomic data from the 10x platform

CheckM

1.1.3

ecogenomics.github.io

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

metagenomics, QA/QC

conda activate checkm then checkm lineage_wf

If you have a bacterial genome assembly and want to check the quality of the assembly

/data/reference_db/checkm

Clust

1.18.0

github.com

Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data

transcriptomics

clust

If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments.

deeptools

3.5.2

deeptools.readthedocs.io

deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis

visualization, NGS tools

deeptools

DIAMOND

2.1.8

www.diamondsearch.org

Fast and sensitive protein alignment using DIAMOND

multiple sequence alignment

diamond

If you have a bunch of protein (AA) or translated DNA sequences that you want to align against a protein database.

diamond formatted databases for UniRef90 and UniRef50 live in /data/reference_db/uniref

Docker

24.0.2, build cb74dfc

www.docker.com

unpublished

containerized software

docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Dorado

1.1.1

github.com

unpublished

nanopore, basecalling, GPU

dorado

Fastp

0.23.4

github.com

fastp: an ultra-fast all-in-one FASTQ preprocessor

QA/QC

fastp

FastQC

0.12.1

www.bioinformatics.babraham.ac.uk

unpublished

QA/QC

fastqc

The preferred choice for rapid quality control assessment of raw reads in a fastq file

FastQ Screen

0.15.3

www.bioinformatics.babraham.ac.uk

FastQ Screen: A tool for multi-genome mapping and quality control

decontamination, QA/QC

fastq_screen

Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood.

Filezilla

3.63.0

filezilla-project.org

unpublished

utility

filezilla

Filtlong

0.2.1

github.com

unpublished

QA/QC, nanopore

filtlong

If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality

Freebayes

1.3.6

github.com

Haplotype-based variant detection from short-read sequencing

variants, SNPs/INDELs

freebayes

For variant calling

GATK

4.6.2

gatk.broadinstitute.org

variants

gatk

When you are working with SNPs/variants

Grabseqs

0.7.0

github.com

grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories

public data

grabseqs

A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe

GTDB-TK

2.1.1

ecogenomics.github.io

GTDB-Tk2: memory friendly classification with the genome taxonomy database and GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database

metagenomics, classification

conda activate gtdb. You may need to run export GTDBTK_DATA_PATH=/data/reference_db/GTDB-Tk/release214 to make sure your environment ‘sees’ the reference database.

/data/reference_db/GTDB-Tk

Humann3

3.7

huttenhower.sph.harvard.edu

Species-level functional profiling of metagenomes and metatranscriptomes

metagenomics, functional profiling

You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that Humann3 uses DIAMOND, MinPath and Bowtie2 under the hood.

Humann: How to Set up / Run on Paired Files

htop

3.2.2

htop.dev

unpublished

utility

htop

iRep

1.10

github.com

Measurement of bacterial replication rates in microbial communities

metagenomics

iRep or bPTR

Jellyfish

2.3.1

genome.umd.edu

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

NGS tools

jellyfish

For rapid/efficient counting of kmers in DNA

Kallisto

0.50.1

pachterlab.github.io

Near-optimal probabilistic RNA-seq quantification

transcriptomics, read alignment

kallisto

Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome

prebuilt kallisto indexes for a few species in /data/reference_db/kallisto

Kallisto-BUStools

0.27.3

www.kallistobus.tools

Near-optimal probabilistic RNA-seq quantification and Modular, efficient and constant-memory single-cell RNA-seq preprocessing

single cell

kb

A great alternative to CellRanger for preprocessing single cell data from the 10x platform.

prebuilt kallisto indexes for a few species in /data/reference_db/kallisto

KneadData

0.12.0

huttenhower.sph.harvard.edu

unpublished

decontamination

kneaddata

If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood

Kraken2

2.0.7-beta

ccb.jhu.edu

Kraken: ultrafast metagenomic sequence classification using exact alignments and Improved metagenomic analysis with Kraken 2

metagenomics, classification

conda activate kraken then kraken2

/data/reference_db/kraken2db_standard/

Krakenuniq

0.5.8

github.com

KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

metagenomics, classification

conda activate kraken then krakenuniq

MACS3

3.0.0a6

macs3-project.github.io

Model-based Analysis of ChIP-Seq (MACS) and Improved peak-calling with MACS2

Epigenetics

conda activate macs3 then macs3

Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome

marker_alignments

0.4.2

github.com

Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes

metagenomics, classification

marker_alignments

If you want to find microbial eukaryotes in metagenomic data

the EukDetect database used by this program lives in /data/reference_db/eukdetect

Mastiff

0.0.3

github.com

unpublished

metagenomics, public data

mastiff

MaxBin2

2.2.7

sourceforge.net

MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

assembly

run_MaxBin.pl

MEGAHIT

1.2.9

github.com

An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph and A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices

assembly, metagenomics

megahit

If you want to assemble genomes from metagenomic data

MetaPhlAn4

4.0.6

huttenhower.sph.harvard.edu

Metagenomic microbial community profiling using unique clade-specific marker genes and Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4

metagenomics, classification

conda activate biobakery and then metaphlan

/data/reference_db/biobakery

Micro

2.0.11

micro-editor.github.io

unpublished

utility

micro

Anytime you need to edit a text file in the terminal. It’s far better than vim or nano!

MosDepth

0.3.4

github.com

Mosdepth: quick coverage calculation for genomes and exomes

NGS tools

mosdepth and plot-dist.py for plotting

MultiQC

1.14

multiqc.info

unpublished

QA/QC

multiqc

Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report

Nextflow

23.04.2.5870

www.nextflow.io

unpublished

workflow management

nextflow

If you want to set up an automated workflow on our server

nf-core

2.10

nf-co.re

The nf-core framework for community-curated bioinformatics pipelines

workflow management

nf-core

nvitop

1.1.2

github.com

unpublished

utility, GPU

pipx run nvitop

nvtop

3.0.1

github.com

unpublished

utility, GPU

nvtop

Picard tools

3.0.0

broadinstitute.github.io

unpublished

NGS tools

java -jar /usr/local/bin/picard.jar

One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow

Plink

1.07

www.cog-genomics.org

Second-generation PLINK: rising to the challenge of larger and richer datasets

comparative genomics

plink

Used for GWAS and other popgen analyses

Porechop

0.2.4 (no longer maintained/supported)

github.com

unpublished

QA/QC, nanopore

porechop

When you have Nanopore reads and you want to trim off the adapter sequence

Prokka

1.14.6

github.com

Prokka: rapid prokaryotic genome annotation

annotation

prokka

Great for quickly (and accurately) annotating a bacterial genome

QIIME2

2023.5.1

qiime2.org

Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

16S

source activate qiime2-2023.5 and then qiime

Anytime you want to figure out microbial community composition from 16S data

QIIME1

1.9.1

qiime.org

QIIME allows analysis of high-throughput community sequencing data

16S

source activate qiime1 and then qiime

Rosella

0.4.2

rhysnewell.github.io

unpublished

metagenomics, binning, MAGs

rust

1.26.0

www.rust-lang.org

programming language

rustup or cargo or rustc

samtools

1.16.1

samtools.sourceforge.net

The Sequence Alignment/Map format and SAMtools

NGS tools

samtools

A powerful suite of tools for working with alignment files (bam, sam, etc.)

seqtk

1.3-r106

github.com

working with fasta/fastq

seqtk

I use this anytime I want to quickly subsample a fastq file

seqKit

2.3.0

bioinf.shenwei.me

SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

working with fasta/fastq

seqkit

Anytime you need to manipulate a fastq/a file. Some overlap in functionality with seqtk

snpEff

5.1d

pcingola.github.io

unpublished

SNPs/INDELs

java -jar /usr/local/bin/snpEff/snpEff.jar for snpEff and java -jar /usr/local/bin/snpEff/SnpSift.jar for snpSift

SPAdes

3.15.4

cab.spbu.ru

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

assembly, metagenomics

spades.py [options] -o <output_dir>

If you have metagenomic sequencing data and want to assemble microbial genomes de novo

SWGA2

1.0.0

github.com

A fast machine-learning-guided primer design pipeline for selective whole genome amplification

metagenomics

conda activate swga2 then soapswga

When you want to design primers for selective whole genome amplification (SWGA)

Sourmash

4.8.2

sourmash.readthedocs.io

sourmash: a library for MinHash sketching of DNA

metagenomics, classification

sourmash

Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data.

Spaceranger

2.1.1

www.10xgenomics.com

unpublished

single cell

spaceranger

When you have spatial gene expression from the Visium 10x platform

SRA toolkit

3.0.5

github.com

public data

fasterq-dump, sam-dump, and more

STAR

2.7.10b

github.com

STAR: Ultrafast Universal RNA-seq Aligner

read alignment

STAR (all caps) or STARlong (for aligning long reads)

Very fast and popular base-wise aligner

prebuilt STAR indexes for several species present in /data/reference_db/star

StrainPhlAn

4.0.6

segatalab.cibio.unitn.it

Microbial strain-level population structure and genetic diversity from metagenomes

metagenomics, classification

conda activate biobakery and then strainphlan

Sunbeam

4.1.0

sunbeam.readthedocs.io

Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments

metagenomics, classification

conda activate sunbeam4.1.0

Sunbeam: How to Set-up / Run

Trimmomatic

0.39

www.usadellab.org

Trimmomatic: a flexible trimmer for Illumina sequence data

QA/QC

java -jar /usr/local/bin/Trimmomatic/trimmomatic-0.39.jar

Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length

Unicycler

0.5.0

github.com

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads

assembly

unicycler

If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly

VCFtools

0.1.17

github.com

variants

vcftools

velocyto

0.17

velocyto.org

RNA velocity of single cells

single cell

velocyto

Autocycler

0.5.2

The latest one-stop bacterial genome assembler from the Unicycler authors

assembly
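
Several entries above point to prebuilt indexes under /data/reference_db, for example the bowtie2 indexes stored in folders named by genus and species. As a rough sketch of how to find the index prefix to pass to `bowtie2 -x`, the hypothetical helper below (`list_bt2_prefixes` is not an installed command) searches a directory for files ending in `.1.bt2` and strips that suffix. It assumes small-format indexes; large indexes use a `.bt2l` suffix and are not handled here.

```shell
# Print bowtie2 index prefixes found under a directory (default: /data/reference_db).
# A bowtie2 index is a set of files sharing one prefix; the forward index file
# "<prefix>.1.bt2" identifies each set, so the reverse file "*.rev.1.bt2" is skipped.
list_bt2_prefixes() {
    find "${1:-/data/reference_db}" -name '*.1.bt2' ! -name '*.rev.1.bt2' 2>/dev/null |
        sed 's/\.1\.bt2$//' |
        sort
}

# Example (not run here): list every prefix, then pass one to bowtie2 with -x.
#   list_bt2_prefixes /data/reference_db
```

The exact folder layout inside /data/reference_db may differ from what this assumes, so treat the output as a starting point and confirm the prefix before aligning.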

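For the GTDB-Tk entry above, the environment variable can be set once per session before running the tool. This is a minimal sketch that reuses the release path given in that entry and keeps any value you have already exported; the directory check is an added convenience, not part of GTDB-Tk itself.

```shell
# Activate the environment first (shown as a comment because it depends on your shell setup):
#   conda activate gtdb

# Use the release path from the GTDB-Tk entry unless the variable is already set.
: "${GTDBTK_DATA_PATH:=/data/reference_db/GTDB-Tk/release214}"
export GTDBTK_DATA_PATH

# Warn early if the reference database directory is missing.
if [ ! -d "$GTDBTK_DATA_PATH" ]; then
    echo "warning: $GTDBTK_DATA_PATH does not exist on this machine" >&2
fi
```
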
R packages installed, available via Posit Workbench

| Package | Version |
| --- | --- |
| ArchR | 1.0.3a |
| Azimuth | 0.5.0 |
| Banksy | 1.4.0 |
| Biobase | 2.68.0 |
| BiocGenerics | 0.54.0 |
| BiocManager | 1.30.26 |
| Biostrings | 2.76.0 |
| BPCells | 0.3.1 |
| BSgenome.Hsapiens.UCSC.hg38 | 1.4.5 |
| BSgenome.Mmusculus.UCSC.mm10 | 1.4.3 |
| CellChat | 2.2.0 |
| celldex | 1.18.0 |
| ChIPseeker | 1.44.0 |
| circlize | 0.4.16 |
| cmdstanr | 0.9.0.9000 |
| ComplexHeatmap | 2.24.1 |
| cowplot | 1.2.0 |
| data.tree | 1.2.0 |
| DESeq2 | 1.48.2 |
| devtools | 2.4.5 |
| DoubletFinder | 2.0.6 |
| dplyr | 1.1.4 |
| dsb | 2.0.0 |
| edgeR | 4.6.3 |
| GenomicRanges | 1.60.0 |
| ggbreak | 0.1.6 |
| ggplot2 | 4.0.0 |
| ggsci | 3.2.0 |
| gt | 1.0.0 |
| harmony | 1.2.3 |
| Herper | 1.18.0 |
| INLA | 25.6.13 |
| IRanges | 2.42.0 |
| liana | 0.1.14 |
| limma | 3.64.3 |
| lsa | 0.73.3 |
| miloR | 2.4.1 |
| monocle3 | 1.4.26 |
| patchwork | 1.3.2 |
| pheatmap | 1.0.13 |
| presto | 1.0.0 |
| qlcMatrix | 0.9.9 |
| qs2 | 0.1.5 |
| RColorBrewer | 1.1.3 |
| S4Vectors | 0.46.0 |
| scales | 1.4.0 |
| scCustomize | 3.2.0 |
| sceasy | 0.0.7 |
| scico | 1.5.0 |
| scRepertoire | 2.4.0 |
| Seurat | 5.3.0 |
| SeuratDisk | 0.0.0.9021 |
| SeuratExtend | 1.2.5 |
| SeuratWrappers | 0.4.0 |
| Signac | 1.15.0 |
| SignatuR | 0.3.0 |
| SingleCellExperiment | 1.30.1 |
| SingleR | 2.10.0 |
| slingshot | 2.16.0 |
| SummarizedExperiment | 1.38.1 |
| tidyverse | 2.0.0 |
| tradeSeq | 1.22.0 |
| vegan | 2.8.0 |
| viridis | 0.6.5 |

Python packages installed, available via Posit Workbench

| Package | Python Version |
| --- | --- |
| scanpy | 3.12.4 |
| squidpy | 3.12.4 |
| muon | 3.12.4 |
| seaborn | 3.12.4 |
| plotnine | 3.12.4 |
| bbknn | 3.12.4 |
| h5py | 3.12.4 |
| scvelo | 3.12.4 |
| scrublet | 3.12.4 |
| cellphonedb | 3.12.4 |
| celltypist | 3.12.4 |
| leidenalg | 3.12.4 |
| scanorama | 3.12.4 |