Software we use
💻

We have a lot of software already installed on the server, covering QC analysis and preprocessing of raw sequence data, transcriptome analysis from RNA-seq data, 16S and shotgun metagenomics pipelines, WGS tools, and more. If you have an account on our cluster, you already have access to all of the software below, so get started!

If you’re looking for a piece of software and don’t find it below, just reach out to Dan Beiting to inquire about getting it installed.
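
Before asking, it can be worth confirming that the command is not already on your PATH. The snippet below is a minimal sketch, assuming a bash shell; `check_tools` is a hypothetical helper name, not something installed on the server. Tools that live in a conda environment (see the "how to run" column) will show as missing until that environment is activated.

```shell
#!/usr/bin/env bash
# check_tools is a hypothetical helper, not something installed on the server.
# It reports whether each named command is on PATH and returns 1 if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: three commands taken from the "how to run" column.
check_tools samtools bedtools kallisto || echo "At least one command was not found on PATH."
```

If a command is reported missing, check the table for a `conda activate` step before concluding that the software is not installed.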

NOTE: When available, I’ve included the appropriate publication/reference for each piece of software. Please cite the authors if you use their software! If a piece of software has not been published, you should cite the GitHub repo or software website.

List of software and uses

Software, Version, Website, Citation, Category, How to run, When to use, Reference files or databases
✅
anvi’o
7.1
metagenomics, visualization, MAGs
conda activate anvio-7.1
When you're ready to dive into Metagenome Assembled Genomes (MAGs).
✅
Amazon Web Services Command Line Interface (AWS CLI)
2.12.6
unpublished
utility
aws
When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/
1.5.3
unpublished
NGS tools
bs
✅
bcftools
1.18
NGS tools
conda activate bcftools then bcftools
✅
bcl2fastq
v2.19.0.316
unpublished
NGS tools
conda activate bcl2fastq
2.31.0
NGS tools
bedtools
Anytime you want to calculate genomic metrics from sequence data (e.g. coverage)
✅
BLAST
2.12.0
sequence search
blastn, blastp, or blastx
2.5.1
read alignment, Microbial ecology pipeline for 16S rRNA data
bowtie2
One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood.
prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species
0.7.17-r1188
read alignment
bwa
I don't use this directly, but it is used by other programs for alignment
✅
CellxGene gateway
0.3.11
unpublished
single cell
cellxgene-gateway
Allows us to host a cellxgene instance that works with multiple datasets
/data/reference_db/cellxgene_data
✅
CellRanger
7.1.0
single cell
cellranger
If you want to preprocess single cell genomic data from the 10x platform
✅
CellRanger-arc
2.0.2
single cell
cellranger-arc
If you want to preprocess single cell genomic data from the 10x platform
1.1.3
metagenomics, genome assembly, QA/QC
conda activate checkm then checkm lineage_wf
If you have a bacterial genome assembly and want to check the quality of the assembly
/data/reference_db/checkm
1.18.0
transcriptomics
clust
If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments.
3.5.2
visualization, NGS tools
deeptools
2.1.8
multiple sequence alignment
diamond
If you have a bunch of protein AA or translated DNA sequences that you want to align.
diamond formatted databases for UniRef90 and UniRef50 live in /data/reference_db/uniref
✅
Docker
24.0.2, build cb74dfc
unpublished
containerized software
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
✅
Dorado
0.3.1+bb8c5ee
unpublished
nanopore, basecalling, GPU
dorado
0.23.4
QA/QC
fastp
0.12.1
unpublished
QA/QC
fastqc
The preferred choice for rapid quality control assessment of raw reads in a fastq file
0.15.3
decontamination, QA/QC
fastq_screen
Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood.
✅
Filezilla
3.63.0
unpublished
utility
filezilla
0.2.1
unpublished
QA/QC, nanopore
filtlong
If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality
1.3.6
variants, SNPs/INDELs
freebayes
For variant calling
4.4.0.0
variants
gatk
When you are working with SNPs/variants
0.7.0
public data
grabseqs
A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe
✅
GTDB-TK
2.1.1
metagenomics, classification
conda activate gtdb (you may need to run export GTDBTK_DATA_PATH=/data/reference_db/GTDB-Tk/release214 to make sure your environment ‘sees’ the reference database)
/data/reference_db/GTDB-Tk
3.7
metagenomics, functional profiling
You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that humann2 uses DIAMOND, MinPath and Bowtie2 under the hood
✅
htop
3.2.2
unpublished
utility
htop
1.10
metagenomics
iRep or bPTR
2.3.1
NGS tools
jellyfish
For rapid/efficient counting of kmers in DNA
0.50.1
transcriptomics, read alignment
kallisto
Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome
prebuilt kallisto indexes for a few species in /data/reference_db/kallisto
0.27.3
single cell
kb
A great alternative to CellRanger for preprocessing single cell data from the 10x platform.
prebuilt kallisto indexes for a few species in /data/reference_db/kallisto
0.12.0
unpublished
decontamination
kneaddata
If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood
2.0.7-beta
metagenomics, classification
conda activate kraken then kraken2
/data/reference_db/kraken2db_standard/
Krakenuniq
0.5.8
metagenomics, classification
conda activate kraken then krakenuniq
3.0.0a6
Epigenetics
conda activate macs3 then macs3
Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome
✅
marker_alignments
0.4.2
metagenomics, classification
marker_alignments
If you want to find microbial eukaryotes in metagenomic data
the EukDetect database used by this program lives in /data/reference_db/eukdetect
✅
Mastiff
0.0.3
unpublished
metagenomics, public data
mastiff
✅
MaxBin2
2.2.7
assembly
run_MaxBin.pl
1.2.9
assembly, metagenomics
megahit
If you want to assemble genomes from metagenomic data
4.06
metagenomics, classification
conda activate biobakery and then metaphlan
/data/reference_db/biobakery
✅
Micro
2.0.11
unpublished
utility
micro
Anytime you need to edit a text file in the terminal; it’s far better than vim or nano!
0.3.4
NGS tools
mosdepth and plot-dist.py for plotting
1.14
unpublished
QA/QC
multiqc
Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report
23.04.2.5870
unpublished
workflow management
nextflow
If you want to set up an automated workflow on our server
✅
nf-core
2.10
workflow management
nf-core
✅
nvitop
1.1.2
unpublished
utility, GPU
pipx run nvitop
✅
nvtop
3.0.1
unpublished
utility, GPU
nvtop
3.0.0
unpublished
NGS tools
java -jar /usr/local/bin/picard.jar
One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow
1.07
comparative genomics
plink
Used for GWAS and other popgen analyses
0.2.4 (no longer maintained/supported)
unpublished
QA/QC, nanopore
porechop
When you have Nanopore reads and you want to trim off the adapter sequence
1.14.6
annotation
prokka
Great for quickly (and accurately) annotating a bacterial genome
2023.5.1
16S
source activate qiime2-2023.5 and then qiime
Anytime you want to figure out microbial community composition from 16S data
1.9.1
16S
source activate qiime1 and then qiime
✅
Rosella
0.4.2
unpublished
metagenomics, binning, MAGs
✅
rust
1.26.0
programming language
rustup or cargo or rustc
1.16.1
NGS tools
samtools
A powerful suite of tools for working with alignment files (bam, sam, etc.)
1.3-r106
working with fasta/fastq
seqtk
I use this anytime I want to quickly subsample a fastq file
2.3.0
working with fasta/fastq
seqkit
Anytime you need to manipulate a fasta or fastq file. Some overlap in functionality with seqtk
5.1d
unpublished
SNPs/INDELs
java -jar /usr/local/bin/snpEff/snpEff.jar for snpEff and java -jar /usr/local/bin/snpEff/SnpSift.jar for snpSift
3.15.4
assembly, metagenomics
spades.py [options] -o <output_dir>
If you have metagenomic sequencing data and want to assemble microbial genomes de novo
✅
SWGA2
1.0.0
metagenomics
conda activate swga2 then soapswga
When you want to design primers for carrying out selective whole genome amplification (SWGA)
4.8.2
metagenomics, classification
sourmash
Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data.
2.1.1
unpublished
single cell
spaceranger
When you have spatial gene expression from the Visium 10x platform
3.0.5
public data
fasterq-dump, sam-dump, and more
2.7.10b
read alignment
STAR (all caps) or STARlong (for aligning long reads)
Very fast and popular base-wise aligner
prebuilt STAR indexes for several species present in /data/reference_db/star
✅
StrainPhlAn
4.0.6
metagenomics, classification
conda activate biobakery and then strainphlan
4.1.0
metagenomics, classification
conda activate sunbeam4.1.0
0.39
QA/QC
java -jar /usr/local/bin/Trimmomatic/trimmomatic-0.39.jar
Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length
0.5.0
assembly
unicycler
If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly
✅
VCFtools
0.1.17
variants
vcftools
✅
velocyto
0.17
single cell
velocyto
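
The GTDB-TK row above mentions that you may need to export GTDBTK_DATA_PATH so the tool can find its reference database. A minimal sketch of that step for a bash session is shown here. `set_gtdbtk_path` and `DEFAULT_GTDBTK_DB` are hypothetical names introduced for illustration; only the path itself comes from the table. It is meant to be run after `conda activate gtdb`.

```shell
#!/usr/bin/env bash
# Path copied from the GTDB-TK row; adjust it if a different release is installed.
DEFAULT_GTDBTK_DB="/data/reference_db/GTDB-Tk/release214"

# set_gtdbtk_path is a hypothetical helper. It exports GTDBTK_DATA_PATH,
# keeping any value that is already set and otherwise using its first
# argument (or the default above when no argument is given).
set_gtdbtk_path() {
  local fallback="${1:-$DEFAULT_GTDBTK_DB}"
  export GTDBTK_DATA_PATH="${GTDBTK_DATA_PATH:-$fallback}"
}

set_gtdbtk_path
echo "GTDBTK_DATA_PATH=$GTDBTK_DATA_PATH"

# Warn early if the directory is not visible from this machine.
if [ ! -d "$GTDBTK_DATA_PATH" ]; then
  echo "warning: $GTDBTK_DATA_PATH does not exist here" >&2
fi
```

Keeping an already-exported value means the snippet will not override a path you set deliberately, for example to test a different database release.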