Software we use

A lot of software is already installed on the server, covering QC analysis and preprocessing of raw sequence data, transcriptome analysis from RNA-seq data, 16S and shotgun metagenomics pipelines, WGS tools, and more. If you have an account on our cluster, you already have access to all of the software below, so get started!

If you’re looking for a piece of software and don’t find it below, just reach out to Dan Beiting to inquire about getting it installed.
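
Before asking for a new install, it can help to confirm whether a command is already on your PATH. The following is a minimal sketch, not an official cluster utility: the function name `check_tools` is made up for illustration, and the command names passed to it are just examples taken from the list below.

```shell
#!/usr/bin/env bash
# Report which of the given command names can be found on PATH.
# check_tools is a hypothetical helper, not something installed on the server.
check_tools() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Example: a few command names from the list of software.
check_tools fastqc multiqc kallisto samtools
```

Tools that are run through `source activate ...` or `java -jar ...` will not show up this way, since they are not plain commands on PATH.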

List of software and uses

Software | Version | Website | Citation | Category | How to run | When to use | Reference files or databases
anvi’o
7.0
metagenomics, visualization
source activate anvio-7
When you're ready to dive into Metagenome Assembled Genomes (MAGs).
BamM
1.7.3
unpublished
NGS tools
bamm
Bandage
NA
genome assembly visualization
Bandage (you must be sitting at the Linux machine to use it, since it has a graphical interface)
BaseSpace Sequence Hub CLI tool suite
1.1.0
unpublished
NGS tools
bs
bcftools
NGS tools
bcftools
bedtools
2.26.0
NGS tools
bedtools
Anytime you want to calculate genomic metrics from sequence data (e.g. coverage)
BLAST
2.9.0
sequence search
options include blastn, blastp, or blastx
bowtie2
2.3.4.1
read alignment, Microbial ecology pipeline for 16S rRNA data
bowtie2
One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood.
prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species
BWA
0.7.17-r1188
read alignment
bwa
I don't use this directly, but it is used by other programs for alignment
Circos
visualization
circos
CheckM
1.1.3
metagenomics, genome assembly, QA/QC
checkm lineage_wf
If you have a bacterial genome assembly and want to check the quality of the assembly
uses checkm_db, which is located in /data/reference_db
Clust
1.8.10
transcriptomics
clust
If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments.
CONCOCT
1.1.0
metagenomics
concoct
When you have metagenomic data and want to put de novo assembled contigs into genome bins
Cytoscape
network analysis
navigate to the /home/shared/softwares/Cytoscape_v3.5.1 folder and double-click to open the program
deeptools
3.5.0
visualization, NGS tools
deeptools
DESMAN
metagenomics
desman
If you want to identify strains in your metagenomic data
DIAMOND
0.8.22.84
multiple sequence alignment
diamond
If you have a bunch of protein or translated DNA sequences that you want to align against a protein reference database.
EMIRGE
metagenomics, 16S
emirge.py or emirge_amplicon.py
Fastp
0.20.1
QA/QC
fastp
FastQC
0.11.7
unpublished
QA/QC
fastqc
The preferred choice for rapid quality control assessment of raw reads in a fastq file
FastQ Screen
0.14.0
decontamination
fastq_screen
Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood.
Filtlong
0.2.0
unpublished
QA/QC
filtlong
If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality
GATK
4.1.7.0
variants
gatk
When you are working with SNPs/variants
Grabseqs
0.7.0
public data
grabseqs
A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe
GraPhlAn
1.1.3
metagenomics, visualization
graphlan.py or graphlan_annotate.py
If you want to create a visual link between microbiome samples, their phylogenetic relationship and sample or patient metadata
GroopM
0.3.4
metagenomics
groopm
HISAT2
Humann3
3.0.0.alpha.2
metagenomics
humann2
You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that HUMAnN uses DIAMOND, MinPath and Bowtie2 under the hood.
HTSeq
0.13.5
transcriptomics
htseq-count or htseq-qa
If you’ve already aligned data using a base-aware aligner (e.g. STAR or Bowtie2) and you want to summarize reads to genomic features (genes, exons, etc.)
iRep
1.1.14
metagenomics
iRep or bPTR
Kallisto
0.46.0
transcriptomics, read alignment
kallisto
Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome
prebuilt kallisto indexes for a few species are in /data/reference_db/kallisto
KneadData
0.6.1
unpublished
decontamination
kneaddata
If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood
MinPath
biological pathway reconstructions using protein family predictions
MinPath1.4.py
Kraken2
2.0.6 beta
metagenomics
kraken2
Kraken2 reference database is in /data/reference_db/kraken2db_standard/
MACS2
2.2.7.1
Epigenetics
macs2
Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome
Mash
2.0
comparative genomics
mash
We don't use this as standalone software, but it is needed by Sourmash, which we use a lot
MEGAHIT
1.2.9
assembly, metagenomics
megahit
If you want to assemble genomes from metagenomic data
MetaPhlAn3
3.0
metagenomics
metaphlan
MosDepth
0.3.1
NGS tools
mosdepth and plot-dist.py for plotting
Mothur
1.44.1
16S
mothur
MultiQC
1.9
unpublished
QA/QC
multiqc
Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report
Nextflow
20.01.0.5264
unpublished
workflow management
nextflow
If you want to set up an automated workflow on our server
Picard tools
unpublished
NGS tools
java -jar /usr/local/bin/picard/build/libs/picard.jar
One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow
Porechop
0.2.4 (no longer maintained/supported)
unpublished
QA/QC
porechop
When you have Nanopore reads and you want to trim off the adapter sequence
Prokka
1.14.6
annotation
prokka
Great for quickly (and accurately) annotating a bacterial genome
QIIME2
16S
source activate qiime2
Anytime you want to figure out microbial community composition from 16S data
QUAST
QA/QC
quast.py
ROP
/usr/local/bin/rop/rop.sh
RSEM
1.3.0
read alignment, transcriptomics
rsem-prepare-reference, rsem-calculate-expression, rsem-tbam2gbam, rsem-bam2wig
samtools
1.7
NGS tools
samtools
A powerful suite of tools for working with alignment files (bam, sam, etc.)
seqtk
1.2-r101-dirty
working with fasta/fastq
seqtk
I use this anytime I want to quickly subsample a fastq file
Sourmash
4.0
metagenomics
sourmash
seqKit
0.12.0
working with fasta/fastq
seqkit
Anytime you need to manipulate a fasta/fastq file. Some overlap in functionality with seqtk
Sickle
1.33
unpublished
QA/QC
sickle se or sickle pe
Sourmash
3.3.0
comparative genomics
sourmash
Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data.
refseq and genbank microbial reference 'sketches' are in /data/reference_db/sourmash_refs
SPAdes
3.12.0
assembly, metagenomics
spades.py [options] -o <output_dir>
If you have metagenomic sequencing data and want to assemble microbial genomes de novo
SQUID
1.4
transcriptomics
squid
If you have some RNAseq data and want to find fusion and non-fusion transcript sequence variants
SRA toolkit
2.9.1
public data
fasterq-dump, sam-dump, and more
STAR
2.6.1c
read alignment
STAR (all caps)
Very fast and popular base-wise aligner
prebuilt STAR indexes for several species are in /data/reference_db/star
Sunbeam
metagenomics
source activate sunbeam
Trimmomatic
0.39
QA/QC
java -jar /usr/local/bin/Trimmomatic-0.39/trimmomatic-0.39.jar
Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length
Unicycler
0.4.8
assembly
unicycler
If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly
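
To show how a few of the "how to run" commands above might fit together, here is a hedged sketch of a small RNA-seq QC and quantification run with FastQC, Kallisto and MultiQC. It is an illustration under stated assumptions, not a documented lab workflow: the read file names, the index file name under /data/reference_db/kallisto, the output folder, and the `run` and `qc_and_quant` helpers are all hypothetical. By default it only prints the commands (dry run) so you can review them first.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical inputs; replace these with your own files.
READS_1="sample_R1.fastq.gz"
READS_2="sample_R2.fastq.gz"
INDEX="/data/reference_db/kallisto/your_species.idx"  # file name is an assumption
OUT="results"

qc_and_quant() {
    run mkdir -p "$OUT/fastqc" "$OUT/kallisto"
    # Quality control reports for the raw reads.
    run fastqc -o "$OUT/fastqc" "$READS_1" "$READS_2"
    # Map paired-end reads to a prebuilt reference transcriptome index.
    run kallisto quant -i "$INDEX" -o "$OUT/kallisto" "$READS_1" "$READS_2"
    # Summarize the reports found under the output folder into one html report.
    run multiqc "$OUT" -o "$OUT"
}

qc_and_quant
```

Running the script as-is prints the four commands; rerun it with `DRY_RUN=0` once the paths point at real files.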