Our server already has a lot of software installed, covering applications that range from QC analysis and preprocessing of raw sequence data to transcriptome analysis from RNAseq data, 16S and shotgun metagenomics pipelines, WGS tools, and more. If you have an account on our cluster, then you already have access to all of the software below, so get started!
If you're looking for a piece of software and don't find it below, just reach out to Dan Beiting to inquire about getting it installed.
NOTE: When available, I've included the appropriate publication/reference for each piece of software. Please cite the authors if you use their software! If a piece of software has not been published, you should cite the GitHub repo or software website.
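Before asking about a tool that seems to be missing, it can help to check whether its command is already on your `PATH`. The sketch below is only an illustration: the function name `check_tools` and the tool names in the example call are hypothetical, and it assumes each tool is exposed as a plain command (software that lives inside a conda environment or a container may not show up this way).

```shell
# Report whether each named command can be found on the current PATH.
# check_tools is a hypothetical helper, not something installed on the server.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: found\n' "$tool"
        else
            printf '%s: not found on PATH\n' "$tool"
            missing=1
        fi
    done
    # Non-zero exit status if any tool was missing.
    return "$missing"
}

# Example call; the tool names are placeholders, swap in the ones you need.
check_tools fastqc multiqc kallisto || echo "At least one tool was not found"
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard at the top of a script.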
Software | Version | Website | Citation | Category | How to run | When to use | Reference files or databases | |
---|---|---|---|---|---|---|---|---|
anvi'o | 7.1 | Community-led, integrated, reproducible multi-omics with anvi'o and Anvi'o: an advanced analysis and visualization platform for 'omics data | metagenomics, visualization, MAGs |
| When you're ready to dive into Metagenome Assembled Genomes (MAGs). | |||
Amazon Web Services Command Line Interface (AWS CLI) | 2.12.6 | unpublished | utility |
| When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/ | |||
1.5.3 | unpublished | NGS tools |
| |||||
bcftools | 1.18 | NGS tools | | |||||
2.20.0.422 | unpublished | NGS tools | bcl2fastq -R $rundirectory -o $outdirectory --sample-sheet $samplesheet.csv --no-lane-splitting | |||||
4.2.7 | unpublished | NGS tools | bcl-convert | |||||
2.31.0 | BEDTools: a flexible suite of utilities for comparing genomic features | NGS tools |
| Anytime you want to calculate genomic metrics from sequence data (e.g. coverage) | ||||
BLAST | 2.12.0 | sequence search |
| |||||
2.5.1 | Ultrafast and memory-efficient alignment of short DNA sequences to the human genome and Fast gapped-read alignment with Bowtie2 | read alignment |
| One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood. | prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species | |||
0.7.17-r1188 | Fast and accurate short read alignment with Burrows–Wheeler transform | read alignment |
| I don't use this directly, but it is used by other programs for alignment | ||||
CellxGene gateway | 0.3.11 | unpublished | single cell |
| Allows us to host a cellxgene instance that works with multiple datasets |
| ||
CellRanger | 7.1.0 | single cell |
| If you want to preprocess single cell genomic data from the 10x platform | ||||
CellRanger-arc | 2.0.2 | single cell |
| If you want to preprocess single cell genomic data from the 10x platform | ||||
1.1.3 | metagenomics, QA/QC |
| If you have a bacterial genome assembly and want to check the quality of the assembly |
| ||||
1.18.0 | Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data | transcriptomics |
| If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments. | ||||
3.5.2 | deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis | visualization, NGS tools |
| |||||
2.1.8 | multiple sequence alignment |
| If you have a bunch of protein AA or translated DNA sequences that you want to align. | diamond formatted databases for UniRef90 and UniRef50 live in | ||||
Docker | 24.0.2, build cb74dfc | unpublished | containerized software |
| ||||
Dorado | 0.3.1+bb8c5ee | unpublished | nanopore, basecalling, GPU |
| ||||
0.23.4 | QA/QC |
| ||||||
0.12.1 | unpublished | QA/QC |
| The preferred choice for rapid quality control assessment of raw reads in a fastq file | ||||
0.15.3 | FastQ Screen: A tool for multi-genome mapping and quality control | decontamination, QA/QC |
| Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood. | ||||
Filezilla | 3.63.0 | unpublished | utility |
| ||||
0.2.1 | unpublished | QA/QC, nanopore |
| If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality | ||||
1.3.6 | Haplotype-based variant detection from short-read sequencing | variants, SNPs/INDELs |
| For variant calling | ||||
4.4.0.0 | variants |
| When you are working with SNPs/variants | |||||
0.7.0 | public data |
| A convenient wrapper around the fasterq_dump software that makes it easy to grab sequences from SRA, ENA, MGRAST and iMicrobe | |||||
GTDB-TK | 2.1.1 | GTDB-Tk2: memory friendly classification with the genome taxonomy database and GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database | metagenomics, classification |
|
| |||
3.7 | Species-level functional profiling of metagenomes and metatranscriptomes | metagenomics, functional profiling | You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that humann2 uses DIAMOND, MinPath and Bowtie2 under the hood | |||||
htop | 3.2.2 | unpublished | utility |
| ||||
1.10 | Measurement of bacterial replication rates in microbial communities | metagenomics |
| |||||
2.3.1 | A fast, lock-free approach for efficient parallel counting of occurrences of k-mers | NGS tools |
| For rapid/efficient counting of kmers in DNA | ||||
0.50.1 | transcriptomics, read alignment |
| Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | ||||
0.27.3 | Near-optimal probabilistic RNA-seq quantification and Modular, efficient and constant-memory single-cell RNA-seq preprocessing | single cell |
| A great alternative to CellRanger for preprocessing single cell data from the 10x platform. | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | |||
0.12.0 | unpublished | decontamination |
| If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood | ||||
2.0.7-beta | Kraken: ultrafast metagenomic sequence classification using exact alignments and Improved metagenomic analysis with Kraken 2 | metagenomics, classification |
|
| ||||
Krakenuniq | 0.5.8 | KrakenUniq: confident and fast metagenomics classification using unique k-mer counts | metagenomics, classification |
| ||||
3.0.0a6 | Model-based Analysis of ChIP-Seq (MACS) and Improved peak-calling with MACS2 | Epigenetics |
| Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome | ||||
marker_alignments | 0.4.2 | Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes | metagenomics, classification |
| If you want to find microbial eukaryotes in metagenomic data | the EukDetect database used by this program lives in | ||
Mastiff | 0.0.3 | unpublished | metagenomics, public data |
| ||||
MaxBin2 | 2.2.7 | assembly |
| |||||
1.2.9 | An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph and A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices | assembly, metagenomics |
| If you want to assemble genomes from metagenomic data | ||||
4.06 | Metagenomic microbial community profiling using unique clade-specific marker genes and Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4 | metagenomics, classification |
|
| ||||
Micro | 2.0.11 | unpublished | utility |
| Anytime you need to edit a text file in the terminal... it's far better than vim or nano! | |||
0.3.4 | NGS tools |
| ||||||
1.14 | unpublished | QA/QC |
| Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report | ||||
23.04.2.5870 | unpublished | workflow management |
| If you want to set up an automated workflow on our server | ||||
nf-core | 2.10 | The nf-core framework for community-curated bioinformatics pipelines | workflow management |
| ||||
nvitop | 1.1.2 | unpublished | utility, GPU |
| ||||
nvtop | 3.0.1 | unpublished | utility, GPU |
| ||||
3.0.0 | unpublished | NGS tools |
| One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow | ||||
1.07 | Second-generation PLINK: rising to the challenge of larger and richer datasets | comparative genomics |
| Used for GWAS and other popgen analyses | ||||
0.2.4 (no longer maintained/supported) | unpublished | QA/QC, nanopore |
| When you have Nanopore reads and you want to trim off the adapter sequence | ||||
1.14.6 | annotation |
| Great for quickly (and accurately) annotating a bacterial genome | |||||
2023.5.1 | Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2 | 16S |
| Anytime you want to figure out microbial community composition from 16S data | ||||
1.9.1 | QIIME allows analysis of high-throughput community sequencing data | 16S |
| |||||
Rosella | 0.4.2 | unpublished | metagenomics, binning, MAGs | |||||
rust | 1.26.0 | programming language |
| |||||
1.16.1 | NGS tools |
| A powerful suite of tools for working with alignment files (bam, sam, etc) | |||||
1.3-r106 | working with fasta/fastq |
| I use this anytime I want to quickly subsample a fastq file | |||||
2.3.0 | SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation | working with fasta/fastq |
| Anytime you need to manipulate a fastq/a file. Some overlap in functionality with seqtk | ||||
5.1d | unpublished | SNPs/INDELs |
| |||||
3.15.4 | SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing | assembly, metagenomics |
| If you have metagenomic sequencing data and want to assemble microbial genomes de novo | ||||
SWGA2 | 1.0.0 | A fast machine-learning-guided primer design pipeline for selective whole genome amplification | metagenomics |
| When you want to design primers for carrying out selective whole genome amplification (SWGA) | |||
4.8.2 | metagenomics, classification |
| Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data. | |||||
2.1.1 | unpublished | single cell |
| When you have spatial gene expression from the Visium 10x platform | ||||
3.0.5 | public data |
| ||||||
2.7.10b | read alignment |
| Very fast and popular base-wise aligner | prebuilt STAR indexes for several species present in /data/reference_db/star | ||||
StrainPhlAn | 4.0.6 | Microbial strain-level population structure and genetic diversity from metagenomes | metagenomics, classification |
| ||||
4.1.0 | Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments | metagenomics, classification |
| |||||
0.39 | QA/QC |
| Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length | |||||
0.5.0 | Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads | assembly |
| If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly | ||||
VCFtools | 0.1.17 | variants |
| |||||
velocyto | 0.17 | single cell |
| |||||
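
Several rows above point to prebuilt indexes under `/data/reference_db` (bowtie2 folders named by genus and species, plus `kallisto` and `star` subfolders). A minimal sketch for seeing what is available might look like the following. The helper name `list_reference_dirs` is hypothetical, and the layout beneath each folder is an assumption, so treat the output as a starting point rather than a guarantee that a given index exists.

```shell
# List the names of the subdirectories directly under a reference directory.
# list_reference_dirs is a hypothetical helper; the default path comes from
# the table above, and nothing is assumed about what each folder contains.
list_reference_dirs() {
    local ref_root="${1:-/data/reference_db}"
    local d
    if [ ! -d "$ref_root" ]; then
        echo "No such directory: $ref_root" >&2
        return 1
    fi
    for d in "$ref_root"/*/; do
        # Skip the unexpanded glob pattern when there are no subdirectories.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

# Example call: show the top-level folders (one per line).
list_reference_dirs /data/reference_db || echo "Reference directory not available here"
```

Passing a different path, such as `/data/reference_db/kallisto` or `/data/reference_db/star`, lists the folders for that aligner instead.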