Software | Version | Website | Citation | Category | How to run | When to use | Reference files or databases | |
---|---|---|---|---|---|---|---|---|
anvi'o | 7.1 | metagenomics, visualization, MAGs | conda activate anvio-7.1 | When you're ready to dive into Metagenome-Assembled Genomes (MAGs). | ||||
Amazon Web Services Command Line Interface (AWS CLI) | 2.12.6 | unpublished | utility | aws | When you want to get reference genomes from the Illumina iGenomes project: https://ewels.github.io/AWS-iGenomes/ | |||
1.5.3 | unpublished | NGS tools | bs | |||||
bcftools | 1.17 | NGS tools | bcftools | |||||
bcl2fastq | v2.19.0.316 | unpublished | NGS tools | conda activate bcl2fastq | ||||
2.31.0 | NGS tools | bedtools | Anytime you want to calculate genomic metrics from sequence data (e.g. coverage) | |||||
BLAST | 2.12.0 | sequence search | blastn , blastp , or blastx | |||||
2.5.1 | read alignment, Microbial ecology pipeline for 16S rRNA data | bowtie2 | One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood. | prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species | ||||
0.7.17-r1188 | read alignment | bwa | I don't use this directly, but used by other programs for alignment | |||||
CellxGene gateway | 0.3.11 | unpublished | single cell | cellxgene-gateway | Allows us to host a cellxgene instance that works with multiple datasets | /data/reference_db/cellxgene_data | ||
CellRanger | 7.1.0 | single cell | cellranger | If you want to preprocess single cell genomic data from the 10x platform | ||||
CellRanger-arc | 2.0.2 | single cell | cellranger-arc | If you want to preprocess single cell genomic data from the 10x platform | ||||
1.1.3 | metagenomics, genome assembly, QA/QC | conda activate checkm then checkm lineage_wf | If you have a bacterial genome assembly and want to check the quality of the assembly | /data/reference_db/checkm | ||||
1.18.0 | transcriptomics | clust | If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments. | |||||
3.5.2 | visualization, NGS tools | deeptools | ||||||
2.1.8 | multiple sequence alignment | diamond | If you have a bunch of protein AA or translated DNA sequences that you want to align. | diamond formatted databases for UniRef90 and UniRef50 live in /data/reference_db/uniref | ||||
Docker | 24.0.2, build cb74dfc | unpublished | containerized software | docker run [OPTIONS] IMAGE [COMMAND] [ARG...] | ||||
Dorado | 0.3.1+bb8c5ee | unpublished | nanopore, basecalling, GPU | dorado | ||||
0.23.4 | QA/QC | fastp | ||||||
0.12.1 | unpublished | QA/QC | fastqc | The preferred choice for rapid quality control assessment of raw reads in a fastq file | ||||
0.15.3 | decontamination, QA/QC | fastq_screen | Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood. | |||||
Filezilla | 3.63.0 | unpublished | utility | filezilla | ||||
0.2.1 | unpublished | QA/QC, nanopore | filtlong | If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality | ||||
1.3.6 | variants, SNPs/INDELs | freebayes | For variant calling | |||||
4.4.0.0 | variants | gatk | When you are working with SNPs/variants | |||||
0.7.0 | public data | grabseqs | A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe | |||||
GTDB-Tk | 2.1.1 | metagenomics, classification | conda activate gtdb; you may need to run export GTDBTK_DATA_PATH=/data/reference_db/GTDB-Tk/release214 to make sure your environment 'sees' the reference database | /data/reference_db/GTDB-Tk | ||||
3.7 | metagenomics, functional profiling | When you have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that humann2 uses DIAMOND, MinPath and Bowtie2 under the hood. | ||||||
htop | 3.2.2 | unpublished | utility | htop | ||||
1.10 | metagenomics | iRep or bPTR | ||||||
2.3.1 | NGS tools | jellyfish | For rapid/efficient counting of kmers in DNA | |||||
0.48 | transcriptomics, read alignment | kallisto | Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | ||||
0.27.3 | single cell | kb | A great alternative to CellRanger for preprocessing single cell data from the 10x platform. | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto | ||||
0.12.0 | unpublished | decontamination | kneaddata | If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood | ||||
2.0.7-beta | metagenomics, classification | conda activate kraken then kraken2 | /data/reference_db/kraken2db_standard/ | |||||
Krakenuniq | 0.5.8 | metagenomics, classification | conda activate kraken then krakenuniq | |||||
3.0.0a6 | Epigenetics | conda activate macs3 then macs3 | Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome | |||||
marker_alignments | 0.4.2 | metagenomics, classification | marker_alignments | If you want to find microbial eukaryotes in metagenomic data | the EukDetect database used by this program lives in /data/reference_db/eukdetect | |||
Mastiff | 0.0.3 | unpublished | metagenomics, public data | mastiff | ||||
MaxBin2 | 2.2.7 | assembly | run_MaxBin.pl | |||||
1.2.9 | assembly, metagenomics | megahit | If you want to assemble genomes from metagenomic data | |||||
4.06 | metagenomics, classification | conda activate biobakery and then metaphlan | /data/reference_db/biobakery | |||||
Micro | 2.0.11 | unpublished | utility | micro | Anytime you need to edit a text file in the terminal. It's far better than vim or nano! | |||
0.3.4 | NGS tools | mosdepth and plot-dist.py for plotting | ||||||
1.14 | unpublished | QA/QC | multiqc | Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report | ||||
23.04.2.5870 | unpublished | workflow management | nextflow | If you want to set up an automated workflow on our server | ||||
nf-core | 2.9 | workflow management | nf-core | |||||
nvitop | 1.1.2 | unpublished | utility, GPU | pipx run nvitop | ||||
nvtop | 3.0.1 | unpublished | utility, GPU | nvtop | ||||
3.0.0 | unpublished | NGS tools | java -jar /usr/local/bin/picard.jar | One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow | ||||
1.07 | comparative genomics | plink | Used for GWAS and other popgen analyses | |||||
0.2.4 (no longer maintained/supported) | unpublished | QA/QC, nanopore | porechop | When you have Nanopore reads and you want to trim off the adapter sequence | ||||
1.14.6 | annotation | prokka | Great for quickly (and accurately) annotating a bacterial genome | |||||
2023.5.1 | 16S | source activate qiime2-2023.5 and then qiime | Anytime you want to figure out microbial community composition from 16S data | |||||
1.9.1 | 16S | source activate qiime1 and then qiime | ||||||
Rosella | 0.4.2 | unpublished | metagenomics, binning, MAGs | |||||
rust | 1.26.0 | programming language | rustup or cargo or rustc | |||||
1.16.1 | NGS tools | samtools | A powerful suite of tools for working with alignment files (BAM, SAM, etc.) | |||||
1.3-r106 | working with fasta/fastq | seqtk | I use this anytime I want to quickly subsample a fastq file | |||||
2.3.0 | working with fasta/fastq | seqkit | Anytime you need to manipulate a fasta/fastq file. Some overlap in functionality with seqtk | |||||
5.1d | unpublished | SNPs/INDELs | java -jar /usr/local/bin/snpEff/snpEff.jar for snpEff and java -jar /usr/local/bin/snpEff/SnpSift.jar for snpSift | |||||
3.15.4 | assembly, metagenomics | spades.py [options] -o <output_dir> | If you have metagenomic sequencing data and want to assemble microbial genomes de novo | |||||
SWGA2 | 1.0.0 | metagenomics | conda activate swga2 then soapswga | When you want to design primers for carrying out selective whole genome amplification (SWGA) | ||||
4.8.2 | metagenomics, classification | sourmash | Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of RefSeq or GenBank, to understand what organisms might be present in the data. | |||||
3.0.5 | public data | fasterq-dump , sam-dump , and more | ||||||
2.7.10b | read alignment | STAR (all caps) or STARlong (for aligning long reads) | Very fast and popular base-wise aligner | prebuilt STAR indexes for several species present in /data/reference_db/star | ||||
StrainPhlAn | 4.0.6 | metagenomics, classification | conda activate biobakery and then strainphlan | |||||
3.1.1 | metagenomics, classification | conda activate sunbeam3.1.1 | ||||||
0.39 | QA/QC | java -jar /usr/local/bin/Trimmomatic/trimmomatic-0.39.jar | Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length | |||||
0.5.0 | assembly | unicycler | If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly | |||||
VCFtools | 0.1.17 | variants | vcftools | |||||
velocyto | 0.17 | single cell | velocyto | |||||
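
As a rough illustration of the GTDB-Tk note in the table, the snippet below sets `GTDBTK_DATA_PATH` only if it is not already exported, then checks that the directory is actually there. The path is the one listed in the table; `check_ref_db` is a hypothetical helper name introduced for this sketch and is not part of GTDB-Tk or conda. It assumes you have already run `conda activate gtdb` in your own shell.

```shell
#!/usr/bin/env bash
# Sketch only: the path comes from the GTDB-Tk row of the table above.

# Keep any value that is already exported; otherwise fall back to the table's path.
export GTDBTK_DATA_PATH="${GTDBTK_DATA_PATH:-/data/reference_db/GTDB-Tk/release214}"

# check_ref_db DIR (hypothetical helper): report whether a reference
# directory exists and is readable. Returns non-zero if it is not.
check_ref_db() {
    if [ -d "$1" ] && [ -r "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Warn instead of aborting, so the snippet is safe to paste into a session.
check_ref_db "$GTDBTK_DATA_PATH" || echo "Check the path before running GTDB-Tk."
```

The same helper can be pointed at any of the other reference locations in the table, for example `check_ref_db /data/reference_db/kallisto`.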