Software | Version | Website | Citation | Category | How to run | When to use | Reference files or databases |
---|---|---|---|---|---|---|---|
anvi’o | 7.0 | | | metagenomics, visualization | source activate anvio-7 | When you're ready to dive into Metagenome Assembled Genomes (MAGs). | |
BamM | 1.7.3 | | unpublished | NGS tools | bamm | | |
Bandage | NA | | | genome assembly, visualization | Bandage (a graphical interface, so you must be sitting at the Linux machine to use it) | | |
BaseSpace Sequence Hub CLI tool suite | 1.1.0 | | unpublished | NGS tools | bs | | |
bcftools | | | | NGS tools | bcftools | | |
bedtools | 2.26.0 | | | NGS tools | bedtools | Anytime you want to calculate genomic metrics from sequence data (e.g. coverage) | |
BLAST | 2.9.0 | | | sequence search | options include blastn, blastp, or blastx | | |
bowtie2 | 2.3.4.1 | | | read alignment | bowtie2 | One of the best and most popular base-wise aligners. Even if you don't use it as your primary aligner, it is still used by many other software tools under the hood. | prebuilt bowtie2 indexes for many species are located in /data/reference_db in folders named by genus and species |
BWA | 0.7.17-r1188 | | | read alignment | bwa | I don't use this directly, but it is used by other programs for alignment | |
CellRanger | 7.0.0 | | | single cell | cellranger | If you want to preprocess single cell genomic data from the 10x platform | |
CellRanger-arc | 2.0.1 | | | single cell | cellranger-arc | If you want to preprocess single cell genomic data from the 10x platform | |
Circos | | | | visualization | circos | | |
CheckM | 1.1.3 | | | metagenomics, genome assembly, QA/QC | checkm lineage_wf | If you have a bacterial genome assembly and want to check the quality of the assembly | uses checkm_db, which is located in /data/reference_db |
Clust | 1.17.0 | | | transcriptomics | clust | If you have an RNAseq dataset with multiple timepoints and want to identify 'tight' modules of co-regulated genes across these timepoints. Clust also allows comparison of modules between datasets/experiments. | |
CONCOCT | 1.1.0 | | | metagenomics | concoct | When you have metagenomic data and want to put de novo assembled contigs into genome bins | |
Cytoscape | | | | network analysis | navigate to the /home/shared/softwares/Cytoscape_v3.5.1 folder, then double-click to open the program | | |
deeptools | 3.5.1 | | | visualization, NGS tools | deeptools | | |
DESMAN | | | | metagenomics | desman | If you want to identify strains in your metagenomic data | |
DIAMOND | 0.8.22.84 | | | sequence search | diamond | If you have a bunch of protein or translated DNA sequences that you want to align against a protein reference database. | |
EMIRGE | | | | metagenomics, 16S | emirge.py or emirge_amplicon.py | | |
Fastp | 0.20.1 | | | QA/QC | fastp | | |
FastQC | 0.11.7 | | unpublished | QA/QC | fastqc | The preferred choice for rapid quality control assessment of raw reads in a fastq file | |
FastQ Screen | 0.15.1 | | | decontamination | fastq_screen | Simple tool for figuring out if your fastq file has 'contaminating' reads from specific species. Uses bowtie2 under the hood. | |
Filtlong | 0.2.0 | | unpublished | QA/QC | filtlong | If you have Oxford Nanopore long read data and want to filter your raw data to remove reads based on length or quality | |
Freebayes | 1.3.6 | | | variants, SNPs/INDELs | freebayes | For variant calling | |
GATK | 4.1.7.0 | | | variants | gatk | When you are working with SNPs/variants | |
Grabseqs | 0.7.0 | | | public data | grabseqs | A convenient wrapper around the fasterq-dump software that makes it easy to grab sequences from SRA, ENA, MG-RAST and iMicrobe | |
GraPhlAn | 1.1.3 | | | metagenomics, visualization | graphlan.py or graphlan_annotate.py | If you want to create a visual link between microbiome samples, their phylogenetic relationship and sample or patient metadata | |
GroopM | 0.3.4 | | | metagenomics | groopm | | |
HISAT2 | | | | | | | |
HUMAnN3 | 3.0.0.alpha.2 | | | metagenomics | humann | You have shotgun metagenomic data from a microbial community and want to understand functional content (e.g. bacterial metabolic pathways). Note that HUMAnN uses DIAMOND, MinPath and Bowtie2 under the hood | |
HTSeq | 0.13.5 | | | transcriptomics | htseq-count or htseq-qa | If you’ve already aligned data using a base-wise aligner (e.g. STAR or Bowtie2) and you want to summarize reads to genomic features (genes, exons, etc.) | |
iRep | 1.1.14 | | | metagenomics | iRep or bPTR | | |
Kallisto | 0.46.0 | | | transcriptomics, read alignment | kallisto | Our preferred choice for mapping RNA-seq raw reads to a reference transcriptome | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto |
Kallisto-BUStools | 0.27.3 | | | single cell | conda activate kb then kb | A great alternative to CellRanger for preprocessing single cell data from the 10x platform. | prebuilt kallisto indexes for a few species in /data/reference_db/kallisto |
KneadData | 0.6.1 | | unpublished | decontamination | kneaddata | If you want to remove 'contaminating' reads from a fastq file. Uses bowtie2 under the hood | |
MinPath | | | | biological pathway reconstructions using protein family predictions | MinPath1.4.py | | |
Kraken2 | 2.0.6 beta | | | metagenomics | kraken2 | | Kraken2 reference database is in /data/reference_db/kraken2db_standard/ |
MACS3 | 3.0.0a6 | | | epigenetics | conda activate macs3 then macs3 | Anytime you have ATAC-seq or ChIP-seq data and want to identify 'peaks' or read pile-ups at specific positions in the genome | |
Mash | 2.0 | | | comparative genomics | mash | We don't use this as standalone software, but it is needed by Sourmash, which we use a lot | |
MEGAHIT | 1.2.9 | | | assembly, metagenomics | megahit | If you want to assemble genomes from metagenomic data | |
MetaPhlAn3 | 3.0 | | | metagenomics | metaphlan | | |
MosDepth | 0.3.1 | | | NGS tools | mosdepth and plot-dist.py for plotting | | |
Mothur | 1.44.1 | | | 16S | mothur | | |
MultiQC | 1.13 | | unpublished | QA/QC | multiqc | Our preferred choice for quickly and easily summarizing QC metrics, as well as outputs from MANY other programs, in a convenient html report | |
Nextflow | 20.01.0.5264 | | unpublished | workflow management | nextflow | If you want to set up an automated workflow on our server | |
Picard tools | | | unpublished | NGS tools | java -jar /usr/local/bin/picard.jar | One of the main places we use this is for filtering out PCR duplicates in our ATAC-seq workflow | |
Plink | 1.9 | | | comparative genomics | plink | Used for GWAS and other popgen analyses | |
Porechop | 0.2.4 (no longer maintained/supported) | | unpublished | QA/QC | porechop | When you have Nanopore reads and you want to trim off the adapter sequence | |
Prokka | 1.14.6 | | | annotation | conda activate prokka then prokka | Great for quickly (and accurately) annotating a bacterial genome | |
QIIME2 | | | | 16S | source activate qiime2 | Anytime you want to figure out microbial community composition from 16S data | |
QUAST | | | | QA/QC | quast.py | | |
ROP | | | | | /usr/local/bin/rop/rop.sh | | |
RSEM | 1.3.0 | | | read alignment, transcriptomics | rsem-prepare-reference, rsem-calculate-expression, rsem-tbam2gbam, rsem-bam2wig | | |
samtools | 1.7 | | | NGS tools | samtools | A powerful suite of tools for working with alignment files (bam, sam, etc.) | |
seqtk | 1.2-r101-dirty | | | working with fasta/fastq | seqtk | I use this anytime I want to quickly subsample a fastq file | |
Sourmash | 4.5 | | | metagenomics | conda activate sourmash then sourmash | | |
seqKit | 0.12.0 | | | working with fasta/fastq | seqkit | Anytime you need to manipulate a fasta/fastq file. Some overlap in functionality with seqtk | |
Sickle | 1.33 | | unpublished | QA/QC | sickle se or sickle pe | | |
snpEff | 5.1d | | unpublished | SNPs/INDELs | java -jar /usr/local/bin/snpEff/snpEff.jar for snpEff and java -jar /usr/local/bin/snpEff/SnpSift.jar for SnpSift | | |
Sourmash | 4.5.0 | | | comparative genomics | sourmash | Fantastic software that takes an alignment-free approach to compare two or more fastq files to each other, or to all of refseq or genbank to understand what organisms might be present in the data. | refseq and genbank microbial reference 'sketches' are in /data/reference_db/sourmash_refs |
SPAdes | 3.12.0 | | | assembly, metagenomics | spades.py [options] -o <output_dir> | If you have metagenomic sequencing data and want to assemble microbial genomes de novo | |
SQUID | 1.4 | | | transcriptomics | squid | If you have some RNAseq data and want to find fusion and non-fusion transcript sequence variants | |
SRA toolkit | 2.9.1 | | | public data | fasterq-dump, sam-dump, and more | | |
STAR | 2.6.1c | | | read alignment | STAR (all caps) | Very fast and popular base-wise aligner | prebuilt STAR indexes for several species present in /data/reference_db/star |
Sunbeam | | | | metagenomics | source activate sunbeam | | |
Trimmomatic | 0.39 | | | QA/QC | java -jar /usr/local/bin/Trimmomatic-0.39/trimmomatic-0.39.jar | Anytime you need to trim or filter raw reads from a fastq file based on base quality scores or length | |
Unicycler | 0.4.8 | | | assembly | unicycler | If you have short (Illumina) and long (Nanopore or PacBio) reads from a bacterial isolate and want to get a complete genome assembly | |
VCFtools | 0.1.16 | | | variants | vcftools | | |
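
Before starting an analysis, it can help to confirm that the commands listed in the table actually resolve in your current shell. The following is a minimal sketch, not an official part of the server setup: the `COMMANDS` mapping is a hand-picked, illustrative subset of the table, and the helper names `check_tools` and `format_report` are hypothetical. Tools that the table says need `conda activate` or `source activate` first will only be found after their environment has been activated.

```python
import shutil

# Illustrative subset of the table: software name -> command to look up.
# This selection is an assumption for the example; extend it as needed.
COMMANDS = {
    "bowtie2": "bowtie2",
    "BWA": "bwa",
    "FastQC": "fastqc",
    "MultiQC": "multiqc",
    "samtools": "samtools",
    "STAR": "STAR",  # all caps, as noted in the table
}


def check_tools(commands):
    """Return {software: resolved path or None} for each command name."""
    return {name: shutil.which(cmd) for name, cmd in commands.items()}


def format_report(status):
    """Render one line per tool, sorted case-insensitively by name."""
    lines = []
    for name in sorted(status, key=str.lower):
        path = status[name]
        lines.append(f"{name}: {'found at ' + path if path else 'not on PATH'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(check_tools(COMMANDS)))
```

Run it from the same shell session you plan to work in, since the result depends on that session's `PATH`.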