- Overview
- Before starting
- prepare your fastq files
- extract barcodes
- demultiplex libraries
- prepare metadata
Overview
Our lab – in collaboration with the VEuPathDB group at UPenn, UGA and UND – developed MicrobiomeDB, a web-based platform for sharing, integrating and carrying out sophisticated queries of microbiome experiments. By signing up for a free account on MicrobiomeDB, you can use tools on our site to analyze your microbiome data.
Before starting
- You’ll need access to your raw 16S rRNA gene sequence data, in the form of one forward.fastq.gz and one reverse.fastq.gz
- Sign-up for a Google account (alternatively, Microsoft Excel is fine)
- Get access to a compute cluster (or local workstation) with QIIME installed
- Sign up for an account on microbiomeDB.org (it’s free!)
prepare your fastq files
Activate qiime2 and unzip your fastq files
source activate qiime2
gunzip *fastq.gz
Depending on the exact format of your fastq files, you may need to remove the ‘+’ in the fastq files
perl -ane 'if (m/^\@/){s/\+//;} print;' forward.fastq > forward.fastq
extract barcodes
extract_barcodes.py \
-f forward.fastq \
-r reverse.fastq \ #only used for paired-end reads
-l 16 \ #this option specifies the barcode length. see documentation for guidance for your particular barcodes.
-o barcodes.fastq \
-c barcode_in_label
demultiplex libraries
If using paired-end reads, perform the following steps for both forward and reverse reads.
split_libraries_fastq.py \
-i forward.fastq \
-o forward/split_libraries \
-m metadata_file.txt \
-b barcodes.fastq \
--barcode_type 16 \
--store_demultiplexed_fastq #Without this flag the output is only a demultiplexed fasta file
Rename your files so the runID/name and the forward or reverse info is indicated in the name (in this example, the run name was ‘miseq12’ and these were forward reads)
mv seqs.fastq miseq12_forward_seqs.fastq
mv seqs.fna miseq12_forward.fna
mv histograms.txt miseq12_forward_histograms.txt
mv split_library_log.txt miseq12_forward_split_library_log.txt
split_sequence_file_on_sample_ids.py \
-i seqs.fastq \
-o forward/split_libraries/forward_fastqs \
--file_type fastq
prepare metadata
MicrobiomeDB allows users to analyze and visualize microbiome data based on any user-provided metadata. This metadata needs to be provided as a google sheet. Each column of your spreadsheet should represent a metadata variable (e.g., Age, Sex, Treatment, etc), and each row should be identified with unique sample IDs that matches those used in naming the fastq files above. If you need help naming your column headers using appropriate terminology, you may find it useful to view our terminology list on microbiomeDB.