Hybrid assembly of bacterial genomes

Before starting

This protocol assumes that you have both Illumina short-read sequencing data and long reads from an Oxford Nanopore MinION or GridION sequencer. Place the raw reads in folders called short_reads/raw and long_reads/raw, respectively, and run the commands below from the folder that contains short_reads and long_reads.
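The commands in this protocol read from and write to several subfolders of short_reads and long_reads. Here is a minimal sketch that creates that layout in one go. It assumes you work from the project root. The function name setup_dirs is hypothetical, and the subfolder names are the ones the commands below use.

```shell
# Create every folder the protocol reads from or writes to, under a given root.
setup_dirs() {
    root="${1:-.}"
    mkdir -p "$root/short_reads/raw" "$root/short_reads/trimmed" \
             "$root/long_reads/raw" "$root/long_reads/trimmed" \
             "$root/long_reads/filtered/big_file"
}
# Usage (from the project root): setup_dirs .
# Then copy the Illumina reads into short_reads/raw and the Nanopore reads into long_reads/raw.
```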

Trim short reads using Trimmomatic

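mkdir -p short_reads/trimmed
# LEADING/TRAILING:3 clip low-quality ends, SLIDINGWINDOW:4:15 cuts once the mean quality of a 4-base window drops below 15, MINLEN:36 drops reads shorter than 36 bp
java -jar /usr/local/bin/Trimmomatic-0.39/trimmomatic-0.39.jar SE -threads 24 -phred33 ./short_reads/raw/hiranonis_40-2.fastq ./short_reads/trimmed/hiranonis_40-2_trimmed.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36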
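You can see how many reads Trimmomatic discarded by comparing read counts before and after trimming. This minimal sketch assumes standard four-line FASTQ records. count_reads is a hypothetical helper and is not part of Trimmomatic.

```shell
# Count the records in a FASTQ file, plain or gzip-compressed
# (assumes four lines per record).
count_reads() {
    fq="$1"
    case "$fq" in
        *.gz) gzip -dc -- "$fq" ;;
        *)    cat -- "$fq" ;;
    esac | awk 'END { print NR / 4 }'
}
# Usage:
#   count_reads short_reads/raw/hiranonis_40-2.fastq
#   count_reads short_reads/trimmed/hiranonis_40-2_trimmed.fastq.gz
```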

Trim long reads using Porechop

# move into the directory with the Oxford Nanopore reads
cd long_reads
mkdir -p trimmed
# trim adapters with Porechop; output files are numbered lr_0, lr_1, ... and gzip-compressed
i=0
for seq in raw/*.fastq; do
        porechop -i "$seq" -o "trimmed/lr_${i}_trimmed.fastq.gz"
        i=$((i + 1))
done
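Before filtering, check that every Porechop output file is intact, for example after a run that was interrupted. This sketch uses gzip -t. check_gz_dir is a hypothetical helper name.

```shell
# Test every .gz file in a directory; print the corrupt ones and return non-zero if any fail.
check_gz_dir() {
    dir="$1"
    bad=0
    for f in "$dir"/*.gz; do
        # an unmatched glob stays literal, so this also catches an empty directory
        [ -e "$f" ] || { echo "no .gz files in $dir" >&2; return 1; }
        if ! gzip -t -- "$f" 2>/dev/null; then
            echo "corrupt: $f" >&2
            bad=1
        fi
    done
    return "$bad"
}
# Usage (still inside long_reads): check_gz_dir trimmed
```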

Filter long reads using Filtlong

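mkdir -p filtered
# keep reads of at least 1 kbp, discard the worst 1% and stop once ~500 Mbp of the best reads are kept
for seq in trimmed/*.gz; do
        # take the file index from the name, e.g. trimmed/lr_0_trimmed.fastq.gz -> 0
        name=$(basename "$seq" | cut -d'_' -f 2)
        filtlong --min_length 1000 --keep_percent 99 --target_bases 500000000 "$seq" | gzip > "filtered/lr_${name}_filtered.fastq.gz"
done
cd ..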
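`--target_bases 500000000` keeps the best reads up to roughly 500 Mbp in total, so the resulting depth depends on your genome size. For an assumed 5 Mbp genome, that is about 100x. The sketch below totals the bases in a FASTQ file and converts them to an approximate depth. fastq_bases and approx_depth are hypothetical helpers, and you supply the genome size yourself.

```shell
# Sum the read lengths (the sequence line of each four-line record) in a FASTQ file.
fastq_bases() {
    fq="$1"
    case "$fq" in
        *.gz) gzip -dc -- "$fq" ;;
        *)    cat -- "$fq" ;;
    esac | awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }'
}

# Approximate depth = total bases / genome size in bp (genome size is your own estimate).
approx_depth() {
    awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", b / g }'
}
# Usage (from the project root):
#   approx_depth "$(fastq_bases long_reads/filtered/lr_0_filtered.fastq.gz)" 5000000
```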

Carry out hybrid assembly using Unicycler

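# -s: unpaired (single-end) short reads, -l: long reads; the final assembly is written to output_dir/assembly.fasta
unicycler -s ./short_reads/trimmed/yourShortReads.fastq.gz -l ./long_reads/filtered/big_file/yourLongReads.fastq.gz -o output_dir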
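Unicycler's -l option takes a single long-read file, but the Filtlong loop writes one file per input. That is presumably why the command points at a big_file folder. This minimal sketch combines the filtered files, assuming the names the loop produces. merge_long_reads is a hypothetical helper. Concatenated gzip streams form a valid gzip file, so you don't need to decompress the reads first.

```shell
# Concatenate gzip-compressed FASTQ files into one output file, creating its folder if needed.
merge_long_reads() {
    out="$1"
    shift
    [ "$#" -gt 0 ] || { echo "usage: merge_long_reads OUT.fastq.gz IN.fastq.gz..." >&2; return 1; }
    mkdir -p -- "$(dirname -- "$out")"
    cat -- "$@" > "$out"
}
# Usage (from the project root):
#   merge_long_reads long_reads/filtered/big_file/yourLongReads.fastq.gz long_reads/filtered/lr_*_filtered.fastq.gz
```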

Check the quality of your assembly using CheckM

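# CheckM only analyses files with the extension given by -x (default: fna); Unicycler writes output_dir/assembly.fasta
checkm lineage_wf -t 24 -x fasta output_dir/ checkm_output/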
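CheckM estimates completeness and contamination from marker genes. A basic contiguity summary is a useful complement. This minimal sketch assumes the Unicycler assembly is at output_dir/assembly.fasta. assembly_stats is a hypothetical helper that reports the number of contigs, the total length and the N50.

```shell
# Print contig count, total length and N50 for a (possibly multi-line) FASTA file.
assembly_stats() {
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }
        { len += length($0) }
        END { if (seen) print len }
    ' "$1" | sort -rn | awk '
        { l[NR] = $1; total += $1 }
        END {
            # N50: length of the contig at which the running total reaches half the assembly
            half = total / 2; run = 0; n50 = 0
            for (i = 1; i <= NR; i++) { run += l[i]; if (run >= half) { n50 = l[i]; break } }
            printf "contigs=%d total_bp=%d N50=%d\n", NR, total, n50
        }'
}
# Usage: assembly_stats output_dir/assembly.fasta
```

For a complete bacterial genome, you would ideally see one contig per replicon (chromosome and any plasmids).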