Transcriptomics in the cloud

This two-day workshop is hosted by Dan Beiting, and is based on DIYtranscriptomics, the online course he runs each year. The course website will serve as the source for lecture videos, code, data and other resources for this workshop. This is a fast-paced workshop best suited for individuals with experience working in R and using the RStudio IDE. In addition, experience using the Tidyverse suite of R packages is recommended. The workshop page below is meant to serve as a high-level overview as we move through the material.

Workshop schedule

Day 1

Time | Description | Topics covered | Video lectures | Comments
9:00 - 9:30 | Grab some coffee and get settled in | | |
9:30 - 11:15 | Module 3 | Read mapping with Kallisto | 2 videos | You will NOT have time to do the read mapping yourself. However, these two videos explain how read alignment and 'pseudoalignment' work, and how you could map the workshop dataset on your laptop on your own time. For the purposes of this workshop, you'll start your analysis from the Kallisto outputs, which are already in your RStudio Cloud project.
11:15 - 11:30 | Coffee/stretch break | | |
11:30 - 12:30 | Module 4 | Measuring gene expression | Part 1 video only | This lecture covers concepts that are fundamental to understanding counts, RPKM/FPKM, TPM and normalization.
12:30 - 1:30 | Break for lunch | | |
1:30 - 2:15 | Discussion + coding start | Importing Kallisto counts directly into R/Bioconductor | | This is really the first place where you'll be working in the RStudio Cloud project. I'll join you by Zoom to guide you through launching the project and running the code in the Step 1 script to import all the data into your R environment.
2:15 - 3:45 | Module 6 | Filtering and normalization | 2 videos | Start and finish the code in the Step 2 script.
3:45 - 4:00 | Coffee/stretch break | | |
4:00 - 5:00 | Module 7 | Principal Component Analysis (PCA) | Part 1 video only | Start and finish the code in the Step 3 script.
5:00 - 6:00 | Discussion and Day 1 wrap-up | We'll pick up with the PCA result and work together to plot, explore and discuss it | | A chance to review key concepts and to have a Q&A discussion.
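
To make the Day 1 concepts concrete, here is a minimal base R sketch of the ideas behind Modules 4, 6 and 7: scaling counts to CPM and TPM, filtering lowly expressed genes, and running a PCA on the samples. It is not the code from the Step 1-3 scripts. The count matrix, gene lengths, sample names and filtering thresholds are all invented for illustration, and a real analysis would typically rely on dedicated Bioconductor packages rather than these hand-rolled calculations.

```r
# Toy data: 6 genes x 4 samples of raw counts (all values invented for illustration).
counts <- matrix(
  c(100, 120,  30,  25,
    500, 480, 510, 495,
      0,   0,   0,   1,
     50,  60, 200, 220,
     10,  12,  11,   9,
    300, 310,  90, 100),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("ctl1", "ctl2", "trt1", "trt2"))
)
gene_length_kb <- c(1.0, 2.5, 0.8, 1.2, 3.0, 2.0)  # hypothetical gene lengths (kb)

# Counts per million (CPM): scale each sample (column) by its library size.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Transcripts per million (TPM): divide by gene length first,
# then scale each sample so that its values sum to one million.
rate <- counts / gene_length_kb   # vector recycles down rows, i.e. per gene
tpm  <- sweep(rate, 2, colSums(rate), "/") * 1e6

# Filtering: keep genes with CPM > 1 in at least 2 samples (arbitrary thresholds).
keep    <- rowSums(cpm > 1) >= 2
log_cpm <- log2(cpm[keep, , drop = FALSE] + 1)

# PCA on samples: prcomp() expects observations in rows, so transpose.
pca     <- prcomp(t(log_cpm), scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

print(colSums(tpm))  # every sample sums to 1e6 by construction
print(pct_var)       # percent of variance captured by each principal component
```

Note that TPM always sums to one million within a sample, which is what makes it comparable across samples in a way that RPKM/FPKM is not. With library sizes as tiny as in this toy matrix, a single read already exceeds 1 CPM, so the filter here only removes gene3.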

Day 2

Time | Description | Topics covered | Video lectures | Comments
9:00 - 9:30 | Grab some coffee and get settled in | | |
9:30 - 11:15 | Module 9 | Differential gene expression | Parts 1, 2 and 3 only | Start and finish the code in the Step 5 script.
11:15 - 11:30 | Coffee/stretch break | | |
11:30 - 12:30 | Module 10 | Module identification | Parts 1 and 2 only | Start and finish the code in the Step 6 script.
12:30 - 1:30 | Break for lunch | | |
1:30 - 2:15 | Discussion and code review | Review the Step 5 and Step 6 scripts together; discuss DEG analysis, including what to do if you don't have any DEGs, or have too many; Q&A | |
2:15 - 3:15 | Module 11 | Functional enrichment analysis | Parts 1 and 2 | Begin the Step 7 script.
3:15 - 3:30 | Coffee/stretch break | | |
3:30 - 4:30 | Module 11 | Functional enrichment analysis | Parts 3 and 4 | Finish the Step 7 script.
4:30 - 6:00 | Discussion and wrap-up | Q&A, demo Rmarkdown report, highlight reproducibility topics (e.g. Code Ocean) | |
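
The Day 2 topics can be illustrated with a small base R sketch: calling differentially expressed genes (DEGs) with multiple-testing correction, then testing a gene set for over-representation among them. This is a deliberately simplified stand-in, not the method used in the Step 5 or Step 7 scripts. It swaps in a per-gene Welch t-test and a hypergeometric test in place of the dedicated packages (for example limma or DESeq2) that a real analysis would normally use. The expression matrix, the spiked-in genes, the cutoffs and the gene set are all simulated or hypothetical.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated log2 expression: 200 genes x 6 samples (3 control, 3 treated).
n_genes <- 200
expr <- matrix(
  rnorm(n_genes * 6, mean = 8, sd = 0.5), nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  c("ctl1", "ctl2", "ctl3", "trt1", "trt2", "trt3"))
)
group  <- factor(rep(c("ctl", "trt"), each = 3))
is_trt <- group == "trt"
expr[1:20, is_trt] <- expr[1:20, is_trt] + 3  # spike in 20 truly changed genes

# Per-gene Welch t-test (treated vs control) and log2 fold change.
p_val  <- apply(expr, 1, function(x) t.test(x[is_trt], x[!is_trt])$p.value)
log_fc <- rowMeans(expr[, is_trt]) - rowMeans(expr[, !is_trt])

# Benjamini-Hochberg correction controls the false discovery rate across genes.
fdr  <- p.adjust(p_val, method = "BH")
degs <- rownames(expr)[fdr < 0.05 & abs(log_fc) > 1]  # arbitrary cutoffs

# Over-representation test for a hypothetical gene set:
# P(overlap >= observed) under the hypergeometric distribution.
gene_set <- paste0("gene", 1:30)
overlap  <- length(intersect(degs, gene_set))
p_enrich <- phyper(overlap - 1,
                   m = length(gene_set),            # genes in the set
                   n = n_genes - length(gene_set),  # genes outside the set
                   k = length(degs),                # number of DEGs drawn
                   lower.tail = FALSE)

cat("DEGs:", length(degs), " overlap:", overlap, " enrichment p:", p_enrich, "\n")
```

Loosening or tightening the `fdr` and `log_fc` cutoffs is one simple way to explore the "no DEGs, or too many" question from the discussion session.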

Getting started

In the interest of time, we will not install R, RStudio, Bioconductor, Kallisto, or any other software on your computers. Instead, I have set up an RStudio Cloud space that is pre-stocked with all the software, R packages, data and code that you'll need for the workshop. You should have received an email with a link that allows you to join this space. It includes all the materials for the workshop, as well as for the larger DIYtranscriptomics course, should you decide to pursue other content after the workshop ends. You will need the following to participate in the workshop:

  • A computer
  • An internet connection
  • Access to the RStudio Cloud space for the workshop
I'll be using Google Chrome as my browser. I'm not sure how the RStudio Cloud will behave on other browsers, so I recommend you use Chrome if possible.

Helpful tips

  • 'Work while you watch' - Try to work in your RStudio Cloud project alongside the videos. When I'm working on a 'step' script, you should open the same script and begin working along with me.
  • If you get stuck on something, don't let it derail you. Watch the video and try to understand the concepts and main ideas. You can always go back and work on the scripts later.
  • Write down any questions as they come to mind. Save these for our discussion sessions each day.

Working with raw fastq files

We won't have time to do read alignments during the workshop, but newer 'pseudoalignment' tools such as Kallisto and Salmon have made read mapping possible even with the modest resources of most modern laptops. If you're interested in installing Kallisto on your own computer (not in our RStudio Cloud project) and trying read alignments, you can see our SOP for installing Kallisto here. Again, this is not part of the workshop and should only be tried on your own time. Similarly, we'll discuss the software I like to use for quality checking raw reads, and you can read more about our SOP for QC here, but we won't actually install or run these programs during the workshop.
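
If you do try read mapping on your own time, the workflow has two steps: build an index from a reference transcriptome once, then quantify each sample against that index. The sketch below only assembles and prints the two Kallisto commands from R; it does not run them. All file and folder names are hypothetical placeholders, and the thread count is an arbitrary choice.

```r
# Hypothetical file names; replace with your own reference transcriptome and reads.
transcriptome <- "transcripts.fa.gz"
index         <- "transcripts.idx"
reads         <- c("sample1_R1.fastq.gz", "sample1_R2.fastq.gz")  # paired-end reads
out_dir       <- "sample1_out"
threads       <- 4

# Step 1: build the index (done once per reference transcriptome).
index_cmd <- paste("kallisto index -i", index, transcriptome)

# Step 2: pseudoalign and quantify one sample against the index.
quant_cmd <- paste("kallisto quant -i", index, "-o", out_dir, "-t", threads,
                   paste(reads, collapse = " "))

# Print the commands so they can be inspected or pasted into a terminal.
cat(index_cmd, quant_cmd, sep = "\n")

# To run them from R, Kallisto must be installed and on your PATH, e.g.:
# system(index_cmd); system(quant_cmd)
```

For each sample, Kallisto writes its results, including an `abundance.tsv` table of estimated counts and TPM per transcript, into the output folder.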

Practice after the workshop

Practice makes perfect (or at least better!). If you enjoyed the workshop, there's a lot more to explore on the DIYtranscriptomics website. Here are a few suggestions:

  • Module 3 - try installing FastQC and MultiQC for quality checking reads. Install Kallisto and align raw data
  • Module 7 - explore plotting of PCA results and making interactive graphics (videos 2 and 3)
  • Module 8 - learn how to access large amounts of publicly available RNAseq data that have already been mapped to human or mouse using Kallisto
  • Module 9 - explore differential transcript usage (DTU) analysis (video 4)
  • Module 10 - install clust and explore its usage for making 'tight' clusters from timecourse data
  • Module 12 - create an Rmarkdown document from your analysis
  • Module 13 - learn to use custom functions and R packages to document and share your analyses
  • 'Hackdashes' - the course website will include three hackdashes, each a 2-hour coding event that challenges you to apply your skills to a different RNAseq dataset.