Introduction to Differential Gene Expression Analysis using RNA-seq:

Note to WCMC students and employees: to receive email announcements about workshop registration, students should subscribe to the "Community" broadcast list by following the instructions found here. WCMC employees can subscribe here. Memorial Sloan Kettering and Rockefeller University students and employees should receive the email announcements automatically.

Registration has closed. All spots have been filled and the waiting list is full. If you registered for this course, you will receive an email notifying you of your enrollment status.

It is expected that this course will be offered again.

Dates: [ one class in four parts ]

Part 1: Wednesday, November 6th, 2019 - 1:00pm to 4:00pm
Part 2: Thursday, November 7th, 2019 - 2:00pm to 5:00pm
Part 3: Wednesday, November 13th, 2019 - 1:00pm to 4:00pm
Part 4: Friday, November 15th, 2019 - 10:00am to 1:00pm

RNA-seq is a commonly used method of interrogating the transcriptome, enabling both measurement of gene expression levels and isoform quantification. There is a broad constellation of bioinformatics tools available for the analysis of the large datasets that result from an RNA-seq experiment. This workshop will review the appropriate selection and correct usage of these tools, and how these vary with the specific questions being investigated.

Instructor: Friederike Dündar (ABC, WCM)

Goals and Objectives:

This workshop will present current methods for mapping the reads generated by an RNA-seq experiment to a reference genome, assigning reads to genomic features, and using the expression levels of those features to identify genes that are differentially expressed between conditions. At each step, we will investigate some of the methods, their principles, and their biases. We will also look at techniques for quantifying our confidence in the results, and at some of the pitfalls to be aware of.

At the end of this workshop, participants will have analyzed a realistic dataset, from data retrieval through differential gene expression. They will have an appreciation of the available data sources and tools, be acutely aware of biases, and be able to use these insights to critically interpret published results.


The workflows taught in this workshop will be executed at the UNIX command line, or using R/RStudio. This is not a UNIX or R course, and participants must have a working knowledge of UNIX and R/RStudio.

Specifically, participants should be comfortable with basic operations in a UNIX/Linux environment, including (1) moving around the directory structure and viewing and manipulating files (mkdir, ls, head, cat, cut); (2) running programs, redirecting standard input and output, and using the pipe (|) operator in the command shell; and (3) using the grep and sed commands and crafting basic regular expressions.
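
As a rough self-check of the UNIX skills listed above, the sketch below exercises each of them on a tiny tab-separated table. It is only an illustration, not course material: the file names (counts.txt, genes.txt), the gene names, and the count values are all made up for this example.

```shell
# Create a scratch directory and move into it (mkdir, cd).
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p demo
cd demo

# Write a small tab-separated table; the gene names and counts are invented.
printf 'gene\tcount\n' > counts.txt
printf 'geneA\t10\ngeneB\t0\ngeneC\t25\n' >> counts.txt

# View the directory contents and the first two lines of the file (ls, head).
ls
head -n 2 counts.txt

# Pipe and redirect: drop the header line with sed, keep the first column
# with cut, and write the result to a new file.
sed '1d' counts.txt | cut -f 1 > genes.txt
cat genes.txt

# grep with a basic regular expression: count the rows whose second column
# is a positive integer (the header and the zero count do not match).
expressed="$(cut -f 2 counts.txt | grep -c '^[1-9][0-9]*$')"

# sed substitution anchored at the start of the line, then take the first result.
renamed="$(sed 's/^gene/GENE_/' genes.txt | head -n 1)"

echo "expressed genes: $expressed"
echo "first renamed gene: $renamed"
```

If every command here and the reason for each option are clear without looking anything up, the UNIX prerequisites are most likely covered.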

Required R skills include familiarity with all data types and the ability to subset lists, matrices, and data frames, to use factor levels, and to generate plots with base R graphics.

Participants are asked to bring a MacBook* with the following tools installed: Terminal, R (version 3.3 or higher), and RStudio (minimum version 1.0.136). *If you have a 2016 or newer MacBook, you must bring a USB-C Ethernet adapter.


Sample alignment files

Sample read count table

STAR manual

QoRTs results

FastQC/multiQC results

genes for GO term analysis (mouse)

Course Notes