Finding the Missing Heritability: RU Web Tools for Wide-Locus GWAS

Note to WCMC students and employees: to receive email announcements about workshop registration, students should subscribe to the "Community" broadcast list by following the instructions found here. WCMC employees can subscribe here. Memorial Sloan Kettering Cancer Center and Rockefeller University students and employees should receive the email announcements automatically.

Date: Thursday, April 3, 2014

Time: 10:00am-1:00pm, 2:00-5:00pm (coffee served)

Organizers: Knut M. Wittkowski, Benedetta Bigio (RU)

Location: The Rockefeller University, 1230 York Ave @ 66th Street – Weiss 301

Registration: Please contact Kristen Cullen via e-mail at

Summary: This tutorial introduces a novel non-parametric approach to GWAS (chip or NGS) that can identify functionally related clusters of genes in samples of only a few hundred subjects. Each session will consist of (a) highlights of the approach, (b) demonstrations using Web-based computational biostatistics tools accessible from, and (c) ample opportunity for discussion.

Rationale: Many single-SNP GWAS (ssGWAS) have been marred by both low sensitivity and low specificity. Wide-locus GWAS is known to have the potential for higher power than ssGWAS in common diseases, but many available methods assume independence and additivity / multiplicativity of risk factors. As a consequence, meaningful non-linear relationships may be overlooked (false negatives), while random errors, which are not subject to biological constraints, may occasionally fulfill any assumption, so that many ‘significant’ results are false positives. μGWAS (GWAS based on u-statistics for genetically structured multivariate data) increases power by

  1. comprehensively analyzing information from several neighboring SNPs, drawing on the position of SNPs within an LD block without causing biases through unrealistic assumptions (independence/additivity), and
  2. replacing the fixed genome-wide significance level (10^−7.5) with a study-specific one, which accounts for (a) GWAS not being randomized and (b) the distribution of p-values depending on the minor allele frequency (MAF).

Objective: This tutorial presents non-parametric methods [1] based on u-statistics [2] and bioinformatics tools (R/S scripts, MS Excel spreadsheets, a Web server, all accessible from

Audience: This tutorial aims to introduce the novel approach and tools to a broad audience involved with or interested in GWAS or NGS. Little prior knowledge of statistics is required.

Course Outline: As one of the first applications of u-statistics [2], Gehan extended the Wilcoxon/Mann-Whitney test to censored data [3]. We will discuss generalizations of this approach to multivariate data, including gene expression profiles and complex phenotypes [4]. The demonstrations will include Excel spreadsheets. We will build upon the experience with u-statistics in FBAT of multiple sclerosis [5] and extend u-statistics for multivariate data to partial orderings for genetic information (diplotypes, epistasis). Examples will include GWAS results suggesting Ras/Ca2+ signaling in two neurodevelopmental diseases (childhood absence epilepsy [6] and autism [7]), as well as unpublished results in psoriasis suggesting involvement of the HLA region and interleukins. We will demonstrate access to a GPU-enabled grid/cloud infrastructure via the Web server, along with the novel graphics tools available for interpreting the results.
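
The common idea behind these methods, scoring each subject by pairwise comparisons under a (possibly partial) ordering, can be illustrated with a short Python sketch. This is not the code behind the tutorial's tools; the function names (`u_scores`, `componentwise`, `gehan`) and the toy data are hypothetical and chosen for illustration only. The sketch assumes the simplest scoring rule, in which a subject's score is the number of subjects it exceeds minus the number of subjects exceeding it, and pairs that cannot be ordered contribute nothing.

```python
from typing import Callable, Sequence


def u_scores(n: int, compare: Callable[[int, int], int]) -> list:
    """Score subject i as (# subjects it exceeds) - (# subjects exceeding it).

    compare(i, j) must return +1 if i > j, -1 if i < j, and 0 if the
    pair is tied or cannot be ordered.
    """
    scores = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i] += compare(i, j)
    return scores


def componentwise(data: Sequence[Sequence[float]]) -> Callable[[int, int], int]:
    """Partial ordering for multivariate data (illustrative assumption):
    i > j if i is at least as high on every variable and higher on one."""
    def compare(i: int, j: int) -> int:
        pairs = list(zip(data[i], data[j]))
        if all(x >= y for x, y in pairs) and any(x > y for x, y in pairs):
            return 1
        if all(x <= y for x, y in pairs) and any(x < y for x, y in pairs):
            return -1
        return 0  # conflicting variables: the pair is not ordered
    return compare


def gehan(times: Sequence[float], observed: Sequence[bool]) -> Callable[[int, int], int]:
    """Gehan-type ordering for right-censored times: i > j only if j's
    event was observed and i is known to have lasted longer."""
    def longer(a: int, b: int) -> bool:
        # a is known to exceed b: b's event was observed, and a either has
        # a later time or was censored at the very time b's event occurred.
        return observed[b] and (
            times[a] > times[b] or (times[a] == times[b] and not observed[a])
        )

    def compare(i: int, j: int) -> int:
        if longer(i, j):
            return 1
        if longer(j, i):
            return -1
        return 0
    return compare


# Toy data: four subjects measured on two ordinal variables.
profiles = [(1, 1), (2, 2), (1, 3), (3, 1)]
print(u_scores(len(profiles), componentwise(profiles)))  # [-3, 1, 1, 1]

# Toy data: four survival times, the second one censored.
times = [2, 3, 3, 5]
observed = [True, False, True, True]
print(u_scores(len(times), gehan(times, observed)))  # [-3, 2, -1, 2]
```

In both toy examples the scores sum to zero, because every ordered pair adds +1 to one subject and -1 to the other. Only the comparison rule changes between the censored and the multivariate case, which is why the pairwise loop is kept separate from the ordering. The loop is quadratic in the number of subjects, so this sketch suits small examples only.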

Lecturers: Dr. Wittkowski received his MS in Statistics at the University of Carl-Friedrich Gauss, Göttingen, his PhD in Informatics at the Technical University of Stuttgart, and his ScD in Medical Biometry at the Eberhard-Carls-University, Tübingen. After joining The Rockefeller University, he foresaw the potential of u-statistics, whose application to multivariate data had been abandoned in the 1940s because of high memory demand, and expanded the method to reflect structures among variables, with successful applications to sports, consumer preferences, and images, as well as to complex phenotypes, gene expression profiles, and wide-locus GWAS. Ms. Bigio received her MS in Software Engineering from the University of Palermo, Italy. She leads the development of the massively parallelized computational biostatistics tools that make large-data applications feasible.

  1. Wittkowski KM, Song T. Nonparametric methods for molecular biology. Methods Mol Biol. 2010;620:105-53. PMCID: 234771. Available from:
  2. Hoeffding W. A class of statistics with asymptotically normal distribution. Ann Math Stat. 1948;19:293-325.
  3. Gehan EA. A generalised Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika. 1965;52:203-23.
  4. Wittkowski KM, Lee E, Nussbaum R, Chamian FN, Krueger JG. Combining several ordinal measures in clinical studies. Statistics in Medicine. 2004;23(10):1579-92.
  5. Ramagopalan SV, McMahon R, Dyment DA, Sadovnick AD, Ebers GC, Wittkowski KM. An extension to a statistical approach for family based association studies provides insights into genetic risk factors for multiple sclerosis in the HLA-DRB1 gene. BMC Med Genet. 2009;10:10. PMCID: PMC2669470. Available from:
  6. Wittkowski KM, Sonakya V, Song T, Seybold MP, Keddache M, Durner M. From single-SNP to wide-locus: genome-wide association studies identifying functionally related genes and intragenic regions in small sample studies. Pharmacogenomics. 2013;14(4):391-401. PMCID: 3643309. Available from:
  7. Wittkowski KM, Sonakya V, Bigio B, Tonn MK, Shic F, Ascano M, Nasca C, Gold-Von Simson G. A novel computational biostatistics approach implies impaired dephosphorylation of growth factor receptors as associated with severity of autism. Transl Psychiatry. 2014;4:e354. Available from: