Genetic Complexity and Controversy
Genetic studies have begun to piece together the relationships between bits of the genome and observable characteristics of interest (phenotypes). Starting about 2006, when genome-wide studies of ~10^6 markers began to be feasible, this association analysis was done one marker at a time. Concern about false positives due to population confounding, or data errors, led to the adoption of conservative adjustments and significance thresholds. Disappointingly few markers were able to satisfy these, and those markers could account for only a small fraction of the heritability (fraction of variance explained by additive genetics). This came to be known as the "missing heritability" problem.
A big step forward came from a UQ group who in 2010 pioneered simultaneous analysis of all markers, using a shrinkage (or penalised) regression model to overcome the "too many predictors" problem. They found that most of this missing heritability was there in front of our eyes: distributed widely over many markers that failed to reach genome-wide significance. However, because these markers each carry only weak causal effects, the form of the penalty function is important. We have shown that the assumptions made in the 2010 paper are unrealistic, and that the results from a large number of published analyses based on the methods of that paper are substantially inaccurate.
Another big step came in 2015 when a group from Harvard/MIT showed how to do similar analyses using only summary statistics for each marker, rather than requiring individual-level data. This is important because the latest datasets include up to ~10^6 individuals each typed at ~10^7 markers, too large for the shrinkage-regression methods described above. Their approach has proved very popular and led to hundreds of publications, many in major journals. However, the Harvard/MIT authors assumed essentially the same implausible model for the effect sizes of individual markers, and results from their approach are again substantially inaccurate.
The Harvard/MIT group went on to use their approach to focus in particular on the question of where in the genome heritability is located. It is here that the starkest differences arise between results from their methods, which are the current standard in the field, and those from our revised analyses. I will explain the reasons for these big differences and review the current state of the science and the controversy.
The view of complex-trait genetics emerging from our work is that nature is even more complex than we had suspected, and heritability for many traits is distributed very widely and thinly across the genome. The term "omnigenic" has recently been coined to describe this genetic architecture, a next step in the classical sequence of monogenic, oligogenic and polygenic.
This is joint work with Doug Speed, of Aarhus Institute of Advanced Studies, Denmark.
Professor David Balding, University of Melbourne