TY - JOUR
T1 - Effect of genome-wide simultaneous hypotheses tests on the discovery rate
AU - Eyheramendy, Susana
AU - Gieger, Christian
AU - Laan, Maris
AU - Illig, Thomas
AU - Meitinger, Thomas
AU - Wichmann, Erich
PY - 2011
Y1 - 2011
N2 - An increasing number of genome-wide association studies are being performed in hundreds of thousands of single nucleotide polymorphisms (SNPs). Many of such studies carry on a second stage in which a selected number of SNPs are genotyped in new individuals in order to validate genome-wide findings. Unfortunately, a large proportion of such studies have been unable to validate the genome-wide findings. In this study we aim to better understand how to distinguish the truly associated features from the false positives in genome-wide scans. In order to achieve this goal we use empirical data to look at three aspects that may play a key role in determining which features are called to be associated with the phenotype. First, we examine the usual assumption of a uniform distribution on null p-values and assess whether or not it affects which features are called significant and the number of significant features. Second, we compare the global behavior of the p-value distribution genome-wide with the local behavior at regions such as chromosomes. Third, we look at the effect of minor allele frequency in the p-value distribution. We show empirically that the uniform distribution is not a generally valid assumption and we find that as a consequence strikingly different conclusions can be drawn regarding what we call significant associations and the number of significant findings. We propose that in order to better assign significance to potential associations one needs to estimate the true distribution of null and non-null p-values.
AB - An increasing number of genome-wide association studies are being performed in hundreds of thousands of single nucleotide polymorphisms (SNPs). Many of such studies carry on a second stage in which a selected number of SNPs are genotyped in new individuals in order to validate genome-wide findings. Unfortunately, a large proportion of such studies have been unable to validate the genome-wide findings. In this study we aim to better understand how to distinguish the truly associated features from the false positives in genome-wide scans. In order to achieve this goal we use empirical data to look at three aspects that may play a key role in determining which features are called to be associated with the phenotype. First, we examine the usual assumption of a uniform distribution on null p-values and assess whether or not it affects which features are called significant and the number of significant features. Second, we compare the global behavior of the p-value distribution genome-wide with the local behavior at regions such as chromosomes. Third, we look at the effect of minor allele frequency in the p-value distribution. We show empirically that the uniform distribution is not a generally valid assumption and we find that as a consequence strikingly different conclusions can be drawn regarding what we call significant associations and the number of significant findings. We propose that in order to better assign significance to potential associations one needs to estimate the true distribution of null and non-null p-values.
KW - Genome-wide association study (GWAS)
KW - P-value distribution
KW - Single nucleotide polymorphisms (SNPs)
UR - http://www.scopus.com/inward/record.url?scp=79958100365&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:79958100365
SN - 1948-1756
VL - 2
SP - 163
EP - 177
JO - International Journal of Molecular Epidemiology and Genetics
JF - International Journal of Molecular Epidemiology and Genetics
IS - 2
ER -