Truly genome-wide association mapping in Arabidopsis thaliana

Preliminary results from the 250k SNP data

[Figures: genome-wide scan, chromosomes 1 through 5]

SNP density and the extent of linkage disequilibrium

The accompanying figures show the result of a genome-wide scan using the preliminary 250k SNP data and a categorical trait related to flowering time, namely whether or not an accession flowered within 200 days under long days without vernalization (data from Zhao et al., 2007). The difference from earlier studies (e.g., Zhao et al., 2007) is striking: peaks are now generally due to many SNPs, which means that we can estimate the width of the region of association (the major peaks in the plots are on the order of 50-100 kb wide, but it should be possible to refine the location further). Equally important, it means that, for the first time, we have reasonable power in terms of marker density. Peaks we saw in earlier studies are still there, but several new ones have appeared. I am guessing that we now capture most of them (power is of course still limited by the tiny sample size and by the accessions included).
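As a rough sketch of what a per-SNP scan with a binary trait might look like, the following Python applies a two-sided Fisher's exact test to each SNP. The page does not say which test statistic was actually used, so the choice of Fisher's exact test, the function names (`fisher_exact_two_sided`, `scan`), the 0/1 allele coding, and the toy data are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum over all tables that are no more probable than the observed one.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


def scan(genotypes, flowered):
    """Test every SNP against a binary phenotype.

    genotypes: dict mapping a SNP position to a list of 0/1 alleles,
               one per accession (hypothetical coding for inbred lines).
    flowered:  list of booleans in the same accession order.
    """
    results = {}
    for pos, alleles in genotypes.items():
        pairs = list(zip(alleles, flowered))
        a = sum(1 for g, f in pairs if g == 1 and f)
        b = sum(1 for g, f in pairs if g == 1 and not f)
        c = sum(1 for g, f in pairs if g == 0 and f)
        d = sum(1 for g, f in pairs if g == 0 and not f)
        results[pos] = fisher_exact_two_sided(a, b, c, d)
    return results


# Toy data: six accessions, two SNPs (positions are made up).
toy_genotypes = {100: [1, 1, 1, 0, 0, 0], 200: [1, 0, 1, 0, 1, 0]}
toy_flowered = [True, True, True, False, False, False]
toy_results = scan(toy_genotypes, toy_flowered)
```

In the toy data the SNP at position 100 tracks the phenotype perfectly and the one at position 200 does not, so the first receives the smaller p-value. With only six accessions neither can reach a small p-value, which is the sample-size limitation noted above in miniature.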

Confounding by population structure

Accessions that did not flower include those from Finland and northern Sweden, plus a few from southern Sweden. As expected with such a phenotype, the confounding by population structure is extreme: p-values of 10^-5 are clearly the norm across the genome. We will probably be able to greatly reduce the confounding using the mixed-model approach introduced by Ed Buckler's group (see Zhao et al., 2007), but it seems unlikely that we can do so without paying a heavy price in terms of false negatives. Statistics is not magic: if allelic variation is too strongly associated with population structure, we risk throwing out the true associations with the spurious ones. A much better approach is to complement association mapping with linkage mapping in appropriate crosses, pairing the increased resolution of association mapping with the robustness of standard linkage mapping.
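The trade-off described here can be illustrated with a deliberately simple stand-in. The sketch below is not the mixed-model approach; it uses a Cochran-Mantel-Haenszel (CMH) stratified test, chosen only because it fits in a few lines. The function name `cmh_pvalue`, the two population groups, and all counts are hypothetical.

```python
from math import erfc, sqrt


def cmh_pvalue(strata):
    """CMH test (1 df, no continuity correction) over a list of 2x2 tables.

    Each table is (a, b, c, d):
        a = allele 1, flowered        b = allele 1, did not flower
        c = allele 0, flowered        d = allele 0, did not flower
    Returns None when no stratum carries any information.
    """
    num = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue
        r1, r2 = a + b, c + d
        c1, c2 = a + c, b + d
        num += a - r1 * c1 / n                       # observed minus expected
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    if var == 0:
        return None
    chi2 = num * num / var
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical extreme case: the allele is fixed in a northern group that
# never flowers and absent from a southern group that always flowers.
north = (0, 10, 0, 0)
south = (0, 0, 10, 0)

# Ignoring structure: pool everything into a single table.
pooled = cmh_pvalue([(0, 10, 10, 0)])

# Accounting for structure: one table per group.
stratified = cmh_pvalue([north, south])
```

Pooled, the SNP looks strongly associated (a p-value on the order of 10^-5). Stratified, neither group contains any within-group variation, so the test has nothing to work with and returns `None`. If this SNP were causal, correcting for structure would have discarded it along with the spurious signals.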

Furthermore, the pattern of spurious associations is inherently interesting. While some of the peaks in these figures are likely to be due to polymorphisms responsible for extreme late flowering, others are likely to be due to polymorphisms that affect other adaptively important traits (e.g., freezing tolerance) and are in strong linkage disequilibrium with the former.
