Genomic polymorphism data in Arabidopsis thaliana
The Arabidopsis thaliana "HapMap" project
Background
As sequencing and genotyping costs continue to decrease, association mapping (also known as linkage disequilibrium mapping) is emerging as a powerful, general tool for identifying alleles and loci responsible for natural variation. Although its application to human disease has received the most attention (especially through the International HapMap Project), association mapping has tremendous potential in a wide range of organisms. Because it naturally occurs as inbred lines, A. thaliana is almost ideally suited for association mapping: once a set of lines has been genotyped, they can be phenotyped over and over, for the same or for different traits, by the entire community. A multi-group effort to realize this potential has been under way for some time:
- With funding from the NSF 2010 Program (DEB-0115062), the Bergelson, Kreitman, and Nordborg labs set out to sequence 1,500 short fragments in a panel of 96 lines using standard PCR-based dideoxy sequencing. The 1,214 annotated sequence alignments generated by the project to date are available for download. More information about this project can be found here.
- Based on the results of the project just described, the Ecker and Weigel labs selected a subset of 20 "maximally diverse" lines for whole-genome re-sequencing using Perlegen technology. The results were published in 2007 (Clark et al., 2007; Kim et al., 2007), and the data are available here.
- With continued support from the NSF 2010 Project (DEB-0519961), the Bergelson and Nordborg labs have joined forces with the Borevitz lab (supported by NIH GM073822) to develop an Affymetrix genotyping chip using SNPs discovered by the Perlegen re-sequencing (Kim et al., 2007), and use it to genotype around 1,300 lines. More information about this project can be found here.
- An effort to completely sequence many (or perhaps even all) of the 1,300 genotyped lines is also under way. The "1001 Genomes Project" is spearheaded by Joe Ecker and Detlef Weigel.
A project to help integrate all these polymorphism data has also been funded by the NSF 2010 Project (DEB-0723935). More information about this project can be found here.
Status of Bergelson/Borevitz/Nordborg efforts
- Over 6,000 accessions (including all common stock center "ecotypes") have been genotyped using 149 genome-wide SNPs in order to provide basic identity information. A paper describing these data is in preparation.
- Approximately 1,300 accessions will be genotyped using our 250k SNP-chip. Over 900 have already been done. Data are being made available continuously. All genotyped lines will be available through the stock centers.
- A paper describing a first attempt at genome-wide association using these data is in preparation. A website that allows easy access to the data and results has been developed and will soon be made public.
