
The pattern of polymorphism in Arabidopsis thaliana

Official page for NSF 2010 grant DEB-0519961 (2005-2008)

Introduction

The primary goal of this project, a continuation of a previous collaboration between the Bergelson, Kreitman, and Nordborg labs, is to enable genome-wide association mapping in A. thaliana by genotyping enough lines at enough markers. Based on our analyses of existing data (Nordborg et al., 2005; Aranzana et al., 2005; Zhao et al., 2007; Kim et al., 2007), we have decided to genotype on the order of 1,300 lines using a custom Affymetrix 250,000-SNP chip developed from the recently released Perlegen re-sequencing data (Clark et al., 2007; Kim et al., 2007). This is a considerable increase in effort over the original proposal, made possible by the ever-decreasing cost of genotyping and by combining forces with the Borevitz lab (supported by NIH GM073822).

Project status

We will select lines for genotyping from a set of close to 6,000 lines that have already been genotyped at 149 genome-wide SNPs. That screen serves primarily to detect identical and heterozygous individuals, and also gives a better picture of population structure. Data generation for this phase of the project is essentially complete, and we are currently analyzing the data. The final sample has therefore not yet been selected, but it is likely to contain several large regional "population" samples as well as a geographically diverse selection of lines.
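As a rough illustration of the kind of screen described above, the following Python sketch flags lines with many heterozygous calls and pairs of lines whose genotypes are identical at every SNP called in both. It is not the project's actual pipeline: the one-character-per-SNP encoding ("H" for a heterozygous call, "N" for a missing call), the function names, the example lines, and both thresholds are assumptions made for illustration only.

```python
from itertools import combinations

HET = "H"      # hypothetical code for a heterozygous call
MISSING = "N"  # hypothetical code for a missing call


def heterozygosity(calls):
    """Fraction of non-missing calls that are heterozygous."""
    called = [c for c in calls if c != MISSING]
    if not called:
        return 0.0
    return sum(c == HET for c in called) / len(called)


def mismatch_fraction(a, b):
    """Fraction of SNPs called in both lines at which the calls differ.

    Returns None when the two lines share no called SNPs.
    """
    shared = [(x, y) for x, y in zip(a, b) if MISSING not in (x, y)]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


def screen_lines(genotypes, max_het=0.05, max_mismatch=0.0):
    """Return (lines with excess heterozygosity, putatively identical pairs).

    max_het and max_mismatch are illustrative thresholds, not project values.
    """
    het_flagged = sorted(
        name for name, calls in genotypes.items()
        if heterozygosity(calls) > max_het
    )
    identical_pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(genotypes.items()), 2):
        frac = mismatch_fraction(g1, g2)
        if frac is not None and frac <= max_mismatch:
            identical_pairs.append((n1, n2))
    return het_flagged, identical_pairs


# Made-up genotypes at six SNPs (the real screen uses 149).
example = {
    "line_a": "ACGTAC",
    "line_b": "ACGTAC",
    "line_c": "AHGHAC",
    "line_d": "TCGNAA",
}
het_flagged, identical_pairs = screen_lines(example)
```

In this toy input, line_c is flagged for heterozygosity and line_a and line_b are reported as a putatively identical pair. Allowing a small nonzero max_mismatch would tolerate genotyping error at the cost of merging some lines that are truly distinct.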

Sample selection will also be informed by analysis of "pilot" 250k SNP data. We have just genotyped the 96 accessions used in the original 2010 diversity project, and we are using these data to improve our understanding of the haplotype structure of the species (as well as to fine-tune the genotyping pipeline). Once these analyses are completed, we will rapidly genotype the full sample of around 1,300 accessions. All genotyping will be completed by summer, and probably much earlier.

250k SNP data for the 96 accessions are available here. We caution that these are preliminary data: SNPs have been called using an extremely primitive algorithm, and two accessions, OMo2-1 and OMo2-3, are missing entirely (genotyping for Tsu-1 also failed, but genotypes for this accession are in the original Perlegen data). The data will change, and we are confident the error rates (which currently average 4%) will go down. Nonetheless, the data are good enough to be useful, and we are making them available to honor our promise of immediate public access.

