Wellcome-CTC Mouse Strain SNP Genotype Set
(3rd August 2005)
This page contains information related to our project to genotype Recombinant Inbred Lines and Inbred Lines across 15360 SNPs.
All the genotyping was performed by Illumina, San Diego (thanks to Luana Galver, Sandy McBean).
Note that all the genotype files include their creation date as part of their name.
LAST CHANGE TO GENOTYPE FILES WAS ON 3rd August 2005
10052005: The data have now been remapped onto Build 34 of the mouse genome. You can access the data relative to Build 33 or Build 34. You can view the positions of the remapped SNPs here.
03082005: We now have genotype data for an additional 10 strains CAST/Ei CASA/RkJ CIM CTP CZECHI/EiJ MAI MBT MOLC/RkJ MOLD/RkJ MOLFEiJ MSM/Ms MusSpretusOutbred1 MusSpretusOutbred2 PANCEVO/EiJ PWK/Rbrc PWK/Pas PWK/PhJ PWK/Ros SKIVE/EiJ SPRET/EiJ. These are mostly wild-derived strains with lower than expected genotype pass rates. It is therefore highly likely that the error rates for these strains are significantly higher than for the other data, so these genotypes should be treated with caution. To help the community make the best use of these data, the genotypes for these strains are also available as a separate text file wild.genotypes_quality_lowcased.03082005.txt.gz that includes the Illumina-computed quality scores [the scores lie between 0 (bad) and 1 (good)]. Genotypes with quality scores below 0.8 are shown in lower case. In addition, the data are available from the SNP selector in the usual way.
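The lower-casing rule described above can be illustrated with a minimal sketch. It assumes each genotype call is a short text token and that the quality scores are available as a parallel list of numbers; the function name and the example values are invented for illustration and do not describe the layout of the actual file.

```python
def lowercase_low_quality(genotypes, scores, threshold=0.8):
    """Return the calls with those scoring below `threshold` in lower case.

    Assumption: `genotypes` and `scores` are parallel lists, one entry per SNP.
    """
    if len(genotypes) != len(scores):
        raise ValueError("genotypes and scores must have the same length")
    return [g.lower() if s < threshold else g for g, s in zip(genotypes, scores)]


# Invented example values: only the middle call scores below 0.8.
calls = lowercase_low_quality(["A", "G", "T"], [0.95, 0.42, 0.80])
print(calls)  # ['A', 'g', 'T']
```

A score of exactly 0.8 is left in upper case here, following the wording "below 0.8".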
There are genotypes available for 480 strains and 13370 successful SNP assays that are mapped to Build 34 of the mouse genome, including 107 SNPs that are mapped to "random" unanchored sequence.
13374 SNPs are mapped onto Build 33 of the mouse genome.
We have performed some basic error checking and have not discovered any major problems, but please take note that:
NEW The genotypes can be accessed via a web interface which lets one select genotypes for any of the genotyped strains, either for a single chromosome or across the genome, and optionally restricts output to just those SNPs that are polymorphic between the selected strains. Graphical output is also available.
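For anyone working from the downloaded files rather than the web interface, the "polymorphic between the selected strains" restriction can be approximated as follows. This is a sketch, not the web interface's own logic: the function name, the in-memory layout and the missing-call codes ("N" and "-") are all assumptions.

```python
def polymorphic_snps(genotypes_by_strain, snp_names, strains, missing=("N", "-")):
    """Return the SNP names at which the selected strains carry more than one call.

    genotypes_by_strain maps a strain name to a list of calls in snp_names order.
    Calls listed in `missing` (an assumed coding) are ignored.
    """
    keep = []
    for i, snp in enumerate(snp_names):
        alleles = {genotypes_by_strain[s][i] for s in strains} - set(missing)
        if len(alleles) > 1:
            keep.append(snp)
    return keep


# Invented example data.
genos = {
    "C57BL/6J": ["A", "G", "T"],
    "DBA/2J": ["A", "C", "N"],
    "A/J": ["A", "G", "T"],
}
print(polymorphic_snps(genos, ["rs1", "rs2", "rs3"], ["C57BL/6J", "DBA/2J"]))  # ['rs2']
```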
Alternatively the data can be downloaded as a series of chromosome-specific compressed text files by following these links:
(b) Build33 (use is deprecated)
The file format is space-separated text, with one row of data per strain. The first column gives the strain name. The remaining columns are the genotypes in the marker order specified by the SNP names in the first row of the file.
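A minimal reader for this layout might look like the sketch below. It assumes that strain names contain no spaces, that each genotype is a single whitespace-free token, and that the header row may or may not carry a label above the strain column; the function name and the sample data are invented.

```python
import io


def read_strain_genotypes(handle):
    """Parse a strain-per-row genotype table into (snp_names, {strain: calls})."""
    snp_names = handle.readline().split()
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genotypes[fields[0]] = fields[1:]
    if genotypes:
        width = len(next(iter(genotypes.values())))
        # Assumption: drop a leading header label if the header is one token longer.
        if len(snp_names) == width + 1:
            snp_names = snp_names[1:]
        if any(len(calls) != len(snp_names) for calls in genotypes.values()):
            raise ValueError("row length does not match the number of SNP names")
    return snp_names, genotypes


# Invented three-marker example.
sample = io.StringIO("strain rs1 rs2 rs3\nC57BL/6J A G T\nDBA/2J A C T\n")
snps, table = read_strain_genotypes(sample)
print(snps)             # ['rs1', 'rs2', 'rs3']
print(table["DBA/2J"])  # ['A', 'C', 'T']
```

Because the downloads are compressed, a real file would be opened with `gzip.open(path, "rt")` from the standard library before being passed to the function.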
The data are also available in 3 files, transposed so that each row corresponds to one marker and each column to one strain. These files are small enough to view in Excel.
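The relationship between the two layouts is a plain transpose. The sketch below shows it for data already held in memory; the function name and example values are invented, and it does not reproduce how the transposed files were actually generated.

```python
def transpose_table(strains, snp_names, matrix):
    """Turn a strain-by-marker matrix into {snp: {strain: call}}.

    matrix[i][j] is the call for strains[i] at snp_names[j].
    """
    columns = zip(*matrix)  # one tuple of calls per marker
    return {snp: dict(zip(strains, col)) for snp, col in zip(snp_names, columns)}


# Invented example data.
by_marker = transpose_table(
    ["C57BL/6J", "DBA/2J"],
    ["rs1", "rs2"],
    [["A", "G"], ["A", "C"]],
)
print(by_marker["rs2"])  # {'C57BL/6J': 'G', 'DBA/2J': 'C'}
```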
Haplotype structure of Recombinant Inbred Lines inferred from the data.
Haplotype structure of all other Contributed strains relative to C57BL/6J.
Note that the following samples either failed completely or gave higher than expected failure rates. Many of the latter category are wild-derived mice, where presumably the presence of unknown variants in the flanking sequences caused additional failures.
The SNPs were selected as follows. Where possible we used validated SNPs known to be polymorphic on at least some of the eight strains A/J, AKR, BALB/cJ, DBA2/J, C57BL/6J, LP/J, I, RIIIS/J, although in many cases we did not have full strain distribution data. About 7000 SNPs were contributed by GNF (Tim Wiltshire), 7000 by Merck (Eric Schadt), and 1600 by JAX (Petko Petkov). We thinned out SNPs closer than 50kb with identical strain distribution patterns. We then identified all gaps > 500kb and looked for SNPs to fill them. We used Celera, Czech and Affy SNPs to do this (provided by Mark Daly and Rob Williams). We only included SNPs that mapped uniquely to Build 33 of the mouse assembly according to BLAT (thanks to Martin Taylor).
We added a few special SNPs that determine the MHC alleles, the tyrosinase and agouti loci, and the mitochondrion. We included SNPs mapping to unordered chromosomal fragments (like 7_random) because these are likely to become part of the assembly in the future.
The resulting set of SNPs is fairly uniformly distributed on Build 33, see final-9-2-5.space.txt. When the next build comes out no doubt much of this careful work will be undone, and new gaps will appear, but this is the best we could do at the moment. Making this selection was surprisingly time-consuming and difficult. In particular filling the gaps was hard. Whether these gaps really are regions with few SNPs, or are caused by errors in the mouse genome assembly, or are caused by SNP ascertainment problems, remains to be seen.
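The thinning and gap-finding steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: it treats one chromosome at a time, represents a strain distribution pattern as a string, and drops a SNP when an already kept SNP with the same pattern lies less than 50kb upstream. The function names and example positions are invented.

```python
def thin_snps(snps, min_spacing=50_000):
    """Drop SNPs lying within min_spacing of a kept SNP with the same pattern.

    snps is a list of (position, pattern) pairs for a single chromosome.
    """
    last_kept = {}  # pattern -> position of the most recently kept SNP
    kept = []
    for pos, pattern in sorted(snps):
        prev = last_kept.get(pattern)
        if prev is not None and pos - prev < min_spacing:
            continue
        kept.append((pos, pattern))
        last_kept[pattern] = pos
    return kept


def find_gaps(positions, max_gap=500_000):
    """Return (start, end) pairs of adjacent sorted positions more than max_gap apart."""
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a > max_gap]


# Invented example positions and patterns.
snps = [(100_000, "AABB"), (120_000, "AABB"), (130_000, "ABAB"),
        (160_000, "AABB"), (900_000, "AABB")]
kept = thin_snps(snps)
print(kept)
# [(100000, 'AABB'), (130000, 'ABAB'), (160000, 'AABB'), (900000, 'AABB')]
print(find_gaps([pos for pos, _ in kept]))  # [(160000, 900000)]
```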
List of SNPs with "Correct" strand. This file gives the Illumina-reported alleles and strand, which are consistent with the genotypes reported here and often differ from the originally submitted strand.
Original list of SNPs submitted for genotyping, a comma-separated text file in Illumina format. The columns are: SNP_Name,Sequence,Genome_Build_Version,Chr,Coordinate,Source,dbSNP_Version,Ploidy,Species,Customer_Strand. Note that all SNPs have been remapped onto Build 33 of the mouse genome, and where possible renamed by their dbSNP rs number if that exists. The original SNP name is included in the source information.
Coordinates start at 1, i.e. they follow the dbSNP convention, which differs from UCSC coordinates, which start at 0.
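A file with these columns can be read with Python's standard csv module. The sketch below assumes the file starts with a header row containing exactly the column names listed above, which may not hold for the real file; the function name and the sample record are invented for illustration.

```python
import csv
import io

COLUMNS = ["SNP_Name", "Sequence", "Genome_Build_Version", "Chr", "Coordinate",
           "Source", "dbSNP_Version", "Ploidy", "Species", "Customer_Strand"]


def read_snp_list(handle):
    """Read the SNP list into a list of dicts, with Coordinate as an int."""
    reader = csv.DictReader(handle)
    # Assumption: the first row is a header matching COLUMNS exactly.
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        row["Coordinate"] = int(row["Coordinate"])
        records.append(row)
    return records


# Invented record, not taken from the real file.
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "rsEXAMPLE,ACGT[A/G]TTGA,33,7,1234567,example_source,0,diploid,Mus musculus,TOP\n"
)
records = read_snp_list(sample)
print(records[0]["Chr"], records[0]["Coordinate"])  # 7 1234567
```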
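When comparing these positions with UCSC-style tables, which use a 0-based start and an exclusive end, the conversion for a single-base SNP is a shift of one. The helper names below are invented for illustration.

```python
def to_ucsc_interval(position):
    """Convert a 1-based SNP position to a 0-based, half-open (start, end) pair."""
    if position < 1:
        raise ValueError("1-based positions start at 1")
    return position - 1, position


def from_ucsc_start(chrom_start):
    """Convert a 0-based UCSC start back to a 1-based position."""
    return chrom_start + 1


print(to_ucsc_interval(1234567))  # (1234566, 1234567)
print(from_ucsc_start(1234566))   # 1234567
```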
Conditions of Use
Many thanks to Tim Wiltshire, Mathew Pletcher (GNF), Eric Schadt (Rosetta/Merck), Petko Petkov (JAX), Mark Daly/Andrew Kirby (MIT/Broad), Rob Williams, Weikuan Gu, Lu Lu, Yan Jioa (University of Tennessee Health Science Center, supported by P20-MH 62009 and U24AA13513), and Christophe Benoist (Harvard) for providing SNP information. (The source of each SNP is indicated in the file.)
Mouse DNA samples
Many thanks to the following people for providing Mouse DNA samples: Christophe Benoist, Chris Ebeling, Beth Bennett, Lu Lu, Daniel Pomp, David Keays, Robert Reis, Grant Morahan, Gudrun Brockmann, Hiroke Nagase, Howard Gershenfeld, Jim Cheverud, Jimmy Spearow, Jonathan Flint, Kathy Hood, Molly Bogue/Susan Deveau, Morley/Haywood, Peter Demant, Petko Petkov, Rob Williams, Simon Horvat, Steve Clapcote, Xavier Montagutelli.
And thanks to the following for providing financial support: The Wellcome Trust (who provided the bulk of the funding), James Cheverud: SMXLG (R24RR015116), Gary A. Churchill: SJXL (R01GM072863), Kent Hunter: AKXD (NIH intramural support), Lu Lu and Beth Bennett: LXS (U01AA014425), Richard S. Nowakowski: CXB (R01NS049445), Robert W. Williams: AXB/BXA, BXD, BXH, and miscellaneous samples (P20-MH 62009 and U24AA13513)