Main

It has been estimated that the human genome contains more than 10 million nucleotide positions that have common (defined as >1%) variation between individuals in a population1. The prevailing interest in the large-scale genotyping of SNPs originates in the proposal that genome-wide association studies with SNP markers may enable the identification of genetic variation that predisposes to complex disorders2,3. As a result of the International Haplotype Mapping Project4 and recent data from Perlegen Sciences5, public databases currently contain data for more than 2 million SNP markers with verified allele frequencies. The aim of the International Haplotype Mapping Project is to characterize linkage disequilibrium (LD) patterns across the genome to facilitate selection of the most informative subsets of 'tagging' SNPs6 for genome-wide association studies. Studies on LD patterns suggest that genome-wide association studies will require the genotyping of several hundred thousand SNPs in each individual7. Fine mapping of genes associated with disease in large genomic regions previously defined by linkage analysis also requires the genotyping of hundreds or thousands of SNPs. Efficient and cost-effective SNP genotyping methods will be required for routine clinical applications once disease-predisposing genes have been identified and the allelic variants that predict disease or improve diagnostics have been specified. Moreover, many of the complex diseases and traits may be caused by rare alleles, which can only be detected by resequencing complete genomic regions in multiple individuals, first for identification of the variants8 and then for disease diagnostics. Thus, there are many calls for the genotyping of SNPs on a large scale.

The possibility of highly multiplexed analyses, combined with the low cost per data point offered by microarray-based assays, is a considerable advantage for SNP genotyping on a large scale. During the past ten years, microarray technology for the genome-wide expression analysis of tens of thousands of genes in each sample, together with computational tools for analysis of the large amount of data generated by this technology, has matured9. Analogously to expression profiling, microarray-based methods have been successfully adapted to the analysis of genomic copy-number alterations by hybridization to probes spanning large chromosomal regions using comparative genomic hybridization10. Development of robust microarray technology for genotyping at the resolution of single nucleotides on a genome-wide scale has, for reasons discussed below, proved to be a challenging task. Only recently has there been progress in the development of microarray systems for highly multiplexed genotyping that have the potential for genome-wide SNP genotyping.

PCR strategies enabling multiplex SNP genotyping

Identification of a specific single-base change among the 3 billion bases that constitute the human genome is a challenging task, which was facilitated 20 years ago by PCR11,12. PCR offers a means of reducing the complexity of the genome and of increasing the copy number of the DNA templates to levels required for the specific and sensitive detection of single-base changes. But the design of robust PCR assays with multiplexing levels exceeding 10–20 amplicons has proved to be more difficult than initially anticipated. The reason is that, in multiplex PCR, the number of possible undesired interactions between the PCR primers increases steeply (at least quadratically, counting pairwise interactions alone) as the number of primers included in the reaction mixture increases. These interactions usually result in preferential amplification of unwanted 'primer-dimer' artifacts instead of the intended DNA templates (amplicons). Another problem in multiplex PCR is sequence-dependent differences in PCR efficiency between the amplicons. The problems of multiplexing can be reduced to some extent by using PCR primers that are as similar as possible to one another13,14,15. But the multiplexing level that can be readily achieved in standard PCRs does not reach the multiplexing capacity offered by current technology for producing high-density DNA microarrays. Simultaneous analysis of a reasonable amount of genomic DNA with the current detection sensitivity of microarray scanners requires an amplification step. The PCR step complicates the molecular reaction principles underlying the assays and introduces multiple laboratory steps into the procedures; it is therefore the chief obstacle to highly multiplexed SNP genotyping.
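
To give a feel for how quickly potential primer interactions accumulate in a multiplex reaction, the following sketch counts candidate pairwise interactions only, which is a simplification (higher-order interactions would grow faster still). It assumes, for illustration, that every amplicon contributes two primers and that any two primers, including two copies of the same primer, may interact; the function name and these assumptions are not taken from the cited studies.

```python
def primer_pair_count(n_amplicons: int, include_self: bool = True) -> int:
    """Count candidate pairwise primer interactions in one multiplex PCR.

    Assumptions (illustrative only): each amplicon contributes two primers,
    and any unordered pair of primers may interact.
    """
    n_primers = 2 * n_amplicons
    # Unordered pairs of distinct primers: n * (n - 1) / 2
    pairs = n_primers * (n_primers - 1) // 2
    if include_self:
        # A primer can also form a dimer with another copy of itself.
        pairs += n_primers
    return pairs


if __name__ == "__main__":
    for n in (1, 10, 20, 100, 1000):
        print(f"{n:>5} amplicons -> {primer_pair_count(n):>9} candidate primer pairs")
```

Under these assumptions, a 20-plex reaction already has several hundred candidate primer pairs, and a 1,000-plex reaction has about two million, which is consistent with the practical ceiling of 10–20 amplicons described above.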

Complexity reduction. One approach taken to reduce the complexity of the genome before SNP genotyping is cleavage of genomic DNA with a restriction enzyme, ligation of common adaptor sequences to the restriction fragments and use of the adaptor sequences as binding sites for a single universal PCR primer16. This principle, which allows the amplification of thousands of DNA fragments with one PCR primer pair in a single reaction, was pioneered more than 10 years ago for generating fingerprints of anonymous plant DNA fragments using gel electrophoresis in the amplified fragment-length polymorphism technique17. The same principle has also been used to reduce the complexity of the genome before SNP discovery by sequencing18 and by hybridization to high-density oligonucleotide arrays19. The complexity of the human genome can be reduced by a factor of 50 (to 60 Mb) by using a single restriction enzyme (XbaI) followed by PCR with a universal primer to amplify fragments 250–1,000 bp in size20. This combined complexity-reduction and target-amplification step enables the genotyping of 10,000 SNPs using allele-specific hybridization to probes immobilized on high-density oligonucleotide arrays (GeneChip 10K arrays). One drawback of this PCR strategy is that only preselected panels of SNPs located in the amplified fragments can be genotyped. The multiplexing level of this system has been further increased to 100,000 SNPs by using a combination of two enzymes (XbaI and HindIII) for the complexity-reduction step and by amplifying PCR fragments up to 2,000 bp in two separate reactions21 (GeneChip 100K arrays). Here, 10% (300 Mb) of the genome is represented in the amplified fragments, which would correspond to multiplexed amplification of 100,000 DNA fragments in only two PCR reactions.
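
The figures quoted for complexity reduction can be checked with simple arithmetic. The sketch below assumes a genome size of 3 billion bases, as stated earlier in the text; the function names are illustrative.

```python
GENOME_BP = 3_000_000_000  # "3 billion bases", as quoted in the text


def reduced_size_bp(genome_bp: int, reduction_factor: int) -> float:
    """Size of the amplified fraction after reducing complexity by a given factor."""
    return genome_bp / reduction_factor


def represented_fraction(represented_bp: float, genome_bp: int = GENOME_BP) -> float:
    """Fraction of the genome represented in the amplified fragments."""
    return represented_bp / genome_bp


if __name__ == "__main__":
    # Single enzyme (XbaI): reduction by a factor of 50.
    print(reduced_size_bp(GENOME_BP, 50) / 1e6, "Mb amplified")
    # Two enzymes (XbaI and HindIII): 300 Mb represented.
    print(represented_fraction(300e6), "of the genome represented")
```

A fifty-fold reduction of a 3-Gb genome gives 60 Mb, and 300 Mb corresponds to one-tenth of the genome, in agreement with the values given for the 10K and 100K arrays.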

Long-range PCR. The high-density–array genotyping system from Perlegen Sciences takes a 'brute force' PCR approach. This system was first described for SNP discovery and haplotyping on chromosome 21. Total genomic DNA (in haploid human-rodent cell hybrids) representing the entire chromosome 21 (1% of the genome, 32 Mb) was amplified by long-range PCR of fragments of 10 kb, using more than 3,000 individual PCR reactions22. The amplified fragments were resequenced by allele-specific hybridization to very large microarrays carrying oligonucleotide probes produced by photolithographic in situ synthesis23. The same approach has been used for SNP discovery across the whole genome and for the genotyping of more than 2 million SNPs in (diploid) human genomic DNA samples5. The coverage of the human genome was 92% and required long-range PCR of 300,000 fragments. The amplicons were genotyped for 2 million SNPs using 49 different high-density oligonucleotide arrays for each individual sample. In this case too, the complexity of the genomic DNA applied to each array is reduced by a factor of about 50.
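
The scale of the long-range PCR approach follows from the numbers above. The sketch below is a rough consistency check that ignores any overlap between neighbouring amplicons (an assumption made here for simplicity); the function names are illustrative.

```python
def reactions_needed(target_bp: int, amplicon_bp: int) -> int:
    """Minimum number of non-overlapping amplicons needed to tile a target region."""
    return -(-target_bp // amplicon_bp)  # integer ceiling division


def mean_amplicon_bp(genome_bp: int, coverage: float, n_fragments: int) -> float:
    """Average amplicon length implied by a given genome coverage and fragment count."""
    return genome_bp * coverage / n_fragments


if __name__ == "__main__":
    # Chromosome 21: 32 Mb tiled with 10-kb fragments.
    print(reactions_needed(32_000_000, 10_000), "reactions for chromosome 21")
    # Whole genome: 92% of 3 Gb amplified in 300,000 fragments.
    print(round(mean_amplicon_bp(3_000_000_000, 0.92, 300_000)), "bp mean amplicon length")
```

Tiling 32 Mb with 10-kb fragments requires about 3,200 reactions, in line with the "more than 3,000" reported, and the whole-genome figures imply amplicons of roughly 9 kb on average.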

PCR for signal amplification. Carrying out the SNP-genotyping reaction directly on genomic DNA without a prior amplification step, and using a PCR amplification step only for increasing the number of detectable target molecules created by the genotyping reaction, is a reverse and more flexible strategy for addressing the problem of multiplexing the PCR. 'Circularizable' probes, originally called 'padlock' probes24, that contain two regions complementary to adjacent regions in the target DNA sequence are allowed to anneal directly to the genomic DNA, followed by enzyme-assisted detection of the SNP alleles in the Molecular Inversion Probe assay (Parallele Biosciences)25. Analogously, in the GoldenGate assay (Illumina), two oligonucleotides are used for recognition of the target DNA flanking the SNPs prior to enzyme-assisted genotyping26,27. The genotyping reactions generate templates that can be amplified by PCR using universal primer sequences for which binding sites have been inserted by means of the detector oligonucleotides in both assays (Fig. 1). Tag sequences, introduced into the templates using the probes and primers, are used to capture the allele-specific PCR products on GeneChip arrays (Affymetrix) or BeadArrays (Illumina) that carry immobilized complementary tag sequences. Proof of principle for the Molecular Inversion Probe assay was first shown by multiplexed genotyping of 1,500 SNPs25. Further refinement of the system with improved algorithms for assay design and genotype assignment has recently allowed multiplexing levels exceeding 10,000 SNPs28. The GoldenGate assay can be multiplexed for genotyping panels of 1,500 SNPs27. Both the Molecular Inversion Probe and the GoldenGate assays permit genotyping of custom-designed SNP panels from any genomic region of interest.

A recently devised modification of the principles underlying the GoldenGate assay opens up the possibility of genome-wide genotyping of SNPs directly in the complexity of the whole genome without using a PCR step. A large number (10⁶–10⁷) of fragmented copies of the genome are initially produced by whole-genome amplification with random primers in this assay. A signal amplification procedure is used to attain sufficient fluorescence-detection sensitivity after genotyping. Proof of principle for the whole-genome genotyping assay was demonstrated by comparing genotyping data for a panel of 1,500 SNPs with genotyping data for the same panel of SNPs obtained by the original GoldenGate assay29. This principle for whole-genome genotyping is the basis for a new system for genotyping a genome-wide panel of 100,000 'exon-centric' SNPs (Illumina).

Figure 1: Comparison of the Molecular Inversion Probe and GoldenGate assays.

Reaction principles and steps of the Molecular Inversion Probe assay (a) and the GoldenGate assay (b) illustrated for one heterozygous A/G SNP. (b) The genomic DNA is biotinylated and immobilized on avidin-coated microparticles. In both assays, oligonucleotides containing target-specific regions (red), binding sites for PCR primers (blue) and tag sequences for capture on a solid support (green) are allowed to anneal directly to genomic DNA. The 3′ end of the upstream target-specific oligonucleotide is extended by a DNA polymerase, followed by ligation to the downstream oligonucleotide. (a) The probe is extended by a single nucleotide in two separate single-base reactions. (b) A pair of allele-specific primers is used. The new, allele-specific DNA molecules created by the reactions are amplified by PCR. (a) The circular molecule is cleaved at a site between the PCR primers before and after PCR, followed by fluorescent labeling. (b) The fluorescence labels are introduced using the PCR primers. The labeled molecules are captured on a GeneChip glass microarray (a) or a BeadArray (b) carrying complementary tag sequences for fluorescence detection.

Reaction principles and microarray design

Surprisingly, the microarray-based SNP genotyping systems used today rely on molecular strategies for distinction between SNP alleles that were introduced about 15 years ago12,30,31,32,33. The reaction principles and assays used for SNP genotyping have been reviewed elsewhere34,35. The robustness of the multiplexed microarray-based SNP genotyping systems is determined by the reaction principles applied for SNP allele distinction and the microarray formats used, given that the issues related to PCR have been addressed as discussed above. Table 1 summarizes some of the technical features of the current microarray-based SNP-genotyping systems. In addition, computer software for designing probes and primers for the assays, and algorithms for assigning the genotypes, are essential elements in each of the systems.

Table 1 Features of commercially available microarray systems for SNP genotyping

Allele-specific hybridization. Hybridization with allele-specific oligonucleotide probes (ASOs) is the reaction principle underlying the GeneChip assays. In allele-specific hybridization, the difference in thermal stability between a perfectly matched and mismatched ASO probe and its DNA target is used to distinguish between the SNP alleles. The thermal stability of an ASO probe hybrid depends to a large extent on the nucleotide sequence flanking a SNP, in addition to the stringency (temperature and ionic strength) of the reaction conditions. This intrinsic property of allele-specific hybridization impedes the design of multiplexed ASO microarrays for genotyping any SNP. In the GeneChip systems, this problem is circumvented by careful selection of the SNPs included in the panels on the basis of their performance in the assays and by interrogating each SNP using 40 different ASO probes. The arrays are designed to carry four sets of ten ASO probes corresponding to both strands of the SNP alleles with matched and mismatched probes for each SNP position and four additional nucleotide positions flanking the SNP5,20,21. The requirement of multiple probes per SNP results in very large microarrays for highly multiplexed genotyping. Two arrays with a probe feature size of 8 μm carrying 2.5 million different probes per array are used to genotype 100,000 SNPs in each individual DNA sample21. Correspondingly, as many as 49 such arrays are needed to genotype 2 million SNPs in one sample5. Manufacturing of arrays with oligonucleotides at such high density is accomplished by photolithographic techniques based on combinatorial synthesis of the probes in situ on the arrays23,36. The genotypes are assigned on the basis of the joint fluorescence patterns generated by the 40 hybridization reactions for each SNP using classification- or model-based algorithms developed specifically for this purpose37,38.
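
The 40-probe design can be made concrete by enumerating one possible layout: two strands by two alleles gives four sets, and each set holds a matched and a mismatched probe at five positions (the SNP position plus four flanking positions). The sketch below is an illustration of this reading of the design, not the manufacturer's actual layout; in particular, the offsets used for the flanking positions are hypothetical, as the text does not specify them.

```python
from itertools import product

STRANDS = ("sense", "antisense")
ALLELES = ("A", "B")
# Hypothetical offsets: the SNP position (0) plus four flanking positions.
OFFSETS = (-2, -1, 0, 1, 2)
KINDS = ("perfect_match", "mismatch")


def probe_layout() -> list[dict]:
    """Enumerate one probe descriptor per strand/allele/offset/kind combination."""
    return [
        {"strand": s, "allele": a, "offset": o, "kind": k}
        for s, a, o, k in product(STRANDS, ALLELES, OFFSETS, KINDS)
    ]


PROBES_PER_SNP = len(probe_layout())


def probes_required(n_snps: int) -> int:
    """Total probe features needed for a panel of n_snps."""
    return n_snps * PROBES_PER_SNP


if __name__ == "__main__":
    print(PROBES_PER_SNP, "probes per SNP")
    print(probes_required(100_000), "probes for 100,000 SNPs")
    print(2 * 2_500_000, "features available on two 2.5-million-probe arrays")
```

With 40 probes per SNP, a 100,000-SNP panel needs 4 million probe features, which fits within the 5 million features available on two arrays of 2.5 million probes each.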

DNA polymerase- and ligase-assisted genotyping. Enzyme-assisted SNP genotyping methods provide highly specific distinction between SNP alleles34. This can be explained by the high accuracy of nucleotide incorporation by the DNA polymerases or the high specificity of the DNA ligases in joining two adjacent and perfectly matched DNA strands. Single-base primer extension and ligation are essentially independent of the DNA sequence context and therefore allow genotyping of most SNPs in the same reaction conditions. This possibility provides an obvious advantage compared with allele-specific hybridization by reducing the number of oligonucleotides required in the microarray systems for multiplexed SNP genotyping. The first microarray-based systems relying on primer extension with SNP-specific primers39,40,41 or ligation probes immobilized on array surfaces42 were devised several years ago. The use of generic microarrays with complementary 'tag' or 'barcode' sequences for capturing the products of enzyme-assisted SNP genotyping reactions done in solution allows flexible design of the SNP assays and simplifies manufacturing of the microarrays. The concept of using tag sequences was first applied for PCR-based analysis of expressed yeast sequences43 using GeneChip arrays. For SNP genotyping, tag arrays have been used in single-base primer extension assays in combination with both high-density44 and lower-density45,46,47 microarrays, with microparticles48,49 and with ligase-assisted detection50,51.

In the Molecular Inversion Probe assay25, hybridization of the two ends of the probes to their genomic DNA targets leaves a single-base gap between the probe ends. The gap is filled by a single-base primer extension reaction, which can distinguish between the SNP alleles, followed by circularization of the probe by joining its ends by ligation. Excess linear DNA is removed from the reaction mixture by exonuclease treatment. The circular probe molecule, containing primer-binding sites, serves as a template for inverse PCR, as discussed in the previous section. The amplified, inverted probe molecules are cleaved after PCR and labeled using either two or four fluorophores. They are then captured on a GeneChip tag microarray for detection using an array scanner compatible with the GeneChip format. The details of the labeling procedures used in the assays are not disclosed in the published article28. Improved design of the tag sequences was a key element for increasing the multiplexing level of the Molecular Inversion Probe assay from 1,500 SNPs to more than 10,000 SNPs28. Figure 1a illustrates the steps of the Molecular Inversion Probe assay.

In the GoldenGate assay, the required specificity for SNP genotyping is achieved by first attaching the genomic DNA to a solid support, to facilitate stringent washing procedures, and then using two oligonucleotides to recognize the genomic target regions. PCR primer-binding sites and tag sequences are also introduced using these oligonucleotides27. Allele-specific extension of the upstream primer is used to distinguish SNP alleles, followed by ligation of the extended primer to the downstream probe, creating a molecule that can be amplified by PCR. Allele-specific fluorescent labels are introduced using the PCR primers, and the reaction products are captured on BeadArrays, in which each bead type carries a unique complementary tag sequence. The BeadArray matrices have been produced by a random assembly of beads that are 3 μm in diameter, to which all the capturing tag sequences have been chemically coupled in a single, pooled reaction. The beads form random arrays and are decoded after production using a combinatorial hybridization scheme52. Figure 1b illustrates the steps of the GoldenGate assay. In the whole-genome genotyping assay, the randomly amplified fragments representing total genomic DNA are captured on BeadArrays carrying a pair of sequence-specific and allele-specific oligonucleotides for each SNP being genotyped. In this system, the beads carrying the decoded oligonucleotides have been grafted on Bead Chips on a microscope slide and allow genotyping of the set of 100,000 SNPs on one slide. The SNPs are genotyped by allele-specific extension of the immobilized capturing probe primers. Biotin residues are introduced during the primer-extension reaction to facilitate an indirect signal amplification step that introduces a fluorescence label for detection29.
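
Assays that label the two alleles with different fluorophores ultimately yield a pair of signal intensities per SNP, from which a genotype must be assigned. The sketch below shows the simplest conceivable rule, a threshold on the fraction of signal from allele A. It is not the algorithm used by any of the commercial systems discussed here, which rely on purpose-built clustering or model-based methods; the function name and all threshold values are hypothetical.

```python
def call_genotype(
    signal_a: float,
    signal_b: float,
    min_total: float = 100.0,   # hypothetical minimum total signal for a call
    homo_cut: float = 0.2,      # hypothetical margin defining homozygous calls
    het_halfwidth: float = 0.15,  # hypothetical half-width of the heterozygous band
) -> str:
    """Assign AA, AB, BB or no_call from two allele-specific signal intensities."""
    total = signal_a + signal_b
    if total < min_total:
        return "no_call"  # too little signal to trust either allele
    frac_a = signal_a / total
    if frac_a >= 1.0 - homo_cut:
        return "AA"
    if frac_a <= homo_cut:
        return "BB"
    if abs(frac_a - 0.5) <= het_halfwidth:
        return "AB"
    return "no_call"  # ambiguous region between the clusters


if __name__ == "__main__":
    for a, b in [(1000, 50), (50, 1000), (500, 480), (700, 300), (10, 5)]:
        print(a, b, call_genotype(a, b))
```

Fixed thresholds of this kind ignore SNP-specific differences in signal balance, which is one reason the systems described above calibrate genotype assignment per SNP across many samples.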

Current and future applications of microarray-based SNP genotyping

Successful association studies for identification of alleles associated with disease susceptibility with modest effects will require the analysis of thousands of samples53. Therefore, the utility of microarray-based SNP genotyping systems in large-scale association studies is determined not only by the SNP multiplexing levels, but also by the number of samples that can be processed in parallel, and by the reagent costs involved in the assays. It has not been feasible so far, as illustrated by the comparison of SNP multiplexing levels and sample throughput between the current microarray-based SNP genotyping systems in Figure 2, to combine a high SNP multiplexing level with a high sample throughput. In the GeneChip 10K and Molecular Inversion Probe systems, for example, genotyping of 10,000 SNPs requires one microarray per DNA sample20,28. Genotyping of 2 million SNPs requires a 'wafer' of as many as 49 arrays for each individual DNA sample5. At the other extreme of the scale of SNP multiplexing and sample throughput is the SNPstream system (Beckman Coulter), which is based on tag microarrays in a glass-bottom, 384-well microtiter plate format and allows multiplex genotyping of 12 SNPs (and possibly 48 SNPs in the near future) in 384 samples on one array47. The GoldenGate assay is a functional compromise between multiplexing level and sample throughput; it allows multiplex genotyping of 1,500 SNPs in 96 samples on a single BeadArray matrix that matches a 96-well microtiter plate27.
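
The trade-off between SNP multiplexing and sample throughput can be expressed as the number of arrays a study would consume. The sketch below uses only the SNP and sample numbers quoted in this section; the class and field names are illustrative, and the study size of 1,000 samples is an arbitrary example.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ArraySystem:
    name: str
    snps_per_sample: int
    samples_per_array_set: int  # samples processed together on one array (set)
    arrays_per_set: int = 1     # arrays needed to cover the full SNP panel

    def arrays_needed(self, n_samples: int) -> int:
        """Arrays consumed to genotype the full panel in n_samples samples."""
        return ceil(n_samples / self.samples_per_array_set) * self.arrays_per_set

    def genotypes_per_array(self) -> float:
        """Genotypes produced per physical array when fully loaded."""
        return self.snps_per_sample * self.samples_per_array_set / self.arrays_per_set


SYSTEMS = [
    ArraySystem("GeneChip 10K", 10_000, 1),
    ArraySystem("Molecular Inversion Probe", 10_000, 1),
    ArraySystem("Perlegen wafer", 2_000_000, 1, arrays_per_set=49),
    ArraySystem("SNPstream", 12, 384),
    ArraySystem("GoldenGate", 1_500, 96),
]

if __name__ == "__main__":
    n_samples = 1_000  # arbitrary example study size
    for s in SYSTEMS:
        print(f"{s.name:<26} {s.arrays_needed(n_samples):>6} arrays, "
              f"{s.genotypes_per_array():>10.0f} genotypes per array")
```

On these numbers, the GoldenGate format yields the most genotypes per array (1,500 × 96 = 144,000), whereas the single-sample formats need one or more arrays for every individual in the study.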

Figure 2: Comparison of SNP multiplexing levels and number of samples analyzed per array in microarray-based SNP genotyping systems.

In each of the systems, multiple arrays can be analyzed in parallel; thus, there is overlap between the methods of choice for a particular application.

The microarray-based SNP-genotyping systems have already found some medical applications despite their limitations in sample throughput and their high reagent costs per sample (Table 2). Genome-wide linkage mapping of genes using microarray systems with fixed SNP panels is a cost-effective and time-saving alternative to linkage mapping with multiallelic markers and allele separation by capillary electrophoresis. Several recent studies have found that SNP panels provide higher data quality, more accurate genotyping results and higher information content, and may also have higher power to detect linkage, than traditionally used panels of microsatellite markers54,55,56. Genome-wide detection of copy-number changes and allelic imbalance due to genomic instability in tumors is another application in which SNP genotyping systems can improve the resolution of microsatellite markers and of comparative genomic hybridization57,58. The number of samples available for analysis is typically lower for linkage studies and detection of copy-number changes in tumors than for association studies of cases and controls. So far, most association studies in large sample sets have used traditional SNP-genotyping assays in formats other than the microarray format, such as the Invader assay15,59 or single-base extension with fluorescence polarization detection60,61. One way to circumvent the limited sample throughput of the microarray systems in association studies is to pool the samples of cases and controls and detect the differences in SNP allele frequencies by quantitative genotyping in the pooled samples. This approach was recently used to screen 7,000 SNPs in candidate genes spanning 17 Mb of DNA for association with high-density lipoprotein–cholesterol levels using long-range PCR and high-density oligonucleotide arrays62.

Table 2 Examples of recent medical applications of microarrays for SNP genotyping

A considerable advantage of the GoldenGate and Molecular Inversion Probe assays is the flexibility of SNP selection. The most important application of these systems so far has been the production of highly accurate genotype data for more than 800,000 SNPs in the International Haplotype Mapping Project4. Because these genotype data are placed in public databases, they can serve as a resource for the development of statistical methods for selecting the most informative tagging SNPs and for population studies on LD and haplotype structure. The formats of these assays are cost-effective compared with previously available SNP-genotyping systems for fine mapping of LD patterns63. They are also useful for fine mapping of genes associated with disease susceptibility in regions previously identified by linkage analysis in affected families, as well as for association studies of candidate genes. Several genotyping centers that have recently acquired high-throughput microarray-based systems have initiated studies to identify positional or functional candidate genes that predispose to complex diseases.

Selection of the most informative SNP panels for genome-wide association studies still awaits the results of the Haplotype Mapping projects. The number of SNPs and the optimal SNP selection for analysis will vary between different populations and genomic regions, but several hundred thousand SNPs will probably be required3,7,53. To increase the possibilities of identifying functional genetic variants, a more feasible approach for genome-wide association studies may be to focus on SNPs in coding or conserved regions of the genome. Because of the recent development of highly multiplexed SNP methods, the goal of genome-wide association studies does not seem as distant as it was when this approach was first proposed almost ten years ago2. But there are still technical issues related to sample throughput and assay cost, as well as issues related to the statistical analysis of the data, that need to be addressed before comprehensive genome-wide association studies can be implemented in practice to identify genetic variants that cause common complex diseases and traits.