Mining Disease Susceptibility Genes through SNP Analyses and Expression Profiling Using MALDI-TOF Mass Spectrometry

Kai Tang, Paul Oeth, Stefan Kammerer, Mikhail F. Denissenko, Jonas Ekblom, Christian Jurinke, Dirk van den Boom, Andreas Braun, and Charles R. Cantor*

Sequenom Inc., 3595 John Hopkins Court, San Diego, California 92121

Received October 1, 2003

To find genes that underlie disease susceptibilities, genome-wide single nucleotide polymorphisms (SNPs) have been analyzed using high-throughput matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS). As a proof-of-concept for this approach, gene regions have been identified that were previously associated by others with certain diseases or traits. On the same technology platform, accurate and absolute transcriptional profiling can be performed and applied to allele expression analysis. Here, we provide a brief review of the technology and its applications to disease gene discovery.

Keywords: single nucleotide polymorphism • gene expression • differential allele expression • pharmacogenomics • MALDI • mass spectrometry

* To whom correspondence should be addressed. E-mail: ccantor@sequenom.com.

Journal of Proteome Research 2004, 3, 218-227

Published on Web 01/30/2004

10.1021/pr034080s CCC: $27.50 © 2004 American Chemical Society

Introduction

Since the first draft of the human genome was published, much attention has turned to proteomics,1-5 which aims to identify and quantify the total protein content (the dynamic product of the genome in cells in different states) and to determine the functions and interactions of each protein with other proteins and genes. At the molecular level, gene function is most closely associated with the biochemical activities of proteins. While proteomics is invaluable for discovering protein variants (including changes in sequence, modification, or expression level) related to disease, in many cases it solves only part of the puzzle. Proteins identified as responsible for certain diseases may not be ideal targets for pharmaceutical development. It remains to be seen whether any drug candidate targeting a mutant protein would be effective in restoring its normal function. Compensation of the damaged function could likely involve targeting transcription and translation processes upstream or downstream.

On the other hand, the new genetics provides a stunning opportunity to move ahead in our understanding of how genetic variability is connected to human health and disease. The sequence of the human genome is providing us with the first holistic view of our genetic heritage. While the sequence is not yet complete, continued refinement of the data brings us ever closer to a complete human genome reference sequence. The human genome houses almost 3 billion base pairs of DNA, and it is commonly assumed that it contains 30 000-40 000 protein-coding genes. The coding regions make up less than 5% of the genome, and it is noteworthy that the function of the remaining DNA is not clear. Although there is a strong knowledge base on how changes in genomic sequence that alter amino acid composition (nonsynonymous changes) may dramatically affect protein function, current understanding is weaker on how genetic variability in noncoding sequences contributes to disease.

The role of genes in a number of inherited monogenic disorders is well understood. For some diseases, one particular gene has such a major effect that mutations in it are said to ‘cause’ the disease. In most cases, however, there is no single major determinant: abundant genetic data suggest that most common diseases are polygenic, i.e., they result from the combined action of alleles of more than one gene (e.g., heart disease, diabetes, and some cancers). Although such disorders are inherited, they depend on the simultaneous presence of several alleles; thus, the hereditary patterns are usually more complex than those of single-gene disorders. Knowing which genes are involved in predisposition for a disease would be crucial for understanding the interactions among the network of proteins responsible for the disease. Once the network of genes is identified, their expression at the mRNA and protein level can be further studied by gene expression profiling and targeted protein analyses. Significant changes in cellular mRNA levels between normal and disease cells provide important indications of disease development, and have been used for diagnosis6 and classification.7 Gene expression at the mRNA level is certainly affected by polymorphisms in regulatory sequences (e.g., promoter regions). These polymorphisms can only be identified by genomic methods.

To discover genetic factors contributing to the etiology and pathophysiology of complex disorders, high-throughput genotyping is a necessity. Single nucleotide polymorphism (SNP) detection technologies are used to scan for new polymorphisms and to determine the allele frequencies of characterized polymorphisms. SNP detection technologies have evolved from labor-intensive, time-consuming, and expensive processes to
some of the most highly automated, efficient, and relatively inexpensive methods in biomedical research. As SNP analysis techniques and SNP marker sets improve, researchers have begun to carry out large-scale genome scans for disease genes, with encouraging first results. We have adopted such a top-down approach for disease gene identification by genome-wide association studies8,9 and transcriptional profiling.10 SNPs are used as markers for genetic association studies.11,12 Significant allele frequency differences between cases and controls are indicative of a locus involved in disease etiology. An evenly distributed high-density SNP map is invaluable to genome-wide association studies and offers a good approach to identify the network of genes involved in disease. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS)13 combined with proper sample preparation is among the most promising techniques for delivering accurate, reliable, and high-throughput analyses of SNPs.14,15 It has also been adapted for absolute and accurate transcription profiling.16 In contrast to the analysis of proteins, for which no amplification method is available, genetic analyses at the DNA and RNA level can use amplification techniques such as PCR. The heterogeneous properties of proteins and peptides present great challenges for analytical methods, but the homogeneity of DNA or RNA samples makes the analyses simpler to execute and easier to adapt to a wide range of applications on the same platform. Here, we present a brief review of the techniques and our unique approaches to genetic analyses using high-throughput MALDI-TOF mass spectrometry.

SNP Discovery. Continuing progress and success in several genome-sequencing efforts form the basis for the discovery of the most abundant type of genetic variation: SNPs.
The identification and characterization of SNPs is one of the first approaches to extract medical and biological value from genome sequencing data and to elucidate genetic inter- and intra-species variations. A sufficiently dense map of highly polymorphic SNPs is a prerequisite for genome-wide association studies aiming to elucidate genotype-phenotype correlations. The exploration of the degree of linkage disequilibrium in the human genome also builds on the availability of a sufficiently dense map of SNPs. To date, over 5.8 million SNPs have been deposited in public databases. This number may prove sufficient for current efforts in genome-wide association studies and also sufficient to build a haplotype map of the human genome. However, once particular genomic regions are identified to be associated or in linkage with a specific phenotypic trait, higher density panels have to be established. In many cases, public databases do not contain sufficient SNP markers for high-density panels, so new SNPs have to be discovered. Once candidate genes have been selected in subsequent studies, the identification of actual disease-causing mutations can be performed. This usually involves re-sequencing the respective genes in multiple informative individuals. MALDI-TOF MS was suggested in the early phase of the human genome project as a separator/detector for Sanger sequencing ladders. The hope was that analysis speed and accuracy could be increased drastically compared to gel-based sequencing. However, the read length obtained in MALDI-TOF MS analysis of Sanger sequencing ladders has so far been insufficient to justify a broader use of the technology in sequencing efforts. Recently, novel approaches for SNP discovery and mutation detection by mass spectrometry have been developed based on the availability of reference sequences for most of the human genome.17-19 To avoid the issues related to ion fragmentation and nearly exponentially decreasing sensitivity of MALDI-TOF MS with increasing analyte length, a concept similar to peptide mapping by mass spectrometry has been adopted. Rather than extending a primer and statistically terminating this primer extension reaction to generate a nested set of products, where the mass difference of termination products represents the nucleotide sequence, these approaches use the principle of base-specific cleavage of amplified target regions. Suppose that the nucleotide sequence of a target region is known and that we have biochemical means of cleaving amplification products derived from this target region in four separate base-specific reactions (representing, in aggregate, cleavage at A, C, G, and T). We can then generate “virtual” mass spectra in an in silico experiment and compare experimentally obtained mass spectra with this in silico model to identify potential sequence changes. Sequence changes present in the analyzed sample have a profound impact on one or multiple base-specific cleavage spectra: a sequence change can remove a cleavage site, in which case two cleavage products merge into a larger fragment; a sequence change can add a cleavage site, in which case an existing cleavage product is further cleaved into two smaller products; a sequence change can also lead to a mass shift, when the sequence change does not involve a cleavage base. Algorithms that automatically analyze base-specific cleavage spectra to identify sequence changes and reconstruct the sample sequence have been developed.20 The concept of base-specific cleavage has been incorporated into a high-throughput process called MassCLEAVE (Sequenom, San Diego, CA).21 In this process, target regions between 300 and 800 base pairs in length are amplified by PCR with primers carrying promoter tags.
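The three effects of a sequence change on a base-specific cleavage pattern (site removed, site added, mass shift) can be illustrated with a toy in silico digest. This is only a minimal sketch under stated assumptions: the sequences are made up, the function names `cleave` and `compare_cleavage` are hypothetical, plain DNA letters stand in for the RNA transcripts used in the actual assay, and fragments are compared by sequence rather than by mass, so it is not the published algorithm.

```python
from collections import Counter


def cleave(seq, base):
    """Cut seq after every occurrence of base and return the fragments in order."""
    fragments, start = [], 0
    for i, nt in enumerate(seq):
        if nt == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing piece that does not end in the cleavage base
        fragments.append(seq[start:])
    return fragments


def compare_cleavage(reference, sample, base):
    """Return (missing, additional) fragments of the sample relative to the reference."""
    ref = Counter(cleave(reference, base))
    smp = Counter(cleave(sample, base))
    missing = sorted((ref - smp).elements())      # predicted from reference, absent in sample
    additional = sorted((smp - ref).elements())   # present in sample, not predicted
    return missing, additional


# Hypothetical sequences, chosen only to show the three cases in a T-specific digest.
reference = "AAGTCCGATGCA"
site_removed = "AAGCCCGATGCA"   # T -> C removes a cleavage site: two fragments merge
site_added = "AAGTCTGATGCA"     # C -> T adds a cleavage site: one fragment splits
mass_shift = "AAGTCCAATGCA"     # G -> A inside a fragment: same cuts, shifted mass

for name, sample in [("site removed", site_removed),
                     ("site added", site_added),
                     ("mass shift", mass_shift)]:
    print(name, compare_cleavage(reference, sample, "T"))
```

In a real spectrum only masses are observed, so fragments with identical base composition coincide; repeating the comparison for all four base-specific reactions is what provides the redundancy described in the text.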
In a subsequent step, the PCR product is transcribed into a single-stranded RNA molecule using T7 or SP6 RNA polymerase. The transcript is treated with a base-specific RNase such as RNase T1 or RNase A, and the cleavage is driven to completion. The cleavage products are conditioned for mass spectrometric analysis by addition of ion-exchange resin and are dispensed onto a chip array, which can be automatically scanned in a MALDI-TOF mass spectrometer. After acquisition, the four base-specific cleavage spectra are used to validate the sample sequence and identify potential sequence changes. Although there are several ways to obtain base-specific cleavages, the use of an intermediate RNA transcription step has several advantages. First, it generates a single-stranded molecule and thus avoids issues related to overlapping cleavage patterns from forward and reverse strands, which otherwise would require elaborate strand separation methods. Second, the RNA transcription process further amplifies the analyte available for mass spectrometry. This allows simple ion-exchange treatment and dilution to be used as a conditioning method. Third, RNA is less prone to depurination and thus is more stable in standard UV-MALDI. The MassCLEAVE assay is a homogeneous process, where reagents for subsequent steps are simply added to the reaction vial. This simplifies high-throughput processing and allows matching the sample processing with the high-throughput capabilities of MALDI-TOF MS. Figure 1 illustrates how sequence variations such as single base substitutions, insertions and deletions can be identified with base-specific cleavage. Additional signals usually indicate the presence of a sequence change. Compositional analysis based on the molecular mass


Figure 1. A 300-700 bp sequence stretch of interest is amplified by PCR. Two PCR reactions are performed for each sample. One reaction introduces a T7 promoter tag in the forward strand of the amplicon and the other reaction introduces the T7 promoter tag in the reverse strand of the amplicon. PCR amplifications are followed by shrimp alkaline phosphatase (SAP) treatment to deactivate dNTPs and by in vitro transcription, where each PCR product is split into two separate transcription reactions. One transcription reaction uses dC instead of rC and the other uses dT instead of rU. The four different transcription products are then digested with RNase A, which cleaves after each rC and rU. Introduction of dC or dT during transcription mediates base-specific cleavage in each of the four reactions during the subsequent RNase A treatment. A part of the double-stranded DNA sequence with a G/A polymorphism is illustrated as an example of the four base-specific cleavage reactions. The reference G-allele is shown in green, and the polymorphic A-allele is shown in red in the spectra. The fragment peaks containing the polymorphism are labeled with their sequences. (a) T-specific forward reaction. dC is used in the transcription reaction of the forward strand and RNase A cleaves after each rU. The G/A polymorphism is indicated by the peak shift of 16 Da. (b) C-specific forward reaction. dT is used in the transcription reaction of the forward strand and RNase A cleaves after each rC. The G/A polymorphism is indicated by the peak shift of 16 Da. (c) T-specific reverse reaction. dC is used in the transcription reaction of the reverse strand and RNase A cleaves after each rU. The C/T polymorphism introduces an extra cleavage site and is revealed by the disappearance of the longer fragment dCdCdCdCGAGAAGU and the detection of two extra short fragments GAGAAGU and dCdCdCU (not shown). (d) C-specific reverse reaction. dT is used in the transcription reaction of the reverse strand and RNase A cleaves after each rC. The C/T polymorphism removes a cleavage site and is revealed by the additional peak of dTGAGAAGdTAGC. The signal at 3326 Da in the reference sequence spectrum is created by two fragments of the same nucleotide composition. Only one of these fragments is affected by the SNP. The absence of the peak at 3310 Da in the sample spectrum originates from another SNP at a different location in the sequence.

information enables additional signals to be matched with potential sequence changes. A heterozygous sequence change can lead to up to five additional signals in the aggregate of four cleavage spectra. This allows unambiguous characterization of the type of sequence change and its location. Homozygous sequence changes are even easier to analyze. In these cases, mass signals predicted from the reference sequence are missing, and the combination of missing and additional signals can be used for characterization and localization of sequence changes. A homozygous sequence change or a general “error” in the reference sequence generates up to 10 mass signal changes (also called observations). The combination of base-specific cleavage and MALDI-TOF MS provides a powerful tool for large-scale comparative sequence analysis. With current acquisition speeds as fast as 1.5 s for a cumulative spectrum (20 shots), a single mass spectrometer can scan more than 2.5 million bases a day. At the same time, the identification of sequence changes is not
based on only a single observation (such as two fluorescent colors at one migration time in a capillary), but on multiple, redundant observations of mass signal changes, and should therefore provide a higher accuracy in identification of sequence changes or mutations.

Allele Frequency Determination. To validate a SNP, which by definition occurs in at least 1% of the population, a large number of individual samples have to be analyzed, which is time-consuming and cost-prohibitive. Alternatively, a pooled mixture from a large number of individuals can be used for analysis, and the allele frequency can be obtained by calculating the ratio of the two alleles. To allow this, the quantitative nature of the SNP detection technique has to be validated. The method of choice for SNP validation is a primer extension reaction followed by MALDI-TOF MS detection.22 The primer extension assay (MassEXTEND, SEQUENOM) is designed to yield only two extension products with different lengths, each representing one allele. A schematic representation


Figure 2. MassEXTEND assay for SNP validation. After PCR amplification, a primer adjacent to the SNP site is extended in the presence of deoxynucleoside triphosphates (dNTPs) and dideoxynucleoside triphosphates (ddNTPs). Whenever one kind of ddNTP is present in the mixture, the same kind of dNTP is absent. The primer extension comes to a full stop when a ddNTP is incorporated, thereby yielding one allele-specific extension product. In the example shown, a T/C polymorphism yields a 24-mer extension product for the T-allele and a 25-mer for the C-allele with the selected dNTP and ddNTP mixture. Both extension products are present in approximately equal amounts in a heterozygous sample.

of the assay design is shown in Figure 2. The allele-specific extension products are detected by MALDI-TOF MS for SNP genotyping. The method has several advantages. There is no need to label the primers with fluorescent dyes. The enzymatic primer extension reaction adds specificity. Only one intrinsic property of the extension products, their molecular weight, is used for detection. The mass difference of one nucleotide (∼300 Da) between allele-specific extension products makes them easy to analyze in low-resolution linear TOF mass spectrometers without compromising accuracy. The assay design also facilitates relative quantification of the extension products for allele frequency determination in pooled DNA samples. It has been established that the ratio of analyte amounts is proportional to the ratio of peak areas observed in the MALDI-TOF mass spectrum for any given set of protein or DNA analytes.23-25 Figure 3 illustrates the quantitative ability of MALDI-TOF MS with nucleic acid molecules. In this experiment, two oligonucleotides differing by one nucleotide (16-mer vs 17-mer) were titrated over a range of concentrations so as to create ratios from 0 to 100% (0.0-1.0) relative to each other. The line graph depicts the theoretical frequencies of the two analytes in black with a coefficient of determination (R²) of 1. Observed analyte frequencies are plotted in red and resulted in an R² of 0.9986. The average variation over twenty-one data points was slightly more than 1% (0.011) of the expected frequency. The close correlation between observed and expected frequencies for two analytes on the MALDI-TOF MS demonstrates its quantitative ability. This experiment was


Figure 3. Expected vs observed frequencies for two oligonucleotides differing by a single nucleotide (16-mer: 5′-CCATCCACTACAACTA-3′ vs 17-mer: 5′-CCATCCACTACAACTAC-3′) mixed at known ratios. Mixed ratios were titrated from 0 (0% allele frequency) to 1.0 (100% allele frequency) for the two oligos. Expected ratio values are shown in black with a coefficient of determination (R²) of 1 (perfect fit). The observed oligo frequencies for the low mass (16-mer) are shown in red with standard deviations and an R² of 0.9986, indicating very strong correlation to the expected frequencies. Frequencies were calculated using TYPER RT v3.0.1 software (SEQUENOM). The calculated oligo frequency with standard deviation for each assay represents the average of four replicate mixtures, each dispensed in four replicates onto silicon chip arrays loaded with matrix (SpectroCHIP, SEQUENOM).

designed to mimic the primer extension products now routinely used to genotype SNPs.14 These measurements can be described as semiquantitative since the ratio of two analytes of unknown concentration is measured, not the amount of each product alone. Semiquantitative determination of allele frequencies is achieved by calculating the areas of the peaks associated with specific primer extension reactions. Since SNPs are generally biallelic, the allele frequency is calculated as the ratio of the area of each allele to the total summed area of alleles. The sum of the frequencies of the two alleles is always 1.0 for any given population of molecules. In a MALDI-TOF mass spectrum, identified peaks at specified masses (corresponding to expected alleles) are given a Gaussian fit and the area under the peak is integrated. This approach to quantification is useful for the analysis of allele distributions in nucleic acid mixtures such as pooled populations.26,27 Ross et al. were the first to describe the quantitative ability of MALDI-TOF MS in conjunction with primer extension reactions using mixtures of homo- and heterozygous individuals with known allele ratios.28 The authors showed that allele frequencies in complex mixtures of DNA had a limit of detection (LOD) of 2% and a limit of quantitation (LOQ) of 5-10% for minor allele frequencies using MALDI-TOF MS. These findings and those by our group conducting similar experiments at about the same time are used as defining parameters for large-scale association studies and genome-wide scans as described later in this review. Figure 4a shows a scatter plot of estimated allele frequencies vs genotyped allele frequencies for 24 primer-extension assays distributed randomly throughout the human genome. Ninety-six individual genomic DNAs were first genotyped for all assays to establish the exact allele frequencies in this sample. The DNAs were then pooled at an equimolar ratio to test the


Figure 4. Allele frequency analysis. a. Scatter plot of genotyped population allele frequencies (x-axis) vs allele frequencies calculated using pooled population DNAs (y-axis). Twenty-four unique assays are depicted. The DNA population consisted of 96 individual DNAs at equimolar concentrations (260 pg per individual DNA/µL = 25 ng/µL). Frequencies were calculated using TYPER RT software (SEQUENOM). The calculated allele frequency with standard deviation for each assay represents the average of four replicate reactions, each dispensed in replicates of four onto silicon chip arrays loaded with matrix (SpectroCHIP, SEQUENOM). For genotyped frequencies, each of the 96 individual DNAs was genotyped for each of the 24 assays using the MassARRAY system (SEQUENOM). Best-fit line and coefficient of determination (R²) were calculated using Excel 2000 (Microsoft). b. Scatter plot of genotyped population allele frequencies (x-axis) vs allele frequencies calculated using pooled population DNAs (y-axis) as described in Figure 4a. The pooled allele frequencies have now been corrected for each of the 24 assays using the technique described in the text. Note the improvement in the coefficient of determination (R²) after correction with individual heterozygote allele ratios relative to Figure 4a.

accuracy of estimating allele frequencies relative to genotyped frequencies using a homogeneous primer extension assay developed for use with MALDI-TOF MS.29 Each assay was conducted in quadruplicate and standard deviations of 2% or less were achieved for the majority of assays as shown. The coefficient of determination (R²) in Figure 4a is 0.953, indicating very good, but not perfect, correlation between the allele frequencies calculated from the pooled samples and the genotyped samples. The differences may arise from several factors, such as uneven amplification of alleles during PCR and primer extension reactions, that occur for all technologies relying on these processes.30,31 However, the
summed effect of all inaccuracies can be measured using individual heterozygous samples for each assay under investigation. Heterozygotes have an allele ratio of 1:1 (one allele on each chromosome) and should therefore exhibit a 1:1 ratio of allele frequencies (0.50:0.50). Any deviation from this expected ratio can be quantified. The average frequency for each allele from multiple heterozygotes for a particular assay therefore represents the summed effect of any inaccuracies for that assay. The calculated deviation can then be applied to the pooled result as a correction factor.32 Figure 4b shows the 24 primer extension assays from Figure 4a with the corrected allele frequencies for the pooled DNA results. The coefficient of determination (R²) improves to 0.975 as a result of this correction. Multiple studies have compared the quantitative abilities of various platforms used for estimating allele frequencies in populations of nucleic acid molecules.31,33,34 MALDI-TOF MS-based measurements of primer extension reactions have been shown to be at least as accurate, sensitive and reproducible as any other available technology according to these studies. The analysis of allele frequency distributions is a tool to study the abundance of particular alleles in sample sets and to compare these between different collections. The data generated can be used to identify causative genetic loci associated with complex diseases via linkage disequilibrium.35 SNPs have gained acceptance as a tool for conducting such studies because of their widespread distribution throughout genomes and their general ease of measurement.36

Whole Genome Scans and Discovery. The pooling approach makes the execution of genome-wide association studies feasible by reducing the time required and by lowering the costs dramatically.31 To conduct these studies successfully, it is also essential to have a sufficient number of functional SNP assays and well-characterized DNA sample collections.
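Because pooled genome scans rest on the peak-area frequency estimate and the heterozygote-based correction described under Allele Frequency Determination, a small worked sketch may help. The text does not give the exact correction formula, so the form used here (scaling one allele's peak area by the mean heterozygote peak-area ratio) is an assumption, and the function names and peak areas are hypothetical.

```python
def allele_frequency(area_a, area_b):
    """Frequency of allele A as its peak area divided by the summed area of both alleles."""
    total = area_a + area_b
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_a / total


def heterozygote_skew(het_areas):
    """Mean A/B peak-area ratio over heterozygous individuals (1.0 means no assay bias)."""
    ratios = [a / b for a, b in het_areas]
    return sum(ratios) / len(ratios)


def corrected_frequency(area_a, area_b, skew):
    """Assumed correction: rescale allele B's area so a heterozygote reads exactly 0.5."""
    return area_a / (area_a + skew * area_b)


# Hypothetical peak areas for one assay.
heterozygotes = [(1200.0, 1000.0), (600.0, 500.0)]  # allele A is over-represented
pool = (300.0, 700.0)                               # pooled DNA sample

k = heterozygote_skew(heterozygotes)
print("raw pooled frequency:", allele_frequency(*pool))
print("corrected pooled frequency:", corrected_frequency(*pool, k))
```

With these made-up numbers the heterozygotes show a 1.2:1 bias toward allele A, so the corrected pooled estimate for allele A is lower than the raw one. The same frequency estimate, applied to case and control pools, yields the allele frequency differences used as the first filter of a genome scan.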
Testing 226 000 putative gene-based SNPs from the public domain in a DNA pool of 94 unrelated individuals of European ancestry,37 we identified 130 000 confirmed polymorphic assays, 105 000 of which were uniquely mapped to the current human genome assembly. Evenly spaced sets of 25 000 to 85 000 of these SNPs were then applied in genome-wide association studies for a variety of clinical phenotypes and quantitative traits. We conduct a genome scan as a multistep process, with filters at each step (Figure 5). Case and control pools are generated using equimolar amounts of typically more than 250 DNA samples. Twenty-five nanograms of those DNA sample pools are then subjected to an initial genome scan using at least 28 000 SNPs. Allele frequency differences between the pools are determined, and, subsequently, statistically significant markers (p < 0.05) are run in triplicate.38,39 SNP assays that show reproducible significance are individually genotyped to confirm the association between genotype and phenotype. To date, we have executed 12 genome scans for various clinical phenotypes and quantitative traits, including several types of cancer (breast, prostate, skin, lung), metabolic phenotypes (type 2 diabetes, obesity, HDL-cholesterol levels), musculoskeletal phenotypes (osteoarthritis, bone mineral density), hypertension, and schizophrenia. We identified multiple candidate disease genes to be analyzed in follow-up programs. Among those genes are many proof-of-concept genes or gene regions that were previously discovered by other research efforts and described as being associated with the respective diseases. Besides genes in well-known linkage regions and regions of loss-of-heterozygosity


Figure 5. Genome scan process: from case/control pools to genotyped candidate disease genes.

(LOH), we identified a number of genes previously associated by others with certain diseases or traits. Examples include the peroxisome proliferative activated receptor gamma, PPARγ, for which we reproduced the association of the P12 variant with susceptibility for type 2 diabetes,40 and the protein phosphatase 1 regulatory subunit 3A, PPP1R3, which has been described to be implicated in insulin resistance.41 In the melanoma scan we identified the B-RAF kinase gene, for which somatic mutations have been identified in 66% of patients with malignant melanoma.42 The DLC-1 (deleted in liver cancer 1) gene, a tumor suppressor gene previously described as a candidate for sporadic breast cancer as well as several other cancer types, was identified as highly significant in our genome scan for breast cancer susceptibility genes.43 We also discovered aggrecan 1, AGC1, to be associated with osteoarthritis as postulated earlier.44 And in the genome scan using groups of extreme HDL-cholesterol (HDL-C) levels, we re-discovered the cholesteryl ester transfer protein, CETP,45 and the lipoprotein lipase, LPL,46 both commonly recognized to be involved in the regulation of HDL-C levels. All candidate disease genes identified in the initial genetics discovery process are subjected to a follow-up program that includes additional genetics, phenotypic analysis, and biological evaluation. Additional genetics work is necessary to identify and eliminate false positive associations that are likely due to potential sample stratification and sampling errors. The most powerful method to achieve this is replication of the observation in one or more independent samples. For example, we were able to replicate findings for several genes originally identified in a German sample associated with type 2 diabetes, including PPARγ, using case and control samples from Denmark and Newfoundland. Another genetics tool we applied is the analysis of a dense set of SNPs surrounding the original finding. 
Fine-mapping the genomic region not only provides additional confidence in a finding; it also helps narrow down the gene and/or domain underlying the phenotype. This approach enabled us, for example, to identify the FOXA2 gene, previously described as a gene potentially involved in the etiology of different types of diabetes,47,48 as a candidate for type 2 diabetes. In this case, the originally significant SNP was located 20 kb upstream of the gene and the analysis of proximal SNPs pointed us to the FOXA2 gene.

On the basis of solid genetic results, we then perform secondary phenotype analysis and in silico biology of the identified candidate genes. The resulting gene-to-disease model determines biological target validation experiments and subsequently defines candidates for downstream development programs in therapeutics and diagnostics.

Gene Expression Analysis. Transcriptional profiling is another secondary analysis to be carried out for a network of genes associated with a certain disease. With some minor modifications, the same MALDI-TOF MS based platform used for allele frequency analysis can be used for gene expression profiling.16 First, mRNA is reverse transcribed to cDNA. Then an oligonucleotide standard of 60-90 bp is introduced as a competitor for PCR amplification of a part of the cDNA (Figure 6). The standard is designed to have a sequence identical to part of the cDNA except for one artificial point mutation. A known quantity of the competitor is mixed with the cDNA and co-amplified by PCR. The two amplicons with only one base pair difference can be treated as two different “alleles” and processed for allele frequency determination as described above. Since the ratio of cDNA to standard is faithfully preserved during PCR amplification, primer extension and MALDI-TOF MS, the starting amount of cDNA before PCR can be obtained from the peak ratio of the two “alleles” in the spectrum and the known amount of the original competitor.

Figure 6. Procedure of gene expression analysis.

When the range of the cDNA concentration is not known, a multi-logarithmic titration of the competitor oligo is conducted (e.g., from 1 × 10^-9 M to 1 × 10^-18 M) in order to obtain a rough estimate of the cDNA concentration (Figure 7a). A serial dilution of the competitor oligo concentration within 1 order of magnitude can then be used to pinpoint the exact cDNA concentration, if desired (Figure 7b).

Because of the great flexibility allowed for designing the competitor oligo, we can create a G/C “polymorphism” (i.e., at a G “allele” in cDNA, a C “allele” is used in the competitor oligo, or vice versa) and use four dideoxynucleoside triphosphates for termination of the primer extension reaction. Such an assay design results in two peaks that differ by 40 Da in the mass spectrum instead of the one-nucleotide differences observed in allelotyping assays. The 40-Da mass difference is large enough to completely separate two peaks in the 4-8 kDa range by a linear time-of-flight spectrometer, but it is also small enough to minimize the difference in detection efficiency for two analytes of different masses. Compared to traditional competitive PCR methods for gene expression,49 in which PCR products from the gene of interest and the standard have to differ significantly in size to be separated by gel electrophoresis, our choice of a point mutation minimizes the potential difference in PCR amplification efficiencies and can be easily automated for high-throughput analyses.
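The competitive quantification and the titration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MassARRAY software: the function names and the example numbers are hypothetical. The second function also assumes that, within about one order of magnitude, the cDNA allele frequency is roughly linear in the logarithm of the competitor concentration, as the Figure 7b caption describes.

```python
import math


def cdna_amount(cdna_peak_area, competitor_peak_area, competitor_amount):
    """Single-point estimate of the starting cDNA amount.

    Assumes the cDNA:competitor ratio is preserved through PCR, primer
    extension and MS, so cDNA = competitor * (cDNA peak / competitor peak).
    """
    if competitor_peak_area <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_amount * cdna_peak_area / competitor_peak_area


def equivalence_point(competitor_concs, cdna_fractions):
    """Competitor concentration at which the cDNA fraction equals 0.50.

    Fits fraction = intercept + slope * log10(concentration) by ordinary
    least squares, then solves for the concentration where fraction = 0.50.
    """
    if len(competitor_concs) != len(cdna_fractions) or len(competitor_concs) < 2:
        raise ValueError("need at least two paired titration points")
    xs = [math.log10(c) for c in competitor_concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cdna_fractions) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cdna_fractions))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration points do not define a sloped line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** ((0.5 - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical five-point titration within one order of magnitude.
    concs = [1e-12, 2e-12, 4e-12, 6e-12, 1e-11]  # competitor, mol/L
    fractions = [0.78, 0.63, 0.47, 0.38, 0.25]   # cDNA allele frequency
    print(equivalence_point(concs, fractions))
```

The single-point function is adequate when the competitor amount is already close to the cDNA amount; the fitted crossing point uses all titration points and is therefore less sensitive to noise in any one spectrum.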
Although existing hybridization-based DNA array methods can monitor the expression of a large number of genes at the same time,50 only changes of more than 2-fold can be detected with confidence because of high background noise. In contrast, our method based on competitive PCR, primer extension and mass spectrometry is much more accurate, with a typical coefficient of variation (CV) of less than 3% for the same cDNA.16 With moderate multiplexing, this method can be applied to monitor the expression of a network of genes at the same time with very high accuracy, which matters because small changes in gene expression can be biologically important. The method is also capable of analyzing absolute gene expression with extreme sensitivity. As few as five copies of cDNA can be quantified16 and detection down to a single copy is possible.51 Methods based on cDNA microarrays and MALDI-TOF MS are complementary. The former is well suited for expression profiling of thousands of genes in a single sample and the latter is best for detailed monitoring of the expression of 10-100 genes in a large number of samples.

Figure 7. (a) Competitor titration for gene expression analysis. Logarithmic plot of titration of competitive template (internal standard) to determine relative gene expression levels for three genes: chemokine (C-X-C motif) receptor 4 (CXCR4), glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and hydroxymethylbilane synthase (HMBS). Input cDNA amount was constant for all titration points. The point at which each titration curve crosses y = 0.50 represents the concentration at which a 1:1 ratio of cDNA to internal standard alleles is observed and therefore represents the concentration of cDNA in that sample. Note that each gene shows an increase in cDNA frequency as the internal standard (competitor) concentration is decreased logarithmically until only cDNA template products are measured on the MALDI-TOF MS. (b) Single log titration covered by five different concentrations of competitor (internal standard) molecule for the measurement of GAPDH expression levels using competitive PCR and primer extension reactions coupled with MALDI-TOF MS (MassARRAY, SEQUENOM). The sample is a mix of thirty-one different cDNAs representing major tissue/organ systems in humans. As described in the text, the ratios of peak areas from the cDNA allele vs the competitor allele are used to calculate the frequency of each respective allele. Here, the cDNA allele frequency is graphed on the y-axis and the competitor molecule concentration on the x-axis. The data fit a logarithmic relationship with a very good coefficient of determination (R^2) as shown. Substituting 0.50 for y and solving for x in the line equation shown determines the cDNA concentration in the sample. This is the point at which the cDNA and the competitor alleles are at a 1:1 ratio.

Differential Allele Expression Analysis. Our assay design also allows us to simultaneously quantify the expression of two different alleles in a heterozygous individual. The competitor oligo can be designed to contain a third allele not present in the SNP. However, in most such cases, only relative quantification is necessary and the competitor oligo can be omitted. Then the method is almost identical to that used for allele frequency determination in pooled samples, except that the sample is mRNA and a reverse transcription step is needed to make cDNA before PCR amplification of the two alleles. Unraveling differential allele expression is especially important for identifying the contributing genetic effects in population studies.

Differential expression of alleles has mostly been studied with regard to genomic imprinting and X-chromosome inactivation. Genomic imprinting is an epigenetic form of gene regulation that determines the parent-dependent expression of genes marked or imprinted during gametogenesis and embryonic development. Imprinting involves differential DNA methylation of alleles in one sex cell lineage but not in the other. Monoallelic expression of an imprinted gene may result from antisense transcript competition52 or from the involvement of various trans-acting factors.53 Other mechanisms of differential allelic expression not related to genomic imprinting are poorly understood. One may speculate that expression of allelic variants of any given gene may differ and may be inherited; however, this has remained largely unexplored, partly due to the lack of appropriate methods of analysis. Recently, Yan et al.54 demonstrated Mendelian inheritance of allelic expression in humans. This observation is important for understanding the basis of human individual variation, and it may be valuable for deciphering the genetic basis of some common diseases.

Individuals heterozygous for a specific SNP in an exon of a gene of interest are appropriate for studies of allele-specific expression. The relative frequency of alleles in cDNA prepared from such individuals reflects true differential allele expression, whereas the frequency of each allele in genomic nuclear DNA should be close to 50% and nearly identical in all samples and for all SNPs. These marker SNPs serve as a useful discrimination tool for identification and measurement of individual alleles.

To measure allelic expression, Yan et al.54 used primer extension with fluorescent dideoxy terminators and analyzed extension products by capillary gel electrophoresis. In our work, we have employed chip-based MALDI-TOF MS for separation and detection of allele-specific analytes. Mass spectrometry offers several advantages over techniques based on fluorescent dyes. It provides a direct label-free measurement of extended primers and does not involve gel separation.
In combination with miniaturized MALDI chips,55 this method is a fully automated procedure and permits extensive use of robotics.26,37 Applying an allele frequency analysis procedure and software to individual cDNA samples should provide a straightforward way to measure allelic variation in gene expression.

The phenomenon described by Yan et al.54 offers an excellent model system to validate the MS-based approach for differential allele expression. A series of experiments was performed to replicate the results of these authors for the TP73 tumor protein gene, a gene exhibiting high variability of allelic expression. Four individual human CEPH lymphoblast cell lines were cultured, and genomic DNA, total RNA and cDNA were prepared by standard procedures. A side-by-side investigation of the polymorphism (629T/C, rs1801174) located in the coding sequence of TP73 was performed using nuclear DNA and cDNA samples. After cDNA synthesis, the experimental procedure was similar to the allele frequency assessment in pooled samples by MassARRAY.26,37 Data analysis included the calculation of peak area ratios. MALDI-TOF MS derived allelic frequency values were obtained based on the relative peak areas of the two alleles. Using these values, we calculated ratios of the extended C allele to the extended T allele, i.e., the allele ratios for the 629C/T polymorphism of TP73. As expected, when heterozygous genomic DNA was tested, the allele ratio for all four samples was approximately 1. In contrast, the values for cDNA exhibited substantial variation, indicating true differences in allele expression. The allele ratio estimate from nuclear genomic DNA can be used to normalize measurements derived from cDNA. Normalized values for both genomic DNA and cDNA are shown in Table 1.
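The ratio calculation and normalization just described might look like the following Python sketch. It assumes that normalization means dividing the mean cDNA C:T ratio by the mean genomic DNA C:T ratio of the same individual; the text does not spell out the exact arithmetic, so this is one plausible reading, and the function names and peak areas are hypothetical.

```python
from statistics import mean


def allele_ratio(c_peak_area, t_peak_area):
    """C:T ratio from the peak areas of the two extended primers."""
    if t_peak_area <= 0:
        raise ValueError("T allele peak area must be positive")
    return c_peak_area / t_peak_area


def normalized_cdna_ratio(cdna_peaks, gdna_peaks):
    """Mean cDNA C:T ratio divided by the mean genomic DNA C:T ratio.

    Each argument is a list of (C peak area, T peak area) pairs, one pair
    per replicate measurement. Dividing by the genomic ratio is an assumed
    normalization: heterozygous genomic DNA should give a ratio near 1, so
    any deviation is treated as assay bias and removed from the cDNA value.
    """
    cdna = mean([allele_ratio(c, t) for c, t in cdna_peaks])
    gdna = mean([allele_ratio(c, t) for c, t in gdna_peaks])
    return cdna / gdna


if __name__ == "__main__":
    # Hypothetical replicate peak areas for one heterozygous cell line.
    cdna_replicates = [(1450.0, 520.0), (1390.0, 505.0), (1510.0, 540.0)]
    gdna_replicates = [(980.0, 1000.0), (1010.0, 995.0), (990.0, 1005.0)]
    print(normalized_cdna_ratio(cdna_replicates, gdna_replicates))
```

A normalized value above 1 indicates preferential expression of the C allele and a value below 1 preferential expression of the T allele.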
Our ratio estimates of preferential expression of TP73 alleles in these four cell lines show strikingly good correlation to those obtained by Yan et al.54 Our ratios ranged from 0.28 to 2.88. This remarkable correlation in all four experiments validates the use of MALDI-TOF MS in large-scale differential allele expression studies.

Table 1. Differential Expression of TP73 Alleles in Four Individual cDNA Samples vs Allelic Frequency Estimate in the Respective Genomic DNA Samples^a

              using MS method             reference (Yan et al.)
cell line     genomic DNA     cDNA        cDNA
GM10864       0.98            2.88        2.5
GM10834A      1.01            0.28        0.4
GM12699       1.01            0.64        0.5
GM12616       1.00            2.05        2.0

a Normalized C:T allele ratios are shown for both genomic DNA and cDNA. Ratios from Yan et al.54 (also normalized) are indicated for comparison. DNA and cDNA from CEPH lymphoblast cell lines were measured side-by-side using the allelic frequency analysis procedure. PCR primers for DNA and cDNA were designed using genomic DNA and mRNA sequences, respectively. Four PCR reactions per sample were run, each dispensed on a MALDI-TOF SpectroCHIP (SEQUENOM, San Diego, CA) four times for a total of sixteen measured data points per sample, and analyzed using the MassARRAY system (SEQUENOM).

Here, we describe a method for allele-specific expression based on the selection of allele-discriminating coding SNPs in genes of interest. The only requirement is the availability of a working primer extension assay. The location of the SNP is determined by the design of the experiment. The closer the SNP frequency is to 50%, the more heterozygous samples will be found for analysis. The inclusion of more than one SNP may increase the power of analysis. However, for the best comparison, these polymorphisms should belong to a single haplogroup and have similar frequencies. The selection of primers for PCR is straightforward and largely aimed at obtaining a robust and reproducible amplification. In addition, RT-PCR primers should be designed so that they do not amplify genomic DNA, which might be a contaminant of some cDNA preparations.

Mass spectrometry is ideally suited for high-throughput studies because of data quality, reproducibility, and the quantitative nature of MALDI with regard to similar DNA analytes in the same mass spectrum.26,27,33,38 Differential analysis of allelic expression in appropriate tissue cDNA samples can be used to validate genetic association findings and may result in a better understanding of the relationship between genetic variation and disease etiology. The precise cellular mechanisms that control differential allele expression require in-depth investigation. This fact notwithstanding, introduction of the current quantitative approach offers an additional powerful tool for research in post-genomics.
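The remark above that SNPs with a frequency near 50% yield the most heterozygous samples can be made concrete with a short Python sketch. It assumes Hardy-Weinberg proportions, which the text does not state explicitly, so the numbers are a planning estimate only; the function names are hypothetical.

```python
import math


def expected_heterozygote_fraction(allele_freq):
    """Expected heterozygote fraction 2pq, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq)


def samples_to_screen(allele_freq, heterozygotes_needed):
    """Rough number of individuals to genotype to expect the requested
    number of heterozygotes (an expectation, not a guarantee)."""
    het = expected_heterozygote_fraction(allele_freq)
    if het == 0.0:
        raise ValueError("a monomorphic site yields no heterozygotes")
    return math.ceil(heterozygotes_needed / het)


if __name__ == "__main__":
    # Compare a common and a rare SNP for a target of 10 heterozygotes.
    for freq in (0.5, 0.25, 0.05):
        print(freq, expected_heterozygote_fraction(freq), samples_to_screen(freq, 10))
```

Under this assumption the heterozygote fraction peaks at an allele frequency of 0.5 and falls off quickly for rare alleles, which is why common coding SNPs are the practical choice for allele-specific expression studies.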

Conclusions

We have developed a MALDI-TOF MS based tool for high-throughput SNP discovery, allele frequency determination and gene expression profiling. Results from such genetic analysis can be used for genome-wide association studies to find genes associated with disease susceptibility and for the elucidation of disease mechanisms. The genetic approaches allow us to stratify follow-up studies to link genotype and phenotype. Combined with downstream proteomic analyses, this will provide a complete picture of the interaction of genes involved in disease and their response to environmental perturbations such as drug treatment. Identification of disease genes will improve our ability not only to identify the genetic risk factors for a wide variety of conditions but also, ultimately, to develop new pharmacotherapy.

References

(1) Pandey, A.; Mann, M. Proteomics to study genes and genomes. Nature 2000, 405, 837-846.
(2) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422, 198-207.
(3) Sali, A.; Glaeser, R.; Earnest, T.; Baumeister, W. From words to literature in structural proteomics. Nature 2003, 422, 216-225.
(4) Hanash, S. Disease proteomics. Nature 2003, 422, 226-232.
(5) Godovac-Zimmermann, J.; Brown, L. R. Perspectives for mass spectrometry and functional proteomics. Mass Spectrom Rev. 2001, 20, 1-57.
(6) Bittner, M.; Meltzer, P.; Chen, Y.; Jiang, Y.; Seftor, E.; Hendrix, M.; Radmacher, M.; Simon, R.; Yakhini, Z.; Ben-Dor, A.; Sampas, N.; Dougherty, E.; Wang, E.; Marincola, F.; Gooden, C.; Lueders, J.; Glatfelter, A.; Pollock, P.; Carpten, J.; Gillanders, E.; Leja, D.; Dietrich, K.; Beaudry, C.; Berens, M.; Alberts, D.; Sondak, V. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000, 406, 536-540.
(7) Golub, T. R.; Slonim, D. K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J. P.; Coller, H.; Loh, M. L.; Downing, J. R.; Caligiuri, M. A.; Bloomfield, C. D.; Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286, 531-537.
(8) Risch, N.; Merikangas, K. The future of genetic studies of complex human diseases. Science 1996, 273, 1516-1517.
(9) Horikawa, Y.; Oda, N.; Cox, N. J.; Li, X.; Orho-Melander, M.; Hara, M.; Hinokio, Y.; Lindner, T. H.; Mashima, H.; Schwarz, P. E.; del Bosque-Plata, L.; Oda, Y.; Yoshiuchi, I.; Colilla, S.; Polonsky, K. S.; Wei, S.; Concannon, P.; Iwasaki, N.; Schulze, J.; Baier, L. J.; Bogardus, C.; Groop, L.; Boerwinkle, E.; Hanis, C. L.; Bell, G. I. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat. Genet. 2000, 26, 163-175.
(10) Roth, C. M. Quantifying gene expression. Curr. Issues Mol. Biol. 2002, 4, 93-100.
(11) Holden, A. L. The SNP consortium: summary of a private consortium effort to develop an applied map of the human genome. Biotechniques 2002, Suppl, 22-24, 26.
(12) Judson, R.; Salisbury, B.; Schneider, J.; Windemuth, A.; Stephens, J. C. How many SNPs does a genome-wide haplotype map require? Pharmacogenomics 2002, 3, 379-391.
(13) Karas, M.; Hillenkamp, F. Laser desorption ionization of proteins with molecular masses exceeding 10 000 daltons. Anal. Chem. 1988, 60, 2299-2301.
(14) Jurinke, C.; van den Boom, D.; Cantor, C. R.; Koster, H. Automated genotyping using the DNA MassArray technology. Methods Mol. Biol. 2002, 187, 179-192.
(15) Tang, K.; Opalsky, D.; Abel, K.; van den Boom, D.; Yip, P.; Del Mistro, G.; Braun, A.; Cantor, C. R. Single nucleotide polymorphism analyses by MALDI-TOF MS. Int. J. Mass Spectrom. 2003, 226, 37-54.
(16) Ding, C.; Cantor, C. R. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc. Natl. Acad. Sci. USA 2003, 100, 3059-3064.
(17) Rodi, C. P.; Darnhofer-Patel, B.; Stanssens, P.; Zabeau, M.; van den Boom, D. A strategy for the rapid discovery of disease markers using the MassARRAY system. Biotechniques 2002, Suppl, 62-66, 68-69.
(18) von Wintzingerode, F.; Bocker, S.; Schlotelburg, C.; Chiu, N. H.; Storm, N.; Jurinke, C.; Cantor, C. R.; Gobel, U. B.; van den Boom, D. Base-specific fragmentation of amplified 16S rRNA genes analyzed by mass spectrometry: a tool for rapid bacterial identification. Proc. Natl. Acad. Sci. USA 2002, 99, 7039-7044.
(19) Hartmer, R.; Storm, N.; Boecker, S.; Rodi, C. P.; Hillenkamp, F.; Jurinke, C.; van den Boom, D. RNase T1 mediated base-specific cleavage and MALDI-TOF MS for high-throughput comparative sequence analysis. Nucleic Acids Res. 2003, 31, e47.
(20) Böcker, S. SNP and mutation discovery using base-specific cleavage and MALDI-TOF mass spectrometry. Bioinformatics 2003, 19 Suppl 1, i44-i53.
(21) Stanssens, P. Z., M.; Meersseman, G.; Remes, G.; Gansemans, Y.; Storm, N.; Hartmer, R.; Honisch, C.; Rodi, C. P.; Böcker, S.; van den Boom, D. High-throughput MALDI-TOF Discovery of Genomic Sequence Polymorphisms. 2003, submitted.
(22) Braun, A.; Little, D. P.; Koster, H. Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clin. Chem. 1997, 43, 1151-1158.
(23) Tang, K.; Allman, S. L.; Jones, R. B.; Chen, C. H. Quantitative analysis of biopolymers by matrix-assisted laser desorption. Anal. Chem. 1993, 65, 2164-2166.


(24) Nelson, R. W.; McLean, M. A.; Hutchens, T. W. Quantitative determination of proteins by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Anal. Chem. 1994, 66, 1408-1415.
(25) Bucknall, M.; Fung, K. Y.; Duncan, M. W. Practical quantitative biomedical applications of MALDI-TOF mass spectrometry. J. Am. Soc. Mass Spectrom. 2002, 13, 1015-1027.
(26) Mohlke, K. L.; Erdos, M. R.; Scott, L. J.; Fingerlin, T. E.; Jackson, A. U.; Silander, K.; Hollstein, P.; Boehnke, M.; Collins, F. S. High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc. Natl. Acad. Sci. USA 2002, 99, 16 928-16 933.
(27) Werner, M.; Sych, M.; Herbon, N.; Illig, T.; Konig, I. R.; Wjst, M. Large-scale determination of SNP allele frequencies in DNA pools using MALDI-TOF mass spectrometry. Hum. Mutat. 2002, 20, 57-64.
(28) Ross, P.; Hall, L.; Haff, L. A. Quantitative approach to single-nucleotide polymorphism analysis using MALDI-TOF mass spectrometry. Biotechniques 2000, 29, 620-626, 628-629.
(29) Storm, N.; Darnhofer-Patel, B.; van den Boom, D.; Rodi, C. P. MALDI-TOF mass spectrometry-based SNP genotyping. Methods Mol. Biol. 2003, 212, 241-262.
(30) Barratt, B. J.; Payne, F.; Rance, H. E.; Nutland, S.; Todd, J. A.; Clayton, D. G. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann. Hum. Genet. 2002, 66, 393-405.
(31) Sham, P.; Bader, J. S.; Craig, I.; O’Donovan, M.; Owen, M. DNA Pooling: a tool for large-scale association studies. Nat. Rev. Genet. 2002, 3, 862-871.
(32) Jurinke, C.; Oeth, P.; van den Boom, D. MALDI-TOF Mass spectrometry: a versatile tool for high performance DNA analysis. Mol. Biotech. 2003, in press.
(33) Le Hellard, S.; Ballereau, S. J.; Visscher, P. M.; Torrance, H. S.; Pinson, J.; Morris, S. W.; Thomson, M. L.; Semple, C. A.; Muir, W. J.; Blackwood, D. H.; Porteous, D. J.; Evans, K. L. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res. 2002, 30, e74.
(34) Shifman, S.; Pisante-Shalom, A.; Yakir, B.; Darvasi, A. Quantitative technologies for allele frequency estimation of SNPs in DNA pools. Mol. Cell Probes 2002, 16, 429-434.
(35) Cardon, L. R.; Bell, J. I. Association study designs for complex diseases. Nat. Rev. Genet. 2001, 2, 91-99.
(36) Tabor, H. K.; Risch, N. J.; Myers, R. M. Opinion: Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat. Rev. Genet. 2002, 3, 391-397.
(37) Buetow, K. H.; Edmonson, M.; MacDonald, R.; Clifford, R.; Yip, P.; Kelley, J.; Little, D. P.; Strausberg, R.; Koester, H.; Cantor, C. R.; Braun, A. High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc. Natl. Acad. Sci. USA 2001, 98, 581-584.
(38) Bansal, A.; van den Boom, D.; Kammerer, S.; Honisch, C.; Adam, G.; Cantor, C. R.; Kleyn, P.; Braun, A. Association testing by DNA pooling: an effective initial screen. Proc. Natl. Acad. Sci. USA 2002, 99, 16 871-16 874.
(39) Herbon, N.; Werner, M.; Braig, C.; Gohlke, H.; Dutsch, G.; Illig, T.; Altmuller, J.; Hampe, J.; Lantermann, A.; Schreiber, S.; Bonifacio, E.; Ziegler, A.; Schwab, S.; Wildenauer, D.; van den Boom, D.; Braun, A.; Knapp, M.; Reitmeir, P.; Wjst, M. High-resolution snp scan of chromosome 6p21 in pooled samples from patients with complex diseases. Genomics 2003, 81, 510-518.
(40) Altshuler, D.; Hirschhorn, J. N.; Klannemark, M.; Lindgren, C. M.; Vohl, M. C.; Nemesh, J.; Lane, C. R.; Schaffner, S. F.; Bolk, S.; Brewer, C.; Tuomi, T.; Gaudet, D.; Hudson, T. J.; Daly, M.; Groop, L.; Lander, E. S. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat. Genet. 2000, 26, 76-80.
(41) Hansen, L.; Hansen, T.; Vestergaard, H.; Bjorbaek, C.; Echwald, S. M.; Clausen, J. O.; Chen, Y. H.; Chen, M. X.; Cohen, P. T.; Pedersen, O. A widespread amino acid polymorphism at codon 905 of the glycogen-associated regulatory subunit of protein phosphatase-1 is associated with insulin resistance and hypersecretion of insulin. Hum. Mol. Genet. 1995, 4, 1313-1320.
(42) Davies, H.; Bignell, G. R.; Cox, C.; Stephens, P.; Edkins, S.; Clegg, S.; Teague, J.; Woffendin, H.; Garnett, M. J.; Bottomley, W.; Davis, N.; Dicks, E.; Ewing, R.; Floyd, Y.; Gray, K.; Hall, S.; Hawes, R.; Hughes, J.; Kosmidou, V.; Menzies, A.; Mould, C.; Parker, A.; Stevens, C.; Watt, S.; Hooper, S.; Wilson, R.; Jayatilake, H.; Gusterson, B. A.; Cooper, C.; Shipley, J.; Hargrave, D.; Pritchard-Jones, K.; Maitland, N.; Chenevix-Trench, G.; Riggins, G. J.; Bigner, D. D.; Palmieri, G.; Cossu, A.; Flanagan, A.; Nicholson, A.; Ho, J. W.; Leung, S. Y.; Yuen, S. T.; Weber, B. L.; Seigler, H. F.; Darrow, T. L.; Paterson, H.; Marais, R.; Marshall, C. J.; Wooster, R.; Stratton, M. R.; Futreal, P. A. Mutations of the BRAF gene in human cancer. Nature 2002, 417, 949-954.
(43) Yuan, B. Z.; Zhou, X.; Durkin, M. E.; Zimonjic, D. B.; Gumundsdottir, K.; Eyfjord, J. E.; Thorgeirsson, S. S.; Popescu, N. C. DLC-1 gene inhibits human breast cancer cell growth and in vivo tumorigenicity. Oncogene 2003, 22, 445-450.
(44) Kirk, K. M.; Doege, K. J.; Hecht, J.; Bellamy, N.; Martin, N. G. Osteoarthritis of the hands, hips and knees in an Australian twin sample - evidence of association with the aggrecan VNTR polymorphism. Twin Res. 2003, 6, 62-66.
(45) Freeman, D. J.; Griffin, B. A.; Holmes, A. P.; Lindsay, G. M.; Gaffney, D.; Packard, C. J.; Shepherd, J. Regulation of plasma HDL cholesterol and subfraction distribution by genetic and environmental factors: associations between the TaqI B RFLP in the CETP gene and smoking and obesity. Arterioscler. Thromb. 1994, 14, 336-344.
(46) Heizmann, C.; Kirchgessner, T.; Kwiterovich, P. O.; Ladias, J. A.; Derby, C.; Antonarakis, S. E.; Lusis, A. J. DNA polymorphism haplotypes of the human lipoprotein lipase gene: possible association with high density lipoprotein levels. Hum. Genet. 1991, 86, 578-584.
(47) Yamada, S.; Zhu, Q.; Aihara, Y.; Onda, H.; Zhang, Z.; Yu, L.; Jin, L.; Si, Y. J.; Nishigori, H.; Tomura, H.; Inoue, I.; Morikawa, A.; Yamagata, K.; Hanafusa, T.; Matsuzawa, Y.; Takeda, J. Cloning of cDNA and the gene encoding human hepatocyte nuclear factor (HNF)-3 beta and mutation screening in Japanese subjects with maturity-onset diabetes of the young. Diabetologia 2000, 43, 121-124.
(48) Zhu, Q.; Yamagata, K.; Yu, L.; Tomura, H.; Yamada, S.; Yang, Q.; Yoshiuchi, I.; Sumi, S.; Miyagawa, J.; Takeda, J.; Hanafusa, T.; Matsuzawa, Y. Identification of missense mutations in the hepatocyte nuclear factor-3beta gene in Japanese subjects with late-onset Type II diabetes mellitus. Diabetologia 2000, 43, 1197-1200.
(49) Becker-Andre, M.; Hahlbrock, K. Absolute mRNA quantification using the polymerase chain reaction (PCR). A novel approach by a PCR aided transcript titration assay (PATTY). Nucleic Acids Res. 1989, 17, 9437-9446.
(50) Lockhart, D. J.; Dong, H.; Byrne, M. C.; Follettie, M. T.; Gallo, M. V.; Chee, M. S.; Mittmann, M.; Wang, C.; Kobayashi, M.; Horton, H.; Brown, E. L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 1996, 14, 1675-1680.
(51) Ding, C.; Cantor, C. R. Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc. Natl. Acad. Sci. USA 2003, 100, 7449-7453.
(52) Wutz, A.; Smrzka, O. W.; Schweifer, N.; Schellander, K.; Wagner, E. F.; Barlow, D. P. Imprinted expression of the Igf2r gene depends on an intronic CpG island. Nature 1997, 389, 745-749.
(53) Constancia, M.; Pickard, B.; Kelsey, G.; Reik, W. Imprinting mechanisms. Genome Res. 1998, 8, 881-900.
(54) Yan, H.; Yuan, W.; Velculescu, V. E.; Vogelstein, B.; Kinzler, K. W. Allelic variation in human gene expression. Science 2002, 297, 1143.
(55) Little, D. P.; Braun, A.; O’Donnell, M. J.; Koster, H. Mass spectrometry from miniaturized arrays for full comparative DNA analysis. Nat. Med. 1997, 3, 1413-1416.

PR034080S
