Article pubs.acs.org/jpr
Clinical Proteome Informatics Workbench Detects Pathogenic Mutations in Hereditary Amyloidoses
Surendra Dasari,*,† Jason D. Theis,‡ Julie A. Vrana,‡ Roman M. Zenka,§ Michael T. Zimmermann,† Jean-Pierre A. Kocher,† W. Edward Highsmith, Jr.,∥ Paul J. Kurtin,‡ and Ahmet Dogan‡,⊥
†Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Mayo Proteomics Core, and ∥Department of Molecular Genetics, Mayo Clinic, Rochester, Minnesota 55905, United States
*S Supporting Information
ABSTRACT: Shotgun proteomics of hereditary amyloid deposits generates all the information necessary to identify pathogenic mutant peptides and proteins. However, these mutant peptides are invisible to traditional database search strategies. We developed a two-pronged informatics workflow for detecting both known and novel amyloidogenic mutations from clinical proteomics data sets. We implemented the workflow in a CAP/CLIA certified clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from formalin-fixed paraffin-embedded specimens. Performance of the workflow was characterized on a validation cohort of 49 hereditary amyloid samples, with confirmed mutations, and 85 controls. The sensitivity, specificity, positive predictive value, and negative predictive value of the known mutation detection workflow were determined to be 92%, 100%, 100%, and 96%, respectively. For the novel mutation detection workflow, these performance parameters were 82%, 99%, 98%, and 90%, respectively. The validated workflow was applied to detect amyloidogenic mutations from a clinical cohort of 150 amyloid samples. The known mutation detection workflow detected rare frame shift mutations in apolipoprotein A1 and fibrinogen alpha amyloid deposits. The novel mutation detection workflow uncovered unanticipated mutations (W22G and C71Y) of the serum amyloid A4 protein present in patient amyloid deposits. In summary, clinical amyloid proteomics data sets contain mutant peptides of clinical significance that are recoverable with improved bioinformatics.
KEYWORDS: bioinformatics, amyloidosis, mutations, proteomics, clinical specimens
■
INTRODUCTION
Amyloidosis refers to a complex spectrum of hereditary and acquired diseases that are characterized by abnormal extracellular deposition of misfolded proteins in various organs. A single amyloidogenic protein1,2 present in the deposit is the determining factor of its subtype and the associated disease phenotype. Traditionally, amyloid diagnosis and subtyping are performed in two steps. For diagnosis, the tissue specimen is stained with a chemical dye, Congo red (CR), which is taken up by the unique physical structure of amyloid plaques (β-pleated sheet). This makes the amyloid deposits appear reddish-brown and produce apple-green birefringence under polarized light. Next, the subtype is inferred via immunohistochemistry (IHC), which resembles a guided search wherein clinical presentation is used to infer potential subtypes, and clinical surrogates and IHC results are used to finalize the subtype. However, IHC often produces ambiguous results because of background serum contamination, epitope loss from the formalin fixation process, and the lack of specific antibodies for all of the subtypes. This creates diagnostic gray zones that hamper patient care.

To remedy this, laboratory scientists have developed several shotgun proteomics methods for subtyping amyloid deposits obtained from formalin-fixed paraffin-embedded (FFPE) tissues and fat aspirates.3−6 When technical idiosyncrasies are ignored, all of these methods operate within a singular framework and share a common bioinformatics problem. They start by isolating proteins from the amyloid deposits; the extracted proteins are digested with trypsin, and the resulting peptides are analyzed via liquid chromatography tandem mass spectrometry (LC−MS/MS). Next, bioinformatics pipelines are leveraged to match the MS/MS spectra against a canonical protein sequence database using database search engines such as Sequest.7 Resulting peptide identifications are filtered and assembled into protein identifications using postprocessing software such as Scaffold.8 This type of informatics approach works well for identifying known amyloidogenic proteins2 but fails to detect the causative amino acid sequence variations in hereditary amyloidoses that are important for optimal patient management and appropriate genetic counseling.

In this study, we describe a two-pronged proteome informatics workflow for detecting known and novel amino acid mutations from clinical amyloid shotgun proteomics data sets. Known mutations are detected by matching the MS/MS against a custom protein sequence database, augmented with amyloidogenic mutations, using a traditional database search strategy. This part of the workflow was implemented in a CAP/CLIA certified clinical testing laboratory at the Mayo Clinic. Novel mutations are detected by matching the MS/MS against wild type protein sequences using a sequence tagging search strategy configured to look for unanticipated mutations. The workbench is integrated into the SWiFT9 data processing environment and can take advantage of multinode computer clusters. Application of the workbench to clinical amyloid samples revealed 39 different amyloidogenic mutations in six different genes of patients with various types of hereditary amyloidosis.

Received: November 25, 2013
Published: March 20, 2014
© 2014 American Chemical Society
dx.doi.org/10.1021/pr4011475 | J. Proteome Res. 2014, 13, 2352−2358
Journal of Proteome Research

light and dissected by laser microdissection. Multiple (2−4) independent microdissections, each encompassing an area of 60,000 μm², were performed for each case. FFPE fragments from each microdissection were collected in a cap containing cell lysis buffer and analyzed individually. Proteins were extracted from the fragments using heat and denatured via sonication. Extracted proteins were digested with trypsin, and 5 μL of the resulting peptide mixture was analyzed on an LTQ-Orbitrap XL mass spectrometer (Thermo-Fisher, Waltham, MA) connected to an Eksigent (AB Sciex, Dublin, CA) liquid chromatography (LC) system. A total of approximately 6.6 million MS/MS spectra were collected from all LC−MS/MS analyses. Binary spectral data present in the raw files were transcoded to either MGF format using the extract_msn software or mzML format using the msConvert tool of the ProteoWizard library.10
■
MATERIALS AND METHODS
Study Subjects
The study was approved by the Mayo Clinic Institutional Review Board. Table 1 presents the demographics and characteristics of the study subjects. Overall, the study has a validation cohort (N = 134) and a clinical cohort (N = 150). The validation cohort contains a total of 49 subjects with four different types of hereditary amyloidosis (Table 1). The presence of the pathogenic mutation in the amyloid protein of each subject was confirmed with Sanger sequencing. We also included a total of 85 negative control subjects in the validation cohort whose TTR genes were wild-type by Sanger sequencing. The performance characteristics of the mutation detection workbench were thoroughly evaluated using the validation cohort. The pipeline was later applied to identify amyloidogenic mutations from subjects in the clinical cohort, which contains a total of 150 subjects with six different types of hereditary amyloidosis (Table 1).

Table 1. Demographics of Patients Who Participated in This Study

amyloid typeᵃ    no. of cases (M/F/U)ᵇ    age (years)
Validation Cohortᶜ
ATTR             41 (33/8/0)              63.5 ± 11.0
AApoA1           6 (2/1/3)                62.0 ± 8.1
AApoA4           1 (1/0/0)                50.0 ± 0.0
AGel             1 (1/0/0)                68.0 ± 0.0
controls         85 (54/31/0)             69.5 ± 10.8
Clinical Cohort
ATTR             114 (88/25/1)            63.7 ± 14.2
AApoA1           14 (5/8/1)               59.9 ± 13.2
AApoA4           4 (2/2/0)                72.0 ± 18.0
AGel             3 (3/0/0)                63.0 ± 8.8
SAA4             2 (2/0/0)                76.0 ± 7.0
AFib             13 (9/2/2)               62.8 ± 10.2

ᵃ All patients have hereditary amyloidosis, except controls. ATTR stands for transthyretin amyloidosis, AApoA1 stands for apolipoprotein A1 amyloidosis, AApoA4 stands for apolipoprotein A4 amyloidosis, AGel stands for gelsolin amyloidosis, AFib stands for fibrinogen amyloidosis, and SAA4 stands for serum amyloid A4 amyloidosis.
ᵇ M stands for male, F stands for female, and U stands for unknown.
ᶜ Amyloidogenic mutations were validated by Sanger sequencing the corresponding gene. These mutations are summarized in Table 2.

Figure 1. Informatics workflow for detecting amyloidogenic mutations. Right prong of the workflow detects known amyloidogenic mutations using augmented protein sequence databases. Left prong of the workflow detects novel mutations using wild type protein sequences. Sequest, Mascot, and X!Tandem are database search engines. DirecTag derives sequence tags of three amino acids in length from the tandem mass spectra (MS/MS). TagRecon reconciles the inferred tags against protein sequences while making allowances for unanticipated mutations. Scaffold and IDPicker filter the peptide identification results.

Bioinformatics

Figure 1 illustrates the two-pronged informatics workflow we developed for detecting known and novel mutations from amyloid proteomics data sets. Known amyloidogenic mutations are incorporated into a sequence database and detected with a traditional database search strategy. Novel mutations are identified with error-tolerant sequence tag-based searches configured to use wild type protein sequences as reference. Detected mutations are validated using Sanger sequencing. Validated known amyloidogenic mutations are reported to the clinician. Validated novel mutations in amyloidogenic proteins are not clinically reported but are held for clinical evidence accumulation before feeding them back into the known amyloidogenic mutation detection workflow. A complete list of the search engine settings and protein assembly parameters utilized in this study is presented in Supplemental File 1.
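The routing of a detected mutation through the two prongs can be sketched in a few lines. This is only an illustration of the reporting logic described in the text; the `Finding` record, its field names, and the `triage` function are hypothetical and do not correspond to any part of the authors' software.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One detected mutation (hypothetical record, for illustration only)."""
    mutation: str            # e.g. "TTR V30M"
    known: bool              # True if detected via the augmented database
    sanger_confirmed: bool   # result of the orthogonal Sanger validation


def triage(finding: Finding) -> str:
    """Route a finding following the reporting logic described in the text."""
    if not finding.sanger_confirmed:
        # Detected mutations are validated by Sanger sequencing before use.
        return "not reported"
    if finding.known:
        # Validated known amyloidogenic mutations go to the clinician.
        return "report to clinician"
    # Validated novel mutations are held until evidence accumulates.
    return "hold in knowledgebase"


print(triage(Finding("TTR V30M", known=True, sanger_confirmed=True)))
print(triage(Finding("SAA4 C71Y", known=False, sanger_confirmed=True)))
```

A held novel mutation would later be promoted by adding it to the known-mutation database, which is the feedback loop the paragraph above describes.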
Shotgun Proteomics of Amyloid Deposits
We utilized a previously published method to isolate the amyloid deposits from the FFPE tissue biopsies and subject them to tandem mass spectrometry (MS/MS).6 In brief, 10-μm thick sections of FFPE tissues were deparaffinized and stained with CR. CR-positive areas were identified using fluorescent
Amyloid Variant Database Preparation
Traditional protein sequence databases such as the SwissProt contain wild type sequences that are sufficient to identify an
sequence tags from all MS/MS spectra present in a raw file. The software was configured to retain the best 50 tags of three amino acids in length for each spectrum. TagRecon16 reconciled the inferred sequence tags against a composite protein sequence database containing SwissProt’s complete human proteome and common contaminants. Decoy protein sequences were also searched to estimate peptide identification FDRs. The software was configured to derive semitryptic peptides from the sequence database and look for single point mutations as well as the following variable modifications: oxidation of methionine (+15.996 Da) and formation of N-terminal pyroglutamic acid (−17.023 Da). IDPicker17,18 filtered the peptide identifications at a stringent 2% FDR using an optimal combination of MVH, mzFidelity, and XCorr scores. Peptides were assembled into proteins following parsimony rules. Protein identifications with at least two independent peptide identifications were retained for clinical interpretation. For every case, the diagnostic amyloid protein was identified following the protocol described above. Detected mutant peptides were attested following strict criteria described elsewhere.16 The most abundant novel mutation detected in the diagnostic amyloidogenic protein with at least five spectral counts was validated using Sanger sequencing and retained for inclusion in the amyloidogenic mutation knowledgebase. The software utilized in this workflow is available for download, free of charge, from the Web site: http://fenchurch.mc.vanderbilt.edu/software.php.
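The idea behind looking for single point mutations is that an amino acid substitution shifts the peptide mass by the difference between the two residue masses. The sketch below illustrates only that arithmetic, using standard monoisotopic residue masses; it is not TagRecon's algorithm, and the function name `substitution_candidates` and the 0.01 Da tolerance are assumptions made for the example.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_candidates(delta_mass, tol=0.01):
    """List (wild type, mutant) residue pairs whose mass difference
    (mutant minus wild type) matches delta_mass within tol Da."""
    hits = []
    for wt, m_wt in MONO.items():
        for mt, m_mt in MONO.items():
            if wt != mt and abs((m_mt - m_wt) - delta_mass) <= tol:
                hits.append((wt, mt))
    return hits


# Mass shifts implied by the two SAA4 substitutions named in this article.
w_to_g = MONO["G"] - MONO["W"]   # tryptophan replaced by glycine
c_to_y = MONO["Y"] - MONO["C"]   # cysteine replaced by tyrosine
print(round(w_to_g, 5), substitution_candidates(w_to_g))
print(round(c_to_y, 5), substitution_candidates(c_to_y))
```

In practice a mass shift alone is often ambiguous (leucine and isoleucine are isobaric, and cysteine is frequently chemically modified), which is why the flanking sequence tag and fragment ions are needed to localize and confirm a substitution.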
amyloidogenic protein. However, these databases do not contain representative entries for amyloidogenic mutations. To remedy this, we compiled a list of known amyloidogenic mutations (including frame shifts) from a variety of data sources such as SwissVar,11 MSV3D,12 and published literature. This knowledgebase was manually curated to remove low penetrance variants, resulting in the retention of 456 mutations in 24 different proteins. We also included polymorphisms and other pathogenic variants found in the 24 amyloidogenic proteins in the final knowledgebase. This prevents the MS/MS of a non-amyloidogenic mutant peptide from being forcibly misinterpreted as an amyloidogenic mutant MS/MS. Mutant peptides were generated and appended to a composite database containing sequences of common contaminant proteins and the SwissProt’s complete human proteome. Reversed sequence entries were appended to the final database for estimating false discovery rates (FDRs) of the peptide identifications. Supplemental File 2 lists all of the variants and their corresponding peptide sequences. These additional mutant peptide sequences serve as hooks for the protein identification software to pull known amyloidogenic mutations from patient amyloid deposits.

Detecting Known Mutants
MS/MS spectra present in each microdissection’s raw file were identified with three different database search engines: Sequest,7 X!Tandem,13 and Mascot.14 All search engines were configured to derive fully tryptic peptides from the augmented protein sequence database and look for oxidation of methionine (+15.996 Da) as a variable modification. X!Tandem automatically searches for the following variable modifications: formation of N-terminal pyroglutamic acid (−17.023 Da) and water loss from glutamates (−18.01 Da). All peptide identifications from a patient’s sample were combined and filtered using Scaffold software.8 Proteins with at least one peptide identification (peptide probability >0.9) were considered for clinical interpretation and validation. For every case, we created a clinical proteomics profile that lists all of the confident protein identifications present in each microdissection along with their respective spectral counts. A pathologist called the amyloid subtype by correlating the clinical factors with the most abundant amyloidogenic protein detected across all microdissections. The most abundant known pathogenic mutation detected in the diagnostic amyloidogenic protein with at least five spectral counts was validated using Sanger sequencing and clinically reported. Here we emphasize that the clinical reporting guidelines require Sanger validation of pathogenic mutations. We took advantage of this orthogonal validation rule to increase the proteomic mutation detection sensitivity by requiring only one highly confident peptide identification in lieu of two. This approach works well for detecting amyloidogenic mutations because their corresponding proteins are often embedded into the amyloid fibrils in a natively degraded form. The FFPE fixation process also degrades the proteins further, making it harder to recover multiple peptide identifications matching an amyloidogenic mutation locus.
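The augmented database that these searches rely on pairs each known variant with the tryptic peptide that carries it, plus a reversed decoy entry. The following is a minimal sketch of that idea under stated assumptions: the sequence `TOY` is invented (it is not a real protein), the helper names are hypothetical, and the cleavage rule (after K or R, but not before P) is the common trypsin convention rather than a setting taken from the authors' configuration.

```python
def tryptic_peptides(seq):
    """Fully tryptic digest: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def apply_point_mutation(seq, wild, pos, mutant):
    """Substitute the residue at 1-based position pos, checking the reference."""
    if seq[pos - 1] != wild:
        raise ValueError(f"expected {wild} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mutant + seq[pos:]


def variant_peptides(seq, wild, pos, mutant):
    """Tryptic peptides of the mutant sequence that are absent from wild type."""
    wild_type = set(tryptic_peptides(seq))
    mutated = apply_point_mutation(seq, wild, pos, mutant)
    return [p for p in tryptic_peptides(mutated) if p not in wild_type]


def decoy(seq):
    """Reversed-sequence decoy entry for FDR estimation."""
    return seq[::-1]


TOY = "MAGTKLVSEPRAAVDK"  # invented sequence, for illustration only

print(tryptic_peptides(TOY))
print(variant_peptides(TOY, "V", 7, "M"))  # substitution inside one peptide
print(variant_peptides(TOY, "S", 8, "K"))  # substitution creating a cleavage site
print(decoy("LMSEPR"))
```

The second variant shows why mutant entries are generated by re-digesting the mutated sequence rather than by editing a wild type peptide in place: a substitution to or from K or R changes the peptide boundaries themselves.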
Sanger Sequencing of Amyloidogenic Genes
Candidate genes were isolated from the subject’s blood and subjected to chain-termination sequencing. All exons of a gene were amplified using hybrid primers containing 20−22 bases of gene specific sequence and a universal sequencing primer (UPS) sequence (19 or 23 bases for the forward and reverse primers) at the 5′ end. Amplified products were sequenced using UPS primers, the ABI Big Dye terminators (Applied Biosystems, Foster City, CA) and capillary electrophoresis on an ABI 3730 sequencer. Data were analyzed using Mutation Surveyor (SoftGenetics, College Station, PA) configured to use corresponding reference sequences obtained from GenBank. Detected known amyloidogenic mutations were reported to the clinician, whereas novel mutations were held in a knowledgebase of potentially pathogenic mutations. Novel mutations with strong clinical and/or biophysical evidence are fed back into our known amyloidogenic mutation detection workflow for future use.

Molecular Dynamics (MD) Simulation of Novel Mutations in Amyloidogenic Proteins
We detected two novel mutations (W22G and C71Y) in serum amyloid A4 (SAA4) amyloidosis cases. The effect of these mutations on SAA4’s structure was assessed via MD simulation. Homology models were constructed for SAA4 using the I-TASSER server.19 The top scoring model has a 4-helix bundle. We performed implicit solvent MD simulations of this model in three different sequence contexts (wild type, W22G, and C71Y). Each simulation was minimized, slowly heated to 270 K, and equilibrated for 1.5 million time steps. The next five million time steps were analyzed. We report “time steps” rather than a unit of time because implicit solvent simulations speed up the kinetics. Simulations were performed in NAMD software,20 whereas visualization and trajectory analysis were performed in Visual Molecular Dynamics (VMD) software.21
Novel Mutant Detection
We utilize an error-tolerant search paradigm to detect novel mutations. This method derives short sequence tags from the MS/MS spectra and matches them to wild type protein sequences while making allowances for unanticipated amino acid substitutions. In this workflow, DirecTag15 inferred partial
■
RESULTS AND DISCUSSION
Hereditary amyloid deposits are rich in mutant peptides of clinical significance. The corresponding shotgun proteomics data sets contain all the information necessary to identify these peptides. However, these mutant peptides are invisible to traditional database searches because canonical protein sequence databases do not contain representative sequence entries. As a result, most of the consequential amyloidogenic mutants present in the patient samples go undetected. To remedy this, we developed a two-pronged informatics approach for detecting both known and novel amyloidogenic mutations (Figure 1). We implemented the known mutation detection workflow in a CAP/CLIA clinical testing laboratory at the Mayo Clinic, making it the first shotgun proteomics-based mutation detection workflow that has been routinely used for patient care. The novel mutation detection workflow is utilized for clinical research.

Detecting Known Amyloidogenic Mutations with High Sensitivity and Specificity

We assessed the reliability of the known amyloidogenic mutation detection workflow in a clinical setting. For this, we employed a validation cohort containing 49 known hereditary amyloidosis patients and 85 controls (Table 1). Amyloidogenic mutations in the patient population were validated by Sanger sequencing the corresponding gene. We also Sanger sequenced the TTR gene of the control subjects in order to rule out the presence of any mutations. These negative controls were critical to assess whether the workflow can resist reporting peptide sequence variants in the absence of mutations in the corresponding genes. Amyloid FFPE tissues from both patients and controls were subjected to shotgun proteomics. Resulting MS/MS were matched against a custom protein sequence database augmented with known amyloidogenic mutant peptides, using three different database search engines. Scaffold software processed the peptide identifications and assembled them into protein identifications.

The most abundant mutation detected in the patient’s amyloidogenic protein was cross-referenced with the corresponding genetic information. Patient cases with matching amino acid mutation and gene mutation were considered as true positives (TPs). Patient cases with mismatching amino acid mutation and gene mutation were considered as false positives (FPs). Patient cases with no detectable amino acid mutation in the corresponding amyloid protein were considered as false negatives (FNs). Control cases had no mutations in their TTR gene (confirmed by Sanger sequencing). Hence, controls with no detectable TTR amino acid mutation were considered as true negatives (TNs), FPs otherwise.

Table 2 summarizes the mutations detected in the validation cohort by the enhanced database search-based known mutation detection workflow. The workflow classified the 134 validation subjects as 45 TPs, 85 TNs, 4 FNs, and zero FPs. We detected the correct mutant peptide sequence for one FN case (G47V in TTR), but the number of spectral matches supporting the mutation was below the threshold for clinical reporting. Two FN mutations (S77Y in TTR and P27R in ApoA1) are confined to short tryptic peptides that are not amenable to detection. Overall, the sensitivity, specificity, positive predictive value, and negative predictive value of the known mutation detection workflow are 92%, 100%, 100%, and 96%, respectively.

Table 2. Validation Cohort’s Amyloidogenic Mutation Detection Summaryᵃ

protein, mutation (no. of cases)    enhanced database search    sequence tag search
Hereditary Amyloidosis
TTR, T60A (10)                      10/10                       9/10
TTR, V30M (10)                      10/10                       10/10
TTR, V122L (5)                      5/5                         4/5
TTR, S50R (3)                       3/3                         2/3
TTR, A102S (2)                      2/2                         2/2
TTR, E54G (2)                       2/2                         2/2
TTR, T59K (2)                       2/2                         1/2
TTR, Val122Del*ᵇ (1)                1/1                         0/1
TTR, A36P (2)                       1/2                         1/2
TTR, G47V (1)                       0/1                         1/1
TTR, I84S (1)                       1/1                         1/1
TTR, L107V (1)                      1/1                         1/1
TTR, S77Y (1)                       0/1                         0/1
ApoA1, P27R (2)                     1/2                         1/2
ApoA1, L99P (2)                     2/2                         2/2
ApoA1, G50R (1)                     1/1                         1/1
ApoA1, H179Fs*ᵇ (1)                 1/1                         0/1
ApoA4, N147S (1)                    1/1                         1/1
Gel, N231D (1)                      1/1                         1/1
Controls (Senile Amyloidosis of Various Types)
TTR, none (85)                      0/85                        1/85

ᵃ Mutations in the patient cases were validated with Sanger sequencing. TTR genes of the control subjects were also Sanger sequenced to rule out the presence of mutations. None of the mutations were detected by a traditional database search configured to use wild type sequences. Enhanced database search uses mutant peptide sequences to detect mutations. In contrast, sequence tag search infers mutations by matching MS/MS against wild type protein sequences.
*ᵇ Del stands for deletion, Fs stands for frame shift.
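The four performance figures follow directly from the TP, FP, TN, and FN counts. A short sketch, assuming the usual definitions of these metrics, reproduces them for the enhanced database search (45 TPs, 0 FPs, 85 TNs, 4 FNs) and for the sequence tag search, whose counts are reported in the text as 40 TPs, 1 FP, 84 TNs, and 9 FNs. The function name `diagnostic_metrics` is chosen for the example.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic performance metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true mutations detected
        "specificity": tn / (tn + fp),  # fraction of controls correctly negative
        "ppv": tp / (tp + fp),          # fraction of positive calls that are correct
        "npv": tn / (tn + fn),          # fraction of negative calls that are correct
    }


known = diagnostic_metrics(tp=45, fp=0, tn=85, fn=4)   # enhanced database search
novel = diagnostic_metrics(tp=40, fp=1, tn=84, fn=9)   # sequence tag search

for name, metrics in (("known", known), ("novel", novel)):
    print(name, {k: f"{v:.1%}" for k, v in metrics.items()})
```

Rounded to whole percentages, these give 92%, 100%, 100%, and 96% for the known mutation workflow and 82%, 99%, 98%, and 90% for the novel mutation workflow.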
We applied the known amyloidogenic mutation detection workflow to a clinical cohort containing a total of 150 patients with six different types of hereditary amyloidosis (see Table 1 for patient demographics). Table 3 presents the summary of mutations detected by this workflow in the clinical cohort. All of these mutations were independently confirmed by Sanger sequencing the respective genes in corresponding patients. Curiously, two serum amyloid A4 (SAA4) amyloidosis cases failed to reveal any mutations when processed with this workflow.

Table 3. Clinical Cohort’s Mutation Summaryᵃ

amyloid type    mutation (no. of cases)
ATTR            V122I(46), T60A(24), V30M(19), P24S(4), S50R(4), T59K(3), G47V(2), A120S(1), D38A(1), E54G(1), E89K(1), F33L(1), F64L(1), I84S(1), I84T(1), L107V(1), L58H(1), R54S(1), S52P(1)
AApoA1          L99P(11), E58K(2), P27R(1)
AApoA4          N147S(4)
AGel            N211K(2), A578P(1)
AFib            E545V(10), R573L(2), F521Fsᵇ(1)

ᵃ Mutations were detected using the known mutation detection workflow. These mutations were independently confirmed by Sanger sequencing the respective genes obtained from the corresponding patients.
ᵇ Fs stands for frame shift.
Sequence Tagging Detects Novel Point Mutations
We wanted to estimate the fraction of potential true positive mutations that are detected by the novel point mutation detection workflow. At the same time, we also needed to assess the number of false positives reported by this workflow. For this, we configured the workflow to detect mutations present in the validation cohort samples by matching the corresponding MS/MS scans against wild type protein sequences. In this scenario, a high-performance novel mutation workflow should maximize the recovery of true amyloidogenic mutations present in the patient samples while minimizing the reporting of peptide sequence variants that have no corresponding gene mutation. For each sample in the validation cohort, DirecTag-TagRecon sequence tag software identified both wild type and mutant peptides. IDPicker filtered the peptide identifications and assembled them into protein identifications. Detected mutations were attested, and low-confidence variants were removed from further analysis. For patient samples, the most abundant mutation discovered in the diagnostic amyloid protein was cross-referenced with the Sanger sequencing gold standard. For control samples, the most abundant mutation discovered in TTR (if any) was compared to the corresponding genetic information. Cross-referenced mutations were classified as TPs, FPs, TNs, and FNs following the above-described logic. Table 2 summarizes the mutations detected in the validation cohort by the sequence tag-based novel mutation detection workflow. This workflow classified the 134 validation subjects as 40 TPs, 1 FP, 9 FNs, and 84 TNs. On the basis of this information, we computed the sensitivity, specificity, positive predictive value, and negative predictive value of the novel mutation detection workflow as 82%, 99%, 98%, and 90%, respectively. In our clinical setting, we first processed the samples with the known amyloidogenic mutation detection workflow.
Samples that failed to reveal mutations at this stage were reflexed to the novel mutation detection workflow. Following this protocol, we applied the DirecTag-TagRecon workflow to two very rare cases of serum amyloid A4 (SAA4) amyloidosis and detected a W22G mutation and a C71Y polymorphism in SAA4 protein. The C71Y polymorphism (dbSNP Accession: rs2460827) was confirmed with Sanger sequencing. We could not obtain the patient sample required to confirm the W22G mutation. The impact of these mutations on the structure and function of the SAA4 protein was assessed using MD simulations. Since the 3D structure of the SAA4 protein is unknown, we constructed a homology model using I-TASSER software, and MD simulations were performed in three different sequence contexts (wild type, W22G, and C71Y). Figure 2 illustrates the effect of the mutations on the homology model. Both mutations destabilized the helix geometry of the SAA4 protein by significantly altering the helical content (Figure 2). We also computed the impact of these mutations on the thermodynamic stability of the protein using I-Mutant 2.0 software.22 This software employs a support vector machine to predict changes in the protein’s thermodynamic stability (Gibbs free energy) with respect to single point mutations. The software was configured to predict the ΔΔG (ΔGmutant − ΔGwild‑type) using the primary sequence of the protein, due to the lack of an X-ray crystal structure, at pH 7.0 and a temperature of 25 °C. The ΔΔG values for W22G and C71Y mutations in the SAA4 protein were −3.1 and −0.66 kcal/mol, indicating that both of these novel mutations can potentially destabilize the structure of SAA4. On the basis of these two pieces of evidence, we reason that the W22G and C71Y mutations detected in SAA4 amyloidosis cases have potential clinical consequences. These two mutations would be prime candidates for collecting additional clinical and biophysical evidence needed to associate them with amyloidosis. We typically do not report these novel mutations to the clinicians unless there is compelling clinical evidence buttressing their pathogenicity.

Figure 2. Structural implications of novel mutations detected in SAA4 protein. MD simulations were performed for a homology molecular model. (A) Fraction of residues in an α-helix geometry across the three sequence contexts is shown. C71Y significantly increases the helical content of the ensemble, while W22G leads to a minor decrease. (B−D) Snapshots after the same amount of simulation time for the (B) wild-type, (C) C71Y, and (D) W22G structures using the highest scoring homology model. Structures are colored N- to C-terminus from red through white to blue; depth cueing fog is also employed. The two mutated positions are shown as spheres and labeled. Overall, the mutations destabilize the native SAA4 fold. Either a particular loop region is stabilized in a helical position, further kinking the longer helices (C71Y), or this same region becomes unstructured and longer helices become unkinked (W22G).

Database Augmentation Is Necessary To Detect Amyloidogenic Frame Shift Mutations
Frame shifted genes such as fibrinogen alpha (FIBA) are known to produce novel forms of proteins, which misfold and accumulate in various organs leading to amyloidosis.23 Amyloid shotgun proteomics generates all the information necessary to identify these frame shifted proteins. However, traditional protein identification searches often fail to detect these proteins because sequence databases employed for the search do not contain representative entries for the frame shifted proteins. To remedy this, we collected, from clinical literature, a total of nine amyloidogenic frame shift mutations in APOA1 and FIBA genes. Sequences were created for frame shifted portions of the corresponding genes, annotated appropriately, and appended to the SwissProt’s complete human proteome. Tandem mass spectra (MS/MS) from the amyloid deposits were matched against the augmented sequence database using Sequest, Mascot, and X!Tandem software. Scaffold software filtered the resulting peptide matches and assembled protein identifications. Proteins with more than one high-confidence peptide identification were considered to be present in the amyloid deposit. Frame shifted proteins were accepted into the final results only if they produced at least one high-confidence peptide identification (probability >0.9) from the novel sequence portion of the protein.

Figure 3. Frame shift mutations in apolipoprotein A1 (APOA1) and fibrinogen alpha (FIBA) chain. Regions with peptide evidence are highlighted in bold letters on yellow background. Red line highlights the novel sequence arising due to a frame shift. FIBA protein was truncated up to the frame shift in order to obtain a compact representation.

Figure 3A illustrates the protein sequence coverage map of the APOA1 His179 frame shift detected in the validation cohort. One might question the validity of this frame shift because we detected only one confident peptide identification matching the novel sequence portion of the protein. However, there are only two tryptic peptides in the frame shifted portion of the sequence. Additionally, we also confirmed this patient’s frame shift mutation via Sanger sequencing. Figure 3B illustrates the sequence coverage map for the fibrinogen alpha (FIBA) F512 frame shift detected in the clinical cohort. In contrast to the APOA1 frame shift, we detected multiple peptides matching the frame shifted portion of the FIBA protein and also a peptide bridging the native and frame shifted portions of the protein, which further raises the confidence in the validity of this frame shift. Technically, the frame shifted peptides detected by proteomics might not need orthogonal Sanger validation if the peptide sequences are idiosyncratic to the mutant protein. However, frame shifted peptides that share sequence homology with other wild type (human or non-human contaminant) proteins must be validated via DNA sequencing. Regardless, our clinical guidelines require Sanger confirmation of all pathogenic mutations prior to clinical reporting.
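The acceptance rule for frame shifted proteins (at least one peptide with probability above 0.9 from the novel sequence portion) can be expressed as a simple coordinate check. The sketch below is an illustration under assumptions: the `PeptideHit` record, the 1-based inclusive coordinates, and the function name `frameshift_evidence` are hypothetical, and the example hits are invented rather than taken from Figure 3.

```python
from typing import NamedTuple


class PeptideHit(NamedTuple):
    """A peptide identification mapped onto the frame shifted protein."""
    start: int          # 1-based position of the first residue
    end: int            # 1-based position of the last residue (inclusive)
    probability: float  # peptide probability from the filtering software


def frameshift_evidence(hits, shift_start, min_prob=0.9):
    """Summarize peptide support for the sequence at or after shift_start."""
    confident = [h for h in hits if h.probability > min_prob]
    # A peptide supports the frame shift if it covers novel sequence.
    novel = [h for h in confident if h.end >= shift_start]
    # Bridging peptides span the junction of native and novel sequence.
    bridging = [h for h in novel if h.start < shift_start]
    return {
        "accepted": len(novel) >= 1,
        "novel_peptides": len(novel),
        "bridging_peptides": len(bridging),
    }


# Invented example: the frame shift begins at residue 100.
hits = [
    PeptideHit(40, 52, 0.99),    # native portion only
    PeptideHit(95, 108, 0.95),   # bridges native and novel sequence
    PeptideHit(110, 121, 0.97),  # entirely within novel sequence
    PeptideHit(125, 133, 0.60),  # novel sequence, below the threshold
]
print(frameshift_evidence(hits, shift_start=100))
```

Counting bridging peptides separately mirrors the observation above that a peptide spanning the native and frame shifted portions raises confidence in the call.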
protein sequences while looking for unanticipated mass shifts in the experimental peptide due to amino acid mutations. This method can recover the mutations present in patient samples without prior knowledge. When both search strategies were configured to recover mutant peptides from the validation cohort though, the database search-based known mutation detection workflow had higher sensitivity (92% vs 82%) and specificity (100% vs 99%) when compared to the sequence tag search-based novel mutation detection workflow. This is expected because the sequence tag search probes a larger search space of potential mutations, which results in a slightly lower sensitivity and specificity for the method. Hence, mutations detected by the novel mutation detection workflow needs to be validated by independent Sanger sequencing before considering them for clinical use. Another caveat is that frame shift mutations and amino acid deletion mutations cannot be detected with the novel mutation detection workflow. This is because frame shifted peptide sequences are completely new and cannot be matched to a wild type sequence by allowing for single amino acid substitutions. The deletion mutations can shorten the peptide sequence by more than one amino acid, which is not accounted for by the traditional sequence tagbased mutation detection search engines. Why Use Proteomics Instead of Genomics for Amyloid Subtyping?
Amyloid deposits are purely proteinaceous, and the amyloidogenic protein present in the deposit is often produced elsewhere in the body. The deposits may be caused by abnormal folding of wild-type or mutated proteins. The mutations can be germline (hereditary), as seen in hereditary transthyretin amyloidosis (ATTR), or somatic, as seen in immunoglobulin light chain amyloidosis (AL). The presence of a germline mutation in the TTR gene or of somatic mutations in the immunoglobulin genes of clonal plasma cell neoplasms does not necessarily indicate that the mutation is pathogenic or that the abnormal protein produced by the mutation is a major constituent of the amyloid deposits. In this context, genetic testing by itself does not necessarily prove causality without phenotypic evidence from proteomics studies. In contrast, the proteomic method described in this study provides conclusive evidence that the abnormal protein is actually deposited in the amyloid plaques.
Known Mutation vs Novel Mutation Detection Workflows
The two workflows described in this article employ completely orthogonal search strategies for recovering amyloidogenic mutations from patient samples. The known mutation detection workflow employs a database search strategy, which matches the experimental peptide mass spectra (MS/MS) against an augmented protein sequence database containing peptide sequence entries for all known amyloidogenic mutations. This strategy fails when the patient mutation has no sequence representation in the database. In contrast, the novel mutation detection workflow uses a sequence tag-based search strategy. This method derives short sequence tags from the MS/MS, and the tags are reconciled against the wild type
dx.doi.org/10.1021/pr4011475 | J. Proteome Res. 2014, 13, 2352−2358
■
(6) Vrana, J. A.; Gamez, J. D.; Madden, B. J.; Theis, J. D.; Bergen, H. R., 3rd; Dogan, A. Classification of amyloidosis by laser microdissection and mass spectrometry-based proteomic analysis in clinical biopsy specimens. Blood 2009, 114 (24), 4957−9.
(7) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976−989.
(8) Searle, B. C. Scaffold: a bioinformatic tool for validating MS/MS-based proteomic studies. Proteomics 2010, 10 (6), 1265−9.
(9) Zenka, R. M.; Johnson, K. L.; Bergen, H. R. Exploring Proteomics Metadata Using Spotfire and a Companion User Interface; American Society of Mass Spectrometry: Salt Lake City, 2011.
(10) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M. Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012, 30 (10), 918−20.
(11) Mottaz, A.; David, F. P.; Veuthey, A. L.; Yip, Y. L. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar. Bioinformatics 2010, 26 (6), 851−2.
(12) Luu, T. D.; Rusu, A. M.; Walter, V.; Ripp, R.; Moulinier, L.; Muller, J.; Toursel, T.; Thompson, J. D.; Poch, O.; Nguyen, H. MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database 2012, 2012, bas018.
(13) Fenyo, D.; Beavis, R. C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 2003, 75 (4), 768−74.
(14) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551−67.
(15) Tabb, D. L.; Ma, Z. Q.; Martin, D. B.; Ham, A. J.; Chambers, M. C. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J. Proteome Res. 2008, 7 (9), 3838−46.
(16) Dasari, S.; Chambers, M. C.; Slebos, R. J.; Zimmerman, L. J.; Ham, A. J.; Tabb, D. L. TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 2010, 9 (4), 1716−26.
(17) Ma, Z. Q.; Dasari, S.; Chambers, M. C.; Litton, M. D.; Sobecki, S. M.; Zimmerman, L. J.; Halvey, P. J.; Schilling, B.; Drake, P. M.; Gibson, B. W.; Tabb, D. L. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J. Proteome Res. 2009, 8 (8), 3872−81.
(18) Zhang, B.; Chambers, M. C.; Tabb, D. L. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J. Proteome Res. 2007, 6 (9), 3549−57.
(19) Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinf. 2008, 9, 40.
(20) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26 (16), 1781−802.
(21) Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. J. Mol. Graphics 1996, 14 (1), 33−8, 27−8.
(22) Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005, 33 (Web Server issue), W306−10.
(23) Hamidi Asl, L.; Liepnieks, J. J.; Uemichi, T.; Rebibou, J. M.; Justrabo, E.; Droz, D.; Mousson, C.; Chalopin, J. M.; Benson, M. D.; Delpech, M.; Grateau, G. Renal amyloidosis with a frame shift mutation in fibrinogen Aα-chain gene producing a novel amyloid protein. Blood 1997, 90 (12), 4799−805.
CONCLUSION
In 1994, Mann and Wilm described an error-tolerant method for detecting amino acid mutations from tandem mass spectra. In the ensuing two decades, numerous approaches have been developed for improving the detection of mutant peptides from shotgun proteomics data sets. However, most of these methods have been confined to research and have never been translated into actual patient care. In this work, we describe a two-pronged informatics workflow for detecting known and novel mutations present in amyloidogenic proteins. The database search-based known mutation detection workflow was implemented in a CAP/CLIA clinical testing laboratory for routine use in patient care. The sequence tag-based novel mutation detection workflow was implemented in a clinical research setting for detecting novel amyloidogenic mutations. Even though much work remains before shotgun proteomics-based mutation detection becomes routine in a clinical laboratory, we believe that our implementation of the workflow to detect amyloidogenic mutations is a step in the right direction.
■
ASSOCIATED CONTENT
Supporting Information
Parameters used for all the search engines, peptide filtering software, and protein assembly software; peptide sequence entries of the amyloid mutation database. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
*Tel: 507-284-0513. Fax: 507-284-0360. E-mail: Dasari.[email protected].
Present Address
⊥Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY.
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
S. Dasari was supported by the Center for Individualized Medicine at the Mayo Clinic and the Department of Laboratory Medicine and Pathology (DLMP), Mayo Clinic. J.D.T., J.A.V., P.J.K., and A.D. were supported by the DLMP, Mayo Clinic.
■
REFERENCES
(1) Sipe, J. D.; Cohen, A. S. Review: history of the amyloid fibril. J. Struct. Biol. 2000, 130 (2−3), 88−98.
(2) Sipe, J. D.; Benson, M. D.; Buxbaum, J. N.; Ikeda, S.; Merlini, G.; Saraiva, M. J.; Westermark, P. Amyloid fibril protein nomenclature: 2010 recommendations from the nomenclature committee of the International Society of Amyloidosis. Amyloid 2010, 17 (3−4), 101−4.
(3) Brambilla, F.; Lavatelli, F.; Di Silvestre, D.; Valentini, V.; Rossi, R.; Palladini, G.; Obici, L.; Verga, L.; Mauri, P.; Merlini, G. Reliable typing of systemic amyloidoses through proteomic analysis of subcutaneous adipose tissue. Blood 2012, 119 (8), 1844−7.
(4) Liao, L.; Cheng, D.; Wang, J.; Duong, D. M.; Losik, T. G.; Gearing, M.; Rees, H. D.; Lah, J. J.; Levey, A. I.; Peng, J. Proteomic characterization of postmortem amyloid plaques isolated by laser capture microdissection. J. Biol. Chem. 2004, 279 (35), 37061−8.
(5) Murphy, C. L.; Wang, S.; Williams, T.; Weiss, D. T.; Solomon, A. Characterization of systemic amyloid deposits by mass spectrometry. Methods Enzymol. 2006, 412, 48−62.