Article pubs.acs.org/jpr

Quantitative Differences in the Urinary Proteome of Siblings Discordant for Type 1 Diabetes Include Lysosomal Enzymes

Moo-Jin Suh,† Andrey Tovchigrechko,† Vishal Thovarai,† Melanie A. Rolfe,† Manolito G. Torralba,† Junmin Wang,† Joshua N. Adkins,‡ Bobbie-Jo M. Webb-Robertson,‡ Whitney Osborne,§ Fran R. Cogen,§ Paul B. Kaplowitz,§ Thomas O. Metz,‡ Karen E. Nelson,† Ramana Madupu,† and Rembert Pieper*,†

† J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, Maryland 20850, United States
‡ Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, Richland, Washington 99352, United States
§ Children’s National Medical Center, 111 Michigan Avenue North West, Washington, DC 20010, United States


ABSTRACT: Individuals with type 1 diabetes (T1D) often have higher than normal blood glucose levels, causing advanced glycation end product formation and inflammation and increasing the risk of vascular complications years or decades later. To examine the urinary proteome in juveniles with T1D for signatures indicative of inflammatory consequences of hyperglycemia, we profiled the proteome of 40 T1D patients, on average 6.3 years after disease onset and with normal or elevated HbA1C levels, in comparison with a cohort of 41 healthy siblings. Using shotgun proteomics, 1036 proteins were identified, on average, per experiment, and 50 proteins showed significant abundance differences using a Wilcoxon signed-rank test (FDR q-value ≤ 0.05). Thirteen lysosomal proteins were increased in abundance in the T1D versus control cohort. Fifteen proteins with functional roles in vascular permeability and adhesion were quantitatively changed, including CD166 antigen and angiotensin-converting enzyme 2. α-N-Acetyl-galactosaminidase and α-fucosidase 2, two differentially abundant lysosomal enzymes, were detected in western blots, often with elevated quantities in the T1D versus control cohort. Increased release of proteins derived from lysosomes and vascular epithelium into urine may result from hyperglycemia-associated inflammation in the kidney vasculature.

KEYWORDS: Urinary proteome, type 1 diabetes, protein biomarker, shotgun proteomics, diabetic vasculature, endothelial barrier function, lysosomal enzyme, α-fucosidase, CD166



Received: January 22, 2015; Published: July 5, 2015
© 2015 American Chemical Society
DOI: 10.1021/acs.jproteome.5b00052, J. Proteome Res. 2015, 14, 3123−3135

INTRODUCTION

Type 1 diabetes (T1D) mellitus has a prevalence of 1 in every 300−500 children in the United States.1 The prevalence of T1D in Americans under age 20 rose by 23% between 2001 and 2009,2 and the incidence of T1D among children under 14 worldwide is estimated to increase by 3% annually.3 The rising incidence and the decreasing age of onset among individuals previously considered to be at moderate genetic risk suggest that the threshold for developing the disease is now lower. Autoimmune destruction of the insulin-producing pancreatic β-cells distinguishes T1D from type 2 diabetes (T2D) mellitus. The etiology of T1D involves complex interactions between genetic and environmental determinants. Animal models support the notion of a multifactorial process of disease onset.4 Evidence is emerging that autoimmunity is preceded by a persistent pro-inflammatory state,5 possibly influenced by intestinal microbiota.6 Markers of β-cell autoimmune destruction include antibodies against glutamic acid decarboxylase-65, insulin, protein tyrosine phosphatase ICA 512, and the zinc transporter Slc30A8.7 The apparent heterogeneity in causes of T1D has resulted in research using genomics,8 proteomics,9 metabolomics,10 and microbiome analysis11 in search of initiating events in the development of autoimmunity. Clinical studies and animal models have linked viral infections to the subsequent development of T1D.12 Proteomic technologies have been used to identify candidate biomarkers for the prediction of T1D onset and the diagnosis of diabetic complications in preclinical stages.13,14 Platelet basic protein and protease C1 inhibitor were found to discriminate T1D from healthy human cohorts in serum, suggesting a role of innate immunity in the emergence of the disease.9 Haptoglobin was identified as an early T1D onset biomarker in serum in a virally induced rodent model of T1D.15 The current methods of choice for early diagnosis of diabetic complications are measurements of albumin and the albumin/creatinine ratio in urine. Microalbuminuria is defined as the urinary excretion of 30−300 mg of albumin per day.16

Chronic hyperglycemia associated with T1D can lead to vascular complications such as retinopathy, nephropathy, autonomic and peripheral neuropathy, and cardiovascular disease.17 These complications develop gradually and progress without symptoms for years or decades after T1D onset. Frequently elevated glucose levels perturb both carbohydrate and lipid metabolism in cells and cause inflammation, which is in part associated with the nonenzymatic, covalent modification of proteins by carbonyl moieties of sugars and peroxidized lipids. Such advanced glycation and lipoxidation end products (AGE/ALE) activate inflammatory pathways by binding to cell surface receptors localized on immune cells.18,19 Modified proteins and adducts, such as carboxymethyllysine and pentosidine, alter intracellular signaling and lead to NF-κB induction, followed by increased production of reactive oxygen species, pro-inflammatory cytokines, and cell adhesion molecules.19 AGEs and ALEs mediate cross-linking of extracellular matrix proteins, thus stiffening the vasculature.19 Tissues affected by inflammation implicating these adducts have either long protein lifetimes, such as the retina of the eye and neurons,20 or a high metabolic burden, such as the kidneys.21 An AGE-associated biomarker clinically used to assess hyperglycemia and manage diabetes in patients is hemoglobin A1C (HbA1C).22 A recent review of the Diabetes Control and Complications Trial (DCCT) and the follow-on Epidemiology of Diabetes Interventions and Complications (EDIC) study revealed that median HbA1C levels in an intensive therapy cohort treated with insulin over 6.5 years on average were lower than those in a control cohort. After all patients received intensive insulin therapy, the HbA1C differences declined over 5 years and were not discernible thereafter, even though the risk of diabetic complications in the DCCT cohort originally receiving intensive insulin therapy continued to be lower.23 Early pathophysiological changes in an era of widely available intensive insulin therapy for T1D patients have not been examined in depth.
This is of interest because hyper- and hypoglycemia, as well as the adherence of juvenile T1D patients to insulin therapy and diet regimens, continue to be clinical challenges. Body fluid samples available noninvasively, such as urine or saliva, are desirable for the discovery of biomarkers to predict and assess the long-term effects of intermittent and frequent hyperglycemia. Surveying urinary proteins via mass spectrometry is attractive because the kidney both produces urine and is one of the organs functionally perturbed in many patients with long-term diabetes. Proteomic methods have been used in urinary biomarker discovery studies for diabetic complications, including renal function decline in T1D and T2D patients.13,24,25 We conducted a comprehensive urinary proteomic survey of normoalbuminuric T1D patients without symptoms of diabetic complications to expand the knowledge of protein abundance changes potentially linked to intermittent and/or frequent hyperglycemia, including low-abundance proteins (approximately 1 nM) not previously surveyed in the urine of diabetic patients. We chose healthy siblings as controls for juvenile T1D patients treated with standard insulin therapy, with 40 T1D patients and 41 siblings available for the two cohorts. The purpose of this study was not to discover early stage biomarkers for diabetic complications. Rather, the intention was to examine whether urinary proteome signatures in young normoalbuminuric T1D patients would allow us to establish links to the molecular pathophysiology of intermittent hyperglycemia, inflammation, and/or AGE formation.


EXPERIMENTAL SECTION

Human Subjects, Urine Sample Collection, and Processing

Urine specimens were collected from T1D patients and healthy siblings associated with Children’s National Medical Center (CNMC). The Childhood and Adolescent Diabetes program at the CNMC is the largest center in the mid-Atlantic region of the U.S.A., caring for approximately 1600 juvenile patients with T1D. A comprehensive human subject protocol was developed to protect patient confidentiality, including all clinical and molecular metadata collected during recruitment, and to ensure that the specimen collection procedures posed no direct health risks. A human subject consent form was designed to give the to-be-enrolled sibling pairs and their parents/guardians (for subjects less than 18 years of age) sufficient information to weigh the public benefits of the research against the risks to confidentiality and health. The consent form and protocol were approved by the Internal Review Boards of J. Craig Venter Institute (JCVI) and CNMC. JCVI staff had access to patient identifier codes but not to the names or addresses of the study participants. The urine specimens were stored for up to 4 weeks at −20 °C at the CNMC clinical site and transferred to the JCVI for storage at −80 °C. Urine samples were neutralized by adding 1 M Tris-HCl buffer (pH 7.8) to a final concentration of 50 mM, followed by centrifugation at 3000g for 15 min at 10 °C. The urine supernatant fraction (USF) was concentrated via centrifugation using an Amicon-Ultra tube (10 kDa MWCO; Millipore, Bedford, MA) at 10 °C. The concentrated USF sample (approximately 1.0 mL) was exchanged into 50 mM sodium phosphate buffer (pH 7.8). Aliquots were taken for total protein analysis using the Bradford assay26 and SDS-PAGE in 4−12% Bis-Tris gels (Invitrogen, Carlsbad, CA). The remaining USF sample was frozen at −80 °C until used for shotgun proteomic analysis.

Proteolytic Digestion of USF Samples

The sample was subjected to filter-aided sample preparation (FASP) as previously described, using a 30 kDa Microcon filtration device (Sartorius Biotech, NY).27 Briefly, approximately 200 μg of protein was heat-denatured at 95 °C, reduced with DTT, alkylated with 50 mM iodoacetamide for 30 min at 20 °C in the dark, and digested with 2 μg of trypsin (sequencing grade modified trypsin, Promega, Madison, WI) overnight at 37 °C. This was followed by a second 3 h digestion with 0.2 μg of trypsin and collection of the tryptic digestion products in the filtrate. The urinary protein digestion products (peptide mixture) were lyophilized and frozen at −80 °C.

Off-Line Reversed-Phase Liquid Chromatography of Peptide Mixtures

The urinary peptide mixture was suspended in 20 mM ammonium acetate (pH 6.5) with 1% AcN (solvent A) and fractionated by HPLC on a nonporous Inertsil C18 reversed-phase column (4.6 mm × 15 cm Inertsil ODS-4, GL Sciences, Rolling Hills Estates, CA) at a flow rate of 0.5 mL/min. Fractions were collected throughout the 96 min HPLC run, which used the following program: 100% solvent A for 10 min; a first linear gradient to 50% solvent B (20 mM ammonium acetate, pH 6.5, with 70% AcN) over 60 min; a second linear gradient from 50% to 100% solvent B over 10 min; and isocratic flow at 100% solvent B for 16 min. Consecutive fractions with low A220 traces were combined to yield a total of 15−19 fractions. These were lyophilized twice, with one resuspension step in 100 μL of deionized water.
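The off-line fractionation program above is piecewise linear in %B. As a sketch, assuming ideal linear mixing with no dwell volume (real HPLC systems only approximate this), the nominal %B and the resulting acetonitrile content (solvent A at 1% AcN, solvent B at 70% AcN) at any time point can be computed as follows. The function and constant names are illustrative and not part of the original protocol.

```python
# Hypothetical helper: nominal composition of the off-line RP-HPLC gradient.
# Segments are (start_min, end_min, %B at start, %B at end), taken from the text.
SEGMENTS = [
    (0.0, 10.0, 0.0, 0.0),        # 100% solvent A
    (10.0, 70.0, 0.0, 50.0),      # first linear gradient to 50% B
    (70.0, 80.0, 50.0, 100.0),    # second linear gradient to 100% B
    (80.0, 96.0, 100.0, 100.0),   # isocratic 100% B
]
ACN_IN_A = 1.0   # % acetonitrile in solvent A
ACN_IN_B = 70.0  # % acetonitrile in solvent B


def percent_b(t_min: float) -> float:
    """Nominal %B at time t (minutes), assuming ideal linear mixing and no dwell volume."""
    for start, end, b_start, b_end in SEGMENTS:
        if start <= t_min <= end:
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)
    raise ValueError(f"{t_min} min is outside the 0-96 min run")


def percent_acn(t_min: float) -> float:
    """Nominal % acetonitrile reaching the column at time t."""
    frac_b = percent_b(t_min) / 100.0
    return (1.0 - frac_b) * ACN_IN_A + frac_b * ACN_IN_B


if __name__ == "__main__":
    for t in (0, 10, 40, 70, 75, 80, 96):
        print(f"{t:>3} min: {percent_b(t):5.1f}% B, {percent_acn(t):5.2f}% AcN")
```

For example, 40 min into the run corresponds to 25% B, i.e., about 18% AcN, which is where many tryptic peptides elute from C18 phases.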


Nano C18 LC−MS/MS Analysis

All HPLC peptide fractions were suspended in 0.1% formic acid (solvent C). Nanoflow LC−MS/MS experiments were performed using an LTQ-Velos Pro ion-trap mass spectrometer coupled to an Easy-nLC II system with a FLEX nanoelectrospray ion source at the interface (Thermo Scientific, San Jose, CA). All fractions derived from a single USF sample were analyzed in one sequence. Peptides were trapped on a C18 Easy-column (100 μm × 2 cm, 5 μm, 120 Å, Thermo Scientific) using solvent C and separated via nano C18 LC (PicoFrit C18 column, 75 μm × 10 cm, 5 μm, 150 Å, New Objective, Woburn, MA) at a flow rate of 300 nL/min. A linear gradient from 2 to 40% solvent D (0.1% formic acid in AcN) over 58 min was followed by a second gradient from 40 to 80% solvent D over 22 min and re-equilibration with solvent C for 5 min. Electrospray ionization was achieved by applying 2.0 kV distally via a liquid junction. The LTQ-Velos Pro MS survey scan was performed in the ion trap (MS1), acquiring spectra over a mass range of m/z 380−1800, followed by data-dependent MS2 scans for 12 precursor ions. The fragmentation mode was collision-activated dissociation (CID) with a normalized collision energy of 35%. Dynamic exclusion was enabled: MS2 scans of a given precursor were repeated once, after which the precursor was excluded from further analysis for 20 s.

Database Searches with the Mascot Search Engine

Mascot searches were automated using Mascot Daemon (Matrix Science) with precursor and peptide fragment tolerances of ±1.4 and ±0.8 Da, respectively. The data were searched against a database composed of all curated protein entries in the human proteome database (UniProtKB release 2013-09-18). The FASTA file contained 20 257 protein sequences. Carbamidomethylation of cysteine was set as a fixed modification; methionine oxidation and protein N-terminal acetylation were set as variable modifications. Peptide charge states 1+, 2+, and 3+ were considered, and up to two missed tryptic cleavages were allowed. Peptide false discovery rates (FDRs) were determined by searching against the reversed-sequence decoy version of the human proteome database using the decoy search function of the Mascot software. Searches were performed using a single Mascot generic format (.MGF) file, which was derived from the combination of the MS data files of the 15−19 fractions associated with a distinct USF sample. Mascot Percolator was used as a postprocessing data analysis step to improve the sensitivity of the search and enforce a peptide spectrum match (PSM)-level q-value threshold of […] 30 kDa and good tryptic peptide representations. Observation of tryptic peptides in the training data set was correlated with peptide properties, resulting in the computation of Oi values. The APEXi value for protein i from a 2D-LC−MS/MS data set was calculated as follows:

$$\mathrm{APEX}_i = \frac{P_i\, n_i / O_i}{\sum_{k=1}^{N} P_k\, n_k / O_k} \times C$$

where Oi is the computed correction factor, Pi is the probability score of protein identification in ProteinProphet, ni is the count of PSMs, N is the number of proteins in the data set, and C is an optional concentration factor. With the protein FDR set at 1%, proteins identified at a 99% confidence level were used for spectral counting. The APEXi scores were multiplied by a concentration factor C of 5 × 10^6, although this computation did not translate into copy numbers per cell because the proteins were derived from the cell-free, soluble fraction of urine. We post-processed the APEX output by assembling proteins that shared ≥50% of their in silico digested tryptic peptides into protein groups, because separate analysis of proteins sharing ≥50% of their identified peptide content does not yield quantitatively reliable information. Technical replicates of APEX data sets, in most cases two shotgun proteomic analyses, were averaged. The resulting 43 data sets each for the T1D subjects and the healthy siblings pertained to 40 and 41 individuals (subject IDs), respectively, because more than one urine sample was collected from the same individual in five cases. To investigate whether specific biological process categories were enriched among the top-ranked differentially abundant proteins, the bioinformatics software tool PANTHER32 was used.
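The APEX normalization defined above can be written directly from its formula. The sketch below assumes that the per-protein inputs (ProteinProphet probability Pi, PSM count ni, and APEX correction factor Oi) are already available; the dictionary layout and the function name `apex_scores` are hypothetical, and computing Oi itself (which requires the trained APEX observability model) is not shown.

```python
from typing import Dict, Tuple


def apex_scores(proteins: Dict[str, Tuple[float, int, float]],
                c: float = 5e6) -> Dict[str, float]:
    """APEX_i = (P_i * n_i / O_i) / sum_k(P_k * n_k / O_k) * C.

    proteins maps a protein ID to (P, n, O):
      P: ProteinProphet identification probability
      n: number of PSMs (spectral count)
      O: APEX correction factor for the protein
    """
    # Probability-weighted spectral counts, corrected for expected observability
    weighted = {pid: p * n / o for pid, (p, n, o) in proteins.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no spectral evidence: sum of P*n/O is zero")
    # Normalize to fractions of the total and scale by the concentration factor C
    return {pid: w / total * c for pid, w in weighted.items()}


# Invented inputs for illustration only
scores = apex_scores({"A": (1.0, 10, 0.5), "B": (0.99, 30, 1.5), "C": (1.0, 5, 0.25)})
```

Because every score is a fraction of the same total multiplied by C, the scores of one data set always sum to C (here 5 × 10^6), which is why they behave as relative abundances rather than copy numbers.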
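The reversed-sequence decoy search and the PSM-level q-value threshold rest on target-decoy FDR estimation. A minimal sketch, assuming each PSM is reduced to a (score, is_decoy) pair with higher scores being better and using the simple decoy/target ratio estimator; this is not the internal implementation of Mascot or Mascot Percolator, which differ in their scoring models, and tied scores are not grouped here.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return a q-value for each PSM, in input order.

    psms: (score, is_decoy) pairs; higher scores are better.
    FDR at a score threshold is estimated as (#decoys >= t) / (#targets >= t);
    the q-value is the smallest FDR at which the PSM would still be accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # walk from best to worst score
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs[i] = decoys / max(targets, 1)
    # q-value: running minimum of the FDR, taken from the worst score upward
    qvals = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return qvals


example = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
qvals = target_decoy_qvalues(example)
```

Accepting all PSMs with a q-value at or below a chosen threshold then bounds the estimated FDR of the accepted set at that threshold.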


[…] the R function p.adjust; and several types of effect size (reported and described in Supporting Information Dataset S1). Separately, we applied RankingWilcoxon using the Wilcoxon rank sum test with unpaired case and control observations and followed the same stability ranking protocol. Second, we applied a stability selection approach as implemented in ref 35. This feature selection method implements the stability selection procedure of ref 36 with the improved error bounds described in ref 37. Elastic net (from R package glmnet38) was used as the base feature selection method wrapped by the stability protocol, building a binomial family model with the case/control status as the response and the matrix of protein abundance values as predictors. The mixing parameter α of glmnet was set to 1.0 (which corresponds to the pure lasso penalty), as selected by 15-fold cross-validation minimizing deviance on the full data set. The predictors were standardized to zero mean and unit variance. Because its base feature selection method is multivariate, this protocol can potentially detect correlated groups of biologically relevant proteins that the univariate Wilcoxon test would miss. The ranking of proteins and their probability of being selected into the model were reported, as well as the probability cutoff corresponding to the per-family error rate (PFER) controlled by this method. Our PFER cutoff was set to 0.05, and the target number of features selected by the base classifier was set to 8. In our experience with omics data sets, the PFER control in this method is fairly conservative, and we typically examine the ranking of proteins rather than concentrating only on proteins that pass the PFER cutoff.
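The stability selection protocol above can be sketched as follows, under loudly stated assumptions. The base selector here is a hypothetical univariate stand-in (largest absolute difference in class means), not the glmnet lasso used in the study, so it cannot capture the correlated-group behavior described above; class-stratified half-subsampling is also an assumption. The cutoff function uses the original Meinshausen−Bühlmann bound E[V] ≤ q²/((2π_thr − 1)p), whereas the study used the improved bounds of ref 37, so real cutoffs will differ.

```python
import random
from typing import Callable, List, Sequence

# A base selector takes (rows, labels, q) and returns the indices of q features.
Selector = Callable[[Sequence[Sequence[float]], Sequence[int], int], List[int]]


def mean_diff_selector(X, y, q):
    """Hypothetical stand-in for the lasso: the q features with the largest
    absolute difference in class means (labels 1 = case, 0 = control)."""
    cases = [row for row, lab in zip(X, y) if lab == 1]
    ctrls = [row for row, lab in zip(X, y) if lab == 0]

    def score(j):
        m1 = sum(r[j] for r in cases) / len(cases)
        m0 = sum(r[j] for r in ctrls) / len(ctrls)
        return abs(m1 - m0)

    return sorted(range(len(X[0])), key=score, reverse=True)[:q]


def stability_selection(X, y, selector: Selector, q: int = 8,
                        n_subsamples: int = 100, seed: int = 0) -> List[float]:
    """Selection frequency of each feature over class-stratified half-subsamples."""
    rng = random.Random(seed)
    case_idx = [i for i, lab in enumerate(y) if lab == 1]
    ctrl_idx = [i for i, lab in enumerate(y) if lab == 0]
    counts = [0] * len(X[0])
    for _ in range(n_subsamples):
        idx = (rng.sample(case_idx, len(case_idx) // 2)
               + rng.sample(ctrl_idx, len(ctrl_idx) // 2))
        for j in selector([X[i] for i in idx], [y[i] for i in idx], q):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def mb_threshold(q: int, p: int, pfer: float) -> float:
    """Selection-probability cutoff pi_thr solving q^2 / ((2*pi_thr - 1) * p) = pfer.
    A value above 1 means the requested PFER cannot be guaranteed for this q and p."""
    return 0.5 + q * q / (2.0 * p * pfer)
```

With q = 8 and PFER = 0.05, this older bound requires at least 1280 candidate proteins before any cutoff ≤ 1 exists, which illustrates why PFER control with small q and p is conservative and why the ranking itself is often more informative.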
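The q-values in this study were computed with R's p.adjust. The adjustment method is not named in this excerpt, so, assuming the Benjamini−Hochberg procedure (p.adjust method "BH"), the adjustment can be sketched in Python as follows; this mirrors the BH definition but is not the R function itself.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m/rank and enforcing
    # monotonicity so that a smaller p-value never gets a larger q-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

In R, p.adjust(c(0.01, 0.04, 0.03, 0.005), "BH") gives 0.02, 0.04, 0.04, 0.02, which this sketch reproduces up to floating-point rounding.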

Statistical Methods: Overall Similarity of Urinary Proteomes Using Intrafamily and Interfamily Comparisons

We performed two types of analysis on the matrix of pairwise dissimilarities between abundance profiles (78 profiles, aggregated and filtered as described above). First, we applied the PermANOVA (permutation-based analysis of variance)39 test of statistical significance of the association between the profile dissimilarities and the case/control condition, as implemented in the adonis function of the R vegan package.40 We used the Bray−Curtis dissimilarity index41 and 4000 permutations. Before computing the dissimilarity matrix, the abundances of individual proteins were standardized with the range method of the decostand function in package vegan. This test was done separately with and without pairing observations by families. Pairing by families was done by setting the strata parameter of adonis to the family ID. Second, we tested whether the abundance profiles within the same family were more similar than the profiles between families. To control for the difference associated with the main condition in our cohort, we considered only the profile−profile Bray−Curtis dissimilarities in case−control pairs. The null hypothesis was that the observed difference between “between” and “within” family dissimilarities was consistent with what would be expected if the family structure were assigned to the subjects at random. The alternative hypothesis was that the between/within difference was larger than expected from a random family assignment. We simulated 4000 matrices in which both between and within dissimilarities came from the null distribution by permuting the family ID labels of the cases in the original dissimilarity matrix. The rank biserial correlation42 statistic was computed between the between and within sets of dissimilarities in both the observed and the simulated matrices. Positive values of this statistic indicate that between-family dissimilarities are stochastically larger than within-family dissimilarities. The p-value was estimated as the fraction of simulated statistic values that were as high as or higher than the observed value. The empirical density distributions for between and within dissimilarities were plotted using the geom_density method of the R ggplot2 package.43

The code that generated the analysis results reported here, starting from the APEX-normalized abundance table, is available in the public repository of our open-source analysis package MGSAT.44 The specific commit ID is 87c1fe8, and the high-level driver script is t1d_proteomics_project.r. MGSAT is written in R. It applies several types of statistical tests, normalizations, and plotting routines to the abundance matrices that are typically the output of annotating (meta)omics data sets, and it generates a structured HTML report that, in addition to results, shows method parameters and versions of the external packages. The user has extensive control over the types of tests, parameters, and the description of the study design through a data structure that is provided as input to the top-level routine of the package. For this article, the archive of the full analysis report generated by MGSAT is included as Supporting Information Dataset S1, along with the input APEX abundance matrix and study metadata files.

Western Blot Analysis

USF samples from T1D subjects and siblings were separated in 4−12%T SDS-PAGE gels in MES buffer and transferred to a poly(vinylidene difluoride) (PVDF) membrane at 30 V for 60−90 min. Blotting quality was assessed by briefly staining protein bands with Ponceau S. The loading quantity was normalized such that each USF sample was represented by 15−20 μg of protein. Membrane incubation with TBST (10 mM Tris, pH 8.0, 150 mM NaCl, and 0.05% Tween 20) supplemented with 5% nonfat dry milk for at least 1 h was followed by incubation with a primary antibody dilution for 2 h, three TBST wash cycles, incubation with a secondary antibody−HRP conjugate dilution for 1 h, and three TBST wash cycles and one TBS wash cycle. Incubation and wash steps (5 min) were performed at 20 °C. Immunoreactive proteins were visualized using the SuperSignal West Pico enhanced chemiluminescence substrate (Thermo Fisher, USA) and exposure to autoradiography film for 30−120 min. Western blot antibodies, either monoclonal (mAb) or polyclonal (pAb), included an anti-collectin-12 mAb and an anti-apolipoprotein M mAb (from Abcam, Cambridge, MA), an anti-plasma α-fucosidase pAb and an anti-CD166 mAb (from Thermo Fisher), and an anti-α-N-acetyl-galactosaminidase pAb (from Santa Cruz Biotechnology, Dallas, TX). Primary antibodies were used at 1:1000 to 1:2500 dilutions. Secondary antibody−HRP conjugates (goat anti-mouse IgG, goat anti-rabbit IgG, or donkey anti-goat IgG) were used at 1:5000 to 1:10000 dilutions. Scanned films of FUCA2 and NAGA western blots were analyzed with the Proteomweaver image analysis software tool (Bio-Rad, Hercules, CA) to obtain spot-volumetric data. Wilcoxon rank sum tests were used to determine the significance of protein band intensity changes. One purpose of the western blots was to determine whether the proteins were represented by full-length proteins and/or urine-specific fragments.
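The within- versus between-family analysis described under Statistical Methods (range standardization, Bray−Curtis dissimilarity, rank biserial correlation, and permutation of the case family labels) can be sketched in plain Python. This is an illustrative reimplementation, not the MGSAT or vegan code; the function names are hypothetical, and mapping constant columns to 0 during range standardization is an assumption of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def range_standardize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each protein (column) to [0, 1]; constant columns become 0 here."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(r, lo, hi)]
            for r in rows]


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    denom = sum(x + y for x, y in zip(a, b))
    return 0.0 if denom == 0 else sum(abs(x - y) for x, y in zip(a, b)) / denom


def rank_biserial(between: Sequence[float], within: Sequence[float]) -> float:
    """2U/(n1*n2) - 1, with U counting between > within pairs (ties count 1/2).
    Positive values: between-family dissimilarities tend to be larger."""
    u = 0.0
    for x in between:
        for y in within:
            u += 1.0 if x > y else 0.5 if x == y else 0.0
    return 2.0 * u / (len(between) * len(within)) - 1.0


def split_pairs(d, case_fam, ctrl_fam) -> Tuple[List[float], List[float]]:
    """Split case-control dissimilarities d[i][j] into (between, within) sets."""
    within, between = [], []
    for i, fi in enumerate(case_fam):
        for j, fj in enumerate(ctrl_fam):
            (within if fi == fj else between).append(d[i][j])
    return between, within


def family_permutation_test(d, case_fam, ctrl_fam, n_perm=4000, seed=0):
    """Permute the family labels of the cases. Returns (observed statistic,
    p excluding the observed ordering, p including it)."""
    rng = random.Random(seed)
    observed = rank_biserial(*split_pairs(d, case_fam, ctrl_fam))
    labels = list(case_fam)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if rank_biserial(*split_pairs(d, labels, ctrl_fam)) >= observed:
            hits += 1
    return observed, hits / n_perm, (hits + 1) / (n_perm + 1)
```

The two returned p-values correspond to the two conventions mentioned in the Results: with 4000 permutations and no simulated statistic reaching the observed value, they are 0 and 1/4001 ≈ 0.00025.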
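The PermANOVA test rests on a pseudo-F statistic computed directly from the dissimilarity matrix (Anderson's formulation, on which adonis is based), with R² given by the among-group share of the total sum of squares. The sketch below computes both for a one-factor design and estimates a permutation p-value, optionally shuffling labels only within strata (for example, within families), which mirrors the role of the strata argument. It is a didactic simplification, not vegan's implementation, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Optional, Sequence, Tuple


def pseudo_f(d: Sequence[Sequence[float]],
             groups: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (pseudo-F, R^2) for a one-factor PERMANOVA on dissimilarity matrix d."""
    n = len(groups)
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    if ss_total == 0:
        raise ValueError("all dissimilarities are zero")
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    ss_within = 0.0
    for idx in members.values():
        ss_within += sum(d[i][j] ** 2 for a, i in enumerate(idx)
                         for j in idx[a + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    k = len(members)
    if ss_within == 0:
        return float("inf"), ss_among / ss_total
    f = (ss_among / (k - 1)) / (ss_within / (n - k))
    return f, ss_among / ss_total


def permanova(d, groups, strata: Optional[Sequence[Hashable]] = None,
              n_perm: int = 999, seed: int = 0):
    """Permutation p-value for pseudo-F. If strata is given, labels are shuffled
    only within each stratum; the observed value is counted among the permutations."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(d, groups)
    blocks = defaultdict(list)
    for i, s in enumerate(strata if strata is not None else [0] * len(groups)):
        blocks[s].append(i)
    hits = 0
    for _ in range(n_perm):
        perm = list(groups)
        for idx in blocks.values():
            labs = [groups[i] for i in idx]
            rng.shuffle(labs)
            for i, lab in zip(idx, labs):
                perm[i] = lab
        if pseudo_f(d, perm)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)
```

Restricting permutations to families only swaps case and control labels within a sibling group, so family-level differences no longer inflate the null distribution; this is the mechanism behind the lower p-value reported for the paired test.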



RESULTS

Cohorts and Clinical Data Overview

The urine specimens analyzed in this study were derived from 39 families, including 40 children/adolescents diagnosed with T1D, who averaged a disease duration of 6.3 years at the time of recruitment, and 41 healthy siblings approximately matched by age (Table 1). Two families with more than two siblings were included, one with three siblings and another with four siblings (one and two with T1D, respectively). Data on the autoantibody-positive or -negative status of healthy siblings were not consistently available, which precluded identification of individuals at risk of developing T1D. All T1D patients were treated with injectable insulin medications. Proteinuria, defined as >250 mg of protein in 24 h voided urine, was not predicted for any T1D patient, although one patient’s urine had a large protein quantity in a single 50 mL voided sample. This adolescent had been diagnosed only 3 years earlier, and the HbA1C level was low (6.1%). On the basis of these protein measurements, all other subjects were normoalbuminuric. Adjusted for urine specimen volumes, protein contents did not differ significantly between the T1D and healthy sibling groups (unequal variance t-test): controls, μ = 39 mg and σ = 30 (n = 41); T1D, μ = 52 mg and σ = 106 (n = 40). These data, as well as age, gender, time since disease diagnosis, and HbA1C levels, are provided for each individual in Table S1, Supporting Information.

Table 1. Human Subjects Participating in the Study

sample cohort | sample size | age (years) (a) | age range (min, max) | diabetes duration (years) (a) | duration range (min, max) | HbA1c (%) (a) | gender (male/female)
healthy siblings | 41 | 11.9 ± 3.6 | 5, 20 |  |  | N.D. | 21/20
type 1 diabetes | 40 | 12.4 ± 3.4 | 6, 19 | 6.3 ± 3.4 | 2, 15 | 8.4 ± 1.8 | 22/18

(a) Data are mean ± SD unless otherwise indicated. N.D., not determined.

Figure 1. Empirical distribution density plots of the Bray−Curtis dissimilarity index computed between cases and controls observed between and within families. The plots demonstrate higher similarity of urinary proteome profiles of individuals within a family than across families while controlling for the differences associated with the case/control status.

Experimental and Biological Variability of the Urinary Proteome

Urinary proteome profiles obtained for 86 urine specimens were first examined in SDS-PAGE gels and, following double digestion of protein mixtures with trypsin, chromatographically at the peptide level, as shown in Figure S1, Supporting Information. Because of the stochastic sampling nature of data-dependent LC−MS/MS shotgun proteomic experiments, in our case with two-dimensional LC separation of peptides, we examined the variability of the APEX quantification-based proteomic data. Samples associated with the 81 human subjects yielded 175 2D-LC−MS/MS data sets, affording in-depth insights into the composition of the urinary proteomes of sibling pairs or groups discordant for T1D. The number of protein groups identified per experiment ranged from 467 to 1785 and averaged 1036 at an FDR of 1%, derived from ProteinProphet search results. Protein groups, whose members share at least 50% of their peptides, are referred to as proteins from here on. The difference in the average number of protein identifications between T1D patient and healthy sibling data sets was negligible (1.3%). At a confidence level of 99% and an FDR ≤ 1%, derived from ProteinProphet search results, 5046 proteins were identified in total, making this one of the largest urine proteome surveys to date. Sixty-four percent of the proteins were identified in both cohorts. The large number of proteins not observed in both cohorts is related to the absence of protein homeostasis in urine, which generates more variability among individuals, and to the low LC−MS/MS reproducibility of peptide observations in the low-abundance range, especially for an analytical method with two-dimensional LC separations. As described in the Experimental Section, filters applied to the data sets largely eliminated such proteins from the quantitative analysis.
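The comparison of urinary protein content between the cohorts (cohort overview above) used an unequal-variance (Welch) t-test. Assuming the reported μ and σ are the sample means and standard deviations in mg, the Welch statistic and the Welch−Satterthwaite degrees of freedom can be recomputed from those summary values as a rough check. This is an illustration only; the authors' test was run on the individual measurements.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic (group 2 minus group 1) and Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary values reported for urinary protein content: controls, then T1D
t, df = welch_t(39, 30, 41, 52, 106, 40)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 0.75, df = 45.1
```

A t of about 0.75 with roughly 45 degrees of freedom is far below the two-sided 5% critical value of about 2.0, consistent with the reported absence of a significant difference; the large σ of the T1D group (driven in part by the one high-protein sample) dominates the standard error.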


Figure 2. Gene ontology (GO) biological process enrichment for 90 differentially abundant urinary proteins. The proteins submitted to this analysis were in a merged list of top-ranked proteins from the Wilcoxon rank sum test (q-value ≤ 0.1), the Wilcoxon signed-rank test (q-value ≤ 0.1), and the stability selection analysis (top 50 proteins). GO terms were downloaded and enrichment analysis was performed with the open source software tool PANTHER. Very general enriched GO categories, such as biological process, regulation of biological quality, catabolic process, and single-organism process, are not shown in the graphic. The bars represent the quotients of p-values of biological process categories found to be enriched in the data set.

Table 2. Proteins Altered in Abundance in T1D Patients and Involved in Metabolic and Inflammatory Processes

UniProt ID | protein name | protein description (a) | rank (d) GSS | q-value | l2 fc.paired.median (e) | ratio (f) T1D/C | cellular localization (b) | biological process (c)
P13473 | LAMP2 | Lysosome-associated membrane glycoprotein 2 | 12 | 0.0147 | −0.5349 | 0.58 | LYS, CM, VE | ADH, PTOX
Q07075 | ENPEP | Glutamyl aminopeptidase | 22 | 0.0238 | 0.6658 | 2.17 | LYS, CM, VE | PMP, VASP
P17050 | NAGA | Alpha-N-acetylgalactosaminidase | 2 | 0.0001 | 1.1144 | 2.97 | LYS, VE | DC, GPMP
O00754 | MAN2B1 | Lysosomal alpha-mannosidase | 3 | 0.0007 | 2.4884 | 10.47 | LYS, VE | DC, GPMP
P53634 | CTSC | Cathepsin C | 7 | 0.0019 | 0.8037 | 2.19 | LYS, VE | DC, PMP
P04066 | FUCA1 | Tissue alpha-L-fucosidase | 10 | 0.0092 | 1.1604 | 2.51 | LYS, VE | DC, GPMP
Q13510 | ASAH1 | Acid ceramidase | 31 | 0.0343 | 0.5562 | 1.71 | LYS, VE | LMP
P15586 | GNS | N-acetylglucosamine-6-sulfatase | 40 | 0.0411 | 0.3516 | 1.43 | LYS, VE | CMP
P20774 | OGN | Mimecan | 5 | 0.0019 | 2.3940 | 7.53 | LYS, VE, ECM | PGMP
Q9BTY2 | FUCA2 | Plasma alpha-L-fucosidase* | 1 | 0.0001 | 1.6471 | 6.11 | LYS, VE, S | DC, GPMP
Q9UHL4 | DPP7 | Dipeptidyl peptidase 2 | 21 | 0.0196 | 1.0869 | 2.40 | LYS, VE, S | DC, PMP
Q9Y646 | CPQ | Carboxypeptidase Q | 23 | 0.0238 | 0.5701 | 2.03 | LYS, VE, S | PMP
P07686 | HEXB | Beta-hexosaminidase subunit beta | 36 | 0.0398 | 0.6608 | 1.69 | LYS, VE, S | DC, GPMP
P07858 | CTSB | Cathepsin B | 37 | 0.0398 | 0.5757 | 1.69 | LYS, VE, S | INF, PMP
P02750 | LRG1 | Leucine-rich alpha-2-glycoprotein | 8 | 0.0038 | 0.9268 | 2.52 | S, plasma | INF, IM/AIM
P09228 | CST2 | Cystatin-SA | 14 | 0.0178 | 1.1558 | 3.27 | S, plasma | PI
P09455 | RBP | Retinol-binding protein 1 | 16 | 0.0155 | 0.4586 | 8.10 | S, plasma | DC, VMP
O95445 | APOM | Apolipoprotein M* | 89 | 0.1746 | 0.8520 | 3.88 | S, plasma | LMP
P01042 | KNG1 | Kininogen-1 | 84 | 0.1192 | −0.3093 | 0.76 | S, plasma, VE | DC, VASP, PI
Q14393 | GAS6 | Growth arrest-specific protein 6 | 19 | 0.0194 | 1.2737 | 2.62 | S, platelet G | DC, INF, ADH
P40197 | GP5 | Platelet glycoprotein V | 6 | 0.0021 | 1.7206 | 30.87 | S, platelet G | VADH
P01033 | TIMP1 | Metalloproteinase inhibitor 1* | 67 | 0.0716 | 1.0758 | 4.80 | S, platelet G | INF, LEUM

* Proteins were also quantitatively measured in western blots.
(b) Entries in this column pertain to subcellular localizations (LYS: lysosomes; CM: cell membranes; platelet G: platelet granules) and extracellular localizations (VE: vesicular exosomes; S: secreted; ECM: extracellular matrix; plasma: blood plasma) reported for the proteins. Cellular localizations determine the order of rows in the table.
(c) Entries in this column pertain to evidence of a protein’s involvement in diabetic complications (DC), general inflammation (INF), immune and autoimmune processes (IM/AIM), adhesion (ADH), leukocyte migration (LEUM), vascular permeability (VASP), glycoprotein, proteoglycan, protein, carbohydrate, lipid, and vitamin metabolic processes (GPMP, PGMP, PMP, CMP, LMP, and VMP, respectively), protease inhibition (PI), and protection from cellular toxins (PTOX).
(d) GeneSelector stability (GSS) ranking in the Wilcoxon signed-rank test.
(e) The median log2 of the abundance ratios in paired case/control observations (n = 39).
(f) The ratios are between the sample means of the cases (n = 39) and controls (n = 39), with biological and technical replicates aggregated as described in the Experimental Section.
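Footnotes e and f of Table 2 describe two different effect sizes, which is why, for example, RBP has a median paired log2 fold change of 0.4586 (about 1.37-fold) but a ratio of means of 8.10: a ratio of means can be dominated by a few subjects with very high abundance, whereas the median of paired log2 ratios cannot. A minimal sketch computing both from paired case/control abundances; the data are invented for illustration, and the sketch assumes strictly positive abundances because the handling of zeros is not described here.

```python
import math
from statistics import mean, median
from typing import Sequence, Tuple


def paired_effect_sizes(cases: Sequence[float],
                        controls: Sequence[float]) -> Tuple[float, float]:
    """Return (median of paired log2(case/control), mean(cases) / mean(controls)).

    cases[i] and controls[i] are the abundances of one protein in the T1D patient
    and the healthy sibling of family i; all values must be > 0 in this sketch.
    """
    if len(cases) != len(controls):
        raise ValueError("cases and controls must be paired")
    log_ratios = [math.log2(c / k) for c, k in zip(cases, controls)]
    return median(log_ratios), mean(cases) / mean(controls)


# Invented example: one family with a very high case abundance
l2fc, ratio = paired_effect_sizes([2.0, 2.0, 1.0, 200.0], [1.0, 2.0, 2.0, 4.0])
```

Here the median paired log2 fold change is 0.5 (about 1.4-fold), while the ratio of means is about 22.8, driven almost entirely by the single outlying family.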

Comparison of Protein Abundance Profiles in Cases vs Controls and between vs within Families of the Study Participants

The PermANOVA test showed that protein profile dissimilarities were significantly associated with the case/control condition even when cases and controls were treated as independent samples (p-value 0.0073, R² 0.02276). When cases and controls were paired within families in the same PermANOVA test, the p-value was an order of magnitude lower (0.0005). This suggests that the use of siblings to evaluate changes in the urinary proteome in the context of a disease increases the statistical power at a given number of subjects by controlling for some of

DOI: 10.1021/acs.jproteome.5b00052 J. Proteome Res. 2015, 14, 3123−3135

Article

Journal of Proteome Research

Figure 3. Box plot for 20 proteins ranked highest in the GeneSelector protocol based on the Wilcoxon signed-rank test for protein abundance in the T1D vs healthy sibling cohorts. The median log2 fold change in abundance is shown, with values in the control samples used as baseline. Positive values correspond to increased abundance in T1D samples relative to controls matched within families (39 pairs). Protein names for the protein accession numbers are provided in Tables 1 and 2.

GO term biological process categories (Figure 2). Among the most enriched processes were proteolysis, biological adhesion, carbohydrate metabolism, organo-nitrogen compound metabolism, and glycoside catabolism. We examine the categories closely in the following two paragraphs.
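This section does not name the enrichment tool. A common way to test whether a GO biological process term is over-represented in a protein list such as the 90-protein set is a one-sided hypergeometric test against a background of all quantified proteins. The sketch below assumes that approach; the function name and all counts are hypothetical, and a multiple-testing correction across terms would still be needed.

```python
from math import comb


def hypergeometric_enrichment_p(background, in_category, selected, overlap):
    """One-sided P(X >= overlap) for X ~ Hypergeometric.

    background  -- number of proteins in the background set (N)
    in_category -- background proteins annotated to the GO term (K)
    selected    -- size of the protein list tested for enrichment (n)
    overlap     -- proteins in the list that carry the annotation (x)
    """
    total = comb(background, selected)
    upper = min(in_category, selected)
    # Sum exact integer counts of draws with at least `overlap` annotated proteins
    tail = sum(
        comb(in_category, k) * comb(background - in_category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1500 background proteins, 60 annotated to a term,
# a 90-protein list of which 12 carry the annotation.
p = hypergeometric_enrichment_p(1500, 60, 90, 12)
print(f"one-sided enrichment p-value: {p:.2e}")
```

Because the tail is summed as exact integers and divided only once, the result stays accurate even for very small p-values.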

the interindividual sources of variance that are unrelated to the condition of interest. When we controlled for the difference related to the disease condition, we observed that protein abundance profiles differed less within families than across families (Figure 1). In the associated permutation test, the estimated p-value was 0.00025 when the original ordering of family IDs was included among the permutation replicates, or zero when it was not. The observed value of the rank-biserial correlation statistic was 0.529638.
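The within- versus between-family comparison can be sketched as a permutation test that shuffles which control is assigned to which family, using the rank-biserial correlation between within-family and between-family dissimilarities as the test statistic. This is an illustration under stated assumptions, not the study's code: the dissimilarity measure (Euclidean distance between simulated log-abundance profiles), the one-sided direction, the number of permutations, and all names are assumptions. The two returned p-values correspond to the two conventions mentioned above, with and without counting the observed family ordering as one of the replicates.

```python
import numpy as np


def rank_biserial(within, between):
    """Rank-biserial correlation: P(between > within) - P(between < within).

    Computed over all (within, between) pairs; ties contribute zero.
    Positive values mean within-family dissimilarities tend to be smaller.
    """
    w = np.asarray(within, dtype=float)[:, None]
    b = np.asarray(between, dtype=float)[None, :]
    return float(np.mean(b > w) - np.mean(b < w))


def within_between_statistic(dist):
    """dist[i, j]: dissimilarity between case i and control j.

    Case i and control i are assumed to be siblings (same family).
    """
    same_family = np.eye(dist.shape[0], dtype=bool)
    return rank_biserial(dist[same_family], dist[~same_family])


def family_permutation_test(dist, n_perm=3999, seed=0):
    """One-sided test that shuffles which control belongs to which family."""
    rng = np.random.default_rng(seed)
    observed = within_between_statistic(dist)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(dist.shape[0])
        # Column permutation pairs case i with control perm[i]
        if within_between_statistic(dist[:, perm]) >= observed:
            hits += 1
    p_including_observed = (hits + 1) / (n_perm + 1)
    p_excluding_observed = hits / n_perm
    return observed, p_including_observed, p_excluding_observed


# Simulated data: 10 families; siblings share a family-specific profile.
rng = np.random.default_rng(1)
family_effect = rng.normal(0.0, 3.0, size=(10, 20))
cases = family_effect + rng.normal(0.0, 0.3, size=(10, 20))
controls = family_effect + rng.normal(0.0, 0.3, size=(10, 20))
dist = np.linalg.norm(cases[:, None, :] - controls[None, :, :], axis=2)

r, p_incl, p_excl = family_permutation_test(dist, n_perm=999)
print(f"rank-biserial r = {r:.3f}, p (incl.) = {p_incl:.4f}, p (excl.) = {p_excl:.4f}")
```

With no permutation reaching the observed statistic, the "excluding" convention reports zero, while the "including" convention reports 1/(n_perm + 1), the smallest value that n_perm replicates can resolve.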

Many Differentially Abundant Proteins Are Involved in Lysosomal Protein and High Mr Glycoside Metabolisms

Thirteen proteins increased in abundance in the T1D cohort (q-value ≤ 0.05 in the Wilcoxon signed-rank test) were of lysosomal origin and are implicated in protein, glycoprotein, proteoglycan, and sphingolipid metabolic processes (Table 2). All of them were ranked within the top 40 by the GeneSelector protocol. Seven of these proteins are included in the plot of median abundance changes for the top 20 ranked proteins (Figure 3). Five lysosomal enzymes were ranked at the very top in the stability selection analysis (Figure 4). In particular, plasma α-fucosidase (FUCA2, UniProt ID Q9BTY2) stood out as a urinary protein discriminating between the cohorts, also passing the PFER = 0.05 significance level in stability selection. The only lysosomal protein altered in abundance in the opposite direction was lysosome-associated membrane protein 2 (LAMP2), which has an important function in protecting the cytoplasm from the lytic activities of lysosomes. Lysosomes are present in many cell types, but they are enriched in renal tubular epithelial cells because these cells specialize in degrading macromolecules, especially glycoproteins, following receptor-mediated reuptake from the urinary tract collecting ducts.45 Cathepsin B and cathepsin C, both

Proteins Differentially Abundant in Urine of Individuals with T1D vs Healthy Siblings

To perform statistical analyses on urinary proteome data sets comparing the two cohorts, we collapsed the APEX profiles into 78 data sets (39 each for T1D patients and healthy siblings) so that no family was represented by more than two quantitative urinary proteome profiles, and we performed metadata-independent filtering as described in the Experimental Section. Fifty and 21 proteins with q-values ≤ 0.05 were identified with the Wilcoxon signed-rank and Wilcoxon rank-sum tests, respectively, suggesting higher sensitivity for a test that matches samples within families. We also used stability selection, as described in the Experimental Section, to rank proteins by how well they differentiate the cohorts (Dataset S1, sections 1.1.1.1.1.1, 1.2.1.1.3.1 and 1.2.1.1.3.2, Supporting Information). The protein rankings from the three methods overlapped substantially. Proteins with q-values ≤ 0.1 in the rank tests, together with the top-50 proteins from the stability selection analysis, formed a nonredundant list of 90 proteins that was used to identify enrichment for


kidneys, such as angiotensin-converting enzyme 2 (ACE2), aspartoacylase-2 (ACY3), cell surface glycoprotein MUC18, and the membrane protein amnionless (AMN), the latter of which functions in receptor-mediated uptake of proteins into renal tubular cells. Eighteen proteins listed in Table 3 were cell surface or membrane associated, and many of them are also components of vesicular exosomes, which are released into body fluids. Nine proteins, two of them associated with the cytoskeleton, have been implicated in adhesion processes, specifically vascular adhesion. With the exception of mucosal addressin cell adhesion molecule 1 (MADCAM1) and cortactin (CTTN), these proteins were increased in abundance in the T1D cohort compared to healthy siblings. Two of them, MUC18 and cadherin-5 (CDH5), not only mediate adhesion but also influence vascular permeability. CDH5 is also known as vascular endothelial cadherin (VE-cadherin). Enhanced proteolytic degradation of CDH5 was linked to molecular changes during retinopathy in an animal model.51 Decreased CDH5 levels have been causally linked to protease activities degrading the extracellular matrix, an intriguing observation in the context of the cathepsin B and C findings. Many adhesion proteins also affect leukocyte migration and interactions of leukocytes with endothelial cells. Four proteins influencing leukocyte migration, CD166 antigen, VCAM1, L-selectin, and 4F2 cell-surface antigen heavy chain, were increased in abundance in the T1D cohort (Table 3). The functional relationships among VCAM1 activity, the endothelial adherens junctions with their central protein CDH5, and transendothelial migration of leukocytes during inflammatory processes are well characterized.52 Apart from proteins influencing leukocyte migration, abundance differences were detected for other cell-surface and secreted proteins that contribute to inflammatory pathways.
The proteins collectin-12 and angiotensin-converting enzyme 2 (ACE2), both increased in the T1D group compared to the healthy control group, contribute to inflammatory pathways. ACE2, highly ranked in the stability selection analysis (Figure 4), is a carboxypeptidase that converts angiotensin I to angiotensin 1-9, a peptide of unknown function, and angiotensin II to angiotensin 1-7, a vasodilator. ACE2 regulates renal inflammation, vascular permeability, and hypertension.53 The urinary protein abundance changes between the cohorts tentatively suggest a link between the release of such proteins from the kidneys into the urinary tract and low-level inflammation caused by hyperglycemia.

Figure 4. Results of a stability selection analysis for the top ranked proteins, showing the probability of selecting proteins into the glmnet model that predicts the case/control status. The probability cutoff of 0.53 corresponds to the per-family error rate PFER = 0.0474 (vertical line). Five of the six proteins with the highest selection probability were lysosomal enzymes; the other protein was ACE2. Protein names for the protein accession numbers are provided in Tables 1 and 2. The exceptions are delta-aminolevulinic acid dehydratase (P13716) and Ig kappa chains V-I region DEE and V-III region VG (P01597 and P04433, respectively).
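Figure 4 is based on stability selection with a glmnet model, which is not reproduced here. As a minimal sketch of the general procedure, the Python code below swaps in a deliberately simple marginal selector (the q proteins with the largest absolute standardized mean difference) for the penalized model, draws class-stratified half-samples, and records how often each protein is selected. It also includes the Meinshausen–Bühlmann bound E[V] ≤ q²/((2π_thr − 1)p), which relates a selection-probability cutoff π_thr to a PFER; whether the study used this bound or a refined variant is an assumption, and the data, q, and cutoff are hypothetical.

```python
import numpy as np


def mb_pfer_bound(q, p, pi_thr):
    """Meinshausen-Buhlmann bound on the expected number of false selections.

    q: average number of features selected per subsample; p: number of features;
    pi_thr: selection-probability cutoff, which must lie in (0.5, 1].
    """
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must be in (0.5, 1]")
    return q * q / ((2.0 * pi_thr - 1.0) * p)


def top_q_selector(X, y, q):
    """Hypothetical stand-in for a penalized model: the q features with the
    largest absolute standardized mean difference between the classes."""
    a, b = X[y == 1], X[y == 0]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2.0)
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (pooled_sd + 1e-12)
    return np.argsort(score)[::-1][:q]


def selection_probabilities(X, y, q, n_subsamples=100, seed=0):
    """Fraction of half-size, class-stratified subsamples selecting each feature."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsamples):
        idx = np.concatenate([
            rng.choice(pos, size=len(pos) // 2, replace=False),
            rng.choice(neg, size=len(neg) // 2, replace=False),
        ])
        counts[top_q_selector(X[idx], y[idx], q)] += 1
    return counts / n_subsamples


# Simulated data: 39 cases, 39 controls, 200 proteins; only protein 0 differs.
rng = np.random.default_rng(2)
y = np.repeat([1, 0], 39)
X = rng.normal(size=(78, 200))
X[y == 1, 0] += 1.5

probs = selection_probabilities(X, y, q=10)
print("highest selection probability:", probs.argmax(), probs.max())
print("PFER bound at cutoff 0.75:", mb_pfer_bound(q=10, p=200, pi_thr=0.75))
```

Proteins whose selection probability exceeds the chosen cutoff are reported as stable; raising the cutoff tightens the PFER bound at the cost of selecting fewer proteins.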

secreted proteases, contribute to extracellular matrix disassembly, inflammation, and apoptosis in tissues. Among the nonlysosomal secreted proteins, one protease inhibitor, kininogen-1 (KNG1), was moderately decreased in abundance in the T1D cohort, and two others, cystatin-SA and TIMP1, were increased (Table 2). KNG1 is synthesized in the kidneys and inhibits urinary kallikrein. Abnormal function of the kininogen−kallikrein system has been associated with diabetes.46 KNG1 and an immunoglobulin κ-chain V-III region (P04433, Figure 4) were the most abundant urinary proteins among the top 21 in the stability selection analysis. Bradykinin is the peptide fragment of KNG1 that mediates inflammation and vascular dilation. Like TIMP1, growth arrest-specific protein 6 (GAS6) and platelet glycoprotein V (GP5) (Table 2) are released from platelet granules upon activation47 and are implicated in vascular adhesion processes during injury.48 GP5 had the highest fold change of all proteins comparing the T1D and healthy sibling cohorts. Among the blood plasma proteins altered in abundance were leucine-rich α-2-glycoprotein (LRG1) and retinol-binding protein (RBP). LRG1 has been described as an inflammatory biomarker in autoimmune diseases,49 and RBP as a biomarker of T1D-associated nephropathy.50

Analysis of Plasma α-Fucosidase (FUCA2) and α-N-Acetylgalactosaminidase (NAGA) in Western Blots

For western blots, we selected six proteins ranked among the top 100 in the GeneSelector stability ranking, including the highest-ranked proteins FUCA2 and NAGA. Other selection criteria were a role in vascular adhesion/cell migration (COLEC-12, CD166, TIMP1; Table 3), high abundance in urine (ApoM), and antibody availability. We expected the western blots to support the trends of abundance changes observed in the proteomic analysis and to show whether the abundance differences pertained to full-length proteins or to fragments; the purpose was not biomarker verification. For FUCA2 and NAGA, western blots provided semiquantitative data for 57 and 27 samples, respectively. Consistently detected bands for the two proteins, in the 36 and 53 kDa ranges, respectively, are shown in the image panels of Figure S2, Supporting Information. An unequal-variance t-test on these protein bands yielded statistically significant p-values of 0.048 (FUCA2;

Differentially Abundant Proteins with Functions in Biological Adhesion Affect Leukocyte Migration and the Microvasculature

In addition to platelet proteins, other proteins altered in abundance in the T1D vs healthy sibling cohorts were identified (Table 3). Among them were proteins highly expressed in the


Table 3. Proteins Altered in Abundance in T1D Patients and Involved in Cell Adhesion and Leukocyte Migration

UniProt ID | protein name | protein description (a) | cellular localization (b) | biological process (c) | rank GSS (d) | q-value | l2 fc.paired.median (e) | ratio T1D/C (f)
P08195 | SLC3A2 | 4F2 cell-surface antigen heavy chain | CS, apic CM | LEUM | 15 | 0.0155 | 0.7686 | 2.16
Q96HD9 | ACY3 | Aspartoacylase-2 | CS, apic CM, VE | PTOX | 24 | 0.0238 | −1.2449 | 0.24
Q9BXJ7 | AMN | Protein amnionless | CS, apic CM, VE | PMP, VMP | 26 | 0.0280 | 0.0000 | 0.14
O43451 | MGAM | Maltase-glucoamylase, intestinal | CS, apic CM, VE | CMP | 27 | 0.0286 | −0.6744 | 0.59
Q9H665 | IGFLR1 | IGF-like family receptor 1 | CS, CM | INF | 47 | 0.0442 | −0.4503 | 0.58
Q13477 | MADCAM1 | Mucosal addressin cell adhesion molecule 1 | CS, CM | ADH, LEUM | 49 | 0.0442 | −0.3630 | 0.58
P14151 | SELL | L-selectin | CS, CM | ADH, LEUM | 72 | 0.0781 | 0.4534 | 2.02
P33151 | CDH5 | Cadherin-5 | CS, CM, ECM | VADH, VASP | 35 | 0.0353 | 1.5829 | 3.11
P43121 | MCAM | Cell surface glycoprotein MUC18 | CS, CM, ECM | VADH, VASP | 38 | 0.0374 | 0.5625 | 1.61
Q13421 | MSLN | Mesothelin | CS, CM, S | ADH | 42 | 0.0471 | 0.4989 | 2.08
Q6UXB8 | PI16 | Peptidase inhibitor 16 | CS, CM, VE |  | 9 | 0.0046 | 0.7197 | 2.45
P19320 | VCAM1 | Vascular cell adhesion protein 1 | CS, CM, VE | VADH, LEUM | 13 | 0.0147 | 0.7087 | 2.03
Q5KU26 | COLEC12 | Collectin-12* | CS, CM, VE | INF | 59 | 0.0658 | 0.7347 | 1.97
Q13740 | ALCAM | CD166 antigen* | CS, CM, VE, S | ADH, LEUM | 54 | 0.0575 | 0.6623 | 4.48
P14384 | CPM | Carboxypeptidase M | CS, CM, VE, S | GPMP | 34 | 0.0353 | −0.3679 | 0.71
Q9BYF1 | ACE2 | Angiotensin-converting enzyme 2 | CS, S, VE | DC, INF, VASP | 4 | 0.0019 | 1.0371 | 2.93
Q8NC42 | RNF149 | E3 ubiquitin-protein ligase RNF149 | CM, VE | IM/AIM | 20 | 0.0238 | 0.0000 | 0.31
Q9BS26 | ERP44 | Endoplasmic reticulum resident protein 44 | CM, VE, ER | GPMP | 28 | 0.0343 | 1.2051 | 2.59
P52790 | HK3 | Hexokinase-3 | CY | CMP | 30 | 0.0353 | 0.0000 | 11.76
Q9H0E2 | TOLLIP | Toll-interacting protein | CY, VE | TOLL, IM/AIM | 17 | 0.0180 | 0.0000 | 0.11
P52758 | HRSP12 | Ribonuclease UK114 | CY, VE |  | 25 | 0.0258 | −1.1157 | 0.40
Q14247 | CTTN | Src substrate cortactin | CSK | ADH, ENDM | 43 | 0.0374 | −0.7552 | 0.47
P06396 | GSN | Gelsolin | CSK, S, VE | ADH | 32 | 0.0343 | 0.6649 | 1.79

*Proteins were also quantitatively measured in western blots. (b) Entries in this column pertain to subcellular localizations (CS: cell surface; CM: cell membrane; CY: cytosol; ER: endoplasmic reticulum; CSK: cytoskeleton) and extracellular localizations (VE: vesicular exosomes; S: secreted; ECM: extracellular matrix) reported for the proteins. (c) Entries in this column pertain to evidence of a protein’s involvement in vascular adhesion (VADH), leukocyte and endothelial migration (LEUM, ENDM), vascular permeability (VASP), and the TOLL receptor signaling pathway (TOLL); for other acronyms, see the Table 2 legend. (d) GeneSelector stability (GSS) ranking in the Wilcoxon signed-rank test. (e) The median log2 of the abundance ratios in paired case/control observations (n = 39). (f) The ratios are between the sample means of the cases (n = 39) and controls (n = 39), with biological and technical replicates aggregated as described in the Experimental Section.
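The q-value column and footnote d rest on a paired Wilcoxon signed-rank test with false discovery rate control. The minimal sketch below assumes SciPy's `wilcoxon` and `mannwhitneyu` functions and the Benjamini–Hochberg procedure for q-values (the exact FDR method is not stated in this section). On simulated sibling data with strong family effects, it contrasts the paired test with the unpaired rank-sum test, illustrating why matching samples within families can increase sensitivity; all values are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


# Simulated log-abundances for 39 sibling pairs and 5 proteins.
rng = np.random.default_rng(3)
n_pairs, n_proteins = 39, 5
family = rng.normal(0.0, 3.0, size=(n_pairs, n_proteins))  # shared by siblings
shift = np.array([0.5, 0.5, 0.0, 0.0, 0.0])                 # true T1D effect
controls = family + rng.normal(0.0, 0.3, size=(n_pairs, n_proteins))
cases = family + shift + rng.normal(0.0, 0.3, size=(n_pairs, n_proteins))

# Paired test uses within-family differences; unpaired test ignores families.
p_paired = [wilcoxon(cases[:, j], controls[:, j]).pvalue for j in range(n_proteins)]
p_unpaired = [
    mannwhitneyu(cases[:, j], controls[:, j], alternative="two-sided").pvalue
    for j in range(n_proteins)
]
print("paired q:  ", np.round(benjamini_hochberg(p_paired), 4))
print("unpaired q:", np.round(benjamini_hochberg(p_unpaired), 4))
```

Because the family effect cancels in the within-pair differences, the signed-rank test detects the small shift that the rank-sum test misses.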



DISCUSSION

Hyperglycemia results from suboptimal glycemic control in T1D patients. Frequent and, in particular, chronic hyperglycemia are implicated in the slow progression toward vascular diseases, caused in part by the formation of AGEs through aberrant glycation of proteins and by other metabolic perturbations.57 We hypothesized that the urinary proteome harbors evidence of these pathophysiological changes, including incipient microvascular inflammation at asymptomatic stages. The cohort of T1D patients studied here showed none of the clinical symptoms associated with diabetic complications, which was not surprising given the average disease duration of only 6.3 years. Half of the T1D patients had HbA1C levels above 8%, indicating suboptimal glycemic control. Healthy siblings of T1D patients were recruited as the control cohort, which we considered useful for moderating the interindividual variability of the proteome in urine, a nonhomeostatic body fluid. The statistical dissimilarity analysis performed here supported this notion. This comprehensive urinary proteome survey allowed us to profile a large number of proteins with estimated abundances below 75 nM in urine and, for the first time, to identify differences in such proteins between T1D patients and healthy siblings. We discuss our findings in the context of other proteomic studies on hyperglycemia and diabetic complications of T1D, because the proteins of greatest interest are those that may predict the risk of progression of T1D patients to vascular complications.

T1D vs control abundance ratio: 2.1) and 0.003 (NAGA; T1D vs control abundance ratio: 1.5). The FUCA2 band represented a fragment of this enzyme, whereas the NAGA band represented the full-length protein without the signal peptide. In NAGA blots, a few lower-Mr fragments were also detected, but at lower intensities. More than half of the USF samples evaluated in these western blots were from T1D patients and healthy siblings not included in the proteomic surveys. NAGA and FUCA2 are lysosomal enzymes, and the role of such enzymes in inflammatory processes and autoimmune pathologies is well documented.54 For the other proteins, 16 USF samples were analyzed by western blot. Bands for CD166 and TIMP1 were weak and not evaluated further. A 30 kDa COLEC-12 fragment, smaller than predicted for the full-length protein, was detected. For apolipoprotein M (ApoM), bands at 26 kDa, as expected for the full-length protein, were detected. Band intensities for these two proteins were not consistently higher in T1D vs control samples (Figure S2, Supporting Information). ApoM is an apolipoprotein expressed in the kidneys and secreted into plasma and urine; it is a carrier of sphingosine 1-phosphate, a bioactive lipid mediator with a role in maintaining insulin secretion.55 Collectin-12 is a C-type lectin scavenger receptor involved in the internalization and degradation of oxidized low-density lipoproteins (LDL). Oxidized and AGE-associated LDLs are increased in immune complexes associated with cardiovascular complications of T1D.56
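The unequal-variance comparison of band intensities corresponds to Welch's t-test. The sketch below uses hypothetical densitometry values (not data from this study), computes the t statistic and the Welch–Satterthwaite degrees of freedom by hand, takes the two-sided p-value from SciPy's t distribution, and prints SciPy's `ttest_ind(..., equal_var=False)` result as a cross-check. Expressing the abundance ratio as a ratio of group means is an assumption made here for illustration.

```python
import math
import statistics

from scipy.stats import t as t_dist, ttest_ind


def welch_t_test(a, b):
    """Two-sided Welch (unequal-variance) t-test; returns (t, df, p)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2.0 * t_dist.sf(abs(t_stat), df)
    return t_stat, df, p


# Hypothetical normalized band intensities (arbitrary units).
t1d = [1.9, 2.4, 2.1, 3.0, 2.6]
ctrl = [1.1, 1.4, 0.9, 1.6, 1.2]

t_stat, df, p = welch_t_test(t1d, ctrl)
ratio = statistics.mean(t1d) / statistics.mean(ctrl)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p:.4f}, ratio = {ratio:.2f}")

# Cross-check against SciPy's implementation of the same test
ref = ttest_ind(t1d, ctrl, equal_var=False)
print(f"SciPy: t = {ref.statistic:.2f}, p = {ref.pvalue:.4f}")
```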


from the platelet surface, such as GP5. GP5, TIMP1, and GAS6 are also released from platelet α-granules.47 GAS6 regulates platelet function.47,64 All three platelet proteins were increased in the urinary proteome of T1D patients relative to healthy siblings in our study, supporting the notion of increased platelet activity in the vasculature of T1D patients who are presymptomatic for disease complications. TIMP1 was previously reported to be elevated in plasma of T1D patients compared to controls.65 GAS6 is a ligand of the TAM family of receptor tyrosine kinases, which have important functions in apoptotic cell clearance and innate immunity.66 GAS6 has been implicated in glomerular hypertrophy, a condition occurring in the early stage of diabetic nephropathy.67 ApoM, a plasma protein increased in abundance in individuals with T1D according to our data, was reported to have a paradoxical effect on atherosclerosis.68 In summary, many of the aforementioned proteins are involved in inflammatory responses and are functionally linked to the pathophysiology of T1D, which includes AGE-related increases in cellular glycoprotein turnover and atherosclerosis. Cadherin-5, the main protein stabilizing endothelial cell junctions and important for the control of vascular permeability,69 was increased in abundance in the urine of T1D patients in our survey.
The protein has been reported to be shed during endothelial apoptosis via cleavage from the junctions by MMPs70 and to be implicated in vascular disruptions associated with diabetic nephropathy that are triggered by AGE formation.71 CD166 antigen is a component of the endothelial cell junction and facilitates transendothelial monocyte migration.72 Collectin-12 is a receptor protein involved in the recognition and degradation of oxidized LDL by vascular endothelial cells and may have endothelial cell-protective functions.73 Oxidized LDL contributes to the progression of T1D.74 Both CD166 and collectin-12 were increased in abundance in the urine of T1D patients in our survey. We also observed increased amounts of ACE2 in the urine of T1D patients compared to healthy controls. ACE2 is one of the two enzymes metabolizing angiotensins in the kidneys. Angiotensin II causes hypertension, vascular inflammation, and increased endothelial cell permeability, contributing to microvascular complications of T1D.53 These quantitative changes in the urinary proteome of normoalbuminuric T1D patients support the notion of inflammatory responses. Multiple proteins differentially abundant in the T1D compared to the healthy sibling cohort influence vascular adhesion and permeability and leukocyte migration, processes that are functionally linked to the pathophysiology of T1D.

In previous studies in which urinary proteins were analyzed in the context of T1D or progression to diabetic complications, mostly high-abundance urinary proteins, such as collagen fragments, uromodulin, calgranulin, and lower-Mr plasma proteins, were reported as potential diagnostic biomarkers.24,25,58 In a study of diabetic complications by Caseiro et al., gelsolin was identified as a urine diagnostic biomarker for T1D, whereas β-hexosaminidase and ganglioside GM2 activator (GM2A) were denoted as potential biomarkers for retinopathy in diabetics.24 All three proteins were increased in abundance in urine of the T1D cohort in our study and were altered with q-values of