A Model Sea Urchin Spicule Matrix Protein Self-Associates To Form


Article

A model sea urchin spicule matrix protein self-associates to form mineral-modifying protein hydrogels.
Gaurav Jain, Martin Pendola, Ashit Rao, Helmut Cölfen, and John Spencer Evans
Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.6b00619 • Publication Date (Web): 16 Jul 2016
Downloaded from http://pubs.acs.org on July 19, 2016


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Biochemistry

A model sea urchin spicule matrix protein self-associates to form mineral-modifying protein hydrogels.

Gaurav Jain,1¶ Martin Pendola,1¶ Ashit Rao,2 Helmut Cölfen,2 and John Spencer Evans1*

1 Laboratory for Chemical Physics, Center for Skeletal and Craniofacial Biology, New York University, 345 E. 24th Street, NY, NY, 10010 USA.

2 Department of Chemistry, Physical Chemistry, Universität Konstanz, Universitätstrasse 10, Konstanz D-78457, Germany.

Keywords:

Sea urchin / spicules / calcite / amorphous calcium carbonate / protein aggregation / hydrogels / intrinsic disorder / amyloid-like



¶Both authors contributed equally to this work.

*To whom correspondence should be addressed. Email: [email protected]


ACS Paragon Plus Environment


ABBREVIATIONS

TrxHis6 = recombinant thioredoxin poly(His)6 affinity tag; β-IPTG = beta isopropylthiogalactoside; TEV protease = tobacco etch virus protease; IDA = iminodiacetic acid; ATT = 6-aza-2-thiothymine; Sf9 = clonal isolate of Spodoptera frugiperda Sf21 cells; MALDI-TOF-MS = matrix-assisted laser desorption/ionization time-of-flight mass spectrometry; gp67 = envelope glycoprotein insect cell signal secretion sequence; rhPTH = recombinant human parathyroid hormone; Gal = galactose; Man = mannose; Fuc = fucose; NAG = N-acetylglucosamine; NeuAc = N-acetylneuraminic acid (sialic acid); SpSM30 = Strongylocentrotus purpuratus spicule matrix protein 30; SpSM30B/C = Strongylocentrotus purpuratus hybrid spicule matrix protein 30, isoforms B and C; SpSM50 = Strongylocentrotus purpuratus spicule matrix protein 50; rSpSM30B/C-N = E. coli recombinant Strongylocentrotus purpuratus hybrid spicule matrix protein 30, isoforms B and C, non-glycosylated; rSpSM30B/C-G = recombinant Sf9-expressed Strongylocentrotus purpuratus hybrid spicule matrix protein 30, isoforms B and C, glycosylated; PNC = pre-nucleation cluster; ACC = amorphous calcium carbonate; MgC = magnesium-bearing calcite; ID = intrinsic disorder; CTLL = C-type lectin-like domain; MAQPG = Met/Asn/Gln/Pro/Gly-rich domain.


ABSTRACT

In the purple sea urchin Strongylocentrotus purpuratus, the formation and mineralization of fracture-resistant skeletal elements such as the embryonic spicule require the combinatorial participation of numerous spicule matrix proteins such as the SpSM30A-F isoforms. However, due to their limited abundance, it has been difficult to pursue extensive biochemical studies of the SpSM30 proteins and deduce their role in spicule formation and mineralization. To circumvent these problems we expressed a model recombinant spicule matrix protein, rSpSM30B/C, which possesses the key sequence attributes of isoforms “B” and “C”. Our findings indicate that rSpSM30B/C is expressed in insect cells as a single polypeptide containing variations in glycosylation that create microheterogeneity in rSpSM30B/C molecular masses. These post-translational modifications incorporate O- and N-glycans and anionic mono-, bisialylated and mono-, bisulfated monosaccharides on the protein molecules and enhance its aggregation propensity. Bioinformatics and biophysical experiments confirm that rSpSM30B/C is an intrinsically disordered, aggregation-prone protein that forms protein phases or hydrogels that control the in vitro mineralization process in three ways: 1) they increase the time interval for prenucleation cluster formation and transiently stabilize an ACC polymorph; 2) they promote and organize single crystal calcite nanoparticles; and 3) they promote faceted growth and create surface texturing of calcite crystals. These features are also common to mollusk shell nacre proteins, and we conclude that rSpSM30B/C is a spiculogenesis protein that exhibits traits found in other calcium carbonate mineral-modification proteins.


Sea urchin embryo skeletal elements (spicules) are important research models for investigating the chemical mechanisms and biological processes that organisms use to control crystal growth and create sophisticated functional nanocomposites1 under very mild and sustainable conditions.1-5 In the purple sea urchin embryo, Strongylocentrotus purpuratus, the spicule is a single crystal of magnesium-bearing calcite (Ca0.95Mg0.05CO3 or MgC) that is formed by primary mesenchyme cells (PMCs) via deposition of amorphous calcium carbonate (ACC) within the extracellular spicule matrix (SM) created by protein families designated as SpSM.6-24 This ACC phase then undergoes an amorphous-to-crystalline transition to form MgC, and the SpSM proteins subsequently become occluded within the crystalline mineral phase.12,22-24 Unfortunately, the mechanistic details of these processes remain unknown. Thus, there is a strong motivation to gain further information regarding the spicule matrix, the SpSM proteome, and the spatial and functional relationships between SpSM proteins and the mineral phase. At this stage, some general facts regarding SpSM proteins are known. First, genomic and proteomic studies reveal that all expressed SM proteins feature a C-type lectin-like domain (CTLL),2-5,11 which may play an important role in matrix assembly and mineral formation.11 Second, ten SpSM proteins also contain Met/Asn/Gln/Pro/Gly-rich (MAQPG) repetitive domains that bear a striking similarity to known elastomeric sequences.2-5,11,14 These common sequence elements may convey important ECM functions to this family of proteins, such as matrix protein assembly and elasticity, protein-mineral phase interaction/stabilization, or matrix-mineral organization. Of the known SM proteins, the best characterized member is SpSM50,2-5,13-21 the most abundant member of the SM proteome, whose individual domains have been found to possess several important features related to protein self-assembly and ACC formation.11 In


contrast, our current knowledge of other less abundant proteins, such as the highly homologous SpSM30 acidic glycoprotein isoforms (SpSM30A through F),2-5,9,13,14,17,18 is much more limited. The SpSM30 glycoproteins accumulate in the longitudinally growing portions of the spicule, i.e., parallel to the c-axis of calcite, whereas SpSM50 accumulates in the radially growing portions, i.e., normal to the c-axis.18 Curiously, knockdown of SpSM30 expression does not interfere with spicule mineralization, formation, or elongation, whereas knockdown of SpSM50 expression does.10 Collectively, the emerging picture of spicule mineralization and matrix assembly is that SpSM30 isoform function and location are distinct from those of SpSM50. Unfortunately, due to the limited abundance of SpSM30 isoforms in the spicule, their similarities in molecular weight, and their temporal expression patterns, it has been difficult to pursue extensive biochemical studies of these acidic glycoproteins and deduce their role in spicule formation and mineralization vis-à-vis SpSM50.

One way to circumvent these difficulties and clarify the role of SpSM30 glycoproteins within spicule mineralization is to create a model recombinant SpSM30 glycoprotein isoform for in vitro studies. An initial choice for a model system is the original documented SpSM30 glycoprotein sequence15,25 (Accession P28163, Fig 1; mature sequence length, 270 AA), which has been subsequently reclassified as SpSM30B/C, i.e., a hybrid of the “B” and “C” isoform sequences.2,4,5,14 The justification for this model system choice is as follows: (1) Within the six-member SpSM30 family, the SpSM30B (268 AA) and SpSM30C (270 AA) isoforms are expressed in the embryonic spicule and in all adult mineralized tissues.17 Thus, both isoforms are important for spicule matrix formation and mineralization at all stages of sea urchin development. (2) SpSM30B/C, SpSM30B, and SpSM30C exhibit 97% sequence


Fig 1. The mature, Signal-P transmembrane-deleted primary sequence of the original SpSM30 protein (reclassified as SpSM30B/C). Putative sites for N- (red) and O- (blue) glycosylation are indicated above each target residue and were determined via the NetNGlyc and NetOGlyc prediction programs, respectively. Predicted intrinsically disordered (underlined) and amyloid-like cross-beta strand aggregation (highlighted) sequences are color-indexed to each metaserver. The putative CTLL domain is denoted in gray lettering (E92 – D180). Note that the MAQPG region runs from R181 to Q270.

homology with one another (Figure S1, Supporting Information).2,4,5,14 Thus, a model SpSM30B/C hybrid glycoprotein would provide an opportunity to capture overlapping functional aspects of two isoforms from the six-member SpSM30 series. Using baculovirus-infected insect cells, we expressed and characterized a recombinant version of this hybrid protein sequence, rSpSM30B/C (where “r” stands for recombinant). Our approach was similar to that adopted for the expression of the recombinant abalone shell nacre glycoprotein, rAP24G:26 we did not attempt to generate an exact glycoprotein replica or to study the in situ glycosylation of this protein. Rather, our goal was to create a model protein that could help us address three issues: (1) do insect cells post-translationally modify rSpSM30B/C, and if so, what carbohydrate modifications occur on the protein? (2) what effect does SpSM30B/C have on the early events of in vitro calcium carbonate mineralization, and can these effects provide insights into the potential role(s) that SpSM30 A-F isoforms play within spiculogenesis?


(3) what similarities and differences exist between a calcitic spicule matrix protein and other well-documented calcium carbonate biomineralization proteins, such as those associated with the mollusk shell nacre aragonite?26-33

We find that rSpSM30B/C is expressed by insect cells as an “ensemble” consisting of both the unglycosylated form and 3 glycosylated variants that create microheterogeneity in rSpSM30B/C molecular masses, similar to what was reported for rAP24G.26 Oligosaccharide analyses reveal that the rSpSM30B/C ensemble consists of polypeptides possessing both N- and O-glycan modifications as well as anionic sialylated and sulfated monosaccharides that enhance its aggregation propensity. Functionally, the rSpSM30B/C ensemble resembles mollusk shell nacre proteins in many important respects:26-33 rSpSM30B/C is an intrinsically disordered protein that self-associates under low ionic strength and mineralization conditions to form protein phases or hydrogels that prolong the time interval for nucleation and organize mineral nanoparticles. However, what distinguishes rSpSM30B/C from nacre proteins is that the resultant protein hydrogels promote faceted, nanotextured calcite crystal growth and transiently stabilize an ACC polymorph. Thus, the results obtained with the model rSpSM30B/C protein establish two important overarching concepts: (1) the sea urchin spicule matrix protein SpSM30B/C is capable of participating in spicule matrix assembly and mineralization processes;6-24 (2) intrinsically disordered, aggregation-prone biomineralization proteins and their modulatory effects on nucleation, nanoparticle organization, and crystal growth are common to calcium carbonate invertebrate skeletal structures such as mollusk shell26-33 and sea urchin spicules.


MATERIALS AND METHODS

Bacterial recombinant expression and purification of rSpSM30B/C-N. The target DNA sequence of SpSM30B/C (Accession number P28163 SM30_STRPU, UniProtKB mature expressed form sans 20 AA membrane leader sequence, S. purpuratus database entry Glean3:00826, Glean3:00827)2,4,5,17 was codon optimized and synthesized into the pUC57 vector for bacterial expression of an unglycosylated recombinant variant of SpSM30B/C (denoted as rSpSM30B/C-N, where “N” stands for non-glycosylated). This unglycosylated protein was created for comparison with the glycosylated rSpSM30B/C variant to evaluate mass shifts associated with glycosylation. The cloning strategy incorporated a Trx-His tag + TEV cleavage site at the N-terminus of the mature SpSM30B/C sequence (Fig S2, Supporting Information),26,27 and this combined sequence was subcloned into Expression Vector E6 plasmid for E. coli expression as performed by GenScript USA (Piscataway, NJ, USA; http://www.genscript.com/) using their proprietary OptimumGene system.26,27 E. coli BL21 (DE3) was transformed with this recombinant plasmid and a single colony was inoculated into LB medium containing kanamycin; cultures were incubated at 37 °C until cell density reached an OD600 nm of 0.6 – 0.8, at which point β-IPTG was introduced for induction.26,27 A total of 10 liters of bacterial culture was grown; cells were lysed by sonication, and rSpSM30B/C-N protein was obtained from inclusion bodies. The supernatant after centrifugation was loaded onto Ni-IDA resin columns for purification and eluted with gradient concentrations (30 – 300 mM) of imidazole.26,27 Eluted fractions were then subjected to dialysis against TEV protease buffer (50 mM potassium acetate, 50 mM Tris pH 7.5, 0.5 mM TCEP) followed by digestion with TEV protease


overnight at 25 °C.26,27 This digestion mixture was then purified again by Ni-IDA affinity chromatography to remove the solubility/His tag, followed by chromatography on Superdex 200 (running buffer 50 mM Tris, 100 mM NaCl, pH 8.0) to remove the TEV protease, and fractions were collected, pooled, and dialyzed against storage buffer (50 mM Tris-HCl, 150 mM NaCl, 10% Glycerol, 500 mM L-Arg, pH 8.0).26,27 Total protein yield was 5 mg per 10 L of culture, and the protein was established as > 90% pure by densitometric analysis of Coomassie Blue-stained SDS-PAGE 4-20% gradient gels (Fig S3, Supporting Information). Protein concentrations were determined using UV spectrophotometry at 280 nm and the EXPASY ProtParam tool (http://web.expasy.org/protparam/). N-terminal sequencing of the first 10 residues confirmed the expected sequence (data not shown). Using the protocol described below, MALDI-TOF-MS confirmed that the mass of the purified rSpSM30B/C-N protein (29678.7 Da) corresponds with the predicted MW (29678.7 Da) of the hypothetical G1 - P271 recombinant sequence (Fig 2; Fig S2, Supporting Information). The purified protein was stored at -80 °C until needed.

Insect cell/baculovirus recombinant expression and purification of glycosylated rSpSM30B/C-G. The cloning, insect cell expression, and purification of rSpSM30B/C were performed by GenScript USA (Piscataway, NJ, USA; http://www.genscript.com/) using their proprietary OptimumGene system.26 The DH10Bac strain was used for the recombinant bacmid (rbacmid) generation. The positive rbacmid containing the gp67-His6-TEV SpSM30B/C gene construct (Fig S2, Supporting Information)31 was confirmed by PCR. rBacmids were transfected into glycosylation-optimized Sf9 insect cells31 with Cellfectin II (cationic-lipid formulation designed for optimal transfection of insect cells, ThermoFisher Scientific) and cells were incubated in Sf-900 II SFM (serum-free, protein-free insect cell culture medium, ThermoFisher


Scientific) for 5-7 days at 27 °C before harvest. The supernatant was collected after centrifugation and designated as P1 viral stock. The expression of gp67-His6-TEV tagged protein was checked by SDS-PAGE and Western blot using anti-His monoclonal antibody (GenScript, Cat. # A10086), which showed that the expression of the target protein was both intra- and extracellular. Subsequently, the Sf9 cells (100 mL) expressing gp67-His6-TEV-rSpSM30B/C-G protein were harvested at 96 hr post infection and the medium was dialyzed against Ni column running buffer overnight. The medium after centrifugation was loaded onto Ni-IDA resin columns for purification and eluted with gradient concentrations (30 – 300 mM) of imidazole. Eluted fractions were then subjected to dialysis against TEV protease buffer, followed by digestion with TEV protease overnight at 25 °C. This digestion mixture was then purified again by Ni-IDA affinity chromatography to remove the solubility tag, followed by chromatography on Superdex 200 (running buffer 50 mM Tris, 100 mM NaCl, pH 8.0) to remove the TEV protease; fractions were collected, pooled, dialyzed against storage buffer, and stored at -80 °C until needed. Total purified insect cell-expressed rSpSM30B/C protein yield was 7 mg. Protein concentrations were determined using UV spectrophotometry at 280 nm and the EXPASY ProtParam tool (http://web.expasy.org/protparam/).

Densitometric analysis of Coomassie Blue-stained SDS-PAGE 4-20% gradient gels (Fig S3, Supporting Information) indicated that the tag-free purified insect cell-expressed rSpSM30B/C sample consisted of multiple bands that migrated in two defined regions of the gel. The first group, denoted as “ensemble”, was observed as multiple bands that migrated between 30 and 42 kDa. The second group, denoted as “dimer”, was observed as multiple, weakly


staining bands that migrated at ~66 kDa, or almost 2x the MW of the “ensemble” bands. N-terminal sequencing of the first 8 residues in the rSpSM30B/C sample confirmed the expected tag-free recombinant rSpSM30B/C sequence, i.e., GQLPGAGG (Table S1, Supporting Information). Henceforth, this insect-expressed sample, containing an ensemble of post-translationally modified variants of the SpSM30B/C polypeptide chain, will be referred to as “rSpSM30B/C-G” (G = glycosylated). In subsequent experiments described below, stock solutions of this sample were prepared via repetitive exchange from storage buffer into unbuffered deionized distilled water (UDDW) using Amicon Ultra 0.5 mL 10 kDa regenerated cellulose centrifugal filters (Millipore USA), wherein protein samples were washed 3x with UDDW, then concentrated.

MALDI-TOF-MS analysis of rSpSM30B/C-N and rSpSM30B/C-G. To precisely determine the actual molecular masses of insect cell-expressed, post-translationally modified rSpSM30B/C-G, this protein ensemble was subjected to matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS)26,34 using a Bruker OmniFlex MALDI MS spectrometer in the negative ion mode. For comparison, the rSpSM30B/C-N sample was analyzed in parallel to determine the differences between unglycosylated and glycosylated polypeptide chain masses, which cannot be differentiated by SDS-PAGE due to glycoprotein gel migration anomalies (Fig S3, Supporting Information). Both protein samples were premixed with an equal volume of sinapinic acid (10 mg/mL, ThermoFisher Scientific). A 1.5 µL aliquot of the protein + matrix sample was placed on the steel grid target and allowed to air dry; the sample was then covered with 1.5 µL of the sinapinic acid and the drying process was repeated. Dried samples were analyzed using a Bruker UltrafleXtreme MALDI-TOF/TOF in linear mode with a detection


range of 10-95 kDa, 1.00 m/z resolution, 500 shots/sec, pulsed laser power of 200 mJ at 337 nm. For calibration and comparison, the SIGMA-Aldrich ProteoMass Peptide and Protein MALDI-MS Calibration kit (SIGMA-Aldrich, USA) was run using identical matrices and conditions. The mMass 5.50 software program35 was employed for m/z determinations. For subsequent studies listed below, the protein concentrations for rSpSM30B/C-G were calculated using the MW value for the predominant glycosylated adduct species, i.e., 33287.4 Da (Fig 2), as an “average” molecular mass of the glycosylated protein ensemble.

Oligosaccharide mapping of rSpSM30B/C-G. To identify putative sites for N- and O-linked glycosylation, the programs NetNGlyc,36 NetOGlyc,37 and GlycoPred38 were used to locate putative eukaryotic Asn-Xaa-Ser/Thr sequons and mucin-type GalNAc O-glycosylation sites (Fig 1). N- and O-linked oligosaccharides were released from a 50 μg lyophilized, purified rSpSM30B/C-G sample using non-reductive β-elimination based on published methods39-41 by GlycoSolutions Corporation (Marlborough, MA, USA). Released oligosaccharides were desalted and then fluorescently tagged with 2-aminobenzoic acid (2-AA) using reductive amination.39 Excess 2-AA was removed using acetone precipitation, then labeled oligosaccharides were spotted with matrix (2,5-dihydroxybenzoic acid, DHB) on a stainless steel target and analyzed in negative ion mode MALDI-TOF-MS as described above. Analysis of the MALDI-TOF-MS data was performed using GlycoMod (http://www.expasy.org). A tolerance of ± 0.2 Da was used. The derivative mass for the 2-aminobenzoic acid was 137 g/mol. The search was carried out using three possible adducts: [M-H]-, [M+Na-2H]- or [M+TFA]- (Table S2, Fig S4, Supporting Information). The structure assignment was based on general knowledge of possible glycosylation patterns in insect cells.42 Oligosaccharide cartoons were constructed using


GlycoWorkBench (Fig 3, Fig S4, Supporting Information).26

AFM and light microscope imaging of rSpSM30B/C-G protein phases. The aggregation characteristics of rSpSM30B/C-N and rSpSM30B/C-G assemblies on mica substrates were examined using tapping mode AFM.26,27,32,33 The apo form [i.e., Ca(II) free] of each protein was imaged in 10 mM HEPES buffer (pH 8.0) at a protein concentration of 1.5 µM. AFM experiments were conducted at room temperature using an Asylum Cypher AFM instrument equipped with Olympus AC240 rectangular-shaped silicon-nitride tips with a nominal spring constant of approximately 2 N/m (0.6-3.5 N/m) and a drive frequency in the range of ∼50 kHz - 90 kHz. All samples (100 µL) were aliquoted onto a freshly stripped surface of mica (0.9 mm thick, Ted Pella, Inc.) and incubated for a period of 15 min at ambient temperature prior to measurement. SPIP 6.6 software was used for image processing, noise filtering, and analysis of particle heights and diameters, including the calculation of Rq, i.e., the surface roughness of the imaging surface.26,27,32,33 For detection of mesoscale protein hydrogel particles, 5 µL of a 30 µM solution of rSpSM30B/C-G in 10 mM HEPES, pH 8.0, was placed on a clean glass slide and imaged using bright field microscopy (100x lens, Nikon DS-U3 Light Microscope).

In vitro mineralization assays. Mineralization assays were adapted from published protocols26-33 and were conducted by mixing equal volumes of 20 mM CaCl2·2H2O (pH 5.5) and 20 mM NaHCO3 / Na2CO3 buffer (pH 9.75) to a final volume of 500 µL in sealed polypropylene tubes and incubating at room temperature for 1 hr.26-33 Aliquots of rSpSM30B/C-G stock solutions were added to the calcium solution prior to the beginning of the reaction, with final protein assay concentrations = 1.5 μM. The final pH of the reaction mixture was measured and found to be


approximately 8.0 – 8.2.26-33 Mineral and protein deposits formed during the assay were captured on 5 x 5 mm Si wafer chips (Ted Pella, Inc.) that were placed at the bottoms of the vials. Upon completion of the mineralization assay period, the Si wafers were rinsed thoroughly with calcium carbonate-saturated methanol and dried overnight at 37 °C prior to analysis. For TEM studies, a 10 µL aliquot of the mineralization assay supernatant was withdrawn at the completion of the assay period and spotted onto formvar-coated Au grids, which had been glow discharged for 30 sec before sample preparation to remove contaminants present on the film; grids were then washed and dried as described above.

Electron Microscopy. SEM imaging of the Si wafers extracted from the mineralization assays was performed using a Merlin (Carl Zeiss) field emission SEM (FESEM) using either an Everhart-Thornley type secondary electron detector (SE2) or an annular secondary electron detector (in lens) at an accelerating voltage of 1.5 kV, a working distance of 4 mm, and a probe current of 300 pA. Prior to analysis, SEM samples were coated with iridium using a Cressington 208HR sputter coater with thickness controller attachment. Transmission electron microscopy (TEM) imaging and electron diffraction analyses were performed on a Philips CM20 transmission electron microscope equipped with a LaB6 filament electron beam source and a 1024 x 1024 retractable CCD camera. All imaging and diffraction analyses were performed at 200 kV. A diffraction pattern of a polycrystalline gold standard was used as a calibration scale for all subsequently recorded diffraction patterns. The selected area diffraction (SAD) patterns were analyzed and indexed using the CrysTBox software package.

Calcium potentiometric titrations of rSpSM30B/C-G. Potentiometric titration experiments were performed on purified rSpSM30B/C-G samples at room temperature by using a computer-controlled titration system operated with the supplied software (Tiamo v2.2, Metrohm GmbH, Filderstadt Germany). The experimental set-up has been described earlier.43-47 A polymer-based ion-selective electrode and a flat-membrane glass electrode were used to monitor the free Ca(II) concentration and pH, respectively. During a titration run, CaCl2 (10 mM) solution was dosed at a constant rate of 0.01 mL/min into 10 mL of the protein solution in carbonate buffer (10 mM), which was constantly stirred at 800 rpm at room temperature. Parallel runs were conducted on 5 and 50 nM rSpSM30B/C-G samples. Titrations were performed at a constant pH of 9.0, maintained by counter-titration with NaOH (10 mM). Reference and calibration experiments were performed by dosing CaCl2 (10 mM) into carbonate buffer (10 mM, pH 9.0) and water (pH 9.0), respectively.43-47

Bioinformatics. With regard to intrinsic disorder (ID) or unfolded structure, we employed three different predictive algorithms (GeneSilico MetaDisorder,48 IUP,49 MFD2p50) to map out putative ID regions within SpSM30B/C. Similarly, using three aggregation-prone amyloid predictive algorithms (AGGRESCAN,51 ZIPPERdB,52 WALTZ53), we identified putative cross-beta strand regions within the SpSM30B/C sequence. To determine a hypothetical global structure of SpSM30B/C, we utilized the DISOclust (v1.1) - IntFOLD2 integrated protein structure and function prediction server (University of Reading, UK, using default parameters), which provides tertiary structure prediction/3D sequence homology modeling of protein sequences that contain folded and unfolded sequence elements.54
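The working concentrations quoted in the mineralization and titration protocols above follow from simple dilution arithmetic. The Python sketch below is illustrative only and is not part of the published protocol: the function names are hypothetical, volumes are assumed to be additive, the volume of the protein aliquot in the mineralization assay is assumed to be negligible, and the volume of NaOH counter-titrant is neglected. The last function gives the total calcium dosed per unit volume, not the free Ca(II) reported by the ion-selective electrode.

```python
def final_concentration(c_stock, v_stock, v_final):
    """Concentration after dilution (C1*V1 = C2*V2); any consistent units."""
    return c_stock * v_stock / v_final


def protein_mass_ug(conc_uM, volume_uL, mw_da):
    """Mass of protein (micrograms) in a given volume at a given molarity."""
    moles = (conc_uM * 1e-6) * (volume_uL * 1e-6)  # mol/L x L
    return moles * mw_da * 1e6                      # g -> micrograms


def dosed_calcium_mM(t_min, rate_mL_per_min=0.01, titrant_mM=10.0, v0_mL=10.0):
    """Total Ca(II) dosed per unit volume after t_min minutes of constant-rate
    addition. Assumption: NaOH counter-titrant volume is ignored."""
    v_added = rate_mL_per_min * t_min
    return titrant_mM * v_added / (v0_mL + v_added)


if __name__ == "__main__":
    # Equal volumes (250 uL each) of 20 mM stocks in a 500 uL assay.
    print(f"{final_concentration(20.0, 250.0, 500.0):.2f} mM")   # 10.00 mM
    # 1.5 uM protein in 500 uL, using the 33287.4 Da "average" mass.
    print(f"{protein_mass_ug(1.5, 500.0, 33287.4):.2f} ug")      # 24.97 ug
    # Total calcium dosed after 100 min of titration.
    print(f"{dosed_calcium_mM(100.0):.3f} mM")                   # 0.909 mM
```

Under these assumptions each mineralization assay contains 10 mM calcium and 10 mM carbonate buffer and roughly 25 µg of rSpSM30B/C-G.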


RESULTS

Analysis of rSpSM30B/C overexpression in E. coli and Sf9 insect cells and evidence of protein oligomerization. Prior to the genomic identification of SpSM30A-F isoforms, early studies15,17,25 conducted with spicule matrix-extracted SpSM30 proteins provided a range of molecular weights (40, 43, 46 kDa) for this protein based upon SDS-PAGE separation techniques. More recently, SDS-PAGE studies with SpSM30B revealed that this protein is secreted in ~42 kDa and ~45 kDa forms.17 However, since the migration of glycoproteins on SDS-PAGE gels is subject to artifacts and MW distortions due to the interaction of the oligosaccharides with each other and with the gel itself,55-59 there is the possibility that the reported MW values15,17,25 for SpSM30 isoforms may not be accurate. For this reason, we employed both SDS-PAGE (Fig S3, Supporting Information) and MALDI-TOF-MS to analyze the rSpSM30B/C translation products from both E. coli and Sf9 insect cells. Note that both hosts produce the same polypeptide chain (Fig S2, Supporting Information) once the N-terminal tags have been removed by protease digestion. Thus, the E. coli-expressed rSpSM30B/C-N serves as a standard to characterize the glycan modifications performed on the Sf9-expressed rSpSM30B/C-G. On SDS-PAGE gels rSpSM30B/C-N migrates as a single Coomassie-staining band at ~30 kDa (Fig S3, Supporting Information). In contrast, the N-glycosylation-optimized Sf9 insect cell-expressed rSpSM30B/C-G migrates as a diffuse series of bands at ~30 – 42 kDa, with the upper limit band closely corresponding to the ~42 kDa species reported for SpSM30B.17 Since the protein components in this rSpSM30B/C-G sample yield the same N-terminal sequence (Table S1, Supporting Information), we conclude that the observed SDS-PAGE gel microheterogeneity for rSpSM30B/C-G arises from post-translational modifications.26 Curiously, we also note the presence of higher MW bands (> 60 kDa) migrating in the rSpSM30B/C-G gel lane, which were also present on the rSpSM30B/C-N gels (Fig S3, Supporting Information). As we shall see in the following sections, these higher MW bands reflect protein aggregation propensity.

Fig 2. MALDI-TOF-MS spectra of rSpSM30B/C-N and -G protein samples. [M + H+] adducts and their corresponding m/Z values (Da) are listed. “UG” = unglycosylated rSpSM30B/C; “G” = glycosylated rSpSM30B/C; “d” = dimer forms of UG and G1,2,3 rSpSM30B/C variants. Above each spectrum is the corresponding simulated SDS-PAGE banding profile generated from the MALDI-TOF-MS spectrum by the mMass 5.50 software program.

Using the rSpSM30B/C-N protein as a reference point, we progressed to MALDI-TOF-MS analyses of both protein samples, where we uncovered some unexpected phenomena (Fig 2; Fig S3, Supporting Information). Let us first examine the rSpSM30B/C-N sample. The MALDI-TOF spectrum for rSpSM30B/C-N exhibits 3 adduct species (“1”, “2”, “3”) with observed m/Z values of 29678.7, 59647.7, and 89448.7 Da, respectively. These 3 adducts are mass multiples of one another: (1) The “1” adduct corresponds to the [M + H+] monomeric form of rSpSM30B/C-N (Fig S2, Supporting
Information), (2) The “2” adduct is ~2x the m/Z of “1”, with an excess of ~145 Da per monomer unit; (3) The “3” adduct is ~3x the m/Z of “1”, with an excess of ~138 Da per monomer unit. In addition, both “2” and “3” exhibit the MALDI-TOF peak broadening and intensity reduction effects that are typical for non-covalent protein-protein complexes.60,61 On this basis, we conclude that adducts “2” and “3” correspond to the dimeric and trimeric forms of the [M + H+] rSpSM30B/C-N adduct, and thus SpSM30B/C is an aggregation-prone protein sequence.

The situation becomes more complex when we analyze the MALDI-TOF-MS spectra for rSpSM30B/C-G (Fig 2). Here, at least 10 detectable adduct species can be identified, with peak broadening evident at higher m/Z values that is typical of non-covalent protein-protein complex behavior in MALDI-TOF.60,61 The first detected adduct appears as a shoulder of the major peak and has an m/Z value of 29678.7 Da, which correlates with the value for [M + H+] rSpSM30B/C-N (Fig 2). We conclude that this adduct is an unglycosylated (UG) variant produced by the Sf9 insect cell and represents a minor pool of recombinant polypeptides that were either unmodified or were at one point glycosylated and then deglycosylated by the insect cells.26 The next 3 adducts, G1, G2, and G3, possess m/Z values of 30848.5, 33287.4, and 37157.2 Da, respectively, and represent the post-translationally modified (i.e., glycosylated or “G”) versions of [M + H+] rSpSM30B/C. Using the mass of 29678.7 Da as representative of the [M + H+] rSpSM30B/C-N polypeptide chain, we calculated the post-translational mass shifts for adducts G1, G2, and G3 to be +1170, +3609, and +7479 Da, respectively. Note that the ~3.6 kDa post-translational mass shift for the predominant G2 species is close to the mass shift value of ~3-4 kDa reported for the SpSM30 glycoprotein in earlier studies.15,17 Thus, the Sf9-expressed rSpSM30B/C-G protein is expressed as an ensemble of polypeptides that include the unglycosylated polypeptide chain
along with 3 glycosylated variants, of which the G2 species (+3.6 kDa glycosylation) is predominant.

We now turn our attention to the remaining rSpSM30B/C-G adduct species at m/Z > 50 kDa (Fig 2). Here, we detected three broad asymmetric peaks. As m/Z values increase, signal intensity decreases and signal broadening increases,26,60,61 which makes precise m/Z determinations difficult to obtain, and thus the following adduct assignments reflect m/Z estimates. An adduct ensemble appears in the m/Z range of 52 – 72 kDa; this ensemble features 1 major and 3 minor species which mirror the asymmetry and intensity ratios of the 4 major adduct species UG, G1, G2, G3 and roughly correlate to ~2x the m/Z values of these [M + H+] adducts (Fig 2). Moreover, the m/Z range for this broad adduct peak also correlates with the ~60 kDa bands that were detected in our SDS-PAGE gels (Fig S3, Supporting Information). Therefore, we have tentatively assigned this adduct peak to the dimeric forms (d) of UG, G1, G2, and G3. The remaining two adduct peaks, denoted as “3” and “4”, are extremely broad, weak in intensity, and span the m/Z ranges of 85 - 105 and 115 - 135 kDa, respectively. Once again, these values are ~3x and ~4x mass multiples of adducts UG, G1, G2, G3 and thus we have tentatively assigned adducts “3” and “4” as the trimeric and tetrameric forms of the unglycosylated and glycosylated rSpSM30B/C-G variants (Fig 2). Note that we were unable to detect a tetrameric form of the rSpSM30B/C-N protein (Fig 2), and as we shall see later on (Fig 4), the absence of a tetrameric form is due to the weaker aggregation propensity of rSpSM30B/C-N relative to rSpSM30B/C-G.
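The mass arithmetic behind these assignments can be reproduced with a few lines of code. The sketch below uses only the m/Z values quoted in the text; the function names are hypothetical, and treating an n-mer as n times the [M + H+] monomer value is a simplifying assumption that ignores the small difference in proton count.

```python
# [M + H+] m/Z values (Da) quoted in the text.
MONOMER_UG = 29678.7
N_ADDUCTS = {1: 29678.7, 2: 59647.7, 3: 89448.7}  # rSpSM30B/C-N
G_MONOMERS = {"UG": 29678.7, "G1": 30848.5, "G2": 33287.4, "G3": 37157.2}


def per_subunit_excess(observed, n, monomer=MONOMER_UG):
    """Mass per subunit of an n-mer adduct minus the monomer value (Da)."""
    return observed / n - monomer


def glycan_shift(observed, monomer=MONOMER_UG):
    """Post-translational mass shift relative to the unglycosylated chain (Da)."""
    return observed - monomer


def oligomer_window(n, monomers=G_MONOMERS):
    """Expected m/Z window for n-mers built from the lightest/heaviest monomer.

    Assumption: n-mer m/Z is approximated as n x monomer m/Z.
    """
    masses = list(monomers.values())
    return n * min(masses), n * max(masses)


if __name__ == "__main__":
    for n, mz in N_ADDUCTS.items():
        print(f"adduct {n}: excess per subunit = {per_subunit_excess(mz, n):.1f} Da")
    for name, mz in G_MONOMERS.items():
        print(f"{name}: glycan shift = {glycan_shift(mz):.1f} Da")
    for n in (2, 3, 4):
        lo, hi = oligomer_window(n)
        print(f"{n}-mer window: {lo / 1000:.1f} - {hi / 1000:.1f} kDa")
```

The computed windows are approximate upper and lower bounds; the broad peaks reported above overlap them only roughly, which is in keeping with the authors' description of the assignments as tentative.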

In conclusion, our MALDI-TOF-MS data confirm the following: (1) insect cell-expressed rSpSM30B/C-G is an ensemble pool consisting of the unglycosylated polypeptide, UG, and three different glycosylated variants G1, G2, G3. (2)
Based upon the MALDI-TOF-MS spectra, the UG component is lower in abundance compared to the G1, G2, and G3 variants; (3) the calculated glycan mass shift (+3609 Da) for the major species, G2, closely corresponds to the glycan shift noted in earlier SpSM30 studies;15,17 (4) there is evidence that both rSpSM30B/C-N and rSpSM30B/C-G self-associate or aggregate to form higher-ordered protein complexes, with rSpSM30B/C-G exhibiting greater aggregation propensity.

Glycosylation mapping of rSpSM30B/C-G. Endoglycosidase digestion experiments revealed that N-linked oligosaccharides comprised ~3-4 kDa of the molecular mass of the SpSM30B/C protein.15,25 However, subsequent alkali digestion experiments produced no mass shift on SDS-PAGE gels, suggesting that there were no O-linked oligosaccharides present on the SpSM30 proteins.15 In contrast, our bioinformatic predictions provide a different picture of hypothetical SpSM30B/C sequence glycosylation (Fig 1): we identified one putative Asn-Xaa-Ser/Thr sequon N-linkage site (N82) and six putative O-glycosylation sites (T201, T205, T210, T211, T218, T219), yielding a hypothetical total of 7 glycosylation sites.36-38 Thus, the glycosylation pattern of SpSM30 could vary depending upon the host cell, post-translational processing, and other factors, and needs to be re-examined. For this reason, we determined the oligosaccharide profile of the model Sf9-expressed rSpSM30B/C-G ensemble via standardized N- and O-linked chemical digestion protocols34,39-41 followed by negative ion mode MALDI-TOF-MS analyses (Fig 3; Fig S4, Table S2, Supporting Information).26,34 The purpose of these analyses was not to determine the correct in situ spicule glycosylation of SpSM30B/C; rather, we wished to document the glycosylation patterns that insect cells impart to the rSpSM30B/C polypeptide chain, such that we can interpret
the results generated in the subsequent aggregation studies.26 Intriguingly, these mapping studies revealed the presence of predominantly branched N- and O-glycans with only one example of a linear O-sialylated glycan. The glycan monosaccharide content includes mannose, galactose, N-acetylglucosamine, fucose, and N-acetylneuraminic acid (i.e., sialic acid). The presence of galactose and mono- and bisialylated monosaccharides indicates that complete glycosylation occurred on some of the rSpSM30B/C-G protein molecules, since these monosaccharides usually appear as terminal residues on complete N-linked oligosaccharide chains.26,34,39-42 As shown in Table S2, we detected the presence of branching O- and N-glycans, mono- and bisulfated N-glycans, and mono- and bisulfated as well as mono- and bisialylated O-glycans. From this, we conclude the following: (1) the presence of both N- and O-glycans confirms that the insect cell-expressed rSpSM30B/C-G ensemble consists of some percentage of hybrid glycoprotein molecules; (2) the presence of mono- and bisulfated/bisialylated monosaccharides (Fig 3, Table S2, Supporting Information) generates an anionic environment on the surfaces of rSpSM30B/C protein molecules. These anionic glycans could influence the folding, topology, and surface charge density of rSpSM30B/C proteins, which, in turn, could affect the putative structure and function of these proteins;26 (3) it is plausible that rSpSM30B/C N- and O-glycosylation is unique to insect cell expression alone42 and is not representative of post-translational processing within spicule cells during spiculogenesis.

Fig 3. MALDI-TOF-MS spectra (negative ion mode) of N- and O-linked oligosaccharide fragments released from purified rSpSM30B/C-G via chemical digestion. Structure assignments were based on general knowledge of glycosylation in insect cells. Oligosaccharide cartoons were constructed using GlycoWorkBench. The most likely oligosaccharide fragment is presented for each adduct peak; however, additional monosaccharide matches were found for some masses. Sodium and TFA adducts are denoted. For a complete annotation please consult Table S2 and Figure S4, Supporting Information.

rSpSM30B/C-G self-associates to form protein films and particles. Previous studies conducted with other calcium carbonate-based proteins (i.e., aragonite-associated mollusk shell nacre proteins) revealed aggregation phenomena that led to the formation of mesoscale protein phases or hydrogels that possess both film- and particle-like features.26-28;31-33 From our MALDI-TOF-MS studies, we discovered that both rSpSM30B/C-N and rSpSM30B/C-G exhibit aggregation (Fig 2; Fig S3, Supporting Information).
However, as shown in Fig 2, it appears that rSpSM30B/C-G may possess a higher aggregation propensity, since we were able to detect tetrameric forms for this protein but not for rSpSM30B/C-N. Thus, we employed AFM tapping mode imaging to resolve the following issues: (1) Do spicule matrix proteins such as SpSM30B/C exhibit aggregation propensities similar to those observed for nacre protein sequences?26-28;31-33 and (2) Is there a difference between rSpSM30B/C-N and -G in terms of aggregation propensity and protein phase dimensions?

Fig 4. (A) AFM tapping mode data (amplitude plots, z-plots) for 1.5 µM rSpSM30B/C-N and rSpSM30B/C-G in 10 mM HEPES, pH 8.0 on freshly cleaved mica surfaces. (B) Corresponding histogram distributions (± S.D.) taken from AFM datasets. The mean particle diameter (BLACK) and height (GRAY) measurements were taken for 30 particles of rSpSM30B/C-N (N) and rSpSM30B/C-G (G) variants. (C) Bright field light microscope image (100x) of a representative rSpSM30B/C-G hydrogel particle (30 µM protein in 10 mM HEPES, pH 8). Note the presence of porosities.

As shown in Fig 4, under pH 8.0 conditions which mimic the mineralization scenario, both recombinant proteins aggregate to form nanoscale amorphous-appearing particles and protein films on mica surfaces, similar to what has been reported for nacre proteins.26-28;31-33 However, we note that the particle diameters obtained for rSpSM30B/C-G are 51% larger than those obtained for rSpSM30B/C-N, and only the rSpSM30B/C-G protein is observed to form strand-like aggregates. Furthermore, the surface roughness parameters (Rq) were found to be 0.08, 0.40, and 1.07 nm for plain mica, rSpSM30B/C-N, and rSpSM30B/C-G, respectively, indicating that both proteins formed films on the mica surface,26-28;31-33 with rSpSM30B/C-G > rSpSM30B/C-N by a factor of approximately 2.6 in terms of protein film thickness. Using light microscopy at higher rSpSM30B/C-G protein concentrations we were able to detect micron-sized, translucent protein particles that resemble hydrogels and feature porosities (Fig 4C). Thus, our AFM (Fig 4) and MALDI-TOF-MS (Fig 2) experiments confirm the following: (1) like nacre proteins,26-33 the rSpSM30B/C primary sequence is aggregation-prone; (2) glycosylation of rSpSM30B/C introduces greater aggregation propensity and dimensionality, leading to the formation of mesoscale porous protein phases or hydrogels. This is
in contrast to recent studies conducted with the bacterial- and insect cell-expressed AP24 proteins, where the non-glycosylated variant was found to exhibit a higher aggregation propensity.26

The effects of rSpSM30B/C-G on the early events of non-classical calcium carbonate nucleation. In the subsequent mineralization experiments we focus exclusively on the rSpSM30B/C-G variant in order to obtain information on SpSM30 glycoprotein functionality. It has been established that nacre protein phases modulate the early events in the non-classical nucleation pathway of calcium carbonates, i.e., the formation of pre-nucleation clusters (PNCs) and their assembly into amorphous calcium carbonate (ACC).26,28,31,33 To determine if this function is also common to rSpSM30B/C-G, we performed standardized Ca(II) potentiometric titration experiments (Fig 5; Table S3, Fig S5, Supporting Information).43-47 These experiments were conducted at pH 9.0 to minimize bicarbonate ion formation, subsequent outgassing, and pH shift artifacts.43-47 Similar to charged polymers,43-47 nacre proteins,26,28,31,33 and the CTLL and MAQPG domains of SpSM50,13 rSpSM30B/C-G prolongs the time interval required for nucleation of calcium carbonate particles (Fig 5A; Table S3, Supporting Information).43-47 With respect to PNC stability, the slope of the prenucleation regime (i.e., the initial linear region) provides us with information on the equilibrium between free and cluster-associated Ca(II) ions and the interactions between additive molecules and ion associates, leading to PNC stabilization (i.e., SlopeAdditive < SlopeRef) or destabilization (SlopeAdditive > SlopeRef) (Fig 5B, Table S3; Fig S5, Supporting Information).43-47 Here, we see that relative to the control scenario rSpSM30B/C-G induces a slight shift in equilibrium towards ion associates, thereby stabilizing PNCs.
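The slope criterion described above (SlopeAdditive < SlopeRef for stabilization, SlopeAdditive > SlopeRef for destabilization) can be sketched as a least-squares fit over the linear prenucleation region followed by a comparison. The function names and the data points below are hypothetical and purely illustrative; they are not the measured titration curves, and the choice of which points belong to the linear regime is left to the analyst.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def classify_pnc_effect(slope_additive, slope_ref, tol=0.0):
    """Apply the criterion from the text to a pair of prenucleation slopes."""
    diff = slope_additive - slope_ref
    if diff < -tol:
        return "stabilizing"
    if diff > tol:
        return "destabilizing"
    return "no effect"


if __name__ == "__main__":
    # Synthetic example: free Ca(II) (arbitrary units) versus dosing time (min).
    time_min = [0, 1, 2, 3]
    free_ca_reference = [0.0, 0.5, 1.0, 1.5]
    free_ca_additive = [0.0, 0.4, 0.8, 1.2]
    s_ref = linear_slope(time_min, free_ca_reference)
    s_add = linear_slope(time_min, free_ca_additive)
    print(s_ref, s_add, classify_pnc_effect(s_add, s_ref))
```

A nonzero `tol` can be used to avoid classifying differences that lie within the run-to-run scatter of replicate titrations.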

Finally, the role of additives on the products of nucleation is reflected in the post-nucleation solubility or stability of the ACC phase (Fig 5C, Table S3; Fig S5, Supporting Information).43-47 Note that the experimental values of solubility products determined here represent the most soluble of all phases present after nucleation. Here, we observe the most notable feature of rSpSM30B/C-G: the solubilities of the initial mineral deposits dramatically increase in the presence of rSpSM30B/C-G, i.e., the protein ensemble transiently stabilizes an ACC polymorph or modulates the kinetics of phase transition relative to the control scenario. This in vitro finding is highly relevant, given that ACC stabilization is an important event prior to MgC crystal formation in the sea urchin spicule.12,22-24

Fig 5. Comparative Ca(II) potentiometric titration data for 5 and 50 nM rSpSM30B/C-G, pH 9.0, 10 mM carbonate. (A) Time required for nucleation; (B) Slope of the linear prenucleation regime. Negative values indicate PNC stabilization; positive values indicate PNC destabilization; (C) Solubility product after nucleation. Negative values indicate the presence of a less soluble, possibly crystalline phase; positive values reflect transient stabilization of ACC. Data obtained for negative control (protein-deficient) titrations under identical conditions have been subtracted from each protein dataset. Detailed potentiometric plots and tabular data can be found in Fig S5, Table S3, Supporting Information.

The effects of rSpSM30B/C-G on the later events of calcium carbonate nucleation. To determine how rSpSM30B/C-G affects the later events of mineralization (i.e., crystal growth), we tested this ensemble within our mineralization microassay system which generates predominantly
calcite crystals and allows comparisons of protein functionality and aggregation phenomena in 60 min or less.26-33 This microassay system permits the time-dependent simultaneous observation of nanoscale buoyant protein-mineral complexes (supernatant phase, TEM) and gravity-deposited nano- and mesoscale mineral and protein deposits (Si wafer capture, SEM).

Supernatants recovered from protein-deficient controls (60 min) feature typical rhombohedral calcite crystals (Fig 6A). However, in the presence of 1.5 µM rSpSM30B/C-G we observe mesoscale protein matrices (Fig 6B, note arrows) which appear film-like as a result of the dehydration fixation procedures utilized for TEM imaging. These matrices contain clusters of elongated mineral nanoparticles, which are single-crystal calcite as confirmed by electron diffraction pattern indexing (Fig 6C; Fig S6, Supporting Information).
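To illustrate how a gold calibration standard can be used to index such a diffraction pattern, the sketch below converts measured ring or spot radii into d-spacings via a camera constant and compares them with calculated calcite spacings. This is a generic small-angle calculation, not a description of how CrysTBox works internally. The lattice parameters are textbook values assumed here (gold a ≈ 4.078 Å; calcite, hexagonal setting, a ≈ 4.989 Å and c ≈ 17.062 Å), and the radii in the usage section are synthetic.

```python
import math

# Assumed textbook lattice parameters in angstroms (not taken from the paper).
AU_A = 4.078
CALCITE_A = 4.989
CALCITE_C = 17.062


def cubic_d(a, h, k, l):
    """Interplanar spacing for a cubic lattice."""
    return a / math.sqrt(h * h + k * k + l * l)


def hexagonal_d(a, c, h, k, l):
    """Interplanar spacing for a hexagonal lattice."""
    inv_d2 = (4.0 / 3.0) * (h * h + h * k + k * k) / (a * a) + (l * l) / (c * c)
    return 1.0 / math.sqrt(inv_d2)


def camera_constant(ring_radius_px, d_standard):
    """K = R x d for a known reflection (small-angle approximation)."""
    return ring_radius_px * d_standard


def spacing_from_radius(radius_px, k_const):
    """d-spacing of an unknown reflection from its measured radius."""
    return k_const / radius_px


def match_calcite(d_obs, candidates, tol=0.03):
    """Return candidate (h, k, l) planes whose calcite spacing is within tol."""
    return [hkl for hkl in candidates
            if abs(hexagonal_d(CALCITE_A, CALCITE_C, *hkl) - d_obs) <= tol]


if __name__ == "__main__":
    # Synthetic radii: Au (111) ring at 200 px, unknown spot at 155.1 px.
    k_const = camera_constant(200.0, cubic_d(AU_A, 1, 1, 1))
    d_unknown = spacing_from_radius(155.1, k_const)
    planes = [(0, 1, 2), (1, 0, 4), (1, 1, 0), (1, 1, 3)]
    print(round(d_unknown, 3), match_calcite(d_unknown, planes))
```

The tolerance `tol` is an arbitrary choice for this sketch; in practice it would reflect the measurement precision of the camera and calibration.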

In contrast, nacre-associated protein phases typically generate round or rhombohedral mineral nanoparticles.26-28,31

An examination of Si wafer-captured mineral deposits at 60 min reveals that protein-deficient controls generate typical rhombohedral calcite crystals (Fig 6D). However, in the presence of rSpSM30B/C-G we observe striking modifications of existing calcite crystals (Fig 6E,F). Specifically, the random deposition of protein hydrogels onto existing calcite crystals leads to the formation of {104} faceted crystals that feature nanotextured surfaces along their lengths. Presumably, facets or flat surfaces appear because some crystal surfaces grow much more slowly than others. This random protein hydrogel deposition and crystal modification process mirrors the behavior of nacre protein hydrogels under identical in vitro conditions,26,27,29,30 but the induction of faceted growth is unique to rSpSM30B/C-G. We conclude that under mineralization conditions rSpSM30B/C-G aggregates to form protein hydrogels that (1) induce and organize faceted calcite nanoparticles and (2) modify existing calcite crystals to adopt faceted morphologies and nanotextured surfaces. We believe that protein hydrogel deposition onto nucleating minerals may explain how SpSM30 glycoproteins accumulate in growing portions of the mineralizing spicule.18

Fig 6. EM images of supernatant-rescued deposits (top panel, TEM) and Si wafer-captured crystals (bottom panel, SEM) recovered from in vitro mineralization assays (60 min duration). (A, D) Mineral deposits, protein-deficient control; (B, E) 1.5 µM rSpSM30B/C-G; (C) high magnification TEM image and corresponding electron diffraction pattern of crystals in (B); (F) high magnification SEM image of protein deposits and faceted calcite crystals featuring nanotextured surfaces. In (B, E, F), arrows denote rSpSM30B/C-G protein phases. The indexing of the electron diffraction pattern in (C) can be found in Supporting Information (Fig S6).

The molecular basis of rSpSM30B/C protein phase formation. Given the in vitro behavior of the model rSpSM30B/C protein, we were interested in estimating the structural features of the native SpSM30B/C polypeptide chain (Fig 1) and how these might contribute to both the aggregation and mineralization processes. We turned to predictive bioinformatics methods to determine if the same features that are fundamentally important for nacre protein phase formation26-33 are also present in SpSM30B/C: namely, intrinsic disorder (ID) (i.e., the presence of unfolded or unstable protein conformations)48-50;62,63 and amyloid cross-beta strand domains
that foster protein-protein interactions.51-53

With regard to ID, we employed three different predictive algorithms (MetaDisorder, IUP, MFD2p)48-50 to map out putative ID regions within SpSM30B/C (Fig 1). These predictive methods identified two common disordered regions: (1) N-terminal Q1-N17; (2) C-terminal R181 – Q270 or the MAQPG region. The remaining region, i.e. the CTLL domain (E92 – D180), is folded. With regard to aggregation propensity, three predictive algorithms (AGGRESCAN,51 ZIPPERdB,52 and WALTZ53) identified 7 common cross-beta strand regions within the SpSM30B/C sequence (E28-W34, Y42-A55, N82-N89, N94-E107, G121-E129, G147-F162, H182-H189; Fig 1). Of these, 3 cross-beta strand regions (N94-E107, G121-E129, G147-F162) exist within the putative CTLL domain, which suggests that this conserved domain may play an important role in rSpSM30B/C aggregation (Figs 2,4). Interestingly, only one putative glycan site, N82, resides within a predicted aggregation-prone cross-beta strand region just upstream of the CTLL domain (Fig 1). Conversely, the 5 putative O-glycan sites (i.e., T201, T205, T210, T218, T219) reside within the C-terminal MAQPG sequence (Fig 1). This preference of O-glycan sites for unfolded domains such as the MAQPG sequence has been confirmed in other proteomic studies.62,63

Finally, to visualize the overall molecular configuration of SpSM30B/C, we utilized the DISOclust IntFOLD2 3D modeling prediction program54 to generate a hypothetical global conformation of this sequence (Fig 7). Here, one notes that the SpSM30B/C sequence presents as an extended protein molecule, with a globular CTLL domain (E92-D180) flanked on either side by disordered, extended regions (Q1-L91; R181-Q270 or MAQPG domain). We note that the extended-globular-extended global structure of SpSM30B/C is qualitatively similar to that
predicted for the nacre protein, Pif97,27 and for the S. purpuratus protein, SpSM50.11 The IntFOLD predicted structure reveals that the disordered (N- and C-terminus), O-glycosylated (C-terminus), and cross-beta strand regions (CTLL) of SpSM30B/C are all accessible to the surrounding environment, and thus available to participate in many of the documented functions of rSpSM30B/C-G such as protein hydrogel formation (Figs 2,4), the modulation of PNC and ACC formation (Fig 5), the organization of mineral nanoparticles (Fig 6B,C), and the modification of crystal growth and morphology (Fig 6E,F). We note that these traits are also common to many nacre-associated proteins.26-33
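The positional relationships reported in this section (which cross-beta regions fall inside the CTLL domain, and where the putative glycosylation sites sit) amount to simple interval checks. The sketch below encodes the residue ranges and sites exactly as listed in the text above; the variable and function names are hypothetical, and the code does not rerun any of the predictors.

```python
# Residue ranges (1-based, inclusive) as listed in the text.
DISORDERED = {"N-terminal": (1, 17), "C-terminal/MAQPG": (181, 270)}
CTLL = (92, 180)
CROSS_BETA = [(28, 34), (42, 55), (82, 89), (94, 107),
              (121, 129), (147, 162), (182, 189)]
N_GLYCAN_SITES = [82]
O_GLYCAN_SITES = [201, 205, 210, 218, 219]


def in_region(position, region):
    """True if a residue position lies within an inclusive (start, end) range."""
    start, end = region
    return start <= position <= end


def in_any(position, regions):
    """True if a residue position lies within any of the given ranges."""
    return any(in_region(position, r) for r in regions)


def regions_within(regions, outer):
    """Ranges that lie entirely inside the outer range."""
    return [r for r in regions if outer[0] <= r[0] and r[1] <= outer[1]]


if __name__ == "__main__":
    print("cross-beta regions inside CTLL:", regions_within(CROSS_BETA, CTLL))
    for site in N_GLYCAN_SITES:
        print(f"N{site}: cross-beta={in_any(site, CROSS_BETA)}, "
              f"CTLL={in_region(site, CTLL)}")
    maqpg = DISORDERED["C-terminal/MAQPG"]
    print("O-glycan sites in MAQPG:",
          [s for s in O_GLYCAN_SITES if in_region(s, maqpg)])
```

The same helpers can be reused to test alternative domain boundaries if the predictions are revised.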

Fig 7. IntFOLD predicted global structure of SpSM30B/C. The location of the CTLL (dashed circle) and MAQPG (dashed lines) domains and N, C termini are shown. For the CTLL domain, secondary structures are denoted in color: alpha helical (red), beta sheet (blue), and random coil, loop, or turn regions (gray). The top five models for the CTLL domain region have confidence and P value scores of 4.5E-03 (high) and global model quality scores of 0.5092, with best fit to template crystal structures 1qddA (human lithostathine), 5ao5A3 (Endo180), 1eggB (C-type carbohydrate recognition domain CRD4), and 1t8cA (C-type lectin domain cd23).

DISCUSSION

The fracture-resistant properties and amorphous-to-crystalline transformation process that are inherent within the sea urchin embryonic spicules reflect the participation of proteomes that are expressed and secreted by the spicule cells during development.1-5,12,22-24 Until now, the participation of the SpSM30 isoforms2-5,9,13,14,17,18 in the spicule mineralization process has not been well-defined. With the model rSpSM30B/C system, however, we have now gained important insights into how these protein isoforms might be involved in the formation of the spicule organic matrix and the subsequent nucleation, stabilization, and crystal growth processes that take place during spiculogenesis.

Using recombinant expression we have successfully generated glycosylated rSpSM30B/C, and comparative studies confirmed two very important features. First, the rSpSM30B/C sequence can self-associate to form porous protein phases or hydrogels (Figs 2,4). Second, we find that insect cells express a microheterogeneous pool of rSpSM30B/C-G polypeptides that consists of a minor unglycosylated species (UG) and 3 major glycosylated variants (G1, G2, G3), of which the G2 species is the predominant form and possesses the approximate glycan mass shift reported for SpSM30B (i.e., ~3.6 kDa) (Fig 2). We believe that the noted mass discrepancies between rSpSM30B/C-G (Fig 2) and those of previously published SpSM30 studies15,17,25 can be attributed to PAGE gel artifacts that are inherent for glycoproteins.55-59 As we noted in insect cell expression studies of the nacre protein, AP24,26 the observed microheterogeneity in rSpSM30B/C-G masses could arise from a number of plausible post-translational scenarios: for example, variations or omissions in glycan site addition, glycan chain length, branching, and sulfate or sialyl modifications (Fig 3; Table S2, Supporting
Information). It is not known to what extent the Sf9 insect cell expression system mimics the glycosylation pattern utilized by S. purpuratus. For this reason, additional studies will be required to determine the composition, sequence, and fidelity of rSpSM30B/C glycosylation in vivo.

One of the stated goals of this study was to determine the carbohydrate post-translational modifications of rSpSM30B/C polypeptides within insect cells. What we found was quite intriguing. First, the model rSpSM30B/C protein is expressed as a hybrid glycoprotein, as evidenced by the presence of both N- and O-linked branching glycans (Fig 3; Table S2, Supporting Information). This confirms bioinformatics predictions of both Ser/Thr and Asn putative glycosylation sites being present within the SpSM30B/C sequence (Fig 1). However, although spicule-extracted SpSM30B/C and SpSM30B possess N-linked glycans, no definitive data exist regarding the presence of O-linked glycans on these proteins.15,17,25 Thus, the presence of O-glycans on rSpSM30B/C may reflect the nuances of insect cell post-translational processing, and further research will be necessary to determine if spicule-expressed SpSM30A-F proteins are true hybrid glycoproteins. Second, insect cell glycosylation introduces additional negative charges to the protein via the addition of mono- and bisulfated and mono- and bisialylated glycans (Fig 3; Table S2, Fig S4, Supporting Information). These anionic moieties could serve a number of important functions:26 a) Act as sites for electrostatic interactions with cations, ion clusters, or transient mineral phases during nucleation, thus prolonging nucleation time and transiently stabilizing an ACC polymorph or modulating the kinetics of phase transition (Fig 5; Fig S5, Table S3, Supporting Information); b) Act as interaction sites with cationic sidechains (Arg, Lys, His, Fig 1) on other rSpSM30B/C protein molecules, thereby promoting aggregation (Figs 2,4); c)
31

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Altering the structural stability and/or solubility of the rSpSM30B/C protein molecules thereby increasing the inherent aggregation potential of the final secreted glycoprotein over the unglycosylated form (Figs 2,4). Evidence of phenomena (a) and (c) were recently documented in studies of insect cell-expressed nacre protein, AP24G,26 and in molecular simulations of the biomineral-associated glycoprotein, otolin-1.64 We should note that if the rSpSM30B/C-G ensemble is microheterogeneous with respect to glycosylation, then there must exist glycoforms which differ from one another with regard to anionic charge, solvation, and molecular size/topology. This, in turn, explains the m/Z microheterogeneity that we observed in our MALDI-TOF-MS analyses (Fig 2).26 How this microheterogeneity influences spatio-temporal regulation of biomineralization in situ is not known at present, and additional studies of SpSM30 proteins will be required to assess this. The other stated goal of this study was to determine the effects of glycosylated rSpSM30B/C on the early and later events of in vitro calcium carbonate mineralization and provide insights into the potential role(s) that SpSM30A-F isoforms might play within spiculogenesis.9,10,13,15,17,18 With regard to the early events of non-classical nucleation, rSpSM30B/C-G prolongs the time interval for PNC formation and transiently stabilizes an ACC polymorph or modulates the kinetics of phase transition (Fig 5; Fig S5, Table S3, Supporting Information).

ACC stabilization and ACC-to-calcite phase transition are key steps in spiculogenesis,6,22-24 and thus, like SpSM50,11,21,23,24 the SpSM30 isoforms are likely to be involved in ACC formation and stabilization processes at some point during spicule mineralization. With regard to the later stages of mineralization, our data provide evidence that SpSM30A-F may participate in two ways. First, porous protein phases or hydrogels (Fig 2,4) facilitate the organization and growth of elongated single crystal calcite nanoparticles (Fig 6B). Second, rSpSM30B/C-G protein phases can deposit onto existing crystals and induce morphological and textural changes at the nanoscale that affect the mesoscale properties of the mineral phase, i.e., promote faceted growth over time (Fig 6E,F). Thus, the SpSM30B/C sequence is quite capable of forming hydrogels that modulate PNC, ACC, and calcite crystal formation during different stages of the nucleation process. These findings are highly relevant for spiculogenesis given that (1) mineral nanoparticles form and appear in the spicule matrix during the mineralization process,6,22-24 and (2) SpSM30 isoforms, presumably in the form of phases or hydrogels, are known to accumulate or deposit in calcite in situ.18

Finally, our model rSpSM30B/C dataset exhibits interesting parallels to the molecular and mineralization features noted for mollusk shell nacre-specific proteins such as AP7,28,29,33 n16.3,30,31 Pif97,27 PFMG1,32,33 and AP24.26

Specifically, all of these proteins possess some combination of a partially conserved folded domain, ID domains, and amyloid-like cross-beta strand aggregation-prone sequences (Fig 1). In the case of Pif9727 and SpSM30B/C, these similarities extend even further, in that both proteins exhibit a global structure consisting of extended-globular-extended regions (Fig 7). In both the nacre- and the spicule-associated protein sequences, the combination of aggregation-prone and conformationally unstable ID regions promotes protein-protein interactions which induce aggregation and lead to the formation of amorphous protein phases or hydrogels.26-33 In turn, these phases or hydrogels modulate PNC and/or ACC formation, organize mineral nanoparticles, and induce nanopatterning and/or alter the morphology or growth direction of calcite in vitro.26-33 These striking qualitative similarities between rSpSM30B/C and nacre-specific proteins suggest that protein-mediated mineralization in calcium carbonate invertebrate skeletal elements may follow a common molecular blueprint (i.e., formation of protein phases or hydrogels) that is differentially tuned via the primary sequence and post-translational modifications to a specific functionality within each skeletal system. For example, in the case of the primary sequence, we believe that the combination of CTLL and MAQPG domains (Fig 1)2-5,11 allows intrinsically disordered spicule matrix proteins such as rSpSM30B/C to promote ACC stabilization (Fig 5), calcite nanoparticle organization and elongation (Fig 6), and nanotextured faceted crystal growth (Fig 6) within micromineralization assays. Conversely, the C-RING, von Willebrand Type A, and EF-hand domains of the intrinsically disordered nacre proteins AP7,28,29,33 Pif97,27 and PFMG1,32 respectively, achieve different results (e.g., nanoparticle coatings, intracrystalline modifications) within the same in vitro mineralization systems.27-33 Likewise, in the case of post-translational glycosylation, the addition of glycans either downregulates (nacre AP24)26 or upregulates (rSpSM30B/C-G, Fig 2,4) protein aggregation propensities, which impact protein functionalities.2-5,9,13,14,17,18,26 To further develop this intriguing story, it will be necessary to pursue comparative structure-function studies involving proteomes that reside within different calcium carbonate mineralized skeletal elements (e.g., otoliths,64,65 corals,66 sea urchins,2-5 mollusks26-33), such that the underlying similarities and tissue-specific differences inherent within specific proteomes are better understood.

ACKNOWLEDGMENTS

We thank Dr. Liz Kast at GlycoSolutions for her help in the oligosaccharide mapping of rSpSM30B/C-G, Dr. Eric Chang and Ms. Gabrielle Williamson for their help with the SEM imaging, Yong Seob Jung for his help with the light microscopy, and Prof. Derk Joester, Northwestern University, for his help with primary sequence analyses. AFM imaging was conducted at the Molecular Cytology Core Facility, Memorial Sloan-Kettering Cancer Center, New York. This report represents contribution number 82 from the Laboratory for Chemical Physics, New York University.

Funding sources: Portions of this research (recombinant protein synthesis, AFM, SEM, TEM, glycomapping, bioinformatics) were supported by the Life Sciences Division, U.S. Army Research Office, under awards W911NF-12-1-0255 and W911NF-16-1-0262 (JSE). A.R. acknowledges a fellowship from Konstanz Research School Chemical Biology. H.C. acknowledges funding from the University of Konstanz.

Supporting Information Available. Primary sequence alignments of S. purpuratus spicule matrix proteins SpSM30, SpSM30B, SpSM30C (Fig S1); Recombinant expression and analyses of bacterial rSpSM30B/C-N and insect cell rSpSM30B/C-G variant mixture (Fig S2, S3); Representative N-terminal sequencing first cycle chromatograms of tag-free, purified rSpSM30B/C-G ensemble mixture (Table S1); MALDI-TOF-MS adducts, monosaccharide matching, and corresponding oligosaccharide structures for rSpSM30B/C-G protein ensemble (Table S2); MALDI-TOF-MS spectra (negative ion mode) of tag-free purified rSpSM30B/C-G N- and O-linked oligosaccharide profile (Fig S4); Potentiometric Ca(II) titrations of rSpSM30B/C-G (Fig S5, Table S3); Electron diffraction pattern obtained for mineral nanoparticle embedded within rSpSM30B/C-G protein matrix (Fig S6). This material is available free of charge via the Internet at http://pubs.acs.org.


REFERENCES

1 Studart, A.R. (2012) Towards high-performance bioinspired composites. Adv. Materials 24, 5024-5044.
2 Cameron, R.A., Samanta, M., Yuan, A., He, D., Davidson, E. (2009) SpBase: the sea urchin genome database and web site. Nucleic Acids Research. http://www.spbase.org/
3 Sea urchin genome sequencing consortium et al. (2006) The genome of the sea urchin Strongylocentrotus purpuratus. Science 314, 941-952.
4 Mann, K., Poustka, A.J., Mann, M. (2008) The sea urchin (Strongylocentrotus purpuratus) test and spine proteomes. Proteome Science 6, 1-10.
5 Mann, K., Wilt, F.H., Poustka, A.J. (2010) Proteomic analysis of sea urchin (Strongylocentrotus purpuratus) spicule matrix. Proteome Science 8, 1-12.
6 Seto, J., Ma, Y., Davis, S.A., Meldrum, F., Schilde, U., Gourrier, A., Jaeger, C., Cölfen, H. (2012) Structure-property relationships of a biological mesocrystal in the adult sea urchin spine. Proc. Natl. Acad. Sci. USA 109, 3699-3704.
7 Berman, A., Addadi, L., Kvick, A., Leiserowitz, L., Nelson, M., Weiner, S. (1990) Intercalation of sea urchin proteins in calcite: Study of a crystalline composite material. Science 250, 664-667.


8 Aizenberg, J., Hanson, J., Koetzle, T.F., Weiner, S., Addadi, L. (1997) Control of macromolecular distribution within synthetic and biogenic single calcite crystals. J. Am. Chem. Soc. 119, 881-886.
9 Sharmankina, V.V., Kiselev, K.V. (2016) Expression of SM30(A-F) genes encoding spicule matrix proteins in intact and damaged sea urchin Strongylocentrotus intermedius. Russ. J. Genetics 52, 298-303.
10 Wilt, F., Killian, C.E., Croker, L., Hamilton, P. (2013) SM30 protein function during sea urchin larval spicule formation. J. Struct. Biol. 183, 199-204.
11 Rao, A., Seto, J., Berg, J.K., Kreft, S.G., Scheffner, M., Cölfen, H. (2013) Roles of larval sea urchin spicule SM50 domains in organic matrix self-assembly and calcium carbonate mineralization. J. Struct. Biol. 183, 205-215.
12 Wu, C.-H., Park, A., Joester, D. (2011) Bioengineering single crystal growth. J. Am. Chem. Soc. 133, 1658-1661.
13 Seto, J., Zhang, Y., Hamilton, P., Wilt, F.H. (2004) The localization of occluded matrix proteins in calcareous spicules of sea urchin larvae. J. Struct. Biol. 148, 23-30.
14 Livingston, B.T., Killian, C.E., Wilt, F., Cameron, A., Landrum, M.J., Ermolaeva, O., Sapojnikov, V., Maglott, D.R., Buchanan, A.M., Ettensohn, C.A. (2006) Genome-wide analysis of biomineralization-related proteins in the sea urchin Strongylocentrotus purpuratus. Devel. Biol. 300, 335-348.
15 Killian, C.E., Wilt, F.H. (1996) Characterization of the proteins comprising the integral matrix of Strongylocentrotus purpuratus embryonic spicules. J. Biol. Chem. 271, 9150-9159.


16 Killian, C.E., Wilt, F.H. (2008) Biomineralization of the echinoderm endoskeleton. Chem. Rev. 108, 4463-4474.
17 Killian, C.E., Croker, L., Wilt, F.H. (2010) SpSM30 Gene family expression patterns in embryonic and adult biomineralized tissues of the sea urchin Strongylocentrotus purpuratus. Gene Exp. Patt. 10, 135-139.
18 Kitajima, T., Urakami, H. (2000) Differential distribution of spicule matrix proteins in the sea urchin embryo skeleton. Dev. Growth Differ. 42, 295-306.
19 Urry, L.A., Hamilton, P.C., Killian, C.E., Wilt, F.H. (2000) Expression of spicule matrix proteins in the sea urchin embryo during normal and experimentally altered spiculogenesis. Dev. Biol. 225, 201-213.
20 Wilt, F.H., Ettensohn, C.E. (2007) Morphogenesis and biomineralization of the sea urchin larval endoskeleton. In: Baeuerlein, E. (Ed.), Handbook of Biomineralization. Wiley-VCH, Weinheim, pp. 183-210.
21 Wilt, F., Croker, L., Killian, C.E., McDonald, K. (2008) The role of LSM34/SpSM50 in endoskeletal spicule formation in sea urchin embryos. Invert. Biol. 127, 452-459.
22 Tester, C.C., Wu, C.-H., Krejci, M.R., Mueller, L., Park, A., Lai, B., Chen, S., Sun, C., Joester, D. (2013) Time-resolved evolution of short- and long-range order during the transformation of ACC to calcite in the sea urchin embryo. Adv. Fun. Mat. 23, 4185-4194.
23 Politi, Y., Metzler, R.A., Abrecht, M., Gilbert, B., Wilt, F.H., Sagi, I., Addadi, L., Weiner, S., Gilbert, P. (2008) Transformation mechanism of amorphous calcium carbonate into calcite in the sea urchin larval spicule. Proc. Natl. Acad. Sci. USA 105, 17362-17366.


24 Gong, Y.U.T., Killian, C.E., Olson, I.C., Appathurai, N.P., Amasino, A.L., Martin, M.C., Holt, L.J., Wilt, F.H., Gilbert, P. (2012) Phase transitions in biogenic amorphous calcium carbonate. Proc. Natl. Acad. Sci. USA 109, 6088-6093.
25 Brown, M.F., Partin, J.S., Killian, C.E., Lennarz, W.J. (1995) Spiculogenesis in the sea urchin embryo: Studies on the SM30 spicule matrix protein. Develop. Growth Differ. 37, 69-78.
26 Chang, E.P., Perovic, I., Rao, A., Cölfen, H., Evans, J.S. (2016) Insect cell glycosylation and its impact on the functionality of a recombinant intracrystalline nacre protein, AP24. Biochemistry 55, 1024-1035.
27 Chang, E.P., Evans, J.S. (2015) Pif97, a von Willebrand and Peritrophin biomineralization protein, organizes mineral nanoparticles and creates intracrystalline nanochambers. Biochemistry 54, 5348-5355.
28 Perovic, I., Chang, E.P., Verch, A., Rao, A., Cölfen, H., Kroeger, R., Evans, J.S. (2014) An oligomeric C-RING nacre protein influences pre-nucleation events and organizes mineral nanoparticles. Biochemistry 53, 7259-7268.
29 Chang, E.P., Russ, J.A., Verch, A., Kroeger, R., Estroff, L.A., Evans, J.S. (2014) Engineering of crystal surfaces and subsurfaces by an intracrystalline biomineralization protein. Biochemistry 53, 4317-4319.
30 Chang, E.P., Russ, J.A., Verch, A., Kroeger, R., Estroff, L.A., Evans, J.S. (2014) Engineering of crystal surfaces and subsurfaces by framework biomineralization protein phases. Cryst. Eng. Commun. 16, 7406-7409.


31 Perovic, I., Chang, E.P., Lui, M., Rao, A., Cölfen, H., Evans, J.S. (2014) A framework nacre protein, n16.3, self-assembles to form protein oligomers that participate in the post-nucleation spatial organization of mineral deposits. Biochemistry 53, 2739-2748.
32 Perovic, I., Mandal, T., Evans, J.S. (2013) A pseudo EF-hand pearl protein self-assembles to form protein complexes that amplify mineralization. Biochemistry 52, 5696-5703.
33 Chang, E.P., Roncal-Herrero, T., Morgan, T., Dunn, K.E., Rao, A., Kunitake, J.A.M.R., Lui, S., Bilton, M., Estroff, L.A., Kröger, R., Johnson, S., Cölfen, H., Evans, J.S. (2016) Synergistic biomineralization phenomena created by a nacre protein model system. Biochemistry 55, 2401-2410.
34 Kuster, B., Wheeler, S.F., Hunter, A.P., Dwek, R.A., Harvey, D.J. (1997) Sequencing of N-linked oligosaccharides directly from protein gels: In-gel deglycosylation followed by matrix-assisted laser desorption/ionization mass spectrometry and normal-phase high-performance liquid chromatography. Anal. Biochem. 250, 82-101.
35 Niedermeyer, T.H.J., Strohalm, P. (2012) mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 7, e44913, DOI:10.1371/journal.pone.0044913.
36 Chauhan, J.S., Rao, A., Raghava, G.P.S. (2013) In silico platform for prediction of N-, O-, and C-glycosites in eukaryotic protein sequences. PLOS One 6: doi:10.1371/journal.pone.0067008


37 Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., Gupta, R., Bennett, E.P., Mandel, U., Brunak, S., Wandall, H.H., Levery, S.B., Clausen, H. (2013) Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 32, 1478-1488.
38 Hamby, S.E., Hirst, J.D. (2008) Prediction of glycosylation sites using random forests. BMC Bioinformatics 9, 500-513.
39 Huang, Y., Mechref, Y., Novotny, M. (2001) Microscale nonreductive release of O-linked glycans for subsequent analysis through MALDI mass spectrometry and capillary electrophoresis. Analytical Chemistry 73, 6063-6069.
40 Bigge, J.C., Patel, T.P., Bruce, J.A., Goulding, P.N., Charles, S.M., Parekh, R.B. (1995) Nonselective and efficient fluorescent labeling of glycans using 2-amino benzamide and anthranilic acid. Analytical Biochemistry 230, 229-238.
41 Pabst, M., Kolarich, D., Pöltl, G., Dalik, T., Lubec, G., Hofinger, A., Altmann, F. (2009) Comparison of fluorescent labels for oligosaccharides and introduction of a new postlabeling purification method. Analytical Biochemistry 384, 263-273.
42 Geisler, C., Jarvis, D. (2009) Insect cell glycosylation patterns in the context of biopharmaceuticals, in Post-translational Modification of Protein Biopharmaceuticals. Edited by Gary Walsh. Wiley-VCH Verlag GmbH and Co., KGaA, Weinheim, Germany, pp 1-27.
43 Gebauer, D., Kellermeier, M., Gale, J.D., Bergstrom, L., Cölfen, H. (2014) Pre-nucleation clusters as solute precursors in crystallization. Chem. Soc. Rev. 43, 2348-2371.


44 Demichelis, R., Raiteri, P., Gale, J.D., Quigley, D., Gebauer, D. (2011) Stable prenucleation mineral clusters are liquid-like ionic polymers. Nature Commun. 2, 1-8.
45 Gebauer, D., Cölfen, H., Verch, A., Antonietti, M. (2008) The multiple roles of additives in CaCO3 crystallization: A quantitative case study. Adv. Mat. 21, 435-439.
46 Verch, A., Gebauer, D., Antonietti, M., Cölfen, H. (2011) How to control the scaling of CaCO3: a “fingerprinting technique” to classify additives. Phys. Chem. Chem. Phys. 13, 16811-16820.
47 Kellermeier, M., Cölfen, H., Gebauer, D. (2013) Investigating the early stages of mineral precipitation by potentiometric titration and analytical ultracentrifugation. In Research Methods in Biomineralization Science. De Yoreo, J.D., Editor. Methods in Enzymology 532, 45-69.
48 Kozlowski, L.P., Bujnicki, J.M. (2012) MetaDisorder: A meta-server for the prediction of intrinsic disorder in proteins. BMC Bioinformatics 13, 111-122.
49 Dosztányi, Z., Csizmók, V., Tompa, P., Simon, I. (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21, 3433-3434.
50 Mizianty, M.J., Stach, W., Chen, K., Kedarisetti, K.D., Disfani, F.M., Kurgan, L. (2010) Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources. Bioinformatics 26, 489-496.
51 Conchillo-Sole, O., de Groot, N.S., Aviles, F.X., Vendrell, J., Daura, X., Ventura, S. (2007) AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics 8, 65-82.


52 Goldschmidt, L., Teng, P.K., Riek, R., Eisenberg, D. (2010) The amylome, all proteins capable of forming amyloid-like fibrils. Proc. Natl. Acad. Sci. USA 107, 3487-3492.
53 Oliveberg, M. (2010) WALTZ, an exciting new move in amyloid prediction. Nature Methods 3, 187-188.
54 Roche, D.B., Buenavista, M.T., Tetchner, S.J., McGuffin, L.J. (2011) The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction. Nucleic Acids Res. 39, W171-176.
55 Kost, T.A., Condreay, J.P., Jarvis, D.J. (2005) Baculovirus as versatile vectors for protein expression in insect and mammalian cells. Nature Biotechnology 23, 567-575.
56 Robb, R.J., Smith, K.A. (1981) Heterogeneity of human T-cell growth factors due to variable glycosylation. Molecular Immunology 18, 1087-1094.
57 Higa, H.H., Paulson, J.C. (1985) Sialylation of glycoprotein oligosaccharides with N-Acetyl-, N-Glycolyl-, and N-O-Diacetylneuraminic Acids. J. Biol. Chem. 260, 8838-8849.
58 Schwarzkopf, M., Knobeloch, K.P., Rohde, E., Hinderlich, S., Wiechens, N., Lucka, L., Horak, I., Reutter, W., Horsthorte, R. (2002) Sialylation is essential for early development in mice. Proc. Natl. Acad. Sci. USA 99, 5267-5270.
59 Patwa, A., Thiery, A., Lombard, F., Lilley, M.K.S., Boisset, C., Bramard, J.S., Bottero, J.Y., Bathelemy, P. (2015) Accumulation of nanoparticles in jellyfish mucus: A bio-inspired route to decontamination of nanowaste. Nature Sci. Reports 5, 11387-11395.
60 Song, F. (2007) A study of non-covalent protein complexes by matrix-assisted laser desorption/ionization. Am. Soc. Mass Spect. 18, 1286-1290.


61 Yanes, O., Aviles, F.X., Roepstorff, Jorgensen, T.J.D. (2007) Exploring the “intensity fading” phenomenon in the study of noncovalent interactions by MALDI-TOF mass spectrometry. Am. Soc. Mass Spect. 18, 359-367.
62 Kurotani, A., Sakurai, T. (2015) In silico analysis of correlations between protein disorder and post-translational modifications in algae. Int. J. Mol. Sci. 16, 19812-19835.
63 Kurotani, A., Tokmakov, A.A., Yutaka, K., Fukami, Y., Shinozaki, K., Sakurai, T. (2014) Correlations between predicted protein disorder and post-translational modifications in plants. Bioinformatics 30, 1095-1103.
64 Duchstein, P., Kneip, R., Zahn, D. (2013) On the function of saccharides during the nucleation of calcium carbonate-protein biocomposites. Cryst. Growth. Des. 13, 4885-4889.
65 Murayama, E., Herbomel, P., Kawakami, A., Takeda, H., Nagasawa, H. (2005) Otolith matrix proteins OMP-1 and Otolin-1 are necessary for normal otolith growth and their correct anchoring onto the sensory maculae. Mech. Dev. 122, 791-803.
66 Drake, J.L., Mass, T., Haramaty, L., Zelzion, E., Bhattacharya, D., Falkowski, P.G. (2013) Proteomic analysis of skeletal organic matrix from the stony coral Stylophora pistillata. Proc. Natl. Acad. Sci. USA 110, 3788-3793.


Table of Contents Figure

[Figure not reproduced. Recoverable labels: MAQPG, CTLL, N, C; scale bars: 200 nm, 10 μm, 2 μm.]