Non-targeted pattern recognition in the search for ... - ACS Publications


Article

Non-targeted pattern recognition in the search for pyrolysis gas chromatography/mass spectrometry resin markers in historic lacquered objects. Louise Decq, Emmanuel Abatih, Henk van Keulen, Viviane Leyman, Vincent Cattersel, Delphine Steyaert, Emile Van Binnebeke, Wim Fremout, Steven Saverwyns, and Frederic Lynen Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b00240 • Publication Date (Web): 09 May 2019 Downloaded from http://pubs.acs.org on May 10, 2019




Analytical Chemistry

Non-targeted pattern recognition in the search for pyrolysis gas chromatography/mass spectrometry resin markers in historic lacquered objects

Louise Decq,*+# Emmanuel Abatih,♯ Henk Van Keulen,∞ Viviane Leyman,§ Vincent Cattersel,£ Delphine Steyaert,† Emile Van Binnebeke,† Wim Fremout,+ Steven Saverwyns*+‡ and Frédéric Lynen*#‡

+ Department Laboratories, Royal Institute for Cultural Heritage (KIK-IRPA), Jubelpark 1, 1000 Brussels, Belgium
# Separation Science Group, Department of Organic and Macromolecular Chemistry, Ghent University, Krijgslaan 281, 9000 Ghent, Belgium
♯ Fostering Innovative Research based on Evidence (FIRE), Ghent University, Krijgslaan 281, 9000 Ghent, Belgium
∞ Cultural Heritage Agency of the Netherlands, Hobbemastraat 22, 1071 ZC Amsterdam, The Netherlands
§ Meise Botanic Garden, Nieuwelaan 38, 1860 Meise, Belgium
£ Conservation Studies – Heritage & Sustainability, University of Antwerp, Blindestraat 9, 2000 Antwerp, Belgium
† Royal Museums of Art and History (RMAH), Jubelpark 10, 1000 Brussels, Belgium

ABSTRACT: A differential expression analysis technology developed for linear modelling of gene expression data was used in combination with thermally assisted hydrolysis and methylation gas chromatography/mass spectrometry (THM-GC/MS) to support the analysis of lacquers and varnishes on historical objects. Exudates from tropical trees such as Manila copal, sandarac, South American copal and Congo copal, which were frequently used in finishing layers on decorative objects up to the early 20th century, were compared through this approach. Highly discriminating features indicate biomarkers that can help to identify copals in resinous lacquers. The approach allows new, more systematic ways for finding biomarkers in the analysis of lacquered objects of art and varnishes.

When studying the organic composition of historical art objects, researchers are faced with materials of high chemical complexity.1–3 The multiple layers of lacquer can consist of a mixture of oils, resinous materials, colorants, pigments, proteins, carbohydrates and the products of their interaction and degradation. Natural resins, usually an important ingredient of historical lacquer, consist of mixtures of neutral and acidic mono-, sesqui-, di- or triterpenoids.4 Their varied nature, in combination with botanical variations and similarities, can impede precise identification. Furthermore, the possibility of sampling an individual varnish layer is highly limited. If sampling is possible, the sample is small, often only a few µg. Thermally assisted hydrolysis and methylation (THM) using tetramethylammonium hydroxide (TMAH) coupled to gas chromatography/mass spectrometry (GC/MS) has become one of the main analytical techniques for the investigation of organic compounds in historical objects, as it returns the most varied information from such a small sample.1,5–8 With this single analysis, the presence of fatty acids can be detected and interpreted,9 and many resinous materials can be identified.6,10 Alternatively, trimethylsilylation can be chosen instead of methylation as derivatization during pyrolysis.11,12 The method presented here is equally suitable for both. After a historical lacquer sample has been analyzed with THM-GC/MS, the resulting pyrogram is scanned manually or (semi-)automatically against a library of known markers, to find markers that reveal its composition.13 The presence of a

resin can be attested with markers, even if the markers themselves are not (completely) chemically identified.1,13,14 Aging processes influence the intensity of certain markers.15–17 Different ingredients may interact and markers can overlap. Many resin markers are not uniquely correlated to one resin.4,6,18 Additionally, even small variations in the analysis method may influence the intensity of a peak.19 These issues sometimes prevent conclusive identification of a lacquer’s ingredients. Finding more resin markers can improve this situation. Today, most of the THM-GC/MS biomarkers used for the identification of a resin, usually its main pyrolysates, are detected visually in the chromatogram of a reference sample.6,14 However, the most abundant markers may not be the best; they can be less specific for a resin or less resistant to aging. Small quantities of a certain molecule, not yet discovered visually in this traditional way, may contribute to a more complete interpretation of the pyrogram. In this study, the applicability of a non-targeted metabolomics approach is evaluated as a way to automatically list the most significant markers for a resin, in order to enrich existing marker libraries.20 In the last decades, the search for characteristic molecules and biochemical pathways that differentiate groups of subjects has evolved from manual to in silico searching. Chromatographic big data is processed into a feature matrix.21 A feature is a combination of a certain mass peak with its corresponding retention time. Subsequently, a statistical method best suited to the situation is applied to find the most


significant discriminating features. This approach allows thousands of potential biomarkers to be interrogated simultaneously in an automated way. The potential of pattern recognition approaches has been confirmed through the discovery of many important biomarkers in the fields of medicine and biotechnology,22–24 but such approaches have been applied only occasionally to THM-GC/MS data to find biomarkers.25–27


METHODS

Samples. Today, resins are sold for a variety of applications, including cosmetics and art. In this context, they are often adulterated to optimize their properties and price, and their labeling can be misleading. For this reason, well-described resin samples of non-commercial origin were preferred: 4 sandaracs, 4 Manila copals, 4 Congo copals and 1 South American copal (table 1). All 13 resin samples were stored as lumps in closed jars in the dark, at least in recent history. One sample once belonged to an 18th century collection and most belonged to collections of the 19th or early 20th century; for two, the age could not be determined. Although they unavoidably underwent some natural aging during storage, they were not subjected to the harsh climatological conditions that thin lacquer layers often suffer from when exposed to light, oxygen and humidity. Still, due to this mild aging, volatile and unstable components are not expected to be present as consistently in every chromatogram, and they will not score as high as stable, more consistent features. Similarly, if an impurity is present in one sample, the corresponding markers will probably score low. In the end, these older samples may better reflect the ingredients as purchased by the creators of the historical lacquer.

Table 1. Overview of samples. 1

Figure 1. Representative chromatograms from each resin type. Top five ranked features listed in table 2 are indicated with numbers for South American copal (SAM-blue), Congo copal (CON-yellow, only three markers were statistically significant), Manila copal (MAN-green), sandarac (SAN-red); with * for common markers for Congo copal and South American copal (CON-SAM), and # for common markers for sandarac and Manila copal (SAN-MAN).

identifier

Resin

Source institute (inv.)

Congo1

Congo copal (Guibourtia demeusei (Harms) J. Leonard)

1 (11)*

Congo2 Congo3 Manila1 Manila2 Manila3

Manila copal (Agathis dammara (Lamb.) Rich. & A. Rich.)

Sandarac2

1 (53)§ 2 (-)* 3 (BR-CBC-02248)* 3 (BR-CBC-01525)*

Manila4 Sandarac1

1 (122/52)* 1 (148/78)*

Congo4

Certainly, multivariate statistical methods have been used in the research of art objects,28,29 including a prominent position for proteomics in binder analysis.30 Multivariate analysis techniques such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) have been applied to visualize patterns and to interpret groups of known and unknown resinous and historical samples,28,29,31 although not, to the best of our knowledge, on data obtained with pyrolysis GC/MS. Often, input data is reduced to specific features of interest prior to statistical analysis. In contrast to these targeted approaches, this paper aims to explore the potential of linear models on the full chromatographic data (untargeted), in order to automatically list discriminating markers. In this pilot study, four resins are investigated: Manila copal, Congo copal, sandarac, and South American copal. Copals are a large group of resins known to be difficult to identify.4,10,17 Sandarac is traditionally not described as a copal but, from a chemical point of view, fits perfectly into this group. The four resins were selected because they largely cover the different members of the copal family, with varying degrees of similarity. Sandarac and Manila copal can be considered regular copals, with a polymeric backbone constructed mainly of communic acid and communol.17,32 They have many but not all pyrolysates in common15,33 (figure 1). This means that the pure resins can already be distinguished by the presence of some eluates. However, when they are mixed in a lacquer, identification may become more challenging and additional markers become useful. Congo copal and South American copal both belong to the so-called enantio copals, with a polymer backbone made of ozic acid and ozol, as do other African copals and amber classes Ic and Id.34–37 Their producing plants are botanically closely related.4

1 (94/24)*

Sandarac (Tetraclinis articulata (Vahl) Mast.)

1 (24)* 3 (BR-CBC-02212)*

Sandarac3

4 (-)§

Sandarac4

5 (A26)#

S.- American

South American copal (Hymenaea courbaril L.)

6 (PR-OP-05/14)*

In situ THM-GC/MS Analysis. The method used is based on earlier experience in the analysis of varnishes and lacquers.19 Several grains or a full lump of each sample were ground, and a small amount of 200–400 µg was transferred to a glass vial. TMAH was added (80–160 µL of 2.5 wt% in methanol, prepared from 25 wt% TMAH in methanol, Sigma-Aldrich, and absolute methanol for HPLC analysis, Acros Organics, 99.99%), relative

1 Source institute: 1 - Royal Museum for Central Africa, Tervuren, Belgium; 2- Doerner Institute, Munich, Germany; 3- Meise Botanic Garden, Belgium; 4- Rijksdienst Cultureel Erfgoed, Amsterdam, The Netherlands; 5- Hochschule für Bildende Künste, Dresden, Germany (originally from Vigani cabinet, Cambridge); 6- Royal Institute for Cultural Heritage, Brussels, Belgium. # 18th century collection, * 19th or early 20th century collections; § undetermined age.


obtained by dividing the signal intensity by the integrated signal of the corresponding chromatogram. The integrated signal of all peaks in the pyrograms was calculated with AMDIS, as the area under the component after deconvolution. The internal standard, added at a late stage to monitor full methylation of the sample as well as column and instrument performance, was not used for normalization because it is not entirely related to the amount of sample injected.
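The normalization described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize_columns`, the list-of-lists layout and all numbers are hypothetical, and the integrated signal per chromatogram is taken as a given input (in the study it was obtained from AMDIS).

```python
def normalize_columns(feature_matrix, integrated_signal):
    """Divide each intensity by the integrated signal of its chromatogram.

    feature_matrix: list of rows (features); each row holds one intensity
                    per chromatogram (column).
    integrated_signal: one total per chromatogram, supplied by the caller.
    """
    if any(len(row) != len(integrated_signal) for row in feature_matrix):
        raise ValueError("every row needs one value per chromatogram")
    if any(total <= 0 for total in integrated_signal):
        raise ValueError("integrated signals must be positive")
    return [
        [value / total for value, total in zip(row, integrated_signal)]
        for row in feature_matrix
    ]


# Hypothetical numbers: 3 features measured in 2 chromatograms.
raw = [[200.0, 400.0], [50.0, 100.0], [0.0, 30.0]]
totals = [1000.0, 2000.0]
normalized = normalize_columns(raw, totals)
```

In this made-up example the second chromatogram carries twice the total signal of the first, so the first two features end up with equal normalized values in both columns.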

to the weight of the sample. This solution also contained 100 ng/µL heptadecanoic acid (Sigma-Aldrich, >98%) as internal standard. The content of the vial was mixed well to homogenize it, and 2 µL was transferred to the stainless-steel pyrolysis cup (Eco-cup LF, Frontier Lab) with auto-Rx glass fiber disk for pyrolysis and subsequent chromatography (Frontier Lab MultiShot Pyrolyzer 3030D, in a helium atmosphere at 480 °C; Thermo TraceGC gas chromatograph, SLB-5 ms capillary column (Supelco, 20 m × 0.18 mm i.d. × 0.18 µm film thickness), Thermo PolarisQ ion trap mass spectrometer). Initially, the oven temperature was maintained at 35 °C for 1 min after pyrolysis. Next, a 10 °C/min gradient was applied until 240 °C. Finally, the column was heated to a temperature of 315 °C at a rate of 6 °C/min; this temperature was maintained for 5 min. The carrier gas was helium at a constant flow of 0.9 mL/min. The MS transfer line temperature was kept at 290 °C. Ionization was carried out in the ion volume of the ion trap mass spectrometer under the standard EI positive mode at 70 eV. The scan range was 35–650 amu, with a cycle time of 0.59 s. Efforts were made to minimize the time span between preparation of the mixture with TMAH and its analysis, as a long contact period of the sample with the alkaline reagent can be considered disadvantageous for certain molecules. All samples were analyzed three times, in an alternating order. Experience with the equipment has shown that, as long as the system is not thoroughly overloaded, natural resin peaks generally do not show up in blanks in amounts comparable with the signals reported for this experiment.
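As a quick check of the oven program stated above, the run time follows from the hold times and ramp rates. The short Python sketch below works through that arithmetic; the helper name `ramp_minutes` is hypothetical, and the scan count assumes that spectra are acquired over the whole oven program without a solvent delay, which the text does not state.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration of a linear temperature ramp in minutes."""
    return (end_c - start_c) / rate_c_per_min


initial_hold_min = 1.0                    # 35 °C held for 1 min
ramp1_min = ramp_minutes(35, 240, 10)     # 10 °C/min up to 240 °C
ramp2_min = ramp_minutes(240, 315, 6)     # 6 °C/min up to 315 °C
final_hold_min = 5.0                      # 315 °C held for 5 min

total_run_min = initial_hold_min + ramp1_min + ramp2_min + final_hold_min

# Assumption: acquisition spans the full program (no solvent delay).
cycle_time_s = 0.59
approx_scans = int(total_run_min * 60 / cycle_time_s)

print(f"ramp 1: {ramp1_min} min, ramp 2: {ramp2_min} min")
print(f"total run time: {total_run_min} min, about {approx_scans} scans")
```

Under these assumptions the two ramps take 20.5 and 12.5 min, giving a 39 min program and roughly 3966 scans per chromatogram.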

Statistical analysis. In the search for a method best suited to the detection of small but consistent differences in a rather small sample set, differential expression analysis with limma (Linear Models for Microarray and RNA-Seq Data, version 3.24.15) came into view. This software package is a core component of Bioconductor, an R-based open-source software development project in statistical genomics20,39–41 (using R Studio software version 0.99.893; version 1.1.463 for generating the PCA plot), and was developed as a complete tool for the interpretation of microarray and RNA sequencing data. Due to their complex experimental designs, such studies often involve only a small number of biological replicates. In response to this statistical challenge, limma offers specialized statistical techniques in order to get the most out of each dataset.20 The core capability of limma is fitting row-by-row linear models to assess differential (gene) expression. It operates on a matrix of expression values, where each row represents a gene and each column corresponds to an RNA sample. Because the data are analyzed as a whole, the highly parallel nature of the data structure can be used to make statistical conclusions more reliable when the number of samples is small.20 The structure of the matrix of expression values is surprisingly similar to that of the feature matrix obtained after feature extraction from the data discussed in this article. In the latter, the data are arranged in a matrix of signal intensities, with features (m/z–retention time combinations) in the rows and the different resin samples in the columns. Linear modelling in limma was carried out using the lmFit function, fitting a separate model to the expression values for each row.
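To make the idea of a separate linear model per row concrete, the following Python sketch fits the simplest case, a two-group comparison, to each feature row. It is a simplified stand-in and not limma's implementation (lmFit accepts general design matrices); the function name `fit_two_groups`, the column layout and the log-scale intensities are hypothetical.

```python
from statistics import mean


def fit_two_groups(row, group_a_idx, group_b_idx):
    """Ordinary two-group fit for one feature (row of log intensities).

    Returns the difference in group means (logFC), the pooled residual
    variance, its degrees of freedom and the ordinary t-statistic.
    """
    a = [row[i] for i in group_a_idx]
    b = [row[i] for i in group_b_idx]
    mean_a, mean_b = mean(a), mean(b)
    log_fc = mean_a - mean_b
    df = len(a) + len(b) - 2
    rss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    variance = rss / df
    std_err = (variance * (1 / len(a) + 1 / len(b))) ** 0.5
    t_stat = log_fc / std_err if std_err > 0 else None
    return {"logFC": log_fc, "variance": variance, "df": df, "t": t_stat}


# Hypothetical matrix: 2 features (rows) x 6 chromatograms (columns),
# columns 0-2 from one resin and columns 3-5 from another.
matrix = [
    [2.0, 2.1, 1.9, 1.0, 1.1, 0.9],
    [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],
]
fits = [fit_two_groups(row, [0, 1, 2], [3, 4, 5]) for row in matrix]
```

Each row is fitted independently of the others here; the information sharing between rows that limma adds on top of this step is a separate stage.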
The highly parallel nature of the experimental design lends itself to a particular class of statistical methods, called parametric empirical Bayes.20 The function eBayes borrows information across all features to obtain more precise estimates of the variability within each feature. Being natural exudates, resins can contain volatile and nonvolatile small molecules together with macromolecular polymer components.4 The presence of these components can vary due to differences in botanical and geographical origin, treatment, age and storage conditions.4 Important differences between specimens of the same species were observed in the pyrograms of this study. There is no guarantee of linearity in the production of certain pyrolysis products when the amounts introduced vary, and interactions with different concentrations of other materials can affect pyrolytic pathways in unpredictable ways. Within this context of broad variety, metabolic dependencies or ratios of intensities are of lesser interest in the search for THM-GC/MS resin markers. A log10 transformation was performed on the integrated values in the feature matrix to temper discrepancies related to the relative abundances of components31 (intensities are offset from zero before transforming to the log scale to avoid missing values or large variances20).
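Two steps from this paragraph lend themselves to a short Python sketch: the offset log10 transformation and the way an empirical Bayes approach shrinks each feature's variance towards a common prior value. Both functions are illustrative only. The offset of 1.0 is an assumption (the value used in the study is not given), and the prior variance and prior degrees of freedom are passed in as hypothetical numbers, whereas limma estimates them from all rows together.

```python
import math


def log10_with_offset(feature_matrix, offset=1.0):
    """Shift intensities away from zero, then take log10 of every value."""
    transformed = []
    for row in feature_matrix:
        if any(value + offset <= 0 for value in row):
            raise ValueError("offset must make every intensity positive")
        transformed.append([math.log10(value + offset) for value in row])
    return transformed


def moderated_variance(sample_var, df, prior_var, prior_df):
    """Weighted average of a feature's own variance and a shared prior.

    This is the usual posterior variance of a moderated t-statistic;
    prior_var and prior_df are inputs here, not estimated.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Hypothetical intensities: a zero no longer becomes a missing value.
logged = log10_with_offset([[0.0, 99.0, 999.0]])

# Hypothetical feature with a small variance from 4 degrees of freedom,
# pulled towards a larger prior variance carrying equal weight.
shrunk = moderated_variance(sample_var=0.01, df=4, prior_var=0.04, prior_df=4)
```

The offset has to be chosen relative to the scale of the data: for intensities that were first divided by the total signal of a chromatogram, an offset of 1.0 would be far too large and a much smaller value would be needed.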

Feature extraction. Pyrolysis and derivatization with TMAH of the complex resinous material cause extensive fragmentation. Good feature detection is therefore a crucial step in the data processing pipeline. The feature extraction process should report as many ‘real’ features as possible (features of low intensity induced by compounds), while keeping the rate of false positives (feature-like signals caused by chemical noise) low. MZmine 2.19 was used for this purpose,38 following these steps: rubberband baseline correction, peak detection by centroid mass detector, chromatogram builder, deconvolution with local minimum search, deisotopic peak grouper, RT normalization, alignment with join aligner, filtering with a minimum of 3 peaks in a row and gap filling using ‘same RT and m/z range gap filler’. All parameters were optimized based on the observed mass accuracy and the peak widths in both the time and mass domains. The algorithm generated a two-dimensional matrix of 19987 features and their corresponding signal intensity integrations in each of the 39 chromatograms. Kovats retention indices were calculated with the AMDIS software (Automated Mass spectral Deconvolution and Identification System, v.2.70), based on the separation of a C7–C40 alkane mixture. Mass spectral identification was performed using the NIST 11 Mass Spectral Library, spectra provided by other institutions (via shared libraries of ESCAPE and the Users’ Group for Mass Spectrometry and Chromatography, MaSC) and published reference data.
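The structure of the resulting feature matrix, and the filter that keeps only rows with a minimum of 3 peaks, can be pictured with a small Python sketch. It does not reproduce MZmine; the dictionary layout, the function name `filter_min_detections` and all m/z, retention time and intensity values are hypothetical, with `None` standing for a chromatogram in which the feature was not detected before gap filling.

```python
from typing import Dict, List, Optional, Tuple

Feature = Tuple[float, float]  # (m/z, retention time in min)
FeatureMatrix = Dict[Feature, List[Optional[float]]]


def filter_min_detections(matrix: FeatureMatrix, min_peaks: int = 3) -> FeatureMatrix:
    """Keep only features detected in at least `min_peaks` chromatograms."""
    kept: FeatureMatrix = {}
    for feature, intensities in matrix.items():
        detections = sum(1 for value in intensities if value is not None)
        if detections >= min_peaks:
            kept[feature] = intensities
    return kept


# Hypothetical aligned features across 5 chromatograms.
example: FeatureMatrix = {
    (299.2, 21.80): [1.2e5, 9.8e4, 1.1e5, None, None],
    (145.1, 20.95): [None, 3.0e3, None, None, None],
    (203.1, 20.30): [5.0e3, None, 4.1e3, 4.4e3, 6.0e3],
}
kept = filter_min_detections(example)
```

In this example the feature detected in a single chromatogram is dropped as likely noise, while the other two rows are retained for gap filling and statistical analysis.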

Normalization and interpretation. Signal intensity can be affected by the amount of sample being pyrolyzed. To correct for possible inaccuracy during weighing and sample transfer, the data can be adjusted by normalization, which makes them more consistent. Normalization was


In this study, limma detects the most significant differences by comparing two groups of chromatograms. Two parameters were combined to track the five most significantly differing biomarker candidates. LogFC is a measure of differential expression. It distinguishes upregulated (positive) from downregulated (negative) features and quantifies the average fold change between the two groups. Thresholds were set on logFC>0.0014 (upregulated) and logFC

0 : markers for SAN
(1) 121,159,199,213,239,255,271,299,314,316,346   21.82/2409   1.78E-09/9.14E-06   0.46/2.9
(2) 121,145,159,199,213,225,273,299,314,332,(360)*   22.61/2500   3.69E-07/0.0016   0.25/6.6
(3) 155,171,199,203,211,271,283,285,298*   20.11/2205   6.37E-07/6.98E-06   0.62/2.0
(4) 173,189,203,285,300   20.35/2232   4.09E-06/8.55E-05   0.52/14.7
(5) 145,187,199,201,213,227,241,256,269,287,301,316   20.96/2304   7.12E-06/0.0030   0.10/15.5

Sandarac vs. Manila copal - logFC