Article Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

pubs.acs.org/jcim

Is Machine Translation a Reliable Tool for Reading German Scientific Databases and Research Articles?

Sonia Zulfiqar,*,† M. Farooq Wahab,*,‡ Muhammad Ilyas Sarwar,§ and Ingo Lieberwirth∥

† Department of Chemistry, School of Sciences & Engineering, The American University in Cairo, New Cairo, 11835, Egypt
‡ Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
§ Department of Chemistry, Quaid-i-Azam University, Islamabad 45320, Pakistan
∥ Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany



Supporting Information

ABSTRACT: A significant number of published databases and research papers exist in foreign languages and remain untranslated to date. Important sources of primary scientific information in German are Beilstein Handbuch der Organischen Chemie, Gmelin Handbuch der Anorganischen Chemie, Landolt-Börnstein Zahlenwerte und Funktionen, Houben-Weyl Methoden der Organischen Chemie, fundamental research papers, and patents. Although Reaxys has acquired Beilstein and Gmelin, many of the original references, dating back to the 1770s, are still in German, and the information presented in the printed and online versions is often not duplicated. To read these resources, either costly professional translation services are needed or a reading knowledge of German has to be acquired. A convenient approach is to use machine translation for reading German texts; however, the reliability of such translations is open to question. In this work, several platforms that employ neural networks for machine translation (neural machine translation, NMT) were tested for their ability to translate scientific German. On the basis of a preliminary survey, Google Translate and DeepL were selected for further study (German to English). Excerpts from German documents spanning more than a century were carefully chosen from standard works. DeepL Translator and Google Translate were found to be reliable for converting German scientific literature into English for a wide variety of technical passages. As a benchmark, human and machine translations are compared for complex sentences from old literature and a recent publication. Care and intuition should be used before relying on machine translation of methods and directions in general: in some synthetic procedures, machine translation may invert the direction of reagent addition (to or from).
Traditionally, physicists, chemists, and mathematicians in North American and European universities, and around the world, were trained to have a reading knowledge of one or two foreign languages in addition to English.1,2 This practice has been almost discontinued in the chemical sciences over the last two decades, as English is now the de facto language of scientific communication.3 Still, a large number of German articles and classical works in the physical sciences remain untranslated to date. Table 1 presents an extensive survey of German-language databases in synthetic chemistry, physics, quantum mechanics, mathematics, engineering, and thermodynamic data. Some of these databases are now partly available in English. Two extensive databases of factual information on organic and inorganic compounds are Beilstein Handbuch der Organischen Chemie and Gmelin Handbuch der Anorganischen Chemie.4,5 These are followed by Landolt-Börnstein Zahlenwerte, which contains physicochemical information for materials scientists and chemists spanning a century. These texts contain essential historical, chemical, and physical properties of millions of known and verified compounds accumulated over several hundred years. It is natural to ask whether translations of these volumes exist. Attempts were made in the 1930s to translate Beilstein Handbuch into French; however, the translators could not keep pace with the rapid flow of new information, and the project was finally abandoned as a hopeless task.6 The consolidated versions of Beilstein and Gmelin now appear as Reaxys by Elsevier, covering data from 400 journals and patents (English-language patents only).7 We have not explicitly compared Reaxys with the printed versions of Gmelin and Beilstein here. Although the chemical information is now available in English, the original reference papers from which the Beilstein and Gmelin data were extracted are still in German. Similarly, the printed versions of HW Methoden and the Landolt-Börnstein tables, along with other important items listed in Table 1, are not translated. Recently, SciFinder (Chemical Abstracts Service, American Chemical Society) collaborated with Iconic Translation Machines, Ltd., to translate 3 million abstracts from Chemisches Zentralblatt (1830−1960).8 Many synthetic chemists and other professional scientists rely on the print versions of these books, since many libraries around the world cannot afford the online subscriptions. It is not humanly possible to translate every non-English publication into English because of the sheer amount of material (books, articles, patents) published throughout the world. Consequently, machine translation is a viable tool for chemical information retrieval.

There are two obvious options for reading foreign texts: (a) professional translation, which is very costly and time-consuming, or (b) acquiring a reading knowledge of approximately 2000 words to extract functional but limited information. Since scientific language is characterized by precision and is free from colloquialisms, machine translation is the fastest way to get the “sense” of a given foreign-language text.13 The question that remains unanswered is how reliable today's machine translation is for reading the wide variety of German literature sources listed in Table 1. The prime concern is the scientific accuracy of the translated text; nonetheless, particular attention should also be paid to pitfalls, and one must be aware of misleading interpretations. In the past, machine translators yielded poor results because translation was entirely phrase-based.14,15 Recent platforms have made a quantum leap in translation quality by using neural machine translation (NMT) to bridge the gap between human and computer-aided translation.15 Previous reports in the literature over the last five decades have focused on the translation of chemical names.13,16−19 Herein, we present a comparative analysis of machine-translated texts from a variety of advanced German databases and research papers spanning almost a century. The second goal is to assess whether old and modern scientific German can be translated meaningfully and reliably into English. To the best of our knowledge, no comparison of machine-translated scientific and technical German with human interpretation exists.

Received: August 6, 2018
© XXXX American Chemical Society
DOI: 10.1021/acs.jcim.8b00534 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Table 1. Major Reference Works in German/Partial German−English

a. Beilstein Handbuch der Organischen Chemie (Beilstein's Handbook of Organic Chemistry)9
Description: Largest resource of checked/published physical and chemical information on known carbon compounds.7
Comments: Voluminous work consisting of more than 0.4 million pages since 1881. Volumes from the 1960s are in English and available from the Reaxys database (Elsevier).

b. Gmelin Handbuch der Anorganischen Chemie (Gmelin's Handbook of Inorganic Chemistry)10
Description: Extensive database of inorganic and organometallic compounds with their synthesis and brief physical/chemical properties.
Comments: Published since 1772, amounting to 1.5 million compounds and 1.3 million different reactions, with over 85 000 titles. Later volumes are in English. The online version is maintained by Reaxys (Elsevier).11

c. Landolt-Börnstein Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik (Landolt-Börnstein Tables and Functions of Physics, Chemistry, Astronomy, and Geophysics)12
Description: The world's largest resource for physical and chemical data.
Comments: Published from 1883 to date. Later volumes are in English.

d. Methoden der Organischen Chemie, Houben-Weyl (Methods of Organic Chemistry, Houben-Weyl)
Description: Standard reference work on organic reactions and synthetic approaches.
Comments: Organic synthetic reactions compiled in German from 1909 to 1987. After the 1990s, it became Science of Synthesis, in English.

e. Angewandte Chemie (Applied Chemistry)
Description: Published since 1887, covering all disciplines of chemistry.
Comments: An international version was launched in 1962. To date, each article is published in both German and English by the German Chemical Society.

f. Handbuch der Physik (Handbook of Physics)
Description: A comprehensive encyclopedia of physics in 64 volumes dealing with all of classical physics, optics, electromagnetism, quantum mechanics, spectroscopy, and thermodynamics.
Comments: Many articles are written by the original discoverers and Nobel laureates describing their own work and findings (1926−1988).

g. Chemische Berichte (Chemical Reports)
Description: Chemical reports, mainly in organic chemistry.
Comments: Published since 1868; now merged with several other publications into the European Journal of Inorganic Chemistry and the European Journal of Organic Chemistry.

h. Other significant journals
Annalen der Physik (where Einstein formulated his famous energy-mass relationship); Zeitschrift für Physikalische Chemie; Zeitschrift für Physik (1920−1997), which merged with other journals to form the new European Physical Journal and contains landmark original papers on quantum mechanics; Fresenius' Zeitschrift für Analytische Chemie, published as Analytical & Bioanalytical Chemistry since 2002; Chemie in unserer Zeit, since 1967, equivalent to ACS Chemical & Engineering News (still in German); Naturwissenschaften, published since 1913, now issued as The Science of Nature; 1834−2000, merged with Chemiker Zeitung (1877−2002), now published as Advanced Synthesis & Catalysis.

METHODOLOGY

Selection of Passages and Assessing Machine Translation. In this work, carefully chosen short passages from the sources highlighted in Table 1, encompassing 100 years of literature, were used to assess machine translation. The aim was to keep each passage brief while ensuring that it is a stand-alone paragraph. Second, the passages reflect different areas: organic chemistry, inorganic chemistry, surface spectroscopy, and quantum chemistry. There are two classes of sources: (a) factual databases and (b) research papers. Of the seven passages, only one, the Angewandte Chemie article, has an existing human translation in English. The first and second passages consist of the data for two compounds selected from Beilstein Handbuch der Organischen Chemie (1897 and 1933),20,21 which are representative of old German scientific writing and spelling. The third passage, on thermodynamic data, is chosen from Gmelin Handbuch der Anorganischen Chemie.22 The fourth passage is from Landolt-Börnstein Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik,23 the most extensive database series of functions and numerical constants. The fifth is a synthesis recipe from Houben-Weyl Methoden der Organischen Chemie. The sixth and seventh passages come from research articles of a fundamental nature. The sixth is from a paper by Tisza, in which he developed selection rules for molecular vibrations using group theory.24 The last passage has been taken from Angewandte Chemie (2005) and relates to a Raman spectroscopy experiment on surface monolayers.25 The machine translation of the Angewandte Chemie article is compared with its text in Angewandte Chemie International Edition.26 The language in all the selected passages is highly specialized and technical. Google Translate was chosen because its working principles have already been published.14,15 In 2017, DeepL Translator emerged as a very powerful neural machine translation tool for European languages only; however, DeepL has not disclosed its working principles. According to the DeepL Web site (https://www.deepl.com/en/translator), it also employs a neural network approach. The Beilstein Dictionary27 and the expertise of a native German speaker specializing in physics and chemistry were used to assess any stumbling blocks in the translations. The whole text was pasted into the Google Translate box (https://translate.google.com), and the web version of DeepL Translator (DeepL GmbH, associated with the Linguee dictionary) was used to translate the same selected passages (https://www.deepl.com/en/translator). Google Translate and DeepL were used on the same dates for comparison purposes. Both DeepL and Google Translate allow the user to highlight a word and find alternative meanings. An additional feature of Google Translate is constant feedback from users through a “Suggest an edit” option for a given translation, which is meant for continuous improvement of the service. The formatting of subscripts and chemical/mathematical formulas was lost in both DeepL and Google Translate, so all passages were formatted manually. Underlined sentences in translations indicate errors, problems, or translations that need improvement.

Optical Character Recognition (OCR). A significant challenge before placing the text in Google Translate or DeepL Translator was that all text published before the 1990s was available only as scanned text in portable document format (PDF). Samples of text recognition by Acrobat Pro DC (version 2018.011.20058) for an 1897 passage and a 2005 passage are shown in the Supporting Information. OCR of the older text produced significant errors in spelling, chemical formulas, and mathematical notation. Similarly, umlauts could not be copied from the 2005 article in Angewandte Chemie. In general, old text from PDF files could not be easily “read” by machine translators without significant mistakes. Because of the old printing and low quality of the scans, the chosen sections were first typed by hand in MS Word, and their spelling was compared with the original text. After spelling checks by two authors, the text was machine translated.
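The authors prepared each passage by hand before translation: retyping, spell-checking and, for the Beilstein and Houben-Weyl passages, writing abbreviations out in full. As a minimal sketch of how part of that preparation might be automated, the Python below expands a small abbreviation table and converts German decimal commas to points, a conversion that Google Translate performed but DeepL did not in the Houben-Weyl passage. The table is hypothetical, assembled from abbreviations named in this article rather than from a published list, and its inflected forms (e.g., wäßrige, ätherische) fit only the sample sentences; a real pipeline would need a curated source such as the Beilstein Dictionary.

```python
import re

# Hypothetical abbreviation table assembled from abbreviations discussed in
# this article; the inflected forms fit the sample sentences only.
ABBREVIATIONS: dict[str, str] = {
    "äther.": "ätherische",    # Beilstein: äther. Lösung -> ethereal solution
    "Thln.": "Theilen",        # Beilstein (1897 spelling): parts
    "Stdn.": "Stunden",        # Houben-Weyl: hours
    "wäßr.": "wäßrige",        # Houben-Weyl: aqueous
    "i. Vak.": "im Vakuum",    # Houben-Weyl: in vacuo
    "Schmp.": "Schmelzpunkt",  # Houben-Weyl: melting point
}


def expand_abbreviations(text: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each abbreviation that starts at a word boundary with its full form."""
    for abbr, full in table.items():
        # (?<!\w) blocks matches inside longer words; re.escape handles the periods.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), full, text)
    return text


def normalize_decimal_commas(text: str) -> str:
    """Convert a comma between two digits to a point (1,25 -> 1.25); '1, 25' is kept."""
    return re.sub(r"(?<=\d),(?=\d)", ".", text)


def preprocess(text: str) -> str:
    """Expand abbreviations, then normalize decimal commas, before machine translation."""
    return normalize_decimal_commas(expand_abbreviations(text))


if __name__ == "__main__":
    sample = "Die wäßr. Lösung wird 8 Stdn. bei 120° gerührt; 1,25 g; Schmp.: 39−42°"
    print(preprocess(sample))
```

For example, `preprocess("über Magnesiumsulfat getrocknet und i. Vak. eingedampft")` returns `"über Magnesiumsulfat getrocknet und im Vakuum eingedampft"`. Two limitations of this sketch: an abbreviation that ends a sentence loses the sentence's final period, and the decimal-comma rule would also rewrite comma-separated numbers written without a space (e.g., a citation such as "28,1114").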



RESULTS AND DISCUSSION

Chemical Information from Beilstein Handbuch der Organischen Chemie and Machine Translation. Printed Beilstein in German is a monumental reference work for practicing chemists and researchers, with a unique writing style regarded as very condensed and telegraphic (very short sentences).28 The factual entries in all passages contain a large number of abbreviations. The print versions of Beilstein often give a very insightful history of organic compounds with respect to their natural occurrence, preparation, structure, and physicochemical properties. Some of these historical aspects are missing from both SciFinder (Chemical Abstracts Service) and Reaxys. In SciFinder, the abstracts are largely limited to the period since 1900; however, pre-1900 data from Chemisches Zentralblatt are rapidly being added. To test the machine translation of Beilstein passages, abbreviations were replaced with full words, except for obvious ones (e.g., alkoh.). Herein, we present two short passages for two different compounds, one representing the very old writing style (1897) and the other modern (1933) spelling. [Quoted with permission from Beilstein, F., Handbuch der Organischen Chemie, Dritte Umgearbeitete Auflage; Leopold Voss: Hamburg and Leipzig, 1897; Vol. Dritte Band.]

Methylxanthin C6H6N4O2 (identisch mit Heteroxanthin?).


Im Harn von (Kaninchen und) Hunden, denen Theobromin (BONDZYNSKI, GOTTLIEB, B. 28, 1114; ALBANESE, G. 25 [2] 320) oder Kaffein (ALBANESE) eingegeben wurde. Man fällt den Harn mit Kalkmilch, filtrirt, säuert das Filtrat mit Essigsäure an und fällt mit Kupferacetat. Der Niederschlag wird durch H2S zerlegt. − Krusten; mikroskopische Säulen oder lange Nadeln (aus heißem Wasser). Schmilzt gegen 310° unter Zersetzung und Sublimation. Löslich in 1592 Thln. Wasser von 18°, in 109 Thln. kochendem Wasser, in 7575 ccm absol. Alkohol bei 17°, und in 2250 ccm kochendem Alkohol. Unlöslich in CHCl3. Wird, aus der Lösung in Natronlauge, durch NH3-Salze gefällt. Beim Erhitzen des Silbersalzes mit CH3J (+ Holzgeist) entsteht Kaffein. Beim Abdampfen mit HCl und einer Spur KClO3 hinterbleibt ein rother Fleck, der durch Zusatz eines Tropfens Kalilauge violett wird. − Na.C6H5N4O2 + 4H2O. Lange Tafeln und Säulen. − Ba(C6H5N4O2)2 (bei 100 bis 105°). Niederschlag; Rosetten (aus heißem Wasser). − Ag2O. C6H6N4O2 (bei 120°). Gelatinöser Niederschlag. Unlöslich in CHCl3.

[Google Translate] Methylxanthine C6H6N4O2 (identical to heteroxanthine?). In the urine of (rabbits and) dogs to whom Theobromine (BONDZYNSKI, GOTTLIEB, B. 28, 1114, ALBANESE, G. 25 [2] 320) or Kaffein (ALBANESE) was entered. The urine is precipitated with lime, filtered, the filtrate acidified with acetic acid and precipitated with copper acetate. The precipitate is decomposed by H2S. - crusts; microscopic columns or long needles (from hot water). Melts against 310° with decomposition and sublimation. Soluble in 1592 parts water of 18°, in 109 parts boiling water, in 7575 cc absolute. Alcohol at 17°, and in 2250 cc of boiling alcohol. Insoluble in CHCl3. Is, from the solution in sodium hydroxide, precipitated by NH3 salts. When the silver salt is heated with CH3J (+ Holzgeist), caffeine is formed. Evaporation with HCl and a trace of KClO3 leaves a red spot, which turns purple when a drop of potassium hydroxide solution is added. - Na.C6H5N4O2 + 4H2O. Long boards and columns. - Ba(C6H5N4O2)2 (at 100 to 105°). Precipitation; Rosettes (from hot water). - Ag2O. C6H6N4O2 (at 120°). Gelatinous precipitate. Insoluble in CHCl3. [Date accessed 17 Sept. 2018.]

[DeepL Machine Translation] Methylxanthine C6H6N4O2 (identical to heteroxanthine?). In the urine of (rabbit and) dogs fed theobromine (BONDZYNSKI, GOTTLIEB, B. 28, 1114; ALBANESE, G. 25 [2] 320) or caffeine (ALBANESE). The urine is precipitated with lime milk, filtered, the filtrate acidified with acetic acid and precipitated with copper acetate. The precipitate is broken down by H2S. - Crusts; microscopic columns or long needles (from hot water). Melts toward 310° under decomposition and sublimation. Soluble in 1592 parts water of 18°, in 109 parts boiling water, in 7575 ccm absolute. Alcohol at 17°, and in 2250 ccm boiling alcohol. Insoluble in CHCl3. Is precipitated from the solution in sodium hydroxide solution by NH3 salts. When heating the silver salt with CH3J (+ wood alcohol) caffeine is formed. When evaporating with HCl and a trace of KClO3, a red stain remains, which becomes violet when a drop of potash lye is added. - Na.C6H5N4O2 + 4H2O. Long plates and columns. - Ba(C6H5N4O2)2 (at 100 to 105°). Precipitation; rosettes (from hot water). - Ag2O. C6H6N4O2 (at 120°). Gelatinous precipitate. Insoluble in CHCl3. [Date accessed 17 Sept. 2018.]

The English output of both Google Translate and DeepL is a decent translation: a century-old piece of information has been accurately translated, with only very minor issues, such as Google's “dogs to whom Theobromine ... or Kaffein ... was entered.” The DeepL translation shows no such problems. Despite the old spellings Theilen (Teilen, which means parts) and rother (red), and the probable typographical error or old spelling “filtrirt” for filtriert, the translations are correct. Kalkmilch should be milk of lime, which DeepL renders as “lime milk”; the meaning is also clear in Google Translate. The abbreviation ccm was correctly translated as cc (cubic centimeter) except by DeepL. Google's “Melts against 310°” is an old-style literal translation, but the melting point information is correctly presented. The only unusual Google translation is “Lange Tafeln” as “long boards” to describe the crystals; although correct in a literal sense, it would be better translated as “long sheets.” The symbol J (Jod, iodine) in CH3J corresponds to CH3I in modern notation; the old symbol is retained in the machine translations.

Next, we survey the modern spelling and text of a Beilstein entry for methyl phenyl diimide.21 The style still consists of highly condensed sentences despite the 36 years between the publication dates (1897 vs 1933). [Bernhard Prager, P. J.; Paul Schmidt, D. S. Beilsteins Handbuch der Organischen Chemie; Julius Springer: Berlin, 1933; Vol. Sechzehnter Band.]

Methyl-phenyl-diimid, Methanazobenzol, Benzolazomethan C7H8N2 = C6H5.N:N.CH3. Beim Eintragen von Quecksilberoxyd in eine äther. Lösung von β-Methylphenylhydrazin (Bd. XV, S. 118) (TAFEL, B. 18, 1742). Geringe Mengen von Benzolazomethan lassen sich erhalten, wenn man äquimolekulare Mengen von Phenylhydrazin und Formaldehyd in alkoh. Lösung aufeinander wirken läßt und das entstandene ölige Produkt mit Wasserdampf destilliert (BALY, TUCK, Soc. 89, 986; vgl. dazu STOBBE, NOWACK, B. 47 [1914], 578. − Gelbes Öl. Siedet nicht unzersetzt gegen 150°; sehr leicht flüchtig mit Wasserdämpfen. Die Lösung in alkoh. Kali färbt sich bei längerem Stehen rot (KNORR, WEIDEL, B. 42, 3525).

[DeepL Machine Translation] Methyl-phenyl-diimide, methanazobenzene, benzenazomethane C7H8N2 = C6H5.N:N.CH3. When introducing mercury oxide into an ethereal solution of beta-methylphenylhydrazine (vol. XV, p. 118) (TAFEL, vol. 18, 1742). Small amounts of benzene azomethane can be obtained if equimolecular amounts of phenylhydrazine and formaldehyde are allowed to interact in alcohol. solution and distils the resulting oily product with steam (BALY, TUCK, Soc. 89, 986; cf. STOBBE, NOWACK, B. 47 [1914], 578. - Yellow oil. Do not boil undecomposed at 150°; very volatile with water vapours. The solution in alcohol. Potash turns red after prolonged standing (KNORR, WEIDEL, B. 42, 3525). [Date accessed 19 Sept. 2018.]

Unlike in the previous entry on methylxanthine, the spelling here is modern. The machine translation is scientifically accurate; even the standard abbreviations “alkoh.” and “äther. Lösung” (according to the Beilstein Dictionary, “ätherische Lösung”, that is, ethereal solution) were correctly translated. Google Translate gives a similar translation without any problems, except that in one sentence alkoh. Lösung remains “alkoh. solution” (see Supporting Information). Note the capitalization after an abbreviation in the machine translation (e.g., “alcohol. Potash”).

Extracting Descriptive Thermodynamic Data from Gmelin Handbuch der Anorganischen Chemie. Gmelin's Handbook of Inorganic Chemistry is a multivolume collection on inorganic and organometallic compounds.5 Gmelin was published in German from 1771 until 1982. The level of detail in Gmelin can be judged from the fact that there is a separate book on noble gas compounds. The effort to extract information from Gmelin is rewarded with minute details and comprehensive information. Moreover, the handbook covers inorganic chemistry well before the 1900s. We have chosen a passage from Gmelin to acquire thermodynamic data on the solubility of hydrogen in tungsten, in which hydrogen was at one time considered almost insoluble.29 [Quoted from Kotowski, E. H. E. P. u. A., Gmelin Handbuch der Anorganischen Chemie, W, Systeme mit Edelgasen, 8th ed.; Springer-Verlag: Berlin Heidelberg GmbH, 1978, with permission.]

Lösung von Wasserstoff in Wolfram. Wolfram weist eine außerordentlich geringe Löslichkeit für Wasserstoff auf, so daß in älteren Arbeiten keine oder keine merkliche Löslichkeit beobachtet werden konnte. H2-Gas wird in W atomar gelöst, dementsprechend gilt das Sieverts-Gesetz, c ∼ √p (c = H-Konzentration, p = H2-Druck); außerdem können die Ergebnisse meist durch die Arrhenius-Beziehung S = S₀ exp(−ΔH/RT) beschrieben werden (S₀ = Löslichkeitskonstante, ΔH = Lösungswärme), aus der die Beziehung lg c = 1/2 lg p + lg S = 1/2 lg p + A + B/T folgt. Die isobare Löslichkeit nimmt mit steigender Temperatur zu. Die Daten für S₀, A und B sind in folgender Tabelle zusammengestellt:

[DeepL Machine Translation.] Tungsten has an extraordinarily low solubility for hydrogen, so that no or no noticeable solubility could be observed in older studies. H2-gas is dissolved atomically in W, accordingly the Sieverts law, c ∼ √p (c = H-concentration, p = H2-pressure) applies; in addition the results can mostly be described by the Arrhenius relation S = S₀ exp(−ΔH/RT) (S₀ = solubility constant, ΔH = heat of solution), from which the relation lg c = 1/2 lg p + lg S = 1/2 lg p + A + B/T follows. The isobaric solubility increases with increasing temperature. The data for S₀, A and B are compiled in Table 2. [Date accessed 20th Sept. 2018.]

Table 2.a
S₀: (2.9 +2.8/−1.3) × 10^−1; 5.24 × 10^−4; 8.9 × 10^−3; 6.61 × 10^−5; 1.1 × 10^−3
Einheit (Unit): Torr·l/(cm³·Torr^1/2); H/W atm^1/2; atm^1/2; Molenbruch atm^1/2; mole fraction atm^1/2; ppm
Konstante A^b (Constant A^c): −2.55; −3.78; −0.29; −4.68; −3.46
Konstante B^b (Constant B^c): −5250; −1088; −9360; −1093; −4373
Temperaturbereich in K (Temperature range in K): 1100 bis (to) 2400; 1170 bis (to) 1740; 1940 bis (to) 2160; 1873 bis (to) 2700; 2700 bis (to) 3273
Druckbereich (Pressure range): 600 bis 10^−8 Torr (8 × 10^4 bis 1.33 × 10^−6 Pa); 1 atm (10^5 Pa); 1 atm (10^5 Pa); 1, 25, 100 atm (10^5 bis 10^7 Pa)
a Table number has been added by authors. b Die Konstanten A und B sind aus Werten in den Originalen berechnet; die Werte von A gelten für Konzentrationen in Atom-% und Drücke in Pa. c [DeepL Translation] The constants A and B are calculated from values in the originals; the values of A are valid for concentrations in atomic% and pressures in Pa (date accessed 20th Sept. 2018).

The translations of all table entries were entered one by one and added to the original table. As in the preceding sections focused on concise factual information, the DeepL and Google translations (see Supporting Information) of this Gmelin passage are entirely free from errors. The only difference in Google Translate was in “H2-Gas wird in W atomar gelöst”, which means that hydrogen gas is dissolved atomically in W; Google Translate suggested “H2 gas is atomically resolved in W.” The logarithm symbol, written in Gmelin as “lg”, is left intact and not mistranslated.

Machine Translation of Data from Landolt-Börnstein Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik.23 The Landolt-Börnstein tables, in both print and online versions, have been the most extensive collection of critically evaluated property data in materials science and the closely related fields of chemistry, physics, astronomy, and geophysics for more than 100 years.28 Currently, all the data are maintained by SpringerMaterials. As with Beilstein and Gmelin, the online versions are not available in many university libraries, although printed German versions can be found. The published Landolt-Börnstein tables explain the concepts in text and equations before presenting the tabulated values. The language of this database is formal and descriptive, and very different from Beilstein's compact language. [Adapted with permission from Eucken, A., Landolt-Börnstein Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik und Technik, 6th ed.; Springer-Verlag, GmbH: Berlin, 2013 (softcover reprint of the hardcover sixth edition).]

Bindungspolarisierbarkeiten. Berechnet aus bekannten Hauptpolarisierbarkeiten (siehe Tabelle 3) geeigneter Molekeln. Da die Polarisierbarkeit einer Molekel die Eigenschaften eines Tensors besitzt, kann sie rein formal in für die einzelnen Bindungen charakteristische Beträge zerlegt werden. Aus den Bindungspolarisierbarkeiten parallel und senkrecht zur Valenzrichtung lassen sich dann umgekehrt durch eine Tensoraddition die Hauptpolarisierbarkeiten einer Molekel berechnen. Es muß aber betont werden, daß diese Werte nur formale Bedeutung haben und nicht die den einzelnen Bindungen wirklich zugehörigen Werte darstellen, weil infolge der sehr starken Wechselwirkung die resultierenden Hauptpolarisierbarkeiten auch nicht ungefähr als die Summe der Polarisierbarkeiten einzelner Bindungen dargestellt werden können. Nur die mittlere Polarisierbarkeit läßt sich einigermaßen als die Summe der wahren mittleren Polarisierbarkeiten der einzelnen Bindungen darstellen.

[DeepL Translation] Bond polarizabilities. Calculated from known main polarizabilities (see Table 3) of suitable molecules. Since the polarizability of a molecule has the properties of a tensor, it can be formally broken down into characteristic amounts for the individual bonds. From the bond polarizabilities parallel and perpendicular to the valence direction, the main polarizabilities of a molecule can be calculated by tensor addition. However, it must be stressed that these values have only formal meaning and do not represent the values really belonging to the individual bonds, because due to the very strong interaction the resulting main polarizabilities cannot be represented approximately as the sum of the polarizabilities of individual bonds either. Only the mean polarizability can be represented to some extent as the sum of the true mean polarizabilities of the individual bonds. [Date accessed 29th Sept. 2018.]
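The Landolt-Börnstein passage describes splitting a molecular polarizability tensor into bond contributions and recombining them by tensor addition, while stressing that the values are only formal and that only the mean polarizability is roughly additive. The Python sketch below shows the arithmetic of that formal model and why the mean, in particular, adds up. It assumes the common model of an axially symmetric bond, αᵇ = α⊥I + (α∥ − α⊥)ûûᵀ, with û a unit vector along the valence direction; this model and all numbers are illustrative assumptions, not values from Table 3. Because the trace is linear and tr(ûûᵀ) = 1, the mean tr(α)/3 of the summed tensor always equals the sum of the bond means (α∥ + 2α⊥)/3. The sketch says nothing about how well real molecules follow the model.

```python
"""Formal bond-additivity sketch for polarizability tensors.

Assumes axially symmetric bonds; all numbers are hypothetical,
not values taken from Landolt-Börnstein.
"""
import math

Matrix = list[list[float]]


def bond_tensor(alpha_par: float, alpha_perp: float,
                u: tuple[float, float, float]) -> Matrix:
    """Return alpha_perp * I + (alpha_par - alpha_perp) * u u^T for a unit vector u."""
    d = alpha_par - alpha_perp
    return [[(alpha_perp if i == j else 0.0) + d * u[i] * u[j] for j in range(3)]
            for i in range(3)]


def tensor_sum(tensors: list[Matrix]) -> Matrix:
    """Tensor addition: element-wise sum of 3x3 bond tensors."""
    return [[sum(t[i][j] for t in tensors) for j in range(3)] for i in range(3)]


def mean_polarizability(t: Matrix) -> float:
    """Mean polarizability = trace / 3."""
    return (t[0][0] + t[1][1] + t[2][2]) / 3.0


if __name__ == "__main__":
    # Hypothetical bond values in units of 1e-25 cm^3.
    a_par, a_perp = 2.0, 1.0
    # Hypothetical symmetric bent molecule: two bonds in the xz-plane, 60 deg from z.
    half_angle = math.radians(60.0)
    u1 = (math.sin(half_angle), 0.0, math.cos(half_angle))
    u2 = (-math.sin(half_angle), 0.0, math.cos(half_angle))
    molecule = tensor_sum([bond_tensor(a_par, a_perp, u1),
                           bond_tensor(a_par, a_perp, u2)])
    bond_mean = (a_par + 2 * a_perp) / 3
    print(mean_polarizability(molecule), 2 * bond_mean)
    # For this symmetric geometry the summed tensor is diagonal, so its
    # diagonal entries are the principal polarizabilities along x, y, z.
    print([molecule[k][k] for k in range(3)])
```

The two printed means agree up to floating-point rounding for any bond geometry; only the principal values, here read off the diagonal because the off-diagonal xz terms of the two bonds cancel, depend on the geometry.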

DOI: 10.1021/acs.jcim.8b00534 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Table 3a,b,c

represent the values really associated with the individual bonds, because, owing to the very strong interaction, the resulting principal polarizabilities can not be represented approximately as the sum of the polarizabilities of individual bonds.” This is also very close to the human translation.

A Synthetic Procedure From Methoden der Organischen Chemie (HW). As listed in Table 1, Methoden der Organischen Chemie, nicknamed HW (Houben-Weyl), is a standard multivolume reference work on synthetic organic chemistry.30 The volumes of HW present preparative methods critically and in detail, and the optimized methods are comprehensive, with detailed instructions. An additional advantage of HW is its graphical reaction schemes. The chosen passage consists of short sentences giving instructions for synthesizing a derivative of ruthenocene.30

[Quoted with permission from Houben-Weyl, 1986; Vol. E 18, p 237.] (Dimethylamino-methyl)-ruthenocen: Eine Lösung von 1,25 g (5,4 mmol) Ruthenocen in 14 ml Eisessig und 1,4 ml Phosphorsäure wird bei 20° tropfenweise mit 1,2 ml frisch destilliertem Bis-[dimethylamino]-methan versetzt und die Mischung 8 Stdn. bei 120° im Stickstoff-Strom gerührt. Nach dem Abkühlen wird mit Wasser verdünnt und mit Ether extrahiert, wobei 0,28 g nicht umgesetztes Ruthenocen zurückgewonnen werden. Die wäßr. Lösung wird mit Natronlauge stark alkalisch gestellt, mit Ether extrahiert, die vereinigten Ether-Phase mit Wasser gewaschen, über Magnesiumsulfat getrocknet und i. Vak. eingedampft; Ausbeute: 1,09 g [(70%); 90%, auf umgesetztes Ruthenocen]; hellgelbes, rasch kristallisierendes Öl; Schmp.: 39−42°.

[Google Translate.] (Dimethylamino-methyl)-ruthenocene: A solution of 1.25 g (5.4 mmol) of ruthenocene in 14 mL of glacial acetic acid and 1.4 mL of phosphoric acid is added dropwise at 20° with 1.2 mL of freshly distilled bis [dimethylamino]methane and the mixture stirred for 8 h at 120° in a stream of nitrogen. After cooling, it is diluted with water and extracted with ether to recover 0.28 g of unreacted ruthenocene. The aqueous solution is made strongly alkaline with sodium hydroxide solution, extracted with ether, the combined ether phase washed with water, dried over magnesium sulfate and evaporated in vacuo; Yield: 1.09 g [(70%); 90%, on converted ruthenocene]; pale yellow, rapidly crystallizing oil; Melting point: 39−42°. [Accessed 22 Sept. 2018.]

[DeepL Translation] (Dimethylamino-methyl)-ruthenocene: A solution of 1,25 g (5,4 mmol) ruthenocene in 14 mL glacial acetic acid and 1,4 mL phosphoric acid is added dropwise at 20° with 1,2 mL freshly distilled bis[dimethylamino]-methane and the mixture is stirred for 8 h at 120° in a nitrogen stream. After cooling, dilute with water and extract with ether, recovering 0,28 g of unreacted ruthenocene. The aqueous solution is strongly alkaline with sodium hydroxide solution, extracted with ether, the combined ether phase washed with water, dried over magnesium sulfate and evaporated in vacuum; yield: 1,09 g [(70%); 90%, on converted ruthenocene]; light yellow, rapidly crystallizing oil; melting point: 39−42°. [Accessed 22 Sept. 2018.]

The HW procedure uses several standard abbreviations.6 Because these abbreviations end in a period, they can “confuse” machine translation and cause unwanted capitalization, presumably because the period is read as the end of a sentence. The abbreviations Stdn., wäßr., i. Vak., and Schmp. were therefore written out before translation as Stunden (hours), wäßrig (aqueous; older spelling of wässrig), im Vakuum (in vacuo), and Schmelzpunkt (melting point). Both the Google and DeepL translations read smoothly and appear free of problems. Google Translate correctly changed the decimal commas to decimal points, but DeepL retains the European convention and

Polarisierbarkeiten (Polarizabilities)

Bindung (Bond) | Parallel zur Valenzrichtung (Parallel to valence direction) α∥ · 10^25 [cm^3] | Senkrecht zur Valenzrichtung (Perpendicular to valence direction) α⊥ · 10^25 [cm^3] | Mittlere Polarisierbarkeit (Average Polarizability) ᾱ · 10^25 [cm^3]
C−H_al | 7,9 | 5,8 | 6,5
C_al−C_al | 18,8 | 0,2 | 6,4
C_ar−C_ar | 22,5 | 4,8 | 10,7
C=C | 28,6 | 10,6 | 16,6
C≡C | 35,4 | 12,7 | 20,3
C−Cl | 36,7 | 20,8 | 26,1
C−Br | 50,4 | 28,8 | 36,0
>C=O (Carbonyl) | 19,9 | 7,5 | 11,6
C=O (CO2) | 20,5 | 9,65 | 13,3
C=S (CS2) | 75,7 | 27,7 |
C≡N (HCN) | 31 | 14 |
H−S (H2S) | 23,0 | 17,2 |
H−N (NH3) | 5,8 | 8,4 | 7,5

a The table number has been added by the authors.

b Weiterhin haben G. Otterbein: Physikal. Z. 35 (1934) 249 sowie G. Sachse: Physikal. Z. 36 (1935) 357 versucht, aus Messungen an Flüssigkeiten für einzelne Gruppen Hauptpolarisierbarkeiten abzuleiten. Da in der Flüssigkeit die Wechselwirkung der Molekeln einen wesentlichen, aber nicht näher bestimmbaren Einfluß ausübt, sind diese Werte nicht mehr für die Einzelmolekel charakteristisch.

c [DeepL Translation] Furthermore G. Otterbein: Physical. Z. 35 (1934) 249 as well as G. Sachse: Physikal. Z. 36 (1935) 357 attempts to derive main polarizabilities from measurements on liquids for individual groups. Since the interaction of the molecules in the liquid exerts an essential, but not more precisely determinable influence, these values are no longer characteristic for the individual molecules.
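The “Mittlere Polarisierbarkeit” column of Table 3 can be checked against the other two. For a bond whose polarizability tensor is axially symmetric, the isotropic mean is the trace divided by three, ᾱ = (α∥ + 2α⊥)/3. The sketch below is our own illustration and is not part of Landolt-Börnstein. The dictionary is a hand transcription of the Table 3 values, using ASCII bond labels and decimal points, and with no mean recorded for CS2, HCN, and H2S, where the extracted table shows none. The `additive_mean` helper applies the usual bond-additivity approximation. This is an assumption, and the underlined Landolt sentence warns that it fails when bonds interact strongly.

```python
# Bond polarizabilities transcribed from Table 3, in units of 1e-25 cm^3.
# Each entry: (alpha_parallel, alpha_perpendicular, tabulated mean or None).
TABLE_3 = {
    "C-H_al": (7.9, 5.8, 6.5),
    "C_al-C_al": (18.8, 0.2, 6.4),
    "C_ar-C_ar": (22.5, 4.8, 10.7),
    "C=C": (28.6, 10.6, 16.6),
    "C#C": (35.4, 12.7, 20.3),
    "C-Cl": (36.7, 20.8, 26.1),
    "C-Br": (50.4, 28.8, 36.0),
    ">C=O (carbonyl)": (19.9, 7.5, 11.6),
    "C=O (CO2)": (20.5, 9.65, 13.3),
    "C=S (CS2)": (75.7, 27.7, None),
    "C#N (HCN)": (31.0, 14.0, None),
    "H-S (H2S)": (23.0, 17.2, None),
    "H-N (NH3)": (5.8, 8.4, 7.5),
}


def mean_polarizability(alpha_par: float, alpha_perp: float) -> float:
    """Isotropic mean (trace / 3) of an axially symmetric bond tensor."""
    return (alpha_par + 2.0 * alpha_perp) / 3.0


def inconsistent_rows(table: dict, tol: float = 0.05) -> list:
    """Bonds whose tabulated mean differs from (a_par + 2 a_perp) / 3 by more than tol."""
    bad = []
    for bond, (a_par, a_perp, mean) in table.items():
        if mean is not None and abs(mean_polarizability(a_par, a_perp) - mean) > tol + 1e-9:
            bad.append(bond)
    return bad


def additive_mean(bond_counts: dict, table: dict) -> float:
    """Molecular mean polarizability as a sum of bond means (additivity assumption)."""
    return sum(n * mean_polarizability(*table[bond][:2]) for bond, n in bond_counts.items())


if __name__ == "__main__":
    print(inconsistent_rows(TABLE_3))              # [] : all tabulated means agree to rounding
    print(additive_mean({"C-H_al": 4}, TABLE_3))   # 26.0, a methane estimate in 1e-25 cm^3
```

Every row that has a tabulated mean agrees with (α∥ + 2α⊥)/3 to within the table's rounding. The methane figure only illustrates additivity and is not a value from the table.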

This excerpt illustrates the descriptive style of the Landolt-Börnstein tables. The machine translations from Google Translate and DeepL both make complete sense, even with older spellings such as “Molekeln” for “Moleküle”. A minor problem is that Google Translate renders Bindung as “Binding”, whereas DeepL correctly translates it as “Bond” in the quoted Table 3. Note that both Google and DeepL offer alternative translations. For example, the heading “Senkrecht zur Valenzrichtung” in the Landolt table was correctly translated by Google as “perpendicular to the valence direction”, whereas DeepL initially suggested “vertical to the valence direction”. The correct rendering, “perpendicular to the valence direction”, was available among DeepL’s alternatives. Additionally, both DeepL and Google Translate (Supporting Information) omit the word umgekehrt (“reversely”). The sentence should read: “From the bond polarizabilities parallel and perpendicular to the valence direction, the main polarizabilities of a molecule can be reversely calculated by tensor addition.” The only real difficulty arose with complicated run-on sentences (underlined), which are typical of older writing styles. The human translation of the underlined sentence would be “It has to be emphasized, however, that these values have only a formal meaning and do not represent the values associated with the individual bonds, because, due to the very strong interaction, the resulting main polarizabilities cannot even approximately be represented as the sum of the polarizabilities of individual bonds.” Google Translate shows “It must be emphasized, however, that these values have only formal meaning and do not

DOI: 10.1021/acs.jcim.8b00534 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

British spellings (sulphate, crystallising). However, there is a common and serious pitfall in machine translation that readers must be aware of. The very first instruction, as rendered by both Google Translate and DeepL, is quite misleading. The original directs that freshly distilled bis[dimethylamino]methane be added dropwise to the ruthenocene solution, not the other way round. This problem may arise in passive sentences that do not name who performs the action. We speculate that the translation went wrong because the construction “wird ... mit ... versetzt” is in the passive voice and does not state who performs the addition. A wird mit B versetzt means that B is added to A, so readers should be cautious with descriptive methods and double-check them. Interestingly, on other days Google Translate rendered versetzt differently. It then gave the following translation of the first sentence, which is close to the original meaning: “A solution of 1.25 g (5.4 mmol) of ruthenocene in 14 ml of glacial acetic acid and 1.4 ml of phosphoric acid is treated dropwise at 20° with 1.2 ml of freshly distilled bis [dimethylamino] methane and the mixture for 8 h at 120° stirred in a nitrogen stream”. DeepL shows similar variation: “A solution of 1,25 g (5,4 mmol) ruthenocene in 14 ml glacial acetic acid and 1,4 ml phosphoric acid is mixed dropwise at 20° with 1,2 ml freshly distilled bis[dimethylamino]methane and the mixture is stirred for 8 h at 120° in a nitrogen stream.” Similarly, “on converted ruthenocene” can be understood from the context as “with regard to (i.e., based on) converted ruthenocene.” Care must be exercised with reagent additions, because machine translation may flip the direction of addition (“to” versus “from”).

Excerpt From Scientific Journals. The second part of the manuscript deals with research papers. The writing style of scientific articles differs significantly from that of Gmelin, Beilstein, and Landolt-Börnstein. A notable feature of such articles is the use of long sentences, sometimes called Bandwurmsätze (tapeworm sentences). Such sentences are stumbling blocks for language learners and machine translation alike. This writing style is characteristic of many landmark scientific papers, in both classical English and German works. Passages from Zeitschrift für Physik and Angewandte Chemie were chosen to illustrate the difference between the old and the new style.

Zeitschrift für Physik. This excerpt is from Tisza’s seminal work “Zur Deutung der Spektren mehratomiger Moleküle”24 (“On the Interpretation of the Spectra of Polyatomic Molecules”), in which he used symmetry arguments for infrared and Raman spectroscopy for the first time. Here we explore how a mathematical text with interspersed symbols is translated. This article, although influential in group theory, has not been translated into English. Its historical importance is that Tisza also made use of proper character tables of the kind still used in spectroscopy today.

[Quoted with permission from Tisza, L., Zur Deutung der Spektren mehratomiger Moleküle, Zeitschrift für Physik 1933, 82 (1−2), 48−72.] Es sind Moleküle, bei denen eine sogenannte zufällige Entartung auftritt, die nicht aus der Symmetrie, sondern daraus folgt, daß die Schrödinger Gleichung des Systems reell ist und zu einem einfachen der Gruppe G zufolge nicht entarteten Eigenwert eine komplexe Eigenfunktion gehört. Dann wird die konjugiert komplexe Funktion eine Eigenfunktion desselben Eigenwertes. Physikalisch ist zwischen der notwendigen und dieser zufälligen Entartung kein wesentlicher Unterschied. Im allgemeinen können wir die Gruppe derart erweitern, daß die Entartung dann als Folge der Symmetrie auftritt. Betrachten wir etwa C∞. Die einreihigen Darstellungen $e^{il\varphi}$, $e^{-il\varphi}$ sind miteinander zufällig entartet. Wenn wir die Gruppe mit der Spiegelung σv zu C∞v erweitern, tritt statt dieser eine zweireihige Darstellung mit dem Charakter auf.

$$e^{il\varphi} + e^{-il\varphi} = 2\cos l\varphi$$

Wir können also C∞v statt C∞ untersuchen, doch ist folgendes zu beachten: C∞v hat Darstellungen, die sich bloß darin unterscheiden, daß sie bei Anwendung von σv gerade bzw. ungerade sind. (Die Darstellungen A1 und A2 in §7,2). Als Darstellungen von C∞ muß man diese freilich identifizieren. Ein ähnliches Verhältnis ist zwischen Cp und Cpv bzw. T und Td (Tetraederdrehspiegelgruppe). Bei diesen ist es ebenfalls überflüssig, die ersteren zu untersuchen. Schließlich haben wir ebenfalls folgende Gruppen zu behandeln:
1. Inversion (Ci) und Spiegelung (Cs)
2. Zweidimensionale Drehspiegelgruppe (C∞v, D∞)
3. p-zahlige Achse
   a) p ungerade (Cpv, Dp)
   b) p gerade (Cpv, Dp, Spv)
4. Oktaeder- und Tetraedergruppe (O, Td)
5. Ikosaedergruppe (J)
Dementsprechend ist § 7 in fünf Punkte eingeteilt. Es kommen in jedem Punkte erst die Klassen der entsprechenden Gruppe mit den Charakteren von sämtlichen irreduziblen Darstellungen. Isomorphe Gruppen haben dieselben Darstellungen, wir behandeln sie stets parallel.

[Google Translate] They are molecules in which a so-called accidental degeneracy occurs, which does not result from symmetry, but from the fact that the Schrödinger equation of the system is real and that a complex eigenfunction belongs to a simple eigenvalue that does not degenerate according to group G. Then the conjugate complex function becomes an eigenfunction of the same eigenvalue. Physically, there is no essential difference between the necessary and this accidental degeneration. In general, we can extend the group so that the degeneracy occurs as a consequence of symmetry. Consider about C∞. The single-row representations $e^{il\varphi}$, $e^{-il\varphi}$ are degenerated with each other at random. If we extend the group with the reflection σv to C∞v, a double-row representation of the character occurs instead.

$$e^{il\varphi} + e^{-il\varphi} = 2\cos l\varphi$$

Thus, we can study C∞v instead of C∞, but note the following: C∞v has representations that differ only in that they are even or odd when using σv. (The illustrations A1 and A2 in §7,2). As representations of C∞ one must of course identify them. A similar relationship is between Cp and Cpv and T and Td (tetrahedral rotating mirror group). With these, it is also unnecessary to examine the former. Finally, we have to treat the following groups:
1. Inversion (Ci) and mirroring (Cs)
2. Two-dimensional rotating mirror group (C∞v, D∞)
3. p-number axis
   a) p odd (Cpv, Dp)
   b) p even (Cpv, Dp, Spv)
4. octahedral and tetrahedral group (O, Td)
5. Icosahedral group (J)
Accordingly, § 7 is divided into five points. It comes in each point first the classes of the corresponding group with the characters of all irreducible representations. Isomorphic groups have the same representations, we always treat them in parallel. [Accessed 22 Sept. 2018.]

[DeepL Translation] These are molecules in which a so-called random degeneration occurs, which does not result from symmetry, but from the fact that the Schrödinger equation


A Surface Spectroscopy Interpretation from Angewandte Chemie: Comparison of Machine versus Human Translation. The passage on surface chemistry was written by Bin Ren and coauthors, including Gerhard Ertl. The paper was first written in English and then translated into German by a native speaker (coauthor Bruno Pettinger).32 Both the English and German versions were submitted by the authors to Angewandte Chemie. The title of the paper is “Spitzenverstärkte Raman-Spektroskopie von Benzolthiol, adsorbiert an Au- und Pt-Einkristalloberflächen”, which means “Tip-Enhanced Raman Spectroscopy (TERS) of Benzenethiol Adsorbed on Au and Pt Single-Crystal Surfaces”.25 The main advantage of this example is that the published English version26 provides a benchmark for comparison. As described in Table 1, Angewandte Chemie holds at least a century of research papers from before 1962 that remain untranslated and inaccessible to many monolingual readers.

[Quoted with permission from Ren, B.; Picardi, G.; Pettinger, B.; Schuster, R.; Ertl, G., Spitzenverstärkte Raman-Spektroskopie von Benzolthiol, adsorbiert an Au- und Pt-Einkristalloberflächen, Angewandte Chemie 2005, 117 (1), 141−144.] Als Testsystem verwendeten wir eine selbstorganisierte Monoschicht von Benzolthiol auf Pt(110). Die TERS-Spektren sind in Abbildung 1 dargestellt. Wie Kurve (a) zeigt, wird kein Raman-Signal beobachtet, wenn die STM-Spitze 1 μm von der Oberfläche zurückgezogen oder vollständig entfernt wird. Bringt man jedoch die Au-Spitze in Tunnelposition mit dem Pt(110)-Substrat (d. h. in einen Abstand von ungefähr 1 nm über der Oberfläche), so können Spektren hoher Intensität und Qualität gemessen werden (Spektren (b)−f) in Abbildung 1). Die Raman-Intensität der stärksten Bande betrug ungefähr 20 cps und sank geringfügig im Laufe der Zeit. Das weist auf einen Zerfallsprozess hin, ähnlich dem Bleichprozess von Farbstoffmolekülen unter Beleuchtung. Jedoch sollte adsorbiertes Benzolthiol keine Absorptionsbande im Wellenlängenbereich des verwendeten Anregungslichtes aufweisen; deshalb muß die Intensitätsabnahme der sehr hohen elektromagnetischen Feldstärke zugeschrieben werden, die in der unmittelbaren Nähe der Tunnelspitze erzeugt wurde und vermutlich zur Photodesorption oder zum Photozerfall von Benzolthiol führte. Diese Schlussfolgerung wurde experimentell durch die Verringerung der Laserleistung um einen Faktor 10 und durch die Untersuchung eines unzerstörten Bereiches der Oberfläche bestätigt. Unter diesen Bedingungen blieb das Signal nahezu konstant; die drei stärksten Raman-Banden waren mit einem Signal/Rausch-Verhältnis von 3:1 klar erkennbar. Die Intensität der stärksten Bande betrug ungefähr 3.4 cps (nicht gezeigt). Im Vergleich zu der mit voller Laserleistung gemessenen Intensität von 24 cps zeigt dieses Ergebnis, dass bereits während der Aufnahme des ersten Spektrums unter voller Laserleistung ein Teil des adsorbierten Benzolthiols zersetzt wurde, was zu einer verringerten Signalintensität führte.

[DeepL Translation] As a test system we used a self-organized monolayer of benzene thiol on Pt(110). The TERS spectra are shown in Figure 1. As curve (a) shows, no Raman signal is observed when the STM tip is withdrawn 1 μm from the surface or completely removed. However, if the Au tip is brought into tunnel position with the Pt(110) substrate (i.e., at a distance of about 1 nm above the surface), high intensity and high quality spectra can be measured (spectra (b)-f) in Figure 1). The Raman intensity of the strongest bands was about 20 cps and decreased slightly over time. This indicates a decay process similar to the bleaching process of dye molecules under illumination. However, adsorbed benzenethiol should not have an absorption band in the wavelength range of the excitation light used; therefore, the decrease in intensity must be attributed to

of the system is real and that a complex eigenfunction belongs to a simple eigenvalue which is not degenerate according to group G. Then the conjugated complex function becomes a eigenfunction of the same eigenvalue. Then the conjugated complex function becomes an eigenfunction of the same eigenvalue (Note: This is repeated by the DeepL Translator). Physically, there is no significant difference between the necessary degeneration and this random degeneration. In general, we can extend the group so that the degeneration occurs as a result of symmetry. Let’s look at C∞ for example. The single-row representations $e^{il\varphi}$, $e^{-il\varphi}$ are randomly degenerate together. If we extend the group with the mirroring σv to C∞v, a two-row representation with the character appears instead of this.

$$e^{il\varphi} + e^{-il\varphi} = 2\cos l\varphi$$

So we can examine C∞v instead of C∞, but please note the following: C∞v has representations which only differ in the fact that they are even or odd when σv is used. (The representations A1 and A2 in §7,2). As representations of C∞ one must certainly identify these. A similar relation is between Cp and Cpv respectively T and Td (tetrahedron rotational mirror group). With these it is likewise superfluous to examine the former. Finally we have to treat the following groups:
1. inversion (Ci) and mirroring (Cs)
2. two-dimensional rotating mirror group (C∞v, D∞)
3. p-numbered axis
   (a) p odd (Cpv, Dp)
   (b) p even (Cpv, Dp, Spv)
4. octahedron and tetrahedron group (O, Td)
5. icosahedron group (J)
Accordingly, § 7 is divided into five points. In each point the classes of the corresponding group with the characters of all irreducible representations come first. Isomorphic groups have the same representations, we always treat them in parallel. [Accessed 22 Sept. 2018.]

As with Beilstein, the translation is readable, and readers will recognize the terminology used in modern group theory.31 The symbols had to be retyped in the translated version because they lost their formatting. As seen earlier, abbreviations should be written out for machine translation to proceed smoothly. The abbreviation bzw., another standard abbreviation, for beziehungsweise (= respectively), was therefore written in full before translation. Similarly, for Google’s “a double-row representation of the character occurs instead”, the rendering “...a double-row representation with the character takes its place instead” is more appropriate. DeepL takes care of these minor issues and shows better fluency. Minor problems remain; for example, “p-number axis” should be “p-fold axis”. However, the meaning of most sentences is clear, provided the user has an intuitive knowledge of the terminology.
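The hand expansion of abbreviations described above (Stdn., wäßr., i. Vak., and Schmp. in the Houben-Weyl procedure, and bzw. here) can be scripted before the text is pasted into a translator. The following is a minimal sketch using Python's standard `re` module. The dictionary holds only the expansions named in this article. The function name `normalize_for_mt` and the list of units are our own assumptions, and the sketch is not a feature of Google Translate or DeepL. It also converts decimal commas to points, an optional step prompted by DeepL keeping the European convention in the Houben-Weyl translation. To avoid corrupting locants such as 1,2-Dichlorethan or section numbers such as §7,2, a comma is converted only when a unit follows it.

```python
import re

# Expansions the authors applied by hand before translation (illustrative list only).
ABBREVIATIONS = {
    "Stdn.": "Stunden",
    "wäßr.": "wäßrig",
    "i. Vak.": "im Vakuum",
    "Schmp.": "Schmelzpunkt",
    "bzw.": "beziehungsweise",
}

# Decimal comma between digits, converted only when a unit follows (assumed unit list),
# so locants such as "1,2-Dichlorethan" and section numbers such as "§7,2" are kept.
DECIMAL_COMMA = re.compile(r"(\d),(\d+)(?=\s*(?:mmol|mol|mg|kg|ml|mL|g|l)\b|\s*[%°])")


def normalize_for_mt(text: str) -> str:
    """Expand known abbreviations and convert decimal commas before machine translation."""
    # Longest keys first, so that no abbreviation can pre-empt a longer one.
    for abbr in sorted(ABBREVIATIONS, key=len, reverse=True):
        # (?<!\w) prevents matches inside longer words.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), ABBREVIATIONS[abbr], text)
    return DECIMAL_COMMA.sub(r"\1.\2", text)


if __name__ == "__main__":
    print(normalize_for_mt("8 Stdn. bei 120° gerührt und i. Vak. eingedampft; Ausbeute: 1,09 g"))
```

A fixed dictionary cannot inflect, so "Die wäßr. Lösung" becomes "Die wäßrig Lösung" rather than the grammatical "wäßrige Lösung". Handling inflection would require morphological analysis, which is beyond this sketch.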
Another useful feature of Google Translate and DeepL is that they offer alternative translations, together with their frequency of usage (scientific words are often of low frequency). For instance, the sentence “Physikalisch ist zwischen der notwendigen und dieser zufälligen Entartung kein wesentlicher Unterschied.” was translated as “Physically, there is no essential difference between the necessary and this accidental degeneration”. Clicking on “Entartung” shows alternative meanings, which give the better translation “Physically, there is no essential difference between the inevitable and this accidental degeneracy”, which makes more sense. Similarly, DeepL renders “p gerade” as “p straight”; the correct alternative meaning, “p even”, can be chosen from the list of alternative definitions.
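The character that Tisza assigns to the two-row representation follows directly from Euler’s formula. The matrices below are our illustration, using a standard choice of basis that is not taken from Tisza’s paper. Under C∞, the one-row representations e^{±ilφ} are complex conjugates of each other, so a real Schrödinger equation makes them degenerate. Adding the reflection σv, which turns a rotation by φ into a rotation by −φ, swaps the two and joins them into one two-row representation of C∞v:

```latex
% One-row representations of C_\infty: \Gamma_{\pm l}(C_\varphi) = e^{\pm il\varphi}, l = 1, 2, ...
% In C_{\infty v}, \sigma_v C_\varphi \sigma_v = C_{-\varphi}, so \sigma_v swaps the two functions.
\begin{align}
D(C_\varphi) &= \begin{pmatrix} e^{il\varphi} & 0 \\ 0 & e^{-il\varphi} \end{pmatrix},
\qquad
D(\sigma_v) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \\
\chi(C_\varphi) &= \operatorname{tr} D(C_\varphi)
  = e^{il\varphi} + e^{-il\varphi}
  = (\cos l\varphi + i\sin l\varphi) + (\cos l\varphi - i\sin l\varphi)
  = 2\cos l\varphi, \\
\chi(\sigma_v) &= \operatorname{tr} D(\sigma_v) = 0 .
\end{align}
```

For l ≥ 1 this is the E_l row (Π, Δ, ...) of the C∞v character table. The degeneracy that C∞ can only call accidental thus becomes a consequence of the larger group, as the passage states.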


already been decomposed, leading to a decreased signal intensity. [Human Version.26] Because this passage is much more recent than the previous excerpts, the machine translations from Google and DeepL differ little in meaning from the published English version. The sentences correspond almost one-to-one, because they are short and have few dependent clauses. This style is very different from that of pre-1980s research articles. Machine translation of such modern texts is evidently excellent, and its future looks bright.
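The closing argument of the TERS passage is a proportionality estimate: 3.4 cps on a fresh patch at one-tenth of the laser power, compared with 24 cps in the first full-power spectrum. The minimal sketch below makes the arithmetic explicit. It assumes, and this is our assumption rather than a statement from the paper, that the Raman count rate scales linearly with excitation power. The 24 cps value is the one used in the passage's comparison sentence.

```python
def expected_full_power(cps_at_reduced_power: float, power_factor: float) -> float:
    """Scale a count rate back to full laser power.

    Assumption: the Raman signal grows linearly with excitation power.
    """
    return cps_at_reduced_power * power_factor


expected = expected_full_power(3.4, 10)   # intact surface, power reduced by a factor of 10
observed = 24.0                           # first spectrum recorded at full power
lost_fraction = 1.0 - observed / expected

if __name__ == "__main__":
    print(f"expected about {expected:.0f} cps, observed {observed:.0f} cps, "
          f"about {100 * lost_fraction:.0f}% lower")
```

Under the linearity assumption, an intact monolayer would give about 34 cps at full power. The observed 24 cps is roughly 30% lower, which is consistent with the passage's conclusion that part of the benzenethiol had already decomposed during the first spectrum.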



CONCLUSIONS

This work evaluated the power of machine translation for researchers, with an emphasis on German scientific literature, an untapped resource of information that is often rich in historical development. We translated German scientific texts spanning about one century, covering both traditional and recent writing. This study revealed that machine translation is a powerful, reliable, and instantaneous tool: not only short phrases but also long passages can be converted successfully into English. Google Translate may not be perfect, but it is certainly helpful for grasping the central idea of a given text. Both Google Translate and DeepL are constantly improving. Currently, machine translation comes close to human translation, and hopefully, in the near future, it will be fully equivalent to human analysis and translation. Older work, up to the 1990s, is typically available from publishers only as scanned PDFs, which cannot be used directly for machine translation, partly because of low-resolution printing; text in this form is also hard to copy and paste into a translator. Recent German literature, by contrast, can be copied and pasted directly into the translator and handled efficiently. Nevertheless, reader discretion remains imperative when translating any foreign-language text into English, particularly synthetic methods. In the future, neural machine translation may help convert all the resources listed in Table 1 into English.
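As a practical aid to the reader discretion urged above for synthetic methods, the German source can be scanned for the construction that reversed the order of addition in the Houben-Weyl example: A wird ... mit B versetzt, meaning that B is added to A. The regular expression below is our own minimal heuristic, not a feature of Google Translate or DeepL. It assumes that abbreviations ending in a period have already been written out, because periods, semicolons, and colons bound the search. It will miss other phrasings (for example, "gibt man ... zu") and cannot resolve complex clauses.

```python
import re

# Heuristic: "A wird ... mit B versetzt" (or plural "werden") means B is added to A.
# ".", ";" and ":" delimit the search, so expand abbreviations such as "Stdn." first.
PASSIVE_ADDITION = re.compile(
    r"(?P<a>[^.;:]+?)\s+(?:wird|werden)\s+[^.;:]*?\bmit\s+(?P<b>[^.;:]+?)\s+versetzt\b"
)


def flag_passive_additions(german_text: str) -> list:
    """Return (A, B) pairs for clauses of the form 'A wird ... mit B versetzt'."""
    return [(m.group("a").strip(), m.group("b").strip())
            for m in PASSIVE_ADDITION.finditer(german_text)]


if __name__ == "__main__":
    procedure = ("(Dimethylamino-methyl)-ruthenocen: Eine Lösung von 1,25 g (5,4 mmol) "
                 "Ruthenocen in 14 ml Eisessig und 1,4 ml Phosphorsäure wird bei 20° "
                 "tropfenweise mit 1,2 ml frisch destilliertem Bis-[dimethylamino]-methan "
                 "versetzt und die Mischung 8 Stunden bei 120° im Stickstoff-Strom gerührt.")
    for a, b in flag_passive_additions(procedure):
        print(f"CHECK ORDER: add [{b}] to [{a}]")
```

Each flagged pair can then be compared with the machine translation. If the English says that A is added to B, the direction has been inverted.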

Figure 1. [Original Text] Abbildung 1. TERS von Benzolthiol an Pt(110). Das Spektrum (a) wurde mit zurückgezogener Spitze aufgenommen. Während der Messung der Spektren (b−f) befand sich die Spitze im Tunnelmodus. Die Spektren (b−f) wurden aufeinander folgend gemessen. Laserleistung: 5 mW. [DeepL Translation] Figure 1. TERS of benzenethiol at Pt(110). Spectrum (a) was recorded with the tip retracted. During the measurement of the spectra (b-f) the peak was in tunnel mode. The spectra (b-f) were measured successively. Laser power: 5 mW. Adapted with permission from ref 25. Copyright 2005 Wiley.

the very high electromagnetic field strength generated in the immediate vicinity of the tunnel tip and presumably leading to photodesorption or photodecay of benzenethiol. This conclusion was confirmed experimentally by the reduction of the laser power by a factor of 10 and by the investigation of an undamaged area of the surface. Under these conditions, the signal remained nearly constant; the three strongest Raman bands were clearly visible with a signal-to-noise ratio of 3:1. The intensity of the strongest bands was approximately 3.4 cps (not shown). Compared to the intensity of 24 cps measured at full laser power, this result shows that part of the adsorbed benzene thiol was already decomposed during the recording of the first spectrum at full laser power, resulting in a reduced signal intensity. [Accessed 22 Sept. 2018.] [Quoted with permission from ref 26.] The test system we employed was a self-assembled monolayer of benzenethiol on Pt(110). The TERS spectra are shown in Figure 1. As spectrum (a) shows, no Raman signal can be detected if the STM tip is retracted by about 1 μm or if it is completely absent. When, however, the Au tip is moved into the tunneling position above the Pt(110) substrate (at a distance of about 1 nm above the surface), high-quality spectra can be obtained (spectra b−f in Figure 1). The Raman signal of the most intense peak is about 20 counts per second (cps) and decreases slightly with time, showing a decay process reminiscent of the bleaching of dye molecules upon illumination. However, adsorbed benzenethiol is not expected to have an appropriate absorption band in the wavelength region of the excitation laser we used. Thus, the intensity decay must be attributed to the very high electromagnetic field generated in the close vicinity of the tip apex, possibly leading to the photodesorption or photodecomposition of benzenethiol. 
This conclusion was confirmed by decreasing the laser power by a factor of ten and probing a new patch on the surface. Under these conditions the signal became fairly stable; the three most prominent bands are clearly visible with a signal-to-noise ratio of 3:1, and the intensity of the strongest band was then about 3.4 cps (not shown). If one considers the signal intensity of 24 cps found with full laser power, this indicates that even for the first spectrum, a part of the surface benzenethiol had



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.8b00534. Details on OCR and Google Translate texts not shown in the manuscript (DOCX)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected].
*E-mail: sonia.zulfi[email protected].

ORCID

M. Farooq Wahab: 0000-0003-4455-2184

Author Contributions

M.F.W. and S.Z. have contributed equally.

Notes

The authors are not affiliated with any language translating program or service. No funding was associated with this work. The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors would like to thank Alice Essenpreis of Springer Nature for providing permission to reproduce the Beilstein, Gmelin, and Landolt-Börnstein texts, and Dr. Toby Reeve, Senior Scientific Editor, Science of Synthesis, Georg Thieme Verlag, for providing permission to reproduce their German texts.



REFERENCES

(1) Clark, B. R. The Research Foundations of Graduate Education: Germany, Britain, France, United States, Japan; University of California Press: Berkeley, CA, 1993.
(2) Gordin, M. D. Scientific Babel: How Science Was Done Before and After Global English; The University of Chicago Press: Chicago, 2015.
(3) Ferguson, G.; Pérez-Llantada, C.; Plo, R. English as an international language of scientific publication: A study of attitudes. World Englishes 2011, 30, 41−59.
(4) Parkar, F. A.; Parkin, D. Comparison of Beilstein CrossFirePlus Reactions and the Selective Reaction Databases under ISIS. J. Chem. Inf. Comput. Sci. 1999, 39, 281−288.
(5) O’Sullivan, D. A. Gmelin Handbook Flourishes At 200th Birthday of Founder. Chem. Eng. News 1988, 66, 22−24.
(6) Shreve, J. T. F. a. R. N. Advanced Readings in Chemical and Technical German; John Wiley & Sons, Inc.: New York, 1940.
(7) Thayer, A. N. N. Elsevier Acquires Beilstein Database. Chem. Eng. News 2007, 85, 12.
(8) Case Study: CAS creates first English version of world’s oldest chemical journal with Iconic language technology. http://iconictranslation.com/case-studies/cas-chemzent/ (accessed 20 Sept. 2018).
(9) Beilstein Handbook of Organic Chemistry (University of Buffalo Library Guide). https://research.lib.buffalo.edu/beilstein (accessed 22 Sept. 2018).
(10) Gmelin’s Handbook of Inorganic Chemistry (University of Buffalo Library Guide). https://research.lib.buffalo.edu/gmelin (accessed 22 Sept. 2018).
(11) Lawson, A. J.; Swienty-Busch, J.; Géoui, T.; Evans, D. The Making of Reaxys: Towards Unobstructed Access to Relevant Chemistry Information. In The Future of the History of Chemical Information; American Chemical Society: 2014; Vol. 1164, pp 127−148.
(12) Rossini, F. D. Historical Background of Data Compiling Activities. J. Chem. Doc. 1967, 7, 2−6.
(13) Schwartz, L. The history and promise of machine translation. In Innovation and Expansion in Translation Process Research; John Benjamins Publishing Company, 2018; p 161.
(14) Johnson, M.; Schuster, M.; Le, Q. V.; Krikun, M.; Wu, Y.; Chen, Z.; Thorat, N.; Viégas, F.; Wattenberg, M.; Corrado, G. Google’s multilingual neural machine translation system: enabling zero-shot translation. 2016, arXiv:1611.04558. arXiv.org e-Print archive. https://arxiv.org/abs/1611.04558.
(15) Wu, Y.; Schuster, M.; Chen, Z.; Le, Q. V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K. Google’s neural machine translation system: Bridging the gap between human and machine translation. 2016, arXiv:1609.08144. arXiv.org e-Print archive. https://arxiv.org/abs/1609.08144.
(16) Sayle, R. Foreign Language Translation of Chemical Nomenclature by Computer. J. Chem. Inf. Model. 2009, 49, 519−530.
(17) Cooke-Fox, D. I.; Kirby, G. H.; Lord, M. R.; Rayner, J. D. Computer translation of IUPAC systematic organic chemical nomenclature. 5. Steroid nomenclature. J. Chem. Inf. Model. 1990, 30, 128−132.
(18) Stillwell, R. N. Computer Translation of Systematic Chemical Nomenclature to Structural Formulas-Steroids. J. Chem. Doc. 1973, 13, 107−109.
(19) Summers, L. Machine Translation of Russian Organic Chemical Names into English by Analysis and Resynthesis of The Component Fragments. J. Chem. Doc. 1962, 2, 83−86.
(20) Beilstein, F. Handbuch der Organischen Chemie, Dritte Umgearbeitete Auflage; Leopold Voss: Hamburg and Leipzig, 1897; Vol. 3.
(21) Bernhard Prager, P. J.; Paul Schmidt, D. S. Beilsteins Handbuch der Organischen Chemie; Julius Springer: Berlin, 1933; Vol. 16.
(22) Kotowski, E. H. E. P. u. A. Gmelin Handbuch der Anorganischen Chemie, W, Systeme mit Edelgasen, 8th ed.; Springer-Verlag: Berlin, 1978.
(23) Eucken, A. Landolt-Börnstein Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik und Technik, 6th ed.; Springer-Verlag GmbH: Berlin, 2013; softcover reprint of the hardcover 6th ed.
(24) Tisza, L. Zur Deutung der Spektren mehratomiger Moleküle. Eur. Phys. J. A 1933, 82, 48−72.
(25) Ren, B.; Picardi, G.; Pettinger, B.; Schuster, R.; Ertl, G. Spitzenverstärkte Raman-Spektroskopie von Benzolthiol, adsorbiert an Au- und Pt-Einkristalloberflächen. Angew. Chem. 2005, 117, 141−144.
(26) Ren, B.; Picardi, G.; Pettinger, B.; Schuster, R.; Ertl, G. Tip-enhanced Raman spectroscopy of benzenethiol adsorbed on Au and Pt single-crystal surfaces. Angew. Chem., Int. Ed. 2005, 44, 139−142.
(27) Frankfurt, B. I. Beilstein Dictionary: German to English for the Users of Beilstein Handbook of Organic Chemistry; Springer-Verlag: Berlin, 1984.
(28) Maizell, R. E. How to Find Chemical Information: A Guide for Practising Chemists, Teachers and Students; John Wiley & Sons: New York, 1979.
(29) Kong, X.-S.; Wang, S.; Wu, X.; You, Y.-W.; Liu, C.; Fang, Q.; Chen, J.-L.; Luo, G.-N. First-principles calculations of hydrogen solution and diffusion in tungsten: Temperature and defect-trapping effects. Acta Mater. 2015, 84, 426−435.
(30) Pauson, P. L. In Houben-Weyl Methoden der organischen Chemie, Organo-p-Metal Compounds; Georg Thieme Verlag: Stuttgart, 1986; Vol. E18, p 237.
(31) Cotton, F. A. Chemical Applications of Group Theory, 3rd ed.; John Wiley & Sons: USA, 1990.
(32) Wahab, F. M.; Ren, B. Personal Communication, 12 Sept. 2018.