
J. Phys. Chem. B 2008, 112, 10594–10602

Experimental Investigation of Information Processing under Irreversible Brownian Conditions: Work/Time Analysis of Paper Chromatograms

Daniel J. Graham,* Christopher Malarkey, and William Sevchuk

Department of Chemistry, Loyola University Chicago, Chicago, Illinois 60626

Received: December 20, 2007; Revised Manuscript Received: June 27, 2008

A wet laboratory study of chemical information processing in Brownian (random walk) environments is presented. Point samples of adsorbed dyes were subject to diffusion on paper chromatography sheets; the resulting images were recorded digitally using an office scanner. The experiments enabled the measurement of four types of information along with the attendant costs of work and time. The data are examined for their statistical distribution and scaling properties and Fourier spectral components. Whereas theory, calculations, and reversible pathways were central to the previous paper, the present study is devoted to experiments and irreversible transformations. Overall, the results establish several key points about information of four varieties purchased in real-time Brownian venues. The points concern the kinetics of information, memory effects, and the distributions of information changes with energy.

1. Introduction

For the past several years, we have explored a road that links information theory and chemistry. The wayside stops have examined (1) the base information in molecules apart from their structural attributes,1 (2) the information expressed by molecules in liquid environments, taking regional and atom/covalent bond structure into account,2 and (3) the information in classical thermodynamic transformations.3 The exploration has not occurred in a vacuum.
Information and chemical and biological structure/function, complexity, and algorithm problems have indeed powered vibrant fields for over three decades.4–12 In organic applications, the Shannon and Kullback measures have provided robust descriptors for molecules and natural products libraries.13–15 In analytical fields, information theory has provided optimization strategies for chemical component separations, data transfer, and the evaluation of mass spectral data.16–20 In thermodynamic territory, information has augmented the mathematical tools for describing both micro- and macroscopic systems.3,21,22 Information and communication theory have likewise shed light on molecular excited states and relaxation channels.23 Yet one is cognizant of three issues. First is that quantifying the information for a molecule in a Brownian environment or natural products database begins with a chemical structure graph.2,13–15 On the one hand, this does not diminish the insights furnished in quantitative structure/activity relationship (QSAR) investigations. On the other hand, certain limitations are imposed. The mass, dipole moment, etc., of 3-methyl-5-ene-cyclohexane-1-one can be accessed experimentally. By contrast, the information is quantified in Brownian venues by encoding collision sequences ...(C-C)(C-H)(C=O)... based on the graph (Figure 1, upper half).2 The information can be realized at higher resolution by incorporating the molecular orbital structure into the collision sequences.24 In all cases, the end results chart the message space for the molecule and reflect the function and activity. At the same time, the reflection is limited by the graph’s fidelity to chemical properties. As is well-appreciated, compounds that demonstrate resonance stabilization require more than one graph in order to represent the chemistry.25

* To whom correspondence should be addressed. E-mail: [email protected]. Phone: (773) 508-3169. Fax: (773) 508-3086.

Figure 1. General considerations of information, molecules, and thermodynamic systems. The upper half shows the structure graph of 3-methyl-5-ene-cyclohexane-1-one and a sequence of collisions in a Brownian environment. This molecule was extensively featured in the discussions of refs 2 and 24. The lower half illustrates two systems in schematic terms. For the system on the left, incremental work ∆W is supplied externally in time ∆t so as to purchase information ∆I. For the system on the right, incremental work ∆W derives internally and is lost by the system to render ∆I.

A second issue is that the acquisition of information is neither free nor instantaneous.21,22,26 Rather, for every trapping event, there exists an amount of work ∆W that is expended by, or on behalf of, a system during time ∆t so as to purchase information ∆I (Figure 1, lower half). These transactions are readily quantified for electronic logic gates and biopolymers.22,26–28 In small-molecule applications, however, the quantifying circumstances are severely limited to beams and free jet expansions.23,29 While crucial to chemistry, work/information price tags and the spending dynamics are not very accessible experimentally in

10.1021/jp711953r CCC: $40.75 © 2008 American Chemical Society. Published on Web 08/05/2008

ordinary systems, e.g., 3-methyl-5-ene-cyclohexane-1-one in ethanol at room temperature. The third issue concerns interdependence. In an environment such as liquid ethanol, information is communicated by dissolved 3-methyl-5-ene-cyclohexane-1-one through collisions involving the (C-C), (C=C), (C=O), and (C-H) units. A robust processor must have an Angstrom-scale sensing capacity in order to register each of these units accurately. A lower-resolution device would underestimate the message space size and/or chart the structure erroneously. Along the same lines, the term “information” does not enjoy a singular definition. Instead, there are several that have originated from Shannon, Kullback, Fisher, and others.30 Thus when 3-methyl-5-ene-cyclohexane-1-one is subject to information processing, the amount registered (Shannon, Kullback, etc.) depends on the programming details. The information expressed in Brownian environments is an important topic unto itself as discussed originally by Bennett.22 It is also crucial to several areas of chemistry, thus motivating studies beyond line/vertex graphs, collision models, and the reversible circumstances discussed in the previous paper. In particular, the thermodynamic transactions that relate work, time, and information in irreversible venues need to be examined by additional experiments. In the ideal, the experiments would delineate the interdependence of the source and processor as well as the type of information registered. With the above in mind, we have worked the past few years with systems that offer insights into information expressed and processed under Brownian conditions. To be sure, biopolymers and molecular beams already embody such systems. Our efforts, however, focused on testbeds that were macroscopic and very accessible as a consequence.
Several years ago we carried out (and reported in this journal) an image processing investigation of detergent foams inspired by the cellular pattern research of Stavans, Glazier, Weaire, and their co-workers.31–33 These samples were used to quantify fluid structure by way of spatial correlation functions grounded on nature’s potentials, not approximated ones. The experiments discussed in the present paper were pursued in much the same spirit. We report here a study of dyes subject to diffusion on paper chromatography sheets. The latter are familiar devices used for chemical assays and separations.34 For our purpose, the chromatograms served as real-time Brownian processors of information under irreversible conditions. However elementary, the data were grounded on benchtop experiments, with the transactions relating work, time, and information quantified. Detailed pictures were indeed realized for four general flavors of information. This paper is organized as follows. Technical details of the experiments are presented in section 2, whereas several results are illustrated in section 3. Notably, the data offer signature features of four information quantities. Each demonstrated its own response to expended work and time with noteworthy statistical and spectral features. 
As is well-appreciated, Brownian phenomena and their chemical applications have been studied intensely for more than a century.35,36 The information affiliated with reversible transformations has been discussed at length.3,20,21,25 The applications of information theory to chemical component separations and the formation of patterns under nonequilibrium conditions have been thoroughly reviewed.16,17,20,37 The stochastic theory of chromatography has also been extensively developed over the years by several researchers.38 Even so, prior to the experiments, we had only sparse ideas what to expect regarding the bit-wise information responses to real-time work expenditures compared with, say, an analysis of transistor logic circuits and magnetic storage media. In a

Brownian process, would different information measures demonstrate parallel behavior given identical expended time and work? Would the behavior be trivial, i.e., greater resource expenditures purchase greater information? Would a single rate law apply to all transactions? These were some of the questions during the investigation; to the best of our knowledge, such information/cost issues have not been addressed previously in the chemistry literature. Having completed the endeavor, we can pose answers and make four significant points (section 4) about information under Brownian conditions.

2. Paper Chromatograms as Brownian Processors

(A) Technical Details. All chemistry instrumentation (spectrometers, thermometers, etc.) connects work, time, and information in one fashion or another. Our study concentrated on systems that were the most quantitatively accessible, minimally polluting, and inherently statistical. The initial survey experiments involved gel electrophoresis where the forced random walks of protein mixtures were effected by electric field gradients. A combination of staining and digital recording techniques subsequently provided intensity distributions which were amenable to analysis. To this end, computer programs were developed that could query the pixel states affiliated with each staining pattern. These programs were modifications of the ones developed for the detergent foam studies.31 It was realized, however, that paper chromatograms afforded simpler devices with still less environmental impact. Here, the motion of one or more adsorbed dyes was brought about by a solvent such as water or alcohol. The solvent flow caused the dyes to spread through the stationary pore lattice and along the surface in a type of forced random walk. Information expressed by the trapped dye pattern was the natural byproduct as paid for by time and the free energy/loss of the solvent and paper.
Note the features in common between chromatographic and Brownian systems such as transcription enzymes interacting with biopolymers. The foregoing depend on the diffusive motion of solvated particles, multiple collisions and sites, and diverse spectra of transit times; a record tape is the end result following a loss of free energy. All of these phenomena are addressable in mathematical terms using stochastic methods.22,38 Chromatographic systems, paper or otherwise, are by no means singular as macroscopic Brownian processors. Yet they may be the most prevalent ones in chemical investigations. As early as 1959, the techniques were noted to play a role in fifty percent of the research reported in the Journal of Biological Chemistry.39 In paper chromatography, the image information is obtained regardless of the liquid flow direction and medium orientation: the stationary phase absorbs the solvent and the dyes spread. During this project, several configurations were investigated: radial, horizontal, vertical, and diagonal. When the flow was horizontal, absorption of liquid by the paper was the sole work source. When the flow was downward vertical, paper wetting plus the loss of solvent gravitational energy contributed work. Yet when the solvent flow opposed the force of gravity, the work costs (lower limits) were most readily quantified by the mass and height of lifted material. Such was the case regardless of the paper structure and solvent identity. In all configurations, the effects of heat and aggregation of dye molecules were insignificant. The dynamics were addressable simply using a laboratory stopwatch and hand-timing. The capillary rise of a liquid through paper action was addressed extensively in the 1950s by several researchers.40

Figure 2. Chromatograms and solvent motion. The upper panel schematically illustrates a vertically mounted rectangular paper strip with the labeled quantities discussed in the text. The lower panel illustrates the time dependence of $L_y$ measured for distilled water at room temperature and Whatman 1 chromatography paper. The best-fit line yields a slope equivalent to the time exponent 0.417 along with a coefficient of determination $R^2$ of 0.997.

For our part, a handle on the work costs is provided as follows. Let $\rho$ denote the mass of solvent (e.g., water, ethanol, etc.) absorbed per unit area of a uniformly thick, vertically mounted paper strip. An infinitesimal quantity absorbed by the paper is thus $\rho \, dx \, dy$ (Figure 2, upper half). This quantity can be raised to a height $y$ at an incremental work cost $dW$ given by:

$$dW = \rho g y \, dy \, dx \qquad (1)$$

where $g$ is the acceleration due to gravity. Let $L_x$ denote the width of the paper and $L_y$ represent the height of the solvent front. The energy required to lift $\rho L_y L_x$ quantity of solvent is the integral of eq 1

$$W = g\rho \int_0^{L_x} dx \int_0^{L_y} y \, dy = \tfrac{1}{2} g \rho L_x L_y^2 \qquad (2)$$

For uniformity, a single size and brand of chromatography paper were utilized in the experiments illustrated here, namely Whatman 1 with dimensions 7.5 cm × 15 cm × 0.016 cm. Thus $L_x$ and $\rho$ were held constant in experimental runs with variable $L_y$; for each run, the work scaled simply as the square of $L_y$. Stop-flow effects were brought about by separating the paper from the solvent reservoir at times ranging from several seconds to tens of minutes. The solvent motion was diffusive in a manner approximated by scaling considerations. Let the front position $L_y$ of solvent with total mass $M$ and diffusion coefficient $D$ scale with time $t$ as

$$L_y \propto (Dt)^{1/2} \qquad (3)$$

$D$ is limited by the thermal speed $v$ that in turn is limited by $M$

$$D \propto v \propto (1/M)^{1/2} \qquad (4)$$

For a rectangular paper strip, the mass of absorbed solvent increases linearly with $L_y$

$$M \propto L_y \qquad (5)$$

By combining eqs 3–5, one arrives at

$$L_y \propto \left( (1/L_y)^{1/2} \, t \right)^{1/2} \qquad (6)$$

The above can be rearranged, since $L_y^{5/4} \propto t^{1/2}$, to yield

$$L_y \propto t^{2/5} = t^{0.40} \qquad (7)$$
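As a rough numerical illustration of eqs 2 and 7, the following Python sketch evaluates the lower-limit lifting work for a given front height and recovers a time exponent from a log-log least-squares fit. It is not the authors' analysis code. The mass per unit area (`AREAL_MASS`), the choice of the 7.5 cm edge as the strip width, and the synthetic front positions are all assumptions introduced here for illustration only.

```python
import math

G = 9.81  # acceleration due to gravity, m/s^2


def lifting_work(areal_mass, width, height, g=G):
    """Lower-limit work (J) to lift the absorbed solvent, following eq 2.

    areal_mass: solvent mass absorbed per unit paper area (kg/m^2)
    width, height: strip width L_x and front height L_y (m)
    """
    return 0.5 * g * areal_mass * width * height ** 2


def power_law_exponent(times, heights):
    """Least-squares slope of ln(height) versus ln(time)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(h) for h in heights]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical inputs, chosen only for illustration.
AREAL_MASS = 0.19  # kg/m^2, assumed value; not reported in the text
WIDTH = 0.075      # m, assuming the 7.5 cm edge of the sheet is L_x

if __name__ == "__main__":
    for height_cm in (2, 4, 8):
        work = lifting_work(AREAL_MASS, WIDTH, height_cm / 100)
        print(f"L_y = {height_cm} cm -> W = {work * 1e3:.3f} mJ")

    # Synthetic front positions that follow eq 7 exactly (not measured data).
    times = [10, 30, 100, 300, 1000]
    heights = [t ** 0.4 for t in times]
    print(f"fitted exponent = {power_law_exponent(times, heights):.3f}")
```

Doubling the front height quadruples the work, which is the quadratic scaling noted for eq 2. With real stopwatch data in place of the synthetic list, the fitted slope would be the quantity to compare against the exponent 0.40 of eq 7.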

The lower panel of Figure 2 illustrates the measured time dependence of $L_y$ for distilled water lifted through the Whatman 1 paper at room temperature. We found the time exponent experimentally to be 0.417 ($R^2 = 0.997$), in good agreement with the scaling considerations. The work of previous researchers identifies several sources of deviations from eq 7. These would include the liquid surface tension, pore structure of the stationary phase, surface roughness, and evaporation effects.40 Even so, the scaling ideas offer a ready and simple portrayal of the kinetics.

(B) Image Processing and Analysis. Several types of experiments were carried out employing different solvents, dyes, and initial conditions. After considerable exploratory labor, we concentrated on the patterns evolved from point sources adsorbed 2.5 cm from one edge of each paper strip. The sources were 1 mm² in area and composed of India ink or washable marker dyes. These dyes were subject to the diffusing effects of distilled water lifted through each paper support by capillary action. Every experimental run yielded a set of dye streak patterns: for each pattern, there corresponded a measured time and energy cost of acquisition. Every sample was allowed to dry thoroughly. The streak patterns, unlike the gel electrophoresis data, required no developer for visualization; they were subsequently digitized using a Hewlett-Packard desktop scanner, as in the detergent foam experiments of previous years.31 All image data were archived as TIFF files. Optical scanners offer variable resolution, brightness, and contrast. We aimed for TIFF records that were most faithful to the dye patterns as perceived visually. After identifying the optimum parameters, we employed a uniform resolution, brightness, and contrast for all experimental runs. The information posed by each chromatogram sample obviously depended not only on the dye material and solvent, but also on the method of image query/recording.
For our experiments, one query method focused only on the optical densities: the set of grayscale intensities posed by every streak pattern subject to effective averaging of the reflected primary colors. The second method accommodated the full-color states representing a pattern. By the former method, every intensity value corresponded to a single 8-bit number, e.g., 10010110 in binary, 150 in decimal; every pixel thus posed $2^8 = 256$ possible states. The full-color records were more extended. Here every pixel corresponded to an ordered triplet of 8-bit numbers, one for each of the primary colors. Accordingly every pixel in a streak pattern afforded $(2^8)^3 = 2^{24} \approx 1.68 \times 10^7$ possible states. The query logistics are represented schematically in Figure 3. The upper panel shows the differences between grayscale and full-color states, the random variables whose numerical values followed from binary-to-decimal conversions. The full-color states obviously rendered more information than grayscale: the latter-type pixel with value 178, for example, does not provide any specifics regarding the individual color contributions.

Figure 3. Analysis logistics and population distributions. The upper panel schematically illustrates a portion of a chromatogram image. In grayscale analyses, a single 8-bit number corresponds to every pixel following effective density averaging of the primary colors. In full-color analyses, each pixel corresponds to an ordered triplet of 8-bit numbers, one for each primary color. The lower panel shows a sample grayscale population distribution along with the work and time costs.

Each dye streak pattern was represented by a maximum of $60 \times 400 = 24\,000$ pixels. This arrangement offered as many as $60 \times 400 \times 8 \approx 1.9 \times 10^5$ bits that required querying in grayscale analyses; $1.9 \times 10^5 \times 3 = 5.7 \times 10^5$ bits in full-color analyses. The spatial resolution was fixed so that every pixel corresponded to 0.115 mm² of chromatogram surface area. The analyses proved simple for both grayscale and full-color records. Even so, the number of possible samples $N$ for both modes was quite large. For the grayscale records, $N \approx 256^{24\,000} \approx 10^{57\,800}$; $N \approx 16\,800\,000^{24\,000} \approx 10^{173\,400}$ for the full-color. A blank chromatogram (or the portions untouched by the mobile phase) offers zero information because all of the corresponding pixel states are identical. Matters are different when a dye is adsorbed and indeed subject to diffusion through the pores and along the surface. Here the representative pixel states below the front position demonstrate a nontrivial population distribution such as the grayscale one illustrated in the lower panel of Figure 3. Note that the time and work (inset quantities) pertain to the elution pattern as a whole and not to any particular region underneath the solvent front. It was via such distributions that all of the information quantities were measured. We note that the full-color population distributions were four dimensional: one dimension for each primary color plus the state population number. For each experimental run, it was sufficient to characterize 15 images. For each image, the fluctuations in the state populations were estimated by assembling several distributions based on randomly selected pixels. This bootstrap method enabled averages and standard deviations to be estimated for all of the information quantities. Figure 4 illustrates typical streak


Figure 4. Examples of dye streak patterns. The upper images are derived from point sources of water-soluble black dye. The lower set is derived from point sources of water-soluble red dye. The longest streaks correspond to ca. 8 cm in length.

patterns recorded in the experiments. The image complexities were governed by dye diffusion along and lateral to the direction of solvent flow. All information quantities were available following digitization of the images.

(C) Information Analysis. Four types of information were examined. The first was based on the occurrence frequencies $f_i$ of each of the 256 possible grayscale states. Bit-values for the Shannon information $I$ were quantified for each TIFF file using the following formula:

$$I_{\text{grayscale}} = -\sum_{i=0}^{255} f_i \log_2 f_i = -K \sum_{i=0}^{255} f_i \ln f_i \qquad (8)$$

where $K = 1/\ln(2) \approx 1.44$ and $\sum_i f_i = 1$. In so doing, $I_{\text{grayscale}}$ quantified the number of bits required for efficient coding of the optical density states, given a distribution such as illustrated in Figure 3. If each possible state was expressed with identical frequency 1/256, $I_{\text{grayscale}}$ would equal $\log_2 256 = \log_2 2^8 = 8$ bits. Equation 8 was extended to the full-color analyses. Here, the occurrence frequencies of $256^3$ possible states were tabulated. A second value of the Shannon information, $I_{\text{full color}}$, was thus measured for each sample. Here, if every possible state was expressed with equal frequency, $I_{\text{full color}}$ would equate with $\log_2 256^3 = \log_2 2^{24} = 24$ bits. The third type of information also derived from full-color analyses. The individual frequencies $f_i$, $f_j$, $f_k$ of the primary color states were quantified along with the joint frequencies $f_{ijk}$. These quantities together yielded the mutual information MI:

$$MI = +\sum_{ijk} f_{ijk} \log_2\left(\frac{f_{ijk}}{f_i f_j f_k}\right) = K \sum_{ijk} f_{ijk} \ln\left(\frac{f_{ijk}}{f_i f_j f_k}\right) \qquad (9)$$
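Equations 8 and 9 can be sketched in a few lines of Python. The block below is an illustrative reading of the formulas, not the programs used in the study: pixel states are assumed to arrive as plain Python lists (integers for grayscale, `(r, g, b)` tuples for full color), and the function names are hypothetical. The resampling helper is one possible interpretation of the bootstrap step described in the text; drawing pixels with replacement is an assumption.

```python
import math
import random
import statistics
from collections import Counter


def shannon_bits(states):
    """Eq 8: I = -sum_i f_i log2 f_i over the observed states.

    Works for grayscale values (ints) and full-color triplets (tuples).
    """
    counts = Counter(states)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information_bits(pixels):
    """Eq 9: sum_ijk f_ijk log2(f_ijk / (f_i f_j f_k)) for (r, g, b) tuples."""
    n = len(pixels)
    joint = Counter(pixels)
    red = Counter(p[0] for p in pixels)
    green = Counter(p[1] for p in pixels)
    blue = Counter(p[2] for p in pixels)
    total = 0.0
    for (r, g, b), count in joint.items():
        f_ijk = count / n
        f_product = (red[r] / n) * (green[g] / n) * (blue[b] / n)
        total += f_ijk * math.log2(f_ijk / f_product)
    return total


def bootstrap_bits(states, n_resamples=20, seed=0):
    """Mean and standard deviation of eq 8 over random pixel resamples.

    Assumption: pixels are drawn with replacement, same size as the input.
    """
    rng = random.Random(seed)
    values = [
        shannon_bits(rng.choices(states, k=len(states)))
        for _ in range(n_resamples)
    ]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    uniform_gray = list(range(256))  # every grayscale state once
    print(shannon_bits(uniform_gray))
    correlated = [(v, v, v) for v in (0, 85, 170, 255)]
    print(shannon_bits(correlated), mutual_information_bits(correlated))
```

A uniform grayscale population reproduces the 8-bit ceiling quoted for eq 8, while a single repeated state (a blank sheet) yields zero. Pixels whose three color channels always coincide give nonzero MI, and channels that vary independently give zero.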

For each chromatogram, MI identified the number of extra bits required for coding, if it was assumed, correctly or not, that the primary color states were uncorrelated with each other. For a


Figure 5. Shannon information as a function of spent work. The upper panel shows data for $I_{\text{grayscale}}$ for multicomponent dye patterns. Arrows point to transitions in which $I_{\text{grayscale}}$ was enhanced, diminished, or simply maintained in value. The lower panel shows $I_{\text{full color}}$ for the same patterns. For both panels, the error bars are smaller than the symbol widths.

random distribution of states $f_{ijk} = f_i f_j f_k$; such an image would render zero bits of MI.41 Nonzero MI quantifies the price for making erroneous assumptions. The fourth type of information resembled MI but was based on grayscale records. The Kullback information KI identified the number of extra bits required for coding a distribution $\{f_i\}$, if it was assumed, correctly or otherwise, that the distribution was detailed by another $\{q_i\}$:30

$$KI = \sum_i f_i \log_2(f_i/q_i) = K \sum_i f_i \ln(f_i/q_i) \qquad (10)$$
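A minimal Python sketch of eq 10 follows, under the assumption that both distributions are supplied as equal-length lists of normalized frequencies over the same ordered set of states. The function name is hypothetical, and the handling of empty bins (terms with $f_i = 0$ contribute nothing; any $f_i > 0$ paired with $q_i = 0$ gives infinity) follows the usual convention for this quantity rather than anything stated about the original programs.

```python
import math


def kullback_bits(f, q):
    """Eq 10: KI = sum_i f_i log2(f_i / q_i), in bits.

    f: observed frequencies; q: reference frequencies (same state order).
    """
    total = 0.0
    for f_i, q_i in zip(f, q):
        if f_i == 0:
            continue  # an unpopulated state adds no bits
        if q_i == 0:
            return math.inf  # populated state that the reference excludes
        total += f_i * math.log2(f_i / q_i)
    return total


if __name__ == "__main__":
    print(kullback_bits([0.5, 0.5], [0.5, 0.5]))  # matched distributions
    print(kullback_bits([1.0, 0.0], [0.5, 0.5]))  # sharper than reference
    print(kullback_bits([0.5, 0.5], [1.0, 0.0]))  # mismatched support
```

Note that the measure is not symmetric: exchanging the roles of the observed and reference distributions generally changes the result, which is why the choice of reference sample matters.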

Clearly identical $\{f_i\}$ and $\{q_i\}$ express zero bits of KI. Importantly, population distributions that are peculiarly mismatched, e.g., $f_i > 0$, $q_i = 0$, yield KI as high as infinity. As with MI, KI connects a price in bits with distribution assumptions tendered.

3. Results

Typical results for $I_{\text{grayscale}}$ and its work dependence are presented in the upper panel of Figure 5. For this experiment, the information source was a water-soluble black dye having red, blue, and yellow components (cf. Figure 4, upper half). The samples were subject to work expenditures as high as ∼0.70 mJ. $I_{\text{grayscale}}$ was found to increase with expended work, but not in a simple monotonic way. Arrows point to one example each where $I_{\text{grayscale}}$ was enhanced, diminished, or simply maintained by spending additional work. In constructing Figure 5, error bars were included so as to mark the $I_{\text{grayscale}}$ averages plus/minus one standard deviation (1σ). The bars are observed to be less than the size of the symbols. This proved to be a consistent

Figure 6. Mutual and Kullback information as a function of spent work. The data are derived from the patterns used to construct Figure 5. For the Kullback information, the reference distribution derived from the right-most, maximum-work sample. For MI, the error bars prove smaller than the symbol widths; this is not the case for KI.

feature of all the experiments: the $I_{\text{grayscale}}$ disparities among different pixel sets were relatively slight for any given image. The lower panel in Figure 5 shows $I_{\text{full color}}$ derived from the same experimental run. One observes $I_{\text{full color}}$ to increase with expended work, but not in a uniform way, or simply in a way that mirrors $I_{\text{grayscale}}$. Work expenditures