A/C
Interface
ELECTRONIC LAB NOTEBOOKS
A Shareable Resource
Electronic notebooks can improve the discovery process, turning ideas into products in a shorter time

Raymond E. Dessy
Virginia Polytechnic Institute and State University

Analytical Chemistry, July 1, 1995, 428 A
0003-2700/95/0367-428A/$09.00/0 © 1995 American Chemical Society

Every scientist creates a special form of diary: the scientific notebook. It is a log of the day-to-day experiments performed, as well as a partial interpretation of their results. These diaries are rarely published, and they are usually read only by colleagues who witness the results. Instead, reports containing parts of the log are distributed, and papers containing detailed interpretations of the reports are published. Although verbal disclosures of its contents are common, the notebook remains private.

All of these reports, papers, and oral presentations constitute communication. But, in an information sense, these forms of exchange are filtered, bandwidth limited, and subject to noise. The reports contain only a fraction of what is in the original log, and papers reflect a personal interpretive flavor. Oral presentations may be less connected to fact, are ephemeral, and can be affected by the fluency of the speaker and the attention of the listener. The original notebooks are often difficult to follow because of arcane notations or incompleteness. Even the flow of the research is difficult to track because many activities are usually intertwined. As a result, sharing our most personal scientific document is not straightforward.

There are many reasons to improve this process. From the corporate perspective, these include accelerating the development cycle by improved group communication, preserving scientific data and information, and supporting possible patent litigation. From a creative perspective, we often arrange face-to-face meetings to communicate results, concepts, and problems to individuals who have different acquisition rates, modes of thinking, and verbal communication habits. These meetings are vital, yet they are often inefficient.

Most scientists use electronic services such as Dialog, CAS, and STN to search older literature. A smaller subset uses individualized search profiling to electronically access current journals, but few use authorized access to collaborators' research notebooks. Why shouldn't our research notebooks become a shareable resource? Scientific electronic notebooks that offer many new, interesting features are now available (2). In this article we will examine the need to change our concept of a scientific notebook, how these notebooks can be used to improve the discovery process, and what work habits need to be altered to gain these benefits.

Electronic formats

Wouldn't it be desirable to have a way to share portions of our lab work with others electronically, and to permit controlled-access searching across many lab notebooks for scientific or legal purposes?

Object-oriented databases. A research notebook is a collection of various objects: written records of experimental design, drawings of apparatus and chemical structures, circuit schematics, images obtained from various photographic records, graphs of results, and tables of data. Clearly the paper notebook can be replaced by an object-oriented database (OODB) that could handle these various formats electronically. An OODB treats all elements as objects with associated characteristics. They are usually manipulated by the use of simple icons that represent them pictorially.

The power of OODBs, in contrast to more traditional hierarchical databases or relational databases, is that the characteristics of an object are easily inherited by newly spawned objects, opening up easy expansion to novice users. The concept also reduces the development costs of vendors and maintenance time of on-site programmers.

Today, some commercial electronic lab notebook products use this approach, providing only a good graphic user interface (GUI) for an OODB to coordinate the presentation and juxtaposition of data objects created by other vendors' programs. Other commercial products have many specialized creation elements built in. Either approach permits the user to electronically navigate through a multidimensional space of data and information entered into the notebook.

Hyperspace navigation: the thread that binds. This developing technology offers an exciting prospect: a research record that is not limited to the linear chronological and physical format of paper-based notebooks. What follows is a simple extrapolation from existing products and technology. All of the concepts have been demonstrated, and many are operational in various environments. Let's look at the near future.
It is a simple matter to create multiple threaded views (MTVs) that logically link related elements into a virtual subnotebook for improved continuity and clarity. For example, most scientists work on many problems in parallel. One must manually page through a paper notebook to follow the progress line. It is possible to write forward and backward page pointers (a linked list) within individual experiments in a paper notebook to aid the worker and other readers, but this is seldom done. After all, even a complete and up-to-date index is uncommon. In electronic format, autoindexing among related experiments is simple. By using overlapping windows and icons, it is possible to establish linked lists that can then be stored as virtual subnotebook objects. As just another object in the OODB, each can be viewed, searched, and manipulated. Separate projects are viewed as discrete, logically contiguous sequences of experiments, excluding unconnected work. The timeline can even be warped (while the original experimental time/date stamps are retained) to provide a smoother flow. This is most useful when later experiments reveal the explanation for earlier experiments.

Automatic indexing on keywords and full-text searching are possible. This makes it easier to find results. Existing software permits a user to file material with related items automatically and to use either keywords or a trial document as a search template. These approaches have been used as dynamic databases (dynabases) for both individual and group work (2,3). Literature sources and experiments can now be easily merged.
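The threaded view, page pointers, and keyword index described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Entry` record, its field names, and the sample notebook are hypothetical and do not come from any product mentioned in this article.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """One notebook entry; all field names are illustrative only."""
    page: int
    day: date
    project: str
    text: str
    keywords: set = field(default_factory=set)


def threaded_view(notebook, project):
    """Return one project's entries in date order, skipping unrelated work."""
    return sorted((e for e in notebook if e.project == project),
                  key=lambda e: e.day)


def page_pointers(view):
    """Backward/forward page pointers (a linked list) for a threaded view."""
    pages = [e.page for e in view]
    return {
        p: (pages[i - 1] if i > 0 else None,
            pages[i + 1] if i + 1 < len(pages) else None)
        for i, p in enumerate(pages)
    }


def keyword_index(notebook):
    """Map each keyword to the pages on which it appears."""
    index = {}
    for entry in notebook:
        for word in entry.keywords:
            index.setdefault(word, []).append(entry.page)
    return index


# Two projects interleaved in one chronological notebook (made-up data).
notebook = [
    Entry(1, date(1995, 3, 1), "sensor", "Coated electrode A", {"electrode"}),
    Entry(2, date(1995, 3, 1), "hplc", "Column test", {"column"}),
    Entry(3, date(1995, 3, 2), "sensor", "Electrode A drifts", {"electrode", "drift"}),
    Entry(4, date(1995, 3, 3), "hplc", "Gradient run", {"gradient", "column"}),
]

sensor_thread = threaded_view(notebook, "sensor")
print([e.page for e in sensor_thread])      # [1, 3]
print(page_pointers(sensor_thread))         # {1: (None, 3), 3: (1, None)}
print(keyword_index(notebook)["column"])    # [2, 4]
```

The original entries and their date stamps are never modified; each view is a separate, derived object, which is the property the text relies on when it speaks of virtual subnotebooks.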
Client/server sharing. Client/server versions of this type of software permit the author to share authorized portions of such a notebook with other individuals who can help in interpretation of the data or who might find the work valuable in solving their own problems. The material can be digested at the reader's own pace, and interactive queries can be handled electronically. If groups are involved, a virtual meeting can minimize the confrontations, dominant personalities, and different cognition rates associated with live gatherings.

Most researchers do not expect large-group team efforts to be the new creative paradigm in science. However, the expectation is that the networking of notebook and other materials will facilitate small-group interactions. Successful examples of this approach in medical research and the clinical area already exist.

This approach encourages collaborative seeding of creativity. Colleagues can create their own personal threaded view (PTV) from a variety of workers and sources, selecting those experiments and reports that are important to their mission. They can then page through their own sequence to discern patterns, ideas, explanations, and anomalies. Another ideal use of this new medium is to facilitate transfer of responsibility from one team leader to another. All of the experimental data can be made available in an organized format.

In a client/server configuration "your" pages continue to reside in your private notebook. They can only be used by "others" with proper access; "your" copy remains the accessible compound document, allowing updates and changes visible to your privileged colleagues.

Change
Such approaches will work effectively only if the new form of notebook is kept in a way that promotes understanding by other persons. This requires a consistent, logical, and complete presentation of all work. Carefully crafted abstracts are necessary, and an annotated dictionary of terms is essential. Some degree of standardization in format is required, and adherence to a common data format, such as the ADISS/AIM standard, would be helpful (4). However, the new electronic journal
needs to be much more if it is to serve individual, group, and corporate needs in unique ways.

A journal model. In this vein, scientists can take lessons from the psychologist Ira Progoff, who has proposed a tool called "journaling" (5). This approach, which should not be confused with what a computer programmer calls journaling, asks that individuals commit to paper various logs of their activity to more fully understand what they are doing and why. Journaling involves a daily log that is akin to our scientist's paper notebook, as well as a period log that involves longer time spans, an intersection log that involves critical decision points as well as paths taken and not taken, and periodic time-expansion logs that re-examine the antecedents and potential consequences of actions taken and not taken. The reader of a Progoff journal is, of course, the author, along with others who are participating in what is called a "journal workshop." This approach has applicability to our lab work. Clear articulation of action, motivation, and result helps clarify a person's effort.

Our own journal. First experiments are seldom clean, clear, or completely explicable. False starts are common and misinterpretations are rife. Points or intersections are reached at which one must decide future research directions. Paper notebooks seldom explain why decisions were made, and if they do, only the selected path is developed in detail. The paths not taken are not described. This makes it difficult to backtrack, particularly for other scientists taking up the work. Any feedback in the project's flow that connects prior work with current work is usually logged in the mind of the worker, not in the notebook. Long-term (period) analysis is usually lacking, and attempts to put milestones in place are uncommon. In fact, most paper notebooks are organizationally challenged. An electronic journal can help the user create a feedback log that explicitly shows connections and makes use of the other logs simple.

Scientific electronic accounts

Another type of electronic document would require somewhat more time to create, but its utility would compound exponentially. Because the word "notebook" seems confining and the word "journal" has specific meanings in the scientific literature, we will call this product a scientific electronic account (SEA). It would provide a legacy to other workers and to the organization. In a sense, the SEA creates a history of an experimental effort; it is useful as the work is being done and essential as it is viewed in retrospect.

The threaded views suggested above are interactive formats. With multimedia becoming common, there is good reason to expect that SEAs will eventually incorporate not just text and drawings, but moving images and visualization objects, as well as narrative and sound. We've never done these things in paper notebooks because of the restrictions of that medium. At least one commercial product has demonstrated the utility of narrative transmission in the highly litigious clinical environment. Intel demonstrated video-supported problem solving, coupled with CD-ROM database access and video conferencing, at the fall 1991 Comdex (6). Intel's Indeo video compression allows personal conferencing over ISDN lines and makes image storage possible (7). Kouzes and Schur have demonstrated scientific uses in their elegant development of the Collaboratory paradigm, a collaborating lab environment (8).

Acceptability. SEAs can be easily stored on optical drives, where data are recorded by laser ablative removal of material from the disk's surface, or by laser degradation of a polymer film beneath a reflective surface. Such recording does not have the impermanence of magnetic recording.

In addition, the data on such an optical disk are encrypted in various ways; usually an 8- to 14-bit run-length-limited (RLL)
binary code conversion is used, and the original logically adjacent bytes are physically separated (interleaved) on the disk. This process occurs in addition to the normal sector interleaving always used in recording to rotating media. The material entered is time and date stamped, and finally each block of data has a companion redundant section that is used for error detection and recovery, similar to the technique used in CD-ROMs. This redundant section involves generating a bit pattern from the original data file using a nonbinary polynomial cyclic redundancy check (CRC) algorithm. The approach is called Reed-Solomon code (9). It is one of many elements that make such recordings secure and acceptable.

Because this point is often one of contention, let's examine how it all works. Most readers have encountered the abbreviation CRC in their computer work. The CRC is usually a 16-bit section appended to magnetic disk files. As each byte is written to disk, it is also folded into a running remainder: the data stream is divided by a binary polynomial (one of degree 16 whose coefficients are 1 or 0), and simple logic operations (shifts and exclusive-ORs) update the remainder as each succeeding datum in the file is sequentially processed. The final 16-bit remainder is called the CRC. If you read such a data file, calculate your own CRC on the fly, and then compare it with the redundant CRC stored on disk, a match indicates that the "read" has been successful. In practice this can detect (but not correct) burst errors of up to 16 bits.

Reed-Solomon approaches create a redundant record that can be more than 25% the size of the original data file. Imagine the original data file as a travel expense spreadsheet, with sums at the borders of each column and row and a check sum-of-sums at the bottom right (cross additions). Cross-interleaved Reed-Solomon (CIRC) encoding can both detect and correct singular and burst read errors, providing hard error rates far better than any magnetic disk in laboratory use.
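The difference between detecting and correcting an error can be illustrated with a short sketch. Two caveats apply. First, the article does not say which 16-bit polynomial is used; the polynomial 0x1021 and start value 0xFFFF below are an assumption (a common CCITT choice). Second, the row-and-column repair is only the travel-expense analogy from the text, worked with ordinary addition; it is not Reed-Solomon coding, which does its arithmetic over a finite field. All function names are hypothetical.

```python
def crc16(data: bytes, poly: int = 0x1021, crc: int = 0xFFFF) -> int:
    """Bitwise 16-bit CRC; the polynomial and start value are assumed."""
    for byte in data:
        crc ^= byte << 8                      # bring the next byte in
        for _ in range(8):                    # shift and XOR, bit by bit
            crc = ((crc << 1) ^ poly if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def cross_sums(table):
    """Row sums and column sums, like the borders of a spreadsheet."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


def repair_one_cell(table, row_sums, col_sums):
    """Locate and fix a single wrong cell using the stored cross sums."""
    new_rows, new_cols = cross_sums(table)
    bad_rows = [i for i, (a, b) in enumerate(zip(row_sums, new_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(col_sums, new_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return [list(r) for r in table]       # nothing to repair
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("more than one cell damaged; cannot repair")
    i, j = bad_rows[0], bad_cols[0]
    fixed = [list(r) for r in table]
    fixed[i][j] += row_sums[i] - new_rows[i]  # restore the missing amount
    return fixed


# A CRC detects damage but cannot say where it is.
record = b"pH 7.41 at 25 C"
stored_crc = crc16(record)
print(crc16(record) == stored_crc)               # True  (clean read)
print(crc16(b"pH 7.91 at 25 C") == stored_crc)   # False (damaged read)

# Cross sums locate the damaged cell (row 1, column 2) and restore it.
expenses = [[12, 40, 7], [30, 5, 18], [9, 22, 14]]
row_sums, col_sums = cross_sums(expenses)
damaged = [[12, 40, 7], [30, 5, 81], [9, 22, 14]]
print(repair_one_cell(damaged, row_sums, col_sums) == expenses)  # True
```

The repair works because one bad cell upsets exactly one row sum and one column sum; their intersection names the cell, and the size of the discrepancy gives the correction.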
However, this complex encoding, encryption, and redundancy approach has great legal impact. Destruction of an existing record is impossible to hide. Falsifying an already written record is physically impossible (how do you alter a few 1-µm spots that have been destroyed?), and logically difficult (how do you fix the CIRC redundant code?). When one considers how simple it is to burn or shred paper records and how easy it is to alter a written number on a piece of paper, it is clear why many groups are quite sanguine concerning the legal admissibility of write-once-read-mostly (WORM) optical drive records (10-13). Although no court case has yet tested this analysis, it is just a matter of time before the precedent is set.

One problem does remain: the lifetime or nature of the disk drive. As technology matures, new drive systems will develop, and eventually backward compatibility will cease. At that point, legally bonded disk rewriting services will emerge that can write legacy disks to new media and maintain acceptability.

Figure 1. Shareable resources. Electronic libraries, electronic lab notebooks, and other groupware products can facilitate creation of new ideas, quickly flesh them out from literature sources, help expedite group research, and turn the ideas into products in a shorter time. (Adapted with permission from Pacific Northwest Laboratory.)

Getting from here to there. Prophetic transmission scenarios that have been explored by Steve Schmidt of Parke-Davis and our research group involve wireless (RF and IR) links from bench laptops or docking laptops, coupled with a server with a redundant array of inexpensive disks (RAID, magnetic) that serve as a high-speed cache buffer to the WORM drive archive. Time and date stamping is done at the server, providing further protection and certifiability. The RAID drives are hot swappable, assuring that any component failure does not result in loss of data.

Present and future
Management recognizes that shareable tools (groupware) like those described are a way to more effectively use the research it supports, shorten the product development cycle, and respond to regulatory and customer queries (Figure 1). But the bench scientist may view them with suspicion and question. They appear to invade personal space and private possessions, yet the accounts of our research should eventually be available to a broader audience, and in a corporate environment are company property.

Perhaps our concerns can be ameliorated by reviewing what a good SEA can give us as individuals: automatic indexing of experiments, full-text search and retrieval, multithread viewing, automatic report generation, parallel literature reference storage, seamless interaction with chemometrics software, image storage of relevant materials generated locally or absorbed over the Internet, voice-over narration of complex procedures supporting compressed video images, simplified network communication with other workers for assistance and distribution, and manuscript preparation by electronic "cut-and-paste" operations. These capabilities alone justify the effort involved in re-engineering our work practices.

So far we have addressed the questions "should we?" and "can we?" What about "will we?" No implementation will succeed without strong support and input from the users, nor will it succeed without highly visible commitment from technically aware higher management. Getting these steps to occur concurrently, along with a dramatic change in work habits, different communication pathways, and altered management structures and styles, will be a formidable task. The shareable resource provides its own problems because of human nature. A professional's responsibility to colleagues and their ideas should be the same whether the communication is in verbal, written, or electronic form. Good computer records and proper access control can help curb, but not cure, abuses. We are the most formidable problem these new tools face.

Current electronic lab notebook products are just the first of a series of groupware products that will affect laboratory scientists. Project planning tools, conferencing tools, and decision-support tools will slowly drift from the business environment to the scientific lab. Many groupware products created for the business environment will take on new features such as image management and migrate into the lab (Figure 2). Scientific databases, particularly those with molecular CAD capabilities, will mature into real SEAs. The next few years will be critical. Logic and product quality do not always determine success in the marketplace. Lab workers need to take part in the development of these products, which are too important to be left to information scientists, marketing cleverness, or chance.

Questions to ask

How do you evaluate first-generation products? The current environment is similar to the LIMS market in the mid-1980s. Electronic lab notebooks are rapidly evolving toward the SEA paradigm because better software creation tools are available. Do you experiment now and start up the learning-adaptation curve or do you wait until a stable environment exists? This is a question you must answer, but there are some simple tests you can apply to today's products.
Figure 2. A new view of the lab. A typical screen image of a collaborator environment showing electronic lab notebooks, whiteboards, shared database views, electronic library browsing, and personal conferencing. (Adapted with permission from Pacific Northwest Laboratory.)
Remember that there is no substitute for hands-on experience. Market pressures have made the words "release version" apply to beta-level developments. Today's beta forms are usually alpha-level code. The short half-life of products has forced vendors to pre-announce products, deliver them too early, and rely on upgrades to fix problems.

Is the system configurable to your lab's needs? Each type of lab, and each lab within that subset, has a unique personality. That personality must change as it adapts to these powerful tools, but neither human nor lab personalities can change immediately.

Does the product import and export data with equal facility? Does it support the data structures (objects) commonly in use? Some products have word-processing capabilities that do not support subscripts and superscripts, a fatal flaw in a chemical environment.

Is it easy to make literature sources part of the account? Can equivalents of Post-it notes be used for attaching personal comments or for small-group exchanges? Is the equivalent of a whiteboard available for larger group communications? Are these notes made a permanent part of the record? Can they be hidden for those who want an uncluttered view of the original work?

In OODBs, the space requirements can grow at a phenomenal rate as new types of objects are added to the system. Be sure to try your activities on the software for "size."

Is journaling (transaction logging) supported? All activities at the server need to be recorded. Are audit trails maintained for Good Laboratory Practices (GLP) purposes? WORM optical disks do not and cannot remove old files or change updated records. They create entire new copies. But back-links to the original data or records must be maintained and be automatically accessible.

Graphic, tabular, and CAD objects can be embedded in other text objects.
The anchoring in these compound documents should be "live," so that alteration of any object is automatically reflected in all other connected objects and that fact is visually flagged.

Is it possible to define trigger conditions that automatically cause the execution of predefined procedures? This permits the user to define conditions (such as concentration, time, and conjunctions of observations) that will automatically create reports, flag the user's terminal, or create warning messages.

Optical scanning, both optical character recognition (OCR) and bitmapped, is important. New algorithms for lossless (2-3:1) and lossy (20:1) storage compression are powerful tools for conserving space. Many scientific textual applications can use bitonal compression. Standards are in a great state of flux and there are legal squabbles over several common storage and encryption methods.

Are access control schemes available that satisfy both corporate and user needs for privacy and security? What user identification and verification methods are supported and needed? The responses to the FDA white paper circulated to industry asking its attitude toward such techniques for use in electronic submission of new drug applications (NDAs) ranged from ID/password, through "smart cards," to biometrics (signature or fingerprint). The current consensus is that any two of these will be acceptable to FDA.

Access control and simple identification procedures may not be enough. Further encryption may be an important security issue. This may involve encryption on disk and/or encryption for network communication. Asymmetric private key/public key approaches involve two keys, both of which are needed for clear text exchange (14). The transmitter encrypts the source with a private key. Authorized readers have a public decryption code that unscrambles the data stream. This is a "pretty secure" system, particularly if certificate objects and an authentication code are used. Other approaches generate separate faster symmetric session keys in which the same key encrypts and decrypts (15).
It might take 64 MIPS-years to crack such a 40-bit cipher, that is, one year on a 64-MIPS machine (MIPS = million instructions per second). Full military grade is 128-bit ciphers; cipher length is the length of the random bit pattern used for encryption. Some techniques involve multipartite keys, and the several parts can be separately escrowed for emergency recovery. Some paranoia exists in this arena.
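The key-length figures above are easier to weigh with the arithmetic spelled out. The sketch below assumes a plain exhaustive search in which every key costs the same number of instructions to try. That per-key cost is not given in the article; it is back-calculated from the "one year on a 64-MIPS machine" figure and then reused, which is an assumption. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def search_years(key_bits, mips, instructions_per_key):
    """Years for one machine to try every key of the given length."""
    keys = 2 ** key_bits
    instructions = keys * instructions_per_key
    return instructions / (mips * 1_000_000) / SECONDS_PER_YEAR


# Per-key cost implied by "a 40-bit cipher takes one year at 64 MIPS".
implied_cost = 64 * 1_000_000 * SECONDS_PER_YEAR / 2 ** 40
print(round(implied_cost))                    # on the order of 1,800 instructions

# Each added key bit doubles the work.
print(search_years(40, 64, implied_cost))     # about 1 year, by construction
print(search_years(41, 64, implied_cost))     # about 2 years
print(search_years(56, 64, implied_cost))     # about 65,536 years

# Going from 40 to 128 bits multiplies the work by 2**88.
ratio = search_years(128, 64, implied_cost) / search_years(40, 64, implied_cost)
print(f"{ratio:.2e}")
```

Under these assumptions a faster machine or many machines in parallel shorten the 40-bit search in direct proportion, whereas each extra bit of key length doubles it, which is why key length rather than processor speed dominates the comparison.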
All of these developments will stress existing networks, and distance and bandwidth will achieve paramount importance. Ethernet will evolve into Switched Ethernet, fiber-distributed data interface (FDDI), or asynchronous transfer mode (ATM) (16) to increase capability as users exchange larger and different types of records. Current copper-based Ethernet runs at 10-100 Mbit/s, but as the system load increases, user bandwidth drops precipitously. Although switching hubs can regain some of this, Ethernet is basically a contention protocol and is not deterministic. FDDI is a time-token protocol that is deterministic. (In contention protocols, users compete for the facility. In deterministic protocols, use is scheduled.) The packet lengths in Ethernet are variable and inherently long. Interactive work and wider area network applications work best with shorter lengths, called cells. ATM, which uses 53-byte cells, is a time-division multiplexed approach (one in which available time is sliced into sections and allocated to users). Eventually, gigabit/second nets will be needed.
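The cost of short, fixed-length cells can be worked through with simple arithmetic. The sketch below uses the 53-byte ATM cell from the text and the split into 48 payload bytes plus 5 header bytes that this article gives in its closing remarks. The 1500-byte message and the link speeds are hypothetical values chosen only for illustration, and the function names are invented for the example.

```python
import math

CELL_BYTES = 53                              # ATM cell size from the text
HEADER_BYTES = 5                             # header bytes per cell
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES    # 48 bytes of user data


def cells_needed(message_bytes):
    """Number of cells required to carry a message (last cell is padded)."""
    return math.ceil(message_bytes / PAYLOAD_BYTES)


def wire_bytes(message_bytes):
    """Total bytes placed on the link, headers and padding included."""
    return cells_needed(message_bytes) * CELL_BYTES


def cell_time_us(link_mbit_per_s):
    """Microseconds needed to transmit one cell at the given link speed."""
    return CELL_BYTES * 8 / link_mbit_per_s


message = 1500                               # hypothetical message size, bytes
print(cells_needed(message))                 # 32
print(wire_bytes(message))                   # 1696
print(f"{HEADER_BYTES / CELL_BYTES:.1%}")    # 9.4% of every cell is header
print(cell_time_us(100))                     # 4.24 microseconds per cell
print(message * 8 / 100)                     # 120.0 microseconds for the whole message
```

The trade-off is visible in the last two numbers: a short cell occupies the link for only a few microseconds, so interactive traffic never waits long behind a large transfer, but nearly a tenth of the capacity is spent on headers.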
Although scientists should rely on in-house communications and network groups, it is important to monitor their activities. Too often control of communications leads to control of power. Premature, delayed, or incorrect decisions can prove disastrous to the lab. One example is the explanation of why ATM cells are 53 bytes long. Europe originally wanted 16 (phone and voice); the U.S. 128 (computers). The final compromise was 48 (plus 5 header bytes), simultaneously too short and too long (16). Logic doesn't always win!

I thank Mary Woodward of Tripos, Kris Pettersen of Megalon, Esther Allen of MDL, and Steve Schultz of ForeFront Technologies, whose questions prompted much of the first articulation of these ideas. David Kilman of the Advanced Computing Laboratory at Los Alamos first brought the connection between scientific electronic notebooks and Progoff's work to my attention. I extend special thanks to research colleagues Ching-Wan Yip and Yue-Ling Wong, who opened their minds and personal databases to many queries regarding networking and ciphers and who performed the digital modifications of the figures. Richard Kouzes of Pacific Northwest Laboratory's Collaboratory Project (http://www.emsl.pnl.gov:2080) provided the original screen images. Kewal Likhyani of DuPont/Electronic Records Consortium and Steve Schmidt of Parke-Davis provided valuable references on legal aspects of electronic records.

References

(1) Borman, S. Chem. Eng. News 1994, 72(21), 10-20.
(2) Press, L. J. Assoc. Comput. Mach. 1992, 35, 26.
(3) Peat-Marwick group resource sharing project (ftp: pub/wais/doc/wais-corp.txt@think.com).
(4) Lysakowski, R. J. Chromatogr. Sci. 1994, 32, 236-42.
(5) Progoff, I. At a Journal Workshop; Jeremy Tarcher: Los Angeles, CA, 1992.
(6) "The Second Decade: Computer-Supported Collaboration"; Intel Video 241226001; Literature DA03; 800-548-4725.
(7) InfoWorld 1994, 16, Nov. 14 issue, p. 65.
(8) Kouzes, R. Nat. Sci., March 1995, pp. 190-97.
(9) Dessy, R. Anal. Chem. 1985, 57, 692 A-698 A.
(10) "Authentication of Electronic Records for Legal Defensibility"; Electronic Records Consortium, Oct. 1994. The document contains information on legal issues of weight versus admissibility, and relevance, reliability, and uniqueness (FDA criteria), as well as credibility and corroboration (PTO criteria). Contact Kewal Likhyani at DuPont Fibers, Chestnut Run Plaza, Laurel Run Building, Wilmington, DE 19880-0705 (302-999-3845; fax 302-999-2805) for more information.
(11) Perritt, H. H., Jr. "Electronic Records Management and Archives," Univ. of Pittsburgh Law Review 1992, 53(4), 998.
(12) "Guidelines for Admissibility of Records Produced by Information Technology Systems as Evidence," AIIM Technical Report, TR 31-19920.
(13) Barry, R. E. "Electronic Document Management and Records Management Systems: Toward a Methodology for Requirements Definition," In OIS 93: Proceedings of the Document Management Conference; Hendley, T., Ed.; Meckler Ltd: London, 1993.
(14) Simmons, G. J. Commun. ACM 1994, 37(11), 56.
(15) http://home.mcom.com/info/rsa.html; http://www.rsa.com/; http://home.mcom.com/home/manual_docs/index.html; http://home.mcom.com/info/security-overview.html; ftp://ftp.csn.net/mpj; and ftp://ftp.eff.org/pub/Net_info/Tools/Crypto.
(16) Commun. ACM 1995, 38(2).

Raymond E. Dessy, Emeritus Professor of Chemistry at VPI&SU, became interested in SEAs in 1987 when he wrote a prototype in Forth that he still uses for his research involving the design and study of microbiosensors. Address comments to him at the Department of Chemistry, VPI&SU, Blacksburg, VA 24061-0212.

This article is available in the Hot Articles section of the ACS Publications Division home page (http://pubs.acs.org).