Advances in Scientific Software Packages

Channing H. Russell

Downloaded by UNIV OF AUCKLAND on December 24, 2017 | http://pubs.acs.org Publication Date: June 27, 1986 | doi: 10.1021/bk-1986-0313.ch004

BBN Software Products Corporation, Cambridge, MA 02238

Early scientific software packages focused on compilers, individual applications, and specific aspects of computer support such as statistics. More recently, software packages provide a broad, integrated, easy-to-use, and extensible set of capabilities to support research data management. RS/1 (TM) is described as an example of modern scientific software.

While most publicity about the software industry focuses on applications in office and home settings, there is also significant development taking place in packaged software specifically designed for research data management. Use of commercially available scientific data management software, either as a stand-alone tool or as the base for building applications, is increasingly contributing to the productivity of research organizations.

It is indicative of the growing importance of the scientific software market that a major corporation, Bolt Beranek and Newman Inc. (BBN), has recently formed a new business activity called BBN Software Products, focused entirely on developing software for science. BBN Software Products' leading software product, RS/1 (TM) (BBN, 1979), incorporates the concepts and technologies developed through BBN's history of support of scientific applications. RS/1 is an integrated information-handling environment supporting data management, analysis, graphics, statistics, modeling, and reporting. The package is used by a broad spectrum of scientists engaged in a wide variety of research and development activities.
The development of scientific software at BBN grew out of the company's long history of supporting a variety of scientific applications, both in its own research and development projects and in a number of government-sponsored activities. For example, in the 1960s, BBN developed a hospital information system for Massachusetts General Hospital which used an early minicomputer, the PDP-1, to support clinical research activities. Under the sponsorship of the National Institutes of Health, the company developed a system called PROPHET (1), which provides a national timesharing service to scientists studying the effects of chemical substances on biological activity. PROPHET operates as a kind of automated laboratory notebook, oriented around data structures such as tables and graphs which are familiar to scientists (Castleman et al., 1974). A similar orientation towards data representations familiar to scientists is used in the CLINFO system, which provides specialized support for the study of time-oriented clinical data in the general clinical research center setting (Gottlieb et al., 1979).

0097-6156/86/0313-0023$06.00/0 © 1986 American Chemical Society

Provder; Computer Applications in the Polymer Laboratory ACS Symposium Series; American Chemical Society: Washington, DC, 1986.

RS/1 represents a distinct contrast to earlier scientific software. In general, early software used by scientists took three primary forms.

(1) Languages such as FORTRAN, BASIC, and PASCAL were helpful in supporting highly computational applications, but were not designed to support research data management.

(2) Specific scientific programs were also developed which were limited to supporting a single narrow applications area and were usually operable only using particular computer system equipment.

(3) Specialized software packages began to appear in specific areas, such as SAS (SAS, 1982) and BMDP (Dixon, 1975) in statistics and MLAB (Knott, 1979) in curve fitting.

Some of the earlier software, such as SAS or APL, has since been extended to provide broader support, often with limitations that reflect their history. In addition, integrated business packages, such as Lotus 1-2-3 in the microcomputer world and some of the data management packages, have also been used for scientific work.
BBN took a more systematic approach than had been used in the development of earlier software packages for the scientific market; the company's goal from the start of the development process was to design an integrated general-purpose package to provide broad-based interactive support for research data management. The result was RS/1.

Characteristics of Modern Scientific Software

In order to understand the characteristics of state-of-the-art scientific software, a description of the facilities of the RS/1 system is presented in this section as an example of the kind of software now available to support scientific data management needs. All facilities are based in a single integrated system. No extra steps are needed, for example, to use tabular data as the basis for statistics or to graph the results of modeling.

Data Management

Data management in RS/1 is based on two-dimensional tables. Each cell of a table can contain data representing fixed or floating point numbers, dates, times, or free text. Cells in a particular column are not all constrained to the same type; it is possible, for example, to include a note about some missing data in a column of numerical results. A user can work with many hundreds of tables. Tables are based on disk files, accessed through a kind of paging scheme, so there is no limit on table size. Some users work with tables containing hundreds of columns and tens of thousands of rows.
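As a rough illustration of a column whose cells are not all of one type, the following Python sketch stores a text note among numeric results and averages only the numeric cells. The table contents, the column names, and the helper functions are hypothetical, and skipping non-numeric cells is an assumption of this sketch; how RS/1 itself treats a text note during a computation is not described here.

```python
from datetime import date
from numbers import Real

# Hypothetical table: column name -> list of cells. Cells in one column
# need not share a type, mirroring the mixed-type columns described above.
table = {
    "Sample": ["A1", "A2", "A3", "A4"],
    "Collected": [date(1982, 4, 18)] * 4,
    "Yield": [12.5, 14.5, "missing: vial broken", 13.5],
}


def numeric_cells(column):
    """Return only the numeric cells, skipping notes, dates, and booleans."""
    return [c for c in column if isinstance(c, Real) and not isinstance(c, bool)]


def column_mean(column):
    """Mean of the numeric cells, or None if the column has none."""
    values = numeric_cells(column)
    return sum(values) / len(values) if values else None


mean_yield = column_mean(table["Yield"])
print(mean_yield)
```

The note stays in the column next to the data it explains, and the numeric summary simply ignores it.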


Data Analysis

Data analysis makes direct use of tables. A new column can be created as a transformation of existing data. No additional steps are needed to create the column; a single command defines the transformation and transfers the data (Figure 1). The results can be seen on the screen immediately, again without additional commands or setup steps.
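The command shown in Figure 1, COLUMN 5 OF DRILL = COLUMN 3 * 1.1, can be mimicked in a few lines of Python. This is only a sketch of the idea, not RS/1 syntax: the dictionary layout and the `add_column` helper are hypothetical, the factor 0.9 for the lower bound is inferred from the values in the figure rather than from a command shown there, and rounding to three decimals is a display choice made here.

```python
# Drill test data transcribed from Figure 1 (column 3, "Final Height").
drill = {
    "Bit Code": ["N22", "N32", "N44", "N81", "N82", "S14", "S17", "S19", "S22"],
    "Final Height": [0.50, 0.48, 0.41, 0.46, 0.59, 0.49, 0.47, 0.33, 0.74],
}


def add_column(table, name, source, transform):
    """Create column `name` by applying `transform` to each cell of `source`."""
    table[name] = [round(transform(value), 3) for value in table[source]]
    return table


# One call defines the transformation and fills the new column.
add_column(drill, "Upper Bound", "Final Height", lambda h: h * 1.1)
add_column(drill, "Lower Bound", "Final Height", lambda h: h * 0.9)

for code, low, high in zip(drill["Bit Code"], drill["Lower Bound"], drill["Upper Bound"]):
    print(code, low, high)
```

The computed bounds agree with the Lower Bound and Upper Bound columns displayed in the figure.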


Graphics

RS/1 supports the major kinds of analytical graphics needed in a scientific setting, including scatter plots, fitted curves, bar graphs, histograms, pie charts, three-dimensional displays (Figure 2), and contour plots. Graphs are stored as permanent data objects that can be edited and redisplayed. This makes it possible to transform a graph made for analysis into a graph suitable for publication. An advanced terminal-independent graphics support capability permits the output of pictures on more than a hundred different graphics devices.

Statistics

Statistics available in the system include a large set of commonly used analysis techniques, as well as advanced nonlinear curve fitting techniques. Statistical results can be displayed numerically or graphically.

Modeling

Modeling offers a spreadsheet-like capability, which permits the integration of several different spreadsheets and the inclusion of general data manipulation commands in cells.

Text and Graphics

An optional extension to the system makes it simple to produce documents containing mixed text and graphics illustrations, which can be output on a laser printer.

Easy to Use

Ease of use is a major theme in modern commercial software. The best packages offer several styles of interaction suitable for advanced or beginning users. In RS/1, both a command-based and a menu-based style of interaction are available. Using an English-like command, a bar graph can, for example, be constructed from a table in a single step (Figure 3). Curve fitting can be accomplished in much the same straightforward way (Figure 4).

Menu-oriented interactions are also supported to make complex editing easy to understand, and to provide a way for beginning users to start using the system. Thorough documentation and optional direct and videotape training are available.


Drill Test Data
Collected 4/18/82. Measurements of height +/- 10%.

       0          1          2              3              4             5
       Bit Code   Hardness   Feet Drilled   Final Height   Lower Bound   Upper Bound
  1    N22        80         218            0.50           0.450         0.550
  2    N32        84         309            0.48           0.432         0.528
  3    N44        80         662            0.41           0.369         0.451
  4    N81        80         512            0.46           0.414         0.506
  5    N82        84         114            0.59           0.531         0.649
  6    S14        82         319            0.49           0.441         0.539
  7    S17        80         404            0.47           0.423         0.517
  8    S19        83         844            0.33           0.297         0.363
  9    S22        83         56             0.74           0.666         0.814

# COLUMN 5 OF DRILL = COLUMN 3 * 1.1



Figure 1. A single command allows the user to create a new column in an RS/1 table as a transformation of existing data. (Reproduced with permission from University of South Carolina Press: Columbia, S.C., 1986; Channing Russell In Research Data Management in the Ecological Sciences; Michener, William, Ed.; pp 373-381.)

Figure 2. Graphs, like the 3-dimensional display shown here, can be stored as permanent data objects that can then be edited and redisplayed. (Reproduced with permission from University of South Carolina Press: Columbia, S.C., 1986; Channing Russell In Research Data Management in the Ecological Sciences; Michener, William, Ed.; pp 373-381.)


[Figure 3 image: bar graph titled "Allocation of Water to Agriculture, Industry, Domestic Consumption". Bars are labeled INDIA, MEX., MONG., JAPAN, USSR, HUNGARY, US, W. GER, and UK; the legend distinguishes INDUSTRY, AGRICULTURE, and DOMESTIC CONSUMPTION.]

Figure 3. A bar graph can be constructed from a table in a single step. (Reproduced with permission from University of South Carolina Press: Columbia, S.C., 1986; Channing Russell In Research Data Management in the Ecological Sciences; Michener, William, Ed.; pp 373-381.)


[Figure 4 image: plot titled "Bit Wear Analysis" showing Final Height against Feet Drilled (axis from 0 to 1000), with legend entries "Final Height" and "1/(1+SQRT(3.497648e-03*X))", displayed together with the Drill Test Data table below.]

Drill Test Data
Collected 4/18/82. Measurements of height +/- 10%.

       Bit Code   Hardness   Feet Drilled   Final Height
  1    N22        80         218            0.50
  2    N32        84         309            0.48
  3    N44        80         662            0.41
  4    N81        80         512            0.46
  5    N82        84         114            0.59
  6    S14        82         319            0.49
  7    S17        80         404            0.47
  8    S19        83         844            0.33
  9    S22        83         56             0.74

Figure 4. Users can fit curves to the data points in an RS/1 table by entering a simple English command. (Reproduced with permission from University of South Carolina Press: Columbia, S.C., 1986; Channing Russell In Research Data Management in the Ecological Sciences; Michener, William, Ed.; pp 373-381.)
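The curve in Figure 4 has the form 1/(1+SQRT(k*X)), with k reported as 3.497648e-03. The Python sketch below estimates k from the tabulated values by least squares. The fitting method RS/1 uses is not described in this chapter, so the golden-section search here is a stand-in chosen for brevity, and the function names and search bracket are assumptions of the sketch. Because the tabulated heights are rounded to two decimals, the estimate should be close to, but need not exactly equal, the coefficient in the figure.

```python
import math

# Drill test data transcribed from Figure 4 (feet drilled, final height).
FEET = [218, 309, 662, 512, 114, 319, 404, 844, 56]
HEIGHT = [0.50, 0.48, 0.41, 0.46, 0.59, 0.49, 0.47, 0.33, 0.74]


def model(k, x):
    """Curve form shown in Figure 4: 1 / (1 + sqrt(k * x))."""
    return 1.0 / (1.0 + math.sqrt(k * x))


def sse(k):
    """Sum of squared residuals between the data and the model."""
    return sum((y - model(k, x)) ** 2 for x, y in zip(FEET, HEIGHT))


def fit(lo=1e-4, hi=1e-1, tol=1e-10):
    """Minimize sse(k) on [lo, hi] by golden-section search (assumed bracket)."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if sse(c) < sse(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2


k_hat = fit()
print(f"estimated k = {k_hat:.6e}")
```

A one-parameter search like this is enough for the single coefficient in the figure; a model with several coefficients would call for a general nonlinear least-squares routine instead.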


Extensibility

A key attribute of software intended for use in science is extensibility. Research data management, almost by definition, involves special-purpose information handling beyond standard data management techniques. These specialized needs are often confined to one particular phase of analysis; ideally, scientific software should support the smooth integration of special-purpose programming into an application which also makes maximal use of what is already available in the system. Too often in the past, entire new systems had to be constructed in a language like FORTRAN because the existing higher-level data management systems did not support the needed kinds of customization.

There are several kinds of extensibility built into RS/1. A full structured programming language called RPL is a part of the system. This language, stylistically similar to PL/I, also allows direct access to data objects such as tables and graphs, and allows the intermixture of high-level data management commands with traditional programming constructs.

The RPL language is designed for easy-to-write, compact programs. Like APL, it supports a run-time environment in which variables can represent different data types at different times. There is no need for the kind of data declarations which make programming awkward in traditional languages.

Flexibility

In the scientific world, maximum flexibility is needed in interfacing to a variety of different programs, in accessing various databases, and in outputting information to different kinds of graphics devices. This goes far beyond the kinds of flexibility needed in business-oriented packages.

RS/1 supports an ability to call programs written in other languages, and has interfaces to a growing set of commercial data base systems. Especially powerful applications can be constructed using information from a large database as the basis for analysis. Output can be directed to a variety of printers, plotters, and display terminals, and it is even possible for sophisticated users to add support for new kinds of devices, through a facility called the terminal data table, without a need for system programming.

Applications of Scientific Software

Scientific software packages are beginning to have a large impact on productivity in research organizations. In a major pharmaceutical research and development organization at Merck, Sharp and Dohme, RS/1 is used at several different laboratories in the U.S., Canada, the U.K. and continental Europe to share data. A large VAX installation supports users in different sites through a computer network. Collaborative projects between different laboratories, using data shared through the central system, have become common.


The semiconductor industry makes heavy use of manufacturing data to provide ongoing quality control information. RS/1 is used as the central software tool to tie together a number of different data bases and software systems to make the data available for graphing and analysis.

More than a thousand scientists and engineers use RS/1 at DuPont's Experimental Station in Wilmington, Delaware. It was particularly attractive to DuPont that the software was available on microcomputers as well as superminicomputers; this enabled them to adopt RS/1 as a standard for remote research labs and engineering stations in production plants throughout the Eastern seaboard.

In the new field of genetic engineering, scientific data management software is used to manage the long alphabetic codes that represent genetic sequences, as well as more traditional numeric and text applications. At Genentech, scientists use the software for these tasks as well as for laboratory data analysis.


Future Developments

Advances in computer science continue to serve as the basis for new extensions to software products. In particular, artificial intelligence techniques have begun to mature to the point at which they can play a role in scientific software. In the future, scientific software will incorporate expert systems technology in order to provide a new level of assistance to scientists in applying statistical and graphical techniques to data analysis. Two major areas are likely to be the focus of expert systems in the scientific software area: assisting users without extensive statistical training in starting to use statistics, and helping design multifactor experiments.

Acknowledgments

The PROPHET system is sponsored by the Biotechnology Resources Program, Division of Research Resources, National Institutes of Health under contract #N01-RR-8-2118.

Literature Cited

Castleman, P.A., Russell, C.H., Webb, F.N., Hollister, C.A., Siegel, J.R., Zdonic, S.R., Fram, D.M., "The Implementation of the PROPHET System", AFIPS Conference Proceedings, Vol. 34, pp. 457-468, 1974.

Dixon, W.J., Ed., BMDP Biomedical Computer Programs, 3rd Ed., Los Angeles, University of California Press, 1975.

Gottlieb, A.G., Fram, D.M., Whitehead, S.F., Rubin, G.M., Russell, C.H., Castleman, P.A., Webb, F.N., "CLINFO: A Friendly Computer System for Clinical Research", XII International Conference on Medical and Biomedical Engineering, Jerusalem, Israel, 1979.

Knott, G.D., "MLAB - A Mathematical Modeling Tool", Computer Programs in Biomedicine, Vol. 10, No. 3, pp. 271-280, December 1979.

SAS Institute Inc., SAS User's Guide: Basics, 1982 Edition, Cary, NC, 1982.

RECEIVED May 5, 1986