Operations Research in Analytical Chemistry - American Chemical Society

Désiré L. Massart Pharmaceutical Institute Vrije Universiteit Brussel Paardenstraat 67 B-1640 Sint-Genesius-Rhode, Belgium

Leonard Kaufman Centre of Operational Research and Statistics Vrije Universiteit Brussel Terhulpse Steenweg 166 B-1050 Brussels, Belgium

1244 A · ANALYTICAL CHEMISTRY, VOL. 47, NO. 14, DECEMBER 1975

One of the principal preoccupations of the analytical chemist is to devise optimal analytical methods. To do this, he must very often choose between many possible combinations, even for very simple procedures such as a colorimetric determination, in which the result depends on the values of one or (usually) more parameters. The analyst's task is then to select the optimal combination of parameter values so that the determination is optimised with respect to some criterion such as the signal-to-noise ratio. Such a problem can be solved by many different optimisation techniques. One of the most attractive, by virtue of its simplicity, is the simplex method described in an earlier REPORT by Deming and Morgan (1). This is an evolutionary operations method, which is part of the vast science called operations research (OR). There are many definitions of OR. For our purpose, we define it simply as a collection of mathematical methods permitting the selection of the "best" combination from a large set of possibilities.

There are many instances in analytical chemistry in which the best combination should be selected and in which OR can be useful. Several examples will be given in this report. The main applications of OR deal with organisation problems with economic or social implications. Applications to problems of this nature in the management of analytical laboratories are therefore rather straightforward. Goulden's (2) recent article about OR and other management techniques in analytical research, development, and service covers part of these applications. Therefore, only one such application will be described here, namely, the choice of the optimal combination of apparatus and manual methods in a clinical laboratory (3). We will try to show that OR can also be used in the optimisation and/or selection of analytical procedures or programs. To do so, we introduce some of the more important methods used in OR with typical examples from management science and apply these to analytical situations. To permit easy understanding, these situations are greatly simplified, but more complex problems can in fact be tackled. In most cases, because of space limitations, we have omitted mathematical details and have only described the construction of the OR model. Although the examples used reflect the authors' interest in chromatography, the methods are equally applicable to other branches of analytical chemistry.

Optimal Configuration of a Clinical Laboratory (Integer Programming)

De Vries (3), in investigating some of the economic aspects of clinical laboratories, has shown that the problem of minimising costs in the chemical laboratory can be reduced to an integer programming problem. It is supposed that $I$ different determinations must be carried out with a number of apparatus or manual methods. There are $J$ apparatus available (manual methods and 1-, 2-, 4-, 6-, 8-, and 12-channel apparatus, with or without direct digital readout). The problem consists of making a selection among these to carry out all the determinations at minimal total cost. All determinations of the same substance must be carried out on the same apparatus. The costs taken into account are the fixed cost, $C_j$, and the variable cost (the cost per analysis), $V_j$, of each apparatus. The number of determinations with each apparatus, $N_j$, and the maximal capacity, $L_j$, of each apparatus are also given. The problem can be formulated mathematically as follows:

$$\text{Minimise} \quad \sum_{j=1}^{J} (C_j + N_j V_j)\, X_j \qquad (1)$$

subject to the constraints

$$\sum_{j=1}^{J} m_{ij} X_j = 1, \qquad i = 1, \ldots, I \qquad (2)$$

$$N_j X_j \le L_j, \qquad j = 1, \ldots, J \qquad (3)$$

$$X_j \in \{0, 1\}, \qquad j = 1, \ldots, J$$

Here $X_j$ is a 0-1 variable, equal to 1 if apparatus $j$ is part of the selected (optimal) configuration and 0 otherwise. $m_{ij}$ is a coefficient equal to 1 if apparatus $j$ can be used for determination $i$ and 0 otherwise. The economic or objective function (Equation 1) expresses the total cost, which is to be minimised. The constraints of Equation 2 express that exactly one apparatus must be used for each kind of determination. They exclude the possibility that one kind of determination is carried out on more than one apparatus, and they require that each kind of determination is indeed carried out. The constraints of Equation 3 express that no apparatus can be bought if its capacity is smaller than required.

There are a number of mathematical methods to solve such linear integer programs. Essentially, these methods can be divided into two groups. In the first group, called cutting plane methods, the problem is solved as a linear program (the variables $X_j$ are treated as real instead of integer), and a new constraint is added at each iteration. In the second group, called enumeration procedures, the set of possible solutions is divided into subsets, which are examined successively until the best solution is found. The branch and bound method (discussed in a later section) belongs to the latter group. A complete discussion of these methods can be found in Zionts (4).

Distribution Problem (Graph Theory)

Suppose that a production unit is connected with a number of clients through a pipeline. One must know how to interconnect the clients and the production unit so that the pipeline has a minimal length. The distances between the clients and the production unit, and between the clients themselves, are known. In Figure 1, A is the production unit and B-G are the clients. Two possible configurations of the pipeline are given in the figure. It is clear that (a) is a better solution than (b). Let us now see how to find the optimal one.

Figure 1. Examples of trees in a graph. Uppermost figure (a) is minimal spanning tree of complete graph given in Figure 2

By drawing all possible interconnections between the points, a graph is obtained in which the clients and the production unit are the nodes and the interconnections are the edges. The values of the edges are given in Table I. Both (a) and (b) of Figure 1 are graphs which are actually part of the graph in Figure 2. These graphs are connected (all points are linked directly or indirectly to each other) and contain no cycle (if F were connected to G, EFG would constitute a cycle). Such a graph is called a tree, and the tree for which the sum of the values of the edges is minimal is called the minimal spanning tree. In our example, the minimal spanning tree also yields the shortest distribution pipeline. The problem of finding the latter therefore reduces to finding the minimal spanning tree in Figure 2.

Table I. Distance Between Points in Figure 2

       A     B     C     D     E     F     G
A      0
B     28     0
C     32    23     0
D     35    40    60     0
E    100    80   103    75     0
F    119   104   128    90    29     0
G    127   105   126   105    30    35     0

Figure 2. Complete graph for distribution problem

Several algorithms allow this. The conceptually simplest one is Kruskal's algorithm (5), although other algorithms are better adapted for computer calculation. Kruskal's algorithm can be stated as follows: "From the edges that are not yet part of the tree, choose the one with the smallest value which does not form a cycle." This is applied to Figure 2 with the values of Table I. One starts by selecting the smallest value in the table:

Step 1: edge BC (23)
Step 2: edge AB (28)
Step 3: edge EF (29)
Step 4: edge EG (30)
Step 5: not edge AC (cycle), but AD (35)
Step 6: not FG, BD, or CD (cycle), but DE (75)

The optimal distribution network is therefore given by Figure 1 (a).

How can such a distribution problem be used in analytical chemistry? If one carefully considers Figure 1 (a), which is drawn to scale, one observes that two clusters can be distinguished, A-D and E-G. These clusters can be obtained formally by breaking the longest edge (DE) in the tree. Clustering techniques can aid in developing optimal analysis procedures. One example is the selection of optimal combinations of thin-layer chromatographic (TLC) systems. When one elaborates a qualitative identification scheme with this technique, one has to combine two or more TLC systems so that they do not yield too much correlated information. This can be achieved by clustering the available systems according to their similarity and selecting one system from each cluster.

De Clercq and Massart (6) applied this to the selection of the best combination of three TLC or PC systems from a data set containing Rf data for 100 basic drugs in eight systems (7). They used the correlation coefficients between each pair of systems as calculated by Moffat and took (1 - ρ) as the distance between systems. This gave a graph such as that in Figure 2. The minimal spanning tree derived from this graph is given in Figure 3. If one accepts the convention that edges should be broken only when the resulting clusters consist of at least two members, the best combination must consist of one system from each of the groups {2, 3}, {4, 5, 6}, and {1, 7, 8}.

Figure 3. Minimal spanning tree for eight TLC systems for basic drugs. Distances between systems are (1 - ρ) values. Values of correlation coefficient obtained from ref. 7

By measuring the information content (8) of each separate system and selecting the best system from each group, one obtains in a rather simple way the presumably best combination: 3, 6, 7. This is, in fact, the optimal solution. Moffat and Smalldon (7) showed this by calculating how many pairs of drugs could be separated with each possible combination. This is, of course, the most accurate way of selecting the optimal combination. It is also a rather more elaborate method than most practising separation chemists are prepared to carry out. In contrast, the complete OR technique, correlation coefficient calculations included, can be carried out in one afternoon with a pocket calculator and almost certainly yields the best or second-best possible combination.

Communication Network Problem (Graph Theory)

Psychologists and sociologists have used graphs to represent the communications between individuals in certain organisations, and in this way to deduce certain characteristics of the organisation in question. An interesting application for analytical chemistry in this respect is the work by Allen (9) and by Frost and Whitley (10) on communication patterns in research and development laboratories. A typical problem in this area is determining the groups of individuals between whom communications exist. Suppose that some members of a population of eight people, A-H, are in direct contact with each other, whereas others are not. The problem is to distinguish the sets of people between whom a communication (direct or indirect) exists. For example, in Figure 4, C and F communicate directly, B and D communicate indirectly (by way of A or E), and there is no communication at all between the sets ABDE and CFGH. In graph-theoretical terms, (ABDE) and (CFGH) are connected graphs, whereas (ABCDEFGH) is not, and the problem reduces to finding the connected components of the latter graph.

Figure 4. Communications network

This can be applied in the development of analytical methods as follows. Suppose that we want to do research on the thin-layer chromatographic separation of a rather large group of substances and, in particular, that we want to develop better methods than the existing ones. One possible strategy is to select those groups that are particularly hard to separate by existing TLC procedures and to concentrate on finding better methods for those substances. A group of substances that are hard to separate is defined as a group in which each substance is difficult to separate from at least one of the other substances of the group. There are several ways of determining which pairs of substances meet this description. One of the simplest, though not necessarily the best, is to determine the Euclidean distance. For substances A and B this is equal to

$$D_{AB} = \sqrt{\sum_{k=1}^{n} \left[ (hR_f)_{A,k} - (hR_f)_{B,k} \right]^2} \qquad (4)$$

where n is the number of systems for which hRf values are given and k indexes these systems. One can then consider difficult those separations for which this value is lower than a predetermined value. One then draws a graph in which the pairs of solutes catalogued as difficult to separate are connected, and looks for the connected components of the graph, exactly as in the communication network described above. When the number of substances is too large, the graph becomes too complex to draw. A matrix notation is then used, in which a 1 means that there is a direct link between two substances and a 0 means that there is not.

As an example, this was carried out for 33 antibiotics. Their hRf values in 11 TLC or PC systems, taken from ref. 11, are given in Table II. Substances are considered difficult to separate when $D_{AB} < 50$. The resulting matrix is given in Table III. Using a very simple algorithm, one concludes that the following groups of chromatographically similar substances are present:

• Aminocidin sulphate, dihydrostreptomycin, neomycin, paromomycin, streptomycin sulphate, viomycin sulphate
• Chloramphenicol, cycloheximide, tylosin base
• Chlortetracycline hydrochloride, colistin sulphate, demethyltetracycline, pyrrolidinemethyltetracycline, terramycin base
• Etamycin, mikamycin B
• Penicillin G-Na, penicillin V-K
• Rifamycin O, rifamycin S, rifamycin SV-Na

Such a classification also has some significance where the structure of these antibiotics is concerned. The oligosaccharides dihydrostreptomycin, neomycin, paromomycin, and streptomycin are found in one group, and so are the tetracyclines (chlor-, demethyl-, and pyrrolidinemethyltetracycline, and terramycin), the penicillins, and the rifamycins. When the sets are large, the probability increases that the complete graph is connected, so that no groups can be separated. In this case, one can proceed by decreasing the value used as the criterion for qualifying a separation as difficult.

Table II. hRf Values of Antibiotics (from Ref. 11)

                                                          System No.
No.  Antibiotic                          1    2    3    4    5    6    7    8    9   10   11
  1  Actinomycin                        92   25  100   84   43   90   89   88   20   92   15
  2  Aminocidin sulphate                 0   87    4   77    6   23    3    0   27    0    0
  3  Chloramphenicol                    86   85   91   90   93   88   89   45   86   85    9
  4  Chlortetracycline hydrochloride    28   51   81   82   31   64   56    0   43    0    0
  5  Colistin sulphate                   0   94   65   96   49   54   24   14   14    0    0
  6  Cycloheximide                      72   87   92   91   91   80   80   59   93   70   13
  7  Demethyltetracycline               25   31   60   73   30   45   43    0   28   27    0
  8  Dihydrostreptomycin                 0   95    8   95    0   57    0    0    0    0    0
  9  Etamycin                           93   33   90   94   92   95   94   85   39   93   15
 10  Filipin                            95    0   97   87   94   93   92    0    0   75    0
 11  Griseofulvin                       90    0   96   91   80   90   92   94    0   95   89
 12  Leukomycin tartrate                94   82   96   95   94   93   94   93   28   95   15
 13  Lincomycin hydrochloride           38  100   90   95   89   71   66   65  100    0    0
 14  Mikamycin B                        91   13   95   81   94   96   95   55   25   97   27
 15  Misionin                           72    0   93   86   67   79   72   50    0    0    0
 16  Neomycin                            0   95    0   77    0   45    0    0    0    0    0
 17  Nystatin                            6    0   84   68    0   55   53    0    0    0    0
 18  Oxacillin                          46   88   93   96   95   66   68    0   94   58    0
 19  Paromomycin                         0   92    0   81    0   22    0    0    0    0    0
 20  Penicillin G-Na                    40   94   88   94   91   62   64    0  100    0    0
 21  Penicillin V-K                     37   88   88   94   94   57   62    0  100    0    0
 22  Polymyxin B sulphate                0   97   67   96    0   67   52    0    0    0    0
 23  Puromycin                          49   65   92   95   86   73   67   43   18    0    0
 24  Pyrrolidinemethyltetracycline       7   65   83   82   43   55   36   25   37    0    0
 25  Rifamycin O                        96    0   94  100   93  100  100  100   65  100   54
 26  Rifamycin S                        95   28   93   94   92   93   94   83   77   92   69
 27  Rifamycin SV-Na                    91   43   94   93   81   92   91   91   79  100   50
 28  Spiramycin base                    86   94  100  100   93   94   88    0    7   99    0
 29  Staphylomycin                      93   45  100    0   93   94   92   18   34  100    0
 30  Streptomycin sulphate               0  100    0   85    0   36    0    0    4    0    0
 31  Terramycin base                    25   74   92  100   54   67   55   10   52    0    0
 32  Tylosin base                       92   92   92  100   92   87   85   35   43  100    0
 33  Viomycin sulphate                   0  100    0  100    0   49    0    0    0    0    0

Table III. Communications Matrix from Data in Table II Using Equation 4

The algorithm proposed here is a classification algorithm and constitutes a pattern cognition or clustering technique. The classification problem presented earlier as a distribution network problem can therefore also be solved in this way, and vice versa. In fact, classification can be carried out by a variety of techniques, some of which are not generally considered to be OR methods. Some of these have been used in analytical chemistry. Wold (12) introduced a statistical technique called pattern cognition for the classification of GLC stationary phases. Numerical taxonomy was proposed (13) for the same purpose and for the problem of combining TLC systems (14). In terms of the latter technique, the algorithms described here are single-linkage algorithms. They are related to pattern recognition techniques, which in the last few years have become accepted techniques in analytical chemistry (15-17).

There is a difference between pattern cognition and pattern recognition techniques. In pattern recognition, one must first distinguish between two or more given classes (clusters) according to patterns (representing, for example, mass spectra) with the aid of a training set of known compounds. One then uses the pattern of an unknown compound to identify it as belonging to one of the classes. In pattern cognition, one determines which classes can be distinguished without the use of a training set. Here, we discuss OR classification techniques with a TLC example because this is a rather simple application which can be solved by simple algorithms. However, applications are also possible in GLC (13, 16) and in infrared and mass spectrometry, where more sophisticated algorithms are necessary.

Shortest Path Problem (Graph Theory and Dynamic Programming)

The theory of graphs can be applied to the optimisation of chromatographic separation schemes for multicomponent samples by using a shortest path algorithm (18). The same result can be obtained by dynamic programming (19), a method of sequential optimisation based upon Bellman's (20) principle of optimality: "A policy is optimal when, at a given stage and whatever the preceding decisions, the decisions which remain to be taken constitute an optimal policy taking into account the preceding decisions." More succinctly, an optimal policy is composed of optimal subpolicies. This principle can be applied to problems in which many decisions are required to obtain an optimal result in a system composed of sequential steps, provided that the later stages do not influence the results obtained in the earlier stages. It is used quite often in chemical reactor design and in related designs such as multistage crosscurrent liquid-liquid extraction or distillation (21). Analogous applications can be found in analytical chemistry. One can, for example, apply this to the optimisation of gradient elution chromatography or to certain countercurrent distribution applications. Since the mathematics tend to be complex, we will use two simple applications to illustrate the principles of the method, namely, a shortest path problem and an investment problem.

Let us suppose that a highway must be built between towns A and K. There are several possibilities, depicted in Figure 5. The construction costs for each section are known, and the least expensive route must be determined. A graph is constructed as in Figure 6. The points are represented on successive levels (I, II, ..., V), and the costs are calculated level by level. The calculations for B-D are trivial. For E there are three possibilities: by way of B the cost is 10, by way of C it is 9, and by way of D it is 11. Whatever path is chosen from E to K, the best route through E is by way of C. One can therefore conclude that if the optimal solution passes through E, it also passes through C. We have applied Bellman's principle of optimality: ACE is an optimal subpolicy. In the same manner, the best subpolicy is calculated for all other points on the same level. Thereafter, one passes on to the next level. The best subpolicy for H is, for example, ADGH (cost: 19). Optimal subpolicies are selected in this manner until the last level is reached, where the selected subpolicy constitutes the complete optimal policy (ADGHK). This calculation procedure eliminates the need to consider other possible combinations, for example, ABEIK.

Figure 5. Map showing possibilities for construction of highway between points A and K

Figure 6. Graph representing map given in Figure 5, as used in dynamic programming

Quite complex multielement separation problems (7-11 compounds or elements on several columns) can be solved by shortest path calculations. The simplest possible case is the separation of three ions A, B, and C on one column. These ions are considered to be initially present together on a column (notation ABC//). One can then elute one of the elements. If one first elutes C (AB//C), the next step in the separation scheme must be the separation of A and B, for example, by elution of A. This leads to the situation B//C/A. The last step in this case is the elution of B, leading to //A/B/C. If one knows the time necessary to carry out each step [for example, from a data library (18)], a graph such as the one in Figure 7 results (19). The shortest path from ABC// to //A/B/C yields the desired optimal separation scheme.

Another example of problem solving by dynamic programming is the following. A company has a certain amount of money at its disposal, which can be invested in four projects. The expected profit as a function of the amount invested is known. The question is: which combination of investments yields the highest expected profit? This problem is well known to the OR specialist. The same problem arises in analytical chemistry when a program for chemical analysis must be composed so that the amount of material consumed is minimal (and at most equal to a given quantity) and the total information obtained about a number of compounds or elements present is maximal. If the results obtained by the methods in an analytical program can be evaluated numerically, the optimal program can be obtained. The difficulty lies in the numerical evaluation (see Conclusions).

Location Problem (Heuristic and Branch and Bound Methods)


Suppose that the problem is to locate a series of p new supermarkets or other service centers in a country in which there are n villages. Each supermarket must be located in one of the villages, and the problem is to minimise the total distance from the villages to the nearest supermarket. To visualise the problem, consider Figure 8, in which 10 villages are represented by points on a map. In this case, if two supermarkets are wanted, they should be built in A and B. This splits the set into two halves by what is called a two-median, so that villages A, C, D, E, and F buy in A and villages G, H, I, J, and B buy in B. It is less easy to select visually the optimal location of three supermarkets or, in OR language, to find the three-median. Furthermore, in the economic version of this problem, additional conditions can be introduced, such as unequal sizes of the villages and the possibility for customers to fulfill a fraction of their demand in different service centers. Luckily, we do not need to consider these complications in the analytical version, which will be explained later.

Figure 7. Shortest path for separation of A, B, and C

Figure 8. Illustration of location model

The problem of finding a p-median can be solved in two different ways. One can find the optimal solution by a branch and bound method, or a "good" solution (not necessarily the optimal one) by a heuristic method. The latter is usually employed as a starting solution for the former. A heuristic method can be defined as a technique that solves programming problems by searching for a feasible solution which is not necessarily the optimal one. The principal advantage of a heuristic method is the speed with which a "good" solution can be found. This makes it possible to obtain acceptable solutions to much larger problems than those solvable by exact techniques.

In the heuristic solution, one starts with a 0-median, i.e., a solution with 0 chosen elements. One element is then iteratively added to the solution until p such elements, or in other words a p-median, have been obtained. At each step, the element added is the one that causes the largest decrease in distance. Distance is defined here as the sum of the distances from each element to the nearest chosen element. One then investigates whether a better solution can be obtained by exchanging one of the chosen elements for an unchosen one. If this is possible, the exchange which causes the largest decrease is carried out. This is repeated until no further decrease is observed.

Branch and bound methods are fairly recent. They were proposed by Land and Doig (22) in 1960 for solving the linear integer programming problem and have since found many uses, for example, in the renowned traveling salesman problem.

The basic idea of the branch and bound method is the following. Suppose that the given objective function is to be minimised and that a solution is already available (found, for instance, by a heuristic method). First, the set of all solutions is partitioned into several subsets (branch). Then, for each subset, a lower bound on the objective function over the solutions of that subset is computed, i.e., the lowest value that the objective function could attain in that subset (bound). The subsets whose bounds exceed the value of the known solution are then excluded from further consideration. One of the remaining subsets is then partitioned further into smaller subsets. Lower bounds are computed for the new subsets, and the process is repeated until a subset contains only one solution. If this solution is better than the best one found previously, it replaces it. When all subsets have been excluded, the method terminates.

This was applied to a GLC problem. In GLC there is a large variety of stationary phases. To characterise them, one measures the retention indices of a number of standard solutes, or functional "probes". This is, for example, the basis of the Rohrschneider (23) index. A question which recurs regularly in the literature is how many and which probes should be used. Most authors answer this question on the basis of a retention index library, which is thought to be representative of the whole retention index universe, i.e., the retention indices of all compounds on all known GLC phases. Such a library can take the form of a data set containing the retention indices of some 60 compounds on 25 GLC phases (24). Of course, there can be some doubt as to whether such a library is representative of all possible compounds and GLC phases. However, we need not go into this here, and we can restate the question of the selection of the probes: how can one choose a number of standard solutes so that they are as representative as possible of a given set?
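The greedy-addition and exchange heuristic described above, followed by a branch and bound search seeded with its result, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' program. The six points A-F and their positions on a line are hypothetical (they are not the villages of Figure 8 or Rohrschneider's data), distances are absolute differences, and the function names are illustrative. The lower bound used is the total distance obtained when every still-undecided candidate is opened as well. This is one simple valid bound, not necessarily the one used in the article.

```python
"""Sketch: p-median by greedy addition + interchange, then branch and bound.

All data below are hypothetical; dist[j][i] is the distance between
point j and candidate median i (a symmetric square matrix).
"""


def total_distance(dist, medians):
    """Sum over all points of the distance to the nearest chosen median."""
    return sum(min(row[m] for m in medians) for row in dist)


def greedy_add(dist, p):
    """Start from the 0-median and repeatedly add the point giving the
    lowest total distance (i.e. the largest decrease)."""
    chosen = []
    for _ in range(p):
        candidates = [i for i in range(len(dist)) if i not in chosen]
        best = min(candidates, key=lambda i: total_distance(dist, chosen + [i]))
        chosen.append(best)
    return chosen


def interchange(dist, chosen):
    """Carry out the single exchange (one median out, one non-median in) with
    the largest decrease; repeat until no exchange lowers the total."""
    chosen = list(chosen)
    while True:
        best_cost = total_distance(dist, chosen)
        best_trial = None
        for out in chosen:
            for new in range(len(dist)):
                if new in chosen:
                    continue
                trial = [new if m == out else m for m in chosen]
                cost = total_distance(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            return sorted(chosen)
        chosen = best_trial


def branch_and_bound(dist, p, incumbent):
    """Exact p-median by depth-first branch and bound.

    Branch: candidate k is or is not a median. Bound: also opening every
    still-undecided candidate can only lower the cost, so that cost is a
    valid lower bound for every solution in the subset."""
    n = len(dist)
    best = {"set": sorted(incumbent), "cost": total_distance(dist, incumbent)}

    def search(k, chosen):
        if len(chosen) == p:  # subset reduced to a single solution
            cost = total_distance(dist, chosen)
            if cost < best["cost"]:
                best["set"], best["cost"] = sorted(chosen), cost
            return
        if len(chosen) + (n - k) < p:  # not enough candidates left
            return
        if total_distance(dist, chosen + list(range(k, n))) >= best["cost"]:
            return  # bound: this subset cannot beat the known solution
        search(k + 1, chosen + [k])  # branch: k is a median
        search(k + 1, chosen)        # branch: k is not a median

    search(0, [])
    return best["set"], best["cost"]


# Hypothetical example: six points A-F at these positions on a line.
names = ["A", "B", "C", "D", "E", "F"]
positions = [0, 1, 3, 10, 12, 13]
dist = [[abs(a - b) for b in positions] for a in positions]

start = greedy_add(dist, 2)
improved = interchange(dist, start)
optimum = branch_and_bound(dist, 2, improved)
print("greedy:", [names[i] for i in start], total_distance(dist, start))
print("after interchange:", [names[i] for i in improved], total_distance(dist, improved))
print("branch and bound:", [names[i] for i in optimum[0]], optimum[1])
```

On this made-up instance, the greedy additions alone stop at C and E (total distance 8). One exchange moves the solution to B and E (total 6). The branch and bound search, started from either solution, finds no pair that does better. This illustrates why the heuristic is used only as a starting solution for the exact method.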
One OR approach to this problem is to construct a graph whose nodes are the compounds from which the probes must be selected. The similarities between the solutes can be expressed numerically in one way or another, here as 1 - ρ, where ρ is the correlation coefficient between the retention indices observed for the solutes on a selected set of stationary phases. These values can then be treated as distances. In this way, one obtains a complete graph whose edge values are the distances. This situation can be represented as in Figure 8 (the edges are not drawn for the sake of clarity, but their values are assumed to be known). If A ... J are the solutes and one wants to represent them by two probes, then one would choose A and B. Mathematically, the problem can be described as:

$$\text{Minimise} \quad \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} X_{ij}$$

subject to

$$\sum_{i=1}^{n} X_{ij} = 1, \qquad j = 1, \ldots, n$$

$$X_{ij} \le Y_i, \qquad i, j = 1, \ldots, n$$

$$\sum_{i=1}^{n} Y_i = p$$

$$Y_i \in \{0, 1\}, \quad X_{ij} \in \{0, 1\}, \qquad i = 1, \ldots, n;\ j = 1, \ldots, n$$

where:
p = number of probes.
$d_{ij}$ = distance between substance j and probe i.
$X_{ij}$ = a coefficient indicating which probe represents substance j. $X_{ij}$ = 1 if j is closest to probe i and is therefore represented by i, and 0 otherwise.
$Y_i$ = a coefficient indicating whether substance i was selected as a probe. $Y_i$ = 1 if it was selected and 0 otherwise.

Because of the complexity of the mathematics, a detailed account of methods and results will be presented in a later article. As a sample result, Table IV gives the p-medians (p = 1, ..., 6) obtained with the heuristic method for Rohrschneider's data set (23). This discussion will be confined to the p = 5 result. Rohrschneider proposed ethanol, methyl ethyl ketone, nitromethane, pyridine, and benzene as functional probes, whereas Table IV gives ethanol, propionaldehyde, acetonitrile, dioxane, and thiophene. Ethanol is found both among our probes and among Rohrschneider's. There is very little difference between methyl ethyl ketone and propionaldehyde (ρ = 0.9995), between acetonitrile and nitromethane (ρ = 0.9988), and between benzene and thiophene (ρ = 0.9989). The difference between dioxane and pyridine is somewhat larger. However, pyridine is different from all other solutes in the set, and the solute that resembles it most is dioxane. This result, obtained in an entirely independent way, confirms the judiciousness of Rohrschneider's choice. This correspondence indicates that the location model proposed here gives valid results. It can be applied with success to the choice of sets of p = 4 or fewer functional probes and to the selection of representative substances in other branches of analytical chemistry.

Table IV. Functional Probes Selected from Rohrschneider's Set Using Heuristic Method
p = 1: ethyl bromide
p = 2: dioxane, cyclopentanol
p = 3: benzene, crotonaldehyde, cyclopentanol
p = 4: ethanol, crotonaldehyde, dioxane, thiophene
p = 5: ethanol, propionaldehyde, acetonitrile, dioxane, thiophene
p = 6: benzene, ethanol, phenylacetylene, propionaldehyde, acetonitrile, n-dibutyl ether

Sequencing Problem (Heuristic and Branch and Bound Methods)

In many instances, analytical tests are applied in a predetermined sequence with the aim of identifying a compound. Several problems arise in the optimisation of such a scheme. The first is the minimisation of either the average number or the maximal number of tests in a dichotomous scheme. Katona (25) investigated this problem using noiseless encoding or combinatorial search theory. These techniques are not generally considered part of OR and will therefore not be discussed here.

Secondly, imagine a toxicological laboratory whose purpose is to detect and identify poisons in toxicological samples. Each sample contains only one poison, and the probability of occurrence $p_i$, $i = 1, \ldots, n$, of each poison is known. The poison is identified by carrying out, one by one, a number of selective identification methods, each of which identifies only one compound. The mean execution time $t_i$, $i = 1, \ldots, n$, of each method is also known. The problem is to determine the sequence of methods that minimises the mathematical expectation of the sum of the execution times of the methods that have to be carried out. A simple heuristic method was developed which, as was proved subsequently, gives the exact optimal answer. The method can be described as follows:
• Rank the methods in order of decreasing $p_i$.
• Invert methods $i$ and $i + 1$ if $p_i t_{i+1} - p_{i+1} t_i < 0$, and repeat until no such inversions are possible.

Conclusions
The examples given show that OR techniques do have applications in analytical chemistry. These techniques have already been used to solve real analytical problems (the location and distribution problem, for instance). This is only to be expected in a science such as analytical chemistry, where many possibilities must often be compared or combined. At first sight, it is rather surprising that more applications have not been proposed in the literature.
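The two-step sequencing rule for the toxicological laboratory can be checked numerically. Under the model in the text, suppose the methods are run in a given order and the poison is compound $k$. Then every method up to and including the one for $k$ is carried out. The expected time is therefore $\sum_k p_k C_k$, where $C_k$ (a symbol introduced here) is the cumulative time up to and including method $k$. Exchanging neighbours $i$ and $i + 1$ changes this expectation by exactly $p_i t_{i+1} - p_{i+1} t_i$. Each inversion made under the rule therefore lowers the expected time, and the procedure ends with the methods in decreasing order of $p_i / t_i$ (assuming all $t_i > 0$). The Python sketch below is illustrative only. The function names and the probabilities and times are hypothetical, and enumeration of all orders is included only as a check for small $n$.

```python
from itertools import permutations


def expected_time(order, p, t):
    """Expected total execution time when methods are run in `order`
    until the method that identifies the poison has been carried out."""
    total, elapsed = 0.0, 0.0
    for k in order:
        elapsed += t[k]            # method k is carried out
        total += p[k] * elapsed    # with probability p[k] the search stops here
    return total


def sequence_methods(p, t):
    """Two-step rule: rank by decreasing p_i, then invert neighbours
    while p_i * t_{i+1} - p_{i+1} * t_i < 0."""
    order = sorted(range(len(p)), key=lambda k: -p[k])
    swapped = True
    while swapped:
        swapped = False
        for pos in range(len(order) - 1):
            i, j = order[pos], order[pos + 1]
            if p[i] * t[j] - p[j] * t[i] < 0:
                order[pos], order[pos + 1] = j, i
                swapped = True
    return order


def best_by_enumeration(p, t):
    """Check all n! sequences; only practical for small n."""
    return list(min(permutations(range(len(p))),
                    key=lambda o: expected_time(o, p, t)))


# Hypothetical probabilities of occurrence and mean execution times (minutes).
p = [0.5, 0.3, 0.2]
t = [10.0, 2.0, 1.0]

order = sequence_methods(p, t)
print(order)  # [2, 1, 0]
print(expected_time(order, p, t))
```

On these data, ranking by decreasing $p_i$ alone gives an expected time of 11.2 min. The inversions reach the order [2, 1, 0] with 7.6 min, which is the same order found by enumeration.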

1256 A · ANALYTICAL CHEMISTRY, VOL. 47, NO. 14, DECEMBER 1975

Analytical chemists may simply not know these techniques, or may not have realised that they can be applied in this domain. Beyond this, there are a few reasons why operations research techniques have not been used more frequently. In many of the examples given, it is necessary to evaluate numerically such properties of analytical procedures as the information yield (the investment problem) or the similarity of requisites such as TLC phases (the communication network problem). In some instances, such as the investment problem, this is not easy at all. In fact, the utility of the solution obtained in such a problem can be questionable because an accurate scale of values is lacking. When the application of OR techniques in analytical chemistry is investigated more closely, one often encounters the difficulty of assigning meaningful figures of merit to a technique. This conclusion stresses the need felt by many authors [Kaiser (26), Wilson (27), Dupuis and Dijkstra (28), and ourselves (7)] to develop more scientific ways of expressing the performance characteristics of analytical methods.

The other problem that hampers the application of such techniques is that analytical problems are often too complex to be reduced to a size that models such as the graphs used in this report can handle. This is also the main difficulty encountered in more classical, i.e., economic, applications. In industry, management is likely to reject the optimal solution of a personnel allocation problem for fear of trouble with the trade unions. In the same way, the analytical chemist will reject an optimal solution that directs him to add strong perchloric acid to a hot ether extract. He will likewise reject an optimal set of GLC phases when he wants to work at high temperatures and one of the phases is known to have insufficient thermal stability.
In such cases, the solution obtained by OR techniques serves mainly as a touchstone for evaluating more realistic solutions, i.e., for seeing how close to optimality these solutions come. Notwithstanding these difficulties, OR techniques could find many more applications in analytical chemistry than at present. This is particularly true for the combination, selection, classification, or comparison of spectroscopic, chromatographic, or extraction systems, for which meaningful descriptors are relatively easy to obtain. OR techniques also have the advantage of working with models that are easily visualised. They should therefore offer an opportunity to formalise our way of thinking about analytical chemistry problems. They should also help eliminate some of the empiricism with which we solve such frequently occurring and important questions as the selection of the best analytical method for a given problem.

Acknowledgment
The authors thank P. Hansen, H. De Clercq, M. Detaevernier, J. Smeyers-Verbeke, E. Blockeel, and other collaborators for permission to use unpublished ideas and results.

References
(1) S. N. Deming and S. L. Morgan, Anal. Chem., 45, 278A (1973).
(2) R. Goulden, Analyst, 99, 929 (1974).
(3) T. De Vries, "Het Klinisch-Chemisch Laboratorium in economisch perspectief," H. E. Stenfert-Kroese, Leiden, The Netherlands, 1974.
(4) S. Zionts, "Linear and Integer Programming," Wiley, New York, N.Y., 1974.
(5) J. B. Kruskal, Proc. Am. Math. Soc., 7, 48 (1956).
(6) H. De Clercq and D. L. Massart, presented at the VIIIth International Symposium on Chromatography and Electrophoresis, Brussels, Belgium, 1975.
(7) A. C. Moffat and K. W. Smalldon, J. Chromatogr., 90, 9 (1974).
(8) D. L. Massart, ibid., 79, 157 (1973).
(9) T. J. Allen, Res. Dev. Manage., 1, 14 (1971).
(10) P. A. Frost and R. D. Whitley, ibid., p 71.
(11) J. Souto and A. Gonzalez de Valesi, J. Chromatogr., 46, 274 (1970).
(12) S. Wold, Dept. of Statistics, University of Wisconsin, Rept. No. 357, Madison, Wis.
(13) D. L. Massart, P. Lenders, and M. Lauwereys, J. Chromatogr. Sci., 12, 617 (1974).
(14) D. L. Massart and H. De Clercq, Anal. Chem., 46, 1988 (1974).
(15) T. L. Isenhour, B. R. Kowalski, and P. C. Jurs, Crit. Rev. Anal. Chem., 4, 1 (1974).
(16) A. Eskes, F. Dupuis, A. Dijkstra, H. De Clercq, and D. L. Massart, Anal. Chem., 47, 2168 (1975).
(17) B. R. Kowalski, ibid., p 1152A.
(18) D. L. Massart, C. Janssens, L. Kaufman, and R. Smits, ibid., 44, 2390 (1972).
(19) D. L. Massart, C. Janssens, L. Kaufman, and R. Smits, Z. Anal. Chem., 264, 273 (1973).
(20) R. Bellman, "Dynamic Programming," Princeton Univ. Press, Princeton, N.J., 1957.
(21) G. S. G. Beveridge and R. S. Schechter, "Optimisation: Theory and Practice," McGraw-Hill, New York, N.Y., 1970.
(22) A. H. Land and A. G. Doig, Econometrica, 28, 497 (1960).
(23) L. Rohrschneider, J. Chromatogr., 22, 6 (1966).
(24) W. O. McReynolds, ibid., 12, 113 (1974).
(25) G. O. H. Katona, in "A Survey of Combinatorial Theory," J. N. Srivastava et al., Eds., p 285, North-Holland, Amsterdam, The Netherlands, 1973.
(26) H. Kaiser, Anal. Chem., 42 (2), 24A (1970).
(27) A. L. Wilson, Talanta, 20, 725 (1973).
(28) F. Dupuis and A. Dijkstra, Anal. Chem., 47, 379 (1975).

Financial assistance by FKFO and FGWO.


Désiré L. Massart was born in Ghent, Belgium, where he obtained his chemistry degree and later his PhD (in 1969) at the local university as a member of Professor Hoste's research team on activation analysis. In 1968 he was appointed to the then new Flemish University of Brussels (Vrije Universiteit Brussel). He currently teaches general analytical chemistry and food analysis at the university's Pharmaceutical Institute and is director of its laboratory of analytical chemistry. His current research covers applications of atomic absorption spectrometry in the medical sciences and of ion-selective electrodes in food, pharmaceutical, and environmental analysis. It also includes fundamental studies on ion exchange and ion-selective electrodes. He is author or coauthor of more than 60 scientific papers. Dr. Massart's personal research ambition is to develop objective (i.e., mathematical) methods for the characterisation, selection, or optimisation of analytical methods.

Leonard Kaufman obtained a degree in mathematics at Brussels University in 1970 and a PhD in operations research in 1975. At present he is an assistant in the Department of Statistics and Operations Research of Brussels University and teaches a course in operations research at the Polytechnicum of Lille, France. His research interests include location problems in 0-1 programming and the application of optimisation techniques to analytical chemistry. In these fields he is author or coauthor of 10 scientific papers.
