J. Chem. Inf. Comput. Sci. 1991, 31, 225-233


Atom-by-Atom Searching Using Massive Parallelism. Implementation of the Ullmann Subgraph Isomorphism Algorithm on the Distributed Array Processor†

PETER WILLETT* and TERENCE WILSON
Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, U.K.

STEWART F. REDDAWAY
Active Memory Technology, 65 Suttons Park Avenue, Reading RG6 1AZ, U.K.

Received January 30, 1991

The AMT Distributed Array Processor (DAP) is a massively parallel SIMD processor array that contains thousands of processing elements. This paper describes the implementation of atom-by-atom searching on the DAP using Ullmann's subgraph isomorphism algorithm. Two alternative algorithms are discussed. The first of these allows rapid processing of a single structure, the adjacency matrix of which is distributed across the array of processing elements. The second is much slower in execution for a single molecule but allows very large numbers of structures to be searched in parallel. With current codes, the first algorithm is faster and out-performs a large mainframe; however, developments of the second algorithm are expected to make it the faster of the two. Combined algorithms are also described that utilize both approaches.

INTRODUCTION

Parallel processing involves the use of some, or many, processors operating in parallel to reduce the elapsed time that is required to carry out some computational task. Parallel processing techniques have been used for scientific and engineering applications for many years; more recently, there has been increasing interest in the use of such techniques for applications that involve extensive nonnumeric or symbolic computation. One such application is that of database searching, which involves the matching of a query record against a large database of analogous records to identify those that satisfy some retrieval criterion.
While each individual matching operation may be quite rapid in execution, searching can be highly demanding of computational resources when a large database containing many records needs to be processed:

*To whom all correspondence should be addressed.

†Dedicated to Michael Lynch to celebrate his 25 years at the University of Sheffield.

this suggests the use of a data-parallel system architecture that allows the query record to be compared with as many database records as possible at the same time. The last few years have seen many reports of the use of parallel computer hardware for a range of chemical and biochemical database applications. Examples of this include studies of searching protein sequence databases, of clustering chemical structure databases, and of both atom-by-atom and maximal common substructure searching. In this paper, we report the initial results of an investigation of the implementation of atom-by-atom searching using a massively parallel computer, the Active Memory Technology (AMT) Distributed Array Processor (DAP). The second section describes the particular atom-by-atom searching procedure that we have investigated, this being based on Ullmann's subgraph isomorphism algorithm. The third section then describes the DAP. The fourth section presents two, very different, ways of implementing the Ullmann algorithm on the


DAP, and this is followed, in the fifth section, by the results of atom-by-atom searches that use these two implementations. The paper concludes with a summary of our main findings and suggestions for future work.

ULLMANN'S SUBGRAPH ISOMORPHISM ALGORITHM

Chemical substructure searching involves the identification of all of those compounds in a structure database that contain some specified query substructure, irrespective of the environment in which the query substructure occurs. Substructure searching is the chemical equivalent of the graph-theoretic problem of subgraph isomorphism, which involves determining whether a query graph is contained within another graph and which is known to belong to the class of NP-complete computational problems; in the chemical context, subgraph isomorphism is usually referred to as atom-by-atom searching. The NP-complete nature of atom-by-atom searching means that a chemical substructure search normally commences with a screening search, which rapidly eliminates that great percentage of the database that cannot possibly contain the query substructure.

There is an obvious brute-force algorithm for subgraph isomorphism, given the adjacency matrices A and B representing a query pattern and a database structure containing Pα and Pβ atoms, respectively. The algorithm involves generating all Pβ!/[Pα!(Pβ - Pα)!] combinations of Pα atoms from the structure and then determining whether any of these combinations is an exact match for the query pattern, this match involving an exhaustive check of the arrangement of all of the atoms and bonds. This is prohibitively expensive for all but the smallest queries and structures, and backtracking and pruning procedures are normally used to order and to control the search. Such procedures are widely used for improving the efficiency of graph matching operations (and of other types of combinatorial problems that can be expressed as a depth-first tree search). Ullmann noted that drastic increases in the efficiency of depth-first backtracking search can be obtained by the use of a refinement procedure. This is a heuristic that is called at each node of the search tree in order to eliminate branches leading to mismatches and, thus, to minimize the amount of searching that is required. Ullmann advocated the use of the refinement technique in the context of general graphs. Brint and Willett demonstrated that it appeared to be very well suited to the processing of 3-D chemical graphs, and the algorithm has been used in 3-D substructure searching systems for both small molecules and proteins. Downs et al. subsequently showed that the refinement procedure was analogous to the use of relaxation techniques in substructure searching and that it appeared to be more efficient for 2-D substructure searching than the conventional set reduction algorithm.

Using the notation given previously, the refinement procedure involves a match matrix, M, that contains all of the possible equivalences between atoms from A and atoms from B. The elements of this matrix, m_ij (1 ≤ i ≤ Pα; 1 ≤ j ≤ Pβ), are such that

m_ij = 1 if the ith atom of A can be mapped to the jth atom of B, and 0 otherwise

M is initialized by setting to one all elements that correspond to pairs of atoms (a_i, b_j) that are of the same atomic type and that have sets of connected atoms and bonds such that the set surrounding a_i is identical with or a subset of that surrounding b_j. The Ullmann heuristic may be stated as follows: given a query atom a_i that has a neighbor x, and a database atom b_j that can be mapped to a_i, then there must exist a neighbor of b_j, y, that can be mapped to x. This can be stated more formally as follows: if a_i corresponds to b_j, i.e., if m_ij = 1, then

∀x(1…Pα)[(a_ix = 1) ⇒ ∃y(1…Pβ)(m_xy ∧ b_jy = 1)]

(i.e., for all x, if x is a neighbor of a_i, this implies that there must exist an atom y in B such that x and y correspond, and y is a neighbor of b_j). If this condition is not true, then m_ij is set to 0. The heuristic above is tested for every m_ij = 1, with i = 1…Pα and with j = 1…Pβ. Generally some of the m_ij elements will be negated, thus reducing the number of possible mappings between query and database atoms. If during this process any element of M is negated, the refinement is repeated, since this may cause further changes to be made. If at any stage during the refinement process the following condition holds

∃i(1…Pα)[m_ij = 0 ∀j(1…Pβ)]

(i.e., there exists an atom i in A such that m_ij = 0 for all of the atoms, j, of B) then a mismatch is identified, since there exists a query atom that cannot be mapped to any of the database atoms. This condition is tested after every iteration of the refinement procedure so that mismatches are identified as rapidly as possible. The refinement procedure can thus be expressed in pseudo-code as follows:

mismatch := FALSE
REPEAT
  change := FALSE
  FOR i := 1 TO Pα DO
    FOR j := 1 TO Pβ DO
      IF m_ij THEN
      BEGIN
        m_ij := ∀x(1…Pα)[(a_ix = 1) ⇒ ∃y(1…Pβ)(m_xy ∧ b_jy = 1)]
        change := change OR (NOT(m_ij))
      END
  mismatch := ∃i(1…Pα)[m_ij = 0 ∀j(1…Pβ)]
UNTIL (NOT(change) OR mismatch)
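The serial refinement procedure described above can be sketched in Python for the binary-adjacency case. This is our illustration rather than code from the paper: the function name refine and the nested-list representation of A, B, and M are assumptions.

```python
def refine(A, B, M):
    """Ullmann refinement. A (query) and B (structure) are 0/1 adjacency
    matrices as nested lists; M is the match matrix, modified in place.
    An entry M[i][j] survives only if every neighbour x of query atom i
    has some candidate neighbour y of database atom j with M[x][y] = 1.
    Returns False as soon as some query atom loses all candidates."""
    Pa, Pb = len(A), len(B)
    change = True
    while change:
        change = False
        for i in range(Pa):
            for j in range(Pb):
                if not M[i][j]:
                    continue
                # for every neighbour x of a_i there must exist a
                # neighbour y of b_j such that m_xy = 1
                ok = all(
                    any(M[x][y] and B[j][y] for y in range(Pb))
                    for x in range(Pa) if A[i][x]
                )
                if not ok:
                    M[i][j] = 0
                    change = True
        # mismatch: some query atom i with m_ij = 0 for every j
        if any(not any(row) for row in M):
            return False
    return True
```

For example, mapping a two-atom query into a triangle leaves every entry of M intact, whereas a database record with no bonds at all is rejected immediately.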

The description so far has been simplified in that we have considered only binary adjacency matrices, i.e., matrices that note whether two atoms are connected, without consideration of the type of bond that links them together. Such information is readily incorporated by assuming that the matrices A and B contain integer bond types rather than Boolean connectivities. In this case, the Ullmann heuristic may be restated as follows: given a query atom a_i that has a neighbor x connected to it by a bond of type t, and a database atom b_j that can be mapped to a_i, then there must exist a neighbor of b_j, y, that is connected to it by a bond of type t and that can be mapped to x. Note that the symmetric nature of the adjacency matrices means that b_yj = b_jy and thus the mathematical formulation of the heuristic is

∀x(1…Pα)[(a_ix = t) ⇒ ∃y(1…Pβ)(m_xy ∧ (b_jy = t))]
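The bond-typed condition for a single candidate pair (i, j) can be sketched as follows; this is our illustration (the function name consistent is an assumption), with 0 denoting "no bond" in the integer adjacency matrices.

```python
def consistent(A, B, M, i, j):
    """Bond-typed Ullmann condition for one pair (i, j): every neighbour
    x of query atom i, joined by a bond of type t = A[i][x], must have a
    counterpart y with M[x][y] = 1 that is joined to database atom j by
    a bond of the same type (B[j][y] == t)."""
    Pa, Pb = len(A), len(B)
    return all(
        any(M[x][y] and B[j][y] == A[i][x] for y in range(Pb))
        for x in range(Pa) if A[i][x] != 0
    )
```

A query atom carrying a double bond (type 2) can then only be mapped onto database atoms that themselves carry a double bond to a mappable neighbour.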

A pseudo-code version of the complete Ullmann algorithm is

given in the Appendix to this paper, while a worked example of its use is presented by Mitchell et al.

DISTRIBUTED ARRAY PROCESSOR

The DAP is an example of a processor array. The earliest designs for processor arrays, by Unger and by Slotnick et al., defined the basic machine architecture that consists of a 2-D synchronous array of simple processing elements, or PEs, under the supervision of a single master control unit, or MCU, which broadcasts instructions for execution in the processor array. A schematic diagram of the DAP is shown in Figure 1, from which it will be seen that a processor array is an example of a single instruction stream, multiple

Figure 1. Schematic diagram of the DAP, illustrating the way in which instructions are broadcast from the MCU to the processor array.

data stream (or SIMD) computer, with all of the PEs executing the same sequence of instructions in parallel: a facility known as masking can be used to disable PEs temporarily as a result of previously executed instructions. Individual types of processor array are characterized by the number and complexity of the PEs, the size of the local memory that is associated with each of the PEs, and the type of interconnection network that is used to link the PEs together. The first processor array actually constructed was the ILLIAC IV, which was a word-slice computer with one word per PE processed at a time. More recent processor arrays, such as the DAP, the Massively Parallel Processor, and the Connection Machine, have all been based on arrays of bit-organized PEs. The bit-level organization results in arithmetic processes being carried out in software, so that operation times are dependent on their bit complexity and precision. Thus, an operation on a 16-bit integer is faster than that on a 32-bit integer, and integer operations are much faster than real operations. This characteristic of the DAP also ensures that logical operations are executed very efficiently indeed and that fast, high-level functions can be built up in software to operate in parallel on the array of PEs. The DAP was the first array processor to be marketed commercially, having been originally developed by International Computers Limited using a large mainframe processor as the host; the DAP is now manufactured by AMT, who produce versions hosted by VAX or SUN workstations. Current models consist of 1024 or 4096 PEs in a 32 × 32 or 64 × 64 2-D array (the DAP 510 and DAP 610, respectively) and with memory sizes of up to 512 Mbytes. An individual PE is connected to its N, E, S, and W neighbors, so as to give a 2-D array, and the PEs in each row and column are connected by data highways, which allow data to be broadcast (or received) by the MCU to (or from) all of the PEs simultaneously.

These linkages are all bit-serial in nature and are illustrated in Figure 2. PEs at the edges of the array are connected to those on the opposite boundary in a wraparound fashion. There is a body of powerful software, the Parallel Data Transforms (PDTs) referred to under Algorithm II, for achieving a wide range of types of data rearrangement. A fast input/output channel allows the transfer of data in or out of DAP memory at a rate of some 50 Mbytes/s, this characteristic making the DAP well suited to real-time applications such as signal processing and graphics. The DAP is usually programmed in either a high-level Fortran extension, Fortran Plus, which was used for all of the work reported here, or a low-level assembler, APAL (although other language extensions, such as C, Prolog, and Ada, are under development or have been studied). Fortran Plus is essentially an extension of Fortran 77 that allows the full exploitation of the parallel capabilities of the DAP and that encompasses many of the features that are expected to become available in Fortran 90, e.g., vector and matrix handling functions. A national DAP service is provided in the U.K. by the Centre for Parallel Computing at Queen Mary and Westfield

Figure 2. Interconnection of the PEs comprising the DAP's processor array.

College, University of London. The experiments here used a DAP 610 at the Centre. This machine contains a 64 × 64 array of PEs, each of which has 8 Kbytes of local storage, giving 32 Mbytes overall.

PARALLEL IMPLEMENTATIONS OF THE ULLMANN ALGORITHM

Introduction. There are two main types of data parallelism that can be used to implement a sequential algorithm on a processor array. Inner loop parallelism involves the identification of any inherent parallelism in the algorithm that is to be implemented, so that different parts of it can be executed concurrently. On an SIMD machine, this involves applying the same operations to (subsets of) the data structures involved in the algorithm. Outer loop parallelism involves the same, undivided algorithm being applied to different data items at the same time. This type of parallelism is particularly well suited to the architecture of processor arrays since the records comprising the database that is to be searched can be distributed across the available PEs. A search is then effected by loading the query record into the array processor's master control unit and broadcasting the attributes that have been specified in the query for comparison with the sets of attributes characterizing the records stored in each of the PEs. The distribution of the database may be done statically, by assigning each PE one record (or group of records), with all PEs sharing the same code; a review of the use of processor arrays for database searching using this approach is provided by Willett and Rasmussen. Alternatively, the distribution of the database can be carried out dynamically by means of load balancing procedures, which ensure that records are assigned to PEs during program execution as the latter become available. This model of computation is often used with parallel machines that make use of the processor farm model of computation, where computational work is allocated to processors as they become free for further processing; the use of processor farms for database searching is discussed by Cringean et al. and by Downs et al.
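Outer loop parallelism can be mimicked on a conventional machine by mapping one unchanged matching routine over many database records concurrently. The sketch below is our illustration, not the DAP code: the matches routine is only a stand-in screening test on atom-type counts, and the record format (strings of atom symbols) is invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def matches(query, record):
    """Stand-in for a full atom-by-atom match: a multiset test on atom
    types (a screening criterion, not the real subgraph search)."""
    q, r = Counter(query), Counter(record)
    return all(r[t] >= n for t, n in q.items())

def search(query, database, workers=4):
    """Outer loop parallelism: the same, undivided matching routine is
    applied to every database record at the same time; each worker
    plays the role of one PE holding its own record."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda rec: matches(query, rec), database)
    return [rec for rec, hit in zip(database, hits) if hit]
```

The key property is that the matching routine itself is unaware of the parallelism; only the distribution of records across workers changes.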
These two, rather different approaches to parallel searching are exemplified by the two DAP implementations of the Ullmann algorithm that we have studied: Algorithm I uses inner loop parallelism, while Algorithm II uses outer loop parallelism.

Algorithm I. The Ullmann algorithm consists of two main parts, these being the depth-first backtracking tree search and the refinement procedure that is called at each node of this search tree. Much of the first stage is inherently sequential


in character and thus not obviously suited to an SIMD implementation. However, the great bulk of the processing takes place during the refinement procedure, and this is eminently well suited to the DAP, since refinement involves a large number of logical operations that can be mapped onto the DAP with great efficiency. Specifically, one bit in each of the 64 × 64 array of PEs can be used to represent one element in each of the logical matrices, A, B, and M, that lie at the heart of the Ullmann algorithm: the DAP allows the simultaneous processing of an entire matrix, whereas a conventional processor requires that each matrix element, m_ij, must be processed in sequence. We have noted that the Ullmann heuristic is

∀x(1…Pα)[(a_ix = 1) ⇒ ∃y(1…Pβ)(m_xy ∧ b_jy = 1)]    (1)

Let us define a vector R, the elements of which, r_i, are given by

r_i = ∃y(1…Pβ)(m_iy ∧ b_yj = 1)    (2)

then the Ullmann condition may be rewritten as

m_ij := m_ij ∧ ¬(¬r_i ∧ a_ix),  ∀i, j such that m_ij = 1    (3)

This is the reverse of eq 1 and is inefficient for sequential implementation: it is, however, useful when considering a parallel implementation of the algorithm. If A, B, and M are DAP-sized matrices, with each element corresponding to a single PE, r_i (i = 1…Pα) may be evaluated in parallel and the results placed in a vector R:

r_i = ∃y(1…Pβ)(m_xy ∧ b_yj = 1),  x = j, x = 1…Pα    (4)

The matrices M and B must be aligned to allow this operation to be executed. This is done by working with M^T, instead of M, to allow m_xy ∧ b_yj to be executed as a single logical AND of two matrices. The vector R may be determined by using the DAP function ORROWS, which evaluates, for each row of a matrix, the OR of all of the elements in that row: this operation is also performed efficiently on the DAP. To complete a single relaxation, eq 3 must be completed for j = 1…Pβ given a fixed x. For the DAP implementation the reverse case is adopted: when executing eq 3, x varies over 1…Pα, since we deal with the matrix A in parallel and j is fixed by x = j. In order to vary j in the domain j = 1…Pβ for a fixed x, M^T is shifted after each execution of eq 3, Pβ times in all. It follows that the matrix A must also be shifted similarly, which corresponds to checking the heuristic for a single neighbor x and updating M accordingly. To execute the heuristic for the other neighbors, the adjacency matrices must be aligned correctly by shifting. The actual order in which neighbors are processed depends upon the position in the array; however, the heuristic is complete for every neighbor after Pβ shifts. Equations 3 and 4 may be written in Fortran Plus as

R = MATR(ORROWS(M.AND.B))
M = M.AND..NOT.((.NOT.R).AND.A)

where MATR is a DAP operator that replicates a vector ES times to fill a matrix rowwise (ES is the edge size of the DAP, which was 64 for the DAP 610 that we have used). The second line of code can be simplified by using De Morgan's theorem to give

M = M.AND.(R.OR..NOT.A)

The relaxation algorithm may thus be implemented as follows, where ALL, LEQ, SHEC, and TRAN are Fortran Plus operators for checking that some condition holds in all of the PEs, for logical equivalence, for shifting East in a cyclic

manner, and for transposing a matrix, respectively, and where ORCOLS is analogous to ORROWS:

REPEAT
  MTRAN = TRAN(M)
  MCOPY = M
  FOR J := 1 TO Pβ DO
  BEGIN
    R = MATR(ORROWS(MTRAN.AND.B))
    M = M.AND.(R.OR..NOT.A)
    MTRAN = SHEC(MTRAN)
    A = SHEC(A)
  END
  EXIT = .NOT.(ALL(ORCOLS(M)))
  NOCHANGE = ALL(MCOPY.LEQ.M)
UNTIL (EXIT.OR.NOCHANGE)
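The whole-matrix relaxation that the Fortran Plus code achieves with shifts and ORROWS can be imitated on conventional hardware with dense Boolean matrix products. The sketch below is our reformulation, not the DAP code: it replaces the shift-based alignment with matrix multiplication, forming C[x, j] = ∃y(m_xy ∧ b_yj) in one product and then negating every m_ij for which some neighbour x of i has C[x, j] = 0.

```python
import numpy as np

def refine_parallel(A, B, M):
    """Whole-matrix Ullmann refinement. A (Pa x Pa) and B (Pb x Pb) are
    Boolean adjacency matrices, M (Pa x Pb) the Boolean match matrix.
    C[x, j] = exists y: M[x, y] and B[y, j]; an entry M[i, j] survives
    only if no neighbour x of i has C[x, j] = 0. Iterates to a fixed
    point and returns the refined match matrix."""
    while True:
        C = (M.astype(int) @ B.astype(int)) > 0
        # (A @ ~C)[i, j] > 0  <=>  some neighbour x of i lacks a
        # mappable neighbour y of j, so m_ij must be negated
        new = M & ~((A.astype(int) @ (~C).astype(int)) > 0)
        if (new == M).all():
            return new
        M = new
```

As in the DAP formulation, every element of M is updated simultaneously from a single pass of matrix-level logical operations, rather than element by element.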

This code requires the use of a DAP with an edge size, ES, such that ES = Pβ, i.e., a DAP that contains Pβ × Pβ PEs, if the cyclic shifts are to route the data in A and MTRAN correctly. This is unlikely to be the case in practice, and one of the following strategies will need to be adopted: Perform the loop ES times: this is optimally efficient when Pβ = ES but is much less efficient when Pβ