15
Downloaded by UNIV OF CALIFORNIA SANTA BARBARA on April 4, 2018 | https://pubs.acs.org Publication Date: November 6, 1981 | doi: 10.1021/bk-1981-0173.ch015
The Advanced Flexible Processor, Array Architecture
BRUCE COLTON
Control Data Corporation, P.O. Box 1249-B, Minneapolis, MN 55440
The Advanced Flexible Processor (AFP) is a relatively powerful computer employing a highly parallel architecture. It has been designed to stand alone, hosted by a general-purpose computer, or to function within arrays of Advanced Flexible Processors. All of the features for efficient interprocessor communication and control are built into each Advanced Flexible Processor to allow efficient computation in a multiprocessing environment.

The Advanced Flexible Processor was developed by an advanced computer research division of Control Data called the Information Sciences Division (ISD). The Information Sciences Division began work on the Advanced Flexible Processor in 1976. Our primary goal was to develop a programmable computing machine that would provide the computational power and speed required by many of the intense algorithmic processes associated with image processing, while providing some of the flexibility of a general-purpose machine (1).
0097-6156/81/0173-0245$05.50/0 © 1981 American Chemical Society
Lykos and Shavitt; Supercomputers in Chemistry ACS Symposium Series; American Chemical Society: Washington, DC, 1981.
Early Research and Development in Multiprocessing

In 1968 Control Data began research into the feasibility of performing some of the tasks associated with image processing functions, such as modular change detection, on digital computers. CDC began this effort by testing algorithms on the supercomputer of that day, the CDC 6600. Based on this preliminary algorithmic study of the modular change detection problem, ISD began development of the 1280 Change Detection System, a hardwired-logic implementation built specifically to perform the Change Detection Algorithm.

Our experience in designing and building special-purpose hardwired systems for image processing applications indicated the need for a more flexible approach to the development of special-purpose systems. In 1972 the Information Sciences Division began development of the Flexible Processor (FP), a programmable, special-purpose computer employing a highly parallel architecture. The Flexible Processor, like its successor the Advanced Flexible Processor, was designed to operate as an individual programmable processing element in an array of other individually programmable elements. The Flexible Processor used a global bus interconnection system between processors. Later investigations sought other interconnection network architectures that might be better suited to a Multiple Instruction, Multiple Data Stream (MIMD) type of array architecture (2,3).

This initial research into various interconnection schemes led ISD to develop a ring connected architectural approach to linking Flexible Processors in large multiprocessing arrays. In 1976 Control Data delivered its first modular change detection system built around the Flexible Processor ring connected architecture to Wright-Patterson Air Force Base.

Research indicated that a processor capable of performing at computational rates 10 times that of the Flexible Processor was in order and would be required to meet the burgeoning computational demands of the 1980s (4-7). Thus, Control Data Corporation began the development of the Advanced Flexible Processor using the latest LSI technology, which was developed by CDC for use in its most advanced Cyber computers.

AFP Hardware Overview

An Advanced Flexible Processor is implemented on four large scale integrated (LSI) circuit panels. The component technology is emitter coupled logic (ECL) chips. Each LSI panel carries a total of approximately 500 F200K ECL logic chips and 1,100 ECL 100K logic chips. The Advanced Flexible Processor employs the same freon cooling system used in CDC's Cyber 200 series computers. This technology provides an increased reliability
figure at the chip level of approximately 100 times that achievable using ECL 100K logic chips in an air-cooled environment. The rough computational capabilities provided by an array of 16 Advanced Flexible Processors would be approximately 800 billion arithmetic and logical operations per second. A far larger number of operations could be added to the total if one were to count the many operations associated with operand transfer and data management which are concurrently performed by the AFP in support of the arithmetic and logical computations.

Interconnection Technology. AFP systems employ a ring connected architectural concept. The interprocessor communication rate between two adjacent Advanced Flexible Processors in the communications ring is approximately 800 million bits per second. A unique characteristic of the ring connected architecture employed by the Advanced Flexible Processor provides a distinct advantage in the performance capability of multiprocessor systems. Program partitioning strategies allow one to realize proportional increases in available ring system intercommunication bandwidth as processors are added to the multiprocessor array. This feature is in direct contrast to other multiprocessor architectures, in which interprocessor communication is strangulated as processors are added to the system. As a result of this unique feature, an array of 16 Advanced Flexible Processors may provide overall system bandwidth for intercommunications of 26 billion bits per second.

AFP Performance Benefits. Comparisons between the performance of the Advanced Flexible Processor and other current supercomputers have been made on the image processing Change Detection Algorithm. The Advanced Flexible Processor has been determined to be approximately 2,000 times faster than a CDC 6600 on the Change Detection Algorithm, and to provide approximately 100 times the capability of the CDC 7600 computer. The Advanced Flexible Processor is found to perform 20 times faster than its predecessor, the Flexible Processor. In terms of cost effectiveness, the Advanced Flexible Processor appears to be at least two orders of magnitude more cost-effective than any of the current supercomputers on the Change Detection Algorithm, and one order of magnitude more cost-effective than its predecessor, the Flexible Processor.

Architectural Concepts

The following discussion will review some of the issues related to the choice of a multiprocessing solution for those problems for which general-purpose uniprocessors do not provide adequate solutions.
Pipelined Processing. Consider a processing facility composed of a single processor to which is presented an incoming stream of data elements. Computations are to be performed upon these incoming data elements, and output results are to be provided on a real or near real time basis (Figure 1a). When the number of operations to be performed on the incoming data elements increases to the point where a single processor cannot provide output results within the required real or near real time constraints, or where an input backlog is steadily growing, the compute power of the single processor must be augmented by adding processors to the system to work jointly on the common task at hand. The common task would be partitioned among the added processors in a pipeline fashion, where each processor would operate only upon a single serial stage of the entire computation and would pass its intermediate results on to the next cooperating processing element, which would be working on the next sequential stage of the computation (Figure 1b). As each computational stage completes the processing of a data element, the next data element in sequence may be input to that stage, and stage processing initiated. The pipeline is "full" when there is a data element simultaneously being processed through each and every stage of the pipeline. Each of the N processors, corresponding to the N stages of the pipeline, would then be busy, contributing to the total processing power brought to bear on the problem.

Parallel Processing. If the incoming data rate of the proposed system were to increase to a point beyond the individual I/O capability of a single processor, then processors would have to be added to the computation in a parallel fashion, each performing identical operations on a parallel set of data elements (Figure 1c).

In summary, when the number of instructions in a particular algorithm increases beyond the capability of a single processor to provide real time or near real time results, the additional required processors are added in a pipeline fashion, whereas when the I/O rate of an individual processor is exceeded, processing elements must be added in a parallel fashion. The Modular Change Detection system developed by the Information Sciences Division consisted of four pipelines with ten processors in each pipe.

General Multiprocessor Taxonomies

In general it is not adequate to provide capability for only parallel or pipeline configurations of processing elements, or for that matter some parallel-pipelined combination thereof. Algorithms are generally more complex than that, and require more complex feedback paths, such as those exemplified in recursive types of algorithms.
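The pipeline-versus-parallel rule summarized above can be written as a small sizing calculation. The following is only a sketch under stated assumptions: the function name, its parameters, and the example rates are hypothetical and do not come from the original system, and the work is assumed to divide evenly among pipeline stages. The example numbers were chosen only so that the result has the same shape as the four-pipeline, ten-processor arrangement mentioned in the text.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def size_array(ops_per_element: int,
               elements_per_sec: int,
               proc_ops_per_sec: int,
               proc_io_elements_per_sec: int) -> tuple[int, int]:
    """Return (parallel_lanes, stages_per_lane) for a streaming workload.

    Hypothetical model: add parallel lanes until each lane's input rate
    fits one processor's I/O capability, then add pipeline stages until
    each stage's share of the work fits in one element period.
    """
    # Parallel fashion: driven by the I/O rate of an individual processor.
    lanes = ceil_div(elements_per_sec, proc_io_elements_per_sec)
    # Pipeline fashion: driven by the operation count per data element.
    total_ops_per_sec = ops_per_element * elements_per_sec
    stages = max(1, ceil_div(total_ops_per_sec, lanes * proc_ops_per_sec))
    return lanes, stages


# Illustrative (invented) workload: 100 operations per element at
# 4 million elements/s, on processors that each sustain 10 million
# operations/s and accept 1 million elements/s.
lanes, stages = size_array(100, 4_000_000, 10_000_000, 1_000_000)
print(lanes, stages)
```

With these invented figures the calculation yields four lanes of ten stages each. Real partitioning would also have to account for uneven stage costs and the feedback paths discussed in the text.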
Figure 1. Pipelined and parallel processing: (a) uniprocessor, (b) pipelined multiprocessing, (c) parallel multiprocessing.
Four generalized types of intercommunication architecture for multiprocessing that may be considered are shown in Figure 2:

1) a global bus interconnection architecture, where all communication between the processors occurs on a single, common data bus;
2) a fully interconnected architectural scheme, where each cooperating processing element has a unique data path to every other processing element in the array;
3) a shared memory type of processing element interconnection, where all communication and data transfer occurs through a common, shared memory facility; and
4) a ring connected architectural concept, which consists of a circular interconnection of processing elements, where each processing element is directly connected to its two neighboring processors in the ring.

The Advanced Flexible Processor uses the latter two interconnection schemes: a dual, counter-rotating ring interconnection system as well as a common shared memory facility.

Each of the previously mentioned interconnection architectures possesses characteristic strengths and weaknesses, requiring evaluation on the basis of several criteria. These architectures may be evaluated on the basis of: 1) system reliability, 2) expansion capability, and 3) cost.

System Interconnect Reliability. From the standpoint of reliability, the shared memory system and the global bus both have problems in the area of single-point failures. If a failure of the bus or the central memory occurs, the entire system is incapacitated. A ring system, when bypass hardware is employed, demonstrates very good fault tolerant characteristics. The fully interconnected system is the best of the four systems considered in the area of fault tolerance, since each processor has a dedicated path to every other processor for intercommunications.

System Interconnect Expandability. From the standpoint of expansion limitations, the shared memory system has problems in that the number of ports is fixed. Expanders can be used to alleviate this problem to some degree, but physical construction problems are ultimately met. Also, the memory bandwidth of the shared memory system is fixed and is relatively slow, thus limiting the degree of practical expansion.

A global bus system has limited fanout capabilities; electrical problems are generally encountered after a relatively low number of processors are added to the system. Also, the global bus system demonstrates the lowest bandwidth capability of all of the systems, since all of the processing elements use the common shared bus. In fact, the operating bandwidth of the global bus system will never reach its theoretical maximum due to the idle time spent while processors access the bus, release the bus, and resolve bus access conflicts.
The fully interconnected system is limited in that the number of ports is fixed with respect to the processing elements. While expanders can be used, ultimate physical problems will be encountered.

In the ring system, the bandwidth between adjacent processors is fixed; however, utilizing the special characteristics of a ring connected architecture provides system intercommunication bandwidths which tend towards the arithmetic product of this fixed interprocessor bandwidth and the number of processors in the system. Thus, a proportionate increase in supporting intercommunications bandwidth is available as processors are added to the system. Expansion within a ring connected system is, of course, virtually unlimited and has a very low cost impact.

System Interconnect Cost Performance. One may obtain a measure of the cost effectiveness of the four generalized architectures by plotting the cost to throughput ratio for each architecture as a function of the number of processors in the system (Figure 3). The shared memory system is the most expensive of the four generalized architectures, with the global bus system a close second. The fully interconnected system is about 5 times more cost-effective than a global bus approach for a 30-processor system; however, the ring system is superior to all when the process is partitioned to take advantage of the unique bandwidth characteristics that a ring connected architecture provides.

Advanced Flexible Processor Architecture

The Advanced Flexible Processor is a unique and powerful architecture providing an extremely high degree of flexibility and cost-effectiveness. It consists of 16 relatively autonomous functional units interconnected by a powerful 16 x 18 port crossbar interconnect. Each of the data paths interconnected by the crossbar is 16 bits wide. Table I describes the functional unit breakdown of the Advanced Flexible Processor. A conceptualized functional organization of the AFP is shown in Figure 4.
Figure 3.
Normalized system cost/throughput vs. number of processors.
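The ring bandwidth scaling that underlies this comparison can be checked with simple arithmetic. The sketch below uses the 800 million bits per second adjacent-processor rate given in the text. Treating the 26 billion bits per second figure for a 16-processor array as two rings times 16 links is an assumption made here, not something the text states, and the shared-bus comparison uses a hypothetical bus with the same raw rate as one ring link.

```python
LINK_BITS_PER_SEC = 800_000_000  # adjacent-processor ring rate from the text


def ring_aggregate_bandwidth(n_processors: int, rings: int = 2,
                             link: int = LINK_BITS_PER_SEC) -> int:
    """Upper bound on ring system bandwidth in bits per second.

    Assumes the program is partitioned so that traffic is mostly
    nearest-neighbor, letting every link carry distinct data at once.
    The default of two rings is an assumption based on the dual,
    counter-rotating ring described in the text.
    """
    return n_processors * rings * link


def bus_share_per_processor(n_processors: int,
                            bus: int = LINK_BITS_PER_SEC) -> float:
    """Best-case share of a hypothetical shared bus per processor,
    ignoring the arbitration idle time the text describes."""
    return bus / n_processors


print(ring_aggregate_bandwidth(16))   # 25,600,000,000: about 26 billion
print(bus_share_per_processor(16))    # each processor's slice shrinks with N
```

Under these assumptions the ring total grows in proportion to the number of processors, while the per-processor share of a single bus falls as processors are added.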
Figure 4.
Functional organization of the advanced flexible processor.
TABLE I. FUNCTIONAL UNIT BREAKDOWN

Number of Units   Type of Functional Unit        Number of Pipelined Segments
2                 External Memory Access Unit    1
2                 Ring Port I/O Units            1
1                 Control Unit                   2
2                 Adders Unit                    2
1                 Multiplier Unit                3
2                 Shift Boolean/Logic Unit       2
4                 2K Data Memory Units           2
2                 8 Word File Registers          2
Computations may be streamed through the Advanced Flexible Processor very efficiently due to the dual I/O port characteristics of the internal architecture. Data elements may be independently streamed in and out of the Advanced Flexible Processor through any one or all of the four I/O channels. For example, data may be streamed in through one of the memory I/O channels, computations performed, and then streamed out through one of the other three I/O channels simultaneously.

Multifunctional Parallelism. The internal architecture of the Advanced Flexible Processor allows multiple computational streams to be constructed and executed in parallel. By way of example, one might imagine the multiply unit requesting operands from one of the memory I/O ports and one of the data memories, while at the same time an adder may be requesting the product computed by the multiplier on a previous machine cycle and another data element from one of the remaining three data memories to serve as input operands for an addition operation. At the same time, the remaining adder may be using the sum produced by the first adder on a previous machine cycle and the result from a shift boolean operation to perform a subtraction. The ultimate goal, of course, is to get as many functional units
executing simultaneously as possible and thereby achieving the highest concurrency of execution.

Each of the internal functional units of the AFP is I/O buffered to its respective crossbar ports as shown in Figure 5. Each functional unit is equipped with input latch registers, buffering the crossbar inputs, and output latch registers, buffering the functional unit outputs to the crossbar. This design allows the intermediate storage of variables between the functional units and thus allows the functional units of the AFP to be pipelined together with the maximum flexibility. Single or multiple pipelined chains are easily supported through the crossbar as a result of this method of "direct data hand-off" between the functional units.

Advanced Flexible Processor Performance

The machine cycle time of the Advanced Flexible Processor is 20 nanoseconds. Every functional unit can provide results every 20 nanoseconds. Thus, 50 million 16-bit multiplies, 200 million 16-bit data memory references, and 100 million 16-bit adds or subtracts, etc., can be performed every second. The maximum operational speed of the Advanced Flexible Processor, therefore, is 800 million operations per second when all 16 functional units are executing.

AFP I/O Performance. The ring port I/O unit provides the interface for each Advanced Flexible Processor to the ring interconnect system. Two ring ports are provided to each Advanced Flexible Processor, and thus the capability for dual-ring interconnection systems exists. The ring port I/O unit handles all of the data management, synchronization, and protocol required to communicate on the ring system without interrupting the arithmetic processing of the Advanced Flexible Processor. The external memory access unit provides the interface between the AFP and the central, high-performance, random access memory store. Each external memory access unit can provide peak data I/O rates of 3.2 billion bits per second and sustained I/O rates of 800 million bits per second. Thus, the total sustained capability of an Advanced Flexible Processor from the two ring port I/O units and the two external memory access units is 3.2 billion bits per second.

AFP Computational Performance. The multiply unit of the Advanced Flexible Processor provides the capability to produce two 16-bit products or one 32-bit product every 20 nanosecond machine cycle. The multiplier also provides the capability to do population and significant counts. The two adders provide the capability of performing four 8-bit adds, two 16-bit adds, or one 32-bit add every 20 nanosecond machine cycle. The shift
Figure 5. Register level organization of a generic AFP functional unit. Labeled elements: crossbar input and output, instruction and control inputs, compare register (CR), and a condition latch reporting Z = 0, address compare, overflow, and most significant bit (MSB).
boolean units allow barrel shifts of up to 15 bits every 20 nanosecond machine cycle and are capable of performing all of the 16 basic boolean logic functions. Each data memory allows the reading or writing of one 16-bit word every machine cycle. The file memories allow the reading and writing of four 16-bit words every machine cycle.

The control unit manages program execution and handles branching and accessing of program instructions. The individual program memory within the control unit of each AFP consists of 1,024 program instructions. Each program instruction is 200 bits wide and provides the capability of issuing 39 instruction parcels every 20 nanoseconds. The control bandwidth of the AFP is thus very high, and allows flexibility in control for the easy management and execution of the 16 functional units and the crossbar reconfiguration on a machine cycle basis. As a result, the Advanced Flexible Processor is capable of performing 100 million, 250 million, or 500 million arithmetic and logic operations every second in the 32-bit, 16-bit, or 8-bit modes of operation, respectively.

Comparison Testing. Latching registers, as shown in Figure 5, are also provided within each functional unit for the storage of input comparand values. Arithmetic computations and testing of resultant outputs can thus be performed concurrently within all of the functional units. The current conditional status of each functional unit can therefore be provided to the control unit every machine cycle for branch decision processing. When counting all of the arithmetic and logic operations plus all of the comparison results provided by each of the functional units, one finds the Advanced Flexible Processor capable of performing an astounding 2.9 billion 8-bit operations per second. This compute capability represents the upper theoretical limit for the Advanced Flexible Processor. On average, a typical computational process can keep four of the arithmetic functional units plus several memory and I/O units busy concurrently, allowing a single AFP to achieve an average computational rate of about 200 to 250 million 16-bit arithmetic and logical operations per second. The features provided by a single AFP are summarized in Table II.

The very modular construction of AFP systems and of the AFP itself allows for very cost-effective system implementation. The modularization of functional units about the crossbar interconnect allows the enhancement of AFP performance specifications by replacing existing functional units with specialized functional units designed specifically to meet performance requirements. Typical examples of specialized functions are fast Fourier transform units and floating point add, multiply, and divide/square root units.
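Several of the peak figures quoted in this section follow directly from the 20 nanosecond machine cycle. The short calculation below is only a sketch of that arithmetic: the dictionary keys are names invented here, and the unit counts and per-channel I/O rate are taken from the text.

```python
CYCLE_NS = 20
CYCLES_PER_SEC = 1_000_000_000 // CYCLE_NS  # 50 million machine cycles/s

SUSTAINED_CHANNEL_BITS_PER_SEC = 800_000_000  # per I/O channel, from the text


def peak_rates() -> dict[str, int]:
    """Peak per-second rates assuming one result per unit per cycle."""
    return {
        "multiplies": 1 * CYCLES_PER_SEC,          # one multiplier unit
        "data_memory_refs": 4 * CYCLES_PER_SEC,    # four data memories
        "adds_or_subtracts": 2 * CYCLES_PER_SEC,   # two adders
        "all_units": 16 * CYCLES_PER_SEC,          # all 16 functional units
        # two ring ports plus two external memory access units
        "sustained_io_bits": 4 * SUSTAINED_CHANNEL_BITS_PER_SEC,
        "instruction_parcels": 39 * CYCLES_PER_SEC,
    }


for name, rate in peak_rates().items():
    print(f"{name:>20}: {rate:,} per second")
```

The parcel issue rate works out to 1.95 billion per second, and a full program memory of 1,024 instructions holds 1,024 x 39 = 39,936 parcels.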
TABLE II SINGLE AFP FEATURES

FEATURE: 250 MILLION ARITHMETIC COMPUTATIONS PER SECOND FOR EACH AFP
ADVANTAGE: EXPANDABLE COMPUTE POWER TO MATCH APPLICATION

FEATURE: FUNCTIONALLY DESIGNATED INTERMEDIATE OPERAND REGISTERS
ADVANTAGE: ALLOWS UNINTERRUPTED COMPUTATION STREAMING, ELIMINATING REGISTER RESERVATION HICCUPS

FEATURE: DIRECT DATA HAND-OFF BETWEEN 16 FUNCTIONAL UNITS THROUGH CROSSBAR SWITCH
ADVANTAGE: PROVIDES BROADEST CAPABILITY FOR MULTIPLE CHAINING WITH NO REQUIREMENTS ON OPERAND INTERDEPENDENCE

FEATURE: DATA FAN OUT OF 1:16 ON ALL FUNCTIONAL UNITS
ADVANTAGE: ELIMINATES OPERAND CONTENTION, ALLOWING MULTIPLE USE OF A SINGLE OPERAND IN ONE MACHINE CYCLE

FEATURE: FOUR INDEPENDENT DATA MEMORIES PROVIDING CONCURRENT ACCESS AND COMBINED CAPABILITY TO SUPPLY 16 INPUT REQUESTS SIMULTANEOUSLY
ADVANTAGE: PROVIDES 8 KB OF CIRCULATING VECTOR STORAGE, AVOIDING COSTLY VECTOR LENGTH START-UP TIMES

FEATURE: FOUR INDEPENDENT I/O PORTS PROVIDING SIMULTANEOUS READ/WRITE ACCESS TO HPR MEMORY
ADVANTAGE: ELIMINATES VECTOR LENGTH HICCUPS IN COMPUTATION STREAM; PEAK BANDWIDTH 8 BILLION BITS/SECOND; SUSTAINED BANDWIDTH 3.2 BILLION BITS/SECOND

FEATURE: 200 BIT WIDE INSTRUCTION PACKET
ADVANTAGE: INSTRUCTION ISSUE RATE IS 39 INSTRUCTION PARCELS/CYCLE OR 2 BILLION INSTRUCTIONS PER SECOND

FEATURE: TRANSPARENT SINGLE LEVEL INTERRUPT EXCHANGE MANAGEMENT
ADVANTAGE: NO SPECIAL INTERRUPT EXCHANGE SOFTWARE PACKAGES REQUIRED

FEATURE: INSTRUCTION CACHE SIZE OF 1024 INSTRUCTION PACKETS, EACH 200 BITS WIDE
ADVANTAGE: 40 THOUSAND INSTRUCTION PARCELS PER PROGRAM INTERVAL
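The instruction figures in Table II can be checked against one another with a little arithmetic. The sketch below uses only the numbers quoted in the table; the machine cycle time it prints is not stated there and is inferred under the assumption that one 200-bit packet (39 parcels) issues per machine cycle.

```python
# Figures quoted in Table II (the only inputs taken from the text).
PACKET_BITS = 200             # width of one instruction packet
PARCELS_PER_PACKET = 39       # instruction parcels issued per machine cycle
ISSUE_RATE_PER_SEC = 2.0e9    # "2 billion instructions per second" (rounded)
CACHE_PACKETS = 1024          # instruction cache size, in packets

# Derived values. ASSUMPTION: one packet issues per machine cycle, so the
# cycle time follows from parcels per cycle divided by parcels per second.
implied_cycle_ns = PARCELS_PER_PACKET / ISSUE_RATE_PER_SEC * 1e9
cache_parcels = CACHE_PACKETS * PARCELS_PER_PACKET
cache_bits = CACHE_PACKETS * PACKET_BITS

print(f"implied machine cycle: about {implied_cycle_ns:.1f} ns")
print(f"parcels held in the instruction cache: {cache_parcels}")
print(f"instruction cache size: {cache_bits} bits ({cache_bits // 8} bytes)")
```

Under that assumption the cache holds 39,936 parcels, which matches the "40 thousand instruction parcels" entry to rounding.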
AFP System Architecture
System arrays of Advanced Flexible Processors are linked together and synchronized via facilities provided by the ring port functional units. Data elements 16 bits wide, along with 12 bits of control information, are passed between ring ports on adjacent AFPs. The control information provides all of the addressing information needed to define the single processor or subset of system processors to which the message is to be sent. The control field of the ring packet also identifies the data register file in which the incoming data element is to be stored. Each data memory is capable of defining 16 independent data files. Designated bits within the control field provide interprocessor synchronization information as well, and logic within the ring port uses these bits to achieve cross-file synchronization. These features ensure that a processor cannot begin a computational task until the data file or set of data files to be used as operands in the pending computation has been stored in the processor. The synchronizing control features also prevent another processor from overwriting files within a computing processor. Input and output FIFO buffering provides elasticity in communication between processors on the ring systems to minimize processor idle time.
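The ring packet described above (16 data bits plus 12 control bits) can be illustrated with a small packing sketch. Only the two field widths come from the text. The split of the control field into destination, file, and synchronization bits, and all names below, are hypothetical and chosen purely for illustration; the paper does not give the actual layout.

```python
from dataclasses import dataclass

DATA_BITS = 16      # data element width, from the text
CONTROL_BITS = 12   # control field width, from the text

# HYPOTHETICAL split of the 12 control bits; the real layout is not given.
DEST_BITS = 6       # destination processor (or subset) address
FILE_BITS = 4       # selects one of 16 data files
SYNC_BITS = 2       # interprocessor synchronization flags
assert DEST_BITS + FILE_BITS + SYNC_BITS == CONTROL_BITS


@dataclass(frozen=True)
class RingPacket:
    dest: int
    file_id: int
    sync: int
    data: int

    def pack(self) -> int:
        """Pack the fields into one 28-bit word, control field above data."""
        fields = ((self.dest, DEST_BITS), (self.file_id, FILE_BITS),
                  (self.sync, SYNC_BITS), (self.data, DATA_BITS))
        for value, width in fields:
            if not 0 <= value < (1 << width):
                raise ValueError(f"{value} does not fit in {width} bits")
        control = ((self.dest << (FILE_BITS + SYNC_BITS))
                   | (self.file_id << SYNC_BITS) | self.sync)
        return (control << DATA_BITS) | self.data

    @staticmethod
    def unpack(word: int) -> "RingPacket":
        """Recover the fields from a packed 28-bit word."""
        data = word & ((1 << DATA_BITS) - 1)
        control = word >> DATA_BITS
        sync = control & ((1 << SYNC_BITS) - 1)
        file_id = (control >> SYNC_BITS) & ((1 << FILE_BITS) - 1)
        dest = control >> (FILE_BITS + SYNC_BITS)
        return RingPacket(dest, file_id, sync, data)


def ready_to_compute(required_files: set, arrived_files: set) -> bool:
    """Cross-file synchronization: start only once every operand file is in."""
    return required_files <= arrived_files


packet = RingPacket(dest=5, file_id=3, sync=1, data=0xBEEF)
word = packet.pack()
print(f"{word:028b}", RingPacket.unpack(word) == packet)
print(ready_to_compute({1, 2}, {1}), ready_to_compute({1, 2}, {1, 2, 7}))
```

The `ready_to_compute` helper models the rule that a task may not start until all of its operand files have arrived; in the AFP this gating is done by ring port logic, not by software.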
Thus, due to the built-in capabilities of the ring port functional units, the processing elements are released from the inflexible lock-step synchronization required of other single instruction, multiple data stream (SIMD) machines and multiple instruction, multiple data stream (MIMD) machines. Further, the system allows multiple elastic pipelines to be constructed across system AFPs, which function as powerful processing elements in the dual ring connected architecture.
A Minimum AFP System. A minimum AFP system configuration would consist of a host computer, presently a PDP 11/70, communicating with a single AFP via a modified ring port interconnection, MRP/C (Figure 6). An AFP operating as an attached processor in this configuration would enhance the performance of the host processor by providing a capability of 250 million additional arithmetic computations per second. The ring port interface units through which rings of AFPs may be interconnected are indicated in Figure 6 by the abbreviation RP( ).
Multiprocessor AFP Systems. AFPs can be easily added to the minimum system shown in Figure 6. A typical multiprocessor expansion is shown in Figure 7. AFPs are interconnected on the
Figure 6. Minimum AFP system configuration. (Diagram labels: DISK, TAPE, CRT; PDP 11/70; MRP/C; RP(M); AFP; RP(S).)
Figure 7. Typical AFP system configuration showing capabilities for expansion. (Diagram labels: HOST COMPUTER; DISK, TAPE, CRT; HPR MEM 1/8 MB.)
host ring, with each additional AFP augmenting the computational capabilities of the system by 250 million arithmetic operations per second. An additional ring interconnection channel, shown in Figure 7, is also provided for interprocessor communication and control. Up to 256 Advanced Flexible Processors can be supported on each system ring.
Centralized High Performance Memory. A multiprocessor system of AFPs may share a common high-performance random access memory store (HPR) between processors. All system HPR requests sent from the external memory access units (XMAU) of the AFPs are managed by the Storage Access Controller (SAC). Multiple SACs may be employed as memory requirements expand. Each SAC is capable of transferring data to and from the AFP array at a sustained rate of 6.4 billion bits per second. This centralized, high-performance memory store may be expanded from 125 kilobytes to 16 million bytes, providing a maximum memory bandwidth of 12.8 billion bytes per second.
The advanced technique of processor intercommunication significantly reduces processor idle time. Processor idle time is further reduced through a sophisticated hierarchical approach to mass memory and I/O management, which ensures continuous data support to the processing elements and a continuous computational flow. All memory and communication paths are designed to support extremely high bandwidths.
Centralized Mass Memory Facility.
In addition to the high-performance central memory, AFP systems may be configured to provide a centralized mass memory facility composed of slower, relatively inexpensive memory technologies that can be accessed by each system AFP. The mass memory hierarchy can be configured to meet individual requirements and may include MOS random-access storage, disks, tapes, and memory archives, as well as high-speed interfaces to general peripheral I/O devices and display stations. The MOS memory technology provides low-cost random access storage at a fraction of a cent per bit. By comparison, the higher-performance technology used in the HPR central memory costs 3-4 cents per bit. Sustained read/write data rates to and from MOS memory can exceed 1,000 million bits per second. MOS capacity may be expanded from 256 kilobytes to 1 billion bytes, providing ample random access storage at a low cost per bit to meet the storage requirements of very large problems cost-effectively.
AFP System Performance for Specific Applications
A number of specific applications for the AFP have been studied at the Information Sciences Division. The performance of single and multiprocessor systems of AFPs has been assessed for these applications. The computational performance of the Advanced Flexible Processor on a representative set of these algorithms is shown in Table III. Beyond these areas of investigation, broader applications for the Advanced Flexible Processor are being investigated. Data retrieval systems (8) as well as floating point applications are starting to be addressed.
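Several Table III entries can be cross-checked by multiplying the amount of work by the quoted kernel rate. The sketch below does this for the FFT, geolocation, and 1024 x 1024 transpose entries. The butterfly count assumes a radix-2 FFT, which the paper does not state; the variable names are illustrative only.

```python
# Cross-check of selected Table III entries: total time = work items * kernel rate.

# Complex FFT, 1024 point. ASSUMPTION: radix-2 algorithm, giving
# (N/2) * log2(N) butterflies; the paper does not name the FFT variant.
N = 1024
stages = N.bit_length() - 1            # log2(N) for a power of two
butterflies = (N // 2) * stages

fft_time_1_afp = butterflies * 80e-9   # 80 ns/butterfly on one AFP
fft_time_4_afp = butterflies * 20e-9   # 20 ns/butterfly on four AFPs

# Geolocation: every message is compared against every location of interest.
pairs = 100_000 * 100
geo_time = pairs * 40e-9               # 40 ns/pair

# Matrix transpose of a 1024 x 1024 array at 20 ns/point.
transpose_time = 1024 * 1024 * 20e-9

print(f"FFT butterflies: {butterflies}")
print(f"FFT, 1 AFP:  {fft_time_1_afp * 1e3:.2f} msec (table: 0.4 msec)")
print(f"FFT, 4 AFPs: {fft_time_4_afp * 1e3:.2f} msec (table: 0.1 msec)")
print(f"geolocation: {pairs} compares, {geo_time:.2f} sec (table: 0.4 sec)")
print(f"transpose:   {transpose_time * 1e3:.1f} msec (table: 21 msec)")
```

Each computed time agrees with the tabulated total to the precision given in the table, which suggests the kernel rates are sustained over the whole problem.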
AFP Software Facilities
The programming of the AFP system is presently done with two powerful software development tools. The first is the AFP cross assembler, MICA. The second is the AFP instruction level simulator, ECHOS. Together, MICA and ECHOS allow all programming to be done "off-line." AFP programs can be written using the editor facilities of either a PDP 11/70 or a Control Data Cyber 700 series computer. The edited files are then processed by the MICA cross assembler. MICA checks for all illegal lexical and syntax usages as well as illegal hardware usages. Functional unit and crossbar usage conflicts are identified to the programmer by MICA. MICA produces a binary file of the submitted program which runs directly on the Advanced Flexible Processor.
The binary file produced by MICA also runs directly on ECHOS, the AFP instruction simulator. ECHOS provides a register level simulation of the submitted program. ECHOS interactively executes the program in software precisely the way the program will run in the Advanced Flexible Processor. A programmer can single-step through a program, specifying the printout of all or a selected set of functional registers in the AFP. The accuracy, power, and detail of the ECHOS simulator allow a programmer to confidently expect a program to run correctly the very first time it is run on an AFP. Thus, programming activities can be carried out with no interruption to useful AFP system data processing.
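The kind of hardware usage check attributed to MICA above can be sketched in a few lines. This is not MICA's algorithm, which the paper does not describe. It is a minimal illustration that assumes two rules: a functional unit may be given only one operation per machine cycle, and a crossbar destination input may be driven by only one source per cycle, while one source may fan out to many inputs. Unit and port names such as `ADD0` and `MEM0` are hypothetical.

```python
from collections import Counter


def find_conflicts(unit_ops, routes):
    """Return human-readable conflicts for one instruction packet.

    unit_ops: names of functional units given an operation this cycle.
    routes:   (source unit, destination input) pairs set up on the crossbar.
    """
    conflicts = []

    # Rule 1 (assumed): each functional unit executes one operation per cycle.
    for unit, count in sorted(Counter(unit_ops).items()):
        if count > 1:
            conflicts.append(f"functional unit {unit} used {count} times")

    # Rule 2 (assumed): each destination input accepts a single source.
    sources_by_dest = {}
    for source, dest in routes:
        sources_by_dest.setdefault(dest, set()).add(source)
    for dest, sources in sorted(sources_by_dest.items()):
        if len(sources) > 1:
            conflicts.append(f"crossbar input {dest} driven by {sorted(sources)}")

    return conflicts


# MEM0 feeding two inputs is legal fan-out; ADD0 is scheduled twice and its
# input ADD0.a is driven by two different sources, so two conflicts result.
for message in find_conflicts(
    ["ADD0", "MUL0", "ADD0"],
    [("MEM0", "ADD0.a"), ("MEM0", "MUL0.a"), ("MEM1", "ADD0.a")],
):
    print(message)
```

Detecting such conflicts at assembly time, instead of at run time, fits the off-line workflow described above, since the program never has to touch AFP hardware to be rejected.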
Development of higher level programming languages for the AFP is currently underway. FORTRAN, ADA, and the data flow language VAL, which has been developed at the Massachusetts Institute of Technology and the Lawrence Livermore National Laboratory (9), are all candidates to be supported.
Summary
The Advanced Flexible Processor is a unique entry into the multiprocessing field. It provides the dynamic capabilities offered by an MIMD machine with advanced features provided by the interprocessor ring communications network; efficient utilization of the system processors is therefore effected. Within each Advanced Flexible Processor, dynamic multiple chaining can be achieved due to the superior flexibility of the
TABLE III AFP APPLICATION PERFORMANCE

APPLICATION                                NUMBER OF AFPs   TOTAL TIME   KERNEL RATE
COMPLEX FFT, 1024 POINT,                   1                0.4 msec     80 ns/BUTTERFLY
  16 BIT ACCURACY                          4                0.1 msec     20 ns/BUTTERFLY
GEOLOCATION, 100,000 MESSAGES,             1                0.4 sec      40 ns/PAIR
  100 LOCATIONS OF INTEREST
  (10 MILLION COMPARES)
2-DIMENSIONAL MATRIX DECONVOLUTION,        1                6.6 msec     20 ns PER MULTIPLY-ADD
  (55 X 80) ELEMENTS
MULTISPECTRAL CLASSIFICATION,              16               8 msec       480 ns/POINT
  (128 X 128) POINTS
  (1240 OPERATIONS COMPUTATION PER POINT,
  2.58 BILLION OPERATIONS/SEC)
MATRIX INVERSE, (50 X 50) POINTS           1                5.0 msec     2 usec/POINT
MATRIX TRANSPOSE                           1
  (32 X 64) ARRAY                                           20 usec      10 nsec/POINT
  (1024 X 1024)                                             21 msec      20 nsec/POINT
intra-processor crossbar. Multiple functional units can operate simultaneously, with each functional unit providing a broad range of instruction-defined operational capabilities. Functional unit makeup within an Advanced Flexible Processor can be varied and optimized for variable data sets. Multiple comparisons are available within each machine cycle for simultaneous multiple condition sensing. Special-purpose functional units can replace existing functional units within the Advanced Flexible Processor, allowing processor capabilities to be tailored to the precise application requirements. Modular system construction allows compute power modularity; thus, processing systems can be cost-effectively tailored to the user's individual requirements.
Future Computational Trends. Computational requirements over the last 25 years have grown exponentially, by an order of magnitude every 8 years. Users will continue to demand higher-performance computational facilities at a rate matching or surpassing that of the previous 25 years. The demand appears to be insatiable as long as cost-effectiveness can be sustained. Semiconductor technology has been able to meet these demands over the past 25 years; however, there presently appears to be a slowing in the rate of technological advances within the semiconductor area. A circuit density increase by a factor of two every year, as predicted by Moore's law, is not presently being met, due to the increased problems being confronted in semiconductor fabrication.
There is little evidence that this trend will reverse itself over the coming years. The widening of this technological gap seems to indicate that the only way to meet the computational requirements of the scientific community in the mid-1980s is through the application of multiprocessor technology. Developing the skills to employ multiprocessor technology to solve the large scientific problems that are presently being proposed will provide the foundation to bridge the computational gap between 1985 and 1990. Future advances in semiconductor technologies will yield increases in computational speed at the circuit level. Those advances will certainly be incorporated into multiprocessing hardware, and thus the skills we develop to employ multiprocessing in the 1980s will also provide the stepping stones to meet the computational demands of the 1990s.
Acknowledgements
The information presented in this paper is a direct result of the collective knowledge gained by the personnel in the Information Sciences Division of Control Data. This knowledge has been gained from more than nine years of experience in the field of multiprocessing.
Literature Cited
1. Allen, G.R., A Reconfigurable Architecture for Arrays of Microprogrammable Processors, Special Computer Architectures for Pattern Processing, Purdue University, West Lafayette, Ind., CRC Publishing Corporation, to be published in 1981.
2. Hsu, T.T., On the Performance and Cost Effectiveness of Some Multiprocessor Systems, 1977 International Conference on Parallel Processing, August 1977.
3. Stenshoel, C.R., Production Image Processing System Design Study, Control Data Corporation Final Report to Centre National d'Etudes Spatiales, May 1977.
4. Juetten, P.G. and Allen, G.R., An Image Processor Architecture, Control Data Corporation, Minneapolis, Minnesota, 1977.
5. Allen, G.R. and Juetten, P.G., SPARC - Symbolic Processing Algorithm Research Computer, Proc. Image Understanding Workshop, Science Applications, Inc., Report No. SAI-79-814-WA, 1978.
6. Allen, G.R., Advanced Image Processing Systems Design Studies, Control Data Corporation Final Report to the Rome Air Development Center, Contract No. F30602-76-C-0362, March 1978.
7. Cyre, W.R., Allen, G.R., and Juetten, P.G., Symbolic Processing Algorithm Research Computer Progress Report, Control Data Corporation, Minneapolis, Minnesota, 1978.
8. Cyre, W.R., Applications of a Reconfigurable Array of Flexible Processors in Intelligence Information Retrieval, Control Data Corporation Final Report to the Rome Air Development Center, Contract No. F30602-78-C-0065, July 1979.
9. Ackerman, W.B. and Dennis, J.B., VAL-A Value-Oriented Algorithmic Language: Preliminary Reference Manual, Laboratory for Computer Science, Massachusetts Institute of Technology
RECEIVED July 21, 1981.