Description of a High Speed Vector Processor


Downloaded by UNIV OF CALIFORNIA SANTA BARBARA on April 11, 2018 | https://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0057.ch008

J. N. BÉRUBÉ and H. L. BUIJS
Bomem, Inc., 2371 Nicolas Pinel, Ste.-Foy, Québec, Canada

A high speed vector processor has been developed which, coupled with a low level minicomputer or microprocessor, provides an efficient data reduction facility for Fourier Transform Spectrometry. The vector processor performs the dot product of two arrays at high speed and very high precision. Since in many data reduction applications the computation of dot products presents the greatest load to the data processing system, the vector processor will be found useful for a wide range of tasks. In Fourier Transform Spectroscopy the vector processor is used to perform high speed numerical filtering and Fourier transformation.

Numerical Filter

Modern spectrometric applications sometimes demand very high spectral resolution over a relatively large optical bandwidth. A model DA3.003 Fourier Transform Spectrometer manufactured by Bomem Inc. is capable of providing spectral resolution of 0.003 cm⁻¹ over the visible to millimeter wavelength region. A large amount of data (millions of elements) is produced when such resolution is recorded over a large optical bandwidth. Fortunately, it is seldom required that such large bandwidths be analysed completely at one time; in the vast majority of cases it is preferred to limit the analysis to a succession of rather narrow portions of the spectrum, judiciously chosen for their particular information content. In such cases the interferogram vector, the raw data from the instrument, may be numerically filtered such that only the information content of the selected portion of the spectrum is retained. This results in a reduced number of data points to be Fourier transformed in order to produce the spectrum.
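To make the data reduction concrete, the following sketch estimates how many interferogram points are needed before and after filtering. Only the 0.003 cm⁻¹ resolution comes from the text. The rule of two samples per resolved spectral element, the 25,000 cm⁻¹ full band, the 50 cm⁻¹ selected portion and the function name `samples_required` are all assumptions made for illustration.

```python
import math


def samples_required(bandwidth_cm1: float, resolution_cm1: float) -> int:
    """Rough estimate: two samples per resolved spectral element (assumed rule)."""
    if bandwidth_cm1 <= 0 or resolution_cm1 <= 0:
        raise ValueError("bandwidth and resolution must be positive")
    return math.ceil(2.0 * bandwidth_cm1 / resolution_cm1)


RESOLUTION = 0.003      # cm^-1, quoted in the text
FULL_BAND = 25_000.0    # cm^-1, assumed full optical bandwidth
NARROW_BAND = 50.0      # cm^-1, assumed width of one selected portion

full = samples_required(FULL_BAND, RESOLUTION)
narrow = samples_required(NARROW_BAND, RESOLUTION)

print("points for the full band:     ", full)
print("points for the narrow portion:", narrow)
print("reduction factor:             ", round(full / narrow))
```

Under these assumptions the full band needs on the order of ten million points, in line with the "millions of elements" mentioned above, while one filtered portion needs a few hundred times fewer.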

Lykos; Minicomputers and Large Scale Computations ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

In applications where limited time is available for measurement, such as in analysing substances in chemical reaction, the rate of information generated may become greater than the storage rate normally available using data storage devices such as magnetic tape or flexible disc systems. Real time numerical filtering can be applied to reduce the data storage rate to that compatible with the storage system. Numerical filtering also reduces the volume of data storage required for a given set of experiments.

The high speed vector processor, developed for the above mentioned application, permits real time numerical filtering to be performed with input data rates up to 200,000 data points per second. The numerical filtering process used consists of numerically convolving the input data with the impulse response of a filter function having near unity gain over the desired spectral region ($\sigma_0 \pm \sigma_r/2$) and high signal rejection outside this region. Such a filter is generally classified as "non-recursive" or "finite impulse response". The general input-output relationship is of the form:

$$Y_n = \sum_{k=0}^{N-1} a_k X_{n-k} \qquad (1)$$

where $\{X_n\}$ is the input data sequence, $\{Y_n\}$ the output data sequence, and $\{a_k\}$ the coefficients of the filter (i.e. the impulse response function).

It may be noted here that non-recursive numerical filtering presents several unique characteristics which are of use to Fourier Transform Spectroscopy and other applications. Primarily, the filter function is applied to data in the digital sampled domain; the transfer function therefore operates on the signal as determined by the sample source. In the modern Fourier Transform Spectrometer, sampling is controlled very precisely from a reference interferogram generated by a stable monochromatic light source. Since the numerical filter operates on the sampled data, errors due to phase shifts and time frequency variations are not injected during the filter process. This attribute would be applicable to other applications where signal variation, and therefore sampling interval, are functions of parameters other than time.

A second characteristic of use to Fourier Transform Spectroscopy is the ability to shift the output sampling function with respect to the signal. This allows centering of the output sample function with respect to singular signals.

Fourier Transform

The raw data from the interferometer portion of a Fourier Transform Spectrometer is the autocorrelation function of the incident radiation. Once the filtering process, bandlimiting and sample function centering have been performed, the spectrum may be computed. Fast Fourier Transform (FFT) algorithms are often used for this since they provide means for transforming spectra with a minimum of numerical operations. General computing facilities with fast hardware multiply/divide and random access data storage capacity comparable to the vector length can perform the FFT quite rapidly.

Consider however the general equation for the Discrete Fourier Transform (DFT):


$$A_r = \sum_{k=0}^{N-1} X_k \exp(-2\pi j r k / N) \qquad (2)$$

where $A_r$ is the $r$th coefficient of the transformed vector (spectrum) and $X_k$ is the $k$th sample of the input vector (filtered interferogram). One can see that the structure of the DFT is identical to that of the numerical filter. If a high speed vector processor is needed for numerical filtering, the same processor can be used to perform Fourier transformation by using the DFT algorithm.

Discrete Fourier Transformation has several advantages with respect to Fast Fourier Transformation. Some examples particularly related to spectroscopy follow:

1. Fourier Transformation by DFT involves only sequential access to the data to be transformed whereas the FFT algorithm requires repeated access to different portions of the original vector and to intermediate results. The DFT can therefore be efficiently performed from sequential access devices such as digital magnetic tape or flexible disc whereas for efficient application of the FFT, either large random access memory or very high speed discs must be used.
2. Using the DFT any portion of the total spectral range may be computed and the spectral interval and sample positions may be chosen at will. It is sometimes possible, therefore, to compute a portion of the spectrum and begin plotting that portion while the computation of other portions of the spectrum continues.
3. Computation of one spectral point or several isolated points is also possible; this is sometimes useful for monitoring chemical reactions or mixtures. The processing format of the FFT, on the other hand, is relatively fixed; in order to compute one point in the spectrum, all of the points must be computed and the spectral interval and sampling positions are also fixed.
4. Calculation of the sine or cosine transformation using the DFT is performed in one-fourth the operations required for complex transformation. This is not true when the FFT is used.
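The first three advantages can be illustrated with a short sketch that evaluates equation (2) only at requested spectral positions while reading the input once, in order. It is a software illustration under stated assumptions, not a description of the processor's internals: the function name `dft_points`, the test signal and the chosen positions are hypothetical.

```python
import cmath
import math
from typing import Iterable, List, Sequence


def dft_points(samples: Iterable[float], n_total: int,
               positions: Sequence[float]) -> List[complex]:
    """Evaluate A_r = sum_k X_k * exp(-2*pi*j*r*k/N) for the chosen r only.

    The input is consumed once, in order, as it would be when read from
    tape; one accumulator is kept per requested spectral position.
    """
    acc = [0j] * len(positions)
    for k, x in enumerate(samples):
        for i, r in enumerate(positions):
            acc[i] += x * cmath.exp(-2j * math.pi * r * k / n_total)
    return acc


N = 64
# A generator: samples are produced one at a time and never revisited.
signal = (math.cos(2 * math.pi * 5 * k / N) for k in range(N))

# Two isolated points plus one off-grid position (r need not be an integer).
spectrum = dft_points(signal, N, [5, 7, 5.5])
print([round(abs(a), 3) for a in spectrum])
```

Because each accumulator is just a running sum of products, the inner loop has the same multiply-accumulate structure as the filter of equation (1); only the second operand changes from a stored coefficient to a generated sine or cosine value.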

Performance for DFT

As mentioned previously, the Fast Fourier Transform is often chosen for spectrum computation since it provides a minimal number of required operations ($n \log_4 n$ complex multiplications and additions in a radix 4 system) in order to arrive at a complete transformed spectrum. The DFT requires $n^2$ operations to fully transform a real vector using the sine or cosine transform and $4n^2$ operations for transformation of a complex asymmetric vector of $n$ points. Clearly, when long vectors are to be totally transformed the number of operations becomes very large and the time required increases. The relatively high speed of the vector processor is used to keep the processing time to a reasonable value for vectors of commonly used length (plotting restrictions often limit the length of the output vector) and, where very wide spectral intervals are to be transformed, the numerical filter is used to first separate the interval into sections such that computing time is held within acceptable limits.

Table I illustrates the computation times for direct transformation using the high speed vector processor. 32 bit floating point spectral data is assumed and I/O is assumed to be via standard IBM compatible floppy disc.
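The operation counts above can be turned into rough numbers with the sketch below. It assumes that each spectral element costs one multiply-accumulate per input point and that each multiply-accumulate takes 0.5 µs, the multiplier time quoted for 40 bit products in the Hardware section; I/O time is ignored. The function names are hypothetical, and the radix 4 FFT count follows the expression as read from the text.

```python
import math

MAC_TIME_S = 0.5e-6  # assumed time per multiply-accumulate (40 bit product)


def dft_ops_real(n: int) -> int:
    """Sine or cosine transform of a real vector: n^2 operations."""
    return n * n


def dft_ops_complex(n: int) -> int:
    """Complex asymmetric vector of n points: 4 n^2 operations."""
    return 4 * n * n


def fft_ops_radix4(n: int) -> float:
    """n * log4(n), using log4(n) = log2(n) / 2."""
    return n * math.log2(n) / 2


def dft_time_s(n_points: int, n_elements: int) -> float:
    """Estimated compute time for n_elements spectral points (no I/O)."""
    return n_points * n_elements * MAC_TIME_S


for n in (4096, 65536, 262144):
    print(n, dft_ops_real(n), dft_ops_complex(n),
          round(fft_ops_radix4(n)), round(dft_time_s(n, n), 1))
```

For a 4K point vector this estimate gives about 4.2 s for 2048 spectral elements and about 8.4 s for a full 4096 element transform, close to the corresponding entries of Table I.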


Hardware

The high speed vector processor is configured for ease of interface to most minicomputers or higher-level microprocessors. Its function is to perform high speed, high precision multiplication and accumulation in one or more registers. The control of the system is performed by the host computer with a minimum number of control transfers. Serial data input in 16 bit floating point format may be accommodated directly from an external source or may be transferred via the host computer. Output data is normally in 32 bit floating point format for DFT and 16 bit floating point format when numerical filtering is performed. A functional block diagram of the system is shown in Figure 1.

The hardware is composed of the following:

- a high speed, high precision multiplier, 0.5 µs for 40 bit products, 0.8 µs for 64 bit products
- one programmable length accumulator of up to 64 bits
- floating to fixed point input converter
- fixed point to floating point output converter
- 4K x 16 bit memory organized as FIFO (first-in-first-out)
- 4K x 16 bit memory for storage of filter coefficients
- sine/cosine generator for up to 64K different values
- control logic for automatic operation as determined by host computer
- host computer including at least 8K bytes of read/write memory, DMA capability, and peripheral storage unit such as floppy disc or digital magnetic tape recorder.

The control information for the vector processor can be readily generated using any programming language.
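The data path listed above (floating to fixed conversion, multiplication, accumulation in a long register, fixed to floating conversion) can be mimicked in software to show why a long fixed point accumulator preserves precision. This is only a sketch: the text does not specify the number formats, so the 15 fraction bits, the overflow behaviour and the names `to_fixed` and `fixed_dot` are assumptions.

```python
from typing import Sequence


def to_fixed(x: float, frac_bits: int) -> int:
    """Scale a value in (-1, 1) to a signed integer with frac_bits fraction bits."""
    return round(x * (1 << frac_bits))


def fixed_dot(xs: Sequence[float], cs: Sequence[float],
              frac_bits: int = 15, acc_bits: int = 64) -> float:
    """Dot product with integer products summed in an acc_bits wide register."""
    limit = 1 << (acc_bits - 1)
    acc = 0
    for x, c in zip(xs, cs):
        # Integer product and sum are exact: no rounding during accumulation.
        acc += to_fixed(x, frac_bits) * to_fixed(c, frac_bits)
        if not -limit <= acc < limit:
            raise OverflowError("accumulator length exceeded")
    # Convert back to floating point once, at the end.
    return acc / (1 << (2 * frac_bits))


data = [0.5, 0.25, -0.75]
coeffs = [0.5, -0.5, 0.25]
print(fixed_dot(data, coeffs))
```

Since rounding happens only at the input and output conversions, the accumulated sum does not depend on the order in which the products are added, which is not generally true of floating point accumulation.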
Computation of the filter coefficients is also a straightforward task for scientific programmers.

For use with Fourier Transform Spectrometers the Digital Equipment Corporation LSI-11 has been found quite useful. The LSI-11 has sufficient computing power to be useful for general tasks such as control of other instruments, scientific computation, process control, etc. It supports the time-tested DEC RT-11


[Figure 1. Vector processor block diagram. Block labels recoverable from the original: Incoming Data; FIFO 4K x 16; Interface; Processor Initiate Control Parameters; Coefficient RAM 4K x 16; Data Storage; Cosine Generator; Floating to Fixed Converters; Multipliers + Shift Registers; Fixed to Floating Converter; DACs for XY Plotter; User's Terminal; DMA Interface.]


Table I. Computation times for direct transformation using the high speed vector processor.

Column (1): time to compute at least one and up to 200 spectral elements.*
Column (2): time to compute 2048 spectral elements.
Column (3): time to compute same number of spectral elements as interferogram points.

Points in vector
to be transformed    (1)      (2)            (3)
4K                   1 s      4.2 s          8.5 s
8K                   2 s      8.5 s          34 s
16K                  4 s      17 s           2 min 30 s
32K                  8 s      34 s           9 min 40 s
64K                  16 s     1 min 10 s     36 min 30 s
128K                 32 s     2 min 20 s     2 h 30 min
256K                 64 s     4 min 40 s     10 h 5 min

* This computation time is limited by floppy disc input/output time.

The first group of computation times is useful for quick turnaround monitoring of a very restricted spectral region. The second group of computation times indicates the rate at which reasonable sized plotter page sets of data may be generated, and the third group gives times of maximum computing effort for a given size interferogram.

operating system and Fortran compiler. These have been used to supply control to the high speed vector processor. While the system was primarily designed to provide superlative processing accuracy and relatively high speed, it also provides an economical means for accomplishing these tasks.

Conclusions

The high speed vector processor has been shown to be a high performance device for implementation of numerical filtering and for performing Fourier Transformation by DFT. Since both of these functions are repetitive applications of dot product computation the system may find application whenever calculation of a dot product is required. Examples are correlation analysis, signal convolution, autocorrelation, spectral signature analysis, and of course general vector operations.
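As a minimal sketch of how these applications reduce to repeated dot products, the code below evaluates the non-recursive filter of equation (1) only at every `step`-th output sample (which is what lowers the stored data rate) and computes an autocorrelation with the same `dot` routine. The function names, the `offset` parameter used to shift the output sampling positions, and the test data are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """The single primitive: sum of element-wise products."""
    return sum(u * v for u, v in zip(a, b))


def fir_decimate(x: List[float], a: List[float],
                 step: int, offset: int = 0) -> List[float]:
    """Y_n = sum_k a_k * X_{n-k}, kept only for every step-th n.

    offset shifts the positions of the retained output samples.
    """
    n_taps = len(a)
    out = []
    for n in range(n_taps - 1 + offset, len(x), step):
        window = x[n - n_taps + 1:n + 1][::-1]  # X_n, X_{n-1}, ..., X_{n-N+1}
        out.append(dot(a, window))
    return out


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """One dot product of the sequence with a shifted copy per lag."""
    return [dot(x[:len(x) - lag], x[lag:]) for lag in range(max_lag + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = [1.0, 0.0, -1.0]  # Y_n = X_n - X_{n-2}
print(fir_decimate(x, a, step=2))
print(fir_decimate(x, a, step=2, offset=1))
print(autocorrelation([1.0, 2.0, 3.0], max_lag=2))
```

Each output value costs one pass of multiply-accumulate over a fixed-length window, so the work per stored sample is set by the filter length and not by the input data rate.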
