Langmuir 2009, 25, 4145-4153
4145
Analysis of Multibeam Data for Neutron Reflectivity† N. F. Berk*,‡,§ and C. F. Majkrzak§ Department of Materials Science and Engineering, UniVersity of Maryland, and NIST Center for Neutron Research, National Institute of Standards and Technology, 100 Bureau DriVe, Stop 6102, Gaithersburg, MD 20899-6102 ReceiVed August 26, 2008. ReVised Manuscript ReceiVed December 3, 2008 We offer mathematical proof that multiple-beam neutron reflectivity, corresponding to simultaneous collection of data at multiple angles (wavevector transfers) does not perform better, errorwise for counting noise, than single-beam data collection for the same total number of reflected neutronssand may perform much worse, depending on the beam modulation strategy used. The basic idea is that the nominal statistical benefit of summing data at, say, N different wavevector transfers is undone by needing to collect N differently modulated (i.e., weighted) sums in order to extract the reflectivities. To our knowledge, a general proof of this behavior for arbitrary strategies has been lacking. The formal result can be summarized by saying that the best nondiagonal matrix modulation strategies are orthogonal (unitary) matrices, or constant multiples thereof, and that these can do no better than diagonalsi.e., single-beamsstrategies.
Introduction
Inverting Multiple Beam Data
We offer mathematical proof that multiple-beam (or “multiplexed”) neutron reflectivity, corresponding to simultaneous collection of data at multiple angles (wavevector transfers) does not perform better, errorwise for counting (shot) noise, than singlebeam data collection for the same total number of reflected neutronssand may perform much worse, depending on the beam modulation strategy used. The basic idea is that the nominal statistical benefit of summing data at, say, N different wavevector transfers is undone by needing to collect N differently modulated (i.e., weighted) sums in order to extract the reflectivities. Such behavior has been described before,1 but to our knowledge a general proof for arbitrary strategies has been lacking. The formal result can be summarized by saying that the best nondiagonal matrix modulation strategies are orthogonal (unitary) matrices, or constant multiples thereof, and that these can do no better than diagonalsi.e., single-beamsstrategies. In this paper we lay out precisely what we mean by multiplebeam measurements and discuss shot noise “error” propagation through such systems. We derive a multiplication factor for the average weight of observed reflectivity noise relative to the average weight of the associated single-beam reflectivity noise and prove a theorem that this factor cannot be less than unity for any beam modulation strategy. The smallest possible value, unity, of the noise factor is achieved only by “trivial” (diagonal matrix) strategies and by nontrivial modulations described by orthogonal (unitary) matrices. Ill-chosen (nonunitary) strategies can lead to noise that grows with the number of beams in the configuration, as we illustrate in the Examples section. In the Conclusions, we briefly comment on recent innovative uses of divergent beams (a form of multiplexing) in reflectivity studies.2,3
Noiseless Data. Consider an instrument consisting of N beams simultaneously incident on a reflecting film. These beams have known intensities, Bn, n ) 1,..., N, comprising the set denoted by {Bn}N. The corresponding reflected beams are detected by N detectors configured to correspond to known scattering wavevectors {Qs}N, s ) 1,..., N at known wavelengths. We label each such reflected beam by the index s, for “slot.” In the absence of counting noise, the ideal reflected intensities comprise the set {Is}N, where Is ) BsR0s, and where {R0s}N are the theoretical reflectivities associated with the corresponding {Qs}N (the “0” subscript denotes their theoretical nature). Thus each slot also is associated with a “perfect” reflectivity, R0s ) R0(Qs). For specular reflection, each detector records only a single reflected signalsviz., the Is “in” slot ssand there is no problem in identifying the unique reflectivities R0s ) Is/Bs in each slot. For nonspecular reflection, however, a given detector may receive up to N reflected beams, corresponding to all the {Qs}N in the configuration. The resulting signal in the giVen detector thus is the sum of signals
†
Part of the Neutron Reflectivity special issue. * Corresponding author. E-mail:
[email protected]. ‡ University of Maryland. § NIST. (1) Harwit, M.; Sloane, N. J. Hadamard Transform Optics; Academic Press: New York, 1979. These authers show that other error types can benefit from multibeam strategies. (2) Rekveldt, M. Th.; Bouman, W. G.; Kraan, W. H.; Uca, O.; Grigoriev, S. V.; Habicht, K.; Keller, T. Elastic Neutron Scattering Measurements Using Larmor Precession of Polarized Neutrons. In Lecture Notes in Physics, Neutron Spin Echo Spectroscopy, Basics, Trends, and Applications; Mezei, F., Gutberlet, T., Pappas, C., Eds.; Springer: Berlin, 2003; p 87.
S)
N
N
s)1
s)1
∑ Is ) ∑ BsR0s
(1)
(Of course, some R0s may be zero in the given detector.) Clearly, a single measurement of S is insufficient to determine {R0s}N uniquely, even though {Bn}N is known. However, by performing N separate measurements, S1, S2,..., SN, in the same detector, with each one corresponding to different sets of incident beams over the N slots, {B1s}N, {B2s}N,..., {BNs}N, we obtain the set of equations
Sm )
N
N
s)1
s)1
∑ Ims ) ∑ BmsR0s
(2)
for m ) 1,..., N. Here we we use index m for “measurement.” In obvious matrix notation, these measurements are summarized as
S ) BR0
(3)
where S and R0 are N-dimensional column vectors and B (sometimes notated below as BN) is an N × N matrix consisting of N rows (the mth row representing a measurement Sm) and N
10.1021/la802780v CCC: $40.75 2009 American Chemical Society Published on Web 01/30/2009
4146 Langmuir, Vol. 25, No. 7, 2009
Berk and Majkrzak
columns (the sth column corresponding to the sth slot. The beam matrix B can be said to represent a “modulation” of the incident beam. A particular choice of B will be called a modulation (or beam) strategy, and we will call the intersections of measurements and slots “channels.” Thus in a beam strategy, each channel c ) {m, s} is associated with an incident beam Bc ) Bms. If B is invertable (we assume B is perfectly known), i.e., if det B * 0, then (3) is uniquely solved by
It is clear we can never measure a sufficient number of Sm to exactly determine all the Rms, since each instance of S introduces N additional (noisy) reflectivities; thus N measurements generate N2 unknowns, and it is impossible to catch up. However, if B is invertable, we can follow our nose and, in the manner of (4), define a solution as
Rˆ ) B-1S ) R0
for r ) 1,..., N, as if the {Rc}N were not random. We will call this result the nominal inverse; in the noiseless case above, the nominal inverse is the true inverse, i.e., the exact solution. The nominal inverse is the exact solution of the linear regression
(4)
or (anticipating forthcoming notation)
Rˆ(r) ) (B-1S)r ) R0s|s)r
(6)
and using (6), the mth measurement of S yields N
Sm )
∑ BmsRms
(7)
s)1
Alternatively, one could require Rms ) R0s for all m, R0s being the theoretical or ideal reflectivity in slot s, and then assert Bms ) Pran(Ims)/R0s as the random variable expressing the counting noise. The latter viewpoint, however, entails the awkward notion of defining incident beams in terms of sample-dependentsand therefore unknownsreflectivities, which is not common practice in single-beam reflectometry (unless R0s ) 1). The noisy reflectivities appearing in (6), while not directly observed in a multibeam measurement, could be measured one at a time in a sequence of single-beam experiments; and it is natural, therefore, to identify them as being the essential random variables of the problem. Indeed, considering B as “perfectly” known greatly facilitates devising an approximate solution, as we see next. Moreover, any uncertainty or “error” in Bms can be transferred to Rms, since, obviously, (xBms)(yRms) ) Bms(xyRms), for any x and y. Thus, the ensuing analysis can be applied to the problem of solving a linear system of equations when only the matrix of coefficients is uncertain, i.e., x * 1, y ) 1, by interchanging x and y. (3) Pynn, R.; Fitzsimmons, M. R.; Fritzsche, H.; Gierlings, M.; Major, J.; Jason, A. ReV. Sci. Inst. 2005, 76, 053902.
(8)
2
(5)
where index r labels a “result” (in this case, a solution). That is, the rth result is the reflectivity in the s ) rth slot. Notice that the details the modulation strategy are unimportant for this purpose, as long as B is well conditioned. A square matrix B is called singular if it is not invertable, i.e., if det B ) 0. It is called ill conditioned if |det B| < ε, where ε is some small number relevant to computational accuracy (typically “machine accuracy”). Singular matrices contain at least one row or column that is a linear combination of the others, and it is more-or-less easy to eliminate such possibilities in designing a strategy. As we will learn in the context of noisy data, however, “good” conditioning of a modulation strategy B requires more than a not-too-small determinant. Roughly speaking, it is necessary that both B and B-1 be well conditioned. Since det (B-1) ) (det B)-1, this implies that |det B| should not be allowed to be too large, as well as not too small. This is restrictive but, it turns out, not sufficient for a good strategy. Noisy Data. Allowing for Poisson counting noise (shot noise) in the reflected intensities introduces more than nuisance computational problems. For known Bms, the reflected intensity is Pran(Ims) ) Pran(BmsR0s), where X ) Pran(x) is a Poisson random variable with mean value x and standard deviation x. The observable reflectivity then is a random variable defined by
Rms ) Pran(BmsR0s)/Bms
Rˆ(r) ) (B-1S)r
N
min{Rˆ(r)}N
∑ |Sm - BRˆ(m)|p
(9)
m)1
for known B, with term-by-term zero minimum for any objective power p. Thus Rˆ(r) has the unique attribute N
Sm )
∑
N
BmrRˆ(r) )
r)1
∑ BmsRms
(10)
s)1
for m ) 1, 2,..., N. Now, because the linear system in (7) is strongly underdetermined, each Sm is degenerate over sets {Rms}N2, meaning that, given B and {Sm}N, the number of solutions is infinite. Geometrically, {Bm}N and {Rm}N can be viewed as vectors b Bm and b Rm in the N-dimensional Cartesian space VN. Then b Bm · b Rm ) Sm defines the hyperplane normal to b Bm that terminates all Bm. For nonsingular vectors b Rm with specified projection Sm onto b B, the point intersection of these N hyperplanes terminates the b unique vector ˆ R having the same {Sm}N, which is the nominal b ˆ may or may not lie close to the inverse. Depending on B, R bm}N determining {Sm}N. For singular B, in particular, “cluster” {R bm}N can not span VN, since it contains linearly dependent {B bm}N thus can not all meet vectors; the hyperplanes normal to {B at a point. The noiseless limit, {Rmn}N2 f {R0s}N ) {Rˆ(r)}N, restores consistency for nonsingular B. For noisy measurements, on the other hand, we must access the degree of consistency of different strategies by gauging how close {Rˆ(r)}N comes to the desired but unobservable {R0s}N. With the geometric picture in mind, one may well imagine that orthogonal modulations, with b Rm′ ) cδmm′ (constant c), do better in this regard than Bm · b nonorthogonal strategies. For these latter, angular dispersion of bm}N tends to compress with increasing N, the vectors within {B effectively degrading their linear independence. We will give examples of such behavior later in the discussion. The results of nominal inversion easily can be written as N
Rˆ(r) )
∑
Cms(r)Rms
(11a)
m,s)1
where
Cms(r) ) (B-1)rmBms
(11b)
This form explicitly expresses the N nominal inverses Rˆ(r) as weighted averages of the N2 unknown Rms. The Cms(r) satisfy trivial sum rules N
∑ Cms(r) ) δsr
(12a)
m)1 N
∑
s,m)1
and
Csm(r) ) 1
(12b)
Multibeam Data for Neutron ReflectiVity
Langmuir, Vol. 25, No. 7, 2009 4147
N
∑
N
Cmr(r) ) 1
Γs(r) )
(12c)
r)1
N
∑
Cms(r)R0s ) R0
(13)
m,s)1
which agrees with (4). Thus the nominal inverse of S is a true inverse for noiseless data. We can consider the promulgation of shot noise through the nominal inversion by defining shot noise ˆ (r), respectively, by and result deviations (“errors”), ∆ms and ∆ way of
Rms ) R0s + ∆ms
(14a)
ˆ (r) Rˆ(r) ) R0r + ∆
(14b)
and
so that from (11) and (12), N
ˆ (r) ) ∆
∑
Cms(r)∆ms
In (18b), ∆•s refers to the shot noise in any fixed measurement position (m ) 1,..., N) in an experiment. (N ordered measurements of S comprise an experiment; “generating” the ensemble causes measurement number m ) • to be repeated an infinite number of times.) Thus in (18b), var ∆•s is the shot noise variance associated with (any measurement of) the sth slot in the setup. ˆ (r) is a sum over According to (18b), the ensemble variance of ∆ slots of shot noise variances, weighted by “sensitivity” factors, Γs(r). It is important to note in (18b) that both Γs(r) and var ∆•s depend on the modulation matrix B: the former, manifestly so, of course, because of Cms; the latter, because ∆ms ) Rms - R0s, where Rms is a Poisson random variable with mean R0s and variance R0s/Bms for nonzero elements of B. (For now, let us continue to assume that Bms g 0. An effective beam strategy, B f B′, can have negative elements in a modification of the setup in which reflectivity differences are measured, which will be discussed below in Differential Modulation Strategies). As a touchstone, consider a mathematically trivial modulation strategy consisting of N separate single-slot measurements The corresponding beam matrix can be defined as diagonal, with nonzero elements Bnn ) bn, and thus
(15)
Cms(r) ) δmrδsr
m,s)1
with the same weights as in (11). Using the channel concept, with {m, s} ) c, this also is N2
ˆ (r) ) ∆
∑ Cc(r)∆c
(16)
c)1
ˆ (r) as an viz., a sum over all channels. The first line gives ∆ accumulation of slot errors, while the second defines each of these as B-dependent weighted sums of shot noise. Equation 16 relates the error in Rˆ(r)srelative to R0(r)sto the channel shot noise. (At this point of the discussion, we believe that “error” is more appropriate than “uncertainty,” since the Rˆ(r) are defined quantities.) The simplest consequence of (16), by analogy with the derivation of (13), is that “systematic errors” in the Rc, viz., m-independent measurement errors ∆ms ) ∆0s, simply shift the R0s by the channel bias; i.e., bias is “passed through” without augmentation or diminution. Now consider an ensemble of experiments on the same setup, and let 〈 · · · 〉 represent the formal average over these. Clearly, for (unbiased) shot noise, we can assume
〈∆c 〉 ) 0
(17a)
〈∆c∆c′ 〉 ) 0
(17b)
and
ˆ (r) ) var ∆ | var ∆ •s s)r
(20)
as expected for single-beam measurements. The mean result variance, or mean square result error (MSRE), is N
ˆ (r) ) MSRE ) var ∆
∑ Γs′ var ∆•s
(21a)
s)1
with
Γs′ ) N-1
N
∑ Γs(r)
(21b)
r)1
Thus MSRE is a weighted sum of noise variances. A useful figure of merit for a beam strategy is N
Ω)
∑ Γs′ ) N-1∑ Γs(r) s)1
(22)
r,s
So if var ∆•s ) σ2 for all slotssa reasonable approximation when {R0s}N covers a small dynamical rangesthen MSRE ) Ωσ2. With the definition of Cms(r) in (11b), we can recast (22) as -1 Ω ) N
∑ |Cms(r)|2
m,r,s N
∑ (BB†)mm((B†B)-1)mm
(23)
m)1
ˆ (r)〉 ) 0 〈∆
(18a)
and N
N
N
s)1
m)1
s)1
∑ 〈∆•s2〉 ∑ Cms2(r) ) ∑ Γs(r)var ∆•s (18b)
where var(•) ) 〈(• - 〈•〉)2〉 and
(19)
Therefore, in (18b), Γs(r) ) δsr for a trivial strategy, and so
) N-1
for c′ * c. Thus from (16),
ˆ (r) ) var ∆
(18c)
m)1
For noiseless data, we define Rms ) R0s, for m ) 1,..., N, indicating that the reflected beams in each slot are equal to the “true” R0 in every measurement. Then from (11) and (12),
Rˆ(r) )
∑ Cms2(r)
Since B is real, |Cms(r)| ) Cms(r), and B† ) BT; however, we prefer to cast these formulas in the more general terminology of complex linear algebra. It must be emphasized that Ω is a coursegrained measure of result error. Nevertheless it is a useful indicator of the reliability of a multiple-beam strategy, while also enabling a rough estimate of measurement uncertainty: say, e.g., reporting results as
4148 Langmuir, Vol. 25, No. 7, 2009
Berk and Majkrzak
Rˆ(r) ( k√MRSE ) Rˆ(r) ( k√Ωσ
(24)
where now σ ) (var ∆•s) , and where k is a “coverage factor,” usually k ) 1, 2, or 3, depending on local practice. Below, we prove that Ω g 1 for all B. Differential Modulation Strategies. It will prove very useful to slightly expand our earlier definitions of setup and experiment. Let us now allow the application of two distinct modulation strategies, B1 and B2, over the same channels on the setup, calling the corresponding data sets {S1,m}N and {S2,m}N, respectively. Thus, from (7), on the setup we have the two data sets 1/2
N
Si,m )
∑ Bi,msRi,ms
(25)
s)1
for i ) 1, 2, and their difference can be represented as N
S1,m - S2,m )
∑ (B1,ms - B2,ms) s)1
R1,ms + R2,ms + 2
N
∑ (B1,ms + B2,ms) s)1
R1,ms - R2,ms (26) 2
Or, from (14),
(
N
S1,m - S2,m )
∑ (B1,ms - B2,ms) R0s + s)1
)
∆1,ms + ∆2,ms + 2
N
∑ (B1,ms + B2,ms) s)1
∆1,ms - ∆2,ms (27) 2
To make this useful, define
Theorem and Corollary on Noise Propagation
(-1)i Bi ) B0 B′ 2
(28)
where B0 is a uniform matrix, B0,ms ) b0si.e., no modulationsand B′ is the modulation “of interest.” Then (14) becomes
S m )
N
∑ Bms (R0s + ∆ms ) + S 0,m
(29a)
s)1
where
S m ) S1,m - S2,m
(29b)
1 ) (∆1,ms + ∆2,ms) ∆ms 2
(29c)
and S 0,m )
b0 N (∆ - ∆2,ms) 2 s)1 1,ms
∑
(29d)
By analogy with (8) and ensuing equations, the nominal inversion of (29a)swith respect to B′sgives
ˆ ′(r) + ∆ ˆ (r) Rˆ′(r) ) ((B′)-1S′)r ) R0s + ∆ 0
(30a)
with N
ˆ ′(r) ) ∆
induced noise (DIN) and note that it is separately measurable. On the same setup, do two experiments with the uniform beam strategy B0, and compute the difference S0′; then with the given B′, compute the DIN from (30c)sor more precisely, a value of DIN drawn from the setup ensemble. If there is a cost to a differential strategy (aside from additional measurement time), what is the benefit? The answer is that while B for “regular” strategies must be non-negative in all channels, since the Bms are beam intensities, B′ can have negative elements, since each is an intensity difference. The only restriction is that B0 be chosen so that |Bms ′ | e b0 in all channels of a given B′. This extra freedom provides a much larger class of potentially useful modulation schemes to choose from than do regular strategies. For example, for N > 2, unitary (orthogonal) matrices tend to have roughly equal numbers of non-negative and negative elements, so regular strategies can not be unitary or even sensibly approximate unitary matrices. Unitary modulations have the desirable property that Ω ) 1, so that gross error magnification is prevented. Adding DIN essentially doubles the error, since ˆ ′(r) and ∆ ˆ 0′(r) depend on the conditioning of B′ in roughly the ∆ same way, but for unitary B′, this, more-or-less, is the only penalty. In the remaining discussion, we drop the (′) notation. Except for the appearance of DIN, there is no structural distinction between regular and differential modulations, as shown by (29); and the differential shot noise ∆′ms is, in general terms, statistically indistinguishable from ∆ms for regular strategies, as long as it is remembered that the scale of shot noise is determined by true beam intensities, not differential intensitiessi.e., in (28), by the B1,ms and B2,ms, not the |Bms ′ |.
∑
Cms (r)∆ms
(30b)
m,s)1
and
ˆ (r) ) ((B′)-1S ) ∆ 0 0 r
(30c)
′ In (30b), Cms (r) is just the Cms(r) defined by (11b), evaluated here ˆ 0′ (r) in (30) is new, a “price for matrix B′. The extra error term ∆
paid” for using differential modulation. We will call it difference-
Recalling (23), for any N × N matrix B, let -1
Ω[B] ) N
N
∑ (BB†)mm((B†B)-1)mm
(31)
m)1
It is easy to see that Ω ) 1 if B is diagonal or a constant scalar multiple c of a unitary matrix. In the latter case, BB† ) B†B ) N c2 × 1, the constant then cancels out, and Ω ) N-1 ∑m)1 1) 1. We will not distinguish between unitary (det B ) 1) and antiunitary (det B ) -1), since the two are indistinguishable with respect to Ω. A diagonal B realizes a “trivial” strategy, one indistinguishable from N single-beam measurements, so it is not surprising that it does not magnify propagated shot noise. A unitary strategy is not trivial, in the sense we have defined it, but it also assures that (on average) shot noise is propagated without magnification. Of course, a prime motivation for considering nontrivial inversion strategies rests in the hope that they suppress shot noise propagation to some degree, thus taking “full advantage” of the multiplicity of beams contributing to the summed data in each detector element. That is, we desire modulation strategies for which Ω < 1. While not immediately obvious from (31) that this goal is unattainable, computing Ω for various B of possible interest may lead one to suspect that, at the least, it is hard to reach. (We will survey some cases below.) In fact, we will now prove that there exist no matrix strategies B that produce Ω < 1. Theorem. Ω[B] g 1 for all square matrices B, with Ω[B] ) 1 only for B that are diagonal and for B that are constant scalar multiples of unitary (or antiunitary) matrices. Corollary. No multiple-beam modulation strategy can, on aVerage, propagate less shot noise through nominal inVersion than a unitary (orthogonal) strategy or a diagonal (triVial) strategy. Thus, in this sense, no multiple-beam measurement can outperform equiValent single-beam measurements.
Multibeam Data for Neutron ReflectiVity
Langmuir, Vol. 25, No. 7, 2009 4149
The corollary is an interpretive restatement of the theorem, given that Ω[B] represents the ratio of the average variance of the error in nominal inversion to the average variance of the measurement shot noise. Thus, if the interpretation is accepted, proving the theorem proves the corollary. Proof. That Ω[B] ) 1 if B is diagonal or unitary is selfevident, as described above. Since Ω[cB] ) Ω[B] for scalar constant c, we can assume, for now, that c ) 1. In order to show that for all other matrices, Ω[B] > 1, we use the method of singular value decomposition. Every square matrix B has a singular value decomposition (SVD), which is representable in Dirac notation as
∑ aµν ) 1 Thus
Ω[B] ) 1 +
∑ |ψν〉ζν〈φν|
B |φν〉 ) ζν|ψν 〉 †
λµ λν - 1 + aνµ -1 λν λµ
(42)
Next we wish to view Ω[B] as a function of variables λν, ν ) (0) ; i.e., 1, 2,..., N, for fixed aµν ) aµν
b] ) Ω[λ
∑ µ 0. Thus it is easy to see that
∑
(0) b] ) 2 ∂λ2µB[λ aµν λν > 0 λµ3 ν*µ
(48)
b] is always concavefor ν ) 1, 2,..., N and, therefore, that Ω[λ upward. Thus its minimum value is attainable along any continuous λ-path. A convenient one for us is along the N 1-dimensional diagonal b λd ) (λ, λ,..., λ, 1), where we have defined λN ) 1. This can be done without loss of generality, since it is equivalent to taking B f ζNB. Then
b ]) Ω[λ d
(0) (0) 1 (λ - 1) + aNµ ∑ [aµN ( λ - 1)] ) 1 + bτ(λ) (49)
µ 0, with its unique minimum b]sattains bd]sand thus also Ω[λ occurring at λ ) 1. Therefore Ω[λ its minimum value at b λ )b 1, corresponding to a unitary matrix b]. Since this is true for every matrix, the theorem 1 is proved. B[1
Comments (1) For nonunitary diagonal matrices Bd, the SVD can always be arranged to make |ψν〉 and |φν〉 unit vectors along the ν-axes, reducing the SVD to a trivial identity. This induces b ) 0, so b] degenerates to the hyperplane Ω[λ b] that the hypersurface Ω[λ ) 1. (2) A matrix B is normal (“diagonalizable”) if it can be brought into diagonal form by a similarity transformation. A necessary and sufficient condition for normality is BB† ) B†B. In this case, N |ψµ(m)|2|ψµ(m)|2 ) aνµ, and then we can take aµν ) N-1 ∑m)1
b] ) 1 + Ω[λ
∑ aµν(0)τ(λν/λµ)
(52)
µ 1, so κ2[B] does not characterize trivial strategies appropriately. From (42), considering that λν/λµ e max λ/min λ in every term, we easily obtain
Ω[B] e κ2[B]2
(54)
Except for well-conditioned B, for which Ω[B] ≈ κ2[B] ≈ 1, this bound is much too generous, which is not surprising, given the derivation. Typically, for ill-conditioned B, Ω[B] ≈ κ2[B] ) O(N). Also typically, Ω[B] < κ2[B], i.e. κ[B] overestimates the error scale, but it is easy to exhibit examples to the contrary: e.g.,
(
1.81941 1.03541 1.04588 1.03541 1.91493 1.06669 1.04588 1.06669 1.37565
)
(55)
for which (rounded) Ω ) 11.22 and κ2 ) 8.28. Finally, we note in passing that an alternative approximation of Ω[B] can be had from (38) by setting |φµ(m)|2 ) |ψµ(m)|2 ) N-1, minimally consistent with the required bases normalizations. Call the resulting summation
κ˜ 2[B] ) N-2
N
N
∑ ∑ λν-1 ) λµλν-1 λµ
µ)1
ν)1
Clearly, 1 e κ˜ 2[B] e κ2[B], with equality for unitary B.
(56)
Multibeam Data for Neutron ReflectiVity
Langmuir, Vol. 25, No. 7, 2009 4151
For some matrix types, it is possible to work out explicit formulas for Ω[B]. (1) Uniform Upper Left Triangular (Hankel) Matrices.
( )
1 1 B1(N) ) l 1
1 1 l 0
... ... ... ...
1 0 l 0
(57a)
Ω[B1(N)] ) N
(57b)
B1(N) is normal for all N, since ) Its singular values thus are the absolute values of its eigenvalues, which have been described by Blank,4 and, in decreasing order, behave asymptotically with large N as B1B1†
ζν(N) )
B1†B1.
N + O(1) (ν - 1/2)π
(58)
Thus the largest singular value grows as ζ1(N) ) 2N/π. (Note that the leading term of Blank’s asymptotic formula cannot be N ζν(N) ) 1, since the missing use to compute |det B1| ) Πν)1 terms of O(1) are important for this purpose; however, the crude N (ξν(N) - {N/[(ν - 1/2)π]}) gets the estimate “O(1)” ) N-1 ∑ν)1 order of magnitude correctly for N not too large.) (2) Uniform Lower Left Triangular Matrices.
( )
1 1 B2(N) ) l 1
0 1 l 1
... ... ... ...
( )
(3) Zero-Diagonal Uniform Matrices.
Examples
0 0 l 1
Ω[B2] ) N + (N - 1)/N
(59a)
(59b)
Thus Ω f N + O(1) with increasing N. Note that while B2(N) has only unity eigenvalues, as read off the diagonal, it is not normal. However, since B2B2† ) B1B1†, B1(N) and B2(N) have the same singular values, although different SVD bases. One also sees that B2(N) is a row permutation of B1(N) and thus is its logical equivalent, but with Ω[B2(N)] > Ω[B1(N)], illustrating the broken symmetry discussed above. The performance difference, however, decreases to O(N-1) as N increases. Using commonly available mathematical software, one will find that among all N! row permutations of B1(N), there are only N distinct values of Ω, one for each subset of permutations sharing the same fixed row, which increase from Ω[B1(N)] to Ω[B2(N)] according to N + (n - 1)/N for n ) 1, 2,..., N. The column permutations of any row permutation of B1(N) yield the same set of distinct Ω values, corresponding now to fixed-column permutations. Thus, all tolled, the (N!)2 logical equivalents of B1(N) are characterized by just N Ω values, each occurring N!(N - 1)! times over the class. The class minimum Ω is shared by B1(N) and its corresponding lower right triangular matrix, while the maximum is shared by B2(N) and its upper right triangular counterpart. Of course, as N increases, the entire class performs increasing poorly by the Ω characteristic. We note in passing that |det B| ) 1 for the class, as for unitary B; clearly det B is not a useful quantifier of error propagation for this class. However, the “condition number” max ζN/min ζN ) O(N) does “track” Ω[B1(N)] for large N. (4) Blank, M. L. Russ. Math. SurV. 2001, 56, 149.
0 1 B3 ) l 1
1 0 l 1
... ... ... ...
1 1 l 0
Ω[B3] ) N - 1 - 3(1 - N-1)2
(60a)
(60b)
Thus, again, Ω f N with increasing N. B3 is normal and has eigenvalues b ξ ) (N - 1, -1,..., -1). Thus, its singular values are b ζ ) (N - 1, 1,..., 1). These examples exhibit the angular compression of nonorbm}N mentioned in the geometric analysis following thogonal {B (10). In the triangular cases, the angle θm,m+1 between neighboring vectors goes as cos θm,m+1 ) 1 - (N - m)-1. Thus the vectors bm)N gradually tend toward parallelism with increasing N. Similar (B behavior occurs in the zero-diagonal case, where cos θm,m+1 ) 1 - (N - 1)-1 f 1. The effect, of course, is the elementary decrease of the angle between a pair of adjacent vectors of fixed difference as their lengths increase. Nonetheless, it is worthwhile noting that this compression is not predicted by the determinant; although |det B| ) N - 1 for the uniform zero-diagonal matrices, |det B| ) 1 for the triangular matrices. The quantity Ω[B] - 1 seems to be a useful number for characterizing this type of angular compression, with Ω[B] - 1 ) 0 signifying no such compression at all. (4) Hadamard Matrices. Hadamard matrices, HN, are N × N matrices composed only of elements (1 and which are required to have orthogonal rows. Thus they satisfy
HNHNT ) N × 1
(61)
Thus det HN ) (N, and HN is nonsingular. Because of the restriction on its elements, HN exists only for N ) 1, 2, and N ) 4n for any positive integer n. With HN ) √N UN, UNU TN ) 1 from (61); thus UN is orthogonal. Therefore, if B ) HN, it is a constant multiple of an orthogonal matrix and consequently Ω[B] ) Ω[UN] ) 1. With regard to shot noise propagation, therefore, Hadamard matrices perform as well as any other unitary strategy, although the binary nature of their elements may be useful in implementations of multiple beam setups. A similar conclusion was reached without formal proof by Harwit and Sloan.1 (5) Discrete Fourier Transform Matrices (DFTMs). Generally speaking, a modulation B effects a discrete linear transformation of {noisy} stimulus {Rms}N to response {Sm}N. The analysis up to this point has focused solely on inverting the transformation, but in some cases the response itself may be of interest. For example, the “modulation” defined by matrix elements DFT Bms [N] )
1 2πi(m-1)(s-1)⁄N e √N
(62)
for m ) 1, 2,..., N and s ) 1, 2,..., N, performs a discrete Fourier transform on a Nyquist mesh and is unitary. A complex-valued modulation such as this is not feasible for neutron beams, obviously, but real-valued analogs of (62) are possible. The real and imaginary parts of (62) are singular matrices, however, because of the inversion symmetries of the sinusoids, taken separately. This problem is overcome by halving the interval of the Nyquist mesh, with or without offsets and end-point corrections; e.g., for discrete cosine transforms (in a standard ordering):
4152 Langmuir, Vol. 25, No. 7, 2009 DCT1 Bms [N] )
Berk and Majkrzak
1 1 1 π(m - 1)(s - 1) 1 - δm,1 - δm,N cos 2 2 N √N (63a)
(
)
1 π(m - 1)(s - 1 ⁄ 2) cos N √N 2 - δm,1 π(m - 1)(s - 1/2) DCT3 [N] ) Bms cos N √N DCT2 Bms [N] )
(63b) (63c)
and DCT4 Bms [N] )
1 π(m - 1 ⁄ 2)(s - 1 ⁄ 2) cos N √N
(63d)
Of these, only BDCT4 is unitary, independently of N, and thus Ω[BDCT4] ) 1. However BDCT1, BDCT2, and BDCT3 approach unitarity with increasing N, so Ω[BDCTn[N]] ) 1 + O(N-1) as N f ∞, for n ∈ {1, 2, 3}. For n ∈ {2, 3}, Ω[BDCTn[N]] ≈ 1 to within 1% at N ) 11, while DCT1 does not achieve comparable unitarity until N J 375. We note that DCT5 Bms [N] )
2 - δm,1 π(m - 1)(s - 1/2) cos N N
(64)
also is unitary for all N, although it is not among the canonical DCTs. For completeness, we also note that (BDCT1)-1 ) BDCT1, (BDCT2)-1 ) BDCT3, (BDCT4)-1 ) (BDCT4)T ) BDCT4, and (BDCT5)-1 ) (BDCT5)T. In practice, DFTs typically arise as approximations to Fourier transforms, e.g.,
g(x) ) gm
) )
∫0∞ f(k) cos kx dkf kmax N π(m - 1/2)(s - 1/2) f cos N s)1 s N
∑
kmax
√N
(65)
N
∑ fsBmsDCT4 s)1
where gm ) g[(m - 1/2)∆x], fs ) f[(s - 1/2)∆k], and ∆k∆x ) π/N. The real-space (more generally, the conjugate-space) function g(x) can be interpreted as a “correlation function”, i.e., a correlation integral of some functional of scattering length density, depending on the nature of the reflectivity spectrum. Let us say now that this is the quantity of direct interest, rather than the “underlying” reflectivities being summed. The propagation of shot noise ∆Rms through DCT modulations N cms∆Rms, where cms can be estimated by writing ∆Sm ) N-1 ∑s)1 is one of the cosine functions in (63). Then, assuming the random variables, ∆Rms, are independent, the estimated standard deviations of the errors in the Sm are
STD(∆Sm) )
√var(∆R
2 ms)cms
√N
(66)
where the overbar denotes an average over the s-index. For simplicity, take var(∆Rms) ) var(∆Rm1) for all s, so that (66) becomes
STD(∆Sm) )
STD(∆Rm1)
√N
√c
2 ms
(67)
2 1/2 where (cms ) ) 0.5 for the DCTn above. This is the familiar law-of-large-numbers benefit of averaging, viz., STD(∆Sm) ) O(1/N). Clearly, however, it unrelated to multibeam data collection, i.e., to how the reflectivities are acquired. Exactly the
same benefit accrues to calculating {Sm}N in the computer from a given {Rˆr}N. And as we have seen, there is no error benefit over single-beam acquisition to measuring {Rˆr ) Rˆ(r)}N on a multibeam rig with a unitary modulation strategy. We can also think of this in the following way. (We now will use gm for Sm.) It is true that a multibeam measurement of g1, say, automatically “does” the summation in (65) for a particular {R1s}N; but this measurement alone does not inform us of the reflectivities, so we cannot use it to compute g2. Instead, we must measure g2, then g3, and so on, to gN. This entails the same overall measurement time (for the same total number of neutrons on the sample) as if we were only interested in measuring N reflectivities, {Rˆr}N, using a multibeam strategy. Indeed, we could invert the just-measured {gm}N to obtain {Rˆr}N and (instantaneously) recompute {gm}N to the same degree of statistical uncertainty. That is, for DCTn modulations, we have the inferences {Rms}N2 f {gm}N f {Rˆr}N f {gm}N. The latter two are purely computational and thus take no time. The first involves N explicit measurements, viz., either N measurements of {gm}N in the multibeam setup with DCTn modulation, or N singlebeam measurements of {Rm)s,s}N with a diagonal (single-beam) strategy. Both pathways take the same instrumental time and yield {gm}N to the same overall accuracy.
Conclusions We have shown how shot noise is propagated through modulated multibeam configurations for measuring reflectivities. We have introduced a measure, Ω, of the average noise error resulting from the quasi-inversion of summed data and have proven a theorem that Ω g 1, with Ω ) 1 occurring only for “trivial” single-beam modulation strategies and for constant multiples of orthogonal (unitary) nontrivial strategies. The corollary to the theorem states, as a consequence, that modulated multibeam measurements of reflectivities can do no better, in regard to noise propagation, than single-beam measurements using the same average number of neutrons. We have also given some examples of “good” and “bad” modulation strategies. The most interesting lesson there is that strategies that are easy to implement and, which intuition might suggest, are likely to perform well, often perform very badly, errorwise, as the number of beams increases. The question can be asked: Does our analysis of noise propagation through a generic multiple-beam reflectometer apply to all such instruments? We believe it does, to the extent that we can consider the incident neutrons as being partitioned into discrete sets of convergent, well-collimated beams orsmore generallysthat measurements made on a multibeam instrument could also be made on a “corresponding” single-beam instrument. The mathematical issue is whether a given multiple-beam modulation strategy has an underlyingsand realizablessingle-beam strategy to be compared with. Consider the discussion in Example 5, for example, which illustrated emulation of a continuous cosine modulation by an N-beam DCT strategy. The cosine transform in (65) evokes the modulation that emerges in a natural way from spin-labeling of divergent beams in recently devised instruments.2,3 Certainly, for large enough N, the continuous transform can be approximated by a discrete transform as accurately as desired; but for fixed ∆x, ∆k decreases as L-1 with increasing length scale, L ) N∆x, implying increasingly higher angular resolution. For sufficiently large lengths scales, therefore, the angular resolution dictated by the multibeam approximation in (65) may not be realizable by well-collimated beams without having to count for prohibitively long times. This, however, is a practical mattersalbeit a very important one. The point to be
Multibeam Data for Neutron ReflectiVity
made here is that intensity modulation, in and of itself, does not result in higher efficiency for given statistical accuracy. Instead, as in the SESAME3 technique, it may enable the requisite instrumental wavevector transfer resolutionsfor the length scales of intereststhat otherwise might not be achievable without unacceptable reduction of usable incident beam intensity when “lossy” devices, such as absorbing collimators or crystal monochromators with narrow wavelength bands, must define
Langmuir, Vol. 25, No. 7, 2009 4153
extremely tight beams. The analysis of noise propagation through such rigs, however, remains within the formal scope of our theorem. Acknowledgment. The authors acknowledge helpful discussions with Brian Maranville. LA802780V