Single Photon in Hierarchical Architecture for Physical Decision

Nov 28, 2016 - WPI Center for Materials Nanoarchitectonics, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan ...

Article

Single Photon in Hierarchical Architecture for Physical Decision Making: Photon Intelligence Makoto Naruse, Martin Berthel, Aurélien Drezet, Serge Huant, Hirokazu Hori, and Song-Ju Kim ACS Photonics, Just Accepted Manuscript • DOI: 10.1021/acsphotonics.6b00742 • Publication Date (Web): 28 Nov 2016 Downloaded from http://pubs.acs.org on December 3, 2016

ACS Photonics is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

ACS Photonics

Single Photon in Hierarchical Architecture for Physical Decision Making: Photon Intelligence

Makoto Naruse,1,* Martin Berthel,2 Aurélien Drezet,2 Serge Huant,2 Hirokazu Hori,3 and Song-Ju Kim4

1 Network System Research Institute, National Institute of Information and Communications Technology, 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan

2 Univ. Grenoble Alpes, CNRS, Inst. NEEL, F-38000 Grenoble, France

3 Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi, Takeda, Kofu, Yamanashi 400-8511, Japan

4 WPI Center for Materials Nanoarchitectonics, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan

* Corresponding author: 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan.

Email address: [email protected]

ABSTRACT: Understanding and using photonic processes for intelligent functionalities, referred to as photonic intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioral sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes one of the important foundations of decision making and reinforcement learning. In this paper, we propose and confirm a hierarchical architecture for single-photon-based decision making that verifies the scalability of the principle. Specifically, the four-armed bandit problem is solved given zero prior knowledge in a two-layer hierarchical architecture, where the polarization of single photons is autonomously adapted in order to effect adequate decision making. In the hierarchical structure, the notion of layer-dependent decisions emerges. The optimal solutions in the coarse layer and the fine layer, however, conflict with each other in some contradictory problems. We show that while what we call a tournament strategy resolves such contradictions, the probabilistic nature of single photons allows for the direct location of the optimal solution, even for contradictory problems, hence manifesting the exploration capability of single photons. This study provides insights into photon intelligence in hierarchical architectures for artificial intelligence as well as a novel aspect of photonic processes for intelligent functionalities.

KEYWORDS: single photon, decision making, reinforcement learning, information photonics

Modern society is becoming increasingly reliant on artificial intelligence (AI).1 AI at present is based on computer algorithms and digital computing, and suffers from a theoretical limitation known as the von Neumann bottleneck.2,3 In addition, the design of conventional digital computing devices anticipates the end of Moore’s law,4 which imposes limits on the extent to which integrated circuits can be downscaled. Consequently, the utilization of unconventional physical processes and architectures for intelligence, referred to as natural intelligence, is attracting increasing attention. The relevant methods include quantum annealing,5 laser-based solution search,6 new types of solution-searching circuits such as complementary metal–oxide–semiconductor (CMOS) annealing,7 and the photon intelligence approaches proposed by us.8-10

Meanwhile, human intelligence, especially decision making, has been intensively examined through mathematical and physical modelling approaches such as quantum decision theories11,12 and neuroscience,13 part of which influences reinforcement learning algorithms.14 Reinforcement learning is also important in computational psychiatry.15,16 Physical insights into and the implementation of natural intelligence, particularly with regard to decision making, are stimulating for computation, physics, and the behavioral sciences.

One of the most fundamental issues in machine learning and the decision sciences is the multi-armed bandit problem (MAB), which concerns how to maximize the total reward from multiple slot machines.17 To solve this problem in general, exploration is needed to search for the highest-reward probability machine, which is precisely defined below; however, too much exploration may result in excessive loss, whereas too quick a decision or insufficient exploration
might result in missing the best machine. This is called the “exploration-exploitation dilemma”.17-19 The MAB is one of the most basic problems in reinforcement learning.17 It is important for various practical applications, such as information network management,20,21 Web advertisement,22 Monte Carlo tree search,23 and clinical trials.24

In our previous study, we experimentally showed that a single photon can solve the two-armed bandit problem using the nitrogen-vacancy (NV) center in a nanodiamond as a single-photon source.10 The wave-particle duality of the single photon is utilized: the probabilistic attribute of the photon plays the role of exploration, while its particle nature is immediately and directly associated with a particular decision. The theoretical background for this has been examined by comparisons to other MAB algorithms25 and by category-theoretic modelling and analysis.26

However, many issues remain unresolved on the route to realizing artificially constructed, physical decision-making machines. A fundamental issue in this regard is scalability; the number of choices involved in a decision may be large, not merely binary as assumed in the first proof-of-principle experiment in ref 10. A hierarchical architecture is a promising approach to scalable intelligent systems27 and optical devices dealing with multiple channels,28 and has been applied to quantum computing platforms.29 The effectiveness of a hierarchical approach for single-photon-based decision making is, however, completely unknown. Moreover, interesting notions of decision making emerge in a hierarchical architecture, that is, decision making at finer and coarser scales of the hierarchy.

Specifically, this paper shows that the four-armed bandit problem is solved given zero prior knowledge by using single photons in a two-layer tree-structure architecture, where the polarization of single photons is autonomously adapted. We have to be aware that the optimal solution in the coarse layer can conflict with that in the fine layer (as explained in detail below in the problem exemplified by CASE 3). We show that while a simple “tournament” strategy resolves such contradictions, the probabilistic nature of single photons allows the direct location of the optimal solution at the fine scale, which is a manifestation of the exploration ability of single photons. This reveals yet another fundamental aspect of the quantum nature of single photons in vital intelligent roles, in contrast to the literature on single photons that focuses only on the contexts of quantum key distribution30 and quantum computing.31,32 Meanwhile, the proposed principle using single photons could be transferred to the rapidly growing field of quantum plasmonics,33 building on our former results in near-field optics.9 Also, our hierarchical approach to photonic decision making can be positioned in the growing field of photonics-based machine learning.6,8-10,34-36

RESULTS

Hierarchical single-photon-based decision maker. For the simplest case that preserves the essence of the solution of the MAB problem in a hierarchical system, we consider a player who selects one of four slot machines (slot machines 1, 2, 3 and 4) with the goal of maximizing the reward. Denoting the reward probabilities of the slot machines by Pi (i = 1, …, 4), the problem is then to select the machine with the highest reward probability, referred to as the ‘highest-reward
probability machine’. The unit reward dispensed is identical for all machines. The machine-selection decision is associated with single-photon detection by designated photodetectors corresponding, respectively, to slot machines 1 to 4, as described below.

The architecture of the optical system has a tree structure, where an incoming single photon is directed by a polarizing beam splitter (PBS), denoted by PBS1 in Figure 1, following which it encounters another PBS, either PBS2 or PBS3, resulting in photon detection by one of four avalanche photodiodes (APDs), APDi (i = 1, …, 4) in Figure 1. The central idea of single-photon decision making is to adapt the polarization of single photons through the angles of three half-wave plates (denoted by HWP1, HWP2, HWP3) placed in front of the three PBSs, respectively.

We see here a hierarchical structure: PBS1 governs the decision of whether to select {slot machine 1 or 2} or {slot machine 3 or 4}, referred to as the “coarse-scale” decision hereafter. For the sake of simplicity, we call the relevant collections Group 1 for {slot machines 1 and 2} and Group 2 for {slot machines 3 and 4}. That is, the coarse-scale decision concerns whether to choose Group 1 or Group 2: namely, which of the summed reward probabilities P1 + P2 or P3 + P4 has the greater value? Meanwhile, PBS2 and PBS3 govern the machine-selection decisions between [slot machine 1 or 2] and [slot machine 3 or 4], respectively. We call a decision that concerns the choice of the machine with the maximum Pi (i = 1, …, 4) a “fine-scale” decision.

Initially, the linear polarization of the input single photons is oriented at π/4 with respect to the horizontal of PBS1, enabling the photons to be directed to Group 1 or Group 2 with
50:50 probability. We suppose that due to PBS1, the photon directed towards Group 1 is vertically polarized, whereas that directed towards Group 2 is horizontally polarized. The polarizations of single photons incident on PBS2 and PBS3 are also initially oriented at π/4 with respect to the horizontal of PBS2 and PBS3 via HWP2 and HWP3, respectively; hence, photons are directed towards APD1 or APD2 with 50:50 probability, and likewise towards APD3 or APD4. Meanwhile, the total probability of photon detection by any of the APDi (i = 1, …, 4) is 1. This is a notable aspect of the single-photon decision maker in the sense that both the probabilistic (wave) and particle attributes of a single photon are employed.

The principle of single-photon-based decision making is inspired by the tug-of-war (TOW) method invented by Kim et al.,25,37 which originated from the observation of slime moulds: the concurrent expanding and shrinking of their bodies, while maintaining a constant intracellular resource volume, allows them to gather environmental information, and the conservation of the volumes of their bodies entails a nonlocal correlation within the body. The TOW is a metaphor for such nonlocal correlation, which enhances decision-making performance.25 This mechanism matches well the intrinsic attributes of a single photon in a hierarchical architecture, as examined in this study, and not merely in a two-armed system.10 Until a single photon is detected by one of the multiple detectors, it is not localized in the system. The probability of photon detection at each of the APDs is not exactly zero unless the single-photon polarization is perfectly horizontal or vertical. This is a remarkable aspect of the single-photon
decision maker: it exploits the quantum attributes of photons. If photon observation were based on classical light, e.g. observing light intensity, we would need to implement an additional step to facilitate decision making.

In our hierarchical single-photon-based decision-making architecture, the TOW mechanism is implemented by three polarization adjusters (PAs), which are respectively marked PAi (i = 1, 2, 3) and control the corresponding HWPi (i = 1, 2, 3). The numerical indicators of the polarization adjusters, referred to as the PA values hereafter, are also represented by PAi (i = 1, 2, 3). The control mechanisms of the PAs for the coarse and fine scales are given by the following:

Control Mechanism of PA1 (coarse-scale control)
[C-1] A PA1 value of zero indicates a polarization at 45° with respect to the horizontal; single photons are directed towards Group 1 or 2 with 50:50 probability.
[C-2] The decision to select the slot machine is immediately made by observing a single photon in APDi (i = 1, …, 4).
[C-3] If a reward is successfully dispensed from one of the slot machines in Group 1, PA1 is “moved” in the direction of the chosen Group, i.e. if slot machine 1 or 2 is selected based on photon detection by APD1 or APD2 and a reward is obtained, PA1 is moved such that the input polarization becomes more horizontal by controlling HWP1. Conversely, if no reward is dispensed by slot machine 1 or 2, PA1 is moved in the direction of the unselected Group,
i.e. the input polarization becomes more vertical in this case. The same mechanism applies to Group 2 with slot machines 3 and 4.

By iterating steps [C-2] and [C-3], PA1 configures the system so that the group with the higher probability of reward is more likely to be selected by incoming single photons. The details of the mechanism are formulated below (TOW mechanism of PA1).

Control Mechanism of PA2 and PA3 (fine-scale control)
[F-1] A PA2 value of zero indicates a polarization at 45° with respect to the horizontal; single photons impinging on PBS2 are directed to APD1 or APD2 with 50:50 probability.
[F-2] The decision to select the slot machine is immediately made by observing a single photon in APDi (i = 1, …, 4).
[F-3] If a reward is successfully dispensed by slot machine 1, PA2 is moved in the direction of slot machine 1, i.e. if slot machine 1 is selected and a reward is obtained, PA2 is moved such that the input polarization becomes more horizontal by controlling HWP2. Conversely, if no reward is dispensed by slot machine 1, PA2 is moved in the direction of the unselected machine, i.e. the input polarization becomes more vertical. The same mechanism applies to slot machine 2.

By iterating steps [F-2] and [F-3], PA2 configures the system such that the machine with the higher reward probability between slot machines 1 and 2 is more likely to be selected. The same
architecture is adopted for PA3 with respect to slot machines 3 and 4. The details of the mechanism are formulated below (TOW mechanism of PA2 and PA3 (fine-scale TOW)).
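As an illustration of the tree-structured routing described above, the following minimal Python sketch computes the detection probabilities at the four APDs from the polarization angles at the three PBSs. It assumes ideal, lossless components and Malus's law (a fraction cos²θ of the detection probability goes to the horizontal port). The function name `detection_probabilities` and the port assignment (horizontal port of PBS1 leading to Group 1, horizontal ports of PBS2 and PBS3 leading to APD1 and APD3) are illustrative assumptions, not properties of the experimental setup.

```python
import math


def detection_probabilities(theta1, theta2, theta3):
    """Return (p1, p2, p3, p4), the detection probabilities at APD1..APD4.

    theta1, theta2, theta3 are the linear-polarization angles (radians,
    measured from the horizontal) of the photon arriving at PBS1, PBS2, PBS3.
    Assumption: the horizontal component exits towards Group 1 at PBS1 and
    towards APD1 / APD3 at PBS2 / PBS3; the real assignment depends on the setup.
    """
    g1 = math.cos(theta1) ** 2          # probability of taking the Group 1 branch
    g2 = 1.0 - g1                       # probability of taking the Group 2 branch
    p1 = g1 * math.cos(theta2) ** 2     # Group 1 branch, then APD1
    p2 = g1 - p1                        # Group 1 branch, then APD2
    p3 = g2 * math.cos(theta3) ** 2     # Group 2 branch, then APD3
    p4 = g2 - p3                        # Group 2 branch, then APD4
    return p1, p2, p3, p4


# Initial configuration: all polarizations at 45 degrees.
quarter = math.pi / 4
print([round(p, 3) for p in detection_probabilities(quarter, quarter, quarter)])
# → [0.25, 0.25, 0.25, 0.25]
```

The four probabilities always sum to 1, which mirrors the statement that a single photon is detected by exactly one of the APDs, and none of them vanishes unless a polarization is exactly horizontal or vertical.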

Measurements. The experimental setup was based on the architecture shown in Figure 1. A single photon was emitted by a nitrogen-vacancy (NV) color center in a nanodiamond,38-40 which featured broadband emission in the visible range (650-700 nm) at room temperature;41 the photon passed through a polarizer and a zero-order half-wave plate (HWP1), and impinged on PBS1. One of the branches was connected to another zero-order half-wave plate (HWP2) followed by PBS2, whereas the other branch was connected to HWP3 and PBS3. The orientations of HWPi (i = 1, …, 3) were set by respective rotary servomotors. The single photon was detected by one of four APDs (APDi (i = 1, …, 4)) connected to a multiple-event time digitizer (time-to-digital converter, TDC) with a 100-ps bin size to record detection times. The details of the optical system used in the experiment and the single-photon emission from the NV center are described in the Supporting Information.

Example sequences of single-photon series detected by the APDs are shown in Figures 2a, 2b and 2c. The vertical red, green, blue and cyan bars represent single-photon detection events by APDi (i = 1, …, 4), and the horizontal axis represents time. In Figure 2a, all HWPs are configured so that the output polarizations are approximately 45° with respect to the horizontal and the incoming single photons are directed to the four APDs with equal probability. In contrast, in Figure 2b, HWP1 sets the polarization nearly perfectly horizontal; hence, detection is induced mostly
by either APD1 or APD2. In Figure 2c, HWP2 is also configured so that the output polarization is nearly horizontal, leading to detection mostly by APD1. Note, however, that a few photons are observed in, for example, APD4 in Figure 2b and APD2 in Figure 2c; such probabilistically rare events are important for adaptive decision making in uncertain environments.25

Some time is needed to play a slot machine and reconfigure the HWPs. In this study, the decision is made by the first single-photon detection event of a given cycle; if detection occurs at APDi, the decision is immediately made to play slot machine i. In the experiment, the slot machines were emulated by a host controller (see the Supporting Information for details). Specifically, the reward probabilities Pi (i = 1, …, 4) were given as threshold values. If a random number between 0 and 1 generated by the host controller was less than the reward probability of the selected slot machine, a reward was dispensed. Based on the PA values (PAi (i = 1, …, 3)), the linear polarization was made more vertical or horizontal by rotating the HWPs using a rotary positioner.

In Figures 2d, 2e and 2f, the red, green, blue and cyan circles, respectively, indicate 0.5 s worth of photon counts detected by APDi (i = 1, …, 4) as a function of the orientation of the half-wave plates. While HWPi was being controlled, the other HWPs (HWPj (j ≠ i)) were kept in their 50:50 settings. Note that the orientation angles shown on the horizontal axis do not indicate linear polarization with respect to the horizontal direction, but the absolute value defined in the rotary positioner used in the experiment. We clearly observed that (1) the single photon incidence followed similar characteristics in Group 1 (APD1 and APD2) and
Group 2 (APD3 and APD4) with respect to the HWP1 dependency (Figure 2d), whereas (2) [APD1 and APD2] and [APD3 and APD4] exhibited opposite trends with respect to HWP2 and HWP3, respectively (Figures 2e and 2f). Because the sensitivities of the APDs were not identical, and owing to possible misalignment in the optical setup, the polarization dependencies did not exhibit perfect symmetry. The extinction ratio of the polarizer was 10^5 and that of the PBS was 10^3 (product information is shown in the Supporting Information). We think that the intrinsic optical properties of the various optical components in the experimental setup did not yield significant asymmetry.

To implement the PA mechanisms in the hierarchical architecture, we formulated the TOW mechanism as follows. Let all initial PA values be zero.

TOW mechanism of PA1 (coarse-scale TOW)
If, in cycle t, the selected machine yields a reward (in other words, the slot machine wins), PA1 is updated at cycle t + 1 based on

$$
\mathrm{PA}_1(t+1) =
\begin{cases}
-\Delta_1 + \alpha_1 \mathrm{PA}_1(t) & \text{if slot machine 1 or 2 (Group 1) wins} \\
+\Delta_1 + \alpha_1 \mathrm{PA}_1(t) & \text{if slot machine 3 or 4 (Group 2) wins}
\end{cases}
\tag{1}
$$

where α1 refers to the forgetting parameter,8 and ∆1 is the constant increment (in this experiment, ∆1 = 1 and α1 = 0.999). When the selected machine does not yield a reward (i.e. loses in the slot play), PA1 is updated by

$$
\mathrm{PA}_1(t+1) =
\begin{cases}
+\Omega_1 + \alpha_1 \mathrm{PA}_1(t) & \text{if slot machine 1 or 2 (Group 1) loses} \\
-\Omega_1 + \alpha_1 \mathrm{PA}_1(t) & \text{if slot machine 3 or 4 (Group 2) loses}
\end{cases}
\tag{2}
$$

where Ω1 is a parameter defined below. Intuitively speaking, PA1 decreases if the slot machines in Group 1 are more likely to win, and increases if those in Group 2 are considered more likely to earn rewards, in accordance with Equations (1) and (2). This is as if the value of PA1 were being pulled by Group 1 and Group 2, which coincides with the notion of TOW. The value of PA1 is then translated into polarization control via HWP1, so that the polarization is more horizontal in the former case (Group 1 is more likely to win) and more vertical in the latter (Group 2 is more likely to win). Specifically, the orientation of HWP1 at cycle t is determined by

$$
\mathrm{HWP}_1(t) = \mathrm{POS}_1\left( \lfloor \mathrm{PA}_1(t) \rceil \right)
\tag{3}
$$

where $\lfloor \cdot \rceil$ represents the round-off function to the closest whole number. The function POS1(n) specifies the orientation of HWP1 based on the polarization dependencies characterized in Figure 2d (details are described in the Supporting Information).

In TOW-based decision making, Ω1 is determined based on the history of betting results. Let the number of times slot machine i has been selected by cycle t be Ni and the number of times slot machine i has won be Li. The estimated reward probabilities of the slot machines in Group 1 and Group 2 are, respectively, given by

$$
\hat{P}_{G1} = \frac{L_1 + L_2}{N_1 + N_2}, \qquad \hat{P}_{G2} = \frac{L_3 + L_4}{N_3 + N_4}.
\tag{4}
$$

Ω1 is then given by

$$
\Omega_1 = \frac{\hat{P}_{G1} + \hat{P}_{G2}}{2 - \left( \hat{P}_{G1} + \hat{P}_{G2} \right)}
\tag{5}
$$

while the initial Ω1 value is assumed to be unity, and a constant value is assumed when the denominator of Equation (5) is zero. The detailed derivation of Equation (5) is shown in ref 25.

TOW mechanism of PA2 and PA3 (fine-scale TOW)
We describe only the TOW mechanism of PA2 below, since that of PA3 follows the same principle with the corresponding slot machines. If, in cycle t, the selected machine yields a reward (in other words, wins the bet), the value of PA2 is updated at cycle t + 1 based on

$$
\mathrm{PA}_2(t+1) =
\begin{cases}
-\Delta_2 + \alpha_2 \mathrm{PA}_2(t) & \text{if slot machine 1 wins} \\
+\Delta_2 + \alpha_2 \mathrm{PA}_2(t) & \text{if slot machine 2 wins}
\end{cases}
\tag{6}
$$

with ∆2 = 1 and α2 = 0.999, as in the case of PA1(t). When the selected machine does not yield a reward, the value of PA2 is updated by

$$
\mathrm{PA}_2(t+1) =
\begin{cases}
+\Omega_2 + \alpha_2 \mathrm{PA}_2(t) & \text{if slot machine 1 loses} \\
-\Omega_2 + \alpha_2 \mathrm{PA}_2(t) & \text{if slot machine 2 loses}
\end{cases}
\tag{7}
$$

where Ω2 is a parameter defined later (Equation (10)). In other words, the value of PA2 is pulled by slot machines 1 and 2 in a tug-of-war manner. The value of PA2 is adapted to HWP2 control so that polarization is more horizontal if slot machine 1 is expected to dispense more reward, whereas it is more vertical if slot machine 2 is considered to be more beneficial. As in the former case, the orientation of HWP2 is determined by

$$
\mathrm{HWP}_2(t) = \mathrm{POS}_2\left( \lfloor \mathrm{PA}_2(t) \rceil \right)
\tag{8}
$$

where POS2 (n) specifies the orientation of HWP2 based on the polarization dependencies characterized in Figure 2e. The estimated reward probabilities of slot machine 1 and slot machine 2 are, respectively, given by

$$
\hat{P}_1 = \frac{L_1}{N_1}, \qquad \hat{P}_2 = \frac{L_2}{N_2},
\tag{9}
$$

followed by Ω2 , which is given by

$$
\Omega_2 = \frac{\hat{P}_1 + \hat{P}_2}{2 - \left( \hat{P}_1 + \hat{P}_2 \right)}.
\tag{10}
$$
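Equations (5) and (10) share the same form, which the following short Python sketch illustrates. The helper name `omega`, the fallback value of 1.0 when the denominator vanishes, and the use of the same fallback when an option has not yet been played are assumptions made for illustration; the text only states that the initial value is unity and that a constant is used when the denominator is zero.

```python
def omega(wins_a, plays_a, wins_b, plays_b, default=1.0):
    """Loss-side TOW parameter for two competing options a and b.

    For the fine scale (Equation (10)), a and b are two slot machines; for the
    coarse scale (Equation (5)), pass the summed wins and plays of each group.
    `default` is an assumed constant, returned before both options have been
    played and when the denominator is zero.
    """
    if plays_a == 0 or plays_b == 0:
        return default                  # no estimate available yet (assumption)
    total = wins_a / plays_a + wins_b / plays_b   # sum of estimated reward probabilities
    denominator = 2.0 - total
    if denominator == 0.0:
        return default                  # both estimates equal 1
    return total / denominator


# Machine 1 won 8 of 10 plays, machine 2 won 2 of 10 plays.
print(omega(8, 10, 2, 10))
```

When the two estimates sum to 1, Ω equals 1 and a loss shifts the PA value by the same amount as a win; when rewards are scarce overall, Ω is smaller than 1, so a loss is penalized less than a win is rewarded.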

As mentioned earlier, the other PA value in the fine scale, PA3(t), is given in the same manner, by taking account of slot machines 3 and 4 instead of slot machines 1 and 2.

The decision-making procedure in the hierarchical architecture is summarized as follows:
[1] Photon arrival time is measured through the APDs and a TDC system.
[2] The decision is made based on the first photon detection in Step [1]. The selected slot machine is played.
[3] Reward is dispensed or not.
[4] The PA values are updated based on Equations (1), (2), (6), and (7).
[5] The orientations of the HWPs are determined using Equations (3) and (8).
[6] The values of Ωi (i = 1, …, 3) are updated using Equations (4), (5), (9), and (10).
[7] The rotary positioner is controlled; then, the system returns to Step [1].
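The procedure in Steps [1] to [7] can be sketched as a small numerical simulation in Python. This is only an illustrative model under stated assumptions, not the experimental control software: photon detection is replaced by pseudo-random draws; the function `prob_first`, which stands in for POS and the optics (Steps [5] and [7]), maps the rounded PA value onto seven polarization settings with an assumed angular step of π/16; PA1 is updated on every play while PA2 or PA3 is updated only for the branch that was played; and Ω falls back to 1.0 when it cannot be evaluated. The reward probabilities in the usage line are those of CASE 1 in Equation (11).

```python
import math
import random

DELTA, ALPHA = 1.0, 0.999   # increment and forgetting parameter quoted in the text


def omega(wins_a, plays_a, wins_b, plays_b, default=1.0):
    """Equations (5)/(10); `default` is an assumed fallback constant."""
    if plays_a == 0 or plays_b == 0:
        return default
    total = wins_a / plays_a + wins_b / plays_b
    return default if total == 2.0 else total / (2.0 - total)


def tow_update(pa, first_selected, rewarded, om):
    """Equations (1)-(2) and (6)-(7): the first option pulls PA negative."""
    if rewarded:
        step = -DELTA if first_selected else DELTA
    else:
        step = om if first_selected else -om
    return step + ALPHA * pa


def prob_first(pa, steps=3, tilt=math.pi / 16):
    """Hypothetical stand-in for POS: probability of routing to the first option.

    The PA value is rounded to the nearest integer and clamped to 2*steps+1 = 7
    settings around 45 degrees, so the probability never reaches exactly 0 or 1.
    """
    n = max(-steps, min(steps, math.floor(pa + 0.5)))
    return math.cos(math.pi / 4 + n * tilt) ** 2


def run(P, cycles=30, rng=None):
    """Simulate one sequence of plays; P holds the four reward probabilities."""
    rng = rng or random.Random()
    pa = [0.0, 0.0, 0.0]        # PA1, PA2, PA3
    om = [1.0, 1.0, 1.0]        # Omega1, Omega2, Omega3 (initially unity)
    N, L = [0] * 4, [0] * 4     # plays and wins per machine
    choices = []
    for _ in range(cycles):
        # Steps [1]-[2]: the "photon" picks a group, then a machine in that group.
        group = 0 if rng.random() < prob_first(pa[0]) else 1
        fine = 1 + group                        # index of PA2 or PA3
        first = rng.random() < prob_first(pa[fine])
        machine = 2 * group + (0 if first else 1)
        # Step [3]: emulated slot machine (random number below the threshold wins).
        rewarded = rng.random() < P[machine]
        N[machine] += 1
        L[machine] += int(rewarded)
        # Step [4]: update the coarse PA and the fine PA of the played branch.
        pa[0] = tow_update(pa[0], group == 0, rewarded, om[0])
        pa[fine] = tow_update(pa[fine], first, rewarded, om[fine])
        # Step [6]: refresh the Omega values from the betting history.
        om[0] = omega(L[0] + L[1], N[0] + N[1], L[2] + L[3], N[2] + N[3])
        om[1] = omega(L[0], N[0], L[1], N[1])
        om[2] = omega(L[2], N[2], L[3], N[3])
        choices.append(machine + 1)
    return choices, pa


choices, pa = run([0.8, 0.2, 0.1, 0.1], cycles=30, rng=random.Random(0))
print(choices)
print([round(v, 2) for v in pa])
```

Averaging the indicator "machine 1 was chosen at cycle t" over several independent calls to `run` would give a simulated counterpart of a correct decision rate; the numbers depend on the assumed `prob_first` mapping and are not expected to reproduce the measured curves.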



DISCUSSION

We first solve two typical four-armed bandit problems, where the reward probabilities are respectively given by

$$
\begin{aligned}
\text{CASE 1: } \{P_1, P_2, P_3, P_4\} &= \{0.8, 0.2, 0.1, 0.1\} \\
\text{CASE 2: } \{P_1, P_2, P_3, P_4\} &= \{0.8, 0.1, 0.2, 0.1\}.
\end{aligned}
\tag{11}
$$
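As a quick check of the two problem instances in Equation (11), the following Python sketch evaluates the fine-scale optimum (the machine with the largest Pi) and the coarse-scale optimum (the group with the larger summed reward probability). The function names are hypothetical helpers introduced only for this illustration.

```python
def fine_best(P):
    """Index (1-based) of the machine with the highest reward probability."""
    return max(range(len(P)), key=lambda i: P[i]) + 1


def group_gap(P):
    """(P1 + P2) - (P3 + P4): positive when Group 1 is the better group."""
    return (P[0] + P[1]) - (P[2] + P[3])


def coarse_best(P):
    """Group (1 or 2) with the larger summed reward probability."""
    return 1 if group_gap(P) >= 0 else 2


CASES = {
    "CASE 1": [0.8, 0.2, 0.1, 0.1],
    "CASE 2": [0.8, 0.1, 0.2, 0.1],
}
for name, P in CASES.items():
    print(name, "fine:", fine_best(P), "coarse:", coarse_best(P),
          "gap:", round(group_gap(P), 3))
```

In both cases the best machine belongs to the better group, so the coarse-scale and fine-scale decisions agree; a problem in which `fine_best` points into the group not returned by `coarse_best` would be a contradictory one.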

Since the maximum reward probability is P1 = 0.8 in both cases, the correct decision (precisely speaking, the correct decision at the fine scale) is to select slot machine 1. Note also that the two cases contain the same set of probability values, although their order differs.

Starting with zero prior knowledge of the reward probabilities, the hierarchical single-photon-based decision maker makes 30 consecutive plays, and this sequence of plays is repeated 10 times. The red solid and dashed lines in Figure 3a, respectively, show the correct decision rate at cycle t for the CASE 1 and CASE 2 problems, defined as the ratio of the number of times the highest-reward probability machine is chosen in cycle t to the number of trials (10). This rate gradually increases over time, demonstrating successful decision making.

Since P1 + P2 > P3 + P4 holds for both cases, choosing machines in Group 1 is the correct decision at the coarse scale. We note that the difference between the summed reward probabilities of Group 1 and Group 2, (P1 + P2) − (P3 + P4), is 0.8 for CASE 1 and 0.6 for CASE 2, indicating that CASE 1 is a relatively easier problem than CASE 2. Indeed, the blue solid and dashed lines in
Figure 3a show the correct decision rate at the coarse scale for CASE 1 and CASE 2, respectively, defined by the ratio of the number of selections of machines belonging to Group 1 in all trials; the rate for CASE 1 quickly approaches unity, indicating rapid adaptation. Consequently, CASE 1 exhibits more rapid adaptation than CASE 2 at the fine scale as well, as shown by the red lines in Figure 3a.

Figure 3b summarizes the temporal evolution of the polarization adjuster values. Here, we observe that (1) both PA1 and PA2 in CASE 1 (red and blue solid curves, respectively) decrease more quickly than those in CASE 2 (red and blue dashed curves, respectively), meaning that the single-photon polarization is shifted towards the horizontal by HWP1 and HWP2, coinciding with the decision-making performance in Figure 3a; and (2) PA3, which concerns the decision between slot machines 3 and 4, remains around zero in both cases (green solid and dashed curves for CASE 1 and CASE 2, respectively), since the difference in their reward probabilities is zero or very small and these machines are rarely selected once the system finds the best machine. In these demonstrations as well as the following ones, the resolution of polarization control specified by HWPi(t) (i = 1, …, 3) is seven steps.

The highest-reward probability slot machine may not belong to the higher-reward probability group at the coarse scale. Take, for example, a case given by

[Case 3] {P1, P2, P3, P4} = {0.7, 0.5, 0.9, 0.1}    (12)

where the correct decision is to choose slot machine 3 (P3 = 0.9), which belongs to Group 2. At the coarse scale, by contrast, Group 1 has a larger summed reward probability (P1 + P2 = 1.2) than Group 2 (P3 + P4 = 1.0), which implies that the optimal solution at the coarse scale is to select either slot machine 1 or 2; this contradicts the correct decision at the fine scale (slot machine 3). Hence, polarization control at the coarse scale (HWP1) has difficulty guiding the decision towards Group 2. The red solid line in Figure 4 shows the correct selection rate at the fine scale, which fluctuates around 0.5, indicating that the system was not able to find the correct decision.

One way to derive the correct decision at the fine scale for such a contradictory problem is the so-called “tournament” method. In the first round, coarse-scale polarization control is not activated (PA1 is kept at zero) while each branch of the fine scale undergoes polarization adjustment. As a consequence, the higher-reward probability machine is more likely to be chosen in each branch, that is, machine 1 (P1 = 0.7) in Group 1 and machine 3 (P3 = 0.9) in Group 2. In the second round, coarse-scale polarization control is activated, whereas fine-scale control is fixed; the TOW principle thus applies to the winners of the first round, leading to the highest-reward probability machine at the end. The blue lines in Figure 4 show experimental verification of the tournament method, where the first 15 cycles belong to the first round and the next 15 to the second round. In the second round, we observed that the correct decision rate at the fine scale, depicted by the solid blue line, increased, whereas that at the coarse scale, shown by the dashed blue line, decreased, unlike in the “non-tournament” method discussed earlier and represented by red lines; this shows that the tournament method works successfully in a single-photon-based hierarchical decision maker.

The other approach to deriving the global optimal solution, or the maximum-reward probability machine, in the hierarchical architecture without a tournament involves further exploiting the probabilistic attributes of photons by increasing the resolution of the polarization adjusters. For the same problem [Case 3], the solid, dashed, dotted and dash-dotted lines in Figure 5a show the correct decision rates (at the fine scale) when the polarization resolution, that is, the number of steps in polarization adjustment, is 11, 9, 7 and 5, respectively. We observe that the correct decision rate at the fine layer increases with increasing resolution. This is also shown by the square marks in Figure 5b, which compare the correct decision rates at cycle 30 as a function of the number of polarization adjustment steps. We should also remark that the correct decision rate at the coarse layer decreases with an increasing number of PA steps, as shown by the square marks in Figure 5b. The number of polarization adjustment steps can be much larger; Figure 5c shows numerical simulations of the correct decision rate up to cycle t = 500, calculated as an average over 100 iterations with varying numbers of polarization adjustment steps: 5, 7, 9, 11, 51 and 101. With increasing resolution, the correct decision rate approached unity, whereas the adaptation became slower. At cycle t = 30, the correct decision rates at the fine and the coarse scales varied with the number of polarization adjustment steps as shown in Figure 5d; the portion between resolutions 5 and 11 exhibited a trend similar to the experimental results shown in Figure 5b, demonstrating the exploration abilities of single photons for the optimal solution in a hierarchical architecture.

Finally, we discuss this study in light of practical considerations. First, the second-order photon-intensity correlation g(2)(0) of the single-photon source was sufficiently smaller than unity, as shown in Figure S2 in the Supporting Information. Therefore, we observed no events where photons were detected by multiple APDs in the same time bin during the decision-making experiments. If multiple APDs simultaneously detect photons, a decision cannot be made, and the system needs to try another photon observation. Meanwhile, since the single-photon generation rate from the NV center in a nanodiamond was approximately 50 k photons/s in our experiment, and a single capture with 100-ps timing resolution spanned only approximately 10 µs owing to the limited bandwidth between the TDC and the host controller, a capture could contain no photon in any channel; hence, we repeated photon measurements until a single photon was observed. This is among the practical limits on the operating speed of the system. The host controller (see the Supporting Information for specifications) serially controls three rotary servomotors for the HWPs. Overall, the latency for a single slot play was around a few seconds, which limited the number of executable slot plays in the experiments. Improving the operating speed of the total system by enhancing the single-photon rate,42 speeding up the polarization modulation by, for example, electro-optic phase modulators, and incorporating fully parallel control mechanisms are important engineering topics for future research.
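As a rough check on the photon-starved captures described above, the figures quoted in the text (about 50 k photons/s and a capture window of about 10 µs) imply an average of 0.5 photons per capture. The sketch below works this out and, purely as an illustrative assumption, treats photon arrivals as Poisson distributed to estimate how often a capture is empty; a true single-photon emitter is sub-Poissonian, and collection and detection losses are ignored. The function and variable names are hypothetical.

```python
import math

def capture_statistics(rate_hz, window_s):
    """Return (mean photons per capture, P(empty capture), expected captures).

    Assumes Poisson-distributed arrivals; this is an illustrative
    simplification, not a model taken from the paper.
    """
    mean_photons = rate_hz * window_s
    p_empty = math.exp(-mean_photons)          # Poisson probability of zero photons
    expected_captures = 1.0 / (1.0 - p_empty)  # mean of a geometric distribution
    return mean_photons, p_empty, expected_captures

mean_photons, p_empty, expected_captures = capture_statistics(50e3, 10e-6)
print(f"mean photons per capture: {mean_photons:.2f}")
print(f"probability of an empty capture: {p_empty:.3f}")
print(f"expected captures until a photon is seen: {expected_captures:.2f}")
```

Under these assumptions roughly six captures in ten would be empty, which is consistent with the need to repeat measurements noted in the text.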

Second, integration is an important issue. Adapting integrated planar lightwave circuit technologies, previously used in some quantum systems,29,43 to photon intelligence is an interesting topic for future study. Another possibility is the use of nanophotonic devices based on optical near-field-mediated energy transfer9 using quantum dots44 and shape-engineered nanostructures.45

Third, there is a need to study more sophisticated decision-making problems. This study dealt with the multi-armed bandit problem for a single player. As clearly shown in Figure 5c, there is a trade-off between the speed of adaptation and the accuracy of decisions; examining an optimal strategy based on the requirements of applications is an interesting topic of research. Moreover, when there is more than one player, a setting called the competitive multi-armed bandit (CMAB) problem, the entire problem becomes more complicated and the issue of the Nash equilibrium becomes a serious concern.46 Kim et al. proposed a physical architecture that solves CMABs.47 Using the intrinsic attributes of photons, such as entanglement,31 which has been experimentally confirmed in quantum plasmonics,33 for such complex problems is another exciting research topic in photon intelligence.



CONCLUSION

To summarize, we have experimentally demonstrated that single photons in hierarchical architectures can solve multi-armed bandit problems from zero prior knowledge for decision making and reinforcement learning based on the intrinsic wave-particle duality of photons, which is a decisive step towards verifying an important architecture for massive parallelism. With the NV center in a nanodiamond as a single-photon source, the polarization of single photons was adapted by multiple polarization adjusters such that the highest-reward probability machine was chosen. The notions of coarse- and fine-scale decisions emerge in hierarchical architectures. The correct decisions at the coarse and fine scales may contradict each other. We showed that while the global optimal solution was derived by solving from the finer to the coarser scale in a step-by-step manner, referred to as the tournament method, exploiting the probabilistic abilities of a single photon allows the direct derivation of the optimal selection. This study unveiled single-photon intelligence in a hierarchical architecture for future AI as well as the potential of photonics for intelligent functionalities.



ASSOCIATED CONTENT

Supporting Information Further experimental details and additional data related to single photons.



AUTHOR INFORMATION

Corresponding Author *Email: [email protected]

Author Contributions M.N., S.H., and S.-J.K. directed the project. M.N. and S.-J.K. designed the system architecture. M.N., M.B., A.D., and S.H. designed and implemented the optical systems. M.B. and M.N. conducted the optical experiments. M.N. and S.-J.K. analyzed the data. H.H. discussed the physical principles, and M.N., M.B., A.D., and S.H. wrote the paper.

Notes The authors declare no competing financial interests.



ACKNOWLEDGMENTS

The nanodiamond sample was provided by G. Dantelle and T. Gacoin. This work was supported in part by the Core-to-Core Program, A. Advanced Research Networks from the Japan Society for the Promotion of Science, and in part by Agence Nationale de la Recherche, France, through SINPHONIE (ANR-12-NANO-0019) and PLACORE (ANR-13-BS10-0007).



REFERENCES

1. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
2. Backus, J. Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs. Commun. ACM 1978, 21, 613–641.
3. Merolla, P. A. et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 2014, 345, 668–673.
4. Chien, A. A.; Vijay, K. Moore's Law: The First Ending and a New Beginning. Computer 2013, 46, 48–53.
5. Boixo, S.; Albash, T.; Spedalieri, F. M.; Chancellor, N.; Lidar, D. A. Experimental signature of programmable quantum annealing. Nat. Commun. 2013, 4, 2067.
6. Takata, K.; Utsunomiya, S.; Yamamoto, Y. Transient time of an Ising machine based on injection-locked laser network. New J. Phys. 2012, 14, 013052.
7. Yamaoka, M.; Yoshimura, C.; Hayashi, M.; Okuyama, T.; Aoki, H.; Mizuno, H. A 20k-Spin Ising Chip to Solve Combinatorial Optimization Problems With CMOS Annealing. IEEE J. Solid-St. Circ. 2016, 51, 303–309.
8. Kim, S.-J.; Naruse, M.; Aono, M.; Ohtsu, M.; Hara, M. Decision Maker Based on Nanoscale Photo-Excitation Transfer. Sci. Rep. 2013, 3, 2370.
9. Naruse, M.; Nomura, W.; Aono, M.; Ohtsu, M.; Sonnefraud, Y.; Drezet, A.; Huant, S.; Kim, S.-J. Decision making based on optical excitation transfer via near-field interactions between quantum dots. J. Appl. Phys. 2014, 116, 154303.
10. Naruse, M.; Berthel, M.; Drezet, A.; Huant, S.; Aono, M.; Hori, H.; Kim, S.-J. Single-photon decision maker. Sci. Rep. 2015, 5, 13253.
11. Pothos, E. M.; Busemeyer, J. A quantum probability explanation for violations of 'rational' decision theory. Proc. Royal Soc. London B: Bio. Sci. 2009, rspb-2009.
12. Cheon, T.; Takahashi, T. Interference and inequality in quantum decision theory. Phys. Lett. A 2010, 375, 100–104.
13. Shihui, H.; Northoff, G. Culture-sensitive neural substrates of human cognition: A transcultural neuroimaging approach. Nat. Rev. Neurosci. 2008, 9, 646–654.
14. Daw, N.; O’Doherty, J.; Dayan, P.; Seymour, B.; Dolan, R. Cortical substrates for exploratory decisions in humans. Nature 2006, 441, 876–879.
15. Huys, Q. J.; Maia, T. V.; Frank, M. J. Computational psychiatry as a bridge from neuroscience to clinical applications. Nat. Neurosci. 2016, 19, 404–413.
16. Chen, C.; Takahashi, T.; Nakagawa, S.; Inoue, T.; Kusumi, I. Reinforcement learning in depression: A review of computational research. Neurosci. Biobehav. Rev. 2015, 55, 247–267.
17. Sutton, R. S.; Barto, A. G. Reinforcement Learning: An Introduction; The MIT Press: Massachusetts, 1998.
18. Laureiro-Martínez, D.; Brusoni, S.; Zollo, M. The neuroscientific foundations of the exploration-exploitation dilemma. J. Neurosci. Psychol. Econ. 2010, 3, 95–115.
19. Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time analysis of the multi-armed bandit problem. Machine Learning 2002, 47, 235–256.
20. Lai, L.; Gamal, H.; Jiang, H.; Poor, V. Cognitive Medium Access: Exploration, Exploitation, and Competition. IEEE Trans. Mob. Comput. 2011, 10, 239–253.
21. Kim, S.-J.; Aono, M. Amoeba-inspired algorithm for cognitive medium access. NOLTA 2014, 5, 198–209.
22. Agarwal, D.; Chen, B.-C.; Elango, P. Explore/exploit schemes for web content optimization. Proc. of ICDM 2009, 1–10.
23. Kocsis, L.; Szepesvári, C. Bandit based Monte Carlo planning. Machine Learning: European Conf. Machine Learning, Lecture Note Computer Sci. 2006, 4212, 282–293.
24. Robbins, H. Some aspects of the sequential design of experiments. B. Am. Math. Soc. 1952, 58, 527–535.
25. Kim, S.-J.; Aono, M.; Nameda, E. Efficient decision-making by volume-conserving physical object. New J. Phys. 2015, 17, 083023.
26. Naruse, M.; Kim, S.-J.; Aono, M.; Berthel, M.; Drezet, A.; Huant, S.; Hori, H. Category theoretic foundation of single-photon-based decision making. Preprint at http://arxiv.org/abs/1602.08199.
27. Ishikawa, M.; Namiki, A.; Senoo, T.; Yamakawa, Y. Ultra high-speed Robot Based on 1 kHz vision system. 2012 IEEE/RSJ Int. Conf. Intelligent Robots and Sys. 2012, 5460–5461.
28. Hanzawa, N.; Saitoh, K.; Sakamoto, T.; Matsui, T.; Tsujikawa, K.; Uematsu, T.; Yamamoto, F. PLC-Based Four-Mode Multi/Demultiplexer With LP11 Mode Rotator on One Chip. J. Lightwave Technol. 2015, 33, 1161–1165.
29. Peruzzo, A.; Laing, A.; Politi, A.; Rudolph, T.; O’Brien, J. L. Multimode quantum interference of photons in multiport integrated devices. Nat. Commun. 2011, 2, 224.
30. Diamanti, E.; Takesue, H.; Honjo, T.; Inoue, K.; Yamamoto, Y. Performance of various quantum key distribution systems using 1.55 µm up-conversion single-photon detectors. Phys. Rev. A 2005, 052311.
31. Ladd, T. D.; Jelezko, F.; Laflamme, R.; Nakamura, Y.; Monroe, C.; O’Brien, J. L. Quantum computers. Nature 2010, 464, 45–53.
32. Lounis, B.; Orrit, M. Single-photon sources. Rep. Prog. Phys. 2005, 68, 1129.
33. Dheur, M. C.; Devaux, E.; Ebbesen, T. W.; Baron, A.; Rodier, J. C.; Hugonin, J. P.; Lalanne, P.; Greffet, J.-J.; Messin, G.; Marquier, F. Single-plasmon interferences. Science Adv. 2016, 2, e1501574.
34. Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, L.; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W. Entanglement-Based Machine Learning on a Quantum Computer. Phys. Rev. Lett. 2015, 114, 110504.
35. Lau, H.-K.; Pooser, R.; Siopsis, G.; Weedbrook, C. Quantum machine learning over infinite dimensions. Preprint at http://arxiv.org/abs/1603.06222.
36. Tezak, N.; Mabuchi, H. A coherent perceptron for all-optical learning. EPJ Quantum Technol. 2015, 2, 10.
37. Kim, S.-J.; Aono, M.; Hara, M. Tug-of-war model for the two-bandit problem: Nonlocally-correlated parallel exploration via resource conservation. BioSystems 2010, 101, 29–36.
38. Beveratos, A.; Brouri, R.; Gacoin, T.; Poizat, J.-P.; Grangier, P. Nonclassical radiation from diamond nanocrystals. Phys. Rev. A 2001, 64, 061802.
39. Sonnefraud, Y.; Cuche, A.; Faklaris, O.; Boudou, J. P.; Sauvage, T.; Roch, J. F.; Treussart, F.; Huant, S. Diamond nanocrystals hosting single nitrogen-vacancy color centers sorted by photon-correlation near-field microscopy. Opt. Lett. 2008, 33, 611–613.
40. Berthel, M.; Mollet, O.; Dantelle, G.; Gacoin, T.; Huant, S.; Drezet, A. Photophysics of single nitrogen-vacancy centers in diamond nanocrystals. Phys. Rev. B 2015, 91, 035308.
41. Dumeige, Y.; Treussart, F.; Alléaume, R.; Gacoin, T.; Roch, J. F.; Grangier, P. Photo-induced creation of nitrogen-related color centers in diamond nanocrystals under femtosecond illumination. J. Lumin. 2004, 109, 61–67.
42. Strauf, S.; Stoltz, N. G.; Rakher, M. T.; Coldren, L. A.; Petroff, P. M.; Bouwmeester, D. High-frequency single-photon source with polarization control. Nat. Photon. 2007, 1, 704–708.
43. Aspuru-Guzik, A.; Walther, P. Photonic quantum simulators. Nat. Phys. 2012, 8, 285–291.
44. Naruse, M.; Tate, N.; Aono, M.; Ohtsu, M. Information physics fundamentals of nanophotonics. Rep. Prog. Phys. 2013, 76, 056401.
45. Naruse, M.; Tate, N.; Ohtsu, M. Optical security based on near-field processes at the nanoscale. J. Opt. 2012, 14, 094002.
46. Maskin, E. Nash equilibrium and welfare optimality. Rev. Econ. Stud. 1999, 66, 23–38.
47. Kim, S.-J.; Naruse, M.; Aono, M. Harnessing Natural Fluctuations: Analogue Computer for Efficient Socially Maximal Decision Making. Philosophies, Special Issue of Natural Computation: Attempts in Reconciliation of Dialectic Oppositions. In Press. Preprint at http://arxiv.org/abs/1504.03451.

Figure 1. Single photon in hierarchical architecture for physical decision making. Linearly polarized single photons, emitted from the NV center in a nanodiamond, are directed to one of four photodetectors through three half-wave plates and polarizing beam splitters arranged in a hierarchical, or tree, structure. The detection event at each detector is immediately associated with the selection of the corresponding slot machine. Owing to the wave attribute of the single photon, the probability of photon detection at any of the detectors may be nonzero, whereas each individual photon is detected at only one of the detectors owing to the particle nature of photons. On the basis of the betting results, three polarization adjusters (PAs) control the orientation of the half-wave plates using rotary positioners, so that the single photon is more likely to be detected at the detector of the higher-reward-probability slot machine.
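The routing described in this caption can be illustrated with a short calculation. The sketch below assumes ideal, lossless optics in which a half-wave plate with its fast axis at angle θ rotates horizontal linear polarization by 2θ, and a polarizing beam splitter sends the photon to its horizontal port with probability cos²(2θ) (Malus's law). It further assumes that the photon entering each half-wave plate is polarized along that stage's horizontal reference axis. Which port feeds which slot machine, and all names used, are illustrative assumptions rather than details taken from the experiment.

```python
import math

def detection_probabilities(theta1, theta2, theta3):
    """Probabilities of detecting one photon at detectors 1 to 4.

    theta1 sets the coarse split (Group 1 vs Group 2); theta2 and theta3 set
    the fine splits inside Group 1 and Group 2. Angles are half-wave-plate
    fast-axis angles in radians, measured from each stage's horizontal axis.
    Ideal optics are assumed (illustrative simplification).
    """
    def horizontal_prob(hwp_angle):
        # Horizontal input is rotated by 2 * hwp_angle; Malus's law then
        # gives the probability of leaving through the horizontal port.
        return math.cos(2.0 * hwp_angle) ** 2

    p_group1 = horizontal_prob(theta1)
    p_m1_given_g1 = horizontal_prob(theta2)
    p_m3_given_g2 = horizontal_prob(theta3)
    return [
        p_group1 * p_m1_given_g1,
        p_group1 * (1.0 - p_m1_given_g1),
        (1.0 - p_group1) * p_m3_given_g2,
        (1.0 - p_group1) * (1.0 - p_m3_given_g2),
    ]

# With every half-wave plate at 22.5 degrees, each beam splitter divides 50:50.
angle = math.radians(22.5)
probs = detection_probabilities(angle, angle, angle)
print([round(p, 3) for p in probs])  # [0.25, 0.25, 0.25, 0.25]
```

The four probabilities always sum to one, reflecting that each photon is detected at exactly one detector, while the three angles shape how that probability is shared among the machines.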

Figure 2. Series of single photons in the hierarchical architecture depending on the polarizations. (a, b, c) Single-photon series over about 1 ms detected by the APDs with different HWP setups. (d, e, f) Photon counts over 0.5 s as a function of the orientation of the three HWPs.

Figure 3. Demonstration of adaptation at the coarse and fine layers of the hierarchical system. (a) The correct decision rate, based on 30 consecutive plays of slot machines for the two problem instances given by CASE 1 and CASE 2, increased over time, demonstrating accurate adaptation of the hierarchical single-photon-based decision maker. The correct decision at the fine layer was to select slot machine 1, whereas that at the coarse scale was to select either slot machine 1 or 2. Since the coarse-scale decision was easier for CASE 1, quicker adaptation was observed in CASE 1. (b) Evolution of the polarization adjuster values PAi (i = 1, 2, 3). PA1 and PA2 for the CASE 1 problem decreased rapidly, which was the foundation of the rapid adaptation observed in (a). PA3 stayed at about zero for both cases, since the difference between the reward probabilities P3 and P4 was zero or very small and slot machines 3 and 4 were no longer chosen as time elapsed.
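The correct decision rate plotted in (a) is, as defined in the main text, the fraction of independent trials in which the correct choice was made at a given cycle. A minimal sketch of that bookkeeping follows; the data layout (one row per trial, one column per cycle), the 1-based machine labels, and the function name are assumptions made for illustration, and the toy data are not experimental results.

```python
def correct_decision_rates(choices, best_machine, best_group):
    """Per-cycle correct decision rates at the fine and coarse scales.

    choices: list of trials, each a list of selected machines (1 to 4) per cycle.
    best_machine: machine with the highest reward probability (fine scale).
    best_group: set of machines forming the better group (coarse scale).
    """
    n_trials = len(choices)
    n_cycles = len(choices[0])
    fine, coarse = [], []
    for t in range(n_cycles):
        picks = [trial[t] for trial in choices]
        fine.append(sum(p == best_machine for p in picks) / n_trials)
        coarse.append(sum(p in best_group for p in picks) / n_trials)
    return fine, coarse

# Toy data: 4 trials x 3 cycles (made up, for illustration only).
toy = [
    [3, 2, 1],
    [1, 1, 1],
    [4, 2, 1],
    [2, 1, 1],
]
fine, coarse = correct_decision_rates(toy, best_machine=1, best_group={1, 2})
print(fine)    # [0.25, 0.5, 1.0]
print(coarse)  # [0.5, 1.0, 1.0]
```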

Figure 4. Demonstration of the tournament method to solve contradictory problems at coarse and fine scales. The optimal solution, or the highest-reward probability slot machine, may not belong to the higher-reward probability group at the coarse scale; such problems are referred to as contradictory problems, for example {P1, P2, P3, P4} = {0.7, 0.5, 0.9, 0.1} (CASE 3). The best option was slot machine 3 (P3 = 0.9), but P1 + P2 > P3 + P4 means that Group 1 (slot machines 1 and 2) was better at the coarse scale. The tournament method derived the global optimum: the fine-scale local maximum was selected in the first round, followed by the second round, where the global maximum was derived by comparing the winners of the first round. The blue lines show the correct selection rate of the tournament method, which increased over time in the second round, whereas the non-tournament method, depicted by red lines, had difficulty in finding the solution.
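The logic of this caption can be sketched in a few lines. The code below first checks that CASE 3 is contradictory (the best machine is not in the group with the larger summed reward probability) and then runs a two-round tournament. To keep the example short, each comparison uses a plain empirical win count as a stand-in for the photon-based TOW dynamics, and it uses many more plays per round than the 15 cycles of the experiment; both choices, along with all function names, are assumptions for illustration only.

```python
import random

REWARD_PROBS = {1: 0.7, 2: 0.5, 3: 0.9, 4: 0.1}   # CASE 3
GROUPS = {"Group 1": (1, 2), "Group 2": (3, 4)}

def is_contradictory(probs, groups):
    """True if the best machine lies outside the group with the largest sum."""
    best_machine = max(probs, key=probs.get)
    best_group = max(groups, key=lambda g: sum(probs[m] for m in groups[g]))
    return best_machine not in groups[best_group]

def pick_winner(machines, probs, plays, rng):
    """Play each machine `plays` times and return the one with more wins.

    Stand-in for the TOW-based adaptation; not the authors' update rule.
    """
    wins = {m: sum(rng.random() < probs[m] for _ in range(plays)) for m in machines}
    return max(machines, key=wins.get)

def tournament(probs, groups, plays=2000, seed=0):
    rng = random.Random(seed)
    # Round 1: fine-scale comparison inside each group (coarse control idle).
    finalists = [pick_winner(ms, probs, plays, rng) for ms in groups.values()]
    # Round 2: coarse-scale comparison between the round-1 winners.
    return pick_winner(finalists, probs, plays, rng)

print(is_contradictory(REWARD_PROBS, GROUPS))  # True
print(tournament(REWARD_PROBS, GROUPS))
```

Because the second round compares only the first-round winners (machines 1 and 3 here), the summed group probabilities that mislead the coarse-scale decision no longer enter the comparison.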

Figure 5. Adequate exploitation of the probabilistic attributes of single photons allowed the direct solution of contradictory problems. (a, b) By enhancing the resolution of the PAs, that is, the number of steps in the PAs, the probabilistic nature of the single photon was exploited further. By increasing the number of PA steps from 5 to 11, the correct selection rate for a contradictory problem increased without employing the tournament method. (c) Numerical simulation of the correct decision rate revealed that the correct decision rate approached unity with increasing PA resolution, at the expense of slower adaptation. (d) The correct decision rate at cycle 30 as a function of the number of PA steps. The trend with PA resolution agreed well between the experiment (b) and the simulation (d).

