Optimal Navigation of Self-Propelled Colloids - ACS Nano (ACS

Sep 25, 2018 - Here, we report a feedback control strategy by which to navigate self-propelled colloids through free space and increasingly complex ma...

Article

Optimal Navigation of Self-Propelled Colloids Yuguang Yang, and Michael A. Bevan ACS Nano, Just Accepted Manuscript • DOI: 10.1021/acsnano.8b05371 • Publication Date (Web): 25 Sep 2018 Downloaded from http://pubs.acs.org on September 27, 2018



Optimal Navigation of Self-Propelled Colloids
Yuguang Yang and Michael A. Bevan∗
Chemical & Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218

ABSTRACT: Controlling navigation of self-propelled, Brownian colloids in complex microstructured environments (e.g., porous media, tumor vasculature) is important to emerging applications (e.g., enhanced oil recovery, drug delivery). Here, we report a feedback control strategy to navigate self-propelled colloids through free space and increasingly complex mazes. Colloidal rod position and orientation within mazes are sensed in real time, and instantaneous propulsion along the rod long axis is actuated via light intensity. However, because uncontrolled rod rotational diffusion determines the propulsion direction, feedback control based on a policy is required to decide how to actuate propulsion magnitude vs. colloid position and orientation within mazes. By considering stochastic rod dynamics including self-propulsion, translational-rotational diffusion, and rod-maze interactions, a Markov decision process (MDP) framework is used to determine optimal control policies to navigate between start and end points in minimal time. The free-space navigation optimal policy effectively reduces to a simple heuristic where propulsion is actuated only when particles point towards the target. The emergent structure of optimal control policies in mazes is based on globally following the shortest geometric paths; however, locally propulsion is actuated to either follow paths towards the target or to produce collisions with maze features as part of generating more favorable positions and orientations. Findings show how coupled effects of maze size, propulsion speed, control update time, and relative particle translational and rotational diffusivities influence navigation performance.

KEYWORDS: active colloids | feedback control | Markov decision process | fractal mazes | first passage time

∗ To whom correspondence should be addressed: [email protected]


Inspired by natural microscopic swimmers, such as bacteria and sperm, recent efforts have been made to fabricate synthetic self-propelled particles.1-3 Such self-propelled particles take various forms, such as bimetallic spheres and rods as well as chains of particles,4 which function via different propulsion mechanisms such as catalytic osmotic flows1 or field-mediated artificial flagellar motion.5 A number of studies have investigated the behavior of self-propelled particles in homogeneous bulk systems,6 in inhomogeneous environments with random or patterned obstacles,7 and within assembled clusters.8,9 By controlling trajectories of self-propelled particles to perform tasks such as localization, targeting, and collective motion, it is anticipated that applications could be realized involving drug delivery,10 environmental remediation,11 oil recovery,12 and functioning micro-machinery.13-16 To date, control of self-propelled particles has generally involved simple feedback control to position particles at prescribed locations in free space.6, 17-20 However, navigation of self-propelled particle trajectories within micro-structured environments containing obstacles and dead-ends, as in mazes, has not been addressed. Such control is essential to enable self-propelled particles to navigate porous networks (e.g., tissue, soil).
Here, we develop a general approach to robust optimal feedback control of self-propelled colloidal particle trajectories in complex micro-structured media. Optimal path planning generally requires minimizing geometric path lengths and avoiding dead-end pathways (e.g., as for self-driving cars) to efficiently navigate between start and end points. For potential applications that require self-propelled colloid navigation in large-scale micro-structured environments, optimal path planning is essential to minimize passage times, traversed distances, and consumption of scarce resources.
While generic optimal path planning algorithms for deterministic and stochastic systems are well established,21,22 developing optimal path planning algorithms specifically for self-propelled colloids requires careful consideration of Brownian translation and rotation. In particular, random Brownian motion drives colloids to uniformly sample all positions and orientations in free space and mazes as part of maximizing entropy. The ability to control actuation of rod propulsion is necessary to bias stochastic trajectories along designated geometric paths connecting start and end points. In one trivial limit, the ability to actuate propulsion amplitude on short time scales and in all degrees of freedom could be used to completely suppress Brownian motion to produce deterministic trajectories (like traditional robots). However, the more likely situation encountered in nano- and micro-scale systems is that feedback control updates occur on timescales slower than those associated with Brownian motion, and under-actuation does not enable arbitrary forces and torques to control all degrees of freedom. Optimal feedback control of self-propelled particles in mazes is a non-trivial problem that has not been solved to the authors’ knowledge, particularly where the objective is a minimum passage time using an approach that directly addresses effects of stochasticity and under-actuation.
Based on the aforementioned considerations as well as the types of self-propelled particles that have been practically demonstrated in experiments, here we report results for optimal control policies in simulated experiments for the case depicted in Fig. 1. We consider the quasi-2D motion of micron-sized colloidal rods with a propulsion magnitude that is tunable via light intensity23-32 or other globally actuatable mechanisms,23, 32-34 and a propulsion direction along the rod’s long axis that is determined by uncontrolled Brownian rotation.
Results are reported for optimal control of such particles in free space and increasingly complex mazes


Figure 1 | Feedback controlled navigation of self-propelled rods in mazes. (A) (1) An imaging system senses the particle state, s=(x, y, φ), within a maze, (2) which is transferred to the controller (a “Maxwell demon” like entity who knows the optimal control policy) at control update time interval, tC, (3) where it is determined whether to turn a light ON or OFF to actuate particle propulsion. The controller uses an optimal control policy determined using a MDP framework based on the particle’s state dependent transition probability. (B) The lab frame coordinate system used to track rod center of mass position and orientation. The self-propelled rod with length, LR=2um, and diameter, 2aR=0.4um, is propelled along its long axis with speed, v, that is proportional to light intensity.

using a feedback scheme consisting of: (1) a sensor, based on a microscope and camera to track rod positions and orientations, (2) an actuator, using a light source to tune rod propulsion speed (but not direction) at a given control update time, and (3) a control policy, which closes the loop by specifying propulsion actuation in real time based on the rod state to rapidly navigate between points. By developing a probabilistic model (Markov chain model) of the rod dynamics under different propulsion settings, a Markov decision process (MDP)33 framework is used offline to determine the optimal control policy for various geometries ranging from free space to increasingly complex mazes. After identifying essential navigation control principles in a series of case studies, findings are generalized to show how optimal control policies scale with maze feature size, control update time, and relative rates of colloid propulsion and diffusion. By using the MDP framework, the approach to optimal control investigated in this work is rigorous, robust, and general and can be easily adapted to other propulsion mechanisms beyond light-controlled self-propulsion.
To provide some additional context for the conceptual approach in this work, the proposed strategy can be compared to Maxwell’s demon. In Maxwell’s thought experiment, a demon controls a door between two halves of a container; opening and closing the door after sensing the speeds of approaching molecules is employed with the objective of separating slower and faster molecules between the two halves. Because this process would raise one side’s temperature and lower the other side’s temperature from an initially uniform temperature, the 2nd law of thermodynamics appears to be violated (via an apparent net entropy decrease; the 2nd law is probably not violated since the demon’s efforts likely increase entropy elsewhere).
We find inspiration from the demon’s ability to exploit control of thermal motion to achieve a nontrivial outcome. Instead of actuating a door to separate molecules with different thermal energies, here we actuate self-propulsion of thermally rotating colloids to navigate mazes. The natural tendency without control in both experiments is for entropy to be maximized via random sampling of states within the container (to produce a uniform temperature) or maze (to produce a


random walk or diffusion). In the case of Maxwell’s demon, sensing molecular speed is used to decide whether to actuate a door, whereas in the present case, sensing particle orientation is used to decide whether to actuate propulsion. However, as will be shown, the decision to actuate propulsion (i.e., the control policy) in maze navigation is less obvious than for Maxwell’s demon, who simply actuates a door based on a threshold molecular speed. In the following, we report a method to determine and demonstrate optimal control policies for actuating Brownian self-propelled colloids with the objective of optimally navigating mazes.

RESULTS & DISCUSSION

Colloidal Dynamics to Optimal Control

To pose a well-formulated control problem, we first introduce a sufficiently realistic Brownian Dynamics (BD) simulation model of self-propelled colloidal rod particles to capture the dominant physics commonly observed in experiments.34 Although more rigorous and complex models of rod-boundary hydrodynamic and colloidal interactions (including translation-rotation coupling)35 and propulsion mechanisms36-38 could introduce quantitative changes to the following results, the conceptual problem and algorithms are not expected to differ significantly based on such model variations. Practically, an equation of motion containing different or additional terms could be used to obtain optimal control policies using the general method illustrated in this work. The equations of motion for the lab frame position vector, r, and orientation, φ, of a self-propelled Brownian colloidal rod in two dimensions are given by the coupled equations,

$$
\begin{aligned}
\mathbf{r}(t+\Delta t) &= \mathbf{r}(t) + \frac{\mathbf{D}_t}{kT}\cdot\mathbf{F}\,\Delta t + \Delta\mathbf{r}^B + v\cos(\phi)\,\mathbf{e}_1 + v\sin(\phi)\,\mathbf{e}_2 \\
\phi(t+\Delta t) &= \phi(t) + \frac{D_r}{kT}\,\Gamma\,\Delta t + \Delta\phi^B
\end{aligned}
\tag{1}
$$

where Dt is the translational diffusivity tensor containing coefficients for different directions,35, 39 Dr is the rotational diffusivity, F and Γ are forces and torques due to rod-obstacle electrostatic interactions, v is propulsion speed (e.g., actuated by light or another mechanism), kT is thermal energy, ∆t is the time step, and e1 and e2 are orthogonal unit vectors in the Cartesian coordinate lab frame (see calculation of coefficients and additional details in Methods). This model assumes gravity and substrate repulsion confine rod particles in a quasi-2D layer.35, 39 The cylindrical rod particles are uniaxial (2um length, 0.4um cylindrical cross-sectional diameter) so that the only degrees of freedom are the particle center-of-mass position and orientation angle in the lab frame. A key aspect of this model from a control perspective is that the orientation, i.e., the direction of the propulsion, is not controlled, which is realistic based on self-propelled particle experiments.40 As a result, when there is nonzero propulsion velocity (v>0), directed deterministic motion occurs only at short times (t ≪ 1/Dr), whereas rotational diffusion randomizes the propulsion direction at long times (t ≫ 1/Dr).40
To navigate a rod through a maze using only controlled propulsion, an intuitive strategy is to actuate propulsion when orientation and position favorably influence trajectories to avoid obstacles and dead-ends. Such a strategy could be quantified by a control policy, π, which is a set of rules that close the loop between an actuatable velocity, v, and observable system states, s=(x, y, φ), to achieve a navigation objective using control update time, tC. However, it is difficult to guarantee that such a control policy would be effective or produce an “optimal” trajectory. The


Figure 2 | Transition probability in free space for diffusion and self-propulsion. (A) Plot of p(sn+1|sn,vn) shown for position coordinates (x, y) without self-propulsion (i.e., diffusion) after a 1s time step starting from initial state s=(x, y, φ) = (0, 0, 0). (B) Plot of p(sn+1|sn,vn) with the same parameters as part (A) except with v=4.5 um/s. (C) Plot of p(sn+1|sn,vn) shown for the angular coordinate (φ) with and without self-propulsion after 1s. Note that the rotational probability is not influenced by self-propulsion since propulsion makes no angular contribution.
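The transition probabilities in Fig. 2 are described as being constructed from BD simulations. A minimal sketch of how such samples could be generated in free space is given below, under several assumptions that are not taken from the paper: the diffusivity values are illustrative placeholders (Table 1 values are not reproduced here), the propulsion displacement per step is taken as v∆t along the current rod axis, and a coarser time step than the reported 0.1 ms is used to keep the example fast. The function names are hypothetical.

```python
import math
import random

# Illustrative (hypothetical) parameters; not the paper's Table 1 values.
DT_PAR = 0.20    # um^2/s, translational diffusivity along the rod axis (assumed)
DT_PERP = 0.15   # um^2/s, translational diffusivity perpendicular to the axis (assumed)
DR = 0.50        # rad^2/s, rotational diffusivity (assumed)


def sample_transition(state, v, t_c=1.0, dt=0.01, rng=random):
    """Propagate one rod from state = (x, y, phi) over one control interval t_c.

    Free space only (no forces or torques). The propulsion displacement per
    step is taken as v * dt along the current rod axis (assumption).
    """
    x, y, phi = state
    n_steps = int(round(t_c / dt))
    s_par = math.sqrt(2.0 * DT_PAR * dt)
    s_perp = math.sqrt(2.0 * DT_PERP * dt)
    s_rot = math.sqrt(2.0 * DR * dt)
    for _ in range(n_steps):
        c, s = math.cos(phi), math.sin(phi)
        # Displacements are sampled in the rod frame, then rotated to the lab frame.
        d_par = v * dt + s_par * rng.gauss(0.0, 1.0)
        d_perp = s_perp * rng.gauss(0.0, 1.0)
        x += d_par * c - d_perp * s
        y += d_par * s + d_perp * c
        phi += s_rot * rng.gauss(0.0, 1.0)
    return x, y, phi


def sample_many(state, v, n_samples, seed=0, **kwargs):
    """Return end states; a histogram of these estimates p(s_{n+1} | s_n, v_n)."""
    rng = random.Random(seed)
    return [sample_transition(state, v, rng=rng, **kwargs) for _ in range(n_samples)]
```

Binning the returned end states on a grid in (x, y, φ) would give an estimate of the transition probability for the chosen start state and propulsion speed; with v=0 the angular variance of the samples should approach 2DrtC.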

optimal control policy, π*, is a policy that navigates between states with a minimum integrated process cost, where the cost can be expressed as a quantifiable metric, e.g., total time, distance traveled, energy consumed, etc. Here we use an MDP framework to compute the optimal control policy.33 The MDP framework is appropriate in the present problem to consider the non-linear, coupled stochastic Brownian rotation and self-propulsion. To implement the MDP, a discrete-time Markov chain model is constructed to capture the rod’s transition probability between different states at different self-propulsion speeds. Then the optimal control policy is obtained by minimizing a cost function associated with the entire integrated process of moving between different states based on the Markov chain model. The MDP framework is general and can be employed to develop optimal control policies for different equations of motion and actuation mechanisms (e.g., see examples of tunable depletion41 and electric field42-44 mediated colloidal assembly). Uncertainty in particle states in experiments from imaging limitations can be considered within the MDP framework using signal processing strategies (e.g., filters).22 Details of computing the Markov chain model, transition probability, and optimal control policy are reported in Methods.

Free-Space Navigation

We first demonstrate optimal control of a self-propelled colloidal rod in quasi-2D free space (i.e., absence of obstacles or confinement). The essential input to the MDP framework is a Markov chain model of the rod dynamics characterized by a transition probability, p(sn+1|sn,vn), from state sn to state sn+1 for a given propulsion velocity, vn, and control update time, tC. Here the subscripts n and n+1 denote that the state and speed are evaluated at times tn and tn+1 = tn + tC. For navigating quasi-2D free space, Fig. 2 shows a plot of p(sn+1|sn, vn) constructed from BD simulations for a starting state of sn = (x, y, φ) = (0, 0, 0), a control update time, tC = 1s, and two velocity states v=0um/s and v=4.5um/s (i.e., light intensity “off” and “on”). In the absence of propulsion, the transition probability has a Gaussian distribution in the x, y, and φ coordinates with the expected mean (at the origin) and variances for 2D translational (4DttC) and rotational (2DrtC) diffusion.39 For a propulsion of v=4.5um/s, the transition probability is a distorted Gaussian in the particle coordinate system along the propulsion direction (a “Banana” distribution45) with the mode at (x, y, φ)=(4.5, 0, 0). The angular displacement distribution (Fig 2C) is unaffected by propulsion and is based solely on rotational diffusion (i.e., Gaussian with zero mean and variance of 2DrtC ≈ 60 degrees). The relatively large


Figure 3 | Optimal navigation of self-propelled colloids in free space. (A) Simulated optimally controlled trajectory for a 200s period with starting state (x, y, φ) = (30, 30, 0) and target position (x, y) = (0, 0) (★) (animated in Movie S1). The blue shaded rod indicates self-propulsion in the ON state (v=4.5um/s) and red indicates the OFF state corresponding to Brownian diffusion (v=0um/s). (B) Visualization of the optimal control policy calculated using Eq. (14) as a function of states in (x, y, φ); states are discretized into a Cartesian grid in particle position (x, y) and octants (45 degree increments starting from 0) in the orientation of the particle long axis relative to the target. Blue indicates the self-propelled ON state (v=4.5um/s) and red indicates the OFF state corresponding to Brownian diffusion (v=0um/s). (C) First passage time distributions, p(τ), from simulation (points) for controlled propulsion (black), uncontrolled diffusion (red), and uncontrolled propulsion (blue). First passage time distributions from models (lines) for controlled propulsion (Eq. (17)) and for uncontrolled diffusion and propulsion (asymptotic limit of 1/(t ln²(t))). (D) Theoretical probability evolution from Eq. (16) under optimal control as a function of time.
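The policy in Fig. 3B is computed from the Markov chain model using Eq. (14), which is given in Methods and is not reproduced here. As a rough illustration of the underlying idea only, the sketch below applies standard value iteration to a deliberately simplified, hypothetical toy problem: a particle on a 1D lattice whose heading flips at random (standing in for uncontrolled rotation) and whose only control is turning propulsion ON or OFF, with a cost of one time step per update. The lattice, the flip probability, and all names are assumptions introduced for this example.

```python
ON, OFF = 1, 0


def solve_min_time_policy(n_pos=10, q_flip=0.3, tol=1e-10, max_iter=100000):
    """Value iteration for a toy minimum-time MDP (hypothetical, not the paper's model).

    State: (i, h) with position i = 0..n_pos on a line and heading h = -1 or +1.
    Target: i = 0. Action ON moves the particle one cell along its heading;
    OFF leaves it in place. After every step the heading flips with probability
    q_flip regardless of the action (uncontrolled "rotation"). Each step costs 1.
    """
    states = [(i, h) for i in range(n_pos + 1) for h in (-1, 1)]
    value = {s: 0.0 for s in states}

    def q_value(s, action, val):
        i, h = s
        j = min(n_pos, max(0, i + h)) if action == ON else i
        # Expected cost-to-go: one time step plus the value of the next state.
        return 1.0 + (1.0 - q_flip) * val[(j, h)] + q_flip * val[(j, -h)]

    for _ in range(max_iter):
        new_value = {}
        for s in states:
            if s[0] == 0:
                new_value[s] = 0.0  # target reached, no further cost
            else:
                new_value[s] = min(q_value(s, OFF, value), q_value(s, ON, value))
        change = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if change < tol:
            break

    policy = {}
    for s in states:
        if s[0] == 0:
            continue
        # Prefer OFF on ties so propulsion is used only when it strictly helps.
        policy[s] = ON if q_value(s, ON, value) < q_value(s, OFF, value) - 1e-9 else OFF
    return value, policy
```

In this toy problem the computed policy turns propulsion ON only when the heading points towards the target, which mirrors the free-space heuristic discussed in the text; it is not a substitute for the full (x, y, φ) calculation.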

angular displacement variance produces the large spread in the position probability (Fig 2B) due to coupling. The free-space transition probability (Fig. 2) can be used in Eq. (14) to compute the optimal control policy to navigate a self-propelled Brownian rod between two points in free space. A representative trajectory (Fig. 3A, Movie S1) for controlled navigation between initial and final coordinates demonstrates how propulsion is actuated based on the rod’s position and orientation relative to the target. The optimal policy specifies propulsion vs. eight discretely partitioned rod states, which are plotted in a coordinate frame referenced to the target state (Fig. 3B, zero angle indicates the rod pointing directly towards the target). The optimal policy, π*(x, y, φ), is compactly expressed via projection of the target-rod distance vector onto the rod long axis, dn (which effectively accounts for orientation via compact notation), as,

$$
\pi^*(d_n) =
\begin{cases}
\text{ON}\ (v = 4.5\ \mu\text{m/s}), & d_n > 2.3\ \mu\text{m} \\
\text{OFF}\ (v = 0\ \mu\text{m/s}), & d_n \le 2.3\ \mu\text{m}
\end{cases}
\tag{2}
$$

which shows that only the orientation and distance of the rod relative to the target are important. This policy could be non-dimensionalized to scale for other v and tC, but dimensions are retained for explicit connection to the example in Fig. 1. Practically, the optimal control policy adjusts


propulsion speed to minimize distance between the rod position and the target at each control update time step. The intuitive picture underlying the control policy is that when the target is in front of the rod and relatively far away, propulsion is turned ON to reduce distance to the target; if the target is either behind the rod or nearby (i.e., dn ≲ 0.5vtC ≈ LR), propulsion is turned OFF to avoid increasing distance to the target or overshooting. As already previewed, this policy has some similarities to that employed by Maxwell’s demon; a single action is taken when the desired system state appears via stochastic thermal fluctuations, and the action is based on information to drive an apparent entropy-decreasing process. For free-space navigation, the optimal policy effectively reduces to a simple heuristic: actuate propulsion when the particle is pointing towards the target. Such a simple control scheme is not obviously expected to be the case for particles interacting with obstacles/features within a maze.
To evaluate performance and utility of the optimal free-space control policy (Eq. (2)), the time for rods to traverse between initial and target positions, or the first-passage time, τ, is characterized for several cases. The target in this and all cases is defined as Starget = [(x,y): |x - xtarget| + |y - ytarget| < 0.5LR]. Trajectories (~10^3) are measured for three different cases: (1) no propulsion (i.e., rod diffusion), (2) propulsion engaged at all times, and (3) optimally controlled propulsion (Fig. 3C). For each case, first passage time histograms are reported to account for stochastic rod motion and distributed passage times. The key finding from these results is that the optimally controlled trajectories have a finite, compact distribution, and hence a finite mean first passage time of 〈τ〉 ≈ 60s.
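The free-space policy of Eq. (2) and the target definition above can be written as a few short functions. The sketch below assumes that dn is positive when the target lies ahead of the rod (the sign convention is an assumption of this example), and the function names are hypothetical; the numerical values are those quoted in the text.

```python
import math

D_SWITCH = 2.3    # um, switching distance in Eq. (2)
V_ON = 4.5        # um/s, propulsion speed in the ON state
ROD_LENGTH = 2.0  # um, L_R


def projected_distance(x, y, phi, target=(0.0, 0.0)):
    """Projection d_n of the rod-to-target vector onto the rod long axis.

    Positive when the target lies ahead of the rod; this sign convention
    is an assumption of the sketch.
    """
    dx, dy = target[0] - x, target[1] - y
    return dx * math.cos(phi) + dy * math.sin(phi)


def free_space_policy(x, y, phi, target=(0.0, 0.0)):
    """Return the propulsion speed prescribed by Eq. (2)."""
    return V_ON if projected_distance(x, y, phi, target) > D_SWITCH else 0.0


def in_target(x, y, target=(0.0, 0.0)):
    """Target set: |x - x_target| + |y - y_target| < 0.5 L_R."""
    return abs(x - target[0]) + abs(y - target[1]) < 0.5 * ROD_LENGTH
```

For the starting state in Fig. 3A, (x, y, φ) = (30, 30, 0), the rod points along +x while the target at the origin lies behind it, so this sketch returns the OFF state until rotational diffusion brings the target in front of the rod.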
In contrast, in either the absence of propulsion or for full uncontrolled propulsion, the τ distributions are heavy-tailed (i.e., tails not exponentially bounded), which results in unbounded means (i.e., 〈τ〉 → ∞). Rods experience random walks in both uncontrolled cases, where full propulsion has a higher effective translational diffusivity40 of Deff = Dt + v²/(4Dr). The long-time asymptotic limit for an unbounded first passage time distribution is expected to scale as ~1/(τ ln²(τ)),46 which agrees with simulated results for both uncontrolled cases.
The time evolution of the rod’s positional probability provides a means to visualize the rod’s stochastic motion as the optimal policy controls propulsion velocity and navigation from initial to target coordinates (Fig. 3D). Results for positional probability at different time points are obtained by evolving the Markov chain model via Eq. (16) (see Methods). The probability of the controlled trajectories evolves from an initial delta function at the starting position and then at longer times stretches along the shortest geometric path towards the target. The probability distribution first reaches the target by ~20s, and then concentrates at the target until it reaches a compact distribution that becomes increasingly centered on the target at >60s. Because the controlled process remains stochastic, the probability evolution is consistent with both the τ distribution (Fig. 3C; ~10s> 1, the self-propelled particle essentially becomes either an unpropelled Brownian random walker or a constant propulsion random walker, which both lead to an unbounded 〈τ〉. Finally, the optimal Pe at fixed τC gradually increases vs. increasing obstacle size; this can be understood based on the fact that smaller obstacles (η ) benefit from finer control associated with less propulsion (smaller Pe).
However, it is important to note in all the above cases that continuously decreasing τC while increasing Pe does not completely eliminate positioning error or continuously improve navigation performance. This is an intrinsic error inherent to the control system that cannot be eliminated by choice of τC and Pe. The intrinsic error can be defined as σin = min σ(τC, Pe), which indicates σin is also a function of Dr and Dt (based on Eqs. (3), (4)). To quantify σin at different combinations of Dr and Dt, values of Pe and τC were varied to find the minimum position error in free space, which can be plotted vs. Dr and Dt (Fig. 7D). The error is observed to increase as σin ≈ (Dt/Dr)^0.5, which appears as diagonal contour lines on a log-log plot. This indicates that even after optimal values of τC and Pe are determined for a given set of Dr and Dt, finite error remains and depends on the relative ratio of the diffusivities as well as their absolute values. This intrinsic positioning error reflects the uncontrollable elements even under feedback control (i.e., rotation is uncontrollable in the current problem), which provides insight on the fundamental limitations of controlling the position or navigation of self-propelled colloids. As such, any system that requires positioning error below this limit will not be efficiently controllable.
In summary, under optimal control, positioning accuracy (Fig. 7A) and navigation performance (Fig. 7C) share similar dependencies on control update time and propulsion speed. Control update time and propulsion speed need to be chosen in a cooperative manner to maximize positioning accuracy and minimize navigation 〈τ〉. In addition, there are limits associated with both positioning accuracy and navigation performance due to intrinsic uncontrollable elements. The close connections between positioning accuracy and navigation performance can be understood via the optimal control principles revealed in the example cases (Figs. 3-5).
The ability to optimally navigate between start and end points is achieved by accurately and rapidly controlling colloid positions to sub-targets along the globally shortest path.
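The diagonal contour lines noted for the intrinsic positioning error follow directly from the reported scaling. As a short worked step, assume the scaling holds with an order-one prefactor c (a symbol introduced here only for illustration); taking logarithms gives

$$
\sigma_{in} \approx c\left(\frac{D_t}{D_r}\right)^{1/2}
\quad\Rightarrow\quad
\log D_r = \log D_t - 2\log\left(\frac{\sigma_{in}}{c}\right)
$$

so each contour of constant σin is a line of unit slope in the (log Dt, log Dr) plane, and increasing the ratio Dt/Dr by a factor of 100 increases σin by a factor of 10.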

CONCLUSIONS & OUTLOOK

We reported a general rigorous framework to determine optimal control policies for


navigation of self-propelled colloids in free space and mazes. In free space, a strategy is employed where real-time sensing of colloid position and orientation is used to determine the propulsion rate; in this case, the control policy effectively reduces to one where propulsion is turned on only when the rod is pointing towards the target. In mazes, obstacles obstruct paths between self-propelled colloids and targets, which alters the resulting control policy. Globally, the strategy identifies the shortest path between start and end points, and locally, propulsion is controlled to often follow paths avoiding collisions but in some cases to promote collisions with maze features. Propulsion into boundaries appears to redirect particles to positions and orientations relative to maze features that favor a higher probability of reaching targets faster. While controlling trajectories to avoid collisions is consistent with intuition, promoting particle-boundary collisions emerges in a manner that is not obvious a priori and is not captured with a simple heuristic. The optimal control policy enables orders-of-magnitude faster first passage times between points compared to either Brownian motion or uncontrolled propulsion. The optimal control performance scales linearly with the traveled path length in both free space and mazes, which contrasts with a nearly cubic dependence for uncontrolled random walks. Control parameters are generalized in terms of non-dimensional control update times, propulsion rates, and obstacle dimensions based on rod diffusion rates and dimensions. Findings show that using small update times coupled with high propulsion rates provides optimal control over the closely connected tasks of minimizing steady-state colloid positioning error and minimizing colloid navigation times between points. Preliminary studies showed continuous propulsion settings and finer angular discretization did not alter policies compared to the ones reported in this paper.
Our findings are general and can easily be adapted to different actuators and navigation problems using the MDP framework. In future work, the goal is to implement the reported control method using actuation mechanisms cited in this paper as well as approaches we have previously reported (where practical implementation issues are addressed related to imaging, noise, resolution, etc.).43, 49-51 Extensions of these concepts could be applied to ensembles of selfpropelled colloids using distributed actuators (e.g., electrode arrays,52 liquid crystal displays, lasers) to perform additional tasks such are cargo capture and transport in free-space and mazes. In such approaches, dynamic target positions and actuation of each particle could be controlled by navigating each particle’s trajectory relative to its neighbors (e.g., parallel maze problems). METHODS Brownian dynamics. To model the forces and torques acting on rods due to electrostatic interactions with obstacles, the rod is modeled as a chain of touching spherical beads. Forces acting on the beads that composing the rod will then be transformed to the equivalent forces and torques acting the mass center of the rod. For a rod with position and orientation characterized by (r, φ)= (x, y, φ), the positions of its m spherical beads (where m= LR/2aR) of radius aR are given as,

$$\mathbf{r}_i^s = \mathbf{r} + 2a_R\,\mathbf{n}_1\left(i - \frac{1+m}{2}\right), \quad i = 1, 2, \ldots, m \tag{6}$$

where $\mathbf{n}_1$ is the unit direction vector of the rod. Obstacles are also represented by a collection of spherical beads so that the interaction between the rod and the obstacles can be easily calculated. Denote $R$ as the set of indices for beads composing the rod and $O$ the set of indices

for beads composing the obstacles. The force on the rod due to rod-obstacle interactions is the sum of the interaction forces between the rod beads and the obstacle beads, given as,

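$$\mathbf{F} = \sum_{i \in R} \mathbf{F}_i^s, \qquad \mathbf{F}_i^s = \sum_{j \in O} \frac{\mathbf{r}_{ij}^s}{r_{ij}^s}\, B_{pp}\,\kappa \exp\left[-\kappa\left(r_{ij}^s - 2a_R\right)\right], \quad i \in R \tag{7}$$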

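where $\mathbf{F}$ is the force on the rod, $\mathbf{F}_i^s$ is the force on spherical bead $i$, $B_{pp}$ is the pre-factor for electrostatic interactions, $\kappa^{-1}$ is the Debye length, $\mathbf{r}_{ij}^s = \mathbf{r}_j^s - \mathbf{r}_i^s$ is the vector pointing from bead $i$ to bead $j$, and $r_{ij}^s$ is the magnitude of $\mathbf{r}_{ij}^s$. The torque on the rod can be related to forces on beads as,

$$\boldsymbol{\Gamma} = \sum_{j \in R} \left(\mathbf{r}_j^s - \mathbf{r}\right) \times \mathbf{F}_j^s \tag{8}$$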
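The bead construction in Eq. (6) and the reduction of bead forces to a net force and torque in Eqs. (7) and (8) can be illustrated with a minimal Python sketch. The function names `bead_positions` and `net_force_and_torque` are hypothetical, the bead forces are taken as a given input rather than computed from the electrostatic expression, and the unit load in the example is an arbitrary illustration rather than a value from this work.

```python
import numpy as np

def bead_positions(r, phi, a_R, m):
    """Eq. (6): centers of the m touching beads along the rod axis."""
    n1 = np.array([np.cos(phi), np.sin(phi)])    # unit vector along the rod
    i = np.arange(1, m + 1)                      # bead indices 1..m
    offsets = 2.0 * a_R * (i - (1 + m) / 2.0)    # signed distance from rod center
    return np.asarray(r, dtype=float) + offsets[:, None] * n1

def net_force_and_torque(r, beads, bead_forces):
    """Sum bead forces (Eq. 7, first part) and their moments about r (Eq. 8)."""
    beads = np.asarray(beads, dtype=float)
    bead_forces = np.asarray(bead_forces, dtype=float)
    arms = beads - np.asarray(r, dtype=float)    # lever arms r_j^s - r
    force = bead_forces.sum(axis=0)
    # In 2D the cross product has only a z-component: x * F_y - y * F_x.
    torque = np.sum(arms[:, 0] * bead_forces[:, 1]
                    - arms[:, 1] * bead_forces[:, 0])
    return force, torque

# Table 1 geometry: a_R = 200 nm, m = 5, rod centered at the origin along x.
a_R, m = 200e-9, 5
beads = bead_positions([0.0, 0.0], 0.0, a_R, m)

# Hypothetical load: a unit force in +y acting on the leading bead only.
forces = np.zeros((m, 2))
forces[-1] = [0.0, 1.0]
F, torque = net_force_and_torque([0.0, 0.0], beads, forces)
```

Because the beads are placed symmetrically about the rod center, a force applied to an end bead produces a torque with a lever arm of $(m-1)a_R$, which is 800 nm for the Table 1 geometry.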

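The forces and torques in Eq. (7) and (8) can then be plugged into Eq. (1) for simulation. Random Brownian translational and rotational displacements, $\Delta\mathbf{r}^B$ and $\Delta\phi^B$, obey the relationships,

$$\begin{aligned} \left\langle \Delta\mathbf{r}^B \right\rangle &= \mathbf{0}, & \left\langle \Delta\mathbf{r}^B \left(\Delta\mathbf{r}^B\right)' \right\rangle &= 2\mathbf{D}_t \Delta t \\ \left\langle \Delta\phi^B \right\rangle &= 0, & \left\langle \Delta\phi^B \Delta\phi^B \right\rangle &= 2 D_r \Delta t \end{aligned} \tag{9}$$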

where $D_r$ is the rotational diffusivity, and $\mathbf{D}_t$ is the translational diffusivity tensor obtained as $\mathbf{D}_t = \mathbf{n}\mathbf{n}D_{t,\parallel} + (\mathbf{I} - \mathbf{n}\mathbf{n})D_{t,\perp}$, where $\mathbf{I}$ is the identity tensor, $\mathbf{n} = (\cos\phi, \sin\phi)$ is the orientation vector, and $D_{t,\parallel}$ and $D_{t,\perp}$ are the translational diffusivity coefficients parallel and perpendicular to the rod long axis.35

BD simulations are used to construct transition probabilities and to test the efficacy of optimal control policies. When constructing first passage time distributions, p(τ), for different control strategies, ~1000 simulated trajectories were run from the specified initial state until each trajectory reached the target (within 1 μm). The histograms for p(τ) are obtained on a linear scale. The integration time step in all cases is 0.1 ms, and all other simulation parameters are reported in Table 1.

Table 1. Parameters used in BD simulations of self-propelled colloidal rods: (a) cylindrical rod cross-sectional radius, (b) electrostatic potential pre-factor, (c) Debye screening length, (d) rod aspect ratio, (e) rod length, (f) translational diffusivity along rod long axis, (g) translational diffusivity perpendicular to rod long axis, (h) rotational diffusivity about rod long axis, (i) self-propulsion speed.

| note | parameter | equation | value |
|---|---|---|---|
| a | a_R (nm) | (6) | 200 |
| b | B_pp (a/kT) | (7) | 2.2974 |
| c | κ^-1 (nm) | (7) | 30 |
| d | m (L_R/2a_R) | (6) | 5 |
| e | L_R (nm) | | 2000 |
| f | D_t,∥ (m^2/s) | (1),(9) | 5.13e-13 |
| g | D_t,⊥ (m^2/s) | (1),(9) | 4.02e-13 |
| h | D_r (rad^2/s) | (1),(9) | 0.55 |
| i | v (m/s) | (1) | 4.5e-6 |
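As an illustration of Eq. (9) with the Table 1 parameters, the following Python sketch draws random Brownian displacements whose covariances match $2\mathbf{D}_t\Delta t$ and $2D_r\Delta t$. Gaussian statistics are an assumption of the sketch (Eq. (9) fixes only the first two moments), and the function names are hypothetical. Sampling is done in the rod frame and rotated into the lab frame, which reproduces the anisotropic tensor $\mathbf{D}_t$ in 2D.

```python
import numpy as np

# Table 1 values in SI units.
D_PAR = 5.13e-13    # translational diffusivity along the rod axis (m^2/s)
D_PERP = 4.02e-13   # translational diffusivity perpendicular to the axis (m^2/s)
D_ROT = 0.55        # rotational diffusivity (rad^2/s)
DT = 1e-4           # integration time step, 0.1 ms

def diffusivity_tensor(phi):
    """D_t = nn D_par + (I - nn) D_perp for orientation angle phi."""
    n = np.array([np.cos(phi), np.sin(phi)])
    nn = np.outer(n, n)
    return nn * D_PAR + (np.eye(2) - nn) * D_PERP

def brownian_displacements(phi, rng, size=1):
    """Draw (dr, dphi) with zero mean and the covariances of Eq. (9).

    Assumption: displacements are Gaussian.
    """
    n = np.array([np.cos(phi), np.sin(phi)])         # along the rod
    n_perp = np.array([-np.sin(phi), np.cos(phi)])   # perpendicular to the rod
    xi = rng.standard_normal((size, 2))
    dr = (np.sqrt(2.0 * D_PAR * DT) * xi[:, :1] * n
          + np.sqrt(2.0 * D_PERP * DT) * xi[:, 1:] * n_perp)
    dphi = np.sqrt(2.0 * D_ROT * DT) * rng.standard_normal(size)
    return dr, dphi

rng = np.random.default_rng(0)
dr, dphi = brownian_displacements(np.pi / 6, rng, size=200_000)
sample_cov = dr.T @ dr / len(dr)   # approaches 2 * D_t * DT for many samples
```

With these parameters the root-mean-square translational step per 0.1 ms is on the order of 10 nm, well below the bead radius.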

Transition Probability & Markov chain model. A discrete-time Markov chain for the rod's dynamics under different propulsion velocities is fully characterized by the transition probability, $p(s_{n+1} \mid s_n, v_n)$, where $s_n$ and $s_{n+1}$ are the states at times $t_n$ and $t_{n+1} = t_n + t_C$, and $v_n$ is the self-propulsion speed between times $t_n$ and $t_{n+1}$. The Markov chain model and the transition probabilities are the critical inputs for designing an optimal control policy using an MDP. The transition probabilities can be obtained directly by: (1) running multiple short BD simulations starting at every state in $S$, and then (2) collecting statistics for the resulting states after a

discrete time step. However, such an approach is time-consuming and does not scale easily to larger system sizes. To overcome this issue, we devised an approximate but numerically accurate (a posteriori) approach that separates state transitions into two general cases: (1) rods moving in free space (i.e., no rod-obstacle interaction), and (2) rods moving near obstacles. To implement this approach, free-space transition probabilities are constructed by translating and rotating about a single initial state, since $p(s_{n+1} \mid s_n, v_n)$ depends only on differences between consecutive states (i.e., $x_n - x_{n-1}$, $y_n - y_{n-1}$, $\phi_n - \phi_{n-1}$).

The state space is constructed by discretizing the 2D Cartesian space into grids with resolution $5a_R$ (half the length of a rod) in the x and y directions, and the angular space with resolution π/8 (45°). With this resolution, rods approximately occupy two to three 2D grid elements. The state space, $S$, consists of all configurations that do not overlap with obstacles.

Transition probabilities near obstacles differ from free-space transition probabilities due to rod-obstacle interactions, and thus depend on rod and obstacle geometries, positions, and orientations. Capturing all details of rod transition probabilities near obstacles is impractical and in many cases unnecessary; here we approximate transition probabilities involving rod-obstacle collisions by setting the transition probability to zero for inadmissible overlapping states (i.e., forbidden positions and orientations, since rods cannot penetrate obstacles) and renormalizing the remaining probability over admissible states. This simplifies the transition probability near boundaries while retaining minimal but sufficient detail of rod-obstacle interactions.
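The two-case construction described above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this work: orientation is omitted so that states are 2D grid cells only, the free-space kernel values are hypothetical, cells outside the grid are treated as inadmissible (an assumption), and the function names are illustrative.

```python
import numpy as np

def renormalize_near_obstacles(p_free, admissible):
    """Zero the probability of inadmissible next states and renormalize.

    p_free     : free-space probabilities over candidate next states (sums to 1)
    admissible : boolean mask, False where the rod would overlap an obstacle
    Returns the corrected distribution and the probability mass removed.
    """
    p_free = np.asarray(p_free, dtype=float)
    admissible = np.asarray(admissible, dtype=bool)
    p = np.where(admissible, p_free, 0.0)
    kept = p.sum()
    if kept == 0.0:
        raise ValueError("no admissible next state carries probability")
    return p / kept, 1.0 - kept

def next_state_distribution(kernel, cell, free):
    """Shift a free-space displacement kernel to `cell`, then correct it.

    kernel : (2k+1, 2k+1) probabilities over grid displacements (dx, dy)
    cell   : (ix, iy) current grid indices
    free   : boolean grid, True where the rod does not overlap an obstacle
    """
    k = kernel.shape[0] // 2
    targets, probs, ok = [], [], []
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            ix, iy = cell[0] + dx, cell[1] + dy
            inside = 0 <= ix < free.shape[0] and 0 <= iy < free.shape[1]
            targets.append((ix, iy))
            probs.append(kernel[dx + k, dy + k])
            ok.append(inside and bool(free[ix, iy]))
    p, removed = renormalize_near_obstacles(probs, ok)
    return targets, p, removed

# Hypothetical free-space kernel over one-cell displacements (sums to 1).
kernel = np.array([[0.05, 0.10, 0.05],
                   [0.10, 0.40, 0.10],
                   [0.05, 0.10, 0.05]])
free = np.ones((5, 5), dtype=bool)
free[3, :] = False                    # a wall occupying the row ix = 3
targets, p, removed = next_state_distribution(kernel, (2, 2), free)
```

Because the kernel depends only on displacements, the same array is reused for every cell, and only the admissibility mask changes from cell to cell. In this example the wall removes the three displacements with dx = +1, and the remaining probabilities are rescaled to sum to one.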
When employing this estimate of the transition probability in the control scheme in this work, it was found that simply disallowing propulsion when the probability of overlapping states exceeds a threshold (~70%) was sufficient for control purposes. It appears unnecessary for the problems investigated in this work to more accurately quantify the transition probability corresponding to rod-obstacle interactions. However, care should be taken when extending such a simplification to other geometries where overlapping states dominate (e.g., more confined geometries). Despite using a convenient approximation for the transition probability to minimize computational expense (