Commentary/Forum
Chemical Game Theory Darrell Velegol, Paul Suhey, John Connolly, Natalie Morrissey, and Laura Cook Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b03835 • Publication Date (Web): 14 Sep 2018 Downloaded from http://pubs.acs.org on September 18, 2018
Industrial & Engineering Chemistry Research is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Industrial & Engineering Chemistry Research
Chemical Game Theory Darrell Velegol*, Paul Suhey, John Connolly, Natalie Morrissey, Laura Cook Penn State University, Department of Chemical Engineering, University Park PA 16802 USA * To whom correspondence should be addressed (
[email protected]) first submitted 2017 Dec 20, revision 1 on 2018 Aug 04, revision 2 on 2018 Sep 11
TOC image: Chemical Game Theory (CGT) solves strategic game theory problems (top) using principles from Chemistry and Chemical Engineering.
Abstract The purpose of this paper is to describe a new framework for representing and solving strategic game theory problems. This framework, called “Chemical Game Theory” (CGT), uses well-known, rigorous principles from Chemistry and Chemical Engineering to solve strategic decision problems that could be analyzed using Traditional Game Theory (TGT). In strategic decisions, players each can choose from among two or more alternative possibilities, and the outcome depends upon the collective choices from all players. In this article we will analyze some of the premises of TGT as compared with CGT. In CGT, the players’ choices are treated as metaphorical molecules, and outcomes are calculated according to chemical reaction methods. The important concept of entropic choices is introduced, and pre-bias effects are included naturally as initial concentrations of reactants. CGT is not a generalization of TGT; rather, it represents contested decision problems differently, and gives different solutions. In this article we use the formalism of Chemistry to provide a “knowlecular approach” to analyzing contested decisions. This approach has a rich capacity to represent decision-making scenarios and serve as a decision-making algorithm for contested decisions, where leadership power plays an important role. Keywords: game theory, strategic decision, Prisoner’s Dilemma, pre-bias, entropic choices, reputation, decision reactions, perception function, knowlecular approach.
ACS Paragon Plus Environment
Introduction The purpose of this paper is to describe a new framework for representing and solving strategic game theory problems. We call this framework “Chemical Game Theory” (CGT). CGT uses well-known, rigorous principles from Chemistry and Chemical Engineering in the area of strategic decision-making. In strategic decisions, players each can choose from among two or more alternative possibilities, and the outcome depends upon the collective choices from all players. Over the past decades, strategic decisions have been analyzed with “game theory”, which we will here call “Traditional Game Theory” (TGT).1,2 In this article we will analyze some of the premises of TGT, as compared with CGT. In CGT, the players’ choices are treated as metaphorical molecules, and outcomes are calculated according to chemical reaction methods. Whereas TGT has been described as “normative”,3 directing what “rational” players “should” do, CGT, like chemistry, is a predictive theory. In this article we use the formalism of Chemistry to provide a “knowlecular approach” to analyzing contested decisions. This approach has a rich capacity to represent decision-making scenarios and serve as a decision-making algorithm for contested decisions, where leadership power plays an important role. Strategic decisions or games are not optimization problems in the usual sense. For example, to find the optimum of a function y = 3x^2 - 12x + 4, we could use calculus to find a minimum at {x = 2, y = -8}. If we have constraints for our optimization, then we could use linear programming or similar methods.4 In both cases, there is one clear objective function.5 Strategic games are more complex. They are frequently competitive, contested, or political. Rock-paper-scissors is an example of a strategic game, since one person’s payoff or utility depends on the choices of both the other player and self, and since each player has a different idea of an “optimum”.
After all, each wants a win for self and a loss for an opponent. Within the field of Chemical Engineering systems engineering, numerous papers have incorporated game theory. An early example was in the field of robust control synthesis.6 Other examples have studied the Stackelberg competition (leader-follower) and the related bilevel optimization problem.7 Some of this literature incorporates TGT in the analysis, thus using Nash equilibria.8 Thus, these studies do not incorporate the CGT ideas of entropic decisions or initial pre-bias that will be described in this article. This article consists of five parts: 1) Traditional (TGT) representation and solution of strategic games, 2) observations on TGT solutions, 3) CGT representation of the games, 4) CGT solutions of the games, 5) conclusions and possibilities. The approach taken in this article is to show how Traditional Game Theory represents and solves simple decision problems, and then to show how Chemical Game Theory represents and solves the same games, revealing important differences. CGT is not a generalization of TGT; rather, it represents contested decision problems differently — from the knowlecular level to the systems level, as in usual Chemical Engineering problems — and gives different solutions. In this article we focus primarily on the Prisoner’s Dilemma (PD) game, one of the most well-known and well-studied games.
Traditional (TGT) representation and solution of strategic games First we describe one version of the Prisoner’s Dilemma (PD) game. Two Players A and B intend to rob a store, but are caught by police immediately prior to the act. A and B are isolated into different jail cells without communication. The district attorney (the Decider D) says to A that if A tells on B (as TGT would say, “defects”), and B stays quiet (“cooperates”), that B will receive 3 years of prison, while A will receive none, and vice versa. If both tell, their confessions are less valuable, so each receives 2 years of prison. And even if both stay quiet, each will receive 1 year of prison due to unrelated charges created by the Decider. How should A and B “play this game”? Should each player tell, or stay quiet? One common way to represent this game is in “normal form” (Table 1). The solution to this game is usually found by inspection. Let’s start by assuming Player B chooses b1 (quiet). As given in the table on the left side (bold italics) of each numerical block, A receives 1 year by choosing a1, and 0 years by choosing a2; therefore, if A is “rational”,9 A would choose a2 (tell). What if B chooses b2? In this case, A receives 3 years by choosing a1, and 2 years by choosing a2, and so again A would choose a2. That is, either way, A receives a better outcome by choosing a2, and thus A would choose to tell if A is rational. By the same reasoning, B would choose b2. As a result, the solution to the game is that both players choose “tell” 100% of the time if they are “rational” — collectively giving them the worst possible outcome. That is the dilemma: By apparently choosing the best option for themselves individually, they get the worst collectively. The PD game has been used to study choices in climate change,10 fisheries,11 and many other tragedy of the commons12 scenarios. Table 1. Prisoner’s Dilemma (PD) game in normal form. 
Player A can choose possibility a1 or a2, while B can choose possibility b1 or b2. In each of the four payoff blocks, the value on the left (bold italics) belongs to A, and the value on the right belongs to B. For instance, if A tells and B remains quiet (i.e., a2, b1), then A receives 0 years of prison, while B receives 3.
                 a1 = quiet     a2 = tell
b1 = quiet       +1, +1         0, +3
b2 = tell        +3, 0          +2, +2
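The inspection argument for Table 1 can also be written as a brute-force best-response check. The Python sketch below is an illustration only and is not taken from the article: the function name `pure_nash_equilibria` and the 0-based indexing (0 = quiet, 1 = tell) are assumptions made here, while the pain values are those of Table 1.

```python
from itertools import product

# Pains (years of prison) from Table 1, indexed [i][j] for A playing a(i+1)
# and B playing b(j+1); index 0 = quiet, 1 = tell. Lower pain is better.
PAIN_A = [[1, 3], [0, 2]]
PAIN_B = [[1, 0], [3, 2]]


def pure_nash_equilibria(pain_a, pain_b):
    """Return every cell (i, j) in which neither player can lower their own
    pain by switching choice unilaterally."""
    n_a, n_b = len(pain_a), len(pain_a[0])
    equilibria = []
    for i, j in product(range(n_a), range(n_b)):
        # A compares row i against every other row, holding B's column j fixed.
        a_is_best = all(pain_a[i][j] <= pain_a[k][j] for k in range(n_a))
        # B compares column j against every other column, holding A's row i fixed.
        b_is_best = all(pain_b[i][j] <= pain_b[i][k] for k in range(n_b))
        if a_is_best and b_is_best:
            equilibria.append((i, j))
    return equilibria


LABELS = ["quiet", "tell"]
for i, j in pure_nash_equilibria(PAIN_A, PAIN_B):
    print(f"A: {LABELS[i]}, B: {LABELS[j]}")
```

For the Table 1 values the only cell that survives the check is {tell, tell}, in agreement with the reasoning by inspection.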
In the PD game, a “rational” player’s “strategy” is to tell 100% of the time. This is called a “pure strategy”. We will define fractions (f) to represent the strategies. For instance, in the PD game described above in Table 1, TGT finds that Player A will play a1 with fa1 = 0 and fa2 = 1. More generally, the strategy of both players is expressed in terms of their probabilities for playing each of their options. The children’s game Rock-Paper-Scissors is an example of a “mixed strategy”. In this game, a rational player has a strategy to play Rock one third of the time, and the same for Paper or Scissors. That is, faR = faP = faS = ⅓. In the simple PD game, each player has only two possibilities, stay quiet or tell. Thus, fa1 + fa2 = 1 and fb1 + fb2 = 1. For the PD game, we have shown that according to TGT, the strategies of rational players are {fa1 = 0, fb1 = 0}, or equivalently, {fa2 = 1, fb2 = 1}. This strategy is called a “Nash equilibrium”,13 since it has the property that neither player can attain a better
outcome by changing strategy unilaterally. That is, if B continues to play with fb2 = 1, and A chooses to play a2 with a 98% probability (i.e., fa2 = 0.98), then A has changed unilaterally, and increases the probability of getting a worse outcome. A critical proof by Nash showed that in any finite game (i.e., finite number of players, each with a finite number of choices), there is at least one Nash equilibrium.14 Having a Nash equilibrium does not mean that the decision is “optimal”. In the PD game, the Nash equilibrium is not optimal by either player’s standard. In fact, for the PD game the TGT-predicted decision is not Pareto optimal, since both players can improve their situation by staying quiet.15 We will look at one other type of game, involving “mixed strategy” solutions. In the Battle of the Students (BoS) game, A and B are figuring out how to spend their evening. Player A wants to go to a volleyball game, while B wants to study together. But mainly, they want to be together. Here and for the rest of this article, we will frame the game matrix in terms of “pains” (h, dimensionless), so a negative amount of pain is favorable for the players (Table 2). As described later, we do this to maintain an analog with Chemistry, that having a negative Gibbs free energy change makes a given chemical process spontaneous. In the field of economics, similar information is conveyed with the words “payoff”, “utility”, “cost”, or “loss”. Table 2. Battle of the Students (BoS) game in normal form. The numbers are given in terms of “pain” (h, dimensionless), such that a more negative number is more desirable, and a more positive number is less desirable.
                     a1 = volleyball     a2 = studying
b1 = volleyball      -3, -2              0, 0
b2 = studying        0, 0                -2, -3
As given in Table 2, if both attend the volleyball game, both are happy (large negative values), although A is a bit happier. We see a similar result if they both choose to study together. If both attend different events, neither is pleased. How will the players choose? Let’s look at the {a1, b1} block. If A chooses to change unilaterally from a1 to a2, then A increases pain from hA11 = -3 to hA21 = 0, and so would not do so. If B chooses to change unilaterally from b1 to b2, then B increases pain from hB11 = -2 to hB12 = 0, and so would not do so. Thus, since neither player can improve their situation by changing their strategy of play unilaterally, then {fa1 = 1, fb1 = 1} is a Nash equilibrium. By similar reasoning, there is another Nash equilibrium at {fa1 = 0, fb1 = 0} (i.e., study-study). These are both pure strategies (i.e., 100%). In fact, there turns out to be a third Nash equilibrium for the BoS game. This strategy is a mixed strategy, so that 0 < fa1 < 1 and 0 < fb1 < 1. We can calculate this third strategy. First we calculate the total change in pain that a player receives by playing the game, for both Player A (∆hA) and Player B (∆hB). The change is from before they play the game, until after the game. These changes are given by
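\[
\begin{aligned}
\Delta h_A &= f_{a1} f_{b1} h_{A11} + f_{a1} f_{b2} h_{A12} + f_{a2} f_{b1} h_{A21} + f_{a2} f_{b2} h_{A22} \\
\Delta h_B &= f_{a1} f_{b1} h_{B11} + f_{a1} f_{b2} h_{B12} + f_{a2} f_{b1} h_{B21} + f_{a2} f_{b2} h_{B22}
\end{aligned}
\tag{1}
\]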
Thus, every time the game is played, some outcome must occur, and so (fa1 + fa2)(fb1 + fb2) = fa1 fb1 + fa1 fb2 + fa2 fb1 + fa2 fb2 = (1)(1) = 1. Since fa1 + fa2 = 1 and fb1 + fb2 = 1, we know that dfa1 = -dfa2 and dfb1 = -dfb2. That is, if Player A chooses a1 more, then A chooses a2 less. To find the Nash equilibrium, we want to find the fraction of the time that A plays a1, for which changing the strategy (i.e., playing a1 more, or less) does not improve A’s situation. Mathematically then, to find a mixed strategy (i.e., fa1 or fb1 not equal to 0 or 1), we require both:

\[
\frac{\partial \Delta h_A}{\partial f_{a1}} = 0
\quad \text{and} \quad
\frac{\partial \Delta h_B}{\partial f_{b1}} = 0
\tag{2}
\]

These do not give maxima or minima, since the second derivatives are zero. Rather, the two equations give a solution for a stationary point where a Nash equilibrium occurs16:

\[
f_{a1} = \frac{h_{B22} - h_{B21}}{h_{B11} - h_{B12} - h_{B21} + h_{B22}},
\qquad
f_{b1} = \frac{h_{A22} - h_{A12}}{h_{A11} - h_{A12} - h_{A21} + h_{A22}}
\tag{3}
\]

if 0 ≤ fa1 ≤ 1 and 0 ≤ fb1 ≤ 1. Thus, for the BoS game given in Table 2, we have a third Nash equilibrium at {fa1 = 0.60, fb1 = 0.40}. If the players choose a1 and b1 with these fractions, neither has an incentive to change their strategy unilaterally. It is interesting that in Eq 3, the strategy for Player A depends on only the pains of Player B, and vice versa.
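The mixed-strategy result can be checked numerically. The Python sketch below assumes the standard indifference conditions for a 2x2 game (A's fraction fa1 makes B's expected pain the same for b1 and b2, and fb1 does the same for A); the function names `mixed_nash_2x2` and `expected_pain` are hypothetical helpers introduced here, and the pains are those of Table 2.

```python
def mixed_nash_2x2(pain_a, pain_b):
    """Interior stationary point of a 2x2 game.

    pain_x[i][j] is player X's pain when A plays a(i+1) and B plays b(j+1).
    Returns (f_a1, f_b1), or None when no interior solution exists.
    """
    denom_a = pain_b[0][0] - pain_b[0][1] - pain_b[1][0] + pain_b[1][1]
    denom_b = pain_a[0][0] - pain_a[0][1] - pain_a[1][0] + pain_a[1][1]
    if denom_a == 0 or denom_b == 0:
        return None
    # A's mixing fraction depends only on B's pains, and vice versa.
    f_a1 = (pain_b[1][1] - pain_b[1][0]) / denom_a
    f_b1 = (pain_a[1][1] - pain_a[0][1]) / denom_b
    if not (0 <= f_a1 <= 1 and 0 <= f_b1 <= 1):
        return None
    return f_a1, f_b1


def expected_pain(pain, f_a1, f_b1):
    """Expected pain summed over the four outcomes, weighted by probability."""
    f_a = (f_a1, 1 - f_a1)
    f_b = (f_b1, 1 - f_b1)
    return sum(f_a[i] * f_b[j] * pain[i][j] for i in range(2) for j in range(2))


# Battle of the Students pains from Table 2 (index 0 = volleyball, 1 = studying).
BOS_A = [[-3, 0], [0, -2]]
BOS_B = [[-2, 0], [0, -3]]

f_a1, f_b1 = mixed_nash_2x2(BOS_A, BOS_B)
print(f"f_a1 = {f_a1:.2f}, f_b1 = {f_b1:.2f}")
# At the stationary point, A's expected pain does not depend on A's own fraction.
print(expected_pain(BOS_A, 0.0, f_b1), expected_pain(BOS_A, 1.0, f_b1))
```

Applying the same function to the Table 1 pains returns `None`, because the denominator vanishes; this is consistent with the PD game having only the pure {tell, tell} equilibrium.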
A few observations on TGT solutions TGT has been described as a normative theory, indicating what a “rational” player “should” do, and it defines “rational” in particular ways given by rational choice theory17. Solving decision problems using this approach gives quantitative solutions, which lead to several important observations. 1) We sometimes find multiple Nash equilibria for a given problem. This has been called the “equilibrium selection problem”. For instance, in the BoS game we have three Nash equilibria: Which one is correct? Or what is the probability for choosing each equilibrium? There is literature on the selection problem,18 as well as secondary criteria. For instance, Schelling’s focal points or Harsanyi and Selten’s risk dominance or payoff dominance are sometimes used to discern the more plausible equilibrium. However, determining which secondary criterion to apply is itself unclear. 2) Traditional games have Nash equilibria that arise for players having no pre-bias. This might be called the “blank slate precondition” or the tabula rasa lack of previous awareness. Alternatively, one might try to put the precondition information into the pain matrix and not use
it otherwise. It is unclear then how one can separate the awareness of alternatives (we might be quite familiar with some outcomes, or they might be quite common) from the preference for alternatives. Additionally, if precondition information is used to adjust the pain values, it becomes difficult to separate the effects of the outcome from the effects of the pre-bias. 3) For decisions where the pains are similar, we find what we call the “indifference problem”. Take for example the PD game given in Table 3, where payoffs differ by just δ = 0.01 instead of 1. If these pains are in terms of years of prison, the outcomes differ by about 4 days of prison time instead of 1 year. Many players might consider 2 years of prison nearly equal to 2 years and 4 days of prison, but classic TGT says the outcome is still exactly the same as the original PD game: The players should play the strategy {fa1 = 0, fb1 = 0} (i.e., tell-tell) in 100% of the instances; it is the only rational solution. Some traditional game theorists recognize that there is a value of δ where the players might become indifferent among choices, but it can be ambiguous how choices change continuously with δ. Another point is that for even a small amount of uncertainty in the pains, such that the given pains change by a small amount in the right places, the decision could flip from 100% for {tell, tell} to 100% for {quiet, quiet}. Thus, although the pains are nearly equal, the resulting game solution would flip completely. Table 3. Prisoner’s Dilemma (PD) game highlighting the indifference problem.

                 a1 = quiet       a2 = tell
b1 = quiet       1.99, 1.99       1.98, 2.01
b2 = tell        2.01, 1.98       2.00, 2.00
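The indifference problem can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not part of the article: the parameter `delta` (the spacing between neighbouring pains, 1 in Table 1 and 0.01 in Table 3) and the helper names `scaled_pd`, `strictly_dominant`, and `tgt_solution` are introduced here for convenience.

```python
def scaled_pd(delta):
    """PD pains centred on 2 years, with neighbouring outcomes delta apart.

    delta = 1 reproduces Table 1 and delta = 0.01 reproduces Table 3.
    Entries are indexed [i][j] for A playing a(i+1) and B playing b(j+1).
    """
    offsets_a = [[-1, 1], [-2, 0]]
    offsets_b = [[-1, -2], [1, 0]]
    pain_a = [[2 + delta * x for x in row] for row in offsets_a]
    pain_b = [[2 + delta * x for x in row] for row in offsets_b]
    return pain_a, pain_b


def strictly_dominant(pain_rows):
    """Index of the option whose pain is strictly lowest against every
    opposing choice, or None. pain_rows[k] lists the pains of option k."""
    for k, row in enumerate(pain_rows):
        others = [r for m, r in enumerate(pain_rows) if m != k]
        if all(x < y for other in others for x, y in zip(row, other)):
            return k
    return None


def tgt_solution(delta):
    """Dominant choice index for A and for B (0 = quiet, 1 = tell)."""
    pain_a, pain_b = scaled_pd(delta)
    pain_b_by_choice = [list(col) for col in zip(*pain_b)]  # rows = B's options
    return strictly_dominant(pain_a), strictly_dominant(pain_b_by_choice)


for delta in (1, 0.01, 1e-6):
    print(delta, tgt_solution(delta))
```

Under these assumptions the dominant choice is "tell" for both players at every positive spacing, however small, and disappears only when the spacing is exactly zero. The prediction is therefore a step function of the spacing rather than a continuous one.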
4) In comparisons with experimental data on human behavior, one finds that people are “irrational” compared with TGT. To highlight the differences between model and experiment, we discuss a few highly cited studies of the PD game from the literature. In a 1993 study, the correlation between studying Economics and cooperation in competitive games was examined.19 College students were put into groups of three and taken to separate rooms to choose a strategy for a one-shot (non-repeated) PD game against each of the other two players. The players were rewarded based on the game matrix in Table 4. Note that the values in Table 4 are dimensional, and listed as payoffs, in accordance with the original article.19 We examine these as dimensionless pains later in this article.
Table 4. A Prisoner’s Dilemma (PD) style game with monetary rewards. Here, the two choices of “cooperate” and “defect” replace the choices of “quiet” and “tell” previously used in this article. The same principle applies: When both players cooperate, they experience a more favorable outcome (receiving more money) than if they both defect. Note that this table is given in terms of reward, rather than pain.
                    a1 = cooperate     a2 = defect
b1 = cooperate      $2, $2             $3, $0
b2 = defect         $0, $3             $1, $1
When players played the game conventionally (that is, without being allowed to make promises to cooperate), the cooperation rates were 28.2% for Economics majors and 52.7% for non-Economics majors.20 These numbers are surprising if you consider only TGT. The Nash
equilibrium for the Prisoner’s Dilemma says that players should “defect” every single time. However, it is clear from these experimental results that players do not act as TGT predicts. The difference between Economics majors and non-Economics majors is also unexplained by this model. Since a rational actor is assumed in TGT, are the Economics majors “more rational” than the non-Economics majors because they picked “defect” more often? This data set is not unique. In many experimental situations, similar rates of cooperation have been found. In another study, students at the U.S. Naval Academy played a Prisoner’s Dilemma game with payoffs ranging from $2 to $20. The experimental cooperation rate was found to be 54%.21 In fact, it appears that in many of the one-shot experimental PD games in the literature, players usually cooperate about half of the time.22 These results are not predicted by traditional theory, and hypotheses have been proposed to resolve the discrepancy between the model and the data. Common hypotheses for why people cooperate when it is supposedly “irrational” to do so include reputation effects and “warm glow” effects.23,24 However, reputation effects in traditional games are applicable only in repeated game situations, and so do not explain cooperation in one-shot games; likewise, warm-glow altruism cannot explain the pattern of declining cooperation in finitely repeated games. Deviation from rationality has sometimes been attributed to the players having “made mistakes”.24 These observations all derive from the premises of the TGT model, especially the description of all players as “rational” agents. In fact, TGT has no requirement that the theory matches actual human behavior; rather, the model says what rational agents “should” do, based on a particular definition of the word “rational”, especially that the agent is goal-seeking and consistent. 
In that regard, TGT is a rigorous, mathematical subject which gives quantitative solutions to decision problems, and it has been helpful historically in certain situations. When these assumptions are taken as given, the conclusions given by TGT follow naturally and rigorously. But what if we take a different path, seeking not what self-regarding agents “should do” or “ought to do”, but rather the predictive path of what value-neutral agents “will do”? Such a path might address the four observations listed previously. Furthermore, such a descriptive model might well require a different representation of the problem. This is what we propose with Chemical Game Theory (CGT).
Chemical game theory (CGT) representation Chemical game theory (CGT) represents and solves games as Chemical Engineering problems, from the knowlecular to the systems level. We thus treat the players’ choices and their memes as “knowlecules”,25 meaning we treat them as metaphorical chemical molecules.26,27 This knowlecular approach to decision problems opens strategic decision-making to the existing apparatus of Chemistry and Chemical Engineering. Rather than determining what rational players should do, CGT aims to predict what actual players will do. CGT replaces the “rational” and “should” with analysis and design involving molecular language,
representation, and methods of solution.28 CGT is not a generalization of TGT; rather, it represents contested decision problems differently, and gives different solutions. At this time, it is still only a hypothesis that CGT will explain human decision making. Experiments will test this hypothesis over time, just as data is used to test and improve the models in Chemistry. It may seem simplistic to represent something as complex as a human choice — with all its ramifications, conditions, and subtleties — with a single symbol, like “a1” or “b2”. But in fact, this concept is drawn from ordinary Chemistry. Hydrogen, for example, is the simplest atom. It is composed of subatomic particles, and the distribution of the electron around the nucleus is described by the Schrodinger equation, giving s and p and other orbitals. Yet in chemical reactions, hydrogen is represented by a simple H. Carbon has only 6 electrons, and its description becomes enormously complicated. Yet in chemical reactions, carbon is represented by a simple C. It may seem even more radical to describe an entire person’s memome — their experiences, philosophies, knowledge, biases, fears, dreams, and more — with a single symbol, such as “A”. But this representation has some analogy to a DNA molecule holding all the information of our human genome. In CGT we use the meme concept when describing the self of each player. Much information is contained in the simple symbolic notation used here, but just as in Chemistry, we will use that simple notation to predict how the molecules (here, knowlecules) will behave. The core of Chemical Game Theory is concerned with chemical reactions between the players and their choices to form decisions (Figure 1). We call these “decision reactions”. In the schematic shown, knowlecules a2 and b1 react with Player A’s memes, represented by A knowlecules, which contain all the experiences, philosophies, knowledge, and biases of Player A. 
At the knowlecular level, reactants a2, b1, and A react to form intermediate decision A21 under the aid of a catalyst (the black item with red and blue pockets). The catalyst could act as a multi-substrate enzyme, specific and effective.
Figure 1. Schematic of the decision reaction for Player A on the molecular level. Choices a2 and b1 react with the self of A, under the aid of a catalyst, to give intermediate decision A21. This is one example of the 12 reactions shown in Figure 2. Here we show a2 + b1 + A = A21; determining the stoichiometry of decision reactions could become challenging in decision scenarios more complicated than simple games like the PD.
Now we turn to the systems level, as is done in Chemical Engineering. Though we often consider only two players in the classical PD game, here we consider three: A, B, and the Decider D. Each of these has a decision to make, based on their inputs. Figure 2 shows the thinking of Player A. For Player A to make a decision, she must consider herself, Player B, and Player D. In this article we will not consider information asymmetries between the players, nor will we include any deception, although both could be readily included. Thus we assume that each player understands the game completely and accurately. These assumptions are easy to change in CGT, but for here, that means that Player B’s block diagram will look the same as that for Player A. Figure 2 shows a block flow diagram of the process for Player A. In the first reactor, the two choices for each decision (for example, a1 and b1) combine with the player making the decision (represented as a solid species of knowlecule A) to create a fourth chemical species (A11), which is the contribution Player A makes to the {quiet, quiet} decision. We note that Reactor A is the vessel in which the decision reactions occur, while the solid species A represents the memome of Player A.

[Figure 2 diagram: three reactors, with the bracketed value shown beside each reaction]

Reactor A (feed: a1, a2, b1, b2, A)
  a1 + b1 + A = A11    [+1]
  a1 + b2 + A = A12    [+3]
  a2 + b1 + A = A21    [0]
  a2 + b2 + A = A22    [+2]

Reactor B (feed: a1, a2, b1, b2, B)
  a1 + b1 + B = B11    [+1]
  a1 + b2 + B = B12    [0]
  a2 + b1 + B = B21    [+3]
  a2 + b2 + B = B22    [+2]

Reactor D (products: D11, D12, D21, D22)
  A11 + B11 + D = D11  [-1]
  A12 + B12 + D = D12  [-1]
  A21 + B21 + D = D21  [-1]
  A22 + B22 + D = D22  [-1]

Figure 2. Block flow diagram for PD game, from A’s perspective. For this article, we will assume that B has exactly the same perspective, so that there is no information asymmetry, though asymmetries can be readily accommodated.29 After species exit each Reactor A, B, and D, a separation step removes unreacted reactants. For example, going into Reactor D, there is no a1, a2, b1, b2, A, or B. Separators are not shown for space considerations. Reactions 1-4 occur in Reactor A, 5-8 in Reactor B, and 9-12 in Reactor D. It is important to note that the configuration shown in this figure is not unique. Other block diagrams could be drawn, and we could even have a step after Reactor A in which the Aij are separated from unreacted ai and bj, then allowed to revert back to ai and bj under the action of a catalyst, and separated to get only the ai contribution.
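The twelve decision reactions of Figure 2 follow a regular pattern, so they can be generated from the pain values rather than written out by hand. The Python sketch below is one possible bookkeeping scheme, offered as an assumption rather than as the authors' implementation; the dictionary layout and the function name `decision_reactions` are hypothetical, and the pains are the bracketed values from Figure 2.

```python
from itertools import product

# Dimensionless pains per reactor, keyed by (i, j) for choices a_i and b_j.
PAIN = {
    "A": {(1, 1): 1, (1, 2): 3, (2, 1): 0, (2, 2): 2},
    "B": {(1, 1): 1, (1, 2): 0, (2, 1): 3, (2, 2): 2},
    "D": {(1, 1): -1, (1, 2): -1, (2, 1): -1, (2, 2): -1},
}


def decision_reactions(pain):
    """List the decision reactions in the order used in Figure 2:
    reactions 1-4 in Reactor A, 5-8 in Reactor B, 9-12 in Reactor D."""
    reactions = []
    for reactor in ("A", "B", "D"):
        for i, j in product((1, 2), repeat=2):
            if reactor == "D":
                # The Decider combines the intermediates from Reactors A and B.
                reactants = (f"A{i}{j}", f"B{i}{j}", "D")
            else:
                # Each player combines both choices with their own memome.
                reactants = (f"a{i}", f"b{j}", reactor)
            reactions.append({
                "reactor": reactor,
                "reactants": reactants,
                "product": f"{reactor}{i}{j}",
                "pain": pain[reactor][(i, j)],
            })
    return reactions


for number, rxn in enumerate(decision_reactions(PAIN), start=1):
    lhs = " + ".join(rxn["reactants"])
    print(f"{number:2d}. {lhs} = {rxn['product']}   [{rxn['pain']:+d}]")
```

Keeping the pains in a separate table makes it straightforward to change the Decider's bias, or to give Player B a different perception of the game, without touching the reaction list itself.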
Player A also anticipates what Player B will do, by considering B as a separate reactor (Reactor B). Player A then must consider how the Decider (Reactor D) will take the inputs from Reactors A and B to make the final decision. In some situations, Player D might actually be Player A or Player B, a binding contract under the interpretation of a judge, a mechanical or electronic mechanism, or something else that takes the inputs from A and B to make a decision. In the PD game, Player D might be the district attorney. The 12 possible reactions are shown in Figure 2 in each reactor. Like most “equilibrium chemical reactions” in actual practice, we have
catalysts for each reactor, so that the reaction proceeds in a reasonable time.30 In addition, once the products leave a certain reactor, they do not return to their original reactants since the catalyst is not present. Each reaction has an energy of reaction per mole of product (e.g., ∆hA21 for the decision A21 shown in Figure 1), as well as a Gibbs free energy change of reaction (e.g., ∆gA21 for the decision A21). In Chemistry, these energies have units of J/mol. Chemical energies are nondimensionalized by the thermal energy RT (i.e., as in PV = nRT).31 In this article we will not detail the meaning of temperature or RT (that is an important topic for future work), but rather we will let RT be like an energy scale of indifference, to relate to rational choice theory. Thus, if two choices are 0.01RT different in energy, the agent is nearly indifferent to the choices. Indeed the decision reactions that are described in this article could also be translated to a single agent choosing among alternatives. We let gXij = ∆gXij / RT and hXij = ∆hXij / RT, where X denotes the player (A, B, or D). Despite the fact that the heat of reaction (∆h) and the Gibbs free energy change of reaction (∆g) have different values in Chemistry, we will make a gross approximation in this article since we do not have data for a particular PD game: We will set gXij = hXij. Thus, in Table 1, the pain for Player A associated with {quiet, quiet} could be gA11 = hA11 = +1, and the pain for Player B associated with {tell, quiet} is gB21 = hB21 = +3. Eventually, these values must be found from experimental data, as is done in Chemical Physics. They might be obtained from revealed preferences over many decisions. Furthermore, there are better methods given by various forms of utility theory.32,33 But for here, this means that we will use hB21 = gB21 = +3 and so on. This approximation does not alter the core message of this article. The Aij chemical species from Reactor A are then combined with Bij from Reactor B, to react in Reactor D.
The Decider is important, as Player D could be completely unbiased (all ∆g are zero); push equally for at least some decision to be made (all ∆g are negative numbers and equal); or even favor one decision over another. Perhaps the Decider has a strong preference for Player A to receive a large penalty; then D12 {quiet, tell} would bring the Decider a low-pain outcome (i.e., favorable), and would accordingly have a more negative pain value. While the block diagram shown in Figure 2 is very simple, the process flow diagrams (PFDs) of actual chemical plants (and perhaps decisions with many players and steps) may contain many dozens of unit operations, forming a sophisticated network of flows and operations.34 The flow of the decision-making process could likewise proceed according to any of several PFDs, just as a Chemical Engineering plant can be designed in a variety of ways. For instance, we could have a step after Reactor A in which the Aij are separated from unreacted ai and bj, then allowed to revert back to ai and bj under the action of a catalyst, separated to get only the ai contribution, and the ai is fed to Reactor D. A similar arrangement could be made for the bj. This would yield different final results in CGT. Behavioral experiments could help determine which PFD is correct.
The reaction outcomes for each of the twelve reactions shown can be calculated in the usual ways, either thermodynamically or kinetically. Here we take the thermodynamic approach using equilibrium reactions. This condensed derivation follows Denbigh.35 We start with the well-known thermodynamic result

$$dG = -S\,dT + V\,dp + \sum_i \mu_i\,dn_i$$

where G is the Gibbs free energy, S is the entropy, T is the temperature, p is the pressure, V is the volume, µi is the chemical potential of species i, and ni is the moles of species i. In the usual way, if we examine a reaction at constant T and p, so that dT = 0 and dp = 0, we simplify to

$$dG = \sum_i \mu_i\,dn_i$$

The reaction coefficients (νi) are set by the stoichiometry.36 We can thus express the change in moles for each species in terms of an extent of reaction (ε), as $dn_i = \nu_i\,d\varepsilon$. Using a general expression for the chemical potential, $\mu_i = \mu_i^0 + RT \ln a_i$, with units of J/mol, we can now express the Gibbs energy as

$$dG = \sum_i \nu_i \left(\mu_i^0 + RT \ln a_i\right) d\varepsilon$$

The activity of a gas phase molecule is ai = yi ϕi (p/p0), at a pressure p and standard pressure p0, where yi is the mole fraction of species i, including all inert species. If the pressure is not too high, the gas will be approximately ideal, so that the fugacity coefficients ϕi ≈ 1. The Gibbs free energy varies with the extent of reaction (ε), and to find the equilibrium extent, we minimize the Gibbs free energy. Thus,

$$\left(\frac{\partial G}{\partial \varepsilon}\right)_{T,p} = \sum_i \nu_i \left(\mu_i^0 + RT \ln a_i\right) = 0$$

By definition the standard state Gibbs energy change of reaction is $\Delta g^0 = \sum_i \nu_i \mu_i^0$, so we have an expression

$$-\frac{\Delta g^0}{RT} = \Big(\sum_i \nu_i\Big) \ln\frac{p}{p^0} + \sum_i \nu_i \ln y_i \qquad (4)$$

Eq 4 defines thermodynamic equilibrium, and we solve for the extent of reaction that satisfies this equation. If we were to plot G(ε), we would see the clear minimum for G. This Gibbsian expression thus quantifies the Le Chatelier principle, indicating how a change in pressure (first term on right hand side of Eq 4) or mole fractions (second term on right hand side) is related to the molar Gibbs energy of reaction (term on left hand side). The second term on the right hand side of Eq 4 is due to an entropy of mixing (see Ref 35).37 The entropy of mixing term is essential for establishing equilibrium in chemical reactions, although it is not always recognized as such. In this article we see it as an essential piece of missing chemical physics for decision reaction equilibrium. In most cases we rewrite Eq 4 as an equilibrium constant (K). We can do this for a single reaction as given above, or for any number of reactions. For instance, for the reaction that produces A21 (i.e., A’s choice for tell-quiet), at p / p0 = 1 we have

$$K_{A21} = \exp(-g_{A21}) = \frac{y_{A21}}{y_{a2}\,y_{b1}} \qquad (5)$$

(similar for all other K values)
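The link between minimizing G(ε) and the equilibrium-constant form can be checked numerically for a single decision reaction. The sketch below is a hedged illustration, not the calculation used in this article: it assumes one reaction a + b → C with 0.5 mol of each reactant, no inert species, p = p0, ideal behavior, and reactant standard potentials set to zero so that the product carries the nondimensional energy g. It also assumes the mass-action form K = exp(-g) = yC / (ya yb). The function names are hypothetical.

```python
import math

def gibbs(eps, g, n0=0.5):
    """G/RT for a + b -> C at extent eps (ideal mixture, p = p0)."""
    n = {"a": n0 - eps, "b": n0 - eps, "C": eps}
    total = sum(n.values())
    # Entropy-of-mixing contribution: sum of n_i * ln(y_i)
    mixing = sum(x * math.log(x / total) for x in n.values() if x > 0)
    return eps * g + mixing

def equilibrium_extent(g, n0=0.5, tol=1e-12):
    """Bisection on dG/d(eps) = g + ln[y_C / (y_a * y_b)] = 0."""
    lo, hi = 1e-15, n0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = 2 * n0 - mid
        slope = g + math.log(mid * total / (n0 - mid) ** 2)
        if slope < 0:
            lo = mid      # G still decreasing: equilibrium lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps_eq = equilibrium_extent(g=0.0)   # purely entropic case, K = 1
```

Even with g = 0 the extent is neither 0 nor complete conversion, which is the numerical counterpart of the statement that the entropy of mixing term sets the equilibrium.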
Here we remember that gA21 is the nondimensional molar Gibbs free energy of reaction; this value is given in the game table as the pain for player A in the a2-b1 block. Equation 5 thus includes the free energy associated with the entropy of mixing component for the chemical reaction. For simplicity in this article, we will always set the pressure to standard pressure, and not vary the temperature. While the effects of both of these parameters are well-known for chemical reactions,35 they will require future study for CGT. Measurements like temperature seem almost trivial now in Chemistry, but the meaning and measurement of temperature were not always so simple,38 and further investigation might relate temperature to emotional phrases like “hotheaded” or “cold-blooded” decisions. We note that having new theories often opens up new categories of measurements, which might be helpful in the social sciences. At this point we don’t yet know the mole fractions y, and so we must evaluate them using a stoichiometric table, which is a mass balance. Table 5 shows the stoichiometric table for Reactor A, written as usual in terms of the extents of reactions εi. The stoichiometric table lists the species, initial concentrations, change in concentrations, end concentrations, and final y mole fractions.39 Initial concentrations in the CGT model represent the player’s pre-biases, and in the absence of other information, it is sometimes useful to approximate the player’s perspective as “unbiased”, with all initial reactant concentrations as c0a1 = c0b1 = 0.5. In addition, Tables 5 and 6 have a line for inert species; we set these to 0 for now, and discuss the concept of inert species (e.g., distractions) in the next section. At this point we have four unknown εi for Reactor A, and we also have equilibrium expressions for four reactions. Thus, we can solve for each extent of reaction. In Reactor B we have a very similar scenario. 
No reaction will go all the way to completion, as entropy dictates that we must have the most possible states in the system consistent with the total energy. This is a distinction with TGT, which, since it doesn’t account for entropic choice, would allow a dominant pure strategy. In CGT the changes in concentration indicate how much of that species the player “plays” in their own mind. From these changes in concentration and the initial concentrations, we can find the final concentrations and the relative amount of each species at equilibrium. Similarly, Reactor D can be represented with a stoichiometric table (Table 6). The reactants for D are the decisions from Reactors A and B, which react with the Decider to create final decisions. The relative amounts of these final decisions give the proportion of the time that a particular result (for example, {quiet, quiet}) would occur for these players with the given prebiases. The D reactions are numbered 9 to 12 (see caption of Figure 2 for numbering). For this two-player PD game, we thus have 12 chemical reactions, as illustrated in Figure 2. The solution will arise by solving for all Kij as in Eq 5, together with results for mole fractions from Table 5 (and a similar one for Player B) and Table 6. The equations are solved readily even with Excel Solver, which is how we solved the equations in this article. For larger
systems, we use other software like Mathematica or GAMS. For small extents of reaction, the systems can be linearized and solved readily. The next section gives results for our calculations, and what they mean. Many of the results are surprising, in light of TGT.

Table 5. Stoichiometric table for PD game in Figure 1, for Reactor A. Here we have started Player A as unbiased, since the amounts of a1 and a2 knowlecules are the same. Note that we have set any inert species (which act as “distractions”) to zero. Additionally, the solid species representing the memome of player A does not appear in this table, as solids are often well-approximated as having a chemical activity of unity. Reactor A has catalysts that catalyze only the reactions for the A’s, and when the reactants and products leave Reactor A and its catalysts, those reactions essentially stop. The extents are written for reactions 1 to 4 (see Figure 2), which occur in Reactor A. Reactions 5 to 8 would appear in Reactor B.
Species   Initial      Change                    End                               y mole fraction
a1        0.50         -(ε1 + ε2)                0.50 - (ε1 + ε2)                  [0.50 - (ε1 + ε2)] / ∑
a2        0.50         -(ε3 + ε4)                0.50 - (ε3 + ε4)                  [0.50 - (ε3 + ε4)] / ∑
b1        0.50         -(ε1 + ε3)                0.50 - (ε1 + ε3)                  [0.50 - (ε1 + ε3)] / ∑
b2        0.50         -(ε2 + ε4)                0.50 - (ε2 + ε4)                  [0.50 - (ε2 + ε4)] / ∑
A11       0            +ε1                       ε1                                ε1 / ∑
A12       0            +ε2                       ε2                                ε2 / ∑
A21       0            +ε3                       ε3                                ε3 / ∑
A22       0            +ε4                       ε4                                ε4 / ∑
inert     0            0                         0                                 0
total     ∑0 = 2.00    -(ε1 + ε2 + ε3 + ε4)      ∑ = 2.00 - (ε1 + ε2 + ε3 + ε4)    1.00
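The bookkeeping of Table 5 can be written as a small function that turns a trial set of extents into end amounts and mole fractions, and then evaluates how far each reaction is from equilibrium. This is a sketch under stated assumptions: the function names are hypothetical, and the residual assumes the mass-action form K = exp(-g) = yAij / (yai ybj) for each reaction.

```python
import math

def reactor_a_table(eps, c_a1=0.5, c_b1=0.5, inert=0.0):
    """End amounts, total moles, and mole fractions for Reactor A.
    eps = (eps1, eps2, eps3, eps4) for A11, A12, A21, A22."""
    e1, e2, e3, e4 = eps
    end = {
        "a1": c_a1 - (e1 + e2),
        "a2": (1 - c_a1) - (e3 + e4),
        "b1": c_b1 - (e1 + e3),
        "b2": (1 - c_b1) - (e2 + e4),
        "A11": e1, "A12": e2, "A21": e3, "A22": e4,
        "inert": inert,
    }
    total = sum(end.values())
    y = {species: n / total for species, n in end.items()}
    return end, total, y

def equilibrium_residuals(y, g):
    """ln[y_Aij / (y_ai * y_bj)] + g_ij for each reaction; all four
    residuals are zero when the trial extents satisfy equilibrium."""
    pairs = {"A11": ("a1", "b1"), "A12": ("a1", "b2"),
             "A21": ("a2", "b1"), "A22": ("a2", "b2")}
    return {p: math.log(y[p] / (y[a] * y[b])) + g[p]
            for p, (a, b) in pairs.items()}

# Trial point: equal extents of 0.05 for an unbiased player, 0-1-2-3 pains
end, total, y = reactor_a_table((0.05, 0.05, 0.05, 0.05))
res = equilibrium_residuals(y, {"A11": 1.0, "A12": 3.0, "A21": 0.0, "A22": 2.0})
```

A solver then adjusts the four extents until all four residuals vanish, which is the "four unknowns, four equations" statement in the text.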
Table 6. Stoichiometric table for the Decider reactor D. In most of this article, the pain for the Decider is gDij = -1 for each reaction in Reactor D, unless otherwise specified. Reactions 9 to 12 occur in Reactor D. Note that the catalysts in Reactor D allow only certain combinations to react, giving some correlation of the final results.
Species   Initial   Change   End           y mole fraction
A11       ε1        -ε9      ε1 - ε9       (ε1 - ε9) / ∑
A12       ε2        -ε10     ε2 - ε10      (ε2 - ε10) / ∑
A21       ε3        -ε11     ε3 - ε11      (ε3 - ε11) / ∑
A22       ε4        -ε12     ε4 - ε12      (ε4 - ε12) / ∑
B11       ε5        -ε9      ε5 - ε9       (ε5 - ε9) / ∑
B12       ε6        -ε10     ε6 - ε10      (ε6 - ε10) / ∑
B21       ε7        -ε11     ε7 - ε11      (ε7 - ε11) / ∑
B22       ε8        -ε12     ε8 - ε12      (ε8 - ε12) / ∑
D11       0         +ε9      ε9            ε9 / ∑
D12       0         +ε10     ε10           ε10 / ∑
D21       0         +ε11     ε11           ε11 / ∑
D22       0         +ε12     ε12           ε12 / ∑
inert     0         0        0             0
total     ∑0        ∆        ∑ = ∑0 + ∆    1.00
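The three-reactor calculation can be sketched end to end in a few lines. The code below is an illustration under several assumptions that are not confirmed by the article: (i) every reaction obeys the mass-action form y_product / (y_r1 y_r2) = exp(-g); (ii) Reactor B starts from the same initial concentrations as Reactor A; (iii) only the Aij and Bij products are fed to Reactor D; and (iv) a damped fixed-point iteration stands in for the Excel Solver used by the authors. The names `solve_reactor` and `play_game` are hypothetical.

```python
import math

def solve_reactor(feed, reactions, damping=0.5, iters=500):
    """Equilibrium extents for reactions of the form r1 + r2 -> product.

    feed      : dict of species -> initial moles
    reactions : list of (r1, r2, g), g = nondimensional Gibbs energy
    Each product starts at zero, so its amount equals its extent.
    """
    total0 = sum(feed.values())
    eps = [0.0] * len(reactions)
    for _ in range(iters):
        n = dict(feed)
        for (r1, r2, _g), e in zip(reactions, eps):
            n[r1] -= e
            n[r2] -= e
        total = total0 - sum(eps)   # each reaction removes one net mole
        # Mass action: eps/total = exp(-g) * (n1/total) * (n2/total)
        target = [math.exp(-g) * n[r1] * n[r2] / total
                  for (r1, r2, g) in reactions]
        eps = [(1 - damping) * e + damping * t for e, t in zip(eps, target)]
    return eps

# Reaction order 11, 12, 21, 22 (first index = A's choice, second = B's)
PAIRS = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]

def play_game(g_A, g_B, g_D, c_a1=0.5, c_b1=0.5):
    """Return normalized [yD11, yD12, yD21, yD22]."""
    feed = {"a1": c_a1, "a2": 1 - c_a1, "b1": c_b1, "b2": 1 - c_b1}
    eps_A = solve_reactor(feed, [(a, b, g) for (a, b), g in zip(PAIRS, g_A)])
    eps_B = solve_reactor(feed, [(a, b, g) for (a, b), g in zip(PAIRS, g_B)])
    feed_D, rx_D = {}, []
    for k, (ea, eb, g) in enumerate(zip(eps_A, eps_B, g_D)):
        feed_D[f"A{k}"], feed_D[f"B{k}"] = ea, eb
        rx_D.append((f"A{k}", f"B{k}", g))
    eps_D = solve_reactor(feed_D, rx_D)
    s = sum(eps_D)
    return [e / s for e in eps_D]

# Unbiased players, PD 0-1-2-3 pains, Decider pain of -1 for every outcome
y_D = play_game(g_A=[1, 3, 0, 2], g_B=[1, 0, 3, 2], g_D=[-1, -1, -1, -1])
```

The returned list can be compared with the yDij values quoted for unbiased players in the next section; changing `c_a1` and `c_b1` explores the effect of pre-bias. The simple iteration is adequate for pains of order unity but is not guaranteed to converge for strongly negative pains, where a general nonlinear solver would be a safer choice.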
Chemical game theory (CGT) solutions

In this section we will see that CGT solutions arise due to the interplay of initial player pre-bias and decision entropy. Neither of these aspects is considered in the usual TGT, and so these are new parts of the representation and the solutions. We will discuss how incorporating pre-bias and entropic choices impacts the four observations of TGT previously described.
Equilibrium selection

The first point we will discuss in this section is the role of “entropic choices” in decision making. It is one of the central differences between traditional and chemical game theory.40 In analyzing the PD game using TGT, the solution to the PD game is {tell, tell} with probability = 1.00. In TGT for the PD game, which has a pure strategy, each player does not use information about the other player. Rather, each player looks at their own pains, and concludes that no matter what the other player chooses, their own pain is minimized by choosing “tell”. Entropy is not included in the analysis.

In CGT, if we start with unbiased players (i.e., equal initial concentrations of decision reactants), then Figure 3a shows that the probabilities of outcomes expressed as mole fractions for the PD 0-1-2-3 game are yD11 = 0.523, yD12 = yD21 = 0.183, and yD22 = 0.111. This is shown by the point c0a1 = c0b1 = 0.5 in Figure 3a. Thus, players collectively choose D11 (i.e., quiet-quiet) 52.3% of the time, rather than 0% of the time as in TGT, and D22 (tell-tell) 11.1% of the time, rather than 100% of the time as in TGT. CGT therefore explicitly addresses the equilibrium selection problem by offering a quantitative way to determine the frequency of each possible equilibrium.

At this point, a traditional game theorist might argue that a player choosing “quiet” is “irrational”. However, the result for CGT is different from the traditional one because the representation is different, and the criterion for equilibrium is different. In CGT, each player quantitatively includes the other player. In the hypothetical case of unbiased players with equal pains for all outcomes, entropy would dictate a “fair distribution” over all possibilities,41 meaning that each player would play “quiet” with the same frequency as “tell”. This would be the result if the Gibbs free energy changes for all decision reactions were zero, so that the change in Gibbs free energy were due only to entropy of mixing (i.e., purely entropic choices). In the present PD 0-1-2-3 game, the non-mixing portions of the free energies influence the choices toward minimizing the total free energy (i.e., here, the final decision equilibrium) at the end. Having finite probabilities for both “quiet” and “tell” for this PD 0-1-2-3 game is not due to shortcomings of the player, or errors, or irrationality, but rather due to the effect of entropy and the criterion for equilibrium on the choices. Entropic choices will always exist for finite pains in CGT. Thus, the three drivers for how each player plays are 1) the initial choice concentrations, which represent prior bias in a Bayesian sense,42 2) entropy, which aims to distribute the outcomes fairly, and 3) pains, which bias the results toward one choice or another. In the usual way, the final equilibrium is a fight between the fairness of the entropic effects and the bias of the pain energies.
Additionally, rather than each player choosing to “tell” with probability 1.00, the final output is that each player chooses to play “quiet” more often than they play “tell”. Since D11 was chosen 52.3% of the time, and D12 18.3% of the time, a1 appeared in 70.6% of the final decisions. Thus, a1 “quiet” was played with a probability 0.523 + 0.183 = 0.706, and a2 “tell” with probability 0.183 + 0.111 = 0.294. Likewise, the probability of b1 = 0.706 and b2 = 0.294. Importantly, the players play a1 and b1 even more often than their initial pre-bias of 0.5 would suggest.43
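The arithmetic above amounts to taking marginals of the joint outcome distribution. A minimal sketch, using the yDij values quoted in the text (the function name `marginals` is a hypothetical helper):

```python
# Joint outcome fractions for unbiased players in the PD 0-1-2-3 game,
# keyed by (A's choice, B's choice) with 1 = quiet and 2 = tell.
y_D = {(1, 1): 0.523, (1, 2): 0.183, (2, 1): 0.183, (2, 2): 0.111}

def marginals(y):
    """Probability that each player plays each choice."""
    p_a = {i: sum(v for (a, _b), v in y.items() if a == i) for i in (1, 2)}
    p_b = {j: sum(v for (_a, b), v in y.items() if b == j) for j in (1, 2)}
    return p_a, p_b

p_a, p_b = marginals(y_D)   # p_a[1] is the probability that A plays quiet
```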
Figure 3. Final outcomes yDij (mole fraction, normalized so that ∑yDij = 1) for players in a PD 0-1-2-3 game. For Figures 3a-d, a game is played at each concentration c0a1, and the results are given by the 4 yDij points shown for that concentration, which must sum to unity. Note that TGT does not incorporate initial biases, and always predicts yD22 = 1; that is, the players always “tell”. It is assumed that A and B “see the same game” (i.e., no information asymmetries), and that the Decider has the same free energy gDij = -1 for each final decision. a) c0a1 = c0b1. b) c0b1 = 0.1 (B is biased to tell). c) c0b1 = 0.5 (unbiased). d) c0b1 = 0.9 (B is biased to stay quiet). At the left portion of plot (a), where c0a1 and c0b1 are small, the primary outcome is D22 as expected. Once c0a1 = c0b1 > 0.4, D11 is the primary outcome. In going from (b) to (d), we see an increase of D11 (quiet-quiet) at the right portion of the plots. That is to say, the initial bias of information dominates the final decisions.
Blank slate

Players following a CGT mindset account for the reputation of other players in a Bayesian-type manner. This is done through the use of initial concentrations, which represent the pre-bias of the players. Information about the initial concentrations, whether it is correct or not, arises from knowing particular individuals, their culture, their family, their friends, their profession, their nation, their religion, and more. Anonymity can mask Bayesian preconceptions, while prominent features and signaling can strengthen them. Factors like racism or phobias can contribute to the initial concentrations, including in mistaken or harmful ways. Furthermore, people are trained “by life” and school and news to look for signals that convey information about others, altering initial concentrations. Reputation is accounted for in TGT, for instance through repeated game models.2,44 CGT instead accounts for reputation using initial concentrations of a1, a2, b1, and b2 for Reactor A and Reactor B. As a result, players can be anywhere from risk-averse to opportunity seeking, or unbiased in between.

Now we turn to the results for players with an initial pre-bias, where the initial reactant concentrations are not equal. Calculations are summarized in Figure 3b-d. While unbiased players prefer D11, players that are initially biased towards “tell” (i.e., high c0a2 or c0b2) move the decision toward less cooperation. This outcome can be qualitatively predicted using Le Chatelier’s principle. To illustrate the effects of pre-bias and entropic choices, the results of two games with different pain values are plotted in Figure 4. When the magnitudes of the pains are smaller (Figure 4a), even with the same ratio of pains, the four final decisions are more nearly equal in frequency. When the magnitudes of the pain values increase (Figure 4b), the players cooperate more of the time.
The D11 decision, where both players cooperate, begins to dominate earlier as the dimensionless pains increase (which could have the same pain at lower temperature), and the D11 decision is the most common decision even when both players are biased 70% towards “tell” (c0a1 = 0.3). This may seem counterintuitive: If players cooperate, don’t they run the risk of getting hit with the huge +6 penalty? Players would aim to avoid this high pain in the traditional model. We hypothesize the following explanation, which will require future experimental evidence. If naive players truly have c0a1 = c0b1 = 0.5, they are subject to having a bad outcome. Remember, CGT does not dictate what is “rational”; rather, CGT predicts that if the players still have a sizable concentration of a1 and b1, that they will choose to play these. However, it happens that especially in close relationships, sometimes there is a nice reward when both have a certain choice (e.g., a1-b1), but a very costly penalty if one player chooses differently. In that case, we hypothesize that players learn that they can trust and choose c0a1 ≈ 1 and c0b1 ≈ 1, or else they will choose c0a1 ≈ 0 and c0b1 ≈ 0. We hypothesize that it is unlikely that experienced and educated players will retain c0a1 ≈ 0.5 and c0b1 ≈ 0.5 in potentially costly situations. Such sensitive situations can occur in close relationships, where trust pays dividends, but breaking trust is hugely costly.
Figure 4. Two Prisoner’s Dilemma games with differing pain values but the same ratio of pains. a) Pains of 0, 0.5, 1, and 1.5; b) pains of 0, 2, 4, and 6. For comparison, the common 0-1-2-3 game (same ratio) can be found in Figure 3a. As the pains increase — this could happen as the temperature becomes smaller for the same internal energy — it appears to become more likely that players will cooperate. See the text for a possible explanation.
Indifference

CGT also addresses the indifference problem, where the pains differ by only a very small amount. CGT does this by incorporating the numerical pain of each decision into the thermodynamic calculation via the ∆g values. As ε becomes small, the calculated pains become closer together (Table 7, Figure 5). The traditional solution indicates that {tell, tell} will be the only outcome 100% of the time, at least until ε becomes small and the players approach indifference. As mentioned earlier, if there is a small amount of uncertainty, the small changes might “flip” the TGT solution from {tell, tell} to {quiet, quiet}. CGT illustrates how choices change continuously with ε, showing that decisions do become less differentiated as pain values become closer together, until they are almost all equally likely. Small variations would change the results in a small, continuous manner.

Table 7. This table is used to construct Figure 5. ε values were changed from 1 (our original PD game) to 0 (all pains = 2). At ε = 0.01, pain values are equal to those in Table 3, shown prior (i.e., gA11 = 1.99, gA12 = 2.01, gA21 = 1.98, gA22 = 2.00). Each entry lists the pain for Player A, then the pain for Player B.

              a1 = quiet      a2 = tell
b1 = quiet    2-ε, 2-ε        2-2ε, 2+ε
b2 = tell     2+ε, 2-2ε       2, 2
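The pain table above can be generated for any value of the perturbation parameter, which also makes it easy to confirm that “tell” remains the dominant strategy in TGT for every 0 < ε ≤ 1. The sketch below uses hypothetical function names; the parameter is called `eps_pain` to keep it distinct from the extents of reaction, which share the symbol ε in the text.

```python
def pain_table(eps_pain):
    """(pain for A, pain for B) keyed by (A's choice, B's choice),
    with 1 = quiet and 2 = tell, following Table 7."""
    return {
        (1, 1): (2 - eps_pain, 2 - eps_pain),
        (1, 2): (2 + eps_pain, 2 - 2 * eps_pain),
        (2, 1): (2 - 2 * eps_pain, 2 + eps_pain),
        (2, 2): (2.0, 2.0),
    }

def tell_dominates(eps_pain):
    """True if 'tell' strictly lowers each player's own pain,
    whatever the other player chooses (the TGT argument)."""
    t = pain_table(eps_pain)
    a_ok = t[(2, 1)][0] < t[(1, 1)][0] and t[(2, 2)][0] < t[(1, 2)][0]
    b_ok = t[(1, 2)][1] < t[(1, 1)][1] and t[(2, 2)][1] < t[(2, 1)][1]
    return a_ok and b_ok

original_game = pain_table(1.0)       # recovers the PD 0-1-2-3 pains
near_indifference = pain_table(0.01)  # pains within 0.03 of each other
```

`tell_dominates` returns True for any positive `eps_pain`, however small, and False at exactly zero, which is the discontinuity in the traditional solution that the continuous CGT result avoids.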
Figure 5. The yDij (normalized mole fraction) values for two unbiased players for the game in Table 7. As ε values decrease, the final decisions become more equally likely. TGT predicts that in the range of 0 < ε < 1, the overall solution does not change, and the Nash Equilibrium remains at 100% D22 (i.e., {tell, tell}). No traditional NE is given for when ε = 0 (all pains the same). In contrast, CGT explicitly accounts for the value of ε, and thus kT serves as a measure of indifference. In CGT, yDij values change as the relative pains change, as expected from human behavior. This figure reveals the importance of “entropic choices” when the pains between different outcomes are not greatly impacted by energetics.
Experimental results

The last observation we listed was that TGT does not accurately predict experimental data. As one initial test of the hypothesis that Chemical Game Theory can describe experimental human play strategy, we will now return to the two experimental studies whose results we summarized earlier in this article, and solve the Prisoner’s Dilemma game for each using CGT. Both experiments framed the participants’ payoffs in terms of money. This is a dimensional quantity, and so we developed a “perception function” as a type of phenomenological constitutive equation for utility.
The perception function is useful for solving the game using CGT. The various values in the payoff matrix (whether they are in dollars, tickets, or years of prison time) will not bring the same pain to every player. For example, a millionaire might consider $100 a pittance, while a college student might be quite content to receive that much money. As Bernoulli analyzed, humans often perceive stimuli on a logarithmic scale. Examples of this phenomenon can be found for many human perceptions, including sound loudness (logarithmic decibel scale), sound tone (logarithmic octave scale), and geologic intensity (logarithmic Richter scale). Furthermore, as Kahneman and Tversky showed,45 each person has their own reference point. To create a perception function, we used data based on our own preferences, and formulated these into a Weber-Fechner type law (Eq 6),46 which is similar to other dose-stimulus response functions.

$$p = -0.98\,\ln\!\left(\frac{m}{\$3.06}\right) \qquad (6)$$

Only the average values of the two fitted constants are shown in Eq 6; each constant also carries an uncertainty. To create this perception function, we answered the following question, considering ourselves as college students: “A professor of sociology issues a request for participants in a study that will take one hour of time. To participate, what is the amount of monetary reward that you would consider to be the absolute minimum (level 0), small (1), medium (2), large (3), or huge (4)?” Our data were used to construct an average perception function (Eq 6) where p is the pain associated with an amount of money m. The statistics are based on a 90% confidence interval. The pains calculated from this perception function were used to calculate extents of reaction for the three reactors (for Player A, Player B, and the Decider D). The cooperation rate for the game was found by dividing the amount of “quiet” played (the sum of the change in a1 and b1 for all reactors) by the amount of any decision played (the sum of the change in a1, b1, a2, and b2 for all reactors). The players were assumed to start as unbiased.
Cooperation rates were calculated using the perception function in two ways: i) In one case we used the average values alone [i.e., p = -0.98 ln(m/$3.06), with uncertainties = 0]; ii) In another case we used the uncertainties in Eq 6 in Monte Carlo statistical simulations.47 We’ll first examine case (i). In the study comparing Economics majors and non-Economics majors, the experimental cooperation rates were 28.2% for Economics majors and 52.7% for non-Economics majors. Assuming unbiased players, we used the perception function in Eq 6 with the monetary payoffs from Table 4 to create the pain matrix in Table 8. The CGT solution produced a cooperation rate of 57.4% — much closer to the rate for non-Economics majors than the solution classical economics theory would predict. This result raises a follow-up question: What is the reason for the difference between Economics majors and non-Economics majors?
Table 8. Pain values (again, with positive as painful) calculated using the perception function Eq 6 and the monetary payoffs in Table 4. These pain values are the Gibbs free energy values associated with each reaction in the CGT framework. By solving the system of three reactors, we get a final concentration of each possible outcome — telling us how frequently that outcome would occur. The value $0 was approximated as $0.01 to avoid the singularity inherent in logarithmic functions. To many people, being handed a penny for an hour’s work would be almost like getting nothing at all.
              a1 = quiet      a2 = tell
b1 = quiet    0.42, 0.42      0.02, 5.61
b2 = tell     5.61, 0.02      1.10, 1.10
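The conversion from money to pain can be sketched with the average form of the perception function quoted in the text, p = -0.98 ln(m/$3.06), with the uncertainties set to zero. The function name and argument names below are illustrative assumptions; the only payoff evaluated is the $0.01 stand-in for $0 described in the caption of Table 8, since the monetary payoffs of Table 4 are not repeated here.

```python
import math

def perceived_pain(money, scale=0.98, m_ref=3.06):
    """Average perception function: pain as a function of a monetary
    reward in dollars. Positive values are painful; a reward equal to
    the reference amount m_ref gives zero pain."""
    if money <= 0:
        raise ValueError("money must be positive; the article uses $0.01 for $0")
    return -scale * math.log(money / m_ref)

# The $0 payoff, approximated as one cent to avoid the log singularity
pain_for_nothing = perceived_pain(0.01)
```

The result for one cent rounds to the largest entry in Table 8, and the logarithmic form means that each doubling of the reward lowers the pain by the same fixed amount (0.98 ln 2).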
CGT can be used to assess the pre-bias. The pre-biases of each player can be biased towards “quiet”, or towards “tell”. By varying these pre-biases, through the initial concentrations, we can examine what behavior the CGT model predicts for players with a given pre-bias (Figure 6). One might say, “You are simply fitting more parameters, and so of course you fit the data better”. This is a fair criticism. However, we have aimed to use the standard tools from Chemistry and Chemical Engineering; a similar kind and amount of data is required to engineer a chemical plant, which we might argue is simpler than a human mind. Figure 6 shows that, as expected, a player who is biased towards “tell” (i.e., biased against “quiet”) will cooperate less. The experimental values for both groups in the study fall within the range predicted by CGT. This might indicate that the Economics majors were strongly biased — perhaps because they had learned the classical theory behind the Prisoner’s Dilemma, and played what they had been told people “should” play. Contrast this with the cooperation of the non-Economics majors, which is fairly close to the “unbiased” CGT solution, for which c0a1 = c0b1 = 0.50. We speculate that the experimental results might well depend on the nation and culture of the participants.
19
Figure 6. PD game (⬤) compared with experimental data from literature (-----). Depending on the initial concentrations, or pre-biases, the CGT model predicts different cooperation rates. This plot could be interpreted to mean that non-Economics majors have little pre-bias and are close to 50-50, while Economics majors have a greater pre-bias to defect. Note that the Nash equilibrium would be a horizontal line at the bottom for cooperation rate = 0.
In case (ii), where we used Monte Carlo calculations, we randomly generated twenty sets of perceived pains from a normal distribution following Eq 6, and treated these as twenty players. We then played Players 1 and 2, then Players 3 and 4, and so on, for ten games and thus ten outcomes. From these, we calculated an average outcome with a 90% confidence interval. With the Monte Carlo method, the mean cooperation rate of the ten randomly generated games was found to be (57.0 ± 0.5)%, assuming unbiased initial concentrations. This result is similar to the CGT-calculated rate obtained from the average perception function (57.4%). Furthermore, the CGT solution predicts the experimental result within uncertainty. In addition, the experimental data also had uncertainty, although the amount is not clear from the paper. This suggests that while individual players may vary from the average — and, in fact, will always vary in an experimental setting according to the CGT model — the model predicts that this variation is