BioNSi: A Discrete Biological Network Simulator Tool - ACS Publications

Technical Note ... The Biological Network Simulator (BioNSi) is a tool for modeling biological networks and simulating their discrete-time dynamics, i...
0 downloads 3 Views 2MB Size
Subscriber access provided by University of Glasgow Library

Technical Note

BioNSi: A Discrete Biological Network Simulator Tool Amir Rubinstein, Noga Bracha, Liat Rudner, Noga Zucker, Hadas E. Sloin, and Benny Chor J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b00278 • Publication Date (Web): 29 Jun 2016 Downloaded from http://pubs.acs.org on July 12, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

BioNSi: A Discrete Biological Network Simulator Tool

Amir Rubinstein,

∗,†

Noga Bracha,



Liat Rudner,

and Benny Chor



Noga Zucker,



Hadas E. Sloin,





†Blavatnik School of Computer Science, Tel Aviv University ‡Department of Neurobiology, George S. Wise Faculty of Life Sciences and Sagol School of

Neuroscience, Tel Aviv University E-mail: [email protected]

Abstract Modeling and simulation of biological networks is an eective and widely used research methodology.

BioNSi (Biological Network Simulator) is a tool for modeling

biological networks and simulating their discrete-time dynamics, implemented as a Cytoscape App.

BioNSi includes a visual representation of the network that enables

researchers to construct, set the parameters, and observe network behavior under various conditions. To construct a network instance in

BioNSi, only partial, qualitative

biological data suces. The tool is aimed for use by experimental biologists and requires no prior computational or mathematical expertise.

BioNSi is freely available

at http://bionsi.wix.com/bionsi, where a complete user guide and a step-by-step manual can also be found.

Keywords Biological Networks, Modeling, Simulation, Discrete Models

1

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 33

Introduction Networks play a central role in describing and modeling complex systems in a wide variety of scientic, social, communication and transportation contexts.

13

In the context of bio-

logical sciences, network based tools proved highly valuable in analyzing and understanding system level phenomena. Known examples include protein-protein interaction networks, metabolic networks,

6,7

regulatory networks,

interacting nucleotides/amino acids),

10

8,9

4,5

structural networks (network of chemically

and more.

Of special interest in our context are

networks that are used to model and simulate biological systems (or subsystems) of moderate sizes.

Typically, life scientists working on a specic system are highly familiar with

the underlying principles of the system, and with many details of its components. Yet, the complete patterns of interactions between the various components may be too complex to be analyzed manually. Software tools that can simulate such network behavior frequently provide an important approach in bridging those gaps.

The starting point for this genre

of tools is a network, constructed according to the biological knowledge and understanding of the researcher.

The tool then enables extensive

in silico

simulations, in a scale that is

infeasible experimentally, making it possible to tune the system, adding or erasing connections, changing strengths of interactions, etc. Such simulations may be used to strengthen or weaken hypotheses regarding the system, and, furthermore, suggest predictions as to the system's behavior under various conditions.

Such predictions can later be tested experi-

mentally, followed by modications in the network, if necessary, according to the accepted scientic paradigm. Biological network models that are used for simulation purposes are generally either continuous or discrete.

9

In a continuous network model, either the time or the system com-

ponents' states, or both, are described by real numbers. Such networks may provide detailed descriptions of the biological systems under study.

11,12

On the negative side, they often

require a very detailed knowledge of quantitative biological data,

i.e.

parameters such as

concentrations and kinetic constants, whereas such knowledge may be unavailable. Discrete

2

ACS Paragon Plus Environment

Page 3 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

network models limit the time steps or the states of components, or both, to integers. Despite the lower granularity in the system's description, discrete networks provide a satisfactory alternative in many cases: they tend to be more intuitive, often do not require full knowledge of the system, and may be more computationally ecient (for examples, see,

13 14 15 16 ,

,

). For

both continuous and discrete models, computational tools are often used as "black boxes", with limited familiarity of the underlying computational models and their inherent limitations. This may lead to misinterpretation of results.

17

In this paper, we present a simple, intuitive model, accompanied by a user-friendly tool called

BioNSi



Bio logical N etwork Si mulator.

The user interface of the tool is simple, and

enables the construction of a network, setting its parameters, and running simulations under various conditions, without any computational background.

methods BioNSi

is implemented as a Cytoscape 3 plugin,

18

and is written in Java. The computational

model is an extension of a discrete transition model, previously suggested in

14

and.

19

The computational model BioNSi nodes

implements the following model: A network is represented as a state graph, in which

represent mRNA, proteins, nutrients, cellular events, external signals, etc.

assume discrete

states

maximal state for node activity,

Ui = 9

Ui

taken from a xed range of integers

i (Ui ≥ 0).

{0, . . . , Ui },

- full activity). Nodes maximal states can be set up to

Ui = 20,

gives enough granularity of 10 dierent activity levels. Although

U

Time in the model is also discrete, represented as clock "tics" ( t

state of node

i

at time step

t

Ui

is the

Nodes' states represent their level of activity (0 - no

node to node, it may be convenient to dene a common upper bound

U = 9.

where

Nodes

is denoted

si (t).

3

ACS Paragon Plus Environment

Ui

but normally

may vary from

for all nodes,

= 0, 1, 2, . . . ).

e.g. The

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Each directed edge and a

delay d(i, j).

Page 4 of 33

(i, j) from node i to node j is assigned two parameters:

a

weight w(i, j)

The weight is either a positive integer (activation, translation, etc.) or

a negative one (repression, inhibition, degradation, etc.). Degradation usually appears as a self-negative edge. The weight reects the strength of the interaction. The eect of node on node

j

is dened to be the product of node

i's

w(i, j).

state and the weight

i

Thus, the

strength of the eect is determined not only by the weight of the edge, but also by how active the source node is. The delay is a non-negative integer, which represents the time-scale of

t,

the interaction: at any simulation step based on node

i's

state

eect on each node

j

d(i, j)

the eect of node

steps earlier,

i

on node

i.e. si (t − d(i, j)).

j

will be computed

At any time step

t,

the net

is computed by summing up all nodes' eects on it:

ef f ecti (t) =

X

w(i, j) · si (t − d(i, j)).

j

In addition to a weight and a delay, an edge may specify conditions for it to be active at any step of the simulation, in the form of

dependency edges.

node to an edge. For example, an edge from node that some node

k

i

to node

These are edges going from a

j

may require for its activity

is in a state above 3 (positive dependency), and/or that another node

`

is in state below 6 (negative dependency). If this requirement is not fullled at some time step in the simulation, node

i

will have no eect on node

j

during that step. When there is

more than one dependency condition on a specic edge, it is possible to dene any number of conditions as a minimal requirement (specically, between these conditions, while 1 out of A

conguration

m

m

out of

m

will dene an AND gate

denes an OR gate).

of the network at a specic time step is a "snapshot" of its entire state:

the states of all nodes at that time step (including all earlier "pending" states on delayed edges, which have yet to take eect in future simulation steps).

A simulation starts with

an initial conguration, and consists of the repeated application of a

transition function

(described below) to the current conguration in a synchronous manner. A simulation ends

4

ACS Paragon Plus Environment

Page 5 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

when either a steady state is reached (two consecutive identical congurations), or a loop of congurations is detected. Any of these two outcomes is termed an

attractor

of the system.

At any time step and for each node in the network, the transition function updates its state according to the net eect on it. States may increase, decrease or remain unchanged: the state of node

i

will increase in the next time step if

ef f ecti (t)

is positive, and decrease if it

is negative (otherwise it will remain unchanged). There are two modes by which states may change: a constant update (as in the change can be absolute value of

+1

or

−1

ef f ecti (t)

(with states limited to the range

[0, Ui ]),

is; and a logarithmic-order update (as in

is logarithmically proportional to the value of

19

14

), where

no matter what the ), where the change

ef f ecti (t), i.e. ±[ln(|ef f ecti (t)| + 1)].

The

logarithmic update mode avoids, on one hand, the problematic nature of constant functions that are not aected by the amplitude or strength of the interaction, and, on the other hand, of

e.g.

linear functions, which may cause changes that are too extreme. We therefore advise

the usage of the logarithmic mode in the simulation of biological networks. However, the constant update function may be more appropriate for threshold type responses, in which a particular eect occurs at a constant rate whenever a regulator exceeds a threshold level. In addition to executing a single simulation at a time,

BioNSi

also enables a batch mode,

or statistical mode simulation, in which multiple initial congurations are carried out, as explained later.

Timed nodes A special type of nodes enables representing external signals that inuence the system. Such nodes are not aected by other nodes in the manner described above: their expression pattern throughout the simulation is pre-dened, independent, and determined a-priori, "externally", by the user. Such nodes may be suitable for introducing signals such as light pulses, administration of agents (

e.g., drug, hormone), nutrient supply or depletion, etc.

become active after a lag period (

Timed nodes may

e.g., only after 100 simulation steps passed), thus simulat5

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 33

ing intervention with the biological system at some point in the experiment. Furthermore, timed nodes can be dened as cyclical ( injections), or as one-shot signals (

e.g.,

for day-night light regimes, repeating agent

e.g., Nitrogen depletion that induces meiosis in buddying

yeast).

Visualization of a network Fig.1A summarizes the main color and shape conventions of the graphical representation of

BioNSi

networks.

This representation contains nodes and directed edges (both "regular"

edges and "dependency" edges). Nodes' initial states appear below their names (

e.g.

A starts at 9, B and C start at 0), except for timed nodes, denoted "Timed" (

e.g.

node the

node termed signal). Nodes' colors and shapes are determined by the user, and have no predened meaning in the model. Edges are color coded and visually styled according to their function. Negative weight edges appear in red (

e.g.

the self edge on B, and edge from C to

A), while positive weight edges in green (the edge from B to C). The default weight for edges is either +1 or -1. When a weight is set to another value, the weight appears on the edge itself (

e.g.

the edge from C to A has a weight -2). Dashed edges denote edges with a delay

that is greater than 0 (

e.g.

the edge from A to B). Note that delay values are not displayed,

and are available through the edges' setting panel. denote dependency conditions (

e.g.

Gray edges (from a node to an edge)

the edge from the signal node to node A depends on the

state of node C). The dependency, as explained earlier, can be either positive or negative. Note that the condition parameters are not displayed, and are accessible through the edges' setting panel. The results of the simulation of this network are shown on the right hand side of the network, and are explain in the next subsection. Fig. 1B shows another example for the usability of dependency edges: a dimer formation can be represented as two monomer nodes up-regulating the dimer node, when each interaction depends on the existence of the other monomer. In this case, the condition for the edge from monomer1 is that monomer2 is at state

> 0,

and vice versa. This mutual dependency structure forms a logical AND gate.

6

ACS Paragon Plus Environment

Page 7 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Single conguration simulation In

BioNSi, the user constructs the network, based on relevant biological knowledge on inter-

actions between nodes, and sets the required network parameters, as summarized in Table 1.

For nodes - initial states and maximal states, for timed nodes - their lag period, the

pre-dened expression pattern desired, and whether this expression is cyclical or not. Edges require setting of their weights, delays, and dependencies, if any. Once the complete network structure and parameters are set, the simulation of the network can be launched. The user may choose between the two state update functions (constant or logarithmic, mentioned earlier). Furthermore, a desired number of simulation steps can be dened (otherwise the simulation will end when either a steady state or a loop has been detected). Longer simulation periods are convenient, for example, when the user wishes to observe more than one cycle of a periodic behavior. The results of the simulation are presented in a plot, depicting all nodes' states as a function of "time" (simulation steps). For example, Fig. 1A presents the simulation results of the network from that gure, under the logarithmic update function. Colors of curves in the plot match nodes' colors in the graph. The user may choose to lter nodes in or out for display in the results plot area. Finally, the results of the simulation can be exported in a simple table as a .csv le format.

Statistical simulation mode In addition to the described simulation mode, in which a single initial conguration is simulated,

BioNSi

also supports a statistical mode of operation. In this mode, multiple initial

congurations are simulated, and statistical data is collected.

This enables a higher level

study of the system, and may reveal global properties, such as robustness, stability, failsafety, conditions for periodic behavior, etc.

For example, it is possible to examine if a

network is resistant to perturbations in initial states of various nodes. The user chooses a set of initial states per node (instead of only one initial state per node). A simulation is run

7

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 33

per each combination of the initial states specied. The number of simulations is therefore equal to the size of the Cartesian product of the sets of initial states.

When the search

space is too large, it is not possible to explore it exhaustively within reasonable time. Thus,

BioNSi

allows sampling some percentage of the initial network states, making the explo-

ration tractable, albeit less accurate. The simulation results are grouped by the attractor reached,

i.e.

the various steady states and loops that were reached in all the simulations. For

example, if two dierent initial congurations both lead to the same steady state - they will be grouped together as part of the "basin" of that steady state. As for loops, the grouping is done according to a single node specied by the user - an

indicator node :

if two dierent

initial congurations lead to loops in which the indicator node performs the same cycle these loops will be grouped together. The statistical data provided includes a list of the attractors reached, the fraction of initial conguration that led to each, sizes of loops, and the average length of the trajectories that led to each attractor. A representative run of a simulation for each attractor is shown (the rst run that led to any specic attractor). In addition,

BioNSi

provides, for each attractor,

the distribution of nodes' initial states that led to this attractor.

This "projection" of

simulation results on nodes' initial states allows studying how specic nodes' initial states aect the global behavior of the network.

For example, we can nd out what fraction

(between 0 and 1) of an attractor's initial states started with node

i

in state

0, 1, ...U

(see

later example in case study 1). A statistics mode for timed nodes is also available, and allows executing and collecting data on simulations of various lag periods and pre-dened expression patterns. For example, one can examine the eect of short light pulses at dierent times during the night on a circadian clock system, and how dierent amplitudes and/or durations of such pulses change the regime of the biological clock. This statistical information is very useful in cases when the size of the Cartesian product of lag periods, signal durations and signal amplitudes under study is large. The same statistical information is provided as mentioned earlier, except for

8

ACS Paragon Plus Environment

Page 9 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

the distribution of nodes' initial states, which is replaced by the starting times of the various attractors. Statistical results can also be exported as a .csv le format for further analysis.

Textual table interface for modications There are two ways in which the user can update an existing network: rst, through the graphical interface where individual objects are accessed and can be added, modied or deleted; second, via a textual table interface, which enables modifying parameters of multiple objects at once more easily. The table supports sorting by columns, which makes it convenient to "group" items together (

e.g., all edges that come out of a specic node).

Exporting and importing networks as .csv format BioNSi

supports exporting networks as a specic .csv (textual) format. Once exported, the

le can be modied and/or imported back later. This is a convenient way to construct large networks. The .csv representation includes data about the graphical representation of the network (nodes shapes, colors and location on the Cytoscape canvas). Thus, network layout will not be lost when saving networks in this textual representation.

Results and discussion We demonstrate the usage of out tool by presenting two case studies involving two dierent biological systems: cell cycle in yeast and the molecular mechanism of the circadian clock in vertebrates. All the simulations were conducted with the logarithmic state update function. The networks demonstrated in these case studies are available as supplementary les of this manuscript, as well as on the tool's website, in both .cys (Cytoscape) and .csv (text) format.

9

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 33

Case study 1 A Boolean network model was used by Li

et al. 20

cell cycle pathway in the budding yeast

Saccharomyces cerevisiae .

to describe the key regulators of the yeast Their model was used

to simulate a single instance of the cell cycle, initiated by an increase in the cell size, after which the cell arrests in stationary G1 stage, waiting for additional signals. In their model, nodes are limited to just two levels of activity - 0 ("OFF") and 1 ("ON"). In such simplied model, intermediate levels are not expressible. Li

et al.

conclude that the yeast cell cycle

network is robust, namely, changes in the state of the system at the onset of the simulation do not alter its fundamental behavior. They do so by examining all possible initial states of

11 an 11-node network ( 2

= 2048

initial states in total), and show that 1764 of these (86%)

lead to the same nal state - the main attractor of the system, which resembles stationary G1 state. We used

BioNSi

to examine this conclusion under a wider range of initial and intermedi-

ate states, thus possibly strengthening or weakening the above mentioned conclusions. The corresponding network appears in Figure 2A. We used nodes states ( Ui

U = 9,

a global upper bound for all

= 9 for every node i), and let each node start at any of the states between 0

and 9 (inclusive). Since the search space, namely

1011

initial network states, is too large to

explore exhaustively, we sampled a small percentage of it, namely states), which took

BioNSi

0.01% (107

initial network

about 15 minutes.

The attractors reached in this simulation were all steady states (no loops). Due to the greater exibility of nodes' states in our simulation, compared to the Boolean case, it is not surprising that more attractors were reached, 417 in total. However, the distribution of attractor sizes resembles that of the Boolean model in:

20

86.6% of the initial congurations

converge to one main attractor, followed by several signicantly smaller attractors of size less than 2% (Table 2).

The main attractor in our simulations resembles stationary G1

conditions, in which Sic1 and Cdh1 are active, in agreement with the simulations of the Boolean model.

20

This gives additional support for the robustness of the yeast cell-cycle

10

ACS Paragon Plus Environment

Page 11 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

network suggested by Li

et al.,

as the system exhibits signicant tolerance even to extreme

"noise" in the levels of its components. Next, we examined the relation between specic nodes' initial states and the global behavior of the system,

i.e.

the attractor that was reached. As mentioned earlier,

BioNSi

allows to "project" the set of initial states in each attractor on any individual node. Table 3 shows this distribution for the main attractor in our simulation. For example, the state of Cln3 is positively correlated with this attractor: the larger its initial state is, the more states within the attractor have this value initially (Table 3, denoted "+" in the bottom row of the "Cln3" column). By way of contrast, Sic1, for example, is negatively correlated with the main attractor (Table 3, "-" in the bottom row of the "Sic1" column). Interestingly, some nodes have a non-monotonic relation with the main attractor. Clb1,2 is one such notable example. While an increase of its initial state from 0 to 6 lowers the number of initial states leading to the main attractor, further increasing the initial state to 7, 8, and 9 reverses the trend (Table 3, "no" in the bottom row of the "Clb1,2"). This phenomenon is shared by four other nodes in the network, namely Cln1,2, SBF, MBF and Cdh1. Fig. 2B shows the nonmonotonic relations of these nodes graphically. We repeated this experiment with dierent state spaces,

e.g.

an exhaustive search of all combinations of states between 0 and 5, and the

results were similar. We note that in the Boolean model, such non-monotonicity cannot be observed. We are not aware of a biological explanation for this phenomenon. This analysis, however, may raise several hypotheses that can be further examined both computationally and experimentally. For example, it is possible that when the levels of Clb1,2 are extremely high at G1 conditions, the system is likely to "reset" itself to normal G1 conditions rst, and then launch a new cycle. Alternatively, if either the monotonicity or the non-monotonicity does not hold experimentally, a change in the network or its parameters is called for.

11

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 33

Case study 2 The majority of organisms in nature have adapted to the rhythmic daily changes in the environment, caused by the earth's rotation around the sun. These mechanisms, termed circadian clocks, are evolutionary conserved and well-studied in a wide variety of organisms.

21,22

We constructed a network representing the main regulators of the vertebrate circadian clock molecular mechanism.

A screenshot of this network in

BioNSi

is shown in Fig.

3A. The

network consists of 22 nodes, which represent the main circadian clock loop components (Clock, Bmal1, Per and Cry), as well as a few accessory loop components (Dec1, ROR, REVerbα).

21,22

Nodes representing genes appear in lowercase, while protein nodes are in up-

percase. In some cases, we separated a component into two nodes, representing two "states" of this component.

For example, the Bmal:Clock dimer may be located in the cytoplasm

(CLOCK-BMAL-C), or in the nucleus (CLOCK-BMAL-N), where it acts as a transcription factor. Two additional nodes represent the mRNA and protein levels of CCGs (clock control genes), which serve as the output components of the clock. The network structure is based on known literature.

21,22

For example, the network includes degradation eects (

e.g.,

from

CK1 to PER, and self-edges on most nodes), as well as positive regulations, such as initiation of transcription (

e.g., from CLOCK-BMAL-N to cry), translation ( e.g., from cry to CRY) or

dimer formation (

e.g., from PER and CRY to PER-CRY-C, which occurs in the cytoplasm).

When knowledge was partial, the corresponding parameters were chosen based on a trial and error process (adding or removing nodes, changing their connectivity, etc.), until the patterns of behavior observed

in vivo

were achieved. Matching computational results served

as validation for the network structure and parameters, while results that did not match or contradicted

in vivo

experiments led to re-examining the network structure and parameters,

and consequently to required modications.

A detailed justication for the structure and

parameters of the nal network appears in the supplementary text le. The simulation results in Fig. 3B show the cyclic behavior of the node CCG, which exhibits cycles of 26.5 steps on average (a cycle of 26 steps followed by a cycle of 27 steps). Other clock component

12

ACS Paragon Plus Environment

Page 13 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

nodes also follow this pattern (data not shown), matching experimental data.

22,23

In order to provide further evidence that the presented network is a credible model for the molecular mechanism of the vertebrate circadian clock, we implemented in the network a known Clock mutation, which has been thoroughly studied,

∆Clock

∆Clock. 24,25

The mutant

allele is a dominant negative mutation that causes the deletion of 51 amino acids in

the C-terminal activation domain of the Clock protein. When

∆Clock

it competes with the normal clock over the dimerization with Bmal1.

is expressed in a cell,

∆Clock:Bmal1

com-

plexes can still bind to E-box elements in the DNA, but they are decient in transactivation activity.

26

Therefore, heterozygote

∆Clock

cells exhibit a disturbance in the rhythm of the

circadian clock: a lower amplitude and a longer phase.

∆Clock

27

In order to simulate a heterozygote

mutation, we added to the network a node representing

∆Clock,

named D-CLOCK

(see Figure S-1). This node elevates, together with Bmal1, the activation of a

∆Clock:Bmal

cytoplasm node (D-CLOCK-BMAL-C), which then positively regulates a

∆Clock:Bmal

cleus node (D-CLOCK-BMAL-N). The competition between Clock and

∆Clock

nu-

was rep-

resented in our network by dependency edges in the following manner: when the state of D-CLOCK-BMAL-N is relatively high, namely above 5, it blocks the elevation of all the mRNAs activated by Clock: Bmal1. A simulation with D-CLOCK at state 0 represents the wild type conditions, and shows identical results to those presented in Fig. 3B. However, when D-CLOCK is set to state 9, representing the mutant conditions, the CCG rhythm exhibits a lower amplitude and a longer period length (Figure 3C), in agreement with the

vivo

results.

in

27

Based on our suggested network of the vertebrate circadian clock,

BioNSi

was used to

examine the bi-directional interactions between the TGF- β signaling pathway and the circadian clock molecular mechanism. Comparing simulation results under various assumptions to experimental data,

BioNSi

enabled the suggestion of a plausible molecular mechanism for

these interactions. Furthermore, this simulation allowed to interpret the biological results, suggesting that the

in vivo

observed phase delay upon inhibition of TGF- β signaling is a

13

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 33

results of a period lengthening of the clock (data not shown). An example of a possible usage of the

BioNSi

network of the circadian clock is the ex-

amination of the inuence of sunlight on this system. Previous evidence shows that dierent components of the clock are inuenced by light.

2830

However, the exact mechanisms of this

inuence are not yet fully understood. At the behavioral level, there exists a well-known experimental phenomenon called the "light phase response curve".

21,22

A light phase response

curve depicts the advance or delay of the circadian clock's phase, resulting from short light pulses introduced during various times of the day. Light pulses around the beginning of the night cause a delay in the circadian rhythm, while pulses occurring later during the night and towards dawn result in a phase advance

21,22

(Fig. 4A). We used

BioNSi

to generate a

network that exhibits this behavior. Since the inuence of light is only partially understood, we used a trial and error process and examined numerous options computationally.

We

added a "Sun" node to the network, and initially set it to up-regulate a single node in the network. To generate a computational phase response curve, we ran simulations in which the Sun node "emits" light pulses 5 simulations steps long, with starting points covering a single cycle of CCG (equivalent to 24 hours). None of our simulations resulted in the phase response curve observed

in vivo.

For example, Fig. 4B presents the phase shift as a function

of the pulses' starting point, when the sun node up-regulated CK1.

No phase delay was

evident at any time point. This suggests that the inuence of light on a single component in the network is not enough, and that additional interactions with light exist. We consequently examined numerous other options of clock components regulated by light. One setting in which the computational phase response curve greatly resembled the experimental one (Fig. 4C) is the activation of Bmal1 and Dec1 in addition to CK1 by the sun node (Figure S-2). For this suggested network, we were able to replicate another known eect of light on the clock, namely the synchronization of the clock's rhythms to 12:12 light:no-light cycles. The sun node was set to cyclic patterns of light:no-light,

i.e.

31

repeating cycles of 12 steps

at state 9 followed by 12 steps at state 0. This resulted in the synchronization of the clock

14

ACS Paragon Plus Environment

Page 15 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

components to the light cycles, after a stabilization period (Fig. 5). Particularly, the CCG cycle length was reduced from 26.5 steps to 24 steps, similar to the change of the circadian clock phase to a 24 hours rhythm when synchronized to light:no-light cycles

in vivo . 31 While

the exact manner by which light inuences the circadian clock remains to be uncovered, we demonstrate how

BioNSi

can be used in order simulate the inuence of light on the

circadian clock, and to generate plausible hypotheses regarding this inuence.

It is also

interesting to note that our network supports previous evidence suggesting that phase delay and phase advance are caused by two dierent entities regulated by light, and not just a single entity.

3236

To conclude, this type of analysis may help researchers raise hypotheses and design new experiments to validate them.

Further usage of this network of the molecular circadian

clock can take the form of introducing additional mutants and perturbations. For example, knock-out of a gene can be simulated by deleting a node from the network or removing its outgoing edges, over expression of a gene can be simulated by increasing the initial state of the corresponding node, etc. We believe such a framework can serve as a fruitful ground for future computational analyses of the molecular mechanism of the circadian clock, and the way it is inuenced by light and interacts with additional molecular networks (for example TGF-β signaling).

Comparison with existing software While many computational tools are based on continuous methods and dierential equations (

e.g. 11,12 ),

tools using discrete, or partly discrete models are also available.

a similar avor to ours is AMINO,

37

One tool of

a tool for modeling biological pathways dynamics,

which is also implemented as a Cytoscape App.

While both tools belong to the family

of discrete models, they are based on dierent computational models and allow somewhat dierent analyses.

AMINO is based on the formalism of Timed Automata, providing a

15

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 33

formal semantics for the domain-specic language used to represent signaling networks. It also enables model checking, a technique for the automatic verication of the correctness of network properties.

Networks contain nodes with states (termed levels"), and edges

represent either activation or inhibition, with kinetic constants assigned to them. kinetic constant resemble, but are not identical to the delays on the edges in

BioNSi :

These while

delays reect a lag in the eect of an interaction, kinetic constants relate to the reaction rate (

e.g.

slow, medium, fast).

BioNSi,

Unlike

AMINO's model contains neither weights

on the edges, nor dependency type edges. Both tools allow specifying the desired number of simulation steps.

However,

BioNSi

also allows the termination of a simulation once a

steady states or a loop is reached. Statistical analysis in AMINO enables studying the eect of noise in the kinetic constants of the edges, while in

BioNSi, statistics can be gathered on

the combinations of initial states, to explore attractors' distribution and properties. SimBoolNet

38

is a tool for the simulation of signaling network dynamics (also available

as a Cytoscape App). It is based on an extension of Boolean models. Nodes' activity levels, as well as weights on the edges, are represented by real numbers between 0 and 1.

This

continuous range of values increases the expressiveness of the model, while at the same time allowing a direct comparison with Boolean approaches. The transitions function iteratively traverses the whole network by a depth-rst search, and updates the state of every node. This is an asynchronous update function, which is well suited for modeling signal propagation. In

BioNSi,

the transition function is synchronous, better tting the modeling of regulatory

networks, in which interactions occur simultaneously. le specifying a directed graph, while in

BioNSi,

The input to SimBoolNet is a text

both GUI (graphical user interface) and a

textual format are available. SQUAD

39

also uses a combination of discrete and continuous approaches, and is partic-

ularly suitable for dierentiation processes, where multiple stable steady states of a system can be interpreted as dierent cell types. A network of interactions is translated into a set of Boolean equations, using the logical operators OR, AND, and NOT, and the steady states

16

ACS Paragon Plus Environment

Page 17 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

of this set of equations is computed.

These equations are consequently converted into a

continuous dynamical system of equations, providing continuous levels of activation between 0 and 1 for nodes. SQUAD solves the continuous dynamical system numerically, with initial states set to the stable steady states found earlier. SQUAD allows the identication of the stable steady states present in the network. However, it does not provide information on the events that lead to these states. Although

BioNSi

does not provide a full list of each attrac-

tor's initial congurations, it does provide statistical information about the distribution of specic nodes' initial states. SQUAD does not include dependency type edges as well. Each of these tools, as well as others that exists, have their pros and cons in terms of expressiveness, eciency, and simplicity. With regard to model expressiveness, we nd that the dependency edges are quite unique to

BioNSi, providing a simple way to represent AND/OR

relations between nodes. In terms of eciency, our sampling of initial congurations in the statistical mode allows an eective analysis of very large sets of possible combinations of initial states.

Finally, throughout the design and development of our tool, we focused on

keeping it as simple and intuitive as possible. Together with the friendly Cytoscape based GUI, which allows dening networks graphically, we believe our tool will further facilitate the incorporation of

in silico

approaches within the biological community.

Discussion We introduced

BioNSi,

a biological network simulator tool, implemented as an App of the

open source platform Cytoscape.

The computational model underlying this tool requires

no mathematical or computational expertese, and is based on discrete values in both time and components' state. This makes the tool usable even when only partial biological data is available. We exemplied the usage of our tool with two dierent case studies. The network construction was guided by biological knowledge and previous ndings about the specic sys-

17

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

tems.

Page 18 of 33

When biological knowledge was partial, the corresponding nodes, edges and their

parameters were chosen based on a trial and error process. The supplementary text le and Figure S-3 describe such a routine used to construct a simple "toy" network.

Normally,

during most of the process the simulation results and the experimental data do not fully agree, and the network is modied accordingly. Constructing a network in such a manner has its advantages and disadvantages. On one hand, a trial and error process helps clarifying details regarding the network. It requires examining various options, thinking about the ne details of the interactions, deciding which elements should be included in the model, to what level of detail, etc. In some cases, failure to capture the system's behavior, as measured

in

vivo, reects the lack of knowledge about the system, and may help design new experiments. On the other hand, modelling is vulnerable to over-tting of parameters so they agree with experimental results. However, in a discrete computational model such as ours, the analysis of the network is mostly qualitative, and therefore it is not the exact values of the parameters that are important, but what matters are mostly the relations between them.

BioNSi

is based on a computational model that is fairly simple and intuitive. We have

experienced teaching parts of this computational model in the "computational approaches for life scientists" course.

40,41

This allowed familiarizing students with the notions of discrete

modeling and simulation, while at the same time providing a practical hands-on experience. We believe that our tool will prove useful in encouraging the usage of in the life sciences.

18

ACS Paragon Plus Environment

in silico experiments

Page 19 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Tables Table 1:

Network parameters dened by the user in

BioNSi.

The user construct a

network (nodes connected by directed edges), and sets the required parameters. Parameter

Range or type

Default value

Nodes

initial state

0-maximal state

0

maximal state

0-20

9

Timed nodes

lag

natural number

0

expression pattern

any

none

cyclical?

yes/no

no

weight

integer

+1

Edges

delay

natural number

0

dependencies

"node X above/below

none

state Y"

Table 2:

Attractors reached for the yeast cell cycle.

Each row represents one attractor.

The leftmost column states the percentage of initial network congurations that converged to that attractor. The main attractor corresponds to G1 conditions. Attractor size

Cln3

MBF

SBF

Cln1,2

Cdh1

Swi5

Cdc20

Clb5,6

Sic1

Clb1,2

Mcm1

86.6%

0

0

0

0

9

0

0

0

9

0

0

1.87%

0

0

1

1

0

0

0

0

0

0

0

1.35%

0

0

2

2

0

0

0

0

0

0

0

1.32%

0

1

0

0

9

0

0

0

9

0

0

1.22%

0

0

0

0

8

0

0

0

9

0

0

412 additional attractors of size