A unified, mechanistic framework for developmental and evolutionary change | Enrico Borriello, Sara I. Walker and Manfred D. Laubichler

Reproduced from: https://arxiv.org/abs/1809.02331

A unified, mechanistic framework for developmental and evolutionary change

Enrico Borriello∗1,2, Sara I. Walker†1,2,5,7 and Manfred D. Laubichler‡1,3,4,6

1ASU-SFI Center for Biosocial Complex Systems, Arizona State University,Tempe, AZ, USA
2Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe AZ
3Santa Fe Institute, Santa Fe, NM, USA
4Marine Biological Laboratory, Woods Hole, MA, USA
5School of Earth and Space Exploration, Arizona State University, Tempe AZ
6School of Life Sciences, Arizona State University, Tempe AZ
7Blue Marble Space Institute of Science

September 10, 2018

Abstract

The two most fundamental processes describing change in biology – development and evolution – occur over drastically different timescales, difficult to reconcile within a unified framework. Development involves a temporal sequence of cell states controlled by a hierarchy of regulatory structures. It occurs over the lifetime of a single individual, and is associated to the gene expression level change of a given genotype. Evolution, by contrast entails genotypic change through the acquisition or loss of genes, and involves the emergence of new, environmentally selected phenotypes over the lifetimes of many individuals. Here we present a model of regulatory network evolution that accounts for both timescales. We extend the framework of boolean models of gene regulatory network (GRN) – currently only applicable to describing development – to include evolutionary processes. As opposed to one-to-one maps to specific attractors, we identify the phenotypes of the cells as the relevant macrostates of the GRN. A phenotype may now correspond to multiple attractors, and its formal definition no longer require a fixed size for the genotype. This opens the possibility for a quantitative study of the phenotypic change of a genotype, which is itself changing over evolutionary timescales. We show how the realization of specific phenotypes can be controlled by gene duplication events, and how successive events of gene duplication lead to new regulatory structures via selection. It is these structures that enable control of macroscale patterning, as in development. The proposed framework therefore provides a mechanistic explanation for the emergence of regulatory structures controlling development over evolutionary time.

Keywords: gene regulatory network, boolean network, evolution, development, control.

1 Introduction

Understanding the mechanisms underlying the emergence and persistence of new cell types is a central problem in the evolution and development of multicellular organisms. Whereas all cell types can in principle access the same genetic information, in practice, regulation of gene expression restricts this, such that only a subset of an organism’s total genomic information content is accessible to a given cell type at a given time, permitting differentiation of many phenotypes from a single genotype [1]. Regulation of gene expression therefore plays a dominant role in establishing cell types. From a formal point of view, the question of how new cell types emerge, therefore, reduces to the problem of understanding how new regulatory structures emerge to specify and control the expression of novel phenotypes.

The interplay among these regulatory genes, and their interaction with the other components of the cell governs the expression levels of both mRNA and proteins, where the set of interactions is described as a “gene regulatory network” (GRN). Understanding the topology of GRNs controlling body form is a central problem in developmental biology. Understanding their change over evolutionary time is an equally important problem in morphogenesis, as it provides a mechanistic explanation for the differentiation of body structures.

Nonetheless, any attempt at trying to predict the steady states of the regulatory process – i.e. the cell types – in terms of a dynamical model (coupled ordinary differential equations, boolean network models, stochastic gene networks, to name a few common approaches) faces the difficulty of having to reconcile the fixed number of genes in these models, whose expression level is representative of a given cell type [2–4], with the possibility for the size of the genotype to change over evolutionary timescales. Chromosome loss and gain, single gene and whole genome duplication, as well as horizontal gene transfer all alter the number of genes participating in the dynamics of a GRN. In doing so, these processes deprive the mapping of cell types to gene expression patterns of its original meaning. Therefore, even when successful in explaining developmental change, current GRN models must be redefined after each modification of the genotype for their use in evolutionary biology.

In this manuscript, we generalize the well established framework of boolean, dynamical models of GRNs as proposed by Kauffman [5], to include features of evolutionary biology. In Boolean models, the attractors of the network dynamics encode different, stable cellular phenotypes, permitting a model for how multiple cell type identities can be encoded in the same regulatory structure. The novelty of our approach consists in relaxing the restrictive one-to-one mapping between network attractors and cell phenotypes by redefining phenotypes as collections of gene expression patterns with a given subset of genes sharing the same pattern (section 2). While the traditional definition, assuming a one-to-one map between phenotype and genotype, yields increasingly fine-tuned specifications for the phenotype for progressively larger genotypes, our novel definition permits greater evolutionary plasticity by identifying the phenotypes with a macrostate, as opposed to individual (micro)states, of a dynamical system.

We will show that, under the relaxed assumption of identifying phenotypes as macrostates of the underlying Boolean GRN, a fixed genotypic size is not necessary for specifying or retaining phenotypes through evolutionary processes. We will exploit this possibility to study the emergence of new cell types, as well as the consolidation or loss of old types, as a consequence of the changing size and topology of the GRN over evolutionary timescales, and of shifting environmental conditions. As such, our model also addresses an inconsistency arising from considering concepts belonging to different levels of description of a well conceived ontology of biological objects [6] as being modeled as ’same-level’ processes: for example, gene expression levels and phenotypes, as they were interchangeable. Our approach will instead assume gene expression to be at a lower level of the ontology than phenotype, while phenotype and environment will belong to the same, higher ontological level.

In what follows, we focus on the case study of gene duplication, and use it as an example of genotype-changing evolutionary process. Gene duplication has occurred in all three domains of life [7], and is an ancient mechanism dating to before the last universal common ancestor [8]. Features of genome evolution and function, not somehow related to gene duplication events, seem to be the exception rather than the rule. Gene duplication is, by far, the dominant force in creating new genes, and at least 50% of genes in prokaryots [9, 10] and over 90% of those in eukaryotes [11] are the result of gene duplication. Nonetheless, with the exception of a few papers [12–14], it has rarely been discussed as a mechanism for evolving new regulatory patterns for cell type identity.

We propose the interplay between gene duplication and natural selection as the driving force responsible for the assemblage of genetic “modules”, or “core sets of regulatory genes” [15–21], in charge of specific cell functions. Evidence in favor of functional modularity in biology has steadily increased over the last three decades, and, as of today, modules have been both reconstituted in vitro [22], and transplanted from one cell type to another [23]. It is now well accepted that biological functions are only rarely attributed to individual molecules – the role of hemoglobin in transporting oxygen along the bloodstream being among the best examples. Far more often, a biological function results instead from the interaction among many different proteins, like in the transduction process converting pheromone detection into the act of mating in yeast [24–26]. Network controllability is usually reinterpreted in GRN dynamical models as the mathematical counterpart of extracellular signaling. Our macrostate interpretation of the phenotypes still allows us to adopt and assign biological meaning to network controllability techniques. For example, the control kernel of a GRN is defined in [27] as the minimum number of genes/nodes whose expression it is necessary to control, in order to steer the dynamics of the rest of the network toward a desired attractor (phenotype). Focusing on the developmental timescale we show it is still possible to easily rephrase, generalize, and adapt the notion of control kernel within our new framework. Therefore, our unified framework, while being suitable to mechanistic studies of GRN mutations over the evolutionary timescale, is still able to describe developmental change.

As a final application of our method, we will show how our definition gives rise to a nested hierarchy of phenotypes in a GRN (section 5), arising through evolutionary accumulation of new phenotypic states. As a consequence of this nested structure, the older a phenotypic trait is, the less likely it is to be disrupted by mutations. Our model can therefore add further theoretical justification for the preservation of early assembled modules of a GRN, like the ones underling development of phyletic body plans, suggested by Davidson and Erwin as having being preserved since the Early Cambrian. [28]

The manuscript is structured as follows: The next section contains a brief review of dynamical boolean models of GRNs sufficient to orient those not familiar with the general theory. It reviews the main features of Kauffman’s seminal theory, and of the identification between cell types and attractors. We then expose our core idea that the cell types are more properly described by collections of attractors, i.e. by macrostates of the dynamics. We then conclude the section by drawing an example network that we will use in the sections 3 and 4 as a toy model to explore the consequences of our hypothesis. Section 3 is devoted to the main consequence of our approach. We show how gene duplication and mutation events alter the nature of the cell types expressed as a consequence of selective pressure induced by a shifting environment. Section 4 shows how our generalized approach does not inhibit the possibility of adopting network controllability methods to describe epigenetically induced developmental change. We again use the same toy model to illustrate key concepts, and show extracellular control is now more easily achieved than in Kauffman’s original framework. Before concluding, we show in section 5 that our framework naturally explains the progressive acquisition of compatible phenotypic traits in terms of a nested phenotypic structure that shelters earlier traits from deleterious mutations.

2 Boolean Models

This section describes the mathematical details of the boolean, dynamical model we will assume in the rest of this manuscript. In living tissues, the intrinsic patterns of gene expression coupled with signaling input dictates cell fate. Both of these processes can readily be modelled by a boolean network. Boolean networks were originally proposed by Kauffman [5] as a viable mathematical model of GRNs. They permit exploration of the complex steady-state dynamics of GRNs, where the attractors of the dynamics can be identified with different cell types/fates, e.g. quiescence, proliferation, apoptosis, differentiation, etc. The interested reader can check [29–41] only to cite some remarkable examples of the vast literature corroborating the success of the boolean approach.

2.1 Review of Boolean Models

At a given time t, the state of the boolean model of a GRN is known when the state xi(t) of every gene i is known. The Boolean nature of the model assumes two possible states for gene i: active, which corresponds to xi(t) = 1, or inactive, with xi(t) = 0. For a network describing the interactions among n genes, the state at time t is specified by an n−dimensional boolean array x1(t), . . . , xn(t). The dynamics of the GRN is then described by n boolean functions f1, . . . , fn which provide the state of the network at time t + 1, given its state at time t (synchronous update, but asynchronous models are also possible [42]):

For n genes, 2n possible states (gene expression patterns) exist, and the dynamics of the network is represented by a trajectory (a time series) in the discrete space containing the totality of these states. For deterministic functions fi, and because of the finite size of this state space, these trajectories will eventually converge to either a fixed state or a cycle of states. These special activation patterns take the name of attractors of the boolean network, and were identified by Kauffman as corresponding to the stable phenotypes of the GRN.

Nothing has been said so far about the functional form of the boolean functions fi, and readers interested in an insightful analysis of how constrained such a specification is from experimental data are redirected to [43,44]. For reasons that will become clearer in section 3, and given the conceptual nature of this manuscript, we will restrict ourselves to the case of boolean network with thresholds:

where aji is the relative weight of the regulatory signal from gene j to gene i (activation when aji is positive, inhibition otherwise), bi is the activation threshold of gene i, and sgn(x) is a unitary step function, defined by sgn(x) = 0 if x ≤ 0 but sgn(x) = 1 if x > 0. In a more compact notation, we can write

where we have introduced the adjacency matrix A = (aij), and the columns B = (bi) and X = (xi). A useful feature of these models is that they convey the topology of the network, implicit in the definition of generic boolean functions fi, in a very explicit way, with the edges of the network representing non-null entries of the adjacency matrix.

Now that we have laid the groundwork for the numerical models we will be using in the rest of this manuscript, let us go back to the interpretation of the attractors of a boolean network as the phenotypes of a GRN. Given this identification, robustness and evolvability of GRNs can be mapped directly to the dynamics of the attractor landscape of the corresponding boolean model [13, 45, 46]. In this framework, the emergence of new phenotypes has a precise mathematical meaning as the acquisition of new attractors, which can arise due to mutations in the network structure.

Many steady-state attractors are permitted in a boolean network, making it an ideal model for describing how the genomic information contained within a single initial fertilized egg is differentially expressed in so many distinct cell types in response to different regional specification and morphogenetic histories [47].

In the absence of external regulation, the likelihood of a given cell type, or phenotype, is quantified in terms of the number of initial configurations of the network converging on the attractor state encoding that cell type. The higher the number of initial configurations leading to the same equilibrium dynamics, i.e. the production of a well defined set of proteins, the more likely the cell type that set of proteins represents will be. We will refer to these probabilities as the basins of attraction of the possible cell types encoded in the GRN.

On the other hand, the effect that external regulation, in the form of extracellular signaling during development, might have in determining the expression of a specific subset of genes, is even more dramatic, as it might force the development toward cell types that were not even initially accessible to the unperturbed GRN. The minimum number of genes whose expression needs to be controlled, for the cell fate to be determined, is the control kernel associated to that cell type [27].

2.2 GRNs as Evolvable Systems

Most of the predictions derived within the framework we have just reviewed rely on the assumption that different cell phenotypes correspond to different attractors of the GRN. While the connection between cell types and attractors is both numerically and experimentally well-motivated, their one-to-one correspondence might be an artifice due to the diminutive size of the mathematical models that were actually solvable in the not-so-distant past, when a small number of attractors were naturally identified with different phenotypes. Recent advances in computing power are finally making the study of much larger networks possible. One interesting example is the recent dynamical model developed by Fumiã and Martins [48] for the integration of the main signaling pathways involved in cancer. The signaling among almost 100 different genes is responsible for 63 different attractors, that eventually correspond to only three phenotypically distinct and incompatible cell fates: apoptotic, quiescent, and proliferative. An important observation is that these phenotypes are determined by just a small subset of values, e.g. the constant activation of the effector caspases in apoptotic cells, within a much larger gene activation pattern.

A second example of several attractors all sharing the same phenotypic identity is already present in the much smaller GRN describing cell-fate determination during Arabidopsis thaliana flower development [33]. Four among ten possible attractors all represent the same inflorescence meristematic cell type.

Therefore, while most of the literature on network robustness and evolvability focuses on individual attractors, the concept of phenotype – that we redefine here as a macro-state of the cell characterized by the activation values (fixed or cycling) of only a subset of gene nodes – seems to be better suited to the idealization of biological systems. While an attractor can represent a phenotype by itself, a phenotype can be compatible with multiple attractors.

Our definition is exemplified in the following diagram, showing the case of a phenotype Φ defined by the expression of the two genes represented by dark gray boxes, and the inaction of the two genes shown in light gray tone:

The expression of any other gene (boxes with a question mark) can take any value, as long as the four genes entering the definition of Φ exhibit the right expression pattern.

A critical motivation for this redefinition of the mathematical identity of a phenotype is a conceptual difficulty which naturally arises in Kauffman’s original framework [5]. Kauffman’s definition loses its strict meaning in light of evolutionary biology, for the identification of the attractors of a boolean network with the phenotypes of a GRN would require – after the regulatory network has lost or acquired genes – the comparison between genotypes of different lengths.

An immediate, advantageous consequence of our definition is that it keeps its meaning even after changes occur in the genotype. As an example, the following diagram, shows the case of a single gene duplication event enlarging the genome expressing Φ:

As this mutation is not affecting any of the defining genes, this change in the size of the genome does not alter our definition of Φ. Therefore, it makes sense to study whether Φ is still expressed by the new GRN, and whether the gene replica is favoring or disfavoring Φ’s basin. As we will show in section 3, these questions can be addressed in quantitative terms. The inclusion of different gene replicas will induces different changes in the relative size of the basin of attraction of Φ, and a good dynamical model of the GRN will be enough to determine the duplication events that increase the basin of Φ. When Φ is selected, these are likely to be the duplication events that are fixated more often.

A second, less relevant difference between our approach and those outlined in previous literature on GRNs, is that we will not require the basin of attraction of a specific cell type to be exactly 100% for the network to be representative of that specific type. Instead, we will only require one basin to be much larger than the remaining others. In the following example, aimed at making these ideas more concrete, we will assume 80% as the threshold a basin needs to exceed for the remaining attractors to be treated as negligible.

2.3 A Sample Network

In the next two sections, we will explore the consequences of our approach in studying evolutionary and developmental processes withing the boolean model framework. To anchor our discussion to a concrete example, we will introduce a small, tractable network that we will adopt as a toy model, and modify as needed.

Figure 1: Boolean threshold network for the GRN of cell type 1.

The example is shown in figure 1. It represents the GRN expressing a phenotype that we will call cell type 1. The subset of genes defining the phenotype are represented by nodes 1, 2, and 3. Inheriting the terminology adopted in [49], we will refer to them as the effector genes. The characteristic pattern of cell type 1 is assumed to have effector genes 1 and 3 expressed, and inactive gene 2 (blue = active, white = inactive in figure 1). We will refer to this by saying that cell type 1 has expression pattern 101. In determining the basin of attraction of cell type 1, we will sum the basins of every attractor of the GRN which is compatible with the expression pattern 101, i.e. every pattern of the form 1, 0, 1, x4, . . . , x8, where x4x8 (gray nodes) can take any possible (binary) value representing the expression/suppression of the remaining genes labeled from 4 to 8. We will refer to these other genes as terminal selectors (transcription factors that control cell type-specific gene expression in differentiated cells [50, 51]).

The GRN of figure 1 actually encodes two possible cell types, one being type 1, as we said, the other having the activation pattern 110. We will refer to this second phenotype as cell type 2. Despite encoding both phenotypes, the basin of attraction of cell type 1 includes more than 80% of the possible initial activation patterns, all leading to the equilibrium dynamics characterized by the expression of effector genes 1 and 3, and the suppression of gene 2. This is why we assume this network to be a viable description of cell type 1.

For the reader interested in reproducing our example: The GRN dynamics is modeled using a boolean threshold network (as described in section 2.1), with equal weight edges aij = ±1 for activating/inhibiting links (continuous/dashed line in figure 1), non-zero thresholds b1 = −1 and b5,6,8 = 1. But it is important to remark that none of the conclusions we will draw in the next sections depend on the mathematical details of the example we are using.

3 Evolution

We have seen how our re-definition of the phenotypes of a GRN in the framework of boolean, dynamical models is not affected by mutations changing the size of the genome. In this section, we seek to show an explicit example of how this allows the prediction of the genes whose duplication will reinforce a phenotype favored by natural selection and the consequent appearance of regulatory modules over evolutionary time. The underlying idea is exemplified in the following diagram (same example of phenotype Φ adopted in subsection 2.2), with arrows showing the regulation induced by an ideally connected terminal selector which activates exactly the effectors that are supposedly expressed, and suppresses just the genes that are inactive in our former definition of Φ:

It is easy to foresee that the duplication of this ideal terminal selector would positively impact the change in the basin of attraction of Φ. In a more realistic scenario, duplication would be followed by divergence, represented in the following diagram by a difference in one of the effector genes regulated by the replica:

In this section we will consider duplication events like the one just described. For each terminal selector in the GRN, we will consider the entire spectrum of possible events of duplication + divergence, and show ideal pathways leading to cell type changes in a shifting environment. With reference to figure 2, we will describe a simple evolutionary model for the differentiation of progenitor cells of type 1, as encoded in our network from subsection 2.3, into cells of type 2 (already encoded, but unlikely), and the newly born cell type 3, a novel phenotype induced by the modifications the network is going through.

Figure 2: Different environmental pressure in both space and time selects cells undergoing mutation events that favor the emergence of newly preferred cell types.

We are assuming here that the knowledge of the environment is equivalent to the knowledge of the selected cell type: they are on the same ontological level, and share the same mathematical definition in our theoretical model. The environmental pressure changes in both space and time, and selects cells that undergo mutation events that favor the emergence of the newly preferred cell type. In this example, sister cells [52] of type 1 start being selected for cell type 2, and undergo mutation events \textcircled{A} and \textcircled{B} until cell type 2 represents the dominant phenotype. In our model, the mutation events are represented by gene duplication and divergence of one of the terminal selectors (nodes 4–8 in fig. 1), during which the network acquires a non-mutated replica of a preexistent terminal selector, and then a mutation in the way the replica regulates the remaining genes and itself.

Let us first provide a mathematical description of what we mean by a perfect replica of a preexistent gene. We will then introduce divergence in the form of a mutation affecting the replica’s downstream signaling.

Non-mutated gene replica: Given a boolean network N with n nodes, we want to study the dynamics of the n + 1-node network N’ obtained by the inclusion of the perfect replica of a node already present in N.

If the dynamics of N is governed by the set of boolean equations

then the updating rules of N’ will have the form

In the previous equations, xi is the value (0 or 1) of node i at time t, while x’i is the simplified notation for the value of the same node at time t+1. The φi functions are new functions that we want to determine, given our knowledge of the functions fi. Without loss of generality, we can assume node n + 1 to be a perfect replica of node 1. Let us then consider the requirements we impose on φi(x1, . . . , xn, xn+1), after we include node n + 1:

  1. The assertion that node n + 1 is a perfect replica of node 1 translates into the assumption thatHere we are just stating that node replicas obey the same updating rules as the original nodes.
  2. We also want the new node not to affect the system, when not expressed. Therefore,
  3. Lastly, we want nodes 2, . . . , n to see node 1 and n + 1 as indistinguishable:

The previous conditions are enough to determine the values taken by the functions φi for any dynamical state of N’ with the only exception being those cases where x1 and xn+1 are both 1. This is the only genuinely new scenario whose output cannot be predicted in terms of an equivalent configuration of N. The problem is easily solved in the case of boolean networks with thresholds (subsection 2.1), as we know the prescription that gives the updating rule of a node in terms of the state of the network, eq. (2). Building the functions φi in a similar fashion, we deduce that it is both natural, and enough, to impose aj+n,i = aji and bi+n = bi for the previous three conditions to be satisfied.

Mutation: The signaling of gene replica i is represented by the string of numbers aij = 0, ±1, with j = 1, . . . , n (n = current number of genes in the network). A mutation is introduced by changing one of these numbers to another possible value. For example, changing aij from 0 to ±1 corresponds to the creation of a new edge. Changing it from ±1 to 0 means deleting an edge which is instead departing from the original node. Changing it from ±1 to ∓1 changes an activating link into an inhibiting one or vice-versa. Different models of mutation could be assumed, which in turn determine the minimum numbers of replicas the cell needs to acquire in order to change from one type to another. For simplicity, our examples show only the sequence of mutations that minimizes the number of steps needed to complete the phenotypic shift. Therefore, we consider all possible mutations of the kind previously described, and then select the one that maximizes the basin of attraction of the preferred cell type.

We can now return to the example shown in fig. 2 and deduce that at least two mutation events are needed to convert sister cells of type 1 to type 2 (activation pattern: 110). By acquiring a mutated replica of terminal selector 7, event \textcircled{A}, that inactivates effector gene 3 instead of activating it, the dynamics of the GRN is equally likely to converge toward cell type 1 or 2 (figure 3). The subsequent acquisition of a mutated replica of gene 6, event \textcircled{B}, that, differently from the original one, does not activate gene 7, determines the complete shift toward cell type 2 (88% of the configuration space). The highly connected subset of nodes that appear in the B-panel of figure 3 is responsible for the convergence of the dynamics toward the characteristic activation pattern which defines cell type 2. Whatever the function associated to this cell type is, the newly-created module is in charge of it.

Figure 3: Duplication and mutation events leading from cell type 1 to 2, and 3, and from cell type 2 back to 1, as described in section 3.

The assumption that sister cells of cell type 2 start being selected for cell type 3 (activation pattern: 100), determines the selection of mutation events like \textcircled{C}, where the GRN acquires a mutated replica of terminal selector 8 that suppresses gene 3. One mutation event is now enough to determine the complete shift toward cell type 3.

The last example we will consider here is the convergence of cell type 2 back to cell type 1. After the acquisition of mutations \textcircled{A} and \textcircled{B}, the preferred cell type is again type 1. This can be achieved (complete shift from less than 12% to more than 96%) with a single mutation event, marked \textcircled{D} in figure 3, consisting of a mutated replica of terminal selector 8 that activates effector gene 3. This last evolution of the GRN expresses the same phenotype as the initial network in figure 1, but carries memory of the evolutionary path that led it through its type 2 period in the form of the module visible in the D-panel of figure 3.

We hope this example is enough to convince the reader that, as long as a viable mathematical description of the GRN is available, as well as the knowledge of the preferred phenotypes for a shifting environment, our approach provides the necessary selected gene duplication and mutation event(s) with mechanistic meaning. The possibility to retain cell type mathematical identities for evolving GRNs is the key features enabling it.

We have also shown that our method allows the identification of GRNs that differ for both genotype and network topology (like the GRN in figure 1 and the evolved network in the D-panel of figure 3) as encoding the same cell type (type 1 in our example). Therefore, apparently redundant topological features in the structure of a GRN might carry the imprint of their evolutionary history, and of the environment that induced them.

4 Development

Next we want to show how external control of the expression of accurately chosen terminal selectors – e.g. as under the effect of drug therapies, or because of extracellular signalling during development – can determine cell differentiation, and the expression of not only cell type 2, but also novel cell types that were not initially encoded in our sample network.

For an explanation of cell differentiation not invoking extracellular signaling, and relying instead on cellular noise, the reader is redirected to [53]. We will focus here on the possibility of epigenetic signaling to induce the appearance of new cell types in a Lamarckian fashion [54]. This approach has been extensively investigated as an alternative explanation [55,56] to the current paradigm of treatment-selected, drug-resistant clones in tumor progression [59, 60]. We generalize it here to the broader problem of cell differentiation over a developmental timescale. We also show the role played by our generalized definition of phenotype in reducing the size of the control kernel, as defined in [27].

The original approach developed in [27], consists of taking every possible combination of any subset of genes and their possible expression patterns. For each combination, and each initial state, time series of the transient expression pattern of the remaining genes are generated, while the selected genes are pinned to the specific selected values. When by pinning the smallest number of nodes this operation moves all possible initial conditions toward the basin of attraction of a specific attractor, Kim et al. define that smallest subset of nodes as the control kernel of the attractor state the system is converging to.

For consistency with our less constrained definition of what a phenotype is (subsection 2.2), we need to re-define the notion of control kernel accordingly. The generalized definition of control kernel we adopt here does not assume the forced terminal state to be a specific attractor, just a phenotype (i.e. any possible attractor compatible with the definition of that phenotype). As a result, our control kernels are significantly smaller than those found in [27], even when restricting ourselves to controlling only terminal selectors. To show this explicitly, we have considered again our sample network from subsection 2.3, and the effect of pinning just one terminal selector at a time.

Table 1: Phenotypic change induced by the controlled expression of the terminal selector. This table shows the result of pinning just one terminal selector at a time (TS ON/OFF). Four new phenotypes, cell types 3 ÷ 6, not initially encoded in the GRN of figure 1, are now possible equilibrium state of the gene expression dynamics, with basins of attraction sizes (likelihoods) listed as percentages.

The result is shown in table 1. Forcing the expression of terminal selector 7 (referred to as “TS7 is ON” in the table) is enough to enlarge the basin of attraction of cell type 2 from less than 20% to exactly 100%. Likewise, forcing the expression of TS8 makes the network shift toward a phenotype characterized by the expression pattern 111, that was not even encoded in the network. Eventually, we might assume 80% (or some other high percentage value) to be enough for the network to be considered locked in a certain cell phenotype. In this case TS4 would be an alternative control kernel for the cell type defined by the 111 pattern. This new definition of control kernel, focused on forcing the GRN into a less restrictive activation pattern than a single, specific attractor, seems to be much easier to achieve experimentally, better suited for large/realistic networks, and therefore relevant to drug treatments.

Cell type differentiation in response to extracellular signaling, is still a key aspect of our generalized approach to dynamical GRN models. Even if mainly aimed at incorporating evolutionary features (section 3), our framework entirely preserves (and sometimes facilitates) the applicability of network control theory to GRN models in explaining both developmental or drug-induced change.

5 Phenotype Preservation

In their seminal paper on the GRN structures devoted to the development of animal body plans [28], Davidson and Erwin postulated the existence of a modular structure for these networks: more ancient gene modules, assemble early, and are strongly selected for, because of the vital nature of the function they perform. A similar, modular approach can be recognized in [18], and our redefinitions of the phenotypes (section 2) naturally reflects Hartwell et al.’s point of view: “If the function of a protein were to directly affect all properties of the cell, it would be hard to change that protein, because an improvement in one function would probably be offset by impairments in others.”

We have already shown that our many-to-one activation patterns to phenotype map favors modularity. Our aim in this section is to show that it also gives rise to a nested phenotypic structure which makes functions less likely to be disrupted the earlier it gets selected.

We have seen how a phenotype is defined in terms of the activation values of just a subset of nodes in the gene expression pattern. If, for example, Φ1,1 is defined by the activation of just n1,1 out of n nodes in our boolean model, then there will be 2n−n1,1 possible microstates still compatible with it, for expressing Φ1,1 is not constraining the entire activation pattern. This also means that, after the selection of Φ1,1, there will be still room for the selection of additional functions. Let Φ2,1 and Φ2,2 be two of them (see figure 4), characterized by the expression pattern of n2,1 and n2,2 additional effector genes, respectively, and let us assume they both get selected. Let us refer to these two novel functions as generation 2 phenotypes, while Φ1,1 will belong to first generation. We can expect this process to continue until the present time, with more recent phenotypes defined by more and more detailed activation patterns of an increasing number of effector genes, as required by the fulfillment of the definitions of all the phenotypes selected over the previous generations. It is important to keep in mind that such extrapolation is possible because we can retain the same definitions of the phenotypes Φi,j over successive generations, even if we know the genotype has very likely changed in the meantime.

Figure 4: Nested phenotypic structure emerging from the selection of secondary, more recent traits while preserving older ones. Each phenotype also shares, on top of its defining, and more recent trait, all the traits that were selected earlier.

With reference to figure 4, while phenotype Φ1,1 is present as long as n1,1 effector genes exhibit the right expression pattern, phenotype Φ5,4 will rely on the right activation of

effector genes.

This also means that more recent traits are a more likely target of mutations and, therefore, more easily disrupted, while older ones, on top of being favored by natural selection, are naturally sheltered from deleterious mutations.

6 Conclusions

This manuscript represents a synthesis of mathematical models of gene regulatory networks within theoretical evolutionary biology, which also accounts for development.

Despite of their predictive power, and their possibility to mathematically reinterpret cell fates as the steady states of the abstract, non-linear dynamics of the regulatory processes they describe, dynamical models (boolean models in this manuscript) of GRNs have been limited by this identification between mathematical attractors and biological phenotypes. Evolutionary processes, like single gene and whole genome duplication, both chromosome gain and loss, horizontal gene transfer, etc., alter the size of the genome, and deprive the exact attractor-phenotype identification as a one-to-one map of its original meaning. The leading role played in evolution and speciation by the interplay between these processes and natural selection, makes current dynamical models of GRNs unfit to describe genetic network evolution.

Motivated by this limitation, we have explored the consequences of relaxing this one-to-one identification between cell types and attractors. We have shown with a simple numerical model that, redefining a cell type as a collection of attractors of a GRN, all sharing a common, characterizing property, enables retaining the main features of Kauffman’s original theory, and permitting easy generalization using network control theory. It is now no longer needed for the genome to retain a fixed size for the cell type type definition to preserve its meaning.

This opens the possibility to quantitative studies of the effect that gene duplication, mutation, and natural selection have on cell types, by virtue of allowing the study of evolutionary processes within the well-established framework of dynamical systems theory.

More specifically, we have shown the emergence of elementary GRN modules responsible for cell fate determination as an immediate consequence of the selection processes of optimal evolutionary mutations and gene duplication events. We have incorporated the formalism of network controllability, preserving its biological interpretation as a viable mathematical description of extracellular/epigenetic control of gene expression in an evolutionary model. Finally, we have shown that the remaining degrees of freedom arising from the identification of the phenotypes of the GRN with the macrostates, as opposed to the states, of the dynamical network gives rise to a nested phenotypic structure, and that this structure naturally makes more ancient phenotypes less likely to be disrupted by mutations.

References

[1] J.M.W. Slack Essential Developmental Biology, Wiley-Blackwell, Oxford 2013
[2] O. Stegle, S.A. Teichmann, J.C. Marioni, Nat. Rev. Genet. 16 (2015) 133.
[3] E. Shapiro, T. Biezuner, S. Linnarsson, Nat. Rev. Genet. 14 (2013) 618.
[4] O. Schwartzman, A. Tanay, Nat. Rev. Genet. 16 (2015) 716.
[5] S.A. Kauffman, J. Theoret. Biol. 22 (1969) 437.
[6] M.D. Laubichler, S.J. Prohaska, and P.F. Stadler J. Exp. Zool. (Mol Dev Evol). 330 (2018) 5.
[7] F. A. Kondrashov, I. B. Rogozin, Y. I. Wolf, E. V. Koonin, Genome Biol. 3 (2002) research0008
[8] G. C. Conant & K. H. Wolfe, Nature Rev. Genet. 9 (2008) 938.
[9] S.E. Brenner, T. Hubbard, A. Murzin & C. Chothia, Nature (1995) 378 140.
[10] S.A. Teichmann, J. Park & C. Chothia, Proc. Natl. Acad. Sci. USA 95 (1998) 14658.
[11] J. Gough, K. Karplus, R. Hughey & C. Chothia J. Mol. Biol. 313 (2001) 903.
[12] A. Wagner, Proc. Natl. Acad. Sci. 91 (1994) 4387.
[13] M. Aldana, E. Balleza, S. Kauffman, O. Resendiz, J. Theor. Biol. 245 (2007) 433.
[14] A. Crombach, P. Hogeweg, PLoS Comp Biol 4 (2008) 7:e1000112.
[15] J.T Bonner, The evolution of complexity. Princeton University Press, Princeton, New Jersey 1988.
[16] G.P Wagner, Amer. Zool., 36 (1996) 36.
[17] B. Alberts, Cell 92 (1998) 291.
[18] L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, Nature 402 (1999) Suppl C47.
[19] J.B. Pereira-Leal, E.D. Levy, S.A. Teichmann, Phil. Trans. R. Soc. B 361 (2006) 507.
[20] J.B. Pereira-Leal, E.D. Levy, C. Kamp, S.A. Teichmann, Genome Biol. 8 (2007) R51.
[21] K. Achim, D. Arendt, Curr. Opin. Genet. Dev. 27 (2014) 102.
[22] A.S. Khalil, J.J. Collins, Nature Reviews Genetics 11 (2010) 367.
[23] H. Hsu, et al., Biophys. J. 65 (1993) 1196.
[24] J.B. Stock, M.G. Surette, in Escherichia coli and Salmonella: Cellular and Molecular Biology (1996) 1103.
[25] I. Herskowitz, Cell 80 (1995) 187.
[26] F. Posas, M. Takekawa, H. Saito, Curr. Opin. Microbiol. 1 (1998) 175.
[27] J. Kim, S.-M. Park, K.-H. Cho, Scientific Reports 3 (2013) 2223.
[28] E.H. Davidson and D.H. Erwin, Science 311 (2006) 796.
[29] S. Huang, D.E. Ingber, Exp. Cell Res. 261 (2000) 91.
[30] L. Mendoza, E.R. Alvarez-Buylla, J. Theor. Biol. 204 (2000) 311.
[31] R. Albert, H.G. Othmer, J. Theor. Biol. 223 (2003) 1.
[32] T.S. Gardner, D. Di Bernardo, D. Lorenz, J.J. Collins, Science 301 (2003) 102.
[33] C. Espinosa-Soto, P. Padilla-Longoria, E.R. Alvarez-Buylla, Plant Cell 16 (2004) 2923.
[34] F. Li, T. Long, Y. Lu, Q. Ouyang, C. Tang, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 4781.
[35] S. Huang, G. Eichler, Y. Bar-Yam, D.E. Ingber, Phys. Rev. Lett. 94 (2005) 128701.
[36] K. Klemm, S. Bornholdt, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 18414.
[37] K.Y. Lau, S. Ganguli, C. Tang, Phys. Rev. E 75 (2007) 051907.
[38] E. Balleza, E.R. Alvarez-Buylla, A. Chaos, A. Kauffman, I. Shmulevich, M. Aldana, PLoS One 3 (2008) e2456.
[39] A. Samal, S. Jain, BMCSyst. Biol. 2 (2008) 21.
[40] C. Espinosa-Soto, O.C. Martin, A. Wagner, BMC Evol. Biol. 11 (2011) 5.
[41] C. Torres-Sosa, S. Huang, M. Aldana, PLoS Comput. Biol. 8 (2012) e1002669.
[42] A. Garg, A. Di Cara, I. Xenarios, et al., Bioinformatics 24 (2008) 1917.
[43] A. Henry, F. Mone´ger, A. Samal, O.C. Martin, Mol. Biosyst., 9 (2013) 1726.
[44] J.X. Zhou, A. Samal, A.F d’H´erou el, N. D. Price, S. Huang, Biosystems 142 (2016) 15.
[45] S. Ciliberti , O.C. Martin , A. Wagner, PLoS Comp Biol 3 (2007) 2:e15.
[46] S. Ciliberti , O.C. Martin , A. Wagner, PNAS 104 (2007) 13591.
[47] T. Boveri, Die Organismen als Historische Wesen. Kgl. Universit¨atsdruckerei von H. St¨urtz, 1906.
[48] H.F. Fumi˜a, M.L. Matins, PLoS One 8 (2013) e69008.
[49] Arendt et al., Nature Reviews Genetics 17 (2016) 744.
[50] T. Graf, and T. Enver, Nature 462 (2009) 587.
[51] A.C. Mullen, et al., Cell (2011) 147 565.
[52] D. Arendt, Nat. Rev. Genet. 9 (2008) 868.
[53] M. Villani, A. Barbieri, R. Serra, PLoS One 6 (2011) e17703.
[54] E. Mayr, J. Hist Biol. 5 (1972) 55.
[55] S. Huang, Progress in Biophysics and Molecular Biology 110 (2012) 69.
[56] A.O. Pisco, A. Brock, J. Zhou, A. Moor, M. Mojtahedi, D. Jackson, and S. Huang, Nat Commun 4 (2013) 2467.
[57] H. de Vries, Species and Varieties, Their origin by Mutation, Chicago, Open Court Publishing Company, 1905.
[58] A. Wagner, Arrival of the Fittest, New York, Penguin, 2014.
[59] C.S. Attolini, F. Michor, Ann. N. Y. Acad. Sci. 1168 (2009) 23.
[60] L.A. Diaz et al., Nature 486 (2012) 537.