Lynx Roundup, June 7th 2018

Daily roundup of Data Science news around the industry, 6/7/2018.

Matthew Alhonte
The Keras 4 Step Workflow - KDnuggets
In his book “Deep Learning with Python,” Francois Chollet outlines a process for developing neural networks with Keras in 4 steps. Let’s take a look at this process with a simple example.
Monetizing computing resources on the blockchain
This weed-killing AI robot uses 20 times less herbicide and may disrupt a $26 billion market
Smart weed-killing robots are here and could soon reduce the need for herbicides and genetically modified crops. Swiss company EcoRobotix has a solar-powered robot that can work for up to 12 hours detecting and destroying weeds. EcoRobotix says the robot uses 20 times less herbicide than traditional…
What are some good statistics papers for undergrad students? from r/statistics

Last week, Jacob Scott was at a meeting to celebrate the establishment of the Center for Evolutionary Therapy at Moffitt, and he presented our work on measuring the effective games that non-small cell lung cancer plays (see this preprint for the latest draft). From the audience, David Basanta summarized it in a tweet as “trying to make our game theory models less abstract”. But I actually saw our work as doing the opposite (and so quickly disagreed).

However, I could understand the way David was using ‘abstract’. I think I’ve often used it in this colloquial sense as well. And in that sense it is often the opposite of empirical, which is seen as colloquially ‘concrete’. Given my arrogance, I — of course — assume that my current conception of ‘abstract’ is the correct one, and the colloquial sense is wrong. To test myself: in this post, I will attempt to define both what ‘abstract’ means and how it is used colloquially. As a case study, I will use the game assay that David and I disagreed about.

This is a particularly useful exercise for me because it lets me make better sense of how two very different-seeming aspects of my work — the theoretical versus the empirical — are both abstractions. It also lets me think about when simple models are abstract and when they’re ‘just’ toys.

Given that I am a computer scientist, let me start in the nuts-and-bolts of computer science and then come back to biology. For a software engineer, an abstraction is a way to hide the complexity of computer systems. It is a way to make programs that can be used and re-used without having to re-write all the code for each new application on every different computer. It is in this sense that an algorithm is an abstraction of the actual sequence of bit flips (or whatever other implementation you might use: legos, anyone?) that carry out the physical processes that is computation. To turn it around: the physical process carried out by your computer is then an implementation of some abstract algorithm. Abstraction and implementation are in some sense dual to each other.

An important feature of this for programmers is that a given abstraction can have many implementations. An abstract object is multiply-realizable by a number of concrete objects. The concrete objects might differ from each other in various ways — sometimes very drastically: compare your laptop to a bunch of legos — but if the implementations are ‘correct’ then the ways in which they differ are irrelevant to the abstraction. The abstraction is less detailed than the implementation.
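The abstraction/implementation duality above is easy to make concrete in code. A minimal Python sketch (class names are my own illustration, not from the source): one abstract specification, two drastically different concrete realizations, and a client that cannot tell them apart.

```python
from abc import ABC, abstractmethod

class Adder(ABC):
    """Abstract specification: add two non-negative integers."""
    @abstractmethod
    def add(self, x: int, y: int) -> int: ...

class ArithmeticAdder(Adder):
    """Implementation 1: the hardware's native addition."""
    def add(self, x, y):
        return x + y

class PeanoAdder(Adder):
    """Implementation 2: repeated increment -- like stacking legos."""
    def add(self, x, y):
        for _ in range(y):
            x += 1
        return x

# Both concrete objects realize the same abstraction: any client written
# against Adder is indifferent to which implementation it receives.
for impl in (ArithmeticAdder(), PeanoAdder()):
    assert impl.add(3, 4) == 7
```

The ways the two implementations differ (native arithmetic versus counting) are exactly the details the abstraction declares irrelevant.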

Multiple realizability means that more abstract models specify fewer details than their less abstract implementations. This is often an easy feature to spot, and I think this has led to the colloquial use of abstract models as “less specified”. In the context of modelling, less specified usually means less particular details about the experimental system being modelled. From this perspective, “linking to data” seems like making a model less abstract. In practice, it often means adding a lot of complicated details to a model. The classic kitchen-sink problem.

I think that the above is the sense of “abstract” that David had in mind for his tweet. And under that colloquial sense, his tweet was correct. Although in our case, it was cutting away instead of adding kitchen sinks that got us closer to the empirical, more specified model. But I think this colloquial view of abstraction can lead to some confusion.

Before I dive into the potential confusion, let me start with what I think is an uncontroversial case of abstraction in evolutionary biology. I will use my own work because I know it best (and because my hidden reason for writing this post is to understand better how my various projects interconnect).

In my work on the complexity of evolutionary equilibria (see latest preprint), I argue that there exist hard fitness landscapes where local fitness optima may not be reachable in a reasonable amount of time even when allowing progressively more general and abstract evolutionary dynamics. For this generality, I pay with increasing complication in the corresponding fitness landscapes. In particular, in what kind of epistasis they contain. I go through three levels of abstraction:

  1. If we restrict our evolutionary dynamics to random fitter mutant or fittest mutant strong-selection weak mutation (SSWM), then sign epistasis alone is sufficient to ensure the existence of hard landscapes. I call landscapes that have at most sign epistasis semi-smooth landscapes.
  2. If we allow any adaptive evolutionary dynamics, then reciprocal sign epistasis (rugged landscapes) in the NK model with K \geq 2 is sufficient for hard landscapes.
  3. If we want to show that arbitrary evolutionary dynamics — even non-adaptive ones — cannot find local fitness optima then we need K \geq 2 and the standard conjecture from computational complexity that \mathrm{FP} \neq \mathrm{PLS}.
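To make the first level concrete, here is a toy sketch (my own illustration, not code from the preprint) of fittest-mutant SSWM dynamics: an adaptive walk over bit-string genotypes that always moves to the fittest one-bit neighbour until it reaches a local fitness optimum. The "house of cards" landscape used here, with an independent random fitness per genotype, is an assumed stand-in for the structured landscapes the results are actually about.

```python
import random

def fittest_mutant_sswm(fitness, genotype):
    """Fittest-mutant SSWM: repeatedly move to the fittest one-bit
    neighbour until no neighbour is fitter (a local fitness optimum)."""
    steps = 0
    while True:
        neighbours = [genotype[:i] + (1 - genotype[i],) + genotype[i+1:]
                      for i in range(len(genotype))]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(genotype):
            return genotype, steps  # local optimum reached
        genotype, steps = best, steps + 1

random.seed(42)
n = 10
table = {}  # "house of cards" landscape: independent random fitness per genotype
def fitness(g):
    if g not in table:
        table[g] = random.random()
    return table[g]

peak, steps = fittest_mutant_sswm(fitness, tuple([0] * n))
# Every one-bit neighbour of the endpoint is no fitter, so it is a local optimum.
assert all(fitness(peak) >= fitness(peak[:i] + (1 - peak[i],) + peak[i+1:])
           for i in range(n))
```

The hardness results concern how the number of steps can scale with genotype length on adversarial landscapes; this sketch only shows what a single SSWM walk is.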

The details of these results don’t matter, but hopefully it is clear how they are progressively more abstract. At the first level, random fitter mutant and fittest mutant strong-selection weak mutation dynamics are two very specific evolutionary update rules. There might be some context in which we could see them as abstractions — or more correctly: limits — of some other particular implementations of populations, but for these purposes each is just a very particular rule. At the second level, I am looking at any adaptive dynamics, and every SSWM dynamic is an adaptive dynamic. In particular, random fitter mutant SSWM and fittest mutant SSWM are two particular — and distinct — implementations of an adaptive dynamic. So the second-level results apply to the first level, but the first-level results don’t (necessarily) apply to the second level. Finally, at the third level, I am looking at any evolutionary dynamics — even ones that don’t follow up-hill walks, i.e. even ones that aren’t adaptive. Of course, adaptive (evolutionary) dynamics are a particular implementation of an evolutionary dynamic. So the results at the third level apply to the second level and to the first. In other words, if we take a family of fitness landscapes that are hard at the third level then they will be hard at the second level and the first level. However, if we take a family of fitness landscapes that are hard at the first level then they might not be hard at the second or third level.

In general, if you establish an abstract result then you don’t need to re-establish it in more concrete cases. All you need to do is to show that the concrete case is an implementation of the abstract specification and your previous work will carry over. This modularity is what makes large software systems possible and what makes pure math possible and powerful. It is something that I often see lacking in biology, and it is one of the reasons why I am working on the above project.

As we went up the levels of abstraction in that example, there were fewer details about the evolutionary dynamics. But this isn’t what made them abstractions; it was just a consequence of them being more abstract. What made them abstract is that they captured the effects of many possible ways of filling in the details, and their results generalized to all possible implementations of those details.

But just having fewer details does not mean that the model is robust to adding details. For example, I think that David would consider (inviscid) replicator dynamics of the Prisoner’s dilemma (in its cost-benefit form) as less detailed (and thus more colloquially abstract) than a very specific spatial model where each agent is represented explicitly as occupying some point in a grid. Maybe it isn’t even a grid, but the tissue structure lifted from a real histology slide. However, in the first model, we could guarantee that defection would always dominate, but in the second (and third) model, the details of the space (and the specific choices of cost and benefit) might have cooperators co-exist or dominate in some cases but not others. As such, the simpler and less-detailed model does not tell us much about the more complicated and more-detailed model. The inviscid model is not an abstraction, and the spatial model is not an implementation of that abstraction.
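The guarantee for the inviscid case is easy to verify numerically. A sketch of replicator dynamics for the cost-benefit Prisoner's dilemma (the parameter values and Euler integration are my own choices for illustration): the fitness difference between cooperators and defectors is always exactly -c, so cooperation is driven out from any interior starting frequency.

```python
# Replicator dynamics for the cost-benefit Prisoner's dilemma.
# Payoffs: C vs C: b - c, C vs D: -c, D vs C: b, D vs D: 0.
b, c = 3.0, 1.0     # benefit and cost, with b > c > 0
x, dt = 0.99, 0.01  # initial fraction of cooperators; Euler time step

for _ in range(10_000):
    f_C = x * (b - c) + (1 - x) * (-c)   # cooperator fitness: x*b - c
    f_D = x * b                          # defector fitness
    x += dt * x * (1 - x) * (f_C - f_D)  # f_C - f_D = -c < 0, always

assert x < 1e-3  # defection dominates in the inviscid population
```

No such one-line guarantee survives once explicit spatial structure is added, which is exactly the point above: the inviscid model's conclusion need not carry over to its more detailed cousins.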

At second glance, this shouldn’t be that surprising. In this particular case, inviscid was just a particular choice of spatial structure that just seems intuitively ‘simpler’ or ‘less of a commitment’ to us than the choice of a grid or a network lifted from histology. This is a typical feature of heuristic models. And it doesn’t mean that heuristic models aren’t essential to science. It just means that they aren’t abstractions. They do, however, have an important relationship to abstraction, which I will explore in a later post.

For now, I want to bring us back to the game assay that opened this post. Why do I think that it is an abstraction? More specifically, I think that the idea of effective games (see this preprint for more) is an abstraction that I made with the purpose of being operationalizable. This was motivated by my own limitations. I could not come up with a good way to establish or measure features of populations like their reproductive strategies, interaction length-scales, or spatial structure. Hence, I needed an object that abstracted over those features, so that it wouldn’t matter how those details were implemented. This bug also transforms into a feature: if certain details are abstracted over, then one can compare measurements from systems where those details differ.

In the case of effective games, this abstraction was done by focusing on the frequency and growth rate of types as opposed to the more standard (among modelers) view of fitness of tokens (specific individuals). The focus on types lets me absorb all the details of spatial structure, interaction length-scales, reproductive strategies, etc into the measurement of the type fitness. It is nature that figures out the particular computation that transforms token fitness into type fitness and I don’t need to know it once I am working at the level of abstract effective games. I don’t need an explicit model of space because this model is implicit in the details that the model abstracts over.
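The operational content of focusing on types can be sketched in code. In an idealized caricature of a game assay (my own illustration, not the actual experimental protocol from the preprint), each type's growth rate is linear in the initial frequency p of the first type, so regressing measured growth rates against seeding frequencies recovers the rows of the effective payoff matrix, whatever microdynamics generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.6, 3.5],
              [3.1, 3.0]])  # "true" effective game (the Leader matrix from the post)

# Simulated assay: seed replicates at different initial frequencies p of
# type 1, measure each type's growth rate = frequency-dependent fitness + noise.
p = rng.uniform(0.05, 0.95, size=200)
w1 = A[0, 0] * p + A[0, 1] * (1 - p) + rng.normal(0, 0.05, size=p.size)
w2 = A[1, 0] * p + A[1, 1] * (1 - p) + rng.normal(0, 0.05, size=p.size)

# w_i(p) = A[i,0]*p + A[i,1]*(1-p) is a line: intercept A[i,1] at p = 0,
# value A[i,0] at p = 1. A linear fit per type recovers a row of A.
est = np.array([np.polyfit(p, w, 1) for w in (w1, w2)])      # rows: [slope, intercept]
A_hat = np.column_stack([est[:, 0] + est[:, 1], est[:, 1]])  # [slope+intercept, intercept]

assert np.allclose(A_hat, A, atol=0.15)
```

The measured matrix says nothing about how space, interaction ranges, or reproductive strategies produced those type fitnesses; those details are exactly what the effective game abstracts over.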

Of course, because I abstracted over the many microdynamical physical processes that generate the type fitnesses of the population, I can’t actually describe the specific way nature does this in our experiment. If I could, then I wouldn’t need the abstraction. But I can still convince you that effective games are multiply realizable by showing you two different mathematical implementations of the same effective game. For example, suppose that we measured the abstract effective game given by the following payoff matrix (the Leader game that we measured in our non-small cell lung cancer system):

\begin{pmatrix}  2.6 & 3.5 \\ 3.1 & 3.0  \end{pmatrix}

There are several reductive games that could implement it. For a first example, if we thought that every cell interacted with every other cell and updating was done by imitation then the same exact matrix as above would be the reductive game. For a second example, if we thought that our population lived on a 3-regular random graph and updated itself with death-birth dynamics then we could calculate the reductive game by inverting the Ohtsuki-Nowak transform. In this case, the reductive game corresponding to the above effective game would be (up to some time rescaling to make the two games seem most numerically similar):

\begin{pmatrix}  2.6 & 3.7 \\ 2.9 & 3.0  \end{pmatrix}

This might not seem like a huge change numerically, but it transforms a Leader game into a Hawk-Dove game. In other words, if the effective Leader game that we measured had been implemented in a well-mixed population then the corresponding reductive game would also be Leader but if it had been implemented in a slightly spatially structured population then the corresponding reductive game would be Hawk-Dove. Thus, two qualitatively different reductive games — depending on the spatial structure that we abstract over — can implement the same abstract effective game. Unfortunately, there is no reason to believe that either of these two spatial structures are a good description of our actual experimental system. Thus, we can’t take our measured effective game and “push it down” to a particular reductive game implementation.
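For readers who want to experiment, here is a sketch of the Ohtsuki-Nowak transform for death-birth updating on a k-regular graph, following Ohtsuki & Nowak (2006). Note this bare form omits the time rescaling mentioned above, so it will not reproduce the exact matrix pair measured in our system; it only illustrates how a choice of spatial structure shifts the off-diagonal payoffs and can thereby change the qualitative type of game.

```python
import numpy as np

def ohtsuki_nowak_db(A, k):
    """Ohtsuki-Nowak transform for death-birth updating on a k-regular
    graph (k > 2): replicator dynamics on the graph behave like inviscid
    replicator dynamics on the transformed matrix A + B, where
    B[i, j] = (A[i, i] + A[i, j] - A[j, i] - A[j, j]) / (k - 2)."""
    n = A.shape[0]
    B = np.array([[(A[i, i] + A[i, j] - A[j, i] - A[j, j]) / (k - 2)
                   for j in range(n)] for i in range(n)])
    return A + B

R = np.array([[2.6, 3.7],
              [2.9, 3.0]])      # a reductive game on a 3-regular graph
E = ohtsuki_nowak_db(R, k=3)    # the effective game it would implement

# The diagonal (self-interaction) payoffs are untouched; only the
# off-diagonal "who meets whom" terms shift with the graph degree k,
# which is enough to move the game between qualitative regions.
assert np.allclose(np.diag(E), np.diag(R))
assert not np.allclose(E, R)
```

The transform is the forward direction (reductive to effective); inverting it, as in the post, requires committing to a particular spatial structure, which is precisely the commitment the effective-game abstraction lets us avoid.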

By formulating the concept of an effective game, we can abstract over details that are currently impossible to measure and focus on what is measurable. In this case, abstraction allows the abstract object to be more empirical than more concrete implementations of it. That is why I disagreed with David about this interpretation of our joint work. And by writing out the details of my disagreement with David, I think that I can now better understand why Peter Jeavons instantly recognized our game assay as an abstraction, while I needed this whole exposition to convince myself.


Matthew Alhonte

Supervillain in somebody's action hero movie. Experienced a radioactive freak accident at a young age which rendered him part-snake and strangely adept at Python.