Lynx Roundup, April 30th 2018

Lynx Roundup, April 30th 2018

Review prediction with Neo4j and Tensorflow! Machine learning yearning! Blockchain explained in 7 Python functions!

Matthew Alhonte
Matthew Alhonte
Review prediction with Neo4j and TensorFlow
We show how to create an embedding to predict product reviews, using the TensorFlow machine learning framework and the Neo4j graph database. It achieves 97% validation accuracy. A common problem in…
Blockchain Explained in 7 Python Functions - KDnuggets
It wasn’t until I wrote my own simple Blockchain, that I truly understood what it is and the potential applications for it. So without further ado, lets set up our 7 functions!

Just recently A* Algorithm was added to Neo4j graph algorithms and I decided to show how nicely APOC spatial functions fit with it as it uses GPS location for heuristic.

Import

I found this cool github repo geoiq/acetate/ where there is geospatial data available and has many contributors that we can thank to. I picked a file containing a list of 100+ cities in Europe and imported them into Neo4j.

LOAD CSV WITH HEADERS FROM 
"https://raw.githubusercontent.com/geoiq/acetate/master/places/Europe-z4-z6.txt"
as row FIELDTERMINATOR "\t"
MERGE (city:City{name: row.name})
ON CREATE SET city.population = toINT(row.population)
MERGE (country:Country{code: row.`country code`})
MERGE (city)-[:IS_IN]->(country)

Apoc spatial

Even though the GPS location of the cities is included in the CSV, I did not import it, just so I can show how you can get it yourself using geocoding API, that is hidden in apoc.spatial.geocodeOnce procedure. Find more details in my Neo4j to ELK post and documentation.

MATCH (city:City)-[:IS_IN]->(country)
CALL apoc.spatial.geocodeOnce(city.name + " " + country.code) 
YIELD location
// Save response
SET city.latitude = location.latitude,
    city.longitude = location.longitude

Distance calculation

Lets assume we want to go on a trip through Europe and visit cities on the list on the way. Our only requirement is that we don’t travel more than 250km per day so that we have time and energy left to act a tourist and explore cities.

We will calculate distance between cities and save it as a relationship property between them where the distance is less than 250km. This way we set a threshold to travel less than 250km per day on our trip and still wind up in one of the cities on the list every day.

WITH 250 as distanceInKm
MATCH (c1:City),(c2:City)
WHERE id(c1) < id(c2)
WITH c1,c2,
distance(point({longitude:c1.longitude,latitude:c1.latitude}), 
         point({longitude:c2.longitude,latitude:c2.latitude})) as distance
WHERE distance < (distanceInKm * 1000) 
MERGE (c1)-[l:LINK]->(c2)
ON CREATE SET l.distance = distance

A* Algorithm

A* is an informed search algorithm, or a best-first search, meaning that it solves problems by searching among all possible paths to the solution (goal) for the one that incurs the smallest cost (least distance travelled, shortest time, etc.), and among these paths it first considers the ones that appear to lead most quickly to the solution. It is formulated in terms of weighted graphs: starting from a specific node of a graph, it constructs a tree of paths starting from that node, expanding paths one step at a time, until one of its paths ends at the predetermined goal node.

Taken from Wikipedia

Current implementation of the A* algorithm in graph algorithms uses geospatial information as heuristic function to be able to do best-first search through the graph.

I live near Ljubljana and lets say I want to travel to Amsterdam with my backyard helicopter :).

MATCH (start:City{name:"Ljubljana"}),(end:City{name:"Amsterdam"})
CALL algo.shortestPath.astar.stream(start, end, 'distance',
  'latitude','longitude',{
  nodeQuery:'City',relationshipQuery:'LINK',direction:'BOTH'})
YIELD nodeId, cost
MATCH (n) where id(n)=nodeId
RETURN n.name,cost

Our graph is not very detailed as it only contains 100ish cities, so I got back quite interesting results. At first glance it doesn’t really seem like the fastest way to get to Amsterdam, but it is definitely an interesting one judging by the cities we would visit and all the scenery ranging from Adriatic sea to Alps and finally flat lands of Netherlands.

a.png

Cost is the distance in meters.

Let’s also visualize the resulted path with neovis.js.

The query that we use to get back the shortest path from A* algorithm. Could also be used to visualize in Neo4j Browser.

MATCH (start:City{name:'Ljubljana'}),(end:City{name:'Amsterdam'})
 CALL algo.shortestPath.astar.stream(start, end, 'distance', 
 'latitude','longitude',{ 
 nodeQuery:'City',relationshipQuery:'LINK',direction:'BOTH'}) 
YIELD nodeId, cost
MATCH (n) where id(n)=nodeId 
WITH collect(n) as nodes
UNWIND range(0, length(nodes)-2) as index
WITH nodes[index] as from, nodes[index + 1] as to 
MATCH p=(from)-[:LINK]-(to)
RETURN p

I used population of the cities for defining the size of the nodes and distance between cities for defining the width of the relationships.

neovisa.png

Code available on github.

Knowledge graph

Lets assume I made it to Amsterdam. Now what? What is worth checking out or visiting?

We can use Google knowledge graph API to find attractions we could visit and enrich our graph with the data. Find more details in my previous post Neo4j APOC triggers and web APIs.

WITH "api_key" as google_api_key
MATCH (c:City{name:"Amsterdam"})-[:IS_IN]->(country)
CALL apoc.load.json("https://kgsearch.googleapis.com/v1/entities:search?query=" 
                    + apoc.text.urlencode(c.name) + " " + country.code + 
                     "&key=" + google_api_key + "&limit=20&indent=True")
YIELD value
UNWIND value.itemListElement as row
WITH row.result as results,c 
WHERE results.name is not null
MERGE (p:Attraction{name:results.name})
ON CREATE SET p.description = results.description,
 p.detailedDescription = results.detailedDescription.articleBody
MERGE (p)-[:IS_IN]->(c)
WITH results,p
UNWIND (results.`@type`) as type
MERGE (t:Type{name:type})
MERGE (p)-[:HAS_TYPE]->(t)

Now that we stored the data we got from the API into our graph, we can search for things to do or visit in Amsterdam.

MATCH (:City{name:"Amsterdam"})<-[:IS_IN]-(a:Attraction)
RETURN a.name as attraction,
       a.description as description

asmte

Conclusion

I always loved how easily you can scrape the internet with Neo4j and cypher/APOC procedures. Neo4j allows us to easily enrich our graph and make it a proper knowledge graph of our own by combining multiple data sources ranging from other databases to online APIs. Combining this with the graph algorithms it becomes an even more serious analytics tool, that is fun to work with.

Choosing the Right Metric for Evaluating Machine Learning Models – Part 1 - KDnuggets
Each machine learning model is trying to solve a problem with a different objective using a different dataset and hence, it is important to understand the context before choosing a metric.

https://spectrum.ieee.org/the-human-os/biomedical/devices/qa-ai-could-redesign-the-drug-development-process

Hey look, free book!  From Andrew Ng hisself:

Machine Learning Yearning
More Environments Will Not Make Things Easier

Long before the invention of Feynman diagrams, engineers were using similar diagrams to reason about electrical circuits and more general networks containing mechanical, hydraulic, thermodynamic and chemical components. We can formalize this reasoning using ‘props’: a certain kind of categories whose objects are natural numbers, with the tensor product of objects given by addition. In this approach, each kind of network corresponds to a prop, and each network of this kind is a morphism in that prop. A network with m inputs and n outputs is a morphism from m to n. Putting networks together in series is composition, and setting them side by side is tensoring.

In this paper, we study the props for various kinds of electrical circuits:

• John Baez, Brandon Coya and Franciscus Rebro, Props in network theory.

We start with circuits made solely of ideal perfectly conductive wires. Then we consider circuits with passive linear components like resistors, capacitors and inductors. Finally we turn on the power and consider circuits that also have voltage and current sources.

And here’s the cool part: each kind of circuit corresponds to a prop that pure mathematicians would eventually invent on their own! So, what’s good for engineers is often mathematically natural too.

We describe the ‘behavior’ of these various kinds of circuits using morphisms between props. In particular, we give a new proof of the black-boxing theorem proved earlier with Brendan Fong. Unlike the original proof, this new one easily generalizes to circuits with nonlinear components! We also use a morphism of props to clarify the relation between circuit diagrams and the signal-flow diagrams in control theory.

Here’s a quick sketch of the main ideas.

Props in network theory

In his 1963 thesis, Lawvere introduced functorial semantics: the use of categories with specified extra structure as ‘theories’ whose ‘models’ are structure-preserving functors into other such categories:

• F. W. Lawvere, Functorial semantics of algebraic theories, Ph.D. thesis, Department of Mathematics, Columbia University, 1963. Also in Reprints in Theory and Applications of Categories 5 (2003), 1–121.

In particular, a Lawvere theory is a category with finite cartesian products and a distinguished object X such that every object is a power X^n. These can serve as theories of mathematical structures that are sets X equipped with n-ary operations

f \colon X^n \to X

obeying equational laws. However, structures of a more linear-algebraic nature are often vector spaces equipped with operations of the form

f \colon X^{\otimes m} \to X^{\otimes n}

To extend functorial semantics to these, Mac Lane introduced props—or as he called them, ‘PROPs’. The acronym stands for ‘products and permutations’:

• Saunders Mac Lane, Categorical algebra, Bulletin of the American Mathematical Society 71 (1965), 40–106.

A prop is a symmetric monoidal category equipped with a distinguished object X such that every object is a tensor power X^{\otimes n}. Working with tensor products rather than cartesian products puts operations having multiple outputs on an equal footing with operations having multiple inputs.

Already in 1949 Feynman had introduced his famous diagrams, which he used to describe theories of elementary particles. For a theory with just one type of particle, Feynman’s method amounts to specifying a prop where an operation f \colon X^{\otimes m} \to X^{\otimes n} describes a process with m particles coming in and n going out. Although Feynman diagrams quickly caught on in physics, only in the 1980s did it become clear that they were a method of depicting morphisms in symmetric monoidal categories. A key step was the work of Joyal and Street, which rigorously justified reasoning in any symmetric monoidal category using ‘string diagrams’—a generalization of Feynman diagrams:

• André Joyal and Ross Street, The geometry of tensor calculus I, Advances in Mathematics 88 (1991), 55–112.

By now, many mathematical physicists are aware of props and the general idea of functorial semantics. In constrast, props seem to be virtually unknown in engineering!

But engineers have been using diagrammatic methods ever since the rise of electrical circuits. And in the 1940s, Olson explained how to apply circuit diagrams to networks of mechanical, hydraulic, thermodynamic and chemical components:

• Harry F. Olson, Dynamical Analogies, Van Nostrand, New York, 1943.

By 1961, Paynter had made the analogies between these various systems mathematically precise using ‘bond graphs’:

• Henry M. Paynter, Analysis and Design of Engineering Systems, MIT Press, Cambridge, Massachusetts, 1961.

Here he shows a picture of a hydroelectric power plant, and the bond graph that abstractly describes it:

By 1963, Forrester was using circuit diagrams in economics:

• Jay Wright Forrester, Industrial Dynamics, MIT Press, Cambridge, Massachusetts, 1961.

In 1984, Odum published a beautiful and influential book on their use in biology and ecology:

• Howard T. Odum, Ecological and General Systems: An Introduction to Systems Ecology, Wiley, New York, 1984.

Energy Systems Symbols

We can use props to study circuit diagrams of all these kinds! The underlying mathematics is similar in each case, so we focus on just one example: electrical circuits. For other examples, take a look at this:

• John Baez, Network theory (part 29), Azimuth, 23 April 2013.

In our new paper, we illustrate the usefulness of props by giving a new, shorter proof of the ‘black-boxing theorem’ proved here:

• John Baez and Brendan Fong, A compositional framework for passive linear networks. (Blog article here.)

A ‘black box’ is a system with inputs and outputs whose internal mechanisms are unknown or ignored. A simple example is the lock on a doorknob: one can insert a key and try to turn it; it either opens the door or not, and it fulfills this function without us needing to know its inner workings. We can treat a system as a black box through a process called ‘black-boxing’, which forgets its inner workings and records only the relation it imposes between its inputs and outputs. Systems with inputs and outputs can be seen as morphisms in a category, where composition uses the outputs of the one system as the inputs of another. We thus expect black-boxing to be a functor out of a category of this sort. A ‘black-boxing theorem’ makes this intuition precise.

In an electrical circuit, associated to each wire there is a pair of variables called the potential \phi and current I. When we black-box such a circuit, we record only the relation it imposes between the variables on its input and output wires. Since these variables come in pairs, this is a relation between even-dimensional vector spaces. But these vector spaces turn out to be equipped with extra structure: they are symplectic vector spaces, meaning they are equipped with a nondegenerate antisymmetric bilinear form. Black-boxing gives a relation that respects this extra structure: it is a ‘Lagrangian’ relation.

Why does symplectic geometry show up when we black-box an electrical circuit? The first proof of the black-boxing theorem answered this question. A circuit made of linear resistors acts to minimize the total power dissipated in the form of heat. More generally, any circuit made of linear resistors, inductors and capacitors obeys a generalization of this ‘principle of minimum power’. Whenever a system obeys a minimum principle, it establishes a Lagrangian relation between input and output variables. This fact was first noticed in classical mechanics, where systems obey the ‘principle of least action’. Indeed, symplectic geometry has its origins in classical mechanics. But it applies more generally: for any sort of system governed by a minimum principle, black-boxing should give a functor to some category where the morphisms are Lagrangian relations.

The first step toward such a theorem for electrical circuits is to treat circuits as morphisms in a suitable category. We start with circuits made only of ideal perfectly conductive wires. These are morphisms in a prop we call \mathrm{Circ}, defined in Section 3 of our paper. In Section 8 we construct a black-boxing functor

\blacksquare \colon \mathrm{Circ} \to \mathrm{LagRel}_k

sending each such circuit to the relation it defines between its input and output potentials and currents. Here \mathrm{LagRel}_k is a prop with symplectic vector spaces of the form k^{2n} as objects and linear Lagrangian relations as morphisms, and \blacksquare is a morphism of props. We work in a purely algebraic fashion, so k here can be any field.

In Section 9 we extend black-boxing to a larger class of circuits that include linear resistors, inductors and capacitors. This gives a new proof of the black-boxing theorem that Brendan and I proved: namely, there is a morphism of props

\blacksquare \colon \mathrm{Circ}_k \to \mathrm{LagRel}_k

sending each such linear circuit to the Lagrangian relation it defines between its input and output potentials and currents. The ease with which we can extend the black-boxing functor is due to the fact that all our categories with circuits as morphisms are props. We can describe these props using generators and relations, so that constructing a black-boxing functor simply requires that we choose where it sends each generator, and check that the all the relations hold. In Section 10 we explain how electric circuits are related to signal-flow diagrams, used in control theory. Finally, in Section 11, we illustrate how props can be used to study nonlinear circuits.

Outline of the results

The paper is pretty long, so here’s a more detailed outline of the results.

In Section 1 we explain a general notion of `L-circuit’ that was first introduced under a different name here:

• R. Rosebrugh, N. Sabadini and R. F. C. Walters, Generic commutative separable algebras and cospans of graphs, Theory and Applications of Categories 15 (2005), 164–177.

An L-circuit is a cospan of finite sets where the apex is the set of nodes of a graph whose edges are labelled by elements of some set L. In applications to electrical engineering, the elements of L describe different ‘circuit elements’ such as resistors, inductors and capacitors. We discuss a symmetric monoidal category \mathrm{Circ}_L whose objects are finite sets and whose morphisms are (isomorphism classes of) L-circuits.

In Section 2 we study \mathrm{Circ}_L when L is a 1-element set. We call this category simply \mathrm{Circ}. In applications to engineering, a morphism in \mathrm{Circ} describes circuit made solely of ideal conductive wires. We show how such a circuit can be simplified in two successive stages, described by symmetric monoidal functors:

\mathrm{Circ} \stackrel{G}{\longrightarrow} \mathrm{FinCospan} \stackrel{H}{\longrightarrow} \mathrm{FinCorel}

Here \mathrm{FinCospan} is the category of cospans of finite sets, while \mathrm{FinCorel} is the category of ‘corelations’ between finite sets. Corelations, categorically dual to relations, are already known to play an important role in network theory:

• Brendan Fong, Decorated corelations.

Just as a relation can be seen as a jointly monic span, a corelation can be seen as a jointly epic cospan. The functor G crushes any graph down to its set of components, while H reduces any cospan to a jointly epic one.

In Section 4 we turn to props. Propositions 11 and 12, proved in Appendix A.1 with the help of Steve Lack, characterize which symmetric monoidal categories are equivalent to props and which symmetric monoidal functors are isomorphic to morphisms of props. We use these to find props equivalent to \mathrm{Circ}_L, \mathrm{Circ}, \mathrm{FinCospan} and \mathrm{FinCorel}. This lets us reinterpret the arrows here as morphisms of props:

\mathrm{Circ} \stackrel{G}{\longrightarrow} \mathrm{FinCospan} \stackrel{H}{\longrightarrow} \mathrm{FinCorel}

In Section 5 we discuss presentations of props. Proposition 19, proved in Appendix A.2 using a result of Todd Trimble, shows that the category of props is monadic over the category of signatures, \mathrm{Set}^{\mathbb{N} \times \mathbb{N}}. This lets us work with props using generators and relations. We conclude by recalling a presentation of \mathrm{FinCospan} due to Lack and a presentation of \mathrm{FinCorel} due to Coya and Fong:

• Steve Lack, Composing PROPs, Theory and Applications of Categories 13 (2004), 147–163.

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids, Theory and Applications of Categories 32 (2017), 380–395. (Blog article here.)

In Section 6 we introduce the prop \mathrm{FinRel}_k. This prop is equivalent to the symmetric monoidal category with finite-dimensional vector spaces over the field k as objects and linear relations as morphisms, with direct sum as its tensor product. A presentation of this prop was given in these papers:

• Filippo Bonchi, Pawel Sobociénski and Fabio Zanasi, Interacting Hopf algebras, Journal of Pure and Applied Algebra 221 (2017), 144–184.

• John Baez and Jason Erbele, Categories in control, Theory and Application of Categories 30 (2015), 836–881. (Blog article here.)

In Section 7 we state the main result in the paper by Rosebrugh, Sabadini and Walters. This gives a presentation of \mathrm{Circ}_L. Equivalently, it says that \mathrm{Circ}_L is the coproduct, in the category of props, of \mathrm{FinCospan} and the free prop on a set of unary operations, one for each element of L. This result makes it easy to construct morphisms from \mathrm{Circ}_L to other props.

In Section 8 we introduce the prop \mathrm{LagRel}_k where morphisms are Lagrangian linear relations between symplectic vector spaces, and we construct the black-boxing functor \blacksquare \colon \mathrm{Circ} \to \mathrm{LagRel}_k. Mathematically, this functor is the composite

\mathrm{Circ} \stackrel{G}{\longrightarrow} \mathrm{FinCospan} \stackrel{H}{\longrightarrow} \mathrm{FinCorel} \stackrel{K}{\longrightarrow} \mathrm{LagRel}_k

where K is a symmetric monoidal functor defined by its action on the generators of \mathrm{FinCorel}. In applications to electrical engineering, the black-boxing functor maps any circuit of ideal conductive wires to its ‘behavior’: that is, to the relation that it imposes on the potentials and currents at its inputs and outputs.

In Section 9 we extend the black-boxing functor to circuits that include resistors, inductors, capacitors and certain other linear circuit elements. The most elegant prop having such circuits as morphisms is \mathrm{Circ}_k, meaning \mathrm{Circ}_L with the label set L taken to be a field $k.$ We characterize the black-boxing functor

\blacksquare \colon \mathrm{Circ}_k \to \mathrm{LagRel}_k

in Theorem 41.

In Section 10 we expand the scope of inquiry to include ‘signal-flow diagrams’, which are important in control theory. We explain how signal-flow diagrams serve as a syntax for linear relations. Concretely, this means that signal-flow diagrams are morphisms in a free prop \mathrm{SigFlow}_k with the same generators as \mathrm{FinRel}_k, but no relations. There is thus a morphism of props

\square \colon \mathrm{SigFlow}_k \to \mathrm{FinRel}_k

mapping any signal-flow diagrams to the linear relation that it denotes. It is natural to wonder how this is related to the black-boxing functor

\blacksquare \colon \mathrm{Circ}_k \to \mathrm{LagRel}_k

The answer involves the free prop \widetilde{\mathrm{Circ}}_k which arises when we take the simplest presentation of \mathrm{Circ}_k and omit the relations. This comes with a map

P \colon \widetilde{\mathrm{Circ}}_k \to \mathrm{Circ}_k

which reinstates those relations, and in Theorem 44 we show there is a map of props T \colon \widetilde{\mathrm{Circ}}_k \to \mathrm{SigFlow}_k making this diagram commute:

Putting everything together, we get this grand commuting diagram relating circuit diagrams, linear circuits, signal flow diagrams, cospans, corelations, Lagrangian relations, and linear relations:

Finally, in Section 11 we illustrate how props can also be used to study nonlinear circuits. Namely, we show how to include voltage and current sources. Black-boxing these gives Lagrangian affine relations between symplectic vector spaces! Eventually we’ll get around to studying more general nonlinear circuits… but not today.

Feature Importance Measures for Tree Models — Part I
2019–05–25 Update: I’ve published a post covering another importance measure — SHAP values — on my personal blog and on Medium. This post is inspired by a Kaggle kernel and its discussions [1]. I’d…
Roundup

Matthew Alhonte

Supervillain in somebody's action hero movie. Experienced a radioactive freak accident at a young age which rendered him part-snake and strangely adept at Python.