Higher-order analysis of temporal network data¶
An educational tutorial using the free python module pyTempNets¶
Ingo Scholtes
Chair of Systems Design
ETH Zürich
May 2015 (Updated: August 2015)
In this tutorial, I will introduce some basic concepts of the analysis of temporal network data using higher-order networks, which have been introduced in the following recent publications:
- I Scholtes, N Wider, A Garas: Higher-Order Aggregate Networks in the Analysis of Temporal Networks: Path Structures and Centralities, arXiv:1508.06467, Aug. 2015
- I Scholtes, N Wider, R Pfitzner, A Garas, CJ Tessone, F Schweitzer: Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks, Nature Communications, 5, Sept. 2014
- R Pfitzner, I Scholtes, A Garas, CJ Tessone, F Schweitzer: Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks, Phys Rev Lett, 110(19), 198701, May 2013
A key point of these works is that we highlight the importance of non-Markovian characteristics in time-stamped relational data. In a nutshell, we show that the ordering of links in time-stamped networks matters. We further introduce higher-order networks, a generalization of the common network perspective that allows us to study the influence of order correlations in temporal networks. This new perspective on time-stamped network data provides interesting new opportunities not only to better understand dynamical processes on temporal networks, but also for the definition of novel measures for centrality or new community detection algorithms that take into account both the topological and the temporal dimension of complex systems.
In the following, I illustrate some basic concepts of non-Markovian characteristics in temporal networks. In particular, I will showcase the use of pyTempNet, a free python module which simplifies the analysis and visualization of time-stamped relational data, as well as the simulation of dynamical processes on top of temporal networks. All of the methods and concepts introduced in the publications listed above are fully implemented in pyTempNet, thus making it particularly easy to apply the higher-order network framework to your data.
Installation of pyTempNets is easy. You will need a working installation of python3, and you will also need the modules numpy, scipy, matplotlib and igraph. The easiest way to get all of this is to install the Anaconda python 3.4 distribution. Once you have installed Anaconda, you just need to additionally set up igraph by invoking the following command:
> pip install python-igraph
You are then ready to install pyTempNets directly from GitHub by typing:
> pip install git+git://github.com/IngoScholtes/pyTempNets.git
After this, you should be ready to run this notebook.
Let us start with a new IPython notebook and import some basic modules required in the following tutorial. We first import the basic pyTempNet module as follows:
import pyTempNet as tn
Since time-aggregated networks are internally represented as igraph objects (which we can plot and analyze using standard igraph functions), we will also need to import igraph.
import igraph
Finally, we will want to plot some figures and do some basic calculations using numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt
You won't have to care too much about the following imports and definitions, as they simply allow us to directly embed igraph plots, LaTeX output and videos into the output of an IPython notebook.
from IPython.display import *
from IPython.display import HTML
import base64
from wand.image import Image
from subprocess import call
def showVideo(filename):
    # read a video file and embed it into the notebook as a base64-encoded HTML5 video tag
    VIDEO_TAG = """<video controls>
    <source src="data:video/x-m4v;base64,{0}" type="video/mp4">
    Your browser does not support the video tag.
    </video>"""
    video_data = open(filename, "rb").read()
    return HTML(VIDEO_TAG.format(base64.b64encode(video_data).decode('utf-8')))
def compileAndShowTex(file):
    # compile the LaTeX file with pdflatex, convert the resulting PDF to a PNG and display it
    call("cd " + file.split('/')[0] + " && pdflatex " + file.split('/')[1], shell=True)
    with Image(filename=file.replace('.tex', '.pdf'), resolution=200) as img:
        img.resize(400, 400)
        img.save(filename=file.replace('.tex', '.png'))
    display(Image(filename=file.replace('.tex', '.png')))
Two simple examples¶
Let us introduce some basic concepts using two simple examples for temporal networks. We first create an empty temporal network object, which we can then use to add time-stamped links one by one. In each call, we pass the source and target node, as well as an integer time stamp indicating when the link was present.
t1 = tn.TemporalNetwork()
t1.addEdge('a', 'c', 1)
t1.addEdge('c', 'e', 2)
t1.addEdge('b', 'c', 3)
t1.addEdge('c', 'd', 4)
t1.addEdge('b', 'c', 5)
t1.addEdge('c', 'e', 6)
t1.addEdge('a', 'c', 7)
t1.addEdge('c', 'd', 8)
A nice way to visualize such networks is the so-called time-unfolded representation, in which we unfold time into a spatial (vertical) dimension. pyTempNet allows you to directly generate tikz code for such a representation. We just call the following function:
tn.Visualizer.exportTikzUnfolded(t1, 'tutorial/t1.tex')
This will produce a LaTeX file which we can simply compile to obtain a PDF figure (this comes in handy for illustrative figures in manuscripts). We can either manually compile the file and convert the resulting PDF figure to a PNG, or we can do this automatically using our little helper function (which requires imagemagick and ghostscript, as well as imagemagick's python interface Wand, to be installed on your system).
With this helper function, we can easily display the resulting figure in ipython as follows:
compileAndShowTex('tutorial/t1.tex')
In this representation, each of the five nodes is represented by multiple temporal copies. Two subsequent temporal copies are connected whenever a link existed at the respective time stamp. This representation makes it easy to study so-called time-respecting paths.
Apart from time-unfolded representations, pyTempNet can also easily be used to produce videos of temporal networks. For this we can simply export a sequence of PNG movie frames using the following built-in function:
tn.Visualizer.exportMovie(t1, 'tutorial/video.mp4', fps=1)
This will internally generate a sequence of numbered frames within a subdirectory called frames. It will then automatically convert the resulting frames to an mp4 video using, for instance, the tool imagemagick. For this, make sure to install imagemagick for your platform.
We can now show the resulting video of the temporal network using the helper function defined above. Just click the play button.
showVideo('tutorial/video.mp4')
Let us now study the extraction of so-called time-respecting paths in temporal networks. This crucial concept is the temporal equivalent of paths in static networks. Different from static networks, links on a time-respecting path have to respect causality, i.e. in order for a path (u,v,t_1) -> (v,w,t_2) of two time-stamped links to connect nodes u and w, the first link has to be present before the second link, i.e. t_1 < t_2. pyTempNet can automatically extract time-respecting paths of length two for you. Sometimes, you may also want to impose a limit on the waiting time of time-respecting paths, i.e. rather than simply requiring t_1 < t_2, you may want to require that t_2 - t_1 <= delta for some maximum waiting time delta.
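To make this definition concrete, here is a minimal, purely illustrative check (not a pyTempNet function) that tests whether two time-stamped links form a time-respecting path under a given maximum waiting time delta:
# Illustrative only, not part of pyTempNet: do two time-stamped links
# (u, v, t1) and (v, w, t2) form a time-respecting path for waiting time delta?
def forms_time_respecting_path(link1, link2, delta=1):
    (u, v1, t1) = link1
    (v2, w, t2) = link2
    # the links must share the middle node and respect causality within the waiting time
    return v1 == v2 and 0 < t2 - t1 <= delta

print(forms_time_respecting_path(('a', 'c', 1), ('c', 'e', 2)))           # True
print(forms_time_respecting_path(('a', 'c', 1), ('c', 'e', 7), delta=1))  # False, waiting time too long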
The pyTempNet function extractTwoPaths can be used to extract time-respecting paths of length two. If no delta is specified, delta = 1 is assumed, i.e. only consecutive links will be considered to contribute to time-respecting paths. Note that we can specify an arbitrary maximum time difference delta by using the function setMaxTimeDiff.
t1.setMaxTimeDiff(delta=1)
t1.extractTwoPaths()
Depending on your data set, this function may require a few seconds (if you have several hundred thousand or even millions of time-stamped links). Once it has completed, we can start analyzing causality structures and non-Markovian characteristics in the temporal network. If you omit the explicit call above, it will automatically be made whenever time-respecting paths are first needed.
Based on the statistics of time-respecting paths, we can calculate betweenness preference, a measure that captures to what extent nodes mediate time-respecting paths between particular pairs of their neighbours in the static network. Being an information-theoretic measure, it can be interpreted in terms of the number of bits of information that are lost when we aggregate all time-stamped links around a node (thus obtaining a time-aggregated network which misses the information on the ordering of links). For details on the definition and interpretation of the measure, I have to refer to the publications mentioned above.
With pyTempNet we can calculate the betweenness preference of a node using the following function. In this particular case, the betweenness preference of node c should be zero, as there is no preference for node c to mediate time-respecting paths between any particular pair of nodes (see also the temporal network above).
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t1,'c'))
We can now calculate different time-aggregated representations of our temporal network. The simplest and most commonly used one is the plain time-aggregated network, in which all time-stamps are discarded, while link weights indicate the number or frequency of interactions between nodes.
We can obtain an igraph object representing the (first-order) aggregate network as follows. This igraph object can be plotted using basic igraph functions.
# We can compute and plot the first-order aggregate network
g1 = t1.igraphFirstOrder()
visual_style = {}
visual_style["bbox"] = (600, 400)
visual_style["margin"] = 60
visual_style["vertex_size"] = 80
visual_style["vertex_label_size"] = 24
visual_style["vertex_color"] = "lightblue"
visual_style["edge_curved"] = 0.2
visual_style["edge_width"] = 1
visual_style["edge_arrow_size"] = 2
visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]
igraph.plot(g1, 'tutorial/t1_G1.png', **visual_style)
display(Image(filename='tutorial/t1_G1.png'))
In the publications listed above, we introduced so-called higher-order networks which not only capture the frequency but also the ordering of time-stamped links. The idea is that each link in the time-stamped network is represented by a node in the second-order network, while each time-respecting path of length two is represented by a link. Again, details on this abstraction can be found in the papers above.
Here, we can compute (and plot) the second-order network for the temporal network above as follows:
g2 = t1.igraphSecondOrder()
visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t1_G2.png', **visual_style)
display(Image(filename='tutorial/t1_G2.png'))
In the example above, we see that a total of four time-respecting paths of length two (represented by the four links) exist in the temporal network, each of them occurring exactly once (hence the link weights of one).
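We can verify this directly by listing the links of the second-order network together with their weights, using standard igraph attribute access:
# list the links of the second-order network and their weights
for e in g2.es:
    print(g2.vs[e.source]["name"], '->', g2.vs[e.target]["name"], 'weight =', e["weight"])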
Let us now see what happens if we simply flip the order of two time-stamped links in the temporal network. We do this by defining the following new temporal network (which is identical to the previous one except for one reordering).
t2 = tn.TemporalNetwork()
t2.addEdge('a', 'c', 1)
t2.addEdge('c', 'e', 2)
t2.addEdge('b', 'c', 3)
t2.addEdge('c', 'd', 4)
t2.addEdge('b', 'c', 5)
t2.addEdge('c', 'd', 6)
t2.addEdge('a', 'c', 7)
t2.addEdge('c', 'e', 8)
Again, we can produce a tikz figure showing the time-unfolded representation of this temporal network and compile and display it using our helper function.
tn.Visualizer.exportTikzUnfolded(t2, 'tutorial/t2.tex')
compileAndShowTex('tutorial/t2.tex')
We now again extract all time-respecting paths of length two and then visualize the first-order network. Not surprisingly, this is exactly the same as before, since we only changed the ordering of time-stamped links.
t2.setMaxTimeDiff(delta=1)
t2.extractTwoPaths()
g1 = t2.igraphFirstOrder()
visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]
igraph.plot(g1, 'tutorial/t2_G1.png', **visual_style)
display(Image(filename='tutorial/t2_G1.png'))
As we will see now, there is an important change in terms of causality structures. Now, node c has a preference to mediate time-respecting paths between the pairs of nodes a and e as well as b and d respectively. Furthermore, knowing the "source" of a time-respecting path through c completely determines the target (source a determining target e and source b determining target d). Our uncertainty about the target, initially two equally likely choices, is completely removed, which corresponds to a betweenness preference of one bit.
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t2,'c'))
We can also output the corresponding (unnormalized) betweenness preference matrix (again see paper for details). The entries provide us with the number of different time-respecting paths through node c. The first row corresponds to node a, the second row corresponds to node b, the first column corresponds to node e, the second column corresponds to node d. Here, the entries reveal that there are two time-respecting paths a -> c -> e (first row) and two time-respecting paths b -> c -> d (second row). Time-respecting paths a -> c -> d and b -> c -> e (off-diagonal zero entries) are absent.
print('Betweenness preference matrix B =', tn.Utilities.BWPrefMatrix(t2,'c'))
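To see how the one bit of betweenness preference follows from this matrix, here is a small illustrative sketch (not pyTempNet's implementation, which may normalize differently; see the Phys Rev Lett paper above): we normalize the matrix to a joint probability distribution of sources and targets of two-paths through c and compute its mutual information.
import numpy as np

def mutual_information(B):
    # treat the (unnormalized) betweenness preference matrix as a joint
    # distribution of sources and targets of two-paths and compute I(S;D) in bits
    P = B / B.sum()
    p_s = P.sum(axis=1)
    p_d = P.sum(axis=0)
    I = 0.0
    for i in range(P.shape[0]):
        for j in range(P.shape[1]):
            if P[i, j] > 0:
                I += P[i, j] * np.log2(P[i, j] / (p_s[i] * p_d[j]))
    return I

print(mutual_information(np.array([[1., 1.], [1., 1.]])))  # 0.0 bits, as for node c in t1
print(mutual_information(np.array([[2., 0.], [0., 2.]])))  # 1.0 bit, as for node c in t2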
The changes in the statistics of time-respecting paths that are due to the reordering of the time-stamped edges are captured by the fact that the second-order network is different from the one before (even though the first-order network is the same!).
g2 = t2.igraphSecondOrder()
visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t2_G2.png', **visual_style)
display(Image(filename='tutorial/t2_G2.png'))
In the second-order network above, we see that now each of the two time-respecting paths (a,c) -> (c,e) and (b,c) -> (c,d) occurs twice, while the theoretically possible time-respecting paths (a,c) -> (c,d) and (b,c) -> (c,e) (which we would actually expect to occur with the same probability as the others) are absent!
You see that this change in the causal structure of the temporal network is exclusively driven by the ordering of time-stamped edges, and it is this phenomenon that we will now study in more realistic scenarios.
Reading from files: A larger synthetic example¶
We now consider a larger (synthetic) example which we read from a TEDGE file, a simple file format which contains a list of time-stamped links. TEDGE files have the format
source target time
a b 4
b c 7
u v 3
u v 3
...
The separation character (here: space) can be arbitrary and can be specified upon reading. Furthermore, the ordering of columns can be arbitrary and thus a header line indicating which column refers to source, target and timestamp of an edge is required. As you can see in the example above, time-stamped edges do not necessarily have to be ordered according to time. In addition to simple integer time stamps, actual (arbitrary) string timestamps like 2015-05-23 09:23:22 can be used. In this case a format string indicating the meaning of the timestamp has to be included.
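If you want to experiment with your own data, a minimal TEDGE file can be written and read back as follows (the file name toy.tedges is made up for illustration; any path works as long as the directory exists):
# purely illustrative: write a tiny TEDGE file by hand and read it back
with open('data/tutorial/toy.tedges', 'w') as f:
    f.write('source target time\n')
    f.write('a b 4\n')
    f.write('b c 7\n')
    f.write('u v 3\n')

t_toy = tn.readFile('data/tutorial/toy.tedges', sep=' ')
print(t_toy.Summary())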
Here, we use a simple example of integer time stamps, reading a file of time-stamped edges that were synthetically generated to include order correlations.
t = tn.readFile('data/tutorial/example.tedges', sep=' ')
We can actually print a summary of its basic properties by using the Summary() function.
print(t.Summary())
Let us now extract time-respecting paths of length two and visualize the first-order network.
t.setMaxTimeDiff(delta=1)
t.extractTwoPaths()
g1 = t.igraphFirstOrder()
visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["edge_width"] = [x/100 for x in g1.es()["weight"]]
visual_style["vertex_size"] = 20
visual_style["vertex_label_size"] = 12
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()
igraph.plot(g1, 'tutorial/example_g1.png', **visual_style)
display(Image(filename='tutorial/example_g1.png'))
Importantly, the weighted time-aggregated network does not show any specific structure. However, due to the specific ordering of time-stamped edges in the temporal network, the second-order aggregate network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network shows three pronounced temporal communities that are not visible in the first-order network.
g2 = t.igraphSecondOrder()
visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5
igraph.plot(g2, 'tutorial/example_g2.png', **visual_style)
display(Image(filename='tutorial/example_g2.png'))
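One way to check for these temporal communities quantitatively is to apply a standard community detection algorithm to the weighted, directed second-order network, for instance igraph's Infomap implementation (this is just an illustration and not part of the original analysis):
# illustrative: detect communities in the weighted second-order network with Infomap
communities = g2.community_infomap(edge_weights="weight")
print('Number of detected temporal communities:', len(communities))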
Analyzing real-world data: Diffusion dynamics in temporal networks¶
Let us now come to a more interesting case, analyzing real-world relational data sets from a number of different contexts. The first data set that we will study covers time-stamped antenna-antenna interactions in an ant colony and was originally published in:
- B Blonder, A Dornhaus: Time-Ordered Networks Reveal Limitations to Information Flow in Ant Colonies, PLoS ONE, 2011
Here, we read data on one particular colony (details in our papers above) as a temporal network.
t = tn.readFile('data/tutorial/ants-1-1_agg_6s_scc.tedges', sep=' ')
print(t.Summary())
Let us first consider the (first-order) time-aggregated network.
g1 = t.igraphFirstOrder()
visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["vertex_size"] = 30
visual_style["vertex_label_size"] = 8
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()
igraph.plot(g1, 'tutorial/ants_g1.png', **visual_style)
display(Image(filename='tutorial/ants_g1.png'))
Above, we have briefly argued about the time-respecting paths that we would expect, considering that specific order correlations (i.e. betweenness preferences of nodes) are absent in the temporal network. Based on the first-order aggregate network we can actually compute the expected second-order network, under the assumption of a null model in which all interactions are independent of each other. For this, we can use the following pyTempNet function:
g2n = t.igraphSecondOrderNull()
visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["layout"] = g2n.layout_auto()
igraph.plot(g2n, 'tutorial/ants_g2n.png', **visual_style)
display(Image(filename='tutorial/ants_g2n.png'))
We see that this expected network is rather densely connected, i.e. we expect many time-respecting paths of length two to exist between nodes. We can now compare this to the actual second-order network.
Indeed we see that, due to the specific ordering of time-stamped links, the actual second-order network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network is much less connected than the expected one shown above.
g2 = t.igraphSecondOrder()
igraph.plot(g2, 'tutorial/ants_g2.png', **visual_style)
display(Image(filename='tutorial/ants_g2.png'))
This (visually) different second-order network shown above is a sign of the presence of non-Markovian characteristics which change causality in the temporal network. We can quantify the presence of these characteristics by calculating the entropy growth rate ratio between a first- and a second-order Markov model of the time-stamped edge sequence (see details in the publications above). This value captures to what extent the actual temporal network differs from a Markovian case, in which the next edge in a time-stamped edge sequence is independent of the previous one.
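For intuition, the quantity underlying this ratio is the entropy growth rate of a Markov chain. Here is a minimal sketch of that quantity (assuming a row-stochastic transition matrix T and its stationary distribution pi; this is not pyTempNet's actual implementation):
import numpy as np

def entropy_growth_rate(T, pi):
    # H = - sum_i pi_i sum_j T_ij log2 T_ij, the entropy produced per step of the chain
    H = 0.0
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            if T[i, j] > 0:
                H -= pi[i] * T[i, j] * np.log2(T[i, j])
    return H

# a deterministic chain produces no entropy, a maximally random two-state chain one bit per step
print(entropy_growth_rate(np.array([[0., 1.], [1., 0.]]), np.array([0.5, 0.5])))      # 0.0
print(entropy_growth_rate(np.array([[0.5, 0.5], [0.5, 0.5]]), np.array([0.5, 0.5])))  # 1.0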
Here, we observe a value that is strictly smaller than one, which signifies that the data set indeed exhibits non-Markovian characteristics.
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
We can next ask to what extent these characteristics influence dynamical processes on the temporal network. Here, we consider a diffusion process, modeled by the convergence behavior of a random walk on the temporal network. With the following calls, we can compute the time (i.e. number of random walk steps) required by a random walker to converge. Precisely, we measure the average number of steps after which the total variation distance between the visitation probabilities and the stationary distribution is smaller than a given threshold epsilon.
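The following sketch shows what such a convergence time measurement can look like for a weighted igraph network. It iterates the probability vector of the walker rather than sampling individual trajectories, so it only illustrates the idea behind tn.Processes.RWDiffusion (and assumes every node has at least one outgoing link, which holds for the strongly connected components used below):
import numpy as np

def convergence_time(g, epsilon=1e-6, start=0, max_steps=100000):
    # row-stochastic transition matrix of a random walk on the weighted, directed graph
    A = np.zeros((g.vcount(), g.vcount()))
    for e in g.es:
        A[e.source, e.target] = e["weight"]
    T = A / A.sum(axis=1, keepdims=True)
    # stationary distribution: leading left eigenvector of T
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    # start the walker on a single node and iterate until the total variation distance is small
    p = np.zeros(T.shape[0])
    p[start] = 1.0
    for step in range(1, max_steps + 1):
        p = p @ T
        if 0.5 * np.abs(p - pi).sum() < epsilon:
            return step
    return max_steps

# e.g. convergence_time(t.igraphSecondOrder().components(mode="strong").giant())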
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
For small epsilon (i.e. large times t) we empirically observe that non-Markovian characteristics slow down the diffusion process on average by a factor of approx. 2.05 (values will naturally differ for individual runs).
We can actually predict this analytically by means of a spectral analysis of the second-order network. The following function analytically calculates the expected factor by which a diffusion process in a temporal network is slowed down due to non-Markovian characteristics (and the resulting changes in the causality structures of the temporal network).
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Here we expect diffusion in the temporal network to be slower by a factor of about 2.05 (compared to a Markovian temporal network in which no order correlations change causality). This is well in line with our empirical observation above.
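For intuition: the convergence time of a random walk scales with 1 / |ln(lambda_2)|, where lambda_2 is the second-largest eigenvalue modulus of the transition matrix. Comparing lambda_2 of the actual and of the expected second-order network therefore yields a predicted slow-down factor. The following is only a rough sketch of this idea, ignoring degeneracies and normalization details that tn.Measures.SlowDownFactor takes care of:
import numpy as np

def second_eigenvalue_modulus(g):
    # second-largest eigenvalue modulus of the random walk transition matrix
    # (for large networks, a sparse eigensolver is preferable to dense eigvals)
    A = np.zeros((g.vcount(), g.vcount()))
    for e in g.es:
        A[e.source, e.target] = e["weight"]
    T = A / A.sum(axis=1, keepdims=True)
    return np.sort(np.abs(np.linalg.eigvals(T)))[-2]

l2 = second_eigenvalue_modulus(t.igraphSecondOrder().components(mode="strong").giant())
l2n = second_eigenvalue_modulus(t.igraphSecondOrderNull().components(mode="strong").giant())
print('Rough slow-down estimate:', np.log(l2n) / np.log(l2))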
Apart from predicting changes in diffusion speed, we can do some extended spectral analysis of the second-order network based on a normalized Laplacian matrix. This analysis confirms that the actual temporal sequence has a smaller algebraic connectivity than a Markovian temporal network. This indicates that the causal topology of time-respecting paths in the temporal network is less connected than expected at random, thus explaining the observed slow-down of diffusion.
This can intuitively be related to the weaker causal connectivity in the real temporal network compared to what we expect at random (compare the second-order networks shown above).
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t))
print("Algebraic Connectivity (G2 null) =", tn.Measures.AlgebraicConn(t, model="NULL"))
Let us now move to a second data set, which covers e-mail exchanges between employees in a manufacturing company. Again, we can read these data as a temporal network.
t = tn.readFile('data/tutorial/manufacturing_30d_agg_3600_scc.tedges', sep=' ')
print(t.Summary())
The entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 3.01 (compared to a Markovian temporal network).
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Let us empirically confirm that this analytical prediction is correct ...
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-12)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-12)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Our third data set covers contacts between medical personnel in a hospital, recorded by means of proximity sensing tags. Again, we read these data as a temporal network.
t = tn.readFile('data/tutorial/Hospital_noADM_agg_300_scc_8_56h.tedges', sep=' ')
print(t.Summary())
Again, the entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 5.75.
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Again, we empirically confirm that this prediction is reasonable ...
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-10)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-10)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
We next use the RealityMining data set, covering proximity interactions between students at MIT. We read it as a temporal network.
t = tn.readFile('data/tutorial/RealityMining_agg_300s_scc.tedges', sep=' ')
print(t.Summary())
And again, the temporal sequence deviates from a Markovian temporal network and non-Markovian characteristics are expected to slow down diffusion by a factor of about 7.77, which we confirm empirically.
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Finally, we also find examples for temporal networks in which non-Markovian characteristics result in a speed-up. For this, we consider a data set of time-stamped passenger flows in the London Tube network.
t = tn.readFile('data/tutorial/tube_flows_scc.tedges', sep=' ')
print(t.Summary())
Interestingly, here non-Markovian characteristics result in a speed-up of diffusion expressed by a slow-down factor smaller than one.
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
This sounds counter-intuitive: the mere reordering of time-stamped links results in a speed-up. Let us validate this by means of a simulation.
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
That's interesting, but what exactly is happening there? We can get a better idea by actually looking at a visualization of a diffusion process. For this purpose, pyTempNet comes with a function that allows us to conveniently generate a video of the diffusion in a temporal network. The idea is that the process will be visualized in the first-order time-aggregated network, while the diffusion dynamics can either follow a second-order or a first-order Markov model. By this, we are able to visualize the effect of order correlations on the dynamical process.
g1 = t.igraphFirstOrder()
visual_style = {}
visual_style["edge_arrow_size"] = 0.4
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_width"] = [np.log(x)/4 for x in g1.es()["weight"]]
visual_style["layout"] = g1.layout_auto()
tn.exportDiffusionVideo(t, 'tutorial/LT_diffusion_t2.mp4', visual_style, steps = 200, initial_index=0, model='SECOND')
This function will output the evolution of the total variation distance on the console every 50 steps, so that you can check whether the chosen number of steps was sufficient for the diffusion process to converge to a reasonable degree. The function also generates video frames, which are internally encoded into a video using imagemagick's convert tool. If you like, you could instead call the function exportDiffusionMovieFrames and convert the frames into a video yourself with this command:
> convert -delay 10 frames/LT_diffusion_t2_frame* output/LT_t2.mp4
However, here we save ourselves this effort and directly embed the video resulting from the method call above. This video shows how a diffusion process evolves in the temporal network based on the actual two-path statistics in the data.
showVideo('tutorial/LT_diffusion_t2.mp4')
In the video, we observe a fast diffusion through the network. Let us now compare this to a diffusion process based on the first-order time-aggregated network, i.e. based on the two-path statistics that we would expect. We could do this in the same way as above, i.e. generating a video and embedding it below. However there is a nice shortcut that directly produces a comparison video, in which a diffusion process is running side-by-side in a Markovian and non-Markovian temporal network. This allows you to quickly visually explore how non-Markovian characteristics influence a dynamical process in your data. We can simply do this by the following method call:
tn.exportDiffusionComparisonVideo(t, 'tutorial/diffusion_comparison.mp4', visual_style, steps = 200, initial_index=0)
Neat! Now we can directly watch the diffusion processes side-by-side. Note that in the current implementation, the non-Markovian temporal network is shown on the left, while the Markovian one is shown on the right.
showVideo('tutorial/diffusion_comparison.mp4')
That's interesting! We observe that the diffusion process based on a first-order model, in which two-path statistics correspond to what we expect based on a weighted time-aggregated network, is indeed slower than in the second-order model. We also observe why this is the case: The numerous loops in the network result in a slow diffusion, while the actual two-path statistics based on the data set on passenger flows result in a much faster diffusion, which quickly reaches all nodes in the network.
In fact, this is not surprising as people do not travel the London Tube in loops. And it is this temporal characteristic that we miss if we merely study time-aggregated networks, and which we can include in our analysis by means of a second-order representation.
Spectral analysis: demonstration using TRIGRAM files¶
Let us now come to the last part of our tutorial, introducing some basic support for a spectral analysis of temporal networks. Here, rather than using a TEDGE file containing time-stamped edges from which we extract time-respecting paths of length two, we use so-called TRIGRAM files. These files directly contain the statistics of time-respecting paths of length two. They are useful whenever direct data on paths or flows in dynamic networks is available.
The format of a TRIGRAM file is the following:
node1 node2 node3 weight
a b c 3.0
u v w 4.0
...
In the example above, we have a time-respecting path (a,b) -> (b,c) occurring three times, while we have a time-respecting path (u,v) -> (v,w) occurring four times.
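Analogous to the TEDGE example above, a minimal TRIGRAM file can be written and read back as follows (the file name toy.trigram is made up for illustration; note the fformat parameter):
# purely illustrative: write a tiny TRIGRAM file by hand and read it back
with open('data/tutorial/toy.trigram', 'w') as f:
    f.write('node1 node2 node3 weight\n')
    f.write('a b c 3.0\n')
    f.write('u v w 4.0\n')

t_toy = tn.readFile('data/tutorial/toy.trigram', fformat='TRIGRAM', sep=' ')
print(t_toy.igraphFirstOrder().summary())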
Based on this format, in the following we use synthetic TRIGRAM data generated from a model which shows how non-Markovian characteristics in temporal networks can both slow down and speed up dynamical processes.
Similar to what we have seen above, the first file corresponds to a case where non-Markovian properties speed up a diffusion process.
t_su = tn.readFile('data/tutorial/sigma0_75.trigram', fformat='TRIGRAM', sep = ' ')
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t_su))
Again, the entropy growth rate ratio is smaller than one, verifying that the temporal network has non-Markovian characteristics. We can test the effect of these non-Markovian characteristics by computing the analytical slow-down factor. Here it is smaller than one, indicating a speed-up of diffusion dynamics.
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t_su))
Let us have a closer look at this temporal network. The first-order aggregate network consists of two communities.
g1 = t_su.igraphFirstOrder()
visual_style = {}
visual_style["edge_width"] = [np.sqrt(x) for x in g1.es()["weight"]]
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_arrow_size"] = 0.5
visual_style["layout"] = g1.layout_auto()
igraph.plot(g1, 'tutorial/model_0_75_g1.png', **visual_style)
display(Image(filename='tutorial/model_0_75_g1.png'))
These two communities are visible in the expected second-order network as well.
g2n = t_su.igraphSecondOrderNull()
visual_style = {}
max_weight = max(g2n.es()["weight"])
visual_style["edge_width"] = [400*x for x in g2n.es()["weight"]]
visual_style["edge_arrow_size"] = 0.5
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 12
visual_style["layout"] = g2n.layout_auto()
igraph.plot(g2n, 'tutorial/model_0_75_g2n.png', **visual_style)
display(Image(filename='tutorial/model_0_75_g2n.png'))
In the actual second-order time-aggregated network, we observe that communities are (temporally) more connected than expected at random (visible in the few strong links connecting the communities), which explains the speed-up.
g2 = t_su.igraphSecondOrder()
max_weight = max(g2.es()["weight"])
visual_style["edge_width"] = [6*x for x in g2.es()["weight"]]
visual_style["edge_arrow_size"] = 0.5
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 12
igraph.plot(g2, 'tutorial/model_0_75_g2.png', **visual_style)
display(Image(filename='tutorial/model_0_75_g2.png'))
We can gain some more (analytical) insights about this by means of a spectral analysis of the higher-order networks. We can first calculate algebraic connectivity in the second-order network. A comparison with the algebraic connectivity of the (expected) Markovian temporal network shows that the temporal network is indeed better connected than expected.
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t_su))
print("Algebraic Connectivity (G2 null) =", tn.Measures.AlgebraicConn(t_su, model="NULL"))
We can get a better perspective on this by studying the so-called Fiedler vector, i.e. the eigenvector corresponding to the second-smallest eigenvalue of the Laplacian matrix of the higher-order aggregate network. This vector can be computed with pyTempNet right away. We can then plot the entries of this vector corresponding to individual nodes in the second-order network (i.e. links in the first-order network).
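Conceptually, both the algebraic connectivity printed above and the Fiedler vector come from the same eigenvalue problem. Here is a minimal sketch using the random-walk Laplacian L = I - T of the weighted second-order network (an assumption for illustration; pyTempNet's FiedlerVectorDense and AlgebraicConn may use a different normalization of the Laplacian):
import numpy as np

def laplacian_spectrum(g):
    # random-walk Laplacian L = I - T of the weighted, directed second-order network
    A = np.zeros((g.vcount(), g.vcount()))
    for e in g.es:
        A[e.source, e.target] = e["weight"]
    T = A / A.sum(axis=1, keepdims=True)
    L = np.eye(T.shape[0]) - T
    evals, evecs = np.linalg.eig(L)
    order = np.argsort(np.abs(evals))
    # second-smallest eigenvalue = algebraic connectivity, its eigenvector = Fiedler vector
    return np.real(evals[order[1]]), np.real(evecs[:, order[1]])

lambda_2, v_2 = laplacian_spectrum(t_su.igraphSecondOrder().components(mode="strong").giant())
print('Algebraic connectivity (sketch) =', lambda_2)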
fiedler = tn.Measures.FiedlerVectorDense(t_su)
plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="g")
plt.savefig("tutorial/model_0_75_fiedler.png")
plt.close()
display(Image(filename='tutorial/model_0_75_fiedler.png'))
We see two clear ranges of values below and above zero which are a sign of two pronounced communities. At the same time, the points connecting these two ranges show that a number of nodes are in between these communities. Interestingly, if we compare the Fiedler vector to that of the expected second-order network, we see that community structures in the real temporal network are not as strong as expected at random, i.e. in this case the non-Markovian properties of the temporal network actually mitigate community structures!
fiedler = tn.Measures.FiedlerVectorDense(t_su, model="NULL")
plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="b")
plt.savefig("tutorial/model_fiedler_null.png")
plt.close()
display(Image(filename='tutorial/model_fiedler_null.png'))
We now consider a second example: a temporal network which, although consistent with the same time-aggregated network shown above, is generated in such a way that non-Markovian characteristics actually slow down diffusion. We can again read the resulting TRIGRAM file and compute the entropy growth rate ratio and the analytical slow-down factor.
t_sd = tn.readFile('data/tutorial/sigma-0_75.trigram', fformat='TRIGRAM', sep = ' ')
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t_sd))
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t_sd))
Here, our analysis shows that diffusion is expected to be slower by a factor of five, which we can again verify using simulations.
speed_g2 = tn.Processes.RWDiffusion(t_sd.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t_sd.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
A spectral analysis shows that the algebraic connectivity of the second-order network is much smaller than before, which is due to the fact that in this particular example non-Markovian characteristics reinforce the community structures. Compared to the algebraic connectivity of the expected Markovian temporal network, here non-Markovian characteristics result in a less connected causal topology, thus explaining the slow-down.
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t_sd))
This can also be seen in the distribution of entries of the Fiedler vector. The two value ranges are much more separated, which means that community structures are stronger than before. Furthermore, they are stronger than in the null model.
fiedler = tn.Measures.FiedlerVectorDense(t_sd)
plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="r")
plt.savefig("tutorial/model_-0_75_fiedler.png")
plt.close()
display(Image(filename='tutorial/model_-0_75_fiedler.png'))
Notably, both the first-order aggregate network, as well as the expected second-order network (both shown below) are exactly the same as before (since the only difference between the temporal networks is the ordering of time-stamped edges).
g1 = t_sd.igraphFirstOrder()
visual_style = {}
visual_style["edge_width"] = [np.sqrt(x) for x in g1.es()["weight"]]
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 15
visual_style["edge_arrow_size"] = 0.5
visual_style["layout"] = g1.layout_auto()
igraph.plot(g1, 'tutorial/model_-0_75_g1.png', **visual_style)
display(Image(filename='tutorial/model_-0_75_g1.png'))
g2n = t_sd.igraphSecondOrderNull()
visual_style["edge_width"] = [500*x for x in g2n.es()["weight"]]
visual_style["edge_arrow_size"] = 0.5
visual_style["vertex_size"] = 10
visual_style["layout"] = g2n.layout_auto()
igraph.plot(g2n, 'tutorial/model_-0_75_g2n.png', **visual_style)
display(Image(filename='tutorial/model_-0_75_g2n.png'))
However, the actual second-order aggregate network is different. Here we observe that, compared to the second-order network above, non-Markovian characteristics reinforce community structures (visible in the small weights of the links connecting the two communities in the second-order network).
g2 = t_sd.igraphSecondOrder()
visual_style["edge_width"] = [5*x for x in g2.es()["weight"]]
visual_style["vertex_size"] = 10
visual_style["edge_arrow_size"] = 0.5
igraph.plot(g2, 'tutorial/model_-0_75_g2.png', **visual_style)
display(Image(filename='tutorial/model_-0_75_g2.png'))
Let us finally visually explore how the different orderings of interactions in the two models for temporal networks presented above influence a diffusion process. Based on the spectral analysis presented above, we already have an idea: In the first case, where time-respecting paths across communities are overrepresented compared to what is expected at random, diffusion is fast because the community structure of the aggregate network is effectively mitigated. In the second case, where time-respecting paths across communities are underrepresented, the ordering of interactions effectively reinforces community structures. As such, in the first case we expect a diffusion process to quickly move across communities, while in the second case we expect it to be trapped within communities for a longer time.
Let us check this intuition by generating two videos of the diffusion process in both temporal networks (in both cases comparing a diffusion process that considers the actual two-path statistics with the one expected based on Markovian dynamics).
visual_style = {}
visual_style["edge_width"] = [np.sqrt(x) for x in g1.es()["weight"]]
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 15
visual_style["edge_arrow_size"] = 0.5
visual_style["layout"] = g1.layout_auto()
tn.exportDiffusionComparisonVideo(t_sd, 'tutorial/model_slowdown.mp4', visual_style, steps = 200, initial_index=50)
tn.exportDiffusionComparisonVideo(t_su, 'tutorial/model_speedup.mp4', visual_style, steps = 200, initial_index=50)
This will take about two minutes. After the process finishes, we can directly embed both resulting videos as before. Again, the non-Markovian dynamics is shown on the left, while the Markovian (null) model is shown on the right.
showVideo('tutorial/model_slowdown.mp4')
showVideo('tutorial/model_speedup.mp4')
The videos of the diffusion process confirm the results of our spectral analysis: the mere ordering of interactions in temporal networks can either mitigate or reinforce the community structures of a complex network, thus speeding up or slowing down diffusion dynamics!
I hope you enjoyed this tutorial and that you are now excited to apply the higher-order network framework introduced in the articles mentioned above to your own time-stamped relational data. We believe that higher-order networks provide interesting opportunities for novel data mining methods, new measures for centrality in temporal networks, and the development of novel community detection algorithms that incorporate both topological and temporal characteristics of time-stamped network data. Moreover, as demonstrated above, the higher-order network abstraction has been implemented in the latest version of the Open Source python module pyTempNets.
Feel free to get in touch with me if you have further questions or if you want to use the python module pyTempNet for your own analyses.
Ingo Scholtes
ischoltes@ethz.ch
Zurich, May 2015