Continuous-Time Dynamic Network Embeddings

Reference Paper: http://ryanrossi.com/pubs/nguyen-et-al-WWW18-BigNet.pdf

This is a demo of StellarGraph’s implementation of Continuous-Time Dynamic Network Embeddings. The steps outlined in this notebook show how time respecting random walks can be obtained from a graph containing time information on edges, and how these walks can be used to generate network embeddings for a link prediction task.

We compare the embeddings learnt from temporal walks with non-temporal walks in this demo.

Run the master version of this notebook:

[1]:
# install StellarGraph if running on Google Colab
import sys
if 'google.colab' in sys.modules:
  %pip install -q stellargraph[demos]==1.0.0rc1
[2]:
# verify that we're using the correct version of StellarGraph for this notebook
import stellargraph as sg

try:
    sg.utils.validate_notebook_version("1.0.0rc1")
except AttributeError:
    raise ValueError(
        f"This notebook requires StellarGraph version 1.0.0rc1, but a different version {sg.__version__} is installed.  Please see <https://github.com/stellargraph/stellargraph/issues/1172>."
    ) from None
[3]:
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from stellargraph import StellarGraph
from stellargraph.datasets import IAEnronEmployees

%matplotlib inline

Dataset

The dataset used in this demo is called enron-ia-employees, available in Network Repository: http://networkrepository.com/ia-enron-employees.php

[4]:
dataset = IAEnronEmployees()
display(HTML(dataset.description))

full_graph, edges = dataset.load()
A dataset of edges that represent emails sent from one employee to another.There are 50572 edges, and each of them contains timestamp information. Edges refer to 151 unique node IDs in total.Ryan A. Rossi and Nesreen K. Ahmed “The Network Data Repository with Interactive Graph Analytics and Visualization” (2015)

Split edges

We want to split the edges into parts that can be used for our link prediction task: * the oldest edges are used to create the graph structure * the recent edges are what we are interested in predicting - we randomly split this part further into training and test sets.

[5]:
# subset of edges to split
train_subset = 0.25
test_subset = 0.25

# number of edges to be kept in the graph
num_edges_graph = int(len(edges) * (1 - train_subset))

# keep older edges in graph, and predict more recent edges
edges_graph = edges[:num_edges_graph]
edges_other = edges[num_edges_graph:]

# split recent edges further to train and test sets
edges_train, edges_test = train_test_split(edges_other, test_size=test_subset)

print(
    f"Number of edges in graph: {len(edges_graph)}\n"
    f"Number of edges in training set: {len(edges_train)}\n"
    f"Number of edges in test set: {len(edges_test)}"
)
Number of edges in graph: 37929
Number of edges in training set: 9482
Number of edges in test set: 3161

Create a StellarGraph

Now we can use the edges we’ve reserved for the graph to create an instance of the StellarGraph class

[6]:
graph = StellarGraph(
    nodes=pd.DataFrame(index=full_graph.nodes()),
    edges=edges_graph,
    edge_weight_column="time",
)

Running random walks

Define the random walk parameters we’d like to use: * num_walks_per_node - Number of random walks to perform per starting node. * walk_length - Length of each random walk. For temporal walks, this is the maximum length of a walk, since walks may end up being shorter when there are not enough time respecting edges to traverse. * context_window_size - Size of the context window used to train the Word2Vec model.

[9]:
num_walks_per_node = 10
walk_length = 80
context_window_size = 10

We try to keep the setup comparable between the use of temporal and biased (static) random walks. For temporal walks, the input parameter is defined in terms of the total number of context windows you are interested in obtaining, which differs from the traditional approach of specifying the number of walks to run per node in the graph. We calculate the number of context windows we need in terms of the traditional parameters as:

[10]:
num_cw = len(graph.nodes()) * num_walks_per_node * (walk_length - context_window_size + 1)

We’re now ready to do the walks

[11]:
from stellargraph.data import TemporalRandomWalk

temporal_rw = TemporalRandomWalk(graph)
temporal_walks = temporal_rw.run(
    num_cw=num_cw,
    cw_size=context_window_size,
    max_walk_length=walk_length,
    walk_bias="exponential",
)

print("Number of temporal random walks: {}".format(len(temporal_walks)))
Number of temporal random walks: 1827
[12]:
from stellargraph.data import BiasedRandomWalk

static_rw = BiasedRandomWalk(graph)
static_walks = static_rw.run(
    nodes=graph.nodes(), n=num_walks_per_node, length=walk_length
)

print("Number of static random walks: {}".format(len(static_walks)))
Number of static random walks: 1510

Using the random walks obtained, we can train our Word2Vec models to generate node embeddings

[13]:
from gensim.models import Word2Vec

embedding_size = 128
temporal_model = Word2Vec(
    temporal_walks,
    size=embedding_size,
    window=context_window_size,
    min_count=0,
    sg=1,
    workers=2,
    iter=1,
)
static_model = Word2Vec(
    static_walks,
    size=embedding_size,
    window=context_window_size,
    min_count=0,
    sg=1,
    workers=2,
    iter=1,
)

For convenience, we can use the trained Word2Vec models to define helper functions that transform a node ID into a node embedding.

NOTE: Temporal walks may not generate an embedding for every node in the graph; if there’s no temporal walks that involve a particular node or they are all too short, the node won’t appear in any context window. We handle this by using zeros as embeddings for such nodes to indicate that they are uninformative.

[14]:
unseen_node_embedding = np.zeros(embedding_size)


def temporal_embedding(u):
    try:
        return temporal_model.wv[u]
    except KeyError:
        return unseen_node_embedding


def static_embedding(u):
    return static_model.wv[u]

Node Embedding Visualisation

For visualisation of embeddings, we’ll first define a helper function that we can also use later to show the TSNE visualisation.

[15]:
def plot_tsne(title, x, y=None):
    tsne = TSNE(n_components=2)
    x_t = tsne.fit_transform(x)

    plt.figure(figsize=(7, 7))
    plt.title(title)
    alpha = 0.7 if y is None else 0.5

    scatter = plt.scatter(x_t[:, 0], x_t[:, 1], c=y, cmap="jet", alpha=alpha)
    if y is not None:
        plt.legend(*scatter.legend_elements(), loc="lower left", title="Classes")

We can visualise the node embeddings to take a glance at how the temporal walks have resulted in groups of nodes being clustered together

[16]:
temporal_node_embeddings = temporal_model.wv.vectors
static_node_embeddings = static_model.wv.vectors
plot_tsne("TSNE visualisation of temporal node embeddings", temporal_node_embeddings)
plot_tsne("TSNE visualisation of static node embeddings", static_node_embeddings)
../../../_images/demos_link-prediction_random-walks_ctdne-link-prediction_29_0.png
../../../_images/demos_link-prediction_random-walks_ctdne-link-prediction_29_1.png