StellarGraph API

Core

This contains the core objects used by the StellarGraph library.

class stellargraph.core.StellarGraphBase(incoming_graph_data=None, **attr)[source]

StellarGraph class for undirected graph ML models. It stores both the graph structure from a NetworkX Graph object and node features for machine learning.

To create a StellarGraph object ready for machine learning, at a minimum pass the graph structure to the StellarGraph as a NetworkX graph:

For undirected models:

Gs = StellarGraph(nx_graph)

For directed models:

Gs = StellarDiGraph(nx_graph)

To create a StellarGraph object with node features, supply the features as a numeric feature vector for each node.

To take the feature vectors from a node attribute in the original NetworkX graph, supply the attribute name to the node_features argument:

Gs = StellarGraph(nx_graph, node_features="feature")

where the nx_graph contains nodes that have a “feature” attribute containing the feature vector for the node. All nodes of the same type must have the same size feature vectors.

Alternatively, supply the node features as Pandas DataFrame objects with the index of the DataFrame set to the node IDs. For graphs with a single node type, you can supply the DataFrame object directly to StellarGraph:

node_data = pd.DataFrame(
    [feature_vector_1, feature_vector_2, ..],
    index=[node_id_1, node_id_2, ...])
Gs = StellarGraph(nx_graph, node_features=node_data)

For graphs with multiple node types, provide the node features as Pandas DataFrames for each type separately, as a dictionary by node type. This allows node features to have different sizes for each node type:

node_data = {
    node_type_1: pd.DataFrame(...),
    node_type_2: pd.DataFrame(...),
}
Gs = StellarGraph(nx_graph, node_features=node_data)

You can also supply the node feature vectors as an iterator of node_id and feature vector pairs, for graphs with single and multiple node types:

node_data = zip([node_id_1, node_id_2, ...],
    [feature_vector_1, feature_vector_2, ..])
Gs = StellarGraph(nx_graph, node_features=node_data)
Parameters:
  • node_type_name – str, optional (default=globals.TYPE_ATTR_NAME) This is the name for the node types that StellarGraph uses when processing heterogeneous graphs. StellarGraph will look for this attribute in the nodes of the graph to determine their type.
  • edge_type_name – str, optional (default=globals.TYPE_ATTR_NAME) This is the name for the edge types that StellarGraph uses when processing heterogeneous graphs. StellarGraph will look for this attribute in the edges of the graph to determine their type.
  • node_features – str, dict, list or DataFrame, optional (default=None) This tells StellarGraph where to find the node feature information required by some graph models. These are expected to be a numeric feature vector for each node in the graph.
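
For example, a minimal sketch of constructing a feature-equipped StellarGraph from a NetworkX graph; the example graph and the random 8-dimensional features below are purely illustrative:

import networkx as nx
import numpy as np
import pandas as pd
from stellargraph import StellarGraph

nx_graph = nx.karate_club_graph()  # any NetworkX graph with hashable node IDs
# one feature vector per node, indexed by node ID
node_data = pd.DataFrame(
    np.random.rand(nx_graph.number_of_nodes(), 8),
    index=list(nx_graph.nodes()),
)
Gs = StellarGraph(nx_graph, node_features=node_data)
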
check_graph_for_ml(features=True)[source]

Checks if all properties required for machine learning training/inference are set up. An error will be raised if the graph is not correctly set up.

create_graph_schema(create_type_maps=True, nodes=None)[source]

Create graph schema in dict of dict format from current graph.

Note that we assume there is at most one edge of a particular edge type between any pair of nodes, so an edge is uniquely specified by node0, node1 and edge type.

Parameters:
  • create_type_maps (bool) – If True, quick-lookup maps of node/edge types are created in the schema. This can be slow.
  • nodes (list) – A list of node IDs to use to build schema. This must represent all node types and all edge types in the graph. If specified, create_type_maps must be False. If not specified, all nodes and edges in the graph are used.
Returns:

GraphSchema object.
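
For example, assuming Gs is a StellarGraph as constructed above, the schema can be created and inspected as follows (the node_types and edge_types attributes shown are assumptions about the returned GraphSchema object):

schema = Gs.create_graph_schema(create_type_maps=True)
print(schema.node_types)  # e.g. a list of node type names
print(schema.edge_types)  # e.g. a list of edge type triples
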

get_feature_for_nodes(nodes, node_type=None)[source]

Get the numeric feature vectors for the specified node or nodes. If the node type is not specified, the node types will be found for all nodes. It is therefore important to supply the node_type for this method to be fast.

Parameters:
  • nodes – (list or hashable) Node ID or list of node IDs
  • node_type – (hashable) the type of the nodes.
Returns:

Numpy array containing the node features for the requested nodes.
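
Continuing the illustrative karate-club sketch above, fetching the feature vectors for two nodes might look like:

features = Gs.get_feature_for_nodes([0, 1])
print(features.shape)  # expected (2, 8) for 8-dimensional features
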

info(show_attributes=True, sample=None)[source]

Return an information string summarizing the current graph. This includes node and edge type information and their attributes.

Note: This requires processing all nodes and edges and could take a long time for a large graph.

Parameters:sample (int) – To speed up the graph analysis, use only a random sample of this many nodes and edges.
Returns:An information string.
node_feature_sizes(node_types=None)[source]

Get the feature sizes for the specified node types.

Parameters:node_types – (list) A list of node types. If None all current node types will be used.
Returns:A dictionary of node type and integer feature size.
node_types

Get the node types in the graph.

Returns:set of types
nodes_of_type(node_type=None)[source]

Get the nodes of the graph with the specified node types.

Parameters:node_type – the type of nodes to return.
Returns:A list of node IDs with type node_type
type_for_node(node)[source]

Get the type of the node

Parameters:node – Node ID
Returns:Node type
class stellargraph.core.GraphSchema[source]

Class to encapsulate the schema information for a heterogeneous graph.

Typically this should be created from a StellarGraph object, using the create_graph_schema method.

edge_index(edge_type)[source]

Return edge type index from the type tuple

Parameters:edge_type – Tuple of (node1_type, edge_type, node2_type)
Returns:Numerical edge type index
get_edge_type(edge, index=False)[source]
Return the type of the edge as a triple of
(source_node_type, relation_type, dest_node_type).

The edge is specified as a standard NetworkX multigraph edge triple of (node_id_1, node_id_2, edge_key).

If the graph schema is undirected and there is an edge type for the edge (node_id_2, node_id_1, edge_key), then the edge type will be returned permuted to match the node order of the supplied edge.

Parameters:
  • edge – The edge ID from the original graph as a triple.
  • index – Return a numeric type index if True, otherwise return the type triple.
Returns:

An edge type triple or index.

get_node_type(node, index=False)[source]

Returns the type of the node specified by node ID.

Parameters:
  • node – The node ID from the original graph
  • index – Return a numeric type index if True, otherwise return the type name.
Returns:

A node type name or index

is_of_edge_type(edge, edge_type, index=False)[source]

Tests if an edge is of the given edge type.

The edge is specified as a standard NetworkX multigraph edge triple of (node_id_1, node_id_2, edge_key).

If the graph schema is undirected then the ordering of the nodes of the edge type doesn’t matter.

Parameters:
  • edge – The edge ID from the original graph as a triple.
  • edge_type – The type of the edge as a tuple or EdgeType triple.
Returns:

True if the edge is of the given type

node_index(name)[source]

Return node type index from the type name

Parameters:name – name of the node type.
Returns:Numerical node type index
sampling_layout(head_node_types, num_samples)[source]

For a sampling scheme with a list of head node types and the number of samples per hop, return the map from the actual sample index to the adjacency list index.

Parameters:
  • head_node_types – A list of node types of the head nodes.
  • num_samples – A list of integers that are the number of neighbours to sample at each hop.
Returns:

A list containing, for each head node type, a list consisting of tuples of (node_type, sampling_index). The list matches the list given by the method type_adjacency_list(…) and can be used to reformat the samples given by SampledBreadthFirstWalk to that expected by the HinSAGE model.

sampling_tree(head_node_types, n_hops)[source]

Returns a sampling tree for the specified head node types for neighbours up to n_hops away. A unique ID is created for each sampling node.

Parameters:
  • head_node_types – An iterable of the types of the head nodes
  • n_hops – The number of hops away
Returns:

A list of the form [(type_adjacency_index, node_type, [children]), …] where children are (type_adjacency_index, node_type, [children])

type_adjacency_list(head_node_types, n_hops)[source]

Creates a BFS sampling tree as an adjacency list from head node types.

Each list element is a tuple of:

(node_type, [child_1, child_2, ...])

where child_k is an index pointing to the child of the current node.

Note that the children are ordered by edge type.

Parameters:
  • head_node_types – Node types of head nodes.
  • n_hops – How many hops to sample.
Returns:

List of form [ (node_type, [children]), ...]

Data

The data package contains classes and functions to read, process, and query graph data.

class stellargraph.data.UniformRandomWalk(graph, graph_schema=None, seed=None)[source]

Performs uniform random walks on the given graph

run(nodes=None, n=None, length=None, seed=None)[source]

Perform a random walk starting from the root nodes.

Parameters:
  • nodes – <list> The root nodes as a list of node IDs
  • n – <int> Total number of random walks per root node
  • length – <int> Maximum length of each random walk
  • seed – <int> Random number generator seed; default is None
Returns:

<list> List of lists of node IDs for each of the random walks
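
A minimal usage sketch, assuming Gs is a StellarGraph whose node IDs can be listed with Gs.nodes():

from stellargraph.data import UniformRandomWalk

rw = UniformRandomWalk(Gs)
# 2 walks of up to 5 nodes from each of the first 10 nodes
walks = rw.run(nodes=list(Gs.nodes())[:10], n=2, length=5, seed=42)
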

class stellargraph.data.BiasedRandomWalk(graph, graph_schema=None, seed=None)[source]

Performs biased second order random walks (like those used in the Node2Vec algorithm, https://snap.stanford.edu/node2vec/), controlled by the values of two parameters p and q.

run(nodes=None, n=None, p=1.0, q=1.0, length=None, seed=None, weighted=False, edge_weight_label='weight')[source]

Perform a random walk starting from the root nodes.

Parameters:
  • nodes – <list> The root nodes as a list of node IDs
  • n – <int> Total number of random walks per root node
  • p – <float> Defines probability, 1/p, of returning to source node
  • q – <float> Defines probability, 1/q, for moving to a node away from the source node
  • length – <int> Maximum length of each random walk
  • seed – <int> Random number generator seed; default is None
  • weighted – <False or True> Indicates whether the walk is unweighted or weighted
  • edge_weight_label – <string> Label of the edge weight property.
Returns:

<list> List of lists of node IDs for each of the random walks
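
An illustrative sketch of Node2Vec-style walks, where a smaller p encourages returning to the previous node and a smaller q encourages outward exploration (Gs is assumed to be a StellarGraph):

from stellargraph.data import BiasedRandomWalk

walks = BiasedRandomWalk(Gs).run(
    nodes=list(Gs.nodes()), n=10, length=80, p=0.5, q=2.0, seed=42
)
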

class stellargraph.data.UniformRandomMetaPathWalk(graph, graph_schema=None, seed=None)[source]

For heterogeneous graphs, it performs uniform random walks based on given metapaths.

run(nodes=None, n=None, length=None, metapaths=None, node_type_attribute='label', seed=None)[source]

Performs metapath-driven uniform random walks on heterogeneous graphs.

Parameters:
  • nodes – <list> The root nodes as a list of node IDs
  • n – <int> Total number of random walks per root node
  • length – <int> Maximum length of each random walk
  • metapaths – <list> List of lists of node labels that specify a metapath schema, e.g., [['Author', 'Paper', 'Author'], ['Author', 'Paper', 'Venue', 'Paper', 'Author']] specifies two metapath schemas of length 3 and 5 respectively.
  • node_type_attribute – <str> The node attribute name that stores the node’s type
  • seed – <int> Random number generator seed; default is None
Returns:

<list> List of lists of node IDs for each of the random walks generated
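
A sketch for a hypothetical heterogeneous graph with 'user' and 'group' node types; the graph G_hetero and the list user_node_ids are assumptions, and node types are assumed to be stored in a 'label' attribute:

from stellargraph.data import UniformRandomMetaPathWalk

metapaths = [["user", "group", "user"]]
walker = UniformRandomMetaPathWalk(G_hetero)
walks = walker.run(
    nodes=user_node_ids, n=5, length=7,
    metapaths=metapaths, node_type_attribute="label",
)
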

class stellargraph.data.SampledBreadthFirstWalk(graph, graph_schema=None, seed=None)[source]

Breadth First Walk that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes.

run(nodes=None, n=1, n_size=None, seed=None)[source]

Performs a sampled breadth-first walk starting from the root nodes.

Parameters:
  • nodes – <list> A list of root node IDs such that from each node n BFWs will be generated up to the given depth d.
  • n – <int> Number of walks per node ID.
  • n_size – <list> The number of neighbouring nodes to expand at each depth of the walk. Sampling of neighbours with replacement is always used regardless of the node degree and number of neighbours requested.
  • seed – <int> Random number generator seed; default is None
Returns:

A list of lists such that each list element is a sequence of ids corresponding to a BFW.
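
For instance, to sample a two-hop neighbourhood with 10 then 5 neighbours per hop from two illustrative root nodes of a StellarGraph Gs:

from stellargraph.data import SampledBreadthFirstWalk

bfw = SampledBreadthFirstWalk(Gs)
subgraphs = bfw.run(nodes=[0, 1], n=1, n_size=[10, 5], seed=42)
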

class stellargraph.data.SampledHeterogeneousBreadthFirstWalk(graph, graph_schema=None, seed=None)[source]

Breadth First Walk for heterogeneous graphs that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes.

run(nodes=None, n=1, n_size=None, seed=None)[source]

Performs a sampled breadth-first walk starting from the root nodes.

Parameters:
  • nodes – <list> A list of root node ids such that from each node n BFWs will be generated with the number of samples per hop specified in n_size.
  • n – <int> Number of walks per node id.
  • n_size – <list> The number of neighbouring nodes to expand at each depth of the walk. Sampling of neighbours with replacement is always used regardless of the node degree and number of neighbours requested.
  • graph_schema – <GraphSchema> If None then the graph schema is extracted from self.graph
  • seed – <int> Random number generator seed; default is None
Returns:

A list of lists such that each list element is a sequence of ids corresponding to a sampled Heterogeneous BFW.

class stellargraph.data.EdgeSplitter(g, g_master=None)[source]

Class for generating training and test data for link prediction in graphs.

The class requires as input a graph (in networkx format) and a percentage, relative to the total number of edges in the graph, of positive and negative edges to sample. For heterogeneous graphs, the caller can also specify the type of edge and an edge property to split on. In the latter case, only a date property can be used and it must be in the format dd/mm/yyyy. A date to be used as a threshold value must be given, so that only edges with a date after the threshold are sampled. This affects only the sampling of positive edges.

Negative edges are sampled at random (for the ‘global’ method) by uniformly selecting two nodes in the graph and then checking whether these nodes are connected. If they are not, the pair of nodes is considered a negative sample; otherwise, it is discarded and the process repeats. Alternatively (for the ‘local’ method), negative edges are sampled using a DFS search at a distance from the source node (selected uniformly at random from all nodes in the graph) sampled according to a given set of probabilities.

Positive edges can be sampled so that when they are subsequently removed from the graph, the reduced graph is either guaranteed, or not guaranteed, to remain connected. In the former case, graph connectivity is maintained by first calculating the minimum spanning tree. The edges that belong to the minimum spanning tree are protected from removal, and therefore cannot be sampled for the training set. The edges that do not belong to the minimum spanning tree are then sampled uniformly at random, until the required number of positive edges have been sampled for the training set. In the latter case, when connectedness of the reduced graph is not guaranteed, positive edges are sampled uniformly at random from all the edges in the graph, regardless of whether they belong to the spanning tree (which is not calculated in this case).

Parameters:
  • g – <StellarGraph or networkx object> The graph to sample edges from.
  • g_master – <StellarGraph or networkx object> The graph representing the original dataset and a superset of the graph g. If it is not None, then when positive and negative edges are sampled, care is taken to make sure that a true positive edge is not sampled as a negative edge.
train_test_split(p=0.5, method='global', probs=None, keep_connected=False, edge_label=None, edge_attribute_label=None, edge_attribute_threshold=None, attribute_is_datetime=None, seed=None)[source]

Generates positive and negative edges and a graph that has the same nodes as the original but the positive edges removed. It can be used to generate data from homogeneous and heterogeneous graphs.

For heterogeneous graphs, positive and negative examples can be generated based on specified edge type or edge type and edge property given a threshold value for the latter.

Parameters:
  • p – <float> Percent of edges to be returned. It is calculated as a function of the total number of edges in the original graph. If the graph is heterogeneous, the percentage is calculated as a function of the total number of edges that satisfy the edge_label, edge_attribute_label and edge_attribute_threshold values given.
  • method – <str> How negative edges are sampled. If ‘global’, then nodes are selected uniformly at random. If ‘local’, then the first node is sampled uniformly from all nodes in the graph, but the second node is chosen to be from the former’s local neighbourhood.
  • probs – <list> The probabilities for sampling a node that is k-hops from the source node, e.g., [0.25, 0.75] means that there is a 0.25 probability that the target node will be 1 hop away from the source node and a 0.75 probability that it will be 2 hops away. This only affects sampling of negative edges if method is set to ‘local’.
  • keep_connected – <True or False> If True then when positive edges are removed care is taken that the reduced graph remains connected. If False, positive edges are removed without guaranteeing the connectivity of the reduced graph.
  • edge_label – <str> If splitting based on edge type, then this parameter specifies the key for the type of edges to split on.
  • edge_attribute_label – <str> The label for the edge attribute to split on.
  • edge_attribute_threshold – <str> The threshold value applied to the edge attribute when sampling positive examples.
  • attribute_is_datetime – <boolean> Specifies if edge attribute is datetime or not.
  • seed – <int> seed for random number generator, positive int or 0
Returns:

The reduced graph (positive edges removed) and the edge data as 2 numpy arrays, the first array of dimensionality Nx2 (where N is the number of edges) holding the node ids for the edges and the second of dimensionality Nx1 holding the edge labels, 0 for negative and 1 for positive examples.
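
A typical usage sketch, holding out 10% of edges as positive examples plus an equal number of negative examples while keeping the reduced graph connected (Gs is an illustrative graph):

from stellargraph.data import EdgeSplitter

splitter = EdgeSplitter(Gs)
G_reduced, edge_ids, edge_labels = splitter.train_test_split(
    p=0.1, method="global", keep_connected=True, seed=42
)
# edge_ids holds (source, target) node ID pairs; edge_labels holds 1 for
# positive and 0 for negative examples
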

stellargraph.data.from_epgm(epgm_location, dataset_name=None, directed=False)[source]

Imports a graph stored in EPGM format to a NetworkX object

Parameters:
  • epgm_location (str) – The directory containing the EPGM data
  • dataset_name (str) – The name of the dataset to import
  • directed (bool) – If True, load as a directed graph, otherwise load as an undirected graph
Returns:

A NetworkX graph containing the data for the EPGM-stored graph.
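
For example, loading a hypothetical EPGM directory (the path and dataset name below are placeholders):

from stellargraph.data import from_epgm

nx_graph = from_epgm("/path/to/epgm_graph", dataset_name="my_dataset", directed=False)
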

stellargraph.data.load_dataset_BlogCatalog3(location)[source]

This method loads the BlogCatalog3 network dataset (http://socialcomputing.asu.edu/datasets/BlogCatalog3) into a networkx undirected heterogeneous graph.

The graph has two types of nodes, ‘user’ and ‘group’, and two types of edges, ‘friend’ and ‘belongs’. The ‘friend’ edges connect two ‘user’ nodes and the ‘belongs’ edges connect ‘user’ and ‘group’ nodes.

The node and edge types are not included in the dataset, which is a collection of node and group IDs along with the list of edges in the graph.

Important note about the node IDs: The dataset uses integers for node IDs. However, the integers from 1 to 39 are used as IDs for both users and groups. This would cause confusion when constructing the networkx graph object. As a result, we convert all IDs to strings and append the character ‘u’ to the integer ID for user nodes and the character ‘g’ to the integer ID for group nodes.

Parameters:location – <str> The directory where the dataset is located
Returns:A networkx Graph object.
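
A brief sketch, where the location string is a placeholder for the directory containing the extracted dataset files:

from stellargraph.data import load_dataset_BlogCatalog3

g = load_dataset_BlogCatalog3("~/data/BlogCatalog-dataset")
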

Generators

The mapper package contains classes and functions to map graph data to neural network inputs

class stellargraph.mapper.FullBatchNodeGenerator(G, name=None, func_opt=None, **kwargs)[source]

A data generator for node prediction with homogeneous full-batch models, e.g., GCN, GAT. The supplied graph G should be a StellarGraph object that is ready for machine learning. Currently the model requires node features for all nodes in the graph. Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.

Example:

G_generator = FullBatchNodeGenerator(G)
train_data_gen = G_generator.flow(node_ids, node_targets)
Parameters:
  • G (StellarGraphBase) – a machine-learning StellarGraph-type graph
  • name (str) – an optional name of the generator
  • func_opt – an optional function to apply on features and adjacency matrix (declared func_opt(features, Aadj, **kwargs))
  • kwargs – additional parameters for func_opt function
class stellargraph.mapper.GraphSAGENodeGenerator(G, batch_size, num_samples, schema=None, seed=None, name=None)[source]

A data generator for node prediction with Homogeneous GraphSAGE models

At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.

The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model requires node features for all nodes in the graph.

Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.

Example:

G_generator = GraphSAGENodeGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(node_ids)
Parameters:
  • G (StellarGraph) – The machine-learning ready graph.
  • batch_size (int) – Size of batch to return.
  • num_samples (list) – The number of samples per layer (hop) to take.
  • schema (GraphSchema) – [Optional] Graph schema for G.
  • seed (int) – [Optional] Random seed for the node sampler.
  • name (str or None) – Name of the generator (optional)
flow(node_ids, targets=None, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.

The node IDs are the nodes to train on or perform inference on: the embeddings calculated for these nodes are passed to the downstream task. These are a subset of the nodes in the graph.

The targets are an array of numeric targets corresponding to the supplied node_ids to be used by the downstream task. They should be given in the same order as the list of node IDs. If they are not specified (for example, for use in prediction), the targets will not be available to the downstream task.

Note that the shuffle argument should be True for training and False for prediction.

Parameters:
  • node_ids – an iterable of node IDs
  • targets – a 2D array of numeric targets with shape (len(node_ids), target_size)
  • shuffle (bool) – If True the node_ids will be shuffled at each epoch, if False the node_ids will be processed in order.
Returns:

A NodeSequence object to use with the GraphSAGE model in Keras methods fit_generator, evaluate_generator, and predict_generator
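
An illustrative sketch of creating training and prediction sequences; the node ID lists and the one-hot train_targets array are assumptions:

generator = GraphSAGENodeGenerator(G, batch_size=50, num_samples=[10, 5])
train_gen = generator.flow(train_node_ids, train_targets, shuffle=True)
test_gen = generator.flow(test_node_ids)  # no targets when predicting
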

flow_from_dataframe(node_targets, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.

Parameters:
  • node_targets – a Pandas DataFrame of numeric targets indexed by the node ID for that target.
  • shuffle (bool) – If True the node_ids will be shuffled at each epoch, if False the node_ids will be processed in order.
Returns:

A NodeSequence object to use with the GraphSAGE model in Keras methods fit_generator, evaluate_generator, and predict_generator

sample_features(head_nodes, sampling_schema)[source]

Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.

Parameters:
  • head_nodes – An iterable of head nodes to perform sampling on.
  • sampling_schema – The sampling schema for the model
Returns:

A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.

class stellargraph.mapper.GraphSAGELinkGenerator(G, batch_size, num_samples, seed=None, name=None)[source]

A data generator for link prediction with Homogeneous GraphSAGE models

At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.

The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model requires node features for all nodes in the graph.

Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.

Example:

G_generator = GraphSAGELinkGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(edge_ids)
Parameters:
  • G (StellarGraph) – A machine-learning ready graph.
  • batch_size (int) – Size of batch of links to return.
  • num_samples (list) – List of number of neighbour node samples per GraphSAGE layer (hop) to take.
  • seed (int or str) – Random seed for the sampling methods.
  • name (str), optional – Name of the generator.
flow(link_ids, targets=None, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied edge IDs and numeric targets.

The edge IDs are the edges to train on or perform inference on. They are expected to be tuples of (source_id, destination_id).

The targets are an array of numeric targets corresponding to the supplied link_ids to be used by the downstream task. They should be given in the same order as the list of link IDs. If they are not specified (for example, for use in prediction), the targets will not be available to the downstream task.

Note that the shuffle argument should be True for training and False for prediction.

Parameters:
  • link_ids – an iterable of (src_id, dst_id) tuples specifying the edges.
  • targets – a 2D array of numeric targets with shape (len(link_ids), target_size)
  • shuffle (bool) – If True the link_ids will be shuffled at each epoch, if False the link_ids will be processed in order.
Returns:

A LinkSequence object to use with the GraphSAGE model methods fit_generator(), evaluate_generator(), and predict_generator()
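
For example, feeding the edge IDs and labels produced by EdgeSplitter.train_test_split into the generator; edge_ids_train and edge_labels_train are assumed to come from such a split:

link_gen = GraphSAGELinkGenerator(G, batch_size=20, num_samples=[10, 10])
train_flow = link_gen.flow(edge_ids_train, edge_labels_train, shuffle=True)
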

sample_features(head_links, sampling_schema)[source]

Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.

Parameters:
  • head_links – An iterable of edges to perform sampling for.
  • sampling_schema – The sampling schema for the model
Returns:

A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.

class stellargraph.mapper.HinSAGENodeGenerator(G, batch_size, num_samples, schema=None, seed=None, name=None)[source]

Keras-compatible data mapper for Heterogeneous GraphSAGE (HinSAGE)

At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the HinSAGE model.

The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model requires node features for all nodes in the graph.

Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.

Note that the shuffle argument should be True for training and False for prediction.

Example:

G_generator = HinSAGENodeGenerator(G, 50, [10,10])
data_gen = G_generator.flow(node_ids)
Parameters:
  • G (StellarGraph) – The machine-learning ready graph
  • batch_size (int) – Size of batch to return
  • num_samples (list) – The number of samples per layer (hop) to take
  • schema (GraphSchema) – [Optional] Graph schema for G.
  • seed (int) – Random seed for the node sampler
  • name (str) – Name of the generator.
flow(node_ids, targets=None, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.

The node IDs are the nodes to train on or perform inference on: the embeddings calculated for these nodes are passed to the downstream task. These are a subset of the nodes in the graph.

The targets are an array of numeric targets corresponding to the supplied node_ids to be used by the downstream task. They should be given in the same order as the list of node IDs. If they are not specified (for example, for use in prediction), the targets will not be available to the downstream task.

Note that the shuffle argument should be True for training and False for prediction.

Parameters:
  • node_ids (iterable) – The head node IDs
  • targets (Numpy array) – a 2D array of numeric targets with shape (len(node_ids), target_size)
  • node_type (str) – The target node type, if not given the node type will be inferred from the graph.
  • shuffle (bool) – If True the node_ids will be shuffled at each epoch, if False the node_ids will be processed in order.
Returns:

A NodeSequence object to use with the GraphSAGE model in Keras methods fit_generator, evaluate_generator, and predict_generator.

flow_from_dataframe(node_targets, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.

Note that the shuffle argument should be True for training and False for prediction.

Parameters:
  • node_targets (DataFrame) – Numeric targets indexed by the node ID for that target.
  • node_type (str) – The target node type, if not given the node type will be inferred from the graph.
  • shuffle (bool) – If True the node_ids will be shuffled at each epoch, if False the node_ids will be processed in order.
Returns:

A NodeSequence object to use with the GraphSAGE model in Keras methods fit_generator, evaluate_generator, and predict_generator.

sample_features(head_nodes, sampling_schema)[source]

Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.

Parameters:
  • head_nodes – An iterable of head nodes to perform sampling on.
  • sampling_schema – The sampling schema for the HinSAGE model; this can be generated by the GraphSchema object.
Returns:

A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.

class stellargraph.mapper.HinSAGELinkGenerator(G, batch_size, num_samples, seed=None, name=None)[source]

A data generator for link prediction with Heterogeneous HinSAGE models

At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.

The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model requires node features for all nodes in the graph.

Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.

Note that you don’t need to pass link_type (target link type) to the link mapper, considering that:

  • The mapper actually only cares about (src, dst) node types, and these can be inferred from the passed link IDs (although this might be expensive, as it requires parsing the link IDs passed, yet only once)
  • It’s possible to do link prediction on a graph where that link type is completely removed from the graph (e.g., “same_as” links in ER)

Example:

G_generator = HinSAGELinkGenerator(G, 50, [10,10])
data_gen = G_generator.flow(edge_ids)
Parameters:
  • G (StellarGraph) – A machine-learning ready graph.
  • batch_size (int) – Size of batch of links to return.
  • num_samples (list) – List of number of neighbour node samples per GraphSAGE layer (hop) to take.
  • seed (int or str) – Random seed for the sampling methods.
  • name (str), optional – Name of the generator.
flow(link_ids, targets=None, shuffle=False)[source]

Creates a generator/sequence object for training or evaluation with the supplied edge IDs and numeric targets.

The edge IDs are the edges to train on or perform inference on. They are expected to be tuples of (source_id, destination_id).

The targets are an array of numeric targets corresponding to the supplied link_ids to be used by the downstream task. They should be given in the same order as the list of link IDs. If they are not specified (for example, for use in prediction), the targets will not be available to the downstream task.

Note that the shuffle argument should be True for training and False for prediction.

Parameters:
  • link_ids – an iterable of (src_id, dst_id) tuples specifying the edges.
  • targets – a 2D array of numeric targets with shape (len(link_ids), target_size)
  • shuffle (bool) – If True the link_ids will be shuffled at each epoch, if False the link_ids will be processed in order.
Returns:

A LinkSequence object to use with the GraphSAGE model methods fit_generator(), evaluate_generator(), and predict_generator()

sample_features(head_links, sampling_schema)[source]

Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.

Parameters:
  • head_links – An iterable of edges to perform sampling for.
  • sampling_schema – The sampling schema for the model
Returns:

A list of the same length as num_samples of collected features from the sampled nodes of shape:

(len(head_nodes), num_sampled_at_layer, feature_size)

where num_sampled_at_layer is the cumulative product of num_samples for that layer.

GraphSAGE model

GraphSAGE and compatible aggregator layers

class stellargraph.layer.graphsage.GraphSAGE(layer_sizes, generator=None, n_samples=None, input_dim=None, aggregator=None, bias=True, dropout=0.0, normalize='l2')[source]

Implementation of the GraphSAGE algorithm of Hamilton et al. with Keras layers. See: http://snap.stanford.edu/graphsage/

The model minimally requires specification of the layer sizes as a list of ints corresponding to the feature dimensions for each hidden layer and a generator object.

Different aggregators can also be specified with the aggregator argument, which should be the aggregator class, either MeanAggregator, MeanPoolingAggregator, MaxPoolingAggregator, or AttentionalAggregator.

Parameters:
  • layer_sizes (list) – Hidden feature dimensions for each layer
  • generator (Sequence) – A NodeSequence or LinkSequence. If specified the n_samples and input_dim will be taken from this object.
  • n_samples (list) – (Optional: needs to be specified if no mapper is provided.) The number of samples per layer in the model.
  • input_dim (int) – The dimensions of the node features used as input to the model.
  • aggregator (class) – The GraphSAGE aggregator to use. Defaults to the MeanAggregator.
  • bias (bool) – If True a bias vector is learnt for each layer in the GraphSAGE model
  • dropout (float) – The dropout supplied to each layer in the GraphSAGE model.
  • normalize (str or None) – The normalization used after each layer, defaults to L2 normalization.
default_model(flatten_output=False)[source]

Return model with default inputs

Parameters:flatten_output – The GraphSAGE model will return an output tensor of form (batch_size, 1, feature_size). If this flag is True, the output will be of size (batch_size, 1*feature_size)
Returns:(x_inp, x_out) where x_inp is a list of Keras input tensors for the specified GraphSAGE model and x_out is the Keras tensor for the GraphSAGE model output.
Return type:tuple
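
A minimal node-classification sketch, assuming train_gen is a NodeSequence from GraphSAGENodeGenerator.flow and num_classes is the number of target classes; the Dense prediction layer and the optimizer settings are illustrative additions, not part of the GraphSAGE class:

from keras import layers, optimizers, Model
from stellargraph.layer.graphsage import GraphSAGE

graphsage = GraphSAGE(layer_sizes=[32, 32], generator=train_gen, bias=True, dropout=0.5)
x_inp, x_out = graphsage.default_model(flatten_output=True)
prediction = layers.Dense(units=num_classes, activation="softmax")(x_out)
model = Model(inputs=x_inp, outputs=prediction)
model.compile(
    optimizer=optimizers.Adam(lr=0.005),
    loss="categorical_crossentropy",
    metrics=["acc"],
)
model.fit_generator(train_gen, epochs=20)
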
class stellargraph.layer.graphsage.MeanAggregator(output_dim: int = 0, bias: bool = False, act: Callable = 'relu', **kwargs)[source]

Mean Aggregator for GraphSAGE implemented with Keras base layer

Parameters:
  • output_dim (int) – Output dimension
  • bias (bool) – Optional bias
  • act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
aggregate_neighbours(x_neigh)[source]

Override with a method to aggregate tensors over neighbourhood.

build(input_shape)[source]

Builds layer

Parameters:input_shape (list of list of int) – Shape of input tensors for self and neighbour
class stellargraph.layer.graphsage.MeanPoolingAggregator(*args, **kwargs)[source]

Mean Pooling Aggregator for GraphSAGE implemented with Keras base layer

Implements the aggregator of Eq. (3) in Hamilton et al. (2017), with max pooling replaced with mean pooling

Parameters:
  • output_dim (int) – Output dimension
  • bias (bool) – Optional bias
  • act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
aggregate_neighbours(x_neigh)[source]

Aggregates the neighbour tensors by mean-pooling of neighbours

Parameters:x_neigh (Tensor) – Neighbour tensor of shape (n_batch, n_head, n_neighbour, n_feat)
Returns:Aggregated neighbour tensor of shape (n_batch, n_head, n_feat)
build(input_shape)[source]

Builds layer

Parameters:input_shape (list of list of int) – Shape of input tensors for self and neighbour
class stellargraph.layer.graphsage.MaxPoolingAggregator(*args, **kwargs)[source]

Max Pooling Aggregator for GraphSAGE implemented with Keras base layer

Implements the aggregator of Eq. (3) in Hamilton et al. (2017)

Parameters:
  • output_dim (int) – Output dimension
  • bias (bool) – Optional bias
  • act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
aggregate_neighbours(x_neigh)[source]

Aggregates the neighbour tensors by max-pooling of neighbours

Parameters:x_neigh (Tensor) – Neighbour tensor of shape (n_batch, n_head, n_neighbour, n_feat)
Returns:Aggregated neighbour tensor of shape (n_batch, n_head, n_feat)
build(input_shape)[source]

Builds layer

Parameters:input_shape (list of list of int) – Shape of input tensors for self and neighbour
class stellargraph.layer.graphsage.AttentionalAggregator(*args, **kwargs)[source]

Attentional Aggregator for GraphSAGE implemented with Keras base layer

Implements the aggregator of Veličković et al. “Graph Attention Networks” ICLR 2018

Parameters:
  • output_dim (int) – Output dimension
  • bias (bool) – Optional bias
  • act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
build(input_shape)[source]

Builds layer

Parameters:input_shape (list of list of int) – Shape of input tensors for self and neighbour
call(x, **kwargs)[source]

Apply aggregator on input tensors, x

Parameters:x (List[Tensor]) – Tensors giving self and neighbour features x[0]: self Tensor (batch_size, head size, feature_size) x[1]: neighbour Tensor (batch_size, head size, neighbours, feature_size)
Returns:Keras Tensor representing the aggregated embeddings in the input.
weight_output_size()[source]

Calculates the output size, according to whether the model is building an MLP and the method (concat or sum).

Returns:size of the weight outputs.
Return type:int

HinSAGE model

Heterogeneous GraphSAGE and compatible aggregator layers

class stellargraph.layer.hinsage.HinSAGE(layer_sizes, generator=None, n_samples=None, input_neighbor_tree=None, input_dim=None, aggregator=None, bias=True, dropout=0.0, normalize='l2')[source]

Implementation of the GraphSAGE algorithm extended for heterogeneous graphs with Keras layers.

default_model(flatten_output=False)[source]

Return model with default inputs

Parameters:flatten_output (bool) – The HinSAGE model returns an output tensor of form (batch_size, 1, feature_size) - if this flag is True, the output will be resized to (batch_size, feature_size)
Returns:(x_inp, x_out) where x_inp is a list of Keras input tensors for the specified HinSAGE model and x_out is the Keras tensor for the HinSAGE model output.
Return type:tuple
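
A brief sketch of building a HinSAGE node model, assuming train_gen is a sequence produced by HinSAGENodeGenerator.flow; the downstream Dense layer is an illustrative addition:

from keras import layers, Model
from stellargraph.layer.hinsage import HinSAGE

hinsage = HinSAGE(layer_sizes=[32, 32], generator=train_gen, bias=True, dropout=0.5)
x_inp, x_out = hinsage.default_model(flatten_output=True)
prediction = layers.Dense(units=1, activation="sigmoid")(x_out)
model = Model(inputs=x_inp, outputs=prediction)
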
class stellargraph.layer.hinsage.MeanHinAggregator(output_dim: int = 0, bias: bool = False, act: Union[Callable, AnyStr] = 'relu', **kwargs)[source]

Mean Aggregator for HinSAGE implemented with Keras base layer

Parameters:
  • output_dim (int) – Output dimension
  • bias (bool) – Use bias in layer or not (Default False)
  • act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
build(input_shape)[source]

Builds layer

Parameters:input_shape (list of list of int) – Shape of input per neighbour type.
call(x, **kwargs)[source]

Apply MeanAggregation on input tensors, x

Parameters:x – List of Keras Tensors x[0] = tensor of self features shape (n_batch, n_head, n_feat) x[1+r] = tensors of neighbour features each of shape (n_batch, n_head, n_neighbour[r], n_feat[r])
Returns:Keras Tensor representing the aggregated embeddings in the input.
compute_output_shape(input_shape)[source]

Computes the output shape of the layer. Assumes that the layer will be built to match the input shape provided.

Parameters:input_shape (tuple of ints) – Shape tuples can include None for free dimensions, instead of an integer.
Returns:An output shape tuple.
get_config()[source]

Gets class configuration for Keras serialization

GCN model

class stellargraph.layer.gcn.GraphConvolution(units, support=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, **kwargs)[source]

Implementation of the graph convolution layer as in https://arxiv.org/abs/1609.02907

build(input_shapes)[source]

Creates the layer weights.

Must be implemented on all layers that have weights.

# Arguments
input_shape: Keras tensor (future input to layer)
or list/tuple of Keras tensors to reference for weight shape computations.
call(inputs, mask=None)[source]

This is where the layer’s logic lives.

# Arguments
inputs: Input tensor, or list/tuple of input tensors. **kwargs: Additional keyword arguments.
# Returns
A tensor or list/tuple of tensors.
compute_output_shape(input_shapes)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers)
or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Gets class configuration for Keras serialization

class stellargraph.layer.gcn.GCN(layer_sizes, activations, generator, bias=True, dropout=0.0, kernel_regularizer=None, **kwargs)[source]

To create GCN layers with Keras layers.

link_model()[source]

Builds a GCN model for link (node pair) prediction.

Returns:
(x_inp, x_out) where x_inp is a list of Keras input tensors for the specified GCN model and x_out is a Keras tensor for the GCN model output.
Return type:tuple

node_model()[source]

Builds a GCN model for node prediction

Returns:
(x_inp, x_out) where x_inp is a list of Keras input tensors for the specified GCN model and x_out is a Keras tensor for the GCN model output.
Return type:tuple
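
An end-to-end sketch for node classification with GCN, assuming G is a feature-equipped StellarGraph, train_node_ids is a list of node IDs and train_targets is a one-hot numpy array; the layer sizes, activations and compile settings are illustrative choices:

from keras import optimizers, Model
from stellargraph.mapper import FullBatchNodeGenerator
from stellargraph.layer.gcn import GCN

generator = FullBatchNodeGenerator(G)
train_gen = generator.flow(train_node_ids, train_targets)

gcn = GCN(
    layer_sizes=[16, train_targets.shape[1]],
    activations=["relu", "softmax"],
    generator=generator,
    dropout=0.5,
)
x_inp, x_out = gcn.node_model()
model = Model(inputs=x_inp, outputs=x_out)
model.compile(
    optimizer=optimizers.Adam(lr=0.01),
    loss="categorical_crossentropy",
    metrics=["acc"],
)
model.fit_generator(train_gen, epochs=50)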