StellarGraph API¶
Core¶
-
class
stellargraph.
GraphSchema
(is_directed, node_types, edge_types, schema)[source]¶ Class to encapsulate the schema information for a heterogeneous graph.
Typically this should be created from a StellarGraph object, using the create_graph_schema() method.
-
edge_index
(edge_type)[source]¶ Return edge type index from the type tuple
- Parameters
edge_type – Tuple of (node1_type, edge_type, node2_type)
- Returns
Numerical edge type index
-
node_index
(name)[source]¶ Return node type index from the type name
- Parameters
name – Name of the node type.
- Returns
Numerical node type index
-
sampling_layout
(head_node_types, num_samples)[source]¶ For a sampling scheme with a list of head node types and the number of samples per hop, return the map from the actual sample index to the adjacency list index.
- Parameters
head_node_types – A list of node types of the head nodes.
num_samples – A list of integers that are the number of neighbours to sample at each hop.
- Returns
A list containing, for each head node type, a list consisting of tuples of (node_type, sampling_index). The list matches the list given by the method type_adjacency_list(…) and can be used to reformat the samples given by SampledBreadthFirstWalk to that expected by the HinSAGE model.
-
sampling_tree
(head_node_types, n_hops)[source]¶ Returns a sampling tree for the specified head node types for neighbours up to n_hops away. A unique ID is created for each sampling node.
- Parameters
head_node_types – An iterable of the types of the head nodes
n_hops – The number of hops away
- Returns
A list of the form [(type_adjacency_index, node_type, [children]), …] where children are (type_adjacency_index, node_type, [children])
-
type_adjacency_list
(head_node_types, n_hops)[source]¶ Creates a BFS sampling tree as an adjacency list from head node types.
Each list element is a tuple of:
(node_type, [child_1, child_2, ...])
where child_k is an index pointing to the child of the current node.
Note that the children are ordered by edge type.
- Parameters
head_node_types – Node types of head nodes.
n_hops – How many hops to sample.
- Returns
List of form
[ (node_type, [children]), ...]
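To make the shape of this list concrete, the following plain-Python sketch builds a breadth-first type adjacency list from a hypothetical schema. The schema dictionary and the type_adjacency_sketch function are assumptions made for this illustration; they are not part of StellarGraph, and the library's own ordering and internals may differ.

```python
from collections import deque

# Hypothetical schema: node type -> list of (edge_type, neighbour_node_type).
schema = {
    "user": [("rates", "movie"), ("friend", "user")],
    "movie": [("rates", "user")],
}


def type_adjacency_sketch(schema, head_node_types, n_hops):
    """Return [(node_type, [child_index, ...]), ...] in breadth-first order."""
    result = []
    queue = deque()
    for node_type in head_node_types:
        result.append((node_type, []))
        queue.append((len(result) - 1, node_type, 0))
    while queue:
        index, node_type, depth = queue.popleft()
        if depth == n_hops:
            continue  # nodes at the final hop have no children
        # Children are visited sorted by edge type, mirroring the note above.
        for edge_type, child_type in sorted(schema[node_type]):
            result.append((child_type, []))
            child_index = len(result) - 1
            result[index][1].append(child_index)
            queue.append((child_index, child_type, depth + 1))
    return result


layout = type_adjacency_sketch(schema, ["user"], n_hops=2)
for position, entry in enumerate(layout):
    print(position, entry)
```

Each child index points back into the same list, so the whole sampling tree can be stored flat.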
-
-
class
stellargraph.
IndexedArray
(values=None, index=None)[source]¶ An array where the first dimension is indexed.
This is a reduced Pandas DataFrame. It has:
multidimensional data support, where each element values[idx, ...] can be a vector, matrix or even higher rank object
a requirement that all values have the same type
labels for the elements of the first axis e.g. index[0] is the label for the values[0, ...] element.
no labels for other axes
less overhead (but less API) than a Pandas DataFrame
- Parameters
values (numpy.ndarray, optional) – an array of rank at least 2 of data, where the first axis is indexed.
index (sequence, optional) – a sequence of labels or IDs, one for each element of the first axis. If not specified, this defaults to sequential integers starting at 0
-
class
stellargraph.
StellarDiGraph
(nodes=None, edges=None, *, source_column='source', target_column='target', edge_weight_column='weight', edge_type_column=None, node_type_default='default', edge_type_default='default', dtype='float32', graph=None, node_type_name='label', edge_type_name='label', node_features=None)[source]¶
-
class
stellargraph.
StellarGraph
(nodes=None, edges=None, *, is_directed=False, source_column='source', target_column='target', edge_weight_column='weight', edge_type_column=None, node_type_default='default', edge_type_default='default', dtype='float32', graph=None, node_type_name='label', edge_type_name='label', node_features=None)[source]¶ StellarGraph class for graph machine learning.
Summary of a StellarGraph and the terminology used:
it stores graph structure, as a collection of nodes and a collection of edges that connect a source node to a target node
each node and edge has an associated type
each node and edge has a numeric vector of features, and the vectors of all nodes or edges with the same type have the same dimension
it is homogeneous if there is only one type of node and one type of edge
it is heterogeneous if it is not homogeneous (more than one type of node, or more than one type of edge)
it is directed if the direction of an edge starting at its source node and finishing at its target node is important
it is undirected if the direction does not matter
every StellarGraph can be a multigraph, meaning there can be multiple edges between any two nodes
To create a StellarGraph object, at a minimum pass the edges as a Pandas DataFrame. Each row of the edges DataFrame represents an edge, where the index is the ID of the edge, and the source and target columns store the node ID of the source and target nodes.
For example, suppose we’re modelling a graph that’s a square with a diagonal:
a -- b
| \  |
|  \ |
d -- c
The DataFrame might look like:
edges = pd.DataFrame(
    {"source": ["a", "b", "c", "d", "a"], "target": ["b", "c", "d", "a", "c"]}
)
If this data represents an undirected graph (the ordering of each edge source/target doesn’t matter):
Gs = StellarGraph(edges=edges)
If this data represents a directed graph (the ordering does matter):
Gs = StellarDiGraph(edges=edges)
One can also pass information about nodes, as either:
a NumPy array, if the node IDs are 0, 1, 2, …
a Pandas DataFrame
Each row of the nodes frame (first dimension of the NumPy array) represents a node in the graph, where the index is the ID of the node. When this node information is not passed (the argument is left as the default), the set of nodes is automatically inferred. This inference in the example above is equivalent to:
nodes = IndexedArray(index=["a", "b", "c", "d"])
Gs = StellarGraph(nodes, edges)
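The inference described above can be pictured as collecting every ID that appears in the source or target columns. The sketch below does this in plain Python for the square-with-diagonal example; infer_node_ids is a hypothetical helper, not a StellarGraph function, and the order in which the library stores inferred nodes is not specified here.

```python
# Edge list for the square-with-diagonal example, as plain (source, target) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]


def infer_node_ids(edge_pairs):
    """Collect every ID used as a source or a target, without duplicates."""
    seen = {}
    for source, target in edge_pairs:
        seen.setdefault(source, None)
        seen.setdefault(target, None)
    return list(seen)  # dictionaries preserve insertion order


node_ids = infer_node_ids(edges)
print(node_ids)
```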
Numeric node features are taken as any columns of the nodes DataFrame. For example, if the graph above has two features x and y associated with each node:
# As an IndexedArray (no column names):
feature_array = np.array([[-1, 0.4], [2, 0.1], [-3, 0.9], [4, 0]])
nodes = IndexedArray(feature_array, index=["a", "b", "c", "d"])
# As a Pandas DataFrame:
nodes = pd.DataFrame(
    {"x": [-1, 2, -3, 4], "y": [0.4, 0.1, 0.9, 0]}, index=["a", "b", "c", "d"]
)
# As a NumPy array:
# Note, edges must change to using 0, 1, 2, 3 (instead of a, b, c, d)
nodes = feature_array
Construction directly from an IndexedArray or NumPy array will have the least overhead, but construction from Pandas allows for convenient data transformation.
Edge weights are taken as the optional weight column of the edges DataFrame:
edges = pd.DataFrame({
    "source": ["a", "b", "c", "d", "a"],
    "target": ["b", "c", "d", "a", "c"],
    "weight": [10, 0.5, 1, 3, 13]
})
Numeric edge features are taken from any columns that do not have a special meaning (that is, excluding source, target and the optional weight or edge_type_column columns). For example, if the graph has weighted edges with two features a and b associated with each edge:
edges = pd.DataFrame({
    "source": ["a", "b", "c", "d", "a"],
    "target": ["b", "c", "d", "a", "c"],
    "weight": [10, 0.5, 1, 3, 13],
    "a": [-1, 2, -3, 4, -5],
    "b": [0.4, 0.1, 0.9, 0, 0.9],
})
Heterogeneous graphs, with multiple node or edge types, can be created by passing multiple IndexedArrays or DataFrames in a dictionary. The dictionary keys are the names/identifiers for the type. For example, if the graph above has node a of type foo, and the rest as type bar, the construction might look like:
foo_nodes = IndexedArray(np.array([[-1]]), index=["a"])
# index is an argument of IndexedArray, not of np.array
bar_nodes = IndexedArray(
    np.array([[0.4, 100], [0.1, 200], [0.9, 300]]), index=["b", "c", "d"]
)
StellarGraph({"foo": foo_nodes, "bar": bar_nodes}, edges)
(One cannot pass multiple NumPy arrays, because the node IDs cannot be inferred properly in this case. The node IDs for a NumPy array can be specified via the IndexedArray type.)
Notice the foo node has one feature x, while the bar nodes have 2 features y and z. A heterogeneous graph can have different features for each type.
Edges of different types can work in the same way. For instance, if edges have different types based on their orientation:
horizontal_edges = pd.DataFrame(
    {"source": ["a", "c"], "target": ["b", "d"]}, index=[0, 2]
)
vertical_edges = pd.DataFrame(
    {"source": ["b", "d"], "target": ["c", "a"]}, index=[1, 3]
)
diagonal_edges = pd.DataFrame({"source": ["a"], "target": ["c"]}, index=[4])
StellarGraph(nodes, {"h": horizontal_edges, "v": vertical_edges, "d": diagonal_edges})
A dictionary can be passed for both arguments:
StellarGraph(
    {"foo": foo_nodes, "bar": bar_nodes},
    {"h": horizontal_edges, "v": vertical_edges, "d": diagonal_edges}
)
Alternatively, a single DataFrame can be provided, with an additional column of the type. This column is specified by passing the edge_type_column argument:
orientation_edges = pd.DataFrame(
    {
        "source": ["a", "b", "c", "d", "a"],
        "target": ["b", "c", "d", "a", "c"],
        "type": ["h", "v", "h", "v", "d"]
    }
)
StellarGraph(nodes, orientation_edges, edge_type_column="type")
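The two ways of describing typed edges carry the same information: a single table with a type column can be regrouped into one table per type. The plain-Python sketch below shows that regrouping for the orientation example, using edge IDs 0 to 4 (the default integer index a DataFrame would assign). The split_by_type helper is hypothetical and only illustrates the correspondence; it is not how StellarGraph processes edge_type_column internally.

```python
from collections import defaultdict

# Rows of the single typed edge table: (edge_id, source, target, type).
orientation_rows = [
    (0, "a", "b", "h"),
    (1, "b", "c", "v"),
    (2, "c", "d", "h"),
    (3, "d", "a", "v"),
    (4, "a", "c", "d"),
]


def split_by_type(rows):
    """Group (edge_id, source, target) triples under their edge type."""
    grouped = defaultdict(list)
    for edge_id, source, target, edge_type in rows:
        grouped[edge_type].append((edge_id, source, target))
    return dict(grouped)


edges_by_type = split_by_type(orientation_rows)
print(edges_by_type)
```

The groups line up with the horizontal, vertical and diagonal tables shown earlier, including their edge IDs.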
Note
The IDs of nodes must be unique across all types: for example, it is an error to have a node 0 of type a, and a node 0 of type b. IDs of edges must also be unique across all types.
This type stores the external IDs for nodes and edges as ilocs. For convenience, methods here will traffic in the external ID values and transparently convert to and from ilocs as required internally. Many of these methods also have a use_ilocs parameter that allows for explicitly switching the methods to consume and return ilocs directly, cutting out the conversion overhead.
See also
The from_networkx() method allows constructing from a NetworkX graph.
The examples of loading data into a StellarGraph from many formats.
- Parameters
nodes (NumPy array, IndexedArray, DataFrame or dict of hashable to IndexedArray or Pandas DataFrame, optional) – Features for every node in the graph. The values are taken as numeric node features of type dtype. If there is only one type of node, a NumPy array, IndexedArray or DataFrame can be passed directly, and the type defaults to the node_type_default parameter. Nodes have an ID taken from the index of the dataframe, and they have to be unique across all types. For nodes with no features, an appropriate value can be created with IndexedArray(index=node_ids), where node_ids is a list of the node IDs. If this is not passed, the nodes will be inferred from edges with no features for each node.
edges (DataFrame or dict of hashable to Pandas DataFrame, optional) – An edge list for each type of edges as a Pandas DataFrame containing a source, target and (optionally) weight column (the names of each are taken from the source_column, target_column and edge_weight_column parameters), along with any feature columns. If there is only one type of edges, a DataFrame can be passed directly, and the type defaults to the edge_type_default parameter. Edges have an ID taken from the index of the dataframe, and they have to be unique across all types.
is_directed (bool, optional) – If True, the data represents a directed multigraph, otherwise an undirected multigraph.
source_column (str, optional) – The name of the column to use as the source node of edges in the edges edge list argument.
target_column (str, optional) – The name of the column to use as the target node of edges in the edges edge list argument.
edge_weight_column (str, optional) – The name of the column in each of the edges DataFrames to use as the weight of edges. If the column does not exist in any of them, it is defaulted to 1.
edge_type_column (str, optional) – The name of the column in the edges DataFrame to use as the edge type (if this is set, edges must be a single DataFrame, not a dictionary).
node_type_default (str, optional) – The default node type to use, if nodes is passed as a DataFrame (not a dict).
edge_type_default (str, optional) – The default edge type to use, if edges is passed as a DataFrame (not a dict).
dtype (numpy data-type, optional) – The numpy data-type to use for the features extracted from each of the nodes DataFrames.
graph – Deprecated, use from_networkx().
node_type_name – Deprecated, use from_networkx().
edge_type_name – Deprecated, use from_networkx().
node_features – Deprecated, use from_networkx().
-
check_graph_for_ml
(features=True, expensive_check=False)[source]¶ Checks if all properties required for machine learning training/inference are set up. An error will be raised if the graph is not correctly set up.
-
connected_components
()[source]¶ Compute the connected components in this graph, ordered by size.
The nodes in the largest component can be computed with nodes = next(graph.connected_components()). The node IDs returned by this method can be used to compute the corresponding subgraph with graph.subgraph(nodes).
For directed graphs, this computes the weakly connected components. This effectively treats each edge as undirected.
- Returns
An iterator over sets of node IDs in each connected component, from the largest (most nodes) to smallest (fewest nodes).
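The behaviour described above (components found by ignoring edge direction, then yielded from largest to smallest) can be sketched in plain Python as follows. The connected_components_sketch function is a hypothetical stand-in written for this illustration, not the library's implementation.

```python
from collections import defaultdict


def connected_components_sketch(edge_pairs, node_ids):
    """Yield sets of node IDs, largest component first.

    Every edge is treated as undirected, which for a directed graph
    corresponds to weakly connected components.
    """
    adjacency = defaultdict(set)
    for source, target in edge_pairs:
        adjacency[source].add(target)
        adjacency[target].add(source)

    unvisited = set(node_ids)
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        components.append(component)

    yield from sorted(components, key=len, reverse=True)


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]

# Because the iterator is ordered by size, next(...) gives the largest component.
largest = next(connected_components_sketch(edges, nodes))
sizes = [len(c) for c in connected_components_sketch(edges, nodes)]
print(largest, sizes)
```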
-
create_graph_schema
(nodes=None)[source]¶ Create graph schema from the current graph.
- Parameters
nodes (list) – A list of node IDs to use to build schema. This must represent all node types and all edge types in the graph. If not specified, all nodes and edges in the graph are used.
- Returns
GraphSchema object.
-
edge_arrays
(include_edge_type=False, include_edge_weight=False, use_ilocs=False) → tuple[source]¶ Obtains the collection of edges in the graph as a tuple of arrays (sources, targets, types, weights).
types and weights will be None if the optional parameters are not specified.
- Parameters
include_edge_type (bool) – A flag that indicates whether to return edge types.
include_edge_weight (bool) – A flag that indicates whether to return edge weights.
use_ilocs (bool) – if True return ilocs for nodes (and edge types)
- Returns
A tuple containing 1D arrays of the source and target nodes (sources, targets, types, weights). Setting include_edge_type and/or include_edge_weight to True will include arrays of edge types and/or edge weights in this tuple, otherwise they will be set to None.
-
edge_feature_shapes
(edge_types=None)[source]¶ Get the feature shapes for the specified edge types.
- Parameters
edge_types (list, optional) – A list of edge types. If None all current edge types will be used.
- Returns
A dictionary of edge type and tuple feature shapes.
-
edge_feature_sizes
(edge_types=None)[source]¶ Get the feature sizes for the specified edge types.
- Parameters
edge_types (list, optional) – A list of edge types. If None all current edge types will be used.
- Returns
A dictionary of edge type and integer feature size.
-
edge_features
(edges=None, edge_type=None, use_ilocs=False)[source]¶ Get the numeric feature vectors for the specified edges or edge type.
For graphs with a single edge type:
graph.edge_features() to retrieve features of all edges, in the same order as graph.edges().
graph.edge_features(edges=some_edge_ids) to retrieve features for each edge in some_edge_ids.
For graphs with multiple edge types:
graph.edge_features(edge_type=some_type) to retrieve features of all edges of type some_type, in the same order as graph.edges(edge_type=some_type).
graph.edge_features(edges=some_edge_ids, edge_type=some_type) to retrieve features for each edge in some_edge_ids. All of the chosen edges must be of type some_type.
graph.edge_features(edges=some_edge_ids) to retrieve features for each edge in some_edge_ids. All of the chosen edges must be of the same type, which will be inferred. This will be slower than providing the edge type explicitly in the previous example.
- Parameters
edges (list or hashable, optional) – Edge ID or list of edge IDs, all of the same type
edge_type (hashable, optional) – the type of the edges.
- Returns
Numpy array containing the edge features for the requested edges or edge type.
-
edge_type_ilocs_to_names
(edge_type_ilocs)[source]¶ Get the names of the specified edge type ilocs.
- Parameters
edge_type_ilocs (sequence of int) – edge type ilocs
- Returns
Numpy array containing the names of the requested edge types.
-
edge_type_names_to_ilocs
(edge_type_names)[source]¶ Get the edge type ilocs for the specified edge types.
- Parameters
edge_type_names (sequence of hashable) – edge types
- Returns
Numpy array containing the ilocs of the requested edge types.
-
property
edge_types
¶ Returns: a sequence of all edge types in the graph
-
edges
(include_edge_type=False, include_edge_weight=False, use_ilocs=False) → Iterable[Any][source]¶ Obtains the collection of edges in the graph.
- Parameters
include_edge_type (bool) – A flag that indicates whether to return edge types of format (node 1, node 2, edge type) or edge pairs of format (node 1, node 2).
include_edge_weight (bool) – A flag that indicates whether to return edge weights. Weights are returned in a separate list.
use_ilocs (bool) – if True return ilocs for nodes (and edge types)
- Returns
The graph edges. If edge weights are included then a tuple of (edges, weights).
-
static
from_networkx
(graph, *, edge_weight_attr='weight', node_type_attr='label', edge_type_attr='label', node_type_default='default', edge_type_default='default', node_features=None, dtype='float32')[source]¶ Construct a StellarGraph object from a NetworkX graph:
Gs = StellarGraph.from_networkx(nx_graph)
To create a StellarGraph object with node features, supply the features as a numeric feature vector for each node.
To take the feature vectors from a node attribute in the original NetworkX graph, supply the attribute name to the node_features argument:
Gs = StellarGraph.from_networkx(nx_graph, node_features="feature")
where nx_graph contains nodes that have a "feature" attribute containing the feature vector for the node. All nodes of the same type must have the same size feature vectors.
Alternatively, supply the node features as Pandas DataFrame objects with the index of the DataFrame set to the node IDs. For graphs with a single node type, you can supply the DataFrame object directly to StellarGraph:
node_data = pd.DataFrame(
    [feature_vector_1, feature_vector_2, ..], index=[node_id_1, node_id_2, ...])
Gs = StellarGraph.from_networkx(nx_graph, node_features=node_data)
For graphs with multiple node types, provide the node features as Pandas DataFrames for each type separately, as a dictionary by node type. This allows node features to have different sizes for each node type:
node_data = {
    node_type_1: pd.DataFrame(...),
    node_type_2: pd.DataFrame(...),
}
Gs = StellarGraph.from_networkx(nx_graph, node_features=node_data)
The dictionary only needs to include node types with features. If a node type isn’t mentioned in the dictionary (for example, if nx_graph above has a 3rd node type), each node of that type will have a feature vector of length zero.
You can also supply the node feature vectors as an iterator of node_id and feature vector pairs, for graphs with single and multiple node types:
node_data = zip([node_id_1, node_id_2, ...], [feature_vector_1, feature_vector_2, ..])
Gs = StellarGraph.from_networkx(nx_graph, node_features=node_data)
- Parameters
graph – The NetworkX graph instance.
node_type_attr (str, optional) – This is the name for the node types that StellarGraph uses when processing heterogeneous graphs. StellarGraph will look for this attribute in the nodes of the graph to determine their type.
node_type_default (str, optional) – This is the default node type to use for nodes that do not have an explicit type.
edge_type_attr (str, optional) – This is the name for the edge types that StellarGraph uses when processing heterogeneous graphs. StellarGraph will look for this attribute in the edges of the graph to determine their type.
edge_type_default (str, optional) – This is the default edge type to use for edges that do not have an explicit type.
node_features (str, dict, list or DataFrame, optional) – This tells StellarGraph where to find the node feature information required by some graph models. These are expected to be a numeric feature vector for each node in the graph.
edge_weight_attr (str, optional) – The name of the attribute to use as the weight of edges.
- Returns
A StellarGraph (if graph is undirected) or StellarDiGraph (if graph is directed) instance representing the data in graph and node_features.
-
has_node
(node: Any) → bool[source]¶ Indicates whether or not the graph contains the specified node.
- Parameters
node (any) – The node.
- Returns
True if the node is in the graph, otherwise False.
- Return type
bool
-
in_node_arrays
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False)[source]¶ Obtains the collection of neighbouring nodes with edges directed to the given node. For an undirected graph, neighbours are treated as both in-nodes and out-nodes.
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True an array of edge weights is also returned.
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
A numpy array of the neighboring in-nodes. If include_edge_weight is True then an array of edge weights is also returned in a tuple (neighbor_array, edge_weight_array)
-
in_nodes
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False) → Iterable[Any][source]¶ Obtains the collection of neighbouring nodes with edges directed to the given node. For an undirected graph, neighbours are treated as both in-nodes and out-nodes.
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True, each neighbour in the output is a named tuple with fields node (the node ID) and weight (the edge weight)
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
The neighbouring in-nodes.
- Return type
iterable
-
info
(show_attributes=None, sample=None, truncate=20)[source]¶ Return an information string summarizing information on the current graph. This includes node and edge type information and their attributes.
Note: This requires processing all nodes and edges and could take a long time for a large graph.
- Parameters
show_attributes – Deprecated, unused.
sample – Deprecated, unused.
truncate (int, optional) – If an integer, show only the truncate most common node and edge type triples; if None, list each one individually.
- Returns
An information string.
-
is_directed
() → bool[source]¶ Indicates whether the graph is directed (True) or undirected (False).
- Returns
The graph directedness status.
- Return type
bool
-
neighbor_arrays
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False)[source]¶ Obtains the collection of neighbouring nodes connected to the given node as an array of node_ids. If include_edge_weight is True then an array of edge weights is also returned in a tuple of (neighbor_ids, edge_weights).
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True an array of edge weights is also returned.
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
A numpy array of the neighboring nodes. If include_edge_weight is True then an array of edge weights is also returned in a tuple (neighbor_array, edge_weight_array)
-
neighbors
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False) → Iterable[any][source]¶ Obtains the collection of neighbouring nodes connected to the given node.
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True, each neighbour in the output is a named tuple with fields node (the node ID) and weight (the edge weight)
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
The neighboring nodes.
- Return type
iterable
-
node_degrees
(use_ilocs=False) → Mapping[Any, int][source]¶ Obtains a map from node to node degree.
- Parameters
use_ilocs (bool) – if True return node ilocs
- Returns
The degree of each node.
-
node_feature_shapes
(node_types=None)[source]¶ Get the feature shapes for the specified node types.
- Parameters
node_types (list, optional) – A list of node types. If None all current node types will be used.
- Returns
A dictionary of node type and tuple feature shapes.
-
node_feature_sizes
(node_types=None)[source]¶ Get the feature sizes for the specified node types.
- Parameters
node_types (list, optional) – A list of node types. If None all current node types will be used.
- Returns
A dictionary of node type and integer feature size.
-
node_features
(nodes=None, node_type=None, use_ilocs=False)[source]¶ Get the numeric feature vectors for the specified nodes or node type.
For graphs with a single node type:
graph.node_features() to retrieve features of all nodes, in the same order as graph.nodes().
graph.node_features(nodes=some_node_ids) to retrieve features for each node in some_node_ids.
For graphs with multiple node types:
graph.node_features(node_type=some_type) to retrieve features of all nodes of type some_type, in the same order as graph.nodes(node_type=some_type).
graph.node_features(nodes=some_node_ids, node_type=some_type) to retrieve features for each node in some_node_ids. All of the chosen nodes must be of type some_type.
graph.node_features(nodes=some_node_ids) to retrieve features for each node in some_node_ids. All of the chosen nodes must be of the same type, which will be inferred. This will be slower than providing the node type explicitly in the previous example.
- Parameters
nodes (list or hashable, optional) – Node ID or list of node IDs, all of the same type
node_type (hashable, optional) – the type of the nodes.
- Returns
Numpy array containing the node features for the requested nodes or node type.
-
node_ids_to_ilocs
(nodes)[source]¶ Get the node ilocs for the specified node or nodes.
- Parameters
nodes (list or hashable) – node IDs
- Returns
Numpy array containing the indices for the requested nodes.
-
node_ilocs_to_ids
(node_ilocs)[source]¶ Get the node ids for the specified node ilocs.
- Parameters
node_ilocs (list or hashable) – node ilocs
- Returns
Numpy array containing the node ids for the requested nodes.
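The relationship between external node IDs and ilocs can be pictured as a position lookup: an iloc is the integer position of an ID in the stored order. The sketch below illustrates the round trip in plain Python; the helper names and the dictionary-based lookup are assumptions for this example and do not reflect how StellarGraph stores its index.

```python
# External IDs in their stored order; the iloc of an ID is its position here.
node_ids = ["a", "b", "c", "d"]
id_to_iloc = {node_id: iloc for iloc, node_id in enumerate(node_ids)}


def ids_to_ilocs(ids):
    """Translate external IDs into integer positions."""
    return [id_to_iloc[node_id] for node_id in ids]


def ilocs_to_ids(ilocs):
    """Translate integer positions back into external IDs."""
    return [node_ids[iloc] for iloc in ilocs]


ilocs = ids_to_ilocs(["c", "a"])
round_trip = ilocs_to_ids(ilocs)
print(ilocs, round_trip)
```

Working with ilocs directly skips this translation step, which is the overhead the use_ilocs parameters are described as avoiding.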
-
node_type
(node, use_ilocs=False)[source]¶ Get the type of the node
- Parameters
node – a node or iterable of nodes
use_ilocs – if True node is treated as a node iloc
- Returns
Node type or numpy array of node types
-
node_type_ilocs_to_names
(node_type_ilocs)[source]¶ Get the names of the specified node type ilocs.
- Parameters
node_type_ilocs (sequence of int) – node type ilocs
- Returns
Numpy array containing the names of the requested node types.
-
node_type_names_to_ilocs
(node_type_names)[source]¶ Get the node type ilocs for the specified node types.
- Parameters
node_type_names (sequence of hashable) – node types
- Returns
Numpy array containing the ilocs of the requested node types.
-
property
node_types
¶ Get a list of all node types in the graph.
- Returns
set of types
-
nodes
(node_type=None, use_ilocs=False) → Iterable[Any][source]¶ Obtains the collection of nodes in the graph.
- Parameters
node_type (hashable, optional) – a type of nodes that exist in the graph
use_ilocs (bool) – if True return node ilocs as a range object
- Returns
All the nodes in the graph if node_type is None, otherwise all the nodes in the graph of type node_type.
-
nodes_of_type
(node_type=None)[source]¶ Get the nodes of the graph with the specified node types.
- Parameters
node_type (hashable) – a type of nodes that exist in the graph (this must be passed; omitting it or passing None is deprecated)
- Returns
A list of node IDs with type node_type
-
number_of_edges
() → int[source]¶ Obtains the number of edges in the graph.
- Returns
The number of edges.
- Return type
int
-
number_of_nodes
() → int[source]¶ Obtains the number of nodes in the graph.
- Returns
The number of nodes.
- Return type
int
-
out_node_arrays
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False)[source]¶ Obtains the collection of neighbouring nodes with edges directed from the given node. For an undirected graph, neighbours are treated as both in-nodes and out-nodes.
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True an array of edge weights is also returned.
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
A numpy array of the neighboring out-nodes. If include_edge_weight is True then an array of edge weights is also returned in a tuple (neighbor_array, edge_weight_array)
-
out_nodes
(node: Any, include_edge_weight=False, edge_types=None, use_ilocs=False) → Iterable[Any][source]¶ Obtains the collection of neighbouring nodes with edges directed from the given node. For an undirected graph, neighbours are treated as both in-nodes and out-nodes.
- Parameters
node (any) – The node in question.
include_edge_weight (bool, default False) – If True, each neighbour in the output is a named tuple with fields node (the node ID) and weight (the edge weight)
edge_types (list of hashable, optional) – If provided, only traverse the graph via the provided edge types when collecting neighbours.
use_ilocs (bool) – if True, node is treated as a node iloc (and similarly edge_types is treated as edge type ilocs) and the iloc of each neighbour is returned.
- Returns
The neighbouring out-nodes.
- Return type
iterable
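The difference between in-nodes and out-nodes, and the way both collapse to the full neighbourhood for an undirected graph, can be sketched in plain Python as follows. The in_nodes_sketch and out_nodes_sketch functions are hypothetical helpers for this illustration only; they ignore edge types, weights and self-loops.

```python
# The square-with-diagonal example as (source, target) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]


def in_nodes_sketch(edge_pairs, node, directed):
    """Sources of edges pointing at node; all neighbours if undirected."""
    incoming = [source for source, target in edge_pairs if target == node]
    if directed:
        return incoming
    outgoing = [target for source, target in edge_pairs if source == node]
    return incoming + outgoing


def out_nodes_sketch(edge_pairs, node, directed):
    """Targets of edges leaving node; all neighbours if undirected."""
    outgoing = [target for source, target in edge_pairs if source == node]
    if directed:
        return outgoing
    incoming = [source for source, target in edge_pairs if target == node]
    return outgoing + incoming


directed_in = in_nodes_sketch(edges, "a", directed=True)
directed_out = out_nodes_sketch(edges, "a", directed=True)
undirected_in = in_nodes_sketch(edges, "a", directed=False)
undirected_out = out_nodes_sketch(edges, "a", directed=False)
print(directed_in, directed_out, undirected_in, undirected_out)
```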
-
subgraph
(nodes)[source]¶ Compute the node-induced subgraph implied by nodes.
- Parameters
nodes (iterable) – The nodes in the subgraph.
- Returns
A StellarGraph or StellarDiGraph instance containing only the nodes in nodes, and any edges between them in self. It contains the same node & edge types, node features and edge weights as in self.
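A node-induced subgraph keeps exactly those edges whose two endpoints are both in the chosen node set. The plain-Python sketch below applies that rule to the square-with-diagonal example; induced_edges is a hypothetical helper, and it leaves out the types, features and weights that the real method carries over.

```python
# The square-with-diagonal example as (source, target) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]


def induced_edges(edge_pairs, nodes):
    """Keep only the edges with both endpoints inside nodes."""
    keep = set(nodes)
    return [(s, t) for s, t in edge_pairs if s in keep and t in keep]


sub_edges = induced_edges(edges, ["a", "b", "c"])
print(sub_edges)
```

Dropping node d removes the two edges that touch it and leaves the triangle a, b, c.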
-
to_adjacency_matrix
(nodes: Optional[Iterable] = None, weighted=False, edge_type=None)[source]¶ Obtains a SciPy sparse adjacency matrix of edge weights.
By default (weighted=False), each element of the matrix contains the number of edges between the two vertices (only 0 or 1 in a graph without multi-edges).
- Parameters
nodes (iterable) – The optional collection of nodes comprising the subgraph. If specified, then the adjacency matrix is computed for the subgraph; otherwise, it is computed for the full graph.
weighted (bool) – If true, use the edge weight column from the graph instead of edge counts (weights from multi-edges are summed).
edge_type (hashable, optional) – If set (to an edge type), only includes edges of that type, otherwise uses all edges.
- Returns
The weighted adjacency matrix.
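The difference between the two modes (edge counts versus summed weights for multi-edges) can be illustrated with a small dense matrix built in plain Python. The adjacency_sketch function is a hypothetical helper: it uses nested lists rather than a SciPy sparse matrix, and it assumes that an undirected edge fills both symmetric entries and that a self-loop is counted once, neither of which is stated by the text above.

```python
def adjacency_sketch(node_ids, weighted_edges, weighted=False, directed=False):
    """Build a dense adjacency matrix as nested lists.

    weighted_edges holds (source, target, weight) triples. With
    weighted=False each edge adds 1; otherwise it adds its weight,
    so weights of multi-edges are summed.
    """
    position = {node_id: i for i, node_id in enumerate(node_ids)}
    size = len(node_ids)
    matrix = [[0] * size for _ in range(size)]
    for source, target, weight in weighted_edges:
        i, j = position[source], position[target]
        value = weight if weighted else 1
        matrix[i][j] += value
        # Assumption: undirected edges are mirrored; self-loops count once.
        if not directed and i != j:
            matrix[j][i] += value
    return matrix


node_ids = ["a", "b", "c"]
edges = [("a", "b", 2.0), ("a", "b", 0.5), ("b", "c", 1.0)]

counts = adjacency_sketch(node_ids, edges)
weights = adjacency_sketch(node_ids, edges, weighted=True)
print(counts)
print(weights)
```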
-
to_networkx
(node_type_attr='label', edge_type_attr='label', edge_weight_attr='weight', feature_attr='feature', node_type_name=None, edge_type_name=None, edge_weight_label=None, feature_name=None)[source]¶ Create a NetworkX MultiGraph or MultiDiGraph instance representing this graph.
- Parameters
node_type_attr (str) – the name of the attribute to use to store a node’s type (or label).
edge_type_attr (str) – the name of the attribute to use to store an edge’s type (or label).
edge_weight_attr (str) – the name of the attribute to use to store an edge’s weight.
feature_attr (str, optional) – the name of the attribute to use to store a node’s feature vector; if None, feature vectors are not stored within each node.
node_type_name (str) – Deprecated, use node_type_attr.
edge_type_name (str) – Deprecated, use edge_type_attr.
edge_weight_label (str) – Deprecated, use edge_weight_attr.
feature_name (str, optional) – Deprecated, use feature_attr.
- Returns
An instance of networkx.MultiDiGraph (if directed) or networkx.MultiGraph (if undirected) containing all the nodes & edges and their types & features in this graph.
-
unique_edge_type
(error_message=None)[source]¶ Return the unique edge type, for a homogeneous-edge graph.
- Parameters
error_message (str, optional) – a custom message to use for the exception; this can use the %(found)s placeholder to insert the real sequence of edge types.
- Returns
If this graph has only one edge type, this returns that edge type; otherwise, it raises a ValueError exception.
-
unique_node_type
(error_message=None)[source]¶ Return the unique node type, for a homogeneous-node graph.
- Parameters
error_message (str, optional) – a custom message to use for the exception; this can use the %(found)s placeholder to insert the real sequence of node types.
- Returns
If this graph has only one node type, this returns that node type; otherwise, it raises a ValueError exception.
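The %(found)s placeholder follows Python's mapping-based string formatting. The sketch below shows how such a message could be filled in; unique_type is a hypothetical stand-in for the methods above, and the exact way the library renders the found types is an assumption.

```python
def unique_type(found_types, error_message=None):
    """Return the single type in `found_types`, or raise ValueError."""
    types = sorted(set(found_types))
    if len(types) == 1:
        return types[0]
    if error_message is None:
        error_message = "Expected only one type, found: %(found)s"
    # %-formatting with a mapping fills in the %(found)s placeholder
    raise ValueError(error_message % {"found": types})


print(unique_type(["paper", "paper"]))  # paper

try:
    unique_type(["paper", "author"], error_message="need one node type, got %(found)s")
except ValueError as error:
    message = str(error)
    print(message)  # need one node type, got ['author', 'paper']
```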
-
stellargraph.
custom_keras_layers
= {...}¶ A dictionary of the tensorflow.keras layers defined by StellarGraph.
When Keras models using StellarGraph layers are saved, they can be loaded by passing this value to the custom_objects parameter of model loading functions like tensorflow.keras.models.load_model.
Example:
    import stellargraph as sg
    from tensorflow import keras

    keras.models.load_model("/path/to/model", custom_objects=sg.custom_keras_layers)
Data¶
The data package contains classes and functions to read, process, and query graph data.
-
class
stellargraph.data.
BiasedRandomWalk
(graph, n=None, length=None, p=1.0, q=1.0, weighted=False, seed=None)[source]¶ Performs biased second-order random walks (like those used in the Node2Vec algorithm, https://snap.stanford.edu/node2vec/), controlled by the values of two parameters, p and q.
See also
Examples using this random walk:
unsupervised representation learning: Node2Vec using Gensim Word2Vec, Node2Vec using StellarGraph
node classification: Node2Vec using Gensim Word2Vec, Node2Vec using StellarGraph, Node2Vec with edge weights
link prediction: Node2Vec, comparison to CTDNE (TemporalRandomWalk), comparison of algorithms
Related functionality:
UnsupervisedSampler for transforming random walks into links for unsupervised training of link prediction models
Node2Vec, Node2VecNodeGenerator and Node2VecLinkGenerator for training a Node2Vec using only StellarGraph
Other random walks: UniformRandomWalk, UniformRandomMetaPathWalk, TemporalRandomWalk.
- Parameters
graph (StellarGraph) – Graph to traverse
n (int, optional) – Total number of random walks per root node
length (int, optional) – Maximum length of each random walk
p (float, optional) – Defines probability, 1/p, of returning to source node
q (float, optional) – Defines probability, 1/q, for moving to a node away from the source node
weighted (bool, optional) – Indicates whether the walk is unweighted or weighted
seed (int, optional) – Random number generator seed
-
run
(nodes, *, n=None, length=None, p=None, q=None, seed=None, weighted=None)[source]¶ Perform a random walk starting from the root nodes. Optional parameters default to using the values passed in during construction.
- Parameters
nodes (list) – The root nodes as a list of node IDs
n (int, optional) – Total number of random walks per root node
length (int, optional) – Maximum length of each random walk
p (float, optional) – Defines probability, 1/p, of returning to source node
q (float, optional) – Defines probability, 1/q, for moving to a node away from the source node
seed (int, optional) – Random number generator seed; default is None
weighted (bool, optional) – Indicates whether the walk is unweighted or weighted
- Returns
List of lists of node IDs for each of the random walks.
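To make the roles of p and q concrete, here is a minimal pure-Python sketch of a second-order biased walk in the Node2Vec style on an unweighted adjacency dictionary. It is not the library's implementation: biased_walk and the toy graph are hypothetical, and the weights 1/p, 1 and 1/q are the unnormalised transition weights described in the Node2Vec paper.

```python
import random


def biased_walk(adj, root, length, p=1.0, q=1.0, rng=None):
    """One second-order walk of at most `length` nodes starting at `root`."""
    rng = rng or random.Random()
    walk = [root]
    previous = None
    while len(walk) < length:
        current = walk[-1]
        neighbours = adj[current]
        if not neighbours:
            break  # dead end: stop early
        if previous is None:
            nxt = rng.choice(neighbours)  # first step is uniform
        else:
            weights = []
            for nb in neighbours:
                if nb == previous:
                    weights.append(1.0 / p)  # return to the previous node
                elif nb in adj[previous]:
                    weights.append(1.0)      # stay close to the previous node
                else:
                    weights.append(1.0 / q)  # move further away
            nxt = rng.choices(neighbours, weights=weights, k=1)[0]
        previous = current
        walk.append(nxt)
    return walk


adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
rng = random.Random(0)
# n=2 walks per root node, each of maximum length 5
walks = [biased_walk(adj, root, length=5, p=0.5, q=2.0, rng=rng)
         for root in ["a", "d"] for _ in range(2)]
print(walks)
```

With p below 1 the walk tends to backtrack, and with q above 1 it tends to stay near the previous node, which is the usual reading of these two parameters.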
-
class
stellargraph.data.
EdgeSplitter
(g, g_master=None)[source]¶ Class for generating training and test data for link prediction in graphs.
The class requires as input a graph (a StellarGraph or NetworkX object) and the number of positive and negative edges to sample, given as a percentage of the total number of edges in the given graph. For heterogeneous graphs, the caller can also specify the type of edge and an edge property to split on. In the latter case, only a date property can be used and it must be in the format dd/mm/yyyy. A date must also be given as a threshold value, such that only edges with a date after the threshold are sampled. This affects only the sampling of positive edges.
Negative edges are sampled at random by (for the ‘global’ method) selecting two nodes in the graph and then checking whether these nodes are connected. If not, the pair of nodes is considered a negative sample. Otherwise, it is discarded and the process repeats. Alternatively, negative edges are sampled (for the ‘local’ method) using depth-first search (DFS) at a distance from the source node (selected at random from all nodes in the graph) sampled according to a given set of probabilities.
Positive edges can be sampled so that when they are subsequently removed from the graph, the reduced graph is either guaranteed, or not guaranteed, to remain connected. In the former case, graph connectivity is maintained by first calculating the minimum spanning tree. The edges that belong to the minimum spanning tree are protected from removal, and therefore cannot be sampled for the training set. The edges that do not belong to the minimum spanning tree are then sampled uniformly at random, until the required number of positive edges have been sampled for the training set. In the latter case, when connectedness of the reduced graph is not guaranteed, positive edges are sampled uniformly at random from all the edges in the graph, regardless of whether they belong to the spanning tree (which is not calculated in this case).
- Parameters
g (StellarGraph or networkx object) – The graph to sample edges from.
g_master (StellarGraph or networkx object) – The graph representing the original dataset and a superset of the graph g. If it is not None, then when positive and negative edges are sampled, care is taken to make sure that a true positive edge is not sampled as a negative edge.
-
train_test_split
(p=0.5, method='global', probs=None, keep_connected=False, edge_label=None, edge_attribute_label=None, edge_attribute_threshold=None, attribute_is_datetime=None, seed=None)[source]¶ Generates positive and negative edges and a graph that has the same nodes as the original but the positive edges removed. It can be used to generate data from homogeneous and heterogeneous graphs.
For heterogeneous graphs, positive and negative examples can be generated based on specified edge type or edge type and edge property given a threshold value for the latter.
- Parameters
p (float) – Percent of edges to be returned. It is calculated as a function of the total number of edges in the original graph. If the graph is heterogeneous, the percentage is calculated as a function of the total number of edges that satisfy the edge_label, edge_attribute_label and edge_attribute_threshold values given.
method (str) – How negative edges are sampled. If ‘global’, then nodes are selected at random. If ‘local’, then the first node is sampled from all nodes in the graph, but the second node is chosen from the former’s local neighbourhood.
probs (list) – The probabilities for sampling a node that is k hops from the source node, e.g., [0.25, 0.75] means that there is a 0.25 probability that the target node will be 1 hop away from the source node and a 0.75 probability that it will be 2 hops away. This only affects sampling of negative edges if method is set to ‘local’.
keep_connected (bool) – If True then when positive edges are removed care is taken that the reduced graph remains connected. If False, positive edges are removed without guaranteeing the connectivity of the reduced graph.
edge_label (str, optional) – The type of edges to split on.
edge_attribute_label (str, optional) – The label for the edge attribute to split on.
edge_attribute_threshold (str, optional) – The threshold value applied to the edge attribute when sampling positive examples.
attribute_is_datetime (bool, optional) – Specifies if edge attribute is datetime or not.
seed (int, optional) – seed for random number generator, positive int or 0
- Returns
The reduced graph (positive edges removed) and the edge data as 2 numpy arrays, the first array of dimensionality N × 2 (where N is the number of edges) holding the node ids for the edges and the second of dimensionality N × 1 holding the edge labels, 0 for negative and 1 for positive examples. The graph matches the input graph passed to the EdgeSplitter constructor: the returned graph is a StellarGraph instance if the input graph was one, and, similarly, a NetworkX graph if the input graph was one.
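The ‘global’ negative sampling described for this class is a rejection-sampling loop. The sketch below illustrates that idea on plain Python lists; sample_global_negatives is a hypothetical helper, and avoiding duplicate negatives is a choice made for this sketch rather than documented library behaviour.

```python
import random


def sample_global_negatives(nodes, edges, num_samples, seed=None):
    """Rejection-sample `num_samples` unconnected node pairs."""
    rng = random.Random(seed)
    connected = {frozenset(edge) for edge in edges}
    max_pairs = len(nodes) * (len(nodes) - 1) // 2
    if num_samples > max_pairs - len(connected):
        raise ValueError("not enough unconnected node pairs to sample from")
    negatives = []
    seen = set()
    while len(negatives) < num_samples:
        u, v = rng.sample(nodes, 2)  # two distinct nodes, chosen at random
        pair = frozenset((u, v))
        if pair in connected or pair in seen:
            continue  # connected (or already used): discard and repeat
        seen.add(pair)
        negatives.append((u, v))
    return negatives


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
positives = edges[:2]  # stand-in for the sampled positive edges
negatives = sample_global_negatives(nodes, edges, num_samples=len(positives), seed=0)

# Edge ids and labels in the layout described above: 1 = positive, 0 = negative
edge_ids = positives + negatives
edge_labels = [1] * len(positives) + [0] * len(negatives)
print(edge_ids, edge_labels)
```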
-
class
stellargraph.data.
SampledBreadthFirstWalk
(graph, graph_schema=None, seed=None)[source]¶ Breadth First Walk that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes.
-
run
(nodes, n_size, n=1, seed=None, weighted=False)[source]¶ Performs a sampled breadth-first walk starting from the root nodes.
- Parameters
nodes (list) – A list of root node ids such that from each node n BFWs (breadth-first walks) will be generated up to the given depth. The depth of each of the walks is inferred from the length of the n_size list parameter.
n_size (list of int) – The number of neighbouring nodes to expand at each depth of the walk. Sampling of neighbours is always done with replacement regardless of the node degree and number of neighbours requested.
n (int) – Number of walks per node id.
seed (int, optional) – Random number generator seed; Default is None.
weighted (bool, optional) – If True, sample neighbours using the edge weights in the graph.
- Returns
A list of lists such that each list element is a sequence of ids corresponding to a BFW.
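A sampled breadth-first walk can be sketched as follows. This is only an illustration under stated assumptions: sampled_bfs is a hypothetical function, the graph is a plain adjacency dictionary, and padding with None when a node has no neighbours is a choice made for this sketch.

```python
import random


def sampled_bfs(adj, root, n_size, rng):
    """Return [root, hop-1 samples..., hop-2 samples..., ...] as a flat list."""
    walk = [root]
    frontier = [root]
    for num_samples in n_size:
        next_frontier = []
        for node in frontier:
            neighbours = adj.get(node, [])
            if neighbours:
                # sampling with replacement, regardless of the node degree
                sampled = rng.choices(neighbours, k=num_samples)
            else:
                sampled = [None] * num_samples  # placeholder (sketch assumption)
            next_frontier.extend(sampled)
        walk.extend(next_frontier)
        frontier = next_frontier
    return walk


adj = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
rng = random.Random(0)
walk = sampled_bfs(adj, "a", n_size=[2, 3], rng=rng)
print(len(walk))  # 9 = 1 root + 2 first-hop + 2 * 3 second-hop samples
```

The depth of the walk comes from len(n_size), and the number of nodes per hop is the cumulative product of the entries.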
-
-
class
stellargraph.data.
SampledHeterogeneousBreadthFirstWalk
(graph, graph_schema=None, seed=None)[source]¶ Breadth First Walk for heterogeneous graphs that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes.
-
run
(nodes, n_size, n=1, seed=None)[source]¶ Performs a sampled breadth-first walk starting from the root nodes.
- Parameters
nodes (list) – A list of root node ids such that from each node n BFWs will be generated with the number of samples per hop specified in n_size.
n_size (list of int) – The number of neighbouring nodes to expand at each depth of the walk. Sampling of neighbours with replacement is always used regardless of the node degree and number of neighbours requested.
n (int, default 1) – Number of walks per node id.
seed (int, optional) – Random number generator seed; default is None
- Returns
A list of lists such that each list element is a sequence of ids corresponding to a sampled Heterogeneous BFW.
-
-
class
stellargraph.data.
TemporalRandomWalk
(graph, cw_size=None, max_walk_length=80, initial_edge_bias=None, walk_bias=None, p_walk_success_threshold=0.01, seed=None)[source]¶ Performs temporal random walks on the given graph. The graph should contain numerical edge weights that correspond to the time at which the edge was created. The exact units (e.g. seconds, days) are not relevant to the algorithm, only the relative differences.
See also
Example using this random walk: link prediction with CTDNE
Related functionality: other random walks: UniformRandomWalk, BiasedRandomWalk, UniformRandomMetaPathWalk.
- Parameters
graph (StellarGraph) – Graph to traverse
cw_size (int, optional) – Size of context window. Also used as the minimum walk length, since a walk must generate at least 1 context window for it to be useful.
max_walk_length (int, optional) – Maximum length of each random walk. Should be greater than or equal to the context window size.
initial_edge_bias (str, optional) –
Distribution to use when choosing a random initial temporal edge to start from. Available options are:
None (default) - The initial edge is picked from a uniform distribution.
“exponential” - Heavily biased towards more recent edges.
walk_bias (str, optional) –
Distribution to use when choosing a random neighbour to walk through. Available options are:
None (default) - Neighbours are picked from a uniform distribution.
“exponential” - Exponentially decaying probability, resulting in a bias towards shorter time gaps.
p_walk_success_threshold (float, optional) – Lower bound for the proportion of successful (i.e. longer than minimum length) walks. If the 95th percentile of the estimated proportion is less than the provided threshold, a RuntimeError will be raised. The default value of 0.01 means an error is raised if less than 1% of the attempted random walks are successful. This parameter exists to catch any potential situation where too many unsuccessful walks can cause an infinite or very slow loop.
seed (int, optional) – Random number generator seed.
-
run
(num_cw, cw_size=None, max_walk_length=None, initial_edge_bias=None, walk_bias=None, p_walk_success_threshold=None, seed=None)[source]¶ Perform a time respecting random walk starting from randomly selected temporal edges. Optional parameters default to using the values passed in during construction.
- Parameters
num_cw (int) – Total number of context windows to generate. For comparable results to most other random walks, this should be a multiple of the number of nodes in the graph.
cw_size (int, optional) – Size of context window. Also used as the minimum walk length, since a walk must generate at least 1 context window for it to be useful.
max_walk_length (int, optional) – Maximum length of each random walk. Should be greater than or equal to the context window size.
initial_edge_bias (str, optional) –
Distribution to use when choosing a random initial temporal edge to start from. Available options are:
None (default) - The initial edge is picked from a uniform distribution.
“exponential” - Heavily biased towards more recent edges.
walk_bias (str, optional) –
Distribution to use when choosing a random neighbour to walk through. Available options are:
None (default) - Neighbours are picked from a uniform distribution.
“exponential” - Exponentially decaying probability, resulting in a bias towards shorter time gaps.
p_walk_success_threshold (float, optional) – Lower bound for the proportion of successful (i.e. longer than minimum length) walks. If the 95th percentile of the estimated proportion is less than the provided threshold, a RuntimeError will be raised. The default value of 0.01 means an error is raised if less than 1% of the attempted random walks are successful. This parameter exists to catch any potential situation where too many unsuccessful walks can cause an infinite or very slow loop.
seed (int, optional) – Random number generator seed; default is None.
- Returns
List of lists of node ids for each of the random walks.
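The sketch below illustrates the general shape of a time-respecting walk with uniform biases: start from a random temporal edge, then only follow edges with a later time. It is not the library's code. The function names, the strict "later than" rule, the use of max_attempts as a crude stand-in for the success threshold, and counting len(walk) - cw_size + 1 context windows per walk are all assumptions made for illustration.

```python
import random


def temporal_walk(edges, max_walk_length, rng):
    """One time-respecting walk over `edges`, a list of (source, target, time)."""
    neighbours = {}
    for u, v, t in edges:
        neighbours.setdefault(u, []).append((v, t))
        neighbours.setdefault(v, []).append((u, t))

    source, target, time = rng.choice(edges)  # uniform initial edge
    walk, times = [source, target], [time]
    while len(walk) < max_walk_length:
        later = [(n, t) for n, t in neighbours[walk[-1]] if t > times[-1]]
        if not later:
            break  # no edge respects time ordering: the walk ends
        node, time = rng.choice(later)  # uniform walk bias
        walk.append(node)
        times.append(time)
    return walk, times


def run_temporal_walks(edges, num_cw, cw_size, max_walk_length, seed=None,
                       max_attempts=10_000):
    """Collect walks until at least `num_cw` context windows are covered."""
    rng = random.Random(seed)
    walks, remaining = [], num_cw
    for _ in range(max_attempts):
        if remaining <= 0:
            break
        walk, times = temporal_walk(edges, max_walk_length, rng)
        if len(walk) >= cw_size:  # shorter walks yield no context window
            walks.append((walk, times))
            remaining -= len(walk) - cw_size + 1
    return walks


edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("d", "a", 4), ("b", "d", 5)]
walks = run_temporal_walks(edges, num_cw=6, cw_size=3, max_walk_length=4, seed=0)
for walk, times in walks:
    print(walk, times)
```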
-
class
stellargraph.data.
UniformRandomMetaPathWalk
(graph, n=None, length=None, metapaths=None, seed=None)[source]¶ For heterogeneous graphs, it performs uniform random walks based on given metapaths. Optional parameters default to using the values passed in during construction.
See also
Examples using this random walk:
Related functionality:
UnsupervisedSampler for transforming random walks into links for unsupervised training of link prediction models
Other random walks: UniformRandomWalk, BiasedRandomWalk, TemporalRandomWalk.
- Parameters
graph (StellarGraph) – Graph to traverse
n (int, optional) – Total number of random walks per root node
length (int, optional) – Maximum length of each random walk
metapaths (list of list, optional) – List of lists of node labels that specify a metapath schema, e.g., [[‘Author’, ‘Paper’, ‘Author’], [‘Author’, ‘Paper’, ‘Venue’, ‘Paper’, ‘Author’]] specifies two metapath schemas of length 3 and 5 respectively.
seed (int, optional) – Random number generator seed
-
run
(nodes, *, n=None, length=None, metapaths=None, seed=None)[source]¶ Performs metapath-driven uniform random walks on heterogeneous graphs.
- Parameters
nodes (list) – The root nodes as a list of node IDs
n (int, optional) – Total number of random walks per root node
length (int, optional) – Maximum length of each random walk
metapaths (list of list, optional) – List of lists of node labels that specify a metapath schema, e.g., [[‘Author’, ‘Paper’, ‘Author’], [‘Author’, ‘Paper’, ‘Venue’, ‘Paper’, ‘Author’]] specifies two metapath schemas of length 3 and 5 respectively.
seed (int, optional) – Random number generator seed; default is None
- Returns
List of lists of node IDs for each of the random walks generated.
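A metapath-driven walk only steps to neighbours of the node type that the metapath asks for next. The following sketch shows the idea on a toy bibliographic graph. The helper metapath_walk and the data are hypothetical, and the sketch assumes each metapath starts and ends with the same node type so that it can be repeated cyclically.

```python
import random


def metapath_walk(adj, node_type, root, metapath, length, rng):
    """Uniform random walk from `root` that follows `metapath` cyclically."""
    if node_type[root] != metapath[0]:
        raise ValueError("root node type must match the start of the metapath")
    cycle = metapath[:-1]  # last type equals the first, so drop it and repeat
    walk = [root]
    for step in range(1, length):
        wanted = cycle[step % len(cycle)]
        candidates = [nb for nb in adj[walk[-1]] if node_type[nb] == wanted]
        if not candidates:
            break  # no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


node_type = {"a1": "Author", "a2": "Author", "p1": "Paper", "p2": "Paper", "v1": "Venue"}
adj = {
    "a1": ["p1"],
    "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"],
    "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
rng = random.Random(0)
walk = metapath_walk(adj, node_type, "a1", ["Author", "Paper", "Author"], length=5, rng=rng)
print([node_type[n] for n in walk])
# ['Author', 'Paper', 'Author', 'Paper', 'Author']
```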
-
class
stellargraph.data.
UniformRandomWalk
(graph, n=None, length=None, seed=None)[source]¶ Performs uniform random walks on the given graph.
See also
Related functionality:
UnsupervisedSampler for transforming random walks into links for unsupervised training of link prediction models
Other random walks: BiasedRandomWalk, UniformRandomMetaPathWalk, TemporalRandomWalk.
- Parameters
graph (StellarGraph) – Graph to traverse
n (int, optional) – Total number of random walks per root node
length (int, optional) – Maximum length of each random walk
seed (int, optional) – Random number generator seed
-
class
stellargraph.data.
UnsupervisedSampler
(G, nodes=None, length=2, number_of_walks=1, seed=None, walker=None)[source]¶ The UnsupervisedSampler is responsible for sampling walks in the given graph and returning positive and negative samples w.r.t. those walks, on demand.
The positive samples are all the (target, context) pairs from the walks and the negative samples are contexts generated for each target based on a sampling distribution.
By default, a UniformRandomWalk is used, but a custom walker can be specified instead. An error will be raised if other parameters are specified along with a custom walker.
See also
Examples using this sampler:
Attri2Vec: node classification, link prediction, unsupervised representation learning
GraphSAGE: unsupervised representation learning
Node2Vec: node classification, unsupervised representation learning
Built-in classes for walker: UniformRandomWalk, BiasedRandomWalk, UniformRandomMetaPathWalk.
- Parameters
G (StellarGraph) – A stellargraph with features.
nodes (iterable, optional) – If not provided, all nodes in the graph are used.
length (int) – Length of the walks for the default UniformRandomWalk walker. Length must be at least 2.
number_of_walks (int) – Number of walks from each root node for the default UniformRandomWalk walker.
seed (int, optional) – Random seed for the default UniformRandomWalk walker.
walker (RandomWalk, optional) – A RandomWalk object to use instead of the default UniformRandomWalk walker.
-
run
(batch_size)[source]¶ This method returns batches of batch_size positive and negative samples from the graph. A random walk is generated from each root node and transformed into positive context pairs, and the same number of negative pairs are generated from a global node sampling distribution. The resulting list of context pairs is shuffled and converted to batches of size batch_size.
Currently the global node sampling distribution for the negative pairs is the degree distribution raised to the 3/4 power. This is the same distribution used in node2vec (https://snap.stanford.edu/node2vec/).
- Parameters
batch_size (int) – The number of samples to generate for each batch. This must be an even number.
- Returns
List of batches, where each batch is a tuple of (list of context pairs, list of labels).
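The pairing and batching described above can be sketched in a few lines. This is an illustration only: context_pairs and make_batches are hypothetical names, the walks are given directly instead of being generated, and drawing exactly one negative per positive is how this sketch interprets "the same number of negative pairs".

```python
import random


def context_pairs(walks, degrees, rng):
    """Turn walks into shuffled ((target, context), label) samples."""
    nodes = list(degrees)
    # negative sampling distribution: degree raised to the 3/4 power
    weights = [degrees[n] ** 0.75 for n in nodes]
    pairs = []
    for walk in walks:
        target = walk[0]
        for context in walk[1:]:
            pairs.append(((target, context), 1))  # positive pair
            negative = rng.choices(nodes, weights=weights, k=1)[0]
            pairs.append(((target, negative), 0))  # matching negative pair
    rng.shuffle(pairs)
    return pairs


def make_batches(pairs, batch_size):
    """Split samples into (list of context pairs, list of labels) tuples."""
    batches = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        batches.append(([p for p, _ in chunk], [label for _, label in chunk]))
    return batches


walks = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
degrees = {"a": 2, "b": 2, "c": 2}
rng = random.Random(0)
pairs = context_pairs(walks, degrees, rng)
batches = make_batches(pairs, batch_size=4)
print(len(pairs), len(batches))  # 12 3
```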
Generators¶
The mapper package contains classes and functions to map graph data to neural network inputs.
-
class
stellargraph.mapper.
AdjacencyPowerGenerator
(G, num_powers=10, weighted=False)[source]¶ A data generator for use with the Watch Your Step algorithm [1]. It calculates and returns the first num_powers powers of the adjacency matrix, row by row.
See also
Model using this generator: WatchYourStep.
Example using this generator: unsupervised representation learning
- Parameters
G (StellarGraph) – a machine-learning StellarGraph-type graph
num_powers (int) – the number of adjacency powers to calculate. Defaults to 10 as this value was found to perform well by the authors of the paper.
weighted (bool, optional) – if True, use the edge weights from
G
; if False, treat the graph as unweighted.
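Computing matrix powers "row by row" avoids forming the full power matrices: one row of the k-th power is the previous row multiplied by the adjacency matrix. The sketch below shows this on a dense nested list. The function adjacency_power_rows is hypothetical, it uses the raw adjacency matrix, and any normalisation the library may apply is not reproduced here.

```python
def adjacency_power_rows(adjacency, row, num_powers):
    """Return row `row` of A^1, A^2, ..., A^num_powers."""
    n = len(adjacency)
    current = [1.0 if j == row else 0.0 for j in range(n)]  # row of A^0
    rows = []
    for _ in range(num_powers):
        # row of A^k = (row of A^(k-1)) multiplied by A
        current = [sum(current[k] * adjacency[k][j] for k in range(n))
                   for j in range(n)]
        rows.append(current)
    return rows


# Path graph 0 - 1 - 2
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
rows = adjacency_power_rows(adjacency, row=0, num_powers=3)
print(rows)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
```

Entry j of the k-th returned row counts the walks of length k from node 0 to node j.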
-
flow
(batch_size, num_parallel_calls=1)[source]¶ Creates the tensorflow.data.Dataset object for training node embeddings from powers of the adjacency matrix.
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (not the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
Attri2VecLinkGenerator
(G, batch_size, name=None)[source]¶ A data generator for context node prediction with the attri2vec model.
At minimum, supply the StellarGraph and the batch size.
The supplied graph should be a StellarGraph object with node features.
Use the flow() method supplying the nodes and targets, or an UnsupervisedSampler instance that generates node samples on demand, to get an object that can be used as a Keras data generator.
Example:
    G_generator = Attri2VecLinkGenerator(G, 50)
    train_data_gen = G_generator.flow(edge_ids, edge_labels)
See also
Model using this generator: Attri2Vec.
An example using this generator (see the model for more): link prediction.
Related functionality:
UnsupervisedSampler for unsupervised training using random walks
Attri2VecNodeGenerator for node classification and related tasks
- Parameters
G (StellarGraph) – A machine-learning ready graph.
batch_size (int) – Size of batch of links to return.
name (str, optional) – Name of generator.
-
sample_features
(head_links, batch_num)[source]¶ Sample content features of the target nodes and the ids of the context nodes and return these as a list of feature arrays for the attri2vec algorithm.
- Parameters
head_links – An iterable of edges to perform sampling for.
batch_num (int) – Batch number
- Returns
A list of feature arrays, with each element being the feature of a target node and the id of the corresponding context node.
-
class
stellargraph.mapper.
Attri2VecNodeGenerator
(G, batch_size, name=None)[source]¶ A node feature generator for node representation prediction with the attri2vec model.
At minimum, supply the StellarGraph and the batch size.
The supplied graph should be a StellarGraph object with node features.
Use the flow() method supplying the nodes to get an object that can be used as a Keras data generator.
Example:
    G_generator = Attri2VecNodeGenerator(G, 50)
    data_gen = G_generator.flow(node_ids)
See also
Model using this generator: Attri2Vec.
An example using this generator (see the model for more): node classification.
Related functionality:
Attri2VecLinkGenerator for training, link prediction and related tasks.
- Parameters
G (StellarGraph) – The machine-learning ready graph.
batch_size (int) – Size of batch to return.
-
flow
(node_ids)[source]¶ Creates a generator/sequence object for node representation prediction with the supplied node ids.
The node IDs are the nodes to run inference on: the embeddings calculated for these nodes are passed to the downstream task. These are a subset of, or all of, the nodes in the graph.
- Parameters
node_ids – an iterable of node IDs.
- Returns
A NodeSequence object to use with the Attri2Vec model in the Keras method
predict
.
-
flow_from_dataframe
(node_ids)[source]¶ Creates a generator/sequence object for node representation prediction by using the index of the supplied dataframe as the node ids.
- Parameters
node_ids – a Pandas DataFrame of node_ids.
- Returns
A NodeSequence object to use with the Attri2Vec model in the Keras method
predict
.
-
sample_features
(head_nodes, batch_num)[source]¶ Sample content features of the head nodes, and return these as a list of feature arrays for the attri2vec algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num (int) – Batch number
- Returns
A list of feature arrays, with each element being the feature of a head node.
-
class
stellargraph.mapper.
ClusterNodeGenerator
(G, clusters=1, q=1, lam=0.1, weighted=False, name=None)[source]¶ A data generator for use with GCN, GAT and APPNP models on homogeneous graphs, see [1].
The supplied graph G should be a StellarGraph object with node features. Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
This generator will supply the features array and the adjacency matrix to a mini-batch Keras graph ML model.
[1] W. Chiang, X. Liu, S. Si, Y. Li, S. Bengio, C. Hsieh, 2019.
- Parameters
G (StellarGraph) – a machine-learning StellarGraph-type graph
clusters (int or list, optional) – If int, it indicates the number of clusters (default is 1, corresponding to the entire graph). If clusters is greater than 1, then nodes are randomly assigned to a cluster. If list, then it should be a list of lists of node IDs, such that each list corresponds to a cluster of nodes in G. The clusters should be non-overlapping.
q (int, optional) – The number of clusters to combine for each mini-batch (default is 1). The total number of clusters must be divisible by q.
lam (float, optional) – The mixture coefficient for adjacency matrix normalisation (default is 0.1). Valid values are in the interval [0, 1].
weighted (bool, optional) – if True, use the edge weights from G; if False, treat the graph as unweighted.
name (str, optional) – Name for the node generator.
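The interaction between clusters and q can be sketched as follows: nodes are randomly split into non-overlapping clusters, and each mini-batch combines q of them. This is a simplified illustration with hypothetical helper names (random_clusters, mini_batches); it ignores features, adjacency matrices and the lam normalisation entirely.

```python
import random


def random_clusters(nodes, num_clusters, rng):
    """Randomly assign nodes to `num_clusters` non-overlapping clusters."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return [shuffled[i::num_clusters] for i in range(num_clusters)]


def mini_batches(clusters, q, rng):
    """Combine `q` randomly chosen clusters into each mini-batch."""
    if len(clusters) % q != 0:
        raise ValueError("the number of clusters must be divisible by q")
    order = list(range(len(clusters)))
    rng.shuffle(order)
    return [
        [node for index in order[start:start + q] for node in clusters[index]]
        for start in range(0, len(order), q)
    ]


rng = random.Random(0)
clusters = random_clusters(range(10), num_clusters=4, rng=rng)
batches = mini_batches(clusters, q=2, rng=rng)
print(len(clusters), len(batches))  # 4 2
```

With 4 clusters and q=2, each epoch in this sketch has 2 mini-batches that together cover every node exactly once.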
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
flow
(node_ids, targets=None, name=None)[source]¶ Creates a generator/sequence object for training, evaluation, or prediction with the supplied node ids and numeric targets.
- Parameters
node_ids (iterable) – an iterable of node ids for the nodes of interest (e.g., training, validation, or test set nodes)
targets (2d array, optional) – a 2D array of numeric node targets with shape
(len(node_ids), target_size)
name (str, optional) – An optional name for the returned generator object.
- Returns
A ClusterNodeSequence object to use with GCN, GAT or APPNP in Keras methods fit(), evaluate(), and predict().
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (not the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
CorruptedGenerator
(base_generator, *, corrupt_index_groups=None)[source]¶ Keras compatible data generator that wraps a Generator and provides corrupted data for training Deep Graph Infomax.
See also
Model using this generator: DeepGraphInfomax.
Examples using this generator:
Generators that support corruption natively: FullBatchNodeGenerator, RelationalFullBatchNodeGenerator, GraphSAGENodeGenerator, DirectedGraphSAGENodeGenerator, HinSAGENodeGenerator, ClusterNodeGenerator.
- Parameters
base_generator (Generator) – the uncorrupted Generator object.
corrupt_index_groups (list of list of int, optional) – an explicit list of which input tensors should be shuffled to create the corrupted inputs. This is a list of “groups”, where each group is a non-empty list of indices into the tensors that the base generator yields. The tensors within each group are flattened to be rank-2 (preserving the last dimension, of node features), concatenated, shuffled and split back to their original shapes, to compute new corrupted values for each tensor within that group. Each group has this operation done independently. Each index can appear in at most one group. (This parameter is only optional if base_generator provides a default via default_corrupt_input_index_groups. Otherwise, this parameter must be specified.)
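The flatten, concatenate, shuffle and split steps can be illustrated with nested Python lists standing in for tensors. The helpers flatten_rows, rebuild and corrupt are hypothetical and only sketch the operation described above; the library operates on real tensors.

```python
import random


def flatten_rows(tensor):
    """Flatten a nested list into the list of its innermost feature rows."""
    if not isinstance(tensor[0], list):
        return [tensor]  # `tensor` is itself one feature row
    rows = []
    for item in tensor:
        rows.extend(flatten_rows(item))
    return rows


def rebuild(template, rows):
    """Rebuild a nested list shaped like `template` from an iterator of rows."""
    if not isinstance(template[0], list):
        return next(rows)
    return [rebuild(item, rows) for item in template]


def corrupt(tensors, groups, rng):
    """Shuffle feature rows within each group of tensor indices."""
    corrupted = list(tensors)
    for group in groups:
        rows = [row for index in group for row in flatten_rows(tensors[index])]
        rng.shuffle(rows)  # shuffle across all tensors in the group
        remaining = iter(rows)
        for index in group:
            corrupted[index] = rebuild(tensors[index], remaining)
    return corrupted


tensors = [
    [[1, 0], [2, 0]],                      # shape 2 x 2
    [[[3, 0], [4, 0]], [[5, 0], [6, 0]]],  # shape 2 x 2 x 2
    [[9], [9]],                            # not in any group: left untouched
]
corrupted = corrupt(tensors, groups=[[0, 1]], rng=random.Random(0))
print(corrupted)
```

The corrupted tensors keep their original shapes and, taken together, contain exactly the same feature rows as before, only in a different arrangement.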
-
flow
(*args, **kwargs)[source]¶ Creates the corrupted Sequence object for training Deep Graph Infomax.
- Parameters
args – the positional arguments for the self.base_generator.flow(…) method
kwargs – the keyword arguments for the self.base_generator.flow(…) method
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (not the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
DirectedGraphSAGELinkGenerator
(G, batch_size, in_samples, out_samples, seed=None, name=None, weighted=False)[source]¶ A data generator for link prediction with homogeneous GraphSAGE models on directed graphs.
At minimum, supply the StellarDiGraph, the batch size, and the number of node samples (separately for in-nodes and out-nodes) for each layer of the GraphSAGE model.
The supplied graph should be a StellarDiGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets, or an UnsupervisedSampler instance that generates node samples on demand, to get an object that can be used as a Keras data generator.
Example:
    G_generator = DirectedGraphSAGELinkGenerator(G, 50, [10,10], [10,10])
    train_data_gen = G_generator.flow(edge_ids)
See also
Model using this generator: GraphSAGE.
Related functionality:
UnsupervisedSampler for unsupervised training using random walks
DirectedGraphSAGENodeGenerator for node classification and related tasks
GraphSAGELinkGenerator for undirected graphs
HinSAGELinkGenerator for heterogeneous graphs
- Parameters
G (StellarDiGraph) – A machine-learning ready graph.
batch_size (int) – Size of batch of links to return.
in_samples (list) – The number of in-node samples per layer (hop) to take.
out_samples (list) – The number of out-node samples per layer (hop) to take.
name (str, optional) – Name of generator.
weighted (bool, optional) – If True, sample neighbours using the edge weights in the graph.
-
sample_features
(head_links, batch_num)[source]¶ Sample neighbours recursively from the head links, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_links – An iterable of head links to perform sampling on.
- Returns
A list of feature tensors from the sampled nodes at each layer, each of shape (len(head_nodes), num_sampled_at_layer, feature_size), where num_sampled_at_layer is the total number (cumulative product) of nodes sampled at the given number of hops from each head node, given the sequence of in/out directions.
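The "cumulative product given the sequence of in/out directions" can be made concrete with a small calculation. The sketch below enumerates every in/out direction sequence and the number of nodes sampled along it; sampled_sizes is a hypothetical helper, and the breadth-first, in-before-out ordering is an assumption of this sketch rather than a documented ordering of the returned tensors.

```python
def sampled_sizes(in_samples, out_samples):
    """List (direction path, num_sampled_at_layer) for every in/out sequence."""
    layers = [[((), 1)]]  # hop 0: the head node itself
    for n_in, n_out in zip(in_samples, out_samples):
        next_layer = []
        for path, size in layers[-1]:
            next_layer.append((path + ("in",), size * n_in))
            next_layer.append((path + ("out",), size * n_out))
        layers.append(next_layer)
    return [entry for layer in layers for entry in layer]


sizes = sampled_sizes(in_samples=[10, 5], out_samples=[5, 1])
for path, size in sizes:
    print(path, size)
# () 1
# ('in',) 10
# ('out',) 5
# ('in', 'in') 50
# ('in', 'out') 10
# ('out', 'in') 25
# ('out', 'out') 5
```

For two layers this gives seven direction sequences; for example, following in-edges and then out-edges samples 10 × 1 = 10 nodes per head node.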
-
class
stellargraph.mapper.
DirectedGraphSAGENodeGenerator
(G, batch_size, in_samples, out_samples, seed=None, name=None, weighted=False)[source]¶ A data generator for node prediction with homogeneous GraphSAGE models on directed graphs.
At minimum, supply the StellarDiGraph, the batch size, and the number of node samples (separately for in-nodes and out-nodes) for each layer of the GraphSAGE model.
The supplied graph should be a StellarDiGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
Example:
G_generator = DirectedGraphSAGENodeGenerator(G, 50, [10,5], [5,1])
train_data_gen = G_generator.flow(train_node_ids, train_node_labels)
test_data_gen = G_generator.flow(test_node_ids)
See also
Model using this generator: DirectedGraphSAGE.
Example using this generator: node classification.
Related functionality:
Neo4jDirectedGraphSAGENodeGenerator for using DirectedGraphSAGE with Neo4j
CorruptedGenerator for unsupervised training using DeepGraphInfomax
DirectedGraphSAGELinkGenerator for link prediction and related tasks
GraphSAGENodeGenerator for undirected graphs
HinSAGENodeGenerator for heterogeneous graphs
- Parameters
G (StellarDiGraph) – The machine-learning ready graph.
batch_size (int) – Size of batch to return.
in_samples (list) – The number of in-node samples per layer (hop) to take.
out_samples (list) – The number of out-node samples per layer (hop) to take.
seed (int) – [Optional] Random seed for the node sampler.
weighted (bool, optional) – If True, sample neighbours using the edge weights in the graph.
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
sample_features
(head_nodes, batch_num)[source]¶ Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num (int) – Batch number
- Returns
A list of feature tensors from the sampled nodes at each layer, each of shape (len(head_nodes), num_sampled_at_layer, feature_size), where num_sampled_at_layer is the total number (cumulative product) of nodes sampled at the given number of hops from each head node, given the sequence of in/out directions.
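The "cumulative product given the sequence of in/out directions" can be made concrete with a small sketch. It is only an illustration: the helper name directed_sample_counts and the ordering of the direction sequences are assumptions, not the library's internals.

```python
from itertools import product


def directed_sample_counts(in_samples, out_samples):
    """Nodes sampled per head node for every sequence of in/out directions."""
    if len(in_samples) != len(out_samples):
        raise ValueError("in_samples and out_samples must have the same length")
    per_hop = {"in": in_samples, "out": out_samples}
    counts = {}
    for n_hops in range(1, len(in_samples) + 1):
        for directions in product(("in", "out"), repeat=n_hops):
            total = 1
            # Multiply the sample sizes chosen at each hop along this path.
            for hop, direction in enumerate(directions):
                total *= per_hop[direction][hop]
            counts[directions] = total
    return counts


counts = directed_sample_counts(in_samples=[10, 5], out_samples=[5, 1])
for directions, total in counts.items():
    print(" -> ".join(directions), total)
```

With in_samples=[10, 5] and out_samples=[5, 1], the two-hop sequence in -> in yields 10 × 5 = 50 sampled nodes per head node, while out -> out yields 5 × 1 = 5.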
-
class
stellargraph.mapper.
FullBatchLinkGenerator
(G, name=None, method='gcn', k=1, sparse=True, transform=None, teleport_probability=0.1, weighted=False)[source]¶ A data generator for use with full-batch models on homogeneous graphs, e.g., GCN, GAT, SGC. The supplied graph G should be a StellarGraph object with node features.
Use the flow() method supplying the links as a list of (src, dst) tuples of node IDs and (optionally) targets.
This generator will supply the features array and the adjacency matrix to a full-batch Keras graph ML model. There is a choice to supply either a sparse adjacency matrix (the default) or a dense adjacency matrix, with the sparse argument.
For these algorithms the adjacency matrix requires preprocessing and the ‘method’ option should be specified with the correct preprocessing for each algorithm. The options are as follows:
method='gcn': Normalizes the adjacency matrix for the GCN algorithm. This implements the linearized convolution of Eq. 8 in [1].
method='sgc': This replicates the k-th order smoothed adjacency matrix to implement the Simplified Graph Convolutions of Eq. 8 in [2].
method='self_loops' or method='gat': Simply sets the diagonal elements of the adjacency matrix to one, effectively adding self-loops to the graph. This is used by the GAT algorithm of [3].
method='ppnp': Calculates the personalized page rank matrix of Eq. 2 in [4].
[1] Kipf and Welling, 2017. [2] Wu et al. 2019. [3] Veličković et al., 2018. [4] Klicpera et al., 2018.
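As a rough illustration of the 'gcn' and 'self_loops' preprocessing options, the following NumPy sketch applies the symmetric normalization commonly written as D^-1/2 (A + I) D^-1/2 to a small dense matrix. The function names are hypothetical and the library's own implementation (which also supports sparse matrices) may differ in detail.

```python
import numpy as np


def gcn_normalize(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix A."""
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    degree = a_tilde.sum(axis=1)              # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def add_self_loops(adj):
    """Set the diagonal of a copy of A to one, leaving other entries untouched."""
    out = adj.copy()
    np.fill_diagonal(out, 1.0)
    return out


# Path graph 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

a_hat = gcn_normalize(adj)
a_loops = add_self_loops(adj)
print(a_hat)
print(a_loops)
```

The normalized matrix stays symmetric, and each entry is scaled by the degrees of the two nodes it connects.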
Example:
G_generator = FullBatchLinkGenerator(G)
train_flow = G_generator.flow([(1,2), (3,4), (5,6)], [0, 1, 1])
# Fetch the data from train_flow, and feed into a Keras model:
x_inputs, y_train = train_flow[0]
model.fit(x=x_inputs, y=y_train)
# Alternatively, use the generator itself with model.fit:
model.fit(train_flow, epochs=num_epochs)
See also
Models using this generator: GCN, GAT, APPNP, PPNP.
Example using this generator: link classification with GCN.
Related generator: FullBatchNodeGenerator for node classification and similar tasks.
- Parameters
G (StellarGraph) – a machine-learning StellarGraph-type graph
name (str) – an optional name of the generator
method (str) – Method to preprocess adjacency matrix. One of gcn (default), sgc, self_loops, or none.
k (None or int) – This is the smoothing order for the sgc method. This should be a positive integer.
transform (callable) – an optional function to apply on features and adjacency matrix; the function takes (features, Aadj) as arguments.
sparse (bool) – If True (default), a sparse adjacency matrix is used; if False, a dense adjacency matrix is used.
teleport_probability (float) – teleport probability between 0.0 and 1.0: the “probability” of returning to the starting node in the propagation step as in [4].
weighted (bool, optional) – if True, use the edge weights from G; if False, treat the graph as unweighted.
-
flow
(link_ids, targets=None, use_ilocs=False)[source]¶ Creates a generator/sequence object for training or evaluation with the supplied link ids and numeric targets.
- Parameters
link_ids – an iterable of link ids specified as tuples of node ids or an array of shape (N_links, 2) specifying the links.
targets – a 1D or 2D array of numeric link targets with shape (len(link_ids),) or (len(link_ids), target_size)
use_ilocs (bool) – if True, the node ids in link_ids are represented by ilocs, otherwise they need to be transformed into ilocs
- Returns
A NodeSequence object to use with GCN or GAT models in Keras methods fit(), evaluate(), and predict()
-
class
stellargraph.mapper.
FullBatchNodeGenerator
(G, name=None, method='gcn', k=1, sparse=True, transform=None, teleport_probability=0.1, weighted=False)[source]¶ A data generator for use with full-batch models on homogeneous graphs, e.g., GCN, GAT, SGC. The supplied graph G should be a StellarGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
This generator will supply the features array and the adjacency matrix to a full-batch Keras graph ML model. There is a choice to supply either a sparse adjacency matrix (the default) or a dense adjacency matrix, with the sparse argument.
For these algorithms the adjacency matrix requires preprocessing and the ‘method’ option should be specified with the correct preprocessing for each algorithm. The options are as follows:
method='gcn': Normalizes the adjacency matrix for the GCN algorithm. This implements the linearized convolution of Eq. 8 in [1].
method='sgc': This replicates the k-th order smoothed adjacency matrix to implement the Simplified Graph Convolutions of Eq. 8 in [2].
method='self_loops' or method='gat': Simply sets the diagonal elements of the adjacency matrix to one, effectively adding self-loops to the graph. This is used by the GAT algorithm of [3].
method='ppnp': Calculates the personalized page rank matrix of Eq. 2 in [4].
[1] Kipf and Welling, 2017. [2] Wu et al. 2019. [3] Veličković et al., 2018. [4] Klicpera et al., 2018.
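To give a feel for the 'ppnp' option and the teleport_probability parameter, here is a minimal dense sketch of the personalized PageRank matrix in its usual closed form, alpha * (I - (1 - alpha) * A_hat)^-1. It assumes a symmetrically normalized adjacency matrix with self-loops; the helper names are hypothetical, and the exact normalization used by the library is not confirmed here.

```python
import numpy as np


def symmetric_normalize(adj):
    """D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix A."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def ppr_matrix(a_hat, teleport_probability=0.1):
    """Closed-form personalized PageRank: alpha * (I - (1 - alpha) * A_hat)^-1."""
    alpha = teleport_probability
    identity = np.eye(a_hat.shape[0])
    return alpha * np.linalg.inv(identity - (1.0 - alpha) * a_hat)


# Path graph 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

a_hat = symmetric_normalize(adj)
ppr = ppr_matrix(a_hat, teleport_probability=0.1)
print(ppr)
```

A larger teleport probability keeps more of the propagated mass near the starting node; the dense inverse makes this approach practical only for small graphs.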
Example:
G_generator = FullBatchNodeGenerator(G)
train_flow = G_generator.flow(node_ids, node_targets)
# Fetch the data from train_flow, and feed into a Keras model:
x_inputs, y_train = train_flow[0]
model.fit(x=x_inputs, y=y_train)
# Alternatively, use the generator itself with model.fit:
model.fit(train_flow, epochs=num_epochs)
See also
Models using this generator: GCN, GAT, APPNP, PPNP.
Example using this generator (see individual models for more): node classification.
Related generators:
ClusterNodeGenerator for scalable/inductive training
CorruptedGenerator for unsupervised training with DeepGraphInfomax
FullBatchLinkGenerator for link prediction and similar tasks
RelationalFullBatchNodeGenerator for multiple edge types, with RGCN
PaddedGraphGenerator for graph classification
- Parameters
G (StellarGraph) – a machine-learning StellarGraph-type graph
name (str) – an optional name of the generator
method (str) – Method to preprocess adjacency matrix. One of gcn (default), sgc, self_loops, or none.
k (None or int) – This is the smoothing order for the sgc method. This should be a positive integer.
transform (callable) – an optional function to apply on features and adjacency matrix; the function takes (features, Aadj) as arguments.
sparse (bool) – If True (default), a sparse adjacency matrix is used; if False, a dense adjacency matrix is used.
teleport_probability (float) – teleport probability between 0.0 and 1.0: the “probability” of returning to the starting node in the propagation step as in [4].
weighted (bool, optional) – if True, use the edge weights from G; if False, treat the graph as unweighted.
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
flow
(node_ids, targets=None, use_ilocs=False)[source]¶ Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.
- Parameters
node_ids – an iterable of node ids for the nodes of interest (e.g., training, validation, or test set nodes)
targets – a 1D or 2D array of numeric node targets with shape (len(node_ids),) or (len(node_ids), target_size)
use_ilocs (bool) – if True, node_ids are represented by ilocs, otherwise node_ids need to be transformed into ilocs
- Returns
A NodeSequence object to use with GCN or GAT models in Keras methods fit(), evaluate(), and predict()
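The use_ilocs flag distinguishes external node IDs from integer locations ("ilocs"). The sketch below shows the idea with a plain dictionary; the node IDs and the to_ilocs helper are hypothetical and stand in for whatever conversion the graph object performs internally.

```python
# Hypothetical node IDs, listed in the order the graph stores them.
graph_node_ids = ["a", "b", "c", "d"]

# An iloc is simply the integer position of a node in that storage order.
id_to_iloc = {node_id: iloc for iloc, node_id in enumerate(graph_node_ids)}


def to_ilocs(node_ids):
    """Translate external node IDs into integer locations."""
    return [id_to_iloc[node_id] for node_id in node_ids]


train_ids = ["c", "a"]
train_ilocs = to_ilocs(train_ids)
print(train_ilocs)  # [2, 0]
```

Passing positions that are already ilocs avoids repeating this lookup, which is what use_ilocs=True signals.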
-
class
stellargraph.mapper.
Generator
[source]¶ A generator supports creating sequences for input into graph machine learning algorithms via the flow method.
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
abstract
flow
(*args, **kwargs)[source]¶ Create a Keras Sequence or similar input, appropriate for a graph machine learning model.
-
abstract
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
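The difference between the number of batch dimensions and the batch size can be seen on plain arrays. This is only a sketch with made-up shapes; the batch_shape helper is hypothetical and not part of the API.

```python
import numpy as np

node_features = np.arange(12, dtype=float).reshape(4, 3)  # 4 nodes, 3 features each

# Full-batch style: a dummy leading dimension plus the node dimension (2 batch dims).
full_batch = node_features[np.newaxis, :, :]

# Mini-batch style: only the leading dimension indexes the batch (1 batch dim).
mini_batch = node_features[:2]


def batch_shape(array, num_batch_dims):
    """Split off the leading batch dimensions of a tensor's shape."""
    return array.shape[:num_batch_dims]


print(batch_shape(full_batch, 2))  # (1, 4)
print(batch_shape(mini_batch, 1))  # (2,)
```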
-
-
class
stellargraph.mapper.
GraphSAGELinkGenerator
(G, batch_size, num_samples, seed=None, name=None, weighted=False)[source]¶ A data generator for link prediction with Homogeneous GraphSAGE models
At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.
The supplied graph should be a StellarGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets, or an UnsupervisedSampler instance that generates node samples on demand, to get an object that can be used as a Keras data generator.
Example:
G_generator = GraphSAGELinkGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(edge_ids)
See also
Model using this generator: GraphSAGE.
For examples using this generator, see the model.
Related functionality:
UnsupervisedSampler for unsupervised training using random walks
GraphSAGENodeGenerator for node classification and related tasks
DirectedGraphSAGELinkGenerator for directed graphs
HinSAGELinkGenerator for heterogeneous graphs
- Parameters
G (StellarGraph) – A machine-learning ready graph.
batch_size (int) – Size of batch of links to return.
num_samples (list) – List of number of neighbour node samples per GraphSAGE layer (hop) to take.
weighted (bool, optional) – If True, sample neighbours using the edge weights in the graph.
-
sample_features
(head_links, batch_num)[source]¶ Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_links – An iterable of edges to perform sampling for.
batch_num (int) – Batch number
- Returns
A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.
-
class
stellargraph.mapper.
GraphSAGENodeGenerator
(G, batch_size, num_samples, seed=None, name=None, weighted=False)[source]¶ A data generator for node prediction with Homogeneous GraphSAGE models
At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.
The supplied graph should be a StellarGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
Example:
G_generator = GraphSAGENodeGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(train_node_ids, train_node_labels)
test_data_gen = G_generator.flow(test_node_ids)
See also
Model using this generator: GraphSAGE.
For examples using this generator, see the model.
Related functionality:
Neo4jGraphSAGENodeGenerator for using GraphSAGE with Neo4j
CorruptedGenerator for unsupervised training using DeepGraphInfomax
GraphSAGELinkGenerator for link prediction, unsupervised training using random walks and related tasks
DirectedGraphSAGENodeGenerator for directed graphs
HinSAGENodeGenerator for heterogeneous graphs
- Parameters
G (StellarGraph) – The machine-learning ready graph.
batch_size (int) – Size of batch to return.
num_samples (list) – The number of samples per layer (hop) to take.
seed (int) – [Optional] Random seed for the node sampler.
weighted (bool, optional) – If True, sample neighbours using the edge weights in the graph.
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
sample_features
(head_nodes, batch_num)[source]¶ Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num (int) – Batch number
- Returns
A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.
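A short sketch of how the per-layer shapes follow from num_samples. The helper sampled_shapes and the feature size of 32 are assumptions for illustration; the array holding the head nodes themselves is not included here.

```python
from itertools import accumulate
from operator import mul


def sampled_shapes(batch_size, num_samples, feature_size):
    """Shapes of the sampled-neighbour feature arrays, one per hop."""
    # Cumulative product: each hop multiplies the fan-out of the previous hops.
    per_layer = list(accumulate(num_samples, mul))
    return [(batch_size, n, feature_size) for n in per_layer]


shapes = sampled_shapes(batch_size=50, num_samples=[10, 5], feature_size=32)
print(shapes)  # [(50, 10, 32), (50, 50, 32)]
```

The second hop dominates memory use, so the later entries of num_samples are usually the ones to reduce first.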
-
class
stellargraph.mapper.
GraphWaveGenerator
(G, scales=(5, 10), degree=20)[source]¶ Implementation of the GraphWave structural embedding algorithm from the paper: “Learning Structural Node Embeddings via Diffusion Wavelets” (https://arxiv.org/pdf/1710.10321.pdf)
At minimum, this class is initialised with a StellarGraph object. Calling the flow function will return a TensorFlow DataSet that contains the GraphWave embeddings.
This implementation differs from the paper by removing the automatic method of calculating scales. This method was found to not work well in practice, and replicating the results of the paper requires manually specifying much larger scales than those automatically calculated.
See also
Example using this generator: unsupervised representation learning.
- Parameters
G (StellarGraph) – the StellarGraph object.
scales (iterable of floats) – the wavelet scales to use. Smaller values embed smaller scale structural features, and larger values embed larger structural features.
degree – the degree of the Chebyshev polynomial to use. Higher degrees yield more accurate results but at a higher computational cost. According to [1], the default value of 20 is accurate enough for most applications.
[1] D. I. Shuman, P. Vandergheynst, and P. Frossard, “Chebyshev Polynomial Approximation for Distributed Signal Processing,” https://arxiv.org/abs/1105.1891
-
flow
(node_ids, sample_points, batch_size, targets=None, shuffle=False, seed=None, repeat=False, num_parallel_calls=1)[source]¶ Creates a TensorFlow DataSet object of GraphWave embeddings.
The dimension of the embeddings is 2 * len(scales) * len(sample_points).
- Parameters
node_ids – an iterable of node ids for the nodes of interest (e.g., training, validation, or test set nodes)
sample_points – a 1D array of points at which to sample the characteristic function. This should be of the form: sample_points=np.linspace(0, max_val, number_of_samples) and is graph dependent.
batch_size (int) – the number of node embeddings to include in a batch.
targets – a 1D or 2D array of numeric node targets with shape (len(node_ids),) or (len(node_ids), target_size)
shuffle (bool) – indicates whether to shuffle the dataset after each epoch
seed (int,optional) – the random seed to use for shuffling the dataset
repeat (bool) – indicates whether iterating through the DataSet will continue infinitely or stop after one full pass.
num_parallel_calls (int) – number of threads to use.
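To show where the embedding dimension 2 * len(scales) * len(sample_points) comes from, here is a small NumPy sketch of the GraphWave idea. It is not the library's implementation: it uses an exact eigendecomposition instead of the Chebyshev approximation, assumes the unnormalised Laplacian L = D - A, and the ordering of the output columns is an arbitrary choice.

```python
import numpy as np


def graphwave_embeddings(adj, scales, sample_points):
    """Exact GraphWave-style embeddings for a small dense, undirected graph."""
    laplacian = np.diag(adj.sum(axis=1)) - adj          # L = D - A (assumed)
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    blocks = []
    for s in scales:
        # Heat-kernel wavelets: column a is the wavelet centred on node a.
        wavelets = eigvecs @ np.diag(np.exp(-s * eigvals)) @ eigvecs.T
        # Empirical characteristic function of each column at each sample point:
        # phase[t, m, a] = sample_points[t] * wavelets[m, a]
        phase = sample_points[:, None, None] * wavelets[None, :, :]
        char_fn = np.exp(1j * phase).mean(axis=1)       # (len(sample_points), n_nodes)
        # Real and imaginary parts each contribute len(sample_points) columns.
        blocks.append(char_fn.real.T)
        blocks.append(char_fn.imag.T)
    return np.concatenate(blocks, axis=1)


# Path graph 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scales = (5, 10)
sample_points = np.linspace(0, 100, 25)

emb = graphwave_embeddings(adj, scales, sample_points)
print(emb.shape)  # (4, 100), i.e. 2 * len(scales) * len(sample_points) columns
```

The two end nodes of the path play the same structural role, so their rows in emb coincide, which is the property the embeddings are designed to capture.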
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
HinSAGELinkGenerator
(G, batch_size, num_samples, head_node_types=None, schema=None, seed=None, name=None)[source]¶ A data generator for link prediction with Heterogeneous HinSAGE models
At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.
The supplied graph should be a StellarGraph object with node features for all node types.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
The generator should be given the (src,dst) node types using the head_node_types argument.
It’s possible to do link prediction on a graph where that link type is completely removed from the graph (e.g., “same_as” links in ER).
See also
Model using this generator: HinSAGE.
Example using this generator: link prediction.
Related functionality:
UnsupervisedSampler for unsupervised training using random walks
HinSAGENodeGenerator for node classification and related tasks
GraphSAGELinkGenerator for homogeneous graphs
DirectedGraphSAGELinkGenerator for directed homogeneous graphs
- Parameters
G (StellarGraph) – A machine-learning ready graph.
batch_size (int) – Size of batch of links to return.
num_samples (list) – List of number of neighbour node samples per GraphSAGE layer (hop) to take.
head_node_types (list, optional) – List of the types (str) of the two head nodes forming the node pair. This does not need to be specified if G has only one node type.
seed (int or str, optional) – Random seed for the sampling methods.
Example:
G_generator = HinSAGELinkGenerator(G, 50, [10,10])
data_gen = G_generator.flow(edge_ids)
-
sample_features
(head_links, batch_num)[source]¶ Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_links – An iterable of edges to perform sampling for.
batch_num (int) – Batch number
- Returns
A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.
-
class
stellargraph.mapper.
HinSAGENodeGenerator
(G, batch_size, num_samples, head_node_type=None, schema=None, seed=None, name=None)[source]¶ Keras-compatible data mapper for Heterogeneous GraphSAGE (HinSAGE)
At minimum, supply the StellarGraph, the batch size, and the number of node samples for each layer of the HinSAGE model.
The supplied graph should be a StellarGraph object with node features for all node types.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
Note that the shuffle argument should be True for training and False for prediction.
See also
Model using this generator: HinSAGE.
Example using this generator: unsupervised representation learning via Deep Graph Infomax.
Related functionality:
CorruptedGenerator for unsupervised training using DeepGraphInfomax
HinSAGELinkGenerator for link prediction and related tasks
GraphSAGENodeGenerator for homogeneous graphs
DirectedGraphSAGENodeGenerator for directed homogeneous graphs
- Parameters
G (StellarGraph) – The machine-learning ready graph
batch_size (int) – Size of batch to return
num_samples (list) – The number of samples per layer (hop) to take
head_node_type (str, optional) – The node type that will be given to the generator using the flow method; the model will expect this node type. This does not need to be specified if G has only one node type.
schema (GraphSchema, optional) – Graph schema for G.
seed (int, optional) – Random seed for the node sampler
Example:
G_generator = HinSAGENodeGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(train_node_ids, train_node_labels)
test_data_gen = G_generator.flow(test_node_ids)
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
sample_features
(head_nodes, batch_num)[source]¶ Sample neighbours recursively from the head nodes, collect the features of the sampled nodes, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num (int) – Batch number
- Returns
A list of the same length as num_samples of collected features from the sampled nodes of shape: (len(head_nodes), num_sampled_at_layer, feature_size) where num_sampled_at_layer is the cumulative product of num_samples for that layer.
-
class
stellargraph.mapper.
KGTripleGenerator
(G, batch_size)[source]¶ A data generator for working with triple-based knowledge graph models, like ComplEx.
This requires a StellarGraph that contains all nodes/entities and every edge/relation type that will be trained or predicted upon. The graph does not need to contain the edges/triples that are used for training or prediction.
See also
Models using this generator: ComplEx, DistMult, RotatE, RotE, RotH.
Example using this generator (see individual models for more): link prediction with ComplEx.
- Parameters
G (StellarGraph) – the graph containing all nodes, and all edge types.
batch_size (int) – the size of the batches to generate
-
flow
(edges, negative_samples=None, sample_strategy='uniform', shuffle=False, seed=None)[source]¶ Create a Keras Sequence yielding the edges/triples in edges, potentially with some negative edges.
The negative edges are sampled using the “local closed world assumption”, where a source/subject or a target/object is randomly mutated.
- Parameters
edges – the edges/triples to feed into a knowledge graph model.
negative_samples (int, optional) – the number of negative samples to generate for each positive edge.
sample_strategy (str, optional) – the sampling strategy to use for negative sampling, if negative_samples is not None. Supported values:
uniform: Uniform sampling, where a negative edge is created from a positive edge in edges by replacing the source or destination entity with a uniformly sampled random entity in the graph (without verifying if the edge exists in the graph: for sparse graphs, this is unlikely). Each element in a batch is labelled as 1 (positive) or 0 (negative). An appropriate loss function is tensorflow.keras.losses.BinaryCrossentropy (probably with from_logits=True).
self-adversarial: Self-adversarial sampling from [1], where each edge is sampled in the same manner as uniform sampling. Each element in a batch is labelled as 1 (positive) or an integer in [0, -batch_size) (negative), that is, a non-positive integer greater than -batch_size. An appropriate loss function is stellargraph.losses.SelfAdversarialNegativeSampling.
[1] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang, “RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space,” arXiv:1902.10197, Feb. 2019.
- Returns
A Keras sequence that can be passed to the fit and predict methods of knowledge-graph models.
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
Node2VecLinkGenerator
(G, batch_size, name=None)[source]¶ A data generator for context node prediction with Node2Vec models.
At minimum, supply the StellarGraph and the batch size.
The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model does not require node features for nodes in the graph.
Use the flow() method supplying the nodes and targets, or an UnsupervisedSampler instance that generates node samples on demand, to get an object that can be used as a Keras data generator.
Example:
G_generator = Node2VecLinkGenerator(G, 50)
data_gen = G_generator.flow(edge_ids, edge_labels)
See also
Model using this generator: Node2Vec.
An example using this generator (see the model for more): unsupervised representation learning.
Related functionality: Node2VecNodeGenerator for node classification and related tasks.
- Parameters
G (StellarGraph) – A machine-learning ready graph.
batch_size (int) – Size of batch of links to return.
-
sample_features
(head_links, batch_num)[source]¶ Sample the ids of the target and context nodes, and return these as a list of feature arrays for the Node2Vec algorithm.
- Parameters
head_links – An iterable of edges to perform sampling for.
- Returns
A list of feature arrays, with each element being the ids of the sampled target and context node.
-
class
stellargraph.mapper.
Node2VecNodeGenerator
(G, batch_size, name=None)[source]¶ A data generator for node representation prediction with Node2Vec models.
At minimum, supply the StellarGraph and the batch size.
The supplied graph should be a StellarGraph object that is ready for machine learning. Currently the model does not require node features for nodes in the graph.
Use the flow() method supplying the nodes to get an object that can be used as a Keras data generator.
Example:
G_generator = Node2VecNodeGenerator(G, 50)
data_gen = G_generator.flow(node_ids)
See also
Model using this generator: Node2Vec.
An example using this generator (see the model for more): unsupervised representation learning.
Related functionality: Node2VecLinkGenerator for training, link prediction, and related tasks.
- Parameters
G (StellarGraph) – The machine-learning ready graph.
batch_size (int) – Size of batch to return.
-
flow
(node_ids)[source]¶ Creates a generator/sequence object for node representation prediction with the supplied node ids. This should be used with a trained Node2Vec model in order to transform node ids to node embeddings. For training, see Node2VecLinkGenerator instead.
The node IDs are the nodes to run inference on: the embeddings calculated for these nodes are passed to the downstream task. These are a subset/all of the nodes in the graph.
- Parameters
node_ids – an iterable of node IDs.
- Returns
A NodeSequence object to use with the Node2Vec model in the Keras method predict.
-
flow_from_dataframe
(node_ids)[source]¶ Creates a generator/sequence object for node representation prediction by using the index of the supplied dataframe as the node ids.
- Parameters
node_ids – a Pandas DataFrame of node_ids.
- Returns
A NodeSequence object to use with the Node2Vec model in the Keras method predict.
-
sample_features
(head_nodes, batch_num)[source]¶ Get the ids of the head nodes, and return these as a list of feature arrays for the Node2Vec algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
- Returns
A list of feature arrays, with each element being the id of each head node.
-
class
stellargraph.mapper.
PaddedGraphGenerator
(graphs, name=None)[source]¶ A data generator for use with graph classification algorithms.
The supplied graphs should be StellarGraph objects with node features. Use the flow() method supplying the graph indexes and (optionally) targets to get an object that can be used as a Keras data generator.
This generator supplies the features arrays and the adjacency matrices to a mini-batch Keras graph classification model. Differences in the number of nodes are resolved by padding each batch of features and adjacency matrices, and supplying a boolean mask indicating which are valid and which are padding.
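The padding-and-mask scheme can be pictured with a small NumPy sketch. The pad_batch helper and the order of the returned arrays are assumptions made for illustration; they are not taken from the library.

```python
import numpy as np


def pad_batch(features_list, adj_list):
    """Zero-pad per-graph features and adjacency matrices to a common node count."""
    n_graphs = len(features_list)
    max_nodes = max(f.shape[0] for f in features_list)
    feature_size = features_list[0].shape[1]
    features = np.zeros((n_graphs, max_nodes, feature_size))
    adjs = np.zeros((n_graphs, max_nodes, max_nodes))
    mask = np.zeros((n_graphs, max_nodes), dtype=bool)
    for i, (f, a) in enumerate(zip(features_list, adj_list)):
        n = f.shape[0]
        features[i, :n] = f
        adjs[i, :n, :n] = a
        mask[i, :n] = True          # True marks real nodes, False marks padding
    return features, mask, adjs


# Two toy graphs with 2 and 3 nodes, each node carrying 2 features.
features_list = [np.ones((2, 2)), np.ones((3, 2))]
adj_list = [np.array([[0.0, 1.0], [1.0, 0.0]]),
            np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])]

features, mask, adjs = pad_batch(features_list, adj_list)
print(features.shape, adjs.shape)
print(mask)
```

A model can then use the mask to ignore padded rows, for instance when pooling node representations into one vector per graph.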
See also
Models using this generator: GCNSupervisedGraphClassification, DeepGraphCNN.
Examples using this generator:
- Parameters
-
flow
(graphs, targets=None, symmetric_normalization=True, weighted=False, batch_size=1, name=None, shuffle=False, seed=None)[source]¶ Creates a generator/sequence object for training, evaluation, or prediction with the supplied graph indexes and targets.
- Parameters
graphs (iterable) – an iterable of graph indexes in self.graphs or an iterable of StellarGraph objects for the graphs of interest (e.g., training, validation, or test set nodes).
targets (2d array, optional) – a 2D array of numeric graph targets with shape (len(graphs), len(targets)).
symmetric_normalization (bool, optional) – The type of normalization to be applied on the graph adjacency matrices. If True, the adjacency matrix is left and right multiplied by the inverse square root of the degree matrix; otherwise, the adjacency matrix is only left multiplied by the inverse of the degree matrix.
weighted (bool, optional) – if True, use the edge weights from G; if False, treat the graph as unweighted.
batch_size (int, optional) – The batch size.
name (str, optional) – An optional name for the returned generator object.
shuffle (bool, optional) – If True the node IDs will be shuffled at the end of each epoch.
seed (int, optional) – Random seed to use in the sequence object.
- Returns
A PaddedGraphSequence object to use with Keras methods fit(), evaluate(), and predict()
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
-
class
stellargraph.mapper.
RelationalFullBatchNodeGenerator
(G, name=None, sparse=True, transform=None, weighted=False)[source]¶ A data generator for use with full-batch models on relational graphs e.g. RGCN.
The supplied graph G should be a StellarGraph or StellarDiGraph object with node features. Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
This generator will supply the features array and the adjacency matrix to a full-batch Keras graph ML model. There is a choice to supply either a list of sparse adjacency matrices (the default) or a list of dense adjacency matrices, with the sparse argument.
For these algorithms the adjacency matrices require preprocessing and the default option is to normalize each row of the adjacency matrix so that it sums to 1. For customization a transformation (callable) can be passed that operates on the node features and adjacency matrix.
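The default preprocessing described above (each row scaled so that it sums to 1) can be sketched per edge type as follows. This is a minimal pure-Python illustration on dense lists, not the library's implementation; the helper name row_normalize and the edge type names are hypothetical.

```python
def row_normalize(adj):
    """Scale each non-empty row of a dense adjacency matrix so it sums to 1."""
    normalized = []
    for row in adj:
        total = sum(row)
        # Rows without any edge of this type are left as all zeros.
        normalized.append([v / total for v in row] if total else [0.0] * len(row))
    return normalized

# Hypothetical relational graph with 3 nodes and two edge types,
# giving one adjacency matrix per edge type.
adjacency_by_type = {
    "cites": [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
    "authored_with": [[0, 2, 0], [2, 0, 2], [0, 2, 0]],
}
normalized = {
    edge_type: row_normalize(adj) for edge_type, adj in adjacency_by_type.items()
}
```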
Example:
    G_generator = RelationalFullBatchNodeGenerator(G)
    train_data_gen = G_generator.flow(node_ids, node_targets)
    # Fetch the data from train_data_gen, and feed into a Keras model:
    # Alternatively, use the generator itself with model.fit:
    model.fit(train_data_gen, epochs=num_epochs, ...)
See also
Model using this generator: RGCN.
Related generators:
FullBatchNodeGenerator for graphs with one edge type
CorruptedGenerator for unsupervised training with DeepGraphInfomax
- Parameters
G (StellarGraph) – a machine-learning StellarGraph-type graph
name (str) – an optional name of the generator
transform (callable) – an optional function to apply on the features and adjacency matrix; the function takes (features, Aadj) as arguments.
sparse (bool) – If True (default), a list of sparse adjacency matrices is used; if False, a list of dense adjacency matrices is used.
weighted (bool, optional) – if True, use the edge weights from G; if False, treat the graph as unweighted.
-
default_corrupt_input_index_groups
()[source]¶ Optionally returns the indices of input tensors that can be shuffled for CorruptedGenerator to use in DeepGraphInfomax.
If this isn’t overridden, this method returns None, indicating that the generator doesn’t have a default or “canonical” set of indices that can be corrupted for Deep Graph Infomax.
-
flow
(node_ids, targets=None)[source]¶ Creates a generator/sequence object for training or evaluation with the supplied node ids and numeric targets.
- Parameters
node_ids – an iterable of node ids for the nodes of interest (e.g., training, validation, or test set nodes)
targets – a 2D array of numeric node targets with shape (len(node_ids), target_size)
- Returns
A NodeSequence object to use with RGCN models in Keras methods fit(), evaluate(), and predict()
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
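The shape described above can be made concrete with nested lists. This is only an illustration of the 1 × number of nodes × feature size layout, using a hypothetical 3-node graph with 2 features per node; nested_shape is a helper written for this sketch, not a library function.

```python
def nested_shape(values):
    """Return the shape of a non-empty rectangular nested list."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0]
    return tuple(shape)

# Hypothetical graph with 3 nodes and 2 features per node.
node_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Full-batch input: all nodes are wrapped in a single "dummy" batch entry.
full_batch = [node_features]
full_batch_shape = nested_shape(full_batch)  # (1, 3, 2)
```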
-
class
stellargraph.mapper.
SlidingFeaturesNodeGenerator
(G, window_size, batch_size=1)[source]¶ A data generator for a graph containing sequence data, created by sliding windows across the features of each node in a graph.
See also
Model using this generator: GCN_LSTM.
- Parameters
G (StellarGraph) – a graph instance where the node features are ordered sequence data
window_size (int) – the number of sequence points included in the sliding window.
batch_size (int, optional) – the number of sliding windows to include in each batch.
-
flow
(sequence_iloc_slice, target_distance=None)[source]¶ Create a sequence object for time series prediction within the given section of the node features.
This handles both univariate data (each node has a single associated feature vector) and multivariate data (each node has an associated feature tensor). The features are always sliced and indexed along the first feature axis.
- Parameters
sequence_iloc_slice (slice) –
A slice object of the range of features from which to select windows. A slice object is the object form of : within [...], e.g. slice(a, b) is equivalent to the a:b in v[a:b], and slice(None, b) is equivalent to v[:b]. As with that slicing, this parameter is inclusive in the start and exclusive in the end.
For example, suppose the graph has feature vectors of length 10 and window_size = 3:
passing in slice(None, None) will create 8 windows across all 10 features, starting with the features slice 0:3, then 1:4, and so on up to 7:10.
passing in slice(4, 7) will create just one window, slicing the three elements 4:7.
For training, one might do a train-test split by choosing a boundary and treating everything before it as training data and everything after it as test data, e.g. training on the first 80% of the features:
    train_end = int(0.8 * sequence_length)
    train_gen = sliding_generator.flow(slice(None, train_end))
    test_gen = sliding_generator.flow(slice(train_end, None))
target_distance (int, optional) –
The distance from the last element of each window to select an element to include as a supervised training target. Note: this always stays within the slice defined by sequence_iloc_slice.
Continuing the example above: a call like sliding_generator.flow(slice(4, 9), target_distance=1) will yield two pairs of window and target:
a feature window slicing 4:7, which includes the features at indices 4, 5, 6, and then a target feature at index 7 (distance 1 from the last element of the feature window)
a feature window slicing 5:8 and a target feature from index 8.
- Returns
A Keras sequence that yields batches of sliced windows of features, and, optionally, selected target values.
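The window and target positions described for flow() can be enumerated with a short helper. This is a sketch of the indexing only, assuming a slice step of 1; the function sliding_windows is hypothetical and does not reproduce the generator's batching.

```python
def sliding_windows(sequence_length, sequence_iloc_slice, window_size, target_distance=None):
    """List (window_start, window_end, target_index) triples for one slice.

    window_end is exclusive; target_index is None when no target is requested.
    """
    start, stop, _ = sequence_iloc_slice.indices(sequence_length)
    # Each sample needs the window plus, optionally, room for its target.
    extra = 0 if target_distance is None else target_distance
    windows = []
    for begin in range(start, stop - window_size - extra + 1):
        end = begin + window_size
        target = None if target_distance is None else end - 1 + target_distance
        windows.append((begin, end, target))
    return windows

# Feature vectors of length 10 and window_size = 3, as in the examples above.
one_window = sliding_windows(10, slice(4, 7), window_size=3)
with_targets = sliding_windows(10, slice(4, 9), window_size=3, target_distance=1)
```

Here one_window holds the single window 4:7, and with_targets holds the windows 4:7 and 5:8 paired with the targets at indices 7 and 8.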
-
num_batch_dims
()[source]¶ Returns the number of batch dimensions in returned tensors (_not_ the batch size itself).
For instance, for full batch methods like GCN, the feature has shape 1 × number of nodes × feature size, where the 1 is a “dummy” batch dimension and number of nodes is the real batch size (every node in the graph).
Layers and models¶
The layer package contains implementations of popular neural network layers for graph ML as Keras layers.
GraphSAGE¶
-
class
stellargraph.layer.
GraphSAGE
(layer_sizes, generator=None, aggregator=None, bias=True, dropout=0.0, normalize='l2', activations=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None, n_samples=None, input_dim=None, multiplicity=None)[source]¶ Implementation of the GraphSAGE algorithm of Hamilton et al. with Keras layers. see: http://snap.stanford.edu/graphsage/
The model minimally requires specification of the layer sizes as a list of int corresponding to the feature dimensions for each hidden layer and a generator object.
Different neighbour node aggregators can also be specified with the aggregator argument, which should be the aggregator class, either MeanAggregator, MeanPoolingAggregator, MaxPoolingAggregator, or AttentionalAggregator.
To use this class as a Keras model, the features and graph should be supplied using the GraphSAGENodeGenerator class for node inference models or the GraphSAGELinkGenerator class for link inference models. The .in_out_tensors method should be used to create a Keras model from the GraphSAGE object.
Examples
Creating a two-level GraphSAGE node classification model with hidden node sizes of 8 and 4 and 10 neighbours sampled at each layer using an existing StellarGraph object G containing the graph and node features:
    generator = GraphSAGENodeGenerator(G, batch_size=50, num_samples=[10, 10])
    graphsage = GraphSAGE(
        layer_sizes=[8, 4],
        activations=["relu", "softmax"],
        generator=generator,
    )
    x_inp, predictions = graphsage.in_out_tensors()
Note that passing a NodeSequence or LinkSequence object from the generator.flow(…) method as the generator= argument is now deprecated and the base generator object should be passed instead.
See also
Examples using GraphSAGE:
unsupervised representation learning: via random walks, via Deep Graph Infomax
calibrating models: node classification, link prediction
ensemble models: node classification, link prediction
Appropriate data generators: GraphSAGENodeGenerator, Neo4jGraphSAGENodeGenerator, GraphSAGELinkGenerator.
Related models:
DirectedGraphSAGE for a generalisation to directed graphs
HinSAGE for a generalisation to heterogeneous graphs
DeepGraphInfomax for unsupervised training
Aggregators: MeanAggregator, MeanPoolingAggregator, MaxPoolingAggregator, AttentionalAggregator.
- Parameters
layer_sizes (list) – Hidden feature dimensions for each layer.
generator (GraphSAGENodeGenerator or GraphSAGELinkGenerator) – If specified n_samples and input_dim will be extracted from this object.
aggregator (class) – The GraphSAGE aggregator to use; defaults to the MeanAggregator.
bias (bool) – If True (default), a bias vector is learnt for each layer.
dropout (float) – The dropout supplied to each layer; defaults to no dropout.
normalize (str or None) – The normalization used after each layer; defaults to L2 normalization.
activations (list) – Activations applied to each layer’s output; defaults to ['relu', ..., 'relu', 'linear'].
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
n_samples (list, optional) – The number of samples per layer in the model.
input_dim (int, optional) – The dimensions of the node features used as input to the model.
multiplicity (int, optional) – The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
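The default activations pattern listed above, ReLU for every layer except a linear final layer, can be spelled out for a given number of layers. The helper default_activations is hypothetical and only illustrates the documented pattern.

```python
def default_activations(n_layers):
    """Build ['relu', ..., 'relu', 'linear'] with one entry per layer."""
    return ["relu"] * (n_layers - 1) + ["linear"]

# One activation per entry of layer_sizes, e.g. layer_sizes=[8, 4].
two_layer = default_activations(2)  # ['relu', 'linear']
```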
Note
The values for n_samples, input_dim, and multiplicity are obtained from the provided generator by default. The additional keyword arguments for these parameters provide an alternative way to specify them if a generator cannot be supplied.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds a GraphSAGE model for node or link/node pair prediction, depending on the generator used to construct the model (whether it is a node or link/node pair generator).
- Returns
(x_inp, x_out), where x_inp is a list of Keras input tensors for the specified GraphSAGE model (either node or link/node pair model) and x_out contains model output tensor(s) of shape (batch_size, layer_sizes[-1])
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
class
stellargraph.layer.
DirectedGraphSAGE
(layer_sizes, generator=None, aggregator=None, bias=True, dropout=0.0, normalize='l2', activations=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None, n_samples=None, input_dim=None, multiplicity=None)[source]¶ Implementation of a directed version of the GraphSAGE algorithm of Hamilton et al. with Keras layers. see: http://snap.stanford.edu/graphsage/
The model minimally requires specification of the layer sizes as a list of int corresponding to the feature dimensions for each hidden layer and a generator object.
Different neighbour node aggregators can also be specified with the aggregator argument, which should be the aggregator class, either MeanAggregator, MeanPoolingAggregator, MaxPoolingAggregator, or AttentionalAggregator.
See also
Appropriate data generators: DirectedGraphSAGENodeGenerator, Neo4jDirectedGraphSAGENodeGenerator, DirectedGraphSAGELinkGenerator.
Aggregators: MeanAggregator, MeanPoolingAggregator, MaxPoolingAggregator, AttentionalAggregator.
- Parameters
layer_sizes (list) – Hidden feature dimensions for each layer.
generator (DirectedGraphSAGENodeGenerator) – If specified n_samples and input_dim will be extracted from this object.
aggregator (class, optional) – The GraphSAGE aggregator to use; defaults to the MeanAggregator.
bias (bool, optional) – If True (default), a bias vector is learnt for each layer.
dropout (float, optional) – The dropout supplied to each layer; defaults to no dropout.
normalize (str, optional) – The normalization used after each layer; defaults to L2 normalization.
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
- Notes
If a generator is not specified, then additional keyword arguments must be supplied:
in_samples (list): The number of in-node samples per layer in the model.
out_samples (list): The number of out-node samples per layer in the model.
input_dim (int): The dimensions of the node features used as input to the model.
multiplicity (int): The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
Passing a NodeSequence or LinkSequence object from the generator.flow(…) method as the generator= argument is now deprecated and the base generator object should be passed instead.
-
class
stellargraph.layer.
MeanAggregator
(*args, **kwargs)[source]¶ Mean Aggregator for GraphSAGE implemented with Keras base layer
- Parameters
-
group_aggregate
(x_group, group_idx=0)[source]¶ Mean aggregator for tensors over the neighbourhood for each group.
- Parameters
x_group (tf.Tensor) – The input tensor representing the sampled neighbour nodes.
group_idx (int, optional) – Group index.
- Returns
A tensor aggregation of the input nodes features.
- Return type
tensorflow.Tensor
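The core of a mean aggregation over sampled neighbours is an element-wise average of their feature vectors. The following pure-Python sketch shows that step for a single head node; it omits the batching and the learned weights that a Keras layer would apply, and mean_over_neighbours is a hypothetical name.

```python
def mean_over_neighbours(neighbour_features):
    """Average feature vectors element-wise over the sampled neighbours."""
    count = len(neighbour_features)
    return [sum(column) / count for column in zip(*neighbour_features)]

# Three sampled neighbours, each with a 2-dimensional feature vector.
sampled = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
aggregated = mean_over_neighbours(sampled)  # [2.0, 2.0]
```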
-
class
stellargraph.layer.
MeanPoolingAggregator
(*args, **kwargs)[source]¶ Mean Pooling Aggregator for GraphSAGE implemented with Keras base layer
Implements the aggregator of Eq. (3) in Hamilton et al. (2017), with max pooling replaced with mean pooling
- Parameters
-
group_aggregate
(x_group, group_idx=0)[source]¶ Aggregates the group tensors by mean-pooling of neighbours
- Parameters
x_group (tf.Tensor) – The input tensor representing the sampled neighbour nodes.
group_idx (int, optional) – Group index.
- Returns
A tensor aggregation of the input nodes features.
- Return type
tensorflow.Tensor
-
class
stellargraph.layer.
MaxPoolingAggregator
(*args, **kwargs)[source]¶ Max Pooling Aggregator for GraphSAGE implemented with Keras base layer
Implements the aggregator of Eq. (3) in Hamilton et al. (2017)
- Parameters
-
group_aggregate
(x_group, group_idx=0)[source]¶ Aggregates the group tensors by max-pooling of neighbours
- Parameters
x_group (tf.Tensor) – The input tensor representing the sampled neighbour nodes.
group_idx (int, optional) – Group index.
- Returns
A tensor aggregation of the input nodes features.
- Return type
tensorflow.Tensor
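The pooling aggregators transform each neighbour with a shared dense layer and then reduce element-wise across neighbours, using max for max pooling and mean for mean pooling. The sketch below assumes a ReLU nonlinearity and uses made-up weights; it handles a single head node and is not the library's implementation.

```python
def dense_relu(vector, weights, bias):
    """Apply ReLU(vector @ weights + bias) for a d_in x d_out weight matrix."""
    return [
        max(0.0, sum(v * w for v, w in zip(vector, column)) + b)
        for column, b in zip(zip(*weights), bias)
    ]

def pool_neighbours(neighbour_features, weights, bias, reducer=max):
    """Transform each neighbour, then reduce element-wise across neighbours."""
    hidden = [dense_relu(h, weights, bias) for h in neighbour_features]
    return [reducer(column) for column in zip(*hidden)]

def mean(values):
    return sum(values) / len(values)

weights = [[1.0, -1.0], [0.0, 1.0]]  # hypothetical 2 x 2 weight matrix
bias = [0.0, 0.0]
sampled = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
max_pooled = pool_neighbours(sampled, weights, bias, reducer=max)    # [3.0, 2.0]
mean_pooled = pool_neighbours(sampled, weights, bias, reducer=mean)
```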
-
class
stellargraph.layer.
AttentionalAggregator
(*args, **kwargs)[source]¶ Attentional Aggregator for GraphSAGE implemented with Keras base layer
Implements the aggregator of Veličković et al. “Graph Attention Networks” ICLR 2018
- Parameters
-
calculate_group_sizes
(input_shape)[source]¶ Calculates the output size for each input group.
- The results are stored in two variables:
self.included_weight_groups: if the corresponding entry is True then the input group is valid and should be used.
self.weight_sizes: the size of the output from this group.
The AttentionalAggregator is implemented to not use the first (head node) group. This makes the implementation different from other aggregators.
- Parameters
input_shape (list of list of int) – Shape of input tensors for self and neighbour features
-
call
(inputs, **kwargs)[source]¶ Apply aggregator on the input tensors, inputs
- Parameters
inputs (List[Tensor]) – Tensors giving self and neighbour features
x[0]: self Tensor (batch_size, head size, feature_size)
x[k>0]: group Tensors for neighbourhood (batch_size, head size, neighbours, feature_size)
- Returns
Keras Tensor representing the aggregated embeddings in the input.
HinSAGE¶
-
class
stellargraph.layer.
HinSAGE
(layer_sizes, generator=None, aggregator=None, bias=True, dropout=0.0, normalize='l2', activations=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None, n_samples=None, input_neighbor_tree=None, input_dim=None, multiplicity=None)[source]¶ Implementation of the GraphSAGE algorithm extended for heterogeneous graphs with Keras layers.
To use this class as a Keras model, the features and graph should be supplied using the HinSAGENodeGenerator class for node inference models or the HinSAGELinkGenerator class for link inference models. The .in_out_tensors method should be used to create a Keras model from the HinSAGE object.
Currently the class supports node or link prediction models which are built depending on whether a HinSAGENodeGenerator or HinSAGELinkGenerator object is specified. The models are built for a single node or link type. For example if you have nodes of types ‘A’ and ‘B’ you can build a link model for only a single pair of node types, for example (‘A’, ‘B’), which should be specified in the HinSAGELinkGenerator.
If you feed links into the model that do not have these node types (in correct order) an error will be raised.
Examples
Creating a two-level HinSAGE node classification model on nodes of type ‘A’ with hidden node sizes of 8 and 4 and 10 neighbours sampled at each layer using an existing StellarGraph object G containing the graph and node features:
    generator = HinSAGENodeGenerator(
        G, batch_size=50, num_samples=[10, 10], head_node_type='A'
    )
    hinsage = HinSAGE(
        layer_sizes=[8, 4],
        activations=["relu", "softmax"],
        generator=generator,
    )
    x_inp, predictions = hinsage.in_out_tensors()
Creating a two-level HinSAGE link classification model on node pairs of type (‘A’, ‘B’) with hidden node sizes of 8 and 4 and 5 neighbours sampled at each layer:
    generator = HinSAGELinkGenerator(
        G, batch_size=50, num_samples=[5, 5], head_node_types=('A', 'B')
    )
    hinsage = HinSAGE(
        layer_sizes=[8, 4],
        activations=["relu", "softmax"],
        generator=generator,
    )
    x_inp, predictions = hinsage.in_out_tensors()
Note that passing a NodeSequence or LinkSequence object from the generator.flow(…) method as the generator= argument is now deprecated and the base generator object should be passed instead.
See also
Appropriate data generators: HinSAGENodeGenerator, HinSAGELinkGenerator.
Related models:
GraphSAGE for homogeneous graphs
DirectedGraphSAGE for homogeneous directed graphs
DeepGraphInfomax for unsupervised training
Aggregators: MeanHinAggregator.
The Heterogeneous GraphSAGE (HinSAGE) explanatory document has more theoretical details.
- Parameters
layer_sizes (list) – Hidden feature dimensions for each layer
generator (HinSAGENodeGenerator or HinSAGELinkGenerator) – If specified, required model arguments such as the number of samples will be taken from the generator object. See note below.
aggregator (HinSAGEAggregator) – The HinSAGE aggregator to use; defaults to the MeanHinAggregator.
bias (bool) – If True (default), a bias vector is learnt for each layer.
dropout (float) – The dropout supplied to each layer; defaults to no dropout.
normalize (str) – The normalization used after each layer; defaults to L2 normalization.
activations (list) – Activations applied to each layer’s output; defaults to ['relu', ..., 'relu', 'linear'].
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
n_samples (list, optional) – The number of samples per layer in the model.
input_neighbor_tree (list of tuple, optional) – A list of (node_type, [children]) tuples that specify the subtree to be created by the HinSAGE model.
input_dim (dict, optional) – The input dimensions for each node type as a dictionary of the form {node_type: feature_size}.
multiplicity (int, optional) – The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
Note
The values for n_samples, input_neighbor_tree, input_dim, and multiplicity are obtained from the provided generator by default. The additional keyword arguments for these parameters provide an alternative way to specify them if a generator cannot be supplied.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
()[source]¶ Builds a HinSAGE model for node or link/node pair prediction, depending on the generator used to construct the model (whether it is a node or link/node pair generator).
- Returns
(x_inp, x_out), where x_inp is a list of Keras input tensors for the specified HinSAGE model (either node or link/node pair model) and x_out contains model output tensor(s) of shape (batch_size, layer_sizes[-1]).
- Return type
-
class
stellargraph.layer.
MeanHinAggregator
(*args, **kwargs)[source]¶ Mean Aggregator for HinSAGE implemented with Keras base layer
- Parameters
output_dim (int) – Output dimension
bias (bool) – Use bias in layer or not (Default False)
act (Callable or str) – name of the activation function to use (must be a Keras activation function), or alternatively, a TensorFlow operation.
kernel_initializer (str or func) – The initialiser to use for the weights
kernel_regularizer (str or func) – The regulariser to use for the weights
kernel_constraint (str or func) – The constraint to use for the weights
bias_initializer (str or func) – The initialiser to use for the bias
bias_regularizer (str or func) – The regulariser to use for the bias
bias_constraint (str or func) – The constraint to use for the bias
-
build
(input_shape)[source]¶ Builds layer
- Parameters
input_shape (list of list of int) – Shape of input per neighbour type.
-
call
(x, **kwargs)[source]¶ Apply MeanAggregation on input tensors, x
- Parameters
x –
List of Keras Tensors with the following elements
x[0]: tensor of self features shape (n_batch, n_head, n_feat)
x[1+r]: tensors of neighbour features each of shape (n_batch, n_head, n_neighbour[r], n_feat[r])
- Returns
Keras Tensor representing the aggregated embeddings in the input.
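The input layout above, with self features in x[0] and one neighbour tensor per neighbour type in x[1+r], suggests an aggregation along the following lines. This sketch rests on several assumptions: each neighbour type has its own weight matrix, the per-type results are averaged, and the self and neighbour parts are concatenated. Bias, activation, and batching are omitted, and all names and weights are hypothetical.

```python
def mean_rows(rows):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def vecmat(vector, weights):
    """Multiply a d_in vector by a d_in x d_out weight matrix."""
    return [sum(v * w for v, w in zip(vector, column)) for column in zip(*weights)]

def hin_mean_aggregate(self_features, neighbours_by_type, w_self, w_neigh_by_type):
    """Combine self features with the mean of each neighbour type (assumed scheme)."""
    from_self = vecmat(self_features, w_self)
    # Average within each neighbour type, then project with that type's weights.
    projected = [
        vecmat(mean_rows(neighbours), weights)
        for neighbours, weights in zip(neighbours_by_type, w_neigh_by_type)
    ]
    # Average across neighbour types and concatenate with the self part.
    return from_self + mean_rows(projected)

self_features = [1.0, 2.0]
neighbours_by_type = [
    [[1.0], [3.0]],            # type 0: two neighbours with 1 feature each
    [[0.0, 2.0], [2.0, 2.0]],  # type 1: two neighbours with 2 features each
]
w_self = [[1.0], [1.0]]
w_neigh_by_type = [[[2.0]], [[1.0], [1.0]]]
embedding = hin_mean_aggregate(self_features, neighbours_by_type, w_self, w_neigh_by_type)
```

Separate weights per neighbour type allow the feature sizes n_feat[r] to differ between types, since each type is projected to a common output size before averaging.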
Node2Vec¶
-
class
stellargraph.layer.
Node2Vec
(emb_size, generator=None, node_num=None, multiplicity=None)[source]¶ Implementation of the Node2Vec algorithm of A. Grover and J. Leskovec with Keras layers. see: https://snap.stanford.edu/node2vec/
The model minimally requires specification of the embedding size and a generator object.
See also
Examples using Node2Vec:
using Gensim Word2Vec, not this class: node classification, node classification with edge weights, link prediction, unsupervised representation learning.
Appropriate data generators: Node2VecNodeGenerator, Node2VecLinkGenerator.
Related functionality: BiasedRandomWalk does the underlying random walks.
- Parameters
emb_size (int) – The dimension of node embeddings.
generator (Sequence) – A NodeSequence or LinkSequence.
node_num (int, optional) – The number of nodes in the given graph.
multiplicity (int, optional) – The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds a Node2Vec model for node or link/node pair prediction, depending on the generator used to construct the model (whether it is a node or link/node pair generator).
- Returns
(x_inp, x_out), where x_inp contains Keras input tensor(s) for the specified Node2Vec model (either node or link/node pair model) and x_out contains model output tensor(s) of shape (batch_size, self.emb_size)
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
Attri2Vec¶
-
class
stellargraph.layer.
Attri2Vec
(layer_sizes, generator=None, bias=False, activation='sigmoid', normalize=None, input_dim=None, node_num=None, multiplicity=None)[source]¶ Implementation of the attri2vec algorithm of Zhang et al. with Keras layers. see: https://arxiv.org/abs/1901.04095.
The model minimally requires specification of the layer sizes as a list of int corresponding to the feature dimensions for each hidden layer and a generator object.
See also
Appropriate data generators: Attri2VecNodeGenerator, Attri2VecLinkGenerator.
- Parameters
layer_sizes (list) – Hidden feature dimensions for each layer.
generator (Sequence) – A NodeSequence or LinkSequence.
bias (bool) – If True, a bias vector is learnt for each layer in the attri2vec model; defaults to False.
activation (str) – The activation function of each layer in the attri2vec model, which takes values from linear, relu and sigmoid (default).
normalize ("l2" or None) – The normalization used after each layer; defaults to None.
input_dim (int, optional) – The dimensions of the node features used as input to the model.
node_num (int, optional) – The number of nodes in the given graph.
multiplicity (int, optional) – The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
Note
The values for input_dim, node_num, and multiplicity are obtained from the provided generator by default. The additional keyword arguments for these parameters provide an alternative way to specify them if a generator cannot be supplied.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds an Attri2Vec model for node or link/node pair prediction, depending on the generator used to construct the model (whether it is a node or link/node pair generator).
- Returns
(x_inp, x_out), where x_inp is a list of Keras input tensors for the specified Attri2Vec model (either node or link/node pair model) and x_out contains model output tensor(s) of shape (batch_size, layer_sizes[-1])
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
GCN¶
-
class
stellargraph.layer.
GCN
(layer_sizes, generator, bias=True, dropout=0.0, activations=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None, squeeze_output_batch=True)[source]¶ A stack of Graph Convolutional layers that implement a graph convolution network model as in https://arxiv.org/abs/1609.02907
The model minimally requires specification of the layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, and a generator object.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using:
the FullBatchNodeGenerator class for node inference
the ClusterNodeGenerator class for scalable/inductive node inference using the Cluster-GCN training procedure (https://arxiv.org/abs/1905.07953)
the FullBatchLinkGenerator class for link inference
To have the appropriate preprocessing the generator object should be instantiated with the method='gcn' argument.
Note that currently the GCN class is compatible with both sparse and dense adjacency matrices and the FullBatchNodeGenerator will default to sparse.
Example
Creating a GCN node classification model from an existing StellarGraph object G:
    generator = FullBatchNodeGenerator(G, method="gcn")
    gcn = GCN(
        layer_sizes=[32, 4],
        activations=["elu", "softmax"],
        generator=generator,
        dropout=0.5
    )
    x_inp, predictions = gcn.in_out_tensors()
Notes
The inputs are tensors with a batch dimension of 1. These are provided by the FullBatchNodeGenerator object.
This assumes that the normalized Laplacian matrix is provided as input to Keras methods. When using the FullBatchNodeGenerator specify the method='gcn' argument to do this preprocessing.
The nodes provided to the FullBatchNodeGenerator.flow() method are used by the final layer to select the predictions for those nodes in order. However, the intermediate layers before the final layer order the nodes in the same way as the adjacency matrix.
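The last note describes a selection step: the layers compute outputs for every node in adjacency-matrix order, and the final output keeps only the rows for the requested nodes, in the order they were requested. A minimal sketch of that selection, with hypothetical values and a hypothetical helper name:

```python
def gather_rows(all_node_outputs, node_indices):
    """Pick the output rows for the requested nodes, preserving request order."""
    return [all_node_outputs[i] for i in node_indices]

# Hypothetical outputs for a 4-node graph, ordered like the adjacency matrix.
outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
# Positions of the nodes passed to flow(), in the order they were requested.
requested = [2, 0]
selected = gather_rows(outputs, requested)  # [[0.6, 0.4], [0.9, 0.1]]
```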
See also
Appropriate data generators: FullBatchNodeGenerator, FullBatchLinkGenerator, ClusterNodeGenerator.
Related models:
Other full-batch models: see the documentation of FullBatchNodeGenerator for a full list
RGCN for a generalisation to multiple edge types
GCNSupervisedGraphClassification for graph classification by pooling the output of GCN
GCN_LSTM for time-series and sequence prediction, incorporating the graph structure via GCN
DeepGraphInfomax for unsupervised training
GraphConvolution is the base layer out of which a GCN model is built.
- Parameters
layer_sizes (list of int) – Output sizes of GCN layers in the stack.
generator (FullBatchNodeGenerator) – The generator instance.
bias (bool) – If True, a bias vector is learnt for each layer in the GCN model.
dropout (float) – Dropout rate applied to input features of each GCN layer.
activations (list of str or func) – Activations applied to each layer’s output; defaults to ['relu', ..., 'relu'].
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
squeeze_output_batch (bool, optional) – if True, remove the batch dimension when the batch size is 1. If False, leave the batch dimension.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds a GCN model for node or link prediction
- Returns
(x_inp, x_out), where x_inp is a list of Keras/TensorFlow input tensors for the GCN model and x_out is a tensor of the GCN model output.
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
class
stellargraph.layer.
GraphConvolution
(*args, **kwargs)[source]¶ Graph Convolution (GCN) Keras layer. The implementation is based on https://github.com/tkipf/keras-gcn.
Original paper: Semi-Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf, Max Welling, International Conference on Learning Representations (ICLR), 2017 https://github.com/tkipf/gcn
Notes
The batch axis represents independent graphs to be convolved with this GCN kernel (for instance, for full-batch node prediction on a single graph, its dimension should be 1).
If the adjacency matrix is dense, both it and the features should have a batch axis, with equal batch dimension.
If the adjacency matrix is sparse, it should not have a batch axis, and the batch dimension of the features must be 1.
Two inputs are required: the node features and the normalized graph Laplacian matrix.
This class assumes that the normalized Laplacian matrix is passed as input to the Keras methods.
See also
GCN combines several of these layers.
- Parameters
units (int) – dimensionality of output feature vectors
activation (str or func) – nonlinear activation applied to layer’s output to obtain output features
use_bias (bool) – toggles an optional bias
final_layer (bool) – Deprecated, use tf.gather or GatherIndices
kernel_initializer (str or func, optional) – The initialiser to use for the weights.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights.
kernel_constraint (str or func, optional) – The constraint to use for the weights.
bias_initializer (str or func, optional) – The initialiser to use for the bias.
bias_regularizer (str or func, optional) – The regulariser to use for the bias.
bias_constraint (str or func, optional) – The constraint to use for the bias.
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (node features and adjacency matrix)
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs (list) – a list of 2 input tensors: the node features (size 1 x N x F) and the graph adjacency matrix (size N x N), where N is the number of nodes in the graph, and F is the dimensionality of node features.
- Returns
Keras Tensor that represents the output of the layer.
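As a rough illustration of what a graph convolution computes, the sketch below applies one propagation step in plain Python. It assumes the symmetric normalisation with self-loops described in the Kipf and Welling paper; the helper names (`normalize_adjacency`, `matmul`, `gcn_layer`) are hypothetical and are not part of StellarGraph, whose generators perform the preprocessing for you.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (nested lists)."""
    n = len(adj)
    # Add self-loops before computing degrees.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(features, norm_adj, kernel, bias=None):
    """Compute relu(norm_adj @ features @ kernel + bias) for a single graph."""
    out = matmul(matmul(norm_adj, features), kernel)
    if bias is not None:
        out = [[value + b for value, b in zip(row, bias)] for row in out]
    return [[max(value, 0.0) for value in row] for row in out]


# Path graph 0 - 1 - 2 with N = 3 nodes, F = 2 input features and units = 1.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0], [1.0]]

norm_adj = normalize_adjacency(adjacency)
output = gcn_layer(features, norm_adj, kernel)  # shape N x units
```

The batch axis of size 1 is omitted here for brevity; in the layer itself the features carry it as described in the Notes above.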
Cluster-GCN¶
-
class
stellargraph.layer.
ClusterGCN
(layer_sizes, activations, generator, bias=True, dropout=0.0, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None)[source]¶ Deprecated: use
stellargraph.layer.GCN with stellargraph.mapper.ClusterNodeGenerator.
-
class
stellargraph.layer.
ClusterGraphConvolution
(*args, **kwargs)[source]¶ Deprecated: use
GraphConvolution
.
RGCN¶
-
class
stellargraph.layer.
RGCN
(layer_sizes, generator, bias=True, num_bases=0, dropout=0.0, activations=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None)[source]¶ A stack of Relational Graph Convolutional layers that implement a relational graph convolution neural network model as in https://arxiv.org/pdf/1703.06103.pdf
The model minimally requires specification of the layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, and a generator object.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using the RelationalFullBatchNodeGenerator class. The generator object should be instantiated as follows:
    generator = RelationalFullBatchNodeGenerator(G)
Note that currently the RGCN class is compatible with both sparse and dense adjacency matrices and the RelationalFullBatchNodeGenerator will default to sparse.
Notes
The inputs are tensors with a batch dimension of 1. These are provided by the RelationalFullBatchNodeGenerator object.
The nodes provided to the RelationalFullBatchNodeGenerator.flow() method are used by the final layer to select the predictions for those nodes in order. However, the intermediate layers before the final layer order the nodes in the same way as the adjacency matrix.
Examples
Creating an RGCN node classification model from an existing StellarGraph object G:
    generator = RelationalFullBatchNodeGenerator(G)
    rgcn = RGCN(
        layer_sizes=[32, 4],
        activations=["elu", "softmax"],
        num_bases=10,
        generator=generator,
        dropout=0.5,
    )
    x_inp, predictions = rgcn.in_out_tensors()
See also
Examples using RGCN:
Appropriate data generator: RelationalFullBatchNodeGenerator.
Related model: GCN is a specialisation for a single edge type.
RelationalGraphConvolution is the base layer out of which an RGCN model is built.
- Parameters
layer_sizes (list of int) – Output sizes of RGCN layers in the stack.
generator (RelationalFullBatchNodeGenerator) – The generator instance.
num_bases (int) – Specifies number of basis matrices to use for the weight matrices of the RGCN layer as in the paper. Defaults to 0 which specifies that no basis decomposition is used.
bias (bool) – If True, a bias vector is learnt for each layer in the RGCN model.
dropout (float) – Dropout rate applied to input features of each RGCN layer.
activations (list of str or func) – Activations applied to each layer’s output; defaults to ['relu', ..., 'relu'].
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds an RGCN model for node prediction. Link/node pair prediction will be added in the future.
- Returns
(x_inp, x_out), where x_inp is a list of Keras input tensors for the specified RGCN model and x_out contains model output tensor(s) of shape (batch_size, layer_sizes[-1])
- Return type
-
class
stellargraph.layer.
RelationalGraphConvolution
(*args, **kwargs)[source]¶ Relational Graph Convolution (RGCN) Keras layer.
Original paper: Modeling Relational Data with Graph Convolutional Networks. Thomas N. Kipf, Michael Schlichtkrull (2017). https://arxiv.org/pdf/1703.06103.pdf
Notes
The inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are 1 + R inputs required (where R is the number of relationships): the node features, and a normalized adjacency matrix for each relationship
See also
RGCN combines several of these layers.
- Parameters
units (int) – dimensionality of output feature vectors
num_relationships (int) – the number of relationships in the graph
num_bases (int) – the number of basis matrices to use for parameterizing the weight matrices as described in the paper; defaults to 0. num_bases < 0 triggers the default behaviour of num_bases = 0
activation (str or func) – nonlinear activation applied to layer’s output to obtain output features
use_bias (bool) – toggles an optional bias
final_layer (bool) – Deprecated, use tf.gather or GatherIndices
kernel_initializer (str or func) – The initialiser to use for the self kernel and also relational kernels if num_bases=0.
kernel_regularizer (str or func) – The regulariser to use for the self kernel and also relational kernels if num_bases=0.
kernel_constraint (str or func) – The constraint to use for the self kernel and also relational kernels if num_bases=0.
basis_initializer (str or func) – The initialiser to use for the basis matrices.
basis_regularizer (str or func) – The regulariser to use for the basis matrices.
basis_constraint (str or func) – The constraint to use for the basis matrices.
coefficient_initializer (str or func) – The initialiser to use for the coefficients.
coefficient_regularizer (str or func) – The regulariser to use for the coefficients.
coefficient_constraint (str or func) – The constraint to use for the coefficients.
bias_initializer (str or func) – The initialiser to use for the bias.
bias_regularizer (str or func) – The regulariser to use for the bias.
bias_constraint (str or func) – The constraint to use for the bias.
input_dim (int, optional) – the size of the input shape, if known.
kwargs – any additional arguments to pass to
tensorflow.keras.layers.Layer
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (node features, node_indices, and adjacency matrices)
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs (list) – a list of 1 + R input tensors that includes node features (size 1 x N x F), and a graph adjacency matrix (size N x N) for each relationship. R is the number of relationships in the graph (edge types), N is the number of nodes in the graph, and F is the dimensionality of node features.
- Returns
Keras Tensor that represents the output of the layer.
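The num_bases parameter refers to the basis decomposition from the RGCN paper, in which each relationship's kernel is a weighted sum of a small set of shared basis matrices. The sketch below shows that idea in plain Python; the function names and sizes are hypothetical and this is not the layer's actual implementation.

```python
def combine_bases(bases, coefficients):
    """Return one kernel per relationship as a weighted sum of shared basis matrices."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [
            [sum(c * basis[i][j] for c, basis in zip(coeffs, bases)) for j in range(cols)]
            for i in range(rows)
        ]
        for coeffs in coefficients
    ]


def relational_kernel_params(num_relationships, num_bases, input_dim, units):
    """Count relational kernel weights (the self kernel and bias are not included)."""
    if num_bases <= 0:
        # No decomposition: one full kernel per relationship.
        return num_relationships * input_dim * units
    # Shared bases plus one coefficient per (relationship, basis) pair.
    return num_bases * input_dim * units + num_relationships * num_bases


# Hypothetical sizes: 2 bases of shape 2 x 2 shared by 3 relationships.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
coefficients = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
kernels = combine_bases(bases, coefficients)

full = relational_kernel_params(50, 0, 32, 32)
decomposed = relational_kernel_params(50, 10, 32, 32)
```

With many relationships the decomposition can reduce the number of relational weights considerably, which is the motivation given in the paper.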
PPNP¶
-
class
stellargraph.layer.
PPNP
(layer_sizes, generator, activations, bias=True, dropout=0.0, kernel_regularizer=None)[source]¶ Implementation of Personalized Propagation of Neural Predictions (PPNP) as in https://arxiv.org/abs/1810.05997.
The model minimally requires specification of the fully connected layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, and a generator object.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using the FullBatchNodeGenerator class. To have the appropriate preprocessing the generator object should be instantiated as follows:
    generator = FullBatchNodeGenerator(G, method="ppnp")
Notes
The inputs are tensors with a batch dimension of 1. These are provided by the FullBatchNodeGenerator object.
This assumes that the personalized page rank matrix is provided as input to Keras methods. When using the FullBatchNodeGenerator specify the method='ppnp' argument to do this preprocessing.
method='ppnp' requires that use_sparse=False and generates a dense personalized page rank matrix.
The nodes provided to the FullBatchNodeGenerator.flow() method are used by the final layer to select the predictions for those nodes in order. However, the intermediate layers before the final layer order the nodes in the same way as the adjacency matrix.
The size of the final fully connected layer must be equal to the number of classes to predict.
See also
Example using PPNP: node classification.
Appropriate data generators: FullBatchNodeGenerator, FullBatchLinkGenerator.
PPNPPropagationLayer is the base layer out of which a PPNP model is built.
- Parameters
layer_sizes (list of int) – list of output sizes of fully connected layers in the stack
activations (list of str) – list of activations applied to each fully connected layer’s output
generator (FullBatchNodeGenerator) – an instance of FullBatchNodeGenerator class constructed on the graph of interest
bias (bool) – toggles an optional bias in fully connected layers
dropout (float) – dropout rate applied to input features of each layer
kernel_regularizer (str) – regulariser applied to the kernels of fully connected layers
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds a PPNP model for node or link prediction
- Returns
(x_inp, x_out), where x_inp is a list of Keras/TensorFlow input tensors for the model and x_out is a tensor of the model output.
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
class
stellargraph.layer.
PPNPPropagationLayer
(*args, **kwargs)[source]¶ Implementation of Personalized Propagation of Neural Predictions (PPNP) as in https://arxiv.org/abs/1810.05997.
Notes
The inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are two inputs required, the node features, and the graph personalized page rank matrix
This class assumes that the personalized page rank matrix (specified in the paper) is passed as input to the Keras methods.
See also
PPNP combines several of these layers.
- Parameters
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (node features and adjacency matrix)
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs (list) – a list of 2 input tensors: the node features (size 1 x N x F) and the graph personalized page rank matrix (size N x N), where N is the number of nodes in the graph, and F is the dimensionality of node features.
- Returns
Keras Tensor that represents the output of the layer.
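For orientation, the personalized page rank matrix referred to above can be written as follows, using the notation of the PPNP paper rather than any StellarGraph identifier: $\hat{A}$ is the symmetrically normalised adjacency matrix with self-loops, $\alpha$ is the teleport probability, and $H$ holds the per-node predictions of the fully connected layers. Whether the final softmax is applied inside the model or through the chosen activations is an assumption to check against the code.

```latex
\Pi_{\mathrm{ppr}} = \alpha \left( I_N - (1 - \alpha)\, \hat{A} \right)^{-1},
\qquad
Z = \operatorname{softmax}\!\left( \Pi_{\mathrm{ppr}} \, H \right)
```

Because of the matrix inverse, $\Pi_{\mathrm{ppr}}$ is generally dense, which fits the use_sparse=False requirement noted for method='ppnp'.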
APPNP¶
-
class
stellargraph.layer.
APPNP
(layer_sizes, generator, activations, bias=True, dropout=0.0, teleport_probability=0.1, kernel_regularizer=None, approx_iter=10)[source]¶ Implementation of Approximate Personalized Propagation of Neural Predictions (APPNP) as in https://arxiv.org/abs/1810.05997.
The model minimally requires specification of the fully connected layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, and a generator object.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using:
- the FullBatchNodeGenerator class for node inference
- the ClusterNodeGenerator class for scalable/inductive node inference using the Cluster-GCN training procedure (https://arxiv.org/abs/1905.07953)
- the FullBatchLinkGenerator class for link inference
To have the appropriate preprocessing the generator object should be instantiated with the method='gcn' argument.
Example
Building an APPNP node model:
    generator = FullBatchNodeGenerator(G, method="gcn")
    ppnp = APPNP(
        layer_sizes=[64, 64, 1],
        activations=['relu', 'relu', 'relu'],
        generator=generator,
        dropout=0.5,
    )
    x_in, x_out = ppnp.in_out_tensors()
Notes
The inputs are tensors with a batch dimension of 1. These are provided by the FullBatchNodeGenerator object.
This assumes that the normalized Laplacian matrix is provided as input to Keras methods. When using the FullBatchNodeGenerator specify the method='gcn' argument to do this preprocessing.
The nodes provided to the FullBatchNodeGenerator.flow() method are used by the final layer to select the predictions for those nodes in order. However, the intermediate layers before the final layer order the nodes in the same way as the adjacency matrix.
The size of the final fully connected layer must be equal to the number of classes to predict.
See also
Example using APPNP: node classification.
Appropriate data generators: FullBatchNodeGenerator, FullBatchLinkGenerator, ClusterNodeGenerator.
APPNPPropagationLayer is the base layer out of which an APPNP model is built.
- Parameters
layer_sizes (list of int) – list of output sizes of fully connected layers in the stack
activations (list of str) – list of activations applied to each fully connected layer’s output
generator (FullBatchNodeGenerator) – an instance of FullBatchNodeGenerator class constructed on the graph of interest
bias (bool) – toggles an optional bias in fully connected layers
dropout (float) – dropout rate applied to input features of each layer
kernel_regularizer (str) – regulariser applied to the kernels of fully connected layers
teleport_probability – “probability” of returning to the starting node in the propagation step as described in the paper
approx_iter – number of iterations to approximate PPNP as described in the paper (K in the paper)
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds an APPNP model for node or link prediction
- Returns
(x_inp, x_out), where x_inp is a list of Keras/TensorFlow input tensors for the model and x_out is a tensor of the model output.
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
propagate_model
(base_model)[source]¶ Propagates a trained model using personalised PageRank.
- Parameters
base_model (keras Model) – trained model with node features as input, predicted classes as output
- Returns
(x_inp, x_out), where x_inp is a list of two Keras input tensors for the APPNP model (containing node features and graph adjacency), and x_out is a Keras tensor for the APPNP model output.
- Return type
-
class
stellargraph.layer.
APPNPPropagationLayer
(*args, **kwargs)[source]¶ Implementation of Approximate Personalized Propagation of Neural Predictions (APPNP) as in https://arxiv.org/abs/1810.05997.
Notes
The inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are three inputs required: the propagated node features, the node features, and the normalized graph Laplacian matrix
This class assumes that the normalized Laplacian matrix is passed as input to the Keras methods.
See also
APPNP combines several of these layers.
- Parameters
units (int) – dimensionality of output feature vectors
final_layer (bool) – Deprecated, use tf.gather or GatherIndices
teleport_probability – “probability” of returning to the starting node in the propagation step as described in the paper
input_dim (int, optional) – the size of the input shape, if known.
kwargs – any additional arguments to pass to
tensorflow.keras.layers.Layer
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (node features and adjacency matrix)
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs (list) – a list of 3 input tensors that includes propagated node features (size 1 x N x F), node features (size 1 x N x F), graph adjacency matrix (size N x N), where N is the number of nodes in the graph, and F is the dimensionality of node features.
- Returns
Keras Tensor that represents the output of the layer.
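The roles of teleport_probability and approx_iter can be illustrated with the power iteration from the APPNP paper, where each step mixes the propagated features with the original ones. The sketch below is plain Python with hypothetical helper names (`appnp_step`, `appnp_propagate`); it omits the batch axis and is not the layer's actual code.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def appnp_step(propagated, features, norm_adj, teleport_probability):
    """One propagation step: (1 - alpha) * norm_adj @ propagated + alpha * features."""
    alpha = teleport_probability
    spread = matmul(norm_adj, propagated)
    return [
        [(1.0 - alpha) * s + alpha * f for s, f in zip(s_row, f_row)]
        for s_row, f_row in zip(spread, features)
    ]


def appnp_propagate(features, norm_adj, teleport_probability=0.1, approx_iter=10):
    """Apply approx_iter propagation steps, starting from the features themselves."""
    propagated = [row[:] for row in features]
    for _ in range(approx_iter):
        propagated = appnp_step(propagated, features, norm_adj, teleport_probability)
    return propagated


# Two connected nodes: with self-loops and symmetric normalisation every entry is 0.5.
norm_adj = [[0.5, 0.5], [0.5, 0.5]]
predictions = [[1.0, 0.0], [0.0, 1.0]]
result = appnp_propagate(predictions, norm_adj, teleport_probability=0.1, approx_iter=10)
once_more = appnp_step(result, predictions, norm_adj, 0.1)
```

Each call to `appnp_step` mirrors the three inputs listed for call() above: the propagated features, the original features, and the adjacency matrix. On this tiny graph the iteration reaches its fixed point almost immediately; larger graphs generally need more steps.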
GAT¶
-
class
stellargraph.layer.
GAT
(layer_sizes, generator=None, attn_heads=1, attn_heads_reduction=None, bias=True, in_dropout=0.0, attn_dropout=0.0, normalize=None, activations=None, saliency_map_support=False, multiplicity=1, num_nodes=None, num_features=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, bias_initializer='zeros', bias_regularizer=None, bias_constraint=None, attn_kernel_initializer='glorot_uniform', attn_kernel_regularizer=None, attn_kernel_constraint=None)[source]¶ A stack of Graph Attention (GAT) layers with aggregation of multiple attention heads, Eqs 5-6 of the GAT paper https://arxiv.org/abs/1710.10903
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using:
- the FullBatchNodeGenerator class for node inference
- the ClusterNodeGenerator class for scalable/inductive node inference using the Cluster-GCN training procedure (https://arxiv.org/abs/1905.07953)
- the FullBatchLinkGenerator class for link inference
To have the appropriate preprocessing the generator object should be instantiated with the method='gat' argument.
Examples
Creating a GAT node classification model from an existing StellarGraph object G:
    generator = FullBatchNodeGenerator(G, method="gat")
    gat = GAT(
        layer_sizes=[8, 4],
        activations=["elu", "softmax"],
        attn_heads=8,
        generator=generator,
        in_dropout=0.5,
        attn_dropout=0.5,
    )
    x_inp, predictions = gat.in_out_tensors()
Notes
The inputs are tensors with a batch dimension of 1. These are provided by the FullBatchNodeGenerator object.
This does not add self loops to the adjacency matrix, you should preprocess the adjacency matrix to add self-loops, using the method='gat' argument of the FullBatchNodeGenerator.
The nodes provided to the FullBatchNodeGenerator.flow() method are used by the final layer to select the predictions for those nodes in order. However, the intermediate layers before the final layer order the nodes in the same way as the adjacency matrix.
See also
Examples using GAT:
Appropriate data generators: FullBatchNodeGenerator, FullBatchLinkGenerator, ClusterNodeGenerator.
Related models:
- Other full-batch models: see the documentation of FullBatchNodeGenerator for a full list
- DeepGraphInfomax for unsupervised training
GraphAttention and GraphAttentionSparse are the base layers out of which a GAT model is built.
- Parameters
layer_sizes (list of int) – list of output sizes of GAT layers in the stack. The length of this list defines the number of GraphAttention layers in the stack.
generator (FullBatchNodeGenerator) – an instance of FullBatchNodeGenerator class constructed on the graph of interest
attn_heads (int or list of int) – number of attention heads in GraphAttention layers. The options are:
- a single integer: the passed value of attn_heads will be applied to all GraphAttention layers in the stack, except the last layer (for which the number of attn_heads will be set to 1).
- a list of integers: elements of the list define the number of attention heads in the corresponding layers in the stack.
attn_heads_reduction (list of str or None) – reductions applied to output features of each attention head, for all layers in the stack. Valid entries in the list are: concat, average. If None is passed, the default reductions are applied: concat reduction to all layers in the stack except the final layer, average reduction to the last layer (Eqs. 5-6 of the GAT paper).
bias (bool) – toggles an optional bias in GAT layers
in_dropout (float) – dropout rate applied to input features of each GAT layer
attn_dropout (float) – dropout rate applied to attention maps
normalize (str or None) – normalization applied to the final output features of the GAT layers stack. Default is None.
activations (list of str) – list of activations applied to each layer’s output; defaults to ['elu', ..., 'elu'].
saliency_map_support (bool) – If calculating saliency maps using the tools in stellargraph.interpretability.saliency_maps this should be True. Otherwise this should be False (default).
multiplicity (int, optional) – The number of nodes to process at a time. This is 1 for a node inference and 2 for link inference (currently no others are supported).
num_nodes (int, optional) – The number of nodes in the given graph.
num_features (int, optional) – The dimensions of the node features used as input to the model.
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
attn_kernel_initializer (str or func, optional) – The initialiser to use for the attention weights.
attn_kernel_regularizer (str or func, optional) – The regulariser to use for the attention weights.
attn_kernel_constraint (str or func, optional) – The constraint to use for the attention weights.
Note
The values for multiplicity, num_nodes, and num_features are obtained from the provided generator by default. The additional keyword arguments for these parameters provide an alternative way to specify them if a generator cannot be supplied.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
(multiplicity=None)[source]¶ Builds a GAT model for node or link prediction
- Returns
(x_inp, x_out), where x_inp is a list of Keras/TensorFlow input tensors for the model and x_out is a tensor of the model output.
- Return type
-
link_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
node_model
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
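The defaulting rules for attn_heads and attn_heads_reduction described in the GAT parameters can be summarised in a small helper. This is a sketch of the documented behaviour only; `resolve_attention_config` is a hypothetical name, and the output widths assume that concat multiplies a layer's size by its number of heads while average leaves it unchanged, as in Eqs. 5-6 of the GAT paper.

```python
def resolve_attention_config(layer_sizes, attn_heads=1, attn_heads_reduction=None):
    """Expand per-layer heads and reductions, and derive each layer's output width."""
    n_layers = len(layer_sizes)

    if isinstance(attn_heads, int):
        # A single integer applies to every layer except the last, which uses 1 head.
        heads = [attn_heads] * (n_layers - 1) + [1]
    else:
        heads = list(attn_heads)

    if attn_heads_reduction is None:
        # Default: concatenate heads everywhere except the final layer, which averages.
        reductions = ["concat"] * (n_layers - 1) + ["average"]
    else:
        reductions = list(attn_heads_reduction)

    if len(heads) != n_layers or len(reductions) != n_layers:
        raise ValueError("expected one entry per layer")
    if any(r not in ("concat", "average") for r in reductions):
        raise ValueError("reductions must be 'concat' or 'average'")

    output_dims = [
        size * h if r == "concat" else size
        for size, h, r in zip(layer_sizes, heads, reductions)
    ]
    return heads, reductions, output_dims


# Same settings as the GAT example: two layers with 8 heads requested.
heads, reductions, output_dims = resolve_attention_config([8, 4], attn_heads=8)
```

Under these assumptions the first layer emits 8 x 8 = 64 features per node and the final layer emits 4, matching the number of classes in that example.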
-
class
stellargraph.layer.
GraphAttention
(*args, **kwargs)[source]¶ Graph Attention (GAT) layer. The base implementation is taken from https://github.com/danielegrattarola/keras-gat, with some modifications added for ease of use.
Based on the original paper: Graph Attention Networks. P. Veličković et al. ICLR 2018 https://arxiv.org/abs/1710.10903
Notes
The inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are two inputs required, the node features, and the graph adjacency matrix
This does not add self loops to the adjacency matrix, you should preprocess the adjacency matrix to add self-loops
See also
GAT combines several of these layers, and GraphAttentionSparse supports a sparse adjacency matrix.
- Parameters
F_out (int) – dimensionality of output feature vectors
attn_heads (int or list of int) – number of attention heads
attn_heads_reduction (str) – reduction applied to output features of each attention head, concat or average. average should be applied in the final prediction layer of the model (Eq. 6 of the paper).
in_dropout_rate (float) – dropout rate applied to features
attn_dropout_rate (float) – dropout rate applied to attention coefficients
activation (str) – nonlinear activation applied to layer’s output to obtain output features (eq. 4 of the GAT paper)
final_layer (bool) – Deprecated, use tf.gather or GatherIndices
use_bias (bool) – toggles an optional bias
saliency_map_support (bool) – If calculating saliency maps using the tools in stellargraph.interpretability.saliency_maps this should be True. Otherwise this should be False (default).
kernel_initializer (str or func, optional) – The initialiser to use for the head weights.
kernel_regularizer (str or func, optional) – The regulariser to use for the head weights.
kernel_constraint (str or func, optional) – The constraint to use for the head weights.
bias_initializer (str or func, optional) – The initialiser to use for the head bias.
bias_regularizer (str or func, optional) – The regulariser to use for the head bias.
bias_constraint (str or func, optional) – The constraint to use for the head bias.
attn_kernel_initializer (str or func, optional) – The initialiser to use for the attention weights.
attn_kernel_regularizer (str or func, optional) – The regulariser to use for the attention weights.
attn_kernel_constraint (str or func, optional) – The constraint to use for the attention weights.
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (node features and adjacency matrix)
-
call
(inputs)[source]¶ Creates the layer as a Keras graph.
Note that the inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are two inputs required, the node features, and the graph adjacency matrix
Notes
This does not add self loops to the adjacency matrix.
- Parameters
inputs (list) – list of inputs with 2 items: node features and graph adjacency matrix, where N is the number of nodes in the graph, F is the dimensionality of node features, and M is the number of output nodes
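To show why self-loops matter, the sketch below computes single-head attention coefficients as described in the GAT paper: a LeakyReLU (slope 0.2) over pairwise scores, followed by a softmax restricted to each node's neighbours. It is plain Python with hypothetical names and toy values, not the layer's implementation, and it assumes every node has at least one neighbour.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def add_self_loops(adj):
    """Return a copy of the adjacency matrix with ones on the diagonal."""
    return [[1 if i == j else v for j, v in enumerate(row)] for i, row in enumerate(adj)]


def attention_coefficients(transformed, adj, attn_self, attn_neigh):
    """Softmax-normalised attention over each node's neighbours (single head).

    transformed: N x F_out node features after the linear map.
    adj: N x N adjacency; only non-zero entries receive attention.
    attn_self, attn_neigh: the two halves of the attention vector.
    """
    n = len(transformed)
    self_score = [sum(a * h for a, h in zip(attn_self, row)) for row in transformed]
    neigh_score = [sum(a * h for a, h in zip(attn_neigh, row)) for row in transformed]
    coeffs = []
    for i in range(n):
        neighbours = [j for j in range(n) if adj[i][j]]
        logits = [leaky_relu(self_score[i] + neigh_score[j]) for j in neighbours]
        peak = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        row = [0.0] * n
        for j, w in zip(neighbours, weights):
            row[j] = w / total
        coeffs.append(row)
    return coeffs


# Path graph 0 - 1 - 2 with one transformed feature per node.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
transformed = [[1.0], [2.0], [3.0]]
without_loops = attention_coefficients(transformed, adjacency, [0.5], [-0.5])
with_loops = attention_coefficients(transformed, add_self_loops(adjacency), [0.5], [-0.5])
```

Without self-loops a node places no attention on its own features, which is why the adjacency matrix should be preprocessed before it reaches the layer.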
-
class
stellargraph.layer.
GraphAttentionSparse
(*args, **kwargs)[source]¶ Graph Attention (GAT) layer. The base implementation is taken from https://github.com/danielegrattarola/keras-gat, with some modifications added for ease of use.
Based on the original paper: Graph Attention Networks. P. Veličković et al. ICLR 2018 https://arxiv.org/abs/1710.10903
Notes
The inputs are tensors with a batch dimension of 1: Keras requires this batch dimension, and for full-batch methods we only have a single “batch”.
There are three inputs required, the node features, the output indices (the nodes that are to be selected in the final layer), and the graph adjacency matrix
This does not add self loops to the adjacency matrix, you should preprocess the adjacency matrix to add self-loops
See also
GAT combines several of these layers, and GraphAttention supports a dense adjacency matrix.
- Parameters
F_out (int) – dimensionality of output feature vectors
attn_heads (int or list of int) – number of attention heads
attn_heads_reduction (str) – reduction applied to output features of each attention head, concat or average. average should be applied in the final prediction layer of the model (Eq. 6 of the paper).
in_dropout_rate (float) – dropout rate applied to features
attn_dropout_rate (float) – dropout rate applied to attention coefficients
activation (str) – nonlinear activation applied to layer’s output to obtain output features (eq. 4 of the GAT paper)
final_layer (bool) – Deprecated, use tf.gather or GatherIndices
use_bias (bool) – toggles an optional bias
saliency_map_support (bool) – If calculating saliency maps using the tools in stellargraph.interpretability.saliency_maps this should be True. Otherwise this should be False (default).
kernel_initializer (str or func, optional) – The initialiser to use for the head weights.
kernel_regularizer (str or func, optional) – The regulariser to use for the head weights.
kernel_constraint (str or func, optional) – The constraint to use for the head weights.
bias_initializer (str or func, optional) – The initialiser to use for the head bias.
bias_regularizer (str or func, optional) – The regulariser to use for the head bias.
bias_constraint (str or func, optional) – The constraint to use for the head bias.
attn_kernel_initializer (str or func, optional) – The initialiser to use for the attention weights.
attn_kernel_regularizer (str or func, optional) – The regulariser to use for the attention weights.
attn_kernel_constraint (str or func, optional) – The constraint to use for the attention weights.
-
call
(inputs, **kwargs)[source]¶ Creates the layer as a Keras graph
Notes
This does not add self loops to the adjacency matrix.
- Parameters
inputs (list) – list of inputs with 4 items: node features and sparse graph adjacency matrix, where N is the number of nodes in the graph, F is the dimensionality of node features, and M is the number of output nodes
Watch Your Step¶
Knowledge Graph models¶
-
class
stellargraph.layer.
ComplEx
(generator, embedding_dimension, embeddings_initializer='normal', embeddings_regularizer=None)[source]¶ Embedding layers and a ComplEx scoring layer that implement the ComplEx knowledge graph embedding algorithm as in http://jmlr.org/proceedings/papers/v48/trouillon16.pdf
See also
Example using ComplEx: link prediction
Related models: other knowledge graph models, see KGTripleGenerator for a full list.
Appropriate data generator: KGTripleGenerator.
- Parameters
generator (KGTripleGenerator) – A generator of triples to feed into the model.
embedding_dimension (int) – the dimension of the embedding (that is, a vector in C^embedding_dimension is learnt for each node and each link type)
embeddings_initializer (str or func, optional) – The initialiser to use for the embeddings (the default of random normal values matches the paper’s reference implementation).
embeddings_regularizer (str or func, optional) – The regularizer to use for the embeddings.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
embedding_arrays
()¶ Retrieve each separate set of embeddings for nodes/entities and edge types/relations in this model.
- Returns
the first element contains the embeddings for nodes/entities (for each element, shape = number of nodes × k), the second element contains the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension for each layer. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of lists of numpy arrays
-
embeddings
()¶ Retrieve the embeddings for nodes/entities and edge types/relations in this model, if there’s only one set of embeddings for each of nodes and edge types.
- Returns
the first element is the embeddings for nodes/entities (shape = number of nodes × k), the second element is the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of numpy arrays
-
in_out_tensors
()¶ Builds a knowledge graph model.
- Returns
A tuple of (list of input tensors, tensor for ComplEx model score outputs)
-
rank_edges_against_all_nodes
(test_data, known_edges_graph, tie_breaking='random')¶ Returns the ranks of the true edges in test_data, when scored against all other similar edges.
For each input edge E = (s, r, o), the score of the modified-object edge (s, r, n) is computed for every node n in the graph, and similarly the score of the modified-subject edge (n, r, o).
This computes “raw” and “filtered” ranks:
- raw
The score of each edge is ranked against all of the modified-object and modified-subject ones, for instance, if E = ("a", "X", "b") has score 3.14, and only one modified-object edge has a higher score (e.g. F = ("a", "X", "c")), then the raw modified-object rank for E will be 2; if all of the (n, "X", "b") edges have score less than 3.14, then the raw modified-subject rank for E will be 1.
- filtered
The score of each edge is ranked against only the unknown modified-object and modified-subject edges. An edge is considered known if it is in known_edges_graph which should typically hold every edge in the dataset (that is everything from the train, test and validation sets, if the data has been split). For instance, continuing the raw example, if the higher-scoring edge F is in the graph, then it will be ignored, giving a filtered modified-object rank for E of 1. (If F was not in the graph, the filtered modified-object rank would be 2.)
- Parameters
test_data – the output of
KGTripleGenerator.flow()
on some test triplesknown_edges_graph (StellarGraph) – a graph instance containing all known edges/triples
tie_breaking ('random', 'top' or 'bottom') – How to rank true edges that tie with modified-object or modified-subject ones, see Sun et al. “A Re-evaluation of Knowledge Graph Completion Methods”
- Returns
A numpy array of integer raw ranks. It has shape
N × 2
, where N is the number of test triples intest_data
; the first column (array[:, 0]
) holds the modified-object ranks, and the second (array[:, 1]
) holds the modified-subject ranks.
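The raw and filtered definitions can be illustrated with a small plain-Python sketch that ranks one true edge against its modified-object candidates. The `scores` dictionary, the `known` set and the `modified_object_ranks` helper are hypothetical stand-ins for illustration only, not part of the StellarGraph API, and ties are ignored for simplicity.

```python
def modified_object_ranks(true_edge, scores, known):
    """Return (raw, filtered) modified-object ranks for one true edge.

    scores: dict mapping (s, r, o) triples to model scores (hypothetical data).
    known: set of triples that are treated as known edges.
    """
    s, r, _ = true_edge
    true_score = scores[true_edge]
    # Candidates are the edges (s, r, n) for every other node n.
    higher = [
        edge
        for edge, score in scores.items()
        if edge[0] == s and edge[1] == r and edge != true_edge and score > true_score
    ]
    raw = 1 + len(higher)
    # The filtered rank ignores higher-scoring candidates that are known edges.
    filtered = 1 + sum(1 for edge in higher if edge not in known)
    return raw, filtered


E = ("a", "X", "b")  # the true edge
F = ("a", "X", "c")  # the only higher-scoring modified-object edge
scores = {E: 3.14, F: 4.0, ("a", "X", "d"): 1.0}

ranks_f_known = modified_object_ranks(E, scores, known={E, F})
ranks_f_unknown = modified_object_ranks(E, scores, known={E})
print(ranks_f_known)    # (2, 1)
print(ranks_f_unknown)  # (2, 2)
```

This reproduces the worked example: the raw rank of E is 2 in both cases, while the filtered rank drops to 1 only when F is a known edge.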
-
class
stellargraph.layer.
ComplExScore
(*args, **kwargs)[source]¶ ComplEx scoring Keras layer.
Original Paper: Complex Embeddings for Simple Link Prediction, Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier and Guillaume Bouchard, ICML 2016. http://jmlr.org/proceedings/papers/v48/trouillon16.pdf
This combines subject, relation and object embeddings into a score of the likelihood of the link.
-
build
(input_shape)[source]¶ Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.
This is typically used to create the weights of Layer subclasses.
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
-
bulk_scoring
(node_embs, node_embs_conj, s_embs, r_embs, o_embs)[source]¶ Compute a batch of modified-object and modified-subject scores for ranking.
- Parameters
node_embs – num_nodes × k array of all node embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
extra_data – the return value of bulk_scoring_data()
s_embs – batch_size × k embeddings for the true source nodes
r_embs – batch_size × k embeddings for the true edge types/relations
o_embs – batch_size × k embeddings for the true object nodes
- Returns
This should return a pair of NumPy arrays of shape
num_nodes × batch_size
. The first array contains scores of the modified-object edges, and the second contains scores of the modified-subject edges.
-
bulk_scoring_data
(node_embs, edge_type_embs)[source]¶ Pre-compute some data for bulk ranking, if any such data would be helpful.
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs – a list of 6 tensors (shape = batch size × 1 × embedding dimension k), where the three consecutive pairs represent real and imaginary parts of the subject, relation and object embeddings, respectively, that is, inputs == [Re(subject), Im(subject), Re(relation), ...]
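For intuition, the scoring function from the ComplEx paper, Re(Σ_k s_k · r_k · conj(o_k)), can be sketched in plain Python for a single triple. This is an illustration of the paper's formula under the assumption that the six inputs are ordered as real and imaginary pairs for subject, relation and object; `complex_score` is a hypothetical helper, and the Keras layer itself works on batched tensors.

```python
def complex_score(s_re, s_im, r_re, r_im, o_re, o_im):
    """Re(sum_k s_k * r_k * conj(o_k)), with inputs given as real and imaginary parts."""
    total = 0j
    for sr, si, rr, ri, o_r, oi in zip(s_re, s_im, r_re, r_im, o_re, o_im):
        s, r, o = complex(sr, si), complex(rr, ri), complex(o_r, oi)
        total += s * r * o.conjugate()
    return total.real


# Hypothetical 2-dimensional embeddings, in the assumed input order:
# [Re(subject), Im(subject), Re(relation), Im(relation), Re(object), Im(object)]
inputs = [[1.0, 0.0], [2.0, 1.0], [2.0, 1.0], [0.0, -1.0], [1.0, 1.0], [1.0, 0.0]]
score = complex_score(*inputs)
print(score)  # 7.0
```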
-
embeddings
(num_nodes, num_edge_types, dimension, initializer, regularizer)[source]¶ Create appropriate embedding layer(s) for this scoring.
- Parameters
num_nodes – the number of nodes in this graph.
num_edge_types – the number of edge types/relations in this graph.
dimension – the requested embedding dimension, for whatever that means for this scoring.
initializer – the initializer to use for embeddings, when required.
regularizer – the regularizer to use for embeddings, when required.
- Returns
A pair of lists of
tensorflow.keras.layers.Embedding
layers, corresponding to nodes and edge types.
-
embeddings_to_numpy
(node_embs, edge_type_embs)[source]¶ Convert raw embedding NumPy arrays into “semantic” embeddings, such as complex numbers instead of interleaved real numbers.
- Parameters
node_embs – num_nodes × k array of all node embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
edge_type_embs – num_edge_type × k array of all edge type/relation embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
- Returns
Model-specific NumPy arrays corresponding to some useful view of the embeddings vectors.
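As a rough sketch of what a “semantic” view can mean, the snippet below turns rows of interleaved real numbers into complex numbers. The interleaved layout `[re_0, im_0, re_1, im_1, ...]` and the `interleaved_to_complex` helper are assumptions made for illustration; the layout the library actually stores may differ.

```python
def interleaved_to_complex(row):
    """Convert [re_0, im_0, re_1, im_1, ...] into a list of complex numbers."""
    if len(row) % 2 != 0:
        raise ValueError("expected an even number of values")
    return [complex(row[i], row[i + 1]) for i in range(0, len(row), 2)]


# Hypothetical raw embeddings: 2 nodes, 4 real values each (2 complex dimensions).
raw = [[1.0, 2.0, 3.0, -1.0], [0.0, 0.5, -2.0, 4.0]]
semantic = [interleaved_to_complex(row) for row in raw]
print(semantic[0])  # [(1+2j), (3-1j)]
```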
-
-
class
stellargraph.layer.
DistMult
(generator, embedding_dimension, embeddings_initializer='uniform', embeddings_regularizer=None)[source]¶ Embedding layers and a DistMult scoring layer that implement the DistMult knowledge graph embedding algorithm as in https://arxiv.org/pdf/1412.6575.pdf
See also
Example using DistMult: link prediction
Related models: other knowledge graph models, see KGTripleGenerator for a full list.
Appropriate data generator: KGTripleGenerator.
- Parameters
generator (KGTripleGenerator) – A generator of triples to feed into the model.
embedding_dimension (int) – the dimension of the embedding (that is, a vector in R^embedding_dimension is learnt for each node and each link type)
embeddings_initializer (str or func, optional) – The initialiser to use for the embeddings.
embeddings_regularizer (str or func, optional) – The regularizer to use for the embeddings.
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
embedding_arrays
()¶ Retrieve each separate set of embeddings for nodes/entities and edge types/relations in this model.
- Returns
the first element contains the embeddings for nodes/entities (for each element, shape = number of nodes × k), the second element contains the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension for each layer. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of lists of numpy arrays
-
embeddings
()¶ Retrieve the embeddings for nodes/entities and edge types/relations in this model, if there’s only one set of embeddings for each of nodes and edge types.
- Returns
the first element is the embeddings for nodes/entities (shape = number of nodes × k), the second element is the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of numpy arrays
-
in_out_tensors
()¶ Builds a knowledge graph model.
- Returns
A tuple of (list of input tensors, tensor for the model’s score outputs)
-
rank_edges_against_all_nodes
(test_data, known_edges_graph, tie_breaking='random')¶ Returns the ranks of the true edges in test_data, when scored against all other similar edges.
For each input edge E = (s, r, o), the score of the modified-object edge (s, r, n) is computed for every node n in the graph, and similarly the score of the modified-subject edge (n, r, o).
This computes “raw” and “filtered” ranks:
- raw
The score of each edge is ranked against all of the modified-object and modified-subject ones. For instance, if E = ("a", "X", "b") has score 3.14, and only one modified-object edge has a higher score (e.g. F = ("a", "X", "c")), then the raw modified-object rank for E will be 2; if all of the (n, "X", "b") edges have score less than 3.14, then the raw modified-subject rank for E will be 1.
- filtered
The score of each edge is ranked against only the unknown modified-object and modified-subject edges. An edge is considered known if it is in known_edges_graph, which should typically hold every edge in the dataset (that is, everything from the train, test and validation sets, if the data has been split). For instance, continuing the raw example, if the higher-scoring edge F is in the graph, then it will be ignored, giving a filtered modified-object rank for E of 1. (If F was not in the graph, the filtered modified-object rank would be 2.)
- Parameters
test_data – the output of KGTripleGenerator.flow() on some test triples
known_edges_graph (StellarGraph) – a graph instance containing all known edges/triples
tie_breaking ('random', 'top' or 'bottom') – How to rank true edges that tie with modified-object or modified-subject ones, see Sun et al. “A Re-evaluation of Knowledge Graph Completion Methods”
- Returns
A numpy array of integer raw ranks. It has shape N × 2, where N is the number of test triples in test_data; the first column (array[:, 0]) holds the modified-object ranks, and the second (array[:, 1]) holds the modified-subject ranks.
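The effect of tie_breaking can be sketched as follows. This is one plausible reading of the three options in the spirit of Sun et al., not a description of the library's internals: 'top' places the true edge above every tied candidate, 'bottom' places it below them, and 'random' picks uniformly between the two. The `rank_with_ties` helper is hypothetical.

```python
import random


def rank_with_ties(true_score, other_scores, tie_breaking="random", rng=random):
    """Rank a true edge's score against candidate scores, handling exact ties."""
    greater = sum(1 for x in other_scores if x > true_score)
    ties = sum(1 for x in other_scores if x == true_score)
    top = 1 + greater      # true edge placed above all tied candidates
    bottom = top + ties    # true edge placed below all tied candidates
    if tie_breaking == "top":
        return top
    if tie_breaking == "bottom":
        return bottom
    if tie_breaking == "random":
        return rng.randint(top, bottom)  # inclusive on both ends
    raise ValueError(f"unknown tie_breaking: {tie_breaking!r}")


others = [4.0, 3.14, 3.14, 1.0]  # one higher score, two exact ties
top_rank = rank_with_ties(3.14, others, "top")
bottom_rank = rank_with_ties(3.14, others, "bottom")
random_rank = rank_with_ties(3.14, others, "random")
print(top_rank, bottom_rank)  # 2 4
```

A model that assigns the same score to many edges looks strong under 'top' and weak under 'bottom', which is why the choice matters when comparing reported ranks.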
-
class
stellargraph.layer.
DistMultScore
(*args, **kwargs)[source]¶ DistMult scoring Keras layer.
Original Paper: Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015
This combines subject, relation and object embeddings into a score of the likelihood of the link.
-
build
(input_shape)[source]¶ Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.
This is typically used to create the weights of Layer subclasses.
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
-
bulk_scoring
(all_n_embs, _extra_data, s_embs, r_embs, o_embs)[source]¶ Compute a batch of modified-object and modified-subject scores for ranking.
- Parameters
node_embs – num_nodes × k array of all node embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
extra_data – the return value of bulk_scoring_data()
s_embs – batch_size × k embeddings for the true source nodes
r_embs – batch_size × k embeddings for the true edge types/relations
o_embs – batch_size × k embeddings for the true object nodes
- Returns
This should return a pair of NumPy arrays of shape
num_nodes × batch_size
. The first array contains scores of the modified-object edges, and the second contains scores of the modified-subject edges.
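To make the output shapes concrete, here is a plain-Python sketch of bulk scoring with the DistMult formula from the paper, Σ_k s_k · r_k · o_k. The `distmult_score` and `bulk_scores` helpers and all data are hypothetical; the real method operates on NumPy arrays and is not reproduced here.

```python
def distmult_score(s, r, o):
    """DistMult score of one triple: sum_k s_k * r_k * o_k."""
    return sum(a * b * c for a, b, c in zip(s, r, o))


def bulk_scores(all_node_embs, s_embs, r_embs, o_embs):
    """Return (modified_object, modified_subject), each num_nodes x batch_size."""
    # Row n, column b: score of (s_b, r_b, n), i.e. the object replaced by node n.
    mod_o = [[distmult_score(s, r, n) for s, r in zip(s_embs, r_embs)] for n in all_node_embs]
    # Row n, column b: score of (n, r_b, o_b), i.e. the subject replaced by node n.
    mod_s = [[distmult_score(n, r, o) for r, o in zip(r_embs, o_embs)] for n in all_node_embs]
    return mod_o, mod_s


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 nodes, k = 2
s_embs = [[1.0, 0.0], [1.0, 1.0]]             # batch of 2 true subjects
r_embs = [[2.0, 3.0], [1.0, 1.0]]
o_embs = [[0.0, 1.0], [1.0, 0.0]]
mod_o, mod_s = bulk_scores(nodes, s_embs, r_embs, o_embs)
print(mod_o)  # [[2.0, 1.0], [0.0, 1.0], [2.0, 2.0]]
print(mod_s)  # [[0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
```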
-
call
(inputs)[source]¶ Applies the layer.
- Parameters
inputs – a list of 3 tensors (shape = batch size × 1 × embedding dimension), representing the subject, relation and object embeddings, respectively, that is, inputs == [subject, relation, object]
-
embeddings
(num_nodes, num_edge_types, dimension, initializer, regularizer)[source]¶ Create appropriate embedding layer(s) for this scoring.
- Parameters
num_nodes – the number of nodes in this graph.
num_edge_types – the number of edge types/relations in this graph.
dimension – the requested embedding dimension, for whatever that means for this scoring.
initializer – the initializer to use for embeddings, when required.
regularizer – the regularizer to use for embeddings, when required.
- Returns
A pair of lists of
tensorflow.keras.layers.Embedding
layers, corresponding to nodes and edge types.
-
-
class
stellargraph.layer.
RotatE
(**kwargs)[source]¶ Warning
RotatE is experimental: demo and documentation are missing (see: #1549, #1550). It may be difficult to use and may have major changes at any time.
Implementation of https://arxiv.org/abs/1902.10197
See also
Related models: other knowledge graph models, see KGTripleGenerator for a full list.
Appropriate data generator: KGTripleGenerator.
-
embedding_arrays
()¶ Retrieve each separate set of embeddings for nodes/entities and edge types/relations in this model.
- Returns
the first element contains the embeddings for nodes/entities (for each element, shape = number of nodes × k), the second element contains the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension for each layer. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of lists of numpy arrays
-
embeddings
()¶ Retrieve the embeddings for nodes/entities and edge types/relations in this model, if there’s only one set of embeddings for each of nodes and edge types.
- Returns
the first element is the embeddings for nodes/entities (shape = number of nodes × k), the second element is the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of numpy arrays
-
in_out_tensors
()¶ Builds a knowledge graph model.
- Returns
A tuple of (list of input tensors, tensor for the model’s score outputs)
-
rank_edges_against_all_nodes
(test_data, known_edges_graph, tie_breaking='random')¶ Returns the ranks of the true edges in test_data, when scored against all other similar edges.
For each input edge E = (s, r, o), the score of the modified-object edge (s, r, n) is computed for every node n in the graph, and similarly the score of the modified-subject edge (n, r, o).
This computes “raw” and “filtered” ranks:
- raw
The score of each edge is ranked against all of the modified-object and modified-subject ones. For instance, if E = ("a", "X", "b") has score 3.14, and only one modified-object edge has a higher score (e.g. F = ("a", "X", "c")), then the raw modified-object rank for E will be 2; if all of the (n, "X", "b") edges have score less than 3.14, then the raw modified-subject rank for E will be 1.
- filtered
The score of each edge is ranked against only the unknown modified-object and modified-subject edges. An edge is considered known if it is in known_edges_graph, which should typically hold every edge in the dataset (that is, everything from the train, test and validation sets, if the data has been split). For instance, continuing the raw example, if the higher-scoring edge F is in the graph, then it will be ignored, giving a filtered modified-object rank for E of 1. (If F was not in the graph, the filtered modified-object rank would be 2.)
- Parameters
test_data – the output of KGTripleGenerator.flow() on some test triples
known_edges_graph (StellarGraph) – a graph instance containing all known edges/triples
tie_breaking ('random', 'top' or 'bottom') – How to rank true edges that tie with modified-object or modified-subject ones, see Sun et al. “A Re-evaluation of Knowledge Graph Completion Methods”
- Returns
A numpy array of integer raw ranks. It has shape N × 2, where N is the number of test triples in test_data; the first column (array[:, 0]) holds the modified-object ranks, and the second (array[:, 1]) holds the modified-subject ranks.
-
-
class
stellargraph.layer.
RotatEScore
(*args, **kwargs)[source]¶ -
bulk_scoring
(all_n_embs, _extra_data, s_embs, r_embs, o_embs)[source]¶ Compute a batch of modified-object and modified-subject scores for ranking.
- Parameters
node_embs – num_nodes × k array of all node embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
extra_data – the return value of bulk_scoring_data()
s_embs – batch_size × k embeddings for the true source nodes
r_embs – batch_size × k embeddings for the true edge types/relations
o_embs – batch_size × k embeddings for the true object nodes
- Returns
This should return a pair of NumPy arrays of shape
num_nodes × batch_size
. The first array contains scores of the modified-object edges, and the second contains scores of the modified-subject edges.
-
call
(inputs)[source]¶ This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
-
embeddings
(num_nodes, num_edge_types, dimension, initializer, regularizer)[source]¶ Create appropriate embedding layer(s) for this scoring.
- Parameters
num_nodes – the number of nodes in this graph.
num_edge_types – the number of edge types/relations in this graph.
dimension – the requested embedding dimension, for whatever that means for this scoring.
initializer – the initializer to use for embeddings, when required.
regularizer – the regularizer to use for embeddings, when required.
- Returns
A pair of lists of
tensorflow.keras.layers.Embedding
layers, corresponding to nodes and edge types.
-
embeddings_to_numpy
(node_embs, edge_type_embs)[source]¶ Convert raw embedding NumPy arrays into “semantic” embeddings, such as complex numbers instead of interleaved real numbers.
- Parameters
node_embs – num_nodes × k array of all node embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
edge_type_embs – num_edge_type × k array of all edge type/relation embeddings, where k is the size of the embeddings returned by embeddings_to_numpy().
- Returns
Model-specific NumPy arrays corresponding to some useful view of the embeddings vectors.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
- Returns
Python dictionary.
-
-
class
stellargraph.layer.
RotE
(**kwargs)[source]¶ Warning
RotE is experimental: demo is missing (see: #1664). It may be difficult to use and may have major changes at any time.
Embedding layers and a RotE scoring layer that implement the RotE knowledge graph embedding algorithm as in https://arxiv.org/pdf/2005.00545.pdf
See also
Related models:
other knowledge graph models, see KGTripleGenerator for a full list
RotH for the hyperbolic version of this Euclidean model
Appropriate data generator: KGTripleGenerator.
- Parameters
generator (KGTripleGenerator) – A generator of triples to feed into the model.
embedding_dimension (int) – the dimension of the embeddings (that is, a vector in R^embedding_dimension plus a bias in R is learnt for each node, along with a pair of vectors in R^embedding_dimension and R^(embedding_dimension / 2) for each node type). It must be even.
embeddings_initializer (str or func, optional) – The initialiser to use for the embeddings.
embeddings_regularizer (str or func, optional) – The regularizer to use for the embeddings.
-
embedding_arrays
()¶ Retrieve each separate set of embeddings for nodes/entities and edge types/relations in this model.
- Returns
the first element contains the embeddings for nodes/entities (for each element, shape = number of nodes × k), the second element contains the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension for each layer. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of lists of numpy arrays
-
embeddings
()¶ Retrieve the embeddings for nodes/entities and edge types/relations in this model, if there’s only one set of embeddings for each of nodes and edge types.
- Returns
the first element is the embeddings for nodes/entities (shape = number of nodes × k), the second element is the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of numpy arrays
-
in_out_tensors
()¶ Builds a knowledge graph model.
- Returns
A tuple of (list of input tensors, tensor for the model’s score outputs)
-
rank_edges_against_all_nodes
(test_data, known_edges_graph, tie_breaking='random')¶ Returns the ranks of the true edges in test_data, when scored against all other similar edges.
For each input edge E = (s, r, o), the score of the modified-object edge (s, r, n) is computed for every node n in the graph, and similarly the score of the modified-subject edge (n, r, o).
This computes “raw” and “filtered” ranks:
- raw
The score of each edge is ranked against all of the modified-object and modified-subject ones. For instance, if E = ("a", "X", "b") has score 3.14, and only one modified-object edge has a higher score (e.g. F = ("a", "X", "c")), then the raw modified-object rank for E will be 2; if all of the (n, "X", "b") edges have score less than 3.14, then the raw modified-subject rank for E will be 1.
- filtered
The score of each edge is ranked against only the unknown modified-object and modified-subject edges. An edge is considered known if it is in known_edges_graph, which should typically hold every edge in the dataset (that is, everything from the train, test and validation sets, if the data has been split). For instance, continuing the raw example, if the higher-scoring edge F is in the graph, then it will be ignored, giving a filtered modified-object rank for E of 1. (If F was not in the graph, the filtered modified-object rank would be 2.)
- Parameters
test_data – the output of KGTripleGenerator.flow() on some test triples
known_edges_graph (StellarGraph) – a graph instance containing all known edges/triples
tie_breaking ('random', 'top' or 'bottom') – How to rank true edges that tie with modified-object or modified-subject ones, see Sun et al. “A Re-evaluation of Knowledge Graph Completion Methods”
- Returns
A numpy array of integer raw ranks. It has shape N × 2, where N is the number of test triples in test_data; the first column (array[:, 0]) holds the modified-object ranks, and the second (array[:, 1]) holds the modified-subject ranks.
-
class
stellargraph.layer.
RotH
(**kwargs)[source]¶ Warning
RotH is experimental: demo is missing (see: #1664). It may be difficult to use and may have major changes at any time.
Embedding layers and a RotH scoring layer that implement the RotH knowledge graph embedding algorithm as in https://arxiv.org/abs/2005.00545
See also
Related models:
other knowledge graph models, see KGTripleGenerator for a full list
RotE for the Euclidean version of this hyperbolic model
Appropriate data generator: KGTripleGenerator.
- Parameters
generator (KGTripleGenerator) – A generator of triples to feed into the model.
embedding_dimension (int) – the dimension of the embeddings (that is, a vector in R^embedding_dimension plus a bias in R is learnt for each node, along with a pair of vectors in R^embedding_dimension and R^(embedding_dimension / 2) for each node type). It must be even.
embeddings_initializer (str or func, optional) – The initialiser to use for the embeddings.
embeddings_regularizer (str or func, optional) – The regularizer to use for the embeddings.
-
embedding_arrays
()¶ Retrieve each separate set of embeddings for nodes/entities and edge types/relations in this model.
- Returns
the first element contains the embeddings for nodes/entities (for each element, shape = number of nodes × k), the second element contains the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension for each layer. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of lists of numpy arrays
-
embeddings
()¶ Retrieve the embeddings for nodes/entities and edge types/relations in this model, if there’s only one set of embeddings for each of nodes and edge types.
- Returns
the first element is the embeddings for nodes/entities (shape = number of nodes × k), the second element is the embeddings for edge types/relations (shape = number of edge types × k), where k is some notion of the embedding dimension. The type of the embeddings depends on the specific scoring function chosen.
- Return type
A tuple of numpy arrays
-
in_out_tensors
()¶ Builds a knowledge graph model.
- Returns
A tuple of (list of input tensors, tensor for the model’s score outputs)
-
rank_edges_against_all_nodes
(test_data, known_edges_graph, tie_breaking='random')¶ Returns the ranks of the true edges in test_data, when scored against all other similar edges.
For each input edge E = (s, r, o), the score of the modified-object edge (s, r, n) is computed for every node n in the graph, and similarly the score of the modified-subject edge (n, r, o).
This computes “raw” and “filtered” ranks:
- raw
The score of each edge is ranked against all of the modified-object and modified-subject ones. For instance, if E = ("a", "X", "b") has score 3.14, and only one modified-object edge has a higher score (e.g. F = ("a", "X", "c")), then the raw modified-object rank for E will be 2; if all of the (n, "X", "b") edges have score less than 3.14, then the raw modified-subject rank for E will be 1.
- filtered
The score of each edge is ranked against only the unknown modified-object and modified-subject edges. An edge is considered known if it is in known_edges_graph, which should typically hold every edge in the dataset (that is, everything from the train, test and validation sets, if the data has been split). For instance, continuing the raw example, if the higher-scoring edge F is in the graph, then it will be ignored, giving a filtered modified-object rank for E of 1. (If F was not in the graph, the filtered modified-object rank would be 2.)
- Parameters
test_data – the output of KGTripleGenerator.flow() on some test triples
known_edges_graph (StellarGraph) – a graph instance containing all known edges/triples
tie_breaking ('random', 'top' or 'bottom') – How to rank true edges that tie with modified-object or modified-subject ones, see Sun et al. “A Re-evaluation of Knowledge Graph Completion Methods”
- Returns
A numpy array of integer raw ranks. It has shape N × 2, where N is the number of test triples in test_data; the first column (array[:, 0]) holds the modified-object ranks, and the second (array[:, 1]) holds the modified-subject ranks.
GCN Supervised Graph Classification¶
-
class
stellargraph.layer.
GCNSupervisedGraphClassification
(layer_sizes, activations, generator, bias=True, dropout=0.0, pooling=None, pool_all_layers=False, kernel_initializer=None, kernel_regularizer=None, kernel_constraint=None, bias_initializer=None, bias_regularizer=None, bias_constraint=None)[source]¶ A stack of
GraphConvolution
layers together with a Keras GlobalAveragePooling1D layer (by default) that implement a supervised graph classification network using the GCN convolution operator (https://arxiv.org/abs/1609.02907).
The model minimally requires specification of the GCN layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, and a generator object.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using the PaddedGraphGenerator class.
Examples
Creating a graph classification model from a list of StellarGraph objects (graphs). We also add two fully connected dense layers, using the last one for binary classification with softmax activation:
    from stellargraph.layer import GCNSupervisedGraphClassification
    from stellargraph.mapper import PaddedGraphGenerator
    from tensorflow.keras.layers import Dense

    # graphs: a list of StellarGraph objects
    generator = PaddedGraphGenerator(graphs)
    model = GCNSupervisedGraphClassification(
        layer_sizes=[32, 32],
        activations=["elu", "elu"],
        generator=generator,
        dropout=0.5,
    )
    x_inp, x_out = model.in_out_tensors()
    predictions = Dense(units=8, activation="relu")(x_out)
    predictions = Dense(units=2, activation="softmax")(predictions)
See also
Examples using GCN graph classification:
Appropriate data generator: PaddedGraphGenerator.
Related models:
DeepGraphCNN for a specialisation using SortPooling
GCN for predictions for individual nodes or links
- Parameters
layer_sizes (list of int) – list of output sizes of the graph GCN layers in the stack.
activations (list of str) – list of activations applied to each GCN layer’s output.
generator (PaddedGraphGenerator) – an instance of PaddedGraphGenerator class constructed on the graphs used for training.
bias (bool, optional) – toggles an optional bias in graph convolutional layers.
dropout (float, optional) – dropout rate applied to input features of each GCN layer.
pooling (callable, optional) – a Keras layer or function that takes two arguments and returns a tensor representing the embeddings for each graph in the batch. Arguments:
embeddings tensor argument with shape batch size × nodes × output size, where nodes is the maximum number of nodes of a graph in the batch and output size is the size of the final graph convolutional layer, or, if pool_all_layers, the sum of the sizes of each graph convolutional layer.
mask tensor named argument of booleans with shape batch size × nodes. True values indicate which rows of the embeddings argument are valid, and all other rows (corresponding to mask == False) must be ignored.
The returned tensor can have any shape batch size, batch size × N1, batch size × N1 × N2, …, as long as the N1, N2, … are constant across all graphs: they must not depend on the nodes dimension or on the number of True values in mask.
pooling defaults to mean pooling via GlobalAveragePooling1D.
pool_all_layers (bool, optional) – which layers to pass to the pooling method: if True, pass the concatenation of the output of every GCN layer, otherwise pass only the output of the last GCN layer.
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each graph convolutional layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each graph convolutional layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each graph convolutional layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each graph convolutional layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each graph convolutional layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each graph convolutional layer.
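The contract for the pooling argument (take embeddings and a mask, ignore masked rows, return a per-graph result whose shape does not depend on the number of nodes) can be illustrated with masked mean pooling over nested lists. This is a plain-Python sketch of the semantics only; `masked_mean_pool` is a hypothetical helper, and a real pooling argument would be a Keras layer or function operating on tensors.

```python
def masked_mean_pool(embeddings, mask):
    """Mean over valid rows only.

    embeddings: batch size x nodes x output size nested lists.
    mask: batch size x nodes booleans; False rows are padding and are ignored.
    Returns batch size x output size nested lists.
    """
    pooled = []
    for graph_rows, graph_mask in zip(embeddings, mask):
        size = len(graph_rows[0])
        valid = [row for row, keep in zip(graph_rows, graph_mask) if keep]
        if not valid:
            # No valid nodes: fall back to zeros rather than dividing by zero.
            pooled.append([0.0] * size)
            continue
        pooled.append([sum(row[j] for row in valid) / len(valid) for j in range(size)])
    return pooled


embeddings = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third row is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[True, True, False], [True, True, True]]
pooled = masked_mean_pool(embeddings, mask)
print(pooled)  # [[2.0, 3.0], [4.0, 2.0]]
```

The padding row `[9.0, 9.0]` does not affect the first graph's result, and the output has one fixed-size row per graph regardless of how many nodes each graph has.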
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
in_out_tensors
()[source]¶ Builds a Graph Classification model.
- Returns
(x_inp, x_out), where x_inp is a list of two input tensors for the Graph Classification model (containing node features and normalized adjacency matrix), and x_out is a tensor for the Graph Classification model output.
- Return type
tuple
Deep Graph Convolutional Neural Network¶
-
class
stellargraph.layer.
SortPooling
(*args, **kwargs)[source]¶ Sort Pooling Keras layer.
Note that sorting is performed using only the last column of the input tensor as stated in [1], “For convenience, we set the last graph convolution to have one channel and only used this single channel for sorting.”
[1] An End-to-End Deep Learning Architecture for Graph Classification, M. Zhang, Z. Cui, M. Neumann, and Y. Chen, AAAI-18, https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/17146
See also
The DeepGraphCNN model uses this class for graph classification.
- Parameters
-
call
(embeddings, mask)[source]¶ Applies the layer.
- Parameters
embeddings (tensor) – the node features (size B x N x Sum F_i) where B is the batch size, N is the number of nodes in the largest graph in the batch, and F_i is the dimensionality of node features output from the i-th convolutional layer.
mask (tensor) – a boolean mask (size B x N)
- Returns
Keras Tensor that represents the output of the layer.
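The idea behind sort pooling can be sketched in plain Python: within each graph, valid rows are ordered by their last column, then truncated or zero-padded to a fixed number of rows k. The descending order follows the DGCNN paper; the `sort_pool` helper, the parameter name `k` and the zero padding are assumptions for illustration, not a description of the layer's exact behaviour.

```python
def sort_pool(embeddings, mask, k):
    """Sort each graph's valid rows by last column (descending), keep exactly k rows."""
    pooled = []
    for graph_rows, graph_mask in zip(embeddings, mask):
        width = len(graph_rows[0])
        valid = [row for row, keep in zip(graph_rows, graph_mask) if keep]
        ordered = sorted(valid, key=lambda row: row[-1], reverse=True)[:k]
        # Pad with zero rows when the graph has fewer than k valid nodes.
        padding = [[0.0] * width for _ in range(k - len(ordered))]
        pooled.append(ordered + padding)
    return pooled


embeddings = [[[0.1, 0.5], [0.2, 0.9], [7.0, 7.0]]]  # B = 1, N = 3, last row is padding
mask = [[True, True, False]]
padded = sort_pool(embeddings, mask, k=3)
truncated = sort_pool(embeddings, mask, k=1)
print(padded)     # [[[0.2, 0.9], [0.1, 0.5], [0.0, 0.0]]]
print(truncated)  # [[[0.2, 0.9]]]
```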
-
class
stellargraph.layer.
DeepGraphCNN
(layer_sizes, activations, k, generator, bias=True, dropout=0.0, kernel_initializer=None, kernel_regularizer=None, kernel_constraint=None, bias_initializer=None, bias_regularizer=None, bias_constraint=None)[source]¶ A stack of
GraphConvolution
layers together with a SortPooling layer that implement a supervised graph classification network (DGCNN) using the GCN convolution operator (https://arxiv.org/abs/1609.02907).The DGCNN model was introduced in the paper, “An End-to-End Deep Learning Architecture for Graph Classification” by M. Zhang, Z. Cui, M. Neumann, and Y. Chen, AAAI 2018, https://www.cse.wustl.edu/~muhan/papers/AAAI_2018_DGCNN.pdf
The model minimally requires specification of the GCN layer sizes as a list of int corresponding to the feature dimensions for each hidden layer, activation functions for each hidden layer, a generator object, and the number of output nodes for the SortPooling layer.
To use this class as a Keras model, the features and preprocessed adjacency matrix should be supplied using the
PaddedGraphGenerator
class.Examples
Creating a graph classification model from a list of StellarGraph objects (graphs). We also add two one-dimensional convolutional layers, a max pooling layer, and two fully connected dense layers, the first followed by dropout and the second used for binary classification:

    from tensorflow.keras import Model
    from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPool1D
    from stellargraph.layer import DeepGraphCNN
    from stellargraph.mapper import PaddedGraphGenerator

    # graphs: a list of StellarGraph objects
    generator = PaddedGraphGenerator(graphs)
    model = DeepGraphCNN(
        layer_sizes=[32, 32, 32, 1],
        activations=["tanh", "tanh", "tanh", "tanh"],
        generator=generator,
        k=30,
    )
    x_inp, x_out = model.in_out_tensors()

    # kernel_size and strides equal sum(layer_sizes) = 97
    x_out = Conv1D(filters=16, kernel_size=97, strides=97)(x_out)
    x_out = MaxPool1D(pool_size=2)(x_out)
    x_out = Conv1D(filters=32, kernel_size=5, strides=1)(x_out)
    x_out = Flatten()(x_out)
    x_out = Dense(units=128, activation="relu")(x_out)
    x_out = Dropout(rate=0.5)(x_out)
    predictions = Dense(units=1, activation="sigmoid")(x_out)

    model = Model(inputs=x_inp, outputs=predictions)
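The value 97 in the example above is the sum of `layer_sizes`. The following sketch traces the sequence length through the convolutional head using plain arithmetic. It assumes that the sort-pooled output is flattened to length `k * sum(layer_sizes)` (which the stride of 97 suggests) and that the Keras layers use their default `"valid"` padding; the helper name `conv1d_output_length` is hypothetical.

```python
def conv1d_output_length(length, kernel_size, strides=1):
    """Output length of a 1D convolution or pooling step with 'valid' padding."""
    return (length - kernel_size) // strides + 1


layer_sizes = [32, 32, 32, 1]
k = 30

features_per_node = sum(layer_sizes)   # concatenated GCN outputs per node
flattened = k * features_per_node      # assumed length of the sort-pooled output

# Conv1D(kernel_size=97, strides=97): one step per retained node.
after_conv1 = conv1d_output_length(flattened, features_per_node, features_per_node)
# MaxPool1D(pool_size=2): strides default to the pool size.
after_pool = conv1d_output_length(after_conv1, 2, 2)
# Conv1D(kernel_size=5, strides=1).
after_conv2 = conv1d_output_length(after_pool, 5, 1)
# Flatten() over 32 filters gives the input width of the first Dense layer.
flat_units = after_conv2 * 32
```

Changing `layer_sizes` therefore requires changing the first `Conv1D` kernel size and stride to match the new sum.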
See also
Example using DGCNN: graph classification.
Appropriate data generator:
PaddedGraphGenerator
.Related models:
GCNSupervisedGraphClassification
for the general form, supporting more customisationGCN
for predictions for individual nodes or links
- Parameters
layer_sizes (list of int) – list of output sizes of the graph GCN layers in the stack.
activations (list of str) – list of activations applied to each GCN layer’s output.
k (int) – size (number of rows) of output tensor.
generator (GraphGenerator) – an instance of
GraphGenerator
class constructed on the graphs used for training.bias (bool, optional) – toggles an optional bias in graph convolutional layers.
dropout (float, optional) – dropout rate applied to input features of each GCN layer.
kernel_initializer (str or func, optional) – The initialiser to use for the weights of each graph convolutional layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each graph convolutional layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each graph convolutional layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each graph convolutional layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each graph convolutional layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each graph convolutional layer.
Graph Convolution LSTM¶
-
class
stellargraph.layer.
GCN_LSTM
(**kwargs)[source]¶ Warning
GCN_LSTM
is experimental: it lacks unit tests and code refinement (see: #1132, #1526, #1564). It may be difficult to use and may have major changes at any time.
GCN_LSTM is a univariate timeseries forecasting method. The architecture comprises a stack of N1 Graph Convolutional layers followed by N2 LSTM layers, a Dropout layer, and a Dense layer. The main components of the GNN architecture are inspired by: T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction (https://arxiv.org/abs/1811.05320). The implementation of the above paper is based on one graph convolution layer stacked with a GRU layer.
The StellarGraph implementation is built as a stack of the following set of layers:
User specified no. of Graph Convolutional layers
User specified no. of LSTM layers
1 Dense layer
1 Dropout layer.
The last two layers consistently showed better performance and regularization experimentally.
See also
Example using GCN_LSTM: spatio-temporal time-series prediction.
Appropriate data generator:
SlidingFeaturesNodeGenerator
.Related model:
GCN
for graphs without time-series node features.- Parameters
seq_len – No. of LSTM cells
adj – unweighted/weighted adjacency matrix of dimension [no. of nodes by no. of nodes]
gc_layer_sizes (list of int) – Output sizes of Graph Convolution layers in the stack.
lstm_layer_sizes (list of int) – Output sizes of LSTM layers in the stack.
generator (SlidingFeaturesNodeGenerator) – A generator instance.
bias (bool) – If True, a bias vector is learnt for each layer in the GCN model.
dropout (float) – Dropout rate applied to input features of each GCN layer.
gc_activations (list of str or func) – Activations applied to each layer’s output; defaults to
['relu', ..., 'relu']
.lstm_activations (list of str or func) – Activations applied to each layer’s output; defaults to
['tanh', ..., 'tanh']
.kernel_initializer (str or func, optional) – The initialiser to use for the weights of each layer.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights of each layer.
kernel_constraint (str or func, optional) – The constraint to use for the weights of each layer.
bias_initializer (str or func, optional) – The initialiser to use for the bias of each layer.
bias_regularizer (str or func, optional) – The regulariser to use for the bias of each layer.
bias_constraint (str or func, optional) – The constraint to use for the bias of each layer.
-
class
stellargraph.layer.
FixedAdjacencyGraphConvolution
(*args, **kwargs)[source]¶ Graph Convolution (GCN) Keras layer. The implementation is based on https://github.com/tkipf/keras-gcn.
Original paper: Semi-Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf, Max Welling, International Conference on Learning Representations (ICLR), 2017 https://github.com/tkipf/gcn
Notes
The inputs are 3 dimensional tensors: batch size, sequence length, and number of nodes.
This class assumes that a simple unweighted or weighted adjacency matrix is passed to it; the normalized Laplacian matrix is calculated within the class.
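As an illustration of the kind of normalization a GCN layer applies to a raw adjacency matrix, here is a minimal pure-Python sketch of the symmetric renormalization from Kipf and Welling, D^(-1/2) (A + I) D^(-1/2). Whether this class computes exactly this form is an assumption; the function name `normalize_adjacency` is hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops: A~ = A + I.
    a_tilde = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    # Degree of each node in A~ (row sums); at least 1 because of the self-loop.
    degrees = [sum(row) for row in a_tilde]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degrees]
    # Scale entry (i, j) by 1 / sqrt(d_i * d_j).
    return [
        [inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)
    ]


# A path graph on three nodes: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
norm = normalize_adjacency(adj)
```

The same function works for weighted matrices, since the degrees are computed as weighted row sums.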
- Parameters
units (int) – dimensionality of output feature vectors
A (N x N) – weighted/unweighted adjacency matrix
activation (str or func) – nonlinear activation applied to layer’s output to obtain output features
use_bias (bool) – toggles an optional bias
kernel_initializer (str or func, optional) – The initialiser to use for the weights.
kernel_regularizer (str or func, optional) – The regulariser to use for the weights.
kernel_constraint (str or func, optional) – The constraint to use for the weights.
bias_initializer (str or func, optional) – The initialiser to use for the bias.
bias_regularizer (str or func, optional) – The regulariser to use for the bias.
bias_constraint (str or func, optional) – The constraint to use for the bias.
-
build
(input_shapes)[source]¶ Builds the layer
- Parameters
input_shapes (list of int) – shapes of the layer’s inputs (the batches of node features)
-
call
(features)[source]¶ Applies the layer.
- Parameters
features (ndarray) – node features (size B x N x F), where B is the batch size, F = TV is the feature size (consisting of the sequence length and the number of variates), and N is the number of nodes in the graph.
- Returns
Keras Tensor that represents the output of the layer.
Deep Graph Infomax¶
-
class
stellargraph.layer.
DeepGraphInfomax
(base_model, corrupted_generator=None)[source]¶ A class to wrap stellargraph models for Deep Graph Infomax unsupervised training (https://arxiv.org/pdf/1809.10341.pdf).
- Parameters
base_model – the base stellargraph model class
-
build
(**kwargs)¶ Deprecated: use
in_out_tensors()
.
-
embedding_model
()[source]¶ Deprecated: use
base_model.in_out_tensors
instead. Deep Graph Infomax just trains the base model, and the model behaves as usual after training.
-
in_out_tensors
()[source]¶ A function to create the Keras inputs and outputs for a Deep Graph Infomax model for unsupervised training.
Note that the
tensorflow.nn.sigmoid_cross_entropy_with_logits()
loss must be used with this model.
Example:

    dg_infomax = DeepGraphInfomax(...)
    x_in, x_out = dg_infomax.in_out_tensors()
    model = Model(inputs=x_in, outputs=x_out)
    model.compile(loss=tf.nn.sigmoid_cross_entropy_with_logits, ...)
- Returns
input and output layers for use with a Keras model
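The required loss operates on raw scores (logits) rather than probabilities. For reference, the sketch below evaluates the numerically stable formula given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`, `max(x, 0) - x * z + log(1 + exp(-|x|))`, for a single logit `x` and label `z`. The pure-Python function is a stand-in for illustration, not the TensorFlow op itself.

```python
import math


def sigmoid_cross_entropy_with_logits(label, logit):
    """Stable binary cross-entropy for one raw score: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_cross_entropy(label, logit):
    """Textbook form; equal to the stable form but overflows for large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Label 1.0 for a positive example, 0.0 for a negative one.
loss_positive = sigmoid_cross_entropy_with_logits(1.0, 2.0)
loss_negative = sigmoid_cross_entropy_with_logits(0.0, 2.0)
# The stable form stays finite where the naive form would fail.
loss_extreme = sigmoid_cross_entropy_with_logits(0.0, 1000.0)
```

Because the sigmoid is folded into the loss, the model output should be left without a sigmoid activation.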
-
class
stellargraph.layer.
DGIDiscriminator
(*args, **kwargs)[source]¶ This Layer computes the Discriminator function for Deep Graph Infomax (https://arxiv.org/pdf/1809.10341.pdf).
See also
DeepGraphInfomax
uses this layer.-
build
(input_shapes)[source]¶ Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.
This is typically used to create the weights of Layer subclasses.
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
-
call
(inputs)[source]¶ Applies the layer to the inputs.
- Parameters
inputs – a list or tuple of tensors with shapes [(1, N, F), (1, F)] for full batch methods and shapes [(B, F), (F,)] for sampled node methods, containing the node features and a summary feature vector, where N is the number of nodes in the graph, F is the feature dimension, and B is the batch size.
- Returns
a Tensor with shape (1, N) for full batch methods and shape (B,) for sampled node methods.
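The shapes above for the sampled node case can be illustrated with a bilinear score of the form h_i^T W s, as used in the Deep Graph Infomax paper. This is a sketch under that assumption, not the layer's code: the function name `discriminator_scores` and the F × F `kernel` argument are hypothetical.

```python
def discriminator_scores(node_features, summary, kernel):
    """Bilinear score h_i^T W s for each of the B node feature vectors."""
    # W s: project the summary vector once, giving a vector of length F.
    projected = [sum(w * s for w, s in zip(row, summary)) for row in kernel]
    # Dot each node's features with the projected summary: one score per node.
    return [
        sum(h * p for h, p in zip(features, projected)) for features in node_features
    ]


node_features = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]  # shape (B, F) = (3, 2)
summary = [1.0, -1.0]                                 # shape (F,)
kernel = [[1.0, 0.0], [0.0, 1.0]]                     # identity stand-in for W
scores = discriminator_scores(node_features, summary, kernel)  # shape (B,)
```

The scores are unnormalized, which matches the use of a loss that takes logits.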
-
Link prediction¶
-
class
stellargraph.layer.
LinkEmbedding
(*args, **kwargs)[source]¶ Defines an edge inference function that takes source, destination node embeddings (node features) as input, and returns a numeric vector of output_dim size.
This class takes as input either:
A list of two tensors of shape (N, M) being the embeddings for each of the nodes in the link, where N is the number of links, and M is the node embedding size.
A single tensor of shape (…, N, 2, M) where the axis second from last indexes the nodes in the link and N is the number of links and M the embedding size.
Examples
Consider two tensors containing the source and destination embeddings of size M:
    x_src = tf.constant(x_src, shape=(1, M), dtype="float32")
    x_dst = tf.constant(x_dst, shape=(1, M), dtype="float32")
    li = LinkEmbedding(method="ip", activation="sigmoid")([x_src, x_dst])
See also
Examples using this class:
Related functions:
link_inference()
,link_classification()
,link_regression()
.- Parameters
axis (int) – If a single tensor is supplied this is the axis that indexes the node embeddings so that the indices 0 and 1 give the node embeddings to be combined. This is ignored if two tensors are supplied as a list.
activation (str) – activation function applied to the output, one of “softmax”, “sigmoid”, etc., or any activation function supported by Keras, see https://keras.io/activations/ for more information.
method (str) –
Name of the method of combining (src,dst) node features or embeddings into edge embeddings. One of:
concat – concatenation,
ip or dot – inner product, \(ip(u,v) = \sum_{i=1}^{d} u_i v_i\),
mul or hadamard – element-wise multiplication, \(h(u,v)_i = u_i v_i\),
l1 – L1 operator, \(l_1(u,v)_i = |u_i-v_i|\),
l2 – L2 operator, \(l_2(u,v)_i = (u_i-v_i)^2\),
avg – average, \(avg(u,v) = (u+v)/2\).
For all methods except ip or dot a dense layer is applied on top of the combined edge embedding to transform to a vector of size output_dim.
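The combination methods listed above can be written out for a single pair of embedding vectors. The sketch below is plain Python for illustration only; the function name `combine` is hypothetical, and the dense layer and activation that may follow are omitted.

```python
def combine(u, v, method):
    """Combine source and destination embeddings u, v into one edge embedding."""
    if method == "concat":
        return list(u) + list(v)
    if method in ("ip", "dot"):
        # Inner product: a single number, returned as a length-1 vector.
        return [sum(a * b for a, b in zip(u, v))]
    if method in ("mul", "hadamard"):
        return [a * b for a, b in zip(u, v)]
    if method == "l1":
        return [abs(a - b) for a, b in zip(u, v)]
    if method == "l2":
        return [(a - b) ** 2 for a, b in zip(u, v)]
    if method == "avg":
        return [(a + b) / 2 for a, b in zip(u, v)]
    raise ValueError(f"unknown method: {method}")


u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
results = {m: combine(u, v, m) for m in ("concat", "ip", "mul", "l1", "l2", "avg")}
```

Only `concat` changes the vector length to 2M and only `ip`/`dot` reduces it to one value; the remaining methods keep the embedding size M.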
-
call
(x)[source]¶ Apply the layer to the node embeddings in x. These embeddings are either:
A list of two tensors of shape (N, M) being the embeddings for each of the nodes in the link, where N is the number of links, and M is the node embedding size.
A single tensor of shape (…, N, 2, M) where the axis second from last indexes the nodes in the link and N is the number of links and M the embedding size.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
- Returns
Python dictionary.
-
stellargraph.layer.
link_classification
(output_dim: int = 1, output_act: AnyStr = 'sigmoid', edge_embedding_method: AnyStr = 'ip')[source]¶ Defines a function that predicts a binary or multi-class edge classification output from (source, destination) node embeddings (node features).
This function takes as input either:
A list of two tensors of shape (N, M) being the embeddings for each of the nodes in the link, where N is the number of links, and M is the node embedding size.
A single tensor of shape (…, N, 2, M) where the axis second from last indexes the nodes in the link and N is the number of links and M the embedding size.
Note that the output tensor is flattened before being returned.
See also
Examples using this function:
Attri2Vec: node classification, link prediction, unsupervised representation learning
GraphSAGE: link prediction, unsupervised representation learning
Node2Vec: node classification, unsupervised representation learning
other link prediction: comparison of algorithms, ensembles, calibration
Related functionality:
LinkEmbedding
,link_inference()
,link_regression()
.- Parameters
output_dim (int) – Number of classifier’s output units – desired dimensionality of the output.
output_act (str) – activation function applied to the output, one of “softmax”, “sigmoid”, etc., or any activation function supported by Keras, see https://keras.io/activations/ for more information.
edge_embedding_method (str) –
Name of the method of combining (src,dst) node features/embeddings into edge embeddings. One of:
concat – concatenation,
ip or dot – inner product, \(ip(u,v) = \sum_{i=1}^{d} u_i v_i\),
mul or hadamard – element-wise multiplication, \(h(u,v)_i = u_i v_i\),
l1 – L1 operator, \(l_1(u,v)_i = |u_i-v_i|\),
l2 – L2 operator, \(l_2(u,v)_i = (u_i-v_i)^2\),
avg – average, \(avg(u,v) = (u+v)/2\).
- Returns
Function taking edge tensors with src, dst node embeddings (i.e., pairs of (node_src, node_dst) tensors) and returning logits of output_dim length (e.g., edge class probabilities).
-
stellargraph.layer.
link_regression
(output_dim: int = 1, clip_limits: Optional[Tuple[float]] = None, edge_embedding_method: AnyStr = 'ip')[source]¶ Defines a function that predicts a numeric edge regression output vector/scalar from (source, destination) node embeddings (node features).
This function takes as input either:
A list of two tensors of shape (N, M) being the embeddings for each of the nodes in the link, where N is the number of links, and M is the node embedding size.
A single tensor of shape (…, N, 2, M) where the axis second from last indexes the nodes in the link and N is the number of links and M the embedding size.
Note that the output tensor is flattened before being returned.
See also
Example using this function: HinSAGE link prediction.
Related functionality:
LinkEmbedding
,link_inference()
,link_classification()
.- Parameters
output_dim (int) – Number of predictor’s output units – desired dimensionality of the output.
clip_limits (tuple) – lower and upper thresholds for LeakyClippedLinear unit on top. If None (not provided), the LeakyClippedLinear unit is not applied.
edge_embedding_method (str) –
Name of the method of combining (src,dst) node features/embeddings into edge embeddings. One of:
concat – concatenation,
ip or dot – inner product, \(ip(u,v) = \sum_{i=1}^{d} u_i v_i\),
mul or hadamard – element-wise multiplication, \(h(u,v)_i = u_i v_i\),
l1 – L1 operator, \(l_1(u,v)_i = |u_i-v_i|\),
l2 – L2 operator, \(l_2(u,v)_i = (u_i-v_i)^2\),
avg – average, \(avg(u,v) = (u+v)/2\).
- Returns
Function taking edge tensors with src, dst node embeddings (i.e., pairs of (node_src, node_dst) tensors) and returning a numeric value (e.g., edge attribute being predicted) constructed according to edge_embedding_method.
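The effect of clip_limits can be pictured with a scalar sketch of a leaky clipped linear unit. The exact behaviour of LeakyClippedLinear is not specified in this section, so the form below is an assumption: the identity inside the limits and a small slope `alpha` outside them. The function name and the default `alpha=0.1` are hypothetical.

```python
def leaky_clipped_linear(x, low, high, alpha=0.1):
    """Identity on [low, high]; slope alpha below low and above high (assumed form)."""
    if x < low:
        return low + alpha * (x - low)
    if x > high:
        return high + alpha * (x - high)
    return x


# Example: predicting ratings expected to lie between 1 and 5.
inside = leaky_clipped_linear(3.0, low=1.0, high=5.0)
below = leaky_clipped_linear(0.0, low=1.0, high=5.0)
above = leaky_clipped_linear(7.0, low=1.0, high=5.0)
```

A small non-zero slope outside the limits keeps a gradient available during training, unlike hard clipping.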
-
stellargraph.layer.
link_inference
(output_dim: int = 1, output_act: AnyStr = 'linear', edge_embedding_method: AnyStr = 'ip', clip_limits: Optional[Tuple[float]] = None, name: AnyStr = 'link_inference')[source]¶ Defines an edge inference function that takes source, destination node embeddings (node features) as input, and returns a numeric vector of output_dim size.
This function takes as input either:
A list of two tensors of shape (N, M) being the embeddings for each of the nodes in the link, where N is the number of links, and M is the node embedding size.
A single tensor of shape (…, N, 2, M) where the axis second from last indexes the nodes in the link and N is the number of links and M the embedding size.
Note that the output tensor is flattened before being returned.
See also
Related functionality:
LinkEmbedding
,link_classification()
,link_regression()
.- Parameters
output_dim (int) – Number of predictor’s output units – desired dimensionality of the output.
output_act (str) – activation function applied to the output, one of “softmax”, “sigmoid”, etc., or any activation function supported by Keras, see https://keras.io/activations/ for more information.
edge_embedding_method (str) –
Name of the method of combining (src,dst) node features or embeddings into edge embeddings. One of:
concat – concatenation,
ip or dot – inner product, \(ip(u,v) = \sum_{i=1}^{d} u_i v_i\),
mul or hadamard – element-wise multiplication, \(h(u,v)_i = u_i v_i\),
l1 – L1 operator, \(l_1(u,v)_i = |u_i-v_i|\),
l2 – L2 operator, \(l_2(u,v)_i = (u_i-v_i)^2\),
avg – average, \(avg(u,v) = (u+v)/2\).
For all methods except ip or dot a dense layer is applied on top of the combined edge embedding to transform to a vector of size output_dim.
clip_limits (Tuple[float]) – lower and upper thresholds for LeakyClippedLinear unit on top. If None (not provided), the LeakyClippedLinear unit is not applied.
name (str) – optional name of the defined function, used for error logging
- Returns
Function taking edge tensors with src, dst node embeddings (i.e., pairs of (node_src, node_dst) tensors) and returning a vector of output_dim length (e.g., edge class probabilities, edge attribute prediction, etc.).
Ensembles¶
Ensembles of graph neural network models, GraphSAGE, GCN, GAT, and HinSAGE, with optional bootstrap sampling of the training data (implemented in the BaggingEnsemble class).
-
class
stellargraph.ensemble.
BaggingEnsemble
(model, n_estimators=3, n_predictions=3)[source]¶ The BaggingEnsemble class can be used to create ensembles of stellargraph’s graph neural network algorithms including GCN, GraphSAGE, GAT, and HinSAGE. Ensembles can be used for training classification and regression problems for node attribute inference and link prediction.
This class can be used to create Bagging ensembles.
Bagging ensembles add model diversity in two ways: (1) by random initialisation of the models’ weights (before training) to different values; and (2) by bootstrap sampling of the training data for each model. That is, each model in the ensemble is trained on a random subset of the training examples, sampled with replacement from the original training data.
-
fit
(generator, train_data, train_targets, steps_per_epoch=None, epochs=1, verbose=1, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0, bag_size=None, use_early_stopping=False, early_stopping_monitor='val_loss')[source]¶ This method trains the ensemble on the data given in train_data and train_targets. If validation data are also given, then the training metrics are evaluated on these data and results printed on screen if verbose level is greater than 0.
The method trains each model in the ensemble in series for the number of epochs specified. Training can also stop early with the best model as evaluated on the validation data, if use_early_stopping is enabled.
Each model in the ensemble is trained using a bootstrapped sample of the data (the training data are re-sampled with replacement). The number of bootstrap samples can be specified via the bag_size parameter; by default, the number of bootstrap samples equals the number of training points.
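Bootstrap sampling as described here can be sketched with the standard library. This is an illustration of the resampling idea only; the function name `bootstrap_sample` and the `rng` argument are hypothetical and not part of the class's interface.

```python
import random


def bootstrap_sample(train_data, train_targets, bag_size=None, rng=None):
    """Draw (data, targets) pairs with replacement; bag_size defaults to len(train_data)."""
    rng = rng or random.Random()
    size = len(train_data) if bag_size is None else bag_size
    # Sampling indices with replacement keeps each item paired with its target.
    indices = [rng.randrange(len(train_data)) for _ in range(size)]
    return [train_data[i] for i in indices], [train_targets[i] for i in indices]


node_ids = ["a", "b", "c", "d", "e"]
targets = [0, 1, 0, 1, 1]

rng = random.Random(0)  # seeded only to make the sketch repeatable
# One independent bag per model in a three-model ensemble.
bags = [bootstrap_sample(node_ids, targets, rng=rng) for _ in range(3)]
small_ids, small_targets = bootstrap_sample(node_ids, targets, bag_size=2, rng=rng)
```

Because sampling is with replacement, a bag usually contains repeated items and omits others, which is what gives each model a different view of the training data.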
For detailed descriptions of Keras-specific parameters consult the Keras documentation at https://keras.io/models/sequential/
- Parameters
generator – The generator object for training data. It should be one of type GraphSAGENodeGenerator, HinSAGENodeGenerator, FullBatchNodeGenerator, GraphSAGELinkGenerator, or HinSAGELinkGenerator.
train_data (iterable) – It is an iterable, e.g. list, that specifies the data to train the model with.
train_targets (iterable) – It is an iterable, e.g. list, that specifies the target values for the train data.
steps_per_epoch (None or int) – (Keras-specific parameter) If not None, it specifies the number of steps to yield from the generator before declaring one epoch finished and starting a new epoch.
epochs (int) – (Keras-specific parameter) The number of training epochs.
verbose (int) – (Keras-specific parameter) The verbosity mode, which should be 0, 1, or 2, meaning silent, progress bar, and one line per epoch respectively.
validation_data – A generator for validation data that is optional (None). If not None then, it should be one of type GraphSAGENodeGenerator, HinSAGENodeGenerator, FullBatchNodeGenerator, GraphSAGELinkGenerator, or HinSAGELinkGenerator.
validation_steps (None or int) – (Keras-specific parameter) If validation_data is not None, then it specifies the number of steps to yield from the generator before stopping at the end of every epoch.
class_weight (None or dict) – (Keras-specific parameter) If not None, it should be a dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to “pay more attention” to samples from an under-represented class.
max_queue_size (int) – (Keras-specific parameter) The maximum size for the generator queue.
workers (int) – (Keras-specific parameter) The maximum number of workers to use.
use_multiprocessing (bool) – (Keras-specific parameter) If True then use process based threading.
shuffle (bool) – (Keras-specific parameter) If True, then it shuffles the order of batches at the beginning of each training epoch.
initial_epoch (int) – (Keras-specific parameter) Epoch at which to start training (useful for resuming a previous training run).
bag_size (None or int) – The number of samples in a bootstrap sample. If None and bagging is used, then the number of samples is equal to the number of training points.
use_early_stopping (bool) – If set to True, then early stopping is used when training each model in the ensemble. The default is False.
early_stopping_monitor (str) – The quantity to monitor for early stopping, e.g., ‘val_loss’, ‘val_weighted_acc’. It should be a valid Keras metric.
- Returns
It returns a list of Keras History objects each corresponding to one trained model in the ensemble.
- Return type
-
-
class
stellargraph.ensemble.
Ensemble
(model, n_estimators=3, n_predictions=3)[source]¶ The Ensemble class can be used to create ensembles of stellargraph’s graph neural network algorithms including GCN, GraphSAGE, GAT, and HinSAGE. Ensembles can be used for training classification and regression problems for node attribute inference and link prediction.
The Ensemble class can be used to create Naive ensembles.
Naive ensembles add model diversity by random initialisation of the models’ weights (before training) to different values. Each model in the ensemble is trained on the same training set of examples.
See also
Example using ensembles: node classification.
Related functionality:
BaggingEnsemble
for bootstrap sampling while training, in addition to random initialisation.-
compile
(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None)[source]¶ Method for configuring the model for training. It is a wrapper of the keras.models.Model.compile method for all models in the ensemble.
For detailed descriptions of Keras-specific parameters consult the Keras documentation at https://keras.io/models/sequential/
- Parameters
optimizer (Keras optimizer or str) – (Keras-specific parameter) The optimizer to use given either as an instance of a Keras optimizer or a string naming the optimiser of choice.
loss (Keras function or str) – (Keras-specific parameter) The loss function or string indicating the type of loss to use.
metrics (list or dict) – (Keras-specific parameter) List of metrics to be evaluated by each model in the ensemble during training and testing. It should be a list for a model with a single output. To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary.
loss_weights (None, list, or dict) – (Keras-specific parameter) Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model’s outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
sample_weight_mode (None, str, list, or dict) – (Keras-specific parameter) If you need to do timestep-wise sample weighting (2D weights), set this to “temporal”. None defaults to sample-wise weights (1D). If the model has multiple outputs, you can use a different sample_weight_mode on each output by passing a dictionary or a list of modes.
weighted_metrics (list) – (Keras-specific parameter) List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
-
evaluate
(generator, test_data=None, test_targets=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)[source]¶ Evaluates the ensemble on a data (node or link) generator. It makes n_predictions for each data point for each of the n_estimators and returns the mean and standard deviation of the predictions.
For detailed descriptions of Keras-specific parameters consult the Keras documentation at https://keras.io/models/sequential/
- Parameters
generator – The generator object that, if test_data is not None, should be one of type GraphSAGENodeGenerator, HinSAGENodeGenerator, FullBatchNodeGenerator, GraphSAGELinkGenerator, or HinSAGELinkGenerator. However, if test_data is None, then generator should be one of type NodeSequence, LinkSequence, or FullBatchSequence.
test_data (None or iterable) – If not None, then it is an iterable, e.g. list, that specifies the node IDs to evaluate the model on.
test_targets (None or iterable) – If not None, then it is an iterable, e.g. list, that specifies the target values for the test_data.
max_queue_size (int) – (Keras-specific parameter) The maximum size for the generator queue.
workers (int) – (Keras-specific parameter) The maximum number of workers to use.
use_multiprocessing (bool) – (Keras-specific parameter) If True then use process based threading.
verbose (int) – (Keras-specific parameter) The verbosity mode that should be 0 or 1 with the former turning verbosity off and the latter on.
- Returns
The mean and standard deviation of the model metrics for the given data.
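The summary returned here can be pictured as column-wise statistics over the metric values collected from every estimator and every repeated evaluation. The sketch below uses the standard library; the function name `summarise_metrics` is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def summarise_metrics(metric_rows):
    """Column-wise mean and standard deviation over rows of metric values."""
    # Transpose so that each column holds one metric across all runs.
    columns = list(zip(*metric_rows))
    means = [statistics.mean(column) for column in columns]
    # Population standard deviation (assumed; a sample estimate would use stdev).
    stds = [statistics.pstdev(column) for column in columns]
    return means, stds


# One row per (estimator, repetition); columns are [loss, accuracy].
rows = [[0.50, 0.80], [0.40, 0.90], [0.60, 0.70]]
means, stds = summarise_metrics(rows)
```

With n_estimators models and n_predictions repetitions, `rows` would contain n_estimators × n_predictions entries.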
- Return type
-
evaluate_generator
(*args, **kwargs)[source]¶ Deprecated: use
evaluate()
.
-
fit
(generator, steps_per_epoch=None, epochs=1, verbose=1, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0, use_early_stopping=False, early_stopping_monitor='val_loss')[source]¶ This method trains the ensemble on the data specified by the generator. If validation data are given, then the training metrics are evaluated on these data and results printed on screen if verbose level is greater than 0.
The method trains each model in the ensemble in series for the number of epochs specified. Training can also stop early with the best model as evaluated on the validation data, if use_early_stopping is set to True.
For detailed descriptions of Keras-specific parameters consult the Keras documentation at https://keras.io/models/sequential/
- Parameters
generator – The generator object for training data. It should be one of type NodeSequence, LinkSequence, SparseFullBatchSequence, or FullBatchSequence.
steps_per_epoch (None or int) – (Keras-specific parameter) If not None, it specifies the number of steps to yield from the generator before declaring one epoch finished and starting a new epoch.
epochs (int) – (Keras-specific parameter) The number of training epochs.
verbose (int) – (Keras-specific parameter) The verbosity mode, which should be 0, 1, or 2, meaning silent, progress bar, and one line per epoch respectively.
validation_data – A generator for validation data that is optional (None). If not None then, it should be one of type NodeSequence, LinkSequence, SparseFullBatchSequence, or FullBatchSequence.
validation_steps (None or int) – (Keras-specific parameter) If validation_data is not None, then it specifies the number of steps to yield from the generator before stopping at the end of every epoch.
class_weight (None or dict) – (Keras-specific parameter) If not None, it should be a dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to “pay more attention” to samples from an under-represented class.
max_queue_size (int) – (Keras-specific parameter) The maximum size for the generator queue.
workers (int) – (Keras-specific parameter) The maximum number of workers to use.
use_multiprocessing (bool) – (Keras-specific parameter) If True then use process based threading.
shuffle (bool) – (Keras-specific parameter) If True, then it shuffles the order of batches at the beginning of each training epoch.
initial_epoch (int) – (Keras-specific parameter) Epoch at which to start training (useful for resuming a previous training run).
use_early_stopping (bool) – If set to True, then early stopping is used when training each model in the ensemble. The default is False.
early_stopping_monitor (str) – The quantity to monitor for early stopping, e.g., ‘val_loss’, ‘val_weighted_acc’. It should be a valid Keras metric.
- Returns
It returns a list of Keras History objects each corresponding to one trained model in the ensemble.
- Return type
-
layers
(indx=None)[source]¶ This method returns the layer objects for the model specified by the value of
indx
.
-
predict
(generator, predict_data=None, summarise=False, output_layer=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)[source]¶ This method generates predictions for the data produced by the given generator or alternatively the data given in parameter predict_data.
For detailed descriptions of Keras-specific parameters consult the Keras documentation at https://keras.io/models/sequential/
- Parameters
generator – The generator object that, if predict_data is not None, should be one of type GraphSAGENodeGenerator, HinSAGENodeGenerator, FullBatchNodeGenerator, GraphSAGELinkGenerator, or HinSAGELinkGenerator. However, if predict_data is None, then generator should be one of type NodeSequence, LinkSequence, SparseFullBatchSequence, or FullBatchSequence.
predict_data (None or iterable) – If not None, then it is an iterable, e.g. list, that specifies the node IDs to make predictions for. If generator is of type FullBatchNodeGenerator then predict_data should be all the nodes in the graph since full batch approaches such as GCN and GAT can only be used to make predictions for all graph nodes.
summarise (bool) – If True, then the mean of the predictions over self.n_estimators and self.n_predictions is returned for each query point. If False, then all predictions are returned.
output_layer (None or int) – If not None, then the predictions are the outputs of the layer specified. The default is the model’s output layer.
max_queue_size (int) – (Keras-specific parameter) The maximum size for the generator queue.
workers (int) – (Keras-specific parameter) The maximum number of workers to use.
use_multiprocessing (bool) – (Keras-specific parameter) If True then use process based threading.
verbose (int) – (Keras-specific parameter) The verbosity mode that should be 0 or 1 with the former turning verbosity off and the latter on.
- Returns
The predictions. It will have shape M × K × N × F if summarise is set to False, or N × F otherwise. M is the number of estimators in the ensemble; K is the number of predictions per query point; N is the number of query points; and F is the output dimensionality of the specified layer, determined by the shape of the output layer.
- Return type
numpy array
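To make the two return shapes concrete, the sketch below uses a random NumPy array as a stand-in for ensemble predictions and averages it over the estimator and prediction axes. The sizes are made up, and the averaging follows the description of summarise above rather than the library's internal code.

```python
import numpy as np

# Hypothetical sizes: M estimators, K predictions per query point,
# N query points, F output dimensions.
M, K, N, F = 3, 2, 5, 4
rng = np.random.default_rng(0)
all_predictions = rng.random((M, K, N, F))  # shape when summarise=False

# Averaging over the estimator and prediction axes leaves one row per query point.
summarised = all_predictions.mean(axis=(0, 1))  # shape when summarise=True
```

Keeping the full M × K × N × F array is useful when the spread of the predictions matters, for example to estimate uncertainty per query point.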
-
Calibration¶
Calibration for binary and multi-class classification models.
-
class
stellargraph.calibration.
IsotonicCalibration
[source]¶ A class for applying Isotonic Calibration to the outputs of a binary or multi-class classifier.
See also
Related functionality: expected_calibration_error(), plot_reliability_diagram(), TemperatureCalibration.
-
fit
(x_train, y_train)[source]¶ Train a calibration model using the provided data.
- Parameters
x_train (numpy array) – The training data that should be the classifier’s probabilistic outputs. It should have shape N × C where N is the number of training samples and C is the number of classes.
y_train (numpy array) – The training class labels. For binary problems y_train has shape (N,) when N is the number of samples. For multi-class classification, y_train has shape (N,C) where C is the number of classes and y_train is using one-hot encoding.
-
predict
(x)[source]¶ This method calibrates the given data, which is assumed to be the output of a classification model.
For multi-class classification, the probabilities for each class are first scaled using the corresponding isotonic regression model and then normalized to sum to 1.
- Parameters
x (numpy array) – The values to calibrate. For binary classification problems it should have shape (N,) where N is the number of samples to calibrate. For multi-class classification problems, it should have shape (N, C) where C is the number of classes.
- Returns
The calibrated probabilities. It has shape (N, C) where N is the number of samples and C is the number of classes.
- Return type
numpy array
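The normalisation step described for the multi-class case can be sketched with NumPy as follows. The input array is a made-up stand-in for per-class scores after each class has been passed through its own isotonic regression model; the helper name normalise_rows is hypothetical and not part of the library.

```python
import numpy as np

def normalise_rows(scaled):
    """Rescale each row of per-class scores so that it sums to 1."""
    scaled = np.asarray(scaled, dtype=float)
    totals = scaled.sum(axis=1, keepdims=True)
    # Guard against all-zero rows, which cannot be normalised.
    totals[totals == 0] = 1.0
    return scaled / totals

# Hypothetical per-class outputs after independent isotonic scaling (N=2, C=3).
scaled = np.array([[0.2, 0.5, 0.1], [0.9, 0.3, 0.0]])
calibrated = normalise_rows(scaled)
```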
-
-
class
stellargraph.calibration.
TemperatureCalibration
(epochs=1000)[source]¶ A class for temperature calibration for binary and multi-class classification problems.
For binary classification, Platt Scaling is used for calibration. Platt Scaling was proposed in the paper Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, J. C. Platt, Advances in large margin classifiers, 10(3): 61-74, 1999.
For multi-class classification, Temperature Calibration is used. It is an extension of Platt Scaling and it was proposed in the paper On Calibration of Modern Neural Networks, C. Guo et al., ICML, 2017.
In Temperature Calibration, a classifier’s non-probabilistic outputs, i.e., logits, are scaled by a trainable parameter called Temperature. The softmax is applied to the rescaled logits to calculate the probabilistic output. As noted in the cited paper, Temperature Scaling does not change the maximum of the softmax function, so the classifier’s predictions remain the same.
See also
Related functionality: expected_calibration_error(), plot_reliability_diagram(), IsotonicCalibration.
-
fit
(x_train, y_train, x_val=None, y_val=None)[source]¶ Train the calibration model.
For temperature scaling of a multi-class classifier, if validation data is given, then training stops when the validation loss starts increasing. Validation data are ignored for Platt scaling.
- Parameters
x_train (numpy array) – The training data that should be a classifier’s non-probabilistic outputs. For calibrating a binary classifier it should have shape (N,) where N is the number of training samples. For calibrating a multi-class classifier, it should have shape (N, C) where N is the number of samples and C is the number of classes.
y_train (numpy array) – The training data class labels. For calibrating a binary classifier it should have shape (N,) where N is the number of training samples. For calibrating a multi-class classifier, it should have shape (N, C) where N is the number of samples and C is the number of classes and the class labels are one-hot encoded.
x_val (numpy array or None) – The validation data, used only for calibrating multi-class classification models, that should be the classifier’s non-probabilistic outputs. It should have shape (M, C) where M is the number of validation samples and C is the number of classes.
y_val (numpy array or None) – The validation data class labels used only for calibrating multi-class classification models. It should have shape (M, C) where M is the number of validation samples and C is the number of classes and the class labels are one-hot encoded.
-
predict
(x)[source]¶ This method calibrates the given data using the learned temperature. It scales each logit by the temperature, exponentiates the results, and finally normalizes the scaled values such that their sum is 1.
- Parameters
x (numpy.ndarray) – The logits. For binary classification problems, it should have dimensionality (N,) where N is the number of samples to calibrate. For multi-class problems, it should have dimensionality (N, C) where C is the number of classes.
- Returns
The calibrated probabilities.
- Return type
numpy array
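The calibration step for the multi-class case can be illustrated with a small NumPy sketch. It assumes the logits are divided by the temperature, as in Guo et al.; the function name temperature_softmax and the example values are hypothetical, and the temperature is fixed by hand here rather than learned.

```python
import numpy as np

def temperature_softmax(logits, temperature):
    """Divide the logits by the temperature, then apply a row-wise softmax."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
uncalibrated = temperature_softmax(logits, temperature=1.0)
calibrated = temperature_softmax(logits, temperature=2.0)  # T > 1 softens the output
```

Because every logit in a row is divided by the same positive value, the most probable class in each row is the same before and after calibration; only the confidence changes.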
-
-
stellargraph.calibration.
expected_calibration_error
(prediction_probabilities, accuracy, confidence)[source]¶ Helper function for calculating the expected calibration error as defined in the paper On Calibration of Modern Neural Networks, C. Guo et al., ICML, 2017.
It is assumed that for a validation dataset, the prediction probabilities have been calculated for each point in the dataset and given in the array prediction_probabilities.
See also
Related functionality: plot_reliability_diagram(), IsotonicCalibration, TemperatureCalibration.
- Parameters
prediction_probabilities (numpy array) – The predicted probabilities.
accuracy (numpy array) – The accuracy such that the i-th entry in the array holds the proportion of correctly classified samples that fall in the i-th bin.
confidence (numpy array) – The confidence such that the i-th entry in the array is the average prediction probability over all the samples assigned to this bin.
- Returns
The expected calibration error.
- Return type
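In the cited paper, the expected calibration error is the weighted average, over bins, of the absolute gap between accuracy and confidence, with each bin weighted by its share of the samples. The sketch below computes that quantity from explicit bin counts. How the library derives the bin membership from prediction_probabilities is not shown here, so the counts are passed in directly; ece_from_bins and all values are hypothetical.

```python
import numpy as np

def ece_from_bins(bin_counts, accuracy, confidence):
    """Weighted mean of |accuracy - confidence| over bins."""
    bin_counts = np.asarray(bin_counts, dtype=float)
    weights = bin_counts / bin_counts.sum()  # fraction of samples in each bin
    gaps = np.abs(np.asarray(accuracy, dtype=float) - np.asarray(confidence, dtype=float))
    return float(np.sum(weights * gaps))

# Hypothetical three-bin example.
counts = [10, 30, 60]
acc = [0.2, 0.5, 0.9]
conf = [0.3, 0.5, 0.8]
ece = ece_from_bins(counts, acc, conf)
```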
-
stellargraph.calibration.
plot_reliability_diagram
(calibration_data, predictions, ece=None, filename=None)[source]¶ Helper function for plotting a reliability diagram.
See also
Related functionality: expected_calibration_error(), IsotonicCalibration, TemperatureCalibration.
- Parameters
calibration_data (list) – The calibration data as a list where each entry in the list is a 2-tuple of type numpy.ndarray. Each entry in the tuple holds the fraction of positives and the mean predicted values for the true and predicted class labels.
predictions (np.ndarray) – The probabilistic predictions of the classifier for each sample in the dataset used for diagnosing miscalibration.
ece (None or list of float) – If not None, this list stores the expected calibration error for each class.
filename (str or None) – If not None, the figure is saved on disk in the given filename.
Neo4j Connector¶
The Neo4j connector package contains classes and functions to support sampling from Neo4j databases.
-
class
stellargraph.connector.neo4j.
Neo4jDirectedBreadthFirstNeighbors
(**kwargs)[source]¶
Warning
Neo4jDirectedBreadthFirstNeighbors is experimental: the class is not fully tested. It may be difficult to use and may have major changes at any time.
Breadth First Walk that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes from a Neo4j database.
-
run
(nodes=None, n=1, in_size=None, out_size=None)[source]¶ Send queries to Neo4j databases and collect sampled breadth-first walks starting from the root nodes.
- Parameters
nodes (list of hashable) – A list of root node ids such that from each node n BFWs will be generated up to the given depth d.
n (int) – Number of walks per node id.
in_size (list of int) – The number of in-directed nodes to sample with replacement at each depth of the walk.
out_size (list of int) – The number of out-directed nodes to sample with replacement at each depth of the walk.
- Returns
A list of multi-hop neighbourhood samples. Each sample expresses a collection of nodes, which could be either in-neighbors, or out-neighbors of the previous hops. Result has the format: [[head1, head2, …], [in1_head1, in2_head1, …, in1_head2, in2_head2, …], [out1_head1, out2_head1, …, out1_head2, out2_head2, …], [in1_in1_head1, in2_in1_head1, …, in1_in2_head1, …], [out1_in1_head1, out2_in1_head1, …, out1_in2_head1, …], [in1_out1_head1, in2_out1_head1, …, in1_out2_head1, …], [out1_out1_head1, out2_out1_head1, …, out1_out2_head1, …], … ]
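The layout of the result can be easier to follow by counting how many node ids land in each inner list. The sketch below derives those counts from in_size and out_size, assuming both lists have the same length and that sampling with replacement always returns exactly the requested number of neighbours. The helper directed_slot_sizes is hypothetical and does not query Neo4j.

```python
def directed_slot_sizes(num_heads, in_size, out_size):
    """Number of node ids in each list of the result, in output order."""
    if len(in_size) != len(out_size):
        raise ValueError("in_size and out_size must have the same length")
    sizes = [num_heads]
    previous = [num_heads]  # slot sizes at the previous depth
    for n_in, n_out in zip(in_size, out_size):
        current = []
        for parent in previous:
            # Each parent slot yields an in-neighbour slot, then an out-neighbour slot.
            current.extend([parent * n_in, parent * n_out])
        sizes.extend(current)
        previous = current
    return sizes

sizes = directed_slot_sizes(num_heads=2, in_size=[2, 1], out_size=[3, 2])
```

For two hops this gives seven lists: the heads, their in- and out-neighbours, and then the four in/out combinations of the second hop in the order shown in the format above.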
-
-
class
stellargraph.connector.neo4j.
Neo4jDirectedGraphSAGENodeGenerator
(**kwargs)[source]¶
Warning
Neo4jDirectedGraphSAGENodeGenerator is experimental: the class is not fully tested. It may be difficult to use and may have major changes at any time.
A data generator for node prediction with homogeneous GraphSAGE models on directed graphs.
At minimum, supply the Neo4jStellarDiGraph, the batch size, and the number of node samples (separately for in-nodes and out-nodes) for each layer of the GraphSAGE model.
The supplied graph should be a Neo4jStellarDiGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
Example:
G_generator = Neo4jDirectedGraphSAGENodeGenerator(G, 50, [10,5], [5,1])
train_data_gen = G_generator.flow(train_node_ids, train_node_labels)
test_data_gen = G_generator.flow(test_node_ids)
See also
Model using this generator: DirectedGraphSAGE.
Example using this generator: node classification.
Related functionality: DirectedGraphSAGENodeGenerator for using DirectedGraphSAGE without Neo4j.
- Parameters
graph (Neo4jStellarDiGraph) – Neo4jStellarDiGraph object
batch_size (int) – Size of batch to return.
in_samples (list) – The number of in-node samples per layer (hop) to take.
out_samples (list) – The number of out-node samples per layer (hop) to take.
name (string, optional) – Optional name for the generator
-
sample_features
(head_nodes, batch_num)[source]¶ Collect the features of the sampled nodes from Neo4j, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num – Ignored, because this is not reproducible.
- Returns
A list of feature tensors from the sampled nodes at each layer, each of shape (len(head_nodes), num_sampled_at_layer, feature_size), where num_sampled_at_layer is the total number (cumulative product) of nodes sampled at the given number of hops from each head node, given the sequence of in/out directions.
-
class
stellargraph.connector.neo4j.
Neo4jGraphSAGENodeGenerator
(**kwargs)[source]¶
Warning
Neo4jGraphSAGENodeGenerator is experimental: the class is not fully tested. It may be difficult to use and may have major changes at any time.
A data generator for node prediction with homogeneous GraphSAGE models.
At minimum, supply the Neo4jStellarGraph, the batch size, and the number of node samples for each layer of the GraphSAGE model.
The supplied graph should be a Neo4jStellarGraph object with node features.
Use the flow() method supplying the nodes and (optionally) targets to get an object that can be used as a Keras data generator.
Example:
G_generator = Neo4jGraphSAGENodeGenerator(G, 50, [10,10])
train_data_gen = G_generator.flow(train_node_ids, train_node_labels)
test_data_gen = G_generator.flow(test_node_ids)
See also
Model using this generator: GraphSAGE.
Example using this generator: node classification.
Related functionality: GraphSAGENodeGenerator for using GraphSAGE without Neo4j.
- Parameters
graph (Neo4jStellarGraph) – Neo4jStellarGraph object
batch_size (int) – Size of batch to return.
num_samples (list) – The number of samples per layer (hop) to take.
name (str, optional) – Optional name for the generator.
-
sample_features
(head_nodes, batch_num)[source]¶ Collect the features of the nodes sampled from Neo4j, and return these as a list of feature arrays for the GraphSAGE algorithm.
- Parameters
head_nodes – An iterable of head nodes to perform sampling on.
batch_num – Ignored, because this is not reproducible.
- Returns
A list of the same length as num_samples of collected features from the sampled nodes of shape (len(head_nodes), num_sampled_at_layer, feature_size), where num_sampled_at_layer is the cumulative product of num_samples for that layer.
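The cumulative-product rule can be checked with a few lines of standard-library Python. The sketch follows the wording above (one entry per element of num_samples) and does not model any separate entry for the head nodes themselves; sampled_feature_shapes and the sizes are hypothetical.

```python
from itertools import accumulate
from operator import mul

def sampled_feature_shapes(num_heads, num_samples, feature_size):
    """Expected shape of the feature array for each sampled layer."""
    per_layer = accumulate(num_samples, mul)  # running product of the sample sizes
    return [(num_heads, n, feature_size) for n in per_layer]

shapes = sampled_feature_shapes(num_heads=50, num_samples=[10, 5], feature_size=128)
```

With a batch of 50 head nodes and num_samples of [10, 5], the first hop holds 10 nodes per head and the second holds 10 * 5 = 50.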
-
class
stellargraph.connector.neo4j.
Neo4jSampledBreadthFirstWalk
(**kwargs)[source]¶
Warning
Neo4jSampledBreadthFirstWalk is experimental: the class is not fully tested. It may be difficult to use and may have major changes at any time.
Breadth First Walk that generates a sampled number of paths from a starting node. It can be used to extract a random sub-graph starting from a set of initial nodes from a Neo4j database.
-
run
(nodes=None, n=1, n_size=None)[source]¶ Send queries to Neo4j graph databases and collect sampled breadth-first walks starting from the root nodes.
- Parameters
nodes (list of hashable) – A list of root node ids such that from each node n BFWs will be generated up to the given depth d.
n_size (list of int) – The number of neighbouring nodes to expand at each depth of the walk. Sampling of neighbours with replacement is always used regardless of the node degree and number of neighbours requested.
n (int) – Number of walks per node id.
seed (int, optional) – Random number generator seed; default is None
- Returns
A list of lists, each list is a sequence of sampled node ids at a certain hop.
-
-
class
stellargraph.connector.neo4j.
Neo4jStellarDiGraph
(graph_db, node_label=None, id_property='ID', features_property='features')[source]¶
-
class
stellargraph.connector.neo4j.
Neo4jStellarGraph
(**kwargs)[source]¶
Warning
Neo4jStellarGraph is experimental: the class is not tested (see: #1578). It may be difficult to use and may have major changes at any time.
Neo4jStellarGraph class for graph machine learning on graphs stored in a Neo4j database.
This class communicates with Neo4j via a py2neo.Graph connected to the graph database of interest and contains functions to query the graph data necessary for machine learning.
See also
- Parameters
graph_db (py2neo.Graph) – a py2neo.Graph connected to a Neo4j graph database.
node_label (str, optional) – Common label for all nodes in the graph, if such label exists. Providing this is useful if there are any indexes created on this label (e.g. on node IDs), as it will improve performance of queries.
id_property (str, optional) – Name of Neo4j property to use as ID.
features_property (str, optional) – Name of Neo4j property to use as features.
is_directed (bool, optional) – If True, the data represents a directed multigraph, otherwise an undirected multigraph.
-
cache_all_nodes_in_memory
(dtype='float32')[source]¶ Load all node IDs and features into memory from Neo4j so that subsequent method calls that access node features can use the cached data instead of querying the database.
This method should be avoided for larger graphs.
- Parameters
dtype (str, optional) – Data type of features
-
check_graph_for_ml
(expensive_check=False)[source]¶ Checks if all properties required for machine learning training/inference are set up. An error will be raised if the graph is not correctly set up.
-
clusters
(method='louvain')[source]¶ Performs community detection to cluster the graph.
- Parameters
method (str, optional) – specifies the algorithm to use, can be one of: louvain, labelPropagation.
- Returns
A list of lists, where each inner list corresponds to a cluster and contains the node ids of the nodes in that cluster.
-
node_feature_sizes
()[source]¶ Get the feature sizes for the node types in the graph.
This method obtains the feature size by sampling a random node from the graph. Currently this class only supports a single default node type, and makes the following assumptions:
all nodes have features as a single list
all nodes’ features have the same size
there are no mutations that change the size(s)
- Returns
A dictionary of node type and integer feature size.
-
node_features
(nodes)[source]¶ Get the numeric feature vectors for the specified nodes or node type.
- Parameters
nodes (list or hashable, optional) – Node ID or list of node IDs.
- Returns
Numpy array containing the node features for the requested nodes.
-
nodes
()[source]¶ Obtains the collection of nodes in the graph.
- Returns
The node IDs of all the nodes in the graph.
-
to_adjacency_matrix
(node_ids, weighted=False)[source]¶ Obtains a SciPy sparse adjacency matrix for the subgraph containing the nodes specified in node_ids.
-
unique_node_type
(error_message=None)[source]¶ Return the unique node type, for a homogeneous-node graph.
- Parameters
error_message (str, optional) – a custom message to use for the exception; this can use the %(found)s placeholder to insert the real sequence of node types.
- Returns
If this graph has only one node type, this returns that node type, otherwise it raises a ValueError exception.
Loss functions¶
-
class
stellargraph.losses.
SelfAdversarialNegativeSampling
(temperature=1.0, name='self_adversarial_negative_sampling')[source]¶ Computes the self-adversarial binary cross entropy for negative sampling, from [1].
[1] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang, “RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space,” arXiv:1902.10197
- Parameters
temperature (float, optional) – a scaling factor for the weighting of negative samples
-
call
(labels, logit_scores)[source]¶ - Parameters
labels – tensor of integer labels for each row, either 1 for a true sample, or any value <= 0 for negative samples. Negative samples with identical labels are combined for the softmax normalisation.
logit_scores – tensor of scores for each row in logits
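As an illustration of the idea in [1], the NumPy sketch below computes the loss for one positive score and its group of negative scores: the negatives are weighted by a softmax of their temperature-scaled scores, so higher-scoring negatives contribute more. This follows the formulation in the paper and is only a sketch; how the library groups rows by label, reduces over the batch, or treats gradients through the weights is not covered, and self_adversarial_loss is a hypothetical name.

```python
import numpy as np

def log_sigmoid(x):
    # log(sigmoid(x)) computed stably as -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def self_adversarial_loss(positive_score, negative_scores, temperature=1.0):
    """Loss for one positive logit score and its group of negative logit scores."""
    negative_scores = np.asarray(negative_scores, dtype=float)
    # Softmax over the negatives: higher-scoring (harder) negatives get more weight.
    shifted = temperature * negative_scores
    weights = np.exp(shifted - shifted.max())
    weights /= weights.sum()
    positive_term = -log_sigmoid(positive_score)
    negative_term = -np.sum(weights * log_sigmoid(-negative_scores))
    return float(positive_term + negative_term)

uniform = self_adversarial_loss(2.0, [-2.0, 0.0, 3.0], temperature=0.0)
focused = self_adversarial_loss(2.0, [-2.0, 0.0, 3.0], temperature=5.0)
```

With a temperature of 0 every negative gets the same weight, which reduces to a plain average; larger temperatures concentrate the weight on the negatives the model currently scores highest.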
-
stellargraph.losses.
graph_log_likelihood
(*args, **kwargs)[source]¶
Warning
graph_log_likelihood is experimental: lack of unit tests (see: #804). It may be difficult to use and may have major changes at any time.
Computes the graph log likelihood loss function as in https://arxiv.org/abs/1710.09599.
This is different to most Keras loss functions in that it doesn’t directly compare predicted values to expected values. It uses wys_output which contains the dot products of embeddings and expected random walks, and part of the adjacency matrix batch_adj to calculate how well the node embeddings capture the graph structure in some sense.
- Parameters
batch_adj – tensor with shape batch_rows x 1 x num_nodes containing rows of the adjacency matrix
wys_output – tensor with shape batch_rows x 2 x num_nodes containing the embedding outer product scores with shape batch_rows x 1 x num_nodes and attentive expected random walk with shape batch_rows x 1 x num_nodes concatenated.
- Returns
the graph log likelihood loss for the batch
Utilities¶
This contains the utility objects used by the StellarGraph library.
-
stellargraph.utils.
plot_history
(history, individual_figsize=(7, 4), return_figure=False, **kwargs)[source]¶ Plot the training history of one or more models.
This creates a column of plots, with one plot for each metric recorded during training, with the plot showing the metric vs. epoch. If multiple models have been trained (that is, a list of histories is passed in), each metric plot includes multiple train and validation series.
Validation data is optional (it is detected by metrics with names starting with val_).
- Parameters
history – the training history, as returned by tf.keras.Model.fit()
individual_figsize (tuple of numbers) – the size of the plot for each metric
return_figure (bool) – if True, then the figure object with the plots is returned, None otherwise.
kwargs – additional arguments to pass to matplotlib.pyplot.subplots()
- Returns
The figure object with the plots if return_figure=True, None otherwise
- Return type
matplotlib.figure.Figure
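The val_ naming convention mentioned above can be illustrated without any plotting. The sketch below pairs each training metric with its validation counterpart, if present, starting from a plain dictionary shaped like the history attribute of a Keras History object. The helper group_metrics is hypothetical and is not how plot_history is necessarily implemented.

```python
def group_metrics(history_dict):
    """Map each metric name to its training series and optional validation series."""
    grouped = {}
    for name, values in history_dict.items():
        if name.startswith("val_"):
            continue  # validation series are attached to their training metric below
        grouped[name] = {
            "train": values,
            "validation": history_dict.get("val_" + name),
        }
    return grouped

# Hypothetical history: loss has validation data, acc does not.
history_dict = {"loss": [0.9, 0.5], "val_loss": [1.0, 0.7], "acc": [0.6, 0.8]}
grouped = group_metrics(history_dict)
```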
-
stellargraph.utils.hyperbolic.
poincare_ball_distance
(c, x, y)[source]¶ Distance between x and y, on the Poincaré ball with curvature -c: \(d_c(\mathbf{x}, \mathbf{y})\).
See Section 2 of [1] for more details.
[1] O.-E. Ganea, G. Bécigneul, and T. Hofmann, “Hyperbolic Neural Networks,” arXiv:1805.09112, Jun. 2018.
- Parameters
c (tensorflow Tensor-like) – the curvature of the hyperbolic space(s). Must be able to be broadcast to x and y.
x (tensorflow Tensor-like) – a tensor containing vectors in hyperbolic space, where each vector is an element of the last axis (for example, if x has shape (2, 3, 4), it represents 2 * 3 = 6 hyperbolic vectors, each of length 4). Must be able to be broadcast to y.
y (tensorflow Tensor-like) – a tensor containing vectors in hyperbolic space, where each vector is an element of the last axis similar to x. Must be able to be broadcast to x.
- Returns
A TensorFlow Tensor containing the hyperbolic distance between each of the vectors (last axis) in x and y, using the corresponding curvature from c. This tensor has the same shape as the Euclidean equivalent tf.norm(x - y).
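For intuition, the distance can be reproduced in NumPy using the standard closed form for the Poincaré ball, \(d_c(\mathbf{x}, \mathbf{y}) = \frac{1}{\sqrt{c}} \operatorname{arcosh}\left(1 + \frac{2c\|\mathbf{x}-\mathbf{y}\|^2}{(1-c\|\mathbf{x}\|^2)(1-c\|\mathbf{y}\|^2)}\right)\). This is mathematically equivalent to the definition in [1], but it is only a sketch: the library works on TensorFlow tensors and may compute the value differently. The function name poincare_distance is hypothetical.

```python
import numpy as np

def poincare_distance(c, x, y):
    """Hyperbolic distance between vectors on the last axis, for curvature -c."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sq_dist = np.sum((x - y) ** 2, axis=-1)
    denom = (1 - c * np.sum(x**2, axis=-1)) * (1 - c * np.sum(y**2, axis=-1))
    return np.arccosh(1 + 2 * c * sq_dist / denom) / np.sqrt(c)

# Distance from the origin to a point halfway to the boundary of the unit ball.
d = poincare_distance(1.0, [0.0, 0.0], [0.5, 0.0])
```

Points near the boundary of the ball are much further apart than their Euclidean distance suggests, which is what makes the space suitable for embedding tree-like structure.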
-
stellargraph.utils.hyperbolic.
poincare_ball_exp
(c, x, v)[source]¶ The exponential map of v at x on the Poincaré ball with curvature -c: \(\exp_{\mathbf{x}}^c(\mathbf{v})\).
See Section 2 of [1] for more details.
[1] O.-E. Ganea, G. Bécigneul, and T. Hofmann, “Hyperbolic Neural Networks,” arXiv:1805.09112, Jun. 2018.
- Parameters
c (tensorflow Tensor-like) – the curvature of the hyperbolic space(s). Must be able to be broadcast to x and v.
x (tensorflow Tensor-like, optional) – a tensor containing vectors in hyperbolic space representing the base points for the exponential map, where each vector is an element of the last axis (for example, if x has shape (2, 3, 4), it represents 2 * 3 = 6 hyperbolic vectors, each of length 4). Must be able to be broadcast to v. An explicit x = None is equivalent to x being all zeros, but uses a more efficient form of \(\exp_{\mathbf{0}}^c(\mathbf{v})\).
v (tensorflow Tensor-like) – a tensor containing vectors in Euclidean space representing the tangent vectors for the exponential map, where each vector is an element of the last axis similar to x. Must be able to be broadcast to x.
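At the origin the exponential map has the simple form \(\exp_{\mathbf{0}}^c(\mathbf{v}) = \tanh(\sqrt{c}\|\mathbf{v}\|)\,\mathbf{v} / (\sqrt{c}\|\mathbf{v}\|)\) given in [1]. The NumPy sketch below implements only this special case, as an illustration; the general case with a non-zero base point is not covered, and exp_map_at_origin is a hypothetical name.

```python
import numpy as np

def exp_map_at_origin(c, v):
    """Map Euclidean tangent vectors at the origin onto the Poincaré ball."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    sqrt_c = np.sqrt(c)
    # Avoid dividing by zero for the zero tangent vector, which maps to the origin.
    safe_norm = np.where(norm == 0, 1.0, norm)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * safe_norm)

point = exp_map_at_origin(1.0, [3.0, 4.0])
```

However long the tangent vector is, the result stays strictly inside the ball of radius 1/sqrt(c), because tanh is bounded by 1.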
-
stellargraph.utils.hyperbolic.
poincare_ball_mobius_add
(c, x, y)[source]¶ Möbius addition of x and y, on the Poincaré ball with curvature -c: \(\mathbf{x} \oplus^c \mathbf{y}\).
See Section 2 of [1] for more details.
[1] O.-E. Ganea, G. Bécigneul, and T. Hofmann, “Hyperbolic Neural Networks,” arXiv:1805.09112, Jun. 2018.
- Parameters
c (tensorflow Tensor-like) – the curvature of the hyperbolic space(s). Must be able to be broadcast to x and y.
x (tensorflow Tensor-like) – a tensor containing vectors in hyperbolic space, where each vector is an element of the last axis (for example, if x has shape (2, 3, 4), it represents 2 * 3 = 6 hyperbolic vectors, each of length 4). Must be able to be broadcast to y.
y (tensorflow Tensor-like) – a tensor containing vectors in hyperbolic space, where each vector is an element of the last axis similar to x. Must be able to be broadcast to x.
- Returns
A TensorFlow Tensor containing the Möbius addition of each of the vectors (last axis) in x and y, using the corresponding curvature from c. This tensor has the same shape as the Euclidean equivalent x + y.
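The Möbius addition formula from [1] can be written out in NumPy as a reference for small experiments. This is a sketch under the assumption that inputs are plain arrays with vectors on the last axis; it is not the library's TensorFlow implementation, and mobius_add is a hypothetical name.

```python
import numpy as np

def mobius_add(c, x, y):
    """Möbius addition of vectors on the last axis, for curvature -c."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xy = np.sum(x * y, axis=-1, keepdims=True)  # inner product <x, y>
    x2 = np.sum(x * x, axis=-1, keepdims=True)  # squared norm of x
    y2 = np.sum(y * y, axis=-1, keepdims=True)  # squared norm of y
    numerator = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    denominator = 1 + 2 * c * xy + c**2 * x2 * y2
    return numerator / denominator

total = mobius_add(1.0, [0.5], [0.5])
```

Unlike ordinary addition, the result never leaves the ball: adding 0.5 and 0.5 in one dimension with c = 1 gives 0.8, and adding a vector to its negation gives the origin.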
Datasets¶
The datasets package contains classes to download sample datasets
-
class
stellargraph.datasets.
AIFB
[source]¶ The AIFB dataset describes the AIFB research institute in terms of its staff, research group, and publications. First used for machine learning with RDF in Bloehdorn, Stephan and Sure, York, “Kernel Methods for Mining Instance Data in Ontologies”, The Semantic Web (2008), http://dx.doi.org/10.1007/978-3-540-76298-0_5. It contains ~8k entities, ~29k edges, and 45 different relationships or edge types. In (Bloehdorn et al 2007) the dataset was first used to predict the affiliation (i.e., research group) for people in the dataset. The dataset contains 178 people, each a member of one of 5 different research groups. The goal is to predict which research group a researcher belongs to.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://figshare.com/articles/AIFB_DataSet/745364
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Loads the dataset into a directed heterogeneous graph.
The node features are the node’s position after being one-hot encoded; for example, the first node has features [1, 0, 0, ...], the second has [0, 1, 0, ...].
This requires the rdflib library to be installed.
- Returns
A tuple where the first element is a graph containing all edges except for those with type affiliation and employs (the inverse of affiliation), and the second element is a DataFrame containing the one-hot encoded affiliation of the 178 nodes that have an affiliation.
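The positional one-hot features described for this dataset amount to an identity matrix with one row per node. A tiny NumPy sketch, using made-up node ids:

```python
import numpy as np

node_ids = ["a", "b", "c"]  # hypothetical node ids, in graph order

# Row i is the feature vector of the i-th node: a 1 at position i, 0 elsewhere.
features = np.eye(len(node_ids), dtype=int)
```

Such features carry no information beyond node identity, so the feature size grows with the number of nodes in the graph.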
-
class
stellargraph.datasets.
BlogCatalog3
[source]¶ This dataset was crawled from the social blog directory website BlogCatalog (http://www.blogcatalog.com) and contains the crawled friendship network and group memberships.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://figshare.com/articles/BlogCatalog_dataset/11923611
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this dataset into an undirected heterogeneous graph, downloading it if required.
The graph has two types of nodes, ‘user’ and ‘group’, and two types of edges, ‘friend’ and ‘belongs’. The ‘friend’ edges connect two ‘user’ nodes and the ‘belongs’ edges connect ‘user’ and ‘group’ nodes.
The node and edge types are not included in the dataset, which is a collection of node and group ids along with the list of edges in the graph.
Important note about the node IDs: The dataset uses integers for node ids. However, the integers from 1 to 39 are used as IDs for both users and groups. This would cause confusion when constructing the graph object. As a result, we convert all IDs to strings and append the character ‘u’ to the integer ID for user nodes and the character ‘g’ to the integer ID for group nodes.
- Returns
A
StellarGraph
object.
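The ID clash and its fix can be shown in a few lines of plain Python. Whether the library places the letter before or after the number is not shown here; the sketch puts it after, following the wording above, and typed_id is a hypothetical helper.

```python
def typed_id(raw_id, node_type):
    """Turn an integer ID into a string that also encodes the node type."""
    marker = {"user": "u", "group": "g"}[node_type]
    return f"{raw_id}{marker}"  # placement of the marker is an assumption

# Integer IDs 1 and 39 are used by both users and groups in the raw data.
user_ids = [typed_id(i, "user") for i in (1, 2, 39)]
group_ids = [typed_id(i, "group") for i in (1, 39)]
```

After the conversion, no user ID can collide with a group ID, so both kinds of node can share one index.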
-
class
stellargraph.datasets.
CiteSeer
[source]¶ The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links, although 17 of these have a source or target publication that isn’t in the dataset and only 4715 are included in the graph. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://linqs.soe.ucsc.edu/data
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
(largest_connected_component_only=False)[source]¶ Load this dataset into an undirected homogeneous graph, downloading it if required.
The node feature vectors are included.
- Parameters
largest_connected_component_only (bool) – if True, returns only the largest connected component, not the whole graph.
- Returns
A tuple where the first element is the
StellarGraph
object with the nodes, node feature vectors and edges, and the second element is a pandas Series of the node subject class labels.
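The difference between the 4732 links in the raw data and the 4715 edges in the graph comes from dropping links with an endpoint outside the dataset. A minimal sketch of that kind of filtering, on made-up data and with a hypothetical helper name:

```python
def drop_dangling_edges(edges, node_ids):
    """Keep only the edges whose source and target are both known nodes."""
    known = set(node_ids)
    return [(source, target) for source, target in edges if source in known and target in known]

# Hypothetical data: "p9" is cited but is not one of the publications.
node_ids = ["p1", "p2", "p3"]
edges = [("p1", "p2"), ("p2", "p9"), ("p3", "p1")]
kept = drop_dangling_edges(edges, node_ids)
```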
-
class
stellargraph.datasets.
Cora
[source]¶ The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://linqs.soe.ucsc.edu/data
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
(directed=False, largest_connected_component_only=False, subject_as_feature=False, edge_weights=None, str_node_ids=False)[source]¶ Load this dataset into a homogeneous graph that is directed or undirected, downloading it if required.
The node feature vectors are included, and the edges are treated as directed or undirected depending on the directed parameter.
- Parameters
directed (bool) – if True, return a directed graph, otherwise return an undirected one.
largest_connected_component_only (bool) – if True, returns only the largest connected component, not the whole graph.
edge_weights (callable, optional) – a function that accepts three parameters: an unweighted StellarGraph containing node features, a Pandas Series of the node labels, a Pandas DataFrame of the edges (with source and target columns). It should return a sequence of numbers (e.g. a 1D NumPy array) of edge weights for each edge in the DataFrame.
str_node_ids (bool) – if True, load the node IDs as strings, rather than integers.
subject_as_feature (bool) – if True, the subject for each paper (node) is included in the node features, one-hot encoded (the subjects are still also returned as a Series).
- Returns
A tuple where the first element is the StellarGraph object (or StellarDiGraph, if directed == True) with the nodes, node feature vectors and edges, and the second element is a pandas Series of the node subject class labels.
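As an example of the edge_weights parameter, the sketch below defines a callable with the three-argument shape described above and exercises it on a tiny made-up Series and DataFrame. The weighting rule (1.0 for edges between papers with the same subject, 0.5 otherwise) and the name same_subject_weights are purely illustrative, and the graph argument is unused here.

```python
import numpy as np
import pandas as pd

def same_subject_weights(graph, subjects, edges):
    """Weight 1.0 for edges joining papers with the same subject, 0.5 otherwise."""
    source_subject = subjects.loc[edges["source"]].to_numpy()
    target_subject = subjects.loc[edges["target"]].to_numpy()
    return np.where(source_subject == target_subject, 1.0, 0.5)

# Hypothetical stand-ins for the node labels and edge list passed by the loader.
subjects = pd.Series(["Theory", "Theory", "Neural_Networks"], index=[1, 2, 3])
edges = pd.DataFrame({"source": [1, 2], "target": [2, 3]})
weights = same_subject_weights(None, subjects, edges)
```

Such a function would be passed as the edge_weights argument of load(); the returned sequence needs one weight per row of the edges DataFrame.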
-
class
stellargraph.datasets.
FB15k
[source]¶ This FREEBASE FB15k DATA consists of a collection of triplets (synset, relation_type, triplet) extracted from Freebase (http://www.freebase.com). There are 14,951 nodes and 1,345 relation types among them. The training set contains 483142 triplets, the validation set 50000 and the test set 59071. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston and Oksana Yakhnenko “Translating Embeddings for Modeling Multi-relational Data” (2013).
Note: this dataset contains many inverse relations, and so should only be used to compare against published results. Prefer FB15k_237. See: Kristina Toutanova and Danqi Chen “Observed versus latent features for knowledge base and text inference” (2015), and Dettmers, Tim, Pasquale Minervini, Pontus Stenetorp and Sebastian Riedel “Convolutional 2D Knowledge Graph Embeddings” (2017).
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://everest.hds.utc.fr/doku.php?id=en:transe
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this data into a directed heterogeneous graph.
- Returns
A tuple (graph, train, test, validation) where graph is a StellarDiGraph containing all the data, and the remaining three elements are DataFrames of triplets, with columns source and target (synsets) and label (the relation type). The three DataFrames together make up the edges included in graph.
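The note above about inverse relations can be made concrete with a small check. The sketch below measures how many test triplets connect a node pair that already appears, in the opposite direction, in the training triplets. The helper name and the toy relation names are hypothetical; triplets are plain (source, label, target) tuples, which could be obtained from the returned DataFrames with something like `train[["source", "label", "target"]].itertuples(index=False, name=None)` (an assumption about how one might convert them, not part of this API).

```python
def reversed_pair_fraction(train_triples, test_triples):
    """Fraction of test triplets (source, label, target) whose node pair
    appears in the training triplets in the opposite direction."""
    train_pairs = {(source, target) for source, _, target in train_triples}
    test_triples = list(test_triples)
    if not test_triples:
        return 0.0
    hits = sum(
        1 for source, _, target in test_triples if (target, source) in train_pairs
    )
    return hits / len(test_triples)


# Toy data with hypothetical relation names.
toy_train = [("a", "parent_of", "b"), ("c", "parent_of", "d")]
toy_test = [("b", "child_of", "a"), ("d", "sibling_of", "e")]
print(reversed_pair_fraction(toy_train, toy_test))
```

A high fraction suggests that many test triplets can be answered by reversing a training triplet, which is the concern raised in the cited papers.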
-
class
stellargraph.datasets.
FB15k_237
[source]¶ The Freebase FB15k_237 dataset consists of a collection of triplets (synset, relation_type, triplet) extracted from Freebase (http://www.freebase.com). There are 14541 nodes and 237 relation types among them. The training set contains 272115 triplets, the validation set 17535 and the test set 20466. It is a reduced version of FB15k where inverse relations have been removed. Kristina Toutanova and Danqi Chen “Observed versus latent features for knowledge base and text inference” (2015).
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://github.com/TimDettmers/ConvE
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this data into a directed heterogeneous graph.
- Returns
A tuple (graph, train, test, validation) where graph is a StellarDiGraph containing all the data, and the remaining three elements are DataFrames of triplets, with columns source and target (synsets) and label (the relation type). The three DataFrames together make up the edges included in graph.
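One way to sanity-check a loaded knowledge graph dataset against the counts quoted in its description is to tally distinct nodes and relation types across the three splits. This is a minimal sketch with a hypothetical helper name, operating on plain (source, label, target) tuples rather than on the DataFrames themselves.

```python
def summarise_triples(*splits):
    """Count distinct nodes, distinct relation types and total triplets
    across any number of splits of (source, label, target) tuples."""
    nodes, relations, total = set(), set(), 0
    for split in splits:
        for source, label, target in split:
            nodes.update((source, target))
            relations.add(label)
            total += 1
    return {"nodes": len(nodes), "relation_types": len(relations), "triplets": total}


# Toy splits standing in for the train, test and validation triplets.
toy_train = [("a", "r1", "b"), ("b", "r2", "c")]
toy_test = [("c", "r1", "a")]
toy_validation = [("a", "r2", "d")]
print(summarise_triples(toy_train, toy_test, toy_validation))
```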
-
class
stellargraph.datasets.
IAEnronEmployees
[source]¶ A dataset of edges that represent emails sent from one employee to another. There are 50572 edges, and each of them contains timestamp information. Edges refer to 151 unique node IDs in total. Ryan A. Rossi and Nesreen K. Ahmed “The Network Data Repository with Interactive Graph Analytics and Visualization” (2015).
See also
For more information about loading data for graph machine learning:
The source of this dataset: http://networkrepository.com/ia-enron-employees.php
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this data into a set of nodes and edges
- Returns
A tuple (graph, edges). graph is a StellarGraph containing all the data. Timestamp information on edges is encoded as edge weights. edges are the original edges from the dataset, sorted in ascending order of time; these can be used to create train/test splits based on time values.
Node IDs in the returned data structures are all converted to strings for compatibility with gensim’s Word2Vec model.
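The time-based train/test split mentioned above might look like the following sketch. It works on plain (source, target, time) tuples; the helper name and the 0.75 default are hypothetical, and the sort is repeated defensively even though the returned edges are described as already ordered by time.

```python
def split_by_time(edges, train_fraction=0.75):
    """Split (source, target, time) edges so that every training edge is
    no later than every test edge."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(edges, key=lambda edge: edge[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy edges with string node IDs, matching the string IDs described above.
toy_edges = [("1", "2", 10), ("3", "1", 40), ("2", "3", 20), ("1", "3", 30)]
train_edges, test_edges = split_by_time(toy_edges)
print(len(train_edges), len(test_edges))
```

Splitting on time rather than at random avoids training on emails sent after the ones being predicted.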
-
class
stellargraph.datasets.
MUTAG
[source]¶ Each graph represents a chemical compound and graph labels represent ‘their mutagenic effect on a specific gram negative bacterium.’ The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.
Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple, indicated by integers 0, 1, 2, 3 respectively. The edge labels are included in the StellarGraph objects as edge weights in integer representation.
- Returns
A tuple of a list of StellarGraph objects and a pandas Series of labels, one for each graph.
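Because the bond types are stored as integer edge weights, it can be handy to map them back to names when inspecting a graph. The mapping below follows the note above; the helper name is hypothetical, and how the weights are read out of a StellarGraph is left to the caller.

```python
# Integer edge weight -> bond type, as listed in the note above.
BOND_TYPES = {0: "aromatic", 1: "single", 2: "double", 3: "triple"}


def decode_bond_weights(weights):
    """Translate integer-valued edge weights into bond type names."""
    return [BOND_TYPES[int(weight)] for weight in weights]


print(decode_bond_weights([0, 1, 1, 2]))
```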
-
class
stellargraph.datasets.
MovieLens
[source]¶ The MovieLens 100K dataset contains 100,000 ratings from 943 users on 1682 movies.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://grouplens.org/datasets/movielens/100k/
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this dataset into an undirected heterogeneous graph, downloading it if required.
The graph has two types of nodes (user and movie) and one type of edge (rating).
The dataset includes some node features on both users and movies: on users, they consist of categorical features (gender and job) which are one-hot encoded into binary features, and an age feature that is scaled to have mean = 0 and standard deviation = 1.
- Returns
A tuple where the first element is a StellarGraph instance containing the graph data and features, and the second element is a pandas DataFrame of edges, with columns user_id, movie_id and rating (a label from 1 to 5).
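The user feature preprocessing described above (one-hot encoding and scaling to zero mean and unit standard deviation) can be sketched in plain Python as follows. This is an illustration of the two transformations, not the loader's actual code; the helper names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardise(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant feature")
    return [(value - mu) / sigma for value in values]


def one_hot(values):
    """Return the sorted categories and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


toy_ages = [20, 30, 40]
toy_genders = ["M", "F", "M"]
print(standardise(toy_ages))
print(one_hot(toy_genders))
```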
-
class
stellargraph.datasets.
PROTEINS
[source]¶ Each graph represents a protein and graph labels represent whether they are enzymes or non-enzymes. The dataset includes 1113 graphs with 39 nodes and 73 edges on average for each graph. Graph nodes have 4 attributes (including a one-hot encoding of their label), and each graph is labelled as belonging to 1 of 2 classes.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.
- Returns
A tuple of a list of StellarGraph objects and a pandas Series of labels, one for each graph.
-
class
stellargraph.datasets.
PubMedDiabetes
[source]¶ The PubMed Diabetes dataset consists of 19717 scientific publications from the PubMed database pertaining to diabetes, each classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words.
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://linqs.soe.ucsc.edu/data
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this dataset into an undirected homogeneous graph, downloading it if required.
- Returns
A tuple where the first element is a
StellarGraph
instance containing the graph data and features, and the second element is a pandas Series of node class labels.
-
class
stellargraph.datasets.
WN18
[source]¶ The WN18 dataset consists of triplets from WordNet 3.0 (http://wordnet.princeton.edu). There are 40,943 synsets and 18 relation types among them. The training set contains 141442 triplets, the validation set 5000 and the test set 5000. Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio “A Semantic Matching Energy Function for Learning with Multi-relational Data” (2014).
Note: this dataset contains many inverse relations, and so should only be used to compare against published results. Prefer WN18RR. See: Kristina Toutanova and Danqi Chen “Observed versus latent features for knowledge base and text inference” (2015), and Dettmers, Tim, Pasquale Minervini, Pontus Stenetorp and Sebastian Riedel “Convolutional 2D Knowledge Graph Embeddings” (2017).
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://everest.hds.utc.fr/doku.php?id=en:transe
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this data into a directed heterogeneous graph.
- Returns
A tuple (graph, train, test, validation) where graph is a StellarDiGraph containing all the data, and the remaining three elements are DataFrames of triplets, with columns source and target (synsets) and label (the relation type). The three DataFrames together make up the edges included in graph.
-
class
stellargraph.datasets.
WN18RR
[source]¶ The WN18RR dataset consists of triplets from WordNet 3.0 (http://wordnet.princeton.edu). There are 40,943 synsets and 11 relation types among them. The training set contains 86835 triplets, the validation set 3034 and the test set 3134. It is a reduced version of WN18 where inverse relations have been removed. Tim Dettmers, Pasquale Minervini, Pontus Stenetorp and Sebastian Riedel “Convolutional 2D Knowledge Graph Embeddings” (2017).
See also
For more information about loading data for graph machine learning:
The source of this dataset: https://github.com/TimDettmers/ConvE
-
property
data_directory
¶ The full path of the directory containing the data content files for this dataset.
- Type
-
download
(ignore_cache: Optional[bool] = False) → None¶ Download the dataset (if not already downloaded)
- Parameters
ignore_cache (bool, optional) – Ignore a cached dataset and force a re-download.
- Raises
FileNotFoundError – If the dataset is not successfully downloaded.
-
load
()[source]¶ Load this data into a directed heterogeneous graph.
- Returns
A tuple (graph, train, test, validation) where graph is a StellarDiGraph containing all the data, and the remaining three elements are DataFrames of triplets, with columns source and target (synsets) and label (the relation type). The three DataFrames together make up the edges included in graph.
Random¶
stellargraph.random
contains functions to control the randomness behaviour in StellarGraph.
-
stellargraph.random.
set_seed
(seed)[source]¶ Create a new global RandomState using the provided seed. If seed is None, StellarGraph’s global RandomState object simply wraps the global random state for each external module.
When trying to create a reproducible workflow using this function, please note that this seed only controls the randomness of the non-TensorFlow part of the library. Randomness within TensorFlow layers is controlled via TensorFlow’s own global random seed, which can be set using tensorflow.random.set_seed.
- Parameters
seed (int, optional) – random seed
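Since set_seed does not cover TensorFlow, a reproducible workflow typically seeds both libraries together. The helper below is a minimal sketch of that idea: its name is hypothetical, it also seeds Python's built-in random module (an addition for this example, not something set_seed is documented to do), and it skips any library that is not installed.

```python
import random


def seed_everything(seed):
    """Seed Python's random module plus, when installed, StellarGraph and
    TensorFlow. Returns the names of the components that were seeded."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import stellargraph.random as sg_random

        sg_random.set_seed(seed)  # non-TensorFlow part of StellarGraph
        seeded.append("stellargraph")
    except ImportError:
        pass
    try:
        import tensorflow as tf

        tf.random.set_seed(seed)  # TensorFlow's own global seed
        seeded.append("tensorflow")
    except ImportError:
        pass
    return seeded


print(seed_everything(42))
```

Calling this once at the start of a script or notebook, before building samplers or models, keeps the two seeds in step.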