Introduction

StellarGraph is a Python library for machine learning on graph-structured (or equivalently, network-structured) data.

Graph-structured data represent entities, e.g., people, as nodes (or equivalently, vertices), and relationships between entities, e.g., friendship, as links (or equivalently, edges). Nodes and links may have associated attributes such as age, income, time when a friendship was established, etc. StellarGraph supports analysis of both homogeneous networks (with nodes and links of one type) and heterogeneous networks (with more than one type of nodes and/or links).

The StellarGraph library implements several state-of-the-art algorithms for applying machine learning methods to discover patterns and answer questions using graph-structured data.

The StellarGraph library can be used to solve tasks using graph-structured data, such as: - Representation learning for nodes and edges, to be used for visualisation and various downstream machine learning tasks; - Classification and attribute inference of nodes or edges; - Link prediction.

We provide Examples of using StellarGraph to solve such tasks using several real-world datasets.

Getting Started

To get started with StellarGraph you’ll need data structured as homogeneous or heterogeneous graph, including attributes for the entities represented as graph nodes. NetworkX is used to represent the graph and Pandas, Scikit-Learn, and/or Numpy can be used to are used to store node attributes.

Detailed and narrated examples of various machine learning workflows on network data, supported by StellarGraph, from data ingestion into graph structure to inference, are given in the demos directory of this repository.

Requirements

Main requirements (all requirements are in requirements.txt)

Installation

StellarGraph is a Python 3 library and requires Python version 3.6 to function (note that the library uses Keras with the Tensorflow backend, and thus does not currently work in python 3.7). The required Python version can be downloaded and installed from python.org. Alternatively, use the Anaconda Python environment, available from anaconda.com.

The StellarGraph library can be installed in one of two ways, described next.

Install StellarGraph using pip:

To install StellarGraph library from PyPi using pip, execute the following command:

pip install stellargraph

Some of the examples require installing additional dependencies as well as stellargraph. To install these dependencies using pip, execute the following command:

pip install stellargraph[demos]

Install StellarGraph from Github source:

Install a git client, for example install the XCode git client by typing:

git --install

First, clone the StellarGraph repository using git:

git clone https://github.com/stellargraph/stellargraph.git

Then, cd to the StellarGraph folder, and install the libraray by executing the following commands:

cd stellargraph
pip install -r requirements.txt
pip install .

Other requirements are the NetworkX library (to create and modify graphs and networks), numpy (to manipulate numeric arrays), pandas (to manipulate tabular data), and gensim (to use the Word2Vec model), scikit-learn (to prepare datasets for machine learning), and matplotlib (for plotting).

Examples

Getting the datasets

The StellarGraph examples require datasets to work. They are not supplied with stellargraph, and need to be downloaded separately.

CORA dataset

This dataset can be downloaded from https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz

Download and unzip the cora.tgz file to a location on your computer and pass this location as a command line argument to the example scripts, as detailed below.

Running the GraphSAGE Cora Node Classification Example

  • Install the stellargraph Python library, explained in the previous section.

  • If you haven’t already done so, clone the StellarGraph repository using git:

    git clone https://github.com/stellargraph/stellargraph.git
    
  • Download and decompress the CORA dataset (see Getting the datasets above).

  • Change to the Cora node classification directory under demos:

    cd /path/to/stellargraph/demos/node-classification-graphsage
    
  • Run the example script and specify the location of the downloaded CORA dataset with the following command:

    python graphsage-cora-example.py -l <path_to_cora_dataset>
    
  • Additional arguments can be specified that change the GraphSAGE model and training parameters, a description of these arguments is displayed using the help option to the script:

    python cora-example.py --help
    

Running Other Examples

There are several other examples in the demos directory. Read the README.md in the demos directory to find out more.