Interpretability on Hateful Twitter Datasets

In this demo, we apply saliency maps (with support for sparse tensors) to the task of detecting Twitter users who use hateful lexicon, using graph machine learning with StellarGraph.

We consider the use-case of identifying hateful users on Twitter, motivated by the work in [1] and using the dataset also published in [1]. Classification is based on a graph of users’ retweets and on attributes related to their account activity and the content of their tweets.

We pose identifying hateful users as a binary classification problem. We demonstrate the advantage of connected vs unconnected data in a semi-supervised setting with few training examples.

For connected data, we use Graph Convolutional Networks [2] as implemented in the stellargraph library. We pose the problem of identifying hateful Twitter users as node attribute inference in graphs.

We then use the interpretability tool (i.e., saliency maps) implemented in our library to demonstrate how to obtain the importance of the node features and links to gain insights into the model.

References

  1. “Like Sheep Among Wolves”: Characterizing Hateful Users on Twitter. M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. F. Almeida, and W. Meira Jr. arXiv preprint arXiv:1801.00317 (2017).
  2. Semi-Supervised Classification with Graph Convolutional Networks. T. Kipf, M. Welling. ICLR 2017. arXiv:1609.02907


[1]:
# install StellarGraph if running on Google Colab
import sys
if 'google.colab' in sys.modules:
  %pip install -q stellargraph[demos]==1.0.0rc1
[2]:
# verify that we're using the correct version of StellarGraph for this notebook
import stellargraph as sg

try:
    sg.utils.validate_notebook_version("1.0.0rc1")
except AttributeError:
    raise ValueError(
        f"This notebook requires StellarGraph version 1.0.0rc1, but a different version {sg.__version__} is installed.  Please see <https://github.com/stellargraph/stellargraph/issues/1172>."
    ) from None
[3]:
import networkx as nx
import pandas as pd
import numpy as np
import seaborn as sns
import itertools
import os

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.linear_model import LogisticRegressionCV

import stellargraph as sg
from stellargraph.mapper import GraphSAGENodeGenerator, FullBatchNodeGenerator
from stellargraph.layer import GraphSAGE, GCN, GAT
from stellargraph import globalvar

from tensorflow.keras import layers, optimizers, losses, metrics, Model, models
from sklearn import preprocessing, feature_extraction
from sklearn.model_selection import train_test_split
from sklearn import metrics

import matplotlib.pyplot as plt
from scipy.sparse import csr_matrix, lil_matrix
%matplotlib inline

Train GCN model on the dataset

[4]:
data_dir = os.path.expanduser("~/data/hateful-twitter-users")

First load and prepare the node features

Each node in the graph is associated with a large number of features (also referred to as attributes).

The list of features is given with the dataset; we repeat it here for convenience.

hate :("hateful"|"normal"|"other") whether the user was annotated as hateful, normal, or not annotated.

(is_50|is_50_2) :bool whether user was deleted up to 12/12/17 or 14/01/18.

(is_63|is_63_2) :bool whether user was suspended up to 12/12/17 or 14/01/18.

(hate|normal)_neigh :bool is the user in the neighborhood of a (hateful|normal) user?

[c_] (statuses|followers|followees|favorites)_count :int number of (tweets|followers|followees|favorites) a user has.

[c_] listed_count :int number of lists a user is in.

[c_] (betweenness|eigenvector|in_degree|out_degree) :float centrality measurements for each user in the retweet graph.

[c_] *_empath :float occurrences of empath categories in the user's latest 200 tweets.

[c_] *_glove :float glove vector calculated for the user's latest 200 tweets.

[c_] (sentiment|subjectivity) :float average sentiment and subjectivity of a user's tweets.

[c_] (time_diff|time_diff_median) :float average and median time difference between tweets.

[c_] (tweet|retweet|quote) number :float percentage of direct tweets, retweets, and quotes of a user.

[c_] (number urls|number hashtags|baddies|mentions) :float average number of URLs, hashtags, bad words, and mentions per tweet.

[c_] status length :float average status length.

hashtags :string all hashtags employed by the user separated by spaces.

Notice that attributes prefixed with c_ are calculated for the 1-neighborhood of a user in the retweet network (averaged out).

First, we are going to load the user features and prepare them for machine learning.

[5]:
users_feat = pd.read_csv(os.path.join(data_dir, "users_neighborhood_anon.csv"))
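
As a quick check of the schema, we can count the neighborhood-averaged columns by their c_ prefix (a minimal sketch, just for inspection):

# count the neighborhood-averaged (c_-prefixed) columns described above
c_cols = users_feat.columns[users_feat.columns.str.startswith("c_")]
print("{} neighborhood-averaged (c_) columns, e.g. {}".format(len(c_cols), list(c_cols[:5])))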

Data cleaning and preprocessing

The dataset as given includes a large number of graph-related features that were manually extracted.

Since we are going to employ modern graph neural network methods for classification, we are going to drop these manually engineered features.

The power of graph neural networks stems from their ability to learn useful graph-related features, eliminating the need for manual feature engineering.

[6]:
def data_cleaning(feat):
    feat = feat.drop(columns=["hate_neigh", "normal_neigh"])

    # Convert target values in hate column from strings to integers (0,1,2)
    feat["hate"] = np.where(
        feat["hate"] == "hateful", 1, np.where(feat["hate"] == "normal", 0, 2)
    )

    # missing information
    number_of_missing = feat.isnull().sum()
    number_of_missing[number_of_missing != 0]

    # Replace NA with 0
    feat.fillna(0, inplace=True)

    # dropping info about suspension and deletion as it should not be used in the predictive model
    feat.drop(
        feat.columns[feat.columns.str.contains("is_|_glove|c_|sentiment")],
        axis=1,
        inplace=True,
    )

    # drop hashtag feature
    feat.drop(["hashtags"], axis=1, inplace=True)

    # Drop centrality based measures
    feat.drop(
        columns=["betweenness", "eigenvector", "in_degree", "out_degree"], inplace=True
    )

    feat.drop(columns=["created_at"], inplace=True)

    return feat
[7]:
node_data = data_cleaning(users_feat)

The continuous features in our dataset have distributions with very long tails. We apply normalization to correct for this.

[8]:
# Ignore the first two columns because those are user_id and hate (the target variable)
df_values = node_data.iloc[:, 2:].values
[9]:
pt = preprocessing.PowerTransformer(method="yeo-johnson", standardize=True)
[10]:
df_values_log = pt.fit_transform(df_values)
[11]:
node_data.iloc[:, 2:] = df_values_log
[12]:
# Set the dataframe index to be the same as the user_id and drop the user_id column
node_data.index = node_data.index.map(str)
node_data.drop(columns=["user_id"], inplace=True)

Next load the graph

Now that we have the node features prepared for machine learning, let us load the retweet graph.

[13]:
g_nx = nx.read_edgelist(path=os.path.expanduser(os.path.join(data_dir, "users.edges")))
[14]:
g_nx.number_of_nodes(), g_nx.number_of_edges()
[14]:
(100386, 2194979)

The graph has just over 100k nodes and approximately 2.2m edges.

We aim to train a graph neural network model that will predict the "hate" attribute on the nodes.

For computational convenience, we have mapped the target labels normal, hateful, and other to the numeric values 0, 1, and 2 respectively.

[15]:
print(set(node_data["hate"]))
{0, 1, 2}
[16]:
node_data = node_data.loc[list(g_nx.nodes())]
node_data.head()
[16]:
hate statuses_count followers_count followees_count favorites_count listed_count negotiate_empath vehicle_empath science_empath timidity_empath ... number hashtags tweet number retweet number quote number status length number urls baddies mentions time_diff time_diff_median
10999 2 0.651057 -0.228440 0.539018 1.468664 0.319936 0.060148 -1.573040 0.468232 -0.446347 ... -0.347727 -0.087181 0.355153 1.193070 0.010627 0.314380 0.581937 0.017239 -0.772738 -0.713314
55317 2 0.527130 0.159289 0.603327 0.116831 0.400391 -0.170600 0.731748 -0.155481 0.487008 ... -0.159648 0.863400 -0.628442 1.058797 -0.400813 -0.034034 -0.023220 0.088925 0.209697 0.501357
44622 2 -0.972049 0.513316 0.003403 0.041867 0.682879 0.398669 -0.434141 -0.439622 0.134869 ... 1.059839 -0.068104 0.338591 -0.254387 1.066497 1.200203 0.243681 0.661312 1.318291 1.403518
71821 2 1.003596 1.295017 0.219550 0.198376 1.810431 -0.601582 -1.187685 0.012743 0.684971 ... -1.705789 0.335796 -0.035509 -1.125292 -0.736826 -0.555163 -0.429600 0.542465 -0.675596 -0.164192
57907 2 1.158887 1.763834 2.302950 -0.603070 1.965467 1.635436 -1.573040 -1.285986 -1.540435 ... 0.994608 1.001552 -0.818391 0.511212 0.249450 -0.184754 0.682368 1.253365 -0.766926 -0.781316

5 rows × 205 columns

Splitting the data

For machine learning we want to take a subset of the nodes for training, and use the rest for validation and testing. We’ll use scikit-learn again to split our data into training and test sets.

The total number of annotated nodes is very small when compared to the total number of nodes in the graph. We are only going to use 15% of the annotated nodes for training and the remaining 85% of nodes for testing.

First, we are going to select the subset of nodes that are annotated as hateful or normal. These will be the nodes that have ‘hate’ values that are either 0 or 1.

[17]:
# choose the nodes annotated with normal or hateful classes
annotated_users = node_data[node_data["hate"] != 2]
[18]:
annotated_user_features = annotated_users.drop(columns=["hate"])
annotated_user_targets = annotated_users[["hate"]]

There are 4971 annotated nodes out of approximately 100k nodes in total.

[19]:
print(annotated_user_targets.hate.value_counts())
0    4427
1     544
Name: hate, dtype: int64
[20]:
# split the data
train_data, test_data, train_targets, test_targets = train_test_split(
    annotated_user_features, annotated_user_targets, test_size=0.85, random_state=101
)
train_targets = train_targets.values
test_targets = test_targets.values
print("Sizes and class distributions for train/test data")
print("Shape train_data {}".format(train_data.shape))
print("Shape test_data {}".format(test_data.shape))
print(
    "Train data number of 0s {} and 1s {}".format(
        np.sum(train_targets == 0), np.sum(train_targets == 1)
    )
)
print(
    "Test data number of 0s {} and 1s {}".format(
        np.sum(test_targets == 0), np.sum(test_targets == 1)
    )
)
Sizes and class distributions for train/test data
Shape train_data (745, 204)
Shape test_data (4226, 204)
Train data number of 0s 667 and 1s 78
Test data number of 0s 3760 and 1s 466
[21]:
train_targets.shape, test_targets.shape
[21]:
((745, 1), (4226, 1))
[22]:
train_data.shape, test_data.shape
[22]:
((745, 204), (4226, 204))

We are going to use 745 nodes for training and 4226 nodes for testing.

[23]:
# choosing features to assign to a graph, excluding target variable
node_features = node_data.drop(columns=["hate"])

Dealing with imbalanced data

Because the training data exhibit high class imbalance, we introduce class weights.

[24]:
from sklearn.utils.class_weight import compute_class_weight

class_weights = compute_class_weight(
    "balanced", np.unique(train_targets), train_targets[:, 0]
)
train_class_weights = dict(zip(np.unique(train_targets), class_weights))
train_class_weights
[24]:
{0: 0.5584707646176912, 1: 4.7756410256410255}

Our data is now ready for machine learning.

Node features are stored in the Pandas DataFrame node_features.

The graph in networkx format is stored in the variable g_nx.

Specify global parameters

Here we specify some global parameters that control the training procedure, such as the number of epochs. Model-specific parameters, e.g., for the base model type (GCN, GraphSAGE, etc.), are specified below when the model is created.

[25]:
epochs = 20

Creating the base graph machine learning model in Keras

Now create a StellarGraph object from the NetworkX graph and the node features. StellarGraph objects are what this library uses to perform machine learning tasks on.

[26]:
G = sg.StellarGraph.from_networkx(g_nx, node_features=node_features)

To feed data from the graph to the Keras model we need a generator. The generators are specialized to the model and the learning task.

For training we map only the training nodes returned from our splitter and the target values.

[27]:
generator = FullBatchNodeGenerator(G, method="gcn", sparse=True)
train_gen = generator.flow(train_data.index, train_targets)
Using GCN (local pooling) filters...
[28]:
base_model = GCN(
    layer_sizes=[32, 16],
    generator=generator,
    bias=True,
    dropout=0.5,
    activations=["elu", "elu"],
)
x_inp, x_out = base_model.in_out_tensors()
prediction = layers.Dense(units=1, activation="sigmoid")(x_out)

Create a Keras model

Now let’s create the actual Keras model with the graph inputs x_inp provided by the base_model and outputs being the predictions from the sigmoid layer.

[29]:
model = Model(inputs=x_inp, outputs=prediction)

We compile our Keras model to use the Adam optimiser and the binary cross entropy loss.

[30]:
model.compile(
    optimizer=optimizers.Adam(lr=0.005), loss=losses.binary_crossentropy, metrics=["acc"],
)
[31]:
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(1, 100386, 204)]   0
__________________________________________________________________________________________________
input_3 (InputLayer)            [(1, None, 2)]       0
__________________________________________________________________________________________________
input_4 (InputLayer)            [(1, None)]          0
__________________________________________________________________________________________________
dropout (Dropout)               (1, 100386, 204)     0           input_1[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer)            [(1, None)]          0
__________________________________________________________________________________________________
squeezed_sparse_conversion (Squ (100386, 100386)     0           input_3[0][0]
                                                                 input_4[0][0]
__________________________________________________________________________________________________
graph_convolution (GraphConvolu (1, 100386, 32)      6560        dropout[0][0]
                                                                 input_2[0][0]
                                                                 squeezed_sparse_conversion[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (1, 100386, 32)      0           graph_convolution[0][0]
__________________________________________________________________________________________________
graph_convolution_1 (GraphConvo (1, None, 16)        528         dropout_1[0][0]
                                                                 input_2[0][0]
                                                                 squeezed_sparse_conversion[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (1, None, 1)         17          graph_convolution_1[0][0]
==================================================================================================
Total params: 7,105
Trainable params: 7,105
Non-trainable params: 0
__________________________________________________________________________________________________

Train the model, keeping track of its loss and accuracy on the training set, and its performance on the test set during training. We don’t use the test set during training; it only measures the trained model’s generalization performance. Note that we pass class_weight=None below; to compensate for the class imbalance, the train_class_weights dictionary computed above could be passed instead.

[32]:
test_gen = generator.flow(test_data.index, test_targets)
history = model.fit(
    train_gen,
    epochs=epochs,
    validation_data=test_gen,
    verbose=2,
    shuffle=False,
    class_weight=None,
)
Epoch 1/20
1/1 - 2s - loss: 0.7628 - acc: 0.4430 - val_loss: 0.6149 - val_acc: 0.7274
Epoch 2/20
1/1 - 2s - loss: 0.6236 - acc: 0.7074 - val_loss: 0.5340 - val_acc: 0.8173
Epoch 3/20
1/1 - 2s - loss: 0.5439 - acc: 0.7866 - val_loss: 0.4680 - val_acc: 0.8504
Epoch 4/20
1/1 - 2s - loss: 0.4800 - acc: 0.8309 - val_loss: 0.4154 - val_acc: 0.8623
Epoch 5/20
1/1 - 2s - loss: 0.4137 - acc: 0.8510 - val_loss: 0.3731 - val_acc: 0.8857
Epoch 6/20
1/1 - 2s - loss: 0.3704 - acc: 0.8738 - val_loss: 0.3377 - val_acc: 0.9035
Epoch 7/20
1/1 - 2s - loss: 0.3347 - acc: 0.9020 - val_loss: 0.3077 - val_acc: 0.9113
Epoch 8/20
1/1 - 2s - loss: 0.2946 - acc: 0.9195 - val_loss: 0.2834 - val_acc: 0.9139
Epoch 9/20
1/1 - 2s - loss: 0.2637 - acc: 0.9221 - val_loss: 0.2654 - val_acc: 0.9169
Epoch 10/20
1/1 - 2s - loss: 0.2394 - acc: 0.9221 - val_loss: 0.2523 - val_acc: 0.9188
Epoch 11/20
1/1 - 2s - loss: 0.2333 - acc: 0.9154 - val_loss: 0.2424 - val_acc: 0.9200
Epoch 12/20
1/1 - 2s - loss: 0.2169 - acc: 0.9262 - val_loss: 0.2352 - val_acc: 0.9214
Epoch 13/20
1/1 - 2s - loss: 0.2034 - acc: 0.9329 - val_loss: 0.2311 - val_acc: 0.9210
Epoch 14/20
1/1 - 2s - loss: 0.1907 - acc: 0.9315 - val_loss: 0.2299 - val_acc: 0.9210
Epoch 15/20
1/1 - 2s - loss: 0.1882 - acc: 0.9329 - val_loss: 0.2305 - val_acc: 0.9205
Epoch 16/20
1/1 - 2s - loss: 0.1867 - acc: 0.9289 - val_loss: 0.2315 - val_acc: 0.9198
Epoch 17/20
1/1 - 2s - loss: 0.1746 - acc: 0.9450 - val_loss: 0.2321 - val_acc: 0.9212
Epoch 18/20
1/1 - 2s - loss: 0.1785 - acc: 0.9289 - val_loss: 0.2333 - val_acc: 0.9224
Epoch 19/20
1/1 - 2s - loss: 0.1716 - acc: 0.9329 - val_loss: 0.2347 - val_acc: 0.9238
Epoch 20/20
1/1 - 2s - loss: 0.1691 - acc: 0.9383 - val_loss: 0.2366 - val_acc: 0.9245

Model Evaluation

Now we have trained the model, let’s evaluate it on the test set.

We are going to consider 4 evaluation metrics calculated on the test set: accuracy, area under the ROC curve (AUC-ROC), the ROC curve itself, and the confusion table.

[33]:
test_metrics = model.evaluate(test_gen)
print("\nTest Set Metrics:")
for name, val in zip(model.metrics_names, test_metrics):
    print("\t{}: {:0.4f}".format(name, val))

Test Set Metrics:
        loss: 0.2366
        acc: 0.9245
[34]:
all_nodes = node_data.index
all_gen = generator.flow(all_nodes)
all_predictions = model.predict(all_gen).squeeze()[..., np.newaxis]
[35]:
all_predictions.shape
[35]:
(100386, 1)
[36]:
all_predictions_df = pd.DataFrame(all_predictions, index=node_data.index)

Let’s extract the predictions for the test data only.

[37]:
test_preds = all_predictions_df.loc[test_data.index, :]
[38]:
test_preds.shape
[38]:
(4226, 1)

The predictions are the probability of the positive class, which in this case is the probability of a user being hateful.

[39]:
test_predictions = test_preds.values
test_predictions_class = ((test_predictions > 0.5) * 1).flatten()
test_df = pd.DataFrame(
    {
        "Predicted_score": test_predictions.flatten(),
        "Predicted_class": test_predictions_class,
        "True": test_targets[:, 0],
    }
)
roc_auc = metrics.roc_auc_score(test_df["True"].values, test_df["Predicted_score"].values)
print("The AUC on test set:\n")
print(roc_auc)
The AUC on test set:

0.8730678134416948
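
The remaining two metrics, the confusion table and the ROC curve itself, are not shown above. A minimal sketch, using the test_df built in the previous cell and the already imported sklearn metrics and matplotlib:

# confusion table: rows are true classes, columns are predicted classes
print(
    pd.crosstab(
        test_df["True"], test_df["Predicted_class"], rownames=["True"], colnames=["Predicted"]
    )
)

# ROC curve for the positive (hateful) class
fpr, tpr, thresholds = metrics.roc_curve(test_df["True"], test_df["Predicted_score"])
plt.plot(fpr, tpr, label="GCN (AUC = {:.3f})".format(roc_auc))
plt.plot([0, 1], [0, 1], "--", label="random")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()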

Interpretability by Saliency Maps

To understand which features and edges the model relies on when making predictions, we use the interpretability tool in the StellarGraph library (i.e., saliency maps) to compute the importance of node features and edges for the prediction of a given target user.
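
For background: the tool below is based on Integrated Gradients (Sundararajan et al., ICML 2017), which attributes the prediction F(x) for the target node to each input feature by accumulating gradients along a straight-line path from a baseline x' (typically all zeros) to the actual input x:

$$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

In practice the integral is approximated by a finite sum of gradients at interpolated inputs (the steps argument seen below).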

[40]:
from stellargraph.interpretability.saliency_maps import IntegratedGradients

int_saliency = IntegratedGradients(model, all_gen)
[41]:
# we first select a list of nodes which are confidently classified as hateful.
predicted_hateful_index = set(np.where(all_predictions > 0.9)[0].tolist())
test_indices_set = set([int(k) for k in test_data.index.tolist()])
hateful_in_test = list(predicted_hateful_index.intersection(test_indices_set))

# let's pick one node from the predicted hateful users as an example.
idx = 2
target_idx = hateful_in_test[idx]
target_nid = list(G.nodes())[target_idx]
print("target_idx = {}, target_nid = {}".format(target_idx, target_nid))
print(
    "prediction score for node {} is {}".format(target_idx, all_predictions[target_idx])
)
print(
    "ground truth score for node {} is {}".format(
        target_idx, test_targets[test_data.index.tolist().index(str(target_nid))]
    )
)
[X, all_targets, A_index, A], y_true_all = all_gen[0]
target_idx = 36367, target_nid = 77692
prediction score for node 36367 is [0.9576855]
ground truth score for node 36367 is [1]

For the prediction of the target node, we then calculate the importance of the features of each node in the graph. Our support for sparse saliency maps makes it efficient to handle graphs at the scale of this dataset.

[42]:
# compute the feature importance of every node w.r.t. the prediction for our target node
node_feature_importance = int_saliency.get_integrated_node_masks(target_idx, 0)

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Since node_feature_importance is a matrix where node_feature_importance[i][j] indicates the importance of the j-th feature of node i for the prediction of the target node, we sum the feature importances of each node to measure that node's overall importance.

[43]:
node_importance = np.sum(node_feature_importance, axis=-1)
node_importance_rank = np.argsort(node_importance)[::-1]
print(node_importance[node_importance_rank])
print(
    "node_importance has {} non-zero values".format(
        np.where(node_importance != 0)[0].shape[0]
    )
)
[ 0.41207344  0.21552302  0.20715151 ... -0.0263247  -0.03053751
 -0.03189572]
node_importance has 12721 non-zero values

We expect the number of non-zero values of node_importance to match the number of nodes in the ego graph.

[44]:
G_ego = nx.ego_graph(g_nx, target_nid, radius=2)
print("The ego graph of the target node has {} neighbors".format(len(G_ego.nodes())))
The ego graph of the target node has 12721 neighbors

We then analyze the feature importance of the top-250 most important nodes. The output below shows the top-5 nodes; in each row, the features are sorted according to their importance.

[45]:
feature_names = annotated_users.keys()[1:].values
feature_importance_rank = np.argsort(node_feature_importance[target_idx])[::-1]
df = pd.DataFrame(
    [
        ([k] + list(feature_names[np.argsort(node_feature_importance[k])[::-1]]))
        for k in node_importance_rank[:250]
    ],
    columns=range(205),
)
df.head()
[45]:
0 1 2 3 4 5 6 7 8 9 ... 195 196 197 198 199 200 201 202 203 204
0 36367 fear_empath torment_empath favorites_count legend_empath rage_empath computer_empath furniture_empath royalty_empath office_empath ... listen_empath medieval_empath divine_empath confusion_empath statuses_count gain_empath wealthy_empath family_empath tweet number retweet number
1 34534 rage_empath fear_empath favorites_count ridicule_empath pet_empath computer_empath children_empath shame_empath royalty_empath ... hearing_empath weakness_empath masculine_empath home_empath gain_empath positive_emotion_empath competing_empath statuses_count tweet number retweet number
2 69814 ridicule_empath rage_empath joy_empath anger_empath money_empath retweet number listen_empath tweet number negotiate_empath ... contentment_empath divine_empath deception_empath fear_empath dominant_heirarchical_empath furniture_empath subjectivity white_collar_job_empath health_empath giving_empath
3 69254 ridicule_empath legend_empath hipster_empath rage_empath science_empath fun_empath farming_empath furniture_empath favorites_count ... status length kill_empath clothing_empath weakness_empath breaking_empath retweet number giving_empath pet_empath gain_empath quote number
4 71135 royalty_empath rage_empath statuses_count leader_empath weakness_empath legend_empath joy_empath children_empath ridicule_empath ... divine_empath religion_empath healing_empath poor_empath sexual_empath deception_empath help_empath farming_empath tweet number favorites_count

5 rows × 205 columns

As a sanity check, we expect the target node itself to have a relatively high importance.

[46]:
self_feature_importance_rank = np.argsort(node_feature_importance[target_idx])
print(np.sum(node_feature_importance[target_idx]))
print(
    "The node itself is the {}-th important node".format(
        1 + node_importance_rank.tolist().index(target_idx)
    )
)
df = pd.DataFrame([feature_names[self_feature_importance_rank][::-1]], columns=range(204))
df
0.4120734350877242
The node itself is the 1-th important node
[46]:
0 1 2 3 4 5 6 7 8 9 ... 194 195 196 197 198 199 200 201 202 203
0 fear_empath torment_empath favorites_count legend_empath rage_empath computer_empath furniture_empath royalty_empath office_empath science_empath ... listen_empath medieval_empath divine_empath confusion_empath statuses_count gain_empath wealthy_empath family_empath tweet number retweet number

1 rows × 204 columns

For different nodes, the same feature may have a different rank. To understand the overall importance of each feature, we now analyze the average feature importance rank over the selected nodes. Specifically, we obtain the average rank of each feature among the top-250 most important nodes.

[47]:
from collections import defaultdict

average_feature_rank = defaultdict(int)
for i in node_importance_rank[:250]:
    feature_rank = list(feature_names[np.argsort(node_feature_importance[i])[::-1]])
    for j, feat in enumerate(feature_rank):
        # feature names are unique, so the rank of feature_rank[j] is simply j
        average_feature_rank[feat] += j
for k in average_feature_rank.keys():
    average_feature_rank[k] /= 250.0
sorted_avg_feature_rank = sorted(average_feature_rank.items(), key=lambda a: a[1])
for feat, avg_rank in sorted_avg_feature_rank:
    print(feat, avg_rank)
ridicule_empath 11.096
hipster_empath 23.612
furniture_empath 27.564
science_empath 28.28
cleaning_empath 33.94
legend_empath 34.536
joy_empath 35.688
anger_empath 36.344
farming_empath 37.444
children_empath 37.54
listen_empath 42.396
negotiate_empath 51.996
listed_count 52.784
vacation_empath 52.952
rural_empath 53.42
anonymity_empath 53.688
traveling_empath 53.988
time_diff 55.836
speaking_empath 56.932
money_empath 56.968
noise_empath 57.368
death_empath 59.676
exasperation_empath 60.744
hiking_empath 61.0
violence_empath 61.692
positive_emotion_empath 62.132
social_media_empath 63.252
leisure_empath 64.448
royalty_empath 64.732
pride_empath 65.788
weakness_empath 66.6
office_empath 68.584
wedding_empath 69.616
sympathy_empath 70.0
subjectivity 70.424
terrorism_empath 70.74
kill_empath 71.808
computer_empath 73.048
number hashtags 73.092
fight_empath 74.428
favorites_count 74.444
emotional_empath 75.48
followers_count 75.584
dispute_empath 75.74
baddies 75.756
shame_empath 76.06
zest_empath 78.936
rage_empath 78.996
statuses_count 81.088
swearing_terms_empath 81.52
attractive_empath 83.192
dominant_personality_empath 84.864
fire_empath 84.964
business_empath 85.288
government_empath 85.732
fear_empath 85.992
love_empath 86.236
musical_empath 86.564
car_empath 86.772
number urls 87.404
play_empath 87.632
white_collar_job_empath 88.052
disgust_empath 88.116
fun_empath 88.44
payment_empath 88.768
vehicle_empath 89.568
childish_empath 90.0
prison_empath 90.096
writing_empath 91.224
healing_empath 91.296
torment_empath 91.848
school_empath 92.644
hygiene_empath 92.852
achievement_empath 94.408
ancient_empath 96.328
medical_emergency_empath 96.336
philosophy_empath 97.288
crime_empath 97.416
driving_empath 97.96
pet_empath 97.964
timidity_empath 98.2
cold_empath 98.296
warmth_empath 98.408
power_empath 98.756
leader_empath 98.832
dance_empath 99.204
liquid_empath 99.288
communication_empath 99.496
movement_empath 99.916
monster_empath 102.284
ocean_empath 102.676
tweet number 103.136
confusion_empath 103.38
surprise_empath 103.496
order_empath 103.508
followees_count 103.968
alcohol_empath 104.008
worship_empath 104.636
hearing_empath 104.8
trust_empath 104.8
affection_empath 105.088
shape_and_size_empath 105.32
urban_empath 105.448
toy_empath 105.556
suffering_empath 105.896
college_empath 106.852
beauty_empath 106.896
real_estate_empath 108.068
tool_empath 108.428
health_empath 108.624
appearance_empath 108.812
stealing_empath 108.904
technology_empath 109.016
occupation_empath 109.512
youth_empath 109.76
time_diff_median 110.224
eating_empath 110.26
ugliness_empath 110.3
banking_empath 110.56
lust_empath 110.98
competing_empath 111.256
cooking_empath 111.484
fashion_empath 111.792
ship_empath 111.9
journalism_empath 112.076
negative_emotion_empath 112.676
economics_empath 113.6
water_empath 113.956
retweet number 114.104
sexual_empath 114.336
night_empath 114.776
smell_empath 115.676
air_travel_empath 115.736
party_empath 115.884
war_empath 116.0
law_empath 116.676
tourism_empath 116.708
body_empath 116.88
status length 116.904
reading_empath 117.008
politeness_empath 117.156
family_empath 117.42
meeting_empath 117.504
shopping_empath 117.708
art_empath 117.904
medieval_empath 118.684
pain_empath 119.612
feminine_empath 120.672
wealthy_empath 121.316
optimism_empath 121.552
irritability_empath 121.664
plant_empath 121.912
independence_empath 122.204
animal_empath 122.304
horror_empath 122.38
phone_empath 122.416
nervousness_empath 122.712
quote number 122.86
anticipation_empath 124.308
cheerfulness_empath 125.0
politics_empath 125.672
internet_empath 126.012
military_empath 126.056
sailing_empath 126.088
celebration_empath 126.304
giving_empath 127.656
religion_empath 127.848
weapon_empath 128.356
sound_empath 129.372
strength_empath 131.172
restaurant_empath 133.104
breaking_empath 133.148
injury_empath 133.204
sports_empath 134.016
swimming_empath 134.496
sadness_empath 134.884
mentions 135.048
gain_empath 135.116
valuable_empath 137.04
exercise_empath 137.804
help_empath 137.828
programming_empath 137.892
hate_empath 138.292
weather_empath 138.476
friends_empath 138.752
clothing_empath 139.02
contentment_empath 139.568
neglect_empath 139.652
blue_collar_job_empath 141.772
work_empath 143.168
beach_empath 144.776
disappointment_empath 144.944
deception_empath 145.72
messaging_empath 147.116
sleep_empath 148.896
poor_empath 148.932
envy_empath 150.56
dominant_heirarchical_empath 154.408
morning_empath 156.448
superhero_empath 158.392
home_empath 159.552
aggression_empath 160.376
divine_empath 160.444
masculine_empath 161.288

It seems that for our target node, topics such as ridicule, hipster, and cleaning are important, while those such as home, aggression, and masculine are not.

We then calculate the link importance for the edges connected to the target node within k hops (k = 2 for our two-layer GCN model).

[48]:
link_importance = int_saliency.get_integrated_link_masks(target_idx, 0, steps=2)
[49]:
(x, y) = link_importance.nonzero()
[X, all_targets, A_index, A], y_true_all = all_gen[0]
print(A_index.shape, A.shape)
G_edge_indices = [(A_index[0, k, 0], A_index[0, k, 1]) for k in range(A_index.shape[1])]
link_dict = {(A_index[0, k, 0], A_index[0, k, 1]): k for k in range(A_index.shape[1])}
(1, 4289572, 2) (1, 4289572)

As a sanity check, we expect the most important edge to connect important nodes.

[50]:
nonzero_importance_val = link_importance[(x, y)].flatten().tolist()[0]
link_importance_rank = np.argsort(nonzero_importance_val)[::-1]
edge_number_in_ego_graph = link_importance_rank.shape[0]
print(
    "There are {} edges within the ego graph of the target node".format(
        edge_number_in_ego_graph
    )
)
x_rank, y_rank = x[link_importance_rank], y[link_importance_rank]
print(
    "The most important edge connects {}-th important node and {}-th important node".format(
        node_importance_rank.tolist().index(x_rank[0]),
        (node_importance_rank.tolist().index(y_rank[0])),
    )
)
There are 26145 edges within the ego graph of the target node
The most important edge connects 0-th important node and 132-th important node

To ensure that we are getting the correct importance for edges, we then check what happens if we perturb the most important edges. Specifically, if we remove the top 1% of edges according to the calculated edge importance scores, we should expect the prediction of the target node to change.

[51]:
from copy import deepcopy

selected_nodes = np.array([[target_idx]], dtype="int32")
prediction_clean = model.predict([X, selected_nodes, A_index, A]).squeeze()
A_perturb = deepcopy(A)
print("A_perturb.shape = {}".format(A_perturb.shape))
# we remove top 1% important edges in the graph and see how the prediction changes
topk = int(edge_number_in_ego_graph * 0.01)

for i in range(topk):
    edge_x, edge_y = x_rank[i], y_rank[i]
    edge_index = link_dict[(edge_x, edge_y)]
    A_perturb[0, edge_index] = 0
A_perturb.shape = (1, 4289572)

As expected, the prediction score drops after the perturbation, and the target node is now predicted as non-hateful.

[52]:
prediction = model.predict([X, selected_nodes, A_index, A_perturb]).squeeze()
print(
    "The prediction score changes from {} to {} after the perturbation".format(
        prediction_clean, prediction
    )
)
The prediction score changes from 0.9576854109764099 to 0.14111952483654022 after the perturbation

NOTES: For the UX team, the above notebook shows how we are able to compute the importance of nodes and edges. However, the ego graph of a target node in the Twitter dataset is often very large, so we may draw only the top important nodes/edges in the visualization.
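
As a starting point for such a visualization, one could draw only the subgraph induced by, say, the top-100 most important edges. A minimal sketch, reusing the x_rank, y_rank and target_nid values computed above; the cutoff and styling are placeholder choices:

# map node indices back to node ids (same ordering used for target_nid above)
node_ids = list(G.nodes())
# keep only the top-100 most important edges (placeholder cutoff)
top_edges = [(node_ids[u], node_ids[v]) for u, v in zip(x_rank[:100], y_rank[:100])]
G_top = nx.Graph(top_edges)
pos = nx.spring_layout(G_top, seed=42)
nx.draw(G_top, pos, node_size=20, width=0.5)
if target_nid in G_top:
    # highlight the target node if it appears among the top edges
    nx.draw_networkx_nodes(G_top, pos, nodelist=[target_nid], node_color="red", node_size=80)
plt.show()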
