torchcp.graph

score function

DAPS

Method: Diffusion Adaptive Prediction Sets Paper: Conformal Prediction Sets for Graph Neural Networks (Zargarbashi et al., 2023) Link: https://proceedings.mlr.press/v202/h-zargarbashi23a/h-zargarbashi23a.pdf Github: https://github.com/soroushzargar/DAPS

SNAPS

Method: Similarity-Navigated Adaptive Prediction Sets Paper: Similarity-Navigated Conformal Prediction for Graph Neural Networks (Song et al., 2024) Link: https://arxiv.org/pdf/2405.14303 Github: https://github.com/janqsong/SNAPS

class torchcp.graph.score.DAPS(graph_data, base_score_function, neigh_coef=0.5)

Method: Diffusion Adaptive Prediction Sets
Paper: Conformal Prediction Sets for Graph Neural Networks (Zargarbashi et al., 2023)
Link: https://proceedings.mlr.press/v202/h-zargarbashi23a/h-zargarbashi23a.pdf
Github: https://github.com/soroushzargar/DAPS

The diffusion process adjusts the non-conformity scores of nodes by propagating information from their neighbors, where the strength of diffusion is controlled by the parameter neigh_coef. A higher value of neigh_coef puts more emphasis on the diffusion of scores.

Parameters:

neigh_coef (float) – A diffusion parameter that controls the balance between local (node-specific) scores and diffusion scores. It must be a value in [0, 1].
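The diffusion step can be sketched in plain Python. This is a minimal illustration, assuming the diffused score of a node is a convex combination of its own score and the class-wise mean of its neighbors' scores, with weight neigh_coef on the neighbor term. The helper name diffuse_scores and the adjacency-dict layout are hypothetical and not part of torchcp.

```python
def diffuse_scores(scores, neighbors, neigh_coef=0.5):
    """Blend each node's per-class scores with the mean of its neighbors' scores.

    scores:    dict mapping node -> list of per-class non-conformity scores
    neighbors: dict mapping node -> list of neighboring nodes
    (Both layouts are assumptions made for this sketch.)
    """
    if not 0.0 <= neigh_coef <= 1.0:
        raise ValueError("neigh_coef must be in [0, 1]")
    diffused = {}
    for node, own in scores.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Class-wise mean of the neighbors' scores.
            mean_nbr = [sum(scores[n][c] for n in nbrs) / len(nbrs)
                        for c in range(len(own))]
        else:
            # Isolated node: fall back to its own scores.
            mean_nbr = list(own)
        diffused[node] = [(1 - neigh_coef) * s + neigh_coef * m
                          for s, m in zip(own, mean_nbr)]
    return diffused


scores = {0: [0.2, 0.8], 1: [0.6, 0.4], 2: [1.0, 0.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
result = diffuse_scores(scores, neighbors, neigh_coef=0.5)
```

With neigh_coef=0 the scores are returned unchanged, and with neigh_coef=1 each node's score is replaced by its neighborhood mean.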

class torchcp.graph.score.SNAPS(graph_data, base_score_function, xi=0.3333333333333333, mu=0.3333333333333333, knn_edge=None, knn_weight=None, features=None, k=20)

Method: Similarity-Navigated Adaptive Prediction Sets
Paper: Similarity-Navigated Conformal Prediction for Graph Neural Networks (Song et al., 2024)
Link: https://arxiv.org/pdf/2405.14303
Github: https://github.com/janqsong/SNAPS

Parameters:
  • xi (float) – The weight parameter for neighborhood-based scores, where 0 <= xi <= 1.

  • mu (float) – The weight parameter for similarity-based scores, where 0 <= mu <= 1.

  • knn_edge (torch.Tensor, optional) – An edge list representing the k-nearest neighbors (k-NN) of each node. It may be constructed from the similarity of node features. The shape is (2, E), where E is the number of edges in the k-NN graph. The first row contains the source node indices, and the second row contains the target node indices.

  • knn_weight (torch.Tensor, optional) – The weights associated with each k-NN edge, if applicable. Defaults to uniform weights.

  • features (torch.Tensor, optional) – A tensor of node features used to compute the k-NN graph if knn_edge is not provided. The shape is (N, D), where N is the number of nodes and D is the dimensionality of the features. Defaults to None.

  • k (int, optional) – The number of nearest neighbors to consider when constructing the k-NN graph. Defaults to 20.
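How xi and mu interact can be sketched as follows. This is an illustration only: it assumes the final score is a weighted sum of the node's own score, the mean score of its graph neighbors (weight xi), and the mean score of its k-NN neighbors (weight mu), with the remaining weight 1 - xi - mu on the node itself and uniform k-NN weights. The constraint xi + mu <= 1 and the helper names are assumptions of this sketch, not documented torchcp behavior.

```python
def mean_over(nodes, scores, fallback):
    """Average scalar scores over `nodes`; use `fallback` when the list is empty."""
    if not nodes:
        return fallback
    return sum(scores[n] for n in nodes) / len(nodes)


def snaps_score(node, scores, graph_nbrs, knn_nbrs, xi=1 / 3, mu=1 / 3):
    """Combine a node's own score with neighborhood and k-NN averages.

    scores maps node -> scalar non-conformity score for one candidate class.
    """
    if xi < 0 or mu < 0 or xi + mu > 1:
        raise ValueError("expected xi >= 0, mu >= 0 and xi + mu <= 1")
    own = scores[node]
    nbr_term = mean_over(graph_nbrs.get(node, []), scores, own)
    knn_term = mean_over(knn_nbrs.get(node, []), scores, own)
    return (1 - xi - mu) * own + xi * nbr_term + mu * knn_term


scores = {0: 0.9, 1: 0.3, 2: 0.6, 3: 0.0}
graph_nbrs = {0: [1]}          # edges of the original graph
knn_nbrs = {0: [2, 3]}         # edges of the feature-similarity k-NN graph
combined = snaps_score(0, scores, graph_nbrs, knn_nbrs, xi=0.25, mu=0.25)
```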

predictor

SplitPredictor

Method: Split Conformal Prediction (Vovk et al., 2005).

NAPSPredictor

Method: Neighbourhood Adaptive Prediction Sets Paper: Distribution Free Prediction Sets for Node Classification (Clarkson et al., 2023) Link: https://proceedings.mlr.press/v202/clarkson23a/clarkson23a.pdf Github: https://github.com/jase-clarkson/graph_cp/tree/master

class torchcp.graph.predictor.SplitPredictor(graph_data, score_function, model=None, alpha=0.1, device=None)

Method: Split Conformal Prediction (Vovk et al., 2005)
Paper: Algorithmic Learning in a Random World
Link: https://link.springer.com/book/10.1007/978-3-031-06649-8

Parameters:
  • graph_data (torch_geometric.data.Data) – The input graph data in PyG format.

  • score_function (callable) – A user-defined function that computes the non-conformity score.

  • model (torch.nn.Module) – A PyTorch model used for predictions on the graph. Defaults to None.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

calculate_threshold(logits, cal_idx, label_mask, alpha=None)

Calculate the conformal prediction threshold for a given calibration set.

This method computes a threshold (q_hat) based on the non-conformity scores of the calibration set. The threshold ensures that the conformal prediction meets the desired significance level (alpha).

Parameters:
  • logits (torch.Tensor) – The raw model outputs (logits) for all samples in the dataset. Shape: [num_samples, num_classes].

  • cal_idx (torch.Tensor or list) – Indices specifying the samples in the calibration set. Shape: [num_calibration_samples].

  • label_mask (torch.Tensor) – A boolean tensor indicating the presence of valid labels for each sample and class. Shape: [num_samples, num_classes].

  • alpha (float) – The significance level, a value in the range (0, 1), representing the acceptable error rate. Default is None.
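The threshold computation can be sketched in plain Python using the standard split-conformal quantile rule: with n calibration scores, take the ceil((n + 1) * (1 - alpha))-th smallest score. The function name conformal_threshold is hypothetical, and the sketch takes the calibration scores as given rather than deriving them from logits.

```python
import math


def conformal_threshold(cal_scores, alpha=0.1):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    n = len(cal_scores)
    if n == 0:
        raise ValueError("calibration set is empty")
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: every class is admitted.
        return math.inf
    return sorted(cal_scores)[rank - 1]


# Non-conformity scores of the true labels on a small calibration set.
cal_scores = [0.12, 0.35, 0.08, 0.51, 0.27, 0.64, 0.19]
q_hat = conformal_threshold(cal_scores, alpha=0.25)
```

Here n = 7 and (n + 1) * (1 - alpha) = 6, so q_hat is the sixth-smallest score.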

calibrate(cal_idx, alpha=None)

Abstract method to perform calibration on a given calibration set.

Parameters:
  • cal_idx (torch.Tensor) – Indices specifying the samples in the graph data that belong to the calibration set.

  • alpha (float) – The significance level, a value in the range (0, 1), representing the acceptable error rate for conformal prediction. Default is None.

evaluate(eval_idx)

Evaluate the model’s conformal prediction performance on a given evaluation set.

This method performs evaluation by first making predictions using the model’s raw outputs (logits) and then calculating several performance metrics based on the prediction sets generated for the evaluation samples. It calculates the coverage rate, average prediction set size, and singleton hit ratio, and returns these metrics as a dictionary.

Parameters:

eval_idx (torch.Tensor or list) – Indices of the samples in the evaluation or test set. Shape: [num_test_samples].

Returns:

A dictionary containing the evaluation results. The dictionary includes:
  • “Coverage_rate”: The proportion of test samples for which the true label is included in the prediction set.

  • “Average_size”: The average size of the prediction sets.

  • “Singleton_hit_ratio”: The ratio of singleton (i.e., single-class) prediction sets where the predicted class matches the true label.

Return type:

dict
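The three metrics can be sketched in plain Python. This is an illustration only: it assumes prediction sets are lists of class indices and that the singleton hit ratio is normalized by the total number of evaluation samples. The denominator is an assumption of this sketch, and the function name evaluate_sets is hypothetical.

```python
def evaluate_sets(prediction_sets, labels):
    """Compute coverage rate, average set size, and singleton hit ratio."""
    n = len(labels)
    if n == 0 or n != len(prediction_sets):
        raise ValueError("need one non-empty list of sets per label")
    # True where the prediction set contains the true label.
    covered = [y in s for s, y in zip(prediction_sets, labels)]
    # True where the set has exactly one class and that class is correct.
    singleton_hits = [len(s) == 1 and hit
                      for s, hit in zip(prediction_sets, covered)]
    return {
        "Coverage_rate": sum(covered) / n,
        "Average_size": sum(len(s) for s in prediction_sets) / n,
        # Assumption: normalized by all samples, not only by singleton sets.
        "Singleton_hit_ratio": sum(singleton_hits) / n,
    }


prediction_sets = [[0], [1, 2], [2], [0, 1, 2]]
labels = [0, 2, 1, 1]
metrics = evaluate_sets(prediction_sets, labels)
```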

predict(eval_idx)

Abstract method to make predictions on a given test set.

This method must be implemented by subclasses to handle the prediction process for conformal prediction. The prediction will be based on the model’s outputs and the non-conformity scores, adjusted according to the calibration.

Parameters:

eval_idx (torch.Tensor or list) – Indices specifying the samples in the test set on which predictions need to be made. Shape: [num_test_samples].

Returns:

A list containing prediction sets for the test samples, depending on the specific conformal prediction method implemented.

Return type:

list

predict_with_logits(logits, eval_idx, q_hat=None)

Generate prediction sets based on the logits and the conformal threshold.

This method constructs prediction sets by comparing the non-conformity scores (calculated from the logits) to a predefined threshold (q_hat). If q_hat is not provided, it defaults to the value of self.q_hat, which should have been set during the calibration phase.

Parameters:
  • logits (torch.Tensor) – The raw output of the model (before applying softmax). Shape: [num_samples, num_classes].

  • eval_idx (torch.Tensor or list) – Indices of the samples in the evaluation or test set. Shape: [num_test_samples].

  • q_hat (float, optional) – The conformal threshold used to generate prediction sets. If not provided, self.q_hat (calculated during the calibration phase) will be used.

Returns:

A list containing prediction sets for the test samples, depending on the specific conformal prediction method implemented.

Return type:

list
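The thresholding step can be sketched as follows, assuming the per-class non-conformity scores have already been computed from the logits and that a class enters the prediction set when its score is at most q_hat. The function name build_prediction_sets and the nested-list layout are hypothetical.

```python
def build_prediction_sets(scores, eval_idx, q_hat):
    """Keep every class whose non-conformity score does not exceed q_hat.

    scores[i][c] is the score of class c for node i.
    """
    return [[c for c, s in enumerate(scores[i]) if s <= q_hat]
            for i in eval_idx]


scores = [
    [0.10, 0.70, 0.95],   # node 0
    [0.60, 0.20, 0.90],   # node 1
    [0.30, 0.40, 0.50],   # node 2
]
sets = build_prediction_sets(scores, eval_idx=[0, 2], q_hat=0.45)
```

A larger q_hat admits more classes per node, which raises coverage at the cost of larger sets.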

class torchcp.graph.predictor.NAPSPredictor(graph_data, score_function=<torchcp.classification.score.aps.APS object>, model=None, alpha=0.1, device=None, cutoff=50, k=2, scheme='unif')

Method: Neighbourhood Adaptive Prediction Sets
Paper: Distribution Free Prediction Sets for Node Classification (Clarkson et al., 2023)
Link: https://proceedings.mlr.press/v202/clarkson23a/clarkson23a.pdf
Github: https://github.com/jase-clarkson/graph_cp/tree/master

This class implements the NAPS method for conformal prediction on graph-structured data. It constructs prediction sets for nodes based on their neighborhood structure and non-conformity scores.

Parameters:
  • score_function (callable) – Must be the APS non-conformity score function with score_type="softmax".

  • model (torch.nn.Module) – A PyTorch model used for predictions on the graph. Defaults to None.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

  • cutoff (int) – Minimum number of k-hop neighbors a node must have to be included in the test set. Default is 50. Nodes with fewer than this number of neighbors will be excluded.

  • k (int) – Number of hops defining the neighborhood used as the calibration set for each node. Default is 2, meaning that each node is calibrated on its neighbors up to 2 hops away.

  • scheme (str) – The weight decay scheme for k-hop neighbors. Default is ‘unif’. Options include:
      - ‘unif’: Uniform weighting (weights = 1)
      - ‘linear’: Linear decay (weights = 1/k)
      - ‘geom’: Geometric decay (weights = 2^{-(k-1)})
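The roles of cutoff, k, and scheme can be sketched in plain Python. This is an illustration only: it assumes that the k-hop neighbors of a node are all nodes within k hops, and that the k in the weight formulas is the hop distance of each neighbor. The function names and the adjacency-dict layout are hypothetical.

```python
from collections import deque


def k_hop_distances(adj, source, k):
    """Breadth-first search: {node: hop distance} for nodes within k hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # the node itself is not its own neighbor
    return dist


def hop_weight(hop, scheme="unif"):
    """Weight of a neighbor that is `hop` hops away."""
    if scheme == "unif":
        return 1.0
    if scheme == "linear":
        return 1.0 / hop
    if scheme == "geom":
        return 2.0 ** (-(hop - 1))
    raise ValueError(f"unknown scheme: {scheme!r}")


# Path-like graph: 2 - 0 - 1 - 3 - 4
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
dist = k_hop_distances(adj, source=0, k=2)
weights = {node: hop_weight(hop, "geom") for node, hop in dist.items()}
eligible = len(dist) >= 3  # plays the role of the cutoff check
```

Node 4 is three hops from node 0, so it is excluded when k=2.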

calculate_threshold_for_node(node, logits, labels, alpha=None)

Calculate the conformal prediction threshold for a given node based on its neighborhood.

This method computes the conformal prediction threshold for a specific node by examining the non-conformity scores of the node’s neighbors. If the node has enough neighbors (as defined by the cutoff), it calibrates the threshold using these neighbors’ scores.

Parameters:
  • node (int) – The ID of the node for which the threshold is being calculated.

  • logits (torch.Tensor) – The raw model outputs (logits) for test nodes. Shape: [num_test_nodes, num_classes].

  • labels (torch.Tensor) – The true labels for test nodes. Shape: [num_test_nodes].

  • alpha (float) – The significance level for the conformal prediction. This is used to determine the threshold for the prediction set. Default is None.

Returns:

A dictionary where the key is the node ID and the value is the calibrated threshold for the node. If the node doesn’t have enough neighbors (i.e., fewer than cutoff), None is returned.

Return type:

dict
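A per-node threshold of this kind can be sketched as a weighted quantile over the neighbors' scores. This is an illustration only, assuming the threshold is the smallest neighbor score at which the normalized cumulative weight reaches 1 - alpha. It omits any finite-sample correction and is not the exact torchcp computation; the function name node_threshold is hypothetical.

```python
def node_threshold(node, nbr_scores, nbr_weights, alpha=0.1, cutoff=50):
    """Return {node: threshold}, or None when there are too few neighbors."""
    if len(nbr_scores) < cutoff:
        return None
    total = sum(nbr_weights)
    # Walk the neighbors in order of increasing score.
    pairs = sorted(zip(nbr_scores, nbr_weights))
    cumulative = 0.0
    for score, weight in pairs:
        cumulative += weight / total
        if cumulative >= 1 - alpha:
            return {node: score}
    # Guard against floating-point shortfall: use the largest score.
    return {node: pairs[-1][0]}


nbr_scores = [0.2, 0.5, 0.9, 0.4]
nbr_weights = [1.0, 1.0, 0.5, 0.5]   # e.g. closer neighbors weigh more
threshold = node_threshold(7, nbr_scores, nbr_weights, alpha=0.25, cutoff=3)
too_few = node_threshold(7, nbr_scores, nbr_weights, alpha=0.25, cutoff=5)
```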

evaluate(eval_idx, alpha=None)

Evaluate the model’s conformal prediction performance on a given evaluation set.

This method performs evaluation by first making predictions using the model’s raw outputs (logits) and then calculating several performance metrics based on the prediction sets generated for the evaluation samples. It calculates the coverage rate, average prediction set size, and singleton hit ratio, and returns these metrics as a dictionary.

Parameters:
  • eval_idx (torch.Tensor or list) – Indices of the samples in the evaluation or test set. Shape: [num_test_samples].

  • alpha (float, optional) – The significance level, so that 1 - alpha is the target marginal coverage of the prediction sets. Default is None.

Returns:

A dictionary containing the evaluation results. The dictionary includes:
  • “Coverage_rate”: The proportion of test samples for which the true label is included in the prediction set.

  • “Average_size”: The average size of the prediction sets.

  • “Singleton_hit_ratio”: The ratio of singleton (i.e., single-class) prediction sets where the predicted class matches the true label.

Return type:

dict

predict(eval_idx, alpha=None)

Generate prediction sets for the evaluation nodes.

The prediction sets are built from the model’s raw outputs (logits).

Parameters:
  • eval_idx (torch.Tensor or list) – Indices of the samples in the evaluation or test set. Shape: [num_test_samples].

  • alpha (float, optional) – The significance level, so that 1 - alpha is the target marginal coverage of the prediction sets. Default is None.

Returns:

  • lcc_nodes (torch.Tensor) – The indices of the nodes that have at least ‘cutoff’ k-hop neighbors for testing. Shape: [num_lcc_nodes].

  • prediction_sets (list) – The precomputed prediction sets for each node in lcc_nodes. Each set is a list of predicted classes for that node.

Return type:

tuple (torch.Tensor, list)

predict_with_logits(logits, eval_idx, alpha=None)

Predict the prediction sets for nodes in the graph.

This method calculates the prediction sets for each node that has at least ‘cutoff’ k-hop neighbors, based on the provided logits and labels. The prediction sets are precomputed for a given empirical marginal coverage 1 - alpha, where alpha is the significance level.

Parameters:
  • logits (torch.Tensor) – A tensor containing the model’s predicted logits

  • eval_idx (torch.Tensor) – The indices of test nodes.

  • alpha (float, optional) – The significance level, so that 1 - alpha is the target marginal coverage of the prediction sets. Default is None.

Returns:

  • lcc_nodes (torch.Tensor) – The indices of the nodes that have at least ‘cutoff’ k-hop neighbors for testing. Shape: [num_lcc_nodes].

  • prediction_sets (list) – The precomputed prediction sets for each node in lcc_nodes. Each set is a list of predicted classes for that node.

Return type:

tuple (torch.Tensor, list)

trainer

CFGNNTrainer

Method: Conformalized GNN Paper: Uncertainty Quantification over Graph with Conformalized Graph Neural Networks (Huang et al., 2023).

class torchcp.graph.trainer.CFGNNTrainer(model, graph_data, hidden_channels=64, num_layers=2, alpha=0.1, optimizer_class=torch.optim.Adam, optimizer_params={'lr': 0.001, 'weight_decay': 0.0005}, device=None)

Method: Conformalized GNN
Paper: Uncertainty Quantification over Graph with Conformalized Graph Neural Networks (Huang et al., 2023)
Link: https://openreview.net/pdf?id=ygjQCOyNfh
Github: https://github.com/snap-stanford/conformalized-gnn

A class for training and evaluating a Conformalized GNN (CF-GNN) for node classification tasks on graphs. The model uses a Graph Neural Network (GNN) as the backbone and integrates conformal prediction methods for uncertainty quantification and model calibration.

Parameters:
  • model (torch.nn.Module) – The backbone model.

  • graph_data (torch_geometric.data.Data) – The input graph data in PyG format, with the following attributes:
      - x (Tensor): The node features.
      - edge_index (Tensor): The edge index, shape (2, num_edges).
      - edge_weight (Tensor, optional): The edge weights, shape (num_edges,).
      - train_idx: The indices of the training nodes.
      - val_idx: The indices of the validation nodes.
      - calib_train_idx: The indices of the training nodes for CF-GNN.

  • hidden_channels (int) – Number of hidden channels for the CF-GNN layers.

  • alpha (float, optional) – The significance level for conformal prediction. Default is 0.1.

  • optimizer_class (torch.optim.Optimizer) – Optimizer class for temperature parameter. Default is torch.optim.Adam.

  • optimizer_params (dict) – Parameters passed to the optimizer constructor. Default is {‘weight_decay’: 5e-4, ‘lr’: 0.001}.

train(n_epochs=5000)

Trains the CF-GNN model for a specified number of epochs and returns the corrected logits.

Parameters:

n_epochs (int) – The number of training epochs. Default is 5000.

Returns:

The best model of CF-GNN.

Return type:

model