torchcp.classification

score function

LAC

Least Ambiguous Classifiers (LAC)

APS

Method: Adaptive Prediction Sets (APS) Paper: Classification with Valid and Adaptive Coverage (Romano et al., 2020) Link: https://proceedings.neurips.cc/paper/2020/file/244edd7e85dc81602b7615cd705545f5-Paper.pdf

RAPS

Method: Regularized Adaptive Prediction Sets Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020) Link: https://arxiv.org/abs/2009.14193

SAPS

Method: Sorted Adaptive Prediction Sets Paper: Conformal Prediction for Deep Classifier via Label Ranking (Huang et al., 2023) Link: https://arxiv.org/abs/2310.06430 Github: https://github.com/ml-stat-Sustech/conformal_prediction_via_label_ranking

Margin

Method: Margin non-conformity score Paper: Bias reduction through conditional conformal prediction (Löfström et al., 2015) Link: https://dl.acm.org/doi/abs/10.3233/IDA-150786

TOPK

Method: TOPK prediction sets Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020) Link: https://arxiv.org/abs/2009.14193

KNN

Method: K-Nearest Neighbor non-conformity score Paper: Hedging Predictions in Machine Learning (Gammerman et al., 2016).

EntmaxScore

Score functions based on gamma-entmax transformations as described in 'Sparse Activations as Conformal Predictors' (Campos et al., AISTATS 2025).

class torchcp.classification.score.LAC(score_type='softmax')

Least Ambiguous Classifiers (LAC)

Parameters:

score_type (Union[str, Callable], optional) – Specifies how to transform logits. Defaults to “softmax”.
  • If str: use one of the predefined functions {“softmax”, “identity”, “log_softmax”, “log”}.

  • If callable: a custom function that takes and returns a torch.Tensor.

transform

The transformation function applied to logits.

Type:

callable

Examples::
>>> lac = LAC(score_type="softmax")
>>> logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 1.0]])
>>> scores_all = lac(logits)
>>> # Using custom function
>>> custom_transform = lambda x: x.sigmoid()
>>> lac = LAC(score_type=custom_transform)
>>> scores_custom = lac(logits)

References

Sadinle, M. et al., (2016). Least ambiguous set-valued classifiers with bounded error levels. Journal of the American Statistical Association, 111(515), 1648-1658.

Link: https://arxiv.org/abs/1609.00451
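As a rough illustration of the idea behind LAC, the sketch below computes one common form of the score in plain Python: one minus the softmax probability of each class. The helper names `softmax` and `lac_scores` are hypothetical and are not part of TorchCP; the library's own implementation operates on tensors and may differ in detail.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lac_scores(logits):
    """Return one non-conformity score per class: 1 - softmax probability."""
    return [1.0 - p for p in softmax(logits)]

# The most probable class receives the smallest (most conforming) score.
scores = lac_scores([2.0, 1.0, 0.1])
```

A lower score means the class conforms better to the model's prediction, so confident classes enter the prediction set first.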

class torchcp.classification.score.APS(score_type='softmax', randomized=True)

Method: Adaptive Prediction Sets (APS) Paper: Classification with Valid and Adaptive Coverage (Romano et al., 2020) Link: https://proceedings.neurips.cc/paper/2020/file/244edd7e85dc81602b7615cd705545f5-Paper.pdf

Parameters:
  • score_type (str, optional) – The type of score to use. Default is “softmax”.

  • randomized (bool, optional) – Whether to use randomized scores. Default is True.

_calculate_all_label(probs)

Calculate non-conformity scores for all classes.

_sort_sum(probs)

Sort probabilities and calculate cumulative sum.

_calculate_single_label(probs, label)

Calculate non-conformity score for the ground-truth label.

Examples::
>>> aps = APS(score_type="softmax", randomized=True)
>>> probs = torch.tensor([[0.1, 0.4, 0.5], [0.3, 0.3, 0.4]])
>>> scores = aps._calculate_all_label(probs)
>>> print(scores)
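To make the APS score concrete, here is a minimal plain-Python sketch for a single sample, following the description in Romano et al. (2020): the probability mass of all classes ranked above the label, plus the label's own probability (scaled by a uniform random number when randomized). The function `aps_score` and its tie-breaking by class index are assumptions for illustration, not TorchCP's code.

```python
import random

def aps_score(probs, label, randomized=True, rng=random):
    """APS non-conformity score of `label` for one probability vector."""
    p_label = probs[label]
    # Mass of the classes ranked strictly above the label
    # (ties are broken by class index, an arbitrary choice for this sketch).
    above = sum(
        p for j, p in enumerate(probs)
        if p > p_label or (p == p_label and j < label)
    )
    u = rng.random() if randomized else 1.0
    return above + u * p_label

probs = [0.1, 0.4, 0.5]
deterministic = [aps_score(probs, k, randomized=False) for k in range(3)]
```

With `randomized=False` the score is simply the cumulative probability down to and including the label.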
class torchcp.classification.score.RAPS(score_type='softmax', randomized=True, penalty=0, kreg=0)

Method: Regularized Adaptive Prediction Sets Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020) Link: https://arxiv.org/abs/2009.14193

Parameters:
  • penalty (float) – The weight of regularization. When penalty = 0, RAPS=APS.

  • kreg (int, optional) – The rank of regularization which is an integer in [0, classes_num]. Default is 0.

  • score_type (str, optional) – The type of score to use. Default is “softmax”.

  • randomized (bool, optional) – Whether to use randomized scores. Default is True.

Examples::
>>> raps = RAPS(penalty=0.1, kreg=1, score_type="softmax", randomized=True)
>>> probs = torch.tensor([[0.1, 0.4, 0.5], [0.3, 0.3, 0.4]])
>>> scores_all = raps._calculate_all_label(probs)
>>> print(scores_all)
>>> scores_single = raps._calculate_single_label(probs, torch.tensor([2, 1]))
>>> print(scores_single)
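The effect of `penalty` and `kreg` can be sketched in plain Python as the APS score plus a rank-based regularizer, as described by Angelopoulos et al. (2020). The function `raps_score` is hypothetical, and the random variable is exposed as an explicit argument `u` so the result is reproducible; TorchCP handles randomization internally.

```python
def raps_score(probs, label, penalty=0.1, kreg=1, u=1.0):
    """RAPS score: APS score plus penalty * max(0, rank - kreg)."""
    # Class indices sorted by decreasing probability (stable for ties).
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    rank = order.index(label) + 1          # 1-indexed rank of the label
    above = sum(probs[j] for j in order[:rank - 1])
    regularizer = penalty * max(0, rank - kreg)
    return above + u * probs[label] + regularizer

probs = [0.1, 0.4, 0.5]
with_penalty = raps_score(probs, label=1, penalty=0.1, kreg=1)
without_penalty = raps_score(probs, label=1, penalty=0.0, kreg=1)
```

Setting `penalty=0` removes the regularizer, which matches the statement above that RAPS then reduces to APS.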
class torchcp.classification.score.SAPS(score_type='softmax', randomized=True, weight=0.2)

Method: Sorted Adaptive Prediction Sets Paper: Conformal Prediction for Deep Classifier via Label Ranking (Huang et al., 2023) Link: https://arxiv.org/abs/2310.06430 Github: https://github.com/ml-stat-Sustech/conformal_prediction_via_label_ranking

Parameters:
  • weight (float) – The weight of label ranking. Must be a positive value.

  • score_type (str, optional) – The type of score to use. Default is “softmax”.

  • randomized (bool, optional) – Whether to use randomized scores. Default is True.

Examples::
>>> saps = SAPS(weight=0.5, score_type="softmax", randomized=True)
>>> probs = torch.tensor([[0.1, 0.4, 0.5], [0.3, 0.3, 0.4]])
>>> scores_all = saps._calculate_all_label(probs)
>>> print(scores_all)
>>> scores_single = saps._calculate_single_label(probs, torch.tensor([2, 1]))
>>> print(scores_single)
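The role of `weight` can be illustrated with a plain-Python sketch of the SAPS score as defined in Huang et al. (2023): only the maximum probability is kept, and every further rank adds a constant `weight`. The function `saps_score` is a hypothetical helper, and `u` stands in for the uniform random variable that a randomized score would draw.

```python
def saps_score(probs, label, weight=0.2, u=1.0):
    """SAPS score based on the maximum probability and the label rank."""
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    rank = order.index(label) + 1          # 1-indexed rank of the label
    p_max = probs[order[0]]
    if rank == 1:
        return u * p_max
    return p_max + (rank - 2 + u) * weight

probs = [0.1, 0.4, 0.5]
saps_all = [saps_score(probs, k, weight=0.5) for k in range(3)]
```

Because only the top probability enters the score, the small and often unreliable tail probabilities do not influence set membership.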
class torchcp.classification.score.Margin(score_type='softmax')

Method: Margin non-conformity score Paper: Bias reduction through conditional conformal prediction (Löfström et al., 2015) Link: https://dl.acm.org/doi/abs/10.3233/IDA-150786

Parameters:

score_type (str, optional) – The type of score to use. Default is “softmax”.

_calculate_single_label(probs, label)

Calculate margin non-conformity score for a single label.

_calculate_all_label(probs)

Calculate margin non-conformity scores for all labels.

Examples::
>>> margin = Margin(score_type="softmax")
>>> probs = torch.tensor([[0.1, 0.4, 0.5], [0.3, 0.3, 0.4]])
>>> scores_single = margin._calculate_single_label(probs, torch.tensor([2, 1]))
>>> print(scores_single)
>>> scores_all = margin._calculate_all_label(probs)
>>> print(scores_all)
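A margin score is commonly defined as the largest probability among the other classes minus the probability of the candidate label. The sketch below assumes that definition; `margin_score` is a hypothetical plain-Python helper and TorchCP's tensor implementation may differ in detail.

```python
def margin_score(probs, label):
    """Margin score: best competing probability minus the label's probability."""
    best_other = max(p for j, p in enumerate(probs) if j != label)
    return best_other - probs[label]

probs = [0.1, 0.4, 0.5]
margins = [margin_score(probs, k) for k in range(3)]
```

The score is negative only for the top-ranked class and grows as the label falls further behind its strongest competitor.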
class torchcp.classification.score.TOPK(score_type='softmax', randomized=True)

Method: TOPK prediction sets Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020) Link: https://arxiv.org/abs/2009.14193

Parameters:
  • score_type (str, optional) – The type of score to use. Default is “softmax”.

  • randomized (bool, optional) – Whether to use randomized scores. Default is True.

Examples::
>>> topk = TOPK(score_type="softmax", randomized=True)
>>> probs = torch.tensor([[0.1, 0.4, 0.5], [0.3, 0.3, 0.4]])
>>> scores = topk._calculate_all_label(probs)
>>> print(scores)
class torchcp.classification.score.KNN(features, labels, num_classes, k=1, p=2, batch=None)

Method: K-Nearest Neighbor non-conformity score Paper: Hedging Predictions in Machine Learning (Gammerman et al., 2016). Link: https://ieeexplore.ieee.org/document/8129828.

Parameters:
  • features (torch.Tensor) – The input features of training data.

  • labels (torch.Tensor) – The labels of training data.

  • num_classes (int) – The number of classes.

  • k (int, optional) – The number of neighbors. Default is 1.

  • p (float or str, optional) – p value for the p-norm distance to calculate between each vector pair. Default is 2. Optional: float or “cosine”.

  • batch (int, optional) – Batch size for distance calculation. Default is None. Set according to your GPU memory; too large may cause out of memory, too small may be slow.

Examples::
>>> features = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
>>> labels = torch.tensor([0, 1, 0])
>>> knn = KNN(features, labels, num_classes=2, k=1, p=2, batch=2)
>>> test_features = torch.tensor([[1.5, 2.5], [2.5, 3.5]])
>>> scores = knn(test_features)
>>> print(scores)
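One classical nearest-neighbor non-conformity score is the ratio between the distance to the k nearest training points with the candidate label and the distance to the k nearest points with any other label. The sketch below assumes that formulation with Euclidean distance (p=2); `knn_score` is a hypothetical helper and is not guaranteed to match TorchCP's KNN class exactly.

```python
import math

def knn_score(train_x, train_y, x, label, k=1):
    """Ratio of same-class to other-class k-nearest-neighbor distances."""
    same = sorted(math.dist(x, t) for t, y in zip(train_x, train_y) if y == label)
    other = sorted(math.dist(x, t) for t, y in zip(train_x, train_y) if y != label)
    # Small when x is close to points of `label` and far from the rest.
    return sum(same[:k]) / sum(other[:k])

train_x = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
train_y = [0, 1, 0]
score_0 = knn_score(train_x, train_y, [1.2, 2.2], label=0)
score_1 = knn_score(train_x, train_y, [1.2, 2.2], label=1)
```

The sketch does not guard against a zero denominator, which occurs when the test point coincides with a training point of another class.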
class torchcp.classification.score.EntmaxScore(gamma=2.0)

Score functions based on gamma-entmax transformations as described in ‘Sparse Activations as Conformal Predictors’ (Campos et al., AISTATS 2025).

Parameters:

gamma (float, optional) – The gamma parameter for entmax transformation. Defaults to 2.0 (sparsemax).
  • gamma = 1: softmax with log-margin score

  • gamma = 2: sparsemax

  • gamma > 1: sparse entmax

gamma

The gamma parameter for entmax.

Type:

float

temperature

The temperature scaling factor.

Type:

float

Examples::
>>> entmax = EntmaxScore(gamma=2.0, temperature=1.0)  # Sparsemax
>>> logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 1.0]])
>>> scores_all = entmax(logits)
>>> # Using gamma=1 (softmax with log-margin)
>>> entmax = EntmaxScore(gamma=1.0, temperature=0.5)
>>> scores_custom = entmax(logits, label=torch.tensor([1, 0]))

References

Campos, M. M. et al., (2025). Sparse Activations as Conformal Predictors. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS).
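For intuition about the gamma = 2 case, the sketch below implements the standard sparsemax projection (Euclidean projection of the logits onto the probability simplex) in plain Python. It only illustrates the transformation, not the conformal score built on top of it, and `sparsemax` here is a hypothetical helper rather than TorchCP's implementation.

```python
def sparsemax(logits):
    """Project logits onto the probability simplex (sparsemax)."""
    z_sorted = sorted(logits, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        if 1.0 + k * z > cumsum:
            # Class k is still in the support; update the threshold.
            tau = (cumsum - 1.0) / k
        else:
            break
    return [max(z - tau, 0.0) for z in logits]

sparse = sparsemax([2.0, 1.0, 0.1])   # all mass on the first class
dense = sparsemax([1.0, 0.5])         # both classes stay in the support
```

Unlike softmax, the output can contain exact zeros, which is what allows the support of the distribution to be read as a set of candidate labels.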

predictor

SplitPredictor

Split Conformal Prediction (Vovk et al., 2005).

ClassConditionalPredictor

Method: Class-conditional conformal prediction Paper: Conditional validity of inductive conformal predictors (Vovk et al., 2012) Link: https://proceedings.mlr.press/v25/vovk12.html

ClusteredPredictor

Method: Clustered Conformal Predictor Paper: Class-Conditional Conformal Prediction with Many Classes (Ding et al., 2023) Link: https://arxiv.org/abs/2306.09335 Github: https://github.com/tiffanyding/class-conditional-conformal

RC3PPredictor

Rank Calibrated Class-conditional Conformal Prediction (RC3P) as described in "Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration" by Shi et al., NeurIPS 2024.

WeightedPredictor

Method: Weighted Conformal Prediction Paper: Conformal Prediction Under Covariate Shift (Tibshirani et al., 2019) Link: https://arxiv.org/abs/1904.06019 Github: https://github.com/ryantibs/conformal/

class torchcp.classification.predictor.SplitPredictor(score_function, model=None, temperature=1, alpha=0.1, device=None)

Split Conformal Prediction (Vovk et al., 2005). Book: https://link.springer.com/book/10.1007/978-3-031-06649-8.

Parameters:
  • score_function (callable) – Non-conformity score function.

  • model (torch.nn.Module, optional) – A PyTorch model. Default is None.

  • temperature (float, optional) – The temperature of Temperature Scaling. Default is 1.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

calibrate(cal_dataloader, alpha=None)

Calibrate the predictor on the calibration set.

Parameters:
  • cal_dataloader (torch.utils.data.DataLoader) – A dataloader of the calibration set.

  • alpha (float) – The significance level. Default is None, in which case the alpha set on the predictor is used.
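Calibration in split conformal prediction amounts to taking a finite-sample-corrected quantile of the calibration scores. The sketch below shows that standard computation in plain Python; `conformal_threshold` is a hypothetical helper, and returning infinity when the calibration set is too small is one common convention, assumed here.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is included.
        return float("inf")
    return sorted(cal_scores)[rank - 1]

cal_scores = [i / 10 for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
q_hat = conformal_threshold(cal_scores, alpha=0.2)
```

A test label is then placed in the prediction set whenever its score does not exceed `q_hat`.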

evaluate(val_dataloader: DataLoader) → Dict[str, float]

Evaluate prediction sets on validation dataset.

Parameters:

val_dataloader (torch.utils.data.DataLoader) – Dataloader for validation set.

Returns:

Dictionary containing evaluation metrics:
  • Coverage_rate: Empirical coverage rate on validation set

  • Average_size: Average size of prediction sets

Return type:

dict
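The two metrics can be sketched in plain Python as follows. The dictionary keys follow the names listed above, while the helper `evaluate_sets` and the use of Python sets to represent prediction sets are assumptions made for this example.

```python
def evaluate_sets(pred_sets, labels):
    """Compute empirical coverage and average prediction-set size."""
    n = len(labels)
    coverage = sum(y in s for s, y in zip(pred_sets, labels)) / n
    avg_size = sum(len(s) for s in pred_sets) / n
    return {"Coverage_rate": coverage, "Average_size": avg_size}

metrics = evaluate_sets([{0, 1}, {2}, {1}], labels=[1, 2, 0])
```

Coverage close to 1 - alpha together with a small average size indicates prediction sets that are both valid and informative.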

predict(x_batch)

Generate prediction sets for a batch of instances.

Parameters:

x_batch (torch.Tensor) – A batch of instances.

Returns:

A list of prediction sets for each instance in the batch.

Return type:

list

predict_p(x_batch, y_batch=None, smooth=False)

Compute p-values for conformal prediction.

Parameters:
  • x_batch (torch.Tensor) – A batch of instances.

  • y_batch (torch.Tensor) – A batch of labels for instances. Default is None.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

p-values for each test sample and class, shape (n_test, k)

Return type:

Tensor
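A conformal p-value for one candidate label compares its score with the calibration scores. The sketch below shows the standard unsmoothed and smoothed (tie-randomized) forms in plain Python; `conformal_p_value` is a hypothetical helper, and `u` stands in for the uniform random number that smoothing would draw.

```python
def conformal_p_value(cal_scores, test_score, smooth=False, u=1.0):
    """p-value of a test score relative to the calibration scores."""
    n = len(cal_scores)
    greater = sum(s > test_score for s in cal_scores)
    equal = sum(s == test_score for s in cal_scores)
    if smooth:
        # Ties (including the test point itself) are weighted by u in [0, 1].
        return (greater + u * (equal + 1)) / (n + 1)
    return (greater + equal + 1) / (n + 1)

cal_scores = [0.1, 0.2, 0.3, 0.4]
p_plain = conformal_p_value(cal_scores, 0.25)
p_smooth = conformal_p_value(cal_scores, 0.3, smooth=True, u=0.5)
```

A label is kept in the prediction set at level alpha when its p-value exceeds alpha.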

predict_with_logits(logits, q_hat=None)

Generate prediction sets from logits.

Parameters:
  • logits (torch.Tensor) – Model output before softmax.

  • q_hat (torch.Tensor, optional) – The conformal threshold. Default is None.

Returns:

A list of prediction sets for each instance in the batch.

Return type:

list

class torchcp.classification.predictor.ClassConditionalPredictor(score_function, model=None, temperature=1, alpha=0.1, device=None)

Method: Class-conditional conformal prediction Paper: Conditional validity of inductive conformal predictors (Vovk et al., 2012) Link: https://proceedings.mlr.press/v25/vovk12.html

Parameters:
  • score_function (callable) – Non-conformity score function.

  • model (torch.nn.Module, optional) – A PyTorch model. Default is None.

  • temperature (float, optional) – The temperature of Temperature Scaling. Default is 1.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

q_hat

The calibrated threshold for each class.

Type:

torch.Tensor

calculate_threshold(logits, labels, alpha=None)

Calculate the class-wise conformal prediction thresholds.

Parameters:
  • logits (torch.Tensor) – The logits output from the model.

  • labels (torch.Tensor) – The ground truth labels.

  • alpha (float) – The significance level. Default is None.
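Class-wise calibration repeats the split conformal quantile separately for the calibration scores of each class. The following plain-Python sketch illustrates this under the assumption that classes with too few calibration points receive an infinite threshold; `classwise_thresholds` is a hypothetical helper, not TorchCP's method.

```python
import math

def classwise_thresholds(cal_scores, cal_labels, num_classes, alpha=0.1):
    """One conformal threshold per class, from that class's true-label scores."""
    thresholds = []
    for k in range(num_classes):
        scores_k = sorted(s for s, y in zip(cal_scores, cal_labels) if y == k)
        n_k = len(scores_k)
        rank = math.ceil((n_k + 1) * (1 - alpha))
        # Fall back to infinity when class k has too few calibration points.
        thresholds.append(scores_k[rank - 1] if 0 < rank <= n_k else float("inf"))
    return thresholds

q_hat = classwise_thresholds(
    cal_scores=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    cal_labels=[0, 0, 0, 0, 1, 1],
    num_classes=3,
    alpha=0.25,
)
```

Rare classes end up with conservative thresholds, which is the motivation for pooling similar classes in clustered variants.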

class torchcp.classification.predictor.ClusteredPredictor(score_function, model=None, temperature=1, alpha=0.1, ratio_clustering='auto', num_clusters='auto', split='random', device=None)

Method: Clustered Conformal Predictor Paper: Class-Conditional Conformal Prediction with Many Classes (Ding et al., 2023) Link: https://arxiv.org/abs/2306.09335 Github: https://github.com/tiffanyding/class-conditional-conformal

The class implements class-conditional conformal prediction with many classes.

Parameters:
  • score_function (callable) – A non-conformity score function.

  • model (torch.nn.Module, optional) – A PyTorch model. Default is None.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • ratio_clustering (str or float, optional) – The ratio of examples in the calibration dataset used to cluster classes. Default is “auto”.

  • num_clusters (str or int, optional) – The number of clusters. If ratio_clustering is “auto”, the number of clusters is automatically computed. Default is “auto”.

  • split (str, optional) – The method to split the dataset into clustering dataset and calibration set. Options are ‘proportional’, ‘doubledip’, or ‘random’. Default is ‘random’.

  • temperature (float, optional) – The temperature of Temperature Scaling. Default is 1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

__ratio_clustering

The ratio of examples in the calibration dataset used to cluster classes.

Type:

str or float

__num_clusters

The number of clusters.

Type:

str or int

__split

The method to split the dataset into clustering dataset and calibration set.

Type:

str

calculate_threshold(logits, labels, alpha=None)

Calculate the class-wise conformal prediction thresholds.

Parameters:
  • logits (torch.Tensor) – The logits output from the model.

  • labels (torch.Tensor) – The ground truth labels.

  • alpha (float) – The significance level. Default is None.

class torchcp.classification.predictor.RC3PPredictor(score_function, model=None, alpha=0.1, device=None)

Rank Calibrated Class-conditional Conformal Prediction (RC3P) as described in “Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration” by Shi et al., NeurIPS 2024.

Parameters:
  • score_function (callable) – Non-conformity score function (e.g., APS or RAPS).

  • model (torch.nn.Module, optional) – A PyTorch model. Default is None.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

calculate_threshold(logits, labels, alpha=None)

Perform class-wise calibration for conformal thresholds and label ranks.

Parameters:
  • logits (torch.Tensor) – Model logits for calibration data.

  • labels (torch.Tensor) – True labels for calibration data.

  • alpha (float) – Target miscoverage rate. Default is None.

calibrate(cal_dataloader, alpha=None)

Calibrate the RC3P predictor using class-wise conformal scores and label ranks.

Parameters:
  • cal_dataloader (DataLoader) – Calibration data loader.

  • alpha (float) – Target miscoverage rate (0 < alpha < 1). Default is None.

predict(x_batch)

Generate prediction sets for a batch of instances using RC3P.

Parameters:

x_batch (torch.Tensor) – A batch of input instances.

Returns:

Prediction sets for each instance in the batch (as boolean tensors).

Return type:

torch.Tensor

predict_with_logits(logits)

Generate prediction sets from logits using class-wise thresholds and rank limits.

Parameters:

logits (torch.Tensor) – Model logits for test data (B, K).

Returns:

Prediction sets for each instance (as boolean tensors).

Return type:

torch.Tensor

class torchcp.classification.predictor.WeightedPredictor(score_function, model=None, temperature=1, alpha=0.1, image_encoder=None, domain_classifier=None, device=None)

Method: Weighted Conformal Prediction Paper: Conformal Prediction Under Covariate Shift (Tibshirani et al., 2019) Link: https://arxiv.org/abs/1904.06019 Github: https://github.com/ryantibs/conformal/

Parameters:
  • score_function (callable) – Non-conformity score function.

  • model (torch.nn.Module) – A PyTorch model.

  • alpha (float, optional) – The significance level. Default is 0.1.

  • image_encoder (torch.nn.Module) – A PyTorch model to generate the embedding feature of an input image.

  • domain_classifier (torch.nn.Module, optional) – A PyTorch model (a binary classifier) to predict the probability that an embedding feature comes from the source domain. Default is None.

  • temperature (float, optional) – The temperature of Temperature Scaling. Default is 1.

  • device (torch.device, optional) – The device on which the model is located. Default is None.

calculate_threshold(logits, labels, alpha=None)

Calculate the conformal prediction threshold.

Parameters:
  • logits (torch.Tensor) – The logits output from the model.

  • labels (torch.Tensor) – The ground truth labels.

  • alpha (float) – The significance level. Default is None.
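Under covariate shift, the threshold becomes a weighted quantile in which each calibration score carries an importance weight and the test point contributes a point mass at infinity (Tibshirani et al., 2019). The sketch below shows this computation in plain Python; `weighted_threshold` and its arguments are hypothetical names, and the importance weights are assumed to be given.

```python
def weighted_threshold(cal_scores, cal_weights, test_weight, alpha=0.1):
    """Weighted (1 - alpha) quantile with the test point's mass at infinity."""
    total = sum(cal_weights) + test_weight
    cumulative = 0.0
    for score, w in sorted(zip(cal_scores, cal_weights)):
        cumulative += w / total
        if cumulative >= 1 - alpha:
            return score
    # The calibration mass alone never reaches 1 - alpha.
    return float("inf")

scores = [float(i) for i in range(1, 10)]          # 1.0 ... 9.0
uniform = weighted_threshold(scores, [1.0] * 9, test_weight=1.0, alpha=0.25)
shifted = weighted_threshold(scores, [8.0] + [1.0] * 8, test_weight=1.0, alpha=0.25)
```

With equal weights the result coincides with the ordinary split conformal quantile; up-weighting low-score calibration points lowers the threshold.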

calibrate(cal_dataloader, alpha=None)

Calibrate the model using the calibration set.

Parameters:
  • cal_dataloader (torch.utils.data.DataLoader) – A dataloader of the calibration set.

  • alpha (float) – The significance level. Default is None.

evaluate(val_dataloader: DataLoader) → Dict[str, float]

Evaluate prediction sets on validation dataset using domain adaptation.

This method trains a domain classifier if not provided, computes importance weights for validation set, generates prediction sets and calculates metrics.

Parameters:

val_dataloader (DataLoader) – Dataloader for validation set.

Returns:

Dictionary containing evaluation metrics:
  • Coverage_rate: Empirical coverage rate on validation set

  • Average_size: Average size of prediction sets

Return type:

dict

Raises:

ValueError – If calibration has not been performed first.

predict(x_batch)

Generate prediction sets for a batch of instances.

Parameters:

x_batch (torch.Tensor) – A batch of instances.

Returns:

A list of prediction sets for each instance in the batch.

Return type:

list

loss function

ConfTrLoss

Conformal Training (ConfTr) Loss Implementation.

ConfTSLoss

Conformal Temperature Scaling (ConfTS).

CDLoss

Implementation of Conformal Discriminative Loss (CDLoss) for efficient conformal prediction.

UncertaintyAwareLoss

A loss function used for conformalized uncertainty-aware training of deep multi-class classifiers

class torchcp.classification.loss.ConfTrLoss(predictor, alpha, fraction, soft_qunatile=True, epsilon=0.0001, loss_type='valid', target_size=1, loss_transform='square')

Conformal Training (ConfTr) Loss Implementation.

A method for training neural networks with built-in conformal prediction guarantees. Optimizes models to output efficient prediction sets while maintaining coverage.

Parameters:
  • predictor (torchcp.classification.Predictor) – An instance of the CP predictor class.

  • alpha (float) – The significance level for each training batch.

  • fraction (float) – The fraction of the calibration set in each training batch. Must be a value in (0, 1).

  • soft_qunatile (bool, optional) – Whether to use soft quantile. Default is True.

  • epsilon (float, optional) – A temperature value. Default is 1e-4.

  • loss_type (str) – The selected (multi-selected) loss functions, which can be “valid”, “classification”, “probs”, “coverage”.

  • target_size (int, optional) – The target prediction-set size; must be 0 or 1. Default is 1.

  • loss_transform (str, optional) – A transform for loss. Default is “square”. Can be “square”, “abs”, or “log”.

Examples::
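>>> import torch
>>> from torchcp.classification.loss import ConfTrLoss
>>> from torchcp.classification.predictor import SplitPredictor
>>> from torchcp.classification.score import LAC
>>> predictor = SplitPredictor(score_function=LAC(score_type="softmax"))
>>> conftr = ConfTrLoss(predictor=predictor, alpha=0.05, fraction=0.2, loss_type="valid")
>>> logits = torch.randn(100, 10, requires_grad=True)  # gradients are needed for backward()
>>> labels = torch.randint(0, 10, (100,))  # labels drawn from all 10 classes
>>> loss = conftr(logits, labels)
>>> loss.backward()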
Reference:

Stutz et al. “Learning Optimal Conformal Classifiers”, ICLR 2022, https://arxiv.org/abs/2110.09192

Github: https://github.com/google-deepmind/conformal_training
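The central device in conformal training is a differentiable surrogate for the prediction-set size: hard membership (score below the threshold) is replaced by a sigmoid with a temperature. The plain-Python sketch below follows that idea from Stutz et al.; the names `soft_set_size_loss`, `tau`, and `temperature` are assumptions for illustration and do not mirror TorchCP's internals.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def soft_set_size_loss(scores, tau, temperature=0.1, target_size=1):
    """Smooth size penalty: soft set size in excess of target_size."""
    # Soft membership approaches 1 when the score is well below the threshold.
    soft_membership = [sigmoid((tau - s) / temperature) for s in scores]
    return max(0.0, sum(soft_membership) - target_size)

loss = soft_set_size_loss([0.1, 0.5, 0.9], tau=0.6, temperature=0.01)
```

A smaller temperature makes the surrogate closer to the true set size but yields sharper, less informative gradients.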

class torchcp.classification.loss.ConfTSLoss(predictor, alpha, fraction=0.5, soft_qunatile=True)

Conformal Temperature Scaling (ConfTS).

The class implements the loss function of conformal temperature scaling. It supports multiple loss functions and allows for flexible configuration of the training process.

Parameters:
  • predictor (torchcp.classification.Predictor) – An instance of the CP predictor class.

  • alpha (float) – The significance level for each training batch.

  • fraction (float) – The fraction of the calibration set in each training batch. Must be a value in (0, 1). Default is 0.5.

  • soft_qunatile (bool, optional) – Whether to use soft quantile. Default is True.

Examples::
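>>> predictor = torchcp.classification.SplitPredictor(score_function=APS(score_type="softmax", randomized=False))
>>> confts = ConfTSLoss(predictor=predictor, alpha=0.1, fraction=0.2)  # alpha is required; 0.1 is illustrative
>>> logits = torch.randn(100, 10, requires_grad=True)  # gradients are needed for backward()
>>> labels = torch.randint(0, 10, (100,))  # labels drawn from all 10 classes
>>> loss = confts(logits, labels)
>>> loss.backward()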
Reference:

Xi et al. “Delving into Temperature Scaling for Adaptive Conformal Prediction” (2024), https://arxiv.org/abs/2402.04344

forward(logits, labels)

Compute the ConfTS loss for a batch of logits and labels.

Note

Call the loss module instance itself rather than forward() directly: calling the instance runs any registered hooks, while calling forward() silently skips them.

class torchcp.classification.loss.CDLoss(predictor, epsilon=0.0001)

Implementation of Conformal Discriminative Loss (CDLoss) for efficient conformal prediction.

This loss function encourages the model to output prediction sets that: 1. Contain the true label with high probability 2. Are as small as possible for efficiency

The loss is computed by measuring the probability of each class being included in the prediction set relative to the true label’s score.

Parameters:
  • predictor (torchcp.classification.Predictor) – Predictor instance that defines the scoring mechanism for conformal prediction.

  • epsilon (float, optional) – Temperature parameter that controls the sharpness of the sigmoid function. Smaller values create sharper boundaries. Default: 1e-4

Reference:

Liu et al. “C-Adapter: Adapting Deep Classifiers for Efficient Conformal Prediction Sets”. arXiv:2410.09408, 2024.

forward(logits, labels)

Compute the Conformal Discriminative Loss for a batch of predictions.

Parameters:
  • logits (Tensor) – Model output logits with shape (batch_size, num_classes)

  • labels (Tensor) – Ground truth class labels with shape (batch_size,)

Returns:

Scalar loss value computed as the weighted average of prediction set probabilities across all classes and samples.

Return type:

Tensor

Note

Implementation follows Equation (4) from the paper, using sigmoid function to compute smooth approximation of prediction set membership.

class torchcp.classification.loss.UncertaintyAwareLoss

A loss function used for conformalized uncertainty-aware training of deep multi-class classifiers

Examples

>>> conflearn_loss_fn = UncertaintyAwareLoss()
>>> output = torch.randn(100, 10, requires_grad=True)  # gradients are needed for backward()
>>> target = torch.randint(0, 10, (100,))  # labels drawn from all 10 classes
>>> Z_batch = torch.randint(0, 2, (100,))
>>> loss = conflearn_loss_fn(output, target, Z_batch)
>>> loss.backward()
Reference:

Einbinder et al. “Training Uncertainty-Aware Classifiers with Conformalized Deep Learning” (2022), https://arxiv.org/abs/2205.05878

compute_loss(y_train_pred, y_train_batch)

Computes the conformal loss for a given batch of predictions and ground truth.

Parameters:
  • y_train_pred (torch.Tensor) – The model’s predicted logits for the batch.

  • y_train_batch (torch.Tensor) – The ground truth labels for the batch.

Returns:

The conformal loss for the batch.

Return type:

torch.Tensor

forward(output, target)

Forward pass of the conformal loss function. The loss is computed by iterating over different groupings in Z_batch, applying the conformal loss for each group, and averaging the loss over all groups.

Parameters:
  • output (torch.Tensor) – The model’s output logits (predictions before softmax).

  • target (torch.Tensor) – The ground truth labels.

Returns:

The computed loss for the given batch.

Return type:

torch.Tensor

trainer

BaseTrainer

Abstract base trainer class that handles basic model setup and device configuration.

ConfTSTrainer

Conformal Temperature Scaling Trainer.

TSTrainer

Temperature Scaling Trainer for model calibration.

OrdinalTrainer

A trainer for training ordinal classifiers.

UncertaintyAwareTrainer

Conformalized uncertainty-aware training of deep multi-class classifiers

SCPOTrainer

Trainer for Surrogate Conformal Predictor Optimization.

class torchcp.classification.trainer.BaseTrainer(model: Module, device: device | None = None, verbose: bool = True)

Abstract base trainer class that handles basic model setup and device configuration.

Parameters:
  • model (torch.nn.Module) – Neural network model to be trained

  • device (torch.device, optional) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool) – Whether to show training progress Default: True

load_model(path: str) → None

Load model state dict from disk.

Parameters:

path – Path to saved model weights

save_model(path: str) → None

Save model state dict to disk.

Parameters:

path – Path to save model weights

abstract train(train_loader: DataLoader, val_loader: DataLoader | None = None, **kwargs) → Module

Train the model. Must be implemented by subclasses.

Parameters:
  • train_loader – DataLoader for training data

  • val_loader – Optional DataLoader for validation data

  • **kwargs – Additional training arguments

Returns:

Trained model

Return type:

torch.nn.Module

class torchcp.classification.trainer.ConfTSTrainer(model: Module, alpha: float, init_temperature: float, device: device | None = None, verbose: bool = True)

Conformal Temperature Scaling Trainer.

Parameters:
  • model (torch.nn.Module) – Base neural network model to be calibrated.

  • init_temperature (float) – Initial value for temperature scaling parameter.

  • alpha (float) – Target miscoverage rate (significance level) for conformal prediction.

  • device (torch.device, optional) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool) – Whether to display training progress. Default: True.

Examples

>>> # Initialize a CNN model
>>> cnn = torchvision.models.resnet18(pretrained=True)
>>>
>>> # Create ConfTS trainer
>>> trainer = ConfTSTrainer(
...     model=cnn,
...     init_temperature=1.5,
...     alpha=0.1
... )
>>>
>>> # Train calibration (train() accepts train_loader, lr and num_epochs)
>>> trainer.train(
...     train_loader=train_loader,
...     num_epochs=10
... )
>>>
>>> # Save calibrated model
>>> trainer.save_model('calibrated_model.pth')
train(train_loader, lr=0.01, num_epochs=100)

Train temperature scaling parameter using LBFGS optimizer.

Collects logits and labels from training data, then optimizes the temperature parameter to minimize NLL loss.

Parameters:
  • train_loader (DataLoader) – DataLoader with calibration data

  • lr (float, optional) – Learning rate for LBFGS. Defaults to 0.01

  • num_epochs (int, optional) – Max LBFGS iterations. Defaults to 100

class torchcp.classification.trainer.TSTrainer(model: Module, init_temperature: float, device: device | None = None, verbose: bool = True)

Temperature Scaling Trainer for model calibration.

This trainer implements temperature scaling to calibrate neural network predictions. It optimizes a single temperature parameter that divides the logits to improve model calibration.

Parameters:
  • model (torch.nn.Module) – Base neural network model to calibrate

  • init_temperature (float) – Initial temperature scaling parameter

  • device (torch.device, optional) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool, optional) – Whether to print progress. Defaults to True

train(train_loader: DataLoader, lr: float = 0.01, num_epochs: int = 100)

Train temperature scaling parameter using LBFGS optimizer.

Collects logits and labels from training data, then optimizes the temperature parameter to minimize NLL loss.

Parameters:
  • train_loader (DataLoader) – DataLoader with calibration data

  • lr (float, optional) – Learning rate for LBFGS. Defaults to 0.01

  • num_epochs (int, optional) – Max LBFGS iterations. Defaults to 100
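The objective being minimized can be illustrated without PyTorch. The sketch below computes the negative log-likelihood of temperature-scaled logits and picks the best temperature from a grid; note that it swaps LBFGS for a simple grid search, and that `nll` and `fit_temperature` are hypothetical helpers rather than TorchCP functions.

```python
import math

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        scaled = [z / temperature for z in logits]
        m = max(scaled)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)

def fit_temperature(logits_batch, labels, grid=None):
    """Grid search over temperatures (a stand-in for LBFGS in this sketch)."""
    grid = grid or [0.25 * i for i in range(1, 41)]   # 0.25, 0.5, ..., 10.0
    return min(grid, key=lambda t: nll(logits_batch, labels, t))

# An overconfident model: very large logits but only 75% accuracy.
overconfident = [[10.0, 0.0]] * 4
labels = [0, 0, 0, 1]
best_t = fit_temperature(overconfident, labels)
```

A fitted temperature above 1 softens overconfident predictions, while a value below 1 sharpens underconfident ones.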

class torchcp.classification.trainer.OrdinalTrainer(model: Module, ordinal_config: Dict[str, str], device: device | None = None, verbose: bool = True)

A trainer for training ordinal classifiers.

This class extends the Trainer class and provides methods for training, evaluating, and predicting with ordinal classifiers. It supports various configurations and training strategies to handle ordinal data.

Parameters:
  • model (torch.nn.Module) – Base neural network model

  • ordinal_config (Dict[str, str]) – Configuration for the ordinal classifier, with two keys: phi (str), the type of phi function (“abs” or “square”), and varphi (str), the type of varphi function (“abs” or “square”). Example: {“phi”: “abs”, “varphi”: “abs”}

  • device (torch.device, optional) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool) – Whether to display training progress Default: True

Examples

>>> # Define base model
>>> backbone = torchvision.models.resnet18(pretrained=True)
>>>
>>> # Configure ordinal classifier
>>> ordinal_config = {
...     "phi": "square",
...     "varphi": "abs"
... }
>>>
>>> # Create trainer
>>> trainer = OrdinalTrainer(
...     model=backbone,
...     ordinal_config=ordinal_config
...     )
>>>
>>> # Train model
>>> trainer.train(
...     train_loader=train_loader,
...     val_loader=val_loader,
...     num_epochs=10
... )
class torchcp.classification.trainer.UncertaintyAwareTrainer(model: Module, weight: float, device: device | None = None, verbose: bool = True)

Conformalized uncertainty-aware training of deep multi-class classifiers

Parameters:
  • model (torch.nn.Module) – Neural network model to train.

  • device (torch.device) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool) – Whether to print progress. Defaults to True.

Examples

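>>> model = MyModel()
>>> # weight is required by the constructor; 0.5 is only an illustrative value
>>> trainer = UncertaintyAwareTrainer(model, weight=0.5, device=torch.device('cuda'))
>>> trainer.train(train_loader, val_loader, num_epochs=10)
>>> # assuming the trainer inherits save_model() from BaseTrainer
>>> trainer.save_model('./path/to/save')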
Reference:

Einbinder et al. “Training Uncertainty-Aware Classifiers with Conformalized Deep Learning” (2022), https://arxiv.org/abs/2205.05878

calculate_loss(output, target, Z_batch, training=True)

Calculates the total loss during training or validation.

The loss is a combination of the prediction loss and the conformal prediction loss, where the conformal loss is weighted by the hyperparameter mu.

Parameters:
  • output (torch.Tensor) – The model’s output predictions (logits).

  • target (torch.Tensor) – The true labels (ground truth).

  • Z_batch (torch.Tensor) – A tensor indicating which samples are used for conformal prediction loss.

  • training (bool) – A flag indicating whether the calculation is for training or validation (default: True).

Returns:

The computed total loss.

Return type:

torch.Tensor

split_dataloader(data_loader: DataLoader, split_ratio=0.8)

This function splits a given DataLoader into two parts based on the specified split ratio: one part is used to calculate the cross-entropy loss and the other to calculate the conformal loss. The split is done randomly, and each sample receives a binary indicator marking the part it belongs to.

Parameters:
  • data_loader (DataLoader) – The DataLoader object containing the original dataset.

  • split_ratio (float, optional) – The ratio used to split the dataset into two parts. Default: 0.8.

Returns:

A new DataLoader that contains the modified dataset with the binary labels.

Return type:

DataLoader
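
The random binary split described above can be illustrated without any library code. The following is a minimal plain-Python sketch, not the method's implementation: the function name split_indicator_sketch, the seed argument, and the choice that 0 marks the first part and 1 the second are all assumptions made for illustration.

```python
import random


def split_indicator_sketch(num_samples, split_ratio=0.8, seed=0):
    """Return one binary indicator per sample (hypothetical helper)."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)  # random assignment of samples to the two parts
    n_first = int(round(split_ratio * num_samples))
    # Assumption: 0 marks the first part, 1 marks the second part.
    z = [1] * num_samples
    for i in indices[:n_first]:
        z[i] = 0
    return z


z = split_indicator_sketch(10, split_ratio=0.8)
print(z.count(0), z.count(1))
```

With split_ratio=0.8 and ten samples, eight samples land in the first part and two in the second.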

train(train_loader: DataLoader, val_loader: DataLoader | None = None, num_epochs: int = 10)

Train the model

Parameters:
  • train_loader – DataLoader for training data

  • val_loader – Optional DataLoader for validation data

  • num_epochs – Number of training epochs

train_epoch(train_loader: DataLoader)

Trains the model for one epoch.

The function iterates through the training data and updates the model parameters using backpropagation and the optimizer.

Parameters:

train_loader (torch.utils.data.DataLoader) – The DataLoader providing the training data.

validate(val_loader: DataLoader) float

Evaluate model on validation set

Parameters:

val_loader – DataLoader for validation data

Returns:

Average validation loss

class torchcp.classification.trainer.SCPOTrainer(model: Module, alpha: float, lr: float = 0.1, lambda_val: float = 10000, gamma_val: float = 1, device: device | None = None, verbose: bool = True)

Trainer for Surrogate Conformal Predictor Optimization.

Parameters:
  • model (torch.nn.Module) – Base neural network model to be calibrated.

  • alpha (float) – The significance level for each training batch.

  • lr (float) – Learning rate for the optimizer. Default is 0.1.

  • lambda_val (float) – Weight for the coverage loss term.

  • gamma_val (float) – Inverse of the temperature value.

  • device (torch.device, optional) – Device to run the model on. If None, will automatically use GPU (‘cuda’) if available, otherwise CPU (‘cpu’) Default: None

  • verbose (bool) – Whether to display training progress. Default: True.

Examples

>>> # Define base model
>>> backbone = torchvision.models.resnet18(pretrained=True)
>>>
>>> # Create SCPO trainer
>>> trainer = SCPOTrainer(
...     model=backbone,
...     alpha=0.01,
...     verbose=True)
>>>
>>> # Train model
>>> trainer.train(
...     train_loader=train_loader,
...     num_epochs=10
... )

metrics

coverage_rate(prediction_sets, labels[, ...])

The metric for empirical coverage.

average_size(prediction_sets[, labels])

CovGap(prediction_sets, labels, alpha, ...)

The average class-conditional coverage gap.

VioClasses(prediction_sets, labels, alpha, ...)

The number of violated classes.

DiffViolation(logits, prediction_sets, ...)

Difficulty-stratified coverage violation

SSCV(prediction_sets, labels, alpha[, ...])

Size-stratified coverage violation (SSCV).

WSC(features, prediction_sets, labels[, ...])

Worst-Slice Coverage (WSC).

torchcp.classification.utils.metrics.coverage_rate(prediction_sets, labels, coverage_type='default', num_classes=None)

The metric for empirical coverage.

Parameters:
  • prediction_sets (torch.Tensor) – Boolean tensor of prediction sets (N x C), where N is number of samples and C is number of classes.

  • labels (torch.Tensor) – Ground-truth labels (N,).

  • coverage_type (str, optional) – Type of coverage rate calculation: ‘default’ for the marginal coverage rate, ‘macro’ for the average coverage rate across all classes. Defaults to ‘default’.

  • num_classes (int, optional) – Number of classes. Required when coverage_type is ‘macro’.

Returns:

Empirical coverage rate.

Return type:

float
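
As a rough illustration of the two coverage types, the sketch below recomputes them in plain Python, with nested lists standing in for the boolean tensor. It does not call the library; the name coverage_rate_sketch and the choice to skip classes that have no samples are assumptions.

```python
def coverage_rate_sketch(prediction_sets, labels, coverage_type="default", num_classes=None):
    # hits[i] is True when the set for sample i contains its true label.
    hits = [bool(row[y]) for row, y in zip(prediction_sets, labels)]
    if coverage_type == "default":
        # Marginal coverage: fraction of covered samples.
        return sum(hits) / len(hits)
    if coverage_type == "macro":
        if num_classes is None:
            raise ValueError("num_classes is required for macro coverage")
        per_class = []
        for c in range(num_classes):
            class_hits = [h for h, y in zip(hits, labels) if y == c]
            if class_hits:  # assumption: classes with no samples are skipped
                per_class.append(sum(class_hits) / len(class_hits))
        return sum(per_class) / len(per_class)
    raise ValueError(f"unknown coverage_type: {coverage_type!r}")


sets = [[True, False], [True, True], [False, True], [True, False]]
labels = [0, 0, 0, 1]
marginal = coverage_rate_sketch(sets, labels)
macro = coverage_rate_sketch(sets, labels, "macro", num_classes=2)
print(marginal, macro)
```

Here two of four samples are covered (marginal 0.5), while the per-class rates are 2/3 and 0, so the macro average is 1/3.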

torchcp.classification.utils.metrics.average_size(prediction_sets, labels=None)
torchcp.classification.utils.metrics.CovGap(prediction_sets, labels, alpha, num_classes, shot_idx=None)

The average class-conditional coverage gap.

Paper: Class-Conditional Conformal Prediction with Many Classes (Ding et al., 2023) Link: https://neurips.cc/virtual/2023/poster/70548

Parameters:
  • prediction_sets (torch.Tensor) – Boolean tensor of prediction sets (N x C).

  • labels (torch.Tensor) – Ground-truth labels (N,).

  • alpha (float) – User-specified significance level; the target coverage is 1 - alpha.

  • num_classes (int) – Number of classes.

  • shot_idx (list, optional) – Indices of classes to compute coverage gap.

Returns:

Average class-conditional coverage gap (percentage).

Return type:

float
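
A minimal plain-Python sketch of the quantity, assuming the definition from Ding et al. (100 times the mean absolute deviation of per-class coverage from 1 - alpha). The name cov_gap_sketch and the skipping of empty classes are assumptions; the library is not called.

```python
def cov_gap_sketch(prediction_sets, labels, alpha, num_classes, shot_idx=None):
    classes = range(num_classes) if shot_idx is None else shot_idx
    gaps = []
    for c in classes:
        hits = [bool(row[y]) for row, y in zip(prediction_sets, labels) if y == c]
        if not hits:  # assumption: classes with no samples are skipped
            continue
        coverage = sum(hits) / len(hits)
        gaps.append(abs(coverage - (1 - alpha)))
    # Reported as a percentage.
    return 100 * sum(gaps) / len(gaps)


sets = [[True, False], [True, False], [True, True], [False, True],
        [False, True], [True, True]]
labels = [0, 0, 0, 0, 1, 1]
gap = cov_gap_sketch(sets, labels, alpha=0.1, num_classes=2)
gap_class1 = cov_gap_sketch(sets, labels, alpha=0.1, num_classes=2, shot_idx=[1])
print(gap, gap_class1)
```

Class 0 is covered at 0.75 and class 1 at 1.0, so against a target of 0.9 the gaps are 0.15 and 0.10, giving roughly 12.5 overall and roughly 10 for class 1 alone.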

torchcp.classification.utils.metrics.VioClasses(prediction_sets, labels, alpha, num_classes)

The number of violated classes.

Paper: Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data (Kasa et al., 2023) Link: https://arxiv.org/abs/2307.01088

Parameters:
  • prediction_sets (torch.Tensor) – Boolean tensor of prediction sets (N x C).

  • labels (torch.Tensor) – Ground-truth labels (N,).

  • alpha (float) – User-specified significance level; the target coverage is 1 - alpha.

  • num_classes (int) – Number of classes.

Returns:

Number of classes with violated coverage.

Return type:

int
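
The count can be sketched in plain Python as the number of classes whose empirical coverage falls below 1 - alpha. The name vio_classes_sketch and the treatment of empty classes (ignored) are assumptions; the library is not called.

```python
def vio_classes_sketch(prediction_sets, labels, alpha, num_classes):
    violated = 0
    for c in range(num_classes):
        hits = [bool(row[y]) for row, y in zip(prediction_sets, labels) if y == c]
        if not hits:  # assumption: classes with no samples are ignored
            continue
        if sum(hits) / len(hits) < 1 - alpha:
            violated += 1
    return violated


sets = [[True, False], [True, False], [True, True], [False, True],
        [False, True], [True, True]]
labels = [0, 0, 0, 0, 1, 1]
strict = vio_classes_sketch(sets, labels, alpha=0.1, num_classes=2)
loose = vio_classes_sketch(sets, labels, alpha=0.3, num_classes=2)
print(strict, loose)
```

Class 0 is covered at 0.75, which violates a 0.9 target but satisfies a 0.7 target.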

torchcp.classification.utils.metrics.DiffViolation(logits, prediction_sets, labels, alpha, strata_diff=[[1, 1], [2, 3], [4, 6], [7, 10], [11, 100], [101, 1000]])

Difficulty-stratified coverage violation

Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020) Link: https://arxiv.org/abs/2009.14193

Parameters:
  • logits (torch.Tensor) – Predicted logits.

  • prediction_sets (torch.Tensor) – Prediction sets generated by CP algorithms.

  • labels (list) – Ground-truth label of each sample.

  • alpha (float) – User-specified significance level; the target coverage is 1 - alpha.

  • strata_diff (list) – A coarse partitioning of the possible difficulties.

Returns:

(the difficulty-stratified coverage violation, the number of samples, the empirical coverage and size of each difficulty).

Return type:

2-tuple

torchcp.classification.utils.metrics.SSCV(prediction_sets, labels, alpha, stratified_size=[[0, 1], [2, 3], [4, 10], [11, 100], [101, 1000]])

Size-stratified coverage violation (SSCV).

Paper: Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020)

Link: https://iclr.cc/virtual/2021/spotlight/3435

Parameters:
  • prediction_sets (torch.Tensor) – Boolean tensor of prediction sets (N x C).

  • labels (torch.Tensor) – Ground-truth labels (N,).

  • alpha (float) – User-specified significance level (between 0 and 1); the target coverage is 1 - alpha.

  • stratified_size (list) – Coarse partitioning of possible set sizes. Each element should be a list [min_size, max_size], where min_size and max_size are non-negative integers, min_size <= max_size, and the ranges do not overlap.

Returns:

The value of SSCV.

Return type:

float
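
A plain-Python sketch of the metric, assuming the definition in Angelopoulos et al.: the largest absolute deviation from 1 - alpha of the coverage measured within each set-size stratum. The name sscv_sketch and the skipping of empty strata are assumptions; the library is not called.

```python
def sscv_sketch(prediction_sets, labels, alpha, stratified_size):
    worst = 0.0
    for min_size, max_size in stratified_size:
        # Keep only samples whose prediction set size falls in this stratum.
        hits = [bool(row[y]) for row, y in zip(prediction_sets, labels)
                if min_size <= sum(row) <= max_size]
        if not hits:  # assumption: empty strata are skipped
            continue
        coverage = sum(hits) / len(hits)
        worst = max(worst, abs(coverage - (1 - alpha)))
    return worst


sets = [[True, False], [True, False], [True, True], [False, True],
        [False, True], [True, True]]
labels = [0, 0, 0, 0, 1, 1]
violation = sscv_sketch(sets, labels, alpha=0.1, stratified_size=[[0, 1], [2, 3]])
print(violation)
```

Sets of size at most 1 are covered at 0.75 and sets of size 2 at 1.0, so the worst deviation from 0.9 is about 0.15.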

torchcp.classification.utils.metrics.WSC(features, prediction_sets, labels, delta=0.1, M=1000, test_fraction=0.75, random_state=2020, verbose=False)

Worst-Slice Coverage (WSC).

Paper: Classification with Valid and Adaptive Coverage (Romano et al., 2020) Link: https://proceedings.neurips.cc/paper/2020/hash/244edd7e85dc81602b7615cd705545f5-Abstract.html Code: https://github.com/msesia/arc/tree/d80d27519f18b11e7feaf8cf0da8827151af9ce3

Parameters:
  • features (torch.Tensor) – Input features (N x D).

  • prediction_sets (torch.Tensor) – Boolean tensor of prediction sets (N x C).

  • labels (torch.Tensor) – Ground-truth labels (N,).

  • delta (float) – Confidence level (between 0 and 1).

  • M (int) – Number of random projections.

  • test_fraction (float) – Proportion of test split.

  • random_state (int) – Random seed.

  • verbose (bool) – Whether to print progress.

torchcp.classification.utils.metrics.singleton_hit_ratio(prediction_sets, labels)
torchcp.classification.utils.metrics.compute_p_values(cal_scores, test_scores, smooth=False)

Compute p-values for conformal prediction.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

p-values for each test sample and class, shape (n_test, k)

Return type:

Tensor
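
For orientation, the sketch below computes the standard conformal p-value in plain Python: the share of calibration scores at least as large as the test score, with the usual +1 correction, and a randomized tie-breaking variant when smooth is True. This is the textbook formula, not a copy of the library's code; the library's exact tie handling may differ, and the name p_values_sketch and the rng argument are assumptions.

```python
import random


def p_values_sketch(cal_scores, test_scores, smooth=False, rng=None):
    if rng is None:
        rng = random.Random(0)
    n = len(cal_scores)
    result = []
    for row in test_scores:
        p_row = []
        for s in row:
            greater = sum(c > s for c in cal_scores)
            equal = sum(c == s for c in cal_scores)
            if smooth:
                # Randomized smoothing: ties (and the test point) get a uniform weight.
                tau = rng.random()
                p = (greater + tau * (equal + 1)) / (n + 1)
            else:
                p = (greater + equal + 1) / (n + 1)
            p_row.append(p)
        result.append(p_row)
    return result


cal = [0.1, 0.2, 0.3, 0.4]
test = [[0.05, 0.25, 0.5]]
p = p_values_sketch(cal, test)
p_smooth = p_values_sketch(cal, test, smooth=True)
print(p)
```

A test score below every calibration score gets p-value 1.0, while one above every calibration score gets 1/(n_cal + 1).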

torchcp.classification.utils.metrics.pvalue_criterion_S(cal_scores, test_scores, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Sum criterion: measures efficiency by the average sum of the p-values.

Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

The average sum of the p-values across all test samples.

Return type:

float
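
A minimal plain-Python sketch of the sum criterion. Unlike the function above, it starts from an already computed p-value matrix rather than from scores; the name criterion_s_sketch is hypothetical.

```python
def criterion_s_sketch(p_values):
    # Sum the p-values of each test sample, then average over samples.
    return sum(sum(row) for row in p_values) / len(p_values)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
s_value = criterion_s_sketch(p_values)
print(s_value)
```

The row sums are 1.25 and 1.3, so the criterion is about 1.275.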

torchcp.classification.utils.metrics.pvalue_criterion_N(cal_scores, test_scores, alpha, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Number criterion: uses the average size of the prediction sets.

Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • alpha (float) – The significance level.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

The average size of the prediction sets.

Return type:

float
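
A minimal plain-Python sketch of the number criterion, assuming the convention from Vovk et al. that a label enters the prediction set when its p-value is strictly greater than alpha. It starts from p-values rather than scores, and the name criterion_n_sketch is hypothetical.

```python
def criterion_n_sketch(p_values, alpha):
    # Set size per sample: number of labels whose p-value exceeds alpha.
    sizes = [sum(p > alpha for p in row) for row in p_values]
    return sum(sizes) / len(sizes)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
n_value = criterion_n_sketch(p_values, alpha=0.1)
print(n_value)
```

The two prediction sets have sizes 2 and 3, giving an average size of 2.5.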

torchcp.classification.utils.metrics.pvalue_criterion_U(cal_scores, test_scores, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Unconfidence criterion: uses the average unconfidence over the test sequence, where the unconfidence for a test object x_i is the second largest p-value. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean of second-largest p-values across test samples.

Return type:

float
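
A minimal plain-Python sketch of the unconfidence criterion, starting from a p-value matrix with at least two classes. The name criterion_u_sketch is hypothetical.

```python
def criterion_u_sketch(p_values):
    # Second-largest p-value of each sample, averaged over samples.
    second_largest = [sorted(row, reverse=True)[1] for row in p_values]
    return sum(second_largest) / len(second_largest)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
u_value = criterion_u_sketch(p_values)
print(u_value)
```

The second-largest p-values are 0.3 and 0.5, so the criterion is about 0.4.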

torchcp.classification.utils.metrics.pvalue_criterion_F(cal_scores, test_scores, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Fuzziness criterion: uses the average fuzziness, where the fuzziness for a test object x_i is defined as the sum of all p-values apart from a largest one. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean sum of p-values minus the max p-value per sample.

Return type:

float
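
A minimal plain-Python sketch of the fuzziness criterion, starting from a p-value matrix. The name criterion_f_sketch is hypothetical.

```python
def criterion_f_sketch(p_values):
    # Sum of p-values minus one copy of the maximum, averaged over samples.
    fuzziness = [sum(row) - max(row) for row in p_values]
    return sum(fuzziness) / len(fuzziness)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
f_value = criterion_f_sketch(p_values)
print(f_value)
```

The per-sample fuzziness values are 0.35 and 0.7, so the criterion is about 0.525.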

torchcp.classification.utils.metrics.pvalue_criterion_M(cal_scores, test_scores, alpha, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Multiple criterion: uses the percentage of objects x_i in the test sequence for which the prediction set at significance level alpha is multiple, i.e. contains more than one label. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • alpha (float) – The significance level.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Proportion of test samples with prediction set size > 1.

Return type:

float

torchcp.classification.utils.metrics.pvalue_criterion_E(cal_scores, test_scores, alpha, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Excess criterion: uses the average amount the size of the prediction set exceeds 1.

Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • alpha (float) – The significance level.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean of (set size - 1), clamped at 0, across test samples.

Return type:

float
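
A minimal plain-Python sketch of the excess criterion, assuming labels enter the prediction set when their p-value is strictly greater than alpha. It starts from p-values rather than scores, and the name criterion_e_sketch is hypothetical.

```python
def criterion_e_sketch(p_values, alpha):
    excess = []
    for row in p_values:
        size = sum(p > alpha for p in row)
        # An empty set has no excess, so clamp at 0.
        excess.append(max(size - 1, 0))
    return sum(excess) / len(excess)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2], [0.05, 0.02, 0.01]]
e_value = criterion_e_sketch(p_values, alpha=0.1)
print(e_value)
```

The set sizes are 2, 3, and 0, so the excesses are 1, 2, and 0 and the criterion is 1.0.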

torchcp.classification.utils.metrics.pvalue_criterion_OU(cal_scores, test_scores, test_labels, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Observed Unconfidence criterion: uses the average observed unconfidence over the test sequence, where the observed unconfidence for a test example (x_i, y_i) is the largest p-value for the false labels. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • test_labels (Tensor) – Ground-truth labels for test samples.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean of the highest p-value among incorrect classes per sample.

Return type:

float
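
A minimal plain-Python sketch of the observed unconfidence criterion, starting from a p-value matrix and the true labels. The name criterion_ou_sketch is hypothetical.

```python
def criterion_ou_sketch(p_values, test_labels):
    worst_false = []
    for row, y in zip(p_values, test_labels):
        # Largest p-value among labels other than the true one.
        worst_false.append(max(p for j, p in enumerate(row) if j != y))
    return sum(worst_false) / len(worst_false)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
test_labels = [0, 1]
ou_value = criterion_ou_sketch(p_values, test_labels)
print(ou_value)
```

The largest false-label p-values are 0.3 and 0.6, so the criterion is about 0.45.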

torchcp.classification.utils.metrics.pvalue_criterion_OF(cal_scores, test_scores, test_labels, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Observed Fuzziness criterion: uses the average sum of the p-values for the false labels.

Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • test_labels (Tensor) – Ground-truth labels for test samples.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean sum of p-values excluding the true label per test sample.

Return type:

float
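
A minimal plain-Python sketch of the observed fuzziness criterion, starting from a p-value matrix and the true labels. The name criterion_of_sketch is hypothetical.

```python
def criterion_of_sketch(p_values, test_labels):
    false_sums = []
    for row, y in zip(p_values, test_labels):
        # Sum of p-values over every label except the true one.
        false_sums.append(sum(p for j, p in enumerate(row) if j != y))
    return sum(false_sums) / len(false_sums)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
test_labels = [0, 1]
of_value = criterion_of_sketch(p_values, test_labels)
print(of_value)
```

The false-label sums are 0.35 and 0.8, so the criterion is about 0.575.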

torchcp.classification.utils.metrics.pvalue_criterion_OM(cal_scores, test_scores, test_labels, alpha, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Observed Multiple criterion: uses the percentage of observed multiple predictions in the test sequence, where an observed multiple prediction is defined to be a prediction set including a false label. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • test_labels (Tensor) – Ground-truth labels for test samples.

  • alpha (float) – The significance level.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Proportion of test samples where prediction set contains at least one wrong class.

Return type:

float

torchcp.classification.utils.metrics.pvalue_criterion_OE(cal_scores, test_scores, test_labels, alpha, smooth=False)

Paper: Criteria of efficiency for conformal prediction (Vovk et al., 2016)

Observed Excess criterion: uses the average number of false labels included in the prediction sets at significance level alpha. Smaller values are preferable.

Parameters:
  • cal_scores (Tensor) – Nonconformity scores from the calibration set, shape (n_cal,).

  • test_scores (Tensor) – Nonconformity scores for test samples across k classes, shape (n_test, k).

  • test_labels (Tensor) – Ground-truth labels for test samples.

  • alpha (float) – The significance level.

  • smooth (bool) – Whether to apply randomized smoothing when calibration scores equal test scores.

Returns:

Mean number of wrong classes in prediction sets across test samples.

Return type:

float
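
A minimal plain-Python sketch of the observed excess criterion, assuming labels enter the prediction set when their p-value is strictly greater than alpha. It starts from p-values rather than scores, and the name criterion_oe_sketch is hypothetical.

```python
def criterion_oe_sketch(p_values, test_labels, alpha):
    false_counts = []
    for row, y in zip(p_values, test_labels):
        # Count false labels that made it into the prediction set.
        false_counts.append(sum(p > alpha for j, p in enumerate(row) if j != y))
    return sum(false_counts) / len(false_counts)


p_values = [[0.9, 0.3, 0.05], [0.6, 0.5, 0.2]]
test_labels = [0, 1]
oe_value = criterion_oe_sketch(p_values, test_labels, alpha=0.1)
print(oe_value)
```

The first set contains one false label and the second contains two, so the criterion is 1.5.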

utils

TS([temperature])

Uses a predefined temperature to scale the logits.

class torchcp.classification.utils.TS(temperature=1)

Uses a predefined temperature to scale the logits.

forward(batch_logits)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
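
To show what temperature scaling does to the resulting probabilities, here is a plain-Python sketch. It assumes the conventional form in which logits are divided by the temperature before the softmax; whether TS uses exactly this form is not stated here, and the helper names scale_logits and softmax are hypothetical.

```python
import math


def scale_logits(batch_logits, temperature=1.0):
    # Assumption: scaling means dividing every logit by the temperature.
    return [[z / temperature for z in row] for row in batch_logits]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [[2.0, 1.0, 0.1]]
sharp = softmax(scale_logits(logits, temperature=1.0)[0])
soft = softmax(scale_logits(logits, temperature=2.0)[0])
print(max(sharp), max(soft))
```

Under this convention, a temperature above 1 flattens the distribution (lower top probability) and a temperature below 1 sharpens it.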

optimze(dataloader, device=None, max_iters=10, lr=0.01, epsilon=0.01)

Tune the temperature of the model using the validation set. The temperature is set to minimize the negative log-likelihood (NLL).

Parameters:

dataloader (DataLoader) – Validation set loader.
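
The idea of choosing a temperature by minimizing the NLL can be sketched in plain Python. Note that this sketch swaps in a simple grid search for clarity; the method's signature (max_iters, lr, epsilon) suggests an iterative optimizer instead. The helper names nll and tune_temperature_sketch, and the division of logits by the temperature, are assumptions.

```python
import math


def nll(batch_logits, labels, temperature):
    total = 0.0
    for row, y in zip(batch_logits, labels):
        scaled = [z / temperature for z in row]
        m = max(scaled)
        # log-sum-exp gives the log of the softmax normalizer.
        log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def tune_temperature_sketch(batch_logits, labels, candidates):
    # Grid search: keep the candidate with the lowest validation NLL.
    return min(candidates, key=lambda t: nll(batch_logits, labels, t))


# Overconfident toy model: always predicts class 0 strongly, right 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
candidates = [0.5 + 0.1 * i for i in range(96)]  # 0.5 up to 10.0
best_t = tune_temperature_sketch(val_logits, val_labels, candidates)
print(best_t)
```

Because the toy model is overconfident, the selected temperature is larger than 1, which softens its probabilities toward the observed 75% accuracy.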