Metric Contribution 🎯

Overview

This page describes how to add a new metric to scTimeBench.

Metric implementation is centered around the class hierarchy in src/scTimeBench/metrics/.

Metric Base Classes

All metrics inherit from BaseMetric, which provides the shared evaluation flow, database logging, dataset loading, and method-output validation.

Common family base classes include:

Use these family classes when the new metric fits an existing output type. Add a new family only when the evaluation flow is materially different.

Metric Hierarchy

Metric classes are registered automatically through the BaseMetric subclass mechanism. A metric can be a direct leaf class or a parent class that exposes submetrics.

Guidelines:

implement _defaults() to declare configurable parameters and their defaults,
implement _setup_supported_datasets() to declare the supported dataset class names,
implement _setup_method_output_requirements() to declare the required method outputs,
implement _prep_kwargs_for_submetric_eval() to pass the right arguments into the evaluator, and
implement _submetric_eval() or the family-specific evaluator such as _embedding_eval().

Helper or abstract classes that are not meant to run directly should use the skip_metric decorator where appropriate.

Dataset Matching

Metrics do not match on dataset tags. They match on dataset class names, for example MaDataset, SuoDataset, or GarciaAlonsoDataset.

The default dataset tags used by the benchmark are configured separately in:

When adding a new dataset, update both the dataset registry and the metric group lists so the dataset is discoverable by the right metrics.

Metric Groups

The framework uses metric groups to choose default datasets when a run does not specify them explicitly.

Current groups are defined in the shared dataset YAML and mapped to metric classes such as:

embedding metrics,
ontology-based metrics, and
gene-expression prediction metrics.

If your metric belongs to a new family, define a new default_dataset_group and add a matching entry in the YAML file.

Required Outputs

The metric should request the outputs that it needs from the method runner. The required files are defined with RequiredOutputFiles and are checked before evaluation begins.

Examples include:

EMBEDDING (embeddings for observed cells),
NEXT_TIMEPOINT_EMBEDDING (embeddings for projected cells),
NEXT_CELLTYPE (predicted cell types of projected cells), and
NEXT_GEX (predicted gene expression of projected cells).

For metrics with multiple acceptable output combinations, use a list of lists in required_outputs.

Validation

Confirm that the metric runs through the full pipeline:

dataset selection,
preprocessing,
method execution,
required-output validation, and
evaluation/database logging.

If the metric is a submetric family, verify that each leaf metric is discovered and evaluated in the expected order.

Checklist

metric class added under src/scTimeBench/metrics/
family base class chosen correctly
supported datasets listed by class name
default dataset group set when needed
required method outputs declared
evaluation logic covered by tests