napistu.genomics.scverse_loading

Functions for connecting scverse data with Napistu graphs.

Classes

DatasetConfig:

Pydantic model for a single dataset configuration.

DatasetsConfig:

Container for multiple dataset configurations.

Public Functions

prepare_anndata_results_df:

Prepare a results table from an AnnData object for use in Napistu.

prepare_mudata_results_df:

Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

Functions

prepare_anndata_results_df(adata[, ...])

Prepare a results table from an AnnData object for use in Napistu.

prepare_mudata_results_df(mdata, ...[, ...])

Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

Classes

DatasetConfig(*, name, uri, path)

Pydantic model for a single dataset configuration.

DatasetsConfig(data)

Container for multiple dataset configurations.

ModalityOntologyConfig(*, ontologies[, ...])

Configuration for ontology handling in a single modality.

MultiModalityOntologyConfig([root])

Configuration for ontology handling across multiple modalities.

class napistu.genomics.scverse_loading.DatasetConfig(*, name: str, uri: str, path: Path)

Bases: BaseModel

Pydantic model for a single dataset configuration.

name

Name of the dataset.

Type:

str

uri

URI/URL for the dataset (must start with http:// or https://).

Type:

str

path

Local file path to the dataset file (.h5ad or .h5mu).

Type:

Path

Public Methods
load_h5ad

Load an .h5ad file as an AnnData object.

load_h5mu

Load a .h5mu file as a MuData object.

Examples

>>> from pathlib import Path
>>> config = DatasetConfig(
...     name="my_dataset",
...     uri="https://example.com/dataset",
...     path=Path("/path/to/dataset.h5ad")
... )
>>> adata = config.load_h5ad()
classmethod validate_path(v: str | Path) → Path

Validate that path is a non-empty string or Path.

classmethod validate_uri(v: str) → str

Validate that uri is a valid URL.
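
The URI rule documented for DatasetConfig (the value must start with http:// or https://) can be sketched without Pydantic. The helper name check_http_uri is hypothetical and is not part of napistu; the library's own validator may apply further checks.

```python
def check_http_uri(value: str) -> str:
    """Return `value` unchanged if it starts with http:// or https://.

    Hypothetical helper for illustration only; not napistu's validate_uri.
    """
    if not isinstance(value, str):
        raise ValueError("uri must be a string")
    # str.startswith accepts a tuple of allowed prefixes
    if not value.startswith(("http://", "https://")):
        raise ValueError(f"uri must start with http:// or https://, got {value!r}")
    return value
```

Raising ValueError matches what a Pydantic field validator is expected to raise, so a check like this could in principle be wrapped in one.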

load_h5ad() → anndata.AnnData

Load an .h5ad file as an AnnData object.

load_h5mu() → mudata.MuData

Load a .h5mu file as a MuData object.

model_config = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
path: Path
uri: str
class napistu.genomics.scverse_loading.DatasetsConfig(data: Dict[str, DatasetConfig | Dict[str, str | Path]])

Bases: object

Container for multiple dataset configurations.

data

Dictionary mapping dataset names to DatasetConfig objects.

Type:

Dict[str, DatasetConfig]

Public Methods
get

Get dataset config by name, raising KeyError if not found.

keys

Return dataset names.

values

Return dataset configs.

items

Return (name, config) pairs.

Private Methods
_get_item

Support dictionary-style access.

_contains

Support ‘in’ operator.

Examples

>>> from pathlib import Path
>>> data = {
...     "dataset1": {
...         "uri": "https://example.com/dataset1",
...         "path": "/path/to/dataset1.h5ad"
...     },
...     "dataset2": {
...         "uri": "https://example.com/dataset2",
...         "path": "/path/to/dataset2.h5mu"
...     }
... }
>>> configs = DatasetsConfig(data)
>>> # Access datasets
>>> dataset1 = configs["dataset1"]
>>> dataset2 = configs.get("dataset2")
>>> # Check if dataset exists
>>> "dataset1" in configs
True
>>> # Iterate over datasets
>>> for name, config in configs.items():
...     print(f"{name}: {config.uri}")
__init__(data: Dict[str, DatasetConfig | Dict[str, str | Path]])

Initialize DatasetsConfig from a dictionary, ensuring each DatasetConfig has its name set.

Parameters:

data (Dict[str, Union[DatasetConfig, Dict[str, Union[str, Path]]]]) – Dictionary mapping dataset names to either:
  • DatasetConfig objects (name will be set from key if missing)

  • Dicts that will be converted to DatasetConfig (name will be set from key)
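
The key-to-name behaviour described above can be sketched with a plain dataclass. SketchDataset and normalize_datasets are hypothetical stand-ins rather than napistu code, and treating an empty string as a "missing" name is an assumption made for this illustration.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Dict, Union


@dataclass(frozen=True)
class SketchDataset:
    """Stand-in for DatasetConfig (hypothetical; no Pydantic validation)."""
    name: str
    uri: str
    path: Path


def normalize_datasets(
    data: Dict[str, Union[SketchDataset, Dict[str, Union[str, Path]]]]
) -> Dict[str, SketchDataset]:
    """Convert every entry to a SketchDataset, filling `name` from the key."""
    out: Dict[str, SketchDataset] = {}
    for key, entry in data.items():
        if isinstance(entry, SketchDataset):
            # Assumption: an empty name counts as "missing" and is replaced by the key
            out[key] = entry if entry.name else replace(entry, name=key)
        else:
            # Dict entries always take their name from the key
            out[key] = SketchDataset(
                name=key, uri=str(entry["uri"]), path=Path(entry["path"])
            )
    return out
```

Normalizing at construction time means every later lookup can rely on each config carrying its own name.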

get(key: str) → DatasetConfig

Get dataset config by name, raising KeyError if not found.

items()

Return (name, config) pairs.

keys()

Return dataset names.

values()

Return dataset configs.

data: Dict[str, DatasetConfig]
class napistu.genomics.scverse_loading.ModalityOntologyConfig(*, ontologies: Set[str] | Dict[str, str] | None, index_which_ontology: str | None = None)

Bases: BaseModel

Configuration for ontology handling in a single modality.

index_which_ontology: str | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

ontologies: Set[str] | Dict[str, str] | None
class napistu.genomics.scverse_loading.MultiModalityOntologyConfig(root: RootModelRootType = PydanticUndefined)

Bases: RootModel

Configuration for ontology handling across multiple modalities.

classmethod from_dict(data: Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str]]) → MultiModalityOntologyConfig

Create a MultiModalityOntologyConfig from a dictionary.

Parameters:

data (Dict[str, Dict[str, Union[Optional[Union[Set[str], Dict[str, str]]], Optional[str]]]]) – Dictionary mapping modality names to their ontology configurations. Each modality config should have ‘ontologies’ and optionally ‘index_which_ontology’. The ‘ontologies’ field can be:
  • None to automatically detect valid ontology columns

  • Set of columns to treat as ontologies

  • Dict mapping wide column names to ontology names

Returns:

Validated ontology configuration

Return type:

MultiModalityOntologyConfig
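
As a sketch of the input shape from_dict() expects, the dictionary below uses each of the three documented forms of ‘ontologies’ once. The modality names, the column names, and the assumption that "uniprot", "symbol", and "ensembl_gene" appear in ONTOLOGIES_LIST are illustrative only; describe_ontologies is a hypothetical helper, not a napistu function.

```python
from typing import Dict, Optional, Set, Union

OntologySpec = Optional[Union[Set[str], Dict[str, str]]]

# Hypothetical modalities and column names, chosen only for illustration
mudata_ontologies = {
    "rna": {"ontologies": None},  # auto-detect ontology columns
    "protein": {"ontologies": {"uniprot"}, "index_which_ontology": "symbol"},
    "atac": {"ontologies": {"gene_id": "ensembl_gene"}},  # wide column -> ontology name
}


def describe_ontologies(spec: OntologySpec) -> str:
    """Name which of the three documented forms `spec` takes."""
    if spec is None:
        return "auto-detect"
    if isinstance(spec, dict):
        return "rename"
    if isinstance(spec, set):
        return "select"
    raise ValueError(f"unsupported ontologies value: {spec!r}")


forms = {
    modality: describe_ontologies(cfg["ontologies"])
    for modality, cfg in mudata_ontologies.items()
}
```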

items()
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

root: Dict[str, ModalityOntologyConfig]
napistu.genomics.scverse_loading._create_results_df(array: ndarray, attrs: List[str], var_index: Index, table_type: str) → DataFrame

Create a DataFrame with the right orientation based on table type.

For varm/varp tables:
  • rows are vars (var_index)

  • columns are attrs (features/selected vars)

For X/layers:
  • rows are attrs (selected observations)

  • columns are vars (var_index)

  • then transpose to get vars as rows
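
The orientation rule above can be sketched with nested lists in place of an ndarray and nested dicts in place of a DataFrame, which keeps the example dependency-free. orient_results is a hypothetical name; the real function operates on NumPy and pandas objects.

```python
from typing import Dict, List, Sequence


def orient_results(
    array: Sequence[Sequence[float]],
    attrs: List[str],
    var_index: List[str],
    table_type: str,
) -> Dict[str, Dict[str, float]]:
    """Return {var: {attr: value}} so that vars are always the outer (row) key."""
    if table_type in ("varm", "varp"):
        # Rows already correspond to vars, columns to attrs
        rows = list(array)
    elif table_type in ("X", "layers"):
        # Rows correspond to attrs (observations); transpose so vars become rows
        rows = list(zip(*array))
    else:
        raise ValueError(f"unsupported table_type: {table_type!r}")
    if len(rows) != len(var_index):
        raise ValueError("number of rows does not match var_index")
    return {var: dict(zip(attrs, row)) for var, row in zip(var_index, rows)}
```

Either way the result is keyed by var first, which is what allows it to be joined to the var table afterwards.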

napistu.genomics.scverse_loading._extract_ontologies(var_table: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None) → DataFrame

Extract ontology columns from a var table, optionally including the index as an ontology.

Parameters:
  • var_table (pd.DataFrame) – The var table containing systematic identifiers

  • ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – Either:
    - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
    - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
    - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST

  • index_which_ontology (Optional[str], default=None) – If provided, extract the index as this ontology. Must not already exist in var table.

Returns:

DataFrame containing only the ontology columns, with the same index as var_table

Return type:

pd.DataFrame

Raises:

ValueError –
  • If index_which_ontology already exists in var table

  • If any renamed ontology column already exists in var table

  • If any rename values are duplicated

  • If any final column names are not in ONTOLOGIES_LIST
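
The column-resolution checks listed above can be sketched over plain column names, leaving the DataFrame and the index_which_ontology handling aside. SKETCH_ONTOLOGIES stands in for ONTOLOGIES_LIST (whose real contents may differ), resolve_ontology_columns is a hypothetical helper, and the "columns not found" check is an assumption not stated in the documentation.

```python
from typing import AbstractSet, Dict, Iterable, Optional, Set, Union

# Stand-in vocabulary; the real ONTOLOGIES_LIST lives in napistu and may differ
SKETCH_ONTOLOGIES = frozenset({"ensembl_gene", "symbol", "uniprot"})


def resolve_ontology_columns(
    columns: Iterable[str],
    ontologies: Optional[Union[Set[str], Dict[str, str]]] = None,
    vocabulary: AbstractSet[str] = SKETCH_ONTOLOGIES,
) -> Dict[str, str]:
    """Map source column names to final ontology names, mirroring the documented checks."""
    columns = list(columns)
    if ontologies is None:
        # Auto-detect: keep columns whose names are already in the vocabulary
        mapping = {c: c for c in columns if c in vocabulary}
    elif isinstance(ontologies, dict):
        mapping = dict(ontologies)
    else:
        mapping = {c: c for c in ontologies}

    # Assumption (not in the docs): requested source columns must exist
    missing = [c for c in mapping if c not in columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    targets = list(mapping.values())
    if len(set(targets)) != len(targets):
        raise ValueError("rename values are duplicated")
    clashes = [t for src, t in mapping.items() if t != src and t in columns]
    if clashes:
        raise ValueError(f"renamed columns already exist: {clashes}")
    invalid = [t for t in targets if t not in vocabulary]
    if invalid:
        raise ValueError(f"not in controlled vocabulary: {invalid}")
    return mapping
```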

napistu.genomics.scverse_loading._get_table_from_dict_attr(adata: anndata.AnnData | mudata.MuData, attr_name: str, table_name: str | None = None) → pd.DataFrame | np.ndarray

Get a table from a dict-like AnnData attribute (varm, layers, etc.)

Parameters:
  • adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from

  • attr_name (str) – Name of the attribute (‘varm’, ‘layers’, etc.)

  • table_name (str, optional) – Specific table name to retrieve. If None and only one table exists, that table will be returned. If None and multiple tables exist, raises ValueError

Returns:

The table data. For array-type attributes (varm, varp, X, layers), returns numpy array. For other attributes, returns DataFrame

Return type:

Union[pd.DataFrame, np.ndarray]

Raises:

ValueError –
  • If attr_name is not a valid dict-like attribute

  • If no tables found in the attribute

  • If multiple tables found and table_name not specified

  • If specified table_name not found
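
The table-selection rules above (a single table can be returned without a name, several tables require one) can be sketched over any mapping. pick_table is a hypothetical helper; it skips the check that attr_name is a valid dict-like attribute, and its error messages are illustrative.

```python
from typing import Any, Mapping, Optional


def pick_table(
    tables: Mapping[str, Any], attr_name: str, table_name: Optional[str] = None
) -> Any:
    """Pick one entry from a dict-like attribute, following the documented rules."""
    if len(tables) == 0:
        raise ValueError(f"No tables found in {attr_name}")
    if table_name is None:
        if len(tables) > 1:
            raise ValueError(
                f"Multiple tables found in {attr_name}; specify table_name. "
                f"Available: {sorted(tables)}"
            )
        # Exactly one table: return it without needing its name
        return next(iter(tables.values()))
    if table_name not in tables:
        raise ValueError(f"{table_name!r} not found in {attr_name}")
    return tables[table_name]
```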

napistu.genomics.scverse_loading._get_valid_attrs_for_feature_level_array(adata: anndata.AnnData, table_type: str, raw_results_table: np.ndarray, table_colnames: List[str] | None = None) → list[str]

Get valid attributes for a feature-level array.

Parameters:
  • adata (anndata.AnnData) – The AnnData object

  • table_type (str) – The type of table

  • raw_results_table (np.ndarray) – The raw results table for dimension validation

  • table_colnames (Optional[List[str]]) – Column names for varm tables

Returns:

List of valid attributes for this table type

Return type:

list[str]

Raises:

ValueError – If table_type is invalid or if table_colnames validation fails for varm tables

napistu.genomics.scverse_loading._load_raw_table(adata: anndata.AnnData | mudata.MuData, table_type: str, table_name: str | None = None) → pd.DataFrame | np.ndarray

Load an AnnData table.

This function loads an AnnData table and returns it as a pd.DataFrame or np.ndarray, depending on the table type.

Parameters:
  • adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from.

  • table_type (str) – The type of table to load.

  • table_name (str, optional) – The name of the table to load.

Returns:

The loaded table.

Return type:

pd.DataFrame or np.ndarray

napistu.genomics.scverse_loading._select_results_attrs(adata: anndata.AnnData, raw_results_table: pd.DataFrame | np.ndarray, table_type: str, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None) → pd.DataFrame

Select results attributes from an AnnData object.

This function selects results attributes from raw_results_table derived from an AnnData object and converts them if needed to a pd.DataFrame with appropriate indices.

Parameters:
  • adata (anndata.AnnData) – The AnnData object containing the results to be formatted.

  • raw_results_table (pd.DataFrame or np.ndarray) – The raw results table to be formatted.

  • table_type (str) – The type of table raw_results_table refers to.

  • results_attrs (list of str, optional) – The attributes to extract from the raw_results_table.

  • table_colnames (list of str, optional) – If table_type is varm, the names of all columns (e.g., PC1, PC2, etc.). Ignored otherwise.

Returns:

A DataFrame containing the formatted results.

Return type:

pd.DataFrame

napistu.genomics.scverse_loading._split_mdata_results_by_modality(mdata: mudata.MuData, results_data_table: pd.DataFrame) → Dict[str, pd.DataFrame]

Split a results table by modality and verify compatibility with var tables.

Parameters:
  • mdata (mudata.MuData) – MuData object containing multiple modalities

  • results_data_table (pd.DataFrame) – Results table with vars as rows, typically from prepare_anndata_results_df()

Returns:

Dictionary with modality names as keys and DataFrames as values. Each DataFrame contains just the results for that modality. The index of each DataFrame is guaranteed to match the corresponding modality’s var table for later merging.

Return type:

Dict[str, pd.DataFrame]

Raises:

ValueError –
  • If any modality’s vars are not found in the results table

  • If any modality’s results have different indices than its var table
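
The split-and-verify step can be sketched with dictionaries standing in for the MuData object and the results DataFrame. split_by_modality and its arguments are hypothetical; modality_vars plays the role of each modality's var_names.

```python
from typing import Dict, List


def split_by_modality(
    modality_vars: Dict[str, List[str]],
    results: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Split {var: row} results into one {var: row} mapping per modality."""
    out: Dict[str, Dict[str, Dict[str, float]]] = {}
    for modality, var_names in modality_vars.items():
        missing = [v for v in var_names if v not in results]
        if missing:
            raise ValueError(f"{modality}: vars missing from results table: {missing}")
        # Keep the modality's own var order so that later merges line up
        out[modality] = {v: results[v] for v in var_names}
    return out
```

Building each per-modality mapping from the modality's own var list is what makes its keys match that modality's var table by construction.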

napistu.genomics.scverse_loading.prepare_anndata_results_df(adata: anndata.AnnData | mudata.MuData, table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None, table_colnames: List[str] | None = None) → pd.DataFrame

Prepare a results table from an AnnData object for use in Napistu.

This function extracts a table from an AnnData object and formats it for use in Napistu. The returned DataFrame will always include systematic identifiers from the var table, along with the requested results data.

Parameters:
  • adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object containing the results to be formatted.

  • table_type (str, optional) – The type of table to extract from the AnnData object. Must be one of: “var”, “varm”, or “X”.

  • table_name (str, optional) – The name of the table to extract from the AnnData object.

  • results_attrs (list of str, optional) – The attributes to extract from the table.

  • index_which_ontology (str, optional) – The ontology represented by the var table’s index. The index is extracted, named after this ontology, and added to the results table as a new column. Must not already exist as a column in the var table.

  • ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) –

    Either:
    - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
    - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
    - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST

    If index_which_ontology is defined, it should be represented in these ontologies.

  • table_colnames (Optional[List[str]], optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.

Returns:

A DataFrame containing the formatted results with systematic identifiers. The index will match the var_names of the AnnData object.

Return type:

pd.DataFrame

Raises:

ValueError –
  • If table_type is not one of: “var”, “varm”, or “X”

  • If index_which_ontology already exists in var table

napistu.genomics.scverse_loading.prepare_mudata_results_df(mdata: mudata.MuData, mudata_ontologies: 'MultiModalityOntologyConfig' | Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str | None]], table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None, level: str = 'mdata') → Dict[str, pd.DataFrame]

Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

This function extracts tables from each adata in a MuData object and formats them for use in Napistu. Each adata’s table will include systematic identifiers from its var table along with the requested results data. Ontology handling is configured per-adata using MultiModalityOntologyConfig.

Parameters:
  • mdata (mudata.MuData) – The MuData object containing the results to be formatted.

  • mudata_ontologies (MultiModalityOntologyConfig or dict) – Configuration for ontology handling per modality (each modality is a separate AnnData object). Must include an entry for each modality. Can be either:
    - A MultiModalityOntologyConfig object
    - A dictionary that can be converted to MultiModalityOntologyConfig using from_dict()

    Each modality’s ‘ontologies’ field can be:
    - None to automatically detect valid ontology columns
    - Set of columns to treat as ontologies
    - Dict mapping wide column names to ontology names

    The ‘index_which_ontology’ field is optional.

  • table_type (str, optional) – The type of table to extract from each modality. Must be one of: “var”, “varm”, or “X”.

  • table_name (str, optional) – The name of the table to extract from each modality.

  • results_attrs (list of str, optional) – The attributes to extract from the table.

  • table_colnames (list of str, optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.

  • level (str, optional) – Whether to extract data from “mdata” (MuData-level) or “adata” (individual AnnData-level) tables. Default is “mdata”.

Returns:

Dictionary mapping modality names to their formatted results DataFrames. Each DataFrame contains the modality’s results with systematic identifiers. The index of each DataFrame will match the var_names of that modality.

Return type:

Dict[str, pd.DataFrame]

Raises:

ValueError –
  • If table_type is not one of: “var”, “varm”, or “X”

  • If mudata_ontologies contains invalid configuration

  • If modality-specific ontology extraction fails

  • If any modality is missing from mudata_ontologies

  • If level is not “mdata” or “adata”