napistu.genomics.scverse_loading

Functions for connecting scverse data with Napistu graphs.

Classes

DatasetConfig:: Pydantic model for a single dataset configuration.
DatasetsConfig:: Container for multiple dataset configurations.

Public Functions

prepare_anndata_results_df:: Prepare a results table from an AnnData object for use in Napistu.
prepare_mudata_results_df:: Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

Functions

`prepare_anndata_results_df`(adata[, ...])	Prepare a results table from an AnnData object for use in Napistu.
`prepare_mudata_results_df`(mdata, ...[, ...])	Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

Classes

`DatasetConfig`(*, name, uri, path)	Pydantic model for a single dataset configuration.
`DatasetsConfig`(data)	Container for multiple dataset configurations.
`ModalityOntologyConfig`(*, ontologies[, ...])	Configuration for ontology handling in a single modality.
`MultiModalityOntologyConfig`([root])	Configuration for ontology handling across multiple modalities.

class napistu.genomics.scverse_loading.DatasetConfig(*, name: str, uri: str, path: Path)

Bases: BaseModel

Pydantic model for a single dataset configuration.

name

Name of the dataset.

Type:: str

uri

URI/URL for the dataset (must start with http:// or https://).

Type:: str

path

Local file path to the dataset file (.h5ad or .h5mu).

Type:: Path

Public Methods

load_h5ad: Load an .h5ad file as an AnnData object.

load_h5mu: Load a .h5mu file as a MuData object.

Examples

>>> from pathlib import Path
>>> config = DatasetConfig(
...     name="my_dataset",
...     uri="https://example.com/dataset",
...     path=Path("/path/to/dataset.h5ad")
... )
>>> adata = config.load_h5ad()

classmethod validate_path(v: str | Path) → Path: Validate that path is a non-empty string or Path.

classmethod validate_uri(v: str) → str: Validate that uri is a valid URL.

load_h5ad() → anndata.AnnData: Load an .h5ad file as an AnnData object.

load_h5mu() → mudata.MuData: Load a .h5mu file as a MuData object.

_abc_impl = <_abc._abc_data object>

model_config = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str

path: Path

uri: str

class napistu.genomics.scverse_loading.DatasetsConfig(data: Dict[str, DatasetConfig | Dict[str, str | Path]])

Bases: object

Container for multiple dataset configurations.

data

Dictionary mapping dataset names to DatasetConfig objects.

Type:: Dict[str, DatasetConfig]

Public Methods

get: Get dataset config by name, raising KeyError if not found.

keys: Return dataset names.

values: Return dataset configs.

items: Return (name, config) pairs.

Private Methods

_get_item: Support dictionary-style access.

_contains: Support ‘in’ operator.

Examples

>>> from pathlib import Path
>>> data = {
...     "dataset1": {
...         "uri": "https://example.com/dataset1",
...         "path": "/path/to/dataset1.h5ad"
...     },
...     "dataset2": {
...         "uri": "https://example.com/dataset2",
...         "path": "/path/to/dataset2.h5mu"
...     }
... }
>>> configs = DatasetsConfig(data)
>>> # Access datasets
>>> dataset1 = configs["dataset1"]
>>> dataset2 = configs.get("dataset2")
>>> # Check if dataset exists
>>> "dataset1" in configs
True
>>> # Iterate over datasets
>>> for name, config in configs.items():
...     print(f"{name}: {config.uri}")

__init__(data: Dict[str, DatasetConfig | Dict[str, str | Path]])

Initialize DatasetsConfig from a dictionary, ensuring each DatasetConfig has its name set.

Parameters:: data (Dict[str, Union[DatasetConfig, Dict[str, Union[str, Path]]]]) – Dictionary mapping dataset names to either:
- DatasetConfig objects (name will be set from key if missing)
- Dicts that will be converted to DatasetConfig (name will be set from key)

get(key: str) → DatasetConfig: Get dataset config by name, raising KeyError if not found.

items(): Return (name, config) pairs.

keys(): Return dataset names.

values(): Return dataset configs.

data: Dict[str, DatasetConfig]

class napistu.genomics.scverse_loading.ModalityOntologyConfig(*, ontologies: Set[str] | Dict[str, str] | None, index_which_ontology: str | None = None)

Bases: BaseModel

Configuration for ontology handling in a single modality.

_abc_impl = <_abc._abc_data object>

index_which_ontology: str | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

ontologies: Set[str] | Dict[str, str] | None

class napistu.genomics.scverse_loading.MultiModalityOntologyConfig(root: RootModelRootType = PydanticUndefined)

Bases: RootModel

Configuration for ontology handling across multiple modalities.

classmethod from_dict(data: Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str]]) → MultiModalityOntologyConfig

Create a MultiModalityOntologyConfig from a dictionary.

Parameters:: data (Dict[str, Dict[str, Union[Optional[Union[Set[str], Dict[str, str]]], Optional[str]]]]) – Dictionary mapping modality names to their ontology configurations. Each modality config should have ‘ontologies’ and optionally ‘index_which_ontology’. The ‘ontologies’ field can be:
- None to automatically detect valid ontology columns
- Set of columns to treat as ontologies
- Dict mapping wide column names to ontology names
Returns:: Validated ontology configuration
Return type:: MultiModalityOntologyConfig

items()

_abc_impl = <_abc._abc_data object>

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

root: Dict[str, ModalityOntologyConfig]
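
As a hedged illustration of the dictionary shape that from_dict() describes, the sketch below builds a plain Python dict covering the three documented forms of the ‘ontologies’ field. The modality names (rna, protein, atac) and the column and ontology names are assumptions chosen for illustration; the valid ontology names are whatever ONTOLOGIES_LIST defines.

```python
# Hypothetical modality, column, and ontology names, for illustration only.
ontology_spec = {
    # Set: treat these var columns as ontologies. Per the documentation,
    # index_which_ontology should be represented in the ontologies.
    "rna": {
        "ontologies": {"ensembl_gene", "symbol"},
        "index_which_ontology": "ensembl_gene",
    },
    # None: ask for automatic detection of valid ontology columns.
    "protein": {"ontologies": None},
    # Dict: map an existing (wide) var column name to an ontology name.
    "atac": {"ontologies": {"gene_id": "ensembl_gene"}},
}

# Light shape check before handing the dict to
# MultiModalityOntologyConfig.from_dict().
allowed_keys = {"ontologies", "index_which_ontology"}
for modality, spec in ontology_spec.items():
    assert "ontologies" in spec, f"{modality} is missing 'ontologies'"
    assert set(spec) <= allowed_keys, f"{modality} has unexpected keys"
```

The shape check is only a convenience; validation proper is done by the Pydantic models when the dictionary is converted.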

napistu.genomics.scverse_loading._create_results_df(array: ndarray, attrs: List[str], var_index: Index, table_type: str) → DataFrame

Create a DataFrame with the right orientation based on table type.

For varm/varp tables:

rows are vars (var_index)
columns are attrs (features/selected vars)

For X/layers:

rows are attrs (selected observations)
columns are vars (var_index)
then transpose to get vars as rows
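
The two orientations can be sketched with NumPy and pandas. This is an illustration of the layout described above, not the function's actual implementation; the var, observation, and attribute names are made up.

```python
import numpy as np
import pandas as pd

var_index = pd.Index(["gene_a", "gene_b", "gene_c"])  # three vars

# varm/varp-style array: one row per var, one column per attribute.
varm_array = np.arange(6).reshape(3, 2)
varm_df = pd.DataFrame(varm_array, index=var_index, columns=["PC1", "PC2"])

# X/layers-style array: one row per selected observation, one column per var.
x_array = np.arange(6).reshape(2, 3)
x_df = pd.DataFrame(x_array, index=["cell_1", "cell_2"], columns=var_index)

# Transpose so that vars become rows, matching the varm layout.
x_df = x_df.T
```

After the transpose, both frames are indexed by var names, so either can be joined against the var table.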

napistu.genomics.scverse_loading._extract_ontologies(var_table: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None) → DataFrame

Extract ontology columns from a var table, optionally including the index as an ontology.

Parameters:

var_table (pd.DataFrame) – The var table containing systematic identifiers
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – One of:
- Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
- Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
- None to automatically detect valid ontology columns based on ONTOLOGIES_LIST
index_which_ontology (Optional[str], default=None) – If provided, extract the index as this ontology. Must not already exist in var table.

Returns:

DataFrame containing only the ontology columns, with the same index as var_table

Return type:

pd.DataFrame

Raises:

ValueError –
- If index_which_ontology already exists in var table
- If any renamed ontology column already exists in var table
- If any rename values are duplicated
- If any final column names are not in ONTOLOGIES_LIST
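
A minimal pandas sketch of the dict-style rename combined with index extraction follows. It mimics the documented behavior rather than reproducing the function's code; the var table, the gene_name column, and the assumption that symbol and ensembl_gene are valid ontology names are all illustrative.

```python
import pandas as pd

# Hypothetical var table: the index holds Ensembl gene IDs,
# one column holds gene symbols, one column is unrelated to ontologies.
var_table = pd.DataFrame(
    {"gene_name": ["TP53", "BRCA1"], "n_cells": [120, 95]},
    index=pd.Index(["ENSG00000141510", "ENSG00000012048"]),
)

rename_map = {"gene_name": "symbol"}  # wide column name -> ontology name
index_ontology = "ensembl_gene"       # assumed to be a valid ontology name

# Mirror the documented ValueError: the index ontology must not already
# exist as a column in the var table.
if index_ontology in var_table.columns:
    raise ValueError(f"{index_ontology} already exists in the var table")

# Keep only the ontology columns, renamed to their ontology names.
ontology_df = var_table[list(rename_map)].rename(columns=rename_map)

# Expose the index as its own ontology column; the index itself is unchanged.
ontology_df[index_ontology] = var_table.index
```

The result keeps the var table's index and drops non-ontology columns such as n_cells.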

napistu.genomics.scverse_loading._get_table_from_dict_attr(adata: anndata.AnnData | mudata.MuData, attr_name: str, table_name: str | None = None) → pd.DataFrame | np.ndarray

Get a table from a dict-like AnnData attribute (varm, layers, etc.)

Parameters:

adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from
attr_name (str) – Name of the attribute (‘varm’, ‘layers’, etc.)
table_name (str, optional) – Specific table name to retrieve. If None and only one table exists, that table will be returned. If None and multiple tables exist, a ValueError is raised.

Returns:

The table data. For array-type attributes (varm, varp, X, layers), returns numpy array. For other attributes, returns DataFrame

Return type:

Union[pd.DataFrame, np.ndarray]

Raises:

ValueError –
- If attr_name is not a valid dict-like attribute
- If no tables found in the attribute
- If multiple tables found and table_name not specified
- If specified table_name not found
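
The table_name resolution rules can be sketched against any plain mapping. The helper pick_table below is hypothetical and is not part of Napistu; it only illustrates when a name is required and which cases raise ValueError.

```python
from typing import Any, Mapping, Optional


def pick_table(tables: Mapping[str, Any], table_name: Optional[str] = None) -> Any:
    """Hypothetical helper mirroring the documented table_name rules."""
    if len(tables) == 0:
        raise ValueError("No tables found")
    if table_name is None:
        if len(tables) > 1:
            raise ValueError(
                f"Multiple tables found; specify one of {sorted(tables)}"
            )
        # Exactly one table: return it without requiring a name.
        return next(iter(tables.values()))
    if table_name not in tables:
        raise ValueError(f"Table {table_name!r} not found")
    return tables[table_name]
```

With a single entry the name can be omitted; with several entries, omitting it is treated as ambiguous and raises.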

napistu.genomics.scverse_loading._get_valid_attrs_for_feature_level_array(adata: anndata.AnnData, table_type: str, raw_results_table: np.ndarray, table_colnames: List[str] | None = None) → list[str]

Get valid attributes for a feature-level array.

Parameters:

adata (anndata.AnnData) – The AnnData object
table_type (str) – The type of table
raw_results_table (np.ndarray) – The raw results table for dimension validation
table_colnames (Optional[List[str]]) – Column names for varm tables

Returns:

List of valid attributes for this table type

Return type:

list[str]

Raises:

ValueError – If table_type is invalid or if table_colnames validation fails for varm tables

napistu.genomics.scverse_loading._load_raw_table(adata: anndata.AnnData | mudata.MuData, table_type: str, table_name: str | None = None) → pd.DataFrame | np.ndarray

Load an AnnData table.

This function loads a table from an AnnData or MuData object and returns it as a pd.DataFrame or np.ndarray, depending on the table type.

Parameters:

adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from.
table_type (str) – The type of table to load.
table_name (str, optional) – The name of the table to load.

Returns:

The loaded table.

Return type:

pd.DataFrame or np.ndarray

napistu.genomics.scverse_loading._select_results_attrs(adata: anndata.AnnData, raw_results_table: pd.DataFrame | np.ndarray, table_type: str, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None) → pd.DataFrame

Select results attributes from an AnnData object.

This function selects results attributes from raw_results_table derived from an AnnData object and converts them if needed to a pd.DataFrame with appropriate indices.

Parameters:

adata (anndata.AnnData) – The AnnData object containing the results to be formatted.
raw_results_table (pd.DataFrame or np.ndarray) – The raw results table to be formatted.
table_type (str) – The type of table raw_results_table refers to.
results_attrs (list of str, optional) – The attributes to extract from the raw_results_table.
table_colnames (list of str, optional) – If table_type is varm, these are the names of all columns (e.g., PC1, PC2, etc.). Ignored otherwise.

Returns:

A DataFrame containing the formatted results.

Return type:

pd.DataFrame

napistu.genomics.scverse_loading._split_mdata_results_by_modality(mdata: mudata.MuData, results_data_table: pd.DataFrame) → Dict[str, pd.DataFrame]

Split a results table by modality and verify compatibility with var tables.

Parameters:

mdata (mudata.MuData) – MuData object containing multiple modalities
results_data_table (pd.DataFrame) – Results table with vars as rows, typically from prepare_anndata_results_df()

Returns:

Dictionary with modality names as keys and DataFrames as values. Each DataFrame contains just the results for that modality. The index of each DataFrame is guaranteed to match the corresponding modality’s var table for later merging.

Return type:

Dict[str, pd.DataFrame]

Raises:

ValueError –
- If any modality’s vars are not found in the results table
- If any modality’s results have different indices than its var table
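
The split-and-verify step can be sketched with pandas alone. The dictionary modality_vars stands in for each modality's var names, and all names and values are invented for illustration; this is not the function's actual implementation.

```python
import pandas as pd

# Hypothetical var names per modality, standing in for each modality's var table.
modality_vars = {
    "rna": pd.Index(["gene_a", "gene_b"]),
    "protein": pd.Index(["prot_x"]),
}

# Results table with vars as rows, covering every modality.
results = pd.DataFrame(
    {"log2fc": [1.5, -0.3, 0.8]},
    index=["gene_a", "gene_b", "prot_x"],
)

split = {}
for name, var_names in modality_vars.items():
    # Every var of the modality must be present in the results table.
    missing = var_names.difference(results.index)
    if len(missing) > 0:
        raise ValueError(f"{name}: vars missing from results: {list(missing)}")
    # Selecting by the modality's var names keeps that modality's var order.
    split[name] = results.loc[var_names]
```

Because each piece is selected by the modality's own var names, its index lines up with the var table for a later merge.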

napistu.genomics.scverse_loading.prepare_anndata_results_df(adata: anndata.AnnData | mudata.MuData, table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None, table_colnames: List[str] | None = None) → pd.DataFrame

Prepare a results table from an AnnData object for use in Napistu.

This function extracts a table from an AnnData object and formats it for use in Napistu. The returned DataFrame will always include systematic identifiers from the var table, along with the requested results data.

Parameters:

adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object containing the results to be formatted.
table_type (str, optional) – The type of table to extract from the AnnData object. Must be one of: “var”, “varm”, or “X”.
table_name (str, optional) – The name of the table to extract from the AnnData object.
results_attrs (list of str, optional) – The attributes to extract from the table.
index_which_ontology (str, optional) – The ontology to use for the systematic identifiers held in the index. The index is extracted, renamed to the ontology name, and added to the results table as a new column with that name. Must not already exist in var table.
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – One of:
- Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
- Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
- None to automatically detect valid ontology columns based on ONTOLOGIES_LIST

If index_which_ontology is defined, it should be represented in these ontologies.
table_colnames (Optional[List[str]], optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.

Returns:

A DataFrame containing the formatted results with systematic identifiers. The index will match the var_names of the AnnData object.

Return type:

pd.DataFrame

Raises:

ValueError –
- If table_type is not one of: “var”, “varm”, or “X”
- If index_which_ontology already exists in var table
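
A hedged usage sketch based on the signature above follows. The keyword names come from the documented signature; the values (column names, the PCs table, ontology names) are assumptions about a hypothetical dataset. The import is deferred into the function so the snippet can be loaded without napistu installed.

```python
from typing import Any, Dict

# Extract two hypothetical columns from adata.var.
var_kwargs: Dict[str, Any] = {
    "table_type": "var",
    "results_attrs": ["log2fc", "padj"],         # hypothetical var columns
    "ontologies": {"ensembl_gene", "symbol"},    # assumed valid ontology names
    "index_which_ontology": "ensembl_gene",      # assumes var_names are Ensembl gene IDs
}

# Extract two components from a hypothetical varm table named "PCs".
# table_colnames is documented as required when table_type is "varm".
varm_kwargs: Dict[str, Any] = {
    "table_type": "varm",
    "table_name": "PCs",
    "table_colnames": ["PC1", "PC2", "PC3"],     # names for all columns of the array
    "results_attrs": ["PC1", "PC2"],
}


def build_results(adata: Any, **kwargs: Any) -> Any:
    """Call prepare_anndata_results_df on a caller-supplied AnnData object."""
    # Lazy import keeps this sketch loadable without napistu installed.
    from napistu.genomics.scverse_loading import prepare_anndata_results_df

    return prepare_anndata_results_df(adata, **kwargs)
```

A call would then look like build_results(adata, **var_kwargs), where adata is an AnnData object whose var table contains the assumed columns.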

napistu.genomics.scverse_loading.prepare_mudata_results_df(mdata: mudata.MuData, mudata_ontologies: 'MultiModalityOntologyConfig' | Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str | None]], table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None, level: str = 'mdata') → Dict[str, pd.DataFrame]

Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.

This function extracts tables from each adata in a MuData object and formats them for use in Napistu. Each adata’s table will include systematic identifiers from its var table along with the requested results data. Ontology handling is configured per-adata using MultiModalityOntologyConfig.

Parameters:

mdata (mudata.MuData) – The MuData object containing the results to be formatted.
mudata_ontologies (MultiModalityOntologyConfig or dict) – Configuration for ontology handling in each modality (each modality is a separate AnnData object). Must include an entry for each modality. Can be either:
- A MultiModalityOntologyConfig object
- A dictionary that can be converted to MultiModalityOntologyConfig using from_dict()

Each modality’s ‘ontologies’ field can be:
- None to automatically detect valid ontology columns
- Set of columns to treat as ontologies
- Dict mapping wide column names to ontology names

The ‘index_which_ontology’ field is optional.
table_type (str, optional) – The type of table to extract from each modality. Must be one of: “var”, “varm”, or “X”.
table_name (str, optional) – The name of the table to extract from each modality.
results_attrs (list of str, optional) – The attributes to extract from the table.
table_colnames (list of str, optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.
level (str, optional) – Whether to extract data from “mdata” (MuData-level) or “adata” (individual AnnData-level) tables. Default is “mdata”.

Returns:

Dictionary mapping modality names to their formatted results DataFrames. Each DataFrame contains the modality’s results with systematic identifiers. The index of each DataFrame will match the var_names of that modality.

Return type:

Dict[str, pd.DataFrame]

Raises:

ValueError –
- If table_type is not one of: “var”, “varm”, or “X”
- If mudata_ontologies contains invalid configuration
- If modality-specific ontology extraction fails
- If any modality is missing from mudata_ontologies
- If level is not “mdata” or “adata”
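
A hedged usage sketch for a two-modality MuData object follows. The modality names, the log2fc column, and the ontology names are assumptions; the keyword arguments follow the documented signature. The import is deferred so the snippet can be loaded without napistu installed.

```python
from typing import Any, Dict

# One entry per modality is required; all names here are hypothetical.
mudata_ontologies: Dict[str, Dict[str, Any]] = {
    "rna": {
        "ontologies": {"ensembl_gene", "symbol"},
        "index_which_ontology": "ensembl_gene",
    },
    "protein": {"ontologies": None},  # auto-detect ontology columns
}


def build_modality_results(mdata: Any) -> Dict[str, Any]:
    """Return one results DataFrame per modality for a caller-supplied MuData."""
    # Lazy import keeps this sketch loadable without napistu installed.
    from napistu.genomics.scverse_loading import prepare_mudata_results_df

    return prepare_mudata_results_df(
        mdata,
        mudata_ontologies=mudata_ontologies,
        table_type="var",
        results_attrs=["log2fc"],  # hypothetical var column
        level="mdata",             # documented default
    )
```

The returned dictionary is keyed by modality name, so results for one modality can be read as build_modality_results(mdata)["rna"].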