napistu.genomics.scverse_loading
Functions for connecting scverse data with Napistu graphs.
Classes
- DatasetConfig:
Pydantic model for a single dataset configuration.
- DatasetsConfig:
Pydantic model for multiple datasets configuration.
Public Functions
- prepare_anndata_results_df:
Prepare a results table from an AnnData object for use in Napistu.
- prepare_mudata_results_df:
Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.
Classes
- DatasetConfig:
Pydantic model for a single dataset configuration.
- DatasetsConfig:
Container for multiple dataset configurations.
- ModalityOntologyConfig:
Configuration for ontology handling in a single modality.
- MultiModalityOntologyConfig:
Configuration for ontology handling across multiple modalities.
- class napistu.genomics.scverse_loading.DatasetConfig(*, name: str, uri: str, path: Path)
Bases: BaseModel
Pydantic model for a single dataset configuration.
- name
Name of the dataset.
- Type:
str
- path
Local file path to the dataset file (.h5ad or .h5mu).
- Type:
Path
Public Methods
- load_h5ad
Load an .h5ad file as an AnnData object.
- load_h5mu
Load a .h5mu file as a MuData object.
Examples
>>> from pathlib import Path
>>> config = DatasetConfig(
...     name="my_dataset",
...     uri="https://example.com/dataset",
...     path=Path("/path/to/dataset.h5ad")
... )
>>> adata = config.load_h5ad()
- classmethod validate_path(v: str | Path) → Path
Validate that path is a non-empty string or Path.
- classmethod validate_uri(v: str) → str
Validate that uri is a valid URL.
- load_h5ad() → anndata.AnnData
Load an .h5ad file as an AnnData object.
- load_h5mu() → mudata.MuData
Load a .h5mu file as a MuData object.
- model_config = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- name: str
- path: Path
- uri: str
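The two validators above are described only by their one-line summaries. The following is a minimal sketch of what such checks could look like, using only the standard library. The names check_uri and check_path are hypothetical and are not part of Napistu; the actual validators may apply different rules.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_uri(value: str) -> str:
    """Hypothetical URI check: require both a scheme and a host."""
    parsed = urlparse(value)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a valid URL: {value!r}")
    return value


def check_path(value) -> Path:
    """Hypothetical path check: accept a non-empty str or a Path, return a Path."""
    if not isinstance(value, (str, Path)):
        raise ValueError("path must be a non-empty string or Path")
    if isinstance(value, str) and not value.strip():
        raise ValueError("path must be a non-empty string or Path")
    return Path(value)


uri = check_uri("https://example.com/dataset")
path = check_path("/path/to/dataset.h5ad")
```

In a Pydantic model, checks like these would typically be attached to the uri and path fields as field validators, so that invalid values are rejected when the model is constructed.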
- class napistu.genomics.scverse_loading.DatasetsConfig(data: Dict[str, DatasetConfig | Dict[str, str | Path]])
Bases: object
Container for multiple dataset configurations.
- data
Dictionary mapping dataset names to DatasetConfig objects.
- Type:
Dict[str, DatasetConfig]
Public Methods
- get
Get dataset config by name, raising KeyError if not found.
- keys
Return dataset names.
- values
Return dataset configs.
- items
Return (name, config) pairs.
Private Methods
- _get_item
Support dictionary-style access.
- _contains
Support ‘in’ operator.
Examples
>>> from pathlib import Path
>>> data = {
...     "dataset1": {
...         "uri": "https://example.com/dataset1",
...         "path": "/path/to/dataset1.h5ad"
...     },
...     "dataset2": {
...         "uri": "https://example.com/dataset2",
...         "path": "/path/to/dataset2.h5mu"
...     }
... }
>>> configs = DatasetsConfig(data)
>>> # Access datasets
>>> dataset1 = configs["dataset1"]
>>> dataset2 = configs.get("dataset2")
>>> # Check if dataset exists
>>> "dataset1" in configs
True
>>> # Iterate over datasets
>>> for name, config in configs.items():
...     print(f"{name}: {config.uri}")
- __init__(data: Dict[str, DatasetConfig | Dict[str, str | Path]])
Initialize DatasetsConfig from a dictionary, ensuring each DatasetConfig has its name set.
- Parameters:
data (Dict[str, Union[DatasetConfig, Dict[str, Union[str, Path]]]]) – Dictionary mapping dataset names to either:
  - DatasetConfig objects (name will be set from key if missing)
  - Dicts that will be converted to DatasetConfig (name will be set from key)
- get(key: str) → DatasetConfig
Get dataset config by name, raising KeyError if not found.
- items()
Return (name, config) pairs.
- keys()
Return dataset names.
- values()
Return dataset configs.
- data: Dict[str, DatasetConfig]
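The access pattern documented for DatasetsConfig (a get that raises KeyError, dictionary-style indexing, the in operator, and keys/values/items) can be illustrated with a small stand-alone container. This is a sketch only: Entry and Registry are hypothetical names, plain dataclasses stand in for the Pydantic model, and the use of `__getitem__` and `__contains__` is an assumption about how dictionary-style access is provided.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    """Hypothetical stand-in for a single dataset configuration."""
    name: str
    uri: str
    path: str


class Registry:
    """Hypothetical container mirroring the documented access pattern."""

    def __init__(self, data: Dict[str, Dict[str, str]]):
        # The dictionary key supplies each entry's name.
        self.data = {key: Entry(name=key, **fields) for key, fields in data.items()}

    def get(self, key: str) -> Entry:
        # Unlike dict.get, a missing key raises KeyError instead of returning None.
        if key not in self.data:
            raise KeyError(f"dataset {key!r} not found")
        return self.data[key]

    def __getitem__(self, key: str) -> Entry:
        return self.get(key)

    def __contains__(self, key: str) -> bool:
        return key in self.data

    def keys(self):
        return self.data.keys()

    def values(self):
        return self.data.values()

    def items(self):
        return self.data.items()


registry = Registry(
    {"dataset1": {"uri": "https://example.com/dataset1", "path": "/path/to/dataset1.h5ad"}}
)
```

Raising KeyError from get keeps a misspelled dataset name from silently producing None further down a pipeline.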
- class napistu.genomics.scverse_loading.ModalityOntologyConfig(*, ontologies: Set[str] | Dict[str, str] | None, index_which_ontology: str | None = None)
Bases: BaseModel
Configuration for ontology handling in a single modality.
- index_which_ontology: str | None
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- ontologies: Set[str] | Dict[str, str] | None
- class napistu.genomics.scverse_loading.MultiModalityOntologyConfig(root: RootModelRootType = PydanticUndefined)
Bases: RootModel
Configuration for ontology handling across multiple modalities.
- classmethod from_dict(data: Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str]]) → MultiModalityOntologyConfig
Create a MultiModalityOntologyConfig from a dictionary.
- Parameters:
data (Dict[str, Dict[str, Union[Optional[Union[Set[str], Dict[str, str]]], Optional[str]]]]) – Dictionary mapping modality names to their ontology configurations. Each modality config should have ‘ontologies’ and optionally ‘index_which_ontology’. The ‘ontologies’ field can be:
  - None to automatically detect valid ontology columns
  - Set of columns to treat as ontologies
  - Dict mapping wide column names to ontology names
- Returns:
Validated ontology configuration
- Return type:
MultiModalityOntologyConfig
- items()
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- root: Dict[str, ModalityOntologyConfig]
- napistu.genomics.scverse_loading._create_results_df(array: ndarray, attrs: List[str], var_index: Index, table_type: str) → DataFrame
Create a DataFrame with the right orientation based on table type.
- For varm/varp tables:
rows are vars (var_index)
columns are attrs (features/selected vars)
- For X/layers:
rows are attrs (selected observations)
columns are vars (var_index)
then transpose to get vars as rows
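The orientation rule above can be made concrete with a short sketch. The function orient_results below is a hypothetical re-implementation written from the description, not the library's code, and the gene, cell, and component labels are made up for illustration.

```python
import numpy as np
import pandas as pd


def orient_results(array, attrs, var_index, table_type):
    """Hypothetical sketch: return a DataFrame with vars as rows in every case."""
    if table_type in ("varm", "varp"):
        # The array is already shaped (n_vars, n_attrs).
        return pd.DataFrame(array, index=var_index, columns=attrs)
    # X / layers: the array is shaped (n_selected_obs, n_vars),
    # so build it with obs as rows and then transpose.
    return pd.DataFrame(array, index=attrs, columns=var_index).T


var_index = pd.Index(["gene_a", "gene_b", "gene_c"])
loadings = np.arange(6).reshape(3, 2)  # 3 vars x 2 components
counts = np.arange(6).reshape(2, 3)    # 2 observations x 3 vars

varm_df = orient_results(loadings, ["PC1", "PC2"], var_index, "varm")
x_df = orient_results(counts, ["cell_1", "cell_2"], var_index, "X")
```

Both results are indexed by var names, which is what allows them to be merged later with identifier columns taken from the var table.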
- napistu.genomics.scverse_loading._extract_ontologies(var_table: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None) → DataFrame
Extract ontology columns from a var table, optionally including the index as an ontology.
- Parameters:
var_table (pd.DataFrame) – The var table containing systematic identifiers
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – Either:
  - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
  - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
  - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST
index_which_ontology (Optional[str], default=None) – If provided, extract the index as this ontology. Must not already exist in var table.
- Returns:
DataFrame containing only the ontology columns, with the same index as var_table
- Return type:
pd.DataFrame
- Raises:
ValueError –
  - If index_which_ontology already exists in var table
  - If any renamed ontology column already exists in var table
  - If any rename values are duplicated
  - If any final column names are not in ONTOLOGIES_LIST
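The three forms of the ontologies argument and the index extraction can be sketched with pandas alone. Everything below is an illustration under stated assumptions: KNOWN_ONTOLOGIES is a made-up stand-in for ONTOLOGIES_LIST, extract_ontologies is a hypothetical function, and the real implementation may order its checks differently.

```python
import pandas as pd

# Assumed stand-in for the ONTOLOGIES_LIST controlled vocabulary.
KNOWN_ONTOLOGIES = {"ensembl_gene", "symbol", "uniprot"}


def extract_ontologies(var_table, ontologies=None, index_which_ontology=None):
    """Hypothetical sketch of the documented behavior."""
    if index_which_ontology is not None and index_which_ontology in var_table.columns:
        raise ValueError(f"{index_which_ontology!r} already exists in var table")

    if ontologies is None:
        # Auto-detect: keep columns whose names are known ontologies.
        keep = [c for c in var_table.columns if c in KNOWN_ONTOLOGIES]
        out = var_table[keep].copy()
    elif isinstance(ontologies, dict):
        # Rename source columns to ontology names.
        targets = list(ontologies.values())
        if len(set(targets)) != len(targets):
            raise ValueError("rename values are duplicated")
        clashes = [t for s, t in ontologies.items() if t != s and t in var_table.columns]
        if clashes:
            raise ValueError(f"renamed columns already exist: {clashes}")
        out = var_table[list(ontologies)].rename(columns=ontologies)
    else:
        # A set of column names that are already ontology names.
        out = var_table[sorted(ontologies)].copy()

    if index_which_ontology is not None:
        # Expose the index as an ordinary ontology column.
        out[index_which_ontology] = var_table.index.to_numpy()

    invalid = set(out.columns) - KNOWN_ONTOLOGIES
    if invalid:
        raise ValueError(f"not in the controlled vocabulary: {sorted(invalid)}")
    return out


var = pd.DataFrame(
    {"gene_name": ["A", "B"], "uniprot": ["P1", "P2"], "score": [0.1, 0.2]},
    index=pd.Index(["ENSG1", "ENSG2"]),
)
auto = extract_ontologies(var)
renamed = extract_ontologies(var, {"gene_name": "symbol"}, index_which_ontology="ensembl_gene")
```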
- napistu.genomics.scverse_loading._get_table_from_dict_attr(adata: anndata.AnnData | mudata.MuData, attr_name: str, table_name: str | None = None) → pd.DataFrame | np.ndarray
Get a table from a dict-like AnnData attribute (varm, layers, etc.).
- Parameters:
adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from
attr_name (str) – Name of the attribute (‘varm’, ‘layers’, etc.)
table_name (str, optional) – Specific table name to retrieve. If None and only one table exists, that table will be returned. If None and multiple tables exist, raises ValueError
- Returns:
The table data. For array-type attributes (varm, varp, X, layers), returns numpy array. For other attributes, returns DataFrame
- Return type:
Union[pd.DataFrame, np.ndarray]
- Raises:
ValueError –
  - If attr_name is not a valid dict-like attribute
  - If no tables found in the attribute
  - If multiple tables found and table_name not specified
  - If specified table_name not found
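The table-selection rule (return the only table when no name is given, otherwise require an explicit and valid name) can be shown on a plain dictionary. The function pick_table is a hypothetical helper for illustration; the dictionary stands in for a dict-like attribute such as varm or layers.

```python
def pick_table(tables: dict, attr_name: str, table_name=None):
    """Hypothetical sketch of selecting one table from a dict-like attribute."""
    if not tables:
        raise ValueError(f"No tables found in {attr_name}")
    if table_name is None:
        if len(tables) > 1:
            # Ambiguous: the caller has to say which table is wanted.
            raise ValueError(
                f"Multiple tables in {attr_name}; specify one of {sorted(tables)}"
            )
        # Exactly one table: return it without requiring a name.
        return next(iter(tables.values()))
    if table_name not in tables:
        raise ValueError(f"{table_name!r} not found in {attr_name}")
    return tables[table_name]


single = pick_table({"PCs": [[0.1, 0.2]]}, "varm")
named = pick_table({"raw": [1], "normalized": [2]}, "layers", "normalized")
```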
- napistu.genomics.scverse_loading._get_valid_attrs_for_feature_level_array(adata: anndata.AnnData, table_type: str, raw_results_table: np.ndarray, table_colnames: List[str] | None = None) → list[str]
Get valid attributes for a feature-level array.
- Parameters:
adata (anndata.AnnData) – The AnnData object
table_type (str) – The type of table
raw_results_table (np.ndarray) – The raw results table for dimension validation
table_colnames (Optional[List[str]]) – Column names for varm tables
- Returns:
List of valid attributes for this table type
- Return type:
list[str]
- Raises:
ValueError – If table_type is invalid or if table_colnames validation fails for varm tables
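The source does not spell out which labels count as valid attributes for each table type. The sketch below assumes, based on the orientation rules documented for _create_results_df, that varm uses the supplied column names, varp uses var names, and X/layers use observation names. The function valid_attrs and its explicit obs_names and var_names arguments are hypothetical.

```python
import numpy as np


def valid_attrs(table_type, array, obs_names, var_names, table_colnames=None):
    """Hypothetical sketch: which labels may be selected for each table type."""
    if table_type == "varm":
        # varm arrays carry no column labels, so the caller must supply them.
        if table_colnames is None:
            raise ValueError("table_colnames is required for varm tables")
        if len(table_colnames) != array.shape[1]:
            raise ValueError(
                f"expected {array.shape[1]} column names, got {len(table_colnames)}"
            )
        return list(table_colnames)
    if table_type == "varp":
        # Assumption: pairwise var tables are labeled by var names.
        return list(var_names)
    if table_type in ("X", "layers"):
        # Assumption: expression-like tables are selected by observation.
        return list(obs_names)
    raise ValueError(f"invalid table_type: {table_type!r}")


genes = ["g1", "g2", "g3"]
pcs = valid_attrs("varm", np.zeros((3, 2)), ["c1"], genes, ["PC1", "PC2"])
cells = valid_attrs("X", np.zeros((1, 3)), ["c1"], genes)
```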
- napistu.genomics.scverse_loading._load_raw_table(adata: anndata.AnnData | mudata.MuData, table_type: str, table_name: str | None = None) → pd.DataFrame | np.ndarray
Load an AnnData table.
This function loads a table from an AnnData or MuData object and returns it as a pd.DataFrame or np.ndarray, depending on the table type.
- Parameters:
adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object to load the table from.
table_type (str) – The type of table to load.
table_name (str, optional) – The name of the table to load.
- Returns:
The loaded table.
- Return type:
pd.DataFrame or np.ndarray
- napistu.genomics.scverse_loading._select_results_attrs(adata: anndata.AnnData, raw_results_table: pd.DataFrame | np.ndarray, table_type: str, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None) → pd.DataFrame
Select results attributes from an AnnData object.
This function selects results attributes from raw_results_table derived from an AnnData object and converts them if needed to a pd.DataFrame with appropriate indices.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the results to be formatted.
raw_results_table (pd.DataFrame or np.ndarray) – The raw results table to be formatted.
table_type (str) – The type of table raw_results_table refers to.
results_attrs (list of str, optional) – The attributes to extract from the raw_results_table.
table_colnames (list of str, optional) – If table_type is varm, the names of all columns (e.g., PC1, PC2, etc.). Ignored otherwise.
- Returns:
A DataFrame containing the formatted results.
- Return type:
pd.DataFrame
- napistu.genomics.scverse_loading._split_mdata_results_by_modality(mdata: mudata.MuData, results_data_table: pd.DataFrame) → Dict[str, pd.DataFrame]
Split a results table by modality and verify compatibility with var tables.
- Parameters:
mdata (mudata.MuData) – MuData object containing multiple modalities
results_data_table (pd.DataFrame) – Results table with vars as rows, typically from prepare_anndata_results_df()
- Returns:
Dictionary with modality names as keys and DataFrames as values. Each DataFrame contains just the results for that modality. The index of each DataFrame is guaranteed to match the corresponding modality’s var table for later merging.
- Return type:
Dict[str, pd.DataFrame]
- Raises:
ValueError –
  - If any modality’s vars are not found in the results table
  - If any modality’s results have different indices than its var table
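The split-and-verify step can be sketched with pandas by representing each modality only by its var index. The function split_by_modality is hypothetical, and the modality_vars dictionary stands in for the var index of each AnnData object held in a MuData object.

```python
import pandas as pd


def split_by_modality(modality_vars: dict, results: pd.DataFrame) -> dict:
    """Hypothetical sketch: one results DataFrame per modality, index-checked."""
    out = {}
    for name, var_index in modality_vars.items():
        missing = var_index.difference(results.index)
        if len(missing) > 0:
            raise ValueError(f"{name}: vars missing from results: {list(missing)}")
        subset = results.loc[var_index]
        # Guard against duplicated or reordered rows before any later merge.
        if not subset.index.equals(var_index):
            raise ValueError(f"{name}: results index differs from var table index")
        out[name] = subset
    return out


results = pd.DataFrame({"stat": [1.0, 2.0, 3.0]}, index=["g1", "g2", "p1"])
modality_vars = {"rna": pd.Index(["g1", "g2"]), "protein": pd.Index(["p1"])}
split = split_by_modality(modality_vars, results)
```

Checking index equality up front means a later merge with each modality's var table cannot silently misalign rows.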
- napistu.genomics.scverse_loading.prepare_anndata_results_df(adata: anndata.AnnData | mudata.MuData, table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, ontologies: Set[str] | Dict[str, str] | None = None, index_which_ontology: str | None = None, table_colnames: List[str] | None = None) → pd.DataFrame
Prepare a results table from an AnnData object for use in Napistu.
This function extracts a table from an AnnData object and formats it for use in Napistu. The returned DataFrame will always include systematic identifiers from the var table, along with the requested results data.
- Parameters:
adata (anndata.AnnData or mudata.MuData) – The AnnData or MuData object containing the results to be formatted.
table_type (str, optional) – The type of table to extract from the AnnData object. Must be one of: “var”, “varm”, or “X”.
table_name (str, optional) – The name of the table to extract from the AnnData object.
results_attrs (list of str, optional) – The attributes to extract from the table.
index_which_ontology (str, optional) – The ontology that the var table index represents. If provided, the index is extracted, renamed to this ontology name, and added to the results table as a new column. Must not already exist in var table.
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – Either:
  - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
  - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
  - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST
If index_which_ontology is defined, it should be represented in these ontologies.
table_colnames (Optional[List[str]], optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.
- Returns:
A DataFrame containing the formatted results with systematic identifiers. The index will match the var_names of the AnnData object.
- Return type:
pd.DataFrame
- Raises:
ValueError –
  - If table_type is not one of: “var”, “varm”, or “X”
  - If index_which_ontology already exists in var table
- napistu.genomics.scverse_loading.prepare_mudata_results_df(mdata: mudata.MuData, mudata_ontologies: 'MultiModalityOntologyConfig' | Dict[str, Dict[str, Set[str] | Dict[str, str] | None | str | None]], table_type: str = 'var', table_name: str | None = None, results_attrs: List[str] | None = None, table_colnames: List[str] | None = None, level: str = 'mdata') → Dict[str, pd.DataFrame]
Prepare results tables from a MuData object for use in Napistu, with adata-specific ontology handling.
This function extracts tables from each adata in a MuData object and formats them for use in Napistu. Each adata’s table will include systematic identifiers from its var table along with the requested results data. Ontology handling is configured per-adata using MultiModalityOntologyConfig.
- Parameters:
mdata (mudata.MuData) – The MuData object containing the results to be formatted.
mudata_ontologies (MultiModalityOntologyConfig or dict) – Configuration for ontology handling in each modality (each modality is a separate AnnData object). Must include an entry for each modality. Can be either:
  - A MultiModalityOntologyConfig object
  - A dictionary that can be converted to MultiModalityOntologyConfig using from_dict()
Each modality’s ‘ontologies’ field can be:
  - None to automatically detect valid ontology columns
  - Set of columns to treat as ontologies
  - Dict mapping wide column names to ontology names
The ‘index_which_ontology’ field is optional.
table_type (str, optional) – The type of table to extract from each modality. Must be one of: “var”, “varm”, or “X”.
table_name (str, optional) – The name of the table to extract from each modality.
results_attrs (list of str, optional) – The attributes to extract from the table.
table_colnames (list of str, optional) – Column names for varm tables. Required when table_type is “varm”. Ignored otherwise.
level (str, optional) – Whether to extract data from “mdata” (MuData-level) or “adata” (individual AnnData-level) tables. Default is “mdata”.
- Returns:
Dictionary mapping modality names to their formatted results DataFrames. Each DataFrame contains the modality’s results with systematic identifiers. The index of each DataFrame will match the var_names of that modality.
- Return type:
Dict[str, pd.DataFrame]
- Raises:
ValueError –
  - If table_type is not one of: “var”, “varm”, or “X”
  - If mudata_ontologies contains invalid configuration
  - If modality-specific ontology extraction fails
  - If any modality is missing from mudata_ontologies
  - If level is not “mdata” or “adata”
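The requirements on mudata_ontologies and level can be checked before any data is touched. The sketch below works on plain dictionaries and a list of modality names. The function check_modality_config is hypothetical, the modality and column names are made up, and the accepted level values follow the level parameter description ("mdata" or "adata").

```python
def check_modality_config(modalities, config, level="mdata"):
    """Hypothetical pre-flight check for a per-modality ontology config."""
    if level not in ("mdata", "adata"):
        raise ValueError(f"level must be 'mdata' or 'adata', got {level!r}")
    missing = [m for m in modalities if m not in config]
    if missing:
        raise ValueError(f"modalities missing from config: {missing}")
    normalized = {}
    for modality in modalities:
        entry = config[modality]
        if "ontologies" not in entry:
            raise ValueError(f"{modality}: 'ontologies' is required (it may be None)")
        normalized[modality] = {
            # None means auto-detect, a set lists columns, a dict renames columns.
            "ontologies": entry["ontologies"],
            # Optional: the ontology that the var index represents.
            "index_which_ontology": entry.get("index_which_ontology"),
        }
    return normalized


config = {
    "rna": {"ontologies": None, "index_which_ontology": "ensembl_gene"},
    "protein": {"ontologies": {"uniprot"}},
}
checked = check_modality_config(["rna", "protein"], config)
```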