napistu.ontologies.genodexito

Classes

Genodexito([organismal_species, ...])

A tool for mapping gene identifiers across ontologies.

class napistu.ontologies.genodexito.Genodexito(organismal_species: str = 'Homo sapiens', preferred_method: str = 'bioconductor', allow_fallback: bool = True, r_paths: List[str] | None = None, test_mode: bool = False, mygene_query_strategies: List[str] | None = None)

Bases: object

A tool for mapping gene identifiers across ontologies.

Genodexito provides a unified interface for mapping between different gene identifier ontologies (e.g. Ensembl, Entrez, UniProt). It supports both an R-centric workflow using Bioconductor through RPy2, as well as a Python-centric workflow using MyGene.info.

When allow_fallback is True, the class automatically falls back to the other method if the preferred one fails.
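The fallback behaviour described above can be illustrated with a minimal, stdlib-only sketch. This is not napistu's implementation: `map_with_fallback` and the two stand-in mapper functions are hypothetical names introduced only for this example, and the real class may order or handle failures differently.

```python
from typing import Callable, Dict, Tuple


def map_with_fallback(
    mappers: Dict[str, Callable[[], dict]],
    preferred_method: str = "bioconductor",
    allow_fallback: bool = True,
) -> Tuple[dict, str]:
    """Try the preferred mapper first; optionally fall back to the others."""
    if preferred_method not in mappers:
        raise ValueError(f"Unknown method: {preferred_method!r}")

    # Preferred method first, then every other registered method.
    order = [preferred_method] + [m for m in mappers if m != preferred_method]
    if not allow_fallback:
        order = order[:1]

    errors = {}
    for method in order:
        try:
            # Return both the tables and the name of the method that worked.
            return mappers[method](), method
        except Exception as exc:  # a real implementation would catch narrower errors
            errors[method] = exc
    raise RuntimeError(f"All mapping methods failed: {errors}")


def _broken_bioconductor() -> dict:
    # Hypothetical stand-in: simulates a missing R/rpy2 installation.
    raise ImportError("rpy2 is not available")


def _working_python() -> dict:
    # Hypothetical stand-in for a MyGene.info query.
    return {"symbol": [("7157", "TP53")]}


demo_mappers = {"bioconductor": _broken_bioconductor, "python": _working_python}
tables, mapper_used = map_with_fallback(demo_mappers)
```

In this sketch, `mapper_used` plays the same role as the `mapper_used` attribute: it records which method actually produced the tables.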

Parameters:

organismal_species (str, optional) – The organismal species to map identifiers for, by default “Homo sapiens”
preferred_method (str, optional) – Which mapping method to try first (“bioconductor” or “python”), by default “bioconductor”
allow_fallback (bool, optional) – Whether to allow falling back to the other method if preferred fails, by default True
r_paths (Optional[List[str]], optional) – Optional paths to R libraries for Bioconductor, by default None
test_mode (bool, optional) – If True, limit queries to 1000 genes for testing purposes, by default False
mygene_query_strategies (Optional[List[str]], optional) – MyGene.info query strings to use with the Python mapper; if omitted, MYGENE_DEFAULT_QUERIES is used

mappings

Dictionary of mapping tables between ontologies

Type: Optional[Dict[str, pd.DataFrame]]

mapper_used

Which mapping method was successfully used (“bioconductor” or “python”)

Type: Optional[str]

merged_mappings

Combined wide-format mapping table

Type: Optional[pd.DataFrame]

stacked_mappings

Combined long-format mapping table

Type: Optional[pd.DataFrame]

create_mapping_tables(mappings: Set[str], overwrite: bool = False): Create mapping tables between different ontologies. This is the primary method to fetch and store identifier mappings. Must be called before using other methods.

merge_mappings(ontologies: Set[str] | None = None): Create a wide-format table where each row is an Entrez gene ID and columns contain the corresponding identifiers in other ontologies.

stack_mappings(ontologies: Set[str] | None = None): Create a long-format table combining all mappings, with columns for ontology type and identifier values.

expand_sbml_dfs_ids(sbml_dfs: sbml_dfs_core.SBML_dfs, ontologies: Set[str] | None = None): Update the expanded identifiers for a model by adding additional related ontologies pulled from Bioconductor or MyGene.info.

Examples

>>> from napistu.ontologies.genodexito import Genodexito
>>>
>>> # Initialize the mapper with the Python method
>>> geno = Genodexito(preferred_method="python")
>>>
>>> # Create mapping tables for specific ontologies
>>> mappings = {'ensembl_gene', 'symbol', 'uniprot'}
>>> geno.create_mapping_tables(mappings)
>>>
>>> # Create merged wide-format table
>>> geno.merge_mappings()
>>> print(geno.merged_mappings.head())
>>>
>>> # Create stacked long-format table
>>> geno.stack_mappings()
>>> print(geno.stacked_mappings.head())

__init__(organismal_species: str = 'Homo sapiens', preferred_method: str = 'bioconductor', allow_fallback: bool = True, r_paths: List[str] | None = None, test_mode: bool = False, mygene_query_strategies: List[str] | None = None) → None

Initialize the unified gene mapper.

Parameters:

organismal_species (str, optional) – Species name, by default “Homo sapiens”
preferred_method (str, optional) – Which mapping method to try first (“bioconductor” or “python”), by default “bioconductor”
allow_fallback (bool, optional) – Whether to allow falling back to other method if preferred fails, by default True
r_paths (Optional[List[str]], optional) – Optional paths to R libraries for Bioconductor, by default None
test_mode (bool, optional) – If True, limit queries to 1000 genes for testing purposes, by default False
mygene_query_strategies (Optional[List[str]], optional) – MyGene.info query strategies for the Python mapper; if omitted, MYGENE_DEFAULT_QUERIES is used

_check_mappings() → None

Check that mappings exist and contain required ontologies.

Raises:

ValueError – If mappings don’t exist or don’t contain NCBI_ENTREZ_GENE
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
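The three checks listed above can be sketched with plain Python data structures. This is an illustration under stated assumptions, not the library's code: tables are modelled as lists of row dicts rather than pandas DataFrames, `check_mappings` is a hypothetical name, and the `ENTREZ_KEY` value is assumed (the real NCBI_ENTREZ_GENE constant may hold a different string).

```python
from typing import Dict, List, Optional

# Assumed key for the Entrez table; the real constant's value may differ.
ENTREZ_KEY = "ncbi_entrez_gene"

Row = Dict[str, Optional[str]]


def check_mappings(mappings: Optional[Dict[str, List[Row]]]) -> None:
    """Raise if the mapping tables are missing, incomplete, or malformed."""
    if not mappings:
        raise ValueError("No mappings found; create mapping tables first")
    if ENTREZ_KEY not in mappings:
        raise ValueError(f"Mappings must contain {ENTREZ_KEY!r}")

    for ontology, rows in mappings.items():
        for row in rows:
            for column, value in row.items():
                # Missing values are checked first because None is not a str.
                if value is None:
                    raise ValueError(
                        f"{ontology!r} has a missing value in column {column!r}"
                    )
                if not isinstance(value, str):
                    raise TypeError(
                        f"{ontology!r} has a non-string identifier: {value!r}"
                    )


valid = {
    ENTREZ_KEY: [{"entrez": "7157"}],
    "symbol": [{"entrez": "7157", "symbol": "TP53"}],
}
check_mappings(valid)  # passes silently
```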

_create_expanded_identifiers(sbml_dfs: SBML_dfs, ontologies: Set[str] | None = None) → Series

Create expanded identifiers for SBML species.

Update a table’s identifiers to include additional related ontologies. Ontologies are pulled from the Bioconductor “org” packages or MyGene.info.

Parameters:

sbml_dfs (sbml_dfs_core.SBML_dfs) – A relational pathway model built around reactions interconverting compartmentalized species
ontologies (Optional[Set[str]], optional) – Ontologies to add or complete, by default None. If None, uses all available ontologies.

Returns:

Series with identifiers as the index and updated Identifiers objects as values

Return type:

pd.Series

Raises:

ValueError – If merged mappings don’t exist or all requested ontologies already exist
TypeError – If identifiers are not in expected format

_use_mappings(ontologies: Set[str] | None) → Set[str]

Validate and process ontologies for mapping operations.

Parameters:

ontologies (Optional[Set[str]]) – Set of ontologies to validate. If None, uses all available mappings.

Returns:

Set of validated ontologies to use

Return type:

Set[str]

Raises:

ValueError – If mappings don’t exist or ontologies are invalid
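The validation described here amounts to a set comparison. A minimal sketch, assuming the available ontologies are simply the keys of the stored mappings; `use_mappings` is a hypothetical standalone function, not the private method itself.

```python
from typing import Optional, Set


def use_mappings(available: Set[str], ontologies: Optional[Set[str]]) -> Set[str]:
    """Return the ontologies to use, defaulting to everything available."""
    if not available:
        raise ValueError("No mappings available; create mapping tables first")
    if ontologies is None:
        # No explicit request: use every ontology that has a mapping table.
        return set(available)
    invalid = ontologies - available
    if invalid:
        raise ValueError(f"Requested ontologies not in mappings: {sorted(invalid)}")
    return set(ontologies)


available = {"ensembl_gene", "symbol", "uniprot"}
everything = use_mappings(available, None)
subset = use_mappings(available, {"symbol"})
```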

create_mapping_tables(mappings: Set[str], overwrite: bool = False) → None

Create mapping tables between different ontologies.

This is a drop-in replacement for create_bioconductor_mapping_tables that handles both Bioconductor and Python-based mapping methods.

Parameters:

mappings (Set[str]) – Set of ontologies to create mappings for
overwrite (bool, optional) – Whether to overwrite existing mappings, by default False

Returns:

Updates self.mappings and self.mapper_used in place

Return type:

None

expand_sbml_dfs_ids(sbml_dfs: SBML_dfs, ontologies: Set[str] | None = None) → SBML_dfs

Update the expanded identifiers for a model.

Parameters:

sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML model to update with expanded identifiers
ontologies (Optional[Set[str]], optional) – Set of ontologies to use for mapping. If None, uses all available ontologies from INTERCONVERTIBLE_GENIC_ONTOLOGIES.

Returns:

Updated SBML model with expanded identifiers

Return type:

sbml_dfs_core.SBML_dfs

merge_mappings(ontologies: Set[str] | None = None) → None

Merge mappings into a single wide table.

Creates a wide-format table where each row is an Entrez gene ID and columns contain the corresponding identifiers in other ontologies.

Parameters:

ontologies (Optional[Set[str]], optional) – Set of ontologies to include in merged table, by default None. If None, uses all available ontologies.

Returns:

Updates self.merged_mappings in place

Return type:

None

Raises:

ValueError – If mappings don’t exist or requested ontologies are invalid
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
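To make the wide format concrete, here is a stdlib-only sketch that joins per-ontology tables on the Entrez ID. It is an illustration, not the method's implementation: the real method works on pandas DataFrames, and this page does not state how it joins them. The sketch assumes an outer-style join that leaves gaps as None and emits one row per combination when an Entrez ID maps to several identifiers. `merge_wide` and the `entrez` column name are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Pairs = List[Tuple[str, str]]  # (entrez_id, other_id)


def merge_wide(mappings: Dict[str, Pairs]) -> List[Dict[str, Optional[str]]]:
    """Build one row per Entrez ID (per identifier combination)."""
    ontologies = sorted(mappings)

    # Group identifiers by Entrez ID within each ontology.
    grouped: Dict[str, Dict[str, List[str]]] = {}
    for ontology, pairs in mappings.items():
        for entrez_id, other_id in pairs:
            grouped.setdefault(entrez_id, {}).setdefault(ontology, []).append(other_id)

    rows: List[Dict[str, Optional[str]]] = []
    for entrez_id in sorted(grouped):
        # An ontology with no match contributes a single None so the row is kept.
        choices = [grouped[entrez_id].get(o, [None]) for o in ontologies]
        for combo in product(*choices):
            rows.append({"entrez": entrez_id, **dict(zip(ontologies, combo))})
    return rows


demo = {
    "symbol": [("7157", "TP53"), ("672", "BRCA1")],
    "ensembl_gene": [("7157", "ENSG00000141510")],
}
wide = merge_wide(demo)
```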

stack_mappings(ontologies: Set[str] | None = None) → None

Stack mappings into a single long table.

Convert a dict of mappings between Entrez identifiers and other identifiers into a single long-format table.

Parameters:

ontologies (Optional[Set[str]], optional) – Set of ontologies to include in stacked table, by default None. If None, uses all available ontologies.

Returns:

Updates self.stacked_mappings in place

Return type:

None

Raises:

ValueError – If mappings don’t exist or requested ontologies are invalid
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
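The long format can be sketched the same way: each (Entrez ID, identifier) pair becomes one row tagged with its ontology. This is a stdlib-only illustration rather than the pandas-based implementation, and `stack_long` and the column names `entrez`, `ontology`, and `identifier` are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def stack_long(mappings: Dict[str, List[Tuple[str, str]]]) -> List[Dict[str, str]]:
    """Flatten a dict of per-ontology tables into one long table."""
    rows: List[Dict[str, str]] = []
    for ontology in sorted(mappings):
        for entrez_id, identifier in mappings[ontology]:
            rows.append(
                {"entrez": entrez_id, "ontology": ontology, "identifier": identifier}
            )
    return rows


demo = {
    "symbol": [("7157", "TP53")],
    "uniprot": [("7157", "P04637")],
}
long_table = stack_long(demo)
```

Unlike the wide layout, the long layout never needs placeholder values: an ontology with no identifier for a gene simply contributes no row.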