napistu.ontologies.genodexito
Classes
Genodexito – A tool for mapping gene identifiers across ontologies.
- class napistu.ontologies.genodexito.Genodexito(organismal_species: str = 'Homo sapiens', preferred_method: str = 'bioconductor', allow_fallback: bool = True, r_paths: List[str] | None = None, test_mode: bool = False, mygene_query_strategies: List[str] | None = None)
Bases: object
A tool for mapping gene identifiers across ontologies.
Genodexito provides a unified interface for mapping between different gene identifier ontologies (e.g. Ensembl, Entrez, UniProt). It supports both an R-centric workflow using Bioconductor through rpy2 and a Python-centric workflow using MyGene.info.
The class automatically handles fallback between the two methods if one fails.
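The fallback behavior described above can be illustrated with a minimal, self-contained sketch. It does not reproduce Genodexito's internals: `map_with_fallback`, the two backend functions, and the error handling are hypothetical stand-ins, and only the parameter names `preferred_method` and `allow_fallback` come from the class signature.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical stand-in for a mapper backend: takes the requested ontologies
# and returns a dict of mapping tables (empty lists are enough for the sketch).
Mapper = Callable[[Set[str]], Dict[str, list]]


def map_with_fallback(
    mappers: Dict[str, Mapper],
    ontologies: Set[str],
    preferred_method: str = "bioconductor",
    allow_fallback: bool = True,
) -> Tuple[Dict[str, list], str]:
    """Try the preferred mapper first; optionally fall back to the others."""
    if preferred_method not in mappers:
        raise ValueError(f"Unknown method: {preferred_method!r}")

    # Preferred method first, then every other registered method.
    order = [preferred_method]
    if allow_fallback:
        order += [name for name in mappers if name != preferred_method]

    errors = []
    for name in order:
        try:
            return mappers[name](ontologies), name
        except Exception as exc:  # a real implementation would catch less broadly
            errors.append(f"{name}: {exc}")

    raise RuntimeError("All mapping methods failed: " + "; ".join(errors))


def failing_bioconductor(ontologies: Set[str]) -> Dict[str, list]:
    # Simulates an environment without a working R installation.
    raise ImportError("rpy2 is not available")


def working_python(ontologies: Set[str]) -> Dict[str, list]:
    return {ontology: [] for ontology in ontologies}


backends = {"bioconductor": failing_bioconductor, "python": working_python}
tables, mapper_used = map_with_fallback(backends, {"symbol", "uniprot"})
```

In this sketch `mapper_used` records which backend succeeded, mirroring the role of the `mapper_used` attribute. With `allow_fallback=False` the same call raises instead of switching backends.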
- Parameters:
organismal_species (str, optional) – The organismal species to map identifiers for, by default “Homo sapiens”
preferred_method (str, optional) – Which mapping method to try first (“bioconductor” or “python”), by default “bioconductor”
allow_fallback (bool, optional) – Whether to allow falling back to the other method if preferred fails, by default True
r_paths (Optional[List[str]], optional) – Optional paths to R libraries for Bioconductor, by default None
test_mode (bool, optional) – If True, limit queries to 1000 genes for testing purposes, by default False
mygene_query_strategies (list of str, optional) – MyGene.info query strings to use with the Python mapper; if omitted, MYGENE_DEFAULT_QUERIES is used
- mappings
Dictionary of mapping tables between ontologies
- Type:
Optional[Dict[str, pd.DataFrame]]
- mapper_used
Which mapping method was successfully used (“bioconductor” or “python”)
- Type:
Optional[str]
- merged_mappings
Combined wide-format mapping table
- Type:
Optional[pd.DataFrame]
- stacked_mappings
Combined long-format mapping table
- Type:
Optional[pd.DataFrame]
- create_mapping_tables(mappings: Set[str], overwrite: bool = False)
Create mapping tables between different ontologies. This is the primary method to fetch and store identifier mappings. Must be called before using other methods.
- merge_mappings(ontologies: Set[str] | None = None)
Create a wide-format table where each row is an Entrez gene ID and columns contain the corresponding identifiers in other ontologies.
- stack_mappings(ontologies: Set[str] | None = None)
Create a long-format table combining all mappings, with columns for ontology type and identifier values.
- expand_sbml_dfs_ids(sbml_dfs: sbml_dfs_core.SBML_dfs, ontologies: Set[str] | None = None)
Update the expanded identifiers for a model by adding additional related ontologies pulled from Bioconductor or MyGene.info.
Examples
>>> from napistu.ontologies.genodexito import Genodexito
>>>
>>> # Initialize mapper with Python method
>>> geno = Genodexito(preferred_method="python")
>>>
>>> # Create mapping tables for specific ontologies
>>> mappings = {'ensembl_gene', 'symbol', 'uniprot'}
>>> geno.create_mapping_tables(mappings)
>>>
>>> # Create merged wide-format table
>>> geno.merge_mappings()
>>> print(geno.merged_mappings.head())
>>>
>>> # Create stacked long-format table
>>> geno.stack_mappings()
>>> print(geno.stacked_mappings.head())
- __init__(organismal_species: str = 'Homo sapiens', preferred_method: str = 'bioconductor', allow_fallback: bool = True, r_paths: List[str] | None = None, test_mode: bool = False, mygene_query_strategies: List[str] | None = None) → None
Initialize unified gene mapper
- Parameters:
organismal_species (str, optional) – Species name, by default “Homo sapiens”
preferred_method (str, optional) – Which mapping method to try first (“bioconductor” or “python”), by default “bioconductor”
allow_fallback (bool, optional) – Whether to allow falling back to other method if preferred fails, by default True
r_paths (Optional[List[str]], optional) – Optional paths to R libraries for Bioconductor, by default None
test_mode (bool, optional) – If True, limit queries to 1000 genes for testing purposes, by default False
mygene_query_strategies (Optional[List[str]], optional) – MyGene.info query strategies to use with the Python mapper; if omitted, MYGENE_DEFAULT_QUERIES is used
- _check_mappings() → None
Check that mappings exist and contain required ontologies.
- Raises:
ValueError – If mappings don’t exist or don’t contain NCBI_ENTREZ_GENE
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
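The three failure modes listed above can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: each mapping table is modeled as a plain list of `(entrez_id, other_id)` pairs instead of a DataFrame, the key `"ncbi_entrez_gene"` is an assumed value for `NCBI_ENTREZ_GENE`, and the NA check runs before the type check so that a missing value is reported as a `ValueError`.

```python
import math
from typing import Dict, List, Optional, Tuple

ENTREZ_KEY = "ncbi_entrez_gene"  # assumed value of NCBI_ENTREZ_GENE

# Simplified stand-in for a mapping table: (entrez_id, other_id) pairs.
Table = List[Tuple[object, object]]


def is_missing(value: object) -> bool:
    """Treat None and float NaN as NA."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_mappings(mappings: Optional[Dict[str, Table]]) -> None:
    """Raise if mappings are absent, lack Entrez, contain NA, or hold non-strings."""
    if not mappings:
        raise ValueError("No mappings found; create mapping tables first")
    if ENTREZ_KEY not in mappings:
        raise ValueError(f"Mappings must include {ENTREZ_KEY!r}")

    for ontology, table in mappings.items():
        values = [value for row in table for value in row]
        if any(is_missing(value) for value in values):
            raise ValueError(f"{ontology} contains NA values")
        if not all(isinstance(value, str) for value in values):
            raise TypeError(f"{ontology} contains non-string identifiers")


good = {
    ENTREZ_KEY: [("7157", "7157")],
    "symbol": [("7157", "TP53")],
}
check_mappings(good)  # passes silently
```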
- _create_expanded_identifiers(sbml_dfs: SBML_dfs, ontologies: Set[str] | None = None) → Series
Create expanded identifiers for SBML species.
Update a table’s identifiers to include additional related ontologies. Ontologies are pulled from the Bioconductor “org” packages or MyGene.info.
- Parameters:
sbml_dfs (sbml_dfs_core.SBML_dfs) – A relational pathway model built around reactions interconverting compartmentalized species
ontologies (Optional[Set[str]], optional) – Ontologies to add or complete, by default None. If None, uses all available ontologies
- Returns:
Series with identifiers as the index and updated Identifiers objects as values
- Return type:
pd.Series
- Raises:
ValueError – If merged mappings don’t exist or all requested ontologies already exist
TypeError – If identifiers are not in expected format
- _use_mappings(ontologies: Set[str] | None) → Set[str]
Validate and process ontologies for mapping operations.
- Parameters:
ontologies (Optional[Set[str]]) – Set of ontologies to validate. If None, uses all available mappings.
- Returns:
Set of validated ontologies to use
- Return type:
Set[str]
- Raises:
ValueError – If mappings don’t exist or ontologies are invalid
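The validation contract above (None means "use everything available", unknown names are rejected) can be sketched in a few lines. The function name `resolve_ontologies` and its `available` argument are hypothetical; the real method reads the available ontologies from the instance's stored mappings.

```python
from typing import Iterable, Optional, Set


def resolve_ontologies(
    requested: Optional[Set[str]], available: Iterable[str]
) -> Set[str]:
    """Return the ontologies to use, defaulting to everything available."""
    available_set = set(available)
    if not available_set:
        raise ValueError("No mappings found; create mapping tables first")

    # None means "use every ontology that has a mapping table".
    if requested is None:
        return available_set

    unknown = requested - available_set
    if unknown:
        raise ValueError(f"Ontologies without mappings: {sorted(unknown)}")
    return set(requested)


available = ["ncbi_entrez_gene", "symbol", "uniprot"]
all_ontologies = resolve_ontologies(None, available)
subset = resolve_ontologies({"symbol"}, available)
```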
- create_mapping_tables(mappings: Set[str], overwrite: bool = False) → None
Create mapping tables between different ontologies.
This is a drop-in replacement for create_bioconductor_mapping_tables that handles both Bioconductor and Python-based mapping methods.
- Parameters:
mappings (Set[str]) – Set of ontologies to create mappings for
overwrite (bool, optional) – Whether to overwrite existing mappings, by default False
- Returns:
Updates self.mappings and self.mapper_used in place
- Return type:
None
- expand_sbml_dfs_ids(sbml_dfs: SBML_dfs, ontologies: Set[str] | None = None) → SBML_dfs
Update the expanded identifiers for a model.
- Parameters:
sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML model to update with expanded identifiers
ontologies (Optional[Set[str]], optional) – Set of ontologies to use for mapping. If None, uses all available ontologies from INTERCONVERTIBLE_GENIC_ONTOLOGIES.
- Returns:
Updated SBML model with expanded identifiers
- Return type:
SBML_dfs
- merge_mappings(ontologies: Set[str] | None = None) → None
Merge mappings into a single wide table.
Creates a wide-format table where each row is an Entrez gene ID and columns contain the corresponding identifiers in other ontologies.
- Parameters:
ontologies (Optional[Set[str]], optional) – Set of ontologies to include in merged table, by default None. If None, uses all available ontologies
- Returns:
Updates self.merged_mappings in place
- Return type:
None
- Raises:
ValueError – If mappings don’t exist or requested ontologies are invalid
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
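To make the wide-format shape concrete, here is a small pure-Python sketch. It is an illustration only: the library works on pandas DataFrames, while this version models each table as a list of `(entrez_id, identifier)` pairs. The function `merge_wide`, the column name `"ncbi_entrez_gene"`, and the outer-join behavior (missing values become `None`, one-to-many mappings yield several rows) are assumptions, not confirmed details of `merge_mappings`.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

ENTREZ_COLUMN = "ncbi_entrez_gene"  # assumed name of the Entrez column

Pairs = List[Tuple[str, str]]  # (entrez_id, identifier) pairs


def merge_wide(mappings: Dict[str, Pairs]) -> List[Dict[str, Optional[str]]]:
    """Combine per-ontology tables into rows keyed by Entrez ID."""
    ontologies = sorted(mappings)

    # entrez_id -> ontology -> identifiers seen for that gene
    by_gene: Dict[str, Dict[str, List[str]]] = {}
    for ontology, pairs in mappings.items():
        for entrez_id, identifier in pairs:
            by_gene.setdefault(entrez_id, {}).setdefault(ontology, []).append(identifier)

    rows: List[Dict[str, Optional[str]]] = []
    for entrez_id in sorted(by_gene):
        # An ontology with no entry for this gene contributes a single None.
        per_ontology = [by_gene[entrez_id].get(o, [None]) for o in ontologies]
        for combo in product(*per_ontology):
            row: Dict[str, Optional[str]] = {ENTREZ_COLUMN: entrez_id}
            row.update(zip(ontologies, combo))
            rows.append(row)
    return rows


example = {
    "symbol": [("7157", "TP53"), ("672", "BRCA1")],
    "uniprot": [("7157", "P04637")],
}
wide = merge_wide(example)
```

Each resulting row carries one Entrez ID plus one column per ontology, which is the layout the description above refers to as wide format.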
- stack_mappings(ontologies: Set[str] | None = None) → None
Stack mappings into a single long table.
Convert a dict of mappings between Entrez identifiers and other identifiers into a single long-format table.
- Parameters:
ontologies (Optional[Set[str]], optional) – Set of ontologies to include in stacked table, by default None. If None, uses all available ontologies
- Returns:
Updates self.stacked_mappings in place
- Return type:
None
- Raises:
ValueError – If mappings don’t exist or requested ontologies are invalid
TypeError – If any identifiers are not strings
ValueError – If any mapping tables contain NA values
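The long-format layout can be sketched the same way. Again this is a simplified stand-in and not the library's implementation: tables are lists of `(entrez_id, identifier)` pairs in place of DataFrames, and the function name `stack_long` and the column names `"ncbi_entrez_gene"`, `"ontology"`, and `"identifier"` are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]  # (entrez_id, identifier) pairs


def stack_long(mappings: Dict[str, Pairs]) -> List[Dict[str, str]]:
    """Flatten per-ontology tables into one row per (gene, ontology, identifier)."""
    rows: List[Dict[str, str]] = []
    for ontology in sorted(mappings):
        for entrez_id, identifier in mappings[ontology]:
            rows.append(
                {
                    "ncbi_entrez_gene": entrez_id,  # assumed column name
                    "ontology": ontology,
                    "identifier": identifier,
                }
            )
    return rows


example = {
    "symbol": [("7157", "TP53"), ("672", "BRCA1")],
    "uniprot": [("7157", "P04637")],
}
long_rows = stack_long(example)
```

Unlike the wide layout, adding another ontology here adds rows and leaves the set of columns unchanged.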