napistu.ingestion.omnipath

Functions

format_omnipath_as_sbml_dfs(...)

Format OmniPath interaction data as an SBML_dfs object.

get_interactions([dataset, organismal_species])

Retrieve interaction data from OmniPath with corrected evidence processing.

napistu.ingestion.omnipath._extract_omnipath_identifiers(identifier_series: Series, entity_name: str, ontology_aliases: Dict[str, List[str]] = {'corum': {'CORUM'}, 'signor': {'SIGNOR'}}) → DataFrame

Extract and standardize OmniPath identifiers from a series of annotation strings.

Parameters:
  • identifier_series (pd.Series) – Series containing OmniPath annotation strings.

  • entity_name (str) – Name of the entity column in the output DataFrame.

  • ontology_aliases (Dict[str, List[str]], optional) – Mapping from OmniPath ontology names to Napistu ontology names, by default OMNIPATH_ONTOLOGY_ALIASES.

Returns:

DataFrame with entity_name as index and s_identifiers as column containing Identifiers objects for each entity.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._fix_consensus_logic(df: DataFrame) → DataFrame

Fix the consensus logic bug for interactions with no evidence.

When strict_evidences=True, interactions with no stimulation or inhibition evidence incorrectly get consensus_stimulation=True and consensus_inhibition=True due to the 0>=0 and 0<=0 comparisons both being True.

Parameters:

df (pd.DataFrame) – DataFrame containing OmniPath interaction data with consensus flags.

Returns:

DataFrame with corrected consensus logic flags.

Return type:

pd.DataFrame

Notes

This function identifies interactions where:

  • No stimulation evidence exists (is_stimulation=False)

  • No inhibition evidence exists (is_inhibition=False)

  • Both consensus flags are True (consensus_stimulation=True, consensus_inhibition=True)

For such interactions, both consensus flags are set to False.
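
The rule described in these notes can be sketched in plain Python. This is an illustration over row dictionaries rather than the DataFrame the function actually receives; the helper name `fix_consensus_flags` is hypothetical, and only the column names are taken from this page.

```python
from typing import Dict, List

Row = Dict[str, bool]


def fix_consensus_flags(rows: List[Row]) -> List[Row]:
    """Return copies of rows with consensus flags cleared where no sign evidence exists.

    Hypothetical helper: a sketch of the rule, not the Napistu implementation.
    """
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        no_evidence = not row["is_stimulation"] and not row["is_inhibition"]
        both_consensus = row["consensus_stimulation"] and row["consensus_inhibition"]
        if no_evidence and both_consensus:
            row["consensus_stimulation"] = False
            row["consensus_inhibition"] = False
        fixed.append(row)
    return fixed


rows = [
    # No evidence at all: 0 >= 0 and 0 <= 0 made both consensus flags True.
    {"is_stimulation": False, "is_inhibition": False,
     "consensus_stimulation": True, "consensus_inhibition": True},
    # Tied evidence: both kinds of evidence exist, so both flags stay True.
    {"is_stimulation": True, "is_inhibition": True,
     "consensus_stimulation": True, "consensus_inhibition": True},
]
fixed_rows = fix_consensus_flags(rows)
```

Only the first row is changed; the tied row keeps both consensus flags because stimulation and inhibition evidence are both present.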

napistu.ingestion.omnipath._format_complex_interactions(complex_formation_edgelist: DataFrame, nondegenerate_species_df: DataFrame) → DataFrame

Format complex formation interactions into a standardized edgelist format.

Parameters:
  • complex_formation_edgelist (pd.DataFrame) – DataFrame containing complex formation reactions.

  • nondegenerate_species_df (pd.DataFrame) – DataFrame containing species data with unique s_names.

Returns:

DataFrame containing formatted complex formation reactions in edgelist format.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._format_edgelist_interactions(interactions: DataFrame, nondegenerate_species_df: DataFrame) → DataFrame

Format OmniPath interactions into a standardized edgelist format.

Parameters:
  • interactions (pd.DataFrame) – DataFrame containing OmniPath interaction data.

  • nondegenerate_species_df (pd.DataFrame) – DataFrame containing species data with unique s_names.

Returns:

DataFrame containing formatted interactions in edgelist format with columns for upstream/downstream names, SBO terms, stoichiometry, and metadata.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._format_reaction_references(reference_series: Series) → DataFrame

Format reaction references as Identifiers objects.

Parameters:

reference_series (pd.Series) – Series containing reference strings for reactions.

Returns:

DataFrame with reference strings as index and r_Identifiers as column containing Identifiers objects for each reference.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._get_omnipath_fxn_map()

Get a map of dataset names to interaction classes.

napistu.ingestion.omnipath._load_omnipath_attribute_mapper() → DataFrame

Create a mapping table for OmniPath interaction attributes to SBO terms.

Assigns SBO terms to OmniPath interactions based on their consensus reversibility, stimulation, and inhibition attributes, then expands the mapping by reversibility into forward and reverse directions.

Returns:

DataFrame containing all valid combinations of OmniPath attributes mapped to SBO terms, with columns for consensus flags, SBO terms, and reversibility.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._name_interactions(interactions: DataFrame, nondegenerate_species_df: DataFrame) → DataFrame

Add systematic names to interactions by merging with species data.

Parameters:
  • interactions (pd.DataFrame) – DataFrame containing interactions with source and target interactor IDs.

  • nondegenerate_species_df (pd.DataFrame) – DataFrame containing species data with interactor IDs and s_names.

Returns:

DataFrame containing interactions with upstream_name and downstream_name columns added by merging with species data.

Return type:

pd.DataFrame

napistu.ingestion.omnipath._parse_omnipath_annotation(annotation_str: str) → DataFrame

Convert a semicolon-separated annotation string to a pandas DataFrame.

Parses strings in the format ‘annotation;annotation;…’ into a DataFrame with columns for annotation and the original annotation string.

Parameters:

annotation_str (str) – String containing annotations separated by semicolons.

Returns:

DataFrame with columns ‘annotation’ and ‘annotation_str’.

Return type:

pd.DataFrame

Examples

>>> s = '11290752;11983166;12601176'
>>> df = _parse_omnipath_annotation(s)
>>> print(df)
   annotation              annotation_str
0    11290752  11290752;11983166;12601176
1    11983166  11290752;11983166;12601176
2    12601176  11290752;11983166;12601176

napistu.ingestion.omnipath._parse_omnipath_named_annotation(annotation_str: str) → DataFrame

Convert a semicolon-separated named annotation string to a pandas DataFrame.

Parses strings in the format ‘name:annotation;name:annotation;…’ into a DataFrame with columns for name, annotation, and the original annotation string.

Parameters:

annotation_str (str) – String containing named annotations separated by semicolons, with each annotation in the format ‘name:annotation’.

Returns:

DataFrame with columns ‘name’, ‘annotation’, and ‘annotation_str’.

Return type:

pd.DataFrame

Examples

>>> s = 'CORUM:4478;Compleat:HC1449;PDB:4awl'
>>> df = _parse_omnipath_named_annotation(s)
>>> print(df)
       name  annotation                       annotation_str
0     CORUM        4478  CORUM:4478;Compleat:HC1449;PDB:4awl
1  Compleat      HC1449  CORUM:4478;Compleat:HC1449;PDB:4awl
2       PDB        4awl  CORUM:4478;Compleat:HC1449;PDB:4awl
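
The parsing step itself can be approximated without pandas. The sketch below is not the Napistu implementation: the function name `parse_named_annotation` is hypothetical, it returns a list of dictionaries instead of a DataFrame, and splitting each entry on its first colon only is an assumption.

```python
from typing import Dict, List


def parse_named_annotation(annotation_str: str) -> List[Dict[str, str]]:
    """Split 'name:annotation;name:annotation' into one record per entry.

    Hypothetical helper mirroring the documented output columns.
    """
    records = []
    for entry in annotation_str.split(";"):
        # Assumption: split on the first colon only, so an annotation that
        # itself contains ':' is kept whole.
        name, _, annotation = entry.partition(":")
        records.append(
            {"name": name, "annotation": annotation, "annotation_str": annotation_str}
        )
    return records


records = parse_named_annotation("CORUM:4478;Compleat:HC1449;PDB:4awl")
for record in records:
    print(record["name"], record["annotation"])
```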

napistu.ingestion.omnipath._patch_degenerate_s_names(df: DataFrame, id_col: str = 'interactor_id', name_col: str = 's_name') → DataFrame

Patch degenerate s_names by appending interactor IDs to non-unique names.

Parameters:
  • df (pd.DataFrame) – DataFrame containing species data.

  • id_col (str, optional) – Column name containing interactor IDs, by default OMNIPATH_INTERACTIONS.INTERACTOR_ID.

  • name_col (str, optional) – Column name containing species names, by default SBML_DFS.S_NAME.

Returns:

DataFrame with patched s_names where non-unique names have been made unique by appending the interactor ID.

Return type:

pd.DataFrame

Notes

This function ensures that all s_names are unique by appending the interactor ID to any name that appears multiple times in the dataset.
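
As a rough illustration of this de-duplication, the following plain-Python sketch appends the interactor ID to every name that occurs more than once. The helper name, the record-list input, the "name (id)" output format, and the sample IDs are all assumptions made for the example; the actual function works on a DataFrame and may format patched names differently.

```python
from collections import Counter
from typing import Dict, List


def patch_degenerate_names(
    records: List[Dict[str, str]],
    id_col: str = "interactor_id",
    name_col: str = "s_name",
) -> List[Dict[str, str]]:
    """Make names unique by appending the ID to any name seen more than once."""
    counts = Counter(record[name_col] for record in records)
    patched = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if counts[record[name_col]] > 1:
            # Assumed format for the patched name: "<name> (<id>)".
            record[name_col] = f"{record[name_col]} ({record[id_col]})"
        patched.append(record)
    return patched


# Made-up records for illustration only.
species = [
    {"interactor_id": "ID_1", "s_name": "geneA"},
    {"interactor_id": "ID_2", "s_name": "geneA"},
    {"interactor_id": "ID_3", "s_name": "geneB"},
]
patched_species = patch_degenerate_names(species)
```

Names that were already unique, such as `geneB` here, are left unchanged.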

napistu.ingestion.omnipath._prepare_integer_based_ids(interactor_int_ids: List[str]) → DataFrame

Prepare PubChem identifiers for integer-based interactor IDs.

Parameters:

interactor_int_ids (List[str]) – List of integer-based interactor IDs to map to PubChem.

Returns:

DataFrame containing PubChem species data with columns:

  • interactor_id: Original interactor ID

  • s_name: PubChem compound name

  • s_Identifiers: Identifiers object with PubChem and SMILES identifiers

Return type:

pd.DataFrame

napistu.ingestion.omnipath._prepare_omnipath_ids_complexes(interactor_string_ids: List[str]) → Tuple[DataFrame, DataFrame]

Prepare complex identifiers and formation reactions for string-based interactor IDs.

Parameters:

interactor_string_ids (List[str]) – List of string-based interactor IDs to check for complexes.

Returns:

  • complex_species (pd.DataFrame) – DataFrame containing complex species data with columns:

      - interactor_id: Complex identifier
      - s_name: Complex name
      - s_Identifiers: Identifiers object with complex identifiers

  • complex_formation_edgelist (pd.DataFrame) – DataFrame containing complex formation reactions with columns:

      - source: Component interactor ID
      - target: Complex interactor ID
      - upstream_stoichiometry: Stoichiometry of component
      - downstream_stoichiometry: Stoichiometry of complex (typically 1)

napistu.ingestion.omnipath._prepare_omnipath_ids_mirbase(interactor_string_ids: List[str]) → DataFrame

Prepare miRBase identifiers for string-based interactor IDs.

Parameters:

interactor_string_ids (List[str]) – List of string-based interactor IDs to map to miRBase.

Returns:

DataFrame containing miRBase species data with columns:

  • interactor_id: Original interactor ID

  • s_name: miRNA name

  • s_Identifiers: Identifiers object with miRBase identifiers

Return type:

pd.DataFrame

napistu.ingestion.omnipath._prepare_omnipath_ids_uniprot(interactor_string_ids: List[str], organismal_species: str | OrganismalSpeciesValidator, preferred_method: str = 'bioconductor', allow_fallback: bool = True) → DataFrame

Prepare UniProt identifiers for string-based interactor IDs.

Parameters:
  • interactor_string_ids (List[str]) – List of string-based interactor IDs to cross-reference with UniProt.

  • organismal_species (Union[str, OrganismalSpeciesValidator]) – The species for which to retrieve UniProt annotations.

  • preferred_method (str, optional) – Preferred method for identifier mapping, by default GENODEXITO_DEFS.BIOCONDUCTOR.

  • allow_fallback (bool, optional) – Whether to allow fallback to alternative mapping methods, by default True.

Returns:

DataFrame containing UniProt species data with columns:

  • interactor_id: Original interactor ID

  • s_name: Gene name

  • s_Identifiers: Identifiers object with UniProt identifiers

Return type:

pd.DataFrame

Raises:

ValueError – If organismal_species is invalid or not supported.

napistu.ingestion.omnipath.format_omnipath_as_sbml_dfs(organismal_species: str | OrganismalSpeciesValidator, preferred_method: str, allow_fallback: bool, **kwargs: Any) → SBML_dfs

Format OmniPath interaction data as an SBML_dfs object.

This function processes OmniPath interaction data and converts it into a structured SBML_dfs format suitable for network analysis and modeling. It handles interactions involving several types of molecular entities, including proteins, small molecules, miRNAs, and complexes.

Parameters:
  • organismal_species (str | OrganismalSpeciesValidator) – The species name (e.g., “human”, “mouse”, “rat”) for which to retrieve interactions.

  • preferred_method (str) – Preferred method for identifier mapping (e.g., “bioconductor”, “ensembl”).

  • allow_fallback (bool) – Whether to allow fallback to alternative mapping methods if preferred method fails.

  • **kwargs (Any) – Additional keyword arguments passed to get_interactions().

Returns:

sbml_dfs – SBML_dfs object containing the formatted interaction data with species and reactions.

Return type:

sbml_dfs_core.SBML_dfs

Raises:
  • ValueError – If organismal_species is not supported by OmniPath, or if duplicated s_names are found after processing.

  • ConnectionError – If unable to connect to OmniPath or external databases.

Notes

This function performs the following steps:

  1. Retrieves interactions from OmniPath

  2. Maps interactor IDs to systematic identifiers using multiple databases:

      • PubChem

      • UniProt

      • miRBase

      • Complexes

  3. Aggregates molecular species across all sources

  4. Creates complex formation reactions where applicable

  5. Formats all interactions into a standardized edgelist

  6. Creates an SBML_dfs object with proper structure
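
The identifier-mapping step relies on separating integer-based interactor IDs (handled by the PubChem helper) from string-based ones (handled by the UniProt, miRBase, and complex helpers). A minimal sketch of such a split is shown below. It assumes that "integer-based" means the ID string consists solely of digits; the function name `split_interactor_ids` is hypothetical and the sample IDs are only illustrative.

```python
from typing import Iterable, List, Tuple


def split_interactor_ids(interactor_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition IDs into integer-like and string-like groups.

    Assumption: an ID is 'integer-based' when it contains only digits.
    """
    int_ids: List[str] = []
    string_ids: List[str] = []
    for interactor_id in interactor_ids:
        if interactor_id.isdigit():
            int_ids.append(interactor_id)
        else:
            string_ids.append(interactor_id)
    return int_ids, string_ids


# Illustrative IDs: two digit-only IDs and two accession-style IDs.
int_ids, string_ids = split_interactor_ids(["2244", "P04637", "5957", "Q00987"])
```

Under this assumption the digit-only IDs would go to `_prepare_integer_based_ids` and the remainder to the string-based helpers.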

Examples

>>> # Format human interactions using bioconductor mapping
>>> sbml_dfs = format_omnipath_as_sbml_dfs(
...     organismal_species="human",
...     preferred_method="bioconductor",
...     allow_fallback=True
... )
>>> print(f"Species: {len(sbml_dfs.species)}")
>>> print(f"Reactions: {len(sbml_dfs.reactions)}")

napistu.ingestion.omnipath.get_interactions(dataset: str | object = 'all', organismal_species: str | OrganismalSpeciesValidator = 'human', **kwargs) → DataFrame

Retrieve interaction data from OmniPath with corrected evidence processing.

This function wraps the underlying OmniPath interaction classes and applies strict evidence filtering with fixes for known consensus logic bugs.

Parameters:
  • dataset (str or interaction class, default "all") – Which interaction dataset to retrieve. Options:

      - “all”: AllInteractions (all datasets)
      - “omnipath”: OmniPath (literature-supported only)
      - “dorothea”: Dorothea (TF-target from DoRothEA)
      - “collectri”: CollecTRI (TF-target from CollecTRI)
      - “tf_target”: TFtarget (TF-target interactions)
      - “transcriptional”: Transcriptional (all TF-target)
      - “post_translational”: PostTranslational (protein-protein)
      - “pathway_extra”: PathwayExtra (activity flow, no literature)
      - “kinase_extra”: KinaseExtra (enzyme-substrate, no literature)
      - “ligrec_extra”: LigRecExtra (ligand-receptor, no literature)
      - “tf_mirna”: TFmiRNA (TF-miRNA interactions)
      - “mirna”: miRNA (miRNA-target interactions)
      - “lncrna_mrna”: lncRNAmRNA (lncRNA-mRNA interactions)

    Alternatively, pass an interaction class directly.

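  • organismal_species (str | OrganismalSpeciesValidator, default "human") – The species for which to retrieve interactions.

  • **kwargs – Additional parameters passed to the underlying interaction class.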

Returns:

Interaction data with columns including:

  • source, target: Interacting proteins

  • is_directed, is_stimulation, is_inhibition: Evidence presence flags

  • consensus_direction, consensus_stimulation, consensus_inhibition: Consensus flags

  • curation_effort: Evidence quality score

  • sources, references: Supporting data

Return type:

pd.DataFrame

Notes

Evidence Processing:

This function uses strict_evidences=True, which recomputes all evidence-derived attributes from the raw evidence data rather than using server pre-computed values. This ensures transparency about which evidence supports each interaction property.

How Evidence Flags Are Calculated:

The is_* flags indicate the presence of evidence of each type:

    is_directed = bool(any evidence in "directed" category)
    is_stimulation = bool(any evidence in "positive" category)
    is_inhibition = bool(any evidence in "negative" category)

These are simple “any evidence exists” boolean flags.

How Consensus Flags Are Calculated:

Consensus flags compare weighted evidence (curation effort) between categories:

    curation_effort = sum(len(evidence.references) + 1 for evidence in category)
    consensus_stimulation = curation_effort_positive >= curation_effort_negative
    consensus_inhibition = curation_effort_positive <= curation_effort_negative
    consensus_direction = curation_effort_directed >= curation_effort_undirected

Important: When evidence is tied (equal curation effort), both consensus flags can be True. When no evidence exists, both would incorrectly be True due to 0 >= 0 and 0 <= 0, but this function fixes that edge case.
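
The comparison described above can be made concrete with a small runnable sketch. It is a simplified model, not the OmniPath client's code: each evidence item is represented only by its list of reference IDs, and the function names are hypothetical.

```python
from typing import Dict, List

# Simplifying assumption: an evidence item is just its list of reference IDs.
Evidence = List[str]


def curation_effort(evidences: List[Evidence]) -> int:
    """Each evidence contributes its number of references plus one."""
    return sum(len(references) + 1 for references in evidences)


def consensus_flags(positive: List[Evidence], negative: List[Evidence]) -> Dict[str, bool]:
    """Compare curation effort between the positive and negative categories."""
    effort_positive = curation_effort(positive)
    effort_negative = curation_effort(negative)
    return {
        "consensus_stimulation": effort_positive >= effort_negative,
        "consensus_inhibition": effort_positive <= effort_negative,
    }


# Positive effort (2 + 1) + (0 + 1) = 4 versus negative effort (1 + 1) = 2.
clear = consensus_flags([["ref1", "ref2"], []], [["ref3"]])
# Equal effort on both sides (2 versus 2): both flags are True.
tied = consensus_flags([["ref1"]], [["ref2"]])
# No evidence at all (0 versus 0): both flags are also True before any fix.
empty = consensus_flags([], [])
```

The `empty` case is indistinguishable from the `tied` case when only the consensus flags are inspected, which is why the is_stimulation and is_inhibition flags are needed to tell them apart.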

Consensus Logic Bug Fix:

The original OmniPath logic has a bug where interactions with no stimulation or inhibition evidence get consensus_stimulation=True and consensus_inhibition=True because both 0 >= 0 and 0 <= 0 evaluate to True. This function fixes such cases by setting both consensus flags to False when no evidence exists.

Evidence Categories Explained:

  • positive: Evidence supporting stimulation/activation

  • negative: Evidence supporting inhibition/repression

  • directed: Evidence that the interaction has a specific direction

  • undirected: Evidence that interaction exists but direction is unclear

Interpreting Results:

Common patterns and their meanings:

  • is_stimulation=True, consensus_stimulation=True: Strong positive evidence

  • is_stimulation=True, is_inhibition=True: Conflicting evidence exists

  • consensus_stimulation=True, consensus_inhibition=True: Tied evidence

  • is_stimulation=False, is_inhibition=False: No directional evidence
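
These patterns overlap, so turning them into a single label requires choosing an order of precedence. The sketch below shows one possible ordering; the function name, the precedence, and the symmetric "strong negative evidence" label are assumptions for illustration and are not defined by Napistu.

```python
def describe_sign_evidence(
    is_stimulation: bool,
    is_inhibition: bool,
    consensus_stimulation: bool,
    consensus_inhibition: bool,
) -> str:
    """Map the four sign-related flags to one descriptive label."""
    if not is_stimulation and not is_inhibition:
        return "no directional evidence"
    if consensus_stimulation and consensus_inhibition:
        return "tied evidence"
    if is_stimulation and is_inhibition:
        return "conflicting evidence"
    if is_stimulation and consensus_stimulation:
        return "strong positive evidence"
    if is_inhibition and consensus_inhibition:
        # Assumed mirror image of the positive case; not listed on this page.
        return "strong negative evidence"
    return "unclassified"


label = describe_sign_evidence(
    is_stimulation=True,
    is_inhibition=True,
    consensus_stimulation=True,
    consensus_inhibition=False,
)
print(label)
```

Checking for tied evidence before conflicting evidence means a tie is reported as the more specific of the two overlapping patterns.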

Why Use Strict Evidence Mode:

  • Transparency: Know exactly which evidence supports each attribute

  • Filtering: Can restrict to specific datasets/resources in query

  • Consistency: All attributes computed from same evidence base

  • Reproducibility: Results don’t depend on server-side integration

Examples

>>> # Get all interactions with corrected evidence processing
>>> df = get_interactions("all")
>>>
>>> # Get only DoRothEA TF-target interactions
>>> tf_interactions = get_interactions("dorothea")
>>>
>>> # Filter to specific resources
>>> filtered = get_interactions("all", resources=["IntAct", "BioGRID"])
>>>
>>> # Check for conflicting evidence
>>> conflicted = df[(df.is_stimulation) & (df.is_inhibition)]
>>> print(f"Found {len(conflicted)} interactions with conflicting evidence")
>>>
>>> # Look at evidence quality
>>> high_quality = df[df.curation_effort >= 10]