napistu.matching.species

Functions

features_to_pathway_species(...[, ...])

Features to Pathway Species

match_by_ontology_and_identifier(...[, ...])

Match features to pathway species based on both ontology and identifier matches.

match_features_to_wide_pathway_species(...)

Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.

napistu.matching.species._ensure_feature_id_var(df: DataFrame, feature_id_var: str = 'feature_id') → DataFrame

Ensure the DataFrame has a feature_id column, creating one if it doesn’t exist.

Parameters:
  • df (pd.DataFrame) – DataFrame to check/modify

  • feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name of the feature ID column

Returns:

DataFrame with guaranteed feature_id column

Return type:

pd.DataFrame
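The "create one if it doesn't exist" behavior can be sketched with plain pandas. This is a minimal illustration, not napistu's implementation: the helper name ensure_feature_id is hypothetical, and the choice of row positions as IDs is an assumption, since the documentation above does not say how IDs are generated.

```python
import pandas as pd


def ensure_feature_id(df: pd.DataFrame, feature_id_var: str = "feature_id") -> pd.DataFrame:
    """Return df with a feature ID column, adding one only if it is absent."""
    if feature_id_var in df.columns:
        # Column already present: leave the caller's IDs untouched
        return df
    out = df.copy()
    # Assumption: use row positions 0..n-1 as feature IDs
    out[feature_id_var] = range(len(out))
    return out


features = pd.DataFrame({"identifier": ["P12345", "Q67890"]})
with_ids = ensure_feature_id(features)
```

Copying before adding the column keeps the input DataFrame unchanged, which makes the helper safe to call on data the caller still needs.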

napistu.matching.species._log_feature_species_mapping_stats(pathway_species: DataFrame, feature_id_var: str = 'feature_id')

Log statistics about the mapping between feature_id and s_id in the pathway_species DataFrame.
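The documentation does not list which statistics are logged. As a rough sketch of what a feature_id to s_id mapping summary could contain, the following counts one-to-many relationships in both directions. The function name mapping_stats and the specific statistics are assumptions for illustration only.

```python
import pandas as pd


def mapping_stats(pathway_species: pd.DataFrame, feature_id_var: str = "feature_id") -> dict:
    """Summarize how features map to species (s_id) and vice versa."""
    # Distinct species matched by each feature
    per_feature = pathway_species.groupby(feature_id_var)["s_id"].nunique()
    # Distinct features matched by each species
    per_species = pathway_species.groupby("s_id")[feature_id_var].nunique()
    return {
        "n_features": int(per_feature.size),
        "n_species": int(per_species.size),
        "features_with_multiple_species": int((per_feature > 1).sum()),
        "species_with_multiple_features": int((per_species > 1).sum()),
    }


matches = pd.DataFrame({
    "feature_id": [0, 0, 1, 2],
    "s_id": ["S1", "S2", "S3", "S3"],
})
stats = mapping_stats(matches)
```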

napistu.matching.species._validate_wide_ontologies(wide_df: DataFrame, ontologies: str | Set[str] | Dict[str, str] | None = None) → Set[str]

Validate ontology specifications against the wide DataFrame and ONTOLOGIES_LIST.

Parameters:
  • wide_df (pd.DataFrame) – DataFrame with one column per ontology and a results column

  • ontologies (Optional[Union[str, Set[str], Dict[str, str]]]) – One of:
      - A string specifying a single ontology column
      - A set of columns to treat as ontologies
      - A dict mapping wide column names to ontology names
      - None, to automatically detect ontology columns based on ONTOLOGIES_LIST

Returns:

Set of validated ontology names. For dictionary mappings, returns the target ontology names.

Return type:

Set[str]

Raises:

ValueError – If validation fails for any ontology specification or no valid ontologies are found
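The four accepted forms of ontologies can be normalized to a set of ontology names roughly as follows. This is a sketch under stated assumptions: KNOWN_ONTOLOGIES is a small hypothetical stand-in for ONTOLOGIES_LIST, resolve_ontologies is a hypothetical name, and the exact checks and error messages in napistu may differ.

```python
from typing import Dict, Optional, Set, Union

import pandas as pd

# Hypothetical stand-in for napistu's ONTOLOGIES_LIST controlled vocabulary
KNOWN_ONTOLOGIES = {"uniprot", "chebi"}


def resolve_ontologies(
    wide_df: pd.DataFrame,
    ontologies: Optional[Union[str, Set[str], Dict[str, str]]] = None,
) -> Set[str]:
    """Normalize an ontology specification to a validated set of ontology names."""
    if ontologies is None:
        # Auto-detect: any column whose name is a known ontology
        found = set(wide_df.columns) & KNOWN_ONTOLOGIES
        if not found:
            raise ValueError("No ontology columns found in wide_df")
        return found
    if isinstance(ontologies, str):
        ontologies = {ontologies}
    if isinstance(ontologies, dict):
        # Keys are column names in wide_df, values are target ontology names
        columns, targets = set(ontologies.keys()), set(ontologies.values())
    else:
        columns = targets = set(ontologies)
    missing = columns - set(wide_df.columns)
    if missing:
        raise ValueError(f"Columns not found in wide_df: {sorted(missing)}")
    unknown = targets - KNOWN_ONTOLOGIES
    if unknown:
        raise ValueError(f"Not recognized as ontologies: {sorted(unknown)}")
    return targets


wide_df = pd.DataFrame({"uniprot": ["P12345"], "chebi": ["15377"], "log2fc": [1.0]})
custom_df = pd.DataFrame({"protein_id": ["P12345"], "log2fc": [1.0]})

auto = resolve_ontologies(wide_df)
mapped = resolve_ontologies(custom_df, {"protein_id": "uniprot"})
```

For a dict specification the sketch returns the target ontology names rather than the column names, matching the Returns description above.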

napistu.matching.species.features_to_pathway_species(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: set, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', expand_identifiers: bool = False, identifier_delimiter: str = '/', verbose: bool = False) → DataFrame

Features to Pathway Species

Match a table of molecular species to their corresponding species in a pathway representation.

Parameters:
  • feature_identifiers (pd.DataFrame) – DataFrame containing a feature_identifiers_var column used to match entries

  • species_identifiers (pd.DataFrame) – Table of molecular species identifiers as produced by sbml_dfs.get_identifiers("species"), generally obtained using sbml_dfs.export_sbml_dfs()

  • ontologies (set) – Set of ontologies used to match features to pathway species

  • feature_identifiers_var (str, default="identifier") – Column in feature_identifiers containing identifiers

  • feature_id_var (str, default="feature_id") – Name of the feature ID column

  • expand_identifiers (bool, default=False) – If True, split identifiers in feature_identifiers_var by identifier_delimiter and explode into multiple rows

  • identifier_delimiter (str, default="/") – Delimiter used to split identifiers when expand_identifiers is True

  • verbose (bool, default=False) – If True, log mapping statistics at the end of the function

Returns:

species_identifiers joined to feature_identifiers based on shared identifiers

Return type:

pd.DataFrame
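The effect of expand_identifiers and the identifier-based join can be illustrated with plain pandas. This is a sketch only, not the library's code: the inner join, the s_id column in the toy species table, and the identifier "O00001" are assumptions or placeholders chosen for the example.

```python
import pandas as pd

features = pd.DataFrame({
    "feature_id": [0, 1],
    # The first feature carries two identifiers separated by the default "/" delimiter
    "identifier": ["P12345/Q67890", "O00001"],
})
species = pd.DataFrame({
    "ontology": ["uniprot", "uniprot", "chebi"],
    "identifier": ["P12345", "Q67890", "15377"],
    "s_id": ["S1", "S2", "S3"],
})

# expand_identifiers=True: split on the delimiter, then one row per identifier
expanded = features.assign(
    identifier=features["identifier"].str.split("/")
).explode("identifier")

# Restrict the pathway table to the requested ontologies before joining
candidates = species[species["ontology"].isin({"uniprot"})]

# Assumption: an inner join, so unmatched features are dropped
matched = expanded.merge(candidates, on="identifier", how="inner")
```

After expansion, feature 0 matches two pathway species, which is the kind of one-to-many mapping that the verbose statistics would report.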

napistu.matching.species.match_by_ontology_and_identifier(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: str | Set[str] | List[str], feature_identifiers_var: str = 'identifier', verbose: bool = False) → DataFrame

Match features to pathway species based on both ontology and identifier matches. Performs separate matching for each ontology and concatenates the results.

Parameters:
  • feature_identifiers (pd.DataFrame) – DataFrame containing feature identifiers and results. Must have columns [ontology, feature_identifiers_var, results]

  • species_identifiers (pd.DataFrame) – DataFrame containing species identifiers from pathway. Must have columns [ontology, identifier]

  • ontologies (Union[str, Set[str], List[str]]) – Ontologies to match on. Can be:
      - A single ontology string
      - A set of ontology strings
      - A list of ontology strings

  • feature_identifiers_var (str, default="identifier") – Name of the identifier column in feature_identifiers

  • verbose (bool, default=False) – Whether to print verbose output

Returns:

Concatenated results of matching for each ontology. Contains all columns from features_to_pathway_species()

Return type:

pd.DataFrame
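The "separate matching for each ontology, then concatenate" strategy can be sketched as follows. The toy tables, the s_id column, and the inner join are assumptions for illustration, and the real function delegates to features_to_pathway_species rather than merging directly.

```python
import pandas as pd

features = pd.DataFrame({
    "ontology": ["uniprot", "chebi", "chebi"],
    # The last row reuses a UniProt-style identifier under the wrong ontology
    "identifier": ["P12345", "15377", "P12345"],
    "results": [1.0, 2.0, 3.0],
})
species = pd.DataFrame({
    "ontology": ["uniprot", "chebi"],
    "identifier": ["P12345", "15377"],
    "s_id": ["S1", "S2"],
})

per_ontology = []
for ontology in sorted({"uniprot", "chebi"}):
    # Match only within one ontology at a time
    f = features[features["ontology"] == ontology]
    s = species[species["ontology"] == ontology]
    per_ontology.append(f.merge(s, on=["ontology", "identifier"], how="inner"))

matched = pd.concat(per_ontology, ignore_index=True)
```

Requiring agreement on both ontology and identifier prevents an identifier string from matching a species in a different ontology, so the third feature row above yields no match.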

Examples

>>> # Match using a single ontology
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies="uniprot"
... )
>>> # Match using multiple ontologies
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies={"uniprot", "chebi"}
... )

napistu.matching.species.match_features_to_wide_pathway_species(wide_df: DataFrame, species_identifiers: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', verbose: bool = False) → DataFrame

Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.

Parameters:
  • wide_df (pd.DataFrame) – DataFrame with ontology identifier columns and any number of results columns. All non-ontology columns are treated as results.

  • species_identifiers (pd.DataFrame) – DataFrame as required by features_to_pathway_species

  • ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – One of:
      - A set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
      - A dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
      - None, to automatically detect valid ontology columns based on ONTOLOGIES_LIST

  • feature_identifiers_var (str, default="identifier") – Name for the identifier column in the long format

  • feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name for the feature id column in the long format

  • verbose (bool, default=False) – Whether to print verbose output

Returns:

Output of match_by_ontology_and_identifier

Return type:

pd.DataFrame

Examples

>>> # Example with auto-detected ontology columns and multiple results
>>> wide_df = pd.DataFrame({
...     'uniprot': ['P12345', 'Q67890'],
...     'chebi': ['15377', '16810'],
...     'log2fc': [1.0, 2.0],
...     'pvalue': [0.01, 0.05]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers
... )
>>> # Example with custom ontology mapping
>>> wide_df = pd.DataFrame({
...     'protein_id': ['P12345', 'Q67890'],
...     'compound_id': ['15377', '16810'],
...     'expression': [1.0, 2.0],
...     'confidence': [0.8, 0.9]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers,
...     ontologies={'protein_id': 'uniprot', 'compound_id': 'chebi'}
... )
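The wide-to-long conversion that precedes matching can be approximated with DataFrame.melt. This is a sketch under assumptions: the row-position feature IDs and the dropping of missing identifiers are illustrative choices, not documented behavior of the function.

```python
import pandas as pd

wide_df = pd.DataFrame({
    "protein_id": ["P12345", "Q67890"],
    "compound_id": ["15377", "16810"],
    "expression": [1.0, 2.0],
})
column_to_ontology = {"protein_id": "uniprot", "compound_id": "chebi"}

# Rename the custom columns to their ontology names
wide = wide_df.rename(columns=column_to_ontology)
# Assumption: tag each wide row with a feature ID so long rows can be traced back
wide["feature_id"] = range(len(wide))

# One row per (feature, ontology) pair; results columns are carried along as id_vars
long_df = wide.melt(
    id_vars=["feature_id", "expression"],
    value_vars=["uniprot", "chebi"],
    var_name="ontology",
    value_name="identifier",
).dropna(subset=["identifier"])
```

The resulting long table has the ontology and identifier columns that match_by_ontology_and_identifier expects, with each results column repeated once per ontology.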