napistu.matching.species

Functions

`features_to_pathway_species`(...[, ...])	Match a table of molecular species to their corresponding species in a pathway representation.
`match_by_ontology_and_identifier`(...[, ...])	Match features to pathway species based on both ontology and identifier matches.
`match_features_to_wide_pathway_species`(...)	Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.

napistu.matching.species._ensure_feature_id_var(df: DataFrame, feature_id_var: str = 'feature_id') → DataFrame

Ensure the DataFrame has a feature_id column, creating one if it doesn’t exist.

Parameters:

df (pd.DataFrame) – DataFrame to check/modify
feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name of the feature ID column

Returns:

DataFrame with guaranteed feature_id column

Return type:

pd.DataFrame
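
A minimal sketch of the behaviour described above, written in plain pandas rather than taken from the napistu source. The helper name `ensure_feature_id` and the use of sequential integers as generated IDs are assumptions made for illustration; the library may generate IDs differently.

```python
import pandas as pd


def ensure_feature_id(df: pd.DataFrame, feature_id_var: str = "feature_id") -> pd.DataFrame:
    """Hypothetical stand-in: return df with a feature_id_var column."""
    if feature_id_var in df.columns:
        # Column already present: nothing to do.
        return df
    out = df.copy()  # avoid mutating the caller's DataFrame
    out[feature_id_var] = range(len(out))  # assumed ID scheme: 0, 1, 2, ...
    return out


features = pd.DataFrame({"identifier": ["P12345", "Q67890"]})
with_ids = ensure_feature_id(features)
```

Copying before adding the column leaves the input table unchanged, which is usually the safer choice when the same table is reused downstream.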

napistu.matching.species._log_feature_species_mapping_stats(pathway_species: DataFrame, feature_id_var: str = 'feature_id')

Log statistics about the mapping between feature_id and s_id in the pathway_species DataFrame.
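
The exact statistics this helper logs are not listed here. As a rough illustration only, the sketch below counts one-to-many and many-to-one relationships between feature_id and s_id with plain pandas; the table contents and the chosen statistics are assumptions.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Made-up matching result: feature 0 maps to two species, S2 is hit by two features.
pathway_species = pd.DataFrame({
    "feature_id": [0, 0, 1, 2],
    "s_id": ["S1", "S2", "S2", "S3"],
})

# Number of distinct species per feature, and distinct features per species.
species_per_feature = pathway_species.groupby("feature_id")["s_id"].nunique()
features_per_species = pathway_species.groupby("s_id")["feature_id"].nunique()

n_one_to_many = int((species_per_feature > 1).sum())
n_many_to_one = int((features_per_species > 1).sum())

logger.info(
    "%d features map to multiple species; %d species are matched by multiple features",
    n_one_to_many,
    n_many_to_one,
)
```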

napistu.matching.species._validate_wide_ontologies(wide_df: DataFrame, ontologies: str | Set[str] | Dict[str, str] | None = None) → Set[str]

Validate ontology specifications against the wide DataFrame and ONTOLOGIES_LIST.

Parameters:

wide_df (pd.DataFrame) – DataFrame with one column per ontology and a results column
ontologies (Optional[Union[str, Set[str], Dict[str, str]]]) – One of:
    - String specifying a single ontology column
    - Set of columns to treat as ontologies
    - Dict mapping wide column names to ontology names
    - None to automatically detect ontology columns based on ONTOLOGIES_LIST

Returns:

Set of validated ontology names. For dictionary mappings, returns the target ontology names.

Return type:

Set[str]

Raises:

ValueError – If validation fails for any ontology specification or no valid ontologies are found
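
The four accepted forms of `ontologies` can be sketched as follows. This is an illustration of the documented contract, not the napistu implementation: `resolve_ontologies` is a hypothetical name, and `KNOWN_ONTOLOGIES` is a small made-up stand-in for ONTOLOGIES_LIST.

```python
from typing import Dict, Optional, Set, Union

import pandas as pd

# Hypothetical stand-in for ONTOLOGIES_LIST; the real vocabulary lives in napistu.
KNOWN_ONTOLOGIES = {"uniprot", "chebi"}


def resolve_ontologies(
    wide_df: pd.DataFrame,
    ontologies: Optional[Union[str, Set[str], Dict[str, str]]] = None,
) -> Set[str]:
    """Return the validated ontology names for a wide table (illustrative only)."""
    if ontologies is None:
        # Auto-detect: any column whose name is a known ontology.
        found = set(wide_df.columns) & KNOWN_ONTOLOGIES
        if not found:
            raise ValueError("No ontology columns found in wide_df")
        return found
    if isinstance(ontologies, str):
        ontologies = {ontologies}
    if isinstance(ontologies, dict):
        # Keys are column names in wide_df, values are target ontology names.
        columns, targets = set(ontologies.keys()), set(ontologies.values())
    else:
        columns = targets = set(ontologies)
    missing = columns - set(wide_df.columns)
    if missing:
        raise ValueError(f"Columns not found in wide_df: {sorted(missing)}")
    invalid = targets - KNOWN_ONTOLOGIES
    if invalid:
        raise ValueError(f"Not in the controlled vocabulary: {sorted(invalid)}")
    return targets


wide_df = pd.DataFrame({
    "uniprot": ["P12345"],
    "compound_id": ["15377"],
    "log2fc": [1.0],
})
auto = resolve_ontologies(wide_df)
mapped = resolve_ontologies(wide_df, {"compound_id": "chebi"})
```

Note that the dictionary form returns the target names (`chebi`), not the original column names (`compound_id`), matching the Returns description above.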

napistu.matching.species.features_to_pathway_species(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: set, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', expand_identifiers: bool = False, identifier_delimiter: str = '/', verbose: bool = False) → DataFrame

Features to Pathway Species

Match a table of molecular species to their corresponding species in a pathway representation.

Parameters:

feature_identifiers (pd.DataFrame) – DataFrame containing a feature_identifiers_var column used to match entries
species_identifiers (pd.DataFrame) – A table of molecular species identifiers produced by sbml_dfs.get_identifiers("species"), generally using sbml_dfs.export_sbml_dfs()
ontologies (set) – A set of ontologies used to match features to pathway species
feature_identifiers_var (str, default="identifier") – Column in feature_identifiers containing identifiers
feature_id_var (str, default="feature_id") – Name of the feature ID column
expand_identifiers (bool, default=False) – If True, split identifiers in feature_identifiers_var by identifier_delimiter and explode into multiple rows
identifier_delimiter (str, default="/") – Delimiter to use for splitting identifiers if expand_identifiers is True
verbose (bool, default=False) – If True, log mapping statistics at the end of the function

Returns:

pathway_species: species_identifiers joined to feature_identifiers based on shared identifiers

Return type:

pd.DataFrame
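
To make the effect of expand_identifiers and identifier_delimiter concrete, the sketch below reproduces the split, explode, and join steps with plain pandas. It is not the library's code: the identifiers and s_id values are made up, and the inner join is an assumption about how unmatched features are handled.

```python
import pandas as pd

# Made-up inputs: feature 0 carries two "/"-delimited identifiers.
feature_identifiers = pd.DataFrame({
    "feature_id": [0, 1],
    "identifier": ["P12345/Q67890", "P99999"],
})
species_identifiers = pd.DataFrame({
    "ontology": ["uniprot", "uniprot"],
    "identifier": ["P12345", "Q67890"],
    "s_id": ["S00000001", "S00000002"],
})
ontologies = {"uniprot"}

# expand_identifiers=True: split on the delimiter, then one row per identifier.
expanded = feature_identifiers.assign(
    identifier=feature_identifiers["identifier"].str.split("/")
).explode("identifier")

# Restrict species to the requested ontologies and join on the shared identifier.
pathway_species = expanded.merge(
    species_identifiers[species_identifiers["ontology"].isin(ontologies)],
    on="identifier",
    how="inner",  # assumption: features without a match are dropped
)
```

Here feature 0 matches two pathway species after expansion, while feature 1 has no match and does not appear in the result.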

napistu.matching.species.match_by_ontology_and_identifier(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: str | Set[str] | List[str], feature_identifiers_var: str = 'identifier', verbose: bool = False) → DataFrame

Match features to pathway species based on both ontology and identifier matches. Performs separate matching for each ontology and concatenates the results.

Parameters:

feature_identifiers (pd.DataFrame) – DataFrame containing feature identifiers and results. Must have columns [ontology, feature_identifiers_var, results]
species_identifiers (pd.DataFrame) – DataFrame containing species identifiers from pathway. Must have columns [ontology, identifier]
ontologies (Union[str, Set[str], List[str]]) – Ontologies to match on. Can be:
    - A single ontology string
    - A set of ontology strings
    - A list of ontology strings
feature_identifiers_var (str, default="identifier") – Name of the identifier column in feature_identifiers
verbose (bool, default=False) – Whether to print verbose output

Returns:

Concatenated results of matching for each ontology. Contains all columns from features_to_pathway_species()

Return type:

pd.DataFrame

Examples

>>> # Match using a single ontology
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies="uniprot"
... )

>>> # Match using multiple ontologies
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies={"uniprot", "chebi"}
... )
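
The examples above assume features_df and species_df already exist. The following sketch builds small tables with the required columns and shows, in plain pandas, why matching on both ontology and identifier matters. All values, including the ontology name "other", are made up, and the loop only approximates the "match per ontology, then concatenate" behaviour described above.

```python
import pandas as pd

features_df = pd.DataFrame({
    "ontology": ["uniprot", "chebi", "chebi"],
    "identifier": ["P12345", "15377", "16810"],
    "results": [1.0, 2.0, 3.0],
})
species_df = pd.DataFrame({
    # "15377" appears under two ontologies; only the chebi entry should match.
    "ontology": ["uniprot", "chebi", "other"],
    "identifier": ["P12345", "15377", "15377"],
    "s_id": ["S1", "S2", "S3"],
})

ontologies = ["uniprot", "chebi"]

# Match each ontology separately, then concatenate the per-ontology results.
per_ontology = [
    features_df[features_df["ontology"] == ont].merge(
        species_df[species_df["ontology"] == ont],
        on=["ontology", "identifier"],
    )
    for ont in ontologies
]
matched = pd.concat(per_ontology, ignore_index=True)
```

Joining on the identifier alone would also pair the chebi feature "15377" with species S3, which belongs to a different ontology.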

napistu.matching.species.match_features_to_wide_pathway_species(wide_df: DataFrame, species_identifiers: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', verbose: bool = False) → DataFrame

Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.

Parameters:

wide_df (pd.DataFrame) – DataFrame with ontology identifier columns and any number of results columns. All non-ontology columns are treated as results.
species_identifiers (pd.DataFrame) – DataFrame as required by features_to_pathway_species
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – One of:
    - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
    - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
    - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST
feature_identifiers_var (str, default="identifier") – Name for the identifier column in the long format
feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name for the feature id column in the long format
verbose (bool, default=False) – Whether to print verbose output

Returns:

Output of match_by_ontology_and_identifier

Return type:

pd.DataFrame

Examples

>>> # Example with auto-detected ontology columns and multiple results
>>> wide_df = pd.DataFrame({
...     'uniprot': ['P12345', 'Q67890'],
...     'chebi': ['15377', '16810'],
...     'log2fc': [1.0, 2.0],
...     'pvalue': [0.01, 0.05]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers
... )

>>> # Example with custom ontology mapping
>>> wide_df = pd.DataFrame({
...     'protein_id': ['P12345', 'Q67890'],
...     'compound_id': ['15377', '16810'],
...     'expression': [1.0, 2.0],
...     'confidence': [0.8, 0.9]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers,
...     ontologies={'protein_id': 'uniprot', 'compound_id': 'chebi'}
... )
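
The wide-to-long conversion can be pictured with pandas.melt. This is a sketch of the reshaping step only, under the assumption that feature IDs are assigned per input row; it does not call napistu and may differ from the library's internals.

```python
import pandas as pd

wide_df = pd.DataFrame({
    "uniprot": ["P12345", "Q67890"],
    "chebi": ["15377", "16810"],
    "log2fc": [1.0, 2.0],
    "pvalue": [0.01, 0.05],
})
ontology_cols = ["uniprot", "chebi"]
# Every non-ontology column is carried along as a results column.
results_cols = [c for c in wide_df.columns if c not in ontology_cols]

long_df = (
    wide_df.assign(feature_id=range(len(wide_df)))  # assumed: one ID per wide row
    .melt(
        id_vars=["feature_id"] + results_cols,
        value_vars=ontology_cols,
        var_name="ontology",
        value_name="identifier",
    )
    .dropna(subset=["identifier"])  # rows lacking an identifier cannot be matched
)
```

Each wide row becomes one long row per ontology column, with the results values repeated, which is the shape expected by match_by_ontology_and_identifier (columns ontology, identifier, and results).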