napistu.matching.species
Functions
features_to_pathway_species | Features to Pathway Species
match_by_ontology_and_identifier | Match features to pathway species based on both ontology and identifier matches.
match_features_to_wide_pathway_species | Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.
- napistu.matching.species._ensure_feature_id_var(df: DataFrame, feature_id_var: str = 'feature_id') → DataFrame
Ensure the DataFrame has a feature_id column, creating one if it doesn’t exist.
- Parameters:
df (pd.DataFrame) – DataFrame to check/modify
feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name of the feature ID column
- Returns:
DataFrame with guaranteed feature_id column
- Return type:
pd.DataFrame
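The behavior described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `ensure_feature_id` is hypothetical, and the use of sequential integers as generated IDs is an assumption, since the page does not say how missing IDs are created.

```python
import pandas as pd


def ensure_feature_id(df: pd.DataFrame, feature_id_var: str = "feature_id") -> pd.DataFrame:
    """Return a DataFrame that is guaranteed to have a feature ID column."""
    if feature_id_var in df.columns:
        # Column already present: nothing to do.
        return df
    out = df.copy()  # avoid mutating the caller's DataFrame
    # Assumption: IDs are sequential row numbers; the real library may differ.
    out[feature_id_var] = range(len(out))
    return out


features = pd.DataFrame({"identifier": ["P12345", "Q67890"]})
with_ids = ensure_feature_id(features)
```

Copying before adding the column keeps the input unchanged, which is usually the safer choice for a helper that may or may not modify its argument.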
- napistu.matching.species._log_feature_species_mapping_stats(pathway_species: DataFrame, feature_id_var: str = 'feature_id')
Log statistics about the mapping between feature_id and s_id in the pathway_species DataFrame.
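The page does not list which statistics are logged. As a rough sketch, assuming the interest is in how many features map to several species and vice versa, the counts could be computed as follows. The function name `summarize_mapping` and the keys of the returned dictionary are hypothetical.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def summarize_mapping(pathway_species: pd.DataFrame, feature_id_var: str = "feature_id") -> dict:
    """Count one-to-many relationships between features and species (s_id)."""
    species_per_feature = pathway_species.groupby(feature_id_var)["s_id"].nunique()
    features_per_species = pathway_species.groupby("s_id")[feature_id_var].nunique()
    stats = {
        "n_features": int(species_per_feature.size),
        "n_species": int(features_per_species.size),
        "features_with_multiple_species": int((species_per_feature > 1).sum()),
        "species_with_multiple_features": int((features_per_species > 1).sum()),
    }
    logger.info("feature-to-species mapping: %s", stats)
    return stats


pathway_species = pd.DataFrame({"feature_id": [0, 0, 1], "s_id": ["S1", "S2", "S2"]})
stats = summarize_mapping(pathway_species)
```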
- napistu.matching.species._validate_wide_ontologies(wide_df: DataFrame, ontologies: str | Set[str] | Dict[str, str] | None = None) → Set[str]
Validate ontology specifications against the wide DataFrame and ONTOLOGIES_LIST.
- Parameters:
wide_df (pd.DataFrame) – DataFrame with one column per ontology and a results column
ontologies (Optional[Union[str, Set[str], Dict[str, str]]]) – Either:
  - String specifying a single ontology column
  - Set of columns to treat as ontologies
  - Dict mapping wide column names to ontology names
  - None to automatically detect ontology columns based on ONTOLOGIES_LIST
- Returns:
Set of validated ontology names. For dictionary mappings, returns the target ontology names.
- Return type:
Set[str]
- Raises:
ValueError – If validation fails for any ontology specification or no valid ontologies are found
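The four accepted forms of `ontologies` can be illustrated with a small sketch. It assumes a stand-in vocabulary named `KNOWN_ONTOLOGIES` (hypothetical; the real ONTOLOGIES_LIST is not reproduced on this page) and a hypothetical helper name `resolve_ontologies`; the checks and error messages are illustrative rather than the library's own.

```python
from typing import Dict, Optional, Set, Union

import pandas as pd

# Hypothetical stand-in for the library's ONTOLOGIES_LIST controlled vocabulary.
KNOWN_ONTOLOGIES = {"uniprot", "chebi", "ensembl_gene"}


def resolve_ontologies(
    wide_df: pd.DataFrame,
    ontologies: Optional[Union[str, Set[str], Dict[str, str]]] = None,
) -> Set[str]:
    """Return the validated set of ontology names for a wide DataFrame."""
    columns = set(wide_df.columns)
    if ontologies is None:
        # Auto-detect: any column whose name is a known ontology.
        found = columns & KNOWN_ONTOLOGIES
        if not found:
            raise ValueError("No ontology columns detected")
        return found
    if isinstance(ontologies, str):
        ontologies = {ontologies}
    if isinstance(ontologies, dict):
        # Keys are column names in wide_df; values are the target ontology names.
        source_cols, targets = set(ontologies.keys()), set(ontologies.values())
    else:
        source_cols = targets = set(ontologies)
    missing = source_cols - columns
    if missing:
        raise ValueError(f"Columns not found in wide_df: {sorted(missing)}")
    unknown = targets - KNOWN_ONTOLOGIES
    if unknown:
        raise ValueError(f"Not in the controlled vocabulary: {sorted(unknown)}")
    return targets


wide = pd.DataFrame({"protein_id": ["P12345"], "chebi": ["15377"], "log2fc": [1.0]})
auto = resolve_ontologies(wide)
mapped = resolve_ontologies(wide, {"protein_id": "uniprot"})
```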
- napistu.matching.species.features_to_pathway_species(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: set, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', expand_identifiers: bool = False, identifier_delimiter: str = '/', verbose: bool = False) → DataFrame
Features to Pathway Species
Match a table of molecular species to their corresponding species in a pathway representation.
- Parameters:
feature_identifiers (pd.DataFrame) – DataFrame containing a feature_identifiers_var column used to match entries
species_identifiers (pd.DataFrame) – Table of molecular species identifiers produced by sbml_dfs.get_identifiers("species"), generally obtained using sbml_dfs.export_sbml_dfs()
ontologies (set) – Set of ontologies used to match features to pathway species
feature_identifiers_var (str, default="identifier") – Column in feature_identifiers containing identifiers
feature_id_var (str, default="feature_id") – Name of the feature ID column
expand_identifiers (bool, default=False) – If True, split identifiers in feature_identifiers_var by identifier_delimiter and explode into multiple rows
identifier_delimiter (str, default="/") – Delimiter to use for splitting identifiers if expand_identifiers is True
verbose (bool, default=False) – If True, log mapping statistics at the end of the function
- Returns:
species_identifiers joined to feature_identifiers based on shared identifiers
- Return type:
pd.DataFrame
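To make the `expand_identifiers` option concrete, the following pandas sketch splits delimited identifiers into one row each and joins them to a toy species table. It illustrates the described behavior only: the toy tables are invented, and the inner join is an assumption, since the page says only that the tables are joined on shared identifiers.

```python
import pandas as pd

features = pd.DataFrame({
    "feature_id": [0, 1],
    "identifier": ["P12345/P99999", "Q67890"],  # first feature carries two identifiers
})
species = pd.DataFrame({
    "ontology": ["uniprot", "uniprot", "chebi"],
    "identifier": ["P99999", "Q67890", "15377"],
    "s_id": ["S1", "S2", "S3"],
})

# Split on the delimiter, then explode so each identifier gets its own row.
expanded = features.assign(
    identifier=features["identifier"].str.split("/")
).explode("identifier")

# Restrict the species table to the requested ontologies, then join on identifier.
ontologies = {"uniprot"}
matched = expanded.merge(
    species[species["ontology"].isin(ontologies)],
    on="identifier",
    how="inner",  # assumption: unmatched features are dropped
)
```

Here feature 0 matches through its second identifier only, which is the case `expand_identifiers` is meant to handle.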
- napistu.matching.species.match_by_ontology_and_identifier(feature_identifiers: DataFrame, species_identifiers: DataFrame, ontologies: str | Set[str] | List[str], feature_identifiers_var: str = 'identifier', verbose: bool = False) → DataFrame
Match features to pathway species based on both ontology and identifier matches. Performs separate matching for each ontology and concatenates the results.
- Parameters:
feature_identifiers (pd.DataFrame) – DataFrame containing feature identifiers and results. Must have columns [ontology, feature_identifiers_var, results]
species_identifiers (pd.DataFrame) – DataFrame containing species identifiers from pathway. Must have columns [ontology, identifier]
ontologies (Union[str, Set[str], List[str]]) – Ontologies to match on. Can be:
  - A single ontology string
  - A set of ontology strings
  - A list of ontology strings
feature_identifiers_var (str, default="identifier") – Name of the identifier column in feature_identifiers
verbose (bool, default=False) – Whether to print verbose output
- Returns:
Concatenated results of matching for each ontology. Contains all columns from features_to_pathway_species()
- Return type:
pd.DataFrame
Examples
>>> # Match using a single ontology
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies="uniprot"
... )
>>> # Match using multiple ontologies
>>> result = match_by_ontology_and_identifier(
...     feature_identifiers=features_df,
...     species_identifiers=species_df,
...     ontologies={"uniprot", "chebi"}
... )
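The idea of matching separately per ontology and concatenating the results can be sketched with pandas alone. This is an illustration under assumptions, not the library's code: the toy tables are invented, and the inner join on both `ontology` and `identifier` is assumed from the description above.

```python
import pandas as pd

features = pd.DataFrame({
    "ontology": ["uniprot", "chebi", "chebi"],
    "identifier": ["P12345", "15377", "P12345"],
    "results": [1.0, 2.0, 3.0],
})
species = pd.DataFrame({
    "ontology": ["uniprot", "chebi"],
    "identifier": ["P12345", "15377"],
    "s_id": ["S1", "S2"],
})

per_ontology = []
for ontology in sorted({"uniprot", "chebi"}):
    # Restrict both tables to one ontology so identifiers only match within it.
    f = features[features["ontology"] == ontology]
    s = species[species["ontology"] == ontology]
    per_ontology.append(f.merge(s, on=["ontology", "identifier"], how="inner"))

matched = pd.concat(per_ontology, ignore_index=True)
```

The third feature row is labeled `chebi` but carries a UniProt-style identifier, so it matches nothing. Requiring agreement on ontology as well as identifier prevents that kind of accidental cross-ontology hit.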
- napistu.matching.species.match_features_to_wide_pathway_species(wide_df: DataFrame, species_identifiers: DataFrame, ontologies: Set[str] | Dict[str, str] | None = None, feature_identifiers_var: str = 'identifier', feature_id_var: str = 'feature_id', verbose: bool = False) → DataFrame
Convert a wide-format DataFrame with multiple ontology columns to long format, and match features to pathway species by ontology and identifier.
- Parameters:
wide_df (pd.DataFrame) – DataFrame with ontology identifier columns and any number of results columns. All non-ontology columns are treated as results.
species_identifiers (pd.DataFrame) – DataFrame as required by features_to_pathway_species
ontologies (Optional[Union[Set[str], Dict[str, str]]], default=None) – Either:
  - Set of columns to treat as ontologies (these should be entries in ONTOLOGIES_LIST)
  - Dict mapping wide column names to ontology names in the ONTOLOGIES_LIST controlled vocabulary
  - None to automatically detect valid ontology columns based on ONTOLOGIES_LIST
feature_identifiers_var (str, default="identifier") – Name for the identifier column in the long format
feature_id_var (str, default=FEATURE_ID_VAR_DEFAULT) – Name for the feature id column in the long format
verbose (bool, default=False) – Whether to print verbose output
- Returns:
Output of match_by_ontology_and_identifier
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> # Example with auto-detected ontology columns and multiple results
>>> wide_df = pd.DataFrame({
...     'uniprot': ['P12345', 'Q67890'],
...     'chebi': ['15377', '16810'],
...     'log2fc': [1.0, 2.0],
...     'pvalue': [0.01, 0.05]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers
... )
>>> # Example with custom ontology mapping
>>> wide_df = pd.DataFrame({
...     'protein_id': ['P12345', 'Q67890'],
...     'compound_id': ['15377', '16810'],
...     'expression': [1.0, 2.0],
...     'confidence': [0.8, 0.9]
... })
>>> result = match_features_to_wide_pathway_species(
...     wide_df=wide_df,
...     species_identifiers=species_identifiers,
...     ontologies={'protein_id': 'uniprot', 'compound_id': 'chebi'}
... )