napistu.identifiers

Systematic identifiers for species, reactions, compartments, etc.

Classes

Identifiers

Identifiers for a single entity or relationship.

Public Functions

construct_cspecies_identifiers

Construct compartmentalized species identifiers by adding sc_id to species_identifiers.

df_to_identifiers

Convert a DataFrame of identifier information to a Series of Identifiers objects.

class napistu.identifiers.Identifiers(id_list: list, verbose: bool = False)

Bases: object

Identifiers for a single entity or relationship.

df

a DataFrame of identifiers with columns ontology, identifier, url, bqb

Type:

pd.DataFrame

Properties
----------
ids

(deprecated) a list of identifiers, each a dict containing an ontology and an identifier

Type:

list

Public Methods
--------------
get_all_bqbs

Returns a set of all BQB entries

get_all_ontologies

Returns a set of all ontology entries

has_ontology(ontologies)

Returns True if at least one of the given ontologies is present

hoist(ontology)

Returns value(s) from an ontology

print

Print a table of identifiers

classmethod merge(identifier_series: Series) -> Identifiers

Merge multiple Identifiers objects into a single Identifiers object.

Parameters:

identifier_series (pd.Series) – Series of Identifiers objects to merge

Returns:

New Identifiers object containing all unique identifiers

Return type:

Identifiers
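The merge behaviour described above can be pictured with plain Python lists of identifier dicts. This is a minimal sketch, not napistu's implementation: the helper name merge_id_lists is hypothetical, and treating the (ontology, identifier, bqb) triple as the uniqueness key is an assumption, since the text only says "all unique identifiers".

```python
def merge_id_lists(id_lists):
    """Combine several lists of identifier dicts, keeping the first copy of each entry.

    Assumption: two entries are the same identifier when their
    (ontology, identifier, bqb) triple matches.
    """
    seen = set()
    merged = []
    for id_list in id_lists:
        for entry in id_list:
            key = (entry["ontology"], entry["identifier"], entry.get("bqb"))
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged


# Illustrative entries only
first = [{"ontology": "uniprot", "identifier": "P04637", "bqb": "BQB_IS"}]
second = [
    {"ontology": "uniprot", "identifier": "P04637", "bqb": "BQB_IS"},  # repeat of first
    {"ontology": "chebi", "identifier": "15422", "bqb": "BQB_IS"},
]
merged = merge_id_lists([first, second])
```

With the real class, the equivalent call would pass a pd.Series of Identifiers objects to Identifiers.merge.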

__init__(id_list: list, verbose: bool = False) -> None

Tracks a set of identifiers and the ontologies they belong to.

Parameters:
  • id_list (list) – a list of identifier dictionaries, each containing ontology, identifier, and optionally url; the module's _IdentifierValidator model also lists bqb as a required field

  • verbose (bool) – extra reporting, defaults to False

Return type:

None
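To make the expected shape of id_list concrete, here is a small standalone sketch. It does not import napistu. The helper normalize_entry and the REQUIRED/OPTIONAL tuples are hypothetical, and treating bqb as required is an assumption based on the _IdentifierValidator signature shown in this module. The entries and the example.org URL are illustrative.

```python
# Field names follow the documented columns: ontology, identifier, url, bqb.
REQUIRED = ("ontology", "identifier", "bqb")  # assumption, based on _IdentifierValidator
OPTIONAL = ("url",)

id_list = [
    {
        "ontology": "uniprot",
        "identifier": "P04637",
        "bqb": "BQB_IS",
        "url": "https://example.org/uniprot/P04637",  # placeholder URL
    },
    # url omitted: it is optional
    {"ontology": "ensembl_gene", "identifier": "ENSG00000141510", "bqb": "BQB_IS_ENCODED_BY"},
]


def normalize_entry(entry: dict) -> dict:
    """Check required keys and fill missing optional keys with None."""
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    row = {key: entry[key] for key in REQUIRED}
    row.update({key: entry.get(key) for key in OPTIONAL})
    return row


rows = [normalize_entry(entry) for entry in id_list]
```

A list shaped like id_list is what would be passed as Identifiers(id_list).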

get_all_bqbs() -> set[str]

Returns a set of all BQB entries

Returns:

A set containing all unique BQB values from the identifiers

Return type:

set[str]

get_all_ontologies(bqb_terms: list[str] = None) -> set[str]

Returns a set of all ontology entries

Returns:

A set containing all unique ontology names from the identifiers

Return type:

set[str]

has_ontology(ontologies: str | list[str]) -> bool

Check if specified ontologies are present in the identifiers.

Parameters:

ontologies (str or list of str) – Ontology name(s) to search for

Returns:

True if any specified ontologies are present

Return type:

bool
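The "any match" semantics, and the fact that a bare string is accepted alongside a list, can be sketched in a few lines. This is an illustration, not the library's code; has_any_ontology is a hypothetical name and present stands in for the set of ontologies an Identifiers object holds.

```python
def has_any_ontology(present, ontologies):
    """Return True if at least one requested ontology is in the set `present`."""
    if isinstance(ontologies, str):
        # Wrap a single name so it is not iterated character by character
        ontologies = [ontologies]
    return not set(present).isdisjoint(ontologies)


present = {"uniprot", "chebi"}  # illustrative ontology names
single_hit = has_any_ontology(present, "chebi")
no_hit = has_any_ontology(present, ["kegg", "reactome"])
partial_hit = has_any_ontology(present, ["kegg", "uniprot"])
```

Wrapping the string first matters: without it, "chebi" would be compared letter by letter against the set.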

hoist(ontology: str, squeeze: bool = True) -> str | list[str] | None

Returns value(s) from an ontology

Parameters:
  • ontology (str) – the ontology of interest

  • squeeze (bool) – if True, return a single value if possible

Returns:

the value(s) of an ontology of interest

Return type:

str, list[str], or None
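One way to read the squeeze flag is sketched below. The function hoist_sketch is hypothetical, and two behaviours are assumptions inferred from the signature rather than confirmed by the text: returning None when the ontology is absent, and returning a bare string only when exactly one value matches.

```python
def hoist_sketch(id_list, ontology, squeeze=True):
    """Pull the identifier value(s) for one ontology out of a list of identifier dicts."""
    values = [entry["identifier"] for entry in id_list if entry["ontology"] == ontology]
    if not values:
        return None  # assumption: no match yields None
    if squeeze and len(values) == 1:
        return values[0]  # single match collapses to a plain string
    return values


# Illustrative entries only
id_list = [
    {"ontology": "chebi", "identifier": "17234"},
    {"ontology": "kegg", "identifier": "C00031"},
    {"ontology": "kegg", "identifier": "C00221"},
]
one_value = hoist_sketch(id_list, "chebi")
many_values = hoist_sketch(id_list, "kegg")
unsqueezed = hoist_sketch(id_list, "chebi", squeeze=False)
absent = hoist_sketch(id_list, "uniprot")
```

Callers that always want a list can pass squeeze=False and avoid branching on the return type.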

print()

Print a table of identifiers

property ids: list[dict]

(Deprecated) A list of identifiers, each a dict containing an ontology and an identifier.

class napistu.identifiers._IdentifierValidator(*, ontology: str, identifier: str, bqb: str, url: str | None = None)

Bases: BaseModel

bqb: str
identifier: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

ontology: str
url: str | None

class napistu.identifiers._IdentifiersValidator(*, id_list: list[_IdentifierValidator])

Bases: BaseModel

id_list: list[_IdentifierValidator]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

napistu.identifiers._check_species_identifiers_table(species_identifiers: DataFrame, required_vars: set = {'bqb', 'identifier', 'ontology', 's_id', 's_name'})

napistu.identifiers._deduplicate_identifiers_by_priority(df: DataFrame, group_cols: list[str]) -> DataFrame

Deduplicate identifiers by prioritizing BQB terms and URL presence.

Parameters:
  • df (pd.DataFrame) – DataFrame containing identifier information with BQB and URL columns

  • group_cols (list[str]) – Columns to group by for deduplication (e.g., [ontology, identifier] or [pk, ontology, identifier])

Returns:

Deduplicated DataFrame with highest priority entries retained

Return type:

pd.DataFrame
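The general pattern of priority-based deduplication can be sketched with pandas: rank each row, sort, and keep the first row per group. This is not the library's implementation. The BQB_RANK mapping is a made-up ordering, since the text above does not say which BQB terms win, and the helper name dedupe_by_priority is hypothetical. The only behaviour taken from the text is that BQB term and URL presence decide which duplicate survives.

```python
import pandas as pd

# Assumed ordering for illustration only: a lower rank is kept in preference.
BQB_RANK = {"BQB_IS": 0, "BQB_HAS_PART": 1}


def dedupe_by_priority(df: pd.DataFrame, group_cols: list) -> pd.DataFrame:
    """Keep one row per group, preferring better-ranked BQB terms, then rows with a URL."""
    ranked = df.assign(
        _bqb_rank=df["bqb"].map(BQB_RANK).fillna(len(BQB_RANK)),  # unknown terms rank last
        _has_url=df["url"].notna(),
    )
    ranked = ranked.sort_values(["_bqb_rank", "_has_url"], ascending=[True, False])
    deduped = ranked.drop_duplicates(subset=group_cols, keep="first")
    return deduped.drop(columns=["_bqb_rank", "_has_url"]).sort_index()


# Illustrative table: the first two rows describe the same (ontology, identifier) pair
df = pd.DataFrame(
    {
        "ontology": ["uniprot", "uniprot", "chebi"],
        "identifier": ["P04637", "P04637", "15422"],
        "bqb": ["BQB_HAS_PART", "BQB_IS", "BQB_IS"],
        "url": [None, "https://example.org/P04637", None],
    }
)
result = dedupe_by_priority(df, ["ontology", "identifier"])
```

Adding a primary-key column to group_cols, as in the [pk, ontology, identifier] example above, would deduplicate within each entity rather than across the whole table.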

napistu.identifiers._prepare_species_identifiers(sbml_dfs: SBML_dfs, dogmatic: bool = False, species_identifiers: pd.DataFrame | None = None) -> pd.DataFrame

Accepts and validates species_identifiers, or extracts a fresh table if None.

napistu.identifiers._validate_assets_sbml_ids(sbml_dfs: SBML_dfs, identifiers_df: pd.DataFrame) -> None

Check an sbml_dfs object and an identifiers table for inconsistencies.

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – The sbml_dfs object to check

  • identifiers_df (pd.DataFrame) – The identifiers table to check

Return type:

None

Raises:

ValueError – If there are inconsistencies between the sbml_dfs and identifiers_df

napistu.identifiers.construct_cspecies_identifiers(species_identifiers: pd.DataFrame, cspecies_references: 'SBML_dfs' | pd.DataFrame) -> pd.DataFrame

Construct compartmentalized species identifiers by adding sc_id to species_identifiers.

This function merges compartmentalized species IDs (sc_id) into a species_identifiers table, allowing you to work with compartmentalized species without loading the full sbml_dfs object.

Parameters:
  • species_identifiers (pd.DataFrame) – A species identifiers table with columns including s_id, ontology, identifier. Must satisfy SPECIES_IDENTIFIERS_REQUIRED_VARS.

  • cspecies_references (Union[sbml_dfs_core.SBML_dfs, pd.DataFrame]) – Either an sbml_dfs object from which compartmentalized_species will be extracted, or a 2-column DataFrame with s_id and sc_id columns.

Returns:

The species_identifiers table with an additional sc_id column. Each row in the original table will be expanded to include all corresponding sc_ids for that s_id.

Return type:

pd.DataFrame
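The row expansion described above is what a pandas merge on s_id produces when one species maps to several compartmentalized species. The sketch below uses the 2-column DataFrame form of cspecies_references and does not call napistu. The toy IDs are invented, and the inner join is an assumption, since the text does not say how species without a matching sc_id are handled.

```python
import pandas as pd

# Toy species identifiers table with the required columns
species_identifiers = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2"],
        "s_name": ["glucose", "glucose", "ATP"],
        "ontology": ["chebi", "kegg", "chebi"],
        "identifier": ["17234", "C00031", "15422"],
        "bqb": ["BQB_IS", "BQB_IS", "BQB_IS"],
    }
)

# 2-column lookup: S1 lives in two compartments, S2 in one
cspecies_references = pd.DataFrame(
    {"s_id": ["S1", "S1", "S2"], "sc_id": ["SC1", "SC2", "SC3"]}
)

# Each identifier row is repeated once per matching sc_id
cspecies_identifiers = species_identifiers.merge(
    cspecies_references, on="s_id", how="inner"
)
```

Here the two S1 identifier rows each pair with SC1 and SC2, and the single S2 row pairs with SC3, giving five rows in total.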

napistu.identifiers.df_to_identifiers(df: DataFrame) -> Series

Convert a DataFrame of identifier information to a Series of Identifiers objects.

Parameters:

df (pd.DataFrame) – DataFrame containing identifier information with required columns: ontology, identifier, url, bqb

Returns:

Series indexed by index_col containing Identifiers objects

Return type:

pd.Series