napistu.ingestion.sbml

Functions

sbml_dfs_from_sbml(self, sbml_model[, ...])

Parses an SBML model into a set of standardized DataFrames.

Classes

CompartmentAliasesValidator([root])

A Pydantic model for validating compartment alias dictionaries.

SBML(sbml_path[, verbose])

A class for handling Systems Biology Markup Language (SBML) files.

SBML_reaction(sbml_reaction)

A convenience class for processing individual SBML reactions.

class napistu.ingestion.sbml.CompartmentAliasesValidator(root: RootModelRootType = PydanticUndefined)

Bases: RootModel

A Pydantic model for validating compartment alias dictionaries.

This model ensures that the compartment alias dictionary is a mapping from a string (the canonical compartment name) to a list of strings (the aliases for that compartment). It also validates that the keys of the dictionary are valid compartment names.

root

The root of the model is a dictionary where keys are strings and values are lists of strings.

Type:

dict[str, list[str]]

classmethod from_dict(data: dict[str, list[str]]) CompartmentAliasesValidator

Create a CompartmentAliasesValidator from a dictionary.

Parameters:

data (dict[str, list[str]]) – A dictionary mapping canonical compartment names to their aliases.

Returns:

A validated instance of the model.

Return type:

CompartmentAliasesValidator

classmethod validate_aliases(values: dict[str, list[str]])

Validate the compartment alias dictionary.
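The shape check described above can be illustrated without Pydantic. This is a minimal plain-Python sketch, not the library's implementation; `check_alias_dict`, `VALID`, and the compartment names are hypothetical and exist only for illustration.

```python
def check_alias_dict(aliases, valid_names):
    """Return aliases unchanged if well formed, otherwise raise ValueError."""
    if not isinstance(aliases, dict):
        raise ValueError("aliases must be a dict")
    for name, alias_list in aliases.items():
        # Keys must be known canonical compartment names.
        if not isinstance(name, str) or name not in valid_names:
            raise ValueError(f"unknown compartment name: {name!r}")
        # Values must be lists of strings.
        if not isinstance(alias_list, list) or not all(
            isinstance(alias, str) for alias in alias_list
        ):
            raise ValueError(f"aliases for {name!r} must be a list of strings")
    return aliases


# Hypothetical canonical names; the real controlled vocabulary lives in napistu.
VALID = {"cytosol", "nucleus"}
checked = check_alias_dict({"cytosol": ["cytoplasm", "c"]}, VALID)
```

A dictionary keyed by a name outside `VALID`, or with a non-list value, raises `ValueError` in this sketch.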

items()

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

root: dict[str, list[str]]

class napistu.ingestion.sbml.SBML(sbml_path: str, verbose: bool = False)

Bases: object

A class for handling Systems Biology Markup Language (SBML) files.

This class provides an interface to read and parse SBML files, offering methods to access the model, summarize its contents, and report any errors encountered during parsing.

Parameters:

sbml_path (str) – The file path to an SBML model. Supports local paths and GCS URIs.

document

The raw SBML document object from libsbml.

Type:

libsbml.SBMLDocument

model

The parsed SBML model object from libsbml.

Type:

libsbml.Model

verbose

If True, include detailed logs.

Type:

bool, default=False

summary()

Prints a summary of the SBML model.

sbml_errors(reduced_log, return_df)

Prints a summary of all errors in the SBML file.

Raises:

ValueError – If the SBML model is not Level 3, or if critical, unknown errors are found during parsing.

__init__(sbml_path: str, verbose: bool = False) None

Initializes the SBML object by reading and validating an SBML file.

_define_compartments(compartment_aliases_dict: dict | None = None) DataFrame

Extracts and defines compartments from the SBML model.

This function iterates through the compartments in the SBML model, extracting their IDs, names, and identifiers. It also handles cases where CVTerms are missing by mapping compartment names to known GO terms.

Parameters:

compartment_aliases_dict (dict, optional) – A dictionary to map custom compartment names. If None, the default mapping from COMPARTMENT_ALIASES is used.

Returns:

A DataFrame containing information about each compartment, indexed by compartment ID.

Return type:

pd.DataFrame
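One way to picture the alias-based fallback is as a reverse lookup from alias to canonical name. The sketch below is an assumption about the general idea, not napistu's code: `build_alias_lookup`, `resolve_compartment`, the example aliases, and the case-insensitive matching are all hypothetical.

```python
def build_alias_lookup(aliases):
    """Invert {canonical: [alias, ...]} into {alias: canonical}."""
    lookup = {}
    for canonical, alias_list in aliases.items():
        # The canonical name should resolve to itself as well.
        for alias in [canonical, *alias_list]:
            lookup[alias.lower()] = canonical
    return lookup


def resolve_compartment(name, lookup):
    """Return the canonical name for a compartment, or None if unknown."""
    return lookup.get(name.strip().lower())


# Hypothetical alias dictionary for illustration only.
ALIASES = {"cytosol": ["cytoplasm", "c"], "nucleus": ["n", "nucleoplasm"]}
lookup = build_alias_lookup(ALIASES)
canonical = resolve_compartment("Cytoplasm", lookup)
```

Once a compartment name resolves to a canonical name, that name could be used to attach a known GO term; names that do not resolve return `None` here.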

_define_cspecies(verbose: bool = False) DataFrame

Creates a DataFrame of compartmentalized species from an SBML model.

This function extracts all species from the model and creates a standardized DataFrame that includes unique IDs for each compartmentalized species (sc_id), along with species and compartment IDs, and their corresponding identifiers.

Parameters:

verbose (bool) – If True, enables extra reporting. Defaults to False.

Returns:

A DataFrame containing information about each compartmentalized species.

Return type:

pd.DataFrame

_define_fbc_gene_products() list[dict]

_define_reactions() tuple[DataFrame, DataFrame]

Extracts and defines reactions and their participating species.

This function iterates through all reactions in the SBML model, creating a DataFrame for reaction attributes and another for all participating species (reactants, products, and modifiers).

Returns:

A tuple containing two DataFrames:
  • The first DataFrame contains reaction attributes, indexed by reaction ID.

  • The second DataFrame lists all species participating in reactions.

Return type:

tuple[pd.DataFrame, pd.DataFrame]

_define_species(verbose: bool = False) tuple[DataFrame, DataFrame]

Extracts and defines species and compartmentalized species.

This function creates two DataFrames: one for unique molecular species (un-compartmentalized) and another for compartmentalized species, which represent a species within a specific compartment.

Parameters:

verbose (bool) – If True, enables extra reporting. Defaults to False.

Returns:

A tuple containing two DataFrames:
  • The first DataFrame represents unique molecular species.

  • The second DataFrame represents compartmentalized species.

Return type:

tuple[pd.DataFrame, pd.DataFrame]
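The distinction between the two tables can be sketched with plain dictionaries. Everything here is hypothetical: the records, the `S001`-style ID format, and the assumption that records sharing a name are the same molecular species. The real method works on libsbml objects and returns pandas DataFrames.

```python
# Hypothetical raw records: (SBML species id, species name, compartment id).
raw = [
    ("sp1", "ATP", "c"),
    ("sp2", "ATP", "n"),
    ("sp3", "glucose", "c"),
]

species = {}            # species name -> s_id, one entry per molecular species
compartmentalized = []  # one row per species-in-compartment

for sbml_id, name, c_id in raw:
    # Assign a new s_id the first time a name is seen; the ID format is made up.
    s_id = species.setdefault(name, f"S{len(species) + 1:03d}")
    compartmentalized.append({"sc_id": sbml_id, "s_id": s_id, "c_id": c_id})
```

In this toy input, ATP appears in two compartments, so it contributes one species entry and two compartmentalized rows.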

sbml_errors(reduced_log: bool = True, return_df: bool = False)

Formats and reports all errors found in the SBML file.

Parameters:
  • reduced_log (bool, optional) – If True, aggregates errors by category and severity. Defaults to True.

  • return_df (bool, optional) – If True, returns a DataFrame of the errors. Otherwise, prints a styled summary. Defaults to False.

Returns:

A DataFrame containing the error log if return_df is True and errors are present, otherwise None.

Return type:

pd.DataFrame or None
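Aggregating by category and severity, as `reduced_log` is described as doing, amounts to counting records per pair. The following sketch uses `collections.Counter` on made-up records; `reduce_log` and the record fields are hypothetical and do not reflect the method's internals.

```python
from collections import Counter

# Hypothetical error records; real entries would come from the libsbml document.
errors = [
    {"category": "unit consistency", "severity": "Warning", "message": "m1"},
    {"category": "unit consistency", "severity": "Warning", "message": "m2"},
    {"category": "general conformance", "severity": "Error", "message": "m3"},
]


def reduce_log(errors):
    """Count errors per (category, severity) pair."""
    return Counter((e["category"], e["severity"]) for e in errors)


reduced = reduce_log(errors)
```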

summary() DataFrame

Generates a styled summary of the SBML model.

Returns:

A styled pandas DataFrame containing a summary of the model, including pathway name, ID, and counts of species and reactions.

Return type:

pd.io.formats.style.Styler

class napistu.ingestion.sbml.SBML_reaction(sbml_reaction: libsbml.Reaction)

Bases: object

A convenience class for processing individual SBML reactions.

This class extracts and organizes key information about an SBML reaction, including its attributes and participating species (substrates, products, and modifiers).

Parameters:

sbml_reaction (libsbml.Reaction) – A libsbml Reaction object to be processed.

reaction_dict

A dictionary of reaction-level attributes, including its ID, name, reversibility, identifiers, and source information.

Type:

dict

species

A DataFrame listing all species participating in the reaction, including their roles (substrate, product, modifier), stoichiometry, and SBO terms.

Type:

pd.DataFrame

__init__(sbml_reaction: libsbml.Reaction) None

Initializes the SBML_reaction object by parsing a libsbml Reaction.
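The split into reaction-level attributes and a per-species table can be sketched as follows. `ToyReaction` is a hypothetical stand-in for a libsbml Reaction and does not mirror the libsbml API; the column names and the choice of 0 as the modifier stoichiometry are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReaction:
    """Hypothetical stand-in for a libsbml Reaction; not the real API."""
    id: str
    reversible: bool
    reactants: list = field(default_factory=list)  # (species id, stoichiometry)
    products: list = field(default_factory=list)   # (species id, stoichiometry)
    modifiers: list = field(default_factory=list)  # species ids only


def reaction_rows(rxn):
    """Flatten a reaction into one row per participating species."""
    rows = []
    for role, members in (("substrate", rxn.reactants), ("product", rxn.products)):
        for sc_id, stoich in members:
            rows.append(
                {"r_id": rxn.id, "sc_id": sc_id, "role": role, "stoichiometry": stoich}
            )
    for sc_id in rxn.modifiers:
        # Modifiers are neither consumed nor produced; 0 is an assumed convention.
        rows.append(
            {"r_id": rxn.id, "sc_id": sc_id, "role": "modifier", "stoichiometry": 0}
        )
    return rows


rxn = ToyReaction("R1", False, [("glc_c", 1)], [("g6p_c", 1)], ["hk_c"])
reaction_dict = {"r_id": rxn.id, "reversible": rxn.reversible}
rows = reaction_rows(rxn)
```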

napistu.ingestion.sbml._cv_to_Identifiers(entity: libsbml.Species | libsbml.Reaction | libsbml.Compartment, strict: bool = False) Identifiers

Convert an SBML controlled vocabulary element into a napistu Identifiers object.

Parameters:
  • entity (libsbml.Species | libsbml.Reaction | libsbml.Compartment) – An entity (species, reaction, compartment, …) with attached CV terms.

  • strict (bool, default False) – If True, log full tracebacks for parsing failures. If False, use simple warning messages.

Returns:

An Identifiers object containing the CV terms

Return type:

Identifiers
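The effect of `strict` on logging can be sketched with the standard `logging` module. This is an illustration of the described behavior under assumptions, not the function's code: `parse_terms`, the logger name, and the `ontology:identifier` string format are hypothetical.

```python
import logging

logger = logging.getLogger("cv_sketch")


def parse_terms(terms, strict=False):
    """Parse 'ontology:identifier' strings, logging any that fail."""
    parsed = []
    for term in terms:
        try:
            # Unpacking raises ValueError when ":" is absent.
            ontology, identifier = term.split(":", 1)
            parsed.append({"ontology": ontology, "identifier": identifier})
        except ValueError:
            if strict:
                # logger.exception appends the full traceback to the log record.
                logger.exception("could not parse %r", term)
            else:
                logger.warning("could not parse %r", term)
    return parsed


parsed = parse_terms(["chebi:15422", "garbage"])
```

In both modes the unparseable term is skipped rather than raised; only the verbosity of the log entry differs.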

napistu.ingestion.sbml._define_compartments_missing_cvterms(comp: libsbml.Compartment, aliases: dict) dict[str, Any]

napistu.ingestion.sbml._extract_gene_products(association: libsbml.Association) list[dict]

Recursively extracts gene products from an association tree.
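A recursive walk over an association tree might look like the sketch below. The tree is modeled with nested tuples rather than libsbml Association objects, and the function returns plain ID strings instead of the `list[dict]` the real helper returns; both simplifications are assumptions made for illustration.

```python
def extract_gene_products(node):
    """Collect leaf gene product ids from a nested association tree, depth first."""
    if isinstance(node, str):
        # Leaf: a reference to a single gene product.
        return [node]
    _operator, children = node  # e.g. ("and", [child, child, ...])
    found = []
    for child in children:
        found.extend(extract_gene_products(child))
    return found


# Hypothetical association: g1 OR (g2 AND g3).
tree = ("or", ["g1", ("and", ["g2", "g3"])])
gene_products = extract_gene_products(tree)
```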

napistu.ingestion.sbml._get_biological_qualifier_codes() dict

Lazily build the libsbml integer to BQB string mapping.
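Lazy construction of a lookup table is commonly done with a cached function, so the table is built on first use and reused afterwards. The sketch below shows that pattern with `functools.lru_cache`; the integer codes are placeholders rather than values read from libsbml, and `get_qualifier_codes` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_qualifier_codes():
    """Build the mapping on first call, then return the same cached dict."""
    # Placeholder codes; the real values would come from libsbml constants.
    return {0: "BQB_IS", 1: "BQB_HAS_PART"}


codes = get_qualifier_codes()
```

Deferring the build this way means importing the module does not require libsbml until the mapping is actually needed.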

napistu.ingestion.sbml._get_gene_product_dict(gp)

Extracts attributes of a gene product from an SBML reaction object.

Parameters:

gp (libsbml.GeneProduct) – A libsbml GeneProduct object.

Returns:

A dictionary containing the gene product’s ID, name, and identifiers.

Return type:

dict

napistu.ingestion.sbml._libsbml()

Import libsbml or raise ImportError with install hint (pip install napistu[etl]).
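The optional-dependency pattern described here can be sketched generically with `importlib`. `require_module` is a hypothetical helper, not part of napistu, and the error wording is invented.

```python
import importlib


def require_module(name, install_hint):
    """Import a module by name, or raise ImportError that says how to install it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name} is required; install it with: {install_hint}"
        ) from exc


# A module that is always available, to show the success path.
json_module = require_module("json", "(part of the standard library)")
```

Calling `require_module("libsbml", "pip install napistu[etl]")` in an environment without libsbml would raise an `ImportError` carrying the install hint.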

napistu.ingestion.sbml._refine_compartments(compartments_df, compartmentalized_species_df)

Refine compartments to only those actually used by compartmentalized species.

This function filters the compartments DataFrame to include only compartments that are referenced by compartmentalized species, and validates that all required compartments exist.

Parameters:
  • compartments_df (pd.DataFrame) – DataFrame of all extracted compartments with c_id as index

  • compartmentalized_species_df (pd.DataFrame) – DataFrame of compartmentalized species with c_id column

Returns:

Filtered compartments DataFrame containing only used compartments

Return type:

pd.DataFrame

Raises:

ValueError – If compartmentalized species reference compartments that don’t exist
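The filter-then-validate logic can be sketched with plain dictionaries in place of DataFrames. This is not the function's implementation; `refine_compartments` here takes a dict keyed by `c_id` and a list of row dicts, both hypothetical stand-ins for the pandas inputs.

```python
def refine_compartments(compartments, cspecies):
    """Keep only compartments referenced by compartmentalized species."""
    used = {row["c_id"] for row in cspecies}
    missing = used - set(compartments)
    if missing:
        raise ValueError(f"undefined compartments referenced: {sorted(missing)}")
    return {c_id: info for c_id, info in compartments.items() if c_id in used}


# Hypothetical tables for illustration.
compartments = {"c": "cytosol", "n": "nucleus", "m": "mitochondrion"}
cspecies = [{"sc_id": "atp_c", "c_id": "c"}, {"sc_id": "atp_n", "c_id": "n"}]
refined = refine_compartments(compartments, cspecies)
```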

napistu.ingestion.sbml._validate_species_consistency(species_df, compartmentalized_species_df)

Validate consistency between species and compartmentalized species tables.

Parameters:
  • species_df (pd.DataFrame) – DataFrame of species with s_id as index

  • compartmentalized_species_df (pd.DataFrame) – DataFrame of compartmentalized species with s_id column

Raises:

ValueError – If there are inconsistencies between the two tables
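The source does not say which inconsistencies are checked. One plausible reading, sketched here as an assumption, is that every `s_id` used by a compartmentalized species must be defined, and every defined species must appear in at least one compartment. `check_species_consistency` and its inputs are hypothetical.

```python
def check_species_consistency(species_ids, cspecies):
    """Raise ValueError if the two tables disagree about which species exist."""
    defined = set(species_ids)
    referenced = {row["s_id"] for row in cspecies}
    undefined = referenced - defined  # used but never defined
    unused = defined - referenced     # defined but never placed in a compartment
    if undefined or unused:
        raise ValueError(
            f"undefined species: {sorted(undefined)}; unused species: {sorted(unused)}"
        )
    return True


ok = check_species_consistency(
    ["S1", "S2"],
    [{"sc_id": "a", "s_id": "S1"}, {"sc_id": "b", "s_id": "S2"}],
)
```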

napistu.ingestion.sbml.sbml_dfs_from_sbml(self, sbml_model: SBML, compartment_aliases: dict | None = None, verbose: bool = False)

Parses an SBML model into a set of standardized DataFrames.

This function serves as the main entry point for converting an SBML model into the internal DataFrame-based representation used by napistu. It orchestrates the processing of compartments, species, and reactions.

Parameters:
  • self (object) – The instance of the calling class, expected to have a schema attribute.

  • sbml_model (SBML) – The SBML model to be parsed.

  • compartment_aliases (dict, optional) – A dictionary to map custom compartment names to the napistu controlled vocabulary. If None, the default mapping (COMPARTMENT_ALIASES) is used. Defaults to None.

  • verbose (bool) – If True, enables extra reporting. Defaults to False.

Returns:

The calling class instance, now populated with DataFrames for compartments, species, compartmentalized_species, reactions, and reaction_species.

Return type:

object