napistu.ingestion.sbml
Functions
sbml_dfs_from_sbml | Parses an SBML model into a set of standardized DataFrames.
Classes
CompartmentAliasesValidator | A Pydantic model for validating compartment alias dictionaries.
SBML | A class for handling Systems Biology Markup Language (SBML) files.
SBML_reaction | A convenience class for processing individual SBML reactions.
- class napistu.ingestion.sbml.CompartmentAliasesValidator(root: RootModelRootType = PydanticUndefined)
Bases: RootModel
A Pydantic model for validating compartment alias dictionaries.
This model ensures that the compartment alias dictionary is a mapping from a string (the canonical compartment name) to a list of strings (the aliases for that compartment). It also validates that the keys of the dictionary are valid compartment names.
- root
The root of the model is a dictionary where keys are strings and values are lists of strings.
- Type:
dict[str, list[str]]
- classmethod from_dict(data: dict[str, list[str]]) CompartmentAliasesValidator
Create a CompartmentAliasesValidator from a dictionary.
- Parameters:
data (dict[str, list[str]]) – A dictionary mapping canonical compartment names to their aliases.
- Returns:
A validated instance of the model.
- Return type:
CompartmentAliasesValidator
- classmethod validate_aliases(values: dict[str, list[str]])
Validate the compartment alias dictionary.
- items()
- model_config = {}
Configuration for the model. This should be a dictionary conforming to pydantic.config.ConfigDict.
- root: dict[str, list[str]]
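The checks described for this validator can be sketched in plain Python. This is a minimal illustration and not the library's Pydantic implementation: `VALID_COMPARTMENTS` and `check_aliases` are hypothetical names, and the real set of valid compartment names is not listed in this reference.

```python
# Hypothetical stand-in for the canonical compartment names;
# the real controlled vocabulary is defined inside napistu and is not shown here.
VALID_COMPARTMENTS = {"cytosol", "nucleus", "mitochondrion"}


def check_aliases(data: dict) -> dict:
    """Check that `data` maps canonical compartment names to lists of string aliases."""
    if not isinstance(data, dict):
        raise TypeError("aliases must be a dict")
    for name, aliases in data.items():
        if not isinstance(name, str):
            raise TypeError(f"compartment name {name!r} is not a string")
        if name not in VALID_COMPARTMENTS:
            raise ValueError(f"unknown compartment name: {name!r}")
        if not isinstance(aliases, list) or not all(isinstance(a, str) for a in aliases):
            raise TypeError(f"aliases for {name!r} must be a list of strings")
    return data


# A well-formed mapping passes through unchanged.
aliases = check_aliases({"cytosol": ["cytoplasm", "cyto"], "nucleus": ["nuc"]})
```

With the real class, the equivalent entry point would presumably be `CompartmentAliasesValidator.from_dict(data)`, as documented above.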
- class napistu.ingestion.sbml.SBML(sbml_path: str, verbose: bool = False)
Bases: object
A class for handling Systems Biology Markup Language (SBML) files.
This class provides an interface to read and parse SBML files, offering methods to access the model, summarize its contents, and report any errors encountered during parsing.
- Parameters:
sbml_path (str) – The file path to an SBML model. Supports local paths and GCS URIs.
- document
The raw SBML document object from libsbml.
- Type:
libsbml.SBMLDocument
- model
The parsed SBML model object from libsbml.
- Type:
libsbml.Model
- verbose
If True, then include detailed logs.
- Type:
bool, default=False
- summary()
Prints a summary of the SBML model.
- sbml_errors(reduced_log, return_df)
Prints a summary of all errors in the SBML file.
- Raises:
ValueError – If the SBML model is not Level 3, or if critical, unknown errors are found during parsing.
- __init__(sbml_path: str, verbose: bool = False) None
Initializes the SBML object by reading and validating an SBML file.
- _define_compartments(compartment_aliases_dict: dict | None = None) DataFrame
Extracts and defines compartments from the SBML model.
This function iterates through the compartments in the SBML model, extracting their IDs, names, and identifiers. It also handles cases where CVTerms are missing by mapping compartment names to known GO terms.
- Parameters:
compartment_aliases_dict (dict, optional) – A dictionary to map custom compartment names. If None, the default mapping from COMPARTMENT_ALIASES is used.
- Returns:
A DataFrame containing information about each compartment, indexed by compartment ID.
- Return type:
pd.DataFrame
- _define_cspecies(verbose: bool = False) DataFrame
Creates a DataFrame of compartmentalized species from an SBML model.
This function extracts all species from the model and creates a standardized DataFrame that includes unique IDs for each compartmentalized species (sc_id), along with species and compartment IDs, and their corresponding identifiers.
- Parameters:
verbose (bool) – If True, enables extra reporting. Defaults to False.
- Returns:
A DataFrame containing information about each compartmentalized species.
- Return type:
pd.DataFrame
- _define_fbc_gene_products() list[dict]
- _define_reactions() tuple[DataFrame, DataFrame]
Extracts and defines reactions and their participating species.
This method iterates through all reactions in the SBML model, creating a DataFrame for reaction attributes and another for all participating species (reactants, products, and modifiers). It takes no parameters.
- Returns:
A tuple containing two DataFrames:
- The first DataFrame contains reaction attributes, indexed by reaction ID.
- The second DataFrame lists all species participating in reactions.
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
- _define_species(verbose: bool = False) tuple[DataFrame, DataFrame]
Extracts and defines species and compartmentalized species.
This function creates two DataFrames: one for unique molecular species (un-compartmentalized) and another for compartmentalized species, which represent a species within a specific compartment.
- Parameters:
verbose (bool) – If True, enables extra reporting. Defaults to False.
- Returns:
A tuple containing two DataFrames:
- The first DataFrame represents unique molecular species.
- The second DataFrame represents compartmentalized species.
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
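The distinction between species and compartmentalized species can be illustrated with plain dictionaries. This is only a sketch: it assumes that species are grouped by name, and the `S…`/`SC…` ID formats and the `split_species` helper are hypothetical; the real method works on libsbml objects and returns DataFrames.

```python
# Hypothetical input: one record per species-in-a-compartment.
records = [
    {"name": "ATP", "compartment": "cytosol"},
    {"name": "ATP", "compartment": "mitochondrion"},
    {"name": "glucose", "compartment": "cytosol"},
]


def split_species(records):
    """Return (species, compartmentalized_species) built from flat records."""
    species = {}    # species name -> s_id (one entry per unique molecule)
    cspecies = []   # one row per (species, compartment) pair
    for i, rec in enumerate(records, start=1):
        if rec["name"] not in species:
            species[rec["name"]] = f"S{len(species) + 1:03d}"
        cspecies.append(
            {
                "sc_id": f"SC{i:03d}",
                "s_id": species[rec["name"]],
                "c_id": rec["compartment"],
            }
        )
    return species, cspecies


species, cspecies = split_species(records)
```

Here ATP appears once in the species table but twice in the compartmentalized table, once per compartment.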
- sbml_errors(reduced_log: bool = True, return_df: bool = False)
Formats and reports all errors found in the SBML file.
- Parameters:
reduced_log (bool, optional) – If True, aggregates errors by category and severity. Defaults to True.
return_df (bool, optional) – If True, returns a DataFrame of the errors. Otherwise, prints a styled summary. Defaults to False.
- Returns:
A DataFrame containing the error log if return_df is True and errors are present, otherwise None.
- Return type:
pd.DataFrame or None
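The effect of `reduced_log=True`, aggregating errors by category and severity, can be sketched with a `Counter`. The error records and the `reduce_log` helper below are hypothetical; the real log is produced by libsbml and its exact fields are not shown in this reference.

```python
from collections import Counter

# Hypothetical error records standing in for the libsbml error log.
errors = [
    {"category": "Units", "severity": "Warning", "message": "undeclared units"},
    {"category": "Units", "severity": "Warning", "message": "undeclared units"},
    {"category": "Identifier", "severity": "Error", "message": "duplicate id"},
]


def reduce_log(records):
    """Count errors per (category, severity) pair."""
    return Counter((r["category"], r["severity"]) for r in records)


reduced = reduce_log(errors)
for (category, severity), count in sorted(reduced.items()):
    print(f"{category:<12} {severity:<8} {count}")
```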
- summary() DataFrame
Generates a styled summary of the SBML model.
- Returns:
A styled pandas DataFrame containing a summary of the model, including pathway name, ID, and counts of species and reactions.
- Return type:
pd.io.formats.style.Styler
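A typical way to use the SBML class, based only on the signatures documented above, might look like the following. The `inspect_sbml` helper and the file path are hypothetical, and the import is deferred so that the helper can be defined without napistu installed.

```python
def inspect_sbml(sbml_path: str):
    """Load an SBML file and return its summary and error log."""
    # Imported lazily so that defining this helper does not require napistu.
    from napistu.ingestion.sbml import SBML

    # Per the reference above, this raises ValueError if the model is not
    # Level 3 or if critical, unknown errors are found during parsing.
    sbml_model = SBML(sbml_path, verbose=True)

    summary = sbml_model.summary()
    # With return_df=True the error log is returned instead of printed;
    # it is None when there are no errors.
    errors = sbml_model.sbml_errors(reduced_log=True, return_df=True)
    return summary, errors


# Example call (the path is a placeholder):
# summary, errors = inspect_sbml("path/to/model.sbml")
```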
- class napistu.ingestion.sbml.SBML_reaction(sbml_reaction: libsbml.Reaction)
Bases: object
A convenience class for processing individual SBML reactions.
This class extracts and organizes key information about an SBML reaction, including its attributes and participating species (substrates, products, and modifiers).
- Parameters:
sbml_reaction (libsbml.Reaction) – A libsbml Reaction object to be processed.
- reaction_dict
A dictionary of reaction-level attributes, including its ID, name, reversibility, identifiers, and source information.
- Type:
dict
- species
A DataFrame listing all species participating in the reaction, including their roles (substrate, product, modifier), stoichiometry, and SBO terms.
- Type:
pd.DataFrame
- __init__(sbml_reaction: libsbml.Reaction) None
Initializes the SBML_reaction object by parsing a libsbml Reaction.
- napistu.ingestion.sbml._cv_to_Identifiers(entity: libsbml.Species | libsbml.Reaction | libsbml.Compartment, strict: bool = False) Identifiers
Convert an SBML controlled vocabulary element into an Identifiers object.
- Parameters:
entity (libsbml.Species | libsbml.Reaction | libsbml.Compartment) – An entity (species, reaction, compartment, …) with attached CV terms.
strict (bool, default False) – If True, log full tracebacks for parsing failures. If False, use simple warning messages.
- Returns:
An Identifiers object containing the CV terms
- Return type:
Identifiers
- napistu.ingestion.sbml._define_compartments_missing_cvterms(comp: libsbml.Compartment, aliases: dict) dict[str, Any]
- napistu.ingestion.sbml._extract_gene_products(association: libsbml.Association) list[dict]
Recursively extracts gene products from an association tree.
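The recursive walk over an association tree can be sketched as follows. `GeneRef` and `LogicalNode` are hypothetical stand-ins for the libsbml association classes, so this shows the traversal pattern only and not the library's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRef:
    """Hypothetical leaf node: a reference to a single gene product."""
    gene_product: str


@dataclass
class LogicalNode:
    """Hypothetical internal node: an AND/OR over child associations."""
    operator: str
    children: list = field(default_factory=list)


def extract_gene_products(node) -> list:
    """Collect every gene product under `node`, depth first."""
    if isinstance(node, GeneRef):
        return [{"gene_product": node.gene_product}]
    products = []
    for child in node.children:
        products.extend(extract_gene_products(child))
    return products


# (G1) OR (G2 AND G3)
tree = LogicalNode("or", [GeneRef("G1"), LogicalNode("and", [GeneRef("G2"), GeneRef("G3")])])
products = extract_gene_products(tree)
```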
- napistu.ingestion.sbml._get_biological_qualifier_codes() dict
Lazily build the libsbml integer to BQB string mapping.
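One common way to build such a mapping lazily is to cache the result of a zero-argument function. In the sketch below, `fake_libsbml` is a hypothetical stand-in for the libsbml module and its integer values are placeholders, so the example runs without libsbml installed.

```python
from functools import lru_cache
from types import SimpleNamespace

# Hypothetical stand-in for the libsbml module; the integer values are placeholders.
fake_libsbml = SimpleNamespace(BQB_IS=0, BQB_HAS_PART=1, BQB_IS_PART_OF=2)


@lru_cache(maxsize=1)
def get_qualifier_codes() -> dict:
    """Build the integer -> qualifier-name mapping on first call, then reuse it."""
    return {
        value: name
        for name, value in vars(fake_libsbml).items()
        if name.startswith("BQB_")
    }


codes = get_qualifier_codes()
```

Because the result is cached, later calls return the same dictionary object without rebuilding it.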
- napistu.ingestion.sbml._get_gene_product_dict(gp)
Extracts attributes of a gene product from an SBML reaction object.
- Parameters:
gp (libsbml.GeneProduct) – A libsbml GeneProduct object.
- Returns:
A dictionary containing the gene product’s ID, name, and identifiers.
- Return type:
dict
- napistu.ingestion.sbml._libsbml()
Import libsbml or raise ImportError with install hint (pip install napistu[etl]).
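The deferred-import pattern described here can be sketched generically with `importlib`. `require_module` is a hypothetical helper, not part of napistu, and the module names used below are chosen only for demonstration.

```python
import importlib


def require_module(name: str, install_hint: str):
    """Import `name`, or raise ImportError that tells the user how to install it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name} is required for this feature; install it with `{install_hint}`"
        ) from exc


# A module that exists is returned as usual.
json_module = require_module("json", "pip install example")

# A missing module produces an error message carrying the install hint.
try:
    require_module("definitely_not_installed_xyz", "pip install napistu[etl]")
except ImportError as exc:
    hint_message = str(exc)
```

Deferring the import this way keeps the optional dependency out of module import time, so the rest of the package stays usable without it.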
- napistu.ingestion.sbml._refine_compartments(compartments_df, compartmentalized_species_df)
Refine compartments to only those actually used by compartmentalized species.
This function filters the compartments DataFrame to include only compartments that are referenced by compartmentalized species, and validates that all required compartments exist.
- Parameters:
compartments_df (pd.DataFrame) – DataFrame of all extracted compartments with c_id as index
compartmentalized_species_df (pd.DataFrame) – DataFrame of compartmentalized species with c_id column
- Returns:
Filtered compartments DataFrame containing only used compartments
- Return type:
pd.DataFrame
- Raises:
ValueError – If compartmentalized species reference compartments that don’t exist
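The filter-and-validate logic can be sketched with plain Python containers in place of DataFrames. `refine_compartments` below is a hypothetical simplification: compartments are a dict keyed by c_id and compartmentalized species are a list of rows with a c_id field.

```python
def refine_compartments(compartments: dict, cspecies: list) -> dict:
    """Keep only compartments that are referenced by compartmentalized species."""
    used = {row["c_id"] for row in cspecies}

    # Every referenced compartment must be defined.
    missing = used - set(compartments)
    if missing:
        raise ValueError(
            f"compartmentalized species reference undefined compartments: {sorted(missing)}"
        )

    return {c_id: info for c_id, info in compartments.items() if c_id in used}


compartments = {
    "C1": {"name": "cytosol"},
    "C2": {"name": "nucleus"},
    "C3": {"name": "lysosome"},
}
cspecies = [
    {"sc_id": "SC1", "c_id": "C1"},
    {"sc_id": "SC2", "c_id": "C2"},
]

refined = refine_compartments(compartments, cspecies)
```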
- napistu.ingestion.sbml._validate_species_consistency(species_df, compartmentalized_species_df)
Validate consistency between species and compartmentalized species tables.
- Parameters:
species_df (pd.DataFrame) – DataFrame of species with s_id as index
compartmentalized_species_df (pd.DataFrame) – DataFrame of compartmentalized species with s_id column
- Raises:
ValueError – If there are inconsistencies between the two tables
- napistu.ingestion.sbml.sbml_dfs_from_sbml(self, sbml_model: SBML, compartment_aliases: dict | None = None, verbose: bool = False)
Parses an SBML model into a set of standardized DataFrames.
This function serves as the main entry point for converting an SBML model into the internal DataFrame-based representation used by napistu. It orchestrates the processing of compartments, species, and reactions.
- Parameters:
self (object) – The instance of the calling class, expected to have a schema attribute.
sbml_model (SBML) – The SBML model to be parsed.
compartment_aliases (dict, optional) – A dictionary to map custom compartment names to the napistu controlled vocabulary. If None, the default mapping (COMPARTMENT_ALIASES) is used. Defaults to None.
verbose (bool) – If True, enables extra reporting. Defaults to False.
- Returns:
The calling class instance, now populated with DataFrames for compartments, species, compartmentalized_species, reactions, and reaction_species.
- Return type:
object