napistu.sbml_dfs_core

The core SBML DataFrame class for representing an SBML model as a collection of pandas DataFrames.

Classes

SBML_dfs

A class representing an SBML model as a collection of pandas DataFrames.

Classes

SBML_dfs(sbml_model, model_source[, ...])

Systems Biology Markup Language Model Data Frames.

class napistu.sbml_dfs_core.SBML_dfs(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False)

Bases: object

Systems Biology Markup Language Model Data Frames.

A class representing an SBML model as a collection of pandas DataFrames. This class provides methods for manipulating and analyzing biological pathway models with support for species, reactions, compartments, and their relationships.

compartments

Sub-cellular compartments in the model, indexed by compartment ID (c_id)

Type:

pd.DataFrame

species

Molecular species in the model, indexed by species ID (s_id)

Type:

pd.DataFrame

species_data

Additional data for species. Each DataFrame is indexed by species_id (s_id)

Type:

Dict[str, pd.DataFrame]

reactions

Reactions in the model, indexed by reaction ID (r_id)

Type:

pd.DataFrame

reactions_data

Additional data for reactions. Each DataFrame is indexed by reaction_id (r_id)

Type:

Dict[str, pd.DataFrame]

reaction_species

One entry per species participating in a reaction, indexed by reaction-species ID (rsc_id)

Type:

pd.DataFrame

schema

Dictionary representing the structure of the other attributes and meaning of their variables

Type:

dict

Public Methods (alphabetical)
----------------------------
add_reactions_data(label, data)

Add a new reactions data table to the model with validation.

add_species_data(label, data)

Add a new species data table to the model with validation.

copy

Return a deep copy of the SBML_dfs object.

export_sbml_dfs(model_prefix, outdir, overwrite=False, dogmatic=True)

Export the SBML_dfs model and its tables to files in a specified directory.

find_entity_references(entity_type, entity_ids)

Find all entities that directly depend on the set of requested entities.

from_edgelist(interaction_edgelist, species_df, compartments_df, model_source, interaction_edgelist_defaults=INTERACTION_EDGELIST_DEFAULTS, keep_species_data=False, keep_reactions_data=False, require_edgelist_consistency=False)

Create SBML_dfs from interaction edgelist.

from_pickle(path)

Load an SBML_dfs from a pickle file.

get_characteristic_species_ids(dogmatic=True)

Return characteristic systematic identifiers for molecular species, optionally using a strict or loose definition.

get_cspecies_features

Compute and return additional features for compartmentalized species, such as degree and type.

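get_identifiers(id_type, filter_by_bqb=None, add_names=True, keep_source=False)

Retrieve a table of identifiers for a specified entity type (e.g., species or reactions).

get_ontology_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True)

Get ontology co-occurrence matrix for a specific entity type.

get_ontology_occurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True, include_missing=False, binarize=False)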

Get ontology occurrence summary for a specific entity type.

get_ontology_x_source_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)

Get ontology × source co-occurrence matrix for a specific entity type.

get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False)

Get the occurrence of SBO terms for reactions.

get_sbo_term_x_source_cooccurrence(name_terms=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)

Get SBO term × source co-occurrence matrix for reactions.

get_source_cooccurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)

Get pathway co-occurrence matrix for a specific entity type.

get_source_occurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)

Get pathway occurrence summary for a specific entity type.

get_sources(entity_type)

Get the unnested sources table for a given entity type.

get_source_total_counts(entity_type)

Get the total counts of each source for a given entity type.

get_species_features

Compute and return additional features for species, such as species type.

get_summary

Return a dictionary of diagnostic statistics summarizing the SBML_dfs structure.

get_table(entity_type, required_attributes=None)

Retrieve a table for a given entity type, optionally validating required attributes.

get_uri_urls(entity_type, entity_ids=None, required_ontology=None)

Return reference URLs for specified entities, optionally filtered by ontology.

infer_sbo_terms

Infer and fill in missing SBO terms for reaction species based on stoichiometry.

infer_uncompartmentalized_species_location

Infer and assign compartments for compartmentalized species with missing compartment information.

name_compartmentalized_species

Rename compartmentalized species to include compartment information if needed.

post_consensus_checks(entity_types=[SBML_DFS.SPECIES, SBML_DFS.COMPARTMENTS], check_types=[CONSENSUS_CHECKS.SOURCE_COOCCURRENCE, CONSENSUS_CHECKS.ONTOLOGY_X_SOURCE_COOCCURRENCE])

Perform checks on the SBML_dfs object after consensus building.

reaction_formulas(r_ids=None)

Generate human-readable reaction formulas for specified reactions.

reaction_summaries(r_ids=None)

Return a summary DataFrame for specified reactions, including names and formulas.

remove_entities(entity_type, entity_ids, remove_species=False)

Remove specified entities and optionally remove unused species.

remove_reactions_data(label)

Remove a reactions data table by label.

remove_species_data(label)

Remove a species data table by label.

remove_unused

Find and remove unused entities from the model with cascading cleanup.

search_by_ids(id_table, identifiers=None, ontologies=None, bqbs=None)

Find entities and identifiers matching a set of query IDs.

search_by_name(name, entity_type, partial_match=True)

Find entities by exact or partial name match.

select_species_data(species_data_table)

Select a species data table from the SBML_dfs object by name.

show_summary

Display a formatted summary of the SBML_dfs model.

species_status(s_id)

Return all reactions a species participates in, with stoichiometry and formula information.

to_dict

Return the 5 major SBML_dfs tables as a dictionary.

to_pickle(path)

Save the SBML_dfs to a pickle file.

validate

Validate the SBML_dfs structure and relationships.

validate_and_resolve

Validate and attempt to automatically fix common issues.

Private/Hidden Methods (alphabetical, appear after public methods)
-----------------------------------------------------------------
_attempt_resolve(e)
_edgelist_assemble_sbml_model(compartments, species, comp_species, reactions, reaction_species, species_data, reactions_data, keep_species_data, keep_reactions_data, extra_columns)
_find_invalid_entities_by_reference(entity_type, reference_type, reference_ids)
_find_underspecified_reactions_by_reference(reference_type, reference_ids)
_get_entity_data(entity_type, label)
_get_identifiers_table_for_ontology_occurrence(entity_type, characteristic_only=False, dogmatic=True)
_get_non_interactor_reactions
_remove_entities_direct(entity_type, entity_ids)
_remove_entity_data(entity_type, label)
_validate_entity_data_access(entity_type, label)
_validate_identifiers
_validate_pk_fk_correspondence
_validate_r_ids(r_ids)
_validate_reaction_species
_validate_reactions_data(reactions_data_table)
_validate_sources
_validate_species_data(species_data_table)
_validate_table(table_name)
classmethod _edgelist_assemble_sbml_model(compartments: DataFrame, species: DataFrame, comp_species: DataFrame, reactions: DataFrame, reaction_species: DataFrame, species_data: DataFrame, reactions_data: DataFrame, keep_species_data: bool | str, keep_reactions_data: bool | str, extra_columns: dict[str, list[str]], model_source: Source) SBML_dfs

Assemble the final SBML_dfs object.

Parameters:
  • compartments (pd.DataFrame) – Processed compartments data

  • species (pd.DataFrame) – Processed species data

  • comp_species (pd.DataFrame) – Compartmentalized species data

  • reactions (pd.DataFrame) – Reactions data

  • reaction_species (pd.DataFrame) – Reaction species relationships

  • species_data (pd.DataFrame) – Extra species data to include

  • reactions_data (pd.DataFrame) – Extra reactions data to include

  • keep_species_data (bool or str) – Label for species extra data

  • keep_reactions_data (bool or str) – Label for reactions extra data

  • extra_columns (dict) – Dictionary containing lists of extra column names

Returns:

Validated SBML data structure

Return type:

SBML_dfs

classmethod from_edgelist(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, model_source: Source, interaction_edgelist_defaults: dict[str, Any] = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}, keep_species_data: bool | str = False, keep_reactions_data: bool | str = False, require_edgelist_consistency: bool = False) SBML_dfs

Create SBML_dfs from interaction edgelist.

Combines a set of molecular interactions into a mechanistic SBML_dfs model by processing interaction data, species information, and compartment definitions.

Parameters:
  • interaction_edgelist (pd.DataFrame) – Table containing molecular interactions with columns:

      - name_upstream : str, matches “s_name” from species_df
      - name_downstream : str, matches “s_name” from species_df
      - r_name : str, name for the interaction
      - r_Identifiers : Identifiers, supporting identifiers
      - compartment_upstream : str, matches “c_name” from compartments_df
      - compartment_downstream : str, matches “c_name” from compartments_df
      - sbo_term_name_upstream : str, SBO term defining interaction type
      - sbo_term_name_downstream : str, SBO term defining interaction type
      - stoichiometry_upstream : float, stoichiometry of upstream species
      - stoichiometry_downstream : float, stoichiometry of downstream species
      - r_isreversible : bool, whether reaction is reversible

  • species_df (pd.DataFrame) – Table defining molecular species with columns:

      - s_name : str, name of molecular species
      - s_Identifiers : Identifiers, species identifiers

  • compartments_df (pd.DataFrame) – Table defining compartments with columns:

      - c_name : str, name of compartment
      - c_Identifiers : Identifiers, compartment identifiers

  • model_source (Source) – Source annotations for the data source

  • interaction_edgelist_defaults (dict[str, Any], default INTERACTION_EDGELIST_DEFAULTS) – Default values for interaction edgelist columns

  • keep_species_data (bool or str, default False) – Whether to preserve extra species columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.

  • keep_reactions_data (bool or str, default False) – Whether to preserve extra reaction columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.

  • require_edgelist_consistency (bool, default False) – Whether to require the edgelist to be consistent with the species and compartments dataframes. Leaving this False is useful when there may be reasonable departures between the edgelist and the species and compartments dataframes but the user wants to create an SBML_dfs model anyway.

Returns:

Validated SBML data structure containing compartments, species, compartmentalized species, reactions, and reaction species tables.

Return type:

SBML_dfs
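The role of interaction_edgelist_defaults can be pictured with a small, library-free sketch: any column missing from an edgelist row is filled from the defaults mapping, and values already present are left alone. The helper name apply_edgelist_defaults and the row-as-dict layout are assumptions made for illustration, not napistu's implementation; only the default values are copied from the signature above.

```python
# Default values copied from the from_edgelist signature shown above.
INTERACTION_EDGELIST_DEFAULTS = {
    "compartment_downstream": "cellular_component",
    "compartment_upstream": "cellular_component",
    "r_isreversible": False,
    "sbo_term_name_downstream": "modified",
    "sbo_term_name_upstream": "modifier",
    "stoichiometry_downstream": 0,
    "stoichiometry_upstream": 0,
}


def apply_edgelist_defaults(rows, defaults=INTERACTION_EDGELIST_DEFAULTS):
    """Hypothetical helper: fill missing edgelist columns from defaults.

    Each row is a plain dict standing in for one row of the edgelist table.
    Values already present in a row are never overwritten, and the input
    rows are not modified.
    """
    return [{**defaults, **row} for row in rows]


# Toy edgelist: the second row overrides one default explicitly.
edgelist = [
    {"name_upstream": "A", "name_downstream": "B", "r_name": "A -> B"},
    {
        "name_upstream": "B",
        "name_downstream": "C",
        "r_name": "B -> C",
        "r_isreversible": True,
    },
]

filled = apply_edgelist_defaults(edgelist)
```

In this sketch the first row picks up every default, while the second keeps its explicit r_isreversible value.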

classmethod from_pickle(path: str) SBML_dfs

Load an SBML_dfs from a pickle file.

Parameters:

path (str) – Path to the pickle file

Returns:

The loaded SBML_dfs object

Return type:

SBML_dfs

__init__(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False) None

Initialize an SBML_dfs object from an SBML model or dictionary of tables.

Parameters:
  • sbml_model (Union[sbml.SBML, MutableMapping[str, Union[pd.DataFrame, Dict[str, pd.DataFrame]]]]) – Either an SBML model produced by sbml.SBML() or a dictionary containing tables following the sbml_dfs schema

  • validate (bool, optional) – Whether to validate the model structure and relationships, by default True

  • resolve (bool, optional) – Whether to attempt automatic resolution of common issues, by default True

  • verbose (bool, optional) – Whether to enable extra reporting, by default False

Raises:

ValueError – If the model structure is invalid and cannot be resolved

_attempt_resolve(e)
_find_invalid_entities_by_reference(entity_type: str, reference_type: str, reference_ids: set[str]) set[str]

Find and return orphaned entities based on broken foreign key references.

Parameters:
  • entity_type (str) – The entity type to check for orphans (the table with primary keys)

  • reference_type (str) – The type of foreign key reference to check

  • reference_ids (set[str]) – Specific reference IDs that were removed

Returns:

Set of primary keys that are orphaned and should be removed

Return type:

set[str]

_find_underspecified_reactions_by_reference(reference_type: str, reference_ids: set[str]) set[str]

Find reactions that would become underspecified after removing species.

Parameters:
  • reference_type (str) – The type of foreign key reference to check

  • reference_ids (set[str]) – Specific reference IDs that were removed

Returns:

  • set[str] – Set of reaction IDs that were orphaned and removed

  • set[str] – Set of reaction-species IDs that were orphaned and removed

_get_data_summary()

Summarize the data tables in the SBML_dfs object

_get_entity_data(entity_type: str, label: str) DataFrame

Get data from species_data or reactions_data by table name and label.

Parameters:
  • entity_type (str) – Name of the table to get data from (‘species’ or ‘reactions’)

  • label (str) – Label of the data to retrieve

Returns:

The requested data as a DataFrame

Return type:

pd.DataFrame

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist

_get_identifiers_table_for_ontology_occurrence(entity_type: str, characteristic_only: bool = False, dogmatic: bool = True) DataFrame

Get the appropriate identifiers table for ontology analysis.

This method handles the common logic for determining which identifiers table to use based on the characteristic_only and dogmatic parameters.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False

  • dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True

Returns:

The appropriate identifiers table for ontology analysis

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid

_get_non_interactor_reactions() DataFrame

Get reactions table filtered to exclude reactions that are all interactors.

Returns:

Reactions table with non-interactor reactions only

Return type:

pd.DataFrame
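One way to picture "reactions that are all interactors" is the plain-Python sketch below. It assumes each reaction species carries an SBO term name and that the interactor role is labelled "interactor". The tuple layout and the helper name non_interactor_reaction_ids are illustrative and not taken from napistu.

```python
def non_interactor_reaction_ids(reaction_species):
    """Return IDs of reactions with at least one non-interactor participant.

    reaction_species is an iterable of (r_id, sbo_term_name) tuples.
    Assumption: the interactor role is named "interactor".
    """
    roles_by_reaction = {}
    for r_id, sbo_term_name in reaction_species:
        roles_by_reaction.setdefault(r_id, set()).add(sbo_term_name)

    # A reaction is dropped only when "interactor" is its sole role.
    return {
        r_id
        for r_id, roles in roles_by_reaction.items()
        if roles != {"interactor"}
    }


toy_reaction_species = [
    ("R1", "interactor"),
    ("R1", "interactor"),
    ("R2", "reactant"),
    ("R2", "product"),
    ("R3", "interactor"),
    ("R3", "modifier"),
]

kept = non_interactor_reaction_ids(toy_reaction_species)
```

Here R1 is dropped because every participant is an interactor, while R3 is kept because it also has a modifier.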

_remove_entities_direct(entity_type: str, entity_ids: list[str])

Directly remove entities without cascading cleanup.

Parameters:
  • entity_type (str) – The entity type to remove

  • entity_ids (list[str]) – IDs of entities to remove

_remove_entity_data(entity_type: str, label: str) None

Remove data from species_data or reactions_data by table name and label.

Parameters:
  • entity_type (str) – Name of the table to remove data from (‘species’ or ‘reactions’)

  • label (str) – Label of the data to remove

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist

_validate_entity_data_access(entity_type: str, label: str) MutableMapping[str, DataFrame] | None

Validate entity type and label, and return the data dictionary if valid.

Parameters:
  • entity_type (str) – Name of the table to access (‘species’ or ‘reactions’)

  • label (str) – Label of the data to access

Returns:

The data dictionary if entity_type and label are valid

Return type:

MutableMapping[str, pd.DataFrame]

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist
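The two checks described above (a supported entity type, then an existing label) can be sketched without pandas. The function below is a hypothetical stand-in, not the library's code: it takes the data dictionaries as an explicit argument, and each "table" is just a dict.

```python
def validate_entity_data_access(entity_data, entity_type, label):
    """Hypothetical stand-in for the checks described above.

    entity_data maps "species" / "reactions" to a {label: table} dict,
    loosely mirroring the species_data and reactions_data attributes.
    Returns the matching {label: table} dict when both checks pass.
    """
    if entity_type not in ("species", "reactions"):
        raise ValueError(
            f"entity_type must be 'species' or 'reactions', got {entity_type!r}"
        )
    data_dict = entity_data[entity_type]
    if label not in data_dict:
        raise ValueError(f"label {label!r} not found in {entity_type}_data")
    return data_dict


# Toy stand-ins: each "table" is just a dict keyed by entity ID.
toy_entity_data = {
    "species": {"expression": {"S1": 2.5, "S2": 0.1}},
    "reactions": {},
}

species_tables = validate_entity_data_access(
    toy_entity_data, "species", "expression"
)
expression = species_tables["expression"]
```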

_validate_identifiers()

Validate identifiers in the model

Iterates through all tables and checks if the identifier columns are valid.

Raises:

ValueError – missing identifiers in the table

_validate_pk_fk_correspondence()

Check bidirectional primary key and foreign key correspondence for all tables in the schema.

Validates:

  1. All foreign keys exist as primary keys (standard FK constraint)
  2. All primary keys are referenced as foreign keys (referential completeness)

Raises:

ValueError – If any FK constraint or referential completeness violations are found
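The bidirectional check amounts to two set differences per primary-key/foreign-key pair. A minimal sketch, using plain lists of IDs in place of DataFrame columns; the helper name and the toy IDs are assumptions for illustration only:

```python
def check_pk_fk_correspondence(primary_keys, foreign_keys):
    """Hypothetical helper comparing one PK column with one FK column.

    Returns two sets:
    - foreign keys with no matching primary key (FK constraint violations)
    - primary keys never used as a foreign key (completeness violations)
    """
    pk_set, fk_set = set(primary_keys), set(foreign_keys)
    return fk_set - pk_set, pk_set - fk_set


# Toy data: species IDs (primary keys) and the s_id values referenced
# by another table (foreign keys).
species_ids = ["S1", "S2", "S3"]
referenced_s_ids = ["S1", "S1", "S2", "S9"]

missing_pks, unreferenced_pks = check_pk_fk_correspondence(
    species_ids, referenced_s_ids
)

# Collect human-readable messages, as a validator might before raising.
problems = []
if missing_pks:
    problems.append(f"foreign keys without a primary key: {sorted(missing_pks)}")
if unreferenced_pks:
    problems.append(f"primary keys never referenced: {sorted(unreferenced_pks)}")
```

In the toy data, S9 violates the standard FK constraint and S3 violates referential completeness.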

_validate_r_ids(r_ids: str | list[str] | None) list[str]
_validate_reaction_species()
_validate_reactions_data(reactions_data_table: DataFrame)

Validates reactions data attribute

Parameters:

reactions_data_table (pd.DataFrame) – a reactions data table

Raises:

ValueError – If r_id is not the index name, the r_id index contains duplicates, or an r_id is not in the reactions table

_validate_sources()

Validate sources in the model

Iterates through all tables and checks if the source columns are valid.

Raises:

ValueError – missing sources in the table

_validate_species_data(species_data_table: DataFrame)

Validates species data attribute

Parameters:

species_data_table (pd.DataFrame) – a species data table

Raises:

ValueError – If s_id is not the index name, the s_id index contains duplicates, or an s_id is not in the species table
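The three failure conditions listed above can be expressed as a short sequence of checks. The sketch below avoids pandas: the index is passed as a name plus a list of values. The function names are hypothetical, and the error messages are not the library's.

```python
from collections import Counter


def validate_species_data(index_name, index_values, known_s_ids):
    """Hypothetical stand-in for the three checks listed above.

    index_name / index_values describe the index of a species data table;
    known_s_ids are the IDs present in the species table.
    """
    if index_name != "s_id":
        raise ValueError(f"index must be named 's_id', got {index_name!r}")

    duplicated = sorted(k for k, n in Counter(index_values).items() if n > 1)
    if duplicated:
        raise ValueError(f"s_id index contains duplicates: {duplicated}")

    unknown = sorted(set(index_values) - set(known_s_ids))
    if unknown:
        raise ValueError(f"s_id values not in species table: {unknown}")


def is_valid(index_name, index_values, known_s_ids):
    """Convenience wrapper returning True/False instead of raising."""
    try:
        validate_species_data(index_name, index_values, known_s_ids)
    except ValueError:
        return False
    return True


species_table_ids = {"S1", "S2", "S3"}
```

The same pattern would apply to reactions data with r_id in place of s_id.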

_validate_table(table_name: str) None

Validate a table in this SBML_dfs object against its schema.

This is an internal method that validates a table that is part of this SBML_dfs object against the schema stored in self.schema.

Parameters:

table_name (str) – Name of the table to validate

Raises:

ValueError – If the table does not conform to its schema

add_reactions_data(label: str, data: DataFrame)

Add additional reaction data with validation.

Parameters:
  • label (str) – Label for the new data

  • data (pd.DataFrame) – Data to add, must be indexed by reaction_id

Raises:

ValueError – If the data is invalid or label already exists

add_species_data(label: str, data: DataFrame)

Add additional species data with validation.

Parameters:
  • label (str) – Label for the new data

  • data (pd.DataFrame) – Data to add, must be indexed by species_id

Raises:

ValueError – If the data is invalid or label already exists

copy()

Return a deep copy of the SBML_dfs object.

Returns:

A deep copy of the current SBML_dfs object.

Return type:

SBML_dfs

export_sbml_dfs(model_prefix: str, outdir: str, overwrite: bool = False, dogmatic: bool = True) None

Export SBML_dfs

Export summaries of species identifiers and each table underlying an SBML_dfs pathway model

Parameters:
  • model_prefix (str) – Label to prepend to all exported files

  • outdir (str) – Path to an existing directory where results should be saved

  • overwrite (bool) – Should the directory be overwritten if it already exists?

  • dogmatic (bool) – If True then treat genes, transcripts, and proteins as separate species. If False then treat them interchangeably.

Return type:

None

find_entity_references(entity_type: str, entity_ids: list[str]) dict[str, set[str]]

Find all entities that directly depend on the set of requested entities.

Parameters:
  • entity_type (str) – The initial entity type to remove

  • entity_ids (list[str]) – IDs of entities to remove

Returns:

Dictionary mapping entity types to sets of IDs that directly depend on the requested entities

Return type:

dict[str, set[str]]
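Finding direct dependents is essentially a lookup over foreign keys: for every table with a foreign key pointing at the requested entity type, collect the rows that reference one of the requested IDs. The sketch below uses nested dicts instead of DataFrames. The table layout, the foreign-key map, and the helper name are assumptions for illustration and may not match napistu's schema.

```python
def find_direct_dependents(tables, foreign_keys, entity_type, entity_ids):
    """Hypothetical helper returning {table_name: set of dependent PKs}.

    tables:       {table_name: {primary_key: {fk_column: referenced_id}}}
    foreign_keys: {table_name: {fk_column: referenced_table}}
    """
    targets = set(entity_ids)
    dependents = {}
    for table_name, fk_columns in foreign_keys.items():
        for fk_column, referenced_table in fk_columns.items():
            if referenced_table != entity_type:
                continue  # this FK points at a different entity type
            hits = {
                pk
                for pk, row in tables[table_name].items()
                if row[fk_column] in targets
            }
            if hits:
                dependents.setdefault(table_name, set()).update(hits)
    return dependents


# Toy schema (assumed): compartmentalized species reference species and
# compartments; reaction species reference reactions and
# compartmentalized species.
toy_tables = {
    "compartmentalized_species": {
        "SC1": {"s_id": "S1", "c_id": "C1"},
        "SC2": {"s_id": "S2", "c_id": "C1"},
    },
    "reaction_species": {
        "RSC1": {"r_id": "R1", "sc_id": "SC1"},
    },
}
toy_foreign_keys = {
    "compartmentalized_species": {"s_id": "species", "c_id": "compartments"},
    "reaction_species": {"r_id": "reactions", "sc_id": "compartmentalized_species"},
}

deps_of_s1 = find_direct_dependents(toy_tables, toy_foreign_keys, "species", ["S1"])
deps_of_c1 = find_direct_dependents(toy_tables, toy_foreign_keys, "compartments", ["C1"])
```

Only direct dependents are returned: RSC1 depends on SC1, not on S1, so it does not appear in deps_of_s1.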

get_characteristic_species_ids(dogmatic: bool = True) DataFrame

Get Characteristic Species IDs

List the systematic identifiers which are characteristic of molecular species, e.g., excluding subcomponents, and optionally, treating proteins, transcripts, and genes equivalently.

Characteristic identifiers include:

  - the defining IDs: BQB_IS if dogmatic is True; BQB_IS, BQB_IS_ENCODED_BY, and BQB_ENCODES if dogmatic is False
  - small complexes (BQB_HAS_PART)

This function is useful for pulling out the species which are closely associated with specific proteins, metabolites, etc.

Parameters:

dogmatic (bool, default=True) – Whether to use the dogmatic flag to determine which BQB attributes are valid.

Returns:

A DataFrame containing the systematic identifiers which are characteristic of molecular species.

Return type:

pd.DataFrame
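The effect of the dogmatic flag on which qualifiers count as defining can be sketched as below. The qualifier strings, the record layout, and the placeholder identifiers are assumptions for illustration. The small-complex rule for BQB_HAS_PART is left out because its size cutoff is not stated here.

```python
# Qualifier names as they appear in the text above; the string values
# used here are placeholders, not napistu's constants.
BQB_IS = "BQB_IS"
BQB_IS_ENCODED_BY = "BQB_IS_ENCODED_BY"
BQB_ENCODES = "BQB_ENCODES"


def defining_qualifiers(dogmatic=True):
    """Return the qualifiers treated as defining under each setting."""
    if dogmatic:
        return {BQB_IS}
    return {BQB_IS, BQB_IS_ENCODED_BY, BQB_ENCODES}


def filter_defining(identifiers, dogmatic=True):
    """Keep identifier records whose qualifier is defining.

    Each record is a (species_id, qualifier, identifier) tuple; this layout
    is an assumption made for the example.
    """
    allowed = defining_qualifiers(dogmatic)
    return [record for record in identifiers if record[1] in allowed]


# Placeholder records for one species.
records = [
    ("S1", BQB_IS, "prot_1"),
    ("S1", BQB_IS_ENCODED_BY, "gene_1"),
    ("S1", "BQB_HAS_VERSION", "variant_1"),  # not defining in this sketch
]

strict = filter_defining(records, dogmatic=True)
loose = filter_defining(records, dogmatic=False)
```

With dogmatic=True only the protein-level identifier survives. With dogmatic=False the gene-level identifier is kept as well, which is what lets genes, transcripts, and proteins be matched to each other.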

get_cspecies_features() DataFrame

Get additional attributes of compartmentalized species.

Returns:

Compartmentalized species with additional features including:

  - sc_degree: Number of reactions the species participates in
  - sc_children: Number of reactions where species is consumed
  - sc_parents: Number of reactions where species is produced
  - species_type: Classification of the species

Return type:

pd.DataFrame
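The three count-based features can be illustrated with a small sketch over (reaction, compartmentalized species, stoichiometry) tuples. It assumes a sign convention that is not stated on this page: negative stoichiometry means consumed, positive means produced. The helper name and tuple layout are hypothetical as well.

```python
from collections import defaultdict


def cspecies_degree_features(reaction_species):
    """Hypothetical helper counting reactions per compartmentalized species.

    reaction_species is an iterable of (r_id, sc_id, stoichiometry) tuples.
    Assumption for this sketch: negative stoichiometry means the species is
    consumed, positive means it is produced, and zero means neither.
    """
    participates = defaultdict(set)
    consumed = defaultdict(set)
    produced = defaultdict(set)

    for r_id, sc_id, stoichiometry in reaction_species:
        participates[sc_id].add(r_id)
        if stoichiometry < 0:
            consumed[sc_id].add(r_id)
        elif stoichiometry > 0:
            produced[sc_id].add(r_id)

    return {
        sc_id: {
            "sc_degree": len(r_ids),
            "sc_children": len(consumed[sc_id]),
            "sc_parents": len(produced[sc_id]),
        }
        for sc_id, r_ids in participates.items()
    }


toy_reaction_species = [
    ("R1", "SC_a", -1),
    ("R1", "SC_b", 1),
    ("R2", "SC_b", -1),
    ("R2", "SC_c", 1),
    ("R3", "SC_b", 0),
]

features = cspecies_degree_features(toy_reaction_species)
```

In the toy data SC_b takes part in three reactions but is consumed in one and produced in one, since its zero-stoichiometry role in R3 counts toward degree only.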

get_identifiers(id_type, filter_by_bqb=None, add_names=True, keep_source=False) DataFrame

Get identifiers from a specified entity type.

Parameters:
  • id_type (str) – Type of entity to get identifiers for (e.g., ‘species’, ‘reactions’)

  • filter_by_bqb (None, list, or str, optional) – Filter identifiers by biological qualifier (BQB) terms:

      - None: No filtering, return all identifiers (default)
      - list: List of specific BQB terms to include
      - “defining”: Use BQB_DEFINING_ATTRS (strict defining identifiers)
      - “loose”: Use BQB_DEFINING_ATTRS_LOOSE (includes encoded/encodes relationships)

  • add_names (bool, optional) – Whether to add entity names and other metadata from the entity table, by default True

  • keep_source (bool, optional) – Whether to include the source column in the output, by default False. Only applies when add_names=True. The source column is excluded by default as it’s typically not needed for identifier lookups.

Returns:

Table of identifiers for the specified entity type. If add_names=True, includes entity metadata; if add_names=False, returns only the core identifier data.

Return type:

pd.DataFrame

Raises:

ValueError – If id_type is invalid, identifiers are malformed, or filter_by_bqb is invalid

get_ontology_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True) DataFrame

Get ontology co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing which ontologies share entities of the specified type, indicating ontology relationships and overlaps.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True

  • allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False

  • characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False

  • dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True

Returns:

Co-occurrence matrix with ontologies as both rows and columns

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid or identifiers are malformed
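A co-occurrence matrix of this kind counts, for each pair of ontologies, the entities annotated with both. The sketch below builds one from a mapping of entity IDs to ontology names using nested dicts. The helper name, the input layout, and the toy ontology names are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def cooccurrence(entity_to_labels):
    """Count, for every pair of labels, the entities carrying both.

    entity_to_labels maps an entity ID to the set of labels (ontologies in
    this example) attached to it. The result is a symmetric nested dict
    whose diagonal holds the number of entities per label.
    """
    labels = sorted(set().union(*entity_to_labels.values()))
    matrix = {a: {b: 0 for b in labels} for a in labels}
    for entity_labels in entity_to_labels.values():
        for a, b in combinations_with_replacement(sorted(entity_labels), 2):
            matrix[a][b] += 1
            if a != b:
                matrix[b][a] += 1  # keep the matrix symmetric
    return matrix


species_ontologies = {
    "S1": {"uniprot", "ensembl_gene"},
    "S2": {"uniprot"},
    "S3": {"chebi"},
}

matrix = cooccurrence(species_ontologies)
```

The same counting scheme applies to pathway co-occurrence if the labels are sources rather than ontologies.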

get_ontology_occurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, include_missing: bool = False, binarize: bool = False) DataFrame

Get ontology occurrence summary for a specific entity type.

This method analyzes which ontologies are associated with entities of the specified type, providing a summary of ontology occurrence patterns.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True

  • allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False

  • characteristic_only (bool, optional) – Whether to only include characteristic identifiers. Only supported for species. If True, returns only the characteristic identifiers (BQB_IS, and small complex BQB_HAS_PART annotations); if False, returns all identifiers.

  • dogmatic (bool, optional) – Whether to use a strict or loose definition of characteristic identifiers. Only applicable if characteristic_only is True and entity_type is SBML_DFS.SPECIES.

  • include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False

  • binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False

Returns:

Summary of ontology occurrence patterns with entities as rows and ontologies as columns. If binarize=True, values are 0 or 1.

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid or identifiers are malformed
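The shape of the result (entities as rows, ontologies as columns) and the effect of binarize can be sketched with nested dicts. The flat record layout and both helper names are assumptions for illustration, not the library's implementation.

```python
from collections import Counter


def occurrence_table(records):
    """Build an entities x ontologies table of identifier counts.

    records is an iterable of (entity_id, ontology) tuples, one per
    identifier; this flat layout is an assumption of the sketch.
    """
    counts = Counter(records)
    entities = sorted({entity for entity, _ in counts})
    ontologies = sorted({ontology for _, ontology in counts})
    return {
        entity: {ont: counts.get((entity, ont), 0) for ont in ontologies}
        for entity in entities
    }


def binarize(table):
    """Collapse counts to presence/absence: 0 stays 0, 1 or more becomes 1."""
    return {
        entity: {ont: int(n > 0) for ont, n in row.items()}
        for entity, row in table.items()
    }


toy_records = [
    ("S1", "uniprot"),
    ("S1", "uniprot"),
    ("S1", "chebi"),
    ("S2", "uniprot"),
]

counts_table = occurrence_table(toy_records)
binary_table = binarize(counts_table)
```

S1 has two uniprot identifiers, so its count is 2 before binarizing and 1 afterwards.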

get_ontology_x_source_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame

Get ontology × source co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing the relationship between ontologies and sources (pathways) by calculating how many entities of the specified type are shared between each ontology-source pair.

The method combines ontology occurrence data with source occurrence data to create a cross-tabulation matrix where:

  - Rows represent ontologies
  - Columns represent sources/pathways
  - Values represent the number of entities shared between each ontology-source pair

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms in ontology analysis, by default True

  • allow_col_multindex (bool, optional) – Whether to allow column multi-index in ontology analysis, by default False

  • characteristic_only (bool, optional) – Whether to use only characteristic identifiers in ontology analysis (only supported for species), by default False

  • dogmatic (bool, optional) – Whether to use dogmatic identifier filtering in ontology analysis, by default True

  • priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS

Returns:

Co-occurrence matrix with ontologies as rows and sources as columns. Values represent the number of entities shared between each ontology-source pair.

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid, identifiers are malformed, or source tables are empty

Examples

>>> # Get ontology × source co-occurrence for species
>>> cooccurrence_matrix = sbml_dfs.get_ontology_x_source_cooccurrence(SBML_DFS.SPECIES)
>>>
>>> # Use characteristic species only
>>> char_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, characteristic_only=True
... )
>>>
>>> # Custom pathway priority
>>> custom_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, priority_pathways=['reactome', 'kegg']
... )
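The cross-tabulation described for this method can be reproduced on toy data without napistu. Given binary occurrence matrices O (entities × ontologies) and S (entities × sources), the ontology × source counts equal the matrix product OᵀS. The sketch below computes the same quantity with plain dicts; the helper name and input layout are assumptions.

```python
from collections import Counter


def cross_cooccurrence(entity_to_rows, entity_to_cols):
    """Count entities shared by each (row label, column label) pair.

    Only entities present in both mappings contribute. Each entity adds 1
    to every (row label, column label) pair it carries.
    """
    counts = Counter()
    for entity, row_labels in entity_to_rows.items():
        for row in row_labels:
            for col in entity_to_cols.get(entity, ()):
                counts[(row, col)] += 1
    return counts


# Toy annotations: S3 has an ontology but no source, so it adds nothing.
species_ontologies = {
    "S1": {"uniprot"},
    "S2": {"uniprot", "chebi"},
    "S3": {"chebi"},
}
species_sources = {
    "S1": {"Reactome"},
    "S2": {"Reactome", "STRING"},
}

crosstab = cross_cooccurrence(species_ontologies, species_sources)
```
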
get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False) DataFrame

Get the occurrence of SBO terms for reactions.

Note: By default, reactions that consist entirely of interactor species are excluded from the analysis. This exclusion is mandatory for most of the other occurrence and co-occurrence methods.

Parameters:
  • name_terms (bool, optional) – Whether to name the SBO terms, by default True

  • include_interactor_reactions (bool, optional) – Whether to include interactor reactions, by default False

get_sbo_term_x_source_cooccurrence(name_terms: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame

Get SBO term × source co-occurrence matrix for reactions.

This method creates a co-occurrence matrix showing the relationship between SBO terms and sources (pathways) by calculating how many reactions are shared between each SBO term-source pair.

The method combines SBO term occurrence data with source occurrence data to create a cross-tabulation matrix where:

  - Rows represent SBO terms
  - Columns represent sources/pathways
  - Values represent the number of reactions shared between each SBO term-source pair

Note: Reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • name_terms (bool, optional) – Whether to name the SBO terms using human-readable names, by default True

  • priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS

Returns:

Co-occurrence matrix with SBO terms as rows and sources as columns. Values represent the number of reactions shared between each SBO term-source pair.

Return type:

pd.DataFrame

Raises:

ValueError – If source tables are empty

Examples

>>> # Get SBO term × source co-occurrence for reactions
>>> cooccurrence_matrix = sbml_dfs.get_sbo_term_x_source_cooccurrence()
>>>
>>> # Use numeric SBO term codes instead of names
>>> numeric_cooccurrence = sbml_dfs.get_sbo_term_x_source_cooccurrence(name_terms=False)
get_source_cooccurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame

Get pathway co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing which pathways share entities of the specified type, indicating pathway relationships and overlaps.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.

Returns:

Co-occurrence matrix with pathways as both rows and columns

Return type:

pd.DataFrame

Raises:

ValueError – If the source tables for the entity type are empty (indicating single-source model)

get_source_occurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904'], include_missing: bool = False, binarize: bool = False) DataFrame

Get pathway occurrence summary for a specific entity type.

This method analyzes which pathways contain entities of the specified type, providing a summary of pathway occurrence patterns.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:
  • entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)

  • priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.

  • include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False

  • binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False

Returns:

Summary of pathway occurrence patterns. If binarize=True, values are 0 or 1.

Return type:

pd.DataFrame

Raises:

ValueError – If the source tables for the entity type are empty (indicating single-source model)
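
To make the effect of binarize concrete, here is a small standalone sketch. It is not napistu's code: the `records` list and the `occurrence_table` helper are hypothetical, and the layout (entities as rows, pathways as columns) is an assumption made for illustration.

```python
# Hypothetical long-format source records: (entity_id, pathway_id),
# one record per annotation.
records = [
    ("R1", "Reactome"),
    ("R1", "Reactome"),
    ("R1", "BiGG"),
    ("R2", "BiGG"),
]


def occurrence_table(records, binarize=False):
    """Count how often each entity appears in each pathway.

    With binarize=True, counts are collapsed to 0 (absent) or 1 (present).
    """
    pathways = sorted({pathway_id for _, pathway_id in records})
    table = {}
    for entity_id, pathway_id in records:
        row = table.setdefault(entity_id, {p: 0 for p in pathways})
        row[pathway_id] += 1
    if binarize:
        table = {
            entity_id: {p: int(n > 0) for p, n in row.items()}
            for entity_id, row in table.items()
        }
    return table


counts = occurrence_table(records)
binary = occurrence_table(records, binarize=True)
```

Here `counts["R1"]["Reactome"]` is 2 because the entity is annotated twice, whereas the binarized table only records that it is present.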

get_source_total_counts(entity_type: str) Series

Get the total counts of each source for a given entity type.

Parameters:

entity_type (str) – The type of entity to get the total counts of (e.g., ‘species’, ‘reactions’)

Returns:

Series containing the total counts of each source, indexed by pathway_id

Return type:

pd.Series

Raises:

ValueError – If entity_type is invalid

get_sources(entity_type: str) DataFrame | None

Get the unnested sources table for a given entity type.

Parameters:

entity_type (str) – The type of entity to get sources for (e.g., ‘species’, ‘reactions’)

Returns:

DataFrame containing the unnested sources table, or None if no sources found

Return type:

pd.DataFrame | None

Raises:

ValueError – If entity_type is invalid or does not have a source attribute

get_species_features() DataFrame

Get additional attributes of species.

Returns:

Species with additional features, including:

  • species_type: Classification of the species (e.g., metabolite, protein)

Return type:

pd.DataFrame

get_summary() Mapping[str, Any]

Get diagnostic statistics about the SBML_dfs.

Returns:

Dictionary of diagnostic statistics including:

  • n_species_types: Number of species types

  • n_species_per_type: Number of species per type

  • n_entity_types: Dictionary of entity counts by type

  • dict_n_species_per_compartment: Number of species per compartment

  • stats_species_per_reactions: Statistics on reactants per reaction

  • top10_species_per_reactions: Top 10 reactions by number of reactants

  • sbo_name_counts: Count of reaction species by SBO term name

  • stats_degree: Statistics on species connectivity

  • top10_degree: Top 10 species by connectivity

  • species_ontology_counts: Count of species by ontology identifiers

  • data_summary: Summary of species and reaction data

Return type:

Mapping[str, Any]

get_table(entity_type: str, required_attributes: None | set[str] = None) DataFrame

Get a table from the SBML_dfs object with optional attribute validation.

Parameters:
  • entity_type (str) – The type of entity table to retrieve (e.g., ‘species’, ‘reactions’)

  • required_attributes (Optional[Set[str]], optional) – Set of attributes that must be present in the table, by default None. Must be passed as a set, e.g. {‘id’}, not a string.

Returns:

The requested table

Return type:

pd.DataFrame

Raises:
  • ValueError – If entity_type is invalid or required attributes are missing

  • TypeError – If required_attributes is not a set
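
The set requirement matters because a string is itself iterable: `set("id")` yields `{"i", "d"}` rather than `{"id"}`. The sketch below mirrors the documented contract with a hypothetical helper, `check_required_attributes`, and hypothetical column names; it is not napistu's implementation.

```python
def check_required_attributes(table_columns, required_attributes=None):
    """Hypothetical helper mirroring the documented contract of get_table.

    Raises TypeError if required_attributes is not a set, and ValueError
    if any required attribute is missing from the table's columns.
    """
    if required_attributes is None:
        return
    if not isinstance(required_attributes, set):
        raise TypeError(
            "required_attributes must be a set, got "
            f"{type(required_attributes).__name__}"
        )
    missing = required_attributes - set(table_columns)
    if missing:
        raise ValueError(f"Missing required attributes: {sorted(missing)}")


# Hypothetical column names, for illustration only.
columns = ["id", "name"]

check_required_attributes(columns, {"id"})  # a set: passes silently

try:
    check_required_attributes(columns, "id")  # a string: rejected
except TypeError as err:
    type_error_message = str(err)
```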

get_uri_urls(entity_type: str, entity_ids: Iterable[str] | None = None, required_ontology: str | None = None) Series

Get reference URLs for specified entities.

Parameters:
  • entity_type (str) – Type of entity to get URLs for (e.g., ‘species’, ‘reactions’)

  • entity_ids (Optional[Iterable[str]], optional) – Specific entities to get URLs for, by default None (all entities)

  • required_ontology (Optional[str], optional) – Specific ontology to get URLs from, by default None

Returns:

Series mapping entity IDs to their reference URLs

Return type:

pd.Series

Raises:

ValueError – If entity_type is invalid

infer_sbo_terms()

Infer SBO Terms

Define SBO terms based on stoichiometry for reaction_species with missing terms. Modifies the SBML_dfs object in-place.

Return type:

None (modifies SBML_dfs object in-place)

infer_uncompartmentalized_species_location()

Infer Uncompartmentalized Species Location

If the compartment of a subset of compartmentalized species was not specified, infer an appropriate compartment from other members of reactions they participate in.

This method modifies the SBML_dfs object in-place.

Return type:

None (modifies SBML_dfs object in-place)
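
One plausible way to infer a compartment "from other members of reactions" is a majority vote over co-participants. The sketch below assumes that rule purely for illustration; the voting rule, the `infer_compartment` helper, and the toy data are all hypothetical and may differ from what napistu actually does.

```python
from collections import Counter


def infer_compartment(reaction_members, known_compartments, species_id):
    """Return the most common known compartment among co-participants.

    reaction_members: reaction ID -> list of participating species IDs.
    known_compartments: species ID -> compartment name (where specified).
    Returns None if no co-participant has a known compartment.
    """
    votes = Counter()
    for members in reaction_members.values():
        if species_id not in members:
            continue
        for other in members:
            if other != species_id and other in known_compartments:
                votes[known_compartments[other]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy model: species "X" has no compartment assigned.
reaction_members = {"R1": ["A", "B", "X"], "R2": ["C", "X"], "R3": ["A", "D"]}
known_compartments = {"A": "cytosol", "B": "cytosol", "C": "nucleus", "D": "nucleus"}

inferred = infer_compartment(reaction_members, known_compartments, "X")
```

Species X co-occurs with two cytosolic species and one nuclear species, so this rule assigns it to the cytosol. Reaction R3 does not involve X and contributes no votes.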

name_compartmentalized_species()

Name Compartmentalized Species

Rename compartmentalized species if they have the same name as their species. Modifies the SBML_dfs object in-place.

Return type:

None (modifies SBML_dfs object in-place)

post_consensus_checks(entity_types: list[str] = ['species', 'compartments'], check_types: list[str] = ['source_cooccurrence', 'ontology_x_source_cooccurrence']) None

Post-consensus checks

Perform checks on the SBML_dfs object after consensus building.

Parameters:
  • entity_types (list[str], optional) – Entity types to check

  • check_types (list[str], optional) – Check types to perform

Return type:

None

reaction_formulas(r_ids: str | list[str] | None = None) Series

Reaction Formulas

Return human-readable formulas for reactions.

Parameters:

r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions

Returns:

formula_strs – Human-readable formulas for the requested reactions

Return type:

pd.Series
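
As a rough picture of what a human-readable formula looks like, the sketch below builds one from (name, stoichiometry) pairs. It assumes negative stoichiometry marks substrates and positive marks products, and it omits zero-stoichiometry participants. Both the `format_reaction` helper and the exact output format are hypothetical and may not match the strings napistu produces.

```python
def format_reaction(participants):
    """Build a formula string from (name, stoichiometry) pairs.

    Assumption: stoichiometry < 0 is a substrate, > 0 is a product;
    participants with stoichiometry 0 are left out of this sketch.
    """

    def side(items):
        terms = []
        for name, stoich in items:
            coeff = abs(stoich)
            # Omit the coefficient when it is 1, as in ordinary notation.
            terms.append(name if coeff == 1 else f"{coeff:g} {name}")
        return " + ".join(terms)

    substrates = [(n, s) for n, s in participants if s < 0]
    products = [(n, s) for n, s in participants if s > 0]
    return f"{side(substrates)} -> {side(products)}"


formula = format_reaction([("glucose", -1), ("ATP", -1), ("G6P", 1), ("ADP", 1)])
print(formula)  # glucose + ATP -> G6P + ADP
```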

reaction_summaries(r_ids: str | list[str] | None = None) DataFrame

Reaction Summary

Return a summary of reactions.

Parameters:

r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions

Returns:

reaction_summaries_df – A table with r_id as an index and columns:

  • r_name: str, name of the reaction

  • r_formula_str: str, human-readable formula of the reaction

Return type:

pd.DataFrame

remove_entities(entity_type: str, entity_ids: Iterable[str], remove_references: bool = True)

Public method to remove entities and optionally clean up orphaned references.

Special handling applies to “cofactors”: their reaction_species entries can be removed literally without dropping the parent reaction. Normally, removing substrates or products would remove the reaction.

Parameters:
  • entity_type (str) – The entity type (e.g., ‘reactions’, ‘compartmentalized_species’, ‘species’, ‘compartments’, or “cofactors”)

  • entity_ids (Iterable[str]) – IDs of entities to remove

  • remove_references (bool, default True) – Whether to remove orphaned references after entity removal

remove_reactions_data(label: str)

Remove reactions data by label.

remove_species_data(label: str)

Remove species data by label.

remove_unused() None

Find and remove unused entities from the model.

This method identifies unused entities using find_unused_entities and then cleans them up using the existing remove_entities method which properly handles cleanup of species_data and reactions_data as needed.

Returns:

Modifies the SBML_dfs object in-place

Return type:

None

search_by_ids(id_table: DataFrame, identifiers: str | list | set | None = None, ontologies: str | list | set | None = None, bqbs: str | list | set | None = ['BQB_IS', 'BQB_IS_HOMOLOG_TO', 'BQB_IS_ENCODED_BY', 'BQB_ENCODES', 'BQB_HAS_PART']) tuple[DataFrame, DataFrame]

Find entities and identifiers matching a set of query IDs.

Parameters:
  • id_table (pd.DataFrame) – DataFrame containing identifier mappings

  • identifiers (Optional[Union[str, list, set]], optional) – Identifiers to filter by, by default None

  • ontologies (Optional[Union[str, list, set]], optional) – Ontologies to filter by, by default None

  • bqbs (Optional[Union[str, list, set]], optional) – BQB terms to filter by, by default [BQB_IS, BQB_IS_HOMOLOG_TO, BQB_IS_ENCODED_BY, BQB_ENCODES, BQB_HAS_PART]

Returns:

  • Matching entities

  • Matching identifiers

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

Raises:
  • ValueError – If entity_type is invalid or ontologies are invalid

  • TypeError – If ontologies is not a set

search_by_name(name: str, entity_type: str, partial_match: bool = True) DataFrame

Find entities by exact or partial name match.

Parameters:
  • name (str) – Name to search for

  • entity_type (str) – Type of entity to search (e.g., ‘species’, ‘reactions’)

  • partial_match (bool, optional) – Whether to allow partial string matches, by default True

Returns:

Matching entities

Return type:

pd.DataFrame
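
The difference between partial and exact matching can be illustrated with a few lines of plain Python. This is a sketch only: the `search_names` helper is hypothetical, and case-insensitive comparison is an assumption rather than documented behavior of search_by_name.

```python
def search_names(names, query, partial_match=True):
    """Return the names matching query.

    Assumption: matching is case-insensitive. With partial_match=True a
    name matches if it contains the query; otherwise it must equal it.
    """
    query_lower = query.lower()
    if partial_match:
        return [n for n in names if query_lower in n.lower()]
    return [n for n in names if n.lower() == query_lower]


names = ["ATP", "ADP", "ATP synthase"]

partial_hits = search_names(names, "atp")
exact_hits = search_names(names, "atp", partial_match=False)
```

With partial matching, the query also picks up "ATP synthase"; with exact matching, only "ATP" is returned.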

select_species_data(species_data_table: str) DataFrame

Select a species data table from the SBML_dfs object.

Parameters:

species_data_table (str) – Name of the species data table to select

Returns:

The selected species data table

Return type:

pd.DataFrame

Raises:

ValueError – If species_data_table is not found

show_summary() None

Display a formatted summary of the SBML_dfs model.

This method chains together get_summary(), format_sbml_dfs_summary(), and show() to provide a convenient way to display network statistics.

Returns:

Displays the formatted summary table to console

Return type:

None

Examples

>>> sbml_dfs.show_summary()

species_status(s_id: str) DataFrame

Species Status

Return all of the reactions a species participates in.

Parameters:

s_id (str) – A species ID

Returns:

One row per reaction the species participates in, with columns:

  • sc_name: str, name of the compartment the species participates in

  • stoichiometry: float, stoichiometry of the species in the reaction

  • r_name: str, name of the reaction

  • r_formula_str: str, human-readable formula of the reaction

Return type:

pd.DataFrame

to_dict() dict[str, DataFrame]

Return the 5 major SBML_dfs tables as a dictionary.

Returns:

Dictionary containing the core SBML_dfs tables:

  • ‘compartments’: Compartments table

  • ‘species’: Species table

  • ‘compartmentalized_species’: Compartmentalized species table

  • ‘reactions’: Reactions table

  • ‘reaction_species’: Reaction species table

Return type:

dict[str, pd.DataFrame]

to_pickle(path: str) None

Save the SBML_dfs to a pickle file.

Parameters:

path (str) – Path where to save the pickle file

validate()

Validate the SBML_dfs structure and relationships.

Checks:

  • Schema existence

  • Required tables presence

  • Individual table structure

  • Primary key uniqueness

  • Foreign key relationships

  • Optional data table validity

  • Reaction species validity

Raises:

ValueError – If any validation check fails

validate_and_resolve()

Validate and attempt to automatically fix common issues.

This method iteratively:

  1. Attempts validation

  2. If validation fails, tries to resolve the issue

  3. Repeats until validation passes or the issue cannot be resolved

Raises:

ValueError – If validation fails and cannot be automatically resolved
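
The iterate-until-valid pattern described above can be sketched generically. Everything in this example is hypothetical (the `validate_and_resolve` function signature, the keyword-based resolver lookup, the `max_iter` guard, and the toy model); it shows the control flow only and is not napistu's implementation.

```python
def validate_and_resolve(validate, resolvers, max_iter=10):
    """Repeat validation, applying a matching fix after each failure.

    validate: callable that raises ValueError when the model is invalid.
    resolvers: maps a keyword found in the error message to a fix callable.
    Re-raises the original error if no resolver matches.
    """
    for _ in range(max_iter):
        try:
            validate()
            return  # validation passed
        except ValueError as err:
            message = str(err)
            fix = next((f for key, f in resolvers.items() if key in message), None)
            if fix is None:
                raise  # the issue cannot be resolved automatically
            fix()
    raise ValueError("validation did not pass after repeated resolution attempts")


# Hypothetical toy model with a duplicated primary key.
model = {"species_ids": ["S1", "S1", "S2"]}


def validate_model():
    ids = model["species_ids"]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate primary keys")


def drop_duplicates():
    # dict.fromkeys keeps the first occurrence of each ID, in order.
    model["species_ids"] = list(dict.fromkeys(model["species_ids"]))


validate_and_resolve(validate_model, {"duplicate": drop_duplicates})
```

The first validation attempt fails on the duplicated ID, the resolver removes it, and the second attempt passes. An error with no matching resolver propagates to the caller unchanged.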

_optional_entities: set[str]
_required_entities: set[str]
compartments: DataFrame
reaction_species: DataFrame
reactions: DataFrame
reactions_data: dict[str, DataFrame]
schema: dict
species: DataFrame
species_data: dict[str, DataFrame]