napistu.sbml_dfs_core

The core SBML DataFrame class for representing an SBML model as a collection of pandas DataFrames.

Classes

SBML_dfs: A class representing an SBML model as a collection of pandas DataFrames.


SBML_dfs(sbml_model, model_source[, ...])

Systems Biology Markup Language Model Data Frames.

class napistu.sbml_dfs_core.SBML_dfs(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False)

Bases: object

Systems Biology Markup Language Model Data Frames.

A class representing an SBML model as a collection of pandas DataFrames. This class provides methods for manipulating and analyzing biological pathway models, with support for species, reactions, compartments, and their relationships.

compartments

Sub-cellular compartments in the model, indexed by compartment ID (c_id)

Type:: pd.DataFrame

species

Molecular species in the model, indexed by species ID (s_id)

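Type:: pd.DataFrame

compartmentalized_species

Molecular species assigned to a specific compartment, indexed by compartmentalized species ID (sc_id)

Type:: pd.DataFrame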

species_data

Additional data for species. Each DataFrame is indexed by species_id (s_id)

Type:: Dict[str, pd.DataFrame]

reactions

Reactions in the model, indexed by reaction ID (r_id)

Type:: pd.DataFrame

reactions_data

Additional data for reactions. Each DataFrame is indexed by reaction_id (r_id)

Type:: Dict[str, pd.DataFrame]

reaction_species

One entry per species participating in a reaction, indexed by reaction-species ID (rsc_id)

Type:: pd.DataFrame

schema

Dictionary describing the structure of the other attributes and the meaning of their variables

Type:: dict

Public Methods (alphabetical)


add_reactions_data(label, data): Add a new reactions data table to the model with validation.

add_species_data(label, data): Add a new species data table to the model with validation.

copy: Return a deep copy of the SBML_dfs object.

export_sbml_dfs(model_prefix, outdir, overwrite=False, dogmatic=True): Export the SBML_dfs model and its tables to files in a specified directory.

find_entity_references(entity_type, entity_ids): Find all entities that directly depend on the specified entities.

from_edgelist(interaction_edgelist, species_df, compartments_df, model_source, interaction_edgelist_defaults=INTERACTION_EDGELIST_DEFAULTS, keep_species_data=False, keep_reactions_data=False, require_edgelist_consistency=False): Create SBML_dfs from interaction edgelist.

from_pickle(path): Load an SBML_dfs from a pickle file.

get_characteristic_species_ids(dogmatic=True): Return characteristic systematic identifiers for molecular species, optionally using a strict or loose definition.

get_cspecies_features: Compute and return additional features for compartmentalized species, such as degree and type.

get_identifiers(id_type, filter_by_bqb=None, add_names=True, keep_source=False): Retrieve a table of identifiers for a specified entity type (e.g., species or reactions).

get_ontology_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True): Get ontology co-occurrence matrix for a specific entity type.

get_ontology_occurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True, include_missing=False, binarize=False): Get ontology occurrence summary for a specific entity type.

get_ontology_x_source_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS): Get ontology × source co-occurrence matrix for a specific entity type.

get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False): Get SBO term occurrence for reactions.

get_sbo_term_x_source_cooccurrence(name_terms=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS): Get SBO term × source co-occurrence matrix for reactions.

get_source_cooccurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS): Get pathway co-occurrence matrix for a specific entity type.

get_source_occurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS, include_missing=False, binarize=False): Get pathway occurrence summary for a specific entity type.

get_source_total_counts(entity_type): Get the total counts of each source for a given entity type.

get_sources(entity_type): Get the unnested sources table for a given entity type.

get_species_features: Compute and return additional features for species, such as species type.

get_summary: Return a dictionary of diagnostic statistics summarizing the SBML_dfs structure.

get_table(entity_type, required_attributes=None): Retrieve a table for a given entity type, optionally validating required attributes.

get_uri_urls(entity_type, entity_ids=None, required_ontology=None): Return reference URLs for specified entities, optionally filtered by ontology.

infer_sbo_terms: Infer and fill in missing SBO terms for reaction species based on stoichiometry.

infer_uncompartmentalized_species_location: Infer and assign compartments for compartmentalized species with missing compartment information.

name_compartmentalized_species: Rename compartmentalized species to include compartment information if needed.

post_consensus_checks(entity_types=[SBML_DFS.SPECIES, SBML_DFS.COMPARTMENTS], check_types=[CONSENSUS_CHECKS.SOURCE_COOCCURRENCE, CONSENSUS_CHECKS.ONTOLOGY_X_SOURCE_COOCCURRENCE]): Perform checks on the SBML_dfs object after consensus building.

reaction_formulas(r_ids=None): Generate human-readable reaction formulas for specified reactions.

reaction_summaries(r_ids=None): Return a summary DataFrame for specified reactions, including names and formulas.

remove_entities(entity_type, entity_ids, remove_species=False): Remove specified entities and optionally remove unused species.

remove_reactions_data(label): Remove a reactions data table by label.

remove_species_data(label): Remove a species data table by label.

remove_unused: Find and remove unused entities from the model with cascading cleanup.

search_by_ids(id_table, identifiers=None, ontologies=None, bqbs=None): Find entities and identifiers matching a set of query IDs.

search_by_name(name, entity_type, partial_match=True): Find entities by exact or partial name match.

select_species_data(species_data_table): Select a species data table from the SBML_dfs object by name.

show_summary: Display a formatted summary of the SBML_dfs model.

species_status(s_id): Return all reactions a species participates in, with stoichiometry and formula information.

to_dict: Return the 5 major SBML_dfs tables as a dictionary.

to_pickle(path): Save the SBML_dfs to a pickle file.

validate: Validate the SBML_dfs structure and relationships.

validate_and_resolve: Validate and attempt to automatically fix common issues.

Private/Hidden Methods (alphabetical)


_attempt_resolve(e)

_edgelist_assemble_sbml_model(compartments, species, comp_species, reactions, reaction_species, species_data, reactions_data, keep_species_data, keep_reactions_data, extra_columns, model_source)

_find_invalid_entities_by_reference(entity_type, reference_type, reference_ids)

_find_underspecified_reactions_by_reference(reference_type, reference_ids)

_get_data_summary

_get_entity_data(entity_type, label)

_get_identifiers_table_for_ontology_occurrence(entity_type, characteristic_only=False, dogmatic=True)

_get_non_interactor_reactions

_remove_entities_direct(entity_type, entity_ids)

_remove_entity_data(entity_type, label)

_validate_entity_data_access(entity_type, label)

_validate_identifiers

_validate_pk_fk_correspondence

_validate_r_ids(r_ids)

_validate_reaction_species

_validate_reactions_data(reactions_data_table)

_validate_sources

_validate_species_data(species_data_table)

_validate_table(table_name)

classmethod _edgelist_assemble_sbml_model(compartments: DataFrame, species: DataFrame, comp_species: DataFrame, reactions: DataFrame, reaction_species: DataFrame, species_data: DataFrame, reactions_data: DataFrame, keep_species_data: bool | str, keep_reactions_data: bool | str, extra_columns: dict[str, list[str]], model_source: Source) → SBML_dfs

Assemble the final SBML_dfs object.

Parameters:

compartments (pd.DataFrame) – Processed compartments data
species (pd.DataFrame) – Processed species data
comp_species (pd.DataFrame) – Compartmentalized species data
reactions (pd.DataFrame) – Reactions data
reaction_species (pd.DataFrame) – Reaction species relationships
species_data (pd.DataFrame) – Extra species data to include
reactions_data (pd.DataFrame) – Extra reactions data to include
keep_species_data (bool or str) – Label for species extra data
keep_reactions_data (bool or str) – Label for reactions extra data
extra_columns (dict) – Dictionary containing lists of extra column names
model_source (Source) – Source annotations for the data source

Returns:

Validated SBML data structure

Return type:

SBML_dfs

classmethod from_edgelist(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, model_source: Source, interaction_edgelist_defaults: dict[str, Any] = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}, keep_species_data: bool | str = False, keep_reactions_data: bool | str = False, require_edgelist_consistency: bool = False) → SBML_dfs

Create SBML_dfs from interaction edgelist.

Combines a set of molecular interactions into a mechanistic SBML_dfs model by processing interaction data, species information, and compartment definitions.

Parameters:

interaction_edgelist (pd.DataFrame) – Table containing molecular interactions with columns:
  - name_upstream : str, matches “s_name” from species_df
  - name_downstream : str, matches “s_name” from species_df
  - r_name : str, name for the interaction
  - r_Identifiers : Identifiers, supporting identifiers
  - compartment_upstream : str, matches “c_name” from compartments_df
  - compartment_downstream : str, matches “c_name” from compartments_df
  - sbo_term_name_upstream : str, SBO term defining interaction type
  - sbo_term_name_downstream : str, SBO term defining interaction type
  - stoichiometry_upstream : float, stoichiometry of upstream species
  - stoichiometry_downstream : float, stoichiometry of downstream species
  - r_isreversible : bool, whether reaction is reversible
species_df (pd.DataFrame) – Table defining molecular species with columns:
  - s_name : str, name of molecular species
  - s_Identifiers : Identifiers, species identifiers
compartments_df (pd.DataFrame) – Table defining compartments with columns:
  - c_name : str, name of compartment
  - c_Identifiers : Identifiers, compartment identifiers
model_source (Source) – Source annotations for the data source
interaction_edgelist_defaults (dict[str, Any], default INTERACTION_EDGELIST_DEFAULTS) – Default values for interaction edgelist columns
keep_species_data (bool or str, default False) – Whether to preserve extra species columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.
keep_reactions_data (bool or str, default False) – Whether to preserve extra reaction columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.
require_edgelist_consistency (bool, default False) – Whether to force the edgelist to be consistent with the species and compartments DataFrames. This is useful when there may be reasonable departures between the edgelist and the species and compartments DataFrames but the user still wants to create an SBML_dfs model.

Returns:

Validated SBML data structure containing compartments, species, compartmentalized species, reactions, and reaction species tables.

Return type:

SBML_dfs
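To make the role of interaction_edgelist_defaults concrete, here is a minimal sketch of how one edgelist row could be split into a reaction record and two participant records. Missing columns are filled from the defaults shown in the signature above. This is an illustration only, not napistu's implementation. The function name `expand_interaction` and the simplified output keys (`s_name`, `c_name`, `sbo_term_name`) are hypothetical. The real model resolves names to s_id/sc_id keys and stores SBO term codes.

```python
# Values copied from the from_edgelist default shown above (illustrative copy).
EDGELIST_DEFAULTS = {
    "compartment_downstream": "cellular_component",
    "compartment_upstream": "cellular_component",
    "r_isreversible": False,
    "sbo_term_name_downstream": "modified",
    "sbo_term_name_upstream": "modifier",
    "stoichiometry_downstream": 0,
    "stoichiometry_upstream": 0,
}


def expand_interaction(row, defaults=EDGELIST_DEFAULTS):
    """Hypothetical helper: split one edgelist row into a reaction and its two participants."""
    filled = {**defaults, **row}  # explicit row values override the defaults
    reaction = {"r_name": filled["r_name"], "r_isreversible": filled["r_isreversible"]}
    participants = [
        {
            "r_name": filled["r_name"],
            "s_name": filled[f"name_{side}"],
            "c_name": filled[f"compartment_{side}"],
            "sbo_term_name": filled[f"sbo_term_name_{side}"],
            "stoichiometry": filled[f"stoichiometry_{side}"],
        }
        for side in ("upstream", "downstream")
    ]
    return reaction, participants
```

With only names supplied, the upstream species becomes a "modifier" and the downstream species becomes "modified", both with stoichiometry 0 in "cellular_component".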

classmethod from_pickle(path: str) → SBML_dfs

Load an SBML_dfs from a pickle file.

Parameters:: path (str) – Path to the pickle file
Returns:: The loaded SBML_dfs object
Return type:: SBML_dfs

__init__(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False) → None

Initialize an SBML_dfs object from an SBML model or a dictionary of tables.

Parameters:

sbml_model (Union[sbml.SBML, MutableMapping[str, Union[pd.DataFrame, Dict[str, pd.DataFrame]]]]) – Either an SBML model produced by sbml.SBML() or a dictionary containing tables that follow the sbml_dfs schema
model_source (Source) – Source annotations for the data source
validate (bool, optional) – Whether to validate the model structure and relationships, by default True
resolve (bool, optional) – Whether to attempt automatic resolution of common issues, by default True
verbose (bool, optional) – Whether to print extra reporting, by default False

Raises:

ValueError – If the model structure is invalid and cannot be resolved

_attempt_resolve(e)

_find_invalid_entities_by_reference(entity_type: str, reference_type: str, reference_ids: set[str]) → set[str]

Find and return orphaned entities based on broken foreign key references.

Parameters:

entity_type (str) – The entity type to check for orphans (the table with primary keys)
reference_type (str) – The type of foreign key reference to check
reference_ids (set[str]) – Specific reference IDs that were removed

Returns:

Set of primary keys that are orphaned and should be removed

Return type:

set[str]

_find_underspecified_reactions_by_reference(reference_type: str, reference_ids: set[str]) → set[str]

Find reactions that would become underspecified after removing species.

Parameters:

reference_type (str) – The type of foreign key reference to check
reference_ids (set[str]) – Specific reference IDs that were removed

Returns:

set[str] – Set of reaction IDs that were orphaned and removed
set[str] – Set of reaction-species IDs that were orphaned and removed

_get_data_summary(): Summarize the data tables in the SBML_dfs object

_get_entity_data(entity_type: str, label: str) → DataFrame

Get data from species_data or reactions_data by table name and label.

Parameters:

entity_type (str) – Name of the table to get data from (‘species’ or ‘reactions’)
label (str) – Label of the data to retrieve

Returns:

The requested data as a DataFrame

Return type:

pd.DataFrame

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist

_get_identifiers_table_for_ontology_occurrence(entity_type: str, characteristic_only: bool = False, dogmatic: bool = True) → DataFrame

Get the appropriate identifiers table for ontology analysis.

This method handles the common logic for determining which identifiers table to use based on the characteristic_only and dogmatic parameters.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True

Returns:

The appropriate identifiers table for ontology analysis

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid

_get_non_interactor_reactions() → DataFrame

Get the reactions table, excluding reactions whose participants are all interactors.

Returns:: Reactions table with non-interactor reactions only
Return type:: pd.DataFrame

_remove_entities_direct(entity_type: str, entity_ids: list[str])

Directly remove entities without cascading cleanup.

Parameters:

entity_type (str) – The entity type to remove
entity_ids (list[str]) – IDs of entities to remove

_remove_entity_data(entity_type: str, label: str) → None

Remove data from species_data or reactions_data by table name and label.

Parameters:

entity_type (str) – Name of the table to remove data from (‘species’ or ‘reactions’)
label (str) – Label of the data to remove

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist

_validate_entity_data_access(entity_type: str, label: str) → MutableMapping[str, DataFrame] | None

Validate entity type and label, and return the data dictionary if valid.

Parameters:

entity_type (str) – Name of the table to access (‘species’ or ‘reactions’)
label (str) – Label of the data to access

Returns:

The data dictionary if entity_type and label are valid

Return type:

MutableMapping[str, pd.DataFrame]

Raises:

ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist

_validate_identifiers()

Validate identifiers in the model

Iterates through all tables and checks if the identifier columns are valid.

Raises:: ValueError – missing identifiers in the table

_validate_pk_fk_correspondence()

Check bidirectional primary key and foreign key correspondence for all tables in the schema.

Validates:
1. All foreign keys exist as primary keys (standard FK constraint)
2. All primary keys are referenced as foreign keys (referential completeness)

Raises ValueError if any FK constraint or referential completeness violations are found.
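The two checks above reduce to set differences for each parent/child key pair. The sketch below assumes key columns are given as plain iterables of IDs, for example species.s_id as primary keys and compartmentalized_species.s_id as foreign keys. The function name `check_pk_fk` is hypothetical, and the sketch is not the library's internal code.

```python
def check_pk_fk(primary_keys, foreign_keys):
    """Hypothetical helper: report both kinds of violation for one parent/child key pair.

    primary_keys: IDs in the parent table (e.g. species.s_id)
    foreign_keys: IDs referenced by the child table (e.g. compartmentalized_species.s_id)
    """
    pk_set = set(primary_keys)
    fk_set = set(foreign_keys)
    return {
        # 1. FK constraint: every foreign key must exist as a primary key
        "dangling_fks": fk_set - pk_set,
        # 2. Referential completeness: every primary key must be referenced
        "unreferenced_pks": pk_set - fk_set,
    }
```

A validator built on this would raise ValueError whenever either returned set is non-empty, repeating the check for every PK/FK pair in the schema.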

_validate_r_ids(r_ids: str | list[str] | None) → list[str]

_validate_reaction_species()

_validate_reactions_data(reactions_data_table: DataFrame)

Validates reactions data attribute

Parameters:: reactions_data_table (pd.DataFrame) – a reactions data table
Raises:: ValueError – If r_id is not the index name, the r_id index contains duplicates, or an r_id is not in the reactions table

_validate_sources()

Validate sources in the model

Iterates through all tables and checks if the source columns are valid.

Raises:: ValueError – missing sources in the table

_validate_species_data(species_data_table: DataFrame)

Validates species data attribute

Parameters:: species_data_table (pd.DataFrame) – a species data table
Raises:: ValueError – If s_id is not the index name, the s_id index contains duplicates, or an s_id is not in the species table
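The three failure modes listed for species and reactions data tables can be sketched without pandas by passing the index name and index values directly. This is a hypothetical helper, `data_table_problems`. It collects problems instead of raising, so that every issue is reported. It is not how napistu implements `_validate_species_data` or `_validate_reactions_data`.

```python
from collections import Counter


def data_table_problems(index_name, index_values, expected_name, parent_ids):
    """Hypothetical helper: list problems with a species/reactions data table index."""
    problems = []
    # the table must be indexed by the parent key (s_id or r_id)
    if index_name != expected_name:
        problems.append(f"index name is {index_name!r}, expected {expected_name!r}")
    # each parent ID may appear at most once
    duplicated = sorted(v for v, n in Counter(index_values).items() if n > 1)
    if duplicated:
        problems.append(f"duplicated {expected_name} values: {duplicated}")
    # every ID must exist in the parent table (species or reactions)
    orphans = sorted(set(index_values) - set(parent_ids))
    if orphans:
        problems.append(f"{expected_name} values missing from parent table: {orphans}")
    return problems
```

A strict validator would raise ValueError whenever the returned list is non-empty.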

_validate_table(table_name: str) → None

Validate a table in this SBML_dfs object against its schema.

This is an internal method that validates a table that is part of this SBML_dfs object against the schema stored in self.schema.

Parameters:: table_name (str) – Name of the table to validate
Raises:: ValueError – If the table does not conform to its schema

add_reactions_data(label: str, data: DataFrame)

Add additional reaction data with validation.

Parameters:

label (str) – Label for the new data
data (pd.DataFrame) – Data to add, must be indexed by reaction_id

Raises:

ValueError – If the data is invalid or label already exists

add_species_data(label: str, data: DataFrame)

Add additional species data with validation.

Parameters:

label (str) – Label for the new data
data (pd.DataFrame) – Data to add, must be indexed by species_id

Raises:

ValueError – If the data is invalid or label already exists

copy()

Return a deep copy of the SBML_dfs object.

Returns:: A deep copy of the current SBML_dfs object.
Return type:: SBML_dfs

export_sbml_dfs(model_prefix: str, outdir: str, overwrite: bool = False, dogmatic: bool = True) → None

Export SBML_dfs

Export summaries of species identifiers and each table underlying an SBML_dfs pathway model

Parameters:

model_prefix (str) – Label to prepend to all exported files
outdir (str) – Path to an existing directory where results should be saved
overwrite (bool, default False) – Whether to overwrite the directory if it already exists
dogmatic (bool, default True) – If True, treat genes, transcripts, and proteins as separate species; if False, treat them interchangeably.

Return type:: None

find_entity_references(entity_type: str, entity_ids: list[str]) → dict[str, set[str]]

Find all entities that directly depend on the set of requested entities.

Parameters:

entity_type (str) – The entity type of the requested entities (e.g., the type being removed)
entity_ids (list[str]) – IDs of the requested entities

Returns:

Dictionary mapping entity types to sets of IDs that directly depend on the requested entities

Return type:

dict[str, set[str]]
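As a rough illustration of "directly depend", the sketch below walks a hypothetical foreign-key map one step from the requested entity type and does not cascade further. Tables are modeled as `{primary_key: row_dict}` dictionaries. `FOREIGN_KEYS` and `find_direct_references` are assumptions for illustration and are not the library's schema objects or API.

```python
# Hypothetical FK map: child table -> list of (FK column, parent table)
FOREIGN_KEYS = {
    "compartmentalized_species": [("s_id", "species"), ("c_id", "compartments")],
    "reaction_species": [("sc_id", "compartmentalized_species"), ("r_id", "reactions")],
}


def find_direct_references(tables, entity_type, entity_ids):
    """Return {child table: set of child PKs} whose FK points at one of entity_ids."""
    targets = set(entity_ids)
    refs = {}
    for child, fks in FOREIGN_KEYS.items():
        for fk_col, parent in fks:
            if parent != entity_type:
                continue
            # child rows whose FK column references a requested entity
            hits = {pk for pk, row in tables[child].items() if row[fk_col] in targets}
            if hits:
                refs.setdefault(child, set()).update(hits)
    return refs
```

Removing species S1 would therefore flag only its compartmentalized species. Those species' reaction_species entries become dependents only on a second, cascading pass.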

get_characteristic_species_ids(dogmatic: bool = True) → DataFrame

Get Characteristic Species IDs

List the systematic identifiers that are characteristic of molecular species, e.g., excluding subcomponents and, optionally, treating proteins, transcripts, and genes equivalently.

Characteristic identifiers include:
- the defining IDs: BQB_IS if dogmatic is True; BQB_IS, BQB_IS_ENCODED_BY, and BQB_ENCODES if dogmatic is False
- small complexes (BQB_HAS_PART)

This function is useful for pulling out the species that are closely associated with specific proteins, metabolites, etc.

Parameters:: dogmatic (bool, default=True) – Whether to use the strict (True) or loose (False) set of defining BQB attributes described above.
Returns:: A DataFrame containing the systematic identifiers which are characteristic of molecular species.
Return type:: pd.DataFrame
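The selection rule above can be sketched over a plain list of `(s_id, ontology, identifier, bqb)` tuples. The BQB strings are taken from the text. The cutoff that makes a complex "small" is not stated here, so `MAX_COMPLEX_PARTS = 3` is a hypothetical value. `characteristic_ids` is an illustrative helper, not the library method.

```python
from collections import Counter

# Strict vs. loose defining BQB sets, as described above (labels assumed to be plain strings)
DEFINING_STRICT = {"BQB_IS"}
DEFINING_LOOSE = {"BQB_IS", "BQB_IS_ENCODED_BY", "BQB_ENCODES"}
MAX_COMPLEX_PARTS = 3  # hypothetical cutoff for what counts as a "small" complex


def characteristic_ids(identifiers, dogmatic=True, max_parts=MAX_COMPLEX_PARTS):
    """Keep defining identifiers plus BQB_HAS_PART members of small complexes."""
    defining = DEFINING_STRICT if dogmatic else DEFINING_LOOSE
    # number of BQB_HAS_PART annotations per species, used as the complex size
    n_parts = Counter(row[0] for row in identifiers if row[3] == "BQB_HAS_PART")
    keep = []
    for row in identifiers:
        s_id, bqb = row[0], row[3]
        if bqb in defining or (bqb == "BQB_HAS_PART" and n_parts[s_id] <= max_parts):
            keep.append(row)
    return keep
```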

get_cspecies_features() → DataFrame

Get additional attributes of compartmentalized species.

Returns:: Compartmentalized species with additional features including:
- sc_degree: Number of reactions the species participates in
- sc_children: Number of reactions where species is consumed
- sc_parents: Number of reactions where species is produced
- species_type: Classification of the species
Return type:: pd.DataFrame
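The degree-style features can be approximated from reaction_species rows of `(r_id, sc_id, stoichiometry)`. This sketch assumes that negative stoichiometry means "consumed" and positive means "produced". It also assumes that zero-stoichiometry participants, such as modifiers, count toward sc_degree only. The helper `cspecies_degree_features` is hypothetical, and the library's exact rules may differ.

```python
from collections import defaultdict


def cspecies_degree_features(reaction_species):
    """Hypothetical helper: sc_degree/sc_children/sc_parents from (r_id, sc_id, stoichiometry)."""
    reactions = defaultdict(set)
    consumed = defaultdict(set)
    produced = defaultdict(set)
    for r_id, sc_id, stoich in reaction_species:
        reactions[sc_id].add(r_id)  # any participation counts toward degree
        if stoich < 0:  # assumed: negative stoichiometry = consumed
            consumed[sc_id].add(r_id)
        elif stoich > 0:  # assumed: positive stoichiometry = produced
            produced[sc_id].add(r_id)
    return {
        sc_id: {
            "sc_degree": len(r_ids),
            "sc_children": len(consumed[sc_id]),
            "sc_parents": len(produced[sc_id]),
        }
        for sc_id, r_ids in reactions.items()
    }
```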

get_identifiers(id_type, filter_by_bqb=None, add_names=True, keep_source=False) → DataFrame

Get identifiers from a specified entity type.

Parameters:

id_type (str) – Type of entity to get identifiers for (e.g., ‘species’, ‘reactions’)
filter_by_bqb (None, list, or str, optional) – Filter identifiers by biological qualifier (BQB) terms:
  - None: No filtering; return all identifiers (default)
  - list: List of specific BQB terms to include
  - “defining”: Use BQB_DEFINING_ATTRS (strict defining identifiers)
  - “loose”: Use BQB_DEFINING_ATTRS_LOOSE (includes encoded/encodes relationships)
add_names (bool, optional) – Whether to add entity names and other metadata from the entity table, by default True
keep_source (bool, optional) – Whether to include the source column in the output, by default False. Only applies when add_names=True. The source column is excluded by default as it’s typically not needed for identifier lookups.

Returns:

Table of identifiers for the specified entity type. If add_names=True, includes entity metadata; if add_names=False, returns only the core identifier data.

Return type:

pd.DataFrame

Raises:

ValueError – If id_type is invalid, identifiers are malformed, or filter_by_bqb is invalid
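The four accepted forms of filter_by_bqb can be resolved into an allow-list before filtering. In this sketch, the contents of the two presets are assumptions modeled on the strict and loose definitions given for get_characteristic_species_ids. The library's BQB_DEFINING_ATTRS constants may contain other terms. `resolve_bqb_filter` and `filter_identifiers` are hypothetical helpers.

```python
# Assumed preset contents; the real BQB_DEFINING_ATTRS constants may differ.
BQB_DEFINING_ATTRS = ["BQB_IS"]
BQB_DEFINING_ATTRS_LOOSE = ["BQB_IS", "BQB_IS_ENCODED_BY", "BQB_ENCODES"]


def resolve_bqb_filter(filter_by_bqb):
    """Turn None / list / "defining" / "loose" into an allow-list (None = no filtering)."""
    if filter_by_bqb is None:
        return None
    if isinstance(filter_by_bqb, str):
        presets = {"defining": BQB_DEFINING_ATTRS, "loose": BQB_DEFINING_ATTRS_LOOSE}
        if filter_by_bqb not in presets:
            raise ValueError(f"unknown filter_by_bqb preset: {filter_by_bqb!r}")
        return list(presets[filter_by_bqb])
    if isinstance(filter_by_bqb, list):
        return list(filter_by_bqb)
    raise ValueError(f"filter_by_bqb must be None, a list, or a str, got {type(filter_by_bqb).__name__}")


def filter_identifiers(rows, filter_by_bqb=None):
    """Keep identifier rows (dicts with a "bqb" key) whose BQB term is allowed."""
    allowed = resolve_bqb_filter(filter_by_bqb)
    if allowed is None:
        return list(rows)
    return [row for row in rows if row["bqb"] in allowed]
```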

get_ontology_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True) → DataFrame

Get ontology co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing which ontologies share entities of the specified type, indicating ontology relationships and overlaps.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False
characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True

Returns:

Co-occurrence matrix with ontologies as both rows and columns

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid or identifiers are malformed
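A co-occurrence matrix of this kind counts, for every pair of ontologies, how many entities are annotated by both. The diagonal holds the number of entities annotated by each ontology. Equivalently, it is OᵀO for a binary entity × ontology matrix O. Below is a stdlib sketch of that computation using a hypothetical `cooccurrence` helper. It does not cover the BQB stratification or column multi-index options.

```python
from itertools import product


def cooccurrence(memberships):
    """Hypothetical helper: memberships maps entity_id -> set of ontologies annotating it."""
    labels = sorted(set().union(*memberships.values()))
    matrix = {a: {b: 0 for b in labels} for a in labels}
    for ontologies in memberships.values():
        # every ordered pair of ontologies on the same entity shares that entity
        for a, b in product(ontologies, repeat=2):
            matrix[a][b] += 1
    return matrix
```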

get_ontology_occurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, include_missing: bool = False, binarize: bool = False) → DataFrame

Get ontology occurrence summary for a specific entity type.

This method analyzes which ontologies are associated with entities of the specified type, providing a summary of ontology occurrence patterns.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False
characteristic_only (bool, optional) – Whether to only include characteristic identifiers. Only supported for species. If True, returns only the characteristic identifiers (BQB_IS and small-complex BQB_HAS_PART annotations); if False, returns all identifiers. By default False
dogmatic (bool, optional) – Whether to use a strict or loose definition of characteristic identifiers. Only applicable if characteristic_only is True and entity_type is SBML_DFS.SPECIES.
include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False
binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False

Returns:

Summary of ontology occurrence patterns with entities as rows and ontologies as columns. If binarize=True, values are 0 or 1.

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid or identifiers are malformed
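The include_missing and binarize options can be pictured with a small entity × ontology count table built from `(entity_id, ontology)` pairs. In this sketch, "missing" entities are assumed to appear as all-zero rows, which approximates what add_missing_ids_column is described as doing. `occurrence_table` is a hypothetical helper.

```python
def occurrence_table(identifiers, all_entity_ids=(), include_missing=False, binarize=False):
    """Hypothetical helper: (entity_id, ontology) pairs -> {entity_id: {ontology: count}}."""
    table = {}
    ontologies = set()
    for entity_id, ontology in identifiers:
        ontologies.add(ontology)
        row = table.setdefault(entity_id, {})
        row[ontology] = row.get(ontology, 0) + 1
    if include_missing:
        # assumed behavior: unannotated entities become all-zero rows
        for entity_id in all_entity_ids:
            table.setdefault(entity_id, {})
    # fill every ontology column; optionally collapse counts to 0 vs 1
    for row in table.values():
        for ontology in ontologies:
            count = row.get(ontology, 0)
            row[ontology] = min(count, 1) if binarize else count
    return table
```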

get_ontology_x_source_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) → DataFrame

Get ontology × source co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing the relationship between ontologies and sources (pathways) by calculating how many entities of the specified type are shared between each ontology-source pair.

The method combines ontology occurrence data with source occurrence data to create a cross-tabulation matrix where:
- Rows represent ontologies
- Columns represent sources/pathways
- Values represent the number of entities shared between each ontology-source pair

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms in ontology analysis, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index in ontology analysis, by default False
characteristic_only (bool, optional) – Whether to use only characteristic identifiers in ontology analysis (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering in ontology analysis, by default True
priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS

Returns:

Co-occurrence matrix with ontologies as rows and sources as columns. Values represent the number of entities shared between each ontology-source pair.

Return type:

pd.DataFrame

Raises:

ValueError – If the entity type is invalid, identifiers are malformed, or source tables are empty

Examples

>>> # Get ontology × source co-occurrence for species
>>> cooccurrence_matrix = sbml_dfs.get_ontology_x_source_cooccurrence(SBML_DFS.SPECIES)
>>>
>>> # Use characteristic species only
>>> char_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, characteristic_only=True
... )
>>>
>>> # Custom pathway priority
>>> custom_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, priority_pathways=['reactome', 'kegg']
... )
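The cross-tabulation described above is OᵀS, where O is a binary entity × ontology matrix and S is a binary entity × source matrix. The stdlib sketch below builds it from two hypothetical mappings, entity → ontologies and entity → sources. It leaves out pathway prioritization and BQB stratification, and `ontology_x_source` is an illustrative helper only.

```python
from collections import defaultdict


def ontology_x_source(entity_ontologies, entity_sources):
    """Hypothetical helper: count entities shared by each (ontology, source) pair."""
    counts = defaultdict(lambda: defaultdict(int))
    for entity_id, ontologies in entity_ontologies.items():
        for ontology in ontologies:
            # an entity contributes once to every (ontology, source) pair it belongs to
            for source in entity_sources.get(entity_id, ()):
                counts[ontology][source] += 1
    return {ontology: dict(sources) for ontology, sources in counts.items()}
```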

get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False) → DataFrame

Get the occurrence of SBO terms for reactions.

Note: By default, reactions that consist entirely of interactor species will be excluded from the analysis. This is mandatory for most of the other occurrence and co-occurrence methods.

Parameters:

name_terms (bool, optional) – Whether to name the SBO terms, by default True
include_interactor_reactions (bool, optional) – Whether to include reactions that consist entirely of interactor species, by default False
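The default exclusion of interactor-only reactions can be sketched as a pre-filter over `(r_id, sbo_term_name)` pairs, followed by per-reaction term counts. The term name "interactor" and the helper `sbo_term_occurrence` are assumptions for illustration. The real method works on the reaction_species table and can also report SBO codes.

```python
from collections import Counter


def sbo_term_occurrence(reaction_species, include_interactor_reactions=False, interactor_term="interactor"):
    """Hypothetical helper: {r_id: Counter of SBO term names}, optionally dropping interactor-only reactions."""
    rows = list(reaction_species)
    terms_by_reaction = {}
    for r_id, term in rows:
        terms_by_reaction.setdefault(r_id, set()).add(term)
    # reactions whose every participant is an interactor
    interactor_only = {r for r, terms in terms_by_reaction.items() if terms == {interactor_term}}
    occurrence = {}
    for r_id, term in rows:
        if r_id in interactor_only and not include_interactor_reactions:
            continue
        occurrence.setdefault(r_id, Counter())[term] += 1
    return occurrence
```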

get_sbo_term_x_source_cooccurrence(name_terms: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) → DataFrame

Get SBO term × source co-occurrence matrix for reactions.

This method creates a co-occurrence matrix showing the relationship between SBO terms and sources (pathways) by calculating how many reactions are shared between each SBO term-source pair.

The method combines SBO term occurrence data with source occurrence data to create a cross-tabulation matrix where:
- Rows represent SBO terms
- Columns represent sources/pathways
- Values represent the number of reactions shared between each SBO term-source pair

Note: Reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

name_terms (bool, optional) – Whether to name the SBO terms using human-readable names, by default True
priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS

Returns:

Co-occurrence matrix with SBO terms as rows and sources as columns. Values represent the number of reactions shared between each SBO term-source pair.

Return type:

pd.DataFrame

Raises:

ValueError – If source tables are empty

Examples

>>> # Get SBO term × source co-occurrence for reactions
>>> cooccurrence_matrix = sbml_dfs.get_sbo_term_x_source_cooccurrence()
>>>
>>> # Use numeric SBO term codes instead of names
>>> numeric_cooccurrence = sbml_dfs.get_sbo_term_x_source_cooccurrence(name_terms=False)

get_source_cooccurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) → DataFrame

Get pathway co-occurrence matrix for a specific entity type.

This method creates a co-occurrence matrix showing which pathways share entities of the specified type, indicating pathway relationships and overlaps.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.

Returns:

Co-occurrence matrix with pathways as both rows and columns

Return type:

pd.DataFrame

Raises:

ValueError – If the source tables for the entity type are empty (indicating single-source model)

get_source_occurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904'], include_missing: bool = False, binarize: bool = False) → DataFrame

Get pathway occurrence summary for a specific entity type.

This method analyzes which pathways contain entities of the specified type, providing a summary of pathway occurrence patterns.

Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.

Parameters:

entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.
include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False
binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False

Returns:

Summary of pathway occurrence patterns. If binarize=True, values are 0 or 1.

Return type:

pd.DataFrame

Raises:

ValueError – If the source tables for the entity type are empty (indicating a single-source model)
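
The effect of binarize can be pictured with a small entity × pathway occurrence table. This plain-Python sketch is illustrative only. The helper source_occurrence and its input format are hypothetical, and it does not reproduce the napistu method, which returns a DataFrame and also supports include_missing.

```python
def source_occurrence(memberships, binarize=False):
    """Build an entity x pathway table of occurrence counts.

    memberships maps an entity ID to the pathway IDs it was found in; a
    pathway can repeat if it contributed the same entity more than once.
    """
    pathways = sorted({p for ps in memberships.values() for p in ps})
    table = {}
    for entity_id, ps in memberships.items():
        row = {p: ps.count(p) for p in pathways}
        if binarize:
            # collapse any positive count (1+) to 1
            row = {p: int(n > 0) for p, n in row.items()}
        table[entity_id] = row
    return table


memberships = {"R1": ["Reactome", "Reactome", "STRING"], "R2": ["BiGG"]}
counts = source_occurrence(memberships)
binary = source_occurrence(memberships, binarize=True)
```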

get_source_total_counts(entity_type: str) → Series

Get the total counts of each source for a given entity type.

Parameters:: entity_type (str) – The type of entity to get the total counts of (e.g., ‘species’, ‘reactions’)
Returns:: Series containing the total counts of each source, indexed by pathway_id
Return type:: pd.Series
Raises:: ValueError – If entity_type is invalid
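
Conceptually, a total count per source is a tally of pathway_id values over a long-format sources table. A minimal sketch, assuming the sources are already available as hypothetical (entity_id, pathway_id) pairs rather than napistu's own tables:

```python
from collections import Counter

# Hypothetical long-format sources table: one (entity_id, pathway_id) row per source
source_rows = [
    ("S1", "Reactome"),
    ("S1", "STRING"),
    ("S2", "Reactome"),
    ("S3", "BiGG"),
]

# Tally rows per pathway_id, analogous to value counts on a pathway_id column
total_counts = Counter(pathway_id for _, pathway_id in source_rows)
```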

get_sources(entity_type: str) → DataFrame | None

Get the unnested sources table for a given entity type.

Parameters:: entity_type (str) – The type of entity to get sources for (e.g., ‘species’, ‘reactions’)
Returns:: DataFrame containing the unnested sources table, or None if no sources are found
Return type:: pd.DataFrame | None
Raises:: ValueError – If entity_type is invalid or does not have a source attribute
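
Unnesting here means expanding each entity's nested list of sources into one row per entity/source pair. A plain-Python sketch of that transformation, assuming a hypothetical dict-of-lists input; the library's own source representation and its DataFrame output will differ:

```python
def unnest_sources(nested):
    """Flatten per-entity source lists into one row per (entity, source) pair.

    nested maps an entity ID to a list of source records (dicts) or None.
    Returns None when no entity has any recorded source.
    """
    rows = [
        {"entity_id": entity_id, **source}
        for entity_id, sources in nested.items()
        if sources  # skip entities without recorded sources
        for source in sources
    ]
    return rows or None


nested = {
    "S1": [{"pathway_id": "Reactome"}, {"pathway_id": "STRING"}],
    "S2": None,
}
rows = unnest_sources(nested)
```

Returning None for an empty result mirrors the documented "None if no sources found" behavior.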

get_species_features() → DataFrame

Get additional attributes of species.

Returns:: Species with additional features including: - species_type: Classification of the species (e.g., metabolite, protein)
Return type:: pd.DataFrame

get_summary() → Mapping[str, Any]

Get diagnostic statistics about the SBML_dfs.

Returns:: Dictionary of diagnostic statistics, including:
- n_species_types: Number of species types
- n_species_per_type: Number of species per type
- n_entity_types: Dictionary of entity counts by type
- dict_n_species_per_compartment: Number of species per compartment
- stats_species_per_reactions: Statistics on the number of participating species per reaction
- top10_species_per_reactions: Top 10 reactions by number of participating species
- sbo_name_counts: Count of reaction species by SBO term name
- stats_degree: Statistics on species connectivity
- top10_degree: Top 10 species by connectivity
- species_ontology_counts: Count of species by ontology identifiers
- data_summary: Summary of species and reaction data
Return type:: Mapping[str, Any]

get_table(entity_type: str, required_attributes: None | set[str] = None) → DataFrame

Get a table from the SBML_dfs object with optional attribute validation.

Parameters:

entity_type (str) – The type of entity table to retrieve (e.g., ‘species’, ‘reactions’)
required_attributes (Optional[Set[str]], optional) – Set of attributes that must be present in the table, by default None. Must be passed as a set, e.g. {‘id’}, not a string.

Returns:

The requested table

Return type:

pd.DataFrame

Raises:

ValueError – If entity_type is invalid or required attributes are missing
TypeError – If required_attributes is not a set
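
The requirement that required_attributes be a set guards against a common pitfall: a bare string would be treated as a collection of characters (set("id") is {"i", "d"}). The sketch below mimics the documented checks on a list of column names; check_required_attributes and the column names are hypothetical, not part of napistu.

```python
def check_required_attributes(columns, required_attributes=None):
    """Raise TypeError for non-set input and ValueError for missing attributes."""
    if required_attributes is None:
        return  # nothing to validate
    if not isinstance(required_attributes, set):
        # A string would otherwise be split into characters: set("id") == {"i", "d"}
        raise TypeError(
            f"required_attributes must be a set, got {type(required_attributes).__name__}"
        )
    missing = required_attributes - set(columns)
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")


# Hypothetical species-table columns
check_required_attributes(["s_name", "s_Identifiers"], {"s_name"})  # passes silently
```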

get_uri_urls(entity_type: str, entity_ids: Iterable[str] | None = None, required_ontology: str | None = None) → Series

Get reference URLs for specified entities.

Parameters:

entity_type (str) – Type of entity to get URLs for (e.g., ‘species’, ‘reactions’)
entity_ids (Optional[Iterable[str]], optional) – Specific entities to get URLs for, by default None (all entities)
required_ontology (Optional[str], optional) – Specific ontology to get URLs from, by default None

Returns:

Series mapping entity IDs to their reference URLs

Return type:

pd.Series

Raises:

ValueError – If entity_type is invalid

infer_sbo_terms()

Infer SBO Terms

Define SBO terms based on stoichiometry for reaction_species with missing terms. Modifies the SBML_dfs object in-place.

Return type:: None (modifies SBML_dfs object in-place)
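
Inferring an SBO term from stoichiometry can be sketched as a rule on its sign. The SBO identifiers below are real (SBO:0000010 reactant, SBO:0000011 product, SBO:0000019 modifier), but the exact mapping, in particular treating zero stoichiometry as a modifier, is an assumption of this sketch and not taken from napistu; infer_sbo_term is hypothetical.

```python
from typing import Optional

# Real SBO identifiers; mapping zero stoichiometry to "modifier" is an
# assumption of this sketch, not taken from napistu
SBO_REACTANT = "SBO:0000010"
SBO_PRODUCT = "SBO:0000011"
SBO_MODIFIER = "SBO:0000019"


def infer_sbo_term(stoichiometry: float, sbo_term: Optional[str] = None) -> str:
    """Keep an existing SBO term, otherwise infer one from the stoichiometry sign."""
    if sbo_term:
        return sbo_term  # only missing terms are filled in
    if stoichiometry < 0:
        return SBO_REACTANT  # consumed by the reaction
    if stoichiometry > 0:
        return SBO_PRODUCT  # produced by the reaction
    return SBO_MODIFIER  # neither consumed nor produced
```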

infer_uncompartmentalized_species_location()

Infer Uncompartmentalized Species Location

If the compartment of a subset of compartmentalized species was not specified, infer an appropriate compartment from other members of reactions they participate in.

This method modifies the SBML_dfs object in-place.

Return type:: None (modifies SBML_dfs object in-place)
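
One simple way to "infer an appropriate compartment from other members of reactions" is a majority vote over the known compartments of a reaction's participants. The sketch below uses that rule; the majority-vote heuristic and the helper infer_compartment are assumptions for illustration, and the method's actual rule may differ.

```python
from collections import Counter


def infer_compartment(reaction_members, compartment_of):
    """Return the most common known compartment among a reaction's members.

    reaction_members: IDs of the compartmentalized species in one reaction.
    compartment_of: maps each ID to a compartment ID, or None if unknown.
    Returns None when no member has a known compartment.
    """
    known = [compartment_of[m] for m in reaction_members if compartment_of.get(m) is not None]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]


# Hypothetical reaction in which C has no compartment yet
location = infer_compartment(["A", "B", "C"], {"A": "cytosol", "B": "cytosol", "C": None})
```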

name_compartmentalized_species()

Name Compartmentalized Species

Rename compartmentalized species if they have the same name as their species. Modifies the SBML_dfs object in-place.

Return type:: None (modifies SBML_dfs object in-place)
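
The renaming rule can be sketched for a single row: when a compartmentalized species name merely repeats the species name, the compartment is appended. The "name [compartment]" format and the helper name_compartmentalized are assumptions for illustration.

```python
def name_compartmentalized(sc_name, s_name, c_name):
    """Append the compartment to names that merely repeat the species name.

    The "name [compartment]" format is an assumption of this sketch.
    """
    if sc_name == s_name:
        return f"{s_name} [{c_name}]"
    return sc_name  # already distinct, leave unchanged


renamed = name_compartmentalized("ATP", "ATP", "cytosol")
```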

post_consensus_checks(entity_types: list[str] = ['species', 'compartments'], check_types: list[str] = ['source_cooccurrence', 'ontology_x_source_cooccurrence']) → None

Post-consensus checks

Perform checks on the SBML_dfs object after consensus building.

Parameters:

entity_types (list[str], optional) – Entity types to check
check_types (list[str], optional) – Check types to perform

Return type:

None

reaction_formulas(r_ids: str | list[str] | None = None) → Series

Reaction Formulas

Return human-readable formulas for reactions.

Parameters:

r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions

Returns:: formula_strs – Human-readable formulas of the requested reactions
Return type:: pd.Series
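
A human-readable formula can be assembled from each participant's name and stoichiometry: negative values go on the left, positive values on the right. In this sketch the "lhs -> rhs" syntax and dropping zero-stoichiometry participants (such as modifiers) are assumptions, so napistu's actual formula strings may look different; formula_string is hypothetical.

```python
def formula_string(participants):
    """Render (species name, stoichiometry) pairs as 'lhs -> rhs'.

    Negative stoichiometry marks a substrate and positive a product;
    zero-stoichiometry participants are left out in this sketch.
    """
    def side(terms):
        # Omit a coefficient of 1; otherwise print it compactly (2.0 -> "2")
        return " + ".join(
            name if abs(stoich) == 1 else f"{abs(stoich):g} {name}"
            for name, stoich in terms
        )

    substrates = [(n, s) for n, s in participants if s < 0]
    products = [(n, s) for n, s in participants if s > 0]
    return f"{side(substrates)} -> {side(products)}"


water = formula_string([("H2", -2), ("O2", -1), ("H2O", 2)])
```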

reaction_summaries(r_ids: str | list[str] | None = None) → DataFrame

Reaction Summary

Return a summary of reactions.

Parameters:

r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions

Returns:: reaction_summaries_df – A table indexed by r_id with columns: - r_name: str, name of the reaction - r_formula_str: str, human-readable formula of the reaction
Return type:: pd.DataFrame

remove_entities(entity_type: str, entity_ids: Iterable[str], remove_references: bool = True)

Public method to remove entities and optionally clean up orphaned references.

Special handling applies to “cofactors”: normally, removing a substrate or product from reaction_species would also remove the reaction, but for “cofactors” only the matching reaction_species entries are removed and the reactions they belong to are kept.

Parameters:

entity_type (str) – The entity type (e.g., ‘reactions’, ‘compartmentalized_species’, ‘species’, ‘compartments’, or “cofactors”)
entity_ids (Iterable[str]) – IDs of entities to remove
remove_references (bool, default True) – Whether to remove orphaned references after entity removal

remove_reactions_data(label: str): Remove reactions data by label.

remove_species_data(label: str): Remove species data by label.

remove_unused() → None

Find and remove unused entities from the model.

This method identifies unused entities using find_unused_entities and then cleans them up using the existing remove_entities method which properly handles cleanup of species_data and reactions_data as needed.

Returns:: Modifies the SBML_dfs object in-place
Return type:: None
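
Finding unused entities can be pictured as a cascade: compartmentalized species that no reaction references are unused, and species or compartments referenced only by those are unused as well. This sketch encodes that assumed definition with plain dicts and sets; find_unused_entities here is hypothetical, and napistu's own find_unused_entities may apply different rules (for example, to reactions).

```python
def find_unused_entities(compartmentalized_species, reaction_species_sc_ids,
                         species_ids, compartment_ids):
    """Cascade from reaction participants down to species and compartments.

    compartmentalized_species maps sc_id -> (s_id, c_id);
    reaction_species_sc_ids lists the sc_id of every reaction participant.
    """
    used_sc = set(reaction_species_sc_ids)
    unused_sc = set(compartmentalized_species) - used_sc
    kept = [compartmentalized_species[sc] for sc in used_sc if sc in compartmentalized_species]
    used_s = {s for s, _ in kept}
    used_c = {c for _, c in kept}
    return {
        "compartmentalized_species": unused_sc,
        "species": set(species_ids) - used_s,
        "compartments": set(compartment_ids) - used_c,
    }


# Hypothetical model: SC3 is never used in a reaction, and S4 has no compartmentalized form
unused = find_unused_entities(
    {"SC1": ("S1", "C1"), "SC2": ("S2", "C1"), "SC3": ("S3", "C2")},
    ["SC1", "SC2", "SC1"],
    {"S1", "S2", "S3", "S4"},
    {"C1", "C2"},
)
```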

Find entities and identifiers matching a set of query IDs.

Parameters:

id_table (pd.DataFrame) – DataFrame containing identifier mappings
identifiers (Optional[Union[str, list, set]], optional) – Identifiers to filter by, by default None
ontologies (Optional[Union[str, list, set]], optional) – Ontologies to filter by, by default None
bqbs (Optional[Union[str, list, set]], optional) – BQB terms to filter by, by default [BQB.IS, BQB.HAS_PART]

Returns:

Matching entities
Matching identifiers

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

Raises:

ValueError – If entity_type is invalid or ontologies are invalid
TypeError – If ontologies is not a set
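
The identifier search described above filters an identifier table on up to three columns, where each filter may be a single string or a collection and None disables it. A plain-Python sketch of that filtering over a list of row dicts; the helper filter_id_table, the column names, and the BQB strings ("BQB_IS", "BQB_HAS_PART") are hypothetical placeholders rather than napistu's actual constants.

```python
def _as_set(value):
    """Normalize None / str / list / set inputs to a set (None means no filter)."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def filter_id_table(rows, identifiers=None, ontologies=None, bqbs=("BQB_IS", "BQB_HAS_PART")):
    """Keep rows whose identifier, ontology and bqb all pass the given filters."""
    filters = {
        "identifier": _as_set(identifiers),
        "ontology": _as_set(ontologies),
        "bqb": _as_set(bqbs),
    }
    return [
        row for row in rows
        if all(allowed is None or row[col] in allowed for col, allowed in filters.items())
    ]


# Hypothetical identifier table
id_rows = [
    {"s_id": "S1", "ontology": "uniprot", "identifier": "P12345", "bqb": "BQB_IS"},
    {"s_id": "S2", "ontology": "chebi", "identifier": "15422", "bqb": "BQB_IS"},
    {"s_id": "S3", "ontology": "uniprot", "identifier": "P12345", "bqb": "BQB_IS_HOMOLOG_TO"},
]
matches = filter_id_table(id_rows, identifiers="P12345")
```

With the default bqbs filter, S3 is excluded even though its identifier matches.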

search_by_name(name: str, entity_type: str, partial_match: bool = True) → DataFrame

Find entities by exact or partial name match.

Parameters:

name (str) – Name to search for
entity_type (str) – Type of entity to search (e.g., ‘species’, ‘reactions’)
partial_match (bool, optional) – Whether to allow partial string matches, by default True

Returns:

Matching entities

Return type:

pd.DataFrame
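
The difference between partial and exact matching is easiest to see on a few names. In this sketch, matching is case-insensitive, which is an assumption and may not match search_by_name; search_names and the example names are hypothetical.

```python
def search_names(names, query, partial_match=True):
    """Return {entity_id: name} for names containing, or equal to, the query.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    q = query.lower()
    return {
        entity_id: name
        for entity_id, name in names.items()
        if (q in name.lower() if partial_match else name.lower() == q)
    }


names = {"S1": "ATP", "S2": "ATP synthase", "S3": "ADP"}
partial_hits = search_names(names, "atp")
exact_hits = search_names(names, "ATP", partial_match=False)
```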

select_species_data(species_data_table: str) → DataFrame

Select a species data table from the SBML_dfs object.

Parameters:: species_data_table (str) – Name of the species data table to select
Returns:: The selected species data table
Return type:: pd.DataFrame
Raises:: ValueError – If species_data_table is not found

show_summary() → None

Display a formatted summary of the SBML_dfs model.

This method chains together get_summary(), format_sbml_dfs_summary(), and show() to provide a convenient way to display network statistics.

Returns:: None; the formatted summary table is displayed in the console
Return type:: None

Examples

>>> sbml_dfs.show_summary()

species_status(s_id: str) → DataFrame

Species Status

Return all of the reactions a species participates in.

Parameters:: s_id (str) – A species ID
Returns:: One row per reaction the species participates in, with columns: - sc_name: str, name of the compartmentalized species - stoichiometry: float, stoichiometry of the species in the reaction - r_name: str, name of the reaction - r_formula_str: str, human-readable formula of the reaction
Return type:: pd.DataFrame

to_dict() → dict[str, DataFrame]

Return the 5 major SBML_dfs tables as a dictionary.

Returns:: Dictionary containing the core SBML_dfs tables: - ‘compartments’: Compartments table - ‘species’: Species table - ‘compartmentalized_species’: Compartmentalized species table - ‘reactions’: Reactions table - ‘reaction_species’: Reaction species table
Return type:: dict[str, pd.DataFrame]

to_pickle(path: str) → None

Save the SBML_dfs to a pickle file.

Parameters:: path (str) – Path where to save the pickle file

validate()

Validate the SBML_dfs structure and relationships.

Checks:
- Schema existence
- Required tables presence
- Individual table structure
- Primary key uniqueness
- Foreign key relationships
- Optional data table validity
- Reaction species validity

Raises:: ValueError – If any validation check fails
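
Two of the listed checks, primary key uniqueness and foreign key relationships, reduce to simple set logic. A minimal sketch on plain lists of IDs; check_primary_key and check_foreign_key are hypothetical helpers, not napistu functions, and the real method operates on DataFrame indices and columns.

```python
from collections import Counter


def check_primary_key(ids, table):
    """Raise ValueError if any ID appears more than once in a table's index."""
    dups = sorted(i for i, n in Counter(ids).items() if n > 1)
    if dups:
        raise ValueError(f"{table}: duplicated primary keys {dups}")


def check_foreign_key(fk_values, pk_values, table, ref_table):
    """Raise ValueError if fk_values reference IDs absent from ref_table."""
    dangling = sorted(set(fk_values) - set(pk_values))
    if dangling:
        raise ValueError(f"{table} references IDs missing from {ref_table}: {dangling}")


# Passes: unique compartment IDs, and every c_id used by compartmentalized species exists
check_primary_key(["C1", "C2"], "compartments")
check_foreign_key(["C1", "C1", "C2"], ["C1", "C2"], "compartmentalized_species", "compartments")
```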

validate_and_resolve()

Validate and attempt to automatically fix common issues.

This method iteratively:
1. Attempts validation
2. If validation fails, tries to resolve the issue
3. Repeats until validation passes or the issue cannot be resolved

Raises:: ValueError – If validation fails and cannot be automatically resolved
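
The iterative strategy described above amounts to a retry loop around validation. A minimal sketch in which validate and resolve are hypothetical callables standing in for the real checks and fixes; the max_iterations guard is an addition of this sketch to rule out endless loops, not a documented parameter.

```python
def validate_and_resolve(validate, resolve, max_iterations=10):
    """Alternate validation and resolution until validation passes.

    validate() raises ValueError on failure; resolve(error) returns True if
    it fixed something and False if the issue cannot be resolved.
    """
    for _ in range(max_iterations):
        try:
            validate()
            return  # validation passed
        except ValueError as error:
            if not resolve(error):
                raise  # unresolvable: re-raise the original validation error
    raise ValueError(f"validation still failing after {max_iterations} attempts")
```

Re-raising the original error, rather than a generic one, keeps the message that explains which check failed.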

_optional_entities: set[str]

_required_entities: set[str]

compartments: DataFrame

reaction_species: DataFrame

reactions: DataFrame

reactions_data: dict[str, DataFrame]

schema: dict

species: DataFrame

species_data: dict[str, DataFrame]