napistu.sbml_dfs_core
The core SBML DataFrame class for representing a SBML model as a collection of pandas DataFrames.
Classes
- SBML_dfs
A class representing a SBML model as a collection of pandas DataFrames.
- class napistu.sbml_dfs_core.SBML_dfs(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False)
Bases:
object
Systems Biology Markup Language Model Data Frames.
A class representing a SBML model as a collection of pandas DataFrames. This class provides methods for manipulating and analyzing biological pathway models with support for species, reactions, compartments, and their relationships.
- compartments
Sub-cellular compartments in the model, indexed by compartment ID (c_id)
- Type:
pd.DataFrame
- species
Molecular species in the model, indexed by species ID (s_id)
- Type:
pd.DataFrame
- species_data
Additional data for species. Each DataFrame is indexed by species_id (s_id)
- Type:
Dict[str, pd.DataFrame]
- reactions
Reactions in the model, indexed by reaction ID (r_id)
- Type:
pd.DataFrame
- reactions_data
Additional data for reactions. Each DataFrame is indexed by reaction_id (r_id)
- Type:
Dict[str, pd.DataFrame]
- reaction_species
One entry per species participating in a reaction, indexed by reaction-species ID (rsc_id)
- Type:
pd.DataFrame
- schema
Dictionary representing the structure of the other attributes and meaning of their variables
- Type:
dict
- Public Methods (alphabetical)
- ----------------------------
- add_reactions_data(label, data)
Add a new reactions data table to the model with validation.
- add_species_data(label, data)
Add a new species data table to the model with validation.
- copy
Return a deep copy of the SBML_dfs object.
- export_sbml_dfs(model_prefix, outdir, overwrite=False, dogmatic=True)
Export the SBML_dfs model and its tables to files in a specified directory.
- find_entity_references(entity_type, entity_ids)
Find all entities that directly depend on the set of requested entities.
- from_edgelist(interaction_edgelist, species_df, compartments_df, model_source, interaction_edgelist_defaults=INTERACTION_EDGELIST_DEFAULTS, keep_species_data=False, keep_reactions_data=False, require_edgelist_consistency=False)
Create SBML_dfs from interaction edgelist.
- from_pickle(path)
Load an SBML_dfs from a pickle file.
- get_characteristic_species_ids(dogmatic=True)
Return characteristic systematic identifiers for molecular species, optionally using a strict or loose definition.
- get_cspecies_features
Compute and return additional features for compartmentalized species, such as degree and type.
- get_identifiers(id_type)
Retrieve a table of identifiers for a specified entity type (e.g., species or reactions).
- get_ontology_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False)
Get ontology co-occurrence matrix for a specific entity type.
- get_ontology_occurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False)
Get ontology occurrence summary for a specific entity type.
- get_ontology_x_source_cooccurrence(entity_type, stratify_by_bqb=True, allow_col_multindex=False, characteristic_only=False, dogmatic=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)
Get ontology × source co-occurrence matrix for a specific entity type.
- get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False)
Get SBO term occurrence summary for reactions.
- get_sbo_term_x_source_cooccurrence(name_terms=True, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)
Get SBO term × source co-occurrence matrix for reactions.
- get_source_cooccurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)
Get pathway co-occurrence matrix for a specific entity type.
- get_source_occurrence(entity_type, priority_pathways=DEFAULT_PRIORITIZED_PATHWAYS)
Get pathway occurrence summary for a specific entity type.
- get_sources(entity_type)
Get the unnested sources table for a given entity type.
- get_source_total_counts(entity_type)
Get the total counts of each source for a given entity type.
- get_species_features
Compute and return additional features for species, such as species type.
- get_summary
Return a dictionary of diagnostic statistics summarizing the SBML_dfs structure.
- get_table(entity_type, required_attributes=None)
Retrieve a table for a given entity type, optionally validating required attributes.
- get_uri_urls(entity_type, entity_ids=None, required_ontology=None)
Return reference URLs for specified entities, optionally filtered by ontology.
- infer_sbo_terms
Infer and fill in missing SBO terms for reaction species based on stoichiometry.
- infer_uncompartmentalized_species_location
Infer and assign compartments for compartmentalized species with missing compartment information.
- name_compartmentalized_species
Rename compartmentalized species to include compartment information if needed.
- post_consensus_checks(entity_types=[SBML_DFS.SPECIES, SBML_DFS.COMPARTMENTS], check_types=[CONSENSUS_CHECKS.SOURCE_COOCCURRENCE, CONSENSUS_CHECKS.ONTOLOGY_X_SOURCE_COOCCURRENCE])
Perform checks on the SBML_dfs object after consensus building.
- reaction_formulas(r_ids=None)
Generate human-readable reaction formulas for specified reactions.
- reaction_summaries(r_ids=None)
Return a summary DataFrame for specified reactions, including names and formulas.
- remove_entities(entity_type, entity_ids, remove_species=False)
Remove specified entities and optionally remove unused species.
- remove_reactions_data(label)
Remove a reactions data table by label.
- remove_species_data(label)
Remove a species data table by label.
- remove_unused
Find and remove unused entities from the model with cascading cleanup.
- search_by_ids(id_table, identifiers=None, ontologies=None, bqbs=None)
Find entities and identifiers matching a set of query IDs.
- search_by_name(name, entity_type, partial_match=True)
Find entities by exact or partial name match.
- select_species_data(species_data_table)
Select a species data table from the SBML_dfs object by name.
- show_summary
Display a formatted summary of the SBML_dfs model.
- species_status(s_id)
Return all reactions a species participates in, with stoichiometry and formula information.
- to_dict
Return the 5 major SBML_dfs tables as a dictionary.
- to_pickle(path)
Save the SBML_dfs to a pickle file.
- validate
Validate the SBML_dfs structure and relationships.
- validate_and_resolve
Validate and attempt to automatically fix common issues.
- Private/Hidden Methods (alphabetical, appear after public methods)
- -----------------------------------------------------------------
- _attempt_resolve(e)
- _edgelist_assemble_sbml_model(compartments, species, comp_species, reactions, reaction_species, species_data, reactions_data, keep_species_data, keep_reactions_data, extra_columns)
- _find_invalid_entities_by_reference(entity_type, reference_type, reference_ids)
- _find_underspecified_reactions_by_reference(reference_type, reference_ids)
- _get_entity_data(entity_type, label)
- _get_identifiers_table_for_ontology_occurrence(entity_type, characteristic_only=False, dogmatic=True)
- _get_non_interactor_reactions
- _remove_entities_direct(entity_type, entity_ids)
- _remove_entity_data(entity_type, label)
- _validate_entity_data_access(entity_type, label)
- _validate_identifiers
- _validate_pk_fk_correspondence
- _validate_r_ids(r_ids)
- _validate_reaction_species
- _validate_reactions_data(reactions_data_table)
- _validate_sources
- _validate_species_data(species_data_table)
- _validate_table(table_name)
- classmethod _edgelist_assemble_sbml_model(compartments: DataFrame, species: DataFrame, comp_species: DataFrame, reactions: DataFrame, reaction_species: DataFrame, species_data: DataFrame, reactions_data: DataFrame, keep_species_data: bool | str, keep_reactions_data: bool | str, extra_columns: dict[str, list[str]], model_source: Source) SBML_dfs
Assemble the final SBML_dfs object.
- Parameters:
compartments (pd.DataFrame) – Processed compartments data
species (pd.DataFrame) – Processed species data
comp_species (pd.DataFrame) – Compartmentalized species data
reactions (pd.DataFrame) – Reactions data
reaction_species (pd.DataFrame) – Reaction species relationships
species_data (pd.DataFrame) – Extra species data to include
reactions_data (pd.DataFrame) – Extra reactions data to include
keep_species_data (bool or str) – Label for species extra data
keep_reactions_data (bool or str) – Label for reactions extra data
extra_columns (dict) – Dictionary containing lists of extra column names
- Returns:
Validated SBML data structure
- Return type:
SBML_dfs
- classmethod from_edgelist(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, model_source: Source, interaction_edgelist_defaults: dict[str, Any] = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}, keep_species_data: bool | str = False, keep_reactions_data: bool | str = False, require_edgelist_consistency: bool = False) SBML_dfs
Create SBML_dfs from interaction edgelist.
Combines a set of molecular interactions into a mechanistic SBML_dfs model by processing interaction data, species information, and compartment definitions.
- Parameters:
interaction_edgelist (pd.DataFrame) – Table containing molecular interactions with columns:
  - name_upstream : str, matches “s_name” from species_df
  - name_downstream : str, matches “s_name” from species_df
  - r_name : str, name for the interaction
  - r_Identifiers : Identifiers, supporting identifiers
  - compartment_upstream : str, matches “c_name” from compartments_df
  - compartment_downstream : str, matches “c_name” from compartments_df
  - sbo_term_name_upstream : str, SBO term defining interaction type
  - sbo_term_name_downstream : str, SBO term defining interaction type
  - stoichiometry_upstream : float, stoichiometry of upstream species
  - stoichiometry_downstream : float, stoichiometry of downstream species
  - r_isreversible : bool, whether reaction is reversible
species_df (pd.DataFrame) – Table defining molecular species with columns:
  - s_name : str, name of molecular species
  - s_Identifiers : Identifiers, species identifiers
compartments_df (pd.DataFrame) – Table defining compartments with columns:
  - c_name : str, name of compartment
  - c_Identifiers : Identifiers, compartment identifiers
model_source (Source) – Source annotations for the data source
interaction_edgelist_defaults (dict[str, Any], default INTERACTION_EDGELIST_DEFAULTS) – Default values for interaction edgelist columns
keep_species_data (bool or str, default False) – Whether to preserve extra species columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.
keep_reactions_data (bool or str, default False) – Whether to preserve extra reaction columns. If True, saves as ‘source’ label. If string, uses as custom label. If False, discards extra data.
require_edgelist_consistency (bool, default False) – Whether to require the edgelist to be consistent with the species and compartments dataframes. Leaving this as False is useful when there may be reasonable departures between the edgelist and the species and compartments dataframes but the user wants to create an SBML_dfs model anyway.
- Returns:
Validated SBML data structure containing compartments, species, compartmentalized species, reactions, and reaction species tables.
- Return type:
SBML_dfs
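As a rough illustration of the inputs described above, the sketch below builds a toy edgelist with pandas and fills in any missing columns from the documented defaults. The helper `fill_edgelist_defaults` is hypothetical, and how `from_edgelist` applies `interaction_edgelist_defaults` internally is an assumption here. The `*_Identifiers` columns are left out because they require napistu `Identifiers` objects, and the SBO term names in the toy data are only illustrative.

```python
import pandas as pd

# Default values copied from the from_edgelist signature above.
INTERACTION_EDGELIST_DEFAULTS = {
    "compartment_downstream": "cellular_component",
    "compartment_upstream": "cellular_component",
    "r_isreversible": False,
    "sbo_term_name_downstream": "modified",
    "sbo_term_name_upstream": "modifier",
    "stoichiometry_downstream": 0,
    "stoichiometry_upstream": 0,
}


def fill_edgelist_defaults(edgelist: pd.DataFrame, defaults: dict) -> pd.DataFrame:
    """Hypothetical helper: add each missing column using its default value."""
    filled = edgelist.copy()
    for column, value in defaults.items():
        if column not in filled.columns:
            filled[column] = value
    return filled


# Toy inputs (Identifiers columns omitted; see the note above).
species_df = pd.DataFrame({"s_name": ["A", "B", "C"]})
interaction_edgelist = pd.DataFrame(
    {
        "name_upstream": ["A", "B"],
        "name_downstream": ["B", "C"],
        "r_name": ["A -> B", "B -> C"],
        "sbo_term_name_upstream": ["stimulator", "inhibitor"],
    }
)

filled = fill_edgelist_defaults(interaction_edgelist, INTERACTION_EDGELIST_DEFAULTS)

# Every upstream/downstream name should match an s_name in species_df.
edge_names = set(filled["name_upstream"]) | set(filled["name_downstream"])
unknown_names = edge_names - set(species_df["s_name"])
```

Columns already present in the edgelist (here `sbo_term_name_upstream`) are left untouched, and an empty `unknown_names` set means the edgelist and species table agree on names.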
- classmethod from_pickle(path: str) SBML_dfs
Load an SBML_dfs from a pickle file.
- Parameters:
path (str) – Path to the pickle file
- Returns:
The loaded SBML_dfs object
- Return type:
SBML_dfs
- __init__(sbml_model: sbml.SBML | MutableMapping[str, pd.DataFrame | dict[str, pd.DataFrame]], model_source: Source, validate: bool = True, resolve: bool = True, verbose: bool = False) None
Initialize a SBML_dfs object from a SBML model or dictionary of tables.
- Parameters:
sbml_model (Union[sbml.SBML, MutableMapping[str, Union[pd.DataFrame, Dict[str, pd.DataFrame]]]]) – Either a SBML model produced by sbml.SBML() or a dictionary containing tables following the sbml_dfs schema
validate (bool, optional) – Whether to validate the model structure and relationships, by default True
resolve (bool, optional) – Whether to attempt automatic resolution of common issues, by default True
verbose (bool, optional) – Whether to print extra reporting, by default False
- Raises:
ValueError – If the model structure is invalid and cannot be resolved
- _attempt_resolve(e)
- _find_invalid_entities_by_reference(entity_type: str, reference_type: str, reference_ids: set[str]) set[str]
Find and return orphaned entities based on broken foreign key references.
- Parameters:
entity_type (str) – The entity type to check for orphans (the table with primary keys)
reference_type (str) – The type of foreign key reference to check
reference_ids (set[str]) – Specific reference IDs that were removed
- Returns:
Set of primary keys that are orphaned and should be removed
- Return type:
set[str]
- _find_underspecified_reactions_by_reference(reference_type: str, reference_ids: set[str]) set[str]
Find reactions that would become underspecified after removing species.
- Parameters:
reference_type (str) – The type of foreign key reference to check
reference_ids (set[str]) – Specific reference IDs that were removed
- Returns:
set[str] – Set of reaction IDs that were orphaned and removed
set[str] – Set of reaction-species IDs that were orphaned and removed
- _get_data_summary()
Summarize the data tables in the SBML_dfs object
- _get_entity_data(entity_type: str, label: str) DataFrame
Get data from species_data or reactions_data by table name and label.
- Parameters:
entity_type (str) – Name of the table to get data from (‘species’ or ‘reactions’)
label (str) – Label of the data to retrieve
- Returns:
The requested data as a DataFrame
- Return type:
pd.DataFrame
- Raises:
ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist
- _get_identifiers_table_for_ontology_occurrence(entity_type: str, characteristic_only: bool = False, dogmatic: bool = True) DataFrame
Get the appropriate identifiers table for ontology analysis.
This method handles the common logic for determining which identifiers table to use based on the characteristic_only and dogmatic parameters.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True
- Returns:
The appropriate identifiers table for ontology analysis
- Return type:
pd.DataFrame
- Raises:
ValueError – If the entity type is invalid
- _get_non_interactor_reactions() DataFrame
Get reactions table filtered to exclude reactions that are all interactors.
- Returns:
Reactions table with non-interactor reactions only
- Return type:
pd.DataFrame
- _remove_entities_direct(entity_type: str, entity_ids: list[str])
Directly remove entities without cascading cleanup.
- Parameters:
entity_type (str) – The entity type to remove
entity_ids (list[str]) – IDs of entities to remove
- _remove_entity_data(entity_type: str, label: str) None
Remove data from species_data or reactions_data by table name and label.
- Parameters:
entity_type (str) – Name of the table to remove data from (‘species’ or ‘reactions’)
label (str) – Label of the data to remove
- Raises:
ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist
- _validate_entity_data_access(entity_type: str, label: str) MutableMapping[str, DataFrame] | None
Validate entity type and label, and return the data dictionary if valid.
- Parameters:
entity_type (str) – Name of the table to access (‘species’ or ‘reactions’)
label (str) – Label of the data to access
- Returns:
The data dictionary if entity_type and label are valid
- Return type:
MutableMapping[str, pd.DataFrame]
- Raises:
ValueError – If entity_type is not ‘species’ or ‘reactions’, or if label doesn’t exist
- _validate_identifiers()
Validate identifiers in the model
Iterates through all tables and checks if the identifier columns are valid.
- Raises:
ValueError – missing identifiers in the table
- _validate_pk_fk_correspondence()
Check bidirectional primary key and foreign key correspondence for all tables in the schema.
Validates:
  1. All foreign keys exist as primary keys (standard FK constraint)
  2. All primary keys are referenced as foreign keys (referential completeness)
Raises ValueError if any FK constraint or referential completeness violations are found.
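The two directions of this check can be sketched with plain set arithmetic on toy tables. This is only an illustration of the idea, not napistu's implementation; the helper `check_pk_fk` and the toy data are hypothetical.

```python
import pandas as pd


def check_pk_fk(pk_ids, fk_ids):
    """Hypothetical helper: compare a primary-key index with a foreign-key column.

    Returns (dangling_fks, unreferenced_pks) as two sets.
    """
    pk_set, fk_set = set(pk_ids), set(fk_ids)
    dangling_fks = fk_set - pk_set      # FK values with no matching PK
    unreferenced_pks = pk_set - fk_set  # PK values never used as an FK
    return dangling_fks, unreferenced_pks


# Toy tables: reactions are keyed by r_id; reaction_species refers to r_id.
reactions = pd.DataFrame(
    {"r_name": ["rxn1", "rxn2"]},
    index=pd.Index(["R1", "R2"], name="r_id"),
)
reaction_species = pd.DataFrame(
    {"r_id": ["R1", "R1", "R3"]},
    index=pd.Index(["RSC1", "RSC2", "RSC3"], name="rsc_id"),
)

dangling, unreferenced = check_pk_fk(reactions.index, reaction_species["r_id"])
```

Here `R3` violates the standard FK constraint (it is referenced but not defined) and `R2` violates referential completeness (it is defined but never referenced); a validator in this style would raise if either set were non-empty.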
- _validate_r_ids(r_ids: str | list[str] | None) list[str]
- _validate_reaction_species()
- _validate_reactions_data(reactions_data_table: DataFrame)
Validates reactions data attribute
- Parameters:
reactions_data_table (pd.DataFrame) – a reactions data table
- Raises:
ValueError – If r_id is not the index name, the r_id index contains duplicates, or an r_id is not in the reactions table
- _validate_sources()
Validate sources in the model
Iterates through all tables and checks if the source columns are valid.
- Raises:
ValueError – missing sources in the table
- _validate_species_data(species_data_table: DataFrame)
Validates species data attribute
- Parameters:
species_data_table (pd.DataFrame) – a species data table
- Raises:
ValueError – If s_id is not the index name, the s_id index contains duplicates, or an s_id is not in the species table
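A minimal sketch of the three documented conditions, written against plain pandas objects. The function `check_species_data`, the wrapper `first_error`, and the error messages are hypothetical stand-ins, not napistu's own code.

```python
import pandas as pd


def check_species_data(species_data_table: pd.DataFrame, species: pd.DataFrame) -> None:
    """Hypothetical re-statement of the three documented checks."""
    if species_data_table.index.name != "s_id":
        raise ValueError("species data must be indexed by 's_id'")
    if species_data_table.index.duplicated().any():
        raise ValueError("'s_id' index contains duplicates")
    missing = set(species_data_table.index) - set(species.index)
    if missing:
        raise ValueError(f"s_id values not in species table: {sorted(missing)}")


def first_error(table: pd.DataFrame, species: pd.DataFrame):
    """Return the error message raised for a table, or None if it passes."""
    try:
        check_species_data(table, species)
    except ValueError as err:
        return str(err)
    return None


species = pd.DataFrame(
    {"s_name": ["A", "B"]}, index=pd.Index(["S1", "S2"], name="s_id")
)

valid = pd.DataFrame({"score": [0.1, 0.9]}, index=pd.Index(["S1", "S2"], name="s_id"))
duplicated = pd.DataFrame({"score": [0.1, 0.2]}, index=pd.Index(["S1", "S1"], name="s_id"))
unknown = pd.DataFrame({"score": [0.5]}, index=pd.Index(["S9"], name="s_id"))
unnamed = pd.DataFrame({"score": [0.5]}, index=["S1"])
```

Calling `first_error` on each toy table shows which condition fails first; the same pattern would apply to reactions data with `r_id` in place of `s_id`.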
- _validate_table(table_name: str) None
Validate a table in this SBML_dfs object against its schema.
This is an internal method that validates a table that is part of this SBML_dfs object against the schema stored in self.schema.
- Parameters:
table_name (str) – Name of the table to validate
- Raises:
ValueError – If the table does not conform to its schema
- add_reactions_data(label: str, data: DataFrame)
Add additional reaction data with validation.
- Parameters:
label (str) – Label for the new data
data (pd.DataFrame) – Data to add, must be indexed by reaction_id
- Raises:
ValueError – If the data is invalid or label already exists
- add_species_data(label: str, data: DataFrame)
Add additional species data with validation.
- Parameters:
label (str) – Label for the new data
data (pd.DataFrame) – Data to add, must be indexed by species_id
- Raises:
ValueError – If the data is invalid or label already exists
- copy()
Return a deep copy of the SBML_dfs object.
- Returns:
A deep copy of the current SBML_dfs object.
- Return type:
SBML_dfs
- export_sbml_dfs(model_prefix: str, outdir: str, overwrite: bool = False, dogmatic: bool = True) None
Export SBML_dfs
Export summaries of species identifiers and each table underlying an SBML_dfs pathway model
- Parameters:
model_prefix (str) – Label to prepend to all exported files
outdir (str) – Path to an existing directory where results should be saved
overwrite (bool) – Whether to overwrite the directory if it already exists, by default False
dogmatic (bool) – If True, treat genes, transcripts, and proteins as separate species. If False, treat them interchangeably. By default True
- Return type:
None
- find_entity_references(entity_type: str, entity_ids: list[str]) dict[str, set[str]]
Find all entities that directly depend on the set of requested entities.
- Parameters:
entity_type (str) – The initial entity type to remove
entity_ids (list[str]) – IDs of entities to remove
- Returns:
Dictionary mapping entity types to sets of IDs that directly depend on the requested entities
- Return type:
dict[str, set[str]]
- get_characteristic_species_ids(dogmatic: bool = True) DataFrame
Get Characteristic Species IDs
List the systematic identifiers which are characteristic of molecular species, e.g., excluding subcomponents, and optionally, treating proteins, transcripts, and genes equivalently.
Characteristic identifiers include:
  - the defining IDs: BQB_IS if dogmatic is True; BQB_IS, BQB_IS_ENCODED_BY, and BQB_ENCODES if dogmatic is False
  - small complexes (BQB_HAS_PART)
This function is useful for pulling out the species which are closely associated with specific proteins, metabolites, etc.
- Parameters:
dogmatic (bool, default=True) – Whether to use the dogmatic flag to determine which BQB attributes are valid.
- Returns:
A DataFrame containing the systematic identifiers which are characteristic of molecular species.
- Return type:
pd.DataFrame
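The selection rule described above can be sketched as a filter over a long-format identifiers table. Everything below is an assumption for illustration: the column names (`s_id`, `bqb`, `identifier`), the use of the qualifier names as plain strings, the helper `characteristic_ids`, and in particular the `max_complex_size` cutoff, since the source does not say how "small" a complex must be.

```python
import pandas as pd


def characteristic_ids(ids: pd.DataFrame, dogmatic: bool = True, max_complex_size: int = 4) -> pd.DataFrame:
    """Hypothetical filter mirroring the documented rule.

    max_complex_size is an assumed threshold, not a documented value.
    """
    if dogmatic:
        defining = {"BQB_IS"}
    else:
        defining = {"BQB_IS", "BQB_IS_ENCODED_BY", "BQB_ENCODES"}

    is_defining = ids["bqb"].isin(defining)
    is_part = ids["bqb"] == "BQB_HAS_PART"
    # Number of BQB_HAS_PART rows per species, broadcast back to each row.
    n_parts = is_part.groupby(ids["s_id"]).transform("sum")
    is_small_complex_part = is_part & (n_parts <= max_complex_size)
    return ids[is_defining | is_small_complex_part]


# Toy table: S1 is a protein with a gene annotation, S2 a metabolite,
# S3 a two-member complex, S4 a five-member complex.
identifiers = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S3", "S3"] + ["S4"] * 5,
        "bqb": ["BQB_IS", "BQB_IS_ENCODED_BY", "BQB_IS", "BQB_HAS_PART", "BQB_HAS_PART"]
        + ["BQB_HAS_PART"] * 5,
        "identifier": [f"id_{i}" for i in range(10)],
    }
)

strict = characteristic_ids(identifiers, dogmatic=True)
loose = characteristic_ids(identifiers, dogmatic=False)
```

With `dogmatic=False` the gene-level annotation of S1 is retained in addition to its BQB_IS row, while the parts of the larger complex S4 are dropped in both modes under the assumed cutoff.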
- get_cspecies_features() DataFrame
Get additional attributes of compartmentalized species.
- Returns:
Compartmentalized species with additional features including:
  - sc_degree: Number of reactions the species participates in
  - sc_children: Number of reactions where species is consumed
  - sc_parents: Number of reactions where species is produced
  - species_type: Classification of the species
- Return type:
pd.DataFrame
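The degree-style features can be sketched from a toy reaction-species table. The column names (`sc_id`, `r_id`, `stoichiometry`) and the sign convention (negative stoichiometry means consumed, positive means produced) are assumptions for this illustration rather than documented behavior.

```python
import pandas as pd

# Toy reaction-species table: R1 converts SC1 to SC2, R2 converts SC2 to SC3.
reaction_species = pd.DataFrame(
    {
        "r_id": ["R1", "R1", "R2", "R2"],
        "sc_id": ["SC1", "SC2", "SC2", "SC3"],
        "stoichiometry": [-1, 1, -1, 1],
    }
)

consumed = reaction_species[reaction_species["stoichiometry"] < 0]
produced = reaction_species[reaction_species["stoichiometry"] > 0]

# Count distinct reactions per compartmentalized species; species that are
# absent from a subset get NaN after alignment, which is filled with 0.
features = (
    pd.DataFrame(
        {
            "sc_degree": reaction_species.groupby("sc_id")["r_id"].nunique(),
            "sc_children": consumed.groupby("sc_id")["r_id"].nunique(),
            "sc_parents": produced.groupby("sc_id")["r_id"].nunique(),
        }
    )
    .fillna(0)
    .astype(int)
)
```

In this toy model SC2 sits in the middle of the chain, so it has degree 2 with one reaction on each side.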
- get_identifiers(id_type, filter_by_bqb=None, add_names=True, keep_source=False) DataFrame
Get identifiers from a specified entity type.
- Parameters:
id_type (str) – Type of entity to get identifiers for (e.g., ‘species’, ‘reactions’)
filter_by_bqb (None, list, or str, optional) – Filter identifiers by biological qualifier (BQB) terms:
  - None: No filtering, return all identifiers (default)
  - list: List of specific BQB terms to include
  - “defining”: Use BQB_DEFINING_ATTRS (strict defining identifiers)
  - “loose”: Use BQB_DEFINING_ATTRS_LOOSE (includes encoded/encodes relationships)
add_names (bool, optional) – Whether to add entity names and other metadata from the entity table, by default True
keep_source (bool, optional) – Whether to include the source column in the output, by default False. Only applies when add_names=True. The source column is excluded by default as it’s typically not needed for identifier lookups.
- Returns:
Table of identifiers for the specified entity type. If add_names=True, includes entity metadata; if add_names=False, returns only the core identifier data.
- Return type:
pd.DataFrame
- Raises:
ValueError – If id_type is invalid, identifiers are malformed, or filter_by_bqb is invalid
- get_ontology_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True) DataFrame
Get ontology co-occurrence matrix for a specific entity type.
This method creates a co-occurrence matrix showing which ontologies share entities of the specified type, indicating ontology relationships and overlaps.
Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False
characteristic_only (bool, optional) – Whether to use only characteristic identifiers (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering, by default True
- Returns:
Co-occurrence matrix with ontologies as both rows and columns
- Return type:
pd.DataFrame
- Raises:
ValueError – If the entity type is invalid or identifiers are malformed
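One common way to build such a matrix is to binarize an entity × ontology occurrence table and multiply it by its own transpose. The sketch below shows that technique on toy data; the column names and ontology labels are illustrative, and whether napistu computes the matrix this way is an assumption.

```python
import pandas as pd

# Toy long-format identifiers: one row per (species, ontology) annotation.
identifiers = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S3"],
        "ontology": ["uniprot", "ensembl_gene", "uniprot", "chebi"],
    }
)

# Entities as rows, ontologies as columns, counts as values.
occurrence = pd.crosstab(identifiers["s_id"], identifiers["ontology"])
binary = (occurrence > 0).astype(int)

# Entry (i, j) counts the entities annotated in both ontology i and ontology j;
# the diagonal holds the number of entities per ontology.
cooccurrence = binary.T.dot(binary)
```

Only S1 carries both a `uniprot` and an `ensembl_gene` annotation, so that off-diagonal entry is 1, while `chebi` shares no entities with the other two ontologies.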
- get_ontology_occurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, include_missing: bool = False, binarize: bool = False) DataFrame
Get ontology occurrence summary for a specific entity type.
This method analyzes which ontologies are associated with entities of the specified type, providing a summary of ontology occurrence patterns.
Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index, by default False
characteristic_only (bool, optional) – Whether to include only characteristic identifiers (only supported for species). If True, returns only the characteristic identifiers (BQB_IS and small-complex BQB_HAS_PART annotations); if False, returns all identifiers. By default False
dogmatic (bool, optional) – Whether to use a strict or loose definition of characteristic identifiers. Only applicable if characteristic_only is True and entity_type is SBML_DFS.SPECIES.
include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False
binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False
- Returns:
Summary of ontology occurrence patterns with entities as rows and ontologies as columns. If binarize=True, values are 0 or 1.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the entity type is invalid or identifiers are malformed
- get_ontology_x_source_cooccurrence(entity_type: str, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, characteristic_only: bool = False, dogmatic: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame
Get ontology × source co-occurrence matrix for a specific entity type.
This method creates a co-occurrence matrix showing the relationship between ontologies and sources (pathways) by calculating how many entities of the specified type are shared between each ontology-source pair.
The method combines ontology occurrence data with source occurrence data to create a cross-tabulation matrix where:
  - Rows represent ontologies
  - Columns represent sources/pathways
  - Values represent the number of entities shared between each ontology-source pair
Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
stratify_by_bqb (bool, optional) – Whether to stratify by BQB (Biological Qualifier) terms in ontology analysis, by default True
allow_col_multindex (bool, optional) – Whether to allow column multi-index in ontology analysis, by default False
characteristic_only (bool, optional) – Whether to use only characteristic identifiers in ontology analysis (only supported for species), by default False
dogmatic (bool, optional) – Whether to use dogmatic identifier filtering in ontology analysis, by default True
priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS
- Returns:
Co-occurrence matrix with ontologies as rows and sources as columns. Values represent the number of entities shared between each ontology-source pair.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the entity type is invalid, identifiers are malformed, or source tables are empty
Examples
>>> # Get ontology × source co-occurrence for species
>>> cooccurrence_matrix = sbml_dfs.get_ontology_x_source_cooccurrence(SBML_DFS.SPECIES)
>>>
>>> # Use characteristic species only
>>> char_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, characteristic_only=True
... )
>>>
>>> # Custom pathway priority
>>> custom_cooccurrence = sbml_dfs.get_ontology_x_source_cooccurrence(
...     SBML_DFS.SPECIES, priority_pathways=['reactome', 'kegg']
... )
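The cross-tabulation itself can be pictured as a matrix product of two binary occurrence tables that share an entity index. This is a sketch of that idea on hand-built toy tables, not napistu's code; the ontology and source labels are illustrative.

```python
import pandas as pd

entity_index = pd.Index(["S1", "S2", "S3"], name="s_id")

# Entity x ontology occurrence (1 = entity has an identifier in that ontology).
ontology_occurrence = pd.DataFrame(
    {"chebi": [0, 0, 1], "uniprot": [1, 1, 0]}, index=entity_index
)

# Entity x source occurrence (1 = entity appears in that source).
source_occurrence = pd.DataFrame(
    {"Reactome": [1, 1, 1], "STRING": [1, 0, 0]}, index=entity_index
)

# Restrict both tables to the entities they have in common before multiplying.
shared = ontology_occurrence.index.intersection(source_occurrence.index)
ontology_x_source = ontology_occurrence.loc[shared].T.dot(source_occurrence.loc[shared])
```

The result has ontologies as rows and sources as columns; for example, both `uniprot`-annotated species occur in `Reactome`, but only one of them occurs in `STRING`.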
- get_sbo_term_occurrence(name_terms=True, include_interactor_reactions=False) DataFrame
Get the occurrence of SBO terms for reactions.
Note: By default, reactions that consist entirely of interactor species will be excluded from the analysis. This is mandatory for most of the other occurrence and co-occurrence methods.
- Parameters:
name_terms (bool, optional) – Whether to name the SBO terms, by default True
include_interactor_reactions (bool, optional) – Whether to include reactions that consist entirely of interactor species, by default False
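To make the default exclusion concrete, the sketch below drops reactions whose participants are all interactors and then tabulates SBO term names per reaction. The column name `sbo_term_name` and the toy term labels are assumptions, and the real method may shape its output differently.

```python
import pandas as pd

# Toy reaction-species table: R2 consists entirely of interactors.
reaction_species = pd.DataFrame(
    {
        "r_id": ["R1", "R1", "R2", "R2", "R3", "R3"],
        "sbo_term_name": [
            "reactant", "product",
            "interactor", "interactor",
            "modifier", "modified",
        ],
    }
)

# Identify reactions in which every participant is an interactor.
is_interactor = reaction_species["sbo_term_name"] == "interactor"
all_interactor = is_interactor.groupby(reaction_species["r_id"]).all()
interactor_only_r_ids = set(all_interactor[all_interactor].index)

# Keep the remaining reactions and count SBO term names per reaction.
kept = reaction_species[~reaction_species["r_id"].isin(interactor_only_r_ids)]
sbo_occurrence = pd.crosstab(kept["r_id"], kept["sbo_term_name"])
```

After filtering, R2 is gone and the `interactor` column no longer appears in the occurrence table.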
- get_sbo_term_x_source_cooccurrence(name_terms: bool = True, priority_pathways: list[str] = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame
Get SBO term × source co-occurrence matrix for reactions.
This method creates a co-occurrence matrix showing the relationship between SBO terms and sources (pathways) by calculating how many reactions are shared between each SBO term-source pair.
The method combines SBO term occurrence data with source occurrence data to create a cross-tabulation matrix where:
  - Rows represent SBO terms
  - Columns represent sources/pathways
  - Values represent the number of reactions shared between each SBO term-source pair
Note: Reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
name_terms (bool, optional) – Whether to name the SBO terms using human-readable names, by default True
priority_pathways (list[str], optional) – List of pathway IDs to prioritize in the source analysis, by default DEFAULT_PRIORITIZED_PATHWAYS
- Returns:
Co-occurrence matrix with SBO terms as rows and sources as columns. Values represent the number of reactions shared between each SBO term-source pair.
- Return type:
pd.DataFrame
- Raises:
ValueError – If source tables are empty
Examples
>>> # Get SBO term × source co-occurrence for reactions
>>> cooccurrence_matrix = sbml_dfs.get_sbo_term_x_source_cooccurrence()
>>>
>>> # Use numeric SBO term codes instead of names
>>> numeric_cooccurrence = sbml_dfs.get_sbo_term_x_source_cooccurrence(name_terms=False)
- get_source_cooccurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) DataFrame
Get pathway co-occurrence matrix for a specific entity type.
This method creates a co-occurrence matrix showing which pathways share entities of the specified type, indicating pathway relationships and overlaps.
Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.
- Returns:
Co-occurrence matrix with pathways as both rows and columns
- Return type:
pd.DataFrame
- Raises:
ValueError – If the source tables for the entity type are empty (indicating single-source model)
- get_source_occurrence(entity_type: str, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904'], include_missing: bool = False, binarize: bool = False) DataFrame
Get pathway occurrence summary for a specific entity type.
This method analyzes which pathways contain entities of the specified type, providing a summary of pathway occurrence patterns.
Note: When entity_type is ‘reactions’, reactions that consist entirely of interactor species will be excluded from the analysis.
- Parameters:
entity_type (str) – The type of entity to analyze (e.g., ‘species’, ‘reactions’, ‘compartments’)
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – List of pathway IDs to prioritize in the analysis. If None, uses all pathways without filtering or warnings.
include_missing (bool, optional) – Whether to include missing entities in the result using add_missing_ids_column, by default False
binarize (bool, optional) – Whether to convert the result to binary values (0 vs 1+), by default False
- Returns:
Summary of pathway occurrence patterns. If binarize=True, values are 0 or 1.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the source tables for the entity type are empty (indicating single-source model)
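The effect of `binarize` can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the `pathway_occurrence` helper and the `(entity_id, pathway_id)` records are hypothetical, and the layout (entities as rows, pathways as columns) is assumed.

```python
def pathway_occurrence(records, binarize=False):
    """Tabulate how often each entity appears in each pathway (hypothetical helper)."""
    counts = {}
    for entity_id, pathway_id in records:
        row = counts.setdefault(entity_id, {})
        row[pathway_id] = row.get(pathway_id, 0) + 1
    pathways = sorted({pathway_id for _, pathway_id in records})
    # Fill in zeros so every entity has a value for every pathway
    table = {
        entity_id: {p: row.get(p, 0) for p in pathways}
        for entity_id, row in counts.items()
    }
    if binarize:
        # Collapse counts to presence/absence (0 vs 1+)
        table = {
            entity_id: {p: int(n > 0) for p, n in row.items()}
            for entity_id, row in table.items()
        }
    return table


# Hypothetical (entity_id, pathway_id) source records
records = [("R1", "Reactome"), ("R1", "Reactome"), ("R1", "BiGG"), ("R2", "BiGG")]
occurrence = pathway_occurrence(records)
binary_occurrence = pathway_occurrence(records, binarize=True)
```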
- get_source_total_counts(entity_type: str) Series
Get the total counts of each source for a given entity type.
- Parameters:
entity_type (str) – The type of entity to get the total counts of (e.g., ‘species’, ‘reactions’)
- Returns:
Series containing the total counts of each source, indexed by pathway_id
- Return type:
pd.Series
- Raises:
ValueError – If entity_type is invalid
- get_sources(entity_type: str) DataFrame | None
Get the unnested sources table for a given entity type.
- Parameters:
entity_type (str) – The type of entity to get sources for (e.g., ‘species’, ‘reactions’)
- Returns:
DataFrame containing the unnested sources table, or None if no sources found
- Return type:
pd.DataFrame | None
- Raises:
ValueError – If entity_type is invalid or does not have a source attribute
- get_species_features() DataFrame
Get additional attributes of species.
- Returns:
Species with additional features including: - species_type: Classification of the species (e.g., metabolite, protein)
- Return type:
pd.DataFrame
- get_summary() Mapping[str, Any]
Get diagnostic statistics about the SBML_dfs.
- Returns:
Dictionary of diagnostic statistics including:
- n_species_types: Number of species types
- n_species_per_type: Number of species per type
- n_entity_types: Dictionary of entity counts by type
- dict_n_species_per_compartment: Number of species per compartment
- stats_species_per_reactions: Statistics on reactants per reaction
- top10_species_per_reactions: Top 10 reactions by number of reactants
- sbo_name_counts: Count of reaction species by SBO term name
- stats_degree: Statistics on species connectivity
- top10_degree: Top 10 species by connectivity
- species_ontology_counts: Count of species by ontology identifiers
- data_summary: Summary of species and reaction data
- Return type:
Mapping[str, Any]
- get_table(entity_type: str, required_attributes: None | set[str] = None) DataFrame
Get a table from the SBML_dfs object with optional attribute validation.
- Parameters:
entity_type (str) – The type of entity table to retrieve (e.g., ‘species’, ‘reactions’)
required_attributes (Optional[Set[str]], optional) – Set of attributes that must be present in the table, by default None. Must be passed as a set, e.g. {‘id’}, not a string.
- Returns:
The requested table
- Return type:
pd.DataFrame
- Raises:
ValueError – If entity_type is invalid or required attributes are missing
TypeError – If required_attributes is not a set
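The documented contract for `required_attributes` (must be a set; missing attributes raise `ValueError`) can be mimicked with a small stand-alone check. This is a sketch of the contract only, not the method's source: `check_required_attributes` is a hypothetical helper and the column names are assumed for illustration.

```python
def check_required_attributes(columns, required_attributes=None):
    """Mimic the documented contract for required_attributes (hypothetical helper)."""
    if required_attributes is None:
        return
    if not isinstance(required_attributes, set):
        # A bare string such as "id" would otherwise be iterated character by character
        raise TypeError(
            f"required_attributes must be a set, got {type(required_attributes).__name__}"
        )
    missing = required_attributes - set(columns)
    if missing:
        raise ValueError(f"Missing required attributes: {sorted(missing)}")


species_columns = ["s_name", "s_Identifiers", "s_Source"]  # assumed column names
check_required_attributes(species_columns, {"s_name"})  # passes silently

try:
    check_required_attributes(species_columns, "s_name")  # a string, not a set
except TypeError as err:
    type_error_message = str(err)

try:
    check_required_attributes(species_columns, {"s_name", "not_a_column"})
except ValueError as err:
    value_error_message = str(err)
```

Requiring a set guards against the common slip of passing `"id"` where `{"id"}` was intended.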
- get_uri_urls(entity_type: str, entity_ids: Iterable[str] | None = None, required_ontology: str | None = None) Series
Get reference URLs for specified entities.
- Parameters:
entity_type (str) – Type of entity to get URLs for (e.g., ‘species’, ‘reactions’)
entity_ids (Optional[Iterable[str]], optional) – Specific entities to get URLs for, by default None (all entities)
required_ontology (Optional[str], optional) – Specific ontology to get URLs from, by default None
- Returns:
Series mapping entity IDs to their reference URLs
- Return type:
pd.Series
- Raises:
ValueError – If entity_type is invalid
- infer_sbo_terms()
Infer SBO Terms
Define SBO terms based on stoichiometry for reaction_species with missing terms. Modifies the SBML_dfs object in-place.
- Return type:
None (modifies SBML_dfs object in-place)
- infer_uncompartmentalized_species_location()
Infer Uncompartmentalized Species Location
If the compartment of a subset of compartmentalized species was not specified, infer an appropriate compartment from other members of reactions they participate in.
This method modifies the SBML_dfs object in-place.
- Return type:
None (modifies SBML_dfs object in-place)
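One plausible reading of "infer an appropriate compartment from other members of reactions" is a majority vote among co-participants. The sketch below assumes that rule; the source does not state the actual rule, and the `infer_compartment` helper, IDs, and compartment names are hypothetical.

```python
from collections import Counter


def infer_compartment(reaction_members, species_compartment):
    """Fill in missing compartments by majority vote among co-participants.

    The majority-vote rule is an assumption made for illustration.
    """
    inferred = dict(species_compartment)
    for sc_id, c_id in species_compartment.items():
        if c_id is not None:
            continue  # compartment already known
        votes = Counter()
        for members in reaction_members.values():
            if sc_id not in members:
                continue
            for other in members:
                other_c_id = species_compartment.get(other)
                if other != sc_id and other_c_id is not None:
                    votes[other_c_id] += 1
        if votes:
            inferred[sc_id] = votes.most_common(1)[0][0]
    return inferred


# Hypothetical reactions and compartment assignments; SC3 has no compartment
reaction_members = {"R1": ["SC1", "SC2", "SC3"], "R2": ["SC3", "SC4"]}
species_compartment = {"SC1": "cytosol", "SC2": "cytosol", "SC3": None, "SC4": "nucleus"}
inferred = infer_compartment(reaction_members, species_compartment)
```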
- name_compartmentalized_species()
Name Compartmentalized Species
Rename compartmentalized species if they have the same name as their species. Modifies the SBML_dfs object in-place.
- Return type:
None (modifies SBML_dfs object in-place)
- post_consensus_checks(entity_types: list[str] = ['species', 'compartments'], check_types: list[str] = ['source_cooccurrence', 'ontology_x_source_cooccurrence']) None
Post-consensus checks
Perform checks on the SBML_dfs object after consensus building.
- Parameters:
entity_types (list[str], optional) – Entity types to check
check_types (list[str], optional) – Check types to perform
- Return type:
None
- reaction_formulas(r_ids: str | list[str] | None = None) Series
Reaction Formulas
Return human-readable formulas for reactions.
- Parameters:
r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions
- Returns:
formula_strs – Human-readable formula strings
- Return type:
pd.Series
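As a rough picture of what a human-readable formula looks like, the following sketch builds one from stoichiometries. It is illustrative only: the `format_formula` helper and the exact output format are assumptions, as is the convention that negative stoichiometry marks substrates and positive stoichiometry marks products.

```python
def format_formula(participants):
    """Build a readable formula from (name, stoichiometry) pairs (hypothetical helper)."""

    def side(terms):
        parts = []
        for name, coefficient in terms:
            # Omit the coefficient when it is 1, as in conventional notation
            prefix = "" if coefficient == 1 else f"{coefficient:g} "
            parts.append(f"{prefix}{name}")
        return " + ".join(parts)

    # Assumed convention: negative = consumed, positive = produced
    substrates = [(n, -s) for n, s in participants if s < 0]
    products = [(n, s) for n, s in participants if s > 0]
    return f"{side(substrates)} -> {side(products)}"


formula = format_formula([("glucose", -1), ("ATP", -1), ("G6P", 1), ("ADP", 1)])
water_formula = format_formula([("H2", -2), ("O2", -1), ("H2O", 2)])
```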
- reaction_summaries(r_ids: str | list[str] | None = None) DataFrame
Reaction Summaries
Return a summary of reactions.
- Parameters:
r_ids (str | list[str] | None) – Reaction IDs, or None for all reactions
- Returns:
reaction_summaries_df – A table indexed by r_id with columns:
- r_name: str, name of the reaction
- r_formula_str: str, human-readable formula of the reaction
- Return type:
pd.DataFrame
- remove_entities(entity_type: str, entity_ids: Iterable[str], remove_references: bool = True)
Public method to remove entities and optionally clean up orphaned references.
Special handling applies to “cofactors”: literal cleanup of reactions based on reaction_species is allowed. Normally, removing substrates or products would remove the reaction.
- Parameters:
entity_type (str) – The entity type (e.g., ‘reactions’, ‘compartmentalized_species’, ‘species’, ‘compartments’, or “cofactors”)
entity_ids (Iterable[str]) – IDs of entities to remove
remove_references (bool, default True) – Whether to remove orphaned references after entity removal
- remove_reactions_data(label: str)
Remove reactions data by label.
- remove_species_data(label: str)
Remove species data by label.
- remove_unused() None
Find and remove unused entities from the model.
This method identifies unused entities using find_unused_entities and then cleans them up using the existing remove_entities method which properly handles cleanup of species_data and reactions_data as needed.
- Returns:
Modifies the SBML_dfs object in-place
- Return type:
None
- search_by_ids(id_table: DataFrame, identifiers: str | list | set | None = None, ontologies: str | list | set | None = None, bqbs: str | list | set | None = ['BQB_IS', 'BQB_IS_HOMOLOG_TO', 'BQB_IS_ENCODED_BY', 'BQB_ENCODES', 'BQB_HAS_PART']) tuple[DataFrame, DataFrame]
Find entities and identifiers matching a set of query IDs.
- Parameters:
id_table (pd.DataFrame) – DataFrame containing identifier mappings
identifiers (Optional[Union[str, list, set]], optional) – Identifiers to filter by, by default None
ontologies (Optional[Union[str, list, set]], optional) – Ontologies to filter by, by default None
bqbs (Optional[Union[str, list, set]], optional) – BQB terms to filter by, by default the terms shown in the signature (BQB_IS, BQB_IS_HOMOLOG_TO, BQB_IS_ENCODED_BY, BQB_ENCODES, BQB_HAS_PART)
- Returns:
Matching entities
Matching identifiers
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- Raises:
ValueError – If entity_type is invalid or ontologies are invalid
TypeError – If ontologies is not a set
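The three filters (`identifiers`, `ontologies`, `bqbs`) can be thought of as independent membership tests that a row must pass together. The sketch below shows that idea on a list of dictionaries. It is not the method's implementation: `filter_identifiers` is a hypothetical helper, the row keys (`s_id`, `ontology`, `identifier`, `bqb`) are assumed column names, and the example identifiers are made up.

```python
def filter_identifiers(id_table, identifiers=None, ontologies=None, bqbs=None):
    """Keep rows that pass every filter that was supplied (hypothetical helper)."""

    def as_set(value):
        # Accept a single string, a list, or a set; None means "do not filter"
        if value is None:
            return None
        return {value} if isinstance(value, str) else set(value)

    filters = {
        "identifier": as_set(identifiers),
        "ontology": as_set(ontologies),
        "bqb": as_set(bqbs),
    }
    return [
        row
        for row in id_table
        if all(
            allowed is None or row[column] in allowed
            for column, allowed in filters.items()
        )
    ]


# Hypothetical identifier table; keys are assumed column names
id_table = [
    {"s_id": "S1", "ontology": "uniprot", "identifier": "P12345", "bqb": "BQB_IS"},
    {"s_id": "S2", "ontology": "chebi", "identifier": "15377", "bqb": "BQB_IS"},
    {"s_id": "S3", "ontology": "uniprot", "identifier": "P12345", "bqb": "BQB_HAS_VERSION"},
]
matches = filter_identifiers(
    id_table,
    identifiers="P12345",
    ontologies={"uniprot"},
    bqbs=["BQB_IS", "BQB_HAS_PART"],
)
```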
- search_by_name(name: str, entity_type: str, partial_match: bool = True) DataFrame
Find entities by exact or partial name match.
- Parameters:
name (str) – Name to search for
entity_type (str) – Type of entity to search (e.g., ‘species’, ‘reactions’)
partial_match (bool, optional) – Whether to allow partial string matches, by default True
- Returns:
Matching entities
- Return type:
pd.DataFrame
- select_species_data(species_data_table: str) DataFrame
Select a species data table from the SBML_dfs object.
- Parameters:
species_data_table (str) – Name of the species data table to select
- Returns:
The selected species data table
- Return type:
pd.DataFrame
- Raises:
ValueError – If species_data_table is not found
- show_summary() None
Display a formatted summary of the SBML_dfs model.
This method chains together get_summary(), format_sbml_dfs_summary(), and show() to provide a convenient way to display network statistics.
- Returns:
Displays the formatted summary table to console
- Return type:
None
Examples
>>> sbml_dfs.show_summary()
- species_status(s_id: str) DataFrame
Species Status
Return all of the reactions a species participates in.
- Parameters:
s_id (str) – A species ID
- Returns:
One row per reaction the species participates in, with columns:
- sc_name: str, name of the compartmentalized species participating in the reaction
- stoichiometry: float, stoichiometry of the species in the reaction
- r_name: str, name of the reaction
- r_formula_str: str, human-readable formula of the reaction
- Return type:
pd.DataFrame
- to_dict() dict[str, DataFrame]
Return the 5 major SBML_dfs tables as a dictionary.
- Returns:
Dictionary containing the core SBML_dfs tables:
- ‘compartments’: Compartments table
- ‘species’: Species table
- ‘compartmentalized_species’: Compartmentalized species table
- ‘reactions’: Reactions table
- ‘reaction_species’: Reaction species table
- Return type:
dict[str, pd.DataFrame]
- to_pickle(path: str) None
Save the SBML_dfs to a pickle file.
- Parameters:
path (str) – Path where to save the pickle file
- validate()
Validate the SBML_dfs structure and relationships.
Checks:
- Schema existence
- Required tables presence
- Individual table structure
- Primary key uniqueness
- Foreign key relationships
- Optional data table validity
- Reaction species validity
- Raises:
ValueError – If any validation check fails
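Two of the checks listed above, primary key uniqueness and foreign key relationships, can be sketched in a few lines of plain Python. This is a generic illustration of those two checks, not the library's validation code; `check_keys` and the example IDs are hypothetical.

```python
from collections import Counter


def check_keys(parent_ids, child_foreign_keys):
    """Report duplicate primary keys and dangling foreign keys (hypothetical helper)."""
    problems = []
    duplicates = sorted(k for k, n in Counter(parent_ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate primary keys: {duplicates}")
    # A foreign key is dangling when no parent row carries that ID
    orphans = sorted(set(child_foreign_keys) - set(parent_ids))
    if orphans:
        problems.append(f"foreign keys without a parent: {orphans}")
    return problems


species_ids = ["S1", "S2", "S2"]  # primary key of a hypothetical species table
compartmentalized_species_s_ids = ["S1", "S3"]  # foreign keys pointing at species
problems = check_keys(species_ids, compartmentalized_species_s_ids)
```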
- validate_and_resolve()
Validate and attempt to automatically fix common issues.
This method iteratively:
1. Attempts validation
2. If validation fails, tries to resolve the issue
3. Repeats until validation passes or the issue cannot be resolved
- Raises:
ValueError – If validation fails and cannot be automatically resolved
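The iterative validate-then-resolve pattern described above can be sketched generically. This is not the method's source: the `validate_and_resolve` function below, the toy `validate` check, the `resolvers` mapping keyed on error-message fragments, and the `max_rounds` guard are all assumptions made for illustration.

```python
def validate_and_resolve(model, validate, resolvers, max_rounds=10):
    """Validate, fix a recognised issue, and retry (hypothetical sketch)."""
    for _ in range(max_rounds):
        try:
            validate(model)
            return model  # validation passed
        except ValueError as err:
            # Pick the first resolver whose key appears in the error message
            resolver = next(
                (fix for key, fix in resolvers.items() if key in str(err)), None
            )
            if resolver is None:
                raise  # no automatic fix is known for this issue
            resolver(model)
    raise ValueError("validation did not pass within max_rounds")


def validate(model):
    """Toy validation standing in for the real checks."""
    if "name" not in model:
        raise ValueError("missing name")
    if model.get("size", 0) < 0:
        raise ValueError("negative size")


resolvers = {"missing name": lambda m: m.setdefault("name", "unnamed")}

fixed = validate_and_resolve({"size": 1}, validate, resolvers)

try:
    validate_and_resolve({"name": "x", "size": -1}, validate, resolvers)
except ValueError as err:
    unresolved_message = str(err)
```

The round limit is a defensive choice in this sketch so that a resolver which fails to fix its issue cannot loop forever.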
- _optional_entities: set[str]
- _required_entities: set[str]
- compartments: DataFrame
- reaction_species: DataFrame
- reactions: DataFrame
- reactions_data: dict[str, DataFrame]
- schema: dict
- species: DataFrame
- species_data: dict[str, DataFrame]