napistu.sbml_dfs_utils

Utilities supporting creation and manipulation of SBML_dfs instances.

Public Functions

add_missing_ids_column(contingency_table, reference_table, other_column_name="other") -> pd.DataFrame:: Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.
add_sbo_role(reaction_species) -> pd.DataFrame:: Add an sbo_role column to the reaction_species table.
check_entity_data_index_matching(sbml_dfs, table) -> sbml_dfs:: Update the index of each DataFrame in the input sbml_dfs’s entity_data (dict) with match_entitydata_index_to_entity so that it matches the corresponding entity’s index and the result passes sbml_dfs.validate().
construct_formula_string(reaction_species_df, reactions_df, name_var) -> str:: Construct Formula String
create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col=SBML_DFS.R_ID, c_name_col=SBML_DFS.C_NAME) -> pd.Series:: Create a pd.Series of reaction formula strings.
display_post_consensus_checks(checks_results) -> None:: Display post-consensus checks results.
find_underspecified_reactions(reaction_species_w_roles) -> pd.DataFrame:: Find underspecified reactions in a reaction_species table.
find_unused_entities(sbml_dfs_or_dict) -> dict[str, set[str]]:: Find unused entities in an SBML_dfs or dict of SBML_dfs instances.
filter_to_characteristic_species_ids(species_ids, max_complex_size=4, max_promiscuity=20, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> pd.DataFrame:: Filter to characteristic species IDs.
force_edgelist_consistency(interaction_edgelist, species_df, compartments_df) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:: Force edgelist consistency.
format_sbml_dfs_summary(data) -> str:: Format a summary of an SBML_dfs instance.
get_current_max_id(sbml_dfs_table) -> int:: Get the current maximum ID for a given SBML_dfs table.
id_formatter(id_values, id_type, id_len=8) -> list[str]:: Format an iterable of input values as a list of identifiers.
id_formatter_inv(ids) -> list:: Invert the id_formatter function.
match_entitydata_index_to_entity(entity_data_dict, an_entity_data_type, consensus_entity_df, entity_schema, table) -> pd.DataFrame:: Match the index of an entity data dictionary to the index of a consensus entity DataFrame.
species_type_types(x, ontology_to_species_type=ONTOLOGY_TO_SPECIES_TYPE, prioritized_species_types=PRIORITIZED_SPECIES_TYPES) -> str:: Determine the species type of a given entity.
stub_compartments(stubbed_compartment=GENERIC_COMPARTMENT, with_source=False) -> pd.DataFrame:: Create a compartments table with only a single compartment.
unnest_identifiers(id_table, id_var) -> pd.DataFrame:: Unnest identifiers from a table.
validate_sbml_dfs_table(table_data, table_name) -> None:: Validate that a DataFrame is a valid SBML_dfs table.

Functions

`add_missing_ids_column`(contingency_table, ...)	Add an 'other' column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.
`add_sbo_role`(reaction_species)	Add an sbo_role column to the reaction_species table.
`check_entity_data_index_matching`(sbml_dfs, table)	Update the index of each DataFrame in the input sbml_dfs's entity_data (dict) with match_entitydata_index_to_entity so that it matches the corresponding entity's index and the result passes sbml_dfs.validate().
`construct_formula_string`(...)	Construct Formula String
`create_reaction_formula_series`(...[, ...])	Helper function to create reaction formula series.
`display`(obj)
`display_post_consensus_checks`(checks_results)	Display the results of post_consensus_checks in a formatted way.
`filter_to_characteristic_species_ids`(species_ids)	Filter to Characteristic Species IDs
`find_underspecified_reactions`(...)
`find_unused_entities`(sbml_dfs_or_dict)
`force_edgelist_consistency`(...)	Force the edgelist to be consistent with the species and compartments dataframes.
`format_sbml_dfs_summary`(data)	Format model data into a clean summary table for Jupyter display
`get_current_max_id`(sbml_dfs_table)	Get Current Max ID
`id_formatter`(id_values, id_type[, id_len])
`id_formatter_inv`(ids)	ID Formatter Inverter
`match_entitydata_index_to_entity`(...)	Match the index of entity_data_dict[an_entity_data_type] to the index of the corresponding entity, and report cases where entity_data has indices not in that entity's index.
`species_type_types`(x[, ...])	Assign a high-level molecule type to a molecular species
`stub_compartments`([stubbed_compartment, ...])	Stub Compartments
`unnest_identifiers`(id_table, id_var)	Unnest Identifiers
`validate_sbml_dfs_table`(table_data, table_name)	Validate a standalone table against the SBML_dfs schema.

napistu.sbml_dfs_utils._add_edgelist_defaults(interaction_edgelist: DataFrame, edgelist_defaults: dict[str, Any] | None = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}) → DataFrame

Add default values to the interaction edgelist

Parameters:

interaction_edgelist (pd.DataFrame) – The interaction edgelist to add defaults to
edgelist_defaults (dict[str, Any]) – The defaults to add to the interaction edgelist
Return type:

pd.DataFrame

napistu.sbml_dfs_utils._add_stoi_to_species_name(stoi: float | int, name: str) → str

Add Stoi To Species Name

Add # of molecules to a species name

Parameters:

stoi: float or int: Number of molecules
name: str: Name of species

Returns:

name: str: Species name including the number of molecules

napistu.sbml_dfs_utils._dogmatic_to_defining_bqbs(dogmatic: bool = False) → str

napistu.sbml_dfs_utils._edgelist_create_compartmentalized_species(interaction_edgelist, species_df, compartments_df, interaction_source)

Create compartmentalized species from interactions.

Parameters:

interaction_edgelist (pd.DataFrame) – Interaction data containing species-compartment combinations
species_df (pd.DataFrame) – Processed species data with IDs
compartments_df (pd.DataFrame) – Processed compartments data with IDs
interaction_source (source.Source) – Source object to assign to compartmentalized species

Returns:

Compartmentalized species with formatted names and IDs

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._edgelist_create_reactions_and_species(interaction_edgelist, comp_species, processed_species, processed_compartments, interaction_source, extra_reactions_columns)

Create reactions and reaction species from interactions.

Parameters:

interaction_edgelist (pd.DataFrame) – Original interaction data
comp_species (pd.DataFrame) – Compartmentalized species with IDs
processed_species (pd.DataFrame) – Processed species data with IDs
processed_compartments (pd.DataFrame) – Processed compartments data with IDs
interaction_source (source.Source) – Source object for reactions
extra_reactions_columns (list) – Names of extra columns to preserve

Returns:

(reactions_df, reaction_species_df, reactions_data)

Return type:

tuple

napistu.sbml_dfs_utils._edgelist_identify_extra_columns(interaction_edgelist, species_df, keep_reactions_data, keep_species_data)

Identify extra columns in input data that should be preserved.

Parameters:

interaction_edgelist (pd.DataFrame) – Interaction data containing potential extra columns
species_df (pd.DataFrame) – Species data containing potential extra columns
keep_reactions_data (bool or str) – Whether to keep extra reaction columns
keep_species_data (bool or str) – Whether to keep extra species columns

Returns:

Dictionary with ‘reactions’ and ‘species’ keys containing lists of extra column names

Return type:

dict

napistu.sbml_dfs_utils._edgelist_process_compartments(compartments_df, interaction_source)

Format compartments DataFrame with source and ID columns.

Parameters:

compartments_df (pd.DataFrame) – Raw compartments data
interaction_source (source.Source) – Source object to assign to compartments

Returns:

Processed compartments with IDs, indexed by compartment ID

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._edgelist_process_species(species_df, interaction_source, extra_species_columns)

Format species DataFrame and extract extra data.

Parameters:

species_df (pd.DataFrame) – Raw species data
interaction_source (source.Source) – Source object to assign to species
extra_species_columns (list) – Names of extra columns to preserve separately

Returns:

Processed species DataFrame and species extra data DataFrame

Return type:

tuple of pd.DataFrame

napistu.sbml_dfs_utils._edgelist_validate_inputs(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) → None

Validate input DataFrames have required columns.

Parameters:

interaction_edgelist (pd.DataFrame) – Interaction data to validate
species_df (pd.DataFrame) – Species data to validate
compartments_df (pd.DataFrame) – Compartments data to validate

napistu.sbml_dfs_utils._filter_promiscuous_components(bqb_has_parts_species: DataFrame, max_promiscuity: int) → DataFrame

napistu.sbml_dfs_utils._filter_to_pathways(df: DataFrame, pathways: list[str]) → DataFrame

Filter a table to only include pathways in the list.

napistu.sbml_dfs_utils._find_underspecified_reactions(reaction_species_w_roles: DataFrame) → DataFrame

napistu.sbml_dfs_utils._get_interaction_symbol(sbo_term_or_name: str) → str

napistu.sbml_dfs_utils._id_dict_to_df(ids)

napistu.sbml_dfs_utils._name_interaction(upstream_name: str, downstream_name: str, sbo_term_upstream: str | None = 'interactor')

Name an interaction

Parameters:

upstream_name (str) – The name of the upstream species
downstream_name (str) – The name of the downstream species
sbo_term_upstream (str, optional) – The SBO term of the upstream species. Defaults to “interactor”.

Returns:

The name of the interaction

Return type:

str

napistu.sbml_dfs_utils._perform_sbml_dfs_table_validation(table_data: DataFrame, table_schema: dict, table_name: str) → None

Core validation logic for SBML_dfs tables.

This function performs the actual validation checks for any table against its schema, regardless of whether it’s part of an SBML_dfs object or standalone.

Parameters:

table_data (pd.DataFrame) – The table data to validate
table_schema (dict) – Schema definition for the table
table_name (str) – Name of the table (for error messages)

Raises:

ValueError – If the table does not conform to its schema: not a DataFrame, wrong index name, duplicate primary keys, missing required variables, or empty table

napistu.sbml_dfs_utils._sbml_dfs_from_edgelist_check_cspecies_merge(merged_species: DataFrame, original_species: DataFrame) → None

Check for a mismatch between the provided species data and species implied by the edgelist.

napistu.sbml_dfs_utils._select_priority_pathway_sources(source_table: DataFrame, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) → DataFrame

Filter the source table to only include pathways in the list. If 0 or 1 priority pathways are found, return the source table.

Parameters:

source_table (pd.DataFrame) – The source table to filter
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – The list of pathways to filter to. If None, returns source_table with no filtering or warning. If fewer than 2 pathways are found in the source table, returns the full source table with a warning.

Returns:

The filtered source table. If priority_pathways is None, returns the original source_table. If fewer than 2 priority pathways are found, returns the full source_table with a warning.

Return type:

pd.DataFrame
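
The fallback rule described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the rows are dicts rather than a DataFrame, and the "pathway_id" key and the function name are hypothetical.

```python
import warnings


def select_priority_rows(rows, priority_pathways, key="pathway_id"):
    """Keep rows from priority pathways; fall back to all rows when too few match.

    `rows` is a list of dicts and `key` is a hypothetical column name.
    """
    if priority_pathways is None:
        return rows  # no filtering and no warning
    kept = [row for row in rows if row[key] in priority_pathways]
    found = {row[key] for row in kept}
    if len(found) < 2:
        warnings.warn("Fewer than 2 priority pathways found; returning all rows")
        return rows
    return kept


rows = [
    {"pathway_id": "Reactome", "s_id": "S00000001"},
    {"pathway_id": "STRING", "s_id": "S00000001"},
    {"pathway_id": "custom_model", "s_id": "S00000002"},
]
filtered = select_priority_rows(rows, ["Reactome", "STRING"])
```

Here two priority pathways are present, so only their rows are kept; passing a list that matches a single pathway would return every row and emit a warning.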

napistu.sbml_dfs_utils._summarize_ontology_cooccurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False) → DataFrame

Create a cooccurrence matrix of ontologies based on entities sharing the same ontology.

This can be used to identify ontologies which are associated with the same types of entities.

Parameters:

df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
stratify_by_bqb (bool) – whether to stratify by bqb
allow_col_multindex (bool) – whether to allow the column multindex

Returns:

Square matrix with ontologies as both rows and columns

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._summarize_ontology_occurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, binarize: bool = False) → DataFrame

Summarize the types of identifiers associated with each entity.

Parameters:

df (pd.DataFrame) – a table generated using sbml_dfs.get_identifiers or sbml_dfs.get_characteristic_species_ids
stratify_by_bqb (bool) – whether to stratify by bqb
allow_col_multindex (bool) – whether to allow the column multindex
binarize (bool) – whether to convert the result to binary values (0 vs 1+)

Returns:

a table with entities as rows and ontologies as columns

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._summarize_source_cooccurrence(df: DataFrame) → DataFrame

Create a cooccurrence matrix of pathways based on the presence of entities in pathways.

Parameters:: df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
Returns:: Square matrix with pathways as both rows and columns
Return type:: pd.DataFrame

napistu.sbml_dfs_utils._summarize_source_occurrence(df: DataFrame, binarize: bool = False) → DataFrame

Summarize the occurrence of entities in pathways.

Parameters:

df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
binarize (bool) – whether to convert the result to binary values (0 vs 1+)

Returns:

a table with entities as rows and pathways as columns

Return type:

pd.DataFrame
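
As a rough illustration of how an occurrence table relates to a cooccurrence matrix, the following pandas sketch builds both from a small long-format table. The column names "s_id" and "pathway_id" are assumptions made for the example, and the private functions may compute these summaries differently.

```python
import pandas as pd

# Hypothetical long-format table: one row per (entity, pathway) membership.
sources = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S2", "S3", "S1"],
        "pathway_id": ["Reactome", "STRING", "Reactome", "STRING", "STRING", "Reactome"],
    }
)

# Occurrence: entities as rows, pathways as columns, counts as values.
occurrence = pd.crosstab(sources["s_id"], sources["pathway_id"])

# Analogue of binarize=True: collapse counts to 0 vs 1+.
binary = (occurrence > 0).astype(int)

# Cooccurrence: number of entities shared by each pair of pathways.
cooccurrence = binary.T.dot(binary)
```

The diagonal of `cooccurrence` holds the number of entities in each pathway, and the off-diagonal entries hold the number of entities shared by two pathways.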

napistu.sbml_dfs_utils._validate_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, raise_on_missing: bool = True) → None

Check for missing entity references, optionally raising or warning.

This function is used to validate the consistency of the interaction edgelist, species_df, and compartments_df.

Parameters:

interaction_edgelist (pd.DataFrame) – The interaction edgelist to validate
species_df (pd.DataFrame) – The species dataframe to validate
compartments_df (pd.DataFrame) – The compartments dataframe to validate
raise_on_missing (bool, optional) – Whether to raise an error if missing entities are found

Return type:

None

napistu.sbml_dfs_utils._validate_matching_data(data_table: DataFrame, ref_table: DataFrame)

Validates a table against a reference

This checks that the table has the same index name as the reference, that its index contains no duplicates, and that all values in its index are in the reference table.

Parameters:

data_table (pd.DataFrame) – a table with data that should match the reference
ref_table (pd.DataFrame) – a reference table

Raises:

ValueError – not same index name
ValueError – index contains duplicates
ValueError – index not subset of index of reactions table
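
The three checks listed above could look roughly like this in pandas. It is a sketch with a hypothetical function name and error messages, not the library's code.

```python
import pandas as pd


def validate_matching_sketch(data_table, ref_table):
    """Raise ValueError when data_table's index is inconsistent with ref_table's."""
    if data_table.index.name != ref_table.index.name:
        raise ValueError("index names differ")
    if data_table.index.duplicated().any():
        raise ValueError("index contains duplicates")
    if not data_table.index.isin(ref_table.index).all():
        raise ValueError("index is not a subset of the reference index")


reactions = pd.DataFrame(
    {"r_name": ["a", "b"]}, index=pd.Index(["R1", "R2"], name="r_id")
)
good = pd.DataFrame({"score": [0.5]}, index=pd.Index(["R1"], name="r_id"))
bad = pd.DataFrame({"score": [0.5]}, index=pd.Index(["R9"], name="r_id"))

validate_matching_sketch(good, reactions)  # passes silently
```

Calling the sketch with `bad` would raise a ValueError because "R9" is absent from the reference index.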

napistu.sbml_dfs_utils._validate_non_null_values(df: DataFrame, expected_columns: set, table_name: str) → None

Validate that all required columns in a DataFrame have non-null values.

Parameters:

df (pd.DataFrame) – The DataFrame to validate
expected_columns (set) – Set of column names that should have non-null values
table_name (str) – Name of the table for error messages

Raises:

ValueError – If any required column contains null values

napistu.sbml_dfs_utils._validate_sbo_values(sbo_series: Series, validate: str = 'names') → None

Validate SBO terms or names

Parameters:

sbo_series (pd.Series) – The SBO terms or names to validate.
validate (str, optional) – Whether the values are SBO terms (“terms”) or names (“names”, default).

Return type:

None

Raises:

ValueError – If the validation type is invalid.
TypeError – If the invalid_counts is not a pandas DataFrame.
ValueError – If some reaction species have unusable SBO terms.

napistu.sbml_dfs_utils.add_missing_ids_column(contingency_table: DataFrame, reference_table: DataFrame, other_column_name: str = 'other') → DataFrame

Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.

Parameters:

contingency_table (pd.DataFrame) – The contingency table with binary values (subset of IDs)
reference_table (pd.DataFrame) – The reference table containing all possible IDs
other_column_name (str, optional) – Name for the ‘other’ column, by default “other”

Returns:

pd.DataFrame: Updated contingency table with ‘other’ column(s) added if there are missing IDs. If no IDs are missing, returns a copy of the original contingency table without adding an ‘other’ column.

Raises:

ValueError: If the index names don’t match between the two tables
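
A minimal pandas sketch of this behavior, assuming single-level columns and that IDs missing from the contingency table get a row of zeros plus a 1 in the new column. The function name is hypothetical and the real function may differ in details such as MultiIndex columns.

```python
import pandas as pd


def add_other_column_sketch(contingency, reference, other_column_name="other"):
    """Append rows for IDs only present in `reference` and flag them in a new column."""
    if contingency.index.name != reference.index.name:
        raise ValueError("index names do not match")
    missing = reference.index.difference(contingency.index)
    out = contingency.copy()
    if len(missing) == 0:
        return out  # nothing missing: no extra column is added
    # Add all-zero rows for the missing IDs, then mark them in the new column.
    out = out.reindex(out.index.union(missing), fill_value=0)
    out[other_column_name] = out.index.isin(missing).astype(int)
    return out


contingency = pd.DataFrame(
    {"Reactome": [1, 0], "STRING": [1, 1]},
    index=pd.Index(["S1", "S2"], name="s_id"),
)
reference = pd.DataFrame(
    {"s_name": ["a", "b", "c"]},
    index=pd.Index(["S1", "S2", "S3"], name="s_id"),
)
result = add_other_column_sketch(contingency, reference)
```

In `result`, "S3" appears with zeros in the existing columns and 1 in "other", while "S1" and "S2" have 0 in "other".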

napistu.sbml_dfs_utils.add_sbo_role(reaction_species: DataFrame) → DataFrame

Add an sbo_role column to the reaction_species table.

The sbo_role column is a string column that contains the SBO role of the reaction species. The values in the sbo_role column are taken from the sbo_term column.

The sbo_role column is added to the reaction_species table by mapping the sbo_term column to the SBO_NAME_TO_ROLE dictionary.

napistu.sbml_dfs_utils.check_entity_data_index_matching(sbml_dfs, table)

Update the index of each DataFrame in the input sbml_dfs’s entity_data (dict) with match_entitydata_index_to_entity so that it matches the corresponding entity’s index and the result passes sbml_dfs.validate().

Parameters:

sbml_dfs (cpr.SBML_dfs) – a cpr.SBML_dfs
table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

sbml_dfs (cpr.SBML_dfs) – sbml_dfs whose entity_data is checked to have the same index as the corresponding entity.

napistu.sbml_dfs_utils.construct_formula_string(reaction_species_df: DataFrame, reactions_df: DataFrame, name_var: str) → str

Construct Formula String

Convert a table of reaction species into a formula string

Parameters:

reaction_species_df: pd.DataFrame: Table containing a reaction’s species
reactions_df: pd.DataFrame: sbml_dfs.reactions
name_var: str: Name used to label species

Returns:

formula_str: str: String representation of a reaction’s substrates, products and modifiers
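
To make the idea concrete, here is a small stand-alone sketch of turning reaction species into a formula string. It assumes that negative stoichiometry marks substrates, positive marks products, and zero marks modifiers; the arrow symbols, the modifier suffix, and both function names are hypothetical and may not match the library's output format.

```python
def format_term(stoichiometry, name):
    """Prefix a species name with its molecule count when the count is not 1."""
    count = abs(stoichiometry)
    if count == 1:
        return name
    return f"{count:g} {name}"


def formula_sketch(reaction_species, is_reversible=False):
    """Build a formula string from (name, stoichiometry) pairs."""
    substrates = [format_term(s, n) for n, s in reaction_species if s < 0]
    products = [format_term(s, n) for n, s in reaction_species if s > 0]
    modifiers = [n for n, s in reaction_species if s == 0]
    arrow = " <-> " if is_reversible else " -> "
    formula = " + ".join(substrates) + arrow + " + ".join(products)
    if modifiers:
        formula += " ---- modifiers: " + ", ".join(modifiers)
    return formula


species = [
    ("glucose", -1),
    ("ATP", -1),
    ("glucose-6-phosphate", 1),
    ("ADP", 1),
    ("hexokinase", 0),
]
print(formula_sketch(species))
```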

napistu.sbml_dfs_utils.create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col='r_id', c_name_col='c_name')

Helper function to create reaction formula series.

Parameters:

reaction_data (pd.DataFrame) – The reaction species data to process
reactions_df (pd.DataFrame) – The reactions dataframe needed by construct_formula_string
species_name_col (str) – Column name to use for species names in formulas
sort_cols (list) – Columns to sort by before grouping
group_cols (list, optional) – Columns to group by. If None, uses [r_id_col]
add_compartment_prefix (bool) – Whether to add compartment name as prefix to formula
r_id_col (str) – Column name for reaction ID
c_name_col (str) – Column name for compartment name (used when add_compartment_prefix=True)

Returns:

pd.Series or None : Formula strings indexed by reaction ID, or None if no data

napistu.sbml_dfs_utils.display(obj)

napistu.sbml_dfs_utils.display_post_consensus_checks(checks_results: dict) → None

Display the results of post_consensus_checks in a formatted way.

This function takes the results from the post_consensus_checks method and displays them using the same formatting as shown in the sandbox notebook.

Parameters:: checks_results (dict) – Dictionary returned by the post_consensus_checks method, containing nested dictionaries with entity types and check types as keys, and DataFrames as values.
Returns:: This function displays results but doesn’t return anything.
Return type:: None

napistu.sbml_dfs_utils.filter_to_characteristic_species_ids(species_ids: DataFrame, max_complex_size: int = 4, max_promiscuity: int = 20, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) → DataFrame

Filter to Characteristic Species IDs

Remove identifiers corresponding to one component within a large protein complex, as well as non-characteristic annotations such as PubMed references and homologues.

Parameters:

species_ids (pd.DataFrame) – A table of identifiers produced by sbml_dfs.get_identifiers(“species”)
max_complex_size (int) – The largest size of a complex for which BQB_HAS_PART terms will be retained. In most cases, complexes are handled with specific formation and dissolution reactions, but these identifiers will be pulled in when searching by identifiers or when searching the identifiers associated with a species against an external resource such as Open Targets.
max_promiscuity (int) – Maximum number of species where a single molecule can act as a BQB_HAS_PART component associated with a single identifier (and common ontology).
defining_biological_qualifiers (list[str]) – BQB codes which define distinct entities. Narrowly this would be BQB_IS, while more permissive settings would include homologs and different forms of the same gene.

Returns:

species_id: pd.DataFrame: Input species filtered to characteristic identifiers
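
One possible reading of the filtering rules, sketched in plain Python over rows represented as dicts. The row keys, the function name, and the exact way complex size and promiscuity are counted are assumptions; the library operates on a DataFrame and may group differently (for example per ontology).

```python
from collections import Counter, defaultdict


def characteristic_ids_sketch(
    rows,
    max_complex_size=4,
    max_promiscuity=20,
    defining_bqbs=("BQB_IS", "BQB_IS_HOMOLOG_TO"),
):
    """Filter identifier rows (dicts with s_id, ontology, identifier, bqb keys)."""
    parts = [r for r in rows if r["bqb"] == "BQB_HAS_PART"]
    # Number of BQB_HAS_PART identifiers attached to each species (complex size).
    complex_size = Counter(r["s_id"] for r in parts)
    # Distinct species that each (ontology, identifier) pair is a part of.
    species_per_id = defaultdict(set)
    for r in parts:
        species_per_id[(r["ontology"], r["identifier"])].add(r["s_id"])

    kept = []
    for r in rows:
        if r["bqb"] in defining_bqbs:
            kept.append(r)
        elif r["bqb"] == "BQB_HAS_PART":
            id_key = (r["ontology"], r["identifier"])
            small = complex_size[r["s_id"]] <= max_complex_size
            specific = len(species_per_id[id_key]) <= max_promiscuity
            if small and specific:
                kept.append(r)
    return kept


rows = [
    {"s_id": "S1", "ontology": "uniprot", "identifier": "P1", "bqb": "BQB_IS"},
    {"s_id": "S1", "ontology": "pubmed", "identifier": "111", "bqb": "BQB_IS_DESCRIBED_BY"},
]
rows += [
    {"s_id": "S2", "ontology": "uniprot", "identifier": f"P{i}", "bqb": "BQB_HAS_PART"}
    for i in range(2)
]
rows += [
    {"s_id": "S3", "ontology": "uniprot", "identifier": f"Q{i}", "bqb": "BQB_HAS_PART"}
    for i in range(5)
]
kept = characteristic_ids_sketch(rows)
```

With the default thresholds, the defining identifier of S1 and the two parts of the small complex S2 are kept, while the PubMed reference and the five parts of S3 are dropped.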

napistu.sbml_dfs_utils.find_underspecified_reactions(reaction_species_w_roles: DataFrame) → DataFrame

napistu.sbml_dfs_utils.find_unused_entities(sbml_dfs_or_dict: SBML_dfs | dict[str, pd.DataFrame]) → dict[str, set[str]]

napistu.sbml_dfs_utils.force_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) → tuple[DataFrame, DataFrame, DataFrame]

Force the edgelist to be consistent with the species and compartments dataframes.

Parameters:

interaction_edgelist (pd.DataFrame) – The interaction edgelist to force consistency with
species_df (pd.DataFrame) – The species dataframe to force consistency with
compartments_df (pd.DataFrame) – The compartments dataframe to force consistency with

Returns:

A tuple containing the filtered interaction edgelist, species dataframe, and compartments dataframe

Return type:

tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

napistu.sbml_dfs_utils.format_sbml_dfs_summary(data)

Format model data into a clean summary table for Jupyter display

napistu.sbml_dfs_utils.get_current_max_id(sbml_dfs_table: DataFrame) → int

Get Current Max ID

Look at a table from an SBML_dfs object and find the largest primary key following the default naming convention for the table.

Parameters:

sbml_dfs_table (pd.DataFrame) – A table derived from an SBML_dfs object.

Returns:

current_max_id (int) – The largest ID which is already defined in the table using its expected naming convention. If no IDs following this convention are present, the default is -1, so that new IDs will be added starting with 0.
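
The -1 default can be illustrated with a short stand-alone sketch. It assumes that conforming IDs consist of a letter prefix followed by digits (such as "S00000007"), which is not stated on this page, and it works on a plain list of IDs rather than a table.

```python
import re


def current_max_id_sketch(ids, prefix):
    """Return the largest integer suffix among IDs shaped like <prefix><digits>, or -1."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = [int(m.group(1)) for m in map(pattern.match, ids) if m]
    return max(numbers, default=-1)


existing = ["S00000000", "S00000007", "legacy_species"]
next_id = current_max_id_sketch(existing, "S") + 1
first_id = current_max_id_sketch(["legacy_species"], "S") + 1
```

IDs that do not follow the convention are ignored, so a table containing only such IDs starts numbering at 0.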

napistu.sbml_dfs_utils.id_formatter(id_values: Iterable[Any], id_type: str, id_len: int = 8) → list[str]

napistu.sbml_dfs_utils.id_formatter_inv(ids: list[str]) → list[int]

ID Formatter Inverter

Convert from internal IDs back to integer IDs
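
A round trip between integer IDs and formatted IDs might look like the sketch below. The ID layout (a letter prefix followed by a zero-padded integer) is an assumption, and the helper names are hypothetical; the sketch takes the prefix directly, whereas id_formatter takes an id_type argument whose handling is not described here.

```python
import re


def format_ids_sketch(id_values, prefix, id_len=8):
    """Format integers as <prefix> followed by a zero-padded number."""
    return [f"{prefix}{value:0{id_len}d}" for value in id_values]


def invert_ids_sketch(ids):
    """Strip the leading letters and parse the remainder as an integer."""
    return [int(re.sub(r"^[A-Za-z]+", "", an_id)) for an_id in ids]


formatted = format_ids_sketch([0, 12, 345], "S")
recovered = invert_ids_sketch(formatted)
```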

napistu.sbml_dfs_utils.match_entitydata_index_to_entity(entity_data_dict: dict, an_entity_data_type: str, consensus_entity_df: DataFrame, entity_schema: dict, table: str) → DataFrame

Match the index of entity_data_dict[an_entity_data_type] with the index of the corresponding entity. Update the index of entity_data_dict[an_entity_data_type] to match consensus_entity_df’s index, and report cases where entity_data has indices not in the corresponding entity’s index.

Parameters:

entity_data_dict (dict) – dictionary containing all model’s “an_entity_data_type” dictionaries
an_entity_data_type (str) – data_type from species/reactions_data in entity_data_dict
consensus_entity_df (pd.DataFrame) – the dataframe of the corresponding entity
entity_schema (dict) – schema for “table”
table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

entity_data_df (pd.DataFrame) – table for entity_data_dict[an_entity_data_type]

napistu.sbml_dfs_utils.species_type_types(x, ontology_to_species_type: dict = {'bigg_metabolite': 'metabolite', 'chebi': 'metabolite', 'corum': 'complex', 'drugbank': 'drug', 'ensembl_gene': 'protein', 'ensembl_protein': 'protein', 'ensembl_transcript': 'protein', 'gene_name': 'protein', 'kegg': 'metabolite', 'kegg.drug': 'drug', 'mirbase': 'regulatory_rna', 'ncbi_entrez_gene': 'protein', 'pubchem': 'metabolite', 'rnacentral': 'regulatory_rna', 'smiles': 'metabolite', 'symbol': 'protein', 'uniprot': 'protein'}, prioritized_species_types: set[str] = {'complex', 'drug'}) → str

Assign a high-level molecule type to a molecular species

Parameters:

x (Identifiers) – The identifiers object to assign a species type to
ontology_to_species_type (dict) – The mapping of ontologies to species types
prioritized_species_types (set[str]) – The set of prioritized species types

Returns:

The high-level molecule type of the species

Return type:

str

Examples

>>> identifiers = Identifiers([{'ontology': 'CHEBI', 'identifier': '123456', 'bqb': 'BQB.IS'}])
>>> species_type_types(identifiers)
'metabolite'

napistu.sbml_dfs_utils.stub_compartments(stubbed_compartment: str = 'cellular_component', with_source: bool = False) → DataFrame

Stub Compartments

Create a compartments table with only a single compartment

Parameters:

stubbed_compartment (str) – the name of a compartment which should match the keys in ingestion.constants.VALID_COMPARTMENTS and ingestion.constants.COMPARTMENTS_GO_TERMS
with_source (bool) – whether to include a source column in the compartments dataframe. Defaults to False which is the standard approach for edgelist creation. True will create a valid compartments table with a c_Source column.

Returns:

compartments_df – compartments dataframe

Return type:

pd.DataFrame

napistu.sbml_dfs_utils.unnest_identifiers(id_table: DataFrame, id_var: str) → DataFrame

Unnest Identifiers

Take a pd.DataFrame containing an array of Identifiers and return one row per identifier.

Parameters:

id_table (pd.DataFrame) – Table containing Identifiers objects
id_var (str) – Column name containing Identifiers objects

Returns:

DataFrame with one row per identifier, MultiIndex with original index + entry

Return type:

pd.DataFrame
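
The reshaping can be sketched with pandas alone, using plain lists of dicts as a stand-in for Identifiers objects. The function name and the "s_Identifiers" column are illustrative assumptions, and the sketch does not handle an empty table.

```python
import pandas as pd


def unnest_sketch(id_table, id_var):
    """Expand a column of identifier lists into one row per identifier."""
    index_name = id_table.index.name
    records = []
    for key, entries in id_table[id_var].items():
        for entry, identifier in enumerate(entries):
            records.append({index_name: key, "entry": entry, **identifier})
    # MultiIndex of the original index plus the position of each identifier.
    return pd.DataFrame(records).set_index([index_name, "entry"])


species = pd.DataFrame(
    {
        "s_Identifiers": [
            [
                {"ontology": "uniprot", "identifier": "P04637"},
                {"ontology": "ensembl_gene", "identifier": "ENSG00000141510"},
            ],
            [{"ontology": "chebi", "identifier": "15422"}],
        ]
    },
    index=pd.Index(["S00000000", "S00000001"], name="s_id"),
)
long_ids = unnest_sketch(species, "s_Identifiers")
```

Each identifier becomes its own row, addressed by the original species ID and its position within that species' identifier list.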

napistu.sbml_dfs_utils.validate_sbml_dfs_table(table_data: DataFrame, table_name: str) → None

Validate a standalone table against the SBML_dfs schema.

This function validates a table against the schema defined in SBML_DFS_SCHEMA, without requiring an SBML_dfs object. Useful for validating tables before creating an SBML_dfs object.

Parameters:

table_data (pd.DataFrame) – The table to validate
table_name (str) – Name of the table in the SBML_dfs schema

Raises:

ValueError – If table_name is not in schema or validation fails