napistu.sbml_dfs_utils

Utilities supporting creation and manipulation of SBML_dfs instances.

Public Functions

add_missing_ids_column(contingency_table, reference_table, other_column_name="other") -> pd.DataFrame:

Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.

add_sbo_role(reaction_species) -> pd.DataFrame:

Add an sbo_role column to the reaction_species table.

check_entity_data_index_matching(sbml_dfs, table) -> sbml_dfs:

Update the entity_data (dict) index of the input sbml_dfs with match_entitydata_index_to_entity so that the index of each DataFrame in entity_data matches the corresponding entity table and the sbml_dfs then passes sbml_dfs.validate().

construct_formula_string(reaction_species_df, reactions_df, name_var) -> str:

Construct Formula String

create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col=SBML_DFS.R_ID, c_name_col=SBML_DFS.C_NAME) -> pd.Series:

Create a pd.Series of reaction formula strings.

display_post_consensus_checks(checks_results) -> None:

Display post-consensus checks results.

find_underspecified_reactions(reaction_species_w_roles) -> pd.DataFrame:

Find underspecified reactions in a reaction_species table.

find_unused_entities(sbml_dfs_or_dict) -> dict[str, set[str]]:

Find unused entities in a SBML_dfs or dict of SBML_dfs instances.

filter_to_characteristic_species_ids(species_ids, max_complex_size=4, max_promiscuity=20, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> pd.DataFrame:

Filter to characteristic species IDs.

force_edgelist_consistency(interaction_edgelist, species_df, compartments_df) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:

Force edgelist consistency.

format_sbml_dfs_summary(data) -> str:

Format a summary of a SBML_dfs instance.

get_current_max_id(sbml_dfs_table) -> int:

Get the current maximum ID for a given SBML_dfs table.

id_formatter(id_values, id_type, id_len=8) -> list[str]:

Format an iterable of input values as a list of identifier strings.

id_formatter_inv(ids) -> list:

Invert the id_formatter function.

match_entitydata_index_to_entity(entity_data_dict, an_entity_data_type, consensus_entity_df, entity_schema, table) -> pd.DataFrame:

Match the index of an entity data dictionary to the index of a consensus entity DataFrame.

species_type_types(x, ontology_to_species_type=ONTOLOGY_TO_SPECIES_TYPE, prioritized_species_types=PRIORITIZED_SPECIES_TYPES) -> str:

Determine the species type of a given entity.

stub_compartments(stubbed_compartment=GENERIC_COMPARTMENT, with_source=False) -> pd.DataFrame:

Stub compartments in a SBML_dfs instance.

unnest_identifiers(id_table, id_var) -> pd.DataFrame:

Unnest identifiers from a table.

validate_sbml_dfs_table(table_data, table_name) -> None:

Validate that a DataFrame is a valid SBML_dfs table.

Functions

add_missing_ids_column(contingency_table, ...)

Add an 'other' column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.

add_sbo_role(reaction_species)

Add an sbo_role column to the reaction_species table.

check_entity_data_index_matching(sbml_dfs, table)

Update the entity_data (dict) index of the input sbml_dfs with match_entitydata_index_to_entity so that the index of each DataFrame in entity_data matches the corresponding entity table and the sbml_dfs then passes sbml_dfs.validate().

Parameters:
  • sbml_dfs (cpr.SBML_dfs) – a cpr.SBML_dfs

  • table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

sbml_dfs (cpr.SBML_dfs) – sbml_dfs whose entity_data is checked to have the same index as the corresponding entity.

construct_formula_string(...)

Construct Formula String

create_reaction_formula_series(...[, ...])

Helper function to create reaction formula series.

display(obj)

display_post_consensus_checks(checks_results)

Display the results of post_consensus_checks in a formatted way.

filter_to_characteristic_species_ids(species_ids)

Filter to Characteristic Species IDs

find_underspecified_reactions(...)

find_unused_entities(sbml_dfs_or_dict)

force_edgelist_consistency(...)

Force the edgelist to be consistent with the species and compartments dataframes.

format_sbml_dfs_summary(data)

Format model data into a clean summary table for Jupyter display

get_current_max_id(sbml_dfs_table)

Get Current Max ID

id_formatter(id_values, id_type[, id_len])

id_formatter_inv(ids)

ID Formatter Inverter

match_entitydata_index_to_entity(...)

Match the index of entity_data_dict[an_entity_data_type] with the index of the corresponding entity. Update the index of entity_data_dict[an_entity_data_type] to match the index of consensus_entity_df, and report cases where entity_data has indices that are not in the corresponding entity's index.

Parameters:
  • entity_data_dict (dict) – dictionary containing all models' "an_entity_data_type" dictionaries

  • an_entity_data_type (str) – data_type from species/reactions_data in entity_data_dict

  • consensus_entity_df (pd.DataFrame) – the dataframe of the corresponding entity

  • entity_schema (dict) – schema for "table"

  • table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

entity_data_df (pd.DataFrame) – table for entity_data_dict[an_entity_data_type].

species_type_types(x[, ...])

Assign a high-level molecule type to a molecular species

stub_compartments([stubbed_compartment, ...])

Stub Compartments

unnest_identifiers(id_table, id_var)

Unnest Identifiers

validate_sbml_dfs_table(table_data, table_name)

Validate a standalone table against the SBML_dfs schema.

napistu.sbml_dfs_utils._add_edgelist_defaults(interaction_edgelist: DataFrame, edgelist_defaults: dict[str, Any] | None = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}) -> DataFrame

Add default values to the interaction edgelist

Parameters:
  • interaction_edgelist (pd.DataFrame) – The interaction edgelist to add defaults to

  • edgelist_defaults (dict[str, Any]) – The defaults to add to the interaction edgelist

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._add_stoi_to_species_name(stoi: float | int, name: str) -> str

Add Stoi To Species Name

Add # of molecules to a species name

Parameters:

stoi: float or int

Number of molecules

name: str

Name of species

Returns:

name: str

Name containing number of species

napistu.sbml_dfs_utils._dogmatic_to_defining_bqbs(dogmatic: bool = False) -> str
napistu.sbml_dfs_utils._edgelist_create_compartmentalized_species(interaction_edgelist, species_df, compartments_df, interaction_source)

Create compartmentalized species from interactions.

Parameters:
  • interaction_edgelist (pd.DataFrame) – Interaction data containing species-compartment combinations

  • species_df (pd.DataFrame) – Processed species data with IDs

  • compartments_df (pd.DataFrame) – Processed compartments data with IDs

  • interaction_source (source.Source) – Source object to assign to compartmentalized species

Returns:

Compartmentalized species with formatted names and IDs

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._edgelist_create_reactions_and_species(interaction_edgelist, comp_species, processed_species, processed_compartments, interaction_source, extra_reactions_columns)

Create reactions and reaction species from interactions.

Parameters:
  • interaction_edgelist (pd.DataFrame) – Original interaction data

  • comp_species (pd.DataFrame) – Compartmentalized species with IDs

  • processed_species (pd.DataFrame) – Processed species data with IDs

  • processed_compartments (pd.DataFrame) – Processed compartments data with IDs

  • interaction_source (source.Source) – Source object for reactions

  • extra_reactions_columns (list) – Names of extra columns to preserve

Returns:

(reactions_df, reaction_species_df, reactions_data)

Return type:

tuple

napistu.sbml_dfs_utils._edgelist_identify_extra_columns(interaction_edgelist, species_df, keep_reactions_data, keep_species_data)

Identify extra columns in input data that should be preserved.

Parameters:
  • interaction_edgelist (pd.DataFrame) – Interaction data containing potential extra columns

  • species_df (pd.DataFrame) – Species data containing potential extra columns

  • keep_reactions_data (bool or str) – Whether to keep extra reaction columns

  • keep_species_data (bool or str) – Whether to keep extra species columns

Returns:

Dictionary with ‘reactions’ and ‘species’ keys containing lists of extra column names

Return type:

dict

napistu.sbml_dfs_utils._edgelist_process_compartments(compartments_df, interaction_source)

Format compartments DataFrame with source and ID columns.

Parameters:
  • compartments_df (pd.DataFrame) – Raw compartments data

  • interaction_source (source.Source) – Source object to assign to compartments

Returns:

Processed compartments with IDs, indexed by compartment ID

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._edgelist_process_species(species_df, interaction_source, extra_species_columns)

Format species DataFrame and extract extra data.

Parameters:
  • species_df (pd.DataFrame) – Raw species data

  • interaction_source (source.Source) – Source object to assign to species

  • extra_species_columns (list) – Names of extra columns to preserve separately

Returns:

Processed species DataFrame and species extra data DataFrame

Return type:

tuple of pd.DataFrame

napistu.sbml_dfs_utils._edgelist_validate_inputs(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) -> None

Validate input DataFrames have required columns.

Parameters:
  • interaction_edgelist (pd.DataFrame) – Interaction data to validate

  • species_df (pd.DataFrame) – Species data to validate

  • compartments_df (pd.DataFrame) – Compartments data to validate

napistu.sbml_dfs_utils._filter_promiscuous_components(bqb_has_parts_species: DataFrame, max_promiscuity: int) -> DataFrame
napistu.sbml_dfs_utils._filter_to_pathways(df: DataFrame, pathways: list[str]) -> DataFrame

Filter a table to only include pathways in the list.

napistu.sbml_dfs_utils._find_underspecified_reactions(reaction_species_w_roles: DataFrame) -> DataFrame
napistu.sbml_dfs_utils._get_interaction_symbol(sbo_term_or_name: str) -> str
napistu.sbml_dfs_utils._id_dict_to_df(ids)
napistu.sbml_dfs_utils._name_interaction(upstream_name: str, downstream_name: str, sbo_term_upstream: str | None = 'interactor')

Name an interaction

Parameters:
  • upstream_name (str) – The name of the upstream species

  • downstream_name (str) – The name of the downstream species

  • sbo_term_upstream (str, optional) – The SBO term of the upstream species. If not provided, the interaction will be named “interactor”

Returns:

The name of the interaction

Return type:

str

napistu.sbml_dfs_utils._perform_sbml_dfs_table_validation(table_data: DataFrame, table_schema: dict, table_name: str) -> None

Core validation logic for SBML_dfs tables.

This function performs the actual validation checks for any table against its schema, regardless of whether it’s part of an SBML_dfs object or standalone.

Parameters:

table_data: pd.DataFrame

The table data to validate

table_schema: dict

Schema definition for the table

table_name: str

Name of the table (for error messages)

Raises:

ValueError

If the table does not conform to its schema:
  • Not a DataFrame
  • Wrong index name
  • Duplicate primary keys
  • Missing required variables
  • Empty table
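The checks listed above can be restated as a minimal sketch. The helper name check_table and the schema layout (a "pk" key for the index name and a "vars" key for required columns) are assumptions made for illustration; the real SBML_DFS_SCHEMA and napistu's implementation may differ.

```python
import pandas as pd

# Hypothetical schema entry; the real SBML_DFS_SCHEMA layout may differ.
EXAMPLE_SCHEMA = {"pk": "s_id", "vars": ["s_name", "s_Identifiers"]}


def check_table(table_data, table_schema, table_name):
    """Illustrative restatement of the documented checks; not napistu's code."""
    if not isinstance(table_data, pd.DataFrame):
        raise ValueError(f"{table_name} must be a pd.DataFrame")
    if table_data.index.name != table_schema["pk"]:
        raise ValueError(f"{table_name} index must be named {table_schema['pk']}")
    if table_data.index.duplicated().any():
        raise ValueError(f"{table_name} has duplicate primary keys")
    missing = set(table_schema["vars"]) - set(table_data.columns)
    if missing:
        raise ValueError(f"{table_name} is missing required variables: {sorted(missing)}")
    if table_data.shape[0] == 0:
        raise ValueError(f"{table_name} is empty")


species = pd.DataFrame(
    {"s_name": ["glucose", "ATP"], "s_Identifiers": [None, None]},
    index=pd.Index(["S00000000", "S00000001"], name="s_id"),
)
check_table(species, EXAMPLE_SCHEMA, "species")  # a conforming table passes silently
```

Raising a single exception type with a table-specific message keeps the caller's error handling simple while still identifying which table failed.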

napistu.sbml_dfs_utils._sbml_dfs_from_edgelist_check_cspecies_merge(merged_species: DataFrame, original_species: DataFrame) -> None

Check for a mismatch between the provided species data and species implied by the edgelist.

napistu.sbml_dfs_utils._select_priority_pathway_sources(source_table: DataFrame, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) -> DataFrame

Filter the source table to only include pathways in the list. If 0 or 1 priority pathways are found, return the source table.

Parameters:
  • source_table (pd.DataFrame) – The source table to filter

  • priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – The list of pathways to filter to. If None, returns source_table with no filtering or warning. If fewer than 2 pathways are found in the source table, returns the full source table with a warning.

Returns:

The filtered source table. If priority_pathways is None, returns the original source_table. If fewer than 2 priority pathways are found, returns the full source_table with a warning.

Return type:

pd.DataFrame
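The fallback rule described above (no filtering when priority_pathways is None, and no filtering when fewer than two priority pathways are present) might look like the following sketch. The helper name keep_priority_sources and the pathway_id column are hypothetical, and the warning that the documentation mentions is only noted in a comment.

```python
import pandas as pd


def keep_priority_sources(source_table, priority_pathways, pathway_col="pathway_id"):
    """Sketch only: `pathway_col` and this helper are hypothetical names."""
    if priority_pathways is None:
        return source_table
    is_priority = source_table[pathway_col].isin(priority_pathways)
    n_found = source_table.loc[is_priority, pathway_col].nunique()
    if n_found < 2:
        # The documented behaviour also emits a warning in this case.
        return source_table
    return source_table[is_priority]


sources = pd.DataFrame(
    {"pathway_id": ["Reactome", "STRING", "custom"], "s_id": ["S1", "S1", "S2"]}
)
filtered = keep_priority_sources(sources, ["Reactome", "STRING"])  # two priority pathways found
unfiltered = keep_priority_sources(sources, ["Reactome"])  # only one found, so nothing is dropped
```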

napistu.sbml_dfs_utils._summarize_ontology_cooccurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False) -> DataFrame

Create a cooccurrence matrix of ontologies based on entities sharing the same ontology.

This can be used to identify ontologies which are associated with the same types of entities.

Parameters:
  • df (pd.DataFrame) – a table generated using sbml_dfs.get_sources

  • stratify_by_bqb (bool) – whether to stratify by bqb

  • allow_col_multindex (bool) – whether to allow a column MultiIndex

Returns:

Square matrix with ontologies as both rows and columns

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._summarize_ontology_occurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, binarize: bool = False) -> DataFrame

Summarize the types of identifiers associated with each entity.

Parameters:
  • df (pd.DataFrame) – a table generated using sbml_dfs.get_identifiers or sbml_dfs.get_characteristic_species_ids

  • stratify_by_bqb (bool) – whether to stratify by bqb

  • allow_col_multindex (bool) – whether to allow a column MultiIndex

  • binarize (bool) – whether to convert the result to binary values (0 vs 1+)

Returns:

a table with entities as rows and ontologies as columns

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._summarize_source_cooccurrence(df: DataFrame) -> DataFrame

Create a cooccurrence matrix of pathways based on the presence of entities in pathways.

Parameters:

df (pd.DataFrame) – a table generated using sbml_dfs.get_sources

Returns:

Square matrix with pathways as both rows and columns

Return type:

pd.DataFrame

napistu.sbml_dfs_utils._summarize_source_occurrence(df: DataFrame, binarize: bool = False) -> DataFrame

Summarize the occurrence of entities in pathways.

Parameters:
  • df (pd.DataFrame) – a table generated using sbml_dfs.get_sources

  • binarize (bool) – whether to convert the result to binary values (0 vs 1+)

Returns:

a table with entities as rows and pathways as columns

Return type:

pd.DataFrame
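As a rough illustration of how an occurrence table relates to a cooccurrence matrix, the sketch below uses plain pandas. The long-format input and its column names (s_id, pathway_id) are assumptions, and this is one possible way to compute these summaries rather than napistu's implementation.

```python
import pandas as pd

# Hypothetical long-format table: one row per (entity, pathway) membership.
memberships = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S2", "S2", "S3"],
        "pathway_id": ["Reactome", "STRING", "Reactome", "Reactome", "STRING", "STRING"],
    }
)

# Occurrence: entities as rows, pathways as columns, counts as values.
occurrence = pd.crosstab(memberships["s_id"], memberships["pathway_id"])

# Binarized occurrence: 0 vs 1+.
binary = (occurrence > 0).astype(int)

# Cooccurrence: square pathway-by-pathway matrix counting shared entities.
cooccurrence = binary.T @ binary
```

The diagonal of the cooccurrence matrix counts the entities found in each pathway, and the off-diagonal cells count entities shared by a pair of pathways.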

napistu.sbml_dfs_utils._validate_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, raise_on_missing: bool = True) -> None

Check for missing entity references, optionally raising or warning.

This function is used to validate the consistency of the interaction edgelist, species_df, and compartments_df.

Parameters:
  • interaction_edgelist (pd.DataFrame) – The interaction edgelist to validate

  • species_df (pd.DataFrame) – The species dataframe to validate

  • compartments_df (pd.DataFrame) – The compartments dataframe to validate

  • raise_on_missing (bool, optional) – Whether to raise an error if missing entities are found

Return type:

None

napistu.sbml_dfs_utils._validate_matching_data(data_table: DataFrame, ref_table: DataFrame)

Validates a table against a reference

This checks that the table has the same index name as the reference table, that its index contains no duplicates, and that all values in its index are present in the reference table.

Parameters:
  • data_table (pd.DataFrame) – a table with data that should match the reference

  • ref_table (pd.DataFrame) – a reference table

Raises:
  • ValueError – not same index name

  • ValueError – index contains duplicates

  • ValueError – index not subset of index of reactions table
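The three conditions above can be expressed with a few pandas index operations. The helper name check_matching and the example tables are hypothetical; this is a sketch of the documented checks, not the library's code.

```python
import pandas as pd


def check_matching(data_table, ref_table):
    """Hypothetical helper restating the three documented checks."""
    if data_table.index.name != ref_table.index.name:
        raise ValueError("index names differ")
    if data_table.index.duplicated().any():
        raise ValueError("index contains duplicates")
    if not data_table.index.isin(ref_table.index).all():
        raise ValueError("index is not a subset of the reference index")


ref = pd.DataFrame({"r_name": ["a", "b"]}, index=pd.Index(["R1", "R2"], name="r_id"))
data = pd.DataFrame({"score": [0.5]}, index=pd.Index(["R1"], name="r_id"))
check_matching(data, ref)  # a matching table passes silently
```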

napistu.sbml_dfs_utils._validate_non_null_values(df: DataFrame, expected_columns: set, table_name: str) -> None

Validate that all required columns in a DataFrame have non-null values.

Parameters:
  • df (pd.DataFrame) – The DataFrame to validate

  • expected_columns (set) – Set of column names that should have non-null values

  • table_name (str) – Name of the table for error messages

Raises:

ValueError – If any required column contains null values

napistu.sbml_dfs_utils._validate_sbo_values(sbo_series: Series, validate: str = 'names') -> None

Validate SBO terms or names

Parameters:
  • sbo_series (pd.Series) – The SBO terms or names to validate.

  • validate (str, optional) – Whether the values are SBO terms (“terms”) or names (“names”, default).

Return type:

None

Raises:
  • ValueError – If the validation type is invalid.

  • TypeError – If the invalid_counts is not a pandas DataFrame.

  • ValueError – If some reaction species have unusable SBO terms.

napistu.sbml_dfs_utils.add_missing_ids_column(contingency_table: DataFrame, reference_table: DataFrame, other_column_name: str = 'other') -> DataFrame

Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.

Parameters:

contingency_table: pd.DataFrame

The contingency table with binary values (subset of IDs)

reference_table: pd.DataFrame

The reference table containing all possible IDs

other_column_name: str, optional

Name for the ‘other’ column, by default “other”

Returns:

pd.DataFrame

Updated contingency table with ‘other’ column(s) added if there are missing IDs. If no IDs are missing, returns a copy of the original contingency table without adding an ‘other’ column.

Raises:

ValueError

If the index names don’t match between the two tables
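A minimal pandas sketch of the behaviour described above follows. The helper add_other_column is hypothetical, and it assumes that IDs missing from the contingency table are appended as new rows that are flagged only in the 'other' column; the real function may handle this differently.

```python
import pandas as pd


def add_other_column(contingency_table, reference_table, other_column_name="other"):
    """Sketch of the documented behaviour; the real function may differ in detail."""
    if contingency_table.index.name != reference_table.index.name:
        raise ValueError("index names of the two tables must match")
    missing_ids = reference_table.index.difference(contingency_table.index)
    if len(missing_ids) == 0:
        return contingency_table.copy()
    # Assumption: missing IDs become new rows flagged only in the 'other' column.
    all_ids = contingency_table.index.union(missing_ids)
    out = contingency_table.reindex(all_ids, fill_value=0)
    out[other_column_name] = out.index.isin(missing_ids).astype(int)
    return out


contingency = pd.DataFrame(
    {"Reactome": [1, 0], "STRING": [1, 1]}, index=pd.Index(["S1", "S2"], name="s_id")
)
reference = pd.DataFrame(
    {"s_name": ["a", "b", "c"]}, index=pd.Index(["S1", "S2", "S3"], name="s_id")
)
updated = add_other_column(contingency, reference)  # S3 is only in the reference table
```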

napistu.sbml_dfs_utils.add_sbo_role(reaction_species: DataFrame) -> DataFrame

Add an sbo_role column to the reaction_species table.

The sbo_role column is a string column that contains the SBO role of the reaction species. The values in the sbo_role column are taken from the sbo_term column.

The sbo_role column is added to the reaction_species table by mapping the sbo_term column to the SBO_NAME_TO_ROLE dictionary.

napistu.sbml_dfs_utils.check_entity_data_index_matching(sbml_dfs, table)

Update the entity_data (dict) index of the input sbml_dfs with match_entitydata_index_to_entity so that the index of each DataFrame in entity_data matches the corresponding entity table and the sbml_dfs then passes sbml_dfs.validate().

Parameters:
  • sbml_dfs (cpr.SBML_dfs) – a cpr.SBML_dfs

  • table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

sbml_dfs (cpr.SBML_dfs) – sbml_dfs whose entity_data is checked to have the same index as the corresponding entity.

napistu.sbml_dfs_utils.construct_formula_string(reaction_species_df: DataFrame, reactions_df: DataFrame, name_var: str) -> str

Construct Formula String

Convert a table of reaction species into a formula string

Parameters:

reaction_species_df: pd.DataFrame

Table containing a reaction’s species

reactions_df: pd.DataFrame

The reactions table (sbml_dfs.reactions)

name_var: str

Name used to label species

Returns:

formula_str: str

String representation of a reaction’s substrates, products and modifiers

napistu.sbml_dfs_utils.create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col='r_id', c_name_col='c_name')

Helper function to create reaction formula series.

Parameters:

reaction_data: pd.DataFrame

The reaction species data to process

reactions_df: pd.DataFrame

The reactions dataframe needed by construct_formula_string

species_name_col: str

Column name to use for species names in formulas

sort_cols: list

Columns to sort by before grouping

group_cols: list, optional

Columns to group by. If None, uses [r_id_col]

add_compartment_prefix: bool

Whether to add compartment name as prefix to formula

r_id_col: str

Column name for reaction ID

c_name_col: str

Column name for compartment name (used when add_compartment_prefix=True)

Returns:

pd.Series or None : Formula strings indexed by reaction ID, or None if no data

napistu.sbml_dfs_utils.display(obj)
napistu.sbml_dfs_utils.display_post_consensus_checks(checks_results: dict) -> None

Display the results of post_consensus_checks in a formatted way.

This function takes the results from the post_consensus_checks method and displays them using the same formatting as shown in the sandbox notebook.

Parameters:

checks_results (dict) – Dictionary returned by the post_consensus_checks method, containing nested dictionaries with entity types and check types as keys, and DataFrames as values.

Returns:

This function displays results but doesn’t return anything.

Return type:

None

napistu.sbml_dfs_utils.filter_to_characteristic_species_ids(species_ids: DataFrame, max_complex_size: int = 4, max_promiscuity: int = 20, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) -> DataFrame

Filter to Characteristic Species IDs

Remove identifiers corresponding to one component within large protein complexes, as well as non-characteristic annotations such as PubMed references and homologues.

Parameters:

species_ids: pd.DataFrame

A table of identifiers produced by sbml_dfs.get_identifiers("species")

max_complex_size: int

The largest size of a complex for which BQB_HAS_PART terms will be retained. In most cases, complexes are handled with specific formation and dissolution reactions, but these identifiers will be pulled in when searching by identifiers or when searching the identifiers associated with a species against an external resource such as Open Targets.

max_promiscuity: int

Maximum number of species where a single molecule can act as a BQB_HAS_PART component associated with a single identifier (and common ontology).

defining_biological_qualifiers: list[str]

BQB codes which define distinct entities. Narrowly this would be BQB_IS, while more permissive settings would include homologs, different forms of the same gene.

Returns:

species_id: pd.DataFrame

Input species filtered to characteristic identifiers

napistu.sbml_dfs_utils.find_underspecified_reactions(reaction_species_w_roles: DataFrame) -> DataFrame
napistu.sbml_dfs_utils.find_unused_entities(sbml_dfs_or_dict: SBML_dfs | dict[str, pd.DataFrame]) -> dict[str, set[str]]
napistu.sbml_dfs_utils.force_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) -> tuple[DataFrame, DataFrame, DataFrame]

Force the edgelist to be consistent with the species and compartments dataframes.

Parameters:
  • interaction_edgelist (pd.DataFrame) – The interaction edgelist to force consistency with

  • species_df (pd.DataFrame) – The species dataframe to force consistency with

  • compartments_df (pd.DataFrame) – The compartments dataframe to force consistency with

Returns:

A tuple containing the filtered interaction edgelist, species dataframe, and compartments dataframe

Return type:

tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

napistu.sbml_dfs_utils.format_sbml_dfs_summary(data)

Format model data into a clean summary table for Jupyter display

napistu.sbml_dfs_utils.get_current_max_id(sbml_dfs_table: DataFrame) -> int

Get Current Max ID

Look at a table from an SBML_dfs object and find the largest primary key following the default naming convention for the table.

Params: sbml_dfs_table (pd.DataFrame):

A table derived from an SBML_dfs object.

Returns: current_max_id (int):

The largest id which is already defined in the table using its expected naming convention. If no IDs following this convention are present then the default will be -1. In this way new IDs will be added starting with 0.
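The -1 default can be illustrated with a short sketch. It assumes, purely for illustration, that conforming primary keys are uppercase letters followed by digits (for example S00000007); the helper current_max_id is hypothetical and is not the library's implementation.

```python
import re

import pandas as pd


def current_max_id(table):
    """Hypothetical sketch: assumes keys look like 'S00000012' (letters, then digits)."""
    numeric_ids = [
        int(match.group(1))
        for key in table.index
        if (match := re.fullmatch(r"[A-Z]+([0-9]+)", str(key)))
    ]
    # With no conforming keys, return -1 so that the next new ID is 0.
    return max(numeric_ids, default=-1)


species = pd.DataFrame(
    {"s_name": ["glucose", "ATP", "custom"]},
    index=pd.Index(["S00000000", "S00000007", "my_species"], name="s_id"),
)
next_id = current_max_id(species) + 1  # first unused integer ID
```

Keys that do not follow the assumed convention, such as my_species here, are ignored rather than treated as errors.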

napistu.sbml_dfs_utils.id_formatter(id_values: Iterable[Any], id_type: str, id_len: int = 8) -> list[str]
napistu.sbml_dfs_utils.id_formatter_inv(ids: list[str]) -> list[int]

ID Formatter Inverter

Convert from internal IDs back to integer IDs
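To show what a formatter and its inverse might do as a pair, here is a hedged round-trip sketch. Both helpers (format_ids, invert_ids) and the ID layout (an uppercase prefix followed by a zero-padded integer) are assumptions for illustration; the actual id_formatter takes an id_type argument whose handling is not documented here.

```python
def format_ids(id_values, prefix, id_len=8):
    """Hypothetical formatter: uppercase prefix plus zero-padded integer."""
    return [f"{prefix.upper()}{int(value):0{id_len}d}" for value in id_values]


def invert_ids(ids):
    """Hypothetical inverse: strip the leading letters and parse the integer."""
    return [int(an_id.lstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ")) for an_id in ids]


formatted = format_ids([0, 12, 345], "s")
recovered = invert_ids(formatted)  # round trip back to the original integers
```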

napistu.sbml_dfs_utils.match_entitydata_index_to_entity(entity_data_dict: dict, an_entity_data_type: str, consensus_entity_df: DataFrame, entity_schema: dict, table: str) -> DataFrame

Match the index of entity_data_dict[an_entity_data_type] with the index of the corresponding entity. Update the index of entity_data_dict[an_entity_data_type] to match the index of consensus_entity_df, and report cases where entity_data has indices that are not in the corresponding entity’s index.

Parameters:
  • entity_data_dict (dict) – dictionary containing all models’ “an_entity_data_type” dictionaries

  • an_entity_data_type (str) – data_type from species/reactions_data in entity_data_dict

  • consensus_entity_df (pd.DataFrame) – the dataframe of the corresponding entity

  • entity_schema (dict) – schema for “table”

  • table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

entity_data_df (pd.DataFrame) – table for entity_data_dict[an_entity_data_type]

napistu.sbml_dfs_utils.species_type_types(x, ontology_to_species_type: dict = {'bigg_metabolite': 'metabolite', 'chebi': 'metabolite', 'corum': 'complex', 'drugbank': 'drug', 'ensembl_gene': 'protein', 'ensembl_protein': 'protein', 'ensembl_transcript': 'protein', 'gene_name': 'protein', 'kegg': 'metabolite', 'kegg.drug': 'drug', 'mirbase': 'regulatory_rna', 'ncbi_entrez_gene': 'protein', 'pubchem': 'metabolite', 'rnacentral': 'regulatory_rna', 'smiles': 'metabolite', 'symbol': 'protein', 'uniprot': 'protein'}, prioritized_species_types: set[str] = {'complex', 'drug'}) -> str

Assign a high-level molecule type to a molecular species

Parameters:
  • x (Identifiers) – The identifiers object to assign a species type to

  • ontology_to_species_type (dict) – The mapping of ontologies to species types

  • prioritized_species_types (set[str]) – The set of prioritized species types

Returns:

The high-level molecule type of the species

Return type:

str

Examples

>>> identifiers = Identifiers([{'ontology': 'CHEBI', 'identifier': '123456', 'bqb': 'BQB.IS'}])
>>> species_type_types(identifiers)
'metabolite'
napistu.sbml_dfs_utils.stub_compartments(stubbed_compartment: str = 'cellular_component', with_source: bool = False) -> DataFrame

Stub Compartments

Create a compartments table with only a single compartment

Parameters:
  • stubbed_compartment (str) – the name of a compartment which should match the keys in ingestion.constants.VALID_COMPARTMENTS and ingestion.constants.COMPARTMENTS_GO_TERMS

  • with_source (bool) – whether to include a source column in the compartments dataframe. Defaults to False which is the standard approach for edgelist creation. True will create a valid compartments table with a c_Source column.

Returns:

compartments_df – compartments dataframe

Return type:

pd.DataFrame

napistu.sbml_dfs_utils.unnest_identifiers(id_table: DataFrame, id_var: str) -> DataFrame

Unnest Identifiers

Take a pd.DataFrame containing an array of Identifiers and return one-row per identifier.

Parameters:
  • id_table (pd.DataFrame) – Table containing Identifiers objects

  • id_var (str) – Column name containing Identifiers objects

Returns:

DataFrame with one row per identifier, MultiIndex with original index + entry

Return type:

pd.DataFrame
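The one-row-per-identifier reshaping can be sketched with plain pandas. Here lists of dicts stand in for napistu Identifiers objects, and the column name s_Identifiers, the index name s_id, and the identifier values are illustrative assumptions.

```python
import pandas as pd

# Assumption: plain lists of dicts stand in for napistu Identifiers objects.
id_table = pd.DataFrame(
    {
        "s_Identifiers": [
            [
                {"ontology": "chebi", "identifier": "17234"},
                {"ontology": "kegg", "identifier": "C00031"},
            ],
            [{"ontology": "uniprot", "identifier": "P12345"}],
        ]
    },
    index=pd.Index(["S1", "S2"], name="s_id"),
)

# Emit one record per identifier, tracking its position within the original row.
rows = []
for key, identifiers in id_table["s_Identifiers"].items():
    for entry, identifier in enumerate(identifiers):
        rows.append({"s_id": key, "entry": entry, **identifier})

# MultiIndex of original index + entry, mirroring the documented return value.
unnested = pd.DataFrame(rows).set_index(["s_id", "entry"])
```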

napistu.sbml_dfs_utils.validate_sbml_dfs_table(table_data: DataFrame, table_name: str) -> None

Validate a standalone table against the SBML_dfs schema.

This function validates a table against the schema defined in SBML_DFS_SCHEMA, without requiring an SBML_dfs object. Useful for validating tables before creating an SBML_dfs object.

Parameters:
  • table_data (pd.DataFrame) – The table to validate

  • table_name (str) – Name of the table in the SBML_dfs schema

Raises:

ValueError – If table_name is not in the schema or validation fails