napistu.sbml_dfs_utils
Utilities supporting creation and manipulation of SBML_dfs instances.
Public Functions
- add_missing_ids_column(contingency_table, reference_table, other_column_name="other") -> pd.DataFrame:
Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.
- add_sbo_role(reaction_species) -> pd.DataFrame:
Add an sbo_role column to the reaction_species table.
- check_entity_data_index_matching(sbml_dfs, table) -> sbml_dfs:
Update the index of each DataFrame in the input sbml_dfs’s entity_data (dict) with match_entitydata_index_to_entity so that it matches the index of the corresponding entity, then confirm that sbml_dfs.validate() passes.
- construct_formula_string(reaction_species_df, reactions_df, name_var) -> str:
Convert a table of reaction species into a formula string.
- create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col=SBML_DFS.R_ID, c_name_col=SBML_DFS.C_NAME) -> pd.Series:
Create a pd.Series of reaction formula strings.
- display_post_consensus_checks(checks_results) -> None:
Display post-consensus checks results.
- find_underspecified_reactions(reaction_species_w_roles) -> pd.DataFrame:
Find underspecified reactions in a reaction_species table.
- find_unused_entities(sbml_dfs_or_dict) -> dict[str, set[str]]:
Find unused entities in a SBML_dfs or dict of SBML_dfs instances.
- filter_to_characteristic_species_ids(species_ids, max_complex_size=4, max_promiscuity=20, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> pd.DataFrame:
Filter to characteristic species IDs.
- force_edgelist_consistency(interaction_edgelist, species_df, compartments_df) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
Force edgelist consistency.
- format_sbml_dfs_summary(data) -> str:
Format a summary of a SBML_dfs instance.
- get_current_max_id(sbml_dfs_table) -> int:
Get the current maximum ID for a given SBML_dfs table.
- id_formatter(id_values, id_type, id_len=8) -> list[str]:
Format an iterable of input values as a list of identifiers.
- id_formatter_inv(ids) -> list:
Invert the id_formatter function.
- match_entitydata_index_to_entity(entity_data_dict, an_entity_data_type, consensus_entity_df, entity_schema, table) -> pd.DataFrame:
Match the index of an entity data dictionary to the index of a consensus entity DataFrame.
- species_type_types(x, ontology_to_species_type=ONTOLOGY_TO_SPECIES_TYPE, prioritized_species_types=PRIORITIZED_SPECIES_TYPES) -> str:
Determine the species type of a given entity.
- stub_compartments(stubbed_compartment=GENERIC_COMPARTMENT, with_source=False) -> pd.DataFrame:
Create a compartments table with only a single compartment.
- unnest_identifiers(id_table, id_var) -> pd.DataFrame:
Unnest identifiers from a table.
- validate_sbml_dfs_table(table_data, table_name) -> None:
Validate that a DataFrame is a valid SBML_dfs table.
Functions
- napistu.sbml_dfs_utils._add_edgelist_defaults(interaction_edgelist: DataFrame, edgelist_defaults: dict[str, Any] | None = {'compartment_downstream': 'cellular_component', 'compartment_upstream': 'cellular_component', 'r_isreversible': False, 'sbo_term_name_downstream': 'modified', 'sbo_term_name_upstream': 'modifier', 'stoichiometry_downstream': 0, 'stoichiometry_upstream': 0}) -> DataFrame
Add default values to the interaction edgelist
- Parameters:
interaction_edgelist (pd.DataFrame) – The interaction edgelist to add defaults to
edgelist_defaults (dict[str, Any]) – The defaults to add to the interaction edgelist
- Return type:
pd.DataFrame
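The effect of filling in defaults can be sketched in plain Python. This is an illustration rather than napistu's implementation: it assumes defaults are applied only to columns that are absent, and it represents the edgelist as a list of dicts instead of a pd.DataFrame. The helper name `add_defaults` and the `upstream_name`/`downstream_name` keys are hypothetical; the default values are copied from the signature above.

```python
# Defaults copied from the documented signature of _add_edgelist_defaults.
EDGELIST_DEFAULTS = {
    "compartment_upstream": "cellular_component",
    "compartment_downstream": "cellular_component",
    "sbo_term_name_upstream": "modifier",
    "sbo_term_name_downstream": "modified",
    "stoichiometry_upstream": 0,
    "stoichiometry_downstream": 0,
    "r_isreversible": False,
}


def add_defaults(rows, defaults=EDGELIST_DEFAULTS):
    """Hypothetical helper: fill keys missing from each row with defaults."""
    if defaults is None:
        return [dict(row) for row in rows]
    # Values already present in a row take precedence over the defaults.
    return [{**defaults, **row} for row in rows]


# Assumed column names; only r_isreversible is supplied explicitly.
edges = [{"upstream_name": "A", "downstream_name": "B", "r_isreversible": True}]
filled = add_defaults(edges)
```

Values supplied by the caller win over the defaults, so an edgelist that already specifies some attributes only gains the missing ones.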
- napistu.sbml_dfs_utils._add_stoi_to_species_name(stoi: float | int, name: str) -> str
Add Stoi To Species Name
Add the number of molecules to a species name
Parameters:
- stoi: float or int
Number of molecules
- name: str
Name of species
Returns:
- name: str
Species name annotated with the number of molecules
- napistu.sbml_dfs_utils._dogmatic_to_defining_bqbs(dogmatic: bool = False) -> str
- napistu.sbml_dfs_utils._edgelist_create_compartmentalized_species(interaction_edgelist, species_df, compartments_df, interaction_source)
Create compartmentalized species from interactions.
- Parameters:
interaction_edgelist (pd.DataFrame) – Interaction data containing species-compartment combinations
species_df (pd.DataFrame) – Processed species data with IDs
compartments_df (pd.DataFrame) – Processed compartments data with IDs
interaction_source (source.Source) – Source object to assign to compartmentalized species
- Returns:
Compartmentalized species with formatted names and IDs
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils._edgelist_create_reactions_and_species(interaction_edgelist, comp_species, processed_species, processed_compartments, interaction_source, extra_reactions_columns)
Create reactions and reaction species from interactions.
- Parameters:
interaction_edgelist (pd.DataFrame) – Original interaction data
comp_species (pd.DataFrame) – Compartmentalized species with IDs
processed_species (pd.DataFrame) – Processed species data with IDs
processed_compartments (pd.DataFrame) – Processed compartments data with IDs
interaction_source (source.Source) – Source object for reactions
extra_reactions_columns (list) – Names of extra columns to preserve
- Returns:
(reactions_df, reaction_species_df, reactions_data)
- Return type:
tuple
- napistu.sbml_dfs_utils._edgelist_identify_extra_columns(interaction_edgelist, species_df, keep_reactions_data, keep_species_data)
Identify extra columns in input data that should be preserved.
- Parameters:
interaction_edgelist (pd.DataFrame) – Interaction data containing potential extra columns
species_df (pd.DataFrame) – Species data containing potential extra columns
keep_reactions_data (bool or str) – Whether to keep extra reaction columns
keep_species_data (bool or str) – Whether to keep extra species columns
- Returns:
Dictionary with ‘reactions’ and ‘species’ keys containing lists of extra column names
- Return type:
dict
- napistu.sbml_dfs_utils._edgelist_process_compartments(compartments_df, interaction_source)
Format compartments DataFrame with source and ID columns.
- Parameters:
compartments_df (pd.DataFrame) – Raw compartments data
interaction_source (source.Source) – Source object to assign to compartments
- Returns:
Processed compartments with IDs, indexed by compartment ID
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils._edgelist_process_species(species_df, interaction_source, extra_species_columns)
Format species DataFrame and extract extra data.
- Parameters:
species_df (pd.DataFrame) – Raw species data
interaction_source (source.Source) – Source object to assign to species
extra_species_columns (list) – Names of extra columns to preserve separately
- Returns:
Processed species DataFrame and species extra data DataFrame
- Return type:
tuple of pd.DataFrame
- napistu.sbml_dfs_utils._edgelist_validate_inputs(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) -> None
Validate input DataFrames have required columns.
- Parameters:
interaction_edgelist (pd.DataFrame) – Interaction data to validate
species_df (pd.DataFrame) – Species data to validate
compartments_df (pd.DataFrame) – Compartments data to validate
- napistu.sbml_dfs_utils._filter_promiscuous_components(bqb_has_parts_species: DataFrame, max_promiscuity: int) -> DataFrame
- napistu.sbml_dfs_utils._filter_to_pathways(df: DataFrame, pathways: list[str]) -> DataFrame
Filter a table to only include pathways in the list.
- napistu.sbml_dfs_utils._find_underspecified_reactions(reaction_species_w_roles: DataFrame) -> DataFrame
- napistu.sbml_dfs_utils._get_interaction_symbol(sbo_term_or_name: str) -> str
- napistu.sbml_dfs_utils._id_dict_to_df(ids)
- napistu.sbml_dfs_utils._name_interaction(upstream_name: str, downstream_name: str, sbo_term_upstream: str | None = 'interactor')
Name an interaction
- Parameters:
upstream_name (str) – The name of the upstream species
downstream_name (str) – The name of the downstream species
sbo_term_upstream (str, optional) – The SBO term of the upstream species. If not provided, the interaction will be named “interactor”
- Returns:
The name of the interaction
- Return type:
str
- napistu.sbml_dfs_utils._perform_sbml_dfs_table_validation(table_data: DataFrame, table_schema: dict, table_name: str) -> None
Core validation logic for SBML_dfs tables.
This function performs the actual validation checks for any table against its schema, regardless of whether it’s part of an SBML_dfs object or standalone.
Parameters:
- table_data: pd.DataFrame
The table data to validate
- table_schema: dict
Schema definition for the table
- table_name: str
Name of the table (for error messages)
Raises:
- ValueError
If the table does not conform to its schema. The checks cover: not a DataFrame, wrong index name, duplicate primary keys, missing required variables, and an empty table.
- napistu.sbml_dfs_utils._sbml_dfs_from_edgelist_check_cspecies_merge(merged_species: DataFrame, original_species: DataFrame) -> None
Check for a mismatch between the provided species data and species implied by the edgelist.
- napistu.sbml_dfs_utils._select_priority_pathway_sources(source_table: DataFrame, priority_pathways: list[str] | None = ['BiGG', 'Dogma', 'IDEA', 'IntAct', 'OmniPath', 'Reactome', 'Reactome-FI', 'STRING', 'TRRUST', 'Recon3D', 'iMM1415', 'iMM904']) -> DataFrame
Filter the source table to only include pathways in the list. If 0 or 1 priority pathways are found, return the source table.
- Parameters:
source_table (pd.DataFrame) – The source table to filter
priority_pathways (Optional[list[str]], default DEFAULT_PRIORITIZED_PATHWAYS) – The list of pathways to filter to. If None, returns source_table with no filtering or warning. If fewer than 2 pathways are found in the source table, returns the full source table with a warning.
- Returns:
The filtered source table. If priority_pathways is None, returns the original source_table. If fewer than 2 priority pathways are found, returns the full source_table with a warning.
- Return type:
pd.DataFrame
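The three documented outcomes (no filtering when priority_pathways is None, a fallback to the full table when fewer than 2 priority pathways are found, and filtering otherwise) can be sketched as follows. This is a plain-Python illustration, not the library's code: the function name `select_priority_sources` and the `pathway_id` key are assumptions, and rows are dicts rather than a pd.DataFrame.

```python
import warnings


def select_priority_sources(rows, priority_pathways=None):
    """Hypothetical stand-in for the documented filtering rules."""
    if priority_pathways is None:
        # No filtering and no warning.
        return rows
    kept = [row for row in rows if row["pathway_id"] in priority_pathways]
    found = {row["pathway_id"] for row in kept}
    if len(found) < 2:
        warnings.warn("Fewer than 2 priority pathways found; returning all rows.")
        return rows
    return kept


sources = [
    {"entity": "S1", "pathway_id": "Reactome"},
    {"entity": "S1", "pathway_id": "STRING"},
    {"entity": "S2", "pathway_id": "custom"},
]

# Two priority pathways are present, so the table is filtered.
filtered = select_priority_sources(sources, ["Reactome", "STRING"])

# Only one priority pathway is present, so the full table comes back.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unfiltered = select_priority_sources(sources, ["Reactome"])
```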
- napistu.sbml_dfs_utils._summarize_ontology_cooccurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False) -> DataFrame
Create a cooccurrence matrix of ontologies based on entities sharing the same ontology.
This can be used to identify ontologies which are associated with the same types of entities.
- Parameters:
df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
stratify_by_bqb (bool) – whether to stratify by bqb
allow_col_multindex (bool) – whether to allow a column MultiIndex
- Returns:
Square matrix with ontologies as both rows and columns
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils._summarize_ontology_occurrence(df: DataFrame, stratify_by_bqb: bool = True, allow_col_multindex: bool = False, binarize: bool = False) -> DataFrame
Summarize the types of identifiers associated with each entity.
- Parameters:
df (pd.DataFrame) – a table generated using sbml_dfs.get_identifiers or sbml_dfs.get_characteristic_species_ids
stratify_by_bqb (bool) – whether to stratify by bqb
allow_col_multindex (bool) – whether to allow a column MultiIndex
binarize (bool) – whether to convert the result to binary values (0 vs 1+)
- Returns:
a table with entities as rows and ontologies as columns
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils._summarize_source_cooccurrence(df: DataFrame) -> DataFrame
Create a cooccurrence matrix of pathways based on the presence of entities in pathways.
- Parameters:
df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
- Returns:
Square matrix with pathways as both rows and columns
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils._summarize_source_occurrence(df: DataFrame, binarize: bool = False) -> DataFrame
Summarize the occurrence of entities in pathways.
- Parameters:
df (pd.DataFrame) – a table generated using sbml_dfs.get_sources
binarize (bool) – whether to convert the result to binary values (0 vs 1+)
- Returns:
a table with entities as rows and pathways as columns
- Return type:
pd.DataFrame
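The relationship between an occurrence table and a cooccurrence matrix can be illustrated with pandas. This is a sketch of the general technique, not the library's implementation: the input column names `s_id` and `pathway_id` are assumptions about the shape of the table, and the cooccurrence step uses a standard matrix product of the binarized occurrence table.

```python
import pandas as pd

# Hypothetical long-format table: one row per (entity, pathway) membership.
sources = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S2", "S3"],
        "pathway_id": ["Reactome", "STRING", "Reactome", "Reactome", "STRING"],
    }
)

# Occurrence: entities as rows, pathways as columns, counts as values.
occurrence = pd.crosstab(sources["s_id"], sources["pathway_id"])

# Binarize: 0 stays 0, any count of 1 or more becomes 1.
binary = (occurrence > 0).astype(int)

# Cooccurrence: square pathway-by-pathway matrix counting shared entities.
cooccurrence = binary.T.dot(binary)
```

The diagonal of the cooccurrence matrix holds the number of entities in each pathway, and the off-diagonal entries count entities shared by a pair of pathways.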
- napistu.sbml_dfs_utils._validate_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame, raise_on_missing: bool = True) -> None
Check for missing entity references, optionally raising or warning.
This function is used to validate the consistency of the interaction edgelist, species_df, and compartments_df.
- Parameters:
interaction_edgelist (pd.DataFrame) – The interaction edgelist to validate
species_df (pd.DataFrame) – The species dataframe to validate
compartments_df (pd.DataFrame) – The compartments dataframe to validate
raise_on_missing (bool, optional) – Whether to raise an error if missing entities are found
- Return type:
None
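A consistency check of this kind amounts to set differences between the entities an edgelist references and the entities the tables define. The sketch below is plain Python under stated assumptions: the key names (`upstream_name`, `compartment_upstream`, and so on) and the helper `check_edgelist` are hypothetical, and unlike the documented function, which returns None, this helper returns the missing sets so the result can be inspected.

```python
import warnings


def check_edgelist(edges, species_names, compartment_names, raise_on_missing=True):
    """Hypothetical sketch: report entities the edgelist uses but the tables lack."""
    used_species = {e[k] for e in edges for k in ("upstream_name", "downstream_name")}
    used_compartments = {
        e[k] for e in edges for k in ("compartment_upstream", "compartment_downstream")
    }
    missing = {
        "species": used_species - set(species_names),
        "compartments": used_compartments - set(compartment_names),
    }
    if any(missing.values()):
        message = f"Edgelist references undefined entities: {missing}"
        if raise_on_missing:
            raise ValueError(message)
        warnings.warn(message)
    return missing


edges = [
    {
        "upstream_name": "A",
        "downstream_name": "B",
        "compartment_upstream": "cytosol",
        "compartment_downstream": "cytosol",
    }
]

# Species "B" is referenced by the edgelist but not defined; warn, don't raise.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    report = check_edgelist(edges, ["A"], ["cytosol"], raise_on_missing=False)
```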
- napistu.sbml_dfs_utils._validate_matching_data(data_table: DataFrame, ref_table: DataFrame)
Validate a table against a reference.
This checks that the table has the same index name as the reference, that its index contains no duplicates, and that all values in its index are present in the reference table.
- Parameters:
data_table (pd.DataFrame) – a table with data that should match the reference
ref_table (pd.DataFrame) – a reference table
- Raises:
ValueError – not same index name
ValueError – index contains duplicates
ValueError – index not subset of index of reactions table
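The three checks listed above map directly onto pandas index operations. The following is a minimal sketch, assuming the checks run in the listed order; the function name `validate_matching`, the error messages, and the example tables are illustrative rather than taken from the library.

```python
import pandas as pd


def validate_matching(data_table, ref_table):
    """Hypothetical re-implementation of the three documented checks."""
    if data_table.index.name != ref_table.index.name:
        raise ValueError("index names differ")
    if data_table.index.duplicated().any():
        raise ValueError("index contains duplicates")
    if not data_table.index.isin(ref_table.index).all():
        raise ValueError("index is not a subset of the reference index")


ref = pd.DataFrame(
    {"r_name": ["rxn1", "rxn2"]}, index=pd.Index(["R1", "R2"], name="r_id")
)
ok = pd.DataFrame({"score": [0.5]}, index=pd.Index(["R1"], name="r_id"))
bad = pd.DataFrame({"score": [0.1]}, index=pd.Index(["R9"], name="r_id"))

validate_matching(ok, ref)  # passes silently

try:
    validate_matching(bad, ref)
    error_message = None
except ValueError as err:
    error_message = str(err)
```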
- napistu.sbml_dfs_utils._validate_non_null_values(df: DataFrame, expected_columns: set, table_name: str) -> None
Validate that all required columns in a DataFrame have non-null values.
- Parameters:
df (pd.DataFrame) – The DataFrame to validate
expected_columns (set) – Set of column names that should have non-null values
table_name (str) – Name of the table for error messages
- Raises:
ValueError – If any required column contains null values
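A null check over a set of required columns might look like the sketch below. It is an assumption-laden illustration: the helper name `check_non_null`, the error message format, and the choice to skip expected columns that are absent from the table are not taken from the library.

```python
import pandas as pd


def check_non_null(df, expected_columns, table_name):
    """Hypothetical sketch: raise if any expected column contains nulls."""
    # Sort for a deterministic error message; skip columns not in the table.
    present = [col for col in sorted(expected_columns) if col in df.columns]
    null_counts = df[present].isnull().sum()
    offending = null_counts[null_counts > 0]
    if len(offending) > 0:
        raise ValueError(f"{table_name} has null values in: {list(offending.index)}")


species = pd.DataFrame({"s_name": ["ATP", None], "s_id": ["S1", "S2"]})

try:
    check_non_null(species, {"s_name", "s_id"}, "species")
    message = None
except ValueError as err:
    message = str(err)
```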
- napistu.sbml_dfs_utils._validate_sbo_values(sbo_series: Series, validate: str = 'names') -> None
Validate SBO terms or names
- Parameters:
sbo_series (pd.Series) – The SBO terms or names to validate.
validate (str, optional) – Whether the values are SBO terms (“terms”) or names (“names”, default).
- Return type:
None
- Raises:
ValueError – If the validation type is invalid.
TypeError – If the invalid_counts is not a pandas DataFrame.
ValueError – If some reaction species have unusable SBO terms.
- napistu.sbml_dfs_utils.add_missing_ids_column(contingency_table: DataFrame, reference_table: DataFrame, other_column_name: str = 'other') -> DataFrame
Add an ‘other’ column to a contingency table for IDs that exist in a reference table but are missing from the contingency table.
Parameters:
- contingency_table: pd.DataFrame
The contingency table with binary values (subset of IDs)
- reference_table: pd.DataFrame
The reference table containing all possible IDs
- other_column_name: str, optional
Name for the ‘other’ column, by default “other”
Returns:
- pd.DataFrame
Updated contingency table with ‘other’ column(s) added if there are missing IDs. If no IDs are missing, returns a copy of the original contingency table without adding an ‘other’ column.
Raises:
- ValueError
If the index names don’t match between the two tables
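The documented behavior can be approximated with a few pandas operations. This sketch handles flat (non-MultiIndex) columns only and is not the library's code; the function name `add_other_column` and the example tables are hypothetical.

```python
import pandas as pd


def add_other_column(contingency_table, reference_table, other_column_name="other"):
    """Hypothetical sketch of the documented behavior for flat columns."""
    if contingency_table.index.name != reference_table.index.name:
        raise ValueError("index names do not match")
    missing = reference_table.index.difference(contingency_table.index)
    result = contingency_table.copy()
    if len(missing) == 0:
        # Nothing is missing: return a copy without an 'other' column.
        return result
    # Existing IDs are not in 'other'; missing IDs are only in 'other'.
    result[other_column_name] = 0
    extra = pd.DataFrame(0, index=missing, columns=result.columns)
    extra[other_column_name] = 1
    return pd.concat([result, extra])


contingency = pd.DataFrame(
    {"Reactome": [1, 0], "STRING": [1, 1]},
    index=pd.Index(["S1", "S2"], name="s_id"),
)
reference = pd.DataFrame(
    {"s_name": ["ATP", "ADP", "AMP"]},
    index=pd.Index(["S1", "S2", "S3"], name="s_id"),
)
updated = add_other_column(contingency, reference)
```

Here `S3` appears in the reference table only, so it becomes a new row whose single nonzero entry is in the `other` column.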
- napistu.sbml_dfs_utils.add_sbo_role(reaction_species: DataFrame) -> DataFrame
Add an sbo_role column to the reaction_species table.
The sbo_role column is a string column containing the SBO role of each reaction species.
It is derived by mapping the sbo_term column through the SBO_NAME_TO_ROLE dictionary.
- napistu.sbml_dfs_utils.check_entity_data_index_matching(sbml_dfs, table)
Update the index of each DataFrame in the input sbml_dfs’s entity_data (dict) with match_entitydata_index_to_entity so that it matches the index of the corresponding entity, then confirm that sbml_dfs.validate() passes.
- Parameters:
sbml_dfs (cpr.SBML_dfs) – a cpr.SBML_dfs
table (str) – table whose data is being consolidated (currently species or reactions)
- Returns:
sbml_dfs (cpr.SBML_dfs) – sbml_dfs whose entity_data is checked to have the same index as the corresponding entity.
- napistu.sbml_dfs_utils.construct_formula_string(reaction_species_df: DataFrame, reactions_df: DataFrame, name_var: str) -> str
Construct Formula String
Convert a table of reaction species into a formula string
Parameters:
- reaction_species_df: pd.DataFrame
Table containing a reaction’s species
- reactions_df: pd.DataFrame
sbml.reactions
- name_var: str
Name used to label species
Returns:
- formula_str: str
String representation of a reaction’s substrates, products and modifiers
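To make the idea of a formula string concrete, here is a plain-Python sketch. Every formatting detail in it is an assumption: the sign convention (negative stoichiometry for substrates, positive for products, zero for modifiers), the arrows, and the modifier separator are illustrative choices, and the helper `formula_string` is hypothetical rather than the library's function.

```python
def formula_string(species_rows, is_reversible=False):
    """Hypothetical sketch: rows are dicts with 'name' and 'stoichiometry'."""

    def label(row):
        count = abs(row["stoichiometry"])
        # Only show the count when more than one molecule participates.
        return row["name"] if count in (0, 1) else f"{count} {row['name']}"

    substrates = [label(r) for r in species_rows if r["stoichiometry"] < 0]
    products = [label(r) for r in species_rows if r["stoichiometry"] > 0]
    modifiers = [label(r) for r in species_rows if r["stoichiometry"] == 0]

    arrow = " <-> " if is_reversible else " -> "
    formula = " + ".join(substrates) + arrow + " + ".join(products)
    if modifiers:
        formula += " ---- modifiers: " + ", ".join(modifiers)
    return formula


rows = [
    {"name": "glucose", "stoichiometry": -1},
    {"name": "ATP", "stoichiometry": -1},
    {"name": "glucose-6-phosphate", "stoichiometry": 1},
    {"name": "ADP", "stoichiometry": 1},
    {"name": "hexokinase", "stoichiometry": 0},
]
formula = formula_string(rows)
```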
- napistu.sbml_dfs_utils.create_reaction_formula_series(reaction_data, reactions_df, species_name_col, sort_cols, group_cols=None, add_compartment_prefix=False, r_id_col='r_id', c_name_col='c_name')
Helper function to create reaction formula series.
Parameters:
- reaction_data: pd.DataFrame
The reaction species data to process
- reactions_df: pd.DataFrame
The reactions dataframe needed by construct_formula_string
- species_name_col: str
Column name to use for species names in formulas
- sort_cols: list
Columns to sort by before grouping
- group_cols: list, optional
Columns to group by. If None, uses [r_id_col]
- add_compartment_prefix: bool
Whether to add compartment name as prefix to formula
- r_id_col: str
Column name for reaction ID
- c_name_col: str
Column name for compartment name (used when add_compartment_prefix=True)
Returns:
pd.Series or None : Formula strings indexed by reaction ID, or None if no data
- napistu.sbml_dfs_utils.display(obj)
- napistu.sbml_dfs_utils.display_post_consensus_checks(checks_results: dict) -> None
Display the results of post_consensus_checks in a formatted way.
This function takes the results from the post_consensus_checks method and displays them using the same formatting as shown in the sandbox notebook.
- Parameters:
checks_results (dict) – Dictionary returned by the post_consensus_checks method, containing nested dictionaries with entity types and check types as keys, and DataFrames as values.
- Returns:
This function displays results but doesn’t return anything.
- Return type:
None
- napistu.sbml_dfs_utils.filter_to_characteristic_species_ids(species_ids: DataFrame, max_complex_size: int = 4, max_promiscuity: int = 20, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) -> DataFrame
Filter to Characteristic Species IDs
Remove identifiers corresponding to one component within large protein complexes and non-characteristic annotations such as PubMed references and homologues.
Parameters:
- species_ids: pd.DataFrame
A table of identifiers produced by sbml_dfs.get_identifiers("species")
- max_complex_size: int
The largest size of a complex for which BQB_HAS_PART terms will be retained. In most cases, complexes are handled with specific formation and dissolution reactions, but these identifiers will be pulled in when searching by identifiers or when searching the identifiers associated with a species against an external resource such as Open Targets.
- max_promiscuity: int
Maximum number of species where a single molecule can act as a BQB_HAS_PART component associated with a single identifier (and common ontology).
- defining_biological_qualifiers: list[str]
BQB codes which define distinct entities. Narrowly this would be BQB_IS, while more permissive settings would include homologs and different forms of the same gene.
Returns:
- species_id: pd.DataFrame
Input species filtered to characteristic identifiers
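The interplay of the three parameters can be sketched in plain Python. This is an interpretation of the parameter descriptions, not the library's algorithm: rows are dicts with assumed keys `s_id`, `ontology`, `identifier`, and `bqb`; complex size is taken to be the number of BQB_HAS_PART identifiers on a species; and both thresholds are computed on the unfiltered input. The helper name `characteristic_ids` is hypothetical.

```python
from collections import Counter


def characteristic_ids(
    rows,
    max_complex_size=4,
    max_promiscuity=20,
    defining=("BQB_IS", "BQB_IS_HOMOLOG_TO"),
):
    """Hypothetical sketch over rows with s_id, ontology, identifier, bqb keys."""
    parts = [r for r in rows if r["bqb"] == "BQB_HAS_PART"]
    # Complex size: number of BQB_HAS_PART identifiers attached to a species.
    complex_size = Counter(r["s_id"] for r in parts)
    # Promiscuity: number of distinct species sharing one component identifier.
    promiscuity = Counter(
        (ontology, identifier)
        for ontology, identifier, _ in {
            (r["ontology"], r["identifier"], r["s_id"]) for r in parts
        }
    )

    kept = []
    for r in rows:
        if r["bqb"] in defining:
            kept.append(r)
        elif r["bqb"] == "BQB_HAS_PART":
            small = complex_size[r["s_id"]] <= max_complex_size
            specific = promiscuity[(r["ontology"], r["identifier"])] <= max_promiscuity
            if small and specific:
                kept.append(r)
        # Any other qualifier (for example a literature reference) is dropped.
    return kept


rows = [
    {"s_id": "S1", "ontology": "uniprot", "identifier": "P1", "bqb": "BQB_IS"},
    {"s_id": "S1", "ontology": "pubmed", "identifier": "123", "bqb": "BQB_IS_DESCRIBED_BY"},
    {"s_id": "S2", "ontology": "uniprot", "identifier": "P1", "bqb": "BQB_HAS_PART"},
    {"s_id": "S2", "ontology": "uniprot", "identifier": "P2", "bqb": "BQB_HAS_PART"},
    {"s_id": "S3", "ontology": "uniprot", "identifier": "P1", "bqb": "BQB_HAS_PART"},
    {"s_id": "S3", "ontology": "uniprot", "identifier": "P2", "bqb": "BQB_HAS_PART"},
    {"s_id": "S3", "ontology": "uniprot", "identifier": "P3", "bqb": "BQB_HAS_PART"},
]

# With a size limit of 2, the three-member complex S3 loses its components.
kept = characteristic_ids(rows, max_complex_size=2)
```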
- napistu.sbml_dfs_utils.find_underspecified_reactions(reaction_species_w_roles: DataFrame) -> DataFrame
- napistu.sbml_dfs_utils.find_unused_entities(sbml_dfs_or_dict: SBML_dfs | dict[str, pd.DataFrame]) -> dict[str, set[str]]
- napistu.sbml_dfs_utils.force_edgelist_consistency(interaction_edgelist: DataFrame, species_df: DataFrame, compartments_df: DataFrame) -> tuple[DataFrame, DataFrame, DataFrame]
Force the edgelist to be consistent with the species and compartments dataframes.
- Parameters:
interaction_edgelist (pd.DataFrame) – The interaction edgelist to force consistency with
species_df (pd.DataFrame) – The species dataframe to force consistency with
compartments_df (pd.DataFrame) – The compartments dataframe to force consistency with
- Returns:
A tuple containing the filtered interaction edgelist, species dataframe, and compartments dataframe
- Return type:
tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
- napistu.sbml_dfs_utils.format_sbml_dfs_summary(data)
Format model data into a clean summary table for Jupyter display
- napistu.sbml_dfs_utils.get_current_max_id(sbml_dfs_table: DataFrame) -> int
Get Current Max ID
Look at a table from an SBML_dfs object and find the largest primary key following the default naming convention for the table.
Parameters:
- sbml_dfs_table: pd.DataFrame
A table derived from an SBML_dfs object.
Returns:
- current_max_id: int
The largest ID already defined in the table using its expected naming convention. If no IDs following this convention are present, the default is -1, so that new IDs are added starting with 0.
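The -1 default is what lets the next ID be computed as the current maximum plus one even for a table with no conforming IDs. A minimal sketch, assuming the naming convention is a letter prefix followed by a zero-padded integer (the prefix, the padding length, and the helper name `current_max_id` are all assumptions):

```python
import re


def current_max_id(ids, prefix="S", id_len=8):
    """Hypothetical sketch: largest integer among IDs matching the convention."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{{id_len}}})$")
    matches = [pattern.match(an_id) for an_id in ids]
    numbers = [int(m.group(1)) for m in matches if m]
    # With no conforming IDs, -1 means the next new ID starts at 0.
    return max(numbers, default=-1)


existing = ["S00000003", "S00000010", "custom_species"]
next_id = current_max_id(existing) + 1
```

IDs that do not follow the convention, such as `custom_species` here, are simply ignored when computing the maximum.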
- napistu.sbml_dfs_utils.id_formatter(id_values: Iterable[Any], id_type: str, id_len: int = 8) -> list[str]
- napistu.sbml_dfs_utils.id_formatter_inv(ids: list[str]) -> list[int]
ID Formatter Inverter
Convert from internal IDs back to integer IDs
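The round trip between integers and internal IDs can be illustrated as below. This page does not specify how id_type maps to an ID prefix, so the sketch takes the prefix directly; the helper names `format_ids` and `invert_ids` and the prefix-plus-zero-padded-integer layout are assumptions.

```python
import re


def format_ids(id_values, prefix, id_len=8):
    """Hypothetical sketch: prefix followed by a zero-padded integer."""
    return [f"{prefix}{value:0{id_len}d}" for value in id_values]


def invert_ids(ids):
    """Hypothetical inverse: recover the trailing integer from each ID."""
    return [int(re.search(r"(\d+)$", an_id).group(1)) for an_id in ids]


ids = format_ids([0, 1, 42], "S")
recovered = invert_ids(ids)
```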
- napistu.sbml_dfs_utils.match_entitydata_index_to_entity(entity_data_dict: dict, an_entity_data_type: str, consensus_entity_df: DataFrame, entity_schema: dict, table: str) -> DataFrame
Match the index of entity_data_dict[an_entity_data_type] with the index of the corresponding entity. Update the index of entity_data_dict[an_entity_data_type] to match consensus_entity_df’s index, and report cases where entity_data has indices not in the corresponding entity’s index.
- Parameters:
entity_data_dict (dict) – dictionary containing all of the model’s “an_entity_data_type” dictionaries
an_entity_data_type (str) – data_type from species/reactions_data in entity_data_dict
consensus_entity_df (pd.DataFrame) – the dataframe of the corresponding entity
entity_schema (dict) – schema for “table”
table (str) – table whose data is being consolidated (currently species or reactions)
- Returns:
entity_data_df (pd.DataFrame) – table for entity_data_dict[an_entity_data_type]
- napistu.sbml_dfs_utils.species_type_types(x, ontology_to_species_type: dict = {'bigg_metabolite': 'metabolite', 'chebi': 'metabolite', 'corum': 'complex', 'drugbank': 'drug', 'ensembl_gene': 'protein', 'ensembl_protein': 'protein', 'ensembl_transcript': 'protein', 'gene_name': 'protein', 'kegg': 'metabolite', 'kegg.drug': 'drug', 'mirbase': 'regulatory_rna', 'ncbi_entrez_gene': 'protein', 'pubchem': 'metabolite', 'rnacentral': 'regulatory_rna', 'smiles': 'metabolite', 'symbol': 'protein', 'uniprot': 'protein'}, prioritized_species_types: set[str] = {'complex', 'drug'}) -> str
Assign a high-level molecule type to a molecular species
- Parameters:
x (Identifiers) – The identifiers object to assign a species type to
ontology_to_species_type (dict) – The mapping of ontologies to species types
prioritized_species_types (set[str]) – The set of prioritized species types
- Returns:
The high-level molecule type of the species
- Return type:
str
Examples
>>> identifiers = Identifiers([{'ontology': 'CHEBI', 'identifier': '123456', 'bqb': 'BQB.IS'}])
>>> species_type_types(identifiers)
'metabolite'
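How a prioritized set might interact with the ontology mapping is sketched below in plain Python. The mapping is a subset of the default shown in the signature; the resolution rules and the fallback labels `"unknown"` and `"other"` are assumptions for illustration, and the helper `species_type` takes a list of ontology names rather than an Identifiers object.

```python
# Subset of the default ontology_to_species_type mapping from the signature.
ONTOLOGY_TO_SPECIES_TYPE = {
    "chebi": "metabolite",
    "corum": "complex",
    "drugbank": "drug",
    "mirbase": "regulatory_rna",
    "uniprot": "protein",
}
PRIORITIZED_SPECIES_TYPES = {"complex", "drug"}


def species_type(
    ontologies,
    mapping=ONTOLOGY_TO_SPECIES_TYPE,
    prioritized=PRIORITIZED_SPECIES_TYPES,
):
    """Hypothetical sketch: resolve a set of ontologies to one species type."""
    types = {mapping[o] for o in ontologies if o in mapping}
    if not types:
        return "unknown"  # assumed label for unmapped ontologies
    hits = types & prioritized
    if len(hits) == 1:
        # A single prioritized type wins over any other types present.
        return next(iter(hits))
    if len(types) == 1:
        return next(iter(types))
    return "other"  # assumed label for ambiguous mixtures


drug_like = species_type(["uniprot", "drugbank"])
mixed = species_type(["uniprot", "chebi"])
```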
- napistu.sbml_dfs_utils.stub_compartments(stubbed_compartment: str = 'cellular_component', with_source: bool = False) -> DataFrame
Stub Compartments
Create a compartments table with only a single compartment
- Parameters:
stubbed_compartment (str) – the name of a compartment which should match the keys in ingestion.constants.VALID_COMPARTMENTS and ingestion.constants.COMPARTMENTS_GO_TERMS
with_source (bool) – whether to include a source column in the compartments dataframe. Defaults to False which is the standard approach for edgelist creation. True will create a valid compartments table with a c_Source column.
- Returns:
compartments_df – compartments dataframe
- Return type:
pd.DataFrame
- napistu.sbml_dfs_utils.unnest_identifiers(id_table: DataFrame, id_var: str) -> DataFrame
Unnest Identifiers
Take a pd.DataFrame containing an array of Identifiers and return one-row per identifier.
- Parameters:
id_table (pd.DataFrame) – Table containing Identifiers objects
id_var (str) – Column name containing Identifiers objects
- Returns:
DataFrame with one row per identifier, MultiIndex with original index + entry
- Return type:
pd.DataFrame
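The shape of the unnested result can be shown without pandas. In this sketch, a dict keyed by (original index, entry) tuples stands in for the MultiIndex described above, and lists of dicts stand in for Identifiers objects; the helper name `unnest` and the example data are hypothetical.

```python
def unnest(id_lists):
    """Hypothetical sketch: one output row per identifier.

    id_lists maps an original index value to a list of identifier dicts.
    """
    rows = {}
    for key, identifiers in id_lists.items():
        for entry, identifier in enumerate(identifiers):
            # (key, entry) mirrors a MultiIndex of original index + entry.
            rows[(key, entry)] = identifier
    return rows


species_ids = {
    "S1": [
        {"ontology": "uniprot", "identifier": "P1"},
        {"ontology": "chebi", "identifier": "123456"},
    ],
    "S2": [{"ontology": "uniprot", "identifier": "P2"}],
}
unnested = unnest(species_ids)
```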
- napistu.sbml_dfs_utils.validate_sbml_dfs_table(table_data: DataFrame, table_name: str) -> None
Validate a standalone table against the SBML_dfs schema.
This function validates a table against the schema defined in SBML_DFS_SCHEMA, without requiring an SBML_dfs object. Useful for validating tables before creating an SBML_dfs object.
- Parameters:
table_data (pd.DataFrame) – The table to validate
table_name (str) – Name of the table in the SBML_dfs schema
- Raises:
ValueError – If table_name is not in the schema or validation fails