napistu.consensus

Creating a consensus model by merging shared entities across pathway models.

Public Functions

construct_consensus_model(sbml_dfs_dict, pw_index, model_source=None, dogmatic=True, check_mergeability=True, no_rxn_pathway_ids=None) -> SBML_dfs:: Construct a Consensus Model by merging shared entities across pathway models.
construct_meta_entities_fk(sbml_dfs_dict, pw_index, table="compartmentalized_species", fk_lookup_tables={}, extra_defining_attrs=[]) -> tuple[pd.DataFrame, pd.Series]:: Construct Meta Entities Defined by Foreign Keys
construct_meta_entities_identifiers(sbml_dfs_dict, pw_index, table, fk_lookup_tables={}, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> tuple[pd.DataFrame, pd.Series]:: Construct meta-entities by merging entities across models that share identifiers.
construct_meta_entities_members(sbml_dfs_dict, pw_index=None, table="reactions", defined_by="reaction_species", defined_lookup_tables={}, defining_attrs=[SC_ID, STOICHIOMETRY]) -> tuple[pd.DataFrame, pd.Series]:: Construct Meta Entities Defined by Membership
construct_sbml_dfs_dict(pw_index, strict=True, verbose=False) -> dict[str, SBML_dfs]:: Construct a dictionary of SBML_dfs objects from a pathway index.
prepare_consensus_model(sbml_dfs_list) -> tuple[dict[str, SBML_dfs], PWIndex]:: Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.

Functions

`construct_consensus_model`(sbml_dfs_dict, ...)	Construct a Consensus Model by merging shared entities across pathway models.
`construct_meta_entities_fk`(sbml_dfs_dict, ...)	Construct Meta Entities Defined by Foreign Keys
`construct_meta_entities_identifiers`(...[, ...])	Construct meta-entities by merging entities across models that share identifiers.
`construct_meta_entities_members`(...[, ...])	Construct Meta Entities Defined by Membership
`construct_sbml_dfs_dict`(pw_index[, strict, ...])	Construct a dictionary of SBML_dfs objects from a pathway index.
`prepare_consensus_model`(sbml_dfs_list)	Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.

napistu.consensus._add_consensus_sources(new_id_table: DataFrame, agg_table_harmonized: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) → DataFrame

Add source information to the consensus table.

Parameters:

new_id_table: pd.DataFrame: Consensus table without source information
agg_table_harmonized: pd.DataFrame: Original table with cluster assignments
lookup_table: pd.Series: Maps old IDs to new consensus IDs
table_schema: dict: Schema for the table
pw_index: PWIndex: An index of all tables being aggregated

Returns:

pd.DataFrame: Consensus table with source information added

napistu.consensus._add_entity_data(sbml_dfs: SBML_dfs, sbml_dfs_dict: dict[str, SBML_dfs], lookup_tables: dict) → SBML_dfs

Add entity data from component models to the consensus model.

Parameters:

sbml_dfs: SBML_dfs: The consensus model being built
sbml_dfs_dict: dict[str, SBML_dfs]: A dictionary of SBML_dfs from different models
lookup_tables: dict: Dictionary of lookup tables for translating between old and new entity IDs

Returns:

SBML_dfs: The updated consensus model

napistu.consensus._build_consensus_identifiers(sbml_df: DataFrame, table_schema: dict, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) → tuple[Series, DataFrame]

Build consensus identifiers by clustering entities that share biological identifiers.

This function takes a set of entities spanning multiple models and finds all unique entities by grouping them according to the provided biological qualifiers. It returns a mapping from original entities to clusters and a DataFrame of consensus identifier objects for each cluster.

Parameters:

sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).
table_schema (dict) – Schema for the table being processed.
defining_biological_qualifiers (list[str], optional) – List of biological qualifier types to use for grouping. Defaults to BQB_DEFINING_ATTRS.

Returns:

indexed_cluster (pd.Series) – Series mapping the index from sbml_df onto a set of clusters which define unique entities.
cluster_consensus_identifiers_df (pd.DataFrame) – DataFrame mapping clusters to consensus identifiers (Identifiers objects).
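The clustering step can be pictured as finding connected components in a graph that links entities to their identifiers. The sketch below is not the library's implementation; it is a minimal union-find over hypothetical (model, entity id) keys and identifier strings, meant only to illustrate how entities that share any identifier end up in the same cluster. The function name and input layout are assumptions.

```python
def cluster_by_shared_identifiers(entity_identifiers):
    """Group entities that share at least one identifier (illustrative only).

    entity_identifiers: dict mapping a hypothetical entity key, such as
    (model, entity_id), to a set of identifier strings.
    Returns a dict mapping each entity key to an integer cluster label.
    """
    parent = {}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Entities and identifiers are both graph nodes; tag them to avoid collisions.
    for entity, identifiers in entity_identifiers.items():
        entity_node = ("entity", entity)
        parent.setdefault(entity_node, entity_node)
        for identifier in identifiers:
            id_node = ("identifier", identifier)
            parent.setdefault(id_node, id_node)
            union(entity_node, id_node)

    # Assign consecutive cluster labels in order of first appearance.
    labels = {}
    clusters = {}
    for entity in entity_identifiers:
        root = find(("entity", entity))
        labels.setdefault(root, len(labels))
        clusters[entity] = labels[root]
    return clusters


# Hypothetical input: two models annotate the same species with a shared identifier.
example = {
    ("model1", "S1"): {"chebi:15377"},
    ("model2", "S9"): {"chebi:15377", "kegg:C00001"},
    ("model2", "S3"): {"chebi:17234"},
}
clusters = cluster_by_shared_identifiers(example)
```

In this toy input the first two entities share an identifier and fall into one cluster, while the third remains on its own.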

napistu.consensus._check_sbml_dfs(sbml_dfs: SBML_dfs, model_label: str, N_examples: int | str = 5) → None: Check SBML_dfs for identifiers which are associated with different entities before a merge.

napistu.consensus._check_sbml_dfs_dict(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, check_mergeability: bool = True) → None

Check models in SBML_dfs for problems which can be reported up-front

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models
pw_index (indices.PWIndex) – an index of all tables being aggregated
check_mergeability (bool, default=True) – whether to check for issues which will prevent merging across models

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

napistu.consensus._check_sbml_dfs_mergeability(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) → None

Check SBML_dfs for obvious issues which will prevent merging across models.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models
pw_index (indices.PWIndex) – an index of all tables being aggregated

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

napistu.consensus._create_cluster_identifiers(meta_identifiers: DataFrame, indexed_cluster: Series, sbml_df: DataFrame, ind_clusters: DataFrame, table_schema: dict) → DataFrame

Create identifier objects for each cluster.

Parameters:

meta_identifiers (pd.DataFrame) – All identifiers (including those filtered out by BQB)
indexed_cluster (pd.Series) – Maps entity indices to cluster IDs
sbml_df (pd.DataFrame) – Original table of entities
ind_clusters (pd.DataFrame) – Cluster assignments from graph algorithm
table_schema (dict) – Schema for the table, used to determine the correct identifier column name

Returns:

Table mapping clusters to their consensus identifiers, with the identifier column named according to the schema

Return type:

pd.DataFrame

napistu.consensus._create_consensus_entities(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, defining_biological_qualifiers: list[str], no_rxn_pathway_ids: list[str] | None = None) → tuple[dict, dict]

Create consensus entities for all primary tables in the model.

This helper function creates consensus compartments, species, compartmentalized species, reactions, and reaction species by finding shared entities across source models.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]: A dictionary of SBML_dfs from different models
pw_index: PWIndex: An index of all tables being aggregated
defining_biological_qualifiers: list[str]: Biological qualifier terms that define distinct entities
no_rxn_pathway_ids: list[str] or None, default None: The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.

Returns:

tuple:

dict of consensus entities tables
dict of lookup tables

napistu.consensus._create_consensus_entity_data(combined_entity_data: DataFrame, primary_key: str) → DataFrame

Create consensus entity data by combining multiple rows with the same index value.

This function takes a DataFrame that might have multiple rows for the same index value and combines them so there is exactly 1 row per index value using the “first” method.

Parameters:

combined_entity_data: pd.DataFrame: Input DataFrame with potentially multiple rows per index value
primary_key: str: The column name to use as the primary key for grouping

Returns:

pd.DataFrame: DataFrame with exactly one row per unique primary key value
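A minimal sketch of this kind of reduction, assuming the “first” method corresponds to pandas' GroupBy.first. The column names and values are hypothetical and this is not the function's actual body.

```python
import pandas as pd

# Hypothetical entity data: two source rows map to the same consensus id "S_1".
combined_entity_data = pd.DataFrame(
    {
        "s_id": ["S_1", "S_1", "S_2"],
        "score": [0.9, 0.4, 0.7],
        "label": [None, "water", "glucose"],
    }
)

# GroupBy.first() returns one row per key, taking the first non-null value
# of each column within the group.
consensus = combined_entity_data.groupby("s_id").first()
```

Because GroupBy.first skips nulls column by column, a consensus row can combine values from different source rows; here “S_1” keeps the first score and the only non-null label.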

napistu.consensus._create_consensus_sources(agg_tbl: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) → Series

Create Consensus Sources

Annotate the source of to-be-merged species with the models they came from, and combine with existing annotations.

Parameters:

agg_tbl: pd.DataFrame: A table containing existing Source objects and a many-1 “new_id” of their post-aggregation consensus entity
lookup_table: pd.Series: A series where the index are old identifiers and the values are post-aggregation new identifiers
table_schema: dict: Summary of the schema for the operant entity type
pw_index: PWIndex: An index of all tables being aggregated

Returns:

new_sources: pd.Series: Mapping where the index is new identifiers and values are aggregated Source objects

napistu.consensus._create_consensus_table(agg_primary_table: DataFrame, lookup_table: Series, updated_identifiers: Series, table_schema: dict) → DataFrame

Create a consensus table with merged entities.

Parameters:

agg_primary_table: pd.DataFrame: Table of entities
lookup_table: pd.Series: Lookup table mapping old IDs to new IDs
updated_identifiers: pd.Series: Series mapping new IDs to merged identifier objects
table_schema: dict: Schema for the table

Returns:

pd.DataFrame: Consensus table with one row per unique entity

napistu.consensus._create_default_consensus_source(sbml_dfs_dict: dict[str, SBML_dfs]) → Source

A default consensus source is created when no model source object is provided.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.

Returns:

A default consensus source object.

Return type:

Source

napistu.consensus._create_entity_consensus(membership_lookup: DataFrame, table_schema: dict) → tuple[DataFrame, Series]

Create consensus entities based on membership.

Parameters:

membership_lookup: pd.DataFrame: Table mapping entities to their member strings
table_schema: dict: Schema for the table

Returns:

tuple:

Consensus entities DataFrame
Lookup table mapping old IDs to new IDs

napistu.consensus._create_entity_lookup_table(agg_table_harmonized: DataFrame, table_schema: dict) → Series

Create a lookup table mapping original entity IDs to new consensus IDs.

Parameters:

agg_table_harmonized: pd.DataFrame: Table with cluster assignments for each entity
table_schema: dict: Schema for the table

Returns:

pd.Series: Lookup table mapping old entity IDs to new consensus IDs

napistu.consensus._create_member_string(x: list[str]) → str

napistu.consensus._create_membership_lookup(agg_tbl: DataFrame, table_schema: dict) → DataFrame

Create a lookup table for entity membership.

Parameters:

agg_tbl: pd.DataFrame: Table with member information
table_schema: dict: Schema for the table

Returns:

pd.DataFrame: Lookup table mapping entity IDs to member strings

napistu.consensus._create_vertex_category(df: DataFrame, category: str) → DataFrame: Create vertex dataframe for a specific category from a source column.

napistu.consensus._filter_identifiers_by_qualifier(meta_identifiers: DataFrame, defining_biological_qualifiers: list[str]) → DataFrame

Filter identifiers to only include those with specific biological qualifiers.

Parameters:

meta_identifiers: pd.DataFrame: Table of identifiers
defining_biological_qualifiers: list[str]: List of biological qualifier types to keep

Returns:

pd.DataFrame: Filtered identifiers
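The filter can be sketched with a boolean mask in pandas. The table layout, the “bqb” column name, and the example rows are assumptions made for illustration; only the two default qualifier values come from the signatures on this page.

```python
import pandas as pd

# Hypothetical identifiers table; the "bqb" column name is an assumption.
meta_identifiers = pd.DataFrame(
    {
        "ontology": ["chebi", "pubmed", "uniprot"],
        "identifier": ["15377", "12345", "P12345"],
        "bqb": ["BQB_IS", "BQB_IS_DESCRIBED_BY", "BQB_IS_HOMOLOG_TO"],
    }
)
defining_biological_qualifiers = ["BQB_IS", "BQB_IS_HOMOLOG_TO"]

# Keep only rows whose qualifier is one of the defining qualifiers.
valid_identifiers = meta_identifiers[
    meta_identifiers["bqb"].isin(defining_biological_qualifiers)
]
```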

napistu.consensus._get_no_rxn_pathway_ids(pw_index: PWIndex, no_rxn_pathway_ids: list[str] | None = None) → list[str]

Get the pathway ids for models which should not have reactions.

Parameters:

pw_index (pd.DataFrame) – The pathway index.
no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults.

Returns:

no_rxn_pathway_ids – The pathway ids for models which should not have reactions.

Return type:

list

napistu.consensus._handle_entries_without_identifiers(sbml_df: DataFrame, valid_identifiers: DataFrame) → DataFrame

Handle entities that don’t have identifiers by adding dummy identifiers.

Parameters:

sbml_df: pd.DataFrame: Original table of entities
valid_identifiers: pd.DataFrame: Table of identifiers that passed filtering

Returns:

pd.DataFrame: Valid identifiers with dummy entries added

napistu.consensus._merge_entity_data(sbml_dfs_dict: dict[str, SBML_dfs], lookup_table: Series, table: str) → dict

Merge Entity Data

Report cases where a single “new” id is associated with multiple different values of entity_var

Parameters:

sbml_dfs_dict (dict) – dictionary where keys are to-be-merged model names and values are SBML_dfs
lookup_table (pd.Series) – a series where the index is an old model and primary key and the value is the new consensus id
table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

entity_data – dictionary containing pd.DataFrames which aggregate all of the individual entity_data tables in “sbml_dfs_dict”

Return type:

dict

napistu.consensus._merge_entity_data_create_consensus(entity_data_dict: dict, lookup_table: Series, entity_schema: dict, an_entity_data_type: str, table: str) → DataFrame

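Merge Entity Data - Create Consensus

Combine the “an_entity_data_type” tables from all models into a single table keyed by the new consensus id, reporting cases where a single “new” id is associated with multiple different values of entity_var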

Parameters:

entity_data_dict: dict: Dictionary containing all models’ “an_entity_data_type” dictionaries
lookup_table: pd.Series: A series where the index is an old model and primary key and the value is the new consensus id
entity_schema: dict: Schema for “table”
an_entity_data_type: str: data_type from species/reactions_data in entity_data_dict
table: str: table whose data is being consolidated (currently species or reactions)

Returns:

pd.DataFrame: Table where index is primary key of “table” and values are all distinct annotations from “an_entity_data_type”.

napistu.consensus._merge_entity_data_report_mismatches(combined_entity_data: DataFrame, entity_schema: dict, an_entity_data_type: str, table: str) → None

Merge Entity Data - Report Mismatches

Report cases where a single “new” id is associated with multiple different values of entity_var

Parameters:

combined_entity_data (pd.DataFrame) – indexed by table primary key containing all data from “an_entity_data_type”
entity_schema (dict) – schema for “table”
an_entity_data_type (str) – data_type from species/reactions_data in combined_entity_data
table (str) – table whose data is being consolidated (currently species or reactions)

Return type:

None

napistu.consensus._merge_entity_identifiers(agg_primary_table: DataFrame, lookup_table: Series, table_schema: dict) → Series

Merge identifiers from multiple entities.

Parameters:

agg_primary_table: pd.DataFrame: Table of entities
lookup_table: pd.Series: Lookup table mapping old IDs to new IDs
table_schema: dict: Schema for the table

Returns:

pd.Series: Series mapping new IDs to merged identifier objects

napistu.consensus._pre_consensus_compartment_check(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) → None

Check for compartment compatibility across models before consensus building.

This function identifies models that won’t mix well in a consensus because they contain non-overlapping sets of compartments. It constructs a bipartite graph connecting models to their compartments and identifies disconnected components, which indicate incompatible compartment structures.

Parameters:

sbml_dfs_dict (dict) – Dictionary containing SBML dataframes for each model, keyed by model name.
pw_index (pandas.DataFrame) – Pathway index dataframe containing model metadata and pathway information.

Returns:

This function returns None but logs error messages if incompatible compartment structures are detected.

Return type:

None

Notes

The function builds a graph where:

- Models are connected to their compartments via shared identifiers
- Compartments are connected to their model-specific labels
- Disconnected components indicate models with non-overlapping compartment sets

If multiple disconnected components are found, an error is logged listing the incompatible compartment groups that would result in an unmixed consensus.

Examples

>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> pw_index = pd.DataFrame({"model": ["model1", "model2"], ...})
>>> _pre_consensus_compartment_check(sbml_dfs_dict, pw_index)
# Logs error if models have incompatible compartment structures

napistu.consensus._pre_consensus_ontology_check(sbml_dfs_dict: dict[str, SBML_dfs], entity_type: str) → None

Check for ontology compatibility across models before consensus building.

This function determines whether any models possess disjoint sets of ontologies for a given entity type (for example, compartments or species). It constructs a bipartite graph connecting models to their ontologies and identifies disconnected components, which indicate models with non-overlapping ontology structures.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary containing SBML dataframes for each model, keyed by model name.
entity_type (str) – The type of entity to check ontologies for. Must be one of ‘compartments’, ‘species’, or ‘reactions’.

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

Notes

The function builds a graph where:

- Models are connected to ontologies they contain for the specified entity type
- Disconnected components indicate models with non-overlapping ontology sets

If multiple disconnected components are found, an error is logged listing the incompatible ontology groups that would result in an unmixed consensus.

Examples

>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> _pre_consensus_ontology_check(sbml_dfs_dict, "compartments")
# Logs error if models have incompatible compartment ontologies
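The bipartite-graph check described in the Notes can be illustrated without any graph library. The sketch below assumes a hypothetical input, a dict of model name to the set of ontologies it uses, and finds connected components with a breadth-first search. It is not the library's implementation, and find_components is an illustrative name.

```python
from collections import defaultdict, deque


def find_components(model_ontologies):
    """Return connected components of a model-ontology bipartite graph.

    model_ontologies: dict mapping a model name to the set of ontologies it
    uses (hypothetical input; the real function derives this from SBML_dfs).
    """
    adjacency = defaultdict(set)
    for model, ontologies in model_ontologies.items():
        model_node = ("model", model)
        # Make sure models without any ontologies still appear as nodes.
        adjacency.setdefault(model_node, set())
        for ontology in ontologies:
            ontology_node = ("ontology", ontology)
            adjacency[model_node].add(ontology_node)
            adjacency[ontology_node].add(model_node)

    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            # Visit neighbors that have not been reached yet.
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components


components = find_components(
    {"model1": {"go"}, "model2": {"go", "uniprot"}, "model3": {"ensembl_gene"}}
)
# More than one component means some models share no ontology with the rest.
has_disjoint_models = len(components) > 1
```

Here model1 and model2 are linked through the shared “go” ontology, while model3 forms its own component and would be flagged as unmixed.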

napistu.consensus._prepare_consensus_table(agg_table_harmonized: DataFrame, table_schema: dict, cluster_consensus_identifiers: DataFrame) → DataFrame

Prepare a consensus table with one row per unique entity.

Parameters:

agg_table_harmonized: pd.DataFrame: Table with nameness scores and cluster assignments
table_schema: dict: Schema for the table
cluster_consensus_identifiers: pd.DataFrame: Consensus identifiers for each cluster

Returns:

pd.DataFrame: New consensus table with merged entities

napistu.consensus._prepare_identifier_edgelist(valid_identifiers: DataFrame, sbml_df: DataFrame) → DataFrame

Prepare an edgelist for clustering identifiers.

Parameters:

valid_identifiers: pd.DataFrame: Table of identifiers
sbml_df: pd.DataFrame: Original table of entities

Returns:

pd.DataFrame: Edgelist connecting entities to their identifiers

napistu.consensus._prepare_member_table(sbml_dfs_dict: dict[str, SBML_dfs], defined_by: str, defined_lookup_tables: dict, table_schema: dict, defined_by_schema: dict, defining_attrs: list[str], table: str = 'reactions') → tuple[DataFrame, str]

Prepare a table of members and validate their structure.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]: Dictionary of SBML_dfs from different models
defined_by: str: Name of the table whose entries define membership
defined_lookup_tables: dict: Lookup tables for updating IDs
table_schema: dict: Schema for the main table
defined_by_schema: dict: Schema for the defining table
defining_attrs: list[str]: Attributes that define a unique member
table: str: Name of the main table (default: 'reactions')

Returns:

tuple:

Updated aggregated table with member strings
Name of the foreign key

napistu.consensus._reduce_to_consensus_ids(sbml_df: DataFrame, table_schema: dict, pw_index: PWIndex | None = None, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) → tuple[DataFrame, Series]

Reduce a table of entities to unique entries based on consensus identifiers.

This function clusters entities that share identifiers (as defined by the provided biological qualifiers) and produces a new table of unique entities, along with a lookup table mapping original entities to consensus IDs.

Parameters:

sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).
table_schema (dict) – Schema for the table being reduced.
pw_index (PWIndex) – An index of all tables being aggregated. Optional if no source information is required.
defining_biological_qualifiers (list[str]) – List of biological qualifier types which define distinct entities. Defaults to BQB_DEFINING_ATTRS.

Returns:

new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.
lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.

napistu.consensus._remove_no_rxn_pathways(no_rxn_pathway_ids: list[str], sbml_dfs_dict: dict[str, SBML_dfs], compspec_lookup_table: DataFrame) → None

Remove pathways which don’t contribute reactions from the pw_index.

Parameters:

no_rxn_pathway_ids (list) – The pathway ids for models which should not have reactions. (i.e., models which are just species metadata like “Dogma”)
sbml_dfs_dict (dict) – The dictionary of SBML_dfs.
compspec_lookup_table (pd.DataFrame) – The lookup table for compartmentalized species.

Returns:

Modifies objects in place.

Return type:

None

napistu.consensus._report_consensus_merges(lookup_table: Series, table_schema: dict, agg_tbl: DataFrame | None = None, sbml_dfs_dict: dict[str, SBML_dfs] | None = None, n_example_merges: int = 3) → None

Report Consensus Merges

Print a summary of merges that occurred

Parameters:

lookup_table: pd.Series: An index of “model” and the entity's primary key with values of new_id
table_schema: dict: Schema of the table being merged
agg_tbl: pd.DataFrame or None: Contains the original model, primary keys and a label. Required if the primary key is not r_id (i.e., reactions)
sbml_dfs_dict: dict[str, SBML_dfs] or None: The dict of full models across all models. Used to create reaction formulas if the primary key is r_id
n_example_merges: int: Number of example merges to report details on

Returns:

None

napistu.consensus._resolve_reversibility(sbml_dfs_dict: dict[str, SBML_dfs], rxn_consensus_species: DataFrame, rxn_lookup_table: Series) → DataFrame: For a set of merged reactions determine what their consensus reaction reversibilities are

napistu.consensus._unnest_SBML_df(sbml_dfs_dict: dict[str, SBML_dfs], table: str) → DataFrame

Unnest and concatenate a specific table from multiple SBML_dfs models.

This function merges corresponding tables from a set of models into a single DataFrame, adding the model name as an index level.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
table (str) – The name of the table to aggregate (e.g., ‘species’, ‘reactions’, ‘compartments’).

Returns:

A concatenated table with a MultiIndex of model and entity ID.

Return type:

pd.DataFrame
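The shape of the result can be illustrated with pd.concat, which adds the dictionary keys as an outer index level. This is a sketch with hypothetical stand-in tables, not the function's actual code; the column and index names are assumptions.

```python
import pandas as pd

# Hypothetical stand-ins for the "species" table of two models.
tables = {
    "model1": pd.DataFrame(
        {"s_name": ["water"]}, index=pd.Index(["S1"], name="s_id")
    ),
    "model2": pd.DataFrame(
        {"s_name": ["water", "glucose"]}, index=pd.Index(["S1", "S2"], name="s_id")
    ),
}

# Passing a dict to pd.concat uses its keys as the outer index level.
unnested = pd.concat(tables, names=["model", "s_id"])
```

The same entity id can occur in several models (“S1” above); the model level of the MultiIndex keeps those rows distinct until they are merged.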

napistu.consensus._update_foreign_keys(agg_tbl: DataFrame, table_schema: dict, fk_lookup_tables: dict) → DataFrame: Update one or more foreign keys based on old-to-new foreign key lookup table(s).
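As a rough illustration of a foreign key update, the sketch below swaps an old s_id for its consensus id using a lookup Series indexed by model and old id. The table layout, the column names, and the “new_s_id” label are assumptions, and the real function may work differently.

```python
import pandas as pd

# Hypothetical aggregated table indexed by (model, sc_id) with an old s_id foreign key.
agg_tbl = pd.DataFrame(
    {"model": ["model1", "model2"], "sc_id": ["SC1", "SC1"], "s_id": ["S1", "S7"]}
).set_index(["model", "sc_id"])

# Hypothetical lookup table: (model, old s_id) -> new consensus s_id.
s_lookup = pd.Series(
    ["S_new_1", "S_new_1"],
    index=pd.MultiIndex.from_tuples(
        [("model1", "S1"), ("model2", "S7")], names=["model", "s_id"]
    ),
    name="new_s_id",
)

# Join on (model, old key), then replace the old key column with the new one.
updated = (
    agg_tbl.reset_index()
    .merge(s_lookup.reset_index(), on=["model", "s_id"], how="left")
    .drop(columns="s_id")
    .rename(columns={"new_s_id": "s_id"})
    .set_index(["model", "sc_id"])
)
```

After the update both rows point at the same consensus species, which is what allows a later step to recognize them as the same compartmentalized species.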

napistu.consensus._validate_consensus_table(new_id_table: DataFrame, sbml_df: DataFrame) → None

Validate that the new consensus table has the same structure as the original.

Parameters:

new_id_table: pd.DataFrame: Newly created consensus table
sbml_df: pd.DataFrame: Original table from which consensus was built

Raises:

ValueError: If index names or columns don’t match

napistu.consensus._validate_merge_entity_data_create_consensus(entity_data_dict, an_entity_data_type, models_w_entity_data_type)

Validate creating a consensus of entity data tables in cases where the same table is present in multiple models

This function checks whether tables with the same entity data key can be reasonably merged (same index and column names) or whether they seem like apples-to-oranges.

Parameters:

entity_data_dict: dict: Dictionary containing all model’s “an_entity_data_type” dictionaries
an_entity_data_type: str: The type of entity data to merge
models_w_entity_data_type: list: List of models with the same entity data type

Returns:

None

Raises:

ValueError: If the tables have different index or column names

napistu.consensus._validate_meta_identifiers(meta_identifiers: DataFrame) → None: Check Identifiers to make sure they aren’t empty and flag cases where IDs are missing BQB terms.

napistu.consensus.construct_consensus_model(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, model_source: Source | None = None, dogmatic: bool = True, check_mergeability: bool = True, no_rxn_pathway_ids: list[str] | None = None) → SBML_dfs

Construct a Consensus Model by merging shared entities across pathway models.

This function takes a dictionary of pathway models and merges shared entities (compartments, species, reactions, etc.) into a single consensus model, using a set of rules for entity identity and merging.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
pw_index (PWIndex) – An index of all tables being aggregated, used for cross-referencing entities.
model_source (Source, optional) – A source object for the consensus model. If None, a default consensus source is created.
dogmatic (bool, default=True) – If True, preserve genes, transcripts, and proteins as separate species. If False, merge them when possible.
check_mergeability (bool, default=True) – whether to check for issues which will prevent merging across models
no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.

Returns:

A consensus SBML_dfs object containing the merged model.

Return type:

SBML_dfs

napistu.consensus.construct_meta_entities_fk(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: DataFrame, table: str = 'compartmentalized_species', fk_lookup_tables: dict = {}, extra_defining_attrs: list = []) → tuple[DataFrame, Series]

Construct Meta Entities Defined by Foreign Keys

Aggregating across one entity type for a set of pathway models, merge entities which are defined by their foreign keys.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]: A dictionary of SBML_dfs, keyed by model name
pw_index: PWIndex: An index of all tables being aggregated
table: str: A table/entity set from the sbml_dfs to work with
fk_lookup_tables: dict: Dictionary containing lookup tables for all foreign keys used by the table
extra_defining_attrs: list: List of terms which uniquely define a reaction species in addition to the foreign keys. A common case is when a species is a modifier and a substrate in a reaction.

Returns:

new_id_table: pd.DataFrame: Matching the schema of one of the tables within sbml_dfs_dict
lookup_table: pd.Series: Matches the index of the aggregated entities to new_ids
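One way to picture merging by foreign keys: once the foreign keys hold consensus ids, rows with identical defining columns collapse into one meta-entity. The sketch below uses hypothetical data and a made-up “SC_” id format; it is not the library's implementation.

```python
import pandas as pd

# Hypothetical compartmentalized species whose foreign keys (s_id, c_id)
# have already been mapped to consensus ids.
agg_tbl = pd.DataFrame(
    {
        "model": ["model1", "model2", "model2"],
        "sc_id": ["SC1", "SC4", "SC5"],
        "s_id": ["S_1", "S_1", "S_2"],
        "c_id": ["C_1", "C_1", "C_1"],
    }
).set_index(["model", "sc_id"])

# Foreign keys plus any extra defining attributes.
defining_cols = ["s_id", "c_id"]

# Rows with identical defining columns share a group number, hence a new id.
group_number = agg_tbl.groupby(defining_cols, sort=False).ngroup()
lookup_table = ("SC_" + group_number.astype(str)).rename("new_id")

# One row per meta-entity, indexed by the new id.
new_id_table = (
    agg_tbl.assign(new_id=lookup_table)
    .drop_duplicates("new_id")
    .set_index("new_id")[defining_cols]
)
```

The two outputs mirror the documented return values: a table with one row per merged entity and a lookup from the original (model, id) index to the new ids.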

napistu.consensus.construct_meta_entities_identifiers(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, table: str, fk_lookup_tables: dict = {}, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) → tuple[DataFrame, Series]

Construct meta-entities by merging entities across models that share identifiers.

Aggregates a single entity type from a set of pathway models and merges entities that share identifiers (as defined by the provided biological qualifiers).

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
pw_index (PWIndex) – An index of all tables being aggregated.
table (str) – The name of the table/entity set to aggregate (e.g., ‘species’, ‘compartments’).
fk_lookup_tables (dict, optional) – Dictionary containing lookup tables for all foreign keys used by the table (default: empty dict).
defining_biological_qualifiers (list[str], optional) – List of BQB codes which define distinct entities. Defaults to BQB_DEFINING_ATTRS.

Returns:

new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.
lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.

napistu.consensus.construct_meta_entities_members(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex | None, table: str = 'reactions', defined_by: str = 'reaction_species', defined_lookup_tables: dict = {}, defining_attrs: list[str] = ['sc_id', 'stoichiometry']) → tuple[DataFrame, Series]

Construct Meta Entities Defined by Membership

Aggregating across one entity type for a set of pathway models, merge entities with the same members.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]: A dictionary of SBML_dfs, keyed by model name
pw_index: PWIndex: An index of all tables being aggregated
table: str: A table/entity set from the sbml_dfs to work with
defined_by: str: A table/entity set whose entries are members of “table”
defined_lookup_tables: dict: Lookup tables (pd.Series) for updating the ids of “defined_by”
defining_attrs: [str]: A list of attributes which jointly define a unique entity

Returns:

new_id_table: pd.DataFrame: Matching the schema of one of the tables within sbml_dfs_dict
lookup_table: pd.Series: Matches the index of the aggregated entities to new_ids
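Merging by membership can be sketched as building one canonical string per entity from its members and treating entities with equal strings as the same. The separators, helper name, and data below are hypothetical; the library's own member-string format may differ.

```python
import pandas as pd

# Hypothetical reaction species after sc_id has been mapped to consensus ids.
reaction_species = pd.DataFrame(
    {
        "model": ["model1", "model1", "model2", "model2"],
        "r_id": ["R1", "R1", "R8", "R8"],
        "sc_id": ["SC_1", "SC_2", "SC_2", "SC_1"],
        "stoichiometry": [-1, 1, 1, -1],
    }
)


def create_member_string(members):
    # Sort so that the order of members within a reaction does not matter.
    return " _ ".join(sorted(members))


# Each member is described by its defining attributes (sc_id and stoichiometry).
member = (
    reaction_species["sc_id"] + " : " + reaction_species["stoichiometry"].astype(str)
)
membership = (
    reaction_species.assign(member=member)
    .groupby(["model", "r_id"])["member"]
    .apply(create_member_string)
)

# Reactions with identical member strings receive the same integer code.
codes, _ = pd.factorize(membership)
new_ids = pd.Series(codes, index=membership.index)
```

In this toy input R1 and R8 list the same species with the same stoichiometries in a different order, so they collapse into a single meta-reaction.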

napistu.consensus.construct_sbml_dfs_dict(pw_index: DataFrame, strict: bool = True, verbose: bool = False) → dict[str, SBML_dfs]

Construct a dictionary of SBML_dfs objects from a pathway index.

This function converts all models in the pathway index into SBML_dfs objects and adds them to a dictionary. Optionally, it can skip erroneous files with a warning instead of raising an error.

Parameters:

pw_index (pd.DataFrame) – An index of all tables being aggregated, containing model metadata and file paths.
strict (bool, default=True) – If True, raise an error on any file that cannot be loaded. If False, skip erroneous files with a warning.
verbose (bool, default=False) – If True, then include detailed logs.

Returns:

A dictionary mapping model names to SBML_dfs objects.

Return type:

dict[str, SBML_dfs]

napistu.consensus.prepare_consensus_model(sbml_dfs_list: list[SBML_dfs]) → tuple[dict[str, SBML_dfs], PWIndex]

Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.

This function extracts the core source metadata from a set of SBML_dfs objects and uses it to create a pathway index object. The pathway_id from these objects is then used to key the sbml_dfs_list objects, producing the expected input for construct_consensus_model.

Parameters:

sbml_dfs_list (list[SBML_dfs]) – List of sbml_dfs objects to be consolidated.

Returns:

sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary of sbml_dfs objects indexed by pathway_id.
pw_index (PWIndex) – Pathway index object.

Raises:

ValueError – Raised in any of the following cases:

- The sbml_dfs_list is empty.
- The sbml_dfs_list contains sbml_dfs objects with more than one row.
- The sbml_dfs_list contains sbml_dfs objects with missing columns.
- The sbml_dfs_list contains sbml_dfs objects with duplicate pathway_ids.
- The sbml_dfs_list contains sbml_dfs objects with invalid pathway_ids.