napistu.consensus

Creating a consensus model by merging shared entities across pathway models.

Public Functions

construct_consensus_model(sbml_dfs_dict, pw_index, model_source=None, dogmatic=True, check_mergeability=True, no_rxn_pathway_ids=None) -> SBML_dfs:

Construct a Consensus Model by merging shared entities across pathway models.

construct_meta_entities_fk(sbml_dfs_dict, pw_index, table="compartmentalized_species", fk_lookup_tables={}, extra_defining_attrs=[]) -> tuple[pd.DataFrame, pd.Series]:

Construct Meta Entities Defined by Foreign Keys

construct_meta_entities_identifiers(sbml_dfs_dict, pw_index, table, fk_lookup_tables={}, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> tuple[pd.DataFrame, pd.Series]:

Construct meta-entities by merging entities across models that share identifiers.

construct_meta_entities_members(sbml_dfs_dict, pw_index=None, table="reactions", defined_by="reaction_species", defined_lookup_tables={}, defining_attrs=[SC_ID, STOICHIOMETRY]) -> tuple[pd.DataFrame, pd.Series]:

Construct Meta Entities Defined by Membership

construct_sbml_dfs_dict(pw_index, strict=True, verbose=False) -> dict[str, SBML_dfs]:

Construct a dictionary of SBML_dfs objects from a pathway index.

prepare_consensus_model(sbml_dfs_list) -> tuple[dict[str, SBML_dfs], PWIndex]:

Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.

napistu.consensus._add_consensus_sources(new_id_table: DataFrame, agg_table_harmonized: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) -> DataFrame

Add source information to the consensus table.

Parameters:

new_id_table: pd.DataFrame

Consensus table without source information

agg_table_harmonized: pd.DataFrame

Original table with cluster assignments

lookup_table: pd.Series

Maps old IDs to new consensus IDs

table_schema: dict

Schema for the table

pw_index: PWIndex

An index of all tables being aggregated

Returns:

pd.DataFrame

Consensus table with source information added

napistu.consensus._add_entity_data(sbml_dfs: SBML_dfs, sbml_dfs_dict: dict[str, SBML_dfs], lookup_tables: dict) -> SBML_dfs

Add entity data from component models to the consensus model.

Parameters:

sbml_dfs: SBML_dfs

The consensus model being built

sbml_dfs_dict: dict[str, SBML_dfs]

A dictionary of SBML_dfs from different models

lookup_tables: dict

Dictionary of lookup tables for translating between old and new entity IDs

Returns:

SBML_dfs

The updated consensus model

napistu.consensus._build_consensus_identifiers(sbml_df: DataFrame, table_schema: dict, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) -> tuple[Series, DataFrame]

Build consensus identifiers by clustering entities that share biological identifiers.

This function takes a set of entities spanning multiple models and finds all unique entities by grouping them according to the provided biological qualifiers. It returns a mapping from original entities to clusters and a DataFrame of consensus identifier objects for each cluster.

Parameters:
  • sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).

  • table_schema (dict) – Schema for the table being processed.

  • defining_biological_qualifiers (list[str], optional) – List of biological qualifier types to use for grouping. Defaults to BQB_DEFINING_ATTRS.

Returns:

  • indexed_cluster (pd.Series) – Series mapping the index from sbml_df onto a set of clusters which define unique entities.

  • cluster_consensus_identifiers_df (pd.DataFrame) – DataFrame mapping clusters to consensus identifiers (Identifiers objects).
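The clustering above can be pictured as connected components on a graph that links entities to their identifiers: two entities land in the same cluster when they are connected directly, or through a chain of shared identifiers. The following is a minimal standard-library sketch of that idea using union-find. The row layout `(entity_key, ontology, identifier, bqb)` and the name `cluster_by_shared_identifiers` are hypothetical; this is not napistu's internal implementation.

```python
def cluster_by_shared_identifiers(
    identifiers: list[tuple[tuple[str, str], str, str, str]],
    defining_bqbs: tuple[str, ...] = ("BQB_IS", "BQB_IS_HOMOLOG_TO"),
) -> dict[tuple[str, str], int]:
    """Map each (model, entity_id) key to an integer cluster number.

    Hypothetical input rows: (entity_key, ontology, identifier, bqb).
    Entities are linked through an (ontology, identifier) node only when
    the row's BQB qualifier is one of `defining_bqbs`.
    """
    parent: dict[tuple, tuple] = {}

    def find(node: tuple) -> tuple:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a: tuple, b: tuple) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    entities = []
    for entity, ontology, identifier, bqb in identifiers:
        find(("entity", entity))  # register every entity, even if never linked
        entities.append(entity)
        if bqb in defining_bqbs:
            union(("entity", entity), ("id", ontology, identifier))

    # Number clusters compactly in order of first appearance.
    cluster_numbers: dict[tuple, int] = {}
    result: dict[tuple[str, str], int] = {}
    for entity in entities:
        root = find(("entity", entity))
        result[entity] = cluster_numbers.setdefault(root, len(cluster_numbers))
    return result
```

Routing links through identifier nodes (rather than comparing entities pairwise) is what makes the grouping transitive: if A shares an ID with B and B shares a different ID with C, all three end up together.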

napistu.consensus._check_sbml_dfs(sbml_dfs: SBML_dfs, model_label: str, N_examples: int | str = 5) -> None

Check SBML_dfs for identifiers which are associated with different entities before a merge.

napistu.consensus._check_sbml_dfs_dict(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, check_mergeability: bool = True) -> None

Check models in sbml_dfs_dict for problems that can be reported up front.

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models

  • pw_index (indices.PWIndex) – an index of all tables being aggregated

  • check_mergeability (bool, default=True) – whether to check for issues which will prevent merging across models

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

napistu.consensus._check_sbml_dfs_mergeability(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) -> None

Check SBML_dfs for obvious issues which will prevent merging across models.

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models

  • pw_index (indices.PWIndex) – an index of all tables being aggregated

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

napistu.consensus._create_cluster_identifiers(meta_identifiers: DataFrame, indexed_cluster: Series, sbml_df: DataFrame, ind_clusters: DataFrame, table_schema: dict) -> DataFrame

Create identifier objects for each cluster.

Parameters:
  • meta_identifiers (pd.DataFrame) – All identifiers (including those filtered out by BQB)

  • indexed_cluster (pd.Series) – Maps entity indices to cluster IDs

  • sbml_df (pd.DataFrame) – Original table of entities

  • ind_clusters (pd.DataFrame) – Cluster assignments from graph algorithm

  • table_schema (dict) – Schema for the table, used to determine the correct identifier column name

Returns:

Table mapping clusters to their consensus identifiers, with the identifier column named according to the schema

Return type:

pd.DataFrame

napistu.consensus._create_consensus_entities(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, defining_biological_qualifiers: list[str], no_rxn_pathway_ids: list[str] | None = None) -> tuple[dict, dict]

Create consensus entities for all primary tables in the model.

This helper function creates consensus compartments, species, compartmentalized species, reactions, and reaction species by finding shared entities across source models.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]

A dictionary of SBML_dfs from different models

pw_index: PWIndex

An index of all tables being aggregated

defining_biological_qualifiers: list[str]

Biological qualifier terms that define distinct entities

no_rxn_pathway_ids: Optional[list[str]] = None

The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.

Returns:

tuple:
  • dict of consensus entities tables

  • dict of lookup tables

napistu.consensus._create_consensus_entity_data(combined_entity_data: DataFrame, primary_key: str) -> DataFrame

Create consensus entity data by combining multiple rows with the same index value.

This function takes a DataFrame that might have multiple rows for the same index value and combines them so there is exactly 1 row per index value using the “first” method.

Parameters:

combined_entity_data: pd.DataFrame

Input DataFrame with potentially multiple rows per index value

primary_key: str

The column name to use as the primary key for grouping

Returns:

pd.DataFrame

DataFrame with exactly one row per unique primary key value
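A minimal sketch of what combining rows with a “first” rule can look like, using plain dicts instead of a DataFrame. It assumes the rule behaves like pandas' `GroupBy.first`, which takes the first non-null value of each column within a group; the function name `combine_first_per_key` is hypothetical.

```python
from typing import Any


def combine_first_per_key(rows: list[dict[str, Any]], primary_key: str) -> list[dict[str, Any]]:
    """Collapse rows that share `primary_key` into exactly one row per key.

    For every other column, keep the first non-None value encountered
    (an assumed analogue of pandas' GroupBy.first).
    """
    combined: dict[Any, dict[str, Any]] = {}
    for row in rows:
        key = row[primary_key]
        merged = combined.setdefault(key, {primary_key: key})
        for column, value in row.items():
            if column == primary_key:
                continue
            if merged.get(column) is None:
                # Fill the column if it is missing or still None.
                merged[column] = value
    return list(combined.values())
```

Note that “first non-null per column” can stitch one output row together from values that came from different input rows.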

napistu.consensus._create_consensus_sources(agg_tbl: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) -> Series

Create Consensus Sources

Annotate the source of to-be-merged species with the models they came from, and combine with existing annotations.

Parameters:

agg_tbl: pd.DataFrame

A table containing existing Source objects and a many-to-one “new_id” identifying each entity's post-aggregation consensus entity

lookup_table: pd.Series

A series whose index contains old identifiers and whose values are post-aggregation new identifiers

table_schema: dict

Summary of the schema for the relevant entity type

pw_index: PWIndex

An index of all tables being aggregated

Returns:

new_sources: pd.Series

Mapping where the index is new identifiers and values are aggregated Source objects
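The aggregation of sources can be sketched with plain dicts: every original entity contributes its model name plus any sources it already carried, and these are pooled under its consensus ID. The dict layout, the string source labels, and the name `aggregate_sources` are hypothetical simplifications; napistu's Source objects carry richer metadata.

```python
def aggregate_sources(
    old_sources: dict[tuple[str, str], list[str]],
    lookup_table: dict[tuple[str, str], str],
) -> dict[str, dict[str, set[str]]]:
    """Collect, per consensus ID, the contributing models and their existing sources.

    old_sources:  {(model, old_id): [existing source labels]}  (hypothetical layout)
    lookup_table: {(model, old_id): new_id}
    """
    merged: dict[str, dict[str, set[str]]] = {}
    for (model, old_id), new_id in lookup_table.items():
        entry = merged.setdefault(new_id, {"models": set(), "sources": set()})
        # Annotate which model this to-be-merged entity came from...
        entry["models"].add(model)
        # ...and carry over any source annotations it already had.
        entry["sources"].update(old_sources.get((model, old_id), []))
    return merged
```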

napistu.consensus._create_consensus_table(agg_primary_table: DataFrame, lookup_table: Series, updated_identifiers: Series, table_schema: dict) -> DataFrame

Create a consensus table with merged entities.

Parameters:

agg_primary_table: pd.DataFrame

Table of entities

lookup_table: pd.Series

Lookup table mapping old IDs to new IDs

updated_identifiers: pd.Series

Series mapping new IDs to merged identifier objects

table_schema: dict

Schema for the table

Returns:

pd.DataFrame

Consensus table with one row per unique entity

napistu.consensus._create_default_consensus_source(sbml_dfs_dict: dict[str, SBML_dfs]) -> Source

Create a default consensus source, used when no model source object is provided.

Parameters:

sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.

Returns:

A default consensus source object.

Return type:

Source

napistu.consensus._create_entity_consensus(membership_lookup: DataFrame, table_schema: dict) -> tuple[DataFrame, Series]

Create consensus entities based on membership.

Parameters:

membership_lookup: pd.DataFrame

Table mapping entities to their member strings

table_schema: dict

Schema for the table

Returns:

tuple:
  • Consensus entities DataFrame

  • Lookup table mapping old IDs to new IDs

napistu.consensus._create_entity_lookup_table(agg_table_harmonized: DataFrame, table_schema: dict) -> Series

Create a lookup table mapping original entity IDs to new consensus IDs.

Parameters:

agg_table_harmonized: pd.DataFrame

Table with cluster assignments for each entity

table_schema: dict

Schema for the table

Returns:

pd.Series

Lookup table mapping old entity IDs to new consensus IDs
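Conceptually, the lookup table gives every cluster a fresh consensus ID and then points each original (model, entity ID) key at it. A minimal sketch, assuming integer cluster numbers as input; the name `make_lookup_table` and the zero-padded `<prefix><number>` ID format are hypothetical and may not match the IDs napistu generates.

```python
def make_lookup_table(
    cluster_assignments: dict[tuple[str, str], int], prefix: str, width: int = 8
) -> dict[tuple[str, str], str]:
    """Give each cluster a new consensus ID and map every original key onto it.

    cluster_assignments: {(model, old_id): cluster_number}
    The "<prefix><zero-padded number>" format is an illustrative assumption.
    """
    new_ids: dict[int, str] = {}
    lookup: dict[tuple[str, str], str] = {}
    # Sorting by cluster number (a stable sort) makes ID assignment deterministic.
    for key, cluster in sorted(cluster_assignments.items(), key=lambda kv: kv[1]):
        if cluster not in new_ids:
            new_ids[cluster] = f"{prefix}{len(new_ids):0{width}d}"
        lookup[key] = new_ids[cluster]
    return lookup
```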

napistu.consensus._create_member_string(x: list[str]) -> str

napistu.consensus._create_membership_lookup(agg_tbl: DataFrame, table_schema: dict) -> DataFrame

Create a lookup table for entity membership.

Parameters:

agg_tbl: pd.DataFrame

Table with member information

table_schema: dict

Schema for the table

Returns:

pd.DataFrame

Lookup table mapping entity IDs to member strings
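A member string only works as a merge key if it is independent of row order. The sketch below shows one way to build it: encode each member from its defining attributes, sort, then join. The `" + "` and `"__"` separators, the row layout, and both function names are hypothetical; the real `_create_member_string` is undocumented above.

```python
from collections import defaultdict


def create_member_string(members: list[str]) -> str:
    # Sort so the same set of members always yields the same string,
    # regardless of row order. The " + " separator is hypothetical.
    return " + ".join(sorted(members))


def create_membership_lookup(
    rows: list[dict], entity_key: str, defining_attrs: list[str]
) -> dict[str, str]:
    """Map each entity (e.g. a reaction) to a string describing its members.

    rows: one dict per member record (e.g. a reaction species), hypothetical layout.
    """
    members: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Encode the attributes that jointly define a member, e.g. sc_id + stoichiometry.
        members[row[entity_key]].append("__".join(str(row[attr]) for attr in defining_attrs))
    return {entity: create_member_string(m) for entity, m in members.items()}
```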

napistu.consensus._create_vertex_category(df: DataFrame, category: str) -> DataFrame

Create vertex dataframe for a specific category from a source column.

napistu.consensus._filter_identifiers_by_qualifier(meta_identifiers: DataFrame, defining_biological_qualifiers: list[str]) -> DataFrame

Filter identifiers to only include those with specific biological qualifiers.

Parameters:

meta_identifiers: pd.DataFrame

Table of identifiers

defining_biological_qualifiers: list[str]

List of biological qualifier types to keep

Returns:

pd.DataFrame

Filtered identifiers
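Filtering matters because not every identifier implies identity: a complex annotated with `BQB_HAS_PART` for a protein should not be merged with that protein, whereas a `BQB_IS` annotation should. A minimal sketch over plain dict rows; the `"bqb"` key and the function name are hypothetical.

```python
def filter_identifiers_by_qualifier(
    identifiers: list[dict[str, str]], defining_bqbs: list[str]
) -> list[dict[str, str]]:
    # Keep only rows whose BQB qualifier is one of the defining ones.
    allowed = set(defining_bqbs)
    return [row for row in identifiers if row["bqb"] in allowed]
```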

napistu.consensus._get_no_rxn_pathway_ids(pw_index: PWIndex, no_rxn_pathway_ids: list[str] | None = None) -> list[str]

Get the pathway ids for models which should not have reactions.

Parameters:
  • pw_index (PWIndex) – The pathway index.

  • no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults.

Returns:

no_rxn_pathway_ids – The pathway ids for models which should not have reactions.

Return type:

list

napistu.consensus._handle_entries_without_identifiers(sbml_df: DataFrame, valid_identifiers: DataFrame) -> DataFrame

Handle entities that don’t have identifiers by adding dummy identifiers.

Parameters:

sbml_df: pd.DataFrame

Original table of entities

valid_identifiers: pd.DataFrame

Table of identifiers that passed filtering

Returns:

pd.DataFrame

Valid identifiers with dummy entries added
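One plausible reading of the dummy-identifier step, sketched with plain dicts: each entity lacking a valid identifier receives a unique placeholder, presumably so it survives clustering as its own singleton instead of being dropped. The `"dummy"` ontology name, the `dummy_<n>` format, and the function name are hypothetical.

```python
def add_dummy_identifiers(
    entity_keys: list[tuple[str, str]], valid_identifiers: list[dict]
) -> list[dict]:
    """Give every entity without a valid identifier a unique placeholder identifier."""
    has_identifier = {row["entity"] for row in valid_identifiers}
    missing = [key for key in entity_keys if key not in has_identifier]
    # Unique placeholders guarantee no two identifier-less entities are merged.
    dummies = [
        {"entity": key, "ontology": "dummy", "identifier": f"dummy_{i}", "bqb": "BQB_IS"}
        for i, key in enumerate(missing)
    ]
    return valid_identifiers + dummies
```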

napistu.consensus._merge_entity_data(sbml_dfs_dict: dict[str, SBML_dfs], lookup_table: Series, table: str) -> dict

Merge Entity Data

Report cases where a single “new” id is associated with multiple different values of entity_var

Parameters:
  • sbml_dfs_dict (dict) – dictionary where keys are to-be-merged model names and values are SBML_dfs

  • lookup_table (pd.Series) – a series whose index is the model and old primary key and whose value is the new consensus id

  • table (str) – table whose data is being consolidated (currently species or reactions)

Returns:

entity_data – dictionary containing pd.DataFrames which aggregate all of the individual entity_data tables in “sbml_dfs_dict”

Return type:

dict

napistu.consensus._merge_entity_data_create_consensus(entity_data_dict: dict, lookup_table: Series, entity_schema: dict, an_entity_data_type: str, table: str) -> DataFrame

Merge Entity Data - Create Consensus

Report cases where a single “new” id is associated with multiple different values of entity_var

Parameters:

entity_data_dict: dict

Dictionary containing all models’ “an_entity_data_type” dictionaries

lookup_table: pd.Series

A series where the index is an old model and primary key and the value is the new consensus id

entity_schema: dict

Schema for “table”

an_entity_data_type: str

data_type from species/reactions_data in entity_data_dict

table: str

table whose data is being consolidated (currently species or reactions)

Returns:

pd.DataFrame

Table where index is primary key of “table” and values are all distinct annotations from “an_entity_data_type”.

napistu.consensus._merge_entity_data_report_mismatches(combined_entity_data: DataFrame, entity_schema: dict, an_entity_data_type: str, table: str) -> None

Merge Entity Data - Report Mismatches

Report cases where a single “new” id is associated with multiple different values of entity_var

Parameters:
  • combined_entity_data (pd.DataFrame) – indexed by table primary key containing all data from “an_entity_data_type”

  • entity_schema (dict) – schema for “table”

  • an_entity_data_type (str) – data_type from species/reactions_data in combined_entity_data

  • table (str) – table whose data is being consolidated (currently species or reactions)

Return type:

None
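The mismatch check boils down to grouping original values by their consensus ID and flagging any ID that ends up with more than one distinct value. A minimal sketch with plain dicts and the standard `logging` module; the input layout and the name `report_mismatches` are hypothetical.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def report_mismatches(
    lookup_table: dict[tuple[str, str], str],
    values: dict[tuple[str, str], str],
    column: str,
) -> dict[str, list[str]]:
    """Find consensus IDs whose merged entities disagree on `column`.

    lookup_table: {(model, old_id): new_id}
    values:       {(model, old_id): value of `column` for that entity}
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for key, new_id in lookup_table.items():
        if key in values:
            seen[new_id].add(values[key])
    # A single new ID with more than one distinct value is a mismatch.
    conflicts = {new_id: sorted(v) for new_id, v in seen.items() if len(v) > 1}
    if conflicts:
        logger.warning("%d consensus IDs have conflicting %s values", len(conflicts), column)
    return conflicts
```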

napistu.consensus._merge_entity_identifiers(agg_primary_table: DataFrame, lookup_table: Series, table_schema: dict) -> Series

Merge identifiers from multiple entities.

Parameters:

agg_primary_table: pd.DataFrame

Table of entities

lookup_table: pd.Series

Lookup table mapping old IDs to new IDs

table_schema: dict

Schema for the table

Returns:

pd.Series

Series mapping new IDs to merged identifier objects

napistu.consensus._pre_consensus_compartment_check(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) -> None

Check for compartment compatibility across models before consensus building.

This function identifies models that won’t mix well in a consensus because they contain non-overlapping sets of compartments. It constructs a bipartite graph connecting models to their compartments and identifies disconnected components, which indicate incompatible compartment structures.

Parameters:
  • sbml_dfs_dict (dict) – Dictionary containing SBML dataframes for each model, keyed by model name.

  • pw_index (PWIndex) – Pathway index containing model metadata and pathway information.

Returns:

This function returns None but logs error messages if incompatible compartment structures are detected.

Return type:

None

Notes

The function builds a graph where:

  • Models are connected to their compartments via shared identifiers

  • Compartments are connected to their model-specific labels

  • Disconnected components indicate models with non-overlapping compartment sets

If multiple disconnected components are found, an error is logged listing the incompatible compartment groups that would result in an unmixed consensus.

Examples

>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> pw_index = pd.DataFrame({"model": ["model1", "model2"], ...})
>>> _pre_consensus_compartment_check(sbml_dfs_dict, pw_index)
# Logs error if models have incompatible compartment structures

napistu.consensus._pre_consensus_ontology_check(sbml_dfs_dict: dict[str, SBML_dfs], entity_type: str) -> None

Check for ontology compatibility across models before consensus building.

This function determines whether any models possess disjoint sets of ontologies for a given entity type (e.g., compartments or species). It constructs a bipartite graph connecting models to their ontologies and identifies disconnected components, which indicate models with non-overlapping ontology structures.

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary containing SBML dataframes for each model, keyed by model name.

  • entity_type (str) – The type of entity to check ontologies for. Must be one of ‘compartments’, ‘species’, or ‘reactions’.

Returns:

This function returns None but logs error messages if incompatible ontology structures are detected.

Return type:

None

Notes

The function builds a graph where:

  • Models are connected to ontologies they contain for the specified entity type

  • Disconnected components indicate models with non-overlapping ontology sets

If multiple disconnected components are found, an error is logged listing the incompatible ontology groups that would result in an unmixed consensus.
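The bipartite check can be sketched without any graph library: build an adjacency list with model nodes on one side and ontology nodes on the other, then collect connected components with an iterative depth-first search. The input layout `{model: set of ontologies}` and the name `disjoint_ontology_groups` are hypothetical.

```python
from collections import defaultdict


def disjoint_ontology_groups(
    model_ontologies: dict[str, set[str]],
) -> list[tuple[list[str], list[str]]]:
    """Return the connected components of a model-ontology bipartite graph.

    model_ontologies: {model: ontologies used by that model for one entity type}
    Each component is (sorted model names, sorted ontology names).
    """
    adjacency: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)
    for model, ontologies in model_ontologies.items():
        model_node = ("model", model)
        adjacency[model_node]  # touch the key so models with no ontologies still appear
        for ontology in ontologies:
            adjacency[model_node].add(("ontology", ontology))
            adjacency[("ontology", ontology)].add(model_node)

    seen: set[tuple[str, str]] = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        models = sorted(name for kind, name in component if kind == "model")
        ontologies = sorted(name for kind, name in component if kind == "ontology")
        components.append((models, ontologies))
    return components
```

More than one returned component corresponds to the situation described in the Notes, in which an error would be logged.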

Examples

>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> _pre_consensus_ontology_check(sbml_dfs_dict, "compartments")
# Logs error if models have incompatible compartment ontologies

napistu.consensus._prepare_consensus_table(agg_table_harmonized: DataFrame, table_schema: dict, cluster_consensus_identifiers: DataFrame) -> DataFrame

Prepare a consensus table with one row per unique entity.

Parameters:

agg_table_harmonized: pd.DataFrame

Table with nameness scores and cluster assignments

table_schema: dict

Schema for the table

cluster_consensus_identifiers: pd.DataFrame

Consensus identifiers for each cluster

Returns:

pd.DataFrame

New consensus table with merged entities

napistu.consensus._prepare_identifier_edgelist(valid_identifiers: DataFrame, sbml_df: DataFrame) -> DataFrame

Prepare an edgelist for clustering identifiers.

Parameters:

valid_identifiers: pd.DataFrame

Table of identifiers

sbml_df: pd.DataFrame

Original table of entities

Returns:

pd.DataFrame

Edgelist connecting entities to their identifiers

napistu.consensus._prepare_member_table(sbml_dfs_dict: dict[str, SBML_dfs], defined_by: str, defined_lookup_tables: dict, table_schema: dict, defined_by_schema: dict, defining_attrs: list[str], table: str = 'reactions') -> tuple[DataFrame, str]

Prepare a table of members and validate their structure.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]

Dictionary of SBML_dfs from different models

defined_by: str

Name of the table whose entries define membership

defined_lookup_tables: dict

Lookup tables for updating IDs

table_schema: dict

Schema for the main table

defined_by_schema: dict

Schema for the defining table

defining_attrs: list[str]

Attributes that define a unique member

table: str

Name of the main table (default: 'reactions')

Returns:

tuple:
  • Updated aggregated table with member strings

  • Name of the foreign key

napistu.consensus._reduce_to_consensus_ids(sbml_df: DataFrame, table_schema: dict, pw_index: PWIndex | None = None, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) -> tuple[DataFrame, Series]

Reduce a table of entities to unique entries based on consensus identifiers.

This function clusters entities that share identifiers (as defined by the provided biological qualifiers) and produces a new table of unique entities, along with a lookup table mapping original entities to consensus IDs.

Parameters:
  • sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).

  • table_schema (dict) – Schema for the table being reduced.

  • pw_index (PWIndex) – An index of all tables being aggregated. Optional if no source information is required.

  • defining_biological_qualifiers (list[str]) – List of biological qualifier types which define distinct entities. Defaults to BQB_DEFINING_ATTRS.

Returns:

  • new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.

  • lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.

napistu.consensus._remove_no_rxn_pathways(no_rxn_pathway_ids: list[str], sbml_dfs_dict: dict[str, SBML_dfs], compspec_lookup_table: DataFrame) -> None

Remove pathways which don’t contribute reactions from the pw_index.

Parameters:
  • no_rxn_pathway_ids (list) – The pathway ids for models which should not have reactions. (i.e., models which are just species metadata like “Dogma”)

  • sbml_dfs_dict (dict) – The dictionary of SBML_dfs.

  • compspec_lookup_table (pd.DataFrame) – The lookup table for compartmentalized species.

Returns:

Modifies objects in place.

Return type:

None

napistu.consensus._report_consensus_merges(lookup_table: Series, table_schema: dict, agg_tbl: DataFrame | None = None, sbml_dfs_dict: dict[str, SBML_dfs] | None = None, n_example_merges: int = 3) -> None

Report Consensus Merges

Print a summary of merges that occurred

Parameters:

lookup_table: pd.Series

A series indexed by “model” and the entity's primary key, with values of new_id

table_schema: dict

Schema of the table being merged

agg_tbl: pd.DataFrame or None

Contains the original model, primary keys and a label. Required unless the primary key is r_id (i.e., unless reactions are being merged)

sbml_dfs_dict: dict[str, SBML_dfs] or None

The dict of full models being merged. Used to create reaction formulas if the primary key is r_id

n_example_merges: int

Number of example merges to report details on

Returns:

None

napistu.consensus._resolve_reversibility(sbml_dfs_dict: dict[str, SBML_dfs], rxn_consensus_species: DataFrame, rxn_lookup_table: Series) -> DataFrame

For a set of merged reactions, determine their consensus reversibilities.

napistu.consensus._unnest_SBML_df(sbml_dfs_dict: dict[str, SBML_dfs], table: str) -> DataFrame

Unnest and concatenate a specific table from multiple SBML_dfs models.

This function merges corresponding tables from a set of models into a single DataFrame, adding the model name as an index level.

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.

  • table (str) – The name of the table to aggregate (e.g., ‘species’, ‘reactions’, ‘compartments’).

Returns:

A concatenated table with a MultiIndex of model and entity ID.

Return type:

pd.DataFrame
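The model index level matters because entity IDs are only guaranteed unique within a single model: two models can each contain an `S1` that refer to different things. A minimal dict-based sketch of the unnesting idea; the nested-dict layout and the name `unnest_tables` are hypothetical.

```python
def unnest_tables(
    tables_by_model: dict[str, dict[str, dict]],
) -> dict[tuple[str, str], dict]:
    """Concatenate per-model tables, prefixing each row key with its model name.

    tables_by_model: {model: {entity_id: row}}  (hypothetical layout)
    """
    combined: dict[tuple[str, str], dict] = {}
    for model, table in tables_by_model.items():
        for entity_id, row in table.items():
            # (model, entity_id) plays the role of the two-level MultiIndex.
            combined[(model, entity_id)] = row
    return combined
```

In pandas, passing a dict of DataFrames to `pd.concat` yields an analogous outer index level built from the dict keys; whether napistu does exactly this is not stated here.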

napistu.consensus._update_foreign_keys(agg_tbl: DataFrame, table_schema: dict, fk_lookup_tables: dict) -> DataFrame

Update one or more foreign keys based on old-to-new foreign key lookup table(s).
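Updating foreign keys means translating each row's reference to another table (e.g. a species or compartment ID) from its model-local value to the consensus ID produced when that other table was merged. Because local IDs are only unique per model, the lookup is keyed on (model, old ID). A minimal sketch with plain dicts; the layout and the name `update_foreign_keys` are hypothetical, and raising on a missing mapping is a design choice of this sketch.

```python
def update_foreign_keys(
    rows: list[dict], fk_lookup_tables: dict[str, dict[tuple[str, str], str]]
) -> list[dict]:
    """Return copies of `rows` with each foreign-key column mapped to its consensus ID.

    rows: dicts with a "model" field plus foreign-key columns.
    fk_lookup_tables: {fk_column: {(model, old_fk): new_fk}}
    """
    updated = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for fk, lookup in fk_lookup_tables.items():
            key = (row["model"], row[fk])
            if key not in lookup:
                raise KeyError(f"no consensus ID for {fk}={row[fk]!r} in model {row['model']!r}")
            new_row[fk] = lookup[key]
        updated.append(new_row)
    return updated
```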

napistu.consensus._validate_consensus_table(new_id_table: DataFrame, sbml_df: DataFrame) -> None

Validate that the new consensus table has the same structure as the original.

Parameters:

new_id_table: pd.DataFrame

Newly created consensus table

sbml_df: pd.DataFrame

Original table from which consensus was built

Raises:

ValueError

If index names or columns don’t match

napistu.consensus._validate_merge_entity_data_create_consensus(entity_data_dict, an_entity_data_type, models_w_entity_data_type)

Validate the creation of a consensus entity data table in cases where the same table is present in multiple models.

This function checks whether tables with the same entity data key can be reasonably merged (same index and column names) or whether they seem like apples-to-oranges.

Parameters:

entity_data_dict: dict

Dictionary containing all models’ “an_entity_data_type” dictionaries

an_entity_data_type: str

The type of entity data to merge

models_w_entity_data_type: list

List of models with the same entity data type

Returns:

None

Raises:

ValueError:

If the tables have different index or column names

napistu.consensus._validate_meta_identifiers(meta_identifiers: DataFrame) -> None

Check Identifiers to make sure they aren’t empty and flag cases where IDs are missing BQB terms.

napistu.consensus.construct_consensus_model(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, model_source: Source | None = None, dogmatic: bool = True, check_mergeability: bool = True, no_rxn_pathway_ids: list[str] | None = None) -> SBML_dfs

Construct a Consensus Model by merging shared entities across pathway models.

This function takes a dictionary of pathway models and merges shared entities (compartments, species, reactions, etc.) into a single consensus model, using a set of rules for entity identity and merging.

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.

  • pw_index (PWIndex) – An index of all tables being aggregated, used for cross-referencing entities.

  • model_source (Source, optional) – A source object for the consensus model. If None, a default consensus source is created.

  • dogmatic (bool, default=True) – If True, preserve genes, transcripts, and proteins as separate species. If False, merge them when possible.

  • check_mergeability (bool, default=True) – Whether to check for issues which will prevent merging across models.

  • no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.

Returns:

A consensus SBML_dfs object containing the merged model.

Return type:

SBML_dfs

napistu.consensus.construct_meta_entities_fk(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: DataFrame, table: str = 'compartmentalized_species', fk_lookup_tables: dict = {}, extra_defining_attrs: list = []) -> tuple[DataFrame, Series]

Construct Meta Entities Defined by Foreign Keys

Aggregate a single entity type across a set of pathway models, merging entities that share the same foreign keys (plus any extra_defining_attrs).

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]

A dictionary of SBML_dfs objects, keyed by model name

pw_index: PWIndex

An index of all tables being aggregated

table: str

A table/entity set from the sbml_dfs to work with

fk_lookup_tables: dict

Dictionary containing lookup tables for all foreign keys used by the table

extra_defining_attrs: list

List of terms which uniquely define a reaction species in addition to the foreign keys. A common case is when a species is a modifier and a substrate in a reaction.

Returns:

new_id_table: pd.DataFrame

Table matching the schema of one of the tables within sbml_dfs_dict

lookup_table: pd.Series

Matches the index of the aggregated entities to new_ids
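Once foreign keys have been translated to consensus IDs (via fk_lookup_tables), entities such as compartmentalized species can be merged simply by grouping on their foreign-key values plus any extra defining attributes. A minimal sketch with plain dicts; the row layout, the name `merge_by_foreign_keys`, and the `<key_column>_<n>` ID format are hypothetical.

```python
def merge_by_foreign_keys(
    rows: list[dict],
    key_column: str,
    fk_columns: list[str],
    extra_defining_attrs: tuple[str, ...] = (),
) -> tuple[list[dict], dict[tuple[str, str], str]]:
    """Merge entities whose (already-updated) foreign keys and extra attributes match.

    rows: dicts with "model", the primary key `key_column`, and FK columns.
    Returns (new_table, lookup) where lookup maps (model, old_id) -> new_id.
    """
    defining = [*fk_columns, *extra_defining_attrs]
    new_table: dict[tuple, dict] = {}
    lookup: dict[tuple[str, str], str] = {}
    for row in rows:
        signature = tuple(row[c] for c in defining)
        if signature not in new_table:
            # The "<key_column>_<n>" ID format is hypothetical.
            new_id = f"{key_column}_{len(new_table)}"
            new_table[signature] = {key_column: new_id, **{c: row[c] for c in defining}}
        lookup[(row["model"], row[key_column])] = new_table[signature][key_column]
    return list(new_table.values()), lookup
```

For example, the same consensus species in the same consensus compartment collapses to one compartmentalized species, while the same species in a different compartment stays separate.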

napistu.consensus.construct_meta_entities_identifiers(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, table: str, fk_lookup_tables: dict = {}, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) -> tuple[DataFrame, Series]

Construct meta-entities by merging entities across models that share identifiers.

Aggregates a single entity type from a set of pathway models and merges entities that share identifiers (as defined by the provided biological qualifiers).

Parameters:
  • sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.

  • pw_index (PWIndex) – An index of all tables being aggregated.

  • table (str) – The name of the table/entity set to aggregate (e.g., ‘species’, ‘compartments’).

  • fk_lookup_tables (dict, optional) – Dictionary containing lookup tables for all foreign keys used by the table (default: empty dict).

  • defining_biological_qualifiers (list[str], optional) – List of BQB codes which define distinct entities. Defaults to BQB_DEFINING_ATTRS.

Returns:

  • new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.

  • lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.

napistu.consensus.construct_meta_entities_members(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex | None, table: str = 'reactions', defined_by: str = 'reaction_species', defined_lookup_tables: dict = {}, defining_attrs: list[str] = ['sc_id', 'stoichiometry']) -> tuple[DataFrame, Series]

Construct Meta Entities Defined by Membership

Aggregate a single entity type across a set of pathway models, merging entities that have the same members.

Parameters:

sbml_dfs_dict: dict[str, SBML_dfs]

A dictionary of SBML_dfs

pw_index: PWIndex

An index of all tables being aggregated

table: str

A table/entity set from the sbml_dfs to work with

defined_by: str

A table/entity set whose entries are members of “table”

defined_lookup_tables: dict of pd.Series

Lookup tables for updating the IDs of “defined_by”

defining_attrs: list[str]

A list of attributes which jointly define a unique entity

Returns:

new_id_table: pd.DataFrame

Table matching the schema of one of the tables within sbml_dfs_dict

lookup_table: pd.Series

Matches the index of the aggregated entities to new_ids
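Membership-based merging can be sketched as: collect each entity's members as tuples of the defining attributes (e.g. consensus sc_id and stoichiometry), sort them into an order-independent signature, and give entities with identical signatures the same new ID. A sorted tuple is used instead of a set so that repeated members are not silently collapsed. The row layout, the name `merge_by_members`, and the `<entity_key>_<n>` ID format are hypothetical.

```python
from collections import defaultdict


def merge_by_members(
    member_rows: list[dict], entity_key: str, defining_attrs: list[str]
) -> dict[tuple[str, str], str]:
    """Map (model, old_id) -> new_id so that entities with identical members share an ID.

    member_rows: dicts with "model", `entity_key`, and `defining_attrs`
    (foreign keys assumed to be already translated to consensus IDs).
    """
    members: dict[tuple[str, str], list[tuple]] = defaultdict(list)
    for row in member_rows:
        members[(row["model"], row[entity_key])].append(tuple(row[a] for a in defining_attrs))

    new_ids: dict[tuple, str] = {}
    lookup: dict[tuple[str, str], str] = {}
    for key, member_list in members.items():
        # Sorting makes the signature independent of member order.
        signature = tuple(sorted(member_list))
        if signature not in new_ids:
            new_ids[signature] = f"{entity_key}_{len(new_ids)}"  # hypothetical ID format
        lookup[key] = new_ids[signature]
    return lookup
```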

napistu.consensus.construct_sbml_dfs_dict(pw_index: DataFrame, strict: bool = True, verbose: bool = False) -> dict[str, SBML_dfs]

Construct a dictionary of SBML_dfs objects from a pathway index.

This function converts all models in the pathway index into SBML_dfs objects and adds them to a dictionary. Optionally, it can skip erroneous files with a warning instead of raising an error.

Parameters:
  • pw_index (pd.DataFrame) – An index of all tables being aggregated, containing model metadata and file paths.

  • strict (bool, default=True) – If True, raise an error on any file that cannot be loaded. If False, skip erroneous files with a warning.

  • verbose (bool, default=False) – If True, then include detailed logs.

Returns:

A dictionary mapping model names to SBML_dfs objects.

Return type:

dict[str, SBML_dfs]

napistu.consensus.prepare_consensus_model(sbml_dfs_list: list[SBML_dfs]) -> tuple[dict[str, SBML_dfs], PWIndex]

Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.

This function extracts the core source metadata from a set of SBML_dfs objects and uses it to create a pathway index object. The pathway_id of each object is then used to key the sbml_dfs_list objects, producing the expected input for construct_consensus_model.

Parameters:

sbml_dfs_list (list[SBML_dfs]) – List of sbml_dfs objects to be consolidated.

Returns:

  • sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary of sbml_dfs objects indexed by pathway_id.

  • pw_index (PWIndex) – Pathway index object.

Raises:

ValueError – If any of the following holds:

  • The sbml_dfs_list is empty.

  • The sbml_dfs_list contains sbml_dfs objects with more than one row.

  • The sbml_dfs_list contains sbml_dfs objects with missing columns.

  • The sbml_dfs_list contains sbml_dfs objects with duplicate pathway_ids.

  • The sbml_dfs_list contains sbml_dfs objects with invalid pathway_ids.
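The keying step, together with two of the validations listed under Raises (empty input and duplicate pathway_ids), can be sketched as follows. Real SBML_dfs objects keep pathway_id in their source metadata; here each model is stood in for by a plain dict with a `"pathway_id"` entry, and the name `key_models_by_pathway_id` is hypothetical.

```python
def key_models_by_pathway_id(models: list[dict]) -> dict[str, dict]:
    """Key to-be-consolidated models by pathway_id (hypothetical dict stand-ins)."""
    if not models:
        raise ValueError("sbml_dfs_list is empty")
    keyed: dict[str, dict] = {}
    for model in models:
        pathway_id = model["pathway_id"]
        if pathway_id in keyed:
            # A duplicate pathway_id would silently overwrite an entry, so fail loudly.
            raise ValueError(f"duplicate pathway_id: {pathway_id!r}")
        keyed[pathway_id] = model
    return keyed
```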