napistu.consensus
Creating a consensus model by merging shared entities across pathway models.
Public Functions
- construct_consensus_model(sbml_dfs_dict, pw_index, model_source=None, dogmatic=True, check_mergeability=True, no_rxn_pathway_ids=None) -> SBML_dfs:
Construct a Consensus Model by merging shared entities across pathway models.
- construct_meta_entities_fk(sbml_dfs_dict, pw_index, table="compartmentalized_species", fk_lookup_tables={}, extra_defining_attrs=[]) -> tuple[pd.DataFrame, pd.Series]:
Construct Meta Entities Defined by Foreign Keys
- construct_meta_entities_identifiers(sbml_dfs_dict, pw_index, table, fk_lookup_tables={}, defining_biological_qualifiers=BQB_DEFINING_ATTRS) -> tuple[pd.DataFrame, pd.Series]:
Construct meta-entities by merging entities across models that share identifiers.
- construct_meta_entities_members(sbml_dfs_dict, pw_index=None, table="reactions", defined_by="reaction_species", defined_lookup_tables={}, defining_attrs=[SC_ID, STOICHIOMETRY]) -> tuple[pd.DataFrame, pd.Series]:
Construct Meta Entities Defined by Membership
- construct_sbml_dfs_dict(pw_index, strict=True, verbose=False) -> dict[str, SBML_dfs]:
Construct a dictionary of SBML_dfs objects from a pathway index.
- prepare_consensus_model(sbml_dfs_list) -> tuple[dict[str, SBML_dfs], PWIndex]:
Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.
- napistu.consensus._add_consensus_sources(new_id_table: DataFrame, agg_table_harmonized: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) DataFrame
Add source information to the consensus table.
Parameters:
- new_id_table: pd.DataFrame
Consensus table without source information
- agg_table_harmonized: pd.DataFrame
Original table with cluster assignments
- lookup_table: pd.Series
Maps old IDs to new consensus IDs
- table_schema: dict
Schema for the table
- pw_index: PWIndex
An index of all tables being aggregated
Returns:
- pd.DataFrame
Consensus table with source information added
- napistu.consensus._add_entity_data(sbml_dfs: SBML_dfs, sbml_dfs_dict: dict[str, SBML_dfs], lookup_tables: dict) SBML_dfs
Add entity data from component models to the consensus model.
Parameters:
- sbml_dfs: SBML_dfs
The consensus model being built
- sbml_dfs_dict: dict[str, SBML_dfs]
A dictionary of SBML_dfs from different models
- lookup_tables: dict
Dictionary of lookup tables for translating between old and new entity IDs
Returns:
- SBML_dfs
The updated consensus model
- napistu.consensus._build_consensus_identifiers(sbml_df: DataFrame, table_schema: dict, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) tuple[Series, DataFrame]
Build consensus identifiers by clustering entities that share biological identifiers.
This function takes a set of entities spanning multiple models and finds all unique entities by grouping them according to the provided biological qualifiers. It returns a mapping from original entities to clusters and a DataFrame of consensus identifier objects for each cluster.
- Parameters:
sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).
table_schema (dict) – Schema for the table being processed.
defining_biological_qualifiers (list[str], optional) – List of biological qualifier types to use for grouping. Defaults to BQB_DEFINING_ATTRS.
- Returns:
indexed_cluster (pd.Series) – Series mapping the index from sbml_df onto a set of clusters which define unique entities.
cluster_consensus_identifiers_df (pd.DataFrame) – DataFrame mapping clusters to consensus identifiers (Identifiers objects).
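The clustering described above can be pictured as finding connected components among entities that share identifiers. The following is a minimal pure-Python sketch under that assumption. The function name `cluster_by_shared_identifiers`, the toy identifiers, and the dict-based data layout are hypothetical. napistu itself works on pandas tables, and its implementation may differ.

```python
def cluster_by_shared_identifiers(entity_ids):
    """Group entities that share at least one identifier (transitively).

    entity_ids maps (model, entity_id) -> set of identifier strings.
    Returns a dict mapping each (model, entity_id) to a cluster number.
    """
    parent = {entity: entity for entity in entity_ids}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Entities that carry the same identifier are joined into one set.
    first_seen = {}
    for entity, identifiers in entity_ids.items():
        for identifier in identifiers:
            if identifier in first_seen:
                parent[find(entity)] = find(first_seen[identifier])
            else:
                first_seen[identifier] = entity

    # Number the clusters in order of first appearance.
    cluster_numbers = {}
    indexed_cluster = {}
    for entity in entity_ids:
        root = find(entity)
        cluster_numbers.setdefault(root, len(cluster_numbers))
        indexed_cluster[entity] = cluster_numbers[root]
    return indexed_cluster


# Hypothetical toy input: two models describing the same molecule.
toy = {
    ("model1", "S1"): {"chebi:15377"},
    ("model2", "S9"): {"chebi:15377", "kegg:C00001"},
    ("model2", "S3"): {"uniprot:P04637"},
}
clusters = cluster_by_shared_identifiers(toy)
```

Here the first two entities land in the same cluster because they share one identifier, while the third stays on its own.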
- napistu.consensus._check_sbml_dfs(sbml_dfs: SBML_dfs, model_label: str, N_examples: int | str = 5) None
Check SBML_dfs for identifiers which are associated with different entities before a merge.
- napistu.consensus._check_sbml_dfs_dict(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, check_mergeability: bool = True) None
Check models in SBML_dfs for problems which can be reported up-front
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models
pw_index (indices.PWIndex) – an index of all tables being aggregated
check_mergeability (bool, default=True) – whether to check for issues which will prevent merging across models
- Returns:
This function returns None but logs error messages if incompatible ontology structures are detected.
- Return type:
None
- napistu.consensus._check_sbml_dfs_mergeability(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) None
Check SBML_dfs for obvious issues which will prevent merging across models.
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – a dict of sbml_dfs models
pw_index (indices.PWIndex) – an index of all tables being aggregated
- Returns:
This function returns None but logs error messages if incompatible ontology structures are detected.
- Return type:
None
- napistu.consensus._create_cluster_identifiers(meta_identifiers: DataFrame, indexed_cluster: Series, sbml_df: DataFrame, ind_clusters: DataFrame, table_schema: dict) DataFrame
Create identifier objects for each cluster.
- Parameters:
meta_identifiers (pd.DataFrame) – All identifiers (including those filtered out by BQB)
indexed_cluster (pd.Series) – Maps entity indices to cluster IDs
sbml_df (pd.DataFrame) – Original table of entities
ind_clusters (pd.DataFrame) – Cluster assignments from graph algorithm
table_schema (dict) – Schema for the table, used to determine the correct identifier column name
- Returns:
Table mapping clusters to their consensus identifiers, with the identifier column named according to the schema
- Return type:
pd.DataFrame
- napistu.consensus._create_consensus_entities(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, defining_biological_qualifiers: list[str], no_rxn_pathway_ids: list[str] | None = None) tuple[dict, dict]
Create consensus entities for all primary tables in the model.
This helper function creates consensus compartments, species, compartmentalized species, reactions, and reaction species by finding shared entities across source models.
Parameters:
- sbml_dfs_dict: dict[str, SBML_dfs]
A dictionary of SBML_dfs from different models
- pw_index: PWIndex
An index of all tables being aggregated
- defining_biological_qualifiers: list[str]
Biological qualifier terms that define distinct entities
- no_rxn_pathway_ids: Optional[list[str]] = None
The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.
Returns:
- tuple:
dict of consensus entities tables
dict of lookup tables
- napistu.consensus._create_consensus_entity_data(combined_entity_data: DataFrame, primary_key: str) DataFrame
Create consensus entity data by combining multiple rows with the same index value.
This function takes a DataFrame that might have multiple rows for the same index value and combines them so there is exactly 1 row per index value using the “first” method.
Parameters:
- combined_entity_data: pd.DataFrame
Input DataFrame with potentially multiple rows per index value
- primary_key: str
The column name to use as the primary key for grouping
Returns:
- pd.DataFrame
DataFrame with exactly one row per unique primary key value
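As an illustration of the "first" method, the sketch below collapses rows in plain Python. It assumes "first" behaves like pandas `GroupBy.first`, which takes the first non-missing entry of each column within a group. The function name `first_per_key` and the toy rows are hypothetical.

```python
def first_per_key(rows, primary_key):
    """Collapse rows so each key keeps its first non-missing value per column."""
    consensus = {}
    for row in rows:
        merged = consensus.setdefault(row[primary_key], {})
        for column, value in row.items():
            # Only fill a column if no earlier row supplied a non-missing value.
            if merged.get(column) is None:
                merged[column] = value
    return list(consensus.values())


# Hypothetical entity data with two rows for the same primary key.
rows = [
    {"s_id": "S1", "score": None, "label": "water"},
    {"s_id": "S1", "score": 0.9, "label": "H2O"},
    {"s_id": "S2", "score": 0.4, "label": "ATP"},
]
collapsed = first_per_key(rows, "s_id")
```

The result has one row per `s_id`: the missing score in the first `S1` row is filled from the second, while the first label is kept.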
- napistu.consensus._create_consensus_sources(agg_tbl: DataFrame, lookup_table: Series, table_schema: dict, pw_index: PWIndex) Series
Create Consensus Sources
Annotate the source of to-be-merged species with the models they came from, and combine with existing annotations.
Parameters:
- agg_tbl: pd.DataFrame
A table containing existing Source objects and a many-1 “new_id” of their post-aggregation consensus entity
- lookup_table: pd.Series
A series where the index are old identifiers and the values are post-aggregation new identifiers
- table_schema: dict
Summary of the schema for the operant entity type
- pw_index: PWIndex
An index of all tables being aggregated
Returns:
- new_sources: pd.Series
Mapping where the index is new identifiers and values are aggregated Source objects
- napistu.consensus._create_consensus_table(agg_primary_table: DataFrame, lookup_table: Series, updated_identifiers: Series, table_schema: dict) DataFrame
Create a consensus table with merged entities.
Parameters:
- agg_primary_table: pd.DataFrame
Table of entities
- lookup_table: pd.Series
Lookup table mapping old IDs to new IDs
- updated_identifiers: pd.Series
Series mapping new IDs to merged identifier objects
- table_schema: dict
Schema for the table
Returns:
- pd.DataFrame
Consensus table with one row per unique entity
- napistu.consensus._create_default_consensus_source(sbml_dfs_dict: dict[str, SBML_dfs]) Source
A default consensus source is created when no model source object is provided.
- napistu.consensus._create_entity_consensus(membership_lookup: DataFrame, table_schema: dict) tuple[DataFrame, Series]
Create consensus entities based on membership.
Parameters:
- membership_lookup: pd.DataFrame
Table mapping entities to their member strings
- table_schema: dict
Schema for the table
Returns:
- tuple:
Consensus entities DataFrame
Lookup table mapping old IDs to new IDs
- napistu.consensus._create_entity_lookup_table(agg_table_harmonized: DataFrame, table_schema: dict) Series
Create a lookup table mapping original entity IDs to new consensus IDs.
Parameters:
- agg_table_harmonized: pd.DataFrame
Table with cluster assignments for each entity
- table_schema: dict
Schema for the table
Returns:
- pd.Series
Lookup table mapping old entity IDs to new consensus IDs
- napistu.consensus._create_member_string(x: list[str]) str
- napistu.consensus._create_membership_lookup(agg_tbl: DataFrame, table_schema: dict) DataFrame
Create a lookup table for entity membership.
Parameters:
- agg_tbl: pd.DataFrame
Table with member information
- table_schema: dict
Schema for the table
Returns:
- pd.DataFrame
Lookup table mapping entity IDs to member strings
- napistu.consensus._create_vertex_category(df: DataFrame, category: str) DataFrame
Create vertex dataframe for a specific category from a source column.
- napistu.consensus._filter_identifiers_by_qualifier(meta_identifiers: DataFrame, defining_biological_qualifiers: list[str]) DataFrame
Filter identifiers to only include those with specific biological qualifiers.
Parameters:
- meta_identifiers: pd.DataFrame
Table of identifiers
- defining_biological_qualifiers: list[str]
List of biological qualifier types to keep
Returns:
- pd.DataFrame
Filtered identifiers
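A minimal sketch of this kind of filter follows, using a list of dicts in place of a DataFrame. The default qualifiers are the ones shown in the signatures on this page. The `bqb` field name, the function name, and the toy records are assumptions made for illustration.

```python
# Defaults as shown in the signatures on this page.
BQB_DEFINING = ["BQB_IS", "BQB_IS_HOMOLOG_TO"]


def filter_by_qualifier(identifiers, defining_biological_qualifiers=BQB_DEFINING):
    """Keep only identifier records whose qualifier is in the defining set."""
    keep = set(defining_biological_qualifiers)
    return [record for record in identifiers if record["bqb"] in keep]


# Hypothetical identifier records; "bqb" is an assumed field name.
identifiers = [
    {"entity": "S1", "identifier": "chebi:15377", "bqb": "BQB_IS"},
    {"entity": "S1", "identifier": "uniprot:P04637", "bqb": "BQB_HAS_PART"},
    {"entity": "S2", "identifier": "ensembl:ENSG0001", "bqb": "BQB_IS_HOMOLOG_TO"},
]
valid = filter_by_qualifier(identifiers)
```

Identifiers with other qualifiers are dropped before clustering, so they cannot cause two entities to merge.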
- napistu.consensus._get_no_rxn_pathway_ids(pw_index: PWIndex, no_rxn_pathway_ids: list[str] | None = None) list[str]
Get the pathway ids for models which should not have reactions.
- Parameters:
pw_index (PWIndex) – The pathway index.
no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults.
- Returns:
no_rxn_pathway_ids – The pathway ids for models which should not have reactions.
- Return type:
list
- napistu.consensus._handle_entries_without_identifiers(sbml_df: DataFrame, valid_identifiers: DataFrame) DataFrame
Handle entities that don’t have identifiers by adding dummy identifiers.
Parameters:
- sbml_df: pd.DataFrame
Original table of entities
- valid_identifiers: pd.DataFrame
Table of identifiers that passed filtering
Returns:
- pd.DataFrame
Valid identifiers with dummy entries added
- napistu.consensus._merge_entity_data(sbml_dfs_dict: dict[str, SBML_dfs], lookup_table: Series, table: str) dict
Merge Entity Data
Report cases where a single “new” id is associated with multiple different values of entity_var
- Parameters:
sbml_dfs_dict (dict) – dictionary where keys are to-be-merged model names and values are SBML_dfs
lookup_table (pd.Series) – a series where the index is an old model and primary key and the value is the new consensus id
table (str) – table whose data is being consolidated (currently species or reactions)
- Returns:
entity_data – dictionary containing pd.DataFrames which aggregate all of the individual entity_data tables in “sbml_dfs_dict”
- Return type:
dict
- napistu.consensus._merge_entity_data_create_consensus(entity_data_dict: dict, lookup_table: Series, entity_schema: dict, an_entity_data_type: str, table: str) DataFrame
Merge Entity Data - Create Consensus
Combine the “an_entity_data_type” tables from all models into a single consensus table, reporting cases where a single “new” id is associated with multiple different values of entity_var
Parameters:
- entity_data_dict: dict
Dictionary containing all models’ “an_entity_data_type” dictionaries
- lookup_table: pd.Series
A series where the index is an old model and primary key and the value is the new consensus id
- entity_schema: dict
Schema for “table”
- an_entity_data_type: str
data_type from species/reactions_data in entity_data_dict
- table: str
table whose data is being consolidated (currently species or reactions)
Returns:
- pd.DataFrame
Table where index is primary key of “table” and values are all distinct annotations from “an_entity_data_type”.
- napistu.consensus._merge_entity_data_report_mismatches(combined_entity_data: DataFrame, entity_schema: dict, an_entity_data_type: str, table: str) None
Merge Entity Data - Report Mismatches
Report cases where a single “new” id is associated with multiple different values of entity_var
- Parameters:
combined_entity_data (pd.DataFrame) – indexed by table primary key containing all data from “an_entity_data_type”
entity_schema (dict) – schema for “table”
an_entity_data_type (str) – data_type from species/reactions_data in combined_entity_data
table (str) – table whose data is being consolidated (currently species or reactions)
- Return type:
None
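The mismatch report can be thought of as grouping values by consensus id and flagging ids with more than one distinct value. The sketch below shows that idea in plain Python. The function name `find_mismatches`, the pair-based input layout, and the toy data are hypothetical, and the actual function logs its findings rather than returning them.

```python
from collections import defaultdict


def find_mismatches(combined_entity_data, entity_var):
    """Return consensus ids that carry more than one distinct value of entity_var.

    combined_entity_data is a list of (new_id, row_dict) pairs.
    """
    values = defaultdict(set)
    for new_id, row in combined_entity_data:
        values[new_id].add(row[entity_var])
    # Keep only ids whose merged entities disagree.
    return {
        new_id: sorted(distinct)
        for new_id, distinct in values.items()
        if len(distinct) > 1
    }


# Hypothetical merged data: S00001 received conflicting scores.
combined = [
    ("S00001", {"score": 1}),
    ("S00001", {"score": 2}),
    ("S00002", {"score": 5}),
    ("S00002", {"score": 5}),
]
mismatches = find_mismatches(combined, "score")
```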
- napistu.consensus._merge_entity_identifiers(agg_primary_table: DataFrame, lookup_table: Series, table_schema: dict) Series
Merge identifiers from multiple entities.
Parameters:
- agg_primary_table: pd.DataFrame
Table of entities
- lookup_table: pd.Series
Lookup table mapping old IDs to new IDs
- table_schema: dict
Schema for the table
Returns:
- pd.Series
Series mapping new IDs to merged identifier objects
- napistu.consensus._pre_consensus_compartment_check(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex) None
Check for compartment compatibility across models before consensus building.
This function identifies models that won’t mix well in a consensus because they contain non-overlapping sets of compartments. It constructs a bipartite graph connecting models to their compartments and identifies disconnected components, which indicate incompatible compartment structures.
- Parameters:
sbml_dfs_dict (dict) – Dictionary containing SBML dataframes for each model, keyed by model name.
pw_index (pandas.DataFrame) – Pathway index dataframe containing model metadata and pathway information.
- Returns:
This function returns None but logs error messages if incompatible compartment structures are detected.
- Return type:
None
Notes
The function builds a graph where:
- Models are connected to their compartments via shared identifiers
- Compartments are connected to their model-specific labels
- Disconnected components indicate models with non-overlapping compartment sets
If multiple disconnected components are found, an error is logged listing the incompatible compartment groups that would result in an unmixed consensus.
Examples
>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> pw_index = pd.DataFrame({"model": ["model1", "model2"], ...})
>>> _pre_consensus_compartment_check(sbml_dfs_dict, pw_index)
# Logs error if models have incompatible compartment structures
- napistu.consensus._pre_consensus_ontology_check(sbml_dfs_dict: dict[str, SBML_dfs], entity_type: str) None
Check for ontology compatibility across models before consensus building.
This function determines whether any models possess disjoint sets of ontologies for a given entity type (compartments or species). It constructs a bipartite graph connecting models to their ontologies and identifies disconnected components, which indicate models with non-overlapping ontology structures.
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary containing SBML dataframes for each model, keyed by model name.
entity_type (str) – The type of entity to check ontologies for. Must be one of ‘compartments’, ‘species’, or ‘reactions’.
- Returns:
This function returns None but logs error messages if incompatible ontology structures are detected.
- Return type:
None
Notes
The function builds a graph where:
- Models are connected to ontologies they contain for the specified entity type
- Disconnected components indicate models with non-overlapping ontology sets
If multiple disconnected components are found, an error is logged listing the incompatible ontology groups that would result in an unmixed consensus.
Examples
>>> sbml_dfs_dict = {"model1": sbml_dfs1, "model2": sbml_dfs2}
>>> _pre_consensus_ontology_check(sbml_dfs_dict, "compartments")
# Logs error if models have incompatible compartment ontologies
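The bipartite-graph check described in the notes can be sketched with a plain depth-first traversal. This is an illustration only: the function name `ontology_components`, the input layout (model name mapped to a set of ontology names), and the toy ontologies are assumptions, not the library's internals.

```python
def ontology_components(model_ontologies):
    """Return groups of models linked by at least one shared ontology."""
    # Build an undirected bipartite graph: model nodes <-> ontology nodes.
    adjacency = {}
    for model, ontologies in model_ontologies.items():
        model_node = ("model", model)
        adjacency.setdefault(model_node, set())
        for ontology in ontologies:
            ontology_node = ("ontology", ontology)
            adjacency[model_node].add(ontology_node)
            adjacency.setdefault(ontology_node, set()).add(model_node)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(name for kind, name in members if kind == "model"))
    return components


# Hypothetical input: model3 shares no ontology with the other two models.
groups = ontology_components({
    "model1": {"chebi", "uniprot"},
    "model2": {"uniprot", "ensembl_gene"},
    "model3": {"go"},
})
```

More than one group signals that some models share no ontology with the rest, which is the situation the check reports.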
- napistu.consensus._prepare_consensus_table(agg_table_harmonized: DataFrame, table_schema: dict, cluster_consensus_identifiers: DataFrame) DataFrame
Prepare a consensus table with one row per unique entity.
Parameters:
- agg_table_harmonized: pd.DataFrame
Table with nameness scores and cluster assignments
- table_schema: dict
Schema for the table
- cluster_consensus_identifiers: pd.DataFrame
Consensus identifiers for each cluster
Returns:
- pd.DataFrame
New consensus table with merged entities
- napistu.consensus._prepare_identifier_edgelist(valid_identifiers: DataFrame, sbml_df: DataFrame) DataFrame
Prepare an edgelist for clustering identifiers.
Parameters:
- valid_identifiers: pd.DataFrame
Table of identifiers
- sbml_df: pd.DataFrame
Original table of entities
Returns:
- pd.DataFrame
Edgelist connecting entities to their identifiers
- napistu.consensus._prepare_member_table(sbml_dfs_dict: dict[str, SBML_dfs], defined_by: str, defined_lookup_tables: dict, table_schema: dict, defined_by_schema: dict, defining_attrs: list[str], table: str = 'reactions') tuple[DataFrame, str]
Prepare a table of members and validate their structure.
Parameters:
- sbml_dfs_dict: dict[str, SBML_dfs]
Dictionary of SBML_dfs from different models
- defined_by: str
Name of the table whose entries define membership
- defined_lookup_tables: dict
Lookup tables for updating IDs
- table_schema: dict
Schema for the main table
- defined_by_schema: dict
Schema for the defining table
- defining_attrs: list[str]
Attributes that define a unique member
- table: str
Name of the main table (default: REACTIONS)
Returns:
- tuple:
Updated aggregated table with member strings
Name of the foreign key
- napistu.consensus._reduce_to_consensus_ids(sbml_df: DataFrame, table_schema: dict, pw_index: PWIndex | None = None, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) tuple[DataFrame, Series]
Reduce a table of entities to unique entries based on consensus identifiers.
This function clusters entities that share identifiers (as defined by the provided biological qualifiers) and produces a new table of unique entities, along with a lookup table mapping original entities to consensus IDs.
- Parameters:
sbml_df (pd.DataFrame) – Table of entities from multiple models, with model in the index (as produced by _unnest_SBML_df).
table_schema (dict) – Schema for the table being reduced.
pw_index (PWIndex) – An index of all tables being aggregated. Optional if no source information is required.
defining_biological_qualifiers (list[str]) – List of biological qualifier types which define distinct entities. Defaults to BQB_DEFINING_ATTRS.
- Returns:
new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.
lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.
- napistu.consensus._remove_no_rxn_pathways(no_rxn_pathway_ids: list[str], sbml_dfs_dict: dict[str, SBML_dfs], compspec_lookup_table: DataFrame) None
Remove pathways which don’t contribute reactions from the pw_index.
- Parameters:
no_rxn_pathway_ids (list) – The pathway ids for models which should not have reactions. (i.e., models which are just species metadata like “Dogma”)
sbml_dfs_dict (dict) – The dictionary of SBML_dfs.
compspec_lookup_table (pd.DataFrame) – The lookup table for compartmentalized species.
- Returns:
Modifies objects in place.
- Return type:
None
- napistu.consensus._report_consensus_merges(lookup_table: Series, table_schema: dict, agg_tbl: DataFrame | None = None, sbml_dfs_dict: dict[str, SBML_dfs] | None = None, n_example_merges: int = 3) None
Report Consensus Merges
Print a summary of merges that occurred
Parameters:
- lookup_table: pd.Series
An index of “model” and the entity’s primary key with values of new_id
- table_schema: dict
Schema of the table being merged
- agg_tbl: pd.DataFrame or None
Contains the original model, primary keys and a label. Required if the primary key is not r_id (i.e., reactions)
- sbml_dfs_dict: dict[str, SBML_dfs] or None
The dict of full models. Used to create reaction formulas if the primary key is r_id
- n_example_merges: int
Number of example merges to report details on
Returns:
None
- napistu.consensus._resolve_reversibility(sbml_dfs_dict: dict[str, SBML_dfs], rxn_consensus_species: DataFrame, rxn_lookup_table: Series) DataFrame
For a set of merged reactions determine what their consensus reaction reversibilities are
- napistu.consensus._unnest_SBML_df(sbml_dfs_dict: dict[str, SBML_dfs], table: str) DataFrame
Unnest and concatenate a specific table from multiple SBML_dfs models.
This function merges corresponding tables from a set of models into a single DataFrame, adding the model name as an index level.
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
table (str) – The name of the table to aggregate (e.g., ‘species’, ‘reactions’, ‘compartments’).
- Returns:
A concatenated table with a MultiIndex of model and entity ID.
- Return type:
pd.DataFrame
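The effect of adding the model name as an index level can be shown with nested dicts, where a (model, entity ID) tuple plays the role of the MultiIndex. This is a simplified stand-in with hypothetical names and data. In pandas, a comparable result is typically obtained by passing a dict of DataFrames to `pd.concat`.

```python
def unnest_tables(tables_by_model):
    """Combine per-model tables into one mapping keyed by (model, entity_id)."""
    combined = {}
    for model, table in tables_by_model.items():
        for entity_id, row in table.items():
            # The model name keeps identical ids from different models apart.
            combined[(model, entity_id)] = row
    return combined


# Hypothetical species tables: both models reuse the id "S1".
tables_by_model = {
    "model1": {"S1": {"name": "water"}, "S2": {"name": "ATP"}},
    "model2": {"S1": {"name": "glucose"}},
}
combined = unnest_tables(tables_by_model)
```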
- napistu.consensus._update_foreign_keys(agg_tbl: DataFrame, table_schema: dict, fk_lookup_tables: dict) DataFrame
Update one or more foreign keys based on old-to-new foreign key lookup table(s).
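A sketch of the old-to-new translation follows, with each lookup table represented as a dict keyed by (model, old id). The function name, field names, and toy ids are hypothetical and only illustrate the idea.

```python
def update_foreign_keys(rows, fk_lookup_tables):
    """Replace old foreign key values with consensus ids.

    rows: list of dicts with a "model" field plus foreign key columns.
    fk_lookup_tables: {fk_column: {(model, old_id): new_id}}.
    """
    updated = []
    for row in rows:
        new_row = dict(row)
        for fk_column, lookup in fk_lookup_tables.items():
            # A missing entry raises KeyError, exposing an incomplete lookup table.
            new_row[fk_column] = lookup[(row["model"], row[fk_column])]
        updated.append(new_row)
    return updated


# Hypothetical compartmentalized species from two models.
rows = [
    {"model": "model1", "sc_id": "SC1", "s_id": "S1", "c_id": "C1"},
    {"model": "model2", "sc_id": "SC1", "s_id": "S4", "c_id": "C2"},
]
fk_lookup_tables = {
    "s_id": {("model1", "S1"): "S00000", ("model2", "S4"): "S00000"},
    "c_id": {("model1", "C1"): "C00000", ("model2", "C2"): "C00000"},
}
updated = update_foreign_keys(rows, fk_lookup_tables)
```

After the update both rows point at the same consensus species and compartment, so a later step can recognize them as the same entity.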
- napistu.consensus._validate_consensus_table(new_id_table: DataFrame, sbml_df: DataFrame) None
Validate that the new consensus table has the same structure as the original.
Parameters:
- new_id_table: pd.DataFrame
Newly created consensus table
- sbml_df: pd.DataFrame
Original table from which consensus was built
Raises:
- ValueError
If index names or columns don’t match
- napistu.consensus._validate_merge_entity_data_create_consensus(entity_data_dict, an_entity_data_type, models_w_entity_data_type)
Validate creating a consensus of entity data tables in cases where the same table is present in multiple models
This function checks whether tables with the same entity data key can be reasonably merged (same index and column names) or whether they seem like apples-to-oranges.
Parameters:
- entity_data_dict: dict
Dictionary containing all model’s “an_entity_data_type” dictionaries
- an_entity_data_type: str
The type of entity data to merge
- models_w_entity_data_type: list
List of models with the same entity data type
Returns:
None
Raises:
- ValueError:
If the tables have different index or column names
- napistu.consensus._validate_meta_identifiers(meta_identifiers: DataFrame) None
Check Identifiers to make sure they aren’t empty and flag cases where IDs are missing BQB terms.
- napistu.consensus.construct_consensus_model(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, model_source: Source | None = None, dogmatic: bool = True, check_mergeability: bool = True, no_rxn_pathway_ids: list[str] | None = None) SBML_dfs
Construct a Consensus Model by merging shared entities across pathway models.
This function takes a dictionary of pathway models and merges shared entities (compartments, species, reactions, etc.) into a single consensus model, using a set of rules for entity identity and merging.
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
pw_index (PWIndex) – An index of all tables being aggregated, used for cross-referencing entities.
model_source (Source) – A source object for the consensus model.
dogmatic (bool, default=True) – If True, preserve genes, transcripts, and proteins as separate species. If False, merge them when possible.
check_mergeability (bool, default=True) – whether to check for issues which will prevent merging across models
no_rxn_pathway_ids (list, optional) – The pathway ids for models which should not have reactions. If None, use the defaults. This can be used to include pathways which are just metadata like “Dogma”.
- Returns:
A consensus SBML_dfs object containing the merged model.
- Return type:
SBML_dfs
- napistu.consensus.construct_meta_entities_fk(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: DataFrame, table: str = 'compartmentalized_species', fk_lookup_tables: dict = {}, extra_defining_attrs: list = []) tuple[DataFrame, Series]
Construct Meta Entities Defined by Foreign Keys
Aggregating across one entity type for a set of pathway models, merge entities which are defined by their foreign keys.
Parameters:
- sbml_dfs_dict: dict[str, SBML_dfs]
A dictionary of SBML_dfs
- pw_index: PWIndex
An index of all tables being aggregated
- table: str
A table/entity set from the sbml_dfs to work with
- fk_lookup_tables: dict
Dictionary containing lookup tables for all foreign keys used by the table
- extra_defining_attrs: list
List of terms which uniquely define a reaction species in addition to the foreign keys. A common case is when a species is a modifier and a substrate in a reaction.
Returns:
- new_id_table: pd.DataFrame
Matching the schema of one of the tables within sbml_dfs_dict
- lookup_table: pd.Series
Matches the index of the aggregated entities to new_ids
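Merging by foreign keys can be sketched as assigning one consensus id to each distinct combination of (already updated) foreign key values. The function name, the `SC00000`-style id format, and the toy rows below are assumptions for illustration. The sketch returns plain Python structures rather than a DataFrame and Series.

```python
def merge_by_foreign_keys(rows, fk_columns, primary_key):
    """Assign one consensus id per distinct combination of foreign keys."""
    new_ids = {}       # foreign key tuple -> consensus id
    lookup_table = {}  # (model, old primary key) -> consensus id
    for row in rows:
        defining = tuple(row[column] for column in fk_columns)
        # The id format is hypothetical; only uniqueness matters here.
        new_id = new_ids.setdefault(defining, f"SC{len(new_ids):05d}")
        lookup_table[(row["model"], row[primary_key])] = new_id
    new_id_table = [
        dict(zip(fk_columns, defining), **{primary_key: new_id})
        for defining, new_id in new_ids.items()
    ]
    return new_id_table, lookup_table


# Hypothetical compartmentalized species whose foreign keys are already consensus ids.
rows = [
    {"model": "model1", "sc_id": "a", "s_id": "S0", "c_id": "C0"},
    {"model": "model2", "sc_id": "b", "s_id": "S0", "c_id": "C0"},
    {"model": "model2", "sc_id": "c", "s_id": "S1", "c_id": "C0"},
]
new_id_table, lookup_table = merge_by_foreign_keys(rows, ["s_id", "c_id"], "sc_id")
```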
- napistu.consensus.construct_meta_entities_identifiers(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex, table: str, fk_lookup_tables: dict = {}, defining_biological_qualifiers: list[str] = ['BQB_IS', 'BQB_IS_HOMOLOG_TO']) tuple[DataFrame, Series]
Construct meta-entities by merging entities across models that share identifiers.
Aggregates a single entity type from a set of pathway models and merges entities that share identifiers (as defined by the provided biological qualifiers).
- Parameters:
sbml_dfs_dict (dict[str, SBML_dfs]) – A dictionary of SBML_dfs objects from different models, keyed by model name.
pw_index (PWIndex) – An index of all tables being aggregated.
table (str) – The name of the table/entity set to aggregate (e.g., ‘species’, ‘compartments’).
fk_lookup_tables (dict, optional) – Dictionary containing lookup tables for all foreign keys used by the table (default: empty dict).
defining_biological_qualifiers (list[str], optional) – List of BQB codes which define distinct entities. Defaults to BQB_DEFINING_ATTRS.
- Returns:
new_id_table (pd.DataFrame) – Table matching the schema of one of the input models, with merged entities.
lookup_table (pd.Series) – Series mapping the index of the aggregated entities to new consensus IDs.
- napistu.consensus.construct_meta_entities_members(sbml_dfs_dict: dict[str, SBML_dfs], pw_index: PWIndex | None, table: str = 'reactions', defined_by: str = 'reaction_species', defined_lookup_tables: dict = {}, defining_attrs: list[str] = ['sc_id', 'stoichiometry']) tuple[DataFrame, Series]
Construct Meta Entities Defined by Membership
Aggregating across one entity type for a set of pathway models, merge entities with the same members.
Parameters:
- sbml_dfs_dict: dict[str, SBML_dfs]
A dictionary of SBML_dfs
- pw_index: PWIndex
An index of all tables being aggregated
- table: str
A table/entity set from the sbml_dfs to work with
- defined_by: str
A table/entity set whose entries are members of “table”
- defined_lookup_tables: dict
Lookup tables for updating the ids of “defined_by”
- defining_attrs: list[str]
A list of attributes which jointly define a unique entity
Returns:
- new_id_table: pd.DataFrame
Matching the schema of one of the tables within sbml_dfs_dict
- lookup_table: pd.Series
Matches the index of the aggregated entities to new_ids
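Membership-based merging can be illustrated by building an order-independent signature from each reaction's members and giving reactions with equal signatures the same id. Everything in this sketch is hypothetical: the signature format, the function names, and the `R00000`-style ids. It is not how `_create_member_string` is known to work.

```python
def member_signature(labels):
    """Join member labels into one order-independent string."""
    return " + ".join(sorted(labels))


def merge_by_membership(reaction_species, defining_attrs=("sc_id", "stoichiometry")):
    """Give reactions with identical members the same consensus id.

    Returns {(model, r_id): consensus_id}.
    """
    members = {}
    for row in reaction_species:
        # Each member is described by the attributes that jointly define it.
        label = " @ ".join(str(row[attr]) for attr in defining_attrs)
        members.setdefault((row["model"], row["r_id"]), []).append(label)

    consensus_ids = {}
    lookup_table = {}
    for reaction, labels in members.items():
        signature = member_signature(labels)
        new_id = consensus_ids.setdefault(signature, f"R{len(consensus_ids):05d}")
        lookup_table[reaction] = new_id
    return lookup_table


# Hypothetical reaction species: R1 and R7 list the same members in a different order.
reaction_species = [
    {"model": "model1", "r_id": "R1", "sc_id": "SC00001", "stoichiometry": -1},
    {"model": "model1", "r_id": "R1", "sc_id": "SC00002", "stoichiometry": 1},
    {"model": "model2", "r_id": "R7", "sc_id": "SC00002", "stoichiometry": 1},
    {"model": "model2", "r_id": "R7", "sc_id": "SC00001", "stoichiometry": -1},
    {"model": "model2", "r_id": "R8", "sc_id": "SC00001", "stoichiometry": -1},
    {"model": "model2", "r_id": "R8", "sc_id": "SC00003", "stoichiometry": 1},
]
rxn_lookup = merge_by_membership(reaction_species)
```

Sorting the labels before joining makes the signature independent of row order, so the two equivalent reactions merge while the third stays separate.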
- napistu.consensus.construct_sbml_dfs_dict(pw_index: DataFrame, strict: bool = True, verbose: bool = False) dict[str, SBML_dfs]
Construct a dictionary of SBML_dfs objects from a pathway index.
This function converts all models in the pathway index into SBML_dfs objects and adds them to a dictionary. Optionally, it can skip erroneous files with a warning instead of raising an error.
- Parameters:
pw_index (pd.DataFrame) – An index of all tables being aggregated, containing model metadata and file paths.
strict (bool, default=True) – If True, raise an error on any file that cannot be loaded. If False, skip erroneous files with a warning.
verbose (bool, default=False) – If True, then include detailed logs.
- Returns:
A dictionary mapping model names to SBML_dfs objects.
- Return type:
dict[str, SBML_dfs]
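The difference between strict and non-strict loading can be sketched as follows. The `load_models` and `fake_loader` functions and the file names are hypothetical stand-ins and do not reflect how napistu reads model files.

```python
import logging

logger = logging.getLogger(__name__)


def load_models(paths_by_name, loader, strict=True):
    """Load each model; on failure either re-raise (strict) or warn and skip."""
    models = {}
    for name, path in paths_by_name.items():
        try:
            models[name] = loader(path)
        except Exception as err:
            if strict:
                raise
            logger.warning("Skipping %s (%s): %s", name, path, err)
    return models


def fake_loader(path):
    """Hypothetical loader that fails on files ending in '.bad'."""
    if path.endswith(".bad"):
        raise ValueError("cannot parse")
    return {"path": path}


paths = {"good": "model_a.sbml", "broken": "model_b.bad"}
loaded = load_models(paths, fake_loader, strict=False)
```

With `strict=False` the broken file is skipped with a warning and only the loadable model is returned. With `strict=True` the same input raises the loader's error.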
- napistu.consensus.prepare_consensus_model(sbml_dfs_list: list[SBML_dfs]) tuple[dict[str, SBML_dfs], PWIndex]
Prepare for creating a consensus model using a list of to-be-consolidated sbml_dfs objects.
This function will extract the core source metadata from a set of SBML_dfs objects and use it to create a pathway index object. The pathway_id from these objects will then be used to key the sbml_dfs_list objects to create the expected input for construct_consensus_model.
- Parameters:
sbml_dfs_list (list[SBML_dfs]) – List of sbml_dfs objects to be consolidated.
- Returns:
sbml_dfs_dict (dict[str, SBML_dfs]) – Dictionary of sbml_dfs objects indexed by pathway_id.
pw_index (PWIndex) – Pathway index object.
- Raises:
ValueError – If the sbml_dfs_list is empty, or if it contains sbml_dfs objects with more than one row, missing columns, duplicate pathway_ids, or invalid pathway_ids.
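Putting the two public entry points together, a typical call sequence might look like the sketch below. It uses only the signatures documented on this page. The wrapper name `build_consensus` is hypothetical, and the import is deferred so the function can be defined without napistu installed (calling it does require napistu).

```python
def build_consensus(sbml_dfs_list, dogmatic=True):
    """Build a consensus model from already-loaded SBML_dfs objects."""
    # Imported lazily so this sketch can be defined without napistu installed.
    from napistu import consensus

    # Key the models by pathway_id and derive a matching pathway index.
    sbml_dfs_dict, pw_index = consensus.prepare_consensus_model(sbml_dfs_list)

    # Merge compartments, species, reactions, etc. into a single model.
    return consensus.construct_consensus_model(
        sbml_dfs_dict,
        pw_index,
        dogmatic=dogmatic,
        check_mergeability=True,
    )
```

If the models are described by a pathway index rather than held in memory, construct_sbml_dfs_dict(pw_index) produces the dictionary that construct_consensus_model expects.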