napistu.context.filtering

Functions

filter_reactions_with_disconnected_cspecies(...)

Remove reactions from the SBML_dfs object whose defining compartmentalized species (cspecies) are disconnected according to a co-occurrence matrix derived from a species data table.

filter_species_by_attribute(sbml_dfs, ...[, ...])

Filter species in the SBML_dfs based on an attribute value.

find_species_with_attribute(species_data, ...)

Find species that match the given attribute filter criteria.

napistu.context.filtering._binarize_species_data(species_data: DataFrame) → DataFrame

Convert all boolean or binary columns in a species data table to a DataFrame of binary (0/1) values.

This function selects columns of dtype ‘bool’ or integer columns containing only 0 and 1, and converts them to a DataFrame of binary values (0/1). Columns that are not boolean or binary are ignored. If no such columns are found, a ValueError is raised.

Parameters:

species_data (pd.DataFrame) – The species data table to binarize.

Returns:

DataFrame containing only the binarized columns (0/1 values) from the input.

Return type:

pd.DataFrame

Raises:

ValueError – If no binary or boolean columns are found in the input DataFrame.

Warns:

UserWarning – If some columns in the input were not binarized and left out of the output.
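The column selection described above can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not the library's implementation: `binarize_sketch` and the toy table are hypothetical names introduced here, and the real function may detect binary columns differently.

```python
import warnings

import pandas as pd


def binarize_sketch(species_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: keep bool and 0/1 integer columns as 0/1 integers."""
    binary_cols = []
    for col in species_data.columns:
        series = species_data[col]
        if pd.api.types.is_bool_dtype(series):
            binary_cols.append(col)
        elif pd.api.types.is_integer_dtype(series) and series.isin([0, 1]).all():
            binary_cols.append(col)

    if not binary_cols:
        raise ValueError("No binary or boolean columns found.")

    # Report the columns that were not binarized and are left out of the output.
    dropped = [c for c in species_data.columns if c not in binary_cols]
    if dropped:
        warnings.warn(f"Columns left out of the output: {dropped}", UserWarning)

    return species_data[binary_cols].astype(int)


# Toy species data table (illustrative values only).
toy = pd.DataFrame(
    {
        "liver": [True, False, True],
        "kidney": [1, 0, 0],
        "abundance": [0.4, 2.1, 3.3],
    },
    index=["S1", "S2", "S3"],
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    binarized = binarize_sketch(toy)

print(binarized)
```

In this toy table the float column `abundance` is skipped and reported through a warning, while `liver` and `kidney` are returned as 0/1 integers.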

napistu.context.filtering._create_cooccurence_edgelist(sbml_dfs: SBML_dfs, species_data_table: str)

Create a co-occurrence edgelist for species based on a binary species data table.

This function computes a co-occurrence matrix for all pairs of species in the given data table, where each entry represents the number of conditions in which both species are present (i.e., have value 1). The result is returned as an edgelist DataFrame with columns ‘s_id_1’, ‘s_id_2’, and ‘cooccurence’.

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML_dfs object containing the species data table.

  • species_data_table (str) – The name of the species data table to use for co-occurrence calculation. The table must contain only binary or boolean columns.

Returns:

Edgelist DataFrame with columns [‘s_id_1’, ‘s_id_2’, ‘cooccurence’], where each row gives the number of conditions in which the two species co-occur.

Return type:

pd.DataFrame

Raises:

ValueError – If no binary or boolean columns are found in the species data table.
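One way to picture the co-occurrence calculation is as a matrix product of the binary table with its transpose, reshaped into long format. The sketch below assumes a toy table indexed by species ID with one column per condition. It keeps every ordered pair, including self-pairs, which may not match what the library returns.

```python
import pandas as pd

# Toy binary species data: rows are species, columns are conditions.
binary = pd.DataFrame(
    {"cond_a": [1, 1, 0], "cond_b": [1, 0, 1], "cond_c": [1, 1, 1]},
    index=pd.Index(["S1", "S2", "S3"], name="s_id"),
)

# Species-by-species matrix: entry (i, j) counts the conditions in which
# both species i and species j have value 1.
cooccurrence = binary.dot(binary.T)

# Reshape the square matrix into an edgelist with the documented column names.
edgelist = (
    cooccurrence.rename_axis(index="s_id_1", columns="s_id_2")
    .reset_index()
    .melt(id_vars="s_id_1", var_name="s_id_2", value_name="cooccurence")
)

print(edgelist)
```

Here S2 and S3 are both present only in `cond_c`, so their row in the edgelist has a count of 1.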

napistu.context.filtering._find_reactions_with_disconnected_cspecies(coccurrence_edgelist: DataFrame, sbml_dfs: SBML_dfs | None, cooccurence_threshold: int = 0) → set

Find reactions with disconnected cspecies.

This function uses the co-occurrence edgelist to find reactions whose cspecies are disconnected. Only DEFINING cspecies are considered, because they act as AND rules for reaction operability: every defining cspecies must be present for the reaction to operate. The function returns the set of reaction IDs with disconnected cspecies.

Parameters:
  • coccurrence_edgelist (pd.DataFrame) – The co-occurrence edgelist.

  • sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML_dfs object.

  • cooccurence_threshold (int) – The co-occurrence threshold. Pairs whose co-occurrence count is equal to or below this threshold are considered disconnected.

Returns:

The set of reaction ids with disconnected cspecies.

Return type:

set
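The thresholding rule can be sketched without an SBML_dfs object. In the example below, `defining_species` is a hypothetical mapping from reaction ID to the species IDs of its defining participants; the real function derives this from the model. Treating a pair that is missing from the edgelist as a count of 0 is also an assumption of this sketch.

```python
from itertools import combinations

import pandas as pd

# Toy co-occurrence edgelist (illustrative values only).
edgelist = pd.DataFrame(
    {
        "s_id_1": ["S1", "S1", "S2"],
        "s_id_2": ["S2", "S3", "S3"],
        "cooccurence": [2, 0, 1],
    }
)

# Hypothetical mapping: reaction ID -> species IDs of its defining participants.
defining_species = {
    "R1": ["S1", "S2"],
    "R2": ["S1", "S3"],
    "R3": ["S1", "S2", "S3"],
}


def find_disconnected_sketch(edgelist, defining_species, threshold=0):
    """Return reaction IDs where any pair of defining species is at or below threshold."""
    # Order-independent lookup of co-occurrence counts.
    counts = {
        frozenset((a, b)): n
        for a, b, n in edgelist[["s_id_1", "s_id_2", "cooccurence"]].itertuples(index=False)
    }
    disconnected = set()
    for r_id, species in defining_species.items():
        for pair in combinations(sorted(set(species)), 2):
            # Assumption: a pair absent from the edgelist counts as 0.
            if counts.get(frozenset(pair), 0) <= threshold:
                disconnected.add(r_id)
                break
    return disconnected


disconnected = find_disconnected_sketch(edgelist, defining_species, threshold=0)
print(sorted(disconnected))
```

With the default threshold of 0, R2 and R3 are flagged because S1 and S3 never co-occur. R1 is kept because its only pair co-occurs in two conditions.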

napistu.context.filtering.filter_reactions_with_disconnected_cspecies(sbml_dfs: SBML_dfs, species_data_table: str, inplace: bool = False) → SBML_dfs | None

Remove reactions from the SBML_dfs object whose defining compartmentalized species (cspecies) are disconnected according to a co-occurrence matrix derived from a species data table.

This function identifies reactions in which any pair of defining cspecies does not co-occur (i.e., is disconnected) in the provided species data table, and removes those reactions from the model. The operation can be performed in-place or on a copy of the SBML_dfs object.

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML_dfs object to filter reactions from.

  • species_data_table (str) – The name of the species data table to use for co-occurrence calculation.

  • inplace (bool, optional) – If True, modifies the input SBML_dfs object in-place and returns None. If False (default), returns a new SBML_dfs object with the filtered reactions.

Returns:

If inplace=True, returns None. If inplace=False, returns a new SBML_dfs object with filtered reactions.

Return type:

Optional[sbml_dfs_core.SBML_dfs]

Warns:

UserWarning – If no reactions are pruned based on non-cooccurrence.

Examples

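>>> from napistu.context.filtering import filter_reactions_with_disconnected_cspecies
>>> # sbml_dfs is an existing SBML_dfs with a "test_data" species data table
>>> filtered_sbml_dfs = filter_reactions_with_disconnected_cspecies(sbml_dfs, "test_data", inplace=False)
>>> # To modify in-place:
>>> filter_reactions_with_disconnected_cspecies(sbml_dfs, "test_data", inplace=True)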

napistu.context.filtering.filter_species_by_attribute(sbml_dfs: SBML_dfs, species_data_table: str, attribute_name: str, attribute_value: int | bool | str | List[str], negate: bool = False, remove_references: bool = True, inplace: bool = True) → SBML_dfs | None

Filter species in the SBML_dfs based on an attribute value.

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – The SBML_dfs object to filter.

  • species_data_table (str) – The name of the species data table to filter.

  • attribute_name (str) – The name of the attribute to filter on.

  • attribute_value (Union[int, bool, str, List[str]]) – The value of the attribute to filter on. Can be a single value or a list of values.

  • negate (bool, optional) – Whether to negate the filter, by default False. If True, keeps species with the attribute defined that do NOT match the attribute value.

  • remove_references (bool, optional) – Whether to remove references to the filtered species, by default True. If False, keeps references to the filtered species which may result in a validation error.

  • inplace (bool, optional) – Whether to filter the SBML_dfs in place, by default True. If False, returns a new SBML_dfs object with the filtered species.

Returns:

If inplace=True, returns None. If inplace=False, returns a new SBML_dfs object with the filtered species.

Return type:

Optional[sbml_dfs_core.SBML_dfs]

Raises:

ValueError – If species_data_table is not found in sbml_dfs.species_data, or if attribute_name is not found in the species data table columns.

napistu.context.filtering.find_species_with_attribute(species_data: DataFrame, attribute_name: str, attribute_value: int | bool | str | List[str], negate: bool = False) → List[str]

Find species that match the given attribute filter criteria.

Parameters:
  • species_data (pd.DataFrame) – The species data table to filter.

  • attribute_name (str) – The name of the attribute to filter on.

  • attribute_value (Union[int, bool, str, List[str]]) – The value of the attribute to filter on. Can be a single value or a list of values.

  • negate (bool, optional) – Whether to negate the filter, by default False. If True, returns species that do NOT match the attribute value.

Returns:

List of species IDs that match the filter criteria.

Return type:

List[str]

Raises:

ValueError – If attribute_name is not found in the species data table columns.
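The matching and negation behavior can be illustrated with a small pandas sketch. `find_matching_sketch` and the toy table are hypothetical, and the handling of missing values under `negate=True` (species without the attribute defined are excluded) is an assumption of this sketch, not confirmed behavior of this function.

```python
from typing import List, Union

import pandas as pd


def find_matching_sketch(
    species_data: pd.DataFrame,
    attribute_name: str,
    attribute_value: Union[int, bool, str, List[str]],
    negate: bool = False,
) -> List[str]:
    """Hypothetical stand-in: return index labels whose attribute matches the filter."""
    if attribute_name not in species_data.columns:
        raise ValueError(f"{attribute_name} not found in species data columns")

    # Accept a single value or a list of values.
    values = attribute_value if isinstance(attribute_value, list) else [attribute_value]
    column = species_data[attribute_name]
    matches = column.isin(values)
    if negate:
        # Assumption: only species with the attribute defined can be non-matches.
        matches = column.notna() & ~matches
    return species_data.index[matches].tolist()


# Toy species data table indexed by species ID (illustrative values only).
toy = pd.DataFrame(
    {"compartment_class": ["nuclear", "cytosolic", None, "nuclear"]},
    index=["S1", "S2", "S3", "S4"],
)

nuclear = find_matching_sketch(toy, "compartment_class", "nuclear")
not_nuclear = find_matching_sketch(toy, "compartment_class", "nuclear", negate=True)
either = find_matching_sketch(toy, "compartment_class", ["nuclear", "cytosolic"])
print(nuclear, not_nuclear, either)
```

Passing a list matches any of its values, so the last call returns every species whose attribute is either `"nuclear"` or `"cytosolic"`.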