napistu.indices

Pathway Index for organizing metadata and paths of pathway representations.

Classes

PWIndex: Pathway Index for organizing metadata and paths of pathway representations.

Public Functions

adapt_pw_index(source, organismal_species, outdir=None, update_index=False) -> PWIndex: Adapt a pathway index by filtering for specific organismal species.
create_pathway_index_df(model_keys, model_urls, model_organismal_species, base_path, data_source, model_names=None, file_extension=".sbml") -> pd.DataFrame: Create a pathway index DataFrame from model definitions.

class napistu.indices.PWIndex(pw_index: PathLike[str] | str | DataFrame, pw_index_base_path=None, validate_paths=True)

Bases: object

Pathway Index for organizing metadata and paths of pathway representations.

The PWIndex class manages a collection of pathway files and their associated metadata. It provides functionality to filter, search, and validate pathway data across different sources and species.

index

A table describing the location and contents of pathway files. Contains columns for pathway_id, name, source, organismal_species, file path, URL, and other metadata.

Type: pd.DataFrame

base_path

Path to directory of indexed files. Set to None if path validation is disabled.

Type: str or None

filter(data_sources, organismal_species): Filter the index in-place by data source and/or organismal species.

search(query): Filter the index in-place to pathways whose names match the search query.

Examples

>>> # Create a pathway index from a file
>>> from napistu.indices import PWIndex
>>> pw_index = PWIndex("path/to/pw_index.tsv")
>>>
>>> # Filter for specific sources and species
>>> pw_index.filter(data_sources=["BiGG", "Reactome"], organismal_species="human")
>>>
>>> # Search for pathways containing "metabolism"
>>> pw_index.search("metabolism")
>>>
>>> # Create from DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'pathway_id': ['R-HSA-123456'],
...     'name': ['Test Pathway'],
...     'source': ['Reactome'],
...     'organismal_species': ['human'],
...     'file': ['test.sbml'],
...     'url': ['https://example.com'],
...     'sbml_path': ['/path/to/test.sbml'],
...     'date': ['20231201']
... })
>>> pw_index = PWIndex(df)

__init__(pw_index: PathLike[str] | str | DataFrame, pw_index_base_path=None, validate_paths=True) → None

Initialize a Pathway Index object.

Creates a PWIndex instance from a file path, DataFrame, or PathLike object. The index contains metadata about pathway files and can optionally validate that the referenced files exist.

Parameters:

pw_index (PathLike[str] or str or pd.DataFrame) – Path to index file, or a DataFrame containing pathway index data. The DataFrame should contain all required columns defined in EXPECTED_PW_INDEX_COLUMNS.
pw_index_base_path (str or None, optional) – Base path that relative paths in pw_index will reference. If None and pw_index is a file path, uses the directory of pw_index.
validate_paths (bool, optional) – If True, validates that files referenced in the index exist. If False, skips file validation and sets base_path to None. Default is True.

Return type:

None

Raises:

ValueError – If pw_index is not a valid type or if required columns are missing.
FileNotFoundError – If validate_paths is True and base_path is not a valid directory.
TypeError – If pw_index_base_path is neither a string nor None, or if validate_paths is not a boolean.

Examples

>>> # Create from file path
>>> from napistu.indices import PWIndex
>>> pw_index = PWIndex("path/to/pw_index.tsv")
>>>
>>> # Create from DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'pathway_id': ['R-HSA-123456'],
...     'name': ['Test Pathway'],
...     'source': ['Reactome'],
...     'organismal_species': ['human'],
...     'file': ['test.sbml'],
...     'url': ['https://example.com'],
...     'sbml_path': ['/path/to/test.sbml'],
...     'date': ['20231201']
... })
>>> pw_index = PWIndex(df)
>>>
>>> # Create with custom base path and no validation
>>> pw_index = PWIndex(
...     "pw_index.tsv",
...     pw_index_base_path="/custom/path",
...     validate_paths=False
... )
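
The base-path rules described for `pw_index_base_path` and `validate_paths` can be sketched with the standard library alone. This is a minimal illustration of the documented behavior, not the library's implementation; `resolve_base_path` is a hypothetical helper name, and it only covers the case where `pw_index` is a file path.

```python
import os
import tempfile


def resolve_base_path(pw_index, pw_index_base_path=None, validate_paths=True):
    """Hypothetical helper mirroring the documented base-path rules."""
    if not validate_paths:
        # Documented behavior: validation is skipped and base_path is None.
        return None
    if pw_index_base_path is None:
        # Documented default: use the directory that holds the index file.
        pw_index_base_path = os.path.dirname(os.path.abspath(pw_index))
    if not os.path.isdir(pw_index_base_path):
        raise FileNotFoundError(f"{pw_index_base_path} is not a valid directory")
    return pw_index_base_path


with tempfile.TemporaryDirectory() as tmp:
    index_file = os.path.join(tmp, "pw_index.tsv")
    # No explicit base path: falls back to the index file's directory.
    default_base = resolve_base_path(index_file)
    # Validation disabled: the custom base path is ignored and None is returned.
    skipped = resolve_base_path(index_file, "/custom/path", validate_paths=False)
```

Under these assumptions, passing `validate_paths=False` together with a custom base path, as in the last example above, leaves `base_path` unset.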

_check_files()

Verify that all files referenced in the pathway index exist.

Checks that all files listed in the index's 'file' column exist in the base_path directory. This is used for validation during initialization when validate_paths=True.

Return type: None
Raises: FileNotFoundError – If any files referenced in the index are missing from the base_path.

Examples

>>> # This method is called internally during initialization
>>> pw_index = PWIndex("path/to/pw_index.tsv", validate_paths=True)
>>> # If any files are missing, FileNotFoundError will be raised
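
The existence check described above amounts to joining each entry of the 'file' column onto `base_path` and collecting the entries that are not present. The sketch below shows that idea with plain Python; `find_missing_files` and `check_files` are hypothetical names, and the error message format is an assumption.

```python
import os
import tempfile


def find_missing_files(file_names, base_path):
    """Return the entries of file_names that do not exist under base_path."""
    return [
        name
        for name in file_names
        if not os.path.isfile(os.path.join(base_path, name))
    ]


def check_files(file_names, base_path):
    """Raise FileNotFoundError if any referenced file is missing."""
    missing = find_missing_files(file_names, base_path)
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} file(s) missing from {base_path}: {', '.join(missing)}"
        )


with tempfile.TemporaryDirectory() as base_path:
    # Create one of the two files the (illustrative) index refers to.
    open(os.path.join(base_path, "present.sbml"), "w").close()
    check_files(["present.sbml"], base_path)  # passes silently
    try:
        check_files(["present.sbml", "absent.sbml"], base_path)
        error_message = None
    except FileNotFoundError as err:
        error_message = str(err)
```

Reporting every missing file at once, rather than stopping at the first, makes it easier to repair an incomplete download in a single pass.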

filter(data_sources: str | Iterable[str] | None = None, organismal_species: str | Iterable[str] | None = None)

Filter the pathway index by data sources and/or organismal species.

Modifies the index in-place to include only pathways that match the specified criteria. If no filters are provided, the index remains unchanged.

Parameters:

data_sources (str or Iterable[str] or None, optional) – Data sources to filter for (e.g., ["BiGG", "Reactome"]). If None, no filtering by data source is applied.
organismal_species (str or Iterable[str] or None, optional) – Organismal species to filter for (e.g., ["human", "mouse"]). If None, no filtering by species is applied.

Returns:

Modifies the index in-place.

Return type:

None

Examples

>>> # Filter for specific data sources
>>> pw_index.filter(data_sources=["BiGG", "Reactome"])
>>>
>>> # Filter for specific species
>>> pw_index.filter(organismal_species="human")
>>>
>>> # Filter for both sources and species
>>> pw_index.filter(
...     data_sources=["BiGG"],
...     organismal_species=["human", "mouse"]
... )
>>>
>>> # No filtering (index remains unchanged)
>>> pw_index.filter()

search(query)

Search the pathway index for pathways matching a query string.

Filters the index in-place to include only pathways whose names match the query. The query is treated as a regular expression and matched case-insensitively.

Parameters: query (str) – Search query to match against pathway names. Case-insensitive regex matching is used.
Returns: Modifies the index in-place.
Return type: None

Examples

>>> # Search for pathways containing "metabolism"
>>> pw_index.search("metabolism")
>>>
>>> # Search for pathways containing "glycolysis"
>>> pw_index.search("glycolysis")
>>>
>>> # Search with regex pattern
>>> pw_index.search("glucose.*pathway")
>>>
>>> # Case-insensitive search
>>> pw_index.search("METABOLISM")  # Same as "metabolism"
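
Case-insensitive regex matching on names can be illustrated with Python's `re` module. This is a sketch of the matching semantics only, on a list of row dictionaries. `search_rows` is a hypothetical name, and the library may implement the search differently (for example through pandas string methods).

```python
import re


def search_rows(rows, query):
    """Keep, in place, only rows whose name matches the regex query."""
    pattern = re.compile(query, flags=re.IGNORECASE)
    rows[:] = [row for row in rows if pattern.search(row["name"])]


rows = [
    {"name": "Glucose metabolism pathway"},
    {"name": "Glycolysis"},
    {"name": "Lipid Metabolism"},
]
search_rows(rows, "METABOLISM")
after_first = [row["name"] for row in rows]

# A second search operates on the already-narrowed rows.
search_rows(rows, "glucose.*pathway")
after_second = [row["name"] for row in rows]

# Regex metacharacters are interpreted, so escape queries meant literally.
tca_rows = [{"name": "TCA (cycle)"}, {"name": "TCA cycle"}]
search_rows(tca_rows, re.escape("TCA (cycle)"))
literal_hits = [row["name"] for row in tca_rows]
```

Without `re.escape`, the query `TCA (cycle)` would be read as a pattern with a capture group and would match "TCA cycle" instead of the parenthesized name.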

napistu.indices.adapt_pw_index(source: str | PWIndex, organismal_species: str | Iterable[str] | None, outdir: str | None = None, update_index: bool = False) → PWIndex

Adapt a pathway index by filtering for specific organismal species.

This function is helpful for filtering pathway indices for specific species before reconstructing models or performing other operations.

Parameters:

source (str or PWIndex) – URI for pw_index.csv file or PWIndex object to adapt
organismal_species (str or Iterable[str] or None) – Organismal species to filter for. Should match the nomenclature of the pathway index. If None, no filtering is applied.
outdir (str or None, optional) – Optional directory to write the filtered pw_index to. If provided and update_index is True, the filtered index will be saved as “pw_index.tsv” in this directory.
update_index (bool, optional) – Whether to write the filtered pathway index to the output directory. Only used if outdir is provided. Default is False.

Returns:

Filtered pathway index containing only entries for the specified organismal species.

Return type:

PWIndex

Raises:

ValueError – If source is neither a string nor a PWIndex object.

Examples

>>> # Filter pathway index for human species
>>> from napistu.indices import adapt_pw_index
>>> filtered_index = adapt_pw_index("path/to/pw_index.csv", "human")
>>>
>>> # Filter and save to output directory
>>> filtered_index = adapt_pw_index(
...     pw_index_obj,
...     ["human", "mouse"],
...     outdir="filtered_data",
...     update_index=True
... )
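
The interplay of `outdir` and `update_index` described above means the filtered index is written only when both are supplied. A minimal sketch of that condition, using the standard `csv` module on row dictionaries, is shown below. `write_index_if_requested` is a hypothetical helper, and creating `outdir` when it is missing is an assumption rather than documented behavior.

```python
import csv
import os
import tempfile


def write_index_if_requested(rows, fieldnames, outdir=None, update_index=False):
    """Write rows to <outdir>/pw_index.tsv only if both options are set."""
    if outdir is None or not update_index:
        return None
    os.makedirs(outdir, exist_ok=True)  # assumption: create outdir if absent
    out_path = os.path.join(outdir, "pw_index.tsv")
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return out_path


fieldnames = ["pathway_id", "organismal_species"]
rows = [{"pathway_id": "R-HSA-123456", "organismal_species": "human"}]

with tempfile.TemporaryDirectory() as outdir:
    # outdir alone is not enough; update_index defaults to False.
    skipped = write_index_if_requested(rows, fieldnames, outdir=outdir)
    written = write_index_if_requested(
        rows, fieldnames, outdir=outdir, update_index=True
    )
    with open(written, newline="") as handle:
        header = handle.readline().rstrip("\r\n")
```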

napistu.indices.create_pathway_index_df(model_keys: dict[str, str], model_urls: dict[str, str], model_organismal_species: dict[str, str], base_path: str, data_source: str, model_names: dict[str, str] | None = None, file_extension: str = '.sbml') → DataFrame

Create a pathway index DataFrame from model definitions.

This function creates a standardized pathway index DataFrame that can be used across different model sources. It handles file paths and metadata consistently, generating all required columns for a valid pathway index.

Parameters:

model_keys (dict[str, str]) – Mapping of species identifiers to model keys/IDs. Keys should be species identifiers, values are model keys.
model_urls (dict[str, str]) – Mapping of species identifiers to model download URLs. Keys should match those in model_keys.
model_organismal_species (dict[str, str]) – Mapping of species identifiers to full organismal species names. Keys should match those in model_keys.
base_path (str) – Base directory path where model files will be stored.
data_source (str) – Name of the source database (e.g., "BiGG", "Reactome").
model_names (dict[str, str] or None, optional) – Optional mapping of model keys to display names. If None, uses model keys as display names.
file_extension (str, optional) – File extension for model files. Default is ".sbml".

Returns:

DataFrame containing pathway index information with columns:

- pathway_id: Unique identifier for the pathway (from model_keys)
- name: Display name for the pathway
- source: Source database name
- organismal_species: Organismal species name
- file: Basename of the model file
- url: URL to download the model from
- sbml_path: Full path where model will be stored
- date: Current date in YYYYMMDD format

Return type:

pd.DataFrame

Raises:

TypeError – If model_names is provided but not a dictionary.
ValueError – If model_names is provided but contains keys not present in model_keys.

Examples

>>> # Create a basic pathway index
>>> from napistu.indices import create_pathway_index_df
>>> model_keys = {"human": "HUMAN", "mouse": "MOUSE"}
>>> model_urls = {
...     "human": "https://bigg.ucsd.edu/models/HUMAN",
...     "mouse": "https://bigg.ucsd.edu/models/MOUSE"
... }
>>> model_species = {"human": "Homo sapiens", "mouse": "Mus musculus"}
>>>
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG"
... )
>>>
>>> # Create with custom display names
>>> model_names = {"HUMAN": "Human Metabolic Network", "MOUSE": "Mouse Metabolic Network"}
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG",
...     model_names=model_names
... )
>>>
>>> # Create with custom file extension
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG",
...     file_extension=".xml"
... )
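
To make the column layout concrete, the sketch below assembles the eight documented columns as plain row dictionaries. It is an illustration under stated assumptions, not the library's code: `build_index_rows` is a hypothetical name, the file name is assumed to be the model key plus the extension, and `model_names` keys are checked against the values of `model_keys`, as the example above suggests.

```python
import os
from datetime import date


def build_index_rows(model_keys, model_urls, model_organismal_species,
                     base_path, data_source, model_names=None,
                     file_extension=".sbml"):
    """Hypothetical sketch: one row dict per model, with the documented columns."""
    if model_names is None:
        # Documented default: fall back to the model keys as display names.
        model_names = {key: key for key in model_keys.values()}
    elif not isinstance(model_names, dict):
        raise TypeError("model_names must be a dict or None")
    else:
        unknown = set(model_names) - set(model_keys.values())
        if unknown:
            raise ValueError(f"Unknown model keys in model_names: {sorted(unknown)}")

    today = date.today().strftime("%Y%m%d")  # YYYYMMDD
    rows = []
    for species_id, model_key in model_keys.items():
        file_name = f"{model_key}{file_extension}"  # assumed naming scheme
        rows.append({
            "pathway_id": model_key,
            "name": model_names.get(model_key, model_key),
            "source": data_source,
            "organismal_species": model_organismal_species[species_id],
            "file": file_name,
            "url": model_urls[species_id],
            "sbml_path": os.path.join(base_path, file_name),
            "date": today,
        })
    return rows


rows = build_index_rows(
    model_keys={"human": "HUMAN", "mouse": "MOUSE"},
    model_urls={
        "human": "https://bigg.ucsd.edu/models/HUMAN",
        "mouse": "https://bigg.ucsd.edu/models/MOUSE",
    },
    model_organismal_species={"human": "Homo sapiens", "mouse": "Mus musculus"},
    base_path="/path/to/models",
    data_source="BiGG",
    model_names={"HUMAN": "Human Metabolic Network"},
)
```

In this sketch a model without an entry in `model_names` keeps its key as the display name, so a partial mapping is accepted; whether the library allows partial mappings is not stated in the documentation above.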