napistu.indices
Pathway Index for organizing metadata and paths of pathway representations.
Classes
- PWIndex
Pathway Index for organizing metadata and paths of pathway representations.
Public Functions
- adapt_pw_index(source, organismal_species, outdir=None, update_index=False) -> PWIndex:
Adapt a pathway index by filtering for specific organismal species.
- create_pathway_index_df(model_keys, model_urls, model_organismal_species, base_path, data_source, model_names=None, file_extension=".sbml") -> pd.DataFrame:
Create a pathway index DataFrame from model definitions.
- class napistu.indices.PWIndex(pw_index: PathLike[str] | str | DataFrame, pw_index_base_path=None, validate_paths=True)
Bases: object
Pathway Index for organizing metadata and paths of pathway representations.
The PWIndex class manages a collection of pathway files and their associated metadata. It provides functionality to filter, search, and validate pathway data across different sources and species.
- index
A table describing the location and contents of pathway files. Contains columns for pathway_id, name, source, organismal_species, file path, URL, and other metadata.
- Type:
pd.DataFrame
- base_path
Path to directory of indexed files. Set to None if path validation is disabled.
- Type:
str or None
- filter(data_sources, organismal_species)
Filter index based on pathway source and/or organismal species
- search(query)
Filter index to pathways matching the search query
Examples
>>> import pandas as pd
>>> from napistu.indices import PWIndex
>>>
>>> # Create a pathway index from a file
>>> pw_index = PWIndex("path/to/pw_index.tsv")
>>>
>>> # Filter for specific sources and species
>>> pw_index.filter(data_sources=["BiGG", "Reactome"], organismal_species="human")
>>>
>>> # Search for pathways containing "metabolism"
>>> pw_index.search("metabolism")
>>>
>>> # Create from DataFrame
>>> df = pd.DataFrame({
...     'pathway_id': ['R-HSA-123456'],
...     'name': ['Test Pathway'],
...     'source': ['Reactome'],
...     'organismal_species': ['human'],
...     'file': ['test.sbml'],
...     'url': ['https://example.com'],
...     'sbml_path': ['/path/to/test.sbml'],
...     'date': ['20231201']
... })
>>> pw_index = PWIndex(df)
- __init__(pw_index: PathLike[str] | str | DataFrame, pw_index_base_path=None, validate_paths=True) -> None
Initialize a Pathway Index object.
Creates a PWIndex instance from a file path, DataFrame, or PathLike object. The index contains metadata about pathway files and can optionally validate that the referenced files exist.
- Parameters:
pw_index (PathLike[str] or str or pd.DataFrame) – Path to index file, or a DataFrame containing pathway index data. The DataFrame should contain all required columns defined in EXPECTED_PW_INDEX_COLUMNS.
pw_index_base_path (str or None, optional) – Base path that relative paths in pw_index will reference. If None and pw_index is a file path, uses the directory of pw_index.
validate_paths (bool, optional) – If True, validates that files referenced in the index exist. If False, skips file validation and sets base_path to None. Default is True.
- Return type:
None
- Raises:
ValueError – If pw_index is not a valid type or if required columns are missing.
FileNotFoundError – If validate_paths is True and base_path is not a valid directory.
TypeError – If pw_index_base_path is not a string or validate_paths is not a boolean.
Examples
>>> import pandas as pd
>>> from napistu.indices import PWIndex
>>>
>>> # Create from file path
>>> pw_index = PWIndex("path/to/pw_index.tsv")
>>>
>>> # Create from DataFrame
>>> df = pd.DataFrame({
...     'pathway_id': ['R-HSA-123456'],
...     'name': ['Test Pathway'],
...     'source': ['Reactome'],
...     'organismal_species': ['human'],
...     'file': ['test.sbml'],
...     'url': ['https://example.com'],
...     'sbml_path': ['/path/to/test.sbml'],
...     'date': ['20231201']
... })
>>> pw_index = PWIndex(df)
>>>
>>> # Create with custom base path and no validation
>>> pw_index = PWIndex(
...     "pw_index.tsv",
...     pw_index_base_path="/custom/path",
...     validate_paths=False
... )
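The base-path rules described for __init__ can be sketched in a few lines. The helper below, resolve_base_path, is hypothetical and not part of napistu; it only illustrates the documented behavior under the assumption that the fallback is the directory containing the index file.

```python
import os
from typing import Optional


def resolve_base_path(
    pw_index_path: str,
    pw_index_base_path: Optional[str] = None,
    validate_paths: bool = True,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented base-path rules."""
    if not validate_paths:
        # Validation disabled: the documented result is base_path = None.
        return None
    if pw_index_base_path is None:
        # No explicit base path: fall back to the index file's directory.
        base_path = os.path.dirname(os.path.abspath(pw_index_path))
    else:
        base_path = pw_index_base_path
    if not os.path.isdir(base_path):
        raise FileNotFoundError(f"base_path is not a directory: {base_path}")
    return base_path
```

With validation enabled, an index file given by a bare filename resolves to the current working directory in this sketch.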
- _check_files()
Verify that all files referenced in the pathway index exist.
Checks that all files listed in the index’s ‘file’ column exist in the base_path directory. This is used for validation during initialization when validate_paths=True.
- Return type:
None
- Raises:
FileNotFoundError – If any files referenced in the index are missing from the base_path.
Examples
>>> # This method is called internally during initialization
>>> pw_index = PWIndex("path/to/pw_index.tsv", validate_paths=True)
>>> # If any files are missing, FileNotFoundError will be raised
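As a rough illustration of the check described above, the following standalone sketch joins each entry of a file list onto a base directory and reports the ones that do not exist. The function names find_missing_files and check_files are hypothetical; the real method operates on the index's 'file' column and may report errors differently.

```python
import os
from typing import Iterable, List


def find_missing_files(files: Iterable[str], base_path: str) -> List[str]:
    """Return the entries of `files` that do not exist under `base_path`."""
    return [f for f in files if not os.path.isfile(os.path.join(base_path, f))]


def check_files(files: Iterable[str], base_path: str) -> None:
    """Hypothetical stand-in for the documented check; raises if anything is missing."""
    missing = find_missing_files(files, base_path)
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} file(s) missing from {base_path}: {', '.join(missing)}"
        )
```

Collecting every missing file before raising, rather than stopping at the first one, makes the error message more useful when several downloads failed.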
- filter(data_sources: str | Iterable[str] | None = None, organismal_species: str | Iterable[str] | None = None)
Filter the pathway index by data sources and/or organismal species.
Modifies the index in-place to include only pathways that match the specified criteria. If no filters are provided, the index remains unchanged.
- Parameters:
data_sources (str or Iterable[str] or None, optional) – Data sources to filter for (e.g., ["BiGG", "Reactome"]). If None, no filtering by data source is applied.
organismal_species (str or Iterable[str] or None, optional) – Organismal species to filter for (e.g., ["human", "mouse"]). If None, no filtering by species is applied.
- Returns:
Modifies the index in-place.
- Return type:
None
Examples
>>> # Filter for specific data sources
>>> pw_index.filter(data_sources=["BiGG", "Reactome"])
>>>
>>> # Filter for specific species
>>> pw_index.filter(organismal_species="human")
>>>
>>> # Filter for both sources and species
>>> pw_index.filter(
...     data_sources=["BiGG"],
...     organismal_species=["human", "mouse"]
... )
>>>
>>> # No filtering (index remains unchanged)
>>> pw_index.filter()
- search(query)
Search the pathway index for pathways matching a query string.
Filters the index in-place to include only pathways whose names contain the query string (case-insensitive). Uses regex matching for flexible pattern matching.
- Parameters:
query (str) – Search query to match against pathway names. Case-insensitive regex matching is used.
- Returns:
Modifies the index in-place.
- Return type:
None
Examples
>>> # Search for pathways containing "metabolism"
>>> pw_index.search("metabolism")
>>>
>>> # Search for pathways containing "glycolysis"
>>> pw_index.search("glycolysis")
>>>
>>> # Search with regex pattern
>>> pw_index.search("glucose.*pathway")
>>>
>>> # Case-insensitive search
>>> pw_index.search("METABOLISM")  # Same as "metabolism"
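Because the query is treated as a regular expression, characters such as "(" or "." act as pattern syntax rather than literal text. The sketch below uses Python's re module on a plain list of names to show the matching behavior; search_names and the sample names are hypothetical, and the assumption is that napistu's matching behaves like a standard case-insensitive regex search.

```python
import re
from typing import List


def search_names(names: List[str], query: str) -> List[str]:
    """Keep names where the regex `query` matches anywhere, ignoring case."""
    pattern = re.compile(query, flags=re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


names = [
    "Glucose metabolism",
    "Glucose uptake pathway",
    "Signaling by NOTCH (canonical)",
]

# Case does not matter for a plain word.
by_word = search_names(names, "METABOLISM")
# ".*" allows arbitrary text between the two words.
by_pattern = search_names(names, "glucose.*pathway")
# Escape regex metacharacters when the query should be taken literally.
literal = search_names(names, re.escape("(canonical)"))
```

If a query comes from user input and should match verbatim, escaping it first (as with re.escape here) avoids surprises from unbalanced parentheses or wildcard characters.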
- napistu.indices.adapt_pw_index(source: str | PWIndex, organismal_species: str | Iterable[str] | None, outdir: str | None = None, update_index: bool = False) -> PWIndex
Adapt a pathway index by filtering for specific organismal species.
This function is helpful for filtering pathway indices for specific species before reconstructing models or performing other operations.
- Parameters:
source (str or PWIndex) – URI for pw_index.csv file or PWIndex object to adapt
organismal_species (str or Iterable[str] or None) – Organismal species to filter for. Should match the nomenclature of the pathway index. If None, no filtering is applied.
outdir (str or None, optional) – Optional directory to write the filtered pw_index to. If provided and update_index is True, the filtered index will be saved as “pw_index.tsv” in this directory.
update_index (bool, optional) – Whether to write the filtered pathway index to the output directory. Only used if outdir is provided. Default is False.
- Returns:
Filtered pathway index containing only entries for the specified organismal species.
- Return type:
PWIndex
- Raises:
ValueError – If source is neither a string nor a PWIndex object.
Examples
>>> from napistu.indices import adapt_pw_index
>>>
>>> # Filter pathway index for human species
>>> filtered_index = adapt_pw_index("path/to/pw_index.csv", "human")
>>>
>>> # Filter and save to output directory
>>> # (pw_index_obj is an existing PWIndex instance)
>>> filtered_index = adapt_pw_index(
...     pw_index_obj,
...     ["human", "mouse"],
...     outdir="filtered_data",
...     update_index=True
... )
- napistu.indices.create_pathway_index_df(model_keys: dict[str, str], model_urls: dict[str, str], model_organismal_species: dict[str, str], base_path: str, data_source: str, model_names: dict[str, str] | None = None, file_extension: str = '.sbml') -> DataFrame
Create a pathway index DataFrame from model definitions.
This function creates a standardized pathway index DataFrame that can be used across different model sources. It handles file paths and metadata consistently, generating all required columns for a valid pathway index.
- Parameters:
model_keys (dict[str, str]) – Mapping of species identifiers to model keys/IDs. Keys should be species identifiers, values are model keys.
model_urls (dict[str, str]) – Mapping of species identifiers to model download URLs. Keys should match those in model_keys.
model_organismal_species (dict[str, str]) – Mapping of species identifiers to full organismal species names. Keys should match those in model_keys.
base_path (str) – Base directory path where model files will be stored.
data_source (str) – Name of the source database (e.g., “BiGG”, “Reactome”).
model_names (dict[str, str] or None, optional) – Optional mapping of model keys to display names. If None, uses model keys as display names.
file_extension (str, optional) – File extension for model files. Default is ".sbml".
- Returns:
DataFrame containing pathway index information with columns:
- pathway_id: Unique identifier for the pathway (from model_keys)
- name: Display name for the pathway
- source: Source database name
- organismal_species: Organismal species name
- file: Basename of the model file
- url: URL to download the model from
- sbml_path: Full path where model will be stored
- date: Current date in YYYYMMDD format
- Return type:
pd.DataFrame
- Raises:
TypeError – If model_names is provided but not a dictionary.
ValueError – If model_names is provided but contains keys not present in model_keys.
Examples
>>> from napistu.indices import create_pathway_index_df
>>>
>>> # Create a basic pathway index
>>> model_keys = {"human": "HUMAN", "mouse": "MOUSE"}
>>> model_urls = {
...     "human": "https://bigg.ucsd.edu/models/HUMAN",
...     "mouse": "https://bigg.ucsd.edu/models/MOUSE"
... }
>>> model_species = {"human": "Homo sapiens", "mouse": "Mus musculus"}
>>>
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG"
... )
>>>
>>> # Create with custom display names
>>> model_names = {"HUMAN": "Human Metabolic Network", "MOUSE": "Mouse Metabolic Network"}
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG",
...     model_names=model_names
... )
>>>
>>> # Create with custom file extension
>>> df = create_pathway_index_df(
...     model_keys=model_keys,
...     model_urls=model_urls,
...     model_organismal_species=model_species,
...     base_path="/path/to/models",
...     data_source="BiGG",
...     file_extension=".xml"
... )
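To make the relationship between the three species-keyed dictionaries and the output columns concrete, here is a plain-Python sketch that builds one row per species. It is not napistu's implementation: build_index_rows is a hypothetical name, it returns a list of dictionaries instead of a DataFrame, and it assumes the file name is the model key followed by the file extension.

```python
import datetime
import os
from typing import Dict, List, Optional


def build_index_rows(
    model_keys: Dict[str, str],
    model_urls: Dict[str, str],
    model_organismal_species: Dict[str, str],
    base_path: str,
    data_source: str,
    model_names: Optional[Dict[str, str]] = None,
    file_extension: str = ".sbml",
) -> List[Dict[str, str]]:
    """Hypothetical sketch: one pathway-index row per species identifier."""
    today = datetime.date.today().strftime("%Y%m%d")
    rows = []
    for species_id, key in model_keys.items():
        # Assumed naming scheme: <model key><extension>.
        file_name = key + file_extension
        rows.append(
            {
                "pathway_id": key,
                # Display names are looked up by model key, as in the example above.
                "name": model_names[key] if model_names is not None else key,
                "source": data_source,
                "organismal_species": model_organismal_species[species_id],
                "file": file_name,
                "url": model_urls[species_id],
                "sbml_path": os.path.join(base_path, file_name),
                "date": today,
            }
        )
    return rows


rows = build_index_rows(
    model_keys={"human": "HUMAN"},
    model_urls={"human": "https://example.com/HUMAN"},
    model_organismal_species={"human": "Homo sapiens"},
    base_path="models",
    data_source="BiGG",
)
```

Note that the species identifier is only a join key across the three input dictionaries; the resulting pathway_id comes from the values of model_keys.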