napistu.network.neighborhoods

Approaches to define the molecular neighborhoods around a compartmentalized species.

Public Functions

create_neighborhoods(s_ids, sbml_dfs, napistu_graph, network_type, order, top_n, verbose)

Create neighborhoods for a set of species and return a table containing all species in the neighborhood of each query s_id.

find_and_prune_neighborhoods(sbml_dfs, napistu_graph, compartmentalized_species, precomputed_distances, min_pw_size, source_total_counts, network_type, order, verbose, top_n)

Find and prune neighborhoods for a set of species and return a dictionary containing the neighborhood of each compartmentalized species.

find_neighborhoods(sbml_dfs, napistu_graph, compartmentalized_species, network_type, order, min_pw_size, precomputed_neighbors, source_total_counts, verbose)

Find neighborhoods for a set of species and return a dictionary containing the neighborhood of each compartmentalized species.

Functions

add_vertices_uri_urls(vertices, sbml_dfs)

Add URI URLs to neighborhood vertices DataFrame.

create_neighborhood_dict_entry(sc_id, ...[, ...])

Create Neighborhood Dict Entry

create_neighborhoods(s_ids, sbml_dfs, ...[, ...])

Create Neighborhoods

find_and_prune_neighborhoods(sbml_dfs, ...)

Find and Prune Neighborhoods

find_neighborhoods(sbml_dfs, napistu_graph, ...)

Find Neighborhood

plot_neighborhood(neighborhood_graph[, ...])

Plot Neighborhood

prune_neighborhoods(neighborhoods[, top_n])

Prune Neighborhoods

napistu.network.neighborhoods._build_final_result(vertices: DataFrame, edges: DataFrame, reaction_sources: DataFrame | None, neighborhood_path_entities: dict, napistu_graph: Graph, sbml_dfs) dict[str, Any]

Build the final result dictionary with all required components.

Handles the final assembly of the neighborhood result, including adding reference URLs and creating the updated graph.

napistu.network.neighborhoods._build_raw_neighborhood_df(napistu_graph: Graph, compartmentalized_species: list[str], network_type: str, order: int, precomputed_neighbors: DataFrame | None = None) DataFrame
napistu.network.neighborhoods._calculate_path_attrs(neighborhood_paths: list[list], edges: DataFrame, vertices: list, weight_var: str = 'weight') tuple[DataFrame, dict[Any, set]]

Calculate Path Attributes

Return the vertices and path weights (sum of edge weights) for a list of paths.

Parameters:
  • neighborhood_paths (list) – List of lists of edge indices

  • edges (pd.DataFrame) – Edges with rows corresponding to entries in neighborhood_paths inner lists

  • vertices (list) – List of vertices corresponding to the ordering of neighborhood_paths

  • weight_var (str) – variable in edges to use for scoring path weights

Returns:

  • path_attributes_df (pd.DataFrame) – A table containing attributes summarizing the path to each neighbor

  • neighborhood_path_entities (dict) – Dict mapping from each neighbor to the entities connecting it to the focal node
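The path summary described above can be sketched in plain Python. This is an illustrative stand-in, not the module's implementation: the function name summarize_paths, the list-of-dicts edge representation, and the "from"/"to" keys are assumptions made for the example.

```python
from typing import Any


def summarize_paths(
    paths: list[list[int]],
    edges: list[dict[str, Any]],
    vertices: list[str],
    weight_var: str = "weight",
) -> tuple[list[dict[str, Any]], dict[str, set]]:
    """Sum edge weights along each path and collect the entities it visits."""
    path_attributes = []
    path_entities = {}
    for neighbor, edge_ids in zip(vertices, paths):
        if not edge_ids:
            # An empty path has no edges to summarize, so it is skipped.
            continue
        path_edges = [edges[i] for i in edge_ids]
        path_attributes.append(
            {
                "neighbor": neighbor,
                "path_weight": sum(e[weight_var] for e in path_edges),
                "path_length": len(path_edges),
            }
        )
        # Every node touched by the path connects the neighbor to the focal node.
        path_entities[neighbor] = {e["from"] for e in path_edges} | {
            e["to"] for e in path_edges
        }
    return path_attributes, path_entities


# Hypothetical toy data: A -> R1 -> B -> R2, with paths given as edge indices.
edges = [
    {"from": "A", "to": "R1", "weight": 1.0},
    {"from": "R1", "to": "B", "weight": 0.5},
    {"from": "B", "to": "R2", "weight": 2.0},
]
attrs, entities = summarize_paths([[0, 1], [0, 1, 2]], edges, ["B", "R2"])
```

Here each inner list of paths holds indices into edges, matching the parameter description above.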

napistu.network.neighborhoods._clean_disconnected_components(vertices: DataFrame, edges: DataFrame, reaction_sources: DataFrame | None, sc_id: str) tuple[DataFrame, DataFrame, DataFrame | None]

Remove disconnected components and filter related data structures.

Handles the cleanup logic for removing nodes that couldn’t be reached from the focal node and updating all related data structures accordingly.
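As a rough illustration of this kind of cleanup, the sketch below keeps only the vertices connected to the focal node and the edges among them. It assumes reachability is judged while ignoring edge direction, which the description above does not specify; drop_unreachable and the tuple-based edge list are hypothetical names and structures for the example only.

```python
def drop_unreachable(vertices, edges, focal):
    """Keep vertices connected to `focal` (ignoring direction) and edges among them."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Depth-first traversal outward from the focal node.
    reachable = {focal}
    stack = [focal]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in reachable:
                reachable.add(nxt)
                stack.append(nxt)

    kept_vertices = [v for v in vertices if v in reachable]
    kept_edges = [(s, d) for s, d in edges if s in reachable and d in reachable]
    return kept_vertices, kept_edges


# Hypothetical toy data: X and R9 form a component detached from focal node B.
kept_vertices, kept_edges = drop_unreachable(
    ["A", "R1", "B", "X", "R9"],
    [("A", "R1"), ("R1", "B"), ("X", "R9")],
    focal="B",
)
```

Any table keyed by the removed vertices (such as reaction sources) would need the same filter applied so that all structures stay aligned.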

napistu.network.neighborhoods._create_neighborhood_dict_entry_logging(sc_id: str, one_neighborhood_df: DataFrame, sbml_dfs: SBML_dfs)
napistu.network.neighborhoods._find_neighbors(napistu_graph: Graph, compartmentalized_species: list[str], relationship: str, order: int = 3, precomputed_neighbors: DataFrame | None = None) DataFrame

Find Neighbors

Identify the neighbors near each of the requested compartmentalized_species.

If ‘precomputed_neighbors’ are provided, neighbors will be summarized by reformatting this table. Otherwise, neighbors will be found on-the-fly using the igraph.neighborhood() method.
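The on-the-fly case amounts to collecting every node within order steps of the focal node. The sketch below shows that idea with a breadth-first search over a plain adjacency dict; it is a stdlib stand-in for the igraph call named above, and neighbors_within and reverse are hypothetical helper names.

```python
from collections import deque


def neighbors_within(adjacency, focal, order):
    """Return {node: steps} for nodes reachable from `focal` in at most `order` steps."""
    steps = {focal: 0}
    queue = deque([focal])
    while queue:
        node = queue.popleft()
        if steps[node] == order:
            continue  # do not expand past the requested order
        for nxt in adjacency.get(node, ()):
            if nxt not in steps:
                steps[nxt] = steps[node] + 1
                queue.append(nxt)
    del steps[focal]  # the focal node is not its own neighbor
    return steps


def reverse(adjacency):
    """Flip edge directions so the same search finds ancestors."""
    reversed_adj = {}
    for src, targets in adjacency.items():
        for dst in targets:
            reversed_adj.setdefault(dst, []).append(src)
    return reversed_adj


# Hypothetical toy chain: A -> R1 -> B -> R2 -> C
downstream = {"A": ["R1"], "R1": ["B"], "B": ["R2"], "R2": ["C"]}
descendants = neighbors_within(downstream, "B", order=2)
ancestors = neighbors_within(reverse(downstream), "B", order=2)
```

Searching the original graph yields descendants and searching the reversed graph yields ancestors; running both corresponds to the "hourglass" network type.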

napistu.network.neighborhoods._find_neighbors_paths(neighborhood_graph: Graph, one_neighborhood_df: DataFrame, sc_id: str, edges: DataFrame) tuple[DataFrame, dict[Any, set], DataFrame, dict[Any, set]]

Find shortest paths between the focal node and its neighbors in both directions.

This function calculates shortest paths from the focal node to its descendants (downstream) and ancestors (upstream) using igraph’s shortest path algorithms. It uses _calculate_path_attrs to compute path attributes including path weights, lengths, and polarity information.

Parameters:
  • neighborhood_graph (ig.Graph) – The igraph Graph object representing the neighborhood network

  • one_neighborhood_df (pd.DataFrame) – DataFrame containing neighborhood information with ‘relationship’ column indicating ‘descendants’ or ‘ancestors’ for each node

  • sc_id (str) – The compartmentalized species ID of the focal node

  • edges (pd.DataFrame) – DataFrame containing edge information with columns for ‘from’, ‘to’, weights, and link polarity

Returns:

  • downstream_path_attrs (pd.DataFrame) – DataFrame containing path attributes for downstream paths from focal node to descendants. Includes columns: neighbor, path_weight, path_length, net_polarity, final_from, final_to, node_orientation

  • downstream_entity_dict (dict[Any, set]) – Dictionary mapping each descendant neighbor to the set of entities (nodes) connecting it to the focal node

  • upstream_path_attrs (pd.DataFrame) – DataFrame containing path attributes for upstream paths from focal node to ancestors. Includes columns: neighbor, path_weight, path_length, net_polarity, final_from, final_to, node_orientation

  • upstream_entity_dict (dict[Any, set]) – Dictionary mapping each ancestor neighbor to the set of entities (nodes) connecting it to the focal node
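To illustrate searching in both directions, the sketch below runs Dijkstra's algorithm once on the edges as given (downstream) and once on the reversed edges (upstream). It uses only the standard library rather than igraph, and the function name shortest_path_weights and the (from, to, weight) tuple layout are assumptions for the example.

```python
import heapq


def shortest_path_weights(edges, source):
    """Dijkstra over (from, to, weight) tuples; returns {node: total path weight}."""
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, []).append((dst, weight))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nxt, weight in adjacency.get(node, ()):
            candidate = dist + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return best


# Hypothetical toy network with two routes from B to C.
edges = [
    ("A", "R1", 1.0),
    ("R1", "B", 1.0),
    ("B", "R2", 0.5),
    ("R2", "C", 2.0),
    ("B", "R3", 1.0),
    ("R3", "C", 1.0),
]
downstream = shortest_path_weights(edges, "B")
# Reversing each edge turns "paths into B" into "paths out of B".
upstream = shortest_path_weights([(dst, src, w) for src, dst, w in edges], "B")
```

In the toy data, C is reached through R3 because that route has the lower total weight.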

napistu.network.neighborhoods._find_reactions_by_relationship(precomputed_neighbors, compartmentalized_species: list, sbml_dfs: SBML_dfs, relationship: str) DataFrame | None

Find Reactions by Relationship

Based on an ancestor-descendant edgelist of compartmentalized species, find all reactions which involve 2+ members.

We primarily care about paths between species; reactions are more of a means to an end for connecting pairs of species, so precomputed_distances are generated between pairs of species only. This also makes the problem feasible, since the number of species is upper bounded at <100K while the number of reactions is unbounded. Having a bound ensures that we can calculate the precomputed_distances efficiently using matrix operations whose memory footprint scales with O(N^2).
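The "2+ members" rule can be sketched as a set intersection: a reaction qualifies when at least two of its participants belong to the set of species in the edgelist. The function name reactions_with_multiple_members and the dict mapping reaction IDs to participant lists are hypothetical; they are not how SBML_dfs stores reaction species.

```python
def reactions_with_multiple_members(reaction_species, members, min_members=2):
    """Return reaction IDs with at least `min_members` participants in `members`."""
    members = set(members)
    return sorted(
        r_id
        for r_id, sc_ids in reaction_species.items()
        if len(members & set(sc_ids)) >= min_members
    )


# Hypothetical toy data: D is outside the neighborhood, so R3 has only one member.
reaction_species = {"R1": ["A", "B"], "R2": ["B", "C"], "R3": ["C", "D"]}
selected = reactions_with_multiple_members(reaction_species, ["A", "B", "C"])
```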

napistu.network.neighborhoods._precompute_neighbors(compartmentalized_species: list[str], precomputed_distances: DataFrame, sbml_dfs: SBML_dfs, network_type: str = 'downstream', order: int = 3, top_n: int = 10) DataFrame

Precompute Neighbors

Identify compartmentalized_species’ most tightly connected neighbors using parameters shared by the on-the-fly methods (order for identifying neighbors within N steps; top_n for identifying the lowest weight network paths between the focal node and each possible neighbor). This precomputation will greatly speed up neighborhood generation for highly connected species or densely connected networks. In those situations, a neighborhood created naively within N steps could contain thousands of neighbors.

napistu.network.neighborhoods._process_path_information(neighborhood_graph: Graph, one_neighborhood_df: DataFrame, sc_id: str, edges: DataFrame, vertices: DataFrame) tuple[DataFrame, DataFrame, dict]

Process shortest path information and merge with vertices/edges.

Handles the complex path-finding logic and attribute merging that was cluttering the main function.

napistu.network.neighborhoods._prune_vertex_set(one_neighborhood: dict, top_n: int) DataFrame

Prune Vertex Set

Filter a neighborhood to the lowest weight neighbors connected to the focal node. During this process upstream and downstream nodes are treated separately.

Parameters:
  • one_neighborhood (dict) – The neighborhood around a single compartmentalized species - one of the values in dict created by find_neighborhoods().

  • top_n (int) – How many neighboring molecular species should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately.

Returns:

vertices – the vertices in one_neighborhood with high-weight neighbors removed.

Return type:

pd.DataFrame
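A minimal sketch of the pruning rule, assuming each vertex carries a relationship label and a path_weight: upstream and downstream neighbors are ranked separately and the top_n lowest-weight entries of each are kept. The list-of-dicts layout, the "focal" label, and the choice to always keep the focal node are assumptions made for this example.

```python
def prune_vertices(vertices, top_n):
    """Keep the focal node plus the `top_n` lowest-weight neighbors per direction."""
    kept = [v for v in vertices if v["relationship"] == "focal"]
    for relationship in ("ancestors", "descendants"):
        group = [v for v in vertices if v["relationship"] == relationship]
        group.sort(key=lambda v: v["path_weight"])  # lowest weight first
        kept.extend(group[:top_n])
    return kept


# Hypothetical toy neighborhood around focal node B.
vertices = [
    {"name": "B", "relationship": "focal", "path_weight": 0.0},
    {"name": "A", "relationship": "ancestors", "path_weight": 2.0},
    {"name": "Z", "relationship": "ancestors", "path_weight": 5.0},
    {"name": "C", "relationship": "descendants", "path_weight": 2.0},
    {"name": "D", "relationship": "descendants", "path_weight": 1.0},
    {"name": "E", "relationship": "descendants", "path_weight": 9.0},
]
pruned_names = [v["name"] for v in prune_vertices(vertices, top_n=1)]
```

Ranking each direction on its own means a dense downstream side cannot crowd out the upstream side of an hourglass neighborhood.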

napistu.network.neighborhoods._validate_neighborhood_consistency(neighborhood: dict, sc_id: str) None

Validate that a single neighborhood has consistent vertices, edges, and reaction_sources.

This reproduces the exact validation logic from the R add_sources_to_graph function.

napistu.network.neighborhoods.add_vertices_uri_urls(vertices: DataFrame, sbml_dfs: SBML_dfs) DataFrame

Add URI URLs to neighborhood vertices DataFrame.

This function enriches a vertices DataFrame with URI URLs for both species and reactions. For species, it adds standard reference identifiers and Pharos IDs where available. For reactions, it adds reaction-specific URI URLs.

Parameters:
  • vertices (pd.DataFrame) – DataFrame containing neighborhood vertices with the following required columns:

      • NAPISTU_GRAPH_VERTICES.NAME: The name/identifier of each vertex

      • NAPISTU_GRAPH_VERTICES.NODE_TYPE: The type of node, either NAPISTU_GRAPH_NODE_TYPES.SPECIES or NAPISTU_GRAPH_NODE_TYPES.REACTION

  • sbml_dfs (sbml_dfs_core.SBML_dfs) – Pathway model including species, compartmentalized species, reactions and ontologies

Returns:

Input vertices DataFrame enriched with URI URL columns:

  • For species: standard reference identifier URLs and Pharos IDs

  • For reactions: reaction-specific URI URLs

  • Empty strings for missing URLs

Return type:

pd.DataFrame

Raises:
  • ValueError – If vertices DataFrame is empty (no rows)

  • TypeError – If the output is not a pandas DataFrame

  • ValueError – If the output row count doesn’t match the input row count

Notes

  • Species vertices are merged with compartmentalized_species to get s_id mappings

  • Reaction vertices are processed directly using their names

  • Missing URLs are filled with empty strings

  • The function preserves the original row order and count

napistu.network.neighborhoods.create_neighborhood_dict_entry(sc_id: str, neighborhood_df: DataFrame, sbml_dfs: SBML_dfs, napistu_graph: Graph, min_pw_size: int = 3, source_total_counts: Series | DataFrame | None = None, verbose: bool = False) dict[str, Any]

Create Neighborhood Dict Entry

Generate a summary of a compartmentalized species’ neighborhood

Parameters:
  • sc_id (str) – A compartmentalized species id

  • neighborhood_df (pd.DataFrame) – A table of upstream and/or downstream neighbors of all compartmentalized species

  • sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model

  • napistu_graph (igraph.Graph) – A network connecting molecular species and reactions

  • min_pw_size (int) – the minimum size of a pathway to be considered

  • source_total_counts (pd.Series | pd.DataFrame) – Optional. A series of the total counts of each source, or a pd.DataFrame with two columns: pathway_id and total_counts, as produced by sbml_dfs.get_source_total_counts()

  • verbose (bool) – Extra reporting?

Returns:

A dict containing:

  • graph (igraph.Graph) – subgraph of sc_id’s neighborhood

  • vertices (pd.DataFrame) – nodes in the neighborhood

  • edges (pd.DataFrame) – edges in the neighborhood

  • reaction_sources (pd.DataFrame) – models that reactions were derived from

  • neighborhood_path_entities (dict) – upstream and downstream dicts representing entities in paths. If the keys are to be included in a neighborhood, the values should be as well in order to maintain connection to the focal node.

Return type:

dict[str, Any]

napistu.network.neighborhoods.create_neighborhoods(s_ids: list[str], sbml_dfs: SBML_dfs, napistu_graph: Graph, network_type: str, order: int, top_n: int, verbose: bool = False) tuple[DataFrame, dict]

Create Neighborhoods

Create neighborhoods for a set of species and return a table containing all species in the neighborhood of each query s_id.

Parameters:
  • s_ids (list(str)) – create a neighborhood around each species

  • sbml_dfs (sbml_dfs_core.SBML_dfs) – network model

  • napistu_graph (igraph.Graph) – network associated with sbml_dfs

  • network_type (str) – downstream, upstream or hourglass (i.e., downstream and upstream)

  • order (int) – maximum number of steps from the focal node

  • top_n (int) – target number of upstream and downstream species to retain

  • verbose (bool) – extra reporting

Returns:

  • all_neighborhoods_df (pd.DataFrame) – A table containing all species in the neighborhood of each query s_id

  • neighborhood_dicts (dict) – Outputs from find_and_prune_neighborhoods for each s_id

napistu.network.neighborhoods.find_and_prune_neighborhoods(sbml_dfs: SBML_dfs, napistu_graph: Graph, compartmentalized_species: str | list[str], precomputed_distances: DataFrame | None = None, min_pw_size: int = 3, source_total_counts: Series | DataFrame | None = None, network_type: str = 'hourglass', order: int = 3, verbose: bool = True, top_n: int = 10) dict[str, Any]

Find and Prune Neighborhoods

Wrapper which combines find_neighborhoods() and prune_neighborhoods()

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model

  • napistu_graph (igraph.Graph) – A bipartite network connecting molecular species and reactions

  • compartmentalized_species ([str] or str) – Compartmentalized species IDs for neighborhood centers

  • precomputed_distances (pd.DataFrame or None) – If provided, an edgelist of origin->destination path weights and lengths

  • min_pw_size (int) – the minimum size of a pathway to be considered

  • source_total_counts (pd.Series | pd.DataFrame | None) – Optional. A series of the total counts of each source, or a pd.DataFrame with two columns: pathway_id and total_counts, as produced by sbml_dfs.get_source_total_counts(). If None, pathways will be selected by size rather than statistical enrichment.

  • network_type (str) – For directed networks, whether neighbors should be located “downstream” or “upstream” of each compartmentalized species. The “hourglass” option locates both upstream and downstream species.

  • order (int) – Max steps away from center node

  • verbose (bool) – Extra reporting

  • top_n (int) – How many neighboring molecular species should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately.

Returns:

A dict containing the neighborhood of each compartmentalized species. Each entry in the dict is a dict of the subgraph, vertices, and edges.

napistu.network.neighborhoods.find_neighborhoods(sbml_dfs: SBML_dfs, napistu_graph: Graph, compartmentalized_species: list[str], network_type: str = 'hourglass', order: int = 3, min_pw_size: int = 3, precomputed_neighbors: DataFrame | None = None, source_total_counts: Series | DataFrame | None = None, verbose: bool = True) dict

Find Neighborhood

Create a network composed of all species and reactions within N steps of each of a set of compartmentalized species.

Parameters:
  • sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model

  • napistu_graph (igraph.Graph) – A network connecting molecular species and reactions

  • compartmentalized_species ([str]) – Compartmentalized species IDs for neighborhood centers

  • network_type (str) – For directed networks, whether neighbors should be located “downstream” or “upstream” of each compartmentalized species. The “hourglass” option locates both upstream and downstream species.

  • order (int) – Max steps away from center node

  • precomputed_neighbors (pd.DataFrame or None) – If provided, a pre-filtered table of nodes nearby the compartmentalized species which will be used to skip on-the-fly neighborhood generation.

  • min_pw_size (int) – the minimum size of a pathway to be considered

  • source_total_counts (pd.Series | pd.DataFrame | None) – Optional. A series of the total counts of each source, or a pd.DataFrame with two columns: pathway_id and total_counts, as produced by sbml_dfs.get_source_total_counts(). If None, pathways will be selected by size rather than statistical enrichment.

  • verbose (bool) – Extra reporting

Returns:

A dict containing the neighborhood of each compartmentalized species. Each entry in the dict is a dict of the subgraph, vertices, and edges.

napistu.network.neighborhoods.plot_neighborhood(neighborhood_graph: Graph, name_nodes: bool = False, plot_size: int = 1000, network_layout: str = 'drl') plot

Plot Neighborhood

Parameters:
  • neighborhood_graph (igraph.Graph) – An igraph network

  • name_nodes (bool) – Should nodes be named?

  • plot_size (int) – Plot width/height in pixels

  • network_layout (str) – Igraph network layout method

Returns:

An igraph plot

napistu.network.neighborhoods.prune_neighborhoods(neighborhoods: dict, top_n: int = 100) dict

Prune Neighborhoods

Take a possibly very large neighborhood around a set of focal nodes and prune to the most highly weighted nodes. Node weights are constructed as the sum of path weights from the focal node to each neighbor, so each pruned neighborhood will still be a single subnetwork.

Parameters:
  • neighborhoods (dict) – A dictionary of sc_id neighborhoods as produced by find_neighborhoods()

  • top_n (int) – How many neighbors should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately

Returns:

neighborhoods – Same structure as neighborhoods input

Return type:

dict