napistu.network.neighborhoods

Approaches to define the molecular neighborhoods around a compartmentalized species.

Public Functions

create_neighborhoods(s_ids, sbml_dfs, napistu_graph, network_type, order, top_n, verbose): Create neighborhoods for a set of species and return a table containing all species in each query s_ids neighborhood.
find_and_prune_neighborhoods(sbml_dfs, napistu_graph, compartmentalized_species, precomputed_distances, min_pw_size, source_total_counts, network_type, order, verbose, top_n): Find and prune neighborhoods for a set of species and return a dictionary containing the neighborhood of each compartmentalized species.
find_neighborhoods(sbml_dfs, napistu_graph, compartmentalized_species, network_type, order, min_pw_size, precomputed_neighbors, source_total_counts, verbose): Find neighborhoods for a set of species and return a dictionary containing the neighborhood of each compartmentalized species.

Functions

`add_vertices_uri_urls`(vertices, sbml_dfs)	Add URI URLs to neighborhood vertices DataFrame.
`create_neighborhood_dict_entry`(sc_id, ...[, ...])	Create Neighborhood Dict Entry
`create_neighborhoods`(s_ids, sbml_dfs, ...[, ...])	Create Neighborhoods
`find_and_prune_neighborhoods`(sbml_dfs, ...)	Find and Prune Neighborhoods
`find_neighborhoods`(sbml_dfs, napistu_graph, ...)	Find Neighborhood
`plot_neighborhood`(neighborhood_graph[, ...])	Plot Neighborhood
`prune_neighborhoods`(neighborhoods[, top_n])	Prune Neighborhoods

napistu.network.neighborhoods._build_final_result(vertices: DataFrame, edges: DataFrame, reaction_sources: DataFrame | None, neighborhood_path_entities: dict, napistu_graph: Graph, sbml_dfs) → dict[str, Any]

Build the final result dictionary with all required components.

Handles the final assembly of the neighborhood result, including adding reference URLs and creating the updated graph.

napistu.network.neighborhoods._build_raw_neighborhood_df(napistu_graph: Graph, compartmentalized_species: list[str], network_type: str, order: int, precomputed_neighbors: DataFrame | None = None) → DataFrame

napistu.network.neighborhoods._calculate_path_attrs(neighborhood_paths: list[list], edges: DataFrame, vertices: list, weight_var: str = 'weight') → tuple[DataFrame, dict[Any, set]]

Calculate Path Attributes

Return the vertices and path weights (sum of edge weights) for a list of paths.

Parameters:

neighborhood_paths (list) – List of lists of edge indices
edges (pd.DataFrame) – Edges with rows corresponding to entries in neighborhood_paths inner lists
vertices (list) – List of vertices corresponding to the ordering of neighborhood_paths
weight_var (str) – Variable in edges to use for scoring path weights

Returns:

path_attributes_df (pd.DataFrame) – A table containing attributes summarizing the path to each neighbor
neighborhood_path_entities (dict) – Dict mapping from each neighbor to the entities connecting it to the focal node
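
The path weight described above (the sum of edge weights along a path) can be illustrated with a minimal sketch in plain Python. This is not napistu's implementation: the helper name `summarize_paths` and the use of plain lists and dicts instead of DataFrames are assumptions made for illustration only.

```python
def summarize_paths(paths, edge_weights, vertices):
    """Sum edge weights along each path and record each path's length.

    paths: list of lists of edge indices (one inner list per neighbor)
    edge_weights: weights indexed by edge index (stand-in for an edges table)
    vertices: neighbor labels, in the same order as `paths`
    """
    summary = {}
    for vertex, path in zip(vertices, paths):
        summary[vertex] = {
            "path_weight": sum(edge_weights[i] for i in path),
            "path_length": len(path),
        }
    return summary


# Three edges chained together: focal -> A -> B -> C
edge_weights = [0.5, 1.0, 2.0]
paths = [[0], [0, 1], [0, 1, 2]]
vertices = ["A", "B", "C"]
path_summary = summarize_paths(paths, edge_weights, vertices)
```

Here each neighbor's entry holds the summed weight and the number of edges on its path, which mirrors the path_weight and path_length attributes named elsewhere in this module's documentation.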

napistu.network.neighborhoods._clean_disconnected_components(vertices: DataFrame, edges: DataFrame, reaction_sources: DataFrame | None, sc_id: str) → tuple[DataFrame, DataFrame, DataFrame | None]

Remove disconnected components and filter related data structures.

Handles the cleanup logic for removing nodes that couldn’t be reached from the focal node and updating all related data structures accordingly.
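
As a rough illustration of this kind of cleanup, the sketch below drops vertices that are not connected to a focal node and then filters the edge list to match. It is a plain-Python approximation, not the library's code: the helper name `keep_focal_component`, the tuple-based edge list, and the choice to ignore edge direction when testing connectivity are all assumptions.

```python
def keep_focal_component(vertices, edges, focal):
    """Drop vertices not connected to `focal`, plus any edges touching them."""
    adjacency = {v: set() for v in vertices}
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)  # assumption: direction is ignored here

    # Depth-first traversal outward from the focal node
    reachable = {focal}
    frontier = [focal]
    while frontier:
        node = frontier.pop()
        for nxt in adjacency[node] - reachable:
            reachable.add(nxt)
            frontier.append(nxt)

    kept_vertices = [v for v in vertices if v in reachable]
    kept_edges = [(s, d) for s, d in edges if s in reachable and d in reachable]
    return kept_vertices, kept_edges


vertices = ["S1", "R1", "S2", "S9", "R9"]
edges = [("S1", "R1"), ("R1", "S2"), ("S9", "R9")]
kept_vertices, kept_edges = keep_focal_component(vertices, edges, focal="S1")
```

The same reachable set could be used to filter any other table keyed by vertex, which is the role reaction_sources plays in the real function's signature.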

napistu.network.neighborhoods._create_neighborhood_dict_entry_logging(sc_id: str, one_neighborhood_df: DataFrame, sbml_dfs: SBML_dfs)

napistu.network.neighborhoods._find_neighbors(napistu_graph: Graph, compartmentalized_species: list[str], relationship: str, order: int = 3, precomputed_neighbors: DataFrame | None = None) → DataFrame

Find Neighbors

Identify the neighbors near each of the requested compartmentalized_species.

If ‘precomputed_neighbors’ are provided, neighbors will be summarized by reformatting this table. Otherwise, neighbors will be found on-the-fly using the igraph.neighborhood() method.
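
To make the idea of "neighbors within `order` steps" concrete, here is a minimal breadth-first sketch over a plain adjacency dict. It does not use igraph and is not how napistu performs the search; the helper name `neighbors_within`, the toy node labels, and the decision to exclude the focal node from the result are assumptions.

```python
from collections import deque


def neighbors_within(adjacency, start, order):
    """Return nodes reachable from `start` in at most `order` steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == order:
            continue  # do not expand past the step limit
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return {n for n in distance if n != start}


# Toy directed chain: S1 -> R1 -> S2 -> R2 -> S3
downstream = {"S1": ["R1"], "R1": ["S2"], "S2": ["R2"], "R2": ["S3"]}

# Reverse every edge to search upstream
upstream = {}
for src, targets in downstream.items():
    for tgt in targets:
        upstream.setdefault(tgt, []).append(src)

descendants = neighbors_within(downstream, "S2", order=2)
ancestors = neighbors_within(upstream, "S2", order=2)
```

Searching the original and the reversed adjacency separately corresponds to the descendants and ancestors relationships; combining both gives the "hourglass" view described for network_type.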

napistu.network.neighborhoods._find_neighbors_paths(neighborhood_graph: Graph, one_neighborhood_df: DataFrame, sc_id: str, edges: DataFrame) → tuple[DataFrame, dict[Any, set], DataFrame, dict[Any, set]]

Find shortest paths between the focal node and its neighbors in both directions.

This function calculates shortest paths from the focal node to its descendants (downstream) and ancestors (upstream) using igraph’s shortest path algorithms. It uses _calculate_path_attrs to compute path attributes including path weights, lengths, and polarity information.

Parameters:

neighborhood_graph (ig.Graph) – The igraph Graph object representing the neighborhood network
one_neighborhood_df (pd.DataFrame) – DataFrame containing neighborhood information with ‘relationship’ column indicating ‘descendants’ or ‘ancestors’ for each node
sc_id (str) – The compartmentalized species ID of the focal node
edges (pd.DataFrame) – DataFrame containing edge information with columns for ‘from’, ‘to’, weights, and link polarity

Returns:

downstream_path_attrs (pd.DataFrame) – DataFrame containing path attributes for downstream paths from focal node to descendants. Includes columns: neighbor, path_weight, path_length, net_polarity, final_from, final_to, node_orientation
downstream_entity_dict (dict[Any, set]) – Dictionary mapping each descendant neighbor to the set of entities (nodes) connecting it to the focal node
upstream_path_attrs (pd.DataFrame) – DataFrame containing path attributes for upstream paths from focal node to ancestors. Includes columns: neighbor, path_weight, path_length, net_polarity, final_from, final_to, node_orientation
upstream_entity_dict (dict[Any, set]) – Dictionary mapping each ancestor neighbor to the set of entities (nodes) connecting it to the focal node

napistu.network.neighborhoods._find_reactions_by_relationship(precomputed_neighbors, compartmentalized_species: list, sbml_dfs: SBML_dfs, relationship: str) → DataFrame | None

Find Reactions by Relationship

Based on an ancestor-descendant edgelist of compartmentalized species, find all reactions which involve two or more members.

Since we primarily care about paths between species, and reactions are mainly a means of connecting pairs of species, precomputed_distances are generated between pairs of species only. This also makes the problem feasible: the number of species is upper bounded at <100K, whereas the number of reactions is unbounded. Having a bound ensures that precomputed_distances can be calculated efficiently using matrix operations whose memory footprint scales as O(N^2).

napistu.network.neighborhoods._precompute_neighbors(compartmentalized_species: list[str], precomputed_distances: DataFrame, sbml_dfs: SBML_dfs, network_type: str = 'downstream', order: int = 3, top_n: int = 10) → DataFrame

Precompute Neighbors

Identify compartmentalized_species’ most tightly connected neighbors using parameters shared by the on-the-fly methods (order for identifying neighbors within N steps; top_n for identifying the lowest weight network paths between the focal node and each possible neighbor). This precomputation greatly speeds up neighborhood generation for highly connected species or densely connected networks, where naively creating a neighborhood within N steps could yield thousands of neighbors.

napistu.network.neighborhoods._process_path_information(neighborhood_graph: Graph, one_neighborhood_df: DataFrame, sc_id: str, edges: DataFrame, vertices: DataFrame) → tuple[DataFrame, DataFrame, dict]

Process shortest path information and merge with vertices/edges.

Handles the path-finding logic and attribute merging, keeping them out of the main function.

napistu.network.neighborhoods._prune_vertex_set(one_neighborhood: dict, top_n: int) → DataFrame

Prune Vertex Set

Filter a neighborhood to the lowest weight neighbors connected to the focal node. During this process upstream and downstream nodes are treated separately.

Parameters:

one_neighborhood (dict) – The neighborhood around a single compartmentalized species - one of the values in dict created by find_neighborhoods().
top_n (int) – How many neighboring molecular species should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately.

Returns:

vertices – the vertices in one_neighborhood with high weight neighbors removed.

Return type:

pd.DataFrame
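
The pruning rule described above (keep the top_n lowest weight neighbors, with upstream and downstream handled separately) can be sketched in plain Python as follows. This is an illustration only: the helper name `prune_lowest_weight` and the list-of-dicts input are assumptions, and any special handling of the focal node itself is omitted.

```python
def prune_lowest_weight(vertex_rows, top_n):
    """Keep the top_n lowest path_weight rows within each relationship group."""
    kept = []
    for relationship in ("ancestors", "descendants"):
        group = [r for r in vertex_rows if r["relationship"] == relationship]
        group.sort(key=lambda r: r["path_weight"])  # lowest weight first
        kept.extend(group[:top_n])
    return kept


rows = [
    {"name": "up_1", "relationship": "ancestors", "path_weight": 1.0},
    {"name": "up_2", "relationship": "ancestors", "path_weight": 4.0},
    {"name": "down_1", "relationship": "descendants", "path_weight": 3.0},
    {"name": "down_2", "relationship": "descendants", "path_weight": 0.5},
    {"name": "down_3", "relationship": "descendants", "path_weight": 2.0},
]
pruned_names = [r["name"] for r in prune_lowest_weight(rows, top_n=2)]
```

Because the limit is applied per group, an hourglass neighborhood pruned with top_n=2 can retain up to four neighbors: two upstream and two downstream.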

napistu.network.neighborhoods._validate_neighborhood_consistency(neighborhood: dict, sc_id: str) → None

Validate that a single neighborhood has consistent vertices, edges, and reaction_sources.

This reproduces the exact validation logic from the R add_sources_to_graph function.

napistu.network.neighborhoods.add_vertices_uri_urls(vertices: DataFrame, sbml_dfs: SBML_dfs) → DataFrame

Add URI URLs to neighborhood vertices DataFrame.

This function enriches a vertices DataFrame with URI URLs for both species and reactions. For species, it adds standard reference identifiers and Pharos IDs where available. For reactions, it adds reaction-specific URI URLs.

Parameters:

vertices (pd.DataFrame) – DataFrame containing neighborhood vertices with the following required columns: - NAPISTU_GRAPH_VERTICES.NAME: The name/identifier of each vertex - NAPISTU_GRAPH_VERTICES.NODE_TYPE: The type of node, either NAPISTU_GRAPH_NODE_TYPES.SPECIES or NAPISTU_GRAPH_NODE_TYPES.REACTION
sbml_dfs (sbml_dfs_core.SBML_dfs) – Pathway model including species, compartmentalized species, reactions and ontologies

Returns:

Input vertices DataFrame enriched with URI URL columns: - For species: standard reference identifier URLs and Pharos IDs - For reactions: reaction-specific URI URLs - Empty strings for missing URLs

Return type:

pd.DataFrame

Raises:

ValueError – If vertices DataFrame is empty (no rows)
TypeError – If the output is not a pandas DataFrame
ValueError – If the output row count doesn’t match the input row count

Notes

Species vertices are merged with compartmentalized_species to get s_id mappings
Reaction vertices are processed directly using their names
Missing URLs are filled with empty strings
The function preserves the original row order and count

napistu.network.neighborhoods.create_neighborhood_dict_entry(sc_id: str, neighborhood_df: DataFrame, sbml_dfs: SBML_dfs, napistu_graph: Graph, min_pw_size: int = 3, source_total_counts: Series | DataFrame | None = None, verbose: bool = False) → dict[str, Any]

Create Neighborhood Dict Entry

Generate a summary of a compartmentalized species’ neighborhood.

Parameters:

sc_id (str) – A compartmentalized species id
neighborhood_df (pd.DataFrame) – A table of upstream and/or downstream neighbors of all compartmentalized species
sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model
napistu_graph (igraph.Graph) – A network connecting molecular species and reactions
min_pw_size (int) – the minimum size of a pathway to be considered
source_total_counts (pd.Series | pd.DataFrame) – Optional, A series of the total counts of each source or a pd.DataFrame with two columns: pathway_id and total_counts. As produced by sbml_dfs.get_source_total_counts()
verbose (bool) – Extra reporting?

Returns:

A dict containing:

graph (igraph.Graph) – subgraph of sc_id’s neighborhood
vertices (pd.DataFrame) – nodes in the neighborhood
edges (pd.DataFrame) – edges in the neighborhood
reaction_sources (pd.DataFrame) – models that reactions were derived from
neighborhood_path_entities (dict) – upstream and downstream dicts representing entities in paths. If the keys are to be included in a neighborhood, the values should be as well in order to maintain connection to the focal node.

Return type:

dict

napistu.network.neighborhoods.create_neighborhoods(s_ids: list[str], sbml_dfs: SBML_dfs, napistu_graph: Graph, network_type: str, order: int, top_n: int, verbose: bool = False) → tuple[DataFrame, dict]

Create Neighborhoods

Create neighborhoods for a set of species and return a table containing all species in each query s_ids neighborhood.

Parameters:

s_ids (list(str)) – create a neighborhood around each species
sbml_dfs (sbml_dfs_core.SBML_dfs) – network model
napistu_graph (igraph.Graph) – network associated with sbml_dfs
network_type (str) – downstream, upstream or hourglass (i.e., downstream and upstream)
order (int) – maximum number of steps from the focal node (e.g., 10)
top_n (int) – target number of upstream and downstream species to retain (e.g., 30)
verbose (bool) – extra reporting

Returns:

all_neighborhoods_df (pd.DataFrame) – A table containing all species in each query s_ids neighborhood
neighborhood_dicts (dict) – Outputs from find_and_prune_neighborhoods for each s_id

napistu.network.neighborhoods.find_and_prune_neighborhoods(sbml_dfs: SBML_dfs, napistu_graph: Graph, compartmentalized_species: str | list[str], precomputed_distances: DataFrame | None = None, min_pw_size: int = 3, source_total_counts: Series | DataFrame | None = None, network_type: str = 'hourglass', order: int = 3, verbose: bool = True, top_n: int = 10) → dict[str, Any]

Find and Prune Neighborhoods

Wrapper which combines find_neighborhoods() and prune_neighborhoods()

Parameters:

sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model
napistu_graph (igraph.Graph) – A bipartite network connecting molecular species and reactions
compartmentalized_species ([str] or str) – Compartmentalized species IDs for neighborhood centers
precomputed_distances (pd.DataFrame or None) – If provided, an edgelist of origin->destination path weights and lengths
min_pw_size (int) – the minimum size of a pathway to be considered
source_total_counts (pd.Series | pd.DataFrame | None) – Optional, A series of the total counts of each source or a pd.DataFrame with two columns: pathway_id and total_counts. As produced by sbml_dfs.get_source_total_counts(). If None, pathways will be selected by size rather than statistical enrichment.
network_type (str) – If the network is directed should neighbors be located “downstream”, or “upstream” of each compartmentalized species. The “hourglass” option locates both upstream and downstream species.
order (int) – Max steps away from center node
verbose (bool) – Extra reporting
top_n (int) – How many neighboring molecular species should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately.

Returns:

A dict containing the neighborhood of each compartmentalized species. Each entry in the dict is a dict of the subgraph, vertices, and edges.
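
A hedged usage sketch for this wrapper is shown below. The keyword arguments follow the signature documented above; the wrapper name `hourglass_neighborhoods` is hypothetical, and the sketch assumes the caller already has an SBML_dfs model, a matching graph, and valid compartmentalized species IDs. The import is placed inside the function so the sketch can be defined even where napistu is not installed.

```python
def hourglass_neighborhoods(sbml_dfs, napistu_graph, sc_ids, order=3, top_n=10):
    """Find and prune hourglass neighborhoods around the given sc_ids.

    Hypothetical convenience wrapper; the defaults mirror those shown in the
    documented signature of find_and_prune_neighborhoods.
    """
    # Imported lazily so defining this function does not require napistu.
    from napistu.network import neighborhoods

    return neighborhoods.find_and_prune_neighborhoods(
        sbml_dfs=sbml_dfs,
        napistu_graph=napistu_graph,
        compartmentalized_species=sc_ids,
        network_type="hourglass",
        order=order,
        top_n=top_n,
        verbose=False,
    )
```

Each value in the returned dict can then be inspected per focal species, for example by reading its vertices and edges entries.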

napistu.network.neighborhoods.find_neighborhoods(sbml_dfs: SBML_dfs, napistu_graph: Graph, compartmentalized_species: list[str], network_type: str = 'hourglass', order: int = 3, min_pw_size: int = 3, precomputed_neighbors: DataFrame | None = None, source_total_counts: Series | DataFrame | None = None, verbose: bool = True) → dict

Find Neighborhood

Create a network composed of all species and reactions within N steps of each of a set of compartmentalized species.

Parameters:

sbml_dfs (sbml_dfs_core.SBML_dfs) – A mechanistic molecular model
napistu_graph (igraph.Graph) – A network connecting molecular species and reactions
compartmentalized_species ([str]) – Compartmentalized species IDs for neighborhood centers
network_type (str) – If the network is directed should neighbors be located “downstream”, or “upstream” of each compartmentalized species. The “hourglass” option locates both upstream and downstream species.
order (int) – Max steps away from center node
precomputed_neighbors (pd.DataFrame or None) – If provided, a pre-filtered table of nodes nearby the compartmentalized species which will be used to skip on-the-fly neighborhood generation.
min_pw_size (int) – the minimum size of a pathway to be considered
source_total_counts (pd.Series | pd.DataFrame | None) – Optional, A series of the total counts of each source or a pd.DataFrame with two columns: pathway_id and total_counts. As produced by sbml_dfs.get_source_total_counts(). If None, pathways will be selected by size rather than statistical enrichment.
verbose (bool) – Extra reporting

Returns:

A dict containing the neighborhood of each compartmentalized species. Each entry in the dict is a dict of the subgraph, vertices, and edges.

napistu.network.neighborhoods.plot_neighborhood(neighborhood_graph: Graph, name_nodes: bool = False, plot_size: int = 1000, network_layout: str = 'drl') → plot

Plot Neighborhood

Parameters:

neighborhood_graph (igraph.Graph) – An igraph network
name_nodes (bool) – Should nodes be named?
plot_size (int) – Plot width/height in pixels
network_layout (str) – igraph network layout method

Returns:

An igraph plot

napistu.network.neighborhoods.prune_neighborhoods(neighborhoods: dict, top_n: int = 100) → dict

Prune Neighborhoods

Take a possibly very large neighborhood around a set of focal nodes and prune it to the most highly weighted nodes. Node weights are constructed as the sum of path weights from the focal node to each neighbor, so each pruned neighborhood will still be a single subnetwork.

Parameters:

neighborhoods (dict) – A dictionary of sc_id neighborhoods as produced by find_neighborhoods()
top_n (int) – How many neighbors should be retained? If the neighborhood includes both upstream and downstream connections (i.e., hourglass), this filter will be applied to both sets separately

Returns:

neighborhoods – Same structure as neighborhoods input

Return type:

dict