napistu.network.precompute
Functions
filter_precomputed_distances_top_n: Filter precomputed distances to only include the top-n pairs for each distance measure.
precompute_distances: Precompute distances between all pairs of species in a NapistuGraph network.
- napistu.network.precompute._calculate_distances_subset(napistu_graph: NapistuGraph, vs_to_partition: DataFrame, one_partition: DataFrame, weight_vars: list[str] = ['weight', 'weight_upstream'], max_steps: int | None = None) DataFrame
Calculate shortest path distances from a subset of vertices to all vertices.
This function computes both unweighted (hop count) and weighted shortest path distances from a subset of source vertices to all target vertices in the graph. Memory optimization is achieved through early filtering of invalid paths and deduplication of identical weight variables.
- Parameters:
napistu_graph (NapistuGraph) – The network graph containing vertices and weighted edges. Must be a subclass of igraph.Graph with edge attributes specified in weight_vars.
vs_to_partition (pd.DataFrame) – DataFrame containing all target vertices in the graph. Must have columns matching SBML_DFS.SC_ID and NAPISTU_GRAPH_VERTICES.NODE_TYPE. Represents the full set of potential destination nodes.
one_partition (pd.DataFrame) – DataFrame containing the subset of source vertices for this calculation. Must be a subset of vs_to_partition with the same column structure. Represents the source nodes for shortest path calculations.
weight_vars (list of str, default ['weight', 'weight_upstream']) – List of edge attribute names to use for weighted shortest path calculations. Each variable produces a corresponding ‘path_{weight_var}’ column in the output. Identical weight variables are automatically detected and deduplicated to avoid redundant calculations.
max_steps (int, optional) – Maximum number of hops to consider in shortest paths. If specified, paths longer than max_steps are filtered out during calculation rather than after, reducing memory usage. If None, no early filtering is applied.
- Returns:
DataFrame with shortest path information containing:
- sc_id_origin (str): Source vertex identifier from one_partition
- sc_id_dest (str): Destination vertex identifier from vs_to_partition
- path_length (int): Minimum number of hops in unweighted shortest path
- path_{weight_var} (float): Minimum weighted path cost for each weight variable specified. One column per entry in weight_vars. Values are np.nan for unreachable vertex pairs.
- Return type:
pd.DataFrame
Notes
Implementation optimizations:
- Early filtering: If max_steps is provided, only paths of at most max_steps hops with finite distances are retained, which significantly reduces memory usage for sparse or filtered networks.
- Weight deduplication: Identical weight variables (checked via np.array_equal) are detected automatically. Only unique weight calculations are performed, and the results are copied to the duplicate columns.
- Memory efficiency: Distance matrices are processed immediately after calculation, and masked arrays are used to avoid storing full NxM matrices for large graphs.
The function assumes that napistu_graph.distances() returns finite values for connected vertex pairs and np.inf for disconnected pairs. Self-loops (same origin and destination) are not filtered at this level.
Examples
>>> # Calculate distances from first 100 nodes to all nodes
>>> partition_0 = vs_to_partition.iloc[:100]
>>> distances = _calculate_distances_subset(
...     graph, vs_to_partition, partition_0,
...     weight_vars=['weight', 'weight_upstream'],
...     max_steps=5
... )
>>> distances.head()
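The idea of computing hop counts and weighted costs from a subset of sources, with early filtering by max_steps, can be sketched in plain Python. This is an illustration only and not Napistu's implementation, which the notes above say relies on napistu_graph.distances(). The adjacency-dict format and the helper names hop_distances, weighted_distances and distances_subset are assumptions made for this sketch.

```python
import heapq
from collections import deque


def hop_distances(adj, source, max_steps=None):
    """Breadth-first search returning {target: hops}, limited to max_steps hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_steps is not None and dist[node] >= max_steps:
            continue  # early filter: never expand past max_steps
        for nbr, _weight in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def weighted_distances(adj, source):
    """Dijkstra's algorithm returning {target: minimum summed edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in adj.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def distances_subset(adj, sources, max_steps=None):
    """One row per (source, reachable target) pair within max_steps hops."""
    rows = []
    for s in sources:
        hops = hop_distances(adj, s, max_steps)
        weights = weighted_distances(adj, s)
        for t, h in hops.items():
            rows.append({"sc_id_origin": s, "sc_id_dest": t,
                         "path_length": h, "path_weight": weights[t]})
    return rows


# Hypothetical toy graph: node -> list of (neighbor, weight)
adj = {"A": [("B", 1.0), ("C", 5.0)], "B": [("C", 1.0)], "C": [("D", 2.0)], "D": []}
rows = distances_subset(adj, ["A"], max_steps=2)
```

In this toy graph, C is one hop from A, but the cheapest weighted route goes through B, so the hop count and the weighted cost describe different paths. The origin-to-itself pair is kept, matching the note that self-loops are not filtered at this level.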
- napistu.network.precompute._filter_precomputed_distances(precomputed_distances: DataFrame, max_score_q: float = 1, path_weight_vars: list[str] = ['path_weight', 'path_weight_upstream']) DataFrame
Filter precomputed distances by maximum steps and/or by score quantile, retaining only low scores.
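A minimal sketch of quantile-based score filtering follows. It is not the library's code: the nearest-rank quantile definition, the choice to mask values above the cutoff with None, and the choice to drop rows whose path weights are all masked are assumptions made for illustration. The names quantile_cutoff and filter_by_quantile are hypothetical.

```python
import math


def quantile_cutoff(values, q):
    """Nearest-rank quantile: smallest value with at least a q share at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def filter_by_quantile(rows, path_weight_vars, max_score_q=1.0):
    """Mask scores above the per-variable cutoff; drop rows with no scores left."""
    cutoffs = {
        var: quantile_cutoff([row[var] for row in rows], max_score_q)
        for var in path_weight_vars
    }
    kept = []
    for row in rows:
        masked = dict(row)
        for var, cutoff in cutoffs.items():
            if masked[var] > cutoff:
                masked[var] = None  # assumption: high scores are masked, not kept
        if any(masked[var] is not None for var in path_weight_vars):
            kept.append(masked)
    return kept


# Hypothetical rows with a single path weight column
rows = [{"sc_id_dest": d, "path_weight": w}
        for d, w in [("B", 1.0), ("C", 2.0), ("D", 3.0), ("E", 4.0)]]
lower_half = filter_by_quantile(rows, ["path_weight"], max_score_q=0.5)
everything = filter_by_quantile(rows, ["path_weight"], max_score_q=1.0)
```

With the default max_score_q of 1 the cutoff is the maximum score, so nothing is removed.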
- napistu.network.precompute._find_unique_weight_vars(napistu_graph: NapistuGraph, weight_vars: list[str]) tuple[dict, dict]
Find unique weight variables to avoid redundant distance calculations.
- Returns:
- (unique_vars_map, representatives)
unique_vars_map: Maps weight_var -> representative_var for calculation
representatives: Maps representative_var -> list of vars it represents
- Return type:
tuple
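The two returned mappings can be illustrated with a small stand-alone sketch. It assumes edge attributes are available as plain Python lists in a dict (the real function reads them from a NapistuGraph) and compares them with list equality rather than np.array_equal. Whether a representative lists itself among the variables it represents is also an assumption here, as is the attribute name other_weight.

```python
def find_unique_weight_vars(edge_attrs, weight_vars):
    """Group weight variables whose per-edge values are identical."""
    unique_vars_map = {}   # weight_var -> representative_var
    representatives = {}   # representative_var -> list of vars it represents
    for var in weight_vars:
        for rep in representatives:
            if edge_attrs[var] == edge_attrs[rep]:
                # Same values as an earlier variable: reuse its calculation
                unique_vars_map[var] = rep
                representatives[rep].append(var)
                break
        else:
            # No match found: this variable becomes its own representative
            unique_vars_map[var] = var
            representatives[var] = [var]
    return unique_vars_map, representatives


# Hypothetical edge attributes for a three-edge graph
edge_attrs = {
    "weight": [1.0, 2.0, 0.5],
    "weight_upstream": [1.0, 2.0, 0.5],
    "other_weight": [3.0, 1.0, 1.0],
}
unique_vars_map, representatives = find_unique_weight_vars(
    edge_attrs, ["weight", "weight_upstream", "other_weight"]
)
```

Only the keys of representatives need a shortest-path calculation. Results for the remaining variables can be copied from their representative.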
- napistu.network.precompute._validate_precomputed_distances(precomputed_distances: DataFrame) None
Validate the precomputed distances DataFrame.
This function checks the following:
1. All required variables are present.
2. All weight variables are numeric.
3. No missing values are present.
4. No negative weights are present.
5. No infinite weights are present.
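The five checks can be sketched on plain dict rows as follows. This is a hedged illustration rather than the library's code: the required column names are taken from the return values documented on this page, the helper find_problems is hypothetical, and it reports problems as strings, whereas the real function returns None and its failure behavior is not documented here.

```python
import math
from numbers import Real

# Assumption: identifier and path length columns documented elsewhere on this page
REQUIRED_VARS = ["sc_id_origin", "sc_id_dest", "path_length"]


def find_problems(rows, weight_vars):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, row in enumerate(rows):
        for var in REQUIRED_VARS + weight_vars:
            if var not in row:
                problems.append(f"row {i}: missing variable {var}")
        for var in weight_vars:
            if var not in row:
                continue
            value = row[var]
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: missing value in {var}")
            elif isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"row {i}: non-numeric value in {var}")
            elif math.isinf(value):
                problems.append(f"row {i}: infinite weight in {var}")
            elif value < 0:
                problems.append(f"row {i}: negative weight in {var}")
    return problems


good = [{"sc_id_origin": "A", "sc_id_dest": "B", "path_length": 1, "path_weight": 0.5}]
bad = [{"sc_id_origin": "A", "sc_id_dest": "B", "path_length": 1, "path_weight": -1.0},
       {"sc_id_origin": "A", "sc_id_dest": "C", "path_length": 2, "path_weight": float("inf")}]
```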
- napistu.network.precompute.filter_precomputed_distances_top_n(precomputed_distances, top_n=50)
Filter precomputed distances to only include the top-n pairs for each distance measure.
- Parameters:
precomputed_distances (pd.DataFrame) – Precomputed distances.
top_n (int, optional) – Top-n pairs to include for each distance measure.
- Returns:
Filtered precomputed distances.
- Return type:
pd.DataFrame
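One plausible reading of "top-n pairs for each distance measure" is sketched below: for each measure, keep the n rows with the smallest values, then return the union across measures. That interpretation is an assumption and the library may group or rank differently. The helper filter_top_n and the dict-row format are hypothetical.

```python
import heapq


def filter_top_n(rows, distance_vars, top_n=50):
    """Keep rows that rank among the top_n smallest values of any distance measure."""
    keep = set()
    for var in distance_vars:
        # Indices of the top_n rows with the smallest value for this measure
        best = heapq.nsmallest(top_n, range(len(rows)), key=lambda i: rows[i][var])
        keep.update(best)
    # Preserve the original row order
    return [rows[i] for i in sorted(keep)]


# Hypothetical precomputed distances
rows = [
    {"sc_id_origin": "A", "sc_id_dest": "B", "path_length": 1, "path_weight": 5.0},
    {"sc_id_origin": "A", "sc_id_dest": "C", "path_length": 2, "path_weight": 0.5},
    {"sc_id_origin": "A", "sc_id_dest": "D", "path_length": 3, "path_weight": 4.0},
]
top = filter_top_n(rows, ["path_length", "path_weight"], top_n=1)
```

Here the pair ending at B wins on path_length and the pair ending at C wins on path_weight, so both survive while the pair ending at D is dropped.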
- napistu.network.precompute.precompute_distances(napistu_graph: NapistuGraph, max_steps: int | None = None, max_score_q: float = 1.0, partition_size: int = 1000, weight_vars: list[str] = ['weight', 'weight_upstream']) DataFrame
Precompute Distances between all pairs of species in a NapistuGraph network.
- Parameters:
napistu_graph (NapistuGraph) – A NapistuGraph network model (subclass of igraph.Graph)
max_steps (int, optional) – The maximum number of steps between a pair of species for its distance to be saved
max_score_q (float) – Retain scores up to the max_score_q quantile of all scores (smaller scores are better)
partition_size (int) – The number of species to process together when computing distances. Decreasing this value will lower the overall memory footprint of distance calculation.
weight_vars (list) – One or more variables defining edge weights to use when calculating weighted shortest paths. Shortest paths will be separately calculated with each type of weights and used to construct path weights named according to ‘path_{weight_var}’
- Returns:
A pd.DataFrame containing:
- sc_id_origin
- sc_id_dest
- path_length
- path_weight*: One variable will exist for each weight specified in ‘weight_vars’
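The role of partition_size can be illustrated with a short sketch: source vertices are processed in chunks, so only one chunk's distances are held in working memory before being appended to the result. The helper names and the toy distances_for callback are hypothetical, and this is not the library's implementation.

```python
def partition(items, partition_size):
    """Split a list into consecutive chunks of at most partition_size items."""
    return [items[i:i + partition_size] for i in range(0, len(items), partition_size)]


def precompute_in_partitions(vertices, partition_size, distances_for):
    """Compute distances chunk by chunk and concatenate the per-chunk results."""
    results = []
    for chunk in partition(vertices, partition_size):
        # distances_for stands in for a per-partition distance calculation
        results.extend(distances_for(chunk))
    return results


vertices = ["A", "B", "C", "D", "E"]


def toy_distances_for(chunk):
    """Toy stand-in: one row from each source in the chunk to every vertex."""
    return [{"sc_id_origin": s, "sc_id_dest": t} for s in chunk for t in vertices]


chunks = partition(vertices, 2)
all_rows = precompute_in_partitions(vertices, 2, toy_distances_for)
```

A smaller partition_size means more, smaller chunks, which matches the parameter description: peak memory falls because fewer sources are handled at once.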