napistu.network.edgelist
Edgelist class for representing and validating graph edges.
Classes
- Edgelist
A class representing an edgelist with validation and merging capabilities.
- class napistu.network.edgelist.Edgelist(data: DataFrame, source_col: str | None = None, target_col: str | None = None)
Bases: object
A class representing an edgelist with validation and merging capabilities.
Wraps a pandas DataFrame containing edge information with standardized column names (source/target or from/to) plus any additional attributes.
- Parameters:
data (pd.DataFrame) – DataFrame with edge information. Must have either:
- ‘source’ and ‘target’ columns, or
- ‘from’ and ‘to’ columns
source_col (str, optional) – Name of the source column. If None, auto-detects from ‘source’ or ‘from’.
target_col (str, optional) – Name of the target column. If None, auto-detects from ‘target’ or ‘to’.
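The auto-detection rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not napistu's implementation; the helper name `detect_edge_columns` is hypothetical, and the error raised for unresolvable columns is an assumption.

```python
def detect_edge_columns(columns, source_col=None, target_col=None):
    """Hypothetical helper: resolve the source/target column names.

    Explicit names win; otherwise fall back to the documented conventions
    ('source' or 'from' for the source, 'target' or 'to' for the target).
    """
    columns = list(columns)
    if source_col is None:
        source_col = next((c for c in ("source", "from") if c in columns), None)
    if target_col is None:
        target_col = next((c for c in ("target", "to") if c in columns), None)

    # Assumption: an unresolvable or absent column is reported as a ValueError.
    for label, col in (("source", source_col), ("target", target_col)):
        if col is None or col not in columns:
            raise ValueError(f"Could not resolve the {label} column from {columns}")
    return source_col, target_col


print(detect_edge_columns(["source", "target", "weight"]))  # ('source', 'target')
print(detect_edge_columns(["from", "to"]))                  # ('from', 'to')
print(detect_edge_columns(["a", "b"], "a", "b"))            # ('a', 'b')
```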
- df
The underlying DataFrame containing edge data
- Type:
pd.DataFrame
- source_col
Name of the source column
- Type:
str
- target_col
Name of the target column
- Type:
str
Properties
- standard_merge_by (list[str])
List of supported merge_by values based on column conventions.
Public Methods
- ensure(data: Union[Edgelist, pd.DataFrame]) -> Edgelist
Ensure the input is an Edgelist.
- merge_edgelists(other: Union[Edgelist, pd.DataFrame], how: str = "inner", suffixes: tuple[str, str] = ("_x", "_y"), relationship: Optional[str] = None) -> Edgelist
Merge this edgelist with another edgelist.
- to_dataframe() -> pd.DataFrame
Return the underlying DataFrame.
Examples
>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> df = pd.DataFrame({
...     'source': ['A', 'B'],
...     'target': ['B', 'C'],
...     'weight': [1.0, 2.0]
... })
>>> el = Edgelist(df)
>>> el.validate_subset(graph)  # Validate against a graph
>>> merged = el.merge_edgelists(other_edgelist)  # Merge with another edgelist
- classmethod ensure(data: DataFrame | Edgelist) -> Edgelist
Ensure the input is an Edgelist.
- Parameters:
data (pd.DataFrame or Edgelist) – Data to ensure is an Edgelist.
- __init__(data: DataFrame, source_col: str | None = None, target_col: str | None = None)
- merge_edgelists(other: Edgelist | DataFrame, how: str = 'inner', suffixes: tuple[str, str] = ('_x', '_y'), relationship: str | None = None) -> Edgelist
Merge this edgelist with another edgelist.
This merges on the two edge key columns (source/target or from/to). If relationship is provided, the merge keys are validated via napistu.utils.pd_utils.validate_merge before merging.
- Parameters:
other (Edgelist or pd.DataFrame) – Other edgelist to merge with
how (str) – Type of merge: ‘inner’, ‘outer’, ‘left’, ‘right’ (default: ‘inner’)
suffixes (tuple[str, str]) – Suffixes to apply to overlapping column names
relationship (str, optional) – Expected relationship type to validate:
- ‘1:1’ (one-to-one): both keys are unique
- ‘1:m’ (one-to-many): left keys can be matched to multiple right keys, but each right key can only be matched to one left key
- ‘m:1’ (many-to-one): right keys can be matched to multiple left keys, but each left key can only be matched to one right key
- ‘m:m’ (many-to-many): both keys may have duplicates
- ‘1:0’ (one-to-zero-or-one): left keys can be matched to zero or more right keys, but each right key can only be matched to one left key
- ‘0:1’ (zero-or-one-to-one): right keys can be matched to zero or more left keys, but each left key can only be matched to one right key
If None, no validation is performed.
- Returns:
Merged edgelist
- Return type:
Edgelist
- Raises:
ValueError – If the edgelists have no overlapping supported merge_by conventions, or if relationship validation fails.
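To make the relationship options concrete, here is a plain-Python sketch of a cardinality check on the (source, target) merge keys. It does not reproduce napistu.utils.pd_utils.validate_merge; the function names are hypothetical, and only the four ‘1’/‘m’ combinations are covered.

```python
from collections import Counter


def observed_relationship(left_keys, right_keys):
    """Classify key cardinality as '1:1', '1:m', 'm:1', or 'm:m'.

    A side is '1' when every key on that side is unique, else 'm'.
    """
    left_unique = max(Counter(left_keys).values(), default=1) == 1
    right_unique = max(Counter(right_keys).values(), default=1) == 1
    return ("1" if left_unique else "m") + ":" + ("1" if right_unique else "m")


def check_relationship(left_keys, right_keys, expected):
    """Hypothetical validator: raise if a side expected to be unique is not."""
    if expected not in ("1:1", "1:m", "m:1", "m:m"):
        raise ValueError(f"Unsupported relationship in this sketch: {expected}")
    exp_left, exp_right = expected.split(":")
    obs_left, obs_right = observed_relationship(left_keys, right_keys).split(":")
    if (exp_left == "1" and obs_left == "m") or (exp_right == "1" and obs_right == "m"):
        raise ValueError(f"Expected {expected}, observed {obs_left}:{obs_right}")


# Edge keys are (source, target) tuples; the right side repeats A->B.
left = [("A", "B"), ("B", "C")]
right = [("A", "B"), ("A", "B"), ("B", "C")]
print(observed_relationship(left, right))  # 1:m
check_relationship(left, right, "1:m")     # passes silently
```

Requesting ‘1:1’ for the same inputs would raise a ValueError in this sketch, because the right-hand keys are not unique.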
- remove_duplicated_edges(keep: str = 'first', inplace: bool = False) -> Edgelist | None
Remove duplicate edges from the edgelist.
For edges with the same (source, target) pair, keeps only one based on the keep parameter. Only considers source and target columns, ignoring all other attributes.
- Parameters:
keep (str, default "first") – Which duplicate edge to keep:
- “first”: Keep the first occurrence (by DataFrame order)
- “last”: Keep the last occurrence (by DataFrame order)
inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.
- Returns:
If inplace=False, returns a new Edgelist with duplicate edges removed. If inplace=True, returns None.
- Return type:
Edgelist or None
Examples
>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el_cleaned = el.remove_duplicated_edges(keep="first")
>>> len(el_cleaned)  # One of the A->B duplicates is removed
2
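The keep semantics can also be illustrated without pandas. The sketch below works on a list of dicts and assumes the key columns are named ‘source’ and ‘target’; `drop_duplicate_edges` is a hypothetical stand-in, not the library method.

```python
def drop_duplicate_edges(edges, keep="first"):
    """Keep one row per (source, target) pair, ignoring other attributes."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # For keep='last', scan from the end so the last occurrence is seen first.
    ordered = edges if keep == "first" else list(reversed(edges))
    seen = set()
    kept = []
    for edge in ordered:
        key = (edge["source"], edge["target"])
        if key not in seen:
            seen.add(key)
            kept.append(edge)
    # Restore the original row order of the retained edges.
    return kept if keep == "first" else list(reversed(kept))


edges = [
    {"source": "A", "target": "B", "weight": 1.0},
    {"source": "B", "target": "C", "weight": 2.0},
    {"source": "A", "target": "B", "weight": 1.5},
]
print([e["weight"] for e in drop_duplicate_edges(edges, keep="first")])  # [1.0, 2.0]
print([e["weight"] for e in drop_duplicate_edges(edges, keep="last")])   # [2.0, 1.5]
```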
- remove_reciprocal_edges(keep: str = 'first', inplace: bool = False) -> Edgelist | None
Remove reciprocal edges from the edgelist.
For pairs of edges (A, B) and (B, A), keeps only one direction based on the keep parameter.
- Parameters:
keep (str, default "first") – Which edge to keep when a reciprocal pair is found:
- “first”: Keep the first edge encountered (by DataFrame order)
- “lexicographic”: Keep the edge where source < target (lexicographically)
inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.
- Returns:
If inplace=False, returns a new Edgelist with reciprocal edges removed. If inplace=True, returns None.
- Return type:
Edgelist or None
Examples
>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el_cleaned = el.remove_reciprocal_edges(keep="first")
>>> len(el_cleaned)  # One of A->B or B->A is removed
2
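The difference between the two keep options shows up when the reversed edge comes first. The following plain-Python sketch works on (source, target) tuples and assumes the input has no duplicated edges; `drop_reciprocal_edges` is a hypothetical name, and the library's handling of edge cases may differ.

```python
def drop_reciprocal_edges(edges, keep="first"):
    """For each pair (A, B) / (B, A), keep a single direction."""
    if keep not in ("first", "lexicographic"):
        raise ValueError("keep must be 'first' or 'lexicographic'")
    present = set(edges)
    seen_pairs = set()
    kept = []
    for source, target in edges:
        # Self-loops are not treated as reciprocal in this sketch.
        is_reciprocal = source != target and (target, source) in present
        if not is_reciprocal:
            kept.append((source, target))
        elif keep == "first":
            pair = frozenset((source, target))
            if pair not in seen_pairs:
                seen_pairs.add(pair)
                kept.append((source, target))
        elif source < target:  # keep == "lexicographic"
            kept.append((source, target))
    return kept


edges = [("B", "A"), ("A", "B"), ("C", "D")]
print(drop_reciprocal_edges(edges, keep="first"))          # [('B', 'A'), ('C', 'D')]
print(drop_reciprocal_edges(edges, keep="lexicographic"))  # [('A', 'B'), ('C', 'D')]
```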
- to_dataframe() -> DataFrame
Return the underlying DataFrame.
- Returns:
The edgelist DataFrame
- Return type:
pd.DataFrame
- validate_subset(graph: Graph, validate: str = 'both', edgelist_name: str = 'edgelist', graph_name: str = 'graph') -> None
Validate that this edgelist is a subset of the graph’s edges.
The merge_by convention is automatically determined from the edgelist’s source/target column conventions:
- If using ‘source’/‘target’ columns, validates against vertex indices
- If using ‘from’/‘to’ columns, validates against vertex names
- For custom columns, validates against vertex names
- Parameters:
graph (Graph) – Graph to validate against.
validate (str) – Entities to validate: ‘vertices’, ‘edges’, or ‘both’. If ‘both’, validates both vertices and edges.
edgelist_name (str) – Name to use for edgelist in error messages.
graph_name (str) – Name to use for graph in error messages.
- Raises:
ValueError – If the edgelist contains vertices or edges not in the graph, or if the source/target columns are not edge attributes in the graph.
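Conceptually, the subset check reduces to set differences on vertices and edges. The sketch below uses plain Python collections in place of a Graph object and treats edges as directed; both choices are assumptions, and `check_edge_subset` is a hypothetical name rather than part of napistu.

```python
def check_edge_subset(edgelist_edges, graph_vertices, graph_edges, validate="both"):
    """Raise ValueError if the edgelist uses vertices or edges absent from the graph."""
    if validate not in ("vertices", "edges", "both"):
        raise ValueError("validate must be 'vertices', 'edges', or 'both'")
    if validate in ("vertices", "both"):
        used = {vertex for edge in edgelist_edges for vertex in edge}
        missing_vertices = used - set(graph_vertices)
        if missing_vertices:
            raise ValueError(f"Vertices not in graph: {sorted(missing_vertices)}")
    if validate in ("edges", "both"):
        missing_edges = set(edgelist_edges) - set(graph_edges)
        if missing_edges:
            raise ValueError(f"Edges not in graph: {sorted(missing_edges)}")


graph_vertices = ["A", "B", "C"]
graph_edges = [("A", "B"), ("B", "C")]

check_edge_subset([("A", "B")], graph_vertices, graph_edges)  # passes silently

try:
    # A and C are both known vertices, but A->C is not a graph edge.
    check_edge_subset([("A", "C")], graph_vertices, graph_edges)
except ValueError as err:
    print(err)
```

With validate="vertices" the second call would pass, since only vertex membership is checked.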
- property has_duplicated_edges: bool
Check if the edgelist contains duplicate edges.
Duplicate edges are multiple rows with the same (source, target) pair, regardless of other attributes.
- Returns:
True if duplicate edges exist, False otherwise.
- Return type:
bool
Examples
>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el.has_duplicated_edges  # A->B appears twice
True
- property has_reciprocal_edges: bool
Check if the edgelist contains reciprocal edges.
A reciprocal edge is a pair (A, B) where both (A, B) and (B, A) exist in the edgelist.
- Returns:
True if reciprocal edges exist, False otherwise.
- Return type:
bool
Examples
>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el.has_reciprocal_edges  # A->B and B->A both exist
True
- property standard_merge_by: str
Suggest a default merge_by value based on column conventions.
- Returns:
name for ‘source’/‘target’ style columns, index for ‘from’/‘to’ style columns, or a custom merge attribute based on source_col and target_col.
- Return type:
str