napistu.network.edgelist

Edgelist class for representing and validating graph edges.

Classes

Edgelist: A class representing an edgelist with validation and merging capabilities.

class napistu.network.edgelist.Edgelist(data: DataFrame, source_col: str | None = None, target_col: str | None = None)

Bases: object

A class representing an edgelist with validation and merging capabilities.

Wraps a pandas DataFrame containing edge information with standardized column names (source/target or from/to) plus any additional attributes.

Parameters:

data (pd.DataFrame) – DataFrame with edge information. Must have either:
- ‘source’ and ‘target’ columns, or
- ‘from’ and ‘to’ columns
source_col (str, optional) – Name of source column. If None, auto-detects from ‘source’ or ‘from’
target_col (str, optional) – Name of target column. If None, auto-detects from ‘target’ or ‘to’
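
The auto-detection rule for source_col and target_col can be pictured with a short sketch. The helper name `detect_edge_columns` and the error it raises are assumptions made for illustration; this is not napistu's implementation, only one way the documented behavior could look.

```python
import pandas as pd

# Hypothetical helper (not part of napistu): an explicit name wins, otherwise
# fall back to the documented conventions for that column.
def detect_edge_columns(df, source_col=None, target_col=None):
    def pick(explicit, candidates):
        if explicit is not None:
            return explicit
        for name in candidates:
            if name in df.columns:
                return name
        raise ValueError(f"none of {candidates} found in DataFrame columns")

    return pick(source_col, ("source", "from")), pick(target_col, ("target", "to"))

edges = pd.DataFrame({"from": ["A", "B"], "to": ["B", "C"], "weight": [1.0, 2.0]})
detected = detect_edge_columns(edges)
print(detected)
```

Passing explicit names, for example `detect_edge_columns(df, "u", "v")`, skips detection entirely in this sketch.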

df

The underlying DataFrame containing edge data

Type: pd.DataFrame

source_col

Name of the source column

Type: str

target_col

Name of the target column

Type: str

Properties

standard_merge_by

Default merge_by value suggested by the column conventions.

Type: str

Public Methods

ensure(data: Union[Edgelist, pd.DataFrame]) -> Edgelist: Ensure the input is an Edgelist.

merge_edgelists(other: Union[Edgelist, pd.DataFrame], how: str = "inner", suffixes: tuple[str, str] = ("_x", "_y"), relationship: Optional[str] = None) -> Edgelist: Merge this edgelist with another edgelist.

to_dataframe() -> pd.DataFrame: Return the underlying DataFrame.

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> df = pd.DataFrame({
...     'source': ['A', 'B'],
...     'target': ['B', 'C'],
...     'weight': [1.0, 2.0]
... })
>>> el = Edgelist(df)
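>>> # `graph` (a Graph) and `other_edgelist` (an Edgelist or DataFrame)
>>> # are assumed to have been created already.
>>> el.validate_subset(graph)  # Validate against a graph
>>> merged = el.merge_edgelists(other_edgelist)  # Merge with another edgelist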

classmethod ensure(data: DataFrame | Edgelist) → Edgelist

Ensure the input is an Edgelist.

Parameters: data (pd.DataFrame or Edgelist) – Data to ensure is an Edgelist.

__init__(data: DataFrame, source_col: str | None = None, target_col: str | None = None)

merge_edgelists(other: Edgelist | DataFrame, how: str = 'inner', suffixes: tuple[str, str] = ('_x', '_y'), relationship: str | None = None) → Edgelist

Merge this edgelist with another edgelist.

This merges on the two edge key columns (source/target or from/to). If relationship is provided, the merge keys are validated via napistu.utils.pd_utils.validate_merge before merging.

Parameters:

other (Edgelist or pd.DataFrame) – Other edgelist to merge with
how (str) – Type of merge: ‘inner’, ‘outer’, ‘left’, ‘right’ (default: ‘inner’)
suffixes (tuple[str, str]) – Suffixes to apply to overlapping column names
relationship (str, optional) – Expected relationship type to validate:
- ‘1:1’ (one-to-one): both keys are unique
- ‘1:m’ (one-to-many): left keys can be matched to multiple right keys, but each right key can only be matched to one left key
- ‘m:1’ (many-to-one): right keys can be matched to multiple left keys, but each left key can only be matched to one right key
- ‘m:m’ (many-to-many): both keys may have duplicates
- ‘1:0’ (one-to-zero-or-one): left keys can be matched to zero or more right keys, but each right key can only be matched to one left key
- ‘0:1’ (zero-or-one-to-one): right keys can be matched to zero or more left keys, but each left key can only be matched to one right key
If None, no validation is performed.

Returns:

Merged edgelist

Return type:

Edgelist

Raises:

ValueError – If the edgelists have no overlapping supported merge_by conventions, or if relationship validation fails.
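
As a rough illustration of merging on the two edge key columns with a relationship check, the sketch below uses plain pandas. It relies on pandas' own `validate` argument as a stand-in; how napistu.utils.pd_utils.validate_merge works internally is not shown here, and the ‘1:0’ and ‘0:1’ options have no direct pandas equivalent.

```python
import pandas as pd

left = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"], "weight": [1.0, 2.0]})
right = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"], "weight": [0.5, 0.7]})

# Merge on the edge keys; the overlapping 'weight' column receives the suffixes.
# validate="1:1" makes pandas raise if either side repeats a (source, target) key.
merged = left.merge(
    right,
    on=["source", "target"],
    how="inner",
    suffixes=("_x", "_y"),
    validate="1:1",
)
print(list(merged.columns))

# Duplicating a key on the right breaks the 1:1 expectation. pandas raises
# MergeError, which is a subclass of ValueError.
right_dup = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    left.merge(right_dup, on=["source", "target"], validate="1:1")
    failed = False
except ValueError:
    failed = True
print(failed)
```

Checking the relationship before trusting the result is what keeps an unexpected duplicate key from silently multiplying rows in the merged edgelist.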

remove_duplicated_edges(keep: str = 'first', inplace: bool = False) → Edgelist | None

Remove duplicate edges from the edgelist.

For edges with the same (source, target) pair, keeps only one based on the keep parameter. Only considers source and target columns, ignoring all other attributes.

Parameters:

keep (str, default "first") – Which duplicate edge to keep:
- “first”: Keep the first occurrence (by DataFrame order)
- “last”: Keep the last occurrence (by DataFrame order)
inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.

Returns:

If inplace=False, returns a new Edgelist with duplicate edges removed. If inplace=True, returns None.

Return type:

Edgelist or None

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el_cleaned = el.remove_duplicated_edges(keep="first")
>>> len(el_cleaned)  # One of the A->B duplicates is removed
2
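
To make "only considers source and target columns" concrete, here is a plain-pandas sketch of the same idea. It is an assumed equivalent built on `drop_duplicates`, not a description of how the method is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["A", "B", "A"],
    "target": ["B", "C", "B"],
    "weight": [1.0, 2.0, 1.5],
})

key_cols = ["source", "target"]

# Rows count as duplicates when the key columns match, even if 'weight' differs.
has_dupes = bool(df.duplicated(subset=key_cols).any())

# keep="first" retains the A->B row with weight 1.0; keep="last" retains 1.5.
kept_first = df.drop_duplicates(subset=key_cols, keep="first")
kept_last = df.drop_duplicates(subset=key_cols, keep="last")

print(kept_first["weight"].tolist())
print(kept_last["weight"].tolist())
```

Because the other attributes are ignored, the choice of keep decides which attribute values survive for a duplicated edge.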

remove_reciprocal_edges(keep: str = 'first', inplace: bool = False) → Edgelist | None

Remove reciprocal edges from the edgelist.

For pairs of edges (A, B) and (B, A), keeps only one direction based on the keep parameter.

Parameters:

keep (str, default "first") – Which edge to keep when a reciprocal pair is found:
- “first”: Keep the first edge encountered (by DataFrame order)
- “lexicographic”: Keep the edge where source < target (lexicographically)
inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.

Returns:

If inplace=False, returns a new Edgelist with reciprocal edges removed. If inplace=True, returns None.

Return type:

Edgelist or None

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el_cleaned = el.remove_reciprocal_edges(keep="first")
>>> len(el_cleaned)  # One of A->B or B->A is removed
2
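
The difference between the two keep options is easiest to see on an input where they disagree. The function below is a hypothetical re-implementation written for illustration; its name, and the choice to leave self-loops and exact duplicates untouched, are assumptions rather than documented napistu behavior.

```python
import pandas as pd

# Hypothetical sketch of reciprocal-edge removal on a plain DataFrame.
def drop_reciprocal(df, source_col="source", target_col="target", keep="first"):
    rows = list(zip(df[source_col], df[target_col]))
    if keep == "first":
        kept = set()
        mask = []
        for s, t in rows:
            # Drop a row only if its reverse direction was already kept.
            is_reverse = s != t and (t, s) in kept
            mask.append(not is_reverse)
            if not is_reverse:
                kept.add((s, t))
    elif keep == "lexicographic":
        pairs = set(rows)
        # Drop the direction with source > target when both directions exist.
        mask = [not (s > t and (t, s) in pairs) for s, t in rows]
    else:
        raise ValueError(f"unsupported keep value: {keep!r}")
    return df.loc[mask]

df = pd.DataFrame({"source": ["B", "A", "C"], "target": ["A", "B", "D"]})

first = drop_reciprocal(df, keep="first")
lexi = drop_reciprocal(df, keep="lexicographic")

print(first["source"].tolist())  # B->A comes first in row order, so it is kept
print(lexi["source"].tolist())   # A->B satisfies source < target, so it is kept
```

"first" depends on row order, while "lexicographic" gives the same result regardless of how the rows are sorted.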

to_dataframe() → DataFrame

Return the underlying DataFrame.

Returns: The edgelist DataFrame
Return type: pd.DataFrame

validate_subset(graph: Graph, validate: str = 'both', edgelist_name: str = 'edgelist', graph_name: str = 'graph') → None

Validate that this edgelist is a subset of the graph’s edges.

The merge_by convention is automatically determined from the edgelist’s source/target column conventions:
- If using ‘source’/‘target’ columns, validates against vertex indices
- If using ‘from’/‘to’ columns, validates against vertex names
- For custom columns, validates against vertex names

Parameters:

graph (Graph) – Graph to validate against.
validate (str) – Entities to validate: ‘vertices’, ‘edges’, or ‘both’. If ‘both’, validates both vertices and edges.
edgelist_name (str) – Name to use for edgelist in error messages.
graph_name (str) – Name to use for graph in error messages.

Raises:

ValueError – If the edgelist contains vertices or edges not in the graph, or if the source/target columns are not edge attributes in the graph.
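
The subset check can be sketched without a graph library by representing the graph as a DataFrame of its edges. Everything here is an assumption made for illustration: the function name `check_subset`, the use of ‘from’/‘to’ vertex names as keys, and the simplification that every graph vertex appears in at least one edge.

```python
import pandas as pd

# Hypothetical stand-in for validate_subset: the graph is represented only by
# a DataFrame of its edges, keyed by vertex name.
def check_subset(edges, graph_edges, validate="both", src="from", tgt="to"):
    if validate not in ("vertices", "edges", "both"):
        raise ValueError(f"unsupported validate value: {validate!r}")
    if validate in ("vertices", "both"):
        # Simplification: isolated graph vertices are not represented here.
        graph_vertices = set(graph_edges[src]) | set(graph_edges[tgt])
        missing = (set(edges[src]) | set(edges[tgt])) - graph_vertices
        if missing:
            raise ValueError(f"vertices not in graph: {sorted(missing)}")
    if validate in ("edges", "both"):
        graph_pairs = set(zip(graph_edges[src], graph_edges[tgt]))
        missing = set(zip(edges[src], edges[tgt])) - graph_pairs
        if missing:
            raise ValueError(f"edges not in graph: {sorted(missing)}")

graph_edges = pd.DataFrame({"from": ["A", "B", "C"], "to": ["B", "C", "A"]})
subset = pd.DataFrame({"from": ["A", "B"], "to": ["B", "C"]})
not_subset = pd.DataFrame({"from": ["A"], "to": ["C"]})  # vertices exist, edge does not

check_subset(subset, graph_edges)  # passes silently and returns None

try:
    check_subset(not_subset, graph_edges, validate="edges")
    edge_error = None
except ValueError as err:
    edge_error = str(err)
print(edge_error)
```

The last case shows why ‘vertices’ and ‘edges’ are separate options: A->C passes the vertex check but fails the edge check, because the graph only contains C->A.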

property has_duplicated_edges: bool

Check if the edgelist contains duplicate edges.

Duplicate edges are multiple rows with the same (source, target) pair, regardless of other attributes.

Returns: True if duplicate edges exist, False otherwise.
Return type: bool

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el.has_duplicated_edges  # A->B appears twice
True

property has_reciprocal_edges: bool

Check if the edgelist contains reciprocal edges.

A reciprocal edge is a pair (A, B) where both (A, B) and (B, A) exist in the edgelist.

Returns: True if reciprocal edges exist, False otherwise.
Return type: bool

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el.has_reciprocal_edges  # A->B and B->A both exist
True
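
One way to detect reciprocal pairs in plain pandas is to merge the key columns against a copy with the two columns swapped. This is a sketch under assumptions, not the property's actual implementation. In particular, excluding self-loops (A, A) from the result is a choice made here, not something the documentation specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["A", "B", "C"],
    "target": ["B", "A", "D"],
})

# Unique (source, target) keys, and the same keys with direction reversed.
keys = df[["source", "target"]].drop_duplicates()
swapped = keys.rename(columns={"source": "target", "target": "source"})

# An edge whose reverse also exists survives the inner merge.
matches = keys.merge(swapped, on=["source", "target"])
matches = matches[matches["source"] != matches["target"]]  # ignore self-loops

has_reciprocal = not matches.empty
reciprocal_pairs = sorted(zip(matches["source"], matches["target"]))
print(has_reciprocal, reciprocal_pairs)
```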

property standard_merge_by: str

Suggest a default merge_by value based on column conventions.

Returns: ‘index’ for ‘source’/‘target’ style columns, ‘name’ for ‘from’/‘to’ style columns, or a custom merge attribute based on source_col and target_col.
Return type: str