napistu.network.edgelist

Edgelist class for representing and validating graph edges.

Classes

Edgelist(data[, source_col, target_col])

A class representing an edgelist with validation and merging capabilities.

class napistu.network.edgelist.Edgelist(data: DataFrame, source_col: str | None = None, target_col: str | None = None)

Bases: object

A class representing an edgelist with validation and merging capabilities.

Wraps a pandas DataFrame containing edge information with standardized column names (source/target or from/to) plus any additional attributes.

Parameters:
  • data (pd.DataFrame) – DataFrame with edge information. Must have either ‘source’ and ‘target’ columns, or ‘from’ and ‘to’ columns.

  • source_col (str, optional) – Name of source column. If None, auto-detects from ‘source’ or ‘from’

  • target_col (str, optional) – Name of target column. If None, auto-detects from ‘target’ or ‘to’

df

The underlying DataFrame containing edge data

Type:

pd.DataFrame

source_col

Name of the source column

Type:

str

target_col

Name of the target column

Type:

str
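
The column auto-detection described above can be sketched as follows. This is a minimal illustration, not napistu's implementation: `detect_edge_columns` and `_pick` are hypothetical helpers, and both the precedence (‘source’ before ‘from’, ‘target’ before ‘to’) and the error raised for missing columns are assumptions.

```python
import pandas as pd


def _pick(df, explicit, candidates):
    """Return the explicit column name if given, else the first candidate present."""
    if explicit is not None:
        if explicit not in df.columns:
            raise ValueError(f"Column {explicit!r} not found in DataFrame")
        return explicit
    for name in candidates:
        if name in df.columns:
            return name
    raise ValueError(f"None of the columns {candidates} found in DataFrame")


def detect_edge_columns(df, source_col=None, target_col=None):
    """Hypothetical helper: resolve the source and target column names."""
    # Assumed precedence: 'source'/'target' is checked before 'from'/'to'.
    source = _pick(df, source_col, ("source", "from"))
    target = _pick(df, target_col, ("target", "to"))
    return source, target


from_to = pd.DataFrame({"from": ["A"], "to": ["B"]})
custom = pd.DataFrame({"upstream": ["A"], "downstream": ["B"]})

print(detect_edge_columns(from_to))
print(detect_edge_columns(custom, source_col="upstream", target_col="downstream"))
```

Passing explicit names bypasses detection entirely, which is how a DataFrame with custom column names would be wrapped.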

Properties

standard_merge_by

Suggested default merge_by value based on column conventions.

Type:

str

Public Methods

ensure(data: Union[Edgelist, pd.DataFrame]) -> Edgelist

Ensure the input is an Edgelist.

merge_edgelists(other: Union[Edgelist, pd.DataFrame], how: str = "inner", suffixes: tuple[str, str] = ("_x", "_y"), relationship: Optional[str] = None) -> Edgelist

Merge this edgelist with another edgelist.

to_dataframe() -> pd.DataFrame

Return the underlying DataFrame.

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> df = pd.DataFrame({
...     'source': ['A', 'B'],
...     'target': ['B', 'C'],
...     'weight': [1.0, 2.0]
... })
>>> el = Edgelist(df)
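>>> # `graph` and `other_edgelist` are assumed to exist already:
>>> # a Graph to validate against and a second Edgelist (or DataFrame)
>>> el.validate_subset(graph)  # Validate against a graph
>>> merged = el.merge_edgelists(other_edgelist)  # Merge with another edgelist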

classmethod ensure(data: DataFrame | Edgelist) -> Edgelist

Ensure the input is an Edgelist.

Parameters:

data (pd.DataFrame or Edgelist) – Input to return as an Edgelist.
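
The pattern behind `ensure` lets a function accept either a raw DataFrame or an already constructed object. The sketch below uses a hypothetical stand-in class, `MiniEdgelist`, rather than the real `Edgelist`; returning an existing instance unchanged and raising `TypeError` for other inputs are assumptions about reasonable behavior, not documented facts.

```python
import pandas as pd


class MiniEdgelist:
    """Hypothetical stand-in for Edgelist, used only to show the ensure pattern."""

    def __init__(self, data: pd.DataFrame):
        self.df = data

    @classmethod
    def ensure(cls, data):
        # Assumption: an existing instance is passed through unchanged.
        if isinstance(data, cls):
            return data
        # A bare DataFrame is wrapped in a new instance.
        if isinstance(data, pd.DataFrame):
            return cls(data)
        raise TypeError(f"Expected a DataFrame or {cls.__name__}, got {type(data).__name__}")


def count_edges(edges):
    """Accept either input type by normalizing it first."""
    return len(MiniEdgelist.ensure(edges).df)


frame = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"]})
wrapped = MiniEdgelist.ensure(frame)

print(count_edges(frame), count_edges(wrapped))
```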

__init__(data: DataFrame, source_col: str | None = None, target_col: str | None = None)

merge_edgelists(other: Edgelist | DataFrame, how: str = 'inner', suffixes: tuple[str, str] = ('_x', '_y'), relationship: str | None = None) -> Edgelist

Merge this edgelist with another edgelist.

This merges on the two edge key columns (source/target or from/to). If relationship is provided, the merge keys are validated via napistu.utils.pd_utils.validate_merge before merging.

Parameters:
  • other (Edgelist or pd.DataFrame) – Other edgelist to merge with

  • how (str) – Type of merge: ‘inner’, ‘outer’, ‘left’, ‘right’ (default: ‘inner’)

  • suffixes (tuple[str, str]) – Suffixes to apply to overlapping column names

  • relationship (str, optional) – Expected relationship type to validate. If None, no validation is performed.

      - ‘1:1’ (one-to-one): both keys are unique
      - ‘1:m’ (one-to-many): left keys can be matched to multiple right keys, but each right key can only be matched to one left key
      - ‘m:1’ (many-to-one): right keys can be matched to multiple left keys, but each left key can only be matched to one right key
      - ‘m:m’ (many-to-many): both keys may have duplicates
      - ‘1:0’ (one-to-zero-or-one): left keys can be matched to zero or more right keys, but each right key can only be matched to one left key
      - ‘0:1’ (zero-or-one-to-one): right keys can be matched to zero or more left keys, but each left key can only be matched to one right key

Returns:

Merged edgelist

Return type:

Edgelist

Raises:

ValueError –
  • If edgelists have no overlapping supported merge_by conventions
  • If relationship validation fails
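
To illustrate a merge on the two edge key columns together with a relationship check, the sketch below uses plain pandas. It relies on the `validate` argument of `DataFrame.merge` as a stand-in for `napistu.utils.pd_utils.validate_merge`, whose exact behavior is not shown here; pandas raises `MergeError` where `merge_edgelists` is documented to raise `ValueError`, and pandas has no equivalent of the ‘1:0’ and ‘0:1’ options.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({
    "source": ["A", "B"],
    "target": ["B", "C"],
    "weight": [1.0, 2.0],
})
# The A->B key appears twice on the right-hand side.
right = pd.DataFrame({
    "source": ["A", "A"],
    "target": ["B", "B"],
    "weight": [0.5, 0.7],
})

# Merge on both key columns; the overlapping 'weight' column gets suffixes.
merged = left.merge(right, on=["source", "target"], how="inner", suffixes=("_x", "_y"))

# A one-to-many check passes because the left keys are unique.
left.merge(right, on=["source", "target"], validate="1:m")

# A one-to-one check fails because the right keys are duplicated.
try:
    left.merge(right, on=["source", "target"], validate="1:1")
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False

print(merged)
print("1:1 holds:", one_to_one_ok)
```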

remove_duplicated_edges(keep: str = 'first', inplace: bool = False) -> Edgelist | None

Remove duplicate edges from the edgelist.

For edges with the same (source, target) pair, keeps only one based on the keep parameter. Only considers source and target columns, ignoring all other attributes.

Parameters:
  • keep (str, default "first") – Which duplicate edge to keep:

      - “first”: Keep the first occurrence (by DataFrame order)
      - “last”: Keep the last occurrence (by DataFrame order)

  • inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.

Returns:

If inplace=False, returns a new Edgelist with duplicate edges removed. If inplace=True, returns None.

Return type:

Edgelist or None

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el_cleaned = el.remove_duplicated_edges(keep="first")
>>> len(el_cleaned)  # One of the A->B duplicates is removed
2
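
The effect of the keep parameter can be reproduced on a bare DataFrame with `DataFrame.drop_duplicates` restricted to the two key columns. This is a sketch of the documented semantics, assuming ‘source’/‘target’ column names; it is not necessarily how `remove_duplicated_edges` is implemented.

```python
import pandas as pd

edges = pd.DataFrame({
    "source": ["A", "B", "A"],
    "target": ["B", "C", "B"],
    "weight": [1.0, 2.0, 1.5],
})

# Only the key columns decide what counts as a duplicate; 'weight' is ignored.
keep_first = edges.drop_duplicates(subset=["source", "target"], keep="first")
keep_last = edges.drop_duplicates(subset=["source", "target"], keep="last")

print(keep_first)  # A->B keeps the row with weight 1.0
print(keep_last)   # A->B keeps the row with weight 1.5
```

Because other attributes are ignored, the choice of keep determines which attribute values survive for a duplicated edge.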

remove_reciprocal_edges(keep: str = 'first', inplace: bool = False) -> Edgelist | None

Remove reciprocal edges from the edgelist.

For pairs of edges (A, B) and (B, A), keeps only one direction based on the keep parameter.

Parameters:
  • keep (str, default "first") – Which edge to keep when a reciprocal pair is found:

      - “first”: Keep the first edge encountered (by DataFrame order)
      - “lexicographic”: Keep the edge where source < target (lexicographically)

  • inplace (bool, default False) – If True, modify the edgelist in place. If False, return a new Edgelist.

Returns:

If inplace=False, returns a new Edgelist with reciprocal edges removed. If inplace=True, returns None.

Return type:

Edgelist or None

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el_cleaned = el.remove_reciprocal_edges(keep="first")
>>> len(el_cleaned)  # One of A->B or B->A is removed
2
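
The difference between the two keep options only shows when the first edge of a reciprocal pair is not the lexicographically ordered one. The sketch below is a hypothetical reimplementation of the documented behavior on a plain DataFrame; `drop_reciprocal` is not a napistu function, and leaving self-loops and exact duplicates untouched is an assumption.

```python
import pandas as pd


def drop_reciprocal(df, keep="first", source_col="source", target_col="target"):
    """Hypothetical helper: drop one direction of each (A, B) / (B, A) pair."""
    pairs = list(zip(df[source_col], df[target_col]))
    if keep == "first":
        kept = set()
        mask = []
        for s, t in pairs:
            # Drop an edge only if its reverse was already kept (self-loops stay).
            reverse_kept = s != t and (t, s) in kept
            mask.append(not reverse_kept)
            if not reverse_kept:
                kept.add((s, t))
    elif keep == "lexicographic":
        present = set(pairs)
        # Drop the direction with source > target when both directions exist.
        mask = [not ((t, s) in present and s > t) for s, t in pairs]
    else:
        raise ValueError(f"Unsupported keep value: {keep!r}")
    return df.loc[mask].reset_index(drop=True)


edges = pd.DataFrame({"source": ["B", "A", "C"], "target": ["A", "B", "D"]})

by_order = drop_reciprocal(edges, keep="first")
by_name = drop_reciprocal(edges, keep="lexicographic")

print(by_order)  # keeps B->A (seen first) and C->D
print(by_name)   # keeps A->B (source < target) and C->D
```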

to_dataframe() -> DataFrame

Return the underlying DataFrame.

Returns:

The edgelist DataFrame

Return type:

pd.DataFrame

validate_subset(graph: Graph, validate: str = 'both', edgelist_name: str = 'edgelist', graph_name: str = 'graph') -> None

Validate that this edgelist is a subset of the graph’s edges.

The merge_by convention is automatically determined from the edgelist’s source/target column conventions:

  • If using ‘source’/‘target’ columns, validates against vertex indices
  • If using ‘from’/‘to’ columns, validates against vertex names
  • For custom columns, validates against vertex names

Parameters:
  • graph (Graph) – Graph to validate against.

  • validate (str) – Entities to validate: ‘vertices’, ‘edges’, or ‘both’. If ‘both’, validates both vertices and edges.

  • edgelist_name (str) – Name to use for edgelist in error messages.

  • graph_name (str) – Name to use for graph in error messages.

Raises:

ValueError –
  • If edgelist contains vertices or edges not in graph
  • If source/target columns are not edge attributes in the graph
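
The edge part of this check amounts to asking whether every (source, target) pair in the edgelist also appears among the graph's edges. The sketch below expresses that with a left merge and pandas' `indicator` flag. It assumes the graph's edges are already available as a DataFrame with ‘from’/‘to’ name columns; `missing_edges` and `check_edge_subset` are hypothetical helpers, and the error message format is illustrative only.

```python
import pandas as pd


def missing_edges(edgelist_df, graph_edges_df, keys=("from", "to")):
    """Hypothetical helper: return edgelist rows whose key pair is not in the graph."""
    keys = list(keys)
    # De-duplicate graph keys so the left merge cannot multiply edgelist rows.
    graph_keys = graph_edges_df[keys].drop_duplicates()
    checked = edgelist_df[keys].merge(graph_keys, on=keys, how="left", indicator=True)
    return checked.loc[checked["_merge"] == "left_only", keys]


def check_edge_subset(edgelist_df, graph_edges_df, edgelist_name="edgelist", graph_name="graph"):
    """Raise ValueError if any edge is absent from the graph's edges."""
    missing = missing_edges(edgelist_df, graph_edges_df)
    if not missing.empty:
        raise ValueError(f"{len(missing)} edge(s) in {edgelist_name} are not in {graph_name}")


graph_edges = pd.DataFrame({"from": ["A", "B", "C"], "to": ["B", "C", "D"]})
subset = pd.DataFrame({"from": ["A", "C"], "to": ["B", "D"]})
not_subset = pd.DataFrame({"from": ["A", "A"], "to": ["B", "D"]})

check_edge_subset(subset, graph_edges)  # passes silently
print(missing_edges(not_subset, graph_edges))
```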

property has_duplicated_edges: bool

Check if the edgelist contains duplicate edges.

Duplicate edges are multiple rows with the same (source, target) pair, regardless of other attributes.

Returns:

True if duplicate edges exist, False otherwise.

Return type:

bool

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'A'],
...     'target': ['B', 'C', 'B'],
...     'weight': [1.0, 2.0, 1.5]
... }))
>>> el.has_duplicated_edges  # A->B appears twice
True

property has_reciprocal_edges: bool

Check if the edgelist contains reciprocal edges.

A reciprocal edge is a pair (A, B) where both (A, B) and (B, A) exist in the edgelist.

Returns:

True if reciprocal edges exist, False otherwise.

Return type:

bool

Examples

>>> import pandas as pd
>>> from napistu.network.edgelist import Edgelist
>>> el = Edgelist(pd.DataFrame({
...     'source': ['A', 'B', 'C'],
...     'target': ['B', 'A', 'D']
... }))
>>> el.has_reciprocal_edges  # A->B and B->A both exist
True
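
One vectorized way to detect reciprocal pairs is to merge the edge table with a copy of itself whose source and target columns are swapped. The sketch below shows the idea on a plain DataFrame. `has_reciprocal` is a hypothetical helper, and excluding self-loops such as A->A is an assumption, since the definition above does not say how they are treated.

```python
import pandas as pd


def has_reciprocal(df, source_col="source", target_col="target"):
    """Hypothetical helper: True if both (A, B) and (B, A) occur for some A != B."""
    pairs = df[[source_col, target_col]].drop_duplicates()
    # Assumption: a self-loop is not counted as its own reciprocal.
    pairs = pairs[pairs[source_col] != pairs[target_col]]
    # Swap the column labels so each row describes the reversed edge.
    flipped = pairs.rename(columns={source_col: target_col, target_col: source_col})
    # Any row surviving the inner merge has its reverse in the table.
    return not pairs.merge(flipped, on=[source_col, target_col]).empty


with_pair = pd.DataFrame({"source": ["A", "B", "C"], "target": ["B", "A", "D"]})
without_pair = pd.DataFrame({"source": ["A", "B"], "target": ["B", "C"]})

print(has_reciprocal(with_pair), has_reciprocal(without_pair))
```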

property standard_merge_by: str

Suggest a default merge_by value based on column conventions.

Returns:

name for ‘source’/‘target’ style columns; index for ‘from’/‘to’ style columns; otherwise a custom merge attribute based on source_col and target_col.

Return type:

str