napistu.ingestion.obo

Functions for ingesting OBO files

Public Functions

create_go_ancestors_df

Create GO Ancestors DataFrame

create_go_parents_df

Create the GO Parents Table

create_parent_child_graph

Create Parent:Child Graph

download_go_basic_obo

Download the GO Basic OBO file

format_obo_dict_as_df

Format an OBO Dict as a DataFrame

read_obo_as_dict

Read OBO as Dictionary

Functions

create_go_ancestors_df(parent_child_graph)

Create GO Ancestors DataFrame

create_go_parents_df(go_basic_obo_df)

Create the GO Parents Table

create_parent_child_graph(go_parents_df)

Create Parent:Child Graph

download_go_basic_obo([local_obo_path])

Download an OBO file containing GO categories and their relations (but not the genes in each category).

format_obo_dict_as_df(obo_term_dict)

Format an OBO Dict as a DataFrame

read_obo_as_dict(local_obo_path)

Read OBO as Dictionary

napistu.ingestion.obo._find_obo_attrib_dups(one_term) list

Identify attributes which are present multiple times.
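
As a rough illustration of what "present multiple times" means here, the sketch below counts attribute names in one term. It assumes a term is a sequence of (attribute, value) tuples; that input shape, the function name `find_duplicated_attributes`, and the `EX:` identifiers are hypothetical and are not taken from napistu.

```python
from collections import Counter


def find_duplicated_attributes(term_entries):
    """Return attribute names that occur more than once in one term."""
    counts = Counter(attribute for attribute, _ in term_entries)
    # Counter keeps first-seen order, so the result order is deterministic.
    return [attribute for attribute, n in counts.items() if n > 1]


# Toy term with made-up identifiers; "is_a" appears twice.
example_term = [
    ("id", "EX:0000003"),
    ("name", "toy leaf"),
    ("is_a", "EX:0000001 ! toy root"),
    ("is_a", "EX:0000002 ! toy branch"),
]
duplicated = find_duplicated_attributes(example_term)
```

Attributes flagged this way are the ones that likely need to be collected into a list rather than stored as a single value.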

napistu.ingestion.obo._format_entry_tuple(line_str: str) tuple | None

Split and return a colon-separated tuple.
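
A minimal sketch of this kind of split, assuming each OBO line has the form `key: value` and that lines without a colon yield `None` (suggested by the `tuple | None` return type). The name `split_entry_line` is hypothetical and the private function may behave differently.

```python
def split_entry_line(line_str):
    """Split 'key: value' on the first colon; return None if there is no colon."""
    if ":" not in line_str:
        return None
    # Split only once so that values containing colons (such as GO IDs) stay intact.
    key, value = line_str.split(":", 1)
    return key.strip(), value.strip()


id_entry = split_entry_line("id: GO:0005634")
stanza_header = split_entry_line("[Term]")
```

Splitting on the first colon only matters because GO identifiers themselves contain a colon.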

napistu.ingestion.obo._isa_str_list_to_dict_list(isa_list: list) list[dict[str, Any]]

Split parent-child relationships from individual strings to dictionaries where parent and child are separated.
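
The sketch below shows one way such a split could look. It assumes each string follows the OBO `is_a` convention `<parent id> ! <parent name>` and that the output keys are `parent_id` and `parent_name`, mirroring the columns of go_parents_df; the function name `split_isa_strings` is hypothetical.

```python
def split_isa_strings(isa_list):
    """Turn '<id> ! <name>' strings into dicts with parent_id and parent_name."""
    records = []
    for isa_str in isa_list:
        # partition returns the text before and after the first " ! " separator.
        parent_id, _, parent_name = isa_str.partition(" ! ")
        records.append(
            {"parent_id": parent_id.strip(), "parent_name": parent_name.strip()}
        )
    return records


parents = split_isa_strings(["GO:0005575 ! cellular_component"])
```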

napistu.ingestion.obo._reformat_obo_entry_as_dict(one_term, degenerate_attribs) dict

napistu.ingestion.obo.create_go_ancestors_df(parent_child_graph: Graph) DataFrame

Create GO Ancestors DataFrame

Parameters:

parent_child_graph (ig.Graph) – A DAG formed from parent-child relationships.

Returns:

go_ancestors_df – A table with:
- go_id: GO ID of a CC GO term of interest
- ancestor_id: GO CC ID of an ancestor (parent, parent of parent, …)

Return type:

pd.DataFrame
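
To make the notion of "ancestor" concrete, here is a dependency-free sketch that walks a child-to-parents mapping and emits (go_id, ancestor_id) pairs. It is not the napistu implementation, which takes an ig.Graph; the mapping, the function name `collect_ancestors`, and the `EX:` identifiers are hypothetical.

```python
def collect_ancestors(parents_by_child):
    """Return (go_id, ancestor_id) pairs for every term in a child -> parents map."""
    rows = []
    for go_id in parents_by_child:
        seen = set()
        stack = list(parents_by_child[go_id])
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue  # a DAG can reach the same ancestor by several paths
            seen.add(ancestor)
            stack.extend(parents_by_child.get(ancestor, []))
        rows.extend((go_id, ancestor) for ancestor in sorted(seen))
    return rows


# Toy chain: EX:3 is a child of EX:2, which is a child of EX:1.
toy_parents = {"EX:3": ["EX:2"], "EX:2": ["EX:1"], "EX:1": []}
ancestor_rows = collect_ancestors(toy_parents)
```

With python-igraph, a comparable traversal is available through `Graph.subcomponent` with a directed `mode`; which direction applies depends on how the edges in parent_child_graph are oriented.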

napistu.ingestion.obo.create_go_parents_df(go_basic_obo_df: DataFrame) DataFrame

Create the GO Parents Table

Reformat a table of GO attributes into a table of child-parent relationships.

Parameters:

go_basic_obo_df (pd.DataFrame) – Table generated from parsing go-basic.obo with obo.format_obo_dict_as_df

Returns:

go_parents_df – A table with:
- parent_id: GO ID of parent (from an is-a entry)
- parent_name: common name of parent (from an is-a entry)
- child_id: GO ID from the index

Return type:

pd.DataFrame

Examples

>>> go_basic_obo_df = obo.format_obo_dict_as_df(obo.read_obo_as_dict(GO_OBO_DEFS.GO_BASIC_LOCAL_TMP))
>>> go_parents_df = obo.create_go_parents_df(go_basic_obo_df)
>>> go_parents_df.head()
    parent_id  parent_name    child_id
0  GO:0005575      nucleus  GO:0005654
1  GO:0005575      nucleus  GO:0005667
2  GO:0005575      nucleus  GO:0005674
3  GO:0005575      nucleus  GO:0005681

napistu.ingestion.obo.create_parent_child_graph(go_parents_df: DataFrame) Graph

Create Parent:Child Graph

Format the Simple GO CC Ontology as a Directed Acyclic Graph (DAG).

Parameters:

go_parents_df (pd.DataFrame) – A table with:
- parent_id: GO ID of parent (from an is-a entry)
- parent_name: common name of parent (from an is-a entry)
- child_id: GO ID from the index

Returns:

parent_child_graph – A DAG formed from parent-child relationships.

Return type:

ig.Graph

napistu.ingestion.obo.download_go_basic_obo(local_obo_path: str = '/tmp/go-basic.obo') None

Download an OBO file containing GO categories and their relations (but not the genes in each category).

Parameters:

local_obo_path (str) – Local path where the downloaded OBO file is saved.

Return type:

None

Raises:

FileNotFoundError – If the OBO file was not found after trying to download from the URL.
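
The download-then-verify pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not napistu's code: the function name `download_to_path` and its `url` parameter are hypothetical, and the demonstration uses a file:// URL so that it runs without network access.

```python
import os
import pathlib
import tempfile
import urllib.request


def download_to_path(url, local_path):
    """Fetch `url` into `local_path`; raise if no file exists afterwards."""
    urllib.request.urlretrieve(url, local_path)
    if not os.path.isfile(local_path):
        raise FileNotFoundError(
            f"{local_path} was not found after downloading {url}"
        )


# Demonstrate with a local file:// URL instead of a real GO download.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = pathlib.Path(tmp_dir) / "source.obo"
    source.write_text("format-version: 1.2\n")
    target = pathlib.Path(tmp_dir) / "copy.obo"
    download_to_path(source.as_uri(), str(target))
    copied_text = target.read_text()
```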

napistu.ingestion.obo.format_obo_dict_as_df(obo_term_dict: dict) DataFrame

Format an OBO Dict as a DataFrame

Reorganize a dictionary of tuples into a DataFrame.

Parameters:

obo_term_dict (dict) – Dictionary where keys are ids and values are tuples containing (attribute, value) pairs

Returns:

obo_df – A pd.DataFrame with one row per identifier and one column per unique attribute

Return type:

pd.DataFrame
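
The reshaping from (attribute, value) tuples to one row per identifier can be illustrated without pandas. The sketch below assumes that repeated attributes are gathered into a list; how napistu actually handles repeats is not stated here, and `pivot_terms` and the `EX:` identifiers are hypothetical.

```python
def pivot_terms(obo_term_dict):
    """Build {term_id: {attribute: value}} with the same attribute keys in every row."""
    columns = []
    for entries in obo_term_dict.values():
        for attribute, _ in entries:
            if attribute not in columns:
                columns.append(attribute)

    rows = {}
    for term_id, entries in obo_term_dict.items():
        row = {column: None for column in columns}
        for attribute, value in entries:
            if row[attribute] is None:
                row[attribute] = value
            elif isinstance(row[attribute], list):
                row[attribute].append(value)
            else:
                # Assumption: a second occurrence promotes the cell to a list.
                row[attribute] = [row[attribute], value]
        rows[term_id] = row
    return rows


toy_terms = {
    "EX:1": (("name", "root"),),
    "EX:2": (("name", "leaf"), ("is_a", "EX:1 ! root")),
}
table = pivot_terms(toy_terms)
```

A nested dictionary of this shape can be handed to `pandas.DataFrame.from_dict(table, orient="index")` to obtain one row per identifier.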

napistu.ingestion.obo.read_obo_as_dict(local_obo_path: str) dict

Read OBO as Dictionary

The Open Biological and Biomedical Ontologies (OBO) format is a standard format for representing ontologies. Many OBO parsers exist, but because this package uses the format only lightly and aims to minimize dependencies, it provides a few functions of its own for parsing standard OBO files.

Parameters:

local_obo_path (str) – Path to a local obo file.

Returns:

term_dict – Dictionary where keys are ids and values are tuples containing (attribute, value) pairs

Return type:

dict
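
For readers unfamiliar with the format, the sketch below parses a tiny OBO-style string into the dictionary shape described above. It is a simplified stand-in rather than napistu's parser: it assumes stanzas are separated by blank lines, keeps only `[Term]` stanzas, and uses the `id` line as the key; `parse_obo_text` and the `EX:` identifiers are hypothetical.

```python
def parse_obo_text(obo_text):
    """Parse [Term] stanzas into {id: ((attribute, value), ...)}."""
    term_dict = {}
    for stanza in obo_text.strip().split("\n\n"):
        lines = [line.strip() for line in stanza.splitlines() if line.strip()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and non-term stanzas
        term_id = None
        entries = []
        for line in lines[1:]:
            key, sep, value = line.partition(":")
            if not sep:
                continue  # not a 'key: value' line
            key, value = key.strip(), value.strip()
            if key == "id":
                term_id = value
            else:
                entries.append((key, value))
        if term_id is not None:
            term_dict[term_id] = tuple(entries)
    return term_dict


toy_obo = """format-version: 1.2

[Term]
id: EX:1
name: root

[Term]
id: EX:2
name: leaf
is_a: EX:1 ! root
"""
toy_term_dict = parse_obo_text(toy_obo)
```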