napistu.ingestion.obo
Functions for ingesting OBO files
Public Functions
- create_go_ancestors_df
Create GO Ancestors DataFrame
- create_go_parents_df
Create the GO Parents Table
- create_parent_child_graph
Create Parent:Child Graph
- download_go_basic_obo
Download the GO Basic OBO file
- format_obo_dict_as_df
Format an OBO Dict as a DataFrame
- read_obo_as_dict
Read OBO as Dictionary
Functions
- napistu.ingestion.obo._find_obo_attrib_dups(one_term) → list
Identify attributes which are present multiple times.
- napistu.ingestion.obo._format_entry_tuple(line_str: str) → tuple | None
Split and return a colon-separated tuple.
- napistu.ingestion.obo._isa_str_list_to_dict_list(isa_list: list) → list[dict[str, Any]]
Split parent-child relationships from individual strings to dictionaries where parent and child are separated.
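The two splitting helpers above can be pictured with a minimal sketch. The names `split_entry` and `split_is_a` and the output keys are hypothetical, and the assumption that an is_a value looks like `<id> ! <name>` comes from the OBO convention rather than from this module's source; the private helpers may behave differently.

```python
from typing import Optional


def split_entry(line: str) -> Optional[tuple]:
    """Split 'attribute: value' at the first colon; return None if there is no colon."""
    key, sep, value = line.partition(":")
    if not sep:
        return None
    return key.strip(), value.strip()


def split_is_a(is_a_values: list) -> list:
    """Turn '<parent id> ! <parent name>' strings into parent dictionaries."""
    parents = []
    for value in is_a_values:
        # everything before '!' is the parent ID, everything after is its name
        parent_id, _, parent_name = value.partition("!")
        parents.append(
            {"parent_id": parent_id.strip(), "parent_name": parent_name.strip()}
        )
    return parents
```

Splitting at the first colon only matters because identifiers such as `EX:0000001` contain colons themselves.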
- napistu.ingestion.obo._reformat_obo_entry_as_dict(one_term, degenerate_attribs) → dict
- napistu.ingestion.obo.create_go_ancestors_df(parent_child_graph: Graph) → DataFrame
Create GO Ancestors DataFrame
- Parameters:
parent_child_graph (ig.Graph) – A DAG formed from parent-child relationships.
- Returns:
go_ancestors_df – A table with:
  - go_id: GO ID of a CC GO term of interest
  - ancestor_id: GO CC ID of one of its ancestors (parent, parent of parent, …)
- Return type:
pd.DataFrame
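To make the go_id / ancestor_id layout concrete, here is a dependency-free sketch of the idea. It takes plain (parent, child) tuples rather than an ig.Graph and returns tuples rather than a pd.DataFrame, so it is not the documented function; `list_ancestors` and the toy IDs are hypothetical.

```python
from collections import defaultdict


def list_ancestors(parent_child_edges):
    """Return sorted (go_id, ancestor_id) pairs for every term with an ancestor."""
    parents_of = defaultdict(set)
    for parent_id, child_id in parent_child_edges:
        parents_of[child_id].add(parent_id)

    rows = []
    for go_id in list(parents_of):
        seen = set()
        stack = list(parents_of[go_id])
        # walk upward through parents, parents of parents, and so on
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            stack.extend(parents_of.get(ancestor, ()))
        rows.extend((go_id, ancestor) for ancestor in seen)
    return sorted(rows)


edges = [("root", "mid"), ("mid", "leaf"), ("root", "other")]
ancestor_pairs = list_ancestors(edges)
```

Each term gets one row per ancestor, so a term three levels below the root contributes three rows.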
- napistu.ingestion.obo.create_go_parents_df(go_basic_obo_df: DataFrame) → DataFrame
Create the GO Parents Table
Reformat a table with GO attributes into a table with child-parent relationships
- Parameters:
go_basic_obo_df (pd.DataFrame) – Table generated from parsing go-basic.obo with obo.format_obo_dict_as_df
- Returns:
go_parents_df – A table with:
  - parent_id: GO ID of parent (from an is-a entry)
  - parent_name: common name of parent (from an is-a entry)
  - child_id: GO ID from the index
- Return type:
pd.DataFrame
Examples
>>> go_basic_obo_df = obo.format_obo_dict_as_df(obo.read_obo_as_dict(GO_OBO_DEFS.GO_BASIC_LOCAL_TMP))
>>> go_parents_df = obo.create_go_parents_df(go_basic_obo_df)
>>> go_parents_df.head()
    parent_id parent_name    child_id
0  GO:0005575     nucleus  GO:0005654
1  GO:0005575     nucleus  GO:0005667
2  GO:0005575     nucleus  GO:0005674
3  GO:0005575     nucleus  GO:0005681
- napistu.ingestion.obo.create_parent_child_graph(go_parents_df: DataFrame) → Graph
Create Parent:Child Graph
Format the Simple GO CC Ontology as a Directed Acyclic Graph (DAG).
- Parameters:
go_parents_df (pd.DataFrame) – A table with:
  - parent_id: GO ID of parent (from an is-a entry)
  - parent_name: common name of parent (from an is-a entry)
  - child_id: GO ID from the index
- Returns:
parent_child_graph – A DAG formed from parent-child relationships.
- Return type:
ig.Graph
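The documented function returns an ig.Graph. As a rough stand-in that needs only the standard library, the sketch below builds a child-to-parents mapping from rows shaped like go_parents_df and uses `graphlib.TopologicalSorter` to confirm the relationships are acyclic. `build_parent_child_dag` and the toy IDs are hypothetical, and the dictionary representation is an assumption made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def build_parent_child_dag(rows):
    """Map each child_id to its set of parent_ids and check that no cycle exists."""
    parents_of = {}
    for row in rows:
        parents_of.setdefault(row["child_id"], set()).add(row["parent_id"])
    # static_order() raises CycleError when the relationships are not acyclic;
    # otherwise every parent is listed before its children
    order = list(TopologicalSorter(parents_of).static_order())
    return parents_of, order


rows = [
    {"parent_id": "root", "parent_name": "root term", "child_id": "mid"},
    {"parent_id": "mid", "parent_name": "mid term", "child_id": "leaf"},
]
dag, order = build_parent_child_dag(rows)
```

Checking acyclicity up front is useful because ancestor lookups assume that following parents always terminates.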
- napistu.ingestion.obo.download_go_basic_obo(local_obo_path: str = '/tmp/go-basic.obo') → None
Download an OBO file containing GO categories and their relations (but not the genes in each category).
- Parameters:
local_obo_path (str) – Path to a local obo file.
- Return type:
None
- Raises:
FileNotFoundError – If the OBO file was not found after trying to download from the URL.
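The download-then-verify behaviour described above might look like the following sketch. The function name `download_obo`, its `url` parameter, and the `GO_BASIC_URL` value are assumptions made for illustration; the module may fetch the file from a different source or with a different library.

```python
import os
import urllib.request

# Assumed location of the GO basic ontology; not taken from the module source.
GO_BASIC_URL = "http://purl.obolibrary.org/obo/go/go-basic.obo"


def download_obo(url: str = GO_BASIC_URL, local_obo_path: str = "/tmp/go-basic.obo") -> None:
    """Download `url` to `local_obo_path` and confirm the file exists afterwards."""
    urllib.request.urlretrieve(url, local_obo_path)
    if not os.path.isfile(local_obo_path):
        raise FileNotFoundError(
            f"{local_obo_path} was not found after downloading {url}"
        )
```

Checking for the file after the download turns a silent failure into an explicit FileNotFoundError, which matches the documented Raises entry.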
- napistu.ingestion.obo.format_obo_dict_as_df(obo_term_dict: dict) → DataFrame
Format an OBO Dict as a DataFrame
Reorganize a dictionary of tuples into a DataFrame
- Parameters:
obo_term_dict (dict) – Dictionary where keys are ids and values are tuples containing (attribute, value) pairs
- Returns:
obo_df – A pd.DataFrame with one row per identifier and one column per unique attribute
- Return type:
pd.DataFrame
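The reshaping from (attribute, value) tuples to one row per identifier can be sketched without pandas. Here each row is a plain dict, and attributes that repeat (such as several is_a entries) are collected into lists; that handling of repeats, the `term_id` key, and the name `obo_dict_to_rows` are assumptions, not the module's documented behaviour.

```python
def obo_dict_to_rows(obo_term_dict):
    """Build one row (a plain dict) per identifier with one key per unique attribute."""
    columns = sorted(
        {attr for pairs in obo_term_dict.values() for attr, _ in pairs}
    )
    rows = []
    for term_id, pairs in obo_term_dict.items():
        # every row carries every column so the rows line up as a table
        row = {"term_id": term_id, **{column: [] for column in columns}}
        for attr, value in pairs:
            row[attr].append(value)
        rows.append(row)
    return rows


term_dict = {
    "EX:1": (("name", "a"), ("is_a", "EX:0 ! root"), ("is_a", "EX:9 ! other")),
    "EX:2": (("name", "b"),),
}
rows = obo_dict_to_rows(term_dict)
```

A list of dicts like `rows` can be passed to `pd.DataFrame(rows)` to obtain a table with one row per identifier.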
- napistu.ingestion.obo.read_obo_as_dict(local_obo_path: str) → dict
Read OBO as Dictionary
The Open Biological and Biomedical Ontologies (OBO) format is a standard format for representing ontologies. Many OBO parsers exist, but because this package uses the format only lightly and aims to minimize dependencies, this module provides a few functions for parsing standard OBO files.
- Parameters:
local_obo_path (str) – Path to a local obo file.
- Returns:
term_dict – Dictionary where keys are ids and values are tuples containing (attribute, value) pairs
- Return type:
dict
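A minimal sketch of this kind of lightweight parsing is shown below. It assumes the usual OBO layout of `[Term]` stanzas made of `attribute: value` lines, keeps only `[Term]` stanzas, and works on an iterable of lines rather than a path; `parse_obo_terms` and the sample IDs are hypothetical, and the module's own parser may treat edge cases differently.

```python
def parse_obo_terms(lines):
    """Collect (attribute, value) pairs for each [Term] stanza, keyed by term id."""
    term_dict = {}
    current_id = None
    in_term = False
    for raw_line in lines:
        line = raw_line.strip()
        if line.startswith("["):
            # a new stanza begins; only [Term] stanzas are kept
            in_term = line == "[Term]"
            current_id = None
            continue
        if not in_term or ":" not in line:
            continue
        attr, _, value = line.partition(":")
        attr, value = attr.strip(), value.strip()
        if attr == "id":
            current_id = value
            term_dict[current_id] = []
        elif current_id is not None:
            term_dict[current_id].append((attr, value))
    return {term_id: tuple(pairs) for term_id, pairs in term_dict.items()}


sample = """format-version: 1.2

[Term]
id: EX:0000001
name: example term
is_a: EX:0000002 ! parent term

[Typedef]
id: part_of
name: part of
""".splitlines()
parsed = parse_obo_terms(sample)
```

For a file on disk, the same function can be fed an open file handle, for example `with open(local_obo_path) as f: parse_obo_terms(f)`.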