napistu.ingestion.psi_mi
Functions
- aggregate_psi_mis – Aggregate PSI-MI molecular interactions and study metadata and format results as a dictionary of dataframes.
- format_psi_mi – Format PSI 3.0
- format_psi_mis – Format PSI-MI XML files
- napistu.ingestion.psi_mi._create_reaction_species_df(one_study: Dict[str, Any]) DataFrame
Format the interactions in the study into a dataframe of reaction species.
- Parameters:
one_study (dict[str, Any]) – Study data dictionary
- Returns:
DataFrame containing reaction species data
- Return type:
pd.DataFrame
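The sketch below roughly illustrates this flattening in plain Python by turning a study dict into one row per interaction participant. It is not napistu's code. It assumes each interaction's `interactors` value holds participant dicts with the keys documented for `_format_entry_interaction_participants`, and the output column names are hypothetical. The real function returns a `pd.DataFrame` whose exact columns are not listed on this page.

```python
from typing import Any, Dict, List


def reaction_species_rows(one_study: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten a study's interactions into one row per participant (hypothetical helper)."""
    rows = []
    for interaction in one_study["interactions_list"]:
        for participant in interaction["interactors"]:
            rows.append(
                {
                    "interaction_name": interaction["interaction_name"],
                    "interactor_id": participant["interactor_id"],
                    "biological_role": participant["biological_role"],
                    "experimental_role": participant["experimental_role"],
                }
            )
    return rows


# Toy study shaped like the entry dicts described for format_psi_mi.
toy_study = {
    "interactions_list": [
        {
            "interaction_name": "tp53-mdm2",
            "interaction_type": "physical association",
            "interactors": [
                {"participant_id": "p1", "interactor_id": "1",
                 "biological_role": "unspecified role", "experimental_role": "bait"},
                {"participant_id": "p2", "interactor_id": "2",
                 "biological_role": "unspecified role", "experimental_role": "prey"},
            ],
        }
    ]
}
rows = reaction_species_rows(toy_study)
print(len(rows))  # 2
```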
- napistu.ingestion.psi_mi._create_study_species_df(one_study: Dict[str, Any]) Tuple[DataFrame, DataFrame]
Create species and species identifiers DataFrames from study data.
- Parameters:
one_study (dict[str, Any]) – Study data dictionary
- Returns:
species_df (pd.DataFrame) – DataFrame containing species data
species_identifiers (pd.DataFrame) – DataFrame containing species identifiers data
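The species split can be sketched the same way. This sketch assumes `interactor_list` holds dicts with the keys documented for `_format_entry_interactor` and `_format_entry_interactor_xrefs`. The helper name `study_species_rows` and the column choices are hypothetical, and plain lists of dicts stand in for the two DataFrames.

```python
from typing import Any, Dict, List, Tuple

Rows = List[Dict[str, str]]


def study_species_rows(one_study: Dict[str, Any]) -> Tuple[Rows, Rows]:
    """Split interactors into a species table and a long identifiers table."""
    species: Rows = []
    identifiers: Rows = []
    for interactor in one_study["interactor_list"]:
        species.append(
            {
                "interactor_id": interactor["interactor_id"],
                "interactor_label": interactor["interactor_label"],
                "interactor_name": interactor["interactor_name"],
            }
        )
        # One identifier row per cross-reference, linked back by interactor_id.
        for xref in interactor["interactor_xrefs"]:
            identifiers.append({"interactor_id": interactor["interactor_id"], **xref})
    return species, identifiers


toy_study = {
    "interactor_list": [
        {
            "interactor_id": "1",
            "interactor_label": "tp53_human",
            "interactor_name": "Cellular tumor antigen p53",
            "interactor_aliases": [],
            "interactor_xrefs": [
                {"ref_type": "primaryRef", "ontology": "uniprotkb", "identifier": "P04637"},
                {"ref_type": "secondaryRef", "ontology": "intact", "identifier": "EBI-0000000"},
            ],
        }
    ]
}
species, identifiers = study_species_rows(toy_study)
```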
- napistu.ingestion.psi_mi._format_entry(an_entry: Element, xml_namespace: str) Dict[str, Any]
Extract a single XML entry of interactors and interactions.
- Parameters:
an_entry (xml.etree.ElementTree.Element) – XML entry element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing formatted entry data with keys: source, experiment, interactor_list, interactions_list
- Return type:
dict[str, Any]
- Raises:
ValueError – If the entry tag is not as expected
- napistu.ingestion.psi_mi._format_entry_experiment(an_entry: Element, xml_namespace: str) Dict[str, str]
Format experiment-level information in an XML entry.
- Parameters:
an_entry (xml.etree.ElementTree.Element) – XML entry element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing experiment information with keys: experiment_name, interaction_method, ontology, identifier
- Return type:
dict[str, str]
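The sketch below shows one plausible mapping from a PSI-MI XML 3.0 `<experimentDescription>` to these four keys using `xml.etree.ElementTree`. The specific elements napistu reads are assumptions. In particular, `ontology` and `identifier` are taken here from the publication cross-reference (`bibref/xref/primaryRef`), and only the first experiment is summarized. The toy PubMed ID is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://psi.hupo.org/mi/mif300}"


def format_experiment(entry: ET.Element, ns: str = NS) -> Dict[str, str]:
    """Summarize the first <experimentDescription> of an <entry> (hypothetical helper)."""
    exp = entry.find(f"{ns}experimentList/{ns}experimentDescription")
    if exp is None:
        return {"experiment_name": "", "interaction_method": "", "ontology": "", "identifier": ""}
    # Assumption: ontology/identifier describe the publication cross-reference.
    ref = exp.find(f"{ns}bibref/{ns}xref/{ns}primaryRef")
    return {
        "experiment_name": exp.findtext(f"{ns}names/{ns}shortLabel", default=""),
        "interaction_method": exp.findtext(
            f"{ns}interactionDetectionMethod/{ns}names/{ns}shortLabel", default=""
        ),
        "ontology": ref.get("db", "") if ref is not None else "",
        "identifier": ref.get("id", "") if ref is not None else "",
    }


toy_entry = ET.fromstring(
    '<entry xmlns="http://psi.hupo.org/mi/mif300">'
    '<experimentList><experimentDescription id="10">'
    "<names><shortLabel>smith-2020-1</shortLabel></names>"
    '<bibref><xref><primaryRef db="pubmed" id="00000000"/></xref></bibref>'
    "<interactionDetectionMethod><names><shortLabel>two hybrid</shortLabel></names>"
    "</interactionDetectionMethod>"
    "</experimentDescription></experimentList>"
    "</entry>"
)
experiment = format_experiment(toy_entry)
```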
- napistu.ingestion.psi_mi._format_entry_interaction(interaction: Element, xml_namespace: str) Dict[str, Any]
Format a single interaction in an XML interaction list.
- Parameters:
interaction (xml.etree.ElementTree.Element) – XML interaction element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing formatted interaction data with keys: interaction_name, interaction_type, interactors
- Return type:
dict[str, Any]
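Here is a minimal sketch of reading an `<interaction>` element with `xml.etree.ElementTree`. It assumes that the name and type come from `names/shortLabel` and `interactionType/names/shortLabel`. It also simplifies `interactors` to a list of `interactorRef` IDs, whereas the documented function may store richer participant records.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

NS = "{http://psi.hupo.org/mi/mif300}"


def format_interaction(interaction: ET.Element, ns: str = NS) -> Dict[str, Any]:
    """Pull the name, type and participating interactor IDs from an <interaction>."""
    participants = interaction.findall(f"{ns}participantList/{ns}participant")
    return {
        "interaction_name": interaction.findtext(f"{ns}names/{ns}shortLabel", default=""),
        "interaction_type": interaction.findtext(
            f"{ns}interactionType/{ns}names/{ns}shortLabel", default=""
        ),
        # Simplification: keep only each participant's interactorRef.
        "interactors": [p.findtext(f"{ns}interactorRef", default="") for p in participants],
    }


toy = ET.fromstring(
    '<interaction xmlns="http://psi.hupo.org/mi/mif300" id="100">'
    "<names><shortLabel>tp53-mdm2</shortLabel></names>"
    "<participantList>"
    '<participant id="p1"><interactorRef>1</interactorRef></participant>'
    '<participant id="p2"><interactorRef>2</interactorRef></participant>'
    "</participantList>"
    "<interactionType><names><shortLabel>physical association</shortLabel></names>"
    "</interactionType>"
    "</interaction>"
)
result = format_interaction(toy)
```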
- napistu.ingestion.psi_mi._format_entry_interaction_participants(interaction_participant: Element, xml_namespace: str) Dict[str, str]
Format the participants in an XML interaction.
- Parameters:
interaction_participant (xml.etree.ElementTree.Element) – XML participant element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing formatted participant data with keys: participant_id, interactor_id, biological_role, experimental_role
- Return type:
dict[str, str]
- Raises:
ValueError – If the participant tag is not as expected
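The participant fields map naturally onto PSI-MI XML 3.0 child elements. The sketch below assumes the compact form of the format, where each participant points to its interactor through `<interactorRef>`, and reads only the first experimental role. `format_participant` is a hypothetical stand-in, not the library's function.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://psi.hupo.org/mi/mif300}"


def format_participant(participant: ET.Element, ns: str = NS) -> Dict[str, str]:
    """Read IDs and roles from a <participant>; reject any other element."""
    if participant.tag != f"{ns}participant":
        raise ValueError(f"Expected {ns}participant, got {participant.tag}")
    role = f"{ns}names/{ns}shortLabel"
    return {
        "participant_id": participant.get("id", ""),
        "interactor_id": participant.findtext(f"{ns}interactorRef", default=""),
        "biological_role": participant.findtext(f"{ns}biologicalRole/{role}", default=""),
        # Only the first experimental role is read.
        "experimental_role": participant.findtext(
            f"{ns}experimentalRoleList/{ns}experimentalRole/{role}", default=""
        ),
    }


toy = ET.fromstring(
    '<participant xmlns="http://psi.hupo.org/mi/mif300" id="p1">'
    "<interactorRef>1</interactorRef>"
    "<biologicalRole><names><shortLabel>unspecified role</shortLabel></names></biologicalRole>"
    "<experimentalRoleList><experimentalRole><names><shortLabel>bait</shortLabel></names>"
    "</experimentalRole></experimentalRoleList>"
    "</participant>"
)
participant_record = format_participant(toy)
```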
- napistu.ingestion.psi_mi._format_entry_interactions(an_entry: Element, xml_namespace: str) List[Dict[str, Any]]
Format the molecular interactions in an XML entry.
- Parameters:
an_entry (xml.etree.ElementTree.Element) – XML entry element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
List of dictionaries containing formatted interaction data
- Return type:
list[dict[str, Any]]
- napistu.ingestion.psi_mi._format_entry_interactor(interactor: Element, xml_namespace: str) Dict[str, Any]
Format a single molecular interactor in an interactor list XML node.
- Parameters:
interactor (xml.etree.ElementTree.Element) – XML interactor element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing formatted interactor data with keys: interactor_id, interactor_label, interactor_name, interactor_aliases, interactor_xrefs
- Return type:
dict[str, Any]
- Raises:
ValueError – If the interactor tag is not as expected
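The following sketch of interactor formatting assumes that labels, full names and aliases live under `<names>` and that cross-references live under `<xref>`. The alias dict keys (`alias_type`, `alias_value`) and the simplified xref dicts are hypothetical. See `_format_entry_interactor_xrefs` for the documented xref keys.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

NS = "{http://psi.hupo.org/mi/mif300}"


def format_interactor(interactor: ET.Element, ns: str = NS) -> Dict[str, Any]:
    """Summarize an <interactor> node; raise if handed some other element."""
    if interactor.tag != f"{ns}interactor":
        raise ValueError(f"Expected {ns}interactor, got {interactor.tag}")
    names = f"{ns}names"
    return {
        "interactor_id": interactor.get("id", ""),
        "interactor_label": interactor.findtext(f"{names}/{ns}shortLabel", default=""),
        "interactor_name": interactor.findtext(f"{names}/{ns}fullName", default=""),
        # Hypothetical alias record: the alias "type" attribute plus its text.
        "interactor_aliases": [
            {"alias_type": a.get("type", ""), "alias_value": a.text or ""}
            for a in interactor.findall(f"{names}/{ns}alias")
        ],
        # Simplified xrefs: every child of <xref> reduced to (db, id).
        "interactor_xrefs": [
            {"ontology": r.get("db", ""), "identifier": r.get("id", "")}
            for r in interactor.findall(f"{ns}xref/*")
        ],
    }


toy = ET.fromstring(
    '<interactor xmlns="http://psi.hupo.org/mi/mif300" id="1">'
    "<names><shortLabel>tp53_human</shortLabel>"
    "<fullName>Cellular tumor antigen p53</fullName>"
    '<alias type="gene name">TP53</alias></names>'
    '<xref><primaryRef db="uniprotkb" id="P04637"/></xref>'
    "</interactor>"
)
record = format_interactor(toy)
```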
- napistu.ingestion.psi_mi._format_entry_interactor_list(an_entry: Element, xml_namespace: str) List[Dict[str, Any]]
Format the molecular interactors in an XML entry.
- Parameters:
an_entry (xml.etree.ElementTree.Element) – XML entry element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
List of dictionaries containing formatted interactor data
- Return type:
list[dict[str, Any]]
- napistu.ingestion.psi_mi._format_entry_interactor_xrefs(interactor: Element, xml_namespace: str) List[Dict[str, str]]
Format the cross-references of a single interactor.
- Parameters:
interactor (xml.etree.ElementTree.Element) – XML interactor element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
List of dictionaries containing cross-reference data with keys: ref_type, ontology, identifier
- Return type:
list[dict[str, str]]
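In PSI-MI XML 3.0, an `<xref>` holds one `<primaryRef>` followed by optional `<secondaryRef>` elements, each with `db` and `id` attributes. The sketch below maps those attributes onto the documented keys. Taking `ref_type` from the element's local tag name is an assumption, as is the toy IntAct accession.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = "{http://psi.hupo.org/mi/mif300}"


def format_interactor_xrefs(interactor: ET.Element, ns: str = NS) -> List[Dict[str, str]]:
    """Collect the primaryRef/secondaryRef children of an interactor's <xref>."""
    xref = interactor.find(f"{ns}xref")
    if xref is None:
        return []
    return [
        {
            # Strip the namespace so ref_type reads "primaryRef" or "secondaryRef".
            "ref_type": ref.tag.replace(ns, ""),
            "ontology": ref.get("db", ""),
            "identifier": ref.get("id", ""),
        }
        for ref in xref
    ]


toy = ET.fromstring(
    '<interactor xmlns="http://psi.hupo.org/mi/mif300" id="1">'
    '<xref><primaryRef db="uniprotkb" id="P04637"/>'
    '<secondaryRef db="intact" id="EBI-0000000"/></xref>'
    "</interactor>"
)
xrefs = format_interactor_xrefs(toy)
```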
- napistu.ingestion.psi_mi._format_entry_source(an_entry: Element, xml_namespace: str) Dict[str, str]
Format the source describing the provenance of an XML entry.
- Parameters:
an_entry (xml.etree.ElementTree.Element) – XML entry element to format
xml_namespace (str) – XML namespace to use for parsing
- Returns:
Dictionary containing source information with keys: shortLabel, fullName
- Return type:
dict[str, str]
- napistu.ingestion.psi_mi._format_study_level_data(one_study: Dict[str, Any]) DataFrame
Format study-level data into a DataFrame.
- Parameters:
one_study (dict[str, Any]) – Study data dictionary
- Returns:
DataFrame containing study-level data
- Return type:
pd.DataFrame
- napistu.ingestion.psi_mi._get_optional_attribute(element: Element, xpath: str, attribute: str, default: str = '') str
Safely extract an attribute from an optional XML element.
- Parameters:
element (xml.etree.ElementTree.Element) – The parent element to search within
xpath (str) – The xpath expression to find the child element
attribute (str) – The attribute name to extract
default (str, optional) – Default value to return if the element or attribute is not found, by default PSI_MI_MISSING_VALUE_STR (an empty string)
- Returns:
The attribute value, or the default value if not found
- Return type:
str
- napistu.ingestion.psi_mi._get_optional_text(element: Element, xpath: str, default: str = '') str
Safely extract text from an optional XML element.
- Parameters:
element (xml.etree.ElementTree.Element) – The parent element to search within
xpath (str) – The xpath expression to find the child element
default (str, optional) – Default value to return if the element is not found, by default PSI_MI_MISSING_VALUE_STR (an empty string)
- Returns:
The text content of the element, or the default value if not found
- Return type:
str
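Both helpers follow the same pattern: look up the child with `Element.find` and fall back to a default when nothing matches. The names `get_optional_text` and `get_optional_attribute` in this minimal sketch are illustrative. Treating an element that has no text as missing is a choice made here, not documented behaviour.

```python
import xml.etree.ElementTree as ET

# Assumption: mirrors the '' default shown in the documented signatures.
MISSING = ""


def get_optional_text(element: ET.Element, xpath: str, default: str = MISSING) -> str:
    """Return the text of the first child matching xpath, or default."""
    child = element.find(xpath)
    if child is None or child.text is None:
        return default
    return child.text


def get_optional_attribute(
    element: ET.Element, xpath: str, attribute: str, default: str = MISSING
) -> str:
    """Return an attribute of the first child matching xpath, or default."""
    child = element.find(xpath)
    if child is None:
        return default
    return child.get(attribute, default)


NS = "{http://psi.hupo.org/mi/mif300}"
node = ET.fromstring(
    '<interactor xmlns="http://psi.hupo.org/mi/mif300" id="1">'
    "<names><shortLabel>tp53_human</shortLabel></names>"
    '<xref><primaryRef db="uniprotkb" id="P04637"/></xref>'
    "</interactor>"
)
print(get_optional_text(node, f"{NS}names/{NS}shortLabel"))  # tp53_human
print(get_optional_attribute(node, f"{NS}xref/{NS}primaryRef", "db"))  # uniprotkb
```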
- napistu.ingestion.psi_mi.aggregate_psi_mis(formatted_psi_mis: List[Dict[str, Any]]) Dict[str, DataFrame]
Aggregate PSI-MI molecular interactions and study metadata and format results as a dictionary of dataframes.
- Parameters:
formatted_psi_mis (list[dict[str, Any]]) – A list of formatted PSI-MI entries, as returned by napistu.ingestion.psi_mi.format_psi_mis.
- Returns:
A dictionary of dataframes keyed by table name:
  - reaction_species : a dataframe of reaction species from all studies, with rows linked to study IDs.
  - species : a dataframe of species from all studies, with rows linked to study IDs.
  - species_identifiers : a dataframe of species identifiers from all studies, with rows linked to study IDs.
  - study_level_data : a dataframe of study-level data, with rows linked to study IDs.
- Return type:
dict[str, pd.DataFrame]
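Aggregating across studies essentially means concatenating tables that share a name and tagging each row with its study. The sketch below does this with plain lists of dicts. It assumes that the per-study tables are already built and keyed by a study ID; how napistu assigns study IDs is not documented on this page. With pandas, `pd.concat(frames, keys=study_ids)` would give a comparable result.

```python
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


def aggregate_tables(per_study: Dict[str, Dict[str, Rows]]) -> Dict[str, Rows]:
    """Concatenate same-named tables across studies, tagging each row with its study ID."""
    combined: Dict[str, Rows] = {}
    for study_id, tables in per_study.items():
        for table_name, rows in tables.items():
            combined.setdefault(table_name, []).extend(
                {"study_id": study_id, **row} for row in rows
            )
    return combined


per_study = {
    "study_a": {"species": [{"interactor_id": "1"}, {"interactor_id": "2"}]},
    "study_b": {"species": [{"interactor_id": "7"}]},
}
combined = aggregate_tables(per_study)
```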
- napistu.ingestion.psi_mi.format_psi_mi(xml_path: str, xml_namespace: str = '{http://psi.hupo.org/mi/mif300}', verbose: bool = False) list[dict[str, Any]]
Format PSI 3.0
Format an .xml file containing molecular interactions following the PSI-MI XML 3.0 format.
- Parameters:
xml_path (str) – Path to a .xml file
xml_namespace (str, optional) – XML namespace of the file, by default PSI_MI_INTACT_XML_NAMESPACE ('{http://psi.hupo.org/mi/mif300}')
verbose (bool, optional) – Whether to print verbose output, by default False
- Returns:
entry_list – A list containing molecular interaction entry dicts of the format:
  - source : dict containing the database that interactions were drawn from.
  - experiment : a simple summary of the experimental design and the publication.
  - interactor_list : list containing dictionaries annotating the molecules (defined by their “interactor_id”) involved in interactions.
  - interactions_list : list containing dictionaries annotating molecular interactions involving a set of “interactor_id”s.
- Return type:
list[dict[str, Any]]
- Raises:
FileNotFoundError – If the XML file does not exist
ValueError – If the XML file does not have the expected root tag
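The parsing and validation steps can be sketched with `xml.etree.ElementTree`. The sketch assumes that the root element is `<entrySet>` in the given namespace and that each `<entry>` is a direct child of it. `read_psi_mi_entries` is a hypothetical helper that keeps only a source label and the interactor IDs per entry, not the full entry dict.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

NS = "{http://psi.hupo.org/mi/mif300}"


def read_psi_mi_entries(xml_path: str, ns: str = NS) -> List[Dict[str, Any]]:
    """Parse a PSI-MI XML file and summarize each <entry> under the root <entrySet>."""
    if not os.path.isfile(xml_path):
        raise FileNotFoundError(f"File not found: {xml_path}")
    root = ET.parse(xml_path).getroot()
    if root.tag != f"{ns}entrySet":
        raise ValueError(f"Expected root tag {ns}entrySet, got {root.tag}")
    entries = []
    for entry in root.findall(f"{ns}entry"):
        # Only two fields per entry; the documented function builds a richer dict.
        entries.append(
            {
                "source_label": entry.findtext(f"{ns}source/{ns}names/{ns}shortLabel", default=""),
                "interactor_ids": [
                    i.get("id") for i in entry.findall(f"{ns}interactorList/{ns}interactor")
                ],
            }
        )
    return entries


TOY_XML = (
    '<entrySet xmlns="http://psi.hupo.org/mi/mif300">'
    "<entry>"
    "<source><names><shortLabel>IntAct</shortLabel></names></source>"
    '<interactorList><interactor id="1"/><interactor id="2"/></interactorList>'
    "</entry>"
    "</entrySet>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(TOY_XML)
    entries = read_psi_mi_entries(path)
```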
- napistu.ingestion.psi_mi.format_psi_mis(intact_xml_dir: str, xml_namespace: str = '{http://psi.hupo.org/mi/mif300}', verbose: bool = False, files_to_process: int = -1) list[dict[str, Any]]
Format PSI-MI XML files
Format PSI-MI XML files into a list of dictionaries.
- Parameters:
intact_xml_dir (str) – Path to the directory containing the PSI-MI XML files
xml_namespace (str, optional) – XML namespace of the files, by default PSI_MI_INTACT_XML_NAMESPACE ('{http://psi.hupo.org/mi/mif300}')
verbose (bool, optional) – Whether to print verbose output, by default False
files_to_process (int, optional) – Number of files to process (-1 for all files), by default -1
- Returns:
formatted_psi_mis – A list containing molecular interaction entry dicts of the format:
  - source : dict containing the database that interactions were drawn from.
  - experiment : a simple summary of the experimental design and the publication.
  - interactor_list : list containing dictionaries annotating the molecules (defined by their “interactor_id”) involved in interactions.
  - interactions_list : list containing dictionaries annotating molecular interactions involving a set of “interactor_id”s.
- Return type:
list[dict[str, Any]]
- Raises:
FileNotFoundError – If the directory does not exist or contains no files
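File selection can be sketched as shown below. The helper name `list_psi_mi_files` is an assumption, as are restricting the listing to `*.xml` names and processing them in sorted order. The documented function also parses each file, which this sketch leaves out.

```python
import os
import tempfile
from typing import List


def list_psi_mi_files(xml_dir: str, files_to_process: int = -1) -> List[str]:
    """Return up to files_to_process XML paths from xml_dir (-1 means all)."""
    if not os.path.isdir(xml_dir):
        raise FileNotFoundError(f"Directory not found: {xml_dir}")
    # Assumption: only *.xml files are considered, in sorted order.
    names = sorted(n for n in os.listdir(xml_dir) if n.endswith(".xml"))
    if not names:
        raise FileNotFoundError(f"No XML files in {xml_dir}")
    if files_to_process != -1:
        names = names[:files_to_process]
    return [os.path.join(xml_dir, n) for n in names]


with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.xml", "a.xml", "c.xml", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    picked = [os.path.basename(p) for p in list_psi_mi_files(tmp, files_to_process=2)]
    everything = [os.path.basename(p) for p in list_psi_mi_files(tmp)]
```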