napistu.ingestion.psi_mi

Functions

aggregate_psi_mis(formatted_psi_mis)

Aggregate PSI-MI molecular interactions and study metadata and format results as a dictionary of dataframes.

format_psi_mi(xml_path[, xml_namespace, verbose])

Format a PSI-MI 3.0 XML file

format_psi_mis(intact_xml_dir[, ...])

Format PSI-MI XML files

napistu.ingestion.psi_mi._create_reaction_species_df(one_study: Dict[str, Any]) → DataFrame

Format the interactions in the study into a dataframe of reaction species.

Parameters:

one_study (dict[str, Any]) – Study data dictionary

Returns:

DataFrame containing reaction species data

Return type:

pd.DataFrame

napistu.ingestion.psi_mi._create_study_species_df(one_study: Dict[str, Any]) → Tuple[DataFrame, DataFrame]

Create species and species identifiers DataFrames from study data.

Parameters:

one_study (dict[str, Any]) – Study data dictionary

Returns:

  • species_df (pd.DataFrame) – DataFrame containing species data

  • species_identifiers (pd.DataFrame) – DataFrame containing species identifiers data

napistu.ingestion.psi_mi._format_entry(an_entry: Element, xml_namespace: str) → Dict[str, Any]

Extract a single XML entry of interactors and interactions.

Parameters:
  • an_entry (xml.etree.ElementTree.Element) – XML entry element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing formatted entry data with keys: source, experiment, interactor_list, interactions_list

Return type:

dict[str, Any]

Raises:

ValueError – If the entry tag is not as expected

napistu.ingestion.psi_mi._format_entry_experiment(an_entry: Element, xml_namespace: str) → Dict[str, str]

Format experiment-level information in an XML entry.

Parameters:
  • an_entry (xml.etree.ElementTree.Element) – XML entry element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing experiment information with keys: experiment_name, interaction_method, ontology, identifier

Return type:

dict[str, str]

napistu.ingestion.psi_mi._format_entry_interaction(interaction: Element, xml_namespace: str) → Dict[str, Any]

Format a single interaction in an XML interaction list.

Parameters:
  • interaction (xml.etree.ElementTree.Element) – XML interaction element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing formatted interaction data with keys: interaction_name, interaction_type, interactors

Return type:

dict[str, Any]

napistu.ingestion.psi_mi._format_entry_interaction_participants(interaction_participant: Element, xml_namespace: str) → Dict[str, str]

Format a single participant in an XML interaction.

Parameters:
  • interaction_participant (xml.etree.ElementTree.Element) – XML participant element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing formatted participant data with keys: participant_id, interactor_id, biological_role, experimental_role

Return type:

dict[str, str]

Raises:

ValueError – If the participant tag is not as expected

napistu.ingestion.psi_mi._format_entry_interactions(an_entry: Element, xml_namespace: str) → List[Dict[str, Any]]

Format the molecular interactions in an XML entry.

Parameters:
  • an_entry (xml.etree.ElementTree.Element) – XML entry element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

List of dictionaries containing formatted interaction data

Return type:

list[dict[str, Any]]

napistu.ingestion.psi_mi._format_entry_interactor(interactor: Element, xml_namespace: str) → Dict[str, Any]

Format a single molecular interactor in an interaction list XML node.

Parameters:
  • interactor (xml.etree.ElementTree.Element) – XML interactor element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing formatted interactor data with keys: interactor_id, interactor_label, interactor_name, interactor_aliases, interactor_xrefs

Return type:

dict[str, Any]

Raises:

ValueError – If the interactor tag is not as expected

napistu.ingestion.psi_mi._format_entry_interactor_list(an_entry: Element, xml_namespace: str) → List[Dict[str, Any]]

Format the molecular interactors in an XML entry.

Parameters:
  • an_entry (xml.etree.ElementTree.Element) – XML entry element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

List of dictionaries containing formatted interactor data

Return type:

list[dict[str, Any]]

napistu.ingestion.psi_mi._format_entry_interactor_xrefs(interactor: Element, xml_namespace: str) → List[Dict[str, str]]

Format the cross-references of a single interactor.

Parameters:
  • interactor (xml.etree.ElementTree.Element) – XML interactor element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

List of dictionaries containing cross-reference data with keys: ref_type, ontology, identifier

Return type:

list[dict[str, str]]
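The output shape described above can be illustrated with a minimal sketch. It assumes that cross-references sit under an xref child holding primaryRef and secondaryRef elements with db and id attributes, as in PSI-MI XML, and that ref_type is taken from the element tag. The function name sketch_interactor_xrefs and the sample identifiers are hypothetical placeholders; the actual implementation may map fields differently.

```python
import xml.etree.ElementTree as ET

# Default namespace shown in the public function signatures of this module
NS = "{http://psi.hupo.org/mi/mif300}"

# Illustrative interactor node; the identifiers are placeholders, not real data
INTERACTOR_XML = """
<interactor xmlns="http://psi.hupo.org/mi/mif300" id="1">
  <names><shortLabel>example_protein</shortLabel></names>
  <xref>
    <primaryRef db="uniprotkb" id="P12345"/>
    <secondaryRef db="ensembl" id="ENSG00000000001"/>
  </xref>
</interactor>
""".strip()


def sketch_interactor_xrefs(interactor, xml_namespace):
    """Hypothetical stand-in: collect one dict per cross-reference element."""
    xref_node = interactor.find(f"{xml_namespace}xref")
    if xref_node is None:
        return []
    xrefs = []
    for ref in xref_node:
        xrefs.append(
            {
                # Assumption: ref_type is the tag name without its namespace
                "ref_type": ref.tag.replace(xml_namespace, ""),
                "ontology": ref.attrib.get("db", ""),
                "identifier": ref.attrib.get("id", ""),
            }
        )
    return xrefs


interactor = ET.fromstring(INTERACTOR_XML)
xrefs = sketch_interactor_xrefs(interactor, NS)
```

Each dictionary in xrefs carries the three documented keys, so the list can be turned into rows of a species identifiers table without further reshaping.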

napistu.ingestion.psi_mi._format_entry_source(an_entry: Element, xml_namespace: str) → Dict[str, str]

Format the source describing the provenance of an XML entry.

Parameters:
  • an_entry (xml.etree.ElementTree.Element) – XML entry element to format

  • xml_namespace (str) – XML namespace to use for parsing

Returns:

Dictionary containing source information with keys: shortLabel, fullName

Return type:

dict[str, str]

napistu.ingestion.psi_mi._format_study_level_data(one_study: Dict[str, Any]) → DataFrame

Format study-level data into a DataFrame.

Parameters:

one_study (dict[str, Any]) – Study data dictionary

Returns:

DataFrame containing study-level data

Return type:

pd.DataFrame

napistu.ingestion.psi_mi._get_optional_attribute(element: Element, xpath: str, attribute: str, default: str = '') → str

Safely extract an attribute from an optional XML element.

Parameters:
  • element (xml.etree.ElementTree.Element) – The parent element to search within

  • xpath (str) – The xpath expression to find the child element

  • attribute (str) – The attribute name to extract

  • default (str, optional) – Default value to return if the element or attribute is not found, by default PSI_MI_MISSING_VALUE_STR ('')

Returns:

The attribute value, or the default value if not found

Return type:

str

napistu.ingestion.psi_mi._get_optional_text(element: Element, xpath: str, default: str = '') → str

Safely extract text from an optional XML element.

Parameters:
  • element (xml.etree.ElementTree.Element) – The parent element to search within

  • xpath (str) – The xpath expression to find the child element

  • default (str, optional) – Default value to return if the element is not found, by default PSI_MI_MISSING_VALUE_STR ('')

Returns:

The text content of the element, or the default value if not found

Return type:

str
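The pattern behind the two optional-value helpers above can be sketched as follows. This assumes they wrap Element.find and fall back to the default when the child element, its attribute, or its text is absent. The names optional_attribute, optional_text, and MISSING are hypothetical stand-ins, and the handling of empty text is an assumption rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Stand-in for PSI_MI_MISSING_VALUE_STR; the signatures above show '' as the default
MISSING = ""


def optional_attribute(element, xpath, attribute, default=MISSING):
    """Return an attribute of the first child matching xpath, or the default."""
    child = element.find(xpath)
    if child is None:
        return default
    return child.attrib.get(attribute, default)


def optional_text(element, xpath, default=MISSING):
    """Return the text of the first child matching xpath, or the default."""
    child = element.find(xpath)
    # Assumption: an element without text is treated like a missing element
    if child is None or child.text is None:
        return default
    return child.text


node = ET.fromstring(
    '<names><shortLabel>bait</shortLabel><alias type="synonym"/></names>'
)

label = optional_text(node, "shortLabel")
absent_text = optional_text(node, "fullName")
alias_type = optional_attribute(node, "alias", "type")
absent_attribute = optional_attribute(node, "alias", "typeAc", default="n/a")
```

Returning a plain string in every case keeps downstream dictionaries uniformly typed as dict[str, str], which matches the return types documented for the formatting helpers.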

napistu.ingestion.psi_mi.aggregate_psi_mis(formatted_psi_mis: List[Dict[str, Any]]) → Dict[str, DataFrame]

Aggregate PSI-MI molecular interactions and study metadata and format results as a dictionary of dataframes.

Parameters:

formatted_psi_mis (list[dict[str, Any]]) – A list of formatted PSI-MI study entries, as returned by napistu.ingestion.psi_mi.format_psi_mis.

Returns:

A dictionary of dataframes with the following keys:

  • reaction_species – A dataframe of reaction species aggregated across studies.

  • species – A dataframe of species aggregated across studies.

  • species_identifiers – A dataframe of species identifiers aggregated across studies.

  • study_level_data – A dataframe of study-level data aggregated across studies.

Return type:

dict[str, pd.DataFrame]

napistu.ingestion.psi_mi.format_psi_mi(xml_path: str, xml_namespace: str = '{http://psi.hupo.org/mi/mif300}', verbose: bool = False) → list[dict[str, Any]]

Format a PSI-MI 3.0 XML file

Format a .xml file containing molecular interactions that follows the PSI-MI 3.0 format.

Parameters:
  • xml_path (str) – Path to a .xml file

  • xml_namespace (str, optional) – Namespace for the xml file, by default PSI_MI_INTACT_XML_NAMESPACE ('{http://psi.hupo.org/mi/mif300}')

  • verbose (bool, optional) – Whether to print verbose output, by default False

Returns:

entry_list – A list of molecular interaction entry dicts, each with the keys:

  • source – dict containing the database that interactions were drawn from.

  • experiment – a simple summary of the experimental design and the publication.

  • interactor_list – list containing dictionaries annotating the molecules (defined by their “interactor_id”) involved in interactions.

  • interactions_list – list containing dictionaries annotating molecular interactions involving a set of “interactor_id”s.

Return type:

list[dict[str, Any]]

Raises:
  • FileNotFoundError – If the XML file does not exist

  • ValueError – If the XML file does not have the expected root tag
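The two documented failure modes and the namespace-qualified lookups can be illustrated with a simplified sketch. It assumes the expected root tag is entrySet, as in PSI-MI XML, and it only collects interactor labels rather than building the full entry dicts. The function sketch_read_entries, the interactor_labels key, and the sample document are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Default namespace from the signature above
NS = "{http://psi.hupo.org/mi/mif300}"

# Tiny made-up document with one entry and two interactors
MINIMAL_XML = """<entrySet xmlns="http://psi.hupo.org/mi/mif300">
  <entry>
    <source><names><shortLabel>example_db</shortLabel></names></source>
    <interactorList>
      <interactor id="1"><names><shortLabel>protein_a</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>protein_b</shortLabel></names></interactor>
    </interactorList>
  </entry>
</entrySet>"""


def sketch_read_entries(xml_path, xml_namespace=NS):
    """Hypothetical, simplified reader: validate the file, then walk its entries."""
    if not os.path.isfile(xml_path):
        raise FileNotFoundError(f"{xml_path} does not exist")
    root = ET.parse(xml_path).getroot()
    expected_tag = f"{xml_namespace}entrySet"  # assumption about the expected root
    if root.tag != expected_tag:
        raise ValueError(f"Expected root tag {expected_tag}, found {root.tag}")
    label_path = (
        f"{xml_namespace}interactorList/{xml_namespace}interactor/"
        f"{xml_namespace}names/{xml_namespace}shortLabel"
    )
    entries = []
    for entry in root.findall(f"{xml_namespace}entry"):
        labels = [node.text for node in entry.findall(label_path)]
        entries.append({"interactor_labels": labels})
    return entries


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(MINIMAL_XML)
    entries = sketch_read_entries(path)
    try:
        sketch_read_entries(os.path.join(tmp_dir, "absent.xml"))
        missing_file_raised = False
    except FileNotFoundError:
        missing_file_raised = True
```

Because ElementTree stores tags in the {namespace}name form, every path segment needs the namespace prefix, which is why xml_namespace is threaded through all of the private formatting helpers.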

napistu.ingestion.psi_mi.format_psi_mis(intact_xml_dir: str, xml_namespace: str = '{http://psi.hupo.org/mi/mif300}', verbose: bool = False, files_to_process: int = -1) → list[dict[str, Any]]

Format PSI-MI XML files

Format PSI-MI XML files into a list of dictionaries.

Parameters:
  • intact_xml_dir (str) – Path to the directory containing the PSI-MI XML files

  • xml_namespace (str, optional) – Namespace for the xml file, by default PSI_MI_INTACT_XML_NAMESPACE ('{http://psi.hupo.org/mi/mif300}')

  • verbose (bool, optional) – Whether to print verbose output, by default False

  • files_to_process (int, optional) – Number of files to process (-1 for all files), by default -1

Returns:

formatted_psi_mis – A list of molecular interaction entry dicts, each with the keys:

  • source – dict containing the database that interactions were drawn from.

  • experiment – a simple summary of the experimental design and the publication.

  • interactor_list – list containing dictionaries annotating the molecules (defined by their “interactor_id”) involved in interactions.

  • interactions_list – list containing dictionaries annotating molecular interactions involving a set of “interactor_id”s.

Return type:

list[dict[str, Any]]

Raises:

FileNotFoundError – If the directory does not exist or contains no files
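The files_to_process semantics and the FileNotFoundError conditions can be sketched as below. This is an illustration under assumptions: the helper sketch_select_xml_files is hypothetical, and filtering on the .xml extension and sorting by name are choices made here for determinism, not documented behavior.

```python
import os
import tempfile


def sketch_select_xml_files(intact_xml_dir, files_to_process=-1):
    """Hypothetical helper: pick the XML files that would be formatted."""
    if not os.path.isdir(intact_xml_dir):
        raise FileNotFoundError(f"{intact_xml_dir} does not exist")
    # Assumption: only .xml files count, processed in sorted order
    xml_files = sorted(
        os.path.join(intact_xml_dir, name)
        for name in os.listdir(intact_xml_dir)
        if name.endswith(".xml")
    )
    if not xml_files:
        raise FileNotFoundError(f"No XML files found in {intact_xml_dir}")
    if files_to_process == -1:
        return xml_files  # -1 means every file
    return xml_files[:files_to_process]


with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.xml", "b.xml", "c.xml"):
        open(os.path.join(tmp_dir, name), "w").close()
    all_files = [os.path.basename(p) for p in sketch_select_xml_files(tmp_dir)]
    first_two = [
        os.path.basename(p)
        for p in sketch_select_xml_files(tmp_dir, files_to_process=2)
    ]
    try:
        sketch_select_xml_files(os.path.join(tmp_dir, "missing"))
        missing_dir_raised = False
    except FileNotFoundError:
        missing_dir_raised = True
```

A small files_to_process value is a convenient way to smoke-test the pipeline on a subset before formatting a full directory of PSI-MI files.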