napistu.ontologies.id_tables

Functions

filter_id_table(id_table[, identifiers, ...])

Filter an identifier table by identifiers, ontologies, and BQB terms for a given entity type.

napistu.ontologies.id_tables._sanitize_id_table_bqbs(bqbs: str | list | set, id_table: DataFrame) set

Sanitize and validate BQBs against the id_table.

Parameters:
  • bqbs (str, list, or set) – BQB terms to validate.

  • id_table (pd.DataFrame) – DataFrame containing BQB reference data.

Returns:

Set of validated BQB terms.

Return type:

set

napistu.ontologies.id_tables._sanitize_id_table_identifiers(identifiers: str | list | set, id_table: DataFrame) set

Sanitize and validate identifiers against the id_table.

Parameters:
  • identifiers (str, list, or set) – Identifier values to validate.

  • id_table (pd.DataFrame) – DataFrame containing identifier reference data.

Returns:

Set of validated identifiers.

Return type:

set

napistu.ontologies.id_tables._sanitize_id_table_ontologies(ontologies: str | list | set, id_table: DataFrame) set

Sanitize and validate ontologies against the id_table.

Parameters:
  • ontologies (str, list, or set) – Ontology names to validate.

  • id_table (pd.DataFrame) – DataFrame containing ontology reference data.

Returns:

Set of validated ontology names.

Return type:

set

napistu.ontologies.id_tables._sanitize_id_table_values(values: str | list | set, id_table: DataFrame, column_name: str, valid_values: Set[str] | None = None, value_type_name: str = None) set

Generic function to sanitize and validate values against an id_table column.

Parameters:
  • values (str, list, or set) – Values to sanitize and validate. Can be a single string, list of strings, or set of strings.

  • id_table (pd.DataFrame) – DataFrame containing the reference data to validate against.

  • column_name (str) – Name of the column in id_table to check values against.

  • valid_values (set of str, optional) – Set of globally valid values used for additional validation (e.g., VALID_BQB_TERMS). If provided, every requested value must be a member of this set.

  • value_type_name (str, optional) – Human-readable name for the value type used in error messages. If None, defaults to column_name.

Returns:

Set of sanitized and validated values.

Return type:

set

Raises:

ValueError – If values is not a string, list, or set; if any values are not in valid_values (when provided); or if none of the requested values are present in the id_table.

Warning

Logs a warning if some (but not all) requested values are missing from id_table.
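
The validation order described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: sanitize_values_sketch and its column_values argument (a stand-in for id_table[column_name]) are hypothetical names, and returning every requested value, including ones missing from the table, is an assumption that the documentation does not confirm.

```python
import logging
from typing import Iterable, Optional, Set, Union

logger = logging.getLogger(__name__)


def sanitize_values_sketch(
    values: Union[str, list, set],
    column_values: Iterable[str],
    value_type_name: str = "value",
    valid_values: Optional[Set[str]] = None,
) -> Set[str]:
    """Hypothetical re-creation of the documented checks (not napistu code)."""
    # 1. Normalize a single string, a list, or a set into a set.
    if isinstance(values, str):
        requested = {values}
    elif isinstance(values, (list, set)):
        requested = set(values)
    else:
        raise ValueError(
            f"{value_type_name} must be a str, list, or set; "
            f"got {type(values).__name__}"
        )

    # 2. Optional check against a global vocabulary (e.g. known BQB terms).
    if valid_values is not None:
        invalid = requested - valid_values
        if invalid:
            raise ValueError(f"Invalid {value_type_name}: {sorted(invalid)}")

    # 3. Compare against the values actually present in the table column.
    present = set(column_values)
    missing = requested - present
    if not (requested & present):
        raise ValueError(f"None of the requested {value_type_name} are present")
    if missing:
        logger.warning("Missing %s: %s", value_type_name, sorted(missing))

    # Assumption: missing values are kept in the returned set.
    return requested


# Usage: a partial match logs a warning instead of raising.
bqbs = sanitize_values_sketch(
    ["BQB_IS", "BQB_ENCODES"],
    column_values=["BQB_IS", "BQB_HAS_PART"],
    value_type_name="bqbs",
)
```

Passing a pandas Series as column_values would work the same way, since the sketch only needs an iterable of column entries.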

napistu.ontologies.id_tables._validate_id_table(id_table: DataFrame, entity_type: str) None

Validate that the id_table contains the required columns and matches the schema for the given entity_type.

Parameters:
  • id_table (pd.DataFrame) – DataFrame containing identifier mappings for a given entity type.

  • entity_type (str) – The type of entity (e.g., 'species', 'reactions') to validate against the schema.

Return type:

None

Raises:

ValueError – If entity_type is not present in the schema, or if required columns are missing in id_table.
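
The two failure modes listed above (unknown entity type, missing columns) amount to a lookup followed by a set difference. The sketch below assumes a hypothetical schema; SCHEMA_SKETCH, the column names in it, and validate_columns_sketch are illustrative only and do not reflect the actual napistu schema.

```python
from typing import Dict, Iterable, Set

# Hypothetical schema: required columns per entity type (illustrative names).
SCHEMA_SKETCH: Dict[str, Set[str]] = {
    "species": {"s_id", "ontology", "identifier", "bqb"},
    "reactions": {"r_id", "ontology", "identifier", "bqb"},
}


def validate_columns_sketch(columns: Iterable[str], entity_type: str) -> None:
    """Raise ValueError for an unknown entity type or missing required columns."""
    if entity_type not in SCHEMA_SKETCH:
        raise ValueError(
            f"Unknown entity_type {entity_type!r}; "
            f"expected one of {sorted(SCHEMA_SKETCH)}"
        )

    # Required columns that the table does not provide.
    missing = SCHEMA_SKETCH[entity_type] - set(columns)
    if missing:
        raise ValueError(
            f"id_table is missing required columns: {sorted(missing)}"
        )


# Usage: with a DataFrame, pass id_table.columns as the first argument.
validate_columns_sketch(["s_id", "ontology", "identifier", "bqb"], "species")
```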

napistu.ontologies.id_tables.filter_id_table(id_table: DataFrame, identifiers: str | list | set | None = None, ontologies: str | list | set | None = None, bqbs: str | list | set | None = ['BQB_IS', 'BQB_IS_HOMOLOG_TO', 'BQB_IS_ENCODED_BY', 'BQB_ENCODES', 'BQB_HAS_PART']) DataFrame

Filter an identifier table by identifiers, ontologies, and BQB terms for a given entity type.

Parameters:
  • id_table (pd.DataFrame) – DataFrame containing identifier mappings to be filtered.

  • identifiers (str, list, set, or None, optional) – Identifiers to filter by. If None, no filtering is applied on identifiers.

  • ontologies (str, list, set, or None, optional) – Ontologies to filter by. If None, no filtering is applied on ontologies.

  • bqbs (str, list, set, or None, optional) – BQB terms to filter by. If None, no filtering is applied on BQB terms. Default is ['BQB_IS', 'BQB_IS_HOMOLOG_TO', 'BQB_IS_ENCODED_BY', 'BQB_ENCODES', 'BQB_HAS_PART'].

Returns:

Filtered DataFrame containing only rows matching the specified criteria.

Return type:

pd.DataFrame

Raises:

ValueError – If the id_table or filter values are invalid, or required columns are missing.
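
The filtering semantics described above (each non-None argument narrows the result, None leaves that dimension unfiltered) can be sketched with pandas boolean masks. This is an illustration under stated assumptions, not the library's code: filter_sketch is a hypothetical name, the column names identifier, ontology, and bqb are assumed, the validation steps are omitted, and bqbs defaults to None here rather than to the five-term default of filter_id_table.

```python
import pandas as pd


def filter_sketch(id_table, identifiers=None, ontologies=None, bqbs=None):
    """Keep rows that match every non-None criterion (hypothetical sketch)."""

    def as_set(values):
        # Accept a single string as well as a list or set.
        return {values} if isinstance(values, str) else set(values)

    # Start with every row selected, then narrow once per active filter.
    mask = pd.Series(True, index=id_table.index)
    criteria = (
        ("identifier", identifiers),  # assumed column names
        ("ontology", ontologies),
        ("bqb", bqbs),
    )
    for column, values in criteria:
        if values is not None:
            mask &= id_table[column].isin(as_set(values))
    return id_table[mask]


# Toy table for illustration only.
id_table = pd.DataFrame(
    {
        "s_id": ["S1", "S1", "S2", "S3"],
        "ontology": ["uniprot", "ensembl_gene", "uniprot", "chebi"],
        "identifier": ["P04637", "ENSG00000141510", "P38398", "15377"],
        "bqb": ["BQB_IS", "BQB_IS_ENCODED_BY", "BQB_IS", "BQB_IS"],
    }
)

# Keep only UniProt rows annotated with BQB_IS.
uniprot_is = filter_sketch(id_table, ontologies="uniprot", bqbs=["BQB_IS"])
```

Because the criteria are combined with a logical AND, adding a filter can only shrink the result; calling the sketch with all three arguments left as None returns the table unchanged.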