napistu.ontologies.standardization
Standardization of ontologies for creating Identifiers and URLs
Public Functions
- check_reactome_identifier_compatibility
Check whether two sets of Reactome identifiers are from the same species.
- create_uri_url
Convert from an identifier and ontology to a URL reference for the identifier.
- ensembl_id_to_url_regex
Map an Ensembl ID to a validation regex and its canonical URL on Ensembl.
- format_uri
Convert an RDF URI into an identifier list.
- format_uri_url
Convert a URI into an identifier dictionary.
- format_uri_url_identifiers_dot_org
Parse identifiers.org identifiers from a split URL path.
- is_known_unsupported_uri
Check if a URI is known to be unsupported/pathological.
- parse_ensembl_id
Extract the molecule type and species name from an Ensembl identifier.
Functions
- napistu.ontologies.standardization._count_reactome_species(reactome_series: Series) -> Series
Count the number of species tags in a set of Reactome IDs
- napistu.ontologies.standardization._format_Identifiers_pubmed(pubmed_id: str) -> Identifiers
Format Identifiers for a single PubMed ID.
These will generally be used in an r_Identifiers field.
- napistu.ontologies.standardization._infer_primary_reactome_species(reactome_series: Series) -> tuple[str, int]
Infer the best-supported species based on a set of Reactome identifiers
- napistu.ontologies.standardization._netloc_to_identifiers_matrixdb_adaptor(uri, class_regex, id_regex)
- napistu.ontologies.standardization._netloc_to_identifiers_mirbase_adaptor(split_path, result)
- napistu.ontologies.standardization._netloc_to_identifiers_pubchem_adaptor(split_path, result)
- napistu.ontologies.standardization._netloc_w_url_prefix_to_identifiers_ncbi_adaptor(result, uri)
- napistu.ontologies.standardization._netloc_w_url_suffix_to_identifiers_ensembl_adaptor(result, ontology: str)
- napistu.ontologies.standardization._netloc_w_url_suffix_to_identifiers_phosphosite_adaptor(split_path, result)
- napistu.ontologies.standardization._reactome_id_species(reactome_id: str) -> str
Extract the species code from a Reactome ID
- napistu.ontologies.standardization._split_one_to_identifiers_chebi_adaptor(split_path, result)
- napistu.ontologies.standardization._validate_bqb(bqb: str) -> None
Validate a BQB code
- Parameters:
bqb (str) – The BQB code to validate
- Return type:
None
- Raises:
TypeError – If the BQB code is not a string
ValueError – If the BQB code does not start with ‘BQB’
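The two documented checks can be sketched as follows. This is a minimal illustration, assuming only the behavior listed under Raises; `validate_bqb_sketch` is a hypothetical stand-in and not the library function.

```python
def validate_bqb_sketch(bqb):
    """Hypothetical stand-in mirroring the documented checks for a BQB code."""
    # A non-string input is a type problem, reported before any content check.
    if not isinstance(bqb, str):
        raise TypeError(f"bqb must be a str, got {type(bqb).__name__}")
    # A string that lacks the expected prefix is a value problem.
    if not bqb.startswith("BQB"):
        raise ValueError(f"bqb must start with 'BQB', got {bqb!r}")


validate_bqb_sketch("BQB_IS")  # passes silently and returns None
```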
- napistu.ontologies.standardization.check_reactome_identifier_compatibility(reactome_series_a: Series, reactome_series_b: Series) -> None
Check Reactome Identifier Compatibility
Determine whether two sets of Reactome identifiers are from the same organismal species.
- Parameters:
reactome_series_a (pd.Series) – a Series containing Reactome identifiers
reactome_series_b (pd.Series) – a Series containing Reactome identifiers
- Return type:
None
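One plausible way to compare species across two sets of Reactome identifiers is to read the species code embedded in each ID (for example `HSA` in `R-HSA-70171`) and compare the most common code on each side. The sketch below works on plain lists rather than pandas Series and returns a boolean, whereas the documented function returns None; all function names here are hypothetical.

```python
from collections import Counter


def reactome_species(reactome_id):
    """Return the species code from an ID shaped like 'R-HSA-70171'."""
    parts = reactome_id.split("-")
    if len(parts) < 3 or parts[0] != "R":
        raise ValueError(f"{reactome_id!r} does not look like a Reactome ID")
    return parts[1]


def primary_species(reactome_ids):
    """Return the most common species code and how many IDs carry it."""
    counts = Counter(reactome_species(x) for x in reactome_ids)
    return counts.most_common(1)[0]


def same_primary_species(ids_a, ids_b):
    """True when both collections share the same most common species code."""
    return primary_species(ids_a)[0] == primary_species(ids_b)[0]


human = ["R-HSA-70171", "R-HSA-1430728"]
mouse = ["R-MMU-70171", "R-MMU-1430728", "R-HSA-70171"]
print(primary_species(mouse))              # ('MMU', 2)
print(same_primary_species(human, mouse))  # False
```

Comparing only the most common code tolerates a few stray identifiers from another species; a stricter check could require the full sets of codes to match.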
- napistu.ontologies.standardization.create_uri_url(ontology: str, identifier: str, strict: bool = True) -> str
Create URI URL
Convert from an identifier and ontology to a URL reference for the identifier
- Parameters:
ontology (str) – An ontology for organizing genes, metabolites, etc.
identifier (str) – A systematic identifier from the “ontology” ontology.
strict (bool) – If True, raise an error for invalid IDs; otherwise return None
- Returns:
url – A url representing a unique identifier
- Return type:
str
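The effect of `strict` can be illustrated with a small lookup-table sketch. The table contents, the exception types, and the name `create_url_sketch` are assumptions made for illustration; the library's own ontology rules are not shown on this page.

```python
import re

# Illustrative table only: ontology -> (validation regex, URL template).
# Both the patterns and the templates are assumptions for this sketch.
ONTOLOGY_RULES = {
    "pubmed": (r"[0-9]+", "https://pubmed.ncbi.nlm.nih.gov/{identifier}"),
    "reactome": (r"R-[A-Z]{3}-[0-9]+", "https://reactome.org/content/detail/{identifier}"),
}


def create_url_sketch(ontology, identifier, strict=True):
    """Validate an identifier against its ontology's regex and build a URL."""
    if ontology not in ONTOLOGY_RULES:
        raise NotImplementedError(f"No URL rule for ontology {ontology!r}")
    id_regex, template = ONTOLOGY_RULES[ontology]
    if re.fullmatch(id_regex, identifier) is None:
        if strict:
            raise ValueError(f"{identifier!r} is not a valid {ontology} identifier")
        return None  # lenient mode: signal the failure without raising
    return template.format(identifier=identifier)


print(create_url_sketch("pubmed", "12345"))              # https://pubmed.ncbi.nlm.nih.gov/12345
print(create_url_sketch("pubmed", "abc", strict=False))  # None
```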
- napistu.ontologies.standardization.ensembl_id_to_url_regex(identifier: str, ontology: str) -> tuple[str, str]
Ensembl ID to URL and Regex
Map an Ensembl ID to a validation regex and its canonical URL on Ensembl
- Parameters:
identifier (str) – A standard identifier from Ensembl genes, transcripts, or proteins
ontology (str) – The standard ontology (ensembl_gene, ensembl_transcript, or ensembl_protein)
- Returns:
id_regex (str) – A regex that should match a valid entry in this ontology
url (str) – The identifier’s URL on Ensembl
- Return type:
tuple[str, str]
- napistu.ontologies.standardization.format_uri(uri: str, bqb: str, strict: bool = True) -> list[dict] | None
Convert an RDF URI into an identifier list
- Parameters:
uri (str) – The RDF URI to convert
bqb (str) – The BQB code to attach to the identifier
strict (bool) – Whether to raise an error if the URI is not valid
- Returns:
The identifier list or None if the URI is not valid
- Return type:
Optional[list[dict]]
- napistu.ontologies.standardization.format_uri_url(uri: str, strict: bool = True) -> dict
Convert a URI into an identifier dictionary
- Parameters:
uri (str) – The URI to convert
strict (bool) – Whether to raise an error if the URI is not valid
- Returns:
The identifier dictionary
- Return type:
dict
- Raises:
NotImplementedError – If a parsing procedure has not been implemented for the netloc
TypeError – If the URI is not valid
ValueError – If there is a pathological identifier within ontology-specific parsing
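The netloc-based dispatch implied by the exceptions above might look like the following sketch. The supported hosts, the dictionary keys, the URL layouts, and the choice to return None in lenient mode are all assumptions for illustration; `parse_uri_sketch` is a hypothetical name.

```python
from urllib.parse import urlparse


def parse_uri_sketch(uri, strict=True):
    """Dispatch on the URI's network location to pick a parsing rule."""
    if not isinstance(uri, str):
        raise TypeError(f"uri must be a str, got {type(uri).__name__}")

    parsed = urlparse(uri)
    # "/12345/" splits into ["", "12345", ""]; index 0 is always empty.
    split_path = parsed.path.split("/")

    if parsed.netloc == "pubmed.ncbi.nlm.nih.gov":
        # Assumed layout: /<id>/
        return {"ontology": "pubmed", "identifier": split_path[1], "url": uri}
    if parsed.netloc == "reactome.org":
        # Assumed layout: /content/detail/<id>
        return {"ontology": "reactome", "identifier": split_path[-1], "url": uri}

    if strict:
        raise NotImplementedError(f"No parsing rule for netloc {parsed.netloc!r}")
    return None  # assumed lenient behavior for this sketch


result = parse_uri_sketch("https://pubmed.ncbi.nlm.nih.gov/12345/")
print(result["ontology"], result["identifier"])  # pubmed 12345
```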
- napistu.ontologies.standardization.format_uri_url_identifiers_dot_org(split_path: list[str])
Parse identifiers.org identifiers
identifiers.org identifiers come in two formats:
1. http://identifiers.org/<ontology>/<id>
2. http://identifiers.org/<ontology>:<id>
The newer format 2 is currently detected by looking for a : in the second element of the split path.
The ontology is also converted to lowercase.
- Parameters:
split_path (list[str]) – split url path
- Returns:
ontology, identifier
- Return type:
tuple[str, str]
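The format detection described above can be sketched as follows. The sketch assumes the split path comes from `urlparse(uri).path.split("/")`, so that element 0 is the empty string before the leading slash and element 1 is the "second element"; `parse_identifiers_dot_org_sketch` is a hypothetical name, and ontology-specific cleanup is left out.

```python
from urllib.parse import urlparse


def parse_identifiers_dot_org_sketch(split_path):
    """Return (ontology, identifier) for either identifiers.org layout."""
    if ":" in split_path[1]:
        # Format 2: /<ontology>:<id>
        ontology, identifier = split_path[1].split(":", 1)
    else:
        # Format 1: /<ontology>/<id>
        ontology, identifier = split_path[1], split_path[2]
    return ontology.lower(), identifier


old_style = urlparse("http://identifiers.org/uniprot/P04637").path.split("/")
new_style = urlparse("http://identifiers.org/CHEBI:17234").path.split("/")
print(parse_identifiers_dot_org_sketch(old_style))  # ('uniprot', 'P04637')
print(parse_identifiers_dot_org_sketch(new_style))  # ('chebi', '17234')
```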
- napistu.ontologies.standardization.is_known_unsupported_uri(uri: str) -> bool
Check if a URI is known to be unsupported/pathological.
This prevents throwing exceptions for URIs we know we can’t parse, allowing for cleaner logging and batch processing.
- Parameters:
uri (str) – The URI to check
- Returns:
True if the URI is known to be unsupported
- Return type:
bool
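A pre-filter of this kind is typically used to partition a batch before parsing, so that known-bad URIs are logged once rather than raising mid-loop. The sketch below shows that pattern; the pattern list is a placeholder (the library's actual list is not shown on this page), and both function names are hypothetical.

```python
import re

# Placeholder patterns for illustration only.
UNSUPPORTED_PATTERNS = [re.compile(r"^https?://unsupported\.example\.org/")]


def is_unsupported_sketch(uri):
    """True when the URI matches any known-unsupported pattern."""
    return any(p.search(uri) for p in UNSUPPORTED_PATTERNS)


def partition_uris(uris):
    """Split URIs into those worth parsing and those to skip and log."""
    to_parse, skipped = [], []
    for uri in uris:
        (skipped if is_unsupported_sketch(uri) else to_parse).append(uri)
    return to_parse, skipped


to_parse, skipped = partition_uris([
    "https://pubmed.ncbi.nlm.nih.gov/12345/",
    "http://unsupported.example.org/some/page",
])
print(len(to_parse), len(skipped))  # 1 1
```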
- napistu.ontologies.standardization.parse_ensembl_id(input_str: str) -> tuple[str, str, str]
Parse Ensembl ID
Extract the molecule type and species name from a string containing an Ensembl identifier.
- Parameters:
input_str (str) – A string containing an Ensembl gene, transcript, or protein identifier
- Returns:
- identifier (str):
The substring matching the full identifier
- molecule_type (str):
The ontology the identifier belongs to:
G -> ensembl_gene
T -> ensembl_transcript
P -> ensembl_protein
- organismal_species (str):
The species name the identifier belongs to
- Return type:
tuple[str, str, str]
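The G/T/P mapping above can be illustrated with a regex-based sketch. It relies on the general shape of Ensembl stable IDs (`ENS`, an optional three-letter species prefix, a feature letter, and eleven digits); the species table is a small illustrative subset, the species name format is an assumption, and `parse_ensembl_id_sketch` is a hypothetical name.

```python
import re

# ENS + optional 3-letter species prefix + feature letter + 11 digits.
ENSEMBL_PATTERN = re.compile(r"ENS([A-Z]{3})?([GTP])[0-9]{11}")

MOLECULE_TYPES = {
    "G": "ensembl_gene",
    "T": "ensembl_transcript",
    "P": "ensembl_protein",
}

# Illustrative subset only; human IDs carry no species prefix.
SPECIES_PREFIXES = {"": "Homo sapiens", "MUS": "Mus musculus"}


def parse_ensembl_id_sketch(input_str):
    """Return (identifier, molecule_type, organismal_species)."""
    match = ENSEMBL_PATTERN.search(input_str)
    if match is None:
        raise ValueError(f"No Ensembl identifier found in {input_str!r}")
    prefix = match.group(1) or ""
    if prefix not in SPECIES_PREFIXES:
        raise ValueError(f"Species prefix {prefix!r} is not in the sketch's table")
    return match.group(0), MOLECULE_TYPES[match.group(2)], SPECIES_PREFIXES[prefix]


print(parse_ensembl_id_sketch("gene=ENSG00000139618"))
# ('ENSG00000139618', 'ensembl_gene', 'Homo sapiens')
print(parse_ensembl_id_sketch("ENSMUST00000000001"))
# ('ENSMUST00000000001', 'ensembl_transcript', 'Mus musculus')
```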