napistu.ingestion.perturbseq
Ingestion and formatting utilities for Perturb-seq datasets.
Functions
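- assign_predicted_direction – Assign predicted direction categories based on perturbation type and fold-change.
- ingest_replogle_pvalues – Ingest Replogle et al. Perturb-seq p-values.
- load_harmonizome_perturbseq_datasets – Load aggregated Perturb-seq data with species IDs.
- load_replogle_pvalues_with_species_ids – Load Replogle et al. Perturb-seq p-values with species IDs.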
- napistu.ingestion.perturbseq._categorize_perturbseq_row(row: Series, perturbation_type_col: str, standardized_value_col: str, threshold_value_col: str) → str
Categorize a row of perturbseq data into a direction category.
- Parameters:
row (pd.Series) – Row of perturbseq data
perturbation_type_col (str) – Column name for perturbation type
standardized_value_col (str) – Column name for standardized value
threshold_value_col (str) – Column name for threshold value
- Returns:
Direction category for the row
- Return type:
str
Examples
import pandas as pd
from napistu.ingestion.perturbseq import _categorize_perturbseq_row
row = pd.Series({"perturbation_type": "OE", "standardized_value": 1.0, "threshold_value": 0.5})
_categorize_perturbseq_row(row, "perturbation_type", "standardized_value", "threshold_value")
- napistu.ingestion.perturbseq._format_harmonizome_replogle_with_species_ids(harmonizome_replogle_interactions: DataFrame, species_identifiers: DataFrame) → DataFrame
Format Replogle interactions from Harmonizome with species IDs.
- Parameters:
harmonizome_replogle_interactions (pd.DataFrame) – Harmonizome’s Replogle interactions dataframe.
species_identifiers (pd.DataFrame) – Species identifiers dataframe.
- Returns:
Replogle interactions with species IDs.
- Return type:
pd.DataFrame
Examples
datasets = [HARMONIZOME_DATASET_SHORTNAMES.REPLOGLE_K562_ESSENTIAL]
_ = process_harmonizome_datasets(datasets, "/tmp/harmonizome_data")
perturbseq_data = load_harmonizome_datasets(datasets, "/tmp/harmonizome_data")
# species_identifiers: a species identifiers DataFrame loaded beforehand
harmonizome_replogle_interactions_with_species_ids = _format_harmonizome_replogle_with_species_ids(
    perturbseq_data[datasets[0]]["interactions"], species_identifiers
)
- napistu.ingestion.perturbseq._format_perturbatlas_with_species_ids(perturbatlas_interactions: DataFrame, species_identifiers: DataFrame) → DataFrame
Format PerturbAtlas interactions with species IDs.
- Parameters:
perturbatlas_interactions (pd.DataFrame) – PerturbAtlas interactions dataframe.
species_identifiers (pd.DataFrame) – Species identifiers dataframe.
- Returns:
PerturbAtlas interactions with species IDs.
- Return type:
pd.DataFrame
Examples
datasets = [HARMONIZOME_DATASET_SHORTNAMES.PERTURB_ATLAS_MOUSE]
_ = process_harmonizome_datasets(datasets, "/tmp/harmonizome_data")
perturbseq_data = load_harmonizome_datasets(datasets, "/tmp/harmonizome_data")
# species_identifiers: a species identifiers DataFrame loaded beforehand
perturbatlas_interactions_with_species_ids = _format_perturbatlas_with_species_ids(
    perturbseq_data[datasets[0]]["interactions"], species_identifiers
)
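Both formatters attach species IDs to gene-level interactions. The sketch below shows one way such a join could work; it is not napistu's implementation. It assumes, hypothetically, that species_identifiers has an identifier column holding gene symbols and an s_id column holding species IDs, and that the raw interactions use perturbed_gene and target_gene columns. The real column names and matching logic may differ.

```python
import pandas as pd


def attach_species_ids(interactions, species_identifiers,
                       perturbed_col="perturbed_gene", target_col="target_gene"):
    """Add perturbed/target species IDs by joining on gene identifiers."""
    # Hypothetical columns: "identifier" (gene symbol) and "s_id" (species ID).
    lookup = species_identifiers[["identifier", "s_id"]].drop_duplicates()
    perturbed_lookup = lookup.rename(
        columns={"identifier": perturbed_col, "s_id": "perturbed_species_id"}
    )
    target_lookup = lookup.rename(
        columns={"identifier": target_col, "s_id": "target_species_id"}
    )
    # Inner joins drop interactions whose genes have no species ID.
    return (
        interactions
        .merge(perturbed_lookup, on=perturbed_col, how="inner")
        .merge(target_lookup, on=target_col, how="inner")
    )


interactions = pd.DataFrame({
    "perturbed_gene": ["A", "B", "C"],
    "target_gene": ["B", "C", "A"],
    "standardized_value": [1.0, -0.5, 2.0],
})
species_identifiers = pd.DataFrame({"identifier": ["A", "B"], "s_id": ["S1", "S2"]})
print(attach_species_ids(interactions, species_identifiers))
```

Only the A to B interaction survives here, because gene C has no species ID. If one symbol maps to several species IDs, the inner join duplicates the interaction, which a downstream deduplication step would need to handle.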
- napistu.ingestion.perturbseq._get_distinct_harmonizome_perturbseq_interactions(aggregated_perturbseq_data_with_species_ids: DataFrame) → DataFrame
Reduce the Harmonizome Perturb-seq data to a single entry per (study, perturbation type, perturbed gene, target gene) combination.
- napistu.ingestion.perturbseq._get_distinct_replogle_pvalues(replogle_pvalues_with_species_ids: DataFrame) → DataFrame
Reduce the significance values reported by Replogle et al. to a single entry per (perturbed gene, target gene) pair.
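Both reducers collapse duplicate rows to one entry per key. The documentation does not say which duplicate is kept, so this is only a sketch under a hypothetical rule: keep the row with the largest absolute standardized value. The function name distinct_interactions and its parameters are illustrative, not part of napistu.

```python
import pandas as pd


def distinct_interactions(df, keys=("perturbed_species_id", "target_species_id"),
                          value_col="standardized_value"):
    """Keep one row per key combination (hypothetical rule: largest |value|)."""
    ranked = (
        df.assign(_abs_value=df[value_col].abs())
        # Stable sort so ties keep their original order; NaN sorts last.
        .sort_values("_abs_value", ascending=False, kind="stable")
    )
    return (
        ranked.drop_duplicates(subset=list(keys), keep="first")
        .drop(columns="_abs_value")
        .sort_index()
    )


df = pd.DataFrame({
    "perturbed_species_id": ["S1", "S1", "S2"],
    "target_species_id": ["S2", "S2", "S1"],
    "standardized_value": [0.5, -1.5, 0.3],
})
print(distinct_interactions(df))
```

For the Harmonizome variant the keys would widen to the study and perturbation type, e.g. keys=("perturbation_study", "perturbation_type", "perturbed_species_id", "target_species_id"). For p-values, one would presumably keep the smallest value rather than the largest.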
- napistu.ingestion.perturbseq.assign_predicted_direction(df, perturbation_type_col='perturbation_type', standardized_value_col='Standardized Value', threshold_value_col='Threshold Value')
Assign predicted direction categories based on perturbation type and fold-change.
- For OE (overexpression):
standardized_value > threshold: strong activation
0 < standardized_value <= threshold: weak activation
-threshold <= standardized_value < 0: weak repression
standardized_value < -threshold: strong repression
- For KD/KO (knockdown/knockout), the directions are flipped:
standardized_value > threshold: strong repression
0 < standardized_value <= threshold: weak repression
-threshold <= standardized_value < 0: weak activation
standardized_value < -threshold: strong activation
- Parameters:
df (pd.DataFrame) – DataFrame with perturbation data
perturbation_type_col (str) – Column name for perturbation type (should contain ‘KD’, ‘KO’, or ‘OE’)
standardized_value_col (str) – Column name for standardized fold-change values
threshold_value_col (str) – Column name for threshold values (absolute value)
- Returns:
Series with predicted direction categories
- Return type:
pd.Series
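The categorization rules above can be sketched in plain pandas as follows. This is a hypothetical re-implementation for illustration, not napistu's code: the category strings are taken from the rule descriptions, and returning None for a value of exactly 0, a missing value, or an unrecognized perturbation type is an assumption, since the documented rules do not cover those cases.

```python
import pandas as pd

# Category labels taken from the documented rules; napistu's exact strings may differ.
STRONG_ACT, WEAK_ACT = "strong activation", "weak activation"
WEAK_REP, STRONG_REP = "weak repression", "strong repression"
# Loss-of-function perturbations (KD/KO) invert the OE interpretation.
FLIP = {STRONG_ACT: STRONG_REP, WEAK_ACT: WEAK_REP, WEAK_REP: WEAK_ACT, STRONG_REP: STRONG_ACT}


def categorize_value(perturbation_type, value, threshold):
    """Map one standardized fold-change to a direction category."""
    # Assumption: zero, missing values and unknown types get no category.
    if pd.isna(value) or value == 0 or perturbation_type not in ("OE", "KD", "KO"):
        return None
    if value > threshold:
        category = STRONG_ACT
    elif value > 0:
        category = WEAK_ACT
    elif value >= -threshold:
        category = WEAK_REP
    else:
        category = STRONG_REP
    return category if perturbation_type == "OE" else FLIP[category]


def assign_direction_sketch(df, type_col="perturbation_type",
                            value_col="Standardized Value",
                            threshold_col="Threshold Value"):
    """Apply categorize_value row by row; returns a pd.Series."""
    return df.apply(
        lambda row: categorize_value(row[type_col], row[value_col], row[threshold_col]),
        axis=1,
    )


df = pd.DataFrame({
    "perturbation_type": ["OE", "KD", "KO", "OE"],
    "Standardized Value": [1.0, 1.0, -0.2, -2.0],
    "Threshold Value": [0.5, 0.5, 0.5, 0.5],
})
print(assign_direction_sketch(df).tolist())
# ['strong activation', 'strong repression', 'weak activation', 'strong repression']
```

Computing the OE category once and flipping it for KD/KO keeps the threshold boundaries in a single place, so both perturbation families share the same inclusive and exclusive edges.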
- napistu.ingestion.perturbseq.ingest_replogle_pvalues(target_uri: str) → None
Ingest Replogle et al. Perturb-seq p-values.
- Parameters:
target_uri (str) – Target URI to download the Replogle et al. Perturb-seq p-values to.
- Return type:
None
- napistu.ingestion.perturbseq.load_harmonizome_perturbseq_datasets(harmonizome_data_dir: str, species_identifiers: DataFrame, datasets_w_formatters: Dict[str, Callable] | None = None, return_distinct_interactions: bool = False) → DataFrame
Load aggregated perturbseq data with species IDs.
- Parameters:
harmonizome_data_dir (str) – Directory containing harmonizome data.
species_identifiers (pd.DataFrame) – Species identifiers dataframe.
datasets_w_formatters (Optional[Dict[str, Callable]]) – Dictionary mapping dataset shortnames to formatter functions. Defaults to a mapping of the human Perturb-seq datasets to their formatters.
return_distinct_interactions (bool) – Whether to return distinct interactions. Default is False.
- Returns:
Aggregated Perturb-seq data with the following columns:
- perturbed_species_id: the species ID of the perturbed gene
- target_species_id: the species ID of the target gene
- perturbation_type: the type of perturbation (for PerturbAtlas, e.g., KO for knockout)
- perturbation_study: the study that reported the perturbation (for PerturbAtlas, a study code)
- standardized_value: the standardized value of the perturbation
- thresholded_value: the thresholded value of the perturbation
- dataset_shortname: the shortname of the dataset
- Return type:
pd.DataFrame
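The datasets_w_formatters argument maps each dataset shortname to a formatter with the same (interactions, species_identifiers) shape as the private formatters documented above. A minimal sketch of how such a mapping could drive loading, assuming a hypothetical raw_interactions dict in place of reading from harmonizome_data_dir; load_with_formatters and passthrough are illustrative names, not napistu functions:

```python
from typing import Callable, Dict

import pandas as pd


def load_with_formatters(raw_interactions: Dict[str, pd.DataFrame],
                         species_identifiers: pd.DataFrame,
                         datasets_w_formatters: Dict[str, Callable]) -> pd.DataFrame:
    """Format each dataset with its own formatter and stack the results."""
    frames = []
    for shortname, formatter in datasets_w_formatters.items():
        formatted = formatter(raw_interactions[shortname], species_identifiers)
        # Tag each row with its source so datasets stay distinguishable.
        frames.append(formatted.assign(dataset_shortname=shortname))
    return pd.concat(frames, ignore_index=True)


def passthrough(interactions, species_identifiers):
    """Toy formatter that returns the interactions unchanged."""
    return interactions.copy()


raw = {"ds_a": pd.DataFrame({"x": [1, 2]}), "ds_b": pd.DataFrame({"x": [3]})}
combined = load_with_formatters(raw, pd.DataFrame(), {"ds_a": passthrough, "ds_b": passthrough})
print(combined["dataset_shortname"].tolist())  # ['ds_a', 'ds_a', 'ds_b']
```

Keeping dataset-specific parsing in separate callables means a new dataset only needs a new formatter and a dictionary entry, while the loader and the shared output columns stay unchanged.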
- napistu.ingestion.perturbseq.load_replogle_pvalues_with_species_ids(path_to_wide_replogle_pvalues: str | Path, species_identifiers: DataFrame, return_distinct_interactions: bool = False) → DataFrame
Load Replogle et al. Perturb-seq p-values with species IDs.
- Parameters:
path_to_wide_replogle_pvalues (Union[str, Path]) – Path to the wide Replogle et al. Perturb-seq p-values file.
species_identifiers (pd.DataFrame) – Species identifiers dataframe.
return_distinct_interactions (bool) – Whether to return distinct interactions. Default is False.
- Returns:
Replogle et al. Perturb-seq p-values with species IDs.
- Return type:
pd.DataFrame
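The parameter name path_to_wide_replogle_pvalues indicates that the p-values are stored in wide form. Assuming, hypothetically, that rows are target genes and columns are perturbed genes (the orientation is not documented here), reshaping the matrix to one row per perturbed-target pair before mapping genes to species IDs could look like this sketch; wide_pvalues_to_long is an illustrative name, not a napistu function:

```python
import pandas as pd


def wide_pvalues_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a target-by-perturbed p-value matrix to one row per pair."""
    # Assumption: index = target genes, columns = perturbed genes.
    long = (
        wide.rename_axis(index="target_gene")
        .reset_index()
        .melt(id_vars="target_gene", var_name="perturbed_gene", value_name="pvalue")
        # Drop pairs with no reported p-value.
        .dropna(subset=["pvalue"])
    )
    return long[["perturbed_gene", "target_gene", "pvalue"]].reset_index(drop=True)


wide = pd.DataFrame({"P1": [0.01, None], "P2": [0.2, 0.03]}, index=["T1", "T2"])
print(wide_pvalues_to_long(wide))
```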