napistu.utils.path_utils
Utilities for path and URI operations.
Public Functions
- copy_uri(input_uri: str, output_uri: str, is_file: bool = True) -> None:
Copy a file or folder from one URI to another.
- ensure_path(path: Union[str, Path], expand_user: bool = True) -> Path:
Convert a string or Path to a Path object, optionally expanding user home directory.
- get_extn_from_url(url: str) -> str:
Retrieve file extension from a URL.
- get_source_base_and_path(uri: str) -> tuple[str, str]:
Get the base of a bucket or folder and the path to the file.
- get_target_base_and_path(uri: str) -> tuple[str, str]:
Get the base directory + parent path and the filename.
- initialize_dir(output_dir_path: str, overwrite: bool) -> None:
Initialize a filesystem directory.
- path_exists(path: str) -> bool:
Check if a path or URI exists.
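Functions
| copy_uri(input_uri, output_uri[, is_file]) | Copy a file or folder from one URI to another. |
| ensure_path(path[, expand_user]) | Convert a string or Path to a Path object, optionally expanding user home directory. |
| get_extn_from_url(url) | Retrieve file extension from a URL. |
| get_source_base_and_path(uri) | Get the base of a bucket or folder and the path to the file. |
| get_target_base_and_path(uri) | Get the base directory + parent path and the filename. |
| initialize_dir(output_dir_path, overwrite) | Initialize a filesystem directory. |
| path_exists(path) | Check if a path or URI exists. |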
- napistu.utils.path_utils.copy_uri(input_uri: str, output_uri: str, is_file: bool = True) -> None
Copy a file or folder from one URI to another.
- Parameters:
input_uri (str) – Input file URI (e.g., ‘gs://bucket/file’, ‘/local/path’, ‘memory://path’).
output_uri (str) – Output file URI (e.g., ‘gs://bucket/file’, ‘/local/path’, ‘memory://path’).
is_file (bool, default=True) – If True, copy a single file. If False, copy directory recursively.
Examples
>>> copy_uri('/local/source.txt', '/local/dest.txt')
>>> copy_uri('gs://bucket/source/', 'gs://bucket/dest/', is_file=False)
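The distinction between `is_file=True` and `is_file=False` can be illustrated with a local-only sketch. This is not the library's implementation: `copy_local` is a hypothetical helper built on the standard library's `shutil`, and it assumes both arguments are local filesystem paths rather than `gs://` or `memory://` URIs.

```python
import shutil
import tempfile
from pathlib import Path


def copy_local(input_path: str, output_path: str, is_file: bool = True) -> None:
    """Hypothetical local-only analogue of copy_uri (assumption, not napistu code)."""
    if is_file:
        # Copy the contents of a single file.
        shutil.copyfile(input_path, output_path)
    else:
        # Copy a directory tree recursively.
        shutil.copytree(input_path, output_path, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source"
    src.mkdir()
    (src / "a.txt").write_text("hello")

    # Single-file copy.
    copy_local(str(src / "a.txt"), str(Path(tmp) / "copy.txt"))
    # Recursive directory copy.
    copy_local(str(src), str(Path(tmp) / "dest"), is_file=False)

    file_copied = (Path(tmp) / "copy.txt").read_text()
    dir_copied = (Path(tmp) / "dest" / "a.txt").read_text()
```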
- napistu.utils.path_utils.ensure_path(path: str | Path, expand_user: bool = True) -> Path
Convert a string or Path to a Path object, optionally expanding user home directory.
- Parameters:
path (Union[str, Path]) – Path to convert. Can be a string (e.g., “~/data/store”) or Path object.
expand_user (bool, default=True) – If True, expand tildes (~) to the user’s home directory.
- Returns:
Path object, with user expanded if expand_user=True.
- Return type:
Path
- Raises:
TypeError – If path is not a str or Path object.
Examples
>>> ensure_path("~/data/store")
PosixPath('/home/user/data/store')
>>> ensure_path(Path("./relative/path"))
PosixPath('./relative/path')
>>> ensure_path("~/data", expand_user=False)
PosixPath('~/data')
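The documented behavior (type check, conversion, optional tilde expansion) can be sketched with `pathlib` alone. The name `ensure_path_sketch` and the error message are assumptions for illustration; the actual napistu implementation may differ.

```python
from pathlib import Path
from typing import Union


def ensure_path_sketch(path: Union[str, Path], expand_user: bool = True) -> Path:
    """Hypothetical re-creation of the documented ensure_path contract."""
    # Reject anything that is neither a str nor a Path.
    if not isinstance(path, (str, Path)):
        raise TypeError(f"path must be a str or Path, got {type(path).__name__}")
    result = Path(path)
    # Expand a leading "~" to the user's home directory when requested.
    return result.expanduser() if expand_user else result


expanded = ensure_path_sketch("~/data")
literal = ensure_path_sketch("~/data", expand_user=False)
```

With `expand_user=False` the tilde is kept as a literal path component, which is useful when the path will be resolved later on a different machine.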
- napistu.utils.path_utils.get_extn_from_url(url: str) -> str
Retrieve file extension from a URL.
- Parameters:
url (str) – URL to extract extension from.
- Returns:
File extension including the leading dot (e.g., ‘.gz’, ‘.tar.gz’).
- Return type:
str
- Raises:
ValueError – If no file extension can be identified in the URL.
Examples
>>> get_extn_from_url('https://test/test.gz')
'.gz'
>>> get_extn_from_url('https://test/test.tar.gz')
'.tar.gz'
>>> get_extn_from_url('https://test/test.tar.gz/bla')
Traceback (most recent call last):
...
ValueError: File extension not identifiable: https://test/test.tar.gz/bla
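One way to reproduce the three documented cases is a regular expression anchored at the end of the URL path. This is a sketch under assumptions: `extension_from_url` is a hypothetical name, and treating `.tar` as the only recognized compound prefix is a choice made here, not something the source confirms.

```python
import re
from urllib.parse import urlparse

# An optional ".tar" prefix followed by a final extension at the end of the path.
_EXTENSION_PATTERN = re.compile(r"(\.tar)?\.[A-Za-z0-9]+$")


def extension_from_url(url: str) -> str:
    """Hypothetical sketch of the documented get_extn_from_url behavior."""
    match = _EXTENSION_PATTERN.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"File extension not identifiable: {url}")
    return match.group(0)


single = extension_from_url("https://test/test.gz")
compound = extension_from_url("https://test/test.tar.gz")
```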
- napistu.utils.path_utils.get_source_base_and_path(uri: str) -> tuple[str, str]
Get the base of a bucket or folder and the path to the file.
For URIs with a scheme (e.g., ‘gs://’), returns the scheme + netloc as base. For local paths, returns the directory as base.
- Parameters:
uri (str) – URI or path to parse.
- Returns:
A tuple of (base, path) where:
- base : str
The base URI or directory (e.g., ‘gs://bucket’ or ‘/local/dir’).
- path : str
The relative path to the file (e.g., ‘folder/file’ or ‘file’).
- Return type:
tuple[str, str]
Examples
>>> get_source_base_and_path("gs://bucket/folder/file")
('gs://bucket', 'folder/file')
>>> get_source_base_and_path("/bucket/folder/file")
('/bucket/folder', 'file')
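The scheme-versus-local split described above can be sketched with `urllib.parse.urlparse` and `posixpath.split`. The helper name `split_source` is hypothetical, and the use of POSIX-style separators for local paths is an assumption; the library's own parsing may differ.

```python
import posixpath
from urllib.parse import urlparse


def split_source(uri: str) -> tuple:
    """Hypothetical sketch of the documented get_source_base_and_path behavior."""
    parsed = urlparse(uri)
    if parsed.scheme:
        # URI with a scheme: base is scheme + netloc, path is everything after it.
        return f"{parsed.scheme}://{parsed.netloc}", parsed.path.lstrip("/")
    # Local path: base is the containing directory, path is the final component.
    return posixpath.split(uri)


remote = split_source("gs://bucket/folder/file")
local = split_source("/bucket/folder/file")
```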
- napistu.utils.path_utils.get_target_base_and_path(uri: str) -> tuple[str, str]
Get the base directory + parent path and the filename.
Splits the URI at the last path separator to extract the filename.
- Parameters:
uri (str) – URI or path to parse.
- Returns:
A tuple of (base, filename) where:
- base : str
The directory path (e.g., ‘gs://bucket/folder’ or ‘/local/folder’).
- filename : str
The filename (e.g., ‘file’).
- Return type:
tuple[str, str]
Examples
>>> get_target_base_and_path("gs://bucket/folder/file")
('gs://bucket/folder', 'file')
>>> get_target_base_and_path("bucket/folder/file")
('bucket/folder', 'file')
>>> get_target_base_and_path("/bucket/folder/file")
('/bucket/folder', 'file')
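Splitting at the last path separator can be sketched with `str.rpartition`. The helper name `split_target` is hypothetical, and returning an empty base for input without any `/` is an assumption made here, since the source does not document that case.

```python
def split_target(uri: str) -> tuple:
    """Hypothetical sketch of the documented get_target_base_and_path behavior."""
    # rpartition splits on the last "/" only; the separator itself is discarded.
    base, _separator, filename = uri.rpartition("/")
    return base, filename


remote = split_target("gs://bucket/folder/file")
relative = split_target("bucket/folder/file")
absolute = split_target("/bucket/folder/file")
```

Unlike the source variant, the scheme and bucket stay in the base here, so the base can be used directly as a destination directory.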
- napistu.utils.path_utils.initialize_dir(output_dir_path: str, overwrite: bool) -> None
Initialize a filesystem directory.
Creates a new directory or optionally overwrites an existing one. Works with any fsspec-supported filesystem (local, GCS, S3, etc.).
- Parameters:
output_dir_path (str) – Path or URI to the directory to create (e.g., ‘/local/path’, ‘gs://bucket/path’).
overwrite (bool) – If True, delete and recreate the directory if it exists. If False, raise FileExistsError if the directory exists.
- Raises:
FileExistsError – If directory exists and overwrite is False.
Examples
>>> initialize_dir('/tmp/newdir', overwrite=False)
>>> initialize_dir('gs://bucket/path', overwrite=True)
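The overwrite semantics can be illustrated with a local-only sketch. `initialize_local_dir` is a hypothetical helper that uses `pathlib` and `shutil` instead of fsspec, and it assumes any existing path at the target is a directory; it is not the library's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def initialize_local_dir(output_dir_path: str, overwrite: bool) -> None:
    """Hypothetical local-only analogue of initialize_dir."""
    target = Path(output_dir_path)
    if target.exists():
        if not overwrite:
            raise FileExistsError(f"{target} already exists and overwrite is False")
        # Delete the existing directory and everything inside it.
        shutil.rmtree(target)
    target.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "newdir"
    initialize_local_dir(str(new_dir), overwrite=False)
    (new_dir / "stale.txt").write_text("old")

    # A second call without overwrite is refused.
    try:
        initialize_local_dir(str(new_dir), overwrite=False)
        refused = False
    except FileExistsError:
        refused = True

    # With overwrite=True the directory is recreated empty.
    initialize_local_dir(str(new_dir), overwrite=True)
    remaining = sorted(p.name for p in new_dir.iterdir())
```

Because `overwrite=True` deletes existing contents, callers may want to guard it behind an explicit user option.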
- napistu.utils.path_utils.path_exists(path: str) -> bool
Check if a path or URI exists.
Works with any fsspec-supported filesystem (local, GCS, S3, memory, etc.).
- Parameters:
path (str) – Path or URI to check (e.g., ‘/local/path’, ‘gs://bucket/path’, ‘memory://path’).
- Returns:
True if the path exists, False otherwise.
- Return type:
bool
Examples
>>> path_exists('/tmp/myfile.txt')
False
>>> path_exists('gs://bucket/existing_file.txt')
True
>>> path_exists('.')
True