napistu.utils.path_utils

Utilities for path and URI operations.

Public Functions

copy_uri(input_uri: str, output_uri: str, is_file: bool = True) -> None:

Copy a file or folder from one URI to another.

ensure_path(path: Union[str, Path], expand_user: bool = True) -> Path:

Convert a string or Path to a Path object, optionally expanding user home directory.

get_extn_from_url(url: str) -> str:

Retrieve file extension from a URL.

get_source_base_and_path(uri: str) -> tuple[str, str]:

Get the base of a bucket or folder and the path to the file.

get_target_base_and_path(uri: str) -> tuple[str, str]:

Get the base directory + parent path and the filename.

initialize_dir(output_dir_path: str, overwrite: bool) -> None:

Initialize a filesystem directory.

path_exists(path: str) -> bool:

Check if a path or URI exists.

Functions

copy_uri(input_uri, output_uri[, is_file])

Copy a file or folder from one URI to another.

ensure_path(path[, expand_user])

Convert a string or Path to a Path object, optionally expanding user home directory.

get_extn_from_url(url)

Retrieve file extension from a URL.

get_source_base_and_path(uri)

Get the base of a bucket or folder and the path to the file.

get_target_base_and_path(uri)

Get the base directory + parent path and the filename.

initialize_dir(output_dir_path, overwrite)

Initialize a filesystem directory.

path_exists(path)

Check if a path or URI exists.

napistu.utils.path_utils.copy_uri(input_uri: str, output_uri: str, is_file: bool = True) -> None

Copy a file or folder from one URI to another.

Parameters:
  • input_uri (str) – Input file URI (e.g., ‘gs://bucket/file’, ‘/local/path’, ‘memory://path’).

  • output_uri (str) – Output file URI (e.g., ‘gs://bucket/file’, ‘/local/path’, ‘memory://path’).

  • is_file (bool, default=True) – If True, copy a single file. If False, copy directory recursively.

Examples

>>> copy_uri('/local/source.txt', '/local/dest.txt')
>>> copy_uri('gs://bucket/source/', 'gs://bucket/dest/', is_file=False)

napistu.utils.path_utils.ensure_path(path: str | Path, expand_user: bool = True) -> Path

Convert a string or Path to a Path object, optionally expanding user home directory.

Parameters:
  • path (Union[str, Path]) – Path to convert. Can be a string (e.g., “~/data/store”) or Path object.

  • expand_user (bool, default=True) – If True, expand tildes (~) to the user’s home directory.

Returns:

Path object, with user expanded if expand_user=True.

Return type:

Path

Raises:

TypeError – If path is not a str or Path object.

Examples

>>> ensure_path("~/data/store")
PosixPath('/home/user/data/store')
>>> ensure_path(Path("./relative/path"))
PosixPath('./relative/path')
>>> ensure_path("~/data", expand_user=False)
PosixPath('~/data')
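
The behaviour documented above can be approximated with pathlib alone. The following is a minimal sketch, not napistu's actual implementation; the name `ensure_path_sketch` and the error message are hypothetical and chosen for illustration.

```python
from pathlib import Path
from typing import Union


def ensure_path_sketch(path: Union[str, Path], expand_user: bool = True) -> Path:
    """Hypothetical stand-in that mirrors the documented contract of ensure_path."""
    # Reject anything that is neither a str nor a Path, as the Raises section describes.
    if not isinstance(path, (str, Path)):
        raise TypeError(f"path must be a str or Path, got {type(path).__name__}")
    result = Path(path)
    # Expand a leading "~" only when requested.
    return result.expanduser() if expand_user else result


if __name__ == "__main__":
    print(ensure_path_sketch("~/data/store"))
    print(ensure_path_sketch("~/data", expand_user=False))
```

The expanded result depends on the home directory of the user running the code, so the sketch prints it rather than asserting a fixed value.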

napistu.utils.path_utils.get_extn_from_url(url: str) -> str

Retrieve file extension from a URL.

Parameters:

url (str) – URL to extract extension from.

Returns:

File extension including the leading dot (e.g., ‘.gz’, ‘.tar.gz’).

Return type:

str

Raises:

ValueError – If no file extension can be identified in the URL.

Examples

>>> get_extn_from_url('https://test/test.gz')
'.gz'
>>> get_extn_from_url('https://test/test.tar.gz')
'.tar.gz'
>>> get_extn_from_url('https://test/test.tar.gz/bla')
Traceback (most recent call last):
...
ValueError: File extension not identifiable: https://test/test.tar.gz/bla
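
One way to reproduce the three documented cases is a regular expression applied to the last path segment of the URL. This is a sketch under assumptions: the pattern, the special handling of `.tar`, and the name `extn_from_url_sketch` are illustrative and may differ from what napistu does internally.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an optional ".tar" prefix followed by one final alphanumeric suffix.
_EXTN_PATTERN = re.compile(r"(\.tar)?\.[A-Za-z0-9]+$")


def extn_from_url_sketch(url: str) -> str:
    """Hypothetical stand-in that returns the extension with its leading dot."""
    # Only the final path segment can carry the extension; query strings are dropped by urlparse.
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    match = _EXTN_PATTERN.search(last_segment)
    if match is None:
        raise ValueError(f"File extension not identifiable: {url}")
    return match.group(0)


if __name__ == "__main__":
    print(extn_from_url_sketch("https://test/test.gz"))      # prints .gz
    print(extn_from_url_sketch("https://test/test.tar.gz"))  # prints .tar.gz
```

With this pattern, a URL whose last segment has no dot (such as the `/bla` example above) raises ValueError.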

napistu.utils.path_utils.get_source_base_and_path(uri: str) -> tuple[str, str]

Get the base of a bucket or folder and the path to the file.

For URIs with a scheme (e.g., ‘gs://’), returns the scheme + netloc as base. For local paths, returns the directory as base.

Parameters:

uri (str) – URI or path to parse.

Returns:

A tuple of (base, path) where:

  • base (str) – The base URI or directory (e.g., ‘gs://bucket’ or ‘/local/dir’).

  • path (str) – The relative path to the file (e.g., ‘folder/file’ or ‘file’).

Return type:

tuple[str, str]

Examples

>>> get_source_base_and_path("gs://bucket/folder/file")
('gs://bucket', 'folder/file')
>>> get_source_base_and_path("/bucket/folder/file")
('/bucket/folder', 'file')
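
The scheme-versus-local split described above can be sketched with the standard library. This is an illustration of the documented rule, not napistu's source; `split_source_sketch` is a hypothetical name.

```python
import os
from urllib.parse import urlparse


def split_source_sketch(uri: str) -> tuple[str, str]:
    """Hypothetical stand-in: (scheme + netloc, path) for URIs, (directory, file) for local paths."""
    parsed = urlparse(uri)
    if parsed.scheme:
        # URI with a scheme: the base is "scheme://netloc", the rest is the relative path.
        base = f"{parsed.scheme}://{parsed.netloc}"
        return base, parsed.path.lstrip("/")
    # Local path: the containing directory is the base, the final component is the path.
    directory, filename = os.path.split(uri)
    return directory, filename


if __name__ == "__main__":
    print(split_source_sketch("gs://bucket/folder/file"))  # prints ('gs://bucket', 'folder/file')
    print(split_source_sketch("/bucket/folder/file"))      # prints ('/bucket/folder', 'file')
```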

napistu.utils.path_utils.get_target_base_and_path(uri: str) -> tuple[str, str]

Get the base directory + parent path and the filename.

Splits the URI at the last path separator to extract the filename.

Parameters:

uri (str) – URI or path to parse.

Returns:

A tuple of (base, filename) where:

  • base (str) – The directory path (e.g., ‘gs://bucket/folder’ or ‘/local/folder’).

  • filename (str) – The filename (e.g., ‘file’).

Return type:

tuple[str, str]

Examples

>>> get_target_base_and_path("gs://bucket/folder/file")
('gs://bucket/folder', 'file')
>>> get_target_base_and_path("bucket/folder/file")
('bucket/folder', 'file')
>>> get_target_base_and_path("/bucket/folder/file")
('/bucket/folder', 'file')
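
Splitting at the last path separator, as described above, amounts to a single `str.rpartition` call. The sketch below assumes "/" is the only separator considered; `split_target_sketch` is a hypothetical name, and napistu's own handling of edge cases (for example, a URI with no separator) may differ.

```python
def split_target_sketch(uri: str) -> tuple[str, str]:
    """Hypothetical stand-in: everything before the last "/" is the base, the rest is the filename."""
    base, _separator, filename = uri.rpartition("/")
    return base, filename


if __name__ == "__main__":
    print(split_target_sketch("gs://bucket/folder/file"))  # prints ('gs://bucket/folder', 'file')
    print(split_target_sketch("bucket/folder/file"))       # prints ('bucket/folder', 'file')
```

Unlike the source variant, the scheme and bucket stay attached to the base, so the second element is always a bare filename.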

napistu.utils.path_utils.initialize_dir(output_dir_path: str, overwrite: bool) -> None

Initialize a filesystem directory.

Creates a new directory or optionally overwrites an existing one. Works with any fsspec-supported filesystem (local, GCS, S3, etc.).

Parameters:
  • output_dir_path (str) – Path or URI to the directory to create (e.g., ‘/local/path’, ‘gs://bucket/path’).

  • overwrite (bool) – If True, delete and recreate the directory if it exists. If False, raise FileExistsError if the directory exists.

Raises:

FileExistsError – If directory exists and overwrite is False.

Examples

>>> initialize_dir('/tmp/newdir', overwrite=False)
>>> initialize_dir('gs://bucket/path', overwrite=True)
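
The overwrite semantics can be illustrated for the local filesystem using only the standard library. This sketch deliberately covers local paths only and does not use fsspec, so it is not a substitute for the function documented here; `initialize_local_dir_sketch` is a hypothetical name.

```python
import os
import shutil
import tempfile


def initialize_local_dir_sketch(output_dir_path: str, overwrite: bool) -> None:
    """Hypothetical local-only stand-in for the documented create-or-overwrite behaviour."""
    if os.path.isdir(output_dir_path):
        if not overwrite:
            raise FileExistsError(f"{output_dir_path} already exists and overwrite is False")
        # Delete the existing directory and everything in it before recreating it.
        shutil.rmtree(output_dir_path)
    os.makedirs(output_dir_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "newdir")
        initialize_local_dir_sketch(target, overwrite=False)  # creates the directory
        open(os.path.join(target, "stale.txt"), "w").close()  # leave a file behind
        initialize_local_dir_sketch(target, overwrite=True)   # wipes and recreates it
        print(os.listdir(target))  # prints []
```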

napistu.utils.path_utils.path_exists(path: str) -> bool

Check if a path or URI exists.

Works with any fsspec-supported filesystem (local, GCS, S3, memory, etc.).

Parameters:

path (str) – Path or URI to check (e.g., ‘/local/path’, ‘gs://bucket/path’, ‘memory://path’).

Returns:

True if the path exists, False otherwise.

Return type:

bool

Examples

>>> path_exists('/tmp/myfile.txt')
False
>>> path_exists('gs://bucket/existing_file.txt')
True
>>> path_exists('.')
True