piscis.data

Classes

`SpotsDataset`	Spot detection dataset.
`MultiSpotsDataset`	Multi-spot detection dataset.
`WeightedDatasetSampler`	Weighted dataset sampler.
`SpotsDataStream`	Spot detection data stream.

Functions

`get_torch_dataloader`(→ torch.utils.data.DataLoader)	Get a Torch dataloader from a dataset.
`get_torch_dataset`(→ Dict)	Get a Torch dataset from a directory.
`load_datasets`(→ Dict)	Load datasets from a directory.
`load_dataset`(→ Dict)	Load a dataset from a directory.
`generate_dataset`(→ None)	Generate a dataset from images and spot coordinates.

Module Contents

class piscis.data.SpotsDataset(x_paths: List[pathlib.Path], y_paths: List[pathlib.Path], adjustment: str | None = 'standardize', split: str | None = None)

Bases: torch.utils.data.Dataset

Spot detection dataset.

Parameters:

x_paths (List[Path]): List of image paths.
y_paths (List[Path]): List of paths to ground truth spot coordinates.
adjustment (Optional[str], optional): Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
split (Optional[str], optional): Dataset split. Default is None.

x_paths

y_paths

adjustment = 'standardize'

split = None

__len__() → int

__getitem__(index: int) → Tuple[numpy.ndarray, numpy.ndarray]
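
This page does not define the two `adjustment` options any further. As a rough illustration only, the sketch below assumes that 'normalize' means min-max scaling to [0, 1] and that 'standardize' means shifting to zero mean and unit standard deviation. Both definitions, and the helper name `adjust_pixels`, are assumptions made for this example and are not taken from the piscis source. It works on a flat list of pixel intensities so that it needs only the standard library.

```python
import statistics


def adjust_pixels(pixels, adjustment="standardize"):
    """Rescale a flat list of pixel intensities (hypothetical helper)."""
    if adjustment is None:
        # No adjustment requested: return the values unchanged, as floats.
        return [float(p) for p in pixels]
    if adjustment == "normalize":
        # Assumed meaning: min-max scaling to the range [0, 1].
        lo, hi = min(pixels), max(pixels)
        span = (hi - lo) or 1.0  # avoid division by zero on constant images
        return [(p - lo) / span for p in pixels]
    if adjustment == "standardize":
        # Assumed meaning: zero mean and unit (population) standard deviation.
        mean = statistics.fmean(pixels)
        std = statistics.pstdev(pixels) or 1.0
        return [(p - mean) / std for p in pixels]
    raise ValueError(f"Unsupported adjustment: {adjustment!r}")


pixels = [10, 20, 30, 40]
normalized = adjust_pixels(pixels, "normalize")
standardized = adjust_pixels(pixels, "standardize")
```

Under these assumptions, `normalized` spans 0.0 to 1.0, while `standardized` is centered on zero.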

class piscis.data.MultiSpotsDataset(datasets: List[SpotsDataset], weights: List[float])

Bases: torch.utils.data.Dataset

Multi-spot detection dataset.

Parameters:

datasets (List[SpotsDataset]): List of datasets.
weights (List[float]): List of dataset sampling weights.

datasets

weights

epoch_size

split

__len__() → int

__getitem__(index) → Tuple[numpy.ndarray, numpy.ndarray]

class piscis.data.WeightedDatasetSampler(multi_dataset: MultiSpotsDataset, num_samples: int | None = None, seed: int | None = None)

Bases: torch.utils.data.Sampler

Weighted dataset sampler.

Parameters:

multi_dataset (MultiSpotsDataset): Multi-spot detection dataset.
num_samples (Optional[int], optional): Number of samples to draw. Default is None.
seed (Optional[int], optional): Random seed. Default is None.

datasets

weights

generator

seed = None

__len__()

__iter__()
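
One plausible reading of weighted dataset sampling is a two-stage draw: pick a dataset with probability proportional to its weight, then pick an item uniformly within that dataset. The sketch below illustrates that idea with the standard library only. The function `weighted_indices` and its `(dataset_index, item_index)` output format are hypothetical and are not taken from the piscis source.

```python
import random


def weighted_indices(dataset_sizes, weights, num_samples, seed=None):
    """Draw (dataset_index, item_index) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    # Stage 1: choose a dataset for every sample, proportionally to its weight.
    dataset_ids = rng.choices(range(len(dataset_sizes)), weights=weights, k=num_samples)
    # Stage 2: choose an item uniformly at random inside the chosen dataset.
    return [(d, rng.randrange(dataset_sizes[d])) for d in dataset_ids]


# Two datasets of very different sizes; the small one has 3x the weight.
samples = weighted_indices(dataset_sizes=[100, 10], weights=[1.0, 3.0],
                           num_samples=1000, seed=0)
share_of_second = sum(1 for d, _ in samples if d == 1) / len(samples)
```

With these weights, roughly three quarters of the draws should come from the second dataset regardless of its smaller size, which is the point of weighting by dataset rather than by item.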

class piscis.data.SpotsDataStream(dataset: torch.utils.data.Dataset, min_num_samples: int, epoch: int = 1, seed: int = 0, shuffle: bool = True, augment_cls: Callable | None = None)

Bases: torch.utils.data.IterableDataset

Spot detection data stream.

Parameters:

dataset (torch.utils.data.Dataset): Torch dataset.
min_num_samples (int): Minimum number of samples per epoch.
epoch (int, optional): Current epoch. Default is 1.
seed (int, optional): Random seed. Default is 0.
shuffle (bool, optional): Whether to shuffle the dataset. Default is True.
augment_cls (Optional[Callable], optional): Augmentation class. Default is None.

dataset

min_num_samples

epoch = 1

seed = 0

shuffle = True

augment_cls = None

cached_indices = None

set_epoch(epoch: int) → None

next_epoch() → None

_make_sampler(sampler_seed: int, worker_id: int, num_workers: int) → torch.utils.data.Sampler | range

static _get_worker_info() → Tuple[int, int]

__len__()

__iter__()
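
The parameters above suggest a stream that reshuffles on every epoch and yields at least `min_num_samples` items even when the dataset is smaller than that. A minimal sketch of such behavior follows. It assumes the shuffle is seeded by `seed + epoch` and that a short dataset is repeated in whole passes; both are assumptions, and `epoch_indices` is a hypothetical helper rather than part of piscis.

```python
import math
import random


def epoch_indices(dataset_len, min_num_samples, epoch=1, seed=0, shuffle=True):
    """Return the item order for one epoch (hypothetical helper)."""
    # Repeat the dataset in whole passes until the minimum is reached.
    num_passes = max(1, math.ceil(min_num_samples / dataset_len))
    # Assumption: mixing the epoch into the seed gives a different but
    # reproducible order for each epoch.
    rng = random.Random(seed + epoch)
    indices = []
    for _ in range(num_passes):
        order = list(range(dataset_len))
        if shuffle:
            rng.shuffle(order)
        indices.extend(order)
    return indices


# A 5-item dataset with a 12-sample minimum needs three full passes.
first = epoch_indices(dataset_len=5, min_num_samples=12, epoch=1)
second = epoch_indices(dataset_len=5, min_num_samples=12, epoch=2)
```

Calling the helper again with the same `seed` and `epoch` reproduces the same order, which is what makes an epoch-based seed convenient when several data-loading workers must agree on one permutation.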

piscis.data.get_torch_dataloader(dataset: torch.utils.data.Dataset, image_size: Tuple[int, int], batch_size: int = 4, num_workers: int = 4, seed: int = 0, *args, **kwargs) → torch.utils.data.DataLoader

Get a Torch dataloader from a dataset.

Parameters:

dataset (torch.utils.data.Dataset): Torch dataset.
image_size (Tuple[int, int]): Desired image size.
batch_size (int, optional): Batch size. Default is 4.
num_workers (int, optional): Number of workers for data loading. Default is 4.
seed (int, optional): Random seed used for shuffling the dataset. Default is 0.

Returns:

dataloader (torch.utils.data.DataLoader): Torch dataloader.

Raises:

ValueError: If the dataset is not an instance of SpotsDataset or MultiSpotsDataset.

piscis.data.get_torch_dataset(paths: str | List[str] | Dict[str, float] | pathlib.Path, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = False) → Dict

Get a Torch dataset from a directory.

Parameters:

paths (Union[str, List[str], Dict[str, float], Path]): Path to a dataset, path to a directory containing multiple datasets, a list of dataset paths, or a dictionary mapping dataset paths to their sampling weights. If a directory of datasets or a list is provided, all datasets in it are loaded and concatenated with equal weights. If a dictionary is provided, the datasets are loaded and concatenated with the specified weights.
adjustment (Optional[str], optional): Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
load_train (bool, optional): Whether to load the training set. Default is True.
load_val (bool, optional): Whether to load the validation set. Default is True.
load_test (bool, optional): Whether to load the test set. Default is False.

Returns:

dataset (Dict): Torch dataset.
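
The accepted forms of `paths` can all be reduced to one mapping from dataset path to sampling weight. The sketch below shows that reduction for the single-path, list, and dictionary forms, using equal weights of 1.0 where none are given. The helper `paths_to_weights` is hypothetical. It does not expand a directory of datasets, because the on-disk layout is not described on this page.

```python
from pathlib import Path


def paths_to_weights(paths):
    """Map each accepted `paths` form to {path: weight} (hypothetical helper)."""
    if isinstance(paths, dict):
        # Explicit weights: keep them as given.
        return {str(p): float(w) for p, w in paths.items()}
    if isinstance(paths, (str, Path)):
        # A single path is treated as a one-element list.
        paths = [paths]
    # A list of paths: every dataset gets the same weight.
    return {str(p): 1.0 for p in paths}


single = paths_to_weights("spots_a")
listed = paths_to_weights(["spots_a", Path("spots_b")])
weighted = paths_to_weights({"spots_a": 3, "spots_b": 1})
```

The dataset names `spots_a` and `spots_b` are placeholders chosen for the example.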

piscis.data.load_datasets(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) → Dict

Load datasets from a directory.

Parameters:

path (str): Path to a dataset or directory of datasets.
adjustment (Optional[str], optional): Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
load_train (bool, optional): Whether to load the training set. Default is True.
load_val (bool, optional): Whether to load the validation set. Default is True.
load_test (bool, optional): Whether to load the test set. Default is True.

Returns:

dataset (Dict): Dataset dictionary.

piscis.data.load_dataset(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) → Dict

Load a dataset from a directory.

Parameters:

path (str): Path to a dataset.
adjustment (Optional[str], optional): Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
load_train (bool, optional): Whether to load the training set. Default is True.
load_val (bool, optional): Whether to load the validation set. Default is True.
load_test (bool, optional): Whether to load the test set. Default is True.

Returns:

dataset (Dict): Dataset dictionary.

piscis.data.generate_dataset(path: str, images: List[numpy.ndarray], coords: List[numpy.ndarray], seed: int, tile_size: Tuple[int, int] = (256, 256), min_spots: int = 1, train_size: float = 0.7, test_size: float = 0.15) → None

Generate a dataset from images and spot coordinates.

Parameters:

path (str): Path to save the dataset.
images (List[np.ndarray]): List of images.
coords (List[np.ndarray]): List of ground truth spot coordinates.
seed (int): Random seed used for splitting the dataset into training, validation, and test sets.
tile_size (Tuple[int, int], optional): Tile size used for splitting images. Default is (256, 256).
min_spots (int, optional): Minimum number of spots per tile. Default is 1.
train_size (float, optional): Fraction of dataset used for training. Default is 0.7.
test_size (float, optional): Fraction of dataset used for testing. Default is 0.15.
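
The defaults imply that whatever is not assigned to training (0.7) or testing (0.15) is left for validation (the remaining 0.15), although this page does not say so explicitly. The sketch below assumes exactly that, together with a seeded shuffle of the tiles that pass the `min_spots` filter. The helper `split_tiles` and its per-tile spot-count input are hypothetical and only illustrate how the parameters might interact.

```python
import random


def split_tiles(spot_counts, seed, min_spots=1, train_size=0.7, test_size=0.15):
    """Filter tiles by spot count, then split their indices (hypothetical helper)."""
    # Keep only tiles that contain at least `min_spots` spots.
    kept = [i for i, n in enumerate(spot_counts) if n >= min_spots]
    # Seeded shuffle so the split is reproducible.
    random.Random(seed).shuffle(kept)
    n_train = round(train_size * len(kept))
    n_test = round(test_size * len(kept))
    return {
        "train": kept[:n_train],
        "test": kept[n_train:n_train + n_test],
        # Assumption: validation receives the remainder.
        "val": kept[n_train + n_test:],
    }


# Spot counts for 22 tiles; the two empty tiles are dropped by min_spots=1.
spot_counts = [0, 3, 5, 1, 0, 2, 7, 4, 1, 9, 6, 2, 3, 8, 1, 4, 2, 5, 3, 1, 6, 2]
splits = split_tiles(spot_counts, seed=42)
```

Here 20 tiles survive the filter, so the default fractions give 14 training, 3 test, and 3 validation tiles.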