piscis.data

Classes

SpotsDataset

Spot detection dataset.

MultiSpotsDataset

Multi-spot detection dataset.

WeightedDatasetSampler

Weighted dataset sampler.

SpotsDataStream

Spot detection data stream.

Functions

get_torch_dataloader(→ torch.utils.data.DataLoader)

Get a Torch dataloader from a dataset.

get_torch_dataset(→ Dict)

Get a Torch dataset from a directory.

load_datasets(→ Dict)

Load datasets from a directory.

load_dataset(→ Dict)

Load a dataset from a directory.

generate_dataset(path, images, coords, seed[, tile_size, min_spots, train_size, test_size])

Generate a dataset from images and spot coordinates.

Module Contents

class piscis.data.SpotsDataset(x_paths: List[pathlib.Path], y_paths: List[pathlib.Path], adjustment: str | None = 'standardize', split: str | None = None)

Bases: torch.utils.data.Dataset

Spot detection dataset.

Parameters:
x_paths : List[Path]

List of image paths.

y_paths : List[Path]

List of paths to ground truth spot coordinates.

adjustment : Optional[str], optional

Adjustment type applied to images. Supported types are ‘normalize’ and ‘standardize’. Default is ‘standardize’.

split : Optional[str], optional

Dataset split. Default is None.

x_paths
y_paths
adjustment = 'standardize'
split = None
__len__() → int
__getitem__(index: int) → Tuple[numpy.ndarray, numpy.ndarray]
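The two documented adjustment types are not defined further on this page. The sketch below assumes the conventional meanings: ‘normalize’ as min-max scaling to [0, 1] and ‘standardize’ as zero mean and unit variance, with None leaving intensities unchanged. The helper adjust_image is hypothetical and is not part of piscis.

```python
from typing import Optional

import numpy as np


def adjust_image(image: np.ndarray, adjustment: Optional[str] = "standardize") -> np.ndarray:
    """Hypothetical helper: apply a 'normalize' or 'standardize' adjustment."""
    image = image.astype(np.float32)
    if adjustment is None:
        # Assumed: None leaves intensities unchanged (apart from the float cast).
        return image
    if adjustment == "normalize":
        # Assumed: min-max scaling to the range [0, 1].
        lo, hi = image.min(), image.max()
        return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)
    if adjustment == "standardize":
        # Assumed: shift to zero mean and scale to unit standard deviation.
        std = image.std()
        return (image - image.mean()) / std if std > 0 else np.zeros_like(image)
    raise ValueError(f"Unsupported adjustment: {adjustment!r}")
```

Constant images are mapped to zeros in both branches to avoid dividing by zero; how piscis handles that edge case is not documented here.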
class piscis.data.MultiSpotsDataset(datasets: List[SpotsDataset], weights: List[float])

Bases: torch.utils.data.Dataset

Multi-spot detection dataset.

Parameters:
datasets : List[SpotsDataset]

List of datasets.

weights : List[float]

List of dataset sampling weights.

datasets
weights
epoch_size
split
__len__() → int
__getitem__(index) → Tuple[numpy.ndarray, numpy.ndarray]
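How the weights relate to epoch_size is not specified on this page. Assuming the weights are relative sampling weights, a minimal sketch of turning them into probabilities and expected per-dataset draws for an epoch looks like this. Both helper names are hypothetical.

```python
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Turn relative dataset weights into sampling probabilities that sum to 1."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return [w / total for w in weights]


def expected_counts(weights: Sequence[float], epoch_size: int) -> List[float]:
    """Expected number of draws from each dataset in an epoch of epoch_size samples."""
    return [p * epoch_size for p in normalize_weights(weights)]
```

For example, weights [1, 3] with an epoch of 100 samples give an expected 25 draws from the first dataset and 75 from the second.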
class piscis.data.WeightedDatasetSampler(multi_dataset: MultiSpotsDataset, num_samples: int | None = None, seed: int | None = None)

Bases: torch.utils.data.Sampler

Weighted dataset sampler.

Parameters:
multi_dataset : MultiSpotsDataset

Multi-spot detection dataset.

num_samples : Optional[int], optional

Number of samples to draw. Default is None.

seed : Optional[int], optional

Random seed. Default is None.

datasets
weights
generator
seed = None
__len__()
__iter__()
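A seeded weighted sampler over several datasets can be sketched as a two-stage draw: pick a dataset in proportion to its weight, then pick a sample uniformly within it. This is an illustration of the general technique, not the piscis implementation; it uses random.Random as a stand-in for the class's generator attribute, and the function name is hypothetical.

```python
import random
from typing import Iterator, Optional, Sequence, Tuple


def weighted_dataset_indices(
    lengths: Sequence[int],
    weights: Sequence[float],
    num_samples: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (dataset index, sample index) pairs.

    Each draw first picks a dataset with probability proportional to its weight,
    then picks a sample uniformly at random within that dataset.
    """
    rng = random.Random(seed)  # stand-in for a seeded torch.Generator
    dataset_ids = list(range(len(lengths)))
    for _ in range(num_samples):
        d = rng.choices(dataset_ids, weights=weights, k=1)[0]
        yield d, rng.randrange(lengths[d])
```

Fixing the seed makes the sequence of draws reproducible, which matches the role the seed parameter appears to play above.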
class piscis.data.SpotsDataStream(dataset: torch.utils.data.Dataset, min_num_samples: int, epoch: int = 1, seed: int = 0, shuffle: bool = True, augment_cls: Callable | None = None)

Bases: torch.utils.data.IterableDataset

Spot detection data stream.

Parameters:
dataset : torch.utils.data.Dataset

Torch dataset.

min_num_samples : int

Minimum number of samples per epoch.

epoch : int, optional

Current epoch. Default is 1.

seed : int, optional

Random seed. Default is 0.

shuffle : bool, optional

Whether to shuffle the dataset. Default is True.

augment_cls : Optional[Callable], optional

Augmentation class. Default is None.

dataset
min_num_samples
epoch = 1
seed = 0
shuffle = True
augment_cls = None
cached_indices = None
set_epoch(epoch: int) → None
next_epoch() → None
_make_sampler(sampler_seed: int, worker_id: int, num_workers: int) → torch.utils.data.Sampler | range
static _get_worker_info() → Tuple[int, int]
__len__()
__iter__()
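The parameters and private methods above suggest an epoch-aware stream that repeats the dataset to reach min_num_samples, reshuffles each epoch, and splits work across DataLoader workers. The sketch below shows one way to do that under stated assumptions: the per-epoch seed is taken as seed + epoch, and workers receive a strided share of the indices. Both choices and both helper names are hypothetical, not confirmed piscis behavior.

```python
import math
import random
from typing import List


def epoch_indices(num_items: int, min_num_samples: int, epoch: int, seed: int = 0, shuffle: bool = True) -> List[int]:
    """Indices for one epoch, repeating the dataset until at least min_num_samples are produced."""
    if num_items <= 0:
        raise ValueError("dataset must not be empty")
    repeats = max(1, math.ceil(min_num_samples / num_items))
    indices = list(range(num_items)) * repeats
    if shuffle:
        # Assumed scheme: mix the epoch into the seed so each epoch gets a new, reproducible order.
        random.Random(seed + epoch).shuffle(indices)
    return indices


def shard_for_worker(indices: List[int], worker_id: int, num_workers: int) -> List[int]:
    """Strided split so that each DataLoader worker yields a disjoint share of the epoch."""
    return indices[worker_id::num_workers]
```

With 4 items and min_num_samples = 10, each epoch yields 12 indices (the dataset repeated 3 times), and three workers each receive 4 of them.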
piscis.data.get_torch_dataloader(dataset: torch.utils.data.Dataset, image_size: Tuple[int, int], batch_size: int = 4, num_workers: int = 4, seed: int = 0, *args, **kwargs) → torch.utils.data.DataLoader

Get a Torch dataloader from a dataset.

Parameters:
dataset : torch.utils.data.Dataset

Torch dataset.

image_size : Tuple[int, int]

Desired image size.

batch_size : int, optional

Batch size. Default is 4.

num_workers : int, optional

Number of workers for data loading. Default is 4.

seed : int, optional

Random seed used for shuffling the dataset. Default is 0.

Returns:
dataloader : torch.utils.data.DataLoader

Torch dataloader.

Raises:
ValueError

If the dataset is not an instance of SpotsDataset or MultiSpotsDataset.
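How image_size is enforced is not documented here. A common approach for spot detection is a random crop to the target size, zero-padding first if the image is too small, with spot coordinates shifted into the crop's frame. The sketch below assumes (row, col) coordinates and bottom/right padding; random_crop_or_pad is a hypothetical helper, not part of piscis.

```python
from typing import Tuple

import numpy as np


def random_crop_or_pad(
    image: np.ndarray,
    coords: np.ndarray,
    image_size: Tuple[int, int],
    rng: np.random.Generator,
) -> Tuple[np.ndarray, np.ndarray]:
    """Crop (zero-padding first if needed) to image_size and shift spot coordinates.

    image has shape (H, W) or (H, W, C); coords has shape (N, 2) as (row, col).
    """
    out_h, out_w = image_size
    pad_h = max(0, out_h - image.shape[0])
    pad_w = max(0, out_w - image.shape[1])
    if pad_h or pad_w:
        # Pad at the bottom/right so existing coordinates stay valid.
        pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
        image = np.pad(image, pad)
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - out_h + 1))
    left = int(rng.integers(0, w - out_w + 1))
    crop = image[top:top + out_h, left:left + out_w]
    # Move spots into the crop's frame and drop those that fall outside it.
    shifted = coords - np.array([top, left])
    keep = (
        (shifted[:, 0] >= 0) & (shifted[:, 0] < out_h)
        & (shifted[:, 1] >= 0) & (shifted[:, 1] < out_w)
    )
    return crop, shifted[keep]
```

Passing a seeded np.random.default_rng(seed) makes the crops reproducible, in the same spirit as the seed parameter above.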

piscis.data.get_torch_dataset(paths: str | List[str] | Dict[str, float] | pathlib.Path, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = False) → Dict

Get a Torch dataset from a directory.

Parameters:
paths : Union[str, List[str], Dict[str, float], Path]

Path to a dataset, path to a directory containing multiple datasets, a list of multiple dataset paths, or a dictionary of multiple dataset paths and their corresponding sampling weights. If a directory of datasets or a list is provided, all datasets in the directory or list will be loaded and concatenated with equal weights. If a dictionary is provided, the datasets will be loaded and concatenated with the specified weights.

adjustment : Optional[str], optional

Adjustment type applied to images. Supported types are ‘normalize’ and ‘standardize’. Default is ‘standardize’.

load_train : bool, optional

Whether to load the training set. Default is True.

load_val : bool, optional

Whether to load the validation set. Default is True.

load_test : bool, optional

Whether to load the test set. Default is False.

Returns:
dataset : Dict

Torch dataset.
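The weighting rule described for paths (equal weights for a list, the given weights for a dictionary) can be sketched as a function that maps each dataset path to a sampling probability. This is a hypothetical helper; normalizing the weights to sum to 1 is an assumption, and expanding a directory of datasets into its members is omitted, so a single path is treated as one dataset.

```python
from pathlib import Path
from typing import Dict, List, Union


def resolve_dataset_weights(paths: Union[str, Path, List[str], Dict[str, float]]) -> Dict[str, float]:
    """Map each dataset path to a sampling probability.

    A dict keeps its relative weights; a list gets equal weights. A single path
    is treated as one dataset here (expanding a directory of datasets is omitted).
    """
    if isinstance(paths, dict):
        weights = {str(p): float(w) for p, w in paths.items()}
    elif isinstance(paths, (str, Path)):
        weights = {str(paths): 1.0}
    else:
        weights = {str(p): 1.0 for p in paths}
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    return {p: w / total for p, w in weights.items()}
```

For example, {"a": 1, "b": 3} resolves to probabilities 0.25 and 0.75, while ["a", "b"] resolves to 0.5 each.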

piscis.data.load_datasets(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) → Dict

Load datasets from a directory.

Parameters:
path : str

Path to a dataset or directory of datasets.

adjustment : Optional[str], optional

Adjustment type applied to images. Supported types are ‘normalize’ and ‘standardize’. Default is ‘standardize’.

load_train : bool, optional

Whether to load the training set. Default is True.

load_val : bool, optional

Whether to load the validation set. Default is True.

load_test : bool, optional

Whether to load the test set. Default is True.

Returns:
dataset : Dict

Dataset dictionary.
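Because path may point to either a single dataset or a directory of datasets, a loader must decide which case applies. The on-disk layout that piscis uses is not described here, so the sketch below takes the test as a parameter; has_train_dir, which treats a directory with a "train" subdirectory as a dataset, is a purely hypothetical marker, as is discover_datasets itself.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def discover_datasets(path: Union[str, Path], is_dataset: Callable[[Path], bool]) -> Dict[str, Path]:
    """Return {name: path} for a single dataset or for every dataset in a directory."""
    root = Path(path)
    if is_dataset(root):
        return {root.name: root}
    return {p.name: p for p in sorted(root.iterdir()) if p.is_dir() and is_dataset(p)}


def has_train_dir(path: Path) -> bool:
    # Hypothetical marker: treat a directory as a dataset if it has a "train" subdirectory.
    return (path / "train").is_dir()
```

Keeping the dataset test as a parameter means the discovery logic does not hard-code a layout that this page does not specify.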

piscis.data.load_dataset(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) → Dict

Load a dataset from a directory.

Parameters:
path : str

Path to a dataset.

adjustment : Optional[str], optional

Adjustment type applied to images. Supported types are ‘normalize’ and ‘standardize’. Default is ‘standardize’.

load_train : bool, optional

Whether to load the training set. Default is True.

load_val : bool, optional

Whether to load the validation set. Default is True.

load_test : bool, optional

Whether to load the test set. Default is True.

Returns:
dataset : Dict

Dataset dictionary.
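Two steps a loader like this plausibly needs are choosing which splits to load from the three flags and pairing each image with its coordinates file, producing the aligned x_paths and y_paths that SpotsDataset expects. The sketch below assumes hypothetical split names ("train", "val", "test") and that an image and its coordinates share a file stem; neither convention is confirmed by this page.

```python
from pathlib import Path
from typing import List, Tuple


def selected_splits(load_train: bool = True, load_val: bool = True, load_test: bool = True) -> List[str]:
    """Names of the splits to load, in a fixed order (split names are assumed)."""
    flags = {"train": load_train, "val": load_val, "test": load_test}
    return [name for name, keep in flags.items() if keep]


def pair_by_stem(x_paths: List[Path], y_paths: List[Path]) -> Tuple[List[Path], List[Path]]:
    """Align image paths with coordinate paths that share the same file stem."""
    ys = {p.stem: p for p in y_paths}
    missing = [p.name for p in x_paths if p.stem not in ys]
    if missing:
        raise ValueError(f"No coordinates found for: {missing}")
    xs = sorted(x_paths, key=lambda p: p.stem)
    return xs, [ys[p.stem] for p in xs]
```

Sorting by stem gives a deterministic order, so the same directory always yields the same pairing.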

piscis.data.generate_dataset(path: str, images: List[numpy.ndarray], coords: List[numpy.ndarray], seed: int, tile_size: Tuple[int, int] = (256, 256), min_spots: int = 1, train_size: float = 0.7, test_size: float = 0.15) → None

Generate a dataset from images and spot coordinates.

Parameters:
path : str

Path to save dataset.

images : List[np.ndarray]

List of images.

coords : List[np.ndarray]

List of ground truth spot coordinates.

seed : int

Random seed used for splitting the dataset into training, validation, and test sets.

tile_size : Tuple[int, int], optional

Tile size used for splitting images. Default is (256, 256).

min_spots : int, optional

Minimum number of spots per tile. Default is 1.

train_size : float, optional

Fraction of dataset used for training. Default is 0.70.

test_size : float, optional

Fraction of dataset used for testing. Default is 0.15.
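The parameters above describe two steps: cutting images into tile_size tiles that keep at least min_spots spots, and a seeded split into training, validation, and test sets. The sketch below illustrates both under stated assumptions. It uses non-overlapping tiles and drops edge remainders, assumes (row, col) coordinates, and infers that validation receives the remaining fraction (1 − 0.70 − 0.15 = 0.15 with the defaults). Saving to path is omitted, and both helper names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def tile_with_spots(
    image: np.ndarray, coords: np.ndarray, tile_size: Tuple[int, int] = (256, 256), min_spots: int = 1
) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Cut non-overlapping tiles and keep those containing at least min_spots spots.

    coords has shape (N, 2) as (row, col); each kept tile's coordinates are
    shifted into that tile's frame. Edge remainders smaller than a tile are dropped.
    """
    th, tw = tile_size
    tiles = []
    for top in range(0, image.shape[0] - th + 1, th):
        for left in range(0, image.shape[1] - tw + 1, tw):
            inside = (
                (coords[:, 0] >= top) & (coords[:, 0] < top + th)
                & (coords[:, 1] >= left) & (coords[:, 1] < left + tw)
            )
            if inside.sum() >= min_spots:
                tile = image[top:top + th, left:left + tw]
                tiles.append((tile, coords[inside] - np.array([top, left])))
    return tiles


def split_indices(
    n: int, seed: int, train_size: float = 0.7, test_size: float = 0.15
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Shuffle n items with a seed and split them; validation gets the remainder."""
    if train_size < 0 or test_size < 0 or train_size + test_size > 1:
        raise ValueError("train_size and test_size must be non-negative and sum to at most 1")
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(n * train_size))
    n_test = int(round(n * test_size))
    n_val = max(0, n - n_train - n_test)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```

Whether piscis splits whole images or individual tiles, and how it rounds split sizes, is not stated here. Splitting by image would keep tiles from the same image out of different sets.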