piscis.data
Classes
SpotsDataset | Spot detection dataset.
MultiSpotsDataset | Multi-spot detection dataset.
WeightedDatasetSampler | Weighted dataset sampler.
SpotsDataStream | Spot detection data stream.
Functions
get_torch_dataloader | Get a Torch dataloader from a dataset.
get_torch_dataset | Get a Torch dataset from a directory.
load_datasets | Load datasets from a directory.
load_dataset | Load a dataset from a directory.
generate_dataset | Generate a dataset from images and spot coordinates.
Module Contents
- class piscis.data.SpotsDataset(x_paths: List[pathlib.Path], y_paths: List[pathlib.Path], adjustment: str | None = 'standardize', split: str | None = None)
Bases: torch.utils.data.Dataset
Spot detection dataset.
- Parameters:
- x_paths : List[Path]
List of image paths.
- y_paths : List[Path]
List of paths to ground truth spot coordinates.
- adjustment : Optional[str], optional
Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
- split : Optional[str], optional
Dataset split. Default is None.
- x_paths
- y_paths
- adjustment = 'standardize'
- split = None
- __getitem__(index: int) -> Tuple[numpy.ndarray, numpy.ndarray]
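The reference names the two supported `adjustment` values but does not define them. The sketch below shows one common reading, assuming 'normalize' means min-max scaling into [0, 1] and 'standardize' means zero mean and unit standard deviation. The helper `adjust` is hypothetical and works on a flat list of pixel values. It is not the library's implementation.

```python
import statistics
from typing import List, Optional


def adjust(pixels: List[float], adjustment: Optional[str] = "standardize") -> List[float]:
    """Hypothetical helper: rescale a flat list of pixel intensities."""
    if adjustment is None:
        return list(pixels)
    if adjustment == "normalize":
        # Assumed meaning: min-max scaling into [0, 1].
        lo, hi = min(pixels), max(pixels)
        span = (hi - lo) or 1.0  # avoid dividing by zero on constant images
        return [(p - lo) / span for p in pixels]
    if adjustment == "standardize":
        # Assumed meaning: zero mean, unit (population) standard deviation.
        mean = statistics.fmean(pixels)
        std = statistics.pstdev(pixels) or 1.0
        return [(p - mean) / std for p in pixels]
    raise ValueError(f"Unsupported adjustment: {adjustment!r}")


pixels = [0.0, 50.0, 100.0, 250.0]
normalized = adjust(pixels, "normalize")
standardized = adjust(pixels, "standardize")
```

Under these assumptions, standardizing makes images with different exposure comparable, while normalizing preserves the relative intensity range of each image.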
- class piscis.data.MultiSpotsDataset(datasets: List[SpotsDataset], weights: List[float])
Bases: torch.utils.data.Dataset
Multi-spot detection dataset.
- Parameters:
- datasets : List[SpotsDataset]
List of datasets.
- weights : List[float]
List of dataset sampling weights.
- datasets
- weights
- epoch_size
- split
- __getitem__(index) -> Tuple[numpy.ndarray, numpy.ndarray]
- class piscis.data.WeightedDatasetSampler(multi_dataset: MultiSpotsDataset, num_samples: int | None = None, seed: int | None = None)
Bases: torch.utils.data.Sampler
Weighted dataset sampler.
- Parameters:
- multi_dataset : MultiSpotsDataset
Multi-spot detection dataset.
- num_samples : Optional[int], optional
Number of samples to draw. Default is None.
- seed : Optional[int], optional
Random seed. Default is None.
- datasets
- weights
- generator
- seed = None
- __len__()
- __iter__()
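To illustrate the idea of sampling across several datasets by weight, here is a minimal standard-library sketch. It assumes a two-stage draw: first pick a dataset in proportion to its weight, then pick an item uniformly within it. The function `weighted_dataset_indices` is hypothetical and may not match how `WeightedDatasetSampler` works internally.

```python
import random
from typing import Iterator, List, Optional, Tuple


def weighted_dataset_indices(
    dataset_sizes: List[int],
    weights: List[float],
    num_samples: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (dataset_id, item_index) pairs; hypothetical illustration only."""
    rng = random.Random(seed)
    dataset_ids = range(len(dataset_sizes))
    for _ in range(num_samples):
        # Choose a dataset in proportion to its weight.
        (d,) = rng.choices(dataset_ids, weights=weights, k=1)
        # Then choose an item uniformly within that dataset.
        yield d, rng.randrange(dataset_sizes[d])


samples = list(weighted_dataset_indices([10, 1000], [0.5, 0.5], num_samples=8, seed=0))
```

With equal weights, the small dataset is drawn about as often as the large one, which is the usual motivation for weighting by dataset instead of by item.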
- class piscis.data.SpotsDataStream(dataset: torch.utils.data.Dataset, min_num_samples: int, epoch: int = 1, seed: int = 0, shuffle: bool = True, augment_cls: Callable | None = None)
Bases: torch.utils.data.IterableDataset
Spot detection data stream.
- Parameters:
- dataset : torch.utils.data.Dataset
Torch dataset.
- min_num_samples : int
Minimum number of samples per epoch.
- epoch : int, optional
Current epoch. Default is 1.
- seed : int, optional
Random seed. Default is 0.
- shuffle : bool, optional
Whether to shuffle the dataset. Default is True.
- augment_cls : Optional[Callable], optional
Augmentation class. Default is None.
- dataset
- min_num_samples
- epoch = 1
- seed = 0
- shuffle = True
- augment_cls = None
- cached_indices = None
- _make_sampler(sampler_seed: int, worker_id: int, num_workers: int) -> torch.utils.data.Sampler | range
- __len__()
- __iter__()
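The signature of `_make_sampler` takes a seed, a worker ID, and a worker count, which suggests that each data-loading worker receives its own share of a seeded index order. The sketch below shows one way such a scheme could work. It assumes whole passes over the dataset are repeated until `min_num_samples` is reached, and that workers take interleaved slices. The function `worker_indices` is hypothetical and is not taken from the library.

```python
import random
from typing import List


def worker_indices(
    dataset_len: int,
    min_num_samples: int,
    epoch: int,
    seed: int,
    worker_id: int,
    num_workers: int,
    shuffle: bool = True,
) -> List[int]:
    """Hypothetical illustration of seeded, per-worker index sharding."""
    if dataset_len <= 0:
        raise ValueError("dataset_len must be positive")
    indices: List[int] = []
    pass_number = 0
    # Repeat whole passes over the dataset until the minimum is reached.
    while len(indices) < min_num_samples:
        one_pass = list(range(dataset_len))
        if shuffle:
            # A string seed keeps each (seed, epoch, pass) order reproducible.
            random.Random(f"{seed}-{epoch}-{pass_number}").shuffle(one_pass)
        indices.extend(one_pass)
        pass_number += 1
    # Give each worker a disjoint, interleaved slice of the full order.
    return indices[worker_id::num_workers]


shards = [
    worker_indices(5, 12, epoch=1, seed=0, worker_id=w, num_workers=2)
    for w in range(2)
]
```

Because every worker derives the same full order from the same seed and epoch, the slices cover each index exactly once per pass without any communication between workers.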
- piscis.data.get_torch_dataloader(dataset: torch.utils.data.Dataset, image_size: Tuple[int, int], batch_size: int = 4, num_workers: int = 4, seed: int = 0, *args, **kwargs) -> torch.utils.data.DataLoader
Get a Torch dataloader from a dataset.
- Parameters:
- dataset : torch.utils.data.Dataset
Torch dataset.
- image_size : Tuple[int, int]
Desired image size.
- batch_size : int, optional
Batch size. Default is 4.
- num_workers : int, optional
Number of workers for data loading. Default is 4.
- seed : int, optional
Random seed used for shuffling the dataset. Default is 0.
- Returns:
- dataloader : torch.utils.data.DataLoader
Torch dataloader.
- Raises:
- ValueError
If the dataset is not an instance of SpotsDataset or MultiSpotsDataset.
- piscis.data.get_torch_dataset(paths: str | List[str] | Dict[str, float] | pathlib.Path, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = False) -> Dict
Get a Torch dataset from a directory.
- Parameters:
- paths : Union[str, List[str], Dict[str, float], Path]
Path to a dataset, path to a directory containing multiple datasets, a list of dataset paths, or a dictionary mapping dataset paths to sampling weights. If a directory of datasets or a list is provided, all datasets in it are loaded and concatenated with equal weights. If a dictionary is provided, the datasets are loaded and concatenated with the specified weights.
- adjustment : Optional[str], optional
Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
- load_train : bool, optional
Whether to load the training set. Default is True.
- load_val : bool, optional
Whether to load the validation set. Default is True.
- load_test : bool, optional
Whether to load the test set. Default is False.
- Returns:
- dataset : Dict
Torch dataset.
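The `paths` parameter of `get_torch_dataset` accepts either a list (equal weights) or a dictionary (explicit weights). The sketch below shows how those two forms could be reduced to one mapping of sampling probabilities. The helper `sampling_probabilities` is hypothetical, and normalizing the weights to sum to 1 is an assumption made for illustration. The single-path and directory cases are left out because they depend on the on-disk layout.

```python
from pathlib import Path
from typing import Dict, List, Union


def sampling_probabilities(paths: Union[List[str], Dict[str, float]]) -> Dict[Path, float]:
    """Hypothetical helper: map each dataset path to a sampling probability."""
    if isinstance(paths, dict):
        # Explicit weights supplied by the caller.
        weights = {Path(p): float(w) for p, w in paths.items()}
    else:
        # A plain list means every dataset gets the same weight.
        weights = {Path(p): 1.0 for p in paths}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("At least one dataset must have a positive weight")
    return {p: w / total for p, w in weights.items()}


equal = sampling_probabilities(["data/a", "data/b"])
custom = sampling_probabilities({"data/a": 3.0, "data/b": 1.0})
```

Here `data/a` and `data/b` are placeholder paths. With the dictionary form, `data/a` would be sampled three times as often as `data/b`.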
- piscis.data.load_datasets(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) -> Dict
Load datasets from a directory.
- Parameters:
- path : str
Path to a dataset or directory of datasets.
- adjustment : Optional[str], optional
Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
- load_train : bool, optional
Whether to load the training set. Default is True.
- load_val : bool, optional
Whether to load the validation set. Default is True.
- load_test : bool, optional
Whether to load the test set. Default is True.
- Returns:
- dataset : Dict
Dataset dictionary.
- piscis.data.load_dataset(path: str, adjustment: str | None = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) -> Dict
Load a dataset from a directory.
- Parameters:
- path : str
Path to a dataset.
- adjustment : Optional[str], optional
Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.
- load_train : bool, optional
Whether to load the training set. Default is True.
- load_val : bool, optional
Whether to load the validation set. Default is True.
- load_test : bool, optional
Whether to load the test set. Default is True.
- Returns:
- dataset : Dict
Dataset dictionary.
- piscis.data.generate_dataset(path: str, images: List[numpy.ndarray], coords: List[numpy.ndarray], seed: int, tile_size: Tuple[int, int] = (256, 256), min_spots: int = 1, train_size: float = 0.7, test_size: float = 0.15) -> None
Generate a dataset from images and spot coordinates.
- Parameters:
- path : str
Path to save dataset.
- images : List[np.ndarray]
List of images.
- coords : List[np.ndarray]
List of ground truth spot coordinates.
- seed : int
Random seed used for splitting the dataset into training, validation, and test sets.
- tile_size : Tuple[int, int], optional
Tile size used for splitting images. Default is (256, 256).
- min_spots : int, optional
Minimum number of spots per tile. Default is 1.
- train_size : float, optional
Fraction of dataset used for training. Default is 0.7.
- test_size : float, optional
Fraction of dataset used for testing. Default is 0.15.
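`generate_dataset` takes only `train_size` and `test_size`, yet the seed is described as controlling a three-way split. A natural reading, assumed here, is that the validation set receives the remainder (0.15 with the defaults). The sketch below shows a seeded split along those lines. The function `split_indices` and the key names in its result are hypothetical.

```python
import random
from typing import Dict, List


def split_indices(
    n: int,
    seed: int,
    train_size: float = 0.7,
    test_size: float = 0.15,
) -> Dict[str, List[int]]:
    """Hypothetical helper: seeded train/validation/test split of n tiles."""
    if train_size < 0 or test_size < 0 or train_size + test_size > 1.0:
        raise ValueError("train_size and test_size must be non-negative and sum to at most 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed, same split
    n_train = round(n * train_size)
    n_test = min(round(n * test_size), n - n_train)
    return {
        "train": indices[:n_train],
        "test": indices[n_train:n_train + n_test],
        # Assumption: whatever is left over becomes the validation set.
        "valid": indices[n_train + n_test:],
    }


splits = split_indices(20, seed=0)
```

For 20 tiles and the default fractions, this yields 14 training, 3 test, and 3 validation tiles, and re-running with the same seed reproduces the same assignment.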