piscis.data
===========

.. py:module:: piscis.data


Classes
-------

.. autoapisummary::

   piscis.data.SpotsDataset
   piscis.data.MultiSpotsDataset
   piscis.data.WeightedDatasetSampler
   piscis.data.SpotsDataStream


Functions
---------

.. autoapisummary::

   piscis.data.get_torch_dataloader
   piscis.data.get_torch_dataset
   piscis.data.load_datasets
   piscis.data.load_dataset
   piscis.data.generate_dataset


Module Contents
---------------

.. py:class:: SpotsDataset(x_paths: List[pathlib.Path], y_paths: List[pathlib.Path], adjustment: Optional[str] = 'standardize', split: Optional[str] = None)

   Bases: :py:obj:`torch.utils.data.Dataset`


   Spot detection dataset that pairs each image with its ground truth spot coordinates.


   :Parameters:

       **x_paths** : List[Path]
           List of image paths.

       **y_paths** : List[Path]
           List of paths to ground truth spot coordinates.

       **adjustment** : Optional[str], optional
           Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.

       **split** : Optional[str], optional
           Dataset split. Default is None.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: x_paths


   .. py:attribute:: y_paths


   .. py:attribute:: adjustment
      :value: 'standardize'


   .. py:attribute:: split
      :value: None


   .. py:method:: __len__() -> int


   .. py:method:: __getitem__(index: int) -> Tuple[numpy.ndarray, numpy.ndarray]
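
The ``adjustment`` option of :py:class:`SpotsDataset` selects how image intensities are rescaled. The sketch below shows the two textbook operations these names usually refer to, using plain Python lists instead of arrays. It assumes that 'standardize' means zero-mean, unit-variance scaling, that 'normalize' means min-max scaling to [0, 1], and that ``None`` means no adjustment; piscis may implement them differently (for example with percentile clipping), and the helper names are hypothetical.

```python
import statistics
from typing import List, Optional


def standardize(pixels: List[float]) -> List[float]:
    """Assumed 'standardize': zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std == 0:
        # Constant image: avoid division by zero and return all zeros.
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]


def normalize(pixels: List[float]) -> List[float]:
    """Assumed 'normalize': min-max scaling into the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def adjust(pixels: List[float], adjustment: Optional[str] = 'standardize') -> List[float]:
    # Dispatch on the documented option names; treating None as
    # "no adjustment" is an assumption of this sketch.
    if adjustment is None:
        return list(pixels)
    if adjustment == 'standardize':
        return standardize(pixels)
    if adjustment == 'normalize':
        return normalize(pixels)
    raise ValueError(f'Unsupported adjustment: {adjustment!r}')
```

Standardizing keeps relative contrast but removes differences in overall brightness between images, while min-max scaling pins every image to the same fixed range; which works better depends on how much intensity varies across the source images.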


.. py:class:: MultiSpotsDataset(datasets: List[SpotsDataset], weights: List[float])

   Bases: :py:obj:`torch.utils.data.Dataset`


   Combined dataset built from multiple spot detection datasets.


   :Parameters:

       **datasets** : List[SpotsDataset]
           List of datasets.

       **weights** : List[float]
           Sampling weight of each dataset, in the same order as ``datasets``.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: datasets


   .. py:attribute:: weights


   .. py:attribute:: epoch_size


   .. py:attribute:: split


   .. py:method:: __len__() -> int


   .. py:method:: __getitem__(index) -> Tuple[numpy.ndarray, numpy.ndarray]


.. py:class:: WeightedDatasetSampler(multi_dataset: MultiSpotsDataset, num_samples: Optional[int] = None, seed: Optional[int] = None)

   Bases: :py:obj:`torch.utils.data.Sampler`


   Sampler that draws from the datasets of a :py:class:`MultiSpotsDataset` according to their sampling weights.


   :Parameters:

       **multi_dataset** : MultiSpotsDataset
           Combined spot detection dataset to sample from.

       **num_samples** : Optional[int], optional
           Number of samples to draw. Default is None.

       **seed** : Optional[int], optional
           Random seed. Default is None.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: datasets


   .. py:attribute:: weights


   .. py:attribute:: generator


   .. py:attribute:: seed
      :value: None


   .. py:method:: __len__()


   .. py:method:: __iter__()
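
One way a weighted sampler over several datasets can work is sketched below: pick a dataset with probability proportional to its weight, then pick a sample uniformly inside it. This is an illustrative stand-in for :py:class:`WeightedDatasetSampler`, not its actual code; the function name, the ``(dataset_index, sample_index)`` output format, and the uniform within-dataset draw are assumptions. ``num_samples`` is required here because the documented default of ``None`` is not explained.

```python
import random
from typing import Iterator, Optional, Sequence, Tuple


def weighted_pairs(
    lengths: Sequence[int],
    weights: Sequence[float],
    num_samples: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (dataset_index, sample_index) pairs drawn by dataset weight.

    Hypothetical helper; the pair format is an assumption of this sketch.
    """
    rng = random.Random(seed)  # dedicated seeded RNG -> reproducible order
    dataset_ids = range(len(lengths))
    for _ in range(num_samples):
        # Choose a dataset in proportion to its sampling weight...
        d = rng.choices(dataset_ids, weights=weights, k=1)[0]
        # ...then choose a sample uniformly within that dataset.
        yield d, rng.randrange(lengths[d])
```

Seeding a dedicated ``random.Random`` instance, rather than the global RNG, keeps the draw order reproducible for a given ``seed`` without affecting other code that uses ``random``.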


.. py:class:: SpotsDataStream(dataset: torch.utils.data.Dataset, min_num_samples: int, epoch: int = 1, seed: int = 0, shuffle: bool = True, augment_cls: Optional[Callable] = None)

   Bases: :py:obj:`torch.utils.data.IterableDataset`


   Iterable data stream over a spot detection dataset.


   :Parameters:

       **dataset** : torch.utils.data.Dataset
           Torch dataset.

       **min_num_samples** : int
           Minimum number of samples per epoch.

       **epoch** : int, optional
           Current epoch. Default is 1.

       **seed** : int, optional
           Random seed. Default is 0.

       **shuffle** : bool, optional
           Whether to shuffle the dataset. Default is True.

       **augment_cls** : Optional[Callable], optional
           Augmentation class applied to samples. Default is None (no augmentation).


   ..
       !! processed by numpydoc !!

   .. py:attribute:: dataset


   .. py:attribute:: min_num_samples


   .. py:attribute:: epoch
      :value: 1


   .. py:attribute:: seed
      :value: 0


   .. py:attribute:: shuffle
      :value: True


   .. py:attribute:: augment_cls
      :value: None


   .. py:attribute:: cached_indices
      :value: None


   .. py:method:: set_epoch(epoch: int) -> None


   .. py:method:: next_epoch() -> None


   .. py:method:: _make_sampler(sampler_seed: int, worker_id: int, num_workers: int) -> Union[torch.utils.data.Sampler, range]


   .. py:method:: _get_worker_info() -> Tuple[int, int]
      :staticmethod:


   .. py:method:: __len__()


   .. py:method:: __iter__()
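
An iterable stream such as :py:class:`SpotsDataStream` has to produce a reproducible, epoch-dependent order, deliver at least ``min_num_samples`` samples, and avoid handing the same indices to every DataLoader worker. The sketch below shows one way to do all three with the standard library. How piscis actually derives its sampler seed, tops up short datasets, or splits work in ``_make_sampler`` is not documented here, so the repeated-pass strategy, the string-based seed, and the interleaved sharding are assumptions, and both function names are hypothetical.

```python
import random
from typing import List


def epoch_indices(
    dataset_len: int,
    min_num_samples: int,
    epoch: int,
    seed: int = 0,
    shuffle: bool = True,
) -> List[int]:
    """Return at least ``min_num_samples`` indices for one epoch (sketch)."""
    if dataset_len <= 0:
        raise ValueError('dataset_len must be positive')
    # Assumption: top up with full extra passes until the minimum is reached.
    n_passes = max(1, -(-min_num_samples // dataset_len))  # ceiling division
    indices: List[int] = []
    for p in range(n_passes):
        order = list(range(dataset_len))
        if shuffle:
            # Assumption: a string seed built from (seed, epoch, pass) gives
            # a distinct but reproducible permutation for every pass.
            random.Random(f'{seed}:{epoch}:{p}').shuffle(order)
        indices.extend(order)
    return indices


def shard(indices: List[int], worker_id: int, num_workers: int) -> List[int]:
    """Interleaved split: worker k takes positions k, k + num_workers, ..."""
    return indices[worker_id::num_workers]
```

Each worker would call ``shard`` with the ``id`` and ``num_workers`` reported by ``torch.utils.data.get_worker_info()``; in the main process that function returns ``None``, and the natural fallback is ``worker_id=0, num_workers=1``. Because every worker computes the same epoch order from the same seed, the shards together cover the epoch exactly once.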


.. py:function:: get_torch_dataloader(dataset: torch.utils.data.Dataset, image_size: Tuple[int, int], batch_size: int = 4, num_workers: int = 4, seed: int = 0, *args, **kwargs) -> torch.utils.data.DataLoader

   
   Get a Torch dataloader from a dataset.


   :Parameters:

       **dataset** : torch.utils.data.Dataset
           Torch dataset.

       **image_size** : Tuple[int, int]
           Desired image size.

       **batch_size** : int, optional
           Batch size. Default is 4.

       **num_workers** : int, optional
           Number of workers for data loading. Default is 4.

       **seed** : int, optional
           Random seed used for shuffling the dataset. Default is 0.


   :Returns:

       **dataloader** : torch.utils.data.DataLoader
           Torch dataloader.


   :Raises:

       ValueError
           If ``dataset`` is not an instance of :py:class:`SpotsDataset` or :py:class:`MultiSpotsDataset`.


   ..
       !! processed by numpydoc !!

.. py:function:: get_torch_dataset(paths: Union[str, List[str], Dict[str, float], pathlib.Path], adjustment: Optional[str] = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = False) -> Dict

   
   Get Torch datasets from one or more dataset paths.


   :Parameters:

       **paths** : Union[str, List[str], Dict[str, float], Path]
           Path to a dataset, path to a directory containing multiple datasets, a list of multiple dataset paths, or a
           dictionary of multiple dataset paths and their corresponding sampling weights. If a directory of datasets or a
           list is provided, all datasets in the directory or list will be loaded and concatenated with equal weights. If
           a dictionary is provided, the datasets will be loaded and concatenated with the specified weights.

       **adjustment** : Optional[str], optional
           Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.

       **load_train** : bool, optional
           Whether to load the training set. Default is True.

       **load_val** : bool, optional
           Whether to load the validation set. Default is True.

       **load_test** : bool, optional
           Whether to load the test set. Default is False.


   :Returns:

       **dataset** : Dict
           Dictionary of Torch datasets.


   ..
       !! processed by numpydoc !!
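
The ``paths`` argument of :py:func:`get_torch_dataset` accepts several forms. The sketch below normalizes them into a single ``{dataset_path: weight}`` mapping following the documented rules (equal weights for a list, explicit weights for a dictionary). Expanding a directory of datasets is left out because the on-disk layout is not described here, rescaling the weights to sum to 1 is a choice made for readability, and ``resolve_weights`` is a hypothetical helper, not part of piscis.

```python
from pathlib import Path
from typing import Dict, List, Union

PathsArg = Union[str, Path, List[str], Dict[str, float]]


def resolve_weights(paths: PathsArg) -> Dict[str, float]:
    """Normalize the accepted ``paths`` forms into {dataset_path: weight}."""
    if isinstance(paths, dict):
        # Dictionaries carry explicit sampling weights.
        weights = {str(p): float(w) for p, w in paths.items()}
    elif isinstance(paths, (list, tuple)):
        # Lists get equal weights, as documented.
        weights = {str(p): 1.0 for p in paths}
    elif isinstance(paths, (str, Path)):
        # Treated here as a single dataset; expanding a directory of
        # datasets is omitted because its layout is an unknown.
        weights = {str(paths): 1.0}
    else:
        raise TypeError(f'Unsupported paths type: {type(paths).__name__}')
    total = sum(weights.values())
    if total <= 0:
        raise ValueError('Sampling weights must sum to a positive value.')
    # Rescale so the weights form a probability distribution.
    return {p: w / total for p, w in weights.items()}
```

For example, ``{'a': 3, 'b': 1}`` resolves to ``{'a': 0.75, 'b': 0.25}``, so dataset ``a`` would be drawn three times as often as dataset ``b``.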

.. py:function:: load_datasets(path: str, adjustment: Optional[str] = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) -> Dict

   
   Load datasets from a directory.


   :Parameters:

       **path** : str
           Path to a dataset or directory of datasets.

       **adjustment** : Optional[str], optional
           Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.

       **load_train** : bool, optional
           Whether to load the training set. Default is True.

       **load_val** : bool, optional
           Whether to load the validation set. Default is True.

       **load_test** : bool, optional
           Whether to load the test set. Default is True.


   :Returns:

       **dataset** : Dict
           Dataset dictionary.


   ..
       !! processed by numpydoc !!

.. py:function:: load_dataset(path: str, adjustment: Optional[str] = 'standardize', load_train: bool = True, load_val: bool = True, load_test: bool = True) -> Dict

   
   Load a dataset from a directory.


   :Parameters:

       **path** : str
           Path to a dataset.

       **adjustment** : Optional[str], optional
           Adjustment type applied to images. Supported types are 'normalize' and 'standardize'. Default is 'standardize'.

       **load_train** : bool, optional
           Whether to load the training set. Default is True.

       **load_val** : bool, optional
           Whether to load the validation set. Default is True.

       **load_test** : bool, optional
           Whether to load the test set. Default is True.


   :Returns:

       **dataset** : Dict
           Dataset dictionary.


   ..
       !! processed by numpydoc !!

.. py:function:: generate_dataset(path: str, images: List[numpy.ndarray], coords: List[numpy.ndarray], seed: int, tile_size: Tuple[int, int] = (256, 256), min_spots: int = 1, train_size: float = 0.7, test_size: float = 0.15) -> None

   
   Generate a dataset from images and spot coordinates.


   :Parameters:

       **path** : str
           Path where the dataset will be saved.

       **images** : List[np.ndarray]
           List of images.

       **coords** : List[np.ndarray]
           List of ground truth spot coordinates.

       **seed** : int
           Random seed used for splitting the dataset into training, validation, and test sets.

       **tile_size** : Tuple[int, int], optional
           Size of the tiles into which images are split. Default is (256, 256).

       **min_spots** : int, optional
           Minimum number of spots per tile. Default is 1.

       **train_size** : float, optional
           Fraction of the dataset used for training. Default is 0.7.

       **test_size** : float, optional
           Fraction of the dataset used for testing. Default is 0.15. The remaining fraction (0.15 with the defaults) is used for validation.


   ..
       !! processed by numpydoc !!
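
The two main steps of :py:func:`generate_dataset`, tiling images and splitting the result into training, validation, and test sets, can be sketched with the standard library as follows. Images are represented only by their shape, coordinates are assumed to be (row, column) pairs, tiles are assumed to be non-overlapping with partial edge tiles dropped, and validation is assumed to receive whatever remains after the training and test fractions. Whether piscis splits per tile or per source image is not stated, and both helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def tile_image(
    shape: Tuple[int, int],
    coords: Sequence[Coord],
    tile_size: Tuple[int, int] = (256, 256),
    min_spots: int = 1,
) -> List[Tuple[Tuple[int, int], List[Coord]]]:
    """Return (tile_origin, tile_local_coords) for each kept tile.

    Assumptions: non-overlapping tiles, partial edge tiles dropped,
    coordinates given as (row, column).
    """
    th, tw = tile_size
    tiles = []
    for r0 in range(0, shape[0] - th + 1, th):
        for c0 in range(0, shape[1] - tw + 1, tw):
            # Keep spots inside this tile, shifted to tile-local coordinates.
            local = [
                (r - r0, c - c0)
                for r, c in coords
                if r0 <= r < r0 + th and c0 <= c < c0 + tw
            ]
            if len(local) >= min_spots:
                tiles.append(((r0, c0), local))
    return tiles


def split_indices(
    n: int, seed: int, train_size: float = 0.7, test_size: float = 0.15
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) with ``seed`` and return (train, val, test) indices."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_train = round(n * train_size)
    n_test = round(n * test_size)
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    # Assumption: validation receives the remaining fraction.
    val = order[n_train + n_test:]
    return train, val, test
```

With the default fractions, 20 tiles split into 14 training, 3 validation, and 3 test tiles; rounding keeps the three subsets disjoint and together they cover every tile.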