Dataset Module

This module holds the functionality related to dataset management, both for downloading the datasets and for iterating over them.

The project uses two datasets:

  • COCO (for training the fast image transform net)

  • Some public videos (for training the video style transfer net)

class stransfer.dataset.CocoDataset(images=None, image_limit=None)[source]

An implementation of torch.utils.data.Dataset specific to the COCO dataset
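
As a rough illustration of the shape of such a class, here is a minimal, torch-free sketch; the real class subclasses torch.utils.data.Dataset, and everything beyond the documented constructor signature is an assumption:

```python
from typing import Any, List, Optional

class CocoDataset:
    """Sketch of a Dataset-style wrapper over a list of COCO images.

    The constructor arguments mirror the documented signature; the
    internals (how image_limit is applied, what an item is) are assumptions.
    """

    def __init__(self,
                 images: Optional[List[Any]] = None,
                 image_limit: Optional[int] = None):
        images = images if images is not None else []
        # Assumed behavior: image_limit caps the number of exposed images.
        self.images = images[:image_limit] if image_limit is not None else images

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Any:
        return self.images[index]
```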

class stransfer.dataset.VideoDataset(videos=None, data_limit=None, batch_size=3)[source]

Dataset wrapper for the video dataset

stransfer.dataset.download_coco_images()[source]

Ensures that the COCO dataset has been downloaded

Return type

None

Returns

None

stransfer.dataset.download_from_url(url, dst)[source]
Parameters
  • url (str) – the URL of the file to download

  • dst (str) – the path where the downloaded file will be placed

Return type

int

Returns

the size of the downloaded file
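
A possible implementation, sketched with the standard library (the real function may use a different HTTP client or add progress reporting; the streaming approach and size-in-bytes return value are assumptions):

```python
import os
import shutil
import urllib.request

def download_from_url(url: str, dst: str) -> int:
    """Stream the resource at `url` into the file `dst` and
    return the size of the downloaded file in bytes."""
    with urllib.request.urlopen(url) as response, open(dst, "wb") as out:
        shutil.copyfileobj(response, out)
    return os.path.getsize(dst)
```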

stransfer.dataset.download_list_of_urls(urls, destination_folder='data/video/')[source]

Download a list of URLs into destination_folder

Parameters
  • urls (List[str]) – the list of URLs to download

  • destination_folder – the destination folder to which they will be downloaded

Return type

None

Returns

None
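
The behavior described above can be sketched as follows; the filename convention (last path segment of the URL) and the on-demand directory creation are assumptions:

```python
import os
import shutil
import urllib.parse
import urllib.request
from typing import List

def download_list_of_urls(urls: List[str],
                          destination_folder: str = 'data/video/') -> None:
    """Download every URL in `urls` into `destination_folder`,
    naming each file after the last segment of its URL path."""
    os.makedirs(destination_folder, exist_ok=True)
    for url in urls:
        filename = os.path.basename(urllib.parse.urlparse(url).path)
        destination = os.path.join(destination_folder, filename)
        with urllib.request.urlopen(url) as response, \
                open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
```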

stransfer.dataset.download_videos_dataset()[source]

Ensures that the videos in the video dataset have been downloaded

Return type

None

Returns

None

stransfer.dataset.get_coco_loader(batch_size=4, test_split=0.1, test_limit=None, train_limit=None)[source]

Sets up and returns the dataloaders for the COCO dataset

Parameters
  • batch_size – the number of elements per batch

  • test_split – the fraction of the full dataset that should go into the test set

  • test_limit – the maximum number of items in the test set

  • train_limit – the maximum number of items in the training set

Return type

Tuple[DataLoader, DataLoader]

Returns

the test set dataloader and the train set dataloader
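
The train/test split behind these parameters can be sketched in plain Python. This is torch-free: the function name, shuffling, and rounding below are assumptions, and the real function wraps the resulting subsets in torch DataLoader objects:

```python
import random
from typing import Any, List, Optional, Tuple

def split_for_loaders(items: List[Any],
                      test_split: float = 0.1,
                      test_limit: Optional[int] = None,
                      train_limit: Optional[int] = None,
                      seed: int = 0) -> Tuple[List[Any], List[Any]]:
    """Shuffle `items`, carve off a `test_split` fraction as the test set,
    apply the optional size limits, and return (test, train), mirroring
    the documented (test loader, train loader) return order."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_split)
    test, train = items[:n_test], items[n_test:]
    if test_limit is not None:
        test = test[:test_limit]
    if train_limit is not None:
        train = train[:train_limit]
    return test, train
```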

stransfer.dataset.iterate_on_video_batches(batch, max_frames=2160)[source]

Generator that, given a list of video readers, will yield at each iteration a list composed of one frame from each video.

Parameters
  • batch (List[Reader]) – batch of video readers we want to iterate on

  • max_frames – the maximum number of frames we want to yield. By default we limit to 90 seconds of video, i.e. 90 × 24 = 2160 frames for 24 FPS footage

Return type

Generator[Tensor, None, None]
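
Using plain iterators as stand-ins for the video Reader objects, the generator's behavior can be sketched like this (the early-exit policy when one video runs out of frames is an assumption):

```python
from typing import Any, Iterator, List

def iterate_on_video_batches(batch: List[Iterator[Any]],
                             max_frames: int = 2160) -> Iterator[List[Any]]:
    """Yield, at each step, a list with one frame per reader; stop after
    `max_frames` steps or as soon as any reader is exhausted."""
    for _ in range(max_frames):
        frames = []
        for reader in batch:
            frame = next(reader, None)
            if frame is None:  # a reader ran out of frames
                return
            frames.append(frame)
        yield frames
```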

stransfer.dataset.make_batches(l, n)[source]

Yield successive n-sized chunks from l.

Parameters
  • l (List[Any]) – list of elements we want to convert into batches

  • n (int) – size of the batches

Return type

List[List[Any]]

Returns

list of batches of elements
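
A chunking helper like this is commonly implemented as a short generator; here is a sketch consistent with the documented behavior (written as a generator, which matches the "Yield" wording above; the real implementation may instead build the full list):

```python
from typing import Any, Iterator, List

def make_batches(l: List[Any], n: int) -> Iterator[List[Any]]:
    """Yield successive n-sized chunks from l; the last chunk
    may be shorter when len(l) is not a multiple of n."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
```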