Datasets

Manage your DataGym.ai Datasets and their images with our Python API

The Dataset object in our Python API Wrapper is a representation of the Datasets you already know from DataGym. This object includes the Images that belong to your DataGym Datasets. Datasets can be accessed through the Client object introduced on the Getting Started page.

Fetch Datasets from DataGym.ai

Before we look at the Dataset object in our Python API Wrapper, we first have to learn how to get the data from DataGym.ai. There are multiple ways to fetch Datasets from DataGym's backend.

Get all Datasets

datasets = client.get_datasets()

get_datasets returns all Datasets in a list.
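Because get_datasets returns a plain list, it can be handy to index the result by name. The following is a minimal sketch, not part of the DataGym API: index_datasets_by_name is a hypothetical helper, and fake_client is a stand-in object used only so the example runs without an API key. With a real Client you would pass your client instead.

```python
from types import SimpleNamespace


def index_datasets_by_name(client):
    """Map each Dataset name to its Dataset object (hypothetical helper)."""
    return {dataset.name: dataset for dataset in client.get_datasets()}


# Stand-in for a real Client so the sketch runs offline (assumption, not the real class).
fake_client = SimpleNamespace(
    get_datasets=lambda: [
        SimpleNamespace(id="<DATASET_ID>", name="Dummy_Dataset"),
        SimpleNamespace(id="<OTHER_ID>", name="Other_Dataset"),
    ]
)

by_name = index_datasets_by_name(fake_client)
print(by_name["Dummy_Dataset"].id)
```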

Get a Dataset by its name

dummy_dataset = client.get_dataset_by_name(dataset_name="Dummy_Dataset")

get_dataset_by_name returns a specific Dataset by its unique name.

Manage Datasets

Create new Datasets

To create a new Dataset on DataGym.ai, use the create_dataset method of the Client class. You have to specify a Dataset name and can optionally add a short description. In this example we use the owner ID of our Dummy Project.

new_dataset = client.create_dataset(
    name="My first Dataset",
    short_description="This is Optional"
)

create_dataset returns the newly created Dataset.
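Since Dataset names are unique, a script that runs more than once may want to reuse an existing Dataset instead of creating it again. The sketch below shows one way to do that using only get_datasets and create_dataset. The helper get_or_create_dataset and the FakeClient stand-in are hypothetical, and how the real backend reacts to a duplicate name is not covered here.

```python
from types import SimpleNamespace


def get_or_create_dataset(client, name, short_description=""):
    """Return the Dataset called `name`, creating it if it does not exist yet."""
    for dataset in client.get_datasets():
        if dataset.name == name:
            return dataset
    return client.create_dataset(name=name, short_description=short_description)


class FakeClient:
    """Hypothetical in-memory stand-in so the sketch runs without an API key."""

    def __init__(self):
        self._datasets = []

    def get_datasets(self):
        return list(self._datasets)

    def create_dataset(self, name, short_description=""):
        dataset = SimpleNamespace(
            id=f"id-{len(self._datasets)}",
            name=name,
            short_description=short_description,
        )
        self._datasets.append(dataset)
        return dataset


client = FakeClient()
first = get_or_create_dataset(client, "My first Dataset", "This is Optional")
second = get_or_create_dataset(client, "My first Dataset")
print(first is second)
```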

Upload Images to a Dataset via URL

The create_images_from_urls method of the Client class helps you add Images to your Datasets. The method requires a list of image URLs and a Dataset ID.

images_to_upload = [ 
                         "<IMAGE_URL_JPG_1>", 
                         "<IMAGE_URL_JPG_2>", 
                         "<IMAGE_URL_JPG_3>"
                   ]

images_created = client.create_images_from_urls(dataset_id=new_dataset.id,
                                                image_url_set=images_to_upload)

create_images_from_urls returns a list with one entry per image URL, reporting its upload status and any errors that may have occurred during the Image upload.

output:
[
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_1>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_2>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_3>'
 }
]
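To react to problems, you can filter this result for entries whose status is not 'SUCCESS'. The following sketch relies only on the two keys shown in the output above. The helper failed_uploads is hypothetical, and the 'FAILED' status value in the sample data is an assumption made for illustration, since the output above only shows successful uploads.

```python
def failed_uploads(upload_results):
    """Return the URLs whose upload status is anything other than 'SUCCESS'."""
    return [
        result["imageUrl"]
        for result in upload_results
        if result.get("imageUploadStatus") != "SUCCESS"
    ]


# Sample data shaped like the output above; 'FAILED' is a made-up status value.
sample_results = [
    {"imageUploadStatus": "SUCCESS", "imageUrl": "<IMAGE_URL_JPG_1>"},
    {"imageUploadStatus": "FAILED", "imageUrl": "<IMAGE_URL_JPG_2>"},
    {"imageUploadStatus": "SUCCESS", "imageUrl": "<IMAGE_URL_JPG_3>"},
]

retry_urls = failed_uploads(sample_results)
print(retry_urls)  # ['<IMAGE_URL_JPG_2>']
```

With a real client, you would pass the return value of create_images_from_urls to the helper in place of sample_results.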

Add a Dataset to a Project

The add_dataset method of the Client class enables you to connect a Dataset to a Project. This method requires a Dataset ID and a Project ID, so the example below fetches both objects before it calls add_dataset:

new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

added_success = client.add_dataset(dataset_id=new_dataset.id,
                                   project_id=empty_project.id)

add_dataset returns True if connecting the Dataset was successful.
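In a script, it is easy to overlook the boolean return value. The sketch below wraps the lookup and the add_dataset call in a hypothetical helper, connect_dataset, that raises an error when the call does not return True. FakeClient is a stand-in so the example runs offline, and whether the real method returns False or raises on failure is an assumption that should be checked against the library.

```python
from types import SimpleNamespace


def connect_dataset(client, dataset_name, project_name):
    """Look up a Dataset and a Project by name and connect them."""
    dataset = client.get_dataset_by_name(dataset_name=dataset_name)
    project = client.get_project_by_name(project_name)
    connected = client.add_dataset(dataset_id=dataset.id, project_id=project.id)
    if not connected:
        raise RuntimeError(
            f"Could not add Dataset {dataset_name!r} to Project {project_name!r}"
        )
    return dataset, project


class FakeClient:
    """Hypothetical stand-in so the sketch runs without an API key."""

    def get_dataset_by_name(self, dataset_name):
        return SimpleNamespace(id="dataset-1", name=dataset_name)

    def get_project_by_name(self, project_name):
        return SimpleNamespace(id="project-1", name=project_name)

    def add_dataset(self, dataset_id, project_id):
        return True


dataset, project = connect_dataset(
    FakeClient(), "My first Dataset", "Some empty Project"
)
print(dataset.id, project.id)
```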

Remove a Dataset from a Project

The remove_dataset method of the Client class allows you to remove a Dataset from a DataGym.ai Project. This method requires a Dataset ID and a Project ID, so the example below fetches both objects before it calls remove_dataset:

new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

removed_success = client.remove_dataset(dataset_id=new_dataset.id,
                                        project_id=empty_project.id)

remove_dataset returns True if removing the Dataset was successful.

The Dataset object

Dataset Attributes

Datasets in the Python API are modeled after DataGym's Datasets and, therefore, inherit the same attributes you already know from your DataGym.ai Datasets.

>>> print(dummy_dataset)
<Dataset {
    'id': '<DATASET_ID>', 
    'name': 'Dummy_Dataset', 
    'short_description': 'This is short', 
    'timestamp': 1583828520627, 
    'owner': '<OWNER_ID>', 
    'images': <List[Image] with 10 elements>
}>

As you can see, a Dataset object also contains a list of its images, which are represented as Image objects.
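The timestamp value in the output above looks like a Unix timestamp in milliseconds. Assuming that interpretation is correct (it is not stated explicitly here), a small helper such as the hypothetical timestamp_to_datetime below converts it into a readable UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def timestamp_to_datetime(timestamp_ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=timestamp_ms)


# Value taken from the printed Dataset above; the unit (ms) is an assumption.
created = timestamp_to_datetime(1583828520627)
print(created.isoformat())  # 2020-03-10T08:22:00.627000+00:00
```

With a real Dataset object, you would pass dummy_dataset.timestamp instead of the literal value.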

Dataset helper methods

A Dataset can hold any number of images. To simplify the access to the respective Image objects, the Dataset object provides a variety of helper methods.

Get Images by name

In some use cases you might only be interested in a specific set of images. The get_images_by_name method reduces the effort of searching for these images.

images = dataset.get_images_by_name(image_name="satellite_img_01.jpg")

get_images_by_name returns a list of Images that match the search term.

You can also use regular expressions to run a broader search and return only images that fit a specific naming pattern. For example, let's get all satellite images from your Dataset:

images = dataset.get_images_by_name(image_name="satellite.*", regex=True)

output:
[    
    <Image {
        'id': '<ID_1>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831745119
    }>, 
    <Image {
        'id': '<ID_2>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831714382
    }>
]
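To get a feeling for which names a pattern selects, you can try it locally with Python's re module before querying DataGym. The sketch below uses hypothetical file names. Whether get_images_by_name anchors the pattern at the start of the name (like re.match) or searches anywhere in it (like re.search) is not stated here, so both behaviours are shown.

```python
import re

# Hypothetical image names, used only to illustrate the pattern.
names = [
    "satellite_img_01.jpg",
    "satellite_img_02.jpg",
    "drone_img_01.jpg",
    "old_satellite.jpg",
]

pattern = re.compile(r"satellite.*")

# re.match only succeeds if the pattern matches from the start of the name.
anchored = [name for name in names if pattern.match(name)]

# re.search succeeds if the pattern matches anywhere in the name.
anywhere = [name for name in names if pattern.search(name)]

print(anchored)  # ['satellite_img_01.jpg', 'satellite_img_02.jpg']
print(anywhere)  # ['satellite_img_01.jpg', 'satellite_img_02.jpg', 'old_satellite.jpg']
```

If you need a specific behaviour, an explicit pattern such as "^satellite.*" or ".*satellite.*" removes the ambiguity.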
