
Datasets

Manage your DataGym.ai Datasets and their images with our Python API


Last updated 4 years ago


The Dataset object in our Python API Wrapper represents the Datasets you already know from DataGym. It includes the Images that belong to your DataGym Datasets. Datasets can be accessed through the Client object introduced on the Getting Started page.

Fetch Datasets from DataGym.ai

Before we look at the Dataset object in our Python API Wrapper, we first have to learn how to get the data from DataGym.ai. There are multiple ways to fetch Datasets from DataGym's backend.

Get all Datasets

datasets = client.get_datasets()

get_datasets returns all Datasets in a list.
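
Once the list is fetched, it can be walked like any other Python list. The following is a minimal sketch rather than part of the DataGym API: summarize_datasets is a hypothetical helper, and it assumes only that each Dataset exposes the name and images attributes shown in the Dataset Attributes section. The SimpleNamespace objects are stand-ins so that the snippet runs without a DataGym account.

```python
from types import SimpleNamespace


def summarize_datasets(datasets):
    """Return (name, image_count) pairs, sorted by dataset name."""
    return sorted((ds.name, len(ds.images)) for ds in datasets)


# Stand-ins for the objects returned by client.get_datasets() (illustration only).
datasets = [
    SimpleNamespace(name="Dummy_Dataset", images=["img_a", "img_b"]),
    SimpleNamespace(name="Another_Dataset", images=[]),
]

summary = summarize_datasets(datasets)
for name, image_count in summary:
    print(f"{name}: {image_count} image(s)")
```

With a real Client, the stand-in list would be replaced by the result of client.get_datasets().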

Get a Dataset by its name

dummy_dataset = client.get_dataset_by_name(dataset_name="Dummy_Dataset")

get_dataset_by_name returns a specific Dataset by its unique name.

Manage Datasets

Create new Datasets

To create a new Dataset on DataGym.ai, use the create_dataset method of the Client class. You have to specify a Dataset name and can optionally add a short description.

new_dataset = client.create_dataset(
    name="My first Dataset",
    short_description="This is Optional",
)

create_dataset returns the newly created Dataset.

Upload Images to a Dataset via URL

The create_images_from_urls method of the Client class helps you add Images to your Datasets. The method requires a list of image URLs and a Dataset ID.

images_to_upload = [
    "<IMAGE_URL_JPG_1>",
    "<IMAGE_URL_JPG_2>",
    "<IMAGE_URL_JPG_3>",
]

images_created = client.create_images_from_urls(dataset_id=new_dataset.id,
                                                image_url_set=images_to_upload)

create_images_from_urls returns a list with one upload status entry per image URL, so you can see any errors that occurred during the Image upload.

output:
[
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_1>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_2>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_3>'
 }
]
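
Because every entry carries its own status, the result list can be checked programmatically. The sketch below is an illustration under stated assumptions: split_upload_results is a hypothetical helper, the entries are plain dictionaries shaped like the output above, and every status other than 'SUCCESS' is treated as a failure. The 'FAILED' value in the sample data is made up for the example; the status names the backend actually returns may differ.

```python
def split_upload_results(results):
    """Separate the URLs that uploaded successfully from the entries that did not."""
    succeeded = [r["imageUrl"] for r in results if r["imageUploadStatus"] == "SUCCESS"]
    failed = [r for r in results if r["imageUploadStatus"] != "SUCCESS"]
    return succeeded, failed


# Sample data shaped like the output above; "FAILED" is a hypothetical status value.
images_created = [
    {"imageUploadStatus": "SUCCESS", "imageUrl": "<IMAGE_URL_JPG_1>"},
    {"imageUploadStatus": "FAILED", "imageUrl": "<IMAGE_URL_JPG_2>"},
]

succeeded, failed = split_upload_results(images_created)
for entry in failed:
    print(f"Upload problem for {entry['imageUrl']}: {entry['imageUploadStatus']}")
```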

Add a Dataset to a Project

new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

added_success = client.add_dataset(dataset_id=new_dataset.id,
                                   project_id=empty_project.id)

add_dataset returns True if connecting the Dataset was successful.
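
Since success is reported through the return value, it is easy to overlook a failed call. The following sketch wraps the call so that a falsy result stops the script. connect_dataset and FakeClient are hypothetical names introduced for this illustration; only the add_dataset call with its dataset_id and project_id parameters comes from the example above. How the real method behaves on failure (returning False or raising an error) is not documented here, so the check is an assumption.

```python
def connect_dataset(client, dataset_id, project_id):
    """Connect a Dataset to a Project and fail loudly if no success is reported."""
    success = client.add_dataset(dataset_id=dataset_id, project_id=project_id)
    if not success:
        raise RuntimeError(
            f"Could not connect dataset {dataset_id} to project {project_id}"
        )
    return success


class FakeClient:
    """Stand-in for the real Client so the sketch runs offline (illustration only)."""

    def __init__(self, result):
        self.result = result

    def add_dataset(self, dataset_id, project_id):
        return self.result


connected = connect_dataset(FakeClient(True), "<DATASET_ID>", "<PROJECT_ID>")
```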

Remove a Dataset from a Project

The remove_dataset method of the Client class allows you to remove a Dataset from a DataGym.ai Project. This method requires a Dataset ID and Project ID. Therefore, the example below fetches these objects before it executes the remove_dataset method:

new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

removed_success = client.remove_dataset(dataset_id=new_dataset.id,
                                        project_id=empty_project.id)

remove_dataset returns True if removing the Dataset was successful.

The Dataset object

Dataset Attributes

Datasets in the Python API are modeled after DataGym's Datasets and, therefore, inherit the same attributes you already know from your DataGym.ai Datasets.

>>> print(dummy_dataset)
<Dataset {
    'id': '<DATASET_ID>', 
    'name': 'Dummy_Dataset', 
    'short_description': 'This is short', 
    'timestamp': 1583828520627, 
    'owner': '<OWNER_ID>', 
    'images': <List[Image] with 10 elements>
}>
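
The timestamp attribute is a plain integer. Judging by its magnitude, it looks like milliseconds since the Unix epoch; that is an assumption, not something this page states. Under that assumption, a small hypothetical helper such as timestamp_to_datetime can turn it into a readable UTC date:

```python
from datetime import datetime, timezone


def timestamp_to_datetime(timestamp_ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# Value taken from the printed Dataset above.
created_at = timestamp_to_datetime(1583828520627)
print(created_at.isoformat())
```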

Dataset helper methods

A Dataset can hold any number of images. To simplify the access to the respective Image objects, the Dataset object provides a variety of helper methods.

Get Images by name

In some use-cases you might only be interested in a specific set of images. The get_images_by_name method can reduce the effort of searching for these images.

images = dataset.get_images_by_name(image_name="satellite_img_01.jpg")

get_images_by_name returns a list of Images that match the search term.

You can also use regular expressions to start a broader search and return only images that fit a specific naming pattern. For example, let's get all satellite images from your Dataset:

images = dataset.get_images_by_name(image_name="satellite.*", regex=True)
output:
[    
    <Image {
        'id': '<ID_1>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831745119
    }>, 
    <Image {
        'id': '<ID_2>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831714382
    }>
]
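
A pattern can be tried out locally before it is sent to a Dataset. The sketch below uses Python's standard re module and assumes that the wrapper interprets patterns with the same syntax. Whether get_images_by_name anchors the pattern to the whole name or searches anywhere inside it is not stated here, so the hypothetical helper preview_matches uses re.fullmatch as one plausible reading. The image names are made up for the example.

```python
import re


def preview_matches(image_names, pattern):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in image_names if compiled.fullmatch(name)]


# Hypothetical image names, used only to try out patterns.
names = ["satellite_img_01.jpg", "satellite_img_02.jpg", "drone_img_01.jpg"]

matches = preview_matches(names, r"satellite.*")
first_of_each = preview_matches(names, r".*_01\.jpg")
print(matches)
print(first_of_each)
```

Note that an unescaped dot matches any character in a regular expression. To match a literal file name, escape it (for example with re.escape) or search without the regex flag.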

The add_dataset method of the Client class, used in the "Add a Dataset to a Project" section above, enables you to connect a Dataset to a Project. This method requires a Dataset ID and a Project ID, which is why that example fetches both objects before it executes the add_dataset method.

As the Dataset Attributes output above shows, a Dataset object also contains a list of its images, which are represented as Image objects.