# Datasets

The Dataset object in our Python API Wrapper represents the Datasets you already know from DataGym. It includes the Images that belong to your DataGym Datasets. Datasets can be accessed through the Client object introduced on the [Getting Started](https://docs.datagym.ai/documentation/python-api/getting-started) page.

## Fetch Datasets from DataGym.ai

Before we look at the Dataset object in our Python API Wrapper, we first have to learn how to get the data from **Data**Gym.ai. There are multiple ways to fetch Datasets from DataGym's backend.

### Get all Datasets

```python
datasets = client.get_datasets()
```

`get_datasets` returns all Datasets in a list.
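Each returned Dataset exposes the attributes described in the Dataset Attributes section of this page, such as `name` and `images`. As a minimal sketch, the hypothetical helper `summarize_datasets` below maps each Dataset name to its image count. It assumes that `images` behaves like a regular Python list; the `SimpleNamespace` objects are stand-ins for real Dataset objects so the snippet runs without a DataGym connection.

```python
from types import SimpleNamespace


def summarize_datasets(datasets):
    """Map each Dataset name to the number of images it holds."""
    return {dataset.name: len(dataset.images) for dataset in datasets}


# Stand-ins for the list returned by client.get_datasets().
example_datasets = [
    SimpleNamespace(name="Dummy_Dataset", images=["img_1", "img_2"]),
    SimpleNamespace(name="Empty_Dataset", images=[]),
]

summary = summarize_datasets(example_datasets)
print(summary)
```

With a real connection you would pass `client.get_datasets()` instead of `example_datasets`.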

### Get a Dataset by its name

```python
dummy_dataset = client.get_dataset_by_name(dataset_name="Dummy_Dataset")
```

`get_dataset_by_name` returns a specific Dataset by its unique name.

## Manage Datasets

### Create new Datasets

To create a new Dataset on **Data**Gym.ai, use the `create_dataset` method of the Client class. You have to specify a Dataset **name** and can optionally add a **short description**.

```python
new_dataset = client.create_dataset(
    name="My first Dataset",
    short_description="This is Optional",
)
```

`create_dataset` returns the newly created Dataset.

### Upload Images to a Dataset via URL

The `create_images_from_urls` method of the Client class helps you add Images to your Datasets. The method requires a list of image URLs and a Dataset ID.

```python
images_to_upload = [ 
                         "<IMAGE_URL_JPG_1>", 
                         "<IMAGE_URL_JPG_2>", 
                         "<IMAGE_URL_JPG_3>"
                   ]

images_created = client.create_images_from_urls(dataset_id=new_dataset.id,
                                                image_url_set=images_to_upload)
```

`create_images_from_urls` returns a list with one entry per image URL. Each entry reports the upload status of that URL, so you can see which uploads succeeded and which ran into errors.

{% code title="output:" %}

```python
[
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_1>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_2>'
 },
 {
  'imageUploadStatus': 'SUCCESS',
  'imageUrl': '<IMAGE_URL_JPG_3>'
 }
]
```

{% endcode %}
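To check a longer result list, you can filter for entries that did not succeed. The sketch below uses the `imageUploadStatus` and `imageUrl` keys from the output above. The helper name `failed_uploads` and the status value `"FAILED"` are assumptions made for illustration; only `'SUCCESS'` appears in the documented output.

```python
def failed_uploads(upload_results):
    """Return the URLs whose upload status is anything other than SUCCESS."""
    return [
        result["imageUrl"]
        for result in upload_results
        if result.get("imageUploadStatus") != "SUCCESS"
    ]


# Example data shaped like the output above; "FAILED" is a made-up status value.
example_results = [
    {"imageUploadStatus": "SUCCESS", "imageUrl": "<IMAGE_URL_JPG_1>"},
    {"imageUploadStatus": "FAILED", "imageUrl": "<IMAGE_URL_JPG_2>"},
]

failed = failed_uploads(example_results)
print(failed)
```

In practice you would pass the `images_created` list returned by `create_images_from_urls`.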

### Add a Dataset to a Project

The `add_dataset` method of the Client class enables you to [connect a Dataset to a Project](https://docs.datagym.ai/documentation/project/connect-a-dataset#connect-a-dataset-with-the-project). This method requires a Dataset ID and a Project ID, so the example below fetches both objects before it calls `add_dataset`:

```python
new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

added_success = client.add_dataset(dataset_id=new_dataset.id,
                                   project_id=empty_project.id)
```

`add_dataset` returns `True` if connecting the Dataset was successful.
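If you connect Datasets by name often, the lookup and the connection can be wrapped in one function. The following is only a sketch: `connect_by_name` is a hypothetical helper, and `StubClient` is an offline stand-in that mimics the three Client methods used on this page so the snippet runs without a DataGym connection. The IDs it returns are invented.

```python
from types import SimpleNamespace


def connect_by_name(client, dataset_name, project_name):
    """Look up a Dataset and a Project by name, then connect them."""
    dataset = client.get_dataset_by_name(dataset_name)
    project = client.get_project_by_name(project_name)
    return client.add_dataset(dataset_id=dataset.id, project_id=project.id)


class StubClient:
    """Offline stand-in for the real Client; it only records the connection."""

    def __init__(self):
        self.connections = []

    def get_dataset_by_name(self, dataset_name):
        return SimpleNamespace(id="dataset-1", name=dataset_name)

    def get_project_by_name(self, project_name):
        return SimpleNamespace(id="project-1", name=project_name)

    def add_dataset(self, dataset_id, project_id):
        self.connections.append((dataset_id, project_id))
        return True


stub = StubClient()
connected = connect_by_name(stub, "My first Dataset", "Some empty Project")
print(connected)
```

With the real API you would pass your `client` object instead of `stub`.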

### Remove a Dataset from a Project

The `remove_dataset` method of the Client class allows you to remove a Dataset from a **Data**Gym.ai Project. This method requires a Dataset ID and Project ID. Therefore, the example below fetches these objects before it executes the `remove_dataset` method:

```python
new_dataset = client.get_dataset_by_name("My first Dataset") 
empty_project = client.get_project_by_name("Some empty Project")

removed_success = client.remove_dataset(dataset_id=new_dataset.id,
                                        project_id=empty_project.id)
```

`remove_dataset` returns `True` if removing the Dataset was successful.

## The Dataset object

### Dataset Attributes

Datasets in the Python API are modeled after DataGym's Datasets and, therefore, inherit the same attributes you already know from your **Data**Gym.ai Datasets.

{% code title=">>>> print(dummy\_dataset)" %}

```python
<Dataset {
    'id': '<DATASET_ID>', 
    'name': 'Dummy_Dataset', 
    'short_description': 'This is short', 
    'timestamp': 1583828520627, 
    'owner': '<OWNER_ID>', 
    'images': <List[Image] with 10 elements>
}>
```

{% endcode %}

As you can see, a Dataset object also contains a list of its images, which are represented as [Image](https://docs.datagym.ai/documentation/python-api/images) objects.
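The `timestamp` value in the example looks like a Unix timestamp in milliseconds, although this page does not state the unit. Under that assumption, a small hypothetical helper can turn it into a readable UTC datetime:

```python
from datetime import datetime, timedelta, timezone


def timestamp_to_datetime(timestamp_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(timestamp_ms, 1000)
    whole_seconds = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole_seconds + timedelta(milliseconds=millis)


# Timestamp taken from the printed Dataset above.
created_at = timestamp_to_datetime(1583828520627)
print(created_at.isoformat())
```

For a fetched Dataset you would call `timestamp_to_datetime(dummy_dataset.timestamp)`.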

### Dataset helper methods

A Dataset can hold any number of images. To simplify the access to the respective Image objects, the Dataset object provides a variety of helper methods.

#### Get Images by name

In some use cases you might only be interested in a specific set of images. The `get_images_by_name` method reduces the effort of searching for these images.

```python
images = dataset.get_images_by_name(image_name="satellite_img_01.jpg")
```

`get_images_by_name` returns a list of Images that match the search term.

You can also use regular expressions for a broader search that returns only images whose names fit a specific pattern. For example, let's get all satellite images from your Dataset:

```python
images = dataset.get_images_by_name(image_name="satellite.*", regex=True)
```

{% code title="output" %}

```python
[    
    <Image {
        'id': '<ID_1>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831745119
    }>, 
    <Image {
        'id': '<ID_2>', 
        'image_name': 'satellite_img_01.jpg', 
        'image_type': 'SHAREABLE_LINK', 
        'timestamp': 1583831714382
    }>
]
```

{% endcode %}
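To preview which names a pattern would select, you can try it locally with Python's standard `re` module. This is a sketch only: the file names are invented, and this page does not say whether the wrapper matches from the start of the name, anywhere in the name, or against the full name. The example assumes matching from the start, as `re.match` does.

```python
import re

# Invented image names for illustration.
image_names = [
    "satellite_img_01.jpg",
    "satellite_img_02.jpg",
    "street_img_01.jpg",
]

pattern = re.compile(r"satellite.*")

# re.match anchors the pattern at the start of each name.
matching_names = [name for name in image_names if pattern.match(name)]
print(matching_names)
```

Testing a pattern this way before passing it to `get_images_by_name` with `regex=True` can help catch patterns that are broader or narrower than intended.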
