
DataBlob

airt.client.DataBlob

A class for importing and processing data from sources such as CSV/parquet files, databases, AWS S3 buckets, and Azure Blob Storage.

Currently, the only way to instantiate the DataBlob class is to call one of the following static methods: from_local, from_mysql, from_clickhouse, from_s3, or from_azure_blob_storage. These methods import the data in the parquet file format from:

  • a local CSV/parquet file,

  • a MySQL database,

  • a ClickHouse database,

  • an AWS S3 bucket, and

  • an Azure Blob Storage, respectively.

We intend to support additional databases and storage media in future releases.

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Alternatively, call the wait method to wait for the progress to
# finish without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablobs created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())

__init__(self, uuid, type=None, source=None, region=None, cloud_provider=None, datasources=None, total_steps=None, completed_steps=None, folder_size=None, disabled=None, pulled_on=None, user=None, tags=None, error=None) special

Constructs a new DataBlob instance.

Warning

Do not construct this object directly by calling the constructor; use the from_s3, from_azure_blob_storage, from_mysql, from_clickhouse, or from_local methods instead.

Parameters:

Name Type Description Default
uuid str

Datablob uuid.

required
source Optional[str]

The URI of the data that was used to create the datablob.

None
type Optional[str]

The type of source used to generate the datablob. Depending on the source type, one of the following values will be assigned: "s3", "local", "db", or "azure_blob_storage".

None
region Optional[str]

The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider.

None
cloud_provider Optional[str]

Cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers.

None
datasources Optional[List[str]]

The uuids of the datasources created from the datablob.

None
total_steps Optional[int]

The number of steps required to upload the datablob to the server.

None
completed_steps Optional[int]

The number of steps completed during the datablob's upload to the server.

None
folder_size Optional[int]

The uploaded datablob's size in bytes.

None
disabled Optional[bool]

A flag that indicates the datablob's status. If the datablob is deleted, then False will be set.

None
pulled_on Optional[str]

The most recent date the datablob was uploaded.

None
user Optional[str]

The uuid of the user who created the datablob.

None
tags Optional[List]

Tag names associated with the datablob.

None
error Optional[str]

Contains the error message if the processing of the datablob fails.

None
Source code in airt/client.py
def __init__(
    self,
    uuid: str,
    type: Optional[str] = None,
    source: Optional[str] = None,
    region: Optional[str] = None,
    cloud_provider: Optional[str] = None,
    datasources: Optional[List[str]] = None,
    total_steps: Optional[int] = None,
    completed_steps: Optional[int] = None,
    folder_size: Optional[int] = None,
    disabled: Optional[bool] = None,
    pulled_on: Optional[str] = None,
    user: Optional[str] = None,
    tags: Optional[List] = None,
    error: Optional[str] = None,
):
    """Constructs a new DataBlob instance.

    Warning:
        Do not construct this object directly by calling the constructor, please use `from_s3`, `from_azure_blob_storage`,
        `from_mysql`, `from_clickhouse` or `from_local` methods instead.

    Args:
        uuid: Datablob uuid.
        source: The URI of the data that was used to create the datablob.
        type: The type of source used to generate the datablob. Depending on the source type, one of the following
            values will be assigned: "s3", "local", "db", or "azure_blob_storage".
        region: The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider.
        cloud_provider: Cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
        datasources: The uuids of the datasources created from the datablob.
        total_steps: The number of steps required to upload the datablob to the server.
        completed_steps: The number of steps completed during the datablob's upload to the server.
        folder_size: The uploaded datablob's size in bytes.
        disabled: A flag that indicates the datablob's status. If the datablob is deleted, then **False** will be set.
        pulled_on: The most recent date the datablob was uploaded.
        user: The uuid of the user who created the datablob.
        tags: Tag names associated with the datablob.
        error: Contains the error message if the processing of the datablob fails.
    """
    self.uuid = uuid
    self.type = type
    self.source = source
    self.region = region
    self.cloud_provider = cloud_provider
    self.datasources = datasources
    self.total_steps = total_steps
    self.completed_steps = completed_steps
    self.folder_size = folder_size
    self.disabled = disabled
    self.pulled_on = pulled_on
    self.user = user
    self.tags = tags
    self.error = error

as_df(dbx) staticmethod

Return the details of datablob instances as a pandas dataframe.

Parameters:

Name Type Description Default
dbx List[DataBlob]

List of datablob instances.

required

Returns:

Type Description
DataFrame

Details of all the datablobs in a dataframe.

Exceptions:

Type Description
ConnectionError

If the server address is invalid or not reachable.

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Alternatively, call the wait method to wait for the progress to
# finish without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablobs created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
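
Since as_df returns a plain pandas DataFrame, the result can be post-processed with the usual pandas operations. A minimal sketch, assuming the returned frame includes the type column listed in details():

# Keep only the datablobs that were created from an AWS S3 bucket
df = DataBlob.as_df(DataBlob.ls())
print(df[df["type"] == "s3"])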
Source code in airt/client.py
@staticmethod
def as_df(dbx: List["DataBlob"]) -> pd.DataFrame:
    """Return the details of datablob instances as a pandas dataframe.

    Args:
        dbx: List of datablob instances.

    Returns:
        Details of all the datablobs in a dataframe.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
    """
    db_lists = get_attributes_from_instances(dbx, DataBlob.ALL_DB_COLS)  # type: ignore

    for db in db_lists:
        db = DataBlob._get_tag_name_and_datasource_id(db)

    lists_df = generate_df(db_lists, DataBlob.BASIC_DB_COLS)
    df = add_ready_column(lists_df)

    df = df.rename(columns=DataBlob.COLS_TO_RENAME)

    return df

delete(self)

Delete a datablob from the server.

Returns:

Type Description
DataFrame

A pandas DataFrame encapsulating the details of the deleted datablob.

Exceptions:

Type Description
ConnectionError

If the server address is invalid or not reachable.

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Alternatively, call the wait method to wait for the progress to
# finish without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablobs created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
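
Because ls returns DataBlob instances, delete can also be applied in bulk. A minimal sketch that removes every datablob owned by the currently logged-in user, so use it with care:

# Delete all datablobs returned by ls
for blob in DataBlob.ls():
    print(blob.delete())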
Source code in airt/client.py
@patch
def delete(self: DataBlob) -> pd.DataFrame:
    """Delete a datablob from the server.

    Returns:
        A pandas DataFrame encapsulating the details of the deleted datablob.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
    """

    response = Client._delete_data(relative_url=f"/datablob/{self.uuid}")

    response = DataBlob._get_tag_name_and_datasource_id(response)

    df = pd.DataFrame([response])[DataBlob.BASIC_DB_COLS]

    df = df.rename(columns=DataBlob.COLS_TO_RENAME)

    return add_ready_column(df)

details(self)

Return details of a datablob.

Returns:

Type Description
DataFrame

The datablob details as a pandas dataframe.

Exceptions:

Type Description
ConnectionError

If the server address is invalid or not reachable.

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Alternatively, call the wait method to wait for the progress to
# finish without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablobs created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())

Columns in the resulting dataframe are: uuid, datasources, type, source, region, cloud_provider, tags, pulled_on, completed_steps, total_steps, folder_size, user, error, disabled.
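
For example, to inspect only the upload progress related columns, the returned dataframe can be sliced like any other pandas dataframe. A minimal sketch, assuming the column names listed above are preserved in the result:

# Show only the columns describing the upload progress
print(db.details()[["uuid", "completed_steps", "total_steps", "error"]])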

Source code in airt/client.py
@patch
def details(self: DataBlob) -> pd.DataFrame:
    """Return details of a datablob.

    Returns:
        The datablob details as a pandas dataframe.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
    """

    details = Client._get_data(relative_url=f"/datablob/{self.uuid}")

    details = DataBlob._get_tag_name_and_datasource_id(details)

    details_df = pd.DataFrame([details])[DataBlob.ALL_DB_COLS]

    details_df = details_df.rename(columns=DataBlob.COLS_TO_RENAME)

    return add_ready_column(details_df)

from_azure_blob_storage(uri, credential, cloud_provider=None, region=None, tag=None) classmethod

Create and return a datablob that encapsulates the data from an Azure Blob Storage.

Parameters:

Name Type Description Default
uri str

Azure Blob Storage URI of the source file.

required
credential str

Credential to access the Azure Blob Storage.

required
cloud_provider Optional[str]

The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. If None (default value), then azure will be used as the cloud storage provider.

None
region Optional[str]

The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider. In the case of aws, eu-west-1 will be used and in the case of azure, westeurope will be used. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

None
tag Optional[str]

A string to tag the datablob. If not passed, then the tag latest will be assigned to the datablob.

None

Returns:

Type Description
DataBlob

An instance of the DataBlob class.

Exceptions:

Type Description
ValueError

If parameters to the API are invalid.

ConnectionError

If the server address is invalid or not reachable.

To create a Datablob from Azure Blob Storage, you must have a valid Azure Blob Storage credential.

If you don't know how to get the Azure Blob Storage credential, the Python example below shows one way to obtain it.

  • If you don't already have them, install the Azure Storage Management (azure-mgmt-storage) and Azure Resource Management (azure-mgmt-resource) Python client libraries using pip.

  • Ensure the following four environment variables are set in your current working environment with appropriate values.

    • AZURE_TENANT_ID

    • AZURE_CLIENT_ID

    • AZURE_CLIENT_SECRET

    • AZURE_SUBSCRIPTION_ID

  • Assign the resource group name and the storage account name to variables (azure_group_name and azure_storage_account_name in the example below).

  • Below is sample code that creates a datablob and stores it in S3. Copy it and replace the placeholders with appropriate values.

Examples:

# Importing necessary libraries
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

from  airt.client import Client, DataBlob

# Create a credential for accessing Azure Blob Storage
# Setting the required environment variables
os.environ["AZURE_SUBSCRIPTION_ID"] = "{fill in azure_subscription_id}"
os.environ["AZURE_CLIENT_ID"] = "{fill in azure_client_id}"
os.environ["AZURE_CLIENT_SECRET"] = "{fill in azure_client_secret}"
os.environ["AZURE_TENANT_ID"]= "{fill in azure_tenant_id}"

# Setting the resource group name and storage account name
azure_group_name = "{fill in azure_group_name}"
azure_storage_account_name = "{fill in azure_storage_account_name}"

# Retrieving the credential
azure_storage_client = StorageManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)
azure_storage_keys = azure_storage_client.storage_accounts.list_keys(
    azure_group_name, azure_storage_account_name
)
azure_storage_keys = {v.key_name: v.value for v in azure_storage_keys.keys}
credential = azure_storage_keys['key1']


# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The region
# is set to eu-west-1 (default), feel free to change the cloud provider and
# the region to suit your needs.
db = DataBlob.from_azure_blob_storage(
    uri="{fill in uri}",
    cloud_provider="aws",
    credential=credential
)

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
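
To keep the datablob in Azure Blob Storage instead, omit the cloud_provider argument; azure is then used as the destination with westeurope as the default region. A minimal sketch reusing the placeholder uri and credential from above:

# Create a datablob stored in Azure Blob Storage (the default destination)
db = DataBlob.from_azure_blob_storage(
    uri="{fill in uri}",
    credential=credential
)

# Display the status in a progress bar
db.progress_bar()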
Source code in airt/client.py
@classmethod
def from_azure_blob_storage(
    cls,
    uri: str,
    credential: str,
    cloud_provider: Optional[str] = None,
    region: Optional[str] = None,
    tag: Optional[str] = None,
) -> "DataBlob":
    """Create and return a datablob that encapsulates the data from an Azure Blob Storage.

    Args:
        uri: Azure Blob Storage URI of the source file.
        credential: Credential to access the Azure Blob Storage.
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
            If **None** (default value), then **azure**  will be used as the cloud storage provider.
        region: The destination cloud provider's region to store the datablob. If **None** (default value) then the default region will be assigned based on the cloud
            provider. In the case of **aws**, **eu-west-1** will be used and in the case of **azure**, **westeurope** will be used. The supported AWS regions
            are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1,
            us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast,
            brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast,
            japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia,
            switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.
        tag: A string to tag the datablob. If not passed, then the tag **latest** will be assigned to the datablob.

    Returns:
        An instance of the `DataBlob` class.

    Raises:
        ValueError: If parameters to the API are invalid.
        ConnectionError: If the server address is invalid or not reachable.

    To create a Datablob from Azure Blob Storage, you must have a valid Azure Blob Storage credential.

    If you don't know how to get the Azure Blob Storage credential, you can follow the below python example. It's one of the ways to get the Azure Blob Storage credential.

    - If you don't already have it, please install the Azure Storage Management (azure-mgmt-storage) and Azure Resource Management (azure-mgmt-resource) python client libraries using pip.

    - Ensure the following four environment variables are set into your current working environment with appropriate values.

        - AZURE_TENANT_ID

        - AZURE_CLIENT_ID

        - AZURE_CLIENT_SECRET

        - AZURE_SUBSCRIPTION_ID

    - Assign the resource group name in the GROUP_NAME variable and the storage account name in the STORAGE_ACCOUNT_NAME variable.

    - Below is a sample code to create a datablob and storing it in S3. Please copy it and replace the placeholders with appropriate values

    Example:
        ```python
        # Importing necessary libraries
        import os

        from azure.identity import DefaultAzureCredential
        from azure.mgmt.storage import StorageManagementClient

        from  airt.client import Client, DataBlob

        # Create a credential for accessing Azure Blob Storage
        # Setting the required environment variables
        os.environ["AZURE_SUBSCRIPTION_ID"] = "{fill in azure_subscription_id}"
        os.environ["AZURE_CLIENT_ID"] = "{fill in azure_client_id}"
        os.environ["AZURE_CLIENT_SECRET"] = "{fill in azure_client_secret}"
        os.environ["AZURE_TENANT_ID"]= "{fill in azure_tenant_id}"

        # Setting the resource group name and storage account name
        azure_group_name = "{fill in azure_group_name}"
        azure_storage_account_name = "{fill in azure_storage_account_name}"

        # Retrieving the credential
        azure_storage_client = StorageManagementClient(
            DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
        )
        azure_storage_keys = azure_storage_client.storage_accounts.list_keys(
            azure_group_name, azure_storage_account_name
        )
        azure_storage_keys = {v.key_name: v.value for v in azure_storage_keys.keys}
        credential = azure_storage_keys['key1']


        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the datablob will be stored in an AWS S3 bucket. The region
        # is set to eu-west-1 (default), feel free to change the cloud provider and
        # the region to suit your needs.
        db = DataBlob.from_azure_blob_storage(
            uri="{fill in uri}",
            cloud_provider="aws",
            credential=credential
        )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())
        ```
    """
    cloud_provider, region = DataBlob._get_cloud_provider_and_region(cloud_provider=cloud_provider, region=region, default_cloud_provider="azure")  # type: ignore

    response = Client._post_data(
        relative_url="/datablob/from_azure_blob_storage",
        json=dict(
            uri=uri,
            credential=credential,
            region=region,
            cloud_provider=cloud_provider,
            tag=tag,
        ),
    )

    return DataBlob(
        uuid=response["uuid"], type=response["type"], source=response["source"]
    )

from_clickhouse(*, host, database, table, protocol, index_column, timestamp_column, port=0, cloud_provider=None, region=None, username=None, password=None, filters=None, tag=None) staticmethod

Create and return a datablob that encapsulates the data from a ClickHouse database.

If the database requires authentication, pass the username/password as parameters or store them in the CLICKHOUSE_USERNAME and CLICKHOUSE_PASSWORD environment variables.

Parameters:

Name Type Description Default
host str

Remote database host name.

required
database str

Database name.

required
table str

Table name.

required
protocol str

Protocol to use. The valid values are "native" and "http".

required
index_column str

The column to use as index (row labels).

required
timestamp_column str

Timestamp column name in the table.

required
port int

Host port number. If not passed, then the default value 0 will be used.

0
cloud_provider Optional[str]

The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. If None (default value), then aws will be used as the cloud storage provider.

None
region Optional[str]

The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider. In the case of aws, eu-west-1 will be used and in the case of azure, westeurope will be used. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

None
username Optional[str]

Database username. If not passed, the default value "root" will be used unless the value is explicitly set in the environment variable CLICKHOUSE_USERNAME.

None
password Optional[str]

Database password. If not passed, the default value "" will be used unless the value is explicitly set in the environment variable CLICKHOUSE_PASSWORD.

None
filters Optional[Dict[str, Any]]

Additional parameters to be used when importing data. For example, if you want to filter and extract data only for a specific user_id, pass {"user_id": 1}.

None
tag Optional[str]

A string to tag the datablob. If not passed, then the tag latest will be assigned to the datablob.

None

Returns:

Type Description
DataBlob

An instance of the DataBlob class.

Exceptions:

Type Description
ValueError

If parameters to the API are invalid.

ConnectionError

If the server address is invalid or not reachable.

Here's an example of how to create a Datablob from a ClickHouse database:

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The region
# is set to eu-west-3, feel free to change the cloud provider and the region
# to suit your needs.
db = DataBlob.from_clickhouse(
    username="{fill in database_username}",
    password="{fill in database_password}",
    host="{fill in host}",
    database="{fill in database}",
    table="{fill in table}",
    index_column="{fill in index_column}",
    timestamp_column="{fill in timestamp_column}",
    port="{fill in port}",
    filters={fill in filters},
    protocol="native",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
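
Alternatively, the database credentials can be supplied through the CLICKHOUSE_USERNAME and CLICKHOUSE_PASSWORD environment variables instead of the username and password parameters. A minimal sketch:

# Importing necessary libraries
import os

# Set the ClickHouse credentials in the environment before creating the datablob
os.environ["CLICKHOUSE_USERNAME"] = "{fill in database_username}"
os.environ["CLICKHOUSE_PASSWORD"] = "{fill in database_password}"

# The username and password parameters can now be omitted
db = DataBlob.from_clickhouse(
    host="{fill in host}",
    database="{fill in database}",
    table="{fill in table}",
    index_column="{fill in index_column}",
    timestamp_column="{fill in timestamp_column}",
    protocol="native"
)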
Source code in airt/client.py
@staticmethod
def from_clickhouse(
    *,
    host: str,
    database: str,
    table: str,
    protocol: str,
    index_column: str,
    timestamp_column: str,
    port: int = 0,
    cloud_provider: Optional[str] = None,
    region: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    tag: Optional[str] = None,
) -> "DataBlob":
    """Create and return a datablob that encapsulates the data from a ClickHouse database.

    If the database requires authentication, pass the username/password as parameters or store it in
    the **CLICKHOUSE_USERNAME** and **CLICKHOUSE_PASSWORD** environment variables.

    Args:
        host: Remote database host name.
        database: Database name.
        table: Table name.
        protocol: Protocol to use. The valid values are "native" and "http".
        index_column: The column to use as index (row labels).
        timestamp_column: Timestamp column name in the table.
        port: Host port number. If not passed, then the default value **0** will be used.
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
            If **None** (default value), then **aws**  will be used as the cloud storage provider.
        region: The destination cloud provider's region to store the datablob. If **None** (default value) then the default region will be assigned based on the cloud
            provider. In the case of **aws**, **eu-west-1** will be used and in the case of **azure**, **westeurope** will be used. The supported AWS regions
            are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1,
            us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast,
            brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast,
            japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia,
            switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.
        username: Database username. If not passed, the default value "root" will be used unless the value is explicitly set in the environment variable
            **CLICKHOUSE_USERNAME**.
        password: Database password. If not passed, the default value "" will be used unless the value is explicitly set in the environment variable
            **CLICKHOUSE_PASSWORD**.
        filters: Additional parameters to be used when importing data. For example, if you want to filter and extract data only for a specific user_id, pass {"user_id": 1}.
        tag: A string to tag the datablob. If not passed, then the tag **latest** will be assigned to the datablob.

    Returns:
       An instance of the `DataBlob` class.

    Raises:
        ValueError: If parameters to the API are invalid.
        ConnectionError: If the server address is invalid or not reachable.

    Here's an example of how to create a Datablob from a ClickHouse database:

    Example:
        ```python
        # Importing necessary libraries
        from  airt.client import Client, DataBlob

        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the datablob will be stored in an AWS S3 bucket. The region
        # is set to eu-west-3, feel free to change the cloud provider and the region
        # to suit your needs.
        db = DataBlob.from_clickhouse(
            username="{fill in database_username}",
            password="{fill in database_password}",
            host="{fill in host}",
            database="{fill in database}",
            table="{fill in table}",
            index_column="{fill in index_column}",
            timestamp_column="{fill in timestamp_column}",
            port="{fill in port}",
            filters={fill in filters},
            protocol="native",
            cloud_provider="aws",
            region="eu-west-3"
        )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())
        ```
    """
    username = (
        username
        if username is not None
        else os.environ.get("CLICKHOUSE_USERNAME", "root")
    )

    password = (
        password
        if password is not None
        else os.environ.get("CLICKHOUSE_PASSWORD", "")
    )

    cloud_provider, region = DataBlob._get_cloud_provider_and_region(cloud_provider, region)  # type: ignore

    json_req = dict(
        host=host,
        database=database,
        table=table,
        protocol=protocol,
        port=port,
        username=username,
        password=password,
        index_column=index_column,
        timestamp_column=timestamp_column,
        filters=filters,
        region=region,
        cloud_provider=cloud_provider,
        tag=tag,
    )

    response = Client._post_data(
        relative_url=f"/datablob/from_clickhouse", json=json_req
    )

    return DataBlob(
        uuid=response["uuid"], type=response["type"], source=response["source"]
    )

from_local(path, cloud_provider=None, region=None, tag=None, show_progress=True) staticmethod

Create and return a datablob from a local file.

The API currently allows users to create datablobs from CSV or Parquet files. We intend to support additional file formats in future releases.

Parameters:

Name Type Description Default
path Union[str, pathlib.Path]

The relative or absolute path to a local file or to a directory containing the source files.

required
cloud_provider Optional[str]

The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. If None (default value), then aws will be used as the cloud storage provider.

None
region Optional[str]

The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider. In the case of aws, eu-west-1 will be used and in the case of azure, westeurope will be used. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

None
tag Optional[str]

A string to tag the datablob. If not passed, then the tag latest will be assigned to the datablob.

None
show_progress Optional[bool]

Flag to set the progress bar visibility. If not passed, then the default value True will be used.

True

Returns:

Type Description
DataBlob

An instance of the DataBlob class.

Exceptions:

Type Description
ValueError

If parameters to the API are invalid.

ConnectionError

If the server address is invalid or not reachable.

Here's an example of how to create a Datablob from a local file:

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The region
# is set to eu-west-3, feel free to change the cloud provider and the region
# to suit your needs.
db = DataBlob.from_local(
    path="{fill in path}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
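
Since path may also point to a directory, a whole folder of CSV/parquet files can be uploaded in a single call; setting show_progress=False suppresses the upload progress bar, which is useful in non-interactive scripts. A minimal sketch:

# Upload every file in a local directory without an interactive progress bar
db = DataBlob.from_local(
    path="{fill in path to the directory}",
    show_progress=False
)

# Print the details of the newly created datablob
print(db.details())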
Source code in airt/client.py
@staticmethod
def from_local(
    path: Union[str, Path],
    cloud_provider: Optional[str] = None,
    region: Optional[str] = None,
    tag: Optional[str] = None,
    show_progress: Optional[bool] = True,
) -> "DataBlob":
    """Create and return a datablob from local file.

    The API currently allows users to create datablobs from CSV or Parquet files. We intend to support additional file formats in future releases.

    Args:
        path: The relative or absolute path to a local file or to a directory containing the source files.
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
            If **None** (default value), then **aws**  will be used as the cloud storage provider.
        region: The destination cloud provider's region to store the datablob. If **None** (default value) then the default region will be assigned based on the cloud
            provider. In the case of **aws**, **eu-west-1** will be used and in the case of **azure**, **westeurope** will be used. The supported AWS regions
            are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1,
            us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast,
            brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast,
            japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia,
            switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.
        tag: A string to tag the datablob. If not passed, then the tag **latest** will be assigned to the datablob.
        show_progress: Flag to set the progressbar visibility. If not passed, then the default value **True** will be used.

    Returns:
       An instance of the `DataBlob` class.

    Raises:
        ValueError: If parameters to the API are invalid.
        ConnectionError: If the server address is invalid or not reachable.

    Here's an example of how to create a Datablob from a local file:

    Example:
        ```python
        # Importing necessary libraries
        from  airt.client import Client, DataBlob

        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the datablob will be stored in an AWS S3 bucket. The region
        # is set to eu-west-3, feel free to change the cloud provider and the region
        # to suit your needs.
        db = DataBlob.from_local(
            path="{fill in path}",
            cloud_provider="aws",
            region="eu-west-3"
        )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())

        ```
    """
    path = Path(path)
    cloud_provider, region = DataBlob._get_cloud_provider_and_region(cloud_provider, region)  # type: ignore

    # Step 1: get presigned URL
    _path = f"local:{str(path)}"

    response = Client._post_data(
        relative_url=f"/datablob/from_local/start",
        json=dict(
            path=_path, region=region, cloud_provider=cloud_provider, tag=tag
        ),
    )

    # Step 2: download the csv to the s3 bucket
    files = list(path.glob("*")) if path.is_dir() else [path]

    # Initiate progress bar
    t = tqdm(total=len(files), disable=not show_progress)

    for file_to_upload in files:
        DataBlob._upload_to_s3_with_retry(
            file_to_upload=file_to_upload,
            presigned_url=response["presigned"]["url"],
            presigned_fields=response["presigned"]["fields"],
        )
        t.update()

    t.close()
    return DataBlob(uuid=response["uuid"], type=response["type"])

from_mysql(*, host, database, table, port=3306, cloud_provider=None, region=None, username=None, password=None, tag=None) staticmethod

Create and return a datablob that encapsulates the data from a MySQL database.

If the database requires authentication, pass the username/password as parameters or store them in the AIRT_CLIENT_DB_USERNAME and AIRT_CLIENT_DB_PASSWORD environment variables.

Parameters:

Name Type Description Default
host str

Remote database host name.

required
database str

Database name.

required
table str

Table name.

required
port int

Host port number. If not passed, then the default value 3306 will be used.

3306
cloud_provider Optional[str]

The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. If None (default value), then aws will be used as the cloud storage provider.

None
region Optional[str]

The destination cloud provider's region to store the datablob. If None (default value) then the default region will be assigned based on the cloud provider. In the case of aws, eu-west-1 will be used and in the case of azure, westeurope will be used. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

None
username Optional[str]

Database username. If not passed, the default value "root" will be used unless the value is explicitly set in the environment variable AIRT_CLIENT_DB_USERNAME.

None
password Optional[str]

Database password. If not passed, the default value "" will be used unless the value is explicitly set in the environment variable AIRT_CLIENT_DB_PASSWORD.

None
tag Optional[str]

A string to tag the datablob. If not passed, then the tag latest will be assigned to the datablob.

None

Returns:

Type Description
DataBlob

An instance of the DataBlob class.

Exceptions:

Type Description
ValueError

If parameters to the API are invalid.

ConnectionError

If the server address is invalid or not reachable.

Here's an example of how to create a Datablob from a MySQL database:

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The region
# is set to eu-west-3, feel free to change the cloud provider and the region
# to suit your needs.
db = DataBlob.from_mysql(
    username="{fill in database_username}",
    password="{fill in database_password}",
    host="{fill in host}",
    database="{fill in database}",
    table="{fill in table}",
    port="{fill in port}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
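
Alternatively, the database credentials can be supplied through the AIRT_CLIENT_DB_USERNAME and AIRT_CLIENT_DB_PASSWORD environment variables instead of the username and password parameters. A minimal sketch:

# Importing necessary libraries
import os

# Set the MySQL credentials in the environment before creating the datablob
os.environ["AIRT_CLIENT_DB_USERNAME"] = "{fill in database_username}"
os.environ["AIRT_CLIENT_DB_PASSWORD"] = "{fill in database_password}"

# The username and password parameters can now be omitted
db = DataBlob.from_mysql(
    host="{fill in host}",
    database="{fill in database}",
    table="{fill in table}"
)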
Source code in airt/client.py
@staticmethod
def from_mysql(
    *,
    host: str,
    database: str,
    table: str,
    port: int = 3306,
    cloud_provider: Optional[str] = None,
    region: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    tag: Optional[str] = None,
) -> "DataBlob":
    """Create and return a datablob that encapsulates the data from a mysql database.

    If the database requires authentication, pass the username/password as parameters or store it in
    the **AIRT_CLIENT_DB_USERNAME** and **AIRT_CLIENT_DB_PASSWORD** environment variables.

    Args:
        host: Remote database host name.
        database: Database name.
        table: Table name.
        port: Host port number. If not passed, then the default value **3306** will be used.
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
            If **None** (default value), then **aws**  will be used as the cloud storage provider.
        region: The destination cloud provider's region to store the datablob. If **None** (default value) then the default region will be assigned based on the cloud
            provider. In the case of **aws**, **eu-west-1** will be used and in the case of **azure**, **westeurope** will be used. The supported AWS regions
            are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1,
            us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast,
            brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast,
            japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia,
            switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.
        username: Database username. If not passed, the default value **"root"** will be used unless the value is explicitly set in the environment variable
            **AIRT_CLIENT_DB_USERNAME**.
        password: Database password. If not passed, the default value **""** will be used unless the value is explicitly set in the environment variable
            **AIRT_CLIENT_DB_PASSWORD**.
        tag: A string to tag the datablob. If not passed, then the tag **latest** will be assigned to the datablob.

    Returns:
       An instance of the `DataBlob` class.

    Raises:
        ValueError: If parameters to the API are invalid.
        ConnectionError: If the server address is invalid or not reachable.

    Here's an example of how to create a Datablob from a MySQL database:

    Example:
        ```python
        # Importing necessary libraries
        from  airt.client import Client, DataBlob

        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the datablob will be stored in an AWS S3 bucket. The region
        # is set to eu-west-3, feel free to change the cloud provider and the region
        # to suit your needs.
        db = DataBlob.from_mysql(
            username="{fill in database_username}",
            password="{fill in database_password}",
            host="{fill in host}",
            database="{fill in database}",
            table="{fill in table}",
            port="{fill in port}",
            cloud_provider="aws",
            region="eu-west-3"
        )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())
        ```
    """
    username = (
        username
        if username is not None
        else os.environ.get(CLIENT_DB_USERNAME, "root")
    )

    password = (
        password if password is not None else os.environ.get(CLIENT_DB_PASSWORD, "")
    )

    cloud_provider, region = DataBlob._get_cloud_provider_and_region(cloud_provider, region)  # type: ignore

    json_req = dict(
        host=host,
        port=port,
        username=username,
        password=password,
        database=database,
        table=table,
        region=region,
        cloud_provider=cloud_provider,
        tag=tag,
    )

    response = Client._post_data(
        relative_url=f"/datablob/from_mysql", json=json_req
    )

    return DataBlob(
        uuid=response["uuid"], type=response["type"], source=response["source"]
    )

from_s3(*, uri, access_key=None, secret_key=None, cloud_provider=None, region=None, tag=None) staticmethod

Create and return a datablob that encapsulates the data from an AWS S3 bucket.

Parameters:

Name Type Description Default
uri str

AWS S3 bucket uri.

required
access_key Optional[str]

Access key for the S3 bucket. If None (default value), then the value from AWS_ACCESS_KEY_ID environment variable will be used.

None
secret_key Optional[str]

Secret key for the S3 bucket. If None (default value), then the value from AWS_SECRET_ACCESS_KEY environment variable will be used.

None
cloud_provider Optional[str]

The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. If None (default value), then aws will be used as the cloud storage provider.

None
region Optional[str]

The region of the destination cloud provider where the datablob will be stored. If None (default value) then the default region will be assigned based on the cloud provider. In the case of aws, the datablob's source bucket region will be used, whereas azure will use westeurope. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

None
tag Optional[str]

A string to tag the datablob. If not passed, then the tag latest will be assigned to the datablob.

None

Returns:

Type Description
DataBlob

An instance of the DataBlob class.

Exceptions:

Type Description
ValueError

If parameters to the API are invalid.

ConnectionError

If the server address is invalid or not reachable.

Here's an example of how to create a Datablob from an AWS S3 bucket:

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the access_key and the secret_key are set in the
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The region
# is set to eu-west-3, feel free to change the cloud provider and the region
# to suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
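
If you prefer not to rely on the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, the S3 credentials can also be passed explicitly. A minimal sketch:

# Pass the S3 credentials explicitly instead of using environment variables
db = DataBlob.from_s3(
    uri="{fill in uri}",
    access_key="{fill in access_key}",
    secret_key="{fill in secret_key}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
db.progress_bar()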
Source code in airt/client.py
@staticmethod
def from_s3(
    *,
    uri: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    cloud_provider: Optional[str] = None,
    region: Optional[str] = None,
    tag: Optional[str] = None,
) -> "DataBlob":
    """Create and return a datablob that encapsulates the data from an AWS S3 bucket.

    Args:
        uri: AWS S3 bucket uri.
        access_key: Access key for the S3 bucket. If **None** (default value), then the value
            from **AWS_ACCESS_KEY_ID** environment variable will be used.
        secret_key: Secret key for the S3 bucket. If **None** (default value), then the value
            from **AWS_SECRET_ACCESS_KEY** environment variable will be used.
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
            If **None** (default value), then **aws**  will be used as the cloud storage provider.
        region: The region of the destination cloud provider where the datablob will be stored. If **None** (default value) then the default region will be assigned based on
            the cloud provider. In the case of **aws**, the datablob's source bucket region will be used, whereas **azure** will use **westeurope**. The supported AWS regions
            are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1,
            us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast,
            brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast,
            japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia,
            switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.
        tag: A string to tag the datablob. If not passed, then the tag **latest** will be assigned to the datablob.

    Returns:
        An instance of the `DataBlob` class.

    Raises:
        ValueError: If parameters to the API are invalid.
        ConnectionError: If the server address is invalid or not reachable.

    Here's an example of how to create a Datablob from an AWS S3 bucket:

    Example:
        ```python
        # Importing necessary libraries
        from  airt.client import Client, DataBlob

        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the access_key and the secret_key are set in the
        # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The region
        # is set to eu-west-3, feel free to change the cloud provider and the region
        # to suit your needs.
        db = DataBlob.from_s3(
            uri="{fill in uri}",
            cloud_provider="aws",
            region="eu-west-3"
        )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())
        ```
    """
    access_key = (
        access_key if access_key is not None else os.environ["AWS_ACCESS_KEY_ID"]
    )
    secret_key = (
        secret_key
        if secret_key is not None
        else os.environ["AWS_SECRET_ACCESS_KEY"]
    )

    cloud_provider, region = DataBlob._get_cloud_provider_and_region(cloud_provider=cloud_provider, region=region, set_source_region=True)  # type: ignore

    response = Client._post_data(
        relative_url="/datablob/from_s3",
        json=dict(
            uri=uri,
            access_key=access_key,
            secret_key=secret_key,
            region=region,
            cloud_provider=cloud_provider,
            tag=tag,
        ),
    )

    return DataBlob(
        uuid=response["uuid"], type=response["type"], source=response["source"]
    )
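
When access_key and secret_key are left as None, from_s3 falls back to the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, as the source above shows. Below is a minimal sketch of both ways of supplying the credentials; the URI and key values are placeholders.

```python
import os

from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Option 1: rely on the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
# environment variables (the default when access_key/secret_key are None)
os.environ["AWS_ACCESS_KEY_ID"] = "{fill in access_key}"
os.environ["AWS_SECRET_ACCESS_KEY"] = "{fill in secret_key}"
db = DataBlob.from_s3(uri="{fill in uri}")

# Option 2: pass the credentials explicitly; explicit values take
# precedence over the environment variables
db = DataBlob.from_s3(
    uri="{fill in uri}",
    access_key="{fill in access_key}",
    secret_key="{fill in secret_key}",
)
```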

is_ready(self)

Check if the method's progress is complete.

Info

This method will return True immediately and will not wait for the progress to finish if the datablob is created using the from_local method.

Returns:

| Type | Description |
|------|-------------|
| bool | True if the upload progress is completed, else False. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
def is_ready(self) -> bool:
    """Check if the method's progress is complete.

    !!! info

        This method will return `True` immediately and will not wait for the progress to finish
        if the datablob is created using the `from_local` method.

    Returns:
        **True** if the upload progress is completed, else **False**.
    """
    if self.type in ["local"]:
        return True

    progress_status = ProgressStatus(relative_url=f"/datablob/{self.uuid}")

    return progress_status.is_ready()
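
Because is_ready returns immediately instead of blocking, it can also be used to build a custom polling loop when the blocking wait or progress_bar calls are not a good fit. A minimal sketch, with a placeholder bucket URI and an illustrative 10-second polling interval:

```python
import time

from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
db = DataBlob.from_s3(uri="{fill in uri}")

# Poll every 10 seconds until the upload has finished
while not db.is_ready():
    time.sleep(10)

print(db.details())
```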

ls(offset=0, limit=100, disabled=False, completed=False) staticmethod

Return the list of DataBlob instances

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| offset | int | The number of datablobs to offset at the beginning. If None, then the default value 0 will be used. | 0 |
| limit | int | The maximum number of datablobs to return from the server. If None, then the default value 100 will be used. | 100 |
| disabled | bool | If set to True, then only the deleted datablobs will be returned. Else, the default value False will be used to return only the list of active datablobs. | False |
| completed | bool | If set to True, then only the datablobs that are successfully downloaded to the server will be returned. Else, the default value False will be used to return all the datablobs. | False |

Returns:

| Type | Description |
|------|-------------|
| List[DataBlob] | A list of DataBlob instances available in the server. |

Exceptions:

| Type | Description |
|------|-------------|
| ConnectionError | If the server address is invalid or not reachable. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
@staticmethod
def ls(
    offset: int = 0,
    limit: int = 100,
    disabled: bool = False,
    completed: bool = False,
) -> List["DataBlob"]:
    """Return the list of DataBlob instances

    Args:
        offset: The number of datablobs to offset at the beginning. If **None**,
            then the default value **0** will be used.
        limit: The maximum number of datablobs to return from the server. If **None**,
            then the default value **100** will be used.
        disabled: If set to **True**, then only the deleted datablobs will be returned.
            Else, the default value **False** will be used to return only the list
            of active datablobs.
        completed: If set to **True**, then only the datablobs that are successfully downloaded
            to the server will be returned. Else, the default value **False** will be used to
            return all the datablobs.

    Returns:
        A list of DataBlob instances available in the server.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
    """
    lists = Client._get_data(
        relative_url=f"/datablob/?disabled={disabled}&completed={completed}&offset={offset}&limit={limit}"
    )

    dbx = [
        DataBlob(
            uuid=db["uuid"],
            type=db["type"],
            source=db["source"],
            region=db["region"],
            cloud_provider=db["cloud_provider"],
            datasources=db["datasources"],
            total_steps=db["total_steps"],
            completed_steps=db["completed_steps"],
            folder_size=db["folder_size"],
            disabled=db["disabled"],
            pulled_on=db["pulled_on"],
            user=db["user"],
            tags=db["tags"],
            error=db["error"],
        )
        for db in lists
    ]

    return dbx
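
The offset and limit parameters make it possible to page through a large number of datablobs, while the completed and disabled flags narrow the listing. A minimal sketch, assuming ls returns an empty list once the offset moves past the last datablob; the page size is an illustrative value.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Page through the successfully uploaded datablobs, 100 at a time
page_size = 100
offset = 0
while True:
    page = DataBlob.ls(offset=offset, limit=page_size, completed=True)
    if not page:
        break
    print(DataBlob.as_df(page))
    offset += page_size

# List the deleted datablobs as well
print(DataBlob.as_df(DataBlob.ls(disabled=True)))
```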

progress_bar(self, sleep_for=5, timeout=0)

Blocks the execution and displays a progress bar showing the remote action progress.

Info

This method will not check the progress if the datablob is created using the from_local method.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| sleep_for | Union[int, float] | The time interval in seconds between successive API calls. | 5 |
| timeout | int | The maximum time allowed in seconds for the asynchronous call to complete. If exceeded, the progress bar will be terminated. | 0 |

Exceptions:

| Type | Description |
|------|-------------|
| ConnectionError | If the server address is invalid or not reachable. |
| TimeoutError | In case of connection timeout. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
def progress_bar(self, sleep_for: Union[int, float] = 5, timeout: int = 0):
    """Blocks the execution and displays a progress bar showing the remote action progress.

    !!! info

        This method will not check the progress if the datablob is created using the
        `from_local` method.

    Args:
        sleep_for: The time interval in seconds between successive API calls.
        timeout: The maximum time allowed in seconds for the asynchronous call to complete. If exceeded,
            the progress bar will be terminated.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
        TimeoutError: in case of connection timeout.
    """
    if self.type not in ["local"]:
        progress_status = ProgressStatus(
            relative_url=f"/datablob/{self.uuid}",
            sleep_for=sleep_for,
            timeout=timeout,
        )

        progress_status.progress_bar()
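
The sleep_for and timeout arguments control how often the progress is polled and how long to keep polling before the call gives up. A minimal sketch that also guards against the documented TimeoutError; the URI and the 10-minute limit are placeholders.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
db = DataBlob.from_s3(uri="{fill in uri}")

try:
    # Poll every 10 seconds, but give up after roughly 10 minutes
    db.progress_bar(sleep_for=10, timeout=600)
except TimeoutError:
    print("The upload did not finish in time; check db.details() for errors.")
```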

set_default_cloud_provider(cls, cloud_provider, region=None)

Sets the default destination value for the cloud_provider and the region.

Whenever you call the from_* methods of the DataBlob class inside this context manager, the destination cloud_provider and region set in this context will be passed to the from_* methods, unless you explicitly override them in the parameters.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| cloud_provider | str | The destination cloud storage provider's name to store the datablob. Currently, the API only supports aws and azure as cloud storage providers. | required |
| region | Optional[str] | The destination cloud provider's region to store the datablob. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia, centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth, northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth, switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2. | None |

Returns:

| Type | Description |
|------|-------------|
| Iterator[NoneType] | A context manager that specifies the cloud provider and region to use. |

Here's an example of creating a datablob from Azure Blob Storage and storing it in AWS S3:

Examples:

# Importing necessary libraries
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

from  airt.client import Client, DataBlob

# Create a credential for accessing Azure Blob Storage
# Setting the required environment variables
os.environ["AZURE_SUBSCRIPTION_ID"] = "{fill in azure_subscription_id}"
os.environ["AZURE_CLIENT_ID"] = "{fill in azure_client_id}"
os.environ["AZURE_CLIENT_SECRET"] = "{fill in azure_client_secret}"
os.environ["AZURE_TENANT_ID"]= "{fill in azure_tenant_id}"

# Setting the resource group name and storage account name
azure_group_name = "{fill in azure_group_name}"
azure_storage_account_name = "{fill in azure_storage_account_name}"

# Retrieving the credential
azure_storage_client = StorageManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)
azure_storage_keys = azure_storage_client.storage_accounts.list_keys(
    azure_group_name, azure_storage_account_name
)
azure_storage_keys = {v.key_name: v.value for v in azure_storage_keys.keys}
credential = azure_storage_keys['key1']


# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablobs created inside the context manager will be
# stored in an AWS S3 bucket with the region set to eu-west-3.
with DataBlob.set_default_cloud_provider(
    cloud_provider="aws",
    region="eu-west-3"
):
    db = DataBlob.from_azure_blob_storage(
        uri="{fill in uri}",
        credential=credential
    )

# Display the status in a progress bar
db.progress_bar()

# Print the details of the newly created datablob
# If the upload is successful, the ready flag should be set to True
print(db.details())
Source code in airt/client.py
@patch(cls_method=True)
@contextmanager
def set_default_cloud_provider(
    cls: DataBlob, cloud_provider: str, region: Optional[str] = None
) -> Iterator[None]:
    """Sets the default destination value for the cloud_provider and the region.

    Whenever you call the from_\* methods of the `DataBlob` class inside this context manager, the destination cloud_provider and region set in this context
    will be passed to the from_\* methods, unless you explicitly override them in the parameters.

    Args:
        cloud_provider: The destination cloud storage provider's name to store the datablob. Currently, the API only supports **aws** and **azure** as cloud storage providers.
        region: The destination cloud provider's region to store the datablob. The supported AWS regions are: ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1,
            ap-southeast-2, ca-central-1, eu-central-1, eu-north-1, eu-west-1, eu-west-2, eu-west-3, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2. The supported
            Azure Blob Storage regions are: australiacentral, australiacentral2, australiaeast, australiasoutheast, brazilsouth, canadacentral, canadaeast, centralindia,
            centralus, eastasia, eastus, eastus2, francecentral, francesouth, germanynorth, germanywestcentral, japaneast, japanwest, koreacentral, koreasouth,
            northcentralus, northeurope, norwayeast, norwaywest, southafricanorth, southafricawest, southcentralus, southeastasia, southindia, switzerlandnorth,
            switzerlandwest, uaecentral, uaenorth, uksouth, ukwest, westcentralus, westeurope, westindia, westus, westus2.

    Returns:
        A context manager that specifies the cloud provider and region to use.

    Here's an example of creating a datablob from Azure Blob Storage and storing it in AWS S3:

    Example:
        ```python
        # Importing necessary libraries
        import os

        from azure.identity import DefaultAzureCredential
        from azure.mgmt.storage import StorageManagementClient

        from  airt.client import Client, DataBlob

        # Create a credential for accessing Azure Blob Storage
        # Setting the required environment variables
        os.environ["AZURE_SUBSCRIPTION_ID"] = "{fill in azure_subscription_id}"
        os.environ["AZURE_CLIENT_ID"] = "{fill in azure_client_id}"
        os.environ["AZURE_CLIENT_SECRET"] = "{fill in azure_client_secret}"
        os.environ["AZURE_TENANT_ID"]= "{fill in azure_tenant_id}"

        # Setting the resource group name and storage account name
        azure_group_name = "{fill in azure_group_name}"
        azure_storage_account_name = "{fill in azure_storage_account_name}"

        # Retrieving the credential
        azure_storage_client = StorageManagementClient(
            DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
        )
        azure_storage_keys = azure_storage_client.storage_accounts.list_keys(
            azure_group_name, azure_storage_account_name
        )
        azure_storage_keys = {v.key_name: v.value for v in azure_storage_keys.keys}
        credential = azure_storage_keys['key1']


        # Authenticate
        Client.get_token(username="{fill in username}", password="{fill in password}")

        # Create a datablob
        # In this example, the datablobs created inside the context manager will be
        # stored in an AWS S3 bucket with the region set to eu-west-3.
        with DataBlob.set_default_cloud_provider(
            cloud_provider="aws",
            region="eu-west-3"
        ):
            db = DataBlob.from_azure_blob_storage(
                uri="{fill in uri}",
                credential=credential
            )

        # Display the status in a progress bar
        db.progress_bar()

        # Print the details of the newly created datablob
        # If the upload is successful, the ready flag should be set to True
        print(db.details())
        ```
    """

    cls._default_provider_and_regions.append((cloud_provider, region))  # type: ignore

    yield

    cls._default_provider_and_regions.pop()
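
As noted above, parameters passed directly to a from_* call take precedence over the defaults set by this context manager. A minimal sketch contrasting the two; the URIs are placeholders.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

with DataBlob.set_default_cloud_provider(cloud_provider="azure", region="westeurope"):
    # Uses the defaults set by the context manager (azure / westeurope)
    db_azure = DataBlob.from_s3(uri="{fill in uri}")

    # Explicit parameters take precedence over the context defaults
    db_aws = DataBlob.from_s3(
        uri="{fill in uri}",
        cloud_provider="aws",
        region="eu-west-3",
    )
```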

tag(self, name)

Tag an existing datablob in the server.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| name | str | A string to tag the datablob. | required |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | A pandas dataframe with the details of the tagged datablob. |

Exceptions:

| Type | Description |
|------|-------------|
| ConnectionError | If the server address is invalid or not reachable. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
@patch
def tag(self: DataBlob, name: str) -> pd.DataFrame:
    """Tag an existing datablob in the server.

    Args:
        name: A string to tag the datablob.

    Returns:
        A pandas dataframe with the details of the tagged datablob.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
    """
    response = Client._post_data(
        relative_url=f"/datablob/{self.uuid}/tag", json=dict(name=name)
    )

    response = DataBlob._get_tag_name_and_datasource_id(response)

    df = pd.DataFrame([response])[DataBlob.BASIC_DB_COLS]

    df = df.rename(columns=DataBlob.COLS_TO_RENAME)

    return add_ready_column(df)
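
tag returns a pandas dataframe describing the tagged datablob, so the result can be inspected directly. A minimal sketch; the URI and tag name are placeholders.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob and wait for the upload to finish
db = DataBlob.from_s3(uri="{fill in uri}")
db.wait()

# Tag the datablob, e.g. to mark this particular import for later reference
details_df = db.tag(name="{fill in tag_name}")

# The returned dataframe holds the details of the tagged datablob
print(details_df)
```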

to_datasource(self, *, file_type, index_column, sort_by, deduplicate_data=False, blocksize='256MB', **kwargs)

Process the datablob and return a datasource object.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| file_type | str | The file type of the datablob. Currently, the API only supports "csv" and "parquet" as file types. | required |
| index_column | str | The column to use as index (row labels). | required |
| sort_by | Union[str, List[str]] | The column(s) to sort the data. Can either be a string or a list of strings. | required |
| deduplicate_data | bool | If set to True (default value False), the datasource will be created with duplicate rows removed. | False |
| blocksize | str | The number of bytes used to split larger files. If None, then the default value 256MB will be used. | '256MB' |
| kwargs | | Additional keyword arguments to use while processing the data, e.g., to skip 100 lines from the bottom of the file, pass {"skipfooter": 100}. | {} |

Returns:

| Type | Description |
|------|-------------|
| DataSource | An instance of the DataSource class. |

Exceptions:

| Type | Description |
|------|-------------|
| ValueError | If the CSV file processing fails. |
| ConnectionError | If the server address is invalid or not reachable. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
@patch
def to_datasource(
    self: DataBlob,
    *,
    file_type: str,
    index_column: str,
    sort_by: Union[str, List[str]],
    deduplicate_data: bool = False,
    blocksize: str = "256MB",
    **kwargs,
) -> DataSource:
    """Process the datablob and return a datasource object.

    Args:
        file_type: The file type of the datablob. Currently, the API only supports **"csv"** and **"parquet"** as file types.
        index_column: The column to use as index (row labels).
        sort_by: The column(s) to sort the data. Can either be a string or a list of strings.
        deduplicate_data: If set to **True** (default value **False**), the datasource will be created with duplicate rows removed.
        blocksize: The number of bytes used to split larger files. If None, then the default value **256MB** will be used.
        kwargs: Additional keyword arguments to use while processing the data, e.g., to skip 100 lines from the bottom of the file,
            pass **{"skipfooter": 100}**.

    Returns:
        An instance of the `DataSource` class.

    Raises:
        ValueError: If the CSV file processing fails.
        ConnectionError: If the server address is invalid or not reachable.
    """
    json_req = dict(
        file_type=file_type,
        deduplicate_data=deduplicate_data,
        index_column=index_column,
        sort_by=sort_by,
        blocksize=blocksize,
        kwargs=kwargs,
    )
    response = Client._post_data(
        relative_url=f"/datablob/{self.uuid}/to_datasource", json=json_req
    )

    return DataSource(uuid=response["uuid"])
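
sort_by accepts either a single column name or a list of names, and any extra keyword arguments are forwarded to the file processing step (for example, skipfooter for CSV files, as mentioned above). A minimal sketch assuming a CSV datablob; the URI and column names are placeholders.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob and wait for the upload to finish
db = DataBlob.from_s3(uri="{fill in uri}")
db.wait()

# Process the datablob into a datasource
ds = db.to_datasource(
    file_type="csv",
    index_column="{fill in index_column}",
    # sort_by accepts a single column name or a list of names
    sort_by=["{fill in sort_column_1}", "{fill in sort_column_2}"],
    # Drop duplicate rows while creating the datasource
    deduplicate_data=True,
    # Extra keyword arguments are forwarded to the file processing step,
    # e.g. skip the last 100 lines of each CSV file
    skipfooter=100,
)

# Track the processing and inspect the result
ds.progress_bar()
print(ds.head())
```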

wait(self, sleep_for=1, timeout=0)

Blocks execution while waiting for the remote action to complete.

Info

This method will not check the progress if the datablob is created using the from_local method.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| sleep_for | Union[int, float] | The time interval in seconds between successive API calls. | 1 |
| timeout | int | The maximum time allowed in seconds for the asynchronous call to complete. If exceeded, the call will be terminated. | 0 |

Exceptions:

| Type | Description |
|------|-------------|
| ConnectionError | If the server address is invalid or not reachable. |
| TimeoutError | In case of timeout. |

Examples:

# Importing necessary libraries
from  airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
# In this example, the datablob will be stored in an AWS S3 bucket. The
# access_key and the secret_key are set in the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables, and the region is set to
# eu-west-3; feel free to change the cloud provider and the region to
# suit your needs.
db = DataBlob.from_s3(
    uri="{fill in uri}",
    cloud_provider="aws",
    region="eu-west-3"
)

# Display the status in a progress bar
# Call the wait method to wait for the progress to finish but
# without displaying an interactive progress bar.
db.progress_bar()

# Display the ready status
# If the datablob is successfully uploaded, True will be returned.
print(db.is_ready())

# Print the details of the newly created datablob
print(db.details())

# Display the details of all datablob created by the currently
# logged-in user
print(DataBlob.as_df(DataBlob.ls()))

# Create a datasource
ds = db.to_datasource(
    file_type="{fill in file_type}",
    index_column="{fill in index_column}",
    sort_by="{fill in sort_by}",
)

# Display the status in a progress bar
ds.progress_bar()

# Display the head of the data to ensure everything is fine.
print(ds.head())

# Tag the datablob
print(db.tag(name="{fill in tag_name}"))

# Delete the datablob
print(db.delete())
Source code in airt/client.py
def wait(self, sleep_for: Union[int, float] = 1, timeout: int = 0):
    """Blocks execution while waiting for the remote action to complete.

    !!! info

        This method will not check the progress if the datablob is created using the
        `from_local` method.

    Args:
        sleep_for: The time interval in seconds between successive API calls.
        timeout: The maximum time allowed in seconds for the asynchronous call to complete. If exceeded,
            the call will be terminated.

    Raises:
        ConnectionError: If the server address is invalid or not reachable.
        TimeoutError: in case of timeout.
    """
    if self.type not in ["local"]:
        progress_status = ProgressStatus(
            relative_url=f"/datablob/{self.uuid}",
            sleep_for=sleep_for,
            timeout=timeout,
        )

        progress_status.wait()
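
wait blocks just like progress_bar but prints nothing, which makes it the better fit for scripts and scheduled jobs where an interactive progress bar would only clutter the logs. A minimal sketch; the URI and the 30-minute limit are placeholders.

```python
from airt.client import Client, DataBlob

# Authenticate
Client.get_token(username="{fill in username}", password="{fill in password}")

# Create a datablob
db = DataBlob.from_s3(uri="{fill in uri}")

try:
    # Block until the upload completes, polling every 5 seconds and
    # giving up after 30 minutes
    db.wait(sleep_for=5, timeout=1800)
except TimeoutError:
    raise SystemExit("Datablob upload timed out")

# True if the upload completed within the time limit
print(db.is_ready())
```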