
Tutorial

The airt-client library has the following main classes:

  • Client for authenticating and accessing the airt service,

  • DataBlob for encapsulating the data from sources like CSV files, databases, Azure Blob Storage, or an AWS S3 bucket, and

  • DataSource for managing datasources and training the models in the airt service.

We import them from the airt.client module as follows:

from airt.client import Client, DataBlob, DataSource

Authentication

To access the airt service, you must create a developer account. Please fill out the signup form to get one.

Upon successful verification, you will receive the username/password for the developer account in an email.

Finally, you need an application token to access all the APIs in the airt service. Please call the Client.get_token method with the username/password to get one.

You can either pass the username, password, and server address as parameters to the Client.get_token method or store them in the AIRT_SERVICE_USERNAME, AIRT_SERVICE_PASSWORD, and AIRT_SERVER_URL environment variables.

After successful authentication, the airt services will be available to access.
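If you prefer not to rely on environment variables, the same credentials can be passed explicitly. Below is a minimal sketch; the keyword names username, password, and server are assumptions, so please check the Client.get_token reference for the exact signature:

Client.get_token(
    username="johndoe",                       # developer account username (assumed keyword name)
    password="your-password",                 # developer account password (assumed keyword name)
    server="https://<airt-server-address>",   # airt server address (assumed keyword name)
)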

As an additional layer of security, we also support Multi-Factor Authentication (MFA) and Single sign-on (SSO) for generating tokens.

Multi-Factor Authentication (MFA) can be used to help protect your account from unauthorized access by requiring you to enter an additional code when you request a new token.

Once authenticated successfully, activating MFA for your account is a simple two-step process:

  1. Enable MFA for your account by calling the User.enable_mfa method, which will generate a QR code. You can then use an authenticator app, such as Google Authenticator, to scan the QR code.

  2. Activate MFA by calling the User.activate_mfa method and passing the dynamically generated six-digit verification code from the authenticator app.

Once MFA is successfully activated, you need to pass the dynamically generated six-digit verification code along with the username/password to the Client.get_token method when generating new tokens.

You can also disable MFA for your account anytime by calling the User.disable_mfa method.
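Putting the MFA steps together, a minimal sketch might look like the following; the otp keyword name and the User import path are assumptions, so please check the User and Client references for the exact signatures:

from airt.client import Client, User

# Step 1: enable MFA; this is assumed to return a QR code that you can
# scan with an authenticator app such as Google Authenticator.
qr_code = User.enable_mfa()

# Step 2: activate MFA with the six-digit code from the authenticator app
# (the otp keyword name is an assumption).
User.activate_mfa(otp="123456")

# From now on, pass the current six-digit code along with the
# username/password (or environment variables) when requesting a token.
Client.get_token(otp="123456")

# MFA can be disabled again at any time:
# User.disable_mfa()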

Once authenticated successfully, you can also enable Single sign-on (SSO) for your account. Currently, we support only Google and GitHub as external authentication providers (SSO). More authentication providers will be supported in upcoming releases.

Authenticating using Single sign-on (SSO) is a three-step process (sketched in code after the list):

  1. Enable the SSO provider by calling the User.enable_sso method with a valid SSO provider and an email address.

  2. To get a token, you must complete the SSO authentication with the provider. Calling the Client.get_token method with a valid SSO provider will give you an authorization URL. Please copy and paste it into your preferred browser and complete the authentication and authorization process with the SSO provider.

  3. Once the authentication is successful, calling the Client.set_sso_token method retrieves a new developer token, which will be used implicitly in all interactions with the airt server.
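A minimal sketch of the three steps; the sso_provider and sso_email keyword names and the User import path are assumptions, so please check the User.enable_sso and Client.get_token references for the exact signatures:

from airt.client import Client, User

# Step 1: enable an SSO provider for your account
# (keyword names are assumptions for illustration).
User.enable_sso(sso_provider="google", sso_email="johndoe@example.com")

# Step 2: request a token via SSO; this is assumed to return an
# authorization URL to open in your browser.
authorization_url = Client.get_token(sso_provider="google")
print(authorization_url)

# Step 3: once the browser flow is complete, fetch and set the new token.
Client.set_sso_token()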

Info

In the example below, the username, password, and server address are stored in the AIRT_SERVICE_USERNAME, AIRT_SERVICE_PASSWORD, and AIRT_SERVER_URL environment variables.

Client.get_token()

1. Data Blob

DataBlob objects are used to encapsulate data access. Currently, we support:

  • access to local CSV files,

  • database access for MySQL and ClickHouse, and

  • files stored in an AWS S3 bucket.

We plan to support other databases and storage media in the future.

To create a DataBlob object, you can call one of the DataBlob.from_local, DataBlob.from_mysql, DataBlob.from_clickhouse, or DataBlob.from_s3 static methods, which import the data from:

  • a local CSV file,

  • a MySQL database,

  • a ClickHouse database, or

  • an AWS S3 bucket (in the Parquet file format), respectively.

Example:

# Import a table from a MySQL database
data_blob = DataBlob.from_mysql(
    host="db.staging.airt.ai",
    database="test",
    table="events"
)

# Import files stored in an AWS S3 bucket
data_blob = DataBlob.from_s3(
    uri="s3://test-airt-service/ecommerce_behavior_csv"
)
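Similarly, a local CSV file could be imported with DataBlob.from_local. A minimal sketch, assuming the method accepts a path parameter (please check the reference for the exact signature):

# Import a local CSV file (the path keyword name is an assumption)
data_blob = DataBlob.from_local(path="ecommerce_behavior.csv")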

The above methods automatically pull the data into the airt server. All calls to the library are asynchronous and return immediately.

To manage completion, all such methods return a status object indicating whether the operation has finished. Alternatively, you can monitor the completion status interactively in a progress bar by calling the DataBlob.progress_bar method:

data_blob.progress_bar()
100%|██████████| 1/1 [00:35<00:00, 35.67s/it]
assert data_blob.is_ready()

The next step is to preprocess the data. We currently support preprocessing of CSV and Parquet files. Please use the DataBlob.to_datasource method for this. Support for more file formats will be added in the future.

data_source = data_blob.to_datasource(
    file_type="csv",
    index_column="user_id",   # column to use as the index
    sort_by="event_time"      # column to sort the data by
)

data_source.progress_bar()
100%|██████████| 1/1 [00:35<00:00, 35.36s/it]

After completion, you can display the head of the data to make sure everything is fine:

data_source.head()
event_time event_type product_id category_id category_code brand price user_session
user_id
10300217 2019-11-06 06:51:52+00:00 view 26300219 2053013563424899933 None sokolov 40.54 d1fdcbf1-bb1f-434b-8f1a-4b77f29a84a0
253299396 2019-11-05 21:25:44+00:00 view 2400724 2053013563743667055 appliances.kitchen.hood bosch 246.85 b097b84d-cfb8-432c-9ab0-a841bb4d727f
253299396 2019-11-05 21:27:43+00:00 view 2400724 2053013563743667055 appliances.kitchen.hood bosch 246.85 b097b84d-cfb8-432c-9ab0-a841bb4d727f
272811580 2019-11-05 19:38:48+00:00 view 3601406 2053013563810775923 appliances.kitchen.washer beko 195.60 d18427ab-8f2b-44f7-860d-a26b9510a70b
272811580 2019-11-05 19:40:21+00:00 view 3601406 2053013563810775923 appliances.kitchen.washer beko 195.60 d18427ab-8f2b-44f7-860d-a26b9510a70b
288929779 2019-11-06 05:39:21+00:00 view 15200134 2053013553484398879 None racer 55.86 fc582087-72f8-428a-b65a-c2f45d74dc27
288929779 2019-11-06 05:39:34+00:00 view 15200134 2053013553484398879 None racer 55.86 fc582087-72f8-428a-b65a-c2f45d74dc27
310768124 2019-11-05 20:25:52+00:00 view 1005106 2053013555631882655 electronics.smartphone apple 1422.31 79d8406f-4aa3-412c-8605-8be1031e63d6
315309190 2019-11-05 23:13:43+00:00 view 31501222 2053013558031024687 None dobrusskijfarforovyjzavod 115.18 e3d5a1a4-f8fd-4ac3-acb7-af6ccd1e3fa9
339186405 2019-11-06 07:00:32+00:00 view 1005115 2053013555631882655 electronics.smartphone apple 915.69 15197c7e-aba0-43b4-9f3a-a815e31ade40

2. Training

The prediction engine is specialized for predicting which clients are most likely to have a specified event in the future.

We assume the input data includes the following:

  • a column identifying a client, client_column (person, car, business, etc.),

  • a column specifying the type of event we will try to predict, target_column (buy, checkout, click on form submit, etc.), and

  • a timestamp column specifying the time at which the event occurred.

Each row in the data might have additional columns of int, category, float, or datetime type, and they will be used to make predictions more accurate. E.g., there could be a city associated with each user, the type of credit card used for a transaction, the smartphone model used to access a mobile app, etc.

Finally, we need to know how far ahead we wish to make predictions. E.g., if we predict that a client is most likely to buy a product in the next minute, there is not much we can do anyway. We might be more interested in clients that are most likely to buy a product tomorrow, so we can send them a special offer or engage them in some other way. That lead time varies widely from application to application and can be in minutes for a web shop or even several weeks for a banking product such as a loan. In any case, there is a parameter predict_after that allows you to specify the time period based on your particular needs.

The DataSource.train method is asynchronous and can take a few hours to finish depending on your dataset size. You can check the status by calling the Model.is_ready method or monitor the completion progress interactively by calling the Model.progress_bar method.

In the following example, we will train a model to predict which users will perform a purchase event (*purchase) 3 hours before they actually do it:

from datetime import timedelta

model = data_source.train(
    client_column="user_id",
    target_column="event_type",
    target="*purchase",
    predict_after=timedelta(hours=3),
)

model.progress_bar()
100%|██████████| 5/5 [00:00<00:00, 140.52it/s]
assert model.is_ready()

After training is complete, you can check the quality of the model by calling the Model.evaluate method.

model.evaluate()
eval
accuracy 0.985
recall 0.962
precision 0.934

3. Predictions

Finally, you can run the predictions by calling the Model.predict method.

The Model.predict method is asynchronous and can take a few hours to finish depending on your dataset size. You can check the status by calling the Prediction.is_ready method or monitor the completion progress interactively by calling the Prediction.progress_bar method.

predictions = model.predict()

predictions.progress_bar()
100%|██████████| 3/3 [00:10<00:00,  3.38s/it]
assert predictions.is_ready()

If the dataset is small enough, you can download the prediction results as a Pandas DataFrame by calling the Prediction.to_pandas method:

predictions.to_pandas()
Score
user_id
520088904 0.979853
530496790 0.979157
561587266 0.979055
518085591 0.978915
558856683 0.977960
520772685 0.004043
514028527 0.003890
518574284 0.001346
532364121 0.001341
532647354 0.001139

In many cases, it's much better to push the prediction results to destinations like an AWS S3 bucket or a MySQL database, or even to download them to the local machine.

Below is an example of pushing the prediction results to an S3 bucket. For other available options, please check the documentation of the Prediction class.

status = predictions.to_s3(uri=TARGET_S3_BUCKET)

status.progress_bar()
100%|██████████| 1/1 [00:10<00:00, 10.12s/it]
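If you would rather keep the results on the local machine, a minimal sketch might look like the following; the Prediction.to_local method name and its path parameter are assumptions, so please check the Prediction reference for the exact names:

# Download the prediction results to a local file
# (method and keyword names are assumptions).
status = predictions.to_local(path="./predictions")
status.progress_bar()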
