Configure Cloud Access

📘

Only for cloud imports

This section is only relevant if importing dataset files from a cloud storage provider. If you wish to upload dataset files from your local filesystem, proceed to the next guide.

If you have datasets stored in the cloud, you will need to provide access in order to register them with MarkovML. You can create access credentials with your cloud provider and add them to your MarkovML workspace, where they will be stored securely.

Register Cloud Access Credentials

🚧

Only AWS support

Currently, MarkovML only supports importing datasets from AWS S3. Support for other cloud storage providers is on our roadmap.

With MarkovML, you can analyze your datasets stored in AWS S3. Just specify the S3 location for your dataset, or for each segment if your dataset is segmented.
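As a rough illustration of what an S3 location looks like, the sketch below splits an `s3://bucket/key` URI into its bucket and object key using only the Python standard library. The helper `split_s3_uri`, the bucket name, and the segment names are hypothetical and not part of the MarkovML SDK; the exact location format MarkovML expects is assumed here to be the usual `s3://` URI.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical locations for a dataset split into two segments.
segments = {
    "train": "s3://example-bucket/hatespeech/train.csv",
    "test": "s3://example-bucket/hatespeech/test.csv",
}

for name, uri in segments.items():
    bucket, key = split_s3_uri(uri)
    print(f"{name}: bucket={bucket} key={key}")
```

Checking each location this way before registering a dataset can catch typos such as a missing bucket name early.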

If your dataset is hosted securely on S3, you must provide MarkovML with credentials to access your S3 bucket. You can register your S3 ACCESS_KEY and ACCESS_SECRET with MarkovML once and reuse the credentials to access other datasets in the future.

Register Access Credentials Using SDK

NOTE: If you've already registered a credential using the UI, you do not need to register it again from the SDK.

You can retrieve any existing credential by its name:

from markov.api.credentials.credential import CredentialManager

cred_response = CredentialManager.find_with_name('hatespeech')

The code example below illustrates registering an S3 credential with MarkovML using the Python SDK.

from markov.api.credentials.credential import S3Credentials, CredentialManager

s3_cred = S3Credentials(name='hatespeech',
                        details='Access credentials for the HateSpeech dataset',
                        access_key='<YOUR_S3_ACCESS_KEY>',
                        access_secret='<YOUR_S3_ACCESS_SECRET>')

cred_response = CredentialManager().register_s3_cred(s3_cred)

# Check whether the credential was registered with MarkovML and handle failure
if cred_response.is_ok():
    credential_id = cred_response.credential_id
else:
    # Raise a real exception; raising a bare string is a TypeError in Python 3
    raise RuntimeError(f'Unable to register credentials. Error: {cred_response.message}')

Note: For security, the original cloud credentials cannot be retrieved through the SDK. Use the returned credential_id instead.
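The registration example uses placeholder strings for the access key and secret. One common way to keep real keys out of source code is to read them from environment variables. The sketch below assumes the hypothetical variable names `S3_ACCESS_KEY` and `S3_ACCESS_SECRET` and a hypothetical helper `read_s3_keys`; neither is defined or required by MarkovML.

```python
import os
from typing import Mapping, Tuple


def read_s3_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key, access_secret) from env, failing loudly if either is unset."""
    # Variable names are illustrative assumptions, not names MarkovML requires.
    names = ("S3_ACCESS_KEY", "S3_ACCESS_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["S3_ACCESS_KEY"], env["S3_ACCESS_SECRET"]


# Demo with a stand-in mapping; in real use, call read_s3_keys() with no arguments.
demo_env = {"S3_ACCESS_KEY": "EXAMPLE-KEY", "S3_ACCESS_SECRET": "example-secret"}
access_key, access_secret = read_s3_keys(demo_env)
```

The two returned values can then be passed as access_key and access_secret when constructing S3Credentials.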

Register Access Credentials Using Web UI

You can also add new access credentials from the MarkovML web application as part of the workflow for registering a dataset.

Once logged in, navigate to the Datasets page. Click the "Add New Dataset" button at the top of the screen to begin the workflow for adding a new dataset.

Add dataset using the button

To register a dataset stored in the cloud, click the option labeled "My dataset is stored in the cloud".

The Cloud Storage option should be selected by default. Click "Add new" in the Access Credentials dropdown menu below, and you'll see an option to add a new credential.

In the dialog to add new credentials, specify the cloud storage type and enter the required access information. You'll need to provide a unique name for the credential and may also give the credential a brief description if desired. Click the Save button when everything looks good.

Congratulations, your credentials have been securely stored with MarkovML! You can now proceed to register a dataset or register a data family.