Register Data Family

Organize or group similar datasets under the same data family.

Register your Data Family

Data families in MarkovML help organize or group related datasets. Think of a data family as a virtual folder containing all versions of datasets related to a specific topic.

For example, if you have datasets for sentiment analysis, you can group them under a "Sentiment Analysis" data family. Remember, once you register a dataset in MarkovML, you can't update it.
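To make the idea concrete, here is a minimal conceptual sketch in plain Python. It is not how MarkovML is implemented, and the class and method names (DatasetVersion, DataFamily, register) are hypothetical and do not come from the Markov SDK. It also assumes that changed data is registered as a new version rather than edited in place, which is one way to read the "can't update it" rule above.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical stand-in for a registered dataset; frozen, so it cannot be changed."""
    name: str
    version: int


@dataclass
class DataFamily:
    """Hypothetical stand-in for a data family: a named group of dataset versions."""
    name: str
    datasets: list = field(default_factory=list)

    def register(self, dataset_name: str) -> DatasetVersion:
        # Assumption: new data is added as a new version, never as an in-place update.
        version = 1 + sum(1 for d in self.datasets if d.name == dataset_name)
        registered = DatasetVersion(dataset_name, version)
        self.datasets.append(registered)
        return registered


family = DataFamily("Sentiment Analysis")
first = family.register("movie_reviews")
second = family.register("movie_reviews")
print(first.version, second.version)

try:
    first.version = 5  # attempting to update a registered dataset
except FrozenInstanceError:
    print("Registered datasets cannot be updated")
```

The family acts as the folder, and each registered dataset is a read-only entry inside it.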

You can create a data family either through the Web UI or using the Markov SDK. Make sure to create a data family before registering any related datasets.

1. Create a Data Family Using the MarkovML Web UI

Follow the steps below to create a new data family in MarkovML through the Web UI:

  1. Log In: Sign in to your MarkovML account.
  2. Navigate to the Dataset Page: Click Dataset, then click Add New Dataset.
  3. Proceed to Dataset Details: After choosing analyzers, click Next. A pop-up will appear asking for dataset details.
  4. Choose or Create Data Family: Add the dataset to an existing data family or create a new one by selecting Add new.
  5. Name and Describe: Give your data family a unique name and, if you want, a short description.
  6. Save: Click Save to complete the process.

2. Create a Data Family Using the Markov SDK

You can create a new data family directly from the Markov SDK using the markov.data.register_datafamily() method by providing the following information:

  1. name: Give a unique name to the data family.
  2. notes: Add notes or a description for future reference. (optional)
  3. lang: Set it to "en-us" if the dataset content is written in US English.
  4. source: Write the source name, such as Kaggle, for future reference. (optional)

Sample Code

import markov

# Create a new data family for the dataset
df_reg_resp = markov.data.register_datafamily(
    name="Hate Speech Data Family",  # unique data family name
    notes="This is a data family for hate speech datasets",
    lang="en-us",
    source="SOURCE_OF_THIS_DATASET",  # e.g. kaggle, customer_alpha, annotation_
)
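If you register several data families from a script, it can help to assemble the arguments in one place before calling the SDK. The sketch below is one possible approach, not part of the Markov SDK: the helper build_datafamily_kwargs is hypothetical, it simply leaves out the optional notes and source fields when they are not set, and its default lang value follows the sample code above.

```python
def build_datafamily_kwargs(name, lang="en-us", notes=None, source=None):
    """Hypothetical helper: collect keyword arguments for register_datafamily.

    Optional fields (notes, source) are omitted when they are not provided.
    """
    if not name or not name.strip():
        raise ValueError("A data family needs a non-empty, unique name.")
    kwargs = {"name": name.strip(), "lang": lang}
    if notes:
        kwargs["notes"] = notes
    if source:
        kwargs["source"] = source
    return kwargs


kwargs = build_datafamily_kwargs(
    name="Hate Speech Data Family",
    notes="This is a data family for hate speech datasets",
    source="kaggle",
)
print(kwargs)

# With the SDK installed and configured, the call would then be:
# import markov
# df_reg_resp = markov.data.register_datafamily(**kwargs)
```

Keeping the SDK call separate from the argument building lets you check the inputs without contacting MarkovML.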

Now that you have successfully created a data family, let's move on to registering your dataset with MarkovML.

