Markov Quality Score

Estimate your data quality using the MarkovML trust estimate.

Markov Quality Score helps you identify potentially mislabeled records in a dataset. You can download the dataset together with these estimates using the Markov SDK.

Make sure to register your datasets with MarkovML first. Currently, only text and numeric datasets are supported. You can find the overall Markov Quality Score in the top-right corner of the Dataset Details page.

Getting the Markov Quality Score with the Markov SDK

Use the Markov SDK to fetch the Markov Quality Score for a dataset. First, fetch the dataset by name. Next, access its quality information and read the quality metrics as a DataFrame. Finally, obtain a direct download link for that DataFrame through the url attribute of the quality object (data_quality.url in the sample code).

The DataFrame includes the following columns:

  1. is_label_issue: Indicates whether the record's label is likely to be incorrect (True/False).
  2. label_quality: A numerical score rating the quality of the record's label.
  3. All other columns: the original labels (target) and the features selected during dataset registration.
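
To illustrate how these columns can be used together, here is a minimal pandas sketch that builds a review queue of flagged records. The DataFrame below is a hand-made stand-in for data_quality.df with invented values, and the names quality_df, flagged, and review_queue are hypothetical. The sort order assumes that a lower label_quality means a less trustworthy label, which this page does not state explicitly.

```python
import pandas as pd

# Stand-in for data_quality.df; every value here is invented for illustration.
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, True, False, True],
        "label_quality": [0.91, 0.12, 0.78, 0.05],
        "text": ["sample a", "sample b", "sample c", "sample d"],
        "feeling": ["joy", "fear", "sadness", "anger"],
    }
)

# Keep only the records flagged as potentially mislabeled.
flagged = quality_df[quality_df["is_label_issue"]]

# Assumption: lower label_quality means a less trustworthy label,
# so sort ascending to review the most doubtful labels first.
review_queue = flagged.sort_values("label_quality").reset_index(drop=True)

print(review_queue[["label_quality", "text", "feeling"]])
```

With a real dataset, you would replace the stand-in with the DataFrame returned by the SDK and keep the filtering and sorting steps unchanged.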

Sample Code

import markov

# Fetch the dataset by name
dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")

# Access the data quality information
data_quality = dataset.quality

# Access the data quality metrics as a DataFrame
quality_df = data_quality.df

# Retrieve a direct download link for the data quality DataFrame
download_url = data_quality.url

Sample Result

      is_label_issue  label_quality  ...  text                                                    feeling  
0              False       0.818080  ...  im feeling rather rotten, so I'm not very ambitious...  sadness  
1              False       0.789854  ...  im updating my blog because I feel shitty               sadness  
1998           False       0.052096  ...  i keep feeling like someone is being unki...            anger  
1999            True       0.123182  ...  i feel all weird when i have to meet w people ...       fear
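
A result like this can also be summarized per label class to see where flagged records concentrate. The following is a rough sketch on a small invented DataFrame with the same column names as the sample result (feeling as the target column). The variable names are hypothetical, and flagged_share is only the fraction of flagged rows, not the overall Markov Quality Score shown on the Dataset Details page.

```python
import pandas as pd

# Stand-in for data_quality.df; every value here is invented for illustration.
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, True, False, True, True, False],
        "label_quality": [0.82, 0.30, 0.79, 0.12, 0.05, 0.66],
        "feeling": ["sadness", "sadness", "sadness", "anger", "anger", "fear"],
    }
)

# Per-class summary: record count, flagged count, and mean label quality.
summary = (
    quality_df.groupby("feeling")
    .agg(
        records=("is_label_issue", "size"),
        flagged=("is_label_issue", "sum"),
        mean_label_quality=("label_quality", "mean"),
    )
    .sort_values("flagged", ascending=False)
)

# Fraction of all records flagged as potentially mislabeled.
flagged_share = quality_df["is_label_issue"].mean()

print(summary)
print(f"Share of flagged records: {flagged_share:.2%}")
```

Grouping by the target column keeps the summary independent of the feature columns, so the same steps apply to both text and numeric datasets.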
