Data Quality

Estimate your data quality using the MarkovML trust estimate.

Label Trust Estimate
Label Trust Estimate flags the records in a dataset that are potentially mislabeled. You can download the dataset together with the estimates using the MarkovML SDK.

📘

Make sure to Register Datasets with MarkovML. Currently, only text datasets are supported. You can find the overall label quality estimate at the top right.

Code

import markov  

dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")

# Access the data quality information
data_quality = dataset.quality

# Access the data quality metrics as a DataFrame
data_quality.df

# Retrieve a direct download link for the data quality DataFrame
data_quality.url

Sample Result

      is_label_issue  label_quality  ...  text                                                    feeling  
0              False       0.818080  ...  im feeling rather rotten, so I'm not very ambitious...  sadness  
1              False       0.789854  ...  im updating my blog because I feel shitty               sadness  
1998           False       0.052096  ...  i keep feeling like someone is being unki...            anger  
1999            True       0.123182  ...  i feel all weird when i have to meet w people ...       fear

The data frame contains the following columns:

  • is_label_issue: A boolean indicating whether the record's label has a potential issue, that is, whether the record may be mislabeled.
  • label_quality: A numerical score representing the quality of the record's label.
  • The remaining columns are the original label (target) and the features the user selected during dataset registration.
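
Once the data frame is loaded, the flagged records can be pulled out for manual review with ordinary pandas operations. The following is a minimal sketch, not part of the MarkovML SDK: it uses a small stand-in frame built from the sample rows above (the text column is omitted for brevity) in place of data_quality.df, and the names quality_df, flagged, and review_queue are illustrative. Sorting from the lowest label_quality upward assumes that lower scores indicate less trustworthy labels, which this page does not state explicitly.

```python
import pandas as pd

# Stand-in for data_quality.df, built from the sample rows shown above.
# (Hypothetical variable name; in practice use the DataFrame from the SDK.)
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, False, False, True],
        "label_quality": [0.818080, 0.789854, 0.052096, 0.123182],
        "feeling": ["sadness", "sadness", "anger", "fear"],
    },
    index=[0, 1, 1998, 1999],
)

# Keep only the records flagged as potential label issues.
flagged = quality_df[quality_df["is_label_issue"]]

# Order flagged records by label_quality, lowest first
# (assumption: a lower score means a less trustworthy label).
review_queue = flagged.sort_values("label_quality")

# Fraction of records in this frame that were flagged.
flagged_fraction = quality_df["is_label_issue"].mean()

print(review_queue)
print(f"Flagged: {flagged_fraction:.0%}")
```

With the real data frame, replacing quality_df with data_quality.df should be enough, provided the column names match those listed above.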