Data Quality

Label Trust Estimate
Label Trust Estimate identifies potentially mislabeled records in a dataset. You can download the dataset together with the estimates using the MarkovML SDK.

📘
Make sure to register your datasets with MarkovML first. Currently, only text datasets are supported. You can find the overall label quality estimate at the top right.

Code

import markov

dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")

# Access the data quality information
data_quality = dataset.quality

# Access the data quality metrics as a DataFrame
data_quality.df

# Retrieve a direct download link for the data quality DataFrame
data_quality.url

Sample Result

      is_label_issue  label_quality  ...  text                                                    feeling  
0              False       0.818080  ...  im feeling rather rotten, so I'm not very ambitious...  sadness  
1              False       0.789854  ...  im updating my blog because I feel shitty               sadness  
1998           False       0.052096  ...  i keep feeling like someone is being unki...            anger  
1999            True       0.123182  ...  i feel all weird when i have to meet w people ...       fear

The data frame has the following columns:

is_label_issue: A boolean indicating whether the record's label has been flagged as potentially incorrect.
label_quality: A numerical score representing the quality of the record's label.
The other columns are the original label (target) and the features the user selected during dataset registration.
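
Once the quality data frame is available, the flagged records can be pulled out for manual review. The sketch below is only an illustration. It assumes that data_quality.df behaves like a regular pandas DataFrame, and it uses a small stand-in frame built from the four rows in the sample result (the text column is left out for brevity). The helper name flagged_records is hypothetical and is not part of the MarkovML SDK.

```python
import pandas as pd

# Stand-in for data_quality.df, built from the rows shown in the sample result.
# Assumption: the real object is a pandas DataFrame with these column names.
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, False, False, True],
        "label_quality": [0.818080, 0.789854, 0.052096, 0.123182],
        "feeling": ["sadness", "sadness", "anger", "fear"],
    },
    index=[0, 1, 1998, 1999],
)


def flagged_records(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows where is_label_issue is True,
    ordered from lowest to highest label_quality."""
    return df[df["is_label_issue"]].sort_values("label_quality")


flagged = flagged_records(quality_df)

# Fraction of records flagged as potential label issues
# (the mean of a boolean column is the share of True values).
issue_rate = quality_df["is_label_issue"].mean()

print(flagged[["label_quality", "feeling"]])
print(f"Share of flagged records: {issue_rate:.2%}")
```

With a real dataset, quality_df would be replaced by data_quality.df. Sorting by label_quality is one way to order the flagged rows for review; other orderings may suit your workflow better.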