Compare Datasets

Visualize and highlight similarities and differences amongst multiple datasets

MarkovML lets you compare a primary dataset against multiple secondary datasets. Running a comparison involves the steps below.

Steps

  1. Register your datasets first (see Register Datasets). Only registered datasets can be compared, and you need each dataset's ID (ds_id) to trigger a comparison.
  2. Trigger a comparison run between a primary dataset and multiple secondary datasets.
import markov

# Trigger a dataset comparison from the SDK

# Get the primary dataset by its ID
primary_dataset = markov.dataset.get_by_id("ds_id1")

# Compare the primary dataset against the secondary datasets
primary_dataset.compare(compare_input=["ds_id2", "ds_id3", "ds_id4"])

The comparison takes some time to finish. Once it is complete, you can view the results on the Runs page. You will also receive an email notification when the comparison is complete.
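If you assemble the list of secondary dataset IDs programmatically, it can help to tidy it before triggering the run. The sketch below is a precaution only: `prepare_secondary_ids` is a hypothetical helper, not part of the MarkovML SDK, and this page does not say how the SDK itself handles duplicate IDs or a primary ID that also appears in the secondary list. The helper drops blank entries, duplicates, and the primary ID while preserving order.

```python
from typing import Iterable, List


def prepare_secondary_ids(primary_id: str, candidate_ids: Iterable[str]) -> List[str]:
    """Return candidate IDs with blanks, duplicates, and the primary ID removed.

    Hypothetical helper for illustration; not part of the MarkovML SDK.
    """
    seen = {primary_id}  # treat the primary ID as already used
    cleaned: List[str] = []
    for ds_id in candidate_ids:
        ds_id = ds_id.strip()
        if ds_id and ds_id not in seen:
            seen.add(ds_id)
            cleaned.append(ds_id)
    if not cleaned:
        raise ValueError("at least one secondary dataset ID is required")
    return cleaned


if __name__ == "__main__":
    secondary_ids = prepare_secondary_ids(
        "ds_id1", ["ds_id2", "ds_id1", "ds_id3", "ds_id2"]
    )
    print(secondary_ids)
    # With the SDK installed and the datasets registered, the call shown in
    # the steps above would then be:
    #   primary_dataset = markov.dataset.get_by_id("ds_id1")
    #   primary_dataset.compare(compare_input=secondary_ids)
```

The cleaned list can be passed as `compare_input` exactly as in the SDK example above.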

View Dataset Comparison Jobs

Any compute jobs that run for Dataset Comparison will be listed on the Runs page.

Dataset comparison jobs are listed on the Runs page
