Datasets & Data Families

A Dataset represents a collection of examples, where each example consists of one or more variables along with a label or target.

By registering your datasets with MarkovML, you gain insight into essential characteristics such as value distributions, correlations between columns, the frequency of empty values, and more. This analysis helps you understand your data and make informed decisions during model training and evaluation.
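To make these characteristics concrete, here is a minimal sketch in plain Python of three of them: a label distribution, the frequency of empty values in a column, and the correlation between two columns. It does not use or represent the MarkovML API. The toy rows, column names, and the helper functions `empty_fraction` and `pearson` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Toy dataset (illustrative only): each row is one example,
# and None marks an empty value.
rows = [
    {"age": 25, "income": 40.0, "label": "no"},
    {"age": 35, "income": 60.0, "label": "yes"},
    {"age": 45, "income": None, "label": "yes"},
    {"age": 55, "income": 100.0, "label": "yes"},
]


def empty_fraction(rows, column):
    """Fraction of rows whose value in `column` is missing."""
    return sum(1 for r in rows if r[column] is None) / len(rows)


def pearson(rows, col_x, col_y):
    """Pearson correlation over rows where both columns are present."""
    pairs = [
        (r[col_x], r[col_y])
        for r in rows
        if r[col_x] is not None and r[col_y] is not None
    ]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# How often each label occurs across the dataset.
label_distribution = Counter(r["label"] for r in rows)
# Share of rows with no income value.
income_empty = empty_fraction(rows, "income")
# Linear relationship between age and income.
age_income_corr = pearson(rows, "age", "income")
```

A quarter of the toy rows have no income value, which is the kind of signal that might prompt you to impute or drop that column before training.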

MarkovML Datasets may be segmented to distinguish data used to train, test, or validate a model.

A single MarkovML Dataset can be segmented or unsegmented. ML engineers frequently divide a dataset into segments used to train, test, or validate a model. MarkovML lets you specify these segments and provides insights into how your train, test, and validate segments compare.
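The idea of segmenting a dataset and comparing the segments can be sketched in plain Python as follows. This is only an illustration, not how MarkovML defines or compares segments. The split ratios, the fixed seed, and the helper names `split_into_segments` and `label_shares` are assumptions.

```python
import random
from collections import Counter


def split_into_segments(examples, train=0.7, test=0.15, seed=0):
    """Shuffle examples and cut them into train/test/validate segments.

    Whatever remains after the train and test cuts goes to validate.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validate": shuffled[n_train + n_test:],
    }


def label_shares(segment):
    """Share of each label within one (non-empty) segment."""
    counts = Counter(label for _, label in segment)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# 100 toy examples as (feature, label) pairs; a quarter are "pos".
examples = [(i, "pos" if i % 4 == 0 else "neg") for i in range(100)]
segments = split_into_segments(examples)

# Compare segment sizes and label balance side by side.
for name, segment in segments.items():
    print(name, len(segment), label_shares(segment))
```

If the label shares differ sharply between segments, evaluation results on the test or validate segment may not reflect what the model saw during training.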

Data Family

A Data Family is a collection of datasets.

To help keep your datasets organized, each dataset you register with MarkovML is associated with a data family. A data family is a set of one or more Datasets that share a similar schema. Organizing your datasets into data families makes it much easier to locate a particular dataset when needed.
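As a rough illustration of grouping datasets by schema, the sketch below puts datasets with the same set of column names into the same family. `DatasetRecord` and `group_into_families` are hypothetical names, not MarkovML types or functions. Treating "similar schema" as "identical column names, in any order" is a simplifying assumption, since MarkovML's own criteria are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical stand-in for a registered dataset (not a MarkovML type)."""
    name: str
    columns: tuple  # column names, standing in for the dataset's schema


def group_into_families(datasets):
    """Group datasets whose column names match, ignoring column order."""
    families = defaultdict(list)
    for ds in datasets:
        families[frozenset(ds.columns)].append(ds.name)
    return dict(families)


datasets = [
    DatasetRecord("reviews_2022", ("text", "rating")),
    DatasetRecord("reviews_2023", ("rating", "text")),
    DatasetRecord("churn_q1", ("tenure", "plan", "churned")),
]

# Two families result: the two review datasets share one schema,
# and the churn dataset forms a family of its own.
families = group_into_families(datasets)
```

Keying each family on its schema means that finding a dataset starts with the shape of the data you need, rather than with a search through every registered dataset.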