Read Datasets

Access registered datasets to read their metadata, load their segments as DataFrames, or download them as CSV files.

List Datasets

This section shows how to list the datasets registered with MarkovML in the current workspace.

import markov

# Fetch all datasets registered in the logged-in workspace
for dataset in markov.dataset.get_datasets():
    print(dataset)  # prints the dataset's metadata

The result looks something like this:

{  
 "ds_prop": {  
  "name": "Resume Dataset  Reduced",  
  "notes": "Contains the Resume dataset filtered from the original dataset in a 1/10 ratio",  
  "data_category": "text",  
  "delimiter": ",",  
  "df_id": "dYFoqzhBBCxR74uh",  
  "storage_type": "s3",  
  "x_indexes": [],  
  "y_index": -1,  
  "x_col_names": [  
  	"Resume_str"  
  ],  
  "y_name": "Category",  
  "storage_format": "csv",  
  "info": {},  
  "source": ""  
 },  
 "ds_paths": [  
  {  
   "segment_type": "train",  
   "path": "s3://XXXXXXXv/wsp-XXXXXXX/uido1o8s5sra7/Resume Dataset  Reduced/reduced_resume_train.csv",  
   "multi_file": false  
  },  
  {  
   "segment_type": "test",  
   "path": "s3://XXXXXXX/wsp-XXXXXXXX/uido1o8s5sra7/Resume Dataset  Reduced/reduced_resume_test.csv",  
   "multi_file": false  
  }  
 ],
 "ds_id": "3b64AfvqRsPaBVrmP",
 "analysis_status": "RESULTS_AVAILABLE",
 "df": null,
 "cred_id": "XXXXXXXX",
 "_credentials": null
},  
{  
 ...  
}
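
To work with a listing record programmatically, one option is to pull the per-segment storage paths out of `ds_paths`. The sketch below assumes the record is available as a plain Python dict with the shape shown above. The helper `segment_paths` and the shortened `s3://example-bucket/...` paths are illustrative and not part of the MarkovML SDK, and whether the objects returned by `get_datasets()` can be indexed this way is not confirmed here.

```python
def segment_paths(metadata):
    """Map each segment type (e.g. "train", "test") to its storage path."""
    return {
        entry["segment_type"]: entry["path"]
        for entry in metadata.get("ds_paths", [])
    }


# Trimmed stand-in for one record from the listing above.
# The bucket name and paths are placeholders, not real locations.
sample = {
    "ds_prop": {"name": "Resume Dataset  Reduced", "y_name": "Category"},
    "ds_paths": [
        {
            "segment_type": "train",
            "path": "s3://example-bucket/reduced_resume_train.csv",
            "multi_file": False,
        },
        {
            "segment_type": "test",
            "path": "s3://example-bucket/reduced_resume_test.csv",
            "multi_file": False,
        },
    ],
    "ds_id": "3b64AfvqRsPaBVrmP",
    "analysis_status": "RESULTS_AVAILABLE",
}

paths = segment_paths(sample)
print(paths["train"])
```

A record without a `ds_paths` key yields an empty dict rather than raising, which keeps a loop over many records simple.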

Fetch a Dataset

MarkovML lets you fetch registered datasets using the following APIs. To learn more about datasets in MarkovML, see Datasets & Data Families.

If you don't have a dataset registered with MarkovML yet, follow Register Datasets with MarkovML.

You can use the datasets to do the following:

  • Get the feature columns
  • Get the target column
  • Use the dataset segments (train/test/validate) as DataFrames
  • Get the number of columns in the dataset or in a segment
  • Get the number of rows in the dataset or in a segment
  • Download the dataset as a CSV file

Fetch registered dataset using dataset ID

import markov

dataset = markov.dataset.get_by_id(dataset_id="paste_dataset_id_here")

# get the feature columns
features = dataset.features

# get the target column
target = dataset.target

# get the segments
segments = dataset.segments

# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()

# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows

# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols

# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")
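
The properties above can be gathered into one summary, for example for logging. This is a minimal sketch that relies only on the attributes shown in the snippet (`features`, `target`, `train.num_rows`, `test.num_cols`). The helper `summarize` is hypothetical, and the stand-in object with its made-up row and column counts exists only so the example runs without a MarkovML login.

```python
from types import SimpleNamespace


def summarize(dataset):
    """Collect a few properties of a fetched dataset into a plain dict."""
    return {
        "features": dataset.features,
        "target": dataset.target,
        "train_rows": dataset.train.num_rows,
        "test_cols": dataset.test.num_cols,
    }


# Stand-in for the object returned by markov.dataset.get_by_id(...).
# The counts below are invented for illustration, not real results.
stand_in = SimpleNamespace(
    features=["Resume_str"],
    target="Category",
    train=SimpleNamespace(num_rows=200),
    test=SimpleNamespace(num_cols=2),
)

summary = summarize(stand_in)
print(summary)
```

With a real dataset, pass the object returned by `markov.dataset.get_by_id(...)` in place of `stand_in`.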

Fetch registered dataset using dataset name

import markov

dataset = markov.dataset.get_by_name(dataset_name="paste_dataset_name_here")

# get the feature columns
features = dataset.features

# get the target column
target = dataset.target

# get the segments
segments = dataset.segments

# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()

# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows

# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols

# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")

Download Dataset Segment

You can use the dataset object to download one or more segments of the dataset.

import markov

# Fetch registered dataset by id
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# download the train segment of the dataset
dataset.train.download_as_csv(filepath="train.csv")

# download all segments of the dataset
dataset.download_as_csv()
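
To control where each segment lands, you could loop over segment names and call the per-segment download shown above. This is a sketch under assumptions: `download_segments` is a hypothetical helper, and it relies only on `dataset.train`, `dataset.test`, and `download_as_csv(filepath=...)` as they appear in the snippet. Other segment attribute names are not confirmed here.

```python
from pathlib import Path


def download_segments(dataset, out_dir, segment_names=("train", "test")):
    """Download each named segment to <out_dir>/<segment>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in segment_names:
        target = out_dir / f"{name}.csv"
        # Look up dataset.train / dataset.test by name and reuse the
        # download_as_csv(filepath=...) call from the snippet above.
        getattr(dataset, name).download_as_csv(filepath=str(target))
        written.append(target)
    return written
```

Calling `download_segments(dataset, "data")` on a fetched dataset would then be expected to write `data/train.csv` and `data/test.csv`.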

View Dataset in the Web UI

import markov

dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# get url of the dataset
url = dataset.get_url()

# open the dataset's details page in the browser
dataset.view_details()

This opens your browser and prompts you to log in if you haven't already.

Get Dataset Preview

You can preview a dataset's data after fetching the dataset by its ID.

import markov

# Fetch registered dataset
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# Get preview
dataset.get_preview()

The dataset preview looks like this:

{
 "segments": [
  "train",
  "test"
 ],
 "preview": {
  "train": {
   "data": [
    "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
    "179491,379027,0,2052245373,Sat Jun 06 00:06:05 PDT 2009,NO_QUERY,torilovesbradie,@jess_0000 being stood up is the worst thing in the world ",
    "211679,448154,0,2068887341,Sun Jun 07 14:52:55 PDT 2009,NO_QUERY,blackkitty,Splitting headache... About to pass out. Really sad because the Tony's are tonight! ",
    "268518,1337563,4,2017684084,Wed Jun 03 08:47:19 PDT 2009,NO_QUERY,moodeey,I tweeted asking how to cancel a domain on godaddy yesterday and I got reply from @GoDaddyGuy with the instructions .. it's very nice ",
    "132189,188370,0,1968867310,Fri May 29 22:24:53 PDT 2009,NO_QUERY,Kelvin_Anethema,Just burned my foot ",
    "20072,717414,0,2259995617,Sat Jun 20 18:30:05 PDT 2009,NO_QUERY,lauraa15,@Jonasbrothers likee miley? LOL i wish i was there  when are you coming back to chile?",
    "298358,971255,4,1831132511,Sun May 17 18:16:37 PDT 2009,NO_QUERY,abeckb,\"Love love making random, last minute Sconnie plans for @courtneyfaile   Hollerrr for double datin'!!!\"",
    "141710,683485,0,2250287314,Sat Jun 20 00:08:54 PDT 2009,NO_QUERY,KHolwick,I lost my phone ",
    "107872,458194,0,2071816448,Sun Jun 07 19:58:08 PDT 2009,NO_QUERY,artiseverything,I over fed myself  man man man",
    "203261,538147,0,2199000171,Tue Jun 16 16:52:41 PDT 2009,NO_QUERY,kebridgeman,\"Looks like someone cut my phone! Ugh... No phone, no internet...this sucks! Text me...it's all I got right now. \"",
    "234716,1111240,4,1972194333,Sat May 30 08:45:14 PDT 2009,NO_QUERY,imhannahh,@AlanCarr Alphabeat-Fascnation  ? Cant get more upbeat and happier than that "
   ],
   "metadata": {
    "line_separator": "\n"
   }
  },
  "test": {
   "data": [
    "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
    "1,1295011,4,2003602665,Tue Jun 02 06:49:50 PDT 2009,NO_QUERY,jessiii_babiii,time to play mind games ",
    "9,1505010,4,2072281940,Sun Jun 07 20:43:46 PDT 2009,NO_QUERY,brianjshoopman,@raingraves Indeed it was. I'll be seeing Mo Broaddus tomorrow so I'm teasing him about the upcoming &quot;slumber party&quot; with you &amp; Wrath. ",
    "10,617001,0,2226879429,Thu Jun 18 12:30:08 PDT 2009,NO_QUERY,franmesquish,hates it when i lose my train of thought and forget what i was going to look at on the tinternet ",
    "13,1564370,4,2187296656,Mon Jun 15 20:04:36 PDT 2009,NO_QUERY,ladydollparts,@Buccah Disney cruise.. comes with complimentary Prince Charming ",
    "29,986626,4,1834587595,Mon May 18 03:30:13 PDT 2009,NO_QUERY,voguex,@magicmillie maybe low fat snickers? we are making it with Mars bars as well ",
    "32,806225,4,1468786164,Tue Apr 07 03:43:43 PDT 2009,NO_QUERY,SomersetBob,\"@John1954Moi No, not yet - this is the first time I've really disclosed anything about him \"",
    "34,1310693,4,2013356851,Tue Jun 02 22:29:03 PDT 2009,NO_QUERY,AthenaATL,Happy birthday @SupBritt! now it's bedtimeee ",
    "36,67425,0,1692330758,Sun May 03 19:45:55 PDT 2009,NO_QUERY,Pawel_Sarkowicz,\"Shit, I'm so tired I'm like falling asleep! I still gotta finish my project though  Gotta stay awake O_O\"",
    "40,331338,0,2012640890,Tue Jun 02 21:01:16 PDT 2009,NO_QUERY,ohsaby,@Usedink I smelt it ",
    "42,336215,0,2013950296,Wed Jun 03 00:01:57 PDT 2009,NO_QUERY,mandu86,@JONESmichael  can i not register and just have some tix plz? mine fell through "
   ],
   "metadata": {
    "line_separator": "\n"
   }
  }
 },
 "delimiter": ","
}
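
Each entry under `data` is a raw delimited line, with the header first, so the preview can be split into columns using Python's standard `csv` module. The sketch below assumes the preview is available as a dict shaped like the sample above. `preview_rows` is a hypothetical helper, and `sample_preview` is a trimmed copy of the sample with two data rows.

```python
import csv


def preview_rows(preview, segment):
    """Parse one segment of a preview payload into dicts keyed by column name."""
    lines = preview["preview"][segment]["data"]
    # The first line is the header; the top-level "delimiter" says how to split.
    reader = csv.DictReader(lines, delimiter=preview["delimiter"])
    return list(reader)


# Trimmed copy of the preview shown above.
sample_preview = {
    "segments": ["train"],
    "preview": {
        "train": {
            "data": [
                "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
                "141710,683485,0,2250287314,Sat Jun 20 00:08:54 PDT 2009,NO_QUERY,KHolwick,I lost my phone ",
                "298358,971255,4,1831132511,Sun May 17 18:16:37 PDT 2009,NO_QUERY,abeckb,\"Love love making random, last minute Sconnie plans for @courtneyfaile   Hollerrr for double datin'!!!\"",
            ],
            "metadata": {"line_separator": "\n"},
        }
    },
    "delimiter": ",",
}

rows = preview_rows(sample_preview, "train")
print(rows[0]["handle"])
print(rows[1]["tweet"])
```

Using `csv.DictReader` instead of `str.split` matters here because quoted tweets can contain the delimiter, as the second row does.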