Build your Custom Model

Build your custom model using the Markov SDK.

As mentioned in the quick intro, building custom models involves specifying inference stages and actions using MarkovML operators. These stages are executed in sequence, and the results are packaged into a single entity known as a custom model.
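
The idea of stages running in sequence can be illustrated without the SDK. The following is a minimal sketch in plain Python; `run_stages` and the three toy stage functions are hypothetical names used only for illustration and are not part of the Markov SDK.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence


def run_stages(stages: Sequence[Callable[[Any], Any]], user_input: Any) -> Any:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda value, stage: stage(value), stages, user_input)


# Hypothetical stand-ins for pre-processing, prediction, and post-processing.
def preprocess(text: str) -> List[str]:
    return text.lower().split()


def predict(tokens: List[str]) -> int:
    # Toy "model": returns 1 if the word "ai" appears, otherwise 0.
    return 1 if "ai" in tokens else 0


def post_process(label: int) -> str:
    return {0: "Other", 1: "Sci/Tech"}[label]


result = run_stages([preprocess, predict, post_process], "Generative AI is moving fast")
print(result)
```

Because each stage consumes the previous stage's output, reordering the list changes (or breaks) the result, which is why stage order matters in the real pipeline as well.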

Define your Inference Model

The inference pipeline requires the following inputs from you.

  1. Name: A name for your inference pipeline, used for your own reference.
  2. Sample: Example inputs of the kind that users will send to the model.
  3. Schema (mandatory): The schema describes the target and feature columns of the dataset, along with their data types (e.g., string or number), that the model uses to make predictions. It is mandatory because it lets the model interpret user input data correctly.

🚧

Convert schema and sample to Markov backend acceptable format

Use the utility infer_schema_and_samples_from_dataframe to convert your input data frame into schema and samples in the format accepted by the Markov backend.

schema, samples = infer_schema_and_samples_from_dataframe(sample_input)
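
To make the idea of a schema concrete, here is a hypothetical, SDK-free sketch that derives column names and type names from a few sample records. The output layout is an assumption for illustration only; the actual structure returned by infer_schema_and_samples_from_dataframe is defined by the Markov SDK and may differ.

```python
from typing import Any, Dict, List


def sketch_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each column name to the Python type name of its first non-null value."""
    schema: Dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            if column not in schema and value is not None:
                schema[column] = type(value).__name__
    return schema


# Hypothetical sample rows; "word_count" is an invented extra column.
records = [
    {"content": "Generative AI has been impacting industry trends.", "word_count": 7},
    {"content": "Markets closed higher on Friday.", "word_count": 5},
]
print(sketch_schema(records))
```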

Sample Code

import os.path
import markov
import pandas as pd

# markov imports
from markov.api.models.artifacts.base import (
    MarkovPredictor,
    MarkovPyfunc,
    infer_schema_and_samples_from_dataframe,
)
from markov.api.models.artifacts.inference_pipeline import InferencePipeline


# To create a model app you will need to provide samples from your test/train set.
# You can sample some rows from your test/train dataframe to register with MarkovML.
samples = ["Generative AI has been impacting the industry trends at a very fast pace."]

# Note: "content" is the feature column in the AG News dataset.
# You only need to provide a dataframe with a few examples.
sample_input = pd.DataFrame([{"content": samples}])


# Use the utility `infer_schema_and_samples_from_dataframe` to convert your input dataframe
# into schema and samples in the format accepted by Markov backend
schema, samples = infer_schema_and_samples_from_dataframe(sample_input)


# Define your inference model
my_inference_model = InferencePipeline(
    name="pytorch-text-classifier-demo",  # 1: Add the inference model name
    schema=schema,                        # 2: Give the schema (mandatory)
    samples=samples,                      # 3: User sample inputs
)

📘

Note

Currently, we only support the addition of samples through pandas DataFrames.

Inference Pipeline Operators

Every step in the pipeline is known as a “stage.” The add_pipeline_stage() method allows you to add stages to the pipeline.

MarkovML provides 3 operators to specify the action performed during each stage. They are:

  1. MarkovPyfunc: This operator specifies that the stage includes a function. The function can be any Python function you want to use for that stage.
  2. MarkovPredictor: This operator specifies that the stage includes the trained model that performs the prediction. It also specifies the framework used to train the model, such as sklearn or PyTorch.

    📘

    Note

    MarkovML supports the LightGBM, PyTorch, sklearn, and XGBoost frameworks. Use MarkovSupportedFlavours to select one of these supported frameworks, which are also referred to as MarkovML Supported Flavours.

  3. MarkovTransformer: This operator specifies that the stage includes a trained model that uses the transform method instead of predict in your ML library.
    For example, the sklearn library has a transformer called TruncatedSVD, which exposes transform() rather than predict(). To call TruncatedSVD.transform() in a stage, use MarkovTransformer.
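
The difference between a predictor stage and a transformer stage comes down to which method is invoked on the wrapped object. A rough sketch, using hypothetical toy classes rather than real sklearn estimators or Markov operators (`run_stage` is an invented helper, not an SDK function):

```python
from typing import List


class ToyScaler:
    """Transformer-style object: exposes transform(), not predict()."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, rows: List[float]) -> List[float]:
        return [value * self.factor for value in rows]


class ToyThresholdModel:
    """Predictor-style object: exposes predict()."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, rows: List[float]) -> List[int]:
        return [1 if value >= self.threshold else 0 for value in rows]


def run_stage(obj, rows, method_name: str):
    """Call the named method on the wrapped object (transform or predict)."""
    return getattr(obj, method_name)(rows)


scaled = run_stage(ToyScaler(factor=0.5), [1.0, 4.0, 10.0], "transform")
labels = run_stage(ToyThresholdModel(threshold=2.0), scaled, "predict")
print(scaled, labels)
```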

Stages of Inference Pipeline

An inference pipeline is composed of "stages" that each execute one of the defined tasks. There are three types of stages, as shown below:

Stage 1: Pre-processing

Add your pre-processing functions with all the pre-processing steps needed for the dataset.

You can use the MarkovPyfunc operator to add the pre-processing function to this stage and name the stage. For example, name the stage “preprocess.”

Sample Code

...
# Model Preprocessing stage
stage=MarkovPyfunc(
    name="preprocess", pyfunc=model.dataset_handler.process_text
)
...
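
The body of process_text is not shown here. As an illustration only, a pre-processing function passed as pyfunc could look something like the sketch below; `simple_preprocess` and its cleaning rules are hypothetical, and a real function must produce whatever input your trained model expects.

```python
import re
from typing import List


def simple_preprocess(text: str) -> List[str]:
    """Lowercase the text, drop punctuation, and split it into word tokens."""
    lowered = text.lower()
    # Keep letters, digits, and whitespace; replace everything else with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return cleaned.split()


print(simple_preprocess("Generative AI: impacting trends, FAST!"))
```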

Stage 2: Trained Model

Use the MarkovPredictor operator to add the model with the MarkovML-supported flavor or framework and name the stage. For example, name the stage “pytorch_predictor,” as shown in the sample code below.

Sample Code

# Your trained model 
# Get your model and assign it to model
model = get_trained_model()

...
# Model Prediction stage
stage=MarkovPredictor(
    name="pytorch_predictor", model=model, flavour=MarkovSupportedFlavours.PYTORCH
)
...

Stage 3: Post-Processing

Once you have the model predictions, you might want to convert them to your desired format. For example, if your model returns 0 and 1, you may want to map these values to Negative and Positive.

You can use the MarkovPyfunc operator to add your post_process function to this stage.

Sample Code

# Sample post-processing code that maps model predictions to labels.
# The dataset used is the AG News dataset:
# 1 indicates World news, 2 Sports news, 3 Business news,
# and 4 Sci/Tech news.
def post_process(prediction):
    ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}
    # argmax(1) gives a 0-based class index; add 1 to match the 1-based labels
    prediction_int = prediction.argmax(1).item() + 1
    return ag_news_label[prediction_int]

...
# Post-processing stage
stage=MarkovPyfunc(name="post_process", pyfunc=post_process)
...
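
For the binary case mentioned above (0 and 1 mapped to Negative and Positive), a post-processing function might look like the following sketch. It works on a plain list of class scores so that it runs without PyTorch; `post_process_binary` is a hypothetical name, and the assumption is that index 0 means Negative and index 1 means Positive.

```python
from typing import Sequence

# Assumed label order: index 0 is Negative, index 1 is Positive.
SENTIMENT_LABELS = {0: "Negative", 1: "Positive"}


def post_process_binary(scores: Sequence[float]) -> str:
    """Return the label whose score is highest."""
    predicted_index = max(range(len(scores)), key=lambda i: scores[i])
    return SENTIMENT_LABELS[predicted_index]


print(post_process_binary([0.2, 0.8]))
print(post_process_binary([0.9, 0.1]))
```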

Your Complete Custom Model Inference Pipeline

Use the add_pipeline_stage() method to add your stages to the pipeline. You can also add environment requirements to the inference pipeline using the add_pip_requirements() method, and add other dependent code file paths, such as the Python files used to train your custom model, using the add_dependent_code() method.
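
If you need to work out which package versions your environment uses, the standard library can report them. The sketch below builds pinned requirement strings; `pinned_requirements` is a hypothetical helper, and whether add_pip_requirements() accepts a list of such strings is an assumption, since this page only shows it receiving the result of pytorch_pip_requirements().

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List


def pinned_requirements(packages: List[str]) -> List[str]:
    """Return 'name==version' for installed packages, or the bare name otherwise."""
    requirements = []
    for name in packages:
        try:
            requirements.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Package is not installed locally, so no version can be pinned.
            requirements.append(name)
    return requirements


print(pinned_requirements(["torch", "pandas"]))
```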

📘

Note

Stages must be in the correct order of execution.


import os.path
import markov
import pandas as pd

# markov imports
from markov.api.models.artifacts.base import (
    MarkovPredictor,
    MarkovPyfunc,
    infer_schema_and_samples_from_dataframe,
)
from markov.api.models.artifacts.inference_pipeline import InferencePipeline
from markov.library.dependencies_helper import pytorch_pip_requirements
from markov.library.mlflow_helper import MarkovSupportedFlavours
from train_model import get_trained_model

# Sample and Schema
samples = ["Generative AI has been impacting the industry trends at a very fast pace."]
sample_input = pd.DataFrame([{"content": samples}])
schema, samples = infer_schema_and_samples_from_dataframe(sample_input)


# Define inference model 
my_inference_model = InferencePipeline(
    name="pytorch-text-classifier-demo",
    schema=schema,
    samples=samples,
)

# Obtain `model` and define `post_process` as shown in the stage samples above.
...

# Build your inference pipeline 
# Add stages to the Inference Pipeline
my_inference_model.add_pipeline_stage(
    # Pre-processing stage
    stage=MarkovPyfunc(
        name="preprocess", pyfunc=model.dataset_handler.process_text
    )
).add_pipeline_stage(
    # Model Predictions
    stage=MarkovPredictor(
        name="pytorch_predictor", model=model, flavour=MarkovSupportedFlavours.PYTORCH
    )
).add_pipeline_stage(
    # Post-processing
    stage=MarkovPyfunc(name="post_process", pyfunc=post_process)
).add_pip_requirements(
    # Add requirements if missing
    pytorch_pip_requirements()
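).add_dependent_code(
    # Add any dependent code.
    # get_current_directory_path() is not defined in this snippet; it is assumed
    # to be a helper of yours that returns the directory containing train_model.py.
    code_paths=[os.path.join(get_current_directory_path(), 'train_model.py')]
)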
