24 Apr 2021

Red Wine Quality prediction using AzureML, AKS with TensorFlow Keras

What are we trying to do

Predict the quality of red wine with the TensorFlow Keras deep learning framework, given attributes such as fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.

We divide our approach into two major blocks:

Building the Model in Azure ML
Inference from the Model in Azure ML

Building the model in Azure ML has the following steps:

Create the Azure ML workspace
Upload data into the Azure ML Workspace
Create the code folder
Create the Compute Cluster
Create the Model
Create the Compute Environment
Create the Estimator
Create the Experiment and Run
Register the Model

Inferencing from the model in Azure ML has the following steps:

Create the Inference Script
Create the Inference Dependencies
Create the Inference Config
Create the Inference Clusters
Deploy the Model in the Inference Cluster
Get the predictions

Please read the companion post, Red Wine Quality prediction using AzureML, AKS, first. That post solves the same problem with classical machine learning techniques rather than deep learning; here we accomplish the same task with the Keras deep learning framework. Most steps stay the same as in the machine learning approach and only a few change, so this post highlights only the changed steps to keep it easy to follow.

Step	Change / No Change
Create the Azure ML workspace	No Change
Upload data into the Azure ML Workspace	No Change
Create the code folder	No Change
Create the Compute Cluster	No Change
Create the Model	Change
Create the Compute Environment	Change
Create the Estimator	Change
Create the Experiment and Run	No Change
Register the Model	No Change

Create the Model

The model we create here uses a repeatable block made up of:

Dense layer (512 units, ReLU activation)
Dropout (rate 0.3)
BatchNormalization

This block is repeated 3 times. The final layer is a Dense layer with a single neuron and linear activation, which outputs the predicted quality score.
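To make the size of this network concrete, the sketch below counts its weights layer by layer, assuming 11 input features (the attributes listed at the top of the post). A Dense layer has one weight per input-unit pair plus one bias per unit, Dropout has no weights, and BatchNormalization keeps four values per unit: gamma and beta are trained, while the moving mean and moving variance are not. The helper names are hypothetical; with Keras's default layer settings the totals should agree with what model.summary() reports for this architecture.

```python
def dense_params(n_in, n_out):
    # One weight per input-unit pair plus one bias per unit
    return n_in * n_out + n_out


def batchnorm_params(n):
    # gamma and beta are trainable; moving mean and moving variance are not
    return {"trainable": 2 * n, "non_trainable": 2 * n}


def count_params(n_features=11, units=512, n_blocks=3):
    """Count (trainable, non_trainable) parameters of the Dense/Dropout/BN stack."""
    trainable = 0
    non_trainable = 0
    n_in = n_features
    for _ in range(n_blocks):
        trainable += dense_params(n_in, units)  # Dense(512, relu)
        # Dropout(0.3) adds no parameters
        bn = batchnorm_params(units)  # BatchNormalization
        trainable += bn["trainable"]
        non_trainable += bn["non_trainable"]
        n_in = units
    trainable += dense_params(n_in, 1)  # final Dense(1), linear output
    return trainable, non_trainable


trainable, non_trainable = count_params()
print(trainable, non_trainable, trainable + non_trainable)  # 535041 3072 538113
```

Nearly all of these parameters sit in the two 512-by-512 Dense layers.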


%%writefile $folder_training_script/train.py

import argparse
import os

import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers, callbacks

from azureml.core import Run

# Let the user pass in one parameter: the folder where the datastore is mounted
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
args = parser.parse_args()

# Load the red wine dataset from the mounted datastore
data_folder = os.path.join(args.data_folder, 'winedata')
print('Data folder:', data_folder)

red_wine = pd.read_csv(os.path.join(data_folder, 'winequality_red.csv'))


# Create training (70%) and validation (30%) splits
df_train = red_wine.sample(frac=0.7, random_state=0)
df_valid = red_wine.drop(df_train.index)

# Per-feature statistics computed on the training split only
df_train_stats = df_train.drop(columns=['quality']).describe().transpose()

def norm(x):
    # Standardize each feature with the training mean and standard deviation
    return (x - df_train_stats['mean']) / df_train_stats['std']

# Split features and target, then standardize the features
X_train = norm(df_train.drop('quality', axis=1))
X_valid = norm(df_valid.drop('quality', axis=1))
y_train = df_train['quality']
y_valid = df_valid['quality']


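# Stop when val_loss (the default monitored metric) stops improving
early_stopping = callbacks.EarlyStopping(
    min_delta=0.001,  # minimum amount of change to count as an improvement
    patience=20,  # how many epochs to wait before stopping
    restore_best_weights=True,  # roll back to the weights of the best epoch
)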

input_shape=X_train.shape[1]

model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=512, activation='relu', input_shape=[input_shape]),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(units=512, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(units=512, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    # the linear output layer 
    layers.Dense(units=1)
])

model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=500,
    callbacks=[early_stopping], # put your callbacks in a list
    verbose=0,  # turn off training log
)

history_df = pd.DataFrame(history.history)

# Get the experiment run context
run = Run.get_context()

# Log the lowest validation loss (MAE); np.float is deprecated, so use the built-in float
run.log('min_val_loss', float(history_df['val_loss'].min()))

os.makedirs('outputs', exist_ok=True)

# Files saved in the outputs folder are automatically uploaded to the experiment run record
model.save('outputs/my_model')

run.complete()
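The EarlyStopping callback in train.py monitors val_loss (its default) and stops training once the validation loss has failed to improve by more than min_delta for patience consecutive epochs; with restore_best_weights=True, the model then rolls back to the weights from the best epoch. The pure-Python sketch below mimics that rule on a list of per-epoch validation losses so the effect of min_delta and patience is easy to see. It is a simplified illustration, not Keras's own implementation, and the function name and loss values are hypothetical.

```python
def early_stopping_epochs(val_losses, min_delta=0.001, patience=20):
    """Return (last_epoch_run, best_epoch), both 0-indexed.

    Simplified version of the EarlyStopping rule used in train.py.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement larger than min_delta
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Significant improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Stop here; restore_best_weights=True would roll back to best_epoch
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical losses: three improving epochs, then a plateau
losses = [1.0, 0.9, 0.8] + [0.8] * 25
print(early_stopping_epochs(losses))  # (22, 2)
```

With patience=20, training in this example ends 20 epochs after the last real improvement, so epochs=500 in model.fit acts only as an upper bound.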

Create the Compute Environment

The compute environment uses the curated environment AzureML-TensorFlow-2.2-GPU provided by Azure ML.

from azureml.core import Environment

# ws is the Workspace object created in the earlier post
curated_env_name = 'AzureML-TensorFlow-2.2-GPU'
tf_env = Environment.get(workspace=ws, name=curated_env_name)

Create the Estimator

We change the estimator to use the curated environment by passing environment_definition=tf_env.

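from azureml.train.estimator import Estimator

# ds (the datastore), compute_target and folder_training_script come from the earlier post
script_params = {
    '--data-folder': ds.as_mount()  # mount the datastore and pass its path to train.py
}

# Create an estimator that runs train.py in the curated TensorFlow environment
estimator = Estimator(source_directory=folder_training_script,
                      script_params=script_params,
                      compute_target=compute_target,  # run the experiment on the remote compute target
                      environment_definition=tf_env,
                      entry_script='train.py')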
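The script_params dictionary is handed to train.py as command-line arguments, and Azure ML resolves ds.as_mount() to the path where the datastore is mounted on the compute node. The sketch below feeds the same argparse definition used in train.py a hypothetical mount path to show how --data-folder becomes args.data_folder and where the script then looks for the CSV. The path /mnt/azureml/datastore is made up for illustration; the real mount path is chosen by Azure ML at run time.

```python
import argparse
import os

# Same argument definition as in train.py
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder',
                    help='data folder mounting point')

# Hypothetical command line; Azure ML substitutes the real mount path for ds.as_mount()
args = parser.parse_args(['--data-folder', '/mnt/azureml/datastore'])

data_folder = os.path.join(args.data_folder, 'winedata')
csv_path = os.path.join(data_folder, 'winequality_red.csv')
print(csv_path)
```

Because the dest argument is data_folder, the hyphenated flag --data-folder is read in the script as args.data_folder.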

Predict the data

Predicting (inferencing) from the model in Azure ML has the following steps:

Create the Inference Script
Create the Inference Dependencies
Create the Inference Config
Create the Inference Clusters
Deploy the Model in the Inference Cluster
Get the predictions

As in the previous section, we do not re-explain the steps that are unchanged from the machine learning post; we highlight only the steps that have changed from that implementation.

Step	Change / No Change
Create the Inference Script	Change
Create the Inference Dependencies	Not required
Create the Inference Config	Change
Create the Inference Clusters	No Change
Deploy the Model in the Inference Cluster	No Change
Get the predictions	No Change

Create the Inference Script

%%writefile $folder_training_script/score.py

import json
from tensorflow import keras
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = Model.get_model_path('wine_model')
    model = keras.models.load_model(model_path)

# Called when a request is received
def run(raw_data):
    try:
        # Expect a JSON body of the form {"data": [[feature values], ...]}
        data = np.array(json.loads(raw_data)['data'])
        # Get a prediction from the model
        predictions = model.predict(data)
        log_txt = 'Data:' + str(data) + ' - Predictions:' + str(predictions)
        print(log_txt)
        # Return the predictions in a JSON-serializable format
        return predictions.tolist()

    except Exception as e:
        result = str(e)
        # Return the error message to the client
        return json.dumps({"error": result})
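One point worth checking: train.py standardizes every feature with the training-split mean and standard deviation (the norm function) before fitting, but score.py as written passes the incoming data straight to model.predict. The values in the request's data field therefore need to be standardized the same way, in the same column order used for training. The sketch below builds such a payload in pure Python. The feature order is assumed to match the columns of winequality_red.csv (as listed at the top of this post), and the mean/std values and the sample row are placeholders, not the real statistics, which you would take from df_train_stats['mean'] and df_train_stats['std'].

```python
import json

# Feature order assumed to match winequality_red.csv without the 'quality' column
FEATURES = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
            'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
            'pH', 'sulphates', 'alcohol']


def standardize(row, means, stds):
    # Same transform as norm() in train.py: (x - mean) / std, feature by feature
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def build_payload(rows, means, stds):
    # score.py reads json.loads(raw_data)['data'] as a 2-D array, one row per sample
    return json.dumps({'data': [standardize(r, means, stds) for r in rows]})


# Placeholder statistics (hypothetical); substitute the values from df_train_stats
means = [8.0, 0.5, 0.3, 2.5, 0.08, 15.0, 45.0, 0.997, 3.3, 0.65, 10.4]
stds = [1.7, 0.18, 0.2, 1.4, 0.05, 10.0, 32.0, 0.002, 0.15, 0.17, 1.0]
# Hypothetical sample: one wine, values in FEATURES order
sample = [[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]]

payload = build_payload(sample, means, stds)
print(payload)
```

An alternative design is to save the training statistics alongside the model and apply them inside init() and run(), so that clients can send raw measurements.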

Create the Inference Dependencies

We skip this step because the curated environment tf_env already provides the required packages.

Create the Inference Config

from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(source_directory='./winecode',
                                              entry_script='score.py',
                                              environment=tf_env)

Here we use the curated environment tf_env created earlier to prepare the inference config.

Conclusion

In this post, we highlighted the code that changes when the model is built with the Keras deep learning framework instead of the machine learning approach used in the earlier post.

Thank you for reading. Please do leave your comments.

Thoughts - Ambarish