Image by Author | Canva
DeepChecks is a Python package that provides a wide variety of built-in checks for detecting problems with model performance, data distribution, data integrity, and more.
In this tutorial, we will learn about DeepChecks and use it to validate a dataset and test a trained machine learning model, generating a complete report for each. We will also learn how to run a single, specific check on a model instead of generating a full report.
Why do we need machine learning tests?
Machine learning testing is essential to ensure the reliability, fairness, and security of AI models. It helps verify model performance, detect bias, harden models against adversarial attacks (especially large language models, or LLMs), ensure regulatory compliance, and enable continuous improvement. Tools like DeepChecks provide a testing solution that covers AI and ML validation from research to production, which makes them valuable for developing robust and reliable AI systems.
Getting started with DeepChecks
In this getting started guide, we will load the dataset and run a data integrity test. This step checks that the dataset is reliable and accurate before we use it for model training.
- We will start by installing the DeepChecks Python package using the `pip` command.
!pip install deepchecks --upgrade
- Import essential Python packages.
- Load the dataset using the pandas library. It consists of 569 samples and 30 features. The Cancer Classification dataset is derived from digitized images of fine needle aspirates (FNAs) of breast masses, where each feature represents a characteristic of the cell nuclei present in the image. These features allow us to predict whether the cancer is benign or malignant.
- Split the dataset into training and testing sets, stratifying on the target column 'benign_0__mal_1'.
import pandas as pd
from sklearn.model_selection import train_test_split
# Load Data
cancer_data = pd.read_csv("/kaggle/input/cancer-classification/cancer_classification.csv")
label_col = "benign_0__mal_1"
df_train, df_test = train_test_split(cancer_data, stratify=cancer_data[label_col], random_state=0)
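A quick sanity check on the split can be sketched as follows. This is a minimal illustration and not part of DeepChecks: the helper name `class_proportions` is hypothetical, and the toy label lists stand in for `df_train[label_col]` and `df_test[label_col]`. Because the split is stratified, the two sets of proportions should be nearly identical.

```python
from collections import Counter

def class_proportions(labels):
    """Return {class: fraction of samples} for any iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}

# Toy labels standing in for df_train[label_col] and df_test[label_col]
train_labels = [0] * 6 + [1] * 3
test_labels = [0] * 2 + [1] * 1

print(class_proportions(train_labels))
print(class_proportions(test_labels))
```

In the notebook, you could call `class_proportions(df_train[label_col])` and `class_proportions(df_test[label_col])` and compare the two dictionaries.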
- Create the DeepChecks dataset by providing additional metadata. Since our dataset has no categorical features, we pass an empty list for `cat_features`.
from deepchecks.tabular import Dataset
ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test = Dataset(df_test, label=label_col, cat_features=[])
- Run the data integrity suite on the training dataset.
from deepchecks.tabular.suites import data_integrity
integ_suite = data_integrity()
integ_suite.run(ds_train)
The report generation will take a few seconds.
The data integrity report contains test results on:
- Feature-Feature Correlation
- Feature-Label Correlation
- Single Value in Column
- Special Characters
- Mixed Nulls
- Mixed Data Types
- String Mismatch
- Data Duplicates
- String Length Out Of Bounds
- Conflicting Labels
- Outlier Sample Detection
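To give a feel for what two of these checks look for, here is a rough plain-Python sketch of a single-value-column test and a duplicate-row test. It is an illustration under simplifying assumptions, not the DeepChecks implementation: the function names, the column names, and the toy records are all hypothetical.

```python
def single_value_columns(rows):
    """Return names of columns that hold one distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

def duplicate_row_ratio(rows):
    """Return the fraction of rows that exactly repeat an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows) if rows else 0.0

# Hypothetical toy records; with pandas you could pass df.to_dict("records")
rows = [
    {"radius": 1.2, "texture": 0.5, "source": "lab"},
    {"radius": 1.2, "texture": 0.5, "source": "lab"},
    {"radius": 2.4, "texture": 0.7, "source": "lab"},
    {"radius": 3.1, "texture": 0.9, "source": "lab"},
]
print(single_value_columns(rows))
print(duplicate_row_ratio(rows))
```

A column with a single value carries no information for the model, and a high duplicate ratio can inflate evaluation scores, which is why both are worth flagging before training.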
Testing machine learning models
Let's train our model and then run a model evaluation suite to learn more about the model's performance.
- Load essential Python packages.
- Create three machine learning models (logistic regression, random forest classifier, and Gaussian Naive Bayes).
- Put them together using the voting classifier.
- Fit the ensemble model on the training data set.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
# Train Model
clf1 = LogisticRegression(random_state=1, max_iter=10000)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()
V_clf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
    voting='hard')
V_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col]);
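The `voting='hard'` setting means the ensemble predicts the class that most of its members vote for. The idea can be sketched in plain Python as below. This is only an illustration of majority voting: `hard_vote` is a hypothetical helper, the per-model predictions are made up, and scikit-learn's internals differ (it works on encoded labels and supports weights).

```python
from collections import Counter

def hard_vote(predictions):
    """Return the most common label; ties go to the smallest label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)

# Made-up predictions for four samples from three models
per_model = [
    [0, 1, 1, 0],  # lr
    [0, 1, 0, 0],  # rf
    [1, 1, 0, 0],  # gnb
]

# zip(*per_model) groups the three votes for each sample
ensemble = [hard_vote(votes) for votes in zip(*per_model)]
print(ensemble)
```

With three models and two classes, a tie cannot occur, so the tie-breaking rule only matters for ensembles with an even number of members or more classes.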
- After the training phase is complete, run the DeepChecks model evaluation suite using the training and testing data sets and the model.
from deepchecks.tabular.suites import model_evaluation
evaluation_suite = model_evaluation()
suite_result = evaluation_suite.run(ds_train, ds_test, V_clf)
suite_result.show()
The model evaluation report contains test results on:
- Unused Features: Train Dataset
- Unused Features: Test Dataset
- Train Test Performance
- Prediction Drift
- Simple Model Comparison
- Model Inference Time: Train Dataset
- Model Inference Time: Test Dataset
- Confusion Matrix Report: Train Dataset
- Confusion Matrix Report: Test Dataset
The suite contains other checks that did not run because of the ensemble model type. If you ran it on a simple model such as logistic regression, you would likely get a complete report.
- If you want to use the model evaluation report in a structured format, you can use the `.to_json()` function to convert it to JSON.
- You can also save the interactive report as a web page using the `.save_as_html()` function.
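The export options above, together with a way to see which checks were skipped, might be wrapped in two small helpers like the ones below. This is a sketch under assumptions: `export_report` and `list_not_run` are hypothetical names, the file name is arbitrary, and the code assumes the suite result exposes `get_not_ran_checks()` returning objects with `header` and `exception` attributes, so verify this against your installed DeepChecks version. The stand-in object uses made-up values so the snippet runs without DeepChecks.

```python
import json
from types import SimpleNamespace

def export_report(suite_result, html_path="model_evaluation.html"):
    """Save the interactive report and return the JSON report as Python data."""
    suite_result.save_as_html(html_path)
    return json.loads(suite_result.to_json())

def list_not_run(suite_result):
    """Return (header, reason) pairs for checks the suite could not run."""
    return [
        (failure.header, str(failure.exception))
        for failure in suite_result.get_not_ran_checks()
    ]

# Stand-in with made-up values so the sketch runs without DeepChecks;
# in the notebook, pass the real `suite_result` instead.
demo_result = SimpleNamespace(
    save_as_html=lambda path: path,
    to_json=lambda: '{"name": "Demo Suite", "results": []}',
    get_not_ran_checks=lambda: [
        SimpleNamespace(header="Example Check",
                        exception=ValueError("example reason"))
    ],
)

report = export_report(demo_result, "demo_report.html")
print(report["name"])
print(list_not_run(demo_result))
```

Parsing the JSON into a dictionary makes it easier to feed the results into dashboards or automated pipelines than reading the HTML report by eye.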
Running a single check
If you don't want to run the entire model evaluation suite, you can also test your model with a single check.
For example, you can check for label drift by providing the training and testing datasets.
from deepchecks.tabular.checks import LabelDrift
check = LabelDrift()
result = check.run(ds_train, ds_test)
result
As a result, you will get a distribution graph and a drift score.
You can even extract the drift score and the method used to compute it from the result's `value` attribute.
{'Drift score': 0.0, 'Method': "Cramer's V"}
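To see why a stratified split yields a drift score of 0.0, here is a minimal sketch of the textbook Cramér's V statistic applied to label counts in the two splits. It is not the DeepChecks implementation, which may apply corrections or other preprocessing. The function name and the counts are made up for illustration.

```python
import math

def cramers_v(train_counts, test_counts):
    """Cramér's V for a 2 x k table of label counts (train row, test row)."""
    table = [train_counts, test_counts]
    n = sum(map(sum, table))
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # With two rows, min(rows - 1, cols - 1) is 1 whenever k >= 2
    return math.sqrt(chi2 / n)

print(cramers_v([30, 20], [15, 10]))  # same class proportions in both splits
print(cramers_v([10, 0], [0, 10]))    # completely different labels
```

When both splits have the same class proportions, the observed counts equal the expected counts and the statistic is zero. The further the label distributions diverge, the closer it gets to one.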
Conclusion
The next step in your learning journey is to automate the machine learning testing process and track performance. You can do this with GitHub Actions by following the DeepChecks in CI/CD guide.
In this beginner's guide, we have learned how to generate data validation and machine learning evaluation reports using DeepChecks. If you have trouble running the code, I suggest you take a look at the Machine Learning Testing With DeepChecks Kaggle Notebook and run it yourself.
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid has a master's degree in technology management and a bachelor's degree in telecommunications engineering. His vision is to build an artificial intelligence product using a graph neural network for students struggling with mental illness.