What is your favorite IDE? A developer can name several. How about a data scientist?
Normally, developers write the application code and data scientists develop and train the model. Tools such as VSCode come in handy because they are easy to install and use. Preferences vary, so the choice of tool is usually left to the individual developer or the development team. Usually the trained model is handed over to the app developer, who integrates it and builds the final application. At times, compatibility mismatches cost both the app developer and the model developer. The resulting friction between app developers and data scientists as they identify and fix the root cause can be slow, frustrating, and expensive.
We often hear organizations, managers included, talking constantly about Artificial Intelligence. People like solutions that integrate AI. Just as developers have a development lifecycle, data scientists follow a data science lifecycle.
The lifecycle includes stages such as:
Data Ingestion --> Data Preparation --> Model Development --> Model Deployment
There can be many iterations of this lifecycle, because requirements keep changing: data labels are revised, anomalies are removed, user feedback comes in, decisions shift over time, and so on.
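To make the iteration concrete, here is a minimal Python sketch of the four stages run as a loop. Everything in it is an assumption for illustration: the stage functions, the hard-coded data, the `anomaly_limit` setting, and the "model" (just a mean) are hypothetical stand-ins, not part of any Azure tooling.

```python
# Hypothetical stage functions; each takes and returns a plain dict "state".

def ingest(state: dict) -> dict:
    # Data Ingestion: pretend to load raw records (hard-coded for illustration).
    state["raw"] = [1.0, 2.0, None, 100.0, 3.0]
    return state

def prepare(state: dict) -> dict:
    # Data Preparation: drop missing values and anything above a crude anomaly limit.
    limit = state.get("anomaly_limit", 50.0)
    state["clean"] = [x for x in state["raw"] if x is not None and x <= limit]
    return state

def develop(state: dict) -> dict:
    # Model Development: the "model" here is simply the mean of the cleaned data.
    clean = state["clean"]
    state["model"] = sum(clean) / len(clean)
    return state

def deploy(state: dict) -> dict:
    # Model Deployment: bump a version counter to mark a new release.
    state["deployed_version"] = state.get("deployed_version", 0) + 1
    return state

STAGES = [ingest, prepare, develop, deploy]

def run_lifecycle(state: dict) -> dict:
    # Run every stage in order, passing the state along.
    for stage in STAGES:
        state = stage(state)
    return state

# First iteration with default settings.
state = run_lifecycle({})

# Second iteration, e.g. after feedback that the anomaly limit was too loose.
state["anomaly_limit"] = 2.5
state = run_lifecycle(state)
```

Each pass through `run_lifecycle` produces a new deployed version, which is why a change in data preparation ends up reaching whoever consumes the model.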
The application development lifecycle for a developer includes building the app, testing it, maintaining it, and continuously adapting to changes in user requirements, and the data scientists' models can be part of that. App developers and model developers need a good shared understanding, because mismatches lead to errors in the integrated system. I have seen many such issues even with traditional SSRS implementations integrated into web applications. When the model changes, the app fails.
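One common form of this mismatch is the model's expected input changing while the app keeps sending the old payload. The sketch below shows a simple check that would catch it before release. The field names, the two schemas, and the `find_mismatches` helper are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical payload that the app currently sends to the model.
APP_PAYLOAD = {"age": 42, "income": 55000.0}

# Hypothetical input schemas: version 2 of the model adds a required field.
MODEL_V1_SCHEMA = {"age": int, "income": float}
MODEL_V2_SCHEMA = {"age": int, "income": float, "region": str}

def find_mismatches(payload: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    for name, expected in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    for name in payload:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems

# The app works with the old model but would break against the new one.
print(find_mismatches(APP_PAYLOAD, MODEL_V1_SCHEMA))
print(find_mismatches(APP_PAYLOAD, MODEL_V2_SCHEMA))
```

A check like this can run as an integration test, so the failure shows up in the pipeline rather than in production.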
Microsoft suggests overcoming this issue by combining Azure DevOps and Azure Machine Learning, as shown in the image below.
|Source: Azure Blog|
Both the app developers and model developers need to use Git as the repository for managing the development artifacts.
To put it simply, the model training code developed by the data scientist triggers the Azure DevOps CI/CD pipeline, shown at the top right of the diagram above. There the developer can run multiple steps such as unit tests, integration tests, and training. Changes made by the app developer trigger integration tests whenever a code change is pushed to the repository. Triggers can also be set on the data lake to execute different commands. The Model Store registers new models and triggers releases for human approval, followed by the other necessary tests shown at the bottom right. These procedures help app developers and model developers coordinate and keep up with the required changes.
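The flow described above can be sketched in plain Python to show the order of events. This is only a simulation under stated assumptions: `ModelStore`, `run_steps`, `on_training_code_push`, the model name, and the accuracy value are hypothetical placeholders, and none of it calls the real Azure DevOps or Azure Machine Learning APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class ModelStore:
    """Stand-in for a model registry; not the Azure Machine Learning API."""
    versions: List[dict] = field(default_factory=list)

    def register(self, name: str, metrics: dict) -> dict:
        # Each registration gets the next version number and starts unapproved.
        entry = {"name": name, "version": len(self.versions) + 1,
                 "metrics": metrics, "approved": False}
        self.versions.append(entry)
        return entry

def run_steps(steps: List[Tuple[str, Callable[[], bool]]]) -> bool:
    # Run the steps in order and stop at the first failure.
    for name, step in steps:
        if not step():
            print(f"step failed: {name}")
            return False
    return True

def on_training_code_push(store: ModelStore,
                          approve: Callable[[dict], bool]) -> Optional[dict]:
    # Triggered when the data scientist pushes training code to Git.
    steps = [
        ("unit tests", lambda: True),         # placeholder: always passes
        ("integration tests", lambda: True),  # placeholder: always passes
        ("training", lambda: True),           # placeholder: always passes
    ]
    if not run_steps(steps):
        return None
    # Placeholder metric; a real pipeline would report measured values.
    entry = store.register("demo-model", {"accuracy": 0.9})
    # Human approval gate, modelled here as a callback.
    entry["approved"] = approve(entry)
    return entry

store = ModelStore()
released = on_training_code_push(
    store, approve=lambda e: e["metrics"]["accuracy"] >= 0.8)
```

Modelling approval as a callback keeps the gate explicit: a model can be registered and still not be released until someone, or some rule, signs off on it.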