Abstract:
In the modern world, most enterprises want to leverage machine learning models in their applications. Because of the high demand for machine learning models in production and the need to bring models from research to production in minimal time, MLOps has emerged as an unavoidable practice. The broad scope of MLOps opens many doors for research, and it is an emerging topic among researchers. Many people with various roles are involved in the machine learning life cycle. Like DevOps, MLOps is a culture that should be practiced by all parties involved in the process, whatever their role, to get a better outcome. MLOps adopts many practices from DevOps and also has its own set of practices. Even though many tools and technologies have been developed to build MLOps pipelines, there is still room for further study to improve the performance of these pipelines. The machine learning process has many phases, such as data handling, model training, model evaluation, hyperparameter tuning, model deployment, model versioning, and model monitoring. For an MLOps pipeline to perform well, all of these phases should be automated as much as possible. Performance improvements in an MLOps pipeline can be achieved in terms of ease of use, time, and cost. In this study, we took a simple machine learning problem, "Stock price prediction for Google stock prices using LSTM". We analysed many tools that can be used in an MLOps pipeline. Finally, we implemented an end-to-end MLOps pipeline with open source tools and technologies for the selected machine learning problem. Our final solution is implemented using DVC, MLflow, Evidently, and GitHub Actions. We compared our final solution with other solutions available in the market and analysed the pros and cons. Our solution is very flexible to use and has no vendor lock-in.
If tools need to be modified or extended, they can be plugged easily into the proposed architecture. We have automated almost all the phases in the MLOps pipeline, which reduces the time taken to bring machine learning models from research to production. Since we have mostly used free and open source tools, the solution is very cost effective. We found that our final solution improves the performance of the MLOps pipeline in terms of ease of use, time, and cost.
Keywords: MLOps, Machine Learning, Pipeline, DevOps, Data Version Control, Continuous Integration (CI), Continuous Deployment (CD), Continuous Training (CT), Workflow
Citation:
Kasthururaajan, R. (2023). Performance improvements in MLOPS pipeline [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/23391