By Andrea Mirabile, Global Director of AI Research, Zebra Technologies
A few months can be a long time in the technology industry. That is why my team, the AI team within Zebra’s Chief Technology Office, maintains an environment of innovation by regularly hosting presentations from PhD students and professors showcasing their latest research and insights. This practice keeps us at the forefront of emerging technologies and use cases. Currently, we are exploring industry-leading themes, including high-performance computing, edge computing, the metaverse, robotics and autonomous systems, artificial intelligence (AI), and deep learning.
My own areas of focus are mainly AI, deep learning, and machine learning operations (ML Ops). Here I see advancements in vision and natural language processing, especially large language models (LLMs), as well as techniques for improving the performance and interpretability of deep neural networks, and applications of AI and deep learning in fields such as manufacturing, autonomous vehicles, healthcare, and financial services.
Retail is another key area of AI problem solving. Inventory tracking, automated shelf monitoring, more accurate inventory picking, and streamlined returns are some of the challenges that could be solved with AI solutions such as computer vision. Deployed in real life, computer vision applications could play a key role in improving the customer experience. For example, computer vision can be used for queue management to ensure shorter wait times, as part of self-checkout systems that reduce friction and make the checkout process faster and more convenient, and for fraud detection to ensure secure transactions. To solve these challenges, we need to make sure our research and development processes are the best they can be, and that means getting ML Ops right.
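To make the queue-management idea concrete, here is a toy sketch in Python. The PersonDetector class is a hypothetical stand-in for a real detection model, and the lane boundaries and threshold are invented for illustration.

```python
# Toy sketch of a queue-management check: count people detected in a
# checkout-lane region and alert when the queue exceeds a threshold.
from dataclasses import dataclass

@dataclass
class Detection:
    x: float
    y: float

class PersonDetector:
    """Hypothetical detector; a real system would run a vision model here."""
    def detect(self, frame) -> list[Detection]:
        # Hard-coded positions stand in for real inference output.
        return [Detection(120, 340), Detection(150, 360), Detection(180, 355)]

QUEUE_THRESHOLD = 2  # illustrative limit before opening another lane

def people_in_lane(detections: list[Detection], x_min: float, x_max: float) -> int:
    """Count detections whose x coordinate falls inside the lane region."""
    return sum(1 for d in detections if x_min <= d.x <= x_max)

detector = PersonDetector()
count = people_in_lane(detector.detect(frame=None), x_min=100, x_max=200)
if count > QUEUE_THRESHOLD:
    print(f"{count} shoppers waiting: open another checkout lane.")
```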
Generally speaking, ML Ops is the practice of combining software development best practices with machine learning to put AI models into production. It aims to improve collaboration between data scientists, engineers, and IT teams, and to automate the process of deploying, scaling, and maintaining AI models in production environments. ML Ops helps us ensure that the models we create are deployed, monitored, and maintained in a way that mirrors software development. The goal is to ensure that the models are always working as expected for customers, delivering high-accuracy results, and able to be updated as needed with minimal or no downtime.
Models are not perfect and come with their own challenges. They can degrade over time due to changing data patterns or environmental factors, so continuous monitoring is essential to catch these issues early and confirm that models keep performing as expected. One of the challenges faced when scaling and maintaining AI models in production is managing models across different edge devices, which often have varying update mechanisms. Standardizing deployment frameworks could ensure consistent and efficient model updates across our product portfolio.
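To illustrate what such monitoring can look like, here is a minimal sketch of a statistical drift check, assuming a stored reference sample from training time; the synthetic data and significance threshold are illustrative.

```python
# Minimal drift check: compare live feature values against a training-time
# reference sample with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if the live distribution differs significantly from the reference."""
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

# Illustrative usage with synthetic data standing in for real feature values.
rng = np.random.default_rng(seed=0)
reference_sample = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time snapshot
live_sample = rng.normal(loc=0.4, scale=1.0, size=1_000)       # shifted production data

if detect_drift(reference_sample, live_sample):
    print("Data drift detected: flag the model for review or retraining.")
```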
Common and Not-So-Common ML Ops
ML Ops teams are concerned with how to manage and monitor AI models in production, how to improve collaboration between data scientists, engineers, and IT teams, and how to ensure the quality and reliability of AI models in production. It’s a discussion in the wider AI community too, as we share the challenges and best practices around scaling, monitoring, and maintaining AI models in production. There are five areas where I find ML Ops practices most helpful:

- automating the process of model development, testing, deployment, and monitoring (see the sketch after this list);
- enabling the ability to scale models up or down based on demand;
- ensuring models are compliant with regulatory requirements and meet the standards set by both our company and our customers;
- continuously monitoring the model’s performance, accuracy, and data drift;
- managing the end-to-end life cycle of the model and providing transparency to stakeholders.
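To make the first of these areas concrete, here is a minimal sketch of an automated promotion gate, assuming a scikit-learn workflow; the dataset, model, and accuracy threshold are stand-ins for a real evaluation suite.

```python
# Minimal sketch of an automated promotion gate: a candidate model is only
# "deployed" (here, saved to disk) if it clears an accuracy threshold on a
# held-out set. Dataset, model, and threshold are all illustrative.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_GATE = 0.90  # illustrative promotion threshold

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidate = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, candidate.predict(X_test))

if accuracy >= ACCURACY_GATE:
    joblib.dump(candidate, "model-candidate.joblib")  # stand-in for a registry push
    print(f"Promoted candidate (accuracy={accuracy:.3f})")
else:
    print(f"Rejected candidate (accuracy={accuracy:.3f} below gate)")
```

In a real pipeline a gate like this would run in CI, with the joblib.dump call replaced by a push to a model registry.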
And although not always considered part of ML Ops, I would add collaboration, security, and privacy. It is important to adopt tools and practices that encourage communication and knowledge sharing among AI development team members. This includes using shared repositories, documenting processes, and conducting regular code reviews to ensure code quality and consistency. Developers should also be aware of the security and privacy implications of deploying machine learning models, implementing appropriate access controls, data anonymization techniques, and encryption measures to protect sensitive data and prevent unauthorized access.
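As one small, concrete piece of this, here is a sketch of pseudonymizing a sensitive identifier before it enters a training set; the field names are illustrative, and a real deployment would source the key from a secrets manager rather than hard-coding it.

```python
# Minimal sketch of pseudonymizing a PII field before it enters a training
# dataset: a keyed hash (HMAC) replaces the raw identifier so records can be
# joined consistently without storing the original value.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # illustrative only

def pseudonymize(value: str) -> str:
    """Return a stable, non-reversible token for a sensitive identifier."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"customer_id": "cust-12345", "basket_total": 42.50}
record["customer_id"] = pseudonymize(record["customer_id"])
print(record)
```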
Three Myths about ML Ops
Apart from the common and not-so-common focuses of ML Ops, there are also some myths that need busting. First, I’ve heard people claim, “ML Ops is solely the responsibility of the data science team.” The reality is that ML Ops is a collaborative effort involving data scientists, developers, operations teams, and other stakeholders. Developers should actively participate in building and maintaining the ML Ops pipeline to ensure the successful deployment and management of models.
Second, some say, “ML Ops is only about deploying models to production.” However, that’s misleading. While deploying models is a significant aspect of ML Ops, it is not the only one. ML Ops encompasses the entire lifecycle of a machine learning model, including data pre-processing, data versioning, model training, deployment, monitoring, and retraining. Building efficient data pipelines can enhance the speed and reliability of the development cycle. A data pipeline is a series of interconnected steps and processes that transform raw data into a usable format for machine learning tasks. It involves collecting, ingesting, pre-processing, and transforming data, as well as feeding it into machine learning models for training or inference. Data pipelines automate these steps, ensuring consistency, reproducibility, and scalability in data processing workflows.
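Here is a minimal sketch of that idea, with each pipeline stage expressed as a composable function; the schema, cleaning rules, and stand-in model are illustrative.

```python
# Minimal sketch of a data pipeline as composable steps: ingest raw records,
# clean them, derive features, and hand them to a model for inference.
from typing import Iterable

RAW_RECORDS = [
    {"item": "scanner", "price": "199.0", "qty": "2"},
    {"item": "printer", "price": None, "qty": "1"},  # missing value to clean
]

def ingest(records: Iterable[dict]) -> list[dict]:
    """Collect raw records from a source (here, an in-memory stand-in)."""
    return list(records)

def preprocess(records: list[dict]) -> list[dict]:
    """Drop incomplete rows and coerce string fields to numeric types."""
    return [
        {"item": r["item"], "price": float(r["price"]), "qty": int(r["qty"])}
        for r in records
        if r["price"] is not None
    ]

def transform(records: list[dict]) -> list[list[float]]:
    """Turn cleaned records into numeric feature vectors."""
    return [[r["price"], r["qty"], r["price"] * r["qty"]] for r in records]

def predict(features: list[list[float]]) -> list[int]:
    """Stand-in model: flag high-value lines (a real model would go here)."""
    return [1 if total > 300 else 0 for _, _, total in features]

print(predict(transform(preprocess(ingest(RAW_RECORDS)))))
```

Because each stage is a plain function, the same pipeline can be rerun on new data or unit-tested stage by stage, which is where the consistency and reproducibility come from.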
Finally, it’s wrong to believe that “ML Ops is a one-time setup.” ML Ops is an ongoing process that requires continuous improvement and iteration. Developers should regularly evaluate and optimize their ML Ops pipeline to adapt to changing requirements, emerging technologies, and evolving best practices and tools. It’s important to note that not all tools fit every need or use case in the deployment, monitoring, and management of machine learning models. Depending on the specific requirements, infrastructure, and constraints, it may be necessary to build custom or in-house tools to address specific challenges, or to use an off-the-shelf solution. For example, some software platforms use Kubernetes and Docker to automate and orchestrate AI workloads, complemented with other tools to monitor and track experiments, foster collaboration, and facilitate knowledge sharing within the team.
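As an example of the orchestration pattern, here is a sketch using the official Kubernetes Python client to declare a Deployment for a containerized model server; the image name, namespace, and replica count are placeholders, and it assumes cluster credentials are available locally.

```python
# Minimal sketch of orchestrating a model-serving workload with the official
# Kubernetes Python client: declare a Deployment that runs a containerized
# model server. Image, labels, and replica count are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

container = client.V1Container(
    name="model-server",
    image="registry.example.com/model-server:1.0.0",  # hypothetical image
    ports=[client.V1ContainerPort(container_port=8080)],
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="model-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # scale up or down based on demand
        selector=client.V1LabelSelector(match_labels={"app": "model-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "model-server"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```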
As we head into 2024, we can hope for a maturation of ML Ops as more research and development teams test and develop ways for generative AI to play a transformative role, accelerating both automation and innovation. As we say, a few months can be a long time in the technology industry.
For more information and to contact Andrea, click here.