
MLOps Roadmap

Introduction

What is MLOps?

MLOps can be narrowly defined as "the ability to apply DevOps principles to Machine Learning applications"; however, this narrow definition misses the real value of MLOps to the customer. Instead, we define MLOps as "the extension of the DevOps methodology to include Data Science and Machine Learning assets". MLOps should be viewed as a practice for consistently managing the ML aspects of products in a way that is unified with all of the other technical and non-technical elements necessary to successfully commercialise those products with maximum potential for viability in the marketplace. This also includes DataOps, because ML without complete, consistent, semantically valid, correct, timely and unbiased data will cause problems or lead to flawed solutions that can exacerbate built-in biases.

MLOps should not be confused with "AIOps". AIOps typically means the application of AI technology to Operations data, sometimes with the unclear intent of gathering insights. These terms are still evolving, but for the purposes of this document we do not mean AIOps in that latter sense. Some organisations are more comfortable with the label AI than ML, so it can be expected that MLOps may be referred to as AIOps in those domains; however, the reverse does not hold, since use of the term "AIOps" may not refer to the MLOps methodology.

What is not MLOps?

It is sometimes helpful to consider counter-examples of a concept in order to understand it better.

For example, MLOps is not "putting Jupyter Notebooks into Production environments".

In the next section we will discuss the key drivers for MLOps and expand upon the requirements for a true DevOps approach to managing ML assets. At this point in the evolution of the practice, it helps to understand that the majority of ML and AI research and development has been driven by Data Science rather than Computer Science teams. That specialisation has produced great leaps forward in the ML field, but it also means that a significant proportion of ML practitioners have never been exposed to the lessons of the past seventy years of managing software assets in commercial environments.

As we will see, this can lead to a large conceptual gap between what is involved in creating a viable proof of concept of a trained ML model on a Data Scientist's laptop and what it subsequently takes to safely transition that asset into a commercial product in production environments. In practice, MLOps is still in the early stages of maturation, and it is likely that many of the approaches commonly seen today will be discarded in favour of better ones over the next few years as teams gain more exposure to the full scope of this problem domain.

Alongside this challenge, ML solutions tend to be decision-making systems rather than merely data-processing systems, and as such will be held to much higher standards than those applied to the best software delivery projects. The bar for governance and quality processes is therefore very high, in many cases representing mandatory compliance processes required by regional legislation.

As these decision-making solutions increasingly replace human decision-makers in commerce and government, we encounter a new class of governance problems, collectively referred to as "Responsible AI". These raise a set of challenges around complex issues such as ethics, fairness and bias in ML models, and are often subject to government regulation requiring models to be understandable and explainable, frequently to a higher standard than that applied to a human employee.

MLOps will therefore need to evolve in a way that facilitates complex and sensitive governance processes, to a standard appropriate for what is expected to become a heavily regulated area.

The drivers unique to Machine Learning solutions that give rise to MLOps requirements are:

  • Optimising the process of taking ML features into production by reducing lead time

  • Optimising the feedback loop between production releases

  • Supporting the cycle of experimentation and feedback needed to solve problems in ML applications

  • Unifying the release cycle for ML assets

  • Enabling automated testing of ML assets

  • Applying Agile principles to ML projects

  • Supporting ML assets as first-class citizens within CI/CD systems

  • Enabling shift-left on Security to include ML assets

  • Improving quality through standardisation across conventional and ML assets

  • Applying Static Analysis, Dynamic Analysis, Dependency Scanning and Integrity Checking to ML assets

  • Reducing Mean Time To Restore for ML applications

  • Reducing Change Failure Rate for ML applications

  • Managing technical debt across ML assets

  • Enabling whole-of-life cost savings at the product level

  • Reducing IT management overheads through economies of scale

  • Facilitating the reuse of ML approaches through template or "quick start" projects

  • Managing risk by aligning ML delivery with appropriate governance processes (SBOMs, asset signing, chain of custody)

Vision

The optimal MLOps experience is one in which ML assets are treated consistently with all other software assets within a CI/CD environment. ML models can be deployed alongside the services that wrap them and the services that consume them, as part of a unified release process.

MLOps must be language-agnostic. Training scripts for ML models can be written in many different programming languages, so focusing on Python alone artificially restricts broad adoption.

MLOps must be framework-agnostic. Many different ML frameworks are in common use today, and these can be expected to continue to evolve over time. It must be possible to use any desired framework within an MLOps context, and to combine multiple frameworks in any given deployment.

MLOps must be platform- and infrastructure-agnostic. Adoption of MLOps is determined by the ability to operate the methodology within the constraints of previously defined corporate infrastructure. It should not be assumed that MLOps alone is a sufficient driver to force fundamental change to that infrastructure.

It is very important that MLOps does not make over-simplified assumptions about hardware. To meet the currently known requirements of customers across a range of ML fields, performance gains of at least three orders of magnitude will need to be achieved. This can be expected to lead to significant change in the hardware used to train and execute models.

We assert the following:

  • Models can be expected to be trained on varying combinations of CPU, GPU, TPU, etc.

  • Models can be expected to be executed on varying combinations of CPU, GPU, TPU, etc.

  • It must not be assumed that models will be executed on the same hardware on which they were trained.

It is desirable to be able to train once but run anywhere, i.e. models are trained on hardware chosen to minimise training time and are then optimised for different target devices based on cost or performance considerations; however, this aspiration may need to be tempered with a degree of pragmatism. The ability to train on a given hardware platform requires, at a minimum, dedicated driver support and, in practice, usually extensive framework/library support as well. The ability to execute on a given hardware platform may involve significant optimisation of resource usage to reduce unit cost, or may require the production of dedicated, task-specific silicon.

Whilst there will certainly be common scenarios, such as Cloud deployment, where CPU -> CPU or GPU -> GPU is sufficient to meet requirements, MLOps must enable a broad range of potential cross-compilation scenarios.

MLOps implementations should follow a "convention over configuration" pattern, seeking to minimise the amount of build-specific content that must be maintained for a desired solution. Where configuration is necessary, it should be synthesised where possible and should operate on the principle of always producing working examples that customers can incrementally modify to meet their needs.
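
As a minimal sketch of "convention over configuration" in Python (the field names and defaults below are illustrative assumptions rather than part of any specific MLOps tool), every setting has a working, production-safe default and a customer supplies only the overrides they care about:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TrainingConfig:
    """Hypothetical pipeline configuration: every field has a sensible,
    production-safe default, so an empty override still yields a runnable build."""
    dataset_version: str = "latest"     # resolved to a concrete, audited snapshot at run time
    framework: str = "pytorch"          # any registered backend; MLOps stays framework-agnostic
    epochs: int = 10
    learning_rate: float = 1e-3
    target_hardware: str = "cpu"        # training and serving hardware may differ
    promotion_metric: str = "accuracy"
    promotion_threshold: float = 0.90

def load_config(overrides: Optional[dict] = None) -> TrainingConfig:
    """Start from convention, then apply only the keys the user chose to change."""
    config = TrainingConfig()
    for key, value in (overrides or {}).items():
        if not hasattr(config, key):
            raise KeyError(f"Unknown configuration key: {key}")
        setattr(config, key, value)
    return config

if __name__ == "__main__":
    # The customer states only what differs from convention.
    print(asdict(load_config({"epochs": 5, "target_hardware": "gpu"})))
```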

Use of MLOps should teach best-known methods for applying MLOps. It must be recognised that many customers will be experts in the Data Science domain but may have had little exposure to DevOps or other SDLC principles. To flatten the learning curve, MLOps defaults should always align with best practice for production environments, rather than with the "quick and dirty" usage typical of an expert user trialling a tool in a test-bench environment.

Scope

The focus of this roadmap is upon the aspects involved in extending DevOps principles into the Machine Learning domain; it does not discuss the underlying features of the DevOps or CI/CD approach that are common to both domains. The intent is for MLOps to decorate DevOps rather than differentiate from it.

Challenges

In this section we identify the near- and longer-term challenges that stand in the way of successfully adopting and benefiting from MLOps principles:

Educating data science teams regarding the risks of trying to use Jupyter Notebooks in Production

  • Communicating SDLC and DevOps lessons in a way the team is comfortable with

  • Highlighting the dangers of using tools such as Jupyter Notebooks in Production

  • Demonstrating simple, easy-to-use alternatives as part of the MLOps workflow

Treating ML assets as first-class citizens in DevOps processes

  • Extending CI/CD tooling to support ML assets

  • Integrating ML assets with version control systems (VCS)

Providing mechanisms by which train/test/validation data sets, training and experimentation code, and services may all be versioned appropriately

  • All assets under version control

  • ML assets include data sets

  • Data volumes may be large

  • Data may be sensitive

  • Data ownership may be complex

  • Multiple platforms may be involved

Providing mechanisms by which train/test/validation data sets, training and experimentation code, and services may all be audited across their full lifecycle

  • Models may be retrained against new data

  • Models may be retrained using new approaches

  • Models may be self-learning

  • Code may degrade over time

  • Models may be deployed into new applications

  • Models may be attacked and require remediation

  • Incidents may occur that require root cause analysis and change

  • Corporate or government compliance may require audit or investigation

Treating train/test/validation data sets as managed assets under an MLOps workflow

  • Train/test/validation data set content is necessary for audit and root cause analysis

  • Data is not typically treated as a managed asset under conventional CI/CD

  • Data may reside across multiple systems

  • Data may only be permitted to reside in restricted jurisdictions

  • Data storage may not be immutable

  • Data ownership may be a factor

Managing the security of data in the MLOps process with particular focus upon the increased risk associated with aggregated data sets used for training or batch processing

  • Aggregated data carries additional risk and represents a higher value target

  • Migration of data into Cloud environments, particularly for training, may be problematic

Implications of privacy, GDPR and the right to be forgotten upon training sets and deployed models

  • The right to use data for training purposes may require audit

  • Legal compliance may require removing data from audited training sets, which in turn implies the need to retrain and redeploy models built from that data

  • The right to be forgotten or the right to opt out of certain data tracking may require per-user reset of model predictions

Methods for wrapping trained models as deployable services in scenarios where data scientists training the models may not be experienced software developers with a background in service-oriented design

Operational use of a model brings new architectural requirements that may sit outside the domain of expertise of the data scientists who created it. Model developers may not be software developers and therefore experience challenges implementing APIs around their models and integrating within solutions.

Approaches for enabling all Machine Learning frameworks to be used within the scope of MLOps, regardless of language or platform

It is a common mistake to assume that ML = Python. Many commonly used frameworks, such as PyTorch and TensorFlow, exist and can be expected to continue to proliferate. MLOps must not be opinionated about frameworks or languages.

Approaches for enabling MLOps to support a broad range of target platforms, including but not limited to CPU, GPU, TPU, custom ASICs and neuromorphic silicon.

The choice of training platform and operational platform may vary, and could be different for separate models within a single project.

Methods for ensuring efficient use of hardware in both training and operational scenarios

  • Hardware accelerators are expensive and difficult to virtualise

  • Cadence of training activities impacts cost of ownership

  • Elastic scaling of models against demand in operation is challenging when based upon hardware acceleration

Approaches for applying MLOps to very large scale problems at petabyte scale and beyond

  • Problems associated with moving and refreshing large training sets

  • Problems associated with distributing training loads across hardware accelerators

  • Problems with the speed of distributing training data to the correct hardware accelerators

  • Problems of provisioning / releasing large pools of hardware resources

Providing appropriate pipeline tools to manage MLOps workflows transparently as part of existing DevOps solutions

  • Integration of ML assets with existing CI/CD solutions

  • Extending Cloud-native build tools to support allocation of ML assets, data and hardware during training builds

  • Hardware pool management

Testing ML assets appropriately

  • Conventional Unit / Integration / BDD / UAT testing

  • Adversarial testing

  • Bias detection

  • Fairness testing

  • Ethics testing

  • Interpretability

  • Stress testing

  • Security testing

Governance processes for managing the release cycle of MLOps assets, including Responsible AI principles

  • Managing release of new training sets to the data science team

  • Establishing thresholds for acceptable models

  • Monitoring model performance (and drift) over time to feed into thresholds for retraining and deployments

  • Managing competitive training of model variants against each other in dev environments

  • Managing release of preferred models into staging environments for integration and UAT

  • Managing release of specific model versions into production environments for specific clients/deployments

  • Managing root cause analysis for incident investigation

  • Observability / interpretability

  • Explainability and compliance

Management of shared dependencies between training and operational phases

A number of ML approaches require the ability to have reusable resources that are applied both during training and during the pre-processing of data being passed to operational models. It is necessary to be able to synchronise these assets across the lifecycle of the model. e.g. Preprocessing, Validation, Word embeddings etc.

Abstraction for models

Stored models are currently often little more than serialised objects. To decouple training languages, platforms and hardware from operational languages, platforms and hardware it is necessary to have broadly supported standard intermediate storage formats for models that can be used reliably to decouple training and operational phases.

Longevity of ML assets

Decision-making systems can be expected to require very long effective operating lifetimes. It will be necessary in some scenarios to be able to refer to instances of models across significant spans of time and therefore forward and backward compatibility issues, storage formats and migration of long running transactions are all to be considered.

Managing and tracking trade-offs

Solutions including ML components will be required to manage trade-offs between multiple factors, for example in striking a balance between model accuracy and customer privacy, or explainability and the risk of revealing data about individuals in the data set. It may be necessary to provide intrinsic metrics to help customers balance these equations in production. It should also be anticipated that customers will need to be able to safely A/B test different scenarios to measure their impact upon this balance.

Escalation of data categories

As a side effect of applying governance processes to check for fairness and bias within models, it may become necessary to hold special category data providing information about race, religion or belief, sexual orientation, disability, pregnancy, or gender reassignment in order to detect such biases. As a result of this, there will be an escalation of data sensitivity and in the legal constraints that apply to the solution.

Intrinsic protection of models

Models are vulnerable to certain common classes of attack, such as:

  • Black box attempts to reverse engineer them

  • Model Inversion attacks attempting to extract data about individuals

  • Membership Inference attacks attempting to verify the presence of individuals in data sets

  • Adversarial attacks using tailored data to manipulate outcomes

It should be anticipated that there will be a need for generic protections against these classes of challenge across all deployed models.

Emergency cut out

As models are trained typically with production/runtime data, and act on production data, there can be cases where undesirable behaviour of a recently deployed model change is only apparent in a production environment. One example is a chat bot that uses inappropriate language in reaction to some interactions and training sets. There is a need to have the ability to quickly cut out a model or roll back immediately to an earlier version should this happen.

Online learning

Some systems in use employ online learning, where a model (or similar) evolves in near real time with the data flowing in, and there is not necessarily a deploy stage. Such systems may also modify themselves at runtime without any sort of rebuilding or human approval. This will require further research and observation of emerging practices and use cases of this approach to machine learning.

Prioritising training activities

Training activities are time-consuming. In scenarios where systems support the simultaneous queuing and/or processing of multiple training activities, it must be possible to prioritise those activities to prevent long-running trainings from blocking the deployment of other processes such as security or bug fixes.

Guardrail metrics

MLOps processes always carry inherent risk so systems should support the use of guardrail metrics to aid in mitigating these risks.

Government regulation of AI

Proposed legislation on AI introduces the power to regulate or prohibit certain classes of products. Regulation introduces:

  • Mandatory data quality controls

  • Mandatory end-to-end compliance documentation

  • Transparency of system decisions

  • Human oversight over functioning

  • Ongoing publication of accuracy metrics

  • Proof of resilience against error or attack

Compliance requires:

  • Third party conformity assessment prior to release

  • A new conformity assessment for each change

  • Post-market monitoring

  • Mandatory incident reporting

  • Traceability throughout the system's lifecycle

Government agencies, third party accreditation businesses and their subcontractors are required to have access to confidential intellectual property. Surveillance authorities shall be granted full access to the training, validation and testing datasets used.

Understanding ML models as a part of the broader product(s) in which they reside (rather than as independent products)

ML predictions are always exposed within the context of a broader user experience (interface), whether that experience is an application, report, or other UI/UX. This creates a challenge to manage the ML assets of a product in such a way that:

  • ML assets can be associated with a product so that consistent governance processes can be applied to the product & its ML assets

  • ML assets can be sold / transferred as a block of IP or re-used in other systems/applications

Furthering this challenge is that sometimes multiple ML models are used in a single user-facing feature, or a single ML model is used in multiple applications. Other challenges address the technical concerns (linking & releasing ML assets in tandem with software assets, etc).

Educating data science practitioners on the approaches & best practices used in product development while educating product development teams on the requirements of ML

Many best practices and approaches exist for delivering software products (Product/Customer Discovery, Lean Startups, Agile Development, Product Lifecycle Management, DevOps, Shift Left, etc). These approaches are well understood and ingrained in the culture of most product-led organizations. As ML matures to the point where the vast majority of business challenges can be addressed with "off the shelf" ML approaches, the use of ML will become a de facto standard part of every software application. Accordingly, data scientists will need to build ML models as participants in the software development processes where these approaches are applied (rather than in a separate process). The challenge is to educate data scientists on established product processes and to communicate the additional requirements that ML brings to asset management.

Technology Requirements

In this section, we capture specific technology requirements to enable progress against the challenges identified above:

Educating data science teams regarding the risks of trying to use Jupyter Notebooks in production

Jupyter Notebooks are the tool we use to educate Data Scientists, as they can easily be used to explore ad hoc ML problems incrementally on a laptop. Unfortunately, when all you have is a hammer, everything tends to look like a nail. We see Jupyter Notebooks featuring in production scenarios not because they are the best tool for the job, but because they are the tool we taught people to use, and because we didn't teach about any of the problems inherent in using that tool. This approach persists because of an ongoing gap in the tool chain.

  • Improved technology solutions are required that enable Data Scientists to easily run experiments at scale on elastic Cloud resources in a consistent, audited and repeatable way.

  • These solutions should integrate with existing Integrated Development Environments, Source Code Control and Quality Management tools.

  • Solutions should integrate with CI/CD environments so as to facilitate Data Scientists setting up multiple variations on a training approach and launching these as parallel, audited activities on Cloud infrastructure.

  • Working with the training of models and the consumption of trained models should be a seamless experience within a single toolchain.

  • Tooling should introduce new Data Scientists to the core concepts of software asset management, quality management and automated testing.

  • Tooling should enable appropriate governance processes around the release of ML assets into team environments.

Treat ML assets as first class citizens in DevOps processes

ML assets include but are not limited to training sets, training configurations (hyper-parameters), scripts and models. Some of these assets can be large; some are more like source code assets. We should be able to track changes to all of these assets and see how the changes relate to each other. Reporting on popular DevOps metrics like "Change Failure Rate" and "Mean cycle time" should be possible if this is done correctly. Whilst some of these assets are large, they are not without precedent in the DevOps world: people have been handling changes to database configurations and the copying and backup of data for some time, so the same should apply to all ML assets. The following sections explore how this may be implemented.

Providing mechanisms by which training sets, training scripts and service wrappers may all be versioned appropriately

Training sets are prepared data sets which are extracted from production data. They may contain PII or sensitive information, so making the data available to developers in a way analogous to source code may be problematic. Training scripts are smaller artefacts but are sometimes coupled to the training sets. All scripts should be versioned in a way that links them to the training sets they are associated with. Scripts and training sets are also coupled to metadata, such as instructions on how training sets are split up for testing and validation, so that a model can ideally be reproduced in a deterministic manner if required.
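
As one possible illustration (a sketch under assumed file layouts, not a prescription of any particular tool), the snippet below records content hashes of the training data, the split metadata and the training script for a single run, so that a resulting model can later be traced back to exactly the assets that produced it:

```python
import hashlib
import json
import pathlib
from datetime import datetime, timezone

def sha256_of(path: pathlib.Path) -> str:
    """Content hash of a file; large files are read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: str, script_path: str, split_spec: dict) -> dict:
    """Describe one training run: which data, which script, which split."""
    data_files = sorted(pathlib.Path(data_dir).glob("*.csv"))
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "training_script": {"path": script_path, "sha256": sha256_of(pathlib.Path(script_path))},
        "data_files": [{"path": str(p), "sha256": sha256_of(p)} for p in data_files],
        "split": split_spec,  # e.g. how train/test/validation were derived
    }

if __name__ == "__main__":
    # Paths are illustrative: point these at the project's real data and script.
    manifest = build_manifest(
        data_dir="data/",
        script_path="train.py",
        split_spec={"train": 0.8, "test": 0.1, "validation": 0.1, "seed": 42},
    )
    pathlib.Path("model_manifest.json").write_text(json.dumps(manifest, indent=2))
```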

Providing mechanisms by which changes to training sets, training scripts and service wrappers may all be auditable across their full lifecycle

It is an essential requirement that all changes to ML assets must result in a robust audit trail capable of meeting forensic standards of analysis. It must be possible to work backwards from a given deployment of assets at a specified time, tracing all changes to this set of assets and the originators of each change. Tooling supporting decision-making systems in applications working with sensitive personal data or life-threatening situations must support non-repudiation and immutable audit records. It should be anticipated that customers will be operating in environments subject to legal or regulatory compliance requirements that will vary by industry and jurisdiction that may require varying standards of audit transparency, including requirements for third party audit.

Treating training sets as managed assets under a MLOps workflow

One of the difficult challenges for MLOps tooling is to be able to treat data as a managed asset in an MLOps workflow. Typically, it should be expected that traditional source code control techniques are inapplicable to data assets, which may be very large and reside in environments that are not amenable to tracking changes in the same manner as source code. New meta-data techniques must be created to effectively record and manage aggregates of data that represent specific versions of training, testing or validation sets. Given the variety of data storage mechanisms in common usage, this will likely necessitate pluggable extensions to tooling.

It must be possible to define a specific set of data which can be introduced into MLOps tooling for a given training run to produce a model version, and to retrospectively inspect the set of data that was used to create a known model version in the past. Multiple model instances may share one data set, and subsequent iterations of a model may be required to be regression tested against specific data set instances.

It should be recognised that data sets are long-lived assets that may have compliance implications but which may also be required to be edited in response to data protection requests, invalidating all models based upon a set.

Managing the security of data in the MLOps process with particular focus upon the increased risk associated with aggregated data sets used for training or batch processing

The vast majority of MLOps use-cases can be expected to involve mission-critical applications, sensitive personal data and large aggregated data sets which all represent high value targets with high impact from a security breach. As a result, MLOps tooling must adopt a 'secure-by-design' position rather than assuming that customers will harden their deployments as part of their responsibilities. Solutions must not default to insecure configurations for convenience, nor should they provide user-facing options that invalidate system security as a side effect of adding functionality.

Implications of privacy, GDPR and the right to be forgotten upon training sets and deployed models

  • Tooling should provide mechanisms for auditing individual permissions to use source data as part of training sets, with the assumption that permission may be withdrawn at any time

  • Tooling should provide mechanisms to invalidate the status of deployed models where permissions to use have been revoked

  • Tooling may optionally provide mechanisms to automatically retrain and revalidate models on the basis of revoked permissions

  • Tooling should facilitate user-specific exceptions to model invocation rules where this is necessary to enable the right to be forgotten or to support the right to opt out of certain types of data tracking
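
To illustrate the invalidation requirement above, here is a minimal Python sketch; the record structures and names are hypothetical rather than taken from any existing tool:

```python
from dataclasses import dataclass

@dataclass
class TrainingSet:
    name: str
    subject_ids: set          # identifiers of individuals whose data is included

@dataclass
class ModelVersion:
    name: str
    trained_on: TrainingSet
    valid: bool = True

def withdraw_consent(subject_id: str, models: list) -> list:
    """Invalidate every deployed model whose training set contains the subject.
    A real system would also queue retraining on a scrubbed data set."""
    affected = [m for m in models if subject_id in m.trained_on.subject_ids and m.valid]
    for model in affected:
        model.valid = False
    return affected

if __name__ == "__main__":
    ts = TrainingSet(name="churn-2024-01", subject_ids={"u1", "u2", "u3"})
    deployed = [ModelVersion(name="churn-model-1.4.2", trained_on=ts)]
    print([m.name for m in withdraw_consent("u2", deployed)])  # -> ['churn-model-1.4.2']
```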

Methods for wrapping trained models as deployable services in scenarios where data scientists training the models may not be experienced software developers with a background in service-oriented design

Models that have been trained and which have passed acceptance testing need to be deployed as part of a broader application. This might take the form of elastically scalable Cloud services within a distributed web application, or an embedded code/data bundle within a mobile application or other physical device. In some cases, it may be expected that models need to be translated to an alternate form prior to deployment, perhaps as components of a dedicated FPGA or ASIC in a hardware solution.

  • MLOps tooling should integrate and simplify the deployment of models into customer applications, according to the architecture specified by the customer

  • MLOps tooling should not force a specific style of deployment for models, such as a dedicated, central 'model server'

  • Tooling should assume that model execution in Cloud environments must be able to scale elastically

  • Tooling should allow for careful management of the execution cost of models in Cloud environments, to mitigate the risk of unexpected proliferation of consumption of expensive compute resources

  • Tooling should provide mechanisms for out-of-the-box deployment of models against common architectures, with the assumption that customers may not be expert service developers

  • Tooling should provide automated governance processes to manage the release of models into production environments in a controlled manner

  • Tooling should provide the facility to upgrade and roll back deployed models across environments

  • It should be assumed that models represent reusable assets that may be deployed in the form of multiple instances at differing point release versions across many independent production environments

  • It should be assumed that more than one point release version of a model may be deployed concurrently in order to support phased upgrades of system functionality across customer environments
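
As a hedged example of such an out-of-the-box wrapper, the sketch below exposes a trained model behind a small HTTP service; FastAPI and joblib are assumed choices for illustration, and the model path and payload shape are placeholders:

```python
# A thin service wrapper around a trained model.
# Assumes a scikit-learn-style object with a .predict() method saved with joblib;
# swap the loading code for whatever serialisation your pipeline actually uses.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI(title="model-service", version="1.4.2")
model = joblib.load("model.joblib")        # illustrative path, deployed alongside the service

class PredictionRequest(BaseModel):
    features: list[float]

@app.get("/healthz")
def health() -> dict:
    """Liveness/readiness probe used by the deployment platform."""
    return {"status": "ok", "model_version": app.version}

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    prediction = model.predict([request.features])[0]
    return {"prediction": float(prediction), "model_version": app.version}

# Run with, for example: uvicorn service:app --host 0.0.0.0 --port 8080
```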

Approaches for enabling all Machine Learning frameworks to be used within the scope of MLOps, regardless of language or platform

MLOps is a methodology that must be applicable in all environments, using any programming language or framework. Implementations of MLOps tooling may be opinionated about the approach to the methodology but must be agnostic to the underlying technologies used to implement the models and services associated. It should be possible to use MLOps tooling to deploy solution components utilising different languages or frameworks using loosely-coupled principles to provide compatibility layers.

Approaches for enabling MLOps to support a broad range of target platforms, including but not limited to CPU, GPU, TPU, custom ASICs and neuromorphic silicon.

MLOps should be considered as a cross-compilation problem where the architectures of the source and target platforms may be different. In trivial cases, models may be trained on say CPU or GPU and deployed to execute on the same CPU or GPU architecture, however other scenarios already exist and should be expected to be increasingly likely in the future. This may include training on GPU / TPU and executing on CPU or, in edge devices, training on any architecture and then translating the models into physical logic that can be implemented at very low cost / size / power directly on FPGA or ASIC devices. This implies the need for architecture-independent intermediate formats to facilitate cross-deployment or cross-compilation onto target platforms.

Methods for ensuring efficient use of hardware in both training and operational scenarios

ML training and inferencing operations are typically very processing intensive operations that can expect to be accelerated by the use of dedicated hardware. Training is essentially an intermittent ad-hoc process that may run for hours-to-days to complete a single training run, demanding full utilisation of large scale compute resources for this period and then releasing that demand outside active training runs. Similarly, inferencing on a model may be processing intensive during a given execution. Elastic scaling of hardware resources will be essential for minimising cost of ownership however the dedicated nature of existing accelerator cards makes it currently hard to scale these elastically in today's Cloud infrastructure. Additionally, some accelerators provide the ability to subdivide processing resource into smaller allocation units to allow for efficient allocation of smaller work units within high capacity infrastructure. It will be necessary to extend the capabilities of existing Cloud platforms to permit more efficient utilisation of expensive compute resources whilst managing overall demand across multiple customers in a way that mitigates security and privacy concerns.

Approaches for applying MLOps to very large scale problems at petabyte scale and beyond

As of 2022, large ML data sets are considered to start at around 50TB, and very large data sets may derive from petabytes of source data, especially in visual applications such as autonomous vehicle control. At these scales, it becomes necessary to spread ML workloads across thousands of GPU instances in order to keep overall training times within acceptable elapsed time windows (less than a week per run).

Individual GPUs are currently able to process in the order of 1-10GB of data per second but only have around 40GB of local RAM. An individual server can be expected to have around 1TB of conventional RAM and around 15TB of local high speed storage as cache for around 8 GPUs, so may be able to transfer data between these and the compute units at high hundreds of GB/s, with upstream connections to network storage running at low hundreds of GB/s. Efficient workflows rely upon being able to reduce problems into smaller sub-units with constrained data requirements, or systems start to become I/O bound.

MLOps tooling for large problems must be able to efficiently decompose training and inferencing workloads into individual operations and data sets that can be effectively distributed as parallel activities across a homogeneous infrastructure with a supercomputing-style architecture. This can be expected to exceed the capabilities of conventional Cloud computing infrastructure and require dedicated hardware components and architecture, so any MLOps tooling must have appropriate awareness of the target architecture in order to optimise deployments.

At this scale, it is not feasible to create multiple physical copies of petabytes of training data due to storage capacity constraints and limitations of data transfer rates, so strategies for versioning sets of data with metadata against an incrementally growing pool will be necessary.

Providing appropriate pipeline tools to manage MLOps workflows transparently as part of existing DevOps solutions

Existing projects using DevOps practices will typically have automated pipelines and delivery. Ideally, MLOps solutions would extend this rather than replace it. In some cases it may be necessary for an ML model (for example) to have its own tooling and pipeline, but that should be the exception, as there is typically a non-trivial amount of source code that goes along with the training and the model, for scripts and endpoints, as covered previously.

Testing ML assets appropriately

ML assets should be considered at least at the same level as traditional source code in terms of testing (unit, integration, end to end, acceptance etc). Metrics like coverage may still apply to scripts and service endpoints, if not to the model itself (as it isn't derived from source code). Further to this, models, once deployed, are typically used in decision-making capacities where the stakes are higher, or where there are potential governance, compliance or bias risks. This implies that testing will need to cover far more than source code, and must actively show, in a way suitable to a variety of stakeholders, how the model was tested (for example, whether socioeconomic bias testing was included). The testing involved should be presented in an accessible fashion so that it is not only available to developers for audit.
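
As one hedged illustration of extending testing beyond conventional coverage, the sketch below expresses a simple group-fairness check as an ordinary test so it can run in the same CI pipeline; the model stub, data loader and tolerance are placeholders for a project's real assets:

```python
# test_fairness.py -- illustrative: asserts that accuracy does not differ
# between demographic groups by more than an agreed tolerance.
TOLERANCE = 0.05  # maximum acceptable accuracy gap between groups (assumed policy)

def load_validation_data():
    """Placeholder for the project's real validation-set loader.
    Returns rows of (features, label, group)."""
    return [
        ([0.2, 0.7], 1, "group_a"),
        ([0.9, 0.1], 0, "group_a"),
        ([0.3, 0.8], 1, "group_b"),
        ([0.8, 0.2], 0, "group_b"),
    ]

def predict(features):
    """Placeholder for invoking the trained model under test."""
    return 1 if features[1] > 0.5 else 0

def accuracy_by_group(rows):
    stats = {}
    for features, label, group in rows:
        correct, total = stats.get(group, (0, 0))
        stats[group] = (correct + (predict(features) == label), total + 1)
    return {group: correct / total for group, (correct, total) in stats.items()}

def test_accuracy_gap_between_groups_is_small():
    per_group = accuracy_by_group(load_validation_data())
    assert max(per_group.values()) - min(per_group.values()) <= TOLERANCE
```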

Governance processes for managing the release cycle of MLOps assets, including Responsible AI principles

MLOps as a process extends governance requirements into areas beyond those typically considered as part of conventional DevOps practices. It is necessary to be able to extend auditing and traceability of MLOps assets all the way back into the data that is chosen for the purposes of training models in the first instance.

MLOps tooling will need to provide mechanisms for managing the release of new training sets for the purposes of training, with the consideration that many data scientists may be working on a given model, and more than one model may be trained against a specific instance of a training data set. Customers must have the capability to retain a history of prior versions of training data and be able to recreate specific environments as the basis of new avenues of investigation, remedial work or root cause analysis, potentially under forensic conditions.

The training process is predicated upon the idea of setting predefined success criteria for a given training session. Tooling should make it easy for data science teams to clearly and expressively declare success criteria against which an automated training execution will be evaluated, such that no manual intervention is required to determine when a particular training run meets the standard required for promotion to a staging environment.

Tooling should also offer the ability to manage competitive training of model variants against each other as part of automated batch training activities. This could involve predefined sets of hyper-parameters to test in parallel training executions, use of smart hyper-parameter tuning libraries as part of training scripts, or evolutionary approaches to iteratively creating model variants.

It should be possible to promote preferred candidate models into staging environments for integration and acceptance testing using a defined set of automated governance criteria and providing a full audit trail that can be retained across the lifetime of the ML asset being created. Tooling should permit the selective promotion of specific model versions into target production environments with the assumption that customers may need to manage multiple live versions of any given model in production for multiple client environments. Again, this requires a persistent audit trail for each deployment.

It should be assumed that the decision-making nature of ML-based products will require that any incident or defect in a production environment may result in the need for a formal investigation or root-cause analysis, potentially as part of a compliance audit or litigation case. Tooling should facilitate the ability to easily walk backwards through audit trail pathways to establish the full state of all assets associated with a given deployment and all governance decisions associated. This should be implemented in such a way as to be easily initiated by non-technical staff and provide best efforts at non-repudiation and tamper protection of any potential evidence.

Models themselves can be expected to be required to be constructed with a level of conformance to observability, interpretability and explainability standards, which may be defined in legislation in some cases. This is a fundamentally hard problem and one which will have a significant impact upon the practical viability of deploying ML solutions in some fields. Typically, these concerns have an inverse relationship with security and privacy requirements, so it is important that tooling considers the balance of these trade-offs when providing capabilities in these areas.
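
To make the declarative success-criteria idea above concrete, here is a minimal Python sketch of an automated promotion gate; the metric names and thresholds are illustrative assumptions, not values taken from the roadmap:

```python
# Declarative promotion gate: a training run is promoted to staging only if
# every declared criterion holds. Criteria are data, so they can be versioned
# and audited alongside the model.
PROMOTION_CRITERIA = {
    "accuracy": (">=", 0.92),
    "false_positive_rate": ("<=", 0.03),
    "max_group_accuracy_gap": ("<=", 0.05),   # fairness guard
}

COMPARATORS = {
    ">=": lambda value, bound: value >= bound,
    "<=": lambda value, bound: value <= bound,
}

def evaluate_promotion(metrics: dict, criteria: dict = PROMOTION_CRITERIA):
    """Return (promote?, reasons) so the decision itself can be logged for audit."""
    failures = []
    for name, (op, bound) in criteria.items():
        value = metrics.get(name)
        if value is None or not COMPARATORS[op](value, bound):
            failures.append(f"{name}={value} violates {op} {bound}")
    return (not failures, failures)

if __name__ == "__main__":
    run_metrics = {"accuracy": 0.94, "false_positive_rate": 0.05, "max_group_accuracy_gap": 0.02}
    promote, reasons = evaluate_promotion(run_metrics)
    print("promote to staging" if promote else f"rejected: {reasons}")
```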

Management of shared dependencies between training and operational phases

Appropriate separation of concerns should be facilitated in the implementation of training scripts so that aspects associated with the preprocessing of data are handled independently from the core training activities. Tooling should provide clean mechanisms for efficiently and safely deploying code associated with preprocessing activities to both training and service deployment environments. It is critical that preprocessing algorithms are kept consistent between these environments at all times and a change in one must trigger a change in the other or raise an alert status capable of blocking an accidental release of mismatched libraries. It should be considered that the target environment for deployment may differ from that of training so it may be necessary to support implementations of preprocessing functions in different languages or architectures. Under such circumstances, tooling should provide robust mechanisms for ensuring an ongoing link between the implementations of these algorithms, preferably with a single, unified form of testing to prevent divergence.
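
One way to keep the two sides consistent, sketched here under the assumption that both the training pipeline and the service wrapper can import a single shared Python module (module, function and version names are illustrative):

```python
# preprocessing.py -- the single source of truth, imported by BOTH the training
# pipeline and the serving wrapper so the two cannot silently diverge.
PREPROCESSING_VERSION = "2.1.0"   # bumped on any behavioural change

def preprocess(record: dict) -> list:
    """Turn a raw input record into the feature vector the model expects."""
    return [
        float(record.get("age", 0)) / 100.0,            # simple scaling
        1.0 if record.get("country") == "VN" else 0.0,  # illustrative one-hot feature
    ]

# Both train.py and service.py would do:
#   from preprocessing import preprocess, PREPROCESSING_VERSION
# The model artefact records PREPROCESSING_VERSION at training time, so the
# service can refuse to start if its imported version does not match the one
# stored with the model it is loading.
def check_compatibility(model_metadata: dict) -> None:
    expected = model_metadata.get("preprocessing_version")
    if expected != PREPROCESSING_VERSION:
        raise RuntimeError(
            f"Preprocessing mismatch: model built with {expected}, "
            f"service has {PREPROCESSING_VERSION}"
        )

if __name__ == "__main__":
    print(preprocess({"age": 42, "country": "VN"}))       # -> [0.42, 1.0]
    check_compatibility({"preprocessing_version": "2.1.0"})
```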

Abstraction layer for models

It is unsafe to assume that models will be deployed into environments that closely match those in which the model was originally trained. As a result, the use of object serialisation for model description is only viable in a narrow range of potential use cases. Typically, we need to be able to use a platform independent intermediate format to describe models so that this can act as a normalised form for data exchange between training and operational environments. This form must be machine readable and structured in such a way that it is extensible to support a wide range of evolving ML techniques and can easily have new marshalling and unmarshalling components added as new source and target environments are adopted. The normalised form should be structured such that the training environment has no need to know about the target operational environment in advance.
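
ONNX is one widely used example of such an intermediate format. The hedged sketch below exports a small PyTorch model and reloads it with ONNX Runtime, so the serving side needs no knowledge of the training stack (it assumes the torch and onnxruntime packages are available; the model itself is a toy):

```python
# Export a trained PyTorch model to a platform-independent ONNX file, then
# execute it with ONNX Runtime -- the operational side needs no PyTorch at all.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
model.eval()

example_input = torch.randn(1, 4)
torch.onnx.export(model, example_input, "model.onnx",
                  input_names=["features"], output_names=["score"])

# Operational side: load the intermediate format on whatever runtime/hardware is targeted.
session = ort.InferenceSession("model.onnx")
score = session.run(["score"], {"features": np.random.randn(1, 4).astype(np.float32)})
print(score)
```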

Longevity of ML assets

With traditional software assets, such as binaries, there is some expectation of backwards compatibility (in the case of, for example, Java, this compatibility of binaries has spanned decades). ML Assets such as binaries will need to have some reasonable backwards compatibility. Versioning of artefacts for serving the model are also important (but typically that is a better known challenge). In the case of backwards breaking changes, ML assets such as training scripts, tests and data sets will need to be accessed to reproduce a model for the new target runtime.

Managing and tracking trade-offs

ML solutions always involve trade-offs between different compromises. A typical example might be the trade-off between model accuracy, model explainability and data privacy. Viewed as the points of a triangle, we can select for a model that fits some point within the triangle where its distance from each vertex represents proximity to that ideal case. Since we train models by discovery, we can only make changes to our training data sets and hyper-parameters and then evaluate the properties of any resulting model by testing against these desired properties. Because of this, it is important that MLOps tooling provides capabilities to automate much of the heavy lifting associated with managing trade-offs in general. This might take the form of automated training of variations upon a basic model which are subsequently tested against a panel of selection criteria, evaluated and presented to customers in such a way as to make their relevant properties easily interpretable.

Escalation of data categories

To obtain an accurate model, or to prevent the production of models with undesirable biases, it may be necessary to store data of a very sensitive nature or in legally protected categories. This data may be used to vet a model pre-release, or for training; in either case it will likely be persisted along with training scripts. This will require strong data protections (as with any database of a sensitive nature) and auditability of access. MLOps systems are likely to be targets of attack to obtain this data, meaning stronger protections than those required for source code alone will be needed. It is anticipated that regulatory requirements intended to reduce the impact of bias or fairness issues will have unintended consequences relating to the sensitivity of other data that must be collected to fulfil these requirements, creating additional privacy risks.

Intrinsic protection of models

Model inferencing will have to embrace modern application security techniques to protect the model against these kinds of attacks. Inferencing might be protected through restriction of access (tokens), rate-limiting, and monitoring of incoming traffic. In addition, as part of the integration test phase, there is a requirement to test the model against adversarial attacks (both common attacks and domain-specific attacks) in a sandboxed environment. It must also be recognised that Python, whilst convenient as a language for expressing ML concepts, is an interpreted scripting language that is intrinsically insecure in production environments, since any ad-hoc Python source that can be injected into a Python environment can be executed without constraint, even if shell access is disabled. The long term use of Python to build mission-critical ML models should be discouraged in favour of more secure-by-design options.
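
As an illustration only (not a complete security control), the sketch below places token checking and a simple per-client rate limit in front of model inference; real deployments would more likely delegate this to an API gateway, and the token store shown is a placeholder:

```python
import time
from collections import defaultdict, deque

VALID_TOKENS = {"example-client-token"}   # illustrative; use a real secret store
MAX_REQUESTS_PER_MINUTE = 60              # throttles black-box probing of the model

_recent_calls = defaultdict(deque)        # token -> timestamps of recent requests

def authorise(token: str) -> None:
    if token not in VALID_TOKENS:
        raise PermissionError("unknown client token")
    window = _recent_calls[token]
    now = time.time()
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("rate limit exceeded -- possible model extraction attempt")
    window.append(now)

def guarded_predict(token: str, features: list) -> float:
    authorise(token)                       # checks run before the model is touched
    return sum(features)                   # stand-in for the real model call

if __name__ == "__main__":
    print(guarded_predict("example-client-token", [0.1, 0.2]))
```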

Emergency cut out

As a model may need to be cut out abruptly, this may need to be done at the service wrapper level as an emergency measure. Rolling back to a previous version of the model and its service wrapper is desirable, but only if it is fast enough for safety reasons. At the very least, the ability to deny service within minutes in cases of misbehaviour is required. This cut-out needs to be human-triggered at least, and possibly triggered via a live health check in the service wrapper. It is common in traditional service deployments to have health and liveness checks; a similar mechanism applies to deployed models, where "health" includes acceptable behaviour.
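
A hedged sketch of a service-wrapper-level cut-out: an operator or an automated behaviour monitor can disable serving immediately, without redeploying, provided the wrapper consults the switch on every request. The file-based flag is purely illustrative; a shared store would be used in practice:

```python
import pathlib

KILL_SWITCH = pathlib.Path("model_disabled.flag")   # illustrative; use a shared store in practice

def cut_out(reason: str) -> None:
    """Operator- or monitor-triggered: disable the model immediately."""
    KILL_SWITCH.write_text(reason)

def restore() -> None:
    KILL_SWITCH.unlink(missing_ok=True)

def predict(features: list) -> float:
    if KILL_SWITCH.exists():
        # Deny service rather than return potentially harmful predictions.
        raise RuntimeError(f"model disabled: {KILL_SWITCH.read_text()}")
    return sum(features)                             # stand-in for the real model call

if __name__ == "__main__":
    print(predict([0.1, 0.2]))
    cut_out("inappropriate responses observed in production")
    try:
        predict([0.1, 0.2])
    except RuntimeError as err:
        print(err)
    restore()
```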

Online learning

Self-learning ML techniques require that models are trained live in production environments, often continuously. This places the behaviour of such models outside the governance constraints of current MLOps platforms and introduces potentially unconstrained risk that must be managed in end user code. Further consideration must be given to identifying ways in which MLOps capabilities can be extended into this space in order to provide easier access to best known methods for mitigating the risk of degrading quality.

Prioritising training activities

The requirement to prioritise training activities implies the need for prioritisation of pipeline tasks within MLOps tooling, along with management tools to support the maintenance of these priorities.
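
A minimal sketch of such prioritisation using a standard priority queue, so that, for example, a security-fix retraining can jump ahead of long-running exploratory jobs (job names and priority values are illustrative):

```python
import heapq
import itertools

_counter = itertools.count()   # tie-breaker: equal priorities run in submission order
_queue = []

def submit(job_name: str, priority: int) -> None:
    """Lower number = more urgent. Security/bug-fix retraining should outrank research runs."""
    heapq.heappush(_queue, (priority, next(_counter), job_name))

def next_job() -> str:
    _, _, job_name = heapq.heappop(_queue)
    return job_name

if __name__ == "__main__":
    submit("exploratory-hyperparameter-sweep", priority=5)
    submit("nightly-retrain", priority=3)
    submit("security-fix-retrain", priority=0)
    while _queue:
        print(next_job())
    # -> security-fix-retrain, nightly-retrain, exploratory-hyperparameter-sweep
```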

Guardrail metrics

It is necessary that solutions support automated protection against predictable risks. For example, a runaway training process may demand many costly instances of training hardware, or extended processing times that tend towards months or years. Small changes in training data may introduce significant regressions in patch releases of models that would be harmful in production. Solutions should provide the ability to specify guardrail metrics and facilitate the controlled interruption of processing in the event of these guards being triggered.
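
A hedged sketch of guardrail checks evaluated periodically during training: if a guard trips (time budget, estimated cost, or a quality regression against the previously released model), the run is interrupted rather than allowed to continue. The limits shown are illustrative:

```python
import time

GUARDRAILS = {
    "max_elapsed_seconds": 6 * 3600,      # abort runs longer than 6 hours
    "max_estimated_cost_usd": 500.0,      # abort runaway hardware spend
    "min_accuracy_vs_previous": -0.02,    # abort if worse than the last release by >2 points
}

class GuardrailTriggered(Exception):
    pass

def check_guardrails(started_at: float, estimated_cost: float,
                     accuracy: float, previous_accuracy: float) -> None:
    elapsed = time.time() - started_at
    if elapsed > GUARDRAILS["max_elapsed_seconds"]:
        raise GuardrailTriggered(f"training exceeded time budget ({elapsed:.0f}s)")
    if estimated_cost > GUARDRAILS["max_estimated_cost_usd"]:
        raise GuardrailTriggered(f"estimated cost ${estimated_cost:.2f} over budget")
    if accuracy - previous_accuracy < GUARDRAILS["min_accuracy_vs_previous"]:
        raise GuardrailTriggered("candidate regresses too far below the released model")

if __name__ == "__main__":
    started = time.time()
    # Called periodically from the training loop / pipeline controller:
    check_guardrails(started, estimated_cost=120.0, accuracy=0.91, previous_accuracy=0.92)
    print("all guardrails passed")
```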

Government regulation of AI

Proposed regulation creates an urgent requirement for the implementation of many of the technology requirements outlined in the roadmap as existing approaches will cease to be viable upon introduction. This in turn also introduces the need to provide traceable reporting to third parties whilst protecting intellectual property as part of the daily function of regulatory compliance.

Understanding ML models as a part of the broader product(s) in which they reside (rather than as independent products)

Tooling should enable management of ML assets as reusable components within a larger system/application (similar to dependency management for software). Tooling should enable explicit dependency management between:

  • Models that depend upon other models

  • Applications that depend upon models

The tooling should enable operational changes to an ML asset in a way that downstream impacts to the dependent applications can be managed.

Educating data science practitioners on the approaches & best practices used in product development while educating product development teams on the requirements of ML

Existing educational material that discusses product development should be adapted so that it is easily understood by, and resonates with, data scientists. This should complement a technology-based solution that helps data scientists understand how product teams currently function so they can fit into these processes. The technology solution should enable a data scientist to build ML models as a direct participant in the existing product design and development processes. While detailed in other challenges, these tools should enable ML assets to be treated in the same way as conventional software assets.

Potential Solutions

This section addresses a timeline of potential solutions to assist in understanding the relative maturity of each area of concern and to highlight where significant gaps exist:


Cross-Cutting Concerns

The following areas represent aspects of MLOps that overlap with issues covered by other SIGs and indicate where we must act to ensure alignment of work on these elements:

The following cross-cutting concerns are identified:

  • CI/CD tooling across DevOps and MLOps scenarios

  • Pipeline management

  • Common testing aspects such as code coverage, code quality, licence tracking, CVE checking etc.

  • Data quality

  • Security and Supply Chain management

  • Legislative compliance

  • SBOMs

Conclusions and Recommendations

Progress in the MLOps space has been slow, with many ML products continuing down the path of implementing isolated, run-time stores, manually deploying ML assets into live production environments without automated governance processes. Meanwhile, pressure for sweeping government regulation in the field of AI grows across multiple jurisdictions. There is increasing concern that it will not be possible to meet the demands of imminent legislation with the tools and approaches being used today and there is an urgent need for the widespread adoption of more formal MLOps methods.

Concerns have been raised that some of the proposed legislation would negatively impact Open Source projects, making it cost-prohibitive to create open, ML-based solutions in some regions, and exposing contributors to potential risk of prosecution, if their projects are used in regions where this legislation is in force.

Another emerging requirement is the need for products to provide a Software Bill of Materials (SBOM) for audit and compliance purposes. As this becomes a legislative requirement in some markets, it will become increasingly important to include ML assets within the BOM for a given product.

A meta-study of 769 Machine Learning papers, published in 2021 by Benjamin, et al (https://arxiv.org/pdf/2106.15195.pdf), showed multiple failings of scientific rigour in validating the results of many projects. It is clear that a more mature process and availability of suitable tooling is essential for the successful management of machine learning products.

Of similar concern is the ongoing lack of understanding of the issues associated with the use of Jupyter Notebooks in building ML assets. A large scale study by Pimentel et al (http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf) looked at 1.4 million Jupyter Notebooks from GitHub. They found that only 24% could be executed without errors and that a mere 4% produced repeatable results, with only 1.5% having any form of testing implemented. As a result, it is to be expected that the mortality rate of ML products will continue to be excessively high due to the high likelihood of multiple failures of reproducibility of core assets in production.

Market forces are inexorably moving us to an imminent need for MLOps tooling that is Product-focused, rather than ML-focused.

Glossary

Short definitions of terms used that aren't explained inline.

  • Training: act of combining data with (hyper) parameters to yield a model

  • Model: A deployable (usually binary) artefact that is the result of training. It can be used at runtime to make predictions, for example

  • Hyper-parameter: a parameter that controls the training process of a model, typically set by a data scientist/human

  • Parameter: a variable internal to the model whose value is learned during training (usually part of a model)

  • Endpoint: typically a model is deployed to an endpoint, which may be an HTTPS service that serves up predictions (models may also be deployed to devices and other places)

  • Training Pipeline: all the steps needed to prepare a model

  • Training set: a set of example data used for training a model
