Accelerating your machine learning pipeline with an ML platform team

Manasi Vartak is founder and CEO of Verta, a provider of operational AI and ML model management solutions based in Palo Alto.

Organizations that are expanding their use of artificial intelligence/machine learning (AI/ML) often hit a tipping point where they need a centralized ML platform team to support their ML operations.

Perhaps your data scientists are being paged at odd hours because something went wrong with a model in production, or they are spending their time supporting tools instead of building and training models. Perhaps the number of smart products or features being shipped has fallen below an acceptable level for the size of your data science team, or perhaps you are scaling up real-time use cases that will need 24/7 support.

An ML platform team can help address all of these challenges.

The Role of the ML Platform Team

The ML platform team assembles, updates and supports the tools and processes for producing ML models. That includes:

• Executing the selection process for the tools that make up the ML platform, including gathering requirements from stakeholders such as data scientists, data engineers, IT, production engineers and the governance/risk team.

• Building Jenkins wrappers or pipelines to complement vendor solutions, which is often necessary given the fragmented ML tooling landscape.

• Establishing and updating consistent processes around putting models into production, including deploying models to the test environment, promoting models to production and making changes in production (see the sketch after this list).

• Providing production support for the platform to ensure high availability and reliability, making sure the tools that make up the platform work as needed, and taking the lead when issues arise.
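To make the promotion-process bullet more concrete, here is a minimal, hypothetical Python sketch of the kind of gate a CI stage (a Jenkins job, for instance) might run before promoting a model from the test environment to production. The model names, metrics and thresholds are illustrative assumptions, not part of any specific vendor platform.

    # Hypothetical promotion gate a CI job might run before pushing a model
    # from the test environment to production. All names and thresholds are
    # illustrative, not tied to any specific tool.
    from dataclasses import dataclass

    @dataclass
    class EvalReport:
        model_name: str
        version: str
        accuracy: float        # offline accuracy on a held-out test set
        p95_latency_ms: float  # serving latency measured in the test environment

    def can_promote(candidate: EvalReport, baseline: EvalReport,
                    max_p95_latency_ms: float = 200.0) -> bool:
        """Return True only if the candidate beats the current production
        baseline on accuracy without violating the latency budget."""
        if candidate.accuracy < baseline.accuracy:
            print(f"Blocked: accuracy {candidate.accuracy:.3f} below baseline {baseline.accuracy:.3f}")
            return False
        if candidate.p95_latency_ms > max_p95_latency_ms:
            print(f"Blocked: p95 latency {candidate.p95_latency_ms:.0f} ms over budget")
            return False
        return True

    if __name__ == "__main__":
        baseline = EvalReport("churn-model", "v12", accuracy=0.91, p95_latency_ms=140.0)
        candidate = EvalReport("churn-model", "v13", accuracy=0.93, p95_latency_ms=155.0)
        if can_promote(candidate, baseline):
            print(f"Promoting {candidate.model_name} {candidate.version} to production")
        # In a real pipeline, this step would then call the deployment tooling;
        # here we only print the decision.

The point of such a gate is not the specific checks, which vary by organization, but that the ML platform team owns and standardizes them so every model follows the same path to production.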

As an organization grows and matures in ML capabilities over time, the ML platform team can separate into distinct subgroups for engineering and platform support. The former brings together new tools, optimizes them, and maintains the overall integrity of the platform. The latter is on call to support production models.

What to Look for in ML Platform Engineers

The main challenge in building an ML platform team is hiring. “Unicorn” is an accurate description of the ideal ML platform engineer! Here are some considerations to keep in mind.

• ML platform engineers need competence in software, DevOps, ML, and data.

• The skill set of a data scientist differs substantially from what is needed for an ML platform engineer, so it is not a natural career path for data scientists.

• If a candidate isn’t curious to learn about data science or ML or doesn’t like working with diverse stakeholders, this will not be a good job fit.

Where to Look for ML Platform Engineers

Potential candidates for this role could come from the ranks of solution architects, who are accustomed to putting together multiple components and are capable of working in the diverse ML landscape. What they often lack is experience running software in production.

This can also be a lucrative path for someone in DevOps to broaden their skills and increase their pay. DevOps engineers looking to advance should familiarize themselves with MLOps and explore ML platform engineer positions.

Finally, some people come to this role with a background in big data, with “big data” being a catch-all term for anyone who has worked with Spark or Hadoop.

Note that “ML engineer” and “ML platform engineer” are not the same role. While the ML platform engineer maintains the platform, the ML engineer works on optimizing the models from a software engineering point of view. The ML engineer will not be involved in building the tools that make up the platform.

Getting Started

With the right set of tools, you can start small and scale. A small company, or a larger organization with a small team of data scientists, might start with one or two individual contributors. As you grow your data science team, the ML platform team should remain at roughly 10% of the number of data scientists. At some point, you hit a steady state where the team can provide all the support you need without adding staff, as long as you have the right tools.

Organizationally, at midsized companies and startups, the ML platform team usually falls under central engineering. At large companies, it is typically aligned more closely with IT. In my view, the team should sit close to the infrastructure, spanning both IT and data science.

In general, however, it doesn’t matter where the ML platform team is located. What’s important is that they have strong connections with IT and data science – the key stakeholders they’ll collaborate with on a regular basis.

The ROI on an ML Platform Team

It is important to define clear metrics to evaluate the performance of the ML platform team. These must include the following (a rough sketch of how such metrics might be tracked appears after the list):

• Service-level agreements (SLAs) related to the performance of the platform itself (e.g., uptime and the number of outage incidents).

• Metrics related to data science (for example, number of models shipped in a quarter and model velocity).

• The number of ML-related security incidents that have occurred.

• A subjective customer success metric based on data scientists’ satisfaction with using the platform.
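As a rough illustration of how the first two metrics might be tracked, here is a short Python sketch that computes quarterly uptime and model velocity from simple incident and deployment records. The data structures and numbers are assumptions for illustration; in practice these would be pulled from monitoring and CI/CD systems.

    # Rough sketch of computing two ROI metrics from simple logs.
    # The records below are hypothetical examples, not real data.

    # (start_day, end_day) of each platform outage, as day-of-quarter indices.
    outages = [(10, 10), (45, 46)]   # two incidents, three days of degraded service
    quarter_days = 91

    downtime_days = sum(end - start + 1 for start, end in outages)
    uptime_pct = 100.0 * (quarter_days - downtime_days) / quarter_days

    # One entry per model shipped to production this quarter, with days from
    # first commit to production deploy (a simple proxy for model velocity).
    models_shipped = {"churn-model v13": 21, "ranking-model v4": 35, "fraud-model v2": 28}

    print(f"Platform uptime: {uptime_pct:.1f}% ({len(outages)} outage incidents)")
    print(f"Models shipped this quarter: {len(models_shipped)}")
    print(f"Median days to production: {sorted(models_shipped.values())[len(models_shipped) // 2]}")

However the numbers are gathered, the goal is the same: a small, stable set of metrics that ties the platform team's work to both reliability and data science output.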

Generally, ML platform teams deliver “speed, safely.” That is, they ensure that data scientists can deploy models into production efficiently, with checks and balances that reduce risk and provide proper governance. The ML platform team also provides L0 and L1 support for models in production, shielding data science teams from the day-to-day vagaries of major production systems.

Without an ML platform team, consider how much of your data science team’s work isn’t seeing the light of day, or how much time data scientists are spending wrangling tools. An ML platform team helps increase the productivity of the data science team and the value they and their models provide.

In conclusion, if your data scientists are responding to model issues at odd hours or spending their work cycles on tool support, you’re probably ready to set up a centralized ML platform team. This is especially true for organizations that are scaling up real-time use cases and need to keep models up and running in production like any other business-critical software in the enterprise.

