
Friday, 14 June 2024

Part 3: An AI Lakehouse can be a forcing function for MLOps

In the first and second part of this article series, we explored the paradox that organizations face when considering the adoption of AI systems. On the one hand, AI systems offer significant potential benefits, such as real-time operational input and improved decision-making. On the other hand, organizations are confronted with significant challenges, primarily related to the perceived complexity of AI systems.

Part 1 of the series identified three main categories of complexity: ecosystem integration complexity, engineering complexity, and operational complexity. Ecosystem integration complexity arises from the need to integrate AI systems with various input and output systems, which can involve multiple data sources, targets, cadences, and types. Engineering complexity stems from the extensive pre-processing, efficiency requirements, and diverse frameworks and languages used in AI systems. Operational complexity involves ensuring the availability and uptime of AI systems, maintaining and evolving them, and implementing and ensuring data governance.

Part 2 of the series introduced MLOps as a solution to the paradox of AI complexity. MLOps is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production.

In Part 3, we will delve deeper into the benefits of MLOps and explore how it can help organizations overcome the challenges of AI complexity. MLOps has implications for the people and processes used to build AI systems, and in this third article we will discuss how we can maximize the chances that such a system is successfully implemented. We will consider the key components of an MLOps platform, articulate how to use these components as forcing functions, and provide practical tips and best practices for implementing MLOps in organizations.

At Hopsworks, we have worked with many clients who have successfully implemented and used an MLOps system - and they have done so using the Hopsworks AI Lakehouse.

The Hopsworks AI Lakehouse acts as a centralized repository around which all the different process steps in a machine learning system can be built. This will then help people, and force them to some degree, to adhere to the principles of MLOps by leveraging the AI Lakehouse:
  • It brings together and integrates all the moving parts of an ML system, so that individual engineers do not need to duct-tape a system together on their own.
  • It integrates online and offline machine learning data into one coherent metadata-based system.
  • It provides easy, API-based access to the right data needed for the ML processes to take place. For example, it ensures point-in-time-correct joins that prevent leakage of future data into the predictive model that we want to use.
  • It provides automated deployment procedures that allow DevOps best practices to be adopted by machine learning experts.
  • It guarantees reproducibility and auditability, which is (or will be) required by internal or external AI rules and regulations.
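To make the point-in-time-correct join mentioned above concrete, here is a minimal sketch using pandas. The data and column names are invented for illustration; a feature store does this for you at scale, but the principle is simply: for every training label, only use feature values that were known at or before the label's event time.

```python
import pandas as pd

# Label events: for each prediction target we know when it occurred.
labels = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-01-10", "2024-02-10", "2024-01-15"]),
    "churned": [0, 1, 0],
})

# Feature values, each valid from the moment it was computed.
features = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(["2024-01-01", "2024-02-01",
                                    "2024-01-01", "2024-02-01"]),
    "avg_spend": [100.0, 120.0, 50.0, 55.0],
})

# Point-in-time-correct join: for every label, take the most recent
# feature value computed at or before the label's event time - never after.
labels = labels.sort_values("event_time")
features = features.sort_values("feature_time")
training_df = pd.merge_asof(
    labels, features,
    left_on="event_time", right_on="feature_time",
    by="customer_id", direction="backward",
)
```

A naive equi-join on `customer_id` would happily attach February's feature values to January's labels - exactly the future-data leakage this construction prevents.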

The core idea of the AI Lakehouse, therefore, becomes that of using centralized data as the forcing function for implementing all the required tools, processes and team interactions that we need to make AI a much more predictable, productive, and efficient endeavour.

Just to make sure we all understand: a forcing function is a factor or event that compels or influences a change or action. It is often used in project management, strategy development, and organizational change to create a sense of urgency and drive progress. The goal is to encourage individuals or teams to take action by creating a situation where they feel compelled to do so. Forcing functions can be internal or external. Internal forcing functions are typically related to the organization's goals, objectives, or values. External forcing functions are often driven by market conditions, technological advancements, or regulatory changes. Forcing functions can be powerful tools for driving change and innovation. By creating a sense of urgency and compelling action, they can help organizations overcome inertia and achieve their goals more quickly.

And that is exactly what the AI Lakehouse can provide in the context of the AI Paradox and MLOps as a solution strategy to that paradox: all different stakeholders will be organizing their processes around the AI Lakehouse, offering a central point of control for MLOps best practices to be implemented. This makes MLOps a reality, and allows us to use the power of these principles to resolve the AI paradox that we addressed in the first part of these three articles. Effectively, AI Lakehouses make AI systems easier to implement, faster to deploy, more economically viable and structurally compliant with your organizational context’s regulations. That is a pretty great thing!

Thank you for taking the time to read these three articles - I hope they will allow you to shape your thinking around AI projects.

Thursday, 13 June 2024

Part 2 of 3: MLOps is a solution to the paradox of AI


In case you missed it: please find part 1 of this article series over here.

In Part 1 of this article series, we highlighted the existing paradox in the AI industry, where organizations recognize the massive potential benefits of AI systems like ChatGPT but face significant challenges due to perceived complexities. These complexities include
  • ecosystem integration complexities involving multiple data sources, targets, cadences, and types;
  • engineering complexities such as pre-processing, efficiency, and framework choices; and
  • operational complexities related to availability, maintenance, and data governance.
To address this paradox, organizations must strike a balance between leveraging the benefits of AI while managing its complexities effectively, with successful companies likely to gain a competitive advantage. In this second part of the article series, I would like to propose a particular solution to this balancing act. This solution is called MLOps.

About MLOps

MLOps (Machine Learning Operations) is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. It encompasses the entire process of developing, deploying, and maintaining ML models, from data collection and preparation to model training, testing, deployment, and monitoring. The goal of MLOps is to ensure that ML models are reliable, scalable, and perform as expected in production environments. It also aims to streamline the ML development process, making it more efficient and reproducible.

MLOps is becoming increasingly important as organizations adopt ML systems at scale, and borrows many of its ideas and concepts from the world of software engineering, often referred to as DevOps (Development and Operations). By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production. MLOps can help us productionize AI systems more efficiently, by
  • automating the different steps in the process;
  • aligning the different teams that are responsible for these different steps.
Automating the processes and aligning the teams around these processes is what is going to allow organizations to seize the value that is being promised by AI, while at the same time managing and potentially even reducing the complexity of these systems. MLOps is the antidote to the potential poisoning of AI technology with unwieldy complexity.

So to summarize: the thesis of this second part of the article series is that a solution to the paradox we are currently seeing in the AI industry is possible, and that this solution presents itself as a series of technological tools, processes and team-alignment best practices. The question therefore becomes how we can easily and efficiently implement the practices of MLOps. This is unlikely to be a trivial task, as we will be touching tools, processes and people during the implementation of MLOps in our organization. This, therefore, is what we will discuss in the third and final part of this article series.

AI Lakehouse as a Forcing Function for Production AI systems (intro and part 1/3)



At Hopsworks, we have been developing awesome technologies that make it possible to develop powerful #AI systems efficiently and effectively, in a way that also safeguards the potentially privacy-sensitive data that we expose to them. In this 3-part series of articles, I would like to articulate and summarize the reasons why these technologies are of interest to our customers, so that others can benefit from them as well.

I will do so in 3 parts:

  • Part 1: Explaining the particular paradox of AI that organizations are facing today, and that is potentially slowing down their willingness to engage with this powerful new class of technologies.
  • Part 2: Investigating how we can break the paradox, and overcome the barriers that hold us back. In this part, we will focus a lot on a particular class of IT systems grouped together as MLOps systems, and explain how they help in overcoming the seeming paradox.
  • Part 3: Articulating how MLOps systems need to be architected in a particular way in order for them to drive behavior and achieve successful implementation. This will focus on the observation that a feature store, a central data repository for all MLOps systems and the AI systems that they enable, can act as a forcing function for successful implementations.
So let’s explore these topics in a three-part series. I will be publishing these parts over the next few days. But let’s start with today's article - Part 1.

Part 1 of 3: The Paradox of AI

Let’s start with an interesting observation that we hear from almost every single Hopsworks user, prospect or customer: there is something paradoxical about the current state of the AI industry. On the one hand, and especially since the rise of generative AI systems like ChatGPT and its siblings, people seem very much convinced that AI has massive potential benefits that could impact every organization, big or small. Listing these benefits is almost impossible to do exhaustively, but at a high level we see benefits related to

  1. Increased Data Processing capacity: AI enables the processing of vast amounts of data, allowing organizations to gain valuable insights and make informed decisions.
  2. Faster and Better Decision-Making: AI-powered systems can analyze data in real-time, enabling faster and more accurate decision-making.
  3. Improved Efficiency and Innovation: AI drives efficiency by automating repetitive tasks and fostering innovation by providing new solutions to complex problems.
  4. Moving up the value pyramid: AI systems are delivering real-time operational input, enabling organizations to respond quickly to changing conditions.
It’s almost impossible to ignore the potential of these systems - yet at the same time we see some real and important challenges that are preventing organizations from making significant commitments to them. For the most part, these challenges seem to be related to the perceived complexity of AI systems, which manifests itself in a number of different ways:

Ecosystem Integration Complexity:

When we look at these AI systems, we see that the integration of these systems with some of the input and output systems around it has become significantly more complex:
  • Input Data from Different Sources: AI systems often integrate data from multiple systems and technology layers, leading to increased complexity.
  • Output Data to Different Targets: AI systems often output data to multiple target systems and technology layers, leading to increased complexity.
  • Data with Different Cadences: The data being integrated may need to be received from or delivered to these systems at different cadences, further complicating the integration process.
  • Data of Various Types and Schemas: AI systems need to handle different data types and schemas, such as pictures, audio, and time series, adding to the complexity.

Engineering Complexity:

Also from an engineering perspective, there is quite a bit of complexity to be reckoned with. AI systems often come with
  • Multiple Layers of Pre-Processing: AI models require extensive pre-processing and transformations to ensure data consistency and accuracy.
  • Real Requirements on Efficiency and Speed of Delivery: AI systems need to be efficient and deliver results quickly, which can be challenging to achieve.
  • Multiple Frameworks and Languages: The AI landscape comprises various frameworks and languages, making it difficult to choose the right ones for a particular project.
All of these add engineering complexity to the AI system.
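As a small illustration of what "multiple layers of pre-processing" can look like in practice, here is a hedged sketch in plain Python. The function names, fields and data are invented for the example; real pipelines chain many more such layers, often across different frameworks.

```python
# Each pre-processing layer is a plain function; the pipeline composes
# them in order. (Illustrative only - field names are hypothetical.)

def fill_missing(rows, default=0.0):
    # Layer 1: replace missing values so downstream steps see clean data.
    return [{k: (default if v is None else v) for k, v in row.items()}
            for row in rows]

def rescale(rows, column, factor):
    # Layer 2: unit conversion, e.g. amounts in cents to currency units.
    return [{**row, column: row[column] * factor} for row in rows]

def preprocess(rows):
    rows = fill_missing(rows)
    rows = rescale(rows, "amount", 0.01)
    return rows

raw = [{"amount": 1250, "age": 34}, {"amount": 980, "age": None}]
clean = preprocess(raw)
```

Even in this toy form, the ordering matters: filling missing values before rescaling avoids multiplying a `None`. Multiply that kind of constraint across dozens of layers and the engineering complexity becomes apparent.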

Operational Complexity:

Last but not least, we also see our Hopsworks users grappling with the complexity of operationally managing these AI systems from start to finish. This means
  • Guaranteeing Availability and Uptime: Ensuring the availability and uptime of AI systems is crucial for continuous operation.
  • Maintaining and Evolving the systems: AI systems require regular maintenance and evolution to keep up with changing requirements and technological advancements.
  • Implementing and ensuring Data Governance: AI systems need proper data governance to comply with current and upcoming regulations, such as GDPR and the EU AI Act. This involves versioning, metadata management, lineage tracking, and monitoring.
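As a tiny illustration of what lineage tracking means in code, here is a hedged sketch. All names here are hypothetical, invented for the example - this is not any product's API, just the underlying bookkeeping idea: record which model version was trained on which dataset version, so the audit question can be answered later.

```python
from dataclasses import dataclass

@dataclass
class LineageRecord:
    # One governance entry: which model version was trained on which
    # dataset version, and by which pipeline. (Hypothetical schema.)
    model_name: str
    model_version: int
    dataset_name: str
    dataset_version: int
    pipeline: str

registry: list = []

def record_training_run(model_name, model_version,
                        dataset_name, dataset_version, pipeline):
    # Append-only log: every training run leaves a lineage trace.
    entry = LineageRecord(model_name, model_version,
                          dataset_name, dataset_version, pipeline)
    registry.append(entry)
    return entry

def datasets_behind(model_name, model_version):
    # The audit question a regulator may ask: which data produced this model?
    return [(e.dataset_name, e.dataset_version) for e in registry
            if e.model_name == model_name and e.model_version == model_version]

record_training_run("churn_model", 3, "customer_features", 7, "daily_train")
record_training_run("churn_model", 3, "transactions", 2, "daily_train")
```

A real governance system would add timestamps, users, code versions and monitoring hooks on top of this, but the core mechanism is the same append-only lineage log.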
So to summarize: the paradox confronts AI’s significant benefits with significant complexity. The marketplace will force organizations to balance these competitively - and the companies that best succeed in seizing the benefits while managing the complexities are very likely to end up on top. This is why we would like to present a credible antidote to this paradox of AI in part 2 of this article series.

Tuesday, 27 February 2024

The 3 Whys of Feature Stores for Machine Learning & AI

Why you need a feature store, why you should buy (not build) one, and why you should consider Hopsworks

Start with 3 Why’s

Quite a few years ago, I read a really intriguing book by Simon Sinek: Start with Why. The subtitle actually gives away the essence of the book: How Great Leaders Inspire Everyone to Take Action. Spoiler alert: they do so by explaining WHY something needs to get done, before explaining how and what needs to get done. It’s a very simple but, in my experience, important and intuitive way to effectively communicate something to any audience. Whether you are communicating with customers, co-workers or your kids - the WHY usually paves the way for much smoother discussions and actions. Sinek talks about the Golden Circle, which outlines how starting from the inside (why) and working towards the outside (what) is an effective method for any communication strategy.

Since I started working for Hopsworks, I have had this framework in the back of my mind, as I got to talk to many more users, customers and partners that have been adopting the amazing technology that the team has built. In these discussions, it actually became clear to me that there are three different “WHY” questions that we need to answer for our community, if we want to be successful in the marketplace. At the risk of misusing the golden circle visualization, I have tried to put these 3 questions in 3 concentric circles in the figure below:

As you can see, you move from the OUTER circle to the INNER circle, and you try to address the following 3 questions:
  • Why would you consider using a feature store architecture in the first place? If you would find enough solid reasons for doing so, you would proceed to the next “Why” question, being:
  • Why would you NOT BUILD, but instead BUY a feature store for your data platform architecture? And if you find enough reasons to BUY and not build, then you would consider the last and final “Why” question, being:
  • Why would you specifically choose to buy the Hopsworks feature store for your data platform architecture?
If, and only if, we understand the potential answers to these questions, in all their variations, can we successfully meet the customer’s expectations and provide value in their implementation. That’s the core idea behind this thought process.

So let’s explore these three WHY questions, and their answers, in a bit more detail.

1. Why Consider a Feature Store for ML/AI?

It’s pretty clear that not everyone needs a Feature Store. A data platform like that is quite specific to ML/AI workloads, and would only realistically be required or used by organizations and teams that have a deep understanding of, and investment in, the relatively new fields of machine learning and artificial intelligence. If all you have in your environment is an early-stage experiment with ML/AI technology, then most likely you do not yet have a need for a feature store - seems logical, right? So: what are the conditions under which you would want to consider it? What are the reasons for implementing a Feature Store in your organization? Let’s explore this!

Many of these reasons were actually outlined in an earlier article on the Hopsworks Blog, and I believe that the reasons for considering a Feature Store are accurately described there. In this overview, I would like to make the distinction between technical and non-technical reasons (as in, business-, organizational- or competency-related). Let’s dig into it:
  • Technical reasons for considering a feature store:

    • Existing models running in production are expensive - they are hard to debug, review and upgrade, and they are bespoke systems that are difficult and costly to maintain. There’s a growing body of evidence that ML/AI systems that do NOT have a feature store architecture in the backend are simply too expensive because of that - see the other points.
    • Monitoring production pipelines is challenging, or impossible. The data that powers AI changes over time, and identifying when there are significant changes that require retraining your AI is not easy.
    • Difficulties in managing the lifecycle of feature data, including the tracking of versions and historical changes. This is an elementary requirement for all regulated data processing environments - and a key reason why feature stores align so well with these industries’ requirements.
    • Feature data is not centrally managed; it is duplicated, features are re-engineered, and generally data is not reused across the organization.
  • Non-technical reasons for considering a feature store:

    • Valuable models are created but once the experimentation stage is over they do not bridge the chasm to operations - the models do not consistently generate revenue or savings. This is all about getting the models to deliver value, consistently.
    • No cohesive governance in the storage and use of AI assets (feature data and models), everything is done in a bespoke manner, leading to compliance risks.
    • Slow time-to-market of AI models, and a general inability to provide very fresh feature data or handle real-time data for ML models, which is critical for industries like finance, retail, or logistics where real-time insights can add significant business value. This point is all about the speed with which the data science team can develop their models and bring them to life in a production environment.
    • Hard to derive a direct business value from the models, they exist in isolated environments that do not directly influence business operations. This then obviously makes it much harder to justify the investments required to develop and operationalize the models.
    • Slow ramp-up time when onboarding new talent into the ML teams. Sharing available AI assets is complex because operational knowledge is held by a few individuals or groups.
We have summarized these reasons in the outer circle of the figure below. I am sure that there are other reasons that could potentially be more applicable to your specific environment - but these are the higher level ones that we see time and time again in our Hopsworks user discussions.

So now we know and understand why an organization requires a feature store - great! But that does not necessarily mean that they will actually go out to look for one in the marketplace! Many organizations, especially the “digital natives” that are tuned in to the latest technology trends (like ML/AI) nowadays have a tendency to at least consider building a software component themselves - instead of buying one. This is a good and worthwhile consideration, as it seems clear to me that there is a minimum of scale and maturity required before wanting to go “all-in” on this brand new technology. For many people, a homegrown solution might be “good enough”.

So how do we consider whether or not a roll-your-own solution is good enough or not? Let’s consider some criteria.

2. Why NOT BUILD, but BUY a Feature Store?


In the second layer of the diagram below, we consider some of the reasons and criteria that would warrant looking at the BUY option instead of the BUILD option. Some of these reasons have also been covered in a previous article, but let's revisit them here.

The most common reasons for buying and not building a feature store are:
  • Maintenance Burden & Total Cost of Ownership (TCO): clearly, this is something that every mature IT organization will consider. Ultimately, this is related to the potential technical debt that this organization will want to incur, given the significant costs that could be associated with this down the line. It’s important to consider not just the short term, but also the longer term implications of a build vs. buy decision.
  • Technical complexity: clearly, a piece of infrastructure software like a Feature Store, which will underpin all ML/AI applications that the organization would choose to develop, has a significant amount of technical complexity associated to it. It’s important to consider this, and to investigate the most crucial domains in which a “build” approach could encounter unexpected technical challenges.
    • Offline / Online sync: one of the key characteristics of a feature store is that it will both contain the historical data of a feature dataset, as well as the most recent values. Both have their use and purpose, and need to be kept in sync inside the feature repository. Feature Stores like Hopsworks do this for you, but in a “build” scenario you would need to take this into account and do all the ETL data lifting yourself.
    • Reporting and search: in any large machine learning system where you have dozens/hundreds/thousands of models in production, you would want and need the feature data to be findable, accessible, interoperable, and reusable - according to the so-called “F.A.I.R.” principles that we have described in this post. This seems easy - but if you consider all of the different combinations that you could have between versions of datasets, pipelines and models, it is clear that this is not a trivial engineering assignment.
    • Metadata for versioning and lineage: similar to the previous point, a larger ML/AI platform that is hosting a larger number of models, will need metadata for its online and offline datasets, and will need to accurately keep track of the versions and lineage of the data. This will increasingly become a requirement, as governance for ML/AI systems will cease to be optional. Implementations of and compliance with the EU AI Act, will simply mandate this - and the complexity around implementing it at scale is significant.
    • Time-travel and PITC joins: if we want to make the predictive results of our ML/AI systems explainable, we will need to be able to offer so-called “time travel” capabilities. This means that we can look at how a particular model yielded specific results based on the inputs that it received at a specific point in time. Feature Stores will need to offer this capability, on top of the requirement to guarantee that the models yield accurate and correct information at a given point in time - something we call “Point-in-time correctness”. Again, the technical complexity of implementing this yourself is not to be underestimated.
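Two of the items above - offline/online sync and time travel - can be sketched in a few lines of plain Python. This is a toy model, nothing like a production implementation (which needs distributed storage, indexing, and much more), but it shows the essential contract: one write path feeds both representations, and the offline history supports as-of lookups.

```python
from datetime import datetime

class TinyFeatureStore:
    """Toy sketch: a single write path feeds both an offline, append-only
    history (for training and time travel) and an online latest-value
    view (for low-latency serving)."""

    def __init__(self):
        self.offline = []   # full history: (key, timestamp, value)
        self.online = {}    # key -> (timestamp, latest value)

    def write(self, key, timestamp, value):
        # One ingestion path keeps both representations in sync.
        self.offline.append((key, timestamp, value))
        current = self.online.get(key)
        if current is None or timestamp >= current[0]:
            self.online[key] = (timestamp, value)

    def get_online(self, key):
        # What a deployed model reads at inference time.
        return self.online[key][1]

    def get_as_of(self, key, timestamp):
        # "Time travel": the latest write at or before the given moment -
        # exactly what point-in-time correctness requires for training.
        candidates = [(ts, v) for (k, ts, v) in self.offline
                      if k == key and ts <= timestamp]
        if not candidates:
            return None
        return max(candidates)[1]

store = TinyFeatureStore()
store.write("cust_1", datetime(2024, 1, 1), 100.0)
store.write("cust_1", datetime(2024, 2, 1), 120.0)
```

Here `get_online("cust_1")` serves the February value, while `get_as_of("cust_1", datetime(2024, 1, 15))` reconstructs what the model would have seen in mid-January. Doing this correctly at scale, under concurrent writes and across versions, is where the "build" option gets expensive.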

With that, we hope to have outlined some of the key reasons that you should consider buying, not building your feature store solution. At the end of the day this is a strategic decision that will be different for every organization - as long as the question is honestly asked and answered.

3. Why buy the Hopsworks Feature Store?

Last but not least, we would also like to offer the readers that have a) first decided that they need a feature store, and b) also decided that they will want to buy such a critical piece of infrastructure and not build it themselves, a perspective on why Hopsworks might be the best choice for your environment. In line with the previous “Golden Circle” visuals, we now get to the “inner” circle of the diagram:




Obviously we are conscious of the poor readability of the diagram, so here’s a cut-out that is a bit more readable.


As you can see, we think that there are essentially 4 main reasons why the Hopsworks Feature Store solution could be the best possible fit for your environment. Let’s discuss each of these briefly:
  1. Performance and HA: Hopsworks has been working on the Feature Store for a number of years, with a top team of academic and industry specialists. We have integrated and embedded the best possible technologies on the market, like RonDB, and have proven that this currently gives us unparalleled performance. Take a look at these open benchmarks for yourself, and you will see that Hopsworks is in a league of its own with regards to performance. On top of that, we have been leveraging expertise in systems High-Availability to develop a feature store solution that can withstand the most demanding workloads.
  2. Flexible deployment (serverless / cloud / on-prem): Hopsworks is the only solution on the market that offers you the choice of deployment options that is best-suited for your specific environment. You can start small with a multi-tenant-based serverless environment, grow into a managed cloud deployment in your AWS / Azure / GCP account, or even repatriate the workload onto your own, on-premise hardware. No other solution offers this, today.
  3. Governance and compliance: Hopsworks has taken great pains to build industry-leading governance capabilities into the product. Versioning, lineage, time travel, search, security, monitoring and reporting - all of the advanced functionalities that a compliant solution will be required to deliver, now and in the future.
  4. Value for money, TCO: Hopsworks believes that in order for ML/AI to be successful, it needs to deliver value, and it needs to offer its users a clear Return on Investment. That means that the solution needs to be available at a reasonable price, and that consumption-based metrics cannot always be used for billing. We need to allow for testing, training, experimentation, learning and development - without requiring the customer to empty their pockets from day one, and all the while managing the total cost of ownership of the solution.
This brings us to the end of this article, where we have tried to discuss the “3 Whys” of Feature Store implementations. We hope this was a useful discussion, and are happy to discuss this with you as well. No doubt, we can make the argument even more detailed and refined, together.

All the best

Rik

Wednesday, 6 December 2023

Joining Hopsworks: of course I had to get in on the AI/ML hype! Here's why!


I wanted to write a short article to share with you some of my thoughts on why I joined Hopsworks and what my and our plans are for the next couple of months and years - aside from utter world domination, of course :) … Here goes.

2023 has been an interesting year for me. Exiting from Neo4j after a decade was of course a little bit of an earthquake for me - mostly because all of a sudden I was no longer part of the graph tribe as much any more. Even though I wanted the exit and totally agreed to the process, it was still more impactful than I had guessed. Then came a number of interesting but turbulent months where I worked for a number of smaller startups, among which the amazing team at Hackolade - but in all honesty, just could not find my bearings. So when that quest ended, it felt like I really needed to take a step back and think about what I wanted to do.

That process, and a whole bunch of coincidences that I never could have planned, led me to join the best Machine Learning platform company I could find :) … Here’s what made me do that.
  1. The Hopsworks market seems incredibly interesting to me. Yes, since the ChatGPT hype every man and their dog seems to want to do something with AI - but I am convinced that this is not the next blockchain hype. There is much more to it - and ML and AI systems are here to stay.

  2. The reason why they are here to stay is that there is true value in them. Yes, we have had great data analytics for years. Yes, we have had statistics for years. But to be able to bring the best of these techniques into operational pipelines that are actually going to accurately predict the future - it’s pretty much a game changer. If you have seen how ChatGPT is able to predict the words of sentences, then you understand how good these ML-based predictions can really be. The results are just very impressive.

  3. The number of business use-cases for that type of predictive power is just without limit. As always, I expect the adoption to be gradual and value-based: the people that get the most value from these platforms will likely start using them first. But, not dissimilar to what I experienced at Neo4j, there is no limit to the potential of Hopsworks. The current customers already prove it.

  4. Hopsworks actually has some amazing product tech that makes it really well poised for success. At the heart of that technology is a feature store (think of it as the beating heart of the ML system: this is where the online and offline data of your ML system gets stored and managed) that, just because it is so damn fast, allows for a completely different way of doing machine learning. RonDB, the key-value store that Mikael Ronström originally conceived for MySQL as NDB, has found another great use case here - and we are going to use it to the max.
And then, there is one final thing that made it easy to join a new adventure: the team. Hopsworks is about 35 people strong these days, and they have been nothing but superbly welcoming to me. It’s been a month of fantastic energy, lots of laughter, great discussions, and a feeling of collective purpose that I am savoring by the minute. I am extremely grateful to be able to start this journey - and will share updates along the way.

If you want to know more about Hopsworks, please go to hopsworks.ai, or hit me up so that we can schedule a chat. I have also included a short intro video at the bottom of this post.

All the best

Rik

PS: Find a lot of our videos on our YouTube page!