Friday, 6 December 2024

Can you elevate your pitch with AI?

 


Working at Hopsworks has been a great experience for many reasons, but one of the main attractions for me personally has been, and still is, the proximity it offers me to some of the most exciting IT developments of our lifetime: the rise of Artificial Intelligence across countless business use cases.

Of course, much of that interest and fascination is fueled by the impressive achievements of Large Language Models (LLMs) and their applications: LLMs are such powerful tools, in capable hands of course, that they can offer massive productivity enhancements and, with them, new fields of application.

In my own daily work, I use LLMs (either Google's Gemini or OpenAI's different ChatGPT-based systems) very regularly - increasingly so. I have found them to be superbly useful tools for writing, summarizing, coding and, in general, learning. And recently I had a couple of amazing experiences that have simply been too good not to share. One of them I already wrote about: using ChatGPT as an interactive role-playing agent to practice objection handling. It is an astonishing experience.

But here's another one. I recently tried to generate a short "Elevator Pitch" for Hopsworks, which goes something like this:
The Hopsworks AI Lakehouse is unique: it provides organisations like yours with the data infrastructure for your Machine Learning systems, allowing you to streamline all your MLOps tasks, teams and processes quickly and efficiently. With the AI Lakehouse, all your stakeholders benefit. First, your individual data scientists, data engineers and machine learning engineers benefit, because they will be able to work with the same consistent operational infrastructure for all of their tasks. They will save precious time by not having to integrate the infrastructure themselves, and can spend more time on their actual day jobs. Second, your data science or machine learning team leaders win, because the AI Lakehouse will make the team more efficient: they will be able to do more with less, and contribute more and better end results back to the business. Third and last, your governance team wins, because the centralized infrastructure will be much easier to govern, making compliance with the latest and upcoming AI regulations much easier. This is how Hopsworks makes the booming AI application space much more valuable and attainable for your organisation.
I wanted to figure out a way to customize this pitch for different potential prospects, and see if I could use AI tools to do so. So I tried a bunch of tools, and found that they all have their different strengths and weaknesses. The voice synthesis of ElevenLabs was clearly the best and most flexible around, but Google Vids also offered some amazing capabilities, and could get me some crazy nice results super easily.

So: let me show you some of the results. Here's a YouTube playlist with some of the videos that I generated:

 


I thought that was pretty cool, but... I was also pretty underwhelmed by the lack of intonation and variation delivered by these AI voices. They are good - way better than the robo-voices of yesteryear - but they are nowhere near the quality of a real, human voice. To try and prove that, with my limited acting/voiceover skills, here's how I would deliver the same pitch:



There you go. I think it was amazing to see how far the technology has gotten already, and how easy it has become to make custom pitches for specific environments in a fairly automated way. But it's also pretty clear that we still have a way to go and that for now, personal and human content will stand out pretty clearly.

Hope that was a useful experiment. As always, I look forward to your comments and reactions!

Cheers

Rik

Monday, 2 December 2024

Training yourself with ChatGPT

Here's something I want to share. I have been using OpenAI's ChatGPT for some personal training, and I have also been sharing this with our Hopsworks team. One of the unbelievably cool things you can do with it is role-play specific topics. For example: objection handling - i.e. getting better at dealing with some of the objections that a prospect might throw at you. Let me give you a completely hypothetical example: I am going to try to handle the objections that a salesperson for a solar panel company might get from one of their prospects.

Role-playing part 1: Setting the scene

In ChatGPT, you can set the scene by explaining the type of situation that you are in: the product that you are selling, and the prospect that you are dealing with.

It will then respond with some very detailed guidance on the objections that you may encounter.

You can find the entire overview of all the objections in this chat transcript. The key is that at the end of the overview of all possible objections and counter-arguments, ChatGPT basically says "Would you like to role play this?"
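
For readers who prefer to script this rather than use the ChatGPT app, here is a minimal sketch of the same scene-setting step using OpenAI's Python client. The model name, the prompt wording and the solar-panel scenario details are my own illustrative assumptions, not a prescribed setup:

    # Minimal sketch: setting the scene for an objection-handling
    # role play via the OpenAI API. Model name and all prompt text
    # are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Set the scene: the product being sold, and the prospect persona.
    scene = (
        "You are a skeptical homeowner being pitched rooftop solar "
        "panels. Raise realistic objections (upfront cost, roof "
        "damage, payback time) one at a time, stay in character, and "
        "react naturally to how I handle each objection."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any chat-capable model works
        messages=[
            {"role": "system", "content": scene},
            {"role": "user", "content": "Hi! Do you have a minute to talk about solar?"},
        ],
    )
    print(response.choices[0].message.content)

In the ChatGPT app itself, the same scene-setting text works as your opening prompt, and voice mode then gives you the oral back-and-forth described next.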

Role-playing part 2: going back and forth


The coolest thing, then, is that the role play is not WRITTEN - it's ORAL. You literally talk to ChatGPT, and it will act out the role of the prospect, so you can practice handling the prospect's objections like a tried and tested salesperson. Afterwards, you get a nice little transcript of the entire conversation, of course, for review.
Here's a little clip of the way I was addressing one particular concern:


I have found this method extremely useful and interesting. It's like having an endlessly patient, unemotional and always available teacher at your fingertips. I liked it a lot!

Hope you thought this was a useful article - looking forward to your feedback.

Cheers

Rik

Wednesday, 6 November 2024

Happy Hoppaversary!


Today, November 6th 2024, it’s been a full year (!) since I started working for Hopsworks. As with any youthful startup, the road to world domination is a bumpy one, and the journey with our team has been no different so far. But one thing’s for sure: after one lap around the sun, my interest in and enthusiasm for the work that our team produces on a daily basis is as radiant as ever. In this article, I would like to explore what I personally find so interesting about Hopsworks, and why I find the company and its products so motivating.

Hopsworks’ underlying tech is super solid

At the lowest level, Hopsworks has unique technology that originated in years of research: HopsFS and RonDB. At some level, the origin story of Hopsworks is firmly tied to bringing these two technologies together in one consolidated data infrastructure platform. This project is grounded in years of practical experience, and in the basic realization that these two components enable some really interesting capabilities in Machine Learning and AI. Bringing these components together, and integrating them into the coherent architecture that is, in essence, the Hopsworks Feature Store, is the secret sauce of what Hopsworks does. It’s super unique because it can be deployed anywhere, and it boasts crazy good performance stats - if only because the underlying storage components (HopsFS and RonDB) have been optimized for the feature store workload.



Hopsworks takes a broader view

Ok - so the tech is great. But why does that matter? Well, to me, it seems like the “Feature Store story” is “just” the start. A feature store is more than just a technology component: it is an enabler for building better AI and ML systems, by applying all the lessons learned from DevOps, Agile software development and FTI pipeline architectures. Today, Hopsworks is an MLOps platform that brings the people and processes behind professional AI and ML systems together around a feature store. This is not trivial, because MLOps is about more than just the tech: bringing people and processes together is hard, as anyone who has ever worked on a complex project will know. We have found that the feature store can be a fantastic forcing function for MLOps: the data foundation will lead the way, and it will bring the people and processes together.

Hopsworks’ unique value proposition, on technical AND non-technical levels

Sometimes people think that an infrastructure product like Hopsworks will only be used by very technical ML engineers, data engineers or data scientists, and that these are the only personas that stand to benefit from this kind of implementation. This is a complicated message, because of course it is true and not true at the same time.

It’s true because data engineers and data scientists stand to gain a massive amount of productivity and professional satisfaction, because the Hopsworks infrastructure will simplify their infrastructure-related tasks. Research has shown that technical engineers spend 30-40% of their time on non-productive, infrastructure-related tasks. That number needs to come down if we want any kind of productivity in building these systems, and that is Hopsworks’ objective. By providing a unified platform for AI and ML, Hopsworks makes the lives of its technical stakeholders easier and more productive.

But it’s also not true, because Hopsworks serves two other, important stakeholders, and it does so quite significantly.

First, we help the technical team managers, IT managers, project managers and budget holders allocate their budgets much more efficiently. We do that by reusing artifacts (features, pipelines, models), but also by offering deployment flexibility (on-prem and in the cloud) that allows you to choose the right platform for the right workload. A return on investment of 100+% is not unrealistic there, on an annual basis. That is NOT small change!

And secondly, we facilitate ML and AI governance at a very profound level. By managing the data that is used in models, and by tracking all the different manipulations that are run on it as we prepare, create and deploy our models, we can work towards the explainability and FAIR principles that our regulators are going to require, in all kinds of industries. Before too long, any organization, public or private, that wants to use AI/ML for business purposes will need to demonstrate proper governance - and the MLOps infrastructure around the feature store that Hopsworks provides will be super useful for this. As such, it will enable compliance with the EU AI Act, or any other regulatory initiative out there, for our customers.


So let there be cake

These are the main reasons why I think Hopsworks is just the best thing since sliced bread, and why it has been a fantastic personal and professional challenge to work with this team for the past year. It’s not always easy to get your head around complex platform products like this, but after this year in the trenches, I feel like I have seen a lot, learned a lot, and that we are supremely well positioned to provide amazing value for our clients. Onwards, and upwards!


Wednesday, 7 August 2024

The benefits of Openness and Modularity in an AI Lakehouse




I have written before about some of the important paradoxical tendencies in the Machine Learning (ML) and Artificial Intelligence (AI) industries. Let’s look at it from two perspectives:
  • The Developer: the top engineer that is trying to help his or her organization move forward, by using ML and AI techniques.
  • The CTO: the engineering manager that is trying to oversee these efforts, and who wants to ensure that they are pursued in a technologically and economically sound way - while complying with all the new rules and regulations that are now forthcoming.
It should be clear that these are two important, even decisive, perspectives that can truly determine the success or failure of AI/ML projects. Therefore, we’d like to zoom in on these two perspectives in a little more detail, and then explore what the impact of openness and modularity could be in the context of these two personas.

The Developer's Context

For developers working on these types of AI/ML software projects, I think we can state some universal truths:
  • developers don't like architectural overhauls
  • developers are opinionated about the tools that they like and use: they want to be productive, and because of this they usually prefer to stick to the tools that they know.
At the same time, we know that data scientists and machine learning engineers spend a lot of time on setting up and managing the tools in their technical work environment. See the 2022 Kaggle survey: data scientists spend 50-80% of their time on their environment, data engineers spend 30-40% of their time on maintaining their pipelines, and ML engineers spend 20-30% of their time on this. If we can cut the time spent on non-core ML and AI tasks by even 10%, we will score a massive win for the organization and the individual contributor.

The CTO's Context

For CTOs that are working on AI/ML software projects, we can also state some universal considerations:
  • The CTO is responsible for making sure that their organization is constantly innovating and adding value to the business. AI and ML seem to offer massive potential for innovation for many organizations: we want to seize the opportunity before our competitors do so. If the CTO does not act proactively, there will be a CxO on the management team that will start asking questions about the potential of AI shortly - you can bet your house on that.
  • In exceptional cases, this is done with "big bang" solutions that radically rebuild an entire architecture from scratch, in a greenfield.
  • However, most people don't work in a greenfield; they work in a brownfield. Therefore, big bang approaches are not possible, and most people and organizations turn to innovating in small, incremental steps.
We know that all this work will have to be done in an economically sound and feasible way. See the SpringerLink Study and the McKinsey State of AI report - both stressing the need for cost efficiency in AI and ML systems.

So: in assessing both of these perspectives, we are clearly presented with an important question: how can we figure out a way to incrementally innovate using AI/ML, provide value to our organization in an economically sound manner, while respecting the productivity requirements and current tool choices of our engineers as much as possible?

The importance of Openness and Modularity

As I have tried to explain, the individual engineer and their CTO have at least slightly conflicting interests in the implementation of AI and ML projects. How can we make this work, then? Here’s where I think we can take a look at the software industry in general, and at some of the technological and methodological approaches that have made software engineering so much more productive in recent years.

To do so, let’s think back to where we came from: as little as two or three decades ago, waterfall-based software engineering methodologies were the standard for tackling large software projects. It took works like The Mythical Man-Month and the Agile Manifesto for people to start changing their approach, and to start developing and then using iterative software engineering methodologies, DevOps practices, Infrastructure-as-Code and cloud computing as critical enablers of a more modern approach. This is what we need now in the very specific domain of AI and ML: we need to learn how to apply these techniques there, and that will require a specific architecture to enable it.

This architecture should therefore
  • NOT be a closed, all-or-nothing environment,
  • NOT limit the engineers to a specific toolset,
  • NOT bind you to a specific deployment environment offered by this, that or the other cloud vendor,
  • NOT dictate the specific framework that you should or should not use for your AI and ML
On the contrary - our vision here should be for an open and modular architecture that allows us to leverage the power of AI and ML in an incremental, balanced way. As in general software development best practices, this architecture will enable a combination of tools and processes that will propel AI and ML projects forward, and resolve the paradox that we have seen emerge.

Let’s discuss this vision in a bit more detail. We’ll call this the Incremental AI Architecture (IAIA), enabled by the AI Lakehouse.

The Incremental AI Architecture (IAIA)

Today, many AI and ML projects are characterized by structures that are very different from what we want in an innovative and incremental AI Architecture. Systems are developed with bespoke tools and processes that are specific to the project in which they are implemented, and do not allow for reuse and automation in a way that an IAIA enabled by an AI Lakehouse would. Therefore, we propose a combination of best practices and tools that will allow you to move in this direction.

Let’s walk through the different steps in a few succinct paragraphs.

Start with modularization of pipelines:

At Hopsworks, we have argued many times before that one of the first steps to take when embarking on an AI/ML project is to conceptually break up the AI/ML system into three parts. We refer to these as the “F-T-I” pipelines, distinguishing three distinct phases and processes in the development and productionalization of these systems. At a high level, we would consider
  • The Feature Pipeline, where the different data sources of the input data of the AI/ML system are wrangled into the right format(s), and then eventually persisted in a way that would enable the next “T” phase of the system to proceed.
  • The Training Pipeline, where the input features would be used to train and test an AI/ML model that would be used for making the predictions that we need for our organization.
  • The Inference Pipeline, where we deploy the model that was generated by the training pipeline, and start using it to allow systems and applications to consume its capabilities and serve the predictions of the model to the outside world. In this pipeline, we would also want to monitor for any impactful changes inside or outside the system that would warrant us to revisit the process.
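
To make the F-T-I split concrete, here is a minimal sketch of the three pipelines against the Hopsworks Python API. The feature group, feature view, column names and the scikit-learn model are illustrative placeholders under my own assumptions, not a prescribed implementation:

    import hopsworks
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    project = hopsworks.login()
    fs = project.get_feature_store()

    # F - Feature pipeline: wrangle raw data, persist it as a feature group.
    def feature_pipeline(raw_df: pd.DataFrame) -> None:
        fg = fs.get_or_create_feature_group(
            name="transactions", version=1,
            primary_key=["account_id"], event_time="ts",
        )
        fg.insert(raw_df)  # persisted, ready for the "T" phase

    # T - Training pipeline: read consistent training data, train and test.
    def training_pipeline():
        fg = fs.get_feature_group("transactions", version=1)
        fv = fs.get_or_create_feature_view(
            name="fraud", version=1,
            query=fg.select_all(), labels=["is_fraud"],
        )
        X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)
        model = LogisticRegression().fit(X_train, y_train)
        print("test accuracy:", model.score(X_test, y_test))
        return model

    # I - Inference pipeline: serve predictions from the latest feature values.
    def inference_pipeline(model, account_ids: list[int]):
        fv = fs.get_feature_view("fraud", version=1)
        rows = fv.get_feature_vectors(entry=[{"account_id": i} for i in account_ids])
        return model.predict(rows)

The point of the split is visible in the code: each pipeline reads from or writes to the shared feature infrastructure, so each can be developed, scheduled and scaled independently.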


Once we have split the AI/ML system into these three functional and modular parts, we can take the next step and turn the system into a true production-ready AI system. This will require some education of our team, and that is the important second step in this journey.

Educate the team on the importance of MLOps for production AI systems

In order to maximize the benefits of AI systems in production, we will need to automate as many of the FTI pipeline steps as possible. This is what we call MLOps, short for “Machine Learning Operations” and analogous to DevOps.

These automations will offer
  • Significant Engineering and Technical benefits that will allow us to write better software systems
  • Process benefits that will allow these improved systems to be adopted and maintained
  • Budgetary and Financial benefits: both our operational expenditures (OpEx) and our capital expenditures (CapEx) can be reduced, enabling more innovation at a lower cost.
Once we have the team on board, we can start looking at an implementation of a professional MLOps platform - and to choose one that allows for a gradual, incremental implementation to accommodate the concerns of the Developer and the CTO.

Do a gradual implementation of an MLOps platform

Over the years, Hopsworks has gained a lot of experience with the implementation of AI/ML systems on top of an MLOps platform, and as a consequence we have developed and shipped release 4.0 of the Hopsworks Enterprise platform. We have called this the “AI Lakehouse” release, as we believe that it allows for a comprehensive but gradual implementation of everything an AI/ML project needs to become successful.

Based on this experience, and using the AI Lakehouse infrastructure, we recommend that you consider the following steps in sequence:


  • Implement and automate the FTI pipelines on the feature store platform. As you can see from the graphic above, all of the pipelines of the recommended F-T-I approach will leverage this central infrastructure, and therefore drive all users and stakeholders of the AI/ML system towards a centralized approach that will be much easier for the Alpha Developer, manageable for the Empowered CTO, and governable for the architects and compliance teams who are there to ensure that rules and regulations are appropriately followed.
  • Implement a model registry, thereby accurately keeping track of all the impacts and changes that are required of our production AI/ML systems. If necessary, you could combine the registry with an experiment tracking system - but we have found this to be largely unnecessary if the above steps are duly implemented. Once you have the FTI pipelines connected to a feature store, tracking experiments becomes pure overhead.
  • Implement centralized model deployment for offline, batch use cases, as well as for online, real time use cases.
  • Finalize with centralized feature and model monitoring, iterating back to the FTI pipelines whenever such monitoring demonstrates the need for updating and adaptation.
Last but not least, and in order to ensure solid future implementation of additional AI and ML systems in your organization, you should leverage the above architecture to measure the business results of the AI system implementation. If you do so, chances are that this will fuel additional investments into the technology, bringing better prediction services to our organizations and a more competitive market position as a result.
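
As an illustration of the second and third steps above, here is a minimal sketch of registering and then deploying a trained model with the Hopsworks model registry. The model name, metrics and local artifact directory are placeholder assumptions, not a prescribed configuration:

    import hopsworks

    project = hopsworks.login()
    mr = project.get_model_registry()

    # Step 2: register the trained model together with its evaluation
    # metrics, so every production version is tracked centrally.
    model = mr.python.create_model(
        name="fraud_model",
        metrics={"accuracy": 0.92},  # placeholder evaluation result
        description="Fraud classifier trained on the transactions features",
    )
    model.save("model_dir")  # uploads the serialized model artifacts

    # Step 3: deploy the registered model behind an online endpoint.
    deployment = model.deploy(name="fraudmodel")
    deployment.start()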

Wrapping up

With that, we hope to have made the case for open, modular AI and ML infrastructure more clearly. At Hopsworks, we live and breathe this philosophy day in and day out, and we would love to help you see its power for yourself and your business. Please reach out to us if you have any comments or questions - we will be happy to help out.

Friday, 14 June 2024

Part 3: An AI Lakehouse can be a forcing function for MLOps

In the first and second part of this article series, we explored the paradox that organizations face when considering the adoption of AI systems. On the one hand, AI systems offer significant potential benefits, such as real-time operational input and improved decision-making. On the other hand, organizations are confronted with significant challenges, primarily related to the perceived complexity of AI systems.

Part 1 of the series identified three main categories of complexity: ecosystem integration complexity, engineering complexity, and operational complexity. Ecosystem integration complexity arises from the need to integrate AI systems with various input and output systems, which can involve multiple data sources, targets, cadences, and types. Engineering complexity stems from the extensive pre-processing, efficiency requirements, and diverse frameworks and languages used in AI systems. Operational complexity involves ensuring the availability and uptime of AI systems, maintaining and evolving them, and implementing and ensuring data governance.

Part 2 of the series introduced MLOps as a solution to the paradox of AI complexity. MLOps is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production.

In Part 3, we will delve deeper into the benefits of MLOps and explore how it can help organizations overcome the challenges of AI complexity. MLOps has implications for the people and processes that are used to build AI systems, and in this third article we will discuss how we can maximize the chances of such a system being successfully implemented. We will consider the key components of an MLOps platform, articulate how to use these components as forcing functions, and provide practical tips and best practices for implementing MLOps in organizations.

At Hopsworks, we have worked with many clients that have successfully implemented an MLOps system - and they have done so using the Hopsworks AI Lakehouse.

The Hopsworks AI Lakehouse acts as a centralized repository around which all the different process steps in a machine learning system can be built. This will then help people, and force them to some degree, to adhere to the principles of MLOps by leveraging the AI Lakehouse:
  • It brings together and integrates all the moving parts of an ML system, so that the individual engineer does not need to duct-tape a system together themselves.
  • It integrates online and offline machine learning data into one coherent metadata-based system.
  • It provides easy, API-based access to the right data that is needed for the ML processes to take place. For example, it ensures point-in-time correct joins that prevent leakage of future data into the predictive model that we want to use.
  • It provides automated deployment procedures that allow DevOps best practices to be adopted by machine learning experts.
  • It guarantees the reproducibility and auditability that is (or will be) required by internal or external AI rules and regulations.
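
As a small illustration of that point-in-time correctness, here is a hedged sketch of how two feature groups with event times can be joined into training data via the Hopsworks feature view API; the feature group and column names are placeholders of my own choosing:

    import hopsworks

    project = hopsworks.login()
    fs = project.get_feature_store()

    trans = fs.get_feature_group("transactions", version=1)  # event_time="ts"
    profiles = fs.get_feature_group("profiles", version=1)   # event_time="updated"

    # The join is resolved as-of each training example's event time,
    # so no feature value from the future can leak into the model.
    query = trans.select_all().join(profiles.select(["credit_score"]))
    fv = fs.get_or_create_feature_view(
        name="fraud_pit", version=1, query=query, labels=["is_fraud"],
    )
    X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)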

The core idea of the AI Lakehouse, therefore, becomes that of using centralized data as the forcing function for implementing all the required tools, processes and team interactions that we need to make AI a much more predictable, productive, and efficient endeavour.

Just to make sure we all understand: a forcing function is a factor or event that compels or influences a change or action. It is often used in project management, strategy development, and organizational change to create a sense of urgency and drive progress. The goal is to encourage individuals or teams to take action by creating a situation where they feel compelled to do so. Forcing functions can be internal or external. Internal forcing functions are typically related to the organization's goals, objectives, or values. External forcing functions are often driven by market conditions, technological advancements, or regulatory changes. Forcing functions can be powerful tools for driving change and innovation. By creating a sense of urgency and compelling action, they can help organizations overcome inertia and achieve their goals more quickly.

And that is exactly what the AI Lakehouse can provide in the context of the AI Paradox, with MLOps as a solution strategy to that paradox: all the different stakeholders will organize their processes around the AI Lakehouse, which offers a central point of control for implementing MLOps best practices. This makes MLOps a reality, and allows us to use the power of these principles to resolve the AI paradox that we addressed in the first of these three articles. Effectively, AI Lakehouses make AI systems easier to implement, faster to deploy, more economically viable, and structurally compliant with the regulations that apply to your organization. That is a pretty great thing!

Thank you for taking the time to read these three articles - I hope they will allow you to shape your thinking around AI projects.

Thursday, 13 June 2024

Part 2 of 3: MLOps is a solution to the paradox of AI


In case you missed it: please find part 1 of this article series over here.

In that first part, we highlighted the existing paradox in the AI industry, where organizations recognize the massive potential benefits of AI systems like ChatGPT but face significant challenges due to perceived complexities. These complexities include
  • ecosystem integration complexities involving multiple data sources, targets, cadences, and types;
  • engineering complexities such as pre-processing, efficiency, and framework choices; and
  • operational complexities related to availability, maintenance, and data governance.
To address this paradox, organizations must strike a balance between leveraging the benefits of AI while managing its complexities effectively, with successful companies likely to gain a competitive advantage. In this second part of the article series, I would like to propose a particular solution to this balancing act. This solution is called MLOps.

About MLOps

MLOps (Machine Learning Operations) is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. It encompasses the entire process of developing, deploying, and maintaining ML models, from data collection and preparation to model training, testing, deployment, and monitoring. The goal of MLOps is to ensure that ML models are reliable, scalable, and perform as expected in production environments. It also aims to streamline the ML development process, making it more efficient and reproducible.

MLOps is becoming increasingly important as organizations adopt ML systems at scale, and it borrows many of its ideas and concepts from the DevOps (Development and Operations) practices of the software engineering world. By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production. MLOps can help us productionize AI systems more efficiently, by
  • automating the different steps in the process
  • aligning the different teams that are responsible for these different steps
Automating the processes and aligning the teams around these processes is what is going to allow organizations to seize the value that is being promised by AI, while at the same time managing and potentially even reducing the complexity of these systems. MLOps is the antidote to the potential poisoning of AI technology with unwieldy complexity.

So to summarize: the thesis of this second part of the article series is that a solution to the paradox we are currently seeing in the AI industry is possible, and that this solution presents itself as a series of technological tools, processes and team alignment best practices. The question therefore becomes how we can easily and efficiently implement the practices of MLOps. This is not likely to be a trivial task, as we will be touching tools, processes and people during the implementation of MLOps in our organization. This, therefore, is what we will be discussing in the third and final part of this article series.

AI Lakehouse as a Forcing Function for Production AI systems (intro and part 1/3)



At Hopsworks, we have been developing awesome technologies that make it possible to develop powerful AI systems efficiently and effectively, in a way that also safeguards the potentially privacy-sensitive data that we expose to them. In this 3-part series of articles, I would like to articulate and summarize the reasons why these technologies are of interest to our customers, so that others can benefit from them as well.

I will do so in 3 parts:

  • Part 1: Explaining the particular paradox of AI that organizations are facing today, and that is potentially slowing down their willingness to engage with this powerful new class of technologies.
  • Part 2: Investigating how we can break the paradox, and overcome the barriers that hold us back. In this part, we will focus a lot on a particular class of IT systems grouped together as MLOps systems, and explain how they help in overcoming the seeming paradox.
  • Part 3: Articulating how MLOps systems need to be architected in a particular way in order for them to drive behavior and achieve successful implementation. This will focus on the observation that a feature store, a central data repository for all MLOps systems and the AI systems that they enable, can act as a forcing function for successful implementations.
So let’s explore these topics, in a three-part series. I will be publishing these parts over the next few days. But let’s start with today’s article - Part 1.

Part 1 of 3: The Paradox of AI

Let’s start with an interesting observation that we hear from almost every single Hopsworks user, prospect or customer: there is something paradoxical about the current state of the AI industry. On the one hand, and especially since the rise of generative AI systems like ChatGPT and its siblings, people seem very much convinced that AI has massive potential benefits that could impact every organization, big or small. Listing these benefits is almost impossible to do exhaustively, but at a high level we see benefits related to

  1. Increased Data Processing capacity: AI enables the processing of vast amounts of data, allowing organizations to gain valuable insights and make informed decisions.
  2. Faster and Better Decision-Making: AI-powered systems can analyze data in real-time, enabling faster and more accurate decision-making.
  3. Improved Efficiency and Innovation: AI drives efficiency by automating repetitive tasks and fostering innovation by providing new solutions to complex problems.
  4. Moving up the value pyramid: AI systems are delivering real-time operational input, enabling organizations to respond quickly to changing conditions.
It’s almost impossible to ignore the potential of these systems - yet at the same time we see some real and important challenges that are preventing organizations from making significant commitments to them. For the most part, these challenges seem to be related to the perceived Complexity of AI systems, which manifest themselves in a number of different ways:

Ecosystem Integration Complexity:

When we look at these AI systems, we see that their integration with the input and output systems around them has become significantly more complex:
  • Input Data from Different Sources: AI systems often integrate data from multiple systems and technology layers, leading to increased complexity.
  • Output Data to Different Targets: AI systems often output data to multiple target systems and technology layers, leading to increased complexity.
  • Data with Different Cadences: The data being integrated may have different timing requirements for receiving data from and sending data to the surrounding systems, further complicating the integration process.
  • Data of Various Types and Schemas: AI systems need to handle different data types and schemas, such as pictures, audio, and time series, adding to the complexity.

Engineering Complexity:

Also from an engineering perspective, there is quite a bit of complexity to be reckoned with. AI systems often come with
  • Multiple Layers of Pre-Processing: AI models require extensive pre-processing and transformations to ensure data consistency and accuracy.
  • Real Requirements on Efficiency and Speed of Delivery: AI systems need to be efficient and deliver results quickly, which can be challenging to achieve.
  • Multiple Frameworks and Languages: The AI landscape comprises various frameworks and languages, making it difficult to choose the right ones for a particular project.
All of these add engineering complexity to the AI system.

Operational Complexity:

Last but not least, we also see our Hopsworks users grappling with the complexity of operationally managing these AI systems from start to finish. This means
  • Guaranteeing Availability and Uptime: Ensuring the availability and uptime of AI systems is crucial for continuous operation.
  • Maintaining and Evolving the systems: AI systems require regular maintenance and evolution to keep up with changing requirements and technological advancements.
  • Implementing and ensuring Data Governance: AI systems need proper data governance to comply with current and upcoming regulations, such as GDPR and the EU AI Act. This involves versioning, metadata management, lineage tracking, and monitoring.
So to summarize: the paradox confronts AI’s significant benefits with significant complexity. The marketplace will force organizations to balance these competitively - and the companies that best succeed in seizing the benefits while managing the complexities are very likely to end up on top. This is why we would like to present a credible antidote to this paradox of AI in part 2 of this article series.

Tuesday, 27 February 2024

The 3 Whys of Feature Stores for Machine Learning & AI

Why you need a feature store, why you should buy (not build) one, and why you should consider Hopsworks

Start with 3 Why’s

Quite a few years ago, I read a really intriguing book by Simon Sinek: Start with Why. The subtitle actually gives away the essence of the book: How Great Leaders Inspire Everyone to Take Action. Spoiler alert: they do so by explaining WHY something needs to get done, before explaining how and what needs to get done. It’s a very simple but, in my experience, important and intuitive way to effectively communicate something to any audience. Whether you are communicating to customers, co-workers or your kids - the WHY usually paves the way for much smoother discussions and actions. Sinek talks about the Golden Circle, which outlines how starting from the inside (why) and working towards the outside (what) is an effective method for any communication strategy.

Since I started working for Hopsworks, I have had this framework in the back of my mind, as I got to talk to many more users, customers and partners that have been adopting the amazing technology that the team has built. In these discussions, it actually became clear to me that there are three different “WHY” questions that we need to answer for our community, if we want to be successful in the marketplace. At the risk of misusing the golden circle visualization, I have tried to put these 3 questions in 3 concentric circles in the figure below:

As you can see, you move from the OUTER circle to the INNER circle, and you try to address the following 3 questions:
  • Why would you consider using a feature store architecture in the first place? If you would find enough solid reasons for doing so, you would proceed to the next “Why” question, being:
  • Why would you NOT BUILD, but instead BUY a feature store for your data platform architecture? And if you find enough reasons to BUY and not build, then you would consider the last and final “Why” question, being:
  • Why would you specifically choose to buy the Hopsworks feature store for your data platform architecture?
Only if we understand the potential answers to these questions, in all their variations, can we successfully meet the customer’s expectations and provide value in their implementation. That’s the core idea behind this thought process.

So let’s explore these three WHY questions, and their answers, in a bit more detail.

1. Why Consider a Feature Store for ML/AI?

It’s pretty clear that not everyone needs a Feature Store. A data platform like that is quite specific to ML/AI workloads, and would only realistically be required or used by organizations and teams that have a deep understanding of, and investment in, the relatively new fields of machine learning and artificial intelligence. If all you have in your environment is an early-stage experiment with ML/AI technology, then most likely you do not yet have a need for a feature store - seems logical, right? So: what are the conditions under which you would want to consider it? What are the reasons for implementing a Feature Store in your organization? Let’s explore this!

Many of these reasons were actually outlined in an earlier article on the Hopsworks Blog, and I believe that the reasons for considering a Feature Store are accurately described there. In this overview, I would like to make the distinction between technical and non-technical reasons (as in, business, organizational or competency-related ones). Let’s dig into it:
  • Technical reasons for considering a feature store:

    • Existing models running in production are expensive: they are hard to debug, review and upgrade, and they are bespoke systems that are difficult and costly to maintain. There’s a growing body of evidence that ML/AI systems that do NOT have a feature store architecture in the backend are simply too expensive because of that - see the other points.
    • Monitoring production pipelines is challenging, or impossible. The data that powers AI changes over time, and identifying when there are significant changes that require retraining your AI is not easy.
    • Difficulties in managing the lifecycle of feature data, including the tracking of versions and historical changes. This is an elementary requirement for all regulated data processing environments - and a key reason why feature stores align so well with these industries’ requirements.
    • Feature data is not centrally managed; it is duplicated, features are re-engineered, and generally data is not reused across the organization.
  • Non-technical reasons for considering a feature store:

    • Valuable models are created but once the experimentation stage is over they do not bridge the chasm to operations - the models do not consistently generate revenue or savings. This is all about getting the models to deliver value, consistently.
    • No cohesive governance in the storage and use of AI assets (feature data and models), everything is done in a bespoke manner, leading to compliance risks.
    • Slow time-to-market of AI models, and a general inability to provide very fresh feature data or handle real-time data for ML models, which is critical for industries like finance, retail, or logistics where real-time insights can add significant business value. This point is all about the speed with which the data science team can develop their models and bring them to life in a production environment.
    • Hard to derive direct business value from the models: they exist in isolated environments that do not directly influence business operations. This obviously makes it much harder to justify the investments required to develop and operationalize the models.
    • Slow ramp-up time when onboarding new talent into the ML teams. Sharing available AI assets is complex because operational knowledge is held by a few individuals or groups.
We have summarized these reasons in the outer circle of the figure below. I am sure that there are other reasons that could potentially be more applicable to your specific environment - but these are the higher level ones that we see time and time again in our Hopsworks user discussions.

So now we know and understand why an organization requires a feature store - great! But that does not necessarily mean that they will actually go out to look for one in the marketplace! Many organizations, especially the “digital natives” that are tuned in to the latest technology trends (like ML/AI) nowadays have a tendency to at least consider building a software component themselves - instead of buying one. This is a good and worthwhile consideration, as it seems clear to me that there is a minimum of scale and maturity required before wanting to go “all-in” on this brand new technology. For many people, a homegrown solution might be “good enough”.

So how do we decide whether a roll-your-own solution is good enough? Let’s consider some criteria.

2. Why NOT BUILD, but BUY a Feature Store?


In the second layer of the diagram below, we consider some of the reasons and criteria that would warrant looking at the BUY option instead of the BUILD option. Some of these reasons have also been covered in a previous article, but let's revisit them here.

The most common reasons for buying and not building a feature store are:
  • Maintenance Burden & Total Cost of Ownership (TCO): clearly, this is something that every mature IT organization will consider. Ultimately, this is related to the amount of technical debt that the organization is willing to incur, given the significant costs that could be associated with it down the line. It’s important to consider not just the short-term, but also the longer-term implications of a build vs. buy decision.
  • Technical complexity: clearly, a piece of infrastructure software like a Feature Store, which will underpin all ML/AI applications that the organization chooses to develop, has a significant amount of technical complexity associated with it. It’s important to consider this, and to investigate the most crucial domains in which a “build” approach could encounter unexpected technical challenges.
    • Offline / Online sync: one of the key characteristics of a feature store is that it contains both the historical data of a feature dataset and the most recent values. Both have their use and purpose, and they need to be kept in sync inside the feature repository. Feature Stores like Hopsworks do this for you, but in a “build” scenario you would need to take this into account and do all the ETL data lifting yourself.
    • Reporting and search: in any large machine learning system where you have dozens/hundreds/thousands of models in production, you would want and need the feature data to be findable, accessible, interoperable, and reusable - according to the so-called “F.A.I.R.” principles that we have described in this post. This seems easy - but if you consider all of the different combinations that you could have between versions of datasets, pipelines and models, it is clear that this is not a trivial engineering assignment.
    • Metadata for versioning and lineage: similar to the previous point, a larger ML/AI platform that hosts a larger number of models will need metadata for its online and offline datasets, and will need to accurately keep track of the versions and lineage of the data. This will increasingly become a requirement, as governance for ML/AI systems will cease to be optional. Implementation of and compliance with the EU AI Act will simply mandate this - and the complexity of implementing it at scale is significant.
    • Time-travel and PITC joins: if we want to make the predictive results of our ML/AI systems explainable, we will need to be able to offer so-called “time travel” capabilities. This means that we can look at how a particular model yielded specific results based on the inputs that it received at a specific point in time. Feature Stores will need to offer this capability, on top of the requirement to guarantee that the models yield accurate and correct information at a given point in time - something we call “Point-in-time correctness”. Again, the technical complexity of implementing this yourself is not to be underestimated.
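
To illustrate what “time travel” can look like in practice, here is a minimal sketch using the Hopsworks API; the feature group name and timestamp are placeholder assumptions. The point is that a homegrown store would have to reimplement this capability from scratch:

    import hopsworks

    project = hopsworks.login()
    fs = project.get_feature_store()
    fg = fs.get_feature_group("transactions", version=1)

    # Read the feature data exactly as it stood at a past point in
    # time, e.g. to explain what a model saw when it produced a
    # historical prediction.
    snapshot_df = fg.select_all().as_of("2024-01-01 00:00:00").read()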

With that, we hope to have outlined some of the key reasons that you should consider buying, not building your feature store solution. At the end of the day this is a strategic decision that will be different for every organization - as long as the question is honestly asked and answered.

3. Why buy the Hopsworks Feature Store?

Last but not least, we would also like to offer the readers that have a) decided that they need a feature store, and b) decided that they want to buy such a critical piece of infrastructure rather than build it themselves, a perspective on why Hopsworks might be the best choice for their environment. In line with the previous “Golden Circle” visuals, we now get to the “inner” circle of the diagram:




Obviously we are conscious of the poor readability of the diagram, so here’s a cut-out that is a bit more readable.


As you can see, we think that there are essentially 4 main reasons why the Hopsworks Feature Store solution could be the best possible fit for your environment. Let’s discuss each of these briefly:
  1. Performance and HA: Hopsworks has been working on the Feature Store for a number of years, with a top team of academic and industry specialists. We have integrated and embedded the best possible technologies on the market, like for example RonDB, and have proven that this currently gives us unparalleled performance. Take a look at these open benchmarks for yourself, and you will see that Hopsworks is in a league of its own with regards to performance. On top of that, we have been leveraging expertise in systems High Availability to develop a feature store solution that can withstand the most demanding workloads.
  2. Flexible deployment (serverless / cloud / on-prem): Hopsworks is the only solution on the market that offers you the choice of deployment options that is best-suited for your specific environment. You can start small with a multi-tenant-based serverless environment, grow into a managed cloud deployment in your AWS / Azure / GCP account, or even repatriate the workload onto your own, on-premise hardware. No other solution offers this, today.
  3. Governance and compliance: Hopsworks has taken great pains to develop industry-leading governance capabilities into the product: versioning, lineage, time travel, search, security, monitoring and reporting - all of the advanced functionalities that a compliant solution will be required to deliver, now and in the future.
  4. Value for money, TCO: Hopsworks believes that in order for ML/AI to be successful, it needs to deliver value, and it needs to offer its users a clear Return on Investment. That means that the solution needs to be available at a reasonable price, and that consumption-based metrics cannot always be used for billing. We need to allow for testing, training, experimentation, learning and development - without requiring the customer to empty their pockets from day one, all while managing the total cost of ownership of the solution.
This brings us to the end of this article, where we have tried to discuss the “3 Whys” of Feature Store implementations. We hope this was a useful discussion, and we are happy to discuss it with you as well. No doubt we can make the argument even more detailed and refined, together.

All the best

Rik