Thursday 13 June 2024

AI Lakehouse as a Forcing Function for Production AI systems (intro and part 1/3)



At Hopsworks, we have been developing awesome technologies that make it possible to develop powerful AI systems efficiently and effectively, in a way that also safeguards the potentially privacy-sensitive data that we expose to them. In this 3-part series of articles, I would like to articulate and summarize the reasons why these technologies are of interest to our customers, so that others can benefit from them as well.

I will do so in 3 parts:

  • Part 1: Explaining the particular paradox of AI that organizations are facing today, and that is potentially slowing down their willingness to engage with this powerful new class of technologies.
  • Part 2: Investigating how we can break the paradox, and overcome the barriers that hold us back. In this part, we will focus a lot on a particular class of IT systems grouped together as MLOps systems, and explain how they help in overcoming the seeming paradox.
  • Part 3: Articulating how MLOps systems need to be architected in a particular way in order for them to drive behavior and achieve successful implementation. This will focus on the observation that a feature store, a central data repository for all MLOps systems and the AI systems that they enable, can act as a forcing function for successful implementations.
So let’s explore these topics in a three-part series. I will be publishing these parts over the next few days. But let’s start with today's article - Part 1.

Part 1 of 3: The Paradox of AI

Let’s start with an interesting observation that we hear from almost every single Hopsworks user, prospect or customer: there is something paradoxical about the current state of the AI industry. On the one hand, and especially since the rise of generative AI systems like ChatGPT and its siblings, people seem very much convinced that AI has massive potential benefits that could impact every organization, big or small. Listing these benefits exhaustively is almost impossible, but at a high level we see benefits related to:

  1. Increased Data Processing capacity: AI enables the processing of vast amounts of data, allowing organizations to gain valuable insights and make informed decisions.
  2. Faster and Better Decision-Making: AI-powered systems can analyze data in real-time, enabling faster and more accurate decision-making.
  3. Improved Efficiency and Innovation: AI drives efficiency by automating repetitive tasks and fostering innovation by providing new solutions to complex problems.
  4. Moving up the value pyramid: AI systems are delivering real-time operational input, enabling organizations to respond quickly to changing conditions.
It’s almost impossible to ignore the potential of these systems - yet at the same time we see some real and important challenges that are preventing organizations from making significant commitments to them. For the most part, these challenges seem to be related to the perceived complexity of AI systems, which manifests itself in a number of different ways:

Ecosystem Integration Complexity:

When we look at these AI systems, we see that integrating them with the input and output systems around them has become significantly more complex:
  • Input Data from Different Sources: AI systems often integrate data from multiple systems and technology layers, leading to increased complexity.
  • Output Data to Different Targets: AI systems often output data to multiple target systems and technology layers, leading to increased complexity.
  • Data with Different Cadences: The data being integrated may come with different timing requirements, since each source and target system receives or sends data on its own schedule, further complicating the integration process.
  • Data of Various Types and Schemas: AI systems need to handle different data types and schemas, such as pictures, audio, and time series, adding to the complexity.

Engineering Complexity:

From an engineering perspective, too, there is quite a bit of complexity to be reckoned with. AI systems often come with:
  • Multiple Layers of Pre-Processing: AI models require extensive pre-processing and transformations to ensure data consistency and accuracy.
  • Real Requirements on Efficiency and Speed of Delivery: AI systems need to be efficient and deliver results quickly, which can be challenging to achieve.
  • Multiple Frameworks and Languages: The AI landscape comprises various frameworks and languages, making it difficult to choose the right ones for a particular project.
All of these add engineering complexity to the AI system.

Operational Complexity:

Last but not least, we also see our Hopsworks users grappling with the complexity of operationally managing these AI systems from start to finish. This means:
  • Guaranteeing Availability and Uptime: Ensuring the availability and uptime of AI systems is crucial for continuous operation.
  • Maintaining and Evolving the systems: AI systems require regular maintenance and evolution to keep up with changing requirements and technological advancements.
  • Implementing and ensuring Data Governance: AI systems need proper data governance to comply with current and upcoming regulations, such as GDPR and the EU AI Act. This involves versioning, metadata management, lineage tracking, and monitoring.
So to summarize: the paradox sets AI’s significant benefits against its significant complexity. The marketplace will force organizations to balance the two competitively - and the companies that best succeed in seizing the benefits while managing the complexities are very likely to end up on top. This is why we would like to present a credible antidote to this paradox of AI in part 2 of this article series.
