Sunday 23 June 2024



I just posted this on Twitter as @rvanbruggen

This was the text:
RT @jim_dowling: RAG using tabular data with function calling - my talk from @pydatalondon is now available on youtube:…

from Twitter

June 23, 2024 at 08:20AM

Saturday 22 June 2024

Friday 21 June 2024

Thursday 20 June 2024

Wednesday 19 June 2024

Tuesday 18 June 2024

I just posted this on Twitter as @rvanbruggen

This was the text:
A #Recommendersystem that does 1000000 recommendations per second. That's proper #scale. You should come and see how we did it with @hopsworks. The team told me that _generating_ the load was harder than _handling_ it :) ...

from Twitter

June 18, 2024 at 03:08PM

I just posted this on Twitter as @rvanbruggen

This was the text:
RT @jim_dowling: Tomorrow, i will give a webinar on scaling recommender systems to Tiktok scale (1m ops/sec). Part of it will be how to des…

from Twitter

June 18, 2024 at 03:07PM



Monday 17 June 2024

Sunday 16 June 2024

Saturday 15 June 2024

Friday 14 June 2024

I just posted this on Twitter as @rvanbruggen

This was the text:
RT @hopsworks: Thank you SIGMOD and to everyone who attended the presentation for our paper on feature stores for ML yesterday! 🫶 If you h…

from Twitter

June 14, 2024 at 10:52AM

I just posted this on Twitter as @rvanbruggen

This was the text:
Looking forward to a couple of fun days at #pydata #london. Come see us at our @hopsworks booth if you are in the area - and look at the schedule at ...

from Twitter

June 14, 2024 at 10:51AM

Part 3: An AI Lakehouse can be a forcing function for MLOps

In the first and second parts of this article series, we explored the paradox that organizations face when considering the adoption of AI systems. On the one hand, AI systems offer significant potential benefits, such as real-time operational input and improved decision-making. On the other hand, organizations are confronted with significant challenges, primarily related to the perceived complexity of AI systems.

Part 1 of the series identified three main categories of complexity: ecosystem integration complexity, engineering complexity, and operational complexity. Ecosystem integration complexity arises from the need to integrate AI systems with various input and output systems, which can involve multiple data sources, targets, cadences, and types. Engineering complexity stems from the extensive pre-processing, efficiency requirements, and diverse frameworks and languages used in AI systems. Operational complexity involves ensuring the availability and uptime of AI systems, maintaining and evolving them, and implementing and ensuring data governance.

Part 2 of the series introduced MLOps as a solution to the paradox of AI complexity. MLOps is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production.

In Part 3, we will delve deeper into the benefits of MLOps and explore how it can help organizations overcome the challenges of AI complexity. MLOps has implications for the people and processes involved in building AI systems, and in this third article we will discuss how to maximize the chances of such a system being implemented successfully. We will consider the key components of an MLOps platform, articulate how to use these components as forcing functions, and provide practical tips and best practices for implementing MLOps in organizations.

At Hopsworks, we have worked with many clients that have successfully implemented and used an MLOps system - and they have done so using the Hopsworks AI Lakehouse.

The Hopsworks AI Lakehouse acts as a centralized repository around which all the different process steps in a machine learning system can be built. This will then help people, and force them to some degree, to adhere to the principles of MLOps by leveraging the AI Lakehouse:
  • It brings together and integrates all the moving parts of an ML system, so that the individual engineer does not need to duct-tape a system together by him- or herself.
  • It integrates online and offline machine learning data into one coherent metadata-based system.
  • It provides easy, API-based access to the right data that is needed for the ML processes to take place. For example, it ensures point-in-time-correct joins that prevent leakage of future data into the predictive model that we want to use.
  • It provides automated deployment procedures that allow DevOps best practices to be adopted by machine learning experts.
  • It guarantees the reproducibility and auditability that are (or will be) required by internal or external AI rules and regulations.
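
To make the point-in-time-correct join mentioned above a bit more concrete, here is a minimal sketch in plain Python. It is not the Hopsworks API: the data, the `feature_as_of` helper and the feature name `orders_last_30d` are all hypothetical and only serve to illustrate the idea. For every label event, we pick the most recent feature value that was known on or before the event date, so that values recorded later can never leak into the training data.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature history per customer: (valid_from, orders_last_30d),
# kept sorted by date. These names and values are illustrative assumptions.
feature_history = {
    1: [(date(2024, 6, 8), 5), (date(2024, 6, 12), 2)],
    2: [(date(2024, 6, 11), 7), (date(2024, 6, 13), 9)],
}

# Hypothetical label events: (customer_id, event_date, churned).
label_events = [
    (1, date(2024, 6, 10), 0),
    (2, date(2024, 6, 12), 0),
    (1, date(2024, 6, 14), 1),
]


def feature_as_of(customer_id, event_date):
    """Return the latest feature value recorded on or before event_date, or None."""
    history = feature_history.get(customer_id, [])
    dates = [valid_from for valid_from, _ in history]
    # bisect_right counts the entries dated on or before event_date.
    idx = bisect_right(dates, event_date)
    return history[idx - 1][1] if idx > 0 else None


# Build training rows using only feature values that were known at event time.
training_rows = [
    (customer_id, event_date, feature_as_of(customer_id, event_date), churned)
    for customer_id, event_date, churned in label_events
]

for row in training_rows:
    print(row)
```

Note that customer 2's event on 12 June picks up the value recorded on 11 June and ignores the one recorded on 13 June, which was not yet known at that time. A naive join on the latest feature value would have used the future value instead.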

The core idea of the AI Lakehouse, therefore, becomes that of using centralized data as the forcing function for implementing all the required tools, processes and team interactions that we need to make AI a much more predictable, productive, and efficient endeavour.

Just to make sure we all understand: a forcing function is a factor or event that compels or influences a change or action. It is often used in project management, strategy development, and organizational change to create a sense of urgency and drive progress. The goal is to encourage individuals or teams to take action by creating a situation where they feel compelled to do so. Forcing functions can be internal or external. Internal forcing functions are typically related to the organization's goals, objectives, or values. External forcing functions are often driven by market conditions, technological advancements, or regulatory changes. Forcing functions can be powerful tools for driving change and innovation. By creating a sense of urgency and compelling action, they can help organizations overcome inertia and achieve their goals more quickly.

And that is exactly what the AI Lakehouse can provide in the context of the AI Paradox and MLOps as a solution strategy to that paradox: all different stakeholders will be organizing their processes around the AI Lakehouse, offering a central point of control for MLOps best practices to be implemented. This makes MLOps a reality, and allows us to use the power of these principles to resolve the AI paradox that we addressed in the first part of these three articles. Effectively, AI Lakehouses make AI systems easier to implement, faster to deploy, more economically viable and structurally compliant with your organizational context’s regulations. That is a pretty great thing!

Thank you for taking the time to read these three articles - I hope they will allow you to shape your thinking around AI projects.



Thursday 13 June 2024



Part 2 of 3: MLOps is a solution to the paradox of AI

In case you missed it: please find part 1 of this article series over here.

In Part 1 of this article series, we highlighted the paradox in the AI industry: organizations recognize the massive potential benefits of AI systems like ChatGPT, but face significant challenges due to perceived complexities. These complexities include
  • ecosystem integration complexities involving multiple data sources, targets, cadences, and types;
  • engineering complexities such as pre-processing, efficiency, and framework choices; and
  • operational complexities related to availability, maintenance, and data governance.
To address this paradox, organizations must strike a balance between leveraging the benefits of AI and managing its complexities effectively, with successful companies likely to gain a competitive advantage. In this second part of the article series, I would like to propose a particular solution to this balancing act. This solution is called MLOps.

About MLOps

MLOps (Machine Learning Operations) is a set of practices and tools that help organizations manage the lifecycle of machine learning (ML) models. It encompasses the entire process of developing, deploying, and maintaining ML models, from data collection and preparation to model training, testing, deployment, and monitoring. The goal of MLOps is to ensure that ML models are reliable, scalable, and perform as expected in production environments. It also aims to streamline the ML development process, making it more efficient and reproducible.

MLOps is becoming increasingly important as organizations adopt ML systems at scale. It borrows many of its ideas and concepts from the world of software engineering, in particular from the practices known as DevOps (Development and Operations). By implementing MLOps practices, organizations can improve the quality and reliability of their ML models, reduce the risk of model failures, and accelerate the time it takes to bring ML models to production. MLOps can help us productionize AI systems more efficiently, by

  • automating the different steps in the process, and
  • aligning the different teams that are responsible for these different steps.

Automating the processes and aligning the teams around these processes is what is going to allow organizations to seize the value that is being promised by AI, while at the same time managing and potentially even reducing the complexity of these systems. MLOps is the antidote to the potential poisoning of AI technology with unwieldy complexity.

So to summarize: the thesis of this second part of the article series is that there is a possible solution to the paradox that we are currently seeing in the AI industry, and that this solution presents itself as a combination of technological tools, processes and team alignment best practices. The question therefore becomes how we can easily and efficiently implement the practices of MLOps. This is unlikely to be a trivial task, as we will be touching tools, processes and people during the implementation of MLOps in our organization. This, therefore, is what we will be discussing in the third and final part of this article series.

AI Lakehouse as a Forcing Function for Production AI systems (intro and part 1/3)

At Hopsworks, we have been developing awesome technologies that make it possible to develop powerful #AI systems efficiently and effectively, in a way that also safeguards the potentially privacy-sensitive data that we expose to them. In this 3-part series of articles, I would like to articulate and summarize the reasons why these technologies are of interest to our customers, so that others can benefit from them as well.

I will do so in 3 parts:

  • Part 1: Explaining the particular paradox of AI that organizations are facing today, and that is potentially slowing down their willingness to engage with this powerful new class of technologies.
  • Part 2: Investigating how we can break the paradox, and overcome the barriers that hold us back. In this part, we will focus a lot on a particular class of IT systems grouped together as MLOps systems, and explain how they help in overcoming the seeming paradox.
  • Part 3: Articulating how MLOps systems need to be architected in a particular way in order for them to drive behavior and achieve successful implementation. This will focus on the observation that a feature store, a central data repository for all MLOps systems and the AI systems that they enable, can act as a forcing function for successful implementations.
So let’s explore these topics, in a three-part series. I will be publishing these parts over the next few days. But let's start with today's article - Part 1.

Part 1 of 3: The Paradox of AI

Let’s start with an interesting observation that we hear from almost every single Hopsworks user, prospect or customer: there is something paradoxical about the current state of the AI industry. On the one hand, and especially since the rise of generative AI systems like ChatGPT and its siblings, people seem very much convinced that AI has massive potential benefits that could impact every organization, big or small. Listing these benefits is almost impossible to do exhaustively, but at a high level we see benefits related to

  1. Increased Data Processing capacity: AI enables the processing of vast amounts of data, allowing organizations to gain valuable insights and make informed decisions.
  2. Faster and Better Decision-Making: AI-powered systems can analyze data in real-time, enabling faster and more accurate decision-making.
  3. Improved Efficiency and Innovation: AI drives efficiency by automating repetitive tasks and fostering innovation by providing new solutions to complex problems.
  4. Moving up the value pyramid: AI systems are delivering real-time operational input, enabling organizations to respond quickly to changing conditions.
It’s almost impossible to ignore the potential of these systems - yet at the same time we see some real and important challenges that are preventing organizations from making significant commitments to them. For the most part, these challenges seem to be related to the perceived complexity of AI systems, which manifests itself in a number of different ways:

Ecosystem Integration Complexity:

When we look at these AI systems, we see that integrating them with the input and output systems around them has become significantly more complex:
  • Input Data from Different Sources: AI systems often integrate data from multiple systems and technology layers, leading to increased complexity.
  • Output Data to Different Targets: AI systems often output data to multiple target systems and technology layers, leading to increased complexity.
  • Data with Different Cadences: The data being integrated may come with different timing requirements for when it needs to be received from or sent to the surrounding systems, further complicating the integration process.
  • Data of Various Types and Schemas: AI systems need to handle different data types and schemas, such as pictures, audio, and time series, adding to the complexity.

Engineering Complexity:

Also from an engineering perspective, there is quite a bit of complexity to be reckoned with. AI systems often come with
  • Multiple Layers of Pre-Processing: AI models require extensive pre-processing and transformations to ensure data consistency and accuracy.
  • Real Requirements on Efficiency and Speed of Delivery: AI systems need to be efficient and deliver results quickly, which can be challenging to achieve.
  • Multiple Frameworks and Languages: The AI landscape comprises various frameworks and languages, making it difficult to choose the right ones for a particular project.
All of these add engineering complexity to the AI system.

Operational Complexity:

Last but not least, we also see our Hopsworks users grappling with the complexity of operationally managing these AI systems from start to finish. This means
  • Guaranteeing Availability and Uptime: Ensuring the availability and uptime of AI systems is crucial for continuous operation.
  • Maintaining and Evolving the systems: AI systems require regular maintenance and evolution to keep up with changing requirements and technological advancements.
  • Implementing and ensuring Data Governance: AI systems need proper data governance to comply with current and upcoming regulations, such as GDPR and the EU AI Act. This involves versioning, metadata management, lineage tracking, and monitoring.
So to summarize: the paradox confronts AI’s significant benefits with significant complexity. The marketplace will force organizations to balance these competitively - and companies that best succeed in seizing the benefits while managing the complexities are very likely to end up on top. This is why we would like to present a credible antidote to this paradox of AI in part 2 of this article series.

I just posted this on Twitter as @rvanbruggen

This was the text:
RT @jim_dowling: @Javierdlrm presented the 1st feature store system paper at a top tier conference - SIGMOD. Highlights: * read data from…

from Twitter

June 13, 2024 at 09:59AM

Wednesday 12 June 2024

Tuesday 11 June 2024

Monday 10 June 2024

Sunday 9 June 2024

Saturday 8 June 2024

Friday 7 June 2024



I just posted this on Twitter as @rvanbruggen

This was the text:
I had such a great time talking to @MaartenSukel for our @hopsworks #5minuteinterviews. Maarten wrote a fantastic book (in Dutch) about the "AI Revolution" that focuses a lot on the societal impact of #ai and #ml - episode on this link:

from Twitter

June 07, 2024 at 11:30AM

I just posted this on Twitter as @rvanbruggen

This was the text:
Last night we had a great #meetup in #paris where @SirOibaf demonstrated his use of @hopsworks platform to solve #stockholm's commuter pains with #ai and @MistralAI 's #llm. Thanks to @aicampai for setting it up (and even sorting out a pizza emergency!)!

from Twitter

June 07, 2024 at 11:16AM

Thursday 6 June 2024

Wednesday 5 June 2024

Tuesday 4 June 2024

Monday 3 June 2024



I just posted this on Twitter as @rvanbruggen

This was the text:
This Thursday, we are taking part in an #exceptional #meetup in #Paris with @hopsworks and @aicampai - join us!

from Twitter

June 03, 2024 at 09:01AM

Sunday 2 June 2024

Saturday 1 June 2024