Saturday 14 September 2024
Friday 13 September 2024
I just posted this on Twitter as @ rvanbruggen
RT @kdnuggets: Introducing the AI Lakehouse In order for Lakehouse to become a unified data layer for both analytics and AI, it needs to be…
RT @kdnuggets: Introducing the AI Lakehouse In order for Lakehouse to become a unified data layer for both analytics and AI, it needs to be…
— Rik Van Bruggen (@rvanbruggen) Sep 13, 2024
from Twitter https://twitter.com/rvanbruggen
September 13, 2024 at 10:25AM
via IFTTT
I just posted this on Twitter as @ rvanbruggen
Next week we are going to be at #pydata #amsterdam with @hopsworks . Super looking forward to lots of great discussions on our booth, and of course to @Javierdlrm's talk on thursday afternoon. Hope to see you there! (https://t.co/FdwpBjtEmn)
Next week we are going to be at #pydata #amsterdam with @hopsworks . Super looking forward to lots of great discussions on our booth, and of course to @Javierdlrm's talk on thursday afternoon. Hope to see you there! (https://t.co/FdwpBjtEmn)
— Rik Van Bruggen (@rvanbruggen) Sep 13, 2024
from Twitter https://twitter.com/rvanbruggen
September 13, 2024 at 10:05AM
via IFTTT
I just posted this on Twitter as @ rvanbruggen
RT @AMULETAnalytics: Introducing The AI Lakehouse - https://t.co/nYVUtpx5b8 #AI #LakeHouse @hopsworks
RT @AMULETAnalytics: Introducing The AI Lakehouse - https://t.co/nYVUtpx5b8 #AI #LakeHouse @hopsworks
— Rik Van Bruggen (@rvanbruggen) Sep 13, 2024
from Twitter https://twitter.com/rvanbruggen
September 13, 2024 at 08:22AM
via IFTTT
Thursday 12 September 2024
Wednesday 11 September 2024
Tuesday 10 September 2024
Monday 9 September 2024
Sunday 8 September 2024
Saturday 7 September 2024
Friday 6 September 2024
Thursday 5 September 2024
Wednesday 4 September 2024
Tuesday 3 September 2024
Monday 2 September 2024
Sunday 1 September 2024
Saturday 31 August 2024
Friday 30 August 2024
Thursday 29 August 2024
Wednesday 28 August 2024
Tuesday 27 August 2024
Monday 26 August 2024
Sunday 25 August 2024
Saturday 24 August 2024
Friday 23 August 2024
Thursday 22 August 2024
Wednesday 21 August 2024
Tuesday 20 August 2024
Monday 19 August 2024
Sunday 18 August 2024
Saturday 17 August 2024
Friday 16 August 2024
Thursday 15 August 2024
Wednesday 14 August 2024
Tuesday 13 August 2024
I just posted this on Twitter as @ rvanbruggen
Sometimes you need people like @jim_dowling to give you a bit of a broader perspective of WHY things are best done in certain ways. @hopsworks' taxonomy for data transformations is a super useful guide. See below, and on https://t.co/emA7Nno223 https://t.co/eNzFelbRGk
Sometimes you need people like @jim_dowling to give you a bit of a broader perspective of WHY things are best done in certain ways. @hopsworks' taxonomy for data transformations is a super useful guide. See below, and on https://t.co/emA7Nno223 https://t.co/eNzFelbRGk
— Rik Van Bruggen (@rvanbruggen) Aug 13, 2024
from Twitter https://twitter.com/rvanbruggen
August 13, 2024 at 12:37PM
via IFTTT
Monday 12 August 2024
Saturday 10 August 2024
Friday 9 August 2024
I just posted this on Twitter as @ rvanbruggen
RT @hopsworks: It’s inspiring to see what users out there do with Hopsworks! 🌟 Sean Yap used the FTI pipeline architecture with Hopsworks…
RT @hopsworks: It’s inspiring to see what users out there do with Hopsworks! 🌟 Sean Yap used the FTI pipeline architecture with Hopsworks…
— Rik Van Bruggen (@rvanbruggen) Aug 9, 2024
from Twitter https://twitter.com/rvanbruggen
August 09, 2024 at 11:08AM
via IFTTT
Thursday 8 August 2024
Wednesday 7 August 2024
The benefits of Openness and Modularity in an AI Lakehouse
I have written before about some of the important paradoxical tendencies in the Machine Learning (ML) and Intelligence (AI) industries. Let’s look at it from two perspectives:
- The Developer: the top engineer that is trying to help his or her organization move forward, by using ML and AI techniques.
- The CTO: the engineering manager that is trying to oversee these efforts, and who wants to ensure that they are pursued in a technologically and economically/budgetary sound way - while complying with all the new rules and regulations that are now forthcoming.
The Developer's Context
For developers working on these types of AI/ML software projects, I think we can state some universal truths:- developers don't like architectural overhauls
- developers are opinionated about the tools that they like and use: they want to be productive, and because of this they usually prefer to stick to the tools that they know.
The CTO's Context
For CTO’s that are working on AI/ML software projects, we can also consider some universal considerations:- The CTO is responsible for making sure that his organization is constantly innovating and adding value to the business. AI and ML seem to offer massive potential for innovation for many organizations: we want to seize the opportunity, before our competitors do so. If the CTO does not act proactively, there will be a CxO on the management team that will start asking questions about the potential of AI shortly - you can bet your house on that.
- Exceptionally, this is done with "big bang" solutions that will radically change an entire architecture from scratch, in a greenfield.
- However, most people don't work in a greenfield, they work in a brownfield. Therefore, big bang approaches are not possible, and most people and organizations turn to innovate in different small incremental steps.
So: in assessing both of these perspectives, we are clearly presented with an important question: how can we figure out a way to incrementally innovate using AI/ML, provide value to our organization in an economically sound manner, while respecting the productivity requirements and current tool choices of our engineers as much as possible?
The importance of Openness and Modularity
As I have tried to explain how both the individual engineer and their CTO management have at least slightly conflicting interests in the implementation of AI and ML projects. How can we make this work then? Here’s where I think we can take a look at the software industry in general, and look at some of the technological and methodological approaches that have made software engineering so much more productive in recent years.To do so, let’s think back to where we came from: as little as two or three decades ago, waterfall-based software engineering methodologies were the standard for how you would tackle large software projects. It took books like the Mythical Man-month and the Agile Manifesto to emerge for people to start changing their approach, and to start developing and then using iterative software engineering methodologies, DevOps practices, Infrastructure-as-Code and cloud computing as critical enablers of a more modern approach. This is what we need to apply in the world of the very specific domain of AI and ML now: we need to learn how to apply these techniques in the world of AI and ML, and that will require a specific architecture to enable this.
This architecture should therefore
- NOT be a closed, all-or-nothing environment,
- NOT limit the engineers to a specific toolset,
- NOT bound you to a specific deployment environment, offered by this that or the other cloud vendor
- NOT dictate the specific framework that you should or should not use for your AI and ML
Let’s discuss this vision in a bit more detail. We’ll call this the Incremental AI Architecture (AIAI), enabled by the AI Lakehouse.
The Incremental AI Architecture (IAIA)
Today, many AI and ML projects are characterized by structures that are very different from what we want in an innovative and incremental AI Architecture. Systems are developed with bespoke tools and processes that are specific to the project in which they are implemented, and do not allow for reuse and automation in a way that an IAIA enabled by an AI Lakehouse would. Therefore, we propose a combination of best practices and tools that will allow you to move in this direction.Let’s walk through the different steps in a few succinct paragraphs.
Start with modularization of pipelines:
At Hopsworks, we have argued many times before that one of the first steps that one should take when embarking on an AI/ML project, is to conceptually break up the AI/ML system into three parts. We refer to these three parts as the “F-T-I” pipelines, where we would distinguish three distinct phases and processes in the development and productionalization of these systems. At a high level, we would consider- The Feature Pipeline, where the different data sources of the input data of the AI/ML system are wrangled into the right format(s), and then eventually persisted in a way that would enable the next “T” phase of the system to proceed.
- The Training Pipeline, where the input features would be used to train and test a an AI/ML model that would be used for making the predictions that we need for our organization.
- The Inference Pipeline, where we deploy the model that was generated by the training pipeline, and start using it to allow systems and applications to use it’s capabilities and serve the predictions of the model to the outside world. In this pipeline, we would also want to monitor for any impactful changes inside or outside the system that would warrant us to revisit the process.
Once we have split the AI/ML system into these three functional and modular parts, we can proceed to take the next step to make the system into a true production-ready AI system. This will require some education of our team, and that would be the important second step in this journey.
Educate the team on the importance of MLOps for production AI systems
In order to maximize the benefits of AI systems in production, we will need to automate as many of the FTI pipeline steps as possible. This is what we call MLOps, short for “Machine Learning Operations” and analogous to DevOps.These automations will offer
- Significant Engineering and Technical benefits that will allow us to write better software systems
- Process benefits that will allow these improved systems to be seized and maintained
- Budgetary and Financial benefits: both our operational expenditures (OpEx) and our capital expenditures (CapEx) can be reduced, enabling more innovation at a lower cost.
Do a gradual implementation of an MLOps platform
Over the years, Hopsworks has gained a lot of experience with the implementation of AI/ML systems on top of an MLOps platform, and as a consequence we have developed and released the 4.0 release of the Hopsworks Enterprise platform. We have called this the “AI Lakehouse” release, as we believe that it allows for a comprehensive but gradual implementation of everything an AI/ML project needs to become successful.Based on this experience, and using the AI Lakehouse infrastructure, we recommend that you consider the following steps in sequence:
- Implement a feature store, using the appropriate taxonomy of feature transformations for offline, batch use cases, as well as for online, real time use cases. As I have written before and elsewhere, a Feature Store is a critical piece of software infrastructure that will serve as a forcing function for MLOps.
- Implement and automate the FTI pipelines on the Feature-store platform. As you can see from the graphic above, all of the pipelines of the recommended F-T-I approach will leverage this central infrastructure, and therefore drive all users and stakeholders of the AI/ML system towards a centralized approach that will be much easier for the Alpha Developer, manageable for the Empowered CTO, and governable for the architects and compliance teams that are to ensure that rules and regulations are appropriately followed.
- Implement a model registry, thereby accurately keeping track of all the impacts and changes that are required of our production AI/ML systems. If necessary, you could combine the registry with an experiment tracking system - but we have found this to be largely unnecessary if the above steps are duly implemented. Once you have the FTI pipelines connected to a feature store, tracking experiments becomes pure overhead.
- Implement centralized model deployment for offline, batch use cases, as well as for online, real time use cases.
- Finalize with centralized feature and model monitoring, iterating back to the FTI pipelines whenever such monitoring demonstrates the need for updating and adaptation.
Wrapping up
With that, we hope to have made the case for open, modular AI and ML infrastructure more clearly. At Hopsworks, we live and breathe this philosophy day in and day out, and we would love to help you see its power for yourself and your business. Please reach out to us if you have any comments or questions - we will be happy to help out.Tuesday 6 August 2024
Sunday 4 August 2024
Saturday 3 August 2024
Thursday 1 August 2024
Wednesday 31 July 2024
I just posted this on Twitter as @ rvanbruggen
RT @jim_dowling: From our SIGMOD'24 paper, an easier read on the work we have done and are doing on making #rondb the database for real-tim…
RT @jim_dowling: From our SIGMOD'24 paper, an easier read on the work we have done and are doing on making #rondb the database for real-tim…
— Rik Van Bruggen (@rvanbruggen) Jul 31, 2024
from Twitter https://twitter.com/rvanbruggen
July 31, 2024 at 02:43PM
via IFTTT