Thursday 29 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
As a sidenote: I remember from my @neo4j days how hard it is to get a book like that pulled together. Especially an @OReillyMedia book. I also fondly remember banter with @jimwebber about the best #neo4j book. Never quite resolved that.

from Twitter https://twitter.com/rvanbruggen

February 29, 2024 at 04:33PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
This is quite a big thing for the #ai, #ml and #llm communities out there: @jim_dowling just published the 1st chapter of his new book on building #mlsystems with a #featurestore. See https://t.co/50DWGObb4r

from Twitter https://twitter.com/rvanbruggen

February 29, 2024 at 04:33PM
via IFTTT

Wednesday 28 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Have you registered for our in-person @hopsworks #event in #London yet? Take a look at https://t.co/PinOw7Cdvf for more details.

from Twitter https://twitter.com/rvanbruggen

February 28, 2024 at 11:01PM
via IFTTT

Tuesday 27 February 2024

The 3 Whys of Feature Stores for Machine Learning & AI

Why you need a feature store, why you should buy (not build) one, and why you should consider Hopsworks

Start with 3 Whys

Quite a few years ago, I read a really intriguing book by Simon Sinek: Start with Why. The subtitle gives away the essence of the book: How Great Leaders Inspire Everyone to Take Action. Spoiler alert: they do so by explaining WHY something needs to get done, before explaining how and what needs to get done. It’s a very simple but, in my experience, important and intuitive way to communicate effectively with any audience. Whether you are communicating with customers, co-workers or your kids - the WHY usually paves the way for much smoother discussions and actions. Sinek talks about the Golden Circle, which outlines how starting from the inside (why) and working towards the outside (what) makes for an effective communication strategy.




Since I started working for Hopsworks, I have had this framework in the back of my mind, as I got to talk to many more users, customers and partners that have been adopting the amazing technology that the team has built. In these discussions, it actually became clear to me that there are three different “WHY” questions that we need to answer for our community, if we want to be successful in the marketplace. At the risk of misusing the golden circle visualization, I have tried to put these 3 questions in 3 concentric circles in the figure below:


As you can see, you move from the OUTER circle to the INNER circle, and you try to address the following 3 questions:
  • Why would you consider using a feature store architecture in the first place? If you would find enough solid reasons for doing so, you would proceed to the next “Why” question, being:
  • Why would you NOT BUILD, but instead BUY a feature store for your data platform architecture? And if you find enough reasons to BUY and not build, then you would consider the last and final “Why” question, being:
  • Why would you specifically choose to buy the Hopsworks feature store for your data platform architecture?
If, and only if, we understand the potential answers to these questions, in all their variations, can we successfully meet the customer’s expectations and provide value in their implementation. That’s the core idea behind this thought process.

So let’s explore these three WHY questions, and their answers, in a bit more detail.

1. Why Consider a Feature Store for ML/AI?

It’s pretty clear that not everyone needs a Feature Store. A data platform like that is quite specific to ML/AI workloads, and would only realistically be required or used by organizations and teams that have a deep understanding of, and investment in, the relatively new fields of machine learning and artificial intelligence. If all you have in your environment is an early-stage experiment with ML/AI technology, then most likely you do not yet have a need for a feature store - seems logical, right? So: what are the conditions under which you would want to consider it? What are the reasons for implementing a Feature Store in your organization? Let’s explore this!

Many of these reasons were actually outlined in an earlier article on the Hopsworks Blog, and I believe that the reasons for considering a Feature Store are accurately described there. In this overview, I would like to distinguish between technical and non-technical (as in, business / organizational / competency-related) reasons. Let’s dig into it:
  • Technical reasons for considering a feature store:

    • Existing models running in production are expensive - they are hard to debug, review and upgrade, and they are bespoke systems that are difficult and costly to maintain. There’s a growing body of evidence that ML/AI systems that do NOT have a feature store architecture in the backend are simply too expensive to run, for the reasons listed in the points below.
    • Monitoring production pipelines is challenging, or impossible. The data that powers AI changes over time, and identifying when there are significant changes that require retraining your AI is not easy.
    • Difficulties in managing the lifecycle of feature data, including the tracking of versions and historical changes. This is an elementary requirement for all regulated data processing environments - and a key reason why feature stores align so well with these industries’ requirements.
    • Feature data is not centrally managed; it is duplicated, features are re-engineered, and generally data is not reused across the organization.
  • Non-technical reasons for considering a feature store:

    • Valuable models are created but once the experimentation stage is over they do not bridge the chasm to operations - the models do not consistently generate revenue or savings. This is all about getting the models to deliver value, consistently.
    • No cohesive governance in the storage and use of AI assets (feature data and models), everything is done in a bespoke manner, leading to compliance risks.
    • Slow time-to-market of AI models, and a general inability to provide very fresh feature data or handle real-time data for ML models, which is critical for industries like finance, retail, or logistics where real-time insights can add significant business value. This point is all about the speed with which the data science team can develop their models and bring them to life in a production environment.
    • It is hard to derive direct business value from the models, because they exist in isolated environments that do not directly influence business operations. This obviously makes it much harder to justify the investments required to develop and operationalize the models.
    • Slow ramp-up time when onboarding new talent into the ML teams. Sharing available AI assets is complex because operational knowledge is held by a few individuals or groups.
We have summarized these reasons in the outer circle of the figure below. I am sure that there are other reasons that could potentially be more applicable to your specific environment - but these are the higher level ones that we see time and time again in our Hopsworks user discussions.
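To make the monitoring point above a bit more tangible, here is a minimal sketch of how one could flag that feature data has changed enough to warrant a look at retraining. It is an illustration under stated assumptions, not how Hopsworks implements monitoring: the function names, the sample windows and the threshold of 2.0 are all hypothetical, and a real system would likely use richer statistics than a shift in the mean.

```python
from statistics import mean, stdev


def mean_shift_score(reference, current):
    """Return how many reference standard deviations the current mean has moved."""
    ref_std = stdev(reference)
    if ref_std == 0:
        # A constant reference window: any change at all counts as an unbounded shift.
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / ref_std


def needs_retraining(reference, current, threshold=2.0):
    """Flag drift when the mean shift exceeds the (hypothetical) threshold."""
    return mean_shift_score(reference, current) > threshold


# Hypothetical feature values: the training-time window and two production windows.
reference_window = [10.0, 11.0, 9.0, 10.5, 9.5]
stable_window = [10.2, 9.8, 10.1, 10.4, 9.9]
shifted_window = [14.0, 15.0, 13.5, 14.5, 15.5]

print(needs_retraining(reference_window, stable_window))   # False
print(needs_retraining(reference_window, shifted_window))  # True
```

Even a check this simple only works if the reference data from training time is still available and comparable to what is served in production, which is exactly where centrally managed feature data helps.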


So now we know and understand why an organization requires a feature store - great! But that does not necessarily mean that they will actually go out to look for one in the marketplace! Many organizations, especially the “digital natives” that are tuned in to the latest technology trends (like ML/AI) nowadays have a tendency to at least consider building a software component themselves - instead of buying one. This is a good and worthwhile consideration, as it seems clear to me that there is a minimum of scale and maturity required before wanting to go “all-in” on this brand new technology. For many people, a homegrown solution might be “good enough”.

So how do we consider whether or not a roll-your-own solution is good enough or not? Let’s consider some criteria.

2. Why NOT BUILD, but BUY a Feature Store?


In the second layer of the diagram below, we consider some of the reasons / criteria that would lead you to look at the BUY option instead of the BUILD option. Some of these reasons have also been covered in a previous article, but let's revisit them here.

The most common reasons for buying and not building a feature store are:
  • Maintenance Burden & Total Cost of Ownership (TCO): clearly, this is something that every mature IT organization will consider. Ultimately, this is related to the potential technical debt that this organization will want to incur, given the significant costs that could be associated with this down the line. It’s important to consider not just the short term, but also the longer term implications of a build vs. buy decision.
  • Technical complexity: clearly, a piece of infrastructure software like a Feature Store, which will underpin all ML/AI applications that the organization would choose to develop, has a significant amount of technical complexity associated with it. It’s important to consider this, and to investigate the most crucial domains in which a “build” approach could encounter unexpected technical challenges.
    • Offline / Online sync: one of the key characteristics of a feature store is that it will both contain the historical data of a feature dataset, as well as the most recent values. Both have their use and purpose, and need to be kept in sync inside the feature repository. Feature Stores like Hopsworks do this for you, but in a “build” scenario you would need to take this into account and do all the ETL data lifting yourself.
    • Reporting and search: in any large machine learning system where you have dozens/hundreds/thousands of models in production, you would want and need the feature data to be findable, accessible, interoperable, and reusable - according to the so-called “F.A.I.R.” principles that we have described in this post. This seems easy - but if you consider all of the different combinations that you could have between versions of datasets, pipelines and models, it is clear that this is not a trivial engineering assignment.
    • Metadata for versioning and lineage: similar to the previous point, a larger ML/AI platform that is hosting a larger number of models, will need metadata for its online and offline datasets, and will need to accurately keep track of the versions and lineage of the data. This will increasingly become a requirement, as governance for ML/AI systems will cease to be optional. Implementations of and compliance with the EU AI Act, will simply mandate this - and the complexity around implementing it at scale is significant.
    • Time travel and point-in-time correct (PITC) joins: if we want to make the predictive results of our ML/AI systems explainable, we will need to be able to offer so-called “time travel” capabilities. This means that we can look at how a particular model yielded specific results based on the inputs that it received at a specific point in time. Feature Stores will need to offer this capability, on top of the requirement to guarantee that the models yield accurate and correct information at a given point in time - something we call “point-in-time correctness”. Again, the technical complexity of implementing this yourself is not to be underestimated.
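To give a feel for what the offline / online sync and point-in-time points above involve, here is a deliberately tiny, in-memory sketch. Everything in it is hypothetical (the `TinyFeatureStore` class, its method names, the integer timestamps and the customer data), and it is not the Hopsworks API. It only shows the idea: the offline side keeps the full history, the online side keeps the latest value per entity, and an "as of" lookup never returns a value from after the requested time.

```python
from bisect import bisect_right
from collections import defaultdict


class TinyFeatureStore:
    """Hypothetical in-memory illustration, not a real feature store API."""

    def __init__(self):
        # Offline store: full history per entity, kept sorted by event time.
        self._history = defaultdict(list)  # entity_id -> [(event_time, value), ...]
        # Online store: only the most recent value per entity.
        self._latest = {}

    def write(self, entity_id, event_time, value):
        """Write once, and keep the offline and online views in sync."""
        rows = self._history[entity_id]
        rows.append((event_time, value))
        rows.sort(key=lambda row: row[0])  # tolerate late-arriving events
        self._latest[entity_id] = rows[-1][1]

    def get_online(self, entity_id):
        """Low-latency lookup of the freshest value, for serving predictions."""
        return self._latest.get(entity_id)

    def get_as_of(self, entity_id, as_of_time):
        """Point-in-time lookup: the last value known at or before as_of_time."""
        rows = self._history.get(entity_id, [])
        times = [event_time for event_time, _ in rows]
        idx = bisect_right(times, as_of_time)
        return rows[idx - 1][1] if idx else None


store = TinyFeatureStore()
store.write("customer_1", 1, 100.0)
store.write("customer_1", 5, 250.0)
store.write("customer_1", 3, 180.0)  # arrives late, but belongs to time 3

print(store.get_online("customer_1"))    # 250.0
print(store.get_as_of("customer_1", 4))  # 180.0
print(store.get_as_of("customer_1", 0))  # None
```

When building training data for a label observed at time 4, the lookup returns 180.0 rather than the later 250.0, so no future information leaks into the training set. Doing this reliably across many feature groups, at scale and with durable storage, is where the engineering effort of a “build” approach adds up.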




With that, we hope to have outlined some of the key reasons that you should consider buying, not building your feature store solution. At the end of the day this is a strategic decision that will be different for every organization - as long as the question is honestly asked and answered.

3. Why buy the Hopsworks Feature Store?

Last but not least, we would also like to offer the readers that have a) first decided that they need a feature store, and b) also decided that they will want to buy such a critical piece of infrastructure and not build it themselves, a perspective on why Hopsworks might be the best choice for your environment. In line with the previous “Golden Circle” visuals, we now get to the “inner” circle of the diagram:




Obviously we are conscious of the poor readability of the diagram, so here’s a cut-out that is a bit more readable:




As you can see, we think that there are essentially 4 main reasons why the Hopsworks Feature Store solution could be the best possible fit for your environment. Let’s discuss each of these briefly:
  1. Performance and HA: Hopsworks has been working on the Feature Store for a number of years, with a top team of academic and industry specialists. We have integrated and embedded the best technologies on the market, such as RonDB, and have shown that this currently gives us unparalleled performance. Take a look at these open benchmarks for yourself, and you will see that Hopsworks is in a league of its own with regard to performance. On top of that, we have been leveraging expertise in high-availability systems to develop a feature store solution that can withstand the most demanding workloads.
  2. Flexible deployment (serverless / cloud / on-prem): Hopsworks is the only solution on the market that offers you the choice of deployment options that is best-suited for your specific environment. You can start small with a multi-tenant-based serverless environment, grow into a managed cloud deployment in your AWS / Azure / GCP account, or even repatriate the workload onto your own, on-premise hardware. No other solution offers this, today.
  3. Governance and compliance: Hopsworks has taken great pains to build industry-leading governance capabilities into the product. Versioning, lineage, time travel, search, security, monitoring and reporting - these are all advanced functionalities that a compliant solution will be required to deliver, now and in the future.
  4. Value for money, TCO: Hopsworks believes that in order for ML/AI to be successful, it needs to deliver value, and it needs to offer its users a clear Return on Investment. That means that the solution needs to be available at a reasonable price, and that consumption-based metrics cannot always be used for billing. We need to allow for testing, training, experimentation, learning and development - without requiring the customer to empty their pockets from day one, and all the while managing the total cost of ownership of the solution.
This brings us to the end of this article, where we have tried to discuss the “3 Whys” of Feature Store implementations. We hope this was a useful discussion, and are happy to discuss this with you as well. No doubt, we can make the argument even more detailed and refined, together.

All the best

Rik

Monday 26 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
One of the coolest things about @hopsworks is that it allows you to do your #machinelearning and #artificialintelligence on any or all of your data - wherever it lives (in the #cloud or #onprem) and whatever the privacy- and governance-levels it requires.

from Twitter https://twitter.com/rvanbruggen

February 26, 2024 at 01:01PM
via IFTTT

Friday 23 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Just before the weekend: here's another @hopsworks #5minuteinterview, with Başak Tuğçe Eskili. Başak is a co-host of the @MarvelousMLops, together with @maria-vechtomova, who was part of last week's episode. https://t.co/Xfk5FxkiCY

from Twitter https://twitter.com/rvanbruggen

February 23, 2024 at 04:30PM
via IFTTT

Thursday 22 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
What better way to start a @neo4j #graphsummit than to have @jimwebber teach us about #Birmingham proverbs. And #graphs. That too. https://t.co/mjkWvkVwn2

from Twitter https://twitter.com/rvanbruggen

February 22, 2024 at 10:48AM
via IFTTT

Wednesday 21 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
At @hopsworks, we have been working hard at collaborating with @Neo4j - as part of the "better together" idea. We have released a short #whitepaper demonstrating the power of #graphalgorithms in #MLPlatforms to boost efficiency. Details in thread - https://t.co/M9pvS2jYDN

from Twitter https://twitter.com/rvanbruggen

February 21, 2024 at 05:30PM
via IFTTT

Friday 16 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Last but not least: if you decide that you need a #featurestore for your #machinelearning project, and you would prefer to buy one instead of building it yourself - then why would you look at @hopsworks? Good question - take a look at the green belt of the #goldencircle. https://t.co/3w3HAuRunb

from Twitter https://twitter.com/rvanbruggen

February 16, 2024 at 09:01PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
RT @jim_dowling: Serverless ML Systems at KTH - Real-World Applications . Pydata Meetup on Feb 22nd in Stockholm. https://t.co/BuLvSdLu2W

from Twitter https://twitter.com/rvanbruggen

February 16, 2024 at 12:21AM
via IFTTT

Thursday 15 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Following up on yesterday's post: once you decide that you need a #featurestore for your #ml and #ai projects - why not just build one yourself? How hard can it be! Well, here are some of the reasons why our @hopsworks clients choose to buy it, in the orange belt of the #goldencircle. https://t.co/FgLJjvMifP

from Twitter https://twitter.com/rvanbruggen

February 15, 2024 at 10:01PM
via IFTTT

Wednesday 14 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Following up on my earlier post: WHY would someone want to implement a #featurestore in their #machinelearning #mlops platform? It's a good question. I can see quite a few reasons, which I have tried to summarize in the red ring of the #goldencircle. https://t.co/ESAND5TXP7

from Twitter https://twitter.com/rvanbruggen

February 14, 2024 at 10:30PM
via IFTTT

Monday 12 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
@simonsinek @hopsworks In the next few days/weeks, I plan to write some more about this. Please follow along and comment whenever you feel like it.

from Twitter https://twitter.com/rvanbruggen

February 12, 2024 at 06:01PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
@simonsinek @hopsworks 1. WHY should I consider a #featurestore. 2. WHY should I BUY, and not BUILD a #featurestore for my #MLOps infrastructure. 3. WHY should I, if I decide to BUY instead of building that piece of infrastructure, choose @hopsworks and nothing else.

from Twitter https://twitter.com/rvanbruggen

February 12, 2024 at 06:01PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
@simonsinek In a similar but different vein, I have been working on the "3 WHY's" of the #mlops and #featurestore infrastructure questions that I have been hearing from our @hopsworks customers and users the most.

from Twitter https://twitter.com/rvanbruggen

February 12, 2024 at 06:01PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
I have always been a big fan of @simonsinek 's #goldencircle principle of communication. You #startwithwhy, and then work your way towards the WHAT and the HOW of what you are going to do. https://t.co/dMjC37KVIP

from Twitter https://twitter.com/rvanbruggen

February 12, 2024 at 06:01PM
via IFTTT

Thursday 8 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
I had a lot of fun talking to @MwitiDerrick for our @hopsworks #5minuteinterviews. So cool to see Derrick doing amazing work in our space, while working for @neuralmagic ! https://t.co/luPnBonllO

from Twitter https://twitter.com/rvanbruggen

February 08, 2024 at 11:30PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
Tonight, me and @SirOibaf are enjoying some quality time briefing the lovely folks at @VectrConsulting about the advantage of using a @hopsworks #featurestore for #machinelearning. Thanks @_TomMichiels_, @Jweiren and @ignazw for inviting us! https://t.co/df19TymMRr

from Twitter https://twitter.com/rvanbruggen

February 08, 2024 at 06:14PM
via IFTTT

Wednesday 7 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Yey! Now it's @SirOibaf 's turn talking about #rag (retrieval augmented generation) and @hopsworks at the #brussels @elastic #meetup. https://t.co/i4XePHtRgy

from Twitter https://twitter.com/rvanbruggen

February 07, 2024 at 07:55PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
Cool. @xeraa of @elastic is asking us to take the #redpill, and use the new query endpoint using #esql, at the #brussels #meetup. https://t.co/tlCuU7hNFZ

from Twitter https://twitter.com/rvanbruggen

February 07, 2024 at 06:53PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
Excited to be attending the @elastic meetup in #brussels tonight. @SirOibaf is going to be on stage talking about the synergies with @hopsworks so that will be fun! https://t.co/p2PRDAl1Pd

from Twitter https://twitter.com/rvanbruggen

February 07, 2024 at 05:45PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
Really fun event. Great content. Inspiring discussions. Looking forward to many more events like this. https://t.co/gBM5YWT95h

from Twitter https://twitter.com/rvanbruggen

February 07, 2024 at 12:32PM
via IFTTT

Tuesday 6 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
RT @jim_dowling: Imagine you had to pick a novel data source and identify a prediction problem for it. What ML system could you build in a…

from Twitter https://twitter.com/rvanbruggen

February 06, 2024 at 01:23PM
via IFTTT

Monday 5 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
This week, I am really looking forward to doing a @hopsworks talk in #stockholm. We are hosting a breakfast event with Northridge Analytics on the importance of the #euaiact and how we can help clients with this challenge: see  https://t.co/rwnRr164kU

from Twitter https://twitter.com/rvanbruggen

February 05, 2024 at 08:06PM
via IFTTT

Saturday 3 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
So far this Saturday morning is going well! @SirOibaf starting his talk about @hopsworks #hopsfs at this year's @fosdem #softwaredefinedstorage devroom. https://t.co/RNusirgrGp

from Twitter https://twitter.com/rvanbruggen

February 03, 2024 at 10:35AM
via IFTTT

Friday 2 February 2024

I just posted this on Twitter as @ rvanbruggen

This was the text:
Happy Friday! I am super excited to be going to @fosdem tomorrow with our @hopsworks team. @SirOibaf is going to talk about the #fuse mount for #hopsfs - a great way to access #offline #featurestore data. See https://t.co/1cSAoJ9h9j

from Twitter https://twitter.com/rvanbruggen

February 02, 2024 at 03:02PM
via IFTTT

I just posted this on Twitter as @ rvanbruggen

This was the text:
Here's another @hopsworks #5minuteinterview from our fantastic #featurestore and #mlops community: This time I got to speak to @fedor_bystrov who is building out some amazing #ml & @ai platforms at @Skyscanner. See https://t.co/3LR88LAaF6 ...

from Twitter https://twitter.com/rvanbruggen

February 02, 2024 at 08:01AM
via IFTTT