“Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources.”
That’s easy enough. The fact is that it’s a really big problem for large organisations - specifically financial institutions, as they have to comply with regulations like the Basel Committee on Banking Supervision's standard number 239 (BCBS 239) - which is all about assuring data governance and risk reporting accuracy.
Here are a couple of really nice articles and videos that should give you quite a bit of background.
- Financial Services & Neo4j: Data Lineage & Metadata Management
- Advantages of a Graph-Based Metadata Repository
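To make the definition above a bit more concrete, here is a minimal sketch of what "tracing errors back to their sources" could look like if you model lineage as a simple directed graph. The dataset names and the `LINEAGE` dictionary are made up purely for illustration, and a real setup would more likely keep this in a graph database than in a Python dict.

```python
from collections import deque

# Hypothetical lineage: each key is a dataset, each value lists the
# datasets it is directly derived from (names are invented for this sketch).
LINEAGE = {
    "risk_report": ["aggregated_exposures"],
    "aggregated_exposures": ["trades_clean", "counterparties"],
    "trades_clean": ["trades_raw"],
    "counterparties": [],
    "trades_raw": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def origins(dataset, lineage):
    """Return the upstream datasets that have no parents of their own."""
    return {d for d in upstream_sources(dataset, lineage) if not lineage.get(d)}
```

If a number in `risk_report` looks wrong, `upstream_sources("risk_report", LINEAGE)` lists everything that could have contributed to it, and `origins("risk_report", LINEAGE)` narrows that down to the original inputs.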