Data Provenance and Lineage

Audrius Kucinskas
Sep 25, 2020

Provenance is a term for the origin of an object together with a ledger of the actions performed on it. Today, provenance is used in art restoration and as proof of value.

Provenance in Arts

Titian’s “Diana and Actaeon”

There are various examples where provenance proves the authenticity or the ownership of a work of art. For example, Titian’s “Diana and Actaeon” has a substantial ownership provenance that shows how it moved between owners:

  • Painted between 1556 and 1559 for Philip II, King of Spain; by descent to Philip V.
  • By whom presented to Antoine, 4th Duc de Gramont, French Ambassador to the Spanish court, 1704.
  • By whom presented, probably around 1706–8, to Philippe, 2nd Duc d’Orléans, later French regent.
  • By descent at the Palais-Royal, Paris, to Louis-Philippe-Joseph, Duc d’Orléans (Philippe Egalité), by whom sold in 1792 to Édouard Walckiers of Brussels (but resident in Paris).
  • By whom sold in the same year to his cousin François-Louis-Joseph de Laborde-Méréville, Paris.
  • By whom transported to London in 1793 and mortgaged to Jeremiah Harman.
  • By whom sold in 1798 to the dealer Michael Bryan, acting on behalf of a syndicate consisting of Francis Egerton, 3rd Duke of Bridgewater, his nephew George Granville Leveson-Gower, Earl Gower (later 2nd Marquess of Stafford and 1st Duke of Sutherland), and Frederick, 5th Earl of Carlisle (husband of Lord Gower’s sister).
  • Reserved by the Duke of Bridgewater.
  • By whom bequeathed in 1803 to Lord Gower.
  • By whom bequeathed in 1833 to his second son Lord Francis Leveson-Gower (who took the name Egerton in 1833 and was created 1st Earl of Ellesmere in 1846).
  • By descent to John Sutherland, 5th Earl of Ellesmere, from 1963 6th Duke of Sutherland, by whom placed on loan to the National Gallery of Scotland in 1945.
  • By whom bequeathed in 2000 to Francis Egerton, 7th Duke of Sutherland.
https://en.wikipedia.org/wiki/Diana_and_Actaeon

Ref.: https://www.nationalgallery.org.uk/research/research-papers/titians-diana-and-actaeon?viewPage=5

Apollo Sauroctonos

The Apollo Sauroctonos sculpture at the Cleveland Museum of Art is an example where provenance cannot be traced, which casts doubt on the authenticity of the piece. Some attribute it to Praxiteles, but because its provenance cannot be established with certainty, others believe it is only a later copy by an unknown Roman artist.

https://www.clevelandart.org/exhibitions/praxiteles-cleveland-apollo

Lineage

Lineage is a term mostly used for tracking the history of something. Most often it is genealogical lineage that people are interested in. In that context, lineage tracks who is who in terms of ancestry and the supposed composition of one’s DNA.

Below is an example lineage of German bishop Sigmund Christoph von Waldburg-Zeil-Trauchburg, from 1776.

https://en.wikipedia.org/wiki/Sigmund_Christoph_von_Waldburg-Zeil-Trauchburg

Lineage extends provenance by also showing the changes that have happened. A genealogical tree does not make this visible, but lineage also tracks the genetic traits that can be inherited from one’s ancestors.

Data Provenance and Lineage in Modern Sciences

It is technically feasible to apply the concepts of lineage and provenance in software engineering. These well-known concepts help track “the history of data” and ultimately provide an audit trail.

System performing complex data transformations

We use the Provenance term to describe a high-level overview of complex data transformations, explaining why and how something came out of the system.

Provenance of the system
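To make the high-level view concrete, here is a minimal sketch of what a provenance record could look like in code. The class name `ProvenanceRecord`, its fields, and the sample file and action names are all hypothetical and chosen only for illustration; they do not describe any particular system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """High-level history of one dataset: its origin plus a ledger of actions."""
    origin: str                                        # where the data entered the system
    actions: List[str] = field(default_factory=list)   # ordered, coarse-grained actions

    def record(self, action: str) -> None:
        """Append one high-level action to the ledger."""
        self.actions.append(action)

    def describe(self) -> str:
        """Render the origin and all actions as a single readable chain."""
        return " -> ".join([self.origin, *self.actions])


# Hypothetical example: a report built from a raw input file.
report = ProvenanceRecord(origin="raw_listings.csv")
report.record("deduplicated")
report.record("joined with price history")
report.record("aggregated per vehicle")

summary = report.describe()
print(summary)
```

The record deliberately stores only names of actions, not the data itself. It answers where a result came from and how, without the step-by-step detail.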

Lineage, on the other hand, is a more granular description of every step in detail. Every time data is transformed, that is, changed, the step is recorded with its input, its output, and the delta between them.

Lineage of the system
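Recording input, output, and delta for each transformation can be sketched in a few lines. The helper names (`LineageEntry`, `diff`, `apply_step`) and the sample record are assumptions made for this illustration, not part of any real lineage product. Records are assumed to be flat dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class LineageEntry:
    """One transformation step with its input, output, and delta."""
    step: str
    input: Record
    output: Record
    delta: Dict[str, Tuple[Any, Any]]  # key -> (old value, new value)


def diff(before: Record, after: Record) -> Dict[str, Tuple[Any, Any]]:
    """Return the keys whose values differ; a missing key is reported as None."""
    keys = set(before) | set(after)
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


def apply_step(name: str, transform: Callable[[Record], Record],
               record: Record, log: List[LineageEntry]) -> Record:
    """Run one transformation and append a lineage entry describing it."""
    before = dict(record)            # shallow copy so the logged input stays unchanged
    after = transform(dict(record))  # the transform works on its own copy
    log.append(LineageEntry(name, before, after, diff(before, after)))
    return after


# Hypothetical example: cleaning a single vehicle record in two steps.
log: List[LineageEntry] = []
record: Record = {"vin": " abc123 ", "mileage_km": "120000"}
record = apply_step("strip_vin", lambda r: {**r, "vin": r["vin"].strip()}, record, log)
record = apply_step("parse_mileage",
                    lambda r: {**r, "mileage_km": int(r["mileage_km"])}, record, log)

for entry in log:
    print(entry.step, entry.delta)
```

Each entry in `log` can later answer an audit question such as "which step changed this field, and from what value". A production setup would also need deep copies for nested data and durable storage for the log.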

There are quite a few well-known systems that do this. Data lineage helps with data auditing questions, quality assurance, and a general understanding of which data is the most valuable in an organization. All of this usually falls under the term “Data Governance”.
Among full-blown solutions for data lineage and governance, the most notable are MANTA, IBM’s DataOps, and Octopai.

This was originally written by Arunas Vaitkus for internal carVertical usage.
