{"id":3116,"date":"2017-04-24T11:28:54","date_gmt":"2017-04-24T11:28:54","guid":{"rendered":"http:\/\/wiki.davelevy.info\/?p=3116"},"modified":"2019-10-11T18:49:55","modified_gmt":"2019-10-11T18:49:55","slug":"data-lineage","status":"publish","type":"post","link":"https:\/\/davelevy.info\/wiki\/data-lineage\/","title":{"rendered":"Data Lineage"},"content":{"rendered":"<p>About Data Lineage &#8230;.. my notes.<!--more-->Or, more accurately, links.<\/p>\n<ol>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_lineage\">https:\/\/en.wikipedia.org\/wiki\/Data_lineage<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Meta-data_management\">https:\/\/en.wikipedia.org\/wiki\/Meta-data_management<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/ISO\/IEC_11179\">https:\/\/en.wikipedia.org\/wiki\/ISO\/IEC_11179<\/a> a repository standard,\u00a0<a href=\"http:\/\/aristotlemetadata.com\">http:\/\/aristotlemetadata.com<\/a>, an implementation of the standard at apache.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Big_data\">https:\/\/en.wikipedia.org\/wiki\/Big_data<\/a><\/li>\n<li>A white paper, &#8220;<a href=\"https:\/\/web.archive.org\/web\/20171011045005\/http:\/\/ilpubs.stanford.edu:8090\/525\/1\/2001-5.pdf\">Lineage Tracing for General Data Warehouse Transformations<\/a>&#8221; from the Wikipedia article on data lineage.<\/li>\n<li><a href=\"http:\/\/people.cs.aau.dk\/~tbp\/BIT\/moede23\/trace.pdf\">Practical Lineage Tracing in Data Warehouses<\/a>, by Cui &amp; Widom. They pose a scenario and show how storage and inversion allow lineage to be captured and queried. The problem is exclusively defined in SQL.<\/li>\n<li><a href=\"http:\/\/www.dataversity.net\/data-lineage-demystified\/\">Data lineage demystified<\/a> at dataversity.net. This is easy to read and mainly talks about why. 
It is a puff piece for <strong>ASG<\/strong>, who have tools in this market and have published a white paper. They host it at <acronym title=\"I can't find a permalink for the paper\">whitepapers.dataversity.net<\/acronym>; I have mirrored it\u00a0<a href=\"https:\/\/davelevy.info\/wiki\/wp-content\/uploads\/2017\/04\/Data-Lineage-CaseStudy_ASG_Final.pdf\">here&#8230;<\/a>. ASG advertise their Enterprise Intelligence and Data Lineage solutions <a title=\"They have a data sheet for the Lineage appliance.\" href=\"https:\/\/www.asg.com\/Products\/Enterprise-Data-Intelligence\/Data-Lineage.aspx\">hosted here&#8230;<\/a>, <a href=\"https:\/\/www.asg.com\/thankyou\/DataIntelligence\/datasheet\/Datasheet-Data-Lineage-Appliance\">published here&#8230;<\/a>\u00a0and <a title=\"because yet again they do not have a permalink\" href=\"https:\/\/davelevy.info\/wiki\/wp-content\/uploads\/2017\/04\/ASG-Datasheet-Data-Lineage-Appliance.pdf\">mirrored here<\/a>.\u00a0All this material is strong on why and on the coverage of the problem, but weaker on explaining how its taps work and how its reports meet business need. They published <a href=\"https:\/\/davelevy.info\/wiki\/wp-content\/uploads\/2019\/05\/3-35481_Utilization_ASG_Final.pdf\">a case study with 5 examples<\/a> of the use of their tool.<\/li>\n<li><a href=\"https:\/\/blogs.informatica.com\/2014\/01\/02\/the-architectural-scope-of-data-governance\/#fbid=3yzhLqbdaLo\">The architectural scope of data governance<\/a>\u00a0a blog on Informatica&#8217;s site by Bob Kerel. He argues this is a process and needs a framework. He categorises the heterogeneity of the sources and repos, and classifies the problems as profiling, discovery, semantics (aka glossaries) and management\/lineage. 
I wonder if the semweb people have anything to offer.<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20170812022914\/http:\/\/db.cs.berkeley.edu\/papers\/\/\/CSD-97-932.pdf\">Supporting Fine-Grained Data Lineage in a Database Visualization Environment<\/a>, by Woodruff &amp; Stonebraker. Is this the paper that invented &#8220;lazy macro&#8221;? \u00a0<acronym title=\"we propose a novel method to support fine grained data lineage. Rather than relying on metadata, our approach lazily computes lineage using a limited amount of information about the processing operators and the base data. We introduce the notions of weak inversion and verification. While our system does not perfectly invert the data, it uses weak inversion and verification to provide a number of guarantees about the lineage it generates. We propose a design for the implementation of weak inversion and verification in an object-relational database management system\">Hover here for more.<\/acronym><\/li>\n<li><a href=\"https:\/\/www.timmitchell.net\/post\/2016\/05\/06\/etl-data-lineage\/\">https:\/\/www.timmitchell.net\/post\/2016\/05\/06\/etl-data-lineage\/<\/a><\/li>\n<\/ol>\n<p>I used a picture from <a href=\"http:\/\/www.lyonwj.com\/2016\/06\/26\/graph-of-thrones-neo4j-social-network-analysis\/\">here<\/a> as the featured picture, as graphs are a good visualisation of data lineage (it would seem).<\/p>\n<p>See also<\/p>\n<ol>\n<li><a href=\"https:\/\/web.archive.org\/web\/20171011045005\/http:\/\/ilpubs.stanford.edu:8090\/525\/1\/2001-5.pdf\">http:\/\/ilpubs.stanford.edu:8090\/525\/1\/2001-5.pdf<\/a><\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20150921153922\/http:\/\/ilpubs.stanford.edu:8090\/403\/1\/1999-47.pdf\">http:\/\/ilpubs.stanford.edu:8090\/403\/1\/1999-47.pdf<\/a><\/li>\n<\/ol>\n<p style=\"text-align: center;\">ooOOOoo<\/p>\n<p>The more I look at this, why graphs and not <a 
href=\"https:\/\/web.archive.org\/web\/20161004124423\/http:\/\/www.technologyuk.net:80\/computing\/sad\/entity-life-history.shtml\">ELHs<\/a>.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>About Data Lineage &#8230;.. my notes.<\/p>\n","protected":false},"author":1,"featured_media":3117,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","_share_on_mastodon":"0"},"categories":[3],"tags":[1120,1109,667,911],"class_list":["post-3116","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-data","tag-data-lineage","tag-graphs","tag-technology"],"share_on_mastodon":{"url":"","error":""},"jetpack_featured_media_url":"https:\/\/davelevy.info\/wiki\/wp-content\/uploads\/2017\/04\/got-graph-w650.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/posts\/3116","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/comments?post=3116"}],"version-history":[{"count":5,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/posts\/3116\/revisions"}],"predecessor-version":[{"id":4559,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/posts\/3116\/revisions\/4559"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/media\/3117"}],"wp:attachment":[{"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/media?parent=3116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/categories?post=3116"},{"taxonomy":"p
ost_tag","embeddable":true,"href":"https:\/\/davelevy.info\/wiki\/wp-json\/wp\/v2\/tags?post=3116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}