
Computing location-based lineage from workflow specifications to optimize provenance queries

  • Saumen Dey (Author)
  • Sven Köhler (Author)
  • Shawn Bowers (Author)
  • Bertram Ludäscher (Author)
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open access

Abstract

We present a location-based approach for executing provenance lineage queries that significantly reduces query execution cost without incurring additional storage costs. The key idea of our approach is to exploit the fact that provenance graphs resemble the workflow graphs that generated them and that many workflow computation models assume workflow steps have statically defined data consumption-production (i.e., data input-output) rates. We describe a new lineage computation technique that uses the structure of workflow specifications together with consumption-production rates to pre-compute (i.e., to forecast) the access paths of all dependent data items prior to workflow execution. We also present experimental results showing that our approach can significantly outperform traditional data lineage query techniques.
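The core observation is that when every step's consumption and production rates are fixed, the items a given output depends on can be computed from the specification by index arithmetic, rather than recovered by traversing a stored provenance graph. The Python sketch below illustrates only that idea, under simplifying assumptions of our own: a linear chain of steps, each with one input and one output channel, items numbered from 0 on every channel, and firings that consume and produce items in order. The `Step` class and the `upstream_indices` and `lineage` functions are hypothetical names introduced for illustration; this is not the technique described in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical workflow step with static per-firing rates.

    Each firing reads `consumes` items from the step's single input
    channel and writes `produces` items to its single output channel.
    """
    name: str
    consumes: int
    produces: int


def upstream_indices(step: Step, out_indices: set[int]) -> set[int]:
    """Map output item indices of `step` to the input indices they depend on.

    Output item j comes from firing j // produces, and firing f consumed
    input items f * consumes through (f + 1) * consumes - 1.
    """
    deps: set[int] = set()
    for j in out_indices:
        firing = j // step.produces
        start = firing * step.consumes
        deps.update(range(start, start + step.consumes))
    return deps


def lineage(pipeline: list[Step], final_index: int) -> dict[str, set[int]]:
    """Compute the lineage of one item of the pipeline's final output.

    Walks the chain backwards using only the rates in the specification
    (no recorded provenance), and returns, for each step, the indices of
    that step's input items that the final item depends on.
    """
    result: dict[str, set[int]] = {}
    current = {final_index}
    for step in reversed(pipeline):
        current = upstream_indices(step, current)
        result[step.name] = current
    return result


if __name__ == "__main__":
    # Hypothetical three-step chain: 1 -> 4, then 2 -> 1, then 3 -> 1.
    pipeline = [
        Step("explode", consumes=1, produces=4),
        Step("pairwise", consumes=2, produces=1),
        Step("aggregate", consumes=3, produces=1),
    ]
    deps = lineage(pipeline, final_index=1)
    print({name: sorted(idx) for name, idx in deps.items()})
    # -> {'aggregate': [3, 4, 5], 'pairwise': [6, 7, 8, 9, 10, 11], 'explode': [1, 2]}
```

Because the dependency indices follow directly from the rates, they can be derived before the workflow runs and used to locate the relevant items directly; general workflow graphs with multiple ports or data-dependent rates would need more than this chain-only arithmetic.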