Computing location-based lineage from workflow specifications to optimize provenance queries
- Saumen Dey (Author), University of California
- Sven Köhler (Author), University of California
- Bertram Ludäscher (Author), University of California
Open access
Abstract
We present a location-based approach for executing provenance lineage queries that significantly reduces query execution cost without incurring additional storage costs. The key idea of our approach is to exploit two facts: provenance graphs resemble the workflow graphs that generated them, and many workflow computation models assume that workflow steps have statically defined data consumption-production (i.e., data input-output) rates. We describe a new lineage computation technique that uses the structure of workflow specifications together with consumption-production rates to pre-compute (i.e., to forecast) the access paths of all dependent data items prior to workflow execution. We also present experimental results showing that our approach can significantly outperform traditional data lineage query techniques.
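
To illustrate the general idea of forecasting lineage from fixed rates, the following is a minimal sketch and not the algorithm from the paper. It assumes a simple linear pipeline in which each step consumes and produces a fixed number of tokens per firing, and in which a token's position on its channel serves as its location. The names `Step`, `inputs_of`, and `lineage` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumption: a step is described only by (consume_rate, produce_rate),
# i.e. tokens read and tokens written per firing.
Step = Tuple[int, int]


def inputs_of(step: Step, out_index: int) -> range:
    """Input positions that the output token at out_index depends on."""
    consume, produce = step
    firing = out_index // produce  # which firing emitted this token
    return range(firing * consume, (firing + 1) * consume)


def lineage(pipeline: List[Step], out_index: int) -> List[List[int]]:
    """Positions on each channel, from the final output back to the workflow input.

    Computed from the specification alone, without reading any recorded
    provenance.
    """
    frontier = {out_index}
    levels = [sorted(frontier)]
    for step in reversed(pipeline):
        frontier = {i for j in frontier for i in inputs_of(step, j)}
        levels.append(sorted(frontier))
    return levels


# Step 1 reads 2 tokens and writes 1; step 2 reads 1 token and writes 3.
pipeline = [(2, 1), (1, 3)]
print(lineage(pipeline, 4))
```

Under these assumptions, the positions of all upstream tokens are known from the rates alone, so a lineage query could fetch them directly by position rather than traversing recorded dependency edges.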
