Provenance in collection-oriented scientific workflows

  • Shawn Bowers (b) (Author)
  • Timothy M. McPhillips (b) (Author)
  • Bertram Ludäscher (a, b) (Author)
  • (a) University of California
  • (b) University of California, Davis
Research Output: Contribution to journal Article Peer-review

Open access

Abstract

We describe a provenance model tailored to scientific workflows based on the collection-oriented modeling and design paradigm. Our implementation within the Kepler scientific workflow system captures the dependencies of data and collection creation events on preexisting data and collections, and embeds these provenance records within the data stream. A provenance query engine operates on self-contained workflow traces representing serializations of the output data stream for particular workflow runs. We demonstrate this approach in our response to the first provenance challenge.
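
The idea of embedding provenance records in the data stream and querying a self-contained trace can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in chosen for illustration, not the Kepler implementation or the authors' actual record format: the `Data` and `Dependency` classes, the `lineage` function, and the actor names `align` and `infer` are all assumptions. The sketch treats a trace as a flat list in which each dependency record follows the item it describes, and answers a lineage query by following dependencies transitively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    """A data or collection item in the stream (hypothetical token type)."""
    id: str


@dataclass(frozen=True)
class Dependency:
    """Inline provenance record (hypothetical): `target` was created
    by `actor` from the preexisting items listed in `sources`."""
    target: str
    sources: tuple
    actor: str


def lineage(trace, item_id):
    """Return the ids of every item that `item_id` transitively depends on."""
    # Collect the direct dependencies recorded inline in the trace.
    deps = {}
    for token in trace:
        if isinstance(token, Dependency):
            deps.setdefault(token.target, set()).update(token.sources)

    # Follow the dependencies backwards until no new items are found.
    seen, stack = set(), [item_id]
    while stack:
        for source in deps.get(stack.pop(), ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# A serialized trace for one hypothetical run: data tokens interleaved
# with the provenance records that describe how they were created.
trace = [
    Data("d1"),
    Data("d2"),
    Data("d3"), Dependency("d3", ("d1", "d2"), "align"),
    Data("d4"), Dependency("d4", ("d3",), "infer"),
]

ancestors = lineage(trace, "d4")
```

Because the dependency records travel with the data, the trace alone is enough to answer the query; no separate provenance store is consulted in this sketch.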