
X-CSR: Dataflow optimization for distributed XML process pipelines

  • Daniel Zinn (Author)
  • Shawn Bowers (Author)
  • Timothy McPhillips (Author)
  • Bertram Ludascher (Author)
  • University of California, Davis
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called δ-XML that is well suited to applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of δ-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as [1].
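The bypass idea in the abstract can be illustrated with a minimal Python sketch. It is not the X-CSR algorithm from the paper: the set of relevant tags is hard-coded here as a stand-in for what schema-based type inference would derive, and the function names (`split_collection`, `merge_collection`, `shipped_bytes`, `uppercase_sequences`) and the sample document are hypothetical. The sketch only shows the general pattern of sending relevant fragments to a processor, routing the rest around it, and reassembling the collection in its original order.

```python
import xml.etree.ElementTree as ET


def split_collection(root, relevant_tags):
    """Partition the direct children of root into (to_processor, bypassed).

    Each entry is a (position, element) pair so that the collection can be
    reassembled in its original order afterwards.
    """
    to_processor, bypassed = [], []
    for pos, child in enumerate(root):
        # Assumption: a fragment is relevant if it contains any tag the
        # processor reads. In the paper this is inferred from XML Schema.
        if any(el.tag in relevant_tags for el in child.iter()):
            to_processor.append((pos, child))
        else:
            bypassed.append((pos, child))
    return to_processor, bypassed


def merge_collection(tag, processed, bypassed):
    """Rebuild the collection from processed and bypassed fragments."""
    root = ET.Element(tag)
    for _, child in sorted(processed + bypassed, key=lambda pair: pair[0]):
        root.append(child)
    return root


def shipped_bytes(fragments):
    """Serialized size of the fragments, used as a rough shipping cost."""
    return sum(len(ET.tostring(el)) for _, el in fragments)


def uppercase_sequences(fragment):
    """Hypothetical processor: modifies only <seq> elements in place."""
    for seq in fragment.iter("seq"):
        seq.text = seq.text.upper()


if __name__ == "__main__":
    doc = ET.fromstring(
        "<project>"
        "<meta><note>raw run</note></meta>"
        "<seqs><seq>acgt</seq></seqs>"
        "<images><img>blob</img></images>"
        "</project>"
    )
    sent, skipped = split_collection(doc, relevant_tags={"seq"})
    for _, fragment in sent:
        uppercase_sequences(fragment)
    result = merge_collection(doc.tag, sent, skipped)
    print(ET.tostring(result, encoding="unicode"))
    print("bytes sent to processor:", shipped_bytes(sent))
    print("bytes bypassed:", shipped_bytes(skipped))
```

In this toy run only the `seqs` fragment reaches the processor, while `meta` and `images` go straight to the merge step, so the collection structure stays intact and less data is shipped to the processor's host.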