X-CSR: Dataflow optimization for distributed XML process pipelines
- Daniel Zinn^a (Author),
- ,
- Timothy McPhillips^a (Author),
- Bertram Ludäscher^a (Author)
- ^a University of California, Davis
Abstract
XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called δ-XML that is well suited to applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of δ-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as [1].
