Wednesday, May 15, 2013

MTCProv: A practical provenance query framework for many-task scientific computing

MTCProv: A practical provenance query framework for many-task scientific computing
L. M. R. Gadelha Jr., M. Wilde, M. Mattoso, I. Foster
Distrib. Parallel Databases (2012) 30:351-370

This article presents a system for storing provenance of runs of scientific computations.  It is fairly high-level, but presents two technical contributions:
  1. a detailed description of the schema / data model used, which allows for repeated tasks, annotations describing relevant content/configuration file parameters, linkage between the program graph and run graph, and nesting of tasks (analogous to our hierarchical provenance model
  2. a higher-level query language (SPQL) that allows for more direct statements of typical queries over the results

I would have liked to see more about the computational model (Swift scripts/programs) and how these are mapped to the provenance records.  Most workflow systems/languages lack a formal specification of their semantics (with or without provenance).  Similarly, detailed comparison with other proposed (e.g. the Harvard systems group's PQL) would be interesting.

Labels: , ,

Sunday, May 05, 2013

Static analysis of multi-staged programs via unstaging translation

Static analysis of multi-staged programs via unstaging translation
W. Choi, B. Aktemur, K. Yi, and M. Tatsuta
POPL 2011

Staged programs are programs that construct and run other programs (written using the same syntax).  For example, Lisp's quote, antiquote and eval operations are well-known examples of .  In the typed programming world, MetaML and MetaOCaml may be the best-known language designs for staged computation.  Work on modal type systems (dating to Davies and Pfenning) established that one can view the modal $\Box$ type as a "code" type, with $run : \Box A \to A$.  Subsequent work has looked at modal types indexed by contexts (e.g. Contextual Modal Type Theory or Taha and Nielsen's environment classifiers).

This paper looks at the problem of static analysis for staged computation.  This is challenging because the values of code types are program fragments that may not be known until runtime, hence, it is difficult (or at least requires significant effort) to adapt classic static analysis techniques.  The authors argue that there has been no significant work on static analyses that are more detailed than type systems (and I don't know of any either).

The authors propose a solution based on translation.  A staged program (in a monomorphic $\lambda$-calculus with $box$, $unbox$ and $run$ operations) is translated to a conventional program in a $\lambda$-calculus with records.  A box type $\Box(\Gamma \vdash \tau)$ is translated to a function type $\tau_\Gamma \to \tau'$, where $\tau_\Gamma$ is a record with one field corresponding to each variable in $\Gamma$.  Boxing and unboxing are translated to function abstraction and application respectively.  Importantly, to preserve evaluation order, unboxing operations are "hoisted" outside of the $\lambda$-abstractions introduced by boxing.
Read more »

Labels: ,

Friday, May 03, 2013

A model for distributed systems based on graph rewriting

A model for distributed systems based on graph rewriting
P. Degano and U. Montanari
J ACM 34(2):411-449

I was pointed to this paper by Eric Griffis, a student who will be visiting this summer as part of a NSF PIRE program.  Eric is interested in graph rewriting and distributed systems.

The paper presents a generic computational model for concurrent/distributed computation based on graphs (really, hypergraphs).  The "nodes" of the hypergraphs correspond to channels, and the "hyperedges" correspond to either events (interactions in the past) or processes (placeholders for possible interactions in the future).  Thus, the graph as a whole describes both the past (previous events), present (frontier of active processes that can make progress now) and future (processes whose progress is blocked by other processes).

The execution behavior of the system can be specified using graph rewriting.  Production rules are specified as mappings from nonterminal labels to graph fragments.  Rewrite rules are obtained from these by merging concurrent events.  The meaning of a system (that possibly runs forever) is defined by setting up a ultrametric space on graph states and taking limits.  (This is a large oversimplification.)
Read more »

Labels: ,