Tuesday, July 02, 2013

The Compiler Forest

The Compiler Forest
M.Budiu, J. Galenson and G. D. Plotkin
ESOP 2013

This paper explores a notion of partial compilers, motivated by the problem of compiling high-level code (e.g. LINQ-style collection-based programs) on different architectures, such as multicore, GPU, or database backends.  If we think of a compiler as a simple function $C : source \to target$, then a partial compiler is a function $PC : source \to source' * (target' \to target)$.  Moreover, we can think of a "compilation problem" $C(source,target)$ as a pair of the source language and target language types.  Then a partial compiler $PC : C(source,target) \mathrel{-\!\!\circ} C(source',target')$ translates one "compilation problem" to another, hopefully simpler one: it takes a $source$ and produces a $source'$ together with a function $target' \to target$ that says how to finish compiling to $target$ once the $source'$ has been compiled to $target'$.

An ordinary compiler is then just a partial compiler that reduces the problem of compiling $source$ to $target$ to the "empty" problem $C(unit,unit)$, since $source \to unit * (unit \to target)$ is isomorphic to $source \to target$.

The paper considers different combinators on partial compilers: sequential composition, tensor, conditional, mapping, and iteration.

Partial compilers are closely related to tactics, explored by Milner and many others since in the context of theorem proving.  There, a tactic is a function of the form $goal \to goal~list * (proof~list \to proof)$.  It takes a goal formula (theorem to be proved) and decomposes it into a list of subgoals, and a function showing how to translate proofs of the goals back to a proof of the original theorem.  (Thanks to James McKinna for this observation.)

The paper also discusses experience with using these combinators to structure compilers targeting different back-ends.  It would be interesting to see details of particular "leaf" compilers, and specifically to see to what extent a conventional compiler can be decomposed into more modular, partial compilers.  It seems that the approach has mainly been used to compose existing, "monolithic" compilers.  For example, it seems nontrivial to decompose (say) closure conversion or register allocation into tactic-like combinators.  If one can decompose these into smaller modules then one could imagine giving a target specification for a compiler and letting the system search for different compilers to that target.

Looking at existing conventional modular compilers might be a good place to start.