NOTE: This is my old wordpress blog, snapshotted as a single html page, preserved for posterity.

See mgsloan.com for my new blog.

Template Haskell Lens Idea

I recently ran into the problem of fclabels partial lenses being partial in both directions. This was not a critical problem, but it was annoying that a type had to be Maybe when the code would never use the “Nothing” constructor. There’s a new, interesting lens library called YALL that inspired me to think about it a bit more. There are some potential issues with this perspective – it is no longer clear which algebraic laws hold. There’s a possibility that this might be resolved by adding some restrictions on the relationship between m and w. Anyway, the point is that lens library design is not a settled issue.

I’m interested in trying another way of using template haskell to express lenses:

fstLens = [mkLens| \(a, _) -> a |]

sndLens = [mkLens| \(_, b) -> b |]

fooLens = [mkLens| \(Just (a, b)) -> [a, b] |]

tupListIso = [mkIso| \(a, b) = [a, b] |]

Each lens is specified in terms of the implementation of its get. We can do this because construction literals are bidirectional – they can be used for pattern matching. The right-hand side of each lens needs to have variables in every position in order to preserve the lens laws (otherwise a portion of the set would not be reflected in the corresponding get).
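
To make this concrete, here is roughly what the expansion for fstLens might look like – a sketch, assuming a lens constructor of type (f -> a) -> (a -> f -> f) -> Lens f a (the name and argument order of lens are illustrative):

-- Hypothetical expansion: the get is the lambda as written; the set
-- re-runs the same pattern as a construction, substituting the new value.
fstLens = lens (\(a, _) -> a)
               (\a' (_, b) -> (a', b))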

The partiality of the lenses depends on whether any of the types used have multiple constructors – whether a match could fail. If the constructor on the left could fail, then the lens is partial in both directions (so fooLens is fully partial). If the constructor on the right could fail, then the lens is at least partial in the setter.

We can also bring in function application:

plusOneLens = lens (+1) (const . subtract 1)

switchPlus = [mkLens| \(a, b) -> (plusOneLens b, plusOneLens a) |]

This is moving towards a full-blown embedded language for creating bidirectional transformations! It’d be interesting to target the feature set of the Boomerang project, which has a particular focus on doing bidirectional operations with text, and can do so with regexes as well as more powerful grammars. I’ve already written a TH quasi-quoter that allows you to use regular expressions in patterns and expressions: rex. Incorporating this into lens generation, by adding canonical serialization to regexes, would be really cool.

I think that this way of working with lenses / isos would really help to popularize their use in Haskell. While fclabels is quite excellent, the Applicative instance is not a very clear way to construct lenses on compound structures. This approach is even nice for ordinary record lenses, as it avoids using records in the first place. Though, foobar_ can be nicer than get foobar.

Plumbers Pointless?

In my last post, I attempted to sarcastically / humorously introduce the plumbers package. I probably should have saved it for April 1st, but I also don’t think the idea is merit-less. I don’t think that becoming a plumbers expert, adept at large plumbing pipelines, would be a very good way to spend your time, just as I don’t think becoming a pointfree combinator ninja is very valuable (though fun!).

The use-case that this is practical / useful for is the very same that the arrow combinators are usually applied to. Let’s face it – most of the time you’re dealing with the (->) arrow, and use (***), (&&&), first, and second. Perhaps I should take a look at making plumber operators for arrows / categories – I have an inkling that they may be useful for lenses.

Anyway, with this particular arrows use case, you don’t really see long chains of arrow combinators – just one at a time, applied to one or two functions. This is the use case I see for the provided plumbers – just take two functions, and apply, bind, or pair the results after giving them the auxiliary parameters. I don’t think that this is too awful – the types of the auxiliary parameters will often make it pretty clear what’s going on. The plumbing operator, when the code reader is skimming code, indicates “combine these two functions, providing these arguments as an environment to their execution”. They can look closer at the plumber to see what’s really happening to the arguments.

The reddit discussion was interesting!

Particularly interesting is this discussion between ehird and cdsmith. I probably should have commented with my thoughts, but I figured out my opinion a few days later, and it was a little bit longwinded, so I figured a followup post was in order. Thank you, ehird, for defending the idea! Thank you cdsmith, for your well informed assessment!

One thing that’s brought up is the “implementation issues” of this idea. I’d like to note that the binary size overhead of including all of these can be mitigated by using Control.Plumbers.TH.implementPlumber. There may be some overhead from invoking the function – I should really add INLINE pragmas!

The other criticism is that “Point-free style is useful when it helps you think at a higher level of abstraction… but I can’t see how these operators lead to higher levels of abstraction.” This is a fair point; however, it holds these operators to a “must increase abstraction” standard. I would argue that points-free form does not significantly increase abstraction either, or as ehird points out, “They’re an abstraction of various forms of composition and pipe plumbing. It’s not like not using point-free style lets you escape the plumbing; you just write it in another way.”

f1 = g . h
f2 x = g (h x)

-- Manipulating with f1:
--             f1 $ 1 / 3
-- (Subst)  g . h $ 1 / 3
-- (inline) g ( h ( 1 / 3 ) )

-- Manipulating with f2:
--           f2                $ 1 / 3
-- (inline)  (\x -> g (h x))     1 / 3
-- (apply)          g (h (1 / 3))

If we view things from a value-centric perspective, then our code during evaluation will be full of lambdas, in order to bind these values to names. If we instead view them with a function-centric perspective, we often end up being able to reason about code by direct substitution without beta reduction. I think that the plumbing operators lead to similar substitutional reasoning, and can be good when used tastefully. The question is whether the rules of plumbers (which I should probably write down in a post) are too confusing for reasoning to be effective. It’s quite possible!

The plumbers experiment led me to think about language support for such “classes of identifiers”. It’d be interesting to support using a context-free grammar to specify all of the operators / names that something can generate. Then, importing this would import the infinite set of operators generated. They need to be context-free, such that we can test for the intersection of identifiers when re-exporting such generators. This would be a huge change to Haskell, for not very much payoff – but interesting to think about!

More Compatible Packages

The Problem

A few days ago, Greg Weber posted “Transcending to dependency heaven”, describing the latest version of Cabal, which sounds like it will greatly improve the reliability of installing a set of Cabal packages. However, this is not entirely satisfying – we still have the fundamental problems aptly described in a blog post by cdsmith.

Short summaries of some potential solutions:

  • Improve Cabal’s dependency resolution behavior.
  • Isolate build environments.
  • “Create lists of blessed package-sets”
  • Reduce external dependencies by identifying internal dependencies (imports that do not leak into the interface). This has the downside of potentially shipping multiple versions of libraries in your linked binary.

These solutions all help, but ignore the fundamental nature of the diamond dependency issue.

We need to look closely at the purpose and effects of these upper-bound constraints. The intention is to prevent targeting a new, unforeseen version of an API, which may change the interfaces and behaviors. By addressing this with the PVP’s “proper”, conservative upper bounds, we usually avoid the problem of spewing a bunch of compilation errors.

However, this comes at a huge cost: packages that would otherwise compile and function do not. An upgrade to one module that many of your dependencies depend on can cause many packages to need corresponding updates. This causes a portion of package maintenance to be proportional to the sum of the major-version velocity of the dependencies. This is awful, particularly as small, task-specific packages are encouraged. The last thing we want to do in such an eco-system is to disincentivize adding dependencies and splitting packages.


Solution: Delta Modules

My proposed solution is to create a convention for versioning the modules in packages in order to enable far less conservative package version upper bounds. The idea is that Haskell has enough ways to create synonyms for named things, that for most superficial changes, the old API can be expressed straightforwardly in terms of the new.

For example, data and newtype declarations can be renamed and have parameters re-ordered by type synonyms. Functions can be proxied by declaring functions of the form “old = new” and, in the case of parameter re-ordering, “old x y z = new z y x”.
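
As a sketch (all names hypothetical), the synonyms for an old version whose datatype was renamed, with its type parameters flipped, and whose combine function had its arguments re-ordered might look like:

import qualified Data.Thing as N

-- The datatype was renamed, with its type parameters flipped:
type OldThing a b = N.NewThing b a

-- 'combine' had its argument order reversed in the new version:
combine x y = N.combine y x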

Many basic refactorings are analogous to plain Haskell code. When applying the refactorings that a module represents, you are just inlining all of the elements of the module necessary to express the old API in terms of the new.

It makes sense to package up these synonyms in a module that shares a namespace prefix with the module being versioned. For example, the diagrams-lib package would export Diagrams.Prelude.V0_5 as well as Diagrams.Prelude. From here on, such modules will be referred to as “delta modules”, as they express the change in API from version to version.


Example

Let’s say we are going through a number of iterations of a library. For familiarity’s sake, I chose a well-known, simple module: Data.Maybe.


Version 0.0.1

module Data.Maybe ( Maybe(Nothing, Just), maybe ) where

data Maybe a = Nothing | Just a

-- Note: this intentionally deviates from the typical definition
maybe _ d Nothing  = d
maybe f _ (Just x) = f x

In order for people to start using the delta API, preemptively avoiding breaking changes, there needs to be a delta module that straight re-exports the current version:

module Data.Maybe.V0_0_1
  ( Maybe(Nothing, Just), maybe ) where
import Data.Maybe


Version 0.0.2

What happens if we add isNothing / isJust to this?

module Data.Maybe ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where

-- Note: this intentionally deviates from the typical definition
maybe _ d Nothing  = d
maybe f _ (Just x) = f x

isNothing = maybe (const False) True
isJust    = maybe (const True)  False

This is a benign API change – just additions of top-level declarations. However, it is still a change that matters, as it could break compilation for modules that import Data.Maybe without qualification or explicit import lists. It seems a bit silly to introduce a delta module just to remove elements of the API – ideally other changes would be bundled in – but at least this delta module is very straightforward to create.

Since Data.Maybe.V0_0_1 only differs from the latest version by hiding some exports, we just add a WARNING pragma, and import the latest delta module.

module Data.Maybe.V0_0_1 {-# WARNING "V0_0_1 is not the latest version" #-}
  ( Maybe(Nothing, Just), maybe ) where
import Data.Maybe.V0_0_2

module Data.Maybe.V0_0_2
  ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where
import Data.Maybe

Making the import structure into a linked-list of delta modules allows us to avoid modification of any but the most recent. This works, because if a module exported the correct API when it was the most recent modification of the latest API, then it ought to still be correct, if all subsequent delta modules are too.


Version 0.1.0

Now a breaking change! We’ll fix the definition of maybe to correspond to the typical API:

module Data.Maybe ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where

-- Note: this now matches the typical definition
maybe d _ Nothing  = d
maybe _ f (Just x) = f x

isNothing = maybe True  (const False)
isJust    = maybe False (const True)

Here’s what the delta modules would look like:

module Data.Maybe.V0_0_2 {-# WARNING "V0_0_2 is not the latest version" #-}
  ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where
import Data.Maybe.V0_1_0
  ( Maybe(Nothing, Just),        isNothing, isJust )

import qualified Data.Maybe.V0_1_0 as N

maybe a b = N.maybe b a

module Data.Maybe.V0_1_0
  ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where
import Data.Maybe



Version 1.0.0

Now a really breaking change! As I will explain later, setting the version number to “1.0.0” should be a strong indication that the pre-1.0.0 delta modules are removed or imperfect.

module Data.Maybe ( Maybe, maybe, mkJust, mkNothing, isNothing, isJust ) where

import Data.Either ( Either(..), either )

type Maybe a = Either () a

maybe d f = either (const d) f

mkJust    = Right
mkNothing = Left ()

-- Other definitions as before
isNothing = maybe True  (const False)
isJust    = maybe False (const True)

module Data.Maybe.V0_1_0 {-# WARNING "V0_1_0 is not the latest version" #-}
  ( Maybe(Nothing, Just), maybe, isNothing, isJust ) where
import Data.Maybe.V1_0_0
  (                       maybe, isNothing, isJust )

import qualified Data.Maybe.V1_0_0 as N

{-# DEPRECATED Maybe "The old representation of Maybe may require version-coercion" #-}
data Maybe a = Nothing | Just a

instance View (Maybe a) (N.Maybe a) where
  view Nothing   = Left ()
  view (Just  x) = Right x

instance View (N.Maybe a) (Maybe a) where
  view (Left  _) = Nothing
  view (Right x) = Just x

maybe d f = N.maybe d f . view
isNothing = N.isNothing . view
isJust    = N.isJust    . view

module Data.Maybe.V1_0_0
  ( module Data.Maybe ) where
import Data.Maybe

This is pretty ugly. We had to export new versions of all of the functions that mention the data type. Worse yet, libraries that target version 0.1.0 or before won’t be directly compatible with those that come after. Also, this code is referencing a class which is a (likely intentionally) unimplemented portion of ViewPatterns. The View class looks like this:

class View a b where
  view :: a -> b

These view functions are then used to do pattern matching on the ADT.
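
For example, with the ViewPatterns extension, old-style matching on the new representation might look like this sketch:

{-# LANGUAGE ViewPatterns #-}

describe :: N.Maybe Int -> String
describe (view -> Just x)  = "Just " ++ show x
describe (view -> Nothing) = "Nothing"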

type Iso a b = (View a b, View b a)

Iso is a constraint synonym that I made up for when we have views to and from a datatype. Ideally these would form an isomorphism, though that may not be possible for all data types.

Thankfully, if we know that we will be versioning our code in this way, we can pre-empt this in the design:

module A (ThingADT(..), Thing, mkThing) where

data ThingADT = ThingADT Double

newtype Thing = Thing ThingADT

-- One for each constructor
mkThing = Thing . ThingADT

instance View ThingADT Thing    where { view          x = Thing x }
instance View Thing    ThingADT where { view (Thing x)  = x       }

Then, we write all of the library functions in terms of the Thing wrapper. This is a slightly cumbersome solution, but view patterns, particularly implicit ones, really paper over the syntactic impact. mkConstructor functions, like mkThing above, are used to avoid writing the newtype wrapping everywhere.

In the event that implicit view patterns are added to the language, it may also make sense to add a shorthand for composing view with a constructor, eg, #ThingADT :: Double -> Thing. I’m not sure what a good constructor pre / postfix would be – for better or worse, Haskell is rather short on spare symbol sequences for use by language extension.


Class Instances

Thinking in terms of what we can and can’t change with delta modules can guide API design, in order to avoid irrevocable decisions. What kinds of things might we change in class instances?

  • Remove an instance. This is supported pretty directly – just define the instance in the delta module for the last version for which it should exist (see the sketch after this list).

  • Add an instance. This is also supported, though has caveats regarding orphan instances, described below.

  • Remove a constraint from an instance. This is fine, as reducing the constraints on usage still allows it to be used in all of the same places.

  • Add a constraint to an instance. This is a breaking change, because it causes these constraints to need to be added to anything that uses that particular instance. This isn’t really resolvable, but I’ll talk about a mitigating strategy later.
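
Here is the sketch promised above for instance removal (the module and instance are hypothetical):

-- Version 0.2.0 dropped the 'Show Foo' instance; the delta module for
-- 0.1.0 keeps it alive (as an orphan, with the caveats described below).
module Data.Foo.V0_1_0 ( Foo, mkFoo ) where
import Data.Foo.V0_2_0

instance Show Foo where
  show _ = "<Foo>"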


Adopt the Orphans!

Classes and their instances are a leaky part of the delta module abstraction. Namely, instances are inherited from their imports, so instances in the latest version will come with the older delta modules. This would be entirely fine if we could be sure that none of our client packages define or import instances that conflict with these added instances.

In “A world without orphans”, Luke Palmer points out that eliminating orphans allows us to create super-class instances. This is one way to be able to rework hierarchies (such as the numeric hierarchy) – retrospectively split classes while maintaining compatibility.

Orphan instances can be nice – in the comments of the above post, augustss makes a good point that newtype wrappers are clunky. The problem of course is that orphan instances lead to the same benefits and problems as multiple inheritance: “what do we do about conflicting instances?”

The problem with the newtype solution is that it composes badly, making it difficult to write two independent newtypes over a datatype, and easily combine the instances of the two. This can be done, I think, but only if both newtypes are designed to support it (by deriving instances of Control.Newtype and having each newtype be an existential wrapper like forall a. Newtype a Concrete => a). But then it has to be done with a data declaration and not a newtype! I wonder how the performance compares?

So, in lieu of a good, composable solution with newtype(s), the only solution is to attempt to control and mitigate the problems orphan instances cause. I have an idea of one way to do this:

Create an orphan “adoption” registry, and allow package authors to endorse canonical instances. The dependent packages can then annotate with a pragma that proudly declares canonicality, suppressing the warning. Then, when GHC is compiling a non-canonical instance of something that has one, it will throw an error unless “-XNonCanonicalOrphans” is supplied. When building with cabal, using this flag will need to coincide with appropriate markings on the package, so that it’s clear to all users that it exports non-canonical orphan instances.

The main important aspect of this orphan registry is that the packages involved do not need to be imported, and therefore do not expand the package’s dependencies.


Classes

As things stand, almost no change to a typeclass can be properly expressed in a delta module. The good news is, with Constraint Kinds, we are much closer! Constraint Kinds allows us to create constraint synonyms, giving one name for many type-classes.
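
For example, with the ConstraintKinds extension:

{-# LANGUAGE ConstraintKinds #-}

-- One name for a conjunction of classes:
type Textual a = (Show a, Read a, Eq a)

roundTrips :: Textual a => a -> Bool
roundTrips x = read (show x) == x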

However, one thing is missing: the ability to declare instances for these synonyms. This feature is necessary for any of the changes to classes to be properly versioned. There would be a couple of caveats to these type-synonym instances:

  • Not all constraint synonyms can be instantiated – a synonym might mention a class twice. This could be resolved by “sub-instances” – instances within the typing context of an instance, disambiguating which class each method is associated with.

  • Instances that are part of the type synonym, and declared in the same module, should override anything that would be generated by default. This is discussed in the Default Superclass Instances Proposal. I like my proposal better because, as that wiki page points out, it’s strange to conflate defaults and instances of class aliases.

One nice thing about instances of constraint groups is that they let you consolidate a bunch of instances that need the same constraint context. Once we have this, we can start thinking about having delta modules deal with type classes. Here are all of the things I can imagine you wanting to do to a type class:

  • Rename a type class. This is a simple application of a type constraint synonym.

  • Split a type class. This is an extremely important feature, as it allows for re-organization of typeclass hierarchies. We split a typeclass by having the older version module export the union of the two classes which provide the operations.

  • Combine type classes. This works too – just define the two type classes in the latest version, and use a constraint synonym to merge them.

  • Change method API. If breaking instance declarations, but not usages is acceptable, then a function declaration works for this. It’s possible to not break instance declarations, but it’s not very pretty – you’d need to preserve both operations in the class, named different things. Each would have default implementations in terms of the other, and the methods would be selectively re-exported.

  • Remove a super-class constraint. This is quite doable – just use a constraint synonym in the delta modules to specify that the removed superclass is still required for a type of that name. This would have been the way of expressing the refactoring for the GHC 7.4.1 removal of the Eq / Show superclasses of Num (see the sketch after this list).

  • Add a super-class constraint. This is pretty much out of the picture if the superclass constraint is being added for use in default methods. If it isn’t, then a constraint synonym in the most-recent API would work.

Adding / removing a method is pretty much the same thing as combining / splitting a typeclass. Except in order to effectively remove a method while supporting backwards compatibility, you do still need to implement it.
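
As a sketch of the superclass-removal case mentioned above (module names hypothetical), the delta for the old, stronger Num could be just a constraint synonym:

{-# LANGUAGE ConstraintKinds #-}
module Prelude.V7_2 ( Num ) where

import Prelude hiding (Num)
import qualified Prelude as P

-- Old code gets a 'Num' that still implies Eq and Show:
type Num a = (P.Num a, Eq a, Show a)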


Method Defaulting – Enter TH

Above, I make the claim that it’s strange to conflate the defaulting mechanism with the typing mechanism. Superclass constraints give us a couple things that constraint synonyms cannot emulate:


  • Default implementations in terms of the super-classes.

  • Algebraic properties in terms of operations that are certainly defined, given a single class constraint.

  • Allows you to re-use the same symbol for a particular concept. With Functor / Applicative / Monad, which should have been more hierarchicalized than it currently is, we have the following sets of identical functions: [pure, return], [fmap, liftM, map, liftA], [(<*>), ap], and [concat, join].


These are very nice things to have! However, I think that this “feature” is a rather ugly side of typeclasses, for the following reasons:


  • They are irrevocable API decisions.

  • The defaulting is such that there is no good way of checking if the user’s definitions form an appropriately minimal subset of the class.

  • I feel like algebraic properties specified about a constraint group synonym are equally weighty as algebraic properties specified about a class. This is particularly the case when all of the classes come from the same package.

  • There are many use cases for more elaborate defaulting. For example, it might be interesting to be able to define an instance for a class when given a function of a particular type.

  • People often face the “only one instance per class per datatype” problem, which isn’t really a problem, as the solution is to wrap it in a newtype! Introducing superclass instances (like in Default Superclass Instances) brings about the “only one instance per superclass per class” problem. This is even worse, though, as there is no way to specialize on a per-data-type basis.


Doing defaulting with Template Haskell derivers is fundamentally different, in that there is not a mandatory implication from one class to another. Instead, we use distinctly named derivers, which allow you to generate some or all definitions for an instance. Since the deriver is picked by the user, and can take additional parameters, the user has full control.

There’s a package of template haskell derivers for datatypes. I haven’t seen a full system for doing default instances from other instances, but James Cook’s flexible defaults package looks like an interesting take on this problem.

The deriver package has a DSL for expressing derivations, and ideally defaulting derivations would also be expressible with instance syntax. I’d like to see them look very similar to Luke Palmer’s superclass instance example:

class Additive a where
    (^+^) :: a -> a -> a

$(mkDeriver "AdditiveNum" [d|
instance (Num a) => Additive a where
    (^+^) = (+)
|] )

Possible now:

data Foo = -- ...

instance Num Foo where { ... }

$(deriveAdditiveNum)

The following would require quasi-quoters to be updated in order to be particularly convenient for defining stuff within instances (providing information about the context). The type of this part of the quasi-quoter would be Dec -> String -> [Dec], where the Dec argument is the declaration of the enclosing instance that has been accumulated so far.

instance Num Foo where { ... }

instance Additive Foo where
  $[deriveFrom| Num Foo |]

I think that Template Haskell is a very good way to experiment with better ways of defaulting class methods. Ideally something like this would make it into the language, so that most code doesn’t have any mysterious invocations of TH. But first, exploring the design space, and figuring out what’s actually needed, is best.

It would also be quite straightforward to use Template Haskell to implement instances for type-synonym constraints. It would reify each class involved in the synonym, and extract which methods are needed by each, and then split all of the provided definitions into the appropriate instances.
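
Usage of such a splice might look like this sketch (mkSynonymInstance and Multiplicative are hypothetical; the splice would split the method definitions between the Additive and Multiplicative instances):

type Ring a = (Additive a, Multiplicative a)

$(mkSynonymInstance [t| Ring Double |] [d|
    (^+^) = (+)
    (^*^) = (*)
  |])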


Applying Deltas

This idea not only greatly reduces the pain of Cabal upper-bounds, but also removes much of the pain of changing identifier names. This is an excellent property – less resistance to package evolution leads to better packages.

Even with these delta modules, there is still quite a bit of resistance to API evolution, partially due to package users needing to modify their code when the API changes. The main point of this post is that we do not need a special format to store refactorings – Haskell is currently very close to being able to encode a powerful subset of the refactorings associated with API changes.

These refactoring modules do not need to be restricted to expressing the difference in API between versions. It’s also imaginable that you could represent the relationship between similarly designed libraries!


Generating Deltas

Writing these delta modules by hand wouldn’t be that difficult, but Haskellers are notoriously lazy.  We need a tool that attempts to infer what definitions the delta module needs, and inform the user of the remaining difference.

In order to appropriately mark the definitions that cannot be generated / defined, I’d like to request an additional pragma in the vein of WARNING / DEPRECATED – ERROR.  This would allow for a place to put the boilerplate error of “Automated generation of delta function ‘foobar’ infeasible”.  Functions marked with ERROR would usually have undefined as their definition.
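
A generated stub using the proposed ERROR pragma might look like this (the function and its type are hypothetical):

{-# ERROR foobar "Automated generation of delta function 'foobar' infeasible" #-}
foobar :: OldFoo -> NewFoo
foobar = undefined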


Interactions with existing infrastructure


Cabal

Cabal packages work as before, and support this use case very nicely, via the “hs-source-dirs:” field, which can be set to “src, vers”. This lets you have two module hierarchies, one for your normal code, and one for your delta modules. This keeps delta modules from cluttering up the source tree and making it feel less manageable. It’s particularly nice when changing the namespaces of modules, as only the directory structure of “vers” needs to be mucked up.

Another directory that could conceivably exist is “imps”, in order to reduce the ugliness of having versioned imports everywhere. Instead of import Data.Maybe.V1_0_0 you’d import Imp.Data.Maybe, which would re-export the appropriate version.
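
Such a module would be a one-line re-export, along these lines:

-- Pins the targeted API version in one place:
module Imp.Data.Maybe ( module Data.Maybe.V1_0_0 ) where
import Data.Maybe.V1_0_0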


Haddock

It would be nice if Haddock were aware of this convention, and intelligently hid all of the “Vn_n_n” form modules from the index page. The docs for the delta modules would still be accessible, but only through a link from the haddock of the module that they version.

It would also be nice if orphan instances were prominently specified and distinguished in the Haddock, particularly with clear indication of canonical / non-canonical annotations if there are orphan registries. I wouldn’t mind if there was a complete listing of orphans on the contents page of the package, after the exported namespaces, as somewhat of a “wall of shame”.


The PVP

One question is “What happens to the PVP?”. The PVP page on the wiki specifies the semantic meaning of the version numbers.

The PVP is compatible with this idea; however, we now have an additional class of change:

  1. A.B.C.D – Non-API-changing edits
  2. A.B.C.0 – Addition of entity
  3. A.B.0.0 – Removal or breaking modification of API
  4. (New) – Probable removal or breakage of delta modules

Since we are introducing a new type of breakage, we are presented with a couple of choices:

  1. A.0.0.0 – Increment A on delta module breakage. The problem with this is that the library designer no longer has control over the major-major version number.

  2. Another option is to stick with the intention of the PVP. Now that we have versioned modules, most removal / breaking changes are equivalent to adding a definition.
    1. A.B.C.D – Non-API-changing edits
    2. A.B.C.0 – Modification of API
    3. A.B.0.0 – Probable removal or breakage of delta modules

    The problem with this is that it deviates from the traditional PVP when we export the “latest version” modules. A middle-ground that seems reasonable to me is to use this convention when forcing usage of the versioned modules, and use the prior otherwise.

  3. Another alternative is to make version numbers use five numbers, to accommodate every type of change, and still provide two digits of major version. I think this is too verbose.

Determining what constitutes a breaking delta module change, and what to do with the delta modules, is entirely up to the library designer. A strict library designer would increment, and deprecate old delta modules every time the behavior of old functionality is substantially changed.


Relationship to the ECT

When looking at the PVP page in order to write this post, I noticed that this idea has been brought up before, in a somewhat different form. I had forgotten about that page, but I read it several years ago, when first learning Haskell, so the seed for this idea was likely planted then. Clearly, it didn’t catch on, maybe because the cure was uglier than the ailment. It seems like the growth and maturation of Hackage in the seven years since Issue 2 of the Monad Reader necessitates implementation of this, or a similar scheme.

Differences between my proposal and the ECT:


  • The ECT recommends duplicating module code in the event of a behavioral change (see “Bugs, Behaviour, Semantics” section). I don’t think there’s a point to having the package contain two copies of the code. Not only will their data structures be incompatible, but the semantic link between the old and new version will be lost.

  • The ECT doesn’t address incompatible classes / mention problems related to orphans.

  • The ECT doesn’t address incompatible ADTs.

  • The ECT uses “DEPRECATED” for old versions. This will break packages that have “-Werror” TODO: Check this!

  • This is a pretty long post, at least by my standards! However, if you remove all of the contextualization / speculation text of this post, the description of this idea is fairly concise. The policy document for the “PVP’” or “MCP” (More Compatible Packages) will ideally be concise and understandable, to encourage adoption.


Conclusion

I realize that this is quite a multi-faceted proposal – changing package conventions and majorly altering typeclasses. However, all of these things are just refinements to attempt to make delta modules feasible.

I think that this solution is quite attractive, because it can easily be tried now. With community consensus and buy-in, something like this can also be incrementally deployed, without breaking any packages, and already cover most API changes.

I hope that this use case will help encourage and motivate resolution of the orphan instances problem. The class / module system is very close to being able to express API-deltas in a way that composes properly, and that is a very sweet, Haskell-ey quality.

Pointless Plumbers

From this recent reddit comment thread / blog post, I had the idea of generalizing the operators found in the Data.Composition package. This could be a bad idea, as it encourages code to have larger, scarier operators, but I think I decided upon some interesting conventions. These operators can be used to construct pointfree expressions in a somewhat more straightforward, less nested fashion.

I brought it up on #haskell to mixed reactions. The following quote is now attributed to me via lambdabot by ski:

 (on pointless black magic)
<mgsloan> welcome to excessively pointless plumbing operators :)
<byorgey> mgsloan: that's... terrifying
<DanBurton> you should put it on hackage

So I did! I cleaned up the library and put it on hackage. Here’s how it works:

Pair Plumber

(*^) ::       r'  ->       r''  ->  a     -> (r', r'')
(*<) :: (a -> r') ->       r''  ->  a     -> (r', r'')
(*>) ::       r'  -> (a -> r'') ->  a     -> (r', r'')
(*&) :: (a -> r') -> (a -> r'') ->  a     -> (r', r'')
(**) :: (a -> r') -> (b -> r'') -> (a, b) -> (r', r'')

(*^) f1 f2  _     = (f1,   f2  )  -- Drop parameter
(*<) f1 f2  a     = (f1 a, f2  )  -- Left gets parameter
(*>) f1 f2  a     = (f1,   f2 a)  -- Right gets parameter
(*&) f1 f2  a     = (f1 a, f2 a)  -- Both get parameter
(**) f1 f2 (a, b) = (f1 a, f2 b)  -- Split tuple

The first two parameters are functions which are applied to the remainder of the parameters, in a fashion requested by the symbols after the initial “*”. These symbols specify a routing – which functions each parameter is routed to – leading to the name “plumbing”. Here’s one downside of this naming scheme – (**) is Floating exponentiation in the Prelude – so modules that fully import this library need to import Prelude hiding ((**)).

If these operators were generalized to arrows, which they could be, then (**) would be the same thing as (***), and (*&) would be the same thing as (&&&). So what’s (***) being used for now?

(***) :: (a -> c -> r') -> (b -> d -> r'') -> (a, b) -> (c, d) -> (r', r'')
(***) f1 f2 (a, b) (c, d) = (f1 a c, f2 b d)

It’s the generic zip on tuples! The additional ‘*’ indicates that an additional tuple parameter should be split between the functions. This version of (***) is something I often want, and have added to project-specific utility libraries a few times, under the name zipT (though bizip is probably a better name).

If this is the extended version of (Control.Arrow.***), then what’s the extended version of (Control.Arrow.&&&)?

(*&&) :: (a -> b -> r') -> (a -> b -> r'') -> a -> b -> (r', r'')
(*&&) f1 f2 a b = (f1 a b, f2 a b)

We can also mix & and * in a couple ways:

(*&*) :: (a -> b -> r') -> (a -> c -> r'') -> a -> (b, c) -> (r', r'')
(*&*) f1 f2 a (b, c) = (f1 a b, f2 a c)

(**&) :: (a -> c -> r') -> (b -> c -> r'') -> (a, b) -> c -> (r', r'')
(**&) f1 f2 (a, b) c = (f1 a c, f2 b c)

Never before seen combinators, as far as I know, but I think they are reasonably understandable with a little practice. In theory, I’m defining a naming scheme for an infinite set of related function definitions. In practice, only plumbers up to arity 3 are defined by default – you can invoke a Template Haskell function to generate more if you need them.
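
For instance, requesting the arity-4 product plumbers in your own module might look like this (a sketch against the Control.Plumbers.TH interface shown later in this post – I haven't verified the exact invocation):

{-# LANGUAGE TemplateHaskell #-}
import Control.Plumbers.TH (PlumberSpec (..), implementPlumbers)
import Control.Plumbers.Specs (productSpec)

-- Generate only the arity-4 product plumbers:
$(implementPlumbers productSpec { plumberArities = [4] })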

Examples

Some examples of using these functions:

λ> (+1) ** (*2) $ (9, 4)
(10, 8)

λ> ((++) *** (++)) ("a", "b") (" forest", "ird")
("a forest", "bird")

λ> (maybe (:[]) replicate *<& length) (Just 3) "hi"
(["hi","hi","hi"], 2)

(11, 20) == ((+1) *&   (*2)) 10

(12, 20) == ((+)  *&&  (*) ) 10 2

(13, 20) == ((+)  *&>< (*) ) 10 2 3

(12, 30) == ((+)  *&<> (*) ) 10 2 3

(12, 40) == ((+)  *&<  (*4)) 10 2

(14, 20) == ((+4) *&>  (*) ) 10 2

Composition Plumber

($^) :: (     r'' -> r') ->       r''  -> a      -> r'  -- Drop parameter
($<) :: (a -> r'' -> r') ->       r''  -> a      -> r'  -- Left gets parameter
($>) :: (     r'' -> r') -> (a -> r'') -> a      -> r'  -- Right gets parameter
($&) :: (a -> r'' -> r') -> (a -> r'') -> a      -> r'  -- Both get parameter
($*) :: (a -> r'' -> r') -> (b -> r'') -> (a, b) -> r'  -- Split tuple

($^) f1 f2  _     = f1   $ f2    -- Drop parameter
($<) f1 f2  a     = f1 a $ f2    -- Left gets parameter
($>) f1 f2  a     = f1   $ f2 a  -- Right gets parameter
($&) f1 f2  a     = f1 a $ f2 a  -- Both get parameter
($*) f1 f2 (a, b) = f1 a $ f2 b  -- Split tuple

The definitions are exactly the same as in the pair plumber, except using the ($) function to combine the arguments, instead of (,). Ordinary composition is "$>" in this system, as it combines the functions using "$", and provides the parameter to the function on the right. All of these operators have "infixr 9" priority, to match with ordinary composition.

λ> :t (.)
(.) :: (b -> c) -> (a -> b) -> a -> c

λ> :t ($>)
($>) :: (r'' -> r') -> (a -> r'') -> a -> r'


PNorm Example

Let's say we want to implement the p-norm on lists. This works by raising each element of the list to the power p, summing, and raising the sum to the power 1 / p. Standard Euclidean distance is the p = 2 norm.

pow = flip (Prelude.**)
pnorm p xs = pow (1 / p) (sum (map (pow p) xs))

Here's how I'd normally write this function:

pnorm p = pow (1 / p) . sum . map (pow p)

But now we can go further! Without descending into the full points-free madness of

pnorm = ap ((.) . pow . (1 /)) ((sum .) . map . pow)

We can instead use the plumbers variant.

pnorm = (pow $> (1/)) $&> sum $>> map $> pow

--      (pow $> (1/)) $&> (sum $>> (map $> pow))  -- infixr 9

--            /----^  p xs        ^    /---^
--            |       | |         |    |
--            \-------| \------=--/    |
--                    |                |
--                    \---------=------/

The ascii illustration shows how the plumbing operators route the parameters. The arrow for xs may be a little confusing. That parameter is being provided to the result of (map $> pow), because ($>>) is expecting something that uses two arguments (apart from functions) and ($>) = (.) only has one.

What if we want to normalize according to a given pnorm?

pnormalized = (flip map $<> (/)) $>& pnorm

--                    ^          p xs  ^ ^
--                    |          | |   | |
--                    \--=-------|-\---/ |
--                               |       |
--                               \-------/


List Cons Example

On IRC, ski suggested that such higher-arity combinators should be systematically decomposable into the others, and gave the following points-free example:

list3 :: a -> a -> a -> [a]
list3 = ($ []) .:: (.: (:)) .: (.: (:)) . (.: (:)) id

(.:) :: (b -> c) -> (a -> a1 -> b) -> a -> a1 -> c
(.:) = (.) . (.)

(.::) :: (b -> c) -> (a -> a1 -> a2 -> b) -> a -> a1 -> a2 -> c
(.::) = (.) . (.) . (.)

Instead of figuring out how to express these combinators in terms of the others (I might give some of these definitions / identities in a later post - omitted for brevity and convenience), I gave an equivalent definition of list3 using plumbers:

list3 = ((:) $<>> (:) $<> (:[]))

(.:) = ($>>)
(.::) = ($>>>)

Generalizing this to four is easy. However, the module doesn't currently export arity greater than three, as the compile time was longish, and the binary was 1MB. If you want these operators, you can use Control.Plumbers.TH to request their implementation.

list4 = ((:) $<>>> (:) $<>> (:) $<> (:[]))

Turns out that the expression of this can get even more uniform:

<ski> (could you separate `(:[])' into a `(:)' and a `[]', for uniformity ?)
<mgsloan> ((:) $<>>> (:) $<>> (:) $<> (:) $< []) 1 2 3 4
***ski claps

Something interesting to observe is that when using plumbing operators on cons, just by changing the operators involved, we can get out any 3-list that consists of the passed parameters:

λ> ((:) $<>> (:) $<> (:[])) 1 2 3
[1,2,3]

λ> ((:) $<>> (:) $<> (:) $< []) 1 2 3
[1,2,3]

λ> ((:) $<>> (:) $>< (:) $< []) 1 2 3
[1,3,2]

λ> ((:) $><> (:) $>< (:) $< []) 1 2 3
[2,3,1]

λ> ((:) $>>< (:) $>< (:) $< []) 1 2 3
[3,2,1]

λ> ((:) $>&^ (:) $>< (:) $< []) 1 2 3
[2,2,1]

λ> ((:) $&>^ (:) $>< (:) $< []) 1 2 3
[1,2,1]

Implementation

Here's the main body of Control.Plumbers:

$(implementPlumbers compositionSpec)

infixr 9 $^, $<, $>, $&, $*
infixr 9 $^^, $^<, $^>, $^&, $^*, $<^, $<<, $<>, $<&, $<*, $>^, $><, $>>, $>&, $>*, $&^, $&<, $&>, $&&, $&*, $*^, $*<, $*>, $*&, $**
infixr 9 $^^^, $^^<, $^^>, $^^&, $^^*, $^<^, $^<<, $^<>, $^<&, $^<*, $^>^, $^><, $^>>, $^>&, $^>*, $^&^, $^&<, $^&>, $^&&, $^&*, $^*^, $^*<, $^*>, $^*&, $^**, $<^^, $<^<, $<^>, $<^&, $<^*, $<<^, $<<<, $<<>, $<<&, $<<*, $<>^, $<><, $<>>, $<>&, $<>*, $<&^, $<&<, $<&>, $<&&, $<&*, $<*^, $<*<, $<*>, $<*&, $<**, $>^^, $>^<, $>^>, $>^&, $>^*, $><^, $><<, $><>, $><&, $><*, $>>^, $>><, $>>>, $>>&, $>>*, $>&^, $>&<, $>&>, $>&&, $>&*, $>*^, $>*<, $>*>, $>*&, $>**, $&^^, $&^<, $&^>, $&^&, $&^*, $&<^, $&<<, $&<>, $&<&, $&<*, $&>^, $&><, $&>>, $&>&, $&>*, $&&^, $&&<, $&&>, $&&&, $&&*, $&*^, $&*<, $&*>, $&*&, $&**, $*^^, $*^<, $*^>, $*^&, $*^*, $*<^, $*<<, $*<>, $*<&, $*<*, $*>^, $*><, $*>>, $*>&, $*>*, $*&^, $*&<, $*&>, $*&&, $*&*, $**^, $**<, $**>, $**&, $***

$(implementPlumbers productSpec)

infixr 9 *^, *<, *>, *&, **
infixr 9 *^^, *^<, *^>, *^&, *^*, *<^, *<<, *<>, *<&, *<*, *>^, *><, *>>, *>&, *>*, *&^, *&<, *&>, *&&, *&*, **^, **<, **>, **&, ***
infixr 9 *^^^, *^^<, *^^>, *^^&, *^^*, *^<^, *^<<, *^<>, *^<&, *^<*, *^>^, *^><, *^>>, *^>&, *^>*, *^&^, *^&<, *^&>, *^&&, *^&*, *^*^, *^*<, *^*>, *^*&, *^**, *<^^, *<^<, *<^>, *<^&, *<^*, *<<^, *<<<, *<<>, *<<&, *<<*, *<>^, *<><, *<>>, *<>&, *<>*, *<&^, *<&<, *<&>, *<&&, *<&*, *<*^, *<*<, *<*>, *<*&, *<**, *>^^, *>^<, *>^>, *>^&, *>^*, *><^, *><<, *><>, *><&, *><*, *>>^, *>><, *>>>, *>>&, *>>*, *>&^, *>&<, *>&>, *>&&, *>&*, *>*^, *>*<, *>*>, *>*&, *>**, *&^^, *&^<, *&^>, *&^&, *&^*, *&<^, *&<<, *&<>, *&<&, *&<*, *&>^, *&><, *&>>, *&>&, *&>*, *&&^, *&&<, *&&>, *&&&, *&&*, *&*^, *&*<, *&*>, *&*&, *&**, **^^, **^<, **^>, **^&, **^*, **<^, **<<, **<>, **<&, **<*, **>^, **><, **>>, **>&, **>*, **&^, **&<, **&>, **&&, **&*, ***^, ***<, ***>, ***&, ****

$(implementPlumbers ...) invokes a Template Haskell function which generates all of the function declarations. All of those "infixr 9" declarations should really be unnecessary, but you can't create fixity declarations with Template Haskell yet. See this GHC bug - which simonpj recently created a fix for! Props to him for fixing stuff like that! Until that fix is included in a GHC release, though, I'll leave these fixity declarations around.

You can create your own plumbing operators by using the following interface from Control.Plumbers.TH:

-- | Specifies all of the information needed to construct type declarations
--   for the plumber.
data PlumberTypes = PlumberTypes
 { leftType   :: Type  -- ^ Type of the left argument's result
 , rightType  :: Type  -- ^ Type of the right argument's result
 , resultType :: Type  -- ^ Result type.  This needs to be wrapped in a
                       --   forall naming all of the utilized type variables.
 }

-- | A basic set of types, which make r' the left type, and r'' the right type.
--   The resultType is a forall that introduces these type variables, and has
--   undefined content.  Therefore any implementation in terms of baseTypes
--   needs to redefine resultType, as the Forall has undefined as its content.
baseTypes :: PlumberTypes
baseTypes = PlumberTypes
  { leftType   = mkVT "r'"
  , rightType  = mkVT "r''"
  , resultType = ForallT [mkVB "r'", mkVB "r''"] [] undefined
  }

-- | Specifies all of the information needed to implement a plumber.
data PlumberSpec = PlumberSpec
 { plumberOpE     :: Exp -> Exp -> Exp  -- ^ The plumber implementation
 , plumberTypes   :: Maybe PlumberTypes -- ^ Optional explicit type signatures
 , plumberArities :: [Int]              -- ^ Arities to generate - 26 is max
 , plumberPrefix  :: String             -- ^ Prefix to use for operator
 }

-- | Creates a plumber spec for the given prefix for the generated operators,
--   and the name of the infix operator to use to construct the implementation.
baseSpec :: String -> String -> PlumberSpec
baseSpec p e = PlumberSpec
  { plumberOpE      = (\l r -> InfixE (Just l) (mkVE e) (Just r))
  , plumberTypes    = Nothing
  , plumberArities  = [1..3]
  , plumberPrefix   = p
  }

The operators, along with those that are exported by Control.Plumbers.Monad, are defined in Control.Plumbers.Specs as follows:

productSpec :: PlumberSpec
productSpec     = (baseSpec "*" "_") { plumberTypes = Just productTypes
                                     , plumberOpE   = (\l r -> TupE [l, r]) }

compositionSpec :: PlumberSpec
compositionSpec = (baseSpec "$" "$") { plumberTypes = Just compositionTypes }

lbindSpec  :: PlumberSpec
lbindSpec  = (baseSpec "<=" "=<<")   { plumberTypes = Just lbindTypes }

rbindSpec  :: PlumberSpec
rbindSpec  = (baseSpec ">=" ">>=")   { plumberTypes = Just rbindTypes }

frbindSpec :: PlumberSpec
frbindSpec = (baseSpec ">>" ">>")    { plumberTypes = Just $ fbindTypes False }

flbindSpec :: PlumberSpec
flbindSpec = (baseSpec "<<" "<<")    { plumberTypes = Just $ fbindTypes True  }

productTypes :: PlumberTypes
productTypes = addBaseContext $ baseTypes
  { resultType = tuplesT [leftType baseTypes, rightType baseTypes] }

compositionTypes :: PlumberTypes
compositionTypes = addBaseContext $ baseTypes
  { leftType   = arrowsT [rightType baseTypes, leftType baseTypes]
  , resultType = leftType baseTypes
  }

This leaves the library open to others defining plumbing operators following the same conventions.

Thoughts?

I think that this family of operators has a very memorable and visual notation, and can be put to reasonable use. I'm not set on all of these decisions, though - the notation may change in order to avoid collisions with (**) and the arrow operators.

What do people think? Is this awful? Useful? Are the symbol choices good? Other suggestions? It's something I've itched for often in past times when points-free style reaches slightly beyond its reasonable limit.

Visualizing the Haskell AST

I’m very enthused by the potential for development environments that offload more of the uninteresting minutiae of programming onto the computer. This has been looked at a lot in the past, by a lot of very smart people. From structure editors to the visualization of the results of a variety of static analysis techniques, tons of work has been done. Despite this, these ideas have not yet revolutionized popular development as we know it, with auto-completion being the main widely utilized language-aware convenience tool.

Much more is known about a Haskell program at compile time than about a program written in most of your run-of-the-mill programming languages. It seems like it would be a good idea to provide more of this information to the programmer, in a live, interactive, context-dependent form. Examples include depicting the parse tree, types of subexpressions, applicable semantics-preserving transformations, and example evaluations, right in the programming editor. Rather than attempting to address this problem directly on limited free time, I intend to build a number of toy programs to play with the problem.

This post is literate Haskell, and so should be copy-pastable into a *.lhs file. It depends on two libraries that are not yet hackage-ready:

http://github.com/mgsloan/curve
http://github.com/mgsloan/gtk-toy

Gtk-Toy is a wrapper over GTK / Cairo that processes inputs into more Haskell-ey data types and provides a few convenience data structures. I intend to grow it as I write more “toys” for various purposes. The ‘Curve’ library and the ToyFramework are partial ports / re-imaginings of the lib2geom project, in which I was more active several years ago. While working on that library, we established a habit of creating an interactive toy to exercise particular features or to provide a prototyping sandbox to play with a new idea. In order to encourage this development pattern, the infrastructure for toy-making had to be convenient and straight-forward, which is what I attempt to achieve with the Haskell equivalent.

I’ll intersperse explanation between chunks of code, for some of the trickier bits, but familiarity with Haskell is assumed. The source on this page is available on github: http://github.com/mgsloan/ast-vis/blob/master/Main.hs

Here’s a script to make trying it out convenient:

#!/bin/bash

# Download and locally install curve library
wget -O curve.tar.gz http://github.com/mgsloan/curve/tarball/c621e5e6b405801a69dbeb1e1ecdd4edcef28199
tar -xvf curve.tar.gz
rm curve.tar.gz
cd mgsloan-curve-c621e5e
cabal configure --user
cabal install
cd ../

# Download and locally install toyframework
wget -O toyframework.tar.gz http://github.com/mgsloan/toyframework/tarball/354c0225ec6d21c24b7696468b81ddb37aa099f2
tar -xvf toyframework.tar.gz
rm toyframework.tar.gz
cd mgsloan-toyframework-354c022
cabal configure --user
cabal install
cd ../

# Download and run the simple AST-vis
git clone git@github.com:mgsloan/ast-vis.git
cd ast-vis
runhaskell Main.hs
> {-# LANGUAGE FlexibleInstances, TemplateHaskell,
>              TupleSections, TypeOperators #-}
>
> import Control.Arrow ((&&&))
> import Control.Monad (liftM, zipWithM_)
> import Data.Curve
> import Data.Data
> import Data.Function (on)
> import Data.Generics.Aliases
> import Data.Label
> import Data.List (groupBy)
> import Data.Maybe
> import Graphics.ToyFramework
> import Language.Haskell.Exts.Annotated
> import qualified Graphics.Rendering.Cairo as C

Now that the imports are out of the way, we define the state representation for the AST-visualization toy. It’s very simple – this is not intended to be anywhere near a real text editor – and so just stores the code in a plain string, the current cursor position, and a cache of the parsed representation. It also stores the current mouse location, in order to provide vertical scrolling of the AST visualization (as it can easily get quite large).

Following the data declaration is a Template Haskell fclabels invocation which provides lenses for the different fields of the state. These allow you to construct views on data structures, and to use them to get / set / modify the projection (oftentimes, and in every case here, these are just ADT fields).
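
For example, with the labels generated below, usage looks like this (st here is some State value):

get code st             -- the current source string
set cursor 0 st         -- jump the cursor to the start
modify cursor (+1) st   -- advance the cursor one character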

> data State = State
>   { _code :: String
>   , _cursor :: Int
>   , _parsed :: (ParseResult (Decl SrcSpanInfo))
>   , _mouseCursor :: (Double, Double)
>   }
>
> $(mkLabels [''State])

First, a few convenience fclabels-related utilities. modM and setM lift modify and set, respectively, to yield monadic values. lensed provides a more generic self-modification, allowing the new value for one label to be derived from the projection of another. updateParse is an example usage of lensed which will soon become useful.

> modM :: Monad m => (b :-> a) -> (a -> a) -> b -> m b
> modM l f = return . modify l f
>
> setM :: Monad m => (b :-> a) -> a -> b -> m b
> setM l x = return . set l x

> lensed :: (f :-> a) -> (f :-> a') -> (a -> a') -> f -> f
> lensed l l' f s = set l' (f $ get l s) s
>
> updateParse :: State -> State
> updateParse = lensed code parsed parseDecl

This is what most toy main functions will look like – an initial value for the state of the toy, followed by references to the functions which handle events and drawing. The mouse handler just sets the mouseCursor field of the state to the mouse position. This will later allow for adjustment of the vertical position of the AST diagram.

> main :: IO ()
> main = runToy $ Toy
>  { initialState = updateParse $
>      State "fibs = 0 : 1 : zipWith (+) fibs (tail fibs)" 0 undefined (0, 220)
>   , mouse   = const $ setM mouseCursor
>   , key     = handleKey
>   , display = handleDisplay
>   , tick    = const return
>   }

Definition of Toy from the “toyframework” source code, for reference:

data Toy a = Toy
  { initialState :: a

  -- Given the current keyboard state, perform a single 30ms periodic execution
  , tick    :: KeyTable                              -> a -> IO a

  -- Display using cairo, based on the canvas size and dirty region.
  , display :: IPnt -> IRect                         -> a -> C.Render a

  -- Handle mouse presses (first parameter is (pressed?, which)) and motion.
  , mouse   :: Maybe (Bool, Int) -> (Double, Double) -> a -> IO a

  -- Handle key-presses, first parameter is "pressed?", second is (Left string)
  -- to give the names of non-character keys, and (Right char) for the rest.
  , key     :: Bool -> Either String Char            -> a -> IO a
  }

Definition of the key-handler follows. It handles basic motion, deletion, and insertion.

> handleKey :: Bool -> Either [Char] Char -> State -> IO State
> handleKey True (Right k) (State xs ix p m) =
>   return . updateParse $ State (pre ++ (k : post)) (ix + 1) p m
>  where
>   (pre, post) = splitAt ix xs
>
> handleKey True (Left k) s@(State xs ix _ _) = liftM updateParse $ (case k of
>     "Left"  -> modM cursor (max 0 . subtract 1)
>     "Right" -> modM cursor (min endPos . (+1))
>     "Home"  -> setM cursor 0
>     "End"   -> setM cursor endPos
>     "BackSpace" -> modM cursor (max 0 . subtract 1)
>                  . set code (delIx (ix - 1))
>     "Delete" -> setM code (delIx ix)
>     "Escape" -> const $ error "The user escaped!"
>     _ -> return) s
>   where endPos = length xs
>         delIx i | (pre, (_:post)) <- splitAt i xs = pre ++ post
>                 | otherwise = xs
>
> handleKey _ _ s = return s

The handleDisplay function below draws the text and cursor, followed by either the parse tree or an error message. (^+^) and (^-^) are vector-space operators, in this case operating on 2D vectors.

> handleDisplay :: IPnt -> IRect -> State -> C.Render State
> handleDisplay _ (tl, br) s@(State txt ix p (_, ypos)) = do
>   let textPos = (50.5, 100.5)
>       height = (fromIntegral . snd $ br ^-^ tl) * 0.5
>       astPos = textPos ^+^ (0.0, ypos - height)
>
>   move textPos
>   C.showText txt
>
>   -- Draw the text cursor.
>   C.setLineWidth 1
>   draw . offset (textPos ^+^ (-1, 0)) . rside 1 . expandR 2
>        =<< textRect txt 0 ix
>   C.stroke
>
>   case p of
>     ParseOk decl -> drawSpans astPos txt (getSpans decl)
>     f@(ParseFailed _ _) -> C.showText (show f)
>   C.stroke
>
>   return s

We’re done with all the little support bits! Only the meat of the problem, the definitions of drawSpans and getSpans, remains. In fact, if we set them to be no-ops, the above code is a functioning single-line text editor. Not too bad for around 75 SLOC!

drawSpans _ _ _ = return ()
getSpans = undefined

Next, we display a horizontal-spans based visualization of the abstract syntax tree of the code that the user has typed. This proceeds in a fairly straightforward manner, as a pipeline of transformations to draw the source-spans as a stack of labelled lines:

> drawLabeledLine :: String -> DLine -> C.Render ()
> drawLabeledLine txt lin = do
>   draw lin
>   relText 0.5 (lin `at` 0.5 ^-^ (0, 7)) txt
>
> spanLine :: String -> (Int, Int) -> C.Render (Linear Double, Linear Double)
> spanLine txt (f, t) = liftM (rside 2 . expandR 2) $ textRect txt f (t - 1)
>
> drawSpans :: DPoint  -> String -> [((Int, Int), String)] -> C.Render ()
> drawSpans pos txt =
>       -- Draw each labeled line, with each subsequent line 15 pixels lower.
>   (>>= zipWithM_ (\d (lin, name) -> drawLabeledLine name . (`offset` lin)
>                                   $ pos ^+^ (0, 15) ^* fromIntegral d)
>                  [0..])
>
>       -- Turn each span into an appropriately sized line segment.
>   . mapM (\(s, n) -> liftM (, n) $ spanLine txt s)
>
>       -- Prefer last of all identically-spanned tokens.  Pretty arbitrary.
>   . map last . groupBy ((==) `on` (\(x,_)->x))

On the left is the diagram resulting from commenting out the line “. map last . groupBy ((==) `on` (\(x,_)->x))”. As illustrated, it mostly removes information that the user wouldn’t care about for understanding Haskell’s parse tree.

So, how did we manage to get the spans from the abstract syntax tree of the declaration? The haskell-src-exts documentation has tons of ADTs, each representing a different potential member of Haskell’s AST. In order to collect the source-span information, we could write a function for each type, pattern matching on every single case, recursing into the children of each node. What saves us from such drudgery is that every ADT has a derived Data and Typeable instance! We will do something much nicer using SYB.

First off, the SrcSpanInfo annotations indicating source location on the AST nodes contain a lot of information we don’t need. For this simple single-file, single-line case, we discard everything but the column range, and so define a convenient accessor for it. If you aren’t familiar with arrows: when operating on functions, the (&&&) operator has type “(a -> b) -> (a -> c) -> a -> (b, c)”. In other words, it applies two functions to the same input, and wraps the results in a tuple.
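
For example:

λ> (succ &&& pred) 5
(6,4)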

> srcSpan :: SrcSpanInfo -> (Int, Int)
> srcSpan = (srcSpanStartColumn &&& srcSpanEndColumn) . srcInfoSpan

Here’s the exciting part! How can we get the source span from an arbitrary data type? SYB makes it very easy. Data.Data.gmapQ applies a generic function to every field, and yields the results as a list. Data.Generics.Aliases.extQ allows us to use “Just . srcSpan” whenever it can be applied (when the types are compatible), and otherwise “const Nothing”. So, in whole, the following function yields Just a span tuple if the given value has a SrcSpanInfo field.

> getSpan :: (Data a) => a -> Maybe (Int, Int)
> getSpan = listToMaybe . catMaybes
>         . gmapQ (const Nothing `extQ` (Just . srcSpan))

Next we need to be able to get all of the spans paired up with the names of the constructors. We use gmapQ again, but this time to recursively traverse the entire tree in preorder. Applying show to the constructor representation yielded by Data.Data.toConstr allows us to get the name of the current node in a generic fashion.

> getSpans :: (Data a) => a -> [((Int, Int), String)]
> getSpans x = maybeToList (fmap (, show $ toConstr x) $ getSpan x)
>           ++ concat (gmapQ getSpans x)

Admittedly, this isn’t very useful yet. It’s fun to modify code and see the AST change in realtime, and perhaps might even be marginally useful for those writing code manipulating Haskell-Src-Exts ASTs. What this does do, however, is lay out an initial skeleton for more useful and compact visualizations of the meta-data of Haskell source code, to come in subsequent posts.

© Michael Sloan