Category ToyFramework

Visualizing the Haskell AST

I’m very enthused by the potential for development environments that offload more of the uninteresting minutae of programming onto the computer. This has been looked at a lot in the past, by a lot of very smart people. From structure editors to the visualization of the results of a variety of static analysis techniques, tons of work has been done. Despite this, these ideas have not yet revolutionized popular development as we know it, with auto-completion being the main widely utilized language-aware convenience tool.

Much more is known about a Haskell program at compile time than a program written in most of your run-of-the-mill programming languages. It seems like it would be a good idea to provide more of this information to the programmer, in a live, interactive, context-dependent form. Examples include depicting the parse tree, types of subexpressions, applicable semantics-preserving transformations, and example-evaluations, right in the programming editor. Rather than attempting to address this problem directly, on limited free-time, I intend to build a number of toy programs to play with the problem.

This post is literate Haskell, and so should be copy-pastable into a *.lhs file. It depends on two libraries that are not yet hackage-ready:

Gtk-Toy is a wrapper over GTK / Cairo that processes inputs into more Haskell-ey data types and provides a few convenience data structures. I intend to grow it as I write more “toys” for various purposes. The ‘Curve’ library and the ToyFramework are partial ports / re-imaginings of the lib2geom project, which I was more active in several years ago. While working on this library, we established a habit of creating an interactive toy to exercise particular features or to provide a prototying sandbox to play with a new idea. In order to encourage this development pattern, the infrastructure for toy-making had to be convenient and straight-forward, which is what I attempt to achieve with the Haskell equivalent.

I’ll intersperse explanation between chunks of code, for some of the trickier bits, but familiarity with Haskell is assumed. The source on this page is available on github:

Here’s a script to make trying it out convenient:


# Download and locally install curve library
wget -O curve.tar.gz
tar -xvf curve.tar.gz
rm curve.tar.gz
cd mgsloan-curve-c621e5e
cabal configure --user
cabal install
cd ../

# Download and locally install toyframework
wget -O toyframework.tar.gz
tar -xvf toyframework.tar.gz
rm toyframework.tar.gz
cd mgsloan-toyframework-354c022
cabal configure --user
cabal install
cd ../

# Download and run the simple AST-vis
git clone
cd ast-vis
runhaskell Main.hs
> {-# LANGUAGE FlexibleInstances, TemplateHaskell,
>              TupleSections, TypeOperators #-}
> import Control.Arrow ((&&&))
> import Control.Monad (liftM, zipWithM_)
> import Data.Curve
> import Data.Data
> import Data.Function (on)
> import Data.Generics.Aliases
> import Data.Label
> import Data.List (groupBy)
> import Data.Maybe
> import Graphics.ToyFramework
> import Language.Haskell.Exts.Annotated
> import qualified Graphics.Rendering.Cairo as C

Now that the imports are out of the way, we define the state representation for the AST-visualization toy. It’s very simple – this is not intended to be anywhere near a real text editor – and so just stores the code in a plain string, the current cursor position, and a cache of the parsed representation. It also stores the current mouse location, in order to provide vertical scrolling of the AST visualization (as it can easily get quite large).

Following the data declaration is a Template Haskell fclabels invocation which provides lenses for the different fields of the state. These allow you to construct views on data structures, use them to get / set / modify the projection (often times, and the whole time here, these are just ADT fields).

> data State = State
>   { _code :: String
>   , _cursor :: Int
>   , _parsed :: (ParseResult (Decl SrcSpanInfo))
>   , _mouseCursor :: (Double, Double)
>   }
> $(mkLabels [''State])

First, a few convenience fclabels-related utilities. modM and setM lift modify and set, respectively to yield monadic values. lensed provides a more generic self-modification, allowing the new value for some label to be derived from the projection of another. updateParse is an example usage of lensed which will soon become useful.

> modM :: Monad m => (b :-> a) -> (a -> a) -> b -> m b
> modM l f = return . modify l f
> setM :: Monad m => (b :-> a) -> a -> b -> m b
> setM l x = return . set l x

> lensed :: (f :-> a) -> (f :-> a') -> (a -> a') -> f -> f
> lensed l l' f s = set l' (f $ get l s) s
> updateParse :: State -> State
> updateParse = lensed code parsed parseDecl

This is what most toy main functions will look like – an initial value for the state of the toy, followed by references to the functions which handle events and drawing. handleMouse just sets the mouseCursor field of the state to the mouse position. This will later allow for adjustment of the vertical position of the AST diagram.

> main :: IO ()
> main = runToy $ Toy
>  { initialState = updateParse $
>      State "fibs = 0 : 1 : zipWith (+) fibs (tail fibs)" 0 undefined (0, 220)
>   , mouse   = const $ setM mouseCursor
>   , key     = handleKey
>   , display = handleDisplay
>   , tick    = const return
>   }

Definition of Toy from the “toyframework” source code, for reference:

data Toy a = Toy
  { initialState :: a

  -- Given the current keyboard state, perform a single 30ms periodic execution
  , tick    :: KeyTable                              -> a -> IO a

  -- Display using cairo, based on the canvas size and dirty region.
  , display :: IPnt -> IRect                         -> a -> C.Render a

  -- Handle mouse presses (first parameter is (pressed?, which)) and motion.
  , mouse   :: Maybe (Bool, Int) -> (Double, Double) -> a -> IO a

  -- Handle key-presses, first parameter is "pressed?", second is (Left string)
  -- to give the names of non-character keys, and (Right char) for the rest.
  , key     :: Bool -> Either String Char            -> a -> IO a

Definition of the key-handler follows. It handles basic motion, deletion, and insertion.

> handleKey :: Bool -> Either [Char] Char -> State -> IO State
> handleKey True (Right k) (State xs ix p m) =
>   return . updateParse $ State (pre ++ (k : post)) (ix + 1) p m
>  where
>   (pre, post) = splitAt ix xs
> handleKey True (Left k) s@(State xs ix _ _) = liftM updateParse $ (case k of
>     "Left"  -> modM cursor (max 0 . subtract 1)
>     "Right" -> modM cursor (min endPos . (+1))
>     "Home"  -> setM cursor 0
>     "End"   -> setM cursor endPos
>     "BackSpace" -> modM cursor (max 0 . subtract 1)
>                  . set code (delIx (ix - 1))
>     "Delete" -> setM code (delIx ix)
>     "Escape" -> const $ error "The user escaped!"
>     _ -> return) s
>   where endPos = length xs
>         delIx i | (pre, (_:post)) <- splitAt i xs = pre ++ post
>                 | otherwise = xs
> handleKey _ _ s = return s

The handleDisplay function below draws the text and cursor, followed by either the parse tree or an error message. (^+^) and (^-^) are vector-space operators, in this case operating on 2D vectors.

> handleDisplay :: IPnt -> IRect -> State -> C.Render State
> handleDisplay _ (tl, br) s@(State txt ix p (_, ypos)) = do
>   let textPos = (50.5, 100.5)
>       height = (fromIntegral . snd $ br ^-^ tl) * 0.5
>       astPos = textPos ^+^ (0.0, ypos - height)
>   move textPos
>   C.showText txt
>   -- Draw the mouse cursor.
>   C.setLineWidth 1
>   draw . offset (textPos ^+^ (-1, 0)) . rside 1 . expandR 2
>        =<< textRect txt 0 ix
>   C.stroke
>   case p of
>     ParseOk decl -> drawSpans astPos txt (getSpans decl)
>     f@(ParseFailed _ _) -> C.showText (show f)
>   C.stroke
>   return s

We’re done with all the little support bits! Only the meat of the problem, the definition of drawSpans and getSpans remains. In fact, if we set it to be a no-op, the above code is a functioning single-line text editor. Not too bad for around 75 SLOC!

drawSpans _ _ _ = return ()
getSpans = undefined

Next, we display a horizontal-spans based visualization of the abstract syntax tree of the code that the user has typed. This proceeds in a fairly straightforward manner, as a pipeline of transformations to draw the source-spans as a stack of labelled lines:

> drawLabeledLine :: String -> DLine -> C.Render ()
> drawLabeledLine txt lin = do
>   draw lin
>   relText 0.5 (lin `at` 0.5 ^-^ (0, 7)) txt
> spanLine :: String -> (Int, Int) -> C.Render (Linear Double, Linear Double)
> spanLine txt (f, t) = liftM (rside 2 . expandR 2) $ textRect txt f (t - 1)
> drawSpans :: DPoint  -> String -> [((Int, Int), String)] -> C.Render ()
> drawSpans pos txt =
>       -- Draw each labeled line, with each subsequent line 15 pixels lower.
>   (>>= zipWithM_ (\d (lin, name) -> drawLabeledLine name . (`offset` lin)
>                                   $ pos ^+^ (0, 15) ^* fromIntegral d)
>                  [0..])
>       -- Turn each span into an appropriately sized line segment.
>   . mapM (\(s, n) -> liftM (, n) $ spanLine txt s)
>       -- Prefer last of all identically-spanned tokens.  Pretty arbitrary.
>   . map last . groupBy ((==) `on` (\(x,_)->x))

On the left is the diagram resulting from commenting out the line “. map last . groupBy ((==) `on` (\(x,_)->x))”. As illustrated, it mostly removes information that the user wouldn’t care about for understanding Haskell’s parse tree.

So, how did we manage to get the spans from the abstract syntax tree of the declaration? The haskell-src-exts documentation has tons of ADTs, each representing a different potential members of Haskell’s AST. In order to collect the source-span information, we could write a function for each type, pattern matching on every single case, recursing into the children of each node. What saves us from such drudgery is that every ADT has a derived Data and Typable instance! We will do something much nicer using SYB.

First off, the SrcSpanInfo annotations indicating source location on the AST nodes contain a lot of information we don’t need. For this simple single-file, single-line case, we discard everything by the column range, and so define a convenient accessor for this. If you aren’t familiar with arrows, when operating with functions, the (&&&) operator has type “(a -> b) -> (a -> c) -> a -> (b, c)”. In other words, it applies two functions to the same input, and wraps the result in a tuple.

> srcSpan :: SrcSpanInfo -> (Int, Int)
> srcSpan = (srcSpanStartColumn &&& srcSpanEndColumn) . srcInfoSpan

Here’s the exciting part! How can we get the source span from an arbitrary data type? SYB makes it very easy. Data.Data.gmapQ applies a generic function to every field, and yields the results as a list. Data.Generics.Aliases.extQ allows us to use “Just . srcSpan” whenever it can be applied (when the types are compatible), and otherwise “const Nothing”. So, in whole the following function Just yields a span tuple if the given data type has a SrcSpanInfo field.

> getSpan :: (Data a) => a -> Maybe (Int, Int)
> getSpan = listToMaybe . catMaybes
>         . gmapQ (const Nothing `extQ` (Just . srcSpan))

Next we need to be able to get all of the spans paired up with the names of the constructors. We use gmapQ again, but this time to recursively traverse the entire tree in preorder. Applying show to the constructor representation yielded by Data.Data.toConstr allows us to get the name of the current node in a generic fashion.

> getSpans :: (Data a) => a -> [((Int, Int), String)]
> getSpans x = maybeToList (fmap (, show $ toConstr x) $ getSpan x)
>           ++ concat (gmapQ getSpans x)

Admittedly, this isn’t very useful yet. It’s fun to modify code and see the AST change in realtime, and perhaps might even be marginally useful for those writing code manipulating Haskell-Src-Exts ASTs. What this does do, however, is lay out an initial skeleton for more useful and compact visualizations of the meta-data of Haskell source code, to come in subsequent posts.

© Michael Sloan

Built on Notes Blog Core
Powered by WordPress