BOOLOPS Redesigns
Boolean Operations
I redesigned the core code several times, each time allowing the same features to be implementable with less code. The overall design on how things would work had a dramatic effect on code size (as such also maintainability, understandability, and overall buginess). Eventually I found an architecture which appears to be near optimal. For posterity, here is the general overview of the various stages:
Design #1
My initial design was mostly focused on the storage of shapes. It was quite a cool idea. Each shape would consist of a tree of regions, where the child regions are contained within the parents. Every layer would switch fill-type, so the children of a filled region would be holes.
The main benefit of this structure is speed - if you know that two nodes don't overlap, you know their children don't either. The largest problem with this approach was complexity. It was way too complex for such an early stage. Now that I have the insights of later redesigns, the ideas here seem more practical. Another problem is that it wasn't really very useful. In the wild, shapes rarely have more than 2 or three recursions of holes-within-islands-within-holes, so the speed benefits would not come into play very often, or significantly. I judged it, rightly so, to be evil premature optimization. Code on this design was never started.
Design #2 (initial code)
In the next design, I flattened the tree down - each Shape would be an outer region with holes inside. Many of the operations had to return lists of shapes, which proved rather inconvenient. This design was the longest lived, lasting a couple of weeks. I wrote data structures and support functions for crossings, and adapted some path intersection code. Upon this support structure I wrote multiple functions to perform boolean operations on regions. I soon realized that these functions contained lots of common code, and combined them all into one function, with a parameter specifying the operation to perform. I figured that this function would be an adequate building block for the Shape boolean operations. I was wrong - while union wasn't horrible size wise (60 or so lines), more complicated operations such as subtraction became large and very difficult to get right.
Design #3 (rewrite)
In search of something better, I went back to the drawing board (literally). I realized that it might be possible to deal with the regions of a shape in a uniform way.
With the current region code, other than special cases, the execution for union and subtraction was the same. The difference was that the second region's path had its winding direction reversed. It turned out if both were reversed, it would be an intersection. I realized that this corresponded to the required operations between each region of a path. The same was true for the operation which simply reversed the main conditional, producing the operations required for intersection.
I overhauled the main region boolean code, and made the shapes into simple collections of regions. I tried very simply applying the boolean operation to each region against the others, but it didn't work. One hard issue, among others, was such situations as a subtraction "breaking into" a hole, causing it to become part of the outer bound.
Design #4 (rewrite)
At this point I shifted back to the old shape representation from design #2. The key difference was that the region boolean function now effortlessly handled quite a bit of the logic.
The definition for union turned out very nicely (20 lines or so), seemingly assuring me that this was the 'right way'. However, intersection and subtraction once again proved more difficult. The main problem was that the region boolean operation was a binary operator. I needed to accumulate the result, yet the results of the sub-results needed to be included when figuring out the other sub-results. It was basically a reduce, but with non-commutative operations. I managed to get subtraction working, yet it was once again fairly large, probably inefficient, and fairly irritable. Before getting around to intersection, I realized that my original, homogeneous design might just work!
Design #5 (finally!)
In the previous redesign I had neglected to realize the implications of path reversal, and the fact that holes had reversed winding. While throughout I had a definite feeling that this was quite elegant, deeper than pure utility, I had not realized what they really mean. The outer boundary of normal shapes, a 'fill' region, contains points that are inside. The reversed, 'hole' regions, contain points that are outside the region. It's quite simple and obvious in retrospect. It implies that in addition to an inversion operation for paths, I should also have an inversion operation for shapes. Ideally the boolean operations would deal with these inverted shapes properly.
The big breakthrough, the "eureka", was the realization that the region intersection code could be adapted to work with two sets of regions, rather than just two regions. In other words, the basic boolean code would work directly on shapes. Since the operation deals with inverted shapes properly, the two basic operations of union and intersection could be made to yield the rest.
I realized that it wouldn't be too hard to write a wrapper which accepts flags representing a boolean logic "truth table". This gives access to all 16 possible boolean operations.
Sanitizer Revisions
The sanitizer turned out to be harder than I expected. I spent about 5 days working on several different implementations before making a major breakthrough. Once again, for posterity I will document the designs attempted or considered:
Design #1
My first attempt was designed to be naive. It would 'simply' traverse all the regions bounded by the paths, then do a winding test on the original path data to tell if it was inside. However, it turned out to be relatively tricky to keep the algorithm from hitting some regions several times. On top of this, using winding wasn't working out well.
Design #2
The second design involved synthesizing a list of edges, culling those that are unwanted, and merging into regions in the output. I managed to get this mostly working, however this scheme also used winding to attempt to figure out the windings on the side of each edge. The issues were not actually my winding function - it was working perfectly. The issue is that the winding cannot be used as a consistent inclusion predicate when the point is on the boundary of the path.
Design #3
The third design was never implemented, and was suggested by Nathan. It involved sweeping paths of monotonic curves, maintaining winding information as it goes. Since sweep on mere bounding boxes took a little while to get right, and I don't precisely understand the method, I decided to forego this otherwise attractive option.
Design #4
Finally, I made a breakthrough, and figured out a completely different method. It would find a crossing on the outer edge of the shape, and traverse the crossings around the outer edge. This part alone took a few days to get right. The next step is to traverse the inner bits, the pathways avoided by the outer-edge traversal, and recurse on this, yielding the inner holes.
As you can see, I did quite a lot of design thrashing on this project. While in retrospect it seems like some of it could have been avoided, really each was necessary, as it led to realizations and framework which allowed the next to work. I'd like to think that much of this design thrashing comes with the territory of working on purely algorithmic code for a difficult topic. I am hoping that the experience gained with this project will allow me to figure out a good design more quickly, and with less spurious effort.