Behind an interactive venn diagram: bitstrings and clipping masks

WHAT a

This is a breakdown of some code behind my venn diagram highlighting tool (try it?) It was fun to implement, and there's a couple nifty things to share like image clipping and bitwise operations. Pascal's triangle even makes an appearance.

First of all, here's the use case:

WHY a

Sometimes when tutoring math students, I like to sketch questions in rapid succession on a whiteboard. This works well for numbers.

" So you know that 102 is divisible by 3 because (1+0+2) is divisible by 3.
How about 103? [erase digit]
104? [erase digit]
105? [sweep away]
1000011? [sweep away]
369? [erase digit]
319? [sweep away]
-42? "

But drawing venn diagrams is hard. Circles are hard, and preportioning space between the circles is hard, and outlining regions is hard.

" So you know this circle represents group A. [after carefully drawing perfectly aligned circles and pointing at one]
What if I take away this part...
What's this region called? [erase a section]
How about this? [redraw, then erase the other part] ...
And this one? ... [hurrying to redraw, then outlining the outer unit region] ...
And notice the familiar shape here... [triple outlining the crescent shaped region for emphasis] ...
Good, now how about this...? [attempting to un-outline, at which point everything looks like squigglevision] "

So a simple web tool is in order, one that can quickly show and erase venn diagram regions; a teaching aid where I can flash different set combinations in front of the student.

PRIOR ART a

An old-school math game website
try it

Naive canvas fill, mspaint style, complete with paint bucket and aliased curves. An undo button but no clear button. Erasing is done by switching to white fill color.
jQuery.venn
demo source

This one fills unit regions on click in an interesting way. It draws horizontal lines between ellipse boundaries, like a textured pattern fill. No concept of drawing region borders though.
D3.js + Venn.js
demo source

demo 2 source 2

Demonstrating the display abilities of D3.js, supplemented with some venn diagram logic. Uses SVG paths and a lot of math. It actually seems quite capable of rendering any region of a venn diagram, along with borders, and with interactivity. But it lacks certain unit regions (eg. in demo 1, you can't select only Outkast, nor can you select Morrissey and Elvis without Radiohead). Maybe this can be changed in the config?

In summary:

Partial borders and interactivity are the hardest features to implement
The code is not friendly; maybe there's a cleaner way to generate fills and borders, something less imperative

My goal is only to implement 2- and 3-venn diagrams; students hardly use anything else, and the diagrams become a bit noisy and confusing beyond that point. But 4-venn and 5-venn would be interesting proofs of concept...

IMAGE CLIPPING MASKS a

First, each circle of the venn diagram is drawn onto an image, a bitmap with pixel values either 0 or 1. As an example, a 4x3 screen for a 2-venn might use these bitmaps:

0 1 0 0     0 0 1 0
1 1 1 0     0 1 1 1 
0 1 0 0     0 0 1 0

   A           B

Each region of the venn diagram is derived from these with combinations of intersections and inverts (AND and NOT operations). The intersection operation is called clipping, and the bitmaps are called masks 1.

Here's what it looks like to create the unit regions for a 2-venn:

0 1 0 0     0 0 1 0     0 0 0 0
1 1 1 0  &  0 1 1 1  =  0 1 1 0
0 1 0 0     0 0 1 0     0 0 0 0

   A           B         middle

0 1 0 0     1 1 0 1     0 1 0 0
1 1 1 0  &  1 0 0 0  =  1 0 0 0
0 1 0 0     1 1 0 1     0 1 0 0

   A          ~ B        only A

1 0 1 1     0 0 1 0     0 0 1 0
0 0 0 1  &  0 1 1 1  =  0 0 0 1
1 0 1 1     0 0 1 0     0 0 1 0

  ~ A          B         only B

1 0 1 1     1 1 0 1     1 0 0 1
0 0 0 1  &  1 0 0 0  =  0 0 0 0
1 0 1 1     1 1 0 1     1 0 0 1
                          
  ~ A         ~ B       not A nor B

Notice how the whole screen has been divided into four regions, the indivisible units of the venn diagram. Other regions can be composed by combining those units (overlapping with an OR operation):

0 1 0 0     0 0 1 0     0 1 1 0
1 0 0 0  |  0 0 0 1  =  1 0 0 1
0 1 0 0     0 0 1 0     0 1 1 0

only A      only B      only A plus only B

This system is not vector-based, so it will have aliasing problems and it won't scale well, but that's ok. The clipping operations are exposed with syntax like:

newMask = maskA.and(maskB).or(maskC.not())

BITSTRINGS a

Clipping masks are good for creating regions, and the syntax is rather nice. But the code will still spiral out of control without a systematic way to generate and store the masks.

This is the sad way, creating variable names alongside each region:

{
   'circleA': maskA,
   'circleB': maskB,
   'circleAandB': maskA.and(maskB),
   'circleAnotB': maskA.and(maskB.not()),
   'circleBnotA': maskA.not().and(maskB),
   ...
}

Imagine naming all regions for 3-venn (gross) or generalizing to higher venn (impossible). It would be much nicer if the region identifiers could be dynamically generated along with the clipping masks.

Enter the happy path, using bitstrings:

A = 0011
B = 0110
{
   [A]: masks[A],
   [B]: masks[B],
   [ A &  B]: masks[A].and(masks[B]),      
   [ A & ~B]: masks[A].and(masks[B].not()),
   [~A &  B]: masks[A].not().and(masks[B]),
   ...
}

2 There's no arbitrary variable names. It can be automated!

CHOOSING THE RIGHT BITS

How were those bitstrings chosen?

A = 0011
B = 0110

Each unit region is assigned a unique bit. For example,
let 0001 be the outer region of circle A,
let 0100 be the outer region of circle B,
let 0010 be the middle region they share,
and let 1000 be the region outside the circles.

Then each circle's bitstring is composed by combining its regions:

outer plus inner = circle A

 0001  | 0010  == 0011

HIGHER VENN

There are multiple such solutions, depending on how you assign the bits. It's a fun little puzzle to find bitstrings by hand.

Here's one for 3-venn:

A = 00011011
B = 00100111
C = 01001101

See where the shared bits represent different regions?
0000000_ is where all three circles overlap.
0000___0 are the inner unit regions shared by only two circles.
0___0000 are the outer unit regions belonging to only one circle.

Any composite region can be derived from these starting bitstrings. For example, 01100100 represents the combined region of B and C, excluding A.

    B   plus    C   but not   A    =   result

(00100111 | 01001101) & ~ 00011011 == 01100100

GENERALIZING FOR AN ALGORITHM

The next fun puzzle is to find a pattern, a general process to create these bitstrings.

Notice what happens when the bitstrings are rotated from 3 rows of 8 digits to 8 rows of 3 digits:

ABC

000     1 row of all 0s
001     3 rows of permuting one 1 and two 0s
010   
100   
101     3 rows of permuting two 1s and one 0
011   
110   
111     1 row of all 1s

and here's 2-venn for comparison:

AB

00     1 row of all 0s
01     2 rows of permuting one 1 and one 0
10
11     1 row of all 1s

121, 1331, ... that's Pascal's triangle, or binomial coefficients, or combinatorics, whatever you want to call it! Sure enough, 14641 pops up with bitstrings for a 4-venn. Very cool to see that math appear here with venn diagrams.

This pattern translates straight into a function, creating permutations of 0s and 1s and transposing them into bitstrings. Bitstrings can be generated for any venn.

BORDERS a

With the code infrastructure in place, time for the final feature: highlighting borders on mouseover. It's a small piece of feedback that makes a big difference (see prior art example 3 compared to examples 1 and 2).

To draw borders around regions, we need to draw arc segments. One method of getting arcs would be to calculate each arc with some math. But being committed to the clipping mask route means avoiding calculating partial curves, even avoiding the concept of paths entirely.

Perhaps there's a clever way to make borders by dilating regions, subtracting a filled region by a slightly scaled down copy (like a making a ring by using two concentric circles).

But turns out there's actually a way to isolate each arc segment with existing code, by clipping borders with circle fills. Here's an example for getting the arcs around a unit region:

unitRegion = A & B & ~C
arcs = [
   borders[A].and(fills[B]).and(fills[C].not()),
   borders[B].and(fills[A]).and(fills[C].not()),
   borders[C].and(fills[A]).and(fills[B])
]

There's a pattern: start with a circle border, then cut away segments using other circles' fills. This gets repeated for every unit region.

MOUSE

How do we know which unit region the mouse is hovering over? It starts with an equation to tell if a point is inside a circle or not. As pseudocode:

let circle A be centered at ax, ay with radius r

circle A contains x,y if 
   distance from ax,ay to x,y is less than r

Then it's a flowchart process to identify the unit region.

Start with a bitstring of all 1s, representing that the mouse could be any in unit region.
Then knock out bits, either ANDing a circle's bitstring if the mouse was inside it, or ANDing the NOTd bitstring. Repeat for all circles.
The result will be a bitstring with only one bit turned on - the identifier for the unit region that the mouse is in.

An example, more pseudocode:

A = 00011011
B = 00100111
C = 01001101

bits = 11111111
is mouse in A?
   yes
   bits = bits & A
   // 00011011
   is mouse in B?
      no
      bits = bits & ~B
      // 00011000
      is mouse in C?
         yes
         bits = bits & C
         // 00001000

That's the last nifty thing to share! The rest of the code is just boring glue.

ALL THE VENNS a

Here's a demo for 5-venn and 4-venn.

They're fun to mess around with. I kept them at 400px so they wouldn't crash a fullscreen page, on account of being slow. They also don't have the extra feature that the main 3-venn page does, that of pressing a/b/c on the keyboard to toggle circle border visibilities. They aren't labeled either.

Fun fact about 4-venn and higher: circles are incapable of expressing all the possible combinations between 4 or more sets. If you try to shim a fourth circle onto a 3-venn, you'll find that there's no way to create the 16 different unit regions. That's why 4-venn and higher introduce some asymmetry, usually with ellipses instead of circles. 4-venn and 5-venn require some particular size and angle constants to look nice - you can find them in the demo code.

OPTIMIZATIONS / ROADMAP a

The main 3-venn page loads embarrasingly slow on my computer and especially my phone. It's generating dozens of fullscreen images and looping over each pixel many times before the user can interact with the page. I'd like to speed it up a bit. Some options:

Reduce the pixel density for the sake of high-res screens. EDIT - done
Cache repeated image operations, eg. store the result of masks[A].and(masks[B]) EDIT - done
Pre-render and save the clipping masks to file so the page is loading them instead of calculating them. Concerns: scaling to different resolutions might look ugly
Shaders! They seem like the right tool for fast pixel-by-pixel logic. And I've sharpened my GLSL skills thanks to GSoC and p5js. Concerns: losing friendly, declarative code structures, like mapping bitstrings to textures
Port to SVG? Nice to imagine one SVG file, and maybe a small mousemove script, that can be plopped anywhere on the web. Fast and portable. Concerns: I think the vector benefits of SVG would be lost because clipping might force rasterization. Not sure if SVG has complete AND/OR/NOT masking functions. Also I'm not sure I feel like learning SVG or D3.js.
Web workers? Offload JS so the UI thread isn't blocked. Concerns: handling DOM elements like <canvas> is disallowed in web workers. And it would still be slow, but at least the page would respond and not lag the browser. Maybe use plain arrays for clipping masks instead of images tied to the canvas?
Port to some other language where these patterns can be expressed more eloquently (types, pure functions, a standard lib?) With vanilla JS, there's lots of ugly boilerplate. This change would just be for fun and for the sake of the code, not resulting in any change to the output.

FOOTNOTES a

1 I like to think of clipping as a cookie cutter removing dough. And I think of masks as stencils. A semantics note as well: if the masks use intermediate opacity values, and the operations are not binary, it's called masking instead of clipping.

2 The code blocks are approximately Javascript, with exceptions:

Binary literals need to be prefixed with 0b, like 0b0011, but that's omitted for clarity.
Bitwise AND, OR, and NOT look like &, |, and ~. This is the normal syntax but I put it here for the uninitiated.
0011 !== ~1100. These binary literals have an unseen string of twenty-eight preceding zeros (since a bitwise-treated number in JS is a 32 bit signed int). I ran into this gotcha when I found that bitwise NOT was flipping all those bits and ruining equality checks, eg. getting 11111111111111111111111111110011 instead of 00000000000000000000000000000011. This behavior is patched by a bitstring class that's omitted for clarity.
JS allows certain expressions as object keys, but they'll get stringified. For example, { [~A & ~B]: ... } gets turned into { ['-8']: ... } The leading ones from the bit flips get parsed back into a signed int, and then to a string. This behavior is patched by the bitstring class as well.
I wish JS supported custom operators for something like [A & ~B]: masks[A] & ~masks[B], but oh well. Chaining syntax is ok too.

I guess the point of this footnote is 'Javscript at your own risk'?