Behind an interactive venn diagram: bitstrings and clipping masks


WHAT a

This is a breakdown of some code behind my venn diagram highlighting tool (try it?) It was fun to implement, and there's a couple nifty things to share like image clipping and bitwise operations. Pascal's triangle even makes an appearance.

First of all, here's the use case:

WHY a

Sometimes when tutoring math students, I like to sketch questions in rapid succession on a whiteboard. This works well for numbers.

" So you know that 102 is divisible by 3 because (1+0+2) is divisible by 3.
How about 103? [erase digit]
104? [erase digit]
105? [sweep away]
1000011? [sweep away]
369? [erase digit]
319? [sweep away]
-42? "

But drawing venn diagrams is hard. Circles are hard, and preportioning space between the circles is hard, and outlining regions is hard.

" So you know this circle represents group A. [after carefully drawing perfectly aligned circles and pointing at one]
What if I take away this part...
What's this region called? [erase a section]
How about this? [redraw, then erase the other part] ...
And this one? ... [hurrying to redraw, then outlining the outer unit region] ...
And notice the familiar shape here... [triple outlining the crescent shaped region for emphasis] ...
Good, now how about this...? [attempting to un-outline, at which point everything looks like squigglevision] "

So a simple web tool is in order, one that can quickly show and erase venn diagram regions; a teaching aid where I can flash different set combinations in front of the student.

PRIOR ART a

In summary:

  1. Partial borders and interactivity are the hardest features to implement
  2. The code is not friendly; maybe there's a cleaner way to generate fills and borders, something less imperative

My goal is only to implement 2- and 3-venn diagrams; students hardly use anything else, and the diagrams become a bit noisy and confusing beyond that point. But 4-venn and 5-venn would be interesting proofs of concept...

IMAGE CLIPPING MASKS a

First, each circle of the venn diagram is drawn onto an image, a bitmap with pixel values either 0 or 1. As an example, a 4x3 screen for a 2-venn might use these bitmaps:

0 1 0 0     0 0 1 0
1 1 1 0     0 1 1 1 
0 1 0 0     0 0 1 0

   A           B

Each region of the venn diagram is derived from these with combinations of intersections and inverts (AND and NOT operations). The intersection operation is called clipping, and the bitmaps are called masks 1.

Here's what it looks like to create the unit regions for a 2-venn:

0 1 0 0     0 0 1 0     0 0 0 0
1 1 1 0  &  0 1 1 1  =  0 1 1 0
0 1 0 0     0 0 1 0     0 0 0 0

   A           B         middle
0 1 0 0     1 1 0 1     0 1 0 0
1 1 1 0  &  1 0 0 0  =  1 0 0 0
0 1 0 0     1 1 0 1     0 1 0 0

   A          ~ B        only A
1 0 1 1     0 0 1 0     0 0 1 0
0 0 0 1  &  0 1 1 1  =  0 0 0 1
1 0 1 1     0 0 1 0     0 0 1 0

  ~ A          B         only B
1 0 1 1     1 1 0 1     1 0 0 1
0 0 0 1  &  1 0 0 0  =  0 0 0 0
1 0 1 1     1 1 0 1     1 0 0 1
                          
  ~ A         ~ B       not A nor B

Notice how the whole screen has been divided into four regions, the indivisible units of the venn diagram. Other regions can be composed by combining those units (overlapping with an OR operation):

0 1 0 0     0 0 1 0     0 1 1 0
1 0 0 0  |  0 0 0 1  =  1 0 0 1
0 1 0 0     0 0 1 0     0 1 1 0

only A      only B      only A plus only B

This system is not vector-based, so it will have aliasing problems and it won't scale well, but that's ok. The clipping operations are exposed with syntax like:

newMask = maskA.and(maskB).or(maskC.not())

BITSTRINGS a

Clipping masks are good for creating regions, and the syntax is rather nice. But the code will still spiral out of control without a systematic way to generate and store the masks.

This is the sad way, creating variable names alongside each region:

{
   'circleA': maskA,
   'circleB': maskB,
   'circleAandB': maskA.and(maskB),
   'circleAnotB': maskA.and(maskB.not()),
   'circleBnotA': maskA.not().and(maskB),
   ...
}

Imagine naming all regions for 3-venn (gross) or generalizing to higher venn (impossible). It would be much nicer if the region identifiers could be dynamically generated along with the clipping masks.

Enter the happy path, using bitstrings:

A = 0011
B = 0110
{
   [A]: masks[A],
   [B]: masks[B],
   [ A &  B]: masks[A].and(masks[B]),      
   [ A & ~B]: masks[A].and(masks[B].not()),
   [~A &  B]: masks[A].not().and(masks[B]),
   ...
}

2 There's no arbitrary variable names. It can be automated!

CHOOSING THE RIGHT BITS

How were those bitstrings chosen?

A = 0011
B = 0110

Each unit region is assigned a unique bit. For example,
let 0001 be the outer region of circle A,
let 0100 be the outer region of circle B,
let 0010 be the middle region they share,
and let 1000 be the region outside the circles.

Then each circle's bitstring is composed by combining its regions:

outer plus inner = circle A

 0001  | 0010  == 0011

HIGHER VENN

There are multiple such solutions, depending on how you assign the bits. It's a fun little puzzle to find bitstrings by hand.

Here's one for 3-venn:

A = 00011011
B = 00100111
C = 01001101

See where the shared bits represent different regions?
0000000_ is where all three circles overlap.
0000___0 are the inner unit regions shared by only two circles.
0___0000 are the outer unit regions belonging to only one circle.

Any composite region can be derived from these starting bitstrings. For example, 01100100 represents the combined region of B and C, excluding A.

    B   plus    C   but not   A    =   result

(00100111 | 01001101) & ~ 00011011 == 01100100 

GENERALIZING FOR AN ALGORITHM

The next fun puzzle is to find a pattern, a general process to create these bitstrings.

Notice what happens when the bitstrings are rotated from 3 rows of 8 digits to 8 rows of 3 digits:

ABC

000     1 row of all 0s
001     3 rows of permuting one 1 and two 0s
010   
100   
101     3 rows of permuting two 1s and one 0
011   
110   
111     1 row of all 1s

and here's 2-venn for comparison:

AB

00     1 row of all 0s
01     2 rows of permuting one 1 and one 0
10
11     1 row of all 1s

121, 1331, ... that's Pascal's triangle, or binomial coefficients, or combinatorics, whatever you want to call it! Sure enough, 14641 pops up with bitstrings for a 4-venn. Very cool to see that math appear here with venn diagrams.

This pattern translates straight into a function, creating permutations of 0s and 1s and transposing them into bitstrings. Bitstrings can be generated for any venn.

BORDERS a

With the code infrastructure in place, time for the final feature: highlighting borders on mouseover. It's a small piece of feedback that makes a big difference (see prior art example 3 compared to examples 1 and 2).

To draw borders around regions, we need to draw arc segments. One method of getting arcs would be to calculate each arc with some math. But being committed to the clipping mask route means avoiding calculating partial curves, even avoiding the concept of paths entirely.

Perhaps there's a clever way to make borders by dilating regions, subtracting a filled region by a slightly scaled down copy (like a making a ring by using two concentric circles).

But turns out there's actually a way to isolate each arc segment with existing code, by clipping borders with circle fills. Here's an example for getting the arcs around a unit region:

unitRegion = A & B & ~C
arcs = [
   borders[A].and(fills[B]).and(fills[C].not()),
   borders[B].and(fills[A]).and(fills[C].not()),
   borders[C].and(fills[A]).and(fills[B])
]

There's a pattern: start with a circle border, then cut away segments using other circles' fills. This gets repeated for every unit region.

MOUSE

How do we know which unit region the mouse is hovering over? It starts with an equation to tell if a point is inside a circle or not. As pseudocode:

let circle A be centered at ax, ay with radius r

circle A contains x,y if 
   distance from ax,ay to x,y is less than r

Then it's a flowchart process to identify the unit region.

An example, more pseudocode:

A = 00011011
B = 00100111
C = 01001101

bits = 11111111
is mouse in A?
   yes
   bits = bits & A
   // 00011011
   is mouse in B?
      no
      bits = bits & ~B
      // 00011000
      is mouse in C?
         yes
         bits = bits & C
         // 00001000

That's the last nifty thing to share! The rest of the code is just boring glue.

ALL THE VENNS a

Here's a demo for 5-venn and 4-venn.

They're fun to mess around with. I kept them at 400px so they wouldn't crash a fullscreen page, on account of being slow. They also don't have the extra feature that the main 3-venn page does, that of pressing a/b/c on the keyboard to toggle circle border visibilities. They aren't labeled either.

Fun fact about 4-venn and higher: circles are incapable of expressing all the possible combinations between 4 or more sets. If you try to shim a fourth circle onto a 3-venn, you'll find that there's no way to create the 16 different unit regions. That's why 4-venn and higher introduce some asymmetry, usually with ellipses instead of circles. 4-venn and 5-venn require some particular size and angle constants to look nice - you can find them in the demo code.

OPTIMIZATIONS / ROADMAP a

The main 3-venn page loads embarrasingly slow on my computer and especially my phone. It's generating dozens of fullscreen images and looping over each pixel many times before the user can interact with the page. I'd like to speed it up a bit. Some options:

FOOTNOTES a

1 I like to think of clipping as a cookie cutter removing dough. And I think of masks as stencils. A semantics note as well: if the masks use intermediate opacity values, and the operations are not binary, it's called masking instead of clipping.

2 The code blocks are approximately Javascript, with exceptions:

I guess the point of this footnote is 'Javscript at your own risk'?