My latest task for the publishing house is to build them a database. This sounds like exactly the sort of thing that is not in my job description, except that my job description is either 'marketing consultant' or 'porn fairy', depending on whether you look at my CV or the business cards Cecilia told me to order myself, and it is entirely fictional. The thing she actually pays me to do is have a brain, and use it. I was baffled as to how I got hired straightaway after a single phone interview, until I realized that Cecilia spots the weird genius kids the same way I do, which is by talking to them for about five minutes and realizing that they have not been confused by any of the weird genius things she has already said.

I am trying to cobble together this database out of a lot of things that are almost, but not quite, entirely unlike actual database pieces. Which I suppose is fitting, because I am almost, but not quite, entirely unlike a person who is at all trained in computer science. While in college, I successfully talked my way into -- and passed -- a class in digital logic even though I was not a CS or engineering major, and had exactly zero of the prerequisites. (I did that a lot. Got through Anat/Phys of Speech Pathology the same way, except I also opted not to bother buying that book.) I took one programming class, which taught me mainly that CS majors are deeply weirded out when the sociologist does math puzzles for fun. Other than that, I stayed out of the department for my entire college career, and stuck to fixing things on the "Windows just shat itself again, why is that?" level of abstraction.

What I am doing* is an absolutely terrible idea and not to be attempted by anyone, anywhere, ever, but like most ideas this bad, it's highly educational. I didn't know relational algebra before, but it's amazing how interesting things get when you can bill hourly for cramming them into your head.

I've been noodling around with it for ~20 legitimately billable hours now, most of which went on my last invoice. I'm getting that uneasy feeling that this is another one of those not-normal things I do. I'm unsure whether to be more disquieted that I fed a new branch of math into my head in 20 hours or vaguely embarrassed that it took me that long.

One of the reasons it seems to be sticking is that it has mostly the same shape as other branches of math I already know. Most of the formal notation is stolen wholesale from set theory. What struck me immediately, though, is that it is innately spatial.

Say you have a product catalog stuffed into a db somewhere. Each row is going to be something like "fleebwanger" in the Product Name column, "green" in the Color column, "pronged" in the Style column, "N" in the Gronkulation column, "M" in the Size column, and "$12.95" in the Price column. A normal person would look at that and think 'gee, that's a handy way of keeping track of stuff and its qualities'. A mathematician is going to look at that and see a record of point "fleebwanger" at Cartesian coordinates (green, pronged, N, M, $12.95). I am absolutely positive that someone out there has taken a metric fuckton of peyote and then tried to do geometry on a bunch of cross-referenced auto parts listings.
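Here's a minimal Python sketch of that reading, assuming the six columns above and keeping the price as the string it appears as in the table. The `Product` type and its field names are made up for illustration; the only point is that everything after the name is a coordinate, one axis per column.

```python
from typing import NamedTuple


# Hypothetical column layout for the catalog row described above.
class Product(NamedTuple):
    name: str
    color: str
    style: str
    gronkulation: str
    size: str
    price: str  # kept as text, exactly as it appears in the table


row = Product("fleebwanger", "green", "pronged", "N", "M", "$12.95")

# The mathematician's reading: the name labels the point, and every
# remaining attribute is one coordinate along its own axis.
label = row.name
coordinates = row[1:]               # slicing a NamedTuple gives a plain tuple
dimensions = Product._fields[1:]    # the axis names, in the same order

print(label, "is at", dict(zip(dimensions, coordinates)))
```

Five attribute columns means a point in a five-dimensional space; adding a column to the table adds an axis.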

I thought at first it might be some sort of particularly weird synaesthesia talking there, but in fact it isn't; the same imagery occurs in E. F. Codd's 1970 article "A Relational Model of Data for Large Shared Data Banks", which is apparently considered the Ur-paper for How To Database. He applies the idea somewhat less literally than I do, but he does specifically treat each of the category columns in the table as a dimension, and when you're doing abstract logic work, the dimensions of Size, Price, and Color are just as valid as the physical dimensions of Length, Width, and Height. He goes so far as to call the view you get when you select just a few columns out of your table a 'projection', which is the same term you use for rendering a shape in a space that has fewer dimensions than the shape itself.
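A minimal Python sketch of projection in the relational sense, assuming a table is modeled as a Python set of tuples plus a separate tuple of column names. The catalog rows and the `project` helper are made up for illustration, not taken from Codd's paper or any database library. The set matters: a relation has no duplicate rows, so dropping columns can merge points, just as two points can land on the same spot in a shadow.

```python
# Hypothetical catalog: each tuple is (name, color, style, size, price).
COLUMNS = ("name", "color", "style", "size", "price")
catalog = {
    ("fleebwanger", "green", "pronged", "M", 12.95),
    ("snorklet", "green", "smooth", "S", 4.50),
    ("blivet", "red", "pronged", "M", 7.00),
}


def project(relation, columns, keep):
    """Relational projection: keep only the named columns.

    Returning a set means rows that become identical once the other
    columns are dropped collapse into a single row.
    """
    idx = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in relation}


print(sorted(project(catalog, COLUMNS, ["color"])))           # [('green',), ('red',)]
print(sorted(project(catalog, COLUMNS, ["color", "style"])))  # three distinct pairs
```

Three products project down to only two points on the Color axis, because two of them share a color.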

[Example: The 2-dimensional projection of a sphere is a disk (a filled circle) whose radius is equal to the radius of the sphere. You're drawing a 3D thing on a 2D surface, ignoring the depth axis. Which axis you ignore makes a difference; if you ignore the depth when drawing a cylinder lying along the length axis, you get a rectangle, but if you ignore the length instead, you get a circle. The 1-dimensional projection of a polygon is a line segment, and its length depends on which direction you project along: it is longest, equal to the distance between the two vertices farthest apart, when the line runs parallel to those two vertices. A shadow thrown onto a flat surface is effectively a 2D projection of a 3D thing between the surface and the light. The familiar animation of a rotating tesseract is a 2D render of a 3D projection of a rotating 4D hypercube. The 'faces' (cubical cells, technically) aren't stretching and squashing; they only appear to do so for the same reason that the square face of a cube appears to become a parallelogram or trapezoid when you draw the cube at an angle.]
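To make the 1D case concrete, here's a Python sketch (an illustration, not from any of the sources above) that projects a polygon's vertices onto a line through the origin at a chosen angle. The shadow's length depends on that angle; it's longest, matching the distance between the two farthest-apart vertices, when the line runs parallel to them. The unit square is just an example shape.

```python
import math


def projected_length(vertices, angle):
    """Length of the 1D shadow of a polygon on a line at `angle` radians.

    Each vertex is projected onto the unit vector (cos a, sin a) with a
    dot product; the shadow runs from the smallest value to the largest.
    """
    ux, uy = math.cos(angle), math.sin(angle)
    dots = [x * ux + y * uy for x, y in vertices]
    return max(dots) - min(dots)


def diameter(vertices):
    """Largest distance between any two vertices (requires Python 3.8+)."""
    return max(math.dist(p, q) for p in vertices for q in vertices)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(projected_length(square, 0.0))          # 1.0, the side length
print(projected_length(square, math.pi / 4))  # about 1.414, the diagonal
print(diameter(square))                       # about 1.414
```

Any other angle gives a shadow somewhere between the side length and the diagonal.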

From that perspective, selecting specific rows from the db based on the value in one of these columns reduces to some reasonably basic algebra/calculus. If you want to go rooting around in the table and bring up the product listings for everything available in green, that's equivalent to asking something like "Find all points of f(x) that lie on the line y = 2," where f(x) is a (discontinuous) function defined point-wise by the table, each record is a point whose full coordinates are its attributes, and you want the query to return each point whose position along the Color axis happens to be "green". In more than two dimensions, "Color = green" pins down one coordinate and leaves the rest free, so it describes a hyperplane rather than a line. Plucking stuff out iff the values match what you want in several columns generalizes to finding all of the points of f(x) that lie in the intersection of those hyperplanes, one per column you've pinned down.
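A rough Python sketch of selection read as geometry, assuming the table is a list of tuples and matching is exact equality. The `select` helper and the rows are hypothetical; in SQL the same idea would be a `WHERE color = 'green' AND style = 'pronged'` clause.

```python
# Hypothetical rows: (name, color, style, size).
COLUMNS = ("name", "color", "style", "size")
catalog = [
    ("fleebwanger", "green", "pronged", "M"),
    ("snorklet", "green", "smooth", "S"),
    ("blivet", "red", "pronged", "M"),
]


def select(relation, columns, **fixed):
    """Relational selection by equality.

    Each keyword pins one axis to one value (a hyperplane such as
    color == "green"); a row survives only if it lies on every one of
    those hyperplanes, i.e. in their intersection.
    """
    idx = {columns.index(col): val for col, val in fixed.items()}
    return [row for row in relation if all(row[i] == v for i, v in idx.items())]


print(select(catalog, COLUMNS, color="green"))                   # fleebwanger, snorklet
print(select(catalog, COLUMNS, color="green", style="pronged"))  # fleebwanger only
```

With no keywords at all, nothing is pinned down and the whole table comes back.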

[You'd have a hard time actually drawing a coordinate system on which you could graph point "fleebwanger," mind you; colors and styles have no natural ordering, at least not conceptually. Normally you need an ordered, continuous axis to do calculus, because the whole point of differential calculus is looking at which direction a function is trending, and in order to know where you're going you have to know where you've been, if you follow. This f(x), since it's defined point-wise and does not exist outside of those points, has no defined slope or limits from any direction anyway. For picking out exact matches, it doesn't matter where you set the origin of the Color axis, or whether white is farther out than blue, as long as you're consistent.]
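One way to check that last point, sketched in Python under the assumption that colors get arbitrary integer codes: two opposite orderings of the Color axis return the same rows for an exact match, but a "less than" query changes with the ordering, which is the sense in which the axis has no meaningful direction. The rows and codes are made up.

```python
# Hypothetical rows of (name, color); colors have no natural order.
rows = [
    ("fleebwanger", "green"),
    ("snorklet", "blue"),
    ("blivet", "green"),
    ("gizmo", "white"),
]
colors = ["white", "blue", "green", "red"]


def encode(order):
    """Assign each color an arbitrary integer position on the Color axis."""
    return {color: i for i, color in enumerate(order)}


def names_equal_to(rows, codes, wanted):
    """Exact-match selection, done entirely on the numeric codes."""
    return [name for name, color in rows if codes[color] == codes[wanted]]


def names_below(rows, codes, wanted):
    """Range selection; only meaningful if the axis order means something."""
    return sorted(name for name, color in rows if codes[color] < codes[wanted])


a = encode(colors)                  # white = 0, blue = 1, green = 2, red = 3
b = encode(list(reversed(colors)))  # red = 0, green = 1, blue = 2, white = 3

print(names_equal_to(rows, a, "green"))  # ['fleebwanger', 'blivet']
print(names_equal_to(rows, b, "green"))  # ['fleebwanger', 'blivet']
print(names_below(rows, a, "green"))     # ['gizmo', 'snorklet']
print(names_below(rows, b, "green"))     # []
```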

I spent the train ride home converting the basic operations of relational algebra into logic gates and using the general formula for calculating shortest path distances in n-dimensional Euclidean space to abuse orthography.
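The diagrams from the train aren't reproduced here, so this is only one natural mapping, sketched in Python: treat each relation's membership test as a one-bit wire per row, and union, intersection, and difference become OR, AND, and AND-NOT gates. It assumes two relations with the same columns and evaluates only over the rows that appear in either one, since a gate needs a finite set of inputs. The relations are hypothetical.

```python
# Two hypothetical relations over the same columns: (name, color).
green_things = {("fleebwanger", "green"), ("snorklet", "green")}
on_sale = {("snorklet", "green"), ("blivet", "red")}


# Gates on single bits (0 or 1).
def and_gate(x, y):
    return x & y


def or_gate(x, y):
    return x | y


def not_gate(x):
    return 1 - x


def bit(relation, row):
    """Membership as a wire: 1 if the row is in the relation, else 0."""
    return int(row in relation)


# Every row either relation mentions; each operator is evaluated row by row.
universe = green_things | on_sale

union = {r for r in universe if or_gate(bit(green_things, r), bit(on_sale, r))}
intersection = {r for r in universe if and_gate(bit(green_things, r), bit(on_sale, r))}
difference = {r for r in universe if and_gate(bit(green_things, r), not_gate(bit(on_sale, r)))}

print(sorted(intersection))  # [('snorklet', 'green')]
print(sorted(difference))    # [('fleebwanger', 'green')]
```

The gate versions agree with Python's built-in set operators, which is the whole point: relational set operations are per-row Boolean logic.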

* Namely, using Google Forms for the front end, a Google Sheet as one giant flat table containing both the data and the flow control variables for the form, and Google Apps Script to translate things people want into something kind of like queries and pass them back to the spreadsheet selection functions. There are Reasons for all of these things, but they all boil down to a lack of resources, and the fact that I need to build it out of primitives that the other people in the office at least kind of know how to handle. Otherwise I'll end up Database Emperor for all time, and the first time they need to go digging in it while I'm home dead with the flu, the whole thing will go to pieces.
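The real plumbing is Apps Script (JavaScript) between a Form and a Sheet; what follows is only a Python sketch of the translation step, under loudly stated assumptions: the sheet is modeled as a list of dicts keyed by column header, a form response is a dict whose questions match those headers, and a blank answer means "any value". The column names, titles, and helper functions are hypothetical and don't come from Google's APIs.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical stand-in for the flat sheet: one dict per data row,
# keyed by column header.
sheet: List[Row] = [
    {"Title": "Title 1", "Format": "ebook", "Status": "published"},
    {"Title": "Title 2", "Format": "print", "Status": "published"},
    {"Title": "Title 3", "Format": "ebook", "Status": "in edit"},
]


def form_to_predicate(answers: Dict[str, str]) -> Callable[[Row], bool]:
    """Turn form answers into a row filter.

    Blank answers mean "don't care", so only the questions someone
    actually answered become equality conditions (a selection).
    """
    wanted = {col: val.strip() for col, val in answers.items() if val.strip()}
    return lambda row: all(row.get(col) == val for col, val in wanted.items())


def run_query(rows: List[Row], answers: Dict[str, str]) -> List[str]:
    """Apply the filter and return the matching titles."""
    keep = form_to_predicate(answers)
    return [row["Title"] for row in rows if keep(row)]


print(run_query(sheet, {"Format": "ebook", "Status": ""}))           # ['Title 1', 'Title 3']
print(run_query(sheet, {"Format": "ebook", "Status": "published"}))  # ['Title 1']
```

Treating blanks as wildcards keeps the form usable by people who only care about one or two fields.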

The end result will be a thing that I can really only describe as a Potemkin database: Looks and acts enough like a database, from a certain angle, to make the boss happy, but rickety and easy to fold up and foist off on someone else in the dead of night. I've started calling it "!db". You can either read that "bang-D-B," if you want to riff on the fact that it's full of smutty Circlet books, or "not a database," if you're into telling the truth.


  1. Your coordinate idea reminds me of color theory, such as CIE XYZ or Hunter Lab, if you're interested.

  2. I do web design, kinda-sorta, and HTML does in fact handle colors with a sort of slapdash coordinate system. It looks as though it was invented by someone who was familiar with CIE/Hunter Lab and despaired of ever banging it into the heads of programmers who thought they were graphic designers. Colors are specified in red/green/blue 3-space, as three hexadecimal numbers in the range [00, FF] inclusive. It's not necessarily great; the accessible colorspace is a triangular chunk out of the middle of the usual swoosh (the horseshoe-shaped chromaticity diagram), and is based on the capacity of your average CRT, rather than the capacity of the human eye.

    It works even less well now that we all use flat panels, which behave a bit differently than CRTs, sometimes in ways you wouldn't expect. I apparently see unusually far toward the ultraviolet end of the spectrum, for instance. UV is bouncing all over the place IRL, but made up at most about 5% of the emissions from a CRT monitor, and LCD screens backlit with white LEDs emit essentially none. My old glass CRT screens were always adjusted much bluer than the default NTSC color, and my laptop displays are still adjusted well toward the blue end of PAL; otherwise they don't match what I see out here in meatspace at all.
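For anyone following along, here's a small Python sketch of the hex-triplet scheme described in the second comment: each pair of hex digits is one coordinate in RGB 3-space. It assumes the six-digit `#RRGGBB` form only (no three-digit shorthand or named colors), and the distance it computes is straight-line distance in the RGB cube, which is not the same as perceived color difference; closing that gap is what Lab-style spaces are designed for.

```python
import math


def hex_to_rgb(code):
    """Parse a color like '#3CB371' into an (r, g, b) point.

    Each pair of hex digits is one coordinate in the range 0x00-0xFF.
    Assumes the six-digit form; shorthand like '#FFF' is rejected.
    """
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hex digits, e.g. '#3CB371'")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Straight-line distance in the RGB cube (not perceptual distance)."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


print(hex_to_rgb("#3CB371"))               # (60, 179, 113)
print(rgb_distance("#000000", "#FFFFFF"))  # about 441.67, the cube's long diagonal
```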

