Book of Sand prototype

So on Friday I spent a few hours making a prototype Book of Sand app in PHP with MySQL (if you don’t know what those are, you probably won’t get much out of the rest of this post).

BoS is essentially a simplified forum/comment system, where users can create threads of text, by adding lines to other lines. Each line can have many child lines, so that if you start at the top you can read down any number of pathways. And if you want to add something, you simply create a new branch from anywhere. The idea is that multiple users can interact to create collaborative texts, or simply have conversations where they can choose to follow any path that takes their fancy. It’s simpler than most tree-structured comment threads, because each post is only a line of text, allowing you to see on screen the full content of all the ancestors of the current line, so that you can read them like directly off the page, and you can also see the full content of each direct child of the current line, so you can choose where to go next.

A screenshot of the prototype Book of Sand main page

A screenshot of the prototype Book of Sand main page

In the above shot the ancestor lines are in green, the current line is in white, and the child line are in red. Note that each child line has a bunch of statistics in yellow to the right, and you can order the list by most of them using the blue links. This means that if you’re browsing a thread and you want to view, for example, the longest branch, you can easily set it so that the appropriate child is always at the top of the list. All ancestor and child lines are links, so you can easily traverse up and down the tree.

The system doesn’t add anything massively new – its functionality already exists in forums all over the place. What makes it interesting to me is what it removes. In the same way that Twitter took the concept of a social network site and threw away everything except the status update, deliberately making everything streamlined and simple, the Book of Sand is just a forum with all the non-essentials thrown away, leaving just the interesting stuff behind: what people actually write.

In a similar way, none of the PHP or SQL used for this site is particularly complicated. The PHP simply looks some stuff up in the DB depending on the Line ID in the query string and renders it on a page. Pretty much all the work it has to do, apart from the most basic login system, is add slashes to everything to prevent SQL injection. The DB is also pretty simple, but if there is any “cleverness” in the app, it’s here, because the trick is to make ancestor look-ups performant.

The obvious implementation would be to have a table that looks a bit like the following:

CREATE TABLE tLine
(
li_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
li_textValue BLOB NOT NULL,

li_parentId INT NULL,
FOREIGN KEY(li_parentId) REFERENCES tLine(li_id)
)

The problem with this is that while it’s compact, it’s a potential nightmare to  get ancestor information from. To find all the ancestors of each line you need to recursively look up every parent of every parent, which is not easy to do in a single query. And then suppose you want to know which of a line’s child lines has the most generations of descendents? You’d have to look recursively look up every child of every child, and since each line can branch into indefinitely many children, this becomes a massively resource-intensive procedure.

An answer to this would be to have extra columns on tLine that cache statistics like this. Every time you insert a line you call a sproc that updates the stats for every affected line. Which is fine, until multiple users want to use the site at once. Because while you’re calculating statistics you can’t have another process simultaneously updating things and calculating statistics, lest changes get double-counted. So you’d have to introduce locking, letting only one process access tLine at a time. Which would make inserting lines really slow when the site was busy.

The answer is to think of the tree of lines not as a series of one-to-many relationships, but as mass of many-to-many relationships. Instead of thinking that each line has one parent and many children, think that each line has many ancestors and many descendants. To represent this we remove li_parentId from tLine, and instead have a new table, tAncestry, that looks like so:

CREATE TABLE tAncestry
(
an_lineId INT NOT NULL,
an_ancestorLineId INT NOT NULL,
an_stepsRemoved INT NOT NULL,
FOREIGN KEY (an_lineId) REFERENCES tLine(li_id),
FOREIGN KEY (an_ancestorLineId) REFERENCES tLine(li_id)
);

You’ll notice that there’s an extra column: an_stepsRemoved. This is needed because otherwise our tAncestry table wouldn’t be able to tell us the difference between a parent and a grandparent. With it, tAncestry can tell us the relationship every line has to every other line it is related to, meaning that to get our hands on all ancestors we just need to do a simple join on an_ancestorLineId back to tLine where an_lineId is the current line, and to get all children but not other descendants we do the same, swapping the ID columns and specifying an_stepsRemoved = 1.

Yes, this table does need to be updated every time a new line is added to tLine. But the great thing is that since we only add rows, never update them, it doesn’t matter if mutiple users are adding lines simultaneously. To add the appropriate lines to tAncestry all we need to do is find all the lines linking the parent line to its ancestors and duplicate them, incrementing an_stepsRemoved, then add one more line linking the new line to its parent. And because no amount of additions will alter the relationship of an existing line to its ancestors, concurrency isn’t an issue.

Plus, crucially, tAncestry makes looking up useful statistics a cinch. Want to know how many children a line has? Use:

SELECT   COUNT(1)
FROM   tAncestry                    AncCount
WHERE  AncCount.an_ancestorLineId = ThisLine.li_id
AND  AncCount.an_stepsRemoved   = 1

The most recent date that the line gained a new descendent (at any level)?

SELECT   MAX(DescendantLine.li_createdDate)
FROM   tLine                         DescendantLine
JOIN   tAncestry                     AncRecent
ON   AncRecent.an_lineId         = DescendantLine.li_id
WHERE  AncRecent.an_ancestorLineId = ThisLine.li_id

The longest branch of descendent lines that a given line has?

SELECT   MAX(an_stepsRemoved)
FROM   tAncestry                   AncMax
WHERE  AncMax.an_ancestorLineId  = ThisLine.li_id

Once you wrap all these up into subqueries of a view that your PHP can join to whenever it looks up a line, with appropriate indexes on the columns, performance stops being a worry. (So you can get on with prettifying the UI…)

The first use of Book of Sand will probably be to power the forum for The Clockwork Quartet’s website. Plus we’ll probably put it online as a standalone site just to see what people do to it. Once it’s live I’ll put up the source code for people to play with.

Advertisement

One Response to “Book of Sand prototype”

  1. [...] Book of Sand is online It’s been quite a long time since I posted about the Book of Sand. This is not, I hasten to add, because the project’s [...]

Leave a Reply

Fill in your details below or click an icon to log in:

Gravatar
WordPress.com Logo

Please log in to WordPress.com to post a comment to your blog.

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.