Saturday, February 03, 2007

Moving

This blog is defunct. Its content, along with that of my other blogs, is now hosted at thomas-sutton.id.au.

Friday, February 04, 2005

The End...

Today, the summer research scholarship program comes to an end (a lot of people have already buggered off) and with it, my project. I've given my presentation, have had my exit interview and am writing my report. It would be time to move on if it weren't for that pesky lack of results.

Although this was a rather experimental "poke it and see what happens" project, the "results" (or rather "result", or even more correctly "data") that I got are not really satisfactory. To see if I can get any further, I'm going to reimplement the technique using another library (the InfoVis Toolkit) whose data representation is more suited to these purposes and will, hopefully, make the technique fast enough to eliminate the annoying lags the prefuse implementation suffered from.

Hopefully, I can get the reimplementation done, and finish the work on the log processor quickly enough so that John has time to work the results into a paper for CADE. I'd also like to try and get a paper (or a poster) ready for submission to GD 2005. I'm not expecting to get anything into the conference, I don't even know if my technique is original, but the paper writing practice can't hurt.

Friday, January 28, 2005

The Joys of Working Code

I've discovered that the AffineTransform support in prefuse appears to be broken. Flipping (i.e. scaling by a negative factor) doesn't work, or at least misplaces the transformed co-ordinates. Rotating the display appears to expose some form of redrawing bug, as parts of the display aren't drawn (or need the display to be panned or scrolled).

This is in addition to what I imagine is a race condition of some description that causes a ConcurrentModificationException (if that is the correct exception class) to be thrown by a collection somewhere in the bowels of the library.

All in all, I'm glad that:

  1. the AffineTransform modification didn't take too much time;

  2. I've been able to get some diagrams done for my presentation; and

  3. not much was riding on this project.

While it would have been nice to have a complete package to deliver to my supervisor, I can deal with having a dodgy hack. In addition, I don't think I'll be able to get my log filter scripts completed in time for my presentation or, probably, the end of the project. I'd have liked to have more than one problem displayable, but I can't seem to get the filters to deal with things like subsumption and demodulation properly.

Curses!

Monday, January 24, 2005

Screen-shots now hosted on Flickr


I'm now putting my various screen-shots and other images on Flickr so that they'll remain live once my RSISE account goes off-line.

So, without further ado, a repeat of the latest screen-shot, this time, from Flickr.

Thursday, January 20, 2005

On the Benefits of Documentation and Its Reading

I am stupid. That is the only possible explanation for the way I've wasted most of the last couple of weeks of work, by hard-coding a number of transformations into my code, when the library I'm using (prefuse, just in case you'd forgotten) supports java.awt.geom.AffineTransform, in all its glory, or should that be gory. So now, I've been hacking out all my size handling code; all my device co-ordinate handling code; all the uncommented, unnecessary, inexplicable cruft that has been cluttering up my code, and replacing it all with a single AffineTransform applied to the Display.

This has meant that I've been able to trash my idea of creating a new class. I'd already pretty much finished AdjacencyMatrixManager which was to know the size we were drawing at (i.e. the dimensions of the matrix we were filling in) and be able to convert points in a useful co-ordinate system (such as one with the origin at the bottom-left) into device co-ordinates. As quick as it was to hack up, it didn't take much longer to tear out again.
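The co-ordinate conversion AdjacencyMatrixManager used to do by hand is exactly what a single affine transform expresses. A minimal Python sketch of the idea (this is illustrative only, not the actual prefuse/Java code; the function names are invented):

```python
# Sketch of the co-ordinate flip that a single affine transform replaces:
# mapping "maths" co-ordinates (origin at the bottom-left, y increasing
# upward) into device co-ordinates (origin at the top-left, y downward).

def flip_transform(height):
    """Return a 3x3 affine matrix that flips the y axis within `height`."""
    return [[1,  0, 0],
            [0, -1, height],
            [0,  0, 1]]

def apply(matrix, x, y):
    """Apply a 3x3 affine matrix to the point (x, y)."""
    nx = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    ny = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return nx, ny

# A point 10 units up from the bottom of a 284-pixel-high display
# ends up 10 pixels above the bottom edge in device co-ordinates.
print(apply(flip_transform(284), 5, 10))  # (5, 274)
```

The win is that the flip (a scale by -1 plus a translation) lives in one matrix applied in one place, rather than being scattered through every class that touches co-ordinates.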

Now, all my classes are smaller (most of them are just a couple of one-liners in an interface stub), my code is [probably] going to be faster (there are a lot fewer method calls), and it is a lot simpler, with all of the stupidity caused by the device co-ordinates confined to one place.

The only problem I can see will be in adding interface elements that I don't want the AffineTransform to be applied to. Things like tooltips mustn't be transformed, or we'll wind up with huge, unreadable text that obscures its subject. On the other hand, it will be a lot easier to do things like highlighting effects, and such like, as I no longer need to pass around various parameters.

On the whole, this is the better path, even though I may need to start poking at prefuse's insides.

Thursday, January 13, 2005

Colouring Inside the Box

SET012-demo.png (PNG Image, 2551x284 pixels)

Now that the data sets can be reprocessed more easily, and have more relevant data in them, I've started working on the colouring code. At the moment, I've just cleaned up the colours (to get rid of the grey fill colour, etc) and added highlighting of factors. In the image linked above, factors appear as blue points, the green points on the left-hand side are the "given" clauses and the black points are any edges not created by factorisation.
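The colouring rules above amount to a small mapping from the kind of clause to a colour. A hypothetical Python sketch (the attribute name "kind" and its values are invented for illustration; the real code is Java/prefuse):

```python
# Hypothetical sketch of the colouring rules: factors blue, given
# clauses green, everything else black.  Attribute names are invented.
COLOURS = {
    "factor": "blue",   # clauses produced by factorisation
    "given": "green",   # "given" clauses, on the left-hand side
}

def node_colour(node):
    """Colour a node by its kind, defaulting to black."""
    return COLOURS.get(node.get("kind"), "black")

print(node_colour({"kind": "factor"}))     # blue
print(node_colour({"kind": "resolvent"}))  # black
```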

Also evident in this image are the diagonal lines mentioned in a previous post, caused by the implementation of the theorem prover.

All in all, it has been a fairly productive day (so far).

Input Processing: The Joys of a Dodgy Hack

Just finished a rewrite of the input processing scripts. I now have a single Python script that reads a SOFTIE log from a file and does a few things with it. The first new feature is that it processes the option and parameter declarations output at the start of the log and warns the user if very_verbose, print_kept and print_deleted are not turned on. It understands enough to set and unset option flags and parameters, but the parsing is a little stupid. It will ignore warning lines, but treats everything else that looks like a parameter or option as a parameter or option, until it sees the start of the theorem proper.

It then proceeds to build a number of lists of clause Nodes. There is a list of all clauses and a list each of those that were kept, deleted and subsumed. Whilst building the lists of clauses, it keeps an eye on the time (i.e. the iteration) and increments it when necessary.

Once it is done building the lists of clauses, it, by default, prints out the list of kept clauses as a graph in the XML dialect used by prefuse, with the nodes and edges bearing the attributes that the visualisation software expects.

Though it would probably be faster if written with more care (or in a text processing language like Perl or awk), this new script has turned what used to be a 5 (or more) stage process with 3 passes over the log file, into a single stage process with one pass over the data. It doesn't spawn any processes and doesn't use much memory (as opposed to the old shell scripts, and the libxml2 Python scripts respectively).
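The single-pass structure described above can be sketched as follows. This is a hedged sketch only: the SOFTIE log syntax shown here (the `set(...)` lines, `iteration` markers and `kept:`/`deleted:` prefixes) is invented for illustration, since I haven't reproduced the real format here.

```python
# Hedged sketch of the one-pass log processor.  The log line formats
# are invented for illustration; the real script parses SOFTIE output.
REQUIRED_FLAGS = {"very_verbose", "print_kept", "print_deleted"}

def process_log(lines):
    """One pass over the log: check options, collect clause lists."""
    flags, kept, deleted, all_clauses = set(), [], [], []
    iteration = 0
    for line in lines:
        if line.startswith("set("):          # e.g. "set(print_kept)."
            flags.add(line[4:].rstrip(").\n"))
        elif line.startswith("iteration"):   # time marker (invented)
            iteration += 1
        elif line.startswith("kept:"):
            node = (iteration, line[5:].strip())
            kept.append(node)
            all_clauses.append(node)
        elif line.startswith("deleted:"):
            node = (iteration, line[8:].strip())
            deleted.append(node)
            all_clauses.append(node)
    for flag in sorted(REQUIRED_FLAGS - flags):
        print(f"warning: {flag} is not set")
    return kept, deleted, all_clauses
```

The point of the design is visible even in the sketch: every line is looked at exactly once, all the clause lists are built in that single pass, and no subprocesses are spawned, which is where the old multi-stage shell pipeline lost its time.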

Thursday, January 06, 2005

A Traditional Adjacency Matrix

John just raised the point that the size problem would be solved by using a 'full' adjacency matrix on the usable set (instead of all clauses). This would indeed give us a square of size n, with the triangular half of it filled. It might also be interesting in that it would allow us to represent numerical statistics more easily. These might include the number of clauses that were subsumed by a particular set of clauses, the number of children or the distance from the proof.
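The idea can be sketched in a few lines: each usable clause gets a row and a column, and only the triangular half below the diagonal is filled, with each cell holding a count or other statistic rather than mere presence. (A minimal illustrative sketch; the clause numbering and the choice of statistic are assumptions.)

```python
# Minimal sketch of a full adjacency matrix on the usable set, with
# only the lower triangle filled.  Cells hold counts, so they could
# carry statistics like the number of subsumptions between two clauses.
def adjacency_matrix(n, interactions):
    """Build an n x n matrix from a list of (i, j) clause-index pairs."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in interactions:
        row, col = max(i, j), min(i, j)  # fold into the lower triangle
        matrix[row][col] += 1
    return matrix

m = adjacency_matrix(4, [(0, 2), (2, 0), (1, 3)])
print(m)  # [[0, 0, 0, 0], [0, 0, 0, 0], [2, 0, 0, 0], [0, 1, 0, 0]]
```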

On the less positive side, it is possible that the plot might tend toward completeness. This may be a boon, however, as there may be some relation between its tendency toward completeness and the likelihood of finding a proof. Also, with a denser graph it may be reasonable (though probably quite naive) to assume that any blank space will be more meaningful.

These issues need, and deserve, to be addressed by implementing this idea. Sounds like I'm going to be trying to get a lot of little things done tomorrow.

The end is nigh. Or at least in sight.

The end is nigh, or in sight, depending on which way you look at it. There are only a few weeks left of the summer scholarship program, and a lot of work still to be done.

On the other hand, a meeting with my supervisor (John Slaney) today has reassured me that I'm not doing as badly as I had expected, and when you look at the graphs I've been drawing on a large scale, you can see why. The matrix style diagrams display a number of features that clearly illustrate properties of the system, and the problems being solved.

In the SET012 problem, we can clearly see repeating patterns which may be a graphical reflection of the symmetry of the problem (from Set Theory and regarding some relation between sets and another). Another interesting feature is a number of peculiar 'gaps' in the graph where large numbers of factors are generated consecutively. One of these gaps is several tens of clauses wide.

Also clearly visible are sloping vertical "lines" (in reality they are just adjacent points). This appears to be caused by the method used by the reasoning system. To generate new clauses from a given clause, the system attempts to unify it with the clauses in the usable set. This process traverses the list of usable clauses in reverse order which, for given clauses that unify with a large number of usable clauses, gives the appearance of sloping lines.

Other noticeable features include the appearance of a number of short horizontal "lines", strangely regular patterns, patterns that appear to be interrupted then continue and some patterns that bear a small resemblance to the Game of Life and other cellular automata.

I'm about to start working on adding some useful features to help give some more meaning to the graph. Unless I have a stroke of genius, this will probably be restricted to colouring the nodes (to easily differentiate factors, given clauses, etc) and highlighting those nodes that form part of the proof.

When these features are added in (hopefully by tomorrow or early next week), I'll move on to dynamic exploration. Some form of "zoom" function whereby those parts of the graph not in the immediate vicinity of the cursor are scaled down may be useful due to the rather long horizontal axis. Another feature that I'll have a go at implementing is a "cross-hair" on the cursor to allow the user to view other relations involving the node/s they are interested in, trace across to row and column labels, etc. Another possibility is to have the system display the text of a clause, the method that generated it and other possibly interesting data.

All in all, there is too much to do, and too little time.

Monday, December 27, 2004

Onward, to a less nonsensical visualisation!

SET012-clip2.png

The new, matrix-inspired, visualisation method is coming along nicely and, as can be seen in the picture linked above, allows one to identify possibly interesting features of the dataset. In this case, there are three points early in the graph that are well above the line formed by the given parent edges. These three, and a number of strange plateaus in the line, are probably caused by factorisation.

Factorisation is a technique used by the prover to 'modify' clauses and is done (in these cases at least) when the clauses are added to the "database" of clauses. That is, when they are created. This leads to points that ignore, almost completely, the rest of the graph.

The first thing on the agenda for the coming week is to finish and polish the visualisation itself. Dealing with the gap at the start of the graph caused by the clauses of the theorem not having any parents will be near the top of the list, as will testing the layout code to ensure that it is not piling nodes on top of each other (I believe that it is, though it didn't seem to be when I had a quick look at this problem). When the drawing code is complete and correct, I'll move on to adding in some useful features like drawing the axes, labeling rows and columns, etc.

After the essential features are implemented, the visualisation will need to be evaluated with respect to the goal of my project. If it doesn't seem to make the search process any more obvious, I'll have to reconsider my approach to the problem. On the other hand, if this approach is useful, I'll start to add some more exotic abilities to the application focusing on the dynamic exploration of the dataset. Animation, dynamic colouring of graph elements and querying and constraining the dataset will all, I hope, help the user explore and understand the data set.

Another avenue that will need to be explored is the sheer size of the visualisation. The SET012 graph has [very approximately] the same dimensions as twenty-seven and a half 17-inch monitors side-by-side. Needless to say, this will be difficult to navigate, especially on machines with a single, or smaller, screen. Fortunately there is a lot of research in the area of zoomable interfaces, and a number of toolkits like Piccolo and ZVTM should help implement some form of scaling.

On a more meta-blog note: for some reason the Blogger spell check feature insists on using an American dictionary and, therefore, trying to correct words that are perfectly fine in the rest of the English speaking world. If I happen to accept some bizarre substitution it suggests, please forgive me.