Very Big Graphs: 12/01/2004

Monday, December 27, 2004

Onward, to a less nonsensical visualisation!

SET012-clip2.png

The new, matrix inspired, visualisation method is coming along nicely and, as can be seen in the picture linked above, allows one to identify possibly interesting features of the dataset. In this case, there are three points early in the graph that are well above the line formed by the given parent edges. These three, and a number of strange plateaus in the line are probably cause by factorisation.

Factorisation is a technique used by the prover to 'modify' clauses and is done (in these cases at least) when the clauses are added to the "database" of clauses. That is, when they are created. This leads to points that ignore, almost completely, the rest of the graph.

The first thing on the agenda for the coming week is to finish and polish the visualisation itself. Dealing with the gap a the start of the graph caused by the clauses of the theorem not having any parents will be near the top of the list, as will testing the layout code to ensure that it is not piling nodes on top of each other (I believe that it is, though it didn't seem to be when I had a quick look I've had at this problem). When the drawing code is complete and correct, I'll move on to adding in some useful features like drawing the axes, labeling rows and columns, etc.

After the essential features are implemented, the visualisation will need to be evaluated with respect to the goal of my project. If it doesn't seem to make the search process any more obvious, I'll have to reconsider my approach to the problem. On the other hand, if this approach is useful, I'll start to add some more exotic abilities to the application focusing on the dynamic exploration of the dataset. Animation, dynamic colouring of graph elements and querying and constraining the dataset will all, I hope, help the user explore and understand the data set.

Another avenue that will need to be explored is the sheer size of the visualisation. The SET012 graph has the same [very approximately] dimensions as twenty seven and a half, 17 inch monitors side-by-side. Needless to say, this will be difficult to navigate, especially on machines with a single, or smaller, screens. Fortunately there is a lot of research the area of zoomable interfaces, and a number of toolkits like Piccolo and ZVTM should help implement some form of scaling.

On a more meta-blog note: For some reasoning the Blogger spell check feature insists on using an American dictionary and, therefore, trying to correct words that are perfectly fine in the rest of the English speaking world. If I happen to accept some bizarre substitution it suggests, please forgive me.

Wednesday, December 22, 2004

Java Programming Language

Java Programming Language, JDK 5

I've was just digging through the Java API docs and found my way to the description of the latest language enhancements. The changes to the language in JDK 5 (nee 1.5) are seriously cool. The descriptions of the new features and constructs read like a who's-who of modern language features:

typed-checked generics
a useful for loop
static imports
useful enums
annotations.

Java is growing up. The for-each style loop and typed generics, as much as I dislike the <> syntax, will significantly improve the experience of programming in Java and the comprehensibility of the resultant code. Static imports, annotations and enums are, in my opinion, more open to abuse, but with discipline they too will make Java a much more attractive language to code in.

I can't help but wonder how long it will take these features to filter into programming and software engineering courses.

Tuesday, December 21, 2004

Alternative Relations on Graphs

This project has, so far, focused on visualising graphs of only one relation on clauses, namely the parent-child relation. Whilst the addition of temporality adds an extra dimension to such a view, it would be interesting to expand the number of relations.

Another point which might bear investigation is the degree of temporal locality in the graphs growth. A graph with iterations (or whichever time quanta is appropriate) as nodes and the relation between the nodes of one quanta and another being an edge between them. The degree of the relation might be reflected in the weight of the edge. This could be visualised with a time quanta being positioned in a ring layout, with the temporal relation forming the ring, and the "share-a-relation" relation being the internal links.

Monday, December 20, 2004

Automated Deduction Image Gallery

http://rsise.anu.edu.au/~sutton/matrix/

I've started on a layout algorithm based on the concept of adjacency matrices. At the moment, it's a full matrix, but as soon as I can think of an elegant way to code it, I'll be tearing out as much of the white space as I can. In essence, the new layout will be a visual parallel of the adjacency lists data structure for storing sparse graphs.

We'll see how I go, but I think that this approach might be more general than a radial layout algorithm. Having animations of a few minimal spanning tree or network flow algorithms would be quite cool.

Saturday, December 18, 2004

Alternative Visualizations and Pseudo-semantic Orderings

I've started thinking about alternative visualization methods in the last couple of days. The most promising alternative is an adjacency matrix tweaked slightly to handle the added temporal "dimension". An adjacency matrix will [I hope] make patterns easier to spot, will be easier to code, and will probably be a lot faster to run. The added speed will enable the use of a number of run-time effects to aid in exploration of the data sets (highlighting related elements, selecting temporal "regions" of the data set, etc).

On the the other hand, the lecture yesterday by Thomas Meyer (which happened to be the final day of the Logic Summer School) was about belief revision and mentioned using epistemic entrenchment to impose an ordering on the set of clauses (or rather beliefs). The though struck me that this technique, or one similar, might be just what I need to impose some form of ordering on the set of clauses to be able to plot them effectively.

The only problem is that any such ordering would not be stable, so it would not allow comparison between two visualizations of different set of clauses (in the most obvious way at least). This might be applicable to both the radial network layout and an adjacency matrix. The radial layout might derive a clauses polar coordinates from its position in the ordering and from the time at which it was created. On the other hand, having a non-temporal ordering relation allow the adjacency matrix to be more useful in animating the graphs growth and development. I imagine that such a visualisation's appearance would be obvious to experts in automated reasoning, but it seems a lot more promising (and useful) than layout out the graphs to minimise edge crossings, etc.

Wednesday, December 15, 2004

A Radial Layout

All of my work to date [as is now reflected by back posts below] has been on two tasks:

generating runtime data dumps; and
experimenting with radial layout techniques.

Generating data has been relatively easy. The automated reasoning system (one based on SCOTT I believe) already had the capability to output most of the data I need (like some statistics and the generated clauses that are kept) so I simply added the capability to output clauses that are discarded. After the data has been generated, I have a whole heap of scripts in a number of languages to translate it into a variety of formats, from CSV-ish formats suitable for generating graphs of various statistics (using GNUplot) to a simple XML dialect which is then transformed into a number of others (for a couple of graph rendering packages) to dot (the language Graphviz uses to describe graphs).

On the radial layout front, I have been looking at papers describing a number of visualization toolkits and radial layout algorithms for graphs and trees. I've modified a layout component from one of these toolkits (prefuse) to display my data in a radial layout, where radial distance is proportional to time. Unfortunately, this leads to very messy graphs [123K].

I'm going to try and simplify the dataset and see if that will help resolve the problem. If not, I think we'll have to find another visualization for the data.

Summer Project

I remembered today that I had a Blogger account and thought that I should probably use it for something. What better purpose than to document my summer research project at the Research School of Information Sciences and Engineering of the Australian National University. This blog then will focus on my attempts to find a visual metaphor to help [humans] understand the search an automated reasoning system performs whilst attempting to prove a theorem.

Essentially this boils down to displaying very large graphs and the way they change over time. Some examples of preliminary work can be found at my project web page.

[Note: Posts previous to this one are based on work notes and have been back dated.]