Thursday, November 25, 2004

Data, Data, Everywhere

The amount of output these problems produce may itself become an issue. LCL005, for example, produces more than 600 MB of output and generates over 6,000,000 events.

Processing this data into a graph description is the next phase of the project. Some fairly trivial (and very naive) shell scripts have been created to split these files into a list of events for each iteration of the given-clause loop. These are, however, excruciatingly slow and will need to be rewritten in something a little faster. A few more scripts were created to produce some event statistics and plot them using Gnuplot. These will also need to be rewritten in a language with decent text-processing facilities, so as to remove the need to run three cat | grep pipelines per file. Awk may be a good choice.
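
As a rough illustration of the single-pass idea, here is a minimal sketch in Python (rather than Awk) that splits a trace into per-iteration event lists and counts event types without re-reading the file. The trace format is an assumption made purely for the example: one event per line, the event type as the first word, and a line starting with GIVEN marking the start of each iteration. The names split_iterations, event_counts and ITERATION_MARKER are hypothetical, and the sample lines are made up.

```python
from collections import Counter

# Assumption: a line starting with this word opens a new given-clause iteration.
ITERATION_MARKER = "GIVEN"


def split_iterations(lines):
    """Yield one list of event lines per iteration, reading the input once."""
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(ITERATION_MARKER) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current


def event_counts(events):
    """Count events by type, taking the first word of each line as the type."""
    return Counter(event.split()[0] for event in events)


# Made-up sample in the assumed format; a real trace would be read from a file.
sample = """GIVEN 1
GENERATED 2
KEPT 2
GIVEN 2
GENERATED 3
GENERATED 4
KEPT 4
""".splitlines()

for number, events in enumerate(split_iterations(sample), start=1):
    print(number, dict(event_counts(events)))
```

Because split_iterations is a generator, an open file object can be passed in place of the sample list, so a 600 MB trace never has to be held in memory at once. Any lines appearing before the first marker end up in the first group.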

I seem to be missing some proportion of KEPT clauses. A trace of LCL403, for instance, does not list any KEPT clauses (except for those that form part of the theory), whereas the statistics say that 29208 clauses were kept. Most confusing.
