Friday, January 02, 2004
Oh dear. The test Lucene index under /tmp has disappeared. In trying to re-create it, I've come up against a problem: with the dtable edited out of the entry, there's not much to index.
This will be a drag if we follow this approach in the live system. Once an entry has been imported, indexed, linked and dtabled, we can theoretically remove the dtable element. If we do that, however, we've got to reinsert it whenever we reindex or relink the vol.
Alternatives:
1. Do remove it, but get the indexing system to work directly from the dtable information rather than the source rml.
2. Don't remove it, and hope that the de-chunked entry retrieval is sufficiently fast.
I don't hold out much hope for option 2. I envisage quite large dtables - the CIA example is already huge. Having to wheel all that data around every time we want to display the entry doesn't appeal.
I think we should go for dtable-aware tools and invoke them when required. Ultimately it gives us more flexibility, at the cost of some complexity. They should all live within the dtable package. It may be that we develop sibling packages in future to deal with other specialised entry types (atlas, video, etc).
So, what are we going to need?
1. dtable-aware indexer.
2. dtable-aware linkers.
3. ability to apply the right tool to the right entry.
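Point 3 is basically a lookup from entry type to tool. A minimal sketch of how that might look, assuming each entry carries some type key such as "dtable" or "atlas"; EntryTool and ToolRegistry are names made up for illustration, not anything in the current codebase:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tool interface; stands in for an indexer or a linker. */
interface EntryTool {
    String process(String rml);
}

/** Hypothetical registry that picks a tool by entry type. */
class ToolRegistry {
    private final Map<String, EntryTool> tools = new HashMap<String, EntryTool>();
    private final EntryTool fallback; // used for ordinary entries

    ToolRegistry(EntryTool fallback) {
        this.fallback = fallback;
    }

    /** Register a specialised tool, e.g. the dtable-aware indexer. */
    void register(String entryType, EntryTool tool) {
        tools.put(entryType, tool);
    }

    /** Return the specialised tool if one is registered, else the generic one. */
    EntryTool toolFor(String entryType) {
        EntryTool tool = tools.get(entryType);
        return tool != null ? tool : fallback;
    }
}
```

Sibling packages (atlas, video) would just register their own tools under their own keys.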
But hang on! Surely the point of all the original work with the parser layer was to isolate changes in the data representation from the downstream tools. Can't we hide the changes at that level and leave the tools as they are? How about if the Entry takes on the job of spotting that it's a dtable and splitting the RML appropriately? The EJBs are only used during import and maintenance, where speed isn't so important, so the Entry can hide the dtable handling within its own encapsulation there. For display, where we use the DAO directly, we can end up with a nice compact de-dtabled entry text.
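To make the idea concrete, here is a rough sketch of an entry that does its own splitting. It assumes the dtable is a single, non-nested `<dtable>...</dtable>` element in the RML and uses a regex for brevity where real code would go through the parser layer. EntrySketch and its method names are hypothetical, not the actual EntryBean interface:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: an entry that splits its RML into body and dtable. */
class EntrySketch {
    // Assumption: one <dtable>...</dtable> element, not nested.
    private static final Pattern DTABLE =
        Pattern.compile("<dtable\\b[^>]*>.*?</dtable>", Pattern.DOTALL);

    private final String body;   // entry text with the dtable removed
    private final String dtable; // the dtable element, or "" if there is none
    private final int offset;    // where the dtable sat in the original RML

    EntrySketch(String rml) {
        Matcher m = DTABLE.matcher(rml);
        if (m.find()) {
            dtable = m.group();
            offset = m.start();
            body = rml.substring(0, m.start()) + rml.substring(m.end());
        } else {
            dtable = "";
            offset = rml.length();
            body = rml;
        }
    }

    boolean isDtable() {
        return !dtable.isEmpty();
    }

    /** Compact de-dtabled text, as the display path would fetch it. */
    String displayText() {
        return body;
    }

    /** Full RML with the dtable put back, for the indexer and linkers. */
    String fullRml() {
        return body.substring(0, offset) + dtable + body.substring(offset);
    }
}
```

The downstream tools would only ever call something like fullRml() and never need to know the split happened.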
Sounds good. Now to re-appraise the entry import process and see if it's feasible to hide all dtable splitting within EntryBean.