xtim

Friday, January 02, 2004

This feels really awkward, as it's cutting through the nice neat object model which has worked well so far. Time for a review.

The EntryBean looks after data storage. It associates a piece of data with each entry id, but knows nothing about how to interpret the data.

Each entry has an associated markup id. This identifies which parser to use with an entry. The parser knows how to go from the entry's data to our domain-specific objects: headings, bodies, links, etc.

Nothing else knows this. Everything else in the system uses the domain objects. This means that we can revise RML or introduce completely different markup (e.g. non-XML) without rewriting the system. Just add a new parser which understands the markup, associate it with the new entries and you're away. Linking, indexing and all the rest will just work. That's useful.
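As a rough illustration of that arrangement, here is a minimal sketch of a registry that maps a markup id to a parser. Every name in it (DomainNode, MarkupParser, PlainTextParser, ParserRegistry) is hypothetical and invented for the example, as is the toy "plain" markup; none of it is the real system's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object: the rest of the system would only ever see these.
final class DomainNode {
    final String kind; // e.g. "heading" or "body"
    final String text;

    DomainNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

// Hypothetical parser contract: raw entry data in, domain objects out.
interface MarkupParser {
    List<DomainNode> parse(String rawData);
}

// Toy parser for a made-up plain-text markup: lines starting with "# " are headings.
final class PlainTextParser implements MarkupParser {
    public List<DomainNode> parse(String rawData) {
        List<DomainNode> nodes = new ArrayList<>();
        for (String line : rawData.split("\n")) {
            if (line.startsWith("# ")) {
                nodes.add(new DomainNode("heading", line.substring(2)));
            } else if (!line.isEmpty()) {
                nodes.add(new DomainNode("body", line));
            }
        }
        return nodes;
    }
}

// Looks up the parser by the entry's markup id; callers never touch raw markup.
final class ParserRegistry {
    private final Map<String, MarkupParser> parsers = new HashMap<>();

    void register(String markupId, MarkupParser parser) {
        parsers.put(markupId, parser);
    }

    List<DomainNode> parse(String markupId, String rawData) {
        MarkupParser parser = parsers.get(markupId);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for markup id: " + markupId);
        }
        return parser.parse(rawData);
    }
}
```

Supporting a new markup would then be a matter of one more register call; nothing that consumes DomainNode objects has to change.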

Then we come to the problem of dtables. In theory, these could work on exactly the same lines as the existing structure. The EntryBean would retrieve the marked-up data, the parser would decode it and all would be well. There are two problems:

1. we want to offer flexibility which doesn't sit well with that approach: users can sort and hide columns or use the data in graphs.
2. the full entry data is huge.

So it's not practical to parse the full table every time we want to display a view of it.

The solution we've developed is to pre-parse the entry on import and store the table fields separately in the database. This gives us random access to the cell data, column list, etc. We can create custom views of the table quite efficiently. This happens through a dedicated API call, separate from the standard entry retrieval call. Unfortunately, the user doesn't know to call that function until they've called the standard method and discovered that the entry is, in fact, a table.
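To make the idea of a custom view concrete, here is a small sketch. It assumes the pre-parsed table is available as a column list plus rows of cell strings; an in-memory class stands in for the database, and StoredTable and its methods are hypothetical names, not the real API call. Sorting is plain string ordering to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the separately stored table fields.
final class StoredTable {
    private final List<String> columns;
    private final List<List<String>> rows;

    StoredTable(List<String> columns, List<List<String>> rows) {
        this.columns = columns;
        this.rows = rows;
    }

    // The column list is available without touching the original entry markup.
    List<String> columnNames() {
        return columns;
    }

    // Random access to a single cell.
    String cell(int row, String column) {
        return rows.get(row).get(indexOf(column));
    }

    // Builds a view: only the requested columns, rows ordered by sortColumn.
    List<List<String>> view(List<String> visibleColumns, String sortColumn) {
        int sortIndex = indexOf(sortColumn);
        List<List<String>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((List<String> row) -> row.get(sortIndex)));

        List<List<String>> result = new ArrayList<>();
        for (List<String> row : sorted) {
            List<String> projected = new ArrayList<>();
            for (String column : visibleColumns) {
                projected.add(row.get(indexOf(column)));
            }
            result.add(projected);
        }
        return result;
    }

    private int indexOf(String column) {
        int index = columns.indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return index;
    }
}
```

Hiding a column means leaving it out of visibleColumns, and none of this needs the entry's markup to be parsed again.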

It's too expensive to process the whole entry source at that point, particularly as we're not going to use it: we'll use the dedicated table data instead. Ideally we'd strip the dtable information out of the entry before it's written to the database.

Problem: if we make that the responsibility of the EntryBean, it starts to interpret the data, and that wasn't part of the deal. If we make it the responsibility of the parser, then everything which ever calls getText on the EntryBean will have to handle the possibility that it's not getting the full text.

Hmm.

Will continue in next post.
- posted by Tim Bruce @ 4:21 PM
