xtim
Wednesday, June 30, 2004
 
Back in the framework, ROD still shows no speed improvement. Bah.

T
 
Enabled read-only flag on lucene, which didn't make things any faster.

Reintroduced RODirectory wrapper but tests then slowed to a crawl.

Removed ROD, still crawling. I think it's because I'm running tomcat directly rather than through the framework. Will restart in the framework; if we're back to speed I'll try ROD again.

T
 
Ran the new 'n improved indexmerge on goring, for a time of 18 minutes. Good enough for the moment.

The delivery side's still unsatisfactory though, with average results page delivery of 2.5 seconds in our test environment, against 1.5 for the lucene 1.2 implementation. Profiling.

T
Tuesday, June 29, 2004
 
...seven minutes.

That's giving it a mergeFactor of 10,000,000 and 512Mb of heap to play with on readingtest. Expect goring to be a bit slower.

But even so: Woo, tf, Hoo.

T
 
Nine minutes.


But that was on the readingtest uberbrain.

T
 
And the index merge time is down from 3.5 hours to...20 minutes. Schweet. Tweaking further.

T
 
Index merge under 1.4 looks to be no faster - 90 minutes and not finished, so I got bored and killed it. Now re-running with new code in there to change the mergeFactor. It's up from the default of 10 to 1,000. Let's see how fast that makes it...
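For reference, the tweak itself is tiny. A minimal sketch (in the 1.4 API mergeFactor is a public field on IndexWriter; the paths here are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Merge a set of volume indexes into one, with a raised mergeFactor so
// Lucene buffers more segments before merging them on disk.
public class IndexMerge {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/indexes/merged",
                                             new StandardAnalyzer(), true);
        writer.mergeFactor = 1000; // up from the default of 10
        Directory[] volumes = new Directory[] {
            FSDirectory.getDirectory("/indexes/vol293", false)
        };
        writer.addIndexes(volumes); // merges and optimizes the result
        writer.close();
    }
}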

T
 
Now tested pm_check a couple of times and it's working fine with the underscores replacing spaces in element names. Back to the lucene work.

T
 
Back from glasto, tanned and gorgeous. Actually, I think it's mud. Euch.

A pick and mix of all sorts of problems:

1. browser type now getting recorded in logs as "browser type" - the space makes it an illegal XML element name, which throws off the record processing and kills the pm check script before it gets started.
2. jboss not restarted on rss after reboot 13 days ago. Also killing the pm script.
3. patched element names, ran the script manually, which turned up about 15 unhandled requests. Asked Claire to ignore those older than two weeks and handle the others with an apology for the delay.
4. Edited store_fields to replace spaces with underscores (see the sketch after this list).
5. Edited existing logs on servers similarly.
6. Re-running on xpforms to get and compile logs for the pm p.m. test.
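The element-name fix in (4) is just a character substitution, applied when the log record is written (the method name is mine):

// XML element names can't contain spaces, so "browser type" becomes
// "browser_type" before the record is written to the log.
static String toElementName(String fieldName) {
    return fieldName.replace(' ', '_');
}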

Also sorted out remote mail access for CC and DR in case tube strike goes ahead tomorrow.

Not even got round to talking about did-you-mean yet with SC. Over lunch, I think.

T
Wednesday, June 23, 2004
 
Indexes still generating, but re-ran test on the semi-complete 1.4 native indexes:

Using lucene 1.2, 512 Mb allocated to Tomcat
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1540 4796 642 1548 4809 644
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1532 4027 700 1545 4029 702
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1586 4509 659 1597 4537 661

Using lucene 1.4rc4, 512 Mb allocated to Tomcat

w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 3337 14109 1203 3344 14112 1205

Using lucene 1.4rc4, 512 Mb allocated to Tomcat, indexes built using lucene 1.4rc4

w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 811 3619 330 819 3622 332


which is more encouraging.

In the meantime, tried MultiSearcher as an alternative to the whole merged-indexes enterprise. Doesn't work: we get "Too many open files" errors. Tried adding -DdisableLuceneLocks=true to the command line, but we're still getting nowhere. Could try replacing FSDirectory with our own instrumented version which will tell us which files aren't getting closed, but that's a job for next week.
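For the record, the instrumented version would look something like this - a sketch against the 1.4 store API as I remember it (InputStream and OutputStream here are Lucene's own classes, not java.io's), logging opens as a first step towards pairing them with closes:

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.InputStream;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.OutputStream;

// Delegating Directory that logs every file open, to help spot which
// descriptors are leaking. A fuller version would wrap the returned
// stream and log its close() too.
public class InstrumentedDirectory extends Directory {
    private final Directory dir;

    public InstrumentedDirectory(Directory dir) { this.dir = dir; }

    public InputStream openFile(String name) throws IOException {
        System.err.println("open: " + name);
        return dir.openFile(name);
    }

    public OutputStream createFile(String name) throws IOException { return dir.createFile(name); }
    public String[] list() throws IOException { return dir.list(); }
    public boolean fileExists(String name) throws IOException { return dir.fileExists(name); }
    public long fileModified(String name) throws IOException { return dir.fileModified(name); }
    public void touchFile(String name) throws IOException { dir.touchFile(name); }
    public void deleteFile(String name) throws IOException { dir.deleteFile(name); }
    public void renameFile(String from, String to) throws IOException { dir.renameFile(from, to); }
    public long fileLength(String name) throws IOException { return dir.fileLength(name); }
    public Lock makeLock(String name) { return dir.makeLock(name); }
    public void close() throws IOException { dir.close(); }
}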

Waiting for remaining indexes so we can merge them and get a fair comparison on readingtest.

Glastonbury tomorrow.

T
Tuesday, June 22, 2004
 
Comparing lucene speeds on readingtest not encouraging so far:

Using lucene 1.2, 512 Mb allocated to Tomcat
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1540 4796 642 1548 4809 644
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1532 4027 700 1545 4029 702
w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 1586 4509 659 1597 4537 661

Using lucene 1.4rc4, 512 Mb allocated to Tomcat

w 50 5 http://www.test.xreferplus.com/results.jsp?term=|WORD| 3337 14109 1203 3344 14112 1205


i.e. about half as fast.

To do:

1. Get 1.4 to use native indexes (we're recycling 1.2 indexes at the moment)
2. Run in Optit
3. Reinstate read-only dir

Started (1) and reindexed vol 293 as a test. Works fine, so readingtest is now working hard reindexing all live content in a separate directory. Then to merge and re-test.

T
Monday, June 21, 2004
 
Lucene 1.4 integration coming along well. They've not changed the API much, so it's just in the areas we've patched 1.2 that we need to amend our code. Searching works, highlighting now works. I've removed the old read-only-directory filter and we should compare speeds tomorrow - given the speed-up it offered I'd be surprised if that approach isn't now in the core codebase.

xlocal pretty much working against unmodified lucene 1.4 rc1. Back on the main line!

To do tomorrow: try generation and merging, compare speeds, run in Optit.

T
Friday, June 18, 2004
 
Harv Music imported (twice, for editorial corrections) and all going well.

Steve's completed the on-the-fly link weights page.

I'm ready to close the book on NASA and outer space...

On to lucene 1.3 integration, then athens update.

T
Wednesday, June 16, 2004
 
Revised import process:

1. No link marking.
2. No sql link steps.
3. New steps to drop/recreate indexes as each linker is invoked (sketched below).
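A sketch of what step 3 amounts to over JDBC - the index and table names are illustrative, not our real schema, and Linker is a stand-in for however the linker gets invoked:

import java.sql.Connection;
import java.sql.Statement;

// Drop the linker's support index before the bulk insert, then rebuild
// it afterwards: rebuilding once is far cheaper than maintaining the
// index row-by-row throughout the import.
static void runLinker(Connection conn, Linker linker) throws Exception {
    Statement st = conn.createStatement();
    st.executeUpdate("DROP INDEX heading_e2e_from_idx");
    try {
        linker.run(conn); // bulk-inserts into heading_entry_to_entry
    } finally {
        st.executeUpdate("CREATE INDEX heading_e2e_from_idx"
                       + " ON heading_entry_to_entry (from_entry_id)");
        st.close();
    }
}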

Tested on db_dev with a few imports, seems OK. Checked in and deployed on import server, started import of Harv Music.

T

 
New jar is live, remaining server is getting updated as we speak.

A cautious thumbs-up so far. We're still breathing, entry retrievals seem OK but I do notice an occasional longer-than-expected wait. Let's see how it runs throughout the afternoon.

T
 
VPN's gone down mid image-retrieval. Bah.

Site itself still seems fine, and I'm hoping that the image retrieval is still underway. Just can't see it. Waiting for Matt to fix the VPN when he's in.

Until then, on with the modifications to the import process.

T
 
Fixed link retrieval so that it returns links in a consistent order - Carl noticed that the sort order varied between top five and all listings. This was down to inconsistent secondary sort order for equally-weighted links. Now sorted by link weight (desc) and entry id (asc) everywhere.
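The change boils down to a deterministic secondary sort key on the retrieval query (column names illustrative):

// Tie-break equally-weighted links by entry id so the ordering is
// stable across the top-five and full listings.
String orderBy = "ORDER BY link_weight DESC, entry_id ASC";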

Lucene index merge appears to have completed successfully on w1.

Updated jar on w1.

Started image retrieval ready for cluster update.

T
Tuesday, June 15, 2004
 
cluster update holding off while matt performs vacuum and reindex on w1 db.

making use of the time to upload and merge indexes for vols 289 and 290.

started modifications to import process so that indexes are dropped/recreated as required.

steve extending IDisplayPreferences to incorporate ILinkPreferences ready for a test dynamic link page.

T
 
Remaining sql, indexes and new jar now installed on w1. Tests look ok. Cluster update in progress.

T
 
Markless link site live on xplus and looks good so far. Delivering a few more links here and there than the old site, but they all look appropriate.

T
 
Modified mapper link retrieval code to use the markless links.

Applying new indexes to reading. Will then check the sample query to make sure I've got them all, then test release candidate here.

T
 
rsw1 vacuum complete. Importing remaining sql log.

T
 
Yep, the getLinkDetails method will have to be revised to fit with the new scheme.

Yesterday's SQL went up fine, now vacuuming w1 ready for the remaining data.

Matt taking a db dump for use on readingtest.

To do, then:

1. remaining sql to w1.
2. new indexes on reading and w1.
3. update getLinkDetails code.
4. test and release new jar.
5. cluster update.
6. comment out redundant bits of import process and import Harv Mus.
7. test HM links.

After a month-long hiatus, the Darkness return to the top of the morning music hit parade.

T
Monday, June 14, 2004
 
Added getSyncClientId to IMemberDescriptor API. Returns 0 if you're not synced, the source client Id otherwise.
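In interface terms, something like this (a sketch; the surrounding methods are elided and the int return type is an assumption from the "returns 0" behaviour):

public interface IMemberDescriptor {
    // ... existing methods unchanged ...

    /**
     * Returns the id of the client this member was synced from,
     * or 0 if the member isn't synced.
     */
    int getSyncClientId();
}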

Matt and I got tomcat running in Optit (finally) by choosing the right JVM. It's working with Sun jdk 1.4.1_04 and no other... Now running to spot runaway object creation.

sql log importing on w1, asked content to hold off on more editing until Wednesday.

Switched codebase client over to use markless links throughout. Tested on local machine against db_dev, seems OK.

All import work now waiting until after switch is complete. Tomorrow's the big day.

NB: review mapper link selection tomorrow. May have to modify parts of this too.

T


 
Monday, and I'm back in the yellow/green embrace of B224. Good to return to a familiar desk.

The ambitious plan is to have the site switched over to markless linking by the end of the week. Then dance a little.

Carl's priority is to get the new Harv. Music dictionary up to the site. Imports/linking and work on the markless links are pretty much mutually exclusive, as each makes heavy demands on the DB. Amit's worked over the weekend to get imports up-to-date before his week in Poland, so I'm hoping there's not going to be much of a clash. The plan, then:

1. Survey remaining import work.
2. If Harv can go live, put it up to the live site but don't add it to accounts. Tell Carl when it's ready. Delay any other pending relinks until after the markless link switch.
3. Add the new getSyncClientId API.
4. Prepare new jar for release, including the markless link gubbins.
5. Create support indexes on db_server and rsw1.
6. Test and release this month's jar.
7. Cluster update, watching like a hawk for any anomalies (long queries, etc).
8. If all's well, update link step of import process to drop and recreate indexes, skip sqllink and mark link steps.
9. Perform next import and put it live.
10. If we're happy with everything after a couple of weeks, clear away the now-unused database structures.

Right? Right.

T

Thursday, June 10, 2004
 
Optit wrapping still not going well.

Most of the time, I'm just getting this:

#
# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_04-b05 interpreted mode)
#
# Error ID: 455843455054494F4E530E43505000F9
#
# Problematic Thread: prio=1 tid=0x080530b8 nid=0x7298 runnable
#

Heap at VM Abort:
Heap
def new generation total 576K, used 576K [0xa6da0000, 0xa6e40000, 0xa7770000)
eden space 512K, 100% used [0xa6da0000, 0xa6e20000, 0xa6e20000)
from space 64K, 100% used [0xa6e20000, 0xa6e30000, 0xa6e30000)
to space 64K, 0% used [0xa6e30000, 0xa6e30000, 0xa6e40000)
tenured generation total 1408K, used 80K [0xa7770000, 0xa78d0000, 0xaeda0000)
the space 1408K, 5% used [0xa7770000, 0xa7784390, 0xa7784400, 0xa78d0000)
compacting perm gen total 4096K, used 833K [0xaeda0000, 0xaf1a0000, 0xb2da0000)
the space 4096K, 20% used [0xaeda0000, 0xaee70528, 0xaee70600, 0xaf1a0000)


So I'm trying alternate jvms.

Running with 1.3 gives a link error - I think the glibc versions may not be compatible with readingtest's installation, but I don't know much about that.

1.4.1 gives this:

OptimizeIt 4.0 build 410 Audit System.
(c) 1997, 1998, 1999, 2000 Intuitive Systems Inc.
Port is 1470
OptimizeIt generic Audit System
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:247)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:421)
at intuitive.audit.Main.runProgramWithClass(Native Method)
at intuitive.audit.Audit.main(Audit.java)
Caused by: java.lang.IllegalAccessError: tried to access method java.lang.Object.clone()Ljava/lang/Object; from class org.apache.xerces.impl.XMLEntityManager
at org.apache.xerces.impl.XMLEntityManager.getRecognizedFeatures(XMLEntityManager.java:1313)
at org.apache.xerces.parsers.XML11Configuration.addRecognizedParamsAndSetDefaults(XML11Configuration.java:1455)
at org.apache.xerces.parsers.XML11Configuration.addCommonComponent(XML11Configuration.java:1421)
at org.apache.xerces.parsers.XML11Configuration.<init>(XML11Configuration.java:536)
at org.apache.xerces.parsers.XML11Configuration.<init>(XML11Configuration.java:406)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
at java.lang.Class.newInstance0(Class.java:306)
at java.lang.Class.newInstance(Class.java:259)
at org.apache.xerces.parsers.ObjectFactory.newInstance(ObjectFactory.java:349)
at org.apache.xerces.parsers.ObjectFactory.findJarServiceProvider(ObjectFactory.java:493)
at org.apache.xerces.parsers.ObjectFactory.createObject(ObjectFactory.java:254)
at org.apache.xerces.parsers.ObjectFactory.createObject(ObjectFactory.java:131)
at org.apache.xerces.parsers.SAXParser.<init>(SAXParser.java:139)
at org.apache.xerces.parsers.SAXParser.<init>(SAXParser.java:124)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(SAXParserImpl.java:98)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(SAXParserFactoryImpl.java:95)
at org.apache.commons.digester.Digester.getParser(Digester.java:686)

T
 
At home today, as the gas man cometh.

Steve's completed the comparison between old/new approaches to link retrieval with a fine writeup at

http://www.fractalus.com/steve/tmp/xreftimings/index.html

and based on the results I'm happy to move ahead with integration into the client codebase. Steve's pressing ahead with that this afternoon. The new client won't be faster than the old one, but it won't be much slower either and we should be able to simplify the import process dramatically. We'll also get a more flexible client.

Ran a few test imports for Phil so that he can refine a dynamic table for the USA census data. Still trying to get Tomcat running in the OptimizeIt wrapper so we can monitor it remotely.

Amended the legacy entity mapping file so that the musical "natural" sign maps correctly.

T
Wednesday, June 09, 2004
 
Those results in full:

1. local db test machine not available yet. We're going to nab rsw5 for a while this afternoon and use that instead.

2. according to the explain stats, we are faster with the new approach than we were with the tried-and-tested mark links approach (900 versus 1700). Unfortunately, that's not borne out by the tests, which put us about 10 times slower (1.2 seconds versus 0.12 seconds per query).

I have a theory about this, which is that our new approach requires much more CPU activity than the old one and the explain stats are geared towards IO cost. That means that if the machine is heavily loaded (as our server is) then the new approach will suffer much more than the original. Steve's checking out whether I'm talking rubbish, which is always a possibility. A small flame of hope survives that it's all going to rock when we try it on a responsive machine.

3a. Tried. Discarded. It ramps the cost up several orders of magnitude.

3b. Tried. Incorporated. Raises the cost somewhat (900 to 1050) but is much more scalable and will speed imports.

The presence of the new indexes is slowing imports, so I'm dropping them while the imports go through. I think we'll have to build those steps into the import process.

Matt's seeing memory usage rise continually as tomcat handles requests, so I'm getting OptimizeIt connected to the process to see what's going on. Is it us? Is it tomcat? Enquiring minds want to know.

In the meantime, Steve's verifying the stats on our link retrieval, preparing a script to add the indexes and getting ready to test on w5.

T
Tuesday, June 08, 2004
 
A review of the markless link project and an action plan:

1. queries are averaging 0.7 seconds, but variability is high and some queries are unacceptably slow. Unfortunately, the office db server is busy with imports and so we can't get an accurate idea of the real-world speed. Would be good to test on production hardware.

2. In lieu of spare hardware, we can compare speeds by checking the "explain" output from psql on our original and new queries. This will give us an indication of expected page reads per query (903..903 for the new query right now); see the snippet after this list.

3. We can alter the sql in a couple of ways:

a) we consult heading_entry_to_entry three times (heading, fan, reverse fan). Can we speed things up by refactoring the query and only scanning this once? This would sacrifice the ability to vary link strengths independently for these linkers, but that's acceptable because they're both fairly dumb and should be marked low.

b) we could skip the linkerx_entry_to_entry tables entirely and go back to the source tables for the query. For example, calculate the heading links on-the-fly from heading_entry_to_phrase. This seems intuitively less efficient but worth testing. It would be a big win in agility.
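The snippet promised in (2) - we normally eyeball explain output in psql, but for repeated old-versus-new comparisons it can be scripted. A minimal sketch (the connection details and query text are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Print the planner's estimate for a query; the cost=startup..total
// figures are what we compare between the old and new link queries.
public class ExplainQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/xref_dev", "user", "password");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery(
            "EXPLAIN SELECT * FROM heading_entry_to_entry WHERE from_entry_id = 42");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        conn.close();
    }
}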

That's the plan for today, then. Steve's looking at 2 and 3a, I'm taking 1 and 3b.

Matt's preparing our new db machine to test the tomcat configuration and I'll take advantage of that in a spare minute to test 1. We're going with postgres 7.4, which I hear is substantially faster than our current 7.3 install.

On to 3b in the meantime.

T
Monday, June 07, 2004
 
Getting towards the end of qualitative testing of the new link app. The news so far:

1. The new process is getting all the links which the original mark links found.
2. We're also finding a few more.
3. Sometimes the strengths aren't identical, but that seems to be where a back-link hasn't been counted in the original mark links.

Overall then, this is looking very promising. Running the test app to check links for 1000 entries.

Link retrieval is averaging 1.12 seconds at the moment and we're looking for ways to bring that down.

T
 
Added a new back-index to the heading_entry_to_entry table over the weekend, which should speed up the markless link retrieval a little.

Steve's modifying the test SQL so that we have an unambiguous name for our entry_id_to union. Previously the back-links query was colliding with that name.

T
Friday, June 04, 2004
 
Deployed the modified admin system so the page footers read 2004 rather than 2003.

T
 
Bah. Found the cookie-handling problem, and it's arising because of how the servlet containers handle browsers without cookies.

If your jsp calls request.getCookies() and there were no cookies with the request, jserv returns you a zero-length array of Cookies. Tomcat returns null. This is really dumb - everything now has to check whether the return value is null before proceeding, when what you probably want to do is loop through them anyway. If you get an empty array back, your loop code will just fall straight through and everybody's happy. Unfortunately, tomcat's the one following the API spec so we can't really complain!

http://java.sun.com/products/servlet/2.3/javadoc/index.html
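The defensive pattern we're rolling out, for reference (the helper name is mine):

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

// Under Tomcat, getCookies() legitimately returns null when the request
// carried no Cookie header at all, so guard before looping.
static String getCookieValue(HttpServletRequest request, String name) {
    Cookie[] cookies = request.getCookies();
    if (cookies == null) {
        return null; // no cookies at all - the case jserv hid from us
    }
    for (int i = 0; i < cookies.length; i++) {
        if (name.equals(cookies[i].getName())) {
            return cookies[i].getValue();
        }
    }
    return null;
}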

T
 
Matt's having problems testing the tomcat configuration - browsers which don't support cookies get a maintenance page rather than our "must support cookies" error. Investigating.

T
 
Distributed the new jar to the live site, so scale's gone live and the lat labelling bug is fixed.

Checked in the link test app with back-links enabled for all relevant linkers.

T
Thursday, June 03, 2004
 
jar update delayed while an sql update goes through on w1.

link test app was speeded up but is now back to 20 seconds - we've added a couple of new indexes to the db, but also added support in the app for client shelves and back-links. Tomorrow: verify we're getting the same link scores as we currently record, then work on optimizations.

T
 
Performing a jar update today to fix a problem with the lat/lon labelling around the atlas - the southernmost lat label is incorrect. This will also put the scale indicator live.

jtest and jsitetest passed. tagging as monument_7.


Will modify atlas pages later today to use the new calls in the IQueryTool api for added efficiency and improved logging.


Steve's modified the link test app to allow us to feed in link strengths. We're aiming to reproduce the marklinks table for a given entry with a single sql query against the individual linker tables. It's taking about 20 secs for a query at the moment, we're finding out why.

T
Wednesday, June 02, 2004
 
Progress on the markless link process: We can run a sub-select which concertinas the selection of links and the summation of strengths into a single SQL command. This might rock! Tests out OK in the psql client; Steve is adapting our test link app so that you can feed it a link profile on the command line and it will return a list of appropriately-scored links.
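The shape of the thing, heavily simplified - heading_entry_to_entry is real, but the second linker table, the column names and the weight handling are invented for the sketch:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Single-query link retrieval: union the per-linker candidate links,
// scale each linker's strength by a supplied weight, sum the strengths
// per target entry and return them best-first.
static ResultSet retrieveLinks(Connection conn, int entryId,
                               double headingWeight, double phraseWeight)
        throws Exception {
    String sql =
        "SELECT to_entry_id, SUM(weighted) AS score FROM ("
      + "  SELECT to_entry_id, strength * ? AS weighted"
      + "    FROM heading_entry_to_entry WHERE from_entry_id = ?"
      + "  UNION ALL"
      + "  SELECT to_entry_id, strength * ? AS weighted"
      + "    FROM phrase_entry_to_entry WHERE from_entry_id = ?"
      + ") AS candidates"
      + " GROUP BY to_entry_id"
      + " ORDER BY score DESC, to_entry_id ASC";
    PreparedStatement ps = conn.prepareStatement(sql);
    ps.setDouble(1, headingWeight);
    ps.setInt(2, entryId);
    ps.setDouble(3, phraseWeight);
    ps.setInt(4, entryId);
    return ps.executeQuery();
}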

T
 
We're simplifying the scale a bit more and it will now be a single line below the map, labelled with the appropriate distance (earth surface distance between the south-east and south-west corners). Modified the renderer so it does everything but the line - waiting for appropriate graphics.

T
Tuesday, June 01, 2004
 
Fixed an error in the entity-mapping files which caused the German sharp s character ß to get dropped during indexing. Will re-index all content in the next import window - Amit's going to give me the nod.

T
 
Right then, completed the scale work on the atlas and tested it against a) proj and b) common sense. Looks OK on both counts, and the maps which span the date line are working fine. I've put the rough version up on xplus and will amend according to feedback.

On to a new project - this is a projected month-long collaboration with Steve C, where we're aiming to remove the "mark links" bottleneck from our import process. If we can do so, then

1) all kinds of opportunities open up as far as import is concerned, and
2) we can tailor the link display on-the-fly to give customers more control over what they see.

First step is to get Steve's build environment working.

T
 
That's what you get when you move house - a long time without post.

Anyway, in the intervening:

a) Set up axis toolkit and deployed a test webservice. This is more a proof of concept than anything else, but it does get our foot on the web services ladder. The key steps are

1. get an up-to-date servlet container running (e.g. Tomcat).
2. deploy the axis webapp.
3. prepare a simple skeleton class library for your API - no functionality, just fields.
4. run the Java2WSDL tool to generate an appropriate WSDL file.
5. run the WSDL2Java tool to back-generate your implementation skeleton, service locator and all the rest. This will also create a deployment descriptor.
6. fill in the skeletons and write a test client.
7. feed the deployment descriptor to the axis admin tool.
8. test!

b) Some stats work for Claire and form work for Becky.
c) Atlas revisions - we're going to feature a scale on the south and east of the map. Added a calcDistance method to the IProjection interface (our projections incorporate a correction for the irregular shape of the globe as well as the projection itself, so the distance between a pair of lat/lon points can vary between projections). Using algorithms from proj; a simplified sketch follows.
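The spherical core of calcDistance, ignoring the ellipsoidal correction we take from proj (so a simplification, not the real implementation): the haversine formula.

// Great-circle distance between two lat/lon points on a spherical
// earth, via the haversine formula. The real calcDistance corrects
// for the ellipsoid, so results differ slightly per projection.
public class SphereDistance {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean radius

    public static double calcDistance(double lat1, double lon1,
                                      double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2.0 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1.0 - a));
    }
}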

T
