xtim
Friday, July 30, 2004
 
Added support for the contents category to our client code and the IDirectoryPage interface. Haven't yet modified the letter_picker jsp to take advantage of it, as the new API is not yet live. We haven't got any content in that category yet.

T
 
Index merge complete on w1.

Integrated the extra dtable code (for key links) into the regular entry page.

Next, the contents category.

T
 
sql update completed on w1, indexes are up and merging. Phil's given the go-ahead to put the vols live to their respective audiences, but we'll need a cluster update to do it. Also need to update the jar as described below.

Modified the admin system to support pick-and-mix for 150 volumes, brought the preview site up-to-date.

T
 
Paragraph width styles are working fine on other browsers, so that's done. Needs a jar update to work on the site, which may mean we have to make a maintenance release if the publisher won't let Tab Med go live without it.

Extended the maint page even further so that our email now includes the URL requested and a dump of all parameter values.

T
Thursday, July 29, 2004
 
Added support for paragraph types: default, image and table. Added stylesheet instruction to limit default paras to 580px width.

Works fine on the tab med under firefox, will check tomorrow on other browsers.

T
 
More bug-hunting for Claire; again, it was a form-submission problem with w3 which is down to the response.sendRedirect following content transmission. Won't happen again now we're back (temporarily) to jserv across the board.

T
 
Performed the rest of the Tabs Med import by hand (well, by content admin tool). Added new controls to select RR/SR on the vol panel. Merging indexes at the moment, after which we can test both new books on xplus.

Putting sql log onto w1.

T
 
Tabs Med failed import last night as the irml file was missing the SR/RR field - my fault, I copied the irml from the irml_complete folder and probably overwrote the good one. I think I can rescue the import without a full re-run.

Fixed the editorial system for James. We had some remaining references to link marking on the jsps and I've removed them.

T
Wednesday, July 28, 2004
 
Another couple of maints served - tracked this down to the results page searching by subject but not supplying a list of subjects. To reproduce it, go to the advanced search page and clear all subjects, then search.

Tried to fix it but it's not really clear what's going on on the results page - a failed search redirects back to the same page but with an extra nohits parameter. That parameter is used to exclude a check for blank subjects and I don't know what the intent is there. Going to ask A to take a look when he's back on Friday but the Tomcat refactoring can't come too soon.

T
 
Lots of maintenance pages served overnight, all from w3. There are two causes:

1. referencing the cookies array without checking to see whether it's null first. I was sure we fixed all these pages a while ago but the code says otherwise. Now definitely fixed, for all of them.

2. IllegalStateExceptions thrown when we call response.sendRedirect after HTML has already gone back to the client.
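The fix for (1) is a one-line guard on every lookup. A minimal sketch (the nested Cookie class stands in for javax.servlet.http.Cookie so this compiles on its own; the helper name is mine, not from our codebase):

```java
public class CookieUtil {
    // Stand-in for javax.servlet.http.Cookie, to keep the sketch self-contained.
    static class Cookie {
        final String name, value;
        Cookie(String name, String value) { this.name = name; this.value = value; }
    }

    // request.getCookies() returns null (not an empty array) when the client
    // sent no cookies at all -- every lookup needs this guard up front.
    static String cookieValue(Cookie[] cookies, String name) {
        if (cookies == null) return null;
        for (Cookie c : cookies) {
            if (c.name.equals(name)) return c.value;
        }
        return null;
    }
}
```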

Number (2) is more serious. Jserv doesn't complain about it but Tomcat won't allow it. Therefore we have to fix it before we move to TC.

Our redirects are often in include files (or include-include files) and so buried quite far within the compiled page. Even if we've not deliberately sent any content before they get processed, the "include" mechanism can introduce rogue newlines which count as page content.

To fix this we'll have to refactor our pages in a way which ensures that all redirection logic gets processed before any content is sent - ie move to controller/view architecture. Struts provides a handy framework for that. Unfortunately, jserv won't support struts.
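The shape we're after, stripped right down (Struts would call these Actions and forwards; this generic sketch just shows the ordering guarantee, and the route/paths are illustrative):

```java
public class FrontController {
    // The controller/view split described above: every redirect decision is
    // made before a single byte of the view is written, so sendRedirect can
    // never hit an already-committed response.
    static String route(boolean loggedIn) {
        if (!loggedIn) return "redirect:/login_card.jsp"; // decided up front
        return "view:/search.jsp"; // only after this point may content flow
    }
}
```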

Our options are to

a) write a struts-like framework which will work under jserv,
b) move to struts under tomcat,
c) carry on as we are, with jserv and scripts on the pages.

(a)'s a waste of time and (c)'s not got much life left in it. We're already running into "too much code" exceptions when compiling some of our more epic pages (see entries for last week). My vote goes for (b).

Would be nice to rationalise all our presentation stuff into taglibs as well so we get clean jsps at the end of it. I reckon it would be a month's project to do.

In the meantime, we're reverting w3 to run jserv rather than tomcat.

T
Tuesday, July 27, 2004
 
Caught one oddity before release: the column order on the default CIA dtable shifted around. The code wasn't imposing a particular order on the SQL retrieval, it had always just been the natural order. Added an ORDER BY clause to ensure consistency.

liverpool_street is up and running.

Partner logins work, the extended dtable support is in there, we've fixed a couple of intermittent maintenance page bugs, we have level-dependent section styles - that's about it for this month.

Performed a page update too for both x and xplus. New library pages are up and we fixed another Tomcat/cookie handling glitch.

T
 
jsitetest passed.

T
 
Plaintext dtable contents now available.

Reimported Phil's census data.

Beefed up the test import process to include context-entry generation.

Pressing on with pre-release checklist - tagged release as liverpool_street and now we're going west again.

jtest passed.

T
 
Alphabetical sorting of dtables now fixed. There's an extra cell_plaintext field in the dtable_cell table, into which we store the plaintext of the cell text, according to the RML parser.

Next, ability to request plaintext rather than html when retrieving IDTable objects so that chart labels can be plain.
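The cell_plaintext derivation amounts to stripping the markup out before storage. A crude stand-in for "plaintext according to the RML parser" (the real parser does proper entity and tag handling; this regex version is just the idea):

```java
public class PlainText {
    // Strip tag-like markup so that alphabetical sorting and chart labels
    // see bare text rather than RML. Sketch only -- the production path
    // goes through the RML parser, not a regex.
    static String plaintext(String cell) {
        return cell.replaceAll("<[^>]*>", "").trim();
    }
}
```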

T
Monday, July 26, 2004
 
RML tags in dtables are playing havoc with alphabetical sorting and chart labels. Bah.

Fixing.

T
 
Thanks, IE - no alpha-channel png support, so we're going with a standard gif with transparent surround and no anti-aliasing.

T
 
Added key id fields to the dtable value objects and modified entry_dt so that it uses them. If a table or column has a non-zero key field we display an info hyperlink to the appropriate entry. The info graphic is anti-aliased against a white background - will ask james to put together a png with proper alpha channel so that we can display it on the tinted dtable header.

T
 
Performed a test import of Phil's census data late on Friday afternoon, so he can get a view of how it's coming together. Need to add display code for dtable keys today.

The maintenance mailer has earned its keep over the weekend - about 250 maintenance pages were served, with about 240 of them down to a check on the entry page to find out what kind of browser the client's using (we change spacing between some elements depending on mac/ie presentation). This check failed with a NullPointerException if there was no user-agent header supplied from the browser, and I'd guess some firewalls are stripping it off. Anyway, that's now fixed.

About five maints were caused by subject layout code on the no_page.jsp - if the user wasn't logged in then there were no subjects to display and we generated another NullPointerException.

Remaining errors are down to a bug in the RML3 parser (ie the one for older books). The cause was an assumption that entry bodies will contain at least one space - obviously not true in some cases! The error was generated on the letter-browse page.
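The guard for the no-space case is trivial once spotted; something like this (helper name is mine, just illustrating the indexOf check the parser was missing):

```java
public class EntryBody {
    // The RML3 parser assumed indexOf(' ') would always be >= 0.
    // Guard for entry bodies that contain no space at all.
    static String firstWord(String body) {
        int sp = body.indexOf(' ');
        return sp < 0 ? body : body.substring(0, sp);
    }
}
```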

The page-based errors are fixed now and I'll do a one-off Monday page update to put them live. The RML parse error I'll fix today and put up with tomorrow's early jar update.

T
Friday, July 23, 2004
 
There are five lines in the log sequence for (f) as described below. The first is a request for login_card with a blank referrer - either typed direct or grabbed from a bookmark. The remaining four requests are for images referenced on the login page.

All requests are served with a 200 (OK) response code.

There's no subsequent activity from that client IP until we get a request for our default document 25 minutes later - that request is redirected back to the login page.

I can't reconcile these logs with the description we received from the client. Something very odd must be happening, or perhaps they're seeing a local proxy error page which is getting mistaken for ours?

Anyway, we now have the maintenance mailer in place so we'll find out exactly what's going on next time this happens.

Signing it off for now,
T
 
Right then:

Records of requests for login_card with the specified client id, on the 20th July:

a) w2 at 13:31:14 from 84.65.95.133. No subsequent post.
b) w3 at 03:30:46 from 68.74.164.126. No subsequent post.
c) w3 at 14:34:41 from 195.188.176.200. Form posted 13 seconds later, redirect to search.jsp issued.
d) w3 at 14:41:31 from 195.188.176.200. Form posted 14 seconds later, redirect to "logn_card.jsp" issued. No idea where that mis-spelling comes from.
e) w3 at 15:08:08 from 195.188.176.200. No subsequent post.
f) w3 at 19:21:52 from 195.188.176.200. No subsequent post.
g) w3 at 21:05:24 from 213.122.22.252. No subsequent post.
h) w4 at 14:38:51 from 81.168.111.122. Form posted 15 seconds later, redirect to search.jsp issued. This is our office IP as the client.
i) w5 at 15:02:53 from 195.92.67.76. No subsequent post.

That makes 9 entrance attempts, one of which was from our own office IP. None of them returned a 500 error code (ie invoked the maintenance page).

If the client is running software which strips off the referral URL then form posts will not have been grabbed by the filter I ran above. Will follow through (f), as it's the example explicitly described in the error report.

T
 
Back to tracking down the errors encountered by our client on Tuesday night. I'm not hopeful that we'll get anything from the logs, but with the maintenance mailer in place at least we'll get all the details in future - during the stage show, live, as it's performed.

T
 
Maintenance mailer now working under jserv. Jserv doesn't catch compilation errors, only runtimes; this is still useful. Mail now includes the hostname, client name, client id and member id.

Page update on the way.

T
 
The maintenance mailer caught a couple of pages overnight, both caused by requests to the xplus/trial/signup.jsp page - which is no longer a live URL. There is code there, but it's not maintained and won't work. We have no links to it on our sites, but the URL is out there on press releases from a few years ago.

Replaced page with a redirect to the appropriate new location. Will perform a page update once I know nobody else is editing.

Before that: try to retrofit this hyper-cool tech to the jserv boxes.

T


Thursday, July 22, 2004
 
After a certain amount of swearing at Sun's mail API versioning (actually, quite a lot), the maintenance page is now sending a stack trace to the operations dept whenever it's served. This is tomcat-only and barebones at the moment - tomorrow I'll add details of the machine involved, the client we're serving etc.
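Getting the trace into mailable text is the easy half; a sketch of that bit (the JavaMail Session/Transport plumbing is omitted, and the class name is mine):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class MaintMailer {
    // Render a Throwable as the text we mail to the operations dept.
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```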

Got to stamp these things out. Take that!

T
 
xrefer corporate news page not working under tomcat - too much code within the enclosing try block for the compiler to handle. The include/menuing system's got a bit out of hand there and it's another prime target for taglibbing.

Anyway, joining all of the 2000.inc include file onto a single line removes enough of the out.println calls to compile the page for the meantime.

Next: mail when serving the maintenance page.

T
 
Frickin blogger just got confused and lost my post. AAAARGGGH.

Anyway, yesterday was mainly dtables. Added support for local/global keys to dtables and dtcols, changing the db and the import process. Need to add display support for them.

Tracked down an obscure problem with the admin system, which was refusing to add an IP range. The range was allocated to a long-deleted member. We no longer delete members so it's not going to be a recurrent problem.

Claire forwarded on a post from a client who's been having problems logging in:



...tried to access xreferplus last night at around 7.30pm to demonstrate it to a meeting of councillors and important council people.

I have only just set up IP access today, so he was using his remote access log in (http://www.xreferplus.com/login_card.jsp?clientid=xxxxx). He said he tried to access around 15 times in this way, and each time got a message saying that the site was undergoing maintenance. He is deeply disappointed at our service, and also embarrassed by the position that it put him in.

I have occasionally seen this page come up before when I have been doing training sessions and a large number of people are trying to access the same page or the Research Mapper at the same time.


Hmm, that's not very good. Tried to track it down last night but couldn't make much sense of the logs. There are two successful logins on that account at around 2:30pm and I can see the request for the login page at 7:30pm. More investigations today.

The box which served the 7:30 request was our trial tomcat server, but I've tried the same login on that machine and it worked fine for me.

Matt's looking for a way to get Tomcat to send us a stack trace whenever it serves a maintenance page.

T
Tuesday, July 20, 2004
 
Added level-dependent heading styles for nested sections. Modified the stylesheet accordingly.

Tidied away a few unused references to IDisplayPreferences in the setPos calls.

T
 
On to the key attribute for dtable cols and dtables themselves.

T
 
Ran the updated table handling through optimizeit and it doesn't look too bad - there's a slight overhead from invoking the parser but it's negligible compared to the time taken for the db query. This is all very dependent on the contents of the table, so we'll have to check it again when the new dtables come in.

Claire had a problem with dtable charts during this morning's demo. Traced it through and it coincided with "out of memory" lines in the machine's log. Hoping that Tomcat is going to fix this / make it easier to track down. Matt preparing to roll out tomcat on w3 for a test today.

Discussion with James about approaches to styling nested sections. May go with level-dependent styles and a global config in the site stylesheet. Further discussion required.

Updated and republished stats pages to fix the old familiar null / empty array of cookies problem under tomcat.

T
 
Can't see a neat way to avoid the anchor tags in parser-processed dtable cells. They get introduced at the root-section level, so wrapping the text up in a dtable-type section and handling that differently doesn't work - the root section puts in its own anchor. All this is really deep in the parser logic and I'm reluctant to add a "Don't add anchors" parameter because it's such a tiny case and there's no visible effect if we leave it as it is.

I'm leaving it as it is.

T

Monday, July 19, 2004
 
Modified ContentDAO to look up the correct parser for fancy dtable processing.

T

 
Entities in parsed dtd elements working fine.

T

 
Extended the client API so that retrieveDTable now takes a baseURL and an IDisplayPreferences.

Modified client to pass these through to the ContentDAO and the parser, along with our standard media base url.

Cross-references and images now working in dtables.

Foxed for a while because sort was no longer working - this was because the presence of an internal link caused the entry to be re-written and the val attribute was getting mangled.

T

Friday, July 16, 2004
 
Added basic parsing support to dtable contents. If we spot that there's any RML markup in the cell, we send it off to the RML31 parser before requesting it back as HTML. Proof of concept only at this stage, with the following caveats:

1. Entities need to be escaped otherwise it's all going to crash.
2. Need to select the parser appropriate to the vol (actually, it's only rml31 which supports dtables but that's not the point).
3. Need to pass in the base URL.
4. Need to pass in the media URL.
5. RML parser will insert inappropriate anchor markers throughout text.
6. Haven't profiled it, so it may crawl like a one-legged donkey.

Others as and when.
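For caveat (1), the minimum viable escape looks something like this (real entity handling would also cover numeric entities like &#38;#160; - this is proof-of-concept, matching the stage the dtable work is at):

```java
public class DTableCell {
    // Escape bare ampersands before the cell text goes to the RML31 parser,
    // leaving existing named entities (&amp;, &lt; etc) alone.
    static String escapeAmps(String cell) {
        return cell.replaceAll("&(?![a-zA-Z]+;)", "&amp;");
    }
}
```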

T

 
It's

changed


again.


Please, please leave the interface alone! I don't want a wysiwyg interface which crashes my browser. I just want to type text and hit submit. Content is _so_ much more important than presentation in this context. In fact, of all the blogs I've read I've never cared whether a word was in bold, indented or blinking. Simplify, simplify, simplify...

Rant over.

Added detailed logging to the DTableParser, re-ran the import of the new test book.

T

 
FRIKKIN BLOGGER HAVE CHANGED THEIR INTERFACE AGAIN. LEAVE IT ALONE, PLEASE, PLEASE, PLEASE.

Memory problems (PC, not personal). Run jboss, run jserv, request a page and my machine runs out of memory, killing jboss. Restarting X to clear any caches.

T
 
Started work on the extensions to dtables. This will allow us to include (at minimum) x, xn and image elements in dtable cells. The big problem here is going to be making it work efficiently.

Brought my test rml file up to date, adding a default sort column and an x link.

T
Thursday, July 15, 2004
 
Fixes to the no-hits handling (including beefing up did-you-mean) are on hold until after we roll out tomcat. Then it'll go in properly as a taglib. No more scriptlet epics!

T
 
Found and fixed a problem with ContentDAO. If your selection contains multiple volumes with the same title, the subject descriptions come back as an incorrect array: the duplicate titles are compressed down to a single entry, leaving nulls at the end of the array.

T
 
Cleaned up test applications which reference link marking.

Full clean build of dev jars. Deployed import jar locally and ran a test import - all seems good.

Checking in.

T
 
Removed remarkVol and remarkEntry from Linker/LinkerSupport interfaces, plus all implementing classes.

T
 
Remove fan linker and references to it.

T
 
Removed references to mark links from editorial system.

T
 
Removed references to begin/end mark and mark links from admin.content package.

T
 
Remove "Remark Vol" button from ca.

T
 
Performing a full clean-up of codebase now that the link tables have gone.

Removed MasterLinkerSupportBean from codebase.

T
 
Cleaned link tables from db_server.

T
Wednesday, July 14, 2004
 
Removed the now-redundant link tables from w1's db. It's given us back a fair amount of space: 10 Gig!

Will clean up db_server too once the backup's complete.

T
 
Cleaning up code and db now that markless linking is established.

Checked codebase for all references to something_entry_to_entry. Sole remaining references are to internal_entry_to_entry, which is fine, or in test code.

Removed sqlLink methods, removed prototype from LinkerSupportBean interface and removed the commented calls to it.

T
 
More alterations to the nplayer site.

T
 
Established a "key" extension to the dtable markup, described below.

Dtable headers will acquire an optional "key" link, which appears to the
user as a small i(nformation) graphic link beside the column heading, eg



        Population (i)   Area (i)   ...
Alabama       2             15
Arizona       1             16
...


Click the (i) link and you'll go to an entry which describes the column
in more detail.


Implementation

If you'd like to annotate a table header, add a key attribute to the
"dth" element and specify the local entry id of the extra information.
For example:

...<dth sort="num" shortname="pop" key="132">Population</dth>...
...
<e id="132">Population: count legs and divide by two</e>
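The display side then just checks the key on each header; a sketch of that (the URL shape and method name are illustrative, not the real entry_dt code):

```java
public class DTableHeader {
    // Render a dtable column heading, appending the (i) key link when the
    // header carries a non-zero key entry id.
    static String render(String title, int keyId) {
        if (keyId == 0) return title; // no key attribute: plain heading
        return title + " <a href=\"entry.jsp?id=" + keyId + "\">(i)</a>";
    }
}
```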


T
 
Set up production shared secret for md5 generation in G integration. Sent details to Carl so he can update G.

Added java 1.4 logging support to TokenHasher, replacing the old DEBUG compile flag.

Discovered the very fine cvs2cl.pl script, which will generate a GNU-style change log from the rather-more-cryptic cvs log output.

T
Tuesday, July 13, 2004
 
Added support for did-you-mean testing. If you're logged in under the dev test account then you'll be prompted on the no-hits page if we think you've mis-spelled something.

T
 
Added new parameter handling to nplayer site so you can switch in DG's alternative stylesheet. Makes for easier testing.

DG complained that some entries displayed in a different text size. Traced this to inconsistency in P tags - some entries didn't use them. Amended page to wrap ALL entry bodies in a P element.

Notified DG and asked if that's it for now.

On to did-you-mean and preparation for G conference call.

T
 
Current projects:

1. nplayer site alterations for DG.
2. partner login for G integration. Technology is working on xplus, ready for roll-out in next jar update. Conference call tonight at 5pm to catch up with G side.
3. hook up did-you-mean for company testing.
4. support xrefs and images in dtables.
5. remove unused tables and indexes now that markless links are established.

T
 
Back from Dad's birthday bash in York. London's looking very sunny!

Modified subject descriptions for art and social sciences, and moved a few volumes around.

T
Wednesday, July 07, 2004
 
Preview site now supports dtables, albeit in a fairly boring way. No interaction, just the facts.

T
 
Yup, supply a term= parameter to the partner login and results shall be yours.

T
 
Modified the nplayer page in line with requests from D G.

Adding support for search strings on partner logins.

T
 
Partner login checked in.

Fixed a bug in the admin system which was preventing Claire from entering an IP with a 0 quad.

Phil's after an extension to the dtable system which will allow xrefs and images. I remember that as tricky (as we're not invoking the parser when we construct the table) but I'll take another look. A way to preview dtable entries would also be useful.

T
Tuesday, July 06, 2004
 
Done! Well, some of it at least.

db_dev contains the new structures and I've got a script to replicate them to the live servers.

There's a new API call to loginByPartnerId(int nPartnerId, int nClientId, String svMD5);

login_partner.jsp decodes appropriate parameters and calls the API to log you in.

I've set up db_dev with xdev as a G partner and added a new member for that account with member-type "partner".

testapi contains a link to log you in. Each partner has a separate shared secret - to calculate the MD5 you append the secret to the client id and encode the result. I'm re-using our token hasher, which agrees with the GNU implementation.
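The hash step is standard JDK; a sketch of the scheme described (method names are mine, not the real TokenHasher API, and the checked-exception handling is simplified):

```java
import java.security.MessageDigest;

public class TokenHasher {
    // Hex-encoded MD5 of an input string.
    static String md5Hex(String input) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(input.getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) {
                hex.append(Character.forDigit((b >> 4) & 0xf, 16));
                hex.append(Character.forDigit(b & 0xf, 16));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e); // MD5 + UTF-8 are always available
        }
    }

    // As described: append the shared secret to the client id, then MD5.
    static String loginToken(int clientId, String secret) {
        return md5Hex(clientId + secret);
    }
}
```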

T

 
Blogger, welcome back. It's surprised me to realize how much I've come to rely on this form of note-taking. Without it I do things and cross them off in Tammy, but nothing matches the good solid feel of writing it up for future reference. A4? Nah. So, where were we?

July release (aldgate) is making its way across the cluster update. Lucene classes rolled back to 1.2 for this release, will aim to upgrade to 1.4 next month but it's got to be faster than it is right now.

Did-you-mean API is in there and the renamed db-table is up on the live servers too. JSP edits can make that available to testers over the next month.

Fixed the caching problem with the all-books page which caused removed books to linger. We were indeed refreshing the subject list every five minutes but the books-in-subjects lists were never reviewed. They now get refreshed on the same schedule. Change will go live with today's page update.
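The fix boils down to putting the books-in-subjects lists on the same staleness check the subject list already had; roughly (class and field names hypothetical):

```java
public class SubjectCache {
    static final long REFRESH_MS = 5 * 60 * 1000L; // same five-minute schedule

    long loadedAt; // when this books-in-subject list was last rebuilt

    // True when the cached list should be thrown away and re-fetched,
    // so removed books stop lingering on the all-books page.
    boolean stale(long now) {
        return now - loadedAt >= REFRESH_MS;
    }
}
```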

On to the G integration.

Current plan is to add a new "partners" table with many-to-one links from client to partner. First partner will be xrefer, second will be G.

We will acquire a new member type (partner), which will be the member you get logged in as when you arrive via the login_partner page.

Login process to use shared-secrets and MD5 as G are reluctant to rely on referring urls (firewall issues). Will have to ask them to supply their partner id as well (unless we rely on the fact they're the only ones using the page).

With all that in mind, the plan is:

1. Verify terminology with Carl before it gets immortalised in shiny java.
2. Setup new db structures.
3. Add new member type.
4. Create test G account.
5. Create login API and jsp.
6. Test logins.
7. Modify admin system to allow selection of partner (and auto-create partner member for that acct).

T
Monday, July 05, 2004
 
Not much progress with the tweaking: shifted instance-variable access in the bottleneck function to use a local variable instead but it's made no difference. I suspect that the loop involved doesn't run more than once or twice per invocation, so the instance-var access overhead isn't a big deal.

Going to revert to lucene 1.2 API for the July release, but will see if the index-merge tuning can remain in place. Aim to migrate to 1.4 next month after further tuning.

Worked with Steve on his final day to integrate the did-you-mean functionality into the xrefer client API. It's in and working via the testapi page. Need to rename the db table and put it up to w1 ready for the july release.

This is the third time of trying to publish this post in three days. I think blogger's having problems.

T
Thursday, July 01, 2004
 
This may be futile, but I'm going to try it: 24% of our time is going in BooleanScorer.next(). Can we speed that up?

T

Description of CPU usage for thread Thread-16
100.0% - 27646 ms - java.lang.Thread.run()
100.0% - 27646 ms - org.apache.jserv.JServConnection.run()
99.98% - 27641 ms - org.apache.jserv.JServConnection.processRequest()
99.9% - 27621 ms - javax.servlet.http.HttpServlet.service()
99.9% - 27621 ms - org.gjt.jsp.JspServlet.service()
99.9% - 27621 ms - org.gjt.jsp.JspServlet$Page.process()
98.6% - 27261 ms - javax.servlet.http.HttpServlet.service()
98.6% - 27261 ms - org.gjt.jsp.HttpJspPageImpl.service()
98.6% - 27261 ms - jsp__results_2ejsp._jspService()
97.31% - 26903 ms - xreferclient.impl.direct.DirectQueryToolResource.search()
96.29% - 26623 ms - xreferclient.impl.direct.DirectQueryToolResource.search()
91.59% - 25321 ms - com.xrefer.lucene.DirectSearcher.search()
91.15% - 25201 ms - org.apache.lucene.search.Searcher.search()
91.15% - 25201 ms - org.apache.lucene.search.Searcher.search()
91.15% - 25201 ms - org.apache.lucene.search.Hits.&lt;init&gt;()
91.15% - 25201 ms - org.apache.lucene.search.Hits.getMoreDocs()
91.15% - 25201 ms - org.apache.lucene.search.IndexSearcher.search()
88.0% - 24331 ms - org.apache.lucene.search.Scorer.score()
82.01% - 22675 ms - org.apache.lucene.search.BooleanScorer.next()
36.63% - 10129 ms - org.apache.lucene.search.BooleanScorer.next()
13.78% - 3812 ms - org.apache.lucene.search.TermScorer.next()
10.95% - 3029 ms - org.apache.lucene.index.SegmentTermDocs.read()
3.63% - 1006 ms - org.apache.lucene.store.InputStream.readVInt()
1.81% - 502 ms - org.apache.lucene.store.InputStream.readByte()
0.43% - 119 ms - org.apache.lucene.store.InputStream.refill()
0.29% - 82 ms - org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal()
0.22% - 62 ms - org.apache.lucene.store.InputStream.readBytes()
0.22% - 62 ms - it.unige.csita.lucene.ROInputStream.readInternal()
0.14% - 40 ms - java.io.RandomAccessFile.read()
0.07% - 21 ms - java.io.RandomAccessFile.readBytes()
7.35% - 2033 ms - org.apache.lucene.search.BooleanScorer$Collector.collect()
3.83% - 1059 ms - org.apache.lucene.search.TermScorer.score()
13.25% - 3665 ms - org.apache.lucene.search.TermScorer.next()
10.09% - 2790 ms - org.apache.lucene.search.BooleanScorer$Collector.collect()
4.86% - 1346 ms - org.apache.lucene.search.TermScorer.score()
2.7% - 747 ms - org.apache.lucene.search.BooleanScorer.score()
1.51% - 418 ms - org.apache.lucene.search.BooleanScorer.doc()
5.99% - 1656 ms - org.apache.lucene.search.IndexSearcher$1.collect()
1.95% - 540 ms - org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer()
1.19% - 330 ms - org.apache.lucene.search.Query.weight()
0.21% - 60 ms - com.xrefer.lucene.DirectSearcher.createVolQuery()
0.14% - 40 ms - com.xrefer.lucene.MetaQueryFactory.assembleQuery()
0.07% - 20 ms - com.xrefer.lucene.QueryFactory.assembleQuery()
4.7% - 1302 ms - com.xrefer.lucene.DirectSearcher.getResultBlock()
0.93% - 259 ms - xreferclient.impl.direct.ContentDAO.getVolSpecByShelf()
0.07% - 21 ms - xreferclient.impl.direct.DirectQueryToolResource.tokenToMemberId()
0.79% - 220 ms - xreferclient.impl.direct.DirectQueryToolResource.getAllSubjects()
0.21% - 60 ms - org.gjt.jsp.JspWriterImpl.close()
0.07% - 21 ms - java.lang.String.toLowerCase()
0.07% - 20 ms - java.lang.StringBuffer.append()
0.06% - 19 ms - java.lang.StringBuffer.toString()
0.06% - 18 ms - java.lang.StringBuffer.&lt;init&gt;()
1.23% - 341 ms - org.gjt.jsp.JspServlet$Page.needToRecompile()
0.06% - 19 ms - org.apache.jserv.JServConnection.getParameter()
0.01% - 5 ms - org.apache.jserv.JServConnection.readData()
 
Well then. We're running out of things to tune here; unfortunately 1.4 is still coming in at 2.28 secs versus 1.2's 1.55 second record.

The optimiser's showing that almost all objects created during the request handling are created within the Lucene framework, and the numbers aren't unreasonable. 14,828 "ScoreDoc" objects created is the biggie, and that corresponds to the 14,828 hits for our particular search. We probably can't prune that any lower without changing our approach (eg can we stop after 200?).

CPU time is also spent almost exclusively within Lucene. 91% of the time is within the Lucene search() call.

So we may be getting to the limits of tuning. I'm going to spend an hour or two going through the Lucene code to see if there are any obvious tweakables. Then tweak them.

T
 
2.2ish seconds on the test framework. Better, still some way to go.

T
 
Bit better:

Current heap of application org.apache.jserv.JServ
--------------------------------------------------

Class name Instance count Difference
-------------------------------------------------------------------- -------------- ----------
org.apache.lucene.search.ScoreDoc 44484 + 14828
char[] 271093 + 7242
java.lang.String 274146 + 6603
org.apache.lucene.search.BooleanScorer$Bucket 9216 + 3072
java.lang.StringBuffer 7829 + 2521
java.util.LinkedList$ListItr 5448 + 1816
java.util.LinkedList$Entry 5351 + 1771
org.apache.lucene.index.Term 222118 + 1670
byte[] 3685 + 918
int[] 2657 + 672
java.nio.HeapCharBuffer 2139 + 659
Object[] 3948 + 542

But how fast is it?

T
 
Running lucene 1.4 under jserv in Optit. Test search is for "field".

We're creating loads of objects:

Class name Instance count Difference
-------------------------------------------------------------------- -------------- ----------
char[] 423649 + 27735
java.lang.String 427058 + 26818
org.apache.lucene.index.Term 383291 + 21558
org.apache.lucene.index.TermInfo 381780 + 20217
org.apache.lucene.search.ScoreDoc 14828 + 14828
org.apache.lucene.search.BooleanScorer$Bucket 3072 + 3072
java.lang.StringBuffer 4006 + 2709
java.util.LinkedList$ListItr 1816 + 1816
java.util.LinkedList$Entry 1809 + 1775

It's opening the index twice: once for the search, once for highlighting. Will see if we can optimize that and reduce the object count somewhat.

T
