Monday, September 20, 2004
Right, we're drawing a line under this.
The configuration update last Tuesday seems to have solved the long-term OOM errors. The servers have been running since then without a recurrence of the problem.
We have, however, seen a couple of OOMs - two of them this morning on w3. In each case we've traced the cause to "aberrant behaviour" from the clients - weird searches (or many simultaneous near-identical searches) that act as a denial-of-service (DoS) attack on the server concerned.
So then, the things to do:
1. tweak the pool config on each machine so that memory can hold the theoretical maximum size of the pool.
2. refactor search handling via Struts and include a sanity checker for search requests - too many clauses, or too odd, and you get a simplified search (or none at all).
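Item 1 comes down to simple arithmetic: the pool's cap times the worst-case size of one pooled object must fit inside the heap, with some headroom left over for everything else. The sketch below assumes a fixed per-object cost and a headroom figure; both numbers are hypothetical placeholders, and the post doesn't say what the pool actually holds. Only `Runtime.maxMemory()` is a real JVM call here.

```java
// Hypothetical pool-sizing check: does the worst-case pool fit in the heap?
final class PoolSizing {

    /**
     * Largest number of pooled objects that fits in the heap, assuming each
     * object costs bytesPerEntry and headroomBytes is reserved for everything
     * else the server does. Returns 0 if the headroom already eats the heap.
     */
    static long maxSafeEntries(long bytesPerEntry, long heapBytes, long headroomBytes) {
        if (bytesPerEntry <= 0) {
            throw new IllegalArgumentException("bytesPerEntry must be positive");
        }
        long budget = heapBytes - headroomBytes;
        return budget <= 0 ? 0 : budget / bytesPerEntry;
    }

    /** True if a pool capped at maxEntries can never outgrow the heap budget. */
    static boolean poolFits(long maxEntries, long bytesPerEntry, long heapBytes, long headroomBytes) {
        // Compare counts rather than multiplying, so large values cannot overflow.
        return maxEntries <= maxSafeEntries(bytesPerEntry, heapBytes, headroomBytes);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // the JVM's -Xmx ceiling
        long perEntry = 2L * 1024 * 1024;             // hypothetical: ~2 MB per pooled object
        long headroom = heap / 4;                     // hypothetical: keep 25% for everything else
        System.out.println("max heap bytes:   " + heap);
        System.out.println("max safe entries: " + maxSafeEntries(perEntry, heap, headroom));
    }
}
```

For example, with a 512 MB heap, 25% headroom and 2 MB per pooled object, the budget is 384 MB, so the pool cap should be no more than 384 / 2 = 192 entries.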
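The sanity checker in item 2 could look roughly like the sketch below. It assumes a search arrives as a list of clause strings, and its thresholds and its notion of an "odd" clause (leading wildcards, one-character terms) are hypothetical stand-ins for whatever profiling of the real aberrant searches turns up. Small searches pass untouched, odd or excess clauses are dropped (a simplified search), and anything hopeless is refused outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical search-request sanity checker: trim or refuse pathological searches.
final class SearchSanityChecker {
    static final int MAX_CLAUSES = 10;     // hypothetical: above this, trim the search
    static final int REJECT_CLAUSES = 50;  // hypothetical: above this, refuse outright
    static final int MIN_TERM_LENGTH = 2;  // hypothetical: shorter terms count as "odd"

    enum Verdict { OK, SIMPLIFIED, REJECTED }

    static final class Result {
        final Verdict verdict;
        final List<String> clauses; // the clauses actually allowed to run
        Result(Verdict verdict, List<String> clauses) {
            this.verdict = verdict;
            this.clauses = clauses;
        }
    }

    /** Hypothetical "oddness" test: leading wildcards or very short terms. */
    static boolean isOdd(String clause) {
        String t = clause.trim();
        return t.length() < MIN_TERM_LENGTH || t.startsWith("*") || t.startsWith("?");
    }

    static Result check(List<String> clauses) {
        if (clauses.size() > REJECT_CLAUSES) {
            return new Result(Verdict.REJECTED, Collections.<String>emptyList());
        }
        // Drop odd clauses first.
        List<String> kept = new ArrayList<String>();
        for (String c : clauses) {
            if (!isOdd(c)) {
                kept.add(c);
            }
        }
        if (kept.isEmpty()) {
            return new Result(Verdict.REJECTED, Collections.<String>emptyList());
        }
        // Then cap the number of clauses that reach the search engine.
        if (kept.size() > MAX_CLAUSES) {
            kept = new ArrayList<String>(kept.subList(0, MAX_CLAUSES));
        }
        Verdict v = kept.size() == clauses.size() ? Verdict.OK : Verdict.SIMPLIFIED;
        return new Result(v, kept);
    }

    public static void main(String[] args) {
        Result r = check(Arrays.asList("tomcat", "*cat", "struts"));
        System.out.println(r.verdict + " " + r.clauses); // SIMPLIFIED [tomcat, struts]
    }
}
```

In a Struts 1 application, a natural place to call `check` would be at the top of the search Action's `execute` method, before any query is built, so that a refused search never touches the back end.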
T