xtim
Friday, September 26, 2008
 
EC2
Created a new AMI which will become our db server. Now running an instance of it with a newly-assigned Elastic IP address.

Next step is to create an EBS volume so the database's records persist between machine instances. Then modify the db config to use that volume and update the AMI.

S3 uploads continuing in the background. Yesterday I created a mechanism which will allow us to cache page details (in particular dimensions) in the db so they don't have to be gathered by each server independently. Also set up a queueing mechanism so new imports can be lined up for pre-caching.

T



Thursday, September 25, 2008
 
First title live
Our first public title is now on S3 - switched it over yesterday. Seems ok, but I still want to improve the speed of uncached pages a little.

Uploading the rest of the data tree to S3 in the background. Now that Catapult knows when to leave existing files alone, I can re-run it as required to send any updates.

Logging is enabled for S3 requests.

DNS for our media alias is in place, but the service isn't yet using it - giving it time to propagate. Will enable it for the next release.

Current list of milestones for the transition:


  1. Serve all page images from S3 (which will mean getting the import system to upload pages directly, also switching backups over to grabbing from S3).

  2. Move index integration to EC2.

  3. Move db to EC2.

  4. Run a webserver instance on EC2, alongside our existing servers.

  5. Switch mail handling to the EC2 server, routing through an external SMTP service.

  6. Switch DNS to point to the EC2 webservers.



Next actions:


  1. Speed up page generation for S3 content.

  2. Package up catapult and integrate with import process.



T



Tuesday, September 23, 2008
 
Release 6.8.9
New release is live - this adds:


  1. Improved file handling (don't retrieve page images from S3 unless we really need them).

  2. Eternal disk caching for S3 objects - objects no longer expire automatically.

  3. Per-issue cache expiry on demand through a new admin toolbar
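One way to picture the per-issue expiry from item 3 is a cache grouped by issue, so a whole issue can be dropped in one call. This is only a sketch: the class and method names are hypothetical, and in-memory maps stand in for the real disk cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: cached S3 objects grouped by the issue they belong to. */
final class IssueCache {
    // Outer key: issue id. Inner key: object path within that issue.
    private final Map<String, Map<String, byte[]>> byIssue = new ConcurrentHashMap<>();

    void put(String issueId, String objectPath, byte[] data) {
        byIssue.computeIfAbsent(issueId, id -> new ConcurrentHashMap<>())
               .put(objectPath, data);
    }

    /** Returns the cached bytes, or null if nothing is cached for that path. */
    byte[] get(String issueId, String objectPath) {
        Map<String, byte[]> entries = byIssue.get(issueId);
        return entries == null ? null : entries.get(objectPath);
    }

    /** Drops everything cached for one issue and reports how many entries went. */
    int expireIssue(String issueId) {
        Map<String, byte[]> removed = byIssue.remove(issueId);
        return removed == null ? 0 : removed.size();
    }
}
```

Entries never expire on their own here, which matches item 2; the only way out of the cache is an explicit expireIssue call, as the toolbar would make.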



The Catapult tool now has support for MD5 generation but it doesn't use this yet - turns out that Amazon's ETag is explicitly an MD5 of your content and this does correspond with the digests I'm calculating for the sample files. We can use this when checking for changed content in rsyncs and don't need to store our own MD5 as metadata for the object.
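A minimal sketch of that check, assuming the ETag really is the hex MD5 of the object (true for the sample files above, which went up as plain single uploads). The class and method names are made up for illustration and are not Catapult's actual code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: decides whether a local file needs re-uploading. */
final class EtagCheck {

    /** Lower-case hex MD5 of the content, the form an ETag is expected to take. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** True when the remote ETag is missing or differs from the local digest. */
    static boolean needsUpload(byte[] localContent, String remoteEtag) {
        if (remoteEtag == null) {
            return true; // nothing on S3 yet
        }
        // ETags usually arrive wrapped in double quotes; strip them first.
        String stripped = remoteEtag.replace("\"", "");
        return !stripped.equalsIgnoreCase(md5Hex(localContent));
    }
}
```

The point of the design is that the comparison needs nothing beyond what a listing of the bucket already returns, so no extra metadata has to be written at upload time.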

Checked ACLs for the bucket and test objects, those are fine - full_access for the owner and nothing more.

Finally to enable logging, then upload a live title and switch it over.

T



 
Caching nulls
Some of our pages don't have associated links or wordmaps (full-page images, for example).

Our new S3StorageManager used to return nulls to indicate that there was no such information - this has the drawback that while EHCache can store nulls as cache values, they can't be persisted to disk (as null is not Serializable). So after every restart the service would have to contact S3 again to establish that there was no wordmap for a particular page.

We're now returning empty byte arrays instead - these have the same meaning to the consumer of the information and can be cached between application shutdowns. Speed, speed...
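Roughly what that looks like, as a sketch rather than the real S3StorageManager: the class name, the lookup function and the counter are all hypothetical, and a plain map stands in for the EHCache disk store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative only: a map stands in for the persistent cache. */
final class WordmapStore {
    private static final byte[] NONE = new byte[0];

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Hypothetical remote lookup: returns null when S3 has no such object.
    private final Function<String, byte[]> remote;
    // Counts remote lookups so the effect is visible (not thread-safe, demo only).
    int remoteCalls = 0;

    WordmapStore(Function<String, byte[]> remote) {
        this.remote = remote;
    }

    /** Never returns null: a zero-length array means "no wordmap for this page". */
    byte[] wordmap(String pageKey) {
        return cache.computeIfAbsent(pageKey, key -> {
            remoteCalls++;
            byte[] found = remote.apply(key);
            // Store a cacheable marker instead of null.
            return found == null ? NONE : found;
        });
    }

    /** What a consumer checks instead of comparing against null. */
    static boolean hasContent(byte[] data) {
        return data.length > 0;
    }
}
```

Asking twice for a page with no wordmap only reaches the remote lookup once, because the empty array is itself a cached value.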

T



 
Optimisations
The S3 plan for today:


  1. Ensure we don't retrieve page images when we don't need to (for example, when preparing to calculate link rectangles for linkless pages).

  2. Turn off expiry for the S3 disk cache.

  3. Add ability to expire S3 cache on a per-issue basis.

  4. Release.

  5. Turn on S3 logging.

  6. Get Catapult to store an MD5 of the object when uploading to S3 so we can "rsync" later on.

  7. Re-check bucket settings, ACLs etc before uploading lots of content.

  8. Set up DNS for media subdomain.



T



Monday, September 22, 2008
 
Release 6.8.8
went live on Friday afternoon. This is the first public release of the Shibboleth link - there's a new option to log in "via institution". We're using the UK Federation's WAYF service for now; will implement a custom directory service when we support a client outside the federation.

For the moment it's time to shift focus onto the S3 migration. The site is now pulling all wordmaps, links, keywords, feature lists and page images from the appropriate service. There may still be a few locations which refer explicitly to local page images (cover thumbnails are a case in point) but there's enough there for testing.

Initial testing reveals that retrieval of uncached pages is too slow, owing to the round-trip between servers as we calculate the link rectangles. Once cached, the timing is fine.

T



Friday, September 19, 2008
 
More power
RAM prices aren't what they used to be - I noticed this week that you can expand your MacBook to the max (4GB) for £50. Woo! Just installed it and the machine feels much more powerful, particularly when I switch back to an Eclipse editing session. That used to trigger a 30-second page thrash during which I would forget what I was going to do, but now it's ready to work straight away.

Number of page faults since boot: 0.

T



Thursday, September 18, 2008
 
EHCache
is a fine, fine thing.

S3StorageManager is now caching the retrieved objects in a persistent disk cache. Next to bring LocalStorageManager up to speed, simplify the Highlighter and make a release.

The Shibboleth project is almost done. Our local federation gave the all clear last night, so we're now in their metadata. Did some initial testing with a real IdP today which worked initially but then broke - checked with the IdP that their metadata is up to date, as someone else had a similar problem for which stale metadata was the culprit. I must have tweaked something unfortunate. More testing on the way.

T



Wednesday, September 17, 2008
 
Persistent caches in Spring
The S3StorageManager is now picking up page images, word maps and links from S3 - woohoo! The next step is to enable caching so that it doesn't have to fetch a page image multiple times. The web servers will have plenty of free disk space so I'd like to use that to cache the objects from S3.

I had planned to use the AOP caching mechanism in Spring to take care of this automatically, storing objects in a persistent EHCache on the disk. Everything's configured and I can see the images getting pushed into the persistent cache, but they're never found again once you restart the application - so we'd get the benefits of caching while the app is running, but a site update or restart would mean we'd have to rebuild the cache from scratch.

A bit of poking around reveals that the CacheInterceptor is prepending the object id of the target object to the cache key. I can see why that's a good idea (you probably do want to partition the method results cache contents by target most of the time), but it means that results cached from one object won't be used once that particular object instance has been replaced. In our case the target is a singleton anyway so we don't get any benefit.

I think the best solution might be for our S3StorageManager to manage its own cache explicitly. Off to check the EHCache documentation...
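A sketch of what "managing its own cache" could mean: build the key only from values that survive a restart, and leave the identity of the calling object out of it. Everything here is hypothetical (class name, key format, the loader), and a plain map stands in for the persistent EHCache store.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical explicit cache whose keys do not depend on any object instance. */
final class ExplicitCache {
    // Stands in for a disk store that outlives the application.
    private final Map<String, byte[]> store;

    ExplicitCache(Map<String, byte[]> store) {
        this.store = store;
    }

    /** Key made only of stable values: the kind of object and its path. */
    static String key(String kind, String objectPath) {
        return kind + ":" + objectPath;
    }

    /** Returns the cached bytes, calling the loader only on a miss. */
    byte[] get(String kind, String objectPath, Supplier<byte[]> loader) {
        return store.computeIfAbsent(key(kind, objectPath), k -> loader.get());
    }
}
```

Two ExplicitCache instances sharing one store behave like the application before and after a restart: the second instance finds what the first one cached, because nothing in the key ties the entry to the first instance.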

T



 
Release 6.8.7
is live now - this adds provisional support for S3 hosting. I've swapped a disused title over to the new scheme and am uploading its data to S3. Testing tomorrow.

T



Tuesday, September 16, 2008
 
Platform Selection
Titles can now be assigned to platforms (local or S3). Next to release this to the live service and move a test title across for larger-scale testing.

In other projects: a paper issue is on its way for test scanning, technical details are under review by the federation and the import queue is growing steadily...

T



 
Local S3 configuration
Spring's now autowiring our storage managers together - one bean each for the S3 and local managers, plus a composite manager which aggregates them into a single service. The S3 manager is configured in the per-machine settings so that we can keep the S3 details out of the codebase.

Introduced a new StoragePlatform enum which will become a field of the Magazine.
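In outline, the arrangement looks something like this. StoragePlatform is the real enum name; the interface, its fetch method and the composite class are assumptions made for the sketch, and the Spring wiring is left out.

```java
import java.util.EnumMap;
import java.util.Map;

/** Where a title's content lives. */
enum StoragePlatform { LOCAL, S3 }

/** Hypothetical common interface for the local and S3 managers. */
interface StorageManager {
    byte[] fetch(String objectPath);
}

/** Routes each request to the manager for the title's platform. */
final class CompositeStorageManager {
    private final Map<StoragePlatform, StorageManager> delegates =
            new EnumMap<>(StoragePlatform.class);

    CompositeStorageManager(StorageManager local, StorageManager s3) {
        delegates.put(StoragePlatform.LOCAL, local);
        delegates.put(StoragePlatform.S3, s3);
    }

    /** The platform would come from the Magazine being served. */
    byte[] fetch(StoragePlatform platform, String objectPath) {
        return delegates.get(platform).fetch(objectPath);
    }
}
```

The rest of the application only ever talks to the composite, so switching a title from local storage to S3 comes down to changing one field on its Magazine.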

T



 
Delivery from S3
Got S3 delivery working last night on my local test system. It's still at the proof-of-concept stage, but it seems fairly credible. Page images are delivered from S3 and come through about as fast as they do from our existing live servers.

There's a new storage abstraction layer which coordinates actions through an appropriate storage service; one for local and another for S3. Each title will be associated with one or the other service as we make the transition. The services are mainly stubs at the moment but will get fleshed out over the next couple of days.

The plan:


  1. Pick up S3 connection parameters from our per-machine config file.

  2. Add storage platform field to Magazine

  3. Release

  4. Upload all issues of a test title to S3 and switch platform

  5. Test performance

  6. Pick up wordmaps, links, keywords and page images through storage layer

  7. Enable caching

  8. Integrate index rebuilds



Unforeseen benefit: my test server can now deliver anything that's available through the live site, as they're both serving the same content (well, it'll still use a local database, but the page images, links and wordmaps will all be common). No more staring at screens of placeholders while testing...

T



 
I liked it so much...
been a quiet couple of days on the blog - my evaluation copy of MarsEdit had expired. I definitely felt the lack, so $30 later I'm back!

T



Friday, September 12, 2008
 
iPhone 2.1
Site seems to work fine with the latest iPhone update - even feels a bit snappier, though that may just be because I've rebooted my phone.

That's a phrase I wouldn't have typed ten years ago...

T



 
Release 6.8.6
this just adds the extra path for shibboleth-based login. The site's now configured to pass all such requests through shibd, while leaving other login requests alone.

This is now working through the test IdP, though I did have to modify the tomcat config to get it to pick up on the REMOTE_USER variable set by shibd: set tomcatAuthentication to false so that tomcat receives the variable propagated by apache.

So: you can now make a login request to our site, get redirected to an identity provider and redirected back to us. The login code will see your authenticated identity (well, enough to decide whether or not to grant access) and will log you into the site.

Remaining work: register our service provision details with the federation, integrate their metadata and offer a directory/where-are-you-from service from our login page. Getting there...

T



 
Form mangling?
Just tried to configure Shibboleth so that it sits in front of our real login process and intercepts appropriate requests - unfortunately this blocks real logins too, as if the POSTed details are getting lost before our handler can use them.

I'm going to abandon the query-string approach to Shibboleth invocation. There may be a way to get it to work, but it feels fragile and there's too much scope for inconveniencing other logins if things go wrong. Planning instead to deploy our login handler under two separate paths, one dedicated to shibboleth and the other handling regular login requests.

T



Thursday, September 11, 2008
 
Catapult
Created a new tool (Catapult) which will upload a specified tree from our data hierarchy to S3. Seems to be working fine - next to create an abstraction layer in the web application so we can switch titles from local storage to S3.

Got the go-ahead from our federation, so we need to finish the live configuration for Shibboleth and share our metadata.

Also: imports. KW going up, need to do CH for publication later tonight.

T



 
Genius playlists
seem pretty good so far...

T
 
Elision
Wow - was that really a week since the last post? Lots of things going on...

New site release is up (6.8.5). Features:

Improved login consistency - dropped the idea of a "theme" parameter being passed around, instead it's all now based on the title you're trying to access. If you came to the login page because you need to authenticate to see the page you requested, we base the login page theme on the theme of the title concerned. The title id is passed to the Forgotten Password page and back again to keep everything consistent.

All logins are now handled by the dedicated login processor. Previously the situation described above (you need to login to see what you requested) was handled by the page browser or issue browser - this led to inconsistencies where the preparation work done by our login processor wasn't taking place, so some options weren't getting offered to the user.

The year browser on the site's more accurate - we used to page your issue list into rigid chunks of 12 per page, then take you to the page containing the start of the requested year. Now we fine-tune the page start so it coincides exactly with the first issue from that year. This was a particular client request.

Some theme work for the same reason.

The load-balancer config's been updated to ensure that sessions are consistent across http and https requests.
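The year browser change in the list above comes down to a small piece of arithmetic. A sketch, assuming the issue list is held in display order and reduced to one year per issue (the class and method names are made up; the page size of 12 is from the old scheme):

```java
import java.util.List;

/** Hypothetical sketch of the old and new page-start calculations. */
final class YearBrowser {
    static final int PAGE_SIZE = 12;

    /** Index of the first issue from the given year, or -1 if there is none. */
    static int firstIndexOfYear(List<Integer> issueYears, int year) {
        for (int i = 0; i < issueYears.size(); i++) {
            if (issueYears.get(i) == year) {
                return i;
            }
        }
        return -1;
    }

    /** Old behaviour: start of the rigid 12-issue chunk containing that issue. */
    static int rigidPageStart(int firstIndex) {
        return (firstIndex / PAGE_SIZE) * PAGE_SIZE;
    }

    /** New behaviour: the page starts exactly at the first issue of the year. */
    static int fineTunedPageStart(int firstIndex) {
        return firstIndex;
    }
}
```

With 15 issues from 2007 followed by the 2008 issues, the first 2008 issue sits at index 15: the rigid scheme starts the page at index 12, showing three 2007 issues first, while the fine-tuned one starts at 15.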

Shibd still up and running in the test configuration. Waiting for confirmation from the federation before switching to a live config.

Also, imports and the regular monthly routines.

Going to clear remaining emails and dedicate today to getting S3 up and running for a test title.

T



Wednesday, September 03, 2008
 
Unbalanced
Ah - the load balancer. I was investigating the infinite loop which arose when I tried to authenticate for http access to the protected resource. It's a product of the load balancer, which routes users to one of the live servers.

If you end up talking to one server for https requests and another for http then this isn't going to work, as Shibboleth requires a session before it will let you in, but the details of that session are getting sent over https to a different server.

This multi-protocol juggling is a slight complication of the clustering techniques described in the documentation. As things stand, we can't rely on server affinity if the content and authorization requests use different protocols. It may be that we can tweak the load balancer config to support this, but as the eventual goal is to secure a https URL anyway it shouldn't be a problem. Just good to know where the pitfalls are before we go live...

T



 
Rewrites
We're getting there - there are some rewrite rules we use to set up dynamic redirections on the site, so we can add /coolnewthing and set up a redirection in the database rather than having to tweak the apache config each time.

These rewrites only apply to the http service (not the https virtual host).

Checking the documentation for mod_rewrite, it's clear that the request URL gets rewritten before we get a chance to pass it through Shibboleth. By the time we would do that, it no longer looks like a request we would want Shibboleth to intercept.

So: added rewrite exemptions for the test path we want to protect and the Shibboleth session handlers.

This is better; I now get sent to the IdP when I try to access the protected resource. Once I've signed in though, the IdP and our Shibd get into a loop where the credentials are sent again and again until I kill the browser. This doesn't happen when I use the https: address, so I suspect I've still got a dodgy rewrite somewhere. Investigating...

T



 
Up all night
I extended yesterday's test and the Shibboleth daemon's been up and running all night on all live servers - everything seems to be okay.

Running with a test configuration which allows me to verify things are working with a "real" external IdP.

One configuration oddity: Shibboleth's only intercepting https requests, it's not logging anything for http requests to the same resource. I don't know whether it's apache which isn't passing them through or shibd which is ignoring them. Going to fix this, then work on the Query restriction in the Request Mapper.

T



Tuesday, September 02, 2008
 
RequestMapping oddness
For all my excitement (steady) about the ability to Shibbolize requests selectively based on a query parameter, I can't actually get it to work.

In the request map section of shibboleth2.xml, I filter the matching requests down by host, path and finally query parameter before requiring a Shibboleth session.

In the apache config, I'm saying "require shibboleth" to pass candidate requests through the Shibboleth module. No "ShibRequireSession" directive.

I think that should require Shibboleth to look at each request, but not actually require a session unless the matching parameter is supplied. Non-matching requests are getting mangled too, though.

Going to work forward again from a very basic config to work out what breaks.

T



 
S3 REST API in Java
Holding off on the Shibboleth integration tests until I can watch the logs throughout the day tomorrow.

In the meantime, experimenting with the Java library for S3's REST API. This seems to be more recently maintained than their SOAP alternative.

The sample application worked fine, storing and reading objects in a test bucket. I've compiled the integration code into a jar and created our own git repository of the current version.

Next to build a basic tool we can use from the command line to get some sample data into the service. Then to create an abstraction layer so we can pull the wordmaps, links and images (for highlighting/slicing) from S3 to the webservers. At the moment the webserver expects to find these on the local filesystem. We should be able to use Spring's caching framework to reduce the performance hit.

T



Monday, September 01, 2008
 
Release 6.8.3
Fixes the bug with French renewal notices (we now auto-select the correct shop based on the subscription you're trying to renew).

Back to getting Shibboleth up and running with our new certificate.

T



 
We're back
A fun but slightly wet tour of the country last week - Peterborough, York, Newcastle and Lancaster. Ubiquitous broadband and the iPhone make it much easier for anyone who's at all prone to work stress to check emails in a moment and get back to the holiday.

Unfortunately, Apple's 2.0.2 iPhone firmware update has subtly changed the way Safari interprets finger gestures. It used to be that a single tap would invoke a mousemove method, which is what we use to display our link overlays on the page. A second, separate tap would then invoke the click method. With the new firmware, the first tap itself does the click, so our links don't get displayed. Going to alter the iPhone presentation so that all links are visible by default.

Edit: belay that. My phone is now working the way it did before the update - most curious. Maybe rebooting the phone fixed it? Keep a keen eye on the 2.1 release when it arrives. The rules are available here.


There's also a bug to fix in the way our French renewal notices work, plus we have our organisation certificate ready to complete the Shibboleth configuration. After that: hosting.

T



