xtim: Extracting page text on Tiger

Friday, October 31, 2008

Extracting page text on Tiger
Looks as if there might be a bug in PDFKit on Tiger?

Calling


[page attributedString]

is throwing a range exception for some pages. On Leopard, the console logs a warning to the same effect, but the exception itself is not thrown.

This was rare, but it killed the import process when it happened. We're now explicitly catching the exception and retrying with


[page string]

instead, to retrieve the plain text when the first call fails. This means we can't use the string attributes to help deduce the text's layout on those pages, but we can at least retrieve the content.
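
For anyone hitting the same thing, here is a minimal sketch of the catch-and-retry approach described above. It is not our actual import code: the helper name XTTextForPage and its out-parameter are made up for illustration, and it catches any NSException rather than checking for a specific exception name, since I'm only assuming the failure always surfaces as a range exception.

```objc
#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h>   // PDFKit ships inside the Quartz umbrella framework

// Hypothetical helper (the name and signature are illustrative only).
// Returns the page's plain text. If the attributed string could be built,
// it is also handed back through outStyled; otherwise *outStyled is nil.
static NSString *XTTextForPage(PDFPage *page, NSAttributedString **outStyled)
{
    NSAttributedString *styled = nil;

    @try {
        // This is the call that raises a range exception for some pages on Tiger.
        styled = [page attributedString];
    }
    @catch (NSException *exception) {
        // Log and carry on so that one bad page cannot kill the whole import.
        NSLog(@"attributedString failed (%@): %@",
              [exception name], [exception reason]);
        styled = nil;
    }

    if (outStyled != NULL) {
        *outStyled = styled;
    }

    // Prefer the text of the attributed string; fall back to -string.
    return (styled != nil) ? [styled string] : [page string];
}
```

A caller would check whether the attributed string came back nil and skip the layout deduction for that page, using only the returned plain text.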

T

Labels: mac, pdf

- posted by Tim Bruce @ 5:27 PM
