Vanity Foul
Dedicated to the wanderings of an egotistical mind.


Sunday August 24, 2003

Atom and Digester

As I mentioned in my previous post, I'm lazy. So I'm using Digester to parse the AtomAPI XML, but it's falling down in one regard: <content>can contain <html></content>. Digester sees that nested markup as just more XML, so the contents of the content tag only get assigned if there is no HTML inside it.

Some general web searching, and searching on the Digester-user list, revealed one other person with a similar concern earlier this year. I've sent him an email asking if he can supply more information on his solution. If not, or if that doesn't solve the issue, I'm afraid I'll have to resort to "hand-parsing" the AtomAPI. I'm lazy; I don't wanna do that.

But again, hand-parsing may become necessary anyway, as the AtomAPI allows for adding arbitrary vendor-specific tags (<mt:fooBar>, for example) and my Digester rules don't cover that at the moment. I do plan on re-expressing the rules via Digester's rules.xml capability though; that may allow for the flexibility necessary.
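
For the record, here is roughly what the "hand-parsing" fallback might look like for the content tag alone. This is only a minimal sketch using the plain SAX classes that ship with the JDK, not Digester and not anyone else's solution. The class name ContentCapture and its capture method are hypothetical names made up for this example. The idea is to count nesting depth once inside <content> and write any nested tags back out as text instead of interpreting them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Digester or of any Atom library.
public class ContentCapture extends DefaultHandler {
    private final StringBuilder buf = new StringBuilder();
    private int depth = 0; // 0 = outside <content>, 1 = directly inside it

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (depth > 0) {
            // Nested element: re-serialize the start tag rather than interpret it.
            buf.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                buf.append(' ').append(atts.getQName(i))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            buf.append('>');
            depth++;
        } else if ("content".equals(qName)) {
            depth = 1;
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth > 1) {
            // Closing a nested element; the closing </content> itself is not emitted.
            buf.append("</").append(qName).append('>');
        }
        if (depth > 0) {
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            buf.append(escape(new String(ch, start, length)));
        }
    }

    // Re-escape the characters the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Parses the given XML and returns the inner markup of its <content> element(s).
    public static String capture(String xml) throws Exception {
        ContentCapture handler = new ContentCapture();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.buf.toString();
    }

    public static void main(String[] args) throws Exception {
        String entry = "<entry><title>Hi</title>"
            + "<content>Some <b>bold</b> text</content></entry>";
        System.out.println(capture(entry)); // prints: Some <b>bold</b> text
    }
}
```

The sketch has obvious limits: it assumes the nested markup is well-formed XML, it writes an empty tag like <br/> back out as <br></br>, and it lumps multiple content elements into one buffer. The parser is left in its default non-namespace-aware mode, so a prefixed vendor tag would simply show up under its qualified name.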
( Aug 24 2003, 01:30:02 PM ) Atom


Trackback URL: http://www.brainopolis.com/roller/trackback/lance/Weblog/atom_and_digester
Comments:

Hey Lance,

If you can't find a solution with Digester, I've got a delegating SAX handler that would make this pretty easy to do. It was originally designed to chop large XML files into chunks, so it's used to ignoring (and preserving) nested tags.

Posted by Jason Carreira on August 24, 2003 at 08:14 PM CDT
Website: http://freeroller.net/page/jcarreira

It's been a while since I studied the Digester API, but would it be possible to implement a custom Digester Rule that would suck in all the child elements of a particular element?

Posted by Mark Mascolino on August 24, 2003 at 08:18 PM CDT
Website: http://people.etango.com/~markm/

Thanks for the offer Jason, I'll let you know if I need to take you up on it.

Mark, that's pretty much the solution I'm looking at. I found someone who's already done it; I'm just waiting on a reply.

Posted by Lance on August 24, 2003 at 08:43 PM CDT
