Simple plugin to clean HTML entities - fixes "libxml error 26"


#1

Yes - the feed providers should fix their feeds!

But when even NASA can’t get it right on their Curiosity feed (https://mars.nasa.gov/rss/missionupdates.cfm?s=msl), maybe it actually is rocket science… or maybe mixing up standards (inches / cm, HTML / XML) is just an organisational problem they have. At least only RSS readers crash in this case.

The plugin simply edits the feed to replace any named entities (e.g. &nbsp;) that aren’t valid in XML with their numeric equivalents. Here is an example of the error it fixes:
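To illustrate the idea (this is not the plugin's actual code, just a minimal Python sketch), the replacement could look roughly like the following. The function name `fix_named_entities` is made up for this example, and it assumes the feed has already been decoded to text. Only the entity names Python knows from HTML 4 are converted; the five entities XML predefines are left alone, as is anything unrecognised.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these must not be touched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &nbsp; or &mdash;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def fix_named_entities(feed_text: str) -> str:
    """Replace HTML-only named entities with numeric character references."""

    def to_numeric(match: re.Match) -> str:
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(to_numeric, feed_text)


print(fix_named_entities("Curiosity&nbsp;update &mdash; sol 1 &amp; 2"))
```

One caveat with a plain regex like this: it also rewrites entity-looking text inside CDATA sections, where it would otherwise be treated as literal characters.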


LibXML error 26 at line 460 (column 54): Entity 'mdash' not defined


#2

Unfortunately it doesn’t work for me, i.e. the feed still doesn’t get fetched by TT-RSS and gives me this error: “LibXML error 64 at line 1 (column 274): XML declaration allowed only at the start of the document”.

The feed in question: http://www.knightsprovince.com/feed/

Or did I completely misunderstand the purpose of this plugin? If that is the case, my apologies - I can delete my post.


#3

Your error isn’t an invalid character; this plugin targets the “libxml error 26” named in the title, and yours is error 64.

Your problem is probably that the web host injects some kind of script into all of its pages without excluding the RSS feed, and that script tries to set a browser cookie, which RSS doesn’t even use in the first place. This plugin will do nothing for you.


#4

Yes - it’s quite specific because I’ve seen this problem a few times: the feeds contain OK HTML / XHTML, but XML is a lot more restrictive. It’s also relatively safe to fix with a simple regex, without having to parse the XML first, which is a bit of a catch-22.

I don’t know how often this other example appears - possibly a filter to remove anything before the <rss> and after the </rss> tags would be effective in a few cases.
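For what it’s worth, such a filter might look something like this rough Python sketch. It is untested against the feed above, the function name `trim_to_rss` is made up, and it assumes the feed is already decoded to text (the XML declaration gets dropped along with everything else before `<rss>`, so its encoding hint is lost). If no `<rss>` element is found, the text is returned unchanged.

```python
import re

# Greedy match from the first <rss ...> to the last </rss>, across newlines.
RSS_RE = re.compile(r"<rss\b.*</rss>", re.DOTALL)


def trim_to_rss(feed_text: str) -> str:
    """Drop anything before the opening <rss> tag and after the closing one."""
    match = RSS_RE.search(feed_text)
    # Fall back to the original text if this doesn't look like RSS at all.
    return match.group(0) if match else feed_text


broken = '<script>junk()</script><?xml version="1.0"?><rss version="2.0"><channel/></rss><!-- more junk -->'
print(trim_to_rss(broken))
```

This only covers RSS; Atom feeds use a `<feed>` root element and would need their own pattern.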


#5

Thanks for this plugin; I’ve installed it as my most interesting feed tends to drop a “weird” character every once in a while, breaking TT-RSS until the entry becomes old enough to vanish from the feed! :slight_smile:


#6

Would this plugin fix these errors, apart from the HTTP 500 and timed-out ones?