LibXML error 9 at line 4 (column 17): Input is not proper UTF-8, indicate encoding

Hi all,

I'm getting this error with an RSS feed:

LibXML error 9 at line 4 (column 17): Input is not proper UTF-8, indicate encoding ! Bytes: 0xFD 0x6D 0x20 0x48

P.S.: The feed's language is Turkish.
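The four bytes in the error message fit the Turkish hint. A minimal Python sketch (the variable name `bad` is only for illustration) decodes them both ways. 0xFD can never start a UTF-8 sequence, but in ISO-8859-9 (Latin-5, the Turkish single-byte charset) it is the dotless "ı", which gives "ım H", i.e. the middle of "Donanım Haber" in the channel title. Assuming the four-space indentation shown in the fetched copy, that byte sits at roughly column 17 of line 4, matching the libxml report.

```python
# The four bytes quoted in the libxml error message.
bad = bytes([0xFD, 0x6D, 0x20, 0x48])

# As UTF-8, 0xFD is an invalid start byte; errors="replace" turns it into
# U+FFFD, the same "�" that shows up in the fetched titles.
print(bad.decode("utf-8", errors="replace"))  # → �m H

# As ISO-8859-9 (Latin-5), 0xFD is U+0131, the Turkish dotless "ı".
print(bad.decode("iso-8859-9"))  # → ım H
```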

Maybe the feed is buggy? Without the feed URL, we cannot help…

Here is its URL:

http://forum.donanimhaber.com/rss.asp?forumID=193&type=1

Broken feed by the looks of it. You need to have a word with whoever’s running that forum.

https://fakecake.org/myfeedsucks:


OK. Here’s what we received:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Donan�m Haber Forum - S�cak F�rsatlar</title>
    <copyright>Copyright (c) 2020 Donanimhaber</copyright>
    <link>http://www.donanimhaber.com/</link>
    <description>Haber sitesi</description>
    <language>tr-TR</language>
    <ttl>5</ttl>
    <image><title>Donan�m Haber Forum</title><width>60</width><height>60</height><link>http://www.donanimhaber.com/</link><url>http://image.donanimhaber.com/images/dhiconsmall.jpg</url></image>
    <item>
      <title>Amazon Asics Kad�n Jolt 2 Sneaker 175 TL</title>
      <link>https://forum.donanimhaber.com/fb.asp?m=143648023&go=last</link>
      <guid>https://forum.donanimhaber.com/fb.asp?m=143648023&go=last</guid>
      <description>farklı renk ve numaralar var
 165 tl
 175 tl</description>
    </item>
    <item>
      <title>Packt Publishing8217den Onlarca Kitab� �cretsiz �ndirin</title>
      <link>https://forum.donanimhaber.com/fb.asp?m=143648392&go=last</link>
      <guid>https://forum.donanimhaber.com/fb.asp?m=143648392&go=last</guid>
      <description>Packt publishing onlarca ekitabını ücretsiz olarak indirme sundu.
Ayrıca 30 Mayıs 2020 tarihine kadar da online etkileşimli web geliştirme, veri anali</description>
    </item>
<snip>

Parsing…

Parsing failed. Diagnostic output below.

Error: LibXML error 9 at line 4 (column 17): Input is not proper UTF-8, indicate encoding ! Bytes: 0xFD 0x6D 0x20 0x48
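This is not a libxml quirk. The XML 1.0 specification treats byte sequences that are illegal in the declared encoding as a fatal error, so any conforming parser should refuse this document. As a rough cross-check, the sketch below uses Python's standard-library parser, which is built on expat rather than libxml. `is_well_formed` and the sample document are illustrative stand-ins, not code from tt-rss.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: bytes) -> bool:
    """Return True if the stdlib (expat-based) parser accepts doc, False if it rejects it."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# Cut-down stand-in for line 4 of the feed: the declaration says UTF-8,
# but "ı" is stored as the single ISO-8859-9 byte 0xFD.
broken = b'<?xml version="1.0" encoding="utf-8"?>\n<title>Donan\xfdm Haber</title>'
# Same document with the byte replaced by the proper UTF-8 encoding of "ı".
fixed = broken.replace(b"\xfd", "\u0131".encode("utf-8"))

print(is_well_formed(broken))  # → False
print(is_well_formed(fixed))   # → True
```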

Thank you for your comment.

I have been using Feedly and Selfoss for a long time and had no problem with either of them, so I think it's a problem in the tt-rss core.

I found a page about this on the internet. Interestingly, others don't seem to have a problem with this.

https://stackoverflow.com/questions/2507608/error-input-is-not-proper-utf-8-indicate-encoding-using-phps-simplexml-lo

The answers at your link also suggest it's a broken data source.

Either way, notify your data provider that they’re sending invalid data so that they can fix it.

My understanding is that the common fix, using utf8_encode, produces unpredictable results, so this cannot be fixed reliably in TT-RSS.
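PHP's `utf8_encode()` converts ISO-8859-1 (Latin-1) to UTF-8, which is why it misbehaves here. The sketch below mimics that conversion in Python (`utf8_encode_like` is a hypothetical name) and applies it to the two kinds of text visible in the fetched feed: titles carrying raw Turkish single-byte characters, and descriptions that are already valid UTF-8. Latin-1 maps 0xFD to "ý" rather than "ı", and text that was already UTF-8 gets double-encoded.

```python
def utf8_encode_like(raw: bytes) -> bytes:
    # Same conversion as PHP's utf8_encode(): treat every byte as ISO-8859-1.
    return raw.decode("latin-1").encode("utf-8")

title = b"Donan\xfdm"               # ISO-8859-9 bytes, like the feed's titles
desc = "farkl\u0131".encode("utf-8")  # already-valid UTF-8, like the descriptions

print(utf8_encode_like(title).decode("utf-8"))  # → Donaným  (ý instead of ı)
print(utf8_encode_like(desc).decode("utf-8"))   # → farklÄ±  (double-encoded)
print(title.decode("iso-8859-9"))               # → Donanım  (the intended text)
```

Neither input comes out right, which is the "unpredictable results" problem: the conversion only works if you already know the bytes are Latin-1, and this feed mixes encodings.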

Actually, I've just checked it with the W3C feed validator. It says:

[Valid RSS] This is a valid RSS feed.

https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fforum.donanimhaber.com%2Frss.asp%3FforumID%3D193%26type%3D1

Did you see what they also said after that?

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Your feed appears to be encoded as "utf-8", but your server is reporting "ISO-8859-9"

line 10, column 37: Image title doesn't match channel title

        <image><title>Donanım Haber Forum</title><width>60</width><height>60</he ...
                                         ^

line 354, column 0: Invalid HTML: Numeric entity expected but none found.

          <description>Herkese Merhaba, 

line 397, column 2: Missing atom:link with rel="self"

      </channel>
      ^
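The validator's first warning is the core of the problem: the HTTP `Content-Type` header and the XML declaration name different encodings, and readers that trust one over the other may behave differently. A minimal Python sketch for spotting that disagreement is below. `declared_encodings` is a hypothetical helper, and the header value passed to it is an assumption modelled on the validator's message, not a captured response from the forum's server.

```python
import re
from email.message import Message
from typing import Optional, Tuple

# Matches the encoding pseudo-attribute of an XML declaration at the start of the body.
_XML_DECL_ENCODING = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encodings(content_type: str, body: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (charset from the Content-Type header, encoding from the XML declaration), lower-cased."""
    msg = Message()
    msg["Content-Type"] = content_type
    server = msg.get_content_charset()  # None if the header has no charset parameter
    match = _XML_DECL_ENCODING.match(body)
    document = match.group(1).decode("ascii").lower() if match else None
    return server, document

# Hypothetical header and body, modelled on the validator's warning.
server, document = declared_encodings(
    "text/xml; charset=ISO-8859-9",
    b'<?xml version="1.0" encoding="utf-8"?>\n<rss version="2.0">',
)
if server and document and server != document:
    print(f"mismatch: server says {server}, XML declaration says {document}")
```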

tt-rss uses libxml to parse XML. therefore, feed documents should be valid enough for libxml. nothing about this is going to change.

if you feel that the w3c parser or feedly or whatever else is correct in considering your feed a valid XML document and libxml is not, take it up with the libxml developers. good luck.

… or you can write a few lines of PHP that fix the issue on the fly and serve the fixed feed to tt-rss.
I ended up doing something like this for a few latin2 feeds.
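A sketch of such an on-the-fly fix, written in Python rather than PHP and not the poster's actual code, is below. It assumes the feed is mostly UTF-8 and that any byte run that is *not* valid UTF-8 is ISO-8859-9, which matches what the fetched copy shows (broken titles, valid descriptions). `repair_feed` and the `"latin5-fallback"` handler name are hypothetical; fetching the feed and serving the result at a URL tt-rss can subscribe to are left out. It is a heuristic: a Latin-5 byte pair that happens to form valid UTF-8 would be misread, though pairs of Turkish letters never do.

```python
import codecs
import xml.etree.ElementTree as ET

def _latin5_fallback(exc: UnicodeError):
    # Called by the UTF-8 decoder for each run of bytes that is not valid UTF-8.
    # Assumption: such bytes are ISO-8859-9 (Turkish), as in this feed's titles.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return bad.decode("iso-8859-9"), exc.end  # replacement text, resume position

codecs.register_error("latin5-fallback", _latin5_fallback)

def repair_feed(raw: bytes) -> bytes:
    """Decode as UTF-8, falling back to ISO-8859-9 for invalid bytes, and re-encode as UTF-8."""
    return raw.decode("utf-8", errors="latin5-fallback").encode("utf-8")

# Mixed-encoding sample modelled on the fetched feed.
feed = (b'<?xml version="1.0" encoding="utf-8"?>\n'
        b'<rss version="2.0"><channel><title>Donan\xfdm Haber</title>'
        b'<description>farkl\xc4\xb1 renk</description></channel></rss>')

root = ET.fromstring(repair_feed(feed))
print(root.find("channel/title").text)        # → Donanım Haber
print(root.find("channel/description").text)  # → farklı renk
```

Because the decoder only calls the handler for invalid bytes, text that is already valid UTF-8 passes through untouched, unlike a blanket `utf8_encode()`.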

I actually respect that @fox doesn’t want to fix all the crappy feeds, but keeps them out of ttrss instead.