I found some garbled characters in one feed. I investigated it and found the following:
- The Atom feed contains the following:
- These bytes, E2 80 99, are the UTF-8 representation of ’ (U+2019 RIGHT SINGLE QUOTATION MARK).
- Running select * from ttrss_entries from the command line shows these characters as â followed by two boxes. â is the ISO-8859-1 character for E2, and the other two bytes (80 and 99) have no printable character in ISO-8859-1.
- The ttrss web UI and app show them as â€™, which is the same three bytes interpreted as Windows-1252.
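The byte-level reasoning above can be reproduced with a few lines of Python (3.8 or later, for bytes.hex with a separator). This is only an illustrative check of the three interpretations, not anything ttrss itself runs:

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, encoded as UTF-8 as in the feed
raw = "\u2019".encode("utf-8")
print(raw.hex(" ").upper())                  # E2 80 99

# The same bytes read as ISO-8859-1: 'â' followed by two
# unprintable C1 control characters (U+0080 and U+0099)
latin1 = raw.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in latin1])   # ['U+00E2', 'U+0080', 'U+0099']

# The same bytes read as Windows-1252: the mojibake shown in the web UI
mojibake = raw.decode("cp1252")
print(mojibake)                              # â€™
```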
I’m guessing the issue comes from the feed itself, but I’m not familiar with the Atom standard and don’t know what to tell the site operators. Should they change these characters into a character entity (the one for ’ in Unicode)? Or should they add an encoding field somewhere? (The feed has <?xml version='1.0' encoding='UTF-8'?> at the top, but I think that’s irrelevant here.)
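One way to check whether escaping would make any difference is to parse both forms with a standard XML parser. The sketch below uses a made-up one-element Atom document (not the real feed) and a hypothetical helper, atom_title, built on Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def atom_title(doc: bytes) -> str:
    """Parse an Atom document given as raw bytes and return its <title> text."""
    root = ET.fromstring(doc)
    return root.find(ATOM + "title").text

# Hypothetical minimal feed header, with the same declaration as the real feed
header = (b"<?xml version='1.0' encoding='UTF-8'?>"
          b"<feed xmlns='http://www.w3.org/2005/Atom'>")

# Variant 1: the apostrophe as raw UTF-8 bytes (E2 80 99)
raw_title = atom_title(header + "<title>It\u2019s a test</title>".encode("utf-8") + b"</feed>")

# Variant 2: the same character as a numeric character reference
ref_title = atom_title(header + b"<title>It&#8217;s a test</title></feed>")

print(raw_title == ref_title == "It\u2019s a test")  # True
```

Under these assumptions, a conforming XML parser treats the raw UTF-8 bytes and the &#8217; reference as the same character, so escaping is optional when the declaration matches the bytes.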
Alternatively, is ttrss handling this correctly or could it be fixed somehow?
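As a possible stopgap on the reading side, text that has already been mangled this way can often be reversed by re-encoding it as Windows-1252 and decoding the result as UTF-8. The function name below is hypothetical, and this is a heuristic sketch, not a fix that exists in ttrss:

```python
def fix_cp1252_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-decoded as Windows-1252.

    Returns the input unchanged if it does not round-trip cleanly, e.g.
    when it contains characters Windows-1252 cannot encode, or when the
    resulting bytes are not valid UTF-8 (i.e. the text was probably fine).
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(fix_cp1252_mojibake("It\u00e2\u20ac\u2122s"))  # It’s
print(fix_cp1252_mojibake("It\u2019s"))              # It’s (left unchanged)
```

The try/except makes it a no-op on text that is already correct: a genuine ’ encodes to the single byte 92 in Windows-1252, which is not valid UTF-8 on its own.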
Using Tiny Tiny RSS v17.12 (2c51fac), Centos 7.4, nginx 1.12.2, PHP 5.4.16, Postgresql 9.2.23.