Feeds having author fields exceeding the 245 char length limit

tt-rss version: current (af13f3009c)

tt-rss limits the length of the author field to 245 characters in pre-processing (classes/rssutils.php, line 778), and the database schema defines the column as VARCHAR(250).

Certain feeds routinely exceed this limit in their dc:creator field. One example is the quant-ph updates feed on arXiv.org, which uses extensive HTML inside that field; the result is typically thousands of characters, most of them HTML overhead rather than essential information. As a result, author information for these feeds is not fully searchable, because the complete field never enters the database. This affects users who want to search or filter by, e.g., the name of an author of an academic publication.

Suggested fixes:

  • Mark ttrss_entries->author varchar(250) for enlargement in future schema updates
  • As a workaround, add a per-feed option to strip HTML tags from the author field
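
To illustrate the second suggestion, here is a minimal sketch in Python (not tt-rss code, which is PHP). It assumes the 245-character limit is applied to the raw field before tags are removed. The author names, the example.org URLs, and both function names are hypothetical, and the regex-based tag removal is deliberately crude.

```python
import re
from html import unescape

AUTHOR_LIMIT = 245  # limit cited in this report


def truncate_raw(raw: str, limit: int = AUTHOR_LIMIT) -> str:
    """Cut the raw field at the limit, HTML included (assumed current behaviour)."""
    return raw[:limit]


def strip_then_truncate(raw: str, limit: int = AUTHOR_LIMIT) -> str:
    """Remove tags first, so the limit is spent on names rather than markup."""
    text = re.sub(r"<[^>]*>", "", raw)   # crude tag removal, not a full HTML parser
    text = unescape(text)                # decode entities such as &amp;
    text = " ".join(text.split())        # collapse runs of whitespace
    return text[:limit]


# Hypothetical dc:creator value: one link per author, as HTML-heavy feeds tend to send.
names = ["Alice Example", "Bob Sample", "Carol Placeholder",
         "Dave Test", "Erin Demo", "Frank Mock"]
raw = ", ".join(
    '<a href="https://example.org/authors/{}">{}</a>'.format(n.replace(" ", "_"), n)
    for n in names
)

if __name__ == "__main__":
    print(len(raw))
    print(truncate_raw(raw))
    print(strip_then_truncate(raw))
```

With this input the raw cut ends partway through the list, so the later authors are lost, while the stripped version keeps every name well within the limit.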

Just a note: HTML is actually stripped when escaping, so this should be strictly a length-limit issue.