Updater: TeX code in title/content makes it crash

Hi, arxiv.org feeds can have TeX code in the titles and abstracts of articles, which makes update.php fail. I am not sure why this didn’t happen before (I have read those feeds with tt-rss for many years); it probably started with the change to PDO.

git 4fa64e8, debian 8, postgres 9.4.15, php 7.1.15
feed: quant-ph updates on arXiv.org (passes myfeedsucks)

The problem currently is that the title and abstract of one entry contain this: Poincar\'e

PHP Warning: PDO::prepare(): SQLSTATE[HY093]: Invalid parameter number: mixed named and positional parameters in /var/www/http/tt-rss/classes/rssutils.php on line 996
PHP Warning: PDO::prepare(): SQLSTATE[HY093]: Invalid parameter number in /var/www/http/tt-rss/classes/rssutils.php on line 996
PHP Fatal error: Uncaught Error: Call to a member function execute() on boolean in /var/www/http/tt-rss/classes/rssutils.php:1006
Stack trace:
#0 /var/www/http/tt-rss/classes/rssutils.php(190): RSSUtils::update_rss_feed(179, true, false)
#1 /var/www/http/tt-rss/update.php(199): RSSUtils::update_daemon_common(500)
#2 {main}
thrown in /var/www/http/tt-rss/classes/rssutils.php on line 1006
[13:26:10/475] Sleeping for 120 seconds…

I think the specific \' combination is the cause. With this (ugly) change it works (git diff classes/rssutils.php):

-                                               $tsvector_combined = mb_substr($entry_title . ' ' .
-                                                       preg_replace('/[<\?\:]/', ' ', strip_tags($entry_content)),
+                                               $tsvector_combined = mb_substr(
+                                                       preg_replace('/\\\\\'/', '', $entry_title) . ' ' .
+                                                       preg_replace('/[<\?\:]/', ' ', preg_replace('/\\\\\'/', '', strip_tags($entry_content))),
                                                        0, 1000000);

Could you confirm this?
Thanks,
Wolfgang

there has to be a better way, this is just plain horrible

Yes. It seems that I was wrong: the ? in the title was the culprit, as it was not removed (question marks were already stripped from entry_content). My previous change worked for a different reason. How about simply this:

-                                               $tsvector_combined = mb_substr($entry_title . ' ' .
-                                                       preg_replace('/[<\?\:]/', ' ', strip_tags($entry_content)),
-                                                       0, 1000000);
+                                               $tsvector_combined = preg_replace('/[<\?\:]/', ' ', mb_substr($entry_title . ' ' .
+                                                       strip_tags($entry_content),
+                                                       0, 1000000));
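
To make the intent of that diff easier to check outside the updater, here is a minimal standalone sketch. The function name build_tsvector_text is made up for illustration and is not part of tt-rss; it only assumes the mbstring extension is loaded. It combines title and content first and only then strips <, ? and :, so a question mark in the title is removed as well. The \' sequence is deliberately left alone.

```php
<?php
// Hypothetical helper, not tt-rss code: mirrors the order of operations in the diff above.
function build_tsvector_text(string $title, string $content, int $max_len = 1000000): string
{
    // Combine title and tag-stripped content, then truncate.
    $combined = mb_substr($title . ' ' . strip_tags($content), 0, $max_len);

    // Replace <, ? and : in the whole string, not just in the content part.
    return preg_replace('/[<\?\:]/', ' ', $combined);
}

// Sample entry with a question mark in the title and TeX-style \' in it.
$text = build_tsvector_text("Is Poincar\\'e right?", "<p>Why: is it so?</p>");
```

With the original code, only the content part went through preg_replace, so the ? from the title would still be in the result.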

this is better but i still really dislike the hacks needed for this stupid field

i’ll take a closer look at this tomorrow, if all else fails i’ll merge the above fix

e: thanks btw

I could not find much about PDO & postgres full-text search; manual sanitizing seems to be the norm. I guess that tsvector_combined = to_tsvector( :tsvectext ) and then setting :tsvectext in execute() doesn’t work with PDO/PG, but I didn’t try this.

yeah i don’t remember the details but this awful hackery is probably the result of the obvious not working :(

i think this should be a proper fix: https://git.tt-rss.org/fox/tt-rss/commit/963c22646b3e1bd544bd957bf34175b996bd6e53

didn’t test on mysql so post here if i broke everything etc