[Solved] Af_readability not rebuilding search index (was an unrelated issue)

Hey guys, I just debugged an issue and have partly come to the conclusion that af_readability is not triggering a rebuild of the “tsvector_combined” index after updating the entry with content that it found. Now, I am digging further into this tonight, but I would think it’s not down to my version or anything else I have done.

I was hoping someone here might be quicker and smarter at confirming whether that’s the case or not… since if it is, I presume it’s something that could be improved (unless it’s intended, of course).

TTRSSv17.12 (1ddf3a2), Ubuntu 14.04.5 LTS, PHP 7.0.27, PostgreSQL 9.6.6, curl 7.35.0, 2000+ feeds

article plugins are run before the update which sets both tsvector_combined and article content, so i find it unlikely that only one of those is updated (see rssutils.php:986, rssutils.php:750). you might be running into some other issue.

everything is possible of course but i’m going to need more solid information to help you

Noted, I’ll check those parts tonight and have a look at what might be going on as best as I can.

btw tsvector is length-limited (by postgresql), maybe that’s why you’re not getting proper search results.

Did not know that, though in this case I don’t think that’s related. This is what the DB shows for the tsvector:

'2012':30 '2013':32 '2014':34,44 '2015':36 '2015gg':46 '2016':38 '2017':40 '2018':6,13,42 'arhiva':28 'bandai':1,8 'e3':29,31,33,35,37,39,41 'gamescom':5,12,43,45 'gamingnagradne':26 'gg':19,21 'goodgame.hr':15 'igregg.hr':27 'interview':22 'kanal':48 'kolumne':24 'lineup':7,14 'najave':18 'namco':2,9 'navigationpočetnavijestigg':16 'osvrtrecenzijebeta':25 'prašina':17 'predstavio':3,10 'special':23 'svoj':4,11 'tech':20 'translate':49 'youtube':47

The stored content is also fairly limited (at least nothing extreme):

<div readability="5"><body id="readabilityBody"><p>  &#13;
  &#13;
  &#13;
</p>  &#13;
&#13;
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/><!--
    <link href='http://fonts.googleapis.com/css?family=Roboto&subset=latin,latin-ext' rel='stylesheet' type='text/css'> --><link href="http://fonts.googleapis.com/css?family=Roboto&amp;subset=latin,latin-ext" rel="stylesheet" type="text/css"/><link href="http://fonts.googleapis.com/css?family=Ubuntu&amp;subset=latin,latin-ext" rel="stylesheet" type="text/css"/><!-- <meta charset="UTF-8"> --><title>Bandai Namco predstavio svoj Gamescom 2018 lineup | GoodGame.hr</title><!-- CSS --><link rel="stylesheet" type="text/css" href="http://www.goodgame.hr/wp-content/themes/brennuis-new/style.css" media="screen"/><link rel="stylesheet" type="text/css" href="http://www.goodgame.hr/wp-content/themes/brennuis-new/css/orange/orange.css" media="screen"/><!-- Fonts --><link href="http://fonts.googleapis.com/css?family=Ubuntu" rel="stylesheet"/><link href="http://fonts.googleapis.com/css?family=Ubuntu" rel="stylesheet"/><!-- Pingback --><link rel="pingback" href="http://www.goodgame.hr/xmlrpc.php"/><link rel="shortcut icon" href="http://www.goodgame.hr/wp-content/uploads/2014/07/ggfav.png"/><!-- RSS --><link rel="alternate" type="application/rss+xml" title="GoodGame.hr RSS Feed" href="http://www.goodgame.hr/feed/"/><!-- Header Hook --><!-- All in One SEO Pack 2.6.1 by Michael Torbert of Semper Fi Web Design[548,618] --><meta name="description" content="Bandai Namco je na nedavno završenom E3 sajmu bio prisutan s prilično bogatim lineupom, a sličan lineup nas očekuje i na Gamescomu, naravno uz poneko iznenađenje. Bandai Namco je službeno potvrdio svoj dolazak na Gamescom 2018 sajam te je otkrio kako u Njemačku idući mjesec stiže s najmanje 11 igrivih naslova, od kojih je osam već poznato: Jump Force SoulCalibur VI Dragon Ball FighterZ for Nintendo Switch One Piece: World Seeker Code Vein Ace Combat 7: Skies Unknown My Hero One's Justice Naruto to Boruto: Shinobi Striker. 
Uz osam navedenih naslova, očekuju nas i tri potpuno nove najave, uključujući i dodatni suizdavački projekt &quot;koji će oduševiti fanove&quot;. O kojim igrama je riječ, za sada možemo samo nagađati, dok nas pravi odgovor očekuje u periodu od 21. do 25. kolovoza, kada će se održati Gamescom 2018."/><meta name="keywords" content="ace combat 7: skies unknown,bandai namco,code vein,dragon ball fighterz,gamescom 2018,jump force,my hero one’s justice,naruto to boruto: shinobi striker,one piece: world seeker,soulcalibur vi"/><link rel="canonical" href="http://www.goodgame.hr/bandai-namco-predstavio-svoj-gamescom-2018-lineup/"/><!-- /all in one seo pack --><link rel="dns-prefetch" href="//translate.google.com"/><link rel="dns-prefetch" href="//s.w.org"/><link rel="alternate" type="application/rss+xml" title="GoodGame.hr » Kanal" href="http://www.goodgame.hr/feed/"/><link rel="alternate" type="application/rss+xml" title="GoodGame.hr » Kanal komentara" href="http://www.goodgame.hr/comments/feed/"/><link rel="alternate" type="application/rss+xml" title="GoodGame.hr » Bandai Namco predstavio svoj Gamescom 2018 lineup Kanal komentara" href="http://www.goodgame.hr/bandai-namco-predstavio-svoj-gamescom-2018-lineup/feed/"/><link rel="stylesheet" id="contact-form-7-css" href="http://www.goodgame.hr/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=5.0.2" type="text/css" media="all"/><link rel="stylesheet" id="cookie-notice-front-css" href="http://www.goodgame.hr/wp-content/plugins/cookie-notice/css/front.min.css?ver=51864fe9d75cd99ed1c2cd8a5db66c85" type="text/css" media="all"/><link rel="stylesheet" id="fvp-frontend-css" href="http://www.goodgame.hr/wp-content/plugins/featured-video-plus/styles/frontend.css?ver=2.3.3" type="text/css" media="all"/><link rel="stylesheet" id="google-language-translator-css" href="http://www.goodgame.hr/wp-content/plugins/google-language-translator/css/style.css?ver=5.0.48" type="text/css" media=""/><link rel="stylesheet" 
id="glt-toolbar-styles-css" href="http://www.goodgame.hr/wp-content/plugins/google-language-translator/css/toolbar.css?ver=5.0.48" type="text/css" media=""/><link rel="stylesheet" id="responsive-lightbox-swipebox-css" href="http://www.goodgame.hr/wp-content/plugins/responsive-lightbox/assets/swipebox/css/swipebox.min.css?ver=2.0.5" type="text/css" media="all"/><link rel="stylesheet" id="rs-settings-css" href="http://www.goodgame.hr/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.1&amp;ver=51864fe9d75cd99ed1c2cd8a5db66c85" type="text/css" media="all"/><link rel="stylesheet" id="rs-captions-css" href="http://www.goodgame.hr/wp-content/plugins/revslider/rs-plugin/css/dynamic-captions.css?rev=4.1&amp;ver=51864fe9d75cd99ed1c2cd8a5db66c85" type="text/css" media="all"/><link rel="stylesheet" id="rs-plugin-static-css" href="http://www.goodgame.hr/wp-content/plugins/revslider/rs-plugin/css/static-captions.css?rev=4.1&amp;ver=51864fe9d75cd99ed1c2cd8a5db66c85" type="text/css" media="all"/><link rel="stylesheet" id="synved-shortcode-jquery-ui-css" href="http://www.goodgame.hr/wp-content/plugins/synved-shortcodes/synved-shortcode/jqueryUI/css/snvdshc/jquery-ui-1.9.2.custom.min.css?ver=1.9.2" type="text/css" media="all"/><link rel="stylesheet" id="synved-shortcode-layout-css" href="http://www.goodgame.hr/wp-content/plugins/synved-shortcodes/synved-shortcode/style/layout.css?ver=1.0" type="text/css" media="all"/><link rel="stylesheet" id="synved-shortcode-jquery-ui-custom-css" href="http://www.goodgame.hr/wp-content/plugins/synved-shortcodes/synved-shortcode/style/jquery-ui.css?ver=1.0" type="text/css" media="all"/><link rel="stylesheet" id="tablepress-default-css" href="http://www.goodgame.hr/wp-content/plugins/tablepress/css/default.min.css?ver=1.9" type="text/css" media="all"/><link rel="stylesheet" id="blog-fancybox-css" href="http://www.goodgame.hr/wp-content/themes/brennuis-new/css/fancybox/jquery.fancybox-1.3.4.css?ver=1.0" type="text/css" 
media="screen"/><link rel="stylesheet" id="ln-flexslider-css" href="http://www.goodgame.hr/wp-content/themes/brennuis-new/css/flexslider.css?ver=1.0" type="text/css" media="screen"/><link rel="https://api.w.org/" href="http://www.goodgame.hr/wp-json/"/><link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.goodgame.hr/xmlrpc.php?rsd"/><link rel="wlwmanifest" type="application/wlwmanifest+xml" href="http://www.goodgame.hr/wp-includes/wlwmanifest.xml"/><link rel="prev" title="Dakar 18 ima novi trailer, ali i službeni datum izlaska" href="http://www.goodgame.hr/dakar-18-ima-novi-trailer-ali-i-sluzbeni-datum-izlaska/"/><link rel="shortlink" href="http://www.goodgame.hr/?p=137433"/><link rel="alternate" type="application/json+oembed" href="http://www.goodgame.hr/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.goodgame.hr%2Fbandai-namco-predstavio-svoj-gamescom-2018-lineup%2F"/><link rel="alternate" type="text/xml+oembed" href="http://www.goodgame.hr/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.goodgame.hr%2Fbandai-namco-predstavio-svoj-gamescom-2018-lineup%2F&amp;format=xml"/><link rel="icon" href="http://www.goodgame.hr/wp-content/uploads/2017/11/cropped-novilogo-32x32.png" sizes="32x32"/><link rel="icon" href="http://www.goodgame.hr/wp-content/uploads/2017/11/cropped-novilogo-192x192.png" sizes="192x192"/><link rel="apple-touch-icon-precomposed" href="http://www.goodgame.hr/wp-content/uploads/2017/11/cropped-novilogo-180x180.png"/><meta name="msapplication-TileImage" content="http://www.goodgame.hr/wp-content/uploads/2017/11/cropped-novilogo-270x270.png"/><!--/* OpenX Interstitial or Floating DHTML Tag v2.8.8 */--><!-- (C)2000-2013 Gemius SA - gemiusAudience / GoodGame.hr / Pages --><div id="big-background-image">
    <section id="wrapper"><!-- Top Section --><section id="top-section"><nav id="top-nav" class="top-navigation"/>&#13;
            &#13;
&#13;
        </section><section id="main-content"><!-- Header --><header id="main-header"><div id="logo">
                    <a href="http://www.goodgame.hr" class="no-eff"><img src="http://www.goodgame.hr/wp-content/uploads/2016/01/novilogo.png" title="GoodGame.hr" alt="GoodGame.hr"/></a>                                    </div>&#13;
                <div class="top-banner-full">
		          	<a href="http://www.sancta-domenica.hr/" class="no-eff" target="_blank" title="http://www.sancta-domenica.hr/"><img src="http://www.goodgame.hr/wp-content/uploads/2016/06/sancta-domenica-2-template.png" alt="http://www.sancta-domenica.hr/"/></a>
		          </div>            </header><!-- Navigation --><nav id="main-nav-wrapper"><select id="mobile-main-nav" class="responsive-menu"><option value="#"> - Navigation</option><option value="http://www.goodgame.hr">Početna</option><option value="http://www.goodgame.hr/kategorija/vijesti/">Vijesti</option><option value="http://www.goodgame.hr/kategorija/ggplus/">GG+</option><option value="http://www.goodgame.hr/kategorija/ggplus/prasina/">    Prašina</option><option value="http://www.goodgame.hr/kategorija/beta-2/najave/">    Najave</option><option value="http://www.goodgame.hr/kategorija/tech-2/gg-tech/">    GG Tech</option><option value="http://www.goodgame.hr/kategorija/gg-interview/">    GG Interview</option><option value="http://www.goodgame.hr/kategorija/special/">    Special</option><option value="http://www.goodgame.hr/kategorija/ggplus/kolumne/">    Kolumne</option><option value="http://www.goodgame.hr/kategorija/ggplus/osvrt/">    Osvrt</option><option value="http://www.goodgame.hr/kategorija/recenzije/">Recenzije</option><option value="http://www.goodgame.hr/kategorija/beta-2/odigrali-smo/">Beta Gaming</option><option value="http://www.goodgame.hr/kategorija/nagrade/">Nagradne Igre</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/">GG.hr arhiva</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2012/">    E3 2012</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2013/">    E3 2013</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2014/">    E3 2014</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2015/">    E3 2015</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2016/">    E3 2016</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2017/">    E3 2017</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/e3-2018/">    E3 2018</option><option 
value="http://www.goodgame.hr/kategorija/gghr-arhiva/gamescom-2014/">    Gamescom 2014</option><option value="http://www.goodgame.hr/kategorija/gghr-arhiva/gamescom-2015/">    Gamescom 2015</option><option value="http://www.goodgame.hr/kategorija/ggyoutube/">GG YouTube Kanal</option></select>&#13;
            </nav>&#13;



    <!-- Content -->

    </section><!-- Sidebar --><!-- Footer --><footer id="main-footer"><span class="left-text">&#13;
                <!--
                                Goodgame.hr 2017. | Sva prava pridr&#382;ana
                    </span>

        <span class="right-text">
                                Powered by <a href="http://www.wordpress.org" target="_blank">WordPress</a>
                    </span>

    -->&#13;
        &#13;
    	</span></footer></section><div id="big-background-image">    <p><span class="translate">Translate »</span></p><!-- Social Ring JS Start --><!-- Social Ring JS End --><!-- Powered by WPtouch: 4.3.28 --><!-- Fancybox -->        <!-- Dynamic page generated in 1.232 seconds. --><!-- Cached page generated by WP-Super-Cache on 2018-07-04 07:08:15 --><!-- Super Cache dynamic page detected but late init not set. See the readme.txt for further details. --><!-- Dynamic Super Cache --></div></div></body></div>

Actually, on closer look I see that a lot of the text sits inside an HTML attribute rather than in plain text (it’s the article text, but maybe that gets filtered out when the index is created?) … and that af_readability maybe did not catch the actual, cleaner content part of the page.

The page in question is: Bandai Namco predstavio svoj Gamescom 2018 lineup | GoodGame.hr

i think it calls strip_tags() which might mangle the content. you can easily test it yourself: add print($params[":ts_content"]) somewhere before the statement is executed

if that one looks correct you can try running to_tsvector() yourself from the postgresql console
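To illustrate the strip_tags() suspicion, here is a minimal sketch in Python. It is not the tt-rss code: `TagStripper` and `strip_tags_sketch` are hypothetical names, and Python's `html.parser` only approximates what PHP's strip_tags() does. The sample is a shortened excerpt of the stored content quoted above. It shows two effects: text that lives only in an attribute (the description meta tag) disappears, and text from adjacent elements is glued together because no whitespace is inserted where the tags were.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags, attributes and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called only for text between tags; attribute values never reach this hook.
        self.parts.append(data)

    def text(self):
        # Joined without a separator, so neighbouring elements run together.
        return "".join(self.parts)


def strip_tags_sketch(html):
    """Rough stand-in for PHP's strip_tags(); an assumption, not the real thing."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


# Shortened excerpt of the stored article content quoted earlier in the thread.
sample = (
    '<meta name="description" content="Bandai Namco je na nedavno završenom E3 sajmu"/>'
    '<select><option value="#"> - Navigation</option>'
    '<option value="http://www.goodgame.hr">Početna</option>'
    '<option value="http://www.goodgame.hr/kategorija/vijesti/">Vijesti</option></select>'
)

stripped = strip_tags_sketch(sample)
print(repr(stripped))                 # ' - NavigationPočetnaVijesti'
print("završenom" in stripped)        # False: the attribute text is gone
```

That would be consistent with the tsvector shown earlier, which contains a glued lexeme like 'navigationpočetnavijestigg' and none of the words from the description attribute. Whether the real pipeline behaves exactly this way still needs checking with the print() suggested above.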

that’s normal, readability is 100% guesswork. it doesn’t work all the time.

While it is guesswork indeed, I do feel it has failed me quite often recently, even on what seem to be pretty stock WordPress sites. There are likely a hundred reasons why it could go wrong, but are there any alternative plugins to test that aren’t outdated these days? I had a look at the plugin page, but nothing jumped out at me as a real, different alternative.

i have just recently replaced it with the updated one, which works a lot better for me

the one in trunk currently is, i think, the only up-to-date implementation for php

Interesting, in that case I’ll try to pull that down and update my copy. Thank you fox.

well “recently” is more like a month ago

https://git.tt-rss.org/fox/tt-rss/src/master/vendor/andreskrey

I have a few custom tweaks, so I always fear updates :slight_smile: but I pulled down all the latest files to check whether the readability update works better.

Update: debugging a lot of “42 Callback aborted [301]” errors on most feeds at the moment. Maybe I forgot a file. Hopefully nothing too crazy. -> Seems I forgot the new MAX_CACHE_FILE_SIZE and MAX_DOWNLOAD_FILE_SIZE in functions.php; things are back to normal again.

ah the joys of maintaining a fork. i know this feel all too well. :sob: