Issue since Readability update

… and away from it. Well, sorta; we were told to FOAD by Jeff.

Whoops. Sorry I’ve not responded. I’ve just pulled the latest git code, so I’ll see if the try/catch solves the problem and watch the update log for a few days. FYI I use FreeBSD, so yes I know it’s not very mainstream, but it is still server tier :)

Yeah, what might have helped in this case is if the feed, and maybe the entry, that crashed were shown in the log. Because the crash happened before any logging, it’s difficult to see which one caused it. But thanks for reminding me about the feed debugger. I’ll try that on each feed one at a time if I still get the same problem.

Just had the error happen again after one week: all feeds stopped updating except ISP Review, which just happens to be mentioned in the log again before the crash, exactly the same as last time. Though when I went into the feed debugger with f+D and forced a refresh it all loaded without a problem. And now everything is working fine again. So it seems whatever caused the problem is then fixed by forcing a refetch/rehash.

[07:30:57/55805] Base feed: ISPreview UK
[07:30:57/55805] => 2019-03-05 06:51:17.549093, 56 2
PHP Fatal error: Uncaught TypeError: Argument 1 passed to iterator_to_array() must implement interface Traversable, null given in /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php:324
Stack trace:
#0 /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(324): iterator_to_array(NULL)
#1 /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(421): andreskrey\Readability\Nodes\DOM\DOMText->getChildren(true)
#2 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(1270): andreskrey\Readability\Nodes\DOM\DOMText->hasSingleTagInsideElement('tr')
#3 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(1166): andreskrey\Readability\Readability->prepArticle(Object(andreskrey\Readability\Nodes\DOM\DOMDocument))
#4 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(155): andreskrey\Readability\Readability->rateNodes(Array)
#5 /usr/www/ttrss/plugins/af_readability/init.php(188): andreskrey\Readability\Readabi in /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php on line 324

congrats, instead of dumping the database, saving the XML somehow, or at least doing something to help us reproduce it, you decided to post the exception again. well done.

arch users, ladies and gentlemen. again and again.

Hi, new TTRSS user here

Imported my feeds via OPML from another reader. Also getting the Readability error mentioned above.

The initial update ran fine with readability on for all feeds. Subsequent feed updates started triggering the issue.

Most recent example is from The Register. Was working great for the whole 4 days I’ve had TTRSS, but just now bombed out. I disabled Readability for the Register and let the feed update run, and there was just 1 new article which was this one:

https://www.theregister.co.uk/2019/04/16/context_pc_numbers/

So is there something within this article which is screwing Readability? The atom feed is here http://www.theregister.co.uk/headlines.atom

I turned Readability back on for the Register and it hasn’t bombed out

Also regarding https://github.com/andreskrey/readability.php/issues/79 where Andreskrey says “Maybe you can put a breakpoint before triggering Readability and dump the HTML content?” is this possible? I’m not a dev but happy to dump my DB or whatever is needed
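
One way to capture the HTML without a debugger is to wrap the extraction call and write the input to disk when it throws. The sketch below assumes a hypothetical helper named `extract_or_dump`; it is not part of tt-rss or readability.php. The `$extract` callable stands in for whatever call triggers Readability in your setup, and the dump file name is an arbitrary choice.

```php
<?php
// Hypothetical helper, not part of tt-rss or readability.php.
// $extract is any callable that takes an HTML string and returns extracted content.
function extract_or_dump(callable $extract, string $html, string $url, string $dumpDir)
{
    try {
        return $extract($html);
    } catch (\Throwable $e) {
        // TypeError is a Throwable in PHP 7+, so an error like the one in
        // the stack trace earlier in this thread would be caught here.
        $path = $dumpDir . '/readability-' . sha1($url) . '.html';

        // Save the exact input that failed so it can be attached to a bug report.
        file_put_contents($path, $html);
        error_log(sprintf(
            'Extraction failed for %s: %s (HTML saved to %s)',
            $url,
            $e->getMessage(),
            $path
        ));

        return null;
    }
}
```

The saved file plus the logged URL is the kind of input the library author asked for in the linked issue, and it also records which article was being processed when the updater died.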

tldr: please report issues with readability to readability developers.

you didn’t even think to specify what php version on what platform you’re running in your largely useless “me too” post, i’m not going to waste a week spoonfeeding you because of a third party library i didn’t write nor support. you’ll have to do your homework yourself.

anyway, new rules for this issue:

  1. if you run into it and can figure out why it happens, submit a PR, preferably to developers of readability, but if it’s a tt-rss problem, to me. i don’t know how this could be a tt-rss problem since all it’s doing is passing XML to the class but whatever, anything is possible.
  2. if you want to bump this thread with a “me too”, the only thing you’ll get is a probation

i’m not wasting any more time on this.

when I went into the feed debugger with f+D and forced a refresh it all loaded without a problem. And now everything is working fine again.

i have the same problem
Can you tell me how to use the ‘feed debugger with f+D’ to fix the problem? Thanks a lot.

It means fetch the feed using debug mode. You go into the feed and then press the f and shift-D keys. The feed is successfully fetched and processed then. My guess is there’s something slightly different in the code paths between the main feed updater and the debug mode.

I am now agreeing with fox though. I took a look at the code and can see that he’s simply importing a 3rd party library and so this needs solving by the person that wrote the library. Unfortunately I can’t reproduce it in a way where I can just provide a broken feed. Because as I said, it breaks, you fetch the feed another way, and then it works fine for a week before maybe breaking again.

I have worked around this now by reverting the commit that upgraded the library and I’m rebasing the old version on top of any new commits. If the library gets upgraded again then I’ll test it. In the meantime, like fox, I’ve lost interest in caring about it.

btw actual readability library is now moved to the plugin so it’s possible to make af_readability_old or something and use that instead, i’ve made that change with this particular issue in mind.

Ahh that’s useful. Yes I’ve just done this instead. Created plugins.local/af_readability_old and then removed my revert with a reset --hard. Seems to work :) If the library gets upgraded again in the future I’ll retest it but until then this will do.

I saw that the readability plugin was updated this week and so I switched back to using the proper plugin. So far it’s worked without any problems. So it’s possible the new version of the library fixed the problem that I had with the last version.

yeah, git changelog mentioned something related to php 7.3 compatibility, i’ve thought about posting here asking for feedback but got distracted and forgot.

Looks like a bug in plugin code or PHP syntax. Some feeds (e.g. ycombinator) have no meta charset tag, so when you read the document encoding you get an empty string. The next line passes it to mb_convert_encoding, and if that last argument is empty PHP throws an error. Below is a proposed patch to fix it. IWFM for the last 2 days.

P.S. Does it make sense to create PR/open issue at ttrss git repository?
P.P.S. I’m not a PHP developer so there might be a better solution.

--- init.php.old 2019-09-10 09:47:14.008953145 +1200
+++ init.php    2019-09-10 09:55:23.080483046 +1200
@@ -179,7 +179,11 @@
                        // this is the worst hack yet :(
                        if (strtolower($tmpdoc->encoding) != 'utf-8') {
                                $tmp = preg_replace("/<meta.*?charset.*?\/?>/i", "", $tmp);
-                               $tmp = mb_convert_encoding($tmp, 'utf-8', $tmpdoc->encoding);
+                               if (empty($tmpdoc->encoding)) {
+                                       $tmp = mb_convert_encoding($tmp, 'utf-8');
+                               } else {
+                                       $tmp = mb_convert_encoding($tmp, 'utf-8', $tmpdoc->encoding);
+                               }
                        }
 
                        try {
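
The same guard can be pulled into a small function that can be tested on its own. This is only a sketch: `convert_to_utf8` is a hypothetical name, not something tt-rss ships, and falling back to UTF-8 when the declared encoding is empty or unknown is an assumption rather than the behaviour of the patch above (which lets mbstring use its internal encoding).

```php
<?php
// Hypothetical helper; tt-rss does not ship this function.
function convert_to_utf8(string $html, ?string $declared): string
{
    // Canonical encoding names known to mbstring, lowercased for comparison.
    $known = array_map('strtolower', mb_list_encodings());

    // Assumption: if the declared encoding is missing or not recognized,
    // treat the input as UTF-8 instead of passing a bad name to mbstring.
    $from = 'UTF-8';
    if ($declared !== null && in_array(strtolower($declared), $known, true)) {
        $from = $declared;
    }

    return mb_convert_encoding($html, 'UTF-8', $from);
}
```

For example, `convert_to_utf8($tmp, $tmpdoc->encoding)` would never hand an empty string to `mb_convert_encoding`. One limitation: `mb_list_encodings()` returns canonical names only, so an alias that mbstring would otherwise accept is treated as unknown here and falls back to UTF-8.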

sure, post your gogs username and i’ll give you necessary permissions.

Username is the same as here, i.e. trap000d
Regards,

alright, you should be able to clone stuff now.

See PR #120

P.S.
Arrrrrrgh. At least 20 characters, no links
:)

thanks, i’ll take a look in a few days when i’m back in town.

Not sure if this is the same error but saw this in my Event Log:

Error	Filename	Message	User	Date
E_WARNING (2)	plugins/af_readability/init.php:183	mb_convert_encoding(): Illegal character encoding specified
1. plugins/af_readability/init.php(183): mb_convert_encoding(<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">

<head>
<title>[1904.10631] Low-Memory Neural Network Training: A Technical Report</title>
<link rel="shortcut icon" href="https://static.arxiv.org/static/browse/0.2.5/images/icons/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" type="text/css" media="screen" href="https://static.arxiv.org/static/browse/0.2.5/css/arXiv.css?v=20190307" />
<link rel="stylesheet" type="text/css" media="screen" href="https://static.arxiv.org/static/browse/0.2.5/css/browse_search.css" />
<!-- Matomo -->
<script type="text/javascript">
var _paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
_paq.push(["setCookieDomain", "*.arxiv.org"]);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://webstats.arxiv.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '1']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css">
<link rel="stylesheet" media="screen" type="text/css" href="/bibex/bibex.css?20181010"/>
<script src="https://static.arxiv.org/static/browse/0.2.5/js/mathjaxToggle.min.js" type="text/javascript"></script><script type="text/javascript" src="https://arxiv-org.atlassian.net/s/d41d8cd98f00b204e9800998ecf8427e-T/zca7yc/b/13/a44af77267a987a660377e5c46e0fb64/_/download/batch/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector.js?locale=en-US&collectorId=7a8da419"></script>
<script type="text/javascript">window.ATL_JQ_PAGE_PROPS = {
"triggerFunction": function(showCollectorDialog) {
//Requires that jQuery is available!
jQuery("#feedback-button").click(function(e) {
e.preventDefault();
showCollectorDialog();
});
},
fieldValues: {
"components": ["15700"], // Jira ID for browse component
"versions": ["14153"], // Jira ID for browse-0.2.1 release
"customfield_11401": window.location.href
}
};
</script>
<meta name="citation_title" content="Low-Memory Neural Network Training: A Technical Report"/>
<meta name="citation_author" content="Sohoni, Nimit Sharad"/>
<meta name="citation_author" content="Aberger, Christopher Richard"/>
<meta name="citation_author" content="Leszczynski, Megan"/>
<meta name="citation_author" content="Zhang, Jian"/>
<meta name="citation_author" content="Ré, Christopher"/>
<meta name="citation_date" content="2019/04/24"/>
<meta name="citation_online_date" content="2019/04/24"/>
<meta name="citation_pdf_url" content="https://arxiv.org/pdf/1904.10631"/>
<meta name="citation_arxiv_id" content="1904.10631"/><meta name="twitter:site" content="@arxiv"/>
<meta property="twitter:title" content="Low-Memory Neural Network Training: A Technical Report"/>
<meta property="twitter:description" content="Memory is increasingly often the bottleneck when training neural network
models. Despite this, techniques to lower the overall memory requirements of
training have been less widely studied..."/>
<meta property="og:site_name" content="arXiv.org"/>
<meta property="og:title" content="Low-Memory Neural Network Training: A Technical Report"/>
<meta property="og:url" content="https://arxiv.org/abs/1904.10631v1"/>
<meta property="og:description" content="Memory is increasingly often the bottleneck when training neural network
models. Despite this, techniques to lower the overall memory requirements of
training have been less widely studied compared to the extensive literature on
reducing the memory requirements of inference. In this paper we study a
fundamental question: How much memory is actually needed to train a neural
network? To answer this question, we profile the overall memory usage of
training on two representative deep learning benchmarks -- the WideResNet model
for image classification and the DynamicConv Transformer model for machine
translation -- and comprehensively evaluate four standard techniques for
reducing the training memory requirements: (1) imposing sparsity on the model,
(2) using low precision, (3) microbatching, and (4) gradient checkpointing. We
explore how each of these techniques in isolation affects both the peak memory
usage of training and the quality of the end model, and explore the memory,
accuracy, and computation tradeoffs incurred when combining these techniques.
Using appropriate combinations of these techniques, we show that it is possible
to the reduce the memory required to train a WideResNet-28-2 on CIFAR-10 by up
to 60.7x with a 0.4% loss in accuracy, and reduce the memory required to train
a DynamicConv model on IWSLT&#39;14 German to English translation by up to 8.7x
with a BLEU score drop of 0.15."/>
</head>

<body class="with-cu-identity">

<div class="slider-wrapper" style="display:none">
<a class="close-slider" href="#"><img src="https://static.arxiv.org/static/browse/0.2.5/images/icons/close-slider.png"></a>
<div class="copy-donation">
<h1>Donate to arXiv</h1>
<p>
Please join the <a href="https://simonsfoundation.org">Simons Foundation</a> and our
generous <a href="https://arxiv.org/about/ourmembers">member organizations</a>
in supporting arXiv during our giving campaign September 23-27. 100% of your contribution will fund
improvements and new initiatives to benefit arXiv's global scientific community.
</p>
</div>
<div class="amount-donation">
<div class="wrapper">
<div class="donate-cta"><a class="banner_link" href="https://bit.ly/arXivDONATE1"><b>DONATE</b></a>
<p>[secure site, no need to create account]</p>
</div>
</div>
</div>
</div><noscript><img src="https://webstats.arxiv.org/matomo.php?idsite=1&amp;rec=1" style="border:0" alt="" /></noscript>
<div id="cu-identity">
<div id="cu-logo">
<a href="https://www.cornell.edu/"><img src="https://static.arxiv.org/static/browse/0.2.5/images/icons/cu/cornell-reduced-white-SMALL.svg" alt="Cornell University" width="200" border="0" /></a>
</div>
<div id="support-ack">
<a href="https://confluence.cornell.edu/x/ALlRF">We gratefully acknowledge support from<br/>the Simons Foundation and member institutions.</a>
</div>
</div>

<div id="header" >
<a aria-hidden="true" href="{url_path('ignore_me')}"></a>

<h1><a href="/">arXiv.org</a> &gt; <a href="/list/cs/recent">cs</a> &gt; arXiv:1904.10631</h1>



<div class="search-block level-right">
<form class="level-item mini-search" method="GET" action="https://arxiv.org/search">
<div class="field has-addons">
<div class="control">
<input class="input is-small" type="text" name="query" placeholder="Search..." aria-label="Search term or terms" />
<p class="help"><a href="https://arxiv.org/help">Help</a> | <a href="https://arxiv.org/search/advanced">Advanced Search</a></p>
</div>
<div class="control">
<div class="select is-small">
<select name="searchtype" aria-label="Field to search">
<option value="all" selected="selected">All fields</option>
<option value="title">Title</option>
<option value="author">Author</option>
<option value="abstract">Abstract</option>
<option value="comments">Comments</option>
<option value="journal_ref">Journal reference</option>
<option value="acm_class">ACM classification</option>
<option value="msc_class">MSC classification</optio	

In /plugins/af_readability/init.php, I replaced:

$tmp = mb_convert_encoding($tmp, 'utf-8', $tmpdoc->encoding);

with:

ini_set('mbstring.substitute_character', "none");
$tmp = mb_convert_encoding($tmp, 'utf-8', 'utf-8');

Seems to have fixed the problem.

This might be unrelated, but I also noticed a bunch of these types of errors in the Event Log:

Error	Filename	Message	User	Date
E_WARNING (2)	classes/rssutils.php:1217	DOMDocument::loadHTML(): Empty string supplied as input
1. classes/rssutils.php(1217): loadHTML()
2. classes/rssutils.php(814): cache_media(, http://forums.sailinganarchy.com/index.php?/discover/)
3. classes/rssutils.php(148): update_rss_feed(325, 1, )
4. update.php(250): update_daemon_common(50)

To resolve, I changed the line:

if ($doc->loadHTML($html)) {

…in static function cache_media of /classes/rssutils.php to:

if ($html && $doc->loadHTML($html)) {

…which also appears to have fixed the problem.
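
As a standalone illustration of the same guard, here is a minimal sketch that skips empty input before calling `DOMDocument::loadHTML()`. The function name `image_sources` and its behaviour (collecting `img` `src` attributes) are made up for the example and are not what `cache_media` in tt-rss does.

```php
<?php
// Hypothetical example function, not taken from tt-rss.
function image_sources(string $html): array
{
    // Guard: loadHTML() complains about empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    return $urls;
}
```

Checking the string first keeps the warning out of the Event Log without hiding genuine parse failures, which still make the function return an empty array.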