Save HTTP error when status code of feed response is neither 200-299 nor 304 instead of LibXML Parse Error


#1

When feed URL returns HTTP response with status code 400-599, tt-rss saves parsing error (it’s logical: its body is invalid in this case).
So I offer to save HTTP status code with status phrase (i.e. almost whole first line from headers of HTTP response) to last_error field of the feed in the database in order to more properly explain current problems with feed.


#2

shouldn’t this be stored already? i think at least when using curl status is part of the error message.


#3

For example, I get error

LibXML error 4 at line 1 (column 1): Start tag expected, ’

if feed status is between 400 and 599 (response body is HTML or plain text [with appropriate Content-Type HTTP header] instead of XML). cURL is installed.


#4

that looks like a minor bug: fetch_file_contents() returns false if response code is not 200. there’s no check afterwards so empty document is passed to feedparser - even though it shouldn’t - which tries to parse it and overrides the error message.

e: to get back to the original question, i wonder if curl_errno() / curl_error() includes status information sent by server (functions.php:454 etc)

e2: oh i might be wrong, looks like i’ve been looking in the basic feed info function, not actual feed update. hmmm.

-btw i get a strange error message without curl on 404: “[09:50:47/31743] unable to fetch: Undefined offset: 1 [404]”- <-- nvm

e3: looks like curl_error() is not set on 404 and such, i guess this makes sense.


#5

is this better?

[10:01:48/32554] unable to fetch: HTTP/1.1 404 Not Found [404]

that’s the entire status line. error body is not parsed though, i don’t think it’s a particularly good idea.

e: https://git.tt-rss.org/git/tt-rss/commit/9d930af9e109884f219a2254dc444c7a943b1e6e i think this should be an overall improvement, comments as usual welcome

in other news i give up on trying to force tab indenting on phpstorm, i guess i might as well just give in and reformat everything into spaces. damn.


#6

Yeap, I agree. HTTP code and status phrase are enough informative.

Perhaps should "HTTP/1.1 " be removed? Or is this part in log files only?
And status code is presented multiple times, but it isn’t so significant.


#7

second http code in brackets is debugging output only, i guess http/x.x prefix might be stripped for brevity