Tiny Tiny RSS: Community

Importing entries from other RSS readers


#1

Inoreader supports importing and exporting entries (including starred entries) in a JSON format.
Here is an example:

{
  "crawlTimeMsec":"1516226902000",
  "timestampUsec":"1516226902000000",
  "id":"tag:google.com,2005:reader\/item\/000000035d74960b",
  "categories":[
    "user\/1006616538\/state\/com.google\/reading-list",
    "user\/1006616538\/state\/com.google\/read",
    "user\/1006616538\/state\/com.google\/starred"
  ],
  "title":" THIS IS Titile ​ THIS IS Titile  THIS IS Titile  THIS IS Titile  THIS IS Titile  THIS IS Titile  THIS IS Titile ",
  "published":1516210059,
  "updated":1516230449,
  "starred":1516226902,
  "canonical":[
    {
      "href":"http::\/\/www.google.com"
    }
  ],
  "alternate":[
    {
      "href":"http::\/\/www.google.com",
      "type":"text\/html"
    }
  ],
  "summary":{
    "direction":"ltr",
    "content":"THIS IS CONTENT THIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENTTHIS IS CONTENT"
  },
  "author":"",
  "likingUsers":[
    
  ],
  "comments":[
    
  ],
  "commentsNum":-1,
  "annotations":[
    
  ],
  "origin":{
    "streamId":"feed\/http:\/\/www.google.com/feed.xml",
    "title":"SOMEONE's RSS FEED",
    "htmlUrl":"http:\/\/www.google.com"
  }
},

I think this would be a cool feature for a seamless migration from other readers to TT-RSS.

Currently TT-RSS can import the list of feeds (OPML) and fetch the entries (contents) from that list. Since the remote feed might not preserve the full history, it is hard to transfer the starred entries from the old reader to TT-RSS.

It would be awesome if TT-RSS could import a list of entries (like the Inoreader JSON format), sort them into the corresponding feeds by URL, and maybe also star them automatically.
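
To make the idea concrete, here is a minimal sketch (plain Python, not TT-RSS code) of how one exported item could be matched to its feed and checked for a star. It assumes that `streamId` is always `feed/` followed by the feed URL and that a category ending in `/state/com.google/starred` marks a starred entry, as in the sample above. The function name `summarize_item` is made up for illustration.

```python
import json


def summarize_item(item):
    """Return (feed_url, link, is_starred) for one exported item."""
    stream_id = item["origin"]["streamId"]
    # Assumption: streamId is "feed/" + feed URL, as in the sample export.
    prefix = "feed/"
    feed_url = stream_id[len(prefix):] if stream_id.startswith(prefix) else stream_id
    # Some items may lack a canonical link; fall back to None.
    canonical = item.get("canonical") or []
    link = canonical[0]["href"] if canonical else None
    # Assumption: this category suffix marks a starred entry.
    is_starred = any(
        category.endswith("/state/com.google/starred")
        for category in item.get("categories", [])
    )
    return feed_url, link, is_starred


# A trimmed-down item in the same shape as the sample export.
sample = json.loads(r'''
{
  "categories": ["user\/1006616538\/state\/com.google\/read",
                 "user\/1006616538\/state\/com.google\/starred"],
  "canonical": [{"href": "http:\/\/www.google.com"}],
  "origin": {"streamId": "feed\/http:\/\/www.google.com/feed.xml"}
}
''')
print(summarize_item(sample))
```

An importer could then group items by the returned feed URL and set the star flag on each one.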


#2

Sorry, this is basically what the import/export plugin already does.

But for some reason the download doesn’t start on my computer, and I have to ssh into the server to get the exported XML file from the cache folder.


#3

If it’s not downloading you can check the HTTP response code in the browser and the logs on the server.

Regarding the original post, OPML is a standardized format; so it’s a good choice for TT-RSS. Anything else can be achieved through plugins.


#4

looks like i partially broke that plugin, thanks for reporting

should be fixed by https://git.tt-rss.org/fox/tt-rss/commit/d7282ec292a79120709e93ba9b1c73d0077d871b


#5

I will try writing a script that converts the Inoreader JSON to the TT-RSS XML format.

Just in case I reinvent the wheel, is there any similar converter that exists already?


#6

I wrote a script to convert the Inoreader JSON to the TT-RSS XML format.

import hashlib
import json
from datetime import datetime, timezone

from lxml import etree

filename = "starred"


def add_child(parent, tag, text=None, cdata=True):
    """Append a child element; wrap its text in CDATA unless cdata is False."""
    child = etree.SubElement(parent, tag)
    if text is not None:
        child.text = etree.CDATA(text) if cdata else text
    return child


with open(filename + ".json", encoding="utf-8") as json_file:
    data = json.load(json_file)["items"]

articles = etree.Element("articles", attrib={"schema-version": "137"})
for item in data:
    content = item["summary"]["content"]
    # The export carries no TT-RSS GUID, so derive one from a hash of the content.
    guid = "SHA1:" + hashlib.sha1(content.encode("utf-8")).hexdigest()
    # streamId looks like "feed/<feed url>"; drop the "feed/" prefix.
    feed_url = item["origin"]["streamId"][5:]
    # "published" is a Unix timestamp in seconds; format it as UTC.
    updated = datetime.fromtimestamp(item["published"], tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

    article = etree.SubElement(articles, "article")
    add_child(article, "guid", guid)
    add_child(article, "title", item["title"])
    add_child(article, "content", content)
    add_child(article, "marked", "1", cdata=False)  # every entry in this export is starred
    add_child(article, "published", "0", cdata=False)
    add_child(article, "score", "0", cdata=False)
    add_child(article, "note")  # left empty
    add_child(article, "link", item["canonical"][0]["href"])
    add_child(article, "tag_cache")  # left empty
    add_child(article, "feed_title", item["origin"]["title"])
    add_child(article, "feed_url", feed_url)
    add_child(article, "updated", updated)

outputxml = etree.tostring(articles, pretty_print=True, encoding="utf-8")
with open(filename + ".xml", "wb") as opt:
    opt.write(outputxml)
print(outputxml.decode("utf-8"))
print(len(data))  # number of converted entries

Then I tried to upload the XML file, but ran into several problems. First, it wouldn’t upload because of the max_header_size limit on the PHP and Nginx side. I increased both to 120 MB, and then it worked for files under 2 MB.

However, when I try uploading a 10 MB XML file, it shows this error (the message is Chinese for “method not found”):

{"error":{"code":13,"message":"\u627e\u4e0d\u5230\u65b9\u6cd5(Method)"}}

Is the file too large?


#7

it probably is too large for DOMDocument to load (see memory_limit in php.ini)

i suggest generating several files of smaller size

e: btw, if you have import_export enabled in config.php you can import from command line using update.php

e: also, it should be post_max_size to allow larger files to upload
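
A minimal sketch of the "several smaller files" suggestion, assuming the XML has the `<articles schema-version="...">` root with `<article>` children produced by the script in #6. The function names and the default of 500 articles per file are made up for illustration; the right size depends on memory_limit. It uses the standard library parser, which writes CDATA sections back as escaped text; that is equivalent XML, but the files will not look byte-for-byte like the original.

```python
import xml.etree.ElementTree as ET


def split_articles(root, per_file):
    """Split an <articles> root into roots holding at most per_file articles each."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    articles = root.findall("article")
    chunks = []
    for start in range(0, len(articles), per_file):
        # Copy the root tag and attributes (e.g. schema-version) to every chunk.
        chunk = ET.Element(root.tag, dict(root.attrib))
        for article in articles[start:start + per_file]:
            chunk.append(article)
        chunks.append(chunk)
    return chunks


def split_file(path, per_file=500):
    """Write numbered part files next to path and return their names."""
    root = ET.parse(path).getroot()
    base = path.rsplit(".", 1)[0]
    names = []
    for number, chunk in enumerate(split_articles(root, per_file), start=1):
        name = f"{base}.part{number}.xml"
        ET.ElementTree(chunk).write(name, encoding="utf-8", xml_declaration=True)
        names.append(name)
    return names


# Example (hypothetical file name):
# split_file("starred.xml", per_file=500)
```

Each part file can then be imported on its own, either through the web upload or from the command line.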


#8

Great!
I have now imported everything from my Inoreader account.

But I also found a problem along the way. The imported articles are not searchable, and the Feed Debugger trick doesn’t seem to work for them. So I had to update the index manually with SQL:

update ttrss_entries set tsvector_combined = to_tsvector(content);


#9

thanks for reporting this; tsvector_index has likely been added after this plugin was initially written, i’ll make a note to update it so that the index is generated properly


#10

i didn’t test it but it should work