Retriever
This is an addon for the Friendica social network server. Many RSS feeds supply abbreviated summaries instead of full articles. This addon replaces those with full articles, by retrieving the upstream webpage and extracting the relevant portion. This allows reading this content within Friendica, or an attached client app, rather than visiting the upstream webpage yourself.
The Problem with RSS
Among many other features, Friendica allows "following" an RSS feed as if it were a social media user. Each news story or blog entry appears in your regular timeline intermixed with tweets from your friends. Depending on personal preference, this may be a convenient way to keep up-to-date with the latest news, or keep track of infrequently-posting blogs.
RSS allows much more flexibility for reading articles. It separates the content from the presentation, which removes the distracting navigation, advertising, popups, etc. that litter modern web design. It also allows accumulating articles in a separate repository, which can be browsed while offline. Articles can be tagged, sorted, and saved on the client side.
However, RSS feeds frequently contain only a short summary or excerpt of the article, not the article itself. In particular, this makes offline browsing rather useless.
Solution
This addon reacts to incoming articles by fetching the upstream article over HTTP. For each feed, the user must supply some fairly complicated configuration similar to CSS selectors. The addon will extract just those portions of the retrieved web page, and replace the RSS summary with those results.
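The extraction step can be illustrated with a minimal sketch. This is not the addon's own code (the addon is written in PHP). It assumes a well-formed page that Python's standard XML parser accepts, and the function name `extract_article`, the `item` dictionary and the sample markup are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_article(page: str, tag: str, css_class: str) -> Optional[str]:
    """Return the first <tag> element whose class list contains css_class.

    Hypothetical helper: a rough stand-in for a rule such as "div.story".
    """
    root = ET.fromstring(page)
    for element in root.iter(tag):
        if css_class in element.get("class", "").split():
            element.tail = None  # drop text that follows the closing tag
            return ET.tostring(element, encoding="unicode")
    return None


# Sample upstream page (assumed well-formed; real pages often are not).
page = """<html><body>
<nav>Home | Sport</nav>
<div class="story main"><h1>Title</h1><p>Full text of the article.</p></div>
<footer>Copyright</footer>
</body></html>"""

# The feed item as it arrived, with only a summary as its body.
item = {"title": "Title", "body": "A short summary..."}

article = extract_article(page, "div", "story")
if article is not None:
    # Replace the summary only when something matched; otherwise
    # the feed's own summary stays in place.
    item["body"] = article

print(item["body"])
```

A real page would first have to be fetched over HTTP and parsed with a tolerant HTML parser. The sketch skips both so that it runs without network access.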
Many articles contain embedded images that are essential when reading the page. The addon detects these images and retrieves those too. However, it does not attempt to retrieve other embedded content such as YouTube videos. In many cases the articles will remain useless unless the user is able to follow the embedded link to the upstream content.
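As a rough illustration of the image detection step, the sketch below collects the image addresses found in an extracted article and resolves them against the page URL, so that each one could be fetched separately. It is an assumption about the general approach, not the addon's implementation. The function name `image_urls`, the sample markup and the example hosts are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def image_urls(article_html: str, page_url: str) -> list:
    """Return absolute URLs for every <img src="..."> in the article."""
    root = ET.fromstring(article_html)
    return [
        urljoin(page_url, img.get("src"))  # resolve relative paths
        for img in root.iter("img")
        if img.get("src")  # skip images without a src attribute
    ]


# Sample extracted article (assumed well-formed for the stdlib parser).
article_html = (
    '<div class="story">'
    '<img src="/img/photo.jpg" alt="Photo" />'
    '<p>Text</p>'
    '<img src="chart.png" alt="Chart" />'
    '<img src="https://cdn.example/map.png" alt="Map" />'
    '</div>'
)

urls = image_urls(article_html, "https://news.example/2026/story.html")
for url in urls:
    print(url)
```

Only `img` elements are considered here, which matches the limitation described above: other embedded content, such as video players, is left alone.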
This plugin is designed to work well with the mailstream addon. Offline reading of email has been a solved problem for many decades, and ordinary users have come to rely on this. So most client devices continue to implement solid offline email support out of the box. RSS support, by contrast, has been in notable decline for some time.
Drawbacks
Configuring the correct content requires some rather sophisticated work. In general I use the HTML inspector to find the outermost HTML element that contains the full article. I hope that there is some obvious and consistent class or id. Often there is no clear choice, because the site uses a complex dynamic framework such as React, or because it is obfuscated specifically to prevent scraping. This addon does not provide any special techniques to counteract deliberate countermeasures.
Frequently this leaves additional chunks of content, such as navigation banners or related article blocks, which are also undesired. So there is an additional configuration section for removing sections.
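The removal pass can be sketched in the same spirit. The example below is a minimal illustration, not the addon's PHP code. It assumes a well-formed article fragment, and the helper name `remove_by_class` and the class names `related` and `share-bar` are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_by_class(root: ET.Element, css_class: str) -> None:
    """Delete every descendant whose class list contains css_class.

    Note: text that directly follows a removed element (its "tail")
    is dropped along with it.
    """
    # Snapshot the tree first so that removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if css_class in child.get("class", "").split():
                parent.remove(child)


# Sample extracted article that still contains unwanted blocks.
article = ET.fromstring(
    '<div class="story">'
    '<p>First paragraph.</p>'
    '<aside class="related">Related articles</aside>'
    '<p>Second paragraph.</p>'
    '<div class="share-bar"><a href="#">Share</a></div>'
    '</div>'
)

# Hypothetical per-feed list of sections to strip.
for unwanted in ("related", "share-bar"):
    remove_by_class(article, unwanted)

cleaned = ET.tostring(article, encoding="unicode")
print(cleaned)
```

Running the removal after the extraction keeps the two rule sets independent: the first picks the outermost container, the second trims whatever the site places inside it.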
After completing all of this work, it is likely that a news site will entirely refactor its implementation without warning. Typically this results in the scraping attempt failing, which leaves the original excerpt in place. In some cases it can result in the articles becoming unreadable. So there is usually ongoing maintenance work. However, as of 2026, I have noticed a marked decrease in this work. Websites are gradually accepting semantic HTML as best practice. Also, the business case for newsrooms to invest in rapid software development has largely evaporated.
Installation
To install this software, clone the repository inside your addon directory:
# git clone https://git.exon.name/mat/retriever.git /var/www/friendica/addon/retriever
Then log in to your Friendica instance as admin, and enable the "Retriever" addon on the Configuration/Addons page, for example https://my-friendica.test/admin/addons
From here, refer to the addon help page, for example https://my-friendica.test/retriever/help.