Filtering an RSS Feed to Build a New Full-Text RSS Feed

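Problem: a site you like publishes only a summary RSS feed, but you want to read the full text of each article in your RSS reader, and you want the feed to deliver only certain articles.

Solution: use Huginn to build a filtered full-text RSS feed from a chain of four agents: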

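  1. RSSAgent: fetches and parses the RSS feed the site provides;
  2. TriggerAgent: filters the items in the feed;
  3. WebsiteAgent: fetches the full article for each item that passes the filter;
  4. DataOutputAgent: publishes the full-text RSS feed.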

1. RSSAgent

Name: Example RSS In

    {
      "expected_update_period_in_days": "14",
      "clean": "false",
      "url": "http://www.businesscat.happyjar.com/feed/"
    }
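To make the data flow concrete, here is a minimal Python sketch of what this step hands to the next agent. It is not Huginn's code (Huginn itself is written in Ruby); it parses an inline sample feed with the standard library and turns each item into a dict. The sample XML, the function name `parse_feed`, and the key names `title`, `url`, and `date_published` are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample feed; a real feed would be downloaded from the "url" option.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>A comic</title>
      <link>http://example.com/comic/one/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>A news post</title>
      <link>http://example.com/news/two/</link>
      <pubDate>Tue, 02 Jan 2018 00:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one event dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("item"):
        events.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "date_published": item.findtext("pubDate", default=""),
        })
    return events


events = parse_feed(SAMPLE_FEED)
for event in events:
    print(event["url"])
```

Each dict plays the role of one event; the later agents in the chain read and extend these fields.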

2. TriggerAgent

Name: Example filter
Event sources: Example RSS In
Propagate immediately: Yes

    {
      "expected_receive_period_in_days": "14",
      "keep_event": "true",
      "rules": [
        {
          "type": "regex",
          "value": ".*\\/comic\\/.*",
          "path": "url"
        }
      ]
    }

Note: set keep_event to true so that the parsed item fields are passed on to the next agent.
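The effect of the regex rule can be checked outside Huginn. The sketch below is an approximation in Python, not Huginn's implementation: it decodes the rule from JSON, treats `path` as a plain dictionary key (Huginn evaluates it as a JSONPath), and keeps only the events whose `url` matches. The sample events and the helper name `matches_rule` are hypothetical.

```python
import json
import re

# The rule exactly as it appears in the agent options (JSON-escaped backslashes).
RULE_JSON = r'{"type": "regex", "value": ".*\\/comic\\/.*", "path": "url"}'


def matches_rule(event, rule):
    """True when the event field named by rule["path"] matches the regex."""
    value = str(event.get(rule["path"], ""))
    return re.search(rule["value"], value) is not None


rule = json.loads(RULE_JSON)
events = [
    {"title": "A comic", "url": "http://example.com/comic/one/"},
    {"title": "A news post", "url": "http://example.com/news/two/"},
]

# Passing the whole event through mirrors keep_event: true.
kept = [event for event in events if matches_rule(event, rule)]
print([event["title"] for event in kept])
```

After JSON decoding the pattern is `.*\/comic\/.*`, so any URL containing `/comic/` passes and everything else is dropped.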

3. WebsiteAgent

Name: Example page fetch
Event sources: Example filter
Propagate immediately: Yes

    {
      "expected_update_period_in_days": "14",
      "url": "{{url}}",
      "type": "html",
      "mode": "merge",
      "extract": {
        "imgurl": {
          "css": "#comic img",
          "value": "@src"
        }
      }
    }

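Note: set mode to merge so that the extracted values are merged into the incoming event and the parsed item fields are passed on to the next agent.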
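As a rough illustration of the extraction and the merge, the Python sketch below finds every `<img>` nested inside the element with `id="comic"` and copies its `src` into a copy of the incoming event under the key `imgurl`. This only imitates the CSS selector `#comic img` with the standard-library HTML parser; Huginn's own extraction works differently, and the sample page, the class `ComicImageFinder`, and the function `extract_and_merge` are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

SAMPLE_PAGE = """<html><body>
<img src="/logo.png">
<div id="comic"><a href="#"><img src="http://example.com/img/one.png" /></a></div>
<img src="/footer.png">
</body></html>"""


class ComicImageFinder(HTMLParser):
    """Collect the src of every <img> nested inside the element with id="comic"."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside #comic; 0 means outside
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.depth > 0:
            if tag == "img" and "src" in attributes:
                self.sources.append(attributes["src"])
            if tag not in VOID_TAGS:
                self.depth += 1
        elif attributes.get("id") == "comic" and tag not in VOID_TAGS:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def extract_and_merge(event, html):
    """Return one merged event per matching image, keeping the incoming fields."""
    finder = ComicImageFinder()
    finder.feed(html)
    finder.close()
    return [{**event, "imgurl": src} for src in finder.sources]


incoming = {"title": "A comic", "url": "http://example.com/comic/one/"}
merged = extract_and_merge(incoming, SAMPLE_PAGE)
print(merged)
```

The merged event still carries `title` and `url` from the feed item, which is what lets the last agent build a complete RSS entry.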

4. DataOutputAgent

Name: Example Rss out
Event sources: Example page fetch
Propagate immediately: Yes

    {
      "secrets": [
        "examplerss"
      ],
      "expected_receive_period_in_days": "14",
      "template": {
        "title": "Business Cat full comic feed",
        "description": "This is a feed of recent Business Cat comics generated by Huginn",
        "item": {
          "title": "{{title}}",
          "description": "<img src=\"{{imgurl}}\" />",
          "link": "{{url}}",
          "pubDate": "{{date_published}}"
        }
      }
    }
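The `item` template maps fields of the merged event to the elements of one RSS entry. A hedged Python sketch of that mapping follows, assuming the event carries `title`, `url`, `date_published`, and `imgurl` keys (the key names and the function `render_item` are assumptions for illustration, not Huginn's API).

```python
import xml.etree.ElementTree as ET


def render_item(event):
    """Build one RSS <item> element from a merged event dict."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = event["title"]
    # The description holds HTML; the serializer escapes it for XML.
    ET.SubElement(item, "description").text = '<img src="{}" />'.format(event["imgurl"])
    ET.SubElement(item, "link").text = event["url"]
    ET.SubElement(item, "pubDate").text = event["date_published"]
    return item


event = {
    "title": "A comic",
    "url": "http://example.com/comic/one/",
    "date_published": "Mon, 01 Jan 2018 00:00:00 +0000",
    "imgurl": "http://example.com/img/one.png",
}
xml_text = ET.tostring(render_item(event), encoding="unicode")
print(xml_text)
```

Because the image tag is stored as escaped text inside `<description>`, a feed reader that unescapes it will show the comic inline instead of a summary.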

This article was translated by Huginn 中文网 with the permission of the project author. The original article is "Generating a filtered full-text RSS feed from an existing RSS feed".
