Q: Automatically scan RSS feeds and populate the WebContent model

Stack Overflow user

Asked 2016-08-04 11:47:27

1 answer, 111 views, 0 followers, 0 votes

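I am trying to create a Django server application (currently running on localhost) that regularly checks (i.e. once an hour) the RSS feeds listed in the Blogger model, then extracts data from them to populate the WebContent model.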

So far, I have created a data endpoint at http://127.0.0.1:8000/api/blogger/ that outputs the following:

[
    {
        "id": "c384f191-662f-43f9-a39d-2da737e7cbb8",
        "name": "Patricia Bright",
        "avatar": "http://127.0.0.1:8000/media/img/1470305802086_IMG_5921.JPG",
        "rss_url": "http://patriciabright.co.uk/?feed=rss2"
    },
    {
        "id": "dc70ca6b-94cc-4ba9-a0c8-0d907f7ab020",
        "name": "Shirley B. Eniang",
        "avatar": "http://127.0.0.1:8000/media/img/1470305797487_photo.jpg",
        "rss_url": "http://shirleyswardrobe.com/feed/"
    }
]

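Now I want to loop over the rss_url values above and extract specific information from each RSS feed to populate the WebContent model. I want this to run every hour, and before a WebContent row is written it should check whether the data already exists, so that I do not end up with duplicates.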
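As a rough illustration of the loop described above, here is a minimal sketch that uses only the standard library. It swaps feedparser for xml.etree.ElementTree so that it runs without extra packages, and it only handles plain RSS 2.0. The names SAMPLE_FEED, extract_posts, collect_new_posts, fetch and seen_links are all hypothetical: fetch stands in for whatever downloads a feed, and seen_links stands in for the "does this already exist?" check against the database.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a real feed download.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def extract_posts(feed_xml):
    """Return a list of (title, link) pairs found in an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # an item without a link cannot be de-duplicated, so skip it
            posts.append((title, link))
    return posts


def collect_new_posts(bloggers, fetch, seen_links):
    """Loop over blogger dicts and return only posts whose link is not in seen_links."""
    new_posts = []
    for blogger in bloggers:
        for title, link in extract_posts(fetch(blogger["rss_url"])):
            if link in seen_links:
                continue  # already stored on an earlier run
            seen_links.add(link)
            new_posts.append({"blogger": blogger["id"], "title": title, "url": link})
    return new_posts


# Usage: the same feed scanned twice only yields posts the first time.
bloggers = [{"id": "dc70ca6b-94cc-4ba9-a0c8-0d907f7ab020",
             "rss_url": "http://shirleyswardrobe.com/feed/"}]
seen = set()
first_run = collect_new_posts(bloggers, lambda url: SAMPLE_FEED, seen)
second_run = collect_new_posts(bloggers, lambda url: SAMPLE_FEED, seen)
```

Passing fetch in as a parameter keeps the parsing and de-duplication logic testable without network access. In the real application the set would presumably be replaced by a database lookup on the WebContent table.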

This is what I have so far in models.py:

from uuid import uuid4
from time import time
from django.db import models
from django.contrib.contenttypes.models import ContentType
import feedparser


def get_upload_avatar_path(instance, filename):
    timestamp = int(round(time() * 1000))
    path = "img/%s_%s" % (timestamp, filename)
    return path


class Blogger(models.Model):
    """
    Blogger model
    """
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    name = models.CharField(max_length=255, null=True, default=None)
    avatar = models.ImageField(upload_to=get_upload_avatar_path, blank=True, null=True, default=None, max_length=255)
    url = models.CharField(max_length=255, null=True, default=None)
    rss_url = models.CharField(max_length=255, null=True, default=None)
    instagram_url = models.CharField(max_length=255, null=True, default=None)
    twitter_url = models.CharField(max_length=255, null=True, default=None)
    youtube_url = models.CharField(max_length=255, null=True, default=None)

    class Meta:
        verbose_name_plural = "Bloggers"

    def __str__(self):
        return "%s" % (self.name)

    def generate_web_content(self):
        """
        Scan for blogger RSS feeds and generate web content
        :return: None
        """
        web_content = WebContent.objects.create(user_profile=self)
        self._scan_web_content(web_content)

    def _scan_web_content(self, web_content=None):
        """
        Scan blogger RSS feeds
        :param report: Associated WebContent object
        :return: None
        """
        urls = Blogger.objects.all()
        d = feedparser.parse(urls['rss_url'])
        for post in d.entries:
            blogger = self
            title = post.title.encode('ascii', 'ignore')
            url = post.link.encode('ascii', 'ignore')


class WebContent(models.Model):
    """
    Model to store blogger web content
    """
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    blogger = models.ForeignKey(Blogger)
    title = models.CharField(max_length=255, null=True, default=None)
    url = models.CharField(max_length=255, null=True, default=None)

    class Meta:
        verbose_name_plural = "Web Content"

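I have successfully mocked up an implementation in a standalone Python file, and it works well. I think what I am trying to do now is port it into the Django application.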

import feedparser
import json
import sys
import os


os.system('cls')


# Import json
with open('bloggers.json') as jsonfile:
    j = json.load(jsonfile)


for blogger in j['bloggers']:
    print (blogger['name'])
    print "---------------------"

    d = feedparser.parse(blogger['rssUrl'])
    for post in d.entries:
        print post.title.encode('ascii', 'ignore') + ": " + post.link.encode('ascii', 'ignore') + "\n"

Any help would be greatly appreciated.

django

rss

python

json

Stack Overflow user

Accepted answer

Posted 2016-08-06 14:00:01

There seem to be several problems in your code:

In the generate_web_content method, you create the WebContent object by passing the argument user_profile=self, but it should be blogger=self.
In the _scan_web_content method, you query all Blogger objects with urls = Blogger.objects.all(). urls is therefore a queryset object, and you cannot access a key on it like urls['rss_url']; instead you should do d = feedparser.parse(self.rss_url).
In the for loop, you should set attributes on the WebContent object that was passed in as the argument, like this:

    for post in d.entries:
        web_content.blogger = self
        web_content.title = post.title.encode('ascii', 'ignore')
        web_content.url = post.link.encode('ascii', 'ignore')
        web_content.save()

Otherwise, this method does nothing.
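One thing to watch with the loop in the third point: saving the same WebContent object on every iteration updates a single row, so only the last post would remain. The question also asks for a duplicate check. A possible way to cover both is Django's get_or_create, which returns an (object, created) tuple. The sketch below is an assumption, not the answerer's code. It uses a hypothetical FakeManager in place of WebContent.objects so that it runs without Django, and store_entries is an illustrative name. Entries are treated as dicts with "title" and "link" keys.

```python
class FakeManager:
    """In-memory stand-in for WebContent.objects (hypothetical, illustration only)."""

    def __init__(self):
        self.rows = []

    def get_or_create(self, defaults=None, **lookup):
        # Same return shape as Django's get_or_create: (object, created).
        for row in self.rows:
            if all(row.get(key) == value for key, value in lookup.items()):
                return row, False
        row = dict(lookup, **(defaults or {}))
        self.rows.append(row)
        return row, True


def store_entries(blogger, entries, manager):
    """Create one row per feed entry, skipping URLs already stored for this blogger."""
    created_count = 0
    for entry in entries:
        # Look up by (blogger, url); the title is only used when a new row is created.
        _, created = manager.get_or_create(
            blogger=blogger,
            url=entry["link"],
            defaults={"title": entry["title"]},
        )
        if created:
            created_count += 1
    return created_count


# Usage: a second scan of the same entries creates nothing new.
manager = FakeManager()
entries = [
    {"title": "First post", "link": "http://example.com/1"},
    {"title": "Second post", "link": "http://example.com/2"},
]
created_first = store_entries("blogger-1", entries, manager)
created_second = store_entries("blogger-1", entries, manager)
```

In the Django model this would roughly correspond to calling WebContent.objects.get_or_create(blogger=self, url=post.link, defaults={'title': post.title}) inside the loop over d.entries. To run the scan every hour, one common option is to wrap it in a custom management command and schedule that command with cron or a similar scheduler.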

Hope that clarifies things!

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/38766502
