
Want to Read a Novel? Write Your Own Scraper Class That Reads Web Articles and Writes Them to a txt File

liulun
Published 2022-05-09 11:44:26
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace allen
{
    class Program
    {
        /// <summary>
        /// Fetches the HTML source of the given URL.
        /// </summary>
        /// <param name="url">Address of the page to download.</param>
        /// <returns>The response body as a string.</returns>
        static string GetHtml(string url)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            // Encoding.Default is the system ANSI code page on .NET Framework;
            // pass an explicit encoding if the page's charset differs.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream, Encoding.Default))
            {
                return reader.ReadToEnd();
            }
        }
        /// <summary>
        /// Filter that keeps only the article body.
        /// </summary>
        /// <param name="htmlStr">Raw HTML of one page.</param>
        /// <returns>The article text, or an empty string if the markers are not found.</returns>
        static string MyFilter(string htmlStr)
        {
            // Strip all whitespace first so the markers below match regardless of page layout.
            htmlStr = Regex.Replace(htmlStr, @"\s+", "");
            // Match the article body between the download link and the start of the next table row.
            Regex reg = new Regex("点此下载封神演义.txt</font></font></a></div></td>.*</div></td></tr><tr><tdclass=");
            Match match = reg.Match(htmlStr);
            string result = match.Value;
            // Remove the markers themselves and other leftovers, then turn <br> into line breaks.
            result = result.Replace("点此下载封神演义.txt</font></font></a></div></td>", "");
            result = result.Replace("</div></td></tr><tr><tdclass=","");
            result = result.Replace("</tr></table>", "");
            result = result.Replace("本文章下载于www.Txt66.com", "");
            result = result.Replace("<br>",Environment.NewLine);
            return result;
        }
        /// <summary>
        /// Reads the article page by page and appends each page to a text file.
        /// </summary>
        static void WriteFile()
        {
            string url = "http://www.txt66.com/read2.asp?id=8480&PageNum={0}";
            // Append mode; the using block closes the file even if a request throws.
            using (StreamWriter sw = new StreamWriter(@"F:\g.txt", true, Encoding.Unicode))
            {
                for (int page_num = 1; page_num < 124; page_num++)
                {
                    string html = GetHtml(string.Format(url, page_num));
                    string text = MyFilter(html);
                    sw.Write(text);
                    Console.WriteLine("Wrote page {0}", page_num);
                    // Pause between requests so the server is not hammered.
                    System.Threading.Thread.Sleep(600);
                }
            }
        }
        /// <summary>
        /// Entry point.
        /// </summary>
        /// <param name="args">Command-line arguments (unused).</param>
        static void Main(string[] args)
        {
            WriteFile();
            Console.ReadKey();
        }
    }
}
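The filtering step can be tried without touching the network. The following is only a minimal sketch of the same idea (strip whitespace, grab everything between two markers, convert `<br>` to line breaks). The class name `FilterDemo`, the method `Extract`, and the two comment-style markers are hypothetical stand-ins for the site-specific HTML fragments used above, and the sample page is made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class FilterDemo
{
    // Hypothetical markers; the real page uses longer HTML fragments instead.
    const string Start = "<!--begin-->";
    const string End = "<!--end-->";

    public static string Extract(string html)
    {
        // Remove all whitespace so the markers match regardless of page layout.
        string compact = Regex.Replace(html, @"\s+", "");
        // Greedy capture: from the first start marker to the last end marker.
        Match m = Regex.Match(compact, Regex.Escape(Start) + "(.*)" + Regex.Escape(End));
        if (!m.Success)
        {
            return string.Empty;
        }
        // Group 1 holds the body without the markers; restore the line breaks.
        return m.Groups[1].Value.Replace("<br>", Environment.NewLine);
    }

    static void Main()
    {
        // Made-up sample page with the body spread over several lines.
        string sample = "<html>\n<!--begin-->\n第一回<br>\n纣王女娲宫进香\n<!--end-->\n</html>";
        Console.WriteLine(Extract(sample));
    }
}
```

Using a capture group avoids the follow-up `Replace` calls that strip the markers. Note that removing every whitespace character also deletes spaces inside the article text, which is harmless for Chinese prose but would run English words together.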
Originally published 2010-03-06 on the author's personal site/blog.
