
Q: List or string array contains specific words in HTML source code

Stack Overflow user

Asked on 2014-10-14 08:01:48

2 answers, 199 views, 0 followers, 0 votes

I want to write a program that counts the HTML tags in a page's source code, so I wrote the following code to fetch a website's source:

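using System.IO;
using System.Net;

// WebRequest.Create picks the HTTP implementation from the URL scheme.
WebRequest req = WebRequest.Create("http://google.com");
req.Method = "GET";
string source;

// Dispose the response as well as the reader so the connection is released.
using (WebResponse response = req.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    source = reader.ReadToEnd();
}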

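This way I can fetch the site's source into a string. Next, I want to inspect that string and count html /html, body /body, p /p, and so on. What LINQ approach would count all the HTML tags in the source and display a result like:

HTML: 2 Body: 2 UL: 42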
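The counting convention asked for here (an opening and a closing tag each count once, so `<html>...</html>` gives `HTML: 2`) can be roughly approximated with LINQ and the standard library alone. The sketch below swaps in a regular expression for a real HTML parser, so it is only an approximation and will miscount tags that appear inside comments, `<script>` bodies, or attribute values; the names `TagCounter` and `CountTags` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Rough, regex-based sketch (not an HTML parser): counts opening and closing
// tags by name, so "<html>" and "</html>" together count as 2.
// Tags inside comments, <script> bodies, or attribute values can be miscounted.
public static class TagCounter
{
    // "<", an optional "/", then a tag name that starts with a letter.
    // "<!DOCTYPE ...>" and "<!-- ..." do not match because "!" is not a letter.
    private static readonly Regex TagPattern =
        new Regex(@"</?([a-zA-Z][a-zA-Z0-9]*)", RegexOptions.Compiled);

    public static Dictionary<string, int> CountTags(string html)
    {
        return TagPattern.Matches(html)
            .Cast<Match>()
            // HTML tag names are case-insensitive, so normalize to lowercase.
            .Select(m => m.Groups[1].Value.ToLowerInvariant())
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Called as `TagCounter.CountTags(source)` on the string fetched above, it returns a dictionary keyed by lowercase tag name. Void elements such as `<br>` are counted once because they have no closing tag. For real pages, a proper parser such as HtmlAgilityPack is the more reliable choice.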

Tags: arrays, linq, web

2 Answers

Stack Overflow user

Answered on 2014-10-14 08:07:55

You can use HtmlAgilityPack to parse the HTML and count all tags recursively:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(source);
int allTags = doc.DocumentNode.Descendants().Count();

If you only want to count a specific tag (e.g. UL), change Descendants() to Descendants("UL").

Note that this counts as one UL tag (not two):

   <ul>
      <li><a id="menuSubItem1"></a></li>
      <li><a id="menuSubItem2"></a></li>
   </ul>

You can also use HtmlAgilityPack to load and parse the HTML directly from the web:

var web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://google.com");
int countAll = doc.DocumentNode.Descendants().Count();
int countHtml = doc.DocumentNode.Descendants("HTML").Count();
int countBody = doc.DocumentNode.Descendants("BODY").Count();
int countUL = doc.DocumentNode.Descendants("UL").Count();

Votes: 3

Stack Overflow user

Answered on 2014-10-14 08:09:28

I recommend HtmlAgilityPack:

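using System.Linq;
using HtmlAgilityPack;

var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(source);
// Group every node by name. "* 2" presumably counts each element as an opening
// plus a closing tag, as the question does; void elements such as <br> have no
// closing tag, so they are over-counted.
var nodes = htmlDocument.DocumentNode
            .Descendants()
            .GroupBy(x => x.Name)
            .ToDictionary(x => x.Key, x => x.Count() * 2);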

This groups all tags by name, so you can get the count for a specific tag with nodes["html"].

Descendants also returns text nodes (the text between two tags counts as one node), and it includes comments as well. If you want only element nodes, you can add:

.Where(x => x.NodeType == HtmlNodeType.Element)

before the GroupBy.
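One caveat with the lookup: `nodes["html"]` throws `KeyNotFoundException` when a tag never occurs, and a plain `Dictionary<string, int>` compares keys case-sensitively. A minimal sketch of the same GroupBy/ToDictionary pattern with a case-insensitive comparer and a zero-default lookup follows. To stay self-contained it runs on plain strings standing in for `HtmlNode.Name` values, and it filters on the `#` prefix that HtmlAgilityPack uses for names such as `#text` and `#comment` instead of the `NodeType` check; `NodeNameCounts`, `CountElements`, and `CountOf` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the GroupBy/ToDictionary step, run on plain strings that stand in
// for HtmlNode.Name values. Non-element nodes in HtmlAgilityPack have names
// starting with "#" ("#text", "#comment", "#document"); dropping them here
// approximates the Where(x => x.NodeType == HtmlNodeType.Element) filter.
public static class NodeNameCounts
{
    public static Dictionary<string, int> CountElements(IEnumerable<string> nodeNames)
    {
        return nodeNames
            .Where(name => !name.StartsWith("#", StringComparison.Ordinal))
            .GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
            // Case-insensitive keys, so counts["UL"] and counts["ul"] both work.
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    // The indexer counts["table"] would throw KeyNotFoundException for a tag
    // that never appears; TryGetValue lets us return 0 instead.
    public static int CountOf(Dictionary<string, int> counts, string tag)
    {
        return counts.TryGetValue(tag, out int n) ? n : 0;
    }
}
```

With HtmlAgilityPack itself, the equivalent is passing `StringComparer.OrdinalIgnoreCase` to `ToDictionary` in the answer's query and reading counts through `TryGetValue`.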

Votes: 0

Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/26355616
