A Fix for StackOverflowException in the HtmlAgilityPack Library

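I recently tried out HtmlAgilityPack for parsing HTML, and during the trial the program threw a StackOverflowException. According to MSDN, starting with .NET Framework 2.0 a StackOverflowException object can no longer be caught by a try-catch block, and by default the corresponding process is terminated.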

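Looking into the cause, I found that when an HTML document has a very complex (deeply nested) structure, HtmlAgilityPack recurses very deeply, and that is what raises the StackOverflowException. After some googling, I found the solution below.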

First, add a new class to the library:

	
using System;
using System.Runtime.InteropServices;

// Requires compiling with unsafe code enabled (/unsafe or <AllowUnsafeBlocks>).
public class StackChecker
{
    public static unsafe bool HasSufficientStack(long bytes)
    {
        var stackInfo = new MEMORY_BASIC_INFORMATION();

        // We subtract one page for our request. VirtualQuery rounds UP to the next page.
        // Unfortunately, the stack grows down. If we're on the first page (last page in the
        // VirtualAlloc), we'll be moved to the next page, which is off the stack! Note this
        // doesn't work right for IA64 due to bigger pages.
        // The address is held as a long so it is not truncated in a 64-bit process.
        IntPtr currentAddr = new IntPtr((long)&stackInfo - 4096);

        // Query for the current stack allocation information.
        VirtualQuery(currentAddr, ref stackInfo,
            new UIntPtr((uint)sizeof(MEMORY_BASIC_INFORMATION)));

        // If the current address minus the base (remember: the stack grows downward in the
        // address space) is greater than the number of bytes requested plus the reserved
        // space at the end, the request has succeeded.
        return (currentAddr.ToInt64() - stackInfo.AllocationBase.ToInt64()) >
            (bytes + STACK_RESERVED_SPACE);
    }

    // We are conservative here. We assume that the platform needs a whole 16 pages to
    // respond to stack overflow (using an x86/x64 page-size, not IA64). That's 64KB,
    // which means that for very small stacks (e.g. 128KB) we'll fail a lot of stack checks
    // incorrectly.
    private const long STACK_RESERVED_SPACE = 4096 * 16;

    [DllImport("kernel32.dll")]
    private static extern UIntPtr VirtualQuery(
        IntPtr lpAddress,
        ref MEMORY_BASIC_INFORMATION lpBuffer,
        UIntPtr dwLength);

    // Pointer-sized fields keep the layout correct in both 32-bit and 64-bit processes.
    [StructLayout(LayoutKind.Sequential)]
    private struct MEMORY_BASIC_INFORMATION
    {
        internal IntPtr BaseAddress;
        internal IntPtr AllocationBase;
        internal uint AllocationProtect;
        internal UIntPtr RegionSize;
        internal uint State;
        internal uint Protect;
        internal uint Type;
    }
}

Then, wherever the recursion runs deep (such as HtmlNode.WriteTo(TextWriter outText) and HtmlNode.WriteTo(XmlWriter writer)), add the following code:

if (!StackChecker.HasSufficientStack(4 * 1024))
    throw new Exception("The document is too complex to parse");

OK, that's all it takes!
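
To show where such a guard sits inside a recursive method, here is a minimal sketch. It is not HtmlAgilityPack code: DemoNode, Demo, BuildChain and TryRender are hypothetical names made up for illustration. Instead of StackChecker, the sketch swaps in the framework's own RuntimeHelpers.EnsureSufficientExecutionStack() (available since .NET Framework 4.0), which throws a catchable InsufficientExecutionStackException when stack space runs low. That choice is only there to keep the example free of unsafe code and P/Invoke.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a DOM node; this is NOT HtmlAgilityPack's HtmlNode.
public sealed class DemoNode
{
    public string Name;
    public List<DemoNode> Children = new List<DemoNode>();

    public DemoNode(string name) { Name = name; }

    // Recursive writer. The guard runs at the top of every call, which is
    // where the article places the StackChecker test.
    public void WriteTo(TextWriter output)
    {
        // Swapped-in guard (assumption: .NET Framework 4.0 or later).
        // Throws InsufficientExecutionStackException, which can be caught.
        RuntimeHelpers.EnsureSufficientExecutionStack();

        output.Write("<" + Name + ">");
        foreach (DemoNode child in Children)
            child.WriteTo(output);
        output.Write("</" + Name + ">");
    }
}

public static class Demo
{
    // Builds a chain of nested nodes with a loop, so construction never recurses.
    public static DemoNode BuildChain(int depth)
    {
        var root = new DemoNode("div");
        DemoNode current = root;
        for (int i = 1; i < depth; i++)
        {
            var child = new DemoNode("div");
            current.Children.Add(child);
            current = child;
        }
        return root;
    }

    // Returns the markup, or null when the tree is too deep to write recursively.
    public static string TryRender(DemoNode root)
    {
        try
        {
            var writer = new StringWriter();
            root.WriteTo(writer);
            return writer.ToString();
        }
        catch (InsufficientExecutionStackException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // prints "<div><div><div></div></div></div>"
        Console.WriteLine(TryRender(BuildChain(3)));

        // A chain far deeper than a default thread stack can hold is reported
        // through the caught exception instead of killing the process.
        string deep = TryRender(BuildChain(1000000));
        Console.WriteLine(deep == null ? "too complex" : "rendered");
    }
}
```

In the library itself, the same one-line guard would go at the top of the WriteTo overloads mentioned above. Whether StackChecker or the built-in helper is the better fit depends on the framework version being targeted.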
