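I have a customer who occasionally reports IIS deadlocks. It is a large hosted ASP.Net application spread across multiple servers, but in this case the problem occurs on a simple web server that either returns static files (HTML, JavaScript) or acts as a proxy and calls web services on the application tier. Note that this is a .Net 3.5 application and the application pool uses the classic pipeline.
I have a dump of this and have been analyzing it, but as far as I can tell no resources are blocked in a way that would cause a deadlock.
The faulting thread is #4, which belongs to IIS. The stack indicates that it checks for health problems, finds one (the deadlock?), and fails the worker process.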
0:004> kv
# Child-SP RetAddr : Args to Child : Call Site
00 00000000`01b1e6c0 000007fe`f8c96d82 : 00000000`01b1e810 00000000`00000000 00000000`01b1e848 00000000`01b1e848 : KERNELBASE!RaiseException+0x39
01 00000000`01b1e790 000007fe`f80779cc : 00000000`05da4c58 00000000`00000082 00000000`05da4c58 00000000`00000082 : w3wphost!W3WP_HOST::FailWorkerProcess+0x2e
02 00000000`01b1e7e0 000007fe`f80728cb : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`000c000a : isapi!RegisterModule+0xcce4
03 00000000`01b1e830 000007fe`f806dd84 : 00000000`01b1ed40 00000000`00000020 00000000`00000004 000007fe`00000020 : isapi!RegisterModule+0x7be3
04 00000000`01b1ecf0 000007fe`f7f07459 : 00000000`05da4c58 00000000`05da4c58 00000000`05da4c58 000007fe`f806e06f : isapi!RegisterModule+0x309c
05 00000000`01b1edd0 000007fe`f7f07617 : 01cefba1`36242e6e 000007fe`f80419d6 00000000`00000000 00000000`05da4c58 : webengine!ReportHealthProblem+0xc9
06 00000000`01b1ef30 000007fe`f7f08d6b : 01cefba1`34feed30 01cefba1`36242e6e 00000000`05da4c58 00000000`00000000 : webengine!CheckAndReportHealthProblems+0xb7
07 00000000`01b1ef60 000007fe`f806c540 : 00000000`0126a088 00000000`05da4c58 00000000`01b1f330 000007fe`f8063588 : webengine!AspNetHttpExtensionProc+0x1db
...
The first thing I checked was deadlocks with !dlk; none were detected.
0:004> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
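No deadlocks detected.
!dlk reportedly misses some deadlocks, so next I checked !threads to see whether any threads hold locks. Many of them do. These threads are calling web services on another server.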
0:004> !threads
ThreadCount: 63
UnstartedThread: 0
BackgroundThread: 57
PendingThread: 0
DeadThread: 6
Hosted Runtime: no
PreEmptive Lock
ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception
7 1 96c 00000000021394a0 8220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
16 2 9a4 0000000002142250 b220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Finalizer)
17 4 a64 00000000021913c0 80a220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Completion Port)
18 5 a70 00000000021930f0 1220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
20 e b6c 00000000022700e0 880b220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Completion Port)
6 b 968 0000000005aaf0b0 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
4 41 960 0000000005ab0220 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
5 4f 964 0000000005a647c0 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
30 ac 700 0000000005aafc50 220 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn
XXXX a7 0 0000000005ab1960 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX aa 0 0000000005dcf540 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX be 0 0000000005dce3d0 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 Ukn (Threadpool Worker)
XXXX ae 0 0000000005dcd830 1801820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA (Threadpool Worker)
XXXX 38 0 0000000005ab0dc0 9820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA
XXXX 39 0 0000000005ab2500 9820 Enabled 0000000000000000:0000000000000000 00000000012e41d0 0 MTA
31 37 133c 0000000005ab07f0 180b220 Enabled 00000001c0668760:00000001c0668d50 0000000002193ba0 1 MTA (Threadpool Worker)
32 34 ce8 0000000005d93550 180b220 Enabled 00000001806563a8:0000000180657ad0 0000000002193ba0 1 MTA (Threadpool Worker)
33 c4 be4 0000000005d90ca0 180b220 Enabled 00000001c065e7b0:00000001c065ed50 0000000002248da0 1 MTA (Threadpool Worker)
34 101 5a0 0000000005d923e0 180b220 Enabled 000000014072bfa0:000000014072daa0 0000000002248da0 1 MTA (Threadpool Worker)
36 c5 6c0 0000000005d906d0 180b220 Enabled 000000010004d360:000000010004e150 0000000002193ba0 1 MTA (Threadpool Worker)
37 35 76c 0000000005d940f0 180b220 Enabled 000000010005a950:000000010005c150 0000000002248da0 1 MTA (Threadpool Worker)
38 36 cdc 0000000005d92f80 180b220 Enabled 0000000100069c20:000000010006a150 0000000002193ba0 1 MTA (Threadpool Worker)
39 33 b90 0000000005d91e10 180b220 Enabled 000000014072f460:000000014072faa0 0000000002248da0 1 MTA (Threadpool Worker)
40 32 12b8 0000000005d929b0 180b220 Enabled 0000000100083520:0000000100084150 0000000002193ba0 1 MTA (Threadpool Worker)
41 31 11d8 0000000005d95e00 180b220 Enabled 0000000100091cb0:0000000100092150 0000000002193ba0 1 MTA (Threadpool Worker)
42 30 d78 0000000005d946c0 180b220 Enabled 00000001000a3af8:00000001000a4150 0000000002193ba0 1 MTA (Threadpool Worker)
43 2f bd8 0000000005d95830 180b220 Enabled 00000001000b1200:00000001000b2150 0000000002193ba0 1 MTA (Threadpool Worker)
44 2e 598 0000000005d91840 180b220 Enabled 00000001000bf808:00000001000c0150 0000000002193ba0 1 MTA (Threadpool Worker)
45 2d ba0 0000000005d93b20 180b220 Enabled 00000001000cd698:00000001000ce150 0000000002193ba0 1 MTA (Threadpool Worker)
46 2c 136c 0000000005d94c90 180b220 Enabled 00000001000df068:00000001000e0150 0000000002193ba0 1 MTA (Threadpool Worker)
47 ca 8f0 0000000005d90100 180b220 Enabled 00000001000ed618:00000001000ee150 0000000002193ba0 1 MTA (Threadpool Worker)
48 102 d14 0000000005d95260 180b220 Enabled 00000001000fc7d0:00000001000fe150 0000000002193ba0 1 MTA (Threadpool Worker)
49 cb 12c0 0000000005d91270 180b220 Enabled 000000010010cf88:000000010010e150 0000000002193ba0 1 MTA (Threadpool Worker)
50 c7 e98 0000000005d969a0 180b220 Enabled 000000010011c618:000000010011e150 0000000002248da0 1 MTA (Threadpool Worker)
51 e2 d74 0000000005d96f70 180b220 Enabled 000000010012b758:000000010012c150 0000000002248da0 1 MTA (Threadpool Worker)
52 c2 1278 0000000005d97540 180b220 Enabled 00000001001395e0:000000010013a150 0000000002248da0 1 MTA (Threadpool Worker)
53 c8 8e0 0000000005d963d0 180b220 Enabled 0000000100148898:000000010014a150 0000000002193ba0 1 MTA (Threadpool Worker)
54 c6 24c 0000000005aaf680 180b220 Enabled 00000001001595d8:000000010015a150 0000000002193ba0 1 MTA (Threadpool Worker)
55 c9 708 0000000005ab1f30 180b220 Enabled 0000000180658120:0000000180659ad0 0000000002248da0 1 MTA (Threadpool Worker)
56 c3 110c 0000000005ab1390 180b220 Enabled 0000000100176ce8:0000000100178150 0000000002248da0 1 MTA (Threadpool Worker)
57 cd 8dc 0000000005dc8100 180b220 Enabled 000000010018c0f8:000000010018c150 0000000002193ba0 1 MTA (Threadpool Worker)
58 d1 588 0000000005dca9b0 180b220 Enabled 000000010019a620:000000010019c150 0000000002193ba0 1 MTA (Threadpool Worker)
59 d0 31c 0000000005dc8ca0 180b220 Enabled 00000001001ab9a0:00000001001ac150 0000000002193ba0 1 MTA (Threadpool Worker)
60 1a cb4 0000000005dca3e0 180b220 Enabled 00000001001ba7f8:00000001001bc150 0000000002193ba0 1 MTA (Threadpool Worker)
61 1b 13cc 0000000005dc9840 180b220 Enabled 00000001001ca798:00000001001cc150 0000000002193ba0 1 MTA (Threadpool Worker)
62 1c 12f4 0000000005dccc90 180b220 Enabled 00000001001da7d0:00000001001dc150 0000000002193ba0 1 MTA (Threadpool Worker)
63 1d 11c8 0000000005dc86d0 180b220 Enabled 00000001001eab48:00000001001ec150 0000000002193ba0 1 MTA (Threadpool Worker)
64 1e 1304 0000000005dcde00 180b220 Enabled 00000001001fa960:00000001001fc150 0000000002248da0 1 MTA (Threadpool Worker)
65 1f 1258 0000000005dcbb20 180b220 Enabled 000000010020ab18:000000010020c150 0000000002193ba0 1 MTA (Threadpool Worker)
66 20 854 0000000005dc9270 180b220 Enabled 00000001407307c8:0000000140731aa0 0000000002193ba0 1 MTA (Threadpool Worker)
67 21 13bc 0000000005dcaf80 180b220 Enabled 000000010022bd30:000000010022c150 0000000002248da0 1 MTA (Threadpool Worker)
68 22 c4c 0000000005dc9e10 180b220 Enabled 00000001002409f0:0000000100242150 0000000002193ba0 1 MTA (Threadpool Worker)
69 23 10dc 0000000005dcc6c0 180b220 Enabled 0000000100251c70:0000000100252150 0000000002193ba0 1 MTA (Threadpool Worker)
70 24 264 0000000005dcc0f0 180b220 Enabled 0000000100261288:0000000100262150 0000000002193ba0 1 MTA (Threadpool Worker)
71 25 3c8 0000000005dcb550 180b220 Enabled 0000000100271688:0000000100272150 0000000002193ba0 1 MTA (Threadpool Worker)
72 26 b88 0000000005dcd260 180b220 Enabled 0000000100287420:0000000100288150 0000000002193ba0 1 MTA (Threadpool Worker)
74 27 1318 0000000005dce9a0 180b220 Enabled 00000001002975c8:0000000100298150 0000000002248da0 1 MTA (Threadpool Worker)
75 28 bdc 0000000005dcef70 180b220 Enabled 00000001002a6d48:00000001002a8150 0000000002193ba0 1 MTA (Threadpool Worker)
76 29 100 0000000005a65930 180b220 Enabled 00000001002b7698:00000001002b8150 0000000002193ba0 1 MTA (Threadpool Worker)
77 2a e5c 0000000005a67070 180b220 Enabled 00000001002c70f8:00000001002c8150 0000000002248da0 1 MTA (Threadpool Worker)
78 2b 434 0000000005a67c10 180b220 Enabled 00000001002d6b78:00000001002d8150 0000000002193ba0 1 MTA (Threadpool Worker)
79 cc e4c 0000000005a65360 180b220 Enabled 00000001002e65f8:00000001002e8150 0000000002193ba0 1 MTA (Threadpool Worker)
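80 3a dd0 0000000005a64d90 180b220 Enabled 00000001002f66c0:00000001002f8150 0000000002193ba0 1 MTA (Threadpool Worker)
Next, I checked whether any threads are hung. All of them are under one minute, and the ones above 10 seconds all belong to the ASP.Net infrastructure, with none of our code running on them.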
0:004> !runaway
User Mode Time
Thread Time
4:960 0 days 0:00:39.843
5:964 0 days 0:00:33.281
6:968 0 days 0:00:25.906
7:96c 0 days 0:00:24.000
31:133c 0 days 0:00:09.437
14:99c 0 days 0:00:06.953
32:ce8 0 days 0:00:06.921
15:9a0 0 days 0:00:06.890
13:998 0 days 0:00:06.750
30:700 0 days 0:00:06.062
33:be4 0 days 0:00:04.937
12:994 0 days 0:00:04.640
26:9f8 0 days 0:00:02.859
34:5a0 0 days 0:00:01.203
16:9a4 0 days 0:00:00.765
0:934 0 days 0:00:00.062
17:a64 0 days 0:00:00.031
65:1258 0 days 0:00:00.015
52:1278 0 days 0:00:00.015
21:1338 0 days 0:00:00.015
80:dd0 0 days 0:00:00.000
79:e4c 0 days 0:00:00.000
78:434 0 days 0:00:00.000
77:e5c 0 days 0:00:00.000
76:100 0 days 0:00:00.000
75:bdc 0 days 0:00:00.000
74:1318 0 days 0:00:00.000
73:77c 0 days 0:00:00.000
72:b88 0 days 0:00:00.000
71:3c8 0 days 0:00:00.000
70:264 0 days 0:00:00.000
69:10dc 0 days 0:00:00.000
68:c4c 0 days 0:00:00.000
67:13bc 0 days 0:00:00.000
66:854 0 days 0:00:00.000
64:1304 0 days 0:00:00.000
63:11c8 0 days 0:00:00.000
62:12f4 0 days 0:00:00.000
61:13cc 0 days 0:00:00.000
60:cb4 0 days 0:00:00.000
59:31c 0 days 0:00:00.000
58:588 0 days 0:00:00.000
57:8dc 0 days 0:00:00.000
56:110c 0 days 0:00:00.000
55:708 0 days 0:00:00.000
54:24c 0 days 0:00:00.000
53:8e0 0 days 0:00:00.000
51:d74 0 days 0:00:00.000
50:e98 0 days 0:00:00.000
49:12c0 0 days 0:00:00.000
48:d14 0 days 0:00:00.000
47:8f0 0 days 0:00:00.000
46:136c 0 days 0:00:00.000
45:ba0 0 days 0:00:00.000
44:598 0 days 0:00:00.000
43:bd8 0 days 0:00:00.000
42:d78 0 days 0:00:00.000
41:11d8 0 days 0:00:00.000
40:12b8 0 days 0:00:00.000
39:b90 0 days 0:00:00.000
38:cdc 0 days 0:00:00.000
37:76c 0 days 0:00:00.000
36:6c0 0 days 0:00:00.000
35:11b4 0 days 0:00:00.000
29:11b8 0 days 0:00:00.000
28:438 0 days 0:00:00.000
27:12d8 0 days 0:00:00.000
25:e44 0 days 0:00:00.000
24:670 0 days 0:00:00.000
23:1284 0 days 0:00:00.000
22:139c 0 days 0:00:00.000
20:b6c 0 days 0:00:00.000
19:adc 0 days 0:00:00.000
18:a70 0 days 0:00:00.000
11:990 0 days 0:00:00.000
10:988 0 days 0:00:00.000
9:984 0 days 0:00:00.000
8:970 0 days 0:00:00.000
3:954 0 days 0:00:00.000
2:940 0 days 0:00:00.000
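1:938 0 days 0:00:00.000
Still curious about the locks on the threads, I inspected them. They all turn out to be thin locks belonging to ASP.Net instances created by System.Web.HttpApplication. There is no recursion either, so this looks fine as well.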
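Next I checked !threadpool. I haven't used it much, so I'm not sure I'm interpreting the output correctly, but it looks like the application has not reached the limit (400) and no requests are queued, so this seems fine.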
0:004> !threadpool
CPU utilization 0%
Worker Thread: Total: 48 Running: 48 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 34
--------------------------------------
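Completion Port Thread:Total: 1 Free: 1 MaxFree: 8 CurrentLimit: 0 MaxLimit: 400 MinLimit: 4
I have been analyzing this on and off for several days and I'm stumped. I would appreciate advice on how to identify the "health problem" that IIS detected.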
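Update: The thread stacks are too large to include here, so I have uploaded them here. For privacy reasons, company types have been renamed to Foo.Bar. Method names of 0 are the result of actual obfuscation.
Update 2: Thanks to the comments, I found KB 821268, which seems relevant. I can't interpret the !threadpool output properly. It says Total: 48, Running: 48 (and Idle: 0), which might mean the pool is exhausted, but then I don't know what MaxLimit: 400 means. Maybe someone can set me straight.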
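As a rough aid for reading the Worker Thread line quoted above, here is a minimal Python sketch that parses it and separates "every existing worker is busy" from "the pool has hit its ceiling". The function names `parse_worker_line` and `summarize` are made up for illustration, and the interpretation (Total compared against MaxLimit) is an assumption about how these counters relate, not something the debugger itself reports.

```python
import re


def parse_worker_line(line):
    """Turn the 'Worker Thread:' line of !threadpool output into a dict of ints."""
    # Picks up every 'Name: <number>' pair; 'Worker Thread:' itself has no number after it.
    fields = re.findall(r"(\w+):\s*(\d+)", line)
    return {name: int(value) for name, value in fields}


def summarize(stats):
    """Return (all_busy, at_limit, headroom) for parsed worker-thread stats."""
    all_busy = stats["Idle"] == 0 and stats["Running"] == stats["Total"]
    at_limit = stats["Total"] >= stats["MaxLimit"]  # assumed meaning of "exhausted"
    headroom = stats["MaxLimit"] - stats["Total"]
    return all_busy, at_limit, headroom


# Line copied from the !threadpool output in the question.
line = "Worker Thread: Total: 48 Running: 48 Idle: 0 MaxLimit: 400 MinLimit: 4"
stats = parse_worker_line(line)
all_busy, at_limit, headroom = summarize(stats)
print(f"all busy: {all_busy}, at limit: {at_limit}, headroom: {headroom}")
```

Under that reading, all 48 existing workers are busy, but the pool is well short of its 400-thread ceiling, so the count of 48 probably points at something other than thread pool exhaustion.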
Posted on 2014-01-10 14:19:45
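After more research and analysis, I was able to track down the cause of the problem. I was reading too much into the !threadpool output. While that data is useful, all it told me was that I was not running out of worker threads or completion port threads.
Instead, the real problem was the number of active network connections. This value is set in the .config file. For .Net 2+ environments it is usually auto-configured to 12 * #CPU. The machine in this dump has 4 CPUs, which allows at most 48 open connections. That matches the number of threads reading data through HttpWebRequest, as shown in the stacks.
The initial clue came from KB 821268, although it lacks details on how to diagnose this. Confirmation came from this blog post. In short, I dumped all the System.Net.ServicePoint objects on the heap. Each one has an m_CurrentConnections and an m_ConnectionLimit field. Summing m_CurrentConnections across all of them gave me the total number of open network connections, which was 48 for this dump. In addition, m_ConnectionLimit confirmed that the maximum was 48. The numbers match: the maximum number of connections had been reached, which led IIS to terminate the process. It is not a "deadlock" as such, but unfortunately the message IIS leaves in the event log is not very specific about the problem or its cause.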
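The arithmetic behind that check can be sketched in a few lines of Python. This is only an illustration: `ServicePointInfo`, `auto_connection_limit` and `total_connections` are hypothetical names, and the per-object values below are invented so that they add up to the 48 reported for this dump. The actual per-ServicePoint breakdown is not given here.

```python
from dataclasses import dataclass


@dataclass
class ServicePointInfo:
    """Two fields read off each System.Net.ServicePoint object in the dump."""
    current_connections: int  # m_CurrentConnections
    connection_limit: int     # m_ConnectionLimit


def auto_connection_limit(cpu_count, per_cpu=12):
    """Auto-configured limit described above: 12 * number of CPUs."""
    return per_cpu * cpu_count


def total_connections(points):
    """Sum m_CurrentConnections over all ServicePoint objects."""
    return sum(p.current_connections for p in points)


# Hypothetical split across two ServicePoint objects; only the total (48)
# and the limit (48) come from the dump analysis.
points = [ServicePointInfo(30, 48), ServicePointInfo(18, 48)]

limit = auto_connection_limit(cpu_count=4)
open_total = total_connections(points)
print(f"open: {open_total}, limit: {limit}, saturated: {open_total >= limit}")
```

With 4 CPUs the computed limit is 48, so a total of 48 open connections means every further outbound HttpWebRequest has to wait for a free connection.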
https://stackoverflow.com/questions/20969087