Blue Screen (SYSTEM_SERVICE_EXCEPTION) on WS08R2

One of our Remote Desktop Session Hosts (RDSHs) experienced a blue screen (Blue Screen of Death or BSoD) a few weeks ago. I didn’t actually see the BSoD, but I knew about it because the following event was logged in the System event log at boot time (after the BSoD had occurred, of course):

Event ID: 1001

Description: The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b (0x0000000080000003, 0xfffff880050a88eb, 0xfffff88007224d10, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 060112-27736-01.

Source: BugCheck

Level: Error

User: N/A

Computer: YOUR_SYSTEM

Logged: DATE_AND_TIME

Task Category: None

Keywords: Classic

OpCode: Info

Figure 1

Figure 2

Figure 3

This error was logged only once AFAIK, so it seems we were dealing with an infrequent issue. I had to find out what the cause was and, even more important, the solution or at least a workaround.

The event tells us the system rebooted as a result of a bugcheck (= bug check), which is the checking and handling process that starts after a serious kernel-mode bug has caused an exception to be thrown. This process includes showing the BSoD, which informs the user/administrator of the bug. During the blue screen a dump was created (C:\Windows\MEMORY.DMP). Note that this path can differ if you have configured it differently (see the “System failure” section in the “Startup and Recovery” configuration dialog, which you open by clicking the “Settings” button in the “Startup and Recovery” section on the Advanced tab of the System Properties dialog box). This “report” has the ID “060112-27736-01”, with 060112 meaning the error occurred on the 1st (“01”) of June (“06”) 2012 (“12”). Again, this Report Id is different every time.

The bug check code (STOP error) is 0x0000003b (0x3B), which is the hexadecimal value for SYSTEM_SERVICE_EXCEPTION (http://msdn.microsoft.com/en-us/library/windows/hardware/ff558949(v=vs.85).aspx), meaning an exception was thrown in a system service routine. The first parameter is the specific exception that was thrown and identifies the bug type. In our case this argument is 0x0000000080000003, meaning the exception code is 0x80000003 (STATUS_BREAKPOINT, a debug breakpoint exception, which shouldn’t have happened there). The 2nd argument (0xfffff880050a88eb) is the address of the faulting instruction. The 3rd argument (0xfffff88007224d10) is the address of the context record (I’m not really sure, but this address probably differs from crash to crash). The 4th argument is always zero (0x0000000000000000).
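To make the decoding above concrete, here is a minimal Python sketch that pulls the same pieces out of the event text. The function name, the regular expression and the two small lookup tables are my own assumptions for illustration (they only cover the codes discussed in this post), and the century of the Report Id date is assumed to be 20xx.

```python
import re
from datetime import date

# Message text copied from the event above.
MESSAGE = (
    "The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b "
    "(0x0000000080000003, 0xfffff880050a88eb, 0xfffff88007224d10, "
    "0x0000000000000000). A dump was saved in: C:\\Windows\\MEMORY.DMP. "
    "Report Id: 060112-27736-01."
)

# Only the names discussed in this post; NOT a complete table.
KNOWN_BUGCHECKS = {0x3B: "SYSTEM_SERVICE_EXCEPTION"}
KNOWN_EXCEPTIONS = {0x80000003: "STATUS_BREAKPOINT"}

# Assumed layout of the message, based on the single event shown above.
PATTERN = re.compile(
    r"The bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)\. "
    r"A dump was saved in: (.+?)\. Report Id: (\d{6})-(\d+)-(\d+)\."
)


def parse_bugcheck_message(message):
    """Split a BugCheck 1001 message into its parts (hypothetical helper)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("text does not look like a BugCheck 1001 message")
    code = int(match.group(1), 16)
    parameters = [int(p.strip(), 16) for p in match.group(2).split(",")]
    # The low 32 bits of the first parameter hold the exception code.
    exception_code = parameters[0] & 0xFFFFFFFF
    # Report Id starts with MMDDYY; the century is an assumption.
    mmddyy = match.group(4)
    crash_date = date(2000 + int(mmddyy[4:6]), int(mmddyy[0:2]), int(mmddyy[2:4]))
    return {
        "code": code,
        "name": KNOWN_BUGCHECKS.get(code, "unknown"),
        "parameters": parameters,
        "exception_code": exception_code,
        "exception_name": KNOWN_EXCEPTIONS.get(exception_code, "unknown"),
        "dump_path": match.group(3),
        "crash_date": crash_date,
    }


result = parse_bugcheck_message(MESSAGE)
```

With the message above, `result["name"]` is the bug check name and `result["crash_date"]` is the date encoded in the Report Id. A different wording of the event (another Windows version or locale) would need another pattern.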

Side note: the event source is registered under the name “BugCheck” (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\eventlog\System\BugCheck). No DLL is configured there, but the REG_EXPAND_SZ value providerGuid refers to a GUID ({ABCE23E7-DE45-4366-8631-84FA6C525952}) for more information. A provider with this GUID is registered at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers\{ABCE23E7-DE45-4366-8631-84FA6C525952} (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers is the place where event providers are registered). That key contains the provider’s name (Microsoft-Windows-WER-SystemErrorReporting) and the code file that is used for the event logging (“%SystemRoot%\system32\werfault.exe”). So BugCheck events are evidently logged by the Windows Error Reporting (WER) component, more precisely the part of WER that deals with “system errors” (which cause BSoDs), as indicated by both the provider’s name and the executable (it’s “commonly” known that werfault.exe implements that part of WER).
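The registry walk described in this side note can be sketched in Python with the standard winreg module, which only exists on Windows. The helper names are mine, and the sketch simply dumps every value of the two keys rather than assuming particular value names beyond providerGuid, which is mentioned above.

```python
try:
    import winreg  # standard library, but Windows only
except ImportError:
    winreg = None

# Key paths below HKEY_LOCAL_MACHINE, as quoted in the text above.
EVENT_SOURCE_KEY = r"SYSTEM\CurrentControlSet\services\eventlog\System\BugCheck"
PUBLISHERS_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers"


def publisher_key_path(provider_guid):
    """Build the path of the publisher key for a provider GUID."""
    return PUBLISHERS_KEY + "\\" + provider_guid


def read_values(subkey):
    """Return all values of an HKLM subkey as a dict ('' is the default value).

    Returns None when winreg is unavailable (i.e. not running on Windows).
    """
    if winreg is None:
        return None
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        values = {}
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
        return values


def describe_bugcheck_source():
    """Follow BugCheck -> providerGuid -> publisher key (hypothetical helper)."""
    source = read_values(EVENT_SOURCE_KEY)
    if source is None:
        return None
    guid = source["providerGuid"]
    return guid, read_values(publisher_key_path(guid))
```

On the server itself, `print(describe_bugcheck_source())` should show the GUID and the publisher’s values; elsewhere it returns None. Reading these keys may require sufficient privileges, and I haven’t verified how a 32-bit Python on 64-bit Windows sees the SOFTWARE branch.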

When I opened the dump with WinDbg, part of Debugging Tools for Windows, and took a quick look (through the command “!analyze -v”), I was able to get much more information:

Figure 4

Figure 5

It seems the faulting module is a driver implemented by rdbss.sys. WinDbg also tells us the bug was triggered in a process based on the svchost.exe image, so basically the bug in the driver was triggered in a Windows service. The bug occurred in the function RxFsdCommonDispatch.

Let’s find out a bit more. Below are the file details of the driver, which is located in %windir%\System32\drivers; its description is simply “Redirected Drive Buffering SubSystem Driver”. The substring “rdbss” in the file name obviously stands for “Redirected Drive Buffering SubSystem”.

Figure 6

The driver is registered as a driver Windows service in the registry with the technical name “rdbss”: just have a look at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\rdbss.

Figure 7

Figure 8

The driver is loaded in the “System process”, as you can see in Process Explorer:

Figure 9

The name “Redirected Drive Buffering SubSystem” suggests the driver takes care of some buffering aspects for “redirected drives”, which could mean network drives, UNC paths, DFS paths,… This could very well be the case, as several other driver services depend on rdbss.sys and they deal with these topics too:

  • CSC: the Client Side Caching (CSC) driver (csc.sys). It’s described as the “Windows Client Side Caching Driver”. Registry key is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\CSC.
  • MRxDAV: a redirection (“rx”) driver for WebDAV (mrxdav.sys). It’s also described as the “Windows NT WebDav Minirdr” (with “rdr” meaning “redirector”). Registry key is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\MRxDAV.
  • mrxsmb: a redirection (“rx”) driver for Server Message Block (SMB) (“smb”) (mrxsmb.sys). It’s also described as the “Windows NT SMB Minirdr” (with “rdr” meaning “redirector”). Registry key is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\mrxsmb.
  • RDPDR: the Terminal Server Device Redirector Driver (rdpdr.sys), so RDPDR stands for “RDP Device Redirector”. It’s also described as the “Microsoft RDP Device redirector”. Registry key is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\RDPDR.

I’m not going to explain CSC, WebDAV (which makes use of what can be considered as some kind of distributed file system), SMB (previously CIFS, the network protocol for remote file systems) and RDPDR (which is responsible for device redirection, where redirected devices often can be considered as redirected drives) any further, but I guess you feel the link, right? The last two, by the way, are also loaded by the “System process”.

Side note: Windows NT had a network redirector driver model called “rdr”. The model was very simple: anyone writing such a redirector had to write all the code themselves. This changed with the rdr2 model introduced with Windows 2000: several modules for a network redirector were built in and could be reused. One of those modules was/is RDBSS. Every high-level network redirector (called a mini-redirector, like MRxDAV and mrxsmb (hence “Minirdr”)) makes use of RDBSS, which takes care of the buffering aspect. RDBSS not only communicates with mini-redirectors, but also with other Windows components, like the Memory Manager, the Cache Manager, the I/O Manager, etc.

WinDbg already told me the bug took place in a svchost.exe process, so actually in a Windows service. But which one? Well, from the minidump I had I couldn’t really tell. It’s possible the bug is only triggered when the driver is used by that particular service, but that’s far from sure. The problem was that I couldn’t find any information about this error on the Net. So I looked up the latest version of rdbss.sys available through an update and found KB2559767 (http://support.microsoft.com/kb/2559767). This update upgraded my rdbss.sys to version 6.1.7601.21957. Since then we haven’t experienced the crash anymore, but I must admit:

  • Before the update we had experienced the crash only once in a period of many months
  • There was no evidence at all that the latest version would really solve our issue, so there was no certainty at all that the crash would never occur again

So I tried to do more research… I couldn’t get any more stack info than the following:

Figure 10

On the Net (http://blogs.technet.com/b/dip/archive/2012/05/31/win2008-rtm-stop-0x3b-in-rdbss-rxfsdcommondispatch-ad7.aspx) though I found someone with a larger call stack “dump”:

00 fffff880`03b30398 fffff800`018ccca9 nt!KeBugCheckEx
01 fffff880`03b303a0 fffff800`018cc5fc nt!KiBugCheckDispatch+0x69
02 fffff880`03b304e0 fffff800`018f340d nt!KiSystemServiceHandler+0x7c
03 fffff880`03b30520 fffff800`018faa90 nt!RtlpExecuteHandlerForException+0xd
04 fffff880`03b30550 fffff800`019079ef nt!RtlDispatchException+0x410
05 fffff880`03b30c30 fffff800`018ccd82 nt!KiDispatchException+0x16f
06 fffff880`03b312c0 fffff800`018cabb4 nt!KiExceptionDispatch+0xc2
07 fffff880`03b314a0 fffff880`02e172e0 nt!KiBreakpointTrap+0xf4
08 fffff880`03b31630 fffff880`02e35b74 rdbss!RxFsdCommonDispatch+0xad8
09 fffff880`03b31720 fffff880`04b8acd7 rdbss!RxFsdDispatch+0x224
0a fffff880`03b31790 fffff880`018b8271 rdpdr!DrPeekDispatch+0x31f
0b fffff880`03b317e0 fffff880`018b6138 mup!MupiCallUncProvider+0x161
0c fffff880`03b31850 fffff880`018b6b0d mup!MupStateMachine+0x128
0d fffff880`03b318a0 fffff880`013006af mup!MupFsdIrpPassThrough+0x12d
0e fffff880`03b318f0 fffff880`018f794d fltmgr!FltpDispatch+0x9f
0f fffff880`03b31950 fffff880`013006af mfehidk!DEVICEDISPATCH::DispatchPassThrough+0x105
10 fffff880`03b319b0 fffff800`01be8707 fltmgr!FltpDispatch+0x9f
11 fffff880`03b31a10 fffff800`01be8f66 nt!IopXxxControlFile+0x607
12 fffff880`03b31b40 fffff800`018cc993 nt!NtDeviceIoControlFile+0x56
13 fffff880`03b31bb0 00000000`7721f72a nt!KiSystemServiceCopyEnd+0x13
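To read such a stack quickly, a small Python sketch can split each line into module, function and offset, and list which modules sit directly below the breakpoint trap. The frames are copied from the listing above (only frames 06 to 0c, to keep it short); the regular expression and helper names are my own assumptions about this particular text layout, not a WinDbg API.

```python
import re
from collections import namedtuple

# Frames 06-0c copied from the call stack above.
STACK = """\
06 fffff880`03b312c0 fffff800`018cabb4 nt!KiExceptionDispatch+0xc2
07 fffff880`03b314a0 fffff880`02e172e0 nt!KiBreakpointTrap+0xf4
08 fffff880`03b31630 fffff880`02e35b74 rdbss!RxFsdCommonDispatch+0xad8
09 fffff880`03b31720 fffff880`04b8acd7 rdbss!RxFsdDispatch+0x224
0a fffff880`03b31790 fffff880`018b8271 rdpdr!DrPeekDispatch+0x31f
0b fffff880`03b317e0 fffff880`018b6138 mup!MupiCallUncProvider+0x161
0c fffff880`03b31850 fffff880`018b6b0d mup!MupStateMachine+0x128
"""

Frame = namedtuple("Frame", "number module function offset")

# frame number, two address columns, then module!function+0xoffset
FRAME_RE = re.compile(
    r"^(?P<number>[0-9a-f]{2}) \S+ \S+ "
    r"(?P<module>\w+)!(?P<function>.+?)(?:\+0x(?P<offset>[0-9a-f]+))?$"
)


def parse_frames(text):
    """Turn the stack listing into a list of Frame tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a frame line
        offset = int(match.group("offset"), 16) if match.group("offset") else 0
        frames.append(Frame(int(match.group("number"), 16),
                            match.group("module"),
                            match.group("function"),
                            offset))
    return frames


def modules_below_trap(frames, trap_function="KiBreakpointTrap"):
    """List the modules after the trap frame, in call-stack order, without repeats."""
    modules = []
    seen_trap = False
    for frame in frames:
        if seen_trap and (not modules or modules[-1] != frame.module):
            modules.append(frame.module)
        if frame.function == trap_function:
            seen_trap = True
    return modules


frames = parse_frames(STACK)
suspects = modules_below_trap(frames)
```

Here `suspects` lists rdbss first, then rdpdr, then mup: the exception was raised in rdbss code, which was entered from rdpdr.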

In this stack rdpdr is called just before rdbss (frame 0a calls into frame 09). This makes sense: RDP/RDS/RDSH is used A LOT on my server (a heavily used production RDSH with device redirection enabled, which is used a lot too, of course). RDPDR depends on RDBSS (as I’ve told you already). Perhaps RDBSS isn’t the real cause, but RDPDR is…

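I must admit my offset for the RxFsdCommonDispatch function was 0xad7 instead of 0xad8, although the web page’s title also mentions 0xad7 (weird…). A plausible explanation is that the breakpoint instruction is only one byte long: the address shown in the stack points just past it (+0xad8), while the faulting address points at the instruction itself (+0xad7). Secondly, I’m not 100% sure I had the same call stack (or part of it) leading up to nt!KiExceptionDispatch+0xc2 and rdbss!RxFsdCommonDispatch+0xad7, but there is a very good chance.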

Anyway, according to the blog writer a new version of rdpdr.sys did the trick. The writer, Microsoft support engineer Rob Scheepens, was stuck even after a lot of tracing. He opened a support call at MS and they found a bug. A pre-release fix was created and tested in different environments, including with Driver Verifier checking. It seems this new version of the RDPDR driver solved the problem. You see, the cause was not RDBSS but RDPDR, even though the actual exception arose in the first driver’s code. The fix is related to KB2719704, but you can’t download or request it yet (except through a support call, I suppose), not even at the Microsoft Premier support site. The update is expected to be distributed in the July update release cycle, so in about 20 days. You can wait for this cycle, open a support case or mail Rob Scheepens (see the web page for contact information). All this means that the update I had installed (KB2559767) was probably not the solution I needed. For completeness: the error seems most likely to occur on a Remote Desktop Session Host (RDSH), but is not limited to that scenario.

I would like to thank Rob for his work and blog post (and of course Microsoft for creating the fix!). To finish, here are a few more screenshots about RDPDR, showing things I’ve described earlier in this article.

Figure 11

Figure 12

Personally I’ll try to contact Rob, in the hope of receiving the fix and installing it (unless the fix is still the pre-release version; in that case I’ll wait for the next update cycle!). I’ll post an update to this post to let you know the results.

UPDATE: in the meantime the fix has been officially released. Microsoft’s KB article 2719704 (http://support.microsoft.com/kb/2719704) describes the issue, and the fix is available for download. If Driver Verifier is enabled, the STOP error is 0x000000D5 (DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL, meaning a page fault has occurred in an already freed piece of Special Pool; Special Pool is a special feature Driver Verifier uses to detect bugs and is based on tagging). RDPDR tries to reference memory that’s no longer available. The file rdpdr.sys has been upgraded to version 6.1.7601.22014 (dated the 8th of June 2012). The issue seems to occur on Windows 7 and Windows Server 2008 R2. For the moment the KB article doesn’t seem to be accessible from the Microsoft Premier site, but this will probably be “fixed” very soon (the KB is very, very fresh at the time of writing :-)). Oh yes, it seems the update won’t be part of an automatic update cycle, so don’t wait for a Patch Tuesday! On the other hand, you should only install the fix if you experience the issue described.
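If you want to check whether a server already has the fixed driver, the file version of rdpdr.sys can be compared with the version from the KB article. The Python sketch below only does the comparison; how you obtain the installed version (file properties, an inventory tool, …) is left out, and the helper names are hypothetical. A plain numeric comparison is a simplification that ignores servicing branches, so treat the result as a hint.

```python
# Version of rdpdr.sys delivered by KB2719704, as quoted above.
FIXED_RDPDR_VERSION = "6.1.7601.22014"


def parse_version(text):
    """Turn '6.1.7601.22014' into (6, 1, 7601, 22014) so parts compare as numbers."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(installed_version, fixed_version=FIXED_RDPDR_VERSION):
    """True when the installed version is at least the fixed version."""
    return parse_version(installed_version) >= parse_version(fixed_version)
```

Comparing the parts as integers matters: as plain strings, “6.1.7601.9999” would sort after “6.1.7601.22014”, although 9999 is the lower build number.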

Ciao!

Pedro

One thought on “Blue Screen (SYSTEM_SERVICE_EXCEPTION) on WS08R2”

    James said:
    15/02/2013 at 01:28

    Very thorough and well documented article. We’ve experienced the same issue and found this article after I noted the dump reference to rdbss.sys. Thanks for the detail!
