Remote Desktop Session Host has a BSOD (PAGE_FAULT_IN_NONPAGED_AREA)

One of our Remote Desktop Session Hosts (RDSHs) experienced a blue screen (Blue Screen of Death, or BSoD) a few weeks ago. I didn’t actually see the BSoD, but I became aware of it because the following event was logged in the System event log at the next boot (so only after the BSoD had occurred, of course):

Event ID: 1001

Description: The computer has rebooted from a bugcheck. The bugcheck was: 0x00000050 (0xfffffa803f166210, 0x0000000000000000, 0xfffff8800b7257e7, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 050212-24320-01.

Source: BugCheck

Level: Error

User: N/A

Computer: YOUR_SYSTEM

Logged: DATE_AND_TIME

Task Category: None

Keywords: Classic

OpCode: Info

Figure 1

Figure 2

Figure 3

This error was logged only a few times, so it seems we were dealing with a relatively infrequent (but not exceptional!) situation. I had to find out the cause and, even more important, the solution or at least a workaround.

The event tells us the system has rebooted as a result of a bugcheck (= bug check): Windows trapped a bug serious enough that it could not safely continue running and showed the BSoD to inform the user/administrator. During this blue screen a dump was created (C:\Windows\MEMORY.DMP). Note that this path can differ if you have configured it differently (see the “System failure” section of the “Startup and Recovery” dialog, which you open by clicking the “Settings” button in the “Startup and Recovery” section on the Advanced tab of the System Properties dialog box). This “report” has the ID “050212-24320-01”, with 050212 meaning the error occurred on the 2nd of May 2012. Again, this report ID is different every time. The bug check code (STOP error) is 0x00000050, which is the hexadecimal notation for PAGE_FAULT_IN_NONPAGED_AREA (http://msdn.microsoft.com/en-us/library/windows/hardware/ff559023(v=vs.85).aspx). The first parameter is also different every time the error is logged.
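As a small illustration of the report ID reading above, here is a minimal Python sketch that extracts the date from the first field. It assumes that field is always MMDDYY, which is what the single example in this post suggests; the function name `report_id_date` is made up for this sketch.

```python
from datetime import date, datetime


def report_id_date(report_id: str) -> date:
    """Return the date encoded in the first field of a BugCheck report ID.

    Assumption: the first field is MMDDYY, as in the example from this post.
    """
    date_field = report_id.split("-")[0]
    return datetime.strptime(date_field, "%m%d%y").date()


print(report_id_date("050212-24320-01"))  # 2012-05-02
```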

If a memory address is referenced and its content is not in RAM, we have a so-called page fault, and the content is paged in to RAM so it can be used. If the address lies in the nonpaged area, though, there is nothing to page in: nonpaged memory, such as the nonpaged pool (NPP), always resides in physical memory (RAM) and is never paged out, so a page fault there means the address itself is invalid. A typical cause is a reference to an NPP address that once was valid but has been freed in the meantime. Whatever the reason, it’s always incorrect usage and thus a bug. Because this happens in kernel mode and can impact the whole system, Windows shows the BSoD and restarts to maintain integrity (although that’s no fun, of course).

Side note: the event source BugCheck is registered under the name “BugCheck” (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\eventlog\System\BugCheck). No DLL is configured there, but the REG_EXPAND_SZ value providerGuid refers to a GUID ({ABCE23E7-DE45-4366-8631-84FA6C525952}) for more information. A provider with this GUID is registered at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers\{ABCE23E7-DE45-4366-8631-84FA6C525952} (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers is the place where event providers are registered). That key contains the provider’s name (Microsoft-Windows-WER-SystemErrorReporting) and the file used for the event logging (“%SystemRoot%\system32\werfault.exe”). So BugCheck events are logged by the Windows Error Reporting (WER) component, more precisely the part of WER that deals with “system errors” (the ones that cause BSoDs), as indicated by both the provider’s name and the executable (it’s “commonly” known that werfault.exe implements that part of WER).

The first parameter is the invalid memory address that was referenced, while the third parameter is the address of the code that made the reference. Parameter 2 indicates whether the attempt was a read (0) or a write (1) operation, and the fourth parameter is reserved and can be ignored. In my case every attempt was a read operation (0x0000000000000000), made from the very same address (0xfffff8800b7257e7), but each time with a different invalid target address (0xfffffa803f166210 in my example). The fourth parameter was always 0x0000000000000000.
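To make the parameter layout concrete, here is a minimal Python sketch that pulls the bug check code and its four parameters out of the event description and labels them for bug check 0x50. The function names and the regular expression are my own and only follow the wording of the event text shown above; the sketch handles the read (0) and write (1) values described here and reports anything else as “other”.

```python
import re

# Matches the "The bugcheck was: <code> (<p1>, <p2>, <p3>, <p4>)" part of the text.
BUGCHECK_RE = re.compile(r"The bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(description: str) -> dict:
    """Extract the bug check code and its parameters as integers."""
    match = BUGCHECK_RE.search(description)
    if match is None:
        raise ValueError("no bug check code found in description")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return {"code": code, "params": params}


def explain_0x50(parsed: dict) -> dict:
    """Label the parameters of bug check 0x50 as described in this post."""
    if parsed["code"] != 0x50:
        raise ValueError("not a PAGE_FAULT_IN_NONPAGED_AREA bug check")
    referenced, operation, referencing, _reserved = parsed["params"]
    operations = {0: "read", 1: "write"}
    return {
        "referenced_address": hex(referenced),
        "operation": operations.get(operation, "other"),
        "referencing_address": hex(referencing),
    }


event_text = (
    "The computer has rebooted from a bugcheck. The bugcheck was: 0x00000050 "
    "(0xfffffa803f166210, 0x0000000000000000, 0xfffff8800b7257e7, "
    "0x0000000000000000). A dump was saved in: C:\\Windows\\MEMORY.DMP."
)
print(explain_0x50(parse_bugcheck(event_text)))
```

Running this over every BugCheck event makes it easy to check whether the referencing address really is identical each time.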

When I opened the dump with WinDbg, part of Debugging Tools for Windows, and took a quick look (through the command “!analyze -v”), I was able to get much more information:

Figure 4

Figure 5

It seems the source address belongs to a driver implemented by pnrdpwd.sys. I know this file belongs to the client part of Quest Software’s vWorkspace, a Citrix XenDesktop/XenApp alternative (“pn” stands for “Provision Networks”, the company that originally developed vWorkspace and was acquired by Quest Software). This client part is installed on every Terminal Server (TS) and Remote Desktop Session Host (RDSH) we have. WinDbg also tells us the bug was triggered in a process based on the pntermhlp.exe image.

We are running vWorkspace 7.1.301.358, which is the same as 7.1 MR1 (“MR” stands for “Maintenance Release”). In theory this version isn’t supported on Windows Server 2008 R2, the OS we are running on our RDSHs, so the driver could have a bug that doesn’t occur on supported Windows versions. Upgrading would be best, but that’s not a real option for us, because we are moving away from vWorkspace and planning to migrate to Citrix XenDesktop. But perhaps we can exclude the driver (pnrdpwd.sys) and/or the executable (pntermhlp.exe)? To answer that question we need to know what those files actually are and do.

pnrdpwd.sys is Quest’s alternative to rdpwd.sys, a Microsoft driver described as “RDP Terminal Stack Driver” (just a normal driver from the “stack of all built-in client drivers for ‘Terminal Services’”, which doesn’t really tell us a lot…). This driver is called the RDPWD driver and is used for the RDPWD device (the RDP Winstation Driver device). The device receives multi-channel data, unwraps it and forwards it to the right session on the target system. In particular, it takes care of the received mouse and keyboard input. RDPWD also plays a role in output: the “remote session display” device captures the “screen” in a format that can easily be converted into the RDP protocol, and RDPWD takes care of this conversion. It’s registered as a driver Windows service in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\RDPWD. The driver is loaded into the “System process”.

These functions are the reason the device is called the RDP Winstation driver, with Winstation to be read as “window station”: basically the environment/container, including virtual input and output devices, for a session (so the RDPWD device is the device that “holds” the different window stations). The key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp contains the window station settings for remote connections through the RDP-Tcp connection (the default RDP connection), including the string value WdDLL, which is set to “rdpwd”. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\Console\RDP contains the same, but for console/administrative sessions (starting with Vista, console sessions no longer exist and have more or less been replaced by administrative sessions).

Figure 6

Side note: the “remote session display” device is the RDPCDD device, described as the RDP Miniport, implemented by RDPCDD.sys (in %windir%\System32\drivers) and registered as a driver Windows service in the registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\RDPCDD). “RDPCDD” stands for “RDP Chained DD” (RDP Chained Display Driver). I guess the “Chained” part refers to the fact that this device uses a series of drivers (a driver stack). To be more precise, RDPCDD uses RDPCDD.sys and also rdpdd.dll (in %windir%\System32), the image for the RDPDD driver. RDPDD stands for, and is described as, RDP Display Driver; it is the actual core display driver (hence RDPCDD being described as a miniport driver). RDPDD is registered as a driver Windows service in the registry as well: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\RDPDD. The “System process” detects the RDPCDD device and loads the corresponding RDPCDD driver (RDPCDD.sys) and RDPDD driver (rdpdd.dll) into itself (the RDPCDD registry key refers to RDPDD too).

If vWorkspace’s client is installed, this is extended with the PNRDD driver, implemented by PNRDD.dll and also loaded into the “System process”. The file is located in %windir%\System32 and is described as “Quest RDP Display Driver”. “PNRDD” stands for “Provision Networks RDP Display Driver”. It’s not registered as a separate driver Windows service though, so there is no HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\PNRDD key.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\VIDEO shows that RDPDD and PNRDD are the drivers that do the actual display work:

  • RDPDD: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\VIDEO\rdpdd
  • PNRDD: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\VIDEO\pnrdd

Anyway, “PNRDPWD” works analogously to RDPWD. It’s another device though, called PNRDPWD, controlled by the PNRDPWD driver, implemented by pnrdpwd.sys (in %windir%\System32\drivers) and described as “Quest Terminal Stack Driver” (just a normal driver from the “stack of all Quest vWorkspace client drivers for ‘Terminal Services’”). It’s registered as a driver Windows service in the registry through the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\PNRDPWD and is loaded into the “System process” as well.

Figure 7

Figure 8

Figure 9

WdDLL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp now has the value “pnrdpwd”, while WdDLL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\Console\RDP still has “rdpwd” as its value. This means administrative sessions still use the default RDPWD device for their window stations, while “normal” remote connections use Quest’s alternative, PNRDPWD.
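If you want to check the two WdDLL values on a server without clicking through regedit, something like the following Python sketch could do it. It uses the standard `winreg` module (Windows only); the helper names `read_wddll` and `stack_driver_owner` are invented for this sketch, and the registry paths are simply the ones mentioned above. It assumes both keys exist and are readable by the account running the script.

```python
import sys

WINSTATIONS_KEY = r"SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations"


def read_wddll(station_subkey: str) -> str:
    """Read the WdDLL value of a WinStations subkey (Windows only)."""
    import winreg  # standard library, but only available on Windows

    path = WINSTATIONS_KEY + "\\" + station_subkey
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "WdDLL")
    return value


def stack_driver_owner(wddll: str) -> str:
    """Map a WdDLL value to the stack driver it selects, per this post."""
    name = wddll.strip().lower()
    if name == "rdpwd":
        return "Microsoft RDPWD (rdpwd.sys)"
    if name == "pnrdpwd":
        return "Quest PNRDPWD (pnrdpwd.sys)"
    return "unknown stack driver: " + wddll


if __name__ == "__main__" and sys.platform == "win32":
    # The two connection types compared in the text above.
    for station in ("RDP-Tcp", r"Console\RDP"):
        print(station, "->", stack_driver_owner(read_wddll(station)))
```

On a server with the vWorkspace client installed I would expect RDP-Tcp to report the Quest driver and Console\RDP the Microsoft one, matching what the registry shows above.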

Figure 10

Figure 11

The process in whose context pnrdpwd.sys crashed was based on the image pntermhlp.exe. This file resides in %windir%\System32 and is described as “Quest Terminal Services Helper Service”. It’s the executable used by the Windows service of the same name, which is registered under the key “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Quest Terminal Services Helper Service”.

Figure 12

Figure 13

Figure 14

To summarize: at a certain moment the Quest Terminal Services Helper Service does something within the PNRDPWD driver that causes unused or already freed NPP memory to be read. This fails, of course, and raises STOP error 0x00000050, resulting in a blue screen, dump creation, a reboot and the BugCheck event.

Okay, now we know what we’re talking about. The question that pops up is whether we can get rid of PNRDPWD and/or the Quest Terminal Services Helper Service. So far the bug has only appeared within pntermhlp.exe, so I guess getting rid of the Quest Terminal Services Helper Service would be enough, unless this service is really necessary for PNRDPWD to work properly. I can’t find any information about the service, so the only option is to find out experimentally whether everything keeps working with the Windows service disabled (although I then wonder what the service is actually doing…). Because migrating to a newer vWorkspace version isn’t a real option for us right now, this is what I had to test. I disabled the service, restarted the server, crossed my fingers and took a look: would everything still work? So far, at first sight, it doesn’t seem to cause problems or reduce functionality. Of course I have to wait longer to be more sure of this. If I know more, I’ll let you know with an update to this post.

Side note: I know I’ve written a lot about COM lately, but be assured, Quest Terminal Services Helper Service has nothing to do with COM :-)

Greetz,

Pedro
