Friday, November 28, 2014

Finally Fixed the Problem

I finally fixed the BSOD problem on my machine: I just needed to update the chipset drivers.

I was originally convinced that I had a bad GPU, so I sent it back to XFX for repair.  They ran many tests and couldn't find a single flaw.  Fortunately, one of the technicians there recommended updating the motherboard chipset drivers.  For some reason, this was about the only driver that I hadn't thought to update in my previous testing.  After they returned the GPU to me, I installed it and updated the chipset drivers, then did a clean reinstall of the latest beta Catalyst drivers.  Zero problems now, and it's been about a month of heavy use since.  No more atikmdag.sys blue screens, or any others for that matter.  A big thanks to XFX support for helping me with this.

This is the chipset driver I installed (for my P7P55D-E Pro motherboard with a Core i5 750 CPU, but it should work for any Intel chipset):

https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=20775


Sunday, July 6, 2014

R9 290 atikmdag.sys BSOD Troubleshooting

Edit: the fixes below sadly only helped for a few weeks, then the blue screens were back again.  As of now, I believe the card itself is damaged in some way.  Quite the frustrating problem. 

After installing my Radeon R9 290 video card in my older P55 chipset motherboard (P7P55D-E Pro), I encountered a persistent BSOD (blue screen of death) relating to the atikmdag.sys driver.  Googling this showed many people with similar problems and no consistent solution.  After a lot of research, trial and error, and voodoo, I think I've finally fixed the problem.  If you're encountering similar issues, try the steps below:

  • The BSOD occurrences seem to be fairly independent of Catalyst driver version.  I tried many, but just settled on the latest (as of 2014/7/6) 14.6 beta.  
    • Also, having Catalyst Control Center installed or uninstalled made no difference for me. 
  • Make sure your power supply can handle the R9 290.  It can require up to about 350 watts peak, so you'll need a 12V source stable at about a 30 amp draw.  Beware multi-rail PSUs... you may have to run two different rails to the R9 290, but this could cause problems.  I'm currently using a Corsair HX850 that has a single 70 amp +12V rail, and it's working well (switched from my older multi-rail 850 watt supply that wasn't cutting it).  
  • Reset all GPU settings to default in the Catalyst Control Center, especially under the "3D Application Settings" section.  The "Wait for Vertical Refresh: Always On" setting I had previously enabled seemed to increase the BSOD frequency (though it wasn't the only cause), so the default "Off, unless application specifies" may be better.  Just enable VSYNC in each game instead.
  • Make sure your motherboard BIOS is up to date.  
  • Enable ACPI 2.0 in the motherboard BIOS.  This seems to reduce S3 sleep resume problems with the card.  I also changed the suspend mode to be permanently set at S3 instead of Auto... not sure if this helped.
  • In Windows 7, go to Power Options -> Change plan settings -> Change advanced power settings -> PCI Express -> Link State Power Management, and change this to "Off".  I believe that the default setting allowed Windows to reduce power on the PCI Express bus, which the power-hungry R9 290 didn't appreciate. 
    • For a while I was getting the BSOD only after resuming from sleep, but this change coupled with the ACPI 2.0 enabled BIOS fix above seems to be preventing all post-sleep troubles.  
  • Play any Adobe Flash video (on YouTube or the like), right click on the video, choose Settings, and disable "Enable hardware acceleration".  Flash doesn't behave very well with hardware acceleration, and can cause blue screens.
  • If you use Firefox, open Options, choose Advanced, and disable "Use hardware acceleration when available".  This has helped some people with blue screens happening while web browsing.  
These changes have so far prevented the atikmdag.sys BSOD from reappearing.  I'll post updates if I run into trouble again, along with any potential fixes.  I believe the link state power management and ACPI 2.0 changes made the biggest difference.
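
For the power supply step in the list above, here's a quick back-of-the-envelope sketch of where the "about 30 amps" figure comes from.  It's just amps = watts / volts, and it assumes (as a simplification) that the card pulls all of its peak power from the +12V rail.  The function names and the rail_headroom helper are my own for illustration, not from any tool, and the numbers are the ones from this post, so plug in your own.

```python
def required_amps(peak_watts, rail_volts=12.0):
    """Current a load draws at a given voltage: I = P / V."""
    return peak_watts / rail_volts


def rail_headroom(rail_amps, peak_watts, rail_volts=12.0):
    """Amps left on a rail after the GPU's peak draw.

    Simplification: treats the whole peak load as coming from this
    one rail and ignores everything else sharing it (CPU, drives).
    """
    return rail_amps - required_amps(peak_watts, rail_volts)


if __name__ == "__main__":
    gpu_peak_watts = 350.0   # rough R9 290 peak quoted in the list above
    single_rail_amps = 70.0  # the +12V rail rating quoted for the HX850

    print("GPU peak draw: %.1f A" % required_amps(gpu_peak_watts))
    print("Headroom on a %.0f A rail: %.1f A"
          % (single_rail_amps, rail_headroom(single_rail_amps, gpu_peak_watts)))
```

On a multi-rail supply, run the same check against the rating of each individual rail feeding the card rather than the combined +12V figure, since a single rail can fall short even when the total looks fine.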