I have had a problem for nearly half a year: my PC occasionally hard-resets straight to BIOS POST, without a BSOD. The PC isn't overclocked (at least not intentionally). The resets don't seem tied to any specific activity: they happen while browsing the Internet, playing a game, or even idling at the desktop. The PC may run stably for a few days and then hard-reset five times in one day.

My current OS is Windows 10. The first thing I did was turn off automatic restarts and turn on minidump writing. Then I checked the logs: https://pastebin.com/PPavraJZ

It's not clear what the problem is, other than that it may be connected with a loss of power.

Then I took a minidump and uploaded it to http://www.osronline.com/. The result is at https://pastebin.com/3aqeQNXi:

WHEA_UNCORRECTABLE_ERROR (124) A fatal hardware error has occurred.

I'm not sure how to interpret all of that, but to me it looks like a hardware issue.

Thinking that it may be a hardware issue, I did the following:

  • Replaced the old PSU.
  • Took the memory out, cleaned all slots with a can of compressed air, and put it back.
  • Replaced the thermal compound on the CPU (used MX-2). Temperature dropped 5-6 degrees. At max load it now stays at about 70 degrees.
  • Updated to the latest BIOS.
  • Reset all BIOS settings to AUTO or defaults.
  • Installed the latest AMD chipset drivers. Selected the optimized Ryzen power profile.
  • Removed all MSI software, just in case it overclocks something.
  • Ran memtest for half a day without issues.
  • Ran Prime95 for an hour without issues.

My current hardware:

  • Seasonic Titanium Prime 750W.
  • AMD Ryzen 1700 with stock Wraith Spire RGB cooler.
  • MSI B350 TOMAHAWK (MS-7A34). Latest BIOS (1.90 from 09/19/2017)
  • 16 GB DDR4 RAM (8+8). 1600.0 MHz (DDR4-3200 / PC4-25600) Corsair CMK16GX4M2B3200C16.
  • MSI GeForce GTX 1080 Aero OC 8GB GDDR5X.
  • OCZ-VERTEX4 as primary SSD. Healthy.
  • Samsung SSD 960 PRO 512GB as secondary SSD. Healthy.
  • ASUS Xonar D2X.
  • USB keyboard and mouse.
  • KX-MB1500RU Panasonic Printer connected via USB.
  • Case is Thermaltake Tsunami Dream.

I'm out of ideas, so I'm asking for your help on how to further diagnose and fix these hard resets. Thank you.

Sam Dark
  • Yes. That's what I'm doing, more or less. There are two problems though. Most of the hardware was delivered from Germany, so replacing it would cost more than buying new (shipping costs + time). And since the problem isn't consistent, Ubuntu from a LiveCD would have to be used for days or weeks, and I have to use this PC for work. I thought the above info might trigger some memories about possible causes, but thanks anyway. At least I'm now more sure there's no easy way :) – Sam Dark Jan 31 '18 at 08:53
  • Use WinDbg and the !errrec command with the value of arg2 to see why you get the 0x124 crash. – magicandre1981 Jan 31 '18 at 16:18
  • Here it is: https://pastebin.com/1LMkbT5T – Sam Dark Jan 31 '18 at 23:58
  • OK, you have an L1 cache issue while reading data. The Ryzen is relatively new, so you have a warranty and should RMA it. – magicandre1981 Feb 01 '18 at 16:34
  • Yes. RMA is a possibility but it's complicated. It will take about 2 months to get the CPU back to the German online shop etc., and meanwhile I'd have to buy another CPU in a local shop to have a working PC. How sure are you that it's a faulty CPU and not something else, such as the motherboard failing to set the voltage automatically, or the GPU or memory causing it? – Sam Dark Feb 01 '18 at 23:11
  • Convert the value 0xbe802800000c0135 from the status to binary and see whether any AMD documentation explains what each bit means. Do you have a friend with a compatible Ryzen CPU that you could test with? If that CPU works, yours is faulty. – magicandre1981 Feb 02 '18 at 15:24
  • Found the docs: http://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf, page 181. The bank refers to the Load-Store Unit. The error is an uncorrectable ECC error, i.e. one the hardware could not correct. From the spec, I believe "memory" refers to the CPU cache and not regular RAM. Any idea if I'm correct? – Sam Dark Feb 02 '18 at 20:50
  • By the way, I sent a letter to AMD support pointing to this info. They suggested running sfc /scannow and, after I confirmed that there were no errors, suggested returning the processor. I had hoped they would dig into it at least a little bit :( – Sam Dark Feb 02 '18 at 20:53
  • No friends with a Ryzen, unfortunately, so I guess I'll have to buy another one to try it... – Sam Dark Feb 02 '18 at 23:25
  • Found and ran https://github.com/corngood/kill-ryzen-win. It crashed. It seems highly likely that it's the CPU. – Sam Dark Feb 04 '18 at 17:39
  • What does this tool do? – magicandre1981 Feb 07 '18 at 16:31
  • It starts lots of C compilers in parallel, feeds them correct code, and waits for an exception. That doesn't happen on good CPUs but does happen on faulty ones. – Sam Dark Feb 08 '18 at 12:07
  • By the way, after I presented all this info to AMD, they agreed to an advance RMA, so I'll stay on this faulty CPU until the new one arrives, then swap in the good one and send the faulty one back to them. The proper way of doing an RMA. – Sam Dark Feb 08 '18 at 12:09
  • OK, thanks for the feedback. Reply again when you get the new one. By the way, notify me with @ myusername; currently I have to check the topic on my own to see whether you replied. – magicandre1981 Feb 08 '18 at 16:42
  • @magicandre1981 I installed the CPU received from AMD. I haven't used it enough yet, of course, but so far it works well. Tested it with kill-ryzen-win and Prime95 for a few hours. – Sam Dark Feb 21 '18 at 13:01
  • OK, thanks for the feedback, so the CPU was faulty. – magicandre1981 Feb 21 '18 at 15:52
  • @magicandre1981 OMG, with the new CPU it just hard-reset :( The crash dump is similar. It seems the whole RMA didn't help at all, except for fixing the rare bug reproducible with kill-ryzen. I'm lost again. – Sam Dark Feb 22 '18 at 10:06
  • Do you have a different motherboard that you can test with? – magicandre1981 Feb 22 '18 at 14:31
  • No extra hardware at hand. The problem is so frustrating that I don't mind spending extra on a new MB if that may solve it. Which MB would you suggest as hassle-free? – Sam Dark Feb 22 '18 at 15:48
  • Some days ago another user also had 0x124 crashes and fixed them by disabling Fast Startup in Windows 10. Do you use it? If yes, disable it too, in Control Panel -> power settings. – magicandre1981 Feb 25 '18 at 18:13
  • @magicandre1981 I removed the ASUS Xonar D2X from the case. Stable so far. There are reports of it causing BSODs with Nvidia cards, so maybe the CPU L1 cache BSOD is somehow caused by it, however weird that sounds... – Sam Dark Feb 26 '18 at 14:52
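
For readers who want to repeat the "convert the status to binary" step from the comments, here is a minimal Python sketch that splits 0xbe802800000c0135 into named fields. The function name decode_mca_status and the lookup tables are hypothetical, made up for illustration. The upper flag bits (Val, Overflow, UC, En, MiscV, AddrV, PCC) and the "0000 0001 RRRR TTLL" memory-hierarchy error code follow the general x86 machine-check layout; the positions used for TCC, CECC, UECC, Deferred and Poison are an assumption about the AMD layout and should be checked against the PPR linked above.

```python
# Hypothetical helper for illustration only; not taken from any AMD or Microsoft tool.
# Bit positions for TCC/CECC/UECC/Deferred/Poison are assumptions to verify in the PPR.
FLAG_BITS = {
    "Val": 63, "Overflow": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57,
    "TCC": 55, "CECC": 46, "UECC": 45, "Deferred": 44, "Poison": 43,
}

# Fields of a memory-hierarchy error code: 0000 0001 RRRR TTLL
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mca_status(status):
    """Split a 64-bit machine-check status value into flags and error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF  # bits 15:0
    result = {
        "flags": flags,
        "error_code": code,
        "error_code_ext": (status >> 16) & 0x3F,  # bits 21:16 (assumed width)
    }
    if (code >> 8) == 0x01:  # memory-hierarchy (cache) error pattern
        result["request"] = REQUEST.get((code >> 4) & 0xF, "unknown")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "unknown")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    status = 0xBE802800000C0135
    print(f"{status:064b}")  # the raw binary form asked for in the comments
    decoded = decode_mca_status(status)
    print([name for name, is_set in decoded["flags"].items() if is_set])
    print(decoded.get("request"), decoded.get("transaction"), decoded.get("level"))
```

Under these assumptions the value decodes to a data read (DRD) of data at level L1 with the UC, PCC and UECC flags set, which is consistent with the "L1 cache issue while reading data" and "uncorrectable ECC error" readings given in the comments. It says nothing about what triggered the error.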
