First off, spec list:

  • OS: Windows 7 Ultimate 64-bit SP1
  • CPU: i7-4820K @ 3.7GHz (stock)
  • GPU: Two 3GB Radeon HD 7970s @ 1.05GHz
  • Mobo: ASRock X79 Extreme6
  • HDD: 2TB Seagate Barracuda 7200rpm
  • RAM: 16GB quad-channel Kingston 1600MHz
  • PSU: Antec HCG 900W
  • Monitors: Acer S220HQL 1920x1080 + ViewSonic VA2251 1920x1080. Plugged into different GPUs.

My problem is that, on a daily-ish basis, my monitors turn off and won't turn back on. The computer is still running, with the GPU, CPU, and case fans all still going, but it seems to cease all network activity, and it doesn't seem to log any errors at all. I've verified that this is not a monitor issue: when I press the Num/Caps/Scroll Lock keys on my keyboard, the lights don't change, so the computer is clearly not accepting input.

I have noticed a few other people on the internet with this problem, and some have claimed that it was solved by disabling PCI-Express Link State Power Management, but the issue still occurs for me after this. Whilst my CPU and GPUs both run at 100% 24/7, the temperatures are certainly not at dangerous levels, with the CPU averaging 65°C and the GPUs at 70°C and 78°C average. All components are brand new.

I have tried setting MSI Afterburner to start with Windows and to force a constant voltage, as this fixed the issue for a few days for another user. He later reported that it had stopped working properly again, though, so I'm not putting too much faith in this working.

Many people have said to adjust display sleep mode settings, but this will clearly not work, as the keyboard lights would still work if the monitors were the issue.

The closest I can get to a log file for this issue is the following Folding@Home logs:

    14:45:21:WU01:FS00:0x17:Completed 1120000 out of 2000000 steps (56%)
    14:46:43:WU00:FS01:0x17:Completed 480000 out of 2000000 steps (24%)
    14:46:49:WU01:FS00:0x17:Completed 1140000 out of 2000000 steps (57%)
    14:48:30:WU01:FS00:0x17:Completed 1160000 out of 2000000 steps (58%)
    14:49:55:WU01:FS00:0x17:Completed 1180000 out of 2000000 steps (59%)

As you can see, the second GPU (FS01) stops computation approximately three and a half minutes before the issue occurs (it should be completing 1% every 80-120 seconds), and the first GPU (FS00) continues for a few minutes more before the logs just end. As far as I can tell, the computer has a network failure at the time the first GPU stops working: the latest IRC message I received from this time was at 14:47:58. That being said, there might simply not have been any messages between then and 14:50:00, so I'm going to connect a laptop to the same bouncer to double-check if it happens again.
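The stall described above can be checked mechanically rather than by eye. The following is a minimal sketch, not anything from the F@H client itself: it assumes the log lines always look like the five quoted above, and it treats a slot as stalled when its last "Completed" line is more than 120 seconds older than the last line in the log (the 120-second figure is the upper end of the expected 80-120 second step time). The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Log lines copied from the excerpt in the question.
LOG = """\
14:45:21:WU01:FS00:0x17:Completed 1120000 out of 2000000 steps (56%)
14:46:43:WU00:FS01:0x17:Completed 480000 out of 2000000 steps (24%)
14:46:49:WU01:FS00:0x17:Completed 1140000 out of 2000000 steps (57%)
14:48:30:WU01:FS00:0x17:Completed 1160000 out of 2000000 steps (58%)
14:49:55:WU01:FS00:0x17:Completed 1180000 out of 2000000 steps (59%)
"""

# Assumed line shape: HH:MM:SS:WUxx:FSxx:<core>:Completed N out of M steps
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:(FS\d+):[^:]+:Completed \d+ out of \d+ steps"
)


def last_progress(log_text):
    """Map each slot (e.g. 'FS00') to the time of its last progress line."""
    last = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp, slot = match.group(1), match.group(2)
            last[slot] = datetime.strptime(stamp, "%H:%M:%S")
    return last


def stalled_slots(log_text, max_gap=timedelta(seconds=120)):
    """Return {slot: silence} for slots silent longer than max_gap at log end."""
    last = last_progress(log_text)
    log_end = max(last.values())
    return {
        slot: log_end - seen
        for slot, seen in last.items()
        if log_end - seen > max_gap
    }


if __name__ == "__main__":
    for slot, gap in stalled_slots(LOG).items():
        print(f"{slot} was silent for {gap} before the log ended")
```

On the excerpt, only FS01 is flagged, which matches the reading that the second GPU stopped first. The sketch ignores logs that cross midnight, since the timestamps carry no date.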

The GPUs functioned perfectly well in another computer for a significant period of time, so I'm fairly confident that they aren't the issue, which means that this is being caused by either software or the motherboard, or possibly RAM. I really hope it's software.

I read on a forum that there was a patch from Microsoft that fixed this problem, but the poster wrote "I've forgot which KB it was or the google search terms I used to find the patch, LOL.", so that's not much help. I haven't seen it mentioned by anyone else in about a dozen threads about this issue either.

The computer is plugged in via a surge-protected power board, and I've run several other computers and pieces of hardware through it with no issues, so that is not the cause.

I have just set the hard disk to never turn off, although I don't believe that that will solve the issue.

Strangely, this has only happened when I'm not at the computer (which is actually a minority of the time). Until today it had only happened when I had not been actively using the computer for >6 hours, but today it happened within 10-30 minutes of me last using the computer actively.

I have enabled file logging from MSI Afterburner, so hopefully this will shed some light on the issue, but I'm not too optimistic. I've heard that it could be a motherboard problem, but I figured I should ask around before RMAing it.
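Alongside the Afterburner log, a tiny heartbeat logger would at least pin down when the machine stops responding. This is only a sketch under assumptions: the file name `heartbeat.log` and the 10-second interval are arbitrary choices, and it cannot say why the hang happens, only that the last timestamp in the file is the last moment the OS was still scheduling user processes and flushing writes to disk.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, now=None):
    """Append one timestamp line and push it to disk immediately."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(now.strftime("%Y-%m-%d %H:%M:%S") + "\n")
        handle.flush()
        # fsync so the line survives a hard freeze followed by a reset.
        os.fsync(handle.fileno())


def run(path="heartbeat.log", interval=10.0, beats=None):
    """Write a heartbeat every `interval` seconds; forever if beats is None."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval)

# Usage (runs until the machine hangs or the script is stopped):
#   run("heartbeat.log", interval=10.0)
```

After the next freeze, the last line of the file can be compared against the final F@H log entry and the last IRC message to see which stopped first.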

Any help?

Sellyme
  • Test drives, test RAM (for days), try a known good PSU, test fresh install of OS. – Ƭᴇcʜιᴇ007 Oct 30 '13 at 15:56
  • possible duplicate of [How do I troubleshoot a Windows 7 freeze or slowness?](http://superuser.com/questions/26862/how-do-i-troubleshoot-a-windows-7-freeze-or-slowness) – Ƭᴇcʜιᴇ007 Oct 30 '13 at 15:57
  • @techie007 This /is/ a fresh OS install. I can't test a known good PSU, because I don't have another $200 lying around. That said, the Antec HCG 900 is fairly reliable, and should do the trick (TDP of the 7970 is meant to be 250W, and 130W for my CPU). Am testing RAM now. – Sellyme Oct 30 '13 at 16:08
  • Disable Folding@home. It is probably crashing a core. The monitors going dark is probably the GPU/VPU attempting a recovery. – horatio Oct 30 '13 at 16:09
  • If you want to fix your own computer, sometimes you need to invest in spare parts, if only to rule out suspects. – Ƭᴇcʜιᴇ007 Oct 30 '13 at 16:10
  • @horatio Disabling F@H is not happening, seeing as that's the primary reason I bought this computer. The F@H software itself is extremely stable and has been run on these cards for a very long time with no issues. There is clearly an issue that should not be occurring, and changing my usage to accommodate that isn't an option. – Sellyme Oct 30 '13 at 16:20
  • @techie007 Also, that question you marked as a possible duplicate certainly isn't one: it refers to recoverable freezes, not permanent ones necessitating a reboot. – Sellyme Oct 30 '13 at 16:33
  • Yeah it was more to point you at some other things you can check/try, since you haven't really told us what actual repairs you've tried already. Have you tested your RAM using MemTest86+ for 48-72 hours yet? No? Then that's your first, easy, cheap step. ;) – Ƭᴇcʜιᴇ007 Oct 30 '13 at 16:39
  • @techie007 I've barely had this computer for that long, so no, I haven't tested it. I'm likely going to test the PSU first if the situation occurs again, by only having one 7970 plugged in. If that works fine for over 72 hours, I'll look at getting a new PSU, but I don't think it should be an issue. If that doesn't work, I'll swap to different RAM, to prevent my computer being unusable for several days due to MemTest86+. If /that/ doesn't work I'll try running MemTest86+ in the unlikely situation that both sets of RAM are bust. – Sellyme Oct 30 '13 at 16:53
  • I think it is obvious that one of the things you do to diagnose a problem is turn things off. Once you determine that folding is not your problem, you can rule it out as a cause. It is also self-evident that folding is doing dynamic calculations on dynamic data. The fact that it worked fine before is not a guarantee that it is not causing a problem. I think f@h can go without your undying support for a day or two while you rule it out. – horatio Oct 30 '13 at 17:03
  • @horatio I have no doubt that the heavy load is going to be a likely cause of the issue. I will test heavy load on other applications just to be sure that it's not miraculously somehow F@H's kernel breaking in this exact configuration for some completely unknown reason, but I'm not sure how any result of this would be of much use. – Sellyme Oct 30 '13 at 17:29
  • The point is not to worry about what to do, but what is causing it. Then you can decide how to proceed. The KB you can't remember may be related to loopback addresses: folding@home seems to have had problems with network dropouts hanging the client, which might hang the gpus. If you disable f@h *in order to see if it continues to hang*, and it stops hanging, you will know to focus your efforts on the interactions between f@h and your computer configuration. If it continues to hang, you know you need to look elsewhere. Even if the loopback issue is present, it may be a LAN issue. – horatio Oct 30 '13 at 18:14
  • @horatio For clarity, the KB isn't something I can't remember, it's something a forum user who last posted in 2011 can't remember. The issue is definitely nothing to do with network dropouts hanging the client, rather that network dropouts are a side effect of the computer hanging; I get network dropouts constantly with no ill effect. – Sellyme Oct 30 '13 at 18:15
  • If it only happens when you leave the computer alone, could it be the monitor power-saving features causing it? Try disabling screensavers and/or stop the computer blanking the monitor display. – Mokubai Oct 30 '13 at 22:05
  • @Mokubai I highly doubt that's the issue: it doesn't occur every time the monitors go into standby, and as said, the keyboard doesn't respond, which shouldn't happen if it's a monitor issue. Additionally, these exact monitors worked fine on a different computer. – Sellyme Oct 31 '13 at 07:51
  • @SebastianLamerichs That is fine, I was just thinking it was a possibility. Not that it would be due to the monitors themselves but more to do with the graphics cards going into a semi-sleep state when they disable their monitor outputs. If it doesn't always happen then it's less likely to be that though. – Mokubai Oct 31 '13 at 20:06
  • @Mokubai Yeah, I've looked into that, and if it only happened occasionally at the second the monitors go into sleep it'd probably be a possibility, but seeing as it only happens several hours after they go into sleep, I don't think that that's the issue. I don't believe that it's ever happened while the monitors are active, but that's likely because it only happens when the computer is idle, and the monitors being idle is just something that tends to be correlated. – Sellyme Nov 01 '13 at 05:44
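
The power-budget estimate from the comments (two 7970s at a quoted 250W TDP each, plus a 130W CPU, on a 900W PSU) can be checked with quick arithmetic. This is a rough sketch only: it uses the TDP figures quoted in the thread, ignores the motherboard, drive, and fans, and says nothing about transient spikes or per-rail limits, so it cannot rule the PSU in or out on its own.

```python
# Figures quoted in the thread; everything else in the system is ignored.
GPU_TDP_W = 250
GPU_COUNT = 2
CPU_TDP_W = 130
PSU_RATED_W = 900


def estimated_load_w(gpu_tdp=GPU_TDP_W, gpu_count=GPU_COUNT, cpu_tdp=CPU_TDP_W):
    """Sum of the quoted TDPs for the GPUs and the CPU, in watts."""
    return gpu_tdp * gpu_count + cpu_tdp


def headroom_w(psu_rated=PSU_RATED_W, **kwargs):
    """Rated PSU wattage minus the estimated load, in watts."""
    return psu_rated - estimated_load_w(**kwargs)


if __name__ == "__main__":
    load = estimated_load_w()
    print(f"Estimated load: {load} W ({load / PSU_RATED_W:.0%} of rating)")
    print(f"Headroom with both GPUs: {headroom_w()} W")
    print(f"Headroom with one GPU:   {headroom_w(gpu_count=1)} W")
```

With one card removed, as proposed for the PSU test, the estimated load drops from 630W to 380W.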

1 Answer

Q: Have you tested with memtest for 48 - 72 hours?
A: I've barely had the computer for that long...

So what you've got is a brand-new, home-built computer that has never run stably and is crashing. I doubt there's an MS KB that will fix that. You simply need to work methodically through the hardware, updating drivers and swapping bits out until it stops crashing.

Kevin Panko
BlueCompute
  • The main thing that's confusing me is that it only seems to occur when the computer is inactive. It has occurred 5 times in 6 days, but only when I'm away from the computer. Seeing as I've been actively on the computer for a good 95+ hours of that time, that's quite statistically unlikely, which is why I'm not yet 100% convinced that it's a hardware issue. – Sellyme Oct 30 '13 at 20:37
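
The "statistically unlikely" claim in the comment above can be given a rough number. This sketch assumes the five freezes are independent and equally likely at any moment of the six days, whether or not the computer is in use; that assumption is exactly what is in doubt, so the result only shows how surprising the pattern would be if usage made no difference.

```python
# Figures from the comment: 5 freezes in 6 days, at least 95 hours of active use.
TOTAL_HOURS = 6 * 24
ACTIVE_HOURS = 95
FREEZES = 5


def prob_all_while_idle(total=TOTAL_HOURS, active=ACTIVE_HOURS, freezes=FREEZES):
    """P(every freeze lands in idle time) if freezes ignore usage entirely."""
    idle_fraction = (total - active) / total
    return idle_fraction ** freezes


if __name__ == "__main__":
    print(f"Idle fraction: {(TOTAL_HOURS - ACTIVE_HOURS) / TOTAL_HOURS:.1%}")
    print(f"Chance all {FREEZES} freezes fall in idle time: {prob_all_while_idle():.2%}")
```

Under that assumption the chance is under half a percent, and since the active time was "95+" hours, the true figure would be lower still. That supports the suspicion that idleness (or something correlated with it) matters, though it does not distinguish software from hardware causes.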