WCG News [Archive] - AMD Users.com - Distributed Computing Team


AMDave

10-14-2011, 08:23 AM

Planned CEP2 Upload Server Outage Monday at 15:00 UTC [Completed]
There is a planned outage of the CEP2 upload server Monday at 15:00 UTC. This outage is scheduled for 1 hour. If you try to upload during this time, you may get messages that the server is down. No action is needed by the member, as your computer will automatically attempt to resend the results. This outage will only affect CEP2 uploads.
Seippel

More... (http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,30494)

SuicideCabbage

10-24-2011, 11:53 AM

Speaking of CEP2, I just noticed that a large number of WUs across all my rigs have been erroring out with code RC = 0xc0000005 or RC = 0x1, mostly after job 11. What does this mean, and is it something on my end or did we get hit with a bad batch of units? I remember in the early days RC = 0x1 would pop up a lot; I don't remember what it meant, and I think it cleared up on its own.

AMDave

10-24-2011, 02:00 PM

"0xc0000005" is an unhandled exception, specifically a memory access violation.
In the realm of BOINC this would be caused by a bug in the client app.
(although you could also generate it in some client apps with wild parameters from input files, but we'd still call that a bug in the app)
It is possible that their project news post (above) is related to this.
I think Jason1478963 can advise better on this one. (PM sent)
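
For anyone wanting to decode these return codes quickly, here is a minimal sketch in Python. The lookup table and the function name `describe_exit_code` are made up for illustration and are not part of BOINC or WCG. The only mapping I'm certain of is 0xC0000005, which is the Windows status code for an access violation; 0x1 is just a generic nonzero exit whose meaning depends on the app.

```python
# Hypothetical lookup table for illustration; not a BOINC or WCG API.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION (memory access violation)",
    0x1: "generic nonzero exit (meaning is defined by the application)",
}


def describe_exit_code(text):
    """Turn a hex string such as '0xc0000005' into a readable description."""
    code = int(text, 16)  # base 16 accepts the '0x' prefix and either letter case
    return KNOWN_CODES.get(code, "unknown code {:#x}".format(code))


for rc in ("0xc0000005", "0x1"):
    print(rc, "->", describe_exit_code(rc))
```

Anything not in the table comes back as "unknown code", so it should not mislabel a code it has never seen.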

Jason1478963

10-25-2011, 08:35 PM

What else are you running with CEP2? I try not to run more than 3 at a time, as the I/O starts to create a difference between CPU time and wall clock time. I have had errors here and there, but nothing all that consistent. I did notice on a few Linux machines that when one work unit errors, you may error out all that were running at that time. They are still having some issues creating the work units for DSFL, and you will see some of these error when they try to start. HCC and HFCC seem to run well with CEP2 for me. I am running all but C4CW on most of mine at the moment, and most of my errors have been on HPF2.

SuicideCabbage

10-25-2011, 10:07 PM

The only WCG project I am running is CEP2; however, all machines that run just that have either a VelociRaptor, RAID, or SSD. The few running a standard 7200RPM drive have other projects besides WCG running. I knew about the discrepancy between CPU time and wall time, but the worst I have seen is a 30m difference on the quad with a 7200RPM RAID 0 (30m on a 10+h workunit is not too terrible).
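
To put a number on that gap, here is a rough sketch in Python. The function name `overhead_fraction` is hypothetical, and I'm treating the workunit as a flat 10 hours of wall time, which is a simplification of "10+h".

```python
def overhead_fraction(gap_minutes, wall_minutes):
    """Share of wall clock time that was not spent as CPU time."""
    if wall_minutes <= 0:
        raise ValueError("wall_minutes must be positive")
    return gap_minutes / wall_minutes


# Assumed figures: a 30 minute gap on a workunit taking 10 hours of wall time.
share = overhead_fraction(30, 10 * 60)
print("{:.1%} of wall time lost to waiting".format(share))
```

With those assumed figures the gap works out to about 5% of the run.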

I just looked again, and the vast majority in the past 48h (upwards of 90%) are still failing across all machines with "Application exited with RC = 0xc0000005" and the occasional RC = 0x1, always after job 11. I did just set up a new machine, but it's happening on rigs that I've run CEP2 on for a year with no problems. That's making me think it's not on my end.
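
If anyone wants to count these up across machines, here is a minimal sketch in Python. It assumes you have pasted the result messages into a list of text lines; the sample lines below are made up for illustration, and only the "Application exited with RC = ..." wording comes from my results. The name `tally_exit_codes` is hypothetical.

```python
import re
from collections import Counter

# Matches the hex code in messages like "Application exited with RC = 0xc0000005".
RC_PATTERN = re.compile(r"RC = (0x[0-9a-fA-F]+)")


def tally_exit_codes(lines):
    """Count how often each return code appears in the given message lines."""
    counts = Counter()
    for line in lines:
        match = RC_PATTERN.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "Application exited with RC = 0xc0000005",
    "Application exited with RC = 0xc0000005",
    "Application exited with RC = 0x1",
    "Finished normally",
]

counts = tally_exit_codes(sample)
for code, n in counts.most_common():
    print("{:#x}: {}".format(code, n))
```

Lines without an RC value are skipped, so successful results do not inflate the counts.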

Jason1478963

10-26-2011, 03:53 AM

I am trying to only run 3 at a time and haven't seen many errors lately. I did try to load a dual quad up with clean energy a few weeks back, and it seemed to error out more work units when running 8 at a time. This could be a sensitive batch of work units as well. I think I errored 2 recently, but I've had power glitches rebooting machines and think that may be part of the problem.

Jason1478963

10-26-2011, 12:58 PM

I'm not finding anybody else with recent errors in the WCG CEP2 forum. I would think the SSD and Raptor setups would be fast enough to keep the I/O issue down. It is always nice to be able to run your favorite sub project, but from my experience running eight at a time on my dual Opteron setup a few weeks ago, they don't seem to play well with multiples running. With my mix, and while trying to bring other projects up to 10 years of crunch time for future badges, three CEP2 at a time seem to run well. When they bring clean energy back online, I can try to fill a machine or two up with clean energy to see if I get more errors again. In the meantime we might want to post some info for the WCG techs to verify whether they are seeing issues with other crunchers after job 11.