MiniRosetta problems



mitchellds
04-30-2009, 12:26 AM
Looks like there are some current problems; I'm having a lot of WUs die. Other forum users are reporting the same problems. Right before our race to... :(

liuqyn
04-30-2009, 12:36 AM
Same here, and when they crash they also seem to take GPUGrid WUs with them (on my boxes that run both).

Nflight
04-30-2009, 12:40 AM
Just as you have mentioned, I am starting to receive Ralph work units. A horde of them came today to clog my computers' workload. I have had the Ralph project open to receive work for weeks with nothing, then today they hit with force. They are long work units, but I have not had any go dysfunctional like you're suggesting.

More comments are needed to figure out if your race is on or extended a week to work out the kinks and hurdles that lie ahead of you all in your desire to RACE for the great equipment. Good Luck Everyone! :blob3:

mitchellds
04-30-2009, 12:41 AM
Yes, I'm seeing similar problems with GPUGrid WUs.

AMDave
04-30-2009, 01:04 AM
That happened to me last night too.
Now that I've checked the Ralph tasks for the machine the GPU is on, I see that the Ralph task succeeded, but it coincides with the time the GPUGrid task failed.
I stopped Ralph on that machine for the moment.

Unless it is a coincidence and we have a batch of dodgy GPUGrid WUs at the same time as the Ralph release.
I don't have the debug log for the GPUGrid WU, so I can't correlate the problem with the thread here: http://ralph.bakerlab.org/forum_thread.php?id=446
It would take some concerted effort to prove this.

The best I can give is:
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1600000 kilohertz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 8
# Number of cores: 64
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1600000 kilohertz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 8
# Number of cores: 64
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1600000 kilohertz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 8
# Number of cores: 64
Cuda error: Kernel [fft_data_swizzle_out] failed in file 'CPME_cufft.cu' in line 94 : unspecified launch failure.

</stderr_txt>

How the Ralph WU could cause that is not clear, but the coincidence leaves me wondering.
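
For anyone comparing logs across machines, here is a minimal Python sketch that summarizes a stderr dump like the one above. It rests on an assumption: that each "# Using CUDA device" header marks one start (or restart) of the application, which the log itself does not confirm. The function name summarize_stderr is made up for illustration, and the sample text is a trimmed copy of the lines posted above.

```python
import re

# Trimmed copy of the stderr text posted above (device detail lines omitted).
SAMPLE = """
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
Cuda error: Kernel [fft_data_swizzle_out] failed in file 'CPME_cufft.cu' in line 94 : unspecified launch failure.
"""


def summarize_stderr(text):
    """Count device headers and collect lines that mention an error.

    Assumption: every "# Using CUDA device" line corresponds to one
    start or restart of the application.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    starts = sum(1 for line in lines if line.startswith("# Using CUDA device"))
    errors = [line for line in lines if re.search(r"\berror\b", line, re.IGNORECASE)]
    return {"starts": starts, "errors": errors}


if __name__ == "__main__":
    summary = summarize_stderr(SAMPLE)
    print("starts:", summary["starts"])
    for line in summary["errors"]:
        print("error:", line)
```

Running the same function over stderr text from several failed tasks would show whether they all end on the same kernel failure, which is one way to start checking the coincidence.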

NeoGen
04-30-2009, 09:10 AM
"More comments are needed to figure out if your race is on or extended a week to work out the kinks and hurdles that lie ahead of you all in your desire to RACE for the great equipment."
You're right Nflight.
I'm disappointed that Rosetta@Home has problems like this when they have a side project specifically created for beta testing.
If there is no solution I'll have to propose another project for the first race, and maybe delay the contest start one more week. :-(

NeoGen
05-01-2009, 06:23 PM
Guys, are the problems still around, or have they fixed it?

If they're still around I'll have to change the initial project I guess...