RC5 CUDA Beta3 [Archive] - AMD Users.com - Distributed Computing Team

Brucifer

01-26-2009, 06:52 AM

The new 509beta3 CUDA client is out for Linux, Linux-64 and Windows. Be SURE to run -bench to check which core to use. The default core is not the fastest, at least it wasn't on my GTX260. I haven't tried the others yet. :icon_mrgreen:

Bender10

01-26-2009, 11:57 AM

Doh..!!

Post your -bench or best core rate here...

dnetc v2.9103-509-CTL-09010508-*dev* for Win32 (WindowsNT 5.2).

8800GT

RC5-72: using core #0 (CUDA 1-pipe 64-thd).
RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd) 0.00:00:14.34 [303,924,061 keys/sec]
RC5-72: using core #1 (CUDA 1-pipe 128-thd).
RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd) 0.00:00:14.28 [303,004,825 keys/sec]
RC5-72: using core #2 (CUDA 1-pipe 256-thd).
RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd) 0.00:00:14.57 [296,497,675 keys/sec]
RC5-72: using core #3 (CUDA 2-pipe 64-thd).
RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd) 0.00:00:14.07 [306,849,590 keys/sec]
RC5-72: using core #4 (CUDA 2-pipe 128-thd).
RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd) 0.00:00:16.76 [256,428,015 keys/sec]
RC5-72: using core #6 (CUDA 4-pipe 64-thd).
RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
RC5-72: using core #7 (CUDA 4-pipe 128-thd).
RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd) 0.00:00:16.31 [263,862,250 keys/sec]
RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd bus ... 0.00:00:14.32 [301,958,331 keys/sec]
RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sl ... 0.00:00:16.07 [268,257,385 keys/sec]
RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dyna ...
RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sl ... 0.00:00:16.04 [270,003,024 keys/sec]
RC5-72 benchmark summary :
Default core : #-1 (undefined)
Fastest core : #6 (CUDA 4-pipe 64-thd)
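
For anyone comparing many -bench runs, here is a minimal Python sketch that picks the fastest core out of pasted benchmark output. It assumes the output lines look like the ones above; the names `BENCH_LINE`, `fastest_core` and `sample` are made up for this illustration and are not part of the dnetc client.

```python
import re

# Matches lines such as:
#   RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
# The core name is skipped on purpose because the client truncates long names with "...".
BENCH_LINE = re.compile(r"Benchmark for core #(\d+) .*\[([\d,]+) keys/sec\]")


def fastest_core(bench_output):
    """Return (core_number, keys_per_sec) for the fastest core in the pasted output."""
    rates = {}
    for line in bench_output.splitlines():
        match = BENCH_LINE.search(line)
        if match:
            core = int(match.group(1))
            rates[core] = int(match.group(2).replace(",", ""))
    if not rates:
        raise ValueError("no benchmark lines found")
    best = max(rates, key=rates.get)
    return best, rates[best]


# Three lines copied from the 8800GT run above.
sample = """\
RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd) 0.00:00:14.34 [303,924,061 keys/sec]
RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd bus ... 0.00:00:14.32 [301,958,331 keys/sec]
"""

print(fastest_core(sample))
```

On this card the spread between the best cores is only a percent or two, so the main thing the script helps with is spotting the clearly slower ones (#4, #7, #10, #11).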

vaughan

01-28-2009, 07:04 AM

I'm trying the Win32 CUDA beta client on a Win64 Q9550 and it seems to be extremely slow.

117Mkeys/sec, 9800GTX, core 3

Box is also running 2x Wieferich, 1x NPLB and 1x BOINC PrimeGrid 321 project.

Bender10

01-28-2009, 11:26 AM

Vaughan,

Did you do a -bench and manually select the fastest core?

Is the 117 Mkeys your 'time per completed unit' or the average key rate?

Brucifer

01-28-2009, 05:45 PM

I'm no mathematics/programming wundergeek, but offhand I would have to say that I'm not surprised that the win32 client is slower as it would seem to me that a 64 bit O/S could pass data twice as fast to the gpu as a 32 bit O/S. In actuality it may not be that restricting, but there has to be an impact on the ability to feed the gpu. ????

liuqyn

01-28-2009, 06:16 PM

Not so sure about that. I have two 9800 GTX+ GPUs, one running on a 5600+ 2.8 GHz Vista 64bit, the other on a 5400+ 2.8 GHz XP 32bit, and they run at nearly identical rates.

NeoGen

01-28-2009, 06:55 PM

I'm no mathematics/programming wundergeek, but offhand I would have to say that I'm not surprised that the win32 client is slower as it would seem to me that a 64 bit O/S could pass data twice as fast to the gpu as a 32 bit O/S. In actuality it may not be that restricting, but there has to be an impact on the ability to feed the gpu. ????
The eternal 32/64 bit speed myth... :icon_lol:
The compact version --> there is little or no speed difference in data transfer rate between 32 and 64 bit OSes.

Now the long version...
The main 32/64 bit difference in OSes is the address range for memory. In 32bit OSes the maximum memory one can have is 4 GB (without paging or other tweaks).
This is rather easy to explain: a 32bit number is a number in binary form with 32 digits, and the number of different values one can write in binary form with 32 digits is 4,294,967,296 (the values 0 through 4,294,967,295, converted back to decimal of course). You can verify this by computing 2^32 in a calculator. That is the maximum number of bytes you can count with a 32bit long number.
If you take that big number and divide it three times by 1024 (to get kilobytes, megabytes, and gigabytes) the end result is "4" (gigabytes).
Each cell in RAM has to have an address to be usable, and the address in this case is a 32bit number, so the maximum you can address in 32bit OSes is 4 GB. RAM beyond that would not be addressable, and thus not usable. (Nowadays RAM beyond 4 GB is usable, using certain tricks that were developed over the years.)
This works exactly as if you had a long street with plenty of houses but with no door numbers beyond 30. (The rest of the houses have no number.) You can't deliver something addressed to number 40 of that street if the houses are not numbered up to that. Remember that for machines, guessing is not an option. :icon_wink:

In 64 bits all this changes as now the maximum addressable memory is a huge number that you can see if you do 2^64.
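
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the theoretical limits; the function names `addressable_bytes` and `to_gigabytes` are invented for this example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can express."""
    return 2 ** address_bits


def to_gigabytes(n_bytes):
    """Divide by 1024 three times: bytes -> kilobytes -> megabytes -> gigabytes."""
    return n_bytes // 1024 // 1024 // 1024


print(addressable_bytes(32))                 # 4294967296
print(to_gigabytes(addressable_bytes(32)))   # 4
print(to_gigabytes(addressable_bytes(64)))   # 17179869184
```

Note that the highest address is one less than the count (4,294,967,295 for 32 bits), because addresses start at 0.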

The funny trick we all wish they would use to double the data rate would be to pass two 32bit numbers together in one 64bit value to the GPU, right? :icon_mrgreen:
Unfortunately that could become very complicated software-wise, if not impossible due to OS restrictions. But even if it were possible we would hit the 4 GB barrier again somewhere... :icon_rolleyes:

Nflight

01-28-2009, 07:44 PM

Bravo! What an explanation there, NeoGen! :blob3:

Bender10

01-28-2009, 10:36 PM

Vaughan,

Which video driver are you using? You may have to go back 1 or 2 versions....That may work. I'm not sure.

AMDave

01-29-2009, 01:40 AM

strange and weird.
my client 'appears' to be working
BUT all my results seem to have gone into a void.
The stats site shows that I have returned nothing.
I killed the client remotely until I can take a closer look at what exactly is going on after work.

Jerod Vandehey

01-29-2009, 01:47 AM

Ok, so a dumb question here... I have a 8800 GTS, but it is on a w2k machine. Can I participate in cuda?

AMDave

01-29-2009, 04:07 AM

strange and weird.
my client 'appears' to be working
BUT all my results seem to have gone into a void.
The stats site shows that I have returned nothing.
I killed the client remotely until I can take a closer look at what exactly is going on after work.
I found that Razor has already posted about this on FDC
http://www.free-dc.org/forum/showthread.php?p=132149#post132149

Dnet have 'held back' the stats for the 'big' packets because they need to tell the stats server what a big packet is and what it is worth. At present their stats server counts a 64-stats-unit packet as worth 1 instead of 64, for example.

Anyway, they (Dnet) already know about it and now we know about it too :)
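
To make the effect concrete, here is a tiny Python sketch of the miscount as described above. It is not Dnet's actual stats code; `credited_units` and the packet list are hypothetical and only illustrate "every packet counts as 1" versus "every packet counts for its size".

```python
def credited_units(packet_sizes, honour_size):
    """Sum the credit for a list of packets, each given as its size in stats units.

    honour_size=False mimics the behaviour described above (each packet is worth 1);
    honour_size=True is what the fix should produce (each packet is worth its size).
    """
    return sum(size if honour_size else 1 for size in packet_sizes)


# Hypothetical day's work: two small packets and two big 64-unit packets.
packets = [1, 1, 64, 64]

print(credited_units(packets, honour_size=False))  # 4
print(credited_units(packets, honour_size=True))   # 130
```

With mostly big packets the displayed credit ends up a small fraction of the real work, which would match the near-zero daily numbers people are seeing.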

vaughan

01-29-2009, 07:44 AM

I'm so glad you posted a similar experience to me Dave. I have Doomeva's twin 9800GTs running under Gentoo64 and the hopeless Windows 32bit CUDA client under Win64 with a 9800GTX and yesterday the Free-DC site reported I got 18 points and today's stat is ZERO, yep big fat 0.

Something is terribly wrong with this latest beta3 client.

I'm switching these GPUs to something that rewards the electricity I'm donating. :icon_twisted:

AMDave

01-29-2009, 08:11 AM

Ok, so a dumb question here... I have a 8800 GTS, but it is on a w2k machine. Can I participate in cuda?
I think we are still trying to work that out.
After all this is still Beta.
You should try it out if you like but it seems there may be a couple of issues yet.

AMDave

01-29-2009, 08:14 AM

I'm so glad you posted a similar experience to me Dave. I have Doomeva's twin 9800GTs running under Gentoo64 and the hopeless Windows 32bit CUDA client under Win64 with a 9800GTX and yesterday the Free-DC site reported I got 18 points and today's stat is ZERO, yep big fat 0.

Something is terribly wrong with this latest beta3 client.

I'm switching these GPUs to something that rewards the electricity I'm donating. :icon_twisted:

If the client sent back completed work to the project servers then that has been stored and will be applied as soon as the stats fix is complete - which they are working on. They rolled-back those work units that were already processed, so I think they will have it right shortly.

I am going to kick my client off again in the faith that they will get the fix right.

Bender10

01-29-2009, 12:08 PM

Ok, so a dumb question here... I have a 8800 GTS, but it is on a w2k machine. Can I participate in cuda?

There are a few CUDA projects you can participate in.

Folding at home...works fine

RC5 (distributed.net)...Beta clients, and some growing pains with this last client.

Seti....Beta also I think.

GPUgrid (Boinc)...I don't think your card will run this. They are Beta also.

AMDave

01-29-2009, 12:46 PM

They are Beta also.
Therein lies the truth.
These are still very early days.

I guess you could put it this way, "If my Graphics card suddenly went dead how bad would that be for me?"

If you are using an older card that you just have around as a spare then you may feel that it is an acceptable risk.

On the other hand, if it is the only graphics card you have on your workstation (main machine) and the mobo does not have an on-board GPU for backup, then you would be up the proverbial creek until you got a replacement.

It is up to you to decide what level of risk is acceptable to you.

Brucifer

01-29-2009, 12:52 PM

The eternal 32/64 bit speed myth... :icon_lol:
The compact version --> there is little or no speed difference in data transfer rate between 32 and 64 bit OSes.

Okay, then in the case of sieving for instance, why do the 64 bit linux sieving clients walk all over the 32-bit?

Brucifer

01-29-2009, 01:10 PM

I'm so glad you posted a similar experience to me Dave. I have Doomeva's twin 9800GTs running under Gentoo64 and the hopeless Windows 32bit CUDA client under Win64 with a 9800GTX and yesterday the Free-DC site reported I got 18 points and today's stat is ZERO, yep big fat 0.

Something is terribly wrong with this latest beta3 client.

I'm switching these GPUs to something that rewards the electricity I'm donating. :icon_twisted:

They will have that issue taken care of in no time. One thing about the distributed.net folks is that they get their problems solved in a big hurry.

The real way to control it, though, is to run your own perproxy if you don't like the 64-credit work units. On the beta download page is the beta Windows perproxy client, version 347. Those will handle the 64-credit units. The current highest version of the Linux perproxy that I see is v343, which handles the non-variable clients. I run a Linux perproxy v343 and wasn't seeing any of the big units. So if you set a system up to go to the net for work, the CUDA clients will end up being assigned to a v347 proxy server, which will push down the 64-credit units.

So then I started a timing run between two GTX260 systems, one running the small units and one running the big units, both on linux-64 core #7, and I'm seeing a 10% gain in completed stats units on the system running the large work units. It is also running warmer. :-) Which means that it's sucking more juice too... but they have optimized the CUDA clients a bit more by going to the large units to extract more work from them. And they want you to run them, so you can bet your bottom dollar that they will get the credit issue worked out in a big hurry, otherwise there would be people who wouldn't run the client. Contributing team members want points, and they want completed work, so I wouldn't be worried about the point issue. :)

liuqyn

01-29-2009, 01:44 PM

I hope they get the ATI cards supported soon, I see them talking about it, but mine still doesn't work yet.

NeoGen

01-29-2009, 10:52 PM

Okay, then in the case of sieving for instance, why do the 64 bit linux sieving clients walk all over the 32-bit?
That is due to the brand new cpu features that exist in 64bit processors that make it really good for mathematical operations, but that 32bit software can't use. :icon_rolleyes:
Here's a couple of shamefully copy-pasted features from the article on 64bit from Wikipedia. http://en.wikipedia.org/wiki/X86-64

64-bit integer capability: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc. can now operate directly on 64-bit integers. Pushes and pops on the stack are always in 8-byte strides, and pointers are 8 bytes wide.

The ability to work with 8 bytes (64bit) at once instead of 4 (32bit) means a single instruction can load or operate on twice as much data. (This is about the CPU and RAM only. No GPUs here.)
The result is that if you want to move two 64bit long numbers from RAM to the CPU, 64bit code can do it with two load instructions (64bits at a time) while 32bit code needs four (32bits at a time).

Additional registers (http://en.wikipedia.org/wiki/Processor_register): In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (i.e. eax,ebx,ecx,edx,ebp,esp,esi,edi) in x86-32 (http://en.wikipedia.org/wiki/X86-32) to 16.
Registers are memory spaces inside the CPU where you store numbers to be worked on. Having more registers means you can store more numbers there to crunch. If you have 2 registers and need to sum three numbers, at some point you have to waste time moving partial results out to RAM because they don't all fit in the registers.
If you had 4 registers for the same sum, you could keep everything in registers the whole time.
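
As a rough illustration of the 64-bit integer point, the Python sketch below adds two 64-bit numbers using only 32-bit-wide pieces, the way 32bit code has to. Python integers are not fixed-width, so the masks simulate 32-bit registers; `MASK32` and `add64_with_32bit_steps` are names invented for this example.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits, simulating a 32-bit register


def add64_with_32bit_steps(a, b):
    """Add two unsigned 64-bit values using 32-bit halves and an explicit carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves; anything above bit 31 is the carry.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Step 2: add the high halves plus the carry, wrapping like real hardware.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


x = 0x00000001FFFFFFFF
y = 1
print(hex(add64_with_32bit_steps(x, y)))  # 0x200000000
```

64bit code does the same addition in one step on one register pair, which is one reason math-heavy work like sieving gains so much from a 64bit build.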

AMDave

01-30-2009, 01:41 AM

Big day for their statsman. They just rolled the stats back a whole week
Data shown reflects all blocks received as of 22-Jan-2009 at 23:59 UTC. Current time is 30-Jan-2009 02:35:42. It appears the fixes are in progress.

/ed -
up to the 25th now

Data shown reflects all blocks received as of 25-Jan-2009 at 23:59 UTC. Current time is 30-Jan-2009 03:05:17.

AMDave

01-30-2009, 05:41 AM

Looks like they are all done

Data shown reflects all blocks received as of 29-Jan-2009 at 23:59 UTC. Current time is 30-Jan-2009 06:39:57.
and the numbers look right to me.
http://stats.distributed.net/team/tmember.php?project_id=8&team=28697

AMDave

01-30-2009, 08:51 AM

Sweet stuff.
It looks as though we are going to introduce some "Smack Fu!" to Team Norway 2 days before this client expires.
that is - if we are all still crunchin'
Are we all in?

vaughan

01-30-2009, 10:04 AM

Yes - running it again now that the stats are sensible again. If it wasn't for the Primegrid year of the Ox challenge I would have switched my CUDA client boxes over to Folding; instead I left the GPUs on idle and put all cores on PG.

Brucifer

01-30-2009, 04:58 PM

Sweet stuff.
It looks as though we are going to introduce some "Smack Fu!" to Team Norway 2 days before this client expires.
that is - if we are all still crunchin'
Are we all in?

Your computations are based on.....................................

AMDave

01-30-2009, 11:52 PM

30 day average
pass should happen in 15-20 days
I added some wooliness because it's not clear how much steinrar is crunching at the moment due to the stats changes.

probably sooner rather than later, though
that's well into the sub-200 ranks too by the way!
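
For anyone who wants to redo this kind of estimate, a minimal Python sketch follows. It assumes both teams keep producing at their 30 day average; the function name `days_to_overtake` and all the numbers in the example are made up, not taken from the stats pages.

```python
def days_to_overtake(our_total, their_total, our_daily, their_daily):
    """Estimate days until our total passes theirs at constant daily output.

    Returns 0.0 if we are already ahead, and None if we are not closing the gap.
    """
    gap = their_total - our_total
    if gap <= 0:
        return 0.0
    closing_rate = our_daily - their_daily
    if closing_rate <= 0:
        return None
    return gap / closing_rate


# Hypothetical figures only: a 900,000 unit gap closing at 50,000 units per day.
print(days_to_overtake(1_000_000, 1_900_000, 60_000, 10_000))  # 18.0
```

The wooliness comes from the daily rates: if the other team's real output is higher or lower than its 30 day average suggests, the answer moves by days.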

AMDave

01-30-2009, 11:55 PM

PS - check this out for AMD-Users
"The odds are 1 in 77 that this team will find the key before anyone else does."
That's incredible!

Brucifer

01-31-2009, 05:45 PM

I'm surprised Team Norway isn't cranking out more. But then they are pushing hard on some others. AMD_Users is slowly climbing up in the millions of completed units. Was a good output yesterday. What work units are others completing? Big or small ones? All mine are small since I'm running a perproxy to feed the crunchers and keep them busy since my net connection sometimes goes nuts.

With ogr-ng coming to an end, maybe there will be an upgraded perproxy put out that handles ogr-27 and the large rc5 units.

Brucifer

02-19-2009, 04:55 AM

We are about ready to slide under 3 days left on the beta3 cuda client. Hopefully we won't end up getting jacked around again waiting for another client to reappear..... :icon_rolleyes:

AMDave

02-19-2009, 07:02 AM

The question has been asked in the development thread.
I'll try to keep an eye on it and watch for a response.

Brucifer

02-19-2009, 03:50 PM

Yup, I saw that, but I also noticed it's been asked for a while, plus the last time around (beta2), and they just kept ignoring it. Rather a little ridiculous too, as they are accepting the beta results into the permanent database, so the work it is doing is good. It should be a released client now. What they are doing is fine tuning, and that stuff normally goes under version upgrades. Their project, I know, but I also see that they are losing some people who started off on this and have moved over to other projects. At this point in the game, I'm wishing that I had just put the money into 2 or 3 Q9400 systems and had more latitude in projects to work on rather than the GPUs. Would also burn a lot less electricity.

Brucifer

02-22-2009, 06:22 PM

Well, here we are once again with an expired linux64 CUDA client and no new one. Time to move on with the GPUs I guess and quit playing their games.