Page 2 of 3
Results 11 to 20 of 24

Thread: Anyone plan on trying this project

  1. #11
    Join Date
    Jul 2003
    Location
    Florida,US
    Posts
    393
    I did some tweaking in the BIOS, and it shows a 10% increase in speed just from disabling video and BIOS caching and lowering RAM timings.

  2. #12

    SSE2

    PRP runs fastest on Pentium 4s, due to SSE2 instructions.

    However, PRP isn't quite as SSE2-optimized as Prime95. I think it takes slightly under 8 hours to test these exponents on a P4 2.8 GHz processor with an 800 MHz bus, and your Athlons, at around 11 hours and above, come close. If this were Prime95 there would be a huge gap rather than a close one. I am going to post on mersenneforum.org, probably tomorrow, to see if someone can improve the SSE2 code for PRP.

    I love SSE2. It basically changed the situation from a slower-clocked Athlon COMPLETELY CRUSHING a faster-clocked P4 in x87 FPU code to a slower-clocked Pentium 4 using SSE2 COMPLETELY CRUSHING a faster-clocked Athlon using x87 in Prime95. For prime numbers, SSE2 is the greatest thing since sliced bread.

    I am extremely disappointed in AMD for having poor SSE2 in their new Athlon FX/Opteron/64. I was looking forward to them putting SSE2 in these processors and matching Pentium 4s in SSE2 code at equivalent clock speeds, but the end result is that performance is hardly any different than if they hadn't included it at all. At the same clock speed, a P4 crushes an Athlon FX. I was impressed with the original Athlon when it came out; it was such a well-designed processor, and now AMD disappoints us. Bionic, some others, and I were talking about SSE2 over in the hardware section of mersenneforum.org. I am assuming that what I read on the web is right, that these processors have problems with vectorized SSE2 code. Bionic, can AMD do something about this? Should they separate the SSE2 part of the chip from the x87 FPU so the two don't share it? Is AMD going to do something, or will all future releases disappoint in 32-bit SSE2 code?

    regards,
    william

  3. #13
    Join Date
    Jul 2003
    Location
    St. Joseph, MO
    Posts
    535
    My XP has SSE2 and the FX55 will have SSE3. I am not saying the SSE2 in the XPs is necessarily as good as the P4s, I just don't know, but it does have it. ;)

    Keith

  4. #14
    Join Date
    Jul 2003
    Location
    Florida,US
    Posts
    393
    I don't share the same opinion as some of the others in that thread when it comes to SSE2 and Athlon64s. Everyone I talked to who has one of these CPUs says a client with SSE2 enabled actually ran slower than without SSE2, not just the same speed. It seems kind of odd to me that Intel and AMD have an agreement to share CPU technology when it comes to instructions like 3DNow!, SSE, SSE2, etc., that the only bug is in Intel's SSE2, and that all of a sudden Intel has decided to make its own x86-64 CPU. It's no secret that Intel has had most of the CPU market up until now and AMD just had its first year out of the red. Could Intel have put some bug in the code to hinder AMD?

  5. #15
    Join Date
    Jul 2003
    Location
    Florida,US
    Posts
    393
    Quote Originally Posted by Keith75
    My XP has SSE2 and the FX55 will have SSE3. I am not saying the SSE2 in the XPs is necessarily as good as the P4s, I just don't know, but it does have it. ;)

    Keith
    No, Keith, XPs have SSE, not SSE2. I can't remember what SSE3 does offhand, but I don't believe it has anything to do with the FPU. I think it has to do with improving Hyper-Threading and caching.

  6. #16
    First, a faster-clocked AMD Athlon should obviously perform faster than a slower-clocked one. Earlier in the thread, someone said their 1800+ outperformed a 3200+? That can't be right. Front-side bus and memory bus speeds can make some difference, and I don't know whether both processors have the same bus speeds, but I don't believe an 1800+ beats a 3200+ unless some overclocking is done. Make sure no other programs or processes are running in the background.

    In response to the SSE2 question, here are benchmarks George sent me (lower ms is better). The first one is the Opteron with SSE2 disabled, and the second one is with its native SSE2 enabled.
    ------------------------------
    Compare your results to other computers at http://www.mersenne.org/bench.htm
    That web page also contains instructions on how your results can be included.

    AMD Opteron(tm) Processor 140
    CPU speed: 1395.99 MHz
    CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
    L1 cache size: 64 KB
    L2 cache size: 1024 KB
    L1 cache line size: 64 bytes
    L2 cache line size: 64 bytes
    L1 TLBS: 32
    L2 TLBS: 512
    Prime95 version 23.7, RdtscTiming=1
    Best time for 384K FFT length: 39.760 ms.
    Best time for 448K FFT length: 41.586 ms.
    Best time for 512K FFT length: 45.334 ms.
    Best time for 640K FFT length: 61.293 ms.
    Best time for 768K FFT length: 72.643 ms.
    Best time for 896K FFT length: 88.261 ms.
    Best time for 1024K FFT length: 97.124 ms.
    Best time for 1280K FFT length: 125.646 ms.
    Best time for 1536K FFT length: 152.502 ms.
    Best time for 1792K FFT length: 182.080 ms.
    Best time for 2048K FFT length: 204.119 ms.
    [Tue Apr 13 09:53:55 2004]
    Compare your results to other computers at http://www.mersenne.org/bench.htm
    That web page also contains instructions on how your results can be included.

    AMD Opteron(tm) Processor 140
    CPU speed: 1395.99 MHz
    CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
    L1 cache size: 64 KB
    L2 cache size: 1024 KB
    L1 cache line size: 64 bytes
    L2 cache line size: 64 bytes
    L1 TLBS: 32
    L2 TLBS: 512
    Prime95 version 23.7, RdtscTiming=1
    Best time for 384K FFT length: 32.687 ms.
    Best time for 448K FFT length: 39.195 ms.
    Best time for 512K FFT length: 44.306 ms.
    Best time for 640K FFT length: 55.108 ms.
    Best time for 768K FFT length: 66.948 ms.
    Best time for 896K FFT length: 80.943 ms.
    Best time for 1024K FFT length: 91.412 ms.
    Best time for 1280K FFT length: 122.026 ms.
    Best time for 1536K FFT length: 149.502 ms.
    Best time for 1792K FFT length: 179.271 ms.
    Best time for 2048K FFT length: 201.410 ms.
    -------------------------------

    As you can see, for Prime95, SSE2 provides a very, very small performance gain compared to the old x87 mode. It's almost worthless. Looking at the benchmark page at the link given, you see, for instance, that at the 1024K FFT length an old Pentium 4 1.6 GHz has an iteration time of 64 ms, completely crushing this Opteron 1.4 GHz. Even if you slowed the Pentium 4 down to 1.0 GHz, or used a Celeron with SSE2 and 128 KB cache, the Opteron couldn't compete.
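To put numbers on that, here is a rough sketch in Python that computes the per-FFT-length gain from George's two Opteron runs and a clock-normalized comparison at 1024K against the P4 1.6 GHz figure quoted above. The variable and function names are illustrative, not from Prime95. It assumes the P4 runs at a nominal 1600 MHz, and "cycles per iteration" ignores bus, cache, and memory differences. By this measure the SSE2 gain is about 18% at 384K but drops below 2% from 1536K upward, and at 1024K the Opteron spends roughly a quarter more cycles per iteration than the P4.

```python
# Prime95 v23.7 "Best time" figures (ms per iteration) from the two
# Opteron 140 runs above, keyed by FFT length in K.
X87_MS = {384: 39.760, 448: 41.586, 512: 45.334, 640: 61.293, 768: 72.643,
          896: 88.261, 1024: 97.124, 1280: 125.646, 1536: 152.502,
          1792: 182.080, 2048: 204.119}
SSE2_MS = {384: 32.687, 448: 39.195, 512: 44.306, 640: 55.108, 768: 66.948,
           896: 80.943, 1024: 91.412, 1280: 122.026, 1536: 149.502,
           1792: 179.271, 2048: 201.410}
OPTERON_MHZ = 1395.99
P4_MHZ = 1600.0      # assumption: nominal clock of the "P4 1.6 GHz"
P4_1024K_MS = 64.0   # 1024K iteration time quoted from the bench page


def percent_gain(old_ms, new_ms):
    """Percent reduction in iteration time going from old_ms to new_ms."""
    return 100.0 * (old_ms - new_ms) / old_ms


def cycles_per_iteration(mhz, ms):
    """Clock cycles per iteration: (mhz * 1e6 cycles/s) * (ms / 1e3 s)."""
    return mhz * 1e6 * (ms / 1e3)


for fft in sorted(X87_MS):
    gain = percent_gain(X87_MS[fft], SSE2_MS[fft])
    print(f"{fft:>5}K  x87 {X87_MS[fft]:8.3f} ms  "
          f"SSE2 {SSE2_MS[fft]:8.3f} ms  gain {gain:5.2f}%")

# Clock-normalized comparison at 1024K (ignores memory/cache differences).
opteron_cycles = cycles_per_iteration(OPTERON_MHZ, SSE2_MS[1024])
p4_cycles = cycles_per_iteration(P4_MHZ, P4_1024K_MS)
print(f"1024K cycles/iteration: Opteron SSE2 {opteron_cycles:,.0f}, "
      f"P4 {p4_cycles:,.0f} (ratio {opteron_cycles / p4_cycles:.2f})")
```

Normalizing by clock is only a first approximation; it doesn't account for the Opteron's larger L2 cache or either chip's memory bandwidth.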

    Here is the thread where Bionic and I discuss SSE2:

    http://www.mersenneforum.org/showthread.php?t=2362

    I wonder why this is the case, Bionic. I wonder if it is really a bug, or if they just put SSE2 in there so SSE2 programs would run at all. I have a bad feeling it won't get fixed in a future release.

    By the way, as you know, AMD has its strengths and Intel has its strengths. For instance, AMD has hardware rotate:
    http://n0cgi.distributed.net/faq/cache/55.html
    so I think it performs better than the P4 there, since the P4 has a slow rotate, except for the new Prescott version.
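The rotate at issue is most likely the 32-bit rotate used by the RC5 cipher behind distributed.net's RC5 projects (an assumption; the linked FAQ is the authority). RC5 rotates on every half-round by an amount that depends on the data. The Python sketch below, with hypothetical helper names `rotl32` and `rc5_half_round`, only shows what the operation computes. x86 has a dedicated ROL instruction for it; without a fast rotate you end up paying for the equivalent of two shifts and an OR, as written out in `rotl32`. It says nothing about how fast any particular CPU executes it.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits (only the low 5 bits of r count)."""
    r &= 31
    x &= MASK32
    # Two shifts and an OR: bits pushed out on the left wrap around to the right.
    return ((x << r) | (x >> (32 - r))) & MASK32


def rc5_half_round(a: int, b: int, s: int) -> int:
    """One RC5-32 half-round update, A = ((A xor B) <<< B) + S, mod 2**32.

    Illustrative only: the key schedule and the other half-round are omitted.
    """
    return (rotl32(a ^ b, b) + s) & MASK32


print(hex(rotl32(0x12345678, 8)))  # prints 0x34567812
```

Because the rotation count comes from the data, it can't be folded into a constant at compile time, which is why per-rotate speed matters so much for this workload.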

    AMD has a superb x87 FPU; a Pentium 4 can't compete there. Intel traded that off to make an excellent SSE2 FPU.

    Etc., etc. Each CPU has its strengths.

    Oh, by the way, SSE3 (aka Prescott New Instructions, PNI) was introduced with the Pentium 4 Prescott; most P4s don't have it. I hear rumors AMD might get it in the future.

    Also, Keith, your Athlon XP has SSE, not SSE2. If I remember correctly, older Athlons didn't have SSE. It doesn't matter; Prime95 doesn't use SSE.

    regards,
    william

  7. #17
    Join Date
    Jul 2003
    Location
    Florida,US
    Posts
    393
    Yeah, the Athlon Thunderbirds have only Extended 3DNow!, as do the Duron Spitfires. The Duron Morgan, which is the Athlon XP Palomino's counterpart, had SSE, but it was disabled; I'm not sure if SSE is disabled in the Duron Applebred (the Barton counterpart). If you go further back, the K6 series had 3DNow!+.

    I got to thinking about it, and SSE2 is just a 64-bit extension for a 32-bit CPU, so really what's going on is that you're running a 64-bit CPU in 32-bit mode to use 64-bit extensions.

    AMD Athlon(tm)
    CPU speed: 1998.84 MHz
    CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
    L1 cache size: 64 KB
    L2 cache size: 256 KB
    L1 cache line size: 64 bytes
    L2 cache line size: 64 bytes
    L1 TLBS: 32
    L2 TLBS: 256
    Prime95 version 23.5, RdtscTiming=1
    Best time for 384K FFT length: 37.277 ms.
    Best time for 448K FFT length: 44.150 ms.
    Best time for 512K FFT length: 47.700 ms.
    Best time for 640K FFT length: 60.681 ms.
    Best time for 768K FFT length: 73.697 ms.
    Best time for 896K FFT length: 87.510 ms.
    Best time for 1024K FFT length: 99.817 ms.
    Best time for 1280K FFT length: 134.836 ms.
    Best time for 1536K FFT length: 158.563 ms.
    Best time for 1792K FFT length: 202.566 ms.
    Best time for 2048K FFT length: 230.655 ms.

    Never said it was stock ;)

  8. #18
    Join Date
    Apr 2004
    Location
    Texas Gulf Coast
    Posts
    104
    On a completely different note, is PRP faster than LLR? Can LLR be used to test the candidates in this project?

  9. #19

    LLR

    My project uses k*2^n+1. LLR actually includes PRP. LLR does deterministic primality tests for k*2^n-1, while for k*2^n+1 it defaults to normal PRP mode, which is a probable-prime test: a pass is strong evidence of primality but not a proof. Iteration times are roughly the same. However, LLR is buggy while Jean updates it to be fast for certain -1 cases, so please continue using PRP. It shouldn't make a difference, but I wouldn't want to jeopardize things. Thanks.
    regards,
    william
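For anyone wondering what "PRP mode" checks: a Fermat probable-prime test asks whether base^(N-1) ≡ 1 (mod N). The Python sketch below applies it to N = k*2^n+1 using the built-in three-argument `pow`. Base 3 is an assumption here, not something confirmed for Jean's programs, and the function name `is_fermat_prp` is hypothetical. The real clients do the squarings with FFT-based multiplication (the FFT lengths in the benchmarks above), so this version is only practical for small test numbers.

```python
def is_fermat_prp(k: int, n: int, base: int = 3) -> bool:
    """Fermat probable-prime test of N = k*2^n + 1: is base^(N-1) == 1 (mod N)?

    base is assumed to be a small prime.
    """
    if k < 1 or n < 0:
        raise ValueError("expected k >= 1 and n >= 0")
    N = k * (1 << n) + 1
    if N % base == 0:
        # N shares the (prime) base as a factor: prime only if N is the base itself.
        return N == base
    # Three-argument pow does modular exponentiation without building base^(N-1).
    return pow(base, N - 1, N) == 1


print(is_fermat_prp(5, 3))   # N = 41, prime
print(is_fermat_prp(3, 3))   # N = 25, composite
print(is_fermat_prp(45, 1))  # N = 91 = 7 * 13, yet it passes base 3
```

The last case is why it's only a probable-prime test: 91 = 45*2^1+1 = 7*13 is a base-3 Fermat pseudoprime, so a composite number can occasionally pass.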

  10. #20
    Join Date
    Jul 2003
    Location
    Florida,US
    Posts
    393
    wfgarnett3, looking at the benchmarks you posted, it looks like some of the other DC projects aren't as well written for SSE2. I'm mainly talking about Seventeen or Bust; that's the project where Athlon64s were slower with SSE2. Not too long ago Folding@home released another core for SSE2, I think it's fahcore_79. Folding@home is different from most projects, as each WU determines which core you get: some use MMX, 3DNow! or SSE, and SSE2. It's kind of pot luck, but from asking around it doesn't appear that the Athlon64 does very well with the new core.
