
Thread: All-Encompassing Guide to POEM

  1. #1
    Join Date: Feb 2011 | Location: Arkansas, USA | Posts: 16

    All-Encompassing Guide to POEM

    This guide applies to AMD Radeon graphics cards only.

    I'm back with POEM, crunching as much as possible on my Radeon HD 7950 and two Radeon HD 6850s, since there is now an insane number of work units to process.

    First of all, new users to POEM@Home absolutely need a powerful AMD Radeon graphics card running on Windows with the latest driver from AMD's website. Linux has some performance issues, so it isn't recommended for this project unless you are only running CPU tasks; in that case, if you are on a Debian/Ubuntu-based distribution, make sure you have installed the latest BOINC client manually.

    Secondly, you must place an app_info.xml file and the app itself in your ProgramData\BOINC\projects\boinc.fzk.de_poem directory, and then be sure to restart BOINC. A working sample app_info.xml, along with the latest OpenCL app, is available here: http://www.mediafire.com/view/?dkvu0gck6vlhd0z
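A malformed app_info.xml is easy to miss (a reply further down this thread spots an extra closing </app_info> tag in the original sample). Before restarting BOINC, you could check that the file is at least well-formed XML with an <app_info> root element. This is a minimal sketch using Python's standard xml.etree.ElementTree; the script name and command-line usage are my own assumptions, and it only checks XML syntax and the root tag, not whether the entries are what BOINC or POEM expect.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Union


def check_app_info(data: Union[str, bytes]) -> bool:
    """Return True if data is well-formed XML whose root element is <app_info>.

    Only XML syntax and the root tag are checked; the <app>, <file_info> and
    <app_version> contents are NOT validated against what BOINC expects.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        # An extra closing tag after the root element ends is rejected here.
        print(f"not well-formed: {err}", file=sys.stderr)
        return False
    if root.tag != "app_info":
        print(f"unexpected root element <{root.tag}>", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_app_info.py PATH_TO_app_info.xml")
    else:
        # Read raw bytes so any encoding declaration in the file is honored.
        with open(sys.argv[1], "rb") as f:
            ok = check_app_info(f.read())
        print("OK" if ok else "Fix app_info.xml before restarting BOINC")
```

Run it against the file in the boinc.fzk.de_poem project directory; it catches unbalanced tags but cannot tell you whether the app settings themselves are right.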

    Finally, and one of the most important points, there are some basic problems with POEM that can prevent you from getting the best utilization, as I have learned through a lot of testing. The core problem with POEM@Home is that it constantly shuttles data CPU -> GPU -> CPU at a high rate, which creates a bottleneck in system RAM. So, besides the graphics card, the most important component here is the speed of your RAM. As a result, past a certain point, adding more concurrent work units will not increase GPU utilization at all. With my FX-8120 and 16 GB of 1866 MHz RAM, my Radeon HD 7950 reaches at most 60% utilization (most often ~55%).

    Overclocking your memory and northbridge will help a LOT in this project; your processor's clock speed takes a back seat in priority. To improve memory performance, close most of your other programs and use a tool like Process Lasso (http://bitsum.com/prolasso.php) to get the most out of your processor and I/O. Installing BOINC on an SSD can help as well, but only minimally.
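The saturation effect described above can be illustrated with a simple bottleneck (asymptotic-bound) model. This is a toy sketch, not a measurement of POEM: it assumes each work unit alternates between a GPU phase of gpu_ms and a host-side phase of mem_ms that must pass through one shared CPU/RAM path, and all timings are hypothetical. Throughput is capped by the slowest of three limits (the work units themselves, the GPU, the shared memory path), so once the memory path saturates, extra work units stop helping, while a faster memory path (smaller mem_ms) raises the ceiling.

```python
def gpu_utilization(n_wus: int, gpu_ms: float, mem_ms: float) -> float:
    """Upper bound on the fraction of time the GPU is busy (toy model).

    Hypothetical assumptions, for illustration only:
      * each work unit repeats: gpu_ms of GPU work, then mem_ms of host-side
        work that goes through one shared CPU<->RAM path;
      * the GPU and the shared memory path each serve one request at a time.
    """
    if n_wus < 1:
        return 0.0
    per_wu_limit = n_wus / (gpu_ms + mem_ms)  # iterations/ms if nothing ever waits
    gpu_limit = 1.0 / gpu_ms                   # GPU: at most one iteration per gpu_ms
    mem_limit = 1.0 / mem_ms                   # shared memory path: the other ceiling
    throughput = min(per_wu_limit, gpu_limit, mem_limit)
    return throughput * gpu_ms                 # busy fraction = rate x service time


if __name__ == "__main__":
    for mem_ms in (3.0, 2.0):  # "slower RAM" vs "faster RAM", made-up values
        row = [f"{gpu_utilization(n, 2.0, mem_ms):.0%}" for n in (1, 2, 4, 8)]
        print(f"mem_ms={mem_ms}: WUs 1/2/4/8 -> {row}")
```

With these made-up timings, utilization stalls at about 67% from two work units onward when mem_ms is 3.0, but reaches 100% when mem_ms drops to 2.0, which mirrors the advice to prioritize RAM speed over CPU clock.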

    Useful tools:
    Process Lasso: to improve efficiency and utilization, and to monitor processor cores and memory
    MSI Afterburner: to overclock, overvolt, and monitor GPU usage

    On a final note, this HD 7950 is really fast: even at only ~55% utilization, it produces 480,000 BOINC PPD (points per day). It completes a work unit every 9 minutes, and each work unit is worth close to 3,000 points. If it could be fully utilized, it would be in the 800,000 PPD range.
    Last edited by mmstick; 06-23-2012 at 06:04 PM.

  2. #2
    AMDave
    Seeker of the exit clause, Moderator, Site Admin
    Join Date: Jun 2004 | Location: Deep in a while loop | Posts: 9,610
    "Holy heck! That's ridiculous!" Of course, I'm referring to your HD 7xxx output compared with my band of HD 5770s.

    Your insights have reconfirmed that my AM3 board strategy (with an upgrade path) was the right one, and that upgrading my memory will be key going forward.
    Paying for the right motherboard in the first place was a wise move, even at the cost of a lower-performance CPU and RAM to begin with.
    I will be able to upgrade along the path that you suggest and reap the rewards.

    Thank you mmstick for putting together your analysis.
    +1

  3. #3
    Join Date: Sep 2010 | Location: Leiden, the Netherlands | Posts: 4,384
    Doesn't your app_info.xml have one </app_info> too many? (It has two closing </app_info> tags against only one opening <app_info>.)
    On my Llano system, POEM runs best on its own, without an app_info.xml: two instances of POEM run alongside three other tasks.
    With the app_info.xml I can run more POEM tasks but fewer other tasks, and the POEM tasks take far more processing time: 15 hours instead of 5.
    An HD 7950 is thus far quicker than an HD 6670 or an HD 6550D. Neither of the latter supports double precision, while my HD 4770 does, and that old GPU completes a POEM WU in about 45 minutes.
    So we can conclude that a card with good double-precision performance is very well suited for POEM (which rules out almost all of nVidia's products, except for the megabucks Teslas and Quadros).
    Last edited by Dirk Broer; 06-05-2012 at 09:12 PM.


  4. #4
    Join Date: Feb 2011 | Location: Arkansas, USA | Posts: 16
    You are correct; I have fixed it. Without an app_info.xml, the best possible utilization with my HD 7950 is only a measly 10%, and a CPU project may even take over the core POEM@Home depends on: by default POEM is only given 0.9 of a core, so BOINC can schedule a CPU task on that core and POEM realistically ends up with about 0.1 of it. Running multiple work units can be very CPU-intensive, but it improves graphics card utilization.

    I must also note that my Radeon HD 7950 (28 CUs) runs at 1200 MHz core and 1700 MHz memory (~4000 GFLOPS single precision), while my old Radeon HD 6850 (12 CUs) runs at only 860 MHz core and 1250 MHz memory (~1700 GFLOPS SP). Running 5 work units at once on my 7950, each taking 45 minutes to complete, works out to one completed WU every 9 minutes at 55% utilization. Meanwhile, my 6850s running 5 work units take around 77 minutes per WU, so roughly 15.4 minutes per completed task at 90% utilization.
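Both sets of numbers can be checked with simple arithmetic. With several work units running concurrently in steady state, the average time between completions is the per-WU time divided by the number running (45 / 5 = 9 minutes, 77 / 5 = 15.4 minutes); this sketch assumes the 77-minute figure applies per card. Peak single-precision throughput is commonly quoted as stream processors × 2 FLOPs (one multiply-add) × core clock. The stream-processor counts used here (1,792 for the HD 7950, i.e. 28 CUs × 64, and 960 for the HD 6850) come from AMD's published specifications, not from this post; with them the formula gives about 4,300 GFLOPS for the 7950 at 1200 MHz and about 1,650 GFLOPS for the 6850 at 860 MHz.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    stream_processors: int  # from AMD's spec sheets, not from the post
    core_mhz: float
    concurrent_wus: int
    minutes_per_wu: float   # wall-clock time for one WU while running concurrently


def minutes_between_completions(card: Card) -> float:
    """Steady-state average interval between finished WUs on one card."""
    return card.minutes_per_wu / card.concurrent_wus


def peak_sp_gflops(card: Card) -> float:
    """Theoretical SP peak: each stream processor does one multiply-add (2 FLOPs) per clock."""
    return card.stream_processors * 2 * card.core_mhz / 1000


if __name__ == "__main__":
    cards = [
        Card("HD 7950", 1792, 1200, concurrent_wus=5, minutes_per_wu=45),
        Card("HD 6850", 960, 860, concurrent_wus=5, minutes_per_wu=77),
    ]
    for c in cards:
        print(f"{c.name}: one WU every {minutes_between_completions(c):.1f} min, "
              f"~{peak_sp_gflops(c):,.0f} GFLOPS SP peak")
```

Peak FLOPS is only a ceiling; as discussed above, POEM's CPU/RAM round trips keep the 7950 far below it.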

  5. #5
    Join Date: Sep 2010 | Location: Leiden, the Netherlands | Posts: 4,384
    I've found that on my Llano system, without app_info.xml, the two running tasks consume 0.919 CPU each, yet three CPU WUs can still run alongside them on the quad-core A8-3870K. With app_info.xml, three tasks were running, each using 0.5 of a GPU, but each needed a complete CPU core, leaving room for just one CPU WU. The HD 6670 is loaded at a fairly constant 50-70%, while the load on the HD 6550D varies wildly between 0 and 100%. I guess I would need an app_info.xml that takes the two different GPUs into account.
    The Lunatics OpenCL application for SETI runs far more smoothly, taking only 0.05 of a CPU, but with the same load variance. OpenCL Einstein takes 0.5 of a CPU per task, so the two running tasks together also need a complete CPU core.
    Last edited by Dirk Broer; 06-06-2012 at 07:26 AM.


  6. #6
    Join Date: Feb 2011 | Location: Arkansas, USA | Posts: 16
    The problem is that, although it is an OpenCL GPU task, it still does a significant amount of computation on the processor: the GPU calculations rely entirely on completed calculations from the CPU, and the CPU's calculations in turn rely on the GPU's results. This back-and-forth happens too much and too often, causing memory lag. It would be great if the developers updated the application to put more work on the GPU and less on the CPU, which would solve a lot of problems in the process. I would think that by now, with the OpenCL 1.2 standard available and GCN looking very similar to a processor, almost all of the work could and should be offloaded onto the GPU.
