current chess960 work units



AMDave
07-10-2007, 09:36 AM
Vaughan said in another thread
Quote:
Originally Posted by AMDave (http://amdusers.com/forum/showthread.php?p=50241#post50241)
(btw there's some rare chess wu's available if anyone is interested ;) shhhh.)

I got a weird one that got to 6% completed in >6 hours on a computer where Chess tasks usually only take 10 minutes. When I discovered it I aborted the task and all the others as they were past the report date/time.

I have been noticing some sluggishness on my prize workstation since I resumed this project. On closer examination I have found a performance issue.

I run BOINC purely as a service these days on any machine and without the BOINC Manager running. Until recently, I would open the BOINC Manager on my main workstation once in a while to check on each of the machines (see BOINC RPC article).

For the last few weeks I have been using BOINCview 1.5 beta 8 instead of the BOINC Manager as it pulls together the info from all your machines at the same time (thanks to Jason1478963 for the nudge - it is a terrific helper tool) and it has worked well ... until I added chess960 the other day.

Because chess960 was in Alpha last year (and it still is) I reported the problems I had with their WUs to them at the time and moved on. Just five days short of a year later, I return to find that nothing has changed. Well, all of my machines have been upgraded, but the work units still exhibit exactly the same problems I reported back then.

The WUs early in the run (first 24 hours) seem to be ok but after that they seem to cause the OSs (both Windows and Linux) to seize up while they run. This seems to occur at the point when the wrapper drops into the background and the "Glaurung" chess engine kicks in.

Further testing shows that the problem ONLY occurs when BOINC Manager or BOINCview is open and polling the client. At all other times the client just plugs away and gets the job done.

Just to be sure though I have set my usage back to 99% of the CPU to make sure I will eventually get control back if something locks up completely.

Happy gambits :)

AMDave
07-10-2007, 10:34 AM
more testing ... more notes

changing the CPU usage appears to work ... for most work units. Most run at the allocated percentage. However, some of them still grab 100% of the CPU and won't let go. At that point the BOINC Manager / BOINCview lock up. BOINC Manager just stops responding altogether whereas in BOINCview I can still change tabs and open the menu etc., but all communication with the clients both local and remote has stopped. Restarting the programs works initially and shows a refreshed view at which point I can see that the boinc client has kept going and moved on, but then within a few seconds to a couple of minutes the same thing happens again.

During these periods, when the BOINC Manager or BOINCview is impacted, the Windows core applications stop responding (explorer, tool bar, task manager etc.).

I have tried this with BOINC 5.9.x and 5.10.x on both Linux (FC5, FC5smp, CentOS4, Lindows) and Windows (XP-SP2, Win2k-SP4) and across CPU (AMD and Intel) platforms. On Linux the OS doesn't lock up but it does noticeably slow down the GUI. I cannot seem to find an external trigger or solution. I have touched base with 17 BOINC projects recently and I do not experience this problem with any other BOINC project. The evidence is really pointing towards the glaurung binary.

I suspect that somewhere in the "Glaurung" chess engine there is a conditional loop that is a little bit too "tight". In other words, the condition to enter the loop is only met sometimes and when the loop is entered it does not contain enough interrupt opportunities for the OS to allow something else to get a bit of CPU time, including the OS itself.
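To illustrate what I mean, here is a minimal Python sketch of a hot loop that hands its time slice back every so often. It is not Glaurung's code and I have not seen the real cause; the function name, the `yield_every` parameter and the dummy arithmetic are all made up for the example.

```python
import time


def search_with_yield(n_iterations, yield_every=10_000):
    """Hypothetical stand-in for an engine's hot loop (not Glaurung code).

    Every `yield_every` iterations the loop calls time.sleep(0), which asks
    the OS scheduler to let another runnable thread or process have a turn.
    """
    total = 0
    for i in range(n_iterations):
        total += i  # placeholder for the real search work
        if i % yield_every == 0:
            time.sleep(0)  # give up the rest of this time slice
    return total
```

The same loop without the `time.sleep(0)` call would be the "tight" version: it never volunteers the CPU and relies entirely on the OS to pre-empt it.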

Of course the chess960 project is still in Alpha so there are still some bugs they need to kick out, but since this behaviour was reported a year ago I must say I'm a little disappointed that it has not been rectified.

In the meantime, I'll go back to keeping the BOINC Manager closed and only briefly using the BOINCview tool because it all seems to work fine when I ignore it :D

I was going to post on their forum, but I see from recent posts there that other people are also still experiencing these same old issues. The other old issue is still around too: work does not get handed out to machines which have capacity and are below their limit. The server says the work was committed to other platforms, and yet there are thousands of work units available for download.

I wish there was more evidence of some study/development attention being paid to the project by its project team, but at least we can see someone is there triggering games once in a while. The admin seems to be scarce although Rytis assures us that the chess960 admin is alive and well, just "busy".

I don't expect these problems to be addressed any time soon.
Bit of a shame that really.

vaughan
07-10-2007, 11:48 AM
AMDave, I'm glad you posted your observations because I thought it was just my Windows machines and my inexperience with Ubuntu Linux. I also find that if I exit BOINC the Chess engine keeps running. Only killing the application via Task Manager gets rid of it.

I tried BOINCview myself but couldn't get it to work across my network so now I rely on visiting each PC and checking the BOINC Manager status. I am running LogMeIn Ignition and it seems pretty good. I doubt it is worth 40 USD per year though when the trial period ends. It appears that LogMeIn know how to charge more than the mighty Micro$oft. ;)

AMDave
07-10-2007, 12:46 PM
Yes the chess engines used by chess960 will continue running until they complete, even after you have stopped BOINC. This is because the wrapper starts them as a separate executable and there is no communication back to the initiating program, whether it be the BOINC client or the wrapper. The engines also do not support "suspension". Once started they will run until complete. The wrapper checks on the program to see if it is finished or not and hands back the result files if it is complete.
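For comparison, a wrapper that did keep hold of its engine might look roughly like the Python sketch below. This is not the chess960 wrapper, just an assumption about how the child's lifetime could be tied to the wrapper's. `managed_engine` and `STAND_IN_ENGINE` are names I made up, and the "engine" is simply a Python process that sleeps.

```python
import subprocess
import sys
from contextlib import contextmanager

# Hypothetical stand-in for the engine binary: a child that sleeps for 60 s.
STAND_IN_ENGINE = [sys.executable, "-c", "import time; time.sleep(60)"]


@contextmanager
def managed_engine(cmd, grace_seconds=5.0):
    """Start cmd as a child process and make sure it is gone on exit."""
    child = subprocess.Popen(cmd)
    try:
        yield child
    finally:
        if child.poll() is None:  # child is still running
            child.terminate()  # polite request to stop
            try:
                child.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                child.kill()  # forceful stop if it ignored the request
                child.wait()
```

Used as `with managed_engine(STAND_IN_ENGINE) as child: ...`, the child is stopped when the block ends, including when an exception is raised inside it. It does not help if the wrapper itself is killed outright, because the cleanup code then never runs.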

Some of the projects that use a BOINC wrapper have added in some more intelligence to check if the executable has hung by setting a drop-dead time limit after which it sends the kill command to the OS. From your initial text above it looks like either that limit is set much too high in this wrapper or it may not be there.
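A drop-dead limit of that kind can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any project's wrapper; `run_with_deadline`, `limit_seconds` and `poll_interval` are hypothetical names.

```python
import subprocess
import sys
import time


def run_with_deadline(cmd, limit_seconds, poll_interval=0.1):
    """Run cmd and kill it if it has not finished within limit_seconds.

    Returns (exit_code, timed_out). Hypothetical helper for illustration.
    """
    child = subprocess.Popen(cmd)
    deadline = time.monotonic() + limit_seconds
    while child.poll() is None:  # still running
        if time.monotonic() >= deadline:
            child.kill()  # the "kill command" once the limit has passed
            child.wait()  # reap the process so it does not linger
            return child.returncode, True
        time.sleep(poll_interval)  # check again shortly
    return child.returncode, False


if __name__ == "__main__":
    # Stand-in for a hung engine: a child that sleeps far past the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(hung, limit_seconds=1.0))
```

With a limit like this, a task that normally takes 10 minutes would have been killed long before it sat at 6% for more than 6 hours, provided the limit was set to something sensible.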


LogMeIn is certainly a convenient product and they do take out a lot of the hard work required to connect to a LAN behind a DHCP connection. I tried the free version and although I had a stability issue with it I was convinced at the time that it was of my own making as I had really screwed around with the rig I tried it on and it had other problems too. Although I have not used it since then, I have observed several of you extolling its virtues.

It is possible to achieve the same thing with a combination of free remote-GUI products and a free dynamic DNS service. Of course changing the TCP/IP and UDP ports helps, but it is better to secure the port than it is to try and hide it when a port scanner will eventually pick it up anyway. With a bit of messing around with your router and a PC you can even set up your own VPN tunnel to keep out the script-kiddies. It is more arduous but learning something for the first time often is :D

LogMeIn's selling point is that it combines these networking aspects and automates them as much as possible for you - Oh and the "LogMeIn Scout" is a little nifty.

AMDave
07-10-2007, 01:13 PM
They just ran out of work again.

Ready to send 0