LHC probs



Empty_5oul
10-03-2004, 08:37 AM
Just checking our stats and stuff, and on the LHC home page I came across the following:


Server Status

Up,
Warning: Too many connections in /shift/lxfsrk429/data01/boinc/projects/lhcathome/html/inc/db_ops.inc on line 11

Warning: MySQL Connection Failed: Too many connections in /shift/lxfsrk429/data01/boinc/projects/lhcathome/html/inc/db_ops.inc on line 11
Unable to connect to database - please try again later
Too many connections

2.10.2004 15:00 UTC
LHC@Home database is having bad performance. We should get another server soon and get things running better.

Seems a shame. Too many connections, meaning they have too many users??

vaughan
10-03-2004, 10:12 AM
I couldn't get it to connect, tried several times this arvo and gave up in disgust. Bloody BOINC, fri%&in useless system!

AMDave
10-03-2004, 10:24 AM
Yup.

I saw that too a couple of hours ago when my client was not getting any WUs. Then about 2 hours ago it resolved and the main page data came up. It said there were just over 19,000 tasks available, but I still could not get any of them. Now we both get the same message again.

I expect that because the tasks are so short to process, the task server is getting overloaded with requests for new WUs.
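
As a rough illustration of that point (a back-of-envelope sketch, not anything taken from the LHC@Home servers): if every host asks for new work once per finished WU, the request rate at the task server scales with the number of hosts divided by the WU run time. The function name, the one-request-per-WU behaviour, and the one-host-per-user figure are all assumptions made for the example; the 5000 users and the 6 hour and 45 minute run times are figures quoted elsewhere in this thread.

```python
def work_requests_per_hour(hosts: int, hours_per_wu: float) -> float:
    """Estimate work requests per hour, assuming each host asks for
    exactly one new WU every time it finishes one (an assumption)."""
    if hours_per_wu <= 0:
        raise ValueError("hours_per_wu must be positive")
    return hosts / hours_per_wu


# Illustrative figures only: one host per user is an assumption.
hosts = 5000
typical = work_requests_per_hour(hosts, 6.0)   # roughly 6 hour WUs
short = work_requests_per_hour(hosts, 0.75)    # roughly 45 minute WUs

print(f"6 h WUs:    about {typical:.0f} requests/hour")
print(f"45 min WUs: about {short:.0f} requests/hour")
```

On those assumptions the 45 minute WUs generate eight times the request load of the 6 hour ones from the same number of hosts, which is the kind of pressure a long-WU project like CPDN would not see.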

Interesting thing is that when the user increase happened from 2000 to 5000 users there were over 94,000 tasks available. I think they are going to run out of WUs before they get the new server in place.

Predictor and Seti both went through the same issues, but at least LHC have mitigated the issue by limiting the user base so they can expand in a controlled way. I think they may have underestimated the number of CPUs those new 3000 users would put on the project.

You can be sure that although some of us are getting error messages it is because some others are getting the WUs that are available. It may not seem very fair from a Team point of view, but for the project at least the work is still getting done.

Either way, the demand is now clearly greater than the supply and they will have to increase the power of the splitter to feed the WU demand that is there if they want to keep the user base happy and involved (and you can be VERY sure that they do)

As DMMc said in another thread...CERN don't hang about mulling things over for long. We should see something happen pretty promptly and powerfully - unlike the gradual recovery that Berkeley had to go through with Seti.

At least with CPDN the WUs take a long time to crunch, so the demand peaks are more varied. It will be interesting to see what happens tho when their user base finishes the next couple of WU sets and starts hitting their server for WUs. They have a popular base too.

In the mean time I have thrown a CPU back onto GRID. For some reason the SoB client on that machine is not registering when it sends work back. I'll figure that out later. GRID is one of my old favourites and currently our lowest placing project. Feel free to give it a spin if you have a CPU for it. (Windows GUI only, WUs vary 4hrs to 36 hours)

Post Script:
I must say that I still find it odd that the IT pros in these projects would not have calculated the required production rates and bandwidth demands at the servers well enough to avoid these issues. There is sufficient data available on the existing projects in the world to calculate these things very precisely. Perhaps they did and were only able to apply equipment within restricted budget limits, and then had to demonstrate the issue to management when it occurred, like an "I told you so".

Q - When's the last time you heard "whatever you need to get it done" between Mgmt and IT ?
A - right after it all F***d up. That's when.

(Please s'cuse the "F". This happened to me this week. So I Know)

Ototero
10-03-2004, 10:41 AM
Trouble is.....lots of DC projects are moving to Boinc.

If we decide to ignore Boinc, what's left to concentrate on??

AMDave
10-03-2004, 10:58 AM
Stuart.

I didn't say "Ignore BOINC" !!!!
I would never say that.

The philosophy behind BOINC is great.
Especially from a DC team point of view.

I only put my CPU onto GRID until I can actually GET some LHC WUs.

I hope that my post did not come across like that.

I suppose that in my mind I was thinking of all the previous posts in which we have discussed BOINC's development.

In that way I may have omitted to say what I was thinking, given that I have already said it several times.

BOINC is coming along in steps.
It is a very powerful medium for scientific (and otherwise) projects.
Conversion to BOINC is not yet a well mapped process.
Problems will happen.
DC Teams and members must try to be patient with them.
They will come good.

I did not mean to give that "other" impression at all.

AMDave
10-03-2004, 11:02 AM
Oh. I think you meant vaughan's post.
I was still typing mine when he posted,
I left the room, came back, and then I saw yours.

LOL.

I gotta stop typing at the speed of thought and keep my posts shorter.

Sorry if I gave the wrong impression.

Dave.

AMDave
10-03-2004, 11:34 AM
Here's the URL to the LHC server problem thread, for what it's worth:
http://lhcathome.cern.ch/forum_thread.php?id=623

Be patient, it may take up to a minute to load at the moment.

Ototero
10-03-2004, 12:04 PM
Dave,

I wasn't having a pop. Just that all teams are complaining about the reliability of Boinc, levels are always underestimated.

Screwed in Predictor, Seti points/people missing, LHC no work.....

Ubero, although effectively wasted CPU time, has never had a problem.
SoB is always working.
Grid is reliable.

Grid has more value because it's medical.

There have been a few views on what our priorities should be, top 10 in all DCs being 1 of them.

I'm all for taking 15th in SoB then having a vote.


Rant over.

AMDave
10-03-2004, 12:20 PM
:D
No rant there.
Just facts.

It *would* be nice to see the problems ironed out real fast.

Then I think that there should be an "All projects" cobblestone site.
That way, once the BOINC project implementations are sorted out, people will sign up to more projects, and it won't matter if a single project goes down for a few days, as the users would still be punching out WUs on the other projects and still accumulating their overall cobblestones.

SetiSYNERGY is sooo very close to it already, but they have just stopped short of doing the combined rankings. It seems like the next logical step though.
http://www.setisynergy.com/stats/index.php

Ahh. One day perhaps.

In the meantime, yes, it would be rude not to capitalise on David's contribution to SoB and make the next rank.

Empty_5oul
10-03-2004, 02:22 PM
If they do an overall score they would need to make sure workunits across the different projects score accordingly, as in LHC a unit usually takes about 6 hours but occasionally 45 mins, whereas in Predictor it is always about 2 hours.
I don't know how long SOB will be around for, but they have solved few of the 17 calculations.

** For now we'll stick with SOB, and when we comfortably have 15th place we will vote again and decide where to go. Nobody has mentioned D2OL for a while, I dunno what people feel about that?

AMDave
10-03-2004, 02:50 PM
Soul Man.

Re BOINC:
I have the understanding that the cobblestone calculation is the same across all BOINC projects.
http://boinc.berkeley.edu/credit.php

Re D2OL:
we are still keeping up with those naughty Ninjas.
12 Team Ninja 556,651
13 AMD Users 550,755

Re SoB:
once we get 15th in SoB we are going to have to be mindful about how we keep it. Team BeOS have a consistent increase in power over their Team lifespan, something other teams do not exhibit. Their rate just went up again.
http://www.seventeenorbust.com/stats/teams/team.mhtml?teamID=6

My SoB stats show that my CPUs are absolutely cr*p at running SoB, and the one that was best has stopped registering in the stats even though it's sending up work...very frustrating. I want to support the team in the Team Priority Project, but I'm asking graciously for a stay from SoB for myself to bring up the rear in our lower ranked projects, where my CPUs seem to provide a better return.

How do I feel about it ? ... Perplexed and disappointed.
I'll just have to keep hoping a Quad-Opteron turns up under the Tree at Christmas. Santa, are you listening?

Empty_5oul
10-03-2004, 02:56 PM
I think that's what we all hope for, Dave :P

Re D2OL: I see what you mean. We are close with the stats, but the Ninjas won't settle for being beaten, so that won't be an easy take-back as they will compete with us for the spot.

Regarding SOB, I don't understand about Team BeOS having a consistent increase in power?? Looking at their stats I can't see this. How did their rate go up?

If you can't run SOB don't worry, do something else. It doesn't matter, as long as CPU time isn't wasted.

SB2
10-03-2004, 04:08 PM
Team BeOS has several members that only upload when the workunit is complete, plus 2 or 3 that move from one project to another on a regular basis. I think that may be what you are seeing as the increase in production rates.

Empty_5oul
10-03-2004, 04:24 PM
Oh, OK.

Well, I guess we do the opposite then, sending in a unit as soon as it's processed so we get our score straight away, with less risk too: if their machine crashes before they upload, they lose all of it.
I guess you could call what they do queuing and dumping then.

SB2
10-03-2004, 04:54 PM
From LHC@Home front page;

Server Status

Database overload - please hold connections


I think in a nutshell, all those machines gained from Predictor's extended downtime have overwhelmed LHC's servers.

DMMc
10-03-2004, 05:25 PM
They are down and repairing now.

Empty_5oul
10-03-2004, 06:50 PM
DMMc, it may be worth you leaving a few machines on SOB then if they are still down tomorrow and it is hard to receive WUs.

With your available power I would hate for you to be wasting time when we are trying to take positions. If you don't want to, don't worry, and I am very grateful for the time you spent this weekend on SOB.

DMMc
10-04-2004, 12:46 AM
LHC is back up and running now, and I do have to play a little catch-up.
I am going to leave a few of the systems running SoB as I can but not all 95 that I've got going now.
I will redirect systems where needed to help the group tho, just let me know where.

Beerknurd
10-04-2004, 01:23 AM
As soon as I finish my current test, I'll put my machine back on LHC, or do we need help somewhere else??

My Dell will finish SOB test in less than 2 days. I started it Saturday morning @ 6:42. I only have 244 left out of 1390. I'd say that kicks ass.

SB2
10-04-2004, 02:39 AM
Looks like it is down again, though I did manage to get several hours worth of workunits on 4 machines. :-(




/edit/ It figures, about the time I post the flood gates open back up and the clients' queues are filling up again. :roll:

SB2
10-04-2004, 10:37 AM
Server Status

Up, low on work


Me wonders if it is not time to archive some of the forums we have for completed or dead projects and add individual forums for the BOINC projects?

Anonymous
10-04-2004, 12:48 PM
That would seem sensible m8.
The BOINC forum is getting crowded, though I don't ever go in the others apart from finding stuff I have read but forgotten. We could lose a few of the "dead forums", or ones which currently have no posts or have been inactive for a year or so.
DMMc, I dunno how long it will take for you to finish your processing tasks, but it would be good if you could finish all your current tasks, as that would boost our total tasks completed a lot.

Skuzz
10-04-2004, 12:52 PM
I haven't noticed anybody talking about climate predictor. We have a good size team there that is in 38th position. You don't have to keep downloading WU's (each unit is huge - and sends in "trickle" updates). When I'm not running SOB I've been trying to help out there.

Ototero
10-04-2004, 12:58 PM
4 on SoB
1 on Climate Prediction (boring to run)
1 on Folding@Home
1 on Seti Classic
1 on Grid Org

bwhite
10-04-2004, 02:21 PM
I haven't noticed anybody talking about climate predictor. We have a good size team there that is in 38th position. You don't have to keep downloading WU's (each unit is huge - and sends in "trickle" updates). When I'm not running SOB I've been trying to help out there.

I just started running CPDN too.

I'm just starting to get a good handle on making BOINC work for me on multiple projects. I have it set up on 6 computers to run:

LHC 40%
CPDN 40%
SETI 20%

After you let it run a while it seems to even out the workload OK. If there is a problem with a project, like LHC over the weekend, BOINC just skipped that project's turn and gave the other projects their share of the time. In other words it actually seemed to work like it was supposed to for me...which was amazing to me after dealing with the Predictor fiasco, which left me with a bad opinion of BOINC related projects.
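
The behaviour described above can be sketched roughly as follows. This is only a toy model of resource shares, assuming the client simply renormalises the shares over whichever projects currently have work; it is not the actual BOINC scheduling code, and the function name `effective_shares` is made up for the example.

```python
def effective_shares(shares, unavailable=()):
    """Renormalise resource shares over the projects that have work.

    shares: mapping of project name -> configured share (any positive scale).
    unavailable: names of projects that are down or out of work.
    Returns a mapping of project name -> fraction of CPU time.
    """
    active = {name: s for name, s in shares.items() if name not in unavailable}
    total = sum(active.values())
    if total == 0:
        return {}  # nothing to run at all
    return {name: s / total for name, s in active.items()}


# The 40/40/20 split mentioned in the post above.
shares = {"LHC": 40, "CPDN": 40, "SETI": 20}

normal = effective_shares(shares)
lhc_down = effective_shares(shares, unavailable={"LHC"})

for name, frac in lhc_down.items():
    print(f"{name}: {frac:.0%} of CPU time while LHC has no work")
```

Under this assumption, CPDN and SETI keep their 2:1 ratio while LHC is out, so CPDN ends up with about two thirds of the time and SETI with about one third.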

I'm sure there are still lots of problems yet to come. The BOINC client is after all still in beta. The projects have issues too: Predictor is just plain a mess, LHC is just out of beta and underestimated the amount of work 5000 users could do, SETI is slow on posting results and CPDN has monster work units (but seems to be good at keeping current on the stats).

I think I'll let BOINC run as it is for a while to see how it does.

BTW have left one box on 17 at least for now.

DMMc
10-04-2004, 02:21 PM
Out of work and doing server maintenance, but this is fun....Where all the users are located...

http://lhcathome.cern.ch/map.png

Empty_5oul
10-04-2004, 04:56 PM
That's pretty kool m8.
Are there 2 Portugal users, or is one in Spain, and who are they? You realise em9901pepe has left us for freeDC.

I like it tho. When I looked I presumed it was something to do with climate predictor lol :P

SB2
10-04-2004, 09:51 PM
I haven't noticed anybody talking about climate predictor. We have a good size team there that is in 38th position. You don't have to keep downloading WU's (each unit is huge - and sends in "trickle" updates). When I'm not running SOB I've been trying to help out there.

I had 1 running, unfortunately it is on my only P4 which got commandeered for SOB. It is on its second SOB workunit now but set to not download another. So should be back CP'ing sometime tomorrow. :cool:

Empty_5oul
10-05-2004, 04:40 PM
So your SB production is going to decrease then :( Oh well, as long as you're not wasting those cycles.

ototero - I now have a prob with my memory, so atm only the 3200 is up and running. I had to leave it off overnight while I tried to fix the prob :( so my stats are falling even lower.
When I got the motherboard it said to take only 1 DDR400 stick, and I had bought 2. I left one out for a while then thought I'd try it, so I had 1 GB in total. Yesterday I sold 512 MB of this for £40. When I tried to boot with the correct amount installed it wouldn't boot, just 3 beeps, meaning a memory problem.
In the end I solved it by changing the memory and putting in 512 MB of 333 for the moment. This isn't ideal tho :( and now my other machine has no memory.

Keith75
10-05-2004, 06:47 PM
I am doing about the same as Bruce is.

4 AMDS = Boinc
70%CPDN
20%Seti
10%LHC

2 Macs G3s = Seti Classic

2 PIIIs = D2OL

2 AMDs, 2 Mac G3s = Temporarily Off Network

Keith

SB2
10-05-2004, 09:38 PM
So your SB production is going to decrease then :( Oh well, as long as you're not wasting those cycles.

Forgot to set the P4 to not download another sob unit, so it is still going. Came home to an idle LHC machine so it is back on sob, another will run out of LHC work by this evening so it will be back on sob too. In a wait and see mode on the other 3 LHC machines. :roll:

Empty_5oul
10-05-2004, 10:02 PM
That's good, mate.
My memory failed this morning, so when I checked it before college the lights were off :O and the machine was down. I couldn't fix it then, but this afternoon I got some new memory and fixed it. That was the 3200+'s first rest in 2 months, as it hasn't rested since the holiday. I hope it enjoyed the break, as the next one will be as far away as possible lol :P.

Has anyone ever over-worked and blown a CPU, or made one fail, from DC projects without OCing or anything like that??

Keith75
10-05-2004, 10:25 PM
So far I haven't had a single machine running these have any problems. My cousin though has had lots of trouble with 2 different AMDs running D2OL mostly. He is a novice O/C'er though so I think he is the problem. :lol:

AMDave
10-06-2004, 07:48 AM
Yes. I lost one about 2 years ago.
It was running SetiGRID and had the entire cache of 100 wu's on it.
It was a PII-333.
The other machines were crunching like crazy in Seti Buffer until they ran out.
There were no onboard temp warnings.
Summer gets pretty warm in Queensland.
So does my garage.
Lucky for me there was no smoke, so I was able to grab the disk out, reload SetiGRID on one of the other machines, get it back up and running, and rescue the whole cache.
I have been a lot happier since we got temp warnings in BIOS.
I came home to silence (no fan noise) several times last summer.

Beerknurd
10-06-2004, 11:08 PM
How about air conditioning your garage... :lol: When we get a house that's probably where the wife will stick all my crap....

jlangner
10-09-2004, 01:18 AM
LHC back up.

Beerknurd
10-10-2004, 02:17 PM
It's up.... But out of work....

Main reason.... http://boinc.mundayweb.com/lhc/stats.php/userID:161/trans:off/.png :lol: