Page 3 of 3
Results 21 to 29 of 29

Thread: New BOINC Project: SciLINC

  1. #21
    Join Date
    Apr 2007
    Posts
    92
    Is it sending/receiving a WU every second or what?

    Maybe 500-1000 WUs all at once would do the trick?

  2. #22
    Join Date
    Oct 2005
    Location
    Mid Michigan
    Posts
    590
    Quote: Originally Posted by Nflight
    I was away for several hours; upon returning I have 1300 Work Units and the bandwidth is choking my system. So you were right, until they fix this it is not worth running it!
    My point exactly. I go through the same thing with the BOINC Alpha Project: if I try to run more than 1 PC on the project, my network gets choked up trying to upload the 4 MB files that only take 1 minute to run. So I only run 1 PC at a time, and it works okay then because the uploads can keep up with the downloads.

    But this SciLINC project is ridiculous. With WUs that take less than 1 second, there's no way the uploads can keep up with the downloads ...
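
The arithmetic behind that can be sketched roughly as below. This is only an illustration: the 4 MB per result and 1 minute per WU come from the post above, while the function names and the 1 Mbit/s and 8 Mbit/s uplink figures are assumptions made up for the example, and 1 MB is taken as 10^6 bytes.

```python
def required_upload_mbps(file_mb, seconds_per_wu, hosts=1):
    """Upload rate (Mbit/s) needed so uploads keep pace with crunching."""
    megabits_per_result = file_mb * 8
    return hosts * megabits_per_result / seconds_per_wu


def max_hosts(link_mbps, file_mb, seconds_per_wu):
    """How many hosts an uplink of link_mbps can serve without a backlog."""
    return int(link_mbps * seconds_per_wu / (file_mb * 8))


# One PC returning a 4 MB result every 60 seconds needs about 0.53 Mbit/s up.
per_host = required_upload_mbps(4, 60)

# Hypothetical uplinks: 1 Mbit/s and 8 Mbit/s.
hosts_on_slow_link = max_hosts(1.0, 4, 60)
hosts_on_fast_link = max_hosts(8.0, 4, 60)
```

The shorter each WU runs, the higher the required rate climbs for the same result size, which is why sub-second WUs swamp an uplink so quickly.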

  3. #23
    Join Date
    Apr 2007
    Posts
    92
    Hmm..

    Maybe someone with a faster upload should do those projects (8/8)?

    (if it's necessary at all...)

  4. #24
    Join Date
    Apr 2007
    Posts
    92
    There still seem to be problems:

    SciLINC
    Unable to connect to database - please try again later
    Error: 2002 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

  5. #25
    Join Date
    Oct 2005
    Location
    Birmingham, UK
    Posts
    534
    They managed to corrupt their database, probably because of thousands of WUs coming in from all angles every second.


  6. #26
    NeoGen
    AMD Users Alchemist Moderator
    Site Admin
    Join Date
    Oct 2003
    Location
    North Little Rock, AR (USA)
    Posts
    8,451
    Database corruption doesn't happen because of the number of concurrent connections... it comes down to how the database server locks the operation requests.
    If the database is not properly configured, it can perform update and delete operations on data that is being used elsewhere by some other concurrent connection. That causes corruption because the other connection is trying to work with data that doesn't exist anymore or was altered during that time.

    Usually it's not a big deal in test environments: with only half a dozen people accessing the database, it rarely or never happens that two people access the same data at the same time. But if you switch to a production system with hundreds or thousands of clients all accessing and altering the data, two or more of them can try to alter the same data at the same time...
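
A minimal sketch of the kind of conflict described above, using a plain Python object in place of a database row. Everything here (`Row`, `unsafe_interleaving`, `locked_run`) is a made-up name for illustration, not anything from SciLINC or MySQL; the unsafe case is written out step by step so the interleaving is deterministic rather than depending on thread timing.

```python
import threading


class Row:
    """Stand-in for a single database row holding a counter."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()


def unsafe_interleaving(row):
    """Two clients read the same value before either one writes back."""
    a = row.value       # client A reads 0
    b = row.value       # client B reads 0
    row.value = a + 1   # client A writes 1
    row.value = b + 1   # client B also writes 1, so A's update is lost
    return row.value


def locked_increment(row):
    """Read-modify-write done while holding the row's lock."""
    with row.lock:
        row.value = row.value + 1


def locked_run(row, clients=2):
    """Run several clients concurrently; the lock serializes their updates."""
    threads = [
        threading.Thread(target=locked_increment, args=(row,))
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row.value


lost = unsafe_interleaving(Row())   # 1, although two clients incremented
kept = locked_run(Row())            # 2, both increments survive
```

With only a couple of clients the bad interleaving is rare, which matches the point about test environments; with thousands of clients it becomes much more likely unless the updates are serialized.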

  7. #27

    Official response

    Quote: Originally Posted by NeoGen
    Database corruption doesn't happen because of the number of concurrent connections... it comes down to how the database server locks the operation requests.
    If the database is not properly configured, it can perform update and delete operations on data that is being used elsewhere by some other concurrent connection. That causes corruption because the other connection is trying to work with data that doesn't exist anymore or was altered during that time.
    Very true NeoGen, but alas that was not our issue either.

    The database was properly configured and we had tested with dozens of clients hitting the server while it ran 20+ instances of httpd, 9 different project daemons and an equal number of periodic tasks, some lasting 8 hours. All of this generated hundreds of queries per second with peaks of a couple thousand.

    We were bitten by electrical issues. Several machines in the building where the server is housed completely fell over around the time this occurred.

    (Un?)fortunately the SciLINC server stayed up, but system file buffers were corrupted and garbage was written out to the drive. This spanned the Apache log files, the MySQL log files, the Linux kernel log files and unfortunately the tablespace for at least one of our tables, the SciLINC result table.

    Ouch!

    We have since managed to recover everything else and put the project back up, but we are not feeding new work units nor are we registering new accounts at this point in time.

    I would like to personally apologize to those that were affected by the high CPU load that resulted from transferring 2,500 small files all at the same time. And, I would also like to thank everyone that has shown an interest in SciLINC and the research that is being done.

    Thank you,

    Ron Parker
    SciLINC Developer

  8. #28
    NeoGen
    AMD Users Alchemist Moderator
    Site Admin
    Join Date
    Oct 2003
    Location
    North Little Rock, AR (USA)
    Posts
    8,451
    Ouch... I guess not even the best and most well-tuned database system in the world would be prepared for that.

    Thankfully I got my account already, so I'll be there when the time comes for another round of workunits.

    In the meantime, thanks for the update, and keep up the good work! Your project has an interesting purpose, outside the maths and bio sciences that are so common nowadays, and I'm looking forward to seeing it develop.

  9. #29
    Join Date
    Apr 2007
    Posts
    92
    When do you think the project will be working again?

    And do you intend to have an alpha or beta version first?
    Last edited by Bubben; 06-20-2007 at 11:59 PM.
