
Thread: What CPU to use for Simap, and why?

  1. #1
    Join Date
    Sep 2010
    Location
    Leiden, the Netherlands
    Posts
    4,372

    What CPU to use for Simap, and why?

    The Project
    What is SIMAP?
    SIMAP is a database of protein similarities and protein domains. It contains nearly all currently published protein sequences and is continuously updated. Protein similarities are computed using the FASTA algorithm, which provides optimal speed and sensitivity. Protein domains are calculated using the InterPro methods and databases. SIMAP is, to our knowledge, the only project that combines comprehensive coverage of all known proteins with incremental update capabilities.

    What is SIMAP used for?
    Because of the huge number of known protein sequences in public databases, it became clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor (so-called orthologs) often share the same functions. So it is possible to infer the function of an uncharacterized protein from an ortholog with known function. A well-known example is the research on mouse genes and proteins, whose results in many cases also hold true for the orthologous human genes and proteins. Protein similarities provide information about relations between proteins and are necessary for the prediction of orthologs.
    Protein domains (often called function domains) are the structural building blocks of proteins. They are responsible for the activities of a given protein, e.g. binding of small molecules, catalytic reactions or binding other proteins in large complexes. The knowledge about protein domains is stored in huge repositories like the InterPro databases. The prediction of domains in newly sequenced proteins is based on those databases and provides a fully automatic functional annotation of these proteins. Therefore we calculate protein domains for all proteins in SIMAP, thus providing the largest system for protein function prediction worldwide.
    There are many more bioinformatics methods that rely on protein similarity and domains. Our protein similarity database provides pre-computed similarity and domain data and represents the known protein space. This opens completely new perspectives compared to the common practice of repeatedly recalculating this kind of data. SIMAP is regularly updated: the similarity matrix is simply extended incrementally when new sequences appear. The use of SIMAP is completely free for education and public research.

    Why do they need distributed computing for SIMAP?
    The computational cost of calculating the similarity data depends on the square of the number of sequences contained. So the computational effort for keeping the matrix up-to-date is constantly increasing. Our internal resources, which have performed calculations for SIMAP for years, are no longer sufficient to keep up with all new sequences. That's why we implemented a SIMAP client for the BOINC platform (Berkeley Open Infrastructure for Network Computing); the client is based on the FASTA algorithm to detect sequence similarities.
    The situation for protein domains is different but of similar complexity. The computational costs are proportional to the number of sequences and the number of domain models. Due to the growth of the sequence space and the frequent updates in the domain databases, the computational effort for keeping the domain predictions up-to-date is constantly increasing.
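
    To make the quadratic growth concrete, here is a minimal sketch in Python. It assumes every unordered pair of sequences is compared exactly once and nothing is compared with itself; that is an assumption for illustration, not a description of how the SIMAP pipeline actually schedules its work. The function names and the database sizes are made up.

```python
def full_matrix_comparisons(n: int) -> int:
    """Pairwise comparisons for an all-against-all matrix of n sequences
    (assumes one comparison per unordered pair, no self-comparisons)."""
    return n * (n - 1) // 2


def incremental_comparisons(existing: int, new: int) -> int:
    """Extra comparisons needed when `new` sequences join `existing` ones:
    each new sequence against every existing one, plus new against new."""
    return existing * new + new * (new - 1) // 2


# Doubling the number of sequences roughly quadruples the total work.
for n in (1_000, 2_000, 4_000):
    print(n, full_matrix_comparisons(n))

# The same batch of 100 new sequences gets more expensive as the database grows.
for existing in (1_000, 10_000, 100_000):
    print(existing, incremental_comparisons(existing, 100))
```

    Extending the matrix incrementally avoids redoing the old comparisons, but each update still scales with the size of the existing database, which fits the project's statement that the effort keeps increasing.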

    What are the institutions behind SIMAP?
    SIMAP is a joint project of the GSF National Research Center for Environment and Health, Neuherberg and Technical University Munich, Center of Life and Food Science Weihenstephan (both in Germany). Please contact Thomas Rattei (Department of Genome Oriented Bioinformatics, TU Munich).

    The Applications
    The following applications exist:
    PCs with SSE support (all modern PCs and notebooks):
    simap 5.10 for Windows stable
    simap 5.12 for Windows64 stable
    simap 5.11 for Linux stable
    hmmer 5.09 for Windows stable
    hmmer 5.09 for Linux stable
    PowerPC/Intel-based Apple computers (Macs):
    simap 5.10 for MacOS >=10.3.9 stable
    hmmer 5.09 for MacOS >=10.3.9 stable
    PCs without SSE (e.g. PII, K6, early Athlons):
    i386-simap** 5.10 for Windows stable
    i386-simap** 5.11 for Linux stable
    PC/PARISC/IA64/Alpha/Sparc platforms for UNIX:
    UNIX simap** 5.10 stable
    UNIX hmmer** 5.09 stable

    ** These applications need manual installation

    The Stats
    The top 5 by recent average credit shows that Simap rewards having as many cores as possible:
    1. Opteron 6282 SE (64 cores, so a quad 16-core)
    2. Xeon X5675 (24 cores, so two hyperthreaded 6-cores)
    3. Opteron 6174 (24 cores, so a dual 12-core)
    4. Xeon X5680 (24 cores, so two hyperthreaded 6-cores)
    5. Xeon L5639 (24 cores, so two hyperthreaded 6-cores)

    The all-time top-5 looks like this:
    1. Opteron 6164 HE (48 cores, so a quad 12-core)
    2. Opteron 6174 (24 cores, so a dual 12-core)
    3. Xeon E5430 (8 cores, so a dual quad-core)
    4. Opteron 6274 (32 cores, so a dual 16-core)
    5. Xeon L5420 (8 cores, so a dual quad-core)

    It gives you the idea of an application that is more into 'every core counts' than into optimally utilizing the capabilities of the latest instruction set(s) of the latest Intel or AMD consumer CPUs. But is that true? What does WuProp have for Simap?

    A score of 1000 credits per core per day seems to be about the limit for both Intel and AMD, and not every CPU is capable of reaching that value.

    The best-scoring AMD machine in the WuProp database is a quad twelve-core Opteron 6168, its 48 cores giving 23712 credits/day.
    The best-performing AMD consumer CPUs are the six-core Phenom IIs and the eight-core Bulldozers, reaching 6000+ credits/day.
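
    For comparison with the roughly 1000 credits/core/day ceiling, the per-core figures follow from a simple division. A small Python sketch using only the numbers quoted above; the function name is made up, and the 6000 for the consumer chips is a lower bound since the text says "6000+":

```python
def credits_per_core(total_per_day: float, cores: int) -> float:
    """Average credit per core per day for a whole machine."""
    return total_per_day / cores


# Quad Opteron 6168: 23712 credits/day over 48 cores.
print(credits_per_core(23712, 48))  # 494.0

# Consumer chips at the 6000 credits/day lower bound.
print(credits_per_core(6000, 6))    # six-core Phenom II: 1000.0
print(credits_per_core(6000, 8))    # eight-core Bulldozer: 750.0
```

    So the big Opteron box wins on core count, not on per-core score: it sits at about half the per-core ceiling, while a six-core Phenom II at 6000+ is right at it.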

    Intel's Atoms and Celerons perform on par with the single AMD Duron in the WuProp database and sit at the very bottom of the field, barely reaching 250 credits/core/day and often staying below 200. Since they are at most dual-cores, they seem unsuited for high scores in Simap.

    The only Intel consumer CPU models that reach the values the 6-core Phenoms and 8-core Bulldozers are capable of are the i7-970 and i7-980, whose 12 logical cores (six hyperthreaded cores) can reach the 6000+ credits/day level. The P4 is barely better than the aforementioned Celerons and Atoms.
    Some PIIIs perform better per core (using a Tualatin 1133 with 64-bit Windows), but are only single cores. Older PIIIs are as bad as the Celerons.

    Intel's server chips seem to face the same handicap as the consumer models, Simap-wise: too low a score per core per day, so only the 24-core models come close to the Opterons. For both Intel and AMD, the most recent chips perform best, so we can assume that the application is better optimized than you'd think at first glance. Clock rate, for example, is not what counts: the P4 completely fails to impress.
    Last edited by Dirk Broer; 05-03-2012 at 07:00 PM. Reason: taypo


  2. #2
    Join Date
    Jan 2007
    Location
    Vermont, USA
    Posts
    1,379
    Dirk,
    I really thought you would get to 500k before me. I did not know my 2 x6 Phenoms would catch up so fast (they had a little help just for the race...).

    Back to WCG for me!
    Logic is the art of being wrong with confidence.


  3. #3
    Join Date
    Sep 2010
    Location
    Leiden, the Netherlands
    Posts
    4,372
    That sure is a nice string of badges for WCG (and every chance to make them even prettier next week)!


  4. #4
    Join Date
    Jan 2007
    Location
    Vermont, USA
    Posts
    1,379
    We'll see, Dirk. I'm back down to only 2 Phenoms. That's my normal 'set it and forget it' setup. They get me 5 - 6k a day with 12 cores (depending on the wingmen (wing persons??) I get).
    Logic is the art of being wrong with confidence.


  5. #5
    Join Date
    Sep 2010
    Location
    Leiden, the Netherlands
    Posts
    4,372
    Any new perspectives on the most efficient CPU for Simap?
    CPU             Credit per day per core (WUProp)   Cores   Credit per day per CPU
    AMD A-10 5800   1,062.7                            4       4,250.8
    AMD FX-8350     1,219.3                            8       9,674.4
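
    The last column is just the per-core figure times the core count. A quick Python sketch that recomputes it from the posted values (the helper name is made up). For the A-10 the product matches the posted 4,250.8; for the FX-8350 it works out to 9,754.4 rather than the posted 9,674.4, so one of those two figures may contain a typo.

```python
rows = [
    # (CPU, credit per day per core, cores, credit per day per CPU as posted)
    ("AMD A-10 5800", 1062.7, 4, 4250.8),
    ("AMD FX-8350", 1219.3, 8, 9674.4),
]


def credit_per_cpu(per_core: float, cores: int) -> float:
    """Credit per day for the whole CPU, from the per-core figure."""
    return per_core * cores


for name, per_core, cores, posted in rows:
    computed = round(credit_per_cpu(per_core, cores), 1)
    print(f"{name}: computed {computed}, posted {posted}")
```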

