Opstone

We used Blue Sail Software's Opstone benchmarks for this portion of the review, running the Athlon XP 32-bit precompiled optimized binaries of the Scalar Product (SP) and Sparse Scalar Product (SSP) benchmarks. Unfortunately, this means the Athlon 64 does not receive the benefit of SSE2 in these tests. The author explains the SP benchmark:

"The 'SP' benchmark calculates the scalar product (dot product) of 2 vectors ranging in size from 16 elements to 1048576 elements for both single and double-precision floats. Although the Gflops/sec. for every vector length is recorded (in the resulting output log file), the average of all these values is reported. This benchmark is indicative of the performance of many raw floating-point data processing apps (movie format conversion, MP3 extraction, etc.)"

Opstone 04q2: Scalar Product

The integer-intensive scalar product benchmark is relatively unscathed by the difference in L2 cache, apart from a slightly higher sustained mean GFlops.
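For readers who want a concrete picture of what the SP test measures, here is a minimal Python sketch of the procedure the author describes: compute a dot product at several vector lengths, convert each timing to GFLOPS, and report the mean. It is an illustration only, not Opstone's code. The function names, the `repeats` parameter, and the count of two floating-point operations per element are assumptions, and pure Python will report far lower rates than an optimized native binary.

```python
import time


def scalar_product(x, y):
    """Dense dot product: the sum of element-wise products."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def mean_gflops(sizes, repeats=3):
    """Time scalar_product at each vector length and average the GFLOPS rates.

    `sizes` and `repeats` are hypothetical parameters for this sketch;
    how Opstone steps through vector lengths is not stated in the article.
    """
    rates = []
    for n in sizes:
        x = [1.0] * n
        y = [2.0] * n
        start = time.perf_counter()
        for _ in range(repeats):
            scalar_product(x, y)
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-12)
        flops = 2 * n * repeats  # assumed: one multiply and one add per element
        rates.append(flops / elapsed / 1e9)
    return sum(rates) / len(rates)
```

Calling `mean_gflops([2**k for k in range(4, 21)])` would cover the quoted range of 16 to 1048576 elements, assuming power-of-two steps. Because the data is contiguous and the loop has no branches beyond the loop test, a test of this shape mostly exercises raw arithmetic throughput and memory streaming.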

Below is the SSP benchmark, as explained by the author:

"The 'ssp' benchmark also calculates the scalar product of 2 vectors, except that these vectors are sparsely populated (only the non-zero value elements are stored) ranging from a 'loading factor' (non-zero/zero elements) of 0.000001 to 0.01 for both single and double-precision floats. Since the data is not contiguous in memory, the performance is much lower than regular 'sp' and is measured in Mflops/sec. There is not much difference in performance between different loading factors as this benchmark really challenges the ability of the processor to perform short bursts of calculations coupled with lots of conditional testing. It is this reason that the P4 with its longer pipeline does not generally perform as well as the Athlon64. This benchmark is indicative of the performance of many 3D games as the processing is similar (short bursts of calculations with numerous conditional testing)"

Opstone 04q2: Sparse Scalar Product

If we are to trust Opstone, floating-point performance scales much better than integer processing. All three processors rank in the same order as their prices, although AMD's PR ratings clearly do not hold on this benchmark.
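The "short bursts of calculations coupled with lots of conditional testing" that the SSP description mentions can be sketched as follows. This is a hedged illustration in Python, not Opstone's implementation: the storage format (index-sorted lists of `(index, value)` pairs) and the function name are assumptions chosen to show why a sparse dot product is branch-heavy.

```python
def sparse_scalar_product(a, b):
    """Dot product of two sparse vectors.

    Each vector is an index-sorted list of (index, value) pairs that holds
    only the non-zero elements. This layout is an assumption for the sketch.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        index_a, value_a = a[i]
        index_b, value_b = b[j]
        if index_a == index_b:
            # Both vectors have a non-zero entry here: one multiply-add.
            total += value_a * value_b
            i += 1
            j += 1
        elif index_a < index_b:
            i += 1  # no matching entry in b; skip ahead in a
        else:
            j += 1  # no matching entry in a; skip ahead in b
    return total
```

For example, `sparse_scalar_product([(0, 1.0), (5, 2.0), (9, 3.0)], [(5, 4.0), (7, 1.0), (9, 0.5)])` matches indices 5 and 9 and returns 9.5. Most loop iterations do a comparison and a branch rather than arithmetic, which is consistent with the author's point that branch handling matters more here than peak floating-point throughput.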

59 Comments

  • dougSF30 - Friday, August 20, 2004 - link

    Kris,

    It doesn't matter if Intel is formally using GHz any longer or not. (And GHz is still featured prominently in nearly every Intel part or system offered for sale today, but that is really beside the point.)

    The simplest way to put it is, whether or not any of us LIKES it, Sempron PR is not designed to be equivalent to A64 PR.

    Thus it is misleading to imply that there may be something wrong with the Sempron 3100+ PR rating based on relative performance with any A64 parts.

    Sure, I'd like it if AMD copied Intel model numbers for all their parts, but there may be a legal reason they are hesitant to do so.

    Anyway, just a small nitpick in an otherwise excellent review.

    Doug
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Hi Doug,

    Since I noticed you copied your email into the comments, I'll copy my response back into the comments too:

    Hi Doug,

    I would agree with your logic except for the fact that Celerons no longer go by any sort of similar rating system. If you look back to May, before the Sempron line was announced, AMD was beginning to roll out a Product Code numbering scheme.

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    Then suddenly, this was scrapped from the roadmaps and AMD went back to the PR rating system, *after* the introduction of the Celeron D line. You and I know the Sempron 3100+ competes against a Celeron D ~340, but that is definitely obscured by the PR rating.

    Your claim is that since Intel uses a higher GHz rating for its older Celeron CPUs, AMD should be allowed to do the same for its budget Semprons. I don't think it's acceptable for AMD or Intel to use a PR or GHz rating system to sell their processors if they don't adhere to the same rating standards from processor to processor!

    Let's face it, AMD already does it with the 3400+ and the 3500+; dual channel or not they perform within 1% of each other! Do dual channel Athlons get a different rating system than single channel? In the same argument do we claim half cache processors get a different rating system than full cache ones?

    Kristopher Kubicki
    Senior Editor, AnandTech.com
    email: kristopher@anandtech.com
  • dougSF30 - Thursday, August 19, 2004 - link

    Kris,


    AMD has made it very clear that the Sempron PR rating system is NOT equivalent to the A64 PR rating system.


    So you can't conclude that you cannot "vouch" for the Sempron rating of 3100+, compared to the A64 3000+ or 2800+, as those figures are NOT MEANT to be compared.


    Sempron PR is designed to rate against Celeron clockspeed, whatever AMD says officially about a "different suite" of benchmarks for legal reasons.


    And A64 PR is designed to rate against the full P4.


    So, given that Celeron performance is much less than P4 performance at the same clockspeed, another way to say this is:


    For a given level of performance, Celeron clock is much higher than P4 clock.


    Thus it follows automatically that Sempron PR is higher than A64 PR for a given level of performance.

    It's not "moot" because Intel is also labeling Celeron parts with Model numbers... the point is still valid: Sempron PR is a completely different rating system than A64 PR.

    The only way AMD could have been less confusing would have been to copy the Intel Celeron model numbers, with the Sempron "330" "340", etc., but there may be reasons (legal?) they cannot do that.

    Pretty simple, no?


    Doug
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Matt, did you see this as well:

    http://www.anandtech.com/linux/showdoc.aspx?i=2163...

    You're getting higher numbers than I got with my Xeon 3.6GHz chip.

    Kristopher
  • Matthew Daws - Thursday, August 19, 2004 - link

    Some further comments about TSCP: I found an old article over at Ace's which used it. The numbers don't seem comparable, but the article does say that the P4 does very, very well. Still not sure why I am getting different numbers from you. Have you run the Windows benchmark, which can be downloaded? That should give an indication of the numbers you might expect on Linux...

    --Matt
  • PrinceGaz - Thursday, August 19, 2004 - link

    Interesting results but it would probably have been more relevant to the majority if it was the standard Windows benchmarks as everything was 32-bit.

    When I saw "L2 cache: Sempron vs Athlon" and "three 1.8GHz offerings from AMD", I really expected to see the Sempr0n 3100+ (256K) and Athlon 64 2800+ (512K) that you used, tested along with an Athlon 64 3200+ (1MB) set to a 9x multiplier. Then we'd see all three cache sizes on otherwise identical chips in 32-bit mode to truly show what effect L2 cache size has. Throwing in an Athlon XP instead as a third AMD 1.8GHz chip was rather meaningless as there are far too many other differences.

    The results do reflect what we've seen in the past that the 512K -> 256K L2 cache halving doesn't have a significant impact on performance in most apps, certainly not the crippling effect it has on the P4 architecture. Of course with the exclusive 128K L1 cache we're really only looking at a 40% (640K -> 384K) cache reduction.

    I've got to disagree with your conclusion I'm afraid. Given what is a very small price difference between the Sempr0n 3100+ and A64 2800+, spending the $20 extra for the A64 2800+ is a no brainer when you consider total system cost. Throw in just a S754 mobo and the performance difference alone already makes the A64 2800+ a viable option. People buying S754 systems aren't seriously looking to upgrade in the future (else they'd go S939). And being stuck with a Sempr0n 3100+ means you miss out on all the benefits of 64-bit in a year or two.
  • Matthew Daws - Thursday, August 19, 2004 - link

    Kris: Hmm, well, using -O3 -march=pentium4, under Windows, I get with my Celeron 2GHz:

    GCC 3.3.3 -- 272K
    GCC 3.4.1 -- 280K

    GCC 3.3.3 (-O2 -march=pentium4) -- 273K
    GCC 3.3.3 (-O3 only) -- 262K

    Just with pure clock-speed scaling, I'd expect 20% increase with a 2.4C, so 300K or so...

    You shouldn't see any difference with Linux: indeed, on a Linux box I have access to, with GCC 3.2.2 (I *think* it's a P4 2.8GHz, but I'm not 100% sure: I'm doing a remote login right now, so cannot check!) I get 365K with (-O3 -march=pentium4). This seems to be pretty linear clock scaling, which we might expect if the memory usage is low...

    An Athlon *should* excel at this sort of test, at least given other benchmarks.

    Just to check: I am using the source-code from Tom Kerrigan, at http://home.comcast.net/~tckerrigan/tscp181.zip

    --Matt
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Hi Matt: Just -O2/3 -march=athlon

    I emailed a few people about your results. I have a 2.4C here that only does 250K with GCC 3.3.3

    Kristopher
  • Matthew Daws - Thursday, August 19, 2004 - link

    Kris: I don't have an AthlonXP, so I cannot comment. I was using GCC 3.4.1 (MinGW version for Windows) which might explain the difference. Still, I would expect any of the processors in this test to completely thrash my Celeron...

    What flags were you using with the AthlonXP? I'm pretty sure it's not an SSE(2) issue...

    --Matt
  • KristopherKubicki - Thursday, August 19, 2004 - link

    Matthew Daws: I was getting wild results with my TSCP on the Athlon XP, which is why I didn't include it. I assumed there was some optimization somewhere that shouldn't have been.

    Which GCC version are you using? On the A64 platforms I've seen as much as 30% increases using GCC 3.4.1 over 3.3.3.

    Kristopher
