.PH ""
.ce
Performance Report
.ce
Revision 3.24 of the Pascal Workstation
.ce
Jeff Hendershot
.ce
April 12, 1991
.SP
.H 1 "Summary"
.P
Revision 3.24 of the Pascal Workstation (PaWS) adds support for the
380 SPU and the HP-UX SRM/UX server. Any performance increase is due
solely to faster hardware, because this release does not change the
way code is generated or the way the operating system works.
.P
Of particular interest is the performance of the floating point
instructions. The MC68040 processor does not support a floating point
co-processor. Instead, some of the floating point instructions are
executed directly on the 68040 itself, and the remaining instructions
are supported by a floating point emulation package (FP40). Timing
results comparing these instructions with an MC68882 co-processor at
33 and 50 MHz are included below.
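.P
As a rough illustration of this split, the following Python sketch
(Python is used only for illustration; it is not part of PaWS) sorts
each opcode from the timing table later in this report into the group
executed by the MC68040 and the group handled by FP40. The opcode sets
come from that table; the name executed_by and the set lookup are
hypothetical and do not describe how FP40 is actually entered.
.DS
```python
# Hypothetical illustration: classify the opcodes from the timing table
# by where the 380 executes them. The two sets are taken from that table;
# the lookup is NOT how FP40 is actually invoked.

ON_CHIP = {"FADD", "FSUB", "FMUL", "FDIV", "FSQRT", "FSGLMUL", "FSGLDIV"}

EMULATED = {
    "FSIN", "FCOS", "FTAN", "FASIN", "FACOS", "FATAN",
    "FSINH", "FCOSH", "FTANH", "FATANH",
    "FLOG2", "FLOG10", "FLOGN", "FETOX", "FTENTOX", "FTWOTOX",
    "FINT", "FINTRZ", "FGETEXP", "FGETMAN", "FREM", "FSCALE", "FMOD",
}


def executed_by(opcode: str) -> str:
    """Return where an opcode from the timing table runs on the 380."""
    if opcode in ON_CHIP:
        return "MC68040 hardware"
    if opcode in EMULATED:
        # The real mechanism for reaching FP40 is not described in this
        # report; a set lookup stands in for it here.
        return "FP40 emulation"
    raise ValueError(f"{opcode} is not in the timing table")


for op in ("FADD", "FSQRT", "FSIN", "FINT"):
    print(op, "->", executed_by(op))
```
.DE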
.P
Our tests indicate that, measured in Dhrystones per second, PaWS on the
380 SPU is 55% faster than on the 375 SPU and 125% faster than on the
370 SPU.
.P
The Wirth tests, a general set of tests that measure loop timing, real
math, array operations, procedure calls, and linked list/memory
management performance, show a 34% performance increase for the 380
over the 375 and a 70% increase for the 380 over the 370.
.P
Floating point operations carried out on the MC68040 itself generally
run faster. FADD and FSUB ran nearly 6 times faster, and FMUL nearly
10 times faster, than on the fastest 68882 (50 MHz). FSQRT and FSGLDIV
are exceptions: both ran slower on the 380.
.P
Operations emulated with the FP40 package run slower than on the
MC68882. Most of them stay within a factor of about 3 of the fastest
co-processor (68882 at 50 MHz), but some do much worse (see results
below).
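.P
The on-chip speedup factors can be checked against the 345/68882
column of the floating point timing table. This sketch divides the
68882 time by the 380 time for each on-chip opcode, so a factor below
1 means the 380 is slower. The data are copied from that table; the
names ON_CHIP_TIMES and speedup are hypothetical.
.DS
```python
# Speedup of the 380 over the 50 MHz 345/68882 for the opcodes the
# MC68040 executes on chip. Times are copied from the floating point
# timing table (smaller is faster).

ON_CHIP_TIMES = {  # opcode: (380 time, 345/68882 time)
    "FADD": (3, 17),
    "FSUB": (3, 17),
    "FMUL": (3, 29),
    "FSGLMUL": (3, 20),
    "FDIV": (33, 48),
    "FSGLDIV": (33, 27),
    "FSQRT": (98, 49),
}


def speedup(op: str) -> float:
    """68882 time divided by 380 time; below 1.0 means the 380 is slower."""
    t380, t345 = ON_CHIP_TIMES[op]
    return t345 / t380


for op in ON_CHIP_TIMES:
    verdict = "faster" if speedup(op) > 1.0 else "slower"
    print(f"{op:8s} {speedup(op):5.2f}x  ({verdict} on the 380)")
```
.DE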
.P
.H 1 "Benchmark Results"
.P
The benchmarking undertaken is not intended for publication outside HP.
The benchmark programs are:
.P	
.AL
.LI
dhry - the Dhrystone benchmark, as implemented by Siemens AG and
loaned to HP.
.LI
wirth - a catchall program that does a little of everything: loop
timing, real math, array operations, procedure calls, and linked
list/memory management tasks. Also from Siemens AG.
.LI
real math - simple execution timings for individual floating point
opcodes. These provide useful comparisons between the FP40 package and
the MC68882 co-processor.
.LE
.P
.H 2 "Dhrystone Benchmarks"
.P
The Dhrystone benchmark program was run on the 380. Numbers obtained
previously for the 375, 370, 360, and 332 are included for comparison.
.P
.DS
.ul
     380        375        370        360        332

   20449.9    13157.9     9090.9     6578.9     4366.8

    (the units are Dhrystones per Second)

To obtain a relative performance increase, the following formula
was used (a higher rate is better):

	x% = (380rate/3xxrate - 1)*100

The 380 vs the 375 yields

	20449.9/13157.9      ==>  55% increase in performance.

The 380 vs the 370 yields

	20449.9/9090.9       ==> 125% increase in performance.

The 380 vs the 360 yields

	20449.9/6578.9       ==> 211% increase in performance.

The 380 vs the 332 yields

	20449.9/4366.8       ==> 368% increase in performance.
.DE
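.P
The percentages above can be re-derived with a short Python sketch.
It uses only the measured rates from the table and the formula written
in terms of rates; the helper name percent_increase is hypothetical.
.DS
```python
# Relative Dhrystone performance of the 380:
#   x% = (380rate/3xxrate - 1)*100
# Rates are the measured Dhrystones per second from this report.

RATES = {
    "380": 20449.9,
    "375": 13157.9,
    "370": 9090.9,
    "360": 6578.9,
    "332": 4366.8,
}


def percent_increase(new_rate: float, old_rate: float) -> float:
    """Higher rate is better, so the ratio is new/old."""
    return (new_rate / old_rate - 1.0) * 100.0


for spu in ("375", "370", "360", "332"):
    pct = percent_increase(RATES["380"], RATES[spu])
    print(f"380 vs {spu}: {pct:.0f}% increase")
```
.DE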
.P
.H 2 "Wirth Benchmarks"
The Wirth benchmark program was run on the 380. Previous results
for the 375, 370, 360, and 332 are included for comparison.
.P
.DS

     Test       380      375      370      360      332
    ______    _____    _____    _____    _____    _____

      A           9        8       12       16       24
      B          12       11       16       21       31
      C           7        8       12       16       23
      D          52       49       74       98      147
      E          15       42       55       68      100
      F          47       25       31       41       60
      G          35       36       50       72      104
      H          61       56       79      109      161
      I          11       10       14       20       29
      J          14       13       18       24       36
      K         117      126      133      208      289
      L         123      146      156      223      379
      M          54      222      309      297      442
      N          27       33       35       63       92
              _____    _____    _____    _____    _____
    Total       584      785      994     1276     1917

    (The units are elapsed time in centiseconds)

To obtain a relative performance increase, the following formula
was used:

	x% = (3xxtime/380time - 1)*100

NOTE: because a smaller elapsed time means better performance, the
ratio is inverted relative to the Dhrystone formula.

The 380 vs the 375 yields

	785/584      ==>    34% increase in performance.

The 380 vs the 370 yields

	994/584      ==>    70% increase in performance.

The 380 vs the 360 yields

	1276/584     ==>   118% increase in performance.

The 380 vs the 332 yields

	1917/584     ==>   228% increase in performance.

.DE
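.P
A similar sketch for the Wirth results sums the per-test times to
reproduce the totals and then applies the inverted ratio, since a
smaller elapsed time is better. The data are copied from the table;
TIMES and percent_increase are illustrative names only.
.DS
```python
# Wirth benchmark: elapsed times in centiseconds for tests A through N,
# copied from the table. Smaller is better, so the increase of the 380
# over an older SPU is (3xxtime/380time - 1)*100.

TIMES = {
    "380": [9, 12, 7, 52, 15, 47, 35, 61, 11, 14, 117, 123, 54, 27],
    "375": [8, 11, 8, 49, 42, 25, 36, 56, 10, 13, 126, 146, 222, 33],
    "370": [12, 16, 12, 74, 55, 31, 50, 79, 14, 18, 133, 156, 309, 35],
    "360": [16, 21, 16, 98, 68, 41, 72, 109, 20, 24, 208, 223, 297, 63],
    "332": [24, 31, 23, 147, 100, 60, 104, 161, 29, 36, 289, 379, 442, 92],
}

# Total elapsed time per SPU over all fourteen tests.
totals = {spu: sum(t) for spu, t in TIMES.items()}


def percent_increase(old_time: int, new_time: int) -> float:
    """Lower elapsed time is better, so the ratio is old/new."""
    return (old_time / new_time - 1.0) * 100.0


for spu in ("375", "370", "360", "332"):
    pct = percent_increase(totals[spu], totals["380"])
    print(f"380 vs {spu}: total {totals[spu]} vs {totals['380']}, {pct:.0f}% increase")
```
.DE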
.P
.H 2 "Floating Point Instruction Timing"
.P
The floating point instruction timings were obtained with a 25 MHz 380,
a 50 MHz 345, and a 33 MHz 360.

Here are the results (elapsed time; a smaller number is faster):
.DS

     Opcode           380      345/68882   360/68882
    ________        ________   _________   _________

      FADD              3          17          23     (on chip - 68040)
      FSUB              3          17          23     (on chip - 68040)
      FMUL              3          29          43     (on chip - 68040)
      FDIV             33          48          73     (on chip - 68040)
      FSQRT            98          49          75     (on chip - 68040)
      FSGLMUL           3          20          29     (on chip - 68040)  
      FSGLDIV          33          27          39     (on chip - 68040) 

      FSIN            625         205         325
      FCOS            627         207         328
      FTAN            655         255         405
      FASIN           901         334         531
      FACOS           852         354         563
      FATAN           580         233         369
      FSINH           859         349         555
      FCOSH           822         303         483
      FTANH           865         365         575
      FATANH          866         420         669
      FLOG2           770         352         559
      FLOG10          725         352         559
      FLOGN           789         347         551
      FETOX           706         232         367
      FTENTOX         701         330         525
      FTWOTOX         696         330         525 
      FINT            719          17          23
      FINTRZ          697          18          25
      FGETEXP         471          11          20
      FGETMAN         477          11          20
      FREM            517          11          20
      FSCALE          546          11          20
      FMOD            517          11          20
      

.DE
.P
For most emulated opcodes, the FP40 package produces timings within
about a factor of 3 of those obtained with a 68882 at the fastest
clock speed. The exceptions are FINT, FINTRZ, FGETEXP, FGETMAN, FREM,
FSCALE, and FMOD. These opcodes are comparatively easy to implement in
silicon, and the 68882 executed every one of them far faster (more
than 10x) than the FP40 package did.
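.P
This grouping can be checked with a sketch that divides each FP40 time
on the 380 by the corresponding 345/68882 time. The 3.1 cutoff for
"about 3x" is an assumption made here, because FSIN, FCOS, and FETOX
come out just above 3; the names EMULATED_TIMES and slowdown are
hypothetical.
.DS
```python
# Slowdown of FP40 on the 380 relative to the 50 MHz 345/68882 for each
# emulated opcode in the timing table (smaller times are faster).
# ASSUMPTION: 3.1 is used as the "about 3x" cutoff; 10x is the report's.

EMULATED_TIMES = {  # opcode: (380 with FP40, 345/68882)
    "FSIN": (625, 205), "FCOS": (627, 207), "FTAN": (655, 255),
    "FASIN": (901, 334), "FACOS": (852, 354), "FATAN": (580, 233),
    "FSINH": (859, 349), "FCOSH": (822, 303), "FTANH": (865, 365),
    "FATANH": (866, 420), "FLOG2": (770, 352), "FLOG10": (725, 352),
    "FLOGN": (789, 347), "FETOX": (706, 232), "FTENTOX": (701, 330),
    "FTWOTOX": (696, 330), "FINT": (719, 17), "FINTRZ": (697, 18),
    "FGETEXP": (471, 11), "FGETMAN": (477, 11), "FREM": (517, 11),
    "FSCALE": (546, 11), "FMOD": (517, 11),
}


def slowdown(op: str) -> float:
    """How many times longer the 380 takes than the 345/68882."""
    t380, t345 = EMULATED_TIMES[op]
    return t380 / t345


ten_x_or_worse = sorted(op for op in EMULATED_TIMES if slowdown(op) >= 10)
about_3x = sorted(op for op in EMULATED_TIMES if slowdown(op) < 3.1)

print("10x or slower:", ten_x_or_worse)
print("within about 3x:", about_3x)
```
.DE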
.P
.H 1 "Conclusion"
.P
Integer processing performance of this revision of PaWS on the 380 is
significantly faster than prior product offerings. Floating point
processing is faster for some opcodes (FADD, FSUB, FMUL, FSGLMUL, and
FDIV) and slower for others. While we can say with confidence that PaWS
running on the 380 is generally "half again as fast as a 375", this
cannot be guaranteed for any specific benchmark or application.

