Add sm120 tunings for DevicePartition::Flagged 1/2 #8846

Draft
gonidelis wants to merge 5 commits into NVIDIA:main from gonidelis:partition_flagged_tuning

@gonidelis
Member

@gonidelis gonidelis commented May 6, 2026

This builds on top of #7796 (I should probably be using stacked PRs for this?).

It adds tunings from old DBs I scraped. I will run more for the larger CT workloads and add them in a second PR.

posting verification results shortly...

Only I8/I32 and I8/I64 make sense. Keeping these two, dropping everything else, and re-tuning.

gonidelis added 4 commits May 6, 2026 06:43
Verification on RTX PRO 6000 Blackwell Server Edition (sm_120, 188 SMs)
showed +1% to +9% regressions for all encoded I32/I64/F32/F64 entries,
while I8/I16 entries remained -3% to -33% FAST. The Workstation-EVO
winners do not transfer to Server for size-4+ kernels.

Drops:
  (size=4, off=4, distinct=F)
  (size=4, off=4, distinct=T)
  (size=4, off=8, distinct=F)
  (size=4, off=8, distinct=T)
  (size=8, off=4, distinct=F)
  (size=8, off=8, distinct=F)

Remaining encodings (7): all on input_size 1 and 2.

Adds 6 sm120_tuning specializations for the partition.flagged variant
(flagged::yes, keep_rejects::yes), covering the CT-axis combos
where Workstation EVO produced clean winners:

  I8 / I32 / distinct_partitions::no
  I8 / I32 / distinct_partitions::yes
  I8 / I64 / distinct_partitions::no   (2nd-best; 1st EVO winner was absurd)
  I8 / I64 / distinct_partitions::yes
  I16 / I32 / distinct_partitions::no
  I16 / I32 / distinct_partitions::yes

Routes through the same Policy1200 chain added by the partition.if PR
(via the shared sm120_tuning template). Verification on the Server SKU
is pending; size-4+ entries are intentionally absent for now (per
prior analysis those were Workstation-only winners).

Stacked on partition_tuning branch (PR for partition.if).
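
To relate the type names above to the (size, off, distinct) notation used in the drop list, the sketch below models which combinations have an sm120 tuning after this commit. It is only a Python stand-in for what the commit message describes as template specializations: `TYPE_SIZE`, `Key`, `SPECIALIZED`, and `has_sm120_tuning` are hypothetical names for illustration, and no tuning parameters are shown because none are given here.

```python
from typing import NamedTuple

# Byte widths implied by the integer type names in the commit message.
TYPE_SIZE = {"I8": 1, "I16": 2, "I32": 4, "I64": 8}


class Key(NamedTuple):
    """Hypothetical lookup key mirroring the (size, off, distinct) notation."""
    input_size: int
    offset_size: int
    distinct: bool


def key_for(value_type: str, offset_type: str, distinct: bool) -> Key:
    return Key(TYPE_SIZE[value_type], TYPE_SIZE[offset_type], distinct)


# The six combos listed in the commit message: three type pairs,
# each with distinct_partitions::no and distinct_partitions::yes.
SPECIALIZED = {
    key_for(value_type, offset_type, distinct)
    for value_type, offset_type in [("I8", "I32"), ("I8", "I64"), ("I16", "I32")]
    for distinct in (False, True)
}


def has_sm120_tuning(value_type: str, offset_type: str, distinct: bool) -> bool:
    """True if this combo is one of the six specialized in this commit."""
    return key_for(value_type, offset_type, distinct) in SPECIALIZED


if __name__ == "__main__":
    for key in sorted(SPECIALIZED):
        print(key)
```

Anything outside this set, such as I32 or I64 inputs, is assumed here to fall through to whatever policy was selected before this change.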
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 6, 2026
@gonidelis
Member Author

Original results for 2^28 elements (image attached); dropping the bad ones.

Verification on RTX PRO 6000 Blackwell Server Edition (sm_120) showed
4 of the 6 originally-encoded partition.flagged entries are SLOW or
flat at 2^28:

  I8 / I32 / distinct=false : +2.20%  (7 SLOW across sizes)
  I8 / I64 / distinct=false : -1.22%  but 5 SLOW ("2nd-best after absurd")
  I16 / I32 / distinct=false: +1.38%
  I16 / I32 / distinct=true : +0.58%

Only the distinct=true variants for I8 transferred cleanly:

  I8 / I32 / distinct=true  : -3.10%  (5 FAST / 0 SLOW / 7 SAME)
  I8 / I64 / distinct=true  : -8.48%  (6 FAST / 2 SLOW / 4 SAME, SLOWs at 2^16)

Same Workstation->Server transfer pattern as partition.if: distinct=false
landscapes diverge meaningfully across SKUs. Remaining CT combos pending
fresh EVO sweep on Server.
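
The summary percentages above appear to be the mean of the %Diff column at 2^28 over the three entropy values (1, 0.544, 0) in the detailed comparison table posted in this thread. That is an inference from the numbers, not something stated in the PR. A minimal Python check, with values copied by hand from that table and hypothetical helper names (`summarize`, `tally`):

```python
from collections import Counter
from statistics import mean

# %Diff at 2^28 elements for entropy 1, 0.544, 0, keyed by (T, OffsetT, distinct).
PCT_DIFF_2_28 = {
    ("I8", "I32", False): [2.18, 2.39, 2.04],
    ("I8", "I32", True): [-3.85, -0.56, -4.90],
    ("I8", "I64", False): [-0.97, -1.21, -1.49],
    ("I8", "I64", True): [-9.56, -7.01, -8.88],
    ("I16", "I32", False): [1.12, 1.61, 1.42],
    ("I16", "I32", True): [0.43, 0.84, 0.47],
}

# Status column for all 12 rows (4 sizes x 3 entropies) of three combos,
# in table order.
STATUS = {
    ("I8", "I32", False): ["FAST", "SAME", "SLOW", "SLOW", "SLOW", "SAME",
                           "SLOW", "SLOW", "FAST", "????", "SLOW", "SLOW"],
    ("I8", "I32", True): ["SAME", "SAME", "SAME", "FAST", "FAST", "FAST",
                          "SAME", "SAME", "FAST", "SAME", "SAME", "FAST"],
    ("I8", "I64", True): ["FAST", "SAME", "SLOW", "FAST", "FAST", "SLOW",
                          "SAME", "FAST", "FAST", "SAME", "SAME", "FAST"],
}


def summarize() -> dict:
    """Mean %Diff at 2^28 per combo, rounded to two decimals."""
    return {combo: round(mean(vals), 2) for combo, vals in PCT_DIFF_2_28.items()}


def tally(combo) -> Counter:
    """Count of each status label across the 12 rows of one combo."""
    return Counter(STATUS[combo])


if __name__ == "__main__":
    for combo, avg in summarize().items():
        print(combo, f"{avg:+.2f}%")
    for combo in STATUS:
        print(combo, dict(tally(combo)))
```

Under that assumption the means reproduce the six figures quoted above, and the tallies match the FAST/SLOW/SAME counts given for the three combos included here.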
@gonidelis
Member Author

Detailed results:

['/home/ggonidelis/partition_jsons/flagged_main.json', '/home/ggonidelis/partition_jsons/flagged_tuning.json']
# base

## [0] NVIDIA RTX PRO 6000 Blackwell Server Edition

|  T{ct}  |  OffsetT{ct}  |  DistinctPartitions{ct}  |  Elements{io}  |  Entropy  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------|---------------|--------------------------|----------------|-----------|------------|-------------|------------|-------------|------------|---------|----------|
|   I8    |      I32      |          false           |      2^16      |     1     |   8.262 us |       3.11% |   6.341 us |       8.33% |  -1.921 us | -23.25% |   FAST   |
|   I8    |      I32      |          false           |      2^20      |     1     |  12.064 us |       5.70% |  12.120 us |       3.40% |   0.056 us |   0.46% |   SAME   |
|   I8    |      I32      |          false           |      2^24      |     1     |  49.818 us |       2.12% |  50.996 us |       1.06% |   1.178 us |   2.36% |   SLOW   |
|   I8    |      I32      |          false           |      2^28      |     1     | 588.090 us |       0.25% | 600.931 us |       0.29% |  12.841 us |   2.18% |   SLOW   |
|   I8    |      I32      |          false           |      2^16      |   0.544   |  10.142 us |       4.59% |  10.240 us |       0.00% |   0.098 us |   0.97% |   SLOW   |
|   I8    |      I32      |          false           |      2^20      |   0.544   |  11.682 us |       7.41% |  11.834 us |       6.72% |   0.152 us |   1.30% |   SAME   |
|   I8    |      I32      |          false           |      2^24      |   0.544   |  50.558 us |       1.86% |  51.496 us |       1.79% |   0.938 us |   1.85% |   SLOW   |
|   I8    |      I32      |          false           |      2^28      |   0.544   | 590.177 us |       0.24% | 604.271 us |       0.29% |  14.094 us |   2.39% |   SLOW   |
|   I8    |      I32      |          false           |      2^16      |     0     |  10.240 us |       0.00% |  10.192 us |       2.58% |  -0.048 us |  -0.47% |   FAST   |
|   I8    |      I32      |          false           |      2^20      |     0     |  12.288 us |       0.00% |  12.197 us |       2.61% |  -0.092 us |  -0.74% |   ????   |
|   I8    |      I32      |          false           |      2^24      |     0     |  49.433 us |       1.59% |  50.812 us |       1.64% |   1.379 us |   2.79% |   SLOW   |
|   I8    |      I32      |          false           |      2^28      |     0     | 587.800 us |       0.23% | 599.793 us |       0.26% |  11.993 us |   2.04% |   SLOW   |
|   I8    |      I32      |           true           |      2^16      |     1     |  10.066 us |       4.29% |   9.901 us |       6.85% |  -0.165 us |  -1.64% |   SAME   |
|   I8    |      I32      |           true           |      2^20      |     1     |  10.148 us |       4.17% |  10.160 us |       5.43% |   0.013 us |   0.13% |   SAME   |
|   I8    |      I32      |           true           |      2^24      |     1     |  50.134 us |       2.17% |  49.902 us |       2.29% |  -0.231 us |  -0.46% |   SAME   |
|   I8    |      I32      |           true           |      2^28      |     1     | 639.126 us |       2.36% | 614.528 us |       0.61% | -24.598 us |  -3.85% |   FAST   |
|   I8    |      I32      |           true           |      2^16      |   0.544   |  10.240 us |       0.00% |   8.192 us |       0.00% |  -2.048 us | -20.00% |   FAST   |
|   I8    |      I32      |           true           |      2^20      |   0.544   |  10.240 us |       0.00% |  10.218 us |       2.44% |  -0.022 us |  -0.21% |   FAST   |
|   I8    |      I32      |           true           |      2^24      |   0.544   |  51.154 us |       1.19% |  50.881 us |       1.34% |  -0.273 us |  -0.53% |   SAME   |
|   I8    |      I32      |           true           |      2^28      |   0.544   | 623.371 us |       1.61% | 619.877 us |       0.70% |  -3.494 us |  -0.56% |   SAME   |
|   I8    |      I32      |           true           |      2^16      |     0     |  10.107 us |       4.72% |   8.135 us |       3.27% |  -1.972 us | -19.51% |   FAST   |
|   I8    |      I32      |           true           |      2^20      |     0     |  10.290 us |       4.22% |  10.247 us |       2.32% |  -0.043 us |  -0.42% |   SAME   |
|   I8    |      I32      |           true           |      2^24      |     0     |  51.083 us |       1.31% |  51.140 us |       0.78% |   0.057 us |   0.11% |   SAME   |
|   I8    |      I32      |           true           |      2^28      |     0     | 638.557 us |       2.55% | 607.286 us |       0.64% | -31.271 us |  -4.90% |   FAST   |
|   I8    |      I64      |          false           |      2^16      |     1     |   8.941 us |      11.41% |   8.836 us |      10.97% |  -0.105 us |  -1.17% |   SAME   |
|   I8    |      I64      |          false           |      2^20      |     1     |  12.337 us |       3.16% |  12.998 us |       7.51% |   0.661 us |   5.36% |   SLOW   |
|   I8    |      I64      |          false           |      2^24      |     1     |  49.093 us |       0.92% |  49.084 us |       0.73% |  -0.008 us |  -0.02% |   SAME   |
|   I8    |      I64      |          false           |      2^28      |     1     | 626.749 us |       0.84% | 620.654 us |       0.84% |  -6.096 us |  -0.97% |   FAST   |
|   I8    |      I64      |          false           |      2^16      |   0.544   |  10.046 us |       4.34% |  10.240 us |       0.00% |   0.194 us |   1.93% |   SLOW   |
|   I8    |      I64      |          false           |      2^20      |   0.544   |  10.453 us |       8.76% |  12.180 us |       3.40% |   1.727 us |  16.53% |   SLOW   |
|   I8    |      I64      |          false           |      2^24      |   0.544   |  50.093 us |       2.25% |  49.939 us |       2.04% |  -0.153 us |  -0.31% |   SAME   |
|   I8    |      I64      |          false           |      2^28      |   0.544   | 638.767 us |       0.89% | 631.036 us |       0.83% |  -7.731 us |  -1.21% |   FAST   |
|   I8    |      I64      |          false           |      2^16      |     0     |   8.192 us |       0.00% |   8.734 us |      10.64% |   0.542 us |   6.62% |   SLOW   |
|   I8    |      I64      |          false           |      2^20      |     0     |  10.240 us |       0.00% |  11.453 us |       9.11% |   1.213 us |  11.85% |   SLOW   |
|   I8    |      I64      |          false           |      2^24      |     0     |  49.405 us |       1.47% |  49.294 us |       1.97% |  -0.111 us |  -0.23% |   SAME   |
|   I8    |      I64      |          false           |      2^28      |     0     | 634.132 us |       0.83% | 624.679 us |       0.74% |  -9.453 us |  -1.49% |   FAST   |
|   I8    |      I64      |           true           |      2^16      |     1     |  10.240 us |       0.00% |   8.177 us |       1.44% |  -2.063 us | -20.15% |   FAST   |
|   I8    |      I64      |           true           |      2^20      |     1     |  12.396 us |       4.23% |  12.298 us |       1.46% |  -0.098 us |  -0.79% |   SAME   |
|   I8    |      I64      |           true           |      2^24      |     1     |  50.023 us |       2.10% |  51.012 us |       1.37% |   0.988 us |   1.98% |   SLOW   |
|   I8    |      I64      |           true           |      2^28      |     1     | 657.379 us |       2.34% | 594.536 us |       0.49% | -62.843 us |  -9.56% |   FAST   |
|   I8    |      I64      |           true           |      2^16      |   0.544   |  10.240 us |       0.00% |  10.074 us |       5.34% |  -0.166 us |  -1.62% |   FAST   |
|   I8    |      I64      |           true           |      2^20      |   0.544   |  10.319 us |       5.18% |  10.893 us |       8.94% |   0.574 us |   5.56% |   SLOW   |
|   I8    |      I64      |           true           |      2^24      |   0.544   |  50.827 us |       1.70% |  51.202 us |       1.11% |   0.375 us |   0.74% |   SAME   |
|   I8    |      I64      |           true           |      2^28      |   0.544   | 641.288 us |       2.09% | 596.355 us |       0.54% | -44.933 us |  -7.01% |   FAST   |
|   I8    |      I64      |           true           |      2^16      |     0     |   9.998 us |       5.18% |   7.966 us |       5.77% |  -2.032 us | -20.32% |   FAST   |
|   I8    |      I64      |           true           |      2^20      |     0     |  10.144 us |       3.57% |  10.257 us |       3.17% |   0.113 us |   1.11% |   SAME   |
|   I8    |      I64      |           true           |      2^24      |     0     |  49.952 us |       2.32% |  50.751 us |       1.73% |   0.799 us |   1.60% |   SAME   |
|   I8    |      I64      |           true           |      2^28      |     0     | 649.675 us |       2.36% | 591.964 us |       0.44% | -57.711 us |  -8.88% |   FAST   |
|   I16   |      I32      |          false           |      2^16      |     1     |  10.240 us |       0.00% |   8.183 us |       1.31% |  -2.057 us | -20.09% |   FAST   |
|   I16   |      I32      |          false           |      2^20      |     1     |  16.384 us |       0.00% |  14.208 us |       3.12% |  -2.176 us | -13.28% |   FAST   |
|   I16   |      I32      |          false           |      2^24      |     1     |  69.444 us |       1.30% |  66.280 us |       1.73% |  -3.164 us |  -4.56% |   FAST   |
|   I16   |      I32      |          false           |      2^28      |     1     | 943.515 us |       0.53% | 954.071 us |       0.30% |  10.556 us |   1.12% |   SLOW   |
|   I16   |      I32      |          false           |      2^16      |   0.544   |  10.836 us |       8.92% |   8.286 us |       6.16% |  -2.549 us | -23.53% |   FAST   |
|   I16   |      I32      |          false           |      2^20      |   0.544   |  14.336 us |       0.00% |  13.144 us |       7.61% |  -1.192 us |  -8.31% |   ????   |
|   I16   |      I32      |          false           |      2^24      |   0.544   |  70.639 us |       1.74% |  67.822 us |       1.41% |  -2.817 us |  -3.99% |   FAST   |
|   I16   |      I32      |          false           |      2^28      |   0.544   | 942.381 us |       0.37% | 957.515 us |       0.33% |  15.134 us |   1.61% |   SLOW   |
|   I16   |      I32      |          false           |      2^16      |     0     |  10.211 us |       2.14% |   8.167 us |       2.07% |  -2.044 us | -20.02% |   FAST   |
|   I16   |      I32      |          false           |      2^20      |     0     |  14.078 us |       4.74% |  12.675 us |       6.36% |  -1.403 us |  -9.97% |   FAST   |
|   I16   |      I32      |          false           |      2^24      |     0     |  69.833 us |       1.10% |  67.170 us |       1.32% |  -2.663 us |  -3.81% |   FAST   |
|   I16   |      I32      |          false           |      2^28      |     0     | 943.602 us |       0.56% | 957.019 us |       0.35% |  13.418 us |   1.42% |   SLOW   |
|   I16   |      I32      |           true           |      2^16      |     1     |   9.917 us |       6.32% |   8.162 us |       2.26% |  -1.755 us | -17.70% |   FAST   |
|   I16   |      I32      |           true           |      2^20      |     1     |  14.250 us |       2.15% |  12.262 us |       2.21% |  -1.988 us | -13.95% |   FAST   |
|   I16   |      I32      |           true           |      2^24      |     1     |  68.562 us |       1.59% |  66.283 us |       1.52% |  -2.279 us |  -3.32% |   FAST   |
|   I16   |      I32      |           true           |      2^28      |     1     | 944.636 us |       0.51% | 948.660 us |       0.25% |   4.025 us |   0.43% |   SLOW   |
|   I16   |      I32      |           true           |      2^16      |   0.544   |  10.027 us |       4.53% |   8.166 us |       2.22% |  -1.861 us | -18.56% |   FAST   |
|   I16   |      I32      |           true           |      2^20      |   0.544   |  16.365 us |       0.84% |  14.299 us |       3.23% |  -2.066 us | -12.62% |   FAST   |
|   I16   |      I32      |           true           |      2^24      |   0.544   |  70.920 us |       1.68% |  67.529 us |       0.69% |  -3.392 us |  -4.78% |   FAST   |
|   I16   |      I32      |           true           |      2^28      |   0.544   | 942.194 us |       0.38% | 950.090 us |       0.26% |   7.896 us |   0.84% |   SLOW   |
|   I16   |      I32      |           true           |      2^16      |     0     |  10.240 us |       0.00% |   9.255 us |      11.22% |  -0.985 us |  -9.62% |   FAST   |
|   I16   |      I32      |           true           |      2^20      |     0     |  14.319 us |       0.91% |  12.288 us |       0.00% |  -2.031 us | -14.19% |   ????   |
|   I16   |      I32      |           true           |      2^24      |     0     |  69.344 us |       1.34% |  65.993 us |       1.61% |  -3.351 us |  -4.83% |   FAST   |
|   I16   |      I32      |           true           |      2^28      |     0     | 944.111 us |       0.57% | 948.515 us |       0.24% |   4.404 us |   0.47% |   SLOW   |
|   I16   |      I64      |          false           |      2^16      |     1     |   8.196 us |       3.02% |   9.415 us |      10.63% |   1.219 us |  14.88% |   SLOW   |
|   I16   |      I64      |          false           |      2^20      |     1     |  12.259 us |       1.52% |  12.288 us |       0.00% |   0.029 us |   0.23% |   ????   |
|   I16   |      I64      |          false           |      2^24      |     1     |  67.399 us |       0.95% |  67.455 us |       1.01% |   0.056 us |   0.08% |   SAME   |
|   I16   |      I64      |          false           |      2^28      |     1     | 956.073 us |       0.49% | 957.963 us |       0.51% |   1.891 us |   0.20% |   SAME   |
|   I16   |      I64      |          false           |      2^16      |   0.544   |   8.302 us |       7.38% |   9.907 us |       6.67% |   1.606 us |  19.34% |   SLOW   |
|   I16   |      I64      |          false           |      2^20      |   0.544   |  12.365 us |       4.02% |  12.567 us |       5.81% |   0.202 us |   1.64% |   SAME   |
|   I16   |      I64      |          false           |      2^24      |   0.544   |  67.735 us |       0.84% |  68.083 us |       1.33% |   0.348 us |   0.51% |   SAME   |
|   I16   |      I64      |          false           |      2^28      |   0.544   | 968.845 us |       0.68% | 971.328 us |       0.70% |   2.483 us |   0.26% |   SAME   |
|   I16   |      I64      |          false           |      2^16      |     0     |  10.003 us |       5.27% |  10.240 us |       0.00% |   0.237 us |   2.37% |   SLOW   |
|   I16   |      I64      |          false           |      2^20      |     0     |  12.108 us |       3.83% |  12.480 us |       5.15% |   0.372 us |   3.07% |   SAME   |
|   I16   |      I64      |          false           |      2^24      |     0     |  67.707 us |       0.85% |  68.052 us |       1.26% |   0.345 us |   0.51% |   SAME   |
|   I16   |      I64      |          false           |      2^28      |     0     | 957.897 us |       0.49% | 958.511 us |       0.53% |   0.614 us |   0.06% |   SAME   |
|   I16   |      I64      |           true           |      2^16      |     1     |  10.007 us |       6.46% |  10.081 us |       5.11% |   0.074 us |   0.74% |   SAME   |
|   I16   |      I64      |           true           |      2^20      |     1     |  12.288 us |       0.00% |  12.324 us |       5.85% |   0.036 us |   0.30% |   ????   |
|   I16   |      I64      |           true           |      2^24      |     1     |  66.065 us |       1.41% |  66.485 us |       1.59% |   0.419 us |   0.63% |   SAME   |
|   I16   |      I64      |           true           |      2^28      |     1     | 944.940 us |       0.31% | 946.382 us |       0.34% |   1.443 us |   0.15% |   SAME   |
|   I16   |      I64      |           true           |      2^16      |   0.544   |  10.174 us |       3.30% |  10.214 us |       1.71% |   0.040 us |   0.39% |   SAME   |
|   I16   |      I64      |           true           |      2^20      |   0.544   |  12.347 us |       3.41% |  12.652 us |       6.35% |   0.304 us |   2.46% |   SAME   |
|   I16   |      I64      |           true           |      2^24      |   0.544   |  68.684 us |       1.59% |  69.065 us |       1.43% |   0.381 us |   0.55% |   SAME   |
|   I16   |      I64      |           true           |      2^28      |   0.544   | 950.885 us |       0.34% | 952.064 us |       0.37% |   1.179 us |   0.12% |   SAME   |
|   I16   |      I64      |           true           |      2^16      |     0     |  10.205 us |       1.98% |  10.051 us |       4.27% |  -0.153 us |  -1.50% |   SAME   |
|   I16   |      I64      |           true           |      2^20      |     0     |  14.284 us |       1.73% |  14.312 us |       0.53% |   0.027 us |   0.19% |   SAME   |
|   I16   |      I64      |           true           |      2^24      |     0     |  66.597 us |       1.55% |  66.926 us |       1.51% |   0.329 us |   0.49% |   SAME   |
|   I16   |      I64      |           true           |      2^28      |     0     | 942.230 us |       0.29% | 943.505 us |       0.33% |   1.276 us |   0.14% |   SAME   |
|   I32   |      I32      |          false           |      2^16      |     1     |  10.056 us |       4.19% |  10.240 us |       0.00% |   0.184 us |   1.83% |   SLOW   |
|   I32   |      I32      |          false           |      2^20      |     1     |  18.328 us |       2.54% |  18.426 us |       1.26% |   0.098 us |   0.53% |   SAME   |
|   I32   |      I32      |          false           |      2^24      |     1     | 111.366 us |       0.96% | 111.606 us |       1.08% |   0.239 us |   0.21% |   SAME   |
|   I32   |      I32      |          false           |      2^28      |     1     |   1.667 ms |       0.16% |   1.668 ms |       0.18% |   0.759 us |   0.05% |   SAME   |
|   I32   |      I32      |          false           |      2^16      |   0.544   |  10.215 us |       1.64% |  10.223 us |       1.41% |   0.008 us |   0.08% |   SAME   |
|   I32   |      I32      |          false           |      2^20      |   0.544   |  18.418 us |       1.67% |  18.269 us |       2.52% |  -0.149 us |  -0.81% |   SAME   |
|   I32   |      I32      |          false           |      2^24      |   0.544   | 112.865 us |       0.88% | 113.067 us |       0.87% |   0.203 us |   0.18% |   SAME   |
|   I32   |      I32      |          false           |      2^28      |   0.544   |   1.669 ms |       0.16% |   1.670 ms |       0.17% |   1.136 us |   0.07% |   SAME   |
|   I32   |      I32      |          false           |      2^16      |     0     |  10.239 us |       0.05% |  10.221 us |       1.44% |  -0.018 us |  -0.17% |   FAST   |
|   I32   |      I32      |          false           |      2^20      |     0     |  16.384 us |       0.00% |  17.533 us |       6.22% |   1.149 us |   7.02% |   SLOW   |
|   I32   |      I32      |          false           |      2^24      |     0     | 112.769 us |       1.22% | 112.061 us |       1.02% |  -0.708 us |  -0.63% |   SAME   |
|   I32   |      I32      |          false           |      2^28      |     0     |   1.669 ms |       0.17% |   1.669 ms |       0.17% |  -0.189 us |  -0.01% |   SAME   |
|   I32   |      I32      |           true           |      2^16      |     1     |   7.861 us |      11.90% |   9.091 us |      12.19% |   1.230 us |  15.65% |   SLOW   |
|   I32   |      I32      |           true           |      2^20      |     1     |  19.720 us |       5.17% |  18.947 us |       4.69% |  -0.773 us |  -3.92% |   SAME   |
|   I32   |      I32      |           true           |      2^24      |     1     | 117.952 us |       1.05% | 116.528 us |       1.08% |  -1.424 us |  -1.21% |   FAST   |
|   I32   |      I32      |           true           |      2^28      |     1     |   1.663 ms |       0.11% |   1.663 ms |       0.11% |   0.118 us |   0.01% |   SAME   |
|   I32   |      I32      |           true           |      2^16      |   0.544   |  10.240 us |       0.00% |  10.240 us |       0.00% |   0.000 us |   0.00% |   SAME   |
|   I32   |      I32      |           true           |      2^20      |   0.544   |  19.774 us |       5.08% |  20.144 us |       3.83% |   0.370 us |   1.87% |   SAME   |
|   I32   |      I32      |           true           |      2^24      |   0.544   | 117.789 us |       1.03% | 117.913 us |       1.02% |   0.124 us |   0.11% |   SAME   |
|   I32   |      I32      |           true           |      2^28      |   0.544   |   1.664 ms |       0.11% |   1.664 ms |       0.11% |   0.318 us |   0.02% |   SAME   |
|   I32   |      I32      |           true           |      2^16      |     0     |  10.102 us |       3.73% |  10.204 us |       1.95% |   0.103 us |   1.02% |   SAME   |
|   I32   |      I32      |           true           |      2^20      |     0     |  16.634 us |       4.27% |  19.576 us |       5.16% |   2.942 us |  17.68% |   SLOW   |
|   I32   |      I32      |           true           |      2^24      |     0     | 115.513 us |       1.08% | 115.822 us |       0.99% |   0.308 us |   0.27% |   SAME   |
|   I32   |      I32      |           true           |      2^28      |     0     |   1.663 ms |       0.10% |   1.664 ms |       0.10% |   0.333 us |   0.02% |   SAME   |
|   I32   |      I64      |          false           |      2^16      |     1     |   8.217 us |       2.95% |   8.256 us |       4.32% |   0.039 us |   0.48% |   SAME   |
|   I32   |      I64      |          false           |      2^20      |     1     |  16.350 us |       3.21% |  18.500 us |       3.53% |   2.150 us |  13.15% |   SLOW   |
|   I32   |      I64      |          false           |      2^24      |     1     | 112.764 us |       0.96% | 113.054 us |       0.92% |   0.290 us |   0.26% |   SAME   |
|   I32   |      I64      |          false           |      2^28      |     1     |   1.656 ms |       0.10% |   1.656 ms |       0.10% |  -0.034 us |  -0.00% |   SAME   |
|   I32   |      I64      |          false           |      2^16      |   0.544   |  10.240 us |       0.00% |  10.131 us |       3.39% |  -0.109 us |  -1.07% |   FAST   |
|   I32   |      I64      |          false           |      2^20      |   0.544   |  18.591 us |       3.22% |  18.598 us |       3.07% |   0.006 us |   0.03% |   SAME   |
|   I32   |      I64      |          false           |      2^24      |   0.544   | 114.585 us |       1.00% | 114.527 us |       1.09% |  -0.058 us |  -0.05% |   SAME   |
|   I32   |      I64      |          false           |      2^28      |   0.544   |   1.659 ms |       0.11% |   1.659 ms |       0.11% |  -0.015 us |  -0.00% |   SAME   |
|   I32   |      I64      |          false           |      2^16      |     0     |  10.240 us |       0.00% |  10.238 us |       0.08% |  -0.002 us |  -0.02% |   FAST   |
|   I32   |      I64      |          false           |      2^20      |     0     |  16.298 us |       3.40% |  18.481 us |       2.27% |   2.184 us |  13.40% |   SLOW   |
|   I32   |      I64      |          false           |      2^24      |     0     | 112.870 us |       1.01% | 112.717 us |       1.02% |  -0.153 us |  -0.14% |   SAME   |
|   I32   |      I64      |          false           |      2^28      |     0     |   1.657 ms |       0.10% |   1.657 ms |       0.10% |  -0.129 us |  -0.01% |   SAME   |
|   I32   |      I64      |           true           |      2^16      |     1     |  13.884 us |       5.36% |  14.336 us |       0.00% |   0.452 us |   3.26% |   ????   |
|   I32   |      I64      |           true           |      2^20      |     1     |  22.195 us |       3.37% |  22.828 us |       3.77% |   0.634 us |   2.86% |   SAME   |
|   I32   |      I64      |           true           |      2^24      |     1     | 122.598 us |       0.89% | 122.534 us |       0.90% |  -0.064 us |  -0.05% |   SAME   |
|   I32   |      I64      |           true           |      2^28      |     1     |   1.689 ms |       0.09% |   1.689 ms |       0.09% |   0.001 us |   0.00% |   SAME   |
|   I32   |      I64      |           true           |      2^16      |   0.544   |  14.231 us |       3.27% |  14.251 us |       2.60% |   0.020 us |   0.14% |   SAME   |
|   I32   |      I64      |           true           |      2^20      |   0.544   |  22.528 us |       0.00% |  23.570 us |       4.55% |   1.042 us |   4.63% |   SLOW   |
|   I32   |      I64      |           true           |      2^24      |   0.544   | 124.279 us |       0.94% | 124.341 us |       0.91% |   0.062 us |   0.05% |   SAME   |
|   I32   |      I64      |           true           |      2^28      |   0.544   |   1.689 ms |       0.09% |   1.689 ms |       0.10% |   0.020 us |   0.00% |   SAME   |
|   I32   |      I64      |           true           |      2^16      |     0     |  14.312 us |       1.14% |  14.147 us |       3.02% |  -0.165 us |  -1.15% |   FAST   |
|   I32   |      I64      |           true           |      2^20      |     0     |  23.071 us |       4.15% |  23.025 us |       3.86% |  -0.045 us |  -0.20% |   SAME   |
|   I32   |      I64      |           true           |      2^24      |     0     | 122.461 us |       0.98% | 122.209 us |       0.91% |  -0.252 us |  -0.21% |   SAME   |
|   I32   |      I64      |           true           |      2^28      |     0     |   1.688 ms |       0.09% |   1.688 ms |       0.09% |  -0.070 us |  -0.00% |   SAME   |
|   I64   |      I32      |          false           |      2^16      |     1     |  12.288 us |       0.00% |  12.265 us |       1.31% |  -0.023 us |  -0.19% |   ????   |
|   I64   |      I32      |          false           |      2^20      |     1     |  25.647 us |       4.40% |  25.265 us |       3.88% |  -0.382 us |  -1.49% |   SAME   |
|   I64   |      I32      |          false           |      2^24      |     1     | 212.249 us |       0.96% | 212.368 us |       0.94% |   0.119 us |   0.06% |   SAME   |
|   I64   |      I32      |          false           |      2^28      |     1     |   3.119 ms |       0.09% |   3.119 ms |       0.08% |   0.005 us |   0.00% |   SAME   |
|   I64   |      I32      |          false           |      2^16      |   0.544   |  11.761 us |       6.77% |  12.279 us |       0.77% |   0.518 us |   4.40% |   SLOW   |
|   I64   |      I32      |          false           |      2^20      |   0.544   |  25.888 us |       3.88% |  25.911 us |       3.77% |   0.023 us |   0.09% |   SAME   |
|   I64   |      I32      |          false           |      2^24      |   0.544   | 212.393 us |       1.00% | 214.354 us |       0.86% |   1.961 us |   0.92% |   SLOW   |
|   I64   |      I32      |          false           |      2^28      |   0.544   |   3.121 ms |       0.15% |   3.121 ms |       0.14% |  -0.020 us |  -0.00% |   SAME   |
|   I64   |      I32      |          false           |      2^16      |     0     |  12.286 us |       0.06% |  12.262 us |       1.36% |  -0.024 us |  -0.19% |   FAST   |
|   I64   |      I32      |          false           |      2^20      |     0     |  25.489 us |       3.99% |  25.041 us |       4.07% |  -0.448 us |  -1.76% |   SAME   |
|   I64   |      I32      |          false           |      2^24      |     0     | 212.038 us |       1.03% | 212.705 us |       0.90% |   0.667 us |   0.31% |   SAME   |
|   I64   |      I32      |          false           |      2^28      |     0     |   3.119 ms |       0.08% |   3.119 ms |       0.09% |   0.159 us |   0.01% |   SAME   |
|   I64   |      I32      |           true           |      2^16      |     1     |  12.274 us |       0.93% |  12.288 us |       0.00% |   0.014 us |   0.12% |   ????   |
|   I64   |      I32      |           true           |      2^20      |     1     |  24.292 us |       3.08% |  24.343 us |       2.84% |   0.051 us |   0.21% |   SAME   |
|   I64   |      I32      |           true           |      2^24      |     1     | 212.773 us |       1.04% | 212.029 us |       1.01% |  -0.744 us |  -0.35% |   SAME   |
|   I64   |      I32      |           true           |      2^28      |     1     |   3.120 ms |       0.09% |   3.120 ms |       0.09% |  -0.082 us |  -0.00% |   SAME   |
|   I64   |      I32      |           true           |      2^16      |   0.544   |  12.288 us |       0.00% |  12.083 us |       3.71% |  -0.205 us |  -1.67% |   ????   |
|   I64   |      I32      |           true           |      2^20      |   0.544   |  24.671 us |       2.67% |  24.612 us |       2.37% |  -0.058 us |  -0.24% |   SAME   |
|   I64   |      I32      |           true           |      2^24      |   0.544   | 214.192 us |       1.02% | 213.957 us |       0.96% |  -0.234 us |  -0.11% |   SAME   |
|   I64   |      I32      |           true           |      2^28      |   0.544   |   3.121 ms |       0.11% |   3.122 ms |       0.12% |   0.157 us |   0.01% |   SAME   |
|   I64   |      I32      |           true           |      2^16      |     0     |  11.846 us |       6.34% |  10.240 us |       0.00% |  -1.606 us | -13.55% |   FAST   |
|   I64   |      I32      |           true           |      2^20      |     0     |  24.435 us |       2.39% |  24.521 us |       2.47% |   0.087 us |   0.36% |   SAME   |
|   I64   |      I32      |           true           |      2^24      |     0     | 212.638 us |       0.93% | 212.487 us |       0.98% |  -0.151 us |  -0.07% |   SAME   |
|   I64   |      I32      |           true           |      2^28      |     0     |   3.120 ms |       0.10% |   3.120 ms |       0.10% |   0.287 us |   0.01% |   SAME   |
|   I64   |      I64      |          false           |      2^16      |     1     |   9.616 us |       9.01% |   8.920 us |      10.89% |  -0.696 us |  -7.24% |   SAME   |
|   I64   |      I64      |          false           |      2^20      |     1     |  23.192 us |       6.19% |  22.990 us |       6.04% |  -0.202 us |  -0.87% |   SAME   |
|   I64   |      I64      |          false           |      2^24      |     1     | 206.067 us |       1.48% | 206.199 us |       1.33% |   0.132 us |   0.06% |   SAME   |
|   I64   |      I64      |          false           |      2^28      |     1     |   3.120 ms |       0.15% |   3.120 ms |       0.15% |  -0.054 us |  -0.00% |   SAME   |
|   I64   |      I64      |          false           |      2^16      |   0.544   |  10.240 us |       0.00% |   9.164 us |      11.08% |  -1.076 us | -10.50% |   FAST   |
|   I64   |      I64      |          false           |      2^20      |   0.544   |  23.286 us |       5.31% |  23.326 us |       5.71% |   0.040 us |   0.17% |   SAME   |
|   I64   |      I64      |          false           |      2^24      |   0.544   | 207.917 us |       1.56% | 207.800 us |       1.61% |  -0.117 us |  -0.06% |   SAME   |
|   I64   |      I64      |          false           |      2^28      |   0.544   |   3.119 ms |       0.12% |   3.119 ms |       0.14% |  -0.043 us |  -0.00% |   SAME   |
|   I64   |      I64      |          false           |      2^16      |     0     |  10.214 us |       1.72% |   8.896 us |      10.16% |  -1.318 us | -12.90% |   FAST   |
|   I64   |      I64      |          false           |      2^20      |     0     |  23.192 us |       5.92% |  22.914 us |       5.12% |  -0.278 us |  -1.20% |   SAME   |
|   I64   |      I64      |          false           |      2^24      |     0     | 205.257 us |       1.43% | 205.607 us |       1.59% |   0.350 us |   0.17% |   SAME   |
|   I64   |      I64      |          false           |      2^28      |     0     |   3.121 ms |       0.13% |   3.121 ms |       0.15% |  -0.177 us |  -0.01% |   SAME   |
|   I64   |      I64      |           true           |      2^16      |     1     |  12.142 us |       3.46% |  10.286 us |       2.88% |  -1.856 us | -15.29% |   FAST   |
|   I64   |      I64      |           true           |      2^20      |     1     |  23.737 us |       4.28% |  23.438 us |       4.43% |  -0.299 us |  -1.26% |   SAME   |
|   I64   |      I64      |           true           |      2^24      |     1     | 210.503 us |       0.86% | 210.316 us |       0.94% |  -0.187 us |  -0.09% |   SAME   |
|   I64   |      I64      |           true           |      2^28      |     1     |   3.114 ms |       0.10% |   3.114 ms |       0.12% |   0.376 us |   0.01% |   SAME   |
|   I64   |      I64      |           true           |      2^16      |   0.544   |  12.112 us |       3.49% |  10.757 us |       8.14% |  -1.355 us | -11.18% |   FAST   |
|   I64   |      I64      |           true           |      2^20      |   0.544   |  23.644 us |       4.60% |  23.560 us |       4.56% |  -0.084 us |  -0.35% |   SAME   |
|   I64   |      I64      |           true           |      2^24      |   0.544   | 211.391 us |       0.93% | 211.482 us |       0.92% |   0.091 us |   0.04% |   SAME   |
|   I64   |      I64      |           true           |      2^28      |   0.544   |   3.117 ms |       0.14% |   3.117 ms |       0.12% |  -0.490 us |  -0.02% |   SAME   |
|   I64   |      I64      |           true           |      2^16      |     0     |  12.286 us |       0.06% |  12.288 us |       0.00% |   0.002 us |   0.01% |   ????   |
|   I64   |      I64      |           true           |      2^20      |     0     |  23.470 us |       4.38% |  23.572 us |       4.38% |   0.102 us |   0.44% |   SAME   |
|   I64   |      I64      |           true           |      2^24      |     0     | 211.073 us |       0.96% | 210.888 us |       0.99% |  -0.185 us |  -0.09% |   SAME   |
|   I64   |      I64      |           true           |      2^28      |     0     |   3.114 ms |       0.10% |   3.113 ms |       0.09% |  -0.476 us |  -0.02% |   SAME   |
|  I128   |      I32      |          false           |      2^16      |     1     |  12.288 us |       0.00% |  10.359 us |       3.02% |  -1.929 us | -15.69% |   ????   |
|  I128   |      I32      |          false           |      2^20      |     1     |  34.565 us |       2.79% |  34.587 us |       3.23% |   0.021 us |   0.06% |   SAME   |
|  I128   |      I32      |          false           |      2^24      |     1     | 389.955 us |       0.53% | 389.878 us |       0.54% |  -0.077 us |  -0.02% |   SAME   |
|  I128   |      I32      |          false           |      2^28      |     1     |   6.059 ms |       0.26% |   6.059 ms |       0.26% |   0.098 us |   0.00% |   SAME   |
|  I128   |      I32      |          false           |      2^16      |   0.544   |  12.128 us |       4.27% |  11.647 us |       8.39% |  -0.481 us |  -3.96% |   SAME   |
|  I128   |      I32      |          false           |      2^20      |   0.544   |  34.817 us |       2.54% |  34.928 us |       2.97% |   0.111 us |   0.32% |   SAME   |
|  I128   |      I32      |          false           |      2^24      |   0.544   | 390.148 us |       0.45% | 390.033 us |       0.43% |  -0.115 us |  -0.03% |   SAME   |
|  I128   |      I32      |          false           |      2^28      |   0.544   |   6.062 ms |       0.06% |   6.061 ms |       0.06% |  -0.873 us |  -0.01% |   SAME   |
|  I128   |      I32      |          false           |      2^16      |     0     |  12.084 us |       4.01% |  12.313 us |       3.77% |   0.229 us |   1.89% |   SAME   |
|  I128   |      I32      |          false           |      2^20      |     0     |  34.532 us |       2.76% |  34.510 us |       3.16% |  -0.021 us |  -0.06% |   SAME   |
|  I128   |      I32      |          false           |      2^24      |     0     | 389.730 us |       0.57% | 390.116 us |       0.57% |   0.386 us |   0.10% |   SAME   |
|  I128   |      I32      |          false           |      2^28      |     0     |   6.061 ms |       0.25% |   6.061 ms |       0.24% |  -0.122 us |  -0.00% |   SAME   |
|  I128   |      I32      |           true           |      2^16      |     1     |  12.398 us |       4.37% |  12.327 us |       4.58% |  -0.071 us |  -0.57% |   SAME   |
|  I128   |      I32      |           true           |      2^20      |     1     |  33.914 us |       3.48% |  33.987 us |       3.30% |   0.073 us |   0.21% |   SAME   |
|  I128   |      I32      |           true           |      2^24      |     1     | 389.791 us |       0.55% | 389.700 us |       0.53% |  -0.091 us |  -0.02% |   SAME   |
|  I128   |      I32      |           true           |      2^28      |     1     |   6.052 ms |       0.31% |   6.051 ms |       0.31% |  -1.953 us |  -0.03% |   SAME   |
|  I128   |      I32      |           true           |      2^16      |   0.544   |  12.162 us |       3.92% |  11.883 us |       6.06% |  -0.280 us |  -2.30% |   SAME   |
|  I128   |      I32      |           true           |      2^20      |   0.544   |  34.126 us |       3.16% |  34.084 us |       3.35% |  -0.042 us |  -0.12% |   SAME   |
|  I128   |      I32      |           true           |      2^24      |   0.544   | 389.981 us |       0.45% | 389.927 us |       0.40% |  -0.054 us |  -0.01% |   SAME   |
|  I128   |      I32      |           true           |      2^28      |   0.544   |   6.061 ms |       0.05% |   6.061 ms |       0.05% |   0.017 us |   0.00% |   SAME   |
|  I128   |      I32      |           true           |      2^16      |     0     |  12.310 us |       4.27% |  12.321 us |       3.44% |   0.011 us |   0.09% |   SAME   |
|  I128   |      I32      |           true           |      2^20      |     0     |  34.002 us |       3.18% |  33.924 us |       3.30% |  -0.078 us |  -0.23% |   SAME   |
|  I128   |      I32      |           true           |      2^24      |     0     | 389.649 us |       0.53% | 389.507 us |       0.51% |  -0.142 us |  -0.04% |   SAME   |
|  I128   |      I32      |           true           |      2^28      |     0     |   6.056 ms |       0.29% |   6.056 ms |       0.28% |   0.150 us |   0.00% |   SAME   |
|  I128   |      I64      |          false           |      2^16      |     1     |  12.155 us |       4.28% |  12.290 us |       4.05% |   0.136 us |   1.12% |   SAME   |
|  I128   |      I64      |          false           |      2^20      |     1     |  34.061 us |       3.45% |  34.170 us |       3.12% |   0.109 us |   0.32% |   SAME   |
|  I128   |      I64      |          false           |      2^24      |     1     | 391.476 us |       0.57% | 391.433 us |       0.54% |  -0.043 us |  -0.01% |   SAME   |
|  I128   |      I64      |          false           |      2^28      |     1     |   6.039 ms |       0.25% |   6.039 ms |       0.25% |   0.309 us |   0.01% |   SAME   |
|  I128   |      I64      |          false           |      2^16      |   0.544   |  12.305 us |       3.47% |  12.296 us |       4.49% |  -0.009 us |  -0.07% |   SAME   |
|  I128   |      I64      |          false           |      2^20      |   0.544   |  35.100 us |       2.71% |  35.159 us |       2.62% |   0.059 us |   0.17% |   SAME   |
|  I128   |      I64      |          false           |      2^24      |   0.544   | 391.858 us |       0.49% | 392.015 us |       0.52% |   0.157 us |   0.04% |   SAME   |
|  I128   |      I64      |          false           |      2^28      |   0.544   |   6.065 ms |       0.08% |   6.064 ms |       0.07% |  -0.158 us |  -0.00% |   SAME   |
|  I128   |      I64      |          false           |      2^16      |     0     |  12.329 us |       4.31% |  12.102 us |       4.57% |  -0.227 us |  -1.84% |   SAME   |
|  I128   |      I64      |          false           |      2^20      |     0     |  33.992 us |       3.24% |  33.992 us |       3.31% |   0.000 us |   0.00% |   SAME   |
|  I128   |      I64      |          false           |      2^24      |     0     | 392.210 us |       0.54% | 392.368 us |       0.50% |   0.158 us |   0.04% |   SAME   |
|  I128   |      I64      |          false           |      2^28      |     0     |   6.048 ms |       0.30% |   6.047 ms |       0.29% |  -1.509 us |  -0.02% |   SAME   |
|  I128   |      I64      |           true           |      2^16      |     1     |  12.244 us |       2.77% |  12.403 us |       4.58% |   0.159 us |   1.30% |   SAME   |
|  I128   |      I64      |           true           |      2^20      |     1     |  34.327 us |       3.11% |  34.262 us |       3.09% |  -0.065 us |  -0.19% |   SAME   |
|  I128   |      I64      |           true           |      2^24      |     1     | 389.152 us |       0.51% | 389.363 us |       0.51% |   0.211 us |   0.05% |   SAME   |
|  I128   |      I64      |           true           |      2^28      |     1     |   6.064 ms |       0.14% |   6.064 ms |       0.14% |   0.061 us |   0.00% |   SAME   |
|  I128   |      I64      |           true           |      2^16      |   0.544   |  12.093 us |       4.60% |  12.230 us |       2.29% |   0.137 us |   1.13% |   SAME   |
|  I128   |      I64      |           true           |      2^20      |   0.544   |  34.141 us |       3.19% |  34.060 us |       3.23% |  -0.081 us |  -0.24% |   SAME   |
|  I128   |      I64      |           true           |      2^24      |   0.544   | 389.622 us |       0.43% | 389.492 us |       0.45% |  -0.130 us |  -0.03% |   SAME   |
|  I128   |      I64      |           true           |      2^28      |   0.544   |   6.062 ms |       0.05% |   6.062 ms |       0.05% |  -0.032 us |  -0.00% |   SAME   |
|  I128   |      I64      |           true           |      2^16      |     0     |  12.301 us |       2.55% |  12.243 us |       3.84% |  -0.057 us |  -0.47% |   SAME   |
|  I128   |      I64      |           true           |      2^20      |     0     |  34.707 us |       2.83% |  34.984 us |       2.81% |   0.277 us |   0.80% |   SAME   |
|  I128   |      I64      |           true           |      2^24      |     0     | 389.060 us |       0.50% | 389.276 us |       0.50% |   0.216 us |   0.06% |   SAME   |
|  I128   |      I64      |           true           |      2^28      |     0     |   6.063 ms |       0.16% |   6.063 ms |       0.13% |   0.476 us |   0.01% |   SAME   |
|   F32   |      I32      |          false           |      2^16      |     1     |  10.165 us |       2.78% |   9.993 us |       4.95% |  -0.172 us |  -1.69% |   SAME   |
|   F32   |      I32      |          false           |      2^20      |     1     |  18.352 us |       2.43% |  18.356 us |       2.63% |   0.004 us |   0.02% |   SAME   |
|   F32   |      I32      |          false           |      2^24      |     1     | 111.766 us |       0.97% | 111.706 us |       1.00% |  -0.059 us |  -0.05% |   SAME   |
|   F32   |      I32      |          false           |      2^28      |     1     |   1.669 ms |       0.16% |   1.669 ms |       0.15% |  -0.058 us |  -0.00% |   SAME   |
|   F32   |      I32      |          false           |      2^16      |   0.544   |  10.218 us |       1.54% |  10.240 us |       0.00% |   0.022 us |   0.22% |   SLOW   |
|   F32   |      I32      |          false           |      2^20      |   0.544   |  18.413 us |       1.51% |  18.312 us |       2.14% |  -0.101 us |  -0.55% |   SAME   |
|   F32   |      I32      |          false           |      2^24      |   0.544   | 113.536 us |       1.03% | 113.424 us |       1.00% |  -0.112 us |  -0.10% |   SAME   |
|   F32   |      I32      |          false           |      2^28      |   0.544   |   1.671 ms |       0.16% |   1.671 ms |       0.15% |   0.202 us |   0.01% |   SAME   |
|   F32   |      I32      |          false           |      2^16      |     0     |  10.020 us |       4.61% |  10.188 us |       2.41% |   0.168 us |   1.68% |   SAME   |
|   F32   |      I32      |          false           |      2^20      |     0     |  18.314 us |       2.86% |  18.222 us |       3.40% |  -0.092 us |  -0.50% |   SAME   |
|   F32   |      I32      |          false           |      2^24      |     0     | 111.373 us |       0.95% | 111.285 us |       1.04% |  -0.088 us |  -0.08% |   SAME   |
|   F32   |      I32      |          false           |      2^28      |     0     |   1.669 ms |       0.17% |   1.669 ms |       0.17% |  -0.145 us |  -0.01% |   SAME   |
|   F32   |      I32      |           true           |      2^16      |     1     |  10.225 us |       1.23% |  10.199 us |       2.14% |  -0.026 us |  -0.25% |   SAME   |
|   F32   |      I32      |           true           |      2^20      |     1     |  19.669 us |       5.47% |  19.848 us |       5.23% |   0.179 us |   0.91% |   SAME   |
|   F32   |      I32      |           true           |      2^24      |     1     | 115.291 us |       1.02% | 115.206 us |       1.02% |  -0.085 us |  -0.07% |   SAME   |
|   F32   |      I32      |           true           |      2^28      |     1     |   1.663 ms |       0.12% |   1.663 ms |       0.11% |   0.091 us |   0.01% |   SAME   |
|   F32   |      I32      |           true           |      2^16      |   0.544   |  10.240 us |       0.00% |  10.065 us |       4.20% |  -0.175 us |  -1.71% |   FAST   |
|   F32   |      I32      |           true           |      2^20      |   0.544   |  20.126 us |       3.97% |  20.219 us |       3.69% |   0.093 us |   0.46% |   SAME   |
|   F32   |      I32      |           true           |      2^24      |   0.544   | 118.191 us |       1.06% | 118.295 us |       1.09% |   0.104 us |   0.09% |   SAME   |
|   F32   |      I32      |           true           |      2^28      |   0.544   |   1.664 ms |       0.11% |   1.664 ms |       0.10% |  -0.185 us |  -0.01% |   SAME   |
|   F32   |      I32      |           true           |      2^16      |     0     |  10.240 us |       0.00% |  10.068 us |       4.10% |  -0.172 us |  -1.68% |   FAST   |
|   F32   |      I32      |           true           |      2^20      |     0     |  19.839 us |       4.85% |  19.834 us |       4.90% |  -0.005 us |  -0.03% |   SAME   |
|   F32   |      I32      |           true           |      2^24      |     0     | 116.937 us |       0.99% | 116.990 us |       0.96% |   0.054 us |   0.05% |   SAME   |
|   F32   |      I32      |           true           |      2^28      |     0     |   1.664 ms |       0.10% |   1.663 ms |       0.10% |  -0.080 us |  -0.00% |   SAME   |
|   F32   |      I64      |          false           |      2^16      |     1     |  10.005 us |       4.97% |  10.240 us |       0.00% |   0.235 us |   2.35% |   SLOW   |
|   F32   |      I64      |          false           |      2^20      |     1     |  18.432 us |       0.00% |  18.451 us |       1.99% |   0.019 us |   0.11% |   SLOW   |
|   F32   |      I64      |          false           |      2^24      |     1     | 112.636 us |       0.86% | 112.585 us |       0.90% |  -0.052 us |  -0.05% |   SAME   |
|   F32   |      I64      |          false           |      2^28      |     1     |   1.656 ms |       0.10% |   1.656 ms |       0.10% |  -0.164 us |  -0.01% |   SAME   |
|   F32   |      I64      |          false           |      2^16      |   0.544   |  10.240 us |       0.00% |  10.240 us |       0.00% |   0.000 us |   0.00% |   SAME   |
|   F32   |      I64      |          false           |      2^20      |   0.544   |  18.582 us |       3.27% |  18.487 us |       2.70% |  -0.095 us |  -0.51% |   SAME   |
|   F32   |      I64      |          false           |      2^24      |   0.544   | 115.196 us |       0.87% | 115.056 us |       0.76% |  -0.140 us |  -0.12% |   SAME   |
|   F32   |      I64      |          false           |      2^28      |   0.544   |   1.659 ms |       0.10% |   1.659 ms |       0.10% |  -0.146 us |  -0.01% |   SAME   |
|   F32   |      I64      |          false           |      2^16      |     0     |  10.168 us |       2.65% |  10.255 us |       3.45% |   0.087 us |   0.86% |   SAME   |
|   F32   |      I64      |          false           |      2^20      |     0     |  18.378 us |       3.68% |  18.413 us |       3.81% |   0.035 us |   0.19% |   SAME   |
|   F32   |      I64      |          false           |      2^24      |     0     | 113.554 us |       1.09% | 113.164 us |       1.00% |  -0.389 us |  -0.34% |   SAME   |
|   F32   |      I64      |          false           |      2^28      |     0     |   1.657 ms |       0.10% |   1.657 ms |       0.10% |   0.160 us |   0.01% |   SAME   |
|   F32   |      I64      |           true           |      2^16      |     1     |  14.317 us |       0.91% |  14.142 us |       3.10% |  -0.176 us |  -1.23% |   FAST   |
|   F32   |      I64      |           true           |      2^20      |     1     |  23.693 us |       4.28% |  23.517 us |       4.32% |  -0.176 us |  -0.74% |   SAME   |
|   F32   |      I64      |           true           |      2^24      |     1     | 121.949 us |       1.07% | 122.151 us |       0.95% |   0.202 us |   0.17% |   SAME   |
|   F32   |      I64      |           true           |      2^28      |     1     |   1.688 ms |       0.09% |   1.688 ms |       0.09% |   0.020 us |   0.00% |   SAME   |
|   F32   |      I64      |           true           |      2^16      |   0.544   |  14.376 us |       5.53% |  14.595 us |       4.85% |   0.219 us |   1.52% |   SAME   |
|   F32   |      I64      |           true           |      2^20      |   0.544   |  24.306 us |       2.83% |  24.227 us |       3.18% |  -0.079 us |  -0.32% |   SAME   |
|   F32   |      I64      |           true           |      2^24      |   0.544   | 124.522 us |       0.95% | 124.587 us |       0.95% |   0.065 us |   0.05% |   SAME   |
|   F32   |      I64      |           true           |      2^28      |   0.544   |   1.688 ms |       0.09% |   1.689 ms |       0.09% |   0.118 us |   0.01% |   SAME   |
|   F32   |      I64      |           true           |      2^16      |     0     |  14.336 us |       0.00% |  14.336 us |       0.00% |   0.000 us |   0.00% |   ????   |
|   F32   |      I64      |           true           |      2^20      |     0     |  22.950 us |       3.81% |  22.867 us |       3.50% |  -0.084 us |  -0.36% |   SAME   |
|   F32   |      I64      |           true           |      2^24      |     0     | 121.651 us |       0.91% | 121.669 us |       0.90% |   0.018 us |   0.01% |   SAME   |
|   F32   |      I64      |           true           |      2^28      |     0     |   1.689 ms |       0.09% |   1.688 ms |       0.09% |  -0.191 us |  -0.01% |   SAME   |
|   F64   |      I32      |          false           |      2^16      |     1     |  12.047 us |       5.39% |  12.288 us |       0.00% |   0.241 us |   2.00% |   ????   |
|   F64   |      I32      |          false           |      2^20      |     1     |  25.024 us |       4.07% |  25.012 us |       3.78% |  -0.012 us |  -0.05% |   SAME   |
|   F64   |      I32      |          false           |      2^24      |     1     | 212.870 us |       0.88% | 212.741 us |       0.84% |  -0.129 us |  -0.06% |   SAME   |
|   F64   |      I32      |          false           |      2^28      |     1     |   3.119 ms |       0.09% |   3.119 ms |       0.09% |   0.191 us |   0.01% |   SAME   |
|   F64   |      I32      |          false           |      2^16      |   0.544   |  12.255 us |       1.52% |  12.068 us |       3.96% |  -0.187 us |  -1.53% |   FAST   |
|   F64   |      I32      |          false           |      2^20      |   0.544   |  25.841 us |       3.91% |  25.739 us |       4.05% |  -0.102 us |  -0.40% |   SAME   |
|   F64   |      I32      |          false           |      2^24      |   0.544   | 213.178 us |       0.94% | 213.023 us |       0.92% |  -0.155 us |  -0.07% |   SAME   |
|   F64   |      I32      |          false           |      2^28      |   0.544   |   3.121 ms |       0.15% |   3.121 ms |       0.15% |   0.031 us |   0.00% |   SAME   |
|   F64   |      I32      |          false           |      2^16      |     0     |  12.074 us |       3.92% |  12.050 us |       3.94% |  -0.024 us |  -0.20% |   SAME   |
|   F64   |      I32      |          false           |      2^20      |     0     |  25.312 us |       3.94% |  25.352 us |       4.00% |   0.039 us |   0.15% |   SAME   |
|   F64   |      I32      |          false           |      2^24      |     0     | 212.873 us |       0.89% | 212.851 us |       0.86% |  -0.022 us |  -0.01% |   SAME   |
|   F64   |      I32      |          false           |      2^28      |     0     |   3.119 ms |       0.10% |   3.119 ms |       0.09% |   0.021 us |   0.00% |   SAME   |
|   F64   |      I32      |           true           |      2^16      |     1     |  12.267 us |       1.25% |  12.281 us |       0.56% |   0.014 us |   0.11% |   SAME   |
|   F64   |      I32      |           true           |      2^20      |     1     |  24.407 us |       2.49% |  24.387 us |       2.92% |  -0.020 us |  -0.08% |   SAME   |
|   F64   |      I32      |           true           |      2^24      |     1     | 212.616 us |       0.91% | 212.632 us |       0.98% |   0.015 us |   0.01% |   SAME   |
|   F64   |      I32      |           true           |      2^28      |     1     |   3.121 ms |       0.09% |   3.120 ms |       0.09% |  -0.220 us |  -0.01% |   SAME   |
|   F64   |      I32      |           true           |      2^16      |   0.544   |  12.288 us |       0.00% |  12.263 us |       1.31% |  -0.025 us |  -0.20% |   ????   |
|   F64   |      I32      |           true           |      2^20      |   0.544   |  24.576 us |       0.00% |  24.612 us |       2.29% |   0.036 us |   0.15% |   ????   |
|   F64   |      I32      |           true           |      2^24      |   0.544   | 212.624 us |       0.94% | 212.688 us |       0.94% |   0.063 us |   0.03% |   SAME   |
|   F64   |      I32      |           true           |      2^28      |   0.544   |   3.122 ms |       0.12% |   3.122 ms |       0.11% |  -0.102 us |  -0.00% |   SAME   |
|   F64   |      I32      |           true           |      2^16      |     0     |  10.240 us |       0.00% |  10.346 us |       2.57% |   0.106 us |   1.03% |   SLOW   |
|   F64   |      I32      |           true           |      2^20      |     0     |  24.319 us |       2.71% |  24.455 us |       2.70% |   0.136 us |   0.56% |   SAME   |
|   F64   |      I32      |           true           |      2^24      |     0     | 212.273 us |       0.92% | 211.859 us |       0.94% |  -0.414 us |  -0.19% |   SAME   |
|   F64   |      I32      |           true           |      2^28      |     0     |   3.120 ms |       0.11% |   3.120 ms |       0.10% |  -0.235 us |  -0.01% |   SAME   |
|   F64   |      I64      |          false           |      2^16      |     1     |  10.003 us |       4.70% |  10.218 us |       1.51% |   0.215 us |   2.15% |   SLOW   |
|   F64   |      I64      |          false           |      2^20      |     1     |  22.839 us |       5.38% |  22.594 us |       5.57% |  -0.245 us |  -1.07% |   SAME   |
|   F64   |      I64      |          false           |      2^24      |     1     | 206.448 us |       1.49% | 206.560 us |       1.37% |   0.112 us |   0.05% |   SAME   |
|   F64   |      I64      |          false           |      2^28      |     1     |   3.119 ms |       0.17% |   3.119 ms |       0.16% |   0.401 us |   0.01% |   SAME   |
|   F64   |      I64      |          false           |      2^16      |   0.544   |   9.813 us |       8.42% |   9.747 us |       8.00% |  -0.066 us |  -0.68% |   SAME   |
|   F64   |      I64      |          false           |      2^20      |   0.544   |  23.043 us |       4.45% |  23.098 us |       4.34% |   0.056 us |   0.24% |   SAME   |
|   F64   |      I64      |          false           |      2^24      |   0.544   | 207.424 us |       1.42% | 207.256 us |       1.45% |  -0.168 us |  -0.08% |   SAME   |
|   F64   |      I64      |          false           |      2^28      |   0.544   |   3.119 ms |       0.13% |   3.119 ms |       0.13% |   0.158 us |   0.01% |   SAME   |
|   F64   |      I64      |          false           |      2^16      |     0     |  10.211 us |       1.79% |  10.240 us |       0.00% |   0.029 us |   0.29% |   SLOW   |
|   F64   |      I64      |          false           |      2^20      |     0     |  22.855 us |       4.50% |  22.740 us |       5.27% |  -0.114 us |  -0.50% |   SAME   |
|   F64   |      I64      |          false           |      2^24      |     0     | 206.420 us |       1.35% | 206.672 us |       1.55% |   0.252 us |   0.12% |   SAME   |
|   F64   |      I64      |          false           |      2^28      |     0     |   3.119 ms |       0.17% |   3.119 ms |       0.16% |   0.280 us |   0.01% |   SAME   |
|   F64   |      I64      |           true           |      2^16      |     1     |  10.275 us |       3.80% |  10.343 us |       4.02% |   0.069 us |   0.67% |   SAME   |
|   F64   |      I64      |           true           |      2^20      |     1     |  23.133 us |       4.55% |  23.265 us |       4.29% |   0.132 us |   0.57% |   SAME   |
|   F64   |      I64      |           true           |      2^24      |     1     | 211.000 us |       0.97% | 210.932 us |       0.91% |  -0.069 us |  -0.03% |   SAME   |
|   F64   |      I64      |           true           |      2^28      |     1     |   3.114 ms |       0.11% |   3.114 ms |       0.10% |  -0.015 us |  -0.00% |   SAME   |
|   F64   |      I64      |           true           |      2^16      |   0.544   |  12.077 us |       3.73% |  12.228 us |       2.29% |   0.151 us |   1.25% |   SAME   |
|   F64   |      I64      |           true           |      2^20      |   0.544   |  23.268 us |       4.24% |  23.185 us |       4.46% |  -0.083 us |  -0.36% |   SAME   |
|   F64   |      I64      |           true           |      2^24      |   0.544   | 211.445 us |       0.97% | 211.663 us |       0.85% |   0.218 us |   0.10% |   SAME   |
|   F64   |      I64      |           true           |      2^28      |   0.544   |   3.116 ms |       0.15% |   3.116 ms |       0.14% |  -0.144 us |  -0.00% |   SAME   |
|   F64   |      I64      |           true           |      2^16      |     0     |  11.980 us |       6.08% |  11.832 us |       6.55% |  -0.148 us |  -1.24% |   SAME   |
|   F64   |      I64      |           true           |      2^20      |     0     |  23.205 us |       4.41% |  23.164 us |       4.10% |  -0.041 us |  -0.18% |   SAME   |
|   F64   |      I64      |           true           |      2^24      |     0     | 210.645 us |       0.87% | 210.782 us |       0.89% |   0.137 us |   0.06% |   SAME   |
|   F64   |      I64      |           true           |      2^28      |     0     |   3.114 ms |       0.12% |   3.113 ms |       0.11% |  -0.126 us |  -0.00% |   SAME   |

# Summary

- Total Matches: 336
  - Pass    (diff <= min_noise): 236
  - Unknown (infinite noise):    15
  - Failure (diff > min_noise):  85
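
For readers unfamiliar with the Status column, here is a minimal sketch of how the three summary buckets above could be derived from one row. It follows only the wording of the summary (diff vs. `min_noise`, infinite noise); the function names `classify` and `summarize`, the use of `None`/infinity for an unusable noise estimate, and taking `min_noise` as the smaller of the two noise columns are assumptions for illustration, not the comparison script's actual code, which may handle edge cases (such as rows showing 0.00% noise) differently.

```python
import math
from collections import Counter


def classify(frac_diff, ref_noise, cmp_noise):
    """Return 'SAME', 'FAST', 'SLOW', or '????' for one benchmark row.

    All arguments are fractions (0.0176 means 1.76%).
    Assumption: a noise of None or infinity means "no usable noise estimate".
    """
    noises = (ref_noise, cmp_noise)
    if any(n is None or math.isinf(n) for n in noises):
        return "????"  # Unknown: the diff cannot be judged against the noise
    min_noise = min(noises)  # assumed: the stricter of the two noise values
    if abs(frac_diff) <= min_noise:
        return "SAME"  # Pass: the diff is within the noise
    # Failure: the diff exceeds the noise, in either direction
    return "FAST" if frac_diff < 0 else "SLOW"


def summarize(statuses):
    """Collapse per-row statuses into the three summary buckets."""
    counts = Counter(statuses)
    return {
        "pass": counts["SAME"],
        "unknown": counts["????"],
        "failure": counts["FAST"] + counts["SLOW"],
    }
```

Note that under this reading "Failure" counts FAST rows as well as SLOW ones, so the 85 failures are not all regressions.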

