Unable to find any difference between SSE2 and AVX2 in any browser

K4sum1
Lazy Owner
Posts: 885
Joined: 11 Jan 2021, 07:40
Location: ur dads house
OS: Windows 8.1 x64
Has thanked: 549 times
Been thanked: 251 times
United States of America

Post by K4sum1:

For the r3dfox release, I wanted to build x86 SSE2, x64 SSE3, and x64 AVX or AVX2: three builds offering the best performance in their categories.

To choose between AVX and AVX2, I wanted to see which had the best benchmark scores. I started by comparing the SSE3, SSE4, AVX, and AVX2 builds of Mercury 115 ESR. I benchmarked them with JetStream 2.1, MotionMark 1.2 (hardware and software rendering), and Speedometer 3.0 on a machine I had lying around, a Lenovo ThinkCentre M93p Tiny with a Core i7-4765T.

What I discovered was basically no difference; if anything, AVX2 was slightly worse.

This then turned into testing various instruction sets across several browsers: stock Firefox as a sanity check, Waterfox for SSSE3 (not SSE3), Pale Moon, New Moon, Supermium, and Thorium.

I still couldn't see any sizeable difference between SSE2/3 and AVX/2.

So I repeated the testing on Windows 7 SP0, which predates the AVX support added in SP1. That forces a scenario where no AVX code can run, in case browsers were automatically taking advantage of it at runtime.
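Why SP0 blocks AVX even on an AVX2-capable CPU: as I understand it, the OS has to enable saving of the wider YMM registers, and software is expected to check that before using AVX. Below is a minimal Python sketch of that check. Python can't execute CPUID or XGETBV itself, so the raw register values are passed in as integers; the function names and sample values are my own, not taken from any browser. Bit positions follow Intel's documentation: CPUID leaf 1 ECX bit 27 (OSXSAVE) and bit 28 (AVX), leaf 7 EBX bit 5 (AVX2), and XCR0 bits 1 and 2 (SSE and AVX register state).

```python
# Bit positions from Intel's CPUID/XCR0 documentation.
OSXSAVE_BIT = 1 << 27   # CPUID.1:ECX - OS has enabled XSAVE/XGETBV
AVX_BIT = 1 << 28       # CPUID.1:ECX - CPU implements AVX
AVX2_BIT = 1 << 5       # CPUID.(EAX=7,ECX=0):EBX - CPU implements AVX2
XCR0_SSE_AVX = 0b110    # XCR0 bit 1 (XMM state) and bit 2 (YMM state)


def avx_usable(leaf1_ecx: int, xcr0: int) -> bool:
    """AVX is safe only if the CPU has it AND the OS saves YMM state."""
    # Without OSXSAVE, XGETBV can't be executed, so xcr0 is never consulted.
    if not (leaf1_ecx & AVX_BIT and leaf1_ecx & OSXSAVE_BIT):
        return False
    return (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX


def avx2_usable(leaf1_ecx: int, leaf7_ebx: int, xcr0: int) -> bool:
    """AVX2 needs everything AVX needs, plus its own CPUID feature bit."""
    return avx_usable(leaf1_ecx, xcr0) and bool(leaf7_ebx & AVX2_BIT)


# Illustrative register values, not read from a real machine.
cpu_ecx = AVX_BIT | OSXSAVE_BIT
print(avx_usable(cpu_ecx, xcr0=0b111))  # True: OS enabled x87, SSE and AVX state
print(avx_usable(AVX_BIT, xcr0=0))      # False: OS never set OSXSAVE
```

A browser or library that dispatches this way would quietly fall back to its SSE paths on SP0, which is what makes it a useful control.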

I still found no difference.

The only place I found instruction sets making a difference was New Moon, where the IA-32 and SSE(1) builds scored about half as well as the SSE2 build in software-rendered MotionMark. Hardware-accelerated MotionMark was the same, and JetStream and Speedometer showed no difference in score with either software or hardware rendering. I found no other major difference in any browser.

The results

Firefox or Gecko based browsers:
This includes Firefox 115 ESR (FX), Mercury 115 (M), and Waterfox G6 (WFG). Tested with JetStream 2.1 (JS), MotionMark 1.2 (MM), and Speedometer 3.0 (SP). MotionMark was tested with software WebRender, with the hardware WebRender score in parentheses.

FX115.9.1 x86
JS2.1 64
MM1.2 178 (278 hw)
SP3 5.6

FX115.9.1
JS2.1 75
MM1.2 180 (351 hw)
SP3 5.7

M115.9 SSE3
JS2.1 75
MM1.2 196 (350 hw)
SP3 5.3

M115.9 SSE4
JS2.1 76
MM1.2 201 (353 hw)
SP3 5.3

M115.9 AVX
JS2.1 76
MM1.2 196 (340 hw)
SP3 5.3

M115.9 AVX2
JS2.1 76
MM1.2 194 (335 hw)
SP3 5.3

WFG6.0.11 SSSE3
JS2.1 75
MM1.2 192 (339 hw)
SP3 5.6

When it comes to instruction sets, the differences in Mercury going from SSE3 to AVX2 seem to be within the margin of error. Hardware WebRender shows a small difference, with SSE4 scoring highest, but that work should mostly run on the GPU rather than the CPU, so I assume it's run-to-run variance.
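To put numbers on "margin of error", here's a small Python sketch that computes each Mercury build's percent change from the SSE3 build, using the scores posted above. These are single runs, so it only shows how big the gaps are, not whether they're statistically meaningful.

```python
# Mercury 115.9 scores from the results above (higher is better).
mercury = {
    "SSE3": {"JS2.1": 75, "MM1.2": 196, "MM1.2 hw": 350, "SP3": 5.3},
    "SSE4": {"JS2.1": 76, "MM1.2": 201, "MM1.2 hw": 353, "SP3": 5.3},
    "AVX": {"JS2.1": 76, "MM1.2": 196, "MM1.2 hw": 340, "SP3": 5.3},
    "AVX2": {"JS2.1": 76, "MM1.2": 194, "MM1.2 hw": 335, "SP3": 5.3},
}


def pct_change(new: float, base: float) -> float:
    """Percent change from base to new; positive means a higher score."""
    return (new - base) / base * 100.0


baseline = mercury["SSE3"]
for build, scores in mercury.items():
    deltas = {k: round(pct_change(v, baseline[k]), 1) for k, v in scores.items()}
    print(build, deltas)
```

Every delta lands within about 5% of SSE3, and the largest one is hardware MotionMark, which is the GPU-bound case.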

When it comes to browsers, going from x86 to x64 Firefox shows some improvement in JetStream and in hardware-rendered MotionMark. Going from Firefox to Mercury shows an improvement in software-rendered MotionMark, but a slight decrease in Speedometer. Waterfox does slightly better than Firefox in software MotionMark, but is otherwise the same. Mercury is compiled with MSVC, and I know Waterfox is compiled with MinGW instead. I wonder if Firefox is compiled with MinGW too, as that would explain the Speedometer differences. I don't think the SSSE3 requirement for Waterfox does anything. It seems like a worse requirement than AVX, since there's even less potential for performance gains, and like AVX it breaks compatibility with a lot of CPUs, such as AMD CPUs before FX.

Pale Moon or Goanna based browsers:
This includes Pale Moon (PM), its AVX and AVX2 builds, and New Moon (NM). Tested with JetStream 1.1, MotionMark 1.2, and Speedometer 2.1, because the newer versions didn't run. MotionMark was tested with basic software and hardware rendering, since Goanna does not have WebRender; the hardware score is in parentheses.

PM33.0.2 SSE2
JS1.1 98
MM1.2 70 (119 hw)
SP2.1 21.2

PM33.0.2 AVX
JS1.1 98
MM1.2 67 (123 hw)
SP2.1 21.1

PM33.0.2 AVX2
JS1.1 100
MM1.2 75 (119 hw)
SP2.1 21.1

NM28 (24-03-23) IA-32
JS1.1 129
MM1.2 57 (146 hw)
SP2.1 27.0

NM28 (24-03-23) SSE
JS1.1 128
MM1.2 56 (145 hw)
SP2.1 27.0

NM28 (24-03-23) SSE2
JS1.1 130
MM1.2 114 (145 hw)
SP2.1 27.1

NM28 (24-03-23) SSE2 x64
JS1.1 112
MM1.2 140 (161 hw)
SP2.1 26.6

The differences in Pale Moon going from SSE2 to AVX2 seem to be within the margin of error. JetStream and Speedometer are almost the same. MotionMark does show a slight bump with AVX2, but it isn't consistent with earlier results, and AVX(1) scores lower, so I assume it's margin of error. New Moon IA-32 and SSE do considerably better than Pale Moon in JetStream and Speedometer, but score lower in software-rendered MotionMark, while their hardware-rendered scores match New Moon SSE2. I assume New Moon has some optimizations or is built with a better mozconfig. SSE2 doubles the software MotionMark score, while the other two benchmarks stay within the margin of error. With SSE2 x64, the JetStream and Speedometer scores dip but MotionMark goes up somewhat, which is interesting. Firefox x86 and x64 don't show this. Perhaps Goanna or its Firefox 52 base is better optimized for x86.
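The New Moon ratios can be checked with a few lines of Python. This sketch uses the x86 and x64 scores posted above; the `ratio` helper and the metric indices are just for illustration.

```python
# New Moon 28 (24-03-23) results from above: (JS1.1, MM1.2 software, SP2.1).
JS, MM_SW, SP = 0, 1, 2
new_moon = {
    "IA-32": (129, 57, 27.0),
    "SSE": (128, 56, 27.0),
    "SSE2": (130, 114, 27.1),
    "SSE2 x64": (112, 140, 26.6),
}


def ratio(build: str, baseline: str, metric: int) -> float:
    """How many times the baseline build's score the given build achieves."""
    return new_moon[build][metric] / new_moon[baseline][metric]


print(f"SSE2 vs IA-32, software MM: {ratio('SSE2', 'IA-32', MM_SW):.2f}x")
print(f"SSE2 vs SSE, software MM:   {ratio('SSE2', 'SSE', MM_SW):.2f}x")
print(f"x64 vs x86 SSE2, JS1.1:     {ratio('SSE2 x64', 'SSE2', JS):.2f}x")
print(f"x64 vs x86 SSE2, SP2.1:     {ratio('SSE2 x64', 'SSE2', SP):.2f}x")
print(f"x64 vs x86 SSE2, software MM: {ratio('SSE2 x64', 'SSE2', MM_SW):.2f}x")
```

The SSE2 software MotionMark score is exactly twice the IA-32 one, while x64 gives up roughly 14% in JetStream 1.1 and gains about 23% in software MotionMark.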

Chromium or Blink based browsers:
This includes Supermium (Sup) and Thorium (Thor). Supermium only has one x64 build, but I included it to recheck an earlier comparison between it and Thorium. Tested with JetStream 2.1, MotionMark 1.2, and Speedometer 3.0. MotionMark was tested with software and hardware rendering, with the hardware score in parentheses.

Sup122.0.6261.85
JS2.1 129
MM1.2 360 (465 hw)
SP3 8.8

Thor122.0.6261.132 SSE2 x86
JS2.1 111 (Slow load?)
MM1.2 292 (416 hw)
SP3 7.4

Thor122.0.6261.132 SSE3
JS2.1 129
MM1.2 328 (436 hw)
SP3 7.8

Thor122.0.6261.132 AVX
JS2.1 125
MM1.2 338 (440 hw)
SP3 7.8

Thor122.0.6261.132 AVX2
JS2.1 127
MM1.2 332 (411 hw)
SP3 7.6

I wanted to test Supermium because I remembered it being faster than Thorium in earlier testing. It still appears to be, which makes no sense, since supporting XP, progwrp.dll, and other aspects should in theory make it slower. I assume optimizations are being hidden from the Thorium developers to make Supermium look better. Monetization and the closed-source progwrp.dll are reasons to avoid Supermium; unless progwrp.dll is open sourced, I can't recommend it. Going from Thorium SSE2 x86 to SSE3 x64 shows some improvement in all three benchmarks, likely from being 64-bit. The differences from SSE3 to AVX2 seem to be within the margin of error, just like Mercury and Pale Moon.
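For the Thorium comparison, this Python sketch computes percent changes between builds from the scores above. It's one run per build, so a single large delta (like the AVX2 hardware MotionMark dip) could still be noise.

```python
# Thorium 122.0.6261.132 results from above: (JS2.1, MM1.2 sw, MM1.2 hw, SP3).
METRICS = ("JS2.1", "MM1.2", "MM1.2 hw", "SP3")
thorium = {
    "SSE2 x86": (111, 292, 416, 7.4),
    "SSE3": (129, 328, 436, 7.8),
    "AVX": (125, 338, 440, 7.8),
    "AVX2": (127, 332, 411, 7.6),
}


def pct_deltas(new: str, base: str) -> dict[str, float]:
    """Percent change of each benchmark from the base build to the new one."""
    return {
        m: round((n - b) / b * 100, 1)
        for m, n, b in zip(METRICS, thorium[new], thorium[base])
    }


print("SSE2 x86 -> SSE3 x64:", pct_deltas("SSE3", "SSE2 x86"))
print("SSE3 -> AVX:", pct_deltas("AVX", "SSE3"))
print("SSE3 -> AVX2:", pct_deltas("AVX2", "SSE3"))
```

The x86 to x64 step gains in every benchmark, while SSE3 to AVX/AVX2 moves in both directions.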

So I picked builds from each category and tested again under Windows 7 SP0, where no AVX code can run. Apart from Firefox x86, the newer browsers had no hardware acceleration, and it couldn't be force-enabled.

Gecko/Firefox:

FX115.9.1 x86
JS2.1 70
MM1.2 162 (284 hw)
SP3 5.9

FX115.9.1
JS2.1 76
MM1.2 189
SP3 6.05

M115.9 SSE3
JS2.1 77.5
MM1.2 189
SP3 5.6

WFG6.0.11 SSSE3
JS2.1 77
MM1.2 196
SP3 5.8

Goanna/UXP:

PM33.0.2 SSE2
JS1.1 106
MM1.2 72 (96 hw)
SP2.1 21.7

NM28 (24-03-23) SSE2
JS1.1 141
MM1.2 115 (146 hw)
SP2.1 27.6

NM28 (24-03-23) SSE2 x64
JS1.1 122
MM1.2 134 (169 hw)
SP2.1 26.9

Blink/Chromium:

Sup122.0.6261.85
JS2.1 130
MM1.2 316
SP3 8.4

Thor122.0.6261.132 SSE3
JS2.1 132
MM1.2 303
SP3 8.6

Results are generally the same, but if anything Windows 7 SP0 performs slightly better. I assume that's because an install with no updates uses fewer resources, but I would need to test more to be sure. This makes me think there's nothing in a browser that benefits from more than SSE2.
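A quick Python sketch comparing each build's Windows 7 SP0 run with its earlier run, using the scores posted above and assuming both sets were run on the same ThinkCentre. Each build is compared only against itself, since JetStream and Speedometer versions differ between engines; MotionMark here is the software-rendered score in both runs.

```python
# (JetStream, MotionMark software, Speedometer): earlier run -> Windows 7 SP0.
runs = {
    "FX115.9.1 x86": ((64, 178, 5.6), (70, 162, 5.9)),
    "FX115.9.1": ((75, 180, 5.7), (76, 189, 6.05)),
    "M115.9 SSE3": ((75, 196, 5.3), (77.5, 189, 5.6)),
    "WFG6.0.11 SSSE3": ((75, 192, 5.6), (77, 196, 5.8)),
    "PM33.0.2 SSE2": ((98, 70, 21.2), (106, 72, 21.7)),
    "NM28 SSE2": ((130, 114, 27.1), (141, 115, 27.6)),
    "NM28 SSE2 x64": ((112, 140, 26.6), (122, 134, 26.9)),
    "Sup122.0.6261.85": ((129, 360, 8.8), (130, 316, 8.4)),
    "Thor122.0.6261.132 SSE3": ((129, 328, 7.8), (132, 303, 8.6)),
}

for build, (before, sp0) in runs.items():
    # Percent change of the SP0 run over the earlier run, per benchmark.
    d = [round((a - b) / b * 100, 1) for b, a in zip(before, sp0)]
    print(f"{build:24} JS {d[0]:+5.1f}%  MM {d[1]:+5.1f}%  SP {d[2]:+5.1f}%")

faster_js = sum(sp0[0] > before[0] for before, sp0 in runs.values())
faster_sp = sum(sp0[2] > before[2] for before, sp0 in runs.values())
print(f"JetStream higher on SP0 for {faster_js} of {len(runs)} builds")
print(f"Speedometer higher on SP0 for {faster_sp} of {len(runs)} builds")
```

JetStream is higher on SP0 for every build, but software MotionMark moves both ways, notably downward for the two Blink builds.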

While experimenting, I found one scenario I haven't tested where AVX2 could show an improvement: AV1 decoding.
https://www.phoronix.com/news/AVX2-dav1d-0.9-Benchmarks
However, this only seems to matter for high-resolution 10-bit content decoded at hundreds of FPS, which is unrealistic for browser playback. I also assume the AVX2 path is an alternative that's only used when available, since I doubt browser developers want to ship two versions of their AV1 decoder, one of which would fall increasingly out of date.
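The "alternative code path" idea can be sketched as a dispatch table: a program ships several implementations of the same routine and picks the best one the CPU supports at startup, falling back to portable code otherwise. As far as I know dav1d selects its SIMD paths at runtime in this spirit, but the kernel names, feature strings, and table below are hypothetical stand-ins, and the "SIMD" versions are ordinary Python that just return the same result.

```python
from typing import Callable, Sequence

Kernel = Callable[[Sequence[int]], int]


def sum_generic(xs: Sequence[int]) -> int:
    """Portable baseline; always available."""
    return sum(xs)


def sum_sse2(xs: Sequence[int]) -> int:
    """Hypothetical stand-in for an SSE2-optimized version."""
    return sum(xs)


def sum_avx2(xs: Sequence[int]) -> int:
    """Hypothetical stand-in for an AVX2-optimized version."""
    return sum(xs)


# Ordered best-first; each entry lists the CPU features it requires.
KERNELS: list[tuple[frozenset[str], str, Kernel]] = [
    (frozenset({"avx2"}), "avx2", sum_avx2),
    (frozenset({"sse2"}), "sse2", sum_sse2),
    (frozenset(), "generic", sum_generic),
]


def select_kernel(cpu_features: set[str]) -> tuple[str, Kernel]:
    """Pick the first (fastest) kernel whose requirements are all present."""
    for required, name, fn in KERNELS:
        if required <= cpu_features:
            return name, fn
    raise RuntimeError("no usable kernel")  # unreachable: generic needs nothing


name, kernel = select_kernel({"sse2", "sse3", "avx", "avx2"})
print(name, kernel([1, 2, 3]))
```

Because every path computes the same result, one binary serves all CPUs. On an OS without AVX support the feature set would simply not include `avx2`, and the SSE2 path would be chosen.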

Other than this, I can't find any notable performance difference between SSE2 and AVX2. If I've missed something, if you can consistently reproduce a speed increase, or if you know a benchmark that shows an improvement, please let me know.
I don't know what I'm doing hit album by Brad Sucks

Post by K4sum1:

Bonus testing:

This was me testing tuning settings. Tune and ztune are settings that optimize for the Athlon 64, and I wanted to see whether they benefit only the Athlon 64 or other processors too. These scores aren't directly comparable to the ones above because I didn't compile these builds with PGO.

Pentium 4 524

r3d stock
JS1.1 34.8
MM1.2 1.30
SP2.1 15.6

r3d tune
JS1.1 40.9
MM1.2 1.52
SP2.1 15.9

r3d ztune
JS1.1 41.3
MM1.2 1.62
SP2.1 15.8

Athlon 64 3400+ (SSE3 model)

r3d stock
JS1.1 47.5
MM1.2 1.26
SP2.1 24.3

r3d tune
JS1.1 48.7 (took three tries to finish)
MM1.2 1.26
SP2.1 24.3

r3d ztune
JS1.1 48.6
MM1.2 1.33
SP2.1 24.3

Core 2 Duo E6300

r3d stock
JS1.1 53
MM1.2 75
SP2.1 34.6

r3d tune
JS1.1 54
MM1.2 77
SP2.1 34.8

r3d ztune
JS1.1 55
MM1.2 75
SP2.1 34.8

Core i7-4765T

r3d stock
JS2.1 84
MM1.2 186
SP3 6.3

r3d tune
JS2.1 84
MM1.2 185
SP3 6.5

r3d ztune
JS2.1 83
MM1.2 182
SP3 6.6

In the worst case, the difference is within the margin of error. In the best case, there's a sizeable improvement, on a Pentium 4 of all things. Seeing that, I went ahead and built the final release with the Athlon 64 (a64) tuning.
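The bonus results can be summarized as percent changes over each CPU's stock build with a small Python sketch like this. The Core i7-4765T rows use JetStream 2.1 and Speedometer 3 while the older CPUs use JetStream 1.1 and Speedometer 2.1, so only compare within a CPU.

```python
# (JetStream, MotionMark, Speedometer) per build, from the bonus results above.
results = {
    "P4 524": {"stock": (34.8, 1.30, 15.6), "tune": (40.9, 1.52, 15.9), "ztune": (41.3, 1.62, 15.8)},
    "A64 3400+": {"stock": (47.5, 1.26, 24.3), "tune": (48.7, 1.26, 24.3), "ztune": (48.6, 1.33, 24.3)},
    "C2D E6300": {"stock": (53, 75, 34.6), "tune": (54, 77, 34.8), "ztune": (55, 75, 34.8)},
    "4765T": {"stock": (84, 186, 6.3), "tune": (84, 185, 6.5), "ztune": (83, 182, 6.6)},
}
NAMES = ("JetStream", "MotionMark", "Speedometer")


def gain(cpu: str, variant: str, metric: int) -> float:
    """Percent change of a tuned build over the stock build on the same CPU."""
    base = results[cpu]["stock"][metric]
    return (results[cpu][variant][metric] - base) / base * 100


for cpu in results:
    for variant in ("tune", "ztune"):
        summary = ", ".join(
            f"{n} {gain(cpu, variant, i):+.1f}%" for i, n in enumerate(NAMES)
        )
        print(f"{cpu:10} {variant:5} {summary}")
```

The Pentium 4 is the outlier, gaining roughly 19% in JetStream and 25% in MotionMark with ztune, while the 4765T changes by a few percent either way.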

luk3Z
Posts: 80
Joined: 10 Dec 2021, 19:23
Location: Metavira
OS: XP/Vista/W7 x64, MX
Has thanked: 7 times
Been thanked: 42 times
Micronesia

Post by luk3Z:

There was an update for Windows 7 through 10 called "Microcode update for Intel processors" that degraded performance on some CPUs.

https://duckduckgo.com/?q=intel+CPU+microcode+updates+windows+7&va=k&ia=web
https://searxng.site/searxng/search?q=windows+critical+update+KB+CPU+microcode
https://www.howtogeek.com/781308/intel-cpu-are-getting-mystery-critical-security-updates/

Ask an AI about "intel CPU microcode windows updates and CPU performance decrease" for more information:
https://gemini.google.com/app
https://chat.openai.com
These updates are typically aimed at addressing security vulnerabilities such as Spectre and Meltdown.
This resulted in a performance decrease, especially on older Intel CPUs (pre-2016). Benchmarks showed single-digit slowdowns for newer CPUs and potentially more noticeable slowdowns for older ones.
That's why W7 SP0 is faster than W7 SP1. Unfortunately, I don't know which KB was associated with this, but it should be easy to find.
Last edited by luk3Z on 07 Apr 2024, 12:08, edited 1 time in total.

Post by K4sum1:

I looked at my 2020 Windows 7 ISO, and it appears to have microcode from 7 SP1. Unless late 2010 microcode has some sort of performance decrease, it shouldn't be that.
