MMX instructions

Post by 265 993 303

Windows NT 3.x, Windows 95, and some versions of Windows NT 4.0 do not save and restore the SSE registers for each process. Therefore, SSE cannot be used on these Windows versions, and it is also unavailable on processors that lack SSE. SSE and SSE2 are commonly used for vectorization, that is, performing the same operation on multiple values at once. However, even without SSE support, it is possible to vectorize with MMX instructions. The MMX registers alias the existing x87 floating-point registers, so a Windows version that saves x87 state per process automatically saves MMX state as well. MMX is also older than SSE, so some legacy processors without SSE still support MMX.

Here is an example of how MMX could be used to vectorize a polynomial integer approximation of sRGB to linear conversion:

#include <stdio.h>
#include <stdint.h>

int main(){
 const uint64_t a=0x2A3A2A3A2A3A2A3A; //four packed words of 10810 (0x2A3A)
 const uint64_t b=0x1D001D001D001D00; //four packed words of 7424 (0x1D00)
 const uint64_t c=0x01F201F201F201F2; //four packed words of 498 (0x01F2)
 uint64_t o=0x000000FF00BC0001; //four sRGB inputs, one per word: 00, FF, BC, 01
 uint64_t* z=&o;
 __asm{
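  mov ecx, z          // ecx = address of o
  movq mm0, [ecx]     // load the four input words
  psllw mm0, 7        // x = input << 7 (FF becomes 7F80, still positive as a signed word)
  movq mm1, a
  pmulhw mm1, mm0     // high word of a*x
  paddw mm1, b
  pmulhw mm1, mm0     // high word of (a*x/65536 + b)*x
  paddw mm1, c
  pmulhw mm1, mm0     // final Horner step: linear result in each word
  movq [ecx], mm1     // store the four results back to o
  emms                // leave MMX state so later x87 floating-point code works
 }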
 printf("%016llX ",o);
}

The example above can be compiled with Digital Mars C/C++ Compiler Version 8.57 (available at https://www.digitalmars.com/download/freecompiler.html), which runs on Windows NT 3.5 and up and on Windows 95 and up. In this example, four sRGB channel values (00, FF, BC, 01) are converted to four approximate linear values (0000, 0D60, 06B6, 0001), so the 00 to FF sRGB scale maps to a 0000 to 0D60 linear scale.

sRGB data will typically have the byte order RRGGBBRRGGBBRRGG… or BBGGRRBBGGRRBBGG… or BBGGRR00BBGGRR00…, so the xx00xx00xx00xx00 byte order expected on input can be produced with the PUNPCKLBW/PUNPCKHBW or PAND/PANDN instructions. For the reverse conversion, a lookup table of less than 1 page can map the linear range back to 00 to FF sRGB so that values round-trip.

Using MMX instructions allows four channels to be converted to linear at the same time without any SSE requirement. The resulting linear values can then be added linearly to produce gamma-correct color blending.
