MMX(TM) Technology Packed Arithmetic Intrinsics

The prototypes for MMX(TM) technology intrinsics are in the mmintrin.h header file.

Packed Arithmetic Intrinsics, Part 1

Intrinsic Name  Alternate Name  Corresponding Instruction  Operation       Signed
_m_paddb        _mm_add_pi8     PADDB                      Addition        --
_m_paddw        _mm_add_pi16    PADDW                      Addition        --
_m_paddd        _mm_add_pi32    PADDD                      Addition        --
_m_paddsb       _mm_adds_pi8    PADDSB                     Addition        Yes
_m_paddsw       _mm_adds_pi16   PADDSW                     Addition        Yes
_m_paddusb      _mm_adds_pu8    PADDUSB                    Addition        No
_m_paddusw      _mm_adds_pu16   PADDUSW                    Addition        No
_m_psubb        _mm_sub_pi8     PSUBB                      Subtraction     --
_m_psubw        _mm_sub_pi16    PSUBW                      Subtraction     --
_m_psubd        _mm_sub_pi32    PSUBD                      Subtraction     --
_m_psubsb       _mm_subs_pi8    PSUBSB                     Subtraction     Yes
_m_psubsw       _mm_subs_pi16   PSUBSW                     Subtraction     Yes
_m_psubusb      _mm_subs_pu8    PSUBUSB                    Subtraction     No
_m_psubusw      _mm_subs_pu16   PSUBUSW                    Subtraction     No
_m_pmaddwd      _mm_madd_pi16   PMADDWD                    Multiplication  --
_m_pmulhw       _mm_mulhi_pi16  PMULHW                     Multiplication  Yes
_m_pmullw       _mm_mullo_pi16  PMULLW                     Multiplication  --

Packed Arithmetic Intrinsics, Part 2

Intrinsic Name  Alternate Name  Corresponding Instruction  Argument Values/Bits  Result Values/Bits
_m_paddb        _mm_add_pi8     PADDB                      8/8                   8/8
_m_paddw        _mm_add_pi16    PADDW                      4/16                  4/16
_m_paddd        _mm_add_pi32    PADDD                      2/32                  2/32
_m_paddsb       _mm_adds_pi8    PADDSB                     8/8                   8/8
_m_paddsw       _mm_adds_pi16   PADDSW                     4/16                  4/16
_m_paddusb      _mm_adds_pu8    PADDUSB                    8/8                   8/8
_m_paddusw      _mm_adds_pu16   PADDUSW                    4/16                  4/16
_m_psubb        _mm_sub_pi8     PSUBB                      8/8                   8/8
_m_psubw        _mm_sub_pi16    PSUBW                      4/16                  4/16
_m_psubd        _mm_sub_pi32    PSUBD                      2/32                  2/32
_m_psubsb       _mm_subs_pi8    PSUBSB                     8/8                   8/8
_m_psubsw       _mm_subs_pi16   PSUBSW                     4/16                  4/16
_m_psubusb      _mm_subs_pu8    PSUBUSB                    8/8                   8/8
_m_psubusw      _mm_subs_pu16   PSUBUSW                    4/16                  4/16
_m_pmaddwd      _mm_madd_pi16   PMADDWD                    4/16                  2/32
_m_pmulhw       _mm_mulhi_pi16  PMULHW                     4/16                  4/16 (high)
_m_pmullw       _mm_mullo_pi16  PMULLW                     4/16                  4/16 (low)

__m64 _m_paddb(__m64 m1, __m64 m2)

Add the eight 8-bit values in m1 to the eight 8-bit values in m2.

__m64 _m_paddw(__m64 m1, __m64 m2)

Add the four 16-bit values in m1 to the four 16-bit values in m2.

__m64 _m_paddd(__m64 m1, __m64 m2)

Add the two 32-bit values in m1 to the two 32-bit values in m2.
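As an illustration of the non-saturating additions above, the following is a minimal sketch rather than a definitive implementation. The helper name add_bytes_wrap is hypothetical. The sketch assumes the compiler defines __MMX__ when MMX code generation is enabled (as GCC and Clang do), uses memcpy to move data in and out of __m64, and calls _mm_empty (also declared in mmintrin.h, but not described in this section) before returning. Without MMX it falls back to a scalar loop that produces the same per-lane result.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper (not part of mmintrin.h): adds eight bytes lane by
   lane. Each lane wraps modulo 256; no carry crosses into its neighbor. */
void add_bytes_wrap(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
#if defined(__MMX__)
    __m64 m1, m2, sum;
    memcpy(&m1, a, sizeof m1);      /* load eight 8-bit lanes */
    memcpy(&m2, b, sizeof m2);
    sum = _m_paddb(m1, m2);         /* PADDB */
    memcpy(out, &sum, sizeof sum);
    _mm_empty();                    /* EMMS: leave the MMX state */
#else
    /* Scalar fallback with the same per-lane result. */
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)(a[i] + b[i]);
#endif
}
```

With this sketch, a lane holding 250 plus a lane holding 10 yields 4 (260 modulo 256), and the neighboring lanes are unaffected.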

__m64 _m_paddsb(__m64 m1, __m64 m2)

Add the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 using saturating arithmetic.

__m64 _m_paddsw(__m64 m1, __m64 m2)

Add the four signed 16-bit values in m1 to the four signed 16-bit values in m2 using saturating arithmetic.
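Signed saturation can be sketched as follows. This is an illustrative example under stated assumptions, not the only way to call the intrinsic: add_words_sat is a hypothetical helper, the __MMX__ macro is assumed to be defined when MMX is available, and the scalar branch is a reference model of what each 16-bit lane is expected to hold.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: signed saturating add of four 16-bit lanes.
   Sums above 32767 clamp to 32767; sums below -32768 clamp to -32768. */
void add_words_sat(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#if defined(__MMX__)
    __m64 m1, m2, sum;
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    sum = _m_paddsw(m1, m2);        /* PADDSW */
    memcpy(out, &sum, sizeof sum);
    _mm_empty();                    /* EMMS, declared in mmintrin.h */
#else
    /* Scalar reference model of one lane at a time. */
    for (int i = 0; i < 4; ++i) {
        int32_t s = (int32_t)a[i] + b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
#endif
}
```

For example, 30000 + 10000 gives 32767 and -30000 + -10000 gives -32768, where _m_paddw would instead wrap around.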

__m64 _m_paddusb(__m64 m1, __m64 m2)

Add the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 using saturating arithmetic.

__m64 _m_paddusw(__m64 m1, __m64 m2)

Add the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 using saturating arithmetic.
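A common use of unsigned saturating addition is adjusting 8-bit sample data without overflow. The sketch below is one possible way to apply _m_paddusb to a buffer of arbitrary length. The names brighten and add_u8_sat are hypothetical, the __MMX__ check and the memcpy-based loads and stores are assumptions of this example, and _mm_empty comes from mmintrin.h but is not covered in this section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Scalar model of one PADDUSB lane: clamp the sum at 255. */
static uint8_t add_u8_sat(uint8_t x, uint8_t y)
{
    unsigned s = (unsigned)x + y;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Hypothetical helper: dst[i] = min(255, src[i] + delta) for n bytes. */
void brighten(uint8_t *dst, const uint8_t *src, size_t n, uint8_t delta)
{
    size_t i = 0;
#if defined(__MMX__)
    uint8_t d[8];
    __m64 md;
    memset(d, delta, sizeof d);     /* delta replicated into all 8 lanes */
    memcpy(&md, d, sizeof md);
    for (; i + 8 <= n; i += 8) {    /* eight bytes per iteration */
        __m64 v;
        memcpy(&v, src + i, sizeof v);
        v = _m_paddusb(v, md);      /* PADDUSB */
        memcpy(dst + i, &v, sizeof v);
    }
    _mm_empty();                    /* EMMS: leave the MMX state */
#endif
    for (; i < n; ++i)              /* tail, or everything without MMX */
        dst[i] = add_u8_sat(src[i], delta);
}
```

The scalar loop handles the last n modulo 8 bytes, so the buffer length does not need to be a multiple of eight.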

__m64 _m_psubb(__m64 m1, __m64 m2)

Subtract the eight 8-bit values in m2 from the eight 8-bit values in m1.

__m64 _m_psubw(__m64 m1, __m64 m2)

Subtract the four 16-bit values in m2 from the four 16-bit values in m1.

__m64 _m_psubd(__m64 m1, __m64 m2)

Subtract the two 32-bit values in m2 from the two 32-bit values in m1.

__m64 _m_psubsb(__m64 m1, __m64 m2)

Subtract the eight signed 8-bit values in m2 from the eight signed 8-bit values in m1 using saturating arithmetic.

__m64 _m_psubsw(__m64 m1, __m64 m2)

Subtract the four signed 16-bit values in m2 from the four signed 16-bit values in m1 using saturating arithmetic.

__m64 _m_psubusb(__m64 m1, __m64 m2)

Subtract the eight unsigned 8-bit values in m2 from the eight unsigned 8-bit values in m1 using saturating arithmetic.

__m64 _m_psubusw(__m64 m1, __m64 m2)

Subtract the four unsigned 16-bit values in m2 from the four unsigned 16-bit values in m1 using saturating arithmetic.
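To contrast plain and saturating subtraction, the following sketch computes both results for the same inputs. It is illustrative only: sub_bytes is a hypothetical helper, the __MMX__ macro is assumed to signal MMX availability, and the scalar branch models the expected per-lane behavior.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: computes a - b per byte lane two ways.
   wrap[] holds the modulo-256 result (PSUBB);
   sat[] holds the unsigned saturating result (PSUBUSB), floored at 0. */
void sub_bytes(const uint8_t a[8], const uint8_t b[8],
               uint8_t wrap[8], uint8_t sat[8])
{
#if defined(__MMX__)
    __m64 m1, m2, w, s;
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    w = _m_psubb(m1, m2);           /* m1 - m2, wrapping */
    s = _m_psubusb(m1, m2);         /* m1 - m2, clamped at 0 */
    memcpy(wrap, &w, sizeof w);
    memcpy(sat, &s, sizeof s);
    _mm_empty();                    /* EMMS, declared in mmintrin.h */
#else
    for (int i = 0; i < 8; ++i) {
        wrap[i] = (uint8_t)(a[i] - b[i]);
        sat[i]  = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : 0);
    }
#endif
}
```

Note the operand order: m2 is subtracted from m1. For a lane with a = 10 and b = 20, the wrapping result is 246 while the saturating result is 0.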

__m64 _m_pmaddwd(__m64 m1, __m64 m2)

Multiply the four signed 16-bit values in m1 by the four signed 16-bit values in m2, producing four 32-bit intermediate products, then add adjacent pairs of products to produce two 32-bit results.
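The multiply-add form maps naturally onto a short dot product. The sketch below is a hedged example, not a tuned routine: dot4 is a hypothetical helper, it assumes the final sum fits in 32 bits, and it relies on the __MMX__ macro and on _mm_empty from mmintrin.h, neither of which is described in this section.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: dot product of two 4-element signed 16-bit vectors.
   Assumes the total fits in a signed 32-bit integer. */
int32_t dot4(const int16_t a[4], const int16_t b[4])
{
#if defined(__MMX__)
    __m64 m1, m2, pairs;
    int32_t lanes[2];
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    pairs = _m_pmaddwd(m1, m2);     /* {a0*b0 + a1*b1, a2*b2 + a3*b3} */
    memcpy(lanes, &pairs, sizeof lanes);
    _mm_empty();                    /* EMMS: leave the MMX state */
    return lanes[0] + lanes[1];     /* final horizontal add in scalar code */
#else
    int32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += (int32_t)a[i] * b[i];
    return sum;
#endif
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, the two 32-bit lanes are 17 and 53, so the function returns 70.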

__m64 _m_pmulhw(__m64 m1, __m64 m2)

Multiply the four signed 16-bit values in m1 by the four signed 16-bit values in m2 and return the high 16 bits of each of the four 32-bit products.

__m64 _m_pmullw(__m64 m1, __m64 m2)

Multiply the four 16-bit values in m1 by the four 16-bit values in m2 and return the low 16 bits of each of the four 32-bit products.
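The high and low halves can be recombined into full 32-bit signed products. The following is a minimal sketch under stated assumptions: mul_words_full is a hypothetical helper, the recombination is done in scalar code for clarity rather than with unpack intrinsics, and the non-MMX branch simply computes the same products directly.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: full 32-bit signed products of four 16-bit pairs,
   rebuilt from the PMULHW (high) and PMULLW (low) halves. */
void mul_words_full(const int16_t a[4], const int16_t b[4], int32_t out[4])
{
#if defined(__MMX__)
    __m64 m1, m2, hi, lo;
    int16_t  hi16[4];               /* high half carries the sign */
    uint16_t lo16[4];               /* low half is read as unsigned */
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    hi = _m_pmulhw(m1, m2);
    lo = _m_pmullw(m1, m2);
    memcpy(hi16, &hi, sizeof hi16);
    memcpy(lo16, &lo, sizeof lo16);
    _mm_empty();                    /* EMMS, declared in mmintrin.h */
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)hi16[i] * 65536 + lo16[i];
#else
    /* Without MMX, compute the same products directly. */
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)a[i] * b[i];
#endif
}
```

For example, 300 * 400 = 120000 = 0x0001D4C0, so the high half is 0x0001 and the low half is 0xD4C0.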