**Note**

You must include the fvec.h header file to use the following functionality.

Compute the element-wise maximum of the respective signed integer words in A and B.

Is16vec4 simd_max(Is16vec4 A, Is16vec4 B);

Corresponding intrinsic: _mm_max_pi16

Compute the element-wise minimum of the respective signed integer words in A and B.

Is16vec4 simd_min(Is16vec4 A, Is16vec4 B);

Corresponding intrinsic: _mm_min_pi16
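To make the per-lane semantics of the two signed-word operations concrete, here is a portable scalar model. It is a sketch, not the fvec.h implementation: `std::array<std::int16_t, 4>` stands in for Is16vec4, and the names `Words4`, `max_words`, and `min_words` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Is16vec4: four signed 16-bit lanes.
using Words4 = std::array<std::int16_t, 4>;

// Lane-by-lane signed maximum: the per-element result simd_max(Is16vec4, Is16vec4)
// is documented to produce.
Words4 max_words(const Words4& a, const Words4& b) {
    Words4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::max(a[i], b[i]);
    return r;
}

// Lane-by-lane signed minimum, mirroring simd_min(Is16vec4, Is16vec4).
Words4 min_words(const Words4& a, const Words4& b) {
    Words4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::min(a[i], b[i]);
    return r;
}
```

Because the lanes are compared as signed values, -32768 is the smallest possible word and 32767 the largest.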

Compute the element-wise maximum of the respective unsigned bytes in A and B.

Iu8vec8 simd_max(Iu8vec8 A, Iu8vec8 B);

Corresponding intrinsic: _mm_max_pu8

Compute the element-wise minimum of the respective unsigned bytes in A and B.

Iu8vec8 simd_min(Iu8vec8 A, Iu8vec8 B);

Corresponding intrinsic: _mm_min_pu8
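The byte versions compare lanes as unsigned values, so a byte such as 200 (which would be -56 if read as signed) is larger than 100. A minimal scalar sketch of that behavior follows; `Bytes8`, `max_bytes`, and `min_bytes` are hypothetical names, and `std::array<std::uint8_t, 8>` is only an assumed stand-in for Iu8vec8.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Iu8vec8: eight unsigned 8-bit lanes.
using Bytes8 = std::array<std::uint8_t, 8>;

// Unsigned per-byte maximum, modeling simd_max(Iu8vec8, Iu8vec8).
Bytes8 max_bytes(const Bytes8& a, const Bytes8& b) {
    Bytes8 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::max(a[i], b[i]);
    return r;
}

// Unsigned per-byte minimum, modeling simd_min(Iu8vec8, Iu8vec8).
Bytes8 min_bytes(const Bytes8& a, const Bytes8& b) {
    Bytes8 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::min(a[i], b[i]);
    return r;
}
```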

Create an 8-bit mask from the most significant bits of the bytes in A: bit i of the result is the most significant bit of byte i.

int move_mask(I8vec8 A);

Corresponding intrinsic: _mm_movemask_pi8
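The mask construction can be modeled with a short loop that collects the sign bit of each byte, with byte 0 mapping to bit 0. This is an illustrative scalar sketch, not the library code; `SBytes8` and `move_mask_model` are hypothetical names, and `std::array<std::int8_t, 8>` is assumed as a stand-in for I8vec8.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for I8vec8: eight signed 8-bit lanes.
using SBytes8 = std::array<std::int8_t, 8>;

// Bit i of the result is the most significant (sign) bit of byte i;
// the upper bits of the returned int stay zero.
int move_mask_model(const SBytes8& a) {
    int mask = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Reinterpret the byte as unsigned so the shift isolates bit 7.
        std::uint8_t bits = static_cast<std::uint8_t>(a[i]);
        mask |= ((bits >> 7) & 1) << i;
    }
    return mask;
}
```

For example, bytes {-1, 0, -128, 127, 0, 0, 0, -2} have their high bit set in positions 0, 2, and 7, giving the mask 1 + 4 + 128 = 133.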

Conditionally store byte elements of A to address p. The high bit of each byte in the selector B determines whether the corresponding byte in A will be stored.

void mask_move(I8vec8 A, I8vec8 B, signed char *p);

Corresponding intrinsic: _mm_maskmove_si64
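The selection rule can be sketched in scalar code: byte i of A reaches p[i] only when the high bit of selector byte i is set, and every other destination byte is left untouched. This sketch models only which bytes are written; it ignores the cache-bypassing behavior of the real instruction. `SBytes8` and `mask_move_model` are hypothetical names, with `std::array<std::int8_t, 8>` assumed in place of I8vec8.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for I8vec8: eight signed 8-bit lanes.
using SBytes8 = std::array<std::int8_t, 8>;

// Store a[i] to p[i] only if bit 7 of sel[i] is set; p must point to at least
// 8 writable bytes. Unselected bytes of p keep their previous contents.
void mask_move_model(const SBytes8& a, const SBytes8& sel, signed char* p) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (static_cast<std::uint8_t>(sel[i]) & 0x80u)
            p[i] = a[i];
    }
}
```

A selector byte of 127 (0x7F) does not select its lane, while -128 (0x80) does, since only the high bit matters.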

Store the data in A to the address p using a non-temporal hint, which minimizes pollution of the caches. A can be any Ivec type.

void store_nta(__m64 *p, M64 A);

Corresponding intrinsic: _mm_stream_pi

Compute the element-wise rounded average, (a + b + 1) >> 1, of the respective unsigned 8-bit integers in A and B.

Iu8vec8 simd_avg(Iu8vec8 A, Iu8vec8 B);

Corresponding intrinsic: _mm_avg_pu8

Compute the element-wise rounded average, (a + b + 1) >> 1, of the respective unsigned 16-bit integers in A and B.

Iu16vec4 simd_avg(Iu16vec4 A, Iu16vec4 B);

Corresponding intrinsic: _mm_avg_pu16
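Both averaging operations round halves upward, computing (a + b + 1) >> 1 per lane without overflowing the lane width. A scalar sketch of that arithmetic, generic over the two lane widths, is shown below. `avg_model` is a hypothetical name, and `std::array` of `std::uint8_t` or `std::uint16_t` is assumed in place of Iu8vec8 and Iu16vec4; this is not the fvec.h implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Rounded unsigned average of each lane pair: (a + b + 1) >> 1.
// The sum is formed in 32 bits so that, e.g., 255 + 255 + 1 does not wrap.
template <typename T, std::size_t N>
std::array<T, N> avg_model(const std::array<T, N>& a, const std::array<T, N>& b) {
    static_assert(std::is_unsigned<T>::value && sizeof(T) <= 2,
                  "models only unsigned 8-bit and 16-bit lanes");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        std::uint32_t sum = std::uint32_t{a[i]} + std::uint32_t{b[i]} + 1u;
        r[i] = static_cast<T>(sum >> 1);  // result always fits back in T
    }
    return r;
}
```

For instance, the average of 1 and 2 is 2 (not 1), and the average of two lanes holding 255 stays 255 because the intermediate sum is wider than a byte.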