The following load intrinsics and their corresponding instructions are available in Streaming SIMD Extensions 2 (SSE2).

The prototypes for SSE2 intrinsics are in the emmintrin.h header file.

__m128d _mm_load_pd(double const *p)

(uses MOVAPD) Loads two double-precision floating-point (DP FP) values. The address p must be 16-byte aligned.

r0 := p[0]

r1 := p[1]
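
As a minimal sketch of an aligned load, the helper below copies both loaded elements to an output array so the lane order can be inspected. The name load_pair_aligned is hypothetical and not part of emmintrin.h. The sketch assumes the caller guarantees 16-byte alignment of p, for example with the C11 _Alignas specifier.

```c
#include <emmintrin.h>

/* Hypothetical helper, not part of emmintrin.h.
   Loads p[0] and p[1] with one aligned 128-bit load and writes them to out.
   p must be 16-byte aligned; out may have any alignment. */
static inline void load_pair_aligned(const double *p, double out[2])
{
    __m128d v = _mm_load_pd(p);   /* r0 = p[0], r1 = p[1] */
    _mm_storeu_pd(out, v);        /* out[0] = r0, out[1] = r1 */
}
```

A caller would typically declare the source as `_Alignas(16) double src[2]` before passing it in.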

__m128d _mm_load1_pd(double const *p)

(uses MOVSD + shuffling) Loads a single DP FP value, copying it to both elements. The address p need not be 16-byte aligned.

r0 := *p

r1 := *p
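
One common use of a broadcast load is scaling a pair of values by a single factor. The following is a sketch only; scale_pair and its parameter names are hypothetical, and the pair is accessed with unaligned load and store so that no alignment is assumed.

```c
#include <emmintrin.h>

/* Hypothetical helper: multiplies xy[0] and xy[1] by *factor. */
static inline void scale_pair(double xy[2], const double *factor)
{
    __m128d f = _mm_load1_pd(factor);     /* r0 = r1 = *factor */
    __m128d v = _mm_loadu_pd(xy);         /* no alignment assumed for xy */
    _mm_storeu_pd(xy, _mm_mul_pd(v, f));  /* xy[i] = xy[i] * (*factor) */
}
```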

__m128d _mm_loadr_pd(double const *p)

(uses MOVAPD + shuffling) Loads two DP FP values in reverse order. The address p must be 16-byte aligned.

r0 := p[1]

r1 := p[0]
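
The reversed lane order can be checked by storing the result back to memory. This is an illustrative sketch; load_pair_reversed is a hypothetical name, and the caller is assumed to supply a 16-byte-aligned p.

```c
#include <emmintrin.h>

/* Hypothetical helper: writes p[1] to out[0] and p[0] to out[1].
   p must be 16-byte aligned; out may have any alignment. */
static inline void load_pair_reversed(const double *p, double out[2])
{
    __m128d v = _mm_loadr_pd(p);  /* r0 = p[1], r1 = p[0] */
    _mm_storeu_pd(out, v);
}
```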

__m128d _mm_loadu_pd(double const *p)

(uses MOVUPD) Loads two DP FP values. The address p need not be 16-byte aligned.

r0 := p[0]

r1 := p[1]
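
Unaligned loads are useful when the start of a buffer is not under the caller's control. The sketch below sums an array two elements at a time; sum_doubles is a hypothetical helper, and because it accumulates even and odd elements separately, its rounding can differ from a plain left-to-right scalar loop.

```c
#include <stddef.h>
#include <emmintrin.h>

/* Hypothetical helper: returns the sum of p[0..n-1].
   p may have any alignment valid for double. */
static inline double sum_doubles(const double *p, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Accumulate two elements per iteration with unaligned loads. */
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));

    /* Add the upper lane to the lower lane and extract the result. */
    __m128d hi = _mm_unpackhi_pd(acc, acc);
    double total = _mm_cvtsd_f64(_mm_add_sd(acc, hi));

    /* Handle the last element when n is odd. */
    if (i < n)
        total += p[i];
    return total;
}
```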

__m128d _mm_load_sd(double const *p)

(uses MOVSD) Loads a DP FP value into the lower element. The upper DP FP value is set to zero. The address p need not be 16-byte aligned.

r0 := *p

r1 := 0.0
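
To illustrate the zeroed upper element, the sketch below stores the loaded vector to a two-element array. The name load_scalar_zero_upper is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: writes *p to out[0] and 0.0 to out[1]. */
static inline void load_scalar_zero_upper(const double *p, double out[2])
{
    __m128d v = _mm_load_sd(p);   /* r0 = *p, r1 = 0.0 */
    _mm_storeu_pd(out, v);
}
```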

__m128d _mm_loadh_pd(__m128d a, double const *p)

(uses MOVHPD) Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.

r0 := a0

r1 := *p
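
A typical pattern is to assemble a vector from two values that are not adjacent in memory: load the first with _mm_load_sd, then fill the upper element with _mm_loadh_pd. The sketch below assumes this pattern; pack_two and its parameter names are hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: builds the pair (*lo, *hi) from two separate
   addresses and writes it to out. No alignment beyond that of double
   is assumed for any pointer. */
static inline void pack_two(const double *lo, const double *hi, double out[2])
{
    __m128d v = _mm_load_sd(lo);  /* r0 = *lo, r1 = 0.0 */
    v = _mm_loadh_pd(v, hi);      /* r0 unchanged, r1 = *hi */
    _mm_storeu_pd(out, v);
}
```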

__m128d _mm_loadl_pd(__m128d a, double const *p)

(uses MOVLPD) Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.

r0 := *p

r1 := a1
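
As a brief sketch, the helper below replaces only the lower element of an existing pair and leaves the upper element untouched. The name replace_lower is hypothetical, and unaligned load and store are used for the pair so that no alignment is assumed.

```c
#include <emmintrin.h>

/* Hypothetical helper: sets pair[0] to *p and keeps pair[1]. */
static inline void replace_lower(double pair[2], const double *p)
{
    __m128d a = _mm_loadu_pd(pair);   /* a0 = pair[0], a1 = pair[1] */
    __m128d r = _mm_loadl_pd(a, p);   /* r0 = *p, r1 = a1 */
    _mm_storeu_pd(pair, r);
}
```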