Memory operations using the Streaming SIMD Extensions should be performed on 16-byte-aligned data whenever possible.
F32vec4 and F64vec2 object variables are properly aligned by default. Floating-point arrays, however, are not automatically aligned. To get 16-byte alignment for an array, use the align __declspec:
__declspec( align(16) ) float A[1000];
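As a minimal sketch of how such a declaration might be written portably and checked, the following wraps the specifier in a macro and tests an address against the 16-byte boundary. The names ALIGN16, is_aligned16, and the array B are hypothetical and do not come from the text above. The alignas fallback assumes a C++11 or later compiler that does not accept __declspec.

```cpp
#include <cstdint>

// Assumption: compilers that define _MSC_VER accept the __declspec form
// shown above; everywhere else, fall back to standard C++11 alignas.
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 alignas(16)
#endif

// Hypothetical helper: returns true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical array: the specifier aligns the start of the array,
// so every fourth float (16 bytes) is also on a 16-byte boundary.
ALIGN16 float B[8];
```

With this declaration, is_aligned16(B) and is_aligned16(&B[4]) both hold, while &B[1] sits 4 bytes past the boundary and is not suitable for an aligned 16-byte load.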