As of now, the closest thing to what I want is the instruction vpmadd231d zmm, zmm, zmm, which operates on EPI32 elements.

The idea is to lower power requirements while also taking advantage of the FP32 pipe's extremely high performance. The FP unit also happens to have lower latency than the integer unit (both have a CPI of 0.5, but the FP unit has a latency of 4 versus 10 for the integer unit).

I need more than EPI16 native, but EPI32/64 is a waste of power and precision.

I'd also rather not do horrible things in the FP unit...

#knc #simd

Last updated 2 years ago