

# Intel® AVX-512 Architecture

Elena Demikhovsky, Intel® Software and Services Group

Comprehensive vector extension for HPC and enterprise

AVX-512 - What's new?





#### **Conflict Detection**

Sparse computations are hard to vectorize

    for(i=0; i<16; i++) { A[B[i]]++; }

    index   = vload &B[i]             // Load 16 B[i]
    old_val = vgather A, index        // Grab A[B[i]]
    new_val = vadd old_val, +1.0      // Compute new values
    vscatter A, index, new_val        // Update A[B[i]]

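The vector code above is wrong if any of the 16 index values B[i] are duplicated: every duplicate gathers the same old value, so the scatter keeps only one increment for that element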

**VPCONFLICT** instruction detects elements with conflicts
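The lost update and the conflict result can be modeled in plain scalar C. This is a minimal sketch, not intrinsics code: the function names are hypothetical, `conflict_model` only imitates the per-element result of a VPCONFLICTD-style compare (bit j of element i is set when an earlier element j holds the same index), and the pass-based retry loop is just one possible way to use that result.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Scalar model of the gather / add / scatter sequence: every lane reads
   its old value before any lane writes, as the vector code does. */
static void vector_increment_naive(float *A, const int32_t *B) {
    float new_val[LANES];
    for (int i = 0; i < LANES; i++)          /* vgather + vadd */
        new_val[i] = A[B[i]] + 1.0f;
    for (int i = 0; i < LANES; i++)          /* vscatter */
        A[B[i]] = new_val[i];
}

/* Model of the conflict result: bit j of out[i] is set when an earlier
   lane j < i holds the same index as lane i. */
static void conflict_model(const int32_t *B, uint16_t *out) {
    for (int i = 0; i < LANES; i++) {
        out[i] = 0;
        for (int j = 0; j < i; j++)
            if (B[j] == B[i])
                out[i] |= (uint16_t)(1u << j);
    }
}

/* Conflict-aware update (assumed strategy): work in passes. A lane joins
   a pass only when all earlier lanes with the same index are done, so the
   lanes inside one pass never share an index. */
static void vector_increment_safe(float *A, const int32_t *B) {
    uint16_t conflicts[LANES];
    uint16_t done = 0;
    conflict_model(B, conflicts);
    while (done != 0xFFFF) {
        uint16_t ready = 0;
        float new_val[LANES];
        for (int i = 0; i < LANES; i++)
            if (!(done & (1u << i)) && (conflicts[i] & ~done) == 0)
                ready |= (uint16_t)(1u << i);
        assert(ready != 0);                  /* lowest pending lane is always ready */
        for (int i = 0; i < LANES; i++)      /* masked gather + add */
            if (ready & (1u << i))
                new_val[i] = A[B[i]] + 1.0f;
        for (int i = 0; i < LANES; i++)      /* masked scatter */
            if (ready & (1u << i))
                A[B[i]] = new_val[i];
        done |= ready;
    }
}
```

With the index 0 appearing three times in `B`, the naive model increments `A[0]` once, while the conflict-aware model increments it three times, matching the scalar loop.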

Copyright © 2013 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.





- 512-bit wide vectors, 32 SIMD registers
- 8 new mask registers
- Embedded Rounding Control
- Embedded Broadcast
- New Math instructions
- 2-source shuffles
- Gather and Scatter
- Compress and Expand
- Conflict Detection

#### **Embedded Broadcast**

A source from memory is repeated across all the elements.

```
vbroadcastss zmm3, [rax]
vaddps zmm1, zmm2, zmm3
```

With embedded broadcast, a single instruction does both: `vaddps zmm1, zmm2, [rax] {1to16}`
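What `{1to16}` means for the add can be written out as a scalar C model. This is only an illustrative sketch: the function name is hypothetical, and the 16-element arrays stand in for the zmm registers.

```c
#include <assert.h>

#define LANES 16   /* 16 single-precision elements in a 512-bit register */

/* Model of: vaddps zmm1, zmm2, [rax] {1to16}
   One 32-bit value is loaded from memory and used for every element. */
static void add_broadcast_1to16(float *zmm1, const float *zmm2,
                                const float *rax) {
    assert(rax != 0);
    const float s = *rax;                /* single scalar load */
    for (int i = 0; i < LANES; i++)
        zmm1[i] = zmm2[i] + s;           /* same operand in all 16 lanes */
}
```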

#### **Embedded Rounding Control**

- Static (per instruction) rounding mode
- No need to access MXCSR any more!

    vaddps zmm7 {k6}, zmm2, zmm4 {rd}
    vcvtdq2ps zmm1, zmm2, {ru}

Using embedded RC always suppresses all floating-point exceptions for that instruction

#### Masking

Elements whose mask bit is 0 either remain unchanged (merge-masking):

    VADDPD zmm1 {k1}, zmm2, zmm3

or are zeroed (zero-masking, `{z}`):

    VADDPD zmm1 {k1} {z}, zmm2, zmm3
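The difference between the two masking forms can be sketched as a scalar C model. The function names are hypothetical, the 8-element arrays stand in for zmm registers holding doubles, and an 8-bit integer stands in for `k1`.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8   /* 8 double-precision elements in a 512-bit register */

/* Model of: VADDPD zmm1 {k1}, zmm2, zmm3
   Elements with a 0 mask bit keep their previous value in zmm1. */
static void add_pd_merge(double *zmm1, uint8_t k1,
                         const double *zmm2, const double *zmm3) {
    assert(zmm1 != 0);
    for (int i = 0; i < LANES; i++)
        if (k1 & (1u << i))
            zmm1[i] = zmm2[i] + zmm3[i];
}

/* Model of: VADDPD zmm1 {k1} {z}, zmm2, zmm3
   Elements with a 0 mask bit are set to zero. */
static void add_pd_zero(double *zmm1, uint8_t k1,
                        const double *zmm2, const double *zmm3) {
    assert(zmm1 != 0);
    for (int i = 0; i < LANES; i++)
        zmm1[i] = (k1 & (1u << i)) ? zmm2[i] + zmm3[i] : 0.0;
}
```

In both models the addition is simply not performed for elements with a 0 mask bit, which is what lets masking avoid faults and exceptions on those elements.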



- Memory fault suppression
- **Avoid FP exceptions**
- Avoid extra blends

```
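// C source
float32 A[N], B[N], C[N];
for (i = 0; i < 16; i++) {
  if (B[i] != 0)
    A[i] = A[i] / B[i];
  else
    A[i] = A[i] / C[i];
}

; AVX-512 code (zmm0 is assumed to hold all zeros)
VMOVUPS zmm2, A                 ; load A
VCMPPS  k1, zmm0, B, 4          ; k1 = (B != 0); predicate 4 (not-equal) is assumed, the slide omits it
VDIVPS  zmm1 {k1}{z}, zmm2, B   ; A / B where B != 0, other elements zeroed
KNOT    k2, k1                  ; k2 = elements where B == 0
VDIVPS  zmm1 {k2}, zmm2, C      ; A / C in the remaining elements, merged into zmm1
VMOVUPS A, zmm1                 ; store the result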
```
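The instruction sequence above can be traced step by step with a scalar C model. This is a sketch under assumptions: the function name is hypothetical, the compare is taken to be "not equal to zero", and 16-bit integers stand in for the mask registers `k1` and `k2`.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Scalar model of the masked divide sequence. A division is evaluated
   only for elements whose mask bit is set, so A[i] / B[i] is never
   computed where B[i] == 0. */
static void masked_divide(float *A, const float *B, const float *C) {
    float zmm1[LANES];
    uint16_t k1 = 0;
    uint16_t k2;
    assert(A != 0);

    for (int i = 0; i < LANES; i++)          /* VCMPPS: k1 = (B != 0) */
        if (B[i] != 0.0f)
            k1 |= (uint16_t)(1u << i);

    for (int i = 0; i < LANES; i++)          /* VDIVPS zmm1 {k1}{z} */
        zmm1[i] = (k1 & (1u << i)) ? A[i] / B[i] : 0.0f;

    k2 = (uint16_t)~k1;                      /* KNOT k2, k1 */

    for (int i = 0; i < LANES; i++)          /* VDIVPS zmm1 {k2} (merge) */
        if (k2 & (1u << i))
            zmm1[i] = A[i] / C[i];

    for (int i = 0; i < LANES; i++)          /* VMOVUPS A, zmm1 */
        A[i] = zmm1[i];
}
```

The second divide uses merge-masking, so the quotients already written under `k1` survive and no separate blend is needed.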


## Masking in LLVM



