Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

mkl(dgemm) performance problems on "superlarge" processors

Hi,

I was running two consecutive dgemm operations, T=AB and C=A'T, with A (56,000 x 400,000), B (400,000 x 30), T (56,000 x 30), and C the same size as B (400,000 x 30).
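To put these sizes in perspective, here is a minimal sketch (plain Python, no MKL involved) that checks that the shapes are compatible and estimates the memory footprint of each matrix. It assumes dense storage in double precision (8 bytes per element, as dgemm implies) and that C has the same shape as B; the helper names are illustrative only.

```python
# Shapes taken from the post; 8 bytes per element because dgemm works on doubles.
BYTES_PER_DOUBLE = 8

shapes = {
    "A": (56_000, 400_000),
    "B": (400_000, 30),
    "T": (56_000, 30),
    "C": (400_000, 30),  # assumed: C has the same shape as B
}

def product_shape(left, right):
    """Shape of left @ right; raises if the inner dimensions differ."""
    if left[1] != right[0]:
        raise ValueError(f"inner dimensions differ: {left} x {right}")
    return (left[0], right[1])

def transposed(shape):
    """Shape of the transposed matrix."""
    return (shape[1], shape[0])

def footprint_bytes(shape):
    """Bytes needed to store a dense double-precision matrix of this shape."""
    return shape[0] * shape[1] * BYTES_PER_DOUBLE

# T = A B and C = A' T must produce the stated shapes.
assert product_shape(shapes["A"], shapes["B"]) == shapes["T"]
assert product_shape(transposed(shapes["A"]), shapes["T"]) == shapes["C"]

for name, shape in shapes.items():
    gigabytes = footprint_bytes(shape) / 1e9
    print(f"{name}: {shape[0]:>7} x {shape[1]:>7}  {gigabytes:10.3f} GB")
```

Under these assumptions A alone occupies about 179 GB, while B, T and C are each well under 1 GB, so the memory traffic of both calls is presumably dominated by reading A.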

Depending on the CPU, I measured these wall-clock times (for the dgemm operations only):

Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz with 36 (real) cores, 46080 KB cache, 250GB of RAM

T=AB: 3.73 seconds

C=A'T: 4.17 seconds

Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 56 (real) cores, 19712 KB cache, 2TB of RAM

T=AB: 91.47 seconds

C=A'T: 232.78 seconds
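As a rough way to compare these timings, the sketch below converts them into effective throughput, using the conventional estimate of 2*m*n*k floating-point operations for a dense matrix product. The shapes and times are the ones given above; the variable names and the choice of decimal GFLOP/s are just for illustration.

```python
# Both products run over the same three extents, so they share one FLOP count.
m, k, n = 56_000, 400_000, 30
flops = 2 * m * k * n  # 1.344e12 floating-point operations per product

# Wall-clock seconds as reported above.
timings = {
    "E5-2697 v4": {"T=AB": 3.73, "C=A'T": 4.17},
    "Gold 5117": {"T=AB": 91.47, "C=A'T": 232.78},
}

# Effective throughput in GFLOP/s for each CPU and operation.
gflops = {
    cpu: {op: flops / seconds / 1e9 for op, seconds in ops.items()}
    for cpu, ops in timings.items()
}

# How many times slower the Gold 5117 run is for each operation.
slowdown = {
    op: timings["Gold 5117"][op] / timings["E5-2697 v4"][op]
    for op in timings["E5-2697 v4"]
}

for cpu, ops in gflops.items():
    for op, rate in ops.items():
        print(f"{cpu:12s} {op:6s} {rate:8.1f} GFLOP/s")
for op, factor in slowdown.items():
    print(f"{op:6s} is {factor:.1f}x slower on the Gold 5117")
```

This works out to roughly 360 and 322 GFLOP/s on the E5-2697 v4 versus roughly 15 and 6 GFLOP/s on the Gold 5117, i.e. slowdowns of about 25x and 56x.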

What was particularly striking was that T=AB used all 56 cores, whereas C=A'T used only half of them.

The KMP setting was: KMP_AFFINITY=compact,1,0,granularity=fine

I am wondering whether the poor performance of the latter system is solely attributable to its architecture and is therefore set in stone, or whether I can somehow tune the MKL/KMP environment variables to improve performance.
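One possible starting point is to make the runtime report what it actually does. The sketch below builds an environment for a diagnostic run; it is not a recommended tuning. MKL_VERBOSE, MKL_NUM_THREADS, MKL_DYNAMIC and the verbose modifier of KMP_AFFINITY are documented Intel settings, but the values chosen here and the helper names diagnostic_env and run_diagnostic are assumptions made for illustration.

```python
import os
import subprocess

def diagnostic_env(num_threads, base=None):
    """Return a copy of the environment with MKL/OpenMP reporting switched on."""
    env = dict(os.environ if base is None else base)
    env.update({
        # Ask MKL to log each call with its arguments, timing and thread count.
        "MKL_VERBOSE": "1",
        # Request a fixed thread count and ask MKL not to reduce it dynamically.
        "MKL_NUM_THREADS": str(num_threads),
        "MKL_DYNAMIC": "FALSE",
        # Same placement as in the post, plus "verbose" to print the thread binding.
        "KMP_AFFINITY": "verbose,granularity=fine,compact,1,0",
    })
    return env

def run_diagnostic(command, num_threads):
    """Run `command` (a list of strings) under the diagnostic environment."""
    return subprocess.run(
        command,
        env=diagnostic_env(num_threads),
        capture_output=True,
        text=True,
        check=False,
    )

if __name__ == "__main__":
    # Print the settings; pass the real benchmark command to run_diagnostic
    # to capture its output with these settings applied.
    env = diagnostic_env(56)
    for key in ("MKL_VERBOSE", "MKL_NUM_THREADS", "MKL_DYNAMIC", "KMP_AFFINITY"):
        print(f"{key}={env[key]}")
```

Comparing the reported thread count and binding for the two dgemm calls might help distinguish whether MKL itself chooses fewer threads for C=A'T or whether the threads are merely placed on half of the cores.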

Thanks

