I have the same multi-physics finite element code generating the same matrix on two machines, and the older machine with a Ryzen 1700 (8 cores) is faster than the one with a Threadripper 2990WX (32 cores). Both run Windows 10 and link against intel64 mkl_rt.lib; the MKL versions are 2018.1.156 on the Ryzen 1700 and 2019.0.117 on the Threadripper. I can provide an example matrix if that helps. Here are the PARDISO options, which are the same in both builds:
#include <algorithm> // std::fill

struct pardiso_struct
{
    void *pt[64];    // internal solver memory pointers; must be zeroed before the first call
    int maxfct{ 1 };
    int mnum{ 1 };
    int mtype{ 11 }; // real nonsymmetric
    int n{ 0 };
    int idum{ 0 };   // dummy perm array; not used by PARDISO when iparm[4] != 1
    int nrhs{ 1 };
    int iparm[64];
    int msglvl{ 1 };
    double ddum{ 0. };
    int error{ 0 };

    pardiso_struct()
    {
        std::fill(pt, pt + 64, nullptr); // fill(pt, pt + 64, void(0)) does not compile
        std::fill(iparm, iparm + 64, 0);
        iparm[0] = 1;   // 0 for all defaults, != 0 for any custom values
        iparm[1] = 3;   // 0 minimum degree, 2 METIS, 3 OpenMP METIS
        //iparm[2]      reserved
        iparm[3] = 0;   // 0 disables preconditioned CGS/CG iterations
        iparm[4] = 0;   // user fill-in reducing permutation
        iparm[5] = 0;   // 0 - solution written to x, 1 - solution overwrites b
        //iparm[6]      output: number of iterative refinement steps performed
        iparm[7] = 0;   // iterative refinement steps
        //iparm[8]      reserved
        iparm[9] = 13;  // pivoting perturbation: 13 for nonsymmetric, 8 for symmetric
        iparm[10] = 1;  // 0 no scaling, 1 scaling (default for nonsymmetric)
        iparm[12] = 1;  // 0 disables weighted matching, 1 default for nonsymmetric
        //iparm[13]-iparm[19] outputs
        //iparm[20]     special pivoting for symmetric indefinite matrices
        //iparm[21]     output: number of positive eigenvalues
        //iparm[22]     output: number of negative eigenvalues
        iparm[23] = 1;  // 0 classic algorithm, 1 two-level algorithm (scales better above 8 threads)
        iparm[24] = 0;  // 0 parallel solve, 1 sequential solve
        //iparm[25]     reserved
        iparm[26] = 0;  // 0 do not check the sparse matrix, 1 check it
        iparm[27] = 0;  // 0 double precision, 1 single precision
        //iparm[28]     reserved
        //iparm[29]     output: zero or negative pivots for symmetric matrices
        //iparm[30]     partial solve for selected solution components
        //iparm[31], iparm[32] reserved
        //iparm[33]     CNR-mode reproducibility options
        iparm[34] = 1;  // 0 one-based indexing, 1 zero-based indexing
        //iparm[35]     Schur complement options
        iparm[36] = 0;  // 0 CSR, > 0 BSR, < 0 convert to BSR
        //iparm[59]     out-of-core (OOC) options
    }
};
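For context, here is a minimal sketch of how the struct is driven through the analysis, factorization, and solve phases. The driver name and the matrix arrays (a, ia, ja in zero-based CSR, matching iparm[34] = 1) are placeholders, and it assumes the LP64 interface where MKL_INT is int, consistent with the int members above:

#include <mkl.h>

// Illustrative driver; a, ia, ja describe the assembled CSR matrix.
void factor_and_solve(pardiso_struct& p, int n,
                      double* a, int* ia, int* ja,
                      double* b, double* x)
{
    p.n = n;
    int phase = 11; // analysis / reordering
    pardiso(p.pt, &p.maxfct, &p.mnum, &p.mtype, &phase, &p.n,
            a, ia, ja, &p.idum, &p.nrhs, p.iparm, &p.msglvl,
            &p.ddum, &p.ddum, &p.error);

    phase = 22; // numerical factorization (the step timed below)
    pardiso(p.pt, &p.maxfct, &p.mnum, &p.mtype, &phase, &p.n,
            a, ia, ja, &p.idum, &p.nrhs, p.iparm, &p.msglvl,
            &p.ddum, &p.ddum, &p.error);

    phase = 33; // forward/backward solve
    pardiso(p.pt, &p.maxfct, &p.mnum, &p.mtype, &phase, &p.n,
            a, ia, ja, &p.idum, &p.nrhs, p.iparm, &p.msglvl,
            b, x, &p.error);

    phase = -1; // release all internal memory
    pardiso(p.pt, &p.maxfct, &p.mnum, &p.mtype, &phase, &p.n,
            &p.ddum, ia, ja, &p.idum, &p.nrhs, p.iparm, &p.msglvl,
            &p.ddum, &p.ddum, &p.error);
}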
The reordering and factorization results are below. The solve phase (omitted here) is also slower on the 2990WX, but my main concern is the numerical factorization time.
*************** Ryzen 7 1700 **********************
=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.847928 s
Time spent in reordering of the initial matrix (reorder) : 7.678907 s
Time spent in symbolic factorization (symbfct) : 2.075314 s
Time spent in data preparations for factorization (parlist) : 0.098494 s
Time spent in allocation of internal data structures (malloc) : 4.281882 s
Time spent in additional calculations : 3.785140 s
Total time spent : 18.767665 s
Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP
< Linear system Ax = b >
number of equations: 1928754
number of non-zeros in A: 46843184
number of non-zeros in A (%): 0.001259
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 795666
size of largest supernode: 9159
number of non-zeros in L: 673935341
number of non-zeros in U: 631031607
number of non-zeros in L+U: 1304966948
=== PARDISO: solving a real nonsymmetric system ===
Two-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 53.846398 s
Time spent in allocation of internal data structures (malloc) : 0.000878 s
Time spent in additional calculations : 0.000001 s
Total time spent : 53.847277 s
Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP
< Linear system Ax = b >
number of equations: 1928754
number of non-zeros in A: 46843184
number of non-zeros in A (%): 0.001259
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 795666
size of largest supernode: 9159
number of non-zeros in L: 673935341
number of non-zeros in U: 631031607
number of non-zeros in L+U: 1304966948
gflop for the numerical factorization: 2903.934836
gflop/s for the numerical factorization: 53.929973
****************** Threadripper 2990wx *********************************
=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.919861 s
Time spent in reordering of the initial matrix (reorder) : 10.085178 s
Time spent in symbolic factorization (symbfct) : 2.207123 s
Time spent in data preparations for factorization (parlist) : 0.101967 s
Time spent in allocation of internal data structures (malloc) : 3.143640 s
Time spent in additional calculations : 3.677500 s
Total time spent : 20.135269 s
Statistics:
===========
Parallel Direct Factorization is running on 32 OpenMP
< Linear system Ax = b >
number of equations: 1928754
number of non-zeros in A: 46843184
number of non-zeros in A (%): 0.001259
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 794723
size of largest supernode: 7005
number of non-zeros in L: 683894639
number of non-zeros in U: 640539323
number of non-zeros in L+U: 1324433962
=== PARDISO: solving a real nonsymmetric system ===
Two-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 61.520888 s
Time spent in allocation of internal data structures (malloc) : 0.001112 s
Time spent in additional calculations : 0.000002 s
Total time spent : 61.522003 s
Statistics:
===========
Parallel Direct Factorization is running on 32 OpenMP
< Linear system Ax = b >
number of equations: 1928754
number of non-zeros in A: 46843184
number of non-zeros in A (%): 0.001259
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 794723
size of largest supernode: 7005
number of non-zeros in L: 683894639
number of non-zeros in U: 640539323
number of non-zeros in L+U: 1324433962
gflop for the numerical factorization: 2879.931235
gflop/s for the numerical factorization: 46.812250
Nearly 2 million unknowns should provide enough work for every core. Manually capping the run at 16 threads gives a modest speedup (about 53 seconds for the numerical factorization), which suggests to me that this is a PARDISO scaling issue rather than a hardware limit, although it could also stem from the memory architecture of the 2990WX, where only two of the four dies have directly attached memory.
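For reference, the 16-thread run simply caps MKL's thread pool before any PARDISO call (setting MKL_NUM_THREADS=16 in the environment before launching has the same effect):

#include <mkl.h>

int main()
{
    mkl_set_num_threads(16); // cap MKL at 16 OpenMP threads for all subsequent calls
    // ... assemble the matrix and run the PARDISO phases as usual ...
    return 0;
}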
Any suggestions?