Hello everyone!
I'm using PARDISO to solve the Navier-Stokes and temperature equations in one program.
I run phase 11 only once, at program start, to tell the solver what my matrices look like. At every time layer I need to refill both matrices, factorize them, and find the solution. To do so I set the maxfct parameter to 2 and switch mnum between 1 and 2 for the two equations. I also use 2 different pt arrays, one per equation. I should mention that I call phase 0 after each time layer in order to free memory.
I want to accelerate the computations by using several cluster nodes. I found the Cluster version of PARDISO and was surprised that the maxfct and mnum parameters are ignored there. I also didn't find phase 0.
My first question is: do I really need those parameters to solve my problem on a cluster? I don't want to run phase 11 at every time layer because it would be too slow.
Secondly, I'm having a problem using PARDISO on a single cluster node when the number of equations is ~13 million. I receive error -2. Below is the output I get after phase 11 for both matrices and phase 23 for the Navier-Stokes equation.
=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 1.680254 s
Time spent in reordering of the initial matrix (reorder) : 15.632488 s
Time spent in symbolic factorization (symbfct) : 128.883790 s
Time spent in data preparations for factorization (parlist) : 2.423303 s
Time spent in allocation of internal data structures (malloc) : 168.633973 s
Time spent in additional calculations : 13.837179 s
Total time spent : 331.090987 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations: 13198336
number of non-zeros in A: 107134147
number of non-zeros in A (%): 0.000062
number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 6792104
size of largest supernode: 56189
number of non-zeros in L: 52781978834
number of non-zeros in U: 52521656342
number of non-zeros in L+U: 105303635176

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.309069 s
Time spent in reordering of the initial matrix (reorder) : 2.541601 s
Time spent in symbolic factorization (symbfct) : 2.516607 s
Time spent in data preparations for factorization (parlist) : 0.269381 s
Time spent in allocation of internal data structures (malloc) : 1.306915 s
Time spent in additional calculations : 1.014548 s
Total time spent : 7.958121 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations: 3368499
number of non-zeros in A: 22736533
number of non-zeros in A (%): 0.000200
number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 2244305
size of largest supernode: 13571
number of non-zeros in L: 2906625075
number of non-zeros in U: 2863523632
number of non-zeros in L+U: 5770148707

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
*** Error in PARDISO ( insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 412359210 bytes failed
total memory wanted here: 435961139 kbyte

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( starting phase is factorization, ending phase is solution )
================

Times:
======
Time spent in additional calculations : 0.000074 s
Total time spent : 0.000074 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations: 13198336
number of non-zeros in A: 107134147
number of non-zeros in A (%): 0.000062
number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 6792104
size of largest supernode: 56189
number of non-zeros in L: 52781978834
number of non-zeros in U: 52521656342
number of non-zeros in L+U: 105303635176
gflop for the numerical factorization: 3841175.475158

PARDISO ERROR = -2
Do I really need this much memory to solve the system, or is this some kind of bug? If it is not a bug, can the problem be solved by using cluster PARDISO?
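For reference, here is a rough back-of-the-envelope check of the factor storage implied by the non-zero counts in the log. It is only a sketch: it assumes every factor entry is stored as one 8-byte double and ignores index arrays and workspace, so the real requirement is probably higher. The variable names are mine, and the node RAM is treated as 256 GiB.

```python
# Rough factor-storage estimate from the counts printed by PARDISO.
# Assumption: one 8-byte double per factor non-zero; index arrays and
# workspace are ignored, so this is a lower bound rather than an exact figure.

BYTES_PER_ENTRY = 8      # double precision (assumed storage per non-zero)
GIB = 1024 ** 3


def factor_gib(nnz: int) -> float:
    """Storage in GiB for nnz double-precision factor entries."""
    return nnz * BYTES_PER_ENTRY / GIB


# Values copied from the PARDISO output.
ns_nnz_l = 52_781_978_834      # Navier-Stokes: non-zeros in L
ns_nnz_u = 52_521_656_342      # Navier-Stokes: non-zeros in U
temp_nnz_lu = 5_770_148_707    # temperature: non-zeros in L+U
node_ram_gib = 256             # RAM per node, treated as GiB here

ns_l_kib = round(ns_nnz_l * BYTES_PER_ENTRY / 1024)
ns_lu_gib = factor_gib(ns_nnz_l + ns_nnz_u)
temp_lu_gib = factor_gib(temp_nnz_lu)

print(f"Navier-Stokes L alone: {ns_l_kib} KiB")
print(f"Navier-Stokes L+U:     {ns_lu_gib:.1f} GiB")
print(f"Temperature L+U:       {temp_lu_gib:.1f} GiB")
print(f"Fits in one node:      {ns_lu_gib + temp_lu_gib <= node_ram_gib}")
```

Under these assumptions the L factor alone comes out as 412359210 KiB, which is the same number the log reports for the failed allocation (the log labels it as bytes), and L+U for the Navier-Stokes matrix is roughly 785 GiB, about three times the RAM of one node. So the request seems to follow directly from the predicted fill-in.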
My iparm setup is below.
IPARM(1) = 1 ! NO SOLVER DEFAULT
IPARM(2) = 3 ! FILL-IN REORDERING FROM METIS
IPARM(3) = 0 ! NUMBERS OF PROCESSORS
IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
IPARM(5) = 0 ! NO USER FILL-IN REDUCING PERMUTATION
IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
IPARM(7) = 0 ! NOT IN USE
IPARM(8) = 5 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
IPARM(9) = 0 ! NOT IN USE
IPARM(10) = 16 ! PERTURB THE PIVOT ELEMENTS WITH 1E-13
IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
IPARM(12) = 0 ! NOT IN USE
IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
IPARM(14) = 0 ! OUTPUT: NUMBER OF PERTURBED PIVOTS
IPARM(15) = 0 ! NOT IN USE
IPARM(16) = 0 ! NOT IN USE
IPARM(17) = 0 ! NOT IN USE
IPARM(18) = -1 ! OUTPUT: NUMBER OF NONZEROS IN THE FACTOR LU
IPARM(19) = -1 ! OUTPUT: MFLOPS FOR LU FACTORIZATION
IPARM(20) = 0 ! OUTPUT: NUMBERS OF CG ITERATIONS
IPARM(24) = 0
IPARM(34) = 0
IPARM(27) = 0
IPARM(35) = 1 ! ZERO BASE INDEXING
IPARM(39) = 0
I'm using Parallel Studio XE 2017.4.196 on a single cluster node. Each node has 2 x Intel Xeon E5-2690 v4 and 256 GB of RAM, and runs CentOS 7.3.
Thanks in advance for your help.