Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Pardiso and Cluster Pardiso Questions


Hello everyone!

I'm using PARDISO to solve the Navier-Stokes and temperature equations in one program.

I run phase 11 only once, at program start, to tell the solver what my matrices look like. At every time layer I need to refill both matrices, factorize them, and find the solution. To do so I set the maxfct parameter to 2 and switch mnum between 1 and 2 for the two equations. I also use two different pt arrays, one per equation. I should mention that I call phase 0 after each time layer in order to free memory.
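A minimal sketch of this call pattern, for illustration only. The program name, the variable names (pt_ns, pt_t, a_ns, a_t), and the tiny 3x3 matrix are hypothetical placeholders, and the solver defaults (iparm(1) = 0, one-based indexing) are assumed instead of the iparm setup listed further down. It needs Intel MKL at link time and has not been checked against the cluster version.

```fortran
! Sketch only: analyse two systems once, refactorize and solve every time layer.
program two_systems_sketch
  implicit none
  integer, parameter :: n = 3, nnz = 5, nsteps = 2
  integer(kind=8) :: pt_ns(64), pt_t(64)      ! one handle array per equation
  integer :: iparm(64), perm(n), ia(n+1), ja(nnz)
  integer :: maxfct, mtype, nrhs, msglvl, error, step
  double precision :: a_ns(nnz), a_t(nnz), b(n), x(n)
  external :: pardiso

  ! Hypothetical sparsity pattern in one-based CSR format
  ia = (/ 1, 3, 4, 6 /)
  ja = (/ 1, 2, 2, 1, 3 /)
  a_ns = (/ 4.d0, 1.d0, 3.d0, 1.d0, 5.d0 /)
  a_t  = (/ 2.d0, 1.d0, 2.d0, 1.d0, 2.d0 /)
  b = 1.d0

  pt_ns = 0                 ! handles must be zero before the first call
  pt_t  = 0
  iparm = 0                 ! iparm(1) = 0 selects the solver defaults
  maxfct = 2
  mtype  = 11               ! real nonsymmetric matrix
  nrhs   = 1
  msglvl = 0

  ! Phase 11 (analysis): once per system, at program start
  call pardiso(pt_ns, maxfct, 1, mtype, 11, n, a_ns, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
  if (error /= 0) stop 'analysis failed for system 1'
  call pardiso(pt_t, maxfct, 2, mtype, 11, n, a_t, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
  if (error /= 0) stop 'analysis failed for system 2'

  do step = 1, nsteps
     ! ... refill a_ns, a_t and b here; the sparsity pattern stays fixed ...

     ! Phase 23: numerical factorization followed by solve
     call pardiso(pt_ns, maxfct, 1, mtype, 23, n, a_ns, ia, ja, perm, &
                  nrhs, iparm, msglvl, b, x, error)
     if (error /= 0) stop 'factorize/solve failed for system 1'
     call pardiso(pt_t, maxfct, 2, mtype, 23, n, a_t, ia, ja, perm, &
                  nrhs, iparm, msglvl, b, x, error)
     if (error /= 0) stop 'factorize/solve failed for system 2'

     ! Phase 0: release the L and U factors of matrix number mnum
     call pardiso(pt_ns, maxfct, 1, mtype, 0, n, a_ns, ia, ja, perm, &
                  nrhs, iparm, msglvl, b, x, error)
     call pardiso(pt_t, maxfct, 2, mtype, 0, n, a_t, ia, ja, perm, &
                  nrhs, iparm, msglvl, b, x, error)
  end do

  ! Phase -1: release all internal memory at the end of the run
  call pardiso(pt_ns, maxfct, 1, mtype, -1, n, a_ns, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
  call pardiso(pt_t, maxfct, 2, mtype, -1, n, a_t, ia, ja, perm, &
               nrhs, iparm, msglvl, b, x, error)
end program two_systems_sketch
```

The point of the pattern is that the expensive analysis step sits outside the time loop, and only factorization, solve, and the release of the factors are repeated.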

I want to speed up the computations by using several cluster nodes. I found the Cluster version of PARDISO and was surprised to see that the maxfct and mnum parameters are ignored. I also didn't find phase 0.

My first question is: do I really need those parameters to solve my problem on a cluster? I don't want to run phase 11 at every time layer because it would be too slow.

Secondly, I have a problem using PARDISO on a single cluster node when the number of equations is ~13 million: I receive error -2. Below is the output I get after phase 11 for both matrices and phase 23 for the Navier-Stokes equation.

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 1.680254 s
Time spent in reordering of the initial matrix (reorder)         : 15.632488 s
Time spent in symbolic factorization (symbfct)                   : 128.883790 s
Time spent in data preparations for factorization (parlist)      : 2.423303 s
Time spent in allocation of internal data structures (malloc)    : 168.633973 s
Time spent in additional calculations                            : 13.837179 s
Total time spent                                                 : 331.090987 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           13198336
             number of non-zeros in A:      107134147
             number of non-zeros in A (%): 0.000062

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    6792104
             size of largest supernode:               56189
             number of non-zeros in L:                52781978834
             number of non-zeros in U:                52521656342
             number of non-zeros in L+U:              105303635176

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.309069 s
Time spent in reordering of the initial matrix (reorder)         : 2.541601 s
Time spent in symbolic factorization (symbfct)                   : 2.516607 s
Time spent in data preparations for factorization (parlist)      : 0.269381 s
Time spent in allocation of internal data structures (malloc)    : 1.306915 s
Time spent in additional calculations                            : 1.014548 s
Total time spent                                                 : 7.958121 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           3368499
             number of non-zeros in A:      22736533
             number of non-zeros in A (%): 0.000200

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    2244305
             size of largest supernode:               13571
             number of non-zeros in L:                2906625075
             number of non-zeros in U:                2863523632
             number of non-zeros in L+U:              5770148707
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
*** Error in PARDISO  (     insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 412359210 bytes failed
total memory wanted here: 435961139 kbyte

=== PARDISO: solving a real nonsymmetric system ===


Summary: ( starting phase is factorization, ending phase is solution )
================

Times:
======
Time spent in additional calculations                            : 0.000074 s
Total time spent                                                 : 0.000074 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           13198336
             number of non-zeros in A:      107134147
             number of non-zeros in A (%): 0.000062

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    6792104
             size of largest supernode:               56189
             number of non-zeros in L:                52781978834
             number of non-zeros in U:                52521656342
             number of non-zeros in L+U:              105303635176
             gflop   for the numerical factorization: 3841175.475158

 PARDISO ERROR =           -2
 

Do I really need this much memory to solve the system, or is it some kind of bug? If it is not a bug, can this problem be solved by using Cluster PARDISO?
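For scale, the "total memory wanted" figure from the log can be converted as follows. This assumes that "kbyte" means 1024 bytes, which the log does not state:

```latex
435\,961\,139\ \text{kbyte} \times 1024\ \frac{\text{bytes}}{\text{kbyte}}
  = 446\,424\,206\,336\ \text{bytes}
  \approx 415.8\ \text{GiB}
```

If "kbyte" means 1000 bytes instead, the figure is about 436 GB. Either way the request is more than one and a half times the 256 GB of RAM available on one node.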

 

My iparm setup is below.

    IPARM(1) = 1 ! NO SOLVER DEFAULT
    IPARM(2) = 3 ! FILL-IN REORDERING FROM METIS
    IPARM(3) = 0 ! NUMBERS OF PROCESSORS
    IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
    IPARM(5) = 0 ! NO USER FILL-IN REDUCING PERMUTATION
    IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
    IPARM(7) = 0 ! NOT IN USE
    IPARM(8) = 5 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
    IPARM(9) = 0 ! NOT IN USE
    IPARM(10) = 16 ! PERTURB THE PIVOT ELEMENTS WITH 1E-16
    IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
    IPARM(12) = 0 ! NOT IN USE
    IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
    IPARM(14) = 0 ! OUTPUT: NUMBER OF PERTURBED PIVOTS
    IPARM(15) = 0 ! NOT IN USE
    IPARM(16) = 0 ! NOT IN USE
    IPARM(17) = 0 ! NOT IN USE
    IPARM(18) = -1 ! OUTPUT: NUMBER OF NONZEROS IN THE FACTOR LU
    IPARM(19) = -1 ! OUTPUT: MFLOPS FOR LU FACTORIZATION
    IPARM(20) = 0 ! OUTPUT: NUMBERS OF CG ITERATIONS
    IPARM(24) = 0 ! CLASSIC PARALLEL FACTORIZATION ALGORITHM
    IPARM(27) = 0 ! MATRIX CHECKER IS OFF
    IPARM(34) = 0
    IPARM(35) = 1 ! ZERO BASE INDEXING
    IPARM(39) = 0

I'm using Parallel Studio XE 2017.4.196 on a single cluster node. Each node has 2 x Intel Xeon E5-2690 v4 and 256 GB of RAM, and runs CentOS 7.3.

 

Thanks in advance for your help.

