
Problems with mkl_cluster_sparse_solver


Dear all,

Unfortunately, I am again having trouble with mkl_cluster_sparse_solver, as in my previous topic. I have taken one of the examples Intel provides in the MKL examples directory and modified it in two ways: the code can now read an arbitrary matrix stored in the file fort.110, and it performs a loop over the solver routines, since I want to change the matrix from one cycle to the next later on. The first problem arises for large systems.
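For reference, fort.110 is a plain text file laid out the way the read statements in the program below expect; schematically (this is not the actual file content, just the layout):

DimensionL  Nsparse            ! first line: matrix dimension and number of nonzeros
VAL(1)  ...  VAL(Nsparse)      ! then Nsparse lines, one nonzero value per line
IA(1)   ...  IA(DimensionL+1)  ! then DimensionL+1 lines, one CSR row pointer per line
JA(1)   ...  JA(Nsparse)       ! then Nsparse lines, one column index per line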

In this case you can find the matrix in the attached fort1.zip. The program aborts with a segmentation fault after reaching 18%: forrtl: severe (174): SIGSEGV, segmentation fault occurred. This is hard to track down, but the crash must happen inside the solver routine, because the routine does start. As I said, this only happens for large matrices, and I do not know how to get rid of the problem.

The next problem occurs for small matrices, such as the one in the attached fort.zip. It seems to be caused by the loop: the first cycle works fine, but the second cycle aborts with an error message I have already seen in one of my previous topics:

Fatal error in PMPI_Reduce: Message truncated, error stack:
PMPI_Reduce(2334).................: MPI_Reduce(sbuf=0x7d7d7f8, rbuf=0x7f0b900, count=22912, MPI_DOUBLE, MPI_SUM, root=0, comm=0x84000004) failed
MPIR_Reduce_impl(1439)............: fail failed
I_MPIR_Reduce_intra(1533).........: Failure during collective
MPIR_Reduce_intra(1201)...........: fail failed
MPIR_Reduce_Shum_ring(833)........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 1 and tag 11 truncated; 14000 bytes received but buffer size is 1296

I have tried what helped the last time, namely providing all parameters (nrhs, msglvl, iparm, ...) on all ranks again, but that does not fix the issue; what I mean by this is sketched below.
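To be explicit, the sketch below shows what I mean by providing the controls on every rank: broadcast them from rank 0 before calling the solver. The MPI_BCAST calls are only an illustration and are not part of the program further down; the MPI datatype has to match the integer kind the build actually uses (with -i8 the default integers are 8 bytes, hence MPI_INTEGER8 here as an assumption).

! Illustration only: make sure every rank enters cluster_sparse_solver
! with identical control values by broadcasting them from rank 0.
! MPI_INTEGER8 is assumed to match the 8-byte default integers of an -i8 build.
call MPI_BCAST(mtype,  1,  MPI_INTEGER8, 0, MKL_COMM, mpi_stat)
call MPI_BCAST(nrhs,   1,  MPI_INTEGER8, 0, MKL_COMM, mpi_stat)
call MPI_BCAST(msglvl, 1,  MPI_INTEGER8, 0, MKL_COMM, mpi_stat)
call MPI_BCAST(iparm,  64, MPI_INTEGER8, 0, MKL_COMM, mpi_stat)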

This is the program code (cl_solver_f90.f90):

program cluster_sparse_solver
use mkl_cluster_sparse_solver
implicit none
include 'mpif.h'
integer, parameter :: dp = kind(1.0D0)
!.. Internal solver memory pointer for 64-bit architectures
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE)  :: pt(64)

integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, ik, l1, k1, idum(1), DimensionL, Nsparse
integer*4 mpi_stat, rank, num_procs
double precision :: ddum(1)

integer, allocatable :: IA( : ),  JA( : ), iparm( : )
double precision, allocatable :: VAL( : ), rhodot( : ), rho( : )

integer(4) MKL_COMM


MKL_COMM=MPI_COMM_WORLD
call mpi_init(mpi_stat)
call mpi_comm_rank(MKL_COMM, rank, mpi_stat)


do l1 = 1, 64
  pt(l1)%dummy = 0
end do

 error       = 0   ! initialize error flag
 msglvl      = 1   ! print statistical information
 mtype       = 11  ! real, non-symmetric
 nrhs        = 1
 maxfct      = 1
 mnum        = 1

allocate(iparm(64))
 
do l1 = 1, 64
 iparm(l1) = 0
end do

!Setup PARDISO control parameter
 iparm(1)  = 1   ! do not use default values
 iparm(2)  = 3   ! fill-in reordering from METIS
 iparm(8)  = 100 ! Max. number of iterative refinement steps on entry
 iparm(10) = 13  ! perturb the pivot elements with 1E-13
 iparm(11) = 1   ! use nonsymmetric permutation and scaling MPS
 iparm(13) = 1   ! Improved accuracy using nonsymmetric weighted matching
 iparm(27) = 1   ! checks whether column indices are sorted in increasing order within each row

read(110,*) DimensionL, Nsparse   ! every rank reads the sizes; only rank 0 reads the arrays below

allocate(VAL(Nsparse), JA(Nsparse), IA(DimensionL+1))   ! IA needs DimensionL+1 entries (CSR row pointers)

if (rank.eq.0) then
  do k1 = 1, Nsparse
    read(110,*) VAL(k1)
  end do
  do k1 = 1, DimensionL+1
    read(110,*) IA(k1)
  end do
  do k1 = 1, Nsparse
    read(110,*) JA(k1)
  end do
end if

allocate(rhodot(DimensionL), rho(DimensionL))

if (rank.eq.0) then
  rhodot    = 0.0d0
  rhodot(1) = 1.0d0
  rho       = 0.0d0
end if

if (rank.eq.0) write(*,*) 'INIT PARDISO'

ik = 0
Pardisoloop: do

  ik = ik + 1

  phase = 12   ! analysis + numerical factorization
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

  phase = 33   ! solve with iterative refinement
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

  if (ik.ge.4) exit Pardisoloop

end do Pardisoloop


call MPI_BARRIER(MKL_COMM,mpi_stat)

phase = -1   ! release all internal solver memory
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error


call mpi_finalize(mpi_stat)

end

I compile with

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -i8 -I${MKLROOT}/include -c -o cl_solver_f90.o cl_solver_f90.f90
mpiifort mkl_cluster_sparse_solver.o cl_solver_f90.o -o MPI.out  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run the program with mpiexec -n 2 ./MPI.out. Our cluster has 16 cores per node and I request two nodes. RAM should not be the problem (64 GB), since the same matrix runs fine with the normal PARDISO on a single node. I set export MKL_NUM_THREADS=16. Am I right that the second MPI process should automatically receive parts of the factorization, or do I have to use the distributed version of the solver for that? I ask because I cannot see any process running on the second node; the small check I use for this is shown below.
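The check is nothing more than printing the host name from every rank; this small test program is separate from the code above and only meant as a sketch:

program where_ranks_run
implicit none
include 'mpif.h'
character(len=MPI_MAX_PROCESSOR_NAME) :: host
integer*4 rank, namelen, ierr
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_GET_PROCESSOR_NAME(host, namelen, ierr)   ! name of the node this rank runs on
write(*,*) 'rank ', rank, ' runs on ', host(1:namelen)
call MPI_FINALIZE(ierr)
end program where_ranks_run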

The versions are MKL 2017.4.256, ifort 17.0.6.256 and Intel MPI 2017.4.239, but my colleague can reproduce the issue with other versions and on other clusters as well.

Thanks in advance,

Horst

Attachments:
fort1.zip (application/zip, 52.63 MB)
fort.zip (application/zip, 356.13 KB)
