PARDISO randomly crashes when memory usage is limited

Hi,

If I run the compiled executable on the head node without limiting the number of CPUs or the amount of memory, PARDISO runs to completion and gives correct results on the Linux server. The code also always works on Windows. However, PARDISO crashes randomly when I limit the memory usage by submitting the job with qsub on the Linux server (see runCWNLAT.sub below). The crashes happen without any error message, and PARDISO does not return any error information. Could you please help me find out what the reason might be?

My C++ code calls some functions from Fortran code, and PARDISO is called from the Fortran code. In a single execution, PARDISO is called many times. The simplified Fortran code that calls PARDISO is:

…

! Init or set PARDISO parameters

maxfct = 1

mnum = 1

mtype = 6 ! complex and symmetric matrix

phase = 13 ! analysis, numerical factorization, solve, iterative refinement

nrhs = n_recei + 1 ! number of right-hand sides that need to be solved for

msglvl = 0 ! if msglvl=1, print statistical information

error = 0

call pardisoinit(pt, mtype, iparm) ! init pardiso with default parameters in accordance with the matrix type

iparm(4) = 0 ! no iterative solver, use direct algorithm

iparm(28) = 0 ! use type double precision "double complex" instead of "complex"

iparm(35) = 0 ! one-based indexing (Fortran-style indexing)

! Solve A*u = f with mkl PARDISO

call pardiso(pt, maxfct, mnum, mtype, phase, &

& n_totNodes, & ! num of rows of A, ~ num of equations in A*u = f

& csrA_vals, csrA_rows, csrA_cols, & ! CSR3 A

& perm, nrhs, iparm, msglvl, &

& f, & ! right-hand side vector/matrix

& u, & ! solution vector/matrix

& error)

if (error /= 0) then

write(6,*) 'ERROR during PARDISO backslash! Error = ', error

stop "*** ERROR during PARDISO backslash! ***"

endif

phase = -1

call pardiso(pt, maxfct, mnum, mtype, phase, n_totNodes, dummy, csrA_rows, csrA_cols, perm, nrhs, iparm, msglvl, dummy, dummy, error)

…

The matrix A above is a sparse complex symmetric matrix with about 1 to 2 million nonzeros. I measured a peak memory usage of about 3 GB during execution, but PARDISO crashes even when I request 16 GB of memory on the server.
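As a cross-check of the 3 GB figure, the peak memory of a process can be read from /proc on Linux. The sketch below is only an illustration: the helper name `peak_mem_kb` is hypothetical, and it assumes the usual `VmPeak` (peak virtual size) and `VmHWM` (peak resident size) fields of a `/proc/<pid>/status` file. Whether the batch system enforces its limit on resident or on virtual memory depends on the site configuration, which is not known here.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original code): print the peak virtual
# and peak resident memory, in kB, from text in /proc/<pid>/status format
# read on standard input.
peak_mem_kb() {
    awk '/^VmPeak:/ {peak = $2}
         /^VmHWM:/  {hwm = $2}
         END {printf "VmPeak_kB=%s VmHWM_kB=%s\n", peak, hwm}'
}

# Example: inspect the current shell itself (Linux only).
if [ -r "/proc/$$/status" ]; then
    peak_mem_kb < "/proc/$$/status"
fi
```

For a running job, `$$` would be replaced by the PID of the solver process, read shortly before it finishes.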

In the makefile, I first compile the C++ and Fortran sources to object files and then link them together. Here are the details:

Operating system and version

-bash-4.1$ lsb_release -a

LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch

Distributor ID: RedHatEnterpriseServer

Description: Red Hat Enterprise Linux Server release 6.3 (Santiago)

Release: 6.3

Codename: Santiago

Library version: mkl_composer_xe_2013

-bash-4.1$ echo $MKLROOT

/usr/opt/intel/composer_xe_2013.1.117/mkl

Compiler version

-bash-4.1$ ifort -v

ifort version 13.0.1

GNU Compiler Collection (GCC)* or Microsoft Visual Studio* version (if applicable)

-bash-4.1$ g++ -v

Using built-in specs.

COLLECT_GCC=g++

COLLECT_LTO_WRAPPER=/sb/gcc-5.2.0/libexec/gcc/x86_64-unknown-linux-gnu/5.2.0/lto-wrapper

Target: x86_64-unknown-linux-gnu

Configured with: /sb/objdir/../gcc-5.2.0/configure --prefix=/sb/gcc-5.2.0 --enable-languages=c,c++,fortran,go --disable-multilib

Thread model: posix

gcc version 5.2.0 (GCC)

Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions)

Makefile:

SRCFDIR = $(realpath ./)/src_Fortran

SRCCDIR = $(realpath ./)/src_Cpp_cw5

OBJDIR = $(realpath ./)/obj

MKDIR = if [ ! -d $(@D) ]; then mkdir -p $(@D); fi

PROGRAM=cw5

ARCH = $(shell uname -m)

TARGET = ${PROGRAM}.${ARCH}

#include Makefile.${ARCH}

CPPC = g++

FC_SEQ = ifort

FC_PAR = ifort

FC_LINK = ifort

MKL_LINK_FLAGS =-L$(MKLROOT)/lib/intel64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread -lm

#Begin Optimized options

CPP_FLAGS = -std=c++17 -mcmodel=large -w

F_SEQ_FLAGS = -O3 -shared-intel

F_PAR_FLAGS = -O3 -shared-intel

F_LINK_FLAGS = -O3 -static-intel -cxxlib -lrt

#End Optimized options

C_MAIN = cw5.cpp

F_SRCS = dcwnlatg4.f

F90_SRCS = dcwnlatf4.f90

OBJS = $(OBJDIR)/${C_MAIN:.cpp=.o} $(OBJDIR)/${F_SRCS:.f=.o} $(OBJDIR)/${F90_SRCS:.f90=.o}

all: ${TARGET} CWNLAT

# ********* First Program: cw5 ************ #

#${TARGET}: ${OBJS}

# $(FC_LINK) -o $@ $(F_LINK_FLAGS) ${OBJS}

${TARGET}: ${OBJS}

$(FC_LINK) $(F_LINK_FLAGS) -nofor_main -o $@ ${OBJS} $(MKL_LINK_FLAGS)

$(OBJDIR)/dcwnlatg4.o : $(SRCFDIR)/${F_SRCS}

@$(MKDIR)

$(FC_PAR) $(F_PAR_FLAGS) -o $(OBJDIR)/dcwnlatg4.o -c $(SRCFDIR)/${F_SRCS}

$(OBJDIR)/dcwnlatf4.o : $(SRCFDIR)/${F90_SRCS}

@$(MKDIR)

$(FC_PAR) $(F_PAR_FLAGS) -o $(OBJDIR)/dcwnlatf4.o -c $(SRCFDIR)/${F90_SRCS}

$(OBJDIR)/cw5.o : $(SRCCDIR)/${C_MAIN}

@$(MKDIR)

$(CPPC) $(CPP_FLAGS) -o $(OBJDIR)/cw5.o -c $(SRCCDIR)/${C_MAIN} -lrt

# ********* Second Program CWNLAT ************ #

CWNLAT : $(SRCFDIR)/runCWNLAT.f

$(FC_SEQ) $(F_SEQ_FLAGS) -o CWNLAT $(SRCFDIR)/runCWNLAT.f

.PHONY: clean cleanall

clean:

rm $(OBJS) CWNLAT

cleanall:

rm $(OBJS) *~

runCWNLAT.sub used for qsub:

# Tell PBS which shell to use on the compute nodes. Options are: /bin/bash or /bin/tcsh

#PBS -S /bin/bash

# Tell PBS the name to use for your job

#PBS -N runCWNLAT

# request #nodes:#cpus/node:#memory/node,requested time

#PBS -l select=1:ncpus=8:mem=16gb,walltime=00:04:00

# queue group

#PBS -q normal

# Tell PBS to join the output (.o) and error (.e) files into one file

#PBS -j oe

# *********** Commands **********#

# Tell PBS to run the job in the directory your job was submitted from

cd $PBS_O_WORKDIR

# Set up env for Intel MKL

source /opt/intel/bin/ifortvars.sh intel64

./CWNLAT ENmodel_33.DAT
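Since the failures appear only under qsub, it may help to record the limits and thread settings that the job actually sees and compare them with the head node. The following is a minimal diagnostic sketch, not a confirmed fix. The function name `report_env` is hypothetical. It only prints values obtained with standard shell facilities (`uname`, `nproc`, `ulimit`) together with the `OMP_NUM_THREADS` and `MKL_NUM_THREADS` environment variables, which control the OpenMP and MKL thread counts.

```shell
#!/bin/bash
# Hypothetical diagnostic helper: print resource limits and thread settings
# visible to the current shell, one "name=value" pair per line.
report_env() {
    echo "host=$(uname -n)"
    echo "cpus_visible=$(nproc 2>/dev/null || echo unknown)"
    echo "stack_kb=$(ulimit -s)"    # per-process stack size limit
    echo "vmem_kb=$(ulimit -v)"     # virtual memory limit
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-unset}"
}

report_env
```

Running `report_env` once on the head node and once inside runCWNLAT.sub, just before `./CWNLAT ENmodel_33.DAT`, gives two listings that can be compared line by line.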
