
HPCG and mpirun on 2S Xeon-EP


Hi,

I'm encountering a problem when trying to measure the per-socket performance of a Xeon E5 v3 chip with Cluster-on-Die (COD) enabled; the problem also persists when I run on both sockets of a 2S Xeon-EP node. I am using the latest benchmark package from the intel.com website (l_mklb_p_2017.1.013) and am following the advice from https://software.intel.com/en-us/node/599526.

I tried running

I_MPI_ADJUST_ALLREDUCE=5 mpiexec.hydra -n 2 env OMP_NUM_THREADS=18 KMP_AFFINITY=granularity=fine,compact,1,0 bin/xhpcg_avx2 --n=168

on a 2S E5-2697 v4 with COD deactivated, and

mpiexec.hydra -genv I_MPI_PIN_DOMAIN node -genv I_MPI_PIN disable -np 1 -env OMP_NUM_THREADS 7 -env KMP_AFFINITY 'verbose,granularity=fine,proclist=[0,1,2,3,4,5,6],explicit' ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0 : -np 1 -env OMP_NUM_THREADS 7 -env KMP_AFFINITY 'verbose,granularity=fine,proclist=[7,8,9,10,11,12,13],explicit' ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0

to evaluate the per-socket performance of a 14-core Haswell-EP with COD active.
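In case the actual placement matters: I assume adding Intel MPI's debug output (together with KMP_AFFINITY's verbose report) would show where the ranks and threads really end up, so I can re-run the first case with something along these lines and post the pinning map:

I_MPI_DEBUG=4 I_MPI_ADJUST_ALLREDUCE=5 mpiexec.hydra -n 2 env OMP_NUM_THREADS=18 KMP_AFFINITY=verbose,granularity=fine,compact,1,0 bin/xhpcg_avx2 --n=168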

The problem is that xhpcg never finishes when running with more than one MPI process per node. With one process per node and OMP_NUM_THREADS set to x, top shows x*100% CPU load for that process (I know top probably isn't the best tool to estimate core utilisation for a memory-bound application, but it's a good enough indicator); with more than one MPI process, the CPU utilisation drops to 100% per MPI process instead of x*100%.
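If a per-thread view is more useful, I can also capture something like the following (the PID is a placeholder for one of the xhpcg processes) to show whether the OpenMP threads are spinning or mostly idle:

top -H -p <pid of one xhpcg_avx2_mpi process>
ps -eLf | grep xhpcg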

I tried some debugging, but I'm neither an MPI nor an HPCG expert, so some help would be appreciated. I set
HPCG_OPTS     = -DHPCG_DEBUG -DHPCG_DETAILED_DEBUG
and compiled a new binary; a sketch of that change follows this paragraph. Running with two MPI processes produces two log files, and the last entries of each are shown below the sketch.
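For completeness, this is roughly how I enabled the debug flags; I'm assuming the usual HPCG build layout here, where the options live in a setup/Make.* file (the exact file name in the MKL benchmark package may differ; IMPI_IOMP_AVX2 is just the name of my build directory):

# in setup/Make.IMPI_IOMP_AVX2 (name assumed), then rebuild xhpcg from scratch
HPCG_OPTS     = -DHPCG_DEBUG -DHPCG_DETAILED_DEBUG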
broadep2:IMPI_IOMP_AVX2 iwi325$ tail hpcg_log_n168_2p_1t_2016.11.24.20.16.15.txt
Process 0 of 2 has 9261 rows.
Process 0 of 2 has 230702 nonzeros.
Process 0 of 2 has 4741632 rows.
Process 0 of 2 has 126758012 nonzeros.
Process 0 of 2 has 592704 rows.
Process 0 of 2 has 15687500 nonzeros.
Process 0 of 2 has 74088 rows.
Process 0 of 2 has 1922000 nonzeros.
Process 0 of 2 has 9261 rows.
Process 0 of 2 has 230702 nonzeros.

broadep2:IMPI_IOMP_AVX2 iwi325$ tail hpcg_log_n168_2p_1t_1_2016.11.24.20.16.15.txt
Process 1 of 2 has 9261 rows.
Process 1 of 2 has 230702 nonzeros.
Process 1 of 2 has 4741632 rows.
Process 1 of 2 has 126758012 nonzeros.
Process 1 of 2 has 592704 rows.
Process 1 of 2 has 15687500 nonzeros.
Process 1 of 2 has 74088 rows.
Process 1 of 2 has 1922000 nonzeros.
Process 1 of 2 has 9261 rows.
Process 1 of 2 has 230702 nonzeros.

Maybe there's an MPI_Barrier (or some other collective) on which the two processes are waiting for a third, non-existent MPI process?
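If a backtrace would help confirm that, I can attach gdb to the two hung ranks and post where they are stuck; roughly like this (the PID is a placeholder, and this assumes debug symbols are available in the binary):

gdb -p <pid of rank 0>
(gdb) thread apply all bt

I would expect to see both ranks sitting inside an MPI collective if my guess is right.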

Any help would be appreciated.
 

