Quantcast
Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all articles
Browse latest Browse all 2652

Segfault with 8 threads linking against MKL 11.0.3

$
0
0

Dear all,

I am experiencing a segfault on Linux with an application of mine when I link it against MKL shipped with composer_xe_2013.3.163 (Update 3 - March 2013), which should be 11.0.3 according to http://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-....

My application is multi-threaded and it uses pthreads. The segfault happens in cblas_dgemm when I spawn 8 threads: runs with 1, 2 or 4 threads work fine. I am linking against libmkl_intel_lp64.so, libmkl_core.so, libmkl_sequential.so. I have the following environment:

MKL_DISABLE_FAST_MM=1
MKL_SERIAL=YES
MKL_NUM_THREADS=1

The same binary compiled with composer_xe_2013.3.163 runs perfectly on 8 threads if I point LD_LIBRARY_PATH to the MKL libraries shipped with Intel Compilers version 11.1.069. So it really seems to be a version-specific issue.

I have tried to set:

ulimit -s unlimited
MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1"
OMP_NUM_THREADS=1
OMP_DYNAMIC=FALSE
MKL_DYNAMIC=FALSE
OMP_NESTED=FALSE

but it makes no difference. Here I attach the valgrind trace:

==20175== Thread 3:
==20175== Invalid read of size 8
==20175==    at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175==    by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175==    by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)
==20175==  Address 0xd0 is not stack'd, malloc'd or (recently) free'd
==20175==
==20175==
==20175== Process terminating with default action of signal 11 (SIGSEGV)
==20175==  Access not within mapped region at address 0xD0
==20175==    at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175==    by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175==    by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175==    by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175==    by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175==    by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175==    by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)

I consistenly get this error on "Address 0xd0".
As I mentioned, my program works perfectly when linked against older Intel MKL versions, as well as ATLAS or Sun Performance LIbrary.
I would be very glad if you could indicate a way to solve my problem.

Thanks, best regards,
Paolo


Viewing all articles
Browse latest Browse all 2652

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>