I have a real head-scratcher of a problem here, and am hoping that someone can help me resolve it. The issue is a fatal error generated when MKL is dynamically loaded from a Linux shared library, which is in turn referenced by a Python module created with the SWIG interface-generation tool.
I'm encountering this issue on an Ubuntu Linux system, using gcc 4.6.3 to compile against the version of MKL included in Parallel Studio 2016 Update 2. I'm also using Python 2.7.3 and SWIG 2.0.4. (I can consistently reproduce the problem using the Intel C compiler, Python 2.7.11, and/or Parallel Studio 2015 as well.) I am running in an environment with all necessary environment variables set, as produced by mklvars.sh (MKLROOT is there, LD_LIBRARY_PATH includes the MKL libraries, etc.).
I've created a stripped-down example to demonstrate my issue, but it's still a little complicated, so I will explain as I go. First, we define a C library called foo in the header/source pair foo.h/foo.c. This library exposes a single function, bar(), which makes a trivial BLAS call. (First code block is foo.h, second is foo.c.)
    #ifndef _FOO_H
    #define _FOO_H
    void bar();
    #endif//_FOO_H

    #include "mkl.h"
    void bar()
    {
        double arr[1] = { 1.0 };
        cblas_daxpy(1, 1, arr, 1, arr, 1);
    }
To check that this function runs without errors, we use it in a simple executable, defined in main.c:
    #include "foo.h"
    int main()
    {
        bar();
        return 0;
    }
Then we create a simple SWIG interface file foo.i, allowing generation of a Python interface for the foo library:
    %module foo
    %{
    #define SWIG_FILE_WITH_INIT
    #include "foo.h"
    %}
    %include "foo.h"
The main executable and the Python/SWIG module can be built using gcc, with the following sequence of commands. Note that the MKL linking options are precisely as recommended by the MKL link line advisor tool. With the exception of a warning about a set-but-unused variable in the SWIG wrapper, compilation proceeds cleanly.
    gcc -Wall -Wextra -O0 -fPIC -I$MKLROOT/include -c -o foo.o foo.c
    gcc -Wall -Wextra -O0 -shared -L$MKLROOT/lib/intel64 -Wl,-rpath=./ -o libfoo.so foo.o -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -ldl -lpthread -lm
    gcc -Wall -Wextra -O0 -L. -Wl,-rpath=./ -o main main.c -lfoo
    swig -python foo.i
    gcc -Wall -Wextra -O0 -fPIC -I/usr/include/python2.7 -c -o foo_wrap.o foo_wrap.c
    gcc -Wall -Wextra -O0 -shared -L. -L$MKLROOT/lib/intel64 -Wl,-rpath=./ -o _foo.so foo_wrap.o -lfoo -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -ldl -lpthread -lm
After building, the main executable runs without errors. However, attempting to use the generated SWIG module from within a Python interpreter (launched from the directory containing the various outputs of the compilation process) produces the following error:
    Python 2.7.3 (default, Jun 22 2015, 19:33:41)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import foo
    >>> foo.bar()
    Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
By setting LD_DEBUG=libs and trying again, I can see that the fatal error is preceded by a symbol lookup failure, reported as:
[...]/libmkl_def.so: error: symbol lookup error: undefined symbol: mkl_dft_fft_fix_twiddle_table_32f (fatal)
This symbol is defined in libmkl_core.so, which I believe everything should be linked against. The same error (or at least, the same "Intel MKL FATAL ERROR: ..." output) is reported in a post on this forum from December 2015: <https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...>. Another forum post, linked from the original (<https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...>), suggests using LD_PRELOAD to attempt to resolve the problem. Sure enough, if I open the Python interpreter with LD_PRELOAD=$MKLROOT/lib/intel64/libmkl_core.so python, the call to foo.bar() executes without issue. (Attempting to preload libmkl_avx2.so or libmkl_def.so without libmkl_core.so produces a symbol lookup error for the exact same symbol as before.)
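The claim that the symbol is reachable through libmkl_core.so can also be checked from Python. The following is only a minimal sketch and not part of the reproducer: exports_symbol is a hypothetical helper name, and the library path assumes the MKLROOT layout set up by mklvars.sh.

```python
import ctypes
import os


def exports_symbol(library, symbol):
    """Return True if `symbol` resolves through `library` (hypothetical helper).

    `library` is a path or soname handed to dlopen via ctypes.CDLL; None
    means the running process itself. The lookup also searches the
    library's own dependencies, so a hit does not prove that the symbol
    is defined in that exact file.
    """
    handle = ctypes.CDLL(library)
    try:
        getattr(handle, symbol)
    except AttributeError:
        # ctypes raises AttributeError when dlsym cannot find the name.
        return False
    return True


if __name__ == "__main__":
    # Assumption: MKLROOT is set as by mklvars.sh; do nothing otherwise.
    mklroot = os.environ.get("MKLROOT")
    if mklroot:
        core = os.path.join(mklroot, "lib", "intel64", "libmkl_core.so")
        print(exports_symbol(core, "mkl_dft_fft_fix_twiddle_table_32f"))
```

If this prints True for libmkl_core.so, the symbol exists and the failure is more likely about which libraries can see it at load time than about a missing definition.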
So, the question is: can anybody suggest why this is happening, and hopefully suggest a fix that does not involve LD_PRELOAD? (We can't ship code that requires LD_PRELOAD...) My first thought was that this was a Python issue, but I'm not sure -- the other forum post reporting this problem was in relation to a tool called FuPerMod, which (from a quick look at the relevant git repo) doesn't seem to make any use of Python at all...
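One experiment that avoids LD_PRELOAD can be sketched as follows. It rests on an assumption that nothing above verifies: that the failure arises because CPython loads extension modules without RTLD_GLOBAL, so the symbols of libmkl_core.so are not globally visible when MKL later loads libmkl_def.so or libmkl_avx2.so. The helper name global_dlopen is hypothetical.

```python
import contextlib
import ctypes
import sys


@contextlib.contextmanager
def global_dlopen():
    """Temporarily add RTLD_GLOBAL to the flags Python passes to dlopen.

    Hypothetical helper: extension modules imported inside the `with`
    block, and the shared libraries they pull in, are loaded with their
    symbols globally visible. The previous flags are restored on exit.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | ctypes.RTLD_GLOBAL)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)


# Intended use with the module built above (assumes _foo.so is importable):
#
#     with global_dlopen():
#         import foo
#     foo.bar()
```

If foo.bar() succeeds when imported this way, that would point to symbol visibility rather than the link line as the cause. If it still fails, the assumption is wrong and the problem lies elsewhere.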