It would be useful to limit the output of MKL_VERBOSE on a per-thread basis. For example, suppose you are running 1 process with 16 threads on a KNL. You may want MKL_VERBOSE output from only 1 calling thread (even though the MKL calls themselves may be using multiple threads).