
MKL_DNN convolution has the wrong output order on Intel(R) Xeon(R) CPU E5-2650 v3 (Possible bug)


Hello all,

I recently implemented a convolution with the Intel MKL DNN primitives, following the example included with the library. Everything is fine and dandy on my laptop with an i5-3210M. However, when I tried to run the code on the big machine, an Intel(R) Xeon(R) CPU E5-2650 v3, I ran into some bugs/problems.

For outputs whose channel count is a multiple of 8, the order of the output values is wrong. This is either a mistake on my side (probably in the compile options) or, in the worst case, a bug in MKL. I wrote a short test program, similar to the example file, that implements a standard forward convolution:

#include <iostream>
#include <vector>

#include "mkl_dnn.h"

using namespace std;

#define dimension (4)

int main() {

	dnnPrimitiveAttributes_t attributes;
	dnnPrimitive_t conv_prim = NULL;

	// Resource pointer table handed to dnnExecute_F32.
	void* resConv1[dnnResourceNumber] = {0};

	size_t batch_num = 1;

	bool use_bias = false;

	// 4x4 input and output ("same" padding), 1 input channel,
	// 8 output channels, 3x3 filter.
	size_t xinp = 4,
		yinp = 4,
		xout = 4,
		yout = 4,
		inpchannels = 1,
		outchannels = 8,
		xfilt = 3,
		yfilt = 3;


    // MKL DNN sizes and strides are listed from the innermost dimension
    // outwards: { width, height, channels, batch }.
    size_t outputSize[dimension] = { xout, yout, outchannels, batch_num };
    size_t outputStrides[dimension] = { 1, xout, xout * yout, xout * yout * outchannels };

    size_t inputSize[dimension] = { xinp, yinp, inpchannels, batch_num };
    size_t inputStrides[dimension] = { 1, xinp, xinp * yinp, xinp * yinp * inpchannels };

    size_t filterSize[dimension] = { xfilt, yfilt, inpchannels, outchannels };
    size_t filterStrides[dimension] = { 1, xfilt, xfilt * yfilt, xfilt * yfilt * inpchannels };

    size_t biasSize[1] = { outputSize[2] };
    size_t biasStrides[1] = { outputStrides[2] };

    size_t convolutionStride[dimension - 2] = { 1, 1 };
    // Computed in signed arithmetic to avoid size_t wraparound; evaluates to
    // { -1, -1 } here, i.e. "same" padding for a 3x3 filter.
    int inputOffset[dimension - 2] = {
        (int)(inputSize[0] / 2) - (int)(outputSize[0] / 2) - (int)(filterSize[0] / 2),
        (int)(inputSize[1] / 2) - (int)(outputSize[1] / 2) - (int)(filterSize[1] / 2) };

    dnnLayout_t lt_conv1_input = NULL,
                lt_conv1_filt = NULL,
                lt_conv1_bias = NULL,
                lt_conv1_output = NULL;




	if (dnnPrimitiveAttributesCreate_F32(&attributes) != E_SUCCESS) {
		std::cout << "error creating primitive attributes" << std::endl;
	}

	dnnError_t err;
	if (use_bias) {
		err = dnnConvolutionCreateForwardBias_F32(&conv_prim, attributes,
				dnnAlgorithmConvolutionDirect, dimension, inputSize,
				outputSize, filterSize, convolutionStride, inputOffset,
				dnnBorderZeros);
	} else {
		err = dnnConvolutionCreateForward_F32(&conv_prim, attributes,
				dnnAlgorithmConvolutionDirect, dimension, inputSize,
				outputSize, filterSize, convolutionStride, inputOffset,
				dnnBorderZeros);
	}

	if (err != E_SUCCESS) {
		switch (err) {
		case E_INCORRECT_INPUT_PARAMETER:
			std::cout << "incorrect input parameter while creating the convolution" << std::endl;
			break;
		default:
			std::cout << "error while creating convolution" << std::endl;
		}
	}

    // Query the layouts the primitive expects for each resource.
    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_input, conv_prim, dnnResourceSrc);
    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_filt, conv_prim, dnnResourceFilter);
    if (use_bias) {
    	dnnLayoutCreateFromPrimitive_F32(&lt_conv1_bias, conv_prim, dnnResourceBias);
    }
    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_output, conv_prim, dnnResourceDst);


    // All-ones input, filter, and bias, allocated in the plain user layout.
    std::vector<float> input(xinp * yinp * inpchannels, 1.0f);
    std::vector<float> output(xout * yout * outchannels, 1.0f);
    std::vector<float> filter(xfilt * yfilt * inpchannels * outchannels, 1.0f);
    std::vector<float> bias(outchannels, 1.0f);

    // Attach the plain buffers directly; this assumes the primitive reads and
    // writes the plain layouts defined by the strides above.
    resConv1[dnnResourceSrc] = &input[0];
    resConv1[dnnResourceFilter] = &filter[0];
    if (use_bias) resConv1[dnnResourceBias] = &bias[0];
    resConv1[dnnResourceDst] = &output[0];

    dnnError_t err_exe = dnnExecute_F32(conv_prim, resConv1);
    if( err_exe != E_SUCCESS){
    	std::cout << "Error while forward propagation in convolutional layer"<< std::endl;
    	if( err_exe== E_MEMORY_ERROR){
    		std::cout << "Memory Error"<< std::endl;
    	}
    	if( err_exe == E_UNIMPLEMENTED){
    		std::cout << "Unimplemented"<< std::endl;
    	}
    	if( err_exe == E_UNSUPPORTED_DIMENSION){
    		std::cout << "Unsupported dimension"<< std::endl;
    	}
    	if( err_exe == E_INCORRECT_INPUT_PARAMETER){
    		std::cout << "Incorrect input parameter"<< std::endl;
    	}
    }

    std::cout << "output"<<std::endl;
    for( int i=0; i < output.size(); i++){
    	std::cout << output[i] << "";
    }
    std::cout << std::endl;
	return 0;
}

 

The expected output for a 4x4 input of 1s convolved with eight 3x3 filters of 1s is the same 4x4 map (4 6 6 4 / 6 9 9 6 / 6 9 9 6 / 4 6 6 4) repeated once per output channel:

4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4
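For clarity, these numbers are just the result of a plain "same"-padding convolution. A naive reference loop over the sizes from the program above (a sketch, assuming inpchannels == 1 and all-ones data, with expected as a separate plain-layout buffer) would be:

    // Naive "same"-padding reference convolution, assuming inpchannels == 1
    // and all-ones input/filter, writing into a plain-layout buffer.
    std::vector<float> expected(xout * yout * outchannels, 0.0f);
    for (size_t c = 0; c < outchannels; c++)
        for (size_t y = 0; y < yout; y++)
            for (size_t x = 0; x < xout; x++) {
                float acc = 0.0f;
                for (size_t fy = 0; fy < yfilt; fy++)
                    for (size_t fx = 0; fx < xfilt; fx++) {
                        int ix = (int)(x + fx) - 1;  // offset -1 = "same" padding
                        int iy = (int)(y + fy) - 1;
                        if (ix >= 0 && ix < (int)xinp && iy >= 0 && iy < (int)yinp)
                            acc += 1.0f;  // input value * filter value = 1 * 1
                    }
                expected[c * yout * xout + y * xout + x] = acc;
            }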

Those numbers are also exactly what my mobile CPU prints when I run the code. On the Xeon machine, however, I get:

4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4

The values are all there, but grouped in runs of eight, as if the channel dimension had become the innermost one. When I change the number of output channels to something that is not a multiple of 8, the code runs fine even on the Xeon CPU. This might be because MKL then falls back to a different (slower) algorithm, as explained in this post:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...
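One workaround I am considering, but have not verified yet, is to stop assuming that the primitive writes into my plain layout and instead convert its output explicitly with the dnnConversion API. A minimal sketch of what I have in mind, reusing the variables from the program above (lt_user_output and cv_internal_to_user are new names I made up for this sketch):

    // Plain user layout for the output, described by my own strides.
    dnnLayout_t lt_user_output = NULL;
    dnnLayoutCreate_F32(&lt_user_output, dimension, outputSize, outputStrides);

    // If the primitive prefers a different (e.g. channel-blocked) layout,
    // let it write into its own buffer and convert back afterwards.
    if (!dnnLayoutCompare_F32(lt_user_output, lt_conv1_output)) {
        void* internal_output = NULL;
        dnnAllocateBuffer_F32(&internal_output, lt_conv1_output);
        resConv1[dnnResourceDst] = internal_output;
        dnnExecute_F32(conv_prim, resConv1);

        dnnPrimitive_t cv_internal_to_user = NULL;
        dnnConversionCreate_F32(&cv_internal_to_user, lt_conv1_output, lt_user_output);
        dnnConversionExecute_F32(cv_internal_to_user, internal_output, &output[0]);

        dnnDelete_F32(cv_internal_to_user);
        dnnReleaseBuffer_F32(internal_output);
    } else {
        resConv1[dnnResourceDst] = &output[0];
        dnnExecute_F32(conv_prim, resConv1);
    }
    dnnLayoutDelete_F32(lt_user_output);

If the reordered dump really is just the primitive's internal layout, this should restore the order my mobile CPU produces, but I would still like to know whether needing this conversion is the expected behaviour here.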

Does anybody have an explanation or a proper fix for this issue? Is this known behaviour on Xeon CPUs, or a bug in the library? I don't necessarily want to switch to the open-source implementation, since that would mean a week of reimplementing and testing.

For compilation I used the following compile and link flags on both systems:

 -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

 -I${MKLROOT}/include -I${MKLROOT}/../lib/intel64_lin
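For reference, the full command line this corresponds to on my side is roughly the following (conv_test.cpp is just a placeholder name for the source file above):

 g++ conv_test.cpp -o conv_test -I${MKLROOT}/include -I${MKLROOT}/../lib/intel64_lin -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl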

 

Any help would be appreciated.