
Normalize matrix by sum of columns

I have a tensor, a batch of matrices with dims [10 x 6 x 52]: 10 matrices of 6 x 52, row-major. I can change the batch size as I want. The data type is single-precision float. I need to normalize every matrix in the tensor by its column sums (so the sum is a vector of length 52). That is, I need to compute a column-wise sum and then divide every row of the matrix by it elementwise. A pretty typical task in many areas. Currently, I am doing something like this:

 

// [10 x 6 x 52] - [batch x actions x cards_count], row-major
// node.regrets is both source and target tensor; node.regrets_sum is storage for the column sums.
const size_t actionsCount = node.ActionsCount();

for (long long b = 0; b < _batch_size; b++)
{
    memset(node.regrets_sum.data(), 0, node.regrets_sum.size() * sizeof(float));

    // Accumulate the column-wise sum over all action rows of this matrix.
    for (size_t a = 0; a < actionsCount; a++)
    {
        const size_t regretsOffset = (b * actionsCount + a) * cards_count;
        vsAdd(cards_count, node.regrets_sum.data(), node.regrets.data() + regretsOffset, node.regrets_sum.data());
    }

    // Divide every row elementwise by the column sums.
    for (size_t a = 0; a < actionsCount; a++)
    {
        const size_t regretsOffset = (b * actionsCount + a) * cards_count;
        vsDiv(cards_count, node.regrets.data() + regretsOffset, node.regrets_sum.data(), node.regrets.data() + regretsOffset);
    }
}

 

And this is the hottest spot in my app. I am pretty sure performance can be improved, because profiling shows that a gemm on this tensor is faster than this normalization. Any ideas how to optimize this with the help of MKL and the Intel compiler? Maybe I have missed some ready-to-use routine for this case. Thank you in advance!
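For the record, one direction I was thinking about (an untested sketch, not what I run now): compute the column sums with a single cblas_sgemv against a vector of ones, then replace the per-row divisions with one vsInv and cheap vsMul calls. The free function and parameter names below are just stand-ins for the corresponding members of node:

#include <vector>
#include "mkl.h"

// Sketch only: regrets, regrets_sum, batch_size, actions_count and
// cards_count stand in for the node members used in the loop above.
void NormalizeByColumnSums(float* regrets, float* regrets_sum,
                           long long batch_size, size_t actions_count,
                           size_t cards_count)
{
    // All-ones vector: multiplying by it reduces over the actions dimension.
    std::vector<float> ones(actions_count, 1.0f);

    for (long long b = 0; b < batch_size; b++)
    {
        float* mat = regrets + b * actions_count * cards_count;

        // Column sums in one call: regrets_sum = mat^T * ones.
        cblas_sgemv(CblasRowMajor, CblasTrans,
                    (MKL_INT)actions_count, (MKL_INT)cards_count,
                    1.0f, mat, (MKL_INT)cards_count,
                    ones.data(), 1, 0.0f, regrets_sum, 1);

        // One reciprocal per column instead of actions_count divisions
        // (VML allows in-place operation), then scale each row by vsMul.
        vsInv((MKL_INT)cards_count, regrets_sum, regrets_sum);
        for (size_t a = 0; a < actions_count; a++)
        {
            float* row = mat + a * cards_count;
            vsMul((MKL_INT)cards_count, row, regrets_sum, row);
        }
    }
}

Whether such a tiny 6 x 52 gemv actually beats the vsAdd loop, or whether the sums for the whole batch should instead go through a single cblas_sgemm_batch call, is exactly what I am unsure about.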

