I am using DGEMM from MKL to multiply a matrix by a vector.
In a simple test program that just calls DGEMM 200000 times to multiply a 256*256 matrix by a 256*1 vector, it takes only about 7 seconds (nthreads=8).
In my real Poisson solver, which needs the same multiplication 200000 times (again a 256*256 matrix times a 256*1 vector), it takes 2 minutes, which is much slower than in the simple test.
Could anyone suggest a reason for this low performance? My Poisson solver is an OpenMP code.
Thanks in advance!
Sincerely,
Xuan