Hi Everyone,
We are developing an application that uses the FGMRES solver in the MKL library to solve systems of linear equations as part of Newton iterations. Recently we did some benchmarking and found that processor utilization goes down as the number of equations increases.
We instrumented the code and found that calls to dfgmres take a progressively larger share of the total solution time as the number of equations increases. Specifically, we modified the "fgmres_full_fnct_c.c" file provided in the MKL examples directory and measured elapsed times for the different operations: the calls to dfgmres themselves, and the time spent servicing reverse communication requests such as RCI_request=1 (matrix-vector product) and RCI_request=3 (application of the preconditioner). Here are a few numbers:
number of equations = 480k
total solution time = 8.6 s
(rci_request = 1) = 0.7 s
(rci_request = 3) = 2.2 s
calls to dfgmres = 4.9 s
number of equations = 950k
total solution time = 27 s
(rci_request = 1) = 1.8 s
(rci_request = 3) = 5.7 s
calls to dfgmres = 18 s
number of equations = 7,150k
total solution time = 820 s
(rci_request = 1) = 15 s
(rci_request = 3) = 83 s
calls to dfgmres = 700 s
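To make the trend explicit, the share of total time spent inside dfgmres can be computed directly from the figures above; it comes out to roughly 57%, 67%, and 85% for the three problem sizes. A minimal sketch of that arithmetic in C (the function names are ours, purely for illustration, and are not part of MKL):

```c
#include <stdio.h>

/* Percentage of the total solution time spent in one phase. */
double percent_of_total(double phase_s, double total_s)
{
    return 100.0 * phase_s / total_s;
}

/* Print the dfgmres share for the three runs reported above. */
void print_dfgmres_share(void)
{
    /* Values copied from the measurements in this post. */
    const double n_eq_k[]    = {480.0, 950.0, 7150.0}; /* thousands of equations */
    const double total_s[]   = {8.6, 27.0, 820.0};     /* total solution time, s */
    const double dfgmres_s[] = {4.9, 18.0, 700.0};     /* time in dfgmres calls, s */

    for (int i = 0; i < 3; ++i) {
        printf("%.0fk equations: dfgmres takes %.0f%% of total\n",
               n_eq_k[i], percent_of_total(dfgmres_s[i], total_s[i]));
    }
}
```

Calling print_dfgmres_share() from main prints one line per problem size.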
We also took screenshots of the resource manager and noted that processor utilization is very low for long periods of time, as low as 4%, even though MKL correctly sets the maximum number of threads to the number of cores in the system (16).
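For anyone who wants to reproduce the per-phase timings listed earlier, the instrumentation can be sketched roughly as follows. This is an illustrative sketch rather than the exact code we ran: the phase labels and helper names are hypothetical, and the solver call itself is elided.

```c
#include <time.h>

/* Hypothetical phase labels; they mirror the rows reported above. */
enum phase { PHASE_DFGMRES, PHASE_MATVEC, PHASE_PRECOND, PHASE_COUNT };

/* Accumulated wall-clock seconds per phase. */
struct phase_timer { double seconds[PHASE_COUNT]; };

/* Current wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Add the interval [start, stop] to the running total of one phase. */
void phase_add(struct phase_timer *t, enum phase p, double start, double stop)
{
    t->seconds[p] += stop - start;
}

/*
 * Intended use inside the RCI loop (solver call elided):
 *   double t0 = wall_seconds();
 *   ... call dfgmres ...
 *   phase_add(&timer, PHASE_DFGMRES, t0, wall_seconds());
 *   if RCI_request == 1, time the matrix-vector product as PHASE_MATVEC
 *   if RCI_request == 3, time the preconditioner as PHASE_PRECOND
 */
```

Wall-clock time is used here on purpose, since it is the elapsed time, not the CPU time summed over threads, that shows how long each phase holds up the solve.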
Does anybody have an idea of what is happening?
Sincerely,
Gonzalo
PS: We have several current licenses of Intel Parallel Studio, but Intel's support site is not letting me submit this question to priority support because I am not associated with the account that was used to register the product in our office.