I have a very large program that does numerical modeling. Until recently I was debugging, and everything was fine. I'm now optimizing the areas that need improvement. My code has to solve many linear algebra problems (50k+ cases), each taking about a tenth of a second in sequential mode. The cases are completely independent of one another and share no variables. So it made sense to me to put the whole thing in an OpenMP for loop over the 50k+ cases and keep the cores fed that way. That's when I started noticing small errors in the data, such as 4.600009 vs. 4.600011. Then I started seeing cases where the solver didn't converge at all (all of my cases converge without OpenMP).
What made me suspect DSS was a very large run (~1M cases) under a profiler. After about 100k cases, the concurrency drops to 1 core (of 16), and the profiler shows all of the cores waiting for dss_reorder to complete.
I cannot post all of the code (for proprietary and legal reasons), but here is the bulk of the function that runs the DSS code. It's fairly straightforward DSS code. Each OpenMP loop iteration gets its own DSS handle.
SOLVER_SETUP
MKL_INT Error;
_MKL_DSS_HANDLE_t Handle;
MKL_INT opt = MKL_DSS_DEFAULTS | MKL_DSS_MSG_LVL_WARNING | MKL_DSS_TERM_LVL_ERROR | MKL_DSS_ZERO_BASED_INDEXING;
MKL_INT Sym = MKL_DSS_NON_SYMMETRIC;
MKL_INT Typ = MKL_DSS_INDEFINITE;
const MKL_INT RowCount = NodeCount;
const MKL_INT ColCount = NodeCount;
const MKL_INT NonZeros = mMatrixA.NonZeros.size();
const MKL_INT One = 1;
Error = dss_create(Handle, opt);
CME_Assert(Error != MKL_DSS_SUCCESS);
Error = dss_define_structure(Handle, Sym, mMatrixA.RowStart, RowCount, ColCount, mMatrixA.Columns, NonZeros);
CME_Assert(Error != MKL_DSS_SUCCESS);
Error = dss_reorder(Handle, opt, 0);
CME_Assert(Error != MKL_DSS_SUCCESS);
SOLVER_LOOP_INIT
Error = dss_factor_real(Handle, Typ, mMatrixA.Values);
CME_Assert(Error != MKL_DSS_SUCCESS);
Error = dss_solve_real(Handle, opt, mVectorB, One, mCurrentValue);
CME_Assert(Error != MKL_DSS_SUCCESS);
SOLVER_LOOP_END
Error = dss_delete(Handle, opt);
The SOLVER_* names (SOLVER_SETUP, SOLVER_LOOP_INIT, SOLVER_LOOP_END) are macros.
The solver loop shown here is not the OpenMP loop discussed above. It is there because the equations are highly nonlinear and the matrix mMatrixA depends on mCurrentValue.
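For context, here is a rough sketch of the shape of the outer OpenMP loop. It is not the actual code: `SolveCase`, `RunAllCases`, and the 2x2 system are hypothetical placeholders for the proprietary model and the DSS calls above, chosen so the sketch has no MKL dependency. The point it illustrates is that every iteration works only on its own local state and writes to its own result slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one case. In the real program this would wrap the
// DSS sequence shown above (create, define structure, reorder, factor, solve,
// delete) using a handle that is local to the call. Here it solves a small
// 2x2 system by Cramer's rule instead.
double SolveCase(int caseIndex)
{
    const double a = 4.0 + caseIndex, b = 1.0;
    const double c = 2.0,             d = 3.0 + caseIndex;
    const double r0 = 1.0,            r1 = 2.0;

    const double det = a * d - b * c;
    assert(det != 0.0);              // the stand-in system is never singular
    return (r0 * d - b * r1) / det;  // first unknown of the 2x2 system
}

// Runs all cases. With parallel == true (and OpenMP enabled at compile time)
// the iterations are distributed over the threads; otherwise they run in order.
std::vector<double> RunAllCases(int caseCount, bool parallel)
{
    std::vector<double> results(static_cast<std::size_t>(caseCount));

    // Signed loop index keeps this compatible with older OpenMP implementations.
    #pragma omp parallel for schedule(dynamic) if(parallel)
    for (int i = 0; i < caseCount; ++i)
    {
        // No shared state: each iteration writes only to its own slot.
        results[static_cast<std::size_t>(i)] = SolveCase(i);
    }
    return results;
}
```

Comparing `RunAllCases(n, false)` against `RunAllCases(n, true)` element by element is one way to check whether the parallel run reproduces the sequential results. With this dependency-free stand-in the two should match exactly, since the iterations share no state.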
None of the asserts indicates any problem with the solver at any time.
I'm currently running MKL 11.3 Update 2 with Visual Studio 2013. I've tried this on three different machines, all with the same result.
From what I've seen, dss_reorder seems to have concurrency issues, but the documentation says otherwise. I have tried both the sequential and the parallel versions of MKL, with the same results.
I am hoping someone has seen this before and knows of a workaround (although I did not see this issue on any forums).
Thanks for any help