Hi,
In some of our codes, we have noticed slight discrepancies between the results of MKL AXPY calls and our own explicit loops computing Y = Y + ALPHA*X. In our iterative matrix solvers, the final solutions obtained with AXPY can be significantly worse than those obtained with the explicit loops.
I've attached a test case built under 64-bit MS Windows with Intel MKL 2019 update 2. For some vectors and scale factors, the two results are identical down to machine precision. For other combinations of vector and scale factor, the agreement is far coarser than the working precision. In the attached screenshot, the left column of numbers shows the RMS difference between the AXPY result and the explicit-loop result for two cases: in the first case the two results differ, and in the second case no difference is discernible. Since AXPY has no dependencies between vector entries, it does not seem like it could be a threading issue. While we are aware that floating-point operations are always tricky, we don't understand why the independent per-element operation y(i) = y(i) + alpha*x(i) produces such different results.
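For concreteness, here is a self-contained C sketch of the kind of comparison the test case makes. This is not the attached test case itself; the vector length, fill values, and scale factor below are placeholders:

#include <stdio.h>
#include <math.h>
#include <mkl.h>

int main(void)
{
    const MKL_INT n = 1000000;   /* placeholder length */
    const double alpha = 0.1;    /* placeholder scale factor */

    double *x  = (double *)mkl_malloc(n * sizeof(double), 64);
    double *y1 = (double *)mkl_malloc(n * sizeof(double), 64);
    double *y2 = (double *)mkl_malloc(n * sizeof(double), 64);

    /* Fill x, and give both code paths identical starting y data. */
    for (MKL_INT i = 0; i < n; ++i) {
        x[i]  = sin((double)i);
        y1[i] = cos((double)i);
        y2[i] = y1[i];
    }

    /* Path 1: MKL AXPY. */
    cblas_daxpy(n, alpha, x, 1, y1, 1);

    /* Path 2: explicit loop. */
    for (MKL_INT i = 0; i < n; ++i)
        y2[i] += alpha * x[i];

    /* RMS difference between the two results. */
    double sumsq = 0.0;
    for (MKL_INT i = 0; i < n; ++i) {
        double d = y1[i] - y2[i];
        sumsq += d * d;
    }
    printf("RMS difference: %.17g\n", sqrt(sumsq / (double)n));

    mkl_free(x);
    mkl_free(y1);
    mkl_free(y2);
    return 0;
}

In our runs, whether the printed RMS difference is zero or not depends on the particular vectors and scale factor.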
We've tried the various suggestions in the documentation for getting reproducible floating-point results, to no avail. Is this simply to be expected, or is there some other MKL issue going on? Any advice would be much appreciated.
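The documented controls we experimented with were along these lines (a sketch, using MKL's conditional numerical reproducibility interface, mkl_cbwr_set, plus forcing a single thread; the exact CNR branch we tried varied):

#include <mkl.h>

/* Call once at startup, before the first MKL routine, or the
   CNR setting is ignored. */
void set_reproducible_mkl(void)
{
    /* Rule out threaded execution inside MKL. */
    mkl_set_num_threads(1);

    /* Request the most conservative CNR code path. */
    if (mkl_cbwr_set(MKL_CBWR_COMPATIBLE) != MKL_CBWR_SUCCESS)
        mkl_cbwr_set(MKL_CBWR_AUTO); /* fall back if the branch is rejected */
}

Even with these settings in place, the AXPY and explicit-loop results still differ for the problem cases.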
Thanks,
John