Hi,
I have a special use case that needs to compute two independent GEMMs.
For each one, M, N, and K are between 20 and 4000, and on a Xeon Skylake 8180 each one only reaches 600 to 700 GFLOPS.
At the algorithm level, the two GEMMs have no dependency on each other, so they could be launched in parallel.
How can I run these two GEMMs in parallel, perhaps one per socket? I suppose I can't use batch GEMM for this.
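A minimal sketch of what "launching them in parallel" could look like, assuming the GEMMs are called through NumPy rather than the original C/MKL code (which isn't shown here). np.matmul releases the GIL while the underlying BLAS runs, so two Python threads can overlap the two calls. The matrix shapes and the helper names gemm_flops and run_two_gemms_concurrently are hypothetical. The sketch also shows how a GFLOPS figure is usually computed: a GEMM of sizes M, N, K does 2*M*N*K floating-point operations.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def gemm_flops(m, n, k):
    """Operation count of C = A @ B with A (m x k), B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def run_two_gemms_concurrently(a1, b1, a2, b2):
    """Submit two independent GEMMs to two threads so they can run at the same time.

    Note: each BLAS call still chooses its own thread count; this sketch does
    NOT split cores between the calls or pin either call to a socket.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(np.matmul, a1, b1)
        f2 = pool.submit(np.matmul, a2, b2)
        return f1.result(), f2.result()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes, chosen inside the 20..4000 range from the post.
    m1, n1, k1 = 1000, 800, 600
    m2, n2, k2 = 400, 1200, 500
    a1, b1 = rng.standard_normal((m1, k1)), rng.standard_normal((k1, n1))
    a2, b2 = rng.standard_normal((m2, k2)), rng.standard_normal((k2, n2))

    start = time.perf_counter()
    c1, c2 = run_two_gemms_concurrently(a1, b1, a2, b2)
    elapsed = time.perf_counter() - start

    total = gemm_flops(m1, n1, k1) + gemm_flops(m2, n2, k2)
    print(f"combined throughput: {total / elapsed / 1e9:.1f} GFLOPS")
```

Because both calls share one BLAS thread pool setting, they may oversubscribe the cores. Giving each GEMM its own half of the machine (e.g. one socket each) would additionally need a per-thread BLAS thread limit, such as MKL's mkl_set_num_threads_local, together with OpenMP thread affinity settings.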