I am performing a Cholesky factorization with pdpotrf(). I read the whole matrix on the master node and then distribute it. Each node then handles a submatrix and calls pdpotrf(). Finally, I send the submatrices back to the master node and assemble the solution.
I am amazed that this works. How does it do it? That is, which algorithm does pdpotrf() implement? I suspect it uses block partitioning with the nodes communicating among themselves (not much, I hope, but I would really like to know how much).
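To make my suspicion concrete, here is a minimal serial sketch of the kind of block-partitioned (right-looking) Cholesky I have in mind. It is only my assumption of the general scheme, not something I have confirmed pdpotrf() does; the function names are hypothetical, and there is no communication here because everything runs in one process. In a distributed version I would expect steps 2 and 3 to be where the nodes have to exchange data.

```python
import math


def cholesky_unblocked(a):
    """Plain Cholesky of a small dense SPD matrix (list of lists).

    Reads only the lower triangle of `a` and returns the lower factor L.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def blocked_cholesky(a, nb):
    """Right-looking blocked Cholesky with block size nb (hypothetical helper).

    Works on a copy of `a` and returns the lower-triangular factor L.
    """
    n = len(a)
    a = [row[:] for row in a]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Step 1: factor the diagonal block A_kk = L_kk * L_kk^T.
        lkk = cholesky_unblocked([row[k:e] for row in a[k:e]])
        for i in range(k, e):
            for j in range(k, e):
                a[i][j] = lkk[i - k][j - k]
        # Step 2: panel solve, L_ik = A_ik * L_kk^{-T} for all rows below.
        for i in range(e, n):
            for j in range(k, e):
                s = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))
                a[i][j] = s / a[j][j]
        # Step 3: update the trailing submatrix (lower triangle only).
        for i in range(e, n):
            for j in range(e, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, e))
    # Clear the stale upper triangle so that the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

For example, `blocked_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]], 2)` factors a 3x3 matrix using one 2x2 diagonal block followed by a 1x1 block.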
Moreover, I feel I should understand the algorithmic part in order to choose the block sizes properly.
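To show why I think the block size matters, here is a small sketch of how I understand a 2D block-cyclic layout assigns matrix entries to processes. It assumes the first block lives on process (0, 0) and uses 0-based indices; `owner` and `load_per_process` are hypothetical helpers I wrote for illustration, not ScaLAPACK routines.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates of the process owning global entry (i, j).

    mb x nb is the block size, nprow x npcol the process grid.
    Assumes the first block is stored on process (0, 0).
    """
    return ((i // mb) % nprow, (j // nb) % npcol)


def load_per_process(n, mb, nb, nprow, npcol):
    """Count how many entries of an n x n matrix each process stores."""
    counts = Counter()
    for i in range(n):
        for j in range(n):
            counts[owner(i, j, mb, nb, nprow, npcol)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Compare a small and a large block size for a 6 x 6 matrix on a 2 x 2 grid.
    for block in (1, 4):
        print(block, load_per_process(6, block, block, 2, 2))
```

With a 6x6 matrix on a 2x2 grid, a block size of 1 spreads the entries evenly, while a block size of 4 leaves process (0, 0) holding four times as many entries as process (1, 1). My guess is that the trade-off is between this kind of load balance (favoring small blocks) and doing more local work per message (favoring large blocks), but that is exactly the part I would like confirmed.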
Finally, I would like to know whether pdpotrf() is multithreaded. For example, I read in this paper that 4-threaded approaches do exist.