The table below shows the scaling of the VASP.4 code on the T3D. The system is l-Fe with a cell containing 64 atoms; only the Gamma point was used, the number of plane waves is 12500, and the number of included bands is 384.
The main problem with the current algorithm is the subspace rotation. The subspace rotation requires the diagonalization of a relatively small matrix (in this case 384 × 384, one row and column per band), and this step scales badly on a massively parallel machine. VASP currently uses either ScaLAPACK or a fast Jacobi matrix diagonalization scheme written by Ian Bush. On 64 nodes the Jacobi scheme requires around 1 sec to diagonalize the matrix, and increasing the number of nodes further does not improve the timing. ScaLAPACK needs at least 2 sec and already reaches this performance with 16 nodes.
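
To make the Jacobi step concrete, here is a minimal serial sketch of a cyclic Jacobi diagonalization for a small real symmetric matrix. It is only an illustration of the general technique and is not the parallel routine used in VASP; the function name jacobi_eigh, the tolerance, and the sweep limit are assumptions chosen for this example. Each rotation touches only two rows and two columns, which is why the method can be distributed over nodes, but for a matrix this small the communication between rotations soon outweighs the arithmetic.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi diagonalization of a real symmetric matrix.

    `a` is a list of lists. Returns (eigenvalues, v), where column j of v
    is the eigenvector belonging to eigenvalues[j]. Hypothetical helper
    written for illustration only.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        # Stop once the off-diagonal part is negligible.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- P^T A P: update columns p and q, then rows p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # Accumulate the rotation into the eigenvector matrix.
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq

    return [a[i][i] for i in range(n)], v


if __name__ == "__main__":
    # Small stand-in for the subspace matrix (the real one is far larger).
    h_sub = [[4.0, 1.0, 2.0],
             [1.0, 3.0, 0.0],
             [2.0, 0.0, 1.0]]
    eigenvalues, rotation = jacobi_eigh(h_sub)
    print(eigenvalues)
```

In a subspace rotation, the columns of the returned matrix would serve as the coefficients with which the bands are recombined; that last step is omitted here.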