Simulations of materials from first principles have improved drastically over the
last few decades, benefiting from newly developed methods and access to increasingly
large computing resources. Nevertheless, a quantum mechanical description
of a solid without approximations is not feasible. In the wide field of methods for
ab initio calculations of electronic structure, it has become apparent that density
functional theory and, in particular, the local density approximation make even
large systems accessible to simulation. Density functional calculations provide insight
into the processes taking place in a vast range of materials by giving access to an
intuitive picture of the electronic structure in terms of the Kohn-Sham single-particle
wave functions. A number of functionalities in the fields of electronic devices,
catalytic surfaces, molecular synthesis and magnetic materials can be explained
by analyzing the resulting total energies, ground-state structures and Kohn-Sham
spectra. However, challenging physical problems often require calculations with
a huge number of atoms in the simulation volume, mostly because of
very low symmetry. Since the total workload of wave-function-based DFT scales at best
quadratically with the number of atoms, such calculations can only be carried out on
supercomputers. In the present work, an implementation of DFT on real-space grids has been
developed that is suited to the massively parallel computing resources
of modern supercomputers. Massively parallel machines are built from distributed
memory and a huge number of compute nodes, easily exceeding 100,000 parallel
processes. An efficient parallelization of density functional calculations is only
possible when the data is stored locally on each process and the amount of inter-node
communication is kept low. Our real-space grid approach with three-dimensional
domain decomposition provides intrinsic data locality and solves both the Poisson
equation for the electrostatic problem and the Kohn-Sham eigenvalue problem
on a uniform real-space grid. The derivative operators are approximated by finite
differences, yielding localized operators that require communication only with the nearest-neighbor processes. This leads to excellent parallel performance at large
system sizes. Treating only the valence electrons explicitly, we apply the projector augmented-wave
(PAW) method to accurately model the energy contributions and scattering properties
of the atomic cores. In addition to the real-space grid parallelization, we distribute
the workload of different Kohn-Sham states over parallel processes.
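The effect of combining spatial domains with a distribution of Kohn-Sham states can be illustrated with a minimal Python sketch. It is not the implementation developed in this work; it assumes, hypothetically, that the processes form a 3D Cartesian grid of spatial domains replicated over a number of Kohn-Sham state groups, that grid points and states are split as evenly as possible, and that wave functions are real-valued and stored in double precision (8 bytes per value). The names `Layout`, `rank_to_coords` and `local_wavefunction_bytes`, as well as the grid size and state count in the example, are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical process layout: 3D spatial domains x Kohn-Sham state groups."""
    domains: tuple[int, int, int]  # processes along x, y, z
    state_groups: int              # number of groups the states are split into

    @property
    def n_procs(self) -> int:
        return math.prod(self.domains) * self.state_groups


def rank_to_coords(rank: int, layout: Layout):
    """Map a flat process rank to (domain coordinates, state group).

    The domain index runs fastest, so consecutive ranks within one
    state group together cover the whole real-space grid."""
    dx, dy, _ = layout.domains
    group, d = divmod(rank, math.prod(layout.domains))
    return (d % dx, (d // dx) % dy, d // (dx * dy)), group


def local_wavefunction_bytes(grid, n_states, layout, bytes_per_value=8):
    """Memory for one process's share of the wave functions.

    Assumes real-valued wave functions in double precision (8 bytes)."""
    points = math.prod(math.ceil(n / p) for n, p in zip(grid, layout.domains))
    states = math.ceil(n_states / layout.state_groups)
    return points * states * bytes_per_value


# Hypothetical sizes, not taken from this work.
for groups in (1, 8):
    layout = Layout(domains=(8, 8, 8), state_groups=groups)
    mib = local_wavefunction_bytes((192, 192, 192), 4096, layout) / 2**20
    print(layout.n_procs, "processes:", mib, "MiB per process")
```

For a 192³ grid with 4096 states on an 8 × 8 × 8 domain grid, this prints 432.0 MiB per process with a single state group (512 processes) but only 54.0 MiB with eight state groups (4096 processes): splitting the states divides the per-process memory by the number of groups while the spatial decomposition stays unchanged.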
This second level of parallelization avoids the memory bottleneck for large system
sizes and provides additional parallel speedup. Calculations of systems containing
up to 3584 Ge, Sb and Te atoms were performed on up to 294,912 cores, i.e. all cores of
JUGENE, the massively parallel supercomputer installed at Forschungszentrum
Jülich.
Paul Ferdinand Baumeister
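The locality of finite-difference derivative operators on a domain-decomposed grid can be checked with a small NumPy sketch. It is an illustration under stated assumptions, not the code of this work: a periodic grid is split into blocks along all three directions, each block is padded with `nn` halo layers (the stencil half-width), and copying the halo from a wrapped global array stands in for the nearest-neighbor exchange that MPI would perform on a distributed-memory machine. The coefficients are the standard central-difference weights for the second derivative; the helper names are hypothetical.

```python
import itertools
import numpy as np

# Central finite-difference weights c_0..c_nn for d^2/dx^2 (half-width nn).
FD_COEFFS = {
    1: [-2.0, 1.0],
    2: [-5 / 2, 4 / 3, -1 / 12],
    3: [-49 / 18, 3 / 2, -3 / 20, 1 / 90],
    4: [-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560],
}


def laplacian_periodic(f, h, nn):
    """Reference: apply the 3D stencil to the whole periodic array at once."""
    c = FD_COEFFS[nn]
    out = 3 * c[0] * f
    for axis in range(3):
        for k in range(1, nn + 1):
            out += c[k] * (np.roll(f, k, axis) + np.roll(f, -k, axis))
    return out / h**2


def laplacian_on_padded(block, h, nn):
    """Apply the stencil to a block carrying nn halo layers; return its interior."""
    c = FD_COEFFS[nn]
    interior = (slice(nn, -nn),) * 3
    out = 3 * c[0] * block[interior]
    for axis in range(3):
        for k in range(1, nn + 1):
            for shift in (k, -k):
                idx = list(interior)
                idx[axis] = slice(nn + shift, block.shape[axis] - nn + shift)
                out += c[k] * block[tuple(idx)]
    return out / h**2


def distributed_laplacian(f, h, procs, nn):
    """Emulate a 3D domain decomposition with halo exchange on one process."""
    assert all(n % p == 0 for n, p in zip(f.shape, procs)), "uneven split"
    sizes = [n // p for n, p in zip(f.shape, procs)]
    # Nearest-neighbor exchange suffices only if every block is >= nn wide.
    assert all(s >= nn for s in sizes), "block narrower than stencil"
    padded = np.pad(f, nn, mode="wrap")  # periodic halos around the whole grid
    out = np.empty_like(f)
    for coords in itertools.product(*(range(p) for p in procs)):
        lo = [i * s for i, s in zip(coords, sizes)]
        # Block plus halos: this copy stands in for the MPI halo exchange.
        block = padded[tuple(slice(l, l + s + 2 * nn) for l, s in zip(lo, sizes))]
        out[tuple(slice(l, l + s) for l, s in zip(lo, sizes))] = \
            laplacian_on_padded(block, h, nn)
    return out


rng = np.random.default_rng(0)
f = rng.standard_normal((24, 24, 24))
ref = laplacian_periodic(f, h=0.2, nn=4)
dist = distributed_laplacian(f, h=0.2, procs=(2, 3, 4), nn=4)
print(np.allclose(ref, dist))  # True
```

Because each block needs only `nn` layers from its direct neighbors, the blockwise result matches the global stencil up to rounding, provided every block is at least `nn` points wide in each direction. Higher-order stencils improve accuracy at the cost of wider halos, i.e. more data per exchange, but the communication pattern stays nearest-neighbor.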