UniMat: Scalable Diffusion for Materials Generation
Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used to model structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high-fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Functional Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.
Unified Crystal Representation
UniMat represents the atoms in a material’s unit cell (the smallest repeating unit) by storing each atom's continuous-valued x, y, z location at the corresponding element's entry in the periodic table. Any crystal can be represented in a 4-dimensional unified material space, [L, H, W, C], where H = 9 and W = 18 correspond to the number of periods and groups in the periodic table, L corresponds to the maximum number of atoms per element, and C = 3 corresponds to the x, y, z location of each atom in a unit cell. An example crystal represented by UniMat is shown below:
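As a rough illustration of this layout in code, the sketch below fills a NumPy array of shape [L, 9, 18, 3] from a list of atoms. It is not the authors' implementation: the `ELEMENT_POS` lookup, the `encode_unimat` name, the `max_atoms` parameter, and the use of fractional coordinates are all assumptions made for the example, and unoccupied slots are filled with -1, the null location that the page introduces in the next section.

```python
import numpy as np

# Hypothetical lookup: element symbol -> (period index, group index), 0-indexed.
# Only a few elements are listed; a real table would cover every element.
ELEMENT_POS = {
    "O": (1, 15),   # period 2, group 16
    "Na": (2, 0),   # period 3, group 1
    "Cl": (2, 16),  # period 3, group 17
    "Ti": (3, 3),   # period 4, group 4
}

NULL = -1.0  # value stored in every coordinate of an unoccupied slot


def encode_unimat(atoms, max_atoms=4):
    """Encode atoms as an array of shape [L, H, W, C] = [max_atoms, 9, 18, 3].

    atoms: list of (symbol, (x, y, z)); coordinates are assumed to be
    fractional, i.e. in [0, 1).
    """
    tensor = np.full((max_atoms, 9, 18, 3), NULL, dtype=np.float32)
    counts = {}  # number of atoms already placed for each element
    for symbol, xyz in atoms:
        period, group = ELEMENT_POS[symbol]
        slot = counts.get(symbol, 0)
        if slot >= max_atoms:
            raise ValueError(f"more than {max_atoms} atoms of {symbol}")
        tensor[slot, period, group] = xyz
        counts[symbol] = slot + 1
    return tensor


# Toy two-atom cell (illustrative coordinates only).
cell = encode_unimat([("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))])
print(cell.shape)  # (4, 9, 18, 3)
```

Because every crystal maps to an array of the same shape, a standard image-style diffusion model can consume it without any graph construction.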
Diffusion Models with UniMat
With the above UniMat representation, we can use the denoising process of a diffusion model to move atoms from randomly initialized locations to their target locations in a unit cell, as shown below. We denote x = y = z = -1 as the null location; atoms that do not exist in a unit cell are moved to the null location.
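To make the role of the null location concrete, here is a minimal sketch of how a denoised array might be read back into a list of atoms. The page does not say how slots are classified as empty after denoising, so the tolerance-based rule, the `decode_unimat` name, and the `tol` parameter are assumptions chosen only for illustration.

```python
import numpy as np


def decode_unimat(tensor, null=-1.0, tol=0.5):
    """List the occupied slots of an array of shape [L, H, W, 3].

    A slot counts as empty when all three coordinates lie within `tol`
    of the null value (an assumed rule, not taken from the page).
    Returns tuples (slot, period index, group index, (x, y, z)).
    """
    atoms = []
    n_slots, n_periods, n_groups, _ = tensor.shape
    for slot in range(n_slots):
        for period in range(n_periods):
            for group in range(n_groups):
                xyz = tensor[slot, period, group]
                if np.all(np.abs(xyz - null) < tol):
                    continue  # atom was moved to the null location
                atoms.append((slot, period, group, tuple(float(v) for v in xyz)))
    return atoms


# Toy "denoised" output: everything near null except two slots.
sample = np.full((2, 9, 18, 3), -1.0) + 0.05  # small residual noise
sample[0, 2, 0] = (0.0, 0.0, 0.0)
sample[0, 2, 16] = (0.5, 0.5, 0.5)
print(len(decode_unimat(sample)))  # 2
```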
One can condition generation on a composition by feeding the composition to the UNet as a conditional input. Generation can also be conditioned on material properties through classifier-free guidance.
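The page does not spell out the guidance rule, so the sketch below uses the commonly cited form of classifier-free guidance, in which the conditional and unconditional noise predictions are combined as (1 + w) * eps_cond - w * eps_uncond. The function name `guided_noise` and the parameter `guidance_weight` are assumptions; the arrays stand in for the outputs of a denoiser evaluated with and without the conditioning input.

```python
import numpy as np


def guided_noise(eps_cond, eps_uncond, guidance_weight):
    """Combine noise predictions with classifier-free guidance.

    guidance_weight = 0 recovers the purely conditional prediction;
    larger values push samples further toward the conditioning signal.
    """
    return (1.0 + guidance_weight) * eps_cond - guidance_weight * eps_uncond


# Stand-in predictions with the UniMat shape [L, 9, 18, 3].
eps_c = np.ones((2, 9, 18, 3))
eps_u = np.zeros((2, 9, 18, 3))
print(guided_noise(eps_c, eps_u, 2.0)[0, 0, 0])  # [3. 3. 3.]
```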
DFT Verification
We run DFT relaxations to compute the decomposition energy of unconditionally generated materials and compare it to that of the previous baseline (CDVAE). UniMat generates materials with significantly lower decomposition energy.
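For readers unfamiliar with these quantities, the following is a simplified sketch of formation energy per atom and of decomposition energy measured against a convex hull, restricted to a binary system A-B. It is not the evaluation pipeline used in this work: all function names are hypothetical, the energies are made-up numbers, and the hull is built only from the known phases supplied by the caller, so a negative result means the candidate lies below that hull.

```python
def formation_energy_per_atom(total_energy, composition, reference_energies):
    """(E_total - sum_i n_i * mu_i) / N, with mu_i the per-atom reference energy."""
    n_atoms = sum(composition.values())
    reference = sum(n * reference_energies[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms


def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (fraction of B, formation energy) points."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1 and x1 > x0:
            return e0 + (e1 - e0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the composition range of the hull")


def decomposition_energy(candidate, known_phases):
    """Candidate formation energy minus the hull of known phases at the same x."""
    x, energy = candidate
    return energy - hull_energy(lower_hull(known_phases), x)


# Made-up energies (eV/atom); the pure elements sit at zero formation energy.
known = [(0.0, 0.0), (0.25, -0.2), (0.5, -1.0), (1.0, 0.0)]
print(decomposition_energy((0.25, -0.7), known))  # negative: below the known hull
```

In practice, multi-component hulls are usually computed with an established materials library rather than by hand; the binary case is shown only because it fits in a few lines.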
Generated Structures
Below, we visualize unconditionally generated structures and compare them to structures in the held-out test set (matched by formula).
Below, we visualize unconditionally generated structures whose formulas do not appear in the test set.
Citation
@article{yang2023scalable,
title={Scalable Diffusion for Materials Generation},
author={Yang, Mengjiao and Cho, KwangHwan and Merchant, Amil and Abbeel, Pieter and Schuurmans, Dale and Mordatch, Igor and Cubuk, Ekin Dogus},
journal={arXiv e-prints},
year={2023}
}