International collaboration lays the foundation for future AI for materials via the OPTIMADE standard

Artificial intelligence (AI) is accelerating the development of new materials. A prerequisite for AI in materials research is the large-scale use and exchange of materials data, which a broad international standard makes much easier. A major international collaboration, including researchers from the LMS laboratory, now presents an extended version of the OPTIMADE standard.

New technologies in areas such as energy and sustainability, for example batteries, solar cells, LED lighting and biodegradable materials, require new materials. To design materials with exactly the required properties, researchers run many demanding simulations on supercomputers, producing large amounts of data that can be used to train machine learning models. These AI models can then efficiently predict the results of calculations that have not yet been performed and, by extension, the properties of new materials.

But training the models requires huge amounts of data. Data from large-scale simulations, as well as general data about materials, are collected in large databases. Over time, many such databases have emerged from different research groups and projects, like isolated islands in the sea. They work differently and often define properties in different ways. The OPTIMADE (Open databases integration for materials design) standard has been developed over the past eight years. Behind this standard is a large international network of over 30 institutions worldwide, together with large materials databases in Europe and the USA. The aim is to give users easier access to both leading and lesser-known materials databases. A new version of the standard, v1.2, is now being released; it is described in an article published in the journal Digital Discovery and co-authored by LMS scientists Jusong Yu, Giovanni Pizzi and Nicola Marzari. One of the biggest changes in the new version is a greatly improved ability to describe material properties and other data accurately, using common, well-founded definitions.

The international collaboration spans the EU, the UK, the US, Mexico, Japan and China, and includes institutions such as École Polytechnique Fédérale de Lausanne (EPFL), University of California Berkeley, University of Cambridge, Northwestern University, Duke University, Paul Scherrer Institut (PSI) and Johns Hopkins University. Much of the collaboration takes place at annual workshops funded by CECAM in Switzerland. The first workshop, funded by the Lorentz Center in the Netherlands, was dedicated by Nicola Marzari (head of the LMS laboratory) to the creation of a common API for accessing all computational materials databases. Giovanni Pizzi (group leader in the LMS laboratory) has co-organized all the annual meetings since 2019. Other activities have been supported by the organisation Psi-k, the competence centre NCCR MARVEL in Switzerland, and the e-Science Research Centre (SeRC) in Sweden. The researchers in the collaboration receive support from many different funders.

The Materials Cloud portal, jointly maintained by the LMS lab at PSI and EPFL, includes an OPTIMADE client that provides a graphical user interface to query, find and download structures from all databases exposing an OPTIMADE API. Once a structure is found, it can be sent to other Materials Cloud tools, such as the Quantum ESPRESSO input generator. A new feature has also recently been integrated into the Materials Cloud Archive: when a dataset with structures and properties is uploaded, the data are automatically converted to, and served using, the OPTIMADE standard.
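To make the idea of a common API more concrete, here is a minimal Python sketch of what a client does behind such an interface: it builds a query URL for the `/v1/structures` endpoint, using a filter written in the OPTIMADE filter language, and pulls a few fields out of a JSON:API-style response. The provider address `https://optimade.example.org`, the helper names `build_structures_query` and `summarize_structures`, and the sample response are all hypothetical and made up for illustration; real providers return many more fields and metadata, so the specification at https://www.optimade.org is the authority on the details.

```python
from urllib.parse import urlencode


def build_structures_query(base_url: str, filter_expr: str, page_limit: int = 10) -> str:
    """Return a URL for the /v1/structures endpoint with an OPTIMADE filter.

    Hypothetical helper: it only builds the URL, it does not send a request.
    """
    params = {"filter": filter_expr, "page_limit": page_limit}
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(params)}"


def summarize_structures(response: dict) -> list[tuple[str, str]]:
    """Extract (id, reduced chemical formula) pairs from a JSON:API-style response."""
    return [
        (entry["id"], entry["attributes"].get("chemical_formula_reduced", "?"))
        for entry in response.get("data", [])
    ]


# Hypothetical provider base URL, used only for illustration.
BASE_URL = "https://optimade.example.org"

# Ask for binary compounds that contain both silicon and oxygen.
url = build_structures_query(BASE_URL, 'elements HAS ALL "Si","O" AND nelements=2')
print(url)

# Made-up response fragment in the general shape of an OPTIMADE response (not real data).
sample_response = {
    "data": [
        {"id": "example-1", "type": "structures",
         "attributes": {"chemical_formula_reduced": "O2Si"}},
    ],
    "meta": {"data_returned": 1},
}
print(summarize_structures(sample_response))
```

Because every compliant provider is meant to understand the same filter grammar and response format, the same filter string can in principle be sent to each database's base URL, which is what lets a single client search many databases at once.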

The standard is available at: https://www.optimade.org.

You can read the full scientific highlight article on the NCCR MARVEL website.

Paper reference: Matthew L. Evans, Johan Bergsma, Andrius Merkys, et al., Developments and applications of the OPTIMADE API for materials discovery, design, and data exchange, Digital Discovery (2024); https://doi.org/10.1039/D4DD00039K.