So essentially,
We have a Universal Quantum Chemistry dataset to find new drugs now!
Paper: ∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials (18 Pages)
Github: https://github.com/AIRI-Institute/nablaDFT
Researchers from Moscow and St. Petersburg want to make computational quantum chemistry more scalable. These methods provide accurate approximations of molecular properties, which is crucial for computer-aided drug discovery and other areas of chemical science. However, their high computational complexity limits how widely they can be applied.
Hmm.. What's the background?
Quantum chemistry is used to approximate molecular properties that matter for areas such as computer-aided drug discovery, but the complexity of these computations makes them difficult to scale. Density functional theory (DFT) is the primary approach for solving the many-particle Schrödinger equation (SE) for electrons. DFT provides reasonably accurate predictions, but even a single iteration may take several CPU hours, which makes it impractical for molecular modeling tasks where it would be called many times.
Neural Network Potentials (NNPs) are a promising alternative to quantum chemistry methods but require large datasets for training. The researchers decided to release a dataset to assist NNP approaches.
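To make the idea concrete, here is a minimal sketch of the interface an NNP exposes: atomic coordinates go in, an energy comes out, and the forces are the negative gradient of that energy with respect to the coordinates. This is not a neural network and not code from the paper or the nablaDFT repository. The harmonic pair potential and the names `toy_energy`, `toy_forces`, and `numerical_forces` are hypothetical stand-ins chosen only so the example runs with NumPy alone.

```python
import numpy as np

def toy_energy(positions, r0=1.0, k=1.0):
    """Toy stand-in for a learned potential: harmonic springs between all atom pairs."""
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            energy += 0.5 * k * (r - r0) ** 2
    return energy

def toy_forces(positions, r0=1.0, k=1.0):
    """Analytic forces, defined as the negative gradient of toy_energy."""
    forces = np.zeros_like(positions)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            r = np.linalg.norm(d)
            grad_i = k * (r - r0) * d / r  # dE/dx_i for this pair
            forces[i] -= grad_i
            forces[j] += grad_i
    return forces

def numerical_forces(positions, eps=1e-6):
    """Central finite differences of toy_energy, used only as a sanity check."""
    forces = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        plus = positions.copy()
        minus = positions.copy()
        plus[idx] += eps
        minus[idx] -= eps
        forces[idx] = -(toy_energy(plus) - toy_energy(minus)) / (2 * eps)
    return forces

# Three atoms with made-up coordinates (arbitrary units).
positions = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.3]])
analytic = toy_forces(positions)
numeric = numerical_forces(positions)
```

A real NNP replaces `toy_energy` with a trained network and usually gets the forces by automatic differentiation, but the energy-to-forces relationship is the same one checked here.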
OK, so what is proposed in the research paper?
The researchers created a new dataset and benchmark called ∇²DFT. Built on the earlier nablaDFT dataset, ∇²DFT contains twice as many molecular structures and three times as many conformations. The data was generated at the ωB97X-D/def2-SVP level of DFT.
They proposed a benchmark covering Hamiltonian prediction, energy and force prediction, and conformational optimization to evaluate neural network (NN) based models for quantum chemistry (QC).
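Two of these tasks are easy to sketch in a few lines. The snippet below scores energy and force predictions with mean absolute error and runs a bare-bones conformational optimization by following the predicted forces downhill. Everything here is an assumption for illustration: MAE is a common choice but the paper's exact metrics may differ, the optimizer is plain steepest descent rather than whatever the benchmark uses, and `dimer_model` is a hypothetical two-atom spring standing in for a trained model.

```python
import numpy as np

def energy_mae(pred, ref):
    """Mean absolute error between predicted and reference energies."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(diff)))

def force_mae(pred, ref):
    """Mean absolute error over every force component of every atom."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(diff)))

def dimer_model(positions, r0=1.0, k=1.0):
    """Hypothetical stand-in for an NNP: two atoms joined by a harmonic spring."""
    d = positions[0] - positions[1]
    r = np.linalg.norm(d)
    energy = 0.5 * k * (r - r0) ** 2
    grad0 = k * (r - r0) * d / r       # dE/dx for atom 0
    forces = np.array([-grad0, grad0])  # forces are minus the gradient
    return energy, forces

def relax(positions, model, step=0.1, max_steps=500, fmax=1e-4):
    """Steepest descent: move atoms along the forces until they are small."""
    pos = np.array(positions, dtype=float)
    for _ in range(max_steps):
        _, forces = model(pos)
        if np.abs(forces).max() < fmax:
            break
        pos = pos + step * forces
    return pos, model(pos)[0]

# Start from a stretched bond (made-up coordinates, arbitrary units).
start = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
relaxed, final_energy = relax(start, dimer_model)
bond_length = float(np.linalg.norm(relaxed[0] - relaxed[1]))
```

With a real NNP plugged in as `model`, the interesting question is whether the relaxed geometry and its energy match what DFT itself would find, which is why optimization is a harder test than single-point error metrics.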
What's next?
Future work may involve expanding the ∇²DFT dataset to include structures not currently covered, such as nanoparticles, nanotubes, big rings, and other non-drug-like structures.
The authors plan to release a full set of interatomic forces for the ∇²DFT dataset, train NNPs on the full ∇²DFT, and benchmark other NNPs on conformational optimization using the dataset.
So essentially,
We have a Universal Quantum Chemistry dataset to find new drugs now!