Cutting-Edge Computational Chemistry Enabled by Deep Learning

Machine learning has become a bigger part of chemistry over the last two or three years. Industry needs people trained in both fields, and it has taken time for them to make their way into this sector. Olexandr Isayev is at the forefront of that wave, and he talks to us about what he’s done melding deep learning and chemistry together and where he sees the field going with this new tech.

Olexandr Isayev: Historically, chemistry was an empirical science. It’s been driven by experiment. So, you find the observation, you formulate a hypothesis, you make a prediction, and do a test, so it’s the standard scientific method. Now, those new machine learning methods allow us to do data-driven discovery.

Ginette: I’m Ginette.

Curtis: And I’m Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: A Vault Analytics production.

Ginette: Tableau is the leading software in data analysis, preparation, and interactive analytics, and we’re huge fans of it because we’ve seen how it facilitates quickly finding business value from data—helping you do this faster than anything else can. If you and your team have recently purchased Tableau licenses, you’re off to a great start, but in our experience working with companies over the years, many deployments of Tableau fail to realize their potential value because of a lack of training and understanding of how to use it well—which is a shame because it can truly transform your business when used well.

We’ve helped dozens of companies learn how to use Tableau and get real results for their businesses because we focus not only on the technical skills, but also on how to be a good analyst and solve real-world problems your business cares about. We come onsite to your business, get to know your employees and your business problems, and train you on the skills you need to make Tableau a success for your needs. We’ll even customize the training to your own data so your employees learn how to work with the specific data and problems that are relevant to them, building out analysis and dashboards that can immediately be used after the training to drive business value.

We train on Tableau at basic, intermediate, and advanced levels. We’d love to hear from you and help you transform your business with Tableau with an onsite training—send us an email at [email protected] or visit our site at vaultanalytics.com, and we’ll be in touch!

Curtis: Chemistry—while it can conjure up images of begoggled scientists in white coats donning blue rubber gloves in sterile laboratories—it touches more of our lives than we probably give it credit for. Think of lithium batteries that power our electronics, plastics that exist almost ubiquitously around us, dyes that color much of your world, jet engines, among so many other things. And as a personal note, I happen to find the science fascinating. I majored in chemistry in college.

Ginette: Today we speak with a man who is at the forefront of connecting AI with chemical sciences. His work has been published in some well known journals, and he’s headed up some impressive projects.

Olexandr Isayev: My name is Olexandr Isayev, or Olis for short. I grew up in Ukraine, then I moved to the US. I did my PhD in computational chemistry, and I also have a minor in computer science. So I’ve worked on a lot of different topics in chemical sciences, and in particular, we use computer simulation, like high-performance computing simulation, to address the challenges of chemical sciences. Recently, I started a faculty position at UNC, so I’m an assistant professor at the University of North Carolina at Chapel Hill, and basically the current focus of our work is connecting AI and machine learning with chemical and biological sciences and how those data-driven technologies could help us solve some of the fundamental problems in chemistry and biology.

So chemistry was kind of lagging behind some other fields, right, because deep learning was the revolution in vision, in speech, in text, right? Life sciences were kind of at the back, so I think the modern neural networks came to chemical sciences two or three years ago. Interestingly, there were the old types of neural networks, so you can still find those papers where people used neural networks in the 80s and 90s. Then, you know, there was a period of winter and no one used them, but now we see this new wave for the past three, four years, probably. So those are very emergent applications, and not that many people work there, but new students come in trained in both computer science and chemistry, so the trend is accelerating, and we saw this wave of applications last year and this year.

Ginette: He’s been at the forefront of the chemistry–artificial intelligence wave, helping augment traditional chemistry methods with machine learning to bring about what he thinks might be a leap-frog moment for the discipline.

Olexandr: We can make a map of chemical or material space, so this is not a physical space like, you know, on the globe, but an imaginary, you know, high-dimensional space where materials or molecules are, and then we can use some data-driven methods to navigate it as chemists or material scientists and look for specific properties or functions of materials. Those types of maps help us design better high-performance materials, better drugs, and stuff like that.

All our chemistry happens in the computer, so we’re computational people, but eventually, you know, it’s all ultimately in the hands of experimentalists, so people who go to the lab and synthesize and make the actual material and test it. We’ve worked with several different experimental groups, and we help them to design new materials, for example for solar cell applications. We also work with a lot of organic chemists, medicinal chemists, who do drug discovery, so designing new, better drugs to treat diseases like cancer or Alzheimer’s, and so this is a work in progress now.

Curtis: Olexandr’s computer simulations, which use machine learning to predict chemical reactions, save the experimental chemists lots of time because he can tell them, based on his neural net’s computations, which combinations of chemicals are worth testing. And this isn’t trivial. This can save chemists hundreds of hours of lab time because there are so many possibilities of what you could test.

Olexandr: Some of those experimental methods are very expensive because of the nature of the process, and they’re slow. It’s really laborious work in the lab, but now we can do a simulation on the computer, and we can use a physics-based simulation or data-driven methods, like machine learning, to guide a chemical experiment. We can predict and navigate them and say, okay, if you have the option to test a thousand different things, instead of running a thousand different experiments, we help them prioritize and say, “Oh, don’t do these 990, but these ten can be precious.”
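
The prioritization Olexandr describes, a thousand candidates narrowed to ten, could be sketched roughly as follows. This is only an illustration: `predicted_score` is a hypothetical stand-in for a trained model or simulation (here it just returns a repeatable pseudo-random number), and the candidate IDs are made up.

```python
import heapq
import random


def predicted_score(candidate_id: int) -> float:
    """Hypothetical stand-in for a model's predicted usefulness of a candidate."""
    rng = random.Random(candidate_id)  # seeded so each candidate gets a repeatable score
    return rng.random()


def prioritize(candidates, score_fn, keep=10):
    """Return the `keep` candidates with the highest predicted score."""
    return heapq.nlargest(keep, candidates, key=score_fn)


candidates = list(range(1000))  # a thousand possible experiments
shortlist = prioritize(candidates, predicted_score, keep=10)
print(f"{len(shortlist)} of {len(candidates)} candidates selected for the lab")
```

In practice the value of such a shortlist depends entirely on how well the scoring model tracks the real experiment, which is why the lab still has the final word.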

Ginette: So, in what other ways is machine learning fundamentally changing chemistry?

Olexandr: Historically, chemistry was an empirical science. It’s been driven by experiment. You find the observation, you formulate a hypothesis, you make a prediction and do a test, so it’s the standard scientific method from Newton’s time. But now, you know, those new machine learning methods allow us to do data-driven discovery. Given your historical data, you can train a machine-learning model and predict some property or characteristic or feature of interest for a particular molecule, and then you can drive an experiment based on your machine learning models.

So there are a lot of physics-based methods, which means you rely on some kind of fundamental principle, for example quantum mechanics. You can solve the quantum mechanical problem for a particular molecule, and you can understand its properties, but the problem is this is a very compute-intensive process. Typically you have to run on a supercomputer, and it takes a lot of time. Once machine learning kicks in, it allows us to do a faster and still accurate approximation of this problem, and this is one of the projects in my lab. We use neural networks, you know, deep learning, to approximate the solution of the Schrödinger equation, and this gives us a speedup of up to six orders of magnitude, so what needed a supercomputer, you can run essentially on a laptop. So this gives you a tremendous speedup.

Instead of doing an expensive simulation on a supercomputer, a neural net approximates it, and you can use a standard Linux box and a GPU and get an answer much faster. Again, there is a certain approximation because it’s not, you know, an exact solution, but in the space of standard organic molecules, drug-like molecules, basically we have very, very nice accuracy, and the solution is almost exact.

Ginette: Olexandr points out that his team can essentially do the work of a supercomputer with some neural networks, a Linux box, and a GPU. By learning or refining your skills with neural networks, there’s a world of possibilities in every field.

Interested in neural networks? Then check out Brilliant.org, a problem solving website that teaches you to think like a computer scientist. Instead of passively listening to lectures, you get to master concepts like neural networks and machine learning by solving fun and challenging problems. Brilliant provides you with the tools and the framework that you need to tackle these challenges. Brilliant’s thought-provoking content based around breaking up complexities into bite-sized understandable chunks will lead you from curiosity to mastery. So what are you waiting for? They were good enough to sponsor this episode, and using this link lets them know that you came from us, and you can sign up for free, preview courses, and start learning! Go to Brilliant.org slash Data Crunch to sign up for free, and the first 200 people that go to the link will get 20% off the annual premium subscription. Once again, that’s Brilliant dot org slash Data Crunch.

Curtis: So not only can machine learning help target the right experiments to solve a problem, it can also help solve equations that use huge computational resources faster than traditional methods by several orders of magnitude. That’s like riding on a jet instead of on the back of a giant snail. You can go a lot more places on the jet. The data Olexandr uses with his models include properties and structures of molecules already known from the scientific literature, as well as solutions to the computationally intensive equations we mentioned earlier.

Olexandr: Either we work with experimental data, so we go to the historical literature, some databases, and we collect experimental measurements or properties of molecules. Those properties can be anything, you know, the efficiency of your lithium battery or the efficiency of your solar cell, or it can be binding to a specific protein. It can be any kind of useful property, and then we connect this property with the structure of the molecules and materials by using some kind of features and apply machine learning methods.

The second approach is when we approximate the physics-based simulation. Instead of having these experimental results, we use the solution of these very expensive calculations as the ultimate target, and then a neural net approximates the solution of this equation. That can be the energy of the molecule or it can be a computed property of the molecule, the band gap for example, or some other useful property.
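
The second approach, fitting a neural net to the outputs of an expensive calculation, might look something like the toy sketch below. Everything in it is an assumption made for illustration: `reference_energy` is a simple Morse-style curve standing in for a real quantum-mechanical calculation, the "molecule" is reduced to a single bond distance, and the network is a tiny one-hidden-layer model trained with plain gradient descent. It is not the model or code from Olexandr's lab.

```python
import math
import random


def reference_energy(r: float) -> float:
    """Stand-in for an expensive calculation: a Morse-style curve (illustrative only)."""
    return (1.0 - math.exp(-1.5 * (r - 1.0))) ** 2


# "Database" of (geometry, energy) pairs, computed once with the reference method.
distances = [0.7 + 0.09 * i for i in range(21)]
inputs = [r - 1.6 for r in distances]  # centered inputs train more easily
targets = [reference_energy(r) for r in distances]

HIDDEN = 8
rng = random.Random(0)
w1 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
b1 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
b2 = 0.0


def predict(x):
    """Return the network output and the hidden activations for input x."""
    hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(HIDDEN)]
    return sum(w2[j] * hidden[j] for j in range(HIDDEN)) + b2, hidden


def mean_squared_error():
    return sum((predict(x)[0] - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


initial_error = mean_squared_error()
LEARNING_RATE = 0.05
for _ in range(2000):
    g_w1 = [0.0] * HIDDEN
    g_b1 = [0.0] * HIDDEN
    g_w2 = [0.0] * HIDDEN
    g_b2 = 0.0
    for x, y in zip(inputs, targets):
        pred, hidden = predict(x)
        d = 2.0 * (pred - y) / len(inputs)  # derivative of the mean squared error
        g_b2 += d
        for j in range(HIDDEN):
            g_w2[j] += d * hidden[j]
            back = d * w2[j] * (1.0 - hidden[j] ** 2)  # back through tanh
            g_w1[j] += back * x
            g_b1[j] += back
    for j in range(HIDDEN):
        w1[j] -= LEARNING_RATE * g_w1[j]
        b1[j] -= LEARNING_RATE * g_b1[j]
        w2[j] -= LEARNING_RATE * g_w2[j]
    b2 -= LEARNING_RATE * g_b2
final_error = mean_squared_error()

print(f"error before training: {initial_error:.4f}, after: {final_error:.4f}")
```

Once trained, calling `predict` is far cheaper than rerunning the reference calculation, which is the source of the speedup discussed earlier, though a toy like this says nothing about the accuracy a real model achieves.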

Ginette: So what else have Olexandr and his team been working on that’s caught the attention of so many people?

Olexandr: So we have code on GitHub and a couple of publications, so any curious reader could go to the technical details and sort of play with it by him- or herself. So basically what we did, we invested a lot of computational resources to solve the Schrödinger equation for organic molecules, and we generated a gigantic database of pairs of molecules and energies, and some other properties, for example, and then we trained a deep neural network that would predict that. So now that we did this hard work and released a trained neural network, everyone can go and, you know, plug in their own molecule and get the solution, get the energy and the structure of the molecule, and use it for their own projects. And now we collaborate with a lot of experimental labs and people who do drug discovery because, you know, those methods are used for many different applications. They probably would revolutionize the field of computational chemistry soon, at least we hope so.

Curtis: So with this code, Olexandr and his team help many different experimental labs speed up their processes of finding what works well for their particular experiments. But this isn’t the only project he’s been working on.

Olexandr: Probably most of your readers know AlphaGo, the reinforcement learning system that beat the best players of the game Go. And the game Go is super complex. And actually what we built follows the same analogy, so the game Go has two pieces: one that plays, you know, makes a decision, the movement of the checkers on the board, and the other one that scores, and basically they work together. So what we did, we essentially designed an AlphaGo, you know . . . a machine suggests a molecule for a particular biological application, and then it plays against itself and learns chemistry. And then it can suggest to us a molecule with a specific desired function. So for example, what we showed, we picked a particular protein called Janus kinase 2, JAK2, and it’s an important protein implicated in cancer and some other diseases, and what we showed is that a machine can design an inhibitor for this protein, and therefore we can envision fully machine-driven design of new drugs.

So you teach a machine to generate molecules, and we use reinforcement learning to reward it for making only useful molecules, you know, it’s like a carrot and stick, essentially. Our scoring part is a different neural network. It takes the structure of the molecule and predicts binding to this specific enzyme. And basically it gives you an approximation to an experiment. And basically when you train them in the loop, the scoring part teaches the generative part to generate only useful molecules, and then we can maximize, you know, we can maximize binding, or we can minimize binding, or we can do a combination of different things. So eventually I think this would be a new way of how drugs are discovered: instead of a chemist, you know, serendipitously going one by one through molecules, here a machine could generate a pool of useful molecules for you.
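
The generator-and-scorer loop described here can be illustrated with a very small reinforcement learning sketch. The assumptions are heavy and worth stating: the "molecules" are just strings over four made-up building blocks, `score` is a hypothetical rule standing in for the binding-prediction network, and the generator is a table of per-position preferences updated with a basic policy-gradient (REINFORCE-style) rule rather than a neural network.

```python
import math
import random

TOKENS = ["A", "B", "C", "D"]  # hypothetical building blocks, not real chemistry
LENGTH = 5


def score(sequence):
    """Hypothetical scorer standing in for a binding-prediction network."""
    return sequence.count("C") / len(sequence)


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Generator: one row of preferences (logits) per position in the sequence.
logits = [[0.0] * len(TOKENS) for _ in range(LENGTH)]


def expected_score():
    """Average score the generator would earn, computed from its probabilities."""
    c = TOKENS.index("C")
    return sum(softmax(row)[c] for row in logits) / LENGTH


score_before = expected_score()
baseline = 0.0  # running average reward, used to decide carrot or stick
for _ in range(2000):
    probs = [softmax(row) for row in logits]
    choices = [rng.choices(range(len(TOKENS)), weights=p)[0] for p in probs]
    reward = score([TOKENS[i] for i in choices])
    advantage = reward - baseline
    baseline += 0.05 * (reward - baseline)
    for pos, chosen in enumerate(choices):
        for tok in range(len(TOKENS)):
            indicator = 1.0 if tok == chosen else 0.0
            # Raise the preference for rewarded choices, lower it otherwise.
            logits[pos][tok] += 0.5 * advantage * (indicator - probs[pos][tok])
score_after = expected_score()

print(f"expected score before: {score_before:.2f}, after: {score_after:.2f}")
```

Flipping the sign of the reward would push the generator to minimize the score instead, and a weighted sum of several scorers would give the kind of combined objective mentioned above.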

Ginette: In addition to helping experimental chemists limit what experiments they need to conduct in the lab, he’s suggesting that machine learning methods can actually help design new drugs by recommending creation of specific molecules. This would speed up getting medicine to the market for various illnesses.

Olexandr: It’s very interesting, so you see this wave of creativity. People use GANs, use different types of neural networks, you know, game theory. So essentially what you see is those new interesting ideas and algorithms starting to come to chemical sciences and biological sciences, so I’m really happy. I’m really excited about what’s coming out of this.

I’m optimistic, but I’m also a little bit worried about, you know, the hype, right? So if you overhype, people get, you know, disappointed, and we may have yet another winter. But I am optimistic that, you know, those methods will significantly transform chemical sciences, so you’ll see drugs on the market faster, so you can treat, you know, diseases like cancer faster. You know, hopefully we will see personalized medicine, when for example your own genome would be sequenced, and then we can design a specific treatment for your particular condition, and that would be possible as a combination of cutting-edge science and data-driven methods. And also the design of new materials, like steels and alloys, would be accelerated as well, so I’m very optimistic. I’m very happy we live in this age. It’s very interesting to see this transformation.

Ginette: A huge thank you to Olexandr for speaking with us, and as mentioned in our podcast, if you want some better insights into your business data by training your team in Tableau, go to vaultanalytics.com or email us at [email protected]. We’ll teach you how to find insights and share them effectively, creating improvements for your company and greater success for you.

And as always, for the transcript and links for this podcast, you can go to datacrunchpodcst.com, and you’ll find the links at the bottom of the show transcript. If you like what you’re learning here with us, please share our podcast with your coworkers and friends and go to iTunes or your favorite podcast playing platform and leave us a review.

Links

vaultanalytics.com

brilliant.org/DataCrunch

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Picture

Photo by Louis Reed on Unsplash