Curious about graph databases? Matthias Broecheler from DataStax talks with us about looking at the world through a graph lens and using it to solve real world problems.

Ginette: I’m Ginette,

Curtis: and I’m Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.

Today we chat about graph databases with Matthias Broecheler, Chief Technologist at DataStax and coauthor of The Practitioner’s Guide to Graph Data.

Curtis: Graph isn’t something we’ve done on this show yet. So I’m glad you’re here. I’ve gone to conferences and stuff in the past. And I keep hearing, you know, this is the year of the graph database, and then nothing emerges. Right, right? And so I’d really love to dive in since you’ve done so much with the graph database. Where are the use cases that, that it’s applicable and really useful as opposed to relational. And then I know there’s also sort of a mid ground where you could use a relational or you could also use a graph and they both work. Right? So helping us all understand what that is, what that landscape looks like would be awesome.

Matthias: Absolutely. Yeah. I would be, I’ll be happy to talk about all these things. Yeah. It’s, it’s a, it’s a fascinating world. I’m talking about. Like, it’s, it’s, you know, the year of the graph. People love to do this kind of industry analysis, right? Like where are we on the hype cycle and what is coming, what is going. I think graph is interesting in the sense that there is a difference between how you solve problems and what the technology choices are that you use to solve those problems. So to your point, if you are very familiar with relational technology, you can actually solve graph problems with relational technology—if you bring the right mindset to the problem. It makes certain things harder, right? Like you, for instance, you would likely run into like join problems and having to do like manual join optimization and figuring out how to really build a solid physical data model based on the ER diagram, like all those sort of things.

But if you, you know, if you have like 25 years of relational database expertise under your belt, that’s not a bad starting point. And in fact, in the book that we wrote, we spent a whole chapter talking about this and talking about, you know, graph versus relational, because I think what gets lost in the technology conversation is the bigger sort of the much more interesting point and that is how can we teach people to think in terms of graph and to understand when they are confronted with the graph problem and to switch their mindset towards the graph problem. The relational system, but I mean, it’s one, one way to make computers do data, and it’s a, as you said, it’s a very powerful way, and we’ve been doing it for decades and the systems have become incredibly sophisticated, but we need to keep in mind where they came from.

And that was sort of from early bookkeeping, you know, applications like how do you keep track of transactions and like invoices and, you know, payroll and things like that, right? Like that’s kind of where the relational world originally came from, like a very kind of flat way of looking at the world. And that has really changed tremendously. Those applications still exist. Like obviously ERP still exists, and we still need to keep track of payroll and we still need to file invoices, but the world has evolved tremendously from there to now we have the internet and now we have the internet of devices, and we have social media and we’re all connected and, and the world has become much more of an interconnected system, not only at the biological level where we already see this, but even at the societal level of how societies are structured. And the pandemic that we’re currently in is, is a very vivid reminder of how intertwined our supply chains are and how intertwined nations are and, and things like that. And all those are effectively ways of looking at the world in a graph-structured way. And I think that’s really the key message we’re trying to get across in the book, is to say, how do you use graph technology? Yes, for sure. But also, how do you get yourself to think in a way that is graph-like so you can actually solve the problem you’re trying to solve.

Curtis: Do you do that in your book by some examples, some case studies? Or how do you approach that?

Matthias: Yes, so we do it twofold. So one is exactly as you said, we pick a number of different problems. So we look at a customer 360 example, and the kind of the kind of connectivity that exists in modern commerce systems. We look at an IOT cell tower type of example. We look at a Bitcoin example. We look at a movie recommender system example. So there’s, there’s a lot of examples that we go through to show on different topics and use cases, how graph thinking can be applied to solve particular problems. But we also take the approach of trying to introduce what we, what we call graph thinking in more general terms, and try to explain how to shift your mindset from a looking at the world as individual objects that as a secondary thing might be connected to other objects, which is how we mostly look at the relational world, right?

Like tables, and then a table can be a relationship, to thinking of relationships as a first-class citizen of sorts that really determines how the problem ought to be solved. And I think a great, a great way to show how powerful this is, is to look at the early days of internet search, right? Like if you, if you remember back to, you know, the 1990s and, and very early two thousands, right? There were lots of search engines on the market, like, like AltaVista and Lycos that dominated that space. And they looked at the problem of internet search, like you would look at the problem of document search, right? They looked at each webpage as an individual document that had content and structure and they, you know, they hired the best linguists and NLP experts, and they did a tremendous amount of work and built fantastically complex systems to analyze that, that huge corpus of documents they found on the web, but one problem that they had is how do you, how do you know if a document is authoritative on a particular topic, right?

Because that determines how it should be ranked in the search results, not just whether words occur on a page. And so they did a lot of linguistic analysis to determine how they could infer authority from just this one entity, the document, and then Google came around and they took a very different approach to the problem in that they looked at the problem as a graph problem. They said, well, yes, they are individual documents, but more interestingly, those documents are linked to each other. And if we think of those links as inferring or assigning authority, right, if I link to your blog or to your website, I am basically saying I endorse, you know, Data Crunch and I endorse the content that you have on your page. And so I can look at that relationship as a sort of endorsement. And if I then sort of recursively compute that, which is effectively what PageRank is, I get a way of ranking pages on the internet that is induced by the graph structure that underlies it all.

And that completely blew away all other search engines to the point where, you know, I think some of them might still be around, but basically they have been forgotten. And we have come to the understanding that the way you do internet search is by looking at the internet as a graph. And I think there are so many examples like that, where we looked at a problem as a collection of objects. And once we changed it to it’s actually a graph, you have a whole different set of tools that you can apply to solve the problem that is much more powerful than just individual entity analysis.
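
To make the "recursively compute endorsement" idea concrete, here is a minimal sketch of a PageRank-style iteration in Python. It is not Google's implementation; the damping factor of 0.85, the iteration count, and the tiny link graph (including the page names) are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank pages by repeatedly passing each page's score along its outgoing links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link acts as an endorsement: split this page's score among its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: two blogs link to "data_crunch", which links back to one of them.
links = {
    "blog_a": ["data_crunch"],
    "blog_b": ["data_crunch"],
    "data_crunch": ["blog_a"],
    "blog_c": ["blog_b"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

The page with the most incoming endorsements from well-ranked pages ends up on top, without looking at any page's text at all, which is the shift from entity analysis to graph analysis described above.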

Curtis: I remember doing searches on AltaVista. They’re now gone. And, um, what are some other examples that you have, you know. I mean, this is really interesting to start thinking about this. What are some other ones that are maybe one other, one that you think is really interesting that maybe people aren’t as familiar with, not that they would be familiar with the graph concept within Google, but, but maybe something like a concept that maybe they don’t think about often or use all the time.

Matthias: Yeah. I think one thing that I’m personally very interested in is just the world of finance and commerce, right? Like how everything is so tightly intertwined. And I think the financial crisis of 2008/2009 made that very apparent how a certain failure in the system can cascade and, and lead to a cascading failure event that we could not . . . at the time, we certainly couldn’t predict it, but we also couldn’t even understand it when it was happening. Right? And we were kind of running around trying very desperate things to, to, to make it not take down the entire financial system at the time, which was mostly driven by our lack of understanding for how what is on one bank’s books influences how another bank performs and how all these, you know, securities were intertwined and what kind of debts were on, on which bank and such.

And I think that’s still very much prevalent today where the interconnectivity between our commerce system and our economy as a whole, and the financial system is a huge massive, massive graph that oftentimes is hard to see, but in case of failure, it becomes very apparent how cascading effects can have very, very devastating effects. And I think it is for me, it’s a really interesting area now, again, in the COVID-19 crisis, because we can see that obviously this is a huge shock to our economy, but it is very hard right now to predict what kind of shock it is going to be and how we best buffer the shock other than like blanket handing out money, which is sort of the approach that we’re currently taking. And I’m not saying that that is, that is a bad idea. I’m saying it would be really delightful if we could advance our thinking.

And if it could have advanced our systems to the point where we can actually understand all these interdependencies and accurately model and predict what is going to happen similarly to like how you can do weather forecast, right? Like we have a very good understanding for instance of meteorological phenomena, to the point where we can fairly accurately predict the weather. Now, obviously our economy is a much more complex system than that, but it seems that we could get to the point where we understand the graph structure and, and model it in a way and observe it in a way that actually allows us to control it better. Another example is that of, of cybersecurity, which is a completely different take on a graph problem. And that is when you look at the kinds of attacks that are now being launched against companies or states for that matter, or agencies, those are very sophisticated.

You see groups of people who take a very long time and a very deliberate approach to penetrate systems and they, they find multiple entry points and then they spend a long time sort of scanning and analyzing the internal system to, to plan their next move and get closer and closer to sensitive data while trying not to trip off any of the, sort of, any of the warning systems or intrusion detection systems that would alert to their presence. And the only way we can really defend against those is if we understand the context and the history in which these adversaries act inside the system, right? If you look at any individual event, may it be a file that is being sent or a port that is being opened, or a connection that shouldn’t have been opened, like any of these may not individually trigger detection, but if you put them in context and you understand and can trace the attack vector, then you can get a much better risk profile and come up with systems that alert to these intrusions without being overly sensitive and, you know, being constantly in panic mode. And I think that’s an, that’s another area where graph thinking is starting to become very important to be able to trace and defend against those attacks.
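
As a rough illustration of putting individual events in context, the sketch below treats each logged event as an edge between two hosts and traces whether a chain of individually unremarkable events connects an entry point to sensitive data. The host names, event labels, and the `trace_path` helper are all hypothetical; a real detection system would be far more involved.

```python
from collections import deque

# Hypothetical event log: (source host, destination host, what happened).
events = [
    ("laptop-17", "file-server", "unusual login"),
    ("file-server", "build-host", "new port opened"),
    ("build-host", "customer-db", "large file transfer"),
    ("printer-3", "laptop-17", "routine print job"),
]


def trace_path(events, entry, target):
    """Return the chain of events leading from entry to target, or [] if none exists."""
    # Build an adjacency list so each host points at the hosts it touched.
    graph = {}
    for src, dst, label in events:
        graph.setdefault(src, []).append((dst, label))

    # Breadth-first search, remembering how each host was first reached.
    came_from = {entry: None}
    queue = deque([entry])
    while queue:
        host = queue.popleft()
        if host == target:
            break
        for nxt, label in graph.get(host, []):
            if nxt not in came_from:
                came_from[nxt] = (host, label)
                queue.append(nxt)

    if target not in came_from:
        return []

    # Walk backwards from the target to reconstruct the chain in order.
    path = []
    node = target
    while came_from[node] is not None:
        prev, label = came_from[node]
        path.append((prev, label, node))
        node = prev
    return list(reversed(path))


attack_chain = trace_path(events, "laptop-17", "customer-db")
no_chain = trace_path(events, "customer-db", "laptop-17")
```

No single event here would necessarily raise an alarm, but the connected three-step chain from a laptop to the customer database is the kind of pattern that only shows up when the events are viewed as a graph.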

Curtis: Yeah. So lots of, lots of use cases here. A lot of us being that we have worked with relational databases so much, we can get into a relational database and we know how to analyze it and aggregate it and visualize it and you know, run machine learning algorithms against it. Can you give us a sense for how that may differ in an, in a graph database? Like how would you visualize an analysis or how would you aggregate it in a meaningful way to get insight when it’s in a graph state as opposed to relational?

Matthias: That’s a, that’s a really good question. So on the visualization front, there’s, there’s obviously a lot of, a lot of graph visualization tools, and they are also sometimes applied to relational systems. In the graph world, those are usually packaged with a graph database. For instance, DSE Graph comes with a tool called DataStax Studio that has graph visualization built in so you can, you can kind of quickly do an analysis of sort of, of a subgraph of the graph, but obviously when you deal with very large graphs, that can be very overwhelming, right? There’s not, there’s not a whole lot of use in looking at a giant ball of spaghetti where your screen is basically a bunch of dots with lines in it. That’s not something humans can perceive in any, any real way. So there’s a lot of work to try to understand how to provide the right level of granularity and how to do, for instance, clustering, to understand when there’s like clusters of vertices in your graph that should belong together, that you can collapse them onto one. So that’s a really interesting area of research. So graph visualization is one thing. There’s also a lot of really cool uses of GPU acceleration to handle very large graphs. So you can kind of zoom in and out and do interactive analysis of graph data.

The other thing that’s very different from relational is the ability to figure out how you would want to structure your data to begin with. Like in the relational world, we usually start with the entities and, you know, we call it entity, relation . . . entity-relational modeling, for a reason. We usually start with the entities, then look at the relationships and map that onto tables. In the graph world, there’s usually a more fluid and heterogeneous approach where you build your schema incrementally in a more whiteboard type fashion, and try to include more data as you go along. One of the really powerful ways that you can utilize graph technology is to include multiple different types of data and get a more comprehensive view of the problem. And I think that’s a, that’s a kind of mentality that in the relational world is, is still fairly new. I would say, in the, in the data analyst space, machine learning space, I think we’re, you know, we’re very well versed in the whole notion of feature engineering and like, what features do you need to consider and how do you include them?

And, and with graph, you’re kind of taking that one step further and not just looking at like, features in an entity sense. So like what kind of attributes for my records do I include in a support vector machine let’s say, but furthermore, what kind of relationships and related attributes do I include? And what does that mean for the kind of problem that I’m looking at? So that kind of, that kind of connected thinking is I think the biggest difference in looking at the world from an, from an entity relationship point of view, versus a graph point of view, where you’re kind of intuitively familiar with this idea of flows and dependency and interdependency, and even bi-directional dependencies, where sometimes you don’t actually know what the, what the causal relationship might be. And so you’re dealing more with like Markov network type dependencies in the probabilistic sense than with Bayesian network type dependencies, where you have some kind of directionality in the causality that you’re exploiting.

Curtis: It’s hard to do this, right, over a podcast ’cause we can’t visualize anything. Right. Essentially . . .

Matthias: Yeah, that’d be amazing if we could do a whiteboard right now.

Curtis: Right, exactly. So I’m just trying to give, so some listeners may have never, I imagine most have at least heard of graph databases, but some maybe have not. And so if you were to just on the most basic level, sort of describe, you know, you have your nodes and this kind of thing.

Matthias: Yeah, absolutely. I think the, I think the, the, the one liner would be depending on how important the relationships are to the problem you’re trying to solve, that usually ends up being the determinant as to whether or not you should use graph technology versus sort of technologies that you may be more familiar with, like relational, or even just flat files, like just CSV files that you load if you’re doing batch analysis. And the reason being is that graph databases have a lot of support, both in terms of the technology, the implementation, the index structures and performance, and the query engine and such for dealing with relationships, but also in the query languages and the tools that are available that make you more productive with relationships. Those are, I think the two important things to consider. If you’re in a world where you’re starting in relational, and then you find yourself writing like five-way joins, you will probably notice that your productivity drops quite significantly when you’re dealing with relational systems, because all of a sudden, like join reordering becomes important and you might have to sort of add hints and your queries get really ugly.

And it’s really hard to see what’s going on. And, and there’s lots of like productivity problems with that, but there’s also performance problems with that. Because a relational engine, at some level of join depth, will run into trouble. And those are the two areas where I would say, you know, kind of, if you see those things happening, start checking out graph technology and a graph way of looking at the problem, because it makes you more productive and it gives you systems that are optimized and specifically designed to work in that kind of environment. And that would be sort of my high level starting point, coming from a relational or entity-centric mindset.
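
The join-depth point can be sketched in a few lines of Python using the standard-library `sqlite3` module. The `knows` table, the names in it, and the `walk` helper are hypothetical; the sketch only contrasts how the two styles express the same four-hop question and says nothing about real-world performance.

```python
import sqlite3

# Hypothetical "who knows whom" edges.
edges = [
    ("ann", "bob"),
    ("bob", "cat"),
    ("cat", "dan"),
    ("dan", "eve"),
    ("ann", "fay"),
]

# Relational style: the edges live in a table, and every hop is another self-join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO knows VALUES (?, ?)", edges)

four_hop_sql = """
SELECT DISTINCT e4.dst
FROM knows e1
JOIN knows e2 ON e2.src = e1.dst
JOIN knows e3 ON e3.src = e2.dst
JOIN knows e4 ON e4.src = e3.dst
WHERE e1.src = ?
"""
via_joins = {row[0] for row in conn.execute(four_hop_sql, ("ann",))}

# Graph style: keep an adjacency list and follow edges hop by hop.
adjacency = {}
for src, dst in edges:
    adjacency.setdefault(src, []).append(dst)


def walk(start, hops):
    """Return the set of nodes reached after exactly `hops` steps from `start`."""
    frontier = {start}
    for _ in range(hops):
        frontier = {nxt for node in frontier for nxt in adjacency.get(node, [])}
    return frontier


via_walk = walk("ann", 4)
```

Both approaches return the same answer, but the SQL grows by one join per hop while the traversal only changes a number. In Gremlin, the query language discussed later in this conversation, the same question would look roughly like `g.V().has('name', 'ann').repeat(out('knows')).times(4)`, assuming vertices carry a `name` property and edges are labelled `knows`.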

Curtis: That’s helpful, right. Rules of thumb to kinda, kinda think through, you know, what are we doing and when is graph better than the other. Now another thing, you know, cause graph is not as widely used, right. And we still struggle with this on the relational side, right. In machine learning on the relational side. And that is how do you put something into production and, uh, you know, make it work as opposed to sort of an experimental setup and, uh, you know, devops and this kind of thing. How does that look on the graph side of things?

Matthias: Yeah, I think that’s, um, obviously it is hard to, um, in particular, once you try to go in the direction of production, if, if you’re not familiar with graph technology, or if, you know, the team you’re working with is not familiar with graph technology, it’s obviously a little more difficult to move a system into production where the team knowledge is not as mature as you’d want it to be. Um, and that’s definitely one thing to be mindful of as you play with graph technologies is, um, is to kind of, you know, be aware of your own maturity curve as it comes to the technology. That said, the systems that are out there, there’s a number of graph databases out there that have been used in large scale production environments for many, many years. So the technology definitely has matured to the point where it is very safe to use in that environment.

It’s more, think of, is the team familiar and comfortable enough using this technology so that they can envision themselves going through a fire drill when something goes down or something goes wrong. And that definitely takes some time. We have seen, we have worked with a number of teams on graph problems, and we have seen that over the course of, you know, six months, nine months you can get from, “Hey, let’s build a POC, let’s prove this out. Let’s show how this would work and let’s roll it out on sort of a smallest use case.” And then scale it up from there is usually a good approach to ease people into the technology and make them more familiar and more comfortable over time, rather than just kind of going from like, “Yup. Okay. You convinced me on graph technology. Okay. Let’s get into production in a month” and run, you know, a trillion-edge graph that needs to be highly available. Like that, that will likely be a, a more frustrating journey than kind of easing yourself into it.

Curtis: Yeah. Yeah. That’s fair. And you guys are doing something like that at DataStax, I think. Do you guys maintain some sort of open source project in the graph world?

Matthias: Exactly. Yeah, we do. We have a product called DSE Graph that you can download and play with. You know, it gets you all the way to production and can scale massively. Also, all of the query language and all of the supporting tooling on the graph side is part of a project called Apache TinkerPop, which is one of the oldest graph communities that has really pioneered ways of translating graph thinking into formal languages and approaches that can make people productive working with graphs, both in terms of real-time types of queries like pathfinding and traversals and such, as well as analytic types of graph operations, like connected component finding and, and, and centrality algorithms. And so there’s, there’s a large and active community out there and a very active open source project that maintains the graph query language Gremlin, and a number of associated tools and components that allow you to be productive in graph.

And that’s all open source because it is adopted by very many graph vendors. So it’s kind of, you can think of it as sort of a de facto standard. So if you learn how to use Gremlin, then you can work with many different graph technologies, right? Like kind of like how, if you know SQL, you can use MySQL and Oracle and they might, you know, there’s obviously some minor difference, but by and large, you can kind of get up and running quickly and you don’t have to like learn a new database system. And similarly in the graph world, Apache TinkerPop gives you that sort of shared foundation, that shared language, and that shared way of thinking about graph-structured data that is then implemented by multiple graph vendors. So you can pick, you can make the technology choice independent from your learning and how you like to express graphs.

Curtis: How does that relate to Neo4j? That’s the one that I tend to hear all the time. Is it sort of built on top of TinkerPop, or how are those related?

Matthias: Neo4j also implements TinkerPop and supports Gremlin. So you can use Gremlin with, uh, with Neo4j. Neo4j also has their own query language that they, you know, they’ve developed over the years. So, so unlike, I would say unlike the relational world where sort of the, the question of query language has been settled and, you know, has been settled for some time, the graph world, I think there’s still, there’s still an active conversation around what should the right query language be. And, and I think it’s a little more it’s it’s yeah, it’s a little more up in the air and it’s a really, I think it’s a really fascinating conversation around, well, how do we, how do we translate the graph thinking that is happening, and fundamentally graphs are highly multidimensional structures, right? So that’s kind of the, the root sort of the root of the problem, if you will. How do you translate that into a formal language that is on the one hand expressive enough and powerful enough that you can do a lot of the graph stuff that people want to regularly do, like pathfinding, pattern matching, egocentric traversals, et cetera. Those are the common graph access patterns. How do you, how do you give people a query language that allows them to do those things with high levels of productivity while at the same time being somewhat familiar to how people use other types of query language like SQL, or even just programming languages while at the same time, not overwhelming people with a very steep learning curve. And I don’t think we have found the right answer to that question yet. I think there’s lots of really cool ideas out there, and it will be a little while before that converges. So it’s kind of like graph databases are sort of in this space where relational databases were in like the 1980s or so where it was still kind of unclear where things were going to go and there were sort of some alternatives out there.

Curtis: Still the wild west, even more so than, than data science in general, it sounds like. If someone wanted to get started, what would you recommend? Is Gremlin kind of the widest, has the widest adoption, or what would, what would you look into?

Matthias: Yeah, I think I would, I mean, obviously I’m biased in that, that area, right? Because I’ve, you know, I’ve been working with and on Apache TinkerPop and built a number of graph databases that support Apache TinkerPop. And I think it’s the most open and welcoming graph community that is supported across multiple databases. And if you look at, like, if you look at it, I mean, in terms of just factually looking at it, if you look at the number of vendors that support Gremlin, the most systems that support any particular graph language, that would be Gremlin, right? Like you can, whether you use AWS Neptune or Neo4j or DSE Graph or AllegroGraph, or a number of others that I’m forgetting here right now. And I don’t mean to like bias it, obviously DSE Graph, I should put that in there ’cause that’s the one that we were working on. Those all support Gremlin and Apache TinkerPop. So I think it’s fair to say that if you want to be able to use the largest number of graph systems on the market right now, Gremlin would be the way to go.

Curtis: Got it. Yeah. Okay. Yeah. That’s fair. I want to talk at least a little bit about, about your book and not only how hard was it to write and writing a book is a huge, huge undertaking. So I’m always curious, how long did it take you guys and, you know, how did you get through it and survive and, um, and then just maybe the impetus for it and kind of what you were hoping to communicate there.

Matthias: Yeah, absolutely. It took, you’ll be surprised to hear this, but it took way longer than we thought. It’s a, it’s funny when we started off, I mean, a huge first off, a huge, huge shout out to my coauthor, Denise Gosnell, who did a lot of the work, did most of the writing and really kept us on, on a tight schedule. And, um, she was just phenomenal to work with on this book. We started off, I mean, it took us almost two years to complete. We came together in Seattle and, uh, we hiked up a mountain and kind of had this sort of, kind of had a general outline for the books, kind of set up after the hike. And we felt really good about, and we’re like, yep. So now we’re just gonna spend a year to kind of crank it out. And that turned into kind of more like two years, because we realized along the way there’s better and worse ways to explain things.

And we really had to go through this learning curve of, of trying to explain how we were approaching graph problems to a general audience and how to do that in a way that people could understand. We’re super, super grateful for the many reviewers of the book, who really spent a lot of time going through it and giving us solid feedback that helped us better understand how to rephrase certain things and how to even entirely change chapters, to give people a better understanding of graph thinking. Because one of the things that’s really really hard to do is, is when you, when you’ve spent so many years doing something, you sometimes subconsciously do things that you don’t even know how to verbalize. And, and that was kind of a journey that we were on. We were like, “Oh, actually we should probably like go back to when we didn’t know all these things about graph and like figure out what we would have liked to know back then.”

And we talked to a lot of people and did interviews and Denise did a ton of work on that as well, trying to understand how people were thinking about that transition from entity-centric to graph-centric thinking and how we could best support people. And that’s ultimately what this book became. We start off with a fairly high level introduction to graph thinking and why it is so powerful, kind of like going through some of the examples that I mentioned earlier and others. And then we do, we spend a lot of time explaining how this is different from relational and how you can think of the two and how the two kind of co-exist. I think one important point we wanted to get across: we’re not saying relational is bad or anything. It’s really just sometimes a hammer does the job and sometimes the screwdriver does the job and you kind of need to know which tool works best for what, whatever you’re trying to accomplish.

And so we really spend a lot of time dissecting the two, and then we do a deep dive into very many different use cases and explain different concepts in graph that people need to be familiar with and, and try to address some common pitfalls that we see people fall into. Because one of the things, just to give one example here, one of the things that’s really, really challenging to do with graph is that every time you sort of walk a graph, you, you end up in sort of a combinatorial explosion of data, right? Like if I, if I ask the question, like how many friends do you have, Curtis? And you’d be like, you know, on the order of, you know, so-and-so many, a hundred, let’s say, but then if you ask how many friends of friends do you have? Well, suddenly we’re on the order of like tens of thousands, right? Like just two hops out and you’re already dealing with a massive amount of data. And then, you know, if you ask the question of, well, how many people are there that a friend of a friend could connect you to? Well, suddenly we’re talking like millions of people, right? So these kinds of like graph-like explorations can very quickly explode. And so there is a lot of learning around how to constrain them, to really get to the data that you’re interested in and to answer the questions that you’re interested in without traversing the entire world of data.
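
The arithmetic behind that explosion, and one simple way to constrain a traversal, can be sketched as follows. The figure of 100 friends per person comes from the example above; the assumption that nobody shares friends (which gives an upper bound), the tiny sample graph, and the `neighbors_within` helper are illustrative only.

```python
from collections import deque


def upper_bound_reach(avg_friends, hops):
    """Upper bound on people reached at exactly `hops` steps, assuming no shared friends."""
    return avg_friends ** hops


def neighbors_within(graph, start, max_hops, max_results):
    """Breadth-first walk that stops at a hop limit or a result limit, whichever comes first."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue and len(found) < max_results:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # Do not expand past the hop limit.
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                if len(found) >= max_results:
                    break  # Result limit reached; stop expanding.
                queue.append((nxt, depth + 1))
    return found


# 100 friends, then friends of friends, then one hop further out.
one_hop = upper_bound_reach(100, 1)
two_hops = upper_bound_reach(100, 2)
three_hops = upper_bound_reach(100, 3)

# Hypothetical friendship graph for the bounded walk.
graph = {"curtis": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["e"]}
within_two = neighbors_within(graph, "curtis", max_hops=2, max_results=10)
first_three = neighbors_within(graph, "curtis", max_hops=2, max_results=3)
```

With 100 friends each, the bound grows from 100 to 10,000 to 1,000,000 in three hops, which matches the "tens of thousands" and "millions" in the example. Hop limits and result limits like the ones in `neighbors_within` are two basic ways to keep a traversal from touching the entire graph.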

Curtis: Got it. Yeah. And what is the book called again?

Matthias: Yes. The book is called “The Practitioner’s Guide to Graph Data.” It is published by O’Reilly and yeah, available on Amazon or your favorite bookseller.

Curtis: Got it. And is that, and is that targeted? It sounds like it’s targeted more towards people that, you know, get in and actually write the queries and stuff. Would you also recommend it for, I dunno, business leaders who just kind of want to know like what this is and what kind of problems it could solve? Is it geared toward that as well?

Matthias: I would say. Yeah. So the, I think the, the, the audience we had in mind is really that, that practitioner, right, who wants to, wants to get their fingers in it and really solve the problem. I think the first, like if you read the first three to four chapters as a sort of technology interested business leader, I think you would get a lot of value out of that as well, because it really lays out the case for graph thinking. It shows how graphs are really shaping economies and values right now. Like if you just think of like, if you are a business leader and you’re trying to figure out how can I create value in my business? Right. One of the, one of the most obvious answers to that is create network effects, right? That is one of the, one of the few natural monopolies that you still have available as, as a business leader to build structure and build value, and really to understand that you really need to understand graphs and graph concepts.

So, so I think there, it’s not really, like we don’t spend too much time dwelling on sort of the strategic elements of graph and how it determines business value and business structure. But if you are already kind of thinking in that direction and you already, like, if network effects, for instance, is something you already understand and you want to understand better how you might be able to translate that into sort of a technical leadership program, then I think the first couple of chapters of the book, the first three, four or five chapters of the book, could be really helpful to give you sort of a lay of the land.

Curtis: That’s awesome. It sounds interesting. Um, is there anything else that you wanted to cover or wanted to share with the audience about graph or DataStax or whatever that you think is important?

Matthias: No, I would like just, just like to encourage people to, to give it a try. I think that the biggest, really the biggest message we’re trying to get across is not, you know, use this technology or that technology. It’s really, play around with graph thinking, and you would be amazed by how you can look at the world differently. And, and that is, like, that’s a super fun thing to do, honestly, like I’ve, like for me, it was eye-opening to, to learn about concepts like emergence and complex systems and, and how they shape the way that societies work, how our bodies work. And I think I would really love to encourage people to just, just try it out, even if you don’t have a problem at hand where you’re like, this is a graph problem. If you’re, if you’re a curious, interested person who would like to understand the world better and potentially learn some tools along the way that you can use in the future, I think graph thinking and graph technologies as an implementation of graph thinking is a really, really great area to dive into. So I’d encourage anybody to do that, even though it’s a little scary to learn something new and it seems kind of out there, but you’d really, I think you’d really come away from it thinking in a much more sort of in-depth way about the world around you.

Ginette: A big thank you to Matthias Broecheler for being on the podcast, and as always go to datacrunchpodcast.com for our transcript and attributions.

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

And so we really spend a lot of time dissecting the two, and then we do a deep dive into very many different use cases and explain different concepts in graph that people need to be familiar with and try to address some common pitfalls that we see people fall into. Because one of the things, just to give one example here, one of the things that’s really, really challenging to do with graph is that every time you sort of walk a graph, you end up in sort of a combinatorial explosion of data, right? Like if I ask the question, like how many friends do you have, Curtis? And you’d be like, you know, on the order of, you know, so-and-so many, a hundred, let’s say, but then if you ask how many friends of friends do you have? Well, suddenly we’re on the order of like tens of thousands, right? Like just two hops out and you’re already dealing with a massive amount of data. And then, you know, if you ask the question of, well, how many people are there that a friend of a friend could connect you to? Well, suddenly we’re talking like millions of people, right? So these kinds of graph explorations can very quickly explode. And so there is a lot of learning around how to constrain them, to really get to the data that you’re interested in and to answer the questions that you’re interested in without traversing the entire world of data.
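
As a rough illustration of the explosion Matthias describes, here is a minimal Python sketch that is not from the interview or the book. It counts how many people are reachable within a given number of hops, using a breadth-first walk with a hop limit as one simple way to constrain a traversal. The helper names (`count_within_hops`, `build_toy_network`) and the toy network, in which everyone introduces 10 brand-new people, are assumptions made up for this example. With 100 friends each, as in the conversation, the same arithmetic gives up to 100 + 10,000 people at two hops.

```python
from collections import deque


def count_within_hops(graph, start, max_hops):
    """Count distinct people reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop limit reached: do not expand this person's friends
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return len(seen) - 1  # do not count the starting person


def build_toy_network(friends_per_person, levels):
    """Hypothetical toy data: each person introduces only brand-new people."""
    graph = {0: []}
    current_level = [0]
    next_id = 1
    for _level in range(levels):
        new_level = []
        for person in current_level:
            for _slot in range(friends_per_person):
                graph[person].append(next_id)
                graph[next_id] = [person]  # friendship goes both ways
                new_level.append(next_id)
                next_id += 1
        current_level = new_level
    return graph


toy = build_toy_network(friends_per_person=10, levels=3)
reach = [count_within_hops(toy, 0, hops) for hops in (1, 2, 3)]
print(reach)  # [10, 110, 1110]
```

Real social networks overlap heavily, so actual counts are lower than this worst case, but the growth per hop is still steep, which is why bounding the number of hops (or filtering which edges to follow) matters.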

Curtis: Got it. Yeah. That’s a that’s and what is the book called again?

Matthias: Yes. The book is called “The Practitioner’s Guide to Graph Data.” It is published by O’Reilly and yeah. Available on Amazon or your favorite bookseller.

Curtis: Got it. And is that targeted? It sounds like it’s targeted more towards people that, you know, get in and actually write the queries and stuff. Would you also recommend it for, I dunno, business leaders who just kind of want to know like what this is and what kind of problems it could solve? Is it geared toward that as well?

Matthias: I would say. Yeah. So the, I think the, the, the audience we had in mind is really that, that practitioner, right, who wants to, wants to get their fingers in it and really solve the problem. I think the first, like if you read the first three to four chapters as a sort of technology interested business leader, I think you would get a lot of value out of that as well, because it really lays out the case for graph thinking. It shows how graphs are really shaping economies and values right now. Like if you just think of like, if you are a business leader and you’re trying to figure out how can I create value in my business? Right. One of the, one of the most obvious answers to that is create network effects, right? That is one of the, one of the few natural monopolies that you still have available as, as a business leader to build structure and build value, and really to understand that you really need to understand graphs and graph concepts.

So, so I think there, it’s not really, like we don’t spend too much time dwelling on sort of the strategic elements of graph and how it determines business value and business structure. But if you are already kind of thinking in that direction and you already, like, if network effects, for instance, is something you already understand and you want to understand better how you might be able to translate that into sort of a technical leadership program. Then I think the first couple of chapters of the book, the first three, four or five chapters of the book could be really helpful to give you sort of a lay of the land.

Curtis: That’s awesome. It sounds interesting. Um, is there anything else that you wanted to cover or wanted to share with the audience about graph or DataStax or whatever that you think is important?

Matthias: No, I would just like to encourage people to give it a try. I think that the biggest, really the biggest message we’re trying to get across is not, you know, use this technology or that technology. It’s really, play around with graph thinking, and you would be amazed by how you can look at the world differently. And that’s a super fun thing to do, honestly, like for me, it was eye opening to learn about concepts like emergence and complex systems and how they shape the way that societies work, how our bodies work. And I think I would really love to encourage people to just try it out, even if you don’t have a problem at hand where you’re like, this is a graph problem. If you’re a curious, interested person who would like to understand the world better and potentially learn some tools along the way that you can use in the future, I think graph thinking and graph technologies as an implementation of graph thinking is a really, really great area to dive into. So I’d encourage anybody to do that, even though it’s a little scary to learn something new and it seems kind of out there, but I think you’d really come away from it thinking in a much more sort of in-depth way about the world around you.

Ginette: A big thank you to Matthias Broecheler for being on the podcast, and as always go to datacrunchpodcast.com for our transcript and attributions.

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/