
Potential Advantages of Blockchain for Data Scientists

Luciano Pesci is bullish on blockchain and data science. Because no one can delete or alter information once it has been written to a blockchain, the ledger offers a complete historical record. He sees this characteristic as a massive advantage for data scientists.

Luciano Pesci: And the key for data scientists and leaders who are gonna oversee data science is that you’ve got to pick a problem narrow enough to demonstrate one quick win, and I mean in 90 days. If in 90 days you can’t come back to the organization and show, “we have made real progress on these metrics in your understanding so that you can make these decisions,” they’re not going to continue to do it.

Ginette Methot: I’m Ginette,

Curtis Seare: and I’m Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.

Ginette: No matter what your position in a company is, knowing about data, how it works, and what it can do for you is vital to the success of your organization.

Fortunately there are ways for you and those in your organization to learn about data. Brilliant dot org, an online educational resource, has on-demand classes in data basics that can help you understand this growing area, providing you with tools and the framework you need to break up complex concepts into bite-sized chunks. You can sign up for free, preview courses, and start learning by going to Brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription.

Ginette: The CEO of Emperitas, Luciano Pesci, joins us today. Let’s get right into the episode.

Curtis: What inspired you to get into data? What inspired you to start the company you’re working at now, and how’d you get going?

Luciano: All of it was a complete accident. Yeah, none of it, not the schooling, the business, none of it was intentional.

Curtis: Okay, let’s hear about it.

Luciano: My first business was actually a recording studio and a record label, and I had signed, among other acts, my own band. We got a management deal, and we went to LA. We started to tour with national acts, and I thought that was going to be my career path without a doubt, so I didn’t take the ACT/SAT at the time and barely graduated high school. Then the band fell apart, and I was like, “well, what am I going to do?” So I went back to school, had a transformative experience, got drawn into economics, and then within economics really found data.

Curtis: And what drew you to economics?

Luciano: I like studying people, and I think economics gives the most complete picture of people. There are a lot of other disciplines that dive deeper into people’s psychological characteristics and their behavioral components, but economics is about the entire system and how an individual functions within that bigger system. The reason I got to data from there is that the key assumption of modern economics is perfect information. This is usually where critics of what is called the classical model in economics come in and say, “well, you can’t have perfect information, so therefore you can’t have optimizing behavior.” One of the beautiful lessons of the last 20 years, especially with data science, is that it might not be perfect information, but you can get really good information to make optimized choices. And so data science represented that method of going into the real world and optimizing all these processes that we were learning about in the textbooks and at the abstract theory level.

Curtis: Interesting. There aren’t a lot of places, if any, that I know of that teach that approach, right, or that have good coursework around it. Did you figure this out on your own, or how did you come to that?

Luciano: So yes. My PhD is in economics, but I was first a master’s student in economics, and as part of that program, you have to teach. I was allowed to design my own course, and it was an engaged learning course. We were taking mostly econ and business school undergrads, with a little mix of master’s students, and teaching them econometric methods, research methods, and data analysis by having them partner with a real-world client. These were often nonprofits, because we’re a public university and we didn’t want any critique about squeezing out real businesses. We would find nonprofits who just couldn’t afford this research or this work on their own and pair them with students who needed actual experience. It was a really nice melding of those skills, and the program has continued to this day. But in about 2014, we spun off Emperitas, and that’s when I started to do this privately.

Curtis: Oh, that’s interesting. So this is a spinoff from the university PhD work you were doing and teaching.

Luciano: Yes.

Curtis: That’s awesome.

Luciano: Yep.

Curtis: Tell me a little bit about some of the stuff you do there. Maybe give me an example of one of your clients, so we can have a concrete example of how this is actually applied.

Luciano: We help mostly B2B customers. We do some B2C, but B2B companies seem to be more focused right now on getting their data in order and leveraging it to make and optimize decisions. As far as verticals, we have worked a lot in the pharmaceutical and medical field and a lot in finance; banks are a big part of our customer base. We also work with startups that have passed their J-curve growth and are now hitting the next stage in the product life cycle, where they reach mass maturity. They’re still growing year over year, but at a slower and slower rate, and they’re all looking for a way to beat competitors. They’ve been in the market for 10 years, they’ve got hundreds of employees, and they’ve done as much as they can internally to optimize these systems, but they’re not using their data.

So what we do is come in and say, “Look, the first step is getting this data: identifying where it is, identifying the different silos it’s in, and then getting it into some kind of structured central form where key stakeholders who need access can log in and get the raw data, view it through a dashboard, or export an automatic report. They have to be able to use this, and right now they’re not doing it.” Once we’ve set up those systems, there are really three deliverables that we provide. First, we calculate a customer lifetime value that’s both monetary and non-monetary. Then we do a customer journey map, and then we break it up by personas. Those three deliverables really give them a picture of everything that’s affecting their organization, and they allow us to put all of their data together in a framework that they understand. Now, in the last two years, the fastest-growing portion of our business has been crypto and blockchain-based organizations. We’ve always been at the cutting edge of market research and data science, but blockchain is where we’re really moving fast into the future while a lot of other people are standing still.
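Luciano doesn’t describe how Emperitas computes customer lifetime value, so what follows is only a minimal sketch of one common textbook form: expected margin, discounted to present value and summed over a fixed horizon. The parameter names, the fixed horizon, and the idea of folding a non-monetary contribution (such as referrals) in as a money-equivalent `referral_value` are all assumptions for illustration.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years,
                            referral_value=0.0):
    """Discounted customer lifetime value over a fixed horizon (hypothetical model).

    annual_margin: profit from one active customer per year
    retention_rate: chance an active customer is still active the next year
    discount_rate: annual rate used to convert future value to present value
    years: number of years to sum over
    referral_value: assumed money-equivalent of non-monetary value per year
    """
    total = 0.0
    for t in range(years):
        survival = retention_rate ** t       # probability still a customer in year t
        discount = (1 + discount_rate) ** t  # present-value divisor for year t
        total += (annual_margin + referral_value) * survival / discount
    return total


# Three years, 80% retention, 10% discount rate.
print(round(customer_lifetime_value(100, 0.8, 0.10, 3), 2))  # prints 225.62
```

Treating the non-monetary part as a separate additive term keeps it visible, so it can be reported alongside the purely monetary figure rather than hidden inside it.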

Curtis: Let’s talk about that a little bit, because you read a lot of stuff about blockchain. There are some people who are really bullish on it and some people who are not so much. What’s your take on it, since you’re so involved in working with it?

Luciano: One of the courses I have taught at the University of Utah in the econ department is the history of the U.S. economy. If I summed up that story, it’s about technology and the social institutions of the people involved, and how those two things play off each other. Taking 400 years of history into context, nothing has emerged on the scene that’s as powerful in its potential as blockchain, because it fundamentally changes how data is stored. When we started the Utah CRG, which is the program at the U, and then even Emperitas, we started by getting flat files from everybody. We’d get CSVs. That started to change, and we started having access directly to databases or through APIs. We’re clearly in a world now where we’re about to move even beyond that. There are centralized systems where all of this information is getting aggregated, and blockchain presents a huge advantage over the current database approach.

Curtis: Got it. What are some of those advantages that you see? Let’s dig into this a little bit for those people who aren’t as versed in blockchain.

Luciano: Let’s define a traditional database first. This comes from Kris Bennett, the blockchain beard guy, who is someone you should definitely follow if blockchain interests you; he can be found on LinkedIn and Twitter. He looks like a ZZ Top band member talking about blockchain, and he’s got that rough mountain-man voice, but he is a blockchain pioneer. He’s been in this space for 10 years, which is as long as anyone can really have been in this space. He says all the time that there are four things a traditional database can do: it can create data, it can read data, it can update data, and it can delete data. The last two are not allowed in a blockchain, at least in the current version 1.0 or 2.0 systems. You can create data and read data, but if you want to update or delete something, you append a new record rather than going back and wiping out the old one. What this does is give data scientists the ability to view the history of a system, not just a single snapshot. It’s the difference between trying to watch a movie like The Avengers as single frames every 15 minutes versus the continuous flow of the actual film at 30 or 60 frames per second. That’s the difference between blockchain data and traditional databases.
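The create-and-read-only model described above can be sketched as a tiny in-memory ledger. This is a hypothetical toy, not any real blockchain’s storage layer: it simply has no update or delete methods, so a new value for a key is appended as a new record and every earlier value stays readable.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Record:
    key: str
    value: Any
    version: int


@dataclass
class AppendOnlyLedger:
    """Hypothetical toy ledger: only create and read, never update or delete."""
    _records: list = field(default_factory=list)

    def create(self, key: str, value: Any) -> Record:
        # A "change" is just another record; the version counts earlier records.
        version = sum(1 for r in self._records if r.key == key)
        record = Record(key, value, version)
        self._records.append(record)
        return record

    def read(self, key: str) -> Any:
        # The current value is the most recent record for the key.
        for r in reversed(self._records):
            if r.key == key:
                return r.value
        raise KeyError(key)

    def history(self, key: str) -> list:
        # Every earlier value is still here: the whole film, not one frame.
        return [r.value for r in self._records if r.key == key]


ledger = AppendOnlyLedger()
ledger.create("balance:alice", 10)
ledger.create("balance:alice", 7)
print(ledger.read("balance:alice"), ledger.history("balance:alice"))  # prints 7 [10, 7]
```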

Curtis: Where’s most of the adoption happening? Is it mostly in banking or who’s taking it up?

Luciano: Well, cryptos have been the pioneer. The very first real-world example, what they call blockchain 1.0, is Bitcoin. It’s a single asset. Either everybody has access or nobody has access; there’s no other permissioning beyond that. It’s a single ledger, and it has some anonymity. That’s a false anonymity, because you can’t really stay de-identified in the real world anymore, but it seeks to have some anonymity. That’s different from blockchain 2.0, which is more like Ethereum, which has smart contracts, and Hyperledger, which allows for permissioning. These are now about any asset, not just money. The first applications have been in money, and that’s where you’ve seen the really rapid rise of blockchains. Over the last five years of blockchain’s roughly 10-year history, IBM has probably done more than anybody. There are 200-plus sponsors of the Hyperledger project, but IBM has really been the one pushing it, and that’s the enterprise version of blockchain technology. As far as who’s adopted it, the answer is really nobody yet. It’s so early in its own product life cycle that people are just figuring out how to get a handle on it.
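To make the difference in permissioning concrete, here is a toy ledger that checks who may write and who may read before accepting an operation, in contrast to a public ledger where everybody has access. It is a hypothetical illustration only; Hyperledger Fabric, for instance, handles permissioning through its own membership services and channels, which this sketch does not model.

```python
class PermissionedLedger:
    """Hypothetical append-only ledger with explicit write and read permissions."""

    def __init__(self, writers, readers=()):
        self._writers = set(writers)
        # Anyone allowed to write is also allowed to read.
        self._readers = set(readers) | self._writers
        self._entries = []

    def append(self, member, entry):
        if member not in self._writers:
            raise PermissionError(f"{member} may not write to this ledger")
        self._entries.append((member, entry))

    def read(self, member):
        if member not in self._readers:
            raise PermissionError(f"{member} may not read this ledger")
        # Return a copy of the list so callers cannot add or remove entries directly.
        return list(self._entries)


ledger = PermissionedLedger(writers={"bank_a"}, readers={"auditor"})
ledger.append("bank_a", {"asset": "invoice-17", "status": "issued"})
print(ledger.read("auditor"))
```

The asset here is an invoice rather than money, echoing the point that 2.0 systems are about any asset.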

Curtis: Are you using Hyperledger with any of your current clients that you can discuss, to give us some real-world examples?

Luciano: Yeah. Before we go into that, I think it would be good to lay out the four advantages that blockchain represents for data scientists, or for organizations who are thinking about embracing data science. In the terminology of economics, blockchain would be a complementary good. Coke and Pepsi are substitutes: you buy one or the other. But tea and sugar are complements: you buy both together because you want to combine them. Blockchain is one of those complementary goods to data science. Compared with traditional databasing, where you had those four abilities and now you only have two, there are four places where blockchain has a real advantage. The first is that it is continuous, so you have a complete historical record. People usually call it “the collective memory of the system,” because every single thing that’s happened is there.

Luciano: You can go back and trace every single Bitcoin that currently exists all the way back to when it was mined: who’s held it, when they held it, how long they held it, and who they sent it to. It’s a complete picture of a system that we just don’t get with traditional databases. So that’s the second piece, right? It’s continuous, and it’s complete. That’s because the system is immutable; you can’t delete. And so the whole history is there, every single piece. Because of things like cryptography and consensus, which are the code mechanisms by which the data comes to exist, that stuff is set in stone. It’s programmed; there’s no ambiguity. If there’s a data point in a field, you can have confidence that it’s correct. And this gets to the third advantage, which is that it’s trustworthy. You can know the full context, and you can trust that it’s a correct value.
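Tracing a coin back to when it was mined amounts to filtering and ordering the transfer history. The sketch below is a simplification under stated assumptions: real Bitcoin tracks unspent transaction outputs rather than individually numbered coins, so the `Transfer` record, the `coin_id` field, and the convention that a `None` sender marks the mining event are all hypothetical.

```python
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    coin_id: str             # hypothetical per-coin identifier
    sender: Optional[str]    # None marks the (hypothetical) mining event
    receiver: str
    timestamp: int


def trace_coin(ledger, coin_id):
    """Return (holder, held_from, held_until) for a coin, from mining to today.

    held_until is None for the current holder.
    """
    events = sorted((t for t in ledger if t.coin_id == coin_id),
                    key=lambda t: t.timestamp)
    if not events or events[0].sender is not None:
        raise ValueError(f"no mining event found for {coin_id}")
    history = []
    # Pair each transfer with the next one to see how long the receiver held the coin.
    for current, nxt in zip(events, events[1:] + [None]):
        held_until = nxt.timestamp if nxt is not None else None
        history.append((current.receiver, current.timestamp, held_until))
    return history


ledger = [
    Transfer("c1", None, "miner", 0),
    Transfer("c1", "miner", "alice", 5),
    Transfer("c2", None, "bob", 1),
    Transfer("c1", "alice", "carol", 9),
]
print(trace_coin(ledger, "c1"))
```

Because nothing in the ledger is ever deleted, the trace can always be rebuilt from the full history; with an update-in-place database, only the final holder might survive.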

No one individual could come into that system and update it in their favor, right? Change a score, change a value: that can’t happen in the system. So you as a data scientist can have huge trust in the data, and that is probably one of the main things lacking from data science right now. The data’s dirty and incomplete, so are the conclusions correct? Maybe. That changes in a blockchain world. Now you know, to a much higher degree, that what you see is what is actually happening. And then the fourth advantage is probably the most important for data scientists who are trying to get a project going. Say you’re early in the stages of a project and you’re dealing with an organization that doesn’t have a data culture: nobody owns the data, and the data’s all in silos.
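One ingredient behind the “nobody can quietly change a value” claim is hash chaining: each block stores the hash of the block before it, so editing any earlier entry breaks the links that follow. The minimal sketch below uses SHA-256 from Python’s standard library. It only detects tampering; it does not model consensus, which is what stops a single party from simply recomputing the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, data):
    # Hash the previous block's hash together with this block's data.
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items):
    chain, prev = [], GENESIS
    for data in items:
        h = block_hash(prev, data)
        chain.append({"prev": prev, "data": data, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    # Recompute every hash; any edited block or broken link fails the check.
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([{"score": 1}, {"score": 2}, {"score": 3}])
print(verify_chain(chain))   # prints True
chain[1]["data"]["score"] = 99
print(verify_chain(chain))   # prints False
```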

There are no data dictionaries. One of the immediate advantages of a ledger like Bitcoin or Ethereum is that it’s accessible. All you have to do is spin up a node, and you immediately have access to the data. And if you don’t even want to go through that effort, Google’s already done it. If you go to BigQuery, they have seven or eight crypto ledgers that have been transformed from their ledger format and ETLed into one or more relational databases, and you can just download flat files from them. So those are the four things: it’s continuous, it’s complete, it’s trustworthy, and it’s accessible. That will save most data scientists the 80 to 90% of their time that they currently spend not on actual data science but on prep: understanding what the data is and talking to people. All that stuff goes away, because consensus and cryptography have gotten the data into a system that you can fully identify without ever having to ask anybody.
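The “ETLed into a relational database” step is essentially flattening nested block data into one row per transaction. This is a minimal sketch, assuming a hypothetical block layout (`number`, `time`, and a list of `transactions` with `id`, `from`, `to`, `amount`); the public BigQuery crypto datasets use their own schemas, which this does not reproduce.

```python
import csv
import io

# Hypothetical column names; real public datasets define their own schemas.
FIELDS = ["block_number", "block_time", "tx_id", "sender", "receiver", "amount"]


def blocks_to_csv(blocks):
    """Flatten nested blocks into a flat CSV with one row per transaction."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for block in blocks:
        for tx in block["transactions"]:
            # Repeat the block-level fields on every transaction row.
            writer.writerow({
                "block_number": block["number"],
                "block_time": block["time"],
                "tx_id": tx["id"],
                "sender": tx["from"],
                "receiver": tx["to"],
                "amount": tx["amount"],
            })
    return out.getvalue()


blocks = [
    {"number": 1, "time": "2019-01-01T00:00:00", "transactions": [
        {"id": "t1", "from": "a", "to": "b", "amount": 5},
        {"id": "t2", "from": "b", "to": "c", "amount": 2}]},
    {"number": 2, "time": "2019-01-01T00:10:00", "transactions": [
        {"id": "t3", "from": "c", "to": "a", "amount": 1}]},
]
print(blocks_to_csv(blocks))
```

Repeating block-level fields on each row trades some redundancy for a table that any spreadsheet or SQL tool can load directly.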

Curtis: Say you apply it to a CRM system where you have humans entering data. Is there still that element where someone was entering data in a certain way, and other people interpreted what it was supposed to mean in a different way, so the data is the same but it actually means something different because of the person entering it? How do you overcome those problems?

Luciano: You’re talking about a problem that only exists because of the technological limitations of the current customer journey for businesses. I keep saying customer journey, but on the back end of this process, if you’re on a marketing team or a sales qualification team, you’re probably going to call it demand generation or the demand waterfall, where leads move through different stages, right? Okay, someone has said that they are interested, so they become a marketing-qualified lead. Sure, but do they actually have a real need? That’s where maybe sales will qualify them, and then you hand them off to sales to get the sale. Then you hand them off to finance to get payment. Then they either get access to your program or platform, or somebody does a service for them, and on they go down that journey.

Right now that is easily a hundred different systems. You’re talking about the CRM, which contains information about maybe three or four of these stages, but there are 98 other databases that touch on different parts of this process. You might have Google AdWords data sitting in the AdWords platform, Facebook ads data sitting in the Facebook ads platform, and your CRM, and it’s not using click IDs to track people all the way from the ad they saw to the point where they’ve been a client of yours for 10 or 20 years and have recommended 10 other people. There’s no transparency into that right now in 99% of organizations, and the only way to actually get it would be to put the data from all those hundred different silos into one place. This is, again, one of the promises of the blockchain. If you assigned a lead a token at the beginning of that funnel, you could watch that token move through the entire system, across all the departments, and know the entirety of its journey by tracking one token, versus getting 50 or 60 or 70 different systems to agree that we’re all talking about Curtis, that specific customer.
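The single-token idea can be sketched as an event log: each department records `(token, department, stage, timestamp)` events, and the journey is reconstructed by sorting on time and grouping on the token. The function names, the stage labels, and the use of a random UUID as the token are illustrative assumptions, and this runs in memory rather than on a blockchain.

```python
import uuid
from collections import defaultdict


def issue_lead_token():
    """Hypothetical helper: give a new lead a unique token at the top of the funnel."""
    return uuid.uuid4().hex


def build_journeys(events):
    """Rebuild each token's path from (token, department, stage, timestamp) events.

    Events can arrive in any order from any department; sorting by timestamp
    restores the sequence, and grouping by token keeps each lead's journey together.
    """
    journeys = defaultdict(list)
    for token, department, stage, timestamp in sorted(events, key=lambda e: e[3]):
        journeys[token].append((department, stage))
    return dict(journeys)


lead = issue_lead_token()
events = [
    (lead, "finance", "paid", 30),
    (lead, "marketing", "marketing-qualified lead", 10),
    (lead, "sales", "sales-qualified lead", 20),
]
print(build_journeys(events)[lead])
```

The point of the design is that every department writes against the same token, so no after-the-fact matching across systems is needed to know the events belong to one customer.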

Curtis: So in order to do that, it would have to be an engineered system where everything is happening with blockchain as the back end, is that correct?

Luciano: Yeah, and that’s one of the limits of blockchain right now: you have interoperability problems. You already have those between Facebook ads, Google ads, and your CRM, so it’s not as though interoperability is a problem unique to blockchain. But if you have someone who’s utilizing Ethereum and maybe a smart contract for the business, but they’re doing payments in Bitcoin, how do you get that information together across the ledgers when you have wallet IDs but not identities? The community of early adopters of cryptocurrency doesn’t like the idea of being identified, so they often try to hide their identity. How do you put that together? There are a bunch of projects working on this. One is Cosmos, which is probably the most important. They’re going to take in the ledger data from all the different cryptos, blend it, and provide insights.

That will be a pretty big accomplishment if they actually pull it off, but more likely what will happen is that a new standard will emerge. Think about cell phone chargers. Maybe you’re old enough, Curtis, to remember cell phone chargers from 15 years ago, when every single phone had its own charger. If you want to see a remnant of this, you can go into truck stops, and they still sell conversion kits with one power outlet and 27 different adapters. Everybody eventually went to a standard, and that’s going to happen in blockchain as it matures. There will have to be some standard for tracking individuals across different interactions, but on one ledger.

Curtis: What kind of time horizon do you think we’re looking at there?

Luciano: Yeah, five years.

Curtis: Five years, you think, for most people to be using it?

Luciano: Big organizations will be utilizing this within five years.

Curtis: Cool.

Luciano: I actually think it will be easier for small organizations to leverage this technology, and it will give them a huge competitive advantage. Again, what the sum total of those four things that are better than a traditional database gives you is accurate information that lets you make optimized decisions. That means you can outcompete people who are guessing, capture more revenue, save on costs, and have more profit, and that gives you power in the market. This is the process of disruption everybody refers to. Informationally, blockchain is going to be the most disruptive force we’ve seen in the last few decades, far more than any of the other technologies we’ve seen so far, because the information will be correct.

Curtis: Got it. That’s interesting. Do you see any potential downsides of blockchain?

Luciano: Right now it’s all about computational power. The resources necessary to run the Bitcoin network are massive, and the same goes for Ethereum. Hyperledger fixes some of this, but we’re not going to get away from the fact that it’s computationally intensive. We will get more efficiency, though. In my econ classes where I teach the history of economics, I use this example all the time. Look at computers, which have really only been commercially available for about 60 years, and compare them with the car, which has been commercially available for about a hundred, and ask, “What would a car look like today if, over its hundred years, it had progressed at the same rate as computers?” If you did that, the car would have 660 million horsepower. It would go zero to 60 in 0.0034 seconds.

It would get 3.6 million miles per gallon, and it would cost $4,300. That’s how much we’ve progressed with computers. So we’re already on this trajectory of rapid improvement, and these computational constraints are not long term. They will require some better energy sources, because these systems are just too energy intensive. Maybe some of that gets fixed with things like quantum computing or some of these other processes? I don’t think so. I think our power consumption needs are only going up. But hard drive space, processing power, connectivity, all of those things have gone through what economics calls the race to the bottom. Having a terabyte of storage went from being enormously expensive 20 or 30 years ago to costing pennies. It’s the same with connectivity, and it’s the same with processing power. So this is a problem that time is going to solve, with human ingenuity as the help.

Curtis: Do you see there ever being a ceiling or a limit to that with the current technology? Do we have to fundamentally change what we’re doing here?

Luciano: Yes. Oh, yeah. If you had to sum up the really key things economics has discovered, beyond the downward-sloping demand curve, another great conclusion is diminishing returns. You can’t just keep throwing things at a problem and get infinite returns. Where we see this a lot is with marketing teams, who make this assumption. They say, “Hey, we’re trying to attract customers, and we’ve got this great data from all these platforms that we’re merging, so we can totally tell what’s going on. We put $50,000 into our budget and got X out. Then we put $500,000 into our budget, thinking we were going to get a specific increase, and we didn’t. We got way less. Why?” Well, diminishing returns. It’s a fact of the world, just like the product life cycle is a fact of the world given our current technology and our current social institutions. It’s just something that everything and everybody is affected by. So yes, there’s going to be a limit; it’s not going to be an infinite solution that gives us economic growth forever.
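Diminishing returns can be illustrated with any concave response curve. The square-root curve and `scale` parameter below are purely hypothetical, not a fitted model of ad spend, but they show why a 10x budget increase can yield far less than 10x the result.

```python
import math


def conversions(budget, scale=2.0):
    """Hypothetical concave response: conversions grow with the square root of spend."""
    return scale * math.sqrt(budget)


def marginal_return(budget, step=1_000, scale=2.0):
    """Extra conversions bought by the next `step` dollars at a given budget level."""
    return conversions(budget + step, scale) - conversions(budget, scale)


low, high = conversions(50_000), conversions(500_000)
print(f"10x budget -> {high / low:.2f}x conversions")  # prints: 10x budget -> 3.16x conversions
print(marginal_return(50_000) > marginal_return(500_000))  # prints True
```

Under this curve the ratio is always the square root of the budget ratio, so each extra dollar buys less than the one before it.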

Curtis: What interesting things are you seeing or working on at your company that you’d like to share before we wrap up here?

Luciano: At the industry level, one of the interesting things I’ve noticed over the last six to eight months has been a real commitment from organizations to start funding efforts to solve data science problems. I’d say most of those organizations can be grouped into two types; it doesn’t matter if they’re B2B or B2C, and it doesn’t matter what the vertical is. There are those who have tried some sort of data science approach, failed miserably, and are very jaded, and they’re a small group. They’re probably one in five or one in 10 companies, because most companies have not actually engaged in data science at this point. There are far more organizations who’ve been sitting there watching everybody else talk about data, and they feel like they just can’t wait any longer. It’s really hit a critical moment.

It’s now starting to affect them. Everyone in the organization is asking for data insights, they’re getting outcompeted, and they know their competitors are using data. That group is massive, and it’s growing, and in the last eight months, they have started to fund initiatives. The key for data scientists and leaders who are gonna oversee data science is that you’ve got to pick a problem narrow enough to demonstrate one quick win, and I mean in 90 days. If in 90 days you can’t come back to the organization (or, if you’re an outside services firm, to the client) and show, “we have made real progress on these metrics in your understanding so that you can make these decisions,” they’re not going to continue to do it. Most are going to drop off and wait again. So it’s a really critical moment. On one side of the market, you have these businesses with deep pockets who want to fund this because they want it to work.

They believe in it, but they don’t know what the ROI is. So if the data scientists themselves aren’t helping to figure that out, the chance that the projects go through is pretty low, because if they can’t see the outcome but they see a high price tag, it’s unlikely they’ll do it. On the other side, you have the supply of data scientists. There have never been more data scientists in the market, and there have never been more tools available to them. It’s actually getting commodified, which is a good thing. The level of data skills has to come up across the organization, way outside the data team. I always associate this with typing. If you don’t know the show Mad Men on AMC, go watch it, and you’ll see that every person who has an office has someone out front who’s typing, because typing wasn’t a skill you had to have; it was something you expected to hand off to somebody else in your organization. Data is the same right now, and that’s got to change.

You will not be successful as data scientists, and you will not be successful as leaders trying to do data science initiatives, if the general level of data understanding doesn’t come up in these organizations. Right now it’s horrible. Statistics is probably the single worst-taught subject in higher ed, because for a long time it’s been about mathematical formulas, integrals, and area under the curve, not stuff that most people can relate to. But most people can understand a pattern when you explain it by shape, center, and spread and talk about what it’s connected to and how it addresses some of their unknowns. They can actually digest that. So it’s just a matter of retraining around the application of data science, and of getting data scientists, who are highly technical, and business leaders, who have the domain expertise, to work together effectively.
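Describing data by center, spread, and shape maps directly onto a few summary statistics. Here is a minimal sketch using Python’s standard `statistics` module; the shape label uses Pearson’s median skewness coefficient, and the ±0.5 cutoffs are an arbitrary assumption chosen for illustration.

```python
import statistics


def describe(values):
    """Summarize a distribution by center, spread, and a rough shape label.

    Needs at least two values, because statistics.stdev requires them.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)
    # Pearson's median skewness: 3 * (mean - median) / stdev.
    # Positive means a long right tail, negative a long left tail.
    skew = 3 * (mean - median) / stdev if stdev else 0.0
    if skew > 0.5:          # hypothetical cutoff
        shape = "right-skewed"
    elif skew < -0.5:       # hypothetical cutoff
        shape = "left-skewed"
    else:
        shape = "roughly symmetric"
    return {"center_mean": mean, "center_median": median,
            "spread_stdev": stdev, "shape": shape}


print(describe([1, 1, 1, 2, 20])["shape"])  # prints right-skewed
```

In plain terms: a typical value is 1, values usually sit within about 8 of the average, and one large outlier pulls the average well above the typical case.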

Ginette: A huge thank you to Luciano Pesci for being on the show, and if you’d like to read the transcript or see any of our attributions, go to datacrunchcorp.com/podcast.

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/