
Why DataOps Matters


Ginette: If you’re building a data product, these questions are likely occupying your mind: how do you get your customers to trust your data? How do you know your product’s something your customers will want? How do you produce those products more quickly without compromising accuracy? Today we talk with someone who has a lot of experience answering these questions.

Ginette: I’m Ginette.

Curtis: And I’m Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how data and prediction shape our world.

Ginette: A Vault Analytics production.

Curtis: If you’re a company aiming to research emerging technologies, like AI, ML, IoT, or edge computing, and you find your company lacking expertise, we know where you can get the expertise to pad your research team: this team is a group of ex-Fortune 500, B2B tech product managers with in-depth market analysis, product planning, and development expertise in bringing successful products, software, and services to the market, and they have significant in-depth technology skills on their team. They drive emerging tech research, product strategy, and tech marketing that resonates with customers, and they’re good at it. If a service like this would be helpful to you for a proposal you’re writing or for a product that you’re creating, reach out to us at [email protected], and we’ll be in touch.

Ginette: Now let’s jump into today’s episode. We’re talking with someone who’s worked with data teams for many years and has learned a thing or two. This is Chris Bergh.

Chris: I’m Chris Bergh. I’m head chef of a company called Data Kitchen in Cambridge, Massachusetts, and we’re a company that helps teams of people who do AI or machine learning or data engineering or data visualization deliver insight faster with higher quality, and so how did I get to this point to found a company to focus on what we call dataops? Well, I guess I’m a working class kid from Wisconsin. In the late 80s, actually, I went to Columbia to study AI back when AI was just a corner of the world that no one knew about, and you didn’t walk through an airport and run into it, and then I worked on some AI systems at NASA and MIT to automate air traffic control, and then I sort of got into software development and managing software teams.

Curtis: To fill out this picture a little more, Chris has two patents under his belt and has had two companies acquired, one by Microsoft, while he was building the company in the C-suite. So he’s no stranger to the difficult experiences that come with companies’ growing pains.  

Chris: About 10 years ago I got into data and analytics, and the company I worked for was about a 60-person company. We did everything that you could do in analytics, and we did data visualization. We had data scientists. We had data engineers. We even decided to build our own complete software platform that did everything in analytics, and I was the chief operating officer, and I worked with a guy who was from Harvard Medical School. It was a healthcare analytics company, and he really knew health care and really could talk to customers and figure out what they wanted, but then he’d come back to me and say, “Chris, here, I’ve got this idea. Customer has this pain. Could you get some people together and figure out how to solve it?” So I would go off and pull the data scientist and maybe a data engineer and maybe someone who knew Tableau and maybe a software engineer in a room, and we’d talk it through.

And, you know, we’d figure out we’d do this and do that and say, “Hey, it’ll take two weeks,” and I was very proud that we could deliver in two weeks, and I’d walk into my boss’s office, and he’d look at me, and I’d say it’d take two weeks, and then he’d sort of pull his glasses down and say, “Chris, I thought that should take two hours, not two weeks,” and I’d be a little, you know, upset, and my tail would be between my legs, and I’d walk out of his office, and I’d go back in and get a phone call from one of our customers, and I’d hear from them, and they’d say, “Chris, the data’s wrong,” or “This week’s data is wrong. If you don’t fix it, we’re going to throw you out.” And I’m having a really good day, having my boss look at me like I was going too slow and my customer saying the data’s wrong.

I’d walk out of my office, and we’d hired a lot of bright people, and they’d say, “I have this new open source tool. I want to try this out. I want to innovate,” and so my life for many years was, as a person who made the analytics train run on time: How do you deliver insight fast to your business customer? How do you deliver it with really high quality? And how do you let people innovate and use the tools that they love? And I did that for everything in data and analytics, you know, putting data together, deploying predictive models into production, developing predictive models, doing visualization, the whole end-to-end scope of data and analytics, and so I lived this life of trying to help a whole team of people deliver data and analytics, and so we sold that company, and then my co-founders and I, we started Data Kitchen with the idea, after talking to about 200 people, that this role of a chief data officer or chief analytics officer or people who run data science teams or do analytics really have the same problem, the trains-run-on-time problem. How do you get all these people who have different skills of putting data together or doing visualization or modeling to deliver insight to your customer and not have it just take forever, and also how do you let people create and innovate? And so we’ve built a software product around that and have taken ideas from software development, agile software, what’s called DevOps, and lean manufacturing.

Ginette: What they’ve created is an approach for dataops adapted from the software development process, which in developmental maturity is far beyond where the relatively young data field is. This approach shifts the data scientist from a somewhat solo hero to more of a team player.

Chris: We’ve been talking for years at conferences about dataops, and I think the biggest challenge isn’t technical; it’s a mindset change, and why do I say that? Well, I think first of all analytics, data and analytics, is a team sport. They have all those roles involved, so people have to communicate and talk, but also they have to sort of stop being heroes, and that means they have to think about things in a more process-centric way, and I’m sort of done in data and analytics with getting a call on Saturday morning, hearing that the data is wrong, and then having to sort of slink off from the soccer field and fix a bug that happened in production.

And we all take a lot of risks sometimes in trying to do things, and there’s got to be a better way than putting some data, putting a model in production and finding out that it broke from your customer, because a lot of times, if you want to use analytics to influence someone, they instantly want to doubt the data, and so if you can prove that the data is right, prove that the model’s giving a result that’s right, you can actually influence them better, and so teams have to have a mindset change that what they’re doing is teamwork, that moving fast is really important, that testing their results really is important, and this mindset change to a dataops mindset I think of as one of the fundamental pieces that people have to do, and it’s not there yet in the marketplace, and I have gotten pushback from data scientists saying, “You know, I’m a scientist. I create the model, and I sort of chuck it over the fence to someone else to put it in production,” and you know, likewise there are those different roles, the data engineer versus the data scientist, and sometimes they have their own turf wars, but they’re really part of a continuous value chain, and that’s really what dataops is a part of. People have to work together.

Curtis: Besides developing a teamwork mindset, speed of getting to insight for customers is really important. So how does a team decrease time to insight?

Chris: Yeah, the major pain is the speed at which you can get an insight to your customer that they can understand, and so a lot of times, it’s really hard to decide a priori what’s going to affect someone to make a decision . . . there are different visualizations that they have, there are different models that can influence them, there are different data sets, and so . . . if you can get something in their hands sooner and then iterate and improve upon it, you have a better chance of actually making change in the world, and so a lot of times people, certainly technical people, really like their keyboards and really like to kind of do their work, and sometimes it’s fun to kind of go off for a month or two and do your work and then talk to your customer at the end of the time, and it’s nice. I mean, I’m a nerd. I enjoy that, but the reality is you’ve got to force feedback into your process as fast as possible: outside feedback, your customers, the business users, the consumers, and that way you actually produce better results by forcing that feedback into the process, and it’s scary, and you know . . . it’s nicer sometimes to sit in your office and just code for a few hours, but getting that feedback actually makes better results.

Ginette: Forcing feedback into the system at the right intervals for you and your client can prevent your team from missing the mark and can save you headaches.

Chris: At the end of the day, a team of people want to create, right? They don’t want to get beat up by their business customer. Everyone who’s in data and analytics fundamentally believes there’s value in data, but the people that they’re trying to affect in the world . . . you know, there’s an interesting set of abstraction skills to take data and turn it into insight, and there are a lot of people who want that insight, but also they don’t want it, you know? There’s a confirmation bias that happens in analytics sometimes, you know? I’m using analytics to prove that what I’ve already believed is true, or there are people who don’t have the time of day to listen to analytics, and so we’ve got to, as people who do data analytics for a business, try to find a way to take the power of that data and make change in the world, and spending several months or several years working on something, that’s not it. It’s the same idea that happened in software development. Instead of spending months or years producing software and then shipping and finding out nobody wanted it, software’s learned that you’ve got to ship in very, very quick cycles, and that iterative cycle of getting it into the hands of someone is actually a way to build much higher quality software, much more software that people like, as opposed to these sort of failed multi-month, multi-year projects, and I think that’s kind of the idea that we’re trying to bring with dataops to data and analytics.

Curtis: The lessons learned from a parallel field like software shed some light on how to improve the data process, which like software development is a much more complex process than people with little experience in the industry might guess.

Chris: Because of running an organization that had all these roles and trying to help people learn, I sort of came in, about a dozen years ago, with my own bias that, you know, data’s easy. We just put it together, and you don’t need to test it. You don’t need to worry about it, and you know, I had data engineers who would . . . we’d have 24-hour data builds, and they would make a change and then wait 24 hours and find out it was wrong, and then they’d go back and make another change and wait 24 hours, and I learned this idea that you really should test what you do, that if you think about your job as code and I’m writing some code, and that code may actually be SQL, it may be Python, it may be R, it may be SAS, it may actually even be a Tableau workbook . . . if you’re in the business of code, you’re in the business of complexity, and how do you not have a big team of people create a hairball of complexity? And so just like when I managed software teams, I think of managing a data team or a data science team in the same way: they are in the complexity business, and how do you tame that complexity? You need to first of all think about what you do as code, and because it’s code, you should be able to check it into a source code system and branch and merge it, and you should also think of it as something that you should test, and one of the things that I’ve learned over the years: if not 20% of your work is in automated tests . . . to be able to test the data going in, test the data going out, test that it’s still working the same way. Testing is very important, so code, being able to automate deployments, being able to put your work in source code, being able to branch and merge it.
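Editor’s note: To make the three kinds of tests Chris lists a little more concrete (test the data going in, test the data going out, test that it’s still working the same way), here is a minimal Python sketch. It is not Data Kitchen’s product or method. The row fields ("region", "amount"), the function names, and the 10% tolerance are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_input(rows, required_fields):
    """Data going in: every row has a non-empty value for each required field."""
    bad = [i for i, row in enumerate(rows)
           if any(row.get(f) in (None, "") for f in required_fields)]
    return CheckResult("input", not bad, f"{len(bad)} incomplete rows")


def total_by_region(rows):
    """The (hypothetical) pipeline step under test: sum amounts per region."""
    totals = {}
    for row in rows:
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


def check_output(totals):
    """Data going out: no aggregated total is negative."""
    bad = [k for k, v in totals.items() if v < 0]
    return CheckResult("output", not bad, f"{len(bad)} negative totals")


def check_drift(totals, baseline, tolerance=0.10):
    """Still working the same way: each total stays within `tolerance`
    (an assumed threshold) of the last trusted run's value."""
    drifted = [k for k, old in baseline.items()
               if abs(totals.get(k, 0.0) - old) > tolerance * abs(old)]
    return CheckResult("drift", not drifted,
                       f"{len(drifted)} totals moved more than {tolerance:.0%}")


def run_checks(rows, baseline):
    """Run all three checks around the pipeline step and return the results."""
    results = [check_input(rows, ["region", "amount"])]
    totals = total_by_region(rows)
    results.append(check_output(totals))
    results.append(check_drift(totals, baseline))
    return results
```

Checks like these can live in source control next to the SQL or Python they guard and run on every build, so a bad change shows up right away instead of after a 24-hour wait or a customer phone call.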

Ginette: Verifying that your data is accurate is extremely important, and testing it frequently is key to that process. So how do you get your customers on board with an approach where you’re sending them a product to review at various intervals?

Chris: With the business customer that you’re working with, if you can start to get into “I’m going to deliver a little every day or a little every week,” you get into a pattern of trust where . . . you don’t have to sit down and write a requirements document that takes two weeks, and you hand it back and forth, and you end up with this interim representation of what you’re going to do that in and of itself takes weeks to do, and so I’ve found that your business customer starts to believe in agile, saying, “Okay, you’re going to give it to me, it’s going to be 70% right. I’m going to touch it, interact with it, and give you feedback, and then the next time you’re going to get it 80% right and then 90% right, and you know what, that’s all I need. I don’t need any 10% more,” and so the core idea here is that your customer doesn’t know what they want, and it’s because it’s really hard for people, not because they’re not smart, not because they’re any less of a human being, but just it’s hard, and it’s also really hard to communicate that in some other format, so you put it in their hands quickly and then get their feedback, and the things that you can play with are, sometimes you can give them a variation of what’s in production and say, “Hey, the data is kind of low quality, but here it is. The model’s not completely perfect. What do you think?” and then they look at it.

And I’ve had literally thousands of times . . . maybe not thousands, maybe dozens of times literally . . . where business customers have looked at what I’ve produced or my teams have produced, and in three seconds say that’s wrong. And why would you spend months doing something where someone can in three seconds say it’s wrong? And so it is partly having both sides of the equation trust: your data and analytics team works in an agile way and delivers frequently, and then your business customer starts believing that’s the way to do it, and so that’s again where dataops or agile is really a mindset change.

Curtis: To be a high-achieving data team, you need to be able to work at the speed of business, create high quality, and be innovative. Seem like a tall order?

Chris: People aren’t wrong to ask for things fast . . . a person in a business role has a lot of complexities to their job, and they have to make decisions at a very, very fast rate, and they need a data team that can respond and work at the speed of business. Otherwise, they’re just going to do what they normally do, which is sort of make decisions based on gut instinct or wrong data, or just, you know, throw some stuff against the wall, and to be really a data-driven organization, you need to have the speed of business be linked to the speed of analytic insight delivery, and that’s what we’re trying to do with dataops. And so nobody’s wrong in that equation of “go fast, deliver high quality, and innovate.” It’s not a choice between one of the three. You can’t say, “Okay, we can go fast, but we’re going to have a lot of errors. We can innovate a lot, but it’s going to take a while.” People want to innovate. People who work in data teams . . . there’s a lot of great open source. People want to learn new things. They’re really right. People want to try some stuff out, and data science itself is really fundamentally an innovation business. It’s a science. It’s an experiment. Second, data quality and getting the right data to your customer or getting it so that they can trust it is really important. If you start delivering things to a business customer or a customer comes to your website and gets something strange, they lose trust in your business and ultimately will lose trust in and dismiss the team that is responsible for delivering that insight, and so all those things play together.

Ginette: A huge thanks to Chris Bergh for his insights, and he says Data Kitchen is hiring, so go check them out at datakitchen.io.

 

Credits

Photo: Photo by Helloquence on Unsplash