
Structuring Your Data Science Dream Team

The way you organize your data science team will greatly affect your business’s outcome. This episode discusses different structures for a data science team, as well as top down versus bottom up approaches, how to get data science solutions into production organically, and how to be part of the business while remaining in contact with other data scientists on the team.

Mark Lowe: Having lived through small scale, two people working, to large scale, thousands of people in your organization, the way that you organize the data science team has dramatic effect on its productivity.

Ginette Methot: I’m Ginette.

Curtis Seare: And I’m Curtis.

Ginette: And you are listening to Data Crunch, a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.

Building effective data science processes is tough. Mode, the data science platform, has compiled three tips to make it a bit easier: don’t over plan, there’s no one process that fits everyone, and waste time. That’s right. Waste time. Read more at mode.com/dsp. M O D E dot com slash D S P.

Today we’re going to talk about effective ways you can organize your data science team, and we’ll hear lots of great insights from our guest. Let’s get to it.

Mark: My name is Mark Lowe. I’m currently the senior principal data scientist here at Valassis.

Curtis Seare: Describe just a little bit about what Valassis does.

Mark: So we work with pretty much every major manufacturer retailer in the U.S. Our work kind of runs the gamut in terms of solving problems for them in terms of how do I influence customers. And so we manage a lot of print products that go reach every household, every week and of course a lot of digital products. So everything from display advertising, campaign, search campaign, social. Pretty much any distribution mechanism that can influence customers, we try to use those channels.

Curtis: And in working on these problems, we talked a little bit earlier about the approaches for data science. Some people try to bin it in a software development kind of a role, an agile role, and that usually doesn’t work for data science ’cause it’s more of an experimental type of a thing. Can you comment on the similarities and differences and how you should be approaching data science?

Mark: I think that’s a great question. Honestly, if you asked me 10 years ago if this was an interesting question, I would have found it very boring. But having lived through small scale, two people working, to large scale, thousands of people in your organization, the way that you organize the data science team has a dramatic effect on its productivity, and there’s no one size that fits all. Honestly, you kind of have to cater the organization of the data science team to where the company is. For example, of the two common models that are deployed, and we’ve lived in both of them, one is kind of thinking about data science as an internal consulting group. So I have a pool of data scientists. Stakeholders throughout the company come to me and say, “I have this problem. I think it needs data science,” and then the data science lead or team says, “Yes, we do need a data scientist working on that. Here’s a person with that specialty.”

So it’s kind of farming out individuals on the team to solve particular problems. It’s a fairly centralized organization, and, you know, there’s a lot of benefits to that. One, you’ve got a strong sense of community as a team. Oftentimes you’re very tightly organized together. You function as a data science unit. You can try to make sure that you’re putting the right skillset on the right problem. As you know, as you’ve talked about, there is no one definition of data science, there’s no one skillset. So oftentimes the data science team has a mixture of skills across the team; not necessarily every individual has that mixture embodied in them. So, you know, having that ability to judge the business problem and then assign a person from the team to it has some benefits to it.

I think as we got a lot bigger, there just became more friction with that model. And so we moved to a scrum model, which is a version of agile software development. We have a lot of engineers at our company. We do a lot of software production, and so our software teams were already organized that way, and we were kind of not fully enveloped in that. So we moved to that, and I think we did it in a little bit different way. Certainly talking with other people who’ve used scrum with data science, oftentimes they’ll still organize as a single data science team, and that team will do scrum. So they’ll have planning, they’ll do sprints. The way that we’ve employed it is we embed data scientists into individual scrum teams. So our scrum teams are kind of organized by, like, the product that’s being developed, for instance.

And so, you know, a scrum team may have 12 people on it; it could have eight engineers and four data scientists or something like that. And so our data scientists, we support about five different scrum teams. And, you know, we’ve seen some real benefits. The biggest one: as a data scientist, you want your work to impact the business. You want to allow for the business to make more money. You want to help your clients. And the only way to do that is to make sure that the data science work you do makes its way into production, that it is inside the software that ships. And so having data scientists scrumming with engineers, daily standups, planning, et cetera, we found that it dramatically speeds up the time in terms of getting a data science project into production, because everyone on the team is hearing about the work. They can catch potential design flaws or scale challenges very early on, and there’s just a lot more collaboration on the idea. So that’s worked really well for us as our size has grown.

Curtis: As opposed to just being centralized, where you kind of don’t have that day-to-day communication and you can’t pick up on those little nuances that are happening.

Mark: Yep, that’s right. To really oversimplify it, I kind of think that the work comes to the data science team usually in one of two ways. Either someone says the business needs to do X, and X requires data science. The consultant model works okay that way: someone’s coming to you asking for it. But the other way that I’ve seen work get done is data science sees an opportunity to improve the business in some way if we do X, and X requires data science. And in what I’ve seen, at least in terms of having the data science team involved in scrum, it lends itself to that second business value approach faster, because again, you are working very closely with the engineering teams on a particular product area, so ideas can find their way into a production environment oftentimes much faster and more organically than with that top-down approach of someone saying the business needs X.

Curtis: With your data scientists kind of spread out amongst these teams that are each doing their scrum processes, is it still centralized in a sense that you, I mean do they all get together at any point? Do they all have sort of a central data science manager? Or are they truly separated from one another in these different business product areas?

Mark: Yeah, I’ve experienced it in a few different ways. One is where each business unit starts to develop their own data science team, and sometimes they need a slightly specialized skillset, and so it makes sense to specialize around that way. And I’ve seen in environments like that, a lot of times there will be an engineering manager, and that engineering manager will have ownership over the data scientists assigned to that team. And there’s some benefits there. It’s a manager who is very close to what you’re doing on a daily basis, how productive you are, et cetera. The way that we’ve been operating is we still try to have a data science team. So we all meet as data scientists, as a team. We have a single data science manager who really owns making sure that we are obviously recruiting data scientists, that we are plugged into the community, and we do a lot of things to try to make sure that we know what the other people are doing.

So we have sandboxes where, you know, we meet every week and someone is kind of presenting either the problem that they’re working on or a solution that they’ve come up with, just as a way to, one, soundboard it with their colleagues. But, you know, also I think just to feel like there’s a sense of community. It’s been my experience, and I’m sure this is true for everybody, but I’ve noticed it, I feel like, more in data science: there’s just a real desire to feel like you’re a part of a data science group, to feel a sense of community around your craft and how you’re developing it. And so we’ve tried to make sure that we have strong connections across folks, even if they may be working in different subject matter areas.

Curtis: Interesting. What kinds of things are you looking for in people that you hire and is it easy or is it hard to find the right people for what you’re doing?

Mark: It’s always hard.

Curtis: It’s hard. Yeah. Okay.

Mark: And you know, it’s become harder, I think, in the last few years because the signal-to-noise ratio has become a lot higher, I think, or rather lower. You know, there are so many programs now, and it’s great. Almost every college has data science programs. You can certainly go online and find very high quality courses to take to augment whatever your degree was in. And so it’s very common that people across a whole spectrum of backgrounds can claim some experience and knowledge about data science problems or some data science subject matter. And so that kind of makes it more challenging to do easy screens of resumes or use, you know, the keywords that used to help you really be able to identify folks. So it’s not a bad thing. I mean, the more talent and folks that we have, obviously, the better for everyone.

But that’s been one challenge. And then the second challenge is that I think everyone needs data science now. It almost doesn’t matter what commercial activity you’re engaged in, there’s a real pressure to have data science on staff or else you’re behind. This is not meant in a cynical way. I mean, this is my craft. But I think a lot of companies really don’t need a data scientist on staff. They can use software that is very good for what they need, or they really need a talented data analyst on staff. You know, they have a lot of data, and they have questions that they want answers to. But that’s not data science, per se. But that rush, with so many people wanting data science, has obviously created a bit of a demand backlog. So those things always complicate the hiring process a bit.

Curtis: Data science is a little bit more experimental maybe than software development. Sometimes you fail, sometimes you succeed. That can make people uncomfortable, especially when you have smaller budgets and you’re on deadlines, things like this. How do you manage that with a business that maybe has a smaller budget, and yet that’s the nature of data science?

Mark: Right. It’s definitely a good question. I mean, you’ve probably encountered this, but, you know, I think a lot of times people love the idea of experimentation, and they want to know that you are doing experiments, but there isn’t an expectation that most experiments fail. It’s almost like there’s an expectation that most experiments succeed, which isn’t really the case. And so I think a lot of times the challenge comes from just having open, early communication with stakeholders about what the approach is. And I think really the truth is, too, trying to make sure that you can convert, as early as possible, wishy-washy goals from a stakeholder into something more quantitative. For instance, as data scientists, we can’t accept someone saying, “Okay, well, make it better or do it faster or we need it to be more accurate,” because we can just lose infinite amounts of time on those types of statements. So really there’s a process with stakeholders of trying to say, “Well, would you be happy if it was 20% faster, or if we were able to achieve an improvement of 5% in accuracy, would that solve the client’s problem?” Really actually having a firmer quantitative goal up front that you’re going after will help both in the design of the experiments and managing expectations throughout the process.

Curtis: That’s really good advice. There’s a difference between a data analyst and a data scientist, and I’m just wondering if you can expand on that a little bit. ‘Cause I think there is a little bit of a confusion about that in the market.

Mark: Right. And this is . . . I’m highly opinionated about this, and so, you know, a lot of people can disagree, but I think what I see in a lot of businesses and business units is that they are surrounded with data, and they desperately need someone who can help the business formulate the questions to ask and can communicate with data to ask those questions. That remains a very, very important role in an organization. And I think what we’ve seen, and this is a good thing, is that, you know, we’ve tried to democratize the access to data, and so, to some extent, through both systems and training, we’ve let more people be able to do this, but we haven’t necessarily, I think, still made a very important role for people who do this really well. As in, you know, “I have years of experience in terms of formulating questions and how to quickly get answers to them and when to be mistrustful of the data.”

Those are tremendous data analyst skills that are valuable. And I think a data scientist certainly may have some of that skillset, you know, in this Venn diagram of data science, but I think that alone isn’t necessarily the definition of data science. And sometimes businesses may confuse those two things in terms of what they need, which can challenge them in being able to find the right person for a role. A lot of times internally they may have the right person who could be a tremendous asset in that data analyst role, but they’re looking for a data scientist, and so, you know, they may crowd out some talent internally.

Ginette: A huge thank you to Mark Lowe for being on our show today. As always, head to datacrunchcorp.com for our show notes and attributions, and remember, building effective data science processes is tough. Mode, the data science platform, has compiled three tips to make it a bit easier: don’t over plan, there’s no one process that fits everyone, and waste time. That’s right. Waste time. Read more at mode.com/dsp. M O D E dot com slash D S P.

Attributions
Music
“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/