Matthew Battifarano has always been interested in mobility and transportation. He talks about his interest in traffic equilibrium, a concept used to model autonomous vehicles, ride hailing, bike sharing, and other new mobility technologies; what shaped that interest; and what led him to his PhD at the Mobility Data Analytics Center at Carnegie Mellon University.
Matthew Battifarano: This is something I’d never heard of before, before I started in the program. So a lot of the research I do now, and a lot of the research that’s interested in modeling new technologies like ride hailing or autonomous vehicles or bike sharing, or all of these different components of mobility that we see in cities now, a lot of them use this concept of traffic equilibrium.
Ginette: I’m Ginette,
Curtis: and I’m Curtis,
Ginette: and you are listening to Data Crunch,
Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.
Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company.
If you want to become the type of tech talent we talk about on our show today, you’ll need to master algorithms, machine learning concepts, computer science basics, and many other important concepts. Brilliant is a great place to start digging into these.
The nice thing about Brilliant is that you can learn in bite-sized pieces at your own pace, and with a bit of consistent effort, you can tackle some really tough subjects. With 60+ courses that combine story-telling, code-writing, and interactive challenges, Brilliant helps develop the skills that are crucial to school, job interviews, and careers.
Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription.
Today we chat with Matthew Battifarano, former data scientist at Bridj and current PhD student at the Mobility Data Analytics Center at Carnegie Mellon University.
Matthew: I grew up in New York City, and that really shaped my view of transportation. I grew up really enjoying transportation. There’s a great transit museum out in Brooklyn that I loved going to as a kid. You could like go up in the driver’s seat of, of a bus they had there and all kinds of cool stuff. And that made a lot of sense to me, particularly growing up in that environment, like that’s how I got to school. That’s how I went everywhere that I needed to go. So from a young age and growing up there, that just made sense to me. When I went to school, went to college, like I didn’t really think about transportation. I went to the University of Chicago, which doesn’t have anything that’s applied. They’re very theory oriented. So there’s no engineering. There’s no anything that has like a real application. I studied math there. Did my undergrad there.
I also, at that point, was really interested in computational neuroscience, which seems like totally unrelated to what I’m doing now. And it largely is. But the one thing that really stood out to me about it and what I, the reason why I was drawn to it is the whole field is taking a really complex biological system, the brain, which is like, if, as far as biological systems go, like that’s a really hard one to crack open.
Curtis: Pretty much the top, right?
Matthew: Yeah. It’s like at the top, and they were using mathematical models. They’re using machine learning; they’re using all of these techniques to break down and try to understand it. And there are ways in which that’s really successful that I was very interested in. And there are also ways in which that’s really hard, and mathematical models like don’t yet give us good insight.
So that’s what I was oriented towards throughout college. I spent two years in a research lab in computational neuroscience and basically learned in those two years that I didn’t really want to do that. It was, there were parts that I really liked. And I figured out that the parts that I really liked were the parts where I was figuring out how to mathematically model some process that we were investigating. And there were a lot of other components that I, the domain itself wasn’t as interesting as I had hoped it would be. And it didn’t grab me in the way that I wanted it to. So I was sort of looking for a change, and I found this startup called Bridj, which was just beginning out in Boston at the time. And this was around 2014. And their whole idea was we have all this data about how people move around a city.
It’s everywhere. It’s in, it’s in Yelp. It’s in Google Maps. It’s in all these, like there’s tons of different ways, especially with the rise of smartphones, tons of different ways we can measure mobility in cities in a way that we never could before. How can we leverage this to design a better transit infrastructure? So our focus was on what’s now called microtransit, which is using small passenger vans, sort of maybe like 12 passengers to a van, and routing them dynamically. This was not a new idea. It follows in the tradition of, you know, a lot of jitney services, which exist in the US and, and more commonly throughout the world. It was trying to take that model and bring in a data layer on top of it to try to make it work better and more efficiently. So this was actually before Uber Pool showed up. So when Uber Pool showed up, it was great because our whole company got a lot easier to explain. On the flip side, it sounded a lot like Uber Pool. We were just like, “Oh yeah, it’s like Uber Pool. But you know, now we’re dealing with, with 12 passenger vans.” And so there was positives and negatives to that.
Curtis: Interesting. Was that, was that a net positive? Would you say, did it give you, or was it sort of like, why don’t I just use Uber pool?
Matthew: I think it was a net positive. We, we were a startup, and at that point on a pretty small scale. And so just getting people used to the idea of taking out their phones and sharing a ride in some fashion helped. There are ways in which our platforms differed, obviously, because we, you know, we were trying to sort of have a more bus-like experience where it was a larger aggregation, a little less flexible than a point to point, but it would offer sort of a middle price range. So the idea of using your smartphone to call transit was an idea that, I think, spreading that idea was helpful to us. On the flip side, it’s kind of hard to compete with Uber or Lyft or these big, you know.
So I was there for, uh, about two and a half years. I got hired on to help with their data team, what we called the science team. And we had two sort of complementary goals. The first is how do we know what people want? How do we know where people are trying to go? How are they moving around the city? So that was really the first part of that pitch of the business is, you know, we’re leveraging data about how people move. So the first thing is, “Okay, let’s find the data, let’s figure out how we can extract these mobility patterns from it.” And then the second part is using that, that information: “Can we design routes that are efficient or optimal in some sense?”
Curtis: So, where did you source the data from? That’s usually a really hard problem in terms of like, where do you get the data? Do you have to pay for it? All these kinds of things.
Matthew: I’m not sure that I can. So the company itself ended up going under in, in late 2014, but the, the remnants of the company were bought by an Australian company, and it exists down there. So I’m not sure exactly how much I’m allowed to say about the data sources, but I would say two things. It was a variety of subscription-based data sources and also kind of more one-off like data purchasing. We tried to be really broad with this. Um, the reason behind being broad is that each data source comes with a particular set of biases. And our ultimate goal was sorta to figure out, one, like, what is the total mobility pattern going on in the city, which is sort of a latent variable because you can’t directly observe it. And then, as sort of like a subset of that population, who is actually going to get on a vehicle in the next month, or, you know, over a short horizon, ’cause we were, again, trying to make our ridership numbers look good. So we were always trying to make decisions that would help those metrics along ’cause we were in the process of trying to get additional funding through much of my time there.
Curtis: Got it. Yep. And that’s, that’s a common problem in startups, right? It’s, it’s, uh, trying to use analytics to prove that you’re doing something good, that it’s worthwhile, and to get more funding. So yeah, I get that.
Matthew: Yeah. And it’s also one thing that we struggled with: the platform itself is a really nice data collection platform, in the same way that Uber and Lyft have a great data collection platform. They understand a lot about their own service and about the demand that utilizes their services. They also run into problems. Of course, that data itself is biased. So there is even a tension there, but when you’re a startup and you’re just trying to introduce something to market, your data is almost useless because, one, it’s so small, and two, it’s very concentrated and very biased in terms of where you’ve decided to go and how you’re marketing it. It’s very sensitive to these early-on business decisions, which are, of course, rapidly changing. So it’s hard to interpret that when it’s, when it’s at that early stage.
Curtis: For sure. Yeah. So, so how far did you get in this process before you, you moved on to, to the, I’m assuming your next step was academia?
Matthew: Yeah, we got pretty far. We had a roadmap that we were following to get toward this goal of, you know, this sort of super dynamic and, um, convenient bus network that you could take at all hours of the day. We had some sort of milestones that we were aiming for, and we achieved a few of them. We didn’t get all the way to where we wanted to be, but we made significant steps, particularly on the optimization side. The demand forecasting side is really hard because no matter how good your methods are, if your data is not really there, then there’s only so much that you can say. And we ran into that problem pretty early on. The other thing that we realized is that, even if you had a really good demand forecasting system, if you don’t have the ability to act on that information, then it doesn’t really matter how well you can predict demand.
Right? So we sort of shifted our focus halfway through to really focusing in on the, on the optimization component. So assuming that we have some idea of where demand is, how can we create an optimal set of routes, and how can we, in particular, like optimize that sort of on the fly? You’re not optimizing like a fixed route. You’re optimizing a dynamic route that can change, which is a really interesting research question in its own right. Sure. So we shifted focus. I spent like a lot of my last year there working on that.
Curtis: That’s interesting. Was the problem you ran into there more maybe computational, like how do we quickly take in these variables, run something against a model, and then have something to use in a timeframe that is suitable, or was it more the actual algorithms that you’re trying to design to work on the problem?
Matthew: It was both, uh, which was tricky to, to manage. Uh, the idea is, of course, if you’re trying to do something dynamic, you are time limited because you need to respond somewhat quickly.
A lot of, I think a lot of machine learning applications, and even the example, you know, demand forecasting that we were, that we were focused on for the first half of my time there, can be done in a completely offline setting. You spend a lot of computational resources. You spend a lot of time. You come up with a model. Maybe that model gets updated in the background every so often, but it’s not really being mo like . . . it’s being applied in real time. That’s the easy part. And it’s being trained sort of over longer cycles. When you’re trying to do something like what we were trying to do, where you want something that’s optimal based on the current situation, you sort of have maybe two options.
So one, you sort of figure out what are the, all the possibilities that can happen and figure out in advance what’s the best decision. For something like this problem, that really doesn’t make a lot of sense because there’s so many things that can happen. The search space is enormous. The other option is that you have some sort of online method where you are sort of ingesting data or whatever, and using that to sort of figure out what your next move is. And so we sort of had to figure out what methods would provide us a balance between those two, what methods make sense in that context. And then also, how do we make sure that we’re able to come up with a solution or something in a time-constrained environment? So that was both a domain question and also an algorithmic question.
Ultimately, we had to face the reality that sometimes, for whatever reason, it just would not be able to finish a computation. And so there’s a question in there of what do you do? How does the application as a whole respond to failure? And that’s sort of within a much larger question of application design in general: you want resilience under failure, different components failing. So that was a really interesting intersection between a traditionally software engineering focus, or really not even software engineering, kind of . . . more like DevOps, where you’re considering the development and operation of a software application, and machine learning and AI: what do we do when this particular component fails, and how do we also, at the same time, minimize or mitigate failure?
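A note for readers: the idea of computing under a time limit and falling back when the computation does not finish can be sketched in a few lines of Python. This is only an illustration, not what Bridj built. The function name `plan_with_deadline`, the candidate plans, the `score` function, and the fallback plan are all hypothetical.

```python
import time

def plan_with_deadline(candidates, score, fallback, budget_s):
    """Return the lowest-scoring plan evaluated before the time budget runs out.

    candidates: iterable of hypothetical plans (any objects)
    score:      function mapping a plan to a cost (lower is better)
    fallback:   plan to use if nothing could be evaluated in time
    budget_s:   time budget in seconds
    """
    deadline = time.monotonic() + budget_s
    best, best_score = None, float("inf")
    for plan in candidates:
        # Stop as soon as the budget is spent, keeping the best plan so far.
        if time.monotonic() >= deadline:
            break
        s = score(plan)
        if s < best_score:
            best, best_score = plan, s
    # If the deadline hit before any plan was scored, respond with the fallback.
    return best if best is not None else fallback
```

The design choice here is that the caller always gets an answer: either the best plan found so far or a known-safe default, so a slow optimizer degrades the quality of the result rather than taking the whole application down.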
Curtis: Right. So more user experience questions, right? What do we, what do we do to, to ensure that this is still useful or does something that, that can help out the user? It sounds like, um, that’s really interesting. And those kinds of problems often are, are the hairier ones, I find, as I talk to people, although, you know, the modeling is not, not easy either. And I’m curious, and if you’re not at liberty to say, that’s fine, but what kind of models did you end up using to solve these, these problems and infrastructure?
Matthew: I can’t really speak to that specifically, but there’s a lot of research that exists to solve this sort of . . . This problem lives within a pretty well-known class of problems called the vehicle routing problem. And there are a ton of different variants of this problem that are aimed at solving different problems or aimed at different applications, rather. So a really sort of prototypical application of the vehicle routing problem, or VRP, is in logistics. So if you’re UPS or FedEx, you have some depots where you have packages sitting, you’ve got a fleet of trucks, and you have a bunch of destinations. So the question is, how do you route these trucks to, to serve all of your, all of this package demand in the most efficient way possible? And there are a lot of different approaches to doing this. So, and again, depending on your operational constraints, some might be better than others. So there’s a ton of research out there on different methods that you can use.
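A note for readers: the depot, trucks, and destinations setup Matthew describes can be illustrated with a very simple greedy heuristic in Python. This is one of the most basic approaches to the vehicle routing problem and is not the method Bridj used, which is not disclosed here. The function name, the coordinates, and the assumption that each stop takes one unit of truck capacity are all made up for illustration.

```python
import math

def greedy_routes(depot, stops, num_trucks, capacity):
    """Nearest-neighbor heuristic for a toy vehicle routing problem.

    Each truck starts at the depot and repeatedly drives to the closest
    stop that has not been served yet, until it has served `capacity` stops.
    Returns (routes, unserved): one list of stops per truck, plus leftovers.
    """
    unserved = list(stops)
    routes = []
    for _ in range(num_trucks):
        route, here = [], depot
        while unserved and len(route) < capacity:
            # Pick the closest remaining stop (ties go to the earliest listed).
            nxt = min(unserved, key=lambda s: math.dist(here, s))
            unserved.remove(nxt)
            route.append(nxt)
            here = nxt
        routes.append(route)
    return routes, unserved

routes, leftover = greedy_routes(
    depot=(0, 0),
    stops=[(1, 0), (2, 0), (0, 1), (0, 2)],
    num_trucks=2,
    capacity=2,
)
print(routes)    # [[(1, 0), (2, 0)], [(0, 1), (0, 2)]]
print(leftover)  # []
```

A greedy pass like this is fast but generally not optimal. The research literature Matthew refers to covers methods that trade more computation for better routes and that handle constraints such as time windows.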
Curtis: So you didn’t have to develop anything from scratch. It sounds like there was some research you could build upon and sort of modify for your needs.
Curtis: Cool. Okay. That’s great. And so, so then how did this lead into you going back into academia?
Matthew: Yeah, so about two years in, I, you know, I had been working on this demand forecasting. I had been working on this optimization engine also sort of on the side. I’ve been working with a small team that was really interested in drilling down on our operational metrics as well. So answering more short term, maybe more traditional data analytics questions about how the business was operating. This is looking at sort of more day to day metrics about business performance and, and how we might make small interventions, again, on like a day-to-day basis to improve the quality of the product. And I sort of had felt like I had gotten to a point where I could see myself in this position learning slowly on the job, but I saw sort of diminishing marginal returns on that. I was spending a lot of time implementing this optimization engine, and I was around and part of the discussion in terms of how to actually model it and how to develop it from a methodological standpoint.
But I just didn’t have the, the background or the knowledge to really contribute in a fundamental way to that development. And that was something I realized I was really interested in being able to do.
Curtis: Got it.
Matthew: And everyone who was doing that, they had PhDs. So we had one that did their PhD in transportation, sort of more generally, we had another that had their PhD in operations research, and another that had their PhD in, in artificial intelligence. So between all of those perspectives, we were able to come up with what I thought was a really cool approach to that problem that I would have never been able to think of. I would never have been able to express it. One thing that was really cool is that in the process of this, I found there were parts of the method that really fit into an intuition that I already had, but I would have no way of getting from my intuition to an actual mathematical formulation or an algorithmic formulation.
And that was at one point from one perspective, really cool ’cause I was like, Oh yeah, this is, this is what I was looking for. I just couldn’t express it. On the other hand, it was really frustrating because I felt like, well, what if I, if I had been able to express it, if I had had that background, I really would have been able to contribute a lot more than I did. So I started looking for academic programs that melded this view of transportation and mathematical modeling. And in that regard, going back to the computational neuroscience, that was a familiar desire for me. This is, I was looking here, we have this really complex system that no one really knows how it works. It’s a lot of individual decisions being made by individual people, and you can’t really measure it, even though we’re surrounded by it all the time. There’s a really complex system that’s very important to understand because it affects how we live every single day. And here we have some really interesting examples of how a mathematical modeling approach can really add to that understanding and can help us improve systems. So I started looking for programs and in particular professors who were taking this approach and taking this perspective of trying to bring a mathematical modeling approach in particular leveraging new sources of data that weren’t available before to understand and improve transportation and mobility, um, particularly in an urban setting.
Curtis: Now, is that a, uh, a common topic? Was that like hard to find somebody that was focused on that specifically? Or is that more, uh, I don’t know. I mean, I’ve never heard about this particular sort of niche application before, so I’m curious if it was really hard to find someone focused on that or if there was some options.
Matthew: That’s an interesting question. I’ve talked to other people as they go through their grad school applications, and I think one thing that I’ve heard and definitely experienced is you have this idea of what you’re interested in, but you don’t necessarily have the vocabulary that the niche that you actually want is using. So, you know, if you, if you were starting your search for grad school or professors, you might type into Google something that resembles how you would describe the area you’re interested in. And that might be correct, but it might be language that people don’t use in that field or in that niche. So it’s really hard to, to sort of figure out. I’ll give you a more concrete example. So what I do now has a lot to do with a particular modeling area or particular modeling method called traffic equilibrium.
And very briefly, it just basically answers the question, or it models this phenomenon: if you have a bunch of people who are trying to use the road network or whatever transportation network to get from where they are to where they’re going, how do they end up using this network? And when you think about it, the use of the network depends on how everyone else is using the network. Right. Just think about like when you’re looking at Google Maps and trying to figure out the shortest, the best way to get in your car from point A to point B, that’s going to depend on traffic, which depends on all the other decisions that everyone else has made.
And so traffic equilibrium is, is this sort of economics-based model of how people make those decisions. This is something I’d never heard of before, before I started in the program. So a lot of the research I do now, and a lot of the research that’s interested in modeling new technologies like ride hailing or autonomous vehicles or bike sharing, or all of these different components of mobility that we see in cities now, a lot of them use this concept of traffic equilibrium. And so now I would say, when I look for other professors who are doing similar work or other labs that are doing similar work, I usually start from how are people looking at this in terms of traffic equilibrium?
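A note for readers: a tiny worked example may help make traffic equilibrium concrete. The Python sketch below assumes just two routes between the same origin and destination, with travel time on each route growing linearly with the number of drivers on it. Under the classic user-equilibrium idea (often called Wardrop’s first principle), drivers spread out until no one can save time by switching routes. The linear travel-time form and all the numbers are assumptions chosen for illustration, not something from the interview.

```python
def two_route_equilibrium(demand, a1, b1, a2, b2):
    """User equilibrium for two parallel routes with linear travel times.

    Route 1 takes a1 + b1 * x1 minutes with x1 drivers on it.
    Route 2 takes a2 + b2 * x2 minutes with x2 drivers on it.
    Assumes b1 + b2 > 0 and x1 + x2 == demand.
    Returns (x1, x2, time1, time2).
    """
    # If both routes are used, their travel times must be equal:
    # a1 + b1 * x1 = a2 + b2 * (demand - x1), solved for x1.
    x1 = (a2 - a1 + b2 * demand) / (b1 + b2)
    # If the solution falls outside [0, demand], everyone uses one route.
    x1 = min(max(x1, 0.0), demand)
    x2 = demand - x1
    return x1, x2, a1 + b1 * x1, a2 + b2 * x2

# Hypothetical numbers: a highway (10 min when empty) and a side street (15 min).
x1, x2, t1, t2 = two_route_equilibrium(1000, a1=10, b1=0.01, a2=15, b2=0.02)
print(round(x1), round(x2))        # 833 167
print(round(t1, 2), round(t2, 2))  # 18.33 18.33
```

At the equilibrium both routes take the same time, so no single driver gains by switching. That mirrors the point made in the conversation: the best route for you depends on what everyone else has chosen.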
Curtis: Got it.
Matthew: But when I was applying, when I was going through this process, I had no idea that that’s what I should be typing into Google.
Curtis: Sure. Yeah. That’s interesting. And so it’s, it’s a search problem, right? Yeah. Knowing you kind of know what you’re looking for, but you don’t know, like you say how to express it or what the vocabulary is behind it. So how did you find the program you’re in now? How did you determine that, that, that was the one that you wanted to do?
Matthew: It was sort of an iterative process. I ended up a lot in Google Scholar as well, looking at papers to try to figure out what’s going on. And so I started from a really basic understanding. Then I started finding what professors are doing something close to what I want, what do their papers look like, maybe find one or two papers that felt the closest to what I wanted to do. Look at who they were citing and sort of branch out from there to try to find who’s asking what and how they’re asking it. And then try to find sort of maybe a full array or a fuller picture of what the field was looking like. And even doing that, I wasn’t super successful. I originally applied to CMU with a focus on using, there’s a big focus at CMU here on sensor networks. I’m in the civil engineering department.
And we’ve got a lot of people who are really focused in on structural health monitoring, which involves putting a bunch of sensors in places and measuring things about the structure or about its use. There’s definitely an intersection between that and transportation when you’re talking about data collection and putting sensors everywhere and understanding how a system is being used. And I sort of found CMU through that. And then only once I started digging further into that did I find my current advisor, who has nothing to do with sensors, but was very much in line with this, this perspective that I was after of how do we combine transportation modeling and data and technology.
Ginette: A big thank you to Matthew Battifarano, and as always, head to datacrunch.com/podcast for our transcripts and attributions.
“Loopster” Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License