What does it take to become a data scientist? Nic Ryan has been in the field for over a decade and answered thousands of questions from people looking to get into the field. In this episode, he talks about his journey into data science and his experience mentoring aspiring data scientists, giving advice to both beginners and seasoned professionals.
Nic Ryan: I think there’s sometimes a problem in data science education, and what people find interesting is they tend to focus on the algorithms, which as you know from doing data science projects is really just the last little bit. There’s tens or even sometimes hundreds of decision steps that are made until you get to that particular point.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.
Ginette: A Vault Analytics production.
Curtis: Let’s introduce you to our guest: Nic Ryan. He is an experienced data scientist and LinkedIn influencer who has helped a lot of aspiring data scientists in their journey into the profession. He’s been part of many different data teams, small and large, in big companies and startups, and he wrote a book called, “The Data Scientist’s Journey. The Guide for Aspiring Data Scientists,” which is based off the thousands of questions he’s been asked about becoming a data scientist.
Nic: It started off with failure. Originally, I wanted to go over to the States to play basketball, so I’m a failed basketball player, and there’s a couple reasons why I didn’t make it: one is I wasn’t tall enough to be a small forward, which is a bit ironic. I’m only 6’2”, but probably the more important reason is I wasn’t very good, but I didn’t know that at the time, so I didn’t get a scholarship to play basketball, but I did get a scholarship to do actuarial studies. So it’s not a bad backup plan.
But from there, I ended up falling into more of the stats side of things, of insurance, so the statistical modeling, pricing, fire, and theft, I really enjoyed that kind of stuff, so over time, I did more of that. Did some of my post-grad actuarial exams, and I was doing some reading on the weekends and finding out more about stats and a bit about code and a bit about R, and what really did it for me was having an incredibly long train ride to get to work. It was a couple hours each way, and so this is of course, this is the era of MOOCs, and rather than just talking to people, I just ended up joining the MOOCs, and so, really enjoyed that, and this whole thing of data science has just kind of grown around me, and I ended up working for one of the banks and doing their credit scoring and consulting with different banks for a long period of time, and I got a call out of the blue to, a guy just gave me a plane ticket and said come talk to us. So I flew there, and they offered me what was really a head of data science role, so there was a team overseas and a couple teams in Australia doing data science, and yeah, we did some pretty awesome things with NLP and bank statements and built some pretty sophisticated risk models; it was probably best in the country at that time.
It’s about 60 miles away from Sydney where I worked, and so it was a real opportunity. It was probably two hours door to door each way, and that was the other thing as well: that was a long time away from family, which wasn’t cool. I had a couple young kids. That’s part of the reason I have my own business now is that I’ve spent too much time away from my daughters. The result of it being I had a whole heap of dead time that I could either use or not use, and so I was able to teach myself code and teach myself some more stats and machine learning and stuff pretty quickly. When you have a couple hours of dead time each day, you become pretty good, pretty quickly, and so that’s what I encourage other people to do as well. If you find some dead time, you probably only need a couple hours a day, you know, you get pretty good, pretty quickly.
Ginette: Like Nic says here, finding time to self-educate is really important for getting into the data science field. It’s a constantly evolving profession where even people who’ve been in the field a long time have to continually learn the latest. A great place to take online courses is Brilliant.org. Their classes help you understand algorithms, machine learning concepts, computer science basics, probability, computer memory, and many other important concepts in data science. They also have a new course in Python programming language coming soon. The nice thing about Brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses are entertaining, challenging, and educational, and they go beyond lectures to help you actively learn.
If you’d like to deeply understand machine learning and data science, give them a try by going to brilliant dot org slash Data Crunch. They were good enough to sponsor this episode, and using this link lets them know that you came from us, and you can sign up for free, preview courses, and start learning! Also, the first 200 people that go to that link will get 20% off the annual premium subscription. Once again, that’s brilliant dot org slash Data Crunch to understand machine learning!
Ginette: There were a couple of skills in particular that qualified him for the head of data science role he landed with the bank, a combo of soft and hard skills.
Nic: I know for this position they tried to hire someone for about four years who was technical but as well could communicate and lead a team. They tried even overseas and here in the States, and they couldn’t really find anyone they really liked, and I think they found me and thought, “well, it’s close enough.” And, so, yeah, I was able to work for those guys, and they’re a great bunch of people, really, really nice, and a good team. Overseas and based in Ukraine, in Kiev, and everyone over there is unbelievably smart. I mean, everyone speaks about six languages and probably double as many computer languages, and they’re just brilliant. I really see a lot of talent coming out of some of those countries there, and also Germany, Europe, and there’s just so much talent around the world, and it’s really eye opening to see it.
Curtis: Over time, Nic has become a LinkedIn influencer, which happened organically.
Nic: I write code with a pug on my lap, and she’s probably the second most experienced data scientist in the region, I suppose. And I just started engaging in some of the conversations on LinkedIn, and Randy Lao and some of the people, I just started chatting with them, and some of these guys are just awesome to have access to them, and private messages, and just chatting with Kyle and Kate Strachnyi and people like that, it’s just great. And over time, I don’t even know how it happened, 500 connections went to 1,000 went to 2,000 went to like 15,000. I just posted stuff I thought was interesting or relevant, and it just kind of grew from there, and it’s been really great because I’m five hours away from the nearest city here, so there’s no data scientists around for a half a day’s drive, but I feel absolutely part of the community here.
Nic: It’s more like people had questions, and I’ve had some experience as well in this field before it was a field, like yourself, and I’ve also managed teams, and I’ve worked with startups, and I’m a technical advisor for a startup as well, and I’ve seen a lot in a short space of time, from everything from insurance to banking to online advertising to agriculture and teams of four and teams of forty, and so, I’ve been very fortunate in my career to see things work really well and see things that haven’t perhaps gone so well, and so just the benefit of seeing things there, I’ve been able to offer some advice from things I’ve seen, and just to help the conversation along, and it just went from there.
Ginette: After helping out many aspiring data scientists, what insights does Nic have about getting into the field?
Nic: You need to know your destination. I think that’s probably the key thing as well is to work out where you want to end up and working out your motivation for why you really want to be a data scientist because that’s your north star that will guide you and will guide what you need to know, and it’s a good way to keep yourself on track. So that’s probably the first thing, and secondly, I think sometimes there’s a problem in data science education, and what people find interesting is they tend to focus on the algorithms, which in data science projects is really just the last little bit. There are tens and even hundreds of decisions made until you get to that particular point. And so someone can become very useful very quickly by knowing the basics.
So in some of the basics, like I see SQL everywhere. You’re going to be interacting with some kind of SQL database of some type, and so I would take someone who can wrangle data in SQL and even using Python or R to take it a bit further, and that’s a very valuable skill, and it’s often missing, and so that might be your first step in the door, and then if you can do some exploratory plots in Python or in R, that’s a really good place to be able to start to add value to the business right away, and then obviously the modeling on that is great, and even things like reproducible research as well will set you apart, knowing even R Markdown or Jupyter Notebooks, something like that will be really good, and obviously having the communication skills and the passion as well, and being business focused so not trying to do something that’s interesting and also asking what’s going to be ROI impacting, and so that’s a way that you can map out the analytics pipeline, and you can work out how you can prepare yourself to be useful, but it will be different for different things you aspire to, but those are kind of the common elements I see.
Curtis: Nic also offers insight into a common pitfall people learning data science fall into, and what to do instead.
Nic: You read Wes McKinney’s book, Python for Data Analysis, and he says people can spend too much time trying to learn the entire thing, and I see this a lot as well. People jump on these tutorials, and they stay there for a long time because it’s relatively safe, but I think you need to get yourself to a point where you feel confident enough to be doing projects, and just at that point. You don’t want to be doing tutorials again and again and again over a long period of time because then when you actually have a real project there’s a disconnect there, and I often see that as well. That’s probably a big problem that people have is that they can go through the tutorials and they can do it, but then when they’re presented with a real world problem, they really struggle, and that’s what’s really going to get you hired, if you have evidence of a real world problem that you can do that’s actually close to the hiring company, what they have, then that’s really going to set you apart. And so in terms of putting a timeline on it is hard because it’s different for everyone, but I would say don’t think years; think how can I get my hands going on real data as soon as possible, and even if I fail, I can just Google something because another traveler has come across the same problem as you, more than likely, and you can leverage the collective experience of the Internet to solve your problems, so I think that’s where you really learn is when you’re doing those projects.
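Editor's note: as a rough illustration of the kind of basic SQL wrangling Nic describes, here is a minimal sketch that runs a grouped summary query from Python using the standard library's sqlite3 module. The table name, columns, and rows are all invented for this example and do not come from the episode.

```python
import sqlite3

# Hypothetical transactions, invented for illustration only.
rows = [
    ("2024-01-03", "groceries", 82.50),
    ("2024-01-05", "taxi", 14.00),
    ("2024-01-10", "groceries", 61.25),
    ("2024-01-15", "rent", 1200.00),
    ("2024-01-19", "taxi", 22.00),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tx_date TEXT, category TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Wrangle in SQL: number of transactions and total spend per category,
# largest total first.
summary = conn.execute(
    """
    SELECT category, COUNT(*) AS n, ROUND(SUM(amount), 2) AS total
    FROM transactions
    GROUP BY category
    ORDER BY total DESC
    """
).fetchall()
conn.close()

# Hand the result to Python for further work, such as a quick printout.
for category, n, total in summary:
    print(f"{category:<10} {n:>2} {total:>9.2f}")
```

The same result set could be passed on to a plotting or modeling step in Python or R, which is the "take it a bit further" part of the advice.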
Ginette: And once you’ve learned some of the basics and have been able to work on a real world project, here’s Nic’s salient advice for how to land a job.
Nic: I didn’t realize the importance of networking until probably the last couple years, but it is just so important to network and become that guy or girl who has the coffee dates with everyone and knows everyone and goes to all the meetups, and if you’ve got an in in a particular company, like if you know someone there, then there’s an untapped job market for roles that aren’t even advertised or roles based on personal recommendation, and so putting your resume on a pile with other resumes is one option, but having that contact and that rapport with people in the companies is obviously a much better option, so I would encourage people to network because that’s the secret to getting hired.
It’s really good to have connections overseas, but you also need to reach out to people, as well, in your local area ‘cause you just never know, like you just never know, and someone that you can actually have coffee with and talk to is really valuable, and a lot of business is still done over coffee, over lunch, and so tap someone and meet them in person and shake their hand is awesome.
Having said that, one of my really good friends and mentors, JT Kostman, he’s based in New York, and he’s like an honorary Australian ‘cause I’ve seen him twice in the last couple months. He’s been out here heaps. And so it’s pretty amazing, meeting those people as well, some people that I shouldn’t under normal circumstances ever have the chance of meeting someone of his caliber, but being able to call him a buddy is really weird.
Curtis: If networking sounds like something scary, inauthentic, or forced to you, it doesn’t have to be. Nic gives some advice on how he has been able to connect so well with others on LinkedIn.
Nic: I just think if there’s anything of interest or anything that I think will help someone on their journey, then I’ll post it. Or if someone asks a question, then I’ll answer, and if there’s something interesting, someone tags me, then I’ll try to answer, but it is pretty hard to answer fifty messages a day, and that’s why I obviously wrote the book. There’s a limited amount of space to answer some of those questions of how to get a job and how to get into the field, and if I write a 250-page book, I can probably go over it more thoroughly.
Ginette: The book Nic is referencing is called “The Data Scientist’s Journey. The Guide for Aspiring Data Scientists,” and we’ll link to it in the show notes. As he just mentioned, it’s all the advice he’d give someone getting into the field.
Nic: I did have a lot of material based on the questions I’ve been asked by hundreds or maybe even thousands of people, of all I’ve read and seen, and so there was a huge amount of material there. Then it was a matter of arranging it and sequencing it and then also working out what worked for me and what wasn’t so good for me back in the day, and what I did well and what I didn’t do well, and what I’ve seen from others as well with other people for what they’ve struggled on or things they’ve done really well, and then putting that all together to help someone on their journey and help save them a bit of time.
Some people will say things like what’s the quickest, easiest, shortest way for me to become a data scientist, and I don’t know, I mean I’ve been in this game for over ten years, and you’re still always learning, and so it’s more of a passion thing, and you need to be able to learn every day, and you need to be comfortable with learning new things, and you need to be comfortable working out what you need to know and what you don’t know and the best way of doing it, so learning how to learn is a real skill, and learning what works for you and setting aside time to learn every day or as often as you can is a real critical piece, I think, so you can point people to some resources, and you can suggest some things, but also the onus is on them as well, to work out where they want to be and what they need to know, and so you have to be quite introspective to work out what you really like doing, and it’s the old “know thyself.” And hopefully this book will be able to help some people figure that out and to offer some practical advice and tips, so you do have to look inward for a bit and work out what excites me. And it’s different for different people, and for me as well to have that niche of NLP of bank statements that I absolutely adore. That’s weird.
Curtis: Beyond helping people get into the field, we talked to Nic about some of the most interesting projects Nic’s seen in data science. Some people might think bank statements are boring, but Nic would disagree.
Nic: There’s a requirement with short-term lending in Australia to look at ninety days of bank statement data with loan decisions, so what you can actually do is look at the text of bank statement, and you can classify that line item as groceries or taxi, income, mortgage, rent. Then those features can be fit into a credit scoring algo, and then that’ll work out if someone is a good or bad credit risk based on past data with an outcome of what good or bad looks like, and it kind of goes further than that as well, because you can look at things like how many times are they going to take money out from ATMs, is that an indicator of fraud, or what’s their gambling spending like, do they have tax liability, do they have dependents, do they have a joint account, so someone else’s income is going into that as well, so really you can group similar transactions based on string distance ideas as well, so once you have a grouping you can work out the days between those groupings, so you can work out how often they get paid, and you can map out their reliability and regularity of their payments, and you can even identify their future for layoffs and severance pay if there was job loss. There’s so much you can do.
Now what I know is most banks don’t use the bank statements for lending decisions, and from what we saw with some of the plots that we did of actual expenses versus what someone stated, there was no correlation between what people thought they were spending and what they were actually spending. Ask me how much I spent on groceries last week, I have no idea. And yet you see so many banks using the application form with a bit of credit bureau information as the basis of lending, so they’re using data that’s incorrect in some of their lending. It’s not that people are lying. They just don’t know. Sometimes if they say, I have no credit cards, and they have ten, well that’s lying.
So I really think for banks, that’s the next thing, and probably in the States banks are probably better at doing that, but using that bank statement data they’re sitting on as a basis for lending, fraud, and other checks as well is going to be pretty important going forward here.
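Editor's note: the bank statement steps Nic outlines (labeling line items, grouping similar transactions by string distance, then measuring the days between them to estimate pay frequency) can be sketched roughly as below. This is not Nic's actual method. The statement lines, keyword rules, and 0.8 threshold are invented for illustration, and the standard library's difflib similarity ratio stands in for whatever string distance measure a real system would use.

```python
from datetime import date
from difflib import SequenceMatcher
from statistics import median

# Hypothetical statement lines (date, description, amount), invented for illustration.
STATEMENT = [
    (date(2024, 1, 5), "ACME PTY LTD SALARY 0105", 2100.00),
    (date(2024, 1, 8), "WOOLWORTHS 1234 SYDNEY", -82.50),
    (date(2024, 1, 19), "ACME PTY LTD SALARY 0119", 2100.00),
    (date(2024, 1, 21), "ATM WITHDRAWAL GEORGE ST", -200.00),
    (date(2024, 2, 2), "ACME PTY LTD SALARY 0202", 2100.00),
]

# Hypothetical keyword rules; a real classifier would likely be trained on labeled data.
KEYWORDS = {"SALARY": "income", "WOOLWORTHS": "groceries", "ATM": "cash"}


def classify(description):
    """Label a line item by the first keyword found in its text."""
    for keyword, label in KEYWORDS.items():
        if keyword in description:
            return label
    return "other"


def similar(a, b, threshold=0.8):
    """Treat two descriptions as the same payee if their similarity ratio is high."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_similar(transactions, threshold=0.8):
    """Greedily group transactions whose text resembles a group's first member."""
    groups = []
    for tx in transactions:
        for group in groups:
            if similar(tx[1], group[0][1], threshold):
                group.append(tx)
                break
        else:
            groups.append([tx])
    return groups


def median_gap_days(group):
    """Median number of days between consecutive transactions in a group."""
    dates = sorted(tx[0] for tx in group)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return median(gaps) if gaps else None


groups = group_similar(STATEMENT)
pay_group = max(groups, key=len)  # the largest group of recurring, similar lines
print(classify(pay_group[0][1]), median_gap_days(pay_group))
```

The label and the gap between recurring payments are the sort of features that could then be fed into a credit scoring model alongside others Nic mentions, such as ATM withdrawal counts or gambling spend.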
Ginette: What does Nic see for the future of data science?
Nic: I think it’s pretty exciting times, and I think as well, there’s going to be some real winners and some real losers as well in the near future, and you’re probably seeing it more over there, and there’s a real arms race going on in the business world to be able to incorporate AI and machine learning in decisions, and again with my experience, I mean if you have someone doing manual processing and you have someone who is using the bank statements and they’ve got automated decisioning and you get your loan in 20 minutes online, people are busy, so all the customers will gravitate to that one bank, so it’s either going to be big banks or little startups just chewing away at the flesh of these behemoths like a little piranha.
So the future is exciting either way, if you’re in a big company adopting AI or your competitors to the point where you’ll be around, and they won’t. If you’re learning AI and machine learning and you’ve got some buddies, it’s never been cheaper and easier to form a startup and have a go, and some of the tech that’s coming out is really exciting too. Even in Australia we’re talking about smart cities and are collecting data, and I see huge opportunities for data scientists there. Wherever you look, like self-driving cars, there’s so much happening in the field right now. It’s an awesome time to get started in data science. We’re seeing this field exploding, and we’re just at the start of it.
The one thing as well that I see is a lot of people find it tough to find a job. I would just say hang in there. It’s all persistence, and it’s the hardest thing you probably have to do in your career is to get that first job, and then it’s all much smoother sailing from there. It’s one thing to say, hang in there, but if you’re getting rejection after rejection, it is about resilience. There are consultant gigs I don’t get. I still get rejected. It doesn’t end once you’re in the field and once you’re there. Even if it requires something that’s less than a machine learning engineer or data scientist, even if you have to make a sideways move if you’re already working, or if you get a first job as a data analyst or you’re building dashboards or something. Really get yourself in there and working with data, and that’s the advice I give. See it as a long term play over time.
Ginette: A huge thank you to Nic Ryan for chatting with us. If you want to find out more about the book he wrote, “The Data Scientist’s Journey. The Guide for Aspiring Data Scientists,” you can find it on his website, datafriends.rocks or go to datacrunchpodcast.com and follow the link.
Also, if you’ve liked this episode or a prior episode, we’d love it if you’d leave us a review.
“Loopster” Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License