
PyDev of the Week: Data Scientist Joel Grus

A data scientist talks about the role of Python in his field, the process of writing a book, and how he got started with Python.

by Mike Driscoll · May. 16, 19 · Interview

This week we welcome Joel Grus (@joelgrus) as our PyDev of the Week! Joel is the author of Data Science from Scratch: First Principles with Python from O'Reilly. You can catch up with Joel on his website or on GitHub. Let's take some time to get to know Joel better!

Can you tell us a little about yourself (hobbies, education, etc.)?

In school I studied math and economics. I started my career doing quantitative finance (options pricing, financial risk, and stuff like that). I got very, very good at Excel, and I learned a tiny amount of SQL. But I kind of hated working in finance (and also I got laid off), so I joined an online travel startup as a "data analyst" doing BI stuff (lots of spreadsheets, lots of SQL, some very light scripting). That startup got acquired by Microsoft, who at the time had basically no idea what to do with my more-than-a-financial-analyst-less-than-a-software-engineer skillset (nor did I, really).

Then in 2011 I saw that the winds were blowing toward "data science," so I sort of BS-ed my way into a data scientist job at a tiny startup. I took a bunch of Coursera courses to fill in gaps in my knowledge, and then I learned to write (ugly) production code and discovered I really enjoyed building software. Through doing well in coding competitions I had the opportunity to interview for a software engineer job at Google, so I spent 6 really hectic weeks cramming computer science and then somehow passed the interview. I spent a couple of years at Google, and then I found I missed doing data and ML stuff, and so now I'm at the Allen Institute for Artificial Intelligence, where I build deep learning tools for NLP researchers. My current job is right at the intersection of deep learning and Python library design, which is a pretty great match for my interests.

I don't really have time for hobbies... I have an 8-year-old daughter, and I spend a lot of my free time with her, and then I keep agreeing and/or volunteering to write things and give talks and make livecoding videos, which takes up most of the rest. And then I have a podcast and a Twitter to stay on top of. I have long-term hobby goals of (1) learning jazz piano and (2) writing a novel, but I'm not really making much progress on either.

Why did you start using Python?

A long, long time ago I was taking a "Probability Modeling" class that was taught using Matlab. The site license for Matlab was only valid on-campus, which meant I couldn't work on the assignments at my apartment, which was where I preferred to work. I discovered that there was a Python library called Numeric (the predecessor of NumPy) that would allow me to do the numerical-simulation things I needed to do, so I learned just enough Python to be able to do my assignments. Several years after that I had a job, and I inherited a bunch of Perl scripts, and I really didn't want to maintain Perl code, so I started migrating them to Python, and the rest is history.

What other programming languages do you know and which is your favorite?

About 10-15 percent of my job involves writing JavaScript/React, which I actually really enjoy (I might enjoy it less if it were 100 percent of my job). The first year I was at AI2 I worked mostly in Scala, and after that I briefly worked on a project that was in Go. At Google I wrote primarily C++. The startup I was at before that used F#. For fun I used to write Haskell and PureScript. Part of me still dreams of having a Haskell/PureScript job, but at this point I'm so comfortable working in Python (and Python has so deeply entrenched itself as the language for doing machine learning) that it seems unlikely I'll ever make the switch.

What projects are you working on now?

In my day job I'm a core developer on AllenNLP, which is an open-source deep-learning library for NLP researchers. I just finished the second edition of Data Science from Scratch, which should be available any time now. In April I'm giving a keynote talk at qcon.ai about modern NLP. This month I'm giving a talk at the "Reproducible ML" workshop at ICLR, and a comedic banquet keynote at the ASA Symposium on Statistics and Data Science, which means I need to write a 30+ minute standup routine about data science and statistics. Does this sound like too much to be working on? It's way too much to be working on. But each of these projects is individually exciting, and I can't imagine which one I would have said no to (I have a lot of trouble saying no to things).

Which Python libraries are your favorite (core or 3rd party)?

I think PyTorch is great (AllenNLP is built on top of it). I am pretty much the world's biggest proponent of type annotations, and (accordingly) I'm a huge fan of mypy and also the typing module. There's a lot of fun stuff in itertools, and you can really level up your Python by learning it. I also really like Flask — knowing how to prototype tiny Flask + React apps is a minor superpower for data scientists. tqdm (progress bars for iterables) comes in handy surprisingly often.
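
To give a flavor of the "fun stuff in itertools" combined with type annotations, here is a minimal sketch. The helper names (`running_totals`, `first_n_squares`, `count_by_first_letter`) are made up for illustration and do not come from Joel's book or from AllenNLP; only the `itertools` and `typing` calls are standard library.

```python
from itertools import accumulate, count, groupby, islice
from typing import Dict, Iterable, List


def running_totals(values: Iterable[int]) -> List[int]:
    # accumulate yields the cumulative sums of its input, one per element.
    return list(accumulate(values))


def first_n_squares(n: int) -> List[int]:
    # count(1) is an infinite counter; islice takes only the first n items
    # from the lazy generator of squares.
    return list(islice((i * i for i in count(1)), n))


def count_by_first_letter(words: Iterable[str]) -> Dict[str, int]:
    # groupby only merges adjacent items, so the input is sorted by the
    # same key first.
    ordered = sorted(words, key=lambda w: w[0])
    return {
        letter: len(list(group))
        for letter, group in groupby(ordered, key=lambda w: w[0])
    }
```

Because every function is annotated, a checker such as mypy can flag a call like `running_totals("abc")` before the code ever runs.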

How did you end up writing a book on data science and Python?

I felt like an impostor compared to all the "famous" data scientists, so I thought that if I wrote a book I might feel less like an impostor. I cold-emailed O'Reilly with my proposal (which was originally way too ambitious), and they were very skeptical, so I kept sending them sample chapters, and then eventually they asked me "if we keep being indecisive are you going to eventually send us the whole book?" and I said probably, and then they said OK we'll publish it, and then I spent basically all my free time for the next year writing it.

Now that it's 2019, I feel extremely guilty that there is a Python 2.7 book out there with my name on it, so I proposed a second edition where the code is upgraded to 3.6 (with type annotations), and I took the opportunity to make the code cleaner and freshen up the jokes and add some new material on things like deep learning and NLP and data ethics.

What lessons did you learn in writing a book?

The primary lesson I learned was that all the things I thought I understood I didn't actually understand. For example, I thought I understood hypothesis testing, but when I started trying to *explain* it I discovered that I didn't understand it at all, and then I had to *actually* learn it before I could write the chapter. Many of the topics in the book were similarly humbling.

When you're coding it's easy to get overly clever, and writing a book is a good way to disabuse yourself of that habit. I tried very hard to make the code examples in the book as *clear* as possible, and that's a practice that's really spilled over into all the code I write.

Another lesson is that you're not going to please everyone. There are a lot of people who really like my book (which is extremely gratifying!), but also there are some people who hate it. That can be really dispiriting. A good book is deeply personal, which means that when you publish it you're really putting yourself out there for judgment. That's a tough thing to do.

For example, in the second edition I used type annotations everywhere. I deeply believe it was the right choice (both morally and pedagogically), but also I know that some people are going to absolutely *hate* the type annotations, and I'm still steeling myself for those reactions.

Is there anything else you'd like to say?

Buy the 2nd edition of Data Science from Scratch when it comes out!

Read other people's code and get them to read your code. It's one of the best ways to improve as a coder. I've had quite a few people tell me they've started using type annotations or NamedTuples or asserts or various other things because they saw me using them and how they made my code better. That's one of the best feelings for me, when someone tells me that they're a better coder or a better data scientist on account of my book or my blog or one of my videos.
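
As a rough illustration of the three habits mentioned here (type annotations, NamedTuples, and asserts), consider the small sketch below. The `Measurement` class and `mean_value` function are hypothetical examples, not code from the book.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    # A NamedTuple gives each field a name and a declared type while still
    # behaving like an ordinary immutable tuple.
    label: str
    value: float


def mean_value(measurements: List[Measurement]) -> float:
    # The assert documents the precondition and fails loudly on empty input
    # instead of raising a less obvious ZeroDivisionError.
    assert measurements, "need at least one measurement"
    return sum(m.value for m in measurements) / len(measurements)


data = [Measurement("a", 1.0), Measurement("b", 3.0)]
assert mean_value(data) == 2.0
```

A reader of this code can see at a glance what a `Measurement` contains and what `mean_value` expects, which is much of the point of writing code that others will read.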

Finally, don't use mutable objects (e.g. lists) as default values for function arguments! Everyone makes this mistake at least once, and it's always a pain in the ass to figure out what you did wrong.
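
A short sketch of why this bites: Python evaluates a default value once, when the function is defined, so every call that omits the argument shares the same list. The function names below are invented for illustration; the usual workaround shown is to default to `None` and create the list inside the function.

```python
from typing import List, Optional


def append_buggy(item: int, bucket: List[int] = []) -> List[int]:
    # The default list is created once at definition time and reused
    # by every call that does not pass its own bucket.
    bucket.append(item)
    return bucket


def append_fixed(item: int, bucket: Optional[List[int]] = None) -> List[int]:
    # Using None as the default means a fresh list is made on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_buggy(1)
second = append_buggy(2)
assert first is second      # both names point at the shared default list
assert second == [1, 2]     # the earlier item leaked into the second call

assert append_fixed(1) == [1]
assert append_fixed(2) == [2]
```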

Thanks for doing the interview!

Published at DZone with permission of Mike Driscoll, DZone MVB. See the original article here.
