Data Science Live Book: An Open-Source Book About Data Science, Analytics, and More

This completely free book will teach you about data science, machine learning, data analytics, data preparation, and more!

I'd like to share with you the book I've been writing for more than a year. It's open-source and you can check it out whenever you want! I invite you to read the book online and/or download it here.

This is a book for learning data science, machine learning, and data analytics, with plenty of examples and explanations on topics such as:

  • Exploratory data analysis

  • Data preparation

  • Selecting the best variables

  • Model performance

Most of the written R code can be used in real scenarios! I worked on the funModeling R package while I was writing this book, so you'll notice it many times as you read along.

How About Some Examples?

This is a playbook full of data preparation recipes. For example, in the Missing Values chapter, you'll find out how to impute and convert these values into something useful for both analytics and predictive modeling. Additionally, in the Outliers chapter, you'll get to know some methods that spot outliers based on different criteria. For instance, funModeling contains a function that can help you process all of your data at once...
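To make those two ideas concrete, here is a minimal sketch of filling in missing values and flagging outliers. It is not the book's code: the book works in R with funModeling, while this is plain Python using only the standard library. The function names (`impute_median`, `tukey_bounds`, `flag_outliers`), the choice of the median as fill value, and the 1.5 × IQR threshold are all assumptions made for illustration.

```python
import math
import statistics


def _is_missing(v):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_median(values):
    """Replace missing entries with the median of the observed ones."""
    observed = [v for v in values if not _is_missing(v)]
    fill = statistics.median(observed)
    return [fill if _is_missing(v) else v for v in values]


def tukey_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return True for each value that falls outside the Tukey fences."""
    lower, upper = tukey_bounds(values, k)
    return [v < lower or v > upper for v in values]


# Hypothetical data: one missing age and one suspicious value.
ages = [23, 25, None, 27, 24, 26, 95]
filled = impute_median(ages)      # the None becomes 25.5
flags = flag_outliers(filled)     # only the last value (95) is flagged
```

Other criteria are just as defensible. A different multiplier `k`, or a different fill value such as the mean, will change which points are flagged, so the threshold is worth checking against the data at hand.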

Or, more conceptually, say we have a numeric variable and we need to convert it into a categorical variable, or vice-versa — do we have to convert it or can we just leave it as it came?
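As a rough illustration of the numeric-to-categorical direction, the sketch below maps a number to a label using fixed cut points. It is plain Python, not the book's R code, and the function name `to_category`, the cut points, and the labels are hypothetical choices.

```python
from bisect import bisect_right


def to_category(value, cut_points, labels):
    """Map a number to a label using sorted cut points.

    A value equal to a cut point falls into the higher bin, so with
    cut_points [18, 65] the bins are (-inf, 18), [18, 65), [65, +inf).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical example: ages turned into three groups.
cuts = [18, 65]
names = ["minor", "adult", "senior"]
groups = [to_category(age, cuts, names) for age in [17, 18, 64, 65]]
# groups == ["minor", "adult", "adult", "senior"]
```

Whether to bin at all is a separate question. Binning discards information within each group, so leaving the variable numeric is often the better default unless the model or the analysis needs categories.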

And so on and so on...

Book's Philosophy

All of the chapters of the book are interrelated, so you can start with any of them. My apologies if the number of links distracts from the reading; I wanted it that way just to show how all the machine learning concepts are related.

A lot of effort went into justifying what the book states. Yet this is not enough: readers should replicate and improve the examples, and thus generate their own knowledge.

Encouraging critical thinking, without taking any statement as the "true truth," is really important in this sea of books, courses, videos, and other technical learning material. This book is just another view from the data science perspective.

Next Releases?

I'm not sure, but I have some ideas about what I'd like to add — such as more information on predictive model creation and validation, validating clustering models, dimension reduction techniques, and how to become a data scientist, among others.

Some Metrics

Here's a screenshot from Google Analytics (October 25, 2017) showing the top four most-viewed chapters:

[Image: Google Analytics screenshot of the top four most-viewed chapters]

I think Profiling is the most-viewed section simply because it is the first chapter after the index. But the number of entrances, which can be seen as visitors, was beyond my expectations: 18k+, viewing around 47k+ pages.

Many thanks to those who have already read something!

I Put Some Random Errors...

...both technical and grammatical. The problem is, I don't know where! So if you want to raise your hand and shout, "That's not correct! I think the correct form is... {replace-with-your-detailed-answer-here}," I invite you to report it: drop me an email (pcasas.biz -at- gmail.com) or comment below.


If you learn anything new in this book, or it somehow helps you save time at work, you can support the project by acquiring the portable version. It's name-your-price starting at $5. You'll receive an email to download it in three formats. I will note that there is no difference between the portable and web versions!

You can reach me on Twitter.

Thanks for reading :)

