Agile Data Science: Building Data Analytics Applications with Hadoop

Name: Agile Data Science: Building Data Analytics Applications with Hadoop
Rating: 3.33 (10 reviews)
ISBN: 9781449326265

Russell Jurney

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

Genres: Reference, Programming, Computer Science, Nonfiction, Coding, Computers, Technology

175 pages, Paperback

First published December 22, 2012

12 people are currently reading

234 people want to read

About the author

Russell Jurney

6 books, 3 followers

Community Reviews

5 stars: 12 (16%)
4 stars: 16 (21%)
3 stars: 31 (42%)
2 stars: 12 (16%)
1 star: 2 (2%)

Jonas

17 reviews, 2 followers

June 28, 2016

Agile Data Science sets out to explain how to apply agile methodology in the field of data science. I would have liked more information on team formation and work processes, which the book covers only briefly. Instead, the author focuses more on the tools (some of which were pretty dated at the time of reading) and one illustrative example application. Nevertheless, I find the book worth skimming through.

Lolo

191 reviews, 1 follower

July 10, 2017

A good book for data science, but why put the "Agile" in the title if you're not gonna focus on this aspect?!

The majority of the book is about data science tools. Tedious step-by-step guides. So if you want to learn about data science tools it's a good book. If on the other hand you want to learn how to apply Agile methodologies to Data Science projects (like the title of the book implies) this is not the book for you.

The book was ok, but there are much better books (or video tutorials) about Data Science than this book.

computer-science

André Gomes

Author, 5 books, 114 followers

April 2, 2014

Very nice introduction to data science with practical examples and exercises.

It makes me think about how much unexplored knowledge is hidden in all the data our applications generate every day.

The author uses many interesting tools such as Apache Pig, Apache Avro, MongoDB, Elasticsearch, Wonderdog, and Flask.

Let's go deeper into it...

bluesoft safari-books software

Luis

54 reviews, 1 follower

February 25, 2019

In my first read I only went through the first three chapters, which contain the general principles of the book and a VERY interesting methodological framework for thinking about data science in general. Those principles (iterate, deliver intermediate products, etc.) have been extremely useful in my day-to-day work (currently at Banco de Bogotá). In particular, the one about "scaling the pyramid of data value" is the absolute best. The rest of the book is very specific, showing how to design an application from start to finish using the whole modern Hadoop software stack. At some point I might come back to it, but for now the guiding principles are what I wanted (the specifics might not be as relevant, depending on the software one ends up using. Currently it seems that at work it is going to be Hortonworks as the Hadoop distribution, so I will probably focus on a book whose details are aimed there).

Liamarcia Bifano

8 reviews, 2 followers

December 3, 2018

It is good for getting a general idea of how the technologies are used, but it sticks to just one type of infrastructure and doesn't compare it with, or weigh its pros and cons against, the other options available.

33 reviews

Love the concept

226 reviews, 30 followers

May 6, 2014

One of the problems with data science is that any description of what is encountered takes on the appearance of a mythical unicorn; no one person could possibly have all of the skills required. And it gets worse when you add, to the standard set of statistics, domain knowledge, and programming, the ability to deploy the application into a high-speed environment. This book is not going to make a data scientist an expert in running a data center, but it is useful for giving someone who has the rest of the skills an understanding of the environment their work will be deployed into.

One of the conflicts between the data scientist/analyst and information technology groups is that while the data scientist gives the data owned by the organization its value, IT is charged with storing the data and providing access to it. And in a high-velocity, high-volume big data environment, not understanding how the architecture works can lead the data scientist to create valid solutions that cannot be applied in the actual day-to-day working environment. That is where this book comes in. The book has associated virtual machines in a software repository, so that a data scientist who knows nothing about the infrastructure and software stack that the data and the analysis ride on can see how everything fits together.

The book title is misleading. This is not a book about data analytics. This is a book for data analysts so they know how their analytical application is deployed and applied to day-to-day use in enterprise environments. For that reason it is useful.

Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Press Blogger program.

Chris

38 reviews, 3 followers

January 14, 2016

Typical techie book: not a lot of detail, lots of downloading instructions, and a general framework for how to approach things. Not bad for 170 pages, but not really clear, and it seems to be tailor-made more for programmers.

If you want to tackle this book and get the most out of it, you should read up on web development, understand basic MVC-style web architecture, know what Hadoop and MongoDB are at a high level, know how JSON works, and have done some Python programming.

Upom

229 reviews

March 20, 2014

Interesting book on how analytics applications can be developed quickly. A bit haphazardly written, but a lot of decent ideas for a budding data scientist to play around with.

21st-century cloud-computing computer