Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.
Agile Data Science sets out to explain how to apply agile methodology in the field of data science. I would have liked more information on team formation and work processes, which the book covers pretty briefly. Instead the author focuses more on the tools (some of which are pretty dated at the time of reading) and one illustrative example application. Nevertheless, I find the book worth skimming through.
A good book for data science, but why put the "Agile" in the title if you're not gonna focus on this aspect?!
The majority of the book is about data science tools. Tedious step-by-step guides. So if you want to learn about data science tools it's a good book. If on the other hand you want to learn how to apply Agile methodologies to Data Science projects (like the title of the book implies) this is not the book for you.
The book was ok, but there are much better books (or video tutorials) about Data Science than this book.
In my first read I only went through the first three chapters, which contained the general principles of the book, and a VERY interesting methodological framework for thinking about data science in general. Those principles (iterate, deliver intermediate products, etc.) have been extremely useful for my day to day at work (Banco de Bogotá currently). Particularly, the one about "scaling the pyramid of data value" is the absolute best one. The rest of the book is very specific, showing how to design an application from start to finish using the whole modern Hadoop software stack. At some point I might come back to it, but for now the guiding principles are what I wanted (the specifics might not be as relevant depending on the software one ends up using. Currently it seems like at work it is going to be Hortonworks as the Hadoop distribution, so I will probably focus on a book where the details are aimed there).
It is good to have some general idea about how the technologies are used, but it sticks with just one type of infrastructure and doesn't make any comparison of pros and cons with the others available.
One of the problems with data science is that any description of what is encountered takes on the appearance of a mythical unicorn: no one person could possibly have all of the skills required. And it gets worse when you add to the standard set of statistics, domain knowledge, and programming the ability to deploy the application into a high speed environment. This book is not going to make a data scientist an expert in running a data center, but it is useful to give someone who has the rest of the skills an understanding of the environment their work will be deployed into.
One of the conflicts between the data scientist/analyst and information technology groups is that while the data scientist gives the data owned by the organization its value, IT is charged with storing the data and providing the access. And in a high velocity, high volume environment of big data, not understanding how the architecture works can lead to the data scientist creating valid solutions that cannot be applied in the actual day to day working environment. That is where this book comes in. The book has associated virtual machines in a software repository so that the data scientist who does not know anything about infrastructure and the software stack that the data and the analysis ride on can see how everything fits together.
The book title is misleading. This is not a book about data analytics. This is a book for data analysts so they know how their analytical application is deployed and applied to day-to-day use in enterprise environments. For that reason it is useful.
Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Press Blogger program.
Typical techie book, not a lot of detail, lots of downloading instructions and a general framework on how to approach things. Not bad for 170 pages, but not really clear and seems to be tailor-made more for programmers.
If you want to tackle this book and get the most out of it, you should read up on Web Development, understand basic MVC style web architecture, know what Hadoop and MongoDB are at a high level, know how JSON works and have done some Python programming.
Interesting book on how analytics applications can be developed quickly. A bit haphazardly written, but a lot of decent ideas for a budding data scientist to play around with.