Applied Data Science


Lecture notes for the Applied Data Science course at Columbia University. The notes lean toward the statistical side while also teaching readers some basic programming skills.

Publication date: 31 Dec 2012

ISBN-10: n/a

ISBN-13: n/a

Paperback: 141 pages

Views: 3,841

Type: N/A

Publisher: n/a

License: n/a

Post time: 29 Jun 2016 02:01:25

Tag(s): Big Data, Data Science, Machine Learning, Statistics

From the Course Description:
The explosion of available data, coinciding with the continued evolution of statistical and computational methods, has resulted in a new breed of specialist. These data scientists use rigorous statistical methods to find meaning in data. Minimizing a loss function is not enough: business and societal decisions hinge on the interpretation of these insights. The world of scientific computation is rapidly evolving. Quick-and-dirty scripts are not enough: a maintainable code base and a collaborative development environment allow projects to reach production and scale. A data scientist must wear many hats; we present two of them here.

Maintainable coding techniques will be taught using test-driven development, version control, and collaboration. Code will be of the type found in the scikit-learn and statsmodels packages. Students finish the class having created a library on GitHub and gained an understanding of several core statistical/machine-learning algorithms.

Case studies give students the opportunity to use their own software on real-world data sets. Here they develop intuition for extracting meaning from data. Students finish the class with a website/blog/portfolio and experience with the translation:

Real world --> data --> scientist --> collaborators/coworkers --> policy-decision/data-product

More information is available at the course webpage.




About The Author(s)


Daniel Krasner is the founder and CEO of Merriam Tech, a company whose products combine techniques from archival research (a focus on meaning and context) with statistical language processing to bring intuitive and insightful interaction to extensive collections of electronic text. Before that, he was a mathematician (Columbia University PhD) working at the intersection of low-dimensional topology, representation theory, and homological algebra.



Ian Langmore is a software engineer at Google and an applied mathematician working as a data scientist. His specialties are Monte Carlo simulation, machine learning, statistics, partial differential equations, and scientific computation.


