Applied Data Science

Lecture notes for the Applied Data Science course at Columbia University. The notes focus mainly on the statistical side, while also teaching readers basic programming skills.

**Publication date**: 31 Dec 2012

**ISBN-10**: n/a

**ISBN-13**: n/a

**Paperback**: 141 pages

**Type**: N/A

**Publisher**: n/a

**License**: n/a

From the Course Description:

More information is available at the course webpage.

The explosion of available data, coinciding with the continued evolution of statistical and computational methods, has resulted in a new breed of specialist. These data scientists use rigorous statistical methods to find meaning in data. Minimizing a loss function is not enough: business and societal decisions hinge on the interpretation of these insights. The world of scientific computation is also evolving rapidly. Quick-and-dirty scripts are not enough: a maintainable code base and a collaborative development environment allow projects to move into production and to scale. A data scientist must wear many hats; we present two of them here.

Maintainable coding techniques will be taught using test-driven development, version control, and collaboration. Code will be of the type found in the scikit-learn and statsmodels packages. Students finish the class having created a library on GitHub and with an understanding of several core statistical and machine-learning algorithms.
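As a rough illustration of "code of the type found in scikit-learn" combined with a test-first workflow, here is a minimal sketch. It is not taken from the course materials: the class `SimpleLinearRegression` and its tests are hypothetical. It only mimics scikit-learn's estimator convention (learned attributes end in `_`, `fit` returns `self`), implements one-feature ordinary least squares in pure Python, and uses the standard-library `unittest` module rather than scikit-learn itself.

```python
import unittest


class SimpleLinearRegression:
    """Hypothetical one-feature least-squares regressor in the scikit-learn
    style: fit() sets learned attributes ending in '_' and returns self."""

    def fit(self, x, y):
        n = len(x)
        if n != len(y) or n < 2:
            raise ValueError("x and y must have the same length (at least 2)")
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        # Closed-form ordinary least squares: slope = S_xy / S_xx.
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            raise ValueError("x must not be constant")
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self

    def predict(self, x):
        # Apply the fitted line to each input value.
        return [self.intercept_ + self.coef_ * xi for xi in x]


class TestSimpleLinearRegression(unittest.TestCase):
    # In test-driven development, tests like these are written before fit/predict.
    def test_recovers_exact_line(self):
        model = SimpleLinearRegression().fit([0, 1, 2, 3], [1, 3, 5, 7])
        self.assertAlmostEqual(model.coef_, 2.0)
        self.assertAlmostEqual(model.intercept_, 1.0)

    def test_predict(self):
        model = SimpleLinearRegression().fit([0, 1], [0, 1])
        self.assertEqual(model.predict([2, 3]), [2.0, 3.0])

    def test_rejects_constant_x(self):
        with self.assertRaises(ValueError):
            SimpleLinearRegression().fit([1, 1, 1], [1, 2, 3])
```

Saved in a module, the tests can be run with `python -m unittest`. Keeping version control and a test suite alongside each estimator is one plausible way to get the maintainable, collaborative code base the description calls for.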

Case studies give students the opportunity to use their own software on real-world data sets. Here they develop intuition for extracting meaning from data. Students finish the class with a website, blog, or portfolio, and with experience in the translation:

Real world --> data --> scientist --> collaborators/coworkers --> policy-decision/data-product

About The Author(s)

Daniel Krasner is the founder and CEO of Merriam Tech, a company whose products combine techniques from archival research (a focus on meaning and context) with statistical language processing to bring intuitive and insightful interaction to extensive collections of electronic text. Before that, he was a mathematician (PhD, Columbia University) working at the intersection of low-dimensional topology, representation theory, and homological algebra.

Ian Langmore is a Software Engineer at Google and an applied mathematician working as a data scientist. His specialities are Monte Carlo simulation, machine learning, statistics, partial differential equations, and scientific computation.
