Statistics 141 - Statistical Computing

This class is an introduction to statistical and, more generally, scientific computing. The goal is for you to learn to think about and express statistical and data analysis tasks computationally. One of the primary focuses of this class is working with data. Manipulating data into the right form for analysis and graphical display is an extremely important part of practical statistics and, indeed, of scientific activity.

Computing is vital for the practice and research of statistics and data analysis. It plays an essential role in simulation and computer experiments for exploring theory. We also use computing for all data analysis involving fitting models, creating graphical displays, etc. We use more sophisticated and customized algorithms, e.g., Markov chain Monte Carlo, for fitting more complex models and for dealing with large datasets. Before we fit models, we have to manipulate our data to work on the appropriate subsets, transform variables, and compute derived variables. And nowadays, we can create much more interesting, informative and interactive graphical displays using Web tools such as Google Earth and Google Maps.

So much data is now available online in various forms: text, comma-separated value files, HTML, HTML forms, XML, binary representations. When one works in a biology lab, one frequently uses data from online databases such as NCBI and KEGG. A vast amount of geophysical data related to climate and climate change is available. Many sites publish useful data merely as HTML tables. And various "social" and commerce sites, as well as biological and geophysical sites, provide explicit Web application interfaces via REST or SOAP technologies.

Nowadays, computers are sufficiently fast that we usually work with high-level languages such as R, MATLAB, Perl and Python. This course will focus on the R environment and language. We'll explore the language and endeavor to understand its concepts rather than just learning commands. We'll also focus on exploratory data analysis and common sense investigations by looking at "real" data that is hopefully of interest. We'll learn ways to fetch data in different formats and from different sources. We'll look at ways to transform the data and then explore it graphically and with statistical methods.

This has been, in the past, a challenging class. The assignments demand a good deal of time. Learning by doing is especially important in this class. The lectures give you the concepts, but the exercises and projects give you the experience and knowledge. It is vital that you do not leave the exercises and projects to the last minute. You will not learn much from them that way and they will be immensely frustrating. You learn by trying things and figuring out what went wrong and why. This is how you learn the abstractions and gain the experience to attack other problems efficiently.

In this class, we'll look at data that is hopefully interesting to you. Some of the topics include


Duncan Temple Lang <duncan@wald.ucdavis.edu>
Last modified: Sat Sep 19 08:18:55 PDT 2009