Friday, July 31, 2009

J. Scott Long: Workflow of Data Analysis

Thanks to Martin for pointing this out to me. I have not read this yet, but it looks like precisely what is needed for people starting empirical research.

link here

The Workflow of Data Analysis Using Stata, by J. Scott Long, is an essential productivity tool for data analysts. Aimed at anyone who analyzes data, this book presents an effective strategy for designing and doing data-analytic projects.

In this book, Long presents lessons gained from his experience with numerous academic publications, as a coauthor of the immensely popular Regression Models for Categorical Dependent Variables Using Stata, and as a coauthor of the SPost routines, which are downloaded over 20,000 times a year.

A workflow of data analysis is a process for managing all aspects of data analysis. Planning, documenting, and organizing your work; cleaning the data; creating, renaming, and verifying variables; performing and presenting statistical analyses; producing replicable results; and archiving what you have done are all integral parts of your workflow.

Long shows how to design and implement efficient workflows for both one-person projects and team projects. Long guides you toward streamlining your workflow, because a good workflow is essential for replicating your work, and replication is essential for good science.

An efficient workflow reduces the time you spend doing data management and lets you produce datasets that are easier to analyze. When you methodically clean your data and carefully choose names and effective labels for your variables, the time you spend doing statistical and graphical analyses will be more productive and more enjoyable.

After introducing workflows and explaining how a better workflow can make it easier to work with data, Long describes planning, organizing, and documenting your work. He then explains how to write and debug Stata do-files and how to use local and global macros. Long presents conventions that greatly simplify data analysis: conventions for naming, labeling, documenting, and verifying variables. He also covers cleaning, analyzing, and protecting your data.

While describing effective workflows, Long also introduces the concepts of basic data management using Stata and writing Stata do-files. Using real-world examples, Stata commands, and Stata scripts, Long illustrates effective techniques for managing your data and analyses. If you analyze data, this book is recommended for you.
