Reproducible and Robust Research with Open Source Tools
Vince Buffalo

#Bioinformatics
#Data
#Python
#R
#Unix_Shell
#Git
#SQLite
#Tabix
This practical book teaches the skills that scientists need for turning large sequencing datasets into reproducible and robust biological findings. Many biologists begin their bioinformatics training by learning scripting languages like Python and R alongside the Unix command line. But there's a huge gap between knowing a few programming languages and being prepared to analyze large amounts of biological data.
Rather than teach bioinformatics as a set of workflows that are likely to change with this rapidly evolving field, this book demonstrates the practice of bioinformatics through data skills. Rigorous assessment of data quality and of the effectiveness of tools is the foundation of reproducible and robust bioinformatics analysis. Through open source and freely available tools, you'll learn not only how to do bioinformatics, but how to approach problems as a bioinformatician.
Table of Contents
Part I. Ideology: Data Skills for Robust and Reproducible Bioinformatics
Chapter 1. How to Learn Bioinformatics
Part II. Prerequisites: Essential Skills for Getting Started with a Bioinformatics Project
Chapter 2. Setting Up and Managing a Bioinformatics Project
Chapter 3. Remedial Unix Shell
Chapter 4. Working with Remote Machines
Chapter 5. Git for Scientists
Chapter 6. Bioinformatics Data
Part III. Practice: Bioinformatics Data Skills
Chapter 7. Unix Data Tools
Chapter 8. A Rapid Introduction to the R Language
Chapter 9. Working with Range Data
Chapter 10. Working with Sequence Data
Chapter 11. Working with Alignment Data
Chapter 12. Bioinformatics Shell Scripting, Writing Pipelines, and Parallelizing Tasks
Chapter 13. Out-of-Memory Approaches: Tabix and SQLite
Chapter 14. Conclusion
About the Author
Vince Buffalo is currently a first-year graduate student studying population genetics in Graham Coop's lab at UC Davis in the Population Biology Graduate Group. Before starting his PhD in population genetics, Vince worked professionally as a bioinformatician in the Bioinformatics Core at the UC Davis Genome Center and in the Department of Plant Sciences. An obsessive programmer since he was a young teenager, Vince was drawn to the statistical and computational problems of genomics. He develops open source bioinformatics tools both at work and in his free time, and enjoys fly fishing and cooking when away from the computer.









