An Introduction to R and RStudio

An introduction to the world of R using the great RStudio

This is a workshop I put together during my PhD in an attempt to help others who want to learn and utilise the power of R in their studies or work.

In the meantime, a PDF version of this workshop can be downloaded here.

Note: This workshop was last updated on 5th October 2012. I have good intentions of updating it, so treat it as a work in progress: it may contain some dead links and outdated software versions, although the core concepts are still accurate.

I’m also aiming to have an online version of this workshop available in the not-too-distant future (hopefully in GitBook format), so watch this space!

An Introduction to R: Preface

This document is a collection of notes and useful titbits that I’ve hoarded over the few years I’ve been learning R. I just want to reiterate that I am by no means an expert, and am constantly picking up new methods and techniques. The great thing about R (and, as I’m quickly learning, most programming languages) is that there are many ways to achieve the same end point. More often than not, when you’re starting out it’s about what’s easy to grasp and what actually works, as opposed to beautiful code and elegant design. Sure, there may well be a ‘better’ way of doing things, but if you’ve managed to stumble your way through a project and have got an end result, then congratulations; give yourself a pat on the back!

Enough about me; ‘what’s the deal with this workshop?’ you’re asking. Well, bioinformatics and high-throughput genomics applications are quickly becoming ingrained in many labs, and data sets are growing at a phenomenal rate (it is not uncommon to see next-gen sequencing data in the terabytes), so we need efficient and powerful ways to mine and analyse this data. R happens to lend itself to this purpose, and is quickly becoming one of the major go-to tools for bioinformatics analysis. With this in mind, R is still a programming language, and these generally aren’t the easiest things to just pick up and learn.

So that’s where this workshop comes in; this is my first workshop for those interested in applications of R in medical research, bioinformatics and high-throughput genomics. As something of a pilot, it aims to offer an initial grounding in R and set the platform for possible future workshops dealing with the handling and analysis of genomic data, e.g. microarrays. I have tried to make this document as easy to follow as possible, and I hope that those attending (and others who were unable to make it) will be able to take it away and use it as a reference and tutorial they can work through at their own pace.

This document is broken into several sections. The first couple cover where to get R and RStudio, how to install them, and how to get started. I’ve also included information about resources I’ve found invaluable while learning (in most cases these have an HTML link). The workshop itself will be based on a core of four sections:

  1. A Brief Introduction to R Language and Programming
  2. Data Management in R
  3. Basic Statistical Methods
  4. Basic Plotting/Graphing in R

You might notice that these four sections are generally lighter on text, and contain much more code, than the initial ‘introduction’ of this document. This is because I will be talking around the code and examples as we go. I have included some bare-bones comments and explanations that you can hopefully refer back to later if needed. Also, after each section there is an exercise that aims to reinforce the ideas just covered. At the very end of the document you’ll find a ‘cheat sheet’ that details some of the more common and useful functions and parameters used in R.

Well, that’s a brief synopsis of how things are planned to proceed, so good luck, and I hope you enjoy it!