An Introduction to R and RStudio

by Miles Benton

This document is a collection of notes and useful titbits that I’ve hoarded over the few years I’ve been learning R. I want to stress that I am by no means an expert, and am constantly picking up new methods and techniques. The great thing about R (and, as I’m quickly learning, most programming languages) is that there are many ways to achieve the same end point. More often than not, when you’re starting out it’s about what’s easy to grasp and what actually works, as opposed to beautiful code and elegant design. Yes, there may well be a ‘better’ way of doing things, but if you’ve managed to stumble your way through a project and have got an end result then congratulations; give yourself a pat on the back!

Enough about me; ‘what’s the deal with this workshop?’ you’re asking. Well, bioinformatics and high-throughput genomics applications are quickly becoming ingrained in many labs, and data sets are expanding in size at a phenomenal rate – it is not uncommon to see next-gen sequencing data in the terabytes – so there need to be efficient and powerful ways to mine and analyse this data. R lends itself to this purpose, and is quickly becoming one of the major go-to tools for bioinformatics analysis. That said, R is still a programming language, and these generally aren’t the easiest things to just pick up and learn.

So that’s where this workshop comes in. This is my first workshop for those interested in applications of R in medical research, bioinformatics and high-throughput genomics. It is somewhat of a pilot: the aim is to offer an initial grounding in R and lay the groundwork for possible future workshops that will deal with the handling and analysis of genomic data, e.g. microarrays. I have attempted to make this document as easy to follow as possible, and hope that those attending (and others who were unable to make it) will be able to take it away and use it as a reference and tutorial they can work through at their own pace.

This document is broken into numerous sections[1]. The first couple cover where to get R and RStudio, how to install them, and how to get started. I’ve also included information about resources I’ve found invaluable whilst learning (in most cases with a hyperlink). The actual workshop will be based on a core of four sections:

  1. A Brief Introduction to R Language and Programming
  2. Data Management in R
  3. Basic Statistical Methods
  4. Basic Plotting/Graphing in R

You might notice that these four sections are generally lighter on text – and contain much more code – than the initial ‘introduction’ of this document. This is because I will be talking around the code and examples as we go. I have included some bare-bones comments and explanations that you can hopefully refer back to at a later date if needed. Also, after each section there is an exercise that aims to reinforce the ideas just covered. At the very end of the document you’ll find a ‘cheat sheet’ that details some of the more common and useful functions and parameters used in R.

Well, that’s a brief synopsis of how things are planned to go, so good luck and I hope you enjoy it.


fn1. I thought about using chapters, but it just made me realise how large this ‘manual’ was becoming! This is as good a time as any to point out that there will be several footnotes – with some containing important information (as well as being a tribute to the great Terry Pratchett – a staunch proponent of the footnote).