...Down the Rabbit Hole...

“If you don't know where you are going any road can take you there”

That's my root partition you're hogging!

I recently noticed that my / (root) partition was not so slowly running out of space. This was somewhat alarming, as I’d given it a generous 23 GB of SSD space, which is surely overkill for a Debian install… Well, it turns out that the culprit was R, more specifically the default site-library located in /usr/local/lib/R/site-library:

    $ sudo du -h /usr/local/lib/R/site-library
    6.0G    /usr/local/lib/R/site-library/

That’s 6 GB of R packages dominating my root directory, thanks Bioconductor!

A nice how-to series for writing scientific communications

How-to’s of writing scientific communications

A very brief post to remind me to go back and read these in more detail! I’ve recently been asked to submit a couple of articles to Nature. One thing that seems to be an absolute must for even getting a look-in at a journal such as this is a well-written cover letter. Hence I went on the hunt for any tips, tricks and suggestions scattered around the internet.

Linux find examples

I just wanted to record a few short find commands that are very fast and useful. To find all the files modified on a specific date, in the case below the 13th May 2014:

    $ find . -type f -name "*" -newermt 2014-05-13 ! -newermt 2014-05-14
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/male_Y_SNPS.hh
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/male_Y_SNPS.log
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/male_Y_SNPS.map
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/male_Y_SNPS.nof
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/male_Y_SNPS.ped
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/plink.hh
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/plink.log
    ./Phd/NI_genetic_data/SNPloci_update_May-2013/bed/plink.nof
    ./Presentations/Genemappers_2014/conference_notes/example_expression_variance.png
    ./Presentations/Genemappers_2014/conference_notes/notes_20140513.txt

You can also append ls to the above to get more detailed information about the files:
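One way to get that extra detail, assuming GNU find, is its -ls action; this is only a sketch, and the full post may well use a different form. The temporary directory and the notes_a.txt / notes_b.txt file names are made up for illustration, and the timestamps are set with GNU touch so the example is reproducible.

```shell
# Sketch: list files modified on one calendar day, with ls-style details.
# Assumes GNU find and GNU touch; directory and file names are hypothetical.
demo_dir=$(mktemp -d)
touch -d "2014-05-13 10:00" "$demo_dir/notes_a.txt"
touch -d "2014-05-12 10:00" "$demo_dir/notes_b.txt"

# -newermt A ! -newermt B keeps files modified after A but not after B.
matches=$(find "$demo_dir" -type f -newermt 2014-05-13 ! -newermt 2014-05-14)
echo "$matches"

# The -ls action prints one detailed, ls-style line per matching file.
find "$demo_dir" -type f -newermt 2014-05-13 ! -newermt 2014-05-14 -ls

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Only notes_a.txt should be reported, since notes_b.txt was last modified the day before the window opens.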

Power calculations in R

Power calculations are an essential element of good experimental design. They can indicate the sample size needed to detect an effect of a given size with a given degree of confidence, or, when the sample size is fixed and small, how much confidence one can have and the smallest effect one could expect to detect. It may just be me, but there always seems to be something mystically ‘scary’ about power calculations, people’s perception of them, and how to go about them (I’m not ashamed to say that, until recently, I included myself in all three!).

Creating a QuasR target file using the terminal

I’ve just received some sequencing data from a couple of Illumina HiSeq lanes and need to run some QC and initial analyses for the miR RNA-seq experiment. As this is my introduction to RNA-seq, I’ve been doing a little reading and came across a very nice overview of using R/Bioconductor for RNA-seq analysis:

Thomas Girke - Analysis of RNA-Seq Data with R/Bioconductor

I’ve decided to give QuasR a go, and I’ll report back either here or on the blog with progress.

Useful Linux sort variants

I just wanted a quick reference to look back on. You have a list of chromosomes in a random order:

    $ cat chr.list
    chr3 chr6 chr2 chr11 chr5 chr7 chr9 chr10 chr1 chr4 chr8 chr12 chr20 chr22

Default sort:

    $ cat chr.list | sort
    chr1 chr10 chr11 chr12 chr2 chr20 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9

Sort with --version-sort:

    $ cat chr.list | sort --version-sort
    chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr20 chr22

Another way to get the same as above:
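Two candidates, offered as a sketch rather than as the method the full post goes on to show: the short option -V (GNU sort), and a numeric sort on a key that starts at the fourth character so the "chr" prefix is skipped. The variable names below are made up for illustration, and the list is written one name per line, which is what sort expects.

```shell
# Sketch: two alternatives to --version-sort, assuming GNU sort.
# chr_list holds the chromosome names from the example, one per line.
chr_list=$(printf '%s\n' chr3 chr6 chr2 chr11 chr5 chr7 chr9 chr10 \
                         chr1 chr4 chr8 chr12 chr20 chr22)

# -V is the short form of --version-sort.
short_form=$(printf '%s\n' "$chr_list" | sort -V)

# Numeric sort on a key starting at character 4 of field 1 (after "chr").
numeric_key=$(printf '%s\n' "$chr_list" | sort -k1.4n)

echo "$short_form"
echo "$numeric_key"
```

The numeric-key form only suits purely numbered names; labels such as chrX or chrY have no number after the prefix, so -V is probably the safer choice for real chromosome lists.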