<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>malariagen informatics</title>
	<atom:link href="http://informatics.malariagen.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://informatics.malariagen.net</link>
	<description>Software engineering, bioinformatics &#38; computer science for MalariaGEN</description>
	<lastBuildDate>Fri, 17 May 2013 10:26:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='informatics.malariagen.net' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/7c137ce952d612227e4c7c11af903906?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>malariagen informatics</title>
		<link>http://informatics.malariagen.net</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://informatics.malariagen.net/osd.xml" title="malariagen informatics" />
	<atom:link rel='hub' href='http://informatics.malariagen.net/?pushpress=hub'/>
		<item>
		<title>Migrating from P. falciparum reference genome 3D7 version 2 to version 3</title>
		<link>http://informatics.malariagen.net/2013/02/25/migrating-from-p-falciparum-reference-genome-3d7-version-2-to-version-3/</link>
		<comments>http://informatics.malariagen.net/2013/02/25/migrating-from-p-falciparum-reference-genome-3d7-version-2-to-version-3/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 11:42:02 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=454</guid>
		<description><![CDATA[I&#8217;ve created a liftover chain file to migrate genomic data from the &#8220;version 2&#8221; 3D7 reference genome to the newer &#8220;version 3&#8221; reference genome. You can download the chain file at the link below, as well as a binary for the liftOver program compiled for x86_64: 2to3.liftOver liftOver (x86_64 binary) To check it works, download [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve created a liftover chain file to migrate genomic data from the &#8220;version 2&#8221; 3D7 reference genome to the newer &#8220;version 3&#8221; reference genome. You can download the chain file at the links below, as well as a binary for the liftOver program compiled for x86_64:</p>
<ul>
<li><a href="https://malariagen-eu.s3.amazonaws.com/public/2013/02/pf-liftover/2to3.liftOver">2to3.liftOver</a></li>
<li><a href="https://malariagen-eu.s3.amazonaws.com/public/2013/02/pf-liftover/x86_64/liftOver">liftOver</a> (x86_64 binary)</li>
</ul>
<p>To check it works, download the above and <a href="https://malariagen-eu.s3.amazonaws.com/public/2013/02/pf-liftover/test.bed">test.bed</a> to a local directory then run:</p>
<pre class="brush: plain; title: ; notranslate">
chmod +x ./liftOver
./liftOver test.bed 2to3.liftOver test.v3.bed test.v3.unmapped
</pre>
<p>This should create the file <code>test.v3.bed</code> containing:</p>
<pre class="brush: plain; title: ; notranslate">
Pf3D7_07_v3	403620	403621	crt
</pre>
<p>Note that this expects chromosome names in the input to be like &#8220;Pf3D7_01&#8221;. If you&#8217;re using chromosome names like &#8220;MAL1&#8221;, you&#8217;ll need to convert them before applying the liftover to version 3.</p>
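<p>As a rough sketch of that conversion, the function below rewrites the first column of a tab-separated BED file. It assumes that &#8220;MAL<em>n</em>&#8221; corresponds to &#8220;Pf3D7_<em>nn</em>&#8221; with a zero-padded number, which you should check against your own data; the function name <code>mal_to_pf3d7</code> is made up for this example, and any other chromosome names are passed through unchanged.</p>
<pre class="brush: bash; title: ; notranslate">
# Hypothetical helper: rename MAL-style chromosomes in a tab-separated BED file.
# Assumes MAL1 maps to Pf3D7_01, MAL2 to Pf3D7_02, and so on.
mal_to_pf3d7() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 ~ /^MAL[0-9]+$/ { $1 = sprintf("Pf3D7_%02d", substr($1, 4)) }
         { print }' "$1"
}
</pre>
<p>Calling <code>mal_to_pf3d7 input.bed</code> writes the renamed records to standard output, which can be redirected to a new file and then passed to <code>liftOver</code>.</p>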
<p><span id="more-454"></span></p>
<p>To build the <code>liftOver</code> binary (and the other programs that are needed to create the liftover file) I did the following on Ubuntu 12.10:</p>
<pre class="brush: bash; title: ; notranslate">
wget http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
unzip jksrc.zip -d jksrc
cd jksrc/kent/src  # assumed location of the top-level makefile inside the unzipped archive
export MACHTYPE=x86_64
mkdir -p ~/bin/$MACHTYPE
export PATH=~/bin/$MACHTYPE:$PATH
sudo apt-get install libmysqlclient-dev
export MYSQLINC=/usr/include/mysql
export MYSQLLIBS=&quot;/usr/lib/x86_64-linux-gnu/libmysqlclient.so.18 -lz&quot;
make
</pre>
<p>This didn&#8217;t completely build, but it got far enough to build the binaries in ~/bin/x86_64/ needed to create the liftover chain file.</p>
<p>To create the liftover chain file I followed instructions found at these links:</p>
<ul>
<li><a href="http://genomewiki.ucsc.edu/index.php/LiftOver_Howto">http://genomewiki.ucsc.edu/index.php/LiftOver_Howto</a></li>
<li><a href="http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver">http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver</a></li>
<li><a href="http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt">http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt</a></li>
</ul>
<p>This liftover was built using the following versions of the 3D7 reference genome:</p>
<ul>
<li>&#8220;version 2&#8221;: ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/3D7/3D7.latest_version/September_2011/</li>
<li>&#8220;version 3&#8221;: ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/3D7/3D7.latest_version/version3/September_2012/</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2013/02/25/migrating-from-p-falciparum-reference-genome-3d7-version-2-to-version-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>Load data from a VCF file into numpy arrays</title>
		<link>http://informatics.malariagen.net/2013/02/22/load-data-from-a-vcf-file-into-numpy-arrays/</link>
		<comments>http://informatics.malariagen.net/2013/02/22/load-data-from-a-vcf-file-into-numpy-arrays/#comments</comments>
		<pubDate>Fri, 22 Feb 2013 12:09:28 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=408</guid>
		<description><![CDATA[I&#8217;ve recently been doing some analysis of SNPs and indels from the MalariaGEN P. falciparum genetic crosses project, and have found it convenient to load variant call data from VCF files into numpy arrays to compute summary statistics, make plots, etc. Attempt 1: vcfarray I initially wrote a small Python library for loading the arrays [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve recently been doing some analysis of SNPs and indels from the <a href="http://www.malariagen.net/node/45">MalariaGEN P. falciparum genetic crosses project</a>, and have found it convenient to load variant call data from VCF files into <a href="http://www.scipy.org/Tentative_NumPy_Tutorial">numpy arrays</a> to compute summary statistics, make plots, etc.</p>
<p><strong>Attempt 1: <a href="https://github.com/alimanfoo/pyvcfarray">vcfarray</a></strong></p>
<p>I initially wrote a small Python library for loading the arrays based on the excellent <a href="http://pyvcf.readthedocs.org/en/latest/">PyVCF</a> module. This works well but is a little slow, and profiling showed that the VCF parsing was the bottleneck, so I went in search of a C/C++ library I could use from Cython&#8230;</p>
<p><strong>Attempt 2: <a href="https://github.com/alimanfoo/vcfnp">vcfnp</a></strong></p>
<p><a href="https://github.com/ekg/vcflib">Erik Garrison&#8217;s vcflib library</a> provides a nice C++ API for parsing a VCF file, so I had a go at writing a <a href="https://github.com/alimanfoo/vcfnp">Cython module</a> based on that. Performance is better: I get roughly a 2-4X speed-up over the PyVCF-based implementation, although I was hoping for an order of magnitude&#8230; I guess string parsing is just relatively slow, even in C/C++, and we should be using BCF2.</p>
<p>To install and try vcfnp for yourself, do:</p>
<pre class="brush: bash; title: ; notranslate">
pip install vcfnp
</pre>
<p>See the <a href="https://github.com/alimanfoo/vcfnp">vcfnp</a> README for some examples of usage.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2013/02/22/load-data-from-a-vcf-file-into-numpy-arrays/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>A tour of some VCF Visualisation software</title>
		<link>http://informatics.malariagen.net/2012/08/22/a-tour-of-some-vcf-visualisation-software/</link>
		<comments>http://informatics.malariagen.net/2012/08/22/a-tour-of-some-vcf-visualisation-software/#comments</comments>
		<pubDate>Wed, 22 Aug 2012 09:51:39 +0000</pubDate>
		<dc:creator>Ben Jeffery</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[bamseek]]></category>
		<category><![CDATA[igv]]></category>
		<category><![CDATA[java web start]]></category>
		<category><![CDATA[snps]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[varb]]></category>
		<category><![CDATA[vcf]]></category>
		<category><![CDATA[vcf files]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=375</guid>
		<description><![CDATA[Visualisation software round-up A common need that we have is to directly view, or interpretively visualise information (both numeric and categoric) that is attached to a particular point on genomic sequence, often in relation to some attributes of that sequence. The number of file formats and tools that have been written for doing this surprised me when I first looked. [&#8230;]]]></description>
				<content:encoded><![CDATA[<h4 style="text-align:justify;">Visualisation software round-up</h4>
<p style="text-align:justify;">A common need that we have is to directly view, or interpretively visualise, information (both numeric and categorical) that is attached to a particular point on a genomic sequence, often in relation to some attributes of that sequence. The number of file formats and tools that have been written for doing this surprised me when I first looked. This post is the first step in looking at what is out there. For this purpose I&#8217;m limiting myself to looking at tools that read the popular <a href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41" target="_blank">VCF</a> format (not to be confused with <a href="http://en.wikipedia.org/wiki/VCard" target="_blank">vCard</a>). This will only scratch the surface &#8211; a quick look at <a href="http://seqanswers.com/wiki/Software">this list</a> shows there is more bioinformatics software than you can shake a double helix at.</p>
<h5 style="text-align:justify;">IGV &#8211; &#8216;Integrative Genomics Viewer&#8217;</h5>
<p style="text-align:justify;"><a href="http://www.broadinstitute.org/igv/" target="_blank">IGV</a> is a Java app that loads a veritable <a href="http://www.broadinstitute.org/software/igv/FileFormats" target="_blank">kitchen sink of formats</a>. It is integrated with the <a href="http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136112.html" target="_blank">Java Web Start</a> system that allows launching of a Java app with &#8216;one-click&#8217; from your web browser. Files can be loaded from disk or over http/ftp/<a href="http://www.biodas.org/wiki/Main_Page" target="_blank">DAS</a> or from a curated set that the app has metadata for. The state of the app on startup can be specified by command line args or XML config. Combined with a <a href="http://www.broadinstitute.org/software/igv/ControlIGV" target="_blank">PHP script</a> that cooks up custom Web Start files, one can actually link to a specific view on a specific dataset (if that data is public). This gives web-app style linking, albeit with a bit of a wait and no access control.</p>
<p style="text-align:justify;">I&#8217;m only interested at the moment in using IGV to look at SNP data from VCF files &#8211; it does much more than this, for example displaying reads from BAM files. Before loading you need to pre-process the VCF to create an index using igvtools, which is accessed from the &#8216;File&#8217; menu in IGV. Indexing our VCF files originally failed &#8211; IGV complained that they did not comply with the VCF4.0 spec as they had whitespace in the INFO field. I confirmed this with <a href="http://vcftools.sourceforge.net/" target="_blank">VCF tools</a> &#8211; in fact the error message from IGV was more instructive as it had the line number of the problem. This is definitely the fault of our systems and something I hope we can eradicate through better automated testing and persuading people that sticking to standards is in their best interest. For now I just fixed this by truncating the file before the problem.</p>
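<p style="text-align:justify;">A quick way to locate this kind of problem without a viewer is to scan the INFO column directly. The sketch below is not what IGV or VCF tools do internally; it simply assumes a tab-separated VCF with INFO in column 8 and reports records where that column contains a space. The function name <code>find_info_spaces</code> is made up for this example.</p>
<pre class="brush: bash; title: ; notranslate">
# Hypothetical helper: list VCF records whose INFO column (column 8) contains a space.
# Prints the line number within the file, then the offending INFO value.
find_info_spaces() {
    awk 'BEGIN { FS = "\t" }
         /^#/ { next }
         $8 ~ / / { print NR ": " $8 }' "$1"
}
</pre>
<p style="text-align:justify;">Running <code>find_info_spaces calls.vcf</code> (with your own file name) prints nothing if the INFO column is clean. Because tabs are the column separator, only spaces can be detected this way.</p>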
<p style="text-align:justify;">Firstly one picks the reference genome onto which the VCF file will be mapped. IGV comes with quite a selection available in its curated set, or one can be loaded. The VCF will need the same chromosome names or it will not map. For example I had to pick an old Plasmodium reference as our VCF had the old &#8216;MAL1&#8217; style chromosome names. IGV is very flexible in what it will load &#8211; one could add extra columns to the VCF and have them displayed alongside.</p>
<p style="text-align:justify;"><a href="http://malariageninformatics.files.wordpress.com/2012/08/igv21.png"><img class="alignright size-full wp-image-377" title="IGV Interface" src="http://malariageninformatics.files.wordpress.com/2012/08/igv21.png?w=614" alt="IGV Interface"   /></a>Once loaded one is presented with a layout with base as the X-axis and sample as the Y. You can drag around and use the arrow keys to move left and right, or use the stylised scroll bar at the top. I couldn&#8217;t use the mouse wheel to zoom <img src='http://s0.wp.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  but you can use Ctrl-+.  The app keeps a record of your locations which you can navigate with the forward and back buttons. You can skip to a point or gene label using the search box at the top &#8211; this auto-completes and will give you a list to pick from if it gets more than one match. I managed to make the app hang by searching for a single letter though.</p>
<p style="text-align:justify;"><a href="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-1.png"><img class="alignright  wp-image-382" title="IGV Pop-up" src="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-1.png?w=164&#038;h=342" alt="IGV Pop-up" width="164" height="342" /></a>Hovering over any point gives information about that point in a window that disappears as soon as you move away &#8211; making you feel like you&#8217;re <a href="http://www.youtube.com/watch?v=LQY6Y96paZo#t=10m40s" target="_blank">playing some kind of steady hand game</a>. This also means that you can&#8217;t cut and paste that info out of it. There is a toolbar button that replaces the pop-up with a separate window with text, but I couldn&#8217;t copy out of that either. Right-clicking brings up a context menu that lets you sort by the selection or change how it is displayed, for example switching between colouring for allele or genotype. As far as I can see the display is always relative to the reference genome. Although you can mark regions of interest, you can&#8217;t pick a set of SNP positions and then just view those without the intervening bases, or order them by any criteria but genomic position. Above the individual samples is a summary section which for each position shows a small bar coloured in proportion to the samples&#8217; genotype distribution.</p>
<p style="text-align:justify;">The code is on <a href="https://github.com/broadinstitute/IGV" target="_blank">github</a> (yay!) and appears to be under active development. In summary IGV is a flexible tool for viewing data, but does not offer any tools specifically for exploring variation through SNPs as in our use case.</p>
<h5 style="text-align:justify;">VARiation Browser</h5>
<p style="text-align:justify;"><a href="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-3.png"><img class="alignright size-full wp-image-386" title="Screenshot-3" src="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-3.png?w=614" alt=""   /></a><a href="http://software.markdpreston.com/varb" target="_blank">VARB</a> is a C++/Qt app that only views VCF files. It is distributed as a binary but with shared linking to Qt so I had to &#8216;apt-get libqt&#8217; before it would start. The source is distributed as a zip file so I can&#8217;t tell if it is under active development or submit changes as anything but a patchfile. VARB requires three files: a reference in <a href="http://en.wikipedia.org/wiki/FASTA_format" target="_blank">FASTA</a> format, an annotation file in <a href="http://www.sanger.ac.uk/resources/software/gff/" target="_blank">GFF</a> format and finally the VCF. I used the FASTA from <a href="ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/3D7/3D7.version2.1.5/" target="_blank">here</a> and the GFF from the VARB example files. VARB also failed to load our malformed VCF, but did not provide any clue beyond saying that the file was malformed.</p>
<p style="text-align:justify;">VARB offers the same kind of navigation as IGV, again with no mouse-wheel zoom, and strangely zooming is relative to the left edge. SNPs can disappear and re-appear as one zooms, as the rasterisation algorithm doesn&#8217;t cope with sparse SNPs on zoomed out regions. The controls and drawing appear to run in the same thread, which makes navigation hard. There is an annotation search, but with no auto-complete. The selection tool was much more useful, however, with the details coming up in the sidebar and easily copied, as clicking makes the details stick in the window until cleared.</p>
<p style="text-align:justify;">As well as the information from the VCF, VARB adds some analytical output at the bottom of the window. This is fixed to the GC density, relative variant density, <a href="http://en.wikipedia.org/wiki/Fixation_index">Fst</a> and <a href="http://en.wikipedia.org/wiki/Tajima's_D">Tajima&#8217;s D</a>; these are updated as one changes the quality, depth and SNP type filters on the left. The windows used for calculating these are fixed and zoom independent. Samples can be grouped, and this grouping is used for the Fst calculation &#8211; although I&#8217;m not sure how it works out Fst for more than one group. As in IGV there is no way to view the SNPs or samples in any way but sequence order and with separation. The colours can be re-assigned &#8211; I found that setting the reference allele colour to white let me see the variation much more clearly. With a few tweaks VARB could be a very usable SNP browser.</p>
<h5 style="text-align:justify;">BAMSeek</h5>
<p style="text-align:justify;"><a href="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-home-benj-vcf_viewers-hb3xdd2-qcplussamples-0-1-vcf.png"><img class="alignright size-full wp-image-390" title="Screenshot--home-benj-vcf_viewers-Hb3xDd2-qcPlusSamples-0.1.vcf" src="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-home-benj-vcf_viewers-hb3xdd2-qcplussamples-0-1-vcf.png?w=614" alt=""   /></a><a href="http://code.google.com/p/bamseek/">BAMSeek</a> isn&#8217;t so much a visualisation tool as it is a file inspection tool. It is distributed as a JAR file with source on Google Code. It supports quite a few formats and is primarily designed for loading large files as it indexes, and then pages, the file as needed.  Anyone who has used a normal text editor will know the pain of large files (I have found <a href="http://www.sublimetext.com/">Sublime Text</a> handles them well though, after a slightly long loading time).  BAMSeek successfully loaded our off-spec VCF file &#8211; probably because it does not fully parse it in order to display its textual content. The VCF file is simply displayed in a table with the header in a separate section. The paging is done by having actual pages that you flip through with a control on the bottom. The line numbers on the left are relative to the page &#8211; which is a little frustrating as to get the actual line number you have to do ((page-1)*(rows_per_page)+line_no) in your head. Hovering over a cell gives you the information formatted vertically. There&#8217;s not much more to it than that!</p>
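<p style="text-align:justify;">To save doing that sum in your head, the formula can be evaluated with shell arithmetic. The three values below are made up for illustration; substitute the page you are on, the rows per page shown by BAMSeek and the line number within the page.</p>
<pre class="brush: bash; title: ; notranslate">
# Hypothetical values: current page, rows shown per page, line number within the page.
page=3
rows_per_page=100
line_no=42
# Absolute line number: (page - 1) * rows_per_page + line_no
echo $(( (page - 1) * rows_per_page + line_no ))
</pre>
<p style="text-align:justify;">With these values the two full pages before page 3 contribute 200 lines, so the command prints 242.</p>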
<p style="text-align:justify;">Next time we&#8217;ll look at some web-based apps that do a similar job.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/08/22/a-tour-of-some-vcf-visualisation-software/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/972b7c8ae28068f4b6e4b888b74b2ab6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">benjeffery</media:title>
		</media:content>

		<media:content url="http://malariageninformatics.files.wordpress.com/2012/08/igv21.png" medium="image">
			<media:title type="html">IGV Interface</media:title>
		</media:content>

		<media:content url="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-1.png" medium="image">
			<media:title type="html">IGV Pop-up</media:title>
		</media:content>

		<media:content url="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-3.png" medium="image">
			<media:title type="html">Screenshot-3</media:title>
		</media:content>

		<media:content url="http://malariageninformatics.files.wordpress.com/2012/08/screenshot-home-benj-vcf_viewers-hb3xdd2-qcplussamples-0-1-vcf.png" medium="image">
			<media:title type="html">Screenshot--home-benj-vcf_viewers-Hb3xDd2-qcPlusSamples-0.1.vcf</media:title>
		</media:content>
	</item>
		<item>
		<title>RDP VirtualBox without the proprietary Oracle extension pack</title>
		<link>http://informatics.malariagen.net/2012/04/11/rdp-virtualbox-without-the-proprietary-oracle-extension-pack/</link>
		<comments>http://informatics.malariagen.net/2012/04/11/rdp-virtualbox-without-the-proprietary-oracle-extension-pack/#comments</comments>
		<pubDate>Wed, 11 Apr 2012 15:58:22 +0000</pubDate>
		<dc:creator>Robert Hutton</dc:creator>
				<category><![CDATA[HOWTOs]]></category>
		<category><![CDATA[System Administration]]></category>
		<category><![CDATA[console]]></category>
		<category><![CDATA[consolidation]]></category>
		<category><![CDATA[extension]]></category>
		<category><![CDATA[headless]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[oracle]]></category>
		<category><![CDATA[ose]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[RDP]]></category>
		<category><![CDATA[screen]]></category>
		<category><![CDATA[VBoxHeadless]]></category>
		<category><![CDATA[VBoxManage]]></category>
		<category><![CDATA[virtual]]></category>
		<category><![CDATA[virtualbox]]></category>
		<category><![CDATA[vm]]></category>
		<category><![CDATA[vnc]]></category>
		<category><![CDATA[windows]]></category>
		<category><![CDATA[xrdp]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=355</guid>
		<description><![CDATA[These days, virtualisation is all the rage. The various competing virtualisation products have reached a level of maturity where they can be reliably used for server consolidation. VirtualBox is one of the easiest to use, most featureful programs available in this space and with the ability to run on many different OSes on hardware with [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>These days, virtualisation is all the rage.  The various competing virtualisation products have reached a level of maturity where they can be reliably used for server consolidation.  VirtualBox is one of the easiest to use, most featureful programs available in this space and with the ability to run on many different OSes on hardware with or without VM extensions, it is also one of the most popular.  However, there is one wrinkle when it comes to using it for server consolidation: the proprietary RDP/USB2 extension pack.</p>
<p>The conventional wisdom when running a headless server with VirtualBox is that you need to install this <a href="https://www.virtualbox.org/wiki/Downloads">proprietary extension pack from Oracle</a>.  This is fine until you want to use the server in production: as the <a href="https://www.virtualbox.org/wiki/VirtualBox_PUEL">PUEL</a> only covers you for personal use and evaluation, you must <a href="http://www.oracle.com/us/technologies/virtualization/oraclevm/061976.html">purchase licenses</a>.  You can either pay £34 per user or £670 per &#8220;socket&#8221; (which has quite a <a href="http://www.orafaq.com/wiki/Oracle_Licensing#Standard_Edition_Per-socket_licensing">convoluted definition</a>).  This gets you USB2 and RDP support.</p>
<p>However, there is another way, at least when it comes to RDP support.<span id="more-355"></span>  <a href="http://www.virtualbox.org/manual/ch07.html">Chapter 7 of the VirtualBox manual</a> covers running virtual machines on a remote host, either with the <code>VBoxManage</code> command (which offers similar functionality to the VirtualBox GUI) or the <code>VBoxHeadless</code> command, which appears to be the backend binary that <code>VBoxManage</code> calls to do the actual work.  Now, chapter 7 covers in detail how to connect to remote VMs using the proprietary extensions, but there&#8217;s one feature that they conveniently failed to mention: the built-in VNC server.</p>
<pre class="brush: plain; title: ; notranslate">man VBoxHeadless
VBOXHEADLESS(1)            User Commands             VBOXHEADLESS(1)

NAME
       VBoxHeadless - x86 virtualization solution

DESCRIPTION
    Oracle  VM  VirtualBox Headless Interface (C) 2008-2011 Oracle
    Corporation All rights reserved.

Usage:
    -s, -startvm, --startvm &lt;name|uuid&gt;
           Start given VM (required argument)
    -n, --vnc
           Enable the built in VNC server
    -m, --vncport &lt;port&gt;
           TCP port number to use for the VNC server
    -o, --vncpass &lt;pw&gt;
           Set the VNC server password
    -v, -vrde, --vrde on|off|config
           Enable (default) or disable the VRDE  server  or  don't
           change the setting
    -e,  -vrdeproperty,  --vrdeproperty  &lt;name=[value]&gt;
    Set a VRDE property:
           &quot;TCP/Ports&quot; - comma-separated list of  ports  the  VRDE
           server can bind to. Use a dash between two port numbers
           to specify a range &quot;TCP/Address&quot;  -  interface  IP  the
           VRDE server will bind to
    -c, -capture, --capture
           Record the VM screen output to a file
    -w, --width
           Frame width when recording
    -h, --height
           Frame height when recording
    -r, --bitrate
           Recording bit rate when recording
    -f, --filename
           File  name when recording.  The codec used will be cho‐
           sen based on the file extension

VBoxHeadless                January 2011             VBOXHEADLESS(1)</pre>
<p>Right!  So we can start a virtual machine and forward its root console (or main video or out-of-band console or whatever you want to call it) over the network with a VNC server.  In my example I&#8217;ll use screen to keep my VMs running when I log out.  I&#8217;ll start a couple of VMs as an example:</p>
<pre class="brush: bash; title: ; notranslate">screen
VBoxHeadless --startvm 'Ubuntu' --vnc --vncport 5900
# hit &quot;ctrl-a c&quot; to open a new terminal within screen
VBoxHeadless --startvm 'XP1' --vnc --vncport 5901 --vncpass vnc2xrdp</pre>
<p>Great!  We don&#8217;t need the proprietary extension pack to do this.  However, there are a few drawbacks to this approach:</p>
<ul>
<li>VNC sucks over slow network connections, it&#8217;d be much nicer to use the more modern RDP protocol</li>
<li>You have to keep the command running so you have to use <code>screen</code> or <code>nohup</code> or equivalent, which is a bit less convenient than using <code>VBoxManage</code></li>
<li>If the VNC server crashes (which I&#8217;ve had happen only once so far in testing, when changing screen resolution in a Windows XP guest), the whole VM goes down with it</li>
<li>You need to make sure you only bind one VNC server to each port.  As far as I can tell if you try to bind a second one to a port that&#8217;s already in use, the VM still starts up but you have no way of interacting with it!</li>
</ul>
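<p>One way to reduce the risk of the last problem is to hand out ports from a counter rather than typing them by hand. The helper below is only a sketch: the name <code>assign_vnc_ports</code> is made up, and it only guarantees that the VMs you list get distinct ports, not that those ports are free of other services on the machine.</p>
<pre class="brush: bash; title: ; notranslate">
# Hypothetical helper: print one 'port name' line per VM, counting up from a
# base port, so that no two VMs in the list share a VNC port.
assign_vnc_ports() {
    base=$1
    shift
    i=0
    for vm in "$@"; do
        echo "$((base + i)) $vm"
        i=$((i + 1))
    done
}
</pre>
<p>For example, <code>assign_vnc_ports 5900 Ubuntu XP1</code> prints <code>5900 Ubuntu</code> and <code>5901 XP1</code>, which are the values to pass to <code>--vncport</code> and <code>--startvm</code> in each screen window.</p>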
<p>We can actually work around the first limitation, by using the <code>xrdp</code> program to &#8220;translate&#8221; the VNC protocol into RDP.  I&#8217;m on Ubuntu, so I have the luxury of installing xrdp the easy way.  On the VirtualBox server machine:</p>
<pre class="brush: bash; title: ; notranslate">sudo apt-get install xrdp</pre>
<p>Now we configure it to use the existing VNC servers that we previously spawned with <code>VBoxHeadless</code>.  One neat thing here is that xrdp uses a single RDP port to manage multiple VNC connections:</p>
<p>/etc/xrdp/xrdp.ini</p>
<pre class="brush: plain; title: ; notranslate">[globals]
bitmap_cache=yes
bitmap_compression=yes
port=3389
crypt_level=low
channel_code=1

[xrdp1]
name=VBox-Ubuntu
lib=libvnc.so
ip=127.0.0.1
port=5900

[xrdp2]
name=VBox-XP1
lib=libvnc.so
username=
password=vnc2xrdp
ip=127.0.0.1
port=5901</pre>
<p>So we have two VirtualBox VMs running, Ubuntu and XP1.  In the example above, I started the Ubuntu VNC server without a password, so I&#8217;ve left out the username/password entries.  The XP1 connection is protected with the password <code>vnc2xrdp</code>.  You can also use the value <code>ask</code> and xrdp will prompt for a username/password for connecting to VNC.  Note that VNC passwords are generally insecure, so it&#8217;d probably be best to protect the VNC ports with a firewall.  It doesn&#8217;t appear to be possible to bind the VNC server to only the loopback device (at least judging from the man page above).</p>
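<p>If you run more than a couple of VMs, the per-VM sections can be generated rather than typed.  This is only a sketch: <code>xrdp_section</code> is a hypothetical helper that prints a section in the same layout as the file above, leaving out the username/password entries when no password is given.</p>

```shell
# xrdp_section INDEX NAME PORT [PASSWORD]
# Hypothetical helper: print one xrdp.ini section that points at a local
# VNC server, mirroring the hand-written sections shown above.
xrdp_section() {
    local index="$1" name="$2" port="$3" password="${4-}"
    printf '[xrdp%s]\nname=%s\nlib=libvnc.so\n' "$index" "$name"
    if [ -n "$password" ]; then
        # Only emit credentials for password-protected VNC servers.
        printf 'username=\npassword=%s\n' "$password"
    fi
    printf 'ip=127.0.0.1\nport=%s\n' "$port"
}
```

<p>Calling <code>xrdp_section 2 VBox-XP1 5901 vnc2xrdp</code> prints the <code>[xrdp2]</code> section from the example; the output would still need to be added to <code>/etc/xrdp/xrdp.ini</code> after the <code>[globals]</code> section by hand.</p>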
<p>So now all that&#8217;s left to do is to connect to the RDP port using one of the myriad RDP clients for Linux (I&#8217;m using Remmina, but there are <a href="http://en.wikipedia.org/wiki/Remote_Desktop_Protocol#Non-Microsoft_implementations">heaps of options</a>).  You can then choose the VNC connection you want xrdp to connect to and you&#8217;re away!</p>
<div id="attachment_360" class="wp-caption alignnone" style="width: 624px"><a href="http://malariageninformatics.files.wordpress.com/2012/04/xrdp.png"><img src="http://malariageninformatics.files.wordpress.com/2012/04/xrdp.png?w=614&#038;h=375" alt="xrdp login screen" title="xrdp" width="614" height="375" class="size-full wp-image-360" /></a><p class="wp-caption-text">xrdp login screen</p></div>
<p>Of course, SSH local port forwarding is your friend if you&#8217;re doing any of this through firewalls or over insecure connections.  Remmina actually includes this functionality, or you can forward the remote port to your local machine with something like:</p>
<pre class="brush: bash; title: ; notranslate">ssh -L 3389:localhost:3389 vboxservermachine</pre>
<p>So there you are, RDP connections to remote VirtualBox VMs without the proprietary Oracle extensions; all free software!</p>
<p>Have fun. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/04/11/rdp-virtualbox-without-the-proprietary-oracle-extension-pack/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b15c9886dce5949ef7e6c8543db80e32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rwh86</media:title>
		</media:content>

		<media:content url="http://malariageninformatics.files.wordpress.com/2012/04/xrdp.png" medium="image">
			<media:title type="html">xrdp</media:title>
		</media:content>
	</item>
		<item>
		<title>How to build MySQL Workbench on Ubuntu Precise (pre-release)</title>
		<link>http://informatics.malariagen.net/2012/04/11/how-to-build-mysql-workbench-on-ubuntu-precise-pre-release/</link>
		<comments>http://informatics.malariagen.net/2012/04/11/how-to-build-mysql-workbench-on-ubuntu-precise-pre-release/#comments</comments>
		<pubDate>Wed, 11 Apr 2012 15:46:13 +0000</pubDate>
		<dc:creator>Robert Hutton</dc:creator>
				<category><![CDATA[HOWTOs]]></category>
		<category><![CDATA[System Administration]]></category>
		<category><![CDATA[12.04]]></category>
		<category><![CDATA[build]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[mysqlworkbench]]></category>
		<category><![CDATA[pangolin]]></category>
		<category><![CDATA[precise]]></category>
		<category><![CDATA[source]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[workbench]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=348</guid>
		<description><![CDATA[Update 2012-04-25: mysql workbench has now appeared in the universe package archive. You should be able to install it with a simple: Read on if you still want to compile from source. Right now (2012-04-04), Ubuntu 12.04 hasn&#8217;t been released yet, and so there is no binary package from Oracle of MySQL Workbench for Precise. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><strong>Update 2012-04-25</strong>: MySQL Workbench has now appeared in the <a href="http://packages.ubuntu.com/precise/mysql-workbench">universe package archive</a>.  You should be able to install it with a simple:</p>
<pre class="brush: bash; title: ; notranslate">sudo apt-get install mysql-workbench</pre>
<p>Read on if you still want to compile from source.</p>
<p>Right now (2012-04-04), Ubuntu 12.04 hasn&#8217;t been released yet, and so there is no binary package of MySQL Workbench from Oracle for Precise.  I managed to get the <a href="http://dev.mysql.com/downloads/workbench/">MySQL Workbench binaries for Oneiric</a> to run by manually installing libzip1_0.9.3-1_amd64.deb from Oneiric, but this wasn&#8217;t stable (it crashed as soon as I tried to run an SQL query).</p>
<p>So I decided to build from source.  Here&#8217;s how I did it<span id="more-348"></span>:</p>
<pre class="brush: bash; title: ; notranslate">wget http://www.mirrorservice.org/sites/ftp.mysql.com/Downloads/MySQLGUITools/mysql-workbench-gpl-5.2.38-src.tar.gz
md5sum mysql-workbench-gpl-5.2.38-src.tar.gz
# should be cd2a0cec9dffd5465b6999f5d9c8de78 (from http://dev.mysql.com/downloads/workbench/#downloads).
tar xvzf mysql-workbench-gpl-5.2.38-src.tar.gz
cd mysql-workbench-gpl-5.2.38-src
# from http://bugs.mysql.com/bug.php?id=63898
fgrep -rlZ pkglib_DATA --include Makefile.am . | xargs -0 sed -i 's/pkglib_DATA/pkgdata_DATA/g'
# from https://bugzilla.redhat.com/show_bug.cgi?id=750023
vim ./modules/db.mysql.sqlparser/src/mysql_sql_parser_fe.cpp</pre>
<p>Change line 23 from:</p>
<pre class="brush: cpp; title: ; notranslate">#include &lt;glib/gunicode.h&gt;</pre>
<p>to</p>
<pre class="brush: cpp; title: ; notranslate">#include &lt;glib.h&gt;</pre>
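<p>If you would rather not open an editor, the same change can be sketched non-interactively.  This assumes GNU <code>sed</code> (as used for the Makefile.am fix above); <code>fix_glib_include</code> is just an illustrative name.  It rewrites only the header path, so the rest of the include line is left alone.</p>

```shell
# fix_glib_include FILE
# Illustrative helper: replace the glib/gunicode.h header path with glib.h
# in place, leaving the rest of each line untouched.
fix_glib_include() {
    sed -i 's|glib/gunicode\.h|glib.h|' "$1"
}
```

<p>It would be called as <code>fix_glib_include ./modules/db.mysql.sqlparser/src/mysql_sql_parser_fe.cpp</code> from the top of the source tree.</p>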
<pre class="brush: bash; title: ; notranslate">sudo apt-get update
# from http://ubuntuforums.org/showthread.php?t=1792874
sudo apt-get install build-essential autoconf automake libtool libzip-dev libxml2-dev libsigc++-2.0-dev libglade2-dev libgtkmm-2.4-dev libgl1-mesa-dev libmysqlclient-dev uuid-dev liblua5.1-dev libpcre3-dev g++ libgnome2-dev libgtk2.0-dev libpango1.0-dev libcairo2-dev libsqlite3-dev python-dev libboost-dev libctemplate-dev
./autogen.sh
# I use -j4 below to use all four of my CPUs, set this appropriately for your setup.
make -j 4 install DESTDIR=&quot;$HOME/mysql-workbench&quot;
~/mysql-workbench/usr/local/bin/mysql-workbench</pre>
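<p>The manual checksum comparison near the top of the build steps can also be scripted so that a bad download stops you before unpacking.  A minimal sketch, assuming <code>md5sum</code> from GNU coreutils; <code>verify_md5</code> is a hypothetical helper name:</p>

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeed only if FILE hashes to EXPECTED_MD5.
verify_md5() {
    local file="$1" expected="$2" actual
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "MD5 mismatch for $file: expected $expected, got $actual" > /dev/stderr
    return 1
}
```

<p>For the tarball above you would run <code>verify_md5 mysql-workbench-gpl-5.2.38-src.tar.gz cd2a0cec9dffd5465b6999f5d9c8de78</code> and only go on to <code>tar xvzf</code> if it succeeds.</p>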
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/04/11/how-to-build-mysql-workbench-on-ubuntu-precise-pre-release/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b15c9886dce5949ef7e6c8543db80e32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rwh86</media:title>
		</media:content>
	</item>
		<item>
		<title>Handling Failures and Rerunning Tasks in Sun Grid Engine Array Jobs</title>
		<link>http://informatics.malariagen.net/2012/04/11/handling-failures-and-rerunning-tasks-in-sun-grid-engine-array-jobs/</link>
		<comments>http://informatics.malariagen.net/2012/04/11/handling-failures-and-rerunning-tasks-in-sun-grid-engine-array-jobs/#comments</comments>
		<pubDate>Wed, 11 Apr 2012 15:18:40 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[samtools]]></category>
		<category><![CDATA[sge]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=344</guid>
		<description><![CDATA[We use Sun Grid Engine here at WTCHG for managing our compute resources. Many of the analyses I&#8217;m doing are best run as array jobs, which generally works very well, but sometimes one or more tasks will fail for one reason or another, and I&#8217;ve been casting around for best practice when it comes to [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We use Sun Grid Engine here at WTCHG for managing our compute resources. Many of the analyses I&#8217;m doing are best run as array jobs, which generally works very well, but sometimes one or more tasks will fail for one reason or another, and I&#8217;ve been casting around for best practice when it comes to (a) verifying which tasks succeeded and which failed, and (b) re-running failed tasks.</p>
<p>I found a nice <a href="http://www.warelab.org/blog/?p=328">post by Shiran Pasternak on resubmitting failed SGE array tasks</a>; however, Shiran doesn&#8217;t say how he determined which tasks had failed, and the set of tasks to rerun is specified manually. I have thousands of tasks in each array job, so I really need an automated way of determining the success or failure of each task and rerunning those that failed.</p>
<p>I came up with the following pattern. </p>
<p><span id="more-344"></span></p>
<p>Say, for example, I want to run samtools flagstat over a set of several hundred BAM files. I create two scripts. The first script &#8211; flagstat.sh &#8211; just wraps the call to samtools flagstat:</p>
<pre class="brush: bash; title: ; notranslate">
#!/bin/bash

#
# This script generates summary statistics using samtools flagstat for
# a single sample.
#

# debug
set -x

# main executable
SAMTOOLS=/path/to/samtools

# assume first argument is sample ID
SAMPLE=$1

# path to BAM file
BAMFILE=/path/to/${SAMPLE}.bam

# assume second argument is location of output file
OUTFILE=$2

# do the work
&quot;$SAMTOOLS&quot; flagstat &quot;$BAMFILE&quot; &gt; &quot;$OUTFILE&quot;
</pre>
<p>The second script &#8211; flagstat.job.sh &#8211; is an SGE job script:</p>
<pre class="brush: bash; title: ; notranslate">
#!/bin/bash

# 
# Job script wrapper for flagstat.sh
#

# SGE options
#$ -S /bin/bash
#$ -N pf09_flagstat
#$ -m beas
#$ -M alimanfoo@googlemail.com
#$ -cwd
#$ -l vf=40M
#$ -l h_vmem=100M
#$ -l h_rt=1:59:0
#$ -t 1-428
#$ -o history
#$ -j y

# debug
set -x

# main script
MAIN=./flagstat.sh

# log file
LOG=log

# sample manifest - text file with one sample ID per line
MANIFEST=/path/to/samples.txt

# dereference task ID to sample ID
SAMPLE=`awk &quot;NR==${SGE_TASK_ID}&quot; ${MANIFEST}`

# expected location of output and MD5 verification files
OUTFILE=outputs/${SAMPLE}.flagstat
MD5FILE=${OUTFILE}.md5

# check if MD5 file already exists and matches output file                                                       
if [[ -f $OUTFILE &amp;&amp; -f $MD5FILE &amp;&amp; `md5sum ${OUTFILE} | cut -f1 -d&quot; &quot;` = `cat ${MD5FILE} | cut -f1 -d&quot; &quot;` ]]; then

    # task was previously run successfully, skip this time
    echo -e `date` &quot;\t${JOB_ID}\t${SGE_TASK_ID}\t${SAMPLE}\tSKIP&quot; &gt;&gt; $LOG
    exit 0

else

    # do the main work
    $MAIN $SAMPLE $OUTFILE

    # check exit status
    STATUS=$?
    if [[ $STATUS -eq 0 ]]; then

        # success, write MD5 verification file
        echo `md5sum $OUTFILE` &gt; $MD5FILE
        echo -e `date` &quot;\t${JOB_ID}\t${SGE_TASK_ID}\t${SAMPLE}\tOK&quot; &gt;&gt; $LOG

    else

        echo -e `date` &quot;\t${JOB_ID}\t${SGE_TASK_ID}\t${SAMPLE}\tFAIL\t${STATUS}&quot; &gt;&gt; $LOG

    fi

    exit $STATUS

fi
</pre>
<p>The main idea in this job script is that, if the main executable completes successfully, a &#8220;verification file&#8221; will be written, containing the MD5 hash of the task&#8217;s output file. Before running the main executable, the script checks whether the output file already exists, and the verification file already exists, and the MD5 hash it contains matches the MD5 sum of the output file &#8211; if so then the script assumes that the task was previously run successfully, and skips the task this time.</p>
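<p>Pulled out of the job script, the verification check might look like the sketch below.  The function names <code>is_verified</code> and <code>write_verification</code> are my own for illustration; the logic is the same comparison the job script does inline, and it assumes the verification file sits next to the output file with a <code>.md5</code> suffix.</p>

```shell
# write_verification OUTFILE
# Record the MD5 sum of OUTFILE in OUTFILE.md5 (md5sum's usual output format).
write_verification() {
    md5sum "$1" > "$1.md5"
}

# is_verified OUTFILE
# Succeed only if OUTFILE and OUTFILE.md5 both exist and the recorded hash
# matches the current contents of OUTFILE.
is_verified() {
    local outfile="$1" md5file="$1.md5" actual expected
    [ -f "$outfile" ] || return 1
    [ -f "$md5file" ] || return 1
    actual=$(md5sum "$outfile" | cut -d' ' -f1)
    expected=$(cut -d' ' -f1 "$md5file")
    [ "$actual" = "$expected" ]
}
```

<p>A task whose output was truncated or overwritten after the verification file was written fails the check and is rerun, which is the point of hashing rather than just testing that the file exists.</p>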
<p>The point of all this is that a call to <code>qsub flagstat.job.sh</code> is effectively idempotent. I.e., if some tasks failed in a previous run, I can just call <code>qsub flagstat.job.sh</code> again, and it will automatically run only those tasks that failed previously.</p>
<p>This job script also writes some simple output to a log file, which is just a convenient file for me to scan visually to see if any tasks failed and I need to resubmit the job &#8211; the same information could probably also be got via <code>qacct</code>.</p>
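<p>With thousands of tasks, scanning the log by eye gets tedious, so here is a rough sketch of a summary.  It assumes the tab-separated layout written by the job script above (date, job ID, task ID, sample, status) and treats the most recent line for each task as its current state; <code>failed_tasks</code> is a hypothetical name.</p>

```shell
# failed_tasks LOGFILE
# Hypothetical helper: print, in numeric order, the IDs of tasks whose most
# recent log entry has the status FAIL.
failed_tasks() {
    awk -F'\t' '
        { status[$3] = $5 }    # later entries overwrite earlier ones
        END {
            for (task in status)
                if (status[task] == "FAIL") print task
        }
    ' "$1" | sort -n
}
```

<p>Running <code>failed_tasks log</code> after each submission gives an empty result once every task has either succeeded or been skipped.</p>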
<p>This works for me but if you have a more elegant solution I&#8217;d love to hear it. </p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/04/11/handling-failures-and-rerunning-tasks-in-sun-grid-engine-array-jobs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>Jobs &#8211; Bioinformatics</title>
		<link>http://informatics.malariagen.net/2012/03/23/jobs-bioinformatics/</link>
		<comments>http://informatics.malariagen.net/2012/03/23/jobs-bioinformatics/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 12:40:11 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Jobs]]></category>
		<category><![CDATA[jobs]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=335</guid>
		<description><![CDATA[We&#8217;re advertising bioinformatics jobs at both Oxford and Sanger (near Cambridge), see the following links for job descriptions and information on how to apply: Job Details &#8211; Bioinformatician &#8211; The Wellcome Trust Centre for Human Genetics, University of Oxford Job Details &#8211; Principal Bioinformatician &#8211; The Wellcome Trust Centre for Human Genetics, University of Oxford [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;re advertising bioinformatics jobs at both Oxford and Sanger (near Cambridge). See the following links for job descriptions and information on how to apply:</p>
<ul>
<li><a href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102515">Job Details &#8211; Bioinformatician &#8211; The Wellcome Trust Centre for Human Genetics, University of Oxford</a></li>
<li><a href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102519">Job Details &#8211; Principal Bioinformatician &#8211; The Wellcome Trust Centre for Human Genetics, University of Oxford</a></li>
<li><a href="https://jobs.sanger.ac.uk/wd/plsql/wd_portal.show_job?p_web_site_id=1764&amp;p_web_page_id=146602">Job Details &#8211; Bioinformatician &#8211; The Wellcome Trust Sanger Institute</a></li>
<li><a href="https://jobs.sanger.ac.uk/wd/plsql/wd_portal.show_job?p_web_site_id=1764&amp;p_web_page_id=146603">Job Details &#8211; Principal Bioinformatician &#8211; The Wellcome Trust Sanger Institute</a></li>
</ul>
<p>Here&#8217;s an excerpt from the job description:</p>
<blockquote><p>
<strong>Overview of role</strong></p>
<p>All MalariaGEN projects working on parasite and vector biology depend on next-generation sequencing. Over 2,000 samples of parasite DNA have been sequenced, and at least 10,000 samples will have been sequenced by 2015. Genome sequencing has been carried out on approximately 200 Anopheles samples to date, and the aim is to sequence approximately 2,500 individuals over the next 4 years. Most parasite samples have been extracted directly from infected blood samples, and so present additional complexities such as small quantities of DNA and mixed infection.</p>
<p>Raw next-generation sequence data is the beginning of a complex and intellectually demanding analysis process. The primary goal is to discover robust evidence for genetic variation. However, building from raw sequence data to robust variation data is and will continue to be one of the most significant challenges facing the malaria research community over coming years. Working to iteratively improve the quality of our genetic variation data and reach deeper into the Plasmodium and Anopheles genomes is the main focus of the MalariaGEN Bioinformatician roles. </p>
<p>This is an extremely fast-paced area of current research and development, and new methods and tools are emerging from many leading research groups and projects, many of whom we have close contacts with. However, we have to strike a balance between looking to the future, and delivering data to MalariaGEN partners that might not be perfect or complete but which nevertheless provides a highly valuable research tool for a range of studies, such as genotype-phenotype association studies, and studies of parasite and vector population structure and dynamics.</p>
<p>To achieve this balance between <strong>methods development</strong> on the one hand, and <strong>production</strong> of data on the other, our bioinformatics programme is organised around two working groups. The methods development group is focused on the development, exploration and thorough evaluation of new methods, including methods for sequence alignment, variation calling and genotyping, working closely with statisticians. The production group is focused on establishing tightly specified data analysis pipelines and using them to produce high quality variation data in a reproducible way. Both working groups work to a quarterly data release cycle, where the methods development looks ahead to the next release and determines the best available methods, which are then adopted and implemented by production.</p>
<p>While this role may focus more on methods development or production at different times, we encourage participation in both working groups, as there are important insights that can only be gained by working across both.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/03/23/jobs-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>Jobs &#8211; Scientific Software Engineering</title>
		<link>http://informatics.malariagen.net/2012/03/23/job-scientific-software-engineer/</link>
		<comments>http://informatics.malariagen.net/2012/03/23/job-scientific-software-engineer/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 12:23:12 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Jobs]]></category>
		<category><![CDATA[jobs]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=314</guid>
		<description><![CDATA[We&#8217;re advertising for software engineers, the job title says &#8220;scientific&#8221; but no previous experience of scientific programming is required, applications are very welcome from anyone with a strong software engineering background and an interest in the life sciences and/or public health. We&#8217;re advertising at both Sanger (near Cambridge) and Oxford, see the following links for [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;re advertising for software engineers. The job title says &#8220;scientific&#8221;, but no previous experience of scientific programming is required; applications are very welcome from anyone with a strong software engineering background and an interest in the life sciences and/or public health.</p>
<p>We&#8217;re advertising at both Sanger (near Cambridge) and Oxford. See the following links for job descriptions and information on how to apply:</p>
<ul>
<li><a title="job details" href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102518">Job Details &#8211; Scientific Software Engineer &#8211; Wellcome Trust Centre for Human Genetics, University of Oxford</a></li>
<li><a href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102522">Job Details &#8211; Principal Scientific Software Engineer &#8211; Wellcome Trust Centre for Human Genetics, University of Oxford</a></li>
<li><a href="https://jobs.sanger.ac.uk/wd/plsql/wd_portal.show_job?p_web_site_id=1764&amp;p_web_page_id=146604">Job Details &#8211; Software Developer &#8211; Wellcome Trust Sanger Institute</a></li>
</ul>
<p>Here&#8217;s a snippet from the job ad:</p>
<blockquote><p>MalariaGEN aims to produce global data on natural genetic variation in parasite, mosquito and human populations, and to deliver these data via the MalariaGEN website, alongside web tools which add value by enabling people to explore, understand and analyse the data. Some of these web and data products are intended for unrestricted use by the malaria research and public health community, to inform future research directions and malaria control policy. Other web and data products are being developed for private use by researchers contributing to MalariaGEN community projects, and provide a key incentive to participation in MalariaGEN, e.g., secure web tools providing access to fine-grained genetic data on individual samples.</p>
<p>We have a unique opportunity for Software Engineers to take a key role in the development and implementation of software projects relating to MalariaGEN web and data products.</p>
</blockquote>
<p>The job description has a bit more background:</p>
<blockquote><p>MalariaGEN continues to present many challenges that require development of new software applications. These include:</p>
<ul>
<li>
<p align="LEFT"><strong>Web applications</strong> to present and visualise complex data</p>
</li>
<li>
<p align="LEFT">Software for <strong>data analysis and analysis pipelines</strong>, typically compute-intensive involving terabytes of data</p>
</li>
<li>
<p align="LEFT"><strong>Laboratory information management systems</strong> (LIMS) to keep track of samples, data and high-throughput experiments</p>
</li>
<li>
<p align="LEFT"><strong>Business and collaboration systems</strong> to administrate and coordinate a complex global research network, and to enable partners from different institutions to share information and effectively work together</p>
</li>
</ul>
<p>The Web continues to be our primary platform for delivering software applications, and we have specialist expertise in <strong>Web application development</strong> and Web standards within the team. However we also develop other types of application as the problem requires.</p>
<p>Members of the software engineering team are equally capable of working across <strong>all stages of the software project life cycle</strong>, from requirements analysis and design through to implementation and testing, and we support the development of skills and experience across these different areas.</p>
<p>All software we develop is or will be released under an <strong>open source</strong> license. We also make use of existing open source software where possible and actively contribute to a number of open source projects. An interest in open source software and previous experience of participation in open source projects is an advantage.</p>
<p>We are working in a fast-moving area of scientific research, and we are constantly having to innovate. However, we also have a strong focus on the scientific robustness of the products delivered by MalariaGEN, and value a dedication to quality and sound engineering practices.</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/03/23/job-scientific-software-engineer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>Job &#8211; Clinical Data Curator</title>
		<link>http://informatics.malariagen.net/2012/03/12/job-clinical-data-curator/</link>
		<comments>http://informatics.malariagen.net/2012/03/12/job-clinical-data-curator/#comments</comments>
		<pubDate>Mon, 12 Mar 2012 16:10:48 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Jobs]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[jobs]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=304</guid>
		<description><![CDATA[We&#8217;re advertising for a clinical data curator, here&#8217;s a snippet from the job ad: Applications are invited for a MalariaGEN Clinical Data Curator to work in a data-sharing community developing new tools to control malaria by integrating epidemiology with genome science. You will be a member of the MalariaGEN resource centre and will focus on [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;re advertising for a clinical data curator. Here&#8217;s a snippet from the job ad:</p>
<blockquote><p>Applications are invited for a <a href="http://www.malariagen.net">MalariaGEN</a> Clinical Data Curator to work in a data-sharing community developing new tools to control malaria by integrating epidemiology with genome science. </p>
<p>You will be a member of the <a href="http://www.malariagen.net">MalariaGEN</a> resource centre and will focus on <a href="http://www.malariagen.net">MalariaGEN</a> consortial data. This data relates to three of our human consortial projects: Genetic determinants of resistance to malaria; Genetic determinants of the immune response to malaria; and Human genome variation in malaria-endemic regions.
</p></blockquote>
<p>&#8230;the actual work involves curating large amounts of clinical data relating to cases of severe malaria from Africa. The data originate from different countries, studies and research groups, so they can be quite heterogeneous and need to be carefully managed, standardised and quality controlled before they can be aggregated and analysed.</p>
<p>Dealing with clinical research data on this scale has been an ongoing challenge for our team for many years, and remains a critical part of realising <a href="http://www.malariagen.net">MalariaGEN</a>&#8217;s studies of human resistance to malaria. If you enjoy dealing with real-world data management problems, have an interest in the life sciences and/or public health, and enjoy working with a diverse community of people from different parts of the world, your application would be very welcome.</p>
<p>Further details are available at the link below:</p>
<ul>
<li><a href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102332">Job Details &#8211; MalariaGEN Clinical Data Curator &#8211; Wellcome Trust Centre for Human Genetics, University of Oxford</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/03/12/job-clinical-data-curator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
		<item>
		<title>Job &#8211; Scientific Product Manager</title>
		<link>http://informatics.malariagen.net/2012/03/12/job-scientific-product-manager/</link>
		<comments>http://informatics.malariagen.net/2012/03/12/job-scientific-product-manager/#comments</comments>
		<pubDate>Mon, 12 Mar 2012 16:01:30 +0000</pubDate>
		<dc:creator>Alistair Miles</dc:creator>
				<category><![CDATA[Jobs]]></category>
		<category><![CDATA[jobs]]></category>

		<guid isPermaLink="false">http://informatics.malariagen.net/?p=301</guid>
		<description><![CDATA[Just a brief post to say that we&#8217;re advertising for a Scientific Product Manager. This may not be obvious at a glance, but this job is primarily about management of web and data products &#8211; previous experience in science is desirable but not necessary, applications are very welcome from anyone with a passion for developing [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Just a brief post to say that we&#8217;re advertising for a Scientific Product Manager. This may not be obvious at a glance, but this job is primarily about <strong>management of web and data products</strong> &#8211; previous experience in science is desirable but not necessary, and applications are very welcome from anyone with a passion for developing and delivering high-quality web and data products. Here&#8217;s a snippet from the job ad:</p>
<blockquote><p><a href="http://www.malariagen.net">MalariaGEN</a> aims to produce global data on natural genetic variation in parasite, mosquito and human populations, and to deliver these data via the <a href="http://www.malariagen.net">www.malariagen.net</a> website, alongside web tools which add value by enabling people to explore, understand and analyse the data. Some of these web and data products are intended for unrestricted use by the malaria research and public health community, to inform future research directions and malaria control policy. Other web and data products are being developed for private use by researchers contributing to <a href="http://www.malariagen.net">MalariaGEN</a> community projects, and provide a key incentive to participation in <a href="http://www.malariagen.net">MalariaGEN</a>, e.g., secure web tools providing access to fine-grained genetic data on individual samples.</p>
<p>We have a unique opportunity for a Product Manager to take responsibility for the development and delivery of <a href="http://www.malariagen.net">MalariaGEN</a> web and data products relating to genome sequencing, genotyping and population genetic data from Plasmodium, Anopheles and human populations.</p></blockquote>
<p>The job is being advertised at both Sanger and Oxford because you could be based at either location. Here are the job ads in full:</p>
<ul>
<li><a href="https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=102387">Job Details &#8211; MalariaGEN Scientific Product Manager &#8211; Wellcome Trust Centre for Human Genetics, Oxford University</a></li>
<li><a href="https://jobs.sanger.ac.uk/wd/plsql/wd_portal.show_job?p_web_site_id=1764&amp;p_web_page_id=146426">Job Details &#8211; MalariaGEN Scientific Product Manager &#8211; Wellcome Trust Sanger Institute, Cambridge</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://informatics.malariagen.net/2012/03/12/job-scientific-product-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6bbb1a29798652153eae95526b6322b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">alimanfoo</media:title>
		</media:content>
	</item>
	</channel>
</rss>
