Recently I’ve been bitten by the Galaxy bug, primarily because this year I needed a way to supervise final-year undergraduate projects for students without a strong background in bioinformatics. This was a great success: students picked up the interface very easily, and I was able to track and comment on their progress directly via shared histories and workflows.
Because of this experience, I’ve become much more interested in using workflow systems to run and manage the bioinformatics pipelines in my research projects, rather than relying on READMEs and UNIX shell scripts. The recent news that Kostas Karasavvas from NBIC has developed eGalaxy, a mechanism to run Taverna 2 workflows using Galaxy, is in my view a game-changer for the wider use of workflows by practicing bioinformaticians like myself. It will permit mash-ups between the two main workflow systems and allow the large, pre-established library of Taverna workflows in myExperiment to be used in a local Galaxy installation.
The easiest way to get a Taverna workflow running in Galaxy is to search myExperiment for Taverna 2 workflows and click the “Download Workflow as a Galaxy tool” button in the “Download” section of the workflow’s page. This takes you to a “Galaxy tool download” page with instructions on how to install the Taverna workflow as a tool in Galaxy. The instructions are a bit sparse at the moment and assume familiarity with installing Galaxy locally and adding tools to a local Galaxy installation. They also only include installation notes for Debian-based systems, but with the help of Rob Haines from the Taverna team, I’ve been able to get a stable protocol working for OSX as well.
To give a bit more context, Taverna workflows are run in Galaxy as Ruby scripts that are added to your Galaxy tools directory like any other custom tool. Executing the Ruby script tool opens a connection to a remote Taverna 2 server, where the workflow is actually run. Results are then returned to the Ruby script and thence to Galaxy. Like all Galaxy tools, installing a Taverna tool requires three things: the tool itself (a script or other executable program) and an XML description of the tool’s inputs and outputs, both placed in the “tools” directory, plus an entry in the tool_conf.xml file in the Galaxy main directory that tells Galaxy the tool exists.
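The three pieces listed above can be checked from the shell once a tool is installed. This is a minimal sketch, assuming a Galaxy checkout with tool_conf.xml in its top directory and the tool stored as a .rb/.xml pair under tools/; the function name check_galaxy_tool is hypothetical and not part of Galaxy or eGalaxy.

```shell
# Hypothetical helper (not part of Galaxy or eGalaxy): report which of the
# three pieces a Galaxy tool needs are missing. Returns 0 only if all exist.
#   $1 = Galaxy directory, e.g. $HOME/galaxy-dist
#   $2 = tool path relative to tools/, without extension,
#        e.g. tavernaTools/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool
check_galaxy_tool() {
    cgt_dir=$1
    cgt_tool=$2
    cgt_missing=0
    # 1. the executable part: the Ruby script generated by myExperiment
    if [ ! -f "$cgt_dir/tools/$cgt_tool.rb" ]; then
        echo "missing script: tools/$cgt_tool.rb"
        cgt_missing=1
    fi
    # 2. the XML description of the tool's inputs and outputs
    if [ ! -f "$cgt_dir/tools/$cgt_tool.xml" ]; then
        echo "missing XML description: tools/$cgt_tool.xml"
        cgt_missing=1
    fi
    # 3. the tool_conf.xml entry that tells Galaxy the tool exists
    #    (fixed-string match on file="<tool>.xml")
    if ! grep -qF "file=\"$cgt_tool.xml\"" "$cgt_dir/tool_conf.xml" 2>/dev/null; then
        echo "tool_conf.xml has no entry for $cgt_tool.xml"
        cgt_missing=1
    fi
    return $cgt_missing
}
```

Printing one line per missing piece (rather than stopping at the first) makes it easier to see at a glance which of the installation steps still needs doing.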
The Ruby script generated by myExperiment requires a few Ruby packages (aka “gems”) that are installed with RubyGems. Both Ruby and RubyGems are installed by default on OSX (in /usr/bin), so your kit is nearly complete. The following steps should allow you to run a test Taverna workflow to make sure your configuration is working properly on an OSX 10.6 machine. To consolidate the install notes for the entire process in one place, I’m including the key steps for a local Galaxy installation here as well.
1) Install the Mercurial version control system for OSX, and make sure /usr/local/bin/ is in your PATH.
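A quick way to confirm the PATH part of step 1 is a small shell function. This is a sketch; path_has_dir is a hypothetical name, and /usr/local/bin is simply the location the step above assumes Mercurial is installed into.

```shell
# Hypothetical helper: succeed if directory $1 is one of the entries in $PATH.
# A trailing slash on the argument is ignored, so /usr/local/bin/ and
# /usr/local/bin are treated the same.
path_has_dir() {
    phd_dir=${1%/}
    # Wrap PATH in colons so the first and last entries match like the others.
    case ":$PATH:" in
        *":$phd_dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example checks after installing Mercurial:
#   path_has_dir /usr/local/bin || echo 'add export PATH=/usr/local/bin:$PATH to ~/.profile'
#   command -v hg >/dev/null || echo "hg not found on PATH"
```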
2) Check out the Galaxy codebase using Mercurial in your home directory ($HOME):
$ cd $HOME
$ hg clone https://bitbucket.org/galaxy/galaxy-dist/
3) Create a Taverna tools directory in your Galaxy distribution:
$ mkdir $HOME/galaxy-dist/tools/tavernaTools
4) Install the RubyGems needed for the Taverna tool to run. The critical gems to install are t2-server (which is needed to connect to the Taverna server that runs the workflow) and rubyzip (which is needed to compress Galaxy results). Installing t2-server will automatically install the libxml-ruby and hirb gems it depends on. libxml-ruby calls on the libxml2 C XML parser, which is also installed by default on OSX (its headers are in /usr/include/libxml2/).
$ sudo gem install t2-server
$ sudo gem install rubyzip
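To confirm that step 4 worked, the gems can be checked by name. This is a minimal sketch; missing_gems is a hypothetical helper, and it relies on `gem list -i NAME`, which prints true/false and exits non-zero when no matching gem is installed.

```shell
# Hypothetical helper: print the name of each required gem that RubyGems
# does not report as installed. Prints nothing if everything is present.
missing_gems() {
    for mg_name in "$@"; do
        # `gem list -i` exits 0 when a matching gem is installed;
        # its true/false output is discarded here.
        if ! gem list -i "$mg_name" >/dev/null 2>&1; then
            echo "$mg_name"
        fi
    done
}

# Example: the two gems from step 4 plus the dependencies t2-server pulls in
#   missing_gems t2-server rubyzip libxml-ruby hirb
```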
5) Select a Taverna 2 workflow from myExperiment to download as a Ruby script and XML file. For testing, use a workflow that does not require any input files, e.g. http://www.myexperiment.org/workflows/823/versions/1/galaxy_tool
6) Paste “http://test.mybiobank.org/taverna-server” into the “Taverna server URL:” textbox.
7) Click the “Download Galaxy tool” button, e.g. to your Downloads folder.
8) Unzip the Taverna 2 Galaxy tool and move the Ruby script and XML file into your Taverna tools directory, e.g.
$ unzip $HOME/Downloads/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.zip -d $HOME/Downloads
$ mv $HOME/Downloads/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.xml $HOME/galaxy-dist/tools/tavernaTools
$ mv $HOME/Downloads/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.rb $HOME/galaxy-dist/tools/tavernaTools
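If you install several workflows, step 8 can be wrapped in a pair of shell functions. This is a sketch under assumptions: the function names are hypothetical, and it assumes the downloaded zip contains just the .rb script and .xml description (possibly inside a subfolder), so every .rb and .xml file found is moved.

```shell
# Hypothetical helper: move every .rb and .xml file found under $1
# (searching subfolders too) into the Galaxy tools subdirectory $2.
move_tool_files() {
    mkdir -p "$2" || return 1
    find "$1" -type f \( -name '*.rb' -o -name '*.xml' \) -exec mv {} "$2"/ \;
}

# Hypothetical wrapper: unzip a myExperiment "Galaxy tool" download ($1)
# into a scratch directory, move the tool files into $2, then clean up.
install_taverna_tool() {
    itt_tmp=$(mktemp -d "${TMPDIR:-/tmp}/taverna.XXXXXX") || return 1
    if unzip -q "$1" -d "$itt_tmp"; then
        move_tool_files "$itt_tmp" "$2"
        itt_status=$?
    else
        itt_status=1
    fi
    rm -rf "$itt_tmp"
    return $itt_status
}
```

Usage, with the example tool from step 8:
$ install_taverna_tool $HOME/Downloads/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.zip $HOME/galaxy-dist/tools/tavernaTools

Unzipping into a scratch directory avoids leaving extracted copies in Downloads, and does not depend on which directory you run the command from.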
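9) Edit the tool_conf.xml file in your Galaxy main directory to add a new section for Taverna tools inside the <toolbox> element (placing it just after the opening <toolbox> tag lists it first in the tool menu), e.g.
<section name="Taverna Tools" id="tavernaTools">
  <tool file="tavernaTools/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.xml" />
</section>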
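Step 9 can also be scripted rather than done in a text editor. This is a minimal sketch; add_taverna_section is a hypothetical name, and it assumes tool_conf.xml has its opening <toolbox> tag on a line of its own. It writes to a temporary file and only replaces tool_conf.xml if awk succeeds.

```shell
# Hypothetical helper (not part of Galaxy): add a "Taverna Tools" section
# holding one <tool> entry directly after the opening <toolbox> tag.
#   $1 = path to tool_conf.xml, e.g. $HOME/galaxy-dist/tool_conf.xml
#   $2 = tool XML path relative to tools/, e.g. tavernaTools/my_tool.xml
# Run it once per file; calling it again adds a second section.
add_taverna_section() {
    ats_conf=$1
    ats_tmp="$ats_conf.tmp.$$"
    # Copy every line; after the first <toolbox ...> line, emit the section.
    # The pattern requires "<toolbox" followed by space or ">", so the
    # closing </toolbox> tag never matches.
    awk -v tool="$2" '
        { print }
        /<toolbox[ >]/ && !done {
            print "    <section name=\"Taverna Tools\" id=\"tavernaTools\">"
            print "        <tool file=\"" tool "\" />"
            print "    </section>"
            done = 1
        }
    ' "$ats_conf" > "$ats_tmp" || { rm -f "$ats_tmp"; return 1; }
    # Replace the original only after awk has succeeded.
    mv "$ats_tmp" "$ats_conf"
}
```

Usage:
$ add_taverna_section $HOME/galaxy-dist/tool_conf.xml tavernaTools/fetch_pdb_flatfile_from_rcsb_server_58764_galaxy_tool.xml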
10) Start the Galaxy server by running the run.sh script:
$ sh $HOME/galaxy-dist/run.sh
11) Open http://127.0.0.1:8080 in your web browser. You should see a “Taverna Tools” heading above the “Get Data” heading in the tool menu, which when clicked reveals the newly installed “Fetch PDB flatfile from RCSB server” tool. You can now run this Taverna tool like any other analysis tool in Galaxy. For this particular tool, this involves entering a PDB ID and clicking “Execute”. Successful completion of the job should return a PDB file in the main Galaxy window.
In the early stages of testing this protocol, I selected a different Taverna workflow (916) that did not run out of the box and gave less-than-helpful error messages in Galaxy. Troubleshooting with Rob Haines pinpointed the problem: the http://test.mybiobank.org Taverna server could not execute parts of this workflow. When tested on an alternate Taverna server, the workflow ran and completed with the expected results. So if an attempt to run a Taverna workflow in Galaxy fails, the cause may be the Taverna server rather than your kit or the workflow in question.
From this initial experience, looking forward (from mid-2011), I’d like to see the eGalaxy system include a mechanism to generate tests for each Taverna tool automatically, either at the time the tool is downloaded from myExperiment or as part of the Galaxy testing system. I’d also like to see an industrial-scale Taverna server hosted somewhere (preferably by the Galaxy or Taverna teams) so that all Taverna tools can be used reliably out of the box on at least one tested server. In any event, I’m now convinced that the eGalaxy project does what it says on the tin, and it can only improve with more folks trying it out and contributing feedback.
Notes: This protocol was developed on a MacBook Air (Intel Core 2 Duo) running OSX 10.6.8. Credit to Rob Haines for help troubleshooting and for giving me a detailed walkthrough of the mechanics of the Taverna-Galaxy integration, and to Kostas Karasavvas of NBIC for the inspiration to initiate the eGalaxy project.