Sunday, March 25, 2007

More about The Dresden Build

This week I sent out an email to about 20 potential users of The Dresden Build. While corresponding with one of the people who responded, I ended up writing a longish email back about "what this project is all about", why it's being called a Beta release, that kind of stuff.

Rather than forward the whole text back to the email list, I thought I'd reproduce most of it here. If you're interested in knowing more about The Dresden Build, this might be a good place for you to start.

(You may also want to read the previous post: "arrayCGHbase - The Dresden Build").

----------------


Our lab in Dresden is not large, and getting funding for new tools such as software is always a challenge. The lab members have had a lot of experience with downloading free and/or open-source tools from the web, trying them out, and running into problems for which there is little or no support available.

I have over 10 years' experience as a software engineer, and over this past year I've taken the original code for ArrayCGHbase and worked to make it something that our lab members can easily install and maintain, even if they have little or no experience as database or application administrators. We've fixed bugs, added features, and included more recent features that Björn Menten (the originator of ArrayCGHbase) has worked on. We now have a partner lab in Germany using our builds, and they are very happy so far.

The tool does in fact provide both data storage and CGH analytic tools. As you might imagine, these include normalization, reporter set management, and an assortment of view tools showing different perspectives on one or more experiments (karyotype view, line view, chromosome view, scatter plots, etc). The application is customizable in various ways. For example, several data formats come pre-loaded (various versions of ImaGene, GenePix, ArrayPro and Affymetrix), but users can also quickly define and save their own formats.

We've also put some thought into how this tool can be integrated with the "data pipeline" which exists not only for us, but probably for most labs. Experiment data goes into the application, but we also want the post-normalized data to come out of the application and be quickly available to external tools that can provide additional analysis support. For example, using standard database technology, we've developed a CSV output which can quickly deliver a set of experiments to a spreadsheet application: reporter location data are listed vertically in the first few columns, the experiment IDs run across the top row, and the cells at each intersection contain the normalized ratio data. This basic format provides a very good starting point for working in a tool such as Excel. This CSV output is another tool that we're glad to share with other labs along with this application (you run it as a direct script on the underlying database, rather than through ArrayCGHbase).
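To make that layout concrete, here is a minimal sketch in Python of the same pivot. It is not the script we actually run (that one works directly on the database). The function name, the column headers, and the sample reporters and experiment IDs are all made up for illustration, and the input is assumed to be one row per reporter per experiment.

```python
import csv
import io


def pivot_ratios_to_csv(rows):
    """Pivot (reporter, chromosome, position, experiment_id, ratio) rows
    into CSV text: one line per reporter, one column per experiment.

    The row shape and header names are assumptions for this sketch,
    not the real ArrayCGHbase schema.
    """
    rows = list(rows)
    # Experiment IDs run across the top row, in sorted order.
    experiments = sorted({row[3] for row in rows})

    # Group the ratios by reporter location.
    table = {}
    for reporter, chrom, pos, exp_id, ratio in rows:
        table.setdefault((chrom, pos, reporter), {})[exp_id] = ratio

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["reporter", "chromosome", "position"] + experiments)
    # Chromosome is compared as text here, so "10" sorts before "2".
    for chrom, pos, reporter in sorted(table):
        ratios = table[(chrom, pos, reporter)]
        # A reporter missing from an experiment gets an empty cell.
        cells = [ratios.get(exp_id, "") for exp_id in experiments]
        writer.writerow([reporter, chrom, pos] + cells)
    return out.getvalue()


# Hypothetical sample data: two reporters, two experiments.
sample_rows = [
    ("RP11-1", "1", 1000, "EXP_A", 0.12),
    ("RP11-1", "1", 1000, "EXP_B", -0.30),
    ("RP11-2", "1", 2000, "EXP_A", 0.05),
]
csv_text = pivot_ratios_to_csv(sample_rows)
print(csv_text)
```

Leaving a cell empty when a reporter has no value in an experiment keeps the columns aligned, so the file still opens cleanly in a spreadsheet.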

Another focus for us is to be aware of the kinds of efficiency gains we've been able to realize by using and improving ArrayCGHbase. A simple, non-scientific benchmark: for a single experiment, it used to take our lab members up to 2 days to perform normalization and analysis. Now, using this tool and subsequent tools in the pipeline, multiple samples can be examined in just a few hours.

We think that our build of this application is now stable enough that other labs (who might have similar issues around choosing software and fitting it into their pipelines) should be able to benefit from using it. Our work with our partner lab has helped prove this to be true.

On the other hand, we want users to know a few things up front. Firstly, this project requires a certain amount of tenacity on the part of the user. Our installation procedures for Apache, PHP, and MySQL are simple enough, but if you're not accustomed to this kind of work, it may seem intimidating at first. In addition, any time a database is installed and used, there will be administration issues that come up from time to time; while this shouldn't be frightening for users, it is something that they need to be aware of. Finally, there will be bugs from time to time.

We intend our documentation and the online support we are willing to give to be enough to make users comfortable taking on these issues. Hence the Beta phase: I'd like to have 5-10 users go through the installation process, start using the application, and give all the feedback they can, before opening up the project to a larger group.

I'm intending to establish this as a real open-source software project. This means much more than just delivering code. As I mentioned, we are interested in other labs contributing to the success of the project, whether through simple feedback on the documentation, ideas for new features, new code, bugfixes, or just complaints in general. My interest is to provide the best, most stable and well-documented software application, free of charge, to as many groups as can use it.


----------------------


yours truly, etc etc

arrayCGHbase - The Dresden Build

Aside from the mobile/semantic/design stuff that this blog is primarily about, I'm also supporting a build of the PHP application ArrayCGHbase, which I'm now calling "The Dresden Build". It is up for demo viewing here:

http://www.cultureset.com/arrayCGHbase/index.php

If you'd like the demo password, please send email to acgh_base@cultureset.com. Let me know a little about you: which lab you work at, and what experience you have with ArrayCGHbase or other tools like it.

I found a couple of glaring bugs just before putting this version up, and I'm packing to go to California tomorrow, so I probably won't fix them for a few days. Just so you know:

- checkboxes are acting strangely in the experiment list page

- in normalization, if you do an add/update, the query mode controls appear at the bottom, even though you're not in query mode yet.

- I'm still tweaking the settings at the host; for now, loading more than about 6 oligo experiments, or 8 of the smaller experiments, will cause the upload page to time out. (Note that in this version of ArrayCGHbase, users copy experiment data files into a directory on the same server as the app; the files are then available via a pulldown in the experiment upload section. I've only uploaded 3 sets of CGH files into this directory).

Also, please note that in normalization, the checkbox for "Use Smoothed Ratios" refers to performing the normalization on a column of pre-smoothed ratio data, rather than the normal (non-smoothed) ratios column.

There are some reports still disabled in the pulldown in the experiment list page. We've got a list of open issues on those reports, which Beta users are welcome to peruse. (Beta users also get a tool for enabling those reports so they can try them out.)

So, if you don't mind the dust while I get some things worked on, please enjoy trying out this version. Let me know (at the above email address) if you're interested in downloading and installing The Dresden Build. I've written a quick but comprehensive manual for doing the install, which takes you through setting up your own LAMP (or WAMP/MAMP) server, tweaking it a bit, and installing our code. (Again, let me know what lab you're working with, and what experience you've had with ArrayCGHbase or other similar tools in the past.)

enjoy!