Sunday, March 25, 2007

More about The Dresden Build

This week I sent out an email to about 20 potential users of The Dresden Build. In the course of corresponding with one of the people who responded, I ended up writing a somewhat longish email back, about "what this project is all about", why it's being called a Beta release, that kind of stuff.

Rather than forward the whole text back to the email list, I thought I'd reproduce most of it here. If you're interested in knowing more about The Dresden Build, this might be a good place for you to start.

(You may also want to read the previous post: "arrayCGHbase - The Dresden Build").

----------------


Our lab in Dresden is not large, and getting funding for new tools such as software is always a challenge. The lab members have had a lot of experience with downloading free and/or open-source tools from the web, trying them out, and running into problems for which there is little or no support available.

I have over 10 years of experience as a software engineer, and over this past year I've taken the original code for ArrayCGHbase and worked to make it something that our lab members can easily install and maintain, even if they have little or no experience as database or application administrators. We've fixed bugs, added features, and included more recent features that Björn Menten (the originator of ArrayCGHbase) has worked on. We now have a partner lab in Germany using our builds, and they are very happy so far.

The tool does in fact provide both data storage and CGH analytic tools. As you might imagine, these include normalization, reporter set management, and an assortment of view tools showing different perspectives on one or more experiments (karyotype view, line view, chromosome view, scatter plots, etc). The application is customizable in various ways. For example, several data formats come pre-loaded (various versions of ImaGene, GenePix, ArrayPro and Affymetrix), but users are also able to quickly define and save their own formats.

We've also put some thought into how this tool can be integrated with the "data pipeline" which exists not only for us, but probably for most labs. Experiment data goes into the application, but we also want the post-normalized data to come out of the application and be quickly available to external tools that can provide additional analysis support. For example, using standard database technology, we've developed a CSV output which can quickly deliver a set of experiments to a spreadsheet application: reporter location data are listed vertically in the first few columns, the experiment IDs run across the top row, and the cells at each intersection contain the normalized ratio data. This basic format provides a very good starting point for working in a tool such as Excel. This CSV output is another tool that we're glad to share with other labs along with this application (you run it as a direct script on the underlying database, rather than through ArrayCGHbase).
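
To make that layout a little more concrete, here is a minimal Python sketch of the same kind of pivot. It is not the actual script (which runs directly on the database); the field names, the sample reporter and experiment IDs, and the function name pivot_to_csv are all invented for illustration, and it assumes the normalized ratios have already been fetched as one row per reporter per experiment.

```python
import csv
import io


def pivot_to_csv(rows):
    """Pivot long-format rows into the wide CSV layout described above.

    rows: iterable of (reporter, chromosome, position, experiment_id, ratio).
    These field names are hypothetical, not the real ArrayCGHbase schema.
    """
    experiments = []  # experiment IDs, in first-seen order (the top row)
    table = {}        # (reporter, chromosome, position) -> {experiment_id: ratio}
    for reporter, chromosome, position, experiment_id, ratio in rows:
        if experiment_id not in experiments:
            experiments.append(experiment_id)
        table.setdefault((reporter, chromosome, position), {})[experiment_id] = ratio

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header: reporter location columns first, then one column per experiment.
    writer.writerow(["reporter", "chromosome", "position"] + experiments)
    # One line per reporter, ordered by chromosome label and then position.
    for key in sorted(table, key=lambda k: (k[1], k[2])):
        ratios = table[key]
        # A reporter with no value for an experiment gets an empty cell.
        writer.writerow(list(key) + [ratios.get(e, "") for e in experiments])
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        ("RP11-1", "1", 100, "EXP1", 0.12),
        ("RP11-1", "1", 100, "EXP2", -0.30),
        ("RP11-2", "1", 200, "EXP1", 0.05),
    ]
    print(pivot_to_csv(sample), end="")
```

Leaving a missing value as an empty cell, rather than a zero, keeps a spreadsheet from treating "no data" as a real ratio.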

Another focus for us is to be aware of the kinds of efficiency gains we've been able to realize by using and improving ArrayCGHbase. A simple, non-scientific benchmark: for a single experiment, it used to take our lab members up to 2 days to perform normalization and analysis. Now, using this tool and subsequent tools in the pipeline, multiple samples can be examined thoroughly in just a few hours.

We think that our build of this application is now stable enough that other labs (who might have similar issues around choosing software and integrating it into their pipelines) should be able to benefit from using it. Our work with our partner lab has helped bear this out.

On the other hand, we want users to know a few things up front. Firstly, this project requires a certain amount of tenacity on the part of the user. Our installation procedures for Apache, PHP, and MySQL are simple enough, but if you're not accustomed to this kind of work, it may seem intimidating at first. In addition, any time a database is installed and used, there will be administration issues that come up from time to time; while this shouldn't be frightening for users, it is something that they need to be aware of. Finally, there will be bugs from time to time.

We intend for our documentation, and the online support we are willing to give, to be enough to make users comfortable taking on these issues. Hence the Beta phase: I'd like to have 5-10 users go through the installation process, start using the application, and give all the feedback they can, before opening up the project to a larger group.

I intend to establish this as a real open-source software project. This means much more than just delivering code. As I mentioned, we are interested in other labs contributing to the success of the project, whether through simple feedback on the documentation, ideas for new features, new code, bugfixes, or just complaints in general. My interest is to provide the best, most stable and well-documented software application I can, free of charge, to as many groups as could use it.


----------------------


yours truly, etc etc
