Retreats Spring2014

Retreat Spring 2014 - Izmir

Use this page to collect ideas, topics, etc. for the upcoming OpenMS developer retreat.

Discussion Topics

OpenMS 2.0 [all] Protocol
- Kernel redesign
- Re-modularization (OpenMS & sublibs, TOPP)
- New functionality
- Identify and address key scalability issues
- Coherent vision and concept for downstream stat analysis
- Schedule
- Continuous integration and continuous deployment
- Distribution channels (e.g., partnership/integration with vendor packages)
- Qt5 support
Replace Logging with Boost.Log [Stephan]
- see http://www.boost.org/doc/libs/1_55_0/libs/log/doc/html/index.html
- a single central logger in the library
- seems not to support removal of duplicates (maybe the classes should take care of this themselves)
- requires Boost 1.54 or later
Inheritance vs. composition + Interfaces instead of templates [Hannes + ??? ]
- MetaValue"Interface"
- inheritance from std::string in OpenMS::String
- DefaultParamHandler, ProgressLogger, RangeManager, ..
- reduce template usage (e.g., Feature in FeatureMap)
- switch to STL structures instead of home-grown data structures (e.g. vector vs Peak1D)
- which interfaces should be supported?
- de-entangle algorithms from the data structures
Implementation of mzQuantML/mzIdentML/mzTab [Mathias, Timo]
- The current support for mzQuantML/mzIdentML in OpenMS is far from ideal. This workgroup should discuss/evaluate what is necessary to support reading and writing of mzQuantML/mzIdentML in OpenMS
- Determine common classes (Experimental design, etc.)
- Performance aspects of the implementation should also be discussed.
- Small molecule support
Modularization of OpenMS [Hannes]
- Splitting OpenMS into smaller libraries (which would those be? ANALYSIS, CHEMISTRY, DATASTRUCTURES, FORMAT/IO, KERNEL, MATH, metadata (what is metadata?), SIMULATION). What would be the advantage? How much work would it be? How best to start, and which path forward to take?
- Communicate via interfaces
- Use forward declarations and reduce multiple inheritance
Should we provide a framework for Python code developed on top of pyOpenMS (e.g. github.com/OpenMS/PythonCore or so)? If so, where should it live, how should it be tested and is there interest in a common core of classes that are useful across projects?
Protein inference Protocol
- plans
- integration of other tools
- factor out protein inference information from PeptideHit into dedicated class (bipartite graph)?
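The bipartite-graph idea above could look roughly like the following. This is only a sketch under assumptions: `ProteinPeptideGraph` and its methods are hypothetical names and do not exist in OpenMS, and proteins and peptides are identified by plain strings instead of `ProteinHit`/`PeptideHit` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical class (not existing OpenMS API): stores protein<->peptide
// evidence as a bipartite graph instead of inside each peptide hit.
class ProteinPeptideGraph
{
public:
  // Adds one edge between a protein accession and a peptide sequence.
  void addEvidence(const std::string& protein, const std::string& peptide)
  {
    assert(!protein.empty() && !peptide.empty());
    protein_to_peptides_[protein].insert(peptide);
    peptide_to_proteins_[peptide].insert(protein);
  }

  // All peptides that support the given protein (empty set if unknown).
  const std::set<std::string>& peptidesOf(const std::string& protein) const
  {
    return lookup(protein_to_peptides_, protein);
  }

  // All proteins the given peptide maps to (empty set if unknown).
  const std::set<std::string>& proteinsOf(const std::string& peptide) const
  {
    return lookup(peptide_to_proteins_, peptide);
  }

  // A peptide is "unique" here if it maps to exactly one protein.
  bool isUnique(const std::string& peptide) const
  {
    return proteinsOf(peptide).size() == 1;
  }

private:
  typedef std::map<std::string, std::set<std::string> > Adjacency;

  static const std::set<std::string>& lookup(const Adjacency& adj,
                                             const std::string& key)
  {
    static const std::set<std::string> empty;
    Adjacency::const_iterator it = adj.find(key);
    return it == adj.end() ? empty : it->second;
  }

  Adjacency protein_to_peptides_;
  Adjacency peptide_to_proteins_;
};
```

Keeping both adjacency maps makes lookups cheap in either direction (protein to peptides for quantification, peptide to proteins for inference) at the cost of storing every edge twice.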
Spectral matching
- proteomics vs. metabolomics
- integration of SpectraST
- comparison
Downstream analysis [all]

Hacking Tasks

Qt5 support [Stephan, Chris]
- We currently have a couple of classes that prevent a migration to Qt5. This workgroup should find all Qt5 incompatible classes and, if possible, replace the code with Qt5 compatible code.
- Furthermore it should be evaluated how the build system needs to be adapted to support Qt5 (ideally alongside Qt4)
- References:
  - http://qt-project.org/wiki/Transition_from_Qt_4.x_to_Qt5
- How good is platform support for Qt5? This could be a problem especially on Linux distros. Windows needs to compile on its own anyway.
Removal of DB Support [Xiao, Fabian]
- OpenMS still contains the (unmaintained) DB support. This workgroup should remove all traces of the DB support (documentation, code, tests, ..) from OpenMS.
Rewrite of ProgressLogger [Chris, Stephan]
- Currently ProgressLogger has two problems
  1. All classes derive from ProgressLogger instead of holding it as a member.
  2. ProgressLogger is the only class in OpenMS core that has a dependency on QtGUI.
- This workgroup should come up with a new concept for the ProgressLogger and (if possible) implement it.
- This task could/should be coordinated with the discussion group on Boost.Log, as their decisions may also affect the ProgressLogger (e.g., configuration of log targets). Specifically, boost progress.hpp might be a good option.
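One possible direction for the two problems listed above, as a sketch only: the algorithm holds a reference to a small progress interface instead of deriving from the logger, and the core code depends only on that interface, so a Qt-based implementation could live in the GUI library. All names here (`ProgressSink`, `ConsoleProgressSink`, `RecordingProgressSink`, `PeakPicker`) are hypothetical and not existing OpenMS classes. The code sticks to C++98 because C++11 adoption is listed as postponed.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical interface (not existing OpenMS API); no Qt dependency.
class ProgressSink
{
public:
  virtual ~ProgressSink() {}
  virtual void start(std::size_t total, const std::string& label) = 0;
  virtual void set(std::size_t current) = 0;
  virtual void end() = 0;
};

// Console implementation that could live in the core library.
class ConsoleProgressSink : public ProgressSink
{
public:
  ConsoleProgressSink() : total_(0) {}
  virtual void start(std::size_t total, const std::string& label)
  {
    total_ = total;
    std::cout << label << ": 0/" << total_ << '\n';
  }
  virtual void set(std::size_t current)
  {
    assert(current <= total_);
    std::cout << "  " << current << '/' << total_ << '\n';
  }
  virtual void end() { std::cout << "done\n"; }

private:
  std::size_t total_;
};

// Implementation that only records calls, e.g. for unit tests.
class RecordingProgressSink : public ProgressSink
{
public:
  RecordingProgressSink() : starts(0), ends(0), last(0) {}
  virtual void start(std::size_t, const std::string&) { ++starts; }
  virtual void set(std::size_t current) { last = current; }
  virtual void end() { ++ends; }

  std::size_t starts;
  std::size_t ends;
  std::size_t last;
};

// Hypothetical algorithm: it *has* a progress sink instead of *being* one.
class PeakPicker
{
public:
  explicit PeakPicker(ProgressSink& progress) : progress_(progress) {}

  // Dummy work: counts intensities above the threshold and reports progress.
  std::size_t pick(const std::vector<double>& intensities, double threshold)
  {
    std::size_t picked = 0;
    progress_.start(intensities.size(), "picking peaks");
    for (std::size_t i = 0; i < intensities.size(); ++i)
    {
      if (intensities[i] > threshold) ++picked;
      progress_.set(i + 1);
    }
    progress_.end();
    return picked;
  }

private:
  ProgressSink& progress_;
};
```

A caller would construct one sink (console, GUI, or recording) and pass it to each algorithm; the referenced sink has to outlive the algorithm object.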
KNIME Testing [Knut, Canan]
- To be promoted to KNIME's Trusted Community Contributions we need to provide a couple of test workflows that ensure the functionality of OpenMS.
- Come up with a handful of small workflows (ideally involving loops etc.) that test the KNIME integration.
Design and implement a new concept of Residues, Modifications, AASequence and theoretical spectra generation [Hannes, Timo, Jens, Jennifer, Hendrik]
- specificity groups
- support multiple neutral losses at one residue
- in general: provide several implementations tailored for special usage? Hard to solve everything with one design.
- fast and configurable fragment mass and theoretical spectrum generation (use case: quickly get normal and all neutral loss spectra without recomputing everything)
- get rid of most or all static references or static initializations using elements db
- how to persist user created modifications (can we borrow some work done for MzTab/MzQuantML?)
- Use simple search engine prototype to evaluate design
Update of the online documentation [Fabian, Marc, Xiao, Christian, Hannes]
- The online documentation is outdated or even confusing in places, and external developers and users depend heavily on good and correct documentation
- The maintenance and compatibility discussion should bring some insights into what we want to document and where up-to-dateness is critical
SVN cleanup [??]
- our documentation and code still contain some references to the old svn system
- VersionInfo still carries the revision flag, which is no longer filled
fix warnings [all, each evening]
- currently clang emits tons of warnings (see #652); while mostly trivial, we still need to fix them
- maybe even re-enable some of the disabled warnings and check them
- see also #738 and http://blog.httrack.com/blog/2014/03/09/what-are-your-gcc-flags/
reassign tickets [Oliver, Hendrik, Christian]

OpenMS coding and discussion topics

~~inheritance vs. composition~~
- MetaValue"Interface"
- inheritance from std::string in OpenMS::String
~~get rid of templates (e.g. MSExperiment)~~
~~static stuff~~
~~separate libraries - removal of circular dependencies~~
- forward declarations
  - check out https://code.google.com/p/include-what-you-use/
- avoid/remove excessive use of multiple inheritance
~~automatic scan of header/sources in cmake~~
- the CMake authors actually discourage such an approach, see http://www.cmake.org/cmake/help/v2.8.12/cmake.html#command:file

(We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate.)

header guards
~~replace logging library with Boost.Log~~
~~use of interfaces for spectrum, chromatogram etc.~~
~~formal process for including third-party dependencies~~
~~small introduction to git and the targeted workflow~~
~~location of tutorials, etc. (wiki or doc)~~
~~adoption of C++11~~
~~future of idXML/featureXML (replace by mzIdentML/mzQuantML?)~~
unit test clean-up: reduce long-runners, increase coverage
~~support for Qt 5.X, Qt 4 updates lag behind (e.g., Mac OS X 10.9 support) and conflict with qt5 installations~~
(auto) CV updating
~~final deprecation of the DB support~~
future of the build infrastructure (FU cannot host all build machines)
remove GUI tools from OpenMS_GUI
TOPPView as editor (modification, addition of data via TOPPView)
TOPPView cleanup (e.g., TOPPViewBase has ~4,000 lines of code)
Ease of getting started -> are we doing enough for new developers?

Detailed Agenda

Sunday, March 23rd

Arrival at Izmir, transport to hotel

Monday, March 24th

Morning session

Welcome by Oliver & Knut
Recent news from the OpenMS eco-system [Oliver, Knut]
- awesome papers and collaborations
- grants
- other projects worth keeping an eye on
short overview of the individual divisions
- Tübingen (Timo)
- Zürich (Hannes)
- Berlin (Knut)
- OpenMS @ Sanger (Hendrik)
- OpenMS @ Freiburg (Lars)
- OpenMS @ Izmir (Jens)
- OpenMS @ Mainz (Jennifer)
formation of working groups
- prioritisation of discussion topics/tasks (see top of page)
- assignment of groups

Lunch break

Afternoon session

Breakout Session: Git Q&A [Christian, Hannes, Stephan]
- wish-list: just add new topics and we will try to solve them
  - collaborative development
  - parallel feature development
  - when should I rebase and when should I merge?
- document the found solutions and come up with a wiki page for git standards
- From git to installer package:
  - Provide installers of experimental features (branches, pull requests) to collaboration partners
  - We have a dedicated build server / branch in Berlin, which provides Windows installers
Happy Hacking

Tuesday, March 25th

Morning session

Maintenance and compatibility [Hannes, Stephan]
- How do we handle dependency versions (e.g., 3rd party libs, min boost, qt versions) w.r.t. the libraries shipped with different platforms (e.g., RHEL 6 ships qt 4.6)
- how do we handle compiler support (e.g., Visual Studio versions)
- Currently we spend more time on maintaining OpenMS than on adding new features
  - OpenMS continues to grow but we lack the resources to maintain all the generated code
  - Strategy to remove code
- Increase focus on quality
  - adherence to design standards and testing guidelines to keep code maintainable (a big part of this will be modularisation and removing inter-dependencies)
  - incubation mechanism for tools, classes?
- Documentation:
  - A maintenance issue that costs time and is critical for external developers and users.
  - We should decide what needs to be documented (API, design concepts/decisions, user guide, quick-start guide, installation/compilation process) and what are the priorities concerning up-to-dateness.
- Requirements for 3rd party libraries: what are the minimal requirements? (build on all platforms, no conflicts with our other libraries [specifically boost], fast build (travis?) or debian-packages, BSD or Apache licence)

Lunch break

Afternoon session

Happy Hacking

Wednesday, March 26th

Morning session ?

OpenMS Interfaces and modularisation [Hannes]
Define interfaces for current OpenMS data: Experiment, Spectrum, Chromatogram, MassTrace, Feature, Identification
Use standard, easy containers for our data (std::vector, iterators)
De-couple actual data from meta-data
Prepare path forward to de-couple algorithms from actual data structures
Prepare build system to build algorithmic libraries separate from data structures to ensure de-coupling
Create class-dependency graph and try to remove unnecessary dependencies and group common classes
Final goal: to be able to use only parts of OpenMS and switch to different implementations of data handling rather easily (in-memory/on-disc/3rd-party library)
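To make the goal above more concrete, here is a minimal sketch of an algorithm written against an interface instead of a concrete data structure. `ISpectrum`, `InMemorySpectrum`, `totalIonCurrent` and `basePeakIndex` are hypothetical names chosen for illustration, not existing OpenMS API; an on-disc or 3rd-party-backed spectrum would implement the same three accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only spectrum interface (not existing OpenMS API).
class ISpectrum
{
public:
  virtual ~ISpectrum() {}
  virtual std::size_t size() const = 0;
  virtual double mz(std::size_t i) const = 0;
  virtual double intensity(std::size_t i) const = 0;
};

// In-memory implementation backed by plain std::vector containers.
class InMemorySpectrum : public ISpectrum
{
public:
  InMemorySpectrum(const std::vector<double>& mz,
                   const std::vector<double>& intensity)
    : mz_(mz), intensity_(intensity)
  {
    assert(mz_.size() == intensity_.size());
  }
  virtual std::size_t size() const { return mz_.size(); }
  virtual double mz(std::size_t i) const { return mz_[i]; }
  virtual double intensity(std::size_t i) const { return intensity_[i]; }

private:
  std::vector<double> mz_;
  std::vector<double> intensity_;
};

// Algorithms see only the interface, never the storage.
double totalIonCurrent(const ISpectrum& s)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < s.size(); ++i) sum += s.intensity(i);
  return sum;
}

// Index of the most intense peak; returns 0 for an empty spectrum.
std::size_t basePeakIndex(const ISpectrum& s)
{
  std::size_t best = 0;
  for (std::size_t i = 1; i < s.size(); ++i)
  {
    if (s.intensity(i) > s.intensity(best)) best = i;
  }
  return best;
}
```

The virtual call per peak has a cost, so whether such interfaces are acceptable in inner loops is exactly the kind of performance question this session would need to settle.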

Social event ? (or another afternoon/day)

Thursday, March 27th

Morning session

Lunch break

Afternoon session

Happy Hacking

Friday, March 28th

Morning session

Lunch break

Afternoon session

Happy Hacking
Future of OpenMS [Oliver, Knut]
- Where are we at the moment?
- What are/were the goals of OpenMS, were they achieved?
- Where do we want to go? What will be future goals?
- Which scientific features should / will be added?
Wrap-Up [Oliver, Knut]

Saturday, March 29th

Postponed topics

~~C++ 11 Adoption~~ [this is not really an option with our Python bindings (need MSVS2008)] - why?
- Because to use compiled Python modules directly without re-compilation, they need to be compiled with the same compiler as the Python.exe executable on the system. On Windows, the "official" Python.exe is compiled with MSVS2008 - so to offer easy access to pyOpenMS we need to support MSVS2008
  - Argh. Python 3.3 is using MSVS2010 (and it looks like 3.4 will stay with that). Would switching to Python 3 be an option and would MSVS2010 give us C++11 support, or do we need to wait for a Python that uses MSVS2012? It would be nice to have a plan for switching to C++11 as soon as that becomes possible.
Generic Galaxy Nodes
- Will there be a tool that automatically generates Galaxy nodes? A GenericGalaxyNodes (GGN) tool equivalent to Stephan’s GenericKnimeNodes (GKN) tool.
- Yes. This is under development, will reside near the GKN repo and will be implemented in Python (based on CTDopts package by Andras Szolek). Luis de la Garza has the lead on this, ETA within the next few weeks.
Protein Evidence Traceback [Lars]
- Often biologists are not so much interested in the entire proteome but in two or three proteins that are the focus of their study. It is currently difficult to trace back the evidence for a particular protein to the original LC-MS data. We could extend ProteinQuantifier, FileFilter and IDFilter to make this task easier. (1) ProteinQuantifier annotates the input featureXML/consensusXML with the protein IDs each feature/consensus was used for. (2) FileFilter and IDFilter then allow searching for particular peptide sequences or protein IDs in featureXML/consensusXML or idXML/mzIdentML.
Visualization: http://d3js.org/
Small molecules [Fabian, Marc]
- Determine small molecule related classes needed in OpenMS
- Assess how these can be serialized with the existing XML file schemata (begin with new standards)
