Difference between revisions of "Journal:The systems biology format converter"

From LIMSWiki
(Created as needed.)
 
(Added content. Saving and adding more.)
Line 69: Line 69:
SBFC is developed using the Java programming language. However, if an existing converter is developed in a programming language other than Java, it is still possible to create a new SBFC converter that will invoke the existing converter using the Java ''Runtime exec()'' method. This approach can be used for invoking any external program or command without having to re-write the full converter. Once the converter is integrated into the framework, it can be used and combined effortlessly with other converters (source code available in the SBFC manual). A potential disadvantage of this approach is the loss of interoperability when using operating system-dependent code. The advantage is that the specific SBFC converters directly rely on the development of the original external converters, reducing code duplication.


Each format is identified by an identifiers.org URI [3] or an internet media type. If none of them exists, the developers of the format and converter classes must agree on an identifier (URI) for this format. SBFC allows multiple classes implementing the GeneralModel interface for a given format, using different tools to read and write models. All classes should return the same value for the ''getURI()'' method. For instance, the implementation of a converter for the Systems Biology Markup Language [4] may rely on JSBML [5], libSBML [6], or a DOM document structure [7]. This design can be advantageous when 1) a given library does not read a version of a format properly; 2) a converter was written with an old or newer version of a library that has a different API; or 3) high performance is required (e.g. improving the implementation for file processing). At the beginning of a conversion, the converter checks that the value returned by the ''getURI()'' method of the input GeneralModel is a URI of a format it does support. If the converter recognises the format URI, the generic write methods (''modelToString()'' or ''modelToFile(String fileName)'') are used in order to retrieve the file content.
Each format is identified by an identifiers.org URI<ref name="JutyIdent12">{{cite journal |title=Identifiers.org and MIRIAM Registry: Community resources to provide persistent identification |journal=Nucleic Acids Research |author=Juty, N.; Le Novère, N.; Laibe, C. |volume=40 |issue=D1 |pages=D580-D586 |year=2012 |doi=10.1093/nar/gkr1097 |pmid=22140103 |pmc=PMC3245029}}</ref> or an internet media type. If none of them exists, the developers of the format and converter classes must agree on an identifier (URI) for this format. SBFC allows multiple classes implementing the GeneralModel interface for a given format, using different tools to read and write models. All classes should return the same value for the ''getURI()'' method. For instance, the implementation of a converter for the Systems Biology Markup Language<ref name="HuckaTheSys03">{{cite journal |title=The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models |journal=Bioinformatics |author=Hucka, M.; Finney, A.; Sauro, H.M. et al. |volume=19 |issue=4 |pages=524-531 |year=2003 |doi=10.1093/bioinformatics/btg015 |pmid=12611808}}</ref> may rely on JSBML<ref name="DrägerJSBML11">{{cite journal |title=JSBML: A flexible Java library for working with SBML |journal=Bioinformatics |author=Dräger, A.; Rodriguez, N.; Dumousseau, M. et al. |volume=27 |issue=15 |pages=2167-2168 |year=2011 |doi=10.1093/bioinformatics/btr361 |pmid=21697129 |pmc=PMC3137227}}</ref>, libSBML<ref name="BornsteinLibSBML08">{{cite journal |title=LibSBML: an API library for SBML |journal=Bioinformatics |author=Bornstein, B.J.; Keating, S.M.; Jouraku, A.; Hucka, M. 
|volume=24 |issue=6 |pages=880-881 |year=2008 |doi=10.1093/bioinformatics/btn051 |pmid=18252737 |pmc=PMC2517632}}</ref>, or a DOM document structure.<ref name="LeHorsDoc04">{{cite web |url=https://www.w3.org/TR/DOM-Level-3-Core/ |title=Document Object Model (DOM) Level 3 Core Specification |author=Le Hors, A.; Le Hégaret, P.; Wood, L. et al. |publisher=W3C |date=2004 |accessdate=14 March 2016}}</ref> This design can be advantageous when 1) a given library does not read a version of a format properly; 2) a converter was written with an old or newer version of a library that has a different API; or 3) high performance is required (e.g. improving the implementation for file processing). At the beginning of a conversion, the converter checks that the value returned by the ''getURI()'' method of the input GeneralModel is a URI of a format it does support. If the converter recognises the format URI, the generic write methods (''modelToString()'' or ''modelToFile(String fileName)'') are used in order to retrieve the file content.
 
==Results==
===Available formats and converters===
The SBFC project has already implemented support for several formats and developed several converters. The following format classes are provided:
 
* APMModel for the APMonitor Modelling Language (APM). APMonitor is an optimization software for mixed-integer and differential algebraic equations<ref name="HedengrenNonlinear14">{{cite journal |title=Nonlinear Modeling, Estimation and Predictive Control in APMonitor |journal=Computers & Chemical Engineering |author=Hedengren, J.D.; Asgharzadeh, S.R.; Powell, K.M.; Edgar, T.F. |volume=70 |pages=133–148 |year=2014 |doi=10.1016/j.compchemeng.2014.04.013}}</ref>;
* BioPAXModel for BioPAX, a format to exchange descriptions of biomolecular pathways, including reaction and interaction networks<ref name="DemirTheBio10">{{cite journal |title=The BioPAX community standard for pathway data sharing |journal=Nature Biotechnology |author=Demir, E.; Cary, M.P.; Paley, S. et al. |volume=28 |issue=9 |pages=935–942 |year=2010 |doi=10.1038/nbt.1666}}</ref>;
* DotModel for the Dot format, which encodes graph descriptions used by the open source graph visualisation software GraphViz<ref name="GansnerAnOpen00">{{cite journal |title=An open graph visualization system and its applications to software engineering |journal=Software: Practice and Experience |author=Gansner, E.R.; North, S.C. |volume=30 |issue=11 |pages=1203–1233 |year=2000 |doi=10.1002/1097-024X(200009)30:11<1203::AID-SPE338>3.0.CO;2-N}}</ref> to generate multiple image formats (e.g. PNG, JPEG);
* GPMLModel for the format used by the pathway drawing and analysis tool PathVisio [11] and the pathway database WikiPathways [12];
* MDLModel for the format used by the single particle simulator MCell [13];
* OctaveModel for Octave and MatLab m-file formats, encoding mathematical models usable by the modeling environments GNU Octave [14] and MatLab;
* SBGNMLModel for SBGN-ML format, a format to encode graphical maps in the Systems Biology Graphical Notation [15];
* SBMLModel for SBML [4], a format encoding mathematical models;
* XPPModel for XPP format, encoding mathematical models usable by the numerical analysis software XPPAUT [16].


==References==

Revision as of 17:32, 25 April 2016

Full article title The systems biology format converter
Journal BMC Bioinformatics
Author(s) Rodriguez, N.; Pettit, J.-B.; Pezze, P.D.; Li, L.; Henry, A.; van Iersel, M.P.; Jalowicki, G.;
Kutmon, M.; Natarajan, K.N.; Tolnay, D.; Stefan, M.I.; Evelo, C.T.; Le Novère, N.
Author affiliation(s) EMBL European Bioinformatics Institute, The Babraham Institute,
Maastricht University, California Institute of Technology
Primary contact E-mail: lenov@babraham.ac.uk
Year published 2016
Volume and issue 17
Page(s) 154
DOI 10.1186/s12859-016-1000-2
ISSN 1471-2105
Distribution license Creative Commons Attribution 4.0 International
Website http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1000-2
Download http://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1000-2 (PDF)

Abstract

Background

Interoperability between formats is a recurring problem in systems biology research. Many tools have been developed to convert computational models from one format to another. However, they have been developed independently, resulting in redundancy of efforts and lack of synergy.

Results

Here we present the Systems Biology Format Converter (SBFC), which provides a generic framework to potentially convert any format into another. The framework currently includes several converters translating between the following formats: SBML, BioPAX, SBGN-ML, Matlab, Octave, XPP, GPML, Dot, MDL and APM. This software is written in Java and can be used as a standalone executable or web service.

Conclusions

The SBFC framework is an evolving software project. Existing converters can be used and improved, and new converters can be easily added, making SBFC useful to both modellers and developers. The source code and documentation of the framework are freely available from the project web site.

Keywords

Converter, Format, Systems biology, SBML

Background

Computational representations of pathways and models lie at the core of systems biology research.[1] Formats have been designed to encode these complex knowledge representations, either as community standards or as formats specific to a software tool.[2] Different formats are preferentially used to address specific problems or use different approaches, thus limiting interoperability. However, one often needs to use several tools and approaches to answer a biological question, or to reuse existing pathways and models in different contexts. Many format converters have been written over the years. Often, several converters between the same formats are developed independently by different groups. This results in a duplication of efforts and waste of time, energy and money. The different converters may be inconsistent, leading to different results. In addition, being developed by one person or one team, those software tools tend to go unmaintained while the formats they are covering keep evolving. Finally, some of these converters are embedded in larger pieces of software, which hinders their use.

To overcome these challenges, the Systems Biology Format Converter (SBFC) software provides an open-source modular and extensible framework to potentially support conversion between any two formats using a single executable or web service.

Implementation

SBFC was built to support rapid implementation and integration of new converters. Therefore, it was designed with a high degree of modularity. At the core of the software are the GeneralModel interface and the GeneralConverter abstract class. The former is used for data exchange and describes the operations that every input or output format object must implement to be processed by SBFC. The latter represents the generic algorithm for converting one format into another. An overview of the SBFC framework is provided in Fig. 1.


Figure 1. SBFC overview. Overview of the software package SBFC. At the SBFC core, a general converter translates one general model into another. Instantiations of general model and general converter are easily implemented in SBFC, providing users with a wide range of options for converting between specific model formats. Software libraries for importing or exporting model formats can be reused by different converters. For instance, the converter SBML2BioPAX currently uses the software library JSBML to import an SBML model and PAXTOOLS to export it.

To add a new format, a developer must simply implement the GeneralModel interface, which provides some methods to read and write the format to file or string. Adding a new converter requires extending the GeneralConverter class and implementing the GeneralModel convert(GeneralModel model) method, where the model parameter is the input format that needs to be converted and the returned GeneralModel object is the new converted format. For instance, a converter A2B translating from a file formatted as model A to a file formatted as model B requires the definition of two classes, ModelA and ModelB, implementing the GeneralModel interface. The converter class A2B must extend the abstract class GeneralConverter and implement the method GeneralModel convert(GeneralModel model). This method will receive an input object named model, whose dynamic type is ModelA. The object returned by this method will have dynamic type ModelB.
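
The A2B example can be sketched as follows. This is a minimal illustration under stated assumptions, not SBFC's actual source: GeneralModel and GeneralConverter are declared locally as simplified stand-ins (the real SBFC types declare more methods), the reader method name setModelFromString and the placeholder URIs are hypothetical, and the "conversion" is a toy string wrapper standing in for real translation logic.

```java
// Simplified stand-ins for the SBFC core types (assumption: the real
// interface and abstract class declare more methods than shown here).
interface GeneralModel {
    void setModelFromString(String content); // hypothetical reader name
    String modelToString();
    String getURI();
}

abstract class GeneralConverter {
    public abstract GeneralModel convert(GeneralModel model);
}

// Format A: keeps its content as plain text.
class ModelA implements GeneralModel {
    private String content = "";
    public void setModelFromString(String content) { this.content = content; }
    public String modelToString() { return content; }
    public String getURI() { return "urn:example:format-a"; } // placeholder URI
}

// Format B: same shape, different format identifier.
class ModelB implements GeneralModel {
    private String content = "";
    public void setModelFromString(String content) { this.content = content; }
    public String modelToString() { return content; }
    public String getURI() { return "urn:example:format-b"; } // placeholder URI
}

// Converter from A to B: receives a ModelA, returns a ModelB.
class A2B extends GeneralConverter {
    @Override
    public GeneralModel convert(GeneralModel model) {
        ModelB result = new ModelB();
        // Toy transformation standing in for real format translation.
        result.setModelFromString("B(" + model.modelToString() + ")");
        return result;
    }
}
```

A caller would populate a ModelA, pass it to new A2B().convert(...), and write out the returned object through the generic GeneralModel methods without needing to know its concrete class.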

Because all SBFC format classes are implementations of the GeneralModel interface, it is possible to create new converters re-using existing converters by simply invoking the generic convert() method for each existing converter (Fig. 2). The convert() method in the new converter A2C is implemented by calling the convert() methods of the converters A2B and B2C in turn (source code for all classes is available in the SBFC manual).


Figure 2. Creation of complex converters. a. In this scenario, three existing formats (A, B, and C) and two converters (A2B and B2C) are considered. Each of the A, B and C classes represents a different format and implements the interface GeneralModel. The converters extend the GeneralConverter class and translate from A to B, and from B to C, respectively. b. A new converter A2C, translating from A to C, can be added effortlessly by invoking the method convert() implemented in the converters A2B and B2C. c. Java source code illustrating the implementation of the method convert() for the converter class A2C.
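
The chaining idea can be sketched in a few lines of Java. This is an illustrative reconstruction rather than the code shown in the figure or the SBFC manual: the GeneralModel and GeneralConverter declarations are reduced, locally defined stand-ins, and the three model classes and their string-wrapping "conversions" are hypothetical.

```java
// Reduced stand-ins for the SBFC core types (assumption: not the real API).
interface GeneralModel {
    String modelToString();
}

abstract class GeneralConverter {
    public abstract GeneralModel convert(GeneralModel model);
}

// Three toy formats that each wrap a piece of text.
class ModelA implements GeneralModel {
    private final String text;
    ModelA(String text) { this.text = text; }
    public String modelToString() { return text; }
}

class ModelB implements GeneralModel {
    private final String text;
    ModelB(String text) { this.text = text; }
    public String modelToString() { return text; }
}

class ModelC implements GeneralModel {
    private final String text;
    ModelC(String text) { this.text = text; }
    public String modelToString() { return text; }
}

// Existing converters.
class A2B extends GeneralConverter {
    @Override
    public GeneralModel convert(GeneralModel model) {
        return new ModelB("B(" + model.modelToString() + ")");
    }
}

class B2C extends GeneralConverter {
    @Override
    public GeneralModel convert(GeneralModel model) {
        return new ModelC("C(" + model.modelToString() + ")");
    }
}

// New converter built purely by composition: A -> B -> C.
class A2C extends GeneralConverter {
    @Override
    public GeneralModel convert(GeneralModel model) {
        GeneralModel intermediate = new A2B().convert(model);
        return new B2C().convert(intermediate);
    }
}
```

Because A2C only depends on the generic convert() contract, improving A2B or B2C would improve A2C without any change to its own code.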

SBFC is developed using the Java programming language. However, if an existing converter is developed in a programming language other than Java, it is still possible to create a new SBFC converter that will invoke the existing converter using the Java Runtime exec() method. This approach can be used for invoking any external program or command without having to re-write the full converter. Once the converter is integrated into the framework, it can be used and combined effortlessly with other converters (source code available in the SBFC manual). A potential disadvantage of this approach is the loss of interoperability when using operating system-dependent code. The advantage is that the specific SBFC converters directly rely on the development of the original external converters, reducing code duplication.
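
As a rough sketch of how such a wrapper might call an external tool, the helper below launches a command with Runtime.getRuntime().exec(String[]) and returns whatever the tool prints to standard output. The class name ExternalTool and the method run are hypothetical and do not come from SBFC; a real converter would call something like this inside its convert() method and wrap the output in the target model class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SBFC.
class ExternalTool {
    /**
     * Runs the given command and returns the text it wrote to standard output.
     * Assumption: the tool writes little to standard error; a very chatty tool
     * could fill the stderr pipe and block, since stderr is not drained here.
     */
    static String run(String... command) throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(command);
        process.getOutputStream().close(); // the tool receives no standard input
        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External converter failed with exit code " + exitCode);
        }
        return output;
    }
}
```

The command array is where operating system dependence enters: executable names and paths differ between platforms, which is the interoperability cost mentioned above. Where standard error also needs to be captured, ProcessBuilder with redirectErrorStream(true) is a common alternative to Runtime exec().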

Each format is identified by an identifiers.org URI[3] or an internet media type. If neither exists, the developers of the format and converter classes must agree on an identifier (URI) for this format. SBFC allows multiple classes implementing the GeneralModel interface for a given format, using different tools to read and write models. All classes should return the same value for the getURI() method. For instance, the implementation of a converter for the Systems Biology Markup Language[4] may rely on JSBML[5], libSBML[6], or a DOM document structure.[7] This design can be advantageous when 1) a given library does not read a version of a format properly; 2) a converter was written with an older or newer version of a library that has a different API; or 3) high performance is required (e.g. improving the implementation for file processing). At the beginning of a conversion, the converter checks that the value returned by the getURI() method of the input GeneralModel is the URI of a format it supports. If the converter recognises the format URI, the generic write methods (modelToString() or modelToFile(String fileName)) are used in order to retrieve the file content.
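
The URI check can be illustrated with a short sketch. Everything here is assumed for illustration: the reduced GeneralModel interface, the class names, and the placeholder URIs are hypothetical and are not taken from the SBFC source. Two model classes backed by different (imaginary) libraries report the same URI, so a converter that recognises that URI accepts either and reads the content through the generic modelToString() method.

```java
import java.util.Set;

// Reduced stand-in for the SBFC interface (assumption: not the real API).
interface GeneralModel {
    String getURI();
    String modelToString();
}

// Two hypothetical implementations of the same format, each imagined to be
// backed by a different library; both return the same format URI.
class LibraryXModel implements GeneralModel {
    private final String content;
    LibraryXModel(String content) { this.content = content; }
    public String getURI() { return "urn:example:format-a"; } // placeholder URI
    public String modelToString() { return content; }
}

class LibraryYModel implements GeneralModel {
    private final String content;
    LibraryYModel(String content) { this.content = content; }
    public String getURI() { return "urn:example:format-a"; } // placeholder URI
    public String modelToString() { return content; }
}

// A model of an unrelated format.
class OtherModel implements GeneralModel {
    private final String content;
    OtherModel(String content) { this.content = content; }
    public String getURI() { return "urn:example:format-z"; } // placeholder URI
    public String modelToString() { return content; }
}

// Input handling a converter might perform before translating.
class FormatAInputReader {
    private static final Set<String> SUPPORTED_URIS = Set.of("urn:example:format-a");

    /** Returns the model content if its format URI is supported, else fails. */
    String readInput(GeneralModel input) {
        if (!SUPPORTED_URIS.contains(input.getURI())) {
            throw new IllegalArgumentException("Unsupported format: " + input.getURI());
        }
        // The converter does not care which class produced the content.
        return input.modelToString();
    }
}
```

Keying the check on the URI rather than on the concrete class is what lets several implementations of one format coexist.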

Results

Available formats and converters

The SBFC project has already implemented support for several formats and developed several converters. The following format classes are provided:

  • APMModel for the APMonitor Modelling Language (APM). APMonitor is an optimization software for mixed-integer and differential algebraic equations[8];
  • BioPAXModel for BioPAX, a format to exchange descriptions of biomolecular pathways, including reaction and interaction networks[9];
  • DotModel for the Dot format, which encodes graph descriptions used by the open source graph visualisation software GraphViz[10] to generate multiple image formats (e.g. PNG, JPEG);
  • GPMLModel for the format used by the pathway drawing and analysis tool PathVisio [11] and the pathway database WikiPathways [12];
  • MDLModel for the format used by the single particle simulator MCell [13];
  • OctaveModel for Octave and MatLab m-file formats, encoding mathematical models usable by the modeling environments GNU Octave [14] and MatLab;
  • SBGNMLModel for SBGN-ML format, a format to encode graphical maps in the Systems Biology Graphical Notation [15];
  • SBMLModel for SBML [4], a format encoding mathematical models;
  • XPPModel for XPP format, encoding mathematical models usable by the numerical analysis software XPPAUT [16].

References

  1. Le Novère, N. (2015). "Quantitative and logic modelling of gene and molecular networks". Nature Reviews Genetics 16 (3): 146–158. doi:10.1038/nrg3885. PMC PMC4604653. PMID 25645874. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4604653. 
  2. Hucka, M.; Nickerson, D.P.; Bader, G.D. et al. (2015). "Promoting coordinated development of community-based information standards for modeling in biology: The COMBINE initiative". Frontiers in Bioengineering and Biotechnology 3: 19. doi:10.3389/fbioe.2015.00019. PMC PMC4338824. PMID 25759811. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4338824. 
  3. Juty, N.; Le Novère, N.; Laibe, C. (2012). "Identifiers.org and MIRIAM Registry: Community resources to provide persistent identification". Nucleic Acids Research 40 (D1): D580-D586. doi:10.1093/nar/gkr1097. PMC PMC3245029. PMID 22140103. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245029. 
  4. Hucka, M.; Finney, A.; Sauro, H.M. et al. (2003). "The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models". Bioinformatics 19 (4): 524-531. doi:10.1093/bioinformatics/btg015. PMID 12611808. 
  5. Dräger, A.; Rodriguez, N.; Dumousseau, M. et al. (2011). "JSBML: A flexible Java library for working with SBML". Bioinformatics 27 (15): 2167-2168. doi:10.1093/bioinformatics/btr361. PMC PMC3137227. PMID 21697129. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137227. 
  6. Bornstein, B.J.; Keating, S.M.; Jouraku, A.; Hucka, M. (2008). "LibSBML: an API library for SBML". Bioinformatics 24 (6): 880-881. doi:10.1093/bioinformatics/btn051. PMC PMC2517632. PMID 18252737. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2517632. 
  7. Le Hors, A.; Le Hégaret, P.; Wood, L. et al. (2004). "Document Object Model (DOM) Level 3 Core Specification". W3C. https://www.w3.org/TR/DOM-Level-3-Core/. Retrieved 14 March 2016. 
  8. Hedengren, J.D.; Asgharzadeh, S.R.; Powell, K.M.; Edgar, T.F. (2014). "Nonlinear Modeling, Estimation and Predictive Control in APMonitor". Computers & Chemical Engineering 70: 133–148. doi:10.1016/j.compchemeng.2014.04.013. 
  9. Demir, E.; Cary, M.P.; Paley, S. et al. (2010). "The BioPAX community standard for pathway data sharing". Nature Biotechnology 28 (9): 935–942. doi:10.1038/nbt.1666. 
  10. Gansner, E.R.; North, S.C. (2000). "An open graph visualization system and its applications to software engineering". Software: Practice and Experience 30 (11): 1203–1233. doi:10.1002/1097-024X(200009)30:11<1203::AID-SPE338>3.0.CO;2-N. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.