Difference between revisions of "Template:Article of the week"

From LIMSWiki
(Updated article of the week text.)
(Updated article of the week text.)
Line 1: Line 1:
<div style="float: left; margin: 0.5em 0.9em 0.4em 0em;">[[File:Fig0.5 Alperin JofCheminformatics2016 8.gif|240px]]</div>
<div style="float: left; margin: 0.5em 0.9em 0.4em 0em;">[[File:Fig1 Garza BMCBioinformatics2016 17.gif|240px]]</div>
'''"[[Journal:Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine|Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine]]"'''
'''"[[Journal:From the desktop to the grid: Scalable bioinformatics via workflow conversion|From the desktop to the grid: Scalable bioinformatics via workflow conversion]]"'''


This study seeks to develop, test, and assess a methodology for automatically extracting a complete set of ‘term-like phrases’ and creating a terminology spectrum from a collection of natural-language PDF documents in the field of chemistry. A ‘term-like phrase’ is defined as one or more consecutive words and/or alphanumeric string combinations with unchanged spelling that convey a specific scientific meaning. A terminology spectrum for a natural-language document is an indexed list of tagged entities, including recognized general scientific concepts, terms linked to existing thesauri, names of chemical substances and reactions, and term-like phrases. The retrieval routine is based on n-gram textual analysis with sequential execution of various ‘accept and reject’ rules that take morphological and structural [[information]] into account.
Reproducibility is one of the tenets of the [[scientific method]]. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. [[Workflow]]s rest upon the notion of splitting complex work into the joint effort of several manageable tasks.


The retrieval process was assessed quantitatively using precision (P), recall (R), and the F1-measure, calculated manually by professional chemical scientists from a limited set of documents (the full set of text abstracts from five EuropaCat events). The assessment demonstrated the effectiveness of the developed approach. ('''[[Journal:Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine|Full article...]]''')<br />
Several engines give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, so each has its own advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. ('''[[Journal:From the desktop to the grid: Scalable bioinformatics via workflow conversion|Full article...]]''')<br />
<br />
<br />
''Recently featured'':  
''Recently featured'':  
: ▪ [[Journal:Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine|Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine]]
: ▪ [[Journal:A legal framework to support development and assessment of digital health services|A legal framework to support development and assessment of digital health services]]
: ▪ [[Journal:A legal framework to support development and assessment of digital health services|A legal framework to support development and assessment of digital health services]]
: ▪ [[Journal:The GAAIN Entity Mapper: An active-learning system for medical data mapping|The GAAIN Entity Mapper: An active-learning system for medical data mapping]]
: ▪ [[Journal:The GAAIN Entity Mapper: An active-learning system for medical data mapping|The GAAIN Entity Mapper: An active-learning system for medical data mapping]]
: ▪ [[Journal:Visualizing the quality of partially accruing data for use in decision making|Visualizing the quality of partially accruing data for use in decision making]]

Revision as of 15:42, 15 August 2016

"From the desktop to the grid: Scalable bioinformatics via workflow conversion"

Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks.

Several engines give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, so each has its own advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. (Full article...)

Recently featured:

Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine
A legal framework to support development and assessment of digital health services
The GAAIN Entity Mapper: An active-learning system for medical data mapping