Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.