BioText: Improving Search over Bioscience Literature via Language Analysis and User Interface Design

The University of California at Berkeley has been given a grant to design, prototype, implement and evaluate a new search system for bioscience literature. Significant components of the project deal with the processing of language as it is naturally spoken or written, the design of the search screen and how it works, and basic database design and implementation. The effectiveness and performance of the system will be evaluated by applying the same set of evaluation questions at each stage of the design, so that every revision can be checked for whether it improves the results.

To label the linguistic relations between biological terms and descriptions in the text, a new computational linguistics technique called "statistical semantic grammars" will be developed. The algorithms will build on the investigator's earlier work on the relations between pairs or groups of nouns in natural language, and on generalizing to unseen sequences of words by making use of the hierarchical structure that relates word meanings to one another. The language labeling component will be evaluated with the standard measures of precision and recall against a known standard collection of text. Although the initial data will come from PubMed abstracts, the design and interface are meant to be applicable to bioscience literature in general.
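
As an illustration of the evaluation described above, the following minimal sketch shows how precision and recall (the "recovery of information" measure) could be computed for predicted relation labels against a gold-standard collection. The function name `precision_recall`, the triple format `(term, description, relation)`, and the sample labels are all hypothetical and chosen only for this example; they are not taken from the BioText project.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    predicted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float]:
    """Return (precision, recall) of predicted labels against gold labels."""
    predicted_set = set(predicted)
    gold_set = set(gold)
    # A prediction counts as correct only if the exact same item is in the gold set.
    true_positives = len(predicted_set & gold_set)
    # Precision: share of predicted labels that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: share of gold labels that the system recovered.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical gold-standard and system output, as (term, description, relation).
gold = {
    ("aspirin", "headache", "treats"),
    ("BRCA1", "breast cancer", "associated_with"),
    ("insulin", "glucose", "regulates"),
}
predicted = {
    ("aspirin", "headache", "treats"),
    ("insulin", "glucose", "regulates"),
    ("aspirin", "glucose", "regulates"),
    ("BRCA1", "breast cancer", "causes"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy data two of the four predictions match the gold standard, and two of the three gold labels are recovered. Running the same fixed gold set after each revision of the labeling component would be one way to compare stages of the design on equal terms.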