Author(s): Laippala V, Ginter F, Pyysalo S, Salakoski T
Abstract Share this page
Abstract INTRODUCTION: In this paper, we present steps taken towards more efficient automated processing of clinical Finnish, focusing on daily nursing notes in a Finnish Intensive Care Unit (ICU). First, we analyze ICU Finnish as a sublanguage, identifying its specific features facilitating, for example, the development of a specialized syntactic analyser. The identified features include frequent omission of finite verbs, limitations in allowed syntactic structures, and domain-specific vocabulary. Second, we develop a formal grammar and a parser for ICU Finnish, thus providing better tools for the development of further applications in the clinical domain. METHODS: The grammar is implemented in the LKB system in a typed feature structure formalism. The lexicon is automatically generated based on the output of the FinTWOL morphological analyzer adapted to the clinical domain. As an additional experiment, we study the effect of using Finnish constraint grammar to reduce the size of the lexicon. The parser construction thus makes efficient use of existing resources for Finnish. RESULTS: The grammar currently covers 76.6\% of ICU Finnish sentences, producing highly accurate best-parse analyzes with F-score of 91.1\%. We find that building a parser for the highly specialized domain sublanguage is not only feasible, but also surprisingly efficient, given an existing morphological analyzer with broad vocabulary coverage. The resulting parser enables a deeper analysis of the text than was previously possible.
This article was published in Int J Med Inform
and referenced in Journal of Health & Medical Informatics