16th Workshop on Biomedical Natural Language Processing

Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings

Abstract

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. We greatly outperform two baseline off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of an optimized noisy channel model, showing that neural embeddings can be successfully exploited to include context-awareness in a spelling correction model. Our source code, including a script to extract the annotated test data, can be found at https://github.com/pieterfivez/bionlp2017.
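
The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The abstract does not specify the weighting scheme, so the reciprocal-distance weight used here is an assumption, as are the function names and the toy two-dimensional vectors. Candidate generation and character n-gram embeddings are not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(context, embeddings):
    """Weighted sum of context word vectors.

    `context` is a list of (word, distance) pairs, where distance is the
    number of tokens between the word and the misspelling (1 = adjacent).
    ASSUMPTION: each word is weighted by 1 / distance; the abstract only
    says the similarity is "weighted".
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word, distance in context:
        vec = embeddings.get(word)
        if vec is None:  # skip words without a vector
            continue
        weight = 1.0 / distance
        for i, x in enumerate(vec):
            total[i] += weight * x
    return total


def rank_candidates(candidates, context, embeddings):
    """Return (candidate, score) pairs sorted by similarity to the context, best first."""
    ctx = context_vector(context, embeddings)
    scored = [(c, cosine(embeddings[c], ctx)) for c in candidates if c in embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration only.
EMBEDDINGS = {
    "patient": [1.0, 0.0],
    "with": [0.5, 0.5],
    "fever": [0.9, 0.1],
    "fewer": [0.0, 1.0],
}

# Misspelling "fevr" in "patient with fevr": "with" is adjacent, "patient" is two tokens away.
ranking = rank_candidates(["fewer", "fever"], [("patient", 2), ("with", 1)], EMBEDDINGS)
print(ranking[0][0])
```

With these toy vectors the context points mostly along the first dimension, so "fever" ranks above "fewer" even though both are plausible edits of the misspelling. A frequency-based model without context would have no such signal.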
