Journal: International Journal of Medical Informatics

Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports

Abstract

Background: Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions.

Objective: This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language in which word frequency follows a discrete power-law distribution.

Method: We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power law) provided superior fits to the data.

Result: Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, in that the truncated power law provides a superior fit over a pure power law.

Conclusion: Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior.
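The first steps named in the Method (tokenizing, counting token frequency, and fitting a power law) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the regular-expression tokenizer, the toy report strings, and the choice of `xmin=1` are all assumptions made for illustration. The exponent is estimated with a commonly used closed-form approximation to the discrete maximum-likelihood estimate, which is known to be rough for small `xmin`. The comparison against lognormal, exponential, stretched exponential, and truncated power-law fits is not shown.

```python
import math
import re
from collections import Counter


def token_frequencies(texts):
    """Split each text into lowercase alphanumeric tokens and count them."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def rank_frequency(counts):
    """Return (rank, frequency) pairs, most frequent token first."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def powerlaw_alpha(values, xmin=1):
    """Approximate MLE of the exponent for discrete values >= xmin.

    Uses alpha ~ 1 + n / sum(ln(x / (xmin - 0.5))); this is an
    approximation and is least accurate when xmin is small.
    """
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / (xmin - 0.5)) for v in tail)
    return 1.0 + len(tail) / log_sum


# Toy stand-in text, NOT taken from MIMIC-III.
reports = [
    "Patient admitted with chest pain. Patient discharged home in stable condition.",
    "Patient admitted with shortness of breath. Discharged home on oral antibiotics.",
]

counts = token_frequencies(reports)
ranked = rank_frequency(counts)
# Exponent of the distribution of token counts (assumed xmin=1).
alpha = powerlaw_alpha(counts.values(), xmin=1)
print(ranked[:5], round(alpha, 3))
```

On a real corpus, the rank-frequency pairs would typically be inspected on log-log axes, and the lower cutoff `xmin` would be chosen from the data rather than fixed in advance.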
