A Pilot Study for Chinese SQL Semantic Parsing

Abstract

The task of semantic parsing is highly useful for dialogue and question answering systems. Many datasets have been proposed to map natural language text into SQL, among which the recent Spider dataset provides cross-domain samples with multiple tables and complex queries. We build a Spider dataset for Chinese, which is currently a low-resource language in this task area. Interesting research questions arise from the uniqueness of the language, which requires word segmentation, and also from the fact that SQL keywords and columns of DB tables are typically written in English. We compare character- and word-based encoders for a semantic parser, as well as different embedding schemes. Results show that a word-based semantic parser is subject to segmentation errors and that cross-lingual word embeddings are useful for text-to-SQL.
