With the development of Web 2.0, users are no longer merely readers of website content; they have also become its creators. Information uploaded and shared by users is becoming a vital source of Internet content: Wikipedia's contributors come from all over the world, Google Maps search offers editing and merchant-centre functions, and Dianping (www.dianping.com) provides a merchant information listing service. As users turn from surfing the web to making its waves, the standardisation and correctness of the content they upload and share must be taken into account. In particular, for websites that provide a platform for everyday consumer services, standardising user-uploaded merchant address information is especially important. To this end, this paper proposes a method for Chinese address standardisation based on cascaded conditional random fields, targeting the free-text merchant address corpus of Dianping. Experimental results show that the proposed method is effective, achieving an F-score of 81% in open testing on a real corpus.