This paper presents the system we developed for the First International Chinese Word Segmentation Bakeoff (ICWSB-1). The system is based on a general-purpose n-gram model for word segmentation and a case-based learning approach to disambiguation. It excels at identifying in-vocabulary (IV) words, achieving a recall of around 96-98%. Here we present our strategies for language model training and disambiguation rule learning, analyze the system's performance, and discuss areas for further improvement, e.g., out-of-vocabulary (OOV) word discovery.
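To make the idea of n-gram-based segmentation concrete, here is a minimal sketch. It is not the authors' system. It simplifies to a unigram model (n = 1) and uses dynamic programming to pick the segmentation with the highest total log-probability. The function name `segment`, the toy lexicon, the `max_word_len` cap, and the fixed `unk_logprob` penalty for unknown single characters are all hypothetical illustrations. The sketch has no case-based disambiguation rules and no OOV word discovery.

```python
import math


def segment(text, word_probs, max_word_len=4, unk_logprob=-20.0):
    """Most probable segmentation of `text` under a unigram word model.

    Hypothetical sketch: `word_probs` maps lexicon words to probabilities;
    single characters missing from the lexicon get the fixed penalty
    `unk_logprob`, so every input can be segmented.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                score = math.log(word_probs[word])
            elif end - start == 1:
                score = unk_logprob  # unknown single character
            else:
                continue  # unknown multi-character strings are not words
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy lexicon (hypothetical probabilities) with the classic ambiguity
# 研究/生命 ("research / life") vs. 研究生/命 ("graduate student / fate").
lexicon = {"研究": 0.01, "研究生": 0.005, "生命": 0.01, "命": 0.001,
           "起源": 0.01, "的": 0.05}
print(segment("研究生命的起源", lexicon))  # ['研究', '生命', '的', '起源']
```

Here the model prefers 研究/生命 because 0.01 × 0.01 exceeds 0.005 × 0.001. A real system would estimate these probabilities from a training corpus and would usually condition on preceding words (n > 1).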