This article presents a word-sense annotation for the Balanced Corpus of Contemporary Written Japanese: a mashed-up Japanese lexicon based on the 'Word List by Semantic Principles' (WLSP). The WLSP is a large-scale Japanese thesaurus that contains 98,241 entries, each assigned syntactic and hierarchical semantic categories. We used a morpheme-to-word-sense alignment table to extract all possible word-sense candidates for each word appearing in the target corpus. We then manually disambiguated the word senses of 182,166 content words in the texts.
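
The candidate-extraction step might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the table layout (a lemma and part of speech mapped to sense labels), the function names, and every entry in `ALIGNMENT_ROWS` are hypothetical placeholders rather than real WLSP data.

```python
from collections import defaultdict

# Hypothetical alignment rows: (lemma, part of speech, sense label).
# All values are invented placeholders, not real WLSP entries.
ALIGNMENT_ROWS = [
    ("lemma_a", "noun", "sense_1"),
    ("lemma_a", "noun", "sense_2"),
    ("lemma_b", "verb", "sense_3"),
]


def build_alignment_table(rows):
    """Group sense labels by (lemma, pos), keeping first-seen order."""
    table = defaultdict(list)
    for lemma, pos, sense in rows:
        if sense not in table[(lemma, pos)]:
            table[(lemma, pos)].append(sense)
    return dict(table)


def extract_candidates(tokens, table):
    """Attach the candidate sense list to each (lemma, pos) token.

    Tokens absent from the table get an empty candidate list.
    """
    return [(lemma, pos, table.get((lemma, pos), [])) for lemma, pos in tokens]


def needs_manual_disambiguation(candidates):
    """Keep only tokens that have more than one candidate sense."""
    return [c for c in candidates if len(c[2]) > 1]
```

In this sketch, tokens with exactly one candidate could be labeled automatically, while those returned by `needs_manual_disambiguation` would be passed to a human annotator. Whether the original workflow made that split is an assumption.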