Subtopic segmentation aims to find the boundaries between text passages that represent different subtopics, which together usually develop the main topic of a text. Automatic subtopic detection is useful for several Natural Language Processing applications. This paper describes subtopic annotation in a corpus of news texts written in Brazilian Portuguese. In particular, we focus on answering the main scientific questions regarding corpus annotation, with two goals: to discuss and resolve important annotation decisions, and to make available a reference corpus for research on subtopic structuring and segmentation.