This paper describes a method used to construct a thesaurus in the field of civil engineering. The work investigates the potential of thesauri both as a tool for information retrieval systems and as an aid in civil engineering. ThesWB, a tool that extracts terms and the relations between them from HTML documents, was used to collect candidate thesaurus terms from the Web. The principal advantage of the Web as a source for thesaurus construction is that it can be viewed as a body of text containing two fundamentally different types of data: the contents and the tags. In HTML, tags are metadata that describe the layout of a text and the linking structure between texts. For documents of this kind, different approaches can be applied to extract and structure terms automatically. ThesWB is used to construct domain-independent thesauri from HTML pages.
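The two data types can be illustrated with a rough sketch. The following Python code uses the standard-library `html.parser` module to separate a page's text content from its tag structure. It treats text inside title, heading, and anchor tags as candidate terms and collects `href` values as the linking structure. The set of "term" tags and the class name `TermTagExtractor` are hypothetical choices made for illustration. This is not ThesWB's actual extraction method, which the abstract does not describe.

```python
from html.parser import HTMLParser

# Hypothetical choice: text directly inside these tags is treated as a candidate term.
TERM_TAGS = {"title", "h1", "h2", "h3", "a"}
# Void elements have no end tag, so they are never pushed onto the tag stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TermTagExtractor(HTMLParser):
    """Separates an HTML page into tag-derived structure and plain-text content."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open (non-void) tags
        self.candidates = []  # (tag, lowercased text) pairs found inside TERM_TAGS
        self.links = []       # href targets of <a> tags (linking structure)
        self.content = []     # all non-empty text chunks, regardless of tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        self.content.append(text)
        if self.open_tags and self.open_tags[-1] in TERM_TAGS:
            self.candidates.append((self.open_tags[-1], text.lower()))


html = """<html><head><title>Bridge Design</title></head>
<body><h1>Reinforced concrete</h1>
<p>See <a href="beams.html">Prestressed beams</a> for details.</p></body></html>"""

parser = TermTagExtractor()
parser.feed(html)
parser.close()
print(parser.candidates)
# [('title', 'bridge design'), ('h1', 'reinforced concrete'), ('a', 'prestressed beams')]
print(parser.links)
# ['beams.html']
```

In this sketch the tags decide which text counts as a candidate term, and the anchors give a crude signal of relatedness between pages. A real thesaurus builder would still need term normalization, filtering, and relation typing on top of this.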