Product specifications contain a large amount of data, but it is not clear which of those data are characteristic. We are developing a multi-specification summarization system that uses characteristic data extracted from product specifications. The specifications are written in a <table> tag. The presence of a <table> tag in an HTML document, however, does not necessarily indicate the presence of specifications: in one particular domain, less than 30% of HTML <table> tags are real tables. In this paper, we propose a keyword extraction method for extracting product specifications. We evaluate performance for two keyword sets, one constructed with an entropy-based method and the other with a method based on Bayes' theorem.
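The abstract does not give the scoring formulas, so the following is only a minimal sketch of how two such keyword sets might be built, not the authors' method. It assumes each table has already been tokenized into a list of words and labeled as a specification table or not. The function names, the use of per-table document frequency, and the thresholds are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def score_words(spec_tables, other_tables):
    """Return {word: (P(spec | word), entropy in bits)}.

    Both arguments are non-empty lists of tokenized tables (lists of words).
    This is an illustrative assumption, not the paper's actual formulation.
    """
    n_spec, n_other = len(spec_tables), len(other_tables)
    total = n_spec + n_other
    # Document frequency: number of tables in each class containing the word.
    df_spec = Counter(w for table in spec_tables for w in set(table))
    df_other = Counter(w for table in other_tables for w in set(table))
    scores = {}
    for word in set(df_spec) | set(df_other):
        # Bayes' theorem: P(spec | w) = P(w | spec) * P(spec) / P(w)
        p_w_given_spec = df_spec[word] / n_spec
        p_spec = n_spec / total
        p_w = (df_spec[word] + df_other[word]) / total
        p = p_w_given_spec * p_spec / p_w
        # Entropy of the two-class distribution given the word;
        # a low value means the word is a strong indicator of one class.
        entropy = sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)
        scores[word] = (p, entropy)
    return scores


def bayes_keywords(scores, min_p=0.9):
    """Words whose posterior P(spec | word) reaches a hypothetical threshold."""
    return sorted(w for w, (p, _) in scores.items() if p >= min_p)


def entropy_keywords(scores, max_entropy=0.5):
    """Low-entropy words that lean toward the specification class."""
    return sorted(
        w for w, (p, h) in scores.items() if h <= max_entropy and p > 0.5
    )
```

With a labeled sample of tables, `score_words` would be run once and the two selector functions applied to its result, so that the two keyword sets can be compared on the same data.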