Keyphrases provide semantic metadata that summarize and characterize documents. Kea is an algorithm for automatically extracting keyphrases from text. We use a large test corpus to evaluate its effectiveness in terms of how many author-assigned keyphrases are correctly identified. The system is simple, robust, and publicly available. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine learning scheme first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents.
展开▼