EPIA'03 - 11th Portuguese Conference on Artificial Intelligence

NLTR -- Natural Language and Text Retrieval


Session: December 5, 10:0-10:45, Room B
Title: Automatic Summarization based on Principal Component Analysis
Chang Beom Lee, Min Soo Kim, Jang Sun Baek, and Hyuk Ro Park
Abstract: This paper describes an automatic summarization approach that constructs a summary by extracting the sentences most likely to represent the main theme of a document. The approach exploits the relationships among words in the document to improve the detection of significant sentences. The technique used is Principal Component Analysis (PCA), a multivariate statistical method. PCA captures the pattern of word usage in the document through its eigenvectors and their corresponding eigenvalues. We extract thematic words using the eigenvectors and their eigenvalues, and then select significant sentences using those thematic words. Experimental results on newspaper articles show that the proposed method outperforms methods based on word frequency and methods based on lexical chains built with an information retrieval thesaurus.
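The pipeline outlined in the abstract (PCA over word usage, thematic words from the leading eigenvectors, sentences chosen by those words) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the word matrix is built, how many components are kept, or how sentences are scored. The sentence-by-word count matrix, the use of absolute loadings, the scoring rule, the function name summarize and its parameters n_components, n_words and n_sentences are all assumptions made for illustration.

```python
import re
import numpy as np


def summarize(sentences, n_components=1, n_words=3, n_sentences=1):
    """Illustrative PCA-based extractive summarizer (assumed design, not the paper's)."""
    # Tokenize each sentence into lowercase alphabetic words (no stop-word filtering).
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    index = {w: i for i, w in enumerate(vocab)}

    # Assumption: rows are sentences, columns are words, entries are counts.
    X = np.zeros((len(sentences), len(vocab)))
    for row, words in enumerate(tokens):
        for w in words:
            X[row, index[w]] += 1

    # PCA: eigen-decomposition of the word-by-word covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / max(len(sentences) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    leading = np.argsort(eigvals)[::-1][:n_components]

    # Assumption: thematic words are those with the largest absolute
    # loadings on the components with the largest eigenvalues.
    thematic = set()
    for k in leading:
        top = np.argsort(np.abs(eigvecs[:, k]))[::-1][:n_words]
        thematic.update(vocab[i] for i in top)

    # Assumption: a sentence's score is its number of thematic-word tokens.
    scores = [sum(w in thematic for w in words) for words in tokens]
    chosen = sorted(np.argsort(scores)[::-1][:n_sentences])  # keep document order
    return thematic, [sentences[i] for i in chosen]
```

Calling summarize(list_of_sentences, n_sentences=2) returns the set of thematic words and the two highest-scoring sentences in their original order. A realistic system would at least remove stop words and normalize word forms before building the matrix.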