EPIA'03 - 11th Portuguese Conference on Artificial Intelligence
NLTR -- Natural Language and Text Retrieval

Session: December 5, 10:0-10:45, Room B
Title:
Automatic Summarization based on Principal Component Analysis

Chang Beom Lee, Min Soo Kim, Jang Sun Baek, and Hyuk Ro Park
Abstract:
This paper describes an automatic summarization approach that constructs a summary by extracting the sentences most likely to represent the main theme of a document. The approach exploits the relationships among words in the document to improve the detection of significant sentences.
The technique used is Principal Component Analysis (PCA), a multivariate statistical method. PCA characterizes the flow of words in the document on the basis of an eigenvector and its corresponding eigenvalue. We extract thematic words using an eigenvector and its corresponding eigenvalue, and then select significant sentences based on those thematic words.
Experimental results on newspaper articles show that the proposed method outperforms methods that rely on word frequency or on lexical chains built with an information retrieval thesaurus.
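
The pipeline outlined in the abstract (PCA over word statistics, thematic words taken from an eigenvector, sentences chosen by those words) can be illustrated with a minimal sketch. The abstract does not specify the data matrix, the number of components, or the scoring rule, so everything below is an assumption made for illustration: a sentence-by-term count matrix, only the first principal component (found by power iteration), thematic words ranked by absolute loading, and sentences scored by how many thematic-word occurrences they contain. The function names tokenize, covariance_matrix, leading_eigenpair, and summarize are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def covariance_matrix(rows):
    """Sample covariance between the columns (terms) of a sentence-by-term matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    m = len(matrix)
    # Non-uniform start vector: reduces the chance of starting orthogonal
    # to the leading eigenvector in symmetric toy inputs.
    v = [float(i + 1) for i in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the product vanished; nothing more to iterate on
            break
        v = [x / norm for x in w]
        eigenvalue = norm  # for a covariance matrix this tends to the top eigenvalue
    return eigenvalue, v


def summarize(sentences, n_words=3, n_sentences=1):
    """Return (thematic words, selected sentences) under the assumptions above."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to estimate covariance")
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]

    # Assumption: only the first principal component is used.
    _, vec = leading_eigenpair(covariance_matrix(rows))

    # Assumption: thematic words are the terms with the largest absolute loadings.
    ranked = sorted(range(len(vocab)), key=lambda j: -abs(vec[j]))
    thematic = [vocab[j] for j in ranked[:n_words]]

    # Assumption: a sentence's score is its count of thematic-word occurrences.
    scores = [sum(c[t] for t in thematic) for c in counts]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences]
    return thematic, [sentences[i] for i in sorted(top)]
```

Calling summarize(list_of_sentences, n_words=5, n_sentences=2) returns the chosen thematic words and the selected sentences in their original order. A practical system would likely also need stop-word removal and possibly several components weighted by their eigenvalues; those choices are left open here because the abstract does not describe them.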