The goal of this project is to find the prominent summarized points in a collection of documents. We propose a system that detects summarized points from a long passage or from multiple paragraphs, using natural language processing (NLP) to discover them efficiently from the provided content. The provided content is divided into two parts: Summarized Content and Summarized Points.

One would expect particular words to appear in the content more or less frequently depending on the topic: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear more often in documents about cats, and “the” and “is” will appear about equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. The proposed system captures this intuition in a mathematical framework and uses it to examine the content of a given set of documents.

The system extracts the keywords that occur often, groups these keywords with a clustering algorithm to discover the topics in the set of documents, and then detects the summarized points from the collection. It also takes the co-occurrence of terms into account, which improves the results.
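The keyword extraction and clustering steps described above can be illustrated with a minimal sketch. The synopsis does not name the clustering algorithm, so this example assumes a simple stand-in: keywords that co-occur in the same sentence often enough are linked, and each connected group of linked keywords is treated as one topic. The stopword list, the thresholds `min_count` and `min_pair_count`, and all function names are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "on", "at", "it"}


def split_sentences(text):
    """Split text into sentences on ., ! or ? (a rough heuristic)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def frequent_keywords(sentences, min_count=2):
    """Keep words that occur at least min_count times across all sentences."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return {w for w, c in counts.items() if c >= min_count}


def cooccurrence(sentences, keywords):
    """Count how often each pair of keywords appears in the same sentence."""
    pairs = Counter()
    for s in sentences:
        present = sorted(set(tokenize(s)) & keywords)
        pairs.update(combinations(present, 2))
    return pairs


def cluster_keywords(keywords, pairs, min_pair_count=2):
    """Assumed clustering: connected components of the co-occurrence graph."""
    parent = {w: w for w in keywords}

    def find(w):
        # Follow parent links to the root, shortening the path on the way.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), count in pairs.items():
        if count >= min_pair_count:
            parent[find(a)] = find(b)

    groups = {}
    for w in keywords:
        groups.setdefault(find(w), set()).add(w)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def discover_topics(text, min_count=2, min_pair_count=2):
    """Extract frequent keywords and group them into topics by co-occurrence."""
    sentences = split_sentences(text)
    keywords = frequent_keywords(sentences, min_count)
    pairs = cooccurrence(sentences, keywords)
    return cluster_keywords(keywords, pairs, min_pair_count)


sample = ("The dog chewed a bone. The dog buried the bone. "
          "The cat said meow. A cat will meow at night.")
topics = discover_topics(sample)
```

On the sample text, the dog words and the cat words end up in separate groups because they never share a sentence. A more elaborate clustering method could be swapped in for `cluster_keywords` without changing the rest of the pipeline.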
Advantages
- The user can specify by what percentage the content should be summarized.
- The algorithm returns the summarized data quickly.
- It selects the most suitable points for summarization.
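
As a rough sketch of how a user-specified percentage could drive point selection, the example below scores each sentence and keeps the requested share of them in their original order. The scoring rule (average frequency of a sentence's non-stopword words) is an assumption, as are the stopword list and the function name `summarize`; the synopsis does not state how the system ranks points.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "on", "at", "it"}


def _words(sentence):
    """Lowercase alphabetic words of a sentence, without stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, percent):
    """Keep roughly `percent` percent of the sentences, highest scores first.

    Assumed scoring: the average corpus frequency of a sentence's words.
    The selected sentences are returned in their original order.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    freq = Counter(w for s in sentences for w in _words(s))

    def score(sentence):
        tokens = _words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Always keep at least one sentence; round the requested share up.
    keep = max(1, math.ceil(len(sentences) * percent / 100))
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:keep])]


sample = ("Dogs love bones. Dogs chase cats. "
          "The weather was mild. Dogs bury bones in the yard.")
half = summarize(sample, 50)
```

With the sample text and `percent=50`, two of the four sentences are kept. Rounding up means that a small percentage on a short text still returns one sentence rather than none.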
Disadvantages
- The system extracts single words rather than phrases.
- The provided content must be longer than 100-150 characters.