2024-03-29T14:46:57Z
https://meral.edu.mm/oai
oai:meral.edu.mm:recid/3336
2021-12-13T00:32:53Z
1582963302567:1597824273898
user-ucsy
Informative Content Extraction for Web Page using Text Density and Vision-based Page Segmentation (VIPS) Algorithm Integration
Mon, Ei Phyu Phyu
Yuzana
Web pages consist not only of actual content but also of other elements such as branding banners, navigational elements, advertisements, copyright notices, etc. Irrelevant content in a Web page is treated as noisy content, which is typically unrelated to the main subject of the page. A method is therefore needed to extract the informative content and discard the noisy content from Web pages. This system uses an integration of textual and visual importance features to extract the informative content from Web pages. Initially, a Web page is converted into a Document Object Model (DOM) tree. For each node in the DOM tree, textual and visual importance are calculated. Textual importance and visual importance are combined to form a hybrid density. A DensitySum is then calculated and used in the content extraction algorithm to extract the informative content from Web pages. The algorithm is tested on Web pages from various domains and styles. The performance of Web content extraction is measured by calculating precision and recall.
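The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the `Node` class, the `visual` weight in [0, 1], text density taken as characters per tag in a subtree, hybrid density taken as the product of text density and visual weight, DensitySum taken as the sum of the children's hybrid densities, and precision and recall computed over sets of words. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical DOM node: tag name, own text, and an assumed visual weight."""
    tag: str
    text: str = ""
    visual: float = 1.0  # assumed visual importance in [0, 1]
    children: List["Node"] = field(default_factory=list)


def char_count(node: Node) -> int:
    """Number of text characters in the subtree rooted at node."""
    return len(node.text) + sum(char_count(c) for c in node.children)


def tag_count(node: Node) -> int:
    """Number of tags in the subtree, counting the node itself."""
    return 1 + sum(tag_count(c) for c in node.children)


def hybrid_density(node: Node) -> float:
    """Assumed combination: characters per tag, scaled by the visual weight."""
    return node.visual * char_count(node) / tag_count(node)


def density_sum(node: Node) -> float:
    """Assumed DensitySum: sum of the hybrid densities of the direct children."""
    return sum(hybrid_density(c) for c in node.children)


def best_block(root: Node) -> Node:
    """Return the node with the largest DensitySum in the tree."""
    best = root
    stack = [root]
    while stack:
        node = stack.pop()
        if density_sum(node) > density_sum(best):
            best = node
        stack.extend(node.children)
    return best


def precision_recall(extracted: str, reference: str) -> Tuple[float, float]:
    """Word-set precision and recall of extracted text against a reference."""
    e, r = set(extracted.split()), set(reference.split())
    overlap = len(e & r)
    precision = overlap / len(e) if e else 0.0
    recall = overlap / len(r) if r else 0.0
    return precision, recall


# Toy page: a low-weight navigation block and a text-heavy article block.
nav = Node("div", visual=0.2, children=[Node("a", "Home"), Node("a", "About")])
article = Node("div", children=[Node("p", "x" * 200), Node("p", "y" * 150)])
page = Node("body", children=[nav, article])

main_block = best_block(page)
```

On this toy page the article block is selected, because its paragraphs carry far more text per tag than the navigation links. A real system would derive the visual weight from rendered layout (for example, block position and size from a page segmentation step) rather than setting it by hand.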
2017-12-27
http://hdl.handle.net/20.500.12678/0000003336
https://meral.edu.mm/records/3336