2024-03-28T08:59:35Z
https://meral.edu.mm/oai
oai:meral.edu.mm:recid/4209
2021-12-13T04:40:59Z
1582963302567:1597824273898
user-ucsy
An Efficient Approach for Web Data Extraction
Htwe, Thanda
Kham, Nang Saing Moon
Most Web pages contain clutter, unlike conventional data or text. They usually include noise data such as navigation panels, copyright and privacy notices, and advertisements. Such noise can seriously harm Web miners, which extract the whole document rather than the informative content and consequently retrieve non-relevant results. Eliminating these noise patterns is therefore of great importance. In this paper, we propose an effective technique to detect and remove various noise patterns from Web documents to enhance Web mining. Our system first builds a DOM tree structure for an incoming Web page and then splits it into subtrees to detect noise data. We also apply a back-propagation neural network algorithm to classify the noise patterns, data patterns, and mixture patterns in the current Web page. The classification result of the neural network is used to eliminate the noise patterns. The proposed technique is evaluated on several commercial Web sites and news Web sites to show the performance and improvement of our approach.
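The first two steps named in the abstract (building a DOM tree and splitting it into subtrees) can be illustrated with a minimal sketch. It uses only the Python standard library. Everything specific in it is an assumption made for illustration and is not taken from the paper: the names `Node`, `DomBuilder`, and `subtree_features`, the choice to treat each direct child of `<body>` as one subtree, and the link-text ratio as a feature. The back-propagation classifier itself is omitted; the tuples produced here only show the kind of per-subtree input such a classifier might receive.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag or children.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # text chunks found directly inside this element


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def find(node, tag):
    """Depth-first search for the first element with the given tag."""
    if node.tag == tag:
        return node
    for child in node.children:
        found = find(child, tag)
        if found is not None:
            return found
    return None


def text_length(node, inside_link=False):
    """Return (total_chars, link_chars) for the subtree rooted at node."""
    inside_link = inside_link or node.tag == "a"
    total = sum(len(chunk) for chunk in node.text)
    link = total if inside_link else 0
    for child in node.children:
        child_total, child_link = text_length(child, inside_link)
        total += child_total
        link += child_link
    return total, link


def subtree_features(html):
    """Split the page into subtrees and compute one feature tuple for each.

    Assumption: each direct child of <body> is treated as one subtree.
    Returns a list of (tag, total_chars, link_text_ratio).
    """
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    body = find(builder.root, "body") or builder.root
    features = []
    for subtree in body.children:
        total, link = text_length(subtree)
        ratio = link / total if total else 0.0
        features.append((subtree.tag, total, ratio))
    return features


if __name__ == "__main__":
    page = ("<html><body>"
            "<div><a href='#'>Home</a><a href='#'>News</a></div>"
            "<div><p>Some article text here.</p></div>"
            "</body></html>")
    for row in subtree_features(page):
        print(row)
```

In this sketch a subtree whose text is mostly link text (a navigation panel, for example) gets a ratio near 1.0, while a subtree of running text gets a ratio near 0.0. A trained classifier would replace any fixed threshold on such values.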
2009-12-30
http://hdl.handle.net/20.500.12678/0000004209
https://meral.edu.mm/records/4209