<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-06-25T14:06:11Z</responseDate>
  <request verb="GetRecord" identifier="oai:meral.edu.mm:recid/3523" metadataPrefix="oai_dc">https://meral.edu.mm/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:meral.edu.mm:recid/3523</identifier>
        <datestamp>2021-12-13T00:34:28Z</datestamp>
        <setSpec>1582963302567:1597824273898</setSpec>
        <setSpec>user-ucsy</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Discovering Informative Content Blocks for Efficient Web Data Extraction</dc:title>
          <dc:creator>Hlaing, Nwe Nwe</dc:creator>
          <dc:creator>Nyunt, Thi Thi Soe</dc:creator>
          <dc:description>As web sites grow more complicated, constructing web information extraction systems becomes more troublesome and time-consuming. A common difficulty is locating the segments of a page that contain the target information, which we call the informative blocks. Discriminating informative blocks from noisy blocks, and then extracting the informative blocks from a web page, is therefore an important task. In this paper, we propose a method that uses both visual features and semantic information to extract informative blocks. First, the VIPS (Vision-based Page Segmentation) algorithm is used to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (the number of images and links) are extracted to construct a feature vector for each block. Second, based on these features, blocks with similar content and spatial structures are clustered by means of similarity computation. After blocks with similar structures have been clustered, the cluster with the largest size and the nearest distance to the centre of the page is identified as the informative block.</dc:description>
          <dc:date>2010-12-16</dc:date>
          <dc:identifier>http://hdl.handle.net/20.500.12678/0000003523</dc:identifier>
          <dc:identifier>https://meral.edu.mm/records/3523</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
