MERAL Myanmar Education Research and Learning Portal

  1. University of Computer Studies, Yangon
  2. Conferences

Record Matching System for Publication Dataset using Multi-pass Sorted Neighborhood Method

http://hdl.handle.net/20.500.12678/0000004931
File psc2010paper (20).pdf (207 Kb)
Publication type Article
Upload type Publication
Title Record Matching System for Publication Dataset using Multi-pass Sorted Neighborhood Method
Language en
Publication date 2010-12-16
Authors Yi, Soe Lai
Description
Record matching is the task of identifying records that refer to the same real-world entity. Detecting data records that are approximate duplicates is an important task. Datasets may contain duplicate records concerning the same real-world entity because of data entry errors, unstandardized abbreviations, or differences in the detailed schemas of records from multiple databases. This paper describes a record matching algorithm based on the multi-pass sorted neighborhood method for publication datasets. It detects duplicate data in a publication XML database and, by sorting on multiple keys in separate passes, produces a higher percentage of correct duplicates and a lower percentage of false positives. A multi-pass approach is used, which is based on a combination of keys. Since no single key is sufficient to catch all matching records, combining the results of the individual passes produces more accurate results at lower cost. According to the experimental results, the multi-pass approach yields the lowest false positive error (FPE) and the lowest false negative error (FNE).
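The idea in the description can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record fields (`title`, `author`), the two sorting keys, the window size, the similarity threshold, and the use of `difflib.SequenceMatcher` as the matcher are all assumptions made for illustration, as is the final step that merges matched pairs into groups. Each pass sorts the records by one key and compares only records that fall inside a small sliding window; the results of all passes are then combined.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Hypothetical matcher: character-level similarity of lowercased titles."""
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold


def sorted_neighborhood_pass(records, key, window=2):
    """One pass: sort by `key`, compare each record with the next window-1 records."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similar(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def multi_pass(records, keys, window=2):
    """Run one pass per key and merge all matched pairs into duplicate groups."""
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for key in keys:
        for i, j in sorted_neighborhood_pass(records, key, window):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


# Illustrative records (invented for this sketch, not taken from the paper).
records = [
    {"title": "Record Matching with Sorted Neighborhood", "author": "Smith, J."},
    {"title": "Record Matching with Sorted Neighbourhood", "author": "J. Smith"},
    {"title": "Query Optimization Basics", "author": "Smith, A."},
    {"title": "A Survey of XML Databases", "author": "Lee, K."},
    {"title": "Parallel Soft Computing", "author": "Khin, M."},
    {"title": "Duplicate Detection in XML Data", "author": "Aung, T."},
    {"title": "The Duplicate Detection in XML Data", "author": "Aung, T."},
]


def by_author(r):
    return r["author"].lower()[:3]


def by_title(r):
    return r["title"].lower()[:4]


if __name__ == "__main__":
    print(sorted_neighborhood_pass(records, by_author))
    print(sorted_neighborhood_pass(records, by_title))
    print(multi_pass(records, [by_author, by_title]))
```

In this toy data the author-key pass finds only the pair (5, 6) and the title-key pass finds only the pair (0, 1), because the other duplicate pair is sorted too far apart to share a window. Combining both passes recovers both groups while each pass still compares only neighboring records, which is the motivation the description gives for using several keys.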
Keywords
record matching, approximate duplicate, multi-pass sorted neighborhood method
Identifier http://onlineresource.ucsy.edu.mm/handle/123456789/832
Journal articles
Fifth Local Conference on Parallel and Soft Computing
Versions

Ver.1 2020-09-01 15:32:01.931100