{"created":"2020-09-01T15:02:51.881077+00:00","id":4532,"links":{},"metadata":{"_buckets":{"deposit":"9c9093a6-eff4-4c6a-8ab9-19ce7244b8b2"},"_deposit":{"id":"4532","owners":[],"pid":{"revision_id":0,"type":"recid","value":"4532"},"status":"published"},"_oai":{"id":"oai:meral.edu.mm:recid/4532","sets":["1582963302567:1597824322519"]},"communities":["ucsy"],"item_1583103067471":{"attribute_name":"Title","attribute_value_mlt":[{"subitem_1551255647225":"Predictive Big Data Analytics on High-Dimensional Data","subitem_1551255648112":"en_US"}]},"item_1583103085720":{"attribute_name":"Description","attribute_value_mlt":[{"interim":"Nowadays, data is extremely growing very fast to become “BIG DATA”, any voluminous amount of structured, semi-structured and unstructured data, which has high potential to be mined for valuable information in decision making process. Analyzing on big data using traditional data analysis methods has become the key challenge in data analytics research. In addition, high-dimensional data analytics has been a great attention in big data era because the dimensions of datasets are continuously growing in size. It creates a critical issue to reduce efficiently a subset of dimensions from all diverse and raw data dimensions which will fulfill valuable information in decision making process. With increasing volumes of data, classical dimensionality reduction algorithms which are designed to work well with small-scale data usually face scalability bottleneck. Although Principal Component Analysis (PCA) could be applied as a dimensionality reduction algorithm in high-dimensional data, it is absolutely required to transform as scalable PCA (sPCA) for high-dimensional big data. With the purpose of constructing efficient prediction model, Multiple Linear Regression (MLR), the redundant and irrelevant features or data dimensions are highly potential to increase noises and biases which can hinder the prediction process of the model. 
In this research, two-stage dimension reduction approach is proposed for the MLR model. Firstly, scalable Principal Component Analysis (sPCA) is proposed to solve the storage and computational problems of PCA by reducing the number of redundant dimensions without much loss of information. To examine the reduced feature subset resulted from sPCA stage whether correlated or not with the output variable of MLR model, Pearson Correlation Coefficient (PCC) is also applied to reduce the number of irrelevant dimensions. Although the high dimensions of input voluminous data matrix have been reduced, it is still a big issue to solve how to split or decompose this voluminous matrix containing large amount of observations or data records. Therefore, “QR Decomposition” is proposed to decompose large-scale matrix X into a Q and R product of an orthogonal matrix Q and an upper triangular matrix R for MLR model. In this research, the high-dimensional data reduction providing predictive big data analytics is implemented on distributed big data analytics platform, “Cloudera Distribution Hadoop (CDH)” using Multi Node Cloudera Cluster using three computing nodes or VMs which all are interconnected with Cloudera Manager. Three diverse high-dimensional big data sources are applied not only evaluating the proposed approaches but also achieving predictive analysis results from the system. Firstly, geospatial big data, OpenStreetMap in XML format (OSM XML) is used to obtain “One-way Roads” prediction. Then, high-resolution or high-dimensional representation of images from MS-Celeb-A, a large-scale face attributes dataset are utilized to predict “Number of Faces” in these images. 
Finally, the raw, unstructured text data via “DeliciousMIL” dataset from UCI is applied as input text documents to obtain “Number of Documents (Education, Science & Technology, Culture & History)” prediction results. According to the evaluation analysis, the proposed sPCA can efficiently perform dimension reduction process with increasing size or number of data dimensions for diverse data types. It also shows the good scalability performance while the traditional PCA offers “Out of Memory” results. Applying the proposed two-stage approach (sPCA and PCC) achieves the victory of accuracy in 99 percent (%) for “One-way Roads” prediction. Furthermore, QR Decomposition approach providing MLR model offers faster execution time for the system. Therefore, the proposed system provides better scalability, prediction accuracy, and faster execution time in predictive analytics on high-dimensional big data."}]},"item_1583103108160":{"attribute_name":"Keywords","attribute_value":[]},"item_1583103120197":{"attribute_name":"Files","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_access","date":[{"dateType":"Available","dateValue":"2019-11-28"}],"displaytype":"preview","filename":"9Ph.D-10 Kyi Lai Lai Khine (Predictive Big Data Analytics on High-Dimensional Data).pdf","filesize":[{"value":"4291 Kb"}],"format":"application/pdf","licensetype":"license_note","mimetype":"application/pdf","url":{"url":"https://meral.edu.mm/record/4532/files/9Ph.D-10 Kyi Lai Lai Khine (Predictive Big Data Analytics on High-Dimensional Data).pdf"},"version_id":"42b84d54-53c2-43d6-b415-5afdb25658d7"}]},"item_1583103131163":{"attribute_name":"Journal articles","attribute_value_mlt":[{"subitem_issue":"","subitem_journal_title":"","subitem_pages":"","subitem_volume":""}]},"item_1583103147082":{"attribute_name":"Conference 
papers","attribute_value_mlt":[{"subitem_acronym":"","subitem_c_date":"","subitem_conference_title":"","subitem_part":"","subitem_place":"","subitem_session":"","subitem_website":""}]},"item_1583103211336":{"attribute_name":"Books/reports/chapters","attribute_value_mlt":[{"subitem_book_title":"","subitem_isbn":"","subitem_pages":"","subitem_place":"","subitem_publisher":""}]},"item_1583103233624":{"attribute_name":"Thesis/dissertations","attribute_value_mlt":[{"subitem_awarding_university":"University of Computer Studies, Yangon","subitem_supervisor(s)":[{"subitem_supervisor":""}]}]},"item_1583105942107":{"attribute_name":"Authors","attribute_value_mlt":[{"subitem_authors":[{"subitem_authors_fullname":"Khine, Kyi Lai Lai"}]}]},"item_1583108359239":{"attribute_name":"Upload type","attribute_value_mlt":[{"interim":"Publication"}]},"item_1583108428133":{"attribute_name":"Publication type","attribute_value_mlt":[{"interim":"Thesis"}]},"item_1583159729339":{"attribute_name":"Publication date","attribute_value":"2019-10"},"item_1583159847033":{"attribute_name":"Identifier","attribute_value":"http://onlineresource.ucsy.edu.mm/handle/123456789/2456"},"item_title":"Predictive Big Data Analytics on High-Dimensional Data","item_type_id":"21","owner":"1","path":["1597824322519"],"publish_date":"2019-11-28","publish_status":"0","recid":"4532","relation_version_is_last":true,"title":["Predictive Big Data Analytics on High-Dimensional Data"],"weko_creator_id":"1","weko_shared_id":-1},"updated":"2021-12-13T02:08:39.891397+00:00"}