Comparison of dimensionality reduction and clustering methods for SARS-CoV-2 genome
Untari N. Wisesty, Tati Rajab Mengko
Abstract
This paper aims to conduct an analysis of the SARS-CoV-2 genome variation was carried out by comparing the results of genome clustering using several clustering algorithms and distribution of sequence in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, the clustering algorithm has a weakness in grouping data that has very high dimensions such as genome data, so that a dimensional reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and autoencoder method with three models that produce 2, 10, and 50 features. The main contributions achieved were the dimensional reduction and clustering scheme of SARS-CoV-2 sequence data and the performance analysis of each experiment on each scheme and hyper parameters for each method. Based on the results of experiments conducted, PCA and DBSCAN algorithm achieve the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using autoencoder need more iterations to converge. On the testing process with Indonesian sequence data, more than half of them enter one cluster and the rest are distributed in the other two clusters.
Keywords
Autoencoder; Dimensionality reduction; Genome clustering; Principal component analysis; SARS-CoV-2
DOI:
https://doi.org/10.11591/eei.v10i4.2803
Refbacks
There are currently no refbacks.
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License .
<div class="statcounter"><a title="hit counter" href="http://statcounter.com/free-hit-counter/" target="_blank"><img class="statcounter" src="http://c.statcounter.com/10241695/0/5a758c6a/0/" alt="hit counter"></a></div>
Bulletin of EEI Stats
Bulletin of Electrical Engineering and Informatics (BEEI) ISSN: 2089-3191, e-ISSN: 2302-9285 This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU) .