AKILLI SİSTEMLER VE UYGULAMALARI DERGİSİ
JOURNAL OF INTELLIGENT SYSTEMS WITH APPLICATIONS
J. Intell. Syst. Appl.
E-ISSN: 2667-6893
This work is licensed under a Creative Commons Attribution 4.0 International License.

Metaheuristics Based Clustering Algorithms on Document Clustering

Metin Belgesi Kümelemede Metasezgisel Yöntemlere Dayalı Kümeleme Algoritmaları

How to cite: Onan A. Metaheuristics based clustering algorithms on document clustering. Akıllı Sistemler ve Uygulamaları Dergisi (Journal of Intelligent Systems with Applications) 2019; 2(1): 39-45. DOI: 10.54856/jiswa.201905059

Full Text: PDF, in Turkish.


Title: Metaheuristics Based Clustering Algorithms on Document Clustering

Abstract: Cluster analysis is an important exploratory data analysis technique that divides data into groups based on their similarity. Document clustering applies clustering algorithms to textual data so that text documents can be retrieved, organized, navigated and summarized efficiently. Document clustering can be utilized in the organization, summarization and classification of text documents. Metaheuristic algorithms have been successfully applied to complex optimization problems, including cluster analysis. In this paper, we analyze the clustering quality of five metaheuristic clustering algorithms (namely, particle swarm optimization, genetic algorithm, cuckoo search, firefly algorithm and bat algorithm) on fifteen text collections in terms of F-measure. The empirical analysis also considers two conventional clustering algorithms (K-means and bisecting K-means). The experimental analysis indicates that swarm-based clustering algorithms outperform conventional clustering algorithms on text document clustering.

Keywords: Document clustering; swarm intelligence; metaheuristic algorithms
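
The abstract reports clustering quality in terms of F-measure but does not state the exact formula used. As a minimal sketch, assuming the class-weighted best-match F-measure that is common in the document clustering literature, the score could be computed as below. The function name `clustering_f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure for a flat clustering.

    Assumed definition (not confirmed by the paper): for each true class,
    take the best F1 score over all clusters, then average these best
    scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Hypothetical toy data: six documents, two true classes, two clusters.
true_labels = ["sports", "sports", "sports", "politics", "politics", "politics"]
cluster_ids = [0, 0, 1, 1, 1, 1]
score = clustering_f_measure(true_labels, cluster_ids)
print(round(score, 4))
```

A score of 1.0 means every class is matched exactly by some cluster; lower values indicate classes that are split across clusters or merged with other classes.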


Başlık: Metin Belgesi Kümelemede Metasezgisel Yöntemlere Dayalı Kümeleme Algoritmaları

Özet: Kümeleme analizi, verileri benzerliklerine göre gruplarına ayıran önemli bir veri analizi tekniğidir. Belge kümeleme, kümeleme algoritmalarının metin belgeleri üzerinde uygulanması ile belgelerin etkin bir biçimde geri getiriminin, organizasyonunun, erişiminin ve özetlenmesinin olanaklı hale gelmesini sağlar. Belge kümeleme, metin belgelerinin organizasyonu, özetlenmesi ve sınıflandırılmasında kullanılabilir. Metasezgisel algoritmalar, aralarında kümeleme analizinin de yer aldığı birçok karmaşık eniyileme probleminin çözümünde uygulanmaktadır. Bu çalışmada, beş metasezgisel kümeleme algoritmasının (parçacık sürüsü eniyilemesi, genetik algoritma, guguk kuşu algoritması, ateşböceği algoritması ve yarasa algoritması) on beş metin veri seti üzerinde F-ölçütü aracılığı ile değerlendirilmiştir. Deneysel analizlerde, iki geleneksel kümeleme algoritması (K-ortalama ve ikiye ayırma K-ortalama) da dikkate alınmıştır. Deneysel analiz sonuçları, sürü zekasına dayalı kümeleme algoritmalarının daha yüksek başarım elde ettiğini göstermektedir.

Anahtar kelimeler: Belge kümeleme; sürü zekası; metasezgisel algoritmalar

