Comparison of Templates with Word2vec in Finding Semantic Relations Between Words

Ant, Kaan; Soğukpınar, Uğur; Amasyalı, Mehmet Fatih

doi:10.54856/jiswa.201805007

AKILLI SİSTEMLER VE UYGULAMALARI DERGİSİ
JOURNAL OF INTELLIGENT SYSTEMS WITH APPLICATIONS
J. Intell. Syst. Appl.

E-ISSN: 2667-6893

This work is licensed under a Creative Commons Attribution 4.0 International License.

Kelimeler Arası Anlamsal İlişkilerin Bulunmasında Word2vec ile Şablonların Karşılaştırılması

How to cite: Ant K, Soğukpınar U, Amasyalı MF. Comparison of templates with word2vec in finding semantic relations between words. Akıllı Sistemler ve Uygulamaları Dergisi (Journal of Intelligent Systems with Applications) 2018; 1(1): 13-17. DOI: 10.54856/jiswa.201805007

Full Text: PDF, in Turkish.

Abstract: Databases that contain semantic relationships between words are increasingly used to make natural language processing more effective. Semantic spaces, proposed as an alternative to the bag-of-words approach, give the distances between words but do not express the type of relation. This study shows how semantic spaces can be used to find the relation type and compares this approach with the template method. According to results obtained on a fairly large (1 GB) corpus, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location and made_of relation types and for identifying unrelated word pairs.

Keywords: Natural language processing; commonsense databases; semantic spaces; relation templates
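
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the offset-plus-nearest-centroid classifier, and the English regular-expression templates are all assumptions chosen for illustration (the study itself works on a Turkish corpus). The sketch shows that a cosine score only says how close two words are, that a relation type can be guessed from the vector offset between them, and that a template reads the relation type directly from a sentence pattern.

```python
import math
import re

# Hypothetical 3-dimensional "embeddings"; real word2vec vectors would
# come from a model trained on a corpus.
VECTORS = {
    "dog": [0.9, 0.1, 0.0], "cat": [0.8, 0.2, 0.0], "animal": [0.9, 0.1, 1.0],
    "rose": [0.1, 0.9, 0.0], "plant": [0.1, 0.9, 1.0],
    "hot": [0.5, 0.5, 0.2], "cold": [-0.5, -0.5, 0.2],
    "big": [0.3, 0.6, 0.1], "small": [-0.3, -0.6, 0.1],
}

def cosine(u, v):
    """Cosine similarity: a magnitude of relatedness, not a relation type."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def offset(a, b):
    """Vector difference b - a, used here as the feature for a word pair."""
    return [y - x for x, y in zip(VECTORS[a], VECTORS[b])]

# Assumed labelled pairs; one centroid of offsets is built per relation.
TRAIN = {
    "is_a": [("dog", "animal"), ("rose", "plant")],
    "opposite": [("hot", "cold")],
}

def centroid(pairs):
    offsets = [offset(a, b) for a, b in pairs]
    return [sum(col) / len(offsets) for col in zip(*offsets)]

CENTROIDS = {rel: centroid(pairs) for rel, pairs in TRAIN.items()}

def relation_from_vectors(a, b):
    """Pick the relation whose centroid offset is most similar to b - a."""
    pair_offset = offset(a, b)
    return max(CENTROIDS, key=lambda rel: cosine(pair_offset, CENTROIDS[rel]))

# Assumed English templates; each pattern names its relation type directly.
TEMPLATES = {
    "is_a": re.compile(r"\b(\w+) is an? (\w+)\b"),
    "made_of": re.compile(r"\b(\w+) is made of (\w+)\b"),
    "at_location": re.compile(r"\b(\w+) is found in (\w+)\b"),
}

def relations_from_templates(sentence):
    """Return (word, relation, word) triples matched by any template."""
    found = []
    for rel, pattern in TEMPLATES.items():
        for a, b in pattern.findall(sentence.lower()):
            found.append((a, rel, b))
    return found
```

With these toy values, `relation_from_vectors("cat", "animal")` selects `is_a` and `relations_from_templates("The table is made of wood.")` yields a single `made_of` triple. A real comparison would need trained embeddings, a proper classifier, and templates written for the target language.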
