A Domain Specific Entity Linking Approach Consuming Multistore Environment

İnan, Emrah; Yönyül, Burak; Tekbacak, Fatih

doi:10.54856/jiswa.201805016

AKILLI SİSTEMLER VE UYGULAMALARI DERGİSİ
JOURNAL OF INTELLIGENT SYSTEMS WITH APPLICATIONS
J. Intell. Syst. Appl.

E-ISSN: 2667-6893

This work is licensed under a Creative Commons Attribution 4.0 International License.

Çoklu Veri Depo Ortamını Kullanan Alana Özgü Varlık Bağlama Yaklaşımı

How to cite: İnan E, Yönyül B, Tekbacak F. A domain specific entity linking approach consuming multistore environment. Akıllı Sistemler ve Uygulamaları Dergisi (Journal of Intelligent Systems with Applications) 2018; 1(1): 46-52.

Full Text: PDF, in Turkish.

Title: A Domain Specific Entity Linking Approach Consuming Multistore Environment

Abstract: Most data on the web is unstructured and must be transformed into a machine-processable structure. It is therefore appropriate to convert unstructured data into a structured form according to the requirements and to store it in different data models chosen by use case. As requirements grow in number and variety, no single approach can satisfy all of them, so a single storage technology is not suitable for meeting every storage requirement. Managing stores with different types of schemas jointly and in an integrated manner is referred to as 'multistore' or 'polystore' in the database literature. In this paper, the Entity Linking task is leveraged to transform text into well-formed data, and this data is managed in an integrated environment of different data models. Finally, the integrated big data environment is queried and examined, and the method is presented.

Keywords: Big data; multistore; querying; entity linking; data integration

