[Nasun-urt, Hasi] - Article - (English) - The Automatic Construction Method of Mongolian Lexical Semantic Network Based on WordNet
The Automatic Construction Method of Mongolian Lexical Semantic Network Based on WordNet

Hasi
Computer and Information Engineering College, Inner Mongolia Normal University, Huhhot, China
e-mail: hasi@imnu.edu.cn

Nasun-urt
Academy of Mongolian Studies, Inner Mongolia University, Huhhot, China
e-mail: mgnasun@imu.edu.cn

Abstract: This paper introduces an automatic construction method for a Mongolian lexical semantic network based on WordNet. The Mongolian lexical semantic network is mainly used in Mongolian lexical semantic information queries and machine translation applications. We take the Mongolian Grammatical Information Dictionary, the DARHAN dictionary and other Mongolian dictionaries as resources to develop the Mongolian lexical semantic network. The main contents include synonym set building, an algorithm for automatically generating the semantic relations between synsets, and functions for querying and maintaining lexical semantic information.

Keywords: Mongolian; Lexical Semantic Network; WordNet; Sets of Synonyms

I. INTRODUCTION

In recent years, researchers have explored semantics in Mongolian information processing in many ways, including the semantic classification of nouns, verbs and adjectives, case frames, and valence theory. Although these studies have made certain achievements, machine translation systems and other natural language processing systems usually need an electronic dictionary that includes semantic knowledge, so that the computer can automatically analyze semantic information more comprehensively and thoroughly. The "Mongolian information grammatical dictionary" covers some semantic information, but it is not a semantic dictionary, so there is an urgent need to construct a Mongolian semantic information dictionary. In addition, current information processing oriented semantic research has not yet covered the internal semantic relationships among Mongolian verbs, nouns and adjectives, such as those between verbs and nouns, nouns and nouns, and nouns and adjectives.
Therefore, the core of research on Mongolian semantic information processing is to use semantic network theory and concept interdependence theory to analyze the semantic relationships between words and phrases and to establish a network of those relationships. The most representative semantic network is the English WordNet, developed by the Cognitive Science Laboratory at Princeton University. It is an online dictionary database system based on an English lexical semantic network. It organizes English nouns, verbs, adjectives and adverbs into synonym sets (synsets), each of which represents a basic lexical concept, and it links these concepts through lexical semantic relations: synonymy, antonymy, hypernymy (upper relationship), hyponymy (lower relationship), meronymy (part relationship) and holonymy (whole relationship). At present, WordNet has been successfully used in word sense disambiguation, automatic language processing, bilingual and multilingual machine translation, retrieval systems and so on, and it is widely considered the most important resource for computational linguistics, text analysis and many related fields. In China there is also research on a Chinese WordNet based on the English WordNet.

Absorbing the existing achievements of the Mongolian grammatical knowledge base is considered an effective way to construct a semantic knowledge base, because with the theories and methods of computational linguistics and computational semantics, an information processing oriented semantic network can be constructed automatically by computer. The task of lexical semantic study is to present the various semantic relationships between concepts and between properties comprehensively and thoroughly, so that they are convenient for computation. Compared with other languages such as Chinese and English, the informatization of Mongolian started later.
In particular, there are still many gaps in information processing oriented Mongolian semantic research. A complete theoretical system and strict theoretical categories have not yet been formed, and many basic theoretical problems remain in dispute. The inadequacy of basic research on Mongolian semantic analysis and description, together with slow progress in semantic information processing, keeps Mongolian information processing from going deeper. Perhaps for this very reason, semantics has great prospects for development. Word sense, as an important part of Mongolian lexical research, has produced many related achievements, and a theoretical foundation for Mongolian information processing has been established. However, these research achievements cannot satisfy the requirements of information processing. Because the relationships between word senses are a fundamental part of information processing, research on them has been growing in recent years.

The School of Mongolian Studies at Inner Mongolia University undertook a project on semantics in Mongolian information processing, supported by the National Social Science Fund, from 2001 to 2007. The project aimed to establish an information processing oriented Mongolian semantic description system. Specifically, with the start of this work and the development of the Mongolian information grammatical dictionary, a part-of-speech classification of Mongolian for information processing and its tag set were established. This classification covers Mongolian words, compound words, fixed phrases, Mongolian characters and affixes ("added components") from the point of view of grammatical information processing, and assigns tags to them.

2012 Fifth International Conference on Intelligent Networks and Intelligent Systems, 978-0-7695-4855-5/12 $26.00 © 2012 IEEE, DOI 10.1109/ICINIS.2012.13
This study also attempted a semantic classification of nouns, adjectives and verbs based on the achievements of traditional Mongolian research. According to their semantic features, nouns are divided into more than 130 classes on seven levels, adjectives into 21 classes on two levels, and verbs into nearly 150 classes on three levels. Because the characteristics of a verb are closely associated with its valence, the study also covered the classification of Mongolian verbs by valence into zero-valent, monovalent, bivalent and trivalent verbs. The case grammar study could not determine how many semantic cases Mongolian has, but it was able to determine the number of Mongolian cases and the characteristics of their forms of expression. Research on Mongolian semantic fields has laid a certain foundation regarding the types of fields, the relationships between senses, and the relationships between semantic fields. However, systematic study of Mongolian semantic fields starting from minimal semantic fields has only just begun. On the other hand, both basic research and application development depend on descriptions of the grammatical and semantic attributes of the vocabulary. Without a vocabulary-based system that describes syntactic and semantic attributes, the needs of deep language information processing cannot be met.
Since 2008, with the support of the National Natural Science Fund, the School of Mongolian Studies at Inner Mongolia University has been developing the Mongolian semantic information dictionary. It is built on the existing language resources of the Mongolian information grammatical dictionary and is intended to provide comprehensive language knowledge, combining grammar with semantics, for the automatic processing of Mongolian sentences in machine translation, automatic proofreading and text retrieval systems. Research on Mongolian information processing has developed further in recent years, and the demand for semantic knowledge has become increasingly urgent. This demand is multi-faceted and involves language units at several levels, including sememes, morphemes, word meaning and sentence meaning. Judging from an analysis of actual demand and the progress of Mongolian semantic research, establishing a comprehensive Mongolian semantic knowledge base is the fundamental way to solve the semantic problem. The "Mongolian Semantic Information Dictionary" is such a semantic knowledge base: it takes the modern Mongolian word as its basic unit and records the various kinds of semantic information involved in its use. As initially built, the dictionary consists of a semantic attribute library and a semantic classification library. The semantic attribute library in turn includes several sub-libraries: a general library, a things library, a motion library and a traits library. According to its main function, each library describes the headwords from a different angle and records the semantic knowledge required for syntactic and sentence meaning processing.
The dictionary was established for three purposes: first, to supply the semantic knowledge that Mongolian information processing needs for syntactic analysis, sentence meaning analysis and similarity computation; second, to supply word meaning knowledge for everyday language work and language learning; and third, to provide a broad and meaningful semantic basis for linguistic theory. The dictionary contains all the commonly used words (over 38,000) and affixes in the "Mongolian Grammatical Knowledge" base, and adds a number of commonly used compound words on top of these to form its headwords. The so-called "word" here refers to an entry that is included in the vocabulary of the semantic classification system.

Although the above results have been achieved, the following problems remain in general:

• The Mongolian semantic dictionary and semantic classification described above are based on the 1 million word corpus and the grammar dictionary of Inner Mongolia University, so one of their important constraints is the limited size of the corpus and vocabulary. In real use, the lexical semantic information needed by Mongolian information processing far exceeds these earlier results and needs continual dynamic expansion, all the more so given the rapid development of network applications. According to publicly available information so far, apart from corpora of up to 100 million words, no high-quality corpus yet meets these requirements.

• Language unity and multi-language compatibility issues. A lexical semantic system for a single language has a very narrow range of application, which constrains the potential of knowledge-based systems to exploit knowledge. WordNet plays an important role in natural language processing and information retrieval: as a lexical semantic network, it connects words through semantic relations to form a network, and it is a basic resource for the Semantic Web, the next generation of the Internet.
A high-quality lexical semantic network can reduce errors in information retrieval and, through its semantic information, improve the understanding of information from unfamiliar domains. At the same time, given the assumption that the cognitive semantic structures of human languages have much in common, WordNet has become a de facto standard for knowledge bases of lexical concepts in research and applications, and its position continues to strengthen. Multilingual versions of WordNet are also being developed. As EuroWordNet shows, the WordNet framework has been recognized as reasonable by the lexical semantics and computational lexicography communities. Building a multilingual lexical semantic network that represents the semantic relationships between languages can greatly improve the accuracy of multilingual text classification and machine translation. Building a Mongolian lexical semantic network compatible with WordNet therefore has important practical significance.

II. THE FRAME DEFINITION OF MONGOLIAN LEXICAL SEMANTIC NETWORK

WordNet organizes English nouns, verbs, adjectives and adverbs into synonym sets (synsets). Each set represents a basic lexical concept, and the concepts are linked by lexical semantic relations including synonymy, antonymy, hypernymy (upper relationship), hyponymy (lower relationship), meronymy (part relationship) and holonymy (whole relationship). In information processing oriented Mongolian semantic research, some scholars have used sememe analysis to make an exploratory semantic classification of Mongolian words, especially nouns, adjectives and verbs.
Building on these research results, and following the design methods and principles of semantic networks in other languages, including the English WordNet, the Chinese WordNet and the Chinese Concept Dictionary (CCD), together with the characteristics of Mongolian, we try to describe relatively comprehensively the semantic relationships among Mongolian words, such as synonymy, antonymy, hypernymy, hyponymy and meronymy.

Figure 1.

III. THE SYNSETS BUILDING

The WordNet framework is organized around synsets, so establishing Mongolian synsets and constructing the semantic network with the synset as the unit is the prerequisite for making the Mongolian network compatible with the WordNets of other languages. To make full use of previous research results, this paper labels Mongolian words with synset IDs on the basis of the Mongolian grammatical information dictionary. The main approach is to look up, in the Chinese WordNet, the Chinese equivalent of each word in the Mongolian grammatical information dictionary, and to mark the Mongolian word with the corresponding Chinese synset ID. Because the Chinese equivalents given for Mongolian nouns in the Mongolian grammatical information dictionary are sometimes incomplete or inaccurate, we first looked up each noun in the Darhan dictionary and then searched for its Chinese equivalents in the Chinese WordNet. In this way we completed synset ID labeling for more than 8,000 nouns. The Chinese equivalents of some words, such as SIYANBEI, do not exist in the Chinese WordNet, so their synset IDs are set manually.
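The labeling procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `label_synsets`, the dictionary-shaped inputs, the placeholder keys (`mn_word_a`, `zh_word_a`) and the synset IDs such as `SYN-0001` are all hypothetical, and falling back from the Darhan dictionary to the grammatical information dictionary is one possible reading of the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelResult:
    # Mongolian word -> synset IDs borrowed from the Chinese WordNet
    labeled: Dict[str, List[str]] = field(default_factory=dict)
    # Words with no match, left for manual synset ID assignment
    needs_manual: List[str] = field(default_factory=list)


def label_synsets(
    nouns: List[str],
    darhan: Dict[str, List[str]],           # assumed shape: Mongolian -> Chinese equivalents
    grammar_dict: Dict[str, List[str]],     # assumed shape: Mongolian -> Chinese equivalents
    chinese_wordnet: Dict[str, List[str]],  # assumed shape: Chinese word -> synset IDs
) -> LabelResult:
    result = LabelResult()
    for word in nouns:
        # Consult the Darhan dictionary first; the fallback to the
        # grammatical information dictionary is an assumption.
        equivalents = darhan.get(word) or grammar_dict.get(word) or []
        synset_ids: List[str] = []
        for zh in equivalents:
            for sid in chinese_wordnet.get(zh, []):
                if sid not in synset_ids:  # keep order, drop duplicates
                    synset_ids.append(sid)
        if synset_ids:
            result.labeled[word] = synset_ids
        else:
            result.needs_manual.append(word)
    return result


if __name__ == "__main__":
    # Toy, hypothetical data; real dictionaries would be loaded from files.
    darhan = {"mn_word_a": ["zh_word_a"], "SIYANBEI": ["zh_word_c"]}
    grammar = {"mn_word_b": ["zh_word_b"]}
    cwn = {"zh_word_a": ["SYN-0001"], "zh_word_b": ["SYN-0002", "SYN-0003"]}
    res = label_synsets(["mn_word_a", "mn_word_b", "SIYANBEI"], darhan, grammar, cwn)
    print(res.labeled)
    print(res.needs_manual)
```

Words that end up in `needs_manual` correspond to cases like SIYANBEI, whose Chinese equivalent is absent from the Chinese WordNet and whose synset ID has to be assigned by hand.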
After the Mongolian synsets are formed, the hyponymy, antonymy and meronymy relations of the Chinese WordNet are added to the Mongolian subnet by automatic transformation, which conveniently forms the main framework of the Mongolian lexical semantic network. Although languages differ, people's understanding of the world is interlinked and, at the level of concepts, similar or even identical. Therefore, using the semantic relationships in the Chinese WordNet to construct the main framework of the Mongolian lexical semantic network is a better approach. Then combining with the research results of traditional Mongolian lexical semantics in a way of manual in