[Nasanurtu, CHANG Qing, JIANG Wenbin, WU Jinxing, LIU Qun, ZHAO Lili] - Article - (Chinese) A Directed Graph Model for Mongolian Lexical Analysis (蒙古语词法分析的有向图模型)
JOURNAL OF CHINESE INFORMATION PROCESSING, Vol. 25, No. 5, Sep., 2011
Article ID: 1003-0077(2011)05-0094-07

A Directed Graph Model for Mongolian Lexical Analysis

JIANG Wenbin1, WU Jinxing1,2, CHANG Qing1,2, Nasanurtu2, LIU Qun1, ZHAO Lili1,3
(1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Inner Mongolian University, Huhhot, Inner Mongolia 010021, China; 3. Henan Normal University, Xinxiang, Henan 453007, China)

Abstract: We propose a generative statistical model for Mongolian lexical analysis. The model describes the lexical analysis of a Mongolian sentence as a directed graph, where the nodes represent the stems, affixes and their tags, and the edges represent the transition or generation relationships between nodes. Specifically, we model three kinds of transition or generation probabilities: a) the probabilities of stem-stem transition, affix-affix transition and stem-affix generation; b) the three corresponding transition or generation probabilities between tags; and c) the generation probabilities between stems or affixes and their tags. Trained on the manually annotated corpus of about 200 000 words with three-level tags developed by Inner Mongolia University, the model achieves a word-level segmentation accuracy of 95.1% and a word-level joint segmentation and tagging accuracy of 93%.
CLC number: TP391    Document code: A
Key words: Mongolian; lexical analysis; segmentation; POS tagging; stemming; directed graph

Received: 2010-08-29    Final version: 2011-02-17
Funding: National Natural Science Foundation of China key project (Contract 60736014); 863 Program (2006AA010108); project MZ115-038
About the authors: JIANG Wenbin (1984-), male, PhD candidate, research interests include lexical analysis; WU Jinxing (1987-), female, master's student, main research interest Mongolian information processing; CHANG Qing (1985-), female, master's student, main research interest Mongolian information processing.

1 Introduction

Lexical analysis is a basic language processing task for Chinese and for minority languages alike. Lexical analysis for Chinese has reached a high level [1-4], whereas for morphologically rich minority languages such as Mongolian its accuracy still leaves considerable room for improvement [5-11]. Related work includes sequence labeling methods [12-14] and earlier segmentation and tagging systems [7-10].

We build a generative probabilistic model for Mongolian lexical analysis. The analysis of a Mongolian sentence is described as a directed graph: the nodes represent the stems, affixes and their tags in the analysis, and the edges represent the transition or generation relationships between nodes. Under this model, lexical analysis amounts to searching for the directed graph with the highest probability. We model the stem-stem transition probability, the affix-affix transition probability and the stem-affix generation probability, the three corresponding probabilities between tags, and the generation probabilities between stems or affixes and their tags.

We experiment on the manually annotated corpus of about 200 000 words developed by Inner Mongolia University. We randomly take 5% and 5% of the sentences as the development set and the test set, and use the remaining 90% as the training set. On the test set, the model achieves a word-level segmentation accuracy of 95.1% and a word-level joint segmentation and tagging accuracy of 93%. Because the system applies no processing specific to Mongolian, we believe the approach can also be used for other morphologically rich languages.

In the following sections we first define the task of Mongolian lexical analysis, then describe our generative probabilistic model, then report and analyze the experimental results, compare with previous work, and finally conclude.

2 Mongolian Lexical Analysis

Like other morphologically rich languages, a Mongolian word consists of a stem and possibly several affixes. Taking the word HUURNILDU/HU-DU from the Inner Mongolia University corpus as an example, one lexical analysis of it in context is:

HUURNI/Ve2 + LDU/Fe3 + HU/Ft12 - DU/Fc21

where "+" and "-" mark connected (adjoined) affixes and separated (disjoint) affixes, respectively.
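The analysis string HUURNI/Ve2 + LDU/Fe3 + HU/Ft12 - DU/Fc21 and the probability families named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The names `Morpheme`, `parse_analysis`, `edges` and `log_score` are hypothetical, and so are the edge layout (each morpheme linked to the next one, each form generating its own tag), the reading of "+" as adjoined and "-" as disjoint, and the floor probability for unseen events. Stem-stem transitions between neighbouring words are left out for brevity.

```python
import math
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (relation, source node, target node)


@dataclass(frozen=True)
class Morpheme:
    kind: str  # "S" stem, "A" adjoined affix, "D" disjoint affix
    form: str
    tag: str


def parse_analysis(analysis: str) -> List[Morpheme]:
    """Parse 'STEM/TAG+AFFIX/TAG-AFFIX/TAG' into typed morphemes."""
    compact = re.sub(r"\s+", "", analysis)
    # Each piece keeps its leading '+' or '-'; the first piece (the stem) has none.
    pieces = re.findall(r"[+-]?[^+-]+", compact)
    morphemes = []
    for piece in pieces:
        if piece[0] == "+":
            kind, body = "A", piece[1:]
        elif piece[0] == "-":
            kind, body = "D", piece[1:]
        else:
            kind, body = "S", piece
        form, tag = body.split("/")
        morphemes.append(Morpheme(kind, form, tag))
    return morphemes


def edges(morphemes: List[Morpheme]) -> List[Edge]:
    """Enumerate the labeled edges of one word's graph (assumed layout)."""
    out: List[Edge] = []
    for prev, cur in zip(morphemes, morphemes[1:]):
        rel = "stem-affix" if prev.kind == "S" else "affix-affix"
        out.append((rel, prev.form, cur.form))         # edge between forms
        out.append((rel + "-tag", prev.tag, cur.tag))  # edge between their tags
    for m in morphemes:
        out.append(("form-tag", m.form, m.tag))        # form generates its tag
    return out


def log_score(morphemes: List[Morpheme], prob: Dict[Edge, float],
              floor: float = 1e-6) -> float:
    """Sum of log edge probabilities; unseen edges get the assumed floor."""
    return sum(math.log(prob.get(e, floor)) for e in edges(morphemes))


word = parse_analysis("HUURNI/Ve2+ LDU/Fe3+ HU/ Ft12- DU/Fc21")
toy_prob = {("form-tag", "HUURNI", "Ve2"): 0.9}  # made-up value, illustration only
print([m.kind for m in word], log_score(word, toy_prob))
```

In a full system the probability table would be estimated from the training corpus, and decoding would compare such scores across all candidate analyses of a sentence. Both steps are outside this sketch.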
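In the corpus, all separated affixes stand at the end of the word and are delimited by "-", such as the "-DU" at the end of the example above. Some connected affixes precede the separated affixes and are delimited from the preceding part of the word by "/". This "/" is a separator between stem and affixes that was added manually during annotation of the Inner Mongolia University corpus. Our system therefore removes the "/" from the input sentences before testing, so as to simulate the performance of a lexical analysis system on real Mongolian text.

Every Mongolian word has several candidate lexical analyses, and the number of candidates grows further when stem tags and affix tags are considered at the same time. Selecting the correct analysis for each word according to its context is the core problem of Mongolian lexical analysis. Systems based on linguistic rules can enumerate the possible candidate analyses of each word, but they struggle to choose the best candidate according to the context. Statistical modeling can make up for this weakness, and the focus of this paper is the statistical modeling of Mongolian lexical analysis.

3 Directed Graph Probabilistic Model

Supervised statistical modeling falls roughly into two classes: generative modeling and discriminative modeling. A generative model considers the input sentence and the analysis jointly and seeks the combination of the two with the highest joint probability, so part of its probability mass describes the generation of the input sentence itself. A discriminative model conditions on the input sentence and seeks the best candidate analysis given that input. In typical sequence labeling problems such as POS tagging, discriminative models have a clear advantage over generative ones. Applying discriminative models to Mongolian lexical analysis, however, raises key problems that remain to be solved. Our work on discriminative modeling for the lexical analysis of morphologically rich languages has made preliminary progress. In this paper we focus only on the generative probabilistic model, which has been completed.

3.1 Structured Representation of Words

Compared with morphologically simple languages such as Chinese, lexical analysis of morphologically rich Mongolian is less a simple linear sequence labeling problem than a process of choosing a structure and tagging its nodes. We describe the analysis of each word in the sentence as a structure, as shown in Figure 1.

Figure 1: Representation of the internal structure of a word

Here S (Stem) denotes a stem, A (Adjoin) denotes a connected affix, and D (Disjoint) denotes a separated affix. The arrows between a stem and its affixes, and between affixes, denote generation relationships. For a whole sentence, the analysis can be described as the structure shown in Figure 2.

Figure 2: Representation of the internal structure of the words of a sentence

The nodes denote stems and affixes, and the edges between nodes denote the generation or transition relationships from stem to stem, from stem to affix, and from affix to affix.

[4] ... Research on strategies for integrating Chinese lexical analysis and syntactic analysis [J]. Journal of Chinese Information Processing, 2008, 22(2): 10-17. (in Chinese)
[5] Nasanurtu, et al. New progress in processing technology for the Modern Mongolian corpus: an automatic word segmentation and tagging system for Modern Mongolian [C] // 2005. (in Chinese)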
[6] HOU Hongxu, LIU Qun, Nasanurtu, et al. Mongolian word segmentation based on statistical language models [J]. Pattern Recognition and Artificial Intelligence, 2009, 22: 108-112. (in Chinese)
[7] ZHAO Wei, HOU Hongxu, CONG Wei, et al. Research on Mongolian word segmentation based on conditional random fields [J]. Journal of Chinese Information Processing, 2010, 24(5): 31-35. (in Chinese)
[8] CONG Wei. Research on a Mongolian word segmentation system based on cascaded hidden Markov models [D]. Master's thesis, Inner Mongolia University, 2009. (in Chinese)
[9] YAN Hong, WANG Siriguleng. Research on HMM-based automatic POS tagging of Mongolian [J]. Journal of Inner Mongolia Normal University (Natural Science Edition, Chinese), 2010, 39(2): 206-209. (in Chinese)
[10] Gulila Adongbieke, Mijit Ablimit. Word segmentation methods for Uyghur [J]. Journal of Chinese Information Processing, 2004, 18(6): 61-65. (in Chinese)
[11] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition [C] // Proceedings of the IEEE, 1989: 257-286.
[12] John Lafferty, Andrew McCallum, Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data [C] // Proceedings of the 18th ICML, 2001: 282-289.
[13] McCallum, A., Freitag, D., Pereira, F. Maximum entropy Markov models for information extraction and segmentation [C] // Proc. ICML, 2000: 591-598.
[14] Stolcke, Andreas. SRILM: an extensible language modeling toolkit [C] // Proceedings of the International Conference on Spoken Language Processing, 2002: 311-318.