The MIT Press Essential Knowledge Series

Auctions, Timothy P. Hubbard and Harry J. Paarsch
Cloud Computing, Nayan Ruparelia
Computing: A Concise History, Paul E. Ceruzzi
The Conscious Mind, Zoltan L. Torey
Crowdsourcing, Daren C. Brabham
Free Will, Mark Balaguer
Information and Society, Michael Buckland
Information and the Modern Corporation, James W. Cortada
Intellectual Property Strategy, John Palfrey
The Internet of Things, Samuel Greengard
Machine Learning: The New AI, Ethem Alpaydin
Machine Translation, Thierry Poibeau
Memes in Digital Culture, Limor Shifman
Metadata, Jeffrey Pomerantz
The Mind–Body Problem, Jonathan Westphal
MOOCs, Jonathan Haber
Neuroplasticity, Moheb Costandi
Open Access, Peter Suber
Paradox, Margaret Cuonzo
Robots, John Jordan
Self-Tracking, Gina Neff and Dawn Nafus
Sustainability, Kent E. Portney
The Technological Singularity, Murray Shanahan
Understanding Beliefs, Nils J. Nilsson
Waves, Frederic Raichlen

Machine Translation
Thierry Poibeau

The MIT Press
Cambridge, Massachusetts
London, England

© 2017 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Chaparral Pro by Toppan Best-set Premedia Limited. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-0-262-53421-5
eISBN: 978-0-262-34243-8
ePub Version 1.0

Table of Contents

Series page
Title page
Copyright page
Series Foreword
Acknowledgments
1 Introduction
2 The Trouble with Translation
3 A Quick Overview of the Evolution of Machine Translation
4 Before the Advent of Computers…
5 The Beginnings of Machine Translation: The First Rule-Based Systems
6 The 1966 ALPAC Report and Its Consequences
7 Parallel Corpora and Sentence Alignment
8 Example-Based Machine Translation
9 Statistical Machine Translation and Word Alignment
10 Segment-Based Machine Translation
11 Challenges and Limitations of Statistical Machine Translation
12 Deep Learning Machine Translation
13 The Evaluation of Machine Translation Systems
14 The Machine Translation Industry: Between Professional and Mass-Market Applications
15 Conclusion: The Future of Machine Translation
Glossary
Bibliography and Further Reading
Index
About the Author

List of Tables

Table 1 Example of possible translations in French for the English word "motion"

List of Illustrations

Figure 1 The Necker cube, the famous optical illusion published by Louis Albert Necker in 1832. (Image licensed under CC BY-SA 3.0 via Wikimedia Commons. From https://commons.wikimedia.org/wiki/File:Necker_cube.svg.)

Figure 2 Vauquois' triangle (image licensed under CC BY-SA 3.0, via Wikimedia Commons). Source: https://en.wikipedia.org/wiki/File:Direct_translation_and_transfer_translation_pyramind.svg.

Figure 3 An extract from the Hansard corpus aligned at sentence level.

Figure 4 Two texts of different length. Each cell with a number n corresponds to a sentence of length n.

Figure 5 Beginning of alignment based on sentence length.

Figure 6 Other possible simple alignments.

Figure 7 Alignment of remaining sentences.

Figure 8 Two texts in a translation situation. Although the content of the texts is unknown (here represented by "xxx" and "yyy"), some words are identical or similar and can help determine reliable correspondence points.
Figure 9 Automatically extracted sentences from a bilingual corpus in order to translate the sentence "training is not the solution to every problem." Each sentence in English contains a sequence of n similar words with the sentence to be translated.

Figure 10 Different examples with the Japanese particle "no." One can see that the particle requires the use of a different linguistic structure each time when translating into English, depending on the context (see Sumita and Iida, 1991).

Figure 11 A possible alignment between two sentences.

Figure 12 A possible alignment between two sentences, with several intersecting links.

Figure 13 Initialization of the alignments. Each English word is linked with equal probability to all the words in the French translation.

Figure 14 After the first iteration, the algorithm identifies the link between "la" and "the" as being the most likely, based on their frequency in the source language and in the target language. These links are strengthened (shown in bold) to the detriment of other links and therefore also other possible alignments.

Figure 15 After another iteration, the algorithm identifies the other most probable links between "voiture" and "car," then between "chaise" and "chair" and between "red" and "rouge." The other possible links and alignments become less and less probable.

Figure 16 The process ends when there is convergence, meaning a stable structure has been found. The other links in the figure are removed, but in fact they remain available with a very low probability. It is possible to filter the alignment using a threshold in order to select only a limited number of possibilities, as shown in this figure, where the alternative links have been completely deleted.

Figure 17 Example of an alignment that is impossible to obtain from IBM models. The sequence "don't have any money" corresponds to the group "sont démunis" in French: this is an example of an m-n correspondence (here, m = 4 and n = 2, such that four English words correspond to two French words, if we consider "don't" as a single word).

Figure 18 Segment-based translation: different segments have been found corresponding to isolated words or to longer sequences of words. The system then has to find the most probable translation from these different pieces of translation. It is probable that "les pauvres n'ont pas d'argent" will be preferred to "les pauvres sont démunis," but this would be acceptable since the goal of automatic systems is to provide a literal translation, not a literary one.

Figure 19 Performance obtained with the same standard statistical translation system applied over 22 different European languages. The translation system is based on the standard Moses toolbox, the corpus used is the JRC-Acquis corpus (see chapter 7), and the metric used is the BLEU score. Dark grey cells correspond to a BLEU score performance over 0.5, and light grey cells to a BLEU score performance under 0.4 (blank: between 0.4 and 0.49). Language abbreviations: bg: Bulgarian; cs: Czech; da: Danish; de: German; el: Greek; en: English; es: Spanish; et: Estonian; fi: Finnish; fr: French; hu: Hungarian; it: Italian; lt: Lithuanian; lv: Latvian; mt: Maltese; nl: Dutch; pl: Polish; pt: Portuguese; ro: Romanian; sk: Slovak; sl: Slovene; sv: Swedish (note that et, fi, and hu are Finno-Ugric, mt is Semitic, and all other languages are Indo-European). Figure taken from Koehn et al., 2009. Reproduced with the authorization of the authors.
Figure 20 Variations of the word "book" in Finnish, depending on its grammatical function.

Series Foreword

The MIT Press Essential Knowledge series offers accessible, concise, beautifully produced pocket-size books on topics of current interest. Written by leading thinkers, the books in this series deliver expert overviews of subjects that range from the cultural and the historical to the scientific and the technical.

In today's era of instant information gratification, we have ready access to opinions, rationalizations, and superficial descriptions. Much harder to come by is the foundational knowledge that informs a principled understanding of the world. Essential Knowledge books fill that need. Synthesizing specialized subject matter for nonspecialists and engaging critical topics through fundamentals, each of these compact volumes offers readers a point of access to complex ideas.

Bruce Tidor
Professor of Biological Engineering and Computer Science
Massachusetts Institute of Technology

Acknowledgments

This book would not have been possible without the support of colleagues and friends. I want to thank Michelle Bruni, Elizabeth Rowley-Jolivet, Pablo Ruiz Fabo, and Bernard Victorri for their help during the preparation of this book. My gratitude also goes to the editorial and production staff at MIT Press, particularly Marie Lufkin Lee and Katherine A. Almeida. Finally, I want to thank the anonymous reviewers for their careful reading and their many insightful comments and suggestions.

Thierry Poibeau is a member of LATTICE, a research laboratory supported by CNRS, Ecole normale supérieure (ENS), PSL Research University, Université Sorbonne nouvelle, and USPC.

1 Introduction

In Douglas Adams' humorous saga The Hitchhiker's Guide to the Galaxy,1 all one needs to do to understand any language is to introduce a small fish (the Babel fish) into one's ear. This improbable invention is of course related to the idea of a universal translation device,2 and more generally to the key problem of language diversity and comprehension. The name of the fish is a transparent allusion to the Biblical episode of Babel, when God scrambled language so that humans could no longer understand one another.

A significant number of thinkers, philosophers, and linguists—and, more recently, computer scientists, mathematicians, and engineers—have tackled the question of language diversity. Moreover, they have imagined theories and devices intended to solve the problems caused by this diversity. Since the advent of computers (after the Second World War), this research program has materialized through the design of machine translation tools—in other words, computer programs capable of automatically producing in a target language the translation of a text in a source language.

This research program is very ambitious: it is even one of the most fundamental in the field of artificial intelligence. The analysis of languages cannot be separated from the analysis of knowledge and reasoning, which explains the interest in this field shown by philosophers and specialists of artificial intelligence as well as the cognitive sciences. This brings to mind the test proposed by Turing3 in 1950: the test is successfully completed if a person dialoguing (through a screen) with a computer is unable to say whether her discussion partner is a computer or a human being.
This test is foundational, because developing an operational conversational agent presupposes not only understanding what the discussion partner says (at least to some extent), but also inferring from what has been said a relevant utterance that helps the whole conversation move forward. For Turing, if the test is successful, it means that the machine has a certain degree of intelligence. This question has fueled considerable debate, but we can at least agree on the fact that a robust conversational system would involve formalizing some mechanisms of understanding and reasoning.

Machine translation involves different processes that make it at least as challenging as developing an automatic dialoguing system. The degree of "understanding" shown by the machine can be very partial: for example, the Eliza system developed by Weizenbaum in 1966 was able to simulate a dialogue between a psychotherapist and his patient. The system in fact just derived questions from the patient's utterances (for example, the system was able to produce the question "why are you afraid of X?" from the sentence "I am afraid of X"). The system also included a series of ready-made sentences that were used when no predefined patterns seemed to be applicable (for example "could you specify what you have in mind?" or "really?"). Despite its simplicity, Eliza had great success, and some patients really thought they were conversing with a real doctor through a computer. (A minimal sketch of this kind of pattern substitution is given below.)

The situation is completely different when considering machine translation. Translation requires in-depth understanding of the text to be translated. Moreover, transposition into another language is a delicate and difficult process, even with news or technical texts. The aim of machine translation is not, of course, to address literature or poetry; rather, the idea is to give the most accurate translation of everyday texts. Even so, the task is immensely difficult, and current systems are still far from satisfactory.

However, and despite its limitations, from a more theoretical point of view, machine translation also makes us take a fresh look at old and widely investigated questions: What does it mean to translate? What kind of knowledge is involved in the translation process? How can we transpose a text from one language to another? These are some of the questions that are addressed in this book.

This short book aims at providing an overview of the progress in machine translation since the Second World War. Some pioneers will be mentioned, but it is mainly the research implemented with computers that will be addressed. The content of the book is thus partly historical, since the main approaches to the problem will be presented in an intuitive manner: the idea is to make sure that the reader can understand the main principles without having to know all the technical details. Specifically, recent approaches based on the statistical analysis of very large corpora of texts will be presented, but these approaches are highly technical and we will skip the mathematical details that are not necessary to grasp the overall idea.
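To make the pattern-substitution mechanism behind Eliza more concrete, here is a minimal, hypothetical sketch in Python. It is not Weizenbaum's actual program, and the rules and fallback replies are invented for illustration: a handful of hand-written regular-expression rules rewrite a statement such as "I am afraid of X" into the question "why are you afraid of X?", and a ready-made reply is returned when no rule matches.

```python
import re

# Hypothetical, minimal Eliza-style rewriter (illustrative only).
# Each rule maps a pattern in the patient's statement to a question template.
RULES = [
    (re.compile(r"i am afraid of (.+)", re.IGNORECASE), "Why are you afraid of {}?"),
    (re.compile(r"i am (.+)", re.IGNORECASE), "How long have you been {}?"),
    (re.compile(r"i feel (.+)", re.IGNORECASE), "Why do you feel {}?"),
]

# Ready-made replies used when no predefined pattern applies.
FALLBACKS = ["Could you specify what you have in mind?", "Really?"]

def reply(utterance: str, turn: int = 0) -> str:
    """Derive a question from the utterance, or fall back to a canned reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Reuse the captured fragment, dropping trailing punctuation.
            return template.format(match.group(1).rstrip(".!?"))
    return FALLBACKS[turn % len(FALLBACKS)]

print(reply("I am afraid of computers."))   # Why are you afraid of computers?
print(reply("The weather is nice today."))  # Could you specify what you have in mind?
```

The sketch builds no representation of meaning at all, which is precisely the point: such surface pattern matching can sustain a dialogue of sorts, whereas translation, as argued above, demands far more.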
More technical books exist for those who are interested in the full details of the different approaches.

The book begins with a presentation of the main problems one has to solve when developing a machine translation system (chapter 2). The journey continues with a quick overview of the evolution of machine translation (chapter 3), followed by a more detailed presentation of the history of the field, from its beginnings before the advent of computers (chapter 4) to the most recent advances based on deep learning (chapter 12). Along the way, we will encounter all the main approaches developed since the field's beginning: rule-based approaches (chapter 5) up to the ALPAC report and its consequences (chapter 6); and the advent of parallel corpora (chapter 7), which fueled research in the field after the 1980s, first through the example-based paradigm (chapter 8), then through the most popular statistical paradigm (chapter 9) along with its more recent developments—the segment-based approach (chapter 10) and the introduction of more linguistic knowledge to the systems (chapter 11). This book is not limited to a presentation of the main approaches to the problem: we will also address evaluation issues (chapter 13), which can be either manual or automatic, and the closing