A challenging task in natural language processing is the development of end-to-end dialog systems that incorporate external knowledge stored in databases. This thesis implements a state-of-the-art sequence-to-sequence model that is trained on the Movie Dialog Dataset [Dod+15]. The dataset provides a knowledge base containing metadata about more than 17k movies, which this thesis stores in an Elasticsearch instance.
Since database operations are non-differentiable, two different training approaches are investigated. On the one hand, this thesis extends the dataset with intermediate labels, which represent the query ground truth, and uses them to supervise the model's output before the database operation is executed. Annotating the data with intermediate labels required human interaction. On the other hand, policy-based reinforcement learning is used to train the model on the original question-answer pairs.
This thesis finds that training with intermediate labels achieves an execution accuracy of 90.6% and thereby approaches the QA benchmark reported by Dodge et al. [Dod+15]. Training with policy gradient achieves an execution accuracy of 84.2% and is competitive with an ensemble of Memory Networks. Furthermore, the proposed sequence-to-sequence model generalizes to previously unseen question patterns.
Technology is one of the key drivers of the evolution of language, and it affects communication in several ways. For instance, it increases the speed of information flow, provides new interfaces, offers the possibility to store information, and may itself be a topic of communication. Furthermore, technology breaks down barriers that once seemed insurmountable. One development that shows the impact of technology is the present-day influence of machine translation. While translations formerly required skilled translators, technology now enables everyone with Internet access to gain remarkable insights when confronted with information in a foreign language [YD15]. Another major impact is the advent of smartphones. On the one hand, smartphones altered the way we process information. In the 21st century, information is ubiquitous, as social media allows news to be shared from anywhere in the world. Technology enables people to participate in events of public interest. Presidential speeches, concerts, and sports events are accessible from all over the globe. The same applies during crises such as natural disasters or acts of war. The combination of technologies that connect people from both a technical and a social point of view has changed communication between humans significantly. On the other hand, smartphones revolutionized the way humans interact with machines and devices. Touchscreens existed before the rise of smartphones, but smartphones brought them into daily life. This trend spread to other mobile devices such as tablets and even to new generations of laptops.
The next step in the evolution of communication between man and machine is happening right now. In 2011, Apple presented the iPhone 4s. In addition to hardware improvements, the smartphone introduced a digital personal assistant as a complementary service. Such a system can provide information about your personal calendar, stock prices, or the weather forecast. Other manufacturers recognized the potential of such assistants and caught up. Almost every smartphone available for purchase today comes with a digital assistant that can receive commands in natural language. The current shift in technology highlights the importance of voice as an interface between humans and machines. Over the last few years, leading tech companies such as Amazon and Google have developed a new generation of devices.
Their digital assistants, Alexa and Google Home, are based entirely on voice commands. These assistants no longer have displays. So far, their main fields of application are controlling smart home devices, assisting with simple daily tasks, and answering questions. It is very likely that their capabilities and usage will increase in the coming years.
As these considerations have focused only on consumer products, it is necessary to add that the advance of digital assistance will also extend to industries and enterprises. According to a recently published report on the potential of virtual digital assistants (VDAs), the number of VDAs in enterprises is expected to grow to about 840 million in 2021. This represents an increase of 440 percent compared to the number of VDAs in 2015. On the consumer side, it is estimated that the number of VDAs will grow from 390 million in 2015 to 1.8 billion in 2021 [Tra16].
Summarizing these considerations leads to the following three points. Firstly, the current development in technology points to natural language as a key interface between man and machine. It is important to emphasize that natural language here includes spoken dialog as well as written instructions. Secondly, technology affects communication in general: it removes traditional barriers and offers new opportunities to communicate. Thirdly, the market for digital assistants already exists and will grow significantly in the coming years.
This thesis addresses the problem field of natural language interfaces (NLI). We live in times in which access to information is crucial. A significant amount of today’s information is stored and organized in databases. The ability to access such resources is limited to those who have mastered a corresponding query language [ZXS17]. Having to apply a query language in order to access information falls into the same category of problems as accessing information that is available only in a foreign language. The field of NLI aims to enable interaction between humans and computers through natural language. A subfield of NLI is the research area of semantic parsing, which focuses on mapping natural language to logical forms and structured representations. A database query is such a logical form. Semantic parsers explicitly separate the step of parsing natural language into a logical form from the step of executing that form against a database. However, there is increased interest in applying neural-network-based solutions to tackle this problem in an end-to-end fashion [Lia16]. This thesis follows this trend and determines the capabilities of such systems. The core of this work is therefore to implement a state-of-the-art model that is able to turn natural language questions into database queries. Typically, such models are trained with question-query pairs, because executing a query against a database is a non-differentiable operation that prevents gradients from propagating from the answer back to the model. An approach to overcome this problem will be presented, and the proposed model will be trained on question-answer pairs. In terms of practical relevance, an end-to-end character is a desirable goal; otherwise, the system would require human interaction or intermediate labels, both of which would lead to increased costs and reduce the flexibility of the solution. Taking these considerations into account leads to the conclusion that this thesis and the related approach address all three points mentioned earlier.
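To make the training problem concrete, the following minimal sketch illustrates how a policy gradient (REINFORCE) update can learn from question-answer pairs alone, even though the database lookup itself is non-differentiable. It is a toy under stated assumptions and not the model of this thesis: a small Python dictionary stands in for the Elasticsearch knowledge base, a table of logits per question pattern stands in for the sequence-to-sequence model, and all names (`KB`, `FIELDS`, `PATTERNS`, `execute`, `train`, `greedy_field`) as well as the hyperparameters are hypothetical.

```python
import math
import random

# Hypothetical toy knowledge base (stand-in for the Elasticsearch instance).
KB = {
    "Blade Runner": {"director": "Ridley Scott", "year": "1982"},
    "Alien": {"director": "Ridley Scott", "year": "1979"},
    "Heat": {"director": "Michael Mann", "year": "1995"},
}
FIELDS = ["director", "year"]            # possible query targets (actions)
PATTERNS = ["who directed", "when was"]  # hypothetical question patterns


def make_examples():
    """Build question-answer pairs; no query labels are provided."""
    data = []
    for title, record in KB.items():
        data.append(("who directed", title, record["director"]))
        data.append(("when was", title, record["year"]))
    return data


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def execute(title, field):
    """Non-differentiable database operation: a plain lookup."""
    return KB[title][field]


def train(epochs=200, lr=0.5, seed=0):
    """Train the toy policy with REINFORCE on question-answer pairs."""
    rng = random.Random(seed)
    # Policy parameters: one logit per field for each question pattern.
    theta = {p: [0.0] * len(FIELDS) for p in PATTERNS}
    data = make_examples()
    for _ in range(epochs):
        rng.shuffle(data)
        for pattern, title, answer in data:
            probs = softmax(theta[pattern])
            # Sample a query field from the current policy.
            action = rng.choices(range(len(FIELDS)), weights=probs)[0]
            # The reward is only observable after executing the query.
            reward = 1.0 if execute(title, FIELDS[action]) == answer else 0.0
            # Gradient of log pi(action) w.r.t. the logits: onehot - probs.
            for i in range(len(FIELDS)):
                grad = (1.0 if i == action else 0.0) - probs[i]
                theta[pattern][i] += lr * reward * grad
    return theta


def greedy_field(theta, pattern):
    """Return the field with the highest logit for a question pattern."""
    logits = theta[pattern]
    return FIELDS[logits.index(max(logits))]
```

Calling `train()` and then `greedy_field(theta, "who directed")` shows which field the learned policy queries for that pattern. The sketch omits a reward baseline and any sequence decoding; it only demonstrates that the gradient is taken through the log-probability of the sampled query rather than through the database operation.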