
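In the information age, everything is being digitized. The pervasive internet environment that unites "terminals, applications, platforms, and data" confronts every traditional humanities discipline with a common question: how to reform established research methods, research approaches, and even research thinking so as to make humanities research more scientific. International Chinese Language Teaching has a history of more than half a century and has accumulated a rich foundation of theory, experience, resources, and talent. Using systems engineering to carry forward the wisdom of earlier scholars and integrate cutting-edge technology, and adopting data science thinking to study problems of teaching and learning, is the necessary path for the discipline's development.
Exercise training is an important route to language acquisition. Building an exercise bank with systems engineering methods, and using data analysis techniques to study the general laws of exercise training from multiple angles (exercises and texts, exercises and learner levels, exercises and proficiency standards, exercises and teaching methods), has become an important breakthrough point for making International Chinese Language Teaching research more scientific.
Exercise banks in the field of International Chinese Language Teaching generally suffer from small scale, unsystematic attributes, and insufficient depth of knowledge, so the scientific questions above cannot be studied in depth. The underlying cause is that exercise bank construction lacks a practical, effective engineering method and the supporting information processing technology. This thesis addresses that situation and focuses on the scientific problems of system design, knowledge mining, and automatic generation for a large-scale exercise bank. Specifically, it covers: (1) construction of a knowledge system for an International Chinese Language Teaching exercise bank; (2) key techniques for mining the knowledge attributes of exercises; (3) design and implementation of an International Chinese Language Teaching exercise bank platform.
The research results and innovations of this thesis are as follows:
(1) To address the lack of in-depth knowledge attributes in current exercise banks, an exercise knowledge system is designed on the basis of theories of second language acquisition. At the macro level, the system is built on Lv Bisong's theory of exercises and Bloom's taxonomy of educational objectives. At the meso level, it classifies the specific attributes of language knowledge (phonetics, Chinese characters, words, sentence patterns, grammar, and functions) according to their logical relations in linguistics. At the micro level, it connects to rich language knowledge bases such as a word bank, a sentence pattern bank, and a grammar bank, so that exercise knowledge attributes are grounded in the most concrete content. The result is a tree-shaped hierarchy of 12 exercise knowledge attributes that offers both breadth and depth. Compared with existing work, the system is reasonably arranged and structurally complete; its attributes better reflect the core characteristics of exercises, and their values are detailed and fine-grained; and in practice it effectively guided the subsequent construction of the exercise bank. An exercise bank built on this knowledge system makes it feasible to analyze the relationships between exercises and texts, exercises and language knowledge, exercises and learning stages, and among exercises themselves. It is expected to reveal the rich training patterns and question-setting regularities contained in the after-class exercises of various textbooks, which is of great significance for advancing in-depth research on training and testing in Chinese teaching.
(2) To link the exercise knowledge system to the exercise bank and mine exercise knowledge attributes, three methods are proposed: exercise title parsing based on sentence-based grammar, exercise title clustering based on sentence-based grammar, and automatic answering based on question type analysis. Together they automatically annotate two key knowledge attributes, question type and language knowledge point, in an exercise bank of 22,319 exercise sets comprising 104,437 individual items. From the perspective of exercise bank construction, the three methods solve the heavy workload and low efficiency of manual attribute annotation. From the perspective of the methods themselves, title parsing comprises three algorithms: a semi-automatic parsing algorithm based on clause units, an automatic parsing algorithm based on clause units, and an automatic parsing algorithm based on phrase units. They effectively parse exercise titles, a kind of strongly patterned short text, with accuracies of 95.54%, 93.42%, and 71.08% respectively. They are the first parsing algorithms implemented on the basis of sentence-based grammar theory, and an exploration of automatic parsing with sentence-based grammar on a restricted type of corpus. Title clustering based on sentence-based grammar can be generalized to short text clustering and verifies the applicability and effectiveness of sentence-based grammar theory in Chinese information processing. Automatic answering is designed for specific question types and reaches accuracies of 80.72%, 65.37%, 58.22%, and 92.32% on word replacement, choosing words to fill in blanks, cloze, and in-sentence blank filling respectively.
(3) To address the coherence and targeting of the "learn, practice, test" cycle, an interactive exercise generation system based on knowledge point analysis is proposed and built. For a given text, it generates targeted International Chinese Language Teaching exercises in four steps: "analyze knowledge points → select knowledge points → generate exercise material → adjust exercise material". Evaluation of the generated exercises shows an acceptance rate of 50.82%, and an acceptance rate of 85.47% after simple adjustment. Compared with existing exercise generation systems, the acceptance rate after simple adjustment is higher and the coverage of question types is broader. The interactive exercise generation system and the exercise bank built on the exercise knowledge system together form a dynamic, interactive exercise bank platform serving participants in every part of International Chinese Language Teaching.
Keywords: exercise bank, knowledge mining, exercise generation system, Chinese information processing, International Chinese Language Teaching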


In an age when everything can be digitized, the four-in-one internet environment of "terminals, applications, platforms, and data" confronts all traditional humanities disciplines with a common problem: how to transform established research methods, approaches, and thinking so as to make humanities research more scientific. International Chinese Language Teaching has a history of more than half a century and has accumulated rich theories, experience, resources, and talent. Using systems engineering approaches to carry forward the wisdom of predecessors and integrate cutting-edge technology, and applying data science thinking to the problems of teaching and learning, is the necessary path for the development of this discipline. Exercise training is an essential route to language acquisition. Building an exercise bank with systems engineering methods, and using data analysis techniques to study the general laws of exercise training from multiple dimensions (the relationships between exercises and texts, exercises and learner levels, exercises and language capability standards, and exercises and teaching methods), has become a significant breakthrough point for making International Chinese Language Teaching research more scientific. Existing exercise banks in the field of International Chinese Language Teaching generally suffer from small scale, unsystematic attributes, and insufficient depth of knowledge, so the questions above cannot be studied in depth. The cause is the lack of a practical, effective engineering method and supporting information processing technology for constructing exercise banks. This thesis addresses this situation and focuses on the scientific problems of system design, knowledge mining, and automatic exercise generation for a large-scale exercise bank.
The thesis covers three topics: (1) construction of a knowledge system for an exercise bank of International Chinese Language Teaching; (2) research on key techniques for mining the knowledge attributes of exercises; (3) design and implementation of an exercise bank platform for International Chinese Language Teaching. The achievements and innovations of this thesis include the following aspects: (1) The thesis designs an exercise knowledge system based on theories of second language acquisition. At the macro level, the knowledge system is built on Lv Bisong's exercise-related theory and Bloom's taxonomy of educational objectives. At the meso level, the specific attributes of language knowledge (phonetics, Chinese characters, words, sentence patterns, grammar, and functions) are classified according to their logical relations in linguistics. At the micro level, the knowledge system is linked to rich knowledge bases such as a word bank, a sentence pattern bank, and a grammar bank, forming a tree-shaped hierarchy of 12 exercise knowledge attributes with both breadth and depth. Compared with existing related research, the knowledge system is reasonably arranged and well structured; its attributes better reflect the core characteristics of exercises, and the attribute values are detailed and fine-grained; and in practice it effectively guides the subsequent construction of the exercise bank. An exercise bank built on this knowledge system makes it feasible to analyze the relationships between exercises and texts, exercises and learner levels, exercises and language capability standards, and exercises and teaching methods.
This is expected to reveal the rich training patterns and question-setting rules contained in textbook exercises, which is of great significance for advancing in-depth research on training and testing in Chinese teaching. (2) The thesis puts forward three methods: an exercise title parsing method based on sentence-based grammar, an exercise title clustering method based on sentence-based grammar, and an automatic answering method based on exercise type analysis. With these methods, the exercise type and knowledge point attributes of 22,319 exercise sets, comprising 104,437 individual items, are annotated automatically. From the perspective of exercise bank construction, the three methods solve the heavy workload and low efficiency of manual annotation. From the perspective of the methods themselves, the title parsing method comprises three algorithms: a semi-automatic algorithm based on the clause unit, an automatic algorithm based on the clause unit, and an automatic algorithm based on the phrase unit. These algorithms achieve accuracies of 95.54%, 93.42%, and 71.08% respectively and can effectively parse short texts with strong patterns. They are the first parsing algorithms implemented on the basis of sentence-based grammar theory, and an exploration of automatic parsing on a restricted type of corpus. The exercise title clustering method can be extended to the short text clustering problem and verifies the applicability and validity of sentence-based grammar theory in the field of Chinese information processing. The automatic answering method achieves accuracies of 80.72%, 65.37%, 58.22%, and 92.32% on the word replacement, fill-in-the-blank with selected words, cloze, and in-sentence blank filling exercise types respectively.
(3) To make the learning-practice-testing process coherent and well targeted, an interactive exercise generating system based on knowledge point analysis is proposed and built. For a specific text, the four-step process of "analyzing knowledge points → selecting knowledge points → generating exercise candidates → adjusting the exercise content" generates targeted International Chinese Language Teaching exercises. Evaluation of the generated exercises shows an acceptance rate of 50.82%, and an acceptance rate of 85.47% after simple adjustment. Compared with existing exercise generating systems, the acceptance rate after simple adjustment is higher, and the covered exercise types are more comprehensive. The exercise generating system and the International Chinese Language Teaching exercise bank together constitute a dynamic, interactive exercise bank platform serving teachers, students, researchers, and other participants in the International Chinese Language Teaching field.
KEY WORDS: Exercise Bank, Knowledge Mining, Interactive Exercise Generating System, Chinese Information Processing, International Chinese Language Teaching
