Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus
Hajime Mochizuki, Kohji Shibano, Tokyo University of Foreign Studies, Japan
E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Washington, DC, United States. Publisher: Association for the Advancement of Computing in Education (AACE), Chesapeake, VA
This paper describes the formulaic sequences (FSs) extracted from a closed caption TV (CCTV) corpus for developing language learning materials for an e-learning system. In second language education and applied linguistics, it is widely accepted that using FSs appropriately for particular situations and functions contributes to learners’ language comprehension, production, and fluency. In our research, we aim to use FSs as learning materials in a language e-learning system. To extract the FSs, we counted sequences of n words (n-grams) in the corpus, where n ranges from one to nine. We obtained a total of 3,544,847,579 n-grams from the CCTV corpus of over 655 million words. After a sorting and merging process, we acquired 33,173,413 significant n-grams as FS candidates. We show the details of the FSs and investigate whether they are useful as language learning materials.
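The n-gram counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `count_ngrams` and `frequent_ngrams`, the whitespace tokenization, the lowercasing, and the frequency threshold `min_count` are all assumptions made for illustration. The abstract does not specify how the sorting and merging process decides which n-grams are significant, so a simple count threshold stands in for it here.

```python
from collections import Counter


def count_ngrams(lines, max_n=9):
    """Count every n-gram (n = 1..max_n) within each caption line.

    Assumption: tokens are lowercased and split on whitespace.
    """
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        for n in range(1, max_n + 1):
            # Slide a window of n tokens across the line.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_ngrams(counts, min_count=2, min_n=2):
    """Keep multi-word n-grams seen at least min_count times.

    Hypothetical stand-in for the paper's "sorting and merging" step:
    results are sorted by descending count, then alphabetically.
    """
    kept = [(gram, c) for gram, c in counts.items()
            if len(gram) >= min_n and c >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy caption lines, invented for this example only.
captions = [
    "nice to meet you",
    "it is nice to meet you too",
    "thank you very much",
]
counts = count_ngrams(captions)
candidates = frequent_ngrams(counts)
for gram, c in candidates:
    print(c, " ".join(gram))
```

On the toy input, only sequences shared by the first two lines, such as "nice to meet you", survive the threshold. At the scale reported in the abstract, an in-memory counter like this would likely be impractical, and an external sort-and-merge over files would be the more plausible design.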
Mochizuki, H. & Shibano, K. (2016). Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus. In Proceedings of E-Learn: World Conference on E-Learning (pp. 29-37). Washington, DC, United States: Association for the Advancement of Computing in Education (AACE). Retrieved September 19, 2017 from https://www.learntechlib.org/p/173916/.
© 2016 AACE