Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus

H. Mochizuki, K. Shibano, Tokyo University of Foreign Studies, Japan

E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, in Washington, DC, United States

Publisher: Association for the Advancement of Computing in Education (AACE), Chesapeake, VA

Abstract

This paper describes formulaic sequences (FSs) extracted from a closed caption TV (CCTV) corpus for use in developing language learning materials for an e-learning system. In second language education and applied linguistics, it is widely accepted that using FSs appropriately for particular situations and functions contributes to learners’ language comprehension, production, and fluency. In our research, we aim to use FSs as learning materials in a language e-learning system. To extract the FSs, we computed sequences of n words (n-grams) in the corpus, where n ranges from one to nine. This yielded a total of 3,544,847,579 n-grams from the CCTV corpus of over 655 million words. After a sorting and merging process, we obtained 33,173,413 significant n-grams as FS candidates. We present the details of the FSs and investigate whether they are useful as language learning materials.
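The n-gram counting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `count_ngrams`, the lowercase whitespace tokenization, and the choice to count within each caption line are assumptions made for illustration, and the sorting and merging process is not reproduced because the abstract does not specify it.

```python
from collections import Counter

def count_ngrams(lines, min_n=1, max_n=9):
    """Count every n-gram (min_n <= n <= max_n) found within each caption line.

    Hypothetical helper: the tokenization and per-line counting are
    assumptions, not details taken from the paper.
    """
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()  # assumption: naive whitespace tokenization
        for n in range(min_n, max_n + 1):
            # Slide a window of n tokens across the line.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy caption lines, invented for this example.
captions = [
    "nice to meet you",
    "it is nice to meet you too",
]
counts = count_ngrams(captions)
print(counts[("nice", "to", "meet", "you")])  # → 2
```

On a corpus of several hundred million words, an in-memory counter like this would likely be impractical, so a real pipeline would probably need disk-based or distributed counting.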

Citation

Mochizuki, H. & Shibano, K. (2016). Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus. In Proceedings of E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2016 (pp. 29-37). Chesapeake, VA: Association for the Advancement of Computing in Education (AACE).