Data mining techniques in education: A comparison of conventional statistical linear regression and neural-network-based tools
Michele K. Thigpen, The University of Alabama, United States
The purpose of this study was to explore the use of artificial neural networks as an alternative to multiple regression in data analysis. SPSS 10.0 was used to perform the regression analyses, and PolyAnalyst Lite 4.1 was used to analyze the same data with an artificial neural network. The researcher compared the tools' predictive value using the correlation matrices, coefficients of determination, and beta weights.
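The regression statistics named above can be computed with a short NumPy sketch. It is not the SPSS 10.0 procedure used in the study; the synthetic data and the function name `regression_summary` are hypothetical. It assumes ordinary least squares with an intercept and defines a standardized beta weight as the raw slope rescaled by sd(x_j) / sd(y).

```python
import numpy as np


def regression_summary(X, y):
    """Fit OLS with an intercept and return the statistics named in the abstract.

    Returns (coef, r2, betas, corr):
      coef  -- intercept followed by the raw slopes
      r2    -- coefficient of determination
      betas -- standardized beta weights, one per predictor
      corr  -- correlation matrix of the predictors and y (y is the last row/column)
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])           # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # ordinary least squares
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)                  # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)                # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # A beta weight rescales each raw slope by sd(x_j) / sd(y), which makes
    # predictors measured on different scales directly comparable.
    betas = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    return coef, r2, betas, corr


if __name__ == "__main__":
    # Hypothetical synthetic data: three predictors, only the first two matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
    coef, r2, betas, corr = regression_summary(X, y)
    print("R^2:", round(r2, 3))
    print("beta weights:", np.round(betas, 3))
    print("correlations with y:", np.round(corr[-1, :-1], 3))
```

Ranking predictors by the size of their standardized weights is exactly the kind of summary that the abstract later notes an ANN does not readily provide.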
Two separate datasets were used to compare the data mining techniques. The first dataset was generated by a survey given to college students to measure attitudes toward teamwork. The second dataset was generated from a questionnaire designed to measure attitudes toward democracy and given to students enrolled in overseas American schools.
The major advantage claimed for artificial neural networks (ANNs) over regression analysis is that ANNs do not assume a linear relationship. An ANN is claimed to detect any dependencies in the data, linear or nonlinear. Thus, an ANN would be advantageous when the user has no idea what type of relationship holds between the dependent and independent variables. However, the ANN used in this study did not generate a significant solution, and the following disadvantages were noted: (1) The results of an ANN are difficult to interpret; it lacks explanatory power. (2) An ANN requires more data than regression analysis, and its performance improves with sample size. (3) An ANN does not readily indicate which predictors are strongest, since there is no real equivalent of a beta weight. (4) Overfitting may be a problem: an ANN attempts to find any pattern in the data, however spurious.
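The contrast described above can be illustrated with a small synthetic sketch in NumPy. It is not the PolyAnalyst network from the study. As an assumption, a least-squares fit on hypothetical squared and interaction features stands in for a flexible nonlinear model such as an ANN, and permutation importance (a model-agnostic technique swapped in here, not one used in the study) serves as a rough substitute for beta weights. The data and all function names are hypothetical.

```python
import numpy as np


def r_squared(y, fitted):
    """Coefficient of determination for a set of fitted values."""
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


def fit_least_squares(features, y):
    """Fit OLS with an intercept; return a predict(features) function."""
    design = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda F: np.column_stack([np.ones(len(F)), F]) @ coef


def expand(X):
    """Hypothetical nonlinear feature map standing in for an ANN's hidden layer."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])


def permutation_importance(predict, X, y, rng, n_repeats=20):
    """Mean increase in squared error when each column of X is shuffled."""
    baseline = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            scores[j] += np.mean((y - predict(X_perm)) ** 2) - baseline
    return scores / n_repeats


# Hypothetical data: y depends on x1 only, and only through x1 squared.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.05, size=300)

linear = fit_least_squares(X, y)
flexible_fit = fit_least_squares(expand(X), y)


def black_box(Z):
    """Treat the flexible model as opaque: only its predictions are used."""
    return flexible_fit(expand(Z))


linear_r2 = r_squared(y, linear(X))        # near 0: a straight line misses x1 squared
flexible_r2 = r_squared(y, black_box(X))   # near 1: the nonlinear features capture it
importance = permutation_importance(black_box, X, y, rng)

print(f"linear R^2 = {linear_r2:.3f}, flexible R^2 = {flexible_r2:.3f}")
print("permutation importance (x1, x2):", np.round(importance, 3))
```

Because y is driven by x1 squared, the linear fit's R^2 stays near zero even though x1 almost fully determines y, while the flexible model fits well. Shuffling x1 sharply degrades the flexible model and shuffling x2 barely changes it. Permutation importance only ranks predictors; it says nothing about direction or functional form, so it does not restore the explanatory power noted in point (1).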
The results of this study indicate that great care should be taken when using an artificial neural network to analyze certain datasets, particularly those with few variables and records. However, although it might be tempting to discard the ANN's results in this study, it may be that the ANN is accurate in this particular case, in which case the results of traditional statistical data analysis techniques should be accepted with great caution. Whichever technique one chooses, the use of any tool must be considered carefully.
Thigpen, M.K. Data mining techniques in education: A comparison of conventional statistical linear regression and neural-network-based tools. Ph.D. thesis, The University of Alabama.
Citation reproduced with permission of ProQuest LLC.