Page 166 - DCAP606_BUSINESS_INTELLIGENCE

Unit 11: Data Mining




             a distance between individuals was not obvious. We calculated and used a new variable:
             the total number of mistakes made per student in an exercise. As a result, students with
             similar frequency of mistakes were put in the same group. Histograms showing the
             different clusters revealed interesting patterns. There are three clusters: 0 (red, on the left),
             1 (green, in the middle) and 4 (purple, on the right). From other windows (not shown), we
             know that students in cluster 0 made many mistakes per unfinished exercise, students in
             cluster 1 made few mistakes, and students in cluster 4 made an intermediate number of
             mistakes. Students who make many mistakes also use many different logic rules while
             solving exercises; this is shown by the vertical, almost solid lines.
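             The idea of deriving one numeric variable and grouping students on it can be sketched
             as follows. This is only an illustration: the event format, the student names and the
             starting centers are hypothetical, and k-means is used as a stand-in because the text
             does not say which clustering algorithm was applied.

```python
from collections import defaultdict

def mistakes_per_exercise(events):
    """Derived variable: mistakes per attempted exercise for each student.

    `events` is a hypothetical log with one (student, exercise) pair per mistake.
    Students with no logged mistake do not appear in the result.
    """
    counts = defaultdict(int)
    exercises = defaultdict(set)
    for student, exercise in events:
        counts[student] += 1
        exercises[student].add(exercise)
    return {s: counts[s] / len(exercises[s]) for s in counts}

def nearest(value, centers):
    """Index of the center closest to `value` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means: assign values to the nearest center, then move
    each center to the mean of the values assigned to it."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v, centers)].append(v)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers

# Made-up data: each student attempted a single exercise "ex1".
events = ([("ann", "ex1")] * 1 + [("bob", "ex1")] * 2 + [("cat", "ex1")] * 9
          + [("dan", "ex1")] * 10 + [("eve", "ex1")] * 20 + [("fay", "ex1")] * 22)
rates = mistakes_per_exercise(events)
centers = kmeans_1d(list(rates.values()), [0.0, 10.0, 20.0])
clusters = {s: nearest(v, centers) for s, v in rates.items()}
```

             With a single derived variable the distance between two students is just the absolute
             difference of their values, which is what makes the grouping straightforward.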
             Classification

             We built decision trees to try to predict exam marks (for the question related to formal
             proofs). The Decision Tree algorithm produces a tree-like representation of the model it
             builds. From the tree it is then easy to generate rules of the form IF condition THEN
             outcome. Using the previous year of student data as a training set (mistakes, number of
             exercises, difficulty of the exercises, number of concepts used in one exercise, level
             reached, as well as the final mark obtained in the logic question), we can build and use
             a decision tree that predicts the exam mark according to the attributes.
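             The step from a tree to IF condition THEN outcome rules can be sketched as below. The
             tree is written by hand purely for illustration: its attribute names, thresholds and
             outcomes are hypothetical and are not the tree learned in the study.

```python
def tree_to_rules(node, conditions=()):
    """Return one 'IF ... THEN ...' rule per leaf of a binary decision tree.

    An internal node is a dict with 'attribute', 'threshold', 'yes' and 'no';
    a leaf is a dict with a single 'outcome' key.
    """
    if "outcome" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['outcome']}"]
    test = f"{node['attribute']} <= {node['threshold']}"
    negated = f"{node['attribute']} > {node['threshold']}"
    return (tree_to_rules(node["yes"], conditions + (test,))
            + tree_to_rules(node["no"], conditions + (negated,)))

def predict(node, student):
    """Follow the tests from the root down to a leaf for one student record."""
    while "outcome" not in node:
        passed = student[node["attribute"]] <= node["threshold"]
        node = node["yes"] if passed else node["no"]
    return node["outcome"]

# Hypothetical tree, not taken from the study's data.
toy_tree = {
    "attribute": "exercises_completed", "threshold": 2,
    "yes": {"outcome": "low mark"},
    "no": {
        "attribute": "mistakes", "threshold": 10,
        "yes": {"outcome": "high mark"},
        "no": {"outcome": "medium mark"},
    },
}

rules = tree_to_rules(toy_tree)
example = predict(toy_tree, {"exercises_completed": 4, "mistakes": 3})
```

             Each path from the root to a leaf becomes one rule, so the number of rules equals the
             number of leaves.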
             Supporting Teachers and Learners
             Pedagogical Information Extracted
             The information extracted greatly assisted us as teachers to better understand the cohort of
             learners. Whilst SQL queries and various histograms were used during the course of the
             teaching semester to focus the following lecture on problem areas, the more complex
             mining was left for reflection between semesters. Symbolic data analysis revealed that if
             students attempt at least two exercises, they are more likely to do more (probably
             overcoming the initial barrier of use) and complete their exercises. In subsequent years
             we required students to do at least 2 exercises as part of their assessment. Mistakes that
             were associated together indicated to us that the very concept of formal proofs (i.e. the
             structure of each element of the proof, as opposed to the use of rules for instance) was a
             problem. In 2003, that portion of the course was redesigned to take this problem into
             account and the role of each part of the proof was emphasized. After the end of the
             semester, mining for mistake associations was conducted again. Surprisingly, results did
             not change much (a slight decrease in support and confidence levels in 2003 followed by a
             slight increase in 2004). However, marks in the final exam continued to increase. This
             leads us to think that making mistakes, especially while using a training tool, is simply
             part of the learning process, a view supported by the fact that the number of completed
             exercises per student increased in 2003 and 2004. The level of prediction seems to be much
             better when the prediction is based on exercises (number, length, variety of rules) rather
             than on mistakes made. This also supports the idea that mistakes are part of the learning
             process, especially in a practice tool where mistakes are not penalized.
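             The support and confidence levels mentioned for mistake associations follow the usual
             association-rule definitions, sketched below. The mistake labels and the transactions
             are hypothetical; each transaction is assumed to be the set of mistake types one
             student made.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent (0.0 if the antecedent never occurs)."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

# Made-up data: mistake types "A", "B", "C" for four students.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C"}]

rule_support = support(transactions, {"A", "B"})          # rule A -> B
rule_confidence = confidence(transactions, {"A"}, {"B"})
```

             Recomputing these two numbers on each year's data is what allows the comparison across
             years described above.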
             Using data exploration and the decision tree results, one can infer that if students
             successfully complete 2 to 3 exercises for the topic, then they seem to have grasped the
             concept of formal proof and are likely to perform well in the exam question related to
             that topic. This finding is consistent with correlations calculated between marks in the
             final exam and activity with the Logic Tutor, and with the general, human perception of
             tutors in this
             course. Therefore, a sensible warning system could look as follows: Report to the
             lecturer-in-charge students who have successfully completed fewer than 3 exercises. For
             those students, display the histogram of rules used. Be proactive towards these students,
             distinguishing those who use the pop-up menu for logic rules from the others.
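             A minimal sketch of such a warning report is given below. The record fields
             (`completed`, `rules_used`, `popup_uses`) and the sample students are hypothetical; only
             the threshold of 3 successfully completed exercises comes from the text.

```python
from collections import Counter

def warning_report(students, min_exercises=3):
    """List students below the completion threshold, with a histogram of the
    logic rules they used and a flag for pop-up menu use."""
    report = []
    for s in students:
        if s["completed"] < min_exercises:
            report.append({
                "id": s["id"],
                "rule_histogram": Counter(s["rules_used"]),
                "uses_popup_menu": s["popup_uses"] > 0,
            })
    return report

# Made-up student records.
students = [
    {"id": "s1", "completed": 1, "rules_used": ["MP", "MP", "MT"], "popup_uses": 4},
    {"id": "s2", "completed": 5, "rules_used": ["MP", "DS"], "popup_uses": 0},
    {"id": "s3", "completed": 2, "rules_used": ["DS"], "popup_uses": 0},
]

report = warning_report(students)
```

             The histogram is kept as counts per rule so that it can be displayed in whatever form
             the lecturer prefers.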
                                                                                 Contd....



                                           LOVELY PROFESSIONAL UNIVERSITY                                   161