Page 166 - DCAP606_BUSINESS_INTELLIGENCE

Unit 11: Data Mining

Notes
a distance between individuals was not obvious. We calculated and used a new variable:
the total number of mistakes made per student in an exercise. As a result, students with
similar frequency of mistakes were put in the same group. Histograms showing the
different clusters revealed interesting patterns. There are three clusters: 0 (red, on the left),
1 (green, in the middle) and 4 (purple, on the right). From other windows (not shown), we
know that students in cluster 0 made many mistakes per unfinished exercise, students in
cluster 1 made few mistakes and students in cluster 4 made an intermediate number of
mistakes. Students who make many mistakes also use many different logic rules while solving
exercises; this is shown by the vertical, almost solid lines.
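The derived variable and grouping described above can be sketched roughly as follows. This is only an illustration: the log records, student identifiers and the use of a one-dimensional k-means are assumptions, since the text does not name the clustering algorithm that was actually used.

```python
from collections import defaultdict

def mistakes_per_exercise(log):
    """Average number of mistakes per attempted exercise, for each student."""
    totals = defaultdict(lambda: defaultdict(int))
    for student, exercise, n_mistakes in log:
        totals[student][exercise] += n_mistakes
    return {s: sum(ex.values()) / len(ex) for s, ex in totals.items()}

def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means (k >= 2); initial centres are spread over the value range."""
    low, high = min(values), max(values)
    centres = [low + (high - low) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre when a group is empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres

def assign(value, centres):
    """Index of the centre closest to the value."""
    return min(range(len(centres)), key=lambda c: abs(value - centres[c]))

# Hypothetical log: (student, exercise, mistakes recorded in one session).
log = [
    ("s1", "ex1", 9), ("s1", "ex2", 11),
    ("s2", "ex1", 1), ("s2", "ex2", 0),
    ("s3", "ex1", 5), ("s3", "ex2", 4),
    ("s4", "ex1", 10),
    ("s5", "ex1", 1),
]
features = mistakes_per_exercise(log)
centres = kmeans_1d(list(features.values()), k=3)
clusters = {s: assign(v, centres) for s, v in features.items()}
print(clusters)
```

Students with a similar mistake frequency end up with the same cluster index, which is the property the histograms in the case study rely on.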
Classification

We built decision trees to try to predict exam marks (for the question related to formal
proofs). The Decision Tree algorithm produces a tree-like representation of the model it
builds. From the tree it is then easy to generate rules of the form IF condition THEN
outcome. Using the previous year of student data as a training set (mistakes, number of
exercises, difficulty of the exercises, number of concepts used in one exercise, level reached,
as well as the final mark obtained in the logic question), we can build and use a decision
tree that predicts the exam mark according to the attributes.
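To make the step from tree to IF condition THEN outcome rules concrete, here is a minimal sketch. It assumes a tiny, invented training set with two attributes and uses a simple regression-style tree that splits on thresholds to minimise squared error; the case study does not state which decision tree algorithm or splitting criterion was used, so these choices are illustrative only.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target):
    """Return the (attribute, threshold) that most reduces squared error, or None."""
    best, best_err = None, sse([r[target] for r in rows])
    for attr in rows[0]:
        if attr == target:
            continue
        values = sorted({r[attr] for r in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[attr] <= threshold]
            right = [r[target] for r in rows if r[attr] > threshold]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (attr, threshold), err
    return best

def build_tree(rows, target, max_depth):
    """A leaf is a mean mark; an inner node is (attr, threshold, left, right)."""
    marks = [r[target] for r in rows]
    split = best_split(rows, target) if max_depth > 0 else None
    if split is None:
        return sum(marks) / len(marks)
    attr, threshold = split
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return (attr, threshold,
            build_tree(left, target, max_depth - 1),
            build_tree(right, target, max_depth - 1))

def predict(tree, row):
    """Follow the splits until a leaf (a number) is reached."""
    while isinstance(tree, tuple):
        attr, threshold, left, right = tree
        tree = left if row[attr] <= threshold else right
    return tree

def rules(tree, conditions=()):
    """Yield one IF ... THEN ... rule per leaf."""
    if not isinstance(tree, tuple):
        yield f"IF {' AND '.join(conditions) or 'TRUE'} THEN mark = {tree:.1f}"
        return
    attr, threshold, left, right = tree
    yield from rules(left, conditions + (f"{attr} <= {threshold}",))
    yield from rules(right, conditions + (f"{attr} > {threshold}",))

# Hypothetical "previous year" records; attribute names and marks are invented.
training = [
    {"exercises": 0, "mistakes": 2, "mark": 1},
    {"exercises": 1, "mistakes": 8, "mark": 2},
    {"exercises": 1, "mistakes": 3, "mark": 2},
    {"exercises": 3, "mistakes": 9, "mark": 6},
    {"exercises": 4, "mistakes": 4, "mark": 7},
    {"exercises": 6, "mistakes": 12, "mark": 8},
]
tree = build_tree(training, target="mark", max_depth=1)
for rule in rules(tree):
    print(rule)
print(predict(tree, {"exercises": 5, "mistakes": 3}))
```

On this toy data the tree splits on the number of exercises rather than on mistakes, which happens to mirror the observation made later in the case study; with real data the chosen attribute could of course differ.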
Supporting Teachers and Learners
Pedagogical Information Extracted
The information extracted greatly assisted us as teachers to better understand the cohort of
learners. Whilst SQL queries and various histograms were used during the course of the
teaching semester to focus the following lecture on problem areas, the more complex
mining was left for reflection between semesters. Symbolic data analysis revealed that if
students attempt at least two exercises, they are more likely to do more (probably
overcoming the initial barrier of use) and complete their exercises. In subsequent years
we required students to do at least 2 exercises as part of their assessment. Mistakes that
were associated together indicated to us that the very concept of formal proofs (i.e. the
structure of each element of the proof, as opposed to the use of rules for instance) was a
problem. In 2003, that portion of the course was redesigned to take this problem into
account and the role of each part of the proof was emphasized. After the end of the
semester, mining for mistake associations was conducted again. Surprisingly, results did
not change much (a slight decrease in support and confidence levels in 2003 followed by a
slight increase in 2004). However, marks in the final exam continued increasing. This
leads us to think that making mistakes, especially while using a training tool, is simply
part of the learning process, a view supported by the fact that the number of completed
exercises per student increased in 2003 and 2004. The level of prediction seems to be much
better when the prediction is based on exercises (number, length, variety of rules) rather
than on mistakes made. This also supports the idea that mistakes are part of the learning
process, especially in a practice tool where mistakes are not penalized.
Using data exploration and results from the decision tree, one can infer that if students
successfully complete 2 to 3 exercises for the topic, then they seem to have grasped the
concept of formal proof and are likely to perform well in the exam question related to that
topic. This finding is consistent with correlations calculated between marks in the final
exam and activity with the Logic Tutor, and with the general, human perception of tutors in
this course. Therefore, a sensible warning system could look as follows: Report to the
lecturer-in-charge students who have successfully completed fewer than 3 exercises. For
those students, display the histogram of rules used. Be proactive towards these students,
distinguishing those who use the pop-up menu for logic rules from the others.
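The warning system outlined above could be sketched along these lines. The data structures, field names and the threshold parameter are assumptions made for the example; the text only specifies the rule itself (fewer than 3 successfully completed exercises), the histogram of rules used, and the distinction about the pop-up menu.

```python
from collections import Counter

def students_to_report(completed, rules_used, used_popup, threshold=3):
    """List students below the threshold, with their rule histogram and pop-up flag."""
    report = []
    for student, n_completed in sorted(completed.items()):
        if n_completed < threshold:
            report.append({
                "student": student,
                "completed": n_completed,
                # Histogram of logic rules: rule name -> number of uses.
                "rule_histogram": Counter(rules_used.get(student, [])),
                "uses_popup_menu": student in used_popup,
            })
    return report

# Hypothetical activity data extracted from the tutoring tool.
completed = {"s1": 5, "s2": 1, "s3": 2, "s4": 3}
rules_used = {
    "s2": ["modus_ponens", "modus_ponens", "simplification"],
    "s3": ["addition"],
}
used_popup = {"s2"}

report = students_to_report(completed, rules_used, used_popup)
for entry in report:
    print(entry["student"], entry["completed"],
          dict(entry["rule_histogram"]), entry["uses_popup_menu"])
```

Keeping the threshold as a parameter makes it easy to adjust if later cohorts suggest a different cut-off.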

LOVELY PROFESSIONAL UNIVERSITY 161
