Unit 11: Data Mining
Segmentation algorithms: This type of algorithm divides data into groups, or clusters, of
items that have similar properties.
Association algorithms: This type of algorithm finds correlations between different
attributes in a dataset. The most common application of this kind of algorithm is for
creating association rules, which can be used in a market analysis.
Sequence analysis algorithms: This type of algorithm summarizes frequent sequences or
episodes in data, such as a Web path flow.
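The three algorithm types above can be illustrated with a minimal Python sketch. The function names, the one-dimensional k-means variant, and the toy data are all assumptions chosen for illustration; they are not the algorithms of any particular data mining product.

```python
from collections import Counter


def kmeans_1d(values, centers, iterations=10):
    """Segmentation: group numbers around the nearest of the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


def rule_stats(baskets, antecedent, consequent):
    """Association: support and confidence of the rule antecedent -> consequent."""
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    support = both / len(baskets)
    confidence = both / ante if ante else 0.0
    return support, confidence


def frequent_steps(paths, min_count):
    """Sequence analysis: consecutive page pairs seen at least min_count times."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical toy data for each algorithm type.
ages = [1, 2, 3, 10, 11, 12]
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
web_paths = [["home", "catalog", "cart"],
             ["home", "catalog", "help"],
             ["home", "search"]]

clusters = kmeans_1d(ages, centers=[0, 5])
support, confidence = rule_stats(baskets, "bread", "milk")
common_steps = frequent_steps(web_paths, min_count=2)
```

With this toy data, the values split into a low group and a high group, the rule "bread implies milk" holds in half of all baskets, and the only page transition followed at least twice is home to catalog.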
Self Assessment
Fill in the blanks:
14. A ........................................... is a set of heuristics and calculations that creates a data mining
model from data.
15. ............................................. type of algorithm finds correlations between different attributes
in a dataset.
Case Study: Logic-ITA Student Data
We have performed a number of queries on datasets collected by the Logic-ITA
to assist teaching and learning. The Logic-ITA is a web-based tool used at
Sydney University since 2001, in a course taught by the second author. Its
purpose is to help students practice formal logic proofs and to inform the teacher of the
class's progress.
Context of Use
Over the four years, around 860 students attended the course and used the tool. In the tool, an
exercise consists of a set of formulas (called premises) and another formula (called the
conclusion). The aim is to prove that the conclusion can validly be derived from the
premises. For this, the student has to construct new formulas, step by step, using logic
rules and formulas previously established in the proof, until the conclusion is derived.
There is no unique solution and any valid path is acceptable. Steps are checked on the fly
and, if incorrect, an error message and possibly a tip are displayed. Students used the tool
at their own discretion. As a result, there is neither a fixed number nor a fixed
set of exercises completed by all students.
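The step-by-step checking described above can be sketched as follows. This is a hypothetical illustration only: the source does not describe the internal representation or the rule set of the Logic-ITA, so the tuple encoding of implications, the single supported rule (modus ponens), and the names check_step and check_proof are all assumptions.

```python
def check_step(established, rule, used, new_formula):
    """Return an error message for an invalid step, or None if it is valid.

    Formulas are strings such as "A", or tuples ("->", P, Q) for "P implies Q"
    (an assumed encoding, not the tool's own).
    """
    if any(f not in established for f in used):
        return "A cited formula has not been established yet."
    if rule == "modus_ponens":
        # Valid if the cited formulas contain some P together with P -> new_formula.
        for p in used:
            if ("->", p, new_formula) in used:
                return None
        return "Modus ponens needs P and P -> Q to derive Q."
    return "Unknown rule: " + rule


def check_proof(premises, conclusion, steps):
    """Check each step in order; stop at the first error, as an on-the-fly checker would."""
    established = list(premises)
    for rule, used, new_formula in steps:
        error = check_step(established, rule, used, new_formula)
        if error is not None:
            return False, error
        established.append(new_formula)
    return conclusion in established, None


# Hypothetical exercise: from A, A -> B and B -> C, derive C.
premises = ["A", ("->", "A", "B"), ("->", "B", "C")]
good_steps = [
    ("modus_ponens", ["A", ("->", "A", "B")], "B"),
    ("modus_ponens", ["B", ("->", "B", "C")], "C"),
]
bad_steps = [("modus_ponens", ["A", ("->", "B", "C")], "C")]

good_result = check_proof(premises, "C", good_steps)
bad_result = check_proof(premises, "C", bad_steps)
```

Because the checker only verifies that each step follows from formulas already established, any valid path to the conclusion is accepted, which matches the statement that there is no unique solution.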
Data Stored
The tool’s teacher module collates all the student models into a database that the teacher
can query and mine. Two frequently queried tables in the database are mistake and
correct_step. The most common variables are shown in Table 1.