Data Mining in Data Warehousing: Classification Algorithms

D) SLIQ

SLIQ (Supervised Learning In Quest) was introduced by Mehta et al. (1996). It is a fast, scalable
decision tree algorithm that can be implemented in either serial or parallel form. It is not based on
Hunt’s algorithm for decision tree classification. It partitions the training data set recursively using
a breadth-first greedy strategy combined with a pre-sorting technique during the tree-building
phase (Mehta et al., 1996). With pre-sorting, the repeated sorting at each decision tree node is
eliminated and replaced by a one-time sort, with a separate list data structure for each attribute used
to determine the best split point (Mehta et al., 1996; Shafer et al., 1996). In building a decision
tree model, SLIQ handles both numeric and categorical attributes. One disadvantage of
SLIQ is that it uses a class list data structure that must stay memory resident, which imposes memory
restrictions on the data (Shafer et al., 1996). It uses the Minimum Description Length (MDL) principle
to prune the tree after constructing it. MDL is an inexpensive pruning technique that favours the
encoding with the least amount of coding, producing trees that are small in size, and it works
bottom-up (Anyanwu et al., 2009; Mehta et al., 1996).
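
The pre-sorting idea can be illustrated with a small sketch. This is not the implementation from Mehta et al.; it is a minimal in-memory illustration that assumes numeric attributes only and assumes the gini index as the split criterion. The function names (build_lists, best_splits) and the tuple layouts are hypothetical and chosen only for this example.

```python
from collections import Counter

def build_lists(records, labels):
    """One-time pre-sort: a sorted (value, rid) list per attribute, plus a class list."""
    n_attrs = len(records[0])
    attribute_lists = [
        sorted((row[a], rid) for rid, row in enumerate(records))
        for a in range(n_attrs)
    ]
    # Class list: rid -> [class label, current leaf id]. Every record starts in leaf 0.
    # This is the structure that has to stay in memory.
    class_list = [[label, 0] for label in labels]
    return attribute_lists, class_list

def gini(counts, total):
    """Gini index of a class histogram (assumed split criterion)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_splits(attribute_lists, class_list):
    """Scan each pre-sorted list once and score splits for all current leaves together."""
    totals = {}
    for label, leaf in class_list:
        totals.setdefault(leaf, Counter())[label] += 1

    best = {}  # leaf id -> (weighted gini, attribute index, threshold)
    for a, alist in enumerate(attribute_lists):
        below = {leaf: Counter() for leaf in totals}
        prev = {}  # leaf id -> last attribute value seen for that leaf
        for value, rid in alist:
            label, leaf = class_list[rid]
            if leaf in prev and prev[leaf] != value:
                # Candidate split halfway between two distinct neighbouring values.
                n_total = sum(totals[leaf].values())
                n_below = sum(below[leaf].values())
                n_above = n_total - n_below
                above = totals[leaf] - below[leaf]
                score = (n_below * gini(below[leaf], n_below)
                         + n_above * gini(above, n_above)) / n_total
                if leaf not in best or score < best[leaf][0]:
                    best[leaf] = (score, a, (prev[leaf] + value) / 2)
            below[leaf][label] += 1
            prev[leaf] = value
    return best
```

Calling build_lists once and then best_splits at each tree level mirrors the breadth-first strategy: no list is sorted again, and the class list is the only structure consulted for every record.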

E) SPRINT

SPRINT (Scalable Parallelizable Induction of Decision Trees) was introduced by Shafer
et al. (1996). It is a fast, scalable decision tree classifier. It is not based on Hunt’s algorithm in
constructing the decision tree; rather, it partitions the training data set recursively using a
breadth-first greedy technique until all records in each partition belong to the same leaf node or
class (Anyanwu et al., 2009; Shafer et al., 1996). It is an enhancement of SLIQ and can be implemented
in both serial and parallel form for good data placement and load balancing (Shafer et al., 1996). In this
paper we will focus on the serial implementation of SPRINT. Like SLIQ, it uses a one-time sort of
the data items, but it has no restriction on the input data size. Unlike SLIQ, it uses two data
structures, the attribute list and the histogram, neither of which has to be memory resident. This
makes SPRINT suitable for large data sets, since it removes the memory restrictions on the data
(Shafer et al., 1996). It handles both continuous and categorical attributes.
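
The two SPRINT data structures can be sketched in the same way. Again, this is only an illustrative sketch under stated assumptions, not the code of Shafer et al.: the lists are kept in memory here for brevity (the point of SPRINT is that they need not be), only continuous attributes are handled, the gini index is assumed as the split criterion, and all function names are hypothetical.

```python
from collections import Counter

def make_attribute_lists(records, labels):
    """One sorted list per attribute; each entry carries (value, class label, rid)."""
    n_attrs = len(records[0])
    return [
        sorted((row[a], labels[rid], rid) for rid, row in enumerate(records))
        for a in range(n_attrs)
    ]

def gini(hist):
    """Gini index of a class histogram (assumed split criterion)."""
    n = sum(hist.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in hist.values())

def best_numeric_split(attr_list):
    """Sweep one sorted attribute list, updating the 'below' and 'above' histograms."""
    c_below = Counter()
    c_above = Counter(label for _, label, _ in attr_list)
    n = len(attr_list)
    best = (float("inf"), None)  # (weighted gini, threshold)
    for i in range(n - 1):
        value, label, _ = attr_list[i]
        c_below[label] += 1
        c_above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        score = (n_below * gini(c_below) + (n - n_below) * gini(c_above)) / n
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best

def split_lists(attr_lists, winner, threshold):
    """Partition every attribute list by the winning split, preserving sort order."""
    # The rid set plays the role of the hash table used to split the other lists.
    left_rids = {rid for value, _, rid in attr_lists[winner] if value <= threshold}
    left = [[e for e in alist if e[2] in left_rids] for alist in attr_lists]
    right = [[e for e in alist if e[2] not in left_rids] for alist in attr_lists]
    return left, right
```

Because each attribute list entry carries its own class label, no separate memory-resident class list is needed, and the child lists produced by split_lists remain sorted without any further sorting.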

Friday, 31 August 2012
