MODEL PAPER
THIRD SEMESTER, M. Sc INFORMATION TECHNOLOGY

Paper - VII (MIT - 307): Data Mining

Time: 3 hrs    Max. Marks: 75
Attempt any five questions.
All questions carry equal marks

  1. Question.
    1. Briefly describe all phases in the KDD process. How does KDD differ from data mining?
    2. Given an initial population {<101010>, <001100>, <010101>, <000010>}, apply the genetic algorithm to find a better population. Suppose the fitness function is defined as the sum of bit values for each individual and that mutation always occurs by negating the second bit. The termination condition is that the average fitness value for the entire population must be greater than 4. Also, an individual is chosen for crossover only if its fitness is greater than 2.
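
The rules stated in 1(2) can be sketched in Python as below. This is only an illustrative sketch: individuals are represented as bit strings, "second bit" is assumed to mean the second bit from the left, and the function names are hypothetical. The crossover step is left out because the question does not fix a crossover point.

```python
def fitness(individual):
    # Fitness is the sum of the bit values, i.e. the number of 1s.
    return individual.count("1")

def mutate(individual):
    # Mutation negates the second bit (assumed: second from the left).
    flipped = "1" if individual[1] == "0" else "0"
    return individual[0] + flipped + individual[2:]

def eligible_for_crossover(individual):
    # Only individuals with fitness greater than 2 take part in crossover.
    return fitness(individual) > 2

def should_terminate(population):
    # Stop once the average fitness of the population exceeds 4.
    return sum(fitness(ind) for ind in population) / len(population) > 4

population = ["101010", "001100", "010101", "000010"]
print([fitness(ind) for ind in population])
print([ind for ind in population if eligible_for_crossover(ind)])
print(should_terminate(population))
```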

  2. Question.
    1. Discuss briefly various data mining issues.
    2. Apply Bayesian classification to the following data set:
      Number  Gender  Height  Class
      1       F       1.6     Short
      2       M       2.0     Tall
      3       F       1.9     Medium
      4       F       1.88    Medium
      5       F       1.7     Short
      6       M       1.85    Medium
      7       F       1.6     Short
      8       M       1.7     Short
      9       M       2.2     Tall
      10      M       2.1     Tall
      11      F       1.8     Medium
      12      M       1.95    Medium
      13      F       1.9     Medium
      14      F       1.8     Medium
      15      F       1.75    Medium
      and classify the tuple < , M, 1.9>.

  3. Question.
    1. Define a decision tree and write a decision tree based classification algorithm.
    2. How is the splitting attribute selected in a decision tree based algorithm? Show it for the training set given in 2(b).

  4. Question.
    1. Write the K-means clustering algorithm and apply it to the following data to create three clusters:
      {2, 4, 10, 12, 3, 20, 30, 11, 25}
    2. A major problem with single link algorithm is that clusters consisting of long chains may be created. Describe and illustrate this concept.
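
A minimal sketch of K-means on the one-dimensional data of 4(1) is given below. The question does not specify the initial means, so the first three values (2, 4, 10) are assumed here; other starting means may lead to different clusters. The function name `k_means_1d` and the tie-breaking rule (the lowest-numbered cluster wins) are illustrative choices.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until the means stop changing."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (an empty cluster keeps its old mean).
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, means = k_means_1d(points, [2, 4, 10])  # assumed initial means
print(clusters)
print(means)
```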

  5. Question.
    1. Define the concept of an association rule and describe the Apriori algorithm.
    2. Apply the Apriori algorithm to the following transactions to generate association rules, taking support s = 20% and confidence α = 40%:
      Transaction  Items
      t1           Bread, Jam, Butter
      t2           Bread, Butter
      t3           Bread, Milk, Butter
      t4           Curd, Bread
      t5           Curd, Milk
      t6           Bread, Milk
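
The level-wise search asked for in 5(2) can be sketched as follows. This is an illustrative sketch rather than a reference implementation: it assumes the 20% threshold is the minimum support and the 40% threshold is the minimum confidence, and the names `count`, `frequent_itemsets` and `confidence` are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Jam", "Butter"},   # t1
    {"Bread", "Butter"},          # t2
    {"Bread", "Milk", "Butter"},  # t3
    {"Curd", "Bread"},            # t4
    {"Curd", "Milk"},             # t5
    {"Bread", "Milk"},            # t6
]

def count(itemset):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def frequent_itemsets(min_support):
    """Return {itemset: count} for every itemset meeting the minimum support."""
    min_count = min_support * len(TRANSACTIONS)
    items = sorted({item for t in TRANSACTIONS for item in t})
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: count(s) for s in level})
        k += 1
        known = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in known for sub in combinations(c, k - 1))
            and count(c) >= min_count
        ]
    return frequent

def confidence(antecedent, consequent):
    # Confidence of the rule antecedent => consequent.
    return count(antecedent | consequent) / count(antecedent)

for itemset, n in frequent_itemsets(0.20).items():
    print(sorted(itemset), n)
```

A rule X => Y built from a frequent itemset is kept when `confidence(X, Y)` reaches the assumed 40% threshold.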

  6. Question.
    1. Describe Web mining with the help of suitable examples.
    2. Given the following sessions:
      {< A, B, A, C>, < C, B, D, F>, < A, B, A>}, indicate the sequential patterns, forward sequences and maximal frequent sequences, assuming a minimum support of 30%. Assume that each session comes from a different user.

  7. Question.
    1. What do you understand by spatial data mining? Describe briefly some of its applications.
    2. Describe briefly various data structures used in spatial data mining.

  8. Question.
    1. What do you understand by temporal data mining? Describe briefly some of its applications.
    2. Plot the following time series values as well as the moving average found by replacing a given value with the average of it and the values immediately preceding and following it:
      {5, 15, 7, 20, 13, 5, 8, 10, 12, 11, 9, 15}.
      For the first and last values, you are to use only the two values available to calculate the average.
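
The smoothing rule in 8(2) can be sketched as below; the function name `centered_moving_average` is an illustrative choice, and plotting is left out to keep the sketch free of external libraries.

```python
def centered_moving_average(values):
    """Average each value with its immediate neighbours.

    The first and last values have only one neighbour, so their
    averages use the two values available.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

series = [5, 15, 7, 20, 13, 5, 8, 10, 12, 11, 9, 15]
print(centered_moving_average(series))
```

Both the original series and the returned list can then be plotted against the time index 1 to 12.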