This is the list of algorithms for the course's short presentations. Because the lecturer could not cover all of this material in detail, students are encouraged to select an algorithm from the list and give a short lecture on it in class. The presentation counts towards your final grade.

Top 10 algorithms for Data Mining

We follow this blog in listing the 10 influential algorithms reported at ICDM 2006.

  • C4.5: successor to ID3 for building decision trees in classification tasks.
  • K-Means: a clustering algorithm that partitions data into k clusters around centroids. (刘畅 2120151011, 罗佩 2120151019)
  • Support Vector Machines: a supervised learning method for classification and regression that seeks the separating hyperplane with the maximum margin. (吴松泽 2120151048)
  • Apriori: for mining frequent patterns (frequent itemset mining).
  • Expectation Maximization: for estimating parameters of probabilistic models with latent variables.
  • PageRank: a link-analysis algorithm for ranking web pages, named after Google co-founder Larry Page. (李凯霞 2120151003)
  • AdaBoost: an iterative algorithm for combining weak classifiers to form a strong one. (崔绿叶)
  • K-Nearest Neighbor: finds the k nearest neighbors of an instance in feature space. (张燕妮 2120151065, 韩梦乔 2120150989)
  • Naive Bayes: assumes features are conditionally independent given the class; naive but powerful.
  • CART: Classification and Regression Trees.
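
As a starting point for a presentation, the K-Nearest Neighbor entry above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical and not taken from the course material.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers. Euclidean distance is an assumption
    of this sketch; other distance measures are possible.
    """
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups in 2-D.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (5.0, 5.1), k=3)
```

Sorting the whole training set keeps the sketch short; for larger data sets a presentation might discuss index structures such as k-d trees instead.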

More algorithms covered in Data Mining: Concepts and Techniques (Jiawei Han, Micheline Kamber, Jian Pei)

Association Rule Mining

  • Apriori
  • FP-growth
  • FPClose
  • ECLAT
  • CLOSET
  • CHARM: An efficient algorithm for closed itemset mining
  • CARPENTER
  • MaxMiner
  • MAFIA
  • TDClose
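
To make the level-wise idea behind Apriori, the first entry in the list above, more concrete, here is a small Python sketch. It assumes transactions are given as sets of items and that support is an absolute count; the function name `apriori` and the shopping-basket data are hypothetical, and the code favors readability over efficiency.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets.

    `min_support` is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-item
        # subsets were frequent at the previous level.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent

# Toy data (hypothetical shopping baskets).
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.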

Classification

  • Decision Tree
    • ID3, ID4/ID5
    • C4.5
    • CART
    • BOAT, Bootstrapped Optimistic Algorithm for Tree construction
    • RainForest
  • Bayes Classification
    • Naive Bayes
  • Rule-based Classification
    • RIPPER
    • FOIL
  • Ensemble Methods
    • Bagging
    • AdaBoost (崔绿叶)
    • Random Forest
  • Bayesian Belief Networks
  • Neural network backpropagation algorithm
  • Support Vector Machines (吴松泽 2120151048)
  • Associative Classification
    • CBA: Classification Based on Associations
    • CMAR: Classification based on Multiple Association Rules
    • CPAR: Classification based on Predictive Association Rules
    • DDPMine
  • Lazy Learner
    • kNN: k-Nearest Neighbor (张燕妮 2120151065, 韩梦乔 2120150989)
    • CBR: Case-Based Reasoning
  • Genetic algorithm (曹文强 2120150977, 李艳东 2120151006)
  • Rough Set
  • Fuzzy Set
  • Multiclass Classification
  • Semi-Supervised Classification
    • self-training
    • co-training
  • Active learning
  • Transfer learning
    • TrAdaBoost

Clustering

  • farthest-neighbor clustering algorithm
  • Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling (Hierarchical Clustering 黄瓒 2120150996)
  • BIRCH
  • DBSCAN
  • OPTICS algorithm
  • DENCLUE algorithm
  • k-means++ algorithm
  • CLIQUE algorithm
  • SCAN (Structural Clustering Algorithm for Networks)

To be classified

  • Frag-Shells algorithm for shell fragment computing
  • MAFIA algorithm
  • CHAID algorithm
  • SPRINT algorithm
  • SMOTE algorithm
  • NDPMine algorithm
  • k-medoids algorithm
  • Partitioning Around Medoids (PAM) algorithm
  • CLARA (Clustering LARge Applications)
  • CLARANS (Clustering Large Applications based upon RANdomized Search)
  • Single-linkage algorithm
  • minimal spanning tree algorithm
  • δ-Cluster Algorithm
  • MaPle algorithm
  • Ng-Jordan-Weiss algorithm
  • COP-k-means algorithm
  • CVQE (Constrained Vector Quantization Error)
  • fuzzy c-means algorithm
  • PROCLUS algorithm
  • FindCBLOF algorithm
  • HilOut algorithm
  • Viterbi algorithm
  • Baum-Welch algorithm
  • RankClus
  • DualMiner
  • The CN2 induction algorithm
  • TFP: An efficient algorithm for mining top-k frequent closed itemsets