Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our contributions to the semi-supervised learning literature are in two directions.
The first direction is applying semi-supervised learning in unsupervised learning settings, where no label information is provided along with the data, but a weaker form of side information is available in the form of pairwise constraints, i.e., information on whether a given pair of data points belongs to the same cluster. Most existing methods assume that this side information is provided by some oracle, which is not a very realistic setting. We adopt an active learning framework as one possible way to obtain constraint information from a user in an effective, accurate, and semi-automatic manner.
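The intuition behind actively querying pairwise constraints can be sketched as follows: given soft cluster assignments for the data points, the pair whose same-cluster probability is closest to 0.5 is the most ambiguous, so asking the user about it yields the most information. This is a minimal illustrative sketch, not the specific query-selection criterion of the ICPR 2008 paper; the function names and the ambiguity score are assumptions made for illustration.

```python
import itertools

def same_cluster_prob(p_i, p_j):
    """Probability that two points fall in the same cluster,
    given their soft cluster-assignment vectors."""
    return sum(a * b for a, b in zip(p_i, p_j))

def most_informative_pair(soft_assignments):
    """Hypothetical active-query rule: return the pair of point indices
    whose same-cluster probability is closest to 0.5 (most ambiguous),
    i.e., the pairwise constraint worth asking the user about."""
    best, best_score = None, -1.0
    n = len(soft_assignments)
    for i, j in itertools.combinations(range(n), 2):
        p = same_cluster_prob(soft_assignments[i], soft_assignments[j])
        score = 1.0 - 2.0 * abs(p - 0.5)  # ambiguity in [0, 1]
        if score > best_score:
            best, best_score = (i, j), score
    return best
```

For example, with assignments `[[1, 0], [0.5, 0.5], [0, 1]]`, point 1 is torn between both clusters, so the rule queries a pair involving it rather than the clearly separated pair (0, 2).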
Secondly, unlike most previous studies on semi-supervised learning, which have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data, our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from existing approaches. We design a meta-semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: (a) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, (b) efficient computation by the iterative boosting algorithm, and (c) exploiting both the manifold and cluster assumptions in training classification models.
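The wrap-around idea above can be sketched in a few lines: each boosting round pseudo-labels the unlabeled points using a similarity-weighted vote from the labeled data and the current ensemble, retrains a base learner on the enlarged (weighted) training set, and adds it to the ensemble. This is a heavily simplified sketch under assumed choices (an RBF similarity, a decision stump on 1-D data, AdaBoost-style weights), not the exact SemiBoost objective or its derived sampling weights.

```python
import math

def rbf(a, b, gamma=1.0):
    """Assumed similarity function between 1-D points."""
    return math.exp(-gamma * (a - b) ** 2)

def train_stump(xs, ys, ws):
    """Base supervised learner: weighted decision stump on 1-D data."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if pol * (1 if x >= t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return lambda x: pol * (1 if x >= t else -1)

def semiboost(labeled, unlabeled, rounds=3, gamma=1.0):
    """Sketch of the SemiBoost-style meta-algorithm: boost any base
    learner using similarity-based pseudo-labels on unlabeled data.
    `labeled` is a list of (x, y) with y in {-1, +1}; `unlabeled` is
    a list of x values."""
    ensemble = []  # list of (alpha, base_classifier)

    def H(x):
        return sum(a * h(x) for a, h in ensemble)

    for _ in range(rounds):
        # Pseudo-label each unlabeled point: similarity-weighted vote
        # from labeled neighbors plus the current ensemble's belief.
        pseudo = []
        for xu in unlabeled:
            vote = sum(rbf(xu, xl, gamma) * yl for xl, yl in labeled) + H(xu)
            pseudo.append((xu, 1 if vote >= 0 else -1, abs(vote)))
        # Train the base learner on labeled data (weight 1) plus
        # pseudo-labeled data weighted by vote confidence.
        xs = [x for x, _ in labeled] + [x for x, _, _ in pseudo]
        ys = [y for _, y in labeled] + [z for _, z, _ in pseudo]
        ws = [1.0] * len(labeled) + [w for _, _, w in pseudo]
        h = train_stump(xs, ys, ws)
        err = sum(w for x, y, w in zip(xs, ys, ws) if h(x) != y) / sum(ws)
        err = min(max(err, 1e-6), 1 - 1e-6)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))

    return lambda x: 1 if H(x) >= 0 else -1
```

With only two labeled points and a handful of unlabeled points clustered near them, the pseudo-labels pull the decision boundary between the two clusters, which is the cluster-assumption behavior the framework exploits.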
P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu. "SemiBoost: Boosting for Semi-supervised Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 11, pp. 2000-2014, November 2009.
P. K. Mallapragada, R. Jin, and A. K. Jain. "Active Query Selection for Semi-supervised Learning," in Proceedings of the International Conference on Pattern Recognition (ICPR), Tampa, Florida, December 7-13, 2008.
P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu. "SemiBoost: Boosting for Semi-supervised Learning," Technical Report MSU-CSE-07-197, Department of Computer Science and Engineering, Michigan State University.
A. K. Jain, P. K. Mallapragada, and M. Law. "Bayesian Feedback in Data Clustering," in Proceedings of the 18th International Conference on Pattern Recognition, Vol. 3, pp. 374-378, Hong Kong, August 20-24, 2006.