Purifying training data to improve performance of multi-label classification algorithms

Sawsan Kanj; Fahed Abdallah; Thierry Denoux

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-label classification assumes that each object in the training set is associated with a set of labels, and the goal is to assign labels to unseen instances. k-nearest neighbors based algorithms answer the multi-label problem by using inherent information given by the neighbors of the observation to classify. Due to several problems, like errors in the input vectors, or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for editing out some training instances by voting of some metrics in order to purify the existing training sample. This purifying approach is adapted on the recently proposed evidential k-nearest neighbors for multi-label classification. Comparative experimental results on various data sets demonstrate the usefulness and effectiveness of our approach.
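
The editing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify which metrics are voted on or how the evidential k-nearest neighbors rule is applied, so the sketch assumes a plain Euclidean k-nearest neighbors vote per label and a single metric (Hamming loss) with a fixed threshold. The names `purify`, `neighbour_vote`, `k`, and `threshold` are hypothetical.

```python
import math

def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]

def neighbour_vote(Y, neighbours):
    """Per-label strict majority vote over the neighbours' binary label vectors."""
    n_labels = len(Y[0])
    return [
        1 if 2 * sum(Y[j][l] for j in neighbours) > len(neighbours) else 0
        for l in range(n_labels)
    ]

def hamming_loss(y_true, y_pred):
    """Fraction of labels on which the two binary vectors disagree."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)

def purify(X, Y, k=3, threshold=0.5):
    """Drop training instances whose label set disagrees with their neighbours.

    Assumption: one metric (Hamming loss) and a fixed threshold stand in for
    the metric voting mentioned in the abstract.
    """
    keep = []
    for i in range(len(X)):
        predicted = neighbour_vote(Y, knn_indices(X, i, k))
        if hamming_loss(Y[i], predicted) <= threshold:
            keep.append(i)
    return [X[i] for i in keep], [Y[i] for i in keep], keep

# Toy data: two clusters; the instance at index 3 sits in the first
# cluster but carries the second cluster's label set.
X = [[0, 0], [0, 1], [1, 0], [0.5, 0.5], [10, 10], [10, 11], [11, 10]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]]
X_clean, Y_clean, kept = purify(X, Y, k=3, threshold=0.5)
print(kept)
```

On this toy data the instance at index 3 is the only one whose labels contradict all of its neighbours, so it is the one edited out. A classifier would then be trained on `X_clean` and `Y_clean` instead of the full sample.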

Original language	English
Title of host publication	15th International Conference on Information Fusion, FUSION 2012
Pages	1784-1791
Number of pages	8
Publication status	Published - 2012
Externally published	Yes
Event	15th International Conference on Information Fusion, FUSION 2012 - Singapore, Singapore
Duration	7 Sept 2012 → 12 Sept 2012

Cite this

@inproceedings{9032a282bc8b4cc293f8538302ee5257,
  title = "Purifying training data to improve performance of multi-label classification algorithms",
  abstract = "Multi-label classification assumes that each object in the training set is associated with a set of labels, and the goal is to assign labels to unseen instances. k-nearest neighbors based algorithms answer the multi-label problem by using inherent information given by the neighbors of the observation to classify. Due to several problems, like errors in the input vectors, or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for editing out some training instances by voting of some metrics in order to purify the existing training sample. This purifying approach is adapted on the recently proposed evidential k-nearest neighbors for multi-label classification. Comparative experimental results on various data sets demonstrate the usefulness and effectiveness of our approach.",
  author = "Sawsan Kanj and Fahed Abdallah and Thierry Denoux",
  year = "2012",
  language = "English",
  isbn = "9780982443859",
  series = "15th International Conference on Information Fusion, FUSION 2012",
  pages = "1784--1791",
  booktitle = "15th International Conference on Information Fusion, FUSION 2012",
  note = "15th International Conference on Information Fusion, FUSION 2012 ; Conference date: 07-09-2012 Through 12-09-2012",
}

Kanj, S, Abdallah, F & Denoux, T 2012, Purifying training data to improve performance of multi-label classification algorithms. in 15th International Conference on Information Fusion, FUSION 2012., 6290519, 15th International Conference on Information Fusion, FUSION 2012, pp. 1784-1791, 15th International Conference on Information Fusion, FUSION 2012, Singapore, Singapore, 7/09/12.

Purifying training data to improve performance of multi-label classification algorithms. / Kanj, Sawsan; Abdallah, Fahed; Denoux, Thierry.
15th International Conference on Information Fusion, FUSION 2012. 2012. p. 1784-1791 6290519 (15th International Conference on Information Fusion, FUSION 2012).

TY  - GEN
T1  - Purifying training data to improve performance of multi-label classification algorithms
AU  - Kanj, Sawsan
AU  - Abdallah, Fahed
AU  - Denoux, Thierry
PY  - 2012
Y1  - 2012
N2  - Multi-label classification assumes that each object in the training set is associated with a set of labels, and the goal is to assign labels to unseen instances. k-nearest neighbors based algorithms answer the multi-label problem by using inherent information given by the neighbors of the observation to classify. Due to several problems, like errors in the input vectors, or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for editing out some training instances by voting of some metrics in order to purify the existing training sample. This purifying approach is adapted on the recently proposed evidential k-nearest neighbors for multi-label classification. Comparative experimental results on various data sets demonstrate the usefulness and effectiveness of our approach.
AB  - Multi-label classification assumes that each object in the training set is associated with a set of labels, and the goal is to assign labels to unseen instances. k-nearest neighbors based algorithms answer the multi-label problem by using inherent information given by the neighbors of the observation to classify. Due to several problems, like errors in the input vectors, or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for editing out some training instances by voting of some metrics in order to purify the existing training sample. This purifying approach is adapted on the recently proposed evidential k-nearest neighbors for multi-label classification. Comparative experimental results on various data sets demonstrate the usefulness and effectiveness of our approach.
UR  - http://www.scopus.com/inward/record.url?scp=84867645129&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84867645129
SN  - 9780982443859
T3  - 15th International Conference on Information Fusion, FUSION 2012
SP  - 1784
EP  - 1791
BT  - 15th International Conference on Information Fusion, FUSION 2012
T2  - 15th International Conference on Information Fusion, FUSION 2012
Y2  - 7 September 2012 through 12 September 2012
ER  -
