Editing training data for multi-label classification with the k-nearest neighbor rule

Sawsan Kanj; Fahed Abdallah; Thierry Denœux; Kifah Tout

doi:10.1007/s10044-015-0452-8

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)

Abstract

Multi-label classification allows instances to belong to several classes at once. It has received significant attention in machine learning and has found many real-world applications in recent years, such as text categorization, automatic video annotation and functional genomics, resulting in the development of many multi-label classification methods. Based on labeled examples in the training dataset, a multi-label method extracts inherent information to output a function that predicts the labels of unlabeled data. Because of problems such as errors in the input vectors or in their labels, this information may be wrong and may cause the multi-label algorithm to fail. In this paper, we propose a simple algorithm for overcoming these problems by editing the existing training dataset and using the edited set with different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach.
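
The abstract does not spell out the editing rule, so the following is only a minimal sketch of what editing a multi-label training set with a k-nearest neighbor rule could look like. It assumes a Wilson-style scheme: each training instance is predicted from its k nearest neighbors by per-label majority vote, and it is discarded when the Hamming loss between that prediction and its own labels exceeds a threshold. The function names, the Euclidean distance, the majority vote, and the `threshold` parameter are all illustrative assumptions and should not be read as the authors' algorithm.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_label_vote(index, X, Y, k):
    """Predict the label vector of X[index] by majority vote over its
    k nearest neighbors, excluding the instance itself (assumed rule)."""
    neighbors = sorted(
        (j for j in range(len(X)) if j != index),
        key=lambda j: dist(X[index], X[j]),
    )[:k]
    n_labels = len(Y[index])
    # A label is predicted when more than half of the neighbors carry it.
    return [
        1 if 2 * sum(Y[j][q] for j in neighbors) > len(neighbors) else 0
        for q in range(n_labels)
    ]


def hamming_loss(y_true, y_pred):
    """Fraction of labels on which the two binary label vectors disagree."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def edit_training_set(X, Y, k=3, threshold=0.5):
    """Keep only instances whose neighbor-based prediction is close enough
    to their own labels (hypothetical criterion: Hamming loss <= threshold)."""
    keep = [
        i for i in range(len(X))
        if hamming_loss(Y[i], knn_label_vote(i, X, Y, k)) <= threshold
    ]
    return [X[i] for i in keep], [Y[i] for i in keep]


# Toy data: two clean clusters plus one instance with suspicious labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]
Y = [[1, 0], [1, 0], [1, 0], [1, 0],
     [0, 1], [0, 1], [0, 1], [0, 1],
     [0, 1]]  # last instance sits in the first cluster but has the other labels

X_edited, Y_edited = edit_training_set(X, Y, k=3, threshold=0.5)
```

In this toy run the instance at (0.5, 0.5) disagrees with all of its neighbors and is dropped, while the eight consistent instances are kept. The edited pair `X_edited`, `Y_edited` could then be passed to any multi-label classifier in place of the original training set.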

Original language	English
Pages (from-to)	145-161
Number of pages	17
Journal	Pattern Analysis and Applications
Volume	19
Issue number	1
DOIs	https://doi.org/10.1007/s10044-015-0452-8
Publication status	Published - 1 Feb 2016
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2015, Springer-Verlag London.

Keywords

Classification
Edition
k-Nearest neighbor
Multi-label
Prototype selection

Access to Document

10.1007/s10044-015-0452-8

Cite this

@article{b6c80f9237ce4807bafbfc14ea3de599,

title = "Editing training data for multi-label classification with the k-nearest neighbor rule",

abstract = "Multi-label classification allows instances to belong to several classes at once. It has received significant attention in machine learning and has found many real-world applications in recent years, such as text categorization, automatic video annotation and functional genomics, resulting in the development of many multi-label classification methods. Based on labeled examples in the training dataset, a multi-labeled method extracts inherent information to output a function that predicts the labels of unlabeled data. Due to several problems, like errors in the input vectors or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for overcoming these problems by editing the existing training dataset, and adapting the edited set with different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach.",

keywords = "Classification, Edition, k-Nearest neighbor, Multi-label, Prototype selection",

author = "Sawsan Kanj and Fahed Abdallah and Thierry Den{\oe}ux and Kifah Tout",

note = "Publisher Copyright: {\textcopyright} 2015, Springer-Verlag London.",

year = "2016",

month = feb,

day = "1",

doi = "10.1007/s10044-015-0452-8",

language = "English",

volume = "19",

pages = "145--161",

journal = "Pattern Analysis and Applications",

issn = "1433-7541",

publisher = "Springer London",

number = "1",

}

TY - JOUR

T1 - Editing training data for multi-label classification with the k-nearest neighbor rule

AU - Kanj, Sawsan

AU - Abdallah, Fahed

AU - Denœux, Thierry

AU - Tout, Kifah

PY - 2016/2/1

Y1 - 2016/2/1

N2 - Multi-label classification allows instances to belong to several classes at once. It has received significant attention in machine learning and has found many real-world applications in recent years, such as text categorization, automatic video annotation and functional genomics, resulting in the development of many multi-label classification methods. Based on labeled examples in the training dataset, a multi-labeled method extracts inherent information to output a function that predicts the labels of unlabeled data. Due to several problems, like errors in the input vectors or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for overcoming these problems by editing the existing training dataset, and adapting the edited set with different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach.

AB - Multi-label classification allows instances to belong to several classes at once. It has received significant attention in machine learning and has found many real-world applications in recent years, such as text categorization, automatic video annotation and functional genomics, resulting in the development of many multi-label classification methods. Based on labeled examples in the training dataset, a multi-labeled method extracts inherent information to output a function that predicts the labels of unlabeled data. Due to several problems, like errors in the input vectors or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for overcoming these problems by editing the existing training dataset, and adapting the edited set with different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach.

KW - Classification

KW - Edition

KW - k-Nearest neighbor

KW - Multi-label

KW - Prototype selection

UR - http://www.scopus.com/inward/record.url?scp=84954376160&partnerID=8YFLogxK

U2 - 10.1007/s10044-015-0452-8

DO - 10.1007/s10044-015-0452-8

M3 - Article

AN - SCOPUS:84954376160

SN - 1433-7541

VL - 19

SP - 145

EP - 161

JO - Pattern Analysis and Applications

JF - Pattern Analysis and Applications

IS - 1

ER -
