Academic status
Thesis in progress...
Subject: Multi-class classification in Big Data
Thesis supervision:
Blue ellipse: PhD student; yellow ellipse: PhD holder; green rectangle: permanent staff member; yellow rectangle: HDR holder (Habilitation à diriger des recherches). Green line: thesis supervisor; blue line: thesis director; dotted line: mid-term evaluation committee or thesis jury.
Scientific output
A Distributed Graph Based Approach for Rough Classifications Considering Dominance Relations Between Overlapping Classes
International audience
Data from many real-world applications involve overlapping classes: an item can belong to multiple classes with different membership degrees. In this paper, we explore a different concept that characterizes social networks, documents, and most biological and chemical datasets: data can have multiple classes, but dominant classes are noticed more easily than dominated ones. For example, a document may discuss both economics and politics while focusing mainly on politics. A molecule may have multiple odors, but experts notice some odors more readily than others. We are interested in this type of data, where a dominance relation exists between classes. Experts can easily make mistakes because dominated classes are hard to notice. Data incoherence is a serious problem, but not the only one: there are also too many irrelevant and redundant attributes, which increases the computational time needed to generate classifiers. Our first challenge is to find a model adapted to overlapping classes with dominance relations. The second challenge is to find the most relevant attributes. The third challenge is to ensure that the approach produces results in an acceptable time. We address these challenges using rough set theory, which is suited to incoherent data and supports multiple classes and attribute selection. The proposed approach works in a parallel and decentralized way to reduce computational time. We tested it on real chemical data, and the results are very promising.
10th International Conference on Intelligent Systems: Theories and Applications (SITA 2015), Oct 2015, Rabat, Morocco. IEEE, pp. 1-6. DOI: 10.1109/SITA.2015.7358388. HAL: http://hal.upmc.fr/hal-01303006. Conference site: http://sitaconference.org/sita15/. 2015-10-20
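The abstract relies on rough set theory to cope with incoherent labels and to select attributes. As an illustration of the underlying notions only, here is a minimal sketch of classic (single-machine, dominance-free) rough sets: indiscernibility classes, lower/upper approximations of a class, and the dependency degree used to judge attribute relevance. The toy table, attribute names, and function names are hypothetical; this is not the paper's distributed graph-based algorithm and it does not model dominance relations.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]

def indiscernibility_classes(rows: List[Row], attributes: List[str]) -> List[Set[int]]:
    """Group object indices that share identical values on the chosen attributes."""
    groups: Dict[tuple, Set[int]] = defaultdict(set)
    for idx, row in enumerate(rows):
        groups[tuple(row[a] for a in attributes)].add(idx)
    return list(groups.values())

def approximations(rows: List[Row], attributes: List[str], target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Lower and upper rough-set approximations of the object set `target`."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in indiscernibility_classes(rows, attributes):
        if block <= target:  # every indiscernible object is in the class: certain members
            lower |= block
        if block & target:   # at least one is in the class: possible members
            upper |= block
    return lower, upper

def dependency(rows: List[Row], attributes: List[str], classes: List[Set[int]]) -> float:
    """Fraction of objects certainly classified (positive region size / number of objects)."""
    positive: Set[int] = set()
    for target in classes:
        positive |= approximations(rows, attributes, target)[0]
    return len(positive) / len(rows)

# Hypothetical toy table: molecules described by two structural attributes.
rows = [
    {"ring": "yes", "ester": "yes"},  # 0
    {"ring": "yes", "ester": "yes"},  # 1 (same description as 0)
    {"ring": "no", "ester": "yes"},   # 2
    {"ring": "no", "ester": "no"},    # 3
]
fruity = {0, 2}  # objects 0 and 1 are indiscernible but labelled differently (incoherence)
not_fruity = {1, 3}

lower, upper = approximations(rows, ["ring", "ester"], fruity)
print(lower, upper)
print(dependency(rows, ["ring", "ester"], [fruity, not_fruity]))
print(dependency(rows, ["ester"], [fruity, not_fruity]))
```

Objects 0 and 1 fall in the boundary region (upper minus lower), which is how rough sets absorb inconsistent expert labels. Dropping `ring` lowers the dependency degree from 0.5 to 0.25, so under this classic criterion `ring` would be kept in an attribute subset.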
Graded multi-label classification: compromise between handling label relations and limiting error propagation
International audience
In graded multi-label classification (GMLC), each data instance can be assigned to multiple labels, each with a degree of membership on an ordinal scale, while respecting the relations between labels. For example, on a movie catalog web page, a five-star action movie should be at least a one-star suspense movie. Ignoring such relations can lead to inconsistent predictions, but if they are taken into account, a prediction error on one label propagates to all related labels. Most existing approaches either ignore label relations or can learn only relations that fit a predefined, imposed structure. This paper is motivated by the lack of a study analysing the compromise between handling label relations and limiting error propagation in GMLC, and by the fact that no known approach gives control over that compromise so as to allow such a study. We propose a new meta-classifier for GMLC with two main advantages: first, no predefined structure is imposed for learning label relations; second, the meta-classifier is based on three measures that give control over the studied compromise. The compromise is analysed according to its impact on classifier complexity and on the Hamming loss evaluation measure. A comparison with three existing approaches shows that the proposed meta-classifier is competitive in terms of Hamming loss, and that it is the most stable classifier in terms of Hamming loss standard deviation.
SITA 2016 - 11th International Conference on Intelligent Systems: Theories and Applications, Oct 2016, Mohammadia, Morocco. IEEE, pp. 1-6, 2016. DOI: 10.1109/SITA.2016.7772258. HAL: http://hal.upmc.fr/hal-01413694. 2016-10-19
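To make the abstract's two ingredients concrete, the following minimal sketch encodes graded predictions as integer grades (0 to 5 stars), checks a label relation of the form "grade of `src` at least k implies grade of `dst` at least m" (the movie-catalog example), and computes one simple variant of Hamming loss: the fraction of (instance, label) pairs whose predicted grade differs from the true grade. The relation encoding, the function names, and this particular Hamming loss variant are assumptions for illustration; the paper's meta-classifier and its three measures are not reproduced here.

```python
from typing import Dict, List, Tuple

Grades = Dict[str, int]               # label -> grade on an ordinal scale, e.g. 0..5 stars
Relation = Tuple[str, int, str, int]  # (src, src_min, dst, dst_min): src >= src_min implies dst >= dst_min

def violations(pred: Grades, relations: List[Relation]) -> List[Relation]:
    """Relations that a single graded prediction breaks."""
    return [(s, smin, d, dmin) for (s, smin, d, dmin) in relations
            if pred[s] >= smin and pred[d] < dmin]

def hamming_loss(truth: List[Grades], preds: List[Grades]) -> float:
    """Fraction of (instance, label) pairs whose predicted grade differs from the true grade."""
    labels = sorted(truth[0])
    wrong = sum(t[lab] != p[lab] for t, p in zip(truth, preds) for lab in labels)
    return wrong / (len(truth) * len(labels))

# The movie-catalog relation from the abstract: 5-star action => at least 1-star suspense.
relations = [("action", 5, "suspense", 1)]
truth = [{"action": 5, "suspense": 2}, {"action": 1, "suspense": 0}]
preds = [{"action": 5, "suspense": 0}, {"action": 1, "suspense": 0}]

print(violations(preds[0], relations))
print(hamming_loss(truth, preds))
```

The first prediction is inconsistent (it breaks the relation) and also contributes the only wrong grade, giving a Hamming loss of 1 / 4 = 0.25.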
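Graded multi-label classification: learning label relations or limiting error propagation? (original French title: Classification multi-labels graduée: Apprendre les relations entre les labels ou limiter la propagation d'erreur ?)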
International audience
Graded multi-label classification is the task of assigning to each data instance the set of labels that apply to it, according to a graded scale of membership degrees. Labels can therefore have both order relations and co-occurrence relations. On the one hand, ignoring the relations between labels risks producing inconsistent predictions; on the other hand, taking these relations into account risks propagating the prediction error of one label to all the labels related to it. State-of-the-art approaches either ignore label relations or learn only the relations that match a fixed dependency structure. The approach we propose learns label relations without fixing a dependency structure in advance. It is based on a set of single-label classifiers, one per label. The idea is to first learn all relations between labels, including cyclic ones. The cyclic dependencies are then resolved by removing the relations of minimal interest. Measures are proposed to evaluate the interest of learning each relation. These measures make it possible to act on the compromise between learning relations for consistent prediction and minimizing the risk of prediction error propagation.
Proceedings of EGC 2017 (Extraction et Gestion de Connaissances), Jan 2017, Grenoble, France. HAL: http://hal.upmc.fr/hal-01475683. 2017-01-23
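The cycle-resolution step described above (learn all label relations, then remove the relations of minimal interest until no cycle remains) can be sketched on a weighted directed graph. This is a minimal sketch under several assumptions: an edge (u, v) is taken to mean that the classifier of label v uses the predicted grade of label u, the interest scores are hypothetical placeholders for the paper's measures, and the removal rule is a simple greedy heuristic (find any cycle, drop its lowest-interest edge, repeat). Greedy removal is not guaranteed to be optimal; finding a minimum-weight set of edges whose removal breaks all cycles (minimum feedback arc set) is NP-hard in general. Requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (u, v): the classifier of label v uses the predicted grade of label u

def find_cycle(nodes: List[str], edges: Dict[Edge, float]) -> Optional[List[Edge]]:
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on the current DFS path, 2 = finished
    path: List[str] = []

    def dfs(u: str) -> Optional[List[Edge]]:
        state[u] = 1
        path.append(u)
        for v in succ[u]:
            if state[v] == 1:  # back edge: the path from v to u plus u -> v closes a cycle
                cyc = path[path.index(v):] + [v]
                return list(zip(cyc, cyc[1:]))
            if state[v] == 0:
                found = dfs(v)
                if found is not None:
                    return found
        state[u] = 2
        path.pop()
        return None

    for n in nodes:
        if state[n] == 0:
            found = dfs(n)
            if found is not None:
                return found
    return None

def break_cycles(nodes: List[str], edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """Greedily delete the lowest-interest edge of each cycle until the graph is acyclic."""
    kept = dict(edges)
    while (cycle := find_cycle(nodes, kept)) is not None:
        del kept[min(cycle, key=lambda e: kept[e])]
    return kept

def prediction_order(nodes: List[str], edges: Dict[Edge, float]) -> List[str]:
    """Order in which the single-label classifiers can run once the graph is acyclic."""
    preds = {n: {u for (u, v) in edges if v == n} for n in nodes}
    return list(TopologicalSorter(preds).static_order())

labels = ["action", "suspense", "drama"]
interest = {  # hypothetical interest scores, one per learned relation
    ("action", "suspense"): 0.9,
    ("suspense", "drama"): 0.4,
    ("drama", "action"): 0.7,
    ("drama", "suspense"): 0.2,
}
kept = break_cycles(labels, interest)
print(kept)
print(prediction_order(labels, kept))
```

Here a single removal of `("suspense", "drama")` (interest 0.4) breaks both cycles, after which the single-label classifiers can be chained in the order drama, action, suspense. Raising or lowering the interest threshold of kept relations is one way such a scheme could trade consistency against error propagation.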
Thesis: Graded multi-label classification: discovering label relations, and adaptation to odor recognition and to the big data context of recommender systems.