Knowledge-based Music Recommendation: Models, Algorithms and Exploratory Search
Topic proposed by
Thesis supervisor:
Research unit: UMR 7102, EURECOM research laboratory
Field: Information and communication sciences and technologies
Describing music in all its complexity
Musical works are complex objects. Describing them comprehensively requires describing both their physical manifestations (recordings, scores) and all the events that define them (creation, publication, performance). The first aspect is relatively well mastered, in library catalogues as well as in the music industry. Several models can be used to describe a musical work, some specific to the music domain (MusicOntology), others designed more broadly for libraries in general (FRBR, UNIMARC). The second aspect is rather new, although there is a growing need for, and interest in, event-based models. Several ontologies aim specifically to define events (Event, LODE), but there are fewer examples of their use to describe the creation and publication processes.
One of the difficulties with musical works is that although their expressions may differ significantly, they are still regarded as a single work. Modeling this requires expressing the singleness of the work as well as the specificities of its expressions, and showing how events are connected to these expressions. Another issue is that an arrangement may be considered either as an expression or as a new work, depending on the data producer. Therefore, a first research challenge of this thesis will be to propose an expressive ontological model that can represent music metadata in all its complexity, for example starting from the CRM and FRBRoo ontologies. Furthermore, a mapping to a simpler model suited to web publication, such as Schema.org, will be investigated.
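The work/expression/event relationship described above can be illustrated with a minimal sketch. It is not the ontological model to be proposed, and it does not reproduce CRM or FRBRoo; the class names `Work`, `Expression` and `Event`, their fields, and the sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Something that happened to an expression (hypothetical class)."""
    kind: str        # e.g. "creation", "publication", "performance"
    date: str = ""   # free-form date, left empty when unknown


@dataclass
class Expression:
    """One realisation of a work, e.g. an original score or an arrangement."""
    label: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Work:
    """The single abstract work that groups all of its expressions."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def events_of_kind(self, kind: str) -> List[Event]:
        """Collect events of one kind across every expression of the work."""
        return [e for x in self.expressions for e in x.events if e.kind == kind]


# Illustrative data only: one work with an original expression and an arrangement.
work = Work(
    title="Example Sonata",
    expressions=[
        Expression("original score", [Event("creation"), Event("publication")]),
        Expression("arrangement for guitar", [Event("creation"), Event("performance")]),
    ],
)
```

In this sketch the arrangement is attached to the same work as a second expression. A data producer who treats arrangements as new works would instead create a separate `Work`, which is exactly the ambiguity the model has to accommodate.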
Interconnecting music catalogs
To represent music metadata as semantic graphs, a number of controlled vocabularies also need to be normalized, for example those identifying musical genres (sonata) or instruments (piano). We aim to convert music metadata coming from libraries into RDF following the ontological model proposed above. One research challenge is to automatically interpret free-form text in order to re-create structured information. A follow-up challenge is to interlink the music catalogs of different institutions, whose coverage will overlap.
Due to the high heterogeneity of the data, link discovery in the music domain is a challenging task. To train and test data linking tools, we will collect benchmark data from the BnF and the Philharmonie. We will pay particular attention to multilingual descriptions of works, as well as to significant lexical differences in work titles. Descriptive heterogeneities (level of detail, number of properties) also hinder the performance of general-purpose tools. We will evaluate several instance matching tools such as SILK, LIMES or Duke, and investigate the use of ensemble learning to improve both precision and recall, even when starting from non-optimized mapping specifications.
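The idea of combining several imperfect matchers can be sketched as follows. This is not how SILK, LIMES or Duke work, and it replaces learned ensembles with a plain majority vote over three hand-picked title similarity measures. The function names, the thresholds in `MATCHERS` and the sample titles are assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity of the normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two normalized titles have in common."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sorted_token_similarity(a: str, b: str) -> float:
    """Character-level similarity after sorting tokens, so word order is ignored."""
    sa = " ".join(sorted(normalize(a).split()))
    sb = " ".join(sorted(normalize(b).split()))
    return SequenceMatcher(None, sa, sb).ratio()


# Each matcher votes "same work" when its score reaches its threshold.
# The thresholds are arbitrary, non-optimized values (an assumption).
MATCHERS = [
    (char_similarity, 0.8),
    (token_jaccard, 0.6),
    (sorted_token_similarity, 0.8),
]


def ensemble_match(a: str, b: str) -> bool:
    """Return True when a strict majority of matchers vote for a match."""
    votes = sum(1 for matcher, threshold in MATCHERS if matcher(a, b) >= threshold)
    return votes * 2 > len(MATCHERS)
```

For example, `ensemble_match("Sonata for piano", "Piano sonata")` is accepted because the two order-insensitive matchers outvote the character-level one, whereas `ensemble_match("Symphony No. 5", "Piano Concerto No. 1")` is rejected by all three. A learned ensemble would replace the fixed vote with weights fitted on the benchmark data.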
Exploring and Recommending Music
How can enriched music catalogs be visualized and interacted with? How can we design and implement an effective exploratory search engine that exploits the rich semantic model for describing music? How can we design a tool that supports the selection of musical works, for example to recommend a program for a specialized radio channel? How can we select musical works that illustrate a historical period, a movement or a school?
Music recommendation has been studied extensively by content providers such as Spotify, Deezer, Last.fm and YouTube, among others. In this thesis, we will study various recommendation strategies that make full use of expert knowledge, word of mouth (collaborative filtering) and the rich metadata coming from cultural institutions, for different types of audience: music experts, or amateurs who are incrementally building their music culture and identity.
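One simple way to combine metadata with collaborative signals is a weighted hybrid score, sketched below. It is only one possible strategy among those the thesis could study. The functions `hybrid_score` and `recommend`, the `weight_metadata` parameter, and the toy works and listeners are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_score(seed: str, candidate: str,
                 metadata: Dict[str, Set[str]],
                 listeners: Dict[str, Set[str]],
                 weight_metadata: float = 0.5) -> float:
    """Blend metadata similarity with listener overlap (collaborative signal)."""
    content = jaccard(metadata[seed], metadata[candidate])
    collaborative = jaccard(listeners[seed], listeners[candidate])
    return weight_metadata * content + (1.0 - weight_metadata) * collaborative


def recommend(seed: str,
              metadata: Dict[str, Set[str]],
              listeners: Dict[str, Set[str]],
              weight_metadata: float = 0.5,
              top_n: int = 3) -> List[str]:
    """Rank every other work by its hybrid score against the seed work."""
    candidates = [w for w in metadata if w != seed]
    ranked = sorted(
        candidates,
        key=lambda w: hybrid_score(seed, w, metadata, listeners, weight_metadata),
        reverse=True,
    )
    return ranked[:top_n]


# Toy data, invented for illustration only.
metadata = {
    "work_a": {"genre:sonata", "instrument:piano", "period:classical"},
    "work_b": {"genre:sonata", "instrument:piano", "period:romantic"},
    "work_c": {"genre:opera", "period:romantic"},
}
listeners = {
    "work_a": {"u1", "u2", "u3"},
    "work_b": {"u3"},
    "work_c": {"u1", "u2"},
}
```

With `weight_metadata=1.0` the ranking for `work_a` relies on catalog metadata alone and puts `work_b` first. With `weight_metadata=0.0` it relies on listener overlap alone and puts `work_c` first. Tuning this weight per audience is one assumption about how expert and amateur profiles might be served differently.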