Academic status
Thesis defended on 2014-12-15
Subject: Combinatorics of genetic mutations
Thesis supervision:
Scientific output
SVGMapping: an R package to map omic data sets onto pathways templates
1ères Rencontres R; conference, seminar, workshop communication; 2012-07-02
Genome-wide transcriptome analysis of hydrogen production in the cyanobacterium Synechocystis: Towards the identification of new players
We report the development of new tools and methods for facile integration and meaningful representation of high throughput data generated by genome-wide analyses of the model cyanobacterium Synechocystis PCC6803, for future genetic engineering aiming at increasing its level of hydrogen photoproduction. These robust tools comprise new oligonucleotide DNA microarrays to monitor the transcriptomic responses of all 3725 genes of Synechocystis, and the SVGMapping method and custom-made templates to represent the metabolic reprogramming for improved hydrogen production. We show, for the first time, that the AbrB2 repressor of the hydrogenase-encoding operon, also regulates metal transport and protection against oxidative stress, as well as numerous plasmid genes, which have been overlooked so far. This report will stimulate the construction and global analysis of hydrogen production mutants with the prospect of developing powerful cell factories for the sustainable production of hydrogen, as well as investigations of the probable role of plasmids in this process.
International Journal of Hydrogen Energy; ISSN: 0360-3199; article in peer-reviewed journal; 2013-02-12
The diversity of small non-coding RNAs in the diatom Phaeodactylum tricornutum.
BACKGROUND: Marine diatoms constitute a major component of eukaryotic phytoplankton and stand at the crossroads of several evolutionary lineages. These microalgae possess peculiar genomic features and novel combinations of genes acquired from bacterial, animal and plant ancestors. Furthermore, they display both DNA methylation and gene silencing activities. Yet, the biogenesis and regulatory function of small RNAs (sRNAs) remain ill defined in diatoms. RESULTS: Here we report the first comprehensive characterization of the sRNA landscape and its correlation with genomic and epigenomic information in Phaeodactylum tricornutum. The majority of sRNAs is 25 to 30 nt-long and maps to repetitive and silenced Transposable Elements marked by DNA methylation. A subset of this population also targets DNA methylated protein-coding genes, suggesting that gene body methylation might be sRNA-driven in diatoms. Remarkably, 25-30 nt sRNAs display a well-defined and unprecedented 180 nt-long periodic distribution at several highly methylated regions that awaits characterization. While canonical miRNAs are not detectable, other 21-25 nt sRNAs of unknown origin are highly expressed. Besides, non-coding RNAs with well-described function, namely tRNAs and U2 snRNA, constitute a major source of 21-25 nt sRNAs and likely play important roles under stressful environmental conditions. CONCLUSIONS: P. tricornutum has evolved diversified sRNA pathways, likely implicated in the regulation of largely still uncharacterized genetic and epigenetic processes. These results uncover an unexpected complexity of diatom sRNA population and previously unappreciated features, providing new insights into the diversification of sRNA-based processes in eukaryotes.
BMC Genomics; ISSN: 1471-2164; article in peer-reviewed journal; 2014
In the first part, I present my work on molecular evolution. I introduce the general biological background and the measures used to detect both conservation and coevolution at the amino-acid level. I then apply these measures to the detection of critical residues in the cancer-related protein P53, benchmarking different prediction methods to this end. I use the same methodology on a large-scale database of pathogenic mutations linked to genetic diseases. After that, I show how residue-level coevolution can help discover protein-protein interactions in the hepatitis C virus. Finally, I present the PruneTree algorithm, which filters the sequence sets used as input for molecular coevolution detection methods. In the second part, I study evolution at the genome level, in particular the recombination mechanisms that occur during meiosis. I examine the recombination rate along the genome and its primary cause, double-strand breaks, as well as the density of other proteins involved in recombination. I also present a method based on Fourier transforms to analyze these genomic signals, and a model for the distribution of double-strand breaks and recombination proteins along the genome. In the last part, I present the other tools I have developed: a novel algorithm that simulates the evolution of genomes in order to benchmark the phylogenetic reconstruction algorithm PhyChro, and the R-CLAG package, which allows easy use of the clustering algorithm CLAG.
In the first part, I present the work I carried out on molecular coevolution. I present the biological context and the various measures that detect conservation and coevolution at the amino-acid scale. I then show an application of these measures to the detection of critical residues in the cancer-related protein P53. To this end, I built a benchmark of the different prediction methods. I then use the same methodology on a database of mutations linked to genetic diseases. I also show how residue-level coevolution makes it possible to discover protein-protein interactions in the hepatitis C virus. Finally, I present the PruneTree algorithm, which filters the sets of sequences used as input by coevolution detection programs. In the second part, I turn to the study of evolution at the genome scale, in particular the mechanisms of meiotic recombination. To do so, I considered the recombination rate along the genome and its cause, DNA double-strand breaks. I then present a model of the distribution of these breaks and of the binding of the various proteins involved in recombination. I also present a method for detecting periodicity along the genome based on Fourier transforms. Finally, in the last part, I present a new algorithm for simulating genome evolution in order to evaluate reconstruction tools, and the R-CLAG package, which makes the CLAG clustering algorithm usable from R.
https://tel.archives-ouvertes.fr/tel-01118660; Bioinformatics. Université Pierre et Marie Curie - Paris VI, 2014. English. <NNT : 2014PA066636>; Theses; 2014-12-15
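The thesis abstract above mentions a Fourier-transform-based method for analyzing signals along the genome, without giving details. As a rough illustration of the general idea only, and not the method actually used in the thesis, the following sketch finds the dominant period of a one-dimensional signal with a naive discrete Fourier transform. The function name `dominant_period`, the synthetic `track` and its period of 90 positions are all assumptions made for this example.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in positions) of the strongest non-constant frequency.

    Hypothetical helper: uses a naive O(n^2) discrete Fourier transform, which
    is adequate for a short illustrative signal. Real genome-wide tracks would
    call for an FFT implementation instead.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("signal must contain at least two positions")

    # Subtract the mean so the constant (k = 0) component does not dominate.
    mean = sum(signal) / n
    centered = [x - mean for x in signal]

    best_k, best_power = 1, -1.0
    # For a real-valued signal only frequencies 1 .. n // 2 are distinct.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    # Frequency index k corresponds to k full cycles over n positions.
    return n / best_k


# Synthetic track (assumed data): a baseline of 10 with an oscillation
# repeating every 90 positions, sampled over 720 positions.
track = [10 + 3 * math.sin(2 * math.pi * t / 90) for t in range(720)]
period = dominant_period(track)
```

On real data the signal would be noisy and the period would rarely divide the track length exactly, so the peak would be spread over neighboring frequencies; windowing or averaging over several regions are common ways to handle that.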
