Students are expected to master the following skills:
- implement and test a solution in the form of a software prototype and/or a numerical model,
- demonstrate a good understanding of the basic concepts and the methodology of programming,
- make a relevant choice between several data representations and algorithms to process them,
- analyse a problem to provide an IT solution and implement it in a high level programming language,
- understand and know how to apply in various situations the basic concepts of probability and statistical inference,
- use a scientific approach to extract reliable information from a data sample,
as covered in the courses LFSAB1401, LFSAB1402 and LFSAB1105.
The following skills are also useful. They are briefly reviewed at the beginning of the LGBIO2010 course:
- explain the functions that take place in the cells of a living organism,
- describe the basic concepts of molecular genetics,
- define the different classes of biomolecules and their links within the cell processes and structures,
as covered in the courses LGBIO1111 and LBIR1220A.
Bioinformatics refers to a set of concepts and tools that are required for the analysis of biological data and the interpretation of the results. After a review of molecular biology basics and recent technologies for genome analysis, the course focuses on molecular biology databases (DNA and protein sequences), sequence comparison algorithms, identification of protein structural features (motifs), hidden Markov models, selection of transcriptional markers, inference of transcriptional regulatory networks, and prediction of evolutionary relationships.
With respect to the AA reference framework defined for the Master in biomedical engineering, the course contributes to the development, mastery and assessment of the following skills:
- AA1.1, AA1.2, AA1.3
- AA2.2, AA2.4
- AA4.3
- AA5.3
At the end of this course, students will be able:
- to master the basic concepts of molecular biology for appropriate use of bioinformatics tools,
- to design and develop tools or methods for database management, information extraction and data mining,
- to formulate informed decisions between the many computational methods that are available for solving biological questions,
- to carry out a collaborative project aimed at solving a bioinformatics problem and drawing on the students' complementary backgrounds and expertise,
- to use the information available in major sequence databases (GenBank, UniProt) critically and with discernment,
- to master a software environment (EMBOSS, R, Bioconductor).
The contribution of this Teaching Unit to the development and command of the skills and learning outcomes of the programme(s) can be accessed at the end of this sheet, in the section entitled “Programmes/courses offering this Teaching Unit”.
The first part of the written examination, in closed-book format, focuses on algorithmic and statistical aspects and accounts for 50% of the final mark. The second part, in open-book format, presents a sequence to be analysed using the computer programs discussed in class and accounts for another 30%. The mini-projects account for the remaining 20%. Students who fail the examination are not allowed to retake the mini-projects.
The theoretical part consists of ex cathedra lectures in a classroom (30h). The training sessions (30h) consist of a set of problems to be solved (mini-projects) and tutorials. The mini-projects are based on the algorithms discussed in the lectures. Teams of up to two students work on statistical and algorithmic aspects to solve biological problems, using a programming language of their choice (typically among R, Matlab, Python, or Perl). The tutorials introduce students to the methodology followed for protein function prediction, using the EMBOSS open software suite. The importance of the choice of the method and the analysis parameters is illustrated for common biological cases.
- Overview of basic concepts in biochemistry and molecular biology
- Major sequence and structure repositories and associated search tools
- Sequence comparison
- Sequence statistics
- Pairwise sequence alignment
- Database search for homology
- Hidden Markov models
- Multiple sequence alignment and profiles
- Transcriptome profiling
- Gene expression analysis
- Gene regulatory networks
- Molecular phylogeny
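To give a flavour of the algorithmic content behind the "Pairwise sequence alignment" topic, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). It is only an illustration and is not taken from the course material: the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b.

    Needleman-Wunsch recurrence with a linear gap penalty.
    The default scores are illustrative assumptions, not course values.
    """
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the dynamic programming table are kept in memory, which is enough to obtain the score; recovering the alignment itself would require storing the full table or traceback pointers.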
Syllabus, slides and a set of problems will be available via Moodle.
The following books are suggested as complementary resources:
- Bioinformatics: Sequence and Genome Analysis, D.W. Mount, CSHL Press, 2nd ed., 2004.
- Introduction to Computational Genomics: a case-study approach, N. Cristianini, M.W. Hahn, Cambridge University Press, 2007.
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, R. Durbin et al., Cambridge University Press, 1998.
- Inferring Phylogenies, J. Felsenstein, Sinauer Associates, 2nd ed., 2003.
Tutorials on protein function prediction will be held in the computer room Cérès or Ulysse (Faculty of Bioscience Engineering).