Title
Probabilistic base calling of Solexa sequencing data.
Type
article
Institution
UNIL/CHUV/Unisanté + partner institutions
Journal
BMC Bioinformatics
Author(s)
Rougemont, J.
Amzallag, A.
Iseli, C.
Farinelli, L.
Xenarios, I.
Naef, F.
ISSN
1471-2105
Publication status
Published
Publication date
2008
Volume
9
First page
431
Language
English
Abstract
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology.
RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.
CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
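As a rough illustration of what coding ambiguous bases with IUPAC symbols can look like, the following Python sketch maps a per-base probability vector to an IUPAC symbol and computes a simple information-content value for it. It is not the authors' algorithm or their R package: the function names `iupac_call` and `information_content`, the `mass` threshold of 0.9, and the "2 bits minus Shannon entropy" score are all assumptions made for this example. Only the IUPAC symbol table is standard.

```python
import math

# Standard IUPAC nucleotide codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_call(probs, mass=0.9):
    """Return the IUPAC symbol for the smallest set of most probable bases
    whose summed probability reaches `mass` (a hypothetical threshold)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = set(), 0.0
    for base, p in ranked:
        chosen.add(base)
        total += p
        if total >= mass:
            break
    return IUPAC[frozenset(chosen)]


def information_content(probs):
    """Illustrative score: 2 bits minus the Shannon entropy of the base
    distribution (2.0 for a certain call, 0.0 for a uniform one)."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy
```

Under these assumptions, `iupac_call({"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01})` gives `A`, while `iupac_call({"A": 0.5, "G": 0.45, "C": 0.03, "T": 0.02})` gives `R` (A or G), and a uniform distribution gives `N`. A per-position score of this kind is one plausible ingredient for trimming uncertain bases at the ends of reads, but the score actually used in the paper is not specified in this record.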
PID Serval
serval:BIB_54200D66E5BC
Open Access
Yes
Creation date
2012-10-18T07:10:56.546Z
Creation date in IRIS
2025-05-20T19:51:44Z
File(s)
Name
BIB_54200D66E5BC.P001.pdf
Manuscript version
preprint
Size
503.55 KB
Format
Adobe PDF
PID Serval
serval:BIB_54200D66E5BC.P001
URN
urn:nbn:ch:serval-BIB_54200D66E5BC8
Checksum
(MD5):52b573a5653f6e44e1100b8075f91409