Splice site prediction using stochastic regular grammars
A.Y. Kashiwabara1, D.C.G. Vieira2, A. Machado-Lima1 and A.M. Durham1
1Departamento de Ciência da Computação, Instituto de Matemática e Estatística,
Universidade de São Paulo, São Paulo, SP, Brasil
2Bolsa de Mercadorias e Futuros (BM&F), São Paulo, SP, Brasil
Corresponding author: A.M. Durham
E-mail: alan@ime.usp.br
Genet. Mol. Res. 6 (1): 105-115 (2007)
Received August 3, 2006
Accepted November 8, 2006
Published March 20, 2007

ABSTRACT. This paper presents a novel approach to the problem of splice site prediction, based on stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars, and used 10-fold cross-validation to select the best grammar for each algorithm. The selected grammars were embedded into a classifier and used for splice site prediction, and the results were compared with those of NNSPLICE, the predictor used by the Genie gene finder. We indicate possible ways to improve the performance of the classifier by using Sakakibara’s windowing technique to find probability thresholds that lower the number of false-positive predictions.

Key words: Splice sites, Gene prediction, Stochastic grammars, Machine learning

Copyright © 2007 by FUNPEC