|
- Splice site prediction using stochastic regular grammars
- A.Y. Kashiwabara¹, D.C.G. Vieira², A. Machado-Lima¹ and A.M. Durham¹
- ¹Departamento de Ciência da Computação, Instituto de Matemática e Estatística,
- Universidade de São Paulo, São Paulo, SP, Brasil
- ²Bolsa de Mercadorias e Futuros (BM&F), São Paulo, SP, Brasil
- Corresponding author: A.M. Durham
- E-mail: alan@ime.usp.br
- Genet. Mol. Res. 6 (1): 105-115 (2007)
- Received August 3, 2006
- Accepted November 8, 2006
- Published March 20, 2007
ABSTRACT. This paper presents a novel approach to splice site prediction based on stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars and selected the best grammar for each algorithm by 10-fold cross-validation. The selected grammars were embedded into a classifier that was used to predict splice sites, and the results were compared with those of NNSPLICE, the predictor used by the Genie gene finder. We indicate possible ways to improve the classifier’s performance, using Sakakibara’s windowing technique to find probability thresholds that lower the number of false-positive predictions.
Key words: Splice sites, Gene prediction, Stochastic grammars, Machine learning
|