- Personalized diagnosis by cached solutions with hypertension as a study model
- P.C. Carvalho1,4*, S.S. Freitas2*, A.B. Lima3, M. Barros3, I. Bittencourt3, W. Degrave4,
- I. Cordovil3, R. Fonseca5, M.G.C. Carvalho6, R.S. Moura Neto7 and P.H. Cabello2
- *Both authors contributed equally to this study.
- 1Programa de Engenharia de Sistemas e Computação, COPPE,
- Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brasil
- 2Departamento de Genética Humana, Instituto Oswaldo Cruz, Rio de Janeiro, RJ, Brasil
- 3Instituto Nacional de Cardiologia, Laranjeiras, RJ, Brasil
- 4Laboratório de Genômica Funcional e Bioinformática, Fiocruz, Rio de Janeiro, RJ, Brasil
- 5Departamento de Ciência da Computação, Universidade Federal de Juiz de Fora,
- Juiz de Fora, MG, Brasil
- 6Laboratório do Controle da Expressão Gênica, Instituto de Biofísica Carlos Chagas Filho,
- UFRJ, Rio de Janeiro, RJ, Brasil
- 7Departamento de Genética Humana, Universidade Federal do Rio de Janeiro,
- Rio de Janeiro, RJ, Brasil
- Corresponding author: P.C. Carvalho
- E-mail: carvalhopc@cos.ufrj.br
- Genet. Mol. Res. 5 (4): 856-867 (2006)
- Received May 22, 2006
- Accepted September 18, 2006
- Published December 18, 2006
ABSTRACT. Statistical modeling of the links between genetic profiles and environmental and clinical data to aid medical diagnosis is a challenge. Here, we present a computational approach for rapidly selecting important clinical data to assist in medical decisions based on personalized genetic profiles. What could take hours or days of computing becomes available on the fly, making this strategy feasible to implement as a routine without demanding great computing power. The key to rapidly obtaining an optimal or nearly optimal mathematical function that evaluates the “disease stage” by combining genetic profile information with personal clinical data is to query a precomputed solution database. The database is generated beforehand by a new hybrid feature selection method that makes use of support vector machines, recursive feature elimination and random sub-space search. To evaluate the method, data on polymorphisms in the renin-angiotensin-aldosterone system genes, together with clinical data, were obtained from patients with hypertension and from control subjects. The disease “risk” was determined by classifying the patients’ data with a support vector machine model based on the optimized feature set and then measuring the Euclidean distance to the decision hyperplane. Our results showed an association of renin-angiotensin-aldosterone system gene haplotypes with hypertension. The association of polymorphism patterns with different ethnic groups was also tracked by the feature selection process. A demonstration of this method is available online on the project’s web site.
Key words: Genetic polymorphisms, Essential hypertension, Environmental risks, Support vector machines, Feature selection
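The two ideas summarized in the abstract, querying a precomputed solution database and scoring a patient by the distance to a linear decision hyperplane, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration only: the feature names (`age`, `bmi`, `ace_dd`), the weights and biases, the layout of `CACHED_SOLUTIONS`, and the rule of picking the cached model that uses the most available features are hypothetical and are not taken from the study. The sketch also assumes a linear model, for which the signed distance of a point x to the hyperplane w·x + b = 0 is (w·x + b)/||w||.

```python
import math

# Hypothetical precomputed solution database (illustrative values only).
# Key: the feature subset a model was trained on.
# Value: (weights, bias) of a linear decision function assumed to have been
# fitted offline by the feature selection procedure.
CACHED_SOLUTIONS = {
    frozenset({"age", "bmi"}): ({"age": 0.04, "bmi": 0.10}, -4.5),
    frozenset({"age", "bmi", "ace_dd"}): (
        {"age": 0.03, "bmi": 0.08, "ace_dd": 0.90},
        -4.0,
    ),
}


def lookup_model(available):
    """Return the cached (weights, bias) that uses the most available features.

    This selection rule is an assumption of the sketch, not the paper's rule.
    """
    candidates = [key for key in CACHED_SOLUTIONS if key <= set(available)]
    if not candidates:
        raise KeyError("no cached solution matches the available features")
    best = max(candidates, key=len)
    return CACHED_SOLUTIONS[best]


def signed_distance(weights, bias, patient):
    """Signed Euclidean distance from a patient to the hyperplane w.x + b = 0."""
    score = sum(w * patient[name] for name, w in weights.items()) + bias
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return score / norm


if __name__ == "__main__":
    # A patient for whom only age and BMI were measured (made-up values).
    patient = {"age": 55.0, "bmi": 31.0}
    weights, bias = lookup_model(patient.keys())
    print(signed_distance(weights, bias, patient))
```

In this sketch the sign of the distance gives the predicted class and its magnitude serves as a rough "risk" score. The lookup is a dictionary access rather than a new training run, which is the point of caching the solutions.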