Balkan Journal of Medical Genetics

Large bodies of experimental data from family, adoption and twin studies suggest a genetic component in the individual differences in susceptibility to complex disorders. It is clear that multifactorial disorders are, in part, heritable and that their etiology results from a complex interaction between environmental and genetic factors [1,2]. In contrast to single-gene (Mendelian) disorders, they have a more complex pathogenesis. According to contemporary models, genetic susceptibility to such disorders is determined by the combined effect of many genes and genetic variants at several different loci [3]. Emerging data from linkage and association studies support the hypothesis that certain environmental risk factors, such as a particular lifestyle, might trigger phenotype expression in individuals with a certain genetic background [1].

Interest in genome-based studies of complex diseases has been intense and has accelerated with the completion of the Human Genome Project and the development of advanced technologies. Comparison of the DNA sequences of people from the major population groups has established a comprehensive map of genetic variants in the human genome, which conveniently serve as genetic markers. Detailed information about genetic diseases, genes, sequences and a great variety of polymorphisms is available in on-line public databases and provides an irreplaceable tool for molecular genetic studies [4,5].

The quest for genetic factors in the susceptibility to complex disorders has focused on single nucleotide polymorphisms (SNPs), which are the most common type of genetic variant in the human genome and occur approximately every 100 to 300 bp [6,7]. Most SNPs have only two possible alleles that differ between the individuals of the same population group, where the frequency of the minor allele is usually specific to that group. Although SNPs offer a limited number of possible alleles, which is a prerequisite for the selection of markers for DNA analysis, they are very convenient and highly informative for haplotype analysis, because of their abundance (over 10^6 deposited in the dbSNP database of the National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/About/primer/snps.html) and their genetic stability in the human genome [7-9].
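The minor allele frequency mentioned above can be illustrated with a minimal Python sketch. The input format (one two-character genotype string per individual) and the function name are assumptions made for illustration only; they do not come from any particular genotyping platform or database.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG",
    one per individual (a hypothetical input format).
    """
    # Each diploid genotype contributes two alleles to the count.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("expected at most two alleles for a biallelic SNP")
    if len(counts) < 2:
        # Monomorphic in this sample: there is no minor allele.
        return None, 0.0
    total = sum(counts.values())
    # Sorting first makes the result deterministic when both alleles tie.
    allele, n = min(sorted(counts.items()), key=lambda item: item[1])
    return allele, n / total


sample = ["AA", "AG", "GG", "AA", "AG"]
print(minor_allele_frequency(sample))
```

In this toy sample, allele G is observed 4 times out of 10 chromosomes, so it is the minor allele with a frequency of 0.4. Computing the same quantity separately for each population group would show how the frequency differs between groups.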

The SNPs occur within coding gene regions and in non-coding intragenic and intergenic sequences. Most fall in introns, the 3' and 5' untranslated regions (3' UTR and 5' UTR) of genes, and spacer DNA [8]. Although they do not modify the gene product, some may play an important role in controlling the level of gene transcription, by influencing the affinity of promoters or other regulatory sequences for trans-acting factors that modify the rate of gene expression, or by affecting pre-mRNA processing [6]. A small proportion of SNPs lie in coding DNA sequences; however, most of these are synonymous, i.e., they do not alter the polypeptide structure, and only a few are non-synonymous SNPs, causing an amino acid substitution [8]. This distribution of SNPs could be explained by the negative effect of expressed variations on survival and their rapid elimination by natural selection [6].

Single nucleotide polymorphisms are associated with population diversity and individual differences in complex traits [6,8]. Therefore, they are convenient for genetic association studies aimed at identifying susceptibility loci for multifactorial disorders. An association between a disorder and a non-synonymous SNP makes the phenotype-genotype relationship very clear. However, an association with a synonymous SNP or an SNP in a non-coding sequence is difficult to explain, and usually another causative marker needs to be identified [7,8,10].

Currently, SNPs are preferred as genetic markers in case-control and whole genome association studies. They have been used in studies for mapping and discovery of susceptibility genes for many complex disorders: cardiovascular (essential hypertension), neurological (Alzheimer’s disease, multiple sclerosis), psychiatric (schizophrenia, bipolar affective disorders) and autoimmune (rheumatoid arthritis) disorders, diabetes mellitus type 2, and different types of cancer [11,12]. Linkage studies and genome scans have identified several candidate chromosomal regions for common diseases [7,13]. Selection of SNPs in such loci has become a basic approach in candidate gene association studies [7]. However, the candidate gene approach is time-consuming, cost-intensive and insufficient, and has largely failed to predict the risk of disease susceptibility, since only a limited number of genetic markers in a relatively small region are investigated [7,12]. Results from meta-analyses are often inconsistent and demonstrate the need for more efficient and cost-effective high-throughput SNP genotyping technologies, such as DNA-microarray-based technology, for revealing disease-causing genes [7,9,12,13].

Application of DNA-microarray technologies in large-scale studies of complex disorders facilitates genotyping of a large number of SNPs. A DNA chip consists of an arrayed series of thousands of sequences for detection of tag SNPs from the entire genome [13]. Selection of population-specific tag SNPs has become possible since the haplotype block structure of the human genome was established in the International HapMap Project (www.HapMap.org). Tag SNPs are representative markers for a set of variants within a region of high linkage disequilibrium in the genome. Thus, they allow economical and efficient genotyping of a relatively small number of markers that provide adequate information on disease-associated genes and loci. Candidate genes identified by such large-scale approaches require further analysis to elucidate their role in disease etiology (http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml) [13]. A whole genome association study based on array technology produces large amounts of data and requires an adequate database, appropriate computational statistical methods, techniques for false-positive error detection, and maintenance. Moreover, the use of such technology is associated with high costs and significant consumption of time, effort and resources.
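To make the notion of linkage disequilibrium between a tag SNP and a neighboring variant more concrete, the sketch below computes the widely used r^2 measure for a pair of biallelic SNPs. It assumes phased haplotypes are available as pairs of alleles, which is a simplification; the function name and input format are hypothetical and are not taken from the HapMap tools.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a non-empty list of (allele_at_snp1, allele_at_snp2)
    pairs, one per chromosome (a hypothetical input format).
    """
    haplotypes = [tuple(h) for h in haplotypes]
    n = len(haplotypes)
    # Use the alleles on the first chromosome as reference alleles.
    ref1, ref2 = haplotypes[0]
    p1 = sum(h[0] == ref1 for h in haplotypes) / n
    p2 = sum(h[1] == ref2 for h in haplotypes) / n
    p12 = sum(h == (ref1, ref2) for h in haplotypes) / n
    # D is the deviation of the observed haplotype frequency from
    # the frequency expected if the two SNPs were independent.
    d = p12 - p1 * p2
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    if denominator == 0:
        # At least one SNP is monomorphic, so r^2 is undefined; report 0.
        return 0.0
    return d * d / denominator


linked = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
unlinked = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
print(ld_r_squared(linked), ld_r_squared(unlinked))
```

An r^2 close to 1 means that genotyping one SNP gives nearly complete information about the other, which is the property that lets a tag SNP stand in for its neighbors. Tag selection procedures often apply a cut-off such as r^2 of at least 0.8, but the appropriate threshold depends on the study design.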

We have performed a whole genome association study (WGAS) of DNA samples from unrelated Bulgarian patients with schizophrenia and healthy volunteers (unpublished data). Following the WGAS, the 100 top-ranked SNPs with the lowest p values were validated (genotyped by an alternative method in the same samples) and replicated (genotyped in additional DNA samples). The large amount of data produced required comprehensive statistical analysis. For this reason, we have created a client-server web-based application for statistical processing and reliable storage of data from an automated genotyping study of a set of DNA samples, as specified below.
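The ranking of SNPs by p value can be sketched as follows. The text does not state which test statistic was used, so the allelic chi-square test on a 2x2 table of allele counts is an assumption chosen for illustration, and the SNP identifiers and counts are invented toy values rather than study data.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Each argument is (count_of_allele_1, count_of_allele_2).
    Returns (chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # An empty row or column carries no information about association.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def top_snps(allele_counts, n):
    """Rank SNPs by ascending p value and keep the first n."""
    scored = [
        (snp, allelic_chi_square(cases, controls)[1])
        for snp, (cases, controls) in allele_counts.items()
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:n]


# Hypothetical allele counts: snp -> ((cases), (controls)).
toy_counts = {
    "snp_a": ((30, 10), (10, 30)),
    "snp_b": ((20, 20), (20, 20)),
    "snp_c": ((25, 15), (15, 25)),
}
for snp, p_value in top_snps(toy_counts, 2):
    print(snp, p_value)
```

In a genome-wide setting the same ranking would be applied to all genotyped SNPs, and the selected markers would then be carried forward to validation and replication. Because many tests are performed, raw p values of this kind would normally be interpreted together with a correction for multiple testing.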