Cite as: Cold Spring Harb. Protoc.; 2008; doi:10.1101/pdb.prot5023
Protocol
This protocol was adapted from "Using the HapMap Web Site," Chapter 6, in Genetic Variation (eds. Weiner et al.). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, 2007.
INTRODUCTION
The primary goal of the International Haplotype Map Project has been to develop a haplotype map of the human genome that describes the common patterns of genetic variation, in order to accelerate the search for the genetic causes of human disease. Within the project, ~3.9 million distinct single-nucleotide polymorphisms (SNPs) have been genotyped in 270 individuals from four worldwide populations. The project data are available for unrestricted public use at the HapMap website. This site, which is the primary portal to genotype data produced by the project, offers bulk downloads of the data set, as well as interactive data browsing and analysis tools that are not available elsewhere. Research into the genetic contributions to a human disease commonly focuses on candidate genes identified from linkage and/or association studies, as well as from pathways suspected to be involved in a particular disease process. In studying a candidate gene, a researcher will want to know whether there are any common SNPs in its immediate vicinity, what the alleles of those SNPs are, and the relative frequencies of those alleles in each population. The researcher will also be particularly interested in coding SNPs, whose alleles change the amino acid sequence of the gene product and therefore might represent functional variants. This protocol describes how to use the HapMap genome browser to navigate to and explore HapMap data for a gene or region of interest.
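As a minimal illustration of the allele-frequency question raised above, the sketch below counts alleles across a set of genotype calls for one SNP. It assumes, without taking this from the HapMap documentation, that each genotype is a two-character string such as "AG" and that missing calls contain "N"; the function name allele_frequencies is hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str], missing: str = "N") -> dict[str, float]:
    """Return each allele's frequency among the called genotypes.

    Genotypes are assumed to be two-character strings such as "AG";
    any genotype containing the missing-data character is skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2 or missing in genotype:
            continue  # skip no-calls and malformed entries
        counts.update(genotype)  # each diploid individual contributes two alleles
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}
```

For example, the calls AA, AG, AA, GG, and one missing call (NN) give A = 0.625 and G = 0.375, because the four called individuals contribute eight alleles; the smaller value is the minor allele frequency.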
RELATED INFORMATION
HapMap data from the International Haplotype Map Project (International HapMap Consortium 2005) are available at the project website: http://www.hapmap.org (Thorisson et al. 2005). The HapMap website provides researchers with a number of tools that allow them to analyze the data as well as to download data for local analyses. The following web resources are also useful:
http://www.ensembl.org (Hubbard et al. 2007)
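The following protocols describe additional tools and functions that have been developed for viewing, retrieving, and analyzing HapMap data:
Generating HapMap data text reports using the genome browser (Smith 2008a)
Manipulating HapMap data using HaploView (Smith 2008b)
Retrieving HapMap data using HapMart (Smith 2008c)
Retrieving HapMap data via bulk download (Smith 2008d)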
MATERIALS
Equipment
Computer (Internet-connected)
METHOD
Finding and Browsing to a Region of Interest
The genome browser at the HapMap website provides access to small- to medium-sized regions of the genome for the kind of interactive, gene-centered exploration described above. This basic protocol describes how to start using the genome browser and how to navigate to a gene or region of interest.
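A region is typically requested either by a feature name, such as a gene symbol, or by coordinates. The sketch below parses a coordinate request and reports its length, so that a researcher can judge whether it falls within the small- to medium-sized range the browser is intended for. It assumes a GBrowse-style landmark of the form "chr7:117,100,000..117,300,000" (a hyphen separator is also accepted) and 1-based, inclusive coordinates; Region and parse_region are hypothetical names, and the example coordinates are arbitrary.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes 1-based, inclusive coordinates.
        return self.end - self.start + 1


# Hypothetical pattern for "chrN:start..end"; a hyphen separator and
# thousands separators (commas) are also accepted.
_REGION = re.compile(r"^\s*(chr\w+):([\d,]+)(?:\.\.|-)([\d,]+)\s*$", re.IGNORECASE)


def parse_region(text: str) -> Region:
    """Parse a coordinate landmark such as 'chr7:117,100,000..117,300,000'."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"not a chrN:start..end coordinate range: {text!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start > end:
        start, end = end, start  # tolerate coordinates given in reverse order
    return Region(chrom, start, end)
```

For example, parse_region("chr7:117,100,000..117,300,000").length is 200001, a window of about 200 kb. A gene symbol such as "CFTR" is rejected by this parser, since name lookup is left to the browser itself.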
Figure 1. The initial page shown when starting to use the HapMap genome browser for the first time. Depending on your computer language settings, this page can appear in one of several languages, although this section assumes English. The page can also be reached directly at http://www.hapmap.org/cgi-perl/gbrowse/.
Figure 2. The HapMap genome browser displaying a requested feature.
Viewing the Extent of Linkage Disequilibrium (LD)
When a researcher designs a study to detect association between a common allelic variant of a gene and a disease of interest, knowledge of the extent of LD in the region is essential for reducing the number of SNPs that need to be genotyped. If LD in the region is high, only a few SNPs need to be genotyped, because their alleles are strongly correlated with those of neighboring SNPs and so serve as proxies for the genotypes of SNPs that are not genotyped. In contrast, a region of low LD must be sampled more densely, because the allele carried at a genotyped SNP will be a poor predictor of the alleles at nongenotyped SNPs. Determining the patterns of LD in the populations it characterized has been one of the major goals of the HapMap project, and the project has precalculated patterns of LD among the genotyped SNPs. These data can be downloaded in bulk from the HapMap website or browsed interactively using the HapMap genome browser; the latter allows researchers to see patterns of LD in the context of the genes in the region.
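Two standard pairwise measures of LD are D′ and r². The sketch below computes both for a pair of biallelic SNPs from phased haplotypes, using the usual definitions D = p_AB - p_A·p_B, D′ = |D| / D_max, and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). It is an illustrative calculation, not the code used by the HapMap project, and the input format (a list of allele pairs) and the name ld_stats are assumptions. The r² value is the one that matters most when judging whether one SNP can serve as a proxy for another.

```python
def ld_stats(haplotypes: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2).
    """
    n = len(haplotypes)
    alleles1 = sorted({x for x, _ in haplotypes})
    alleles2 = sorted({y for _, y in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic in this sample")
    a, b = alleles1[0], alleles2[0]  # which allele is called "A" or "B" is arbitrary
    p_a = sum(x == a for x, _ in haplotypes) / n
    p_b = sum(y == b for _, y in haplotypes) / n
    p_ab = sum(x == a and y == b for x, y in haplotypes) / n
    d = p_ab - p_a * p_b  # the LD coefficient D
    # D_max depends on the sign of D so that D' lies in [0, 1].
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

With haplotypes AC, AT, GT, and GT, for example, D′ is 1 (only three of the four possible two-SNP haplotypes occur) but r² is only 1/3, so the first SNP would be a weak proxy for the second. This is why the two measures can paint different pictures of the same region.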
Figure 3. The configuration page of the HapMap genome browser allows the user to customize numerous style features of the data display.
Figure 4. The HapMap genome browser displaying a triangle plot of LD values for multiple populations. A typical region of LD demonstrating "patches" of high LD separated by relatively well-defined boundaries of low LD is shown. The triangle plot is constructed by connecting every pair of SNPs along lines at 45° to the horizontal track line. The color of the diamond at the position where two SNPs intersect indicates the amount of LD; more intense colors indicate higher LD. A gray diamond indicates that data are missing.
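The geometry described in the Figure 4 legend can be made concrete: if two SNPs sit at horizontal positions x_i and x_j on the track, lines drawn from each at 45° meet at their midpoint, at a depth below the track equal to half their separation. The sketch below computes that position for every SNP pair and assigns a color, using a white-to-red scale for LD values and gray for missing data. The color scale and the function names are assumptions for illustration, not the browser's actual rendering code.

```python
Color = tuple[int, int, int]

MISSING_GRAY: Color = (192, 192, 192)


def diamond_center(x_i: float, x_j: float) -> tuple[float, float]:
    """Return (x, depth) where 45-degree lines from two SNP positions meet."""
    left, right = sorted((x_i, x_j))
    # Each line drops one unit of depth per unit of horizontal distance,
    # so they cross halfway between the SNPs, half the separation below the track.
    return (left + right) / 2, (right - left) / 2


def shade(value: float) -> Color:
    """Map an LD value in [0, 1] from white (low) to red (high)."""
    v = min(max(value, 0.0), 1.0)
    fade = round(255 * (1 - v))
    return (255, fade, fade)


def triangle_cells(positions: list[float], ld: dict[tuple[int, int], float]):
    """List (x, depth, color) for every SNP pair i < j.

    ld maps (i, j) index pairs to an LD value; absent pairs are drawn gray.
    """
    cells = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            x, depth = diamond_center(positions[i], positions[j])
            value = ld.get((i, j))
            color = MISSING_GRAY if value is None else shade(value)
            cells.append((x, depth, color))
    return cells
```

For three SNPs at positions 0, 10, and 20, the pair (0, 2) lands at x = 10 and depth 10, deeper than the two adjacent pairs at depth 5, which is why LD between distant SNPs appears toward the bottom of the triangle.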
Picking and Viewing tag-SNPs
Tag-SNPs are a reduced set of SNPs that, through LD, capture much of the variation in a region; they can be used in association studies to reduce the number of SNPs that must be genotyped to detect LD-based association between a trait of interest and a region of the genome. For small regions, it is possible to select tag-SNPs by hand using the graphical and numeric displays of LD described above, but for best results, researchers should use an algorithm that chooses tag-SNPs by formally maximizing the number of linked SNPs captured by the tag set. No single set of tag-SNPs will satisfy the diverse requirements of every association study design. Researchers may wish to select SNPs that work well with a particular genotyping system (e.g., those included on a particular "SNP chip") and may be willing to accept different trade-offs between the cost of genotyping a study population and the strength of association they can detect. For this reason, the HapMap website does not offer a static set of preselected tag-SNPs; instead, it offers a tool for selecting tag-SNPs interactively based on user-provided criteria. The tag-SNP lists are generated using algorithms from the Tagger program (http://www.broad.mit.edu/mpg/tagger/; de Bakker et al. 2005).
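The idea of choosing tags by maximizing the number of SNPs captured can be illustrated with a simple greedy pairwise procedure, similar in spirit to Tagger's pairwise mode but not a reimplementation of it. In each round the sketch picks the candidate SNP that captures the most not-yet-captured SNPs at r² at or above a threshold. The pick_tags function and its force_include and candidates parameters are hypothetical stand-ins for the user-provided criteria mentioned above (for example, forcing in SNPs already on a genotyping chip, or restricting tags to SNPs that can be assayed); the SNP names and r² values in the usage note are made up.

```python
from typing import Iterable, Optional


def pick_tags(
    snps: list[str],
    r2: dict[tuple[str, str], float],
    threshold: float = 0.8,
    force_include: Iterable[str] = (),
    candidates: Optional[Iterable[str]] = None,
) -> tuple[list[str], set[str]]:
    """Greedily pick tag SNPs so each SNP has r^2 >= threshold with some tag.

    Returns the chosen tags and any SNPs that no allowed tag can capture.
    """

    def ld(a: str, b: str) -> float:
        if a == b:
            return 1.0  # a SNP always tags itself
        # r2 may store a pair in either order; unlisted pairs count as 0.
        return r2.get((a, b), r2.get((b, a), 0.0))

    def captured_by(tag: str) -> set[str]:
        return {s for s in snps if ld(tag, s) >= threshold}

    pool = list(snps if candidates is None else candidates)
    tags = list(force_include)
    uncaptured = set(snps)
    for tag in tags:
        uncaptured -= captured_by(tag)

    while uncaptured:
        # Ties go to the candidate listed first.
        best = max(pool, key=lambda t: len(captured_by(t) & uncaptured), default=None)
        if best is None:
            break  # no candidates were allowed at all
        gain = captured_by(best) & uncaptured
        if not gain:
            break  # remaining SNPs cannot be captured by any allowed candidate
        tags.append(best)
        uncaptured -= gain
    return tags, uncaptured
```

With r²(rs1, rs2) = 0.9, r²(rs2, rs3) = 0.85, r²(rs1, rs3) = 0.7, and r²(rs4, rs5) = 0.95 at a threshold of 0.8, the procedure picks rs2 (which captures rs1, rs2, and rs3) and then rs4, so two tags cover all five SNPs. Restricting candidates to rs1 and rs4 leaves rs3 uncaptured, which illustrates the cost of tying the tag set to a particular genotyping platform.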
Figure 5. The HapMap genome browser graphically displaying tag-SNPs, as well as phased haplotypes.
Viewing Phased Haplotypes
A researcher may wish to correlate the tag-SNP set selected by the tag-SNP picker with the underlying haplotype structure of the region. One way to do this is to turn on the pairwise LD and tag-SNP tracks simultaneously (Steps 7-10 and 11-14, respectively). An alternative is to activate a track that displays the phased haplotypes themselves. The phased haplotype data described in this section were generated by the International HapMap Consortium using the program PHASE version 2.1 (Stephens and Donnelly 2003). During phasing, each allele in a genotype is assigned to one or the other parental chromosome, using a statistical model that exploits trio (parent-offspring) information where it is available in the HapMap population groups or, where trio information is not available, by fitting the data to a model that minimizes the number of implied historical crossovers in the population. The phased haplotypes are displayed as a graphic in which each chromosome of the individuals sampled by the project is represented as a line one pixel high, and each SNP allele is arbitrarily colored blue or yellow. A region of high LD appears as a block in which long runs of SNPs share the same alleles across many chromosomes, indicating that there has been little recombination among them. A region of low LD appears as an area where these runs are shorter and more fragmented.
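The trio rule that underlies phasing can be sketched at the level of individual SNPs: a child's allele must have come from a parent who carries it. The sketch below is a minimal illustration of Mendelian transmission, not the PHASE algorithm; sites where all three trio members are heterozygous stay unresolved, which is where statistical methods such as PHASE are needed. The two-character genotype strings, the "N" missing-data character, and the function names are assumptions.

```python
from typing import Optional

MISSING = "N"  # assumed missing-data character


def phase_child(child: str, father: str, mother: str) -> Optional[tuple[str, str]]:
    """Return (paternal, maternal) alleles for the child at one biallelic SNP.

    Returns None when transmission cannot be determined from the trio
    (missing data, or all three individuals heterozygous).
    Raises ValueError if the genotypes are not Mendelian-consistent.
    """
    if MISSING in child + father + mother:
        return None
    # Try both orderings of the child's two alleles.
    options = {
        (pat, mat)
        for pat, mat in ((child[0], child[1]), (child[1], child[0]))
        if pat in father and mat in mother
    }
    if not options:
        raise ValueError(f"Mendelian inconsistency: {child} from {father} x {mother}")
    if len(options) == 1:
        return options.pop()
    return None  # both orderings fit, e.g. AG from AG x AG


def phase_trio(child_gts, father_gts, mother_gts) -> tuple[str, str]:
    """Phase a child across SNPs; unresolved sites are written as '?'."""
    paternal, maternal = [], []
    for c, f, m in zip(child_gts, father_gts, mother_gts):
        alleles = phase_child(c, f, m)
        pat, mat = alleles if alleles is not None else ("?", "?")
        paternal.append(pat)
        maternal.append(mat)
    return "".join(paternal), "".join(maternal)
```

For a child typed AG, CC, AG at three SNPs, with father AA, CT, AG and mother AG, CC, AG, phase_trio returns the paternal haplotype "AC?" and the maternal haplotype "GC?": the third SNP, where all three individuals are heterozygous, cannot be resolved from the trio alone.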
DISCUSSION
A number of public online resources have been developed as portals to high-volume genome-wide data sets. The UCSC Genome Browser (http://genome.ucsc.edu; Kent et al. 2002) and the Ensembl project (http://www.ensembl.org; Hubbard et al. 2007) have developed multispecies genome browsers that display genomic annotations graphically and offer retrieval of the underlying data. dbSNP (http://www.ncbi.nlm.nih.gov/SNP; Wheeler et al. 2007) is a repository for information on SNPs, but it does not yet contain extensive information on the relationships among those SNPs, such as the patterns of LD between them.
The HapMap website, located at http://www.hapmap.org, has a distinct focus. It aims to be a resource for the display, retrieval, and analysis of high-throughput, high-quality, genome-wide human genetic data, with an emphasis on tools that facilitate disease association studies. Although the resource is still in development, it currently provides the basic tools for visualizing patterns of common polymorphism among the populations surveyed by the HapMap project, selecting tag-SNP sets based on a variety of criteria, and generating customized extracts of the data set. In the future, the HapMap website will evolve to provide more services to those designing and interpreting genetic association studies.
REFERENCES
Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E., and Pritchard, J.K. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38: 75–81.
Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29: 229–232.
de Bakker, P.I.W., Yelensky, R., Peer, I., Gabriel, S.B., Daly, M.J., and Altshuler, D. 2005. Efficiency and power in genetic association studies. Nat. Genet. 37: 1217–1223.
Hinds, D.A., Kloek, A.P., Jen, M., Chen, X., and Frazer, K.A. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38: 82–85.
Hubbard, T.J.P., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., et al. 2007. Ensembl 2007. Nucleic Acids Res. 35: D610–D617. doi: 10.1093/nar/gkl996.
Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., and Lee, C. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36: 949–951.
International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437: 1299–1320.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The Human Genome Browser at UCSC. Genome Res. 12: 996–1006.
McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J., et al. 2006. Common deletion polymorphisms in the human genome. Nat. Genet. 38: 86–92.
Mueller, J.C. 2004. Linkage disequilibrium for different scales and applications. Brief Bioinform. 5: 355–364.
Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., et al. 2006. Global variation in copy number in the human genome. Nature 444: 444–454.
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525–528.
Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., et al. 2005. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77: 78–88.
Smith, A.V. 2008a. Generating HapMap data text reports using the genome browser. CSH Protocols (this issue) doi: 10.1101/pdb.prot5024.
Smith, A.V. 2008b. Manipulating HapMap data using HaploView. CSH Protocols (this issue) doi: 10.1101/pdb.prot5025.
Smith, A.V. 2008c. Retrieving HapMap data using HapMart. CSH Protocols (this issue) doi: 10.1101/pdb.prot5026.
Smith, A.V. 2008d. Retrieving HapMap data via bulk download. CSH Protocols (this issue) doi: 10.1101/pdb.prot5027.
Smith, A.V., Thomas, D.J., Munro, H.M., and Abecasis, G.R. 2005. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res. 15: 1519–1534.
Stephens, M. and Donnelly, P. 2003. A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162–1169.
Thorisson, G.A., Smith, A.V., Krishnan, L., and Stein, L.D. 2005. The International HapMap Project Web site. Genome Res. 15: 1592–1593.
Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., et al. 2005. Fine-scale structural variation of the human genome. Nat. Genet. 37: 727–732.
Vastrik, I., D'Eustachio, P., Schmidt, E., Joshi-Tope, G., Gopinath, G., Croft, D., de Bono, B., Gillespie, M., Jassal, B., Lewis, S., et al. 2007. Reactome: A knowledge base of biologic pathways and processes. Genome Biol. 8: R39.
Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., et al. 2007. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35: D5–D12. doi: 10.1093/nar/gkl1031.
Copyright © 2008 by Cold Spring Harbor Laboratory Press. Online ISSN: 1559-6095