hapConstructor

Description

Haplotypes carry important information that can direct investigators towards underlying susceptibility variants, so multiple tagging SNPs are usually studied in candidate gene association studies. However, it is often unknown which SNPs should be analyzed together as haplotypes, or how the tests should be constructed for maximum power. We have developed a program, hapConstructor, that automatically builds multi-locus SNP sets to test for association in a case-control framework. The multi-SNP sets considered at any step need not be contiguous; each step's SNP sets are built from the SNP subsets that were significant in the preceding steps. An important feature is that missing data are imputed from the full data set, for maximal information and consistency throughout the building process. hapConstructor is implemented in a Monte Carlo framework that provides appropriate significance testing, accounts for the construction process, and naturally extends to related individuals. Empirical false discovery rate thresholds are also available. hapConstructor is a useful tool for exploring multi-locus associations in candidate genes and regions through a valid, structured process.
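
To make the stepwise construction concrete, here is a minimal Python sketch of the general idea; it is not hapConstructor's code. It assumes that each step extends every SNP set that passed the previous step by one additional SNP (which need not be adjacent), and that a caller-supplied `test` function returns a p-value for a SNP set. The name `build_snp_sets`, the toy p-values, and the one-SNP-per-step rule are hypothetical, and the sketch omits the imputation and Monte Carlo significance testing described above.

```python
from typing import Callable, FrozenSet, List, Set

SnpSet = FrozenSet[str]

def build_snp_sets(snps: List[str],
                   test: Callable[[SnpSet], float],
                   thresholds: List[float]) -> List[Set[SnpSet]]:
    """Return, for each step, the SNP sets whose p-value passed that step's threshold.

    Step 1 tests every single SNP. Each later step extends every set that passed
    the previous step by one SNP not already in it, so sets need not be contiguous.
    (Hypothetical one-SNP-per-step rule, for illustration only.)
    """
    passed_by_step: List[Set[SnpSet]] = []
    candidates: Set[SnpSet] = {frozenset([snp]) for snp in snps}
    for threshold in thresholds:
        passed = {c for c in candidates if test(c) <= threshold}
        passed_by_step.append(passed)
        if not passed:
            break  # nothing left to extend; stop building
        candidates = {p | {snp} for p in passed for snp in snps if snp not in p}
    return passed_by_step

# Hypothetical p-values: rs1 alone is weakly associated, and rs1 together
# with rs3 (skipping rs2) is strongly associated.
def toy_test(snp_set: SnpSet) -> float:
    if {"rs1", "rs3"} <= snp_set:
        return 0.001
    return 0.04 if "rs1" in snp_set else 0.5

steps = build_snp_sets(["rs1", "rs2", "rs3"], toy_test, [0.05, 0.01])
# steps[0] == {frozenset({"rs1"})}; steps[1] == {frozenset({"rs1", "rs3"})}
```

In this toy run the second step keeps {rs1, rs3}, a non-contiguous set, because only sets containing the step-1 survivor rs1 are extended.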
 
Model Details
  • Diplotype - individual is unit of study (examining pairs of haplotypes).
    • Composite Genotype - phase is ignored
      • Global - consider all composite genotypes in one contingency table.
        • Test with chi-square test or odds ratio (compare each composite genotype with the homozygous wild type).
      • All dominant and recessive combinations
        • Each creates a 2x2 contingency table comparing specific composite genotypes (depending on the model) with all other composite genotypes.
          • Test with chi-square or odds ratio.
    • Haplotype - phase is considered
      • Global - consider all haplotype combinations in one contingency table.
        • Test with chi-square test or odds ratio (compare each diplotype with the most common diplotype).
      • Dominant, Recessive, and Additive for each possible haplotype.
        • Test with chi-square and odds ratio for 2x2 tables, and with the Cochran-Armitage trend test for 2x3 contingency tables.

  • Monotype - chromosome is unit of study (examining the individual haplotypes instead of pairs).
    • Global - consider all haplotypes in one table.
      • Test with chi-square or odds ratio (compare most common haplotype with all other haplotypes).
    • Specific reduction for each possible haplotype: carriage of each specific haplotype is tested against all other haplotypes.
      • Create 2x2 contingency tables (compare a specific haplotype with all other haplotypes).
        • Test with chi-square or odds ratio test.
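
The statistics named in the model list can be sketched in a few lines of Python. This is an illustrative sketch, not Genie's implementation: `pearson_chi_square` covers the global r x c tables and the 2x2 reductions, `odds_ratio` assumes an exposed/unexposed by case/control layout, and `cochran_armitage_trend` assumes scores 0, 1, 2 for carrying 0, 1, or 2 copies of a haplotype. All three names are hypothetical. hapConstructor itself reports empirical (Monte Carlo) p-values, so only the observed statistics are computed here.

```python
from typing import Sequence

def pearson_chi_square(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for an r x c table (rows = cases, controls)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a, b = exposed / unexposed cases; c, d = exposed / unexposed controls.
    Assumes b and c are non-zero.
    """
    return (a * d) / (b * c)

def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           scores: Sequence[float] = (0, 1, 2)) -> float:
    """Cochran-Armitage trend chi-square (1 df) for a 2 x k table.

    With the default scores, cases/controls hold counts of individuals carrying
    0, 1 or 2 copies of a haplotype. This form has no N/(N-1) correction.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    r_tot = sum(cases)
    s_tot = n - r_tot
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * m for t, m in zip(scores, totals))
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))
    numerator = n * (n * sum_tr - r_tot * sum_tn) ** 2
    denominator = r_tot * s_tot * (n * sum_t2n - sum_tn ** 2)
    return numerator / denominator
```

For example, `pearson_chi_square([[20, 80], [10, 90]])` gives 200/51 (about 3.92), `odds_ratio(20, 80, 10, 90)` gives 2.25, and `cochran_armitage_trend([10, 20, 30], [30, 20, 10])` gives 20.0.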

How to execute: java -jar Genie.jar hapConstructor rgenfile[.rgen]
.rgen Parameter File: Detailed description of .rgen XML file
hapConstructor Examples: All hapConstructor Example Files
 
Instructions to run hapConstructor

  1. Java 1.6 JRE must be installed on your system (Download here)
   - To check whether Java is installed, open a command prompt and type java.
   - To check which Java version is installed, open a command prompt and type java -version.

  2. Download Genie.zip and extract (unzip) it.
   - Genie.jar, genepi.jar, and ge-rgen.dtd are the three files in the extracted directory that are needed to run the program; the other files are example files. For the program to work, genepi.jar must be either in the same directory as Genie.jar or on the Java classpath of your system. The ge-rgen.dtd file should also be in the same directory as Genie.jar.

  3. Create .rgen and .dat files. Examples can be found here.
   - The .rgen and .dat files can be placed anywhere on your system, but take care to specify their locations correctly when you execute the program.
   - In the simplest situation, the .rgen and .dat files are in the same directory as the .jar files. In this case, the .rgen file specifies the .dat file as being in the same directory (i.e. genotypedata="GenotypeData.dat").
   - The .dat file should not contain any extra lines at the bottom; these cause an error while the program reads the data.

  4. Go to the directory containing Genie.jar and type java -jar Genie.jar hapConstructor <.rgen file name>.
   - If the .rgen file is in another directory, specify its path on the command line.
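
Before running, the setup steps above can be checked with a short script. This is a hypothetical Python helper, not part of Genie: `missing_files` reports which of Genie.jar, genepi.jar, and ge-rgen.dtd are absent from a directory (genepi.jar may legitimately be on the Java classpath instead, so a report about it is only a prompt to check), and `trailing_blank_lines` counts blank lines at the end of a .dat file so they can be removed before launching the program.

```python
from pathlib import Path
from typing import List

# The three files the instructions list as necessary to run hapConstructor.
REQUIRED = ("Genie.jar", "genepi.jar", "ge-rgen.dtd")

def missing_files(genie_dir) -> List[str]:
    """Return the required files that are not present in genie_dir."""
    directory = Path(genie_dir)
    return [name for name in REQUIRED if not (directory / name).is_file()]

def trailing_blank_lines(text: str) -> int:
    """Count blank (or whitespace-only) lines at the end of a .dat file's text.

    A single newline that terminates the last data line is not counted.
    """
    lines = text.splitlines()
    count = 0
    while lines and not lines[-1].strip():
        lines.pop()
        count += 1
    return count

def check_dat_file(dat_path) -> int:
    """Read a .dat file and return its number of trailing blank lines."""
    return trailing_blank_lines(Path(dat_path).read_text())
```

Run from the extracted Genie directory, `missing_files(".")` should return an empty list, and `check_dat_file("GenotypeData.dat")` should return 0 (the file name is taken from the example above).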
 
Additional Notes

  Running out of heap space?
  - For larger datasets, or when using a large number of Monte Carlo simulations (e.g. 80,000 - 100,000), the default Java Virtual Machine (JVM) memory allocation may not be sufficient. In this case, provided the system has enough memory, you can allocate more memory to the JVM with the -Xms and -Xmx options when executing the program. Example: java -Xms1024m -Xmx1536m -jar Genie.jar hapConstructor <.rgen file name>. This example gives the JVM an initial heap of 1 GB and a maximum heap of 1.5 GB while executing the program. The maximum memory allocation on 32-bit systems is 2 GB.
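
If you launch hapConstructor from a script, the memory options can be assembled programmatically. The sketch below is a hypothetical Python helper (`hapconstructor_command` is not part of Genie); it builds the same command as the example above and rejects an initial heap (-Xms) larger than the maximum heap (-Xmx), a combination the JVM refuses at startup.

```python
from typing import List, Optional

def hapconstructor_command(rgen_file: str,
                           xms_mb: Optional[int] = None,
                           xmx_mb: Optional[int] = None) -> List[str]:
    """Build the argument list for running hapConstructor with optional heap sizes in MB."""
    if xms_mb is not None and xmx_mb is not None and xms_mb > xmx_mb:
        raise ValueError("initial heap (-Xms) must not exceed maximum heap (-Xmx)")
    jvm_options = []
    if xms_mb is not None:
        jvm_options.append(f"-Xms{xms_mb}m")  # initial heap size
    if xmx_mb is not None:
        jvm_options.append(f"-Xmx{xmx_mb}m")  # maximum heap size
    return ["java", *jvm_options, "-jar", "Genie.jar", "hapConstructor", rgen_file]
```

For example, `subprocess.run(hapconstructor_command("study.rgen", 1024, 1536), check=True)`, run from the directory containing Genie.jar, is equivalent to the example command above; `study.rgen` is a placeholder file name.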

  A note on output files
  - Running hapConstructor generates a number of output files. The build files, denoted with .build, are generated after each step is complete and are named after the study name specified in the .rgen file. The build files contain only the test results that passed the specified threshold for that step, while the all_obs.final file contains the results of all tests conducted during the build process. The all_obs.final file is written to continually as the program runs, so moving it during execution could cause an error. Note also that subsequent runs do not overwrite any previously generated files; instead, new output is appended to the end of the existing files. This means that if two different analyses are performed in the same folder, the all_obs.final file will contain results from both analyses. The same applies to the all_sims.final file.
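
Because output is appended rather than overwritten, one way to keep analyses separate is to move the previous run's files aside before starting a new run (never during a run, since all_obs.final is still being written). The sketch below is a hypothetical Python helper, not part of Genie; it moves all_obs.final, all_sims.final, and any *.build files into a timestamped subfolder. The folder name `previous_runs` is an assumption, and the `*.build` pattern assumes the build file names end in .build, so check it against your own output.

```python
import shutil
import time
from pathlib import Path
from typing import List

# Output files that hapConstructor appends to on every run in the same folder.
APPENDED_OUTPUTS = ("all_obs.final", "all_sims.final")

def archive_previous_outputs(run_dir, archive_root: str = "previous_runs") -> List[Path]:
    """Move earlier hapConstructor output into run_dir/archive_root/<timestamp>/.

    Returns the new paths of the moved files (empty if there was nothing to move).
    """
    run = Path(run_dir)
    targets = [run / name for name in APPENDED_OUTPUTS if (run / name).is_file()]
    targets += sorted(run.glob("*.build"))  # assumed naming: <study name>...build
    if not targets:
        return []
    destination = run / archive_root / time.strftime("%Y%m%d-%H%M%S")
    destination.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(path), str(destination / path.name))) for path in targets]
```

Running each analysis in its own folder avoids the problem without any script; the helper is only a convenience when reusing one folder.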
 
Viewing Results

  Organize and view results
  - The .build files generated by hapConstructor can be further processed with a Python script that displays the results as a sortable table in an HTML file. The Python file and the CSS and JavaScript files needed for formatting can be downloaded here. Example files are included in the zip file. To use the script, unzip the file into the directory that contains the .build files of interest, then open a command prompt and run python hapConstructor_xml.py. If Python is not installed on your system, it can be downloaded here.

  Show me the haplotypes that correspond with the results
  - The .build files generated by hapConstructor can also be processed with another Python script in the same zip file, hapConstructor_table.py. The script must be in the same directory as all the build files, and can be run from the command line with python hapConstructor_table.py. It generates three text files: increase_risk.out, decrease_risk.out, and other.out. The two risk files contain all the test results in the .build files that correspond to their respective direction of risk; the chi-square test results are placed in other.out. Each .out file lists the markers across the top, followed by columns for the test model, test statistic, columns compared (for odds ratios), observed statistic value, and empirical p-value. Each line contains one test result and the haplotype or SNP that was used as the exposure variable for the test. The output is tab-delimited and can easily be imported into an Excel spreadsheet for viewing. Example files are included in the zip file.
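
To filter the .out files programmatically instead of in a spreadsheet, a small reader is enough. This is a sketch under an assumption the description above implies but does not spell out: the last five tab-separated columns of each data line are the test model, test statistic, columns compared, observed value, and empirical p-value, and the columns before them hold the marker alleles named in the header. `read_out_file`, `significant`, and `OutRow` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumed trailing columns: model, statistic, columns compared, observed value, p-value.
N_TEST_FIELDS = 5

@dataclass
class OutRow:
    alleles: Dict[str, str]  # marker name -> allele of the exposure haplotype/SNP
    model: str
    statistic: str
    compared: str            # may be empty for tests other than odds ratios
    observed: float
    p_value: float

def read_out_file(lines: Iterable[str]) -> List[OutRow]:
    """Parse a tab-delimited .out file (assumed layout: markers, then five test columns)."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return []
    rows = []
    for fields in reader:
        if len(fields) < N_TEST_FIELDS:
            continue  # skip blank or malformed lines
        n_markers = len(fields) - N_TEST_FIELDS
        model, statistic, compared, observed, p_value = fields[n_markers:]
        rows.append(OutRow(
            alleles=dict(zip(header[:n_markers], fields[:n_markers])),
            model=model, statistic=statistic, compared=compared,
            observed=float(observed), p_value=float(p_value),
        ))
    return rows

def significant(rows: List[OutRow], alpha: float) -> List[OutRow]:
    """Keep results whose empirical p-value is at most alpha."""
    return [row for row in rows if row.p_value <= alpha]
```

For example, opening increase_risk.out with `newline=""`, passing the file object to `read_out_file`, and then calling `significant(rows, 0.05)` lists the risk-increasing results with an empirical p-value of at most 0.05.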
