  • Genome Query Tools (GQT) (Slides)
    • Genome Query Tools (GQT) is a tool and C API for storing and querying large-scale genotype data sets like those produced by 1000 Genomes. Genotypes are represented by compressed bitmap indices, which reduce the storage and compute burden by orders of magnitude. This index can significantly expand the capabilities of population-scale analyses by providing interactive-speed queries to data sets with millions of individuals.

      Get GQT now:
      # Download and make htslib
      git clone https://github.com/samtools/htslib.git
      cd htslib
      make
      cd ..

      # Download sqlite amalgamation source
      wget http://www.sqlite.org/2014/sqlite-amalgamation-3080701.zip
      unzip sqlite-amalgamation-3080701.zip

      # Get GQT
      git clone https://github.com/ryanlayer/gqt.git
      cd gqt
      # Modify the HTS_ROOT and SQLITE_ROOT variables in src/Makefile
      make

      # Test GQT
      cd test/func/
      bash functional_tests.sh
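
The bitmap idea behind the GQT description can be illustrated with a small sketch. This is not GQT's C API or its on-disk layout: the function names (`build_bitmaps`, `count_carriers`), the genotype coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate), and the use of plain Python integers as uncompressed bitmaps are all assumptions made for illustration, and the compression step is left out entirely.

```python
from typing import Dict, List


def build_bitmaps(genotypes: List[int]) -> Dict[int, int]:
    """Build one bitmap per genotype value for a single variant.

    Bit i of bitmaps[g] is set when individual i has genotype g.
    Hypothetical coding: 0 = hom. ref, 1 = het, 2 = hom. alt.
    """
    bitmaps = {0: 0, 1: 0, 2: 0}
    for individual, genotype in enumerate(genotypes):
        bitmaps[genotype] |= 1 << individual
    return bitmaps


def count_carriers(bitmaps: Dict[int, int], sample_mask: int) -> int:
    """Count individuals in sample_mask that carry a non-reference allele."""
    # OR merges the het and hom-alt bitmaps; AND restricts to the subset.
    carriers = (bitmaps[1] | bitmaps[2]) & sample_mask
    return bin(carriers).count("1")


# Six individuals genotyped at one variant.
genotypes = [0, 1, 2, 0, 1, 0]
bitmaps = build_bitmaps(genotypes)

# Restrict the query to individuals 0..3 (an assumed sub-population).
subset_mask = 0b001111
carrier_count = count_carriers(bitmaps, subset_mask)
print(carrier_count)
```

The point of the sketch is that a query over many individuals reduces to a few bitwise operations and a bit count, which is what makes bitmap indices attractive for this kind of data.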
  • LUMPY
    • Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals, including read-pair, split-read, read-depth and prior knowledge. LUMPY is a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low-coverage data or low intra-sample variant allele frequency.
  • Binary Interval Search (BITS)
    • The Binary Interval Search (BITS) algorithm is a novel and scalable approach to interval set intersection. BITS outperforms existing methods at counting interval intersections. Moreover, BITS is intrinsically suited to parallel computing architectures such as graphics processing units (GPUs), as illustrated by its utility for efficient Monte Carlo simulations that measure the significance of relationships between sets of genomic intervals.
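
Counting interval intersections with binary searches can be sketched as follows. This is a minimal illustration, not the BITS implementation: the function names (`build_index`, `count_intersections`) are hypothetical, and intervals are assumed to be closed, so two intervals that share a single coordinate count as intersecting. The idea is that an interval misses the query only if it ends before the query starts or starts after the query ends, and each of those two counts is one binary search over a sorted list.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_index(intervals: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Sort start and end coordinates independently (done once per set)."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    return starts, ends


def count_intersections(starts: List[int], ends: List[int],
                        query_start: int, query_end: int) -> int:
    """Count indexed intervals that intersect [query_start, query_end]."""
    # Intervals lying entirely to the left of the query: end < query_start.
    ending_before = bisect_left(ends, query_start)
    # Intervals lying entirely to the right of the query: start > query_end.
    starting_after = len(starts) - bisect_right(starts, query_end)
    # Everything else must intersect the query.
    return len(starts) - ending_before - starting_after


database = [(1, 5), (3, 8), (10, 12), (15, 20)]
starts, ends = build_index(database)
overlap_count = count_intersections(starts, ends, 4, 11)
print(overlap_count)
```

Because each query needs only two independent binary searches and no shared state, many queries can be evaluated in parallel, which is consistent with the GPU suitability described above.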