Dataset Information

Below is specific information on each dataset: study site, sample characteristics, and sequencing pipeline.

Each dataset is represented as “SiteID StudyID”.

Currently included datasets are:

  1. UMiami WES
  2. PPMI WES

UMiami WES

Contact information:
PI: Jeffery Vance
Morris K. Udall Parkinson Disease Research Center of Excellence
John P. Hussman Institute for Human Genomics
University of Miami, Miller School of Medicine
1501 NW 10th Ave, Biomedical Research Building
Miami, FL 33136
+1 (305) 243-2283

Sample information:
The UMiami WES dataset includes 396 unrelated white patients (cases) and 222 controls.
The 396 cases have an average onset age of 55.5 ± 15.1 years (range: 14-83); 62.2% are male and 69% have a known positive family history.
The 222 controls have an average inclusion age of 68.1 ± 11.0 years (range: 50-95); 28.1% are male.

Pipeline information:
Fragmented DNA was captured using the SureSelect Human All Exon Kit, designed to cover 38 Mb or 50 Mb of human genomic sequence. We used the 38 Mb capture kit for the initial 21 individuals; all remaining samples were processed using the 50 Mb version 3 kit. The libraries were loaded onto an Illumina cBot for cluster generation. The primer-hybridized flow cells were then transferred to HiSeq 2000 sequencers (Illumina, San Diego, CA), and paired-end sequencing was done in 2 × 101 bp mode (3-plex or 4-plex). Base calling was done with the Illumina CASAVA 1.6 pipeline, and reads were aligned to hg19 with BWA. Variant calling was performed using the Genome Analysis Toolkit (GATK). The Unified Genotyper from GATK performs variant quality score (VQS) recalibration and genotype refinement to make accurate variant calls. Additionally, the Unified Genotyper generates normalized Phred-scaled likelihood (PL) scores, without priors, for each alternate genotype. Variants with VQSLOD < -3, depth < 6 and alternate PL scores < 99 are excluded from the rest of the analysis presented here. Subsequent quality control included principal component analyses using common and rare variants, variant call-rate checks, and relatedness checks.
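As an illustration of the three exclusion thresholds quoted above, the following is a minimal Python sketch, not the pipeline's actual code. The `VariantCall` record and the `passes_filters` helper are hypothetical names introduced here. The sketch also assumes that failing any one of the three thresholds is enough to exclude a call, which the text does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds quoted in the pipeline description.
MIN_VQSLOD = -3.0
MIN_DEPTH = 6
MIN_ALT_PL = 99


@dataclass
class VariantCall:
    """Hypothetical record holding the three metrics named in the text."""
    vqslod: float  # log-odds score from variant quality score recalibration
    depth: int     # read depth for the call
    alt_pl: int    # Phred-scaled likelihood (PL) score for the alternate genotype


def passes_filters(call: VariantCall) -> bool:
    """Return True if the call meets all three thresholds.

    Assumption: failing any single threshold excludes the call.
    """
    return (
        call.vqslod >= MIN_VQSLOD
        and call.depth >= MIN_DEPTH
        and call.alt_pl >= MIN_ALT_PL
    )


# Made-up example values, for illustration only.
calls = [
    VariantCall(vqslod=2.5, depth=30, alt_pl=99),   # meets all thresholds
    VariantCall(vqslod=-4.0, depth=30, alt_pl=99),  # VQSLOD too low
    VariantCall(vqslod=1.0, depth=4, alt_pl=99),    # depth too low
    VariantCall(vqslod=1.0, depth=12, alt_pl=40),   # PL score too low
]
kept = [c for c in calls if passes_filters(c)]
```

If the intended rule were instead that a call is excluded only when all three conditions hold at once, the `and` chain in `passes_filters` would become an `or` chain.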

PPMI WES

Contact information:
More info on this dataset can be found at:

Sample information:
DNA was extracted from blood according to the PPMI Research Biomarkers Laboratory Manual. The PPMI WES dataset includes 462 cases and 183 unaffected individuals (“healthy controls”).
The 462 cases have an average onset age of 59.9 ± 10.1 years (range: 25-83); 64.3% are male and 24.9% have a known positive family history. The 183 unaffected individuals (“healthy controls”) have an average inclusion age of 60.8 ± 11.2 years (range: 31-84); 63.9% are male. Ten of these healthy controls report PD in paternal or maternal aunts, uncles, or grandparents.

Pipeline Information:
Library preparation for next-generation sequencing was performed with the Nextera Rapid Capture Expanded Exome Kit per the manufacturer’s protocol (Illumina, Inc., San Diego). Nextera Expanded Exome targets 201,121 exons, UTRs and miRNAs and covers 95.3% of the RefSeq exome. More than 340,000 probes were constructed against the human NCBI37/hg19 reference genome. The targeted genomic footprint is 62 Mb. Exome-enriched libraries (multiplexed sets of 12 samples) were sequenced on the Illumina HiSeq 2500 sequencing platform using 2 × 100 bp paired-end read cycles.

Briefly, paired-end sequence reads (fastq files) were aligned against the reference human genome (UCSC hg19) using BWA. Duplicate read removal, format conversion, and indexing were performed with Picard. The Genome Analysis Toolkit (GATK) was used to recalibrate base quality scores and perform local realignment around indels for the aligned sequencing reads (per-subject bam file). Variant calls and genotype likelihoods were generated per subject using the GATK HaplotypeCaller (per-subject genomic vcf file). GATK CombineGVCFs and GenotypeGVCFs were used to perform joint genotyping for the cohort from the set of per-subject genomic vcf files. Variant filtering was then applied using the GATK Variant Quality Score Recalibration tools (cohort vcf file). Subject quality control was performed using PLINK, based on variant call-rate, heterozygosity rate, gender check, relatedness/duplicates, and population outliers. {adjusted from file: PPMI_Methods_Exome_sequencing_116_20150311-1.pdf available at:}
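To make two of the subject-level quality-control metrics named above concrete, here is a minimal Python sketch of per-subject call rate and heterozygosity rate. It is an illustration only: the pipeline computes these with PLINK, not with this code. The genotype coding (0, 1, or 2 alternate alleles, `None` for a missing call), the function names, the sample IDs, and the 0.95 call-rate threshold are all assumptions introduced here; the source does not state any thresholds.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, or 2 alternate alleles; None means a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of sites with a non-missing genotype."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def heterozygosity_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(g == 1 for g in called) / len(called)


MIN_CALL_RATE = 0.95  # assumed threshold, not taken from the source

# Made-up genotypes for two hypothetical subjects.
samples = {
    "S1": [0, 1, 2, 0, 1, 0, 0, 1, 0, 0],
    "S2": [0, None, None, 0, 1, 0, 0, 1, 0, 0],
}

# Subjects whose call rate falls below the assumed threshold.
low_call_rate = [sid for sid, g in samples.items() if call_rate(g) < MIN_CALL_RATE]
```

In a real cohort, heterozygosity rates are usually compared across subjects to spot outliers; that comparison, along with the gender, relatedness, and population checks, is left out of this sketch.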

