Proceedings of the 6th Workshop on Encrypted Computing & Applied Homomorphic Cryptography - WAHC '18, 2018
We describe our recent experience building a system that uses fully-homomorphic encryption (FHE) to approximate the coefficients of a logistic-regression model built from genomic data. The aim of this project was to examine the feasibility of a solution that operates "deep within the bootstrapping regime," solving a problem that appears too hard to be addressed with somewhat-homomorphic encryption alone. As part of this project, we implemented optimized versions of many "bread and butter" FHE tools, including binary arithmetic, comparisons, partial sorting, and low-precision approximation of "complicated functions" such as reciprocals and logarithms. Our eventual solution can handle thousands of records and hundreds of fields, and it takes a few hours to run. To achieve this performance, we had to be extremely frugal with expensive bootstrapping and data-movement operations. We believe that our experience in this project could serve as a guide to what is or is not currently feasible with fully-homomorphic encryption.
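Division is not a native FHE operation, so reciprocals are usually approximated with additions and multiplications alone. The plaintext sketch below uses Newton-Raphson iteration, one common choice for this; it is not taken from the authors' implementation, and the input range (0, 2) and the iteration count are illustrative assumptions.

```python
def approx_reciprocal(a: float, x0: float = 1.0, iterations: int = 5) -> float:
    """Approximate 1/a with Newton-Raphson steps that use only + and *.

    Each step x <- x * (2 - a*x) squares the relative error e = 1 - a*x,
    so it converges whenever 0 < x0 < 2/a. With x0 = 1 this assumes the
    input has been scaled into (0, 2) beforehand (an illustrative range).
    """
    x = x0
    for _ in range(iterations):
        # Two multiplications per step: in an FHE circuit each step costs
        # roughly two levels of multiplicative depth.
        x = x * (2.0 - a * x)
    return x


# Plaintext check against ordinary division.
print(approx_reciprocal(0.8), 1 / 0.8)
print(approx_reciprocal(1.5), 1 / 1.5)
```

Because the error squares at every step, the number of iterations (and hence the depth consumed) is driven by the target precision and the width of the input range; a low-precision target keeps it small.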
In recent years, there has been tremendous progress in the development of quantum computing hardware, algorithms and services, leading to the expectation that in the near future quantum computers will be capable of performing simulations for natural science applications, operations research, and machine learning at scales mostly inaccessible to classical computers. Whereas the impact of quantum computing has already started to be recognized in fields such as cryptanalysis, natural science simulations, and optimization, among others, very little is known about the full potential of quantum computing simulations and machine learning in the realm of healthcare and life sciences (HCLS). Herein, we discuss the transformational changes we expect from the use of quantum computation for HCLS research, specifically in the field of cell-centric therapeutics. Moreover, we identify and elaborate on open problems in cell engineering, tissue modeling, perturbation modeling, and bio-topology, while discussing candidate quantum algorithms for research on these topics and their potential advantages over classical computational approaches.
Genome-wide association studies (GWAS) of multiple populations with distinctive genetic and lifestyle backgrounds are crucial to understanding Type 2 Diabetes Mellitus (T2DM) pathophysiology. We report a GWAS on the genetic basis of T2DM in 3,286 Lebanese participants. More than 5,000,000 SNPs were directly genotyped or imputed using the 1000 Genomes Project reference panels. We identify genome-wide significant variants in two loci, CDKAL1 and TCF7L2, independent of sex, age and BMI, with leading variants rs7766070 (OR = 1.39, P = 4.77 × 10^-9) and rs34872471 (OR = 1.35, P = 1.01 × 10^-8), respectively. The current study is the first GWAS to find genomic regions implicated in T2DM in the Lebanese population. The results support a central role of CDKAL1 and TCF7L2 in T2DM susceptibility in Southwest Asian populations and provide a plausible component for understanding the molecular mechanisms involved in the disease.
Type 2 diabetes mellitus (T2DM) is a chronic metabolic disease with a complex pathogenesis defined by genetic predisposition and environmental factors 1. In Lebanon, T2DM was evaluated in 8,050 Lebanese cases in 1990. Its prevalence and incidence, similar to the international averages, were 5.0% and 1.5 to 1.7%, respectively 2. In 1992, a study of 436 cases from across Lebanese territory gave a prevalence of 7 to 8% for T2DM and 10 to 11% for impaired glucose tolerance 3. In 2005, a study of 3,000 exclusively Lebanese individuals from Greater Beirut showed a prevalence of 11.3%, which increased with age 4. The combined prevalence of previously and newly diagnosed T2DM was 15.8%. At that time, 6.3% of the U.S. population had T2DM (4.5% diagnosed and 1.8% undiagnosed), according to the 2004 National Diabetes Fact Sheet.
These results suggest that the prevalence of T2DM in Greater Beirut is relatively high and is increasing among the Lebanese population. Furthermore, according to the latest figures in the American National Diabetes Statistics Report, 29.1 million children and adults in the United States have diabetes, and 86 million people have prediabetes 5. The pathogenesis of T2DM is closely associated with a positive family history, male gender, age over 45 years, overweight, hypertension, and abnormal lipid levels. In addition, the genetic contribution to T2DM is well recognized, with a total of 91 established susceptibility loci. However, the common variants at the reported loci account for only a small proportion of the heritability of T2DM, and the functional role of most of these variants remains far from clear. It is possible that a large number of highly penetrant but rare T2DM susceptibility variants remain to be identified. Additional genome-wide explorations, including whole-genome and exome sequencing in well-established groups of patients and controls, may uncover these additional genetic contributors, which will undoubtedly help us understand the complex mechanisms involved in the development of T2DM. The prevalence of T2DM varies distinctly across populations, and this variability adds to the complexity of the disease. It could be due to differences in lifestyle factors, such as dietary habits, as well as behavioral patterns among populations. Multiple established T2DM susceptibility loci have been identified by previous genome-wide association studies (GWAS) in populations of European and Asian ethnicities. It is, however, equally important to replicate the behavior of these previously discovered associated
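Both lead variants reported above clear the conventional genome-wide significance threshold of 5 × 10^-8. As a minimal sketch of how a per-SNP odds ratio and P value are obtained, the unadjusted allelic test below works from a 2×2 table of allele counts. The study's analysis adjusted for sex, age and BMI, which this sketch does not do, and the counts used here are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Allelic odds ratio and two-sided Wald p-value from allele counts.

    Unadjusted sketch: no covariates, and all four counts must be non-zero.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(log_or), p


# Hypothetical allele counts for one SNP (not data from the study).
odds_ratio, p = allelic_test(case_alt=1300, case_ref=1900, ctrl_alt=1000, ctrl_ref=2000)
print(f"OR = {odds_ratio:.2f}, P = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```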
A Quantitative and Qualitative Characterization of k-mer Based Alignment-Free Phylogeny Construction
Lecture Notes in Computer Science, 2019
The rapidly growing volume of genomic data, including data from pathogens, both invites exploration of possible phylogenetic relationships among unclassified organisms and challenges standard techniques that require multiple sequence alignment. Further, probing variations in selection pressure, e.g., among viral outbreaks, provides an important characterization of the life of a virus in its biological reservoir.
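Alignment-free methods replace a multiple sequence alignment with a comparison of k-mer frequency profiles. The sketch below uses cosine distance on raw k-mer counts, which is only one of many possible k-mer distances and not necessarily one characterized in the paper; the sequences are hypothetical. The resulting matrix could then be passed to a distance-based tree method such as neighbor joining or UPGMA.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p: Counter, q: Counter) -> float:
    """1 - cosine similarity of two k-mer count vectors."""
    dot = sum(p[kmer] * q[kmer] for kmer in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


def kmer_distance_matrix(seqs: list[str], k: int) -> list[list[float]]:
    """Pairwise alignment-free distances; no multiple alignment is needed."""
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[cosine_distance(p, q) for q in profiles] for p in profiles]


# Toy sequences (hypothetical), k = 3.
seqs = ["ACGTACGTGA", "ACGTACGTTA", "TTGCATGCAA"]
for row in kmer_distance_matrix(seqs, k=3):
    print([round(d, 3) for d in row])
```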
Background Forced displacement and war trauma cause high rates of post-traumatic stress, anxiety disorders and depression in refugee populations. We investigated the impact of forced displacement on mental health status and on the presentation of type 2 diabetes (T2D) and associated inflammatory markers in men and women among Syrian refugees in Lebanon. Methods Mental health status was assessed using the Harvard Trauma Questionnaire (HTQ) and the Hopkins Symptom Checklist-25 (HSCL-25). Additional metabolic and inflammatory markers were analyzed. Although symptomatic stress scores were observed in both men and women, women consistently displayed higher symptomatic anxiety/depression scores on the HSCL-25 (2.13 ± 0.58 versus 1.95 ± 0.63). On the HTQ, however, only women aged 35-55 years displayed symptomatic post-traumatic stress disorder (PTSD) scores (2.18 ± 0.43). Furthermore, a significantly higher prevalence of obesity, prediabetes and undiagnosed T2D was observed in women participants (23.43, 14.91 and 15.18%, respectively). Significantly higher levels of the inflammatory marker serum amyloid A were observed in women (11.90 ± 11.27 versus 9.28 ± 6.93, P = 0.036). Conclusions Symptomatic PTSD and anxiety/depression, coupled with higher levels of an inflammatory marker and T2D, were found in refugee women aged between 35 and 55 years, supporting a strong need for psychosocial therapeutic interventions to moderate stress-related immune dysfunction and the development of diabetes in this subset of female Syrian refugees.
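The HSCL-25 values quoted above are mean item scores on a 1 to 4 scale. A minimal scoring sketch follows, assuming the standard item layout (items 1-10 anxiety, items 11-25 depression) and the commonly used cut-off of 1.75 for symptomatic scores; the study's exact scoring rules are not stated here, and the respondent is hypothetical.

```python
HSCL25_CUTOFF = 1.75  # cut-off commonly used for "symptomatic" mean scores


def score_hscl25(items: list[int]) -> dict:
    """Mean-item scores for the Hopkins Symptom Checklist-25.

    Assumes the standard layout: items 1-10 anxiety, items 11-25
    depression, each rated from 1 (not at all) to 4 (extremely).
    """
    if len(items) != 25 or any(not 1 <= x <= 4 for x in items):
        raise ValueError("expected 25 item ratings between 1 and 4")
    total = sum(items) / 25
    return {
        "anxiety": sum(items[:10]) / 10,
        "depression": sum(items[10:]) / 15,
        "total": total,
        "symptomatic": total >= HSCL25_CUTOFF,
    }


# Hypothetical respondent: mostly "a little" (2) with some "quite a bit" (3).
print(score_hscl25([2] * 15 + [3] * 10))
```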
The role of lipoprotein(a) (Lp(a)) in increasing the risk of cardiovascular disease has been reported in several populations. The aim of this study was to investigate the correlation of high Lp(a) levels with the degree of coronary artery stenosis. Methods: Two hundred and sixty-eight patients were enrolled in this study. Patients who underwent coronary angiography and had Lp(a) measurements available were included. Binomial logistic regressions were applied to investigate the association between Lp(a) and stenosis in the four major coronary arteries. The effect of LDL and HDL cholesterol in modulating the association of Lp(a) with coronary artery disease (CAD) was also evaluated. Multinomial regression analysis was applied to assess the association of Lp(a) with the different degrees of stenosis in the four major coronary arteries. Results: Our analyses showed that Lp(a) is a risk factor for CAD and that this risk is significantly apparent in patients with HDL cholesterol ≥35 mg/dL and in non-obese patients. A large proportion of the study patients with elevated Lp(a) levels had CAD even when exhibiting high serum HDL levels. Increased HDL with low serum Lp(a) levels was the least correlated with stenosis. Significantly higher levels of Lp(a) were found in patients with >50% stenosis in at least two major coronary vessels, pointing to pronounced and multiple stenotic lesions. Finally, the derived variant (rs1084651) of the LPA gene was significantly associated with CAD. Our study highlights the importance of Lp(a) levels as an independent biological marker of severe and multiple coronary artery stenosis.
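A binomial logistic regression models the log-odds of a binary outcome (here, stenosis) as a linear function of the predictors. The pure-Python sketch below fits a single-predictor model by Newton-Raphson; the data are hypothetical, and the study's models also considered LDL and HDL cholesterol and multinomial outcomes, which this sketch omits. exp(b1) is the odds ratio per unit increase in the predictor.

```python
import math


def fit_logistic_1d(x: list[float], y: list[int], iterations: int = 25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Assumes the classes overlap (no perfect separation); otherwise the
    maximum-likelihood estimate does not exist and b1 diverges.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # observed information (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: Lp(a) in units of 10 mg/dL, y = 1 if significant stenosis.
lpa = [1, 2, 3, 4, 5, 6, 7, 8]
stenosis = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(lpa, stenosis)
print(f"odds ratio per 10 mg/dL of Lp(a): {math.exp(b1):.2f}")
```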
Background: The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it fatal in certain patients. Methods: This study investigated 819 COVID-19 patients admitted to the COVID-19 ward at a tertiary care center in Lebanon and evaluated their vital signs and biomarkers while probing for two main outcomes: intubation and fatality. Results: Correlation analysis of various comorbidities revealed that hypertension, diabetes, being overweight, kidney disease, cardiovascular disease, autoimmune disease, and gender are independent risk factors for both intubation and fatality. Shortness of breath, age and being overweight correlated with intubation, while fatality correlated with shortness of breath in our group of patients. An elevated serum creatinine level was the factor most strongly correlated with fatality, while both white blood cell count and serum glutam...
Parkinson's disease (PD) is a progressive neurodegenerative movement disorder characterized by loss of striatal dopaminergic neurons. Progression of PD is usually captured by a host of clinical features represented in different rating scales. PD is associated with a broad spectrum of non-motor symptoms, such as depression and sleep disorders, as well as motor symptoms, such as movement impairment. The variability within the clinical phenotype of PD makes detecting the genes associated with early-onset PD a difficult task. To address this issue, we developed CuNA, a cumulant-based network analysis algorithm that builds a network from higher-order relationships between eQTLs and phenotypes, as captured by cumulants. We also designed a multi-omics simulator, CuNAsim, to test CuNA's qualitative accuracy. CuNA accurately detects communities of clinical phenotypes and finds genes associated with them. Applying CuNA to PD data, we find previously unreported genes INPP5J, S...
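Cumulants capture dependence beyond pairwise correlation: for jointly Gaussian variables all joint cumulants above order two vanish, and a joint cumulant is zero whenever the variables split into two mutually independent groups. The sketch below computes plug-in estimates of third- and fourth-order joint cumulants; it is not CuNA's implementation, and the variables in the example are hypothetical.

```python
import math
from statistics import fmean


def _centered(v):
    m = fmean(v)
    return [a - m for a in v]


def _mean_product(*cols):
    """Sample mean of the element-wise product of equally long columns."""
    return fmean([math.prod(t) for t in zip(*cols)])


def cumulant3(x, y, z):
    """Third-order joint cumulant: the mean of the product after centering."""
    x, y, z = _centered(x), _centered(y), _centered(z)
    return _mean_product(x, y, z)


def cumulant4(w, x, y, z):
    """Fourth-order joint cumulant of four variables.

    The centered four-way moment minus the three products of pairwise
    covariances, i.e. what pairwise correlation alone cannot explain.
    """
    w, x, y, z = [_centered(v) for v in (w, x, y, z)]
    return (_mean_product(w, x, y, z)
            - _mean_product(w, x) * _mean_product(y, z)
            - _mean_product(w, y) * _mean_product(x, z)
            - _mean_product(w, z) * _mean_product(x, y))


# Hypothetical eQTL dosage and two phenotype scores for nine individuals.
dosage = [0, 1, 2, 0, 1, 2, 0, 1, 2]
tremor = [0.2, 0.8, 2.5, 0.1, 1.0, 2.2, 0.0, 0.9, 2.8]
sleep = [1.0, 1.2, 2.9, 0.8, 1.1, 3.1, 0.9, 1.3, 2.7]
print(cumulant3(dosage, tremor, sleep))
```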
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole-genome population genetics, modeling selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding the evolution of complex genetic systems. We present EpiSimRA, a backward coalescent model that incorporates multi-locus selection with multi-way (k-way) epistasis for any arbitrary k. Starting from arbitrary extant populations with epistatic sites, we trace the Ancestral Recombination Graph (ARG), sampling the relevant recombination and coalescent events. Our framework allows for studying different complex evolutionary scenarios in the presence of selective sweeps and positive and negative selection with multi-way epistasis. We also present a forward counterpart of the coalescent model based on a Wright-Fisher (WF) process, which we use as a validation framework, comparing the hallmarks of the ARG between the two. We provide the first...
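A forward Wright-Fisher process draws each new generation from the previous one with probability proportional to fitness. The toy sketch below uses a hypothetical k-way epistatic fitness (a bonus s only when all k derived alleles co-occur) and, unlike EpiSimRA, ignores recombination and mutation, so it can neither build nor break up the epistatic haplotype.

```python
import random


def haplotype_has_all(genotype, loci):
    """True if every locus in the k-way epistatic set carries the derived allele (1)."""
    return all(genotype[i] == 1 for i in loci)


def wright_fisher_epistasis(population, loci, s, generations, rng):
    """Forward haploid Wright-Fisher model with one k-way epistatic interaction.

    Fitness is 1 + s only for genomes carrying all k derived alleles and 1
    otherwise (a hypothetical fitness model). No mutation or recombination.
    Returns the frequency of the full epistatic haplotype per generation.
    """
    n = len(population)
    freqs = [sum(haplotype_has_all(g, loci) for g in population) / n]
    for _ in range(generations):
        weights = [1.0 + s if haplotype_has_all(g, loci) else 1.0 for g in population]
        # Each offspring picks a parent with probability proportional to fitness.
        population = rng.choices(population, weights=weights, k=n)
        freqs.append(sum(haplotype_has_all(g, loci) for g in population) / n)
    return freqs


# 100 haploid genomes with 4 loci; loci 0-2 form a 3-way epistatic set.
rng = random.Random(7)
pop = [(1, 1, 1, 0)] * 10 + [(1, 1, 0, 1)] * 45 + [(0, 0, 0, 0)] * 45
trajectory = wright_fisher_epistasis(pop, loci=(0, 1, 2), s=0.05, generations=100, rng=rng)
print(trajectory[0], trajectory[-1])
```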
Currently, 18 different religious communities live in Lebanon. While evolving primarily within Lebanon, these communities show a level of local isolation, as demonstrated previously by their Y-haplogroup distributions. To trace the origins and migratory patterns that may have led to the genetic isolation and autosomal clustering in some of these communities, we analyzed Y-chromosome STR and SNP data from 6,327 individuals, in addition to whole-genome autosomal data from 609 individuals, from Mount Lebanon and other surrounding communities. We observed Y-chromosome L1b Levantine STR branching that occurred around 5,000 years ago. Autosomal DNA analyses suggest that the North Lebanese Mountain Maronite community possesses an ancestral Fertile Crescent genetic component distinct from other populations in the region. We suggest that the Levantine L1b group split from the Caucasus ancestral group around 7,300 years ago and migrated to the Levant. This event was distinct from the earlier expansions from the Caucasus region that contributed to the wider Levantine populations. Differential cultural adaptation by populations from the North Lebanese Mountains is clearly aligned with the L1b STR haplotype clusters, indicating pre-existing and persistent cultural barriers marked by the transmission of L1b lineages. Our findings highlight the value of uniparental haplogroups and STR haplotype data for elucidating biosocial events among these populations.
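STR-based dates such as these are often derived from the average squared distance (ASD) between observed haplotypes and an ancestral haplotype. The abstract does not state which dating method was used; the sketch below assumes a symmetric single-step mutation model, and its mutation rate (0.0025 per locus per generation), generation time (25 years) and haplotypes are all illustrative assumptions.

```python
def asd_age_years(haplotypes, ancestral, mutation_rate=0.0025, generation_years=25.0):
    """Rough time since a common ancestor from Y-STR haplotypes.

    Under a symmetric single-step mutation model, the expected squared
    difference in repeat count from the ancestor grows by `mutation_rate`
    per generation, so generations ~= ASD / mutation_rate.
    Both default parameters are illustrative assumptions.
    """
    n_loci = len(ancestral)
    squared = sum((h[i] - ancestral[i]) ** 2 for h in haplotypes for i in range(n_loci))
    asd = squared / (len(haplotypes) * n_loci)  # average squared distance per locus
    return asd / mutation_rate * generation_years


# Hypothetical 2-locus haplotypes and an assumed ancestral (modal) haplotype.
haps = [(14, 12), (15, 12), (13, 12), (14, 13)]
print(round(asd_age_years(haps, ancestral=(14, 12))))
```

With these assumed parameters the toy haplotypes give roughly 3,750 years; the estimate scales inversely with the assumed mutation rate, which is why published STR dates carry wide uncertainty.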
India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification operating in concert. To date, no study has attempted to model and evaluate how these evolutionary forces have interacted to shape the patterns of genetic diversity within India. Geography has been shown to correlate closely with genetic structure in other parts of the world. However, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity. We merged all publicly available data from the Indian subcontinent into a data set of 835 individuals from 84 well-defined groups, genotyped at 48,373 SNPs. Bringing together geography, sociolinguistics and genetics, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that optimally explains the observed population genetic substructure. We find that shared language rather than geography or social stru...
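COGG itself is not described in the excerpt above. A standard way to ask whether genetic distance tracks geographic distance is the Mantel test, shown here as a swapped-in, simpler technique rather than COGG; the distance matrices are hypothetical.

```python
import math
import random


def _upper_triangle(m, order):
    """Upper-triangle entries of m after reordering rows and columns by `order`."""
    n = len(m)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel_test(d1, d2, permutations=999, rng=None):
    """One-sided Mantel test for positive correlation between two distance matrices.

    The null distribution comes from permuting the rows and columns of d2
    together, which preserves its internal structure.
    """
    rng = rng or random.Random()
    n = len(d1)
    identity = list(range(n))
    x = _upper_triangle(d1, identity)
    r_obs = _pearson(x, _upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if _pearson(x, _upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical groups along a line, with genetic distance exactly
# proportional to geographic distance.
coords = [0, 1, 3, 6, 10, 15]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[0.01 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen, rng=random.Random(0))
print(f"r = {r:.3f}, one-sided p = {p:.3f}")
```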
Background: Complex diseases may have multiple pathways leading to disease. For example, coronary artery disease evolves from damage to the epithelial layers of the arteries, but has multiple causal pathways. More challenging still, those pathways are highly correlated within metabolic syndrome. The challenge is to identify specific clusters of phenotype characteristics (composite phenotypes) that may reflect these different etiologies. Further, GWAS seeking SNPs that satisfy multiple composite phenotype descriptions allow for lower false-positive rates at lower α thresholds, opening the possibility of reducing false negatives. This may provide a window into the missing heritability problem. Methods: We identify significant phenotype patterns and identify fuzzy redescriptions among those patterns using Jaccard distances. Further, we construct Vietoris-Rips complexes from the Jaccard distances and compute their persistent homology. The patterns comprising these topological features are identified as composite phenotypes, whose genetic associations are explored with logistic regression applied to pathways and to GWAS. We identified several phenotypes that tended to be dominated by metabolic syndrome descriptions and that were distinct among the combinations of metabolic syndrome conditions. Among SNPs marking the RAAS complex, various SNPs were associated specifically with different groups of composite phenotypes and distinguished between composite and simple phenotypes. Each of these showed different genetic associations, namely rs6693954, rs762551, rs1378942, and rs1133323. SNPs identified by GWAS as associated with composite phenotypes included rs12365545, rs6847235, and rs701319. Eighteen GWAS-identified SNPs were supported by composite combinations with greater power than for any individual phenotype.
Conclusions: We find systematic associations among metabolic syndrome variates that show distinctive genetic association profiles. Further, the systematic characterization involves composite phenotype descriptions that combine the power of individual phenotype GWAS tests, yielding greater significance at lower individual thresholds and permitting the exploration of SNPs that would otherwise appear as false negatives.
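The Methods above build Vietoris-Rips complexes from Jaccard distances between phenotype patterns. The sketch below computes Jaccard distances between patient sets and only the 0-dimensional persistence (connected components), which for a Rips filtration reduces to the merge heights of single-linkage clustering. Higher-dimensional homology, as used in the study, would need a dedicated library such as GUDHI or Ripser; the patient sets are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def h0_persistence(patterns):
    """0-dimensional persistence pairs of the Vietoris-Rips filtration.

    Every vertex is born at 0; a component dies at the edge length where it
    merges into another (Kruskal-style union-find). One component never dies.
    """
    n = len(patterns)
    edges = sorted(
        (jaccard_distance(patterns[i], patterns[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, dist))
    pairs.append((0.0, float("inf")))
    return pairs


# Hypothetical phenotype patterns, each given as the set of matching patient IDs.
patterns = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}]
print(h0_persistence(patterns))
```

Long-lived bars (late deaths) mark groups of patterns that stay well separated across many distance scales.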
Background The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating, despite several mitigation efforts and vaccination campaigns. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it severe and fatal in certain patients. Studies have shown that COVID-19 patients with kidney injury on admission were more likely to develop severe disease, and that acute kidney disease was associated with high mortality in hospitalized COVID-19 patients. Methods This study investigated 819 COVID-19 patients admitted between January 2020 and April 2021 to the COVID-19 ward at a tertiary care center in Lebanon and evaluated their vital signs and biomarkers while probing for two main outcomes: intubation and fatality. Logistic and Cox regressions were performed to investigate the association between clinical and metabolic variables and disease outcomes, mainly intubation and mortality. Times were defined in terms of admissio...
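The study fitted Cox regressions, which require an iterative partial-likelihood fit. As a simpler, swapped-in illustration of handling time-to-event data with censoring, the sketch below computes a Kaplan-Meier survival curve. It assumes time is measured in days from admission (the abstract's definition is truncated), that death is the event, and that discharge counts as censoring; the cohort is hypothetical. For Cox models themselves, a package such as lifelines (CoxPHFitter) is one option.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times:  time from admission to death or censoring (assumed unit: days)
    events: 1 if death was observed, 0 if censored (e.g. discharged alive)
    Subjects censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical cohort of five patients.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```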
Biological pathways play a crucial role in the properties of diseases and are important in drug discovery. Identifying the logical relationships among distinct phenotypic clusters could reveal possible connections to the underlying pathways. However, this process is challenging, since clinical phenotypes are often available only through unstructured electronic health records. Moreover, in the absence of a standardized questionnaire, physicians may be biased toward selecting certain medical terms. In this article, we develop an efficient pipeline to address these challenges and help practitioners reveal the pathways associated with the disease. We use topological data analysis and redescriptions and propose a pipeline of four phases: (1) preprocessing the clinical notes to extract the salient concepts, (2) constructing a feature space of the patients to characterize the extracted concepts, (3) leveraging the topological properties to distill the available knowledge and visualize the extracted features, and finally, (4) investigating the bias in the clinical notes for the selected features and identifying possible pathways. Our experiments on a publicly available dataset of COVID-19 clinical notes indicate that our pipeline can indeed extract meaningful pathways.
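Phases (1) and (2) turn free-text notes into a patient-by-concept feature space. The sketch below uses naive whole-word dictionary matching against a hypothetical vocabulary; it is not the authors' extraction method, and it does not handle negation ("no pyrexia" still matches), which a real clinical NLP pipeline would need.

```python
import re


def extract_concepts(note: str, vocabulary: dict[str, list[str]]) -> set[str]:
    """Return concept IDs whose synonyms occur as whole words in the note.

    Naive dictionary matching: no negation or spelling-variant handling.
    """
    text = note.lower()
    found = set()
    for concept, synonyms in vocabulary.items():
        if any(re.search(r"\b" + re.escape(term.lower()) + r"\b", text) for term in synonyms):
            found.add(concept)
    return found


def feature_matrix(notes: dict[str, str], vocabulary: dict[str, list[str]]):
    """Binary patient-by-concept matrix: one row per patient, one column per concept."""
    concepts = sorted(vocabulary)
    rows = {}
    for patient, note in notes.items():
        present = extract_concepts(note, vocabulary)
        rows[patient] = [int(c in present) for c in concepts]
    return concepts, rows


# Hypothetical vocabulary and notes.
vocab = {
    "fever": ["fever", "pyrexia"],
    "dyspnea": ["shortness of breath", "dyspnea"],
    "cough": ["cough"],
}
notes = {
    "p1": "Pt reports fever and shortness of breath.",
    "p2": "Dry cough, no pyrexia.",
}
concepts, rows = feature_matrix(notes, vocab)
print(concepts)
print(rows)
```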
On the premise that scaling is defined by a set of scaling transformations that form a simple product group, a scaling group is constructed for simple fractal scaling and for the scaling of multifractal sets. The singularity strengths and densities are shown to be corollaries of the scaling-group formalism. Fractal dimension and other exponents are shown to be generators of infinitesimal transformations. Applications are made to self-affine fractals and the scaling log-normal distribution.
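A fractal dimension is the exponent relating a count to a change of scale: for the middle-third Cantor set, shrinking the box size by a factor of 3 doubles the number of occupied boxes, so D = log 2 / log 3 ≈ 0.631. The sketch below estimates D by box counting; it illustrates the exponent as a scaling rate and does not implement the paper's group-theoretic formalism.

```python
import math


def cantor_left_endpoints(level: int) -> list[int]:
    """Integer left endpoints (in units of 3**-level) of the Cantor-set intervals."""
    points = [0]
    for _ in range(level):
        points = [3 * p + d for p in points for d in (0, 2)]
    return points


def box_counting_dimension(level: int) -> float:
    """Least-squares slope of log N(eps) against log(1/eps), eps = 3**-k.

    Requires level >= 2 so that at least two scales are fitted.
    """
    points = cantor_left_endpoints(level)
    xs, ys = [], []
    for k in range(1, level + 1):
        boxes = {p // 3 ** (level - k) for p in points}  # occupied boxes of size 3**-k
        xs.append(k * math.log(3))       # log(1 / eps)
        ys.append(math.log(len(boxes)))  # log N(eps)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


print(box_counting_dimension(8), math.log(2) / math.log(3))
```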
COVID-19 has caused thousands of deaths around the world and has also resulted in large-scale international economic disruption. Identifying the pathways associated with this illness can help medical researchers better understand the properties of the condition. This can be done by analyzing medical records, so it is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, which poses significant challenges to developing automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) preprocessing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes indicate that our pipeline can indeed extract meaningful pathways.
We derive an exact relationship between the density-density correlation function $C(r)$ and the radius of gyration $R_g$. We show that if the correlation function scales as $C(r) = r^{-\alpha} f(r/N^{\gamma})$, then from the exact relation between $C(r)$ and $R_g \sim N^{\beta}$ it follows that, unlike in the commonly used expression $\alpha = d - 1/\beta$, the dependence of $\alpha$ on $\beta$ must include a $\gamma$ dependence. The new relationship is $2\beta = (d + 2 - \alpha)\gamma - 1$. Applying this relationship to DLA clusters of up to 330,000 particles produces an excellent scaling collapse of $C(r)$.
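The exact link between C(r) and R_g rests on the identity R_g^2 = (1/(2N^2)) Σ_{i,j} |r_i - r_j|^2: the squared radius of gyration is a second moment of the pair-separation distribution that C(r) describes. The sketch below checks this identity numerically on hypothetical points; it does not reproduce the paper's derivation or its DLA simulations.

```python
import random


def rg_squared_from_center(points):
    """R_g^2 as the mean squared distance from the centre of mass."""
    n, dim = len(points), len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum(sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in points) / n


def rg_squared_from_pairs(points):
    """R_g^2 from pair separations only: (1 / (2 N^2)) * sum over all ordered pairs."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return total / n ** 2  # the i < j sum is half the ordered-pair sum


# Hypothetical cluster: 200 random points in the unit square.
rng = random.Random(0)
cluster = [(rng.random(), rng.random()) for _ in range(200)]
print(rg_squared_from_center(cluster), rg_squared_from_pairs(cluster))
```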
Papers by Daniel Platt