Clinical Laboratory Data Analytics: Applying a Gaussian Process Model to Over One Million Clinical Laboratory Records for Actionable Insights
Authored by: Oluwatobi Michael Owoeye, Principal Investigator, and Software Developer at Handsonlabs Software Academy.
Image 1: 3 Dimensional Laser powered Laboratory Microscope 
It is widely accepted medical practice to run clinical laboratory tests before therapeutic treatment is given to a patient diagnosed with any ailment. This is particularly important where there is an eventual outcome-based benefit for local, national or international health interventions aimed at improving or establishing evidence-based results and clinical practice.
Machine (deep) learning is a branch of artificial intelligence within computer science. It is being applied to financial history, auditing of accounts, health research and consumer behavior, among several other areas. This computational tool enables the analysis of not just large clinical laboratory datasets but also logs of information stockpiled over time. To achieve United Nations Sustainable Development Goal 3, which encapsulates health and well-being for all by 2030, two things are needed:
- Improving Patient outcomes from the perspective of Local/National/Hospital or International Health Interventions.
- Improving Health Interventions from the perspective of Patient based outcomes.
As we will soon see, patient outcomes and health interventions are connected, and both need proper attention, though of late focus is being redirected to the former. This research paper examines over one million clinical laboratory diagnostic records with the aim of improving patient-based outcomes through actionable insights drawn from hospital interventions (clinical laboratory data). A Gaussian statistical model is then applied to develop predictive models and reveal insights easily missed by the human eye. Finally, results, discussions and a summary are presented.
Keywords: Machine Learning, Artificial Intelligence, Clinical Laboratory Tests, Diagnosis, Pattern Recognition, Medical Research, Deep Learning, Research and Development, Health, Patient Outcomes, Hospital, Computer Science, Well-being, Sustainable Development Goals (SDG).
INTRODUCTION & BACKGROUND
CLINICAL LABORATORY DIAGNOSTICS, AN INTEGRAL TOOL FOR BEST MEDICAL PRACTICE, HEALTH INTERVENTIONS & PATIENT BASED OUTCOMES.
Image 2: Technology-lens-laboratory-medical-60022 
Clinical laboratory test results, and their correct interpretation, have a direct and substantial impact on the treatment and eventual health outcomes of any hospitalized patient. Common tests include, but are not limited to, complete “blood count [CBC], electrolytes [sodium, potassium, chloride, and carbon-dioxide], thyroid stimulating hormone [TSH], glucose, etc.) routinely performed on site by most hospital-based clinical laboratories.” These ultimately help clinicians and health authorities make informed decisions about the best health interventions during an outbreak or epidemic.
According to Frank H. Wians, questions to be asked before ordering a clinical laboratory test include:
- “What justifies the need for clinical Diagnosis?
- Are there implications for not ordering this test?
- What is the resultant effect of the absence of a standard clinical test?
- What standard procedure brings the best meaning to a clinical Laboratory test?
- What ways can patient management based on these tests be better improved?“
In addition, the purposes for ordering a laboratory test, as Frank H. Wians states, include:
- “Diagnosis (to rule in or rule out a diagnosis).
- Monitoring (eg, the effect of drug therapy).
- Screening (eg, for congenital hypothyroidism via neonatal thyroxine testing).
- Research (to understand the pathophysiology of a particular disease process).” 
Furthermore, Frank H. Wians classifies the stages of clinical laboratory testing as pre-analytic (order placed, order transferred to the lab, identifying information entered, specimen obtained), analytic (analysis of the specimen), and post-analytic (report generated, result conveyed to the clinician, data interpreted, clinical response to the result). This research fits into the post-analytic stage.
Image 3: Two-test-tubes-954585 
Additionally, Walker HK, Hall WD and Hurst JW describe the rationale for clinical laboratory testing in terms of five topics: (1) the “probabilistic nature of diagnosis and its implications for ordering and interpreting laboratory tests”; (2) the “operating characteristics that define a laboratory test: reliability (precision), accuracy, sensitivity, specificity, and predictive value”; (3) “differentiating purposes for which laboratory tests are ordered (diagnosis, monitoring therapy, and screening) and the operating characteristics required for each purpose”; (4) the “‘normal’ test result and its meaning”; and (5) “frequently used test ordering strategies and their limitations”.
Clinical diagnostic tests consume substantial resources and expenditure. There is therefore a need to administer them carefully, analyze them, learn their patterns accurately and interpret the results so as to improve local, national and international health interventions, particularly patient-based outcomes. There is no room for wrong assumptions, as these would have multiplier effects on public healthcare as well as on therapeutic outcomes.
From an open-access clinical laboratory dataset provided by the NCTU Pooled Resource Open-Access ALS Clinical Trials Database (Pro-Act), I extracted five major columns, namely Subject_ID, Test_Name, Test_Result, Test_Unit and Laboratory_Delta, comprising one million, forty-eight thousand, five hundred and seventy-six (1,048,576) clinical laboratory test result records.
MATLAB R2017b provided the software platform, with its classification and regression tools used to analyze the features needed to train and model this clinical laboratory dataset.
Artificial Intelligence, Machine (Deep Learning) Methodology in analyzing over One Million Clinical Laboratory Data
Software applications based on artificial intelligence have been on the increase over the past decade. Worthy of mention are its branches of machine learning and deep learning. Some of the most popular areas of application are quality healthcare research, fraud detection, weather forecasting, smart cities, and image recognition and tracking, to mention just a few.
Firstly, let’s take a peek at the raw table data structure:
|Subject_ID||Most likely the de-identified patient identifier, in compliance with international public health research ethics on privacy, confidentiality and security of patient identity. It stores numeric values.|
|Test_Name||Text or variable character. This is the main subject matter. It records the substance or property tested for (such as Hemoglobin, Urine Color, Uric Acid, ALPHA2-GLOBULIN, Sodium, Glucose, Chloride, Bicarbonate, Segmented Neutrophils, etc.), which gives pointers to medical or clinical guidance, evidence-based results, and therapeutic insight or action-based outcomes.|
|Test_Result||Text or variable character. Stores the numeric measure, level or quantity for each test name identified in the clinical laboratory.|
|Test_Unit||Variable character. This contains the units of the test results, such as Hemoglobin (g/L), Urine Color (color), Uric Acid (umol/L), ALPHA2-GLOBULIN (g/dL), Sodium (mmol/L), Chloride (mmol/L), Bicarbonate (mmol/L), Segmented Neutrophils (%), Platelets (10E9/L), Red Blood Cells (10E9/L).|
|Laboratory_Delta||Integer variable. Optional metric for localized use.|
Some of the clinical laboratory units used in this dataset of over one million records are described further in the Results, Discussions & Conclusion section.
Of all the column names, the most significant are the Test_Name, Test_Result, and Test_Unit.
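The column layout above can be sketched as a MATLAB table. This is only a minimal illustration on four made-up rows, not PRO-ACT data; the extra variable name Result_Num is an assumption introduced here. It shows one way to convert the text-valued Test_Result column to numbers and to pull out the rows for a single test.

```matlab
% Four made-up rows that mimic the five-column layout (NOT real PRO-ACT data).
Subject_ID = [101; 101; 102; 102];
Test_Name = {'Sodium'; 'Urine Color'; 'Sodium'; 'Glucose'};
Test_Result = {'140'; 'Yellow'; '137'; '5.4'};
Test_Unit = {'mmol/L'; 'color'; 'mmol/L'; 'mmol/L'};
Laboratory_Delta = [0; 0; 14; 14];
labs = table(Subject_ID, Test_Name, Test_Result, Test_Unit, Laboratory_Delta);

% Test_Result is stored as text, so convert it; non-numeric entries become NaN.
% Result_Num is a hypothetical helper column, not part of the dataset.
labs.Result_Num = str2double(labs.Test_Result);

% Keep only the rows for one test, here sodium.
sodium = labs(strcmp(labs.Test_Name, 'Sodium'), :);
disp(sodium)
```

Text results such as urine color end up as NaN in the numeric column, so they can be set aside before any regression step.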
Okay, right before we jump into the core details of interpolations and plots, let’s quickly do a deep learning classification (diagnostic detail to vector plot) of these one million, forty-eight thousand, five hundred and seventy-six (1,048,576) clinical laboratory test result records. The output reports 7440 words and 1048577 documents:
WordCloudChart with properties:
WordData: [1×7440 string]
SizeData: [1×7440 double]
Image 4, (Clinical Diagnostic Data to Vector Plot), showing over 1 million Clinical Laboratory Diagnostic Data breakdown using Mathematical Word to Vector Analytics.
This word-to-vector analysis is discussed in the Results, Discussions & Conclusion section.
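The WordCloudChart output above holds 7440 distinct words with one size value each. Assuming those sizes are driven by how often each word occurs, the underlying count can be sketched in base MATLAB as follows. The test names are made up for illustration, and this is not the author's actual pipeline.

```matlab
% Made-up test names standing in for the Test_Name column (NOT real data).
names = {'Sodium'; 'Glucose'; 'Red Blood Cells'; 'Sodium'; 'White Blood Cells'};

% Lower-case each name and split it into words.
tokens = {};
for k = 1:numel(names)
    words = strsplit(lower(strtrim(names{k})), ' ');
    tokens = [tokens, words]; %#ok<AGROW>
end

% Count how often each distinct word occurs.
[vocab, ~, idx] = unique(tokens);
counts = accumarray(idx(:), 1);

% List the words from most to least frequent.
[counts, order] = sort(counts, 'descend');
vocab = vocab(order);
for k = 1:numel(vocab)
    fprintf('%-10s %d\n', vocab{k}, counts(k));
end
```

Words that occur most often would be drawn largest in a word cloud built from these counts.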
The Gaussian (normal) distribution is a concept from statistics and probability theory. It is a continuous distribution that describes a dataset clustered around a mean value, which is located at the peak of the curve referred to as the Gaussian function or bell curve.
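For reference, the bell curve described above is the normal probability density. The symbols are introduced here and do not appear in the original text: \mu is the mean (the location of the peak) and \sigma is the standard deviation (the spread).

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

The density is highest at x = \mu, and a larger \sigma gives a wider, flatter curve.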
Image 5, showing Gaussian distribution on a normal curve
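% Lower and upper bounds of the sampled range
m = 1; n = 40000;
% 500 uniformly distributed random values between m and n
t = m + (n-m) * rand(1, 500);
% Midpoint of the range (computed in the original listing but not used below)
o = (m + n)/2;
% Value passed as the standard deviation
c_y = 20000.5;
% Assumed definition: gauss_distribution is not shown in the original listing;
% here it is taken to be the normal density with mean mu and standard deviation s
gauss_distribution = @(x, mu, s) exp(-0.5 * ((x - mu) / s).^2) / (s * sqrt(2*pi));
% Plot the Gaussian bell curve
g = gauss_distribution(t, m, c_y);
plot(t, g, '.')   % plot call assumed; it is missing from the extracted listing
grid on
title('1million Clinical Data on Gaussian Bell Curve')
xlabel('Randomly produced numbers')
ylabel('Clinical Data on Gaussian Bell Curve')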
m and n represent clinical readings from the medical laboratory and serve as the lower and upper bounds of the sampled range.
t holds 500 random values drawn uniformly between m and n.
o is the midpoint of m and n.
g holds the Gaussian distribution evaluated at each value of t.
c_y is the value passed as the standard deviation; it lies between 1 and 40,000.
Image 6, showing the plot of clinical Data on a Gaussian Bell Curve, real time
Notice the huge radius of this Gaussian Bell Curve as a direct reflection of the clinical laboratory dataset size.
Image 7, showing the plot of clinical Data on a Gaussian Bell 3d Curve
Image 8, showing the Gaussian Process Distribution residuals plot. Clinical Laboratory Delta against Clinical Laboratory Test Unit predictor.
Image 9, showing Gaussian Process Distribution Response Plot. Clinical Laboratory Delta against Clinical Laboratory Test Unit(s).
A residuals plot in machine learning is used to assess model performance. A well-fitting model has its points, or observations, scattered roughly evenly around 0. Response plots are frequently used to display regression model results.
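To make the idea of a residual concrete, here is a minimal sketch of Gaussian process regression written in base MATLAB. It is not the model trained in this article: the data are synthetic, the squared-exponential kernel is hand-written, and the hyperparameters ell, sf and sn are assumed values chosen only for illustration.

```matlab
% Synthetic stand-in data (NOT the PRO-ACT values): a smooth trend plus a
% small alternating disturbance, so the run is reproducible without rand.
x = linspace(0, 10, 25)';
y = sin(x) + 0.02 * (-1).^(1:25)';

% Assumed hyperparameters: length scale, signal and noise standard deviations.
ell = 1.5; sf = 1.0; sn = 0.1;

% Squared-exponential covariance between two column vectors of inputs.
sqexp = @(a, b) sf^2 * exp(-0.5 * bsxfun(@minus, a, b').^2 / ell^2);

% Gaussian process posterior mean at the training inputs.
Kf = sqexp(x, x);
alpha = (Kf + sn^2 * eye(numel(x))) \ y;
yhat = Kf * alpha;

% Residual = observed response minus predicted response.
residuals = y - yhat;
fprintf('mean residual %.4f, largest |residual| %.4f\n', ...
    mean(residuals), max(abs(residuals)));

% Residuals plot: points should scatter evenly around the zero line.
plot(yhat, residuals, 'o'); hold on
plot([min(yhat) max(yhat)], [0 0], 'k--'); hold off
xlabel('Predicted response'); ylabel('Residual');
```

Residuals that drift away from the dashed zero line, or fan out as the predicted response grows, would suggest that the model is missing structure in the data.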
Results, Discussions & Conclusion
Before we tackle the nitty-gritty details of our findings, it is important to get accustomed to the standard units of measurement used extensively in this dataset:
|10E9/L||1.0 × 10^13 m^-3. Used for platelets, RBC, Absolute Eosinophil count and white blood cells.|
|uL||Copies per microliter. Used for Alkaline Phosphatase, AST (SGOT), CK, Lactate Dehydrogenase, ALT (SGPT, SGOT), Mean Corpuscular, Hemoglobin.|
|g/L||Grams per liter. Used for Hemoglobin, Protein, C-Reactive Protein, Albumin and Beta Globulin.|
|mL/min/1.73 m^2||Glomerular filtration rate (GFR): clinical laboratory test to ascertain the functionality of the kidneys, particularly how much blood passes through per interval.|
|mg/dL||Milligrams per deciliter. Used for Immunoglobulin M, Fibrinogen, Immunoglobulin A, Urine Glucose and Urine Protein.|
|umol/L||Micromoles per liter. Used for Uric Acid, Bilirubin and Creatinine.|
|mmol/L||Millimoles per liter; a millimole is one-thousandth of a mole. In this dataset, it is the main metric for Sodium, Chloride, Glucose, Potassium, Calcium, Blood Urea Nitrogen (BUN), Phosphorus and Bicarbonate.|
Predicting the clinical data with the Gaussian Process Distribution, using Laboratory Delta (a local clinic variable) against the Test_Unit predictor variable, gives the following notable observations from the deep classification of the word to vector and from the box plot with the machine learning Gaussian process models:
Firstly, from the Clinical Diagnostic Data to Vector Plot, we can prominently see 10E9/L (1.0 × 10^13 m^-3), the standard metric used for platelets, RBC, Absolute Eosinophil count and white blood cells. Little wonder that the one (1) million clinical laboratory diagnostic word to vector gives this clinical laboratory test unit an undeniable focus. So does urine-related information (color yellow).
Secondly, uL (copies per microliter), the clinical metric for Alkaline Phosphatase, AST (SGOT), CK, Lactate Dehydrogenase, ALT (SGPT, SGOT), Mean Corpuscular, Hemoglobin.
Thirdly, g/L (grams per liter), extensively used for Hemoglobin, Protein, C-Reactive Protein, Albumin and Beta Globulin clinical laboratory tests.
In addition, we see mL/min/1.73 m^2, used for the glomerular filtration rate (GFR) test to ascertain the functionality of the kidneys. This is followed by mmol/L, the main metric for Sodium, Chloride, Glucose, Potassium, Calcium, Blood Urea Nitrogen (BUN), Phosphorus and Bicarbonate.
Again, from the response plots and box plots of the Gaussian predictive process model graphs above, Absolute Eosinophil count, measured in 10E9/L, stands out very prominently in the dataset of over one (1) million clinical records, as previously stated. Also shown in orange is mmol/L, spotted in the deep learning vector analytics, which is the clinical laboratory standard unit for tests for Glucose, Sodium, Potassium, Blood Urea Nitrogen, Lithium and Triglycerides.
Other notable information extracted from mining this dataset are:
“lymphocytes”, “dehydrogenase”, “platelets”, “bicarbonate”, “creatine”, “hematocrit”, “glucose”, “phosphatase”, “calcium”, “neutrophil”, “phosphorus”, “wbc”, “potassium”, “urea”, “bilirubin”, “blood”, “red”, “rbc”, “nitrogen”, “alkaline”, “acid”, “sodium”, “altsgpt”, “basophil”, “protein”, “ph”, “monocytes”, “triglycerides”, “chloride”, “albumin”, “glycated”, “gammaglutamyltransferase”, “hemoglobin”.
Let’s take a look at the outcomes of this Clinical laboratory data mining, its implications and possible actionable insights:
- “Lymphocytes”: A type of white blood cell making up more than 20% of the white blood cell count, though the quantity varies. They usually exist in two forms:
- B cells, which manufacture antibodies, and
- T cells, which identify harmful bodies and immediately process them for eviction.
A shortage of lymphocytes can indicate a severe ailment that requires prompt medical attention.
- “Dehydrogenase”: Often referred to as Lactate Dehydrogenase (LD or LDH), an enzyme present in the body’s cells and linked directly to energy production. In patients with symptoms such as weakness, the test can help confirm or rule out certain ailments.
- “Platelets”: Also called thrombocytes, these are cell fragments needed for natural blood clotting. A shortage could indicate an abnormality.
- “Bicarbonate”: Also referred to as CO2, TCO2, Carbon Dioxide Content, CO2 Content, Bicarbonate and HCO3. It forms part of the electrolyte panel used to check for an electrolyte irregularity or an acid-base (pH) imbalance. The test is often ordered for signs of weakness, confusion, constant vomiting or breathing issues, which could point to an electrolyte or pH imbalance referred to as acidosis or alkalosis.
- “Creatine”: Also referred to as Creat, Blood Creatinine, Serum Creatinine and Urine Creatinine. It is part of a comprehensive or basic metabolic panel ordered when there are suspected clinical signs of kidney disease or damage, or of conditions that affect normal kidney function. It is also ordered at intervals to monitor treatment for kidney disease or kidney function while on prescribed medication. Some research studies have indicated that eating cooked meat before testing can temporarily increase the level of creatinine.
- “Hematocrit”: Also referred to as Hct, Crit, Packed Cell Volume (PCV), and H and H (Hemoglobin and Hematocrit). This is a routine test ordered with hemoglobin as part of a complete blood count (CBC); when there are signs of anemia (weakness, fatigue) or polycythemia (dizziness, headache); and at steady intervals to monitor an anomaly that affects red blood cells or to ascertain the impact of a clinical treatment.
- “Glucose”: The glucose test is the most common test of blood sugar levels and is used in diagnosing diabetes, which is subdivided into type I and type II. It helps establish a diagnosis for a diabetic patient, guide insulin adjustments, and give patients insight into how specific nutrients and eating habits are linked to glucose levels. In the Clinical Diagnostic word to vector deep learning classification process above, mmol/L was significantly highlighted in orange, indicating an appreciable number of glucose level checks and a possible population prone to diabetes.
- “Phosphatase”: Also referred to as ALP, Alk Phos, Alkp, Alkaline Phosphatase Isoenzymes and Bone Specific ALP; formal name Alkaline Phosphatase. Tests are usually carried out in cases of liver or bone ailment.
- “Calcium”: Tested when there are symptoms of a disorder related to the kidneys, thyroid, parathyroid or nerves; for surveillance of ionized calcium levels; in the diagnosis of specific cancers; or to check the effectiveness of a therapy.
- “Neutrophil”: The most common type of white blood cell.
- “Phosphorus”: Other reference names are P, PO4 and Phosphate. Clinical laboratory tests are done for kidney disorders or ailments related to uncontrolled diabetes.
- “WBC”: Referred to as WBC Count, Leukocyte Count and White Count. Its common name is White Blood Cell Count. It is done as part of a routine health check, or when a patient has signs and symptoms that may be related to a condition affecting WBCs, such as infection, inflammation or cancer.
- “Potassium”: Referred to as K. Its common name is Potassium, blood or urine. It is often tested for symptoms such as muscle weakness and/or irregular heartbeat (cardiac arrhythmia), or when an electrolyte imbalance is suspected; at steady intervals when a patient is under medication and/or has a disease or condition, such as high blood pressure (hypertension) or kidney disease, that can affect the potassium level; and as part of a routine medical check-up.
- “Urea”: Referred to as BUN, Urea Nitrogen and Urea. Its formal name is Blood Urea Nitrogen. The test is usually carried out when there are signs and symptoms that may be due to kidney disease or to a suspected condition that may cause or be worsened by kidney dysfunction; and at regular intervals when a patient is being treated for kidney disease or damage.
- “Bilirubin”: Common references are Total Bilirubin (TBIL), Neonatal Bilirubin, Direct Bilirubin, Conjugated Bilirubin, Indirect Bilirubin and Unconjugated Bilirubin. The test is usually carried out when there are signs or symptoms of liver damage, liver disease, bile duct blockage, hemolytic anemia or a liver-related metabolic problem, or if a newborn has jaundice.
- “RBC”: Often referred to as RBC Count, Erythrocyte Count and Red Count. Its formal name is Red Blood Cell Count. It is usually done as part of a complete blood count (CBC), during a health check, or when there are signs and symptoms of a condition such as anemia or polycythemia.
- “Nitrogen”: Referred to as BUN, Urea Nitrogen and Urea; formal name Blood Urea Nitrogen. It is done as part of a comprehensive or basic metabolic panel; for symptoms that may be due to kidney disease or to a condition that may cause or be worsened by kidney dysfunction; and at constant intervals when treatment for kidney disease or damage is being administered.
- “Alkaline”: Similar to Phosphatase above.
- “Sodium”: Chemically represented as Na. Sodium has a normal range within the human body. It contributes substantially to pH levels and electrolyte equilibrium, and the test is used for tracking ailments linked to excess sodium that may affect the kidneys, as well as dehydration, blood pressure, excess fluid (edema) and routine health checks [nerve and muscle function, blood/urine contents]. Excess sodium is generally associated with thirst and dehydration.
- “Altsgpt”: Clinically referred to as ALT, Serum Glutamic-Pyruvic Transaminase, SGPT, GPT and Alanine Transaminase; in the laboratory it is called Alanine Aminotransferase. Tests are done with respect to liver disorder, abdominal pain, nausea, vomiting, jaundice (yellow skin), hepatitis virus or a general medical check-up.
- “Basophil”: A type of white blood cell (leukocyte) with coarse granules that stain blue when exposed to a basic dye. Basophils normally constitute 1% or less of the total white blood cell count but may increase or decrease in certain diseases.
- “Protein”: Proteins are generally the building blocks of life. They constitute the cells, tissues, organs, enzymes and hormones that keep life functioning properly.
- “pH”: A measure of acidity and alkalinity levels.
- “Monocytes”: Another type of WBC, responsible for monitoring harmful biological components such as bacteria or other harmful organisms.
- “Triglycerides”: Referred to as TG, TRIG and Triglycerides. The test measures the risk of heart disease and the efficacy of lipid-reducing therapeutics.
- “Chloride”: Chloride (the ion of the element chlorine) is an integral part of electrolyte metabolism in the human body. It works together with potassium (K), sodium (Na) and bicarbonate (CO2) to maintain body fluid and acid-base balance. The clinical lab test for chloride targets electrical neutrality at the cellular level. It works directly with sodium activity in the body.
- “Albumin”: Clinically referred to as ALB; formal name Albumin, serum. The keyword “glycated” appears alongside it. Most clinical lab screening tests focus on its relationship to kidney disease, nutritional status and hospitalized patients. It is produced in the liver, makes up about 60% of total protein in the blood, and assumes several roles, including restricting leakage from blood vessels, nourishing tissue, and moving hormones, vitamins, drugs and substances such as calcium through the entire body. Its activity is kidney and liver centered.
- “Gammaglutamyltransferase”: Clinically referred to as Gamma-Glutamyl Transpeptidase, GGTP, Gamma-GT and GTP; formal name Gamma-Glutamyl Transferase. It is usually done with respect to possible liver disease or bile duct disease, or to distinguish between liver and bone disease as a cause of elevated alkaline phosphatase (ALP); occasionally to screen for or monitor alcohol abuse.
- “Hemoglobin”: Clinically referred to as Hgb, Hb, and H and H (Hemoglobin and Hematocrit); formal name Hemoglobin. It is done to ascertain the hemoglobin content of human blood or as part of a routine health assessment; to screen for and help diagnose conditions that affect red blood cells (RBCs); and, if there is anemia (low hemoglobin) or polycythemia (high hemoglobin), to assess the severity of these conditions and to monitor response to treatment.
At first glance, the long list of one million clinical laboratory records leaves one lost, with no indicator or clue as to where to start, look or pay attention. However, with the help of the machine learning tool and data mining analytics, three units stand out: [10E9/L], the metric for platelets, RBC, Absolute Eosinophil count and white blood cells; [mmol/L], the metric for Sodium, Chloride, Glucose, Potassium, Calcium, Blood Urea Nitrogen (BUN), Phosphorus and Bicarbonate; and [uL], for Alkaline Phosphatase, AST (SGOT), CK, Lactate Dehydrogenase, ALT (SGPT, SGOT), Mean Corpuscular, Hemoglobin. Their prominence indicates possible population-based ailments that demand these tests.
Let’s assume that a specific patient repeatedly tests for a deficiency or excess of these substances. The best course would then be to diagnose and recommend medically proven interventions to mitigate the patient’s condition and improve outcomes.
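The scenario above, a patient who repeatedly shows a deficiency or excess, can be sketched as a simple flagging step. This is only an illustration: the results are made up, the 135 to 145 mmol/L sodium interval is a commonly quoted adult reference range used here as a placeholder, and the threshold of two flagged results is an assumption.

```matlab
% Made-up sodium results in mmol/L for three subjects (NOT real PRO-ACT data).
subject = [101; 101; 101; 102; 102; 103];
sodium  = [128; 131; 140; 139; 141; 150];

% Illustrative reference interval for serum sodium; a laboratory would
% substitute its own validated limits.
low = 135; high = 145;

% Flag each result that falls outside the interval.
outOfRange = sodium < low | sodium > high;

% Count flagged results per subject.
[ids, ~, idx] = unique(subject);
nFlagged = accumarray(idx, double(outOfRange));

% Subjects with two or more flagged results are listed for clinical review
% (the cut-off of two is an assumed value).
repeatIds = ids(nFlagged >= 2);
disp(repeatIds)
```

A list like repeatIds is only a starting point for review by a clinician, not a diagnosis.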
 Clinical Laboratory Tests: Which, Why, and What Do The Results Mean? Frank H. Wians [ONLINE] Available at: https://academic.oup.com/labmed/article/40/2/105/2504825, [Accessed 4th May, 2018].
 NCTU Pooled Resource Open-Access ALS Clinical Trials Database (Pro-Act) Clinical Database [ONLINE] Available at: https://nctu.partners.org/ProACT/Data/Download?file_id=87&guid=5ecddc8f-c0da-4692-b597-b583867f07ff ,[Accessed 4th May, 2018].
 NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health. Walker HK, Hall WD, Hurst JW, editors. Clinical Methods: The History, Physical, and Laboratory Examinations. 3rd edition. Boston: Butterworths; 1990. Available at: https://www.ncbi.nlm.nih.gov/books/NBK372/ 1/14, [Accessed 4th May, 2018].
 Frankel Cardiovascular Center, Michigan Medicine. Lab Test Results: Units of Measurement; October 9, 2017. Available at: https://www.umcvc.org/health-library/zd1440, [Accessed 9th May, 2018].
 Lab Tests Online; October 9, 2017. Available at: https://labtestsonline.org/tests/, [Accessed 9th May, 2018].
 Medline-Plus; Glomerular filtration rate; 30 April 2018. Available at: https://medlineplus.gov/ency/article/007305.htm, [Accessed 11th May, 2018].
 Close Up of Microscope; 25th May, 2018. Available at: https://www.pexels.com/photo/close-up-of-microscope-256262/, [Accessed 25th May, 2018].
 Technology-lens-laboratory-medical; 25th May, 2018. Available at: https://www.pexels.com/photo/technology-lens-laboratory-medical-60022/ [Accessed 25th May, 2018].
 Two test tubes; 25th May, 2018. Available at: https://www.pexels.com/photo/two-test-tubes-954585/, [Accessed 25th May, 2018].
 Clinical Laboratory Analytics; Oluwatobi Owoeye, Handsonlabs Software Academy, 29th May, 2018. Available at: http://handsonlabs.org/clinical-laboratory-data-analytics/, [Accessed 25th May, 2018].
I especially want to thank the administrators of Prize4Life Israel and the Neurological Clinical Research Institute, Massachusetts General Hospital, NCTU Pooled Resource Open-Access ALS Clinical Trials Database (Pro-Act), who graciously approved my application in March 2018. Many thanks also to the United States’ National Institutes of Health, who so consistently and relentlessly support research and development in health and human sciences, as well as the other website authors and administrators whose articles were cited as references in this research and development. Thank you all, and God bless you most abundantly for supporting this research and development to save human lives.