Type 2 diabetes is one of the most prevalent chronic diseases in the United States, affecting the health of millions of people and placing an enormous financial burden on the US economy.
This “assignment” was inspired by the work of Xie Z, Nikolayeva O, Luo J, and Li D, “Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques.” Their paper can be accessed via this link.
My objective is to practice and learn how to build predictive models using machine learning techniques, in the spirit of the original study, but using the most recent survey data (2022). It would be a bonus if my models came close to the performance of Dr Xie’s.
To recap, the original definition of an individual with Type 2 diabetes is:

- an individual aged 30 years or older (respondents younger than 30 years old were excluded as they most likely had Type 1 diabetes),
- an individual who had been told by a healthcare professional that he/she had Type 2 diabetes,
- respondents who had pre-diabetes, or respondents who had diabetes while pregnant, were excluded from the study.
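The definition above can be written as a small labelling rule. The sketch below is a minimal illustration on made-up rows, not the original study's code: the helper name `label_t2d` and the toy data are assumptions, while the `DIABETE4` codes (1 = yes, 2 = yes but only during pregnancy, 3 = no, 4 = pre-diabetes, 7/9 = don't know/refused) follow the BRFSS 2022 codebook.

```r
library(dplyr)

# Hypothetical helper: apply the study's Type 2 diabetes definition.
# diabete4: BRFSS DIABETE4 code; age: age in years (e.g. _AGE80).
label_t2d <- function(diabete4, age) {
  case_when(
    age < 30              ~ NA_character_, # excluded: most likely Type 1
    diabete4 == 1         ~ "yes",         # told they have diabetes
    diabete4 == 3         ~ "no",          # never told they have diabetes
    diabete4 %in% c(2, 4) ~ NA_character_, # excluded: gestational or pre-diabetes
    .default = NA_character_               # 7, 9 or missing
  )
}

# Made-up respondents for illustration only
toy <- tibble(
  diabete4 = c(1, 3, 2, 4, 1, 9),
  age      = c(55, 42, 35, 61, 24, 70)
)

toy_labelled <- toy %>%
  mutate(t2d = label_t2d(diabete4, age)) %>%
  filter(!is.na(t2d)) # drop every excluded respondent

toy_labelled
```

Note that the recoding used later in this post labels codes 2 and 4 as "no" instead of excluding them, so it is slightly more permissive than this sketch.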
rm(list = ls())
sessionInfo()
# Set packages and dependencies
pacman::p_load("tidyverse", #for tidy data science practice
"tidymodels", "workflows",# for tidy machine learning
"pacman", #package manager
"devtools", #developer tools
"Hmisc", "skimr", "broom", "modelr",#for EDA
"jtools", "huxtable", "interactions", # for EDA
"ggthemes", "ggstatsplot", "GGally",
"scales", "gridExtra", "patchwork", "ggalt", "vip",
"ggstance", "ggfortify", # for ggplot
"DT", "plotly", #interactive Data Viz
# Let's install some ML-related packages that support tidymodels
"usemodels", "poissonreg", "agua", "sparklyr", "dials", # load computational engines
"doParallel", # for parallel processing (speedy computation)
"ranger", "xgboost", "glmnet", "kknn", "earth", "klaR", "discrim", "naivebayes", # model engines (random forest, boosting, glmnet, k-NN, MARS, naive Bayes)
"janitor", "lubridate", "haven")
I obtained the latest available Behavioral Risk Factor Surveillance System data (BRFSS 2022) from the Centers for Disease Control and Prevention (CDC).
The Behavioral Risk Factor Surveillance System (BRFSS) is the US’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.
The BRFSS 2022 data from CDC was stored in a SAS transport (.XPT) file. This was imported into R using read_xpt() from the haven package. It had 445,132 rows representing individual survey responses and 328 columns representing variables.
df <- read_xpt("LLCP2022.XPT")
I included most of the independent variables from the original study, as well as several new variables of interest. Below is a summary of dependent and independent variables used:
Variable | Description | Values |
---|---|---|
diabete4 | (Ever told) (you had) diabetes? | yes, no |
bmi5cat | Four categories of BMI (body mass index) | 1. underweight, 2. normal weight, 3. overweight, 4. obese |
smoker3 | Four levels of smoker status | 1. everyday smoker, 2. someday smoker, 3. former smoker, 4. non-smoker |
cvdstrk3 | (Ever told) (you had) a stroke? | 1. yes, 2. no |
cvdcrhd4 | (Ever told) (you had) angina or coronary heart disease? | 1. yes, 2. no |

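
GENHLTH Question: Would you say that in general your health is:

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Excellent | 71,878 | 16.15 | 17.40 |
2 | Very good | 148,444 | 33.35 | 31.84 |
3 | Good | 143,598 | 32.26 | 32.48 |
4 | Fair | 60,273 | 13.54 | 13.69 |
5 | Poor | 19,741 | 4.43 | 4.29 |
7 | Don’t know/Not Sure | 810 | 0.18 | 0.19 |
9 | Refused | 385 | 0.09 | 0.10 |
BLANK | Not asked or Missing | 3 | . | . |
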
_AGEG5YR

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Age 18 to 24 | 18 <= AGE <= 24 | 26,941 | 6.05 | 11.90 |
2 | Age 25 to 29 | 25 <= AGE <= 29 | 21,990 | 4.94 | 7.72 |
3 | Age 30 to 34 | 30 <= AGE <= 34 | 25,807 | 5.80 | 9.38 |
4 | Age 35 to 39 | 35 <= AGE <= 39 | 28,526 | 6.41 | 7.63 |
5 | Age 40 to 44 | 40 <= AGE <= 44 | 29,942 | 6.73 | 8.41 |
6 | Age 45 to 49 | 45 <= AGE <= 49 | 28,531 | 6.41 | 6.49 |
7 | Age 50 to 54 | 50 <= AGE <= 54 | 33,644 | 7.56 | 7.72 |
8 | Age 55 to 59 | 55 <= AGE <= 59 | 36,821 | 8.27 | 7.31 |
9 | Age 60 to 64 | 60 <= AGE <= 64 | 44,511 | 10.00 | 8.67 |
10 | Age 65 to 69 | 65 <= AGE <= 69 | 47,099 | 10.58 | 6.98 |
11 | Age 70 to 74 | 70 <= AGE <= 74 | 43,472 | 9.77 | 6.32 |
12 | Age 75 to 79 | 75 <= AGE <= 79 | 32,518 | 7.31 | 4.37 |
13 | Age 80 or older | 80 <= AGE <= 99 | 36,251 | 8.14 | 4.94 |
14 | Don’t know/Refused/Missing | 7 <= AGE <= 9 | 9,079 | 2.04 | 2.15 |

_BMI5CAT Question: Four-categories of Body Mass Index (BMI)

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Underweight | _BMI5 < 1850 (_BMI5 has 2 implied decimal places) | 6,778 | 1.71 | 2.03 |
2 | Normal Weight | 1850 <= _BMI5 < 2500 | 116,976 | 29.52 | 30.50 |
3 | Overweight | 2500 <= _BMI5 < 3000 | 139,995 | 35.32 | 34.14 |
4 | Obese | 3000 <= _BMI5 < 9999 | 132,577 | 33.45 | 33.32 |

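Several BRFSS calculated variables store values with two implied decimal places, as the note in the _BMI5CAT entry above indicates. A minimal sketch of the conversion on made-up values follows; the vectors and the helper name `bmi_category` are illustrative assumptions, while the cut-offs (1850, 2500, 3000) are the ones listed in the entry above.

```r
# Raw codebook-style values with 2 implied decimal places (made-up examples)
bmi5_raw  <- c(1790, 2240, 2750, 3310)
wtkg3_raw <- c(5200, 6800, 8400, 10100) # weight in kg, multiplied by 100

bmi   <- bmi5_raw / 100  # 17.9, 22.4, 27.5, 33.1
wt_kg <- wtkg3_raw / 100 # 52, 68, 84, 101

# Hypothetical helper reproducing the _BMI5CAT cut-offs
bmi_category <- function(bmi5_raw) {
  cut(bmi5_raw,
      breaks = c(-Inf, 1850, 2500, 3000, Inf),
      labels = c("underweight", "normal weight", "overweight", "obese"),
      right  = FALSE) # intervals are [lower, upper), matching the codebook
}

data.frame(bmi, wt_kg, category = bmi_category(bmi5_raw))
```

The same division by 100 is what the cleaning code in this post applies to `bmi5`, `wtkg3` and `htm4`.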
CHECKUP1 Question: About how long has it been since you last visited a doctor for a routine checkup?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Within past year (anytime less than 12 months ago) | 350,944 | 78.84 | 74.97 |
2 | Within past 2 years (1 year but less than 2 years ago) | 41,919 | 9.42 | 10.74 |
3 | Within past 5 years (2 years but less than 5 years ago) | 24,882 | 5.59 | 6.75 |
4 | 5 or more years ago | 19,079 | 4.29 | 5.13 |
7 | Don’t know/Not sure | 5,063 | 1.14 | 1.39 |
8 | Never | 2,509 | 0.56 | 0.83 |
9 | Refused | 733 | 0.16 | 0.20 |

INCOME3 Question: Is your annual household income from all sources: (If respondent refuses at any income level, code ’Refused.’)

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Less than $10,000 | 10,341 | 2.39 | 2.95 |
2 | Less than $15,000 ($10,000 to < $15,000) | 11,031 | 2.55 | 2.43 |
3 | Less than $20,000 ($15,000 to < $20,000) | 14,300 | 3.31 | 3.44 |
4 | Less than $25,000 ($20,000 to < $25,000) | 20,343 | 4.71 | 4.71 |
5 | Less than $35,000 ($25,000 to < $35,000) | 42,294 | 9.79 | 9.92 |
6 | Less than $50,000 ($35,000 to < $50,000) | 46,831 | 10.84 | 10.20 |
7 | Less than $75,000 ($50,000 to < $75,000) | 59,148 | 13.69 | 12.42 |
8 | Less than $100,000 ($75,000 to < $100,000) | 48,436 | 11.21 | 10.42 |
9 | Less than $150,000 ($100,000 to < $150,000) | 50,330 | 11.65 | 11.19 |
10 | Less than $200,000 ($150,000 to < $200,000) | 22,553 | 5.22 | 5.39 |
11 | $200,000 or more | 23,478 | 5.43 | 6.13 |
77 | Don’t know/Not sure | 36,114 | 8.36 | 10.44 |
99 | Refused | 47,001 | 10.87 | 10.37 |
BLANK | Not asked or Missing | 12,932 | . | . |

FLUSHOT7 Question: During the past 12 months, have you had either flu vaccine that was sprayed in your nose or flu shot injected into your arm?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | | 209,256 | 52.11 | 44.53 |
2 | No | Go to Section 15.03 PNEUVAC4 | 188,755 | 47.01 | 54.46 |
7 | Don’t know/Not Sure | Go to Section 15.03 PNEUVAC4 | 2,455 | 0.61 | 0.69 |
9 | Refused | Go to Section 15.03 PNEUVAC4 | 1,073 | 0.27 | 0.32 |
BLANK | | | 43,593 | . | . |

EMPLOY1 Question: Are you currently…?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Employed for wages | 186,004 | 42.38 | 47.34 |
2 | Self-employed | 38,768 | 8.83 | 9.46 |
3 | Out of work for 1 year or more | 8,668 | 1.97 | 2.54 |
4 | Out of work for less than 1 year | 8,044 | 1.83 | 2.56 |
5 | A homemaker | 17,477 | 3.98 | 4.94 |
6 | A student | 11,111 | 2.53 | 4.80 |
7 | Retired | 137,083 | 31.23 | 20.46 |
8 | Unable to work | 26,737 | 6.09 | 6.41 |
9 | Refused | 5,044 | 1.15 | 1.48 |
BLANK | Not asked or Missing | 6,196 | . | . |

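SEXVAR Question: Sex of Respondent

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Male | Code=1 if LANDSEX1=1 or CELLSEX1=1 or COLGSEX1=1 | 209,239 | 47.01 | 48.69 |
2 | Female | Code=2 if LANDSEX1=2 or CELLSEX1=2 or COLGSEX1=2 | 235,893 | 52.99 | 51.31 |
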
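MARITAL Question: Are you: (marital status)

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Married | 227,424 | 51.09 | 49.33 |
2 | Divorced | 57,516 | 12.92 | 10.20 |
3 | Widowed | 48,019 | 10.79 | 7.03 |
4 | Separated | 8,702 | 1.95 | 2.36 |
5 | Never married | 80,001 | 17.97 | 24.71 |
6 | A member of an unmarried couple | 18,668 | 4.19 | 5.20 |
9 | Refused | 4,794 | 1.08 | 1.18 |
BLANK | Not asked or Missing | 8 | . | . |
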
_EDUCAG Question: Level of education completed

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Did not graduate High School | EDUCA = 1 or 2 or 3 | 26,011 | 5.84 | 11.63 |
2 | Graduated High School | EDUCA = 4 | 108,990 | 24.48 | 27.39 |
3 | Attended College or Technical School | EDUCA = 5 | 120,252 | 27.01 | 30.04 |
4 | Graduated from College or Technical School | EDUCA = 6 | 187,496 | 42.12 | 30.34 |
9 | Don’t know/Not sure/Missing | EDUCA = 9 or Missing | 2,383 | 0.54 | 0.60 |

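SLEPTIM1 Question: On average, how many hours of sleep do you get in a 24-hour period?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 - 24 | Number of hours [1-24] | 439,679 | 98.78 | 98.57 |
77 | Don’t know/Not Sure | 4,792 | 1.08 | 1.23 |
99 | Refused | 658 | 0.15 | 0.21 |
BLANK | Missing | 3 | . | . |
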
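CVDCRHD4 Question: (Ever told) (you had) angina or coronary heart disease?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 26,551 | 5.96 | 4.40 |
2 | No | 414,176 | 93.05 | 94.67 |
7 | Don’t know/Not sure | 4,044 | 0.91 | 0.84 |
9 | Refused | 359 | 0.08 | 0.10 |
BLANK | Not asked or Missing | 2 | . | . |
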
PRIMINSR Question: What is the current primary source of your health insurance?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | A plan purchased through an employer or union (including plans purchased through another person’s employer) | 161,388 | 36.26 | 39.07 |
2 | A private nongovernmental plan that you or another family member buys on your own | 36,931 | 8.30 | 9.28 |
3 | Medicare | 135,848 | 30.52 | 20.78 |
4 | Medigap | 536 | 0.12 | 0.15 |
5 | Medicaid | 29,072 | 6.53 | 8.51 |
6 | Children’s Health Insurance Program (CHIP) | 188 | 0.04 | 0.06 |
7 | Military related health care: TRICARE (CHAMPUS) / VA health care / CHAMP-VA | 15,373 | 3.45 | 3.28 |
8 | Indian Health Service | 1,385 | 0.31 | 0.17 |
9 | State sponsored health plan | 12,878 | 2.89 | 2.76 |
10 | Other government program | 10,630 | 2.39 | 2.70 |
88 | No coverage of any type | 23,018 | 5.17 | 8.07 |
77 | Don’t know/Not Sure | 9,890 | 2.22 | 3.22 |
99 | Refused | 7,991 | 1.80 | 1.95 |
BLANK | Not asked or Missing | 4 | . | . |

MENTHLTH Question: Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 - 30 | Number of days | _ _ Number of days | 170,836 | 38.38 | 41.49 |
88 | None | | 265,229 | 59.58 | 56.10 |
77 | Don’t know/Not sure | | 6,589 | 1.48 | 1.76 |
99 | Refused | | 2,475 | 0.56 | 0.65 |
BLANK | Not asked or Missing | | 3 | . | . |

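CHCKDNY2 Question: Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 20,315 | 4.56 | 3.68 |
2 | No | 422,891 | 95.00 | 95.87 |
7 | Don’t know / Not sure | 1,581 | 0.36 | 0.35 |
9 | Refused | 343 | 0.08 | 0.10 |
BLANK | Not asked or Missing | 2 | . | . |
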
_TOTINDA Question: Adults who reported doing physical activity or exercise during the past 30 days other than their regular job

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Had physical activity or exercise | EXERANY2 = 1 | 337,559 | 75.83 | 75.85 |
2 | No physical activity or exercise in last 30 days | EXERANY2 = 2 | 106,480 | 23.92 | 23.85 |
9 | Don’t know/Refused/Missing | EXERANY2 = 7 or 9 or Missing | 1,093 | 0.25 | 0.29 |

ADDEPEV3 Question: (Ever told) (you had) a depressive disorder (including depression, major depression, dysthymia, or minor depression)?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 91,410 | 20.54 | 20.47 |
2 | No | 350,910 | 78.83 | 78.74 |
7 | Don’t know/Not sure | 2,140 | 0.48 | 0.62 |
9 | Refused | 665 | 0.15 | 0.17 |
BLANK | Not asked or Missing | 7 | . | . |

RENTHOM1 Question: Do you own or rent your home?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Own | | 310,708 | 69.80 | 66.63 |
2 | Rent | | 108,332 | 24.34 | 25.81 |
3 | Other arrangement | | 21,463 | 4.82 | 6.11 |
7 | Don’t know/Not Sure | | 1,099 | 0.25 | 0.49 |
9 | Refused | | 3,521 | 0.79 | 0.96 |
BLANK | Not asked or Missing | Due to the nature of the data or the size of the table for display, this information is not printed for this report | 9 | . | . |

EXERANY2 Question: During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 337,559 | 75.83 | 75.85 |
2 | No | 106,480 | 23.92 | 23.85 |
7 | Don’t know/Not Sure | 724 | 0.16 | 0.18 |
9 | Refused | 367 | 0.08 | 0.11 |
BLANK | Not asked or Missing | 2 | . | . |

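BLIND Question: Are you blind or do you have serious difficulty seeing, even when wearing glasses?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 23,658 | 5.56 | 5.78 |
2 | No | 399,910 | 94.04 | 93.75 |
7 | Don’t know/Not Sure | 1,042 | 0.25 | 0.27 |
9 | Refused | 667 | 0.16 | 0.20 |
BLANK | Not asked or Missing | 19,855 | . | . |
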
DECIDE Question: Because of a physical, mental, or emotional condition, do you have serious difficulty concentrating, remembering, or making decisions?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 50,100 | 11.81 | 13.34 |
2 | No | 370,792 | 87.42 | 85.81 |
7 | Don’t know/Not Sure | 2,266 | 0.53 | 0.56 |
9 | Refused | 988 | 0.23 | 0.29 |
BLANK | Not asked or Missing | 20,986 | . | . |

_HLTHPLN Question: Adults who had some form of health insurance

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Have some form of insurance | PRIMINSR=1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | 404,229 | 90.81 | 86.77 |
2 | Do not have some form of health insurance | PRIMINSR=88 | 23,018 | 5.17 | 8.07 |
9 | Don’t know, refused or missing insurance response | PRIMINSR=77, 99, or missing | 17,885 | 4.02 | 5.16 |

DIABETE4 Question: (Ever told) (you had) diabetes? (If ’Yes’ and respondent is female, ask ’Was this only when you were pregnant?’. If Respondent says pre-diabetes or borderline diabetes, use response code 4.)

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | | 61,158 | 13.74 | 12.04 |
2 | Yes, but female told only during pregnancy | Go to Section 08.01 AGE | 3,836 | 0.86 | 1.01 |
3 | No | Go to Section 08.01 AGE | 368,722 | 82.83 | 84.34 |
4 | No, pre-diabetes or borderline diabetes | Go to Section 08.01 AGE | 10,329 | 2.32 | 2.27 |
7 | Don’t know/Not Sure | Go to Section 08.01 AGE | 763 | 0.17 | 0.23 |
9 | Refused | Go to Section 08.01 AGE | 321 | 0.07 | 0.11 |
BLANK | Not asked or Missing | | 3 | . | . |

_SMOKER3 Question: Four-level smoker status: Everyday smoker, Someday smoker, Former smoker, Non-smoker

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Current smoker - now smokes every day | SMOKE100 = 1 and SMOKEDAY = 1 | 36,003 | 8.09 | 8.09 |
2 | Current smoker - now smokes some days | SMOKE100 = 1 and SMOKEDAY = 2 | 13,938 | 3.13 | 3.54 |
3 | Former smoker | SMOKE100 = 1 and SMOKEDAY = 3 | 113,774 | 25.56 | 21.87 |
4 | Never smoked | SMOKE100 = 2 | 245,955 | 55.25 | 57.07 |
9 | Don’t know/Refused/Missing | SMOKE100 = 1 and SMOKEDAY = 9 or SMOKE100 = 7 or 9 or Missing | 35,462 | 7.97 | 9.44 |

_DRNKWK2 Question: Calculated total number of alcoholic beverages consumed per week

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
0 | Did not drink | DROCDY4_=0 or AVEDRNK3=88 | 188,832 | 42.42 | 41.91 |
1 - 98999 | Number of drinks per week | 0 < DROCDY4_ < 990 | 206,595 | 46.41 | 44.78 |
99900 | Don’t know/Not sure/Refused/Missing | AVEDRNK3=.,77,99 or DROCDY4_=900 | 49,705 | 11.17 | 13.32 |

DRNKANY6 Question: Adults who reported having had at least one drink of alcohol in the past 30 days.

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | 1 <= ALCDAY4 <= 231 | 210,891 | 47.38 | 46.04 |
2 | No | ALCDAY4=888 | 187,667 | 42.16 | 41.60 |
7 | Don’t know/Not Sure | ALCDAY4=777 | 3,447 | 0.77 | 0.94 |
9 | Refused/Missing | ALCDAY4=999, Missing | 43,127 | 9.69 | 11.43 |

_CURECI2 Question: Adults who are current e-cigarette users

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Not currently using E-cigarettes | ECIGNOW2=1, 4 | 387,356 | 87.02 | 83.59 |
2 | Current E-cigarette user | ECIGNOW2=2,3 | 22,116 | 4.97 | 6.76 |
9 | Don’t know/Refused/Missing | ECIGNOW2=7,9, or missing | 35,660 | 8.01 | 9.64 |

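_RFSMOK3 Question: Adults who are current smokers

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | No | _SMOKER3 = 3 or 4 | 359,729 | 80.81 | 78.93 |
2 | Yes | _SMOKER3 = 1 or 2 | 49,941 | 11.22 | 11.62 |
9 | Don’t know/Refused/Missing | _SMOKER3 = 9 | 35,462 | 7.97 | 9.44 |
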
_HADSIGM Question: Colonoscopy and sigmoidoscopy are exams to check for colon cancer. Have you ever had either of these exams?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | | 213,158 | 72.82 | 68.17 |
2 | No | Go to Section 11.06 COLNCNCR | 76,372 | 26.09 | 30.53 |
7 | Don’t know/Not Sure | Go to Section 11.06 COLNCNCR | 1,811 | 0.62 | 0.74 |
9 | Refused | Go to Section 11.06 COLNCNCR | 1,378 | 0.47 | 0.55 |
BLANK | Not asked or Missing | Section 08.01, AGE, is less than 45; | 152,413 | . | . |

_INCOMG1 Question: Income categories

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Less than $15,000 | INCOME3=1,2 | 21,372 | 4.80 | 5.17 |
2 | $15,000 to < $25,000 | INCOME3=3,4 | 34,643 | 7.78 | 7.83 |
3 | $25,000 to < $35,000 | INCOME3=5 | 42,294 | 9.50 | 9.54 |
4 | $35,000 to < $50,000 | INCOME3=6 | 46,831 | 10.52 | 9.81 |
5 | $50,000 to < $100,000 | INCOME3=7,8 | 107,584 | 24.17 | 21.96 |
6 | $100,000 to < $200,000 | INCOME3=9,10 | 72,883 | 16.37 | 15.95 |
7 | $200,000 or more | INCOME3=11 | 23,478 | 5.27 | 5.89 |
9 | Don’t know/Not sure/Missing | INCOME3=77, 99, or missing | 96,047 | 21.58 | 23.84 |

_CHLDCNT Question: Number of children in household

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | No children in household | CHILDREN = 88 | 321,907 | 72.32 | 64.10 |
2 | One child in household | CHILDREN = 01 | 46,241 | 10.39 | 13.23 |
3 | Two children in household | CHILDREN = 02 | 37,923 | 8.52 | 10.83 |
4 | Three children in household | CHILDREN = 03 | 15,975 | 3.59 | 4.78 |
5 | Four children in household | CHILDREN = 04 | 5,521 | 1.24 | 1.66 |
6 | Five or more children in household | 05 <= CHILDREN < 88 | 3,100 | 0.70 | 0.97 |
9 | Don’t know/Not sure/Missing | CHILDREN = 99 | 14,464 | 3.25 | 4.43 |
BLANK | | | 1 | . | . |

_BMI5 Question: Body Mass Index (BMI)
WTKG3 Question: Reported weight in kilograms
HTM4 Question: Reported height in meters
_AGE80 Question: Imputed Age value collapsed above 80

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
18 - 24 | Imputed Age 18 to 24 | 26,943 | 6.05 | 11.90 |
25 - 29 | Imputed Age 25 to 29 | 22,000 | 4.94 | 7.73 |
30 - 34 | Imputed Age 30 to 34 | 25,840 | 5.81 | 9.41 |
35 - 39 | Imputed Age 35 to 39 | 28,771 | 6.46 | 7.79 |
40 - 44 | Imputed Age 40 to 44 | 30,403 | 6.83 | 8.68 |
45 - 49 | Imputed Age 45 to 49 | 29,580 | 6.65 | 6.86 |
50 - 54 | Imputed Age 50 to 54 | 37,404 | 8.40 | 8.54 |
55 - 59 | Imputed Age 55 to 59 | 38,059 | 8.55 | 7.44 |
60 - 64 | Imputed Age 60 to 64 | 44,681 | 10.04 | 8.71 |
65 - 69 | Imputed Age 65 to 69 | 47,642 | 10.70 | 7.07 |
70 - 74 | Imputed Age 70 to 74 | 44,940 | 10.10 | 6.53 |
75 - 79 | Imputed Age 75 to 79 | 32,616 | 7.33 | 4.40 |
80 - 99 | Imputed Age 80 or older | 36,253 | 8.14 | 4.94 |

_RACEPR1 Question: Computed race groups used for internet prevalence tables

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | White only, non-Hispanic | _RACE=1 or _RACE=9 and _IMPRACE=1 | 333,514 | 74.92 | 59.20 |
2 | Black only, non-Hispanic | _RACE=2 or _RACE=9 and _IMPRACE=2 | 35,876 | 8.06 | 11.62 |
3 | American Indian or Alaskan Native only, Non-Hispanic | _RACE=3 or _RACE=9 and _IMPRACE=4 | 7,120 | 1.60 | 1.21 |
4 | Asian only, non-Hispanic | _RACE=4 or _RACE=9 and _IMPRACE=3 | 13,487 | 3.03 | 6.11 |
5 | Native Hawaiian or other Pacific Islander only, Non-Hispanic | _RACE=5 | 2,414 | 0.54 | 0.48 |
6 | Multiracial, non-Hispanic | _RACE=6 | 9,744 | 2.19 | 3.12 |
7 | Hispanic | _RACE=7 or _RACE=9 and _IMPRACE==5 | 42,977 | 9.65 | 18.25 |

_DRDXAR2 Question: Respondents who have had a doctor diagnose them as having some form of arthritis

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Diagnosed with arthritis | HAVARTH4 = 1 | 151,148 | 34.16 | 26.64 |
2 | Not diagnosed with arthritis | HAVARTH4 = 2 | 291,351 | 65.84 | 73.36 |
BLANK | Don’t know/Not Sure/Refused/Missing | HAVARTH4 = 7 or 9 or Missing | 2,633 | . | . |

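ASTHMA3 Question: (Ever told) (you had) asthma?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | | 66,694 | 14.98 | 15.17 |
2 | No | Go to Section 07.06 CHCSCNC1 | 376,665 | 84.62 | 84.34 |
7 | Don’t know/Not Sure | Go to Section 07.06 CHCSCNC1 | 1,494 | 0.34 | 0.42 |
9 | Refused | Go to Section 07.06 CHCSCNC1 | 277 | 0.06 | 0.08 |
BLANK | Not asked or Missing | | 2 | . | . |
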
_DENVST3 Question: Adults who have visited a dentist, dental hygienist or dental clinic within the past year

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Yes | LASTDEN4=1 | 292,408 | 65.69 | 62.66 |
2 | No | LASTDEN4=2 or 3 or 4 | 145,703 | 32.73 | 35.42 |
9 | Don’t know/Not Sure Or Refused/Missing | LASTDEN4=7 or 9 or Missing | 7,017 | 1.58 | 1.93 |
BLANK | Missing | | 4 | . | . |

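SDHISOLT Question: How often do you feel socially isolated from others? Is it…

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Always | 8,098 | 3.19 | 4.06 |
2 | Usually | 13,178 | 5.19 | 5.63 |
3 | Sometimes | 53,072 | 20.91 | 21.62 |
4 | Rarely | 70,617 | 27.82 | 26.18 |
5 | Never | 106,160 | 41.83 | 41.21 |
7 | Don’t know/Not Sure | 1,696 | 0.67 | 0.79 |
9 | Refused | 969 | 0.38 | 0.50 |
BLANK | Not asked or Missing | 191,342 | . | . |
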
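LSATISFY Question: In general, how satisfied are you with your life?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Very satisfied | 114,252 | 44.89 | 42.07 |
2 | Satisfied | 123,445 | 48.51 | 50.46 |
3 | Dissatisfied | 10,758 | 4.23 | 4.67 |
4 | Very dissatisfied | 3,062 | 1.20 | 1.38 |
7 | Don’t know/Not sure | 1,864 | 0.73 | 0.90 |
9 | Refused | 1,107 | 0.43 | 0.51 |
BLANK | Not asked or Missing | 190,644 | . | . |
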
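DIFFWALK Question: Do you have serious difficulty walking or climbing stairs?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 68,081 | 16.10 | 13.75 |
2 | No | 353,039 | 83.47 | 85.78 |
7 | Don’t know/Not Sure | 1,221 | 0.29 | 0.28 |
9 | Refused | 636 | 0.15 | 0.19 |
BLANK | Not asked or Missing | 22,155 | . | . |
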
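DIFFDRES Question: Do you have difficulty dressing or bathing?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 16,813 | 3.98 | 3.85 |
2 | No | 404,404 | 95.77 | 95.81 |
7 | Don’t know/Not Sure | 488 | 0.12 | 0.15 |
9 | Refused | 548 | 0.13 | 0.19 |
BLANK | Not asked or Missing | 22,879 | . | . |
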
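DEAF Question: Are you deaf or do you have serious difficulty hearing?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 38,946 | 9.13 | 7.06 |
2 | No | 385,539 | 90.40 | 92.44 |
7 | Don’t know/Not Sure | 1,246 | 0.29 | 0.27 |
9 | Refused | 757 | 0.18 | 0.23 |
BLANK | Not asked or Missing | 18,644 | . | . |
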
PHYSHLTH Question: Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 - 30 | Number of days | 166,386 | 37.38 | 36.75 |
88 | None | 267,819 | 60.17 | 60.54 |
77 | Don’t know/Not sure | 8,875 | 1.99 | 2.21 |
99 | Refused | 2,047 | 0.46 | 0.50 |
BLANK | Not asked or Missing | 5 | . | . |

CDASSIST Question: As a result of confusion or memory loss, how often do you need assistance with these day-to-day activities?

Value | Value label | Notes | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|---|
1 | Always | | 304 | 4.09 | 4.57 |
2 | Usually | | 281 | 3.78 | 4.90 |
3 | Sometimes | | 1,354 | 18.22 | 21.00 |
4 | Rarely | Go to Module 13.05 CDSOCIAL | 1,447 | 19.47 | 19.25 |
5 | Never | Go to Module 13.05 CDSOCIAL | 3,954 | 53.21 | 49.12 |
7 | Don’t know/Not sure | Go to Module 13.05 CDSOCIAL | 78 | 1.05 | 1.04 |
9 | Refused | Go to Module 13.05 CDSOCIAL | 13 | 0.17 | 0.13 |
BLANK | Not asked or Missing | Section 08.01, AGE, is less than 45; or Module 13.01, CIMEMLOS, is coded 2 or 9 | 437,701 | . | . |

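CVDSTRK3 Question: (Ever told) (you had) a stroke.

Value | Value label | Frequency | Percentage | Weighted percentage |
---|---|---|---|---|
1 | Yes | 19,239 | 4.32 | 3.56 |
2 | No | 424,336 | 95.33 | 96.01 |
7 | Don’t know/Not sure | 1,274 | 0.29 | 0.35 |
9 | Refused | 281 | 0.06 | 0.08 |
BLANK | Not asked or Missing | 2 | . | . |
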
data <-
df %>%
dplyr::select("DIABETE4", # response variable
# personal health
"_BMI5CAT", "_BMI5", #bmi cat, bmi numeric
"_SMOKER3", "CVDSTRK3", "CVDCRHD4", #smoke, stroke, heart disease
"_CURECI2", # e-cig
#demographics
# age, income cat, employ, gender, marital, education, home (rent/own)
"_AGEG5YR", "INCOME3", "EMPLOY1", "SEXVAR", "MARITAL", "_EDUCAG", "RENTHOM1",
# number children, age numeric, race
"_CHLDCNT", "_AGE80", "_RACEPR1",
#self assessment
"GENHLTH", "PRIMINSR", "MENTHLTH", "BLIND", "DECIDE", "_HLTHPLN", "WTKG3", "HTM4",
"DIFFWALK", "DIFFDRES", "DEAF", "PHYSHLTH",
#habits
"SLEPTIM1", "_TOTINDA", "EXERANY2", "_DRNKWK2", "DRNKANY6",
#medical (CVDCRHD4 is already selected above)
"CHECKUP1", "FLUSHOT7", "CHCKDNY2", "ADDEPEV3",
"_DRDXAR2", "ASTHMA3", "_DENVST3") %>%
janitor::clean_names() %>%
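# note: the original study excluded codes 2 and 4; here they are labelled "no"
mutate(diabete4 = as.factor(case_when(diabete4 == 1 ~ "yes",
diabete4 == 2 ~ "no", # told only during pregnancy
diabete4 == 3 ~ "no",
diabete4 == 4 ~ "no") # pre-diabetes or borderline diabetes; codes 7 and 9 become NA
),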
bmi5cat = factor(bmi5cat),
bmi5 = as.numeric(bmi5/100),
smoker3 = as.factor(case_when(smoker3 == 1 ~ "smoker",
smoker3 == 2 ~ "smoker",
smoker3 == 3 ~ "former smoker",
smoker3 == 4 ~ "non-smoker")
),
cvdstrk3 = as.factor(case_when(cvdstrk3 == 7 ~ NA_character_,
cvdstrk3 == 9 ~ NA_character_,
.default = as.factor(cvdstrk3)
)
),
cvdcrhd4 = as.factor(case_when(cvdcrhd4 == 7 ~ NA_character_,
cvdcrhd4 == 9 ~ NA_character_,
.default = as.factor(cvdcrhd4)
)
),
cureci2 = as.factor(case_when(cureci2 == 9 ~ NA_character_,
.default = as.factor(cureci2)
)
),
ageg5yr = case_when(ageg5yr == 14 ~ NA_character_,
.default = as.character(ageg5yr)
),
ageg5yr = as.numeric(ageg5yr),
income3 = as.factor(case_when(income3 == 77 ~ NA_character_,
income3 == 99 ~ NA_character_,
.default = as.factor(income3)
)
),
employ1 = as.factor(case_when(employ1 == 9 ~ NA_character_,
.default = as.factor(employ1)
)
),
sexvar = as.factor(sexvar),
marital = as.factor(case_when(marital == 9 ~ NA_character_,
.default = as.factor(marital)
)),
educag = as.factor(case_when(educag == 9 ~ NA_character_,
.default = as.factor(educag)
)
),
renthom1 = as.factor(case_when(renthom1 == 7 ~ NA_character_,
renthom1 == 9 ~ NA_character_,
.default = as.factor(renthom1)
)
),
chldcnt = as.factor(case_when(chldcnt == 1 ~ "0",
chldcnt == 2 ~ "1",
chldcnt == 3 ~ "2",
chldcnt == 4 ~ "3",
chldcnt == 5 ~ "4",
chldcnt == 6 ~ "5 or more",
chldcnt == 9 ~ NA_character_)
),
age80 = as.numeric(age80),
racepr1 = as.factor(racepr1),
genhlth = as.factor(case_when(genhlth == 9 ~ NA_character_, # note: code 7 (don't know/not sure) is not recoded and remains a level
.default = as.factor(genhlth)
)
),
priminsr = as.factor(case_when(priminsr == 88 ~ "11", # no coverage
priminsr == 77 ~ NA_character_,
priminsr == 99 ~ NA_character_,
.default = as.factor(priminsr)
)
),
menthlth = as.numeric(ifelse(menthlth == 88, 0, menthlth)), #filter out 77 and 99 later
blind = as.factor(case_when(blind == 7 ~ NA_character_,
blind == 9 ~ NA_character_,
.default = as.factor(blind)
)
),
decide = as.factor(case_when(decide == 7 ~ NA_character_,
decide == 9 ~ NA_character_,
.default = as.factor(decide)
)
),
hlthpln = as.factor(case_when(hlthpln == 9 ~ NA_character_,
.default = as.factor(hlthpln)
)
),
wtkg3 = as.numeric(wtkg3 / 100),
htm4 = as.numeric(htm4 / 100),
diffwalk = as.factor(case_when(diffwalk == 7 ~ NA_character_,
diffwalk == 9 ~ NA_character_,
.default = as.factor(diffwalk)
)
),
diffdres = as.factor(case_when(diffdres == 7 ~ NA_character_,
diffdres == 9 ~ NA_character_,
.default = as.factor(diffdres)
)
),
deaf = as.factor(case_when(deaf == 7 ~ NA_character_,
deaf == 9 ~ NA_character_,
.default = as.factor(deaf)
)
),
physhlth = as.numeric(ifelse(physhlth == 88, 0, physhlth)), #filter out 77 and 99 later
sleptim1 = as.numeric(sleptim1), # filter out 77 and 99
totinda = as.factor(case_when(totinda == 9 ~ NA_character_,
.default = as.factor(totinda)
)
),
exerany2 = as.factor(case_when(exerany2 == 7 ~ NA_character_, # 7 = don't know/not sure
exerany2 == 9 ~ NA_character_,
.default = as.factor(exerany2)
)
),
drnkwk2 = na_if(as.numeric(drnkwk2), 99900), # 99900 = don't know/not sure/refused/missing
drnkany6 = as.factor(case_when(drnkany6 == 7 ~ NA_character_,
drnkany6 == 9 ~ NA_character_,
.default = as.factor(drnkany6)
)
),
checkup1 = as.factor(case_when(checkup1 == 7 ~ NA_character_,
checkup1 == 8 ~ NA_character_, # 8 = never had a routine checkup; treated as missing here
checkup1 == 9 ~ NA_character_,
.default = as.factor(checkup1)
)
),
flushot7 = as.factor(case_when(flushot7 == 7 ~ NA_character_,
flushot7 == 9 ~ NA_character_,
.default = as.factor(flushot7)
)
),
chckdny2 = as.factor(case_when(chckdny2 == 7 ~ NA_character_,
chckdny2 == 9 ~ NA_character_,
.default = as.factor(chckdny2)
)
),
addepev3 = as.factor(case_when(addepev3 == 7 ~ NA_character_,
addepev3 == 9 ~ NA_character_,
.default = as.factor(addepev3)
)
),
drdxar2 = as.factor(drdxar2),
asthma3 = as.factor(case_when(asthma3 == 7 ~ NA_character_,
asthma3 == 9 ~ NA_character_,
.default = as.factor(asthma3)
)
),
denvst3 = as.factor(case_when(denvst3 == 9 ~ NA_character_,
.default = as.factor(denvst3)
)
)
)
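Most of the recoding above repeats one pattern: turn the "don't know" and "refused" codes into `NA`, then convert the column to a factor. A minimal sketch of how that pattern could be factored out is shown below; the helper name `recode_missing` and the toy tibble are assumptions for illustration, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: set the given sentinel codes to NA, then build a factor
recode_missing <- function(x, missing_codes = c(7, 9)) {
  factor(replace(x, x %in% missing_codes, NA))
}

# Made-up values using the BRFSS coding conventions
toy <- tibble(
  cvdstrk3 = c(1, 2, 7, 2, 9),
  blind    = c(2, 2, 1, 9, 2),
  income3  = c(5, 77, 11, 99, 3)
)

toy_clean <- toy %>%
  mutate(
    across(c(cvdstrk3, blind), recode_missing),                 # 7 and 9 -> NA
    income3 = recode_missing(income3, missing_codes = c(77, 99)) # 77 and 99 -> NA
  )

toy_clean
```

Columns that need relabelling as well (such as `diabete4` or `smoker3`) would still need their own `case_when()` call.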
data <-
data %>%
filter(ageg5yr > 2 & age80 >= 30 & menthlth < 77 & physhlth < 77 & sleptim1 < 77) %>% # keep age >= 30 years (definition of type 2 diabetes) and drop the 77/99 don't know/refused codes
mutate(ageg5yr = as.factor(ageg5yr)
) %>%
na.omit()
skim(data)
#write_csv(data, "diabetes_cleaned_data.csv")
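One detail of the `filter()` step above is easy to miss: `filter()` keeps only rows where the condition is `TRUE`, so rows where a condition evaluates to `NA` are dropped as well. The sketch below shows this on a made-up tibble (the values are assumptions for illustration), using a shortened version of the same condition.

```r
library(dplyr)

# Made-up rows: one under 30, two with sentinel codes, one with a missing age group
toy <- tibble(
  ageg5yr  = c(2, 3, 5, NA, 8),
  age80    = c(27, 31, 44, 52, 60),
  menthlth = c(0, 77, 5, 3, 99)
)

kept <- toy %>%
  filter(ageg5yr > 2 & age80 >= 30 & menthlth < 77)

kept                     # only the row with age80 == 44 remains
nrow(toy) - nrow(kept)   # number of rows removed by the filter
```

The row with `ageg5yr = NA` is removed even though its other values pass, because `NA > 2` is `NA` rather than `TRUE`.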
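# Re-read the cleaned data cached by the write_csv() call above (uncomment it on the first run)
data <- read_csv("diabetes_cleaned_data.csv")
# Check correlations between numeric columns; read_csv() parses the numeric factor codes
# as numbers, so coded categorical variables are included as well
data %>%
select(where(is.numeric)) %>%
as.matrix() %>%
rcorr() %>%
tidy() %>%
arrange(desc(abs(estimate)))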
┌───────────────────────────────────────────────────────┐
│ column1 column2 estimate n p.value │
├───────────────────────────────────────────────────────┤
│ exerany2 totinda 1 243049 0 │
│ age80 ageg5yr 0.995 243049 0 │
│ wtkg3 bmi5 0.859 243049 0 │
│ bmi5 bmi5cat 0.826 243049 0 │
│ wtkg3 bmi5cat 0.738 243049 0 │
│ htm4 sexvar -0.698 243049 0 │
│ age80 employ1 0.611 243049 0 │
│ employ1 ageg5yr 0.61 243049 0 │
│ hlthpln priminsr 0.601 243049 0 │
│ physhlth genhlth 0.499 243049 0 │
│ htm4 wtkg3 0.48 243049 0 │
│ physhlth diffwalk -0.44 243049 0 │
│ educag income3 0.433 243049 0 │
│ addepev3 menthlth -0.42 243049 0 │
│ diffwalk genhlth -0.418 243049 0 │
│ diffdres diffwalk 0.388 243049 0 │
│ decide menthlth -0.379 243049 0 │
│ employ1 income3 -0.373 243049 0 │
│ wtkg3 sexvar -0.355 243049 0 │
│ priminsr income3 -0.35 243049 0 │
│ genhlth income3 -0.344 243049 0 │
│ drdxar2 age80 -0.34 243049 0 │
│ drdxar2 ageg5yr -0.338 243049 0 │
│ drnkany6 drnkwk2 -0.332 243049 0 │
│ physhlth diffdres -0.331 243049 0 │
│ addepev3 decide 0.33 243049 0 │
│ physhlth menthlth 0.323 243049 0 │
│ renthom1 income3 -0.323 243049 0 │
│ diffwalk employ1 -0.318 243049 0 │
│ marital income3 -0.315 243049 0 │
│ drdxar2 diffwalk 0.311 243049 0 │
│ drdxar2 employ1 -0.306 243049 0 │
│ diffwalk income3 0.305 243049 0 │
│ renthom1 marital 0.298 243049 0 │
│ totinda genhlth 0.289 243049 0 │
│ exerany2 genhlth 0.289 243049 0 │
│ totinda diffwalk -0.287 243049 0 │
│ exerany2 diffwalk -0.287 243049 0 │
│ drnkany6 income3 -0.281 243049 0 │
│ menthlth genhlth 0.281 243049 0 │
│ denvst3 income3 -0.271 243049 0 │
│ diffdres genhlth -0.264 243049 0 │
│ drdxar2 genhlth -0.264 243049 0 │
│ flushot7 age80 -0.264 243049 0 │
│ flushot7 ageg5yr -0.263 243049 0 │
│ decide genhlth -0.261 243049 0 │
│ physhlth decide -0.257 243049 0 │
│ totinda physhlth 0.253 243049 0 │
│ exerany2 physhlth 0.253 243049 0 │
│ checkup1 hlthpln 0.247 243049 0 │
│ genhlth employ1 0.246 243049 0 │
│ genhlth bmi5 0.244 243049 0 │
│ totinda income3 -0.242 243049 0 │
│ exerany2 income3 -0.242 243049 0 │
│ physhlth income3 -0.239 243049 0 │
│ diffwalk decide 0.238 243049 0 │
│ drdxar2 physhlth -0.235 243049 0 │
│ genhlth educag -0.235 243049 0 │
│ denvst3 educag -0.232 243049 0 │
│ decide income3 0.227 243049 0 │
│ checkup1 age80 -0.225 243049 0 │
│ checkup1 ageg5yr -0.223 243049 0 │
│ physhlth employ1 0.219 243049 0 │
│ diffwalk ageg5yr -0.219 243049 0 │
│ diffwalk age80 -0.218 243049 0 │
│ deaf ageg5yr -0.216 243049 0 │
│ deaf age80 -0.214 243049 0 │
│ addepev3 genhlth -0.214 243049 0 │
│ flushot7 checkup1 0.214 243049 0 │
│ totinda educag -0.213 243049 0 │
│ exerany2 educag -0.213 243049 0 │
│ diffdres decide 0.212 243049 0 │
│ priminsr educag -0.211 243049 0 │
│ genhlth cvdcrhd4 -0.207 243049 0 │
│ addepev3 physhlth -0.207 243049 0 │
│ genhlth bmi5cat 0.204 243049 0 │
│ ageg5yr cvdcrhd4 -0.2 243049 0 │
│ age80 cvdcrhd4 -0.199 243049 0 │
│ age80 renthom1 -0.194 243049 0 │
│ drnkany6 genhlth 0.192 243049 0 │
│ diffwalk blind 0.191 243049 0 │
│ renthom1 ageg5yr -0.191 243049 0 │
│ htm4 income3 0.191 243049 0 │
│ drnkany6 educag -0.187 243049 0 │
│ wtkg3 genhlth 0.185 243049 0 │
│ priminsr employ1 0.185 243049 0 │
│ diffwalk bmi5 -0.185 243049 0 │
│ employ1 cvdcrhd4 -0.184 243049 0 │
│ priminsr renthom1 0.184 243049 0 │
│ diffdres income3 0.183 243049 0 │
│ chckdny2 genhlth -0.182 243049 0 │
│ diffwalk menthlth -0.181 243049 0 │
│ denvst3 genhlth 0.18 243049 0 │
│ deaf employ1 -0.18 243049 0 │
│ racepr1 age80 -0.179 243049 0 │
│ drdxar2 income3 0.179 243049 0 │
│ drnkany6 employ1 0.178 243049 0 │
│ blind income3 0.178 243049 0 │
│ racepr1 ageg5yr -0.178 243049 0 │
│ income3 ageg5yr -0.177 243049 0 │
│ flushot7 employ1 -0.177 243049 0 │
│ denvst3 renthom1 0.177 243049 0 │
│ drnkany6 diffwalk -0.176 243049 0 │
│ age80 income3 -0.176 243049 0 │
│ totinda diffdres -0.174 243049 0 │
│ exerany2 diffdres -0.174 243049 0 │
│ diffdres employ1 -0.174 243049 0 │
│ denvst3 priminsr 0.173 243049 0 │
│ diffdres menthlth -0.173 243049 0 │
│ renthom1 educag -0.173 243049 0 │
│ menthlth income3 -0.172 243049 0 │
│ diffwalk educag 0.172 243049 0 │
│ blind genhlth -0.171 243049 0 │
│ checkup1 employ1 -0.169 243049 0 │
│ racepr1 renthom1 0.168 243049 0 │
│ diffwalk cvdcrhd4 0.168 243049 0 │
│ denvst3 checkup1 0.167 243049 0 │
│ decide blind 0.165 243049 0 │
│ diffwalk cvdstrk3 0.164 243049 0 │
│ totinda bmi5 0.164 243049 0 │
│ exerany2 bmi5 0.164 243049 0 │
│ drnkany6 totinda 0.161 243049 0 │
│ drnkany6 exerany2 0.161 243049 0 │
│ genhlth cvdstrk3 -0.161 243049 0 │
│ addepev3 diffwalk 0.16 243049 0 │
│ racepr1 income3 -0.159 243049 0 │
│ menthlth age80 -0.158 243049 0 │
│ menthlth ageg5yr -0.157 243049 0 │
│ chckdny2 diffwalk 0.156 243049 0 │
│ deaf diffwalk 0.156 243049 0 │
│ sleptim1 ageg5yr 0.156 243049 0 │
│ sleptim1 age80 0.155 243049 0 │
│ age80 cureci2 -0.154 243049 0 │
│ denvst3 flushot7 0.153 243049 0 │
│ menthlth renthom1 0.153 243049 0 │
│ ageg5yr cureci2 -0.153 243049 0 │
│ physhlth cvdcrhd4 -0.152 243049 0 │
│ cvdcrhd4 cvdstrk3 0.151 243049 0 │
│ flushot7 educag -0.151 243049 0 │
│ physhlth blind -0.15 243049 0 │
│ employ1 cvdstrk3 -0.15 243049 0 │
│ hlthpln age80 -0.149 243049 0 │
│ hlthpln ageg5yr -0.149 243049 0 │
│ asthma3 addepev3 0.148 243049 0 │
│ drdxar2 checkup1 0.148 243049 0 │
│ addepev3 sexvar -0.147 243049 0 │
│ flushot7 hlthpln 0.147 243049 0 │
│ chckdny2 employ1 -0.147 243049 0 │
│ denvst3 hlthpln 0.146 243049 0 │
│ priminsr marital 0.145 243049 0 │
│ chckdny2 cvdcrhd4 0.145 243049 0 │
│ denvst3 totinda 0.145 243049 0 │
│ denvst3 exerany2 0.145 243049 0 │
│ genhlth renthom1 0.145 243049 0 │
│ drdxar2 diffdres 0.144 243049 0 │
│ diffdres blind 0.144 243049 0 │
│ asthma3 genhlth -0.143 243049 0 │
│ chckdny2 physhlth -0.143 243049 0 │
│ drnkany6 physhlth 0.143 243049 0 │
│ decide renthom1 -0.142 243049 0 │
│ addepev3 income3 0.14 243049 0 │
│ hlthpln educag -0.139 243049 0 │
│ educag employ1 -0.139 243049 0 │
│ drnkwk2 sexvar -0.138 243049 0 │
│ drnkany6 ageg5yr 0.137 243049 0 │
│ drdxar2 cvdcrhd4 0.137 243049 0 │
│ drnkany6 age80 0.137 243049 0 │
│ drdxar2 deaf 0.136 243049 0 │
│ totinda employ1 0.135 243049 0 │
│ exerany2 employ1 0.135 243049 0 │
│ drdxar2 addepev3 0.133 243049 0 │
│ htm4 racepr1 -0.133 243049 0 │
│ addepev3 diffdres 0.133 243049 0 │
│ hlthpln racepr1 0.133 243049 0 │
│ priminsr genhlth 0.132 243049 0 │
│ sleptim1 menthlth -0.131 243049 0 │
│ denvst3 diffwalk -0.131 243049 0 │
│ chckdny2 ageg5yr -0.131 243049 0 │
│ chckdny2 age80 -0.131 243049 0 │
│ diffwalk bmi5cat -0.13 243049 0 │
│ decide educag 0.13 243049 0 │
│ hlthpln renthom1 0.129 243049 0 │
│ denvst3 marital 0.129 243049 0 │
│ totinda menthlth 0.128 243049 0 │
│ exerany2 menthlth 0.128 243049 0 │
│ physhlth cvdstrk3 -0.128 243049 0 │
│ asthma3 physhlth -0.128 243049 0 │
│ deaf genhlth -0.128 243049 0 │
│ totinda bmi5cat 0.127 243049 0 │
│ exerany2 bmi5cat 0.127 243049 0 │
│ ageg5yr cvdstrk3 -0.127 243049 0 │
│ drdxar2 decide 0.127 243049 0 │
│ age80 cvdstrk3 -0.126 243049 0 │
│ physhlth educag -0.126 243049 0 │
│ hlthpln income3 -0.126 243049 0 │
│ racepr1 educag -0.126 243049 0 │
│ drdxar2 totinda -0.125 243049 0 │
│ drdxar2 exerany2 -0.125 243049 0 │
│ wtkg3 ageg5yr -0.125 243049 0 │
│ genhlth age80 0.124 243049 0 │
│ genhlth ageg5yr 0.124 243049 0 │
│ wtkg3 age80 -0.123 243049 0 │
│ addepev3 renthom1 -0.122 243049 0 │
│ htm4 employ1 -0.122 243049 0 │
│ blind employ1 -0.122 243049 0 │
│ totinda decide -0.122 243049 0 │
│ exerany2 decide -0.122 243049 0 │
│ priminsr racepr1 0.122 243049 0 │
│ drnkwk2 htm4 0.121 243049 0 │
│ drnkany6 htm4 -0.121 243049 0 │
│ drdxar2 chckdny2 0.121 243049 0 │
│ deaf blind 0.121 243049 0 │
│ drdxar2 flushot7 0.121 243049 0 │
│ decide employ1 -0.121 243049 0 │
│ asthma3 menthlth -0.121 243049 0 │
│ income3 cvdstrk3 0.121 243049 0 │
│ drdxar2 bmi5 -0.119 243049 0 │
│ diffwalk wtkg3 -0.119 243049 0 │
│ sexvar income3 -0.118 243049 0 │
│ physhlth bmi5 0.118 243049 0 │
│ decide priminsr -0.118 243049 0 │
│ denvst3 menthlth 0.115 243049 0 │
│ menthlth marital 0.115 243049 0 │
│ denvst3 physhlth 0.114 243049 0 │
│ sleptim1 employ1 0.114 243049 0 │
│ checkup1 priminsr 0.114 243049 0 │
│ age80 marital -0.114 243049 0 │
│ asthma3 diffwalk 0.112 243049 0 │
│ addepev3 bmi5 -0.112 243049 0 │
│ marital ageg5yr -0.11 243049 0 │
│ asthma3 bmi5 -0.109 243049 0 │
│ denvst3 decide -0.109 243049 0 │
│ deaf cvdcrhd4 0.108 243049 0 │
│ addepev3 ageg5yr 0.107 243049 0 │
│ blind educag 0.107 243049 0 │
│ deaf decide 0.107 243049 0 │
│ menthlth cureci2 0.106 243049 0 │
│ addepev3 age80 0.106 243049 0 │
│ blind menthlth -0.106 243049 0 │
│ flushot7 renthom1 0.106 243049 0 │
│ denvst3 drnkany6 0.106 243049 0 │
│ asthma3 decide 0.105 243049 0 │
│ drdxar2 bmi5cat -0.105 243049 0 │
│ physhlth renthom1 0.105 243049 0 │
│ diffwalk priminsr -0.105 243049 0 │
│ diffdres cvdstrk3 0.104 243049 0 │
│ asthma3 drdxar2 0.104 243049 0 │
│ drnkany6 priminsr 0.104 243049 0 │
│ diffwalk renthom1 -0.103 243049 0 │
│ deaf income3 0.102 243049 0 │
│ totinda ageg5yr 0.102 243049 0 │
│ exerany2 ageg5yr 0.102 243049 0 │
│ genhlth marital 0.101 243049 0 │
│ totinda age80 0.101 243049 0 │
│ exerany2 age80 0.101 243049 0 │
│ totinda wtkg3 0.0998 243049 0 │
│ exerany2 wtkg3 0.0998 243049 0 │
│ hlthpln marital 0.0995 243049 0 │
│ physhlth deaf -0.0994 243049 0 │
│ educag marital -0.0992 243049 0 │
│ menthlth sexvar 0.0985 243049 0 │
│ drdxar2 drnkany6 -0.0982 243049 0 │
│ physhlth priminsr 0.0982 243049 0 │
│ htm4 ageg5yr -0.0969 243049 0 │
│ racepr1 marital 0.0968 243049 0 │
│ flushot7 priminsr 0.0964 243049 0 │
│ htm4 age80 -0.0962 243049 0 │
│ totinda renthom1 0.096 243049 0 │
│ exerany2 renthom1 0.096 243049 0 │
│ diffdres bmi5 -0.0957 243049 0 │
│ renthom1 cureci2 0.0953 243049 0 │
│ totinda priminsr 0.0952 243049 0 │
│ exerany2 priminsr 0.0952 243049 0 │
│ addepev3 totinda -0.095 243049 0 │
│ addepev3 exerany2 -0.095 243049 0 │
│ addepev3 htm4 0.0946 243049 0 │
│ addepev3 cureci2 -0.0939 243049 0 │
│ drnkany6 sexvar 0.0939 243049 0 │
│ blind cvdstrk3 0.0938 243049 0 │
│ decide marital -0.0934 243049 0 │
│ menthlth bmi5 0.0933 243049 0 │
│ educag bmi5 -0.0933 243049 0 │
│ flushot7 racepr1 0.0932 243049 0 │
│ chckdny2 income3 0.0929 243049 0 │
│ income3 cvdcrhd4 0.0917 243049 0 │
│ totinda blind -0.0917 243049 0 │
│ exerany2 blind -0.0917 243049 0 │
│ diffdres educag 0.0916 243049 0 │
│ drdxar2 cvdstrk3 0.0909 243049 0 │
│ decide cvdstrk3 0.0909 243049 0 │
│ drdxar2 blind 0.0908 243049 0 │
│ flushot7 income3 -0.0906 243049 0 │
│ drdxar2 menthlth -0.0905 243049 0 │
│ drnkany6 diffdres -0.09 243049 0 │
│ ageg5yr bmi5 -0.09 243049 0 │
│ addepev3 marital -0.0899 243049 0 │
│ asthma3 sexvar -0.0893 243049 0 │
│ chckdny2 cvdstrk3 0.0892 243049 0 │
│ menthlth priminsr 0.0888 243049 0 │
│ drdxar2 educag 0.0887 243049 0 │
│ diffdres cvdcrhd4 0.0884 243049 0 │
│ age80 bmi5 -0.0877 243049 0 │
│ wtkg3 employ1 -0.087 243049 0 │
│ chckdny2 totinda -0.0869 243049 0 │
│ chckdny2 exerany2 -0.0869 243049 0 │
│ checkup1 sexvar -0.0869 243049 0 │
│ denvst3 diffdres -0.0868 243049 0 │
│ checkup1 diffwalk 0.0867 243049 0 │
│ chckdny2 diffdres 0.0865 243049 0 │
│ educag bmi5cat -0.0865 243049 0 │
│ drdxar2 htm4 0.0859 243049 0 │
│ chckdny2 drnkany6 -0.0857 243049 0 │
│ drnkany6 renthom1 0.0851 243049 0 │
│ racepr1 employ1 -0.0848 243049 0 │
│ decide cureci2 -0.0848 243049 0 │
│ diffdres renthom1 -0.0845 243049 0 │
│ totinda htm4 -0.0845 243049 0 │
│ exerany2 htm4 -0.0845 243049 0 │
│ denvst3 blind -0.0843 243049 0 │
│ drdxar2 sexvar -0.0839 243049 0 │
│ addepev3 blind 0.083 243049 0 │
│ denvst3 bmi5 0.0826 243049 0 │
│ deaf sexvar 0.0822 243049 0 │
│ income3 bmi5 -0.082 243049 0 │
│ deaf diffdres 0.0819 243049 0 │
│ diffwalk htm4 0.0816 243049 0 │
│ totinda cvdcrhd4 -0.0816 243049 0 │
│ exerany2 cvdcrhd4 -0.0816 243049 0 │
│ totinda cvdstrk3 -0.0803 243049 0 │
│ exerany2 cvdstrk3 -0.0803 243049 0 │
│ blind renthom1 -0.0802 243049 0 │
│ checkup1 genhlth -0.0801 243049 0 │
│ drnkany6 decide -0.0801 243049 0 │
│ asthma3 bmi5cat -0.0796 243049 0 │
│ drdxar2 racepr1 0.0788 243049 0 │
│ physhlth age80 0.078 243049 0 │
│ physhlth wtkg3 0.0774 243049 0 │
│ physhlth ageg5yr 0.0773 243049 0 │
│ menthlth educag -0.0771 243049 0 │
│ physhlth bmi5cat 0.077 243049 0 │
│ drnkany6 bmi5 0.0769 243049 0 │
│ asthma3 diffdres 0.0768 243049 0 │
│ checkup1 marital 0.0768 243049 0 │
│ addepev3 bmi5cat -0.0766 243049 0 │
│ checkup1 renthom1 0.0763 243049 0 │
│ hlthpln employ1 -0.0763 243049 0 │
│ denvst3 wtkg3 0.0763 243049 0 │
│ checkup1 htm4 0.0762 243049 0 │
│ checkup1 cvdcrhd4 0.076 243049 0 │
│ drnkany6 blind -0.0749 243049 0 │
│ checkup1 drnkwk2 0.0748 243049 0 │
│ htm4 educag 0.0748 243049 0 │
│ deaf educag 0.0742 243049 0 │
│ flushot7 cureci2 0.0741 243049 0 │
│ chckdny2 blind 0.074 243049 0 │
│ sexvar cvdcrhd4 0.074 243049 0 │
│ drnkany6 cvdstrk3 -0.0736 243049 0 │
│ flushot7 marital 0.0735 243049 0 │
│ drnkwk2 income3 0.0734 243049 0 │
│ asthma3 htm4 0.0732 243049 0 │
│ blind priminsr -0.0729 243049 0 │
│ blind cvdcrhd4 0.0727 243049 0 │
│ drnkany6 cvdcrhd4 -0.0716 243049 0 │
│ asthma3 income3 0.0716 243049 0 │
│ deaf cvdstrk3 0.0711 243049 0 │
│ diffdres wtkg3 -0.071 243049 0 │
│ totinda deaf -0.0709 243049 0 │
│ exerany2 deaf -0.0709 243049 0 │
│ flushot7 cvdcrhd4 0.0708 243049 0 │
│ drdxar2 hlthpln 0.0708 243049 0 │
│ flushot7 sleptim1 -0.0705 243049 0 │
│ marital cureci2 0.0702 243049 0 │
│ diffdres priminsr -0.0699 243049 0 │
│ denvst3 racepr1 0.0695 243049 0 │
│ diffwalk marital -0.0695 243049 0 │
│ chckdny2 deaf 0.0693 243049 0 │
│ genhlth racepr1 0.069 243049 0 │
│ denvst3 addepev3 -0.0683 243049 0 │
│ sleptim1 genhlth -0.0682 243049 0 │
│ renthom1 bmi5 0.0672 243049 0 │
│ sexvar employ1 0.0668 243049 0 │
│ denvst3 cureci2 0.0665 243049 0 │
│ blind ageg5yr -0.0665 243049 0 │
│ drnkwk2 ageg5yr -0.0663 243049 0 │
│ sleptim1 decide 0.0661 243049 0 │
│ drnkwk2 age80 -0.0657 243049 0 │
│ blind age80 -0.0655 243049 0 │
│ htm4 menthlth -0.0654 243049 0 │
│ educag cvdstrk3 0.0653 243049 0 │
│ drnkwk2 employ1 -0.0652 243049 0 │
│ drnkany6 racepr1 0.065 243049 0 │
│ totinda marital 0.0635 243049 0 │
│ exerany2 marital 0.0635 243049 0 │
│ sleptim1 racepr1 -0.0634 243049 0 │
│ ageg5yr bmi5cat -0.0633 243049 0 │
│ employ1 cureci2 -0.0631 243049 0 │
│ decide bmi5 -0.0629 243049 0 │
│ denvst3 bmi5cat 0.0628 243049 0 │
│ blind racepr1 -0.0626 243049 0 │
│ sexvar bmi5cat -0.0625 243049 0 │
│ drdxar2 wtkg3 -0.0623 243049 0 │
│ diffwalk sexvar -0.0623 243049 0 │
│ asthma3 renthom1 -0.0618 243049 0 │
│ chckdny2 decide 0.0618 243049 0 │
│ checkup1 physhlth -0.0618 243049 0 │
│ chckdny2 flushot7 0.0611 243049 0 │
│ decide cvdcrhd4 0.0607 243049 0 │
│ age80 bmi5cat -0.0607 243049 0 │
│ checkup1 cureci2 0.0606 243049 0 │
│ chckdny2 checkup1 0.0606 243049 0 │
│ denvst3 age80 -0.06 243049 0 │
│ drnkany6 marital 0.0599 243049 0 │
│ educag cureci2 -0.0598 243049 0 │
│ htm4 genhlth -0.0589 243049 0 │
│ sleptim1 renthom1 -0.0588 243049 0 │
│ denvst3 ageg5yr -0.0587 243049 0 │
│ addepev3 priminsr -0.0585 243049 0 │
│ htm4 marital -0.0577 243049 0 │
│ sleptim1 bmi5 -0.0571 243049 0 │
│ physhlth marital 0.0568 243049 0 │
│ addepev3 chckdny2 0.0562 243049 0 │
│ sleptim1 physhlth -0.0561 243049 0 │
│ checkup1 sleptim1 -0.0561 243049 0 │
│ asthma3 wtkg3 -0.0561 243049 0 │
│ denvst3 sleptim1 -0.0559 243049 0 │
│ denvst3 sexvar -0.0559 243049 0 │
│ sleptim1 wtkg3 -0.0557 243049 0 │
│ totinda sexvar 0.0555 243049 0 │
│ exerany2 sexvar 0.0555 243049 0 │
│ menthlth bmi5cat 0.0553 243049 0 │
│ totinda racepr1 0.0548 243049 0 │
│ exerany2 racepr1 0.0548 243049 0 │
│ deaf priminsr -0.0544 243049 0 │
│ denvst3 cvdstrk3 -0.054 243049 0 │
│ diffdres ageg5yr -0.0537 243049 0 │
│ diffdres age80 -0.0535 243049 0 │
│ blind marital -0.0535 243049 0 │
│ drnkwk2 cureci2 0.0535 243049 0 │
│ drnkany6 bmi5cat 0.0534 243049 0 │
│ checkup1 bmi5 -0.0533 243049 0 │
│ diffdres bmi5cat -0.0532 243049 0 │
│ checkup1 bmi5cat -0.0528 243049 0 │
│ educag ageg5yr -0.0524 243049 0 │
│ drnkany6 deaf -0.0522 243049 0 │
│ flushot7 menthlth 0.0522 243049 0 │
│ flushot7 sexvar -0.052 243049 0 │
│ htm4 renthom1 -0.0517 243049 0 │
│ checkup1 cvdstrk3 0.0516 243049 0 │
│ age80 educag -0.0515 243049 0 │
│ chckdny2 bmi5 -0.0513 243049 0 │
│ htm4 decide 0.0512 243049 0 │
│ asthma3 totinda -0.0511 243049 0 │
│ asthma3 exerany2 -0.0511 243049 0 │
│ drdxar2 priminsr -0.0508 243049 0 │
│ sleptim1 cureci2 -0.0506 243049 0 │
│ sleptim1 bmi5cat -0.0504 243049 0 │
│ physhlth htm4 -0.0503 243049 0 │
│ menthlth cvdstrk3 -0.0499 243049 0 │
│ decide age80 0.0496 243049 0 │
│ flushot7 deaf 0.0495 243049 0 │
│ asthma3 blind 0.049 243049 0 │
│ marital sexvar 0.0488 243049 0 │
│ decide ageg5yr 0.0487 243049 0 │
│ checkup1 deaf 0.0484 243049 0 │
│ addepev3 cvdstrk3 0.0482 243049 0 │
│ flushot7 htm4 0.0482 243049 0 │
│ addepev3 employ1 -0.048 243049 0 │
│ addepev3 wtkg3 -0.048 243049 0 │
│ wtkg3 menthlth 0.0477 243049 0 │
│ asthma3 ageg5yr 0.0476 243049 0 │
│ income3 bmi5cat -0.0476 243049 0 │
│ asthma3 age80 0.0475 243049 0 │
│ educag cvdcrhd4 0.0473 243049 0 │
│ asthma3 sleptim1 0.0472 243049 0 │
│ htm4 blind 0.0472 243049 0 │
│ diffdres marital -0.0471 243049 0 │
│ drnkwk2 diffwalk 0.0471 243049 0 │
│ addepev3 sleptim1 0.0467 243049 0 │
│ addepev3 drnkany6 -0.0467 243049 0 │
│ hlthpln cureci2 0.0466 243049 0 │
│ priminsr cureci2 0.0464 243049 0 │
│ hlthpln menthlth 0.0462 243049 0 │
│ decide sexvar -0.0456 243049 0 │
│ drnkany6 menthlth 0.0456 243049 0 │
│ decide racepr1 -0.0449 243049 0 │
│ denvst3 cvdcrhd4 -0.0447 243049 0 │
│ wtkg3 educag -0.0443 243049 0 │
│ chckdny2 menthlth -0.0442 243049 0 │
│ checkup1 educag -0.0441 243049 0 │
│ flushot7 diffwalk 0.0441 243049 0 │
│ wtkg3 cvdcrhd4 -0.0438 243049 0 │
│ flushot7 drnkwk2 0.0431 243049 0 │
│ chckdny2 bmi5cat -0.043 243049 0 │
│ physhlth sexvar 0.0429 243049 0 │
│ priminsr cvdstrk3 -0.0426 243049 0 │
│ drnkwk2 bmi5 -0.0412 243049 0 │
│ checkup1 drnkany6 -0.0406 243049 0 │
│ asthma3 chckdny2 0.0404 243049 0 │
│ income3 cureci2 -0.0404 243049 0 │
│ asthma3 cvdstrk3 0.04 243049 0 │
│ asthma3 drnkany6 -0.0397 243049 0 │
│ sleptim1 marital -0.0388 243049 0 │
│ wtkg3 racepr1 -0.0388 243049 0 │
│ asthma3 marital -0.0386 243049 0 │
│ cvdcrhd4 bmi5cat -0.0384 243049 0 │
│ chckdny2 drnkwk2 0.0382 243049 0 │
│ asthma3 cvdcrhd4 0.0382 243049 0 │
│ decide bmi5cat -0.0382 243049 0 │
│ chckdny2 educag 0.038 243049 0 │
│ addepev3 checkup1 0.0375 243049 0 │
│ drnkwk2 genhlth -0.0372 243049 0 │
│ flushot7 wtkg3 0.037 243049 0 │
│ employ1 bmi5cat -0.0369 243049 0 │
│ renthom1 cvdstrk3 -0.0369 243049 0 │
│ menthlth cvdcrhd4 -0.0367 243049 0 │
│ cvdcrhd4 bmi5 -0.0359 243049 0 │
│ checkup1 menthlth 0.0358 243049 0 │
│ flushot7 decide -0.0358 243049 0 │
│ addepev3 cvdcrhd4 0.0355 243049 0 │
│ marital bmi5 0.0353 243049 0 │
│ drnkwk2 hlthpln 0.0353 243049 0 │
│ addepev3 deaf 0.0348 243049 0 │
│ racepr1 cvdcrhd4 0.0346 243049 0 │
│ genhlth cureci2 0.0345 243049 0 │
│ hlthpln sexvar -0.0342 243049 0 │
│ denvst3 chckdny2 -0.0339 243049 0 │
│ checkup1 diffdres 0.0335 243049 0 │
│ totinda hlthpln 0.0333 243049 0 │
│ exerany2 hlthpln 0.0333 243049 0 │
│ checkup1 racepr1 0.0331 243049 0 │
│ htm4 priminsr -0.0331 243049 0 │
│ deaf racepr1 0.0329 243049 0 │
│ priminsr cvdcrhd4 -0.0329 243049 0 │
│ denvst3 deaf -0.0326 243049 0 │
│ renthom1 bmi5cat 0.0326 243049 0 │
│ priminsr age80 0.0326 243049 0 │
│ hlthpln cvdcrhd4 0.0325 243049 0 │
│ priminsr ageg5yr 0.0325 243049 0 │
│ sleptim1 diffdres 0.0324 243049 0 │
│ denvst3 asthma3 -0.032 243049 0 │
│ drdxar2 drnkwk2 0.0315 243049 0 │
│ racepr1 bmi5 0.0315 243049 0 │
│ flushot7 totinda 0.0313 243049 0 │
│ flushot7 exerany2 0.0313 243049 0 │
│ hlthpln decide -0.0313 243049 0 │
│ flushot7 cvdstrk3 0.0312 243049 0 │
│ drnkwk2 bmi5cat -0.0311 243049 0 │
│ drnkwk2 physhlth -0.031 243049 0 │
│ denvst3 employ1 0.0308 243049 0 │
│ employ1 bmi5 -0.0302 243049 0 │
│ drnkwk2 totinda -0.0301 243049 0 │
│ drnkwk2 exerany2 -0.0301 243049 0 │
│ racepr1 bmi5cat 0.0301 243049 0 │
│ asthma3 checkup1 0.03 243049 0 │
│ chckdny2 htm4 0.03 243049 0 │
│ chckdny2 wtkg3 -0.0298 243049 0 │
│ wtkg3 renthom1 0.0298 243049 0 │
│ deaf menthlth -0.0297 243049 0 │
│ diffdres racepr1 -0.0287 243049 0 │
│ sleptim1 blind 0.0286 243049 0 │
│ menthlth racepr1 0.0279 243049 0 │
│ asthma3 deaf 0.0279 243049 0 │
│ physhlth cureci2 0.0279 243049 0 │
│ wtkg3 decide -0.0277 243049 0 │
│ drnkwk2 menthlth 0.0276 243049 0 │
│ chckdny2 priminsr -0.0274 243049 0 │
│ htm4 bmi5cat 0.0272 243049 0 │
│ sleptim1 hlthpln -0.0271 243049 0 │
│ sleptim1 educag 0.0267 243049 0 │
│ drnkwk2 racepr1 -0.0261 243049 0 │
│ renthom1 employ1 -0.026 243049 0 │
│ asthma3 employ1 -0.0258 243049 0 │
│ blind bmi5 -0.0258 243049 0 │
│ wtkg3 income3 0.0256 243049 0 │
│ deaf hlthpln 0.0252 243049 0 │
│ priminsr sexvar -0.025 243049 0 │
│ addepev3 flushot7 0.0243 243049 0 │
│ drnkwk2 cvdcrhd4 0.0242 243049 0 │
│ htm4 cvdcrhd4 -0.024 243049 0 │
│ diffdres htm4 0.0237 243049 0 │
│ denvst3 drnkwk2 0.0237 243049 0 │
│ chckdny2 hlthpln 0.0236 243049 0 │
│ asthma3 flushot7 0.0236 243049 0 │
│ deaf htm4 -0.0235 243049 0 │
│ educag sexvar 0.0232 243049 0 │
│ checkup1 totinda -0.0232 243049 0 │
│ checkup1 exerany2 -0.0232 243049 0 │
│ drnkwk2 wtkg3 0.023 243049 0 │
│ drnkany6 hlthpln 0.0229 243049 0 │
│ asthma3 cureci2 -0.0219 243049 0 │
│ diffwalk hlthpln 0.0217 243049 0 │
│ flushot7 genhlth -0.0215 243049 0 │
│ hlthpln genhlth 0.0212 243049 0 │
│ menthlth employ1 0.0212 243049 0 │
│ htm4 bmi5 -0.0209 243049 0 │
│ denvst3 drdxar2 -0.0209 243049 0 │
│ drnkany6 cureci2 -0.0206 243049 0 │
│ addepev3 educag 0.0203 243049 0 │
│ sleptim1 diffwalk 0.02 243049 0 │
│ renthom1 sexvar 0.0198 243049 0 │
│ htm4 cvdstrk3 0.0197 243049 0 │
│ chckdny2 renthom1 -0.0196 243049 0 │
│ htm4 cureci2 0.0196 243049 0 │
│ blind sexvar -0.0192 243049 0 │
│ diffwalk racepr1 -0.0189 243049 0 │
│ asthma3 drnkwk2 0.0188 243049 0 │
│ drnkwk2 cvdstrk3 0.0186 243049 0 │
│ sleptim1 deaf -0.0184 243049 0 │
│ physhlth racepr1 0.0181 243049 0 │
│ totinda cureci2 0.018 243049 0 │
│ exerany2 cureci2 0.018 243049 0 │
│ cureci2 cvdcrhd4 0.0178 243049 0 │
│ marital cvdstrk3 -0.0174 243049 0 │
│ hlthpln blind -0.0169 243049 0 │
│ checkup1 blind 0.0165 243049 4.44e-16 │
│ hlthpln cvdstrk3 0.0165 243049 4.44e-16 │
│ asthma3 racepr1 -0.0164 243049 6.66e-16 │
│ diffdres cureci2 -0.0163 243049 8.88e-16 │
│ deaf wtkg3 -0.0163 243049 8.88e-16 │
│ age80 sexvar 0.0162 243049 1.33e-15 │
│ drdxar2 renthom1 0.0162 243049 1.33e-15 │
│ sexvar ageg5yr 0.0161 243049 2.44e-15 │
│ drdxar2 cureci2 0.0157 243049 1.15e-14 │
│ flushot7 bmi5 0.0154 243049 3.02e-14 │
│ blind cureci2 -0.0152 243049 7.48e-14 │
│ addepev3 racepr1 0.015 243049 1.51e-13 │
│ blind bmi5cat -0.0145 243049 7.6e-13 │
│ asthma3 priminsr -0.014 243049 5.39e-12 │
│ chckdny2 cureci2 0.0139 243049 6.99e-12 │
│ sleptim1 cvdcrhd4 -0.0136 243049 1.86e-11 │
│ cvdstrk3 bmi5 -0.0136 243049 2.08e-11 │
│ drnkwk2 diffdres 0.0134 243049 3.85e-11 │
│ chckdny2 marital -0.0133 243049 5.55e-11 │
│ asthma3 educag 0.013 243049 1.36e-10 │
│ deaf bmi5cat -0.0126 243049 5.65e-10 │
│ flushot7 bmi5cat 0.0125 243049 6.21e-10 │
│ sleptim1 cvdstrk3 -0.0124 243049 1.08e-09 │
│ asthma3 hlthpln 0.0123 243049 1.16e-09 │
│ cvdstrk3 bmi5cat -0.0121 243049 2.7e-09 │
│ deaf renthom1 0.012 243049 3.46e-09 │
│ racepr1 sexvar 0.0114 243049 1.75e-08 │
│ deaf cureci2 0.0114 243049 2.1e-08 │
│ checkup1 wtkg3 -0.0112 243049 2.98e-08 │
│ sleptim1 sexvar 0.0108 243049 8.95e-08 │
│ sexvar cureci2 -0.0107 243049 1.47e-07 │
│ deaf marital 0.0106 243049 1.53e-07 │
│ wtkg3 cureci2 0.0106 243049 1.95e-07 │
│ marital cvdcrhd4 0.0104 243049 2.68e-07 │
│ denvst3 htm4 0.0103 243049 3.46e-07 │
│ priminsr bmi5 0.0102 243049 4.84e-07 │
│ racepr1 cureci2 0.0102 243049 5e-07 │
│ sexvar bmi5 -0.00998 243049 8.71e-07 │
│ genhlth sexvar 0.00989 243049 1.08e-06 │
│ flushot7 drnkany6 0.00988 243049 1.11e-06 │
│ wtkg3 priminsr -0.00984 243049 1.24e-06 │
│ diffdres sexvar -0.00974 243049 1.56e-06 │
│ racepr1 cvdstrk3 0.00958 243049 2.31e-06 │
│ sleptim1 htm4 -0.00956 243049 2.43e-06 │
│ chckdny2 sexvar -0.00928 243049 4.76e-06 │
│ totinda sleptim1 -0.00884 243049 1.3e-05 │
│ exerany2 sleptim1 -0.00884 243049 1.3e-05 │
│ flushot7 blind -0.00877 243049 1.55e-05 │
│ sleptim1 priminsr -0.00876 243049 1.56e-05 │
│ drdxar2 sleptim1 0.00838 243049 3.59e-05 │
│ drnkwk2 marital 0.00827 243049 4.58e-05 │
│ drdxar2 marital 0.00786 243049 0.000107 │
│ flushot7 physhlth -0.00778 243049 0.000125 │
│ chckdny2 racepr1 0.00753 243049 0.000204 │
│ chckdny2 sleptim1 -0.00697 243049 0.000594 │
│ sexvar cvdstrk3 0.00694 243049 0.000623 │
│ drnkwk2 priminsr 0.00692 243049 0.000651 │
│ drnkany6 wtkg3 0.00666 243049 0.00102 │
│ sleptim1 income3 0.00662 243049 0.0011 │
│ drnkwk2 renthom1 -0.00644 243049 0.0015 │
│ drnkwk2 educag 0.00577 243049 0.00443 │
│ marital bmi5cat 0.0057 243049 0.00497 │
│ diffdres hlthpln 0.00558 243049 0.00598 │
│ cureci2 cvdstrk3 0.0052 243049 0.0103 │
│ drnkwk2 blind 0.00506 243049 0.0126 │
│ addepev3 drnkwk2 0.00486 243049 0.0165 │
│ drnkany6 sleptim1 -0.00404 243049 0.0466 │
│ deaf bmi5 -0.00397 243049 0.05 │
│ cureci2 bmi5cat -0.00376 243049 0.0639 │
│ drnkwk2 sleptim1 -0.00373 243049 0.0657 │
│ htm4 hlthpln -0.00354 243049 0.0806 │
│ priminsr bmi5cat 0.0032 243049 0.115 │
│ physhlth hlthpln -0.00249 243049 0.219 │
│ wtkg3 hlthpln -0.00207 243049 0.308 │
│ wtkg3 cvdstrk3 -0.00206 243049 0.309 │
│ hlthpln bmi5 0.002 243049 0.325 │
│ addepev3 hlthpln 0.00197 243049 0.332 │
│ checkup1 income3 0.00189 243049 0.351 │
│ wtkg3 blind 0.00189 243049 0.351 │
│ marital employ1 0.00154 243049 0.448 │
│ drnkwk2 decide -0.0015 243049 0.459 │
│ flushot7 diffdres -0.00143 243049 0.48 │
│ cureci2 bmi5 0.00127 243049 0.532 │
│ hlthpln bmi5cat -0.0012 243049 0.554 │
│ renthom1 cvdcrhd4 0.000892 243049 0.66 │
│ checkup1 decide 0.000616 243049 0.761 │
│ drnkwk2 deaf 0.000379 243049 0.852 │
│ diffwalk cureci2 -0.000232 243049 0.909 │
│ wtkg3 marital -0.000111 243049 0.956 │
└───────────────────────────────────────────────────────┘
Column names: column1, column2, estimate, n, p.value
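With n = 243,049 in every row, almost any non-zero correlation comes out as statistically significant, so the estimate column is more informative than the p.value column. The sketch below is a minimal illustration, assuming the p-values come from the usual two-sided t-test for a Pearson correlation (an assumption on my part; the helper name `cor_p_value` is hypothetical and not part of any package). It recomputes p-values for a few estimates copied from the table and adds the share of variance each one explains.

```r
# Two-sided p-value for a correlation r estimated from n complete pairs,
# based on the t statistic with n - 2 degrees of freedom.
# Assumption: this is the test behind the p.value column above.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * stats::pt(-abs(t_stat), df = n - 2)
}

n <- 243049
r_values <- c(0.18, 0.0165, -0.00397, -0.000111) # estimates taken from the table

data.frame(
  estimate  = r_values,
  r_squared = r_values^2,                        # share of variance explained
  p.value   = signif(cor_p_value(r_values, n), 3)
)
```

Under that assumption, an estimate of about 0.004 in absolute value already sits at the 0.05 threshold, even though it explains well under 0.01% of the variance.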
# split a 5% subsample into training and test sets
set.seed(2024021401)
data_split <-
data %>%
dplyr::sample_frac(size = 0.05, replace = FALSE) %>% # use 5% of the data due to limited computing power
initial_split(strata = diabete4) # stratify by diabete4
data_train <-
data_split %>%
training()
data_test <-
data_split %>%
testing()
data_fold <-
data_train %>%
vfold_cv(v = 10, strata = diabete4)
# split the full data set (no subsampling)
set.seed(2024021401)
data_split_big <-
data %>%
initial_split(strata = diabete4) # stratify by diabete4
data_train_big <-
data_split_big %>%
training()
data_test_big <-
data_split_big %>%
testing()
data_fold_big <-
data_train_big %>%
vfold_cv(v = 10, strata = diabete4)
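To see what `strata = diabete4` does in the splits above, here is a minimal sketch on simulated data. The data frame `toy` and its 15% positive rate are made up for illustration and are not taken from BRFSS. The point is only that a stratified `initial_split()` keeps the class proportions nearly identical in the training and test sets.

```r
library(rsample)

set.seed(1)
# Simulated, imbalanced outcome (hypothetical data, not BRFSS)
toy <- data.frame(
  diabete4 = factor(sample(c("no", "yes"), size = 2000, replace = TRUE,
                           prob = c(0.85, 0.15))),
  bmi5 = rnorm(2000, mean = 28, sd = 6)
)

# Default prop = 3/4, sampled within each level of diabete4
toy_split <- initial_split(toy, strata = diabete4)

prop_train <- prop.table(table(training(toy_split)$diabete4))
prop_test  <- prop.table(table(testing(toy_split)$diabete4))

# Class shares should be almost the same in both rows
rbind(train = prop_train, test = prop_test)
```

The same check can be run on `data_train` and `data_test` to confirm that the 5% subsample still has a usable share of positive cases.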
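# base recipe: all other columns predict diabete4; drop zero-variance predictors
base_rec <-
recipes::recipe(formula = diabete4 ~.,
data = data_train) %>%
step_zv(all_predictors())
# dummy recipe: one-hot style indicator columns for nominal predictors
dummy_rec <-
base_rec %>%
step_dummy(all_nominal_predictors())
# normalized recipe: center and scale the (now all numeric) predictors
normal_rec <-
dummy_rec %>%
step_normalize(all_predictors())
# log recipe: natural log of numeric predictors
# note: step_log() returns -Inf for zeros; use its offset argument if a predictor can be 0
log_rec <-
base_rec %>%
step_log(all_numeric_predictors())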
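A recipe is only a plan until it is prepped, so it can help to bake one on a tiny data set and look at the result. The sketch below uses a made-up data frame `toy` (hypothetical values, including a deliberately constant column) and chains the same steps as `base_rec`, `dummy_rec` and `normal_rec`. It is an illustration of what those steps do, not output from the BRFSS data.

```r
library(recipes)

# Hypothetical toy data: one factor, one numeric, one zero-variance column
toy <- data.frame(
  diabete4 = factor(c("no", "yes", "no", "yes", "no", "no")),
  sexvar   = factor(c("male", "female", "female", "male", "female", "male")),
  bmi5     = c(22.5, 31.0, 27.4, 35.2, 24.8, 29.9),
  constant = 1
)

toy_rec <- recipe(diabete4 ~ ., data = toy) %>%
  step_zv(all_predictors()) %>%            # drops `constant`
  step_dummy(all_nominal_predictors()) %>% # sexvar becomes an indicator column
  step_normalize(all_predictors())         # mean 0, sd 1 for every predictor

# prep() estimates the steps; bake(new_data = NULL) returns the processed training data
baked <- bake(prep(toy_rec), new_data = NULL)
names(baked)
```

Running the same `prep()` and `bake()` pair on `log_rec` is a quick way to check whether any predictor ends up with non-finite values.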
# random forest
rf_spec <-
rand_forest(trees = 1000L) %>%
set_engine("ranger",
importance = "permutation") %>%
set_mode("classification")
rf_spec_for_tuning <-
rf_spec %>%
set_args(mtry = tune(),
min_n = tune())
# Classification Tree Model
ct_spec <-
decision_tree() %>%
set_engine(engine = 'rpart') %>%
set_mode('classification')
ct_spec_for_tuning <-
ct_spec %>%
set_args(tree_depth = tune(),
min_n = tune(),
cost_complexity = tune())
# knn
knn_spec <-
nearest_neighbor() %>%
set_engine("kknn") %>%
set_mode("classification")
knn_spec_for_tuning <-
knn_spec %>%
set_args(neighbors = tune(),
weight_func = tune(),
dist_power = tune())
# xgboost
xgb_spec <-
boost_tree(trees = 1000L) %>%
set_engine("xgboost") %>%
set_mode("classification")
xgb_spec_for_tuning <-
xgb_spec %>%
set_args(tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
sample_size = tune(),
mtry = tune(),
learn_rate = tune())
# naive Bayes
naive_spec <-
naive_Bayes() %>%
set_engine("naivebayes",
usepoisson = TRUE) %>%
set_mode("classification")
naive_spec_for_tuning <-
naive_spec %>%
set_args(smoothness = tune(),
Laplace = tune())
# Logistic Regression Model
logistic_spec <-
logistic_reg() %>%
set_engine(engine = 'glm') %>%
set_mode('classification')
# Lasso Logistic Regression Model
# mixture = 1 is a pure lasso (L1) penalty; penalty = 1 is a fixed placeholder value
logistic_lasso_spec <-
logistic_reg(mixture = 1, penalty = 1) %>%
set_engine(engine = 'glmnet') %>%
set_mode('classification')
logistic_lasso_spec_for_tuning <-
logistic_lasso_spec %>%
set_args(penalty = tune()) # tune the penalty instead of fixing it at 1
base_set <- # works
workflow_set(
list(base_rec, dummy_rec, log_rec), # preprocessors
list(rf_spec, ct_spec,
rf_spec_for_tuning, ct_spec_for_tuning), # models
cross = TRUE) # cross = TRUE is the default
dummy_set <- # works
workflow_set(
list(dummy_rec),
list(knn_spec, xgb_spec, logistic_spec,
knn_spec_for_tuning, xgb_spec_for_tuning),
cross = TRUE)
normal_set <-
workflow_set(
list(normal_rec),
list(logistic_lasso_spec,
logistic_lasso_spec_for_tuning),
cross = TRUE)
naive_set <- #works
workflow_set(
list(base_rec, log_rec),
list(naive_spec,
naive_spec_for_tuning),
cross = TRUE)
model_set <-
bind_rows(base_set, dummy_set, normal_set, naive_set)
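Because `cross = TRUE` pairs every preprocessor with every model, the number of workflows grows quickly. The sketch below is plain arithmetic on the counts in the four sets above. The grid size of 10 candidates per tuned workflow is an assumption for illustration only, since no tuning grid is specified here.

```r
# Preprocessor and model counts read off the four workflow_set() calls above
set_sizes <- data.frame(
  set            = c("base_set", "dummy_set", "normal_set", "naive_set"),
  n_preproc      = c(3, 1, 1, 2),
  n_models_fixed = c(2, 3, 1, 1), # specs without tune() placeholders
  n_models_tuned = c(2, 2, 1, 1)  # the *_for_tuning specs
)

set_sizes$n_workflows <-
  set_sizes$n_preproc * (set_sizes$n_models_fixed + set_sizes$n_models_tuned)

total_workflows <- sum(set_sizes$n_workflows) # 12 + 5 + 2 + 4 = 23

# Rough count of model fits under 10-fold CV
n_folds   <- 10
grid_size <- 10 # assumption: 10 candidate parameter sets per tuned workflow
n_fixed   <- sum(set_sizes$n_preproc * set_sizes$n_models_fixed)
n_tuned   <- sum(set_sizes$n_preproc * set_sizes$n_models_tuned)
total_fits <- n_fixed * n_folds + n_tuned * n_folds * grid_size

set_sizes
c(workflows = total_workflows, fits = total_fits)
```

`nrow(model_set)` should match the workflow total, which is a cheap check that `bind_rows()` kept every combination.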