Load Cleaned Data and Libraries
# Data Handling and Model Building
library(tidyverse)
library(tidymodels)
library(gtsummary)
library(here)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
✔ broom 1.0.3 ✔ rsample 1.1.1
✔ dials 1.1.0 ✔ tune 1.0.1
✔ infer 1.0.4 ✔ workflows 1.1.2
✔ modeldata 1.0.1 ✔ workflowsets 1.0.0
✔ parsnip 1.0.3 ✔ yardstick 1.1.0
✔ recipes 1.0.4
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Learn how to get started at https://www.tidymodels.org/start/
Attaching package: 'gtsummary'
The following objects are masked from 'package:recipes':
all_double, all_factor, all_integer, all_logical, all_numeric
here() starts at C:/Users/Kai/Documents/School/Colleges/UGA/MPH Year/Spring 2023/Modern Data Analysis/kaichen-MADA-portfolio
path <- here("fluanalysis", "data", "cleaned_data.rds")
# Load Data
clean_data <- readRDS(path)
As mentioned in previous files, BodyTemp acts as the main continuous outcome of interest, and Nausea acts as the main categorical outcome of interest. For model fitting, all variables other than the outcome in question will be used as predictors. This means that in a logistic regression model for nausea, body temperature will be one of the predictors. Additionally, RunnyNose will be treated as the main predictor in the simple regression models.
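Because every variable other than the outcome serves as a predictor, R's formula shorthand "." can stand in for the explicit list of predictors. The sketch below uses a small made-up data frame (toy_data and its values are hypothetical, not the flu data) to show that outcome ~ . fits the same model as writing each predictor out. It assumes the data frame contains only the outcome and the intended predictors.

```r
# Hypothetical toy data standing in for the cleaned flu data
set.seed(123)
toy_data <- data.frame(
  BodyTemp  = round(rnorm(40, mean = 99, sd = 1), 1),
  RunnyNose = factor(sample(c("No", "Yes"), 40, replace = TRUE)),
  Sneeze    = factor(sample(c("No", "Yes"), 40, replace = TRUE)),
  Fatigue   = factor(sample(c("No", "Yes"), 40, replace = TRUE))
)

# Explicit formula: every predictor written out
fit_explicit <- lm(BodyTemp ~ RunnyNose + Sneeze + Fatigue, data = toy_data)

# Shorthand: "." expands to all columns other than the outcome
fit_dot <- lm(BodyTemp ~ ., data = toy_data)

# Compare the coefficients of the two fits
all.equal(coef(fit_explicit), coef(fit_dot))
```

The shorthand only matches the explicit formula when no extra columns (such as an ID variable) are present in the data.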
Linear Regression Model
Set Up Linear Regression Engine
lm_model <- linear_reg() %>% set_engine("lm")
Runny Nose vs Body Temperature
# Tidymodels
tidy(lm_model %>% fit(BodyTemp ~ RunnyNose, data = clean_data))
# A tibble: 2 × 5
  term         estimate std.error statistic p.value
  <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    99.1      0.0819   1210.   0
2 RunnyNoseYes   -0.293    0.0971     -3.01 0.00268
# Base R
summary(lm(BodyTemp ~ RunnyNose, data = clean_data))
Call:
lm(formula = BodyTemp ~ RunnyNose, data = clean_data)
Residuals:
Min 1Q Median 3Q Max
-1.9431 -0.7505 -0.3505 0.3495 4.1495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 99.14313 0.08191 1210.426 < 2e-16 ***
RunnyNoseYes -0.29265 0.09714 -3.013 0.00268 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.19 on 728 degrees of freedom
Multiple R-squared: 0.01231, Adjusted R-squared: 0.01096
F-statistic: 9.076 on 1 and 728 DF, p-value: 0.00268
All Relevant Predictors vs Body Temperature
# Tidymodels
rmarkdown::paged_table(tidy(lm_model %>% fit(
  BodyTemp ~ SwollenLymphNodes + ChestCongestion + ChillsSweats +
    NasalCongestion + CoughYN + Sneeze + Fatigue + SubjectiveFever +
    Headache + Weakness + WeaknessYN + CoughIntensity + CoughYN2 +
    Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain + Diarrhea +
    EyePn + Insomnia + ItchyEye + Nausea + EarPn + Hearing +
    Pharyngitis + Breathless + ToothPn + Vision + Vomit + Wheeze,
  data = clean_data
)))
# Base R
summary(lm(
  BodyTemp ~ SwollenLymphNodes + ChestCongestion + ChillsSweats +
    NasalCongestion + CoughYN + Sneeze + Fatigue + SubjectiveFever +
    Headache + Weakness + WeaknessYN + CoughIntensity + CoughYN2 +
    Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain + Diarrhea +
    EyePn + Insomnia + ItchyEye + Nausea + EarPn + Hearing +
    Pharyngitis + Breathless + ToothPn + Vision + Vomit + Wheeze,
  data = clean_data
))
Call:
lm(formula = BodyTemp ~ SwollenLymphNodes + ChestCongestion +
ChillsSweats + NasalCongestion + CoughYN + Sneeze + Fatigue +
SubjectiveFever + Headache + Weakness + WeaknessYN + CoughIntensity +
CoughYN2 + Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain +
Diarrhea + EyePn + Insomnia + ItchyEye + Nausea + EarPn +
Hearing + Pharyngitis + Breathless + ToothPn + Vision + Vomit +
Wheeze, data = clean_data)
Residuals:
Min 1Q Median 3Q Max
-2.2110 -0.7219 -0.2853 0.4342 4.2095
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.925243 0.303804 322.330 < 2e-16 ***
SwollenLymphNodesYes -0.165302 0.091959 -1.798 0.072682 .
ChestCongestionYes 0.087326 0.097546 0.895 0.370973
ChillsSweatsYes 0.201266 0.127302 1.581 0.114330
NasalCongestionYes -0.215771 0.113798 -1.896 0.058362 .
CoughYNYes 0.313893 0.240738 1.304 0.192707
SneezeYes -0.361924 0.098299 -3.682 0.000249 ***
FatigueYes 0.264762 0.160558 1.649 0.099596 .
SubjectiveFeverYes 0.436837 0.103398 4.225 2.71e-05 ***
HeadacheYes 0.011453 0.125405 0.091 0.927256
WeaknessMild 0.018229 0.189169 0.096 0.923258
WeaknessModerate 0.098944 0.197864 0.500 0.617189
WeaknessSevere 0.373435 0.230766 1.618 0.106065
WeaknessYNYes NA NA NA NA
CoughIntensityMild 0.084881 0.279878 0.303 0.761768
CoughIntensityModerate -0.061384 0.301997 -0.203 0.838992
CoughIntensitySevere -0.037272 0.314013 -0.119 0.905551
CoughYN2Yes NA NA NA NA
MyalgiaMild 0.164242 0.160498 1.023 0.306510
MyalgiaModerate -0.024064 0.167834 -0.143 0.886031
MyalgiaSevere -0.129263 0.207854 -0.622 0.534216
MyalgiaYNYes NA NA NA NA
RunnyNoseYes -0.080485 0.108526 -0.742 0.458569
AbPainYes 0.031574 0.140236 0.225 0.821927
ChestPainYes 0.105071 0.106980 0.982 0.326365
DiarrheaYes -0.156806 0.129545 -1.210 0.226522
EyePnYes 0.131544 0.129757 1.014 0.311047
InsomniaYes -0.006824 0.090797 -0.075 0.940114
ItchyEyeYes -0.008016 0.110191 -0.073 0.942028
NauseaYes -0.034066 0.102049 -0.334 0.738620
EarPnYes 0.093790 0.113875 0.824 0.410436
HearingYes 0.232203 0.222043 1.046 0.296037
PharyngitisYes 0.317581 0.121342 2.617 0.009057 **
BreathlessYes 0.090526 0.099837 0.907 0.364863
ToothPnYes -0.022876 0.113750 -0.201 0.840673
VisionYes -0.274625 0.277681 -0.989 0.323010
VomitYes 0.165272 0.151432 1.091 0.275478
WheezeYes -0.046665 0.107036 -0.436 0.662990
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.144 on 695 degrees of freedom
Multiple R-squared: 0.1287, Adjusted R-squared: 0.08605
F-statistic: 3.019 on 34 and 695 DF, p-value: 4.197e-08
Conclusions
The tidymodels output is unremarkable: tidy() relays the same coefficient estimates, standard errors, test statistics, and p-values as summary(lm(…)), only in table form. However, I was unpleasantly surprised by the lack of significance stars next to the p-values. For this reason, I prefer the longer-winded summary(lm(…)) to tidymodels. I will not deny, though, that the ability to convert this information into a table would be greatly beneficial in some instances. The only difference between the two linear regression models is that the second includes all of the predictors rather than just one. With that adjustment, the RunnyNose estimate shrinks from -0.293 (p = 0.00268) to -0.080 (p = 0.459), and three coefficients (WeaknessYNYes, CoughYN2Yes, and MyalgiaYNYes) are reported as NA because of singularities.
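If the missing stars are the main drawback, they can be added back to a tidy-style table. The following is a minimal sketch rather than a tidymodels feature: it copies the two rows of the simple linear model output into a plain data frame (coef_table is a hypothetical name) and uses base R's cut() with the cutpoints listed in the Signif. codes legend.

```r
# Coefficient table copied from the tidy() output for BodyTemp ~ RunnyNose
coef_table <- data.frame(
  term     = c("(Intercept)", "RunnyNoseYes"),
  estimate = c(99.1, -0.293),
  p.value  = c(0, 0.00268)
)

# Map p-values to stars using the cutpoints from the "Signif. codes" legend;
# intervals are closed on the right, and include.lowest keeps p = 0 in range
coef_table$stars <- as.character(cut(
  coef_table$p.value,
  breaks         = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels         = c("***", "**", "*", ".", " "),
  include.lowest = TRUE
))

print(coef_table)
```

The same column could be added to the tibble returned by tidy(), since it only depends on the p.value column.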
Logistic Model
Set Up Logistic Regression Engine
logistic_model <- logistic_reg() %>% set_engine("glm")
Runny Nose vs Nausea
# Tidymodels
tidy(logistic_model %>% fit(Nausea ~ RunnyNose, data = clean_data))
# A tibble: 2 × 5
  term         estimate std.error statistic    p.value
  <chr>           <dbl>     <dbl>     <dbl>      <dbl>
1 (Intercept)   -0.658      0.145    -4.53  0.00000589
2 RunnyNoseYes   0.0502     0.172     0.292 0.770
# Base R
summary(glm(Nausea ~ RunnyNose, family = binomial, data = clean_data))
Call:
glm(formula = Nausea ~ RunnyNose, family = binomial, data = clean_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9325 -0.9325 -0.9137 1.4439 1.4664
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.65781 0.14520 -4.530 5.89e-06 ***
RunnyNoseYes 0.05018 0.17182 0.292 0.77
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 944.65 on 729 degrees of freedom
Residual deviance: 944.57 on 728 degrees of freedom
AIC: 948.57
Number of Fisher Scoring iterations: 4
All Relevant Predictors vs Nausea
# Tidymodels
rmarkdown::paged_table(tidy(logistic_model %>% fit(
  Nausea ~ SwollenLymphNodes + ChestCongestion + ChillsSweats +
    NasalCongestion + CoughYN + Sneeze + Fatigue + SubjectiveFever +
    Headache + Weakness + WeaknessYN + CoughIntensity + CoughYN2 +
    Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain + Diarrhea +
    EyePn + Insomnia + ItchyEye + EarPn + Hearing + Pharyngitis +
    Breathless + ToothPn + Vision + Vomit + Wheeze + BodyTemp,
  data = clean_data
)))
# Base R
summary(glm(
  Nausea ~ SwollenLymphNodes + ChestCongestion + ChillsSweats +
    NasalCongestion + CoughYN + Sneeze + Fatigue + SubjectiveFever +
    Headache + Weakness + WeaknessYN + CoughIntensity + CoughYN2 +
    Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain + Diarrhea +
    EyePn + Insomnia + ItchyEye + EarPn + Hearing + Pharyngitis +
    Breathless + ToothPn + Vision + Vomit + Wheeze + BodyTemp,
  family = binomial, data = clean_data
))
Call:
glm(formula = Nausea ~ SwollenLymphNodes + ChestCongestion +
ChillsSweats + NasalCongestion + CoughYN + Sneeze + Fatigue +
SubjectiveFever + Headache + Weakness + WeaknessYN + CoughIntensity +
CoughYN2 + Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain +
Diarrhea + EyePn + Insomnia + ItchyEye + EarPn + Hearing +
Pharyngitis + Breathless + ToothPn + Vision + Vomit + Wheeze +
BodyTemp, family = binomial, data = clean_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9065 -0.8138 -0.5301 0.8581 2.4268
Coefficients: (3 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.222870 7.827409 0.028 0.977285
SwollenLymphNodesYes -0.251083 0.196029 -1.281 0.200248
ChestCongestionYes 0.275554 0.212662 1.296 0.195066
ChillsSweatsYes 0.274097 0.287828 0.952 0.340948
NasalCongestionYes 0.425817 0.254561 1.673 0.094376 .
CoughYNYes -0.140423 0.518798 -0.271 0.786644
SneezeYes 0.176724 0.210349 0.840 0.400828
FatigueYes 0.229062 0.371882 0.616 0.537925
SubjectiveFeverYes 0.277741 0.225363 1.232 0.217793
HeadacheYes 0.331259 0.284896 1.163 0.244937
WeaknessMild -0.121606 0.446886 -0.272 0.785531
WeaknessModerate 0.310849 0.454483 0.684 0.493999
WeaknessSevere 0.823187 0.510424 1.613 0.106799
WeaknessYNYes NA NA NA NA
CoughIntensityMild -0.220794 0.584367 -0.378 0.705554
CoughIntensityModerate -0.362678 0.631370 -0.574 0.565676
CoughIntensitySevere -0.950544 0.658142 -1.444 0.148659
CoughYN2Yes NA NA NA NA
MyalgiaMild -0.004146 0.368094 -0.011 0.991013
MyalgiaModerate 0.204743 0.373231 0.549 0.583301
MyalgiaSevere 0.120758 0.444927 0.271 0.786075
MyalgiaYNYes NA NA NA NA
RunnyNoseYes 0.045324 0.232645 0.195 0.845535
AbPainYes 0.939304 0.281463 3.337 0.000846 ***
ChestPainYes 0.070777 0.227858 0.311 0.756090
DiarrheaYes 1.063934 0.258705 4.113 3.91e-05 ***
EyePnYes -0.341991 0.277720 -1.231 0.218164
InsomniaYes 0.084175 0.192985 0.436 0.662710
ItchyEyeYes -0.063364 0.232501 -0.273 0.785212
EarPnYes -0.181719 0.239207 -0.760 0.447451
HearingYes 0.323052 0.452402 0.714 0.475177
PharyngitisYes 0.275364 0.266059 1.035 0.300680
BreathlessYes 0.526801 0.208579 2.526 0.011548 *
ToothPnYes 0.480649 0.229474 2.095 0.036209 *
VisionYes 0.125498 0.541114 0.232 0.816596
VomitYes 2.458466 0.348608 7.052 1.76e-12 ***
WheezeYes -0.304435 0.234084 -1.301 0.193417
BodyTemp -0.031246 0.079838 -0.391 0.695526
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 944.65 on 729 degrees of freedom
Residual deviance: 751.47 on 695 degrees of freedom
AIC: 821.47
Number of Fisher Scoring iterations: 4
Conclusions
I reiterate my thoughts from the Conclusions section under linear regression: I would still prefer manually recoding my variables for summary(glm(…)) over using tidymodels, even though both approaches return the same estimates. However, I have also noticed that summary() is not needed in the tidymodels workflow for the statistics to be displayed. Instead, calling tidy() on the result of fit() returns them directly. Just like the linear regression models, the only difference between the two logistic models is that the second includes all of the predictors rather than just one.
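As a point of comparison in base R, printing a fitted glm object shows only the estimates, while the full coefficient table lives in its summary. The sketch below uses made-up data (toy_data, toy_fit, and the values are hypothetical, not the flu data) and pulls that table out with coef(summary(...)), which returns a matrix that converts directly to a data frame, similar in spirit to what tidy() returns.

```r
# Hypothetical toy data with a binary outcome and one binary predictor
set.seed(42)
toy_data <- data.frame(
  Nausea    = factor(sample(c("No", "Yes"), 60, replace = TRUE)),
  RunnyNose = factor(sample(c("No", "Yes"), 60, replace = TRUE))
)

toy_fit <- glm(Nausea ~ RunnyNose, family = binomial, data = toy_data)

# Printing the fit alone shows estimates but no standard errors or p-values
print(toy_fit)

# The summary holds the full coefficient matrix
coef_matrix <- coef(summary(toy_fit))

# Convert to a data frame with the term names as a regular column
coef_df <- data.frame(
  term = rownames(coef_matrix),
  coef_matrix,
  row.names   = NULL,
  check.names = FALSE
)
print(coef_df)
```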
All in all, while tidymodels would not be my go-to for model fitting, the package has a few functions I would like to keep in mind going forward.