---
title: "dataquieR example report"
author: "Elisa Kasbohm, Joany Marino, Elena Salogni, Adrian Richter, Stephan Struckmann, Carsten Oliver Schmidt"
output:
  rmarkdown::html_vignette:
    css: dfg_qs_style.css
    toc: TRUE
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{dataquieR example report}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Preface

This is a brief example report built with `dataquieR` functions. All outputs are disabled to avoid large files and long run times on CRAN. For a longer, more elaborate example, see our [online example with data from SHIP](https://dataquality.qihs.uni-greifswald.de/VIN_DQ-report-SHIP-example.html).

Also consider the `dq_report2` function, which creates interactive reports that can be viewed in a web browser.

# INTEGRITY

## Study data

```r
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
sd1 <- study_data
```

The imported study data consist of:

* N = `dim(sd1)[1]` observations and
* P = `dim(sd1)[2]` study variables

## Metadata

```r
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))
md1 <- meta_data
```

The imported metadata provide information for:

* P = `dim(md1)[1]` study variables and
* Q = `dim(md1)[2]` attributes

## Applicability

The call to this R function requires only two inputs, the study data and the metadata:

```r
appmatrix <- pro_applicability_matrix(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL"
)
```

Heatmap-like plot:

```r
appmatrix$ApplicabilityPlot
```

# COMPLETENESS

## Unit missingness

```r
my_unit_missings2 <- com_unit_missingness(
  study_data = sd1,
  meta_data = md1,
  id_vars = c("CENTER_0", "PSEUDO_ID"),
  strata_vars = "CENTER_0",
  label_col = "LABEL"
)
```

```r
my_unit_missings2$SummaryData
```

## Segment missingness

```r
MissSegs <- com_segment_missingness(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL",
  threshold_value = 5,
  direction = "high",
  exclude_roles = c("secondary", "process")
)
```

```r
MissSegs$SummaryPlot
```

### Adding variables for stratification

Some analyses require adding new or transformed variables to the study data.

```r
# use the month function of the lubridate package to extract month of exam date
library(lubridate)

# apply changes to a copy of the data
sd2 <- sd1

# extract the month of the exam date (v00013)
sd2$month <- month(sd2$v00013)
```

Static metadata for the new variable must then be added to the metadata.

```r
MD_TMP <- prep_add_to_meta(
  VAR_NAMES = "month",
  DATA_TYPE = "integer",
  LABEL = "EXAM_MONTH",
  VALUE_LABELS = "1 = January | 2 = February | 3 = March | 4 = April | 5 = May | 6 = June | 7 = July | 8 = August | 9 = September | 10 = October | 11 = November | 12 = December",
  meta_data = md1
)
```

Subsequent calls to the R function can then include the new variable.

```r
MissSegs <- com_segment_missingness(
  study_data = sd2,
  meta_data = MD_TMP,
  group_vars = "EXAM_MONTH",
  label_col = "LABEL",
  threshold_value = 1,
  direction = "high",
  exclude_roles = c("secondary", "process")
)
```

```r
MissSegs$SummaryPlot
```

## Item missingness

The following implementation also considers labeled missing codes. Using a table of code labels is optional but recommended. The missing code labels used in the simulated study data are loaded as follows:

```r
code_labels <- prep_get_data_frame("meta_data_v2|missing_table")
```

```r
item_miss <- com_item_missingness(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL",
  show_causes = TRUE,
  cause_label_df = code_labels,
  include_sysmiss = TRUE,
  threshold_value = 80
)
```

The function call above enables the analysis of causes for missing values (`show_causes = TRUE`), includes system missings with their own code (`include_sysmiss = TRUE`), and sets the threshold to 80%.
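To make the interplay of missing codes, system missings, and a percentage threshold more concrete, the following base R sketch computes the share of missing values for one toy variable. It is not how `dataquieR` implements `com_item_missingness`: the vector `x`, the codes 99980 and 99983 with their labels, and the helper `missing_share()` are hypothetical and serve only as an illustration.

```r
# Hypothetical toy variable: valid measurements, two made-up missing codes,
# and system missings (NA).
x <- c(120, 135, 99980, NA, 128, 99983, 99980, 141, NA, 117)

# Hypothetical missing codes and labels (not taken from the package's table).
missing_codes <- c("99980" = "refusal", "99983" = "device failure")

# Percentage of values that are a missing code or, optionally, a system missing.
missing_share <- function(values, codes, include_sysmiss = TRUE) {
  is_coded <- values %in% as.numeric(names(codes))  # NA never matches a code
  is_sysmiss <- is.na(values)
  n_missing <- sum(is_coded) + if (include_sysmiss) sum(is_sysmiss) else 0
  100 * n_missing / length(values)
}

# Frequency of each labeled cause among the coded values.
coded_values <- x[x %in% as.numeric(names(missing_codes))]
cause_counts <- table(missing_codes[as.character(coded_values)])

pct_all <- missing_share(x, missing_codes)                             # 50
pct_coded <- missing_share(x, missing_codes, include_sysmiss = FALSE)  # 30
```

With these toy data, three values carry a missing code and two are system missings, so 50% of the values count as missing when system missings are included and 30% when they are not. Either share could then be compared against a threshold such as the 80% used above.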
```r
item_miss$SummaryTable
```

### Summary plot of item missingness

```r
item_miss$SummaryPlot
```

# CONSISTENCY

## Limit deviations

```r
MyValueLimits <- con_limit_deviations(
  resp_vars = NULL,
  label_col = "LABEL",
  study_data = sd1,
  meta_data = md1,
  limits = "HARD_LIMITS"
)
```

### Summary table

```r
MyValueLimits$SummaryTable
```

### Summary plot

```r
# select variables with deviations
whichdeviate <- unique(as.character(MyValueLimits$SummaryData$Variables)[
  MyValueLimits$SummaryData$Number > 0 &
    MyValueLimits$SummaryData$Section != "within"
])
```

```r
patchwork::wrap_plots(
  plotlist = MyValueLimits$SummaryPlotList[whichdeviate],
  ncol = 2
)
```

## Inadmissible levels

```r
IAVCatAll <- con_inadmissible_categorical(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL"
)
```

## Contradictions

```r
checks <- read.csv(
  system.file("extdata", "contradiction_checks.csv", package = "dataquieR"),
  header = TRUE,
  sep = "#"
)
```

```r
AnyContradictions <- con_contradictions(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL",
  check_table = checks,
  threshold_value = 1
)
```

```r
AnyContradictions$SummaryTable
```

```r
AnyContradictions$SummaryPlot
```

# ACCURACY

```r
# store the result so that the plot list can be accessed below
ruol <- acc_robust_univariate_outlier(
  study_data = sd1,
  meta_data = md1,
  label_col = "LABEL"
)

c(
  # head(ruol$SummaryPlotList, 2),
  tail(ruol$SummaryPlotList, 2)
)
```

```r
myloess <- dataquieR::acc_loess(
  resp_vars = "SBP_0",
  group_vars = "USR_BP_0",
  time_vars = "EXAM_DT_0",
  label_col = "LABEL",
  study_data = sd1,
  meta_data = md1
)

myloess$SummaryPlotList
```