News
threshold_value from acc_varcomp()
loess and margins plot slightly improved
Amendment to 2.5.0 news
threshold_value from acc_varcomp()
New features
dq_report2() can store results on disk instead of in RAM with the new argument storr_factory. This can help reduce memory consumption, but we suggest using fast SSDs or NVMes.
options(dataquieR.dontwrapresults = TRUE). With options(dataquieR.testdebug = TRUE), you can switch off this behavior.
dataquieR can provision your function arguments from the metadata.
In order to enable lapply and Vectorize(SIMPLIFY = FALSE) with indicator functions, the first argument is now always resp_vars for item level functions.
dataquieR tries to guess whether a function that features both resp_vars and study_data as its first arguments was called without resp_vars but only with study_data as its first unnamed argument. If that is the case, it sets resp_vars to the default for resp_vars (typically all variables). With options(dataquieR.testdebug = TRUE), you can switch off this behavior if needed.
dq_report_by, in which it is possible to specify:
(resp_vars)
(id_vars)
int_encoding_errors checking invalid characters present in the text with respect to the expected character encoding / code page, e.g., a code point from the latin1 table is used but the encoding is utf8, resulting in damaged text output
Item-level data quality dashboard, usable to customize data summaries
CODE_LIST_TABLE in the metadata, where it is possible to state both value label tables and missing list tables all in one table.
item_computation_level in the metadata, where it is possible to state variables to be computed from the provided study data.
Breaking changes
If you use prep_get_data_frame("ship") or prep_get_data_frame("study_data") in your code to access example data, no change is needed. If you are still accessing example data using system.file() (e.g., using load(system.file("extdata", "study_data.RData", package = "dataquieR"))), you need to switch to prep_get_data_frame(), i.e.:
load(system.file("extdata", "study_data.RData", package = "dataquieR")) would become study_data <- prep_get_data_frame("study_data")
SummaryData in ResultData (functions: acc_shape_or_scale, acc_margins, com_segment_missingness)
GRADING from SummaryData outputs. SummaryTable outputs still feature the column, since these are meant to be a machine-readable interface
con_contradictions_redcap used to return a result named SummaryTable, while the documentation spoke about SummaryData. Alas, it should have been VariableGroupTable in both cases. If you relied on SummaryTable in the results of con_contradictions_redcap, you now need to change your code to use the correct output name VariableGroupTable. Also, the table has been slightly modified.
VariableGroupData as returned by con_contradictions_redcap is a version optimized for human readers.
In VariableGroupTable as returned by con_contradictions_redcap, the column category has been renamed to CONTRADICTION_TYPE
In con_contradictions_redcap, if summarize_categories is selected, the result will now be in a sub-list named Other
In prep_add_computed_variables, the column resp_vars is now named VAR_NAMES, to be more in line with other data frames.
Reporting
plotly's interactive figures
[.dataquieR_resultset2 and [[.dataquieR_result and related functions have changed slightly. For a report (r <- dq_report2(...)), you can now call, e.g.,
r[, "com_item_missingness", "ReportSummaryTable"] to get a balloon plot or r[, "com_item_missingness", "SummaryData"] to get a table, for all variables that were assessed with com_item_missingness() in the report r
dataquieR_result objects, these will be combined, but due to restrictions in R, this only works if you call print() explicitly on this list, not with "auto-printing" (see https://stackoverflow.com/a/53983005), for example:
a <- lapply(c("v00001", "v00004", "v00005", "v00006"), acc_loess, meta_data_v2 = "meta_data_v2", study_data = "study_data")
print(a)
works, but typing a alone does not. You have to call print() or put lapply() in parentheses: (lapply())
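The difference between an explicit print() call and auto-printing can be reproduced with base R alone. The following is a minimal sketch, not dataquieR's implementation: print.list is a hypothetical stand-in method and the results are plain lists. It assumes the usual R behavior that an explicit print() dispatches on the implicit class "list", whereas auto-printing only dispatches for objects that carry a class attribute.

```r
# Hypothetical stand-in for a method that combines a list of results.
# This is an assumption for illustration, not the method dataquieR registers.
print.list <- function(x, ...) {
  cat("combined view of", length(x), "results\n")
  invisible(x)
}

# A plain list, as returned by lapply(); it has no class attribute.
results <- lapply(1:3, function(i) list(id = i))
is_classed <- is.object(results)  # FALSE for a plain list

# An explicit print() call uses S3 dispatch on the implicit class "list",
# so the stand-in method above is called.
explicit_output <- capture.output(print(results))
```

Typing results alone at the console would bypass print.list, because auto-printing skips S3 dispatch for values without a class attribute.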
(Indicator) Functions related
acc_distributions() was split into acc_distributions() and acc_distributions_ecdf() (prep_acc_distributions_with_ecdf() creates the original plot)
acc_cat_distributions()
meta_data_v2 argument
New argument item_level, as a synonym for meta_data, new argument segment_level, as a synonym for meta_data_segment, new argument dataframe_level, as a synonym for meta_data_dataframe, new argument cross-item_level, as a synonym for meta_data_cross_item, new argument item_computation_level, as a synonym for meta_data_item_computation
label_col, the label_col will now default to LABEL, unless you set the option options(dataquieR.testdebug = TRUE) or options(dataquieR.dontwrapresults = TRUE)
resp_vars in prep_scalelevel_from_data_and_metadata() never worked correctly and was not used either, so it has been deprecated. It is already non-functional, and it always was
des_summary is still present, but you can now get results for continuous or categorical variables only, using des_summary_continuous and des_summary_categorical, respectively
con_contradictions_redcap plot colors vary depending on CONTRADICTION_TYPES
acc_loess() uses lowess instead of loess (both from the stats package)
General
prep_check_for_dataquieR_updates(), so, maybe, you need to manually install the latest beta release using devtools::install_gitlab("libreumg/dataquieR", auth_token = NULL)
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u") is now the default assuming a one-fits-all-metadata-file (see ? dataquieR.ELEMENT_MISSMATCH_CHECKTYPE)
rlang or withr, most prominently a faster prep_prepare_dataframes() and rlang compatible condition (error) handling.
dataquieR_result class, which is now applied also to results outside a pipeline.
SEGMENT_ID_TABLE to SEGMENT_ID_REF_TABLE in segment level metadata
dq_report_by files structure
HTML reports
CODE_INTERPRET changed to be in line with the AAPOR definitions, so the following translation:
PP -> P; P -> I; OH -> UO
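If existing missing-code tables still carry the old CODE_INTERPRET values, the translation above could be applied in one pass, roughly as sketched below. The helper translate_code_interpret and the example vector are hypothetical and not part of dataquieR. The lookup is applied once per element, so a PP that becomes P is not translated a second time to I.

```r
# Old -> new CODE_INTERPRET values, taken from the translation listed above.
code_interpret_map <- c(PP = "P", P = "I", OH = "UO")

# Hypothetical helper (not part of dataquieR): translate a character vector
# of old CODE_INTERPRET values in a single pass; other values stay untouched.
translate_code_interpret <- function(codes) {
  hit <- codes %in% names(code_interpret_map)
  codes[hit] <- unname(code_interpret_map[codes[hit]])
  codes
}

# Example input; "NC" and NA only stand for values that are not translated.
old_codes <- c("PP", "P", "OH", "NC", NA)
new_codes <- translate_code_interpret(old_codes)
```

A chained replacement (first PP to P, then P to I) would turn every former PP into I, which is why the sketch uses a single lookup.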
prep_save_report and prep_load_report
HTML/JS output for Firefox
plot.ly-plots
gginnards installed; removed dependency from gginnards.
robustbase about doScale
dq_report2 reports
summarytools are included in dq_report2 reports, if installed.
HTML generation prepared
dq_report2 using a queue improves speed
VARIABLE_ROLES in dq_report2 and suppressing helper variable outputs in dq_report_by
dq_report2 and not directly by the user
dq_report2 because it is not so useful in its current implementation
dq_report_by for large reports (can write and optionally render results to disk rather than returning them)
dq_report_by causing DATA_PROCESS not to work
TODO's in dq_report_by and add dependent variables on the fly but with VARIABLE_ROLE suppress:
dq_report_by
dq_report_by
filter_result_slots in dq_report2)
JS-table prevented controlling the table
VARIABLE_ROLES filtered items
UNIVARIATE_OUTLIER_CHECKTYPE and MULTIVARIATE_OUTLIER_CHECKTYPE
REDCap syntax: strictly_successive_dates and successive_dates
REDCap rules and NA handling and DATA_PROCESS.
use_value_labels is not supported anymore. You can specify the behavior on the rules level in the new cross-item-level metadata column DATA_PREPARATION
END_DIGIT_CHECK in dq_report2 (DATA_ENTRY_TYPE is still supported and auto-converted). If missing, END_DIGIT_CHECK defaults to FALSE
NA were in the data
JUMP_LIST could be added to the item-level metadata if missing, but causing this type of failing rules
Windows and uncommon variable names
prep_load_workbook_like_file and meta_data_v2 = formal in dq_report2) supporting http and https URLs (e.g., Excel or OpenOffice workbooks)
dq_report2 replaces dq_report. Please use dq_report2 from now on.
htmltools and supports plotly)
data.frame, and cross-item levels). No action is required by the user; the previous version is still supported
REDCap rules for contradictions (cross-item level metadata); the previous contradictions function is still supported
data.frame-level metadata)
AAPOR concept
acc_univariate_outlier and acc_multivariate_outlier now allow selecting the methods used to flag outliers
If whoami is installed, reports now show a more suitable user name
~ from the ggplot2 updates causing acc_margins to fail for categorical variables
dq_report reports with wrong brackets
ggplot2 3.4.0
ORCIDs for two authors
CITATION file
README.md file adding the funding sources.
NEWS.md file
sigmagap and made missing guessing more robust.
logical.
acc_margins.
GRADING columns.
rbind.ReportSummaryTable since these are not needed anyway and the inherited documentation for those arguments of rbind from base contains an invalid URL triggering a NOTE.
int_datatype_matrix.
prep_study2meta can now also convert factors to dataquieR compatible meta_data/study_data
com_item_missingness for textual response variables.
DT JS is always loaded when a dq_report report is rendered
com_segment_missingness with strata_vars / group_vars did not work
If label_col was set to something other than LABEL, strata_vars did not work for com_unit_missingness
dq_report.
cowplot to patchwork in acc_margins, yielding figures that can be manipulated more easily. Please note that this change could break existing output manipulations, since the structure of the margins plots has changed internally. However, output manipulations were hardly possible for margins plots before, so it is unlikely that any pipelines are affected.
acc_loess function.
prep_create_meta handling length-0 arguments by ignoring these variable attributes altogether.
con_inadmissible_categorical (one resp_var only and value-limits all the same for all resp_vars)
README-File
pandoc-less systems
dataquieR function was called by a generated function f that lives in an environment directly inheriting from the empty environment, e.g., environment(f) <- new.env(parent = emptyenv()).
dontrun, because they sometimes caused NOTEs on rhub.
SummaryTable entry of a result within a dq_report output, the summary and also print generic did not work on the report.
devtools::check(cran = TRUE, env_vars = c(NOT_CRAN = "false")) takes 2:22 minutes now.