Handle Zeros and Censored Data in Compositional Data
Process compositional data containing zeros or below-detection-limit (censored) values using log-ratio imputation. Imputation is critical for compositional data analysis because the logarithm of zero is undefined: a single zero part drives the geometric mean of a composition to zero, so log-ratio transforms, and the covariance estimates built on them, break down. This function provides three strategies: multiplicative zero imputation, additive zero imputation, or probabilistic imputation using the EM algorithm.
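To see why zeros are a problem, here is a minimal base-R sketch of the centered log-ratio (clr) transform applied to a complete composition and to one containing a zero. The `clr` helper is written here for illustration only and is not part of this package.

```r
# Centered log-ratio: log of each part minus the mean of the logs
# (equivalently, the log of each part over the geometric mean).
# Illustrative helper, not a function exported by this package.
clr <- function(x) log(x) - mean(log(x))

complete  <- c(sand = 40, silt = 35, clay = 25)
with_zero <- c(sand = 0,  silt = 45, clay = 55)

clr_complete  <- clr(complete)   # finite coordinates that sum to zero
clr_with_zero <- clr(with_zero)  # log(0) is -Inf, so no coordinate is finite

print(clr_complete)
print(clr_with_zero)
```

Because every clr coordinate depends on the mean of the logs, one zero part makes the whole row unusable, not only the zero entry.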
Arguments
- comp_data
A data frame with columns representing compositional parts (e.g., sand, silt, clay). Rows with at least one zero or negative value are candidates for imputation.
- method
Character string specifying imputation approach:
  - "mzero" (default): Multiplicative zero imputation using zCompositions::multLRM()
  - "azero": Additive zero imputation using zCompositions::addLRM()
  - "lrem": Log-ratio expectation-maximization using zCompositions::lrEM()
Recommended: Use "lrem" for data with many zeros; use "mzero" for sparse zeros. Both are log-ratio based and preserve compositional geometry.
- dl
Numeric vector of detection limits (one per column) for handling censored data. Default NULL treats all positive values as observed. If provided, values below dl[i] are treated as left-censored in column i. Only used if method = "lrem".
- tolerance
Numeric tolerance for identifying zeros (default 1e-10). Values with absolute value below this threshold are treated as zeros.
Value
A list containing:
- imputed_data: Data frame with imputed compositional values
- n_zeros_imputed: Integer count of zero/censored values imputed
- imputation_rate: Proportion of values imputed (n_zeros / total_values)
- method_used: Character string with imputation method name
- row_status: Factor indicating which rows were modified:
  - "observed": No imputation needed (no zeros/negative values)
  - "imputed": At least one zero/censored value replaced
  - "failed": Imputation failed (row excluded from analysis)
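As a rough illustration of how the summary fields relate to the input, the base-R sketch below derives a zero count, an imputation rate, and a row status from a small data frame. The helper `summarise_zeros` is hypothetical and is not the package's implementation; in particular it never produces the "failed" status, which depends on the imputation step itself.

```r
# Hypothetical helper for illustration; not the package's implementation.
summarise_zeros <- function(comp_data, tolerance = 1e-10) {
  x <- as.matrix(comp_data)
  # Cells that need imputation: within `tolerance` of zero, or negative
  needs_imputation <- abs(x) < tolerance | x < 0
  n_zeros <- sum(needs_imputation)
  list(
    n_zeros_imputed = n_zeros,
    imputation_rate = n_zeros / length(x),  # n_zeros / total_values
    row_status = factor(
      ifelse(rowSums(needs_imputation) > 0, "imputed", "observed"),
      levels = c("observed", "imputed", "failed")
    )
  )
}

soil <- data.frame(
  sand = c(40, 35, 0),
  silt = c(35, 40, 45),
  clay = c(25, 25, 55)
)
zero_summary <- summarise_zeros(soil)
print(zero_summary$imputation_rate)
print(table(zero_summary$row_status))
```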
Details
Zero Imputation Strategies:
Multiplicative Zero Replacement (mzero): Replaces zeros with a small fraction of the detection limit, then re-closes each row by scaling the non-zero parts by a common factor, which preserves the ratios among them. Fast, appropriate for few isolated zeros.
Additive Zero Replacement (azero): Adds a small constant to all values before closure. Conservative and robust, but can distort low-abundance components.
Log-Ratio EM (lrem): Probabilistic imputation using expectation-maximization on log-ratio transformed data. Respects compositional geometry while honoring censoring patterns. Most theoretically sound but slower than replacement methods.
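The difference between the two replacement strategies can be sketched in base R. This is a simplified illustration under stated assumptions, not the code this function runs: the helpers `closure`, `mult_replace`, and `add_replace` are hypothetical, and the replacement value `delta = 0.5` is an arbitrary choice made for the example.

```r
# Rescale each row so that it sums to `total` (closure)
closure <- function(x, total = 100) x / rowSums(x) * total

# Multiplicative replacement (illustrative): zeros become `delta`, and the
# non-zero parts of that row shrink by a common factor so the row total holds.
mult_replace <- function(x, delta = 0.5, total = 100) {
  x <- closure(as.matrix(x), total)
  is_zero <- x == 0
  shrink <- 1 - rowSums(is_zero) * delta / total  # one factor per row
  out <- x * shrink                               # recycled down the rows
  out[is_zero] <- delta
  out
}

# Additive replacement (illustrative): add `delta` to every part, then re-close.
add_replace <- function(x, delta = 0.5, total = 100) {
  closure(as.matrix(x) + delta, total)
}

soil <- data.frame(
  sand = c(40, 35, 0),
  silt = c(35, 40, 45),
  clay = c(25, 25, 55)
)
m <- mult_replace(soil)
a <- add_replace(soil)

# Ratio of silt to clay in the row that had a zero (originally 45 / 55)
print(m[3, "silt"] / m[3, "clay"])  # unchanged by multiplicative replacement
print(a[3, "silt"] / a[3, "clay"])  # shifted by additive replacement
```

The multiplicative variant leaves rows without zeros untouched and keeps the ratios among non-zero parts, whereas the additive variant alters every row, which is the distortion of low-abundance components noted above.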
Detection Limits:
If dl is provided and method = "lrem", values below their detection limit are treated as left-censored: the true value is known only to lie below the limit. The EM algorithm iterates to estimate the most likely values consistent with the censoring pattern and the covariance structure.
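A minimal base-R sketch of how per-column detection limits identify left-censored cells. It only marks which cells fall below their limit and does not perform the EM step; the data values and object names are made up for illustration.

```r
# Illustrative data: the sand reading in row 3 is below its detection limit
soil <- data.frame(
  sand = c(40, 35, 0.4),
  silt = c(35, 40, 44.8),
  clay = c(25, 25, 54.8)
)
dl <- c(sand = 1, silt = 1, clay = 1)

# Compare every column against its own detection limit
censored <- sweep(as.matrix(soil), 2, dl[names(soil)], FUN = "<")

# Row and column positions of the censored cells
print(which(censored, arr.ind = TRUE))
```

Each flagged cell is one whose imputed value should end up between zero and the corresponding entry of `dl`.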
Examples
if (FALSE) { # \dontrun{
# Example: Soil texture data with some zeros or missing detections
soil_samples <- data.frame(
  sand = c(40, 35, 0, 42, 38),
  silt = c(35, 40, 45, 36, 40),
  clay = c(25, 25, 55, 22, 22)
)
# Check for zeros before imputation
has_zeros <- rowSums(soil_samples == 0) > 0
print(paste("Rows with zeros:", sum(has_zeros)))
# Impute using multiplicative zero replacement
result_mzero <- gc_handle_zeros(soil_samples, method = "mzero")
print(result_mzero$imputed_data)
print(paste("Imputation rate:", result_mzero$imputation_rate))
# Impute using log-ratio EM (more principled)
result_lrem <- gc_handle_zeros(soil_samples, method = "lrem")
print(result_lrem$imputed_data)
# With detection limits (censored measurements)
detection_limits <- c(sand = 1, silt = 1, clay = 1)
result_dl <- gc_handle_zeros(soil_samples, method = "lrem", dl = detection_limits)
} # }