Set pseudorandom number generator seed for consistent results between runs (optional).
set.seed(123)
Next we will generate some sample data representing 5 “synthetic transects”.
We assume a similar “soil forming function” for these transects across each hypothetical “delineation” being transected.
Here the values have been customized so that taxonomic particle size
is fine-loamy, but coarser textured in higher fragment material. Soil
depth varies from shallow to very deep, uniform but centered around
moderately deep. pscs_*
quantities provided for example
numeric quantities that can be used.
x <- do.call('rbind', lapply(1:3, \(i) data.frame(id = paste0(LETTERS[1:10], i),
taxpartsize = c("fine-loamy","loamy","fine-loamy","fine-loamy",
"coarse-loamy", "coarse-loamy", "coarse-loamy",
"loamy-skeletal", "loamy-skeletal", "loamy-skeletal"),
depth = runif(10, 35, 150),
pscs_clay = c(runif(4, 18, 35), runif(6, 14, 18)),
pscs_frags = c(runif(3, 0, 15), runif(4, 10, 34),
runif(3, 35, 60) + c(0, 40, 0)))))
Next we define some rating functions for particle size class and depth class.
rate_taxpartsize <- function(x) {
dplyr::case_match(x,
c("sandy-skeletal") ~ 1,
c("sandy") ~ 2,
c("loamy", "coarse-loamy", "coarse-silty") ~ 3,
c("fine-loamy", "fine-silty") ~ 4,
c("clayey", "fine") ~ 5,
c("very-fine") ~ 6,
c("loamy-skeletal", "clayey-skeletal") ~ 7)
}
rate_depthclass <- function(x,
breaks = c(
`very shallow` = 25,
`shallow` = 50,
`moderately deep` = 100,
`deep` = 150,
`very deep` = 1e4
),
...) {
res <- cut(x, c(0, breaks))
factor(res, levels = levels(res), labels = names(breaks))
}
The above rating functions can be combined in a list (m
)
which will be use used as the mapping
argument to
similar_soils()
. The similar_soils()
function
applies the rating functions to the columns of the input data
x
. Target column names in the data match the names of
m
, providing the “mapping” of data to rating functions.
Now we will demonstrate iterative filtering and application of similar soils criteria.
m <- list(taxpartsize = rate_taxpartsize,
depth = rate_depthclass)
res0 <- similar_soils(x, m)
#> comparing to dominant reference condition (`7.3` on 6 rows)
res0
#> id taxpartsize depth similar_dist similar
#> 1 A1 4 3 3 FALSE
#> 2 B1 3 4 5 FALSE
#> 3 C1 4 3 3 FALSE
#> 4 D1 4 4 4 FALSE
#> 5 E1 3 4 5 FALSE
#> 6 F1 3 2 5 FALSE
#> 7 G1 3 3 4 FALSE
#> 8 H1 7 4 1 TRUE
#> 9 I1 7 3 0 TRUE
#> 10 J1 7 3 0 TRUE
#> 11 A2 4 4 4 FALSE
#> 12 B2 3 4 5 FALSE
#> 13 C2 4 4 4 FALSE
#> 14 D2 4 4 4 FALSE
#> 15 E2 3 2 5 FALSE
#> 16 F2 3 3 4 FALSE
#> 17 G2 3 4 5 FALSE
#> 18 H2 7 3 0 TRUE
#> 19 I2 7 3 0 TRUE
#> 20 J2 7 3 0 TRUE
#> 21 A3 4 4 4 FALSE
#> 22 B3 3 2 5 FALSE
#> 23 C3 4 3 3 FALSE
#> 24 D3 4 3 3 FALSE
#> 25 E3 3 4 5 FALSE
#> 26 F3 3 3 4 FALSE
#> 27 G3 3 4 5 FALSE
#> 28 H3 7 4 1 TRUE
#> 29 I3 7 4 1 TRUE
#> 30 J3 7 3 0 TRUE
First, we accept what is identified as the “dominant condition” for
the dataset, including the similar soils, will be a major component of
the hypothetical mapunit. Identifying soils similar to
"7.3"
(moderately deep, skeletal), corresponds to the soils
supporting a hypothetical major component in a mapunit.
We might consider selecting a different reference condition manually
after inspection. If we were to do that we could set, for example,
condition="4.4"
to select condition "4.4"
rather than "7.3"
that was automatically selected in this
example.
Next, let’s take the remaining dissimilar soils, and re-apply the
similarity criteria based on the next-most-dominant condition. We could
consider loosening the concept of “similarity” by setting
thresh=2
.
y <- subset(x, !res0$similar, select = c("id", "taxpartsize", "depth"))
res1 <- similar_soils(y, m)
#> comparing to dominant reference condition (`3.4` on 6 rows)
res1
#> id taxpartsize depth similar_dist similar
#> 1 A1 4 3 2 FALSE
#> 2 B1 3 4 0 TRUE
#> 3 C1 4 3 2 FALSE
#> 4 D1 4 4 1 TRUE
#> 5 E1 3 4 0 TRUE
#> 6 F1 3 2 2 FALSE
#> 7 G1 3 3 1 TRUE
#> 11 A2 4 4 1 TRUE
#> 12 B2 3 4 0 TRUE
#> 13 C2 4 4 1 TRUE
#> 14 D2 4 4 1 TRUE
#> 15 E2 3 2 2 FALSE
#> 16 F2 3 3 1 TRUE
#> 17 G2 3 4 0 TRUE
#> 21 A3 4 4 1 TRUE
#> 22 B3 3 2 2 FALSE
#> 23 C3 4 3 2 FALSE
#> 24 D3 4 3 2 FALSE
#> 25 E3 3 4 0 TRUE
#> 26 F3 3 3 1 TRUE
#> 27 G3 3 4 0 TRUE
At this second step, "3.4"
(loamy, moderately deep) is
the dominant condition, also identified as similar are
"3.4"
(coarse-loamy, deep). One might consider which one of
these is the best representative condition for the mapunit (including
unobserved areas) regardless of what is “dominant” per se.
If there are issues with dissimilar soils being included in the same
groups, consider revising the rating functions to ensure dissimilar
properties have a distance greater than the set threshold
(thresh
). With similar_soils()
, you can
specify an alternate condition
to compare against, or a
thresh
value higher or lower than 1
.
z <- subset(x, !x$id %in% c(res0$id[res0$similar], res1$id[res1$similar]),
select = c("id", "taxpartsize", "depth"))
res2 <- similar_soils(z, m)
#> comparing to dominant reference condition (`4.3` on 4 rows)
res2
#> id taxpartsize depth similar_dist similar
#> 1 A1 4 3 0 TRUE
#> 3 C1 4 3 0 TRUE
#> 6 F1 3 2 2 FALSE
#> 15 E2 3 2 2 FALSE
#> 22 B3 3 2 2 FALSE
#> 23 C3 4 3 0 TRUE
#> 24 D3 4 3 0 TRUE
Applying the similar soils criteria a final time, we are left with
two components, "4.3"
and "3.2"
; the latter is
dissimilar (and limiting).
Let’s reconstruct a data.frame, with the 4 groups of similar soils each identified with a greek letter. We will see which is the most prevalent overall based on the whole dataset. We could also assess prevalence within individual transects.
fin <- do.call('rbind', list(
data.frame(component = greekletters[[1]][1], subset(res0, similar)),
data.frame(component = greekletters[[1]][2], subset(res1, similar)),
data.frame(component = greekletters[[1]][3], subset(res2, similar)),
data.frame(component = greekletters[[1]][4], subset(res2, !similar))
))
# label any unassigned observations
una <- subset(res0, !res0$id %in% fin$id)
if (nrow(una) > 0) {
fin <- rbind(fin, data.frame(component = "unassigned", una))
}
# put in original order of dataset
fin <- fin[match(x$id, fin$id), ]
We can tabulate the assignments we made and see how that corresponds with our concept for the relative abundance of the soils on the landscape in the typical delineation.
res <- sort(prop.table(table(fin$component)), decreasing = TRUE)
res
#>
#> Beta Alpha Gamma Delta
#> 0.4666667 0.3000000 0.1333333 0.1000000
# TODO: abstract this concept
cmp <- subset(fin, component == names(res[1]))
ref <- names(tail(sort(table(
interaction(cmp$taxpartsize, cmp$depth)
)), 1))
fin_sim <- similar_soils(x, m, ref)
#> comparing to dominant reference condition (`3.4` on 6 rows)
# transfer similarity distance and similar ranking
fin$similar_dist <- fin_sim$similar_dist
fin$similar <- fin_sim$similar # similarity to the dominant condition within Beta
# original sort order
fin
#> component id taxpartsize depth similar_dist similar
#> 1 Gamma A1 4 3 2 FALSE
#> 2 Beta B1 3 4 0 TRUE
#> 3 Gamma C1 4 3 2 FALSE
#> 4 Beta D1 4 4 1 TRUE
#> 5 Beta E1 3 4 0 TRUE
#> 6 Delta F1 3 2 2 FALSE
#> 7 Beta G1 3 3 1 TRUE
#> 8 Alpha H1 7 4 4 FALSE
#> 9 Alpha I1 7 3 5 FALSE
#> 10 Alpha J1 7 3 5 FALSE
#> 11 Beta A2 4 4 1 TRUE
#> 12 Beta B2 3 4 0 TRUE
#> 13 Beta C2 4 4 1 TRUE
#> 14 Beta D2 4 4 1 TRUE
#> 15 Delta E2 3 2 2 FALSE
#> 16 Beta F2 3 3 1 TRUE
#> 17 Beta G2 3 4 0 TRUE
#> 18 Alpha H2 7 3 5 FALSE
#> 19 Alpha I2 7 3 5 FALSE
#> 20 Alpha J2 7 3 5 FALSE
#> 21 Beta A3 4 4 1 TRUE
#> 22 Delta B3 3 2 2 FALSE
#> 23 Gamma C3 4 3 2 FALSE
#> 24 Gamma D3 4 3 2 FALSE
#> 25 Beta E3 3 4 0 TRUE
#> 26 Beta F3 3 3 1 TRUE
#> 27 Beta G3 3 4 0 TRUE
#> 28 Alpha H3 7 4 4 FALSE
#> 29 Alpha I3 7 4 4 FALSE
#> 30 Alpha J3 7 3 5 FALSE
component | id | taxpartsize | depth | similar_dist | similar | |
---|---|---|---|---|---|---|
2 | Beta | B1 | 3 | 4 | 0 | TRUE |
5 | Beta | E1 | 3 | 4 | 0 | TRUE |
12 | Beta | B2 | 3 | 4 | 0 | TRUE |
17 | Beta | G2 | 3 | 4 | 0 | TRUE |
25 | Beta | E3 | 3 | 4 | 0 | TRUE |
27 | Beta | G3 | 3 | 4 | 0 | TRUE |
4 | Beta | D1 | 4 | 4 | 1 | TRUE |
7 | Beta | G1 | 3 | 3 | 1 | TRUE |
11 | Beta | A2 | 4 | 4 | 1 | TRUE |
13 | Beta | C2 | 4 | 4 | 1 | TRUE |
14 | Beta | D2 | 4 | 4 | 1 | TRUE |
16 | Beta | F2 | 3 | 3 | 1 | TRUE |
21 | Beta | A3 | 4 | 4 | 1 | TRUE |
26 | Beta | F3 | 3 | 3 | 1 | TRUE |
1 | Gamma | A1 | 4 | 3 | 2 | FALSE |
3 | Gamma | C1 | 4 | 3 | 2 | FALSE |
6 | Delta | F1 | 3 | 2 | 2 | FALSE |
15 | Delta | E2 | 3 | 2 | 2 | FALSE |
22 | Delta | B3 | 3 | 2 | 2 | FALSE |
23 | Gamma | C3 | 4 | 3 | 2 | FALSE |
24 | Gamma | D3 | 4 | 3 | 2 | FALSE |
8 | Alpha | H1 | 7 | 4 | 4 | FALSE |
28 | Alpha | H3 | 7 | 4 | 4 | FALSE |
29 | Alpha | I3 | 7 | 4 | 4 | FALSE |
9 | Alpha | I1 | 7 | 3 | 5 | FALSE |
10 | Alpha | J1 | 7 | 3 | 5 | FALSE |
18 | Alpha | H2 | 7 | 3 | 5 | FALSE |
19 | Alpha | I2 | 7 | 3 | 5 | FALSE |
20 | Alpha | J2 | 7 | 3 | 5 | FALSE |
30 | Alpha | J3 | 7 | 3 | 5 | FALSE |