Skip to contents

Set pseudorandom number generator seed for consistent results between runs (optional).

Next we will generate some sample data representing 5 “synthetic transects”.

We assume a similar “soil forming function” for these transects across each hypothetical “delineation” being transected.

Here the values have been customized so that taxonomic particle size is fine-loamy, but coarser textured in higher fragment material. Soil depth varies from shallow to very deep, uniform but centered around moderately deep. pscs_* quantities provided for example numeric quantities that can be used.

x <- do.call('rbind', lapply(1:3, \(i) data.frame(id = paste0(LETTERS[1:10], i),
           taxpartsize = c("fine-loamy","loamy","fine-loamy","fine-loamy",
                           "coarse-loamy", "coarse-loamy", "coarse-loamy",
                           "loamy-skeletal", "loamy-skeletal", "loamy-skeletal"),
           depth = runif(10, 35, 150),
           pscs_clay = c(runif(4, 18, 35), runif(6, 14, 18)),
           pscs_frags = c(runif(3, 0, 15), runif(4, 10, 34),
                          runif(3, 35, 60) + c(0, 40, 0)))))

Next we define some rating functions for particle size class and depth class.

rate_taxpartsize <- function(x) {
  dplyr::case_match(x,
                    c("sandy-skeletal") ~ 1,
                    c("sandy") ~ 2,
                    c("loamy", "coarse-loamy", "coarse-silty") ~ 3,
                    c("fine-loamy", "fine-silty") ~ 4,
                    c("clayey", "fine") ~ 5,
                    c("very-fine") ~ 6,
                    c("loamy-skeletal", "clayey-skeletal") ~ 7)
}

rate_depthclass <- function(x,
                            breaks = c(
                              `very shallow` = 25,
                              `shallow` = 50,
                              `moderately deep` = 100,
                              `deep` = 150,
                              `very deep` = 1e4
                            ),
                            ...) {
  res <- cut(x, c(0, breaks))
  factor(res, levels = levels(res), labels = names(breaks))
}

The above rating functions can be combined in a list (m) which will be use used as the mapping argument to similar_soils(). The similar_soils() function applies the rating functions to the columns of the input data x. Target column names in the data match the names of m, providing the “mapping” of data to rating functions.

Now we will demonstrate iterative filtering and application of similar soils criteria.

m <- list(taxpartsize = rate_taxpartsize,
          depth =  rate_depthclass)

res0 <- similar_soils(x, m)
#> comparing to dominant reference condition (`7.3` on 6 rows)
res0
#>    id taxpartsize depth similar_dist similar
#> 1  A1           4     3            3   FALSE
#> 2  B1           3     4            5   FALSE
#> 3  C1           4     3            3   FALSE
#> 4  D1           4     4            4   FALSE
#> 5  E1           3     4            5   FALSE
#> 6  F1           3     2            5   FALSE
#> 7  G1           3     3            4   FALSE
#> 8  H1           7     4            1    TRUE
#> 9  I1           7     3            0    TRUE
#> 10 J1           7     3            0    TRUE
#> 11 A2           4     4            4   FALSE
#> 12 B2           3     4            5   FALSE
#> 13 C2           4     4            4   FALSE
#> 14 D2           4     4            4   FALSE
#> 15 E2           3     2            5   FALSE
#> 16 F2           3     3            4   FALSE
#> 17 G2           3     4            5   FALSE
#> 18 H2           7     3            0    TRUE
#> 19 I2           7     3            0    TRUE
#> 20 J2           7     3            0    TRUE
#> 21 A3           4     4            4   FALSE
#> 22 B3           3     2            5   FALSE
#> 23 C3           4     3            3   FALSE
#> 24 D3           4     3            3   FALSE
#> 25 E3           3     4            5   FALSE
#> 26 F3           3     3            4   FALSE
#> 27 G3           3     4            5   FALSE
#> 28 H3           7     4            1    TRUE
#> 29 I3           7     4            1    TRUE
#> 30 J3           7     3            0    TRUE

First, we accept what is identified as the “dominant condition” for the dataset, including the similar soils, will be a major component of the hypothetical mapunit. Identifying soils similar to "7.3" (moderately deep, skeletal), corresponds to the soils supporting a hypothetical major component in a mapunit.

We might consider selecting a different reference condition manually after inspection. If we were to do that we could set, for example, condition="4.4" to select condition "4.4" rather than "7.3" that was automatically selected in this example.

Next, let’s take the remaining dissimilar soils, and re-apply the similarity criteria based on the next-most-dominant condition. We could consider loosening the concept of “similarity” by setting thresh=2.

y <- subset(x, !res0$similar, select = c("id", "taxpartsize", "depth"))

res1 <- similar_soils(y, m)
#> comparing to dominant reference condition (`3.4` on 6 rows)
res1
#>    id taxpartsize depth similar_dist similar
#> 1  A1           4     3            2   FALSE
#> 2  B1           3     4            0    TRUE
#> 3  C1           4     3            2   FALSE
#> 4  D1           4     4            1    TRUE
#> 5  E1           3     4            0    TRUE
#> 6  F1           3     2            2   FALSE
#> 7  G1           3     3            1    TRUE
#> 11 A2           4     4            1    TRUE
#> 12 B2           3     4            0    TRUE
#> 13 C2           4     4            1    TRUE
#> 14 D2           4     4            1    TRUE
#> 15 E2           3     2            2   FALSE
#> 16 F2           3     3            1    TRUE
#> 17 G2           3     4            0    TRUE
#> 21 A3           4     4            1    TRUE
#> 22 B3           3     2            2   FALSE
#> 23 C3           4     3            2   FALSE
#> 24 D3           4     3            2   FALSE
#> 25 E3           3     4            0    TRUE
#> 26 F3           3     3            1    TRUE
#> 27 G3           3     4            0    TRUE

At this second step, "3.4" (loamy, moderately deep) is the dominant condition, also identified as similar are "3.4" (coarse-loamy, deep). One might consider which one of these is the best representative condition for the mapunit (including unobserved areas) regardless of what is “dominant” per se.

If there are issues with dissimilar soils being included in the same groups, consider revising the rating functions to ensure dissimilar properties have a distance greater than the set threshold (thresh). With similar_soils(), you can specify an alternate condition to compare against, or a thresh value higher or lower than 1.

z <- subset(x, !x$id %in% c(res0$id[res0$similar], res1$id[res1$similar]),
            select = c("id", "taxpartsize", "depth"))
res2 <- similar_soils(z, m)
#> comparing to dominant reference condition (`4.3` on 4 rows)
res2
#>    id taxpartsize depth similar_dist similar
#> 1  A1           4     3            0    TRUE
#> 3  C1           4     3            0    TRUE
#> 6  F1           3     2            2   FALSE
#> 15 E2           3     2            2   FALSE
#> 22 B3           3     2            2   FALSE
#> 23 C3           4     3            0    TRUE
#> 24 D3           4     3            0    TRUE

Applying the similar soils criteria a final time, we are left with two components, "4.3" and "3.2"; the latter is dissimilar (and limiting).

Let’s reconstruct a data.frame, with the 4 groups of similar soils each identified with a greek letter. We will see which is the most prevalent overall based on the whole dataset. We could also assess prevalence within individual transects.

fin <- do.call('rbind', list(
  data.frame(component = greekletters[[1]][1], subset(res0, similar)),
  data.frame(component = greekletters[[1]][2], subset(res1, similar)),
  data.frame(component = greekletters[[1]][3], subset(res2, similar)),
  data.frame(component = greekletters[[1]][4], subset(res2, !similar))
))

# label any unassigned observations
una <- subset(res0, !res0$id %in% fin$id)
if (nrow(una) > 0) {
  fin <- rbind(fin, data.frame(component = "unassigned", una))
}

# put in original order of dataset
fin <- fin[match(x$id, fin$id), ]

We can tabulate the assignments we made and see how that corresponds with our concept for the relative abundance of the soils on the landscape in the typical delineation.

res <- sort(prop.table(table(fin$component)), decreasing = TRUE)
res
#> 
#>      Beta     Alpha     Gamma     Delta 
#> 0.4666667 0.3000000 0.1333333 0.1000000
# TODO: abstract this concept
cmp <- subset(fin, component == names(res[1]))
ref <- names(tail(sort(table(
  interaction(cmp$taxpartsize, cmp$depth)
)), 1))
fin_sim <- similar_soils(x, m, ref)
#> comparing to dominant reference condition (`3.4` on 6 rows)

# transfer similarity distance and similar ranking
fin$similar_dist <- fin_sim$similar_dist
fin$similar <- fin_sim$similar # similarity to the dominant condition within Beta

# original sort order
fin
#>    component id taxpartsize depth similar_dist similar
#> 1      Gamma A1           4     3            2   FALSE
#> 2       Beta B1           3     4            0    TRUE
#> 3      Gamma C1           4     3            2   FALSE
#> 4       Beta D1           4     4            1    TRUE
#> 5       Beta E1           3     4            0    TRUE
#> 6      Delta F1           3     2            2   FALSE
#> 7       Beta G1           3     3            1    TRUE
#> 8      Alpha H1           7     4            4   FALSE
#> 9      Alpha I1           7     3            5   FALSE
#> 10     Alpha J1           7     3            5   FALSE
#> 11      Beta A2           4     4            1    TRUE
#> 12      Beta B2           3     4            0    TRUE
#> 13      Beta C2           4     4            1    TRUE
#> 14      Beta D2           4     4            1    TRUE
#> 15     Delta E2           3     2            2   FALSE
#> 16      Beta F2           3     3            1    TRUE
#> 17      Beta G2           3     4            0    TRUE
#> 18     Alpha H2           7     3            5   FALSE
#> 19     Alpha I2           7     3            5   FALSE
#> 20     Alpha J2           7     3            5   FALSE
#> 21      Beta A3           4     4            1    TRUE
#> 22     Delta B3           3     2            2   FALSE
#> 23     Gamma C3           4     3            2   FALSE
#> 24     Gamma D3           4     3            2   FALSE
#> 25      Beta E3           3     4            0    TRUE
#> 26      Beta F3           3     3            1    TRUE
#> 27      Beta G3           3     4            0    TRUE
#> 28     Alpha H3           7     4            4   FALSE
#> 29     Alpha I3           7     4            4   FALSE
#> 30     Alpha J3           7     3            5   FALSE
component id taxpartsize depth similar_dist similar
2 Beta B1 3 4 0 TRUE
5 Beta E1 3 4 0 TRUE
12 Beta B2 3 4 0 TRUE
17 Beta G2 3 4 0 TRUE
25 Beta E3 3 4 0 TRUE
27 Beta G3 3 4 0 TRUE
4 Beta D1 4 4 1 TRUE
7 Beta G1 3 3 1 TRUE
11 Beta A2 4 4 1 TRUE
13 Beta C2 4 4 1 TRUE
14 Beta D2 4 4 1 TRUE
16 Beta F2 3 3 1 TRUE
21 Beta A3 4 4 1 TRUE
26 Beta F3 3 3 1 TRUE
1 Gamma A1 4 3 2 FALSE
3 Gamma C1 4 3 2 FALSE
6 Delta F1 3 2 2 FALSE
15 Delta E2 3 2 2 FALSE
22 Delta B3 3 2 2 FALSE
23 Gamma C3 4 3 2 FALSE
24 Gamma D3 4 3 2 FALSE
8 Alpha H1 7 4 4 FALSE
28 Alpha H3 7 4 4 FALSE
29 Alpha I3 7 4 4 FALSE
9 Alpha I1 7 3 5 FALSE
10 Alpha J1 7 3 5 FALSE
18 Alpha H2 7 3 5 FALSE
19 Alpha I2 7 3 5 FALSE
20 Alpha J2 7 3 5 FALSE
30 Alpha J3 7 3 5 FALSE