This vignette works through the spatial thinning example presented in
“spThin: An R package for spatial thinning of species occurrence records
for use in ecological niche models”. Here we demonstrate how spThin
can be used to spatially thin species occurrence records. We also test
how many repetitions of the thinning algorithm are necessary to achieve
the optimal number of thinned records for a dataset previously thinned
“by hand”, and we examine whether there is a notable increase in
efficiency if an occurrence dataset is thinned as multiple smaller groups
of occurrences, rather than as a single large set of occurrences.
spThin R package

Here we load the R package from source code. This source code will soon be submitted to CRAN, so that the package can be loaded using standard package management methods.
## Loading required package: spam
## Spam version 2.11-1 (2025-01-20) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## Loading required package: grid
## Loading required package: fields
## Loading required package: viridisLite
##
## Try help(fields) to get started.
## Loading required package: knitr
To demonstrate the use of spThin, we used a set of 201 verified,
georeferenced occurrence records for the Caribbean spiny pocket mouse
Heteromys anomalus. These occurrences are from Colombia, Venezuela, and
three Caribbean islands: Trinidad, Tobago, and Margarita. This dataset is
included as part of the spThin package.
## SPEC LAT LONG REGION
## 1 anomalus 7.883333 -75.20000 mainland
## 2 anomalus 8.000000 -76.73333 mainland
## 3 anomalus 10.616667 -75.03333 mainland
## 4 anomalus 8.633333 -74.06667 mainland
## 5 anomalus 9.966667 -75.06667 mainland
## 6 anomalus 10.216667 -73.38333 mainland
Here we load and examine the dataset. The name assigned to this
dataset is Heteromys_anomalus_South_America. Note that this dataset
includes a REGION column indicating where each occurrence was collected:
either the mainland or one of the three islands. We can see that many
more occurrences were collected on the mainland than on the three
islands. Note that Trinidad has been shortened to ‘trin’ and Margarita
has been shortened to ‘mar’.
##
## mainland mar tobago trin
## 174 2 4 21
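The two outputs above (the first rows of the data and the per-region counts) can be reproduced along the following lines. This is a minimal sketch that assumes spThin is installed; the `requireNamespace` guard and the variable name `occ` are additions for illustration, so that the code still runs, and simply does nothing, when the package is not available.

```r
# Assumption: spThin is installed. The guard keeps this sketch runnable without it.
occ <- NULL
if (requireNamespace("spThin", quietly = TRUE)) {
  # Load the bundled occurrence dataset by name from the package.
  data("Heteromys_anomalus_South_America", package = "spThin")
  occ <- Heteromys_anomalus_South_America

  # First rows of the dataset: species, coordinates, and region.
  print(head(occ))

  # Number of occurrence records per region.
  print(table(occ$REGION))
}
```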
spThin::thin on the full dataset

thin involves multiple settings, which allow for extensive flexibility
in how the user spatially thins a dataset. However, many of these
settings have default values. See ?thin for further information.
thinned_dataset_full <-
thin( loc.data = Heteromys_anomalus_South_America,
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 100,
locs.thinned.list.return = TRUE,
write.files = FALSE,
write.log.file = FALSE)
## **********************************************
## Beginning Spatial Thinning.
## Script Started at: Fri Feb 28 04:16:15 2025
## lat.long.thin.count
## 122 123 124
## 11 44 45
## [1] "Maximum number of records after thinning: 124"
## [1] "Number of data.frames with max records: 45"
## [1] "No files written for this run."
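To make concrete what a single repetition of spatial thinning does, here is a base R sketch of one randomized thinning pass. It is an illustration only, not the code used inside spThin: the function names `haversine_km` and `thin_once`, the spherical-Earth radius of 6371 km, and the five synthetic coordinates are all assumptions made for this example. The pass repeatedly removes one record, chosen at random among those with the most neighbours closer than the thinning distance, until no two retained records are closer than that distance.

```r
# Great-circle distance (km) between all pairs of points, using the haversine
# formula. Assumption: a spherical Earth with radius 6371 km.
haversine_km <- function(lat, long, radius = 6371) {
  phi <- lat * pi / 180
  lam <- long * pi / 180
  dphi <- outer(phi, phi, "-")
  dlam <- outer(lam, lam, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlam / 2)^2
  a[a > 1] <- 1  # guard against rounding slightly above 1
  2 * radius * asin(sqrt(a))
}

# One randomized thinning pass (hypothetical helper, not part of spThin).
# Returns the indices of the retained records.
thin_once <- function(lat, long, thin_par) {
  too_close <- haversine_km(lat, long) < thin_par
  diag(too_close) <- FALSE
  keep <- rep(TRUE, length(lat))
  repeat {
    # For each retained record, count retained neighbours within thin_par km.
    n_close <- rowSums(too_close[, keep, drop = FALSE]) * keep
    if (max(n_close) == 0) break
    # Remove one of the most crowded records, breaking ties at random.
    worst <- which(n_close == max(n_close))
    removed <- worst[sample.int(length(worst), 1)]
    keep[removed] <- FALSE
  }
  which(keep)
}

# Synthetic example: three records about 1 km apart plus two isolated records.
set.seed(1)
lat  <- c(10.00, 10.01, 10.02, 10.50, 11.00)
long <- c(-75.00, -75.00, -75.00, -75.00, -75.00)
kept <- thin_once(lat, long, thin_par = 10)
print(kept)
```

Because ties are broken at random, different passes can retain different records, and sometimes different numbers of records. That is why the call above uses many repetitions and reports how many of them reached the maximum.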
Below is the same thin call on the full dataset, but in this case we write a number of files to disk. These files include a set of *.csv files of the thinned data and a log file.
thinned_dataset_full <-
thin( loc.data = Heteromys_anomalus_South_America,
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 100,
locs.thinned.list.return = TRUE,
write.files = TRUE,
max.files = 5,
out.dir = "hanomalus_thinned_full/", out.base = "hanomalus_thinned",
write.log.file = TRUE,
log.file = "hanomalus_thinned_full_log_file.txt" )
In the case above, we found that 100 repetitions were sufficient to
return spatially thinned datasets with the optimal number of occurrence
records (124); 45 of the 100 thinned datasets reached this number.
Because this is a random process, it is possible that a similarly
repeated run would not return any datasets with the optimal number of
occurrence records. To visually assess whether we are using enough reps
to approach the optimal number, we use the function plotThin. This
function produces three plots: 1) the cumulative number of records
retained versus the number of repetitions, 2) the log cumulative number
of records retained versus the log number of repetitions, and 3) a
histogram of the maximum number of records retained for each thinned
dataset.
Looking at the plot of cumulative maximum records retained versus number of repetitions, we see that in this run the value is constant throughout the dataset creation process, indicating that a single repetition would have sufficed to reach 124. This will not always be the case, but the plot can be examined to assess whether a given number of repetitions is sufficient to reach a plateau (sensu species accumulation curves in ecology).
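The quantity behind the first plot can also be inspected numerically. The sketch below is not plotThin's implementation. It assumes only that the thinning results are held as a list with one data.frame per repetition, and it uses a small synthetic list (`thinned`, with made-up sizes and hypothetical column names LONG and LAT) in place of real output.

```r
# Hypothetical stand-in for a list of thinned datasets, one per repetition.
set.seed(42)
n_kept <- c(122, 123, 124, 123, 124)
thinned <- lapply(n_kept, function(n) {
  data.frame(LONG = runif(n), LAT = runif(n))
})

# Number of records retained in each repetition.
records_per_rep <- vapply(thinned, nrow, integer(1))

# Running maximum of records retained as repetitions accumulate.
cum_max <- cummax(records_per_rep)
print(cum_max)

# First repetition at which the overall maximum is reached.
first_rep_at_max <- which(records_per_rep == max(records_per_rep))[1]
print(first_rep_at_max)
```

A running maximum that stops increasing well before the last repetition suggests a plateau. One that is still rising at the end suggests that more repetitions may be needed.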
spThin::thin on datasets separated by region

thinned_dataset_mainland <-
thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "mainland" ) , ],
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 100,
locs.thinned.list.return = TRUE,
write.files = FALSE,
write.log.file = FALSE)
## **********************************************
## Beginning Spatial Thinning.
## Script Started at: Fri Feb 28 04:16:16 2025
## lat.long.thin.count
## 109 110
## 32 68
## [1] "Maximum number of records after thinning: 110"
## [1] "Number of data.frames with max records: 68"
## [1] "No files written for this run."
thinned_dataset_trin <-
thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "trin" ) , ],
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 10,
locs.thinned.list.return = TRUE,
write.files = FALSE,
write.log.file = FALSE)
## **********************************************
## Beginning Spatial Thinning.
## Script Started at: Fri Feb 28 04:16:16 2025
## lat.long.thin.count
## 11 12
## 3 7
## [1] "Maximum number of records after thinning: 12"
## [1] "Number of data.frames with max records: 7"
## [1] "No files written for this run."
thinned_dataset_mar <-
thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "mar" ) , ],
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 10,
locs.thinned.list.return = TRUE,
write.files = FALSE,
write.log.file = FALSE )
## **********************************************
## Beginning Spatial Thinning.
## Script Started at: Fri Feb 28 04:16:16 2025
## lat.long.thin.count
## 1
## 10
## [1] "Maximum number of records after thinning: 1"
## [1] "Number of data.frames with max records: 10"
## [1] "No files written for this run."
thinned_dataset_tobago <-
thin( loc.data = Heteromys_anomalus_South_America[ which( Heteromys_anomalus_South_America$REGION == "tobago" ) , ],
lat.col = "LAT", long.col = "LONG",
spec.col = "SPEC",
thin.par = 10, reps = 10,
locs.thinned.list.return = TRUE,
write.files = FALSE,
write.log.file = FALSE )
## **********************************************
## Beginning Spatial Thinning.
## Script Started at: Fri Feb 28 04:16:16 2025
## lat.long.thin.count
## 1
## 10
## [1] "Maximum number of records after thinning: 1"
## [1] "Number of data.frames with max records: 10"
## [1] "No files written for this run."
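The regional runs above can be compared with the full-dataset run by combining one maximal thinned dataset per region. The following is a minimal sketch of that bookkeeping. `make_reps` and `pick_max` are hypothetical helpers, and the lists are synthetic stand-ins sized to match the record counts reported above, not the real output of thin. The sketch also assumes that no two records from different regions lie closer than the thinning distance, which it does not check.

```r
# Hypothetical helper: build a list of placeholder data.frames of given sizes.
make_reps <- function(sizes) {
  lapply(sizes, function(n) {
    data.frame(Longitude = as.numeric(seq_len(n)),
               Latitude  = as.numeric(seq_len(n)))
  })
}

# Hypothetical helper: return the first data.frame with the most records.
pick_max <- function(thinned_list) {
  n <- vapply(thinned_list, nrow, integer(1))
  thinned_list[[which.max(n)]]
}

# Synthetic stand-ins using the record counts reported in the runs above.
regional <- list(
  mainland = make_reps(c(109, 110)),
  trin     = make_reps(c(11, 12)),
  mar      = make_reps(c(1, 1)),
  tobago   = make_reps(c(1, 1))
)

# Maximum number of records retained per region, and their total.
max_by_region <- vapply(regional, function(x) nrow(pick_max(x)), integer(1))
print(max_by_region)

# Combine one maximal dataset per region into a single data.frame.
combined <- do.call(rbind, lapply(regional, pick_max))
print(nrow(combined))  # 124
```

Under these assumptions the regional maxima (110, 12, 1, and 1) sum to 124, the same maximum reported for the full dataset.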