Spatial clustering based on correlation or other metrics.
cluster_locid(
x,
varname,
locid = "locid",
time = "UTC",
locid_info = NULL,
weight = NULL,
group = NULL,
k = NULL,
max_loss = 0.05,
verbose = TRUE,
distance = "cor",
cores = 1,
...
)
`data.frame` (MERRA-2 subset) with location and time identifiers and a time-series variable to cluster.
name of column with data to be used to cluster locations.
name of column of location identifiers.
name of column with the time dimension.
(optional) `data.frame` or `sf` object with weights and/or spatial groups (regions) of location identifiers.
(optional) name of column with (positive) weights in `locid_info`, used in calculating weighted `mean` and `sd` metrics.
(optional) name of column with group-names of locations (such as regions). If provided, clustering will be made for each group separately.
(optional) integer vector with the numbers of clusters to test. By default (`NULL`), the clustering process starts from `1`, proceeds up to the number of locations, and terminates when the `max_loss` condition is met.
maximum loss of variation (standard deviation) of clustered variable, measured as `1 - sd(clustered_variable) / sd(original_variable)`. Default value is `0.05`, meaning up to `5` percent of variability of original, non-clustered variable is allowed to be lost by clustering.
logical; should the clustering process be reported? `TRUE` by default.
character name of the distance measure to use in `TSdist::KMedoids`. The default measure is `"cor"`, Pearson's correlation between the time series of the variable at different locations. Allowed measures: `"euclidean", "manhattan", "minkowski", "infnorm", "ccor", "sts", "dtw", "keogh_lb", "edr", "erp", "lcss", "fourier", "tquest", "dissimfull", "dissimapprox", "acf", "pacf", "ar.lpc.ceps", "ar.mah", "ar.mah.statistic", "ar.mah.pvalue", "ar.pic", "cdm", "cid", "cor", "cort", "wav", "int.per", "per", "mindist.sax", "ncd", "pred", "spec.glk", "spec.isd", "spec.llr", "pdc", "frechet"`.
integer number of processor cores to use, currently ignored.
additional parameters to pass to `TSdist::KMedoids`, might be required for some distance measures.
`data.frame` with results for alternative numbers of clusters, with columns:
Number of clusters
Total number of time series
location identifier in `merra2ools` datasets
(if provided) column with locid-groups
cluster number in every `k`-group
weight of the cluster in the `k`-group
standard deviation of the whole sample of (N) time-series
standard deviation of clustered time series with `k` clusters
loss of standard deviation as a result of clustering, for each `k`
# see "Cluster locations" in "Get started"
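The stopping rule described for `k` and `max_loss` can be illustrated with a minimal base R sketch. It is not the package's implementation and rests on several assumptions: hierarchical clustering (`stats::hclust`) stands in for `TSdist::KMedoids`, `1 - cor` is used as the dissimilarity, each location's series is replaced by the mean series of its cluster to form the "clustered variable", and the data and the helper `loss_for_k` are hypothetical.

```r
set.seed(42)
n_time <- 300
time_idx <- seq_len(n_time)

# Three underlying signals; six hypothetical locations, two noisy copies of each
signals <- cbind(sin(time_idx / 10), cos(time_idx / 7), sin(time_idx / 3))
x <- signals[, rep(1:3, each = 2)] +
  matrix(rnorm(n_time * 6, sd = 0.05), nrow = n_time)
colnames(x) <- paste0("loc", 1:6)

# Correlation-based dissimilarity between locations (assumed form: 1 - cor)
d <- as.dist(1 - cor(x))
# Stand-in for k-medoids: average-linkage hierarchical clustering
tree <- hclust(d, method = "average")

# Hypothetical helper: loss of standard deviation for a given number of clusters
loss_for_k <- function(k) {
  cl <- cutree(tree, k = k)
  clustered <- x
  for (g in unique(cl)) {
    members <- cl == g
    # Assumption: every member takes the mean series of its cluster
    clustered[, members] <- rowMeans(x[, members, drop = FALSE])
  }
  1 - sd(as.vector(clustered)) / sd(as.vector(x))
}

# Try k = 1, 2, ... and stop at the first k that satisfies max_loss
max_loss <- 0.05
for (k in seq_len(ncol(x))) {
  loss <- loss_for_k(k)
  if (loss <= max_loss) break
}
k_selected <- k
print(c(k = k_selected, loss = loss))
```

Because every location forms its own cluster when `k` equals the number of locations, the loss is zero at that point and the loop always terminates.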