Spatial clustering based on correlation or other metrics.

```
cluster_locid(
x,
varname,
locid = "locid",
time = "UTC",
locid_info = NULL,
weight = NULL,
group = NULL,
k = NULL,
max_loss = 0.05,
verbose = TRUE,
distance = "cor",
cores = 1,
...
)
```

- x
`data.frame` (merra subset) with location and time identifiers, and a time-series variable to cluster.

- varname
name of column with data to be used to cluster locations.

- locid
name of column of location identifiers.

- time
name of column with time dimension.

- locid_info
(optional) `data.frame` or `sf` object with weights and/or spatial groups (regions) of location identifiers.

- weight
(optional) name of column with (positive) weights in `locid_info`, used in calculating weighted `mean` and `sd` metrics.

- group
(optional) name of column with group-names of locations (such as regions). If provided, clustering will be made for each group separately.

- k
(optional) integer vector of numbers of clusters to test. By default (`NULL`), the clustering process starts from `1` and proceeds up to the number of locations, terminating when the `max_loss` condition is met.

- max_loss
maximum loss of variation (standard deviation) of clustered variable, measured as `1 - sd(clustered_variable) / sd(original_variable)`. Default value is `0.05`, meaning up to `5` percent of variability of original, non-clustered variable is allowed to be lost by clustering.

- verbose
logical; should the clustering progress be reported? `TRUE` by default.

- distance
character name of the distance measure to use in `TSdist::KMedoids`. The default metric is `"cor"`, Pearson's correlation between the time-series variable in different locations. Allowed alternative measures: `"euclidean", "manhattan", "minkowski", "infnorm", "ccor", "sts", "dtw", "keogh_lb", "edr", "erp", "lcss", "fourier", "tquest", "dissimfull", "dissimapprox", "acf", "pacf", "ar.lpc.ceps", "ar.mah", "ar.mah.statistic", "ar.mah.pvalue", "ar.pic", "cdm", "cid", "cor", "cort", "wav", "int.per", "per", "mindist.sax", "ncd", "pred", "spec.glk", "spec.isd", "spec.llr", "pdc", "frechet"`.

- cores
integer number of processor cores to use, currently ignored.

- ...
additional parameters to pass to `TSdist::KMedoids`, might be required for some distance measures.
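
The `max_loss` criterion can be illustrated with a small base-R sketch. This is not the package's implementation: the toy data, the use of `hclust` on `1 - cor` as a stand-in for the k-medoids step, and the unweighted pooled standard deviation are all assumptions made for illustration only.

```r
# Illustrative sketch only: synthetic data, not merra2ools output.
set.seed(1)
n_time <- 200
base_a <- sin(seq_len(n_time) / 10)
base_b <- cos(seq_len(n_time) / 7)

# Six hypothetical locations (columns): three follow base_a, three follow base_b
x <- cbind(
  sapply(1:3, function(i) base_a + rnorm(n_time, sd = 0.1)),
  sapply(1:3, function(i) base_b + rnorm(n_time, sd = 0.1))
)
colnames(x) <- paste0("loc", 1:6)

# Assumed stand-in for the clustering step:
# hierarchical clustering on a correlation-based distance
d <- as.dist(1 - cor(x))
cluster <- cutree(hclust(d), k = 2)

# Replace each location's series with the mean series of its cluster
x_clustered <- sapply(seq_len(ncol(x)), function(j) {
  rowMeans(x[, cluster == cluster[j], drop = FALSE])
})

# Loss of variation as described for `max_loss` (unweighted here)
sd_loss <- 1 - sd(as.vector(x_clustered)) / sd(as.vector(x))

# TRUE if this clustering would satisfy max_loss = 0.05
sd_loss <= 0.05
```

With `weight` supplied, the documented behaviour is to use weighted `mean` and `sd` instead of the unweighted versions shown here.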

A `data.frame` covering the alternative numbers of clusters, with columns:

- k
Number of clusters

- N
Total number of time series

- locid
location identifier in `merra2ools` datasets

- "group"
(if provided) column with locid-groups

- cluster
cluster number in every `k`-group

- weight
weight of the cluster in the `k`-group

- sd_N
standard deviation of the whole sample of (N) time-series

- sd_k
standard deviation of clustered time series with `k` clusters

- sd_loss
loss of standard deviation as a result of clustering, for each `k`
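
As a rough sketch of how the returned table might be used, the smallest `k` that meets a chosen loss threshold can be picked from the `k` and `sd_loss` columns. The table below is hypothetical and contains only those two columns; the values are made up for illustration and are not results from the package.

```r
# Hypothetical stand-in for the returned data.frame (values are invented)
res <- data.frame(
  k       = c(1, 2, 3, 4),
  sd_loss = c(0.31, 0.12, 0.04, 0.01)
)

# Keep the rows whose loss is within the threshold, then take the smallest k
threshold <- 0.05
ok <- res[res$sd_loss <= threshold, ]
best_k <- min(ok$k)
best_k
```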

```
# see "Cluster locations" in "Get started"
```