The package has two main functions, mb_reconstruction()
for reconstruction, and cv_mb()
for cross-validation. This
vignette will demonstrate these functions using the two built-in data
sets. First, let’s look at the built-in data, taken from Nguyen et al
(2021).
The data frame p1Seasonal
contains the reconstruction
targets, namely the dry season, wet season, and water year streamflow
for the Ping River at station P.1 (Chiang Mai, Thailand). The data span
from 1922 to 2003.
p1Seasonal
#> season year Qa
#> <fctr> <int> <num>
#> 1: NJ 1922 576.020
#> 2: NJ 1923 629.170
#> 3: NJ 1924 583.900
#> 4: NJ 1925 680.410
#> 5: NJ 1926 564.320
#> ---
#> 242: WY 1999 1338.252
#> 243: WY 2000 1235.269
#> 244: WY 2001 1898.075
#> 245: WY 2002 1770.417
#> 246: WY 2003 1853.401
As paleoclimate proxies, we use the principal components (PCs) of the
Southeast Asian Dendrochronology Network. A set of PCs has been derived
for each target (see details in Nguyen et al, 2020). These are provided
in pc3seasons
.
str(pc3seasons)
#> List of 3
#> $ NJ: num [1:254, 1:4] -1.2142 -0.2256 -0.0803 1.6726 3.1718 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:4] "PC1" "PC5" "PC6" "PC7"
#> $ JO: num [1:254, 1] 0.51 -0.417 2.113 2.968 1.692 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr "PC1"
#> $ WY: num [1:254, 1] -0.242 -0.439 2.016 3.242 2.508 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr "PC1"
The tree ring data spans from 1750 to 2003. Let us look at the first 10 rows of each principal component matrix.
lapply(pc3seasons, head, n = 10)
#> $NJ
#> PC1 PC5 PC6 PC7
#> [1,] -1.21420806 0.7473268 -0.5783063 -0.7674267
#> [2,] -0.22561311 -0.2710709 -0.5699502 0.7896157
#> [3,] -0.08031591 -0.5816381 -0.7841998 0.9286036
#> [4,] 1.67255722 -1.5043708 -1.2338918 -1.3938897
#> [5,] 3.17182495 -0.7129998 0.7000585 -0.9904258
#> [6,] 1.02307742 -1.4861251 -0.1496151 -0.2286487
#> [7,] 1.30705152 -0.5124227 -0.2053901 -0.2480695
#> [8,] 0.88386663 1.0840077 -0.1134354 -0.8353086
#> [9,] -0.03419247 0.7223128 1.0412121 1.4769790
#> [10,] -1.11164309 -1.5384658 1.8846923 -0.4261086
#>
#> $JO
#> PC1
#> [1,] 0.5098988
#> [2,] -0.4166514
#> [3,] 2.1134018
#> [4,] 2.9683959
#> [5,] 1.6922489
#> [6,] -0.5483287
#> [7,] 2.1975923
#> [8,] 1.0627828
#> [9,] 0.3118631
#> [10,] -0.7735868
#>
#> $WY
#> PC1
#> [1,] -0.2417374
#> [2,] -0.4387672
#> [3,] 2.0158877
#> [4,] 3.2423019
#> [5,] 2.5075143
#> [6,] -0.3573139
#> [7,] 1.7650888
#> [8,] 0.8676295
#> [9,] -0.7693806
#> [10,] -0.1852180
We build a reconstruction with the full data set.
fit <- mb_reconstruction(
instQ = p1Seasonal,
pc.list = pc3seasons,
start.year = 1750,
lambda = 1,
log.trans = 1:3
)
We need to provide the instrumental data (instQ
) and the
PC list (pc.list
). Since the PC data do not have a time
column, we need to provide start.year
, 1750 in this
case.
For the mass balance adjustment, we need to provide a penalty weight
lambda
. The default value is 1 and it works in this case.
But for other applications you may need to test a few values for
lambda
to figure out the optimal value.
Finally, the argument log.trans
provides the indices of
the targets that need to be log transformed. Here we transform all three
targets.
Let’s look at the results.
fit
#> season year Q lambda
#> <char> <int> <num> <num>
#> 1: NJ 1750 596.9214 1
#> 2: NJ 1751 666.2604 1
#> 3: NJ 1752 697.5015 1
#> 4: NJ 1753 1146.8719 1
#> 5: NJ 1754 1171.9384 1
#> ---
#> 758: WY 1999 1866.2336 1
#> 759: WY 2000 2139.2197 1
#> 760: WY 2001 2388.0273 1
#> 761: WY 2002 1763.4657 1
#> 762: WY 2003 1815.3354 1
Let us now cross-validate the model with a hold-out-25% scheme. The
cross-validation folds can be created with the function
make_Z()
.
# Create hold-out chunks
set.seed(24)
cvFolds <- make_Z(
obs = 1922:2003,
nRuns = 50,
frac = 0.25,
contiguous = TRUE
)
# Run cross validation
cv <- cv_mb(
instQ = p1Seasonal,
pc.list = pc3seasons,
cv.folds = cvFolds,
start.year = 1750,
lambda = 1,
log.trans = 1:3,
return.type = 'metric means'
)
# Round up to two decimal places
cv[, (2:6) := lapply(.SD, round, digits = 2), .SDcols = 2:6][]
#> season R2 RE CE nRMSE KGE
#> <fctr> <num> <num> <num> <num> <num>
#> 1: NJ 0.46 0.52 0.43 0.24 0.51
#> 2: JO 0.43 0.39 0.28 0.26 0.42
#> 3: WY 0.46 0.49 0.39 0.22 0.46
Nguyen, H. T. T., Galelli, S., Xu, C., & Buckley, B. (2020). Multi-Proxy, Multi-Season Streamflow Reconstruction with Mass Balance Adjustment. Earth and Space Science Open Archive, 22. https://doi.org/10.1002/essoar.10504791.1