Title: | Permanent Random Number Sampling |
---|---|
Description: | Survey sampling using permanent random numbers (PRN's). A solution to the problem of unknown overlap between survey samples, which leads to a low precision in estimates when the survey is repeated or combined with other surveys. The PRN solution is to supply the U(0, 1) random numbers to the sampling procedure, instead of having the sampling procedure generate them. In Lindblom (2014) <doi:10.2478/jos-2014-0047>, and therein cited papers, it is shown how this is carried out and how it improves the estimates. This package supports two common fixed-size sampling procedures (simple random sampling and probability-proportional-to-size sampling) and includes a function for transforming the PRN's in order to control the sample overlap. |
Authors: | Kira Coder Gylling [aut, cre] |
Maintainer: | Kira Coder Gylling <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-21 05:20:47 UTC |
Source: | https://github.com/kirajcg/prnsamplr |
This package provides two functions for drawing stratified PRN-assisted samples: srs and pps. The former, simple random sampling, assumes that each unit $k$ in a given stratum $h$ is equally likely to be sampled, with inclusion probability $\pi_k = n_h / N_h$, where $n_h$ is the sample size and $N_h$ the population size of stratum $h$. The function then samples the $n_h$ elements with the smallest PRN's in each stratum $h$.
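The selection rule for simple random sampling can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: srs_sketch and the toy data are hypothetical, and columns are passed as strings here rather than as the one-sided formulas (such as ~stratum) that the package functions take.

```r
# Hypothetical helper (not part of prnsamplr): flag, within each stratum,
# the units with the smallest PRNs, as many as the stratum sample size.
srs_sketch <- function(frame, stratid, nsamp, prn) {
  # Rank the PRNs within each stratum (1 = smallest); ties broken by row order.
  rnk <- ave(frame[[prn]], frame[[stratid]],
             FUN = function(u) rank(u, ties.method = "first"))
  frame$sampled <- rnk <= frame[[nsamp]]
  frame
}

# Toy frame: stratum A has 6 units and sample size 2, stratum B has 4 and 3.
set.seed(1)
toy <- data.frame(
  stratum = rep(c("A", "B"), times = c(6, 4)),
  nsample = rep(c(2, 3), times = c(6, 4)),
  rands   = runif(10)
)

out <- srs_sketch(toy, "stratum", "nsample", "rands")
table(out$stratum, out$sampled)
```

Because the PRN's are supplied rather than generated inside the function, rerunning the sketch on the same frame always returns the same sample.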
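The latter, Pareto sampling, assumes that large units are more likely to be sampled than small units. The function approximates this unknown inclusion probability as $\lambda_k = n_h x_k / \sum_{i=1}^{N_h} x_i$, where $x_k$ is a size measure, and samples the $n_h$ elements with the smallest values of $Q_k = \frac{U_k (1 - \lambda_k)}{\lambda_k (1 - U_k)}$, where $U_k$ is the PRN of unit $k$, for each stratum $h$.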
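A base R sketch of the Pareto selection rule may help make this concrete. It assumes the standard Pareto ranking variable Q = U(1 - lambda) / (lambda(1 - U)), with lambda equal to the stratum sample size times the unit's share of the stratum's total size. The helper pareto_sketch and the toy data are hypothetical, and the sketch further assumes every lambda is below 1 (no take-all units), which the package may handle differently.

```r
# Hypothetical helper (not part of prnsamplr): Pareto sampling per stratum.
# Assumes 0 < lambda < 1 and 0 < PRN < 1 for every unit.
pareto_sketch <- function(frame, stratid, nsamp, prn, size) {
  g <- frame[[stratid]]
  n <- frame[[nsamp]]
  x <- frame[[size]]
  u <- frame[[prn]]
  # Approximate inclusion probability: sample size times share of stratum size.
  lambda <- n * x / ave(x, g, FUN = sum)
  # Ranking variable: odds of the PRN divided by odds of lambda.
  Q <- (u * (1 - lambda)) / (lambda * (1 - u))
  # Keep the units with the smallest Q in each stratum.
  rnk <- ave(Q, g, FUN = function(q) rank(q, ties.method = "first"))
  frame$lambda  <- lambda
  frame$Q       <- Q
  frame$sampled <- rnk <= n
  frame
}

set.seed(2)
toy <- data.frame(
  stratum = rep(c("A", "B"), times = c(6, 4)),
  nsample = 2,
  rands   = runif(10),
  sizeM   = c(1:6, 2:5)
)

out <- pareto_sketch(toy, "stratum", "nsample", "rands", "sizeM")
tapply(out$lambda, out$stratum, sum)  # sums to the stratum sample size
```

Larger units receive a larger lambda and therefore a smaller Q for the same PRN, which is what makes them more likely to be selected.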
These two functions can be run standalone or via the wrapper function samp. Input to the functions is the sampling frame, with the stratification information and the PRN's given as variables on the frame, and in the case of pps also a size measure given as a variable on the frame. Output is a copy of the sampling frame containing sampling information, and in the case of pps also containing lambda and Q.
The package also provides the function transformprn, which selects where to start counting, and in which direction, when the sampling routines enumerate the PRN's. This is done by passing the starting point and direction to transformprn and then calling srs or pps on its output.
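One way to picture such a transformation is as a shift of the unit interval with wrap-around, so that "smallest transformed PRN" means "first PRN met when counting from the starting point in the chosen direction". The sketch below is an assumption made for illustration, not the package's code: transform_sketch is hypothetical and accepts only "U" and "D".

```r
# Hypothetical helper (not part of prnsamplr): re-express PRNs as the distance
# from `start`, measured upwards ("U") or downwards ("D"), wrapping around at
# the ends of the unit interval.
transform_sketch <- function(u, direction = c("U", "D"), start = 0) {
  direction <- match.arg(direction)
  if (direction == "U") (u - start) %% 1 else (start - u) %% 1
}

u <- c(0.05, 0.25, 0.60, 0.90)

# Counting upwards from 0.2 meets 0.25 first, then 0.60, 0.90 and, after
# wrapping around, 0.05.
order(transform_sketch(u, "U", start = 0.2))

# Counting downwards from 0.2 meets 0.05 first, then wraps to 0.90, 0.60, 0.25.
order(transform_sketch(u, "D", start = 0.2))
```

Feeding the transformed values to a routine that picks the smallest PRN's then amounts to sampling from the chosen starting point in the chosen direction.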
Finally, an example dataset is provided that can be used to illustrate the functionality of the package.
Lindblom, A. (2014). "On Precision in Estimates of Change over Time where Samples are Positively Coordinated by Permanent Random Numbers." Journal of Official Statistics, vol.30, no.4, 2014, pp.773-785. https://doi.org/10.2478/jos-2014-0047.
srs, pps, samp, transformprn, ExampleData
dfSRS <- srs(frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands)
dfPPS <- pps(frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands, size = ~sizeM)
dfPRN <- transformprn(frame = ExampleData, prn = ~rands, direction = "U", start = 0.2)
Artificial dataset to be used with samp and transformprn.
ExampleData
## 'ExampleData'
A data frame with 40,000 rows and 6 columns:
stratum
a character vector
id
a numeric vector
npopul
a numeric vector
nsample
a numeric vector
rands
a numeric vector
sizeM
a numeric vector
Ad-hoc simulation in base R.
prnsamplr, samp, srs, pps, transformprn
Stratified probability-proportional-to-size (Pareto PiPS) sampling using permanent random numbers. Can also be used for non-stratified Pareto PiPS using a dummy stratum taking the same value for each object.
pps(frame, stratid, nsamp, prn, size)
frame |
Data frame (or data.table or tibble) containing the elements to sample from. |
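stratid |
Variable in frame containing the strata. |
nsamp |
Variable in frame containing the sample sizes. |
prn |
Variable in frame containing the permanent random numbers. |
size |
Variable in frame containing the size measure. |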
A copy of the input sampling frame together with the boolean variable sampled, indicating sample inclusion, as well as the numeric variable lambda, containing the estimated first-order inclusion probabilities, and the numeric variable Q, which determines which elements are sampled.
prnsamplr, samp, srs, transformprn, ExampleData
dfOut <- pps( frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands, size = ~sizeM )
Wrapper for stratified simple random sampling (SRS) and probability-proportional-to-size (PPS) sampling using permanent random numbers. Can also be used for non-stratified sampling using a dummy stratum taking the same value for each object.
samp(method, frame, ...)
method |
Sampling method to use: srs or pps. |
frame |
Data frame (or data.table or tibble) containing the elements to sample from. |
... |
Further method-specific arguments. |
A copy of the input data frame together with the boolean variable sampled, as well as the numeric variables lambda and Q when pps is used.
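The wrapper pattern itself is simple: the method is received as an argument and the remaining arguments are forwarded to it. The sketch below illustrates that pattern with hypothetical names (samp_sketch, flag_smallest) and a toy method. It assumes nothing about how samp is implemented internally.

```r
# Hypothetical wrapper (not part of prnsamplr): forward the frame and any
# further arguments to the chosen sampling function.
samp_sketch <- function(method, frame, ...) {
  method <- match.fun(method)  # accepts a function or the name of one
  method(frame = frame, ...)
}

# Toy method for the demonstration: flag the n units with the smallest PRNs.
flag_smallest <- function(frame, prn, n) {
  frame$sampled <- rank(frame[[prn]], ties.method = "first") <= n
  frame
}

toy <- data.frame(id = 1:5, rands = c(0.7, 0.1, 0.9, 0.3, 0.5))
res <- samp_sketch(flag_smallest, toy, prn = "rands", n = 2)
res$id[res$sampled]  # units 2 and 4 hold the two smallest PRNs
```

Forwarding through ... is what lets a single wrapper serve methods with different argument lists, such as the extra size argument that pps requires.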
prnsamplr, srs, pps, transformprn, ExampleData
dfOut <- samp(method = pps, frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands, size = ~sizeM)
dfOut <- samp(method = srs, frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands)
Stratified simple random sampling (SRS) using permanent random numbers. Can also be used for non-stratified SRS using a dummy stratum taking the same value for each object.
srs(frame, stratid, nsamp, prn)
frame |
Data frame (or data.table or tibble) containing the elements to sample from. |
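stratid |
Variable in frame containing the strata. |
nsamp |
Variable in frame containing the sample sizes. |
prn |
Variable in frame containing the permanent random numbers. |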
A copy of the input sampling frame together with the boolean variable sampled, indicating sample inclusion.
prnsamplr, samp, pps, transformprn, ExampleData
dfOut <- srs( frame = ExampleData, nsamp = ~nsample, stratid = ~stratum, prn = ~rands )
Transformation of the permanent random numbers used in the sampling procedure, to control the overlap between samples, and thus control the sample coordination. The method used is specified in Lindblom and Teterukovsky (2007).
transformprn(frame, prn, direction, start)
frame |
Data frame (or data.table or tibble) containing the elements to sample from. |
prn |
Variable in frame containing the permanent random numbers. |
direction |
Direction for the enumeration. "U" or "R" for upwards, or equivalently to the right on the real-number line. "D" or "L" for downwards, or equivalently to the left on the real-number line. |
start |
Starting point for the transformation. For SRS this corresponds to the point at which one wants to start sampling. |
A copy of the input data frame with the permanent random numbers transformed according to specification, along with the numeric variable prn.old containing the non-transformed permanent random numbers.
Lindblom, A. and Teterukovsky, A. (2007). "Coordination of Stratified Pareto pps Samples and Stratified Simple Random Samples at Statistics Sweden." In Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada.
prnsamplr, samp, srs, pps, ExampleData
dfOut <- transformprn( frame = ExampleData, prn = ~rands, direction = "U", start = 0.2 )
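To illustrate how the starting point controls sample overlap, the base R sketch below draws three unstratified simple random samples from the same PRN's. The helper pick, the population size and the sample sizes are assumptions chosen for the illustration, and the shift (u - 0.5) %% 1 stands in for an upward enumeration starting at 0.5. It is not a call to the package.

```r
set.seed(42)
u <- runif(1000)  # permanent random numbers for 1000 units

# Hypothetical helper: flag the n units with the smallest values of v.
pick <- function(v, n) rank(v, ties.method = "first") <= n

s1 <- pick(u, 100)               # first survey: start at 0, count upwards
s2 <- pick(u, 150)               # repeat with a larger sample, same start
s3 <- pick((u - 0.5) %% 1, 100)  # other survey: start at 0.5, count upwards

sum(s1 & s2)  # every unit in s1 is also in s2 (positive coordination)
sum(s1 & s3)  # no shared units while the two PRN ranges stay apart
```

Keeping the same starting point maximizes overlap between occasions, while moving it away spreads the response burden across different units, provided the sampled PRN ranges do not meet.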