diff --git a/.Rbuildignore b/.Rbuildignore
index e9e5a8b..5caca1e 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -25,3 +25,4 @@ tests/.*_ghaction.R
 ^\.github$
 rhub-checks
 /Untitled.+\.R$
+^CRAN-SUBMISSION$
diff --git a/README.Rmd b/README.Rmd
index 05e5f5b..84d88e1 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -99,10 +99,10 @@ nlsw_tsv <-
 )
 ```
 
-Now, Dataverse often translates rectangular data into an ingested, or "archival" version, which is application-neutral and easily-readable. `read_dataframe_*()` defaults to taking this ingested version rather than using the original, through the argument `original = FALSE`.
-
-This default is safe because you may not have the proprietary software that was originally used. On the other hand, the data may have lost information in the process of the ingestation.
+**The `original` argument:** Dataverse often translates rectangular data into an ingested, or "archival" version, which is application-neutral and easily-readable. `read_dataframe_*()` defaults to taking this ingested version rather than using the original, through the argument `original = FALSE`.
+This default is safe because you may not have the proprietary software that was originally used.
+On the other hand, the data may have lost information in the process of the ingestion.
 
 Instead, to read the same file but its original version, specify `original = TRUE` and set an `.f` argument. In this case, we know that `nlsw88.tab` is a Stata `.dta` dataset, so we will use the `haven::read_dta` function.
 
 ```{r get_dataframe_by_name_original}
@@ -120,7 +120,6 @@ Note that even though the file prefix is ".tab", we use `haven::read_dta`.
 
 Of course, when the dataset is not ingested (such as a Rds file), users would always need to specify an `.f` argument for the specific file.
 
-
 Note the difference between `nls_tsv` and `nls_original`. `nls_original` preserves the data attributes like value labels, whereas `nls_tsv` has dropped this or left this in file metadata.
 
 ```{r}
@@ -132,6 +131,7 @@ attr(nlsw_original$race, "labels") # original dta has value labels
 ```
 
 
+**Caching**: When the dataset to be downloaded is large, downloading the dataset from the internet can be time consuming, and users want to run the download only once in a script they run multiple times. As of version 0.3.15, our package will cache the download data if the user specifies which version of the Dataverse dataset they download from. See the `version` argument in the help page.
 
 ### Data Upload and Archiving
 
@@ -208,7 +208,7 @@ Functions related to user management and permissions are currently not exported
 
 Dataverse clients in other programming languages include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python and the [Java client](https://github.com/IQSS/dataverse-client-java). For more information, see [the Dataverse API page](https://guides.dataverse.org/en/5.5/api/client-libraries.html#r).
 
-Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](https://www.openarchives.org:443/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with .
+Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](https://www.openarchives.org:443/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with .
 
 ### More Information
 
diff --git a/README.md b/README.md
index 4433f16..12c2245 100644
--- a/README.md
+++ b/README.md
@@ -137,18 +137,17 @@ nlsw_tsv <-
 )
 ```
 
-Now, Dataverse often translates rectangular data into an ingested, or
-“archival” version, which is application-neutral and easily-readable.
-`read_dataframe_*()` defaults to taking this ingested version rather
-than using the original, through the argument `original = FALSE`.
-
-This default is safe because you may not have the proprietary software
-that was originally used. On the other hand, the data may have lost
-information in the process of the ingestation.
-
-Instead, to read the same file but its original version, specify
-`original = TRUE` and set an `.f` argument. In this case, we know that
-`nlsw88.tab` is a Stata `.dta` dataset, so we will use the
+**The `original` argument:** Dataverse often translates rectangular data
+into an ingested, or “archival” version, which is application-neutral
+and easily-readable. `read_dataframe_*()` defaults to taking this
+ingested version rather than using the original, through the argument
+`original = FALSE`. This default is safe because you may not have the
+proprietary software that was originally used.
+
+On the other hand, the data may have lost information in the process of
+the ingestion. Instead, to read the same file but its original version,
+specify `original = TRUE` and set an `.f` argument. In this case, we
+know that `nlsw88.tab` is a Stata `.dta` dataset, so we will use the
 `haven::read_dta` function.
 
 ``` r
@@ -185,6 +184,13 @@ attr(nlsw_original$race, "labels") # original dta has value labels
 ## white black other
 ## 1 2 3
 
+**Caching**: When the dataset to be downloaded is large, downloading the
+dataset from the internet can be time consuming, and users want to run
+the download only once in a script they run multiple times. As of
+version 0.3.15, our package will cache the download data if the user
+specifies which version of the Dataverse dataset they download from. See
+the `version` argument in the help page.
+
 ### Data Upload and Archiving
 
 **Note**: *There are known issues to using to dataverse creation and
@@ -288,7 +294,7 @@ offer metadata download from any web repository that is compliant with
 the [Open Archives Initiative](https://www.openarchives.org:443/)
 standards. Additionally,
 [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to
-interface with [Dryad](https://datadryad.org/stash). The
+interface with [Dryad](https://datadryad.org/). The
 [rfigshare](https://cran.r-project.org/package=rfigshare) package works
 in a similar spirit to **dataverse** with .
 
diff --git a/inst/constants.yml b/inst/constants.yml
index 4266efe..f4c392e 100644
--- a/inst/constants.yml
+++ b/inst/constants.yml
@@ -1,4 +1,4 @@
 server: "demo.dataverse.org"
-api_token: "15372813-c54f-471f-a3e8-c269ee6a610f"
-api_token_expiration: "2025-05-10"
+api_token: "e7563e83-1e8c-4ca3-8c01-03e274a8277b"
+api_token_expiration: "2026-05-20"
 api_token_name: "shirokuriwaki"
diff --git a/man-roxygen/version.R b/man-roxygen/version.R
index cb109dc..5852813 100644
--- a/man-roxygen/version.R
+++ b/man-roxygen/version.R
@@ -1,11 +1,10 @@
 #' @param version A character specifying a version of the dataset.
 #' This can be of the form `"1.1"` or `"1"` (where in `"x.y"`, x is a major
-#' version and y is an optional minor version), or
-#' `":latest"` (the default, the latest published version).
-#' We recommend using the number format so that
-#' the function stores a cache of the data (See \code{\link{cache_dataset}}).
-#' If the user specifies a `key` or `DATAVERSE_KEY` argument, they can access the
-#' draft version by `":draft"` (the current draft) or `":latest"` (which will
-#' prioritize the draft over the latest published version.
+#' version and y is an optional minor version). As of v0.3.14, setting a version
+#' in this way will cache the dataset (See example in \code{\link{cache_dataset}})
+#' so that it will not re-download the file the second time and read from the cache.
 #' Finally, set `use_cache = "none"` to not read from the cache and re-download
 #' afresh even when `version` is provided.
+#' If the user specifies a `key` or `DATAVERSE_KEY` argument, they can access the
+#' draft version by `":draft"` (the current draft) or `":latest"` (which will
+#' prioritize the draft over the latest published version).
diff --git a/man/cache.Rd b/man/cache.Rd
index 33e557a..486cca8 100644
--- a/man/cache.Rd
+++ b/man/cache.Rd
@@ -19,15 +19,14 @@ cache_reset()
 \arguments{
 \item{version}{A character specifying a version of the dataset.
 This can be of the form \code{"1.1"} or \code{"1"} (where in \code{"x.y"}, x is a major
-version and y is an optional minor version), or
-\code{":latest"} (the default, the latest published version).
-We recommend using the number format so that
-the function stores a cache of the data (See \code{\link{cache_dataset}}).
+version and y is an optional minor version). As of v0.3.14, setting a version
+in this way will cache the dataset (See example in \code{\link{cache_dataset}})
+so that it will not re-download the file the second time and read from the cache.
+Finally, set \code{use_cache = "none"} to not read from the cache and re-download
+afresh even when \code{version} is provided.
 If the user specifies a \code{key} or \code{DATAVERSE_KEY} argument, they can access the
 draft version by \code{":draft"} (the current draft) or \code{":latest"} (which will
-prioritize the draft over the latest published version.
-Finally, set \code{use_cache = "none"} to not read from the cache and re-download
-afresh even when \code{version} is provided.}
+prioritize the draft over the latest published version).}
 }
 \value{
 \code{cache_dataset()} returns \code{"disk"} if the dataset version is to be cached to disk, \code{"none"} otherwise.
diff --git a/man/files.Rd b/man/files.Rd
index ef0f187..3b2eb14 100644
--- a/man/files.Rd
+++ b/man/files.Rd
@@ -102,15 +102,14 @@ no ingested version, is set to NA. Note in \verb{get_dataframe_*},
 
 \item{version}{A character specifying a version of the dataset.
 This can be of the form \code{"1.1"} or \code{"1"} (where in \code{"x.y"}, x is a major
-version and y is an optional minor version), or
-\code{":latest"} (the default, the latest published version).
-We recommend using the number format so that
-the function stores a cache of the data (See \code{\link{cache_dataset}}).
+version and y is an optional minor version). As of v0.3.14, setting a version
+in this way will cache the dataset (See example in \code{\link{cache_dataset}})
+so that it will not re-download the file the second time and read from the cache.
+Finally, set \code{use_cache = "none"} to not read from the cache and re-download
+afresh even when \code{version} is provided.
 If the user specifies a \code{key} or \code{DATAVERSE_KEY} argument, they can access the
 draft version by \code{":draft"} (the current draft) or \code{":latest"} (which will
-prioritize the draft over the latest published version.
-Finally, set \code{use_cache = "none"} to not read from the cache and re-download
-afresh even when \code{version} is provided.}
+prioritize the draft over the latest published version).}
 
 \item{...}{Additional arguments passed to an HTTP request function, such as
 \code{\link[httr]{GET}}, \code{\link[httr]{POST}}, or
diff --git a/man/get_dataframe.Rd b/man/get_dataframe.Rd
index 0266fd7..a84dd71 100644
--- a/man/get_dataframe.Rd
+++ b/man/get_dataframe.Rd
@@ -66,15 +66,14 @@ or add \code{DATAVERSE_SERVER = "dataverse.harvard.edu"} in one's \code{.Renviro
 file (\code{usethis::edit_r_environ()}), with the appropriate domain as its value.}
 \item{\code{version}}{A character specifying a version of the dataset.
 This can be of the form \code{"1.1"} or \code{"1"} (where in \code{"x.y"}, x is a major
-version and y is an optional minor version), or
-\code{":latest"} (the default, the latest published version).
-We recommend using the number format so that
-the function stores a cache of the data (See \code{\link{cache_dataset}}).
+version and y is an optional minor version). As of v0.3.14, setting a version
+in this way will cache the dataset (See example in \code{\link{cache_dataset}})
+so that it will not re-download the file the second time and read from the cache.
+Finally, set \code{use_cache = "none"} to not read from the cache and re-download
+afresh even when \code{version} is provided.
 If the user specifies a \code{key} or \code{DATAVERSE_KEY} argument, they can access the
 draft version by \code{":draft"} (the current draft) or \code{":latest"} (which will
-prioritize the draft over the latest published version.
-Finally, set \code{use_cache = "none"} to not read from the cache and re-download
-afresh even when \code{version} is provided.}
+prioritize the draft over the latest published version).}
 \item{\code{return_url}}{Instead of downloading the file, return the URL for download.
 Defaults to \code{FALSE}.}
 }}
diff --git a/man/get_dataset.Rd b/man/get_dataset.Rd
index d5c35b6..8b796d6 100644
--- a/man/get_dataset.Rd
+++ b/man/get_dataset.Rd
@@ -41,15 +41,14 @@ for example \code{"10.70122/FK2/HXJVJU"}. Alternatively, an object of class
 
 \item{version}{A character specifying a version of the dataset.
 This can be of the form \code{"1.1"} or \code{"1"} (where in \code{"x.y"}, x is a major
-version and y is an optional minor version), or
-\code{":latest"} (the default, the latest published version).
-We recommend using the number format so that
-the function stores a cache of the data (See \code{\link{cache_dataset}}).
+version and y is an optional minor version). As of v0.3.14, setting a version
+in this way will cache the dataset (See example in \code{\link{cache_dataset}})
+so that it will not re-download the file the second time and read from the cache.
+Finally, set \code{use_cache = "none"} to not read from the cache and re-download
+afresh even when \code{version} is provided.
 If the user specifies a \code{key} or \code{DATAVERSE_KEY} argument, they can access the
 draft version by \code{":draft"} (the current draft) or \code{":latest"} (which will
-prioritize the draft over the latest published version.
-Finally, set \code{use_cache = "none"} to not read from the cache and re-download
-afresh even when \code{version} is provided.}
+prioritize the draft over the latest published version).}
 
 \item{key}{A character string specifying a Dataverse server API key. If one
 is not specified, functions calling authenticated API endpoints will fail.
diff --git a/tests/testthat/tests-dataset_metadata.R b/tests/testthat/tests-dataset_metadata.R
index 9f722d4..127877d 100644
--- a/tests/testthat/tests-dataset_metadata.R
+++ b/tests/testthat/tests-dataset_metadata.R
@@ -27,7 +27,7 @@ test_that("check versions format", {
 "fileAccessRequest", "files", "id", "lastUpdateTime", "latestVersionPublishingState", "license", "metadataBlocks", "publicationDate", "releaseTime", "storageIdentifier", "UNF", "versionMinorNumber", "versionNumber",
- "versionState")
+ "versionState", "deaccessionLink")
 expect_setequal(names(actual[[1]]), expected_names)
 expect_s3_class(actual[[2]], "dataverse_dataset_version")
 })
diff --git a/tests/testthat/tests-get_dataset.R b/tests/testthat/tests-get_dataset.R
index bec0535..8d86587 100644
--- a/tests/testthat/tests-get_dataset.R
+++ b/tests/testthat/tests-get_dataset.R
@@ -16,7 +16,7 @@ test_that("download tab from DOI and filename", {
 "lastUpdateTime", "latestVersionPublishingState", "license", "metadataBlocks", "publicationDate", "releaseTime", "storageIdentifier", "UNF", "versionMinorNumber",
- "versionNumber", "versionState")
+ "versionNumber", "versionState", "deaccessionLink")
 expect_setequal(names(actual) , expected_names)
 expect_equal(actual$id , 182158L)
diff --git a/tests/testthat/tests-list_datasets.R b/tests/testthat/tests-list_datasets.R
index ff27ba4..f9bbd3e 100644
--- a/tests/testthat/tests-list_datasets.R
+++ b/tests/testthat/tests-list_datasets.R
@@ -49,7 +49,8 @@ test_that("dataverse for 'dataverse-client-r'", {
 ),
 id = c(
 "https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/HXJVJU",
- "https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/PPIAXE"
+ "https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/PPIAXE",
+ "https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/QZDNI4"
 )
 ),
 class = "data.frame",