Merged
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -25,3 +25,4 @@ tests/.*_ghaction.R
^\.github$
rhub-checks
/Untitled.+\.R$
^CRAN-SUBMISSION$
10 changes: 5 additions & 5 deletions README.Rmd
@@ -99,10 +99,10 @@ nlsw_tsv <-
)
```

Now, Dataverse often translates rectangular data into an ingested, or "archival" version, which is application-neutral and easily-readable. `read_dataframe_*()` defaults to taking this ingested version rather than using the original, through the argument `original = FALSE`.

This default is safe because you may not have the proprietary software that was originally used. On the other hand, the data may have lost information in the process of the ingestation.
**The `original` argument:** Dataverse often translates rectangular data into an ingested, or "archival" version, which is application-neutral and easily-readable. `read_dataframe_*()` defaults to taking this ingested version rather than using the original, through the argument `original = FALSE`.
This default is safe because you may not have the proprietary software that was originally used.

On the other hand, the data may have lost information in the process of the ingestion.
Instead, to read the same file but its original version, specify `original = TRUE` and set an `.f` argument. In this case, we know that `nlsw88.tab` is a Stata `.dta` dataset, so we will use the `haven::read_dta` function.

```{r get_dataframe_by_name_original}
@@ -120,7 +120,6 @@ Note that even though the file prefix is ".tab", we use `haven::read_dta`.

Of course, when the dataset is not ingested (such as an Rds file), users always need to specify an `.f` argument for the specific file.


Note the difference between `nlsw_tsv` and `nlsw_original`. `nlsw_original` preserves data attributes such as value labels, whereas `nlsw_tsv` has dropped them or left them in the file metadata.
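
To see why a plain-text copy can lose value labels, here is a minimal base-R sketch. The toy data and the names `toy_original` and `toy_tsv` are hypothetical stand-ins for the real files. The sketch does not call Dataverse or `haven`. It only shows that attributes do not survive a round trip through a tab-separated file.

```r
# Hypothetical toy column standing in for a labelled variable in an original file.
race <- c(1, 2, 3, 1)
attr(race, "labels") <- c(white = 1, black = 2, other = 3)

toy_original <- data.frame(id = 1:4)
toy_original$race <- race

# Round-trip through a TSV file, which has no place to store attributes.
tsv_path <- tempfile(fileext = ".tsv")
write.table(toy_original, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
toy_tsv <- read.delim(tsv_path)

attr(toy_original$race, "labels") # labels are still attached
attr(toy_tsv$race, "labels")      # NULL: the labels were not written to the file
```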

```{r}
@@ -132,6 +131,7 @@ attr(nlsw_original$race, "labels") # original dta has value labels
```


**Caching**: When the dataset to be downloaded is large, downloading it can be time-consuming, and users typically want the download to run only once in a script they run multiple times. As of version 0.3.15, the package caches the downloaded data if the user specifies which version of the Dataverse dataset to download from. See the `version` argument in the help page.
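
A minimal sketch of pinning a version so that the download can be cached. The wrapper name `read_nlsw_pinned()` is hypothetical, and the dataset DOI and the version `"1"` are assumptions for illustration. Check which versions the dataset actually has before relying on them. The call needs network access and the **dataverse** package, so the wrapper is only defined here, not run.

```r
# Hypothetical wrapper; the DOI and version number are illustrative assumptions.
read_nlsw_pinned <- function(version = "1", use_cache = "disk") {
  dataverse::get_dataframe_by_name(
    filename  = "nlsw88.tab",
    dataset   = "10.70122/FK2/PPIAXE", # assumed DOI of the example dataset
    server    = "demo.dataverse.org",
    version   = version,   # a numbered version makes the download cacheable
    use_cache = use_cache  # "none" skips the cache and downloads again
  )
}
```

Calling `read_nlsw_pinned()` twice in the same script should download the file only once, while `read_nlsw_pinned(use_cache = "none")` forces a fresh download.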

### Data Upload and Archiving

@@ -208,7 +208,7 @@ Functions related to user management and permissions are currently not exported

Dataverse clients in other programming languages include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python and the [Java client](https://github.com/IQSS/dataverse-client-java). For more information, see [the Dataverse API page](https://guides.dataverse.org/en/5.5/api/client-libraries.html#r).

Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](https://www.openarchives.org:443/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with <https://figshare.com/>.
Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](https://www.openarchives.org:443/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with <https://figshare.com/>.


### More Information
32 changes: 19 additions & 13 deletions README.md
@@ -137,18 +137,17 @@ nlsw_tsv <-
)
```

Now, Dataverse often translates rectangular data into an ingested, or
“archival” version, which is application-neutral and easily-readable.
`read_dataframe_*()` defaults to taking this ingested version rather
than using the original, through the argument `original = FALSE`.

This default is safe because you may not have the proprietary software
that was originally used. On the other hand, the data may have lost
information in the process of the ingestation.

Instead, to read the same file but its original version, specify
`original = TRUE` and set an `.f` argument. In this case, we know that
`nlsw88.tab` is a Stata `.dta` dataset, so we will use the
**The `original` argument:** Dataverse often translates rectangular data
into an ingested, or “archival” version, which is application-neutral
and easily-readable. `read_dataframe_*()` defaults to taking this
ingested version rather than using the original, through the argument
`original = FALSE`. This default is safe because you may not have the
proprietary software that was originally used.

On the other hand, the data may have lost information in the process of
the ingestion. Instead, to read the same file but its original version,
specify `original = TRUE` and set an `.f` argument. In this case, we
know that `nlsw88.tab` is a Stata `.dta` dataset, so we will use the
`haven::read_dta` function.

``` r
@@ -185,6 +184,13 @@ attr(nlsw_original$race, "labels") # original dta has value labels
## white black other
## 1 2 3

**Caching**: When the dataset to be downloaded is large, downloading it
can be time-consuming, and users typically want the download to run only
once in a script they run multiple times. As of version 0.3.15, the
package caches the downloaded data if the user specifies which version
of the Dataverse dataset to download from. See the `version` argument in
the help page.

### Data Upload and Archiving

**Note**: *There are known issues to using to dataverse creation and
@@ -288,7 +294,7 @@ offer metadata download from any web repository that is compliant with
the [Open Archives Initiative](https://www.openarchives.org:443/)
standards. Additionally,
[rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to
interface with [Dryad](https://datadryad.org/stash). The
interface with [Dryad](https://datadryad.org/). The
[rfigshare](https://cran.r-project.org/package=rfigshare) package works
in a similar spirit to **dataverse** with <https://figshare.com/>.

4 changes: 2 additions & 2 deletions inst/constants.yml
@@ -1,4 +1,4 @@
server: "demo.dataverse.org"
api_token: "15372813-c54f-471f-a3e8-c269ee6a610f"
api_token_expiration: "2025-05-10"
api_token: "e7563e83-1e8c-4ca3-8c01-03e274a8277b"
api_token_expiration: "2026-05-20"
api_token_name: "shirokuriwaki"
13 changes: 6 additions & 7 deletions man-roxygen/version.R
@@ -1,11 +1,10 @@
#' @param version A character specifying a version of the dataset.
#' This can be of the form `"1.1"` or `"1"` (where in `"x.y"`, x is a major
#' version and y is an optional minor version), or
#' `":latest"` (the default, the latest published version).
#' We recommend using the number format so that
#' the function stores a cache of the data (See \code{\link{cache_dataset}}).
#' If the user specifies a `key` or `DATAVERSE_KEY` argument, they can access the
#' draft version by `":draft"` (the current draft) or `":latest"` (which will
#' prioritize the draft over the latest published version.
#' version and y is an optional minor version). As of v0.3.14, setting a version
#' in this way will cache the dataset (See example in \code{\link{cache_dataset}})
#' so that a second call reads the file from the cache instead of re-downloading it.
#' Set `use_cache = "none"` to bypass the cache and re-download
#' the file even when `version` is provided.
#' If the user specifies a `key` or `DATAVERSE_KEY` argument, they can access the
#' draft version by `":draft"` (the current draft) or `":latest"` (which will
#' prioritize the draft over the latest published version).
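
As a rough illustration of the version formats described above, the sketch below separates numbered versions (`"x"` or `"x.y"`), which are the ones eligible for caching, from keywords such as `":latest"` and `":draft"`. The helper `is_numbered_version()` is hypothetical and is not part of the package, whose internal check may differ.

```r
# Hypothetical helper, not part of the dataverse package.
# TRUE for "x" or "x.y" version strings, FALSE for keywords like ":latest".
is_numbered_version <- function(version) {
  grepl("^[0-9]+(\\.[0-9]+)?$", version)
}

is_numbered_version(c("1", "1.1", ":latest", ":draft"))
# TRUE TRUE FALSE FALSE
```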
13 changes: 6 additions & 7 deletions man/cache.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 6 additions & 7 deletions man/files.Rd


13 changes: 6 additions & 7 deletions man/get_dataframe.Rd


13 changes: 6 additions & 7 deletions man/get_dataset.Rd


2 changes: 1 addition & 1 deletion tests/testthat/tests-dataset_metadata.R
@@ -27,7 +27,7 @@ test_that("check versions format", {
"fileAccessRequest", "files", "id", "lastUpdateTime", "latestVersionPublishingState",
"license", "metadataBlocks", "publicationDate", "releaseTime",
"storageIdentifier", "UNF", "versionMinorNumber", "versionNumber",
"versionState")
"versionState", "deaccessionLink")
expect_setequal(names(actual[[1]]), expected_names)
expect_s3_class(actual[[2]], "dataverse_dataset_version")
})
2 changes: 1 addition & 1 deletion tests/testthat/tests-get_dataset.R
@@ -16,7 +16,7 @@ test_that("download tab from DOI and filename", {
"lastUpdateTime", "latestVersionPublishingState", "license",
"metadataBlocks", "publicationDate", "releaseTime",
"storageIdentifier", "UNF", "versionMinorNumber",
"versionNumber", "versionState")
"versionNumber", "versionState", "deaccessionLink")

expect_setequal(names(actual) , expected_names)
expect_equal(actual$id , 182158L)
3 changes: 2 additions & 1 deletion tests/testthat/tests-list_datasets.R
@@ -49,7 +49,8 @@ test_that("dataverse for 'dataverse-client-r'", {
),
id = c(
"https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/HXJVJU",
"https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/PPIAXE"
"https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/PPIAXE",
"https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.70122/FK2/QZDNI4"
)
),
class = "data.frame",