Add WKU Epi Game dataset files #70

Draft
aoliveram wants to merge 19 commits into master from issue-62-add-epigames-data

@aoliveram (Member) commented Mar 3, 2026

Closes #69

Opening this draft PR to start integrating the Epigames datasets into netdiffuseR.
So far, I've only added the raw data files for the datasets we discussed. I will be pushing the parsing functions and documentation in the upcoming commits.

To-do list:

- [x] Add raw datasets (game 165).
- [ ] Add internal R routine to parse the frame-by-frame .mtx files into CsparseMatrix and diffnet objects.
- [ ] Update package documentation describing the new datasets.

I'll mark this as ready for review once the code and docs are complete!

@aoliveram (Member, Author)

The raw data and the data in diffnet format are now ready.

Remaining tasks before this PR is ready for review:

  • Expand and improve the documentation to properly describe these new datasets.
  • Review the nature of the networks (specifically, handling the binarization of weighted networks).
  • Evaluate whether we should include the varnet (network variables). It is currently commented out: #netvars = c("qyes", "qno", "mask", "med"),

Copilot AI and others added 2 commits March 3, 2026 14:32
Co-authored-by: gvegayon <893619+gvegayon@users.noreply.github.com>
@gvegayon (Member) commented Mar 3, 2026

I don't see any R code yet, is that expected?

Co-authored-by: gvegayon <893619+gvegayon@users.noreply.github.com>
@aoliveram (Member, Author)

Yes, it's expected. All the code is in the repo epigames-analysis-recreation. I just sent an invitation if you want to have a look now. The repo is very similar to the one from Andres, but this one has the new notebook 1-data-parsing-R-format.ipynb, and the notebook 4-diffusion-analysis.Rmd has been updated to the WKU dataset (165).

If you run both, you should get the data you saw in the commit of this PR.

I'll be getting back to work soon.

@gvegayon (Member) left a comment

Hey @aoliveram, looks great! Thanks for doing this. I left a couple of comments.

Comment on lines +29 to +31
save(epigamesDiffNet, file = "data/epigamesDiffNet.rda", compress = "xz")

message("diffnet object successfully created and exported to data/epigamesDiffNet.rda")
Member

Let's use the usethis::use_data() function instead.

)

# Save as .rda compressed using xz for CRAN compliance
save(epigames_raw, file = "data/epigames_raw.rda", compress = "xz")
Member

Same thing here, let's use the usethis::use_data() functionality. About the names, let's follow the same convention we are using for the other datasets, that is, let's name the datasets epigames and epigamesDiffNet.

R/data.r Outdated
NULL # "fakeEdgelist"


#' Epi Games Dataset (Raw version)
Member

No need to add the "Raw" label, we aren't doing that with the other datasets.

@aoliveram (Member, Author)

Thanks for the comments @gvegayon. I was working on the collapse_timeframes() function. Let's see how it works. I'll comment on the details about it later. Also, I'll address your comments!

@aoliveram force-pushed the issue-62-add-epigames-data branch from 3d739bc to f14bcef March 9, 2026 22:20
@aoliveram (Member, Author)

Hi @gvegayon! Lots of news. Long story short, we now have a collapse_timeframes() function that produces the day-by-day datasets epigames and epigamesDiffNet from the hour-by-hour raw data. Everything is documented, and the function has its own tests.

I'd like you to review the new additions, and especially to look at an issue I had with GitHub Actions: the macOS-latest (devel) job always fails when installing R.

This is a summary made by the AI:

## Progress summary

This PR adds three new contributions to the package:

**1. `collapse_timeframes()`** – a new utility function that aggregates
high-resolution or continuous-time longitudinal edgelists into discrete time
windows, making them ready for use with `edgelist_to_adjmat()` or
`as_diffnet()`. Includes 21 unit tests.

**2. `epigames` + `epigamesDiffNet`** – a new dataset from the WKU Epi Games
simulation study (594 nodes, 15 daily time periods). `epigames` is the raw
list (hourly edgelist + node attributes with `toa`); `epigamesDiffNet` is the
ready-to-use `diffnet` object built from it using `collapse_timeframes()`.
The pipeline in `data-raw/` is fully reproducible.

**3. `wku_diffnet`** – a companion `diffnet` object from the same WKU study,
built from the per-day adjacency matrices directly (not from the hourly data).
It includes dynamic attributes (`qyes`, `qno`, `mask`, `med`) and serves as a
cross-validation reference: both objects agree on TOA (330 infected, 264 NA).

Documentation was consolidated in `R/data.r`, aliases were fixed, and the
package version was bumped to 1.24.2.
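
The aggregation described in item 1 can be illustrated with a minimal base-R sketch. It does not call `collapse_timeframes()` (its signature is not shown in this thread); the toy `hourly` edgelist, its column names, and the 24-hour window are assumptions made only for illustration.

```r
# Toy hourly edgelist (hypothetical data, not the WKU dataset):
# each row is one contact between two nodes at a given hour.
hourly <- data.frame(
  from = c(1, 1, 2, 3, 1),
  to   = c(2, 2, 3, 1, 2),
  hour = c(0, 5, 23, 24, 30)
)

# Map each hour to a 1-based day index using 24-hour windows.
hourly$day <- hourly$hour %/% 24 + 1

# Every contact counts once; summing within (from, to, day)
# yields a weighted daily edgelist.
hourly$weight <- 1
daily <- aggregate(weight ~ from + to + day, data = hourly, FUN = sum)

print(daily)
```

Each row of `daily` is one dyad per day, with `weight` counting how many hourly contacts fell in that window. A table of this shape is the kind of input that could then be split by day and turned into per-period adjacency matrices.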

---

## CI status

All checks pass except one job that fails consistently due to a **network
timeout unrelated to the package code**:


All jobs
├─ macOS-latest (release) (valgrind:)          ✓
├─ macOS-latest (devel) (valgrind:)            ✗   ← infrastructure failure
├─ windows-latest (release) (valgrind:)        ✓
├─ ubuntu-latest (devel) (valgrind:)           ✓
├─ ubuntu-latest (release) (valgrind:)         ✓
├─ ubuntu-latest (devel) (valgrind:true)       ○
└─ build-pkg                                   ✓

macOS-latest (devel) failure detail:
└─ Run r-lib/actions/setup-r@v2
   └─ Error: Failed to get R devel: Failed to get R 4.6.0:
      Failed to download version: connect ETIMEDOUT 169.60.149.197:443


The job fails at the **R installation step** — before any package code runs —
because the GitHub Actions runner cannot reach the CRAN mirror
(`169.60.149.197:443`). This is a transient infrastructure issue on the CI
runner side, not a problem with the package. All other jobs that ran (Ubuntu,
Windows, and macOS release) pass cleanly.

@gvegayon (Member) left a comment

Looking good, @aoliveram! Yeah, the macOS stuff is bugging me across many projects. I don't have a solution yet, so I think it's OK to ignore it for the moment. I made some comments.

Member

Is this part of the package?

Package: netdiffuseR
Title: Analysis of Diffusion and Contagion Processes on Networks
-Version: 1.24.0
+Version: 1.24.2
Member

The version should be 1.25.0. The versioning is done using

[major].[minor].[patches]

  • Major: Something that breaks the current code
  • Minor: New features that don't break the code.
  • Patch: Internal changes fixing bugs or improving things (invisible to the user).
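
For reference, here is a quick base-R check of how these version strings order. It is illustrative only and not part of the PR; the variable names are made up for the example.

```r
# Version strings mentioned in this review, parsed with base R.
current  <- package_version("1.24.0")  # old value in the DESCRIPTION diff
proposed <- package_version("1.24.2")  # value bumped in this PR
target   <- package_version("1.25.0")  # value suggested in this review

# Comparison is component-wise: major first, then minor, then patch.
print(target > proposed)
print(proposed > current)
```

Since this PR adds a new function and new datasets without breaking existing code, it falls under the "minor" rule, which is why the second component changes and the patch component resets to 0.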

* New dataset `epigames` and `epigamesDiffNet`: a simulated epidemic game
network with 594 nodes and 15 time periods from the WKU Epi Games study.

* New dataset `wku_diffnet`: a `diffnet` object from the WKU simulation study.
Member

Should also be wkuDiffNet to keep consistency. I don't see the code that generates that dataset.

Comment on lines +12 to +17
# Changes in netdiffuseR version 1.24.1 (2026-03-03)

* Fixed CRAN example error in `round_to_seq()`: `plot(w, x)` replaced with
`plot(w)` to avoid `%||%` operator issue in R 4.4.0+'s `formula.default`
when called via `plot.data.frame()`.

Member

Let's add this to the list of changes in version 1.25.0.

Development

Successfully merging this pull request may close these issues.

Add Epigames datasets to netdiffuseR

3 participants