Skip to content
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: BGmisc
Title: An R Package for Extended Behavior Genetics Analysis
Version: 1.7.0.1
Version: 1.7.0.1.1
Authors@R: c(
person("S. Mason", "Garrison", , "garrissm@wfu.edu", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-4804-6003")),
Expand Down
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@

* Optimized sliceFamilies to be more abstract
* Created `.require_openmx()` to make it easier to use OpenMx functions without making OpenMx a dependency
* Smarter string ID handling for ped2id
* Fixed how different-sized matrices are handled by `com2links()`

# BGmisc 1.7.0.0
* Fixed bug in parList
Expand Down
50 changes: 39 additions & 11 deletions R/makeLinks.R
Original file line number Diff line number Diff line change
Expand Up @@ -79,22 +79,50 @@ com2links <- function(

# Extract individual IDs from the first available matrix.
ids <- NULL
# Find the smallest matrix by ncol (avoids extracting IDs from large matrices).
mat_refs <- list()
if (!is.null(ad_ped_matrix)) mat_refs[["ad"]] <- ncol(ad_ped_matrix)
if (!is.null(mit_ped_matrix)) mat_refs[["mt"]] <- ncol(mit_ped_matrix)
if (!is.null(cn_ped_matrix)) mat_refs[["cn"]] <- ncol(cn_ped_matrix)

if (length(mat_refs) == 0L) {
stop("At least one relationship matrix must be provided.")
}

smallest <- names(which.min(unlist(mat_refs)))
guide_mat <- switch(smallest,
ad = ad_ped_matrix,
mt = mit_ped_matrix,
cn = cn_ped_matrix
)

if (!is.null(cn_ped_matrix)) {
ids <- as.numeric(dimnames(cn_ped_matrix)[[1]])
nc <- ncol(cn_ped_matrix)
} else if (!is.null(ad_ped_matrix)) {
ids <- as.numeric(dimnames(ad_ped_matrix)[[1]])
nc <- ncol(ad_ped_matrix)
} else if (!is.null(mit_ped_matrix)) {
ids <- as.numeric(dimnames(mit_ped_matrix)[[1]])
nc <- ncol(mit_ped_matrix)
# Extract IDs only from the smallest matrix.
guide_ids <- dimnames(guide_mat)[[1]]
if (is.null(guide_ids) || length(guide_ids) == 0L) {
stop("Could not extract IDs from the smallest matrix.")
}
ids <- suppressWarnings(as.numeric(guide_ids))
if (anyNA(ids)) {
warning(
"Matrix dimnames used as IDs should be strictly numeric for 'com2links()'. Found non-numeric or non-coercible IDs in the smallest matrix."
)
}
nc <- length(ids)

if (is.null(ids)) {
stop("Could not extract IDs from the provided matrices.")
# Subset only the larger matrices to match the smallest matrix's IDs and ordering.
if (!is.null(ad_ped_matrix) && ncol(ad_ped_matrix) > nc) {
if (verbose) message("Subsetting ad_ped_matrix from ", ncol(ad_ped_matrix), " to ", nc, " IDs.")
ad_ped_matrix <- ad_ped_matrix[guide_ids, guide_ids, drop = FALSE]
}
if (!is.null(mit_ped_matrix) && ncol(mit_ped_matrix) > nc) {
if (verbose) message("Subsetting mit_ped_matrix from ", ncol(mit_ped_matrix), " to ", nc, " IDs.")
mit_ped_matrix <- mit_ped_matrix[guide_ids, guide_ids, drop = FALSE]
}
if (!is.null(cn_ped_matrix) && ncol(cn_ped_matrix) > nc) {
if (verbose) message("Subsetting cn_ped_matrix from ", ncol(cn_ped_matrix), " to ", nc, " IDs.")
cn_ped_matrix <- cn_ped_matrix[guide_ids, guide_ids, drop = FALSE]
Comment on lines +112 to +123
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subsetting larger matrices using ad_ped_matrix[guide_ids, guide_ids] assumes the larger matrix has matching dimnames and contains all guide_ids. If dimnames are missing or the ID sets differ, this will fail with a low-level indexing error (or introduce NA rows/cols). Consider checking for non-NULL dimnames and verifying all(guide_ids %in% dimnames(mat)[[1]]) before subsetting, and stop with a clear message when IDs cannot be aligned.

Suggested change
# Subset only the larger matrices to match the smallest matrix's IDs and ordering.
if (!is.null(ad_ped_matrix) && ncol(ad_ped_matrix) > nc) {
if (verbose) message("Subsetting ad_ped_matrix from ", ncol(ad_ped_matrix), " to ", nc, " IDs.")
ad_ped_matrix <- ad_ped_matrix[guide_ids, guide_ids, drop = FALSE]
}
if (!is.null(mit_ped_matrix) && ncol(mit_ped_matrix) > nc) {
if (verbose) message("Subsetting mit_ped_matrix from ", ncol(mit_ped_matrix), " to ", nc, " IDs.")
mit_ped_matrix <- mit_ped_matrix[guide_ids, guide_ids, drop = FALSE]
}
if (!is.null(cn_ped_matrix) && ncol(cn_ped_matrix) > nc) {
if (verbose) message("Subsetting cn_ped_matrix from ", ncol(cn_ped_matrix), " to ", nc, " IDs.")
cn_ped_matrix <- cn_ped_matrix[guide_ids, guide_ids, drop = FALSE]
subset_matrix_to_guide_ids <- function(mat, mat_name, guide_ids) {
mat_dimnames <- dimnames(mat)
row_ids <- mat_dimnames[[1]]
col_ids <- mat_dimnames[[2]]
if (is.null(row_ids) || is.null(col_ids)) {
stop(
sprintf(
"Cannot align '%s' to the guide matrix because it does not have both row and column names.",
mat_name
)
)
}
missing_row_ids <- setdiff(guide_ids, row_ids)
missing_col_ids <- setdiff(guide_ids, col_ids)
if (length(missing_row_ids) > 0L || length(missing_col_ids) > 0L) {
missing_ids <- unique(c(missing_row_ids, missing_col_ids))
preview_n <- min(length(missing_ids), 5L)
stop(
sprintf(
"Cannot align '%s' to the guide matrix because %d guide ID(s) are missing from its dimnames. Example missing ID(s): %s",
mat_name,
length(missing_ids),
paste(missing_ids[seq_len(preview_n)], collapse = ", ")
)
)
}
mat[guide_ids, guide_ids, drop = FALSE]
}
# Subset only the larger matrices to match the smallest matrix's IDs and ordering.
if (!is.null(ad_ped_matrix) && ncol(ad_ped_matrix) > nc) {
if (verbose) message("Subsetting ad_ped_matrix from ", ncol(ad_ped_matrix), " to ", nc, " IDs.")
ad_ped_matrix <- subset_matrix_to_guide_ids(
mat = ad_ped_matrix,
mat_name = "ad_ped_matrix",
guide_ids = guide_ids
)
}
if (!is.null(mit_ped_matrix) && ncol(mit_ped_matrix) > nc) {
if (verbose) message("Subsetting mit_ped_matrix from ", ncol(mit_ped_matrix), " to ", nc, " IDs.")
mit_ped_matrix <- subset_matrix_to_guide_ids(
mat = mit_ped_matrix,
mat_name = "mit_ped_matrix",
guide_ids = guide_ids
)
}
if (!is.null(cn_ped_matrix) && ncol(cn_ped_matrix) > nc) {
if (verbose) message("Subsetting cn_ped_matrix from ", ncol(cn_ped_matrix), " to ", nc, " IDs.")
cn_ped_matrix <- subset_matrix_to_guide_ids(
mat = cn_ped_matrix,
mat_name = "cn_ped_matrix",
guide_ids = guide_ids
)

Copilot uses AI. Check for mistakes.
}


# --- matrix_case construction and switch dispatch ---
matrix_case <- paste(sort(c(
Expand Down
32 changes: 28 additions & 4 deletions R/segmentPedigree.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#' @param dadID character. Name of the column in ped for the father ID variable
#' @param famID character. Name of the column to be created in ped for the family ID variable
#' @param twinID character. Name of the column in ped for the twin ID variable, if applicable
#' @param overwrite logical. If TRUE, will overwrite existing famID variable if it exists. Default is TRUE.
#' @param ... additional arguments to be passed to \code{\link{ped2com}}
#' @details
#' The general idea of this function is to use person ID, mother ID, and father ID to
Expand All @@ -31,18 +32,20 @@
ped2fam <- function(ped, personID = "ID",
momID = "momID", dadID = "dadID", famID = "famID",
twinID = "twinID",
overwrite = TRUE,
...) {
Comment on lines 32 to 36
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new user-facing behavior is introduced via the overwrite argument (including warning + different outcomes when famID already exists), but there are no tests covering the overwrite = FALSE path or the overwrite warning/behavior. Add testthat coverage to lock in expected behavior when an input pedigree already contains famID (both overwrite TRUE/FALSE).

Copilot generated this review using guidance from repository custom instructions.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in commit 8a80c8d. Two changes were made:

  1. Bug fix (R/segmentPedigree.R): Added early return(ped) when overwrite = FALSE to prevent the subsequent merge() from creating duplicate famID.x/famID.y columns.

  2. New tests (tests/testthat/test-segmentPedigree.R): Added coverage for both paths:

    • overwrite = TRUE — verifies the warning is issued and the existing column is replaced with the computed values
    • overwrite = FALSE — verifies the warning is issued and the original values are preserved with no duplicate columns

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not approve this "bug fix". you were only tasked with adding unit tests

# Call to wrapper function
.ped2id(
ped = ped, personID = personID, momID = momID, dadID = dadID, famID = famID, twinID = twinID,
type = "parents"
type = "parents",
overwrite = overwrite
)
}

.ped2id <- function(ped,
personID = "ID", momID = "momID", dadID = "dadID",
famID = "famID", twinID = "twinID",
type,
type, overwrite = TRUE,
...) {
# Turn pedigree into family
pg <- ped2graph(
Expand All @@ -55,23 +58,44 @@

# Create famID data.frame
# Convert IDs to numeric, with warning if coercion collapses IDs

uniques <- suppressWarnings(unique(as.numeric(names(wcc$membership))))
keep_string <- FALSE

if (length(uniques) == 1L && is.na(uniques)) {
warning("After converting IDs to numeric, all IDs became NA. This indicates ID coercion collapsed IDs. Please ensure IDs aren't character or factor variables.")

keep_string <- TRUE
} else if (length(uniques) < length(wcc$membership)) {
warning("After converting IDs to numeric, some IDs became NA. This indicates ID coercion collapsed some IDs. Please ensure IDs aren't character or factor variables.")
keep_string <- TRUE
}
if(keep_string==TRUE) {

Check notice on line 72 in R/segmentPedigree.R

View check run for this annotation

codefactor.io / CodeFactor

R/segmentPedigree.R#L72

Place a space before left parenthesis, except in a function call. (spaces_left_parentheses_linter)

Check notice on line 72 in R/segmentPedigree.R

View check run for this annotation

codefactor.io / CodeFactor

R/segmentPedigree.R#L72

Put spaces around all infix operators. (infix_spaces_linter)
fam <- data.frame(
V1 = names(wcc$membership),
V2 = wcc$membership
)
} else {
} else {
fam <- data.frame(
V1 = as.numeric(names(wcc$membership)),
V2 = wcc$membership
)
}

names(fam) <- c(personID, famID)

if(famID %in% names(ped)) {

Check notice on line 86 in R/segmentPedigree.R

View check run for this annotation

codefactor.io / CodeFactor

R/segmentPedigree.R#L86

Place a space before left parenthesis, except in a function call. (spaces_left_parentheses_linter)
if(overwrite) {

Check notice on line 87 in R/segmentPedigree.R

View check run for this annotation

codefactor.io / CodeFactor

R/segmentPedigree.R#L87

Place a space before left parenthesis, except in a function call. (spaces_left_parentheses_linter)
overwrite_message <- "be overwritten."
ped[[famID]] <- NULL
} else {
overwrite_message <- "not be overwritten."
fam[[famID]] <- NULL
}

warning(sprintf("The famID variable '%s' already exists in the pedigree. The existing variable will %s", famID, overwrite_message))

}

ped2 <- merge(fam, ped,
by = personID, all.x = FALSE, all.y = TRUE
)
Expand Down
3 changes: 3 additions & 0 deletions man/ped2fam.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

79 changes: 79 additions & 0 deletions tests/testthat/test-makeLinks.R
Original file line number Diff line number Diff line change
Expand Up @@ -324,3 +324,82 @@ test_that("com2links garbage collection does not affect output, using two compon

expect_equal(result_gc, result_no_gc)
})


test_that("com2links handles mismatched matrix dimensions by subsetting to smallest", {
data(hazard)
subset_ids <- hazard$ID[seq_len(ceiling(nrow(hazard) / 2))]

ad_small <- ped2add(hazard, sparse = TRUE, keep_ids = subset_ids)
mit_ped_matrix <- ped2mit(hazard, sparse = TRUE)
cn_ped_matrix <- ped2cn(hazard, sparse = TRUE)

# All three matrices, ad is smaller
result_mismatch <- com2links(
ad_ped_matrix = ad_small,
mit_ped_matrix = mit_ped_matrix,
cn_ped_matrix = cn_ped_matrix,
writetodisk = FALSE
)

# Reference: all three matrices built from the same subset
result_ref <- com2links(
ad_ped_matrix = ad_small,
mit_ped_matrix = ped2mit(hazard, sparse = TRUE, keep_ids = subset_ids),
cn_ped_matrix = ped2cn(hazard, sparse = TRUE, keep_ids = subset_ids),
writetodisk = FALSE
)

expect_equal(result_mismatch, result_ref)

# Only IDs from the smaller matrix should appear
all_output_ids <- unique(c(result_mismatch$ID1, result_mismatch$ID2))
expect_true(all(all_output_ids %in% as.numeric(dimnames(ad_small)[[1]])))
})

test_that("com2links mismatched dimensions with two matrices", {
data(hazard)
subset_ids <- hazard$ID[seq_len(ceiling(nrow(hazard) / 2))]

ad_ped_matrix <- ped2add(hazard, sparse = TRUE)
cn_small <- ped2cn(hazard, sparse = TRUE, keep_ids = subset_ids)

# cn is smaller than ad
result_mismatch <- com2links(
ad_ped_matrix = ad_ped_matrix,
cn_ped_matrix = cn_small,
writetodisk = FALSE
)

result_ref <- com2links(
ad_ped_matrix = ped2add(hazard, sparse = TRUE, keep_ids = subset_ids),
cn_ped_matrix = cn_small,
writetodisk = FALSE
)

expect_equal(result_mismatch, result_ref)
})

test_that("com2links mismatched dimensions with mit smaller", {
data(hazard)
subset_ids <- hazard$ID[seq_len(ceiling(nrow(hazard) / 2))]

ad_ped_matrix <- ped2add(hazard, sparse = TRUE)
mit_small <- ped2mit(hazard, sparse = TRUE, keep_ids = subset_ids)

result_mismatch <- com2links(
ad_ped_matrix = ad_ped_matrix,
mit_ped_matrix = mit_small,
writetodisk = FALSE
)

result_ref <- com2links(
ad_ped_matrix = ped2add(hazard, sparse = TRUE, keep_ids = subset_ids),
mit_ped_matrix = mit_small,
writetodisk = FALSE
)

expect_equal(result_mismatch, result_ref)
expect_true(all(unique(c(result_mismatch$ID1, result_mismatch$ID2)) %in% as.numeric(dimnames(mit_small)[[1]])))
expect_true(all(unique(c(result_mismatch$ID1, result_mismatch$ID2)) %in% as.numeric(dimnames(ad_ped_matrix)[[1]])))
})
34 changes: 34 additions & 0 deletions tests/testthat/test-segmentPedigree.R
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,40 @@ test_that("ped2fam gets the right families for inbreeding data", {
expect_equal(ds$famID, ds$newFamID)
})

test_that("ped2fam overwrite = TRUE warns and replaces existing famID", {
data(hazard)
# Add a fake existing famID column with wrong values
hazard$famID <- 999L

# Should warn that famID will be overwritten
expect_warning(
ds <- ped2fam(hazard, famID = "famID", overwrite = TRUE),
regexp = "already exists.*be overwritten"
)

# The result should have the correctly computed famID, not the placeholder
expect_false(all(ds$famID == 999L))
# No duplicate columns
expect_equal(sum(names(ds) == "famID"), 1L)
})

test_that("ped2fam overwrite = FALSE warns and preserves existing famID", {
data(hazard)
# Add a known existing famID column
hazard$famID <- 999L

# Should warn that famID will NOT be overwritten
expect_warning(
ds <- ped2fam(hazard, famID = "famID", overwrite = FALSE),
regexp = "already exists.*not be overwritten"
)

# The result should preserve the original famID values
expect_true(all(ds$famID == 999L))
# No duplicate columns
expect_equal(sum(names(ds) == "famID"), 1L)
})

test_that("ped2graph produces a graph for hazard data with mothers", {
expect_silent(data(hazard))
g <- ped2graph(hazard, adjacent = "mothers")
Expand Down
15 changes: 8 additions & 7 deletions vignettes/articles/tutorialmanuscript.Xmd
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ author:
corresponding: true
email: "garrissm@wfu.edu"
abstract: |
Twin studies remain the dominant design in behavior genetics, yet most twin half-siblings, cousins, and multi-generational relatives whose distinct kinship coefficients jointly identify a richer set of variance components than any MZ/DZ comparison alone. We demonstrate how to fit extended pedigree models using the BGmisc package and OpenMx.
Twin studies remain the dominant design in behavior genetics, yet most twin half-siblings, cousins, and multi-generational relatives whose distinct kinship coefficients jointly identify a richer set of variance components than any MZ/DZ comparison alone. We demonstrate how to fit extended pedigree models using the BGmisc package and OpenMx.
We apply the extended pedigree model to mutiple datasets
of Youth (a large human panel study with researcher-linked kinship), the Kluane Red Squirrel Project
(a multi-generational animal field study), and a children-of-twins dataset.
Expand All @@ -31,8 +31,8 @@ vignette: >
%\VignetteIndexEntry{Extended Family Modeling with BGmisc}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
editor_options:
markdown:
wrap: 100
---

Expand Down Expand Up @@ -65,7 +65,8 @@ studies, either intentionally (e.g., twin registries that also include siblings
byproduct of large panel studies (e.g., the National Longitudinal Survey of Youth, which includes
researcher-linked kinship). In most cases, the additional relatives are excluded from analysis, and
the twin design is applied to a subset of the data, even though these relatives carry independent
information about the genetic and environmental architecture of the phenotype.
information about the genetic and environmental architecture of the phenotype. For example, many twin registries also include triplets, siblings, children, and parents, all of whom provide additional information about the genetic and environmental architecture of the phenotype.
<!-- cite https://helda.helsinki.fi/server/api/core/bitstreams/f0b6dc08-69df-449e-a8fe-e2c78abf7f60/content -->

The extended pedigree model, which we have introduced elsewhere (see ETC), leverages the full range
of kinship coefficients in a pedigree to identify a richer set of variance components than the
Expand All @@ -87,15 +88,15 @@ value provides independent leverage for disentangling genetic from environmental
the number of distinct kinship types increases, so does the number of identifiable variance
components.

Extended pedigree designs have been used in behavior genetics since at least the 1970s [@eaves1978; @fulker_multiple_1988], but they have remained a minority practice. Partially over concerns about model identification and power (Wilson, 1982, 1989), the complexity of fitting these models, and the relative costs of collecting twin data compared to extended family data.
Extended pedigree designs have been used in behavior genetics since at least the 1970s [@eaves1978; @fulker_multiple_1988], but they have remained a minority practice. Partially over concerns about model identification and power (Wilson, 1982, 1989), the complexity of fitting these models, and the relative costs of collecting twin data compared to extended family data.

<! -- https://onlinelibrary.wiley.com/doi/10.1002/bimj.4710310511
<!-- https://onlinelibrary.wiley.com/doi/10.1002/bimj.4710310511 -->

but also because the twin design has been so successful and widely adopted. The twin design is often seen as the "gold standard" in behavior genetics, and many researchers may be hesitant to deviate from this established approach. Additionally, many human datasets simply do not include the necessary family structure to fit extended pedigree models, which may limit their applicability in certain contexts.

<the reasosn are numerous for why this is the case, but a key factor is that many human datasets simply do not include the necessary family structure to fit these models. And the twin design is often the default analytic approach, even when more complex family data are available.

Deriving


In contrast, similar
models are common in plant and animal breeding, where pedigree data is more routinely collected and
Expand Down
Loading