Skip to content

Added rtsk_individual_table_add_row() and tc.individual_table_add_row()#122

Merged
bryo-han merged 1 commit intoHighlanderLab:mainfrom
gregorgorjanc:gregorgorjanc/issue120
Mar 12, 2026
Merged

Added rtsk_individual_table_add_row() and tc.individual_table_add_row()#122
bryo-han merged 1 commit intoHighlanderLab:mainfrom
gregorgorjanc:gregorgorjanc/issue120

Conversation

@gregorgorjanc
Copy link
Member

Fixes #120

@gregorgorjanc
Copy link
Member Author

@hannesbecher @LynxJinyangii can you review this PR? Ideally you would gh pr checkout 122 it and explore/run it on your end. This one was a bit more complex than I imagined, but it was not that hard at the end with the help of LLM (I did check the various angles ...).

@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@bryo-han
Copy link
Contributor

bryo-han commented Mar 10, 2026

Once I added an add_row function each for sites and mutations, I ran the pre-commit hook. The clang-tidy step fails as there are no checks for it to do:

clang-tidy for RcppTskit..................................................Failed
- hook id: clang-tidy
- exit code: 1

Error: no checks enabled.

I am not sure if this is due to any changes that I've made. Codex has suggested changes but I don't want to implement them unless it's really needed. What do you think, @gregorgorjanc ?

clang-tidy is failing because your hook runs it without any enabled checks.

What I confirmed in your repo:

Hook entry is just RcppTskit/tools/clang_tidy.py in .pre-commit-config.yaml.
That script calls clang-tidy <file> ... -- <flags> and does not pass --checks=....
There is no .clang-tidy config file in the repo.
Your installed binary is Homebrew LLVM version 22.1.0, and clang-tidy -list-checks returns No checks enabled.
So the failure is expected: with this clang-tidy version/config combo, zero checks are active, and clang-tidy exits with Error: no checks enabled.

Typical fixes:

Add a repo .clang-tidy with a Checks: entry.
Or pass --checks=... from clang_tidy.py / pre-commit hook args.
Only if intentional, pass --allow-no-checks (suppresses error but performs no linting).

@gregorgorjanc
Copy link
Member Author

@bryo-han interesting edge case ... Let's see. On my laptop, I have

❯ clang-tidy --version
Homebrew LLVM version 21.1.8
  Optimized build

Is yours 22.1 as the LLM suggests, can you check and report back here?

Also, I see the below when running clang-tidy -list-checks - what do you see? Is there a missing setting/setup on your end that failed or something similar or have I done something specific on my end (I do not recall this).

❯ clang-tidy -list-checks
Enabled checks:
    clang-analyzer-apiModeling.Errno
    clang-analyzer-apiModeling.TrustNonnull
    clang-analyzer-apiModeling.TrustReturnsNonnull
    clang-analyzer-apiModeling.google.GTest
    clang-analyzer-apiModeling.llvm.CastValue
    clang-analyzer-apiModeling.llvm.ReturnValue
    clang-analyzer-core.BitwiseShift
    clang-analyzer-core.CallAndMessage
    clang-analyzer-core.CallAndMessageModeling
    clang-analyzer-core.DereferenceModeling
    clang-analyzer-core.DivideZero
    clang-analyzer-core.DynamicTypePropagation
    clang-analyzer-core.FixedAddressDereference
    clang-analyzer-core.NonNullParamChecker
    clang-analyzer-core.NonnilStringConstants
    clang-analyzer-core.NullDereference
    clang-analyzer-core.StackAddrEscapeBase
    clang-analyzer-core.StackAddressEscape
    clang-analyzer-core.UndefinedBinaryOperatorResult
    clang-analyzer-core.VLASize
    clang-analyzer-core.builtin.AssumeModeling
    clang-analyzer-core.builtin.BuiltinFunctions
    clang-analyzer-core.builtin.NoReturnFunctions
    clang-analyzer-core.uninitialized.ArraySubscript
    clang-analyzer-core.uninitialized.Assign
    clang-analyzer-core.uninitialized.Branch
    clang-analyzer-core.uninitialized.CapturedBlockVariable
    clang-analyzer-core.uninitialized.NewArraySize
    clang-analyzer-core.uninitialized.UndefReturn
    clang-analyzer-cplusplus.ArrayDelete
    clang-analyzer-cplusplus.InnerPointer
    clang-analyzer-cplusplus.Move
    clang-analyzer-cplusplus.NewDelete
    clang-analyzer-cplusplus.NewDeleteLeaks
    clang-analyzer-cplusplus.PlacementNew
    clang-analyzer-cplusplus.PureVirtualCall
    clang-analyzer-cplusplus.SelfAssignment
    clang-analyzer-cplusplus.SmartPtrModeling
    clang-analyzer-cplusplus.StringChecker
    clang-analyzer-deadcode.DeadStores
    clang-analyzer-fuchsia.HandleChecker
    clang-analyzer-nullability.NullPassedToNonnull
    clang-analyzer-nullability.NullReturnedFromNonnull
    clang-analyzer-nullability.NullableDereferenced
    clang-analyzer-nullability.NullablePassedToNonnull
    clang-analyzer-nullability.NullableReturnedFromNonnull
    clang-analyzer-optin.core.EnumCastOutOfRange
    clang-analyzer-optin.cplusplus.UninitializedObject
    clang-analyzer-optin.cplusplus.VirtualCall
    clang-analyzer-optin.mpi.MPI-Checker
    clang-analyzer-optin.osx.OSObjectCStyleCast
    clang-analyzer-optin.osx.cocoa.localizability.EmptyLocalizationContextChecker
    clang-analyzer-optin.osx.cocoa.localizability.NonLocalizedStringChecker
    clang-analyzer-optin.performance.GCDAntipattern
    clang-analyzer-optin.performance.Padding
    clang-analyzer-optin.portability.UnixAPI
    clang-analyzer-optin.taint.GenericTaint
    clang-analyzer-optin.taint.TaintPropagation
    clang-analyzer-optin.taint.TaintedAlloc
    clang-analyzer-optin.taint.TaintedDiv
    clang-analyzer-osx.API
    clang-analyzer-osx.MIG
    clang-analyzer-osx.NSOrCFErrorDerefChecker
    clang-analyzer-osx.NumberObjectConversion
    clang-analyzer-osx.OSObjectRetainCount
    clang-analyzer-osx.ObjCProperty
    clang-analyzer-osx.SecKeychainAPI
    clang-analyzer-osx.cocoa.AtSync
    clang-analyzer-osx.cocoa.AutoreleaseWrite
    clang-analyzer-osx.cocoa.ClassRelease
    clang-analyzer-osx.cocoa.Dealloc
    clang-analyzer-osx.cocoa.IncompatibleMethodTypes
    clang-analyzer-osx.cocoa.Loops
    clang-analyzer-osx.cocoa.MissingSuperCall
    clang-analyzer-osx.cocoa.NSAutoreleasePool
    clang-analyzer-osx.cocoa.NSError
    clang-analyzer-osx.cocoa.NilArg
    clang-analyzer-osx.cocoa.NonNilReturnValue
    clang-analyzer-osx.cocoa.ObjCGenerics
    clang-analyzer-osx.cocoa.RetainCount
    clang-analyzer-osx.cocoa.RetainCountBase
    clang-analyzer-osx.cocoa.RunLoopAutoreleaseLeak
    clang-analyzer-osx.cocoa.SelfInit
    clang-analyzer-osx.cocoa.SuperDealloc
    clang-analyzer-osx.cocoa.UnusedIvars
    clang-analyzer-osx.cocoa.VariadicMethodTypes
    clang-analyzer-osx.coreFoundation.CFError
    clang-analyzer-osx.coreFoundation.CFNumber
    clang-analyzer-osx.coreFoundation.CFRetainRelease
    clang-analyzer-osx.coreFoundation.containers.OutOfBounds
    clang-analyzer-osx.coreFoundation.containers.PointerSizedValues
    clang-analyzer-security.ArrayBound
    clang-analyzer-security.FloatLoopCounter
    clang-analyzer-security.MmapWriteExec
    clang-analyzer-security.PointerSub
    clang-analyzer-security.PutenvStackArray
    clang-analyzer-security.SetgidSetuidOrder
    clang-analyzer-security.cert.env.InvalidPtr
    clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling
    clang-analyzer-security.insecureAPI.SecuritySyntaxChecker
    clang-analyzer-security.insecureAPI.UncheckedReturn
    clang-analyzer-security.insecureAPI.bcmp
    clang-analyzer-security.insecureAPI.bcopy
    clang-analyzer-security.insecureAPI.bzero
    clang-analyzer-security.insecureAPI.decodeValueOfObjCType
    clang-analyzer-security.insecureAPI.getpw
    clang-analyzer-security.insecureAPI.gets
    clang-analyzer-security.insecureAPI.mkstemp
    clang-analyzer-security.insecureAPI.mktemp
    clang-analyzer-security.insecureAPI.rand
    clang-analyzer-security.insecureAPI.strcpy
    clang-analyzer-security.insecureAPI.vfork
    clang-analyzer-unix.API
    clang-analyzer-unix.BlockInCriticalSection
    clang-analyzer-unix.Chroot
    clang-analyzer-unix.DynamicMemoryModeling
    clang-analyzer-unix.Errno
    clang-analyzer-unix.Malloc
    clang-analyzer-unix.MallocSizeof
    clang-analyzer-unix.MismatchedDeallocator
    clang-analyzer-unix.StdCLibraryFunctions
    clang-analyzer-unix.Stream
    clang-analyzer-unix.Vfork
    clang-analyzer-unix.cstring.BadSizeArg
    clang-analyzer-unix.cstring.CStringModeling
    clang-analyzer-unix.cstring.NotNullTerminated
    clang-analyzer-unix.cstring.NullArg
    clang-analyzer-valist.CopyToSelf
    clang-analyzer-valist.Uninitialized
    clang-analyzer-valist.Unterminated
    clang-analyzer-valist.ValistBase
    clang-analyzer-webkit.NoUncountedMemberChecker
    clang-analyzer-webkit.RefCntblBaseVirtualDtor
    clang-analyzer-webkit.UncountedLambdaCapturesChecker

@gregorgorjanc
Copy link
Member Author

@bryo-han one more bit for you too check - are you using homebrew or some other LLVM - I think former, but good to check. Are you on Mac or Linux?

❯ which clang-tidy
/opt/homebrew/bin/clang-tidy

@bryo-han
Copy link
Contributor

@gregorgorjanc this seems to be due the different versions of clang-tidy we have installed:

You’re seeing a version-dependent default behavior.

On your machine (clang-tidy 22.1.0), running without --checks (and without a .clang-tidy file) yields an empty enabled-check set, so pre-commit fails with Error: no checks enabled.
On the other machine (clang-tidy 21.1.8), default enabled checks still include many clang-analyzer-* checks, so -list-checks shows them.
I verified locally that your LLVM 22 install does have checks available:

clang-tidy -list-checks -checks='*' lists many checks.
clang-tidy -list-checks -checks='clang-analyzer-*' lists the same family your colleague sees.
So this is not a broken install; it’s “no default checks selected” unless explicitly configured.

Use one of these fixes in repo:

Add .clang-tidy with a Checks: value.
Pass --checks=... in the hook/script (recommended for deterministic CI/dev behavior).
Use --allow-no-checks only if you intentionally want clang-tidy to run with zero checks.
Sources:

https://clang.llvm.org/extra/clang-tidy/index.html
https://discourse.llvm.org/t/rfc-provide-better-experience-for-new-clang-tidy-users/87466

If I run clang-tidy -list-checks -checks='*', I get load of checks shown.

I suggest I add a file .clang-tidy tonight as suggested above as I am running the newer version. Hope that does not break compatibility.

@gregorgorjanc
Copy link
Member Author

@bryo-han give it a try in your incoming PR and I can test on my end and will let you know.

@gregorgorjanc
Copy link
Member Author

@bryo-han @LynxJinyangii do you have any other comments on the PR and the code in it? Is this doing what we expect in comparison to Python side?

#' new_id <- tc$individual_table_add_row(flags = 0L)
#' new_id <- tc$individual_table_add_row(parents = c(0L, 2L))
#' new_id <- tc$individual_table_add_row(metadata = "abc")
#' new_id <- tc$individual_table_add_row(metadata = charToRaw("cba"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think matadata should be a nested list but not a string. For example:
In python:

tb.individuals.add_row(metadata={'file_id':33})

In R with reticulate:

tb$individuals$add_row(metadata=list(file_id=as.integer(33)))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LynxJinyangii thanks for raising this. I am not really clued in about the metadata so this part is very hazy for me. Looking at the C function https://tskit.dev/tskit/docs/stable/c-api.html#c.tsk_individual_table_add_row metadata is a character vector. Looking at the Python function https://tskit.dev/tskit/docs/stable/python-api.html#tskit.IndividualTable.add_row metadata is Any object that is valid metadata for the table’s schema. Defaults to the default metadata value for the table’s schema. This is typically {}. For no schema, None and I am a bit clueless what is object that is valid metadata! Do you know and could you suggest what to use for this?! Is it indeed a list()? @bryo-han thoughts from your end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I guess we'd better not handel this at C/C++ layer, they mentioned "The main area of difference is, unlike the Python API, the C API doesn’t do any decoding, encoding or schema validation of [Metadata] fields, instead only handling the byte string representation of the metadata. Metadata is therefore never used directly by any tskit C API method, just stored" (https://tskit.dev/tskit/docs/stable/c-api.html). Perhaps we can use https://jeroen.r-universe.dev/jsonlite/doc/manual.html in R. @gregorgorjanc could you please give me an example of how to access a row in the individual table, so I can try writing metadata to a list/dictionary (things like https://tskit.dev/pyslim/docs/latest/metadata.html) in R, storing it in binary in C, and then decoding it again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about JSON (though it looks simpleish), which is why I struggle regarding the metadata side of things.

As to examples of rtsk_individual_table_add_row() and tc.individual_table_add_row() see https://github.com/HighlanderLab/RcppTskit/pull/122/changes#diff-b8bc9e42f1189821e14b71369310c9d873f56ac1337fa3a2f766817ccb09341aR156 and https://github.com/HighlanderLab/RcppTskit/pull/122/changes#diff-912ff421309575a5784c260f0fdfa9bfa88fb4f5c48acc539b1ca1542f16bc3cR1256 (these examples are part of this PR;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LynxJinyangii @bryo-han we could for now ignore (play ignorant about) metadata for now and just assume it will be a character and we sort it later as part of #36 and #24 - once we figure what is the best way of handling the metadata, we can then easily propose a solution for that aspect later instead of getting bogged down with how to handle metadata while we are trying to add the add_row methods. Yes, let's focus on the add_row methods first for all the tables, and worry about the metadata later!

@gregorgorjanc
Copy link
Member Author

@LynxJinyangii @bryo-han have you managed to run the examples in the code - see #122 (comment) - and are they doing what we expect they should be doing? I would like to get this PR merged and then we work on your PRs for the other add_row methods.

@bryo-han
Copy link
Contributor

@LynxJinyangii and I think this is fine. Have not worked out how to correctly add complex metadata. But we knew this was going to be tricky. Happy to merge!

@bryo-han bryo-han merged commit cd52b64 into HighlanderLab:main Mar 12, 2026
8 checks passed
@gregorgorjanc
Copy link
Member Author

OK, thanks for running over the examples and merging this in! Let's aim to do rebase and merge instead of just merge - this creates a simpler git history - atm we see this

Screenshot 2026-03-12 at 17 01 52

but if we would rebase and merge there would be no "merge" commit on top - we would just have the "Added rtsk_ind..." commit.

@gregorgorjanc gregorgorjanc deleted the gregorgorjanc/issue120 branch March 12, 2026 17:29
@bryo-han
Copy link
Contributor

Yes, absolutely! Sorry, I forgot about the rebase. Many instructions to keep in mind, but we'll get there. I guess having complex instructions/requirements will deter people from sending unwanted PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prototype one tc.table.add_row C++ function

3 participants