Added rtsk_individual_table_add_row() and tc.individual_table_add_row() #122
|
@hannesbecher @LynxJinyangii can you review this PR? Ideally you would |
Codecov Report: ✅ All modified and coverable lines are covered by tests. |
|
Once I added an add_row function each for sites and mutations, I ran the pre-commit hook. The clang-tidy step fails as there are no checks for it to do: I am not sure if this is due to any changes that I've made. Codex has suggested changes but I don't want to implement them unless it's really needed. What do you think, @gregorgorjanc ? |
|
@bryo-han interesting edge case ... Let's see. On my laptop, I have Is yours 22.1 as the LLM suggests, can you check and report back here? Also, I see the below when running |
|
@bryo-han one more bit for you to check - are you using |
|
@gregorgorjanc this seems to be due to the different versions of clang-tidy we have installed: If I run I suggest I add a file |
|
@bryo-han give it a try in your incoming PR and I can test on my end and will let you know. |
|
@bryo-han @LynxJinyangii do you have any other comments on the PR and the code in it? Is this doing what we expect in comparison to the Python side? |
#' new_id <- tc$individual_table_add_row(flags = 0L)
#' new_id <- tc$individual_table_add_row(parents = c(0L, 2L))
#' new_id <- tc$individual_table_add_row(metadata = "abc")
#' new_id <- tc$individual_table_add_row(metadata = charToRaw("cba"))
I think metadata should be a nested list rather than a string. For example:
In Python:
tb.individuals.add_row(metadata={'file_id':33})
In R with reticulate:
tb$individuals$add_row(metadata=list(file_id=as.integer(33)))
@LynxJinyangii thanks for raising this. I am not really clued in about the metadata, so this part is very hazy for me. Looking at the C function https://tskit.dev/tskit/docs/stable/c-api.html#c.tsk_individual_table_add_row, metadata is a character vector. Looking at the Python function https://tskit.dev/tskit/docs/stable/python-api.html#tskit.IndividualTable.add_row, metadata is "Any object that is valid metadata for the table’s schema. Defaults to the default metadata value for the table’s schema. This is typically {}. For no schema, None", and I am a bit clueless about what counts as an object that is valid metadata! Do you know, and could you suggest what to use for this? Is it indeed a list()? @bryo-han, thoughts from your end?
Okay, I guess we'd better not handle this at the C/C++ layer. The C API docs say: "The main area of difference is, unlike the Python API, the C API doesn’t do any decoding, encoding or schema validation of [Metadata] fields, instead only handling the byte string representation of the metadata. Metadata is therefore never used directly by any tskit C API method, just stored" (https://tskit.dev/tskit/docs/stable/c-api.html). Perhaps we can use https://jeroen.r-universe.dev/jsonlite/doc/manual.html in R. @gregorgorjanc could you please give me an example of how to access a row in the individual table, so I can try writing metadata as a list/dictionary (things like https://tskit.dev/pyslim/docs/latest/metadata.html) in R, storing it as binary in C, and then decoding it again?
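To make the round trip concrete, here is a minimal sketch in plain Python (standard library only, no tskit) of what "encode in the high-level language, store only bytes in C" could look like, assuming a JSON encoding. The helper names `encode_metadata` and `decode_metadata` are hypothetical and are not part of tskit or RcppTskit; an R version would presumably do the same with jsonlite plus charToRaw()/rawToChar().

```python
import json


def encode_metadata(metadata: dict) -> bytes:
    """Hypothetical helper: serialise a dict to UTF-8 JSON bytes.

    These bytes are what a byte-only layer (such as the C API) would store.
    """
    text = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def decode_metadata(raw: bytes) -> dict:
    """Hypothetical helper: turn stored bytes back into a dict.

    Empty bytes are treated as "no metadata" (an assumption of this sketch).
    """
    if not raw:
        return {}
    return json.loads(raw.decode("utf-8"))


# Round trip: dict -> bytes (stored by the low-level layer) -> dict
stored = encode_metadata({"file_id": 33})
restored = decode_metadata(stored)
```

The point of the sketch is only that the low-level add_row function never needs to understand the metadata; whether JSON is the right codec, and how schemas are validated, is still open here.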
I don't know much about JSON (though it looks simpleish), which is why I struggle regarding the metadata side of things.
As for examples of rtsk_individual_table_add_row() and tc.individual_table_add_row(), see https://github.com/HighlanderLab/RcppTskit/pull/122/changes#diff-b8bc9e42f1189821e14b71369310c9d873f56ac1337fa3a2f766817ccb09341aR156 and https://github.com/HighlanderLab/RcppTskit/pull/122/changes#diff-912ff421309575a5784c260f0fdfa9bfa88fb4f5c48acc539b1ca1542f16bc3cR1256 (these examples are part of this PR ;)
@LynxJinyangii @bryo-han we could ignore (play ignorant about) metadata for now, just assume it will be a character, and sort it out later as part of #36 and #24. Once we figure out the best way of handling the metadata, we can easily propose a solution for that aspect, instead of getting bogged down in metadata handling while we are trying to add the add_row methods. Yes, let's focus on the add_row methods first for all the tables, and worry about the metadata later!
|
@LynxJinyangii @bryo-han have you managed to run the examples in the code - see #122 (comment) - and are they doing what we expect they should be doing? I would like to get this PR merged and then we work on your PRs for the other |
|
@LynxJinyangii and I think this is fine. Have not worked out how to correctly add complex metadata. But we knew this was going to be tricky. Happy to merge! |
|
Yes, absolutely! Sorry, I forgot about the rebase. Many instructions to keep in mind, but we'll get there. I guess having complex instructions/requirements will deter people from sending unwanted PRs. |

Fixes #120