Conversation
|
Review requested:
|
|
@araujogui Maybe |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #62241 +/- ##
==========================================
- Coverage 89.67% 89.67% -0.01%
==========================================
Files 676 676
Lines 206469 206582 +113
Branches 39537 39562 +25
==========================================
+ Hits 185157 185244 +87
- Misses 13448 13465 +17
- Partials 7864 7873 +9
🚀 New features to boost your workflow:
|
|
Should we make this a diagnostics channel instead? |
Sounds a good idea to me CC @nodejs/sqlite @nodejs/diagnostics |
|
Yeah, this seems very much like a thing diagnostics_channel would be more suitable for. There's also a native-side publishing API now, so you can use that. |
|
What is the advantage of using a diagnostic channel here? What if you have an application with 1000s of Do diagnostic channels offer that much flexibility? Note that there are other things that |
|
The sqlite trace interfaces just gives you some data at the C layer. You still need to figure out how to deliver that efficiently to JS. The diagnostics_channel API is designed to be as close as possible to zero-cost when nothing is listening for a given event, and to be as cheap as possible when something is. It also allows decoupling through named channels so you don't have to explicitly pass around your objects to the things which want to observe them, which makes it much easier for Observability tools to see them. Why reinvent all this machinery to get your data from C to JavaScript when we already have a thing specifically for this purpose? |
Qard
left a comment
There was a problem hiding this comment.
Generally LGTM, but if at all possible I'd like to see more data than just the final SQL string. It'd be helpful to be able to better correlate where exactly it came from and what the original inputs were.
| if (retval.IsEmpty()) { | ||
| db->SetIgnoreNextSQLiteError(true); | ||
| } | ||
| ch->Publish(env, sql_string); |
There was a problem hiding this comment.
Is it possible to get the query template and values as they were prior to generating the final SQL string? If at all possible, I think it would be valuable for Observability purposes to publish an event which includes: database instance, query template, query parameters, and final query string.
|
Because, as I understand it, it is not scoped per database connection. It's all or nothing. If you have 1000s of concurrent database connections, there's no way to trace only a single of them with this implementation. You will have to deal with a deluge of traces from all of them once you enable the diagnostics channel, with no way to differentiate between messages. |
|
You could just put the instance in the published message so you can check if it matches whichever instance you care about. That's what is typically done with diagnostics_channel. It's meant to be a bag of relevant data that the subscriber can look at to extract as much or as little detail as it needs. Some cases might need to correlate with specific instances, some might not, so they can pick and choose how much of the data they do anything with. And if you really wanted events to only be published for a specific instance, though there's no specific recommendation or precedent for it, you could simply not publish the events from a given instance until some particular flag on that instance has been enabled. That'd move the control of making the data available from the consumer to the producer though, which may not be desirable. |
There was a problem hiding this comment.
There is a real performance impact of registering a trace callback with sqlite3_trace_v2. It should not be done for all databases indiscriminately.
sqlite/sqlite-prepare-select-get-options.js, 30 runs (no listeners for diagnostics channel):
options='returnArrays' statement='SELECT ██| -1.2%
options='readBigInts ███| -1.4%
statement='SELECT * FROM foo LIMIT 1' ta ███| -1.7%
statement='SELECT text_column, integer_c ████| -2.0%
statement='SELECT text_8kb_column FROM f █████| -2.3%
statement='SELECT text_column, integer_c ██████| -2.8%
options='none' statement='SELECT * FROM ██████| -2.8%
statement='SELECT text_column, integer_c ██████| -3.0%
options='readBigInts' statement='SELECT ███████| -3.3%
statement='SELECT text_column FROM foo L ████████| -3.8%
statement='SELECT 1' tableSeedSize=10000 ███████████| -5.6%
Using a diagnostics channel is probably OK as long as there is some way to find the corresponding database for all messages. Right now this is not possible which is problematic.
My philosophy is that the SQLite API of Node.js should mirror that of SQLite itself where it makes sense. Being able to register callbacks directly on database handles matches SQLite more closely. It is also more ergonomic in my opinion.
|
Feel free to add something to diagnostics_channel to support that. It's already tracking the zero-to-one and one-to-zero subscribers transition for the Channel/ActiveChannel prototype swap, it'd be trivial to just attach something to that. |
|
@araujogui With SQLite it is very common to have one open database (active) per user. Or even one database per document. Think about 1000s databases. Opening or removing a diagnostic channel when you are interested in tracing one particular database would be a very slow operation. And you still have the problem that you receive traces from 1000s of databases you are not interested in that you need to filter through. We also don't keep track of all open databases globally (and I don't think we should). That would get even more complicated when allowing databases on different threads. Maybe giving each database its own diagnostics channel is an option? Again, my preference would be to stay close to the SQLite API and go with the a lightweight wrapper on top of |
Why would that be slow? That's specifically what diagnostics_channel is designed for. And like I said before, if you really wanted to you could just have a flag per-database to turn on diagnostics_channel publishing for that specific database instance, so you can selectively enable/disable. You could do something like a two channel set up where you publish to one channel every time a database is constructed, passing in the instance itself. The subscriber could decide if it cares about that instance and set a flag to turn on tracing for that instance and start publishing to the other activity channel. |
Because calling
Yes, this would be a possibility. Although you would need to implement some mechanism so that you can find the corresponding database for a message. There's overhead and complexity involved with that. Can't help but think we're shoehorning something that should be configured locally and per-instance into a global paradigm... |
Why? Does
Why should it be local only? Most consumers of debug/diagnostics data prefer global channels to observe everything. It's much more complicated to try and patch your way to some specific instance than to just filter an event log. In my opinion, there's not a ton of value to exposing it only as a unique API which needs to be called and configured per-instance. Most dev tools will simply not support it if they can't access it via a global sidecar pattern. |
I don't quite understand this question. But the current implementation does not pass the
I think that would give more control to the user. If you want all databases to be traced you could just write a wrapper script for setting up tracing the way you want it. These would be my desiderata:
I don't have a lot of experience with tracing, so I'll take a step back from the discussion to give other people a chance to pitch in (maybe @Renegade334 who initially suggested using a diagnostics channel or other people on the @nodejs/sqlite team). |
Yes, I'm saying it should.
The vast majority of Observability product users do not want this to be a manual process to setup. Needing a wrapper script would mean almost no one would use it.
All that is possible with the two channel form I mentioned. Publish an event when a database is constructed, subscriber sets a flag on databases it is interested in, that turns on tracing for that database and it starts publishing to another channel. But diagnostics_channel is designed to be extremely low-cost, so I think that is likely an unnecessary performance optimization. We only really need a hook to turn tracing on/off broadly depending on if a subscriber exists. |
Yes, but I'm thinking of a scenario where you have one application database and many small user databases (that you do not want to use tracing on).
OK so if I understand it correctly, with such a product you can just plug-and-play. Give it a Node.js process and it will show number of queries per second, average time per query per database, how many rows read over time etc. It's hard for me to imagine why adding a few lines of code is such a huge blocker, but again I don't have a lot of experience with tracing. |
Add
DatabaseSynctrace sql hook for logging and debug