
9d planner #3807

Draft
grandixximo wants to merge 101 commits into LinuxCNC:master from grandixximo:9d


@grandixximo

No description provided.

@petterreinholdtsen
Collaborator

Is there a way to add scripts in tests/ to demonstrate that this new code is working?

@BsAtHome
Contributor

src/emc/motion/kinematics_params.h has a fixed kinematics enum. The problem is that kinematics modules are loadable from anything the user wants to make. Having them enumerated creates a problem for those kinematics not known in the default distro that the user adds himself. The loss of generic user-defined kinematics would be a great loss for us all. Or can you still load new modules with any and all TP?

@grandixximo
Author

grandixximo commented Feb 16, 2026

src/emc/motion/kinematics_params.h has a fixed kinematics enum. The problem is that kinematics modules are loadable from anything the user wants to make. Having them enumerated creates a problem for those kinematics not known in the default distro that the user adds himself. The loss of generic user-defined kinematics would be a great loss for us all. Or can you still load new modules with any and all TP?

Planner 0/1 are unaffected, the TP works in Cartesian space and the RT module's kinematicsInverse/kinematicsForward symbols are resolved at load time as usual. Any user-written kinematics module works fine.

Planner 2 has a problem: it needs a userspace reimplementation of each kinematics module's math (for Jacobian computation, joint-space limit enforcement, path sampling). Currently kinematics_params.h enumerates known modules, and kinematicsUserInit() hard-fails for anything not in the list, aborting trajectory init entirely.

Three options to preserve compatibility with custom kinematics in Planner 2:

1. Fallback to identity kins in userspace: treat unknown modules as trivkins for the userspace layer. RT still uses the real module. Joint limit enforcement would be approximate but conservative.

2. Downgrade to planner 0/1: if userspace kins init fails for an unknown module, automatically fall back to planner 1 (or 0) with a warning. Simplest fix, preserves "any kins works with any TP".

3. Generic RT-userspace bridge: add a KINS_TYPE_GENERIC that calls the RT module's forward/inverse via shared memory. Correct but slower, and requires a new communication channel.

Which approach would you prefer?

@grandixximo
Author

Is there a way to add scripts in tests/ to demonstrate that this new code is working?

The new code is not really done yet; I'll work on tests once things are stable enough. G64 is still not implemented, rigid tapping is missing, adaptive feed is not tested, and there are a few more things...

@grandixximo
Author

At the moment I'm still hardening the feed override system. It is quite complex, and I have not yet squashed all the possible ways things could go wrong, but I'm getting closer each day...

@BsAtHome
Contributor

Preserving the generic pluggable nature of kinematics across trajectory planners is a very nice feature and should be preserved if possible.
What is the structural (core) difference between the realtime kinematics and userspace kinematics? Isn't it possible to make the same base code pluggable for both realtime and userspace use with the appropriate wrappers? Say, one source compiles both into a realtime module for TP=[0,1] and a userspace module for TP=2.

@grandixximo
Author

grandixximo commented Feb 16, 2026

Preserving the generic pluggable nature of kinematics across trajectory planners is a very nice feature and should be preserved if possible.
What is the structural (core) difference between the realtime kinematics and userspace kinematics? Isn't it possible to make the same base code pluggable for both realtime and userspace use with the appropriate wrappers? Say, one source compiles both into a realtime module for TP=[0,1] and a userspace module for TP=2.

Good point. The math is actually already shared, each kinematics module has a *_math.h header (e.g. 5axiskins_math.h, trtfuncs_math.h) with pure static inline forward/inverse functions and no RT dependencies. Both the RT module and the userspace lib call into these same headers.

What differs is the glue code around the math. The RT side creates HAL pins with hal_pin_float_newf() and reads them by direct pointer dereference, while userspace walks HAL shmem by pin name string through hal_pin_reader. Init is hal_malloc() + switchkinsSetup() + EXPORT_SYMBOL() on the RT side vs calloc() + function pointer dispatch in userspace. Logging is rtapi_print() vs fprintf. Trying to unify these into a single .c would mean heavy #ifdef RTAPI scaffolding around everything except the math, which is already shared.
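
The split between shared math and per-side glue can be sketched as follows. This is only an illustration under assumptions: the one-parameter "pivot length" kinematics and the names `demo_params_t`, `demo_forward()`, `demo_inverse()`, `rt_glue_forward()` and `user_glue_forward()` are hypothetical and do not come from the PR. The point is that the math takes a plain params struct, and the two sides differ only in how they fill it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter block; real modules carry e.g. pivot lengths. */
typedef struct {
    double pivot_length;
} demo_params_t;

/* Pure math: no HAL, no RTAPI, usable from both RT and userspace. */
static inline void demo_forward(const demo_params_t *p,
                                const double joints[3], double pos[3])
{
    pos[0] = joints[0];
    pos[1] = joints[1];
    pos[2] = joints[2] - p->pivot_length;
}

static inline void demo_inverse(const demo_params_t *p,
                                const double pos[3], double joints[3])
{
    joints[0] = pos[0];
    joints[1] = pos[1];
    joints[2] = pos[2] + p->pivot_length;
}

/* RT-style glue: the parameter is read by dereferencing a pointer,
 * standing in for the pointer obtained when a HAL pin is created. */
static void rt_glue_forward(const double *pivot_pin,
                            const double joints[3], double pos[3])
{
    assert(pivot_pin != NULL);
    demo_params_t p = { .pivot_length = *pivot_pin };
    demo_forward(&p, joints, pos);
}

/* Userspace-style glue: the parameter arrives as a value that some
 * refresh step has already copied out of shared memory. */
static void user_glue_forward(double pivot_snapshot,
                              const double joints[3], double pos[3])
{
    demo_params_t p = { .pivot_length = pivot_snapshot };
    demo_forward(&p, joints, pos);
}
```

Both glue functions end up calling the same `demo_forward()`, so a fix to the math lands in both contexts at once.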

The real obstacle for custom kinematics in planner 2 isn't math duplication, it's that kinematics_user.c needs to know the module exists at compile time (enum entry, function pointers, HAL pin names for refresh()). A user-written RT module has no matching userspace entry.

A possible path: let custom modules optionally ship a mykins_userspace.so implementing a standard kins_userspace_init() API, which the planner dlopen()s at runtime. That would make planner 2 pluggable the same way RT kins already are, without enumerating every module. If that's too heavy, falling back to planner 1 for unknown modules is the simplest safe option.

If you are able to conjure up a method to make this work, I'd be glad to implement it.

@BsAtHome
Contributor

The real obstacle for custom kinematics in planner 2 isn't math duplication, it's that kinematics_user.c needs to know the module exists at compile time (enum entry, function pointers, HAL pin names for refresh()). A user-written RT module has no matching userspace entry.

That is the real problem. You have static enumeration instead of a dynamic pluggable system. Why do you need enumeration? Your interface into the kinematics should be generic.

The real question is, when the math is shared, what glue is required for userspace to be usable with the new planner and what is the glue code for realtime.

The glue-code should be the same for each and every kinematics for the interface. Therefore, you only need to devise a way to compile the kinematics modules so they give you two resulting loadable modules, one for realtime and one for userspace.

A possible path: let custom modules optionally ship a mykins_userspace.so implementing a standard kins_userspace_init() API, which the planner dlopen()s at runtime. That would make planner 2 pluggable the same way RT kins already are, without enumerating every module. If that's too heavy, falling back to planner 1 for unknown modules is the simplest safe option.

That is how I think it is supposed to be, yes. Just like the realtime kinematics. You probably only need to be able to load and not to unload modules (the kinematics is set in a configuration file and cannot be changed at run-time).

If you are able to conjure up a method to make this work, I'd be glad to implement it.

Have a look at rtapi/uspace_rtapi_app.cc, which uses dlopen() to load RT modules. The same procedure can be used for userspace modules. It might even be possible to reuse some of the current infrastructure. The system works by the loaded module registering/announcing itself. Example: RT modules have rtapi_app_main()/rtapi_app_exit() functions that are called on load/unload and that register/unregister with HAL.

A similar strategy will also work for userspace kinematics.

@rmu75
Collaborator

rmu75 commented Feb 16, 2026

Would it be possible to pass an ID value to the kinematics modules and use that instead of the enum? Then users could configure / customize it.

@grandixximo
Author

Deep into hardening feed override system, I'll have a better look at the kins probably tomorrow, thank you for the input, I'll try my best to make it work, I'm sure there is a possible approach.

@grandixximo
Author

grandixximo commented Feb 16, 2026

The last PR addresses all the jerk spikes I was able to find. My testing method was to run G-code with small segments at a high feed rate while a script swings the feed override wildly. As far as I've tested, and I have tested it a lot, the feed override hand-off branching system is now basically bulletproof.
A lot of thought and multiple refinements went into it; for the last two or three weeks it's basically all I've been working on, and I'm pretty proud of the result. It is roughly 4-5k lines of code, but it proactively covers every way I could think of to make the system fail. The only drawback is a delay of 50-100 ms before a feed rate change takes effect. That is inherent to the architecture I've envisioned for planner type 2, and it is something an operator would barely feel. The only weakness is probably adaptive feed override, which still has to be tested. I think I'll take another look at the kins before moving on to blending.

@grandixximo
Author

grandixximo commented Feb 17, 2026

Implemented the dlopen plugin approach. Here's what changed:

The kinematics_type_id_t enum and map_kinsname_to_type_id() are gone entirely. No IDs, no dispatch switch. The module name string is the identity: it maps directly to <name>_userspace.so.

Each kinematics module now ships a small plugin (50-150 lines) that exports one symbol: kins_userspace_setup(). The loader does dlopen(EMC2_HOME "/lib/kinematics/" name "_userspace.so"), calls setup, and the plugin sets its forward/inverse/refresh function pointers. Built-in and custom modules are loaded identically.

The 17 built-in kinematics were extracted into self-contained .c files under plugins/. They reuse the existing *_math.h headers (pure math, no HAL deps) — same shared code that RT uses. The glue is minimal: read params from ctx->params, call the math function, done.

If planner 2 is requested but the plugin .so doesn't exist (custom kins without a userspace plugin), it warns and falls back to planner 0 instead of aborting. Custom kins still work fine on planners 0/1 as before — they just won't get planner 2 until they add a _userspace.so.
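
A rough sketch of the load-and-fall-back flow described above, under stated assumptions: only the symbol name `kins_userspace_setup` and the `<home>/lib/kinematics/<name>_userspace.so` layout come from the comment; the `kins_ctx_t` fields, the setup function's signature and return convention, and the helpers `kins_plugin_path()`, `kins_plugin_load()` and `choose_planner()` are hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical context handed to the plugin; field names are assumptions. */
typedef struct kins_ctx {
    int (*forward)(const double *joints, double *pos);
    int (*inverse)(const double *pos, double *joints);
    void *dl_handle;
} kins_ctx_t;

/* Assumed signature of the single exported setup symbol. */
typedef int (*kins_setup_fn)(kins_ctx_t *ctx);

/* Build "<home>/lib/kinematics/<name>_userspace.so"; 0 on success. */
static int kins_plugin_path(char *buf, size_t len,
                            const char *home, const char *name)
{
    assert(buf && home && name);
    int n = snprintf(buf, len, "%s/lib/kinematics/%s_userspace.so",
                     home, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Load the plugin and let it fill in its function pointers. */
static int kins_plugin_load(const char *path, kins_ctx_t *ctx)
{
    void *h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!h) {
        fprintf(stderr, "kins plugin: %s\n", dlerror());
        return -1;
    }
    /* Conversion idiom from the POSIX dlsym() rationale: store the
     * object pointer through a void** view of the function pointer. */
    kins_setup_fn setup;
    *(void **)(&setup) = dlsym(h, "kins_userspace_setup");
    if (!setup || setup(ctx) != 0) {
        dlclose(h);
        return -1;
    }
    ctx->dl_handle = h;
    return 0;
}

/* Planner 2 needs the plugin; without it, warn and drop to planner 0. */
static int choose_planner(int requested, int plugin_ok)
{
    if (requested == 2 && !plugin_ok) {
        fprintf(stderr, "no userspace kinematics plugin, using planner 0\n");
        return 0;
    }
    return requested;
}
```

Because the file name is derived from the module name alone, a custom module would need no registration step beyond placing its `.so` in the expected directory.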

kinematics_user.c went from ~1500 lines to ~280. The shared memory struct changed (the type_id field was removed).

@BsAtHome
Contributor

Good to hear you implemented dlopen().

But I'm still not sure why you moved the actual forward/reverse kinematics calculation into a *_math.h header file. That seems to defeat the one source file and two glues. Header files are usually a very bad place for code. Header files are there as an interface layer. Sure, using "static inline" qualifiers makes them local, but that is, IMO, a very bad habit.

What I had expected was:

  • sources:
    • some_kinematics.c (with actual calculation) --> compile to some_kinematics.o
    • hal_kins_glue.c --> compile to hal_kins_glue.o
    • nonrt_kins_glue.c --> compile to nonrt_kins_glue.o
  • link some_kinematics.o + hal_kins_glue.o --> some_kins_hal.so (hal kinematics component)
  • link some_kinematics.o + nonrt_kins_glue.o --> some_kins_nonrt.so (your non-RT userspace kinematics module)

Or is the *_math.h header a remnant from the previous code iteration?

@rmu75
Collaborator

rmu75 commented Feb 17, 2026

In C, code in header files is indeed not a common practice.

If we don't want to abandon RTAI just yet, I think it is usually not possible to link one object file to a kernel module and to a normal program. IMO #include-ing the code is not a bad idea in this case, also keeps the build system out of the loop.

The sources to be included could be renamed to a different ending like .inc.

@BsAtHome
Contributor

In C, code in header files is indeed not a common practice.
If we don't want to abandon RTAI just yet, I think it is usually not possible to link one object file to a kernel module and to a normal program. IMO #include-ing the code is not a bad idea in this case, also keeps the build system out of the loop.

Creating a .ko can be done from multiple .o objects, just like creating an .so can be created from multiple .o objects. There is no difference afaics. Only the userspace/kernelspace interface/glue layer is different, which is done in rtapi. I only propose to add some glue to differentiate between linking RT and non-RT kinematics modules.

I don't think the non-RT kinematics can run in kernel space. @grandixximo must pitch in here to make that assessment whether the non-RT kinematics could ever be a kernel module.

The sources to be included could be renamed to a different ending like .inc.

I don't think we should be using this type of code inclusion at all.

And for RTAI, it seems that development has stopped completely. I'm not sure it is worth the effort to keep it in very much longer. There is already a lot that does not work with RTAI anyway.

@grandixximo
Author

I don't think they can be a kernel module, because they run on a userspace thread, but I'm no expert and have not explored this deeply yet.
The separation exists because the kinematics modules were deeply linked with their variable parameters, such as pivot lengths, which they get from HAL pins.
I did not find another clean way to reuse the same code without scattering #ifndef blocks everywhere, which in my opinion complicates things. So I opted for extracting the math and having both userspace and RT include it; userspace gets the pin values according to the pins created by the RT kinematics component. It's what I thought would work best maintenance-wise while not duplicating too much code. About RTAI: I don't have hardware I can reliably test RTAI on, and I could never get it to run reliably enough. I'm also using EtherCAT heavily, and RTAI does not support EtherCAT as far as I know, so I would need someone to test RTAI for me on real hardware.
I'll look into it a bit more. If you guys prefer #ifndef and a single file, I can do that.

@rmu75
Collaborator

rmu75 commented Feb 17, 2026

I have no problem if kernel-mode stuff is abandoned, but then it should be stated, and all that C / C++ schism could be resolved / isn't needed in new code, so the whole split would be pointless.

If we want to keep kernel support for now, linking one object into userspace and kernel objects is of course possible in theory, but it is asking for trouble.

Math is handled differently, for one: you can't just include <math.h> in kernel code, and rtapi_math.h has conditional compilation depending on __KERNEL__ or not. It may work to link stuff compiled against the "wrong" prototypes, but that is a hack at best. There may be other problems like LTO, autovectorization, calling conventions, frame pointer and in worst case it would break on "wrong" kernel configs. I don't think it's worth it just to get rid of a #include that would be unclean under normal circumstances.

@grandixximo
Author

I had a better look at this. I considered the single .o approach, but the *_math.h pattern has advantages, for example in BUILD_SYS=normal (kernel), RT objects are compiled with -nostdinc and kernel includes, so the same .o can't serve both contexts. The math headers work for both build systems with zero #ifdef. They could be renamed to .inc if the .h extension bothers, but static inline functions in headers is the same pattern the Linux kernel uses extensively (list.h, rbtree.h, etc.), so unless the kernel also has bad habits, I think it's fine.

@grandixximo
Author

I might be wrong about this, but I explored this for a while before settling on the shared header approach. The alternative would be splitting each *_math.h into a .h (prototypes) and .c (implementation), then compiling the .c twice with different flags and updating the link rules for every module. It's doable but adds significant Makefile complexity for the same result. If you'd prefer that approach I can implement it.

@BsAtHome
Contributor

If I understand it correctly, the loadable kinematics module is not going into the same process space for RT (rtai_app) and non-RT (milltask?). When your new TP cannot load into the kernel (RTAI) then we do not need to consider that option too seriously, just enough to bypass in compilation.

In the case of uspace, why can't the same kinematics .so be loaded into two different processes and perform their specific function in the process' context?

The motion controller links directly into the kinematics{Forward,Reverse} functions, which means that the kinematics .so must be loaded before the controller's .so to satisfy the dynamic linking process. If you also export appropriate functions for your non-RT process hook, then you could, in principle, load the same .so in both processes and have it perform the kinematics there too.

Or am I missing something here?

@grandixximo
Author

If I understand it correctly, the loadable kinematics module is not going into the same process space for RT (rtai_app) and non-RT (milltask?). When your new TP cannot load into the kernel (RTAI) then we do not need to consider that option too seriously, just enough to bypass in compilation.

In the case of uspace, why can't the same kinematics .so be loaded into two different processes and perform their specific function in the process' context?

The motion controller links directly into the kinematics{Forward,Reverse} functions, which means that the kinematics .so must be loaded before the controller's .so to satisfy the dynamic linking process. If you also export appropriate functions for your non-RT process hook, then you could, in principle, load the same .so in both processes and have it perform the kinematics there too.

Or am I missing something here?

The RT .so (e.g., maxkins.so) does hal_init() + hal_pin_new() in rtapi_app_main(), and kinematicsForward() reads params directly from HAL pin pointers. Loading the same .so in a second process would either conflict on hal_init() or need runtime detection to skip it and read parameters differently. The separate userspace plugin avoids that, it reads HAL pin values through a read-only interface without registering as a HAL component.

@BsAtHome
Contributor

BsAtHome commented Feb 17, 2026

Afaik, only when you run halcmd loadrt <module> (or call the equivalent internally) it will call rtapi_app_main() (see rtapi/uspace_rtapi_app.cc:301 where it calls the start function, which was acquired in line 285).

But, you don't need to call rtapi_app_main() at all when you yourself do the dlopen(). A call to dlopen() will do nothing more than resolve the dynamic link dependencies. Adding RTLD_LOCAL will prevent exporting any symbols from the loaded .so and the only way to get to them is to use dlsym(). You don't even need to worry or care about the kinematicsForward and kinematicsReverse symbols (functions).

You can simply split the mathematics inside the kinematics source and implement and export, lets say, as an example, nonrt_kinematicsForward and nonrt_kinematicsReverse from the kinematics file and then find these symbols using dlsym(). Your functions don't need to hook into HAL if you don't want them to.

You can also prevent your functions from being exported in a kernel build simply by placing the definition and EXPORT_SYMBOL() invocations in a #ifndef __KERNEL__ conditional. More should not be required.
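
The module side of this proposal might look roughly like the sketch below. It is an assumption-laden illustration: the pose type and the two-argument signatures are simplified stand-ins for the real kinematics interface, the identity math is a placeholder, and `EXPORT_SYMBOL` is stubbed so the file compiles outside the LinuxCNC tree. Only the `nonrt_kinematicsForward`/`nonrt_kinematicsReverse` names and the `#ifndef __KERNEL__` guard are taken from the comment above.

```c
#ifndef __KERNEL__
#include <assert.h>   /* userspace builds only */
#endif
#include <stddef.h>

/* Stand-in so the sketch compiles without rtapi headers. */
#ifndef EXPORT_SYMBOL
#define EXPORT_SYMBOL(sym) _Static_assert(1, #sym)
#endif

/* Simplified pose; the real interface uses a richer pose type and
 * additional flag arguments. */
typedef struct { double x, y, z; } pose3_t;

/* Shared math, file-local: placeholder identity mapping. */
static void identity_fwd(const double *joints, pose3_t *pos)
{
    pos->x = joints[0];
    pos->y = joints[1];
    pos->z = joints[2];
}

static void identity_inv(const pose3_t *pos, double *joints)
{
    joints[0] = pos->x;
    joints[1] = pos->y;
    joints[2] = pos->z;
}

/* RT entry points (simplified signatures), resolved by the motion
 * controller at load time. */
int kinematicsForward(const double *joints, pose3_t *pos)
{
    identity_fwd(joints, pos);
    return 0;
}

int kinematicsInverse(const pose3_t *pos, double *joints)
{
    identity_inv(pos, joints);
    return 0;
}
EXPORT_SYMBOL(kinematicsForward);
EXPORT_SYMBOL(kinematicsInverse);

#ifndef __KERNEL__
/* Extra entry points for the non-RT planner, looked up with dlsym().
 * They reuse the same math and never touch HAL. */
int nonrt_kinematicsForward(const double *joints, pose3_t *pos)
{
    assert(joints && pos);
    identity_fwd(joints, pos);
    return 0;
}

int nonrt_kinematicsReverse(const pose3_t *pos, double *joints)
{
    assert(pos && joints);
    identity_inv(pos, joints);
    return 0;
}
EXPORT_SYMBOL(nonrt_kinematicsForward);
EXPORT_SYMBOL(nonrt_kinematicsReverse);
#endif
```

A real module would still need a way to hand its parameters (pivot lengths and the like) to the non-RT functions without dereferencing HAL pins, which this sketch leaves out.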

@grandixximo
Author

grandixximo commented Feb 17, 2026

You're right that dlopen() alone won't call rtapi_app_main(), confirmed. Both approaches work, so here's a comparison from a maintenance perspective:

Current approach (math headers + separate plugins):

  • Math extracted into *_math.h; RT modules and userspace plugins both include it
  • RT modules are thin wrappers: read HAL pins -> fill params struct -> call math
  • Plugins are thin wrappers: read params from shared memory -> call math
  • (+) Each file has a single responsibility, easy to review in isolation
  • (+) Custom modules just ship an extra _userspace.so, no special exports required
  • (+) No #ifdef conditionals
  • (-) More files (math header + plugin per module)
  • (-) Param struct defined twice (math header + kinematics_params.h)
  • (-) static inline in headers is unconventional for C

Your proposal (nonrt exports from the RT .so):

  • Math stays in the .c file, nonrt_* functions exported alongside RT functions
  • Userspace planner does dlopen() on the RT .so + dlsym() for nonrt_*
  • (+) Fewer files, everything for one module in one place
  • (+) No param struct duplication
  • (+) Conventional C (code in .c files, not headers)
  • (-) Every RT module needs #ifndef __KERNEL__ blocks for the nonrt_* exports
  • (-) Custom modules must implement both interfaces in one file
  • (-) RT and userspace concerns mixed in the same source file

I'm fine with either approach. The math separation was the hard part and that's done regardless of which way the glue is structured. What's your preference?

@BsAtHome
Contributor

(-) Every RT module needs #ifndef __KERNEL__ blocks for the nonrt_* exports

That is completely optional. You are allowed to export the non-RT functions and they will simply go unused and fill a marginal amount of space. No problem with that. As long as there are no dyn-link refs, but that is a naming question. You only need to make sure that it links, which could mean the requirement of a few stubs. Although, the code can be designed that no or only few stubs are required.
And then, considering that the use of #ifndef __KERNEL__ is already widespread, I cannot see any problem with that. It will vanish when kernel-mode is dropped anyway.

(-) Custom modules must implement both interfaces in one file

That I see as an advantage because the actual kinematics is in one file. You can reuse code more effectively. You do have to choose carefully what the interface does. You do not want to replicate code from higher layers in the modules.

(-) RT and userspace concerns mixed in the same source file

That is a general issue in all of the components already because of the kernel/userspace boundary. The RT/non-RT boundary is easier to handle. Just make your code run as RT, then it should also run as non-RT.

I can't imagine that your use of the kinematics calculations changes its actual behaviour in any meaningful way. Or does it? If not, then it should be a moot issue.

The biggest advantage here is that the changeset should be easier to understand and people with their own kinematics component can add/change their code to work with the new way a bit easier. I guess a "how to migrate kinematics components" document would be required in any circumstance.

@rmu75
Collaborator

rmu75 commented Feb 17, 2026

Kernel modules are pretty restricted in what they can do, whereas in the userspace realtime thread nearly everything is allowed, including C++, exceptions, etc...

Things with non-constant upper limit of runtime should be avoided though, and code should only access data and code that is locked and can't be evicted or paged out. Shared memory segment and stack is OK, dynamic memory probably not. The rtapi glue code memlocks all code, but probably not stuff that you dlopen somewhere in a module, so that should be checked.

@grandixximo
Author

Agreed, I will go ahead with refactoring, thank you for the guidance.

Kernel modules are pretty restricted in what they can do, whereas in the userspace realtime thread nearly everything is allowed, including C++, exceptions, etc...

Things with non-constant upper limit of runtime should be avoided though, and code should only access data and code that is locked and can't be evicted or paged out. Shared memory segment and stack is OK, dynamic memory probably not. The rtapi glue code memlocks all code, but probably not stuff that you dlopen somewhere in a module, so that should be checked.

The dlopen() of the RT .so happens in milltask (non-RT), not in the servo thread.

@grandixximo
Author

Refactored the kinematics; it is a much cleaner approach. Thank you @BsAtHome for the guidance.

@grandixximo force-pushed the 9d branch 2 times, most recently from 19663a5 to 3d04de5 on February 21, 2026 07:39
COMPLETED
=========

Architecture:
  * Dual-layer: userspace planning, RT execution
  * Lock-free SPSC queue with atomic operations
  * 9D vector abstractions (lines work, arcs TODO)
  * Backward velocity pass optimizer
  * Peak smoothing algorithm
  * Atomic state sharing between layers
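
For readers unfamiliar with the "lock-free SPSC queue with atomic operations" item, here is a generic single-producer/single-consumer ring buffer using C11 atomics. It is a textbook sketch, not the code in tcq.c or atomic_9d.h; `segment_t`, `spsc_queue_t` and the function names are hypothetical. One thread (the planner) only calls `spsc_push()`, the other (the RT side) only calls `spsc_pop()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAPACITY 8u
static_assert((SPSC_CAPACITY & (SPSC_CAPACITY - 1u)) == 0,
              "capacity must be a power of two");

/* Hypothetical payload: a 9D target plus the optimized end velocity. */
typedef struct {
    double target[9];
    double finalvel;
} segment_t;

typedef struct {
    segment_t slots[SPSC_CAPACITY];
    _Atomic size_t head;  /* next slot to read; written by consumer only */
    _Atomic size_t tail;  /* next slot to write; written by producer only */
} spsc_queue_t;

static void spsc_init(spsc_queue_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false when the queue is full. */
static bool spsc_push(spsc_queue_t *q, const segment_t *seg)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SPSC_CAPACITY)
        return false;
    q->slots[tail & (SPSC_CAPACITY - 1u)] = *seg;
    /* Release: the slot contents become visible before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the queue is empty. */
static bool spsc_pop(spsc_queue_t *q, segment_t *out)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slots[head & (SPSC_CAPACITY - 1u)];
    /* Release: the slot may be reused only after the copy is done. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Neither side ever blocks or takes a lock, which is why this shape suits a non-RT producer feeding an RT consumer.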

Critical Fixes:
  * Optimizer now updates `tc->finalvel` (prevents velocity discontinuities)
  * Force exact stop mode (`TC_TERM_COND_STOP`) - no blending yet
  * RT loop calls `tpRunCycle()` every cycle (fixes 92% done bug)
  * Error handling uses proper `rtapi_print_msg()` instead of `printf()`

Verified Working:
  * Simple linear G-code completes (squares, rectangles)
  * Acceleration stays within INI limits during normal motion
  * No blend spikes (fixed)

KNOWN LIMITATIONS
=================

E-stop: 3x acceleration spike
  - Tormach has identical behavior (checked their code)
  - Industry standard for emergency stops
  - Safety requirement: immediate response
  - Acceptable for Phase 0

No Blending: Exact stop at every corner
  - Expected - Phase 4 feature
  - Prevents acceleration spikes without blend geometry

No Arcs: G2/G3 not implemented
  - Not needed for Phase 0 validation
  - `tpAddCircle_9D()` stub exists

Feed Override: Abrupt changes
  - Predictive handoff needed (Phase 3)
  - Works, just not smooth

FUTURE PHASES
=============

Phase 1: Kinematics in userspace
Phase 2: Ruckig S-curve integration
Phase 3: Predictive handoff, time-based buffering
Phase 4: Bezier blend geometry
Phase 5: Hardening, edge cases
Phase 6: Cleanup

FILES MODIFIED
==============

Core Planning:
  src/emc/motion_planning/motion_planning_9d.cc - Optimizer
  src/emc/motion_planning/motion_planning_9d_userspace.cc - Segment queueing
  src/emc/motion_planning/motion_planning_9d.hh - Interface

RT Layer:
  src/emc/motion/control.c - RT control loop fix
  src/emc/motion/command.c - Mode transitions
  src/emc/tp/tp.c - Apply optimized velocities
  src/emc/tp/tcq.c - Lock-free queue operations

Infrastructure:
  src/emc/motion/atomic_9d.h - SPSC atomics
  src/emc/tp/tc_9d.c/h - 9D vector math
  src/emc/tp/tc_types.h - Shared data structures

TEST
====

G21 G90 F1000
G1 X10 Y0
G1 X10 Y10
G1 X0 Y10
G1 X0 Y0
M2

Expected: Jerky motion (exact stop), completes without errors.
@BsAtHome
Contributor

BsAtHome commented Mar 7, 2026

Sorry for the messy first attempt, didn't mean to tangle up things.

Well, things always start out messy before they become clean and shiny ;-)

I've reworked it along the lines you sketched. The new approach:
Adds hal_struct_newf() / hal_struct_attach() / hal_struct_detach() to hal.h / hal_lib.c as a proper public API.

That is good.

No new shmem segment, the blob lives inside the existing HAL shmem block, registered as a HAL_RO s32 param so it's findable by name.

The shmem is good, but the param is a problem. There are actually multiple problems in your approach. There is no param that holds the struct, thereby both polluting the namespace and misrepresenting the actual content. There is also a plan in motion to get rid of params completely.

You should not be able to "see" the struct when you search for a param (or a pin or a signal). These are separate namespaces. You need to create a namespace for structs. In that namespace you can also track how many are attached, which would make detach a valid operation.

The one minor divergence from your proposal: hal_struct_newf takes a comp_id as first arg, needed because the underlying hal_param_s32_new requires an owner component for lifecycle tracking.

That should be fine, but the struct lifecycle then.

Replace the param-backed implementation with a proper struct namespace
inside HAL shmem.  hal_struct_newf now maintains its own linked list
(struct_list_ptr / struct_free_ptr in hal_data_t), sorted by name,
entirely separate from pins, signals, and parameters.

hal_struct_attach increments attach_count; hal_struct_detach decrements
it, making detach a meaningful operation.  No hal_param_s32_new call
anywhere in the new path.

HAL_VER bumped to 0x11 for the hal_data_t layout change.
@grandixximo
Author

@BsAtHome Good points, thanks for the review. Reworked:

The struct namespace is now a proper first-class citizen in hal_data_t alongside comp_list, pin_list, sig_list, param_list. hal_struct_newf inserts into struct_list_ptr (sorted by name, shmalloc_dn for the entry, shmalloc_up for the blob), no hal_param_s32_new anywhere. hal_struct_attach increments attach_count and hal_struct_detach decrements it, so detach is meaningful. HAL_VER bumped to 0x11 for the layout change. Structs are invisible to halcmd show pin/param/sig.

@BsAtHome
Contributor

BsAtHome commented Mar 7, 2026

Very good. Note that there is something wrong with indentation of your code. What happened there? Copying the faulty indentation of the other functions seems very wrong.

When detaching, you should return an error when detaching from a struct with a zero attach count and not ignore it.

Otherwise, looks fine. You might want to consider updating the utilities, but that does not seem to be an issue too urgent. Not all hal data is exposed in the utilities anyway, but then,... nice to have some peeking abilities ;-)

@BsAtHome
Contributor

BsAtHome commented Mar 7, 2026

A few more things.

You don't seem to release the data memory when you have a duplicate name detection (in the insertion loop).

Pins and params check for component not ready on creation. You don't do that check in new struct. Not sure whether it should matter. But it may be something to consider to have done because you probably should not create these structures on the fly. Unless you have a counter argument.

There is no corresponding hal_struct_delete or remove. Should it be there as well? Not sure.

You should add documentation with the proper man3 man-page(s).

Four fixes from review:

- Fix indentation: new functions used 4-space instead of tabs
- Add comp->ready check in hal_struct_newf, consistent with hal_param_new
  and hal_pin_new
- Check for duplicate name before shmalloc_up: the bump allocator has no
  free, so allocating then detecting a duplicate would silently leak shmem
- hal_struct_detach now returns -EINVAL when attach_count is already 0
  instead of silently ignoring the over-detach
@grandixximo
Author

Indentation was tabs throughout, the editor must have substituted spaces when I inserted the block. Corrected.

comp->ready check added, consistent with hal_param_new and hal_pin_new.

Duplicate detection moved before shmalloc_up. The bump allocator has no free, so allocating and then hitting the duplicate branch would silently leak shmem. The insertion loop now does a combined find-and-position pass, then allocates only if the name is unique.

hal_struct_detach now returns -EINVAL on over-detach.
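
The two behaviours just described (check for a duplicate before touching the bump allocator, and reject a detach when the count is already zero) can be modelled in a few lines. This is a standalone toy model, not the hal_lib.c implementation: the arena, the `entry_t` layout, the `struct_new()`/`struct_attach()`/`struct_detach()` names and every error code other than the -EINVAL on over-detach are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN    48
#define MAX_ENTRIES 8
#define ARENA_SIZE  1024

typedef struct entry {
    char name[NAME_LEN];
    void *data;
    int attach_count;
    struct entry *next;
} entry_t;

static entry_t entries[MAX_ENTRIES];
static size_t entries_used;
static entry_t *list_head;                 /* kept sorted by name */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Bump allocator: memory is handed out once and never freed. */
static void *bump_alloc(size_t size)
{
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += aligned;
    return p;
}

static int struct_new(const char *name, size_t size, void **out)
{
    assert(name && out);
    if (strlen(name) >= NAME_LEN)
        return -EINVAL;
    /* Single pass: find the sorted position and detect a duplicate
     * before any allocation, so a duplicate cannot leak arena space. */
    entry_t **link = &list_head;
    while (*link) {
        int cmp = strcmp((*link)->name, name);
        if (cmp == 0)
            return -EEXIST;
        if (cmp > 0)
            break;
        link = &(*link)->next;
    }
    if (entries_used == MAX_ENTRIES)
        return -ENOMEM;
    void *data = bump_alloc(size);
    if (!data)
        return -ENOMEM;
    entry_t *e = &entries[entries_used++];
    strcpy(e->name, name);
    e->data = memset(data, 0, size);
    e->attach_count = 0;
    e->next = *link;
    *link = e;
    *out = data;
    return 0;
}

static entry_t *struct_find(const char *name)
{
    for (entry_t *e = list_head; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

static int struct_attach(const char *name, void **out)
{
    entry_t *e = struct_find(name);
    if (!e)
        return -ENOENT;
    e->attach_count++;
    *out = e->data;
    return 0;
}

static int struct_detach(const char *name)
{
    entry_t *e = struct_find(name);
    if (!e)
        return -ENOENT;
    if (e->attach_count == 0)
        return -EINVAL;   /* over-detach is an error, not a no-op */
    e->attach_count--;
    return 0;
}
```

Doing the lookup and the positioning in one loop keeps the duplicate check free: the walk needed for sorted insertion already visits the only place a duplicate could be.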

On hal_struct_delete: since shmalloc_up has no corresponding free, deleting can only remove the hal_struct_entry_t from the list and return it to the free list. The data blob stays in the bump allocator forever, same as any hal_malloc allocation. Given that pins and params also have no delete, I would leave it out for now. The intended lifecycle is create-once at module init. Happy to add it if you feel it belongs.

Man pages and halcmd show noted, will add.

- docs/man/man3/hal_struct_newf.3: full man page for hal_struct_newf,
  hal_struct_attach, hal_struct_detach with SYNOPSIS, ARGUMENTS,
  DESCRIPTION, RETURN VALUE, EXAMPLE, and SEE ALSO sections
- hal_struct_attach.3, hal_struct_detach.3: .so redirects to main page
- halcmd_commands.cc: add print_struct_info() and print_struct_names()
  following the print_param_info/print_thread_info pattern; wire into
  do_show_cmd (show struct, bare show, show all) and do_list_cmd
- halcmd_completion.c: add "struct" to show_table and list_table
@grandixximo
Author

@BsAtHome
Man pages added as docs/man/man3/hal_struct_newf.3 (with .so redirects for hal_struct_attach.3 and hal_struct_detach.3), following the existing man3 format. halcmd show struct and list struct are now wired in, with tab completion for both.

Replace the hand-written troff files in docs/man/man3/ with a proper
AsciiDoc source at docs/src/man/man3/hal_struct_newf.3.adoc, matching
the format used by all other HAL man pages.  Register the new page in
docs/po4a.cfg so the build generates the troff and translations from it.
@BsAtHome
Contributor

BsAtHome commented Mar 8, 2026

In your man-page example, the pointers to the shared blob should be marked volatile (on both RT and non-RT sides), as in:

static volatile my_params_t *params;

The changes in the struct are non-observable and that means the compiler cannot and must not assume that the value of the last read or write is still valid (the other side may have changed them in the meantime). Hence, volatile.

@BsAtHome
Contributor

BsAtHome commented Mar 8, 2026

Something entirely different...
You say you made this interface in order to reuse the kinematics, but the kinematics live in RT and your new planner lives in non-RT. How does this work? The values in RT are only updated once every servo-thread cycle. I assume you need many kinematics calculations for planning, not just one every servo-thread cycle.

For that matter, I don't understand why the TP is in RT at all. The RT stuff should only be bothered with joints and not with axes. So why are axes pushed into RT, converted into joints using the kinematics and then converted into movement of the joints? When all of TP lives in non-RT, then the only thing you need to push into RT (motion control) are the actual commands to move joints within the bounds of position, velocity, acceleration and jerk, which all are pre-calculated in non-RT. Am I missing something here?

Two root causes addressed:

1. Backward-pass missing kink constraint (motion_planning_9d.cc):
   The backward pass computed prev_tc's final_vel without applying
   tc->kink_vel (the junction kink at the prev_tc→tc boundary).
   Only prev_tc->kink_vel (its own entry kink) was applied, leaving
   the predecessor free to exit faster than the downstream junction
   allows.  Add tc->kink_vel as Constraint 4 in
   tpComputeOptimalVelocity_9D so the predecessor's exit is always
   capped to the physical junction limit.

2. Stale-feed profile v0 mismatch at handoff (tp.c, tc_types.h):
   When feed override changes after a profile is written but before
   RT reaches that segment, profile.v[0] reflects the old feed while
   the actual junction velocity reflects the new feed.  tpUpdateCycle
   samples the profile at t=0, snapping currentvel to the stale v0.
   Fix: stamp each profile with written_at_feed (the committed feed
   at write time).  At split-cycle handoff, when the feed drift
   exceeds 5%, clamp nexttc->currentvel to the physical junction_vel
   and correct progress/position_base proportionally.  Userspace
   re-converges a fresh profile within 1-2 cycles.
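
A minimal sketch of the two fixes, under stated assumptions: the function names, the argument lists and the relative-drift definition are hypothetical and are not taken from motion_planning_9d.cc or tp.c. Only the ideas come from the commit message: the downstream junction kink is added to the set of caps on the predecessor's exit velocity, and a profile written at a feed that has since drifted by more than 5% is not trusted for its v0 at handoff.

```c
/* Smaller of two values. */
static double min2(double a, double b) { return a < b ? a : b; }

/*
 * Backward pass: cap the exit velocity of prev_tc at the prev_tc -> tc junction.
 * reachable_vel stands for whatever the existing constraints already allow
 * (how it is computed is outside this sketch).
 */
double exit_velocity_cap(double reachable_vel, double prev_max_vel,
                         double prev_kink_vel, double tc_kink_vel)
{
    double v = min2(reachable_vel, prev_max_vel);
    v = min2(v, prev_kink_vel);   /* prev_tc's own entry kink (already applied) */
    v = min2(v, tc_kink_vel);     /* added cap: kink at the prev_tc -> tc boundary */
    return v;
}

/*
 * Split-cycle handoff: choose the starting velocity for the next segment.
 * Assumption: drift is measured relative to the feed stamped on the profile.
 */
double handoff_velocity(double profile_v0, double written_at_feed,
                        double current_feed, double junction_vel)
{
    double drift = current_feed - written_at_feed;
    if (drift < 0.0) drift = -drift;
    if (written_at_feed > 0.0 && drift / written_at_feed > 0.05)
        return junction_vel;      /* stale profile: use the physical junction velocity */
    return profile_v0;            /* profile still matches the committed feed */
}
```

The sketch leaves out the proportional correction of progress and position_base, since the commit message does not give enough detail to reproduce it.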
@grandixximo
Author

grandixximo commented Mar 8, 2026

Something entirely different... You say you made this interface in order to reuse the kinematics, but the kinematics live in RT and your new planner lives in non-RT. How does this work? The values in RT are only updated once every servo-thread cycle. I assume you need many kinematics calculations for planning, not just one every servo-thread cycle.

The kinematics math functions (kinematicsForward/kinematicsInverse) are pure math: no HAL pins, no RT dependencies. We dlopen the RT kinematics module directly into the userspace planner process and call those functions as many times as needed per planning cycle. The shmem block that RT updates every servo cycle carries only slowly-changing configuration (pivot length, joint mapping, etc.), not the per-call inputs. So userspace can run thousands of kinematics calls per planning cycle using those parameters as constants.

For that matter, I don't understand why the TP is in RT at all. The RT stuff should only be bothered with joints and not with axes. So why are axes pushed into RT, converted into joints using the kinematics and then converted into movement of the joints? When all of TP lives in non-RT, then the only thing you need to push into RT (motion control) are the actual commands to move joints within the bounds of position, velocity, acceleration and jerk, which all are pre-calculated in non-RT. Am I missing something here?

You're not missing anything; your intuition is correct. The ideal architecture is exactly what you describe: userspace owns all geometry and kinematics, and RT only receives pre-computed joint commands. Planner 2 moves in that direction, but it's constrained by the existing LinuxCNC RT infrastructure: jogging, mode switching, and the cubic interpolator in control.c all assume axis-space commands arrive at the RT boundary every servo cycle. As long as planner 0/1 and jogging need to coexist with planner 2, that boundary can't easily move. So planner 2 pushes all the planning work to userspace but still hands RT an axis-space position each servo cycle, and kinematicsInverse still runs in RT. It's a modernization within those constraints, not a clean-sheet redesign.

To give you a bit more architectural context:

Why kinematics and axis-space in userspace

The userspace planner needs kinematics for two reasons. First, to know the velocity and acceleration limits that apply along a given path. These are naturally expressed in axis/world space (mm/s along the tool path), and mapping them to joint space correctly requires knowing the kinematics Jacobian along the path. Second, because G-code commands are in axis space, which is what the machine is actually asked to do at the requested feed rate. Planning in joint space from the start would mean losing direct connection to those commanded feeds and tolerances.

Why we didn't redesign the full architecture

You're right that a clean design would have userspace own all geometry and kinematics, with RT only receiving pre-computed joint commands. Planner 2 moves in that direction but deliberately stayed within the bounds of the existing LinuxCNC RT infrastructure, so jogging, mode switching, and planner 0/1 compatibility all continue to work without touching control.c or the servo loop. Whether a deeper restructuring is worth pursuing is really a question for the maintainers. We would be open to that discussion.

Joint limits are enforced as estimates, not guarantees

The userspace planner does compute proper Jacobian-based joint limits: it samples the path at multiple points, projects the path tangent through the kinematics Jacobian at each sample, and derives worst-case joint velocity, acceleration, and jerk limits to cap the segment profile. But this is inherently an approximation. The Jacobian changes continuously along the path, and between sample points violations can slip through. The planner does not verify the full joint trajectory at sub-millisecond resolution, and RT does not re-check limits either. So joint limits in planner 2 are enforced in a conservative best-effort sense, not with the strictness a fully joint-space planner would provide.

Known current limitations of planner 2

A few things worth being aware of:

Feed changes are not instantaneous. The planner uses a convergence gate: when a feed override change arrives, it re-optimizes the queued segments and only commits the new feed once the profiles have converged to a safe depth. This adds latency, typically a fraction of a second under normal conditions, but longer with many short segments.

No spindle sync support yet. Position-synchronized moves (rigid tapping, threading) are not implemented in planner 2. Those fall back to planner 0.

Switching kinematics mid-execution is problematic. If kinematics are switched (e.g. via HAL switchkins) while planner 2 is running, the planner has no way to anticipate the change. It simply uses whatever kinematics module is currently loaded. The velocity limits and joint-space constraints will be wrong for any segments already planned under the old kinematics. A G-code-level kinematics switch (something like G12.1) could help here by giving the planner a synchronization point, but that is a future topic.

Planner switching mid-execution is similarly rough: planners 0, 1, and 2 use different limit awareness and different profile representations, so switching between them during a program will produce discontinuities in velocity behavior.

Mark shared blob pointers as volatile in the example code on both RT and
userspace sides, with a note that volatile alone does not provide memory
ordering guarantees and that atomics are needed for sequence-lock fields.
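
A minimal sketch of what that note implies, assuming a single writer and C11 atomics. This is not the man-page example: the `my_params_t` fields and the `params_write`/`params_read` helpers are hypothetical. The sequence counter is atomic and carries the ordering; `volatile` on the pointer only stops the compiler from caching payload reads.

```c
#include <stdatomic.h>

/* Hypothetical parameter blob; the field names are illustrative only. */
typedef struct {
    atomic_uint seq;          /* even: stable, odd: write in progress */
    double pivot_length;
    int joint_map[3];
} my_params_t;

/* Writer side (a single writer is assumed). */
void params_write(volatile my_params_t *p, double pivot, const int map[3])
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 1u, memory_order_relaxed);   /* mark busy */
    atomic_thread_fence(memory_order_release);
    p->pivot_length = pivot;
    for (int i = 0; i < 3; i++)
        p->joint_map[i] = map[i];
    atomic_store_explicit(&p->seq, s + 2u, memory_order_release);   /* publish */
}

/* Reader side: returns 1 on a consistent snapshot, 0 if the caller should retry. */
int params_read(volatile my_params_t *p, double *pivot, int map[3])
{
    unsigned before = atomic_load_explicit(&p->seq, memory_order_acquire);
    if (before & 1u)
        return 0;                                   /* writer is mid-update */
    *pivot = p->pivot_length;
    for (int i = 0; i < 3; i++)
        map[i] = p->joint_map[i];
    atomic_thread_fence(memory_order_acquire);
    unsigned after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    return before == after;                         /* unchanged: snapshot is valid */
}
```

Strictly speaking, the C11 memory model also wants the payload fields themselves to be accessed atomically while a write may be in flight; the plain volatile accesses here are the common practical compromise, not a guarantee.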

Add hal_struct_attach/detach/newf.3 to docs/man/.gitignore so the
asciidoc-generated man pages are not reported as untracked.
@grandixximo
Author

In your man-page example, the pointers to the shared blob should be marked volatile (on both RT and non-RT sides), as in:

static volatile my_params_t *params;

The changes in the struct are non-observable and that means the compiler cannot and must not assume that the value of the last read or write is still valid (the other side may have changed them in the meantime). Hence, volatile.

fixed in last PR, thank you for keeping an eye out, appreciate it

@BsAtHome
Contributor

BsAtHome commented Mar 8, 2026

Thanks for your insights. It confirms my long-held belief that the TP should be removed from RT.

That brings me to an obvious conclusion. You are trying to work around the problems that the split brain from having the TP in both RT and non-RT gives you. Then why not start at a slightly different place: first remove TP 0/1 from RT completely, move it into non-RT, and adopt a new flexible infrastructure? Then you can add the new 9D TP much more easily in the infrastructure that has become available.

The taskintf (emcmot) boundary between task(non-RT) and motion controller (RT) needs to be redefined, in part, for this to work. There are two types of communication packets afaics: asynchronous and synchronous. Joint moves from the planner are generally asynchronous. Canned cycles, probes and such are synchronous because the upper layers need to wait before it can proceed. All axis-referencing stuff needs to be moved out of RT. Then you have, of course, aborts and other messages, but these are now also handled and should be portable to a new infrastructure. The non-RT/RT command-queue may now become deeper and even split between in-sequence and out-of-sequence commands. (well, haven't exactly thought all details through, but you get my idea).

Such an approach would make it easier to proceed into the future. If you agree, maybe we should take this up for a management decision.

@grandixximo
Author

I think we may eventually land on that, but I'm not so sure it is part of this PR. I don't think I currently have enough of a grasp on the total LinuxCNC architecture to even start where you are saying I should.

Remove TP 0/1 from RT completely and move it into non-RT

The problem here is that TP0 is what everything has been running on for ages, and I don't think I'm remotely qualified to attempt such a refactor. The current 9D planner PR draft is more of a playground for me: moving the kins into userspace, Ruckig, Bezier, and the gated-handoff feed override system are my experimental attempts at solving the TP. Putting it here has had a great effect: with your guidance I think the kinematics work can make it into the main working branch that YangYang manages, and then we can make a real PR with what actually seems to be working well.

The major issue with TP2 is feed override. TP0 can currently apply feed override on the fly, inherently because of its architecture, I believe. Feed override has three inputs: the operator override, feed_hold (a HAL pin), and adaptive_feed (also a HAL pin, intended to be driven by hardware at RT speed: plasma torch height controllers, digitizing probes, etc.). Right now these are read every servo cycle inside the RT loop and applied immediately. If the TP moves to userspace, adaptive_feed in particular loses its RT-speed response unless you keep a lightweight RT velocity scaler that trims joint commands after they leave userspace. That's essentially what planner 2's gated handoff is trying to manage, and it's non-trivial to get right.

@BsAtHome
Contributor

BsAtHome commented Mar 8, 2026

I think we may eventually land on that, but I'm not so sure it is part of this PR. I don't think I currently have enough of a grasp on the total LinuxCNC architecture to even start where you are saying I should.

Well, it is a different project, but it might provide better progression for us all in the long run.

There is probably no one left to grasp all the complexities and interactions anymore. I've been building knowledge as part of fixing bugs, writing drivers and rewriting old code.

Remove TP 0/1 from RT completely and move it into non-RT
The problem here is that TP0 is what everything has been running on for ages, and I don't think I'm remotely qualified to attempt such a refactor

There are no qualified people. This is years of work of many hands and minds. No single one has a complete view anymore. Sometimes you need to bite the bullet for progress ;-)

The major issue with TP2 is feed override, TP0 can currently feed override on the fly inherently I believe because of the architecture

Just a moment: feed override is a scaling factor (applied to the instantaneous value on which all calculations are based) and a linear operation in world coordinates. The kinematics are supposed to be linear operations too, in either direction, which should mean that joint-coordinate space has a similar scaling factor. All precomputed segments' velocities would become scaled by a factor. That, of course, can generate velocity/acceleration/jerk violations, but that info should already be available for a move in joint-space (and that is RT), which should therefore be able to signal/prevent the violation.

Where am I off base here?

@grandixximo
Author

Where am I off base here?

You are not off base. Feed override is indeed a linear scale in world-space, and for a linear kinematics (trivkins) it propagates cleanly to joint-space as the same scalar. For non-trivkins it is more subtle because the Jacobian is configuration-dependent, in other words, the same world-space velocity requires different joint velocities at different configurations along the path. But your point that the joint-space limits are "already available in RT" is the key insight.

The way I see it playing out if TP0/1 moved to userspace, to be taken with a bucket of salt:

The userspace planner computes segments and pre-calculates, per segment, the worst-case joint velocity and acceleration limits via Jacobian sampling along the path (this is already what planner 2 does). Those limits get stored with each segment. At 100% feed override, the planned velocity already respects those limits. Feed override above 100% is then a question of how much headroom exists between the planned velocity and those pre-computed joint limits.

For the instantaneous feed override signal (especially adaptive_feed which is driven by hardware at RT speed) you would want a lightweight RT component that reads the current joint positions every servo cycle, computes the Jacobian, and derives the maximum safe velocity scaling factor for the current configuration. This acts as a real-time clamp on the feed override value before it gets applied to the queued joint commands. The planner doesn't need to know about it, it just sees that the executed velocity was lower than commanded, which it already handles via the convergence mechanism.

So the architecture could be: userspace planner pre-computes joint-space limits per segment, plus a small RT Jacobian-based velocity governor that clips instantaneous feed override to what the current configuration can actually sustain. These are complementary, the planner handles the planned case, the RT governor handles the real-time feed override transients.

The current situation in TP0 with non-trivkins is actually worse than this: joint limits are not enforced by the TP at all. The servo drives clip excess velocity silently, and if the following error grows large enough, it triggers an estop. So a properly designed non-RT TP with Jacobian-aware limits would strictly improve on what exists today (at least as far as respecting joint limits under non-trivkins is concerned), even before adding the RT governor.

But what I am most worried about is that introducing this might break some behavior that I am not privy to. It may seem sound in theory, but you don't really know that it actually works until you have it running in production.

@mika4128

mika4128 commented Mar 9, 2026

In my opinion, it is not ideal to house the planning module within the task thread. I propose creating a dedicated planner_thread (potentially implemented via pthread) to handle TP planning and look-ahead operations. The primary advantage of using pthread is low-latency response, as it decouples the process from HAL cycle polling.

@BsAtHome
Contributor

BsAtHome commented Mar 9, 2026

You are not off base. Feed override is indeed a linear scale in world-space, and for a linear kinematics (trivkins) it propagates cleanly to joint-space as the same scalar. For non-trivkins it is more subtle because the Jacobian is configuration-dependent, in other words, the same world-space velocity requires different joint velocities at different configurations along the path. But your point that the joint-space limits are "already available in RT" is the key insight.

I'm not an expert in non-trivial kinematics, so this is me being on thin ice here...

As I understand it, in non-trivial kinematics the velocity can become a curve. That my mind can grasp. But shouldn't that curve adhere to scaling then? The problem is that you suddenly need the absolute max of the velocity curve ($$max(|velocity()|)$$), but that can also be pre-calculated for the segment.

[snip]

But what I am most worried about is that introducing this might break some behavior that I am not privy to. It may seem sound in theory, but you don't really know that it actually works until you have it running in production.

Well, it should be tested, of course. But it is the strategy that needs to be laid out first. As a second step, that includes how to test it with all the corner cases.

@BsAtHome
Contributor

BsAtHome commented Mar 9, 2026

In my opinion, it is not ideal to house the planning module within the task thread. I propose creating a dedicated planner_thread (potentially implemented via pthread) to handle TP planning and look-ahead operations. The primary advantage of using pthread is low-latency response, as it decouples the process from HAL cycle polling.

It may be strategically sound to offload the planner to a separate thread. But you need to ensure synchronization. This may get complicated accommodating different interfaces (gcode from program or MDI). But it should be possible to do nicely.

Currently, the TP is limited to run in the servo-thread, which usually runs at 1kHz. Doing more fancy stuff can take (a lot) longer and may therefore also interfere with the (usual 1kHz) task thread. Therefore, it may be a requirement to offload the calculations into a separate thread.

@rmu75
Collaborator

rmu75 commented Mar 9, 2026 via email

@grandixximo
Author

As I understand it, in non-trivial kinematics the velocity can become a curve. That my mind can grasp. But shouldn't that curve adhere to scaling then? The problem is that you suddenly need the absolute max of the velocity curve ($$max(|velocity()|)$$), but that can also be pre-calculated for the segment.

I think you might be right, and we just need to scale and respect limits. Once we talk joints with RT, the Jacobian is implicit in the inverse kinematics. I was complicating things because current planner 2 has to output axis space.

The one remaining caveat is adaptive_feed, the HAL pin driven by hardware at RT speed (plasma THC, digitizing probe, etc.). That can change the feed scale instantaneously mid-segment, faster than any userspace planner can react. For that specific case you still want a simple RT clamp: min(override, joint_vel_limit / max_joint_vel_segment). No Jacobian computation needed in RT, just the pre-computed per-segment maximum joint velocity stored at planning time.
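
A minimal sketch of that clamp, assuming the planner stores one peak joint velocity per joint per segment, computed at 100% feed. The function name `clamp_feed_scale` and the array layout are hypothetical; only the `min(override, joint_vel_limit / max_joint_vel_segment)` rule comes from the paragraph above.

```c
#include <stddef.h>

/*
 * Clamp the requested feed scale to the headroom left by the tightest joint.
 *   joint_vel_limit[j]       : configured velocity limit of joint j
 *   max_joint_vel_segment[j] : peak |velocity| of joint j on this segment,
 *                              pre-computed at planning time for 100% feed
 */
double clamp_feed_scale(double override,
                        const double *joint_vel_limit,
                        const double *max_joint_vel_segment,
                        size_t njoints)
{
    double scale = override;
    for (size_t j = 0; j < njoints; j++) {
        double peak = max_joint_vel_segment[j];
        if (peak <= 0.0)
            continue;                        /* joint does not move on this segment */
        double headroom = joint_vel_limit[j] / peak;
        if (headroom < scale)
            scale = headroom;                /* min(override, limit / peak) */
    }
    return scale < 0.0 ? 0.0 : scale;        /* never run the segment backwards */
}
```

The loop is a handful of multiplications and comparisons per servo cycle, which is why it is cheap enough to sit in RT; acceleration and jerk headroom are not covered by this sketch.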

@BsAtHome
Contributor

BsAtHome commented Mar 9, 2026

Well, my thought has always been: the velocity override problem is effectively "a matter of time".
When you scale the velocity to 50%, you scale time to progress slower. When you override velocity to 200%, you tell time to speed up by a factor of two. That time factor should be the same in any kinematics domain, and that would suggest linearity across domains.

If so, then any externally commanded change in velocity, from user or HAL, can be expressed as a factor and applied. Then you just need to ensure that you remain inside the set bounds and limits.
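
The time-scaling argument can be written out as a short derivation. This is a sketch with symbols introduced only for illustration: $q(\tau)$ is the joint trajectory planned at 100%, $x(\tau) = f(q(\tau))$ is the corresponding world-space path through the forward kinematics $f$, and the override is a constant factor $k$ so that the executed motion runs on $\tau = k\,t$.

```latex
\begin{aligned}
\tau &= k\,t, \qquad x(\tau) = f\bigl(q(\tau)\bigr) \\
\frac{dx}{dt} &= k\,x'(\tau), \qquad \frac{dq}{dt} = k\,q'(\tau) \\
\frac{d^{2}q}{dt^{2}} &= k^{2}\,q''(\tau), \qquad \frac{d^{3}q}{dt^{3}} = k^{3}\,q'''(\tau) \\
\text{and for a time-varying } k(t):\quad \dot{\tau} &= k(t), \qquad
\frac{d^{2}q}{dt^{2}} = k^{2}\,q''(\tau) + \dot{k}\,q'(\tau)
\end{aligned}
```

So the same factor $k$ does apply in both world and joint space, whatever the kinematics. What differs is how the limits respond: velocity scales with $k$, acceleration with $k^2$ and jerk with $k^3$, and a changing override adds the $\dot{k}\,q'$ term, which is why the bounds still have to be checked after scaling.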

@andypugh
Collaborator

andypugh commented Mar 9, 2026

The userspace planner computes segments and pre-calculates, per segment, the worst-case joint velocity and acceleration limits via Jacobian sampling along the path (this is already what planner 2 does).

When you say "Jacobian sampling" do you mean numerical differentiation of the last N points on the trajectory, transformed via the kinematics modules between joint and Cartesian space? This can work with existing kinematics modules, and should be exact within the limits of double-precision as we are working with mathematical values, not experimental / measured data.

@andypugh
Collaborator

andypugh commented Mar 9, 2026

AFAIK the current TP (to the extent it follows acceleration limits at all) plans movements to stay within limits even for max feed override

I think this is correct. It's also something we should stress more in the docs, as configuring a system to accept 100x feed-override will actually make it run poorly (excessively conservatively) at 1x feed override.

@grandixximo
Author

When you say "Jacobian sampling" do you mean numerical differentiation of the last N points on the trajectory, transformed via the kinematics modules between joint and Cartesian space? This can work with existing kinematics modules, and should be exact within the limits of double-precision as we are working with mathematical values, not experimental / measured data.

Yes, exactly that, and it works with any existing kinematics module without modification, since it only requires kinematicsInverse as a callable function.

@grandixximo
Author

AFAIK the current TP (to the extent it follows acceleration limits at all) plans movements to stay within limits even for max feed override

I think this is correct. It's also something we should stress more in the docs, as configuring a system to accept 100x feed-override will actually make it run poorly (excessively conservatively) at 1x feed override.

The current TP0 does account for max feed override, just not against the right limits for non-trivkins: it uses axis limits instead of joint limits. Still, a proper implementation would take both and use the more restrictive one:
min(axis_limit * maxFeedScale, joint_limit_via_jacobian * maxFeedScale)

