Merged
Conversation
Kashargul
commented
Jan 20, 2026
itsmeow
approved these changes
Jan 21, 2026
Contributor
itsmeow
left a comment
There was a problem hiding this comment.
Very thorough and helpful! This is the kind of thing I was lacking experience in when it comes to parallelism and multi threading, so thank you for taking the time to work this out.
Contributor
|
Can you post a patch file for the DM changes, since the API is codebase dependent? |
Contributor
Author
diff --git a/code/controllers/subsystem/asset_loading.dm b/code/controllers/subsystem/asset_loading.dm
index 30c3078a0d9..d8139caf148 100644
--- a/code/controllers/subsystem/asset_loading.dm
+++ b/code/controllers/subsystem/asset_loading.dm
@@ -7,6 +7,7 @@ SUBSYSTEM_DEF(asset_loading)
flags = SS_NO_INIT | SS_HIBERNATE
runlevels = RUNLEVEL_LOBBY|RUNLEVELS_DEFAULT
var/list/datum/asset/generate_queue = list()
+ var/assets_generating = 0
var/last_queue_len = 0
/datum/controller/subsystem/asset_loading/PreInit()
@@ -19,14 +20,17 @@ SUBSYSTEM_DEF(asset_loading)
while(length(generate_queue))
var/datum/asset/to_load = generate_queue[generate_queue.len]
+
+ last_queue_len = length(generate_queue)
+ generate_queue.len--
+
to_load.queued_generation()
if(MC_TICK_CHECK)
return
- last_queue_len = length(generate_queue)
- generate_queue.len--
+
// We just emptied the queue
- if(last_queue_len && !length(generate_queue))
+ if(last_queue_len && !length(generate_queue) && !assets_generating)
// Clean up cached icons, freeing memory.
rustg_iconforge_cleanup()
diff --git a/code/modules/asset_cache/spritesheet/batched/batched_spritesheet.dm b/code/modules/asset_cache/spritesheet/batched/batched_spritesheet.dm
index a5a636327f5..23c116536af 100644
--- a/code/modules/asset_cache/spritesheet/batched/batched_spritesheet.dm
+++ b/code/modules/asset_cache/spritesheet/batched/batched_spritesheet.dm
@@ -46,6 +46,7 @@
var/cache_dmi_hashes_json = null
/// Used to prevent async cache refresh jobs from looping on failure.
var/cache_result = null
+ var/getting_genned = FALSE
/datum/asset/spritesheet_batched/proc/should_load_immediately()
#ifdef DO_NOT_DEFER_ASSETS
@@ -168,6 +169,9 @@
if(!length(entries))
CRASH("Spritesheet [name] ([type]) is empty! What are you doing?")
+ if(getting_genned)
+ stack_trace("Spritesheet batching has been called twice. This is illegal!")
+
if(isnull(entries_json))
entries_json = json_encode(entries)
@@ -189,8 +193,12 @@
var/data_out
if(yield || !isnull(job_id))
if(isnull(job_id))
+ getting_genned = TRUE
+ SSasset_loading.assets_generating++
job_id = rustg_iconforge_generate_async("data/spritesheets/", name, entries_json, do_cache, FALSE, TRUE)
UNTIL((data_out = rustg_iconforge_check(job_id)) != RUSTG_JOB_NO_RESULTS_YET)
+ getting_genned = FALSE
+ SSasset_loading.assets_generating--
else
data_out = rustg_iconforge_generate("data/spritesheets/", name, entries_json, do_cache, FALSE, TRUE)
if (data_out == RUSTG_JOB_ERROR)
@@ -232,7 +240,7 @@
CRASH("Error during spritesheet generation for [name]: [data["error"]]")
/datum/asset/spritesheet_batched/queued_generation()
- realize_spritesheets(yield = TRUE)
+ INVOKE_ASYNC(src, PROC_REF(realize_spritesheets), TRUE) // The proc is called inside a subsystem and waits with an UNTIL
/datum/asset/spritesheet_batched/ensure_ready()
if(!fully_generated) |
Absolucy
suggested changes
Jan 26, 2026
ZeWaka
approved these changes
Feb 12, 2026
Collaborator
ZeWaka
left a comment
There was a problem hiding this comment.
Looks all correct to me.
Talked to Kash and this is all backwards-compatible, will include DM patch snippet on what to fix in the release notes, with reminders in the next major as well.
itsmeow
pushed a commit
to BeeStation/rust-g
that referenced
this pull request
Feb 26, 2026
itsmeow
added a commit
to BeeStation/rust-g
that referenced
this pull request
Feb 26, 2026
* fixes the clippy lints on nightly (tgstation#215) * Adds a new time function, `formatted_timestamp` (tgstation#214) * dmi_create_png is rgba instead of rgb (tgstation#217) * add useragent to byond installer * crate internal updates (tgstation#218) * v3.9.0 (tgstation#219) * Enable pathfinder by default (tgstation#220) * Fix clippy URL lifetime lint (tgstation#223) * Improve DM test functionality on Windows, fix regressions (tgstation#221) * Adds `roll_dice`, an advanced xdy dice roller. (tgstation#216) Co-authored-by: Kapu1178 <75460809+Kapu1178@users.noreply.github.com> * v3.11.0 (tgstation#224) * Fix TOML dme test failing (tgstation#227) * hash: optimize file hashing (tgstation#228) * IconForge: DMI Generation, Code Reorganization, Improved Caching, Cleaner I/O (tgstation#213) * Adds a new `uuid` module (tgstation#229) * IconForge: BYOND Parity + Tests, Optimizations, New Transforms (tgstation#230) * `clippy` lints for tgstation#230 (tgstation#233) * Adds `http_request_fire_and_forget` (tgstation#232) * DMI metadata reading and injection (tgstation#234) * 4.0.0 (tgstation#235) * dmi: Add QR code generation (tgstation#226) * Bump `dmi` to 0.5.0, optimize `dmi_read_metadata` (tgstation#238) * Update CI to target 516.1666 (tgstation#237) * 4.1.0 (tgstation#239) * makes `rustg_noise_poisson_map` around 8x faster (tgstation#240) * IconForge: Headless Icon Generation (tgstation#236) * `cargo update` & `cargo upgrade` & `png` fix (tgstation#241) * assorted optimizations to `cellularnoise`, `dbpnoise`, and `worleynoise` (tgstation#243) * v4.2.0 (tgstation#242) * Fix attempt for CI * hash: Adds ChaCha20 CSPRNG functions, updates TOTP generator, adds Base32 (tgstation#225) * Handle errors in panic hook explicitly (tgstation#245) * do not store Git information static (tgstation#247) * Prevent `decode_base64` and `decode_base32` from panicking if given invalid base64 (tgstation#244) * 4.3.0 (tgstation#249) * fix iconforge generate_headless on win for bad paths not panicing (tgstation#248) * 5.0.0 (tgstation#250) * redo iconforge errors - 5.0.1 (tgstation#252) * 6.0.0 - drop windows 7 support (tgstation#251) * fix spritesheet gen on linux (tgstation#255) * cargo update and clippy (tgstation#253) * 6.0.1 - and `cargo update` (tgstation#256) --------- Co-authored-by: Lucy <lucy@absolucy.moe> Co-authored-by: TiviPlus <57223640+TiviPlus@users.noreply.github.com> Co-authored-by: ZeWaka <zewakagamer@gmail.com> Co-authored-by: Mothblocks <35135081+Mothblocks@users.noreply.github.com> Co-authored-by: Comrade Niobe <126028983+ComradeNiobe@users.noreply.github.com> Co-authored-by: Kapu1178 <75460809+Kapu1178@users.noreply.github.com> Co-authored-by: Y0SH1M4S73R <y0sh1m4s73r@gmail.com> Co-authored-by: Ivy <distributivgesetz93@gmail.com> Co-authored-by: Zephyr <12817816+ZephyrTFA@users.noreply.github.com> Co-authored-by: Kashargul <144968721+Kashargul@users.noreply.github.com>
flleeppyy
pushed a commit
to Monkestation/Monkestation2.0
that referenced
this pull request
Mar 5, 2026
Updates rust-g to 6.0.1, including the DM-side iconforge fixes from tgstation/rust-g#255
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The suffering with spritesheet generation and iconforge in three acts.
Act1: The DM side mishandling in a subsystem
Quite soon after porting and implementing icon forge, the first errors showed up:
[2025-08-08T18:17:20] Runtime in code/modules/asset_cache/iconforge/batched_spritesheet.dm,200: Spritesheet robot_icons_miner UNKNOWN ERROR: NO SUCH JOBAt that time, little was known what a box of pandora this error would open up, but let's go through.
https://discord.com/channels/484170914754330625/485190298050363393/1403446760504033281
While discussing the issue on coderbus, sophie gave the right hint about assets being invoked multiple times. Let's break down what happened..
code\controllers\subsystems\asset_loading.dm
code\modules\asset_cache\iconforge\batched_spritesheet.dm
If we look at the original code, we were calling the asset generation in sync, our UNTIL proc now slept the subsystem while the asset was generated. This was our first "race condition". How, well we had a subsystem that was sleeping and could be woken again by the MC. This led to the same asset being called multiple times. Not to mention the wrong tick check, which was also placed quite badly... it alone could lead to double executions in case we ever hit a TICK_CHECK and jumped out.
The fix and what followed:
code\controllers\subsystems\asset_loading.dm
In the subsystem, a counter was added to count the currently externally generated assets, so that we won't clean up too early.
Our actual spritesheet generation was invoked asynchronously to our subsystem to prevent bad duplicated calls. If someone wants to test this on the old code, just add the getting_genned variable as implemented above, the old subsystem will cause those crashes if a codebase has many or large assets.
The box of pandora... We now had the cases that we were firing rust-g jobs faster to rust-g than they would finish. Rust-g works with threads, this led to parallel execution of the asset generation.
Why does this only show on linux. In short, linux handles read/write accesses as well as CWD differently than windows. Or one could say windows hosters were just lucky this time.
Act2: parallel spritesheet generation
Let's look into what happened.
Runtime in code/modules/asset_cache/iconforge/batched_spritesheet.dm, line 239: Error during spritesheet generation for vending: Failed to open DMI 'icons/obj/lighters.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/chaplain.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/inventory/suit/item.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/inventory/suit/item.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/weapons.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/food.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/ecig.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/lighters.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/pdrink.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/pdrink.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/pdrink.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/food.dmi' - No such file or directory (os error 2) Failed to open DMI 'icons/obj/chaplain.dmi' - No such file or directory (os error 2) Image was requested but does not exist in the cache. It's likely that the icon state doesn't exist: UniversalIcon(icon_file=icons/obj/chaplain.dmi, icon_state=mrobe, dir=Some(2), frame=None) - while generating 'godfig' No such file or directory (os error 2)Our filepath to dmi function is neither thread safe not directory safe as it relies on cwd. More about the cwd issues further down. This can cause errors on file accesses in heavy parallel workloads.
A simple test which fails quite reliably, especially for file access, we don't want parallel access to the same file.
Multiple sheets getting genned always could call the same icon, they then called the same, global cache which can lead to a race condition. This entire function needs to be able to handle parallel access.
I'm also not sure why there were locks over entire, large parallel workloads... the locks should be as local as possible to utilize the parallel computation properly...
Anyway, let's go on, this later down the line led to the following error. Which always was present, just so rare that it often slipped compared to the above one.
Act3: current working directory on linux
Error during write:
Runtime in code/modules/asset_cache/iconforge/batched_spritesheet.dm, line 239: Error during spritesheet generation for robot_icons_gravekeeper: Failed to save PNG file 'data/spritesheets/robot_icons_gravekeeper_120x120.png': No such file or directory (os error 2)What we have here, is a common case of a CWD issue. Icon job operations all depend on the current working directory. This can change under linux at any time. So our assumption of the path might no longer be the same once execution finishes and our write fails. This is something that likely should be changed for all of those jobs. The path should be pinned once. It's static in our use case anyway.
Some more comments later in the PR code itself on each section.