fix: resolve server crash and hang causes (signal handler, buffer overflow, race conditions) #11

Merged
m1ngsama merged 1 commit into main from fix/stability-crashes
Mar 5, 2026

@m1ngsama m1ngsama commented Mar 5, 2026

Closes #10

Summary

Five bugs confirmed by static analysis of the full source. Each one can crash the server or leave it unresponsive in production.

Changes

1. Signal handler deadlock — main.c

signal_handler called room_destroy() (which acquires pthread_rwlock_wrlock) and printf() — neither is async-signal-safe. If SIGTERM arrived while any thread held g_room->lock, the process deadlocked.

Also: close(g_listen_fd) was closing stdin (fd 0) because ssh_server_init returns 0 on success, not a real fd.

Fix: handler writes via write(2) (async-signal-safe) and calls _exit(0). Remove the bogus close(0).

2. NULL dereference in room_broadcast (chat_room.c)

client_t **clients_copy = calloc(room->client_count, sizeof(client_t*));
memcpy(clients_copy, ...);  // crash when client_count==0 and calloc returns NULL

POSIX allows calloc(0, n) to return NULL. No NULL check for the OOM case either.

Fix: early return when count == 0; check calloc return.
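
The guard can be sketched roughly as below. The client_t and chat_room_t definitions are hypothetical stand-ins (the real structs in chat_room.c have more fields), room_snapshot_clients is an illustrative name, and the sketch assumes the caller already holds room->lock.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the project's types. */
typedef struct client { int id; } client_t;

typedef struct chat_room {
    client_t **clients;
    int client_count;
} chat_room_t;

/* Copy the client pointer list so it can be used after the lock is dropped.
 * Assumes the caller holds room->lock.
 * Returns the number of clients copied, 0 for an empty room, -1 on OOM.
 * When the result is > 0 the caller owns *out and must free() it. */
static int room_snapshot_clients(const chat_room_t *room, client_t ***out)
{
    *out = NULL;
    int count = room->client_count;
    if (count <= 0)
        return 0;                 /* empty room: never call calloc(0, n) */

    client_t **copy = calloc((size_t)count, sizeof *copy);
    if (copy == NULL)
        return -1;                /* OOM: report it instead of dereferencing */

    memcpy(copy, room->clients, (size_t)count * sizeof *copy);
    *out = copy;
    return count;
}
```

Returning early on an empty room means memcpy is only ever reached with a non-NULL destination and a non-zero length.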

3. Stack buffer overflow in tui_render_screen (tui.c)

char buffer[8192] overflows on tall terminals:
197 lines × ~1031 bytes/msg ≈ 203 KB (about 198 KiB). The title padding loop also wrote buffer[pos++] with no bounds check.

Fix: malloc(65536) with buf_size used consistently throughout. Bounds check on padding loop.
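
Since 65536 bytes is still smaller than the ~203 KB worst case above, the fix depends on every write being checked against buf_size. A rough sketch of that pattern follows; buf_appendf and buf_pad are hypothetical helper names, not functions from tui.c, and output that does not fit is truncated.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: append formatted text at buf[pos] without writing
 * past buf_size. Returns the new position; text that does not fit is cut. */
static size_t buf_appendf(char *buf, size_t buf_size, size_t pos,
                          const char *fmt, ...)
{
    if (pos >= buf_size)
        return pos;                       /* no space left at all */

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + pos, buf_size - pos, fmt, ap);
    va_end(ap);
    if (n < 0)
        return pos;                       /* encoding error: nothing appended */

    size_t room = buf_size - pos - 1;     /* space excluding the NUL */
    return pos + ((size_t)n > room ? room : (size_t)n);
}

/* Hypothetical helper: write up to count spaces, stopping one byte before
 * the end of the buffer so the result stays NUL-terminated. */
static size_t buf_pad(char *buf, size_t buf_size, size_t pos, size_t count)
{
    while (count-- > 0 && pos + 1 < buf_size)
        buf[pos++] = ' ';
    if (pos < buf_size)
        buf[pos] = '\0';
    return pos;
}
```

A render function would then allocate once, for example `char *buf = malloc(buf_size);`, thread `pos` through these helpers, and free the buffer after the final write to the client.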

4. sleep(2) inside libssh auth callback — ssh_server.c

auth_password is invoked from ssh_event_dopoll in the main thread. Sleeping there blocked the entire accept loop — one attacker with repeated wrong passwords stalled all connections for 2s per attempt. IP blocking via record_auth_failure already handles brute force.

Fix: remove sleep(2) from auth_password.
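
For illustration only, here is one way failure tracking can throttle an attacker without sleeping in the callback. This is not the project's record_auth_failure; the table size, thresholds, and the names note_auth_failure and ip_is_blocked are assumptions, and the sketch is not thread-safe (it assumes calls come from the single accept thread).

```c
#include <string.h>
#include <time.h>

#define MAX_TRACKED   64    /* assumed table size */
#define MAX_FAILURES  5     /* assumed failures before a block */
#define BLOCK_SECONDS 300   /* assumed block duration */

typedef struct {
    char   ip[46];          /* large enough for a textual IPv6 address */
    int    failures;
    time_t blocked_until;
} ip_entry_t;

static ip_entry_t g_table[MAX_TRACKED];

static ip_entry_t *find_entry(const char *ip)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (g_table[i].ip[0] != '\0' && strcmp(g_table[i].ip, ip) == 0)
            return &g_table[i];
    return NULL;
}

/* Record one failed attempt; returns immediately instead of sleeping. */
static void note_auth_failure(const char *ip, time_t now)
{
    if (ip[0] == '\0')
        return;                              /* ignore empty addresses */
    ip_entry_t *e = find_entry(ip);
    for (int i = 0; e == NULL && i < MAX_TRACKED; i++) {
        if (g_table[i].ip[0] == '\0') {      /* claim the first free slot */
            e = &g_table[i];
            memset(e, 0, sizeof *e);
            strncpy(e->ip, ip, sizeof e->ip - 1);
        }
    }
    if (e == NULL)
        return;                              /* table full: drop the record */
    if (++e->failures >= MAX_FAILURES) {
        e->blocked_until = now + BLOCK_SECONDS;
        e->failures = 0;
    }
}

/* Nonzero while the address is inside its block window. */
static int ip_is_blocked(const char *ip, time_t now)
{
    const ip_entry_t *e = find_entry(ip);
    return e != NULL && now < e->blocked_until;
}
```

The accept loop can call ip_is_blocked() before starting a session and drop the connection right away, so the cost of a brute-force attempt falls on the attacker's address rather than on every other client.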

5. Spurious sleep() in accept loop error paths — ssh_server.c

sleep(1)/sleep(2) after rejecting rate-limited connections blocked accepting the next legitimate connection with no benefit.

Fix: remove all sleep() from accept loop error paths.

Test plan

  • Build passes with no new warnings: make clean && make
  • Server starts and stays up under systemctl
  • Multiple simultaneous connections work correctly
  • SIGTERM causes clean exit (no deadlock): kill -TERM $(pidof tnt)
  • Auth failures don't stall other connections

Fixes #10.

Five bugs that caused the server to crash or become unresponsive:

1. Signal handler deadlock (main.c)
   signal_handler called room_destroy (pthread_rwlock + free) and printf —
   neither is async-signal-safe. If SIGTERM arrived while any thread held
   g_room->lock, the process deadlocked permanently.
   Fix: handler now only writes a message via write(2) and calls _exit(0).
   Also remove close(g_listen_fd) which was closing stdin (fd 0), since
   ssh_server_init returns 0 on success, not a real file descriptor.

2. NULL dereference in room_broadcast when room is empty (chat_room.c)
   calloc(0, n) may return NULL per POSIX; memcpy on NULL is undefined.
   Also: no NULL check after calloc for the OOM case.
   Fix: early return if count == 0; check calloc return value.

3. Stack buffer overflow in tui_render_screen (tui.c)
   char buffer[8192] overflows with tall terminals: 197 visible lines *
   ~1031 bytes/message ≈ 203 KB (~198 KiB). Title padding loop also
   lacked a bounds check (buffer[pos++] = ' ' with no guard).
   Fix: switch to malloc(65536) with buf_size used consistently.
   Add bounds check to the title padding loop.

4. sleep() inside libssh auth callback (ssh_server.c)
   auth_password is called from ssh_event_dopoll in the main thread.
   sleep(2) there blocks the entire accept loop — one attacker with
   repeated wrong passwords stalls all incoming connections.
   IP blocking via record_auth_failure already handles brute force.
   Fix: remove sleep(2) from auth_password.

5. Spurious sleep() calls in the main accept loop (ssh_server.c)
   sleep(1/2) after rejecting rate-limited or over-limit connections
   delays accepting the next legitimate connection for no benefit.
   Fix: remove all sleep() from the accept loop error paths.
@m1ngsama m1ngsama merged commit e3e1486 into main Mar 5, 2026