issues/1597: Temperature Monitoring#1876
issues/1597: Temperature Monitoring#1876MitchellThompkins wants to merge 108 commits intounraid:mainfrom
Conversation
…s object for NodeJs
Thanks! Those are all good suggestions. I'll try to knock them out tonight or tomorrow sometime. |
Hmmmm, so I tried to do the gen but get: ~/workspace/api/web > pnpm codegen
> @unraid/web@4.29.2 codegen /home/mthompkins/workspace/api/web
> graphql-codegen --config codegen.ts -r dotenv/config
✔ Parse Configuration
⚠ Generate outputs
❯ Generate to src/composables/gql/
✔ Load GraphQL schemas
✔ Load GraphQL documents
✖ GraphQL Document Validation failed with 5 errors;
Error 0: Unknown type "ConnectSignInInput".
at /home/mthompkins/workspace/api/web/src/store/account.fragment.ts:2:34
Error 1: Cannot query field "connectSignIn" on type "Mutation".
at /home/mthompkins/workspace/api/web/src/store/account.fragment.ts:3:5
Error 2: Cannot query field "connectSignOut" on type "Mutation".
at /home/mthompkins/workspace/api/web/src/store/account.fragment.ts:3:5
Error 3: Cannot query field "cloud" on type "Query". Did you mean "rclone"?
at /home/mthompkins/workspace/api/web/src/store/server.fragment.ts:3:5
Error 4: Unknown type "Cloud". Did you mean "CpuLoad" or "Float"?
at /home/mthompkins/workspace/api/web/src/store/server.fragment.ts:2:28
✖ One or more errors occurred, no files were generated. To allow output on errors, set config.allowPartialOutputs=true
ELIFECYCLE Command failed with exit code 1.And yeah the Here's the machine: ~/workspace/api > cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://gitlab.archlinux.org/groups/archlinux/-/issues"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo |
api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.ts
Outdated
Show resolved
Hide resolved
|
|
||
| private async initializeProviders(): Promise<void> { | ||
| // 1. Get sensor specific configs | ||
| const lmSensorsConfig = this.configService.get('api.temperature.sensors.lm_sensors'); |
There was a problem hiding this comment.
Instead, we have created a config file persister that you can use for something like this :)
Ah good catch. I totally didn't appreciate how configs were being passed in. I asked deepwiki where to find the config and it lead me to just pulling the config file directly.
If I did this correctly, it should be addressed with ecffc73 but please double check me.
| export class LmSensorsService implements TemperatureSensorProvider { | ||
| readonly id = 'lm-sensors'; |
| const name = parts[0]; | ||
| const readingParts = parts[1].split(' '); | ||
| const valueStr = readingParts[0]; | ||
| const unitStr = readingParts.slice(1).join(' '); // "degrees C" |
There was a problem hiding this comment.
|
I'll get to the last few comments tomorrow or the next day probably. I'm also hitting an issue where when I try to write my new Before starting API: {
"enabled": true,
"polling_interval": 10000,
"default_unit": "RANKINE",
"history": {
"max_readings": 4,
"retention_ms": 300000
},
"thresholds": {
"cpu_warning": 75,
"cpu_critical": 90,
"disk_warning": 50,
"disk_critical": 60
},
"sensors": {
"lm_sensors": {
"enabled": true,
"config_path": "/etc/sensors3.conf"
},
"smartctl": {
"enabled": true
},
"ipmi": {
"enabled": false
}
}
}
After starting API: root@server:/boot/config/plugins/dynamix.my.servers/configs# cat temperature.json
{
"default_unit": "CELSIUS",
"sensors": {
"lm_sensors": {
"enabled": true,
"config_path": ""
},
"smartctl": {
"enabled": true
},
"ipmi": {
"enabled": true,
"args": []
}
},
"thresholds": {
"cpu_warning": 70,
"cpu_critical": 85,
"disk_warning": 50,
"disk_critical": 60,
"warning": 80,
"critical": 90
}
}Clearly the default is missing fields...but I feel like I have them in the right places so I'll need to figure that out. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts (1)
264-270:⚠️ Potential issue | 🟠 MajorGuard against empty sensors before calling
buildSummary.After filtering on line 264,
sensorscould be empty (if all sensor data was invalid). CallingbuildSummarywith an empty array will throw an error (line 381-383). Add a guard before building the result.🐛 Proposed fix
.filter((s): s is TemperatureSensor => s !== null); + if (sensors.length === 0) { + return null; + } + return { id: 'temperature-metrics', sensors, summary: this.buildSummary(sensors), };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts` around lines 264 - 270, After filtering to get `sensors: TemperatureSensor[]`, guard against the case where `sensors` is empty before calling `this.buildSummary(sensors)`; if empty, return the result object with a safe summary (e.g., null or an empty/default summary shape) instead of invoking `buildSummary` which throws on empty input — update the code that returns the object with id 'temperature-metrics' to check `sensors.length === 0` and supply a safe summary value, referencing the `sensors` variable and the `this.buildSummary` method.
🧹 Nitpick comments (6)
api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.ts (1)
21-21: Remove unusedConfigServicedependency.The
ConfigServiceis injected but never used in this service. If it's intended for future configuration (as the comment on line 33 suggests), consider removing it until actually needed to keep the code clean.♻️ Proposed fix
-import { ConfigService } from '@nestjs/config'; - import { execa } from 'execa'; import { @@ -13,10 +11,6 @@ import { export class IpmiSensorsService implements TemperatureSensorProvider { readonly id = 'ipmi-sensors'; private readonly logger = new Logger(IpmiSensorsService.name); private readonly timeoutMs = 3000; - constructor(private readonly configService: ConfigService) {} + constructor() {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.ts` at line 21, The ConfigService injected into the IpmiSensorsService constructor is unused; remove the unused dependency and any related import to keep the class clean. Edit the IpmiSensorsService (constructor) to drop the private readonly configService: ConfigService parameter and remove the ConfigService import at the top of ipmi_sensors.service.ts; if there are any comments referencing future use, leave the comment but do not inject ConfigService until it's actually required.api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.module.ts (1)
1-1: Remove the filename comment.As per coding guidelines, comments should only be added when needed for clarity of function. The filename is already evident from the file path.
♻️ Proposed fix
-// temperature/temperature.module.ts import { Module } from '@nestjs/common';As per coding guidelines: Never add comments unless they are needed for clarity of function.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.module.ts` at line 1, Remove the unnecessary top-of-file comment "// temperature/temperature.module.ts" from the module file; simply delete that single-line filename comment so the file contains only real code and meaningful comments (no other changes to exports, imports, or any symbols in the temperature.module.ts file).api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts (1)
59-68: Consider using a simpler mock pattern forTemperatureConfigService.Using
Object.create(TemperatureConfigService.prototype)and then assigning methods is unusual. A typed partial mock object would be cleaner and more consistent with the other mocks in this file.♻️ Alternative approach
- temperatureConfigService = Object.create(TemperatureConfigService.prototype); - temperatureConfigService.getConfig = vi.fn().mockReturnValue({ + temperatureConfigService = { + getConfig: vi.fn().mockReturnValue({ default_unit: 'celsius', sensors: { lm_sensors: { enabled: true }, smartctl: { enabled: true }, ipmi: { enabled: false }, }, thresholds: {}, - }); + }), + } as unknown as TemperatureConfigService;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts` around lines 59 - 68, Replace the unusual Object.create(TemperatureConfigService.prototype) pattern with a simple typed partial mock for temperatureConfigService: create a plain object literal that implements getConfig as vi.fn().mockReturnValue(...), cast it to Partial<TemperatureConfigService> or TemperatureConfigService as appropriate, and use that mock in tests instead of TemperatureConfigService.prototype to match the other mocks in this file.api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts (3)
28-30: Remove unused cache fields (dead code).The
cacheandcacheTimestampfields are declared but never used. The actual caching logic usesthis.history.getMostRecentReading()instead. These fields should be removed to avoid confusion.🧹 Proposed fix
export class TemperatureService implements OnModuleInit { private readonly logger = new Logger(TemperatureService.name); private availableProviders: TemperatureSensorProvider[] = []; - private cache: TemperatureMetrics | null = null; - private cacheTimestamp = 0; private readonly CACHE_TTL_MS = 1000;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts` around lines 28 - 30, Remove the dead private fields `cache` and `cacheTimestamp` from the TemperatureService class (they're declared but never referenced); locate their declarations (private cache: TemperatureMetrics | null = null; private cacheTimestamp = 0;) and delete them so only the actual caching via `this.history.getMostRecentReading()` remains; ensure no other code references `cache` or `cacheTimestamp` and run type checks to confirm no further fixes are needed.
22-22: Remove unnecessary comment.This comment just states the filename, which provides no additional value. As per coding guidelines: "Never add comments unless they are needed for clarity of function."
🧹 Proposed fix
-// temperature.service.ts `@Injectable`()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts` at line 22, Remove the redundant top-of-file comment that only repeats the filename in temperature.service.ts; open the file and delete the lone comment line "// temperature.service.ts" (or any similar filename-only comment at the top) so the file contains only meaningful code/comments.
333-338: Confusing parameter namesourceUnit.The parameter
sourceUniton line 338 actually receivestargetUnitfrom the caller (line 149). Consider renaming totargetUnitordisplayUnitfor clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts` around lines 333 - 338, The parameter name sourceUnit on the computeStatus function is misleading because callers pass a target/display unit; rename the parameter to targetUnit (or displayUnit) in computeStatus and update its signature and all references inside computeStatus to the new name, then update every call site that passes the unit (e.g., where computeStatus(...) is invoked) to match the renamed parameter; also update any related type annotations or JSDoc for TemperatureUnit to reflect the new parameter name to keep intent clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/src/unraid-api/graph/resolvers/metrics/metrics.resolver.ts`:
- Around line 86-102: The resolver is reading wrong config keys so it always
uses defaults; update the config access in the metrics resolver (where
isTemperatureEnabled, pollingInterval and this.configService are used) to use
the correct keys or inject TemperatureConfigService: either replace
'api.temperature.enabled' and 'api.temperature.polling_interval' with
'temperature.enabled' and 'temperature.polling_interval', or inject
TemperatureConfigService and call its getConfig() to obtain enabled and
polling_interval and use those values when calling
subscriptionTracker.registerTopic (the same place you call
temperatureService.getMetrics and publish to
PUBSUB_CHANNEL.TEMPERATURE_METRICS).
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Around line 170-180: getThresholdsForType is incorrectly treating stored
threshold numbers as being in the user's target unit; update
getThresholdsForType so it returns thresholds as-is in Celsius (do not convert
based on targetUnit) and remove the targetUnit parameter from its signature,
then update all callers (e.g., where getThresholdsForType(r.type,
thresholdConfig, targetUnit) is used) to call getThresholdsForType(type,
thresholdConfig) only; keep the convertValue(...) calls that convert from
TemperatureUnit.CELSIUS to the user's targetUnit so displayed thresholds are
converted at the call site, not inside getThresholdsForType.
---
Duplicate comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Around line 264-270: After filtering to get `sensors: TemperatureSensor[]`,
guard against the case where `sensors` is empty before calling
`this.buildSummary(sensors)`; if empty, return the result object with a safe
summary (e.g., null or an empty/default summary shape) instead of invoking
`buildSummary` which throws on empty input — update the code that returns the
object with id 'temperature-metrics' to check `sensors.length === 0` and supply
a safe summary value, referencing the `sensors` variable and the
`this.buildSummary` method.
---
Nitpick comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.ts`:
- Line 21: The ConfigService injected into the IpmiSensorsService constructor is
unused; remove the unused dependency and any related import to keep the class
clean. Edit the IpmiSensorsService (constructor) to drop the private readonly
configService: ConfigService parameter and remove the ConfigService import at
the top of ipmi_sensors.service.ts; if there are any comments referencing future
use, leave the comment but do not inject ConfigService until it's actually
required.
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.module.ts`:
- Line 1: Remove the unnecessary top-of-file comment "//
temperature/temperature.module.ts" from the module file; simply delete that
single-line filename comment so the file contains only real code and meaningful
comments (no other changes to exports, imports, or any symbols in the
temperature.module.ts file).
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts`:
- Around line 59-68: Replace the unusual
Object.create(TemperatureConfigService.prototype) pattern with a simple typed
partial mock for temperatureConfigService: create a plain object literal that
implements getConfig as vi.fn().mockReturnValue(...), cast it to
Partial<TemperatureConfigService> or TemperatureConfigService as appropriate,
and use that mock in tests instead of TemperatureConfigService.prototype to
match the other mocks in this file.
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Around line 28-30: Remove the dead private fields `cache` and `cacheTimestamp`
from the TemperatureService class (they're declared but never referenced);
locate their declarations (private cache: TemperatureMetrics | null = null;
private cacheTimestamp = 0;) and delete them so only the actual caching via
`this.history.getMostRecentReading()` remains; ensure no other code references
`cache` or `cacheTimestamp` and run type checks to confirm no further fixes are
needed.
- Line 22: Remove the redundant top-of-file comment that only repeats the
filename in temperature.service.ts; open the file and delete the lone comment
line "// temperature.service.ts" (or any similar filename-only comment at the
top) so the file contains only meaningful code/comments.
- Around line 333-338: The parameter name sourceUnit on the computeStatus
function is misleading because callers pass a target/display unit; rename the
parameter to targetUnit (or displayUnit) in computeStatus and update its
signature and all references inside computeStatus to the new name, then update
every call site that passes the unit (e.g., where computeStatus(...) is invoked)
to match the renamed parameter; also update any related type annotations or
JSDoc for TemperatureUnit to reflect the new parameter name to keep intent
clear.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)
api/dev/configs/api.jsonapi/src/unraid-api/graph/resolvers/metrics/metrics.resolver.spec.tsapi/src/unraid-api/graph/resolvers/metrics/metrics.resolver.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.spec.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/sensors/ipmi_sensors.service.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/sensors/lm_sensors.service.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/temperature-config.model.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/temperature-config.service.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/temperature.module.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.tsapi/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/lm_sensors.service.ts
api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts
Show resolved
Hide resolved
|
I'll put some time aside on monday or tuesday to pull your code locally and review your changes again. Will also be able to guide you better when I have time to debug. Tomorrow is going to be a bit busy for me. For the pnpm codegen thing, make sure your api dev server is running while you run pnpm codegen. Love that you're using arch :) |
|
Sorry for this wall of text in advance. Regarding the test issues I was havingSo I was having 3 issues:
Regarding (3), this took me forever to find b/c I wasn’t really familiar with the intricacies of
fee7d46 introduced this change and I think it’s in order to handle the case where a config change was made via the API while shutting down and wanting to capture that change, though I'm not 100% sure. As my My change though does cause a unit test failure. Here’s the explicit behavior I observed: Start API service, then edit Restart API service and polling_interval field has reverted: Regarding the codegenI did follow the instructions, however the issue was that Regenerated with 53bf97e... Rest of the commentsI'll get to the rest of your comments this weekend probably, it's mostly just small stuff. Oh also I only added the machine info in case it was something weird with my VM (which is hosted on unraid actually :)). I'm no Arch evangelist lol, I'll use whatever works. |
| */ | ||
| async onModuleDestroy() { | ||
| this.configObserver?.unsubscribe(); | ||
| await this.persist(); |
There was a problem hiding this comment.
@Ajit-Mehrotra this was what I changed to let my config managed by ConfigFilePersister actually persist. Without this change I could not get my changes to actually stick around between API restarts.
However, this change (06df963) is driving a test failure.
I was able to fix the tests with:
- 790d28c
- I think I may have found a bug in the tests that I fixed, though read that commit to make sure I'm not wrong.
- 13ac0b8
- explicitly not test this persist behavior.
All of this said I'm pretty sure that I'm missing something b/c changing this feels antithetical to the purpose of this class. Maybe I'm violation some assumption? Like maybe I'm not supposed to modify configuration json files directly through the terminal?
Previously, the tests were manually emitting an array of changes. When the persister collected these into its own window, the resulting nested data broke its internal validation, preventing the automatic save logic from triggering. I think the tests were only passing because the file was being forced to save during the cleanup phase in ConfigFilePersist in onModuleDestroy(). After removing the automatic save in 06df963, these tests began to fail. With the change here, if I comment out the "unsubscribes from config changes and persists final state" test (which explictly checks onModuleDestory() behavior) these tests pass.
This explicitly changes the expectation for the behavior of onModuleDestory() to _not_ expect a file to persist.
Temperature Monitoring System Feature Request
Description
This PR implements a comprehensive temperature monitoring system integrated into the existing
MetricsResolver. It provides real-time access to temperature data from various system sensors via GraphQL queries and subscriptions.I tested these changes locally using the provided Apollo Server on Unraid
7.2.0.Addresses #1597
Key Features
lm-sensorsintegration.smartctlintegrationsmartctlto leverage json parsing instead of raw strings.ipmitoolsensors (see notes below!).temperaturefield on theMetricsnode.systemMetricsTemperaturesubscription for real-time updates.current,min,max, andhistoryfor each sensor.api.json(enabled status, polling intervals, thresholds).default_unitpreference (Celsius/Fahrenheit) with automatic conversion.Implementation Details
sensors,smartctl,ipmitool) rather than bundling binaries, aligning with the base OS integration strategy. There is no attempt here to package sensor tooling as part of this feature. This is per a conversation on the feature scope with the Unraid team.TemperatureSensorProviderinterface allowing for easy addition of new sensor types.DisksServicewas updated to parsesmartctlJSON output directly, resolving issues with raw string parsing on certain drive models. There was an issue with some Seagate drives reporting raw values with extra data in parentheses:24 (0 14 0 0 0). Parsing the last value in the string returned incorrect data. Parsing with json prevents this.Documentation
docs/developer/temperature.mddetailing configuration options and API usage.Scope Notes
IpmiSensorsServicehas been implemented to parse standardipmitooloutput, but it has not been tested on live hardware as I do not have access to a system with IPMI support. It relies on standardipmitooloutput formats (at least the documentation I saw and what AI told me).WARNINGandCRITICALstatuses based on thresholds but does not currently trigger active system notifications. This type of alert is passive only.Summary by CodeRabbit
New Features