Remaining issues with RDNA3 and 0.5.2 (kernel 6.7) #255
Comments
|
Glad to hear about the power limit, hopefully that's coming soon. |
If you can manage to replicate the proper "static speed" behaviour using a curve (by having all of its points at the same speed), then please tell me what that curve looks like. The current implementation uses a single minimum temperature point and fills the rest with the maximum; it might behave differently if the curve is configured in some other way. You can check the actual curve that's applied in:
|
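For anyone who wants to try the "all points at the same speed" experiment by hand, here is a minimal sketch that prints such a flat curve. It assumes the point format `<index> <temperature °C> <speed %>` and the temperatures (30, 40, 50, 60, 75) used by the manual script posted later in this thread. The function name `flat_curve` is hypothetical, and this is not how LACT itself builds its curve.

```shell
# flat_curve <speed_percent>
# Prints five curve points that all request the same fan speed.
# Assumption: point format and temperatures are borrowed from the manual
# script in this thread, not from LACT or the kernel documentation.
flat_curve() {
    local speed="$1" index=0 temp
    for temp in 30 40 50 60 75; do
        printf '%s %s %s\n' "$index" "$temp" "$speed"
        index=$((index + 1))
    done
}
```

Each printed line could then be written to `fan_curve` one at a time, followed by `c` to commit, the same way the manual script does it.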
Sorry, didn't get around to this until now. |
I have a 7800XT and have tested controlling the fan curves. If you remain below a usage limit on the GPU, fan control does nothing. Only once you have higher usage (e.g. when booting up a game) does the GPU apply your fan curves. This seems to be a hardware/driver issue that LACT has no control over. Run a game in the background, then play with your fan curves; that's what worked for me. |
On my XFX SPEEDSTER MERC 310 AMD Radeon RX 7900 XT the fan speed is read correctly. Setting a static fan speed or a curve doesn't produce an error, but it won't work. Same with tuxclocker. This works:
But it's not possible to change the mode to manual for static fan speed:
debug.tgz PS: I had to |
Setting a curve through lact should use exactly the same commands, can you check how the contents of
This is expected; the hwmon interface is read-only on RDNA3.
You should add your user's group to the start of |
nvm, can't get it working again
LACT:
I got it to spin up once a few days ago, but it was weird and didn't seem right...
maaaaan, I thought everything's implemented on 6.7 >_< |
This isn't a missing feature; it's a change in how the GPU firmware works. You're supposed to use the new fan_curve and target temperature/speed interfaces instead of the old manual mode. |
There have been some updates regarding the power limit setting in kernel 6.7.3:
It's worth checking if that helps with the incorrect limit |
But there is a problem with OC in general. Enabling a static fan speed breaks saving of the OC settings; they reset back to stock. OC-wise it is still a mess. While kernel 6.8 now allows you to set the right power limit, it uses it in a weird fashion and breaks clocking higher, so you get slower performance. |
I have a 7900 XTX on Arch Linux. Power limit works but fan control doesn't. The system still turns the fan on/off at the card's built-in thresholds instead of following the custom curve I set up using the LACT GUI. I have tried both the curve and static modes. No changes to the fan speed at all. Any recommendations? Note that the OC is enabled, system rebooted, and I do have the kernel parameter Debug file: LACT-sysfs-snapshot-20240223-193349.zip |
Unfortunately there isn't anything you can do about this currently. It will use your custom settings after it crosses the threshold, but you cannot configure this threshold. |
Thanks for responding to my message. Does this mean Lact will never work for my card? Or is it something fixed? |
If the driver adds support for configuring this, then LACT will have an option for it. |
Thank you again. I much appreciate you taking the time to reply. If you don't mind, one final question: what's the best way to find out whether the driver will add support for configuring fan curves? Should I follow the Linux kernel updates? I found the following, which appears to add fan control support for RDNA3 cards. Not sure why mine still doesn't work though. https://lore.kernel.org/lkml/CAPM=9txd+1FtqU-R_8Zr_UePUzu7QUWsDBV1syKBo16v_gx2XQ@mail.gmail.com/
|
Fan control itself is supported, the card will use your custom fan speed settings, but only after a builtin threshold when the fan gets turned on - that's the part you cannot currently configure. As for updates: kernel changelog will have info about it if something changes, you can also track these issues in amd's repo: |
Oh I see it now. That makes sense. I was wondering why the fan speed goes up and down randomly. So it does recognize my custom curve but is still tied to the built-in thresholds. Thanks for the explanation. I will check out the links you included. |
XFX merc 310 7900XTX. Any change to the fan settings results in an Input/Output error and a failure to change settings again, but the following manual script works, although I would prefer the GUI to be fixed.
#!/usr/bin/bash
GPU_DEVICE="/sys/class/drm/card1/device"
GPU_SYSFS_FAN="$GPU_DEVICE/gpu_od/fan_ctrl"
GPU_SYSFS_HWMON="$GPU_DEVICE/hwmon/hwmon0"
POWER_LIMIT=402 # watts
GPU_FAN_CURVE="$GPU_SYSFS_FAN/fan_curve"
GPU_FAN_CURVE_0="0 30 15"
GPU_FAN_CURVE_1="1 40 30"
GPU_FAN_CURVE_2="2 50 60"
GPU_FAN_CURVE_3="3 60 70"
GPU_FAN_CURVE_4="4 75 100"
GPU_FAN_TARGET="$GPU_SYSFS_FAN/fan_target_temperature"
GPU_FAN_TARGET_TEMP="85"
echo "Setting fan curve"
echo "$GPU_FAN_CURVE_0" > "$GPU_FAN_CURVE"
echo "$GPU_FAN_CURVE_1" > "$GPU_FAN_CURVE"
echo "$GPU_FAN_CURVE_2" > "$GPU_FAN_CURVE"
echo "$GPU_FAN_CURVE_3" > "$GPU_FAN_CURVE"
echo "$GPU_FAN_CURVE_4" > "$GPU_FAN_CURVE"
echo "c" > "$GPU_FAN_CURVE"
echo "Committed fan curve"
echo "Setting power limit"
echo "$((POWER_LIMIT * 1000000))" > "$GPU_SYSFS_HWMON/power1_cap"
echo "Committed power limit"
# Print the curve back to verify what the driver actually accepted
cat "$GPU_SYSFS_FAN/fan_curve"
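Since the failure reported here is an Input/Output error on write, a variant of the script that stops at the first failed write may make it easier to see which step the driver rejects. The following is only a sketch: `apply_fan_curve` is a hypothetical helper, and it assumes the same write-points-then-`c` sequence as the script above.

```shell
# apply_fan_curve <fan_curve_file> <point>...
# Writes each point, then commits with "c". Returns non-zero at the first
# failed write instead of silently continuing.
# Assumption: same write sequence as the manual script in this thread.
apply_fan_curve() {
    local curve_file="$1" point
    shift
    for point in "$@"; do
        if ! echo "$point" > "$curve_file"; then
            echo "failed to write point '$point' to $curve_file" >&2
            return 1
        fi
    done
    if ! echo "c" > "$curve_file"; then
        echo "failed to commit curve to $curve_file" >&2
        return 1
    fi
}
```

With the variables from the script above, a call would look like `apply_fan_curve "$GPU_FAN_CURVE" "$GPU_FAN_CURVE_0" "$GPU_FAN_CURVE_1" "$GPU_FAN_CURVE_2" "$GPU_FAN_CURVE_3" "$GPU_FAN_CURVE_4"`, run as root.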
|
After some debugging, the problematic line of code turned out to be this one; ignoring the error here fixes the issue on the 7900 XTX. I would try to prepare a patch or workaround, but I'm not sure how this would impact RDNA2 or older cards. Lines 378 to 381 in f55db81
|
@In-line could you post the full error that happens when you try to apply settings as well as your |
@ilya-zlobintsev Already fixed it myself in #279 |
Uhm, I have the problem that, regardless of which game I start with LACT, my GPU is stuck at 100% usage. The GPU clock is kind of "locked" around 2200 MHz and the voltage stays around 750 mV. Once I change ANY setting and apply it while the game is running, that "lock" is lifted and the GPU seems to ignore any settings made with LACT. |
@dinotheextinct More info please. Kernel version, mesa version, LACT version, distribution, etc.. |
Is the info not in the sysfs snapshot? |
Kernel 6.8.8-1-default |
@dinotheextinct You're using version 0.5.3 of LACT. The RX 7900 has known problems with it; update to the latest version. This is what I fetched from info.json in the sysfs snapshot.
|
sorry I just updated it, the issue is exactly the same after updating, I just attached the sysfs snapshot again, but like I said issue is the same: |
Hello, I wanted to add some information; I don't know if it will be useful. I bought an AMD 7800XT Sapphire Nitro+ graphics card but I can't control the fans. I noticed that other users have reported the same situation here. I'm new to this and don't know exactly how to report it, so I'll put as much information as possible here. Thank you. I am using the BazziteOS operating system ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:stable |
@JosephM0on please check if the problem is resolved in the test build |
@ilya-zlobintsev Sorry for the delay in responding, I'm just leaving work now, I'll check as soon as possible, Thank you |
I performed the following steps It looks like it's still the same, do I need to do any more steps? |
Could you be more specific about "can't control the fans"? There's a known limitation to rdna3 fan control, is that the problem?
|
Haven't checked on this in a while, glad to hear that someone else also cares about this problem. @ilya-zlobintsev what do you mean by "unconfigurable"? Is it being worked on or abandoned? I know firsthand that it is configurable on Windows with AMD's official software (and the power limit is also correct for non-stock cards). |
Unconfigurable means that the GPU keeps the fan off below a certain temperature, even with a custom curve. There have been mentions of it on the drm issue tracker, but I don't believe there has been any recent activity regarding this, unfortunately. |
Sorry, I didn't know that this was a limitation. I thought it was possible to control the fan speed as soon as the values are entered, as for example in the image I posted in my last comment. |
I don't know if this is related to this issue, but when maxing my power limit through LACT (402 W), my average power draw lands around 350 W instead of 402 W under full load. This isn't the case when using CoreCtrl: when setting the power limit to 402 there, my 7900XTX draws 402 W on average under full load. As a consequence of this behavior, my overclock on my 7900XTX Pulse becomes unstable under LACT compared to CoreCtrl, which leads to artifacting in certain games. This would be my system:
|
I've upgraded to a Sapphire Nitro+ 7800XT today. This thread really helped relieve my worries that this card might be broken, because the fans seemed to do weird things :D I can confirm the problem that the fans start at some temperature which the card has in its firmware. The point at which the fans start is definitely not related to power draw or usage, but to one of the three temperatures (junction / mem / normal). I'll test around to see when my card turns its fans off and on and share some results. Changing the power limit seems to work fine for me. At least it's set without errors. I can't check this well (physically), because at 100% GPU utilization the card never pulls more than 190 W. What I did notice: there has to be some reporting error in the clock speeds or voltage:
Fiddling around with the clock speeds on "Automatic", it appears that 2.15 GHz leads to ~145 W power draw. So there really is some clock speed reporting error in the "Lowest Clocks" mode. I tested this in the pause menu of Horizon Zero Dawn, which keeps rendering the still image over and over.
Update 1: Overclocking way above 190 W doesn't work for my card. Even though "Minimum GPU Clock" and "Maximum GPU Clock" can be set up to 5 GHz (at least the sliders go to these values), setting the minimum clock above 2.565 GHz results in an error. At this clock speed my GPU consumes just slightly more than 190 W, but not enough to reliably test whether applying the power target works.
Update 2: After a lot of testing, I am 100% certain that my GPU uses the "junction" temperature as the determining factor for fan speed. This would explain why my fan settings felt so weird: the junction temperature behaves non-linearly relative to the edge temperature but is always at least 8°C hotter. Additionally, the fans turn off at ~55°C junction but turn back on at ~65°C. This behavior is so annoying, because I can hear the fans ramp up in 20 s intervals in some games. Can't wait for AMD to give us full access to the fans. Maybe a fan curve that's controlled by the average wattage of the last 5 s would be more intuitive? |
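To watch the junction temperature while testing where the fans switch on and off, a small helper that looks up a hwmon sensor by its label may be handy. This is a sketch under assumptions: it relies on the hwmon convention of `tempN_label` / `tempN_input` files with values in millidegrees Celsius, and on the amdgpu driver labelling its sensors `edge`, `junction` and `mem`. The function name `read_temp_by_label` is hypothetical.

```shell
# read_temp_by_label <hwmon_dir> <label>
# Prints the temperature (whole °C) of the sensor whose tempN_label matches.
# Returns non-zero if no sensor with that label exists.
# Assumption: hwmon reports tempN_input in millidegrees Celsius.
read_temp_by_label() {
    local hwmon_dir="$1" wanted="$2" label_file input_file
    for label_file in "$hwmon_dir"/temp*_label; do
        [ -r "$label_file" ] || continue
        if [ "$(cat "$label_file")" = "$wanted" ]; then
            input_file="${label_file%_label}_input"
            echo $(( $(cat "$input_file") / 1000 ))
            return 0
        fi
    done
    return 1
}
```

For example, `read_temp_by_label /sys/class/drm/card1/device/hwmon/hwmon0 junction` uses the path from the manual script earlier in this thread; the card and hwmon indices differ between systems.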
@SeekerOfAsh could you please make a:
And link both of them here to see the difference in how the settings are applied. |
Hi, I'll try to test once I have the time. But it seems the bug is not unique to LACT; instead it's the kernel driver misbehaving. CoreCtrl exhibits the same issue when a profile is activated, resulting in an average power draw of only 350 W. For the correct power limit to be active, I need to apply a different power limit and then change back to 402 W (the max power limit) while the profile is active. This results in the correct average power draw of 402 W under 100% load. |
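The "apply a different limit, then change back" workaround described above could be scripted roughly as follows. This is a sketch only: `reapply_power_cap` is a hypothetical helper, the choice of "one watt lower" as the intermediate value is an assumption (the comment does not say which value was used), and the microwatt conversion follows the `power1_cap` write in the manual script earlier in this thread.

```shell
# reapply_power_cap <power1_cap_file> <watts>
# Writes a slightly lower cap first, then the desired cap, to force the
# driver to re-apply the limit.
# Assumption: any different intermediate value triggers the re-apply;
# "watts - 1" is an arbitrary choice.
reapply_power_cap() {
    local cap_file="$1" watts="$2"
    echo "$(( (watts - 1) * 1000000 ))" > "$cap_file" || return 1
    echo "$(( watts * 1000000 ))" > "$cap_file" || return 1
}
```

Whether a plain sysfs write is enough, or the change has to go through the tool that owns the active profile, is not something this thread confirms.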
Apparently someone made a patch that adds a zero RPM setting: https://gitlab.freedesktop.org/drm/amd/-/issues/3489#note_2626120 Hopefully it's upstreamed, then this can become a setting in LACT. |
Reading through this issue and https://gitlab.freedesktop.org/drm/amd/-/issues/2356, it sounds like power limit overclocking should be fully functional on the latest 6.12 kernel? However, on my ASRock Challenger 7900 GRE I can only set the power limit to 280 W, while on Windows it can be set to 300 W. Is this expected? |
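To see which maximum the driver itself reports, independent of any GUI, the hwmon `power1_cap_max` file can be read directly. A minimal sketch, assuming the standard hwmon unit of microwatts (the same unit the manual script above uses when writing `power1_cap`); `max_power_cap_watts` is a hypothetical name.

```shell
# max_power_cap_watts <hwmon_dir>
# Prints the driver-reported maximum power cap in whole watts.
# Assumption: power1_cap_max is expressed in microwatts.
max_power_cap_watts() {
    local hwmon_dir="$1" microwatts
    microwatts="$(cat "$hwmon_dir/power1_cap_max")" || return 1
    echo $(( microwatts / 1000000 ))
}
```

If this prints 280 on the card in question, the limit comes from the driver or firmware rather than from LACT.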
Note that I do have the kernel parameter amdgpu.ppfeaturemask=0xffffffff.
sensors also shows 0).
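Several reports in this thread depend on that kernel parameter being active, so it can be worth confirming that it actually reached the kernel command line. A small sketch; `has_kernel_param` is a hypothetical helper that does an exact, whitespace-delimited match on a command line string.

```shell
# has_kernel_param <cmdline_string> <param>
# Succeeds if <param> appears as a whole space-separated word in the
# given kernel command line string.
has_kernel_param() {
    local cmdline="$1" wanted="$2"
    case " $cmdline " in
        *" $wanted "*) return 0 ;;
    esac
    return 1
}
```

On a running system the check would be `has_kernel_param "$(cat /proc/cmdline)" amdgpu.ppfeaturemask=0xffffffff && echo "parameter is set"`.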