Replies: 18 comments 65 replies
-
If you're running Turing or Volta, then your best bet currently is to use nouveau; it will set the clocks to the lowest it can!
-
You actually can undervolt on Linux. It's just that the way you do it is not what you'd have guessed. This post describes the process, but the basic idea is that, due to how overclocking works on modern GPUs, overclocking is undervolting. The GPU has a set of pre-defined voltage levels, and each voltage level has a range of GPU speeds - when the speed moves outside of the range, the voltage is moved up or down accordingly. So when you overclock, you are increasing the allowed GPU speeds for each voltage level - that means that for a given MHz, the voltage will be lower - and that's undervolting.

The next observation is that these GPUs have a power limit (eg: 350W for a 3090 FE), so the GPU will run as fast as it can within that power limit. So, if you overclock, which is undervolting, you will be running at a higher speed when you hit the power limit - unless you overclock too much and the GPU crashes first. But this means you will not save any power by undervolting if you don't do something about the power; you'll just run faster. To control power, you have two mechanisms - you can reduce the max speed and you can reduce the power limit - and both have their uses.

Setting the power limit: If you reduce the power limit (eg: 300W instead of 350W in my example), then you will obviously save power, and this will happen whether you overclock/undervolt or not. But when you hit the power limit, it will be at some speed, and GPU behaviour at the power limit is often spiky, with the speed changing constantly as it bumps up and down. Overclocking/undervolting lets you reach a higher speed before you hit the power limit.

Setting the max speed: On the other hand, if you set the max speed, then the GPU will not go any faster, regardless of the power limit. This gives you a smooth, constant speed but perhaps leaves some performance on the table, as the power consumption at max speed varies depending on the work the GPU is asked to do. Some 10 year old game can hit max speed at 200W while a brand new one might hit 300W, etc.

If your goal is to reduce power consumption without giving up performance, then the general process is to run whatever you consider your representative workload to be (your most resource-intensive game, etc). While it's running, observe the max speed reached before/at the point where you hit the power limit; you can use a monitoring tool for this. Now you set your max speed to that value and begin experimenting with overclock values that allow you to hit that speed while staying under the power limit and without crashing. If you're not absolutely focused on always running at lower power, you might set your goal as getting your most intensive game to run at your target speed just at the power limit - then most of the time you'll be well below it, but you still allow full power to be available in the most extreme case. If you want to always reduce power, then set your reduced power limit and then iterate to find your right overclock.

Note that your goals might be incompatible. It might turn out that the overclock required to hit your max speed at your desired power level is unstable and you crash. Then you have to decide whether to reduce the max speed even more or increase the power level. But something has to give. Clock offsets are always multiples of 15MHz; while you can specify a value in between, it will be rounded down.

The commands: First, you must turn on Persistence Mode, or the overclock will reset when work units change (which is often). Then, set your max speed. In my example, I'm setting it to 1830MHz. Then, you set your overclock, which is done as a clock offset. In my example, the best overclock I can set that allows me to hit 1830MHz in my worst-case workload is +240MHz. In most games I could go higher, but I don't want to micromanage it. In all of this, I haven't talked about voltages, because it's not necessary information, but if you want to see what voltage level you are using, you can query it through nvidia-settings. Unfortunately, this isn't exposed in nvidia-smi. So, there you go.
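As a rough sketch of that sequence (persistence mode, clock cap, clock offset, optional power limit) in Python via the pynvml bindings that come up later in this thread - GPU index 0, the 210MHz floor and the nvmlDeviceSetGpcClkVfOffset entry point are assumptions here, not something the comment above specifies, the offset call needs a fairly recent driver and nvidia-ml-py build, and the whole thing has to run as root:

```python
# Rough sketch only, not the poster's exact commands: the persistence-mode /
# clock-cap / clock-offset / power-limit steps driven through pynvml
# (pip install nvidia-ml-py). The 1830MHz cap, +240MHz offset and 300W limit
# are the example numbers from the comment above; GPU index 0 and the 210MHz
# floor are assumptions.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# 1. Persistence mode, so the settings survive between work units
#    (CLI equivalent: nvidia-smi -pm 1).
pynvml.nvmlDeviceSetPersistenceMode(gpu, 1)

# 2. Cap the core clock at the target speed
#    (CLI equivalent: nvidia-smi -lgc 210,1830).
pynvml.nvmlDeviceSetGpuLockedClocks(gpu, 210, 1830)

# 3. Apply the clock offset - the "overclock that is really an undervolt".
#    Needs a recent driver/nvidia-ml-py; otherwise nvidia-settings'
#    GPUGraphicsClockOffsetAllPerformanceLevels attribute is the usual route.
pynvml.nvmlDeviceSetGpcClkVfOffset(gpu, 240)

# 4. Optionally lower the board power limit as well (value is in milliwatts).
pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 300_000)

pynvml.nvmlShutdown()
```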
-
@philipl thank you for the hint with the clock-offset approach. It would be really cool if undervolting support could be directly integrated into nvidia-smi or so. It would be even better if undervolting were not necessary in the first place and the voltage curve were more efficient out of the box. Any reason why this is the case?
-
I really hope this open driver has proper power management when it gets implemented. The proprietary driver's power management is ultra trash, rendering Linux + Nvidia unusable in some cases, especially on laptops.
-
Still, we have to waste power and put up with the fan noise.
-
Bumping this because I would really like to see support for undervolting in Linux, especially for my small-form-factor 7.4L gaming rig with a 3070. Things get toasty.
-
Up!
-
Memory temperature readings, like HWiNFO shows, would be nice too.
-
Undervolting my Nvidia card is the last feature that is missing for me under Linux. If this were implemented, I could finally delete Win10. I just keep it for gaming, because there I can undervolt my RTX 3070 and save 90W.
-
As we are getting close to the end of 2023 and Linux has seen significant growth: any news on this?
-
Proper undervolting support already exists. Ask Linux's "many" programmers to make an app for you.
-
So we only have the clock-offset workaround?
-
I created a simple script, which technically does an undervolt by applying offsets based on temperature. At the moment it works with one GPU. Also, I was wrong that the instability comes from the driver - it's in the nature of FinFETs and power MOSFETs. Plus, my MOSFETs can't provide more than 150A; the controller limits that. On one side that's good for their temperature, on the other: the lower the voltage, the lower the stress on the MOSFETs and, in theory, the higher the stability.
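Not the author's actual script, just a minimal sketch of the idea as described - temperature-dependent clock offsets applied through pynvml. The thresholds, offsets and poll interval are invented illustration values, and nvmlDeviceSetGpcClkVfOffset again assumes a recent driver and nvidia-ml-py build:

```python
# Minimal sketch: poll the core temperature and pick a clock offset from a small
# table, so the card is pushed harder when cool and backed off when hot.
import time
import pynvml

# (temperature ceiling in C, clock offset in MHz) - offsets are multiples of 15.
OFFSET_TABLE = [(60, 240), (75, 180), (999, 120)]

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

current = None
try:
    while True:
        temp = pynvml.nvmlDeviceGetTemperature(gpu, pynvml.NVML_TEMPERATURE_GPU)
        offset = next(off for ceiling, off in OFFSET_TABLE if temp <= ceiling)
        if offset != current:  # only touch the driver when the offset changes
            pynvml.nvmlDeviceSetGpcClkVfOffset(gpu, offset)
            current = offset
        time.sleep(2)
finally:
    pynvml.nvmlShutdown()
```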
-
I got a question for folks here. I have a pair of NVLinked 3090s in my workstation running Ubuntu 20.04 desktop, and I use it primarily remotely, so the monitor isn't plugged in (well, I do have a PiKVM, so one "monitor" is plugged in and X11 is active; that will not change, as it's needed for certain admin situations). Typically when I access it over SSH, nvidia-smi usually reports the core clock at 0MHz instead of the usual minimum of 210MHz. When I apply my undervolting nvidia-settings calls as detailed above to set the clock range, it makes the GPUs draw 44W or so instead of 23W at idle. Anyone know what's up with this? Do I just have to put up with it and choose between lower power consumption when running workloads or lower idle power consumption, and just can't have both? I'll also note that my 3080 Ti tends to stay at 210MHz idle all the time but also draws in the realm of 20-something watts at idle. Though I reckon this is due to not having RAM on the backside of the card; possibly if I had 3090 Tis, with a similar one-sided VRAM config, there'd be an idle power efficiency gain.
-
@unphased, it looks like your GPUs are not switching to the "Level 0" power level. That idle frequency is identical for a lot of Nvidia GPUs; there is no "ZeroCore power" like on AMD GPUs, so Nvidia GPUs always run at an idle frequency of around 210MHz. There was some discussion on the forums about that.
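For comparing the two idle behaviours, a small sketch using standard pynvml queries that prints the performance state, graphics clock and power draw of each GPU - the expectation that a fully idle board sits in P8 is an assumption, not a guarantee for every card:

```python
# Print per-GPU performance state, graphics clock and power draw, to see
# whether a card actually drops into its deep idle state.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
    pstate = pynvml.nvmlDeviceGetPerformanceState(gpu)      # 0 = P0 ... 15 = P15
    clock = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_GRAPHICS)
    power = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0    # NVML reports mW
    print(f"GPU {i}: P{pstate}, {clock} MHz, {power:.1f} W")
pynvml.nvmlShutdown()
```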
-
Reimplemented the script in Python to use pynvml. Result: vkcube - no problems at all; real games - a stuttery mess. IDK what to do next. I don't have the knowledge to rewrite the code in C.
-
If nvidia could simply give a method / API to view and set the frequency target for each voltage point, we could easily have everything solved. Undervolting? Done. Easily make a script with an array of frequencies (which you could create yourself, or just grab from MSI Afterburner on Windows) to set at each voltage. Overclocking? Easy. Just loop through each voltage, view its frequency target, and apply that target with a user-supplied offset. AMD already has this solved. Now it's your turn. We need Nvidia to be viable under Linux, but with these clear missing features, it's hard to justify purchasing a new Nvidia GPU. You guys have done a really good job on Explicit Sync and the drivers as a whole for the past year or two. There's only very little left to make Nvidia a recommendable option under Linux. However, the little things that are left are some of the biggest bugs / issues / missing features.
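Purely as an illustration of how little tooling would be needed: the functions below (get_vf_curve, set_vf_point) are hypothetical and do not exist in any NVIDIA API today - they only stand in for the kind of interface being asked for, roughly comparable to what AMD exposes through pp_od_clk_voltage in sysfs:

```python
# Hypothetical sketch: shows how small the user-side tooling would be if the
# V/F curve were exposed. Neither get_vf_curve nor set_vf_point exists today.
from typing import List, Tuple

def get_vf_curve(gpu_index: int) -> List[Tuple[int, int]]:
    """Hypothetical: return (voltage_mV, frequency_MHz) points for the GPU."""
    raise NotImplementedError("no such API is exposed on Linux today")

def set_vf_point(gpu_index: int, voltage_mv: int, frequency_mhz: int) -> None:
    """Hypothetical: pin a frequency target to a voltage point."""
    raise NotImplementedError("no such API is exposed on Linux today")

def apply_offset(gpu_index: int, offset_mhz: int) -> None:
    # "Overclocking? Easy": shift every point of the curve by a user offset.
    for voltage_mv, frequency_mhz in get_vf_curve(gpu_index):
        set_vf_point(gpu_index, voltage_mv, frequency_mhz + offset_mhz)

def apply_curve(gpu_index: int, curve: List[Tuple[int, int]]) -> None:
    # "Undervolting? Done": replay a curve built by hand or exported elsewhere.
    for voltage_mv, frequency_mhz in curve:
        set_vf_point(gpu_index, voltage_mv, frequency_mhz)
```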
-
I am planning to switch to Linux in the near future and right now I'm in a thorough "getting ready" phase. So, in preparation, I made a Python script that can achieve an "undervolt" using cheap trickery. Don't judge my code - I don't work with Python at all, but it was the only suitable language to use in this case. I am posting this in good faith that someone picks up on it and develops a proper application on top of it.
-
Is your feature request related to a problem? Please describe.
On Windows it is possible to use third-party utilities like MSI Afterburner to undervolt GPUs; on Linux this functionality does not exist, even through nvidia-smi and nvidia-settings, from which it was removed. Utilities such as gwe also cannot support this because it is not exposed: https://gitlab.com/leinardi/gwe/-/issues/118
Describe the solution you'd like
Offer the functionality to be able to undervolt supported GPUs