Double-check that your NFS timeouts to your NAS aren't an NFS problem. They might be a dirty page writeback problem.

I'm really sorry in advance for the wall of text here. I debated trimming this down, but honestly the whole reason I spent months stuck on this is that nothing about it was obvious. The symptoms point you at NFS, your mount options, your network, everything except what's actually wrong. And because the defaults that cause it ship with basically every Linux distro, I'd bet money there's a ton of people out there with the same problem right now, just blaming their NAS or Jellyfin or whatever. For all I know this is common knowledge and I'm just the last person to figure it out, but on the off chance somebody else is out there googling the same NFS timeout errors I was, here's the full story. (TL;DR below.)

I've been chasing NFS issues on my Proxmox cluster for months now, and I finally found the actual cause, and it wasn't anything I'd seen anyone talk about online. Figured I'd write it up because I guarantee other people are hitting this exact same wall.

The setup: half a dozen VMs on Proxmox, all mounting a Synology NAS over NFS. Jellyfin, Audiobookshelf, Sonarr, Radarr, the usual self-hosted media stack. Things would work fine for a while and then randomly go sideways. Jellyfin stops mid-playback. Audiobookshelf loses track of where you were. Sonarr tries to import a downloaded episode and the entire container locks up. dmesg fills with "nfs: server 192.168.1.50 not responding, timed out" and you're rebooting things again.

The part that kept me going in circles for so long is that it was never consistent. An audiobook would stream for hours without a hiccup, but then Sonarr would try to move a 4GB episode file and the whole mount would go down. I could ls the mount and browse around just fine even while Sonarr was hung. Small file operations worked. Large writes didn't. But not always: sometimes a big import would go through without a problem, and I'd convince myself whatever I'd just changed in my mount options had fixed it.

I went through all the usual advice. Switched from NFSv4 to NFSv3, which I was especially convinced was the fix because the timing lined up with when I'd been experimenting with v4. It wasn't. I toggled nolock, tuned rsize and wsize down from 128K to 32K, tried soft vs hard mounts, checked the Synology's HDD hibernation settings, disabled TCP offloading on the virtio NIC. Nothing actually fixed it. Every time I thought I had it, the next import that was over the threshold would fail and I would scream.

Then at one point I gave a couple of the VMs more RAM, thinking the media workloads could use the headroom. Everything got worse after that. Like, measurably worse. I didn't connect the two at the time.

What finally cracked it was running a dd test to write a 2GB file to the NFS mount and actually watching the numbers. With the 32K buffer mount options, the write reported 2.1 GB/s. On a gigabit link. Obviously that data is not going to the NAS. The kernel was eating the entire write into the VM's page cache, saying "yep, done!" and then trying to flush 2+ GB of dirty pages to the Synology all at once. The NAS gets hit with a wall of data it can't process fast enough, NFS RPC calls start timing out, and everything goes to hell.
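
For reference, this is roughly the kind of test I mean (the mount path is just a placeholder, point it at wherever your NFS share is mounted):

# Without conv=fdatasync, dd reports how fast the page cache absorbed the data, not how fast the NAS took it
dd if=/dev/zero of=/mnt/nas/ddtest bs=1M count=2048
# With conv=fdatasync, the final flush to the NAS is included in the timing, so the number is real
dd if=/dev/zero of=/mnt/nas/ddtest bs=1M count=2048 conv=fdatasync

If you see a number like my 2.1 GB/s, it's from the first form, and that's the page cache talking, not your NAS.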

The default value for vm.dirty_ratio is 20, meaning the kernel will let dirty pages pile up to 20% of your RAM before it forces a writeback. On my 13GB VM that's 2.6GB of buffered writes. So the kernel would happily sit there absorbing data into RAM, and then try to shove 2.6 gigs down a gigabit pipe to the NAS all at once. And when I "upgraded" VMs with more RAM, I was literally raising the ceiling on how big that buffer could get. That's why things got worse. The inconsistency made sense too. A 700MB file might stay under the background flush threshold and trickle out fine. A 4GB season pack would blow past it and trigger the whole mess.
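
You can actually watch this happen, by the way. Kick off a big copy to the mount and watch the Dirty counter in /proc/meminfo climb toward that ceiling before the writeback storm starts:

# Dirty = data sitting in RAM waiting to be written out; Writeback = data currently in flight to the NAS
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'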

The fix

Two sysctl values:

# Cap total dirty pages at 64MB; start background writeback at 32MB
sysctl -w vm.dirty_bytes=67108864
sysctl -w vm.dirty_background_bytes=33554432

This caps the dirty page buffer at 64MB and starts background writeback at 32MB. Instead of hoarding gigabytes and flushing all at once, the kernel now pushes data out to the NAS continuously in small batches. Make it persistent:

# For distros using /etc/sysctl.d/ (Debian 12+, Ubuntu, etc.)
echo -e 'vm.dirty_bytes=67108864\nvm.dirty_background_bytes=33554432' > /etc/sysctl.d/99-nfs-dirty-pages.conf
sysctl -p /etc/sysctl.d/99-nfs-dirty-pages.conf

# For distros using /etc/sysctl.conf
echo 'vm.dirty_bytes=67108864' >> /etc/sysctl.conf
echo 'vm.dirty_background_bytes=33554432' >> /etc/sysctl.conf
sysctl -p
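
Either way, double-check that the values actually took:

# Both _bytes values should report the new settings; vm.dirty_ratio will read 0 because the _bytes variants override the ratio ones
sysctl vm.dirty_bytes vm.dirty_background_bytes vm.dirty_ratio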

Before: 2GB dd writes at 101 MB/s, dies at the 2GB mark with NFS timeouts and I/O errors. After: same test, steady 11.4 MB/s start to finish, zero NFS timeouts, completes cleanly. OK, yeah, the throughput number is lower, but I'll take a transfer that actually finishes over one that crashes every time.

I applied this across all six of my VMs that mount the NAS and the whole fleet has been stable since. They'd all been independently building up multi-gigabyte write backlogs and dumping them onto the Synology simultaneously. I was basically DDoSing my own NAS from six directions every time anything tried to write a big file.

Then I checked the Proxmox host itself. 128GB of RAM. Four NFS mounts to the same Synology, including the one Proxmox writes VM backups to. All hard mounts with the default dirty ratio. That's a 25GB dirty page ceiling on the hypervisor. Every scheduled backup was potentially building up a 25 gigabyte write buffer and then hosing the NAS with it in one shot. And because the mounts were hard, if the Synology choked during the flush, the hypervisor itself would hang, not just a VM. I don't even want to think about how many weird backup failures and unexplained freezes this was behind.

Since applying the fix I've also noticed that Jellyfin library scans are completing reliably now. They used to hang constantly and I'd just accepted that as normal Jellyfin-over-NFS jank. The scans were generating thumbnails and writing metadata, building up dirty pages, and triggering the same flush that would take down the mount mid-scan. Audiobookshelf was doing the same thing. It would scan libraries and randomly lose connection to the mounted paths. That one was harder to pin down because audiobook files and cover art are small enough that the writes wouldn't always push past the threshold on their own. But if another VM had already half-filled the NAS's tolerance with its own flush, Audiobookshelf tipping it over would be enough. Same underlying bug in every case, and I spent months blaming three different applications for it.

If you're running a media stack on VMs with NFS mounts to a NAS and you've been tearing your hair out over random timeouts, check your vm.dirty_ratio and do the math against your RAM. Bet you it's higher than you think.
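
If you want the math done for you, something like this prints the worst-case dirty page ceiling for whatever vm.dirty_ratio is currently set (it'll print 0 if you've already switched to the _bytes variants):

# Worst-case dirty page buffer = total RAM * vm.dirty_ratio / 100
ratio=$(sysctl -n vm.dirty_ratio)
awk -v r="$ratio" '/MemTotal/ {printf "dirty ceiling: %.1f GiB\n", $2 * 1024 * r / 100 / (1024^3)}' /proc/meminfo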

TL;DR: If your NFS mounts to a NAS randomly time out during large writes, your VMs are probably buffering gigabytes of dirty pages in RAM and then flushing them all at once, overwhelming the NAS. Symptoms in my case were Jellyfin stopping mid-playback and hanging during library scans, Audiobookshelf losing connection to mounted paths and forgetting playback position, and Sonarr/Radarr locking up completely when trying to import episodes. Set vm.dirty_bytes=67108864 and vm.dirty_background_bytes=33554432 on every VM (and the hypervisor) to cap the buffer at 64MB and force continuous small writebacks instead.


Edit 1: @deadcade pointed out that 11.4 MB/s is suspiciously close to a 100 Mbps link ceiling, and they were right. Checked the NAS LAN1 network status and it was negotiating at 100 Mbps... The NAS was plugged into my router, which has gigabit ports but was apparently negotiating down due to what I must assume is an issue with the router.

So the real solution: I went to Best Buy and grabbed a $20 gigabit switch, plugged the NAS and Proxmox host into it directly, and the Synology came up at 1000 Mbps immediately. The same 2GB dd test now completes at 107 MB/s from the host and 115 MB/s from the VM, no timeouts, totally clean.
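
If you'd rather check this from the Linux side than dig through the NAS UI, ethtool shows what a NIC actually negotiated (the interface name here is a placeholder, swap in your own):

# "Speed: 100Mb/s" on hardware that should do gigabit means something in the path negotiated down
ethtool eno1 | grep -E 'Speed|Duplex'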

So if I actually understand wtf is going on here... it was two problems stacked on top of each other this entire time.

The 100 Mbps link was the speed ceiling between the router and the NAS. The dirty page defaults were what turned that speed limitation into a catastrophic failure. The kernel would buffer gigabytes of writes and then try to flush them through a 100 Mbps pipe where the NFS RPCs would time out long before the data finished arriving. The sysctl fix worked because it accidentally rate-limited the client to roughly what the 100 Mbps link could handle. Fixing the link speed solved the actual bottleneck.

THANKS for the insight, deadcade!

Both fixes stay, though. A 64MB dirty page cap on a gigabit link still saturates the connection at 115 MB/s, and there's no reason to let a 128GB Proxmox host build up a 25GB write buffer aimed at a consumer NAS. Also check your link speeds.

Edit 2: Thanks again to everyone who chimed in with your fantastic insights and ideas.

  • Fast Nuclear Buildout: The Trump administration is rapidly rewriting rules to support the development of nuclear power plants.
  • Aligning With Industry: Staffers from DOGE are revamping rules in ways that ease regulations and provide financial breaks for industry.
  • “No Longer Independent”: Nuclear Regulatory Commission veterans say the administration is limiting oversight in dangerous ways.
submitted 4 days ago* (last edited 3 days ago) by War5oldier@lemmy.world to c/AskUSA@discuss.online

Putting it into perspective: the Swiss Franc is backed by mutual trust, which is something money can't buy (investors have confidence in economic stability during times of crisis), since it's not pegged to another currency or to gold, despite Switzerland having gold reserves. They even have a 1000 CHF bill ($1,269), so it's a strong currency in that sense; they barely circulate it outside the country.

Do YOU consider the US Dollar a safe haven currency? If it were, it would've received the same status as the Swiss Franc. The reason the CHF is strong is the trust and confidence in it, alongside a stable economic and political system. By comparison: how many Americans have confidence in their own currency? Does the USA have a "truly" stable political system?

The thing is, Switzerland is neutral, meaning they have no incentive whatsoever to become belligerents in foreign wars (something the USA can't stay away from, since they spend a LOT of money on the military). Their national debt is lower than it is in the USA (140m CHF / ~$179m), while America's debt, in comparison, has ballooned to around $38,200,000,000,000 if I recall.

It's also tied to their monetary policy (which is highly trusted), which is why they've managed to keep inflation relatively low, while inflation in America is a joke (no need to say how bad it is). Their interest rate is 0% (which can't be said for the US Federal Reserve at 3.75%), as Switzerland's goal is to ensure long-term price stability while the USA is more focused on promoting maximum employment.


Putin couldn't have picked a better tool to do it.


Today, AI is rapidly changing the way we build software, and the pace of that change is only accelerating. If our goal is to make programming more productive, then building at the frontier of AI and software feels like the highest-leverage thing we can do.

It is increasingly clear to me that Codex is that frontier. And by bringing Astral’s tooling and expertise to OpenAI, we’re putting ourselves in a position to push it forward. After joining the Codex team, we’ll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.


A former Palantir executive recently confirmed what many have long suspected. In a public statement, the whistleblower said it plainly: Palantir intended to take over the US government, and many of his former colleagues are now installed inside the federal apparatus. He called it an occupied nation. He is not alone. Thirteen former Palantir employees—engineers, managers, and a member of the company’s own privacy team—signed a letter shared with NPR warning that guardrails meant to prevent discrimination, disinformation, and abuse of power have been violated and are being rapidly dismantled.

What Palantir represents is something unprecedented: the convergence of American imperialism, Zionism, technofascism, and surveillance capitalism into a single instrument of control. Understanding how we got here requires looking at the machine Palantir has built, who built it, and what they believe.

Palantir was founded in 2004 by Peter Thiel and Alex Karp. Its first major investor was In-Q-Tel, the CIA’s venture capital arm, which seeded the company with millions and opened the door to every major intelligence and defense agency. The logic was deliberate: The American ruling class recognized decades ago that the state’s coercive power—surveillance, targeting, data harvesting—could be run more effectively and more profitably through private contractors. When a government agency surveils its own citizens, there are hearings, FOIA requests, oversight committees. When a private company does it, it is a trade secret.


Blowing up the largest oil field in the world would be absolutely mental, and therefore this is just a bluff, right? There's no way he would actually bomb South Pars off the map, because it would immediately cause permanent geopolitical shifts + Iran bombing Israel and then... lol


Probably a silly question, but the .uk domain is really cheap. If I'm not in the UK, can I still use that domain for my server without issue?

It's like 50 bucks for a ten-year lease.


https://archive.is/0fxqX

Can't the Pentagon just sell those $2,000 hammers and toilet seats and pull themselves up by their bootstraps?
