NFSv4 mount fails with "operation not permitted"

Asked by skullnobrains

I'm using NFSv4 over TCP.

tcpdump output shows:
Flags [P.], seq 365:489, ack 141, win 229, options [nop,nop,TS val 3498408 ecr 486410650], length 124: NFS request xid 4189983984 120 getattr fh 0,0/24
Flags [P.], seq 141:217, ack 489, win 122, options [nop,nop,TS val 486410655 ecr 3498408], length 76: NFS reply xid 4189983984 reply ok 72 getattr ERROR: Operation not permitted

I'm using a Synology NAS, and I have reason to believe the access rights are properly configured, as they are cloned from working rules for other hosts on the same export:
- read only
- squash all users to admin
- non-privileged ports allowed
- cross mounts denied
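On a plain Linux NFS server, the same policy would look roughly like the /etc/exports sketch below. The path, the client network, and the uid/gid that Synology's "admin" account maps to are assumptions, not taken from the actual NAS config.

```shell
# /etc/exports — hypothetical equivalent of the GUI rules above
# ro            = read only
# all_squash    = squash all users; anonuid/anongid pick the target account
#                 (1024/100 is a guess for "admin" — check with `id admin`)
# insecure      = non-privileged source ports allowed
# (no crossmnt) = cross mounts denied
/volume1/share  192.0.2.0/24(ro,all_squash,anonuid=1024,anongid=100,insecure,sec=sys)
```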

Other mounts are performed through a VIP + source NAT.

In this case, there are two different layers of port redirection (one Checkpoint firewall and one regular end-user internet box), followed by both destination and source NAT performed by a pfSense firewall.

EDIT for clarification:
CLIENT > Checkpoint > internet > box > pfSense > SERVER
- the Checkpoint performs source port and address translation, like a regular outgoing NAT firewall
- the internet box performs a simple destination address translation: the source address and both source and destination ports are untouched
- the pfSense firewall translates both source and destination addresses; the destination port is untouched. I'm unsure about the source port, but it is changed by the Checkpoint anyway

The network part does work: I can successfully sniff packets in both directions along the way, and I can connect to port 2049 with netcat.
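For reference, that reachability check can be as simple as a netcat probe (the server address is a placeholder):

```shell
# verify the TCP path to the NFSv4 port through all the NAT layers
nc -vz 192.0.2.10 2049
```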

I assume this setup breaks something in the NFS protocol internals. I have tried many mount options, including specifying sec=sys and various combinations of addresses.
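The kinds of mount attempts described would look roughly like this (server address, export path, and mountpoint are placeholders):

```shell
# plain NFSv4 mount over TCP
mount -t nfs4 -o ro,proto=tcp 192.0.2.10:/volume1/share /mnt/share
# forcing AUTH_SYS explicitly
mount -t nfs4 -o ro,proto=tcp,sec=sys 192.0.2.10:/volume1/share /mnt/share
```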

Ideas are welcome.

Thanks for your time.
noci (Software Engineer, Distinguished Expert 2018)

If the settings are cloned, were the host entries also adjusted for the new system?
Does the hostname also map to the right IP address? (The IP address is what is actually used.)
I'm using the IP and correctly reaching the server. The NFS exports also use IPs, and I've tried adding ACLs for all candidate source IPs: the original one, the one NATed by the Checkpoint, and the one NATed by pfSense. The server answers a few NFS queries and then rejects the "getattr fh 0,0/24", whatever that is.

Googling suggests that query is related to NFS security negotiation, but I'm unsure about that information.

The only thing I can think of that would mess things up is that the source port is changed by the Checkpoint firewall, which I have no control over. Additionally, the port is translated from a privileged port to a non-privileged one.

This is an NFS issue, probably caused by the networking mess. Any insight into the negotiation performed by NFS during mount initialisation?

I'm back to tackling this issue.
The 24 filehandle apparently is returned by a previous NFS request reported as type 40 (likely decimal) in tcpdump.

For now I fail to find any decent documentation on NFS request types, but my hunch is that the XATTR_READ request on the root directory fails.
Setting "fsid" to "root" in the exports changes the message to "reason given by server: no such file or directory".
I'm unsure why. Insight is welcome.

... and obviously, insight regarding the new issue as well.

I'll post more debug output later, when I get a chance to get back to this issue.
Got it running some time ago, pretty much on the first try: I used "/" in the mount command.

For some reason, sticking "fsid=root," into the exports (which can only be done manually on the Synology NAS) does the trick, and I had the mount command wrong, hence the "not found" error I got before lunch break.

Just to be clear: I did check, and the mount does not work without fsid.

Additionally, the address that needs to be set in the NFS exports is the pfSense's address, which makes sense. I did not test whether the client's real address is required as well, since I'm keeping it in the config anyway: I expect to migrate to a less messy setup at some point.
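Putting the working configuration together, it presumably looks like the sketch below (the export path and the pfSense address are placeholders; on the Synology, /etc/exports has to be edited by hand, and the GUI may overwrite it):

```shell
# server side, /etc/exports: fsid=root turns this export into the NFSv4 pseudo-root
/volume1/share  192.0.2.1(ro,all_squash,insecure,fsid=root)

# client side: with fsid=root, the mount target is the pseudo-root "/",
# not the real server-side path
mount -t nfs4 -o ro,sec=sys server:/ /mnt/share
```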

I'm keeping the thread open in case anyone figures out why this option is required only with the messy NAT setup described above, while direct mounts or less messy NAT do not require it; otherwise, I will accept this post as the answer.

@noci: that was not the issue, but thanks for your time.
noci (Software Engineer, Distinguished Expert 2018)

The actual cloning process could very well be the source of this. Due to cloning, the UUIDs of the filesystems would be identical, so the automatic fsid would be equal as well.
Setting fsid overrides this automatic mechanism; see man 5 exports for more info:

    NFS needs to be able to identify each filesystem that it exports. Normally it will use a UUID for the filesystem (if the filesystem has such a thing) or the device number of the device holding the filesystem (if the filesystem is stored on the device).

    As not all filesystems are stored on devices, and not all filesystems have UUIDs, it is sometimes necessary to explicitly tell NFS how to identify a filesystem. This is done with the fsid= option.

    For NFSv4, there is a distinguished filesystem which is the root of all exported filesystems. This is specified with fsid=root or fsid=0, both of which mean exactly the same thing.

    Other filesystems can be identified with a small integer, or a UUID which should contain 32 hex digits and arbitrary punctuation.

    Linux kernels version 2.6.20 and earlier do not understand the UUID setting, so a small integer must be used if an fsid option needs to be set for such kernels. Setting both a small number and a UUID is supported, so the same configuration can be made to work on old and new kernels alike.
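As a sketch of what the man page describes (paths are made up): fsid=0 marks the NFSv4 pseudo-root, and other exports get distinct small integers.

```shell
# /etc/exports
/export        *(ro,fsid=0)   # NFSv4 pseudo-root (same as fsid=root)
/export/data   *(ro,fsid=1)   # explicit small-integer fsid
/export/media  *(ro,fsid=2)
```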
Regarding cloning: I've been cloning the CLIENT, and the mount command is actually the same, but the mount itself is different, since the external IP is different. I've tried the same mount from a different machine (which never had that NFS mount) at the same location and got the same results. This possibility seems inconsistent with other tests as well, but I'll dig just to make sure... which is quite a pain, since I have no idea how to print the UUID of an NFS filesystem from the client side.

In the current case, the mount will not work without fsid, and it only works by mounting "server:/" without a path.
For some reason, setting fsid to something which is not root or zero produces the same results.

Without fsid=, NAT won't work (except in some lucky cases), and with fsid=, I become unable to specify the filesystem path in other mounts and have to use server:/ as well ^^

I'll dig into that a little more: maybe it only breaks specifying the path when 0 is specified, and other values would work as expected.

I'm very unsure, but it seems this behavior is due to a regression introduced by Red Hat in 2.6.38. I'm under the impression that the initial getattr command that is rejected is expected to produce the fsid of the mountpoint's root (which for some reason does not work over NAT setups), and setting fsid manually simply circumvents the bug.

Still wondering how to produce a cleaner setup. Maybe passing the UUID in the mount target would work, though I'm unsure if and how that would be feasible. At best, I'd like the mount to work remotely without setting fsid manually, since that setting will be trashed any time a modification is performed through the server's GUI (it's a Synology NAS).

@noci: thanks a lot for your time and explanations. Though they did not provide an actual answer (yet?), they were helpful.

Regarding "setting fsid to something which is not root or zero produces the same results":

- setting fsid to root/0 allows the mount to work with server:/ ; any other path produces "no such file..."
- setting fsid to anything else produces the initial behavior: "Operation not permitted"

Ways to circumvent that bug client-side are welcome. I've been told the FUSE version of NFS might work better; I have not tested it yet. For now, I'm considering switching to WebDAV for all read-only access, which is kinda dumb but at least would allow predictable firewall traversal.
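To summarize the observed matrix as mount commands (the server name and paths are placeholders; the error strings are the ones reported above):

```shell
# with fsid=root (or fsid=0) on the server:
mount -t nfs4 server:/ /mnt              # works
mount -t nfs4 server:/volume1/share /mnt # fails: No such file or directory
# with fsid unset, or set to any other value:
mount -t nfs4 server:/volume1/share /mnt # fails: Operation not permitted
```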
noci (Software Engineer, Distinguished Expert 2018)

For NFSv4, mounting is done differently; the next might help here.
The whole thing is intentional, so that a single mount of server:/ keeps links across separate directories in the filesystem working.

Another question: are the clocks synced on both client and server? That might be another item to consider. It probably isn't the case here, but just to be sure.
Hmm... I get the idea.

Nevertheless, mounting specific directories does work without NAT as expected, and does work with other clients (though I did not try the exact same situation; I remember using NFSv4 over NAT on a BSD host, and I don't recollect encountering similar issues, but that was a while ago). Additionally, I believe that limiting NFSv4 mounts in such a way is plain dumb, and I fail to see why RHEL's beliefs should prevent me from mounting a bunch of separate exports wherever I want, without bothering with bind mounts in subdirectories of the exported pseudo-root, which is both inconsistent with NFSv3 and not supported on the Synology NAS.

... but I'll work with this piece of information. I see how to produce a clean config on a regular server; I'll have to dig into what can be done with the Synology NAS without manual toying.

Yes, the clocks are reasonably synced.
A little insight:

The initial mount works over NAT if and only if the source port is preserved (no fsid server-side and full path client-side), but the access rights that apply are those of the NATed IP. I assume this is a heritage from NFSv3.
Since I only use a reasonable number of machines, I'm not really afraid of collisions, and I should even be able to set fixed source ports on each of them should that become necessary.

I'll probably work my way through this mess with a VPN (yeah, that's overkill, but I need it for other reasons) to circumvent two of the NAT layers, and only keep a VIP (required since the network cannot be reached through the VPN) with an additional layer of source NAT (required since the server's default route is on a different device) with a fixed source port... ^^ Believe it or not, I like to KISS/LEAN ;)

I'm not expecting input from anyone else, so unless you have insight into why the fixed source port is a requirement, and possibly how to change that behavior, I'll be closing the question shortly.

PS: I still believe this is a bug. How it should work is a matter of personal opinion, but having that much of a behavior difference when changing something rather unrelated is most definitely a bug.
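One client-side knob that may be worth trying under the source-port hypothesis (a test suggestion, not something verified in this thread): the noresvport mount option makes the Linux client deliberately bind a non-privileged source port, which requires the server to allow that (the "insecure" export option, i.e. Synology's "non-privileged ports allowed").

```shell
# rule out the privileged-port check: the client binds a non-privileged source port
# (only meaningful if the export allows it, e.g. via the "insecure" option)
mount -t nfs4 -o ro,noresvport server:/ /mnt/share
```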
noci (Software Engineer, Distinguished Expert 2018)

Hm, are you aware of this:

(RPC will use those ports, so you may need to fix them instead of leaving them dynamic.)
That is something from NFSv2/NFSv3.
NFSv4 uses port 2049, and if TCP is used there should be no concerns about firewall ports.
Yeah: those apply to NFSv3.

NFSv4 actually does use a second port the other way round, for direct/fast(/useless?) access to files, but it is not mandatory for regular operations.

"there should be no concerns on firewall ports"

Agreed; that's the whole point of moving away from RPCs, which no firewall except ipf would handle properly.

Nevertheless, the Linux client somehow cannot grab the fsid when the source port is mapped to a different one.
I've double-checked, and I'm 100% sure that changing the client port is what messes things up.
I did not check, but I believe this is a Linux-only issue.

I don't have time to dig into the source code, and won't have for weeks, so I'll be closing this thread without a complete answer.

Regarding my initial problem, I've solved it with an overkill VPN (not a problem given the use case: I need the VPN for other reasons, and poor performance is not an issue in this context).

@noci, thanks for your help. I'll try to award some points without misleading future readers.
@noci: thanks. Though you did not provide a solution, I believe your comments made sense and might be helpful to future readers as well.
Extra thanks for not posting misleading, foolish, or hugely unrelated information.

Note: I'm a little baffled that I cannot tag some of my own comments as helpful (obviously for no points, should anyone care) so that future readers would benefit from the useful information they contain.
