path: root/sys/kern/vfs_syscalls.c
Age  Commit message  Author
2018-04-28  Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@  (Visa Hankala)
2018-04-27  Move FREF() inside fd_getfile(). ok visa@  (Martin Pieuchot)
2018-04-03  Move FREF()s just after fd_getfile() in sys_kevent(), sys_lseek() and getvnode(). ok millert@  (Martin Pieuchot)
2018-04-03  Add proper FREF()/FRELE() dance in sys_fchdir(). The syscall doesn't sleep before a vnode reference is taken, so it doesn't strictly need the refcounts now. But they will soon be used for parallelism, so make it ready. ok bluhm@  (Martin Pieuchot)
2018-03-28  Call FREF() right after fd_getfile*() in pread(), pwrite() & co. This ensures that all operations manipulating a 'struct file *' do so with a properly refcounted element. ok visa@, tedu@, bluhm@  (Martin Pieuchot)
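
The reference-counting discipline described in the FREF()/FRELE() entries above can be sketched in userland C. This is only an illustration under stated assumptions: `struct file_sketch`, `fref_sketch()`, `frele_sketch()` and `getfile_sketch()` are hypothetical names, not the kernel's definitions, and the counter is a plain int rather than anything MP-safe.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file; only the count matters. */
struct file_sketch {
	int f_count;	/* number of active references */
};

/* Take a reference (plays the role of FREF() in this sketch). */
static void
fref_sketch(struct file_sketch *fp)
{
	fp->f_count++;
}

/* Drop a reference (plays the role of FRELE()); returns the remaining count. */
static int
frele_sketch(struct file_sketch *fp)
{
	return --fp->f_count;
}

/*
 * Look up a descriptor and take the reference immediately, so that every
 * later operation on the returned pointer works on a counted element.
 * Returns NULL for an out-of-range or empty slot.
 */
static struct file_sketch *
getfile_sketch(struct file_sketch **table, size_t nfiles, size_t fd)
{
	struct file_sketch *fp;

	if (fd >= nfiles || (fp = table[fd]) == NULL)
		return NULL;
	fref_sketch(fp);
	return fp;
}
```

A caller would pair each successful `getfile_sketch()` with exactly one `frele_sketch()` on every exit path, including error paths.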
2018-02-19  Remove almost unused `flags' argument of suser(). The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@  (Martin Pieuchot)
2018-02-10  Synchronize filesystems to disk when suspending. Each mountpoint's vnodes are pushed to disk. Dangling vnodes (unlinked files still in use) and vnodes undergoing change by long-running syscalls are identified -- and such filesystems are marked dirty on-disk while we are suspended (in case power is lost, a fsck will be required). Filesystems without dangling or busy vnodes are marked clean, resulting in faster boots following "battery died" circumstances. Tested by numerous developers, thanks for the feedback.  (Theo de Raadt)
2018-01-02  Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined. ok millert@ sthen@  (Philip Guenther)
2017-12-11  In uvm Chuck decided backing store would not be allocated proactively for blocks re-fetchable from the filesystem. However at reboot time, filesystems are unmounted, and since processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. Solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is sharp and rusty especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but this issue is separate and will be dealt with in time. couple hundred reboots by bluhm and myself, advice from guenther and others at the hut  (Theo de Raadt)
2017-04-15  After forced unmount of a file system that has other mount points in it, dangling mounts could remain. When unmounting, check the hierarchy and unmount recursively. Also prevent a new mount from appearing during the scan. Joint work with natano@; testing and OK krw@  (Alexander Bluhm)
2017-02-15  Threads share filedesc, so we can walk allprocess instead of allproc. ok mpi@ millert@  (Philip Guenther)
2017-02-11  Add a flags argument to falloc() that lets it optionally set the close-on-exec flag on the newly allocated fd. Make falloc()'s return arguments non-optional: assert that they're not NULL. ok mpi@ millert@  (Philip Guenther)
2017-01-23  Avoid curproc dance in dupfdopen(), by passing a struct proc *. ok guenther mpi  (Theo de Raadt)
2017-01-15  When traversing the mount list, the current mount point is locked with vfs_busy(). If the FOREACH_SAFE macro is used, the next pointer is not locked and could be freed by another process. Unless necessary, do not use _SAFE as it is unsafe. In vfs_unmountall() the current pointer is actually freed. Add a comment that this race has to be fixed later. OK krw@  (Alexander Bluhm)
2017-01-10  Fix white spaces. No binary change.  (Alexander Bluhm)
2017-01-10  Remove the unused olddp parameter from function dounmount(). OK mpi@ millert@  (Alexander Bluhm)
2016-09-10  Add a noperm mount flag for FFS to be used for building release sets without root privileges. This is only the kernel/mount flag; additional work in the build Makefiles will be necessary such that the files in $DESTDIR are created with correct permissions. tedu couldn't find anything wrong with it in a quick review; idea & ok deraadt  (Martin Natano)
2016-09-07  Remove usermount remnants. ok tedu  (Martin Natano)
2016-07-14  kern.usermount=1 is unsafe for everyone, since it allows any non-pledged program to call the mount/umount system calls. There is no way any user can be expected to keep their system safe / reliable with this feature. Ignore setting to =1, and after release we'll delete the sysctl entirely. ok lots of people  (Theo de Raadt)
2016-07-12  The only valid flag for unmount(2) is MNT_FORCE, ignore any others. Fixes a crash when MNT_DOOMED is passed in the flags to unmount(2) found by NCC Group. OK bluhm@  (Todd C. Miller)
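
As a rough illustration of the unmount(2) fix above, ignoring every flag except MNT_FORCE amounts to a single mask. The constants below are illustrative stand-ins with an `SK_` prefix (the real values live in <sys/mount.h>), and `unmount_flags_sketch()` is a hypothetical helper, not the kernel function.

```c
/* Illustrative values only; the real constants live in <sys/mount.h>. */
#define SK_MNT_FORCE	0x00080000	/* caller may request a forced unmount */
#define SK_MNT_DOOMED	0x08000000	/* internal state bit, never valid from userland */

/*
 * Keep only the one flag userland is allowed to pass and silently
 * drop everything else, including internal state bits.
 */
static int
unmount_flags_sketch(int flags)
{
	return flags & SK_MNT_FORCE;
}
```

With this mask in place, a caller passing an internal bit such as the stand-in `SK_MNT_DOOMED` simply gets a normal, non-forced unmount request.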
2016-07-06  Return EINVAL for mknod/mknodat when dev is -1 (aka VNOVAL). OK beck@ tedu@  (Todd C. Miller)
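
The mknod/mknodat check above can be pictured as an early argument test. The following is a minimal sketch, assuming a hypothetical `SK_VNOVAL` stand-in for the kernel's VNOVAL sentinel and a hypothetical helper name.

```c
#include <errno.h>

/* Stand-in for the kernel's "no value" sentinel, which the entry above gives as -1. */
#define SK_VNOVAL	(-1)

/*
 * Reject a device number that collides with the sentinel: a vnode
 * attribute of -1 would be read as "attribute not set" further down.
 */
static int
mknod_check_dev_sketch(int dev)
{
	if (dev == SK_VNOVAL)
		return EINVAL;
	return 0;
}
```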
2016-07-03  introduces new promise "chown" to allow changing owner/group with the *chown(2) family. it splits PLEDGE_FATTR in two ("fattr" still grants the 2 flags, so no functional changes):
- PLEDGE_CHOWN: to be able to call *chown(2) syscalls
- PLEDGE_FATTR: the rest
it introduces "chown", which grants:
- PLEDGE_CHOWN: be able to call *chown(2)
- PLEDGE_CHOWNUID: be able to modify owner/group
ok deraadt@ tedu@  (Sebastien Marie)
2016-06-27  dovutimens: call vrele(9) before returning EINVAL. ok guenther@  (Sebastien Marie)
2016-06-27  sys_revoke: call vrele() before returning ENOTTY. ok guenther@  (Sebastien Marie)
2016-06-26  use the error code path instead of returning early without calling VOP_ABORTOP() and vrele()/vput(). ok deraadt@  (Sebastien Marie)
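
The three fixes above share one pattern: once a reference has been taken, every exit, including error exits, has to pass through the code that releases it. A minimal sketch of that single-exit style follows; `struct vnode_sketch`, `vrele_sketch()` and `revoke_sketch()` are hypothetical names used only for illustration and do not reproduce the kernel code.

```c
#include <errno.h>

/* Hypothetical, heavily reduced vnode: a use count and a "is a tty" marker. */
struct vnode_sketch {
	int v_usecount;
	int v_is_tty;
};

/* Drop one reference (plays the role of vrele() in this sketch). */
static void
vrele_sketch(struct vnode_sketch *vp)
{
	vp->v_usecount--;
}

/*
 * Take a reference, validate, and leave through a single exit label so
 * that the reference is released on the error path as well.
 */
static int
revoke_sketch(struct vnode_sketch *vp)
{
	int error = 0;

	vp->v_usecount++;		/* reference taken by the lookup */
	if (!vp->v_is_tty) {
		error = ENOTTY;		/* do not return here: that would leak */
		goto out;
	}
	/* ... the actual operation would go here ... */
out:
	vrele_sketch(vp);
	return error;
}
```

After either outcome the use count is back at its starting value, which is the property the early-return bugs violated.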
2016-06-01  rmdir(2) should return EINVAL not EBUSY when trying to remove ".". This brings us back in conformance with POSIX rmdir(2) and rmdirat(2). OK kettenis@  (Todd C. Miller)
2016-05-27  W^X violations are no longer permitted by default. A kernel log message is generated, and mprotect/mmap return ENOTSUP. If the sysctl(8) flag kern.wxabort is set then a SIGABRT occurs instead, for gdb use or coredump creation. W^X violating programs can be permitted on a per-filesystem basis for ffs/nfs, using the "wxallowed" mount option. One day far in the future upstream software developers will understand that W^X violations are a tremendously risky practice and that style of programming will be banished outright. Until then, we recommend that most users use the wxallowed option on their /usr/local filesystem. At least your other filesystems don't permit such programs. ok jca kettenis mlarkin natano  (Theo de Raadt)
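
The W^X policy above boils down to rejecting any mapping request that asks for write and execute permission at the same time, unless the filesystem opted in. The sketch below assumes a hypothetical helper `wx_check_sketch()` and a hypothetical `wxallowed` parameter standing in for the mount option; the log message and the kern.wxabort SIGABRT path are left out.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Return ENOTSUP when a protection request is both writable and
 * executable and the (hypothetical) wxallowed opt-in is not set.
 */
static int
wx_check_sketch(int prot, int wxallowed)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	if ((prot & wx) == wx && !wxallowed)
		return ENOTSUP;
	return 0;
}
```

Requests that are only writable or only executable pass through unchanged; only the combination is refused.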
2016-05-15  remove chroot(2) from allowed syscalls under pledge(2). please note that chrooted processes are still possible with pledge(2), but only if the chroot(2) is done *before* calling pledge(2). Once pledged, no more chroot(2) calls are permitted.  (Sebastien Marie)
2016-03-27  When pulling and unmounting an umass USB stick, the file system could end up in an inconsistent state. The fstype dependent mp->mnt_data was NULL, but the general mp was still listed as a valid mount point. Next access to the file system would crash with a NULL pointer dereference. If closing the device fails, the mount point must go away anyway. There is nothing we can do about it. Remove the workaround for the EIO error in the general unmount code, but do not generate any error in the file system specific unmount functions. OK natano@ beck@  (Alexander Bluhm)
2016-03-19  Remove the unused flags argument from VOP_UNLOCK(). torture tested on amd64, i386 and macppc. ok beck mpi stefan; "the change looks right" deraadt  (natano)
2016-01-06  remove unnecessary casts where the incoming type is void *.  (Ted Unangst)
2016-01-02  mmcc noticed that nd.ni_pledge was uninitialized in doopenat() for the oflags & 3 == 3 case. Therefore this depends on vn_open() blocking the operation later. Probably this meant the ni_pledge request would be too high, causing transient operation failure (rather than transient operation passage). Instead of initializing based on the oflags value use the result of FFLAGS(). I should have done this from the start. ok semarie [oflags & 3 == 3 is major deja vu for me]  (Theo de Raadt)
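
To see why deriving the request from FFLAGS() covers the case where the two low open-mode bits are both set, it helps to work through the conversion. The macro below is modeled on the conventional BSD definition as I understand it, which is an assumption, and all `SK_` names and pledge bit values are hypothetical stand-ins.

```c
/* Stand-ins modeled on the conventional BSD definitions (an assumption). */
#define SK_O_ACCMODE	0x3
#define SK_FREAD	0x1
#define SK_FWRITE	0x2
#define SK_FFLAGS(oflags) \
	(((oflags) & ~SK_O_ACCMODE) | (((oflags) + 1) & SK_O_ACCMODE))

/* Hypothetical pledge request bits, for illustration only. */
#define SK_PLEDGE_RPATH	0x1
#define SK_PLEDGE_WPATH	0x2

/*
 * Derive the pledge request from the converted flags. Every input,
 * including access mode 3, yields a defined value: the result starts
 * at 0 and only gains bits for access that is really requested.
 */
static int
open_pledge_sketch(int oflags)
{
	int flags = SK_FFLAGS(oflags);
	int need = 0;

	if (flags & SK_FREAD)
		need |= SK_PLEDGE_RPATH;
	if (flags & SK_FWRITE)
		need |= SK_PLEDGE_WPATH;
	return need;
}
```

Access modes 0, 1 and 2 map to read, write and read/write respectively, while mode 3 wraps around to neither, so the request is well defined and the later open can still reject the invalid mode.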
2015-12-16  in pledged process, setuid/setgid/sticky bits should be ignored. enforce it for open(2) when used with O_CREAT and mode. ok deraadt@  (Sebastien Marie)
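
Ignoring the setuid/setgid/sticky bits for a pledged process can be pictured as masking the creation mode down to the plain permission bits. This is a sketch under assumptions: `SK_ACCESSPERMS` is a stand-in constant and `create_mode_sketch()` with its `pledged` parameter is a hypothetical helper, not the kernel code.

```c
/* Plain rwx permission bits for user, group and other. */
#define SK_ACCESSPERMS	0777

/*
 * For a pledged caller, strip everything above the rwx bits
 * (setuid 04000, setgid 02000, sticky 01000); otherwise pass the
 * requested mode through untouched.
 */
static unsigned int
create_mode_sketch(unsigned int mode, int pledged)
{
	if (pledged)
		return mode & SK_ACCESSPERMS;
	return mode;
}
```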
2015-12-16  in pledged process, setuid/setgid/sticky bits should be ignored. enforce it for mkfifo(2) and mknod(2) (with "dpath" promise). ok deraadt@  (Sebastien Marie)
2015-12-05  remove stale lint annotations  (Ted Unangst)
2015-12-04  Add pledge "dpath", which provides access to mknod(2) and mkfifo(2). This will be required to keep pax/tar/cpio at otherwise very high levels of pledge (and we will see where else it is beneficial). Allocate a bit for pledge "audio", which will be coming soon. good discussions with semarie  (Theo de Raadt)
2015-11-20  VISTTY check in revoke() is not working well for the non-indirected /dev/console case, so go back to doing the direct D_TTY check. signed over a few times with guenther  (Theo de Raadt)
2015-11-20  Fix whitespace. No binary change.  (Jonathan Gray)
2015-11-18  In sys_revoke, inspect the VISTTY flag on the backside of VOP_GETATTR, because that shows the /dev/console translated vnode. You either already know the story, or you don't want to know.  (Theo de Raadt)
2015-11-16  Permit revoke(2) for a pledge "rpath tty". ok millert semarie tedu guenther  (Theo de Raadt)
2015-11-16  Only perform revoke(2) on tty cdevs. Other paths return ENOTTY. ok millert semarie tedu guenther  (Theo de Raadt)
2015-11-14  Add pathconf() to pledge "rpath"; ok guenther  (Theo de Raadt)
2015-11-02  move the pledgenote annotation from `struct proc' to `struct nameidata'. pledgenote is used to annotate the policy for a namei context, so make it track the nameidata. The caller is expected to explicitly define the policy. It is a kernel bug not to do so. ok deraadt@  (Sebastien Marie)
2015-11-01  refactor pledge_*_check and pledge_fail functions
- rename _check function without suffix: a "pledge" function called from anywhere is a "check" function.
- make calling pledge_fail the responsibility of the _check function. remove it from caller.
- make proper use of (potential) returned error of _check() functions.
- adds pledge_kill() and pledge_protexec()
with and OK deraadt@  (Sebastien Marie)
2015-10-28  mkdir is PLEDGE_CPATH, not PLEDGE_CPATH | PLEDGE_RPATH... ok semarie  (Theo de Raadt)
2015-10-28  remove duplicate setting of p_pledgenote:
- in sys_access(), which calls dofaccessat(), where the same note is already set
- in sys_mkdir(), which calls domkdirat(), where the same note is already set
- in sys_rmdir(), which calls dounlinkat(), where the same note is already set
it keeps the p_pledgenote assignment near the NDINIT/NDINITAT call.  (Sebastien Marie)
2015-10-28  make sys_chroot() only allowed to be used when pledged, with "rpath id proc". the previous check in pledge_namei() was incomplete. For using SYS_chroot we needed "id", and we could have passed pledge_namei() just with "rpath" (without using the now removed whitelisted entry). the check for "rpath id proc" is now done using p_pledgenote: pledge_namei() will check that the pledgenote is permitted by your pledge. "go ahead" deraadt@  (Sebastien Marie)
2015-10-28  Set pledgenote to PLEDGE_RPATH in chdir & chroot. noticed by semarie  (Theo de Raadt)
2015-10-25  Fold "malloc" into "stdio" and -- recognizing that no program so far has used less than "stdio" -- include all the "self" operations. Instead of different defines, use regular PLEDGE_* in the "p_pledgenote" variable (which indicates the operation subtype a system call is performing). Many checks become easier to understand. p_pledgenote can often be passed directly to ktrace, so that kdump says:
 15565 test CALL pledge(0xa9a3f804c51,0)
 15565 test STRU pledge request="stdio"
 15565 test RET pledge 0
 15565 test CALL open(0xa9a3f804c57,0x2<O_RDWR>)
 15565 test NAMI "/tmp/testfile"
 15565 test PLDG open, "wpath", errno 1 Operation not permitted
with help from semarie, ok guenther  (Theo de Raadt)
2015-10-20  clear whitelisted-paths view in pledge. the following diff adds a clear view of whitelisted paths in pledge. before, whitelisting the "/usr/local/bin" path would make only the "/usr/local/bin" vnode present and leave "/usr/local", "/usr", and "/" returning ENOENT. It was a somewhat odd filesystem hierarchy, and it broke realpath(3). with this diff, the directories that are parents of a whitelisted directory become visible to stat(2)-related syscalls, but only with restricted permissions: stat(2) will lie a bit, saying they are owned by root:wheel with mode --x--x--x. Note that only stat(2) is affected by this "view", and the owner/mode aren't effectively changed: it is just a "lie". while here, refactor pledge_namei() a bit in order to avoid multiple for-loops over the whitelisted-path array. ok deraadt@  (Sebastien Marie)
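
The "parents of a whitelisted directory become visible" rule above needs a test for whether one path is a proper ancestor of another. The following is a minimal sketch of such a test, not the kernel's implementation: `is_ancestor_sketch()` and `SK_PARENT_MODE` are hypothetical names, and both paths are assumed to be absolute, normalized, and without trailing slashes.

```c
#include <string.h>

/* Mode reported for visible parents in the entry above: --x--x--x. */
#define SK_PARENT_MODE	0111

/*
 * Return 1 when dir is a proper ancestor of the whitelisted path wl
 * (for example "/usr" for "/usr/local/bin"), and 0 otherwise. The
 * whitelisted path itself is not its own ancestor.
 */
static int
is_ancestor_sketch(const char *dir, const char *wl)
{
	size_t n = strlen(dir);

	/* The root is an ancestor of every other absolute path. */
	if (strcmp(dir, "/") == 0)
		return wl[0] == '/' && wl[1] != '\0';
	/* dir must match a whole leading run of components of wl. */
	return strncmp(dir, wl, n) == 0 && wl[n] == '/';
}
```

A stat-like routine in this sketch would report `SK_PARENT_MODE` and a root owner for any path where the helper returns 1, leaving the real on-disk attributes untouched.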