path: root/sys/kern/vfs_syscalls.c
Age | Commit message | Author
2018-07-22 | Avoid a NULL pointer deref when calling fchown() on a file descriptor belonging | anton
to a cloned device. ok kettenis@
2018-07-13 | Make the default failure for unveil while disabled return success | Bob Beck
so that people don't get screwed when playing with it on their machines
2018-07-13 | Unveiling unveil(2). | Bob Beck
This brings unveil into the tree, disabled by default. Currently this will return EPERM on all attempts to use it until we are fully certain it is ready for people to start using, but this now allows for others to do more tweaking and experimentation. Still needs to send the unveils across forks and execs before fully enabling. Many thanks to robert@ and deraadt@ for extensive testing. ok deraadt@
2018-07-03 | Add a new so_seek member to "struct file" such that we can have seekable | Mark Kettenis
files that aren't vnodes. Move the vnode-specific code into its own function. Add an implementation for the "DMA buffers" that can be used by DRI3/prime code to find out the size of the graphics buffer. This implementation is very limited and only supports offset 0 and only for SEEK_SET and SEEK_END. This doesn't really make sense; implementing stat(2) would be a more obvious choice. But this is what Linux does. ok guenther@, visa@
2018-06-25 | During open(2), release the fdp lock before calling vn_open(9). | Visa Hankala
This lets other threads of the process modify the file descriptor table even if the vn_open(9) call blocks. The change has an effect on dup2(2) and dup3(2). If the new descriptor is the same as the one reserved by an unfinished open(2), the system call will fail with error EBUSY. The accept(2) system call already behaves like this. Issue pointed out by art@ via mpi@ Tested in a bulk build by ajacoutot@ OK mpi@
2018-06-18 | Put file descriptors on shared data structures when they are completely | Martin Pieuchot
setup, take 3. LARVAL fds still exist, but they are no longer marked with a flag and no longer reachable via `fd_ofiles[]' or the global linked list. This allows us to simplify a lot of code grabbing new references to fds. All of this is now possible because dup2(2) refuses to clone LARVAL fds. Note that the `fdplock' could now be released in all open(2)-like syscalls, just like it is done in accept(2). With inputs from Mathieu Masson, visa@, guenther@ and art@ Previous version ok bluhm@, ok visa@, sthen@
2018-06-14 | In dounlinkat() only perform the check for a mounted directory when | Todd C. Miller
actually removing a directory. Fixes a problem where removing device special files could result in EBUSY. OK guenther@
2018-06-13 | Make the VFS layer responsible for preventing the deletion | Visa Hankala
of mounted on directories. OK guenther@, mpi@
2018-06-07 | Make callers of VOP_CREATE(9) and VOP_MKNOD(9) responsible for | Visa Hankala
unlocking the directory vnode. OK mpi@, helg@
2018-06-05 | Revert introduction of fdinsert(), a sanity check triggers when | Martin Pieuchot
closing a LARVAL file. Found the hard way by sthen@.
2018-06-04 | Add VB_DUPOK to suppress witness(4) warning of concurrent mount locks. | Philip Guenther
Use that in three places:
- vfs_stall()
- sys_mount()
- dounmount()'s MNT_FORCE-does-recursive-unmounts case
ok deraadt@ visa@
2018-06-02 | Put file descriptors on shared data structures when they are completely | Martin Pieuchot
setup. LARVAL fds still exist, but they are no longer marked with a flag and no longer reachable via `fd_ofiles[]'. This allows us to simplify a lot of code grabbing new references to fds. All of this is now possible because dup2(2) refuses to clone LARVAL fds. Note that the `fdplock' could now be released in all open(2)-like syscalls, just like it is done in accept(2). With inputs from Mathieu -, visa@, guenther@ and art@ ok visa@, bluhm@
2018-05-08 | Protect per-file counters and document which lock is used to protect | Martin Pieuchot
the other fields. Once we no longer have any [k] (kernel lock) protections, we'll be able to unlock almost all network related syscalls. Inputs from and ok bluhm@, visa@
2018-05-02 | Remove proc from the parameters of vn_lock(). The parameter is | Visa Hankala
unnecessary because curproc always does the locking. OK mpi@
2018-04-28 | Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always | Visa Hankala
curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
2018-04-27 | Move FREF() inside fd_getfile(). | Martin Pieuchot
ok visa@
2018-04-03 | Move FREF()s just after fd_getfile() in sys_kevent(), sys_lseek() and | Martin Pieuchot
getvnode(). ok millert@
2018-04-03 | Add proper FREF()/FRELE() dance in sys_fchdir(). | Martin Pieuchot
The syscall doesn't sleep before a vnode reference is taken, so it doesn't strictly need the refcounts now. But they will soon be used for parallelism, so make it ready. ok bluhm@
2018-03-28 | Call FREF() right after fd_getfile*() in pread(), pwrite() & co. | Martin Pieuchot
This ensures that all operations manipulating a 'struct file *' do so with a properly refcounted element. ok visa@, tedu@, bluhm@
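The lookup-then-reference pattern this commit describes can be sketched in user space as below. This is a minimal illustration, not the kernel code: `struct sketch_file`, `sketch_fref()`, `sketch_frele()` and `sketch_with_file()` are hypothetical names standing in for `struct file`, FREF(), FRELE() and a syscall body, and the plain integer counter is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file; only the count matters here. */
struct sketch_file {
	int f_count;	/* number of live references */
};

/* Take a reference (plays the role of FREF() in this sketch). */
static void
sketch_fref(struct sketch_file *fp)
{
	fp->f_count++;
}

/* Drop a reference (plays the role of FRELE()); returns 1 when the last one is gone. */
static int
sketch_frele(struct sketch_file *fp)
{
	assert(fp->f_count > 0);
	return (--fp->f_count == 0);
}

/*
 * Look up a descriptor, take a reference immediately, do the work,
 * then drop the reference. Returns the count observed while the
 * operation holds its reference, or -1 for a bad descriptor.
 */
static int
sketch_with_file(struct sketch_file **table, size_t n, size_t fd)
{
	struct sketch_file *fp;
	int seen;

	if (fd >= n || (fp = table[fd]) == NULL)
		return -1;
	sketch_fref(fp);		/* reference taken right after lookup */
	seen = fp->f_count;		/* ... the operation would run here ... */
	(void)sketch_frele(fp);		/* dropped once the operation is done */
	return seen;
}
```

The point of taking the reference before any other work is that the object cannot disappear underneath the operation, even if the operation later sleeps.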
2018-02-19 | Remove almost unused `flags' argument of suser(). | Martin Pieuchot
The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2018-02-10 | Synchronize filesystems to disk when suspending. Each mountpoint's vnodes | Theo de Raadt
are pushed to disk. Dangling vnodes (unlinked files still in use) and vnodes undergoing change by long-running syscalls are identified -- and such filesystems are marked dirty on-disk while we are suspended (in case power is lost, a fsck will be required). Filesystems without dangling or busy vnodes are marked clean, resulting in faster boots following "battery died" circumstances. Tested by numerous developers, thanks for the feedback.
2018-01-02 | Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined. | Philip Guenther
ok millert@ sthen@
2017-12-11 | In uvm Chuck decided backing store would not be allocated proactively | Theo de Raadt
for blocks re-fetchable from the filesystem. However at reboot time, filesystems are unmounted, and since processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. Solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is sharp and rusty especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but this issue is separate and will be dealt with in time. couple hundred reboots by bluhm and myself, advice from guenther and others at the hut
2017-04-15 | After forced unmount of a file system that has other mount points | Alexander Bluhm
in it, dangling mounts could remain. When unmounting check the hierarchy and unmount recursively. Also prevent that a new mount appears during the scan. Joint work with natano@; testing and OK krw@
2017-02-15 | Threads share filedesc, so we can walk allprocess instead of allproc | Philip Guenther
ok mpi@ millert@
2017-02-11 | Add a flags argument to falloc() that lets it optionally set the | Philip Guenther
close-on-exec flag on the newly allocated fd. Make falloc()'s return arguments non-optional: assert that they're not NULL. ok mpi@ millert@
2017-01-23 | Avoid curproc dance in dupfdopen(), by passing a struct proc * | Theo de Raadt
ok guenther mpi
2017-01-15 | When traversing the mount list, the current mount point is locked | Alexander Bluhm
with vfs_busy(). If the FOREACH_SAFE macro is used, the next pointer is not locked and could be freed by another process. Unless necessary, do not use _SAFE as it is unsafe. In vfs_unmountall() the current pointer is actually freed. Add a comment that this race has to be fixed later. OK krw@
2017-01-10 | Fix white spaces. No binary change. | Alexander Bluhm
2017-01-10 | Remove the unused olddp parameter from function dounmount(). | Alexander Bluhm
OK mpi@ millert@
2016-09-10 | Add a noperm mount flag for FFS to be used for building release sets | Martin Natano
without root privileges. This is only the kernel/mount flag; additional work in the build Makefiles will be necessary such that the files in $DESTDIR are created with correct permissions. tedu couldn't find anything wrong with it in a quick review idea & ok deraadt
2016-09-07 | Remove usermount remnants. ok tedu | Martin Natano
2016-07-14 | kern.usermount=1 is unsafe for everyone, since it allows any non-pledged | Theo de Raadt
program to call the mount/umount system calls. There is no way any user can be expected to keep their system safe / reliable with this feature. Ignore setting to =1, and after release we'll delete the sysctl entirely. ok lots of people
2016-07-12 | The only valid flag for unmount(2) is MNT_FORCE, ignore any others. | Todd C. Miller
Fixes a crash when MNT_DOOMED is passed in the flags to unmount(2) found by NCC Group. OK bluhm@
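Ignoring every flag except the one callers may legitimately pass amounts to masking the argument. The sketch below shows the idea only; the `SKETCH_MNT_*` values and `sketch_unmount_flags()` are hypothetical and do not match the real MNT_* constants in <sys/mount.h> or the actual kernel code.

```c
#include <assert.h>

/* Illustrative bit values only; the real MNT_* constants live in <sys/mount.h>. */
#define SKETCH_MNT_FORCE	0x1
#define SKETCH_MNT_DOOMED	0x2	/* stands in for a kernel-internal flag */

/* Keep only the flag callers are allowed to pass; silently drop the rest. */
static int
sketch_unmount_flags(int flags)
{
	return flags & SKETCH_MNT_FORCE;
}
```

With this mask in place, a caller passing `SKETCH_MNT_FORCE | SKETCH_MNT_DOOMED` ends up with `SKETCH_MNT_FORCE` alone, so an internal-only bit never reaches the unmount path.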
2016-07-06 | Return EINVAL for mknod/mknodat when dev is -1 (aka VNOVAL). | Todd C. Miller
OK beck@ tedu@
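The check described here is a guard against a sentinel value. A minimal user-space sketch follows, assuming only what the commit message states (that -1 is the "no value" sentinel); `SKETCH_VNOVAL` and `sketch_check_mknod_dev()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

#define SKETCH_VNOVAL	(-1)	/* "no value" sentinel, per the commit message */

/* Reject a device number that collides with the sentinel. */
static int
sketch_check_mknod_dev(dev_t dev)
{
	if (dev == (dev_t)SKETCH_VNOVAL)
		return EINVAL;
	return 0;
}
```

Rejecting the value up front avoids later code mistaking a caller-supplied device number for "attribute not set".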
2016-07-03 | introduces new promise "chown" to allow changing owner/group with *chown(2) | Sebastien Marie
family
it splits PLEDGE_FATTR in two ("fattr" still grants the 2 flags, so no functional changes):
- PLEDGE_CHOWN : to be able to call *chown(2) syscalls
- PLEDGE_FATTR : the rest
it introduces "chown" which grants:
- PLEDGE_CHOWN : be able to call *chown(2)
- PLEDGE_CHOWNUID : be able to modify owner/group
ok deraadt@ tedu@
2016-06-27 | dovutimens: call vrele(9) before returning EINVAL | Sebastien Marie
ok guenther@
2016-06-27 | sys_revoke: call vrele() before returning ENOTTY | Sebastien Marie
ok guenther@
2016-06-26 | use error code path instead of return early without calling VOP_ABORTOP() and | Sebastien Marie
vrele()/vput(). ok deraadt@
2016-06-01 | rmdir(2) should return EINVAL not EBUSY when trying to remove ".". | Todd C. Miller
This brings us back in conformance with POSIX rmdir(2) and rmdirat(2). OK kettenis@
2016-05-27 | W^X violations are no longer permitted by default. A kernel log message | Theo de Raadt
is generated, and mprotect/mmap return ENOTSUP. If the sysctl(8) flag kern.wxabort is set then a SIGABRT occurs instead, for gdb use or coredump creation. W^X violating programs can be permitted on a ffs/nfs filesystem-basis, using the "wxallowed" mount option. One day far in the future upstream software developers will understand that W^X violations are a tremendously risky practice and that style of programming will be banished outright. Until then, we recommend most users need to use the wxallowed option on their /usr/local filesystem. At least your other filesystems don't permit such programs. ok jca kettenis mlarkin natano
2016-05-15 | remove chroot(2) from allowed syscalls under pledge(2). | Sebastien Marie
please note that chrooted processes are still possible with pledge(2), but only if the chroot(2) is done *before* calling pledge(2). Once pledged, no more chroot(2) calls are permitted.
2016-03-27 | When pulling and unmounting an umass USB stick, the file system | Alexander Bluhm
could end up in an inconsistent state. The fstype dependent mp->mnt_data was NULL, but the general mp was still listed as a valid mount point. Next access to the file system would crash with a NULL pointer dereference. If closing the device fails, the mount point must go away anyway. There is nothing we can do about it. Remove the workaround for the EIO error in the general unmount code, but do not generate any error in the file system specific unmount functions. OK natano@ beck@
2016-03-19 | Remove the unused flags argument from VOP_UNLOCK(). | natano
torture tested on amd64, i386 and macppc ok beck mpi stefan "the change looks right" deraadt
2016-01-06 | remove unnecessary casts where the incoming type is void *. | Ted Unangst
2016-01-02 | mmcc noticed that nd.ni_pledge was uninitialized in doopenat() for the | Theo de Raadt
oflags & 3 == 3 case. Therefore this depends on vn_open() blocking the operation later. (Probably this meant the ni_pledge request would be too high, causing transient operation failure, rather than transient operation passage). Instead of initializing based on the oflags value use the result of FFLAGS(). I should have done this from the start. ok semarie [oflags & 3 == 3 is major dejavu for me]
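Why the access-mode value 3 is awkward can be seen with a small sketch. It assumes FFLAGS() follows the traditional BSD conversion that adds one to the two low access-mode bits; the `SKETCH_*` names and `sketch_fflags()` are hypothetical, so treat this as an illustration of the idea and not as the kernel's definition. Note also that the commit message's "oflags & 3 == 3" is shorthand for `(oflags & 3) == 3`: written literally in C, `==` binds tighter than `&`.

```c
#include <assert.h>

#define SKETCH_ACCMODE	0x3	/* low two bits: the open(2) access mode */
#define SKETCH_FREAD	0x1
#define SKETCH_FWRITE	0x2

/* Convert open(2)-style flags to read/write bits by adding one to the access mode. */
static int
sketch_fflags(int oflags)
{
	return (oflags & ~SKETCH_ACCMODE) | ((oflags + 1) & SKETCH_ACCMODE);
}
```

Under this assumption, access modes 0, 1 and 2 map to read, write and read/write, while mode 3 wraps around to neither bit. Code that switches on the raw oflags value has no natural case for 3, whereas code that inspects the converted result always sees a defined value.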
2015-12-16 | in pledged process, setuid/setgid/sticky bits should be ignored. | Sebastien Marie
enforce it for open(2) when used with O_CREAT and mode. ok deraadt@
2015-12-16 | in pledged process, setuid/setgid/sticky bits should be ignored. | Sebastien Marie
enforce it for mkfifo(2) and mknod(2) (with "dpath" promise). ok deraadt@
2015-12-05 | remove stale lint annotations | Ted Unangst
2015-12-04 | Add pledge "dpath", which provides access to mknod(2) and mkfifo(2). | Theo de Raadt
This will be required to keep pax/tar/cpio at otherwise very high levels of pledge (and we will see where else it is beneficial). Allocate a bit for pledge "audio", which will be coming soon. good discussions with semarie