path: root/sys/kern/vfs_syscalls.c
Age | Commit message | Author
2020-08-23 | Remove unused debug_syncprt, improve debug sysctl handling | kn
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008. Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, indicating that it is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed. Remove the unused debug sysctl, rename the only remaining one to something meaningful and remove forward declarations from /sys/sysctl.h; this way, adding new debug sysctls is a matter of adding extern and coming up with a name, which is nicer to read on its own and better to grep for. OK mpi
2020-08-22 | Move sysctl(2) CTL_DEBUG from DEBUG to new DEBUG_SYSCTL | kn
Adding "debug.my-knob" sysctls is really helpful to select different code paths and/or log on demand during runtime without recompile, but as this code is under DEBUG, lots of other noise comes with it which is often undesired, at least when looking at specific subsystems only. Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls. Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general option for all of sysctl(2). OK gnezdo
2020-06-24 | kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9) | cheloha
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
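A minimal user-space sketch of the lockless read loop described above, loosely modeled on the timehands generation scheme. Everything named hyp_* is hypothetical: the 64-bit seconds value is deliberately stored as two 32-bit halves to mimic a platform without atomic 64-bit loads, a generation of 0 means "update in progress", and C11 atomics stand in for the kernel's memory barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical timehands-like record; the value is split to model a 32-bit CPU. */
struct hyp_timehands {
	_Atomic uint32_t gen;		/* 0 while an update is in progress */
	_Atomic uint32_t sec_hi;
	_Atomic uint32_t sec_lo;
};

/* Writer: mark "in progress", store both halves, publish a new generation. */
static void
hyp_th_update(struct hyp_timehands *th, int64_t sec)
{
	uint32_t next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;

	if (next == 0)
		next = 1;		/* 0 is reserved for "in progress" */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&th->sec_lo, (uint32_t)sec, memory_order_relaxed);
	atomic_store_explicit(&th->gen, next, memory_order_release);
}

/* Reader: retry until the same non-zero generation is seen before and after. */
static int64_t
hyp_gettime(struct hyp_timehands *th)
{
	uint32_t gen, hi, lo;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		hi = atomic_load_explicit(&th->sec_hi, memory_order_relaxed);
		lo = atomic_load_explicit(&th->sec_lo, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

On a 64-bit platform the whole value could be loaded in one instruction, which is why the commit notes no cost there; the retry loop is only the price of avoiding a torn read.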
2020-03-19 | Move unveil data structures away from the proc.h header into the implementation file. | anton
Pushing the assignment of ps_uvpcwd down to unveil_add() is required but it doesn't introduce any functional change. ok mpi@ semarie@
2020-03-13 | In order to unlock flock(2), make writes to the f_iflags field of struct file atomic. | anton
This also gets rid of the last kernel lock protected field in the scope of struct file. ok mpi@ visa@
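Why flag writes need to be atomic: a plain `flags |= bit` is a load, an OR and a store, so two CPUs updating different bits at the same time can lose one of the updates. OpenBSD's kernel provides atomic_setbits_int(9) and atomic_clearbits_int(9) for this kind of update; the sketch below uses C11 atomic_fetch_or/atomic_fetch_and as a user-space stand-in, and the struct and flag names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; not the kernel's actual values. */
#define HYP_FIF_HASLOCK	0x01u
#define HYP_FIF_LARVAL	0x02u

struct hyp_file {
	atomic_uint f_iflags;
};

/* Set bits as one indivisible read-modify-write. */
static void
hyp_setbits(struct hyp_file *fp, unsigned int bits)
{
	atomic_fetch_or(&fp->f_iflags, bits);
}

/* Clear bits as one indivisible read-modify-write. */
static void
hyp_clearbits(struct hyp_file *fp, unsigned int bits)
{
	atomic_fetch_and(&fp->f_iflags, ~bits);
}
```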
2020-01-30 | Acquire fdplock when updating fd_cmask. | Visa Hankala
This moves the code toward MP-safety. OK mpi@, anton@
2020-01-26 | add a new __tmpfd system call that creates a new unnamed file in /tmp. | Ted Unangst
intended for shm/fd passing, but for programs that may otherwise like filesystem access. ok beck deraadt kettenis
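__tmpfd itself is OpenBSD-specific. As a rough portable approximation of "an unnamed file in /tmp" (not the system call's implementation), a program can create a file with mkstemp(3) and unlink it immediately, so only the descriptor remains. The helper name and the path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a unique file in /tmp, then remove its name
 * right away. The returned descriptor stays usable until it is closed.
 */
static int
hyp_unnamed_tmpfile(void)
{
	char path[] = "/tmp/hyp_tmpfd.XXXXXXXXXX";
	int fd = mkstemp(path);

	if (fd == -1)
		return -1;
	if (unlink(path) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Unlike a real in-kernel call, this leaves a short window in which the name exists in /tmp, which is one reason a dedicated system call can be preferable.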
2020-01-18 | Clear mount operation argument flags from mnt_flag after mount. | Visa Hankala
OK bluhm@
2020-01-10 | Convert the vnode list at the mount point into a tailq. | Alexander Bluhm
During unmount this list is traversed and the dirty vnodes are flushed to disk. Forced unmount expects that the list is empty after flushing, otherwise the kernel panics with "dangling vnode". As the write to disk can sleep, new vnodes may be inserted. If softdep is enabled, resolving the dependencies creates new dirty vnodes and inserts them to the list. To fix the panic, let insmntque() insert new vnodes at the tail of the list. Then vflush() will still catch them while traversing the list in forward direction. OK tedu@ millert@ visa@
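A small user-space model of this fix using the sys/queue.h TAILQ macros. The hyp_* names are hypothetical stand-ins for vnodes, insmntque() and vflush(), and `spawn` models softdep creating new dirty vnodes while a flush is in progress. Because TAILQ_FOREACH reads the next pointer after the loop body runs, entries appended at the tail during the traversal are still visited, while entries inserted at the head are missed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

struct hyp_vnode {
	bool dirty;
	int spawn;	/* dirty vnodes created when this one is flushed */
	TAILQ_ENTRY(hyp_vnode) entries;
};
TAILQ_HEAD(hyp_vnodelst, hyp_vnode);

static struct hyp_vnode *
hyp_newvnode(int spawn)
{
	struct hyp_vnode *vp = calloc(1, sizeof(*vp));

	if (vp == NULL)
		abort();
	vp->dirty = true;
	vp->spawn = spawn;
	return vp;
}

/* Stand-in for insmntque(): head insertion is the old, buggy behavior. */
static void
hyp_insmntque(struct hyp_vnodelst *l, struct hyp_vnode *vp, bool at_tail)
{
	if (at_tail)
		TAILQ_INSERT_TAIL(l, vp, entries);
	else
		TAILQ_INSERT_HEAD(l, vp, entries);
}

/* Stand-in for vflush(): one forward pass; flushing may add vnodes. */
static void
hyp_vflush(struct hyp_vnodelst *l, bool at_tail)
{
	struct hyp_vnode *vp;
	int i;

	TAILQ_FOREACH(vp, l, entries) {
		if (!vp->dirty)
			continue;
		vp->dirty = false;
		for (i = 0; i < vp->spawn; i++)
			hyp_insmntque(l, hyp_newvnode(0), at_tail);
	}
}

/* Returns how many vnodes are still dirty ("dangling") after one pass. */
static int
hyp_dangling_after_flush(bool at_tail)
{
	struct hyp_vnodelst l;
	struct hyp_vnode *vp;
	int dirty = 0;

	TAILQ_INIT(&l);
	hyp_insmntque(&l, hyp_newvnode(1), at_tail);
	hyp_insmntque(&l, hyp_newvnode(0), at_tail);
	hyp_vflush(&l, at_tail);

	while ((vp = TAILQ_FIRST(&l)) != NULL) {
		if (vp->dirty)
			dirty++;
		TAILQ_REMOVE(&l, vp, entries);
		free(vp);
	}
	return dirty;
}
```

With tail insertion the pass ends with no dirty vnodes; with head insertion the vnode created during the flush lands before the cursor and is left dirty.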
2019-11-29 | Eliminate the sketchy use of ps_mainproc here by making unveil_add_vnode() take a struct proc* instead of a struct process*, and vice versa making unveil_lookup() take a process* instead of a proc*. | Philip Guenther
ok beck@
2019-11-26 | Don't use LOCKPARENT on namei calls for realpath(). | Bob Beck
We don't require this anymore since we now behave like posix. Fixes a problem where a symlink to / would return ENOTDIR because the parent could not be locked - noticed by Raimo Niskanen <raimo@erlang.org> ok guenther@ deraadt@
2019-10-06 | Fix vn_open to require an op of 0, and 0 or KERNELPATH only as flags. | Bob Beck
sweep tree to correct NDINIT op and flags ahead of time. document the requirement. This allows KERNELPATH to be used to bypass unveil for crash dumps with nosuidcoredump=2 or 3 ok visa@ deraadt@ florian@
2019-08-31 | Make readlink require UNVEIL_READ instead of UNVEIL_INSPECT only since realpath() is now a system call. | Bob Beck
ok deraadt@
2019-08-07 | The pathname in unveil(2) allocated 1024 bytes on the stack during the system call. | Alexander Bluhm
Better use namei pool like sys___realpath() does. OK semarie@ deraadt@
2019-08-06 | Fix white spaces. | Alexander Bluhm
2019-08-05 | Kernel realpath(3) and unveil(2) did not work correctly if the root directory was written as "//". | Alexander Bluhm
If there is no non-slash character in the path name, use the special case for root. found by gmake regression tests; OK naddy@ benno@
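The root special case can be sketched as a simple predicate. The helper name is hypothetical: it treats any non-empty path made only of slashes as root, and leaves the empty string alone, since that case is reported as ENOENT (see the 2019-07-23 entry).

```c
#include <stdbool.h>

/* Hypothetical helper: true for "/", "//", "///", ...; false for "". */
static bool
hyp_path_is_root(const char *path)
{
	if (*path == '\0')
		return false;
	for (; *path != '\0'; path++)
		if (*path != '/')
			return false;
	return true;
}
```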
2019-08-05 | Kernel realpath(3) had the same vnode leakage bug as unveil(2). | Alexander Bluhm
If parent and lookup vnode are equal, namei(9) locks them once but reference counts twice. from Moritz Buhl
2019-08-05 | Allow concurrent reads of the f_offset field of struct file by serializing both read/write operations using the existing file mutex. | anton
The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
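A sketch of the locking split described above, assuming a pthread mutex as a stand-in for the kernel's per-file mutex. The writer is assumed to already hold an outer exclusive lock (the vnode lock in the commit), which this model does not show; the mutex only guarantees that a concurrent reader sees either the old or the new 64-bit offset, never a torn mix of the two. All names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

struct hyp_ofile {
	pthread_mutex_t f_mtx;	/* protects only the load/store of f_offset */
	int64_t f_offset;
};

/* Caller is assumed to hold the outer exclusive (vnode) lock. */
static void
hyp_foffset_set(struct hyp_ofile *fp, int64_t off)
{
	pthread_mutex_lock(&fp->f_mtx);
	fp->f_offset = off;
	pthread_mutex_unlock(&fp->f_mtx);
}

/* Readers only need the mutex, not the outer lock. */
static int64_t
hyp_foffset_get(struct hyp_ofile *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->f_mtx);
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->f_mtx);
	return off;
}
```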
2019-08-04 | Calling unveil(2) with the current directory leaked a vnode. | Alexander Bluhm
Even if the parent and the lookup vnode are equal, namei(9) reference counts both. So release the parent vnode unconditionally. OK visa@
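A hypothetical reference-counting model of this leak: when the lookup is for the current directory, the parent and the result are the same vnode, which is locked once but referenced twice. Releasing the parent unconditionally (reference only when it is the same vnode, reference plus lock otherwise) brings the count back to zero. The hyp_* functions only loosely mirror vrele(9) and vput(9).

```c
#include <stdbool.h>

/* Hypothetical vnode model: a reference count plus a lock flag. */
struct hyp_vnode {
	int usecount;
	bool locked;
};

/* Parent+child lookup: two references; one lock if both are the same vnode. */
static void
hyp_namei(struct hyp_vnode *dvp, struct hyp_vnode *vp)
{
	dvp->usecount++;
	vp->usecount++;
	dvp->locked = true;
	vp->locked = true;
}

static void
hyp_vrele(struct hyp_vnode *vp)		/* drop a reference */
{
	vp->usecount--;
}

static void
hyp_vput(struct hyp_vnode *vp)		/* unlock and drop a reference */
{
	vp->locked = false;
	vp->usecount--;
}

/* Always release the parent; unlock it only when it is a different vnode. */
static void
hyp_release(struct hyp_vnode *dvp, struct hyp_vnode *vp)
{
	if (dvp == vp)
		hyp_vrele(dvp);
	else
		hyp_vput(dvp);
	hyp_vput(vp);
}
```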
2019-08-02 | Move prototypes of unveil(2) functions which are used in separate C files into the common namei.h header. | Alexander Bluhm
OK deraadt@
2019-07-25 | vinvalbuf(9): tsleep -> tsleep_nsec(9); ok millert@ | cheloha
2019-07-23 | Fix realpath(3) errno code for an empty input path string. | Stefan Sperling
It should return ENOENT in this case, but was returning EINVAL. ok bluhm@ deraadt@
2019-07-22 | Correct minor style nit in sys_getdents() for consistency: missing parens around return expression. | anton
2019-07-22 | Grab the vnode lock earlier in sys_getdents() since it could end up sleeping, allowing the file offset to change. | anton
This is part of the ongoing effort to protect the file offset using the vnode lock. ok mpi@ visa@
2019-07-15 | revert unintended change that snuck in last commit | Bob Beck
2019-07-15 | Make realpath posixly correct by changing the kernel implementation to not succeed on final path components that do not exist. | Bob Beck
The original implementation succeeded in these cases. ok bluhm@
2019-07-12 | Revert anton@ changes about read/write unlocking | solene
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-10 | Make read/write of the f_offset field belonging to struct file MP-safe, as part of the effort to unlock the kernel. | anton
Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-06-19 | the pledge STATLIE code is no longer needed, as discussed with beck. | Theo de Raadt
it actually isn't reached...
2019-06-15 | Have __realpath() do the pathname==NULL -> EINVAL check itself, eliminating the need to do this in libc. | Theo de Raadt
btw, it is unfortunate posix went this way, because converting a clearly illegal condition to not be fatal but instead return an error which is potentially not checked in the caller, is sadly a large component of the runaway-train model that makes exploitation of software easy. illegal software should crash hard. ok beck
2019-05-30 | namei() generates KTR_NAMEI records for input filenames, but getcwd(2) and realpath(2) have output filenames. | Theo de Raadt
Generate additional KTR_NAMEI records upon success. ok millert beck
2019-05-30 | Correct call to vfs_getcwd_common from within __realpath | Bob Beck
I borrowed an example usage from __getcwd poorly to begin with and then there was some other strangeness in there. diagnosed with deraadt. ok deraadt@
2019-05-13 | Add a kernel implementation of realpath() as __realpath(). | Bob Beck
We want this so that we can stop allowing readlink() on traversed vnodes in unveil(). This includes all the kernel side and the system call. This is not yet used in libc for realpath, so nothing calls this yet. The libc wrapper will be committed later. Testing by many, and ports build by naddy@ ok deraadt@
2019-03-24 | Make stat(2) and access(2) need UNVEIL_READ instead of UNVEIL_INSPECT | Bob Beck
UNVEIL_INSPECT is a hack we added to get chrome/glib working. It silently adds permission for stat(2), access(2), and readlink(2) to be used on all path components of any unveil'ed path. robert@ has now successfully fixed chrome/glib to not require excessive TOC vs TOU stat(2) and access(2) calls on the paths it uses, so this is no longer needed there. readlink(2) is the sole call that is now permitted by UNVEIL_INSPECT, and this is only needed so that realpath(3) can work. Going forward we will likely make a realpath(2), after which we can completely deprecate UNVEIL_INSPECT. ok deraadt@
2019-01-23 | futimens(2), futimes(2), utimensat(2), utimes(2): Validate input at copyin | cheloha
Currently we validate time input for all four of these syscalls in the workhorse function dovutimens(). This is bad because both futimes(2) and utimes(2) take timevals that must be converted to timespecs, which multiplies tv_usec by 1000. This multiplication can overflow to create a "valid" input, e.g. if tv_usec is equal to 2^61 (invalid value) on a platform with 64-bit longs, the resulting tv_nsec is equal to zero (valid value). This is also a bit wasteful. We acquire a vnode and do other work under KERNEL_LOCK only to release the vnode when the time input is invalid. So, duplicate a bit of code to validate the time inputs before we do any conversions or real VFS work. probably still ok tedu@ deraadt@
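A sketch of the validate-before-convert order, with hypothetical helper names. The wraparound helper uses unsigned 64-bit arithmetic to show the overflow from the commit message without invoking signed-overflow undefined behavior: 2^61 * 1000 = 125 * 2^64, which is 0 modulo 2^64.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Range check performed before any conversion. */
static bool
hyp_timeval_valid(const struct timeval *tv)
{
	return tv->tv_usec >= 0 && tv->tv_usec < 1000000;
}

/* Convert only after validation; returns false on invalid input. */
static bool
hyp_timeval_to_timespec(const struct timeval *tv, struct timespec *ts)
{
	if (!hyp_timeval_valid(tv))
		return false;
	ts->tv_sec = tv->tv_sec;
	ts->tv_nsec = tv->tv_usec * 1000;	/* at most 999999000: no overflow */
	return true;
}

/* Shows why checking after the multiply is unsafe: the product can wrap. */
static uint64_t
hyp_usec_to_nsec_wrapping(uint64_t usec)
{
	return usec * 1000u;
}
```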
2019-01-22 | namei can return a null dvp on success. check this before access. | Ted Unangst
ok beck Reported-by: syzbot+cc59412ed8429450a1ae@syzkaller.appspotmail.com
2019-01-21 | sometimes we don't call unveil_add, which means memory allocated by namei doesn't get freed. | Ted Unangst
move the free calls into the same function as namei. fixed bug report from Dariusz Sendkowski ok beck
2019-01-03 | Fix a collection of covering unveil bugs that prevent unveil's of upper level directories from working when you don't traverse into them starting from /. | Bob Beck
Most found by brynet@ and a few others. ok brynet@ deraadt@
2018-12-23 | Rectify some issues with the noperm mount flag; the root vnode was not protected properly and files without any x bit set were accidentally considered executable when checked with access(2). | Martin Natano
Issues found and reported by deraadt, halex, reyk, tb ok deraadt
2018-10-28 | Correctly deal with upper level unveil's by keeping track of the covering unveil for each unveil in the process at unveil() time, and refactoring the handling of current directory and ISDOTDOT to be much more sensible. | Bob Beck
Worked out at ns2k18 with guenther@. ok deraadt@
2018-09-26 | Move the allocating and freeing of mount points into dedicated functions. | Visa Hankala
OK deraadt@ mpi@
2018-09-16 | Move vfsconf lookup code into dedicated functions. | Visa Hankala
OK bluhm@
2018-09-01 | Fix errno for post-lock unveil calls | Theo de Raadt
from Jan Klemkow
2018-08-20 | Reorder checks in the read/write(2) family of syscalls to prepare for making file operations mp-safe. | Martin Pieuchot
This change makes it clear that `f_offset' is only accessed in vn_read() and vn_write(), which will help taking it out of the KERNEL_LOCK(). This refactoring uncovered a race in vn_read() which is now documented and will be addressed in a later diff. ok visa@
2018-08-13 | in sys_statfs(), BYPASSUNVEIL can be passed to NDINIT in the "flags" argument, rather than manually |= afterwards. | Theo de Raadt
Observed by semarie
2018-08-11 | Get rid of PLEDGE_STAT, which was a hack used for unveil. | Bob Beck
We use UNVEIL_INSPECT instead in the unveil flags for the same purpose, and now add traversed vnodes of a path with UNVEIL_INSPECT instead of with 0 flags and voodoo in unveil_flagmatch. This allows us to uncontort the logic of unveil_flagmatch a bunch. helpful review and ok from semarie@
2018-08-05 | Decouple unveil from the pledge flags, by adding dedicated unveil flags to the namei args. | Bob Beck
This fixes a bug where chmod would be allowed with only READ. This also allows some further cleanup of some awkward things like PLEDGE_STAT that will follow. Lots of assistance from semarie@ - thanks! ok semarie@
2018-08-03 | ni_pledge flags are a uint64_t not an int - don't initialize with an int. | Bob Beck
2018-07-30 | Fix a NULL-pointer dereference when calling open() on a cloned device with write permissions and the flags include O_TRUNC|O_SHLOCK. | anton
ok deraadt@
2018-07-30 | rename 2nd argument of unveil from vague "flags" to "permissions"; man page change will follow | Theo de Raadt