Age | Commit message (Collapse) | Author |
|
Switch libc and ld.so to the generic stubs for these calls.
WARNING: reboot to updated kernel before installing libc or ld.so!
Time for a story...
When gcc (back in 1.x days) first implemented long long, it didn't (always)
pass 64bit arguments in 'aligned' registers/stack slots, with the result that
argument offsets didn't match structure offsets. This affected the nine system
calls that pass off_t arguments:
ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate
To avoid having to do custom ASM wrappers for those, BSD put an explicit pad
argument in so that the off_t argument would always start on a even slot and
thus be naturally aligned. Thus those odd wrappers in lib/libc/sys/ that use
__syscall() and pass an extra '0' argument.
The ABIs for different CPUs eventually settled how things should be passed on
each and gcc 2.x followed them. The only arch now where it helps is landisk,
which needs to skip the last argument register if it would be the first half of
a 64bit argument. So: add new syscalls without the pad argument and on landisk
do that skipping directly in the syscall handler in the kernel. Keep compat
support for the existing syscalls long enough for the transition.
ok deraadt@
|
|
it is a leftover from LOCKPARENT removal in NDINIT() (in rev 1.337)
ok mpi@
|
|
pass in the vnode to unveil_start_relative() like it is done for *at()
syscalls. This fixes an issue with fchdir() that actually did not correctly
reset this pointer when changing the working directory.
OK beck@
|
|
These traversed vnodes are a leftover from early times where realpath(3)
was still all done in userland.
OK semarie@
|
|
The code doesn't doesn't need it: the returned vnode is released
immediately. The string path is built from the namei() call using
REALPATH, during directories traversal.
Without LOCKLEAF, calling vrele() only is enough if namei() found a
file, instead of calling VOP_UNLOCK() + vrele().
ok claudio@ mpi@
|
|
function which need the lock (falloc, fdinsert, fdremove). In most cases
it is not correct to hold the lock while calling VFS functions or e.g.
closef since those aquire or release long lived VFS locks.
OK visa@ mvs@
|
|
if they are out of range, making it easier to isolate reason for EINVAL
ok cheloha
|
|
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c
the only visible difference between used and stub ctldebug structs in the
debugvars[] array is their extern keyword, indicating that it is defined
elsewhere.
sys/sysctl.h declares all debugN members as extern upfront, but these
declarations are not needed.
Remove the unused debug sysctl, rename the only remaining one to something
meaningful and remove forward declarations from /sys/sysctl.h; this way,
adding new debug sysctls is a matter of adding extern and coming up with a
name, which is nicer to read on its own and better to grep for.
OK mpi
|
|
Adding "debug.my-knob" sysctls is really helpful to select different
code paths and/or log on demand during runtime without recompile,
but as this code is under DEBUG, lots of other noise comes with it
which is often undesired, at least when looking at specific subsystems
only.
Adding globals to the kernel and breaking into DDB to change them helps,
but that does not work over SSH, hence the need for debug sysctls.
Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of
DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general
option for all of sysctl(2).
OK gnezdo
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
|
|
implementation file. Pushing the assignment of ps_uvpcwd down to
unveil_add() is required but it doesn't introduce any functional change.
ok mpi@ semarie@
|
|
file atomic. This also gets rid of the last kernel lock protected field
in the scope of struct file.
ok mpi@ visa@
|
|
toward MP-safety.
OK mpi@, anton@
|
|
intended for shm/fd passing, but for programs that may otherwise like
filesystem access.
ok beck deraadt kettenis
|
|
OK bluhm@
|
|
unmount this list is traversed and the dirty vnodes are flushed to
disk. Forced unmount expects that the list is empty after flushing,
otherwise the kernel panics with "dangling vnode". As the write
to disk can sleep, new vnodes may be inserted. If softdep is
enabled, resolving the dependencies creates new dirty vnodes and
inserts them to the list. To fix the panic, let insmntque() insert
new vnodes at the tail of the list. Then vflush() will still catch
them while traversing the list in forward direction.
OK tedu@ millert@ visa@
|
|
take a struct proc* instead of a struct process*, and vice versa making
unveil_lookup() take a process* instead of a proc*.
ok beck@
|
|
require this anymore since we now behave like posix.
Fixes a problem where a symlink to / would return ENOTDIR because
the parent could not be locked - noticed by Raimo Niskanen <raimo@erlang.org>
ok guenther@ deraadt@
|
|
sweep tree to correct NDIINT op and flags ahead of time. document
the requirement. This allows KERNELPATH to be used to bypass
unveil for crash dumps with nosuidcoredump=2 or 3
ok visa@ deraadt@ florian@
|
|
since realpath() is now a system call
ok deraadt@
|
|
the system call. Better use namei pool like sys___realpath() does.
OK semarie@ deraadt@
|
|
|
|
directory was written as "//". If there is no non-slash character
in the path name, use the spacial case for root.
found by gmake regression tests; OK naddy@ benno@
|
|
If parent and lookup vnode are equal, namei(9) locks them once but
reference counts twice.
from Moritz Buhl
|
|
serializing both read/write operations using the existing file mutex.
The vnode lock still grants exclusive write access to the offset; the
mutex is only used to make the actual write atomic and prevent any
concurrent reader from observing intermediate values.
ok mpi@ visa@
|
|
if the parent and the lookup vnode are equal, namei(9) reference
counts both. So release the parent vnode uncoditionally.
OK visa@
|
|
files into the common namei.h header.
OK deraadt@
|
|
|
|
It should return ENOENT in this case, but was returning EINVAL.
ok bluhm@ deraadt@
|
|
return expression.
|
|
sleeping, allowing the file offset to change. This is part of the
ongoing effort to protect the file offset using the vnode lock.
ok mpi@ visa@
|
|
|
|
to not succeed on final path components that do not exist.
The original implmentation succeeded in these cases.
ok bluhm@
|
|
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2
ok anton@
|
|
as part of the effort to unlock the kernel. Instead of relying on the
vnode lock, introduce a dedicated lock per file. Exclusive write access
is granted using the new foffset_enter and foffset_leave API. A
convenience function foffset_get is also available for threads that only
need to read the current offset.
The lock acquisition order in vn_write has been changed to match the one
in vn_read in order to avoid a potential deadlock. This change also gets
rid of a documented race in vn_read().
Inspired by the FreeBSD implementation.
With help and ok mpi@ visa@
|
|
it actually isn't reached...
|
|
the need to do this in libc.
btw, it is unfortunate posix went this way, because converting a clearly
illegal condition to not be fatal but instead return an error which is
potentially not checked in the caller, is sadly a large component of the
runaway-train model that makes exploitation of software easy.. illegal
software should crash hard.
ok beck
|
|
realpath(2) have output filenames. Generate additional KTR_NAMEI
records upon success.
ok millert beck
|
|
I borrowed an example usage from __getcwd poorly to begin with
and then there was some other strangeness in there.
diagnosed with deraadt.
ok deraadt@
|
|
We want this so that we can stop allowing readlink() on traversed
vnodes in unveil().
This includes all the kernel side and the system call.
This is not yet used in libc for realpath, so nothing calls this yet.
The libc wrapper will be committed later.
Testing by many, and ports build by naddy@
ok deraadt@
|
|
UNVEIL_INSPECT is a hack we added to get chrome/glib working. It silently
adds permission for stat(2), access(2), and readlink(2) to be used on
all path components of any unveil'ed path. robert@ has sucessfully now
fixed chrome/glib to not require exessive TOC vs TOU stat(2) and access(2)
calls on the paths it uses, so that this no longer needed there.
readlink(2) is the sole call that is now permitted by UNVEIL_INSPECT,
and this is only needed so that realpath(3) can work. Going forward we will
likely make a realpath(2), after which we can completely deprecate
UNVEIL_INSPECT.
ok deraadt@
|
|
Currently we validate time input for all four of these syscalls in the
workhorse function dovutimens(). This is bad because both futimes(2)
and utimes(2) have input as timevals that need to be converted to
timespecs. This multiplication can overflow to create a "valid"
input, e.g. if tv_usec is equal to 2^61 (invalid value) on a platform
with 64-bit longs, the resulting tv_nsec is equal to zero (valid value).
This is also a bit wasteful. We aquire a vnode and do other work
under KERNEL_LOCK only to release the vnode when the time input is
invalid.
So, duplicate a bit of code to validate the time inputs before we do
any conversions or real VFS work.
probably still ok tedu@ deraadt@
|
|
ok beck
Reported-by: syzbot+cc59412ed8429450a1ae@syzkaller.appspotmail.com
|
|
doesn't get freed. move the free calls into the same function as namei.
fixed bug report from Dariusz Sendkowski
ok beck
|
|
level directories from working when you don't traverse into them starting
from /. Most found by brynet@ and a few others.
ok brynet@ deraadt@
|
|
protected properly and files without any x bit set were accidentaly considered
executable when checked with access(2).
Issues found and reported by deraadt, halex, reyk, tb
ok deraadt
|
|
unveil for each unveil in the process at unveil() time, and refactoring the
handling of current directory and ISDOTDOT to be much more sensible.
Worked out at ns2k18 with guenther@.
ok deraadt@
|
|
dedicated functions.
OK deraadt@ mpi@
|
|
OK bluhm@
|
|
from Jan Klemkow
|