Age | Commit message | Author |
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
ok visa@
|
|
getvnode().
ok millert@
|
|
The syscall doesn't sleep before a vnode reference is taken, so it
doesn't strictly need the refcounts now. But they will soon be
used for parallelism, so make it ready.
ok bluhm@
|
|
This ensures that all operations manipulating a 'struct file *' do so
with a properly refcounted element.
ok visa@, tedu@, bluhm@
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
are pushed to disk. Dangling vnodes (unlinked files still in use) and
vnodes undergoing change by long-running syscalls are identified -- and
such filesystems are marked dirty on-disk while we are suspended (in case
power is lost, a fsck will be required). Filesystems without dangling or
busy vnodes are marked clean, resulting in faster boots following
"battery died" circumstances.
Tested by numerous developers, thanks for the feedback.
|
|
ok millert@ sthen@
|
|
for blocks re-fetchable from the filesystem. However at reboot time,
filesystems are unmounted, and since processes lack backing store they
are killed. Since the scheduler is still running, in some cases init is
killed... which drops us to ddb [noted by bluhm]. Solution is to convert
filesystems to read-only [proposed by kettenis]. The tale follows:
sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which
completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT()
with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a
copyin() late... so store the sizes in vfsconflist[] and move the copyin()
to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill
legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is
sharp and rusty especially wrt softdep, so fix some bugs and add
~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help,
so tie them to &dead_vnops.
ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but
this issue is separate and will be dealt with in time.
couple hundred reboots by bluhm and myself, advice from guenther and
others at the hut
|
|
in it, dangling mounts could remain. When unmounting, check the
hierarchy and unmount recursively. Also prevent a new mount from
appearing during the scan.
Joint work with natano@; testing and OK krw@
|
|
ok mpi@ millert@
|
|
close-on-exec flag on the newly allocated fd. Make falloc()'s
return arguments non-optional: assert that they're not NULL.
ok mpi@ millert@
|
|
ok guenther mpi
|
|
with vfs_busy(). If the FOREACH_SAFE macro is used, the next pointer
is not locked and could be freed by another process. Unless
necessary, do not use _SAFE as it is unsafe. In vfs_unmountall()
the current pointer is actually freed. Add a comment that this
race has to be fixed later.
OK krw@
|
|
OK mpi@ millert@
|
|
without root privileges. This is only the kernel/mount flag; additional
work in the build Makefiles will be necessary such that the files in
$DESTDIR are created with correct permissions.
tedu couldn't find anything wrong with it in a quick review
idea & ok deraadt
|
|
program to call the mount/umount system calls. There is no way any user
can be expected to keep their system safe / reliable with this feature.
Ignore setting to =1, and after release we'll delete the sysctl entirely.
ok lots of people
|
|
Fixes a crash when MNT_DOOMED is passed in the flags to unmount(2)
found by NCC Group. OK bluhm@
|
|
OK beck@ tedu@
|
|
family
it splits PLEDGE_FATTR in two ("fattr" still grants the 2 flags, so no functional changes):
- PLEDGE_CHOWN : to be able to call *chown(2) syscalls
- PLEDGE_FATTR : the rest
it introduces "chown" which grants:
- PLEDGE_CHOWN : be able to call *chown(2)
- PLEDGE_CHOWNUID : be able to modify owner/group
ok deraadt@ tedu@
|
|
ok guenther@
|
|
ok guenther@
|
|
vrele()/vput().
ok deraadt@
|
|
This brings us back in conformance with POSIX rmdir(2) and rmdirat(2).
OK kettenis@
|
|
is generated, and mprotect/mmap return ENOTSUP. If the sysctl(8) flag
kern.wxabort is set then a SIGABRT occurs instead, for gdb use or coredump
creation.
W^X violating programs can be permitted on a ffs/nfs filesystem-basis,
using the "wxallowed" mount option. One day far in the future
upstream software developers will understand that W^X violations are a
tremendously risky practice and that style of programming will be
banished outright. Until then, we expect most users will need to use the
wxallowed option on their /usr/local filesystem. At least your other
filesystems don't permit such programs.
ok jca kettenis mlarkin natano
|
|
please note that chrooted processes are still possible with pledge(2), but only
if the chroot(2) is done *before* calling pledge(2). Once pledged, no more
chroot(2) calls are permitted.
|
|
could end up in an inconsistent state. The fstype dependent
mp->mnt_data was NULL, but the general mp was still listed as a
valid mount point. Next access to the file system would crash with
a NULL pointer dereference.
If closing the device fails, the mount point must go away anyway.
There is nothing we can do about it. Remove the workaround for the
EIO error in the general unmount code, but do not generate any error
in the file system specific unmount functions.
OK natano@ beck@
|
|
torture tested on amd64, i386 and macppc
ok beck mpi stefan
"the change looks right" deraadt
|
|
oflags & 3 == 3 case. Therefore this depends on vn_open() blocking the
operation later. Probably this meant the ni_pledge request would be too
high, causing transient operation failure, rather than transient operation
passage). Instead of initializing based on the oflags value use the
result of FFLAGS(). I should have done this from the start.
ok semarie
[oflags & 3 == 3 is major dejavu for me]
|
|
enforce it for open(2) when used with O_CREAT and mode.
ok deraadt@
|
|
enforce it for mkfifo(2) and mknod(2) (with "dpath" promise).
ok deraadt@
|
|
This will be required to keep pax/tar/cpio at otherwise very high levels
of pledge (and we will see where else it is beneficial).
Allocate a bit for pledge "audio", which will be coming soon.
good discussions with semarie
|
|
/dev/console case, so go back to doing the direct D_TTY check.
signed over a few times with guenther
|
|
because that shows the /dev/console translated vnode.
You either already know the story, or you don't want to know.
|
|
ok millert semarie tedu guenther
|
|
ok millert semarie tedu guenther
|
|
pledgenote is used to annotate the policy for a namei context, so make it
tracked by the nameidata.
The caller is expected to define the policy explicitly; failing to do so is
a kernel bug.
ok deraadt@
|
|
- rename the _check functions without the suffix: a "pledge" function called from
anywhere is a "check" function.
- make calling pledge_fail the responsibility of the _check function, and remove it
from the callers.
- make proper use of (potential) returned error of _check() functions.
- adds pledge_kill() and pledge_protexec()
with and OK deraadt@
|
|
ok semarie
|
|
- in sys_access() which calls dofaccessat() and where the same note is already set
- in sys_mkdir() which calls domkdirat() and where the same note is already set
- in sys_rmdir() which calls dounlinkat() and where the same note is already set
it puts the p_pledgenote assignment next to the NDINIT/NDINITAT call.
|
|
the previous check in pledge_namei() was incomplete. Using SYS_chroot required
"id", yet pledge_namei() could be passed with just "rpath" (without
using the now removed whitelisted entry).
the check for "rpath id proc" is now done using p_pledgenote: pledge_namei()
will check that the pledgenote is permitted by your pledge.
"go ahead" deraadt@
|
|
noticed by semarie
|
|
used less than "stdio" -- include all the "self" operations. Instead of
different defines, use regular PLEDGE_* in the "p_pledgenote" variable
(which indicates the operation subtype a system call is performing). Many
checks become easier to understand. p_pledgenote can often be passed
directly to ktrace, so that kdump says:
15565 test CALL pledge(0xa9a3f804c51,0)
15565 test STRU pledge request="stdio"
15565 test RET pledge 0
15565 test CALL open(0xa9a3f804c57,0x2<O_RDWR>)
15565 test NAMI "/tmp/testfile"
15565 test PLDG open, "wpath", errno 1 Operation not permitted
with help from semarie, ok guenther
|
|
the following diff adds a clear view of whitelisted-paths in pledge.
before, whitelisting the "/usr/local/bin" path would make only the "/usr/local/bin"
vnode present, leaving "/usr/local", "/usr", and "/" as ENOENT. It was a
somewhat odd filesystem hierarchy, and it broke realpath(3).
with this diff, the directories that are parents of a
whitelisted directory become visible to stat(2)-related syscalls, but only
with restricted permissions: stat(2) lies a bit, reporting them as owned by
root:wheel with mode --x--x--x. Note that only stat(2) is affected by this
"view", and the owner/mode aren't actually changed: it is just a "lie".
while here, refactor pledge_namei() a bit to avoid multiple for-loops
over the whitelisted-path array.
ok deraadt@
|