path: root/sys/kern
Age | Commit message | Author
2021-06-04 | regen | mvs
2021-06-04 | Unlock connect(2). Again. | mvs
ok mpi@
2021-06-02 | Use the same logic in all copies of gpt_chk_mbr(), relaxing the | Kenneth R Westerback
media length check to allow EFI GPT partitions to be smaller than the entire disk. Consistently use GPTSECTOR instead of randomly tossing in some literal '1's. ok kettenis@
2021-06-02 | Enable pool cache on knote pool | Visa Hankala
Use the pool cache to reduce the overhead of memory management in function kqueue_register(). When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid potential sleeping in the middle of the critical section that spans from knote lookup to insertion. However, the pre-allocation is useless if the lookup finds a matching knote. The cost of knote allocation will become significant with kqueue-based poll(2) and select(2) because the frequency of allocation will increase. Most of the cost appears to come from the locking inside the pool. The pool cache amortizes it by using CPU-local caches of free knotes as buffers. OK dlg@ mpi@
2021-06-02 | regen | mvs
2021-06-02 | Unlock setrtable(2). Local copy of `ps_rtableid' used to make checks | mvs
consistent. ok mpi@
2021-06-02 | kernel: introduce per-CPU panic(9) message buffers | cheloha
Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each platform for use by panic(9). The first panic on a given CPU writes its message to this buffer. Subsequent panics on a given CPU print the panic message to the console but do not modify the buffer.
This aids debugging in two cases:
- If 2+ CPUs panic simultaneously there is no risk of garbled messages in the panic buffer.
- If a CPU panics and then the operator causes a second panic while using ddb(4), the operator can still recall the first failure on a particular CPU.
Misc. changes to support this bigger change:
- Set panicstr atomically to identify the first CPU to reach panic().
- Tweak db_show_panic_cmd() to print all panic messages across all CPUs. Prefix the first panic with an asterisk ('*').
- Prefer db_printf() to printf() during a panic if we have it. Apparently it disturbs less global state.
- On amd64, tweak fault() to write the local panic buffer. This needs more work.
Prompted by bluhm@ and deraadt@. Mostly written by deraadt@. Discussed with bluhm@, deraadt@ and kettenis@.
Born from a discussion on tech@ about making panic(9) more MP-safe: https://marc.info/?l=openbsd-tech&m=162086462316143&w=2
ok kettenis@, visa@, bluhm@, deraadt@
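The buffer policy described in this entry can be sketched in userland C. This is only an illustration under stated assumptions, not the kernel code: `struct cpu_sketch`, `cpus`, `first_panic`, `NCPU` and `record_panic()` are hypothetical names, and a C11 compare-and-swap stands in for whatever atomic primitive the kernel uses to set panicstr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define PANICBUF_LEN	512	/* buffer size stated in the commit message */
#define NCPU		4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for per-CPU state; not the kernel's cpu_info. */
struct cpu_sketch {
	char panicbuf[PANICBUF_LEN];
};

static struct cpu_sketch cpus[NCPU];

/* Plays the role of panicstr: points at the first panic message overall. */
static _Atomic(const char *) first_panic;

/*
 * Record a panic message for the given CPU.
 * Returns 1 if this call was the first panic system-wide, 0 otherwise.
 */
static int
record_panic(int cpu, const char *msg)
{
	const char *expected = NULL;
	struct cpu_sketch *ci;

	assert(cpu >= 0 && cpu < NCPU);
	ci = &cpus[cpu];

	/* Only the first panic on this CPU fills its buffer. */
	if (ci->panicbuf[0] == '\0')
		snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);

	/* Exactly one caller wins the compare-and-swap. */
	return atomic_compare_exchange_strong(&first_panic, &expected,
	    ci->panicbuf);
}
```

A second call for the same CPU leaves that CPU's buffer untouched, so the first failure stays available for later inspection.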
2021-06-01 | Make spoofed disklabel boundstart and boundend default to the bounds | Kenneth R Westerback
of the usable LBA range defined by the GPT header. And then shrink them to the bounds of the first OpenBSD partition if one is found. While here simplify the logic, eliminate some superfluous variables and reduce use of magic numbers. Improvement suggested by sobrado@ ok kettenis@
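The two-step choice of bounds described here might look roughly like the sketch below. The structs and `spoof_bounds()` are hypothetical simplifications, not the kernel's GPT or disklabel types, and the sketch assumes that the end bound is exclusive while GPT LBAs are inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified views of the GPT data; not the kernel structs. */
struct gpt_range {
	uint64_t first_usable_lba;
	uint64_t last_usable_lba;	/* inclusive */
};

struct gpt_part {
	int	 is_openbsd;		/* nonzero for an OpenBSD partition */
	uint64_t start_lba;
	uint64_t end_lba;		/* inclusive */
};

struct bounds {
	uint64_t start;
	uint64_t end;			/* assumed exclusive in this sketch */
};

static struct bounds
spoof_bounds(const struct gpt_range *hdr, const struct gpt_part *parts,
    size_t nparts)
{
	struct bounds b;
	size_t i;

	assert(hdr != NULL);

	/* Default: the usable LBA range from the GPT header. */
	b.start = hdr->first_usable_lba;
	b.end = hdr->last_usable_lba + 1;

	/* Shrink to the first OpenBSD partition, if there is one. */
	for (i = 0; i < nparts; i++) {
		if (!parts[i].is_openbsd)
			continue;
		b.start = parts[i].start_lba;
		b.end = parts[i].end_lba + 1;
		break;
	}
	return b;
}
```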
2021-05-31 | Redefine ADJFREQ_MIN to avoid undefined behaviour (when not using -fwrapv) | Visa Hankala
Change the definition of ADJFREQ_MIN so that it does not shift a negative value. Such shifting is undefined in standard C. This came up when cross-compiling the kernel using ports clang. The shifting becomes defined when compiling with option -fwrapv. Base clang enables this option by default. OK naddy@ cheloha@
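The general pattern can be shown with a small C sketch. The names `FREQ_LIMIT`, `FREQ_MAX`, `FREQ_MIN` and `freq_in_range()` and the numeric value are hypothetical and chosen only for illustration; the point is that the positive value is shifted first and the result is negated afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit, chosen only for illustration. */
#define FREQ_LIMIT	500000000LL

/*
 * Undefined in standard C, because a negative value is shifted left:
 *	#define FREQ_MIN_BAD	(-FREQ_LIMIT << 32)
 */

/* Well defined: shift the positive value, then negate the result. */
#define FREQ_MAX	(FREQ_LIMIT << 32)
#define FREQ_MIN	(-FREQ_MAX)

static_assert(FREQ_MIN < 0 && FREQ_MIN == -FREQ_MAX,
    "minimum is the negated maximum");

/* Range check using the limits defined above. */
static int
freq_in_range(int64_t f)
{
	return f >= FREQ_MIN && f <= FREQ_MAX;
}
```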
2021-05-30 | Declare all struct protosw as constant. | Alexander Bluhm
OK mvs@
2021-05-28 | Add f_modify and f_process callbacks to socket filterops. | Visa Hankala
This makes kqueue use the extended callback interface with socket event filters. Now one level of nested kernel locking is avoided, and the callbacks run without splhigh(). The filterops no longer check NOTE_SUBMIT, and use a fixed locking pattern instead. The f_event routines are always called with solock(), whereas f_modify and f_process are always called without the lock. OK mpi@
2021-05-27 | Relax criteria for recognizing GPT formatted media by allowing the | Kenneth R Westerback
EFI GPT partition (0xEE) in the protective MBR to be smaller than the actual size of the media. This allows GPT disk images dd'ed onto larger physical media to be recognized by fdisk(8) and the kernel. Feedback from kettenis@ on various earlier versions.
2021-05-26 | Fix the return value for the FUTEX_WAIT/FUTEX_WAIT_PRIVATE futex(2) | Mark Kettenis
operation. System calls should return -1 and set errno when they fail. They should not return an errno value directly. This matches how the Linux version of futex(2) behaves and what Mesa expects. This fixes a bug in Mesa where a timeout wouldn't be reported properly. Technically this is an ABI break. But libc and libpthread were changed to be compatible with both the old and new ABI, and code outside of base almost certainly expects Linux compatible behaviour. If you have not rebuilt libc in the last few days, upgrade using a snap. Mesa issue discovered by jsg@ ok mpi@, deraadt@
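The difference between the two return conventions can be illustrated with a short C sketch. Nothing here calls futex(2): `raw_wait()`, `wait_syscall_style()` and `timed_out_p()` are hypothetical helpers that only model a failing wait.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical low-level operation: returns 0 or an errno value directly. */
static int
raw_wait(int timed_out)
{
	return timed_out ? ETIMEDOUT : 0;
}

/* System-call style wrapper: -1 plus errno on failure, 0 on success. */
static int
wait_syscall_style(int timed_out)
{
	int error = raw_wait(timed_out);

	if (error != 0) {
		errno = error;
		return -1;
	}
	return 0;
}

/* How a caller that expects the -1/errno convention detects a timeout. */
static int
timed_out_p(void)
{
	int ret = wait_syscall_style(1);

	assert(ret == 0 || ret == -1);
	return ret == -1 && errno == ETIMEDOUT;
}
```

A caller written for the -1/errno convention would miss the timeout if the function returned ETIMEDOUT directly, which is the kind of mismatch the entry describes for Mesa.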
2021-05-26 | Use `so_lock' to protect key management (PF_KEY) sockets. This can be | mvs
done because we have no cases where one thread should lock two sockets simultaneously. tested by yasuoka@ ok bluhm@ markus@
2021-05-25 | As network features are not added dynamically, the domain structures | Alexander Bluhm
are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-05-19 | In ttyinfo() check that ps_vmspace isn't NULL before calculating the | Mark Kettenis
resident set size. This replicates what the sysctl code does and fixes a kernel crash reported by robert@ ok deraadt@
2021-05-18 | Move potential sleeping m_getclr(9) out of `unp_lock' within unp_bind(). | mvs
ok mpi@
2021-05-17 | Increase the default buffer space used on PF_UNIX sockets to 8k. | Claudio Jeker
Additionally make the values tuneable via sysctl. OK deraadt@ mvs@
2021-05-16 | panic does not require a \n at the end. When one is provided, it looks wrong. | Theo de Raadt
2021-05-14 | Whitespace tweaks and a couple of stray u_int* in gpt_chk_mbr(). | Kenneth R Westerback
No intentional functional change.
2021-05-14 | Tweak the two copies of gpt_chk_mbr() to return the index of the MBR | Kenneth R Westerback
0xEE (DOSPTYP_EFI) partition, or -1 if no usable such partition is found. Adopt a consistent idiom to capture the index for future use. Clean up the gpt_chk_mbr() logic to make it clearer what constraints are being applied when looking for the DOSPTYP_EFI partition. No intentional functional change.
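The "index or -1" idiom can be sketched as follows. This is not the kernel's gpt_chk_mbr(): `struct mbr_entry` and `find_protective_efi()` are hypothetical, and the constraints applied (exactly one 0xEE entry, no other used entries, start at GPTSECTOR, not extending past the media) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDOSPART	4	/* primary MBR slots */
#define DOSPTYP_UNUSED	0x00
#define DOSPTYP_EFI	0xEE	/* protective GPT partition */
#define GPTSECTOR	1	/* LBA of the GPT header */

/* Hypothetical, already-decoded MBR entry; not struct dos_partition. */
struct mbr_entry {
	uint8_t	 type;
	uint32_t start;		/* first sector */
	uint32_t size;		/* length in sectors */
};

/*
 * Return the index of the usable 0xEE entry, or -1 if there is none.
 * The entry may be smaller than the media but must not extend past it.
 */
static int
find_protective_efi(const struct mbr_entry *p, uint64_t disk_sectors)
{
	int i, found = -1, nefi = 0, nother = 0;

	assert(p != NULL);
	for (i = 0; i < NDOSPART; i++) {
		if (p[i].type == DOSPTYP_UNUSED)
			continue;
		if (p[i].type != DOSPTYP_EFI) {
			nother++;
			continue;
		}
		nefi++;
		if (p[i].start == GPTSECTOR && p[i].size > 0 &&
		    (uint64_t)p[i].start + p[i].size <= disk_sectors)
			found = i;
	}
	return (nefi == 1 && nother == 0) ? found : -1;
}
```

Returning the index rather than a boolean lets the caller keep a reference to the entry for later use, which is the idiom the entry mentions.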
2021-05-13 | Do `so_rcv' cleanup with sblock() held. | mvs
solock() should be taken before sblock(). soreceive() grabs solock() and then locks `so_rcv'. But later it releases solock() before calling uiomove(9). So a concurrent thread that performs soshutdown() could break the soreceive() loop. But `so_rcv' is still locked by sblock(), so this soshutdown() thread will sleep in sorflush() at the sblock() call. The soshutdown() thread doesn't release solock() after the sblock() call, so it does not matter where `so_rcv' is released: it will stay locked until solock() is released. That's why this strange looking code works fine. Moving sbunlock() to just after the `so_rcv' cleanup affects nothing but makes the code consistent and easier to understand. ok mpi@
2021-05-13 | Use NULL instead of 0 for mbuf(9) pointers. | mvs
ok millert@
2021-05-13 | Assign NULL instead of 0 to `control' within sendit(). It's mbuf(9) | mvs
pointer. ok deraadt@
2021-05-13 | Move ktrfds() below fdpunlock(). This fixes lock order issue between | mvs
vn_lock(9) and fdplock(). Reported-by: syzbot+2300a1bedc425f6f851e@syzkaller.appspotmail.com ok visa@
2021-05-12 | regen | Martin Pieuchot
2021-05-12 | Revert unlock of connect(2), bind(2), listen(2) and shutdown(2). | Martin Pieuchot
At least one of them causes a deadlock involving `unplock' and mbuf allocations ('mbufpl') as reported by millert@.
2021-05-11 | timeout_barrier(9), timeout_del_barrier(9): remove kernel lock | cheloha
In timeout_barrier(9) we take/release the kernel lock to ensure that the given timeout has finished running (if it had been running at all). This approach is inefficient. If we put a barrier timeout on the queue and wait for it to run in cond_wait(9) we can block instead of spinning for the kernel lock. We already do this for process-context timeouts in timeout_barrier(9) anyway. Discussed with dlg@, visa@, and mpi@. ok dlg@
2021-05-11 | regen | mvs
2021-05-11 | Unlock shutdown(2). | mvs
ok mpi@
2021-05-11 | regen | mvs
2021-05-11 | Unlock listen(2). | mvs
ok mpi@
2021-05-11 | regen | mvs
2021-05-11 | Unlock connect(2). | mvs
ok mpi@
2021-05-11 | regen | mvs
2021-05-11 | Unlock bind(2). | mvs
ok mpi@
2021-05-10 | Revert previous, it introduced a regression with breakpoints in gdb. | Martin Pieuchot
2021-05-08 | Spoof GPT partitions of type 21686148-6449-6e6f-744e-656564454649 (a.k.a. | Kenneth R Westerback
"IdontNeedEFI", a.k.a. "BIOS boot") as FS_BOOT. Often used to contain the second stage boot loader binary on disk images. Makes it easier to recognize/overwrite/remove the contents. Not yet supported in fdisk(8). Example image provided by mlarkin@
2021-05-06 | regen | anton
2021-05-06 | Unlock lseek(2). | anton
In August 2019 I tried to unlock lseek which failed since the vnode lock could not be acquired without holding the kernel lock back then. claudio@ recently made it possible to acquire a vnode lock without holding the kernel lock. The kernel lock is still required around VOP_GETATTR() as the underlying file system implementations are not MP-safe. ok claudio@
2021-05-06 | Refactor routines to stop/unstop processes and save the corresponding signal. | Martin Pieuchot
- Move the "hack" involving P_SINTR to avoid grabbing the SCHED_LOCK() recursively closer to where it is necessary, in proc_stop()
- Introduce proc_unstop(), the symmetric routine to proc_stop(), which manipulates `ps_xsig' and use it whenever a SSTOPed thread needs to be awakened.
- Manipulate `ps_xsig' only in proc_stop/unstop()
ok kettenis@
2021-05-04 | Reorder the integer sysctl functions. Then the traditional 4.4BSD | Alexander Bluhm
comment 'As above...' makes sense again. Improve comments for sysctl_int_bounded() and sysctl_bounded_arr(). OK gnezdo@ mvs@
2021-05-04 | As the unbounded feature in sysctl_int_bounded() is no longer used, | Alexander Bluhm
remove it. This also fixes a defective check of the dynamic boundary in sysctl_sysvshm(). OK mvs@ gnezdo@
2021-05-04 | syscalls.c, init_sysent.c, syscall.h, syscallargs.h: regen | cheloha
Regen after unlocking getitimer(2) and setitimer(2). ok anton@, mpi@
2021-05-04 | getitimer(2), setitimer(2): unlock syscalls | cheloha
With the changes in kern_time.c v1.150, neither getitimer(2) nor setitimer(2) need the kernel lock anymore. ok anton@, mpi@
2021-05-01 | Update the remaining SYSCTL_INT_READONLY cases | gnezdo
OK mvs@
2021-05-01 | Implement per-socket `so_lock' rwlock(9) and use it to protect routing | mvs
(PF_ROUTE) sockets. This can be done because we have no cases where one thread should lock two sockets simultaneously. Compared with the previous version, rtm_senddesync_timer() execution was moved to process context. Also, this time `so_lock' is used for routing sockets only, but in the future it will be used for other socket types too. tested by claudio@ ok claudio@ bluhm@
2021-05-01 | Retire OpenBSD/sgi. | Visa Hankala
OK deraadt@
2021-04-30 | Rearrange the implementation of bounded sysctl. The primitive | Alexander Bluhm
functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
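The layering described in this entry, a primitive integer setter with a range-checked wrapper on top, can be sketched in plain C. The functions `set_int()` and `set_int_bounded()` are hypothetical simplifications and do not reproduce the real sysctl_int() or sysctl_int_bounded() signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Primitive: optionally report the old value, optionally store a new one.
 * Hypothetical simplification of a sysctl-style integer handler.
 */
static int
set_int(int *oldp, const int *newp, int *valp)
{
	assert(valp != NULL);
	if (oldp != NULL)
		*oldp = *valp;
	if (newp != NULL)
		*valp = *newp;
	return 0;
}

/* Range-checked layer built on top of the primitive. */
static int
set_int_bounded(int *oldp, const int *newp, int *valp, int minimum,
    int maximum)
{
	if (newp != NULL && (*newp < minimum || *newp > maximum))
		return EINVAL;
	return set_int(oldp, newp, valp);
}
```

Keeping the range check in a wrapper leaves the primitive unchanged, so callers that need no bounds keep using it directly.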
2021-04-30 | When terminating via pledge_fail() stop all threads, before issuing a | Theo de Raadt
(delayed action) sigabort() and disabling all syscalls for this process (ie. all threads). The old behaviour resulted in multiple threads crashing over top of each other, and a poor debugging experience. We keep using sigabort() rather than sigexit(), to keep the debugging process good. Diagnosed from a report from brynet, and followup discussion with many.