path: root/sys/kern
Age  Commit message  (Author)
2024-08-06  Unlock KERN_CLOCKRATE.  (Vitaliy Makkoveev)
Read-only access to local `clkinfo' filled with immutable data. ok bluhm
2024-08-06  Stop using KERNEL_LOCK to protect the per process kqueue list  (Claudio Jeker)
Instead of the KERNEL_LOCK use the ps_mtx for most operations. If the ps_klist is modified, an additional global rwlock (kqueue_ps_list_lock) is required. This includes the knotes with NOTE_FORK and NOTE_EXIT, since in either case a ps_klist is changed. In the NOTE_FORK | NOTE_TRACK case the call to kqueue_register() can sleep, which is why a global rwlock is used. Adjust the reaper() to call knote_processexit() without KERNEL_LOCK. Double lock idea from visa@ OK mvs@
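A userspace sketch of the double-lock idea described above; the lock names follow the commit, but the types and the helper are illustrative stand-ins, not the kernel's:

    #include <pthread.h>

    struct knote;				/* opaque in this sketch */

    /* Global rwlock taken for any modification of a ps_klist. */
    static pthread_rwlock_t kqueue_ps_list_lock = PTHREAD_RWLOCK_INITIALIZER;

    struct process_sketch {
    	pthread_mutex_t ps_mtx;		/* covers most per-process state */
    	struct knote *ps_klist;		/* per-process knote list */
    };

    void
    ps_klist_insert_sketch(struct process_sketch *pr, struct knote *kn)
    {
    	/*
    	 * Readers of the list get away with ps_mtx alone; modifying
    	 * it takes the global rwlock too, and a sleeping operation
    	 * (the NOTE_FORK | NOTE_TRACK registration) can hold the
    	 * rwlock without the mutex.
    	 */
    	pthread_rwlock_wrlock(&kqueue_ps_list_lock);
    	pthread_mutex_lock(&pr->ps_mtx);
    	(void)kn;			/* linking elided in the sketch */
    	pthread_mutex_unlock(&pr->ps_mtx);
    	pthread_rwlock_unlock(&kqueue_ps_list_lock);
    }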
2024-08-05  change the nsec argument to timeout_add_nsec from int to uint64_t  (David Gwynne)
you can only fit a couple of seconds' worth of nanoseconds into an int, which limited the usefulness of the api. worse, a large nsec value passed in could be cast to a negative int, which tripped a KASSERT at the top of timeout_add, which ends up being called. avoid this footgun by working in the bigger type and doing the same range checks/fixes for other timeout_add wrappers. ok claudio@ mvs@
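A minimal userspace illustration of that footgun: a modest nanosecond count overflows an int and comes out negative (the timeout_add() call chain itself is not reproduced here):

    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>

    int
    main(void)
    {
    	uint64_t nsecs = 3ULL * 1000000000ULL;	/* only 3 seconds */
    	int as_int = (int)nsecs;		/* implementation-defined */

    	/* INT_MAX is ~2.1e9: an int holds barely two seconds of ns. */
    	printf("INT_MAX=%d, 3s of ns as int: %d\n", INT_MAX, as_int);
    	return 0;
    }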
2024-08-05  Unlock KERN_BOOTTIME.  (Vitaliy Makkoveev)
microboottime() and the following binboottime() are mp-safe and `mb' is local data. ok bluhm
2024-08-05  Unlock most of `kern_vars' variables.  (Vitaliy Makkoveev)
Add corresponding cases to the kern_sysctl() switch and unlock read-only variables from `kern_vars'. Unlock KERN_SOMAXCONN and KERN_SOMINCONN, which are only read, atomically, from solisten(). ok kettenis
2024-08-05  Take `sysctl_lock' before kernel lock.  (Vitaliy Makkoveev)
ok bluhm
2024-08-02  regen  (Vitaliy Makkoveev)
2024-08-02  Push kernel lock down to sysctl(2).  (Vitaliy Makkoveev)
Unlock a few obviously immutable or read-only variables from the "kern.*" and "hw.*" paths. Keep the rest of the variables locked as before, including the page wiring. Use the new sysctl_vs{,un}lock() functions introduced for that purpose.

In the kern.* path:
- KERN_OSTYPE, KERN_OSRELEASE, KERN_OSVERSION, KERN_VERSION - immutable;
- KERN_NUMVNODES - read-only access to an integer;
- KERN_MBSTAT - read-only access to per-CPU counters.

In the hw.* path:
- HW_MACHINE, HW_MODEL, HW_NCPUONLINE, HW_PHYSMEM, HW_VENDOR, HW_PRODUCT, HW_VERSION, HW_SERIALNO, HW_UUID, HW_PHYSMEM64 - immutable;
- HW_USERMEM and HW_USERMEM64 - `physmem' is immutable and uvmexp.wired is mutable but an integer; read-only access to the locally stored difference between `physmem' and uvmexp.wired;
- `hw_vars' - read-only access to integers; some of them, like HW_BYTEORDER and HW_PAGESIZE, are immutable.

ok bluhm kettenis
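A hedged userspace sketch of the pattern that makes these paths safe to unlock: snapshot the value into a local and copy only the local out, so no shared state is touched during the copy. The helper is named after the kernel's sysctl_rdint(), but its body and the somaxconn variable here are illustrative:

    #include <stddef.h>
    #include <string.h>
    #include <errno.h>

    static volatile int somaxconn_sketch = 128;	/* plain int, read atomically */

    /* Read-only integer export: copy a local snapshot to the caller. */
    int
    sysctl_rdint_sketch(void *oldp, size_t *oldlenp, int val)
    {
    	if (oldp == NULL) {
    		*oldlenp = sizeof(val);		/* size probe */
    		return 0;
    	}
    	if (*oldlenp < sizeof(val))
    		return ENOMEM;
    	*oldlenp = sizeof(val);
    	memcpy(oldp, &val, sizeof(val));	/* copyout() in the kernel */
    	return 0;
    }

    int
    kern_somaxconn_sketch(void *oldp, size_t *oldlenp)
    {
    	int snap = somaxconn_sketch;	/* single read of shared state */

    	return sysctl_rdint_sketch(oldp, oldlenp, snap);
    }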
2024-08-01  Run socket splice idle timeout without kernel lock.  (Alexander Bluhm)
OK mvs@
2024-07-29  Move the signal related kqueue filters to kern_event.c.  (Claudio Jeker)
Since proc and signal filters share the same klist it makes sense to keep them together. OK mvs@
2024-07-29  Remove `sb_sel' from sobuf_print() output, no sense to print "...".  (Vitaliy Makkoveev)
ok bluhm
2024-07-29  Replace per thread P_CONTINUED with per process PS_CONTINUED flag  (Claudio Jeker)
dowait6() can only look at per-process state, so switch this over. Right now SIGCONT handling in ptsignal is recursive and not quite right, but this is a step in the right direction. It fixes dowait6() handling for multithreaded processes where the main thread exited. OK mpi@
2024-07-26  Trace struct itimerval  (Philip Guenther)
ok deraadt@ claudio@
2024-07-24  KASSERT that the ps_single proc has P_SUSPSINGLE cleared.  (Claudio Jeker)
Requested by kettenis@ and guenther@
2024-07-24  Remove the (pr->ps_single->p_flag & P_SUSPSINGLE) == 0 check since it is always true.  (Claudio Jeker)
Also consistently wrap all flag checks in parentheses. OK kettenis@ guenther@
2024-07-24  Use a different mutex to protect the kqueue klist in logsoftc.  (Claudio Jeker)
knote_locked() will call wakeup(), and with it the SCHED_LOCK, which makes log_mtx no longer a leaf lock. By using a separate lock for the klist we can keep log_mtx a leaf lock, and with that printf(9) can be used in most contexts again. OK mvs@
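A sketch of the leaf-lock property at stake, with userspace stand-ins and illustrative names: while a leaf lock is held, no other lock may be acquired, which is exactly what a klist wakeup under log_mtx would violate. Moving the klist under its own mutex keeps log_mtx a leaf:

    #include <pthread.h>

    static pthread_mutex_t log_mtx = PTHREAD_MUTEX_INITIALIZER;	/* leaf */
    static pthread_mutex_t log_kq_mtx = PTHREAD_MUTEX_INITIALIZER;	/* klist */

    void
    logwakeup_sketch(void)
    {
    	/* Leaf: nothing else gets locked while log_mtx is held. */
    	pthread_mutex_lock(&log_mtx);
    	/* ... append to the message buffer ... */
    	pthread_mutex_unlock(&log_mtx);

    	/*
    	 * The knote/wakeup path (which grabs scheduler locks) now
    	 * runs under its own mutex, after log_mtx is dropped.
    	 */
    	pthread_mutex_lock(&log_kq_mtx);
    	/* ... knote_locked() -> wakeup() -> SCHED_LOCK ... */
    	pthread_mutex_unlock(&log_kq_mtx);
    }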
2024-07-24  Move uvm_exit() outside of the KERNEL_LOCK() in the reaper.  (Martin Pieuchot)
Use atomic operations to reference count VM spaces. Tested by claudio@, bluhm@, sthen@, jca@ ok jca@, claudio@
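A userspace sketch of atomic reference counting in the spirit of this change; the names are illustrative and the real code lives in uvm:

    #include <stdatomic.h>
    #include <stdlib.h>

    struct vmspace_sketch {
    	atomic_int vm_refcnt;
    	/* ... address space state ... */
    };

    void
    vmspace_ref_sketch(struct vmspace_sketch *vm)
    {
    	atomic_fetch_add(&vm->vm_refcnt, 1);
    }

    void
    vmspace_unref_sketch(struct vmspace_sketch *vm)
    {
    	/* fetch_sub returns the old value: 1 means we were last. */
    	if (atomic_fetch_sub(&vm->vm_refcnt, 1) == 1)
    		free(vm);	/* uvm_exit() territory in the kernel */
    }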
2024-07-23  Pass curproc pointer down from sleep_finish() instead of pulling it in again in sleep_signal_check().  (Claudio Jeker)
OK dlg@
2024-07-22  Rename PS_STOPPED to PS_STOPPING.  (Claudio Jeker)
I want to use PS_STOPPED to indicate that a process has been stopped, so make room for that. OK kettenis@
2024-07-22  Switch proc_finish_wait() to use the process as argument instead of its ps_mainproc.  (Claudio Jeker)
dowait6() needs to stop using ps_mainproc and this is the first step. OK guenther@
2024-07-20  Unlock udp(4) somove().  (Vitaliy Makkoveev)
Socket splicing belongs to socket buffers. udp(4) sockets are fully switched to fine-grained buffer locks, so use them instead of the exclusive solock(). Always schedule the somove() thread to run, as we do for the tcp(4) case. This adds delay to packet processing, but it is comparable with the non-splicing case, where soreceive() threads are always scheduled. So, spliced udp(4) sockets now rely on sb_lock() of the `so_rcv' buffer together with the `sb_mtx' mutexes of both buffers. Shared solock() is only required around the pru_send() call, so most of the somove() thread runs simultaneously with the network stack. Also document the 'sosplice' structure locking. Feedback, tests and OK from bluhm.
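A hedged userspace model of the lock roles just described; every type and lock below is a pthread stand-in with illustrative names, not the kernel's API:

    #include <pthread.h>

    struct sockbuf_sketch {
    	pthread_mutex_t sb_mtx;		/* short-term buffer mutex */
    };

    struct socket_sketch {
    	pthread_rwlock_t so_lock;	/* solock(): shared around send */
    	pthread_mutex_t sb_lock;	/* long-term owner of so_rcv */
    	struct sockbuf_sketch so_rcv, so_snd;
    };

    void
    somove_sketch(struct socket_sketch *src, struct socket_sketch *dst)
    {
    	pthread_mutex_lock(&src->sb_lock);	/* own the source buffer */

    	pthread_mutex_lock(&src->so_rcv.sb_mtx);
    	pthread_mutex_lock(&dst->so_snd.sb_mtx);
    	/* ... detach mbufs from src->so_rcv, check dst space ... */
    	pthread_mutex_unlock(&dst->so_snd.sb_mtx);
    	pthread_mutex_unlock(&src->so_rcv.sb_mtx);

    	pthread_rwlock_rdlock(&dst->so_lock);	/* shared, send only */
    	/* ... pru_send() runs alongside the network stack ... */
    	pthread_rwlock_unlock(&dst->so_lock);

    	pthread_mutex_unlock(&src->sb_lock);
    }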
2024-07-14  Fix source and drain confusion in socket splicing somove().  (Alexander Bluhm)
If a large mbuf in the source socket buffer does not fit into the drain buffer, split the mbuf. But if the drain buffer still has some data in it, stop moving data and try again later. This skips a potentially expensive mbuf operation. When looking at which socket buffer has to be locked, I found that the length of the source send buffer was checked. Change it to the drain. As this is a performance optimization for a special corner case, no one noticed the bug. OK sashan@
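The decision the commit describes, reduced to a hedged userspace predicate; the types and names are illustrative:

    #include <stdbool.h>
    #include <stddef.h>

    struct sb_sketch {
    	size_t len;	/* bytes currently queued */
    	size_t space;	/* bytes of room left */
    };

    /*
     * Move the next mbuf now, splitting it if needed? Splitting is
     * expensive, so when the mbuf does not fit, only split once the
     * drain has fully emptied; otherwise defer and retry later.
     */
    bool
    move_mbuf_now(size_t mbuf_len, const struct sb_sketch *drain)
    {
    	if (mbuf_len <= drain->space)
    		return true;		/* fits whole, no split needed */
    	return drain->len == 0;		/* split only into an empty drain */
    }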
2024-07-14  Actually provide *definitions* for hwcap & hwcap2  (Jeremie Courreges-Anglas)
Double checked by kettenis@. Sorry for the time window with breakage visible on arm64 and riscv64. :-/
2024-07-14  Actually set up hwcap AUX_* entries when available  (Jeremie Courreges-Anglas)
Erroneously dropped from the last elf_aux_info(3) diff I sent on tech@. Lack of this chunk would affect arm64 and riscv64 as they're the two architectures providing hwcap*. Should have been ok kettenis@
2024-07-13  Revert the vdoom change.  (Bob Beck)
While it prevents the crashes on joel's Go builder and avoids the ufs_inactive problems, bluhm hits panics on shutdown and filesystem unmount on the regress testers. We'll have to try the other approach of detecting the corrupted vnode, perhaps.
2024-07-12  Remove internet PCB mutex.  (Alexander Bluhm)
All inpcb locking has been converted to the socket receive buffer mutex. The per-PCB mutex inp_mtx is not needed anymore. Also delete the PRU related locking functions. A flag PR_MPSOCKET indicates whether protocol functions support parallel access with the per socket rw-lock. TCP is the only protocol that is not MP capable from the socket layer and needs exclusive netlock. OK mvs@
2024-07-12  Switch `so_snd' of udp(4) sockets to the new locking scheme.  (Vitaliy Makkoveev)
udp_send() and the following udp{,6}_output() do not append packets to the `so_snd' socket buffer. This means the sosend() and sosplice() sending paths are dummy pru_send() calls and there is no problem running them simultaneously on the same socket. Push shared solock() deep down to sosend() and take it only around pru_send(), but keep somove() running under exclusive solock(). Since sosend() doesn't modify `so_snd', the unlocked `so_snd' space checks within somove() are safe. Corresponding `sb_state' and `sb_flags' modifications are protected by the `sb_mtx' mutex(9). Tested and OK bluhm.
2024-07-12  Add vdoom() to fix ufs/ext2fs re-use of invalid vnode.  (Bob Beck)
This was noticed by syzkaller and analyzed in isolation by mbuhl@ and visa@ two years ago. As the kernel has become more unlocked it has started to appear more, and was being hit regularly by jsing@ on the Go builder. The problem was that during reclaim of an inode, the corresponding vnode could be picked up by a vget() from another thread while the inode was being cleared out in the ufs_inactive routine and the thread running ufs_inactive slept for i/o. When raced, the vnode would then not have a zero use count and would not be cleared out on exit from ufs_inactive, leaving a dead/invalid vnode in use. While this could get "fixed" by checking for the race happening and trying again in the inactive routine, or by adding "yet another visible vnode locking flag", we chose to add a vdoom() api for the moment that allows the caller to block future attempts to grab this vnode until it is cleared out fully with vclean. Tested by jsing@ on the Go builder and it seems to solve the issue. ok kettenis@, claudio@
2024-07-11  Use atomic operations to access integers in sysctl(2).  (Alexander Bluhm)
In sysctl_int_bounded() use atomic operations to load, store, or swap integer values. By using volatile pointers this will result in a single assembly instruction, no matter how aggressively compilers optimize. Note that this does not solve data dependency problems, nor MP problems in the kernel code using these integers. For full MP safety additional considerations, memory barriers, or locks will be needed where the values are used. But for simple integer input and output, volatile is enough. If new and old value pointers are given to sysctl, atomic swapping guarantees that userland sees the same old value only once. There are more sysctl_int() functions that have to be adapted. OK deraadt@ kettenis@
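A sketch of the load/store/swap pattern in userspace C. The first two are named after the kernel's atomic_load_int()/atomic_store_int(); the swap uses a GCC/Clang builtin as a stand-in, and the bodies here are illustrative:

    static inline int
    atomic_load_int_sketch(volatile const int *p)
    {
    	return *p;	/* a single load on supported targets */
    }

    static inline void
    atomic_store_int_sketch(volatile int *p, int v)
    {
    	*p = v;		/* a single store, never split or elided */
    }

    static inline int
    atomic_swap_int_sketch(volatile int *p, int v)
    {
    	/* The old value is handed to exactly one caller. */
    	return __atomic_exchange_n(p, v, __ATOMIC_RELAXED);
    }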
2024-07-10  Kill the runfast and run labels and inline those bits. No functional change.  (Claudio Jeker)
OK mpi@
2024-07-10  Sweep up another softdep crumb.  (Kenneth R Westerback)
Remove #if notyet/#endif chunk that references the never-defined STATFS_SOFTUPD. ok jsg@
2024-07-09  Remove splassert() for now since IPL_STATCLOCK is MD and not all archs have it.  (Claudio Jeker)
Noticed by bluhm@ on octeon
2024-07-09  Reshuffle the switch cases in ptsignal and single_thread_set to be in the order needed for future changes.  (Claudio Jeker)
No functional change. OK mpi@
2024-07-09  In sched_toidle() only call the TRACEPOINT if curproc is set.  (Claudio Jeker)
sched_toidle() is called by cpu_hatch() to start APs and then curproc may be NULL. OK mpi@
2024-07-09  change format strings to fix SEM_DEBUG build  (Jonathan Gray)
2024-07-08  Remove the KASSERT() in sched_unpeg_curproc().  (Martin Pieuchot)
This fixes rebooting a GENERIC.MP kernel on SP machines, because the unpeg is out of the loop in smr_thread().
2024-07-08  Introduce sched_unpeg_curproc() to abstract the current implementation.  (Martin Pieuchot)
ok kettenis@, mlarkin@, miod@, claudio@
2024-07-08  Rework per proc and per process time usage accounting  (Claudio Jeker)
For procs (threads) the accounting now happens locklessly by curproc using a generation counter. Callers need to use tu_enter() and tu_leave() for this. To read a proc's p_tu, tuagg_get_proc() should be used; it ensures that the values read are consistent. For processes only the time of exited threads is accumulated in ps_tu, and to get the proper process time usage tuagg_get_process() needs to be called. tuagg_get_process() will sum up all procs' p_tu plus the ps_tu. This removes another SCHED_LOCK() dependency. Adjust the code in exit1() and exit2() to correctly account for the full run time. For this, adjust sched_exit() to do the runtime accounting like it is done in mi_switch(). OK jca@ dlg@
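A hedged userspace sketch of a generation (sequence) counter in the style the commit describes; the names mirror tu_enter()/tu_leave() but the types, fences, and the benign race on the payload are artifacts of the sketch, not the kernel's primitives:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Illustrative stand-in for the per-proc time usage record. */
    struct tu_sketch {
    	atomic_uint gen;	/* odd while an update is in flight */
    	uint64_t runtime_ns;	/* the accounted value */
    };

    /* Writer side: only the owning thread updates its own times. */
    void
    tu_enter_sketch(struct tu_sketch *tu)
    {
    	atomic_fetch_add(&tu->gen, 1);	/* now odd: readers retry */
    	atomic_thread_fence(memory_order_release);
    }

    void
    tu_leave_sketch(struct tu_sketch *tu)
    {
    	atomic_thread_fence(memory_order_release);
    	atomic_fetch_add(&tu->gen, 1);	/* even again: snapshot stable */
    }

    /* Reader side: retry until an even, unchanged generation is seen. */
    uint64_t
    tuagg_get_sketch(struct tu_sketch *tu)
    {
    	unsigned int g;
    	uint64_t val;

    	do {
    		g = atomic_load(&tu->gen);
    		atomic_thread_fence(memory_order_acquire);
    		val = tu->runtime_ns;
    		atomic_thread_fence(memory_order_acquire);
    	} while ((g & 1) || g != atomic_load(&tu->gen));
    	return val;
    }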
2024-07-08  Fix comment for exit2(): this code is called by sched_idle(), not cpu_exit().  (Claudio Jeker)
The note can be removed, but add a comment that, since this is called from the idle process, exit2() is not allowed to sleep. OK jca@
2024-07-05  remove unused vn_isdisk(), added for softdep  (Jonathan Gray)
ok kn@
2024-07-03  remove __mp_release_all_but_one(), unused since sched_bsd.c rev 1.92  (Jonathan Gray)
ok claudio@
2024-06-28  Restore original EPIPE and ENOTCONN error priority in the uipc_send() path, changed in rev 1.206.  (Vitaliy Makkoveev)
At least acme-client(1) is not happy with this change. Reported by claudio. Tests and ok by bluhm.
2024-06-26  Push socket re-lock to the vnode(9) release path within unp_detach().  (Vitaliy Makkoveev)
The only reason to re-lock dying `so' is the lock order with vnode(9) lock, thus `unp_gc_lock' rwlock(9) could be taken after solock(). ok bluhm
2024-06-26  return type on a dedicated line when declaring functions  (Jonathan Gray)
ok mglocker@
2024-06-22  remove space between function names and argument list  (Jonathan Gray)
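Both of these style sweeps apply OpenBSD KNF as described in style(9); a minimal example (the function name is illustrative):

    /* KNF: return type on its own line, no space before the paren. */
    static int
    example_fn(int a, int b)
    {
    	return (a + b);
    }

    /* Not KNF: "static int example_fn (int a, int b)" on one line. */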
2024-06-14  Switch AF_ROUTE sockets to the new locking scheme.  (Vitaliy Makkoveev)
At the sockets layer only mark buffers as SB_MTXLOCK. At the PCB layer only protect `so_rcv' with the corresponding `sb_mtx' mutex(9). The SS_ISCONNECTED and SS_CANTRCVMORE bits are redundant for AF_ROUTE sockets. Since SS_CANTRCVMORE modifications are performed with both solock() and `sb_mtx' held, the 'unlocked' SS_CANTRCVMORE check in rtm_senddesync() is safe. ok bluhm
2024-06-05  No need to call d_open/d_close for every hibernate resume i/o.  (Kenneth R Westerback)
Speeds up resuming from hibernate. Testing florian@ stsp@ ok mlarkin@ stsp@
2024-06-04  Enable hibernate/resume to nvme(4) disks with 4096 byte sectors.  (Kenneth R Westerback)
testing by florian@ mglocker@ mlarkin@ ok deraadt@ mglocker@ mlarkin@
2024-06-03  Remove lock_class_sched_lock from lock_classes.  (Claudio Jeker)
The corresponding entry in enum lock_class_index was removed in sys/_lock.h. You get fireworks if the lock_classes array and enum lock_class_index get out of sync.
2024-06-03  Remove the now unused s argument to SCHED_LOCK and SCHED_UNLOCK.  (Claudio Jeker)
The SPL level is not tracked by the mutex and we no longer need to track it in the callers. OK miod@ mlarkin@ tb@ jca@
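The shape of the change at call sites, as a compilable sketch; the macro bodies below are stand-ins, not the kernel's:

    /* Before: the caller carried the saved SPL in a local variable. */
    #define SCHED_LOCK_OLD(s)	do { (s) = 1; /* splsched(); mtx_enter() */ } while (0)
    #define SCHED_UNLOCK_OLD(s)	do { (void)(s); /* mtx_leave(); splx(s) */ } while (0)

    /* After: no SPL bookkeeping in the caller. */
    #define SCHED_LOCK()		do { /* mtx_enter(&sched_lock) */ } while (0)
    #define SCHED_UNLOCK()		do { /* mtx_leave(&sched_lock) */ } while (0)

    void
    caller_sketch(void)
    {
    	int s;

    	SCHED_LOCK_OLD(s);	/* old interface: SPL tracked by hand */
    	SCHED_UNLOCK_OLD(s);

    	SCHED_LOCK();		/* new interface */
    	SCHED_UNLOCK();
    }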