path: root/sys/kern
2022-03-18, Alexander Bluhm: Cleanup reference counting. Remove #ifdef DIAGNOSTIC to keep the
code similar in non DIAGNOSTIC case. Rename refcnt variable to refs for consistency with r_refs. Add KASSERT() in refcnt_finalize(). OK visa@
2022-03-18, Visa Hankala: Use the refcnt API with struct plimit.
OK bluhm@ dlg@
2022-03-17, Visa Hankala: Use the refcnt API with struct ucred.
OK bluhm@
2022-03-16, Visa Hankala: Remove an unneeded include.
2022-03-16, Visa Hankala: Use the refcnt API in kqueue.
OK dlg@ bluhm@
2022-03-16, Visa Hankala: Add refcnt_shared() and refcnt_read()
refcnt_shared() checks whether the object has multiple references. When refcnt_shared() returns zero, the caller is the only reference holder. refcnt_read() returns a snapshot of the counter value. refcnt_shared() suggested by dlg@. OK dlg@ mvs@
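The semantics described in this entry can be illustrated with a small userland sketch built on C11 atomics. This is not the kernel implementation: the struct layout, the use of `<stdatomic.h>`, and the bodies of all functions below are assumptions made for illustration, and only the names and the described behaviour of refcnt_shared() and refcnt_read() come from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userland stand-in for a reference counter. */
struct refcnt {
	atomic_uint r_refs;
};

static void
refcnt_init(struct refcnt *r)
{
	atomic_init(&r->r_refs, 1);	/* the creator holds the first reference */
}

static void
refcnt_take(struct refcnt *r)
{
	unsigned int old = atomic_fetch_add(&r->r_refs, 1);

	assert(old != 0);	/* the object must still be alive */
}

/* Returns nonzero when the caller just dropped the last reference. */
static int
refcnt_rele(struct refcnt *r)
{
	unsigned int old = atomic_fetch_sub(&r->r_refs, 1);

	assert(old != 0);	/* releasing more than was taken is a bug */
	return (old == 1);
}

/* Nonzero if the object has multiple references, zero if the caller is alone. */
static int
refcnt_shared(struct refcnt *r)
{
	return (atomic_load(&r->r_refs) > 1);
}

/* Snapshot of the counter; it may be stale as soon as it is returned. */
static unsigned int
refcnt_read(struct refcnt *r)
{
	return (atomic_load(&r->r_refs));
}
```

A caller that sees refcnt_shared() return zero knows it is the only reference holder, which is the kind of check a copy-on-write path needs before modifying an object in place.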
2022-03-14, Theo Buehler: Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul
This reverts the commit protecting the list and hashes in the PCB tables with a mutex since the build of sysctl(8) breaks, as found by kettenis. ok sthen
2022-03-14, Alexander Bluhm: pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
2022-03-11, Claudio Jeker: Revert part of rev 1.293. Using cursig() to deliver masked signals
to the debugger can cause a loop between the debugger and cursig() if the signal is masked. cursig() has no way to know which signal was already delivered to the debugger and so it delivers the same signal over and over again. Instead handle traps to masked signals directly in trapsignal. This is what rev 1.293 was mostly about. If SIGTRAP was masked by the process, breakpoints no longer worked since the signal delivery to the debugger did not happen. Doing this case in trapsignal solves both the problem with the loop and the delivery of masked traps. Problem reported and fix tested by matthieu@ OK kettenis@ mpi@
2022-03-10, Alexander Bluhm: Use atomic load and store functions to access refcnt and wait
variables. Although not necessary everywhere, using atomic functions exclusively for variables marked as atomic is clearer. OK mvs@ visa@
2022-02-25, Philip Guenther: Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit
2022-02-25, Ted Unangst: add setrtable to pledge("id"). from Matthew Martin
ok deraadt
2022-02-25, Philip Guenther: Move pr_attach and pr_detach to a new structure pr_usrreqs that can
then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@
2022-02-24, Vitaliy Makkoveev: regen
2022-02-24, Vitaliy Makkoveev: Unlock getsockname(2) syscall. For inet and UNIX sockets it fills passed
'sockaddr' structure with socket's address. For key management and route domain sockets it just returns error. ok bluhm@
2022-02-22, Theo de Raadt: Since other exported commandnames were increased to 24 and graduated into
proper strings, adapt struct acct's ac_comm similarly. While here increase ac_mem to 32-bits, increase ac_flag from 8 to 32 bits for future extensions, add ac_pid for forensics, and reorder the structure to avoid compiler pads. More work remains in the sa(8) command to use ac_pid better. This is a flag day for the acct file format, new/old files/tools are incompatible. ok bluhm millert
2022-02-22, Theo de Raadt: Start using new _MAXCOMLEN (a proper string expanded to 24 bytes
including the NUL), in all internal interfaces, and expose this in ktrace, core, or proc.h visibility. ok millert
2022-02-22, Philip Guenther: Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
2022-02-21, Jonathan Gray: anscestors -> ancestors
2022-02-21, Jonathan Gray: consisitent -> consistent
2022-02-21, Jonathan Gray: expliclitly -> explicitly
2022-02-19, Theo de Raadt: The suspend/resume code sleeps-not-allowed phases are protected with
cold=2. Use the same strategy in a similar phase during hibernate.
2022-02-19, Theo de Raadt: tsleep() prints a stack trace when cold==2. The suspend/resume code has
phases where sleeps are not allowed, and this used to discover it. msleep() needs the same check.
2022-02-17, Rob Pierce: Writes to the ps_flags field of struct process should be atomic.
Ok deraadt@ guenther@
2022-02-16, Theo de Raadt: return unique errors (I chose some errno values.. ) for the various
failure modes. Also, pack the code a little bit, easier to read.
2022-02-16, Visa Hankala: Reduce code duplication in socket event filters.
OK mpi@
2022-02-16, Jonathan Gray: unifdef PROC_PC
ok guenther@ rob@
2022-02-16, Theo de Raadt: If the lid is closed, suspend_finish() now returns EAGAIN, so go to the top
and restart the suspend all over again. This was previously done by issuing a task to the acpi thread, but this is simpler. (I want to try to duplicate these tests earlier in the resume path...)
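The restart-on-EAGAIN flow described here can be sketched as a plain loop. This is an illustration only: sleep_state_sketch(), lid_finish(), and the callback signature are hypothetical names invented for the example, and the real suspend path does far more work between iterations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the MI suspend loop: run a full cycle, and
 * start over whenever the finish step reports EAGAIN.
 */
static int
sleep_state_sketch(int (*suspend_finish)(void *), void *arg, int *cycles)
{
	int error;

	assert(cycles != NULL);
	*cycles = 0;
	do {
		(*cycles)++;
		/* ... quiesce devices, sleep, resume (elided) ... */
		error = suspend_finish(arg);
	} while (error == EAGAIN);	/* lid still closed: go around again */

	return (error);
}

/* Example finish hook: EAGAIN while the simulated lid stays closed. */
static int
lid_finish(void *arg)
{
	int *closed_polls = arg;

	if (*closed_polls > 0) {
		(*closed_polls)--;
		return (EAGAIN);
	}
	return (0);
}
```

Returning an error code and looping in the caller keeps the retry decision in one place, rather than handing it to a separate task on another thread.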
2022-02-16, Theo de Raadt: change MD gosleep() and sleep_finish() to return int, the MI code will be
able to react to this suitably.
2022-02-15, Rob Pierce: Reintroduce ps state flag 'c' indicating chrooted process (via PS_BITS).
Ok deraadt@
2022-02-15, Theo de Raadt: Since acpitoshiba brightness button processing no longer plays games
with AML parsing outside the acpi thread, the locking-release dance around wsdisplay_{suspend,resume} can be removed ok kettenis
2022-02-15, Theo de Raadt: when the MI suspend code encounters problems, we need a way to
reset the MD state before bailing out. New MD function sleep_abort() does that.
2022-02-15, Jonathan Gray: unifdef TIOCHPCL, 4.3BSD compat ioctl
ok deraadt@ guenther@
2022-02-15, Theo de Raadt: MI disable_lid_wakeups() is not needed, x86 systems can do this
in sleep_resume(), which seems sensible for other future systems also
2022-02-14, Claudio Jeker: Introduce a signal context that is used to pass signal related information
from cursig() to postsig() or the caller itself. This will simplify locking. Also alter sigactsfree() a bit and move it into process_zap() so ps_sigacts is always a valid pointer. OK semarie@
2022-02-14, David Gwynne: update sbchecklowmem() to better detect actual mbuf memory usage.
previously sbchecklowmem() (and sonewconn()) would look at the mbuf and mbuf cluster pools to see if they were approaching their hard limits. based on how many mbufs/clusters were allocated against the limits, socket operations would start to fail with ENOBUFS until utilisation went down. mbufs and clusters have changed a lot since then though. there are now many mbuf cluster pools, not just one for 2k clusters. because of this the mbuf layer now limits the amount of memory all the mbuf pools can allocate backend pages from rather than limit the individual pools. this means sbchecklowmem() ends up looking at the default pool hard limit, which is UINT_MAX, which in turn means sbchecklowmem() probably never applies backpressure. this is made worse on multiprocessor systems where per cpu caches of mbuf and cluster pool items are enabled because the number of in use pool items is distorted by the cpu caches. this switches sbchecklowmem to looking at the page allocations made by all the pools instead. the big benefit of this is that the page allocations are much more representative of the overall mbuf memory usage in the system. the downside is that the backend page allocation accounting does not see idle memory held by pools. pools cannot release partially free pages to the page backend (obviously), and pools cache idle items to avoid thrashing on the backend page allocator. this means the page allocation level is higher than the memory used by actual in-flight mbufs. however, this can also be a benefit. the backend page allocation is a kind of smoothed out "trend" line. mbuf utilisation over short periods can be extremely bursty because of things like rx ring dequeue and fill cycles, or large socket sends. if you're trying to grow socket buffers while these things are happening, luck becomes an important factor in whether it will work or not.
because pools cache idle items, the backend page utilisation better represents the overall trend of activity in the system and will give more consistent behaviour here. this diff is deliberately simple. we're basically going from "no limits" to "some sort of limit" for sockets again, so keeping the code simple means it should be easy to understand and tweak in the future. ok djm@ visa@ claudio@
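The idea of checking the aggregate page allocation against one shared limit can be sketched as follows. Everything here is hypothetical: the struct, the function names, the 60/80 percent thresholds, and the hysteresis between them are assumptions chosen for illustration and are not stated in the commit message.

```c
#include <assert.h>

/* Assumed thresholds, in percent of the shared limit. */
#define LOWMEM_ENTER	80	/* start applying backpressure above this */
#define LOWMEM_LEAVE	60	/* stop applying backpressure below this */

/* Hypothetical accounting for backend pages held by all mbuf pools. */
struct mbuf_mem {
	unsigned long long used;	/* bytes of backend pages allocated */
	unsigned long long limit;	/* cap shared by all mbuf pools */
	int lowmem;			/* sticky low-memory state */
};

static unsigned int
pool_used_percent(const struct mbuf_mem *mm)
{
	assert(mm->limit != 0);
	return ((unsigned int)((mm->used * 100) / mm->limit));
}

/* Nonzero while sockets should refuse to grow their buffers. */
static int
check_lowmem(struct mbuf_mem *mm)
{
	unsigned int pct = pool_used_percent(mm);

	if (pct < LOWMEM_LEAVE)
		mm->lowmem = 0;
	else if (pct > LOWMEM_ENTER)
		mm->lowmem = 1;
	/* between the two thresholds the previous state is kept */

	return (mm->lowmem);
}
```

Because the input is page-level usage rather than in-flight item counts, the percentage moves slowly, which matches the "trend line" behaviour the message describes.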
2022-02-13, Theo de Raadt: Move some MI pieces out of suspend_mp/resume_mp
ok kettenis
2022-02-13, Visa Hankala: Use knote_modify() and knote_process() in obvious places.
2022-02-13, Visa Hankala: Rename knote_modify() to knote_assign()
This avoids verb overlap with f_modify.
2022-02-12, Visa Hankala: Reduce code duplication in pipe event filters
Use the f_event callback for checking event state within the pipe event filters. This enables the same f_modify and f_process functions to handle the different filter types. OK anton@
2022-02-11, Visa Hankala: Inline klist_empty() for more economic machine code.
OK mpi@
2022-02-11, Theo de Raadt: the sleep_clocks() hook is not needed because the architectures which
need to do this can do it a few moments later in a different hook
2022-02-10, Theo de Raadt: Duplicate "park disk" code, so that the SUSPEND case can be MI, it is only
HIBERNATE that needs to be in MD code. ok gkoehler
2022-02-08, Theo de Raadt: The suspend/resume code is a sticky mess of MI, MD, and ACPI sequencing.
This splits out the MI sequencing, backing it with per-architecture helper functions. Further steps will be necessary because ACPI and MD are too tightly coupled, but soon we'll be able to use this code for more architectures (which depends on figuring out the lowest-level cpu sleeping method) ok kettenis
2022-02-08, David Gwynne: use sizeof(long) - 1 in m_pullup to determine payload alignment.
this makes it consistent with the rest of the network stack when determining alignment. ok bluhm@
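A mask of sizeof(long) - 1 is the usual way to test whether an address sits on a long-sized boundary. The helpers below are a hypothetical sketch of that check; their names are invented for illustration and they are not taken from m_pullup().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bits that must be zero for a long-aligned address. */
#define PAYLOAD_ALIGN_MASK	(sizeof(long) - 1)

/* Number of bytes by which p is past the previous long boundary. */
static size_t
payload_misalignment(const void *p)
{
	/* the mask trick only works for power-of-two sizes */
	assert((sizeof(long) & (sizeof(long) - 1)) == 0);
	return ((size_t)((uintptr_t)p & PAYLOAD_ALIGN_MASK));
}

/* Nonzero if p is aligned to sizeof(long). */
static int
payload_is_aligned(const void *p)
{
	return (payload_misalignment(p) == 0);
}
```

Using one mask everywhere means every layer of the stack agrees on whether a given payload pointer counts as aligned.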
2022-02-08, Visa Hankala: poll(2): Switch to kqueue backend
Implement the poll(2) system call on top of the kqueue subsystem. This obsoletes the old, non-MP-safe poll backend. On entering poll(2), the new code translates each pollfd array entry into a set of knotes. When these knotes receive events through kqueue, the events are translated back to pollfd format. Entries in the pollfd array can refer to the same file descriptor with overlapping event masks. To allow such overlap with knotes, use an extra kn_pollid key that separates knotes of different pollfd entries. Adapted from DragonFly BSD, initial implementation by mpi@. Tested in snaps for three weeks. OK mpi@
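The translation described above, where each pollfd entry becomes a set of knotes keyed by an extra poll id, can be sketched in userland C. The struct sketch_knote, the two-value filter enum, and both functions are hypothetical simplifications; only struct pollfd and the POLLIN/POLLOUT flags are real interfaces, and the kernel handles more event types than shown.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical, simplified filter types. */
enum sketch_filter { SK_READ, SK_WRITE };

/* Hypothetical knote: (fd, filter) plus the index of its pollfd entry. */
struct sketch_knote {
	int fd;
	enum sketch_filter filter;
	size_t pollid;	/* keeps knotes of different pollfd entries apart */
};

/* Translate pollfd entries into knotes; returns the number produced. */
static size_t
pollfd_to_knotes(const struct pollfd *pfds, size_t nfds,
    struct sketch_knote *out, size_t outcap)
{
	size_t i, n = 0;

	for (i = 0; i < nfds; i++) {
		if (pfds[i].fd < 0)
			continue;	/* poll(2) ignores negative fds */
		if (pfds[i].events & POLLIN) {
			assert(n < outcap);
			out[n++] = (struct sketch_knote){ pfds[i].fd, SK_READ, i };
		}
		if (pfds[i].events & POLLOUT) {
			assert(n < outcap);
			out[n++] = (struct sketch_knote){ pfds[i].fd, SK_WRITE, i };
		}
	}
	return (n);
}

/* Translate a fired knote back into the matching pollfd entry. */
static void
knote_event_to_pollfd(const struct sketch_knote *kn, struct pollfd *pfds)
{
	pfds[kn->pollid].revents |= (kn->filter == SK_READ) ? POLLIN : POLLOUT;
}
```

With the poll id as part of the key, two pollfd entries that name the same descriptor with overlapping event masks each get their own knotes, so an event can be reported to the right array slot.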
2022-02-07, Philip Guenther: Delete STACKGAPLEN: this exec-time allocation at the top of the
original thread's stack hasn't been used since 2015. ok miod@ deraadt@
2022-02-06, Claudio Jeker: Simplify cursig() a bit and make sure that signals are always sent to
the parent of ptraced processes. Especially ignore the signal mask set by sigprocmask(2) in that case. In userret() alter the testcase for when to call cursig() which is only there to avoid taking the KERNEL_LOCK when returning from a MP safe syscall. This can be revisited once cursig() is MP safe. Problem with debugging signal handlers found by kurt@ Tested and OK kurt@, OK mpi@
2022-02-04, Ted Unangst: whitelist resolv.conf for stat. go dns library does this.
ok deraadt
2022-01-28, Philip Guenther: When it's the possessive of 'it', it's spelled "its", without the
apostrophe.