path: root/sys/kern
Commit log (age, commit message, author)
2018-12-17  Alexander Bluhm
When using MSG_WAITALL, soreceive() can sleep while processing the receive buffer of a stream socket. Then a new pair of control and data mbuf can be appended to the mbuf queue. In this case, terminate the loop with a short read to prevent a panic. Userland should read the control message with the next system call.
OK claudio@ deraadt@
2018-12-17  Visa Hankala
Remove unused function gsignal().
OK deraadt@ anton@
2018-12-16  David Gwynne
add task_pending
jsg@ wants this for drm, and i've had a version of it in diffs since 2016, but obviously haven't needed to use it just yet. task_pending is modelled on timeout_pending, and tells you if the task is on a list waiting to execute.
ok jsg@
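
A minimal sketch of the idea behind task_pending, assuming a flag that is set while the task sits on a pending list. The structure, flag, and function names below are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag: set while the task is queued and waiting to run. */
#define SKETCH_TASK_ONQUEUE 0x1u

/* Hypothetical, simplified stand-in for a kernel task structure. */
struct sketch_task {
	unsigned int t_flags;
	void (*t_func)(void *);
	void *t_arg;
};

/* Report whether the task is currently on a list waiting to execute. */
static bool
sketch_task_pending(const struct sketch_task *t)
{
	return (t->t_flags & SKETCH_TASK_ONQUEUE) != 0;
}
```

A caller would use such a check only as a hint, since the answer can change as soon as the queue runs the task.
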
2018-12-12  Martin Pieuchot
free(9) sizes for sysv shm.
ok bluhm@, visa@
2018-12-12  Martin Pieuchot
free(9) sizes for SVID semaphores.
ok bluhm@, visa@
2018-12-07  Martin Pieuchot
free(9) sizes for netcred.
ok visa@
2018-12-06  Philip Guenther
Core files with >65535 sections have to use PN_XNUM and a section header to pass the real count, with a minimal .shstrtab segment for consistency. Also, add support for PN_XNUM to readelf.
problem reported and testing by claudio@ ok kettenis@
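
The PN_XNUM convention can be sketched from a reader's point of view: when the 16-bit program header count holds the escape value 0xffff, the real count is taken from the sh_info field of section header 0. The structures below are hypothetical, trimmed-down stand-ins for the ELF header types, kept only to make the sketch self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Escape value for e_phnum, as defined for PN_XNUM in the ELF gABI. */
#define SKETCH_PN_XNUM 0xffffu

/* Hypothetical, reduced views of the ELF header and section header. */
struct sketch_ehdr {
	uint16_t e_phnum;
};

struct sketch_shdr {
	uint32_t sh_info;
};

/*
 * Return the real program header count, or 0 if the header asks for
 * the extended count but section header 0 is not available.
 */
static uint32_t
sketch_real_phnum(const struct sketch_ehdr *eh, const struct sketch_shdr *sh0)
{
	if (eh->e_phnum != SKETCH_PN_XNUM)
		return eh->e_phnum;
	if (sh0 == NULL)
		return 0;
	return sh0->sh_info;
}
```
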
2018-12-05  Martin Pieuchot
free(9) sizes for softcs.
ok tedu@
2018-12-05  Martin Pieuchot
free(9) size for temporary buffer.
ok ratchov@
2018-11-30  Claudio Jeker
Trivial MH_ALIGN/M_ALIGN to m_align conversions.
OK bluhm@
2018-11-27  cheloha
EVFILT_TIMER: Remove extra tick from tvtohz(9) on timeout reload.
tvtohz(9) adds an extra tick to account for the present tick, but this tick needs to be removed when the timeout is reloaded thereafter. We already do this for periodic setitimer(2) timeouts.
Prompted by Paul Herman's writeup on clock aliasing for DragonflyBSD: https://frenchfries.net/paul/dfly/nanosleep.html
Also fixed in FreeBSD r238424. Style tweaks from visa.
ok visa@, guenther@
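
The arithmetic can be sketched as follows, assuming a simplified conversion that takes microseconds rather than a struct timeval and ignores overflow. Both function names are hypothetical. The first arming uses the padded value, while a reload happens right at a tick boundary and so drops the padding.

```c
/*
 * Illustrative stand-in for tvtohz(9): round the interval up to whole
 * ticks, then add one tick to cover the partially elapsed current tick.
 */
static int
sketch_tvtohz(long interval_us, long tick_us)
{
	return (int)((interval_us + tick_us - 1) / tick_us) + 1;
}

/*
 * Tick count for re-arming a periodic timer from its expiry handler:
 * the extra tick is removed, but at least one tick is always used.
 */
static int
sketch_reload_ticks(long interval_us, long tick_us)
{
	int ticks = sketch_tvtohz(interval_us, tick_us) - 1;

	return ticks > 0 ? ticks : 1;
}
```

With a 10 ms tick and a 100 ms period, the first arming in this sketch waits 11 ticks and every reload waits 10, so the period no longer drifts by one tick per expiry.
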
2018-11-21  Claudio Jeker
In unp_internalize() check the length more carefully, preventing an underflow in a later calculation. Using the same CMSG_LEN(0) check that other cmsghdr handlers implemented. Problem found by anton@
OK anton@, deraadt@, visa@
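
A minimal userland sketch of this kind of check, assuming the goal is to compute the payload size of a control message without letting an unsigned subtraction wrap. The function name is hypothetical and this is not the code in unp_internalize().

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Validate a control message header against the buffer that holds it
 * and store the payload length in *payload. Returns 0 on success and
 * -1 if the header is too short or claims more than the buffer holds.
 */
static int
sketch_cmsg_payload_len(const struct cmsghdr *cm, size_t buflen,
    size_t *payload)
{
	if (buflen < CMSG_LEN(0))
		return -1;
	/* Without this check, cmsg_len - CMSG_LEN(0) could underflow. */
	if (cm->cmsg_len < CMSG_LEN(0) || cm->cmsg_len > buflen)
		return -1;
	*payload = cm->cmsg_len - CMSG_LEN(0);
	return 0;
}
```
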
2018-11-21  Claudio Jeker
When using MSG_PEEK to peek into packets, skip control messages holding SCM_RIGHTS instead of sending them to userland, since they hold kernel internal data and it does not make sense to externalize it.
OK deraadt@, guenther@, visa@
2018-11-21  Martin Pieuchot
free(9) sizes for bread_cluster().
ok mikeb@, visa@
2018-11-19  Ted Unangst
delete the dns jackport experiment. it has no future.
2018-11-19  Visa Hankala
Utilize sigio with sockets.
OK mpi@
2018-11-17  cheloha
Add new KERN_CPUSTATS sysctl(2) so we can identify offline CPUs.
Because of hw.smt we need a way to determine whether a given CPU is "online" or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS with an extensible struct. At the moment it's just KERN_CPTIME2 with a flags member, but it can grow as needed.
KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are few (if any) packages in the wild still using the symbol so breakage in ports should be near zero. No other system inherited the symbol from BSDi, either.
Then, use the new sysctl(2) in systat(1) and top(1):
- systat(1) draws placeholder marks ('-') instead of percentages for offline CPUs in the cpu view.
- systat(1) omits offline CPU ticks when drawing the "big bar" in the vmstat view. The upshot is that the bar isn't half idle when half your logical CPUs are disabled.
- top(1) does not draw lines for offline CPUs; if CPUs toggle on or offline in interactive mode we redraw the display to expand/reduce space for the new/missing CPUs. This is consistent with what some top(1) implementations do on Linux.
- top(1) omits offline CPUs from the totals when CPU totals are combined into a single line (the '-1' flag).
Originally prompted by deraadt@. Discussed endlessly with deraadt@, kettenis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also discussed with jca@. Earlier versions tested by jmc@, tb@, and many others.
docs ok jmc@, kernel bits ok kettenis@, everything ok sthen@, "Is your stuff in yet?" deraadt@
2018-11-17  Todd C. Miller
Avoid leaking kernel memory in struct kevent padding.
From NetBSD (maxv). OK deraadt@ visa@
2018-11-14  Martin Pieuchot
Revert previous, it breaks regress.
2018-11-14  Martin Pieuchot
Userland malloc(3) & free(3) take only one argument.
2018-11-13  Visa Hankala
Fix fcntl(fd, F_GETOWN) with pipes. As a regression of kern_descrip.c r1.177 and sys_pipe.c r1.82, the call always returned an error.
OK jca@ anton@ mpi@
2018-11-12  Visa Hankala
Utilize sigio with pipes. This makes fcntl(fd, F_SETOWN, arg) correctly handle arg as a process ID if the value is positive and as a process group ID if the value is negative. In addition, now the signal sending checks privileges.
OK mpi@
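
The sign convention for the F_SETOWN argument can be sketched as a small decoder. The types and names are hypothetical, and treating zero as "no owner" and rejecting INT_MIN are assumptions of this sketch rather than something the commit message states.

```c
#include <limits.h>
#include <sys/types.h>

/* Hypothetical classification of an F_SETOWN argument. */
enum sketch_owner_kind {
	SKETCH_OWNER_INVALID,
	SKETCH_OWNER_NONE,
	SKETCH_OWNER_PROCESS,
	SKETCH_OWNER_PGRP
};

struct sketch_owner {
	enum sketch_owner_kind kind;
	pid_t id;
};

/* Positive values name a process, negative values a process group. */
static struct sketch_owner
sketch_decode_owner(int arg)
{
	struct sketch_owner o = { SKETCH_OWNER_NONE, 0 };

	if (arg > 0) {
		o.kind = SKETCH_OWNER_PROCESS;
		o.id = arg;
	} else if (arg == INT_MIN) {
		/* Negating INT_MIN would overflow, so reject it. */
		o.kind = SKETCH_OWNER_INVALID;
	} else if (arg < 0) {
		o.kind = SKETCH_OWNER_PGRP;
		o.id = -arg;
	}
	return o;
}
```
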
2018-11-12  Visa Hankala
Add a mechanism for managing asynchronous IO signal registrations. It centralizes IO signal privilege checking and makes it possible to revoke a registration when the target process or process group is deleted. Adapted from FreeBSD.
OK kettenis@ mpi@ guenther@
2018-11-12  Claudio Jeker
Introduce m_align(), a function that works like M_ALIGN() but works with all types of mbufs. Also introduce some KASSERT in the m_*space() functions to ensure that no negative number is returned. This also introduces two internal macros M_SIZE() & M_DATABUF() which return the right size and start pointer of the mbuf data area. Use it in a few obvious places to simplify code.
OK bluhm@
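
The alignment arithmetic behind this kind of helper can be sketched on a plain buffer: an object of a given length is placed as close to the end of the data area as possible while keeping its start aligned to the size of a long. The function below is a hypothetical illustration on byte offsets, not the mbuf code itself.

```c
#include <stddef.h>

/*
 * Compute the offset at which an object of `len` bytes should start so
 * that it ends near the end of a `bufsize`-byte data area and starts on
 * a long-aligned boundary. Returns 0 on success, -1 if it cannot fit.
 */
static int
sketch_align_offset(size_t bufsize, size_t len, size_t *offset)
{
	if (len > bufsize)
		return -1;
	/* Round the start down to a multiple of sizeof(long). */
	*offset = (bufsize - len) & ~(sizeof(long) - 1);
	return 0;
}
```

Leaving the free space in front of the data is what lets later code prepend headers without copying.
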
2018-11-10  anton
use the LFPRINTF() debug macro consistently; ok mpi@
2018-11-10  anton
Conform to POSIX-2001 in which the behavior of passing a negative length using posix file locks is defined. Also, detect overflows when dealing with positive lengths.
ok millert@ visa@
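
POSIX defines a negative l_len as locking the bytes from l_start + l_len up to l_start - 1. A minimal sketch of resolving a lock request into an inclusive byte range, with overflow detection for positive lengths, might look like this. The function name is hypothetical, and using -1 as the "until end of file" marker is a convention of this sketch.

```c
#include <stdint.h>

/*
 * Resolve (start, len) into an inclusive range [*first, *last].
 * len == 0 means "to end of file", reported here as *last == -1.
 * Returns 0 on success, -1 on an invalid or overflowing request.
 */
static int
sketch_lock_range(int64_t start, int64_t len, int64_t *first, int64_t *last)
{
	if (start < 0)
		return -1;
	if (len == 0) {
		*first = start;
		*last = -1;
	} else if (len > 0) {
		/* start + len - 1 must not exceed INT64_MAX. */
		if (start > INT64_MAX - (len - 1))
			return -1;
		*first = start;
		*last = start + len - 1;
	} else {
		/* Negative length: the range ends just before start. */
		if (start + len < 0)
			return -1;
		*first = start + len;
		*last = start - 1;
	}
	return 0;
}
```
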
2018-11-09  Claudio Jeker
M_LEADINGSPACE() and M_TRAILINGSPACE() are just wrappers for m_leadingspace() and m_trailingspace(). Convert all callers to call the functions directly and remove the defines.
OK krw@, mpi@
2018-11-06  Otto Moerbeek
new sysctl for userland malloc flags, kernel part. ok millert@ deraadt@
2018-11-05  anton
trace struct flock; ok visa@
2018-11-02  anton
make debug flags continuous
2018-10-30  Theo de Raadt
If we execute a #!shell binary, the shell is an integral part of the binary so it should bypass unveil restrictions. This is similar (but different...) to how the ELF linker (ld.so) is loaded (after unveils get dropped). Discovered in doas, due to more accurate unveil semantics.
ok guenther tedu beck
2018-10-29  Theo de Raadt
irrelevant part snuck into previous commit; from semarie
2018-10-29  Claudio Jeker
Now that most archs have better NMBCLUSTERS defaults it is possible to bring back rev 1.90.
----
mbufs and mbuf clusters are now backed by large pools. Because of this we can relax the oversubscribe limit of socketbuffers a fair bit. Instead of maxing out as sb_max * 1.125 or 2 * sb_hiwat the maximum is increased to 8 * sb_hiwat -- which seems to be a good compromise between memory waste and better socket buffer usage.
OK deraadt@
----
ok benno@
2018-10-29  Theo de Raadt
needs sys/lock.h
2018-10-28  Bob Beck
Correctly deal with upper level unveils by keeping track of the covering unveil for each unveil in the process at unveil() time, and refactoring the handling of current directory and ISDOTDOT to be much more sensible. Worked out at ns2k18 with guenther@.
ok deraadt@
2018-10-27  anton
Add assertions for lockf list manipulation, hidden behind LOCKF_DIAGNOSTIC. While here, improve existing lockf debug routines and sprinkle some more logging related to list manipulation.
ok deraadt@ visa@ (as part of a larger diff)
2018-10-27  anton
Rework previous lockf fix; bluhm@ noticed a regress failure during consecutive runs. This is a second attempt in which the lockf structure is turned into a doubly linked list which makes it easier to ensure correctness during list insertion and deletion.
ok deraadt@ visa@
2018-10-25  Visa Hankala
Fix a resource leak in doaccept().
If a connection that is being accepted gets aborted early, or if the user-supplied buffer is invalid, doaccept() leaks a socket. This is a regression caused by r1.153 of uipc_syscalls.c.
Correct the issue by associating the socket with the file early enough. In case soaccept() or copyaddrout() fails, the socket will be freed as a result of the file closing. This logic was used by the pre-r1.153 code. closef() may block, so it is hoisted outside the fdp lock.
OK bluhm@ mpi@
2018-10-17  Alexander Bluhm
Only the scheduler time statistics should be affected by spinning. Change the process time accounting back to the original code before spinning time was added. No change for scheduler time. Spinning interrupts are no longer accounted to process system time.
input and OK visa@
2018-10-10  Alexander Bluhm
User land time accounting has changed when kernel spinning time was introduced. Account spinning time to the process system time again. time(1) has no spinning, it only shows real, user, sys.
OK visa@ mpi@ deraadt@
2018-10-09  David Gwynne
Fix a "copy-and-paste" error that Coverity picked up in the augment code.
This brings it back in line with the macros. via Paco A. and the FRRouting project.
ok deraadt@ visa@ guenther@ tb@
2018-10-06  anton
When freeing a lockf struct that already is part of a linked list, make sure to update the next pointer for the preceding lock. Prevents a double free panic.
ok millert@
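
The general pattern can be sketched with a hypothetical singly linked list: before a node is freed, whichever pointer refers to it (the list head or the preceding node's next field) is redirected to its successor, so no stale pointer remains for a later traversal to free again. The names below are illustrative and are not the lockf code.

```c
#include <stddef.h>

/* Hypothetical list node standing in for a lock record. */
struct sketch_lock {
	struct sketch_lock *next;
	int id;
};

/*
 * Unlink `victim` from the list rooted at *head by updating the
 * pointer that refers to it. Returns 1 if the node was found and
 * unlinked, 0 otherwise. The caller may free the node afterwards.
 */
static int
sketch_unlink(struct sketch_lock **head, struct sketch_lock *victim)
{
	struct sketch_lock **prev;

	for (prev = head; *prev != NULL; prev = &(*prev)->next) {
		if (*prev == victim) {
			*prev = victim->next;
			victim->next = NULL;
			return 1;
		}
	}
	return 0;
}
```
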
2018-10-05  cheloha
Revert KERN_CPTIME2 ENODEV changes in kernel and userspace.
ok kettenis deraadt
2018-10-04  Mark Kettenis
Call unveil_destroy() from exit1() instead of from the reaper. Fixes a race between the reaper and unveil_removevnode() that would trigger a KASSERT. At least as far as I can tell. Pointed out by semarie@
ok beck@, deraadt@
2018-10-04  Alexander Bluhm
Revert the inpcb table mutex commit. It triggers a witness panic in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-29  Visa Hankala
Use atomic operations to update vfc_refcount. Change the field's type to unsigned int.
OK deraadt@
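
A userland sketch of an atomically updated unsigned reference count, written with C11 <stdatomic.h>. The kernel uses its own atomic primitives, so the structure and function names here are hypothetical and only illustrate the idea.

```c
#include <stdatomic.h>

/* Hypothetical record carrying an unsigned, atomically updated count. */
struct sketch_vfsconf {
	atomic_uint refcount;
};

/* Take a reference. */
static void
sketch_ref(struct sketch_vfsconf *v)
{
	atomic_fetch_add(&v->refcount, 1);
}

/* Drop a reference and return the number of references that remain. */
static unsigned int
sketch_unref(struct sketch_vfsconf *v)
{
	return atomic_fetch_sub(&v->refcount, 1) - 1;
}
```

Because each update is a single atomic read-modify-write, concurrent callers cannot lose an increment or decrement the way a plain `refcount++` could.
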
2018-09-26  cheloha
KERN_CPTIME2: set ENODEV if the CPU is offline.
This lets userspace distinguish between idle CPUs and those that are not schedulable because hw.smt=0.
A subsequent commit probably needs to add documentation for this to sysctl.2 (and perhaps elsewhere) after the dust settles.
Also included here are changes to systat(1) and top(1) that account for the ENODEV case and adjust behavior accordingly:
- systat(1)'s cpu view prints placeholder marks ('-') instead of percentages for each state if the given CPU is offline.
- systat(1)'s vmstat view checks for offline CPUs when computing the machine state total and excludes them, so the CPU usage graph only represents the states for online CPUs.
- top(1) does not draw CPU rows for offline CPUs when the view is redrawn. If CPUs "go offline", percentages for each state are replaced by placeholder marks ('-'); the view will need to be redrawn to remove these rows. If CPUs "go online" the view will need to be redrawn to show these new CPUs. In "combined CPU" mode, the count and the state totals only represent online CPUs.
Ports using KERN_CPTIME2 will need to be updated. The changes described above to make systat(1) and top(1) aware of the ENODEV case *and* gracefully handle a changing HW_NCPUONLINE while the application is running are not necessarily appropriate for each and every port. The changes described above are so extensive in part to demonstrate one way a program *might* be made robust to changing CPU availability. In particular, changing hw.smt after boot is an extremely rare event, and this needs to be weighed when updating ports.
The logic needed to account for the KERN_CPTIME2 ENODEV case is very roughly:

	if (sysctl(...) == -1) {
		if (errno != ENODEV) {
			/* Actual error occurred. */
		} else {
			/* CPU is offline. */
		}
	} else {
		/* CPU is online and CPU states were set by sysctl(2). */
	}

Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at length with kettenis@. Additional testing by tb@. No complaints from hackers@ after a week.
ok kettenis@, "I think you should commit [now]" deraadt@
2018-09-26  Visa Hankala
Move the allocating and freeing of mount points into dedicated functions.
OK deraadt@ mpi@
2018-09-25  Jasper Lievisse Adriaanse
fix typo in comment
ok beck@
2018-09-22  Frederic Cambus
Harmonize spacing after ellipses in displayed messages.
We were using spacing after ellipses in an inconsistent way in the installer. Standardize on using "... " everywhere and take into account the cursor position while we are waiting for the task to complete: the cursor is now always positioned after the last dot, and the space is added when displaying completion confirmation.
While there, also take cursor position into account in vfs_shutdown(), and remove the extra leading space before ticks in dhclient.
OK deraadt@