path: root/sys/kern
Age  Commit message (Author)
2017-07-22  Introduce jiffies, a volatile unsigned long version of our ticks variable (Mark Kettenis)
for use by the linux compatibility APIs in drm(4). While I hate infecting code in sys/kern with this, untangling all the consequences of having different types and different signedness is too much for me right now. The best strategy may be to change ticks itself to be long, but that needs some careful auditing. ok deraadt@
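A minimal sketch of the idea (not the committed diff; where jiffies lives and where it is bumped are assumptions):

    extern volatile int ticks;          /* existing kernel tick counter */
    volatile unsigned long jiffies;     /* unsigned long view for the drm(4) linux compat code */

    /* assumed to be incremented next to the existing ticks++ in hardclock():
     *         ticks++;
     *         jiffies++;
     */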
2017-07-20  When receiving a struct sockaddr from userland, enforce that memory (Alexander Bluhm)
for sa_len and sa_family is provided. This will make handling of socket name mbufs within the kernel safer. issue reported by Ilja Van Sprundel; OK claudio@
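The shape of such a check, as a sketch (the caller and the length variable name are assumptions, not the committed code):

    /* reject a sockaddr from userland that is too short to even contain
     * the sa_len and sa_family header fields */
    if (buflen < offsetof(struct sockaddr, sa_data))
            return (EINVAL);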
2017-07-20  Initialize a local variable to not leak kernel stack info to userland (Martin Pieuchot)
if TIOCGPGRP fails. Issue found by Ilja van Sprundel. ok bluhm@, millert@, deraadt@
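The general pattern of such a fix, as a sketch (variable and helper names are illustrative, not the actual diff):

    pid_t pgid = -1;    /* initialized: a failing ioctl must not leave
                         * stack garbage behind for the copyout below */
    error = do_tiocgpgrp(fp, &pgid);    /* hypothetical helper */
    if (error == 0)
            error = copyout(&pgid, uaddr, sizeof(pgid));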
2017-07-20  If pool_get() sleeps while allocating additional memory for socket (Alexander Bluhm)
splicing, another process may allocate it in the meantime. Then one of the splicing structures leaked in sosplice(). Recheck that no struct sosplice exists after a potential sleep. reported by Ilja Van Sprundel; OK mpi@
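A sketch of the recheck described above (locking elided; do not read this as the exact committed code):

    if (so->so_sp == NULL) {
            struct sosplice *sp;

            /* PR_WAITOK means pool_get() may sleep... */
            sp = pool_get(&sosplice_pool, PR_WAITOK | PR_ZERO);
            /* ...so somebody else may have attached one meanwhile. */
            if (so->so_sp == NULL)
                    so->so_sp = sp;
            else
                    pool_put(&sosplice_pool, sp);
    }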
2017-07-20  Extend the scope of the socket lock in soo_stat() to protect `so_state' (Martin Pieuchot)
and `so_rcv'. ok bluhm@, claudio@, visa@
2017-07-20  Prepare filt_soread() to be locked. No functional change. (Martin Pieuchot)
ok bluhm@, claudio@, visa@
2017-07-19  Uninitialized variable can leak kernel memory. (Theo de Raadt)
Found by Ilja Van Sprundel ok kettenis
2017-07-19  Move KTRPOINT call up. The length variable i is getting aligned and so (Claudio Jeker)
uninitialised data can be dumped into the ktrace message. Found by Ilja Van Sprundel OK bluhm@
2017-07-18  Both syslog(3) and syslogd(8) truncate the message at 8192 bytes. (Alexander Bluhm)
Do the same in sendsyslog(2) and document the behavior. reported by Ilja Van Sprundel; OK millert@ deraadt@
2017-07-18  soreserve() modifies `so_snd' and `so_rcv', so assert that it is called (Martin Pieuchot)
with the socket lock. This change is safe because sbreserve() already asserts that the lock is held, but it acts as implicit documentation and indicates that I looked at the function.
2017-07-13  Do not unlock the netlock in the goto out error path before it has (Alexander Bluhm)
been acquired in sosend(). Fixes a kernel lock assertion panic. OK visa@ mpi@
2017-07-12  Invalidate read-ahead buffers when read short (Mike Belopuhov)
Buffercache performs read-ahead for cluster reads by extending the length of an original read operation to MAXPHYS (64k). Upon I/O completion, the length is trimmed, the buffer is returned to the filesystem and the remaining data is cached. However, under certain circumstances, the underlying hardware may fail to do a complete I/O operation and return with a non-zero residual length (i.e. data that wasn't read). The residual length may exceed the size of the original request and must be re-adjusted to uphold the contract with the caller, e.g. the filesystem. At the same time, read-ahead buffers that cover chunks of memory corresponding to the residual length must be invalidated and not cached. Discussed at length during d2k17, ok tedu
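Conceptually the completion path does something like this (a sketch with assumed variable names, not the committed code):

    /* on completion of a clustered read (sketch) */
    if (bp->b_resid > 0) {
            /* never report more residual than the caller asked for */
            if (bp->b_resid > original_bcount)
                    bp->b_resid = original_bcount;
            /* the read-ahead part was not filled: invalidate it instead
             * of caching short data */
            SET(rabp->b_flags, B_INVAL);
    }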
2017-07-12  Do not call fo_ioctl() in syscalls that do, or will, take the socket (Martin Pieuchot)
lock. Prevents a future lock recursion since soo_ioctl() will need to grab the lock. ok bluhm@, visa@
2017-07-12  Compute the level of contention only once. (Visa Hankala)
Suggested by and OK dlg@
2017-07-12  When there is no contention on a pool cache lock, lower the number (Visa Hankala)
of items that a cache list is allowed to hold. This lets the cache release resources back to the common pool after pressure on the cache has decreased. OK dlg@
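The tuning direction of these two changes, as a rough sketch (all field names here are invented for illustration):

    /* run periodically for each pool cache (sketch) */
    new_contention = pc->pc_contention_counter;
    if (new_contention == pc->pc_prev_contention && pc->pc_maxitems > 8)
            pc->pc_maxitems /= 2;                   /* idle: let lists shrink */
    pc->pc_prev_contention = new_contention;        /* computed only once */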
2017-07-10  make malloc(9) mpsafe by using a mutex instead of splvm. (David Gwynne)
this is almost a straightforward change of spl ops with mutex ops, except the accounting has been shuffled around. memory is counted as used before an attempt to allocate it from uvm is made to prevent overcommitting memory. this is modelled on how pools limit allocations. the uvm bits have been eyeballed by kettenis@ who says they should be safe. visa@ found some nits which have been fixed. tested by chris@ and amit kulkarni ok kettenis@ visa@ mpi@
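A sketch of the ordering described above (the mutex name is an assumption and the allocation call is simplified; ks_memuse/ks_limit are the kmemstats fields):

    mtx_enter(&malloc_mtx);
    if (ksp->ks_memuse + allocsize > ksp->ks_limit) {
            mtx_leave(&malloc_mtx);
            return (NULL);                  /* would overcommit */
    }
    ksp->ks_memuse += allocsize;            /* reserve before allocating */
    mtx_leave(&malloc_mtx);

    va = km_alloc(allocsize, &kv_any, &kp_dirty, &kd_waitok);   /* may sleep */
    if (va == NULL) {
            mtx_enter(&malloc_mtx);
            ksp->ks_memuse -= allocsize;    /* hand the reservation back */
            mtx_leave(&malloc_mtx);
    }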
2017-07-08  Revert grabbing the socket lock in kqueue filters. (Martin Pieuchot)
It is unsafe to sleep while iterating the list of pending events in kqueue_scan(). Reported by abieber@ and juanfra@
2017-07-04  some of this code was written in an era when spaces cost extra. (Ted Unangst)
add a little breathing room.
2017-07-04  Always hold the socket lock when calling sblock(). (Martin Pieuchot)
Implicitly protects `so_state' with the socket lock in sosend(). ok visa@, bluhm@
2017-07-04  Assert that the socket lock is held when `so_state' is modified. (Martin Pieuchot)
ok bluhm@, visa@
2017-07-04  Assert that the socket lock is held when `so_qlen' is modified. (Martin Pieuchot)
ok bluhm@, visa@
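The pattern these locking commits establish, shown as a fragment (not a specific diff):

    /* any code path that touches so_state, so_qlen, so_snd or so_rcv
     * now starts by asserting that the socket lock is held */
    soassertlocked(so);
    so->so_state |= SS_ISCONNECTED;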
2017-07-03  Do not grab the socket lock in doaccept() twice. Pass NOTE_SUBMIT (Alexander Bluhm)
to KNOTE() as we are already holding the lock. Fixes "panic: rw_enter: netlock locking against myself" reported by Gregor Best and reproduced with src/regress/lib/libtls/gotls. OK millert@
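The call shape when the caller already holds the lock, as a fragment:

    /* the socket lock is held here; NOTE_SUBMIT tells the kqueue filter
     * not to take it again */
    soassertlocked(so);
    KNOTE(&so->so_rcv.sb_sel.si_note, NOTE_SUBMIT);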
2017-07-03  Protect `so_state', `so_error' and `so_qlen' with the socket lock in (Martin Pieuchot)
kqueue filters. ok millert@, bluhm@, visa@
2017-06-29  Due to risks known for decades, TIOCSTI now performs no action, and simply (Theo de Raadt)
returns EIO. The base system has been cleaned of TIOCSTI uses (collaboration between anton and me), and the ports tree appears mostly clean. A few stragglers may be discovered and cleaned up later... In a month or so, we should see if the #define can be removed entirely. ok anton tedu, support from millert
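In the tty ioctl switch the change amounts to something like this (sketch):

    case TIOCSTI:
            return (EIO);   /* input injection no longer performed */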
2017-06-27  Add missing solock()/sounlock() dances around sbreserve(). (Martin Pieuchot)
While here document an abuse of parent socket's lock. Problem reported by krw@, analysis and ok bluhm@
2017-06-26  Assert that the corresponding socket is locked when manipulating socket (Martin Pieuchot)
buffers. This is one step towards unlocking the TCP input path. Note that the functions asserting the socket lock are not necessarily MP-safe; not all the fields of 'struct socket' are protected. Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD. Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@
2017-06-23  set the alignment of the per cpu cache structures to CACHELINESIZE. (David Gwynne)
hardcoding 64 is too optimistic.
2017-06-23  change the semantics for calculating when to grow the size of a cache list. (David Gwynne)
previously it would figure out if there's enough items overall for all the cpus to have full active and inactive free lists. this included currently allocated items, which pools wont actually hold on a free list and cannot predict when they will come back. instead, see if there's enough items in the idle lists in the depot that could instead go on all the free lists on the cpus. if there's enough idle items, then we can grow. tested by hrvoje popovski and amit kulkarni ok visa@
2017-06-22  calculate a "sum" based upon pointers to functions all over the kernel, (Theo de Raadt)
so that an unhibernate kernel can detect if it is running with the kernel it booted. ok mlarkin
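The idea, as a hypothetical sketch (the function name and the actual list of pointers are illustrative):

    /* fold addresses of well-known kernel functions into a value that
     * differs between kernel builds */
    u_long
    kernel_func_sum(void)
    {
            u_long sum = 0;

            sum += (u_long)&printf;
            sum += (u_long)&tsleep;
            sum += (u_long)&malloc;
            /* ... many more, spread all over the kernel ... */
            return (sum);
    }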
2017-06-21  Permit TIOCSTAT on a tty. (Theo de Raadt)
2017-06-20  In ddb print socket bit field so_state in hex to match SS_ defines. (Alexander Bluhm)
2017-06-20  Do not touch file pointers for which FILE_IS_USABLE() is false. (Gerhard Roth)
They might not be fully constructed. ok mpi@ deraadt@ bluhm@
2017-06-20  Convert sodidle() to timeout_set_proc(9), it needs a process context (Martin Pieuchot)
to grab the rwlock. Problem reported by Rivo Nurges. ok bluhm@
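A sketch of the conversion (the timeout member name is an assumption):

    /* run the idle handler from a process context so it can sleep on
     * the socket rwlock, instead of from soft interrupt context */
    timeout_set_proc(&so->so_idleto, sodidle, so);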
2017-06-19  dynamically scale the size of the per cpu cache lists. (David Gwynne)
if the lock around the global depot of extra cache lists is contended a lot in between the gc task runs, consider growing the number of entries a free list can hold. the size of the list is bounded by the number of pool items the current set of pages can represent to avoid having cpus starve each other. im not sure this semantic is right (or the least worst) but we're putting it in now to see what happens. this also means reality matches the documentation i just committed in pool_cache_init.9. tested by hrvoje popovski and amit kulkarni ok visa@
2017-06-19  Terminate pledge log(9) with newline. This fixes dmesg(8) output. (Alexander Bluhm)
found by regress/sys/kern/pledge/generic; OK deraadt@
2017-06-16  add garbage collection of unused lists of percpu cached items. (David Gwynne)
the cpu caches in pools amortise the cost of accessing global structures by moving lists of items around instead of individual items. excess lists of items are stored in the global pool struct, but these idle lists never get returned back to the system for use elsewhere. this adds a timestamp to the global idle list, which is updated when the idle list stops being empty. if the idle list hasn't been empty for a while, it means the per cpu caches arent using the idle entries and they can be recovered. timestamping the pages prevents recovery of a lot of items that may be used again shortly. eg, rx ring processing and replenishing from rate limited interrupts tends to allocate and free items in large chunks, which the timestamping smooths out. gc'ed lists are returned to the pool pages, which in turn get gc'ed back to uvm. ok visa@
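Roughly, the reclaim decision looks like this (a sketch; the field and helper names are invented for illustration):

    /* in the periodic pool gc (sketch) */
    if (pp->pr_cache_nlists > 0 &&
        ticks - pp->pr_cache_nonempty_ticks > POOL_CACHE_GC_TICKS) {
            pl = pool_cache_list_take(pp);  /* detach one idle list */
            pool_cache_list_put(pp, pl);    /* its items go back to the pages */
    }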
2017-06-16  split returning an item to the pool pages out of pool_put as pool_do_put. (David Gwynne)
this lets pool_cache_list_put return items to the pages. currently, if pool_cache_list_put is called while the per cpu caches are enabled, the items on the list will be put straight back onto another list in the cpu cache. this also avoids counting puts for these items twice. a put for the items has already been counted when the items went to a cpu cache, it doesnt need to be counted again when it goes back to the pool pages. another side effect of this is that pool_cache_list_put can take the pool mutex once when returning all the items in the list with pool_do_put, rather than once per item. ok visa@
2017-06-15  report contention on caches global data to userland. (David Gwynne)
2017-06-15  white space tweaks. no functional change. (David Gwynne)
2017-06-15  implement the backend of the sysctls that report pool cache info. (David Gwynne)
KERN_POOL_CACHE reports info about the global cache info, like how long the lists of cache items the cpus build should be and how many of these lists are idle on the pool struct. KERN_POOL_CACHE_CPUS reports counters from each cpu. the counters are for how many item and list operations the cache has handled on a cpu. the sysctl provides an array of ncpusfound * struct kinfo_pool_cache_cpu, not a single struct kinfo_pool_cache_cpu. tested by hrvoje popovski ok mikeb@ millert@
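From userland the new node could be read along these lines (a sketch; error handling omitted, and the mib layout should be checked against sysctl(2)):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/pool.h>
    #include <stdlib.h>

    int pool_index = 1;     /* which pool to inspect; assumed known */
    int ncpusfound = 4;     /* normally taken from the hw.ncpufound sysctl */
    int mib[4] = { CTL_KERN, KERN_POOL, KERN_POOL_CACHE_CPUS, pool_index };
    size_t len = ncpusfound * sizeof(struct kinfo_pool_cache_cpu);
    struct kinfo_pool_cache_cpu *kpcc = malloc(len);

    /* the kernel fills one kinfo_pool_cache_cpu per cpu found */
    sysctl(mib, 4, kpcc, &len, NULL, 0);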
2017-06-14  tweak sysctl_string and sysctl_tstring to use size_t for lengths, not int (David Gwynne)
theyre both wrappers around sysctl__string, which is where half the fix is too.
2017-06-13  when enabling cpu caches, check the item size against the right thing (David Gwynne)
lists of free items on the per cpu caches are built out of the pool items as struct pool_cache_items, not struct pool_cache. make the KASSERT in pool_cache_init check that properly.
2017-06-13  use size_t for the size of things in memory, not int. (David Gwynne)
this tweaks the len argument to sysctl_rdstring, sysctl_struct, and sysctl_rdstruct. there's probably more to fix. ok millert@
2017-06-12  Pledge is fairly done, so the kernel printf's can be converted to log() (Theo de Raadt)
calls. They'll be a little less visible, but still in the system logs. ok bluhm
2017-06-08  ASLR, W^X, and guard pages trigger processor traps that result in (Alexander Bluhm)
SIGILL, SIGBUS, SIGSEGV signals. Make such memory violations visible in lastcomm(1). This also works if a program tries to hide them with a signal handler. Manual kill -SEGV does not generate false positives. OK deraadt@
2017-06-08  make rb_n2e return a struct rb_entry *, not void * (David Gwynne)
maybe this will help prevent misassignment in the future.
2017-06-08  use unsigned long instead of caddr_t to move between nodes and entries. (David Gwynne)
this removes the need for sys/param.h. this code can be built with only sys/tree.h, which in turn only needs sys/_null.h.
2017-06-08  add RBT_SET_LEFT, RBT_SET_RIGHT, and RBT_SET_PARENT (David Gwynne)
these are provided so an RBT and its topology can be copied without having to reinsert the copied nodes into a new tree. there are two reasons RBT_LEFT/RIGHT/PARENT macros cant be used like RB_LEFT/RIGHT/PARENT for this. firstly, RBT_LEFT and co are functions that return a pointer value, they dont provide access to the pointer itself for use as an lvalue that you can assign to. secondly, RBT entries dont store pointers to other nodes, they point to the RBT_ENTRY structures inside other nodes. this means that RBT_SET_LEFT and co have to get an offset from the node to the RBT_ENTRY and store that.
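A sketch of how a tree copy might use the new setters (the tree name and node variables are illustrative):

    /* dst is the copy of src; lcopy, rcopy and pcopy are the copies of
     * src's left child, right child and parent */
    RBT_SET_LEFT(mytree, dst, lcopy);
    RBT_SET_RIGHT(mytree, dst, rcopy);
    RBT_SET_PARENT(mytree, dst, pcopy);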
2017-06-07  Add an acct(5) flag for pledge violations. Then lastcomm(1) shows (Alexander Bluhm)
when something went wrong. This makes it possible to monitor whether the system is under attack and whether the attack has been prevented by OpenBSD pledge(2). OK deraadt@ millert@ jmc@
2017-06-07  Assert that the KERNEL_LOCK() is held when messing with routing, (Martin Pieuchot)
pfkey and unix sockets. ok claudio@