|
Use atomic operations to make explicit where access from multiple
CPUs happens. Add a comment explaining why sbchecklowmem() is
sufficiently MP safe without locks.
OK mvs@ claudio@
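
A minimal sketch of the pattern, with an illustrative counter name
(not the committed code); a plain atomic load is enough when a
slightly stale value only delays the backpressure decision:

    #include <sys/atomic.h>

    extern unsigned int mbuf_lowmem_count;  /* hypothetical counter */

    int
    lowmem_check_example(unsigned int limit)
    {
        /* one explicit atomic load; other CPUs may update the
         * counter concurrently without any lock */
        return (atomic_load_int(&mbuf_lowmem_count) > limit);
    }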
|
|
Use %zu to print mbuf MHLEN and MLEN in ddb, otherwise gcc complains.
found by claudio@
|
|
Command "ddb> show /c mbuf" always prints mbuf data size. In
uipc_mbuf.c include db_interface.h as it contains the prototype
for m_print_chain().
OK mvs@
|
|
For debugging hardware offloading, DMA requirements, bounce buffers,
and performance optimizations, knowing the memory layout of mbuf
content helps.
Implement /c and /p modifiers in ddb "show mbuf". They traverse
the m_next pointer for an mbuf chain or m_nextpkt for a packet
list, and show mbuf type, data offset, mbuf length, packet length,
cluster size, and the total number of elements, length, and size.
OK claudio@ mvs@
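
A simplified sketch of such a chain walk, not the committed
m_print_chain(), just the traversal idea:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    void
    chain_walk_example(struct mbuf *m)
    {
        int n = 0;
        u_long len = 0;

        /* /c style: follow m_next through one packet's chain */
        for (; m != NULL; m = m->m_next) {
            n++;
            len += m->m_len;
        }
        printf("%d mbufs, %lu bytes total\n", n, len);
    }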
|
|
If the memory layout is not optimal, m_defrag(), m_prepend(),
m_pullup(), and m_pulldown() will allocate mbufs or copy memory.
Count these operations to find possible optimizations.
input dhill@; OK mvs@
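
The counting itself can be as simple as this sketch with a made-up
counter name; the committed code hooks into the mbuf statistics:

    extern unsigned long m_defrag_count;    /* hypothetical counter */

    /* at the top of m_defrag()-like code: */
    atomic_inc_long(&m_defrag_count);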
|
|
m_defrag() is intended as a last resort to make DMA transfers to
the hardware. Therefore page alignment is more important than IP
header alignment. The reason the mbuf returned by m_defrag() was
switched to IP header alignment was that ether_extract_headers()
failed in the em(4) driver with TSO on sparc64. This has been
fixed by using memcpy().
The alignment change in m_defrag() came too late in the 7.5 release
process. It may affect several drivers on different architectures.
The bus dmamap for ixl(4) on sun4v expects page alignment. Such
alignment issues and TSO mbuf mapping for IOMMU need more thought.
OK deraadt@
|
|
The recent TSO support in em(4) triggered an alignment error on the TCP
header. In em(4) m_defrag() is called before setting up the TSO DMA bits,
and with that the TCP header was suddenly no longer aligned. Like other
mbuf functions, preserve the data alignment in m_defrag() to prevent such
unaligned packets.
With help and OK bluhm@ mglocker@
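
An illustrative fragment of the idea, assuming m0 is the freshly
allocated mbuf and m the original; the exact diff differs:

    /* keep the same low-order address bits as the original data
     * pointer so the TCP/IP headers stay aligned after the copy */
    m0->m_data += mtod(m, unsigned long) & (sizeof(long) - 1);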
|
|
m_split() calls m_align() to initialize the data pointer of the
newly allocated mbuf. If the new mbuf will be converted to a
cluster, this is not necessary. If additionally the new mbuf is
larger than MLEN, this can lead to a panic.
Only call m_align() when a valid m_data is needed. This is the
case if we do not reference the existing cluster, but memcpy() the
data into the new mbuf.
Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com
OK claudio@ deraadt@
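
A sketch of the fixed control flow, under assumed names (n is the
new mbuf, m the original, len the split point, remain the tail
length):

    if (remain >= MINCLSIZE) {
        /* the new mbuf references the existing cluster, so its
         * m_data comes from there; no m_align() needed */
        /* ... cluster reference bookkeeping elided ... */
    } else {
        /* data is copied, so a valid m_data is required */
        m_align(n, remain);
        memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + len, remain);
    }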
|
|
OK dlg@
Reported-by: syzbot+a377d5cd833c2343429a@syzkaller.appspotmail.com
|
|
This is not the fast path, so dropping mq->mq_maxlen check doesn't
introduce any performance impact, but makes code MP consistent.
Discussed with and ok from bluhm@
|
|
another CPU may change simultaneously. To prevent mis-optimisation
by the compiler, they need the READ_ONCE() macro. Otherwise there
could be two read operations with inconsistent values. Writing to
the integer in mq_set_maxlen() needs mutex protection. Otherwise
the value could change within critical sections. Again the compiler
could optimize to multiple read operations within the critical
section. With inconsistent values, the behavior is undefined.
OK dlg@
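
A sketch of both halves, close to what the struct mbuf_queue code
looks like (mq_maxlen and mq_mtx are the real member names):

    /* reader: one explicit load, no torn or repeated reads */
    u_int
    maxlen_read_example(struct mbuf_queue *mq)
    {
        return (READ_ONCE(mq->mq_maxlen));
    }

    /* writer: the mutex keeps the value stable within critical
     * sections that also hold mq_mtx */
    void
    maxlen_write_example(struct mbuf_queue *mq, u_int maxlen)
    {
        mtx_enter(&mq->mq_mtx);
        mq->mq_maxlen = maxlen;
        mtx_leave(&mq->mq_mtx);
    }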
|
|
ok mpi@ miod@
|
|
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@
|
|
previously sbchecklowmem() (and sonewconn()) would look at the mbuf
and mbuf cluster pools to see if they were approaching their hard
limits. based on how many mbufs/clusters were allocated against the
limits, socket operations would start to fail with ENOBUFS until
utilisation went down.
mbufs and clusters have changed a lot since then though. there are
now many mbuf cluster pools, not just one for 2k clusters. because
of this the mbuf layer now limits the amount of memory all the mbuf
pools can allocate backend pages from rather than limit the individual
pools. this means sbchecklowmem() ends up looking at the default
pool hard limit, which is UINT_MAX, which in turn means
sbchecklowmem() probably never applies backpressure. this is made
worse on multiprocessor systems where per cpu caches of mbuf and
cluster pool items are enabled because the number of in use pool
items is distorted by the cpu caches.
this switches sbchecklowmem to looking at the page allocations made
by all the pools instead. the big benefit of this is that the page
allocations are much more representative of the overall mbuf memory
usage in the system. the downside is that the backend page
allocation accounting does not see idle memory held by pools. pools
cannot release partially free pages to the page backend (obviously),
and pools cache idle items to avoid thrashing on the backend page
allocator. this means the page allocation level is higher than the
memory used by actual in-flight mbufs.
however, this can also be a benefit. the backend page allocation is a
kind of smoothed out "trend" line. mbuf utilisation over short periods
can be extremely bursty because of things like rx ring dequeue and fill
cycles, or large socket sends. if you're trying to grow socket
buffers while these things are happening, luck becomes an important
factor in whether it will work or not. because pools cache idle items,
the backend page utilisation better represents the overall trend
of activity in the system and will give more consistent behaviour here.
this diff is deliberately simple. we're basically going from "no
limits" to "some sort of limit" for sockets again, so keeping the
code simple means it should be easy to understand and tweak in the
future.
ok djm@ visa@ claudio@
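
a deliberately simple sketch of the new check, with illustrative
names for the page accounting (the real code reads the mbuf pool
page limits):

    int
    sbchecklowmem_example(void)
    {
        extern unsigned long mbuf_pages_used;   /* assumed */
        extern unsigned long mbuf_pages_limit;  /* assumed */

        /* backpressure once page usage crosses a fraction of
         * the configured limit */
        return (mbuf_pages_used >= mbuf_pages_limit / 4 * 3);
    }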
|
|
this makes it consistent with the rest of the network stack when
determining alignment.
ok bluhm@
|
|
If the first mbuf of a chain in m_pullup is a cluster, check if the
cluster is read-only (shared or an external buffer). If so, don't
touch it and create a new mbuf for the pullup data.
This restores the original 4.4BSD m_pullup(), which not only
returned contiguous mbuf data of the specified length, but also
converted read-only clusters into writable memory. The latter
feature was lost during some refactoring.
from ehrhardt@; tested by weerd@; OK stsp@ bluhm@ claudio@
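
A sketch of the test using the real M_READONLY() macro, with
simplified control flow (assumes len fits in a single mbuf):

    struct mbuf *
    pullup_readonly_example(struct mbuf *m, int len)
    {
        struct mbuf *n;

        if ((m->m_flags & M_EXT) && M_READONLY(m)) {
            /* shared cluster or external buffer: do not write
             * into it, copy into a fresh writable mbuf instead */
            n = m_get(M_DONTWAIT, m->m_type);
            if (n == NULL)
                return (NULL);
            n->m_len = len;
            m_copydata(m, 0, len, mtod(n, caddr_t));
            return (n);
        }
        return (m);
    }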
|
|
|
|
i'm not a fan of having to cast to caddr_t when we have modern
inventions like void *s we can take advantage of.
ok claudio@ mvs@ bluhm@
|
|
Should prevent using an uninitialized value as a bogus counter index.
OK mvs@ claudio@ anton@
|
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
|
from Matt Dunwoodie and Jason A. Donenfeld
|
|
this is so pppx(4) and the upcoming pppac(4) can give kq read data
and FIONREAD values that make sense, like the ones tun(4) and tap(4)
provide with ifq_hdatalen.
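
a sketch of the driver side, assuming a softc with an ifnet and
the usual ioctl switch, as tun(4) and tap(4) do:

    case FIONREAD:
        /* byte count of the packet at the head of the queue */
        *(int *)data = ifq_hdatalen(&sc->sc_if.if_snd);
        break;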
|
|
atomic operation.
OK visa@ cheloha@
|
|
operations get stuck while holding the net lock. Increasing the
limit did not help as there was no wakeup of the waiting pools. So
introduce pool_wakeup() and run through the mbuf pool request list
when the limit changes.
OK dlg@ visa@
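
A sketch of the update path, assuming the mbuf pool arrays in
uipc_mbuf.c:

    int i;

    /* after raising the limit, rerun the request list of each
     * mbuf pool so sleeping allocators retry */
    pool_wakeup(&mbpool);
    for (i = 0; i < nitems(mclsizes); i++)
        pool_wakeup(&mclpools[i]);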
|
|
|
|
limits. Convert kernel variables and calculations for mbuf memory
into long to allow larger values on 64 bit machines. Put a range
check into the kernel sysctl. For the interface itself int is still
sufficient. In netstat -m cast all multiplications to unsigned
long to hold the product of two unsigned ints.
input and OK visa@
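
The overflow the casts avoid, in miniature (illustrative values):

    unsigned int nclusters = 3000000, size = 2048;
    unsigned long bytes;

    /* without the cast the product wraps in 32 bits; with it,
     * the multiplication is done in unsigned long */
    bytes = (unsigned long)nclusters * size;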
|
|
if the packet has the M_TIMESTAMP csum_flag, ph_timestamp is added
to the boottime clock, otherwise it just uses microtime().
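
A sketch of that logic; M_TIMESTAMP and ph_timestamp are real mbuf
fields, the function name is illustrative:

    void
    pkt_time_example(const struct mbuf *m, struct timeval *tv)
    {
        if (m->m_pkthdr.csum_flags & M_TIMESTAMP) {
            struct timeval btv;

            /* ph_timestamp holds nanoseconds since boot */
            microboottime(&btv);
            NSEC_TO_TIMEVAL(m->m_pkthdr.ph_timestamp, tv);
            timeradd(tv, &btv, tv);
        } else
            microtime(tv);
    }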
|
|
|
|
accommodating allocator. an interrupt safe pool may also be used in process
context, as indicated by waitok flags. thanks to the garbage collector, we
can always free pages in process context. the only complication is where
to put the pages. solve this by saving the allocation flags in the pool
page header so the free function can examine them.
not actually used in this diff. (coming soon.)
arm testing and compile fixes from phessler
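
the idea in miniature, with made-up names: the page header carries
the flags the page was allocated with, and the free path picks the
matching backend from them:

    struct ph_example {
        int ph_waitok;      /* saved allocation flag */
    };

    void
    ph_free_example(struct ph_example *ph)
    {
        if (ph->ph_waitok) {
            /* may sleep: hand the page straight back */
        } else {
            /* defer to the garbage collector */
        }
    }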
|
|
this fixes an issue found by a regress test on sparc64 by claudio,
and between us took about half a day of work to understand and fix
at a2k19.
ok claudio@
|
|
OK millert@ bluhm@
|
|
flag to the other references. Then the final m_free() will clear
the memory.
OK claudio@
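
A sketch of the final free; M_ZEROIZE and explicit_bzero() are
real, the surrounding logic is simplified:

    /* in the last m_free() of a zeroize-flagged mbuf: */
    if (m->m_flags & M_ZEROIZE) {
        if (m->m_flags & M_EXT)
            explicit_bzero(m->m_ext.ext_buf, m->m_ext.ext_size);
        else
            explicit_bzero(m->m_dat, MLEN);
    }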
|
|
return. Hopefully the other reference holder has the M_ZEROIZE flag set as
well. Triggered by syzkaller. OK deraadt@ visa@
Reported-by: syzbot+c578107d70008715d41f@syzkaller.appspotmail.com
|
|
OK bluhm@
|
|
all types of mbufs. Also introduce some KASSERTs in the m_*space() functions
to ensure that no negative number is returned. This also introduces two
internal macros M_SIZE() & M_DATABUF() which return the right size and start
pointer of the mbuf data area. Use it in a few obvious places to simplify code.
OK bluhm@
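
The shape of the two macros as described, reimplemented here for
illustration (not copied from the tree):

    /* size and start of the data area, valid for all mbuf types */
    #define M_SIZE_EX(m)                                    \
        ((m)->m_flags & M_EXT ? (m)->m_ext.ext_size :       \
        (m)->m_flags & M_PKTHDR ? MHLEN : MLEN)

    #define M_DATABUF_EX(m)                                 \
        ((m)->m_flags & M_EXT ? (m)->m_ext.ext_buf :        \
        (m)->m_flags & M_PKTHDR ? (m)->m_pktdat : (m)->m_dat)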
|
|
m_leadingspace() and m_trailingspace(). Convert all callers to call
directly the functions and remove the defines.
OK krw@, mpi@
|
|
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
|
|
put the algorithm into a new function m_calchdrlen(). Also set an
uninitialized m_len to 0 in NFS code.
OK claudio@
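
A sketch of such a recalculation (the committed m_calchdrlen() may
differ in detail):

    void
    calchdrlen_example(struct mbuf *m)
    {
        struct mbuf *n;
        int plen = 0;

        KASSERT(m->m_flags & M_PKTHDR);
        for (n = m; n != NULL; n = n->m_next)
            plen += n->m_len;
        m->m_pkthdr.len = plen;
    }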
|
|
created. Add a new function m_removehdr() to convert packet header
mbufs within the chain to regular mbufs. Assert that the mbuf at
the beginning of the chain has a packet header.
found by Maxime Villard in NetBSD; from markus@; OK claudio@
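
A sketch following that description; flag handling is simplified:

    void
    removehdr_example(struct mbuf *m)
    {
        KASSERT(m->m_flags & M_PKTHDR);
        /* drop mbuf tags attached to the packet header */
        m_tag_delete_chain(m);
        m->m_flags &= ~M_PKTHDR;
        memset(&m->m_pkthdr, 0, sizeof(m->m_pkthdr));
    }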
|
|
Previous commit has no OK's or discussion about testing.
|
|
|
|
previously it took a shortcut when emptying an mbuf by only setting
m_len to 0, but leaving m_data alone. this interacts badly with
m_pullup, which tries to maintain the alignment of the data
payload. if there was a 14 byte ethernet header on its own that was
m_adjed off, and then the stack wants an ip header, m_pullup
would put the ip header on the ethernet header alignment, which is
off by 2 bytes.
found by stsp@ with pair(4) on sparc64.
ok stsp@ too
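
a sketch of the fix in m_adj()-like code, for a plain mbuf without
a cluster or packet header (illustrative):

    if (m->m_len == 0 && !(m->m_flags & (M_EXT|M_PKTHDR))) {
        /* emptied mbuf: also reset m_data, so a later
         * m_pullup() realigns from a clean buffer instead of
         * inheriting a stale 14 byte offset */
        m->m_data = m->m_dat;
    }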
|
|
ok dlg@
|
|
existing statekey in the mbuf header. Reset the statekey in
m_dup_pkthdr().
suggested by and OK sashan@
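
A sketch of the reset; pf.statekey is the real packet header member:

    /* in m_dup_pkthdr()-like code, after copying the header: the
     * duplicate must not share the pf state key reference */
    to->m_pkthdr.pf.statekey = NULL;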
|
|
or other states more consistent.
OK visa@ sashan@ on a previous version
|
|
ok visa@, bluhm@, deraadt@
|
|
dereference m if it is NULL. See CID 501458.
- Remove the m NULL check from the final for loop, it is not
necessary. This cannot happen due to the length calculation.
The inconsistent code caused the coverity issue.
- Move the m = mp assignment close to all the loops where the mbuf
chain is traversed.
- Use mp to access the m_pkthdr consistently.
- Move the next assignment from for (;;m = m->m_next) to the
end of the loop to make it consistent to the previous for (;;)
where the total length is calculated.
OK visa@ mpi@
|
|
mbuf functions.
OK claudio@
|
|
Still quite complicated but more legible in the end and it will do fewer
M_GET calls for huge packets.
OK bluhm@
|
|
ok kettenis mpi tom
|