path: root/sys/netinet
Age Commit message Author
2021-08-10Remove unused `ipa_pcb' from 'ipsec_acquire' structure.mvs
ok gnezdo@
2021-08-09During unidirectional data transmission, a TCP connection may stall.Alexander Bluhm
The sending machine is doing zero window probes, but is not sending any more data although the other machine announced that it has space again. The header prediction code did not update snd_wl2. If there was a sequence number wrap, the send window update block is not reached. Update snd_wl2 when receiving predicted ACKs and update snd_wl1 and rcv_up for predicted pure data. from FreeBSD; OK sashan@ claudio@
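The stall described above can be illustrated with a small userland sketch. It assumes the classic BSD-style window update test and wrap-safe sequence comparison; `struct snd_state`, `seq_lt()` and `window_update_allowed()` are hypothetical names for illustration, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "a is before b" for 32-bit TCP sequence numbers.
 * Relies on two's complement conversion, as the BSD SEQ_LT() idiom does.
 */
static bool
seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Simplified send window state; field names follow the commit message. */
struct snd_state {
	uint32_t snd_wl1;	/* segment sequence number of last window update */
	uint32_t snd_wl2;	/* segment ack number of last window update */
	uint32_t snd_wnd;	/* current send window */
};

/*
 * Window update test modeled on the traditional BSD condition:
 * accept the advertised window only from a segment that is newer
 * than the one used for the previous update.
 */
static bool
window_update_allowed(const struct snd_state *s, uint32_t seq, uint32_t ack,
    uint32_t win)
{
	return seq_lt(s->snd_wl1, seq) ||
	    (s->snd_wl1 == seq && (seq_lt(s->snd_wl2, ack) ||
	    (s->snd_wl2 == ack && win > s->snd_wnd)));
}
```

If snd_wl2 is never refreshed while more than 2^31 bytes are acknowledged, `seq_lt(snd_wl2, ack)` turns false and a pure ACK that reopens the window is ignored, which matches the stall the commit fixes.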
2021-08-09Fix white spaces.Alexander Bluhm
2021-07-27Revert "Use per-CPU counters for tunnel descriptor block" diff.mvs
Panic reported by Hrvoje Popovski.
2021-07-26Use per-CPU counters for tunnel descriptor block (tdb) statistics.mvs
'tdb_data' struct became unused and was removed. ok bluhm@
2021-07-26Do not queue crypto operations for IPsec. The packet entries inAlexander Bluhm
task queues were unlimited and could overflow during heavy traffic. Even if we still use hardware drivers that sleep, softnet task instead of soft interrupt can handle this now. Without queues net lock is inherited and kernel lock is only needed once per packet. This results in less lock contention and faster IPsec. Also protect tdb drop counters with net lock and avoid a leak in crypto dispatch error handling. intense testing Hrvoje Popovski; OK mpi@
2021-07-26The mbuf header cleanup in revision 1.173 of ip_icmp.c was tooAlexander Bluhm
strict. ICMP error packets generated by pf were not passed immediately, but could be blocked. Preserve PF_TAG_GENERATED flag in icmp_reflect() and icmp6_reflect(). reported by sf@; OK patrick@ kn@
2021-07-21Also count crypto errors in ipsec_input_cb() like IPsec output inAlexander Bluhm
previous commit.
2021-07-21Propagate errors from crypto_invoke() and count them in IPsec. TheyAlexander Bluhm
should not happen, but always check error conditions. tq is never NULL, remove the check. tdb->tdb_odrops++ is not MP safe, but will be addressed separately in ipsec_output_cb(). OK mvs@
2021-07-19Remove `ids' from `ipsec_ids_tree' while following ipsp_ids_insert()mvs
error path. This fixes a use-after-free issue. Also fix a typo in the debug message in the error path, pointed out by bluhm@. ok millert@ bluhm@
2021-07-18Introduce and use garbage collector for 'ipsec_ids' struct entitiesmvs
destruction instead of using per-entity timeout. This fixes the races between ipsp_ids_insert(), ipsp_ids_free() and ipsp_ids_timeout(). ipsp_ids_insert() can't stop an ipsp_ids_timeout() timeout handler which is already running and awaiting netlock to be released, so the reused `ids' will be silently removed in this case. ipsp_ids_free() can't determine whether the ipsp_ids_timeout() timeout handler is running because timeout_del(9) called by ipsp_ids_insert() clears its triggered state. So ipsp_ids_timeout() could be scheduled to run twice in this case. Also hrvoje@ reported that ipsec(4) throughput increased with this diff, so it seems we caught a significant number of ipsp_ids_insert() races. tests and feedback by hrvoje@ ok bluhm@
2021-07-18The IPsec authentication before decryption used a different replayAlexander Bluhm
counter than after decryption. This could result in "esp_input_cb: authentication failed for packet in SA" errors. As we run crypto operations async, thousands of packets are stored in the crypto task. During the queueing the replay counter of the tdb can change. Then the higher 32 bits may increment although the lower 32 bits did not wrap. checkreplaywindow() must be called twice per packet with the same replay counter. Store the value in struct tdb_crypto while dangling in the task queue and doing crypto operations. tested by Hrvoje Popovski; joint work with tobhe@
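Why the two checks must see the same replay counter can be sketched as follows. The function follows the high-bit inference described in RFC 4303 Appendix A, not OpenBSD's checkreplaywindow(); `esn_expand()` and its parameters are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Infer the full 64-bit sequence number of a packet from the low
 * 32 bits carried on the wire, given the highest sequence number
 * seen so far (rpl) and the replay window size.
 * Sketch after RFC 4303 Appendix A; not the kernel implementation.
 */
static uint64_t
esn_expand(uint64_t rpl, uint32_t seq_lo, uint32_t window)
{
	uint32_t tl = (uint32_t)rpl;		/* low half of counter */
	uint32_t th = (uint32_t)(rpl >> 32);	/* high half of counter */
	uint32_t bl = tl - window + 1;		/* window bottom, may wrap */
	uint32_t seq_hi;

	if (tl >= window - 1) {
		/* Window lies within one 32-bit subspace. */
		seq_hi = (seq_lo >= bl) ? th : th + 1;
	} else {
		/* Window spans two subspaces. */
		seq_hi = (seq_lo >= bl) ? th - 1 : th;
	}
	return ((uint64_t)seq_hi << 32) | seq_lo;
}
```

With a counter of 0xFFFFFFF0 the wire value 0xFFFFFFF5 expands with high bits 0. If the counter has advanced to 0x100000100 by the time the second check runs, the same wire value expands with high bits 1. Saving the counter alongside the queued packet, as the commit does in struct tdb_crypto, keeps both evaluations consistent.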
2021-07-16Improve comments in IPsec replay window calculation.Alexander Bluhm
OK tobhe@
2021-07-14Resend the TCP packet only if the MTU locked flag appears at theAlexander Bluhm
route and was not there before. This should prevent a recursion in path MTU discovery with TCP over IPsec. reported and tested Matthias Schmidt; tested and OK tobhe@
2021-07-13Remove unused `PolicyHead' from 'sockaddr_encap' structure.mvs
ok tobhe@
2021-07-08The xformsw array never changes. Declare struct xformsw constantAlexander Bluhm
and map data read only. OK deraadt@ mvs@ mpi@
2021-07-08Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead ofmvs
doing that in runtime within ipsp_acquire_sa(). ok bluhm@
2021-07-08Debug printfs in encdebug were inconsistent, some missing newlinesAlexander Bluhm
produced ugly output. Move the function name and the newline into the DPRINTF macro. This simplifies the debug statements. OK tobhe@
2021-07-08The properties of the crypto algorithms never change. Declare themAlexander Bluhm
constant. Then they are mapped as read only. OK deraadt@ dlg@
2021-07-07tell ether_input() to call pf_test() outside of smr_read sections,Alexandr Nedvedicky
because smr_read sections don't play well with sleeping locks in pf(4). OK bluhm@
2021-07-07Fix whitespaces in IPsec code.Alexander Bluhm
2021-06-30For path MTU discovery tcp_mtudisc() should resend a TCP packet byAlexander Bluhm
calling tcp_output() if the TCP maximum segment size changes. But that did not work, as the new value was compared before tcp_mss() had a chance to modify it. Move the comparison and change it from not equal to greater than. It makes only sense to resend a packet immediately if it becomes smaller and is more likely to fit. OK sashan@ tobhe@
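The ordering problem can be shown with a minimal sketch. `struct conn`, `recompute_mss()` and `mtudisc_should_resend()` are hypothetical stand-ins, and the fixed 40 byte header overhead is an assumption; the kernel's tcp_mss() does considerably more.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced connection state. */
struct conn {
	unsigned int maxseg;	/* current maximum segment size */
};

/*
 * Stand-in for the MSS recalculation: shrink the MSS if the path MTU
 * no longer fits. Assumes 40 bytes of IPv4 and TCP headers.
 */
static void
recompute_mss(struct conn *c, unsigned int path_mtu)
{
	unsigned int mss = path_mtu > 40 ? path_mtu - 40 : 0;

	if (mss < c->maxseg)
		c->maxseg = mss;
}

/*
 * Save the old value first, let the recalculation run, then compare.
 * Resend only if the segment size became smaller.
 */
static bool
mtudisc_should_resend(struct conn *c, unsigned int path_mtu)
{
	unsigned int orig_maxseg = c->maxseg;

	recompute_mss(c, path_mtu);
	return orig_maxseg > c->maxseg;
}
```

Comparing before the recalculation would always see the unchanged value, which is the bug the commit describes.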
2021-06-21Fix uninitialized variables introduced in rev 1.361Jeremie Courreges-Anglas
Thankfully clang elided the code in an almost harmless way (at least on amd64 GENERIC.MP). Spotted by chance when building kernels with -Wno-error=uninitialized. ok dlg@ sashan@ bluhm@
2021-06-18The crypto(9) framework used by IPsec runs on a kernel task thatAlexander Bluhm
is protected by kernel lock. There were crashes in swcr_authenc() when it was accessing swcr_sessions. As a quick fix, protect all calls from network stack to crypto with kernel lock. This also covers the rekeying case that is called from pfkey via tdb_init(). OK mvs@
2021-06-03remember if the ipv4 header checksum is ok.David Gwynne
if a bridge checks the ip header before the network stack, then we can remember it was ok when the bridge checks it so the ip stack doesn't have to. ok claudio@ mvs@
2021-06-02factor out the code that does basic sanity checks on ipv4 headers.David Gwynne
this will allow these checks to be reused by bridge (where they're currently duplicated), veb, and tpmr. ok bluhm@ sashan@
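The kind of basic IPv4 header checks that can be shared between callers might look like the sketch below. It works on a raw byte buffer in userland; `ip_cksum()` and `ipv4_header_ok()` are hypothetical names, and the exact set of checks in the kernel may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over an even number of header bytes. */
static uint16_t
ip_cksum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hlen; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Basic sanity checks on an IPv4 header found in pkt[0..len). */
static bool
ipv4_header_ok(const uint8_t *pkt, size_t len)
{
	size_t hlen, total;

	if (len < 20)
		return false;		/* shorter than a minimal header */
	if ((pkt[0] >> 4) != 4)
		return false;		/* not IPv4 */
	hlen = (size_t)(pkt[0] & 0x0f) * 4;
	if (hlen < 20 || hlen > len)
		return false;		/* bogus header length */
	total = ((size_t)pkt[2] << 8) | pkt[3];
	if (total < hlen || total > len)
		return false;		/* bogus total length */
	/* A header with a correct checksum sums to zero. */
	return ip_cksum(pkt, hlen) == 0;
}
```

A caller that has already run such a check could record the result in per-packet metadata so that a later layer can skip the checksum, which is the idea behind the 2021-06-03 commit above.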
2021-05-25As network features are not added dynamically, the domain structuresAlexander Bluhm
are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-05-15Fix IPsec NAT-T to work with pipex(4). Introduce a new packet tagYASUOKA Masahiko
PACKET_TAG_IPSEC_FLOWINFO to specify the IPsec flow. ok mvs
2021-05-12Use local copy of `ps_rtableid' in ip{,6}_ctloutput() and markmvs
`ps_rtableid' as atomic. This allows us to unlock setrtable(2). ok claudio@ mpi@
2021-05-04Initialize `ipsec_policy_pool' within pfkey_init() instead of doing thatmvs
in runtime within pfkeyv2_send(). Also set its interrupt protection level to IPL_SOFTNET. ok bluhm@ mpi@
2021-04-30Rearrange the implementation of bounded sysctl. The primitiveAlexander Bluhm
functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
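The layering of a range check on top of a plain integer setter can be sketched as below. `set_int_bounded()` is a hypothetical userland helper with a simplified signature, and treating `minimum > maximum` as a read-only marker is an assumption made for this example, not a statement about the kernel's macros.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Update *valp from *newvalp if the new value lies in
 * [minimum, maximum]. A NULL newvalp is a pure read.
 * Assumption for this sketch: minimum > maximum marks the
 * variable as read only.
 */
static int
set_int_bounded(int *valp, const int *newvalp, int minimum, int maximum)
{
	if (newvalp == NULL)
		return 0;		/* nothing to write */
	if (minimum > maximum)
		return EPERM;		/* read-only variable */
	if (*newvalp < minimum || *newvalp > maximum)
		return EINVAL;		/* out of range, value unchanged */
	*valp = *newvalp;
	return 0;
}
```

Encoding the special cases in the bounds themselves is what makes named macros for those boundary values useful, since a bare pair of integers does not explain its meaning at the call site.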
2021-04-28Use mq_delist() to fetch the ARP mbuf hold queue once and feed theAlexander Bluhm
mbuf list to if_output(). OK sashan@ mvs@
2021-04-28Document the locking mechanism of the global variables in ARP code.Alexander Bluhm
The global list of ARP llinfo is protected by net lock. This is not sufficient when we switch to shared netlock. Add a mutex for insertion and removal when net lock is not exclusive. This is needed if we want to run IP output on multiple CPUs. Put an assertion for shared net lock into arp_rtrequest. input mvs@; OK sashan@
2021-04-26Convert the ARP packet hold queue from mbuf list to mbuf queue whichAlexander Bluhm
contains a mutex. Update la_hold_total with atomic operations. OK sashan@
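Keeping a shared packet counter consistent without a lock can be sketched with C11 atomics. This is a userland analogy only: the kernel uses its own atomic primitives, and `hold_total`, `hold_enqueue()` and `hold_dequeue()` are hypothetical names.

```c
#include <stdatomic.h>

/* Number of packets currently held, shared between CPUs. */
static atomic_uint hold_total;

/* Account for one packet added to a hold queue; returns the new total. */
static unsigned int
hold_enqueue(void)
{
	return atomic_fetch_add(&hold_total, 1) + 1;
}

/* Account for one packet removed from a hold queue; returns the new total. */
static unsigned int
hold_dequeue(void)
{
	return atomic_fetch_sub(&hold_total, 1) - 1;
}
```

The per-queue mutex protects the list itself, while the global total is updated atomically so that no additional global lock is needed.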
2021-04-23Setting variable arpinit_done is not MP safe if we want to executeAlexander Bluhm
arp_rtrequest() in parallel. Move initialization to arpinit() function. OK kettenis@ mvs@
2021-04-23The variable la_hold_total contains the number of packets currentlyAlexander Bluhm
in the arp queue. So the sysctl net.inet.ip.arpqueued must be read only. In if_ether.c include the header with the declaration of la_hold_total to ensure that the definition matches. OK mvs@
2021-04-16Turn on the direct ACK on every other segment.Alexander Bluhm
This is a backout of rev 1.366 which turned this feature off. Although sending fewer ACKs makes TCP faster if the CPU is busy with processing packets, there are corner cases where TCP gets slower. Especially OpenBSD 6.8 and older have a maxburst limitation that scales badly if the other side sends too few ACKs. Also regress test relayd run-args-http-slow-consumer.pl uses strange socket buffer sizes that trigger slow performance with the new algorithm. For OpenBSD 6.9 release switch back to 6.8 delayed ACK behavior. discussed with deraadt@ benno@ claudio@ jan@
2021-03-30[ICMP] IP options lead to malformed replyAlexandr Nedvedicky
icmp_send() must update IP header length if IP options are appended. Such a packet also has to be dispatched with IP_RAWOUTPUT flags. Bug reported and fix co-designed by Dominik Schreilechner _at_ siemens _dot_ com OK bluhm@
2021-03-20use m_dup_pkthdr in ip_fragment to copy pkthdr info to fragments.David Gwynne
this ensures more stuff is copied, in particular the flowid information. this is also how v6 does it, which makes things more consistent. ok bluhm@
2021-03-10spellingJonathan Gray
ok gnezdo@ semarie@ mpi@
2021-03-07use uint64_t ethernet addresses for compares in carp.David Gwynne
pass the uint64_t that ether_input has already converted from a real ethernet address into carp_input so it can use it without having to do its own conversion. tested by hrvoje popovski tested by me on amd64 and sparc64 ok patrick@ jmatthew@
2021-03-05pass the uint64_t dst ethernet address from ether_input to bridges.David Gwynne
tested on amd64 and sparc64.
2021-03-01Refactor ip_fragment() and ip6_fragment(). Use a mbuf list toAlexander Bluhm
simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care about the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
2021-02-26add some helpers for working with ethernet addresses as uint64_tDavid Gwynne
the main bits are ether_addr_to_e64 and ether_e64_to_addr for loading an ethernet address into a uint64_t and vice versa. there's also some macros for testing if an address in a uint64_t is multicast, broadcast, anyaddr, or if it's an 802.1q reserved multicast group address. the reason for this functionality is once you have an ethernet address as a uint64_t, operations like compares, bit tests, and so on are fast and easy. tested on amd64 and sparc64
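The idea of holding an Ethernet address in a uint64_t can be sketched as follows. The helper names below are hypothetical and do not reproduce the kernel's functions or macros; the byte order chosen (first octet in the most significant position) is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a 6-byte Ethernet address into the low 48 bits of a uint64_t. */
static uint64_t
addr_to_u64(const uint8_t a[6])
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | a[i];
	return v;
}

/* Unpack the low 48 bits of v into a 6-byte Ethernet address. */
static void
u64_to_addr(uint64_t v, uint8_t a[6])
{
	for (int i = 5; i >= 0; i--) {
		a[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* The group bit is the least significant bit of the first octet. */
static bool
u64_is_multicast(uint64_t v)
{
	return ((v >> 40) & 1) != 0;
}

/* Broadcast is all 48 bits set. */
static bool
u64_is_broadcast(uint64_t v)
{
	return v == 0xffffffffffffULL;
}
```

Once packed, comparing two addresses is a single integer comparison instead of a 6-byte memory compare.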
2021-02-25we don't have to cast to caddr_t when calling m_copydata anymore.David Gwynne
the first cut of this diff was made with coccinelle using this spatch:
    @rule@
    type caddr_t;
    expression m, off, len, cp;
    @@
    -m_copydata(m, off, len, (caddr_t)cp)
    +m_copydata(m, off, len, cp)
i had to fix its opinionated idea of formatting by hand though, so i'm not sure it was worth it. ok deraadt@ bluhm@
2021-02-23Use pool to allocate tdbs.tobhe
ok patrick@ bluhm@
2021-02-23As ip_insertoptions() may prepend a mbuf, "goto bad" has to freeAlexander Bluhm
the new chain. This fixes a potential memory leak in ip_output(). Also simplify a bunch of "goto done". OK kn@ mvs@
2021-02-23Use NULL instead of 0 in `m_nextpkt' assignment.mvs
ok deraadt@ dlg@
2021-02-11Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().Patrick Wildt
Technically the whole point of the stoeplitz API is that it's symmetric, meaning that the order of addresses and ports doesn't matter and will produce the same hash value. Coverity CID 1501717 ok dlg@
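The symmetry property can be demonstrated with a small Toeplitz hash whose key repeats every 16 bits. This is an illustrative sketch, not the kernel's stoeplitz code; `toeplitz_periodic()`, `flow_hash()` and the key value used in any call are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash with a key that is the 16-bit pattern key16 repeated
 * forever. For every set input bit, XOR in the 32-bit key window that
 * starts at that bit offset. Because the key has period 16, sliding
 * the window by one bit is a 32-bit rotate left.
 */
static uint32_t
toeplitz_periodic(const uint8_t *data, size_t len, uint16_t key16)
{
	uint32_t window = ((uint32_t)key16 << 16) | key16;
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			window = (window << 1) | (window >> 31);
		}
	}
	return hash;
}

/* Hash an IPv4 flow laid out as saddr, daddr, sport, dport. */
static uint32_t
flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport,
    uint16_t key16)
{
	uint8_t buf[12] = {
		(uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
		(uint8_t)(saddr >> 8), (uint8_t)saddr,
		(uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
		(uint8_t)(daddr >> 8), (uint8_t)daddr,
		(uint8_t)(sport >> 8), (uint8_t)sport,
		(uint8_t)(dport >> 8), (uint8_t)dport,
	};

	return toeplitz_periodic(buf, sizeof(buf), key16);
}
```

Swapping the two addresses moves their bits by 32 positions and swapping the two ports moves them by 16, both multiples of the key period, so each bit meets the same key window and the hash is unchanged. That is why the argument order flagged by Coverity did not change the computed value.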
2021-02-10If pf changes the routing table when sending packets, the kernelAlexander Bluhm
could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@