path: root/sys/net/pf_norm.c
Age | Commit message | Author
2024-07-14 | Unlock IPv6 sysctl net.inet6.ip6.forwarding from net lock. | Alexander Bluhm
Use atomic operations to read ip6_forwarding while processing packets in the network stack. To make clear where actually the router property is needed, use the i_am_router variable based on ip6_forwarding. It already existed in nd6_nbr. Move i_am_router setting up the call stack until all users are independent. The forwarding decisions in pf_test, pf_refragment6, ip6_input do also not interfere. Use a new array ipv6ctl_vars_unlocked to make transition of all the integer sysctls easier. Adapt IPv4 to the new style. OK mvs@
2024-07-04 | Implement IPv6 forwarding IPsec only. | Alexander Bluhm
IPsec gateways set the forwarding sysctl to 2. While this worked for IPv4 since a long time, adapt this feature for IPv6 now. Set sysctl net.inet6.ip6.forwarding=2 to forward only packets that have been processed by IPsec. Set IPV6_FORWARDING_IPSEC in ip6_input() and pass the flag down to the call stack. This provides consistent view on global variable ip6_forwarding. In ip6_output() or ip6_forward() drop packets that do not match the policy. OK denis@
2024-06-20 | Read IPv6 forwarding value only once while processing a packet. | Alexander Bluhm
IPv4 uses IP_FORWARDING to pass down a consistent value of net.inet.ip.forwarding down the stack. This is needed for unlocking sysctl. Do the same for IPv6. Read ip6_forwarding once in ip6_input_if() and pass down IPV6_FORWARDING as flags to ip6_ours(), ip6_hbhchcheck(), ip6_forward(). Replace the srcrt value with IPV6_REDIRECT flag for consistency with IPv4. To have common syntax with IPv4, use ip6_forwarding == 0 checks instead of !ip6_forwarding. This will also make it easier to implement net.inet6.ip6.forwarding=2 for IPsec only forwarding later. In nd6_ns_input() and nd6_na_input() read ip6_forwarding once and store it in i_am_router. The variable name has been chosen to avoid confusion with is_router, which indicates router flag of the packet. Reading of ip6_forwarding is done independently from ip6_input_if(), consistency does not really matter. One is for ND router behavior the other for forwarding. Again use the ip6_forwarding != 0 check, so when ip6_forwarding IPsec only value 2 gets implemented, it will behave like a router. OK deraadt@ sashan@ florian@ claudio@
2024-04-22 | Show pf fragment reassembly counters. | Alexander Bluhm
Fragment count and statistics are stored in struct pf_status. From there pfctl(8) and systat(1) collect and show them. Note that pfctl -s info needs the -v switch to show fragments. As fragment reassembly has its own mutex, also grab this in pf ioctl(2) and sysctl(2) code. input claudio@; OK henning@
2023-10-10 | Remove dead code in pf_pull_hdr(). | Alexander Bluhm
pf_pull_hdr() allows to pass an action pointer parameter as output value. This is never used, all callers pass a NULL argument. Remove ACTION_SET() entirely. The logic (fragoff >= len) in pf_pull_hdr() does not work since revision 1.4. Before it was used to drop short TCP or UDP fragments that contained only part of the header. Current code in pf_pull_hdr() drops the packets anyway, so always set reason PFRES_FRAG. OK kn@ sashan@
2023-07-06 | big update to pfsync to try and clean up locking in particular. | David Gwynne
moving pf forward has been a real struggle, and pfsync has been a constant source of pain. we have been papering over the problems for a while now, but it reached the point that it needed a fundamental restructure, which is what this diff is.

the big headliner changes in this diff are:

- pfsync specific locks
  this is the whole reason for this diff. rather than rely on NET_LOCK or KERNEL_LOCK or whatever, pfsync now has its own locks to protect its internal data structures. this is important because pfsync runs a bunch of timeouts and tasks to push pfsync packets out on the wire, or when it's handling requests generated by incoming pfsync packets, both of which happen outside pf itself running. having pfsync specific locks around pfsync data structures makes the mutations of these data structures a lot more explicit and auditable.
- partitioning
  to enable future parallelisation of the network stack, this rewrite includes support for pfsync to partition states into different "slices". these slices run independently, ie, the states collected by one slice are serialised into a separate packet to the states collected and serialised by another slice. states are mapped to pfsync slices based on the pf state hash, which is the same hash that the rest of the network stack and multiq hardware uses.
- no more pfsync called from netisr
  pfsync used to be called from netisr to try and bundle packets, but now that there's multiple pfsync slices this doesn't make sense. instead it uses tasks in softnet tqs.
- improved bulk transfer handling
  there's shiny new state machines around both the bulk transmit and receive handling. pfsync used to do horrible things to carp demotion counters, but now it is very predictable and returns the counters back where they started.
- better tdb handling
  the tdb handling was pretty hairy, but hrvoje has kicked this around a lot with ipsec and sasyncd and we've found and fixed a bunch of issues as a result of that testing.
- mpsafe pf state purges
  this was committed previously, but because the locks pfsync relied on weren't clear this just caused a ton of bugs. as part of this diff it's now reliable, and moves a big chunk of work out from under KERNEL_LOCK, which in turn improves the responsiveness and throughput of a firewall even if you're not using pfsync.

there's a bunch of other little changes along the way, but the above are the big ones.

hrvoje has done performance testing with this diff and notes a big improvement when pfsync is not in use. performance when pfsync is enabled is about the same, but im hoping the slices means we can scale along with pf as it improves.

lots (months) of testing by me and hrvoje on pfsync boxes
tests and ok sashan@
deraadt@ says this is a good time to put it in
2023-05-07 | In preparation for TSO in software, cleanup the fragment code. Use | Alexander Bluhm
if_output_ml() to send mbuf lists to interfaces. This can be used for TSO, fragments, ARP and ND6. Rename variable fml to ml. In pf_route6() split the if else block. Put the safety check (hlen + firstlen < tlen) into ip_fragment(). It makes the code correct in case the packet is too short to be fragmented. This should not happen, but other functions also have this logic. No functional change. OK sashan@
2022-11-06 | move pfsync_state_import in if_pfsync.c to pf_state_import in pf.c | David Gwynne
this is straightening the deck chairs. the state import and export code are used by both the pf ioctls and pfsync, but the export code is in pf.c and the import code is in if_pfsync. if pfsync was disabled then the ioctl stuff wouldnt link. moving the import code to pf.c makes it more symmetrical(?) and robust. tweaks and ok from kn@ sashan@
2022-10-10 | Recalculate checksum of normalised packet | Bjorn Ketelaars
In 2011, henning@ removed fiddling with the ip checksum of normalised packets in r1.131 of sys/net/pf_norm.c. Rationale was that the checksum is always recalculated in all output paths anyway. In 2016, procter@ reintroduced checksum modification to preserve end-to-end checksums in r1.189 of sys/net/pf_norm.c. Likely somewhere in that timeslot checksum recalculation of normalised packets was broken. With input from bluhm@. OK sashan@, bluhm@
2022-08-22 | Protect pf_reassemble() with pf fragment lock. When the pool limit | Alexander Bluhm
for fragment entries was reached, pf_create_fragment() called pf_flush_fragments() without lock. This could result in a crash. Let PF_FRAG_LOCK() cover the whole pf_reassemble() function as pf_nfrents++ was also missing the lock. crash found and fix tested by Hrvoje Popovski; OK sashan@
2021-03-10 | spelling | Jonathan Gray
ok gnezdo@ semarie@ mpi@
2021-03-01 | Refactor ip_fragment() and ip6_fragment(). Use a mbuf list to | Alexander Bluhm
simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care about the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
2021-02-22 | When cutting off the head of an overlapping fragment during pf | Alexander Bluhm
reassembly, reinsert the fragment into the lookup table with correct index. Reported-by: syzbot+d043455a5346f726f1c4@syzkaller.appspotmail.com OK claudio@
2021-02-09 | Activate use of PF_LOCK() by removing the WITH_PF_LOCK ifdefs. | Patrick Wildt
Silence from the network group ok sashan@
2020-06-24 | kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9) | cheloha
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
2019-02-28 | IPv6 fragments with malformed extension headers could be erroneously | Alexander Bluhm
passed by pf or cause a panic in pf. fix from sashan@; OK bluhm@ claudio@ bug found by Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv
2018-10-23 | Make pf compile without DIAGNOSTIC again | Reyk Floeter
OK bluhm@ kn@
2018-09-10 | Instead of calculating the mbuf packet header length here and there, | Alexander Bluhm
put the algorithm into a new function m_calchdrlen(). Also set an uninitialized m_len to 0 in NFS code. OK claudio@
2018-09-10 | During fragment reassembly, mbuf chains with packet headers were | Alexander Bluhm
created. Add a new function m_removehdr() to convert packet header mbufs within the chain to regular mbufs. Assert that the mbuf at the beginning of the chain has a packet header. found by Maxime Villard in NetBSD; from markus@; OK claudio@
2018-09-10 | Limit the fragment entry queue length to 64 per bucket. So we have | Alexander Bluhm
a global limit of 1024 fragments, but it is fine grained to the region of the packet. Smaller packets may have fewer fragments. This costs another 16 bytes of memory per reassembly and divides the worst case for searching by 8. requested by claudio@; OK sashan@ claudio@
2018-09-08 | Split the pf(4) fragment reassembly queue into smaller parts. | Alexander Bluhm
Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. discussed with claudio@ and sashan@; OK sashan@
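The offset-to-entry-point mapping described in this commit can be sketched as follows. This is a minimal illustration that assumes the 16 entry points cover equal-sized regions of the 64 KiB offset space; the names FRAG_ENTRY_POINTS and frag_entry_index() are hypothetical and not taken from pf_norm.c.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_ENTRY_POINTS	16	/* entry points per reassembly (assumed name) */

/*
 * Map a fragment offset (0..65535 bytes) to one of the 16 entry
 * points, assuming each one covers an equal 4096-byte region.
 */
static unsigned int
frag_entry_index(uint16_t off)
{
	unsigned int idx = off / (0x10000 / FRAG_ENTRY_POINTS);

	assert(idx < FRAG_ENTRY_POINTS);
	return idx;
}
```

A lookup would then start walking the fragment list at the remembered entry for that index rather than at the list head.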
2018-09-04 | Forgot to rename pf_frent_holes() prototype in previous commit. | Alexander Bluhm
2018-09-04 | Avoid traversing the list of fragment entries to check whether the | Alexander Bluhm
pf(4) reassembly is complete. Instead count the holes that are created when inserting a fragment. If there are no holes left, the fragments are continuous. idea from claudio@; OK claudio@ sashan@
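The hole-counting idea from this commit might look roughly like the sketch below. It is a userland illustration, not the pf code: it assumes non-overlapping fragments kept in a small sorted array, and struct frag, struct reass, frag_holes() and reass_insert() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS	64	/* hypothetical cap for this sketch */

struct frag {
	uint16_t off;	/* payload offset within the packet */
	uint16_t len;	/* payload length */
	bool mff;	/* more-fragments flag */
};

struct reass {
	struct frag f[MAX_FRAGS];	/* sorted by off, no overlaps */
	size_t n;
	int holes;			/* gaps left; 0 means complete */
};

static void
reass_init(struct reass *r)
{
	r->n = 0;
	r->holes = 1;	/* an empty packet is one big hole */
}

/* Change in hole count caused by placing fr between prev and next. */
static int
frag_holes(const struct frag *prev, const struct frag *fr,
    const struct frag *next)
{
	int holes = 1;	/* start by assuming fr splits a hole in two */

	if (prev == NULL) {
		if (fr->off == 0)
			holes--;	/* no gap before the first byte */
	} else if (fr->off == prev->off + prev->len)
		holes--;		/* touches its predecessor */

	if (next == NULL) {
		if (!fr->mff)
			holes--;	/* last fragment closes the tail */
	} else if (next->off == fr->off + fr->len)
		holes--;		/* touches its successor */
	return holes;
}

/* Insert fr in offset order; returns true once no holes remain. */
static bool
reass_insert(struct reass *r, struct frag fr)
{
	size_t i, pos = 0;

	assert(r->n < MAX_FRAGS);
	while (pos < r->n && r->f[pos].off < fr.off)
		pos++;
	for (i = r->n; i > pos; i--)
		r->f[i] = r->f[i - 1];
	r->f[pos] = fr;
	r->n++;

	r->holes += frag_holes(pos > 0 ? &r->f[pos - 1] : NULL, &r->f[pos],
	    pos + 1 < r->n ? &r->f[pos + 1] : NULL);
	return r->holes == 0;
}
```

The completeness check is a single integer comparison after each insert, so no pass over the stored fragments is needed.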
2018-06-18 | Refactor the six ways to find TCP options into one new function. As a result: | Richard Procter
- MSS and WSCALE option candidates must now meet their min type length.
- 'max-mss' is now more tolerant of malformed option lists.
These changes were immaterial to the live traffic I've examined. OK sashan@ mpi@
2018-02-06 | some finger muscle workout: | Henning Brauer
bzero -> memset and (very few) bcopy -> memcpy/memmove
2017-06-26 | Fragments for a single connection (a combination of proto,src,dst,af) | Alexander Bluhm
may easily reuse the fragment id as it is only 16 bit for IPv4. To avoid that pf reassembles them into the wrong packet, throw away stale fragments. With the default timeout this happens after 12,000 newer fragments have been seen. from markus@; OK sashan@
2017-06-24 | To avoid packet loss due to reuse of the 16 bit IPv4 fragment id, | Alexander Bluhm
we need suitable data structures. Organize the pf fragments with two red-black trees. One is holding the address and protocol information and the other has only the fragment id. This will allow to drop fragments for specific connections more aggressively. from markus@; OK sashan@
2017-06-19 | When dealing with mbuf pointers passed down as function parameters, | Alexander Bluhm
bugs could easily result in use-after-free or double free. Introduce m_freemp() which automatically resets the pointer before freeing it. So we have less dangling pointers in the kernel. OK krw@ mpi@ claudio@
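The pointer-resetting pattern behind m_freemp() can be illustrated in userland as below. This is only a sketch of the idea under stated assumptions: struct buf, buf_freem() and buf_freemp() are hypothetical stand-ins built on malloc/free, not the kernel mbuf API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel mbuf chain. */
struct buf {
	struct buf *next;
};

/* Free every buffer in the chain; a NULL chain is a no-op. */
static void
buf_freem(struct buf *b)
{
	while (b != NULL) {
		struct buf *n = b->next;

		free(b);
		b = n;
	}
}

/* Clear the caller's pointer first, then free what it pointed to. */
static void
buf_freemp(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	buf_freem(b);
}
```

Because the caller's pointer is NULL afterwards, an accidental second free through the same pointer becomes a harmless no-op instead of a double free.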
2017-06-05 | - let's add PF_LOCK() | Alexandr Nedvedicky
to enable PF_LOCK(), you must add 'option WITH_PF_LOCK' to your kernel configuration. The code does not do much currently it's just the very small step towards MP. O.K. henning@, mikeb@, mpi@
2017-05-15 | Enable the NET_LOCK(), take 3. | Martin Pieuchot
Recursions are still marked as XXXSMP. ok deraadt@, bluhm@
2017-04-23 | Some of the LOG_NOTICE messages from PF were seen in normal operations | Stuart Henderson
with certain rulesets and excessively noisy; move them to LOG_INFO (which was previously unused). ok benno@
2017-03-17 | Revert the NET_LOCK() and bring back pf's contention lock for release. | Martin Pieuchot
For the moment the NET_LOCK() is always taken by threads running under KERNEL_LOCK(). That means it doesn't buy us anything except a possible deadlock that we did not spot. So make sure this doesn't happen, we'll have plenty of time in the next release cycle to stress test it. ok visa@
2017-01-30 | removes the pf_consistency_lock and protects the users with | Sebastian Benoit
NET_LOCK(). pfioctl() will need the NET_LOCK() anyway. So better keep things simple until we're going to redesign PF for a MP world. fixes the crash reported by Kaya Saman. ok mpi@, bluhm@
2016-12-29 | In pf_refragment6() use the valid route from pf_route6() instead | Alexander Bluhm
of calling rtalloc() again. OK mpi@
2016-12-29 | Use __func__ instead of explicit function name in panic messages. | Alexander Bluhm
2016-12-28 | Fix white spaces. No binary change. | Alexander Bluhm
2016-11-22 | Fold union pf_headers buffer into struct pf_pdesc (enabled by pfvar_priv.h). | Richard Procter
Prevent pf_socket_lookup() reading uninitialised header buffers on fragments. OK bluhm@ sashan@
2016-11-21 | Follow RFC 5722 more strictly when handling overlapping fragments | Alexander Bluhm
in pf. Drop the whole fragment state if IPv6 fragments appear which have invalid length or fragment-offset or more-fragment-bit. In IPv4 they are considered invalid and just dropped like before. Found by Antonios Atlasis; OK sashan@ sthen@
2016-10-26 | Put union pf_headers and struct pf_pdesc into separate header file | Alexander Bluhm
pfvar_priv.h. The pf_headers had to be defined in multiple .c files before. In pfvar.h it would have unknown storage size, this file is included in too many places. The idea is to have a private pf header that is only included in the pf part of the kernel. For now it contains pf_pdesc and pf_headers, it may be extended later. discussion, input and OK henning@ procter@ sashan@
2016-09-27 | roll back turning RB into RBT until i get better at this process. | David Gwynne
2016-09-27 | move pf from the RB macros to the RBT functions. | David Gwynne
2016-09-15 | all pools have their ipl set via pool_setipl, so fold it into pool_init. | David Gwynne
the ioff argument to pool_init() is unused and has been for many years, so this replaces it with an ipl argument. because the ipl will be set on init we no longer need pool_setipl. most of these changes have been done with coccinelle using the spatch below. cocci sucks at formatting code though, so i fixed that by hand. the manpage and subr_pool.c bits i did myself. ok tedu@ jmatthew@

    @ipl@
    expression pp;
    expression ipl;
    expression s, a, o, f, m, p;
    @@
    -pool_init(pp, s, a, o, f, m, p);
    -pool_setipl(pp, ipl);
    +pool_init(pp, s, a, ipl, f, m, p);
2016-09-02 | pool_setipl for pf bits | David Gwynne
ok phessler@ henning@
2016-08-24 | Kill ip6_forward_rt reducing differences between v4 and v6. | Martin Pieuchot
A single forwarding cache is not the answer. The answer is 42... err PF! ok bluhm@
2016-08-17 | Reintroduce 5.3-style checksum modification to preserve end-to-end checksums | procter
when fiddling with packets but without the mess that motivated Henning to remove it. Affects only this one aspect of Henning's checksum work. Also tweak the basic algorithm and supply a correctness argument. OK dlg@ deraadt@ sthen@; no objection henning@
2016-06-15 | Kill nd6_output(), it doesn't do anything since the resolution logic | Martin Pieuchot
has been moved to nd6_resolve(). ok visa@, millert@, florian@, sthen@
2016-06-15 | There's no need to convert values returned by arc4random to the network | Mike Belopuhov
byte order. Spotted by Gleb Smirnoff (glebius@FreeBSD.org), thanks! ok tedu
2016-05-31 | Do not call nd6_output() without route entry argument. | Martin Pieuchot
ok sthen@, bluhm@
2016-05-28 | Backout pf.c r1.972, pf_norm.c r1.184, ok claudio | Stuart Henderson
pf_test calls pf_refragment6 with dst=NULL, which is passed down to rtable_match which attempts to dereference it.
2016-05-24 | Do not call nd6_output() without route entry argument. | Martin Pieuchot
ok bluhm@