summaryrefslogtreecommitdiff
path: root/sys/netinet/ip_divert.c
AgeCommit message (Collapse)Author
2019-02-04Avoid an mbuf double free in the oob soreceive() path. In theAlexander Bluhm
usrreq functions move the mbuf m_freem() logic to the release block instead of distributing it over the switch statement. Then the goto release in the initial check, whether the pcb still exists, will not free the mbuf for the PRU_RCVD, PRU_RVCOOB, PRU_SENSE command. OK claudio@ mpi@ visa@ Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
2018-11-10Do not translate the EACCES error from pf(4) to EHOSTUNREACH anymore.Alexander Bluhm
It also translated a documented send(2) EACCES case erroneously. This was too much magic and always prone to errors. from Jan Klemkow; man page jmc@; OK claudio@
2018-10-04Revert the inpcb table mutex commit. It triggers a witness panicAlexander Bluhm
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-20As a step towards per inpcb or socket locks, remove the net lockAlexander Bluhm
for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-04-24Push NET_LOCK down in the default ifioctl case.Paul Irofti
For the PRU_CONTROL bit the NET_LOCK surrounds in[6]_control() and on the ENOTSUPP case we guard the driver if_ioctl functions. OK mpi@
2017-11-02Move PRU_DETACH out of pr_usrreq into per proto pr_detachFlorian Obser
functions to pave way for more fine grained locking. Suggested by, comments & OK mpi
2017-10-09Reduces the scope of the NET_LOCK() in sysctl(2) path.Martin Pieuchot
Exposes per-CPU counters to real parrallelism. ok visa@, bluhm@, jca@
2017-10-06Unfortunately I removed too much in my previous commit and brokeAlexander Bluhm
divert-packet. Bring back the loop over the global list to find the divert socket.
2017-10-06Kill the divert-packet socket option IP_DIVERTFL to filter packets.Alexander Bluhm
It used a loop over the global list divbtable that would be hard to make MP safe. The port net/dnsfilter does not work without this, it should be converted to divert-to. Neither other ports nor base use this filter feature. ports checked by sthen@; OK mpi@ benno@
2017-09-06Replace the call to ifa_ifwithaddr() in divert6_output() with aAlexander Bluhm
route lookup to make it MP safe. Only set the mbuf header fields that are needed. Validate the name input. Also use the same variables in IPv4 and IPv6 functions and avoid unneccessary initialization. OK mpi@
2017-09-06Replace the call to ifa_ifwithaddr() in divert_output() with a routeAlexander Bluhm
lookup to make it MP safe. Only set the mbuf header fields that are needed. Validate the name input. OK mpi@
2017-09-05Replace NET_ASSERT_LOCKED() by soassertlocked() in *_usrreq().Martin Pieuchot
Not all of them need the NET_LOCK(). ok bluhm@
2017-07-27Grab the KERNEL_LOCK() before calling sorwakeup().Martin Pieuchot
In the forwarding path, pf_test() is executed w/o KERNEL_LOCK() and in case of divert end up calling sowakup(). However selwakup() and csignal() are not yet ready to be executed w/o KERNEL_LOCK(). ok bluhm@
2017-06-26Assert that the corresponding socket is locked when manipulating socketMartin Pieuchot
buffers. This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not necessarilly MP-safe. All the fields of 'struct socket' aren't protected. Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD. Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@
2017-05-30Introduce ipv{4,6}_input(), two wrappers around IP queues.Martin Pieuchot
This will help transitionning to an un-KERNEL_LOCK()ed IP forwarding path. Disucssed with bluhm@, ok claudio@
2017-04-05When building counter memory in preparation to copy to userland, alwaysTheo de Raadt
zero the buffers first. All the current objects appear to be safe, however future changes might introduce structure pads. Discussed with guenther, ok bluhm
2017-03-13Move PRU_ATTACH out of the pr_usrreq functions into pr_attach.Claudio Jeker
Attach is quite a different thing to the other PRU functions and this should make locking a bit simpler. This also removes the ugly hack on how proto was passed to the attach function. OK bluhm@ and mpi@ on a previous version
2017-02-09percpu counters for divert(4) statsJeremie Courreges-Anglas
ok dlg@
2017-01-29Change the IPv4 pr_input function to the way IPv6 is implemented,Alexander Bluhm
to get rid of struct ip6protosw and some wrapper functions. It is more consistent to have less different structures. The divert_input functions cannot be called anyway, so remove them. OK visa@ mpi@
2017-01-25Since raw_input() and route_input() are gone from pr_input, we canAlexander Bluhm
make the variable parameters of the protocol input functions fixed. Also add the proto to make it similar to IPv6. OK mpi@ guenther@ millert@
2016-12-19Introduce the NET_LOCK() a rwlock used to serialize accesses to the partsMartin Pieuchot
of the network stack that are not yet ready to be executed in parallel or where new sleeping points are not possible. This first pass replace all the entry points leading to ip_output(). This is done to not introduce new sleeping points when trying to acquire ART's write lock, needed when a new L2 entry is created via the RT_RESOLVE. Inputs from and ok bluhm@, ok dlg@
2016-11-21Enforce that pr_usrreq functions are called at IPL_SOFTNET.Martin Pieuchot
This will allow us to keep locking simple as soon as we trade splsoftnet() for a rwlock. ok bluhm@, claudio@
2016-03-07Sync no-argument function declaration and definition by adding (void).Christian Weisgerber
ok mpi@ millert@
2015-09-09if_put after if_getDavid Gwynne
ok mpi@
2015-09-01Replace sockaddr casts with the proper satosin(), ... calls.Alexander Bluhm
From David Hill; OK mpi@; tested kspillner@; tweaks bluhm@
2015-08-14Replace sockaddr casts with the proper satosin() or satosin6() calls.Alexander Bluhm
From David Hill; OK mpi@
2015-07-15m_freem() can handle NULL, do not check for this condition beforehands.Theo de Raadt
ok stsp mpi
2015-06-16Store a unique ID, an interface index, rather than a pointer to theMartin Pieuchot
receiving interface in the packet header of every mbuf. The interface pointer should now be retrieved when necessary with if_get(). If a NULL pointer is returned by if_get(), the interface has probably been destroy/removed and the mbuf should be freed. Such mechanism will simplify garbage collection of mbufs and limit problems with dangling ifp pointers. Tested by jmatthew@ and krw@, discussed with many. ok mikeb@, bluhm@, dlg@
2015-04-10replace the use of ifqueues for most input queues serviced by netisrDavid Gwynne
with niqueues. this change is so big because there's a lot of code that takes pointers to different input queues (eg, ether_input picks between ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through to code to enqueue packets against the pointer. if i changed only one of the input queues id have to add sepearate code paths, one for ifqueues and one for niqueues in each of these places by flipping all these input queues at once i can keep the currently common code common. testing by mpi@ sthen@ and rafael zalamena ok mpi@ sthen@ claudio@ henning@
2015-01-24Userland (base & ports) was adapted to always include <netinet/in.h>Theo de Raadt
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be cleaned up next. Some sockaddr_union steps make it into here as well. ok naddy
2014-12-05Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.Martin Pieuchot
ok mikeb@, krw@, bluhm@, tedu@
2014-09-08remove uneeded route.h includesJonathan Gray
ok miod@ mpi@
2014-08-10Fix the length check for reinjected ICMP packets: sizeof(struct icmp) isLawrence Teo
28 but an ICMP packet can be as small as 8 bytes (e.g. an ICMP echo request packet with no payload), so check against ICMP_MINLEN instead. Prior to this fix, divert(4) would erroneously discard valid ICMP packets that are shorter than 20 bytes. ICMPv6 is not affected, so this change applies to ICMP over IPv4 only. ok florian@ henning@
2014-08-10Rename p_hdrlen to min_hdrlen to better reflect its purpose.Lawrence Teo
No object file change ok florian@ henning@
2014-07-22Fewer <netinet/in_systm.h> !Martin Pieuchot
2014-07-12Remove the redundant csum_flag variable and just set the checksum flagLawrence Teo
in the pkthdr directly. ok henning@
2014-07-12Protocol checksums have been recalculated on reinjection for a whileLawrence Teo
now, so there is no need to calculate them before sending them to userspace. ok henning@
2014-07-12Implement checksum offload for divert(4): simply set the checksum flagLawrence Teo
and let the stack take care of the checksums for reinjected outbound packets. Reinjected inbound packets will continue to have their checksums calculated manually but we can now take advantage of in_proto_cksum_out and in6_proto_cksum_out to streamline the way their checksums are done. help from florian@ and henning@, feedback from naddy@ ok florian@ henning@
2014-07-10Simplify the way divert(4) sends packets to userspace: Instead ofLawrence Teo
unnecessarily allocating an mbuf tag to store the divert port, just pass the divert port directly to divert_packet() or divert6_packet() as an argument. includes a style fix pointed out by bluhm@ ok bluhm@ henning@ reyk@
2014-04-23No need for vargs here.Florian Obser
While there move declaration of divert{,6}_output() to .c as it's a private function. Also switch first two args to make it more like similar functions (both suggested by mpi@). Input/OK mpi@, OK lteo@
2014-04-21ip_output() using varargs always struck me as bizarre, esp since it's onlyHenning Brauer
ever used to pass on uint32 (for ipsec). stop that madness and just pass the uint32, 0 in all cases but the two that pass the ipsec flowinfo. ok deraadt reyk guenther
2014-04-14"struct pkthdr" holds a routing table ID, not a routing domain one.Martin Pieuchot
Avoid the confusion by using an appropriate name for the variable. Note that since routing domain IDs are a subset of the set of routing table IDs, the following idiom is correct: rtableid = rdomain But to get the routing domain ID corresponding to a given routing table ID, you must call rtable_l2(9). claudio@ likes it, ok mikeb@
2014-04-07Retire kernel support for SO_DONTROUTE, this time without breakingMartin Pieuchot
localhost connections. The plan is to always use the routing table for addresses and routes resolutions, so there is no future for an option that wants to bypass it. This option has never been implemented for IPv6 anyway, so let's just remove the IPv4 bits that you weren't aware of. Tested a least by lteo@, guenther@ and chrisz@, ok mikeb@, benno@
2014-03-28revert "Retire kernel support for SO_DONTROUTE" diff, which does bad thingsStuart Henderson
for localhost connections. discussed with deraadt@
2014-03-27Retire kernel support for SO_DONTROUTE, since the plan is to alwaysMartin Pieuchot
use the routing table there's no future for an option that wants to bypass it. This option has never been implemented for IPv6 anyway, so let's just remove the IPv4 bits that you weren't aware of. Tested by florian@, man pages inputs from jmc@, ok benno@
2014-01-09bzero/bcmp -> memset/memcmp. ok matthewTed Unangst
2013-12-20Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQKenneth R Westerback
in the base. Ports fixes to follow shortly for the two ports (gkrellm and net-snmp) affected. ok zhuk@ millert@
2013-11-15Rename the struct pf_divert variable in divert_packet() andLawrence Teo
divert6_packet() from "pd" to "divert" to match the rest of the source. I think "pd" was not a good name for a struct pf_divert because "pd" usually refers to a pf_pdesc. No object file change. OK benno@ bluhm@ henning@
2013-04-08Recalculate the IP and protocol checksums of packets (re)injected viaLawrence Teo
divert(4) sockets. Recalculation of these checksums is necessary because (1) PF no longer updates IP checksums as of pf.c rev 1.731, so translated packets that are diverted to userspace (e.g. divert-packet with nat-to/rdr-to) will have bad IP checksums and will be reinjected with bad IP checksums if the userspace program doesn't correct the checksums; (2) the userspace program may modify the packets, which would invalidate the checksums; and (3) the divert(4) man page states that checksums are supposed to be recalculated on reinjection. This diff has been tested on a public webserver serving both IPv4/IPv6 for more than four weeks. It has also been tested on a firewall with divert-packet and nat-to/rdr-to where it transferred over 60GB of FTP/HTTP/HTTPS/SSH/DNS/ICMP/ICMPv6 data correctly, using IPv4/IPv6 userspace programs that intentionally break the IP and protocol checksums to confirm that recalculation is done correctly on reinjection. IPv6 extension headers were tested with Scapy. Thanks to florian@ for testing the original version of the diff with dnsfilter and Justin Mayes for testing the original version with Snort inline. Thanks also to todd@ for helping me in my search for the cause of this bug. I would especially like to thank blambert@ for reviewing many versions of this diff, and providing guidance and tons of helpful feedback. no objections from florian@ help/ok blambert@, ok henning@
2013-04-02Use macros sotoinpcb() and intotcpcb() instead of casts. Use NULLAlexander Bluhm
instead of 0 for pointers. No binary change. OK mpi@