Age | Commit message (Collapse) | Author |
|
usrreq functions move the mbuf m_freem() logic to the release block
instead of distributing it over the switch statement. Then the
goto release in the initial check, whether the pcb still exists,
will not free the mbuf for the PRU_RCVD, PRU_RVCOOB, PRU_SENSE
command.
OK claudio@ mpi@ visa@
Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
|
|
It also translated a documented send(2) EACCES case erroneously.
This was too much magic and always prone to errors.
from Jan Klemkow; man page jmc@; OK claudio@
|
|
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
|
|
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@
|
|
For the PRU_CONTROL bit the NET_LOCK surrounds in[6]_control() and
on the ENOTSUPP case we guard the driver if_ioctl functions.
OK mpi@
|
|
functions to pave way for more fine grained locking.
Suggested by, comments & OK mpi
|
|
Exposes per-CPU counters to real parrallelism.
ok visa@, bluhm@, jca@
|
|
divert-packet. Bring back the loop over the global list to find
the divert socket.
|
|
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@
|
|
route lookup to make it MP safe. Only set the mbuf header fields
that are needed. Validate the name input. Also use the same
variables in IPv4 and IPv6 functions and avoid unneccessary
initialization.
OK mpi@
|
|
lookup to make it MP safe. Only set the mbuf header fields that
are needed. Validate the name input.
OK mpi@
|
|
Not all of them need the NET_LOCK().
ok bluhm@
|
|
In the forwarding path, pf_test() is executed w/o KERNEL_LOCK() and
in case of divert end up calling sowakup(). However selwakup() and
csignal() are not yet ready to be executed w/o KERNEL_LOCK().
ok bluhm@
|
|
buffers.
This is one step towards unlocking TCP input path. Note that all the
functions asserting for the socket lock are not necessarilly MP-safe.
All the fields of 'struct socket' aren't protected.
Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to
tell when a filter needs to lock the underlying data structures. Logic
and name taken from NetBSD.
Tested by Hrvoje Popovski.
ok claudio@, bluhm@, mikeb@
|
|
This will help transitionning to an un-KERNEL_LOCK()ed IP
forwarding path.
Disucssed with bluhm@, ok claudio@
|
|
zero the buffers first. All the current objects appear to be safe,
however future changes might introduce structure pads.
Discussed with guenther, ok bluhm
|
|
Attach is quite a different thing to the other PRU functions and
this should make locking a bit simpler. This also removes the ugly
hack on how proto was passed to the attach function.
OK bluhm@ and mpi@ on a previous version
|
|
ok dlg@
|
|
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@
|
|
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@
|
|
of the network stack that are not yet ready to be executed in parallel or
where new sleeping points are not possible.
This first pass replace all the entry points leading to ip_output(). This
is done to not introduce new sleeping points when trying to acquire ART's
write lock, needed when a new L2 entry is created via the RT_RESOLVE.
Inputs from and ok bluhm@, ok dlg@
|
|
This will allow us to keep locking simple as soon as we trade
splsoftnet() for a rwlock.
ok bluhm@, claudio@
|
|
ok mpi@ millert@
|
|
ok mpi@
|
|
From David Hill; OK mpi@; tested kspillner@; tweaks bluhm@
|
|
From David Hill; OK mpi@
|
|
ok stsp mpi
|
|
receiving interface in the packet header of every mbuf.
The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroy/removed and the mbuf should be freed.
Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.
Tested by jmatthew@ and krw@, discussed with many.
ok mikeb@, bluhm@, dlg@
|
|
with niqueues.
this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues id have to add sepearate code paths, one
for ifqueues and one for niqueues in each of these places
by flipping all these input queues at once i can keep the currently
common code common.
testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@
|
|
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy
|
|
ok mikeb@, krw@, bluhm@, tedu@
|
|
ok miod@ mpi@
|
|
28 but an ICMP packet can be as small as 8 bytes (e.g. an ICMP echo
request packet with no payload), so check against ICMP_MINLEN instead.
Prior to this fix, divert(4) would erroneously discard valid ICMP
packets that are shorter than 20 bytes.
ICMPv6 is not affected, so this change applies to ICMP over IPv4 only.
ok florian@ henning@
|
|
No object file change
ok florian@ henning@
|
|
|
|
in the pkthdr directly.
ok henning@
|
|
now, so there is no need to calculate them before sending them to
userspace.
ok henning@
|
|
and let the stack take care of the checksums for reinjected outbound
packets.
Reinjected inbound packets will continue to have their checksums
calculated manually but we can now take advantage of in_proto_cksum_out
and in6_proto_cksum_out to streamline the way their checksums are done.
help from florian@ and henning@, feedback from naddy@
ok florian@ henning@
|
|
unnecessarily allocating an mbuf tag to store the divert port, just pass
the divert port directly to divert_packet() or divert6_packet() as an
argument.
includes a style fix pointed out by bluhm@
ok bluhm@ henning@ reyk@
|
|
While there move declaration of divert{,6}_output() to .c as it's a
private function. Also switch first two args to make it more like
similar functions (both suggested by mpi@).
Input/OK mpi@, OK lteo@
|
|
ever used to pass on uint32 (for ipsec). stop that madness and just pass
the uint32, 0 in all cases but the two that pass the ipsec flowinfo.
ok deraadt reyk guenther
|
|
Avoid the confusion by using an appropriate name for the variable.
Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:
rtableid = rdomain
But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).
claudio@ likes it, ok mikeb@
|
|
localhost connections.
The plan is to always use the routing table for addresses and routes
resolutions, so there is no future for an option that wants to bypass
it. This option has never been implemented for IPv6 anyway, so let's
just remove the IPv4 bits that you weren't aware of.
Tested a least by lteo@, guenther@ and chrisz@, ok mikeb@, benno@
|
|
for localhost connections. discussed with deraadt@
|
|
use the routing table there's no future for an option that wants to
bypass it. This option has never been implemented for IPv6 anyway,
so let's just remove the IPv4 bits that you weren't aware of.
Tested by florian@, man pages inputs from jmc@, ok benno@
|
|
|
|
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.
ok zhuk@ millert@
|
|
divert6_packet() from "pd" to "divert" to match the rest of the source.
I think "pd" was not a good name for a struct pf_divert because "pd"
usually refers to a pf_pdesc.
No object file change.
OK benno@ bluhm@ henning@
|
|
divert(4) sockets.
Recalculation of these checksums is necessary because (1) PF no longer
updates IP checksums as of pf.c rev 1.731, so translated packets that
are diverted to userspace (e.g. divert-packet with nat-to/rdr-to) will
have bad IP checksums and will be reinjected with bad IP checksums if
the userspace program doesn't correct the checksums; (2) the userspace
program may modify the packets, which would invalidate the checksums;
and (3) the divert(4) man page states that checksums are supposed to be
recalculated on reinjection.
This diff has been tested on a public webserver serving both IPv4/IPv6
for more than four weeks. It has also been tested on a firewall with
divert-packet and nat-to/rdr-to where it transferred over 60GB of
FTP/HTTP/HTTPS/SSH/DNS/ICMP/ICMPv6 data correctly, using IPv4/IPv6
userspace programs that intentionally break the IP and protocol
checksums to confirm that recalculation is done correctly on
reinjection. IPv6 extension headers were tested with Scapy.
Thanks to florian@ for testing the original version of the diff with
dnsfilter and Justin Mayes for testing the original version with Snort
inline. Thanks also to todd@ for helping me in my search for the cause
of this bug.
I would especially like to thank blambert@ for reviewing many versions
of this diff, and providing guidance and tons of helpful feedback.
no objections from florian@
help/ok blambert@, ok henning@
|
|
instead of 0 for pointers. No binary change.
OK mpi@
|