Age | Commit message | Author |
|
tables on top of a rdomain) but until now our code was a crazy mix so that
it was impossible to correctly use rtables in that case. Additionally pf(4)
only knows about rtables and not about rdomains. This is especially bad when
tracking (possibly conflicting) states in various domains.
This diff fixes all or most of these issues. It adds a lookup function to
get the rdomain id based on a rtable id. Makes pf understand rdomains and
allows pf to move packets between rdomains (it is similar to NAT).
Because pf states now track the rdomain id as well it is necessary to modify
the pfsync wire format. So old and new systems will not sync up.
A lot of help by dlg@, tested by sthen@, jsg@ and probably more
OK dlg@, mpf@, deraadt@
|
|
IP_IPCOMP_LEVEL found by Clement LECIGNE, localhost root exploitable on
userland/kernel shared vm machines (ie. i386, amd64, arm, sparc (but not
sparc64), sh, ...) on OpenBSD 4.3 or older
ok claudio
|
|
`if it is unused nuke it' claudio
|
|
link-route points over the carp interface. (IP-less carpdev)
The decision whether to drop an ARP query is now expressed with
a goto out; rather than a second check later, which prevented
the carpdev case from working. Also add some comments to make
in_arpinput() easier to understand.
OK henning, markus.
|
|
destination of a packet was changed by pf. This allows for some evil
games with rdr-to or nat-to but is mostly needed for better rdomain/rtable
support. This is a first step and more work and cleanup is needed.
Here is a list of what works and what does not (needs a patched pfctl):
pass out rdr-to:
from local rdr-to local addr works (if state tracking on lo0 is done)
from remote rdr-to local addr does NOT work
from local rdr-to remote works
from remote rdr-to remote works
pass in nat-to:
from remote nat-to local addr does NOT work
from remote nat-to non-local addr works
non-local is an IP that is routed to the FW but is not assigned on the FW.
The non-working cases need some magic to correctly rewrite the incoming
packet since the rewriting would happen outbound which is too late.
"time to get it in" deraadt@
|
|
- queue packets from pf(4) to a userspace application
- reinject packets from the application into the kernel stack.
The divert socket can be bound to a special "divert port" and will
receive every packet diverted to that port by pf(4).
The pf syntax is pretty simple, e.g.:
pass on em0 inet proto tcp from any to any port 80 divert-packet port 1
A lot of discussion has happened since my last commit, which resulted
in many changes and improvements.
I would *really* like to thank everyone who took part in the discussion,
especially canacar@ who pointed out the limitations of this approach.
OpenBSD divert(4) is meant to be compatible with software running on
top of FreeBSD's divert sockets even though they are pretty different and will
become even more so with time.
discussed with many, but mainly reyk@ canacar@ deraadt@ dlg@ claudio@ beck@
tested by reyk@ and myself
ok reyk@ claudio@ beck@
manpage help and ok by jmc@
|
|
Sorry.
|
|
- queue packets from pf(4) to a userspace application
- reinject packets from the application into the kernel stack.
The divert socket can be bound to a special "divert port" and will
receive every packet diverted to that port by pf(4).
The pf syntax is pretty simple, e.g.:
pass on em0 inet proto tcp from any to any port 80 divert-packet port 8000
test, bugfix and ok by reyk@
manpage help and ok by jmc@
no objections from many others.
|
|
seems to be causing some kind of memory corruption after several
hours of heavy IPsec traffic. connections start becoming very slow
eventually leading to all IPsec packets being lost. a reboot solves
the issue for several more hours before it appears again.
|
|
no binary change; ok grunk@
|
|
|
|
#if 1
reasonable
#else
bullshit required by some committee
#endif
are enough. theo ok
|
|
|
|
-m_copydata instead of straight bcopy. noticed by damien
-handle the pretty much impossible case that the packet header grows so
much that MHLEN < 68. i bet this had been the least of our worries, in that
case, but code oughta be correct anyway.
ok theo and dlg
|
|
forwarded packet in case ip_output returns an error and we have to quote
some of it back in an icmp error message.
this implementation done from scratch:
place an mbuf on the stack. copy the pkthdr from the forwarded packet and
the first 68 bytes of payload.
if we need to send an icmp error, just m_copym our mbuf-on-the-stack into
a real one that icmp_error can fuck with and eat as it desires.
ok theo dlg
|
|
therefore. Inherit the rdomain through the syncache.
There are some interactions that need some more work (ctlinput) so this
can be improved but is good enough for now.
OK markus@
|
|
recycling an mbuf tag and changing its type. just always get a new one.
theo ok
|
|
ok michele@ claudio@
|
|
some greater care must be taken to ensure the mbuf generated for icmp
errors is a good copy.
|
|
Agreed by mcbride@, sthen@ and henning@
|
|
as it's a void function.
ok claudio@
|
|
ip_output failed and we had to generate an icmp packet. since ip_output
frees the mbuf we give it, we copied the original into a new mbuf. if
ip_output succeeded, we threw the copy away.
the problem with this is that copying the mbuf is about a third of the cost
of ip_forward.
this diff copies the data we might need onto the stack, and only builds the
mbuf for the icmp error if it actually needs it, ie, if ip_output fails.
this gives a noticeable improvement in pps for forwarded traffic.
ok claudio@ markus@ henning@
tested by markus@ and by me in production for several days at work
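
The idea described above can be illustrated outside the kernel. This is a
minimal userspace sketch, not the ip_forward() code: struct packet,
packet_new(), packet_free(), output_packet() and forward_packet() are all
hypothetical stand-ins for mbufs and ip_output(). It only shows the shape of
the change: keep a cheap copy of the bytes that might be quoted on the stack,
and allocate a real buffer only when the output step fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QUOTE_LEN 68	/* bytes of the original packet kept for the error */

/* Hypothetical stand-in for an mbuf: a heap buffer with a length. */
struct packet {
	size_t len;
	unsigned char *data;
};

struct packet *
packet_new(const unsigned char *src, size_t len)
{
	struct packet *p = malloc(sizeof(*p));

	if (p == NULL)
		return NULL;
	p->data = malloc(len ? len : 1);
	if (p->data == NULL) {
		free(p);
		return NULL;
	}
	if (len > 0)
		memcpy(p->data, src, len);
	p->len = len;
	return p;
}

void
packet_free(struct packet *p)
{
	if (p != NULL) {
		free(p->data);
		free(p);
	}
}

/* Stand-in for ip_output(): always consumes the packet; fail simulates an error. */
int
output_packet(struct packet *p, int fail)
{
	packet_free(p);
	return fail ? -1 : 0;
}

/*
 * Forward p. Returns NULL on success (or if the error copy cannot be
 * allocated), otherwise a new packet holding the quoted bytes.
 */
struct packet *
forward_packet(struct packet *p, int fail)
{
	unsigned char quote[QUOTE_LEN];	/* cheap copy on the stack */
	size_t qlen;

	assert(p != NULL);
	qlen = p->len < QUOTE_LEN ? p->len : QUOTE_LEN;
	memcpy(quote, p->data, qlen);

	if (output_packet(p, fail) == 0)
		return NULL;	/* common case: no heap copy was ever made */

	/* Rare case: only now pay for a real buffer. */
	return packet_new(quote, qlen);
}
```

In the success path the only extra work is a 68 byte memcpy onto the stack,
which is the point of the change.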
|
|
Traffic shaping code should not be inside routing code.
If you want to rate-limit use altq instead.
ok claudio@ henning@ dlg@
|
|
|
|
E.g. give up the MASTER status if there's a host with a lower
demote count, even if it has a higher advskew.
At the moment this shouldn't cause any change, but this is a
first step towards the removal of the
"bump the advskew to 240 in case of errors" hack,
without breaking backward compatibility.
OK henning@
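
A rough sketch of the ordering described above, assuming only the two values
named in this message matter. struct carp_adv and peer_preferred() are
hypothetical names, and the real carp(4) election takes more into account
(advbase, for instance), so this is an illustration rather than the driver's
logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what a host advertises. */
struct carp_adv {
	unsigned int demote;	/* demote count, lower is better */
	unsigned int advskew;	/* advertisement skew, lower is better */
};

/*
 * Return nonzero if the peer should be preferred as MASTER over us.
 * The demote count is compared first; advskew only breaks ties.
 */
int
peer_preferred(const struct carp_adv *self, const struct carp_adv *peer)
{
	assert(self != NULL && peer != NULL);
	if (peer->demote != self->demote)
		return peer->demote < self->demote;
	return peer->advskew < self->advskew;
}
```

With this ordering a peer with demote 0 and advskew 100 wins over a host with
demote 1 and advskew 0.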
|
|
changed with a sysctl, so note it in sysctl.conf. v6 needs further
testing following discussions on the tech mailing list; rainer@ points
out possible interactions with neighbour discovery which need to be
investigated first.
"go ahead on the v4 part" deraadt@
|
|
|
|
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@
|
|
is pretty expensive, the more so the more addresses are configured locally,
since we walk a list. when pf is on and we have a state key pointer,
and that state key is linked to another state key, we know for sure this
is not local. when it has a link to a pcb, it certainly goes to the local
codepath.
on a box with 1000 addresses forwarding is 3 times as fast as before. theo ok
|
|
checksum offload over IPv6; ok deraadt@
|
|
and tdb_hash is only used in ip_ipsp.c, so there's no need to declare
it as extern in ip_ipsp.h
ok claudio@ henning@
|
|
tested by Manuel Rodriguez Morales <marodriguez at grupogdt.com>
|
|
ok claudio@
|
|
ok claudio@
|
|
when we check if a hash chain is over 15 long, we would access one past
the end of the array. change the static array size to a define because
it makes this checking easier to verify.
Found by Parfait.
ok deraadt@.
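
The class of bug fixed here can be shown with a small standalone example.
This is not the kernel code: CHAIN_STAT_MAX, chain_stats,
record_chain_length() and chain_count() are hypothetical names, and clamping
long chains into the last slot is an assumption made for the illustration.
Using one define for both the array size and the bound check keeps the two
from drifting apart.

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_STAT_MAX 16	/* hypothetical: valid slots are 0..15 */

static unsigned long chain_stats[CHAIN_STAT_MAX];

/* Count one hash chain of the given length in the histogram. */
void
record_chain_length(size_t len)
{
	/*
	 * Chains longer than the last slot share that slot. Comparing
	 * with > instead of >= here would index one past the end.
	 */
	if (len >= CHAIN_STAT_MAX)
		len = CHAIN_STAT_MAX - 1;
	assert(len < CHAIN_STAT_MAX);
	chain_stats[len]++;
}

/* Read back a histogram slot; out-of-range slots report zero. */
unsigned long
chain_count(size_t len)
{
	return len < CHAIN_STAT_MAX ? chain_stats[len] : 0;
}
```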
|
|
code. In pf rtableid == -1 means don't change the rtableid because
of this rule. So it has to be signed int there. Before the value
is passed from pf to route it is always checked to be >= 0. Change
the type to int in pf and to u_int in netinet and netinet6 to make
the checks work. Otherwise -1 may be used as an array index and
the kernel crashes.
ok henning@
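
A minimal sketch of the type split described above, with made-up names
(RT_TABLE_MAX, rtables, apply_rule_rtable(), rtable_lookup() are not kernel
symbols). The rule side keeps a signed id so that -1 can mean "leave the
table alone"; the lookup side takes an unsigned id, so a single upper-bound
check is enough and a stray -1 can never become a negative array index.

```c
#include <assert.h>
#include <stddef.h>

#define RT_TABLE_MAX 4	/* hypothetical number of routing tables */

static const char *rtables[RT_TABLE_MAX] = {
	"table0", "table1", "table2", "table3"
};

/* Rule side: a negative id means the rule does not change the table. */
unsigned int
apply_rule_rtable(unsigned int current, int rule_rtableid)
{
	if (rule_rtableid < 0)
		return current;
	assert(rule_rtableid >= 0);	/* checked before crossing over */
	return (unsigned int)rule_rtableid;
}

/* Routing side: unsigned id, so one comparison covers every bad value. */
const char *
rtable_lookup(unsigned int id)
{
	if (id >= RT_TABLE_MAX)
		return NULL;
	return rtables[id];
}
```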
|
|
levels. This will allow for platforms where soft interrupt levels do not
map to real hardware interrupt levels to have soft ipl values overlapping
hard ipl values without breaking spl asserts.
|
|
WARNING: THIS BREAKS COMPATIBILITY WITH THE PREVIOUS VERSION OF PFSYNC
this is a new variant of the protocol and a large reworking of the
pfsync code to address some performance issues. the single largest
benefit comes from having multiple pfsync messages of different
types handled in a single packet. pfsync's handling of pf states is
highly optimised now, along with packet parsing and construction.
huggz for beck@ for testing.
huge thanks to mcbride@ for his help during development and for
finding all the bugs during the initial tests.
thanks to peter sutton for letting me get credit for this work.
ok beck@ mcbride@ "good." deraadt@
|
|
update the route specific MTU from the interface (because it could have
changed in between). This only makes sense if we actually have a valid
route but e.g. multicast traffic does no route lookup and so there is no
route at all and we don't need to update anything.
Hit by dlg@'s pfsync rewrite which already found 3 other bugs in the network
stack and slowly makes us wonder how it worked in the first place.
OK mcbride@ dlg@
|
|
being passed down if using HW checksum offload.
From Brad, inspired by NetBSD/FreeBSD. ok markus@
|
|
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.
To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.
tested by todd@, ok hshoexer@, angelos@, todd@
|
|
interfaces and is probably never hit. The other one happens when the
number of packets on the arp hold queue is exceeded. If arpresolve()
returns NULL the mbuf must be on the hold queue or freed.
Fixes the mbuf leak seen by dlg@. Found with dlg@'s insane mbuf leak
diff. OK dlg@
|
|
gets a mac addr for an ip under net.inet.ip.arpqueued.
ok deraadt@
|
|
fixes v6-over-v4 gifs wrt pf chatter about state linking mismatches
ok jsing claudio, tested by Ant La Porte <ant at ukbsd.org>
|
|
ok deraadt@ otto@
|
|
M_ANYCAST6 was only used to signal tcp6_input() that it should drop the
packet and send back icmp error. This can be done in ip6_input() without
the need for a mbuf flag. Gives us back one slot in m_flags for possible
future need. Looked at and some input by naddy@ and henning@. OK dlg@
|
|
arp layer. With a lot of input from deraadt@.
OK dlg@, looks good gollo@ + deraadt@
|
|
address. This cvs commit introduces a queue that buffers a small
burst of packets and resends the packets in correct order when
the ethernet address is resolved. Code written by Armin Wolfermann
<aw@osn.de>.
OK: claudio@ henning@
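
The hold queue idea can be sketched as a small bounded FIFO. This is an
illustration only and not the code from this commit: HOLD_MAX, struct
hold_queue, hold_enqueue() and hold_flush() are hypothetical, packets are
represented by plain ints, and rejecting new packets when the queue is full
is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define HOLD_MAX 4	/* hypothetical size of the burst we are willing to hold */

/* Ring buffer of packets waiting for an address to resolve. */
struct hold_queue {
	int pkts[HOLD_MAX];
	size_t head;	/* index of the oldest packet */
	size_t count;	/* number of packets currently held */
};

/* Hold a packet; returns -1 if the queue is full and the caller must drop it. */
int
hold_enqueue(struct hold_queue *q, int pkt)
{
	assert(q != NULL);
	if (q->count == HOLD_MAX)
		return -1;
	q->pkts[(q->head + q->count) % HOLD_MAX] = pkt;
	q->count++;
	return 0;
}

/* Address resolved: hand back held packets, oldest first. Returns how many. */
size_t
hold_flush(struct hold_queue *q, int *out, size_t outlen)
{
	size_t n = 0;

	assert(q != NULL && out != NULL);
	while (q->count > 0 && n < outlen) {
		out[n++] = q->pkts[q->head];
		q->head = (q->head + 1) % HOLD_MAX;
		q->count--;
	}
	return n;
}
```

Draining from the head is what preserves the original packet order.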
|
|
no carpdev configured.
I don't see how we can run into this at all, but let's
leave this test for a little extra safety.
OK henning@
|
|
ok dlg
|