path: root/sys/netinet
Age | Commit message | Author
2019-04-23 | a first cut at converting some virtual ethernet interfaces to if_vinput | David Gwynne
this lets input processing bypass ifiqs. there's a performance benefit from this, and it will let me tweak the backpressure detection mechanism that ifiqs use without impacting on a stack of virtual interfaces. i've tested all of these except mpw, which i will end up testing soon anyway.
2019-04-22 | In in_cksum() and in6_cksum() convert types to C99 style and make | Alexander Bluhm
both functions consistent. In in_cksum() panic if len is longer than mbuf, but in in6_cksum() do not panic if off and len match exactly to the end of mbuf. OK claudio@
2019-04-05 | In debug mode print TCP flag names to console correctly. | Alexander Bluhm
from Mitchell Krome
2019-02-13 | change rt_ifa_add and rt_ifa_del so they take an rdomain argument. | David Gwynne
this allows mpls interfaces (mpe, mpw) to pass the rdomain they wish the local label to be in, rather than have it implicitly forced to 0 by these functions. right now they'll pass 0, but it will soon be possible to have them rx packets in other rdomains. previously the functions used ifp->if_rdomain for the rdomain. everything other than mpls still passes ifp->if_rdomain. ok mpi@
2019-02-10 | remove the implicit RTF_MPATH flag that rt_ifa_add() sets on new routes. | David Gwynne
MPLS interfaces (ab)use rt_ifa_add for adding the local MPLS label that they listen on for incoming packets, while every other use of rt_ifa_add is for adding addresses on local interfaces. MPLS does this cos the addresses involved are in basically the same shape as ones used for setting up local addresses. It is appropriate for interfaces to want RTF_MPATH on local addresses, but in the MPLS case it means you can have multiple local things listening on the same label, which doesn't actually work. mpe in particular keeps track of in use labels so it can handle collisions, however, mpw does not. It is currently possible to have multiple mpw interfaces on the same local label, and sharing the same label as mpe or possibly normal forwarding labels. Moving the RTF_MPATH flag out of rt_ifa_add means all the callers that still want it need to pass it themselves. The mpe and mpw callers are left alone without the flag, and will now get EEXIST from rt_ifa_add when a label is already in use. ok (and a huge amount of patience and help) mpi@ claudio@ is ok with the idea, but saw a much much earlier solution to the problem
2019-02-06 | Fix a possible mbuf leak in tcp_usrreq(). Make the error handling | Alexander Bluhm
more consistent with the other protocols' usrreq functions. OK visa@ claudio@
2019-02-04 | Avoid an mbuf double free in the oob soreceive() path. In the | Alexander Bluhm
usrreq functions move the mbuf m_freem() logic to the release block instead of distributing it over the switch statement. Then the goto release in the initial check, whether the pcb still exists, will not free the mbuf for the PRU_RCVD, PRU_RCVOOB, PRU_SENSE commands. OK claudio@ mpi@ visa@ Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
2019-01-20 | Refresh arp entries that are about to expire. Once their life time is less | Claudio Jeker
than 1/8 of net.inet.ip.arptimeout the system will send out an arp request about every 30 seconds until either the entry is updated or expired. Not refreshing arp entries will result in packet drop every time an entry expires which is not ideal for important gateway entries. Came up with this after a discussion with deraadt@. OK benno@ deraadt@
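The refresh policy described in this commit can be sketched roughly as follows. This is an illustration in Python, not the kernel's C code: the function name, its parameters, and the way the last request time is tracked are all assumptions made for the example. Only the 1/8 threshold and the 30 second interval come from the commit message.

```python
REFRESH_INTERVAL = 30  # seconds between refresh requests, per the commit message


def should_refresh(now, expire, last_request, arptimeout):
    """Decide whether to send an ARP request for a cached entry.

    Hypothetical helper: `expire` is the absolute expiry time of the entry,
    `last_request` is when a refresh request was last sent (or None), and
    `arptimeout` stands in for net.inet.ip.arptimeout. All times in seconds.
    """
    remaining = expire - now
    if remaining <= 0:
        return False  # entry already expired, nothing left to refresh
    if remaining >= arptimeout / 8:
        return False  # still plenty of lifetime left
    # Inside the refresh window: rate limit to one request per interval.
    return last_request is None or now - last_request >= REFRESH_INTERVAL
```

With an illustrative `arptimeout` of 1200 seconds, the refresh window opens once fewer than 150 seconds of lifetime remain.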
2019-01-18 | Bring back the ip_pcbopts() refactor. Pad the option buffer and therefore | Claudio Jeker
the mbuf to the next word length as it is required by the standard. Also use the correct offset from the input mbuf. OK visa@, input & OK bluhm@
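Padding to the next word length amounts to rounding the option length up to a multiple of four bytes. A minimal sketch in Python, assuming zero bytes as padding; `pad_to_word` is a hypothetical name and does not mirror the actual ip_pcbopts() code:

```python
def pad_to_word(options: bytes, word: int = 4) -> bytes:
    """Pad an IP option buffer with zero bytes up to a multiple of `word`.

    Hypothetical helper for illustration only.
    """
    padded_len = (len(options) + word - 1) // word * word  # round up
    return options + b"\x00" * (padded_len - len(options))
```

A 5 byte option buffer grows to 8 bytes, while a buffer that is already a multiple of 4 bytes is returned unchanged.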
2019-01-18 | Revert Rev 1.351, the change is not quite right yet. | Claudio Jeker
2019-01-08 | Botched up an if conditional in the last commit. The IP length needs to | Claudio Jeker
be bigger than the IP header len to be valid. With this I can traceroute again.
2019-01-07 | Validate the version, and all length fields of IP packets passed to a raw socket | Claudio Jeker
with INP_HDRINCL. There is no reason to allow badly constructed packets through our network stack. Especially since they may trigger diagnostic checks further down the stack. Now EINVAL is returned instead which was already used for some checks that happened before. OK florian@ Reported-by: syzbot+0361ed02deed123667cb@syzkaller.appspotmail.com
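The kind of validation described here might look like the sketch below. It is written in Python for illustration and is not the kernel code: the function name, the exact set of conditions, and the assumption that the total length field is in network byte order are mine, and the real checks may differ. Where the kernel returns EINVAL, the sketch returns False.

```python
import struct

IPVERSION = 4
MIN_HLEN = 20  # size of an IPv4 header without options


def hdrincl_ok(packet: bytes) -> bool:
    """Check version and length fields of a caller-supplied IPv4 packet.

    Hypothetical helper; the conditions are assumptions for illustration.
    """
    if len(packet) < MIN_HLEN:
        return False  # too short to hold even a minimal header
    version = packet[0] >> 4
    hlen = (packet[0] & 0x0F) * 4  # header length field counts 32-bit words
    (total_len,) = struct.unpack_from("!H", packet, 2)
    if version != IPVERSION:
        return False
    if hlen < MIN_HLEN:
        return False
    if total_len != len(packet):
        return False  # length field must match what was actually passed in
    if total_len < hlen:
        return False  # header cannot be longer than the whole packet
    return True
```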
2019-01-06 | Rewrite ip_pcbopts() to fill a fresh mbuf with the ip options instead | Claudio Jeker
of fiddling with the user supplied mbuf and then copy it at the end. OK visa@
2019-01-03 | Replace a funky 'else switch' construct with something that is equivalent but | Claudio Jeker
a lot easier to read. The if can simply return the error and so the else branch is no longer needed. Input and OK dhill@
2018-12-25 | rework icmp6_error() to be closer to icmp_error() | denis
input & OK mpi@
2018-12-20 | Replace a wrong poor man's m_trailingspace() with the real thing. The mbuf | Claudio Jeker
passed to ip_pcbopts could be a cluster and so the size check is all wrong. found by Greg Steuck; OK bluhm@ Reported-by: syzbot+c2543ae6b6692a5843e3@syzkaller.appspotmail.com
2018-12-17 | Switch from timeout_add with tvtohz to just timeout_add_tv. Now this change | Claudio Jeker
will reduce the sleep time by one tick which doesn't matter in the common case. The code never passes a true 0 timeval to timeout_add_tv so the code will always sleep for at least 1 tick which is good enough. OK kn@, florian@, visa@, cheloha@
2018-12-11 | split ether_output into resolution, encapsulation, and output functions | David Gwynne
if if_output can be overridden on ethernet interfaces, it will allow things like vlan to do its packet encapsulation during output before putting the packet directly on the underlying interface for output. this has two benefits. first, it can avoid having ether_output on pseudo interfaces recurse, which makes profiling of the network stack a lot clearer. secondly, and more importantly, it allows pseudo ethernet interface packet encapsulation to be run concurrently by the stack, rather than having packets unnecessarily serialised by an ifq. this diff just splits ether_output up, it doesn't have any interface take advantage of it yet. tweaks and ok claudio@
2018-12-04 | Use m_align() and while there reorder the pkthdr initialisation a bit. | Claudio Jeker
This also makes the IPv4 and IPv6 code more similar. OK phessler@
2018-12-03 | In PRU_DISCONNECT don't fall through into PRU_ABORT since the latter frees | Claudio Jeker
the inpcb apart from the disconnect. Just call soisdisconnected() and clear the inp->inp_faddr since the socket is still valid after a disconnect. Problem found by syzkaller via Greg Steuck OK visa@ Fixes: Reported-by: syzbot+2cd350dfe5c96f6469f2@syzkaller.appspotmail.com Reported-by: syzbot+139ac2d7d3d60162334b@syzkaller.appspotmail.com Reported-by: syzbot+02168317bd0156c13b69@syzkaller.appspotmail.com Reported-by: syzbot+de8d2459ecf4cdc576a1@syzkaller.appspotmail.com
2018-11-30 | MH_ALIGN -> m_align. In revarprequest() set the ph_rtableid so that | Claudio Jeker
the function is doing the same initialisation as arprequest(). OK bluhm@
2018-11-28 | Further cleanup of icmp_do_error. | Claudio Jeker
- Use m_align() since it handles all cases
- Use same rounding logic in the size check as in m_align() so all data will always fit.
- consolidate pkthdr initialisation into one place
- use m_prepend() instead of direct pointer manipulation (including the panic in case an underflow happens).
OK bluhm@
2018-11-19 | Retire dom_rtkeylen from struct domain. Nothing is using this anymore. | Claudio Jeker
It was used by the original patricia tree. OK mpi@
2018-11-14 | provide ip_tos_patch() for setting ip_tos and patching the ipv4 cksum. | David Gwynne
previously the gif code would patch the tos field and not recalc the cksum, which would cause ip input code to drop the packet due to a cksum failure. the ipip code patched ip_tos and unconditionally recalculated the cksum, making it correct, but also wiping out any errors that may have been present before the recalculation. updating the cksum rather than replacing it lets cksum failures still fire. ip_tos_patch() is provided in the ecn code since it's because of ecn propagation that we need to update the tos field. internally it works like pf_patch_8 and pf_cksum_fixup, but since pf is optional it rolls its own code. procter may fix that in the future... ok claudio@
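The difference between updating and replacing the checksum can be sketched with the standard incremental update of the ones' complement sum. The Python below is an illustration only: `ip_checksum` and `tos_patch` are hypothetical names, and the code does not claim to mirror how ip_tos_patch(), pf_patch_8 or pf_cksum_fixup are written.

```python
import struct


def fold(x: int) -> int:
    """Fold carries back into 16 bits (ones' complement addition)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def ip_checksum(header: bytes) -> int:
    """Ones' complement of the 16-bit sum; 0 over a header with a valid cksum."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    return ~fold(sum(words)) & 0xFFFF


def tos_patch(header: bytearray, tos: int) -> None:
    """Set the TOS byte and update, rather than recompute, the checksum.

    Hypothetical helper: only the 16-bit word holding version/hlen and TOS
    changes, so the stored checksum is adjusted by that difference alone.
    """
    old_word = (header[0] << 8) | header[1]
    header[1] = tos
    new_word = (header[0] << 8) | header[1]
    (old_sum,) = struct.unpack_from("!H", header, 10)
    new_sum = ~fold((~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF
    struct.pack_into("!H", header, 10, new_sum)
```

Because only the difference is applied, a header whose checksum was already wrong before the patch is still wrong afterwards, so the failure can still be detected on input. Recomputing from scratch would hide it.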
2018-11-10 | Do not translate the EACCES error from pf(4) to EHOSTUNREACH anymore. | Alexander Bluhm
It also translated a documented send(2) EACCES case erroneously. This was too much magic and always prone to errors. from Jan Klemkow; man page jmc@; OK claudio@
2018-11-09 | M_LEADINGSPACE() and M_TRAILINGSPACE() are just wrappers for | Claudio Jeker
m_leadingspace() and m_trailingspace(). Convert all callers to call the functions directly and remove the defines. OK krw@, mpi@
2018-11-09 | Remove the last few XXX rdomain markers. Even those functions respect the | Claudio Jeker
rdomain now and are therefore rdomain safe. OK mpi@
2018-11-05 | In icmp_input_if() m_pullup the maximum size of required data at the start. | Claudio Jeker
The maximum is ICMP_MINLEN (8) + max IPv4 header size (60) + IPv6 header (40) for the IPv6 over IPv4 transition case. Having up to this amount of data consecutive in an mbuf makes the rest of the code simpler and no more extra m_pullup calls are needed. Only length checks are now required. The maximum size is also big enough for all other ICMP types that don't embed the IP header. This ensures that all data has been m_pullup-ed before calling the ctlinput function which can look that deep into the header. OK bluhm@ markus@
2018-11-05 | Consider the size of IP header when doing the ICMP length overflow | Alexander Bluhm
check. This code was never reached as ICMP length was truncated before, but fix the wrong calculation anyway. OK claudio@
2018-11-05 | Fixup the case where an mbuf cluster is used. Correctly offset the data to | Claudio Jeker
the end of the cluster (there is no M_ALIGN version for clusters so it is hard coded). Also make the sanity check more general by using m_leadingspace. Not a security issue since the cluster code is not reachable, there is enough space in an mbuf. OK bluhm@
2018-11-04 | The change of the sb_mbmax calculation in sbreserve() broke setting | Alexander Bluhm
a fixed socket send buffer size for TCP. tcp_update_sndspace() could overwrite the value as the algorithms were not in sync. OK benno@ claudio@
2018-10-22 | ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@ | cheloha
2018-10-18 | Partial revert of previous. Only the queue(3) stuff should have gone in. | cheloha
2018-10-18 | igmp, struct router_info: use queue(3) | cheloha
In particular, use LIST_* to preserve O(n) removal in rti_delete(). While here, clean up two malloc(9) calls. Suggested by mpi@. ok visa@
2018-10-13 | Expose net.inet.ip.arpq.drops to help debug what's going on when a lot | Florian Obser
of packets are being dropped but none of the other counters are increasing. From Daniel Hokka Zakrisson (daniel AT hozac DOT com), thanks! OK florian, phessler
2018-10-10 | RT_TABLEID_MAX is 255, fix places that assumed that it is less than 255. | Reyk Floeter
rtable 255 is a valid routing table or domain id that wasn't handled by the ip[6]_mroute code or by snmpd. The arrays in the ip[6]_mroute code were off by one and didn't allocate space for rtable 255; snmpd simply ignored rtable 255. All other places in the tree seem to handle RT_TABLEID_MAX correctly. OK florian@ benno@ henning@ deraadt@
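The off-by-one is the usual inclusive-maximum mistake: an array indexed by ids 0 through RT_TABLEID_MAX needs RT_TABLEID_MAX + 1 slots. A small sketch in Python, where `alloc_per_rtable` and `is_valid_rtable` are hypothetical helpers that do not correspond to functions in the tree:

```python
RT_TABLEID_MAX = 255  # highest valid rtable id, per the commit message


def alloc_per_rtable(size):
    """Allocate one slot per routing table (hypothetical helper)."""
    return [None] * size


def is_valid_rtable(rtableid):
    """The maximum is inclusive: id 255 is valid, 256 is not."""
    return 0 <= rtableid <= RT_TABLEID_MAX


undersized = alloc_per_rtable(RT_TABLEID_MAX)   # covers ids 0..254 only
correct = alloc_per_rtable(RT_TABLEID_MAX + 1)  # covers ids 0..255
```

In Python the undersized list raises IndexError for id 255. The equivalent C array access would read or write past the end of the allocation.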
2018-10-04 | Revert the inpcb table mutex commit. It triggers a witness panic | Alexander Bluhm
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-24 | Turn carp_ourether() mp-safe, this is a requirement for taking bridge(4) | Martin Pieuchot
out of the KERNEL_LOCK(). ok visa@, bluhm@
2018-09-20 | As a step towards per inpcb or socket locks, remove the net lock | Alexander Bluhm
for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-09-17 | Do not acknowledge a received ack-only tcp packet that we would drop due to | friehm
PAWS. Otherwise we could trigger a retransmit of the opposite party with another wrong timestamp and produce a loop. I have seen this with a buggy server which messed up tcp timestamps. Suggested by Prof. Jacobson for FreeBSD. ok krw, bluhm, henning, mpi
2018-09-14 | Initialize the TDB to NULL in ipsec_common_input() and | Ricardo Mestre
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus mbuf (NULL) we don't end up trying to dereference the TDB, while being an uninitialized pointer, to increase the drops. Coverity IDs 1473312, 1473313 and 1473317. OK mpi@ visa@
2018-09-14 | In general it is a bad idea to use one random secret for two things. | Alexander Bluhm
The inet PCB uses one hash with local and foreign addresses, and one with local port numbers. Give both hashes separate keys. Also document the struct fields. OK visa@
2018-09-14 | unbreak userland uses of in_pcb.h by including sys/refcnt.h | Jonathan Gray
ok visa@
2018-09-13 | Add reference counting for inet pcb, this will be needed when we | Alexander Bluhm
start locking the socket. An inp can be referenced by the PCB queue and hashes, by a pf mbuf header, or by a pf state key. OK visa@
2018-09-13 | Include the size of IPCOMP header when checking for compression. | Martin Pieuchot
Problem found and analyzed by Romain Gabet, ok markus@
2018-09-11 | Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also | Alexander Bluhm
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap. OK mpi@
2018-09-11 | Make the distribution of in_ and in6_ functions in in_pcb.c and | Alexander Bluhm
in6_pcb.c consistent, to ease comparing the code. Move all inet6 functions to in6_. Bring functions in both source files in same order. Cleanup the include section. Now in_pcb.c is a superset of in6_pcb.c. The latter contains all the special implementations. Just moving around, no code change intended. OK mpi@
2018-09-10 | Remove useless INPCBHASH() macros. Just expand them. | Alexander Bluhm
OK stsp@
2018-09-10 | Instead of calculating the mbuf packet header length here and there, | Alexander Bluhm
put the algorithm into a new function m_calchdrlen(). Also set an uninitialized m_len to 0 in NFS code. OK claudio@
2018-09-10 | During fragment reassembly, mbuf chains with packet headers were | Alexander Bluhm
created. Add a new function m_removehdr() to convert packet header mbufs within the chain to regular mbufs. Assert that the mbuf at the beginning of the chain has a packet header. found by Maxime Villard in NetBSD; from markus@; OK claudio@