path: root/sys/net
Age  Commit message  (Author)

2022-07-14  Protect all writers to ifm_cur with a mutex. ifmedia_match() does not return any pointers without lock anymore. OK mvs@ mbuhl@ (Alexander Bluhm)

2022-07-14  Turn pppoe(4) back to kernel lock. We can't predict netlock state within pppoe_start(), so we can't use it for pppoe(4) data protection. Except for the input path, pppoe(4) is always accessed with kernel lock held, so grab it around pppoeintr() too. Interfaces should not use netlock for their data protection. They should rely on kernel lock or implement their own. ok bluhm@ bket@ (Vitaliy Makkoveev)

2022-07-14  Replace tabs by spaces after "#define". No functional changes, just prevents future diffs from being ugly. ok bluhm@ (Vitaliy Makkoveev)

2022-07-12  Use __func__ in interface media debug printf(). (Alexander Bluhm)

2022-07-12  Protect interface media list with a mutex. This is just a start to make media structures MP safe. OK mvs@ (Alexander Bluhm)

2022-07-12  Remove PIPEXCSESSION pipex(4) ioctl(2) command from kernel and man page. A long time ago, a pipex(4) session couldn't be deleted until both pipex(4) input and output queues became empty. Dead sessions were linked to the stack and the `ip_forward' flag was used to prevent packet forwarding. npppd(8) marked such sessions by doing the PIPEXCSESSION ioctl(2) call. But since we started to unlink closed sessions from the stack, this logic became unnecessary. Also, a pipex(4) session can be closed just after the close request. npppd(8) was the only userland program that did the PIPEXCSESSION ioctl(2) call, and we removed it a week ago. It's time to remove the remains. Now the `flags' member of the 'pipex_session' structure has become immutable. ok yasuoka@ (Vitaliy Makkoveev)

2022-07-10  Add missing `pipex_list_mtx' mutex(9) around the all-sessions loop within pipex_ip_output(). The loop over all sessions was reworked to make it possible to drop the lock within it. ok bluhm@ yasuoka@ (Vitaliy Makkoveev)

2022-07-10  if_detach() should wait until concurrent (*if_qstart)() interface start routines have finished. Call ifq_barrier(9) just after we unlinked the dying interface from the stack. From this point it is not accessible by if_get(9) and if_unit(9), and all concurrent threads owning the interface pointer have finished. It is also detached from pseudo drivers like bridge(4). We could only have concurrent (*if_qstart)() handlers running, so wait for them and then continue destruction. Reported and tested by Hrvoje Popovski. ok bluhm@ (Vitaliy Makkoveev)

2022-07-10  Add _cb suffix to callback fields in struct ifmedia. Makes code easier to read and grep as ifm_status was used in both structs ifmediareq and ifmedia with different meaning. OK mvs@ (Alexander Bluhm)

2022-07-09  Fix the error path of the 'SIOCSIFMTU' pppoe_ioctl() case. Return the error value if `error' is set instead of continuing to sppp_ioctl(). ok bluhm@ (Vitaliy Makkoveev)

2022-07-09  Unwrap klist from struct selinfo as this code no longer uses selwakeup(). OK jsg@ (Visa Hankala)

2022-07-05  Remove old poll/select wakeup mechanism. Also remove unneeded seltrue() and selfalse(). OK mpi@ jsg@ (Visa Hankala)

2022-07-02  Remove unused device poll functions. Also remove unneeded includes of <sys/poll.h> and <sys/select.h>. Some addenda from jsg@. OK miod@ mpi@ (Visa Hankala)

2022-06-29  Between the calls to art_match() and SRPL_FIRST() another CPU may remove the route from the list. In rtable_match() check if the route entry is NULL. discussed with mpi@ jmatthew@ claudio@; OK mpi@ (Alexander Bluhm)

2022-06-29  Remove switch(4) remains. ok claudio@ mpi@ (Vitaliy Makkoveev)

2022-06-29  ether_input() is called with shared netlock, but pppoe(4) wants it to be exclusive. Do the pppoe(4) input within the netisr handler with exclusive netlock held and remove the kernel lock hack from ether_input(). This is a step back, but it makes the ether_input() path better than it is now. Tested by Hrvoje Popovski. ok bluhm@ claudio@ (Vitaliy Makkoveev)

2022-06-28  Don't call pipex_rele_session() when `session' is NULL. Reported by Hrvoje Popovski. ok bluhm@ (Vitaliy Makkoveev)

2022-06-28  fix syncookies in conjunction with tcp fast port reuse.
This really pointed out that the place syncookies were hooked in was almost, but not completely, right. The way it was, the special case for tcp fast port reuse in pf_test_state wasn't hit, because the first packet hitting that was the ACK from the peer finishing the 3WHS, and the reconstructed SYN came after. We're now doing pf_find_state (and *only* that) first, then syncookies, then going on so that the old state is thrown away properly and we get a new one with the sequence number modulator set up correctly.
Bonus: -11 lines of code.
tracked down (that took a while) + fixed under contract with Hush Communications Canada; special thanks to Lyndon
ok sashan (Henning Brauer)

2022-06-28  Use refcnt API for struct rtentry instead of hand-crafted atomic operations. OK mvs@ (Alexander Bluhm)

2022-06-28  ifconfig(8) returns "Not supported" if you try to configure tso on an interface that does not support tso. pointed out by bluhm@ OK bluhm@ (Jan Klemkow)

2022-06-28  Introduce `pipexoutq' mbuf(9) queue, and put outgoing pipex(4) related PPPOE packets within. Do (*if_output)() calls within the netisr handler with netlock held. We can't predict netlock state when pipex(4) related (*if_qstart)() handlers are called. This means we can't use netlock within pppac_qstart() and pppx_if_qstart() handlers. ok bluhm@ (Vitaliy Makkoveev)

2022-06-27  Rework the rttimer code. Instead of a global queue and a global timeout, use a per rttimer struct timeout. On enqueue the struct rttimer belongs to the timeout; in case the route is removed before the timer fires, clean up based on the timeout_del() return value. If the timeout is currently running, just clear the rtt_rt pointer and let the timeout handle the cleanup. This should hopefully fix the icmp_pmtu_timeout crashes reported by some people. OK bluhm@ (Claudio Jeker)

2022-06-27  Push the kernel lock down into arpresolve(). We still need it to prevent concurrent access to rt_llinfo from rtrequest_delete(). But the common case, when the MAC address is already known, works without lock. tested by Hrvoje Popovski; OK mvs@ (Alexander Bluhm)

2022-06-27  Fix white space and wrap long lines. (Alexander Bluhm)

2022-06-27  Introduce Large Receive Offloading of TCP segment offloading for ix(4). It is disabled by default. Also add a tso option to ifconfig(8) to enable and disable this feature. ok deraadt (Jan Klemkow)

2022-06-27  Don't copy more than sa_len from the sockaddr to the sysctl / rt msg buffer. In the rt msg buffer the size of the full buffer is calculated first then filled out after allocating the mbuf. In the sysctl code this is not needed since the buffer is already provided. OK mvs@ (Claudio Jeker)

2022-06-26  Mark `pipex_enable' as atomic. We never check `pipex_enable' within (*if_qstart)() and we don't worry that it's not serialized with the rest of the output path. Also we will process already enqueued pipex(4) packets regardless of `pipex_enable' state. Use the local copy of `pipex_enable' within pppx_if_output(), otherwise we lose consistency. pointed and ok by bluhm@ (Vitaliy Makkoveev)

2022-06-26  Don't reset `idle_time' timeout on closed pipex(4) sessions in the packet processing path. Such sessions already reached the time to live timeout, and the garbage collector waits a little before killing them. Otherwise we could make the session's lifetime longer than PIPEX_CLOSE_TIMEOUT. ok bluhm@ (Vitaliy Makkoveev)

2022-06-26  Don't take kernel lock on pipex(4) pppoe input. This extra serialization is not required. In packet processing path we have shared netlock held, but we do read-only access on per session `flags' and `ifindex'. We always modify them from ioctl(2) path with exclusive netlock held. The rest of pipex(4) session is immutable or uses per-session locks. ok bluhm@ (Vitaliy Makkoveev)

2022-06-26  Fix spacing. (Vitaliy Makkoveev)

2022-06-26  Switch walkargs for the buffer size to size_t and change the overflow check to the less awkward w->w_needed <= w->w_given. OK bluhm@ (Claudio Jeker)

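The size_t overflow check from the walkargs commit can be illustrated with a small user-space sketch. The struct below is a hypothetical reduction of the kernel's walk argument that keeps only w_where, w_given and w_needed, and memcpy() stands in for copying out to user space (the kernel would use copyout(9)). It is not the rtsock code, only the shape of the test.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, reduced stand-in for a sysctl walk argument: the two
 * size fields named in the commit plus a destination cursor.
 */
struct walkarg {
	char	*w_where;	/* next free byte in caller's buffer, or NULL */
	size_t	 w_given;	/* total size of the caller's buffer */
	size_t	 w_needed;	/* bytes the full dump needs so far */
};

/*
 * Account for one record and copy it only while everything seen so far
 * still fits. With unsigned size_t counters the test reads directly as
 * "needed <= given". Once one record does not fit, w_needed only grows,
 * so no later record is copied and the buffer never has gaps.
 */
static void
walk_emit(struct walkarg *w, const void *rec, size_t len)
{
	w->w_needed += len;
	if (w->w_where != NULL && w->w_needed <= w->w_given) {
		memcpy(w->w_where, rec, len);
		w->w_where += len;
	}
}
```

In this sketch a caller that finds w_needed > w_given afterwards knows both that its buffer was too small and how large it must be on the next attempt.
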
2022-06-26  The "ifq_set_maxlen(..., 1);" hack we use to enforce that pipex(4) related (*if_qstart)() handlers are always called with netlock held doesn't work anymore with PPPOE sessions. Introduce `pipex_list_mtx' mutex(9) and use it to protect global pipex(4) lists and radix trees. Protect pipex(4) `session' dereference with reference counters, because we could sleep when accessing pipex(4) from the ioctl(2) path, and this is not possible with mutex(9) held. ok bluhm@ (Vitaliy Makkoveev)

2022-06-26  'pipex_mppe' and 'pipex_session' structures have uint16_t bit fields which represent flags. We mix unlocked access to immutable flags with protected access to mutable ones. This may not be MP independent on some architectures, so convert these fields to u_int `flags' variables. ok bluhm@ (Vitaliy Makkoveev)

2022-06-26  Allow waiting during ktable allocation in pf_ioctl. OK bluhm
Reported-by: syzbot+50ea4f33ed5dd9264918@syzkaller.appspotmail.com
Reported-by: syzbot+df65f8b7ee8c0089e885@syzkaller.appspotmail.com (mbuhl)

2022-06-16  pfctl reports existing table as being added. glitch has been spotted and reported by jmc@ OK kn@ (Alexandr Nedvedicky)

2022-06-16  Mark routes sent via sysctl(2) with RTF_DONE like it is done on the route socket. All messages passed are by definition done. This may allow to share more code between sysctl and route socket parsers. OK mpi@ (Claudio Jeker)

2022-06-13  fix logic bug in pf_find_state(). A state in PFTM_PURGE could potentially hide another state on the same state key that is active and we'd incorrectly block the packet. I believe that cannot happen as things are now. ok sashan (Henning Brauer)

2022-06-07  fixes potential memory leak. if_vinput() should always consume packet by either passing it further or releasing it. OK mvs@ (Alexandr Nedvedicky)

2022-06-07  fixes NULL pointer dereference panic triggered by relayd. same panic can be triggered when address table is part of anchor loaded by 'load anchor ... from ..,' statement. pf_find_or_create_ruleset() function called by pfr_add_tables() must receive ruleset name which comes from pre-allocated root table. OK claudio@ dlg@ (Alexandr Nedvedicky)

2022-06-06  Simplify solock() and sounlock(). There is no reason to return a value for the lock operation and to pass a value to the unlock operation. sofree() still needs an extra flag to know if sounlock() should be called or not. But sofree() is called less often and mostly without keeping the lock. OK mpi@ mvs@ (Claudio Jeker)

2022-06-01  callers to pf(4) must continue to run with packet as returned by firewall. OK dlg@ (Alexandr Nedvedicky)

2022-05-23  In pf the kernel panicked if IP options in a packet within ICMP payload were truncated. Drop such packets instead.
Reported-by: syzbot+91abd3aa2fdfe900f9ce@syzkaller.appspotmail.com
OK sashan@ claudio@ (Alexander Bluhm)

2022-05-23  Fix white space. (Alexander Bluhm)

2022-05-18  Remove #ifdef DDB specific includes, added in 1.968 but related code bits removed in 1.970. ok bluhm@ (Miod Vallat)

2022-05-16  pfi_kif_alloc() may be called with M_NOWAIT. Add NULL check to handle malloc(9) failure. from markus@; OK sashan@ (Alexander Bluhm)

2022-05-15  Use strncmp() and IFNAMSIZ for if_xname in veb(4) consistently. OK dlg@ (Alexander Bluhm)

2022-05-15  gcc insists the decl for veb_ports_free also use inline (Theo de Raadt)

2022-05-15  avoid calling if_enqueue from an smr critical section.
claudio@ is right that as a rule of thumb it is a bad idea to call arbitrary code from an smr crit section because the scope of what is called is very hard to keep in your head. in this particular case sashan@ points out that if_enqueue can call vport handlers, which calls if_vinput, which will push a packet into the network stack, which will call pf and try to take an rwlock. you can't sleep in an smr crit section.
SMRs in this situation are protecting references to ports in the list of span and actual ports attached to a veb. when we needed to send an unknown unicast, broadcast, or multicast packet the code would SMR_TAILQ_FOREACH over all the ports, duplicating the mbuf and calling if_enqueue against the port. span port handling is basically the same, but we unconditionally send to them.
this replaces the SMR_TAILQ with maps (arrays) of ports. the veb port map data structure contains a struct refcnt and the number of ports. the forwarding paths use an SMR crit section to get a reference to the map, increase the refcnt, and then leave the smr crit section before iterating over the array of ports in the map. after the iteration it releases the refcnt.
this does add a couple of atomic ops in the forwarding path, but only in the uncommon case (most packets are (should be) to known unicast addresses), and it's only one set of ops for all ports instead of ops per port. the known unicast case follows this pattern too.
reported by Barbaros Bilek on bugs@
fix tested by me and hrvoje popovski
ok claudio@ sashan@ bluhm@ (who also did a lot of the initial analysis) (David Gwynne)

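The port map scheme in the veb(4) commit can be sketched in user space to show why the critical section stays short. This is not the veb(4) code: the names struct port_map, map_take(), map_publish() and flood() are hypothetical, C11 atomics play the role of the kernel's refcnt API, and a pthread mutex stands in for the SMR read section, which has no direct user-space equivalent here. The lock is held only to load the map pointer and take a reference; the per-port work runs outside it, where sleeping would be allowed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical port; "sent" counts packets handed to it in this demo. */
struct port {
	int	id;
	int	sent;
};

/* Snapshot of the port list: refcount, length, and the array itself. */
struct port_map {
	atomic_uint	 refs;
	unsigned int	 nports;
	struct port	*ports[];	/* flexible array member */
};

/* Assumption: this mutex stands in for the SMR read section. */
static pthread_mutex_t	 map_lock = PTHREAD_MUTEX_INITIALIZER;
static struct port_map	*cur_map;

static struct port_map *
map_alloc(struct port **ports, unsigned int n)
{
	struct port_map *m = malloc(sizeof(*m) + n * sizeof(m->ports[0]));

	if (m == NULL)
		return NULL;
	atomic_init(&m->refs, 1);	/* reference owned by cur_map */
	m->nports = n;
	for (unsigned int i = 0; i < n; i++)
		m->ports[i] = ports[i];
	return m;
}

static void
map_rele(struct port_map *m)
{
	/* atomic_fetch_sub() returns the old value; 1 means we were last. */
	if (atomic_fetch_sub(&m->refs, 1) == 1)
		free(m);
}

static struct port_map *
map_take(void)
{
	struct port_map *m;

	pthread_mutex_lock(&map_lock);		/* "enter" read section */
	m = cur_map;
	if (m != NULL)
		atomic_fetch_add(&m->refs, 1);
	pthread_mutex_unlock(&map_lock);	/* "leave" before any work */
	return m;
}

/* Publish a new map; the old one is freed by whoever releases it last. */
static void
map_publish(struct port_map *new_map)
{
	struct port_map *old;

	pthread_mutex_lock(&map_lock);
	old = cur_map;
	cur_map = new_map;
	pthread_mutex_unlock(&map_lock);
	if (old != NULL)
		map_rele(old);
}

/* Flood path: one refcount increment/decrement pair for all ports. */
static void
flood(void)
{
	struct port_map *m = map_take();

	if (m == NULL)
		return;
	for (unsigned int i = 0; i < m->nports; i++)
		m->ports[i]->sent++;	/* would be if_enqueue() in a kernel */
	map_rele(m);
}
```

In this sketch a flood that is still iterating keeps its own snapshot alive after a new map is published, and the atomic cost is one pair of operations per flood regardless of the port count, which is the trade-off the commit message describes.
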
2022-05-14  When receiving a PADO offer, clear stored tags from previous PADO packets. Keeping and combining tags from multiple previous packets could result in a single accumulated reply overrunning mbuf size limits. Also make sure the tag size fields are reset to 0 if allocation fails. Add size check on mbuf cluster allocation and fail if more than MCLBYTES are requested. From NetBSD. tested by naddy@ ok bluhm@ (Tobias Heider)

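The three PADO rules in the pppoe(4) commit (replace old tags on each offer, keep a tag's size at 0 when its allocation fails, refuse replies larger than one cluster) can be sketched as follows. The struct pado_tags fields, the helper names and HYP_MCLBYTES (assumed to be 2048) are hypothetical, chosen for illustration; this is not the pppoe(4) code, and malloc(3) stands in for kernel allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: one mbuf cluster holds 2048 bytes. */
#define HYP_MCLBYTES	2048

/* Hypothetical storage for tags remembered from the chosen PADO. */
struct pado_tags {
	unsigned char	*ac_cookie;
	size_t		 ac_cookie_len;
	unsigned char	*relay_sid;
	size_t		 relay_sid_len;
};

/* Free one stored tag and reset its size so the two never disagree. */
static void
tag_clear(unsigned char **buf, size_t *lenp)
{
	free(*buf);
	*buf = NULL;
	*lenp = 0;
}

static void
tags_clear(struct pado_tags *t)
{
	tag_clear(&t->ac_cookie, &t->ac_cookie_len);
	tag_clear(&t->relay_sid, &t->relay_sid_len);
}

/* Copy a tag (len > 0); on allocation failure the size stays 0. */
static int
tag_store(unsigned char **buf, size_t *lenp, const void *data, size_t len)
{
	*buf = malloc(len);
	if (*buf == NULL) {
		*lenp = 0;
		return -1;
	}
	memcpy(*buf, data, len);
	*lenp = len;
	return 0;
}

/* Each new offer replaces, rather than adds to, earlier tags. */
static int
pado_receive(struct pado_tags *t, const void *cookie, size_t clen,
    const void *sid, size_t slen)
{
	tags_clear(t);
	if (clen > 0 &&
	    tag_store(&t->ac_cookie, &t->ac_cookie_len, cookie, clen) == -1)
		return -1;
	if (slen > 0 &&
	    tag_store(&t->relay_sid, &t->relay_sid_len, sid, slen) == -1)
		return -1;
	return 0;
}

/* Refuse to build a reply that would not fit in one cluster. */
static void *
reply_alloc(size_t len)
{
	if (len > HYP_MCLBYTES)
		return NULL;
	return malloc(len);
}
```

Clearing before storing bounds the stored tags by a single offer, so the reply built from them cannot grow with the number of PADO packets seen.
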
2022-05-10  move memory allocations in pfr_add_tables() out of NET_LOCK()/PF_LOCK() scope. bluhm@ helped a lot to put this diff into shape. OK bluhm@ (Alexandr Nedvedicky)