path: root/sys/net
Age Commit message [Author]
2020-05-13 bpf(4): separate descriptor non-blocking status from read timeout [cheloha]
If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode and also clobber any read timeout set for the descriptor. The reverse is also true: do BIOCSRTIMEOUT and you'll set a timeout and simultaneously disable non-blocking status. The two are mutually exclusive. This relationship is undocumented and might cause a bug. At the very least it makes reasoning about the code difficult. This patch adds a new member to bpf_d, bd_rnonblock, to store the non-blocking status of the descriptor. The read timeout is still kept in bd_rtout. With this in place, non-blocking status and the read timeout can coexist. Setting one state does not clear the other, and vice versa. Separating the two states also clears the way for changing the bpf(4) read timeout to use the system clock instead of ticks. More on that in a later patch. With insight from dlg@ regarding the purpose of the read timeout. ok dlg@
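The separation described above can be pictured with a small userland model. This is only a sketch: `bd_rnonblock` and `bd_rtout` are the member names given in the commit message, while the struct, the setters, and the read-wait policy are hypothetical and are not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: only the two member names come from the commit. */
struct bpf_d_model {
	bool     bd_rnonblock;	/* non-blocking status of the descriptor */
	uint64_t bd_rtout;	/* read timeout, 0 meaning "no timeout" */
};

/* Models FIONBIO: touches only the non-blocking flag. */
static void
model_set_nonblock(struct bpf_d_model *d, bool on)
{
	d->bd_rnonblock = on;
}

/* Models BIOCSRTIMEOUT: touches only the timeout. */
static void
model_set_rtimeout(struct bpf_d_model *d, uint64_t tout)
{
	d->bd_rtout = tout;
}

enum read_wait { WAIT_NONE, WAIT_TIMED, WAIT_FOREVER };

/* Assumed policy: how a read with no data pending would wait. */
static enum read_wait
model_read_wait(const struct bpf_d_model *d)
{
	if (d->bd_rnonblock)
		return WAIT_NONE;
	return d->bd_rtout != 0 ? WAIT_TIMED : WAIT_FOREVER;
}
```

Because each setter writes its own member, enabling non-blocking mode leaves a previously set timeout in place, and the timeout takes effect again once non-blocking mode is switched off.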
2020-05-13 only pass the IO_NDELAY flag to ifq_deq_sleep as the nbio argument. [David Gwynne]
2020-05-12 Set timeout(9) to refill the receive ring descriptors if the amount of [jan]
descriptors runs below the low watermark. The em(4) firmware seems not to work properly with just a few descriptors in the receive ring. Thus, we use the low watermark as an indicator instead of zero descriptors, which causes deadlocks. ok kettenis@
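The change of trigger can be sketched as a one-line predicate. The helper name and parameters are hypothetical and are not taken from the em(4) driver.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the refill timeout should be armed.
 * avail is the number of descriptors currently in the receive ring and
 * lwm is the low watermark. The earlier trigger described in the commit
 * corresponds to "avail == 0".
 */
static bool
rx_needs_refill_timeout(unsigned int avail, unsigned int lwm)
{
	return avail < lwm;
}
```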
2020-04-23 Add support for automatically moving traffic between rdomains on ipsec(4) [tobhe]
encryption or decryption. This allows us to keep plaintext and encrypted network traffic separated and reduces the attack surface for network sidechannel attacks. The only way to reach the inner rdomain from outside is by successful decryption and integrity verification through the responsible Security Association (SA). The only way for internal traffic to get out is getting encrypted and moved through the outgoing SA. Multiple plaintext rdomains can share the same encrypted rdomain while the unencrypted packets are still kept separate. The encrypted and unencrypted rdomains can have different default routes. The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'. If this differs from 'tdb_rdomain' then the packet is moved to 'tdb_rdomain_post' after IPsec processing. Flows and outgoing IPsec SAs are installed in the plaintext rdomain, incoming IPsec SAs are installed in the encrypted rdomain. IPCOMP SAs are always installed in the plaintext rdomain. They can be viewed with 'route -T X exec ipsecctl -sa' where X is the rdomain ID. As the kernel does not create encX devices automatically when creating rdomains they have to be added by hand with ifconfig for IPsec to work in non-default rdomains. discussed with chris@ and kn@ ok markus@, patrick@
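The rule for moving a packet between rdomains can be written down as a small function. This is a sketch: `tdb_rdomain` and `tdb_rdomain_post` are the attribute names from the commit message, but the struct and the helper are hypothetical and do not reproduce the kernel's tdb structure.

```c
/* Hypothetical model of the two SA attributes named in the commit. */
struct tdb_model {
	unsigned int tdb_rdomain;	/* rdomain the SA is installed in */
	unsigned int tdb_rdomain_post;	/* rdomain after IPsec processing */
};

/*
 * Return the rdomain a packet belongs to once IPsec processing through
 * this SA has finished. cur is the rdomain the packet was in before.
 */
static unsigned int
rdomain_after_ipsec(const struct tdb_model *tdb, unsigned int cur)
{
	if (tdb->tdb_rdomain_post != tdb->tdb_rdomain)
		return tdb->tdb_rdomain_post;
	return cur;
}
```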
2020-04-20 Don't return stack garbage even if it is going to be [Kenneth R Westerback]
ignored. Initialize 'error' to 0. CID 1483380 ok mpi@
2020-04-19 fix insufficient input sanitization in pf_rulecopyin() and pf_pool_copyin() [Alexandr Nedvedicky]
Reported-by: syzbot+d0639632a0affe0a690e@syzkaller.appspotmail.com Reported-by: syzbot+ae5e359d7f82688edd6a@syzkaller.appspotmail.com OK anton@
2020-04-18 Use MHLEN for the space size of mbuf header. This fixes the panic [YASUOKA Masahiko]
when using pppac without pipex. ok dlg
2020-04-15 Do not delete an existing RTF_CACHED entry with the same destination [Martin Pieuchot]
address as the one trying to be inserted. Such an entry must stay in the table as long as its parent route exists. If a code path tries to re-insert a route with the same destination address on the same interface it is a bug. Avoid the "route contains no arp information" problem reported by sthen@ and Laurent Salle. ok claudio@
2020-04-12 Stop processing packets under non-exclusive (read) netlock. [Martin Pieuchot]
Prevent concurrency in the socket layer which is not ready for that. Two recent data corruptions in pfsync(4) and the socket layer pointed out that, at least, tun(4) was incorrectly using NET_RUNLOCK(). Until we find a way in software to avoid future mistakes and to make sure that only the softnet thread and some ioctls are safe to use a read version of the lock, put everything back to the exclusive version. ok stsp@, visa@
2020-04-12 make ifpromisc assert that the caller is holding the NET_LOCK. [David Gwynne]
it needs NET_LOCK because it modifies if_flags and if_pcount. ok visa@
2020-04-12 say if_pcount needs NET_LOCK instead of the kernel lock. [David Gwynne]
if_pcount is only touched in ifpromisc(), and ifpromisc() needs NET_LOCK anyway because it also modifies if_flags. suggested by mpi@ ok visa@
2020-04-12 take NET_LOCK in aggr_clone_destroy() before calling aggr_p_dtor() [David Gwynne]
aggr_p_dtor() calls ifpromisc(), and ifpromisc() callers need to be holding NET_LOCK to make changes to if_flags and if_pcount, and before calling the interface's ioctl to apply the flag change. i found this while reading code with my eyes, and was able to trigger the NET_ASSERT_LOCKED in the vlan_ioctl path. ok visa@
2020-04-12 take NET_LOCK in tpmr_clone_destroy() before calling tpmr_p_dtor() [David Gwynne]
tpmr_p_dtor() calls ifpromisc(), and ifpromisc() callers need to be holding NET_LOCK to make changes to if_flags and if_pcount, and before calling the interface's ioctl to apply the flag change. found by hrvoje popovski who was testing tpmr with vlan interfaces. vlan(4) asserts that the net lock is held in its ioctl path, which started this whole bug hunt. ok visa@ (who came up with a similar diff, which hrvoje tested)
2020-04-12 ifpromisc() requires NET_LOCK(), so acquire the lock when changing [Visa Hankala]
promiscuous mode from bridge(4). This fixes a regression of r1.332 of sys/net/if_bridge.c. splassert with bridge(4) and vlan(4) reported by David Hill OK mpi@, dlg@
2020-04-11 log() lines need \n too. [David Gwynne]
2020-04-11 Avoid triggering KASSERT for bogus reason in pfsync_sendout with PFSYNC_DEBUG. [Stefan Sperling]
ok mpi@
2020-04-11 Grab the exclusive NET_LOCK() in the softnet thread. [Martin Pieuchot]
Prevent a data corruption on a UDP receive socket buffer reported by procter@ who triggered it with wireguard-go. The symptoms are underflow of sb_cc/sb_datacc/sb_mcnt. ok visa@
2020-04-11 fix build with PFSYNC_DEBUG by switching a format string from %d to %zd [Stefan Sperling]
2020-04-10 Typo in comment. [Martin Pieuchot]
2020-04-10 Place the 64bit key on the stack instead of malloc(9)in' it in pppx_if_find(). [Martin Pieuchot]
Removing a malloc(9) with M_WAITOK reduces possible context switches which helps when dealing with parallelism issues. From Vitaliy Makkoveev.
2020-04-07 Abstract the head of knote lists. This allows extending the lists, [Visa Hankala]
for example, with locking assertions. OK mpi@, anton@
2020-04-07 Deny to create a pipex session if the session id already exists. [Claudio Jeker]
From Vitaliy Makkoveev OK yasuoka@
2020-04-07 Remove superfluous NULL check from allocation with PR_WAITOK. [Martin Pieuchot]
From Vitaliy Makkoveev
2020-04-06 use LIST_FOREACH_SAFE() instead of manually rolling the loop. [Claudio Jeker]
From Vitaliy Makkoveev
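For readers unfamiliar with the macro, here is a minimal userland sketch of removing entries while iterating with the safe variant from `<sys/queue.h>`. The `session` struct and helper functions are hypothetical and unrelated to the pipex code. The fallback definition is only there because some C libraries ship the header without the `_SAFE` variants.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback following the usual BSD definition, for libcs that lack it. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = LIST_FIRST(head);					\
	    (var) != NULL && ((tvar) = LIST_NEXT(var, field), 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical list element. */
struct session {
	int owner;
	LIST_ENTRY(session) entry;
};
LIST_HEAD(session_list, session);

static int
session_add(struct session_list *head, int owner)
{
	struct session *s = malloc(sizeof(*s));

	if (s == NULL)
		return -1;
	s->owner = owner;
	LIST_INSERT_HEAD(head, s, entry);
	return 0;
}

/* Remove and free every session of one owner; returns how many went. */
static int
sessions_remove_owner(struct session_list *head, int owner)
{
	struct session *s, *next;
	int removed = 0;

	/* next is fetched before the body runs, so freeing s is safe. */
	LIST_FOREACH_SAFE(s, head, entry, next) {
		if (s->owner == owner) {
			LIST_REMOVE(s, entry);
			free(s);
			removed++;
		}
	}
	return removed;
}

static int
sessions_count(struct session_list *head)
{
	struct session *s;
	int n = 0;

	LIST_FOREACH(s, head, entry)
		n++;
	return n;
}
```

A plain `LIST_FOREACH` would read the freed element's next pointer after `free(s)`, which is the mistake a hand-rolled loop has to avoid by saving that pointer itself.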
2020-04-06 Pass struct pipex_iface_context pointer down to pipex ioctl functions. [Claudio Jeker]
This way pppx(4) and pppac(4) can be further unified. This is an intermediary step that does not introduce any behaviour change. From Vitaliy Makkoveev
2020-04-04 Prevent the destruction of a session owned by another interface. [Martin Pieuchot]
Issue reported by and fix from Vitaliy Makkoveev.
2020-04-01 Disallow session timeout on pppx(4). [Martin Pieuchot]
The timeout code currently assumes that the `session' descriptor it deals with is independently allocated. This isn't true for pppx(4) and results in memory corruption. So disable the feature until the code is fixed. Bug reported and fix provided by Vitaliy Makkoveev.
2020-03-26 Unify #ifdef guarding code to remove PPTP and L2TP sessions. [Martin Pieuchot]
This makes a pattern emerge that should help when starting to protect the global `session' list with something other than the KERNEL_LOCK(). from Vitaliy Makkoveev.
2020-03-25 Grab the NET_LOCK() before calling pipex_iface_stop(). [Martin Pieuchot]
This function calls pipex_destroy_session() which requires the lock and pipex_ioctl() already calls it with the NET_LOCK() held. From Vitaliy Makkoveev.
2020-03-24 Remove redundant 'NULL' check for 'rtm'. [tobhe]
CID 1453252 ok claudio@ mpi@
2020-03-21 r1.244 introduced rt_hash() with careful checks of src for NULL at [Kenneth R Westerback]
each dereference. r1.275 added a check at the top of the function, with an immediate "return (-1)" if src == NULL, thus making the repeated checks in the body superfluous. CID 1452932. ok millert@ mpi@
2020-03-18 Plug mem leak in SADB_REGISTER. [Martin Pieuchot]
From Benjamin Baier, ok tobhe@
2020-03-11 properly limit indexing into the aggr_periodic_times array. [David Gwynne]
coverity CID 1486819 pointed out by and ok tobhe@
2020-03-10 The return value of rt_ifa_purge() is ignored, so stop [Kenneth R Westerback]
returning a (possibly uninitialized) value. CID 1483466. ok millert@
2020-03-10 Properly exit loop at end of hooks TAILQ. [tobhe]
Feedback from and ok dlg@ ok kn@ todd@
2020-03-10 Make sure return value 'error' is initialized to '0'. [tobhe]
ok dlg@ deraadt@
2020-02-20 Replace field f_isfd with field f_flags in struct filterops to allow [Visa Hankala]
adding more filter properties without cluttering the struct. OK mpi@, anton@
2020-02-18 pppx(4): rwsleep(9) -> rwsleep_nsec(9); ok claudio@ [cheloha]
2020-02-18 Cleanup <sys/kthread.h> and <sys/proc.h> includes. [Martin Pieuchot]
Do not include <sys/kthread.h> where it is not needed and stop including <sys/proc.h> in it. ok visa@, anton@
2020-02-15 Remove needless #ifdef. [YASUOKA Masahiko]
Diff from Jan Stary. ok claudio
2020-02-14 Push the KERNEL_LOCK() inside pgsigio() and selwakeup(). [Martin Pieuchot]
The 3 subsystems: signal, poll/select and kqueue can now be addressed separately. Note that bpf(4) and audio(4) currently delay the wakeups to a separate context in order to respect the KERNEL_LOCK() requirement. Sockets (UDP, TCP) and pipes spin to grab the lock for the same reason. ok anton@, visa@
2020-02-01 replace vlan instance SRP lists with SMR_SLISTs [Jonathan Matthew]
As vlan instances obtained from the lists are passed to if_vinput(), which may sleep (with PF locking enabled), we only traverse the vlan lists inside the SMR critical section, and keep the existing reference counting in place. ok visa@ sashan@
2020-01-31 actually set the link state down when the /dev entry is closed. [David Gwynne]
this means a route message is sent when the interface is closed and goes down, but also causes another route message to be sent when the interface comes up on the next open. this is important for things like ospfd and the ospfd regress test because they want to know when link comes up. the regression was pointed out by bluhm, who also helped me isolate the problem.
2020-01-30 device poll handlers should return POLL flags, not errnos. [David Gwynne]
this restores returning POLLERR when the device is gone. ENXIO doesn't make much sense as part of a pollfd revents field.
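The distinction can be illustrated with a hypothetical userland model of a poll handler. The struct and function are invented for illustration and are not the driver code. Only `POLLERR` and `POLLIN` from `<poll.h>` are real.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical device state. */
struct dev_model {
	bool gone;	/* device has been detached */
	bool readable;	/* data is waiting to be read */
};

/*
 * The return value ends up in a pollfd revents field, so it must be a
 * mask of POLL* flags. Returning an errno such as ENXIO here would be
 * read by the caller as some unrelated combination of flag bits.
 */
static int
dev_model_poll(const struct dev_model *dev, int events)
{
	int revents = 0;

	if (dev->gone)
		return POLLERR;
	if (dev->readable)
		revents |= events & POLLIN;
	return revents;
}
```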
2020-01-28 Simplify filterops routines where klist_invalidate() is used. [Visa Hankala]
klist_invalidate() detaches knotes from the list and rewires them synchronously so that the original filterops routines do not get called after the invalidation. OK anton@, mpi@
2020-01-27 update bpf_iflist in bpfsdetach instead of bpfdetach as some drivers [Joshua Stein]
like USB only use the former and bpf_iflist was otherwise retaining references to a freed bpf_if. ok sashan
2020-01-27 add some comments to tun_destroy, and try to be a bit more paranoid. [David Gwynne]
2020-01-25 move the SMR_LIST_REMOVE and smr_barrier up in tun_clone_destroy. [David Gwynne]
without this the tun_softc is still available on the list for the syscalls to get to, even though the device is dead and should no longer be referenced. by leaving it in the list after the refcnt_finalize, it could still be found and used. found by claudio@ jmatthew@ agrees with the change
2020-01-25 tweak sleeping for an mbuf so it's more mpsafe. [David Gwynne]
the stack puts an mbuf on the tun ifq, and ifqs protect themselves with a mutex. rather than invent another lock that tun can wrap these ifq ops with and also coordinate its conditionals (reading and dying) with, try and reuse the ifq mtx for the tun stuff too. because ifqs are more special than tun, this adds a special ifq_deq_sleep to ifq code that tun can call. tun just passes the reading and dying variables to ifq to check, but the tricky stuff about ifqs is kept in the right place. with this, tun_dev_read should be callable without the kernel lock.
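The idea of letting the queue's own mutex also guard the caller's conditions can be sketched in userland with pthreads. Everything here is hypothetical: the `pktq` type, the function names, the parameters, and the error codes are assumptions made for illustration and do not reproduce the real ifq_deq_sleep() interface.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define PKTQ_CAP 8

/* Hypothetical queue that protects itself with its own mutex. */
struct pktq {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		items[PKTQ_CAP];
	size_t		head, len;
};

static int
pktq_enq(struct pktq *q, int item)
{
	int error = 0;

	pthread_mutex_lock(&q->mtx);
	if (q->len == PKTQ_CAP) {
		error = ENOBUFS;
	} else {
		q->items[(q->head + q->len) % PKTQ_CAP] = item;
		q->len++;
		pthread_cond_signal(&q->cv);	/* wake a sleeping reader */
	}
	pthread_mutex_unlock(&q->mtx);
	return error;
}

/*
 * Dequeue one item, sleeping while the queue is empty. The caller hands
 * in its own "dying" flag, which is checked under the queue mutex, so no
 * second lock is needed. Whoever sets *dying is expected to do so with
 * q->mtx held and then signal q->cv.
 */
static int
pktq_deq_sleep(struct pktq *q, int *item, bool nbio, const bool *dying)
{
	int error = 0;

	pthread_mutex_lock(&q->mtx);
	while (q->len == 0) {
		if (*dying) {
			error = ENXIO;		/* assumed error code */
			break;
		}
		if (nbio) {
			error = EWOULDBLOCK;	/* non-blocking: do not sleep */
			break;
		}
		pthread_cond_wait(&q->cv, &q->mtx);
	}
	if (error == 0) {
		*item = q->items[q->head];
		q->head = (q->head + 1) % PKTQ_CAP;
		q->len--;
	}
	pthread_mutex_unlock(&q->mtx);
	return error;
}
```

Keeping the sleep loop inside the queue code means the caller never has to know how the queue is locked. It only supplies the flags it wants checked before each sleep.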
2020-01-25 use SMRs to find the right tun_softc on syscall entries. [David Gwynne]