path: root/sys/net/rtsock.c
Age | Commit message | Author
2023-04-20 | Call sysctl_source() with shared netlock. It performs read-only access to netlock protected data. ok kn@ bluhm@ | Vitaliy Makkoveev
2023-04-19 | Protect rtable_setsource() and rtable_getsource() with exclusive and shared netlock respectively. OK kn@ mvs@ | Alexander Bluhm
2023-04-18 | Remove kernel lock from ifa_ifwithaddr() and ifa_ifwithdstaddr(). Netlock protects `if_list', `ifa_list' and returned `ifa' dereference, so put netlock assertion within. Please note, rtable_setsource() doesn't destroy the data pointed to by `ar_source'. This is the `ifa_addr' data belonging to `ifa', and exclusive netlock is required to destroy it. So the kernel lock is not required within rt_setsource(). Take netlock in the rt_setsource() caller to make `ifa' dereference safe. Suggestions and ok by bluhm@ | Vitaliy Makkoveev
2023-04-18 | Call sysctl_dumpentry() with shared netlock. It performs read-only access to netlock protected data. Please note, kernel lock is still taken, as required by rtable_getsource() or BFD subsystem. ok kn@ bluhm@ | Vitaliy Makkoveev
2023-04-18 | Call sysctl_iflist() with shared netlock. It performs read-only access to netlock protected data. ok kn@ bluhm@ | Vitaliy Makkoveev
2023-04-18 | Call sysctl_ifnames() with shared netlock. It performs read-only access to netlock protected data. ok kn@ bluhm@ | Vitaliy Makkoveev
2023-01-22 | Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of receive buffer. As it was done for SS_CANTSENDMORE bit, the definitions are kept as is, but now these bits belong to the `sb_state' of receive buffer. `sb_state' is ORed with `so_state' when socket data is exported to userland. ok bluhm@ | Vitaliy Makkoveev
2022-10-17 | Change pru_abort() return type to void and make pru_abort() optional. We have no interest in the pru_abort() return value. We call it only from soabort(), which is a dummy pru_abort() wrapper and has no return value. Only the connection oriented sockets need to implement the (*pru_abort)() handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing code for all others; it is never called. ok guenther@ | Vitaliy Makkoveev
2022-10-03 | System calls should not fail due to temporary memory shortage in malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@ | Alexander Bluhm
2022-09-13 | Change pru_rcvd() return type to void. We have no interest in the pru_rcvd() return value. Drop "pru_rcvd != NULL" check within pru_rcvd() wrapper. We only call it if the socket's protocol has the PR_WANTRCVD flag set. Such sockets are route domain, tcp(4) and unix(4) sockets. ok guenther@ bluhm@ | Vitaliy Makkoveev
2022-09-08 | Rename global ifnet TAILQ. Naming the list like the struct itself makes for awful grepping. Call the global variable "ifnetlist" from now on. There used to be kvm(3) consumers in base picking up this symbol, but those have long been converted to other interfaces. A few potential ports users remain, same deal as sys/net/if_var.h r1.116 "Remove struct ifnet's unused if_switchport member": they get bumped. Previous users pointed out by deraadt. OK bluhm | Klemens Nanni
2022-09-05 | Add missing prototypes for route_attach() and route_detach(). | Vitaliy Makkoveev
2022-09-03 | Move PRU_PEERADDR request to (*pru_peeraddr)(). Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case. Also remove *_usrreq() handlers. ok bluhm@ | Vitaliy Makkoveev
2022-09-03 | Move PRU_SOCKADDR request to (*pru_sockaddr)(). Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability. The key management and route domain sockets return EINVAL error for PRU_SOCKADDR request, so keep this behaviour for a while instead of making the pru_sockaddr handler optional and returning EOPNOTSUPP. ok bluhm@ | Vitaliy Makkoveev
2022-09-02 | Move PRU_CONTROL request to (*pru_control)(). The 'proc *' arg is not used for PRU_CONTROL request, so remove it from pru_control() wrapper. Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for inet6 case. ok guenther@ bluhm@ | Vitaliy Makkoveev
2022-09-01 | Move PRU_CONNECT2 request to (*pru_connect2)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-31 | Move PRU_SENDOOB request to (*pru_sendoob)(). PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To avoid dummy m_freem(9) handlers for all protocols, release passed mbufs in the pru_sendoob() EOPNOTSUPP error path. Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path. ok bluhm@ | Vitaliy Makkoveev
2022-08-29 | Move PRU_RCVOOB request to (*pru_rcvoob)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-29 | Use struct refcnt for interface address reference counting. There was a crash due to use after free of the ifa although it is ref counted. As ifa_refcnt was a simple integer increment, there may be a path where multiple CPUs access it concurrently. So change to struct refcnt which is MP safe and provides dt(4) leak debugging. Link level address for IPsec enc(4) and various MPLS interfaces is special. There ifa is part of struct sc. Use refcount anyway and add a panic to detect use after free. bug report stsp@; OK mvs@ | Alexander Bluhm
2022-08-28 | Move PRU_SENSE request to (*pru_sense)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-28 | Since we have no raw_usrreq anymore, remove pr_output indirection. pfkeyv2 and route can call their output functions directly. OK mvs@ | Alexander Bluhm
2022-08-28 | Move PRU_ABORT request to (*pru_abort)(). We abort only the sockets which are linked to `so_q' or `so_q0' queues of listening socket. Such sockets have no corresponding file descriptor and are not accessed from userland, so PRU_ABORT is used to destroy them on listening socket destruction. Currently all our sockets support PRU_ABORT request, but it is actually required only for tcp(4) and unix(4) sockets, so it should be optional. However, they will be removed with separate diff, and this time PRU_ABORT requests were converted as is. Also, the socket should be destroyed on PRU_ABORT request, but route and key management sockets leave it alive. This was also converted as is, because this wrong code is never called. ok bluhm@ | Vitaliy Makkoveev
2022-08-27 | Move PRU_SEND request to (*pru_send)(). The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9) leak. It was fixed in new gre_send(). The former pfkeyv2_send() was renamed to pfkeyv2_dosend(). ok bluhm@ | Vitaliy Makkoveev
2022-08-26 | Move PRU_RCVD request to (*pru_rcvd)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-22 | Move PRU_SHUTDOWN request to (*pru_shutdown)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-22 | Move PRU_DISCONNECT request to (*pru_disconnect)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-22 | Move PRU_ACCEPT request to (*pru_accept)(). ok bluhm@ | Vitaliy Makkoveev
2022-08-21 | Move PRU_CONNECT request to (*pru_connect)() handler. ok bluhm@ | Vitaliy Makkoveev
2022-08-21 | Move PRU_LISTEN request to (*pru_listen)() handler. ok bluhm@ | Vitaliy Makkoveev
2022-08-20 | Move PRU_BIND request to (*pru_bind)() handler. For the protocols which don't support request, leave handler NULL. Do the NULL check within corresponding pru_() wrapper and return EOPNOTSUPP in such case. This will be done for all upcoming user request handlers. ok bluhm@ guenther@ | Vitaliy Makkoveev
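The NULL-handler convention described in the entry above can be sketched in plain C roughly as follows. This is a minimal userland illustration, not the kernel code: `struct ex_socket`, `struct ex_usrreqs`, `ex_pru_bind()` and `ex_bind_ok()` are hypothetical names, and the handler signature is an assumption.

```c
#include <errno.h>
#include <stddef.h>

struct ex_socket;

/* Hypothetical per-protocol handler table; only one handler is shown. */
struct ex_usrreqs {
	int (*pru_bind)(struct ex_socket *, const void *);
};

struct ex_socket {
	const struct ex_usrreqs *so_usrreqs;
};

/* Example handler for a protocol that does support the request. */
static int
ex_bind_ok(struct ex_socket *so, const void *addr)
{
	(void)so;
	(void)addr;
	return 0;
}

/* Wrapper: a NULL handler means the protocol does not support the request. */
static int
ex_pru_bind(struct ex_socket *so, const void *addr)
{
	if (so->so_usrreqs->pru_bind == NULL)
		return EOPNOTSUPP;
	return (*so->so_usrreqs->pru_bind)(so, addr);
}
```

Keeping the check in the wrapper means a protocol without the request needs no stub function of its own.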
2022-08-15 | Introduce 'pr_usrreqs' structure and move existing user-protocol handlers into it. We want to split existing (*pr_usrreq)() to multiple short handlers for each PRU_ request as it was already done for PRU_ATTACH and PRU_DETACH. This is the preparation step, (*pr_usrreq)() split will be done with the following diffs. Based on reverted diff from guenther@. ok bluhm@ | Vitaliy Makkoveev
2022-06-28 | Use refcnt API for struct rtentry instead of hand-crafted atomic operations. OK mvs@ | Alexander Bluhm
2022-06-27 | Rework the rttimer code. Instead of a global queue and a global timeout use a per rttimer struct timeout. On enqueue the struct rttimer belongs to the timeout; in case the route is removed before the timer fires, clean up based on the timeout_del() return value. If the timeout is currently running then just clear the rtt_rt pointer and let the timeout handle the cleanup. This should hopefully fix the icmp_pmtu_timeout crashes reported by some people. OK bluhm@ | Claudio Jeker
2022-06-27 | Fix white space and wrap long lines. | Alexander Bluhm
2022-06-27 | Don't copy more than sa_len from the sockaddr to the sysctl / rt msg buffer. In the rt msg buffer the size of the full buffer is calculated first then filled out after allocating the mbuf. In the sysctl code this is not needed since the buffer is already provided. OK mvs@ | Claudio Jeker
2022-06-26 | Switch walkargs for the buffer size to size_t and change the overflow check to the less awkward w->w_needed <= w->w_given. OK bluhm@ | Claudio Jeker
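A rough sketch of how a `w->w_needed <= w->w_given` check can gate copying into a caller-supplied buffer is shown below. The field names `w_needed` and `w_given` come from the commit message; `struct ex_walkarg`, the `w_where` field and `ex_emit()` are hypothetical, and the surrounding logic is an assumption rather than the code in rtsock.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical walk state: output cursor, buffer size, bytes required. */
struct ex_walkarg {
	char	*w_where;
	size_t	 w_given;
	size_t	 w_needed;
};

/*
 * Always account for the message, but copy it only while the running
 * total still fits into the buffer supplied by the caller.
 * Returns 1 if the message was copied, 0 otherwise.
 */
static int
ex_emit(struct ex_walkarg *w, const void *msg, size_t len)
{
	w->w_needed += len;
	if (w->w_where != NULL && w->w_needed <= w->w_given) {
		memcpy(w->w_where, msg, len);
		w->w_where += len;
		return 1;
	}
	return 0;
}
```

With unsigned `size_t` counters, the comparison needs no sign handling, and `w_needed` still reports the full size after the buffer runs out.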
2022-06-16 | Mark routes sent via sysctl(2) with RTF_DONE like it is done on the route socket. All messages passed are by definition done. This may allow to share more code between sysctl and route socket parsers. OK mpi@ | Claudio Jeker
2022-06-06 | Simplify solock() and sounlock(). There is no reason to return a value for the lock operation and to pass a value to the unlock operation. sofree() still needs an extra flag to know if sounlock() should be called or not. But sofree() is called less often and mostly without keeping the lock. OK mpi@ mvs@ | Claudio Jeker
2022-03-09 | Change the logic around rounding up the needed memory for sysctls since the network state can change between the two sysctl calls. Adding 10% extra works for larger routing tables but can be too little on smaller tables to hold even a single extra message. Instead of that add at least 1024 bytes or 10% (whichever is bigger) and round the size up to the next page. With this there are no more sporadic errors in the bgpd integration tests. OK sthen@ | Claudio Jeker
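The sizing rule in the entry above (add 10% or 1024 bytes, whichever is bigger, then round up to a page) can be written out as a small helper. This is a sketch under stated assumptions: `ex_pad_needed()` and `EX_MIN_SLOP` are hypothetical names, and the page size is passed in rather than taken from the kernel's constant.

```c
#include <stddef.h>

#define EX_MIN_SLOP	1024	/* minimum extra bytes, per the commit message */

static size_t
ex_pad_needed(size_t needed, size_t pagesize)
{
	size_t slop = needed / 10;	/* 10% extra */

	if (slop < EX_MIN_SLOP)
		slop = EX_MIN_SLOP;
	needed += slop;

	/* Round up to the next multiple of pagesize. */
	return (needed + pagesize - 1) / pagesize * pagesize;
}
```

For example, with 4096-byte pages a 100-byte estimate becomes 4096 bytes, while a 100000-byte estimate becomes 110592 bytes.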
2022-02-25 | Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit | Philip Guenther
2022-02-25 | Move pr_attach and pr_detach to a new structure pr_usrreqs that can then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@ | Philip Guenther
2022-01-20 | Shifting signed integers left by 31 is undefined behavior in C. found by kubsan; joint work with tobhe@; OK miod@ | Alexander Bluhm
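To illustrate the class of problem named in the entry above: with a 32-bit `int`, the expression `1 << 31` is undefined because the result is not representable in a signed `int`, whereas shifting an unsigned operand is well defined. The helper name `ex_bit()` is hypothetical and this is not the code that was changed.

```c
#include <stdint.h>

/* Undefined with 32-bit int: the result does not fit in a signed int. */
/* #define EX_BAD_BIT(n)	(1 << (n)) */

/* Well defined: the shifted operand is unsigned and 32 bits wide. */
static uint32_t
ex_bit(unsigned int n)
{
	return (uint32_t)1 << n;	/* valid for n in 0..31 */
}
```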
2021-12-16 | When adding the extra 10% of space to a needed sysctl buffer use math that is less likely to overflow the int type used. A BGP fullfeed is now so big that this calculation overflowed and then got sign extended. The result was for example 'route -n show' failures. Problem identified with deraadt@ OK deraadt@ (more cleanup needed but this fix is a good start) | Claudio Jeker
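The commit message does not show the old or new expression. As an assumption for illustration, the sketch below contrasts a multiply-first form (`needed * 11 / 10`) with a divide-first form (`needed + needed / 10`), whose intermediate value stays much smaller. Both function names are hypothetical, and a 32-bit `int` with a 64-bit `long long` is assumed.

```c
#include <limits.h>

/* Divide first: no intermediate value exceeds the final result. */
static int
ex_add_ten_percent(int needed)
{
	return needed + needed / 10;
}

/*
 * Reports whether the multiply-first form would exceed INT_MAX.
 * The product is computed in long long so the check itself is safe.
 */
static int
ex_mul_first_overflows(int needed)
{
	return (long long)needed * 11 > INT_MAX;
}
```

The divide-first form can still overflow for values close to `INT_MAX`, which matches the "less likely" wording of the entry.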
2021-09-14 | Add missing kernel lock for Bi-directional Forwarding Detection data. Also bfdset() calls pool_get(9) with PR_WAITOK flag so it should be done before we check the existence of this `bfd', otherwise it could be added multiple times. We have BFD disabled in the default kernel so this diff is for consistency mostly. ok mpi@ | Vitaliy Makkoveev
2021-09-07 | Fix NULL pointer dereference introduced by previous commit. Reported-by: syzbot+684597dbbb9b516e76ae@syzkaller.appspotmail.com ok mpi@ | Vitaliy Makkoveev
2021-09-07 | Fix the race between if_detach() and rtm_output(). When the dying network interface descriptor has an if_get(9) obtained reference owned by a foreign thread, the if_detach() thread will sleep just after it removed this interface from the interface index map. The data related to this interface is still in the routing table, so if_get(9) called by a concurrent rtm_output() thread will return NULL and the following "ifp != NULL" assertion will be triggered. So remove the "ifp != NULL" assertions from rtm_output() and try to grab `ifp' as early as possible, then hold it until we finish the work. In the case we won the race and we have `ifp' non NULL, the concurrent if_detach() thread will wait for us. In the case we lost we just return ESRCH. The problem reported by danj@. Diff tested by danj@. ok mpi@ | Vitaliy Makkoveev
2021-06-23 | rtsock: revert from timeout_set_flags(9) to timeout_set_proc(9); ok mvs@ | cheloha
2021-06-01 | Check `so_state' in rtm_senddesync() and return if SS_ISCONNECTED or SS_CANTRCVMORE bits are set. The first check is required to prevent timeout_add(9) from rescheduling `rop_timeout', otherwise timeout_del_barrier(9) can't help us. The second check is for the case when shutdown(2) with SHUT_RD argument occurred on this socket and we should not receive anything, including RTM_DESYNC packets. ok claudio@ | mvs
2021-05-30 | Declare all struct protosw as constant. OK mvs@ | Alexander Bluhm
2021-05-25 | As network features are not added dynamically, the domain structures are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@ | Alexander Bluhm