path: root/sys/net/route.c
Age | Commit message | Author
2024-09-20 | remove unneeded semicolons; checked by millert@ | Jonathan Gray
2024-03-31 | Combine route_cache() and rtalloc_mpath() in new route_mpath(). | Alexander Bluhm
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and can just pass struct in_addr. All the conversions are done inside the functions. A previous version of this diff was backed out. There was an additional rtisvalid() in rtalloc_mpath() that prevented packet output via interfaces that were not up. Now the route in the cache has to be valid, but after a new lookup, rtalloc_mpath() may return invalid routes. This generates fewer errors in userland and preserves existing behavior. OK sashan@
2024-02-29 | revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()" | Christian Weisgerber
It breaks NFS. ok claudio@
2024-02-27 | Combine route_cache() and rtalloc_mpath() in new route_mpath(). | Alexander Bluhm
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and can just pass struct in_addr. All the conversions are done inside the functions. ro->ro_rt is either valid or NULL. Note that some places have a stricter rtisvalid() now compared to the previous NULL check. OK claudio@
2024-02-22 | Make the route cache aware of multipath routing. | Alexander Bluhm
Pass source address to route_cache() and store it in struct route. Cached multipath routes are only valid if source address matches. If sysctl multipath changes, increase route generation number. OK claudio@
2024-02-13 | Merge struct route and struct route_in6. | Alexander Bluhm
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there. OK claudio@
2024-02-09 | Route cache function returns hit or miss. | Alexander Bluhm
The route_cache() function can easily return whether it was a cache hit or miss. Then the logic to perform a route lookup gets a bit simpler. Some more complicated if (ro->ro_rt == NULL) checks still exist elsewhere. Also use route cache in in_pcbselsrc() instead of filling struct route manually. OK claudio@
2024-02-07 | Add missing #ifdef INET6 to fix ramdisk build. | Alexander Bluhm
2024-02-07 | Use the route generation number also for IPv6. | Alexander Bluhm
Implement route6_cache() to check whether the cached route is still valid and otherwise fill caching parameter of struct route_in6. Also count cache hits and misses in netstat. in_pcbrtentry() uses route cache now. OK claudio@
2024-02-05 | Add netstat counter for route cache. | Alexander Bluhm
To optimize route caching, count cache hits and misses. This is shown in netstat -s for both inet and inet6. Reuse the old IPv6 forward cache counter. Sort ip6s_wrongif consistently. For now only IPv4 cache counter has been implemented. OK mvs@
2024-01-31 | Add route generation number to route cache. | Alexander Bluhm
The outgoing route is cached at the inpcb. This cache was only invalidated when the socket closes or if the route gets invalid. More specific routes were not detected. Especially with dynamic routing protocols, sockets must be closed and reopened to use the correct route. Running ping during a route change shows the problem. To solve this, add a route generation number that is updated whenever the routing table changes. The lookup in struct route is put into the route_cache() function. If the generation number is too old, the cached route gets discarded. Implement route_cache() for ip_output() and ip_forward() first. IPv6 and more places will follow. OK claudio@
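The generation-number check described in this commit can be sketched in isolation. This is a minimal userland model, not the kernel code: every `toy_` name, the struct layout, and the single global counter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real struct route and rtentry differ. */
struct toy_route {
	uint32_t dst;		/* destination address */
	int ifindex;		/* outgoing interface */
};

struct toy_cache {
	const struct toy_route *rt;	/* cached lookup result, or NULL */
	uint32_t dst;			/* destination the entry was cached for */
	unsigned long gen;		/* table generation when the entry was filled */
};

/* Bumped on every routing table change (assumed single global counter). */
static unsigned long toy_generation = 1;

static void
toy_table_changed(void)
{
	toy_generation++;
}

/*
 * Return 1 on a cache hit, 0 on a miss.  On a miss the stale route is
 * discarded and the entry is tagged with the current generation, so the
 * caller only has to do a fresh lookup and store the result in c->rt.
 */
static int
toy_cache_check(struct toy_cache *c, uint32_t dst)
{
	assert(c != NULL);
	if (c->rt != NULL && c->dst == dst && c->gen == toy_generation)
		return 1;
	c->rt = NULL;
	c->dst = dst;
	c->gen = toy_generation;
	return 0;
}
```

A caller would run `toy_cache_check()` first and perform the expensive table lookup only when it returns 0. Any table change invalidates every cached entry at once, which is coarse but needs no per-socket bookkeeping.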
2023-11-13 | Fix rt_setgate() error handling. | Alexander Bluhm
In revision 1.424 the logic in rt_setgate() has changed. The old code entered a value into rt_gateway also if rt_setgwroute() returned an error. Now if rt_setgwroute() fails, rt_gateway is NULL and ROUNDUP(rt->rt_gateway->sa_len) crashes. Put back the old logic in rt_setgate(). Setting rt_gateway and rt_gwroute are actually independent. If malloc(9) in rt_setgate() fails, rt_gateway can still be NULL. The subsequent crash in free(rt->rt_gateway, M_RTABLE, ROUNDUP(rt->rt_gateway->sa_len)) was just never observed. Add a NULL check around these free(9). Reported-by: syzbot+2e79dd9db712d3c5ade9@syzkaller.appspotmail.com OK mvs@
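The crash pattern and the NULL guard can be illustrated with a small userland sketch. The `toy_` types, the `TOY_ROUNDUP` macro, and the sized `toy_free()` wrapper are hypothetical stand-ins for the kernel's sockaddr, ROUNDUP, and free(9); only the shape of the guard is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round a length up to a multiple of sizeof(long) (assumed semantics). */
#define TOY_ROUNDUP(n) \
	(((size_t)(n) + sizeof(long) - 1) & ~(sizeof(long) - 1))

struct toy_sockaddr {
	unsigned char sa_len;		/* total length of the address */
	unsigned char sa_data[15];
};

struct toy_rtentry {
	struct toy_sockaddr *rt_gateway;	/* may be NULL after a failed setup */
};

static size_t toy_freed_bytes;	/* bookkeeping so the sketch is observable */

/* Sized free in the style of free(9); userland free() ignores the size. */
static void
toy_free(void *p, size_t size)
{
	toy_freed_bytes += size;
	free(p);
}

/*
 * The size argument dereferences rt_gateway, so the NULL check has to
 * wrap the whole call, not just the pointer passed to free.
 */
static void
toy_drop_gateway(struct toy_rtentry *rt)
{
	assert(rt != NULL);
	if (rt->rt_gateway != NULL) {
		toy_free(rt->rt_gateway, TOY_ROUNDUP(rt->rt_gateway->sa_len));
		rt->rt_gateway = NULL;
	}
}
```

Without the guard, computing the size would dereference a NULL `rt_gateway` before the free routine ever ran, which matches the crash described above.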
2023-11-12 | Use constant sockaddr in route lookup. | Alexander Bluhm
In rtalloc() and rtalloc_mpath() declare the parameter dst as const sockaddr. This makes MP safe route lookup easier as the destination address is definitely not modified during the operation. Array rti_info, the central data structure with addresses for route matching, contains constant sockaddr now. OK mvs@ dlg@
2023-11-12 | rt_setgate performs a series of tweaks to an rtable and the routes in the rtable which should be serialised to ensure they're consistent. | David Gwynne
unfortunately, rt_setgate is called from the network stack while it's only holding shared NET_LOCK. this uses the [X] protections as described in route.h to serialise the changes, and reworks the code to try and keep enough stuff linked up properly during the changes that it will still work if another cpu is still using the rtentry structs while they still have shared net lock. tested by and ok bluhm@
2023-11-10 | rtable_match() takes constant destination. | Alexander Bluhm
For implementing MP safe route lookup, it helps to know which function parameters are constant. Add some const declarations, so that the compiler guarantees that sockaddr dst parameter of rtable_match() does not change. OK dlg@
2023-04-28 | Add rtentry refcnt type to dt(4). | Vitaliy Makkoveev
ok bluhm@
2023-04-27 | Remove kernel lock from rtfree(9). | Vitaliy Makkoveev
Route timers and route labels are protected by corresponding mutexes. `ifa' uses reference counting for protection. rt_mpls_clear() could be called lockless because this is the last reference of `rt'. ok bluhm@ kn@
2023-04-27 | Add `rttimer_mtx' to the locking description. | Vitaliy Makkoveev
No functional changes.
2023-04-26 | Introduce `rtlabel_mtx' mutex(9) to protect route labels storage. | Vitaliy Makkoveev
At this time kernel and net locks are held in various combinations to protect it. We don't want to put kernel lock to all the places. Netlock also can't be used because rtfree(9) which calls rtlabel_unref() has unknown netlock state within. This new `rtlabel_mtx' mutex(9) protects `rt_labels' list and `label' entry dereference. Since we don't export 'rt_label' structure, keep this lock private to net/route.c. For this reason rtlabel_id2name() now copies label string to externally passed buffer instead of returning address of `rt_labels' list data. This is the way which rtlabel_id2sa() already works. ok bluhm@
2023-04-26 | Remove +20y old rt_timer_init() comment | Klemens Nanni
Obsolete since last year's r1.411 "Rework the rttimer code." OK claudio
2023-04-26 | typofix rttimer comment | Klemens Nanni
2023-01-28 | Revert the `rt_lock' rwlock(9) diff to fix the recursive rwlock(9) acquisition. | Vitaliy Makkoveev
Reported-by: syzbot+fbe3acb4886adeef31e0@syzkaller.appspotmail.com
2023-01-21 | Introduce `rt_lock' rwlock(9) and use it instead of kernel lock to serialize arpcache() and arpresolve(). | Vitaliy Makkoveev
In fact, net stack already has sleep points, so the rwlock(9) is better here because we avoid intersection with the rest of kernel locked paths. Also, this new lock is intended to be used for route layer protection instead of netlock. Hrvoje Popovski had tested this diff and found no visible performance impact. ok bluhm@
2022-08-29 | Use struct refcnt for interface address reference counting. | Alexander Bluhm
There was a crash due to use after free of the ifa although it is ref counted. As ifa_refcnt was a simple integer increment, there may be a path where multiple CPUs access it concurrently. So change to struct refcnt which is MP safe and provides dt(4) leak debugging. Link level address for IPsec enc(4) and various MPLS interfaces is special. There ifa is part of struct sc. Use refcount anyway and add a panic to detect use after free. bug report stsp@; OK mvs@
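The difference between a plain integer counter and an MP safe one can be sketched with C11 atomics. This is a userland model loosely shaped like a take/release reference counter; the `toy_` names are hypothetical and the kernel's struct refcnt also provides dt(4) tracing that is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_refcnt {
	atomic_uint refs;
};

/* A new object starts with one reference, held by its creator. */
static void
toy_refcnt_init(struct toy_refcnt *r)
{
	atomic_init(&r->refs, 1);
}

/* Atomic increment: safe when several CPUs take references at once. */
static void
toy_refcnt_take(struct toy_refcnt *r)
{
	atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/*
 * Drop one reference.  Returns 1 if this was the last one, in which
 * case the caller may free the object.  The assert catches a release
 * on an already dead object, a cheap use-after-free detector.
 */
static int
toy_refcnt_rele(struct toy_refcnt *r)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_acq_rel);
	assert(old != 0);
	return old == 1;
}
```

With a plain `int` and `refs++`, two CPUs can both read the same old value and lose an increment, so the object is freed while one of them still uses it. The atomic read-modify-write closes that window.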
2022-07-28 | In the kernel exist functions to print routes, but they were not accessible from ddb. | Alexander Bluhm
Implement "show all routes" to print routing tables, and "show route 0xfffffd807e9b0000" for a single route entry. Note that the rtable id is not part of a route entry, so it makes no sense to print it there. OK deraadt@
2022-06-28 | Use refcnt API for struct rtentry instead of hand-crafted atomic operations. | Alexander Bluhm
OK mvs@
2022-06-27 | Rework the rttimer code. | Claudio Jeker
Instead of a global queue and a global timeout use a per rttimer struct timeout. On enqueue the struct rttimer belongs to the timeout; in case the route is removed before the timer fires, clean up based on the timeout_del() return value. If the timeout is currently running then just clear the rtt_rt pointer and let the timeout handle the cleanup. This should hopefully fix the icmp_pmtu_timeout crashes reported by some people. OK bluhm@
2022-05-05 | Use static objects for struct rttimer_queue instead of dynamically allocating them. | Claudio Jeker
Currently there are 6 rttimer_queues and not many more will follow. So change rt_timer_queue_create() to rt_timer_queue_init() which now takes a struct rttimer_queue * as argument which will be initialized. Since this changes the global vars from pointer to struct adjust other callers as well. OK bluhm@
2022-05-04 | Move rttimer callback function from the rttimer itself to rttimer_queue. | Claudio Jeker
All users use the same callback per queue so that makes sense. Also replace rt_timer_queue_destroy() with rt_timer_queue_flush(). OK bluhm@
2022-04-30 | Convert the 2nd rttimer callback from struct rttimer to u_int rtableid. | Claudio Jeker
The callback only needs to know the rtableid; all the other info from struct rtableid is not needed. Also change the default rttimer callback to only delete routes that are RTF_HOST and RTF_DYNAMIC. This way 2 of the ICMP handlers can use NULL as the callback. OK bluhm@
2022-04-28 | Route timers were not MP safe. | Alexander Bluhm
Protect the global lists with a mutex and move the rttimer entries into a temporary list. Then the callback and pool put can be called later without holding the mutex. tested by Hrvoje Popovski; OK claudio@
2022-04-20 | Fix white space and wrap long lines. | Alexander Bluhm
2022-04-20 | Route timeout was a mixture of int, u_int and long. | Alexander Bluhm
Use type int for timeout, add sysctl bounds checking between 0 and max int, and use time_t for absolute times. Some code assumes that the route timeout queue can be NULL and at some places this was checked. Better make sure that all queues always exist. The pool_get for struct rttimer_queue is only called from initialization and from syscall, so PR_WAITOK is possible. Keep the special hack when ip_mtudisc is set to 0. Destroy the queue and generate an empty one. If redirect timeout is 0, it should not time out. Check the value in IPv6 to make the behavior like IPv4. Sysctl net.inet6.icmp6.redirtimeout had no effect as the queue timeout was not modified. Make icmp6_sysctl() look like icmp_sysctl(). OK claudio@
2022-04-19 | Use a pool instead of malloc for struct rttimer_queue. | Alexander Bluhm
As routing runs without kernel lock, use IPL_MPFLOOR protection for its pools. OK mvs@ claudio@
2022-04-19 | Instead of a MP unsafe global variable to initialize at first use, call rt_timer_init() from rtable_init(). | Alexander Bluhm
OK mvs@ claudio@
2022-02-22 | Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h> | Philip Guenther
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
2022-02-07 | In rtredirect() change a bad assignment in an if condition to the correct equality check. | Claudio Jeker
Found by and OK jsg@
2022-01-02 | spelling | Jonathan Gray
ok jmc@ reads ok tb@
2021-05-25 | As network features are not added dynamically, the domain structures are constant. | Alexander Bluhm
Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-03-10 | spelling | Jonathan Gray
ok gnezdo@ semarie@ mpi@
2020-10-29 | Add feature to force the selection of source IP address | denis
Based/previous work on an idea from deraadt@ Input from claudio@, djm@, deraadt@, sthen@ OK deraadt@
2020-08-13 | Use rtm_miss() rather than the simpler rtm_send() to send route delete messages, and save the route flags before deleting the route. | Jonathan Matthew
For L2 route entries, the RTF_LLINFO flag is cleared during deletion, so saving the flags beforehand means they're correct in the routing socket message. ok mpi@
2020-07-28 | Add size to free(9) calls | kn
Those are for the gateway sockaddrs which get allocated in rt_setgate() with the same ROUNDUP(sa_len) approach. mpi already added sizes for a few rt_gateway sockaddrs in two commits; these are the last ones in route.c, leaving only ifafree() behind. OK mpi
2020-06-24 | kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9) | cheloha
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
2020-04-20 | Don't return stack garbage even if it is going to be ignored. | Kenneth R Westerback
Initialize 'error' to 0. CID 1483380 ok mpi@
2020-04-15 | Do not delete an existing RTF_CACHED entry with the same destination address as the one trying to be inserted. | Martin Pieuchot
Such an entry must stay in the table as long as its parent route exists. If a code path tries to re-insert a route with the same destination address on the same interface it is a bug. Avoid the "route contains no arp information" problem reported by sthen@ and Laurent Salle. ok claudio@
2020-04-10 | Typo in comment. | Martin Pieuchot
2020-03-21 | r1.244 introduced rt_hash() with careful checks of src for NULL at each dereference. | Kenneth R Westerback
r1.275 added a check at the top of the function, with an immediate "return (-1)" if src == NULL, thus making the repeated checks in the body superfluous. CID 1452932. ok millert@ mpi@
2020-03-10 | The return value of rt_ifa_purge() is ignored, so stop returning a (possibly uninitialized) value. | Kenneth R Westerback
CID 1483466. ok millert@
2020-01-08 | Fix confusion around rtlabelid and rtableid in rt_ifa_add() and rt_ifa_del(). | Claudio Jeker
The routing labels have nothing to do with rdomains and routing tables. Remove the unneeded rdomain check. With this, rtlabel on interfaces works again. OK kn@