Age | Commit message | Author |
|
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions.
A previous version of this diff was backed out. There was an
additional rtisvalid() in rtalloc_mpath() that prevented packet
output via interfaces that were not up. Now the route in the cache
has to be valid, but after new lookup, rtalloc_mpath() may return
invalid routes. This generates fewer errors in userland and preserves
existing behavior.
OK sashan@
|
|
It breaks NFS.
ok claudio@
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.
OK claudio@
|
|
Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.
OK claudio@
|
|
Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.
OK claudio@
|
|
The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.
OK claudio@
|
|
|
|
Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.
OK claudio@
|
|
To optimize route caching, count cache hits and misses. This is
shown in netstat -s for both inet and inet6. Reuse the old IPv6
forward cache counter. Sort ip6s_wrongif consistently. For now
only IPv4 cache counter has been implemented.
OK mvs@
|
|
The outgoing route is cached at the inpcb. This cache was only
invalidated when the socket closes or if the route becomes invalid.
More specific routes were not detected. Especially with dynamic
routing protocols, sockets must be closed and reopened to use the
correct route. Running ping during a route change shows the problem.
To solve this, add a route generation number that is updated whenever
the routing table changes. The lookup in struct route is put into
the route_cache() function. If the generation number is too old,
the cached route gets discarded.
Implement route_cache() for ip_output() and ip_forward() first.
IPv6 and more places will follow.
OK claudio@
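
The generation check described above can be sketched in userland C.
This is a minimal illustration, not the kernel code: struct toy_route,
toy_route_generation, toy_table_changed() and toy_route_cache() are
hypothetical names, the destination is reduced to a uint32_t, and the
return convention (0 for hit, 1 for miss) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cached route. */
struct toy_route {
	const char	*ro_rt;		/* cached lookup result, NULL if none */
	uint32_t	 ro_dst;	/* destination the cache was filled for */
	unsigned long	 ro_generation;	/* table generation at fill time */
};

/* Bumped on every routing table change (assumed global counter). */
static unsigned long toy_route_generation = 1;

static void
toy_table_changed(void)
{
	toy_route_generation++;
}

/*
 * Return 0 on a cache hit and 1 on a miss.  On a miss the stale route
 * is discarded and the cache parameters are reset, so the caller only
 * has to perform a fresh lookup and store the result in ro_rt.
 */
static int
toy_route_cache(struct toy_route *ro, uint32_t dst)
{
	assert(ro != NULL);

	if (ro->ro_rt != NULL && ro->ro_dst == dst &&
	    ro->ro_generation == toy_route_generation)
		return 0;

	ro->ro_rt = NULL;
	ro->ro_dst = dst;
	ro->ro_generation = toy_route_generation;
	return 1;
}
```

In this sketch a more specific route added later is picked up because
the table change bumps the counter, which makes every cached entry
compare as stale on its next use.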
|
|
In revision 1.424 the logic in rt_setgate() has changed. The old
code entered a value into rt_gateway also if rt_setgwroute() returned
an error. Now if rt_setgwroute() fails, rt_gateway is NULL and
ROUNDUP(rt->rt_gateway->sa_len) crashes.
Put back the old logic in rt_setgate(). Setting rt_gateway and
rt_gwroute are actually independent.
If malloc(9) in rt_setgate() fails, rt_gateway can still be NULL.
The subsequent crash in free(rt->rt_gateway, M_RTABLE,
ROUNDUP(rt->rt_gateway->sa_len)) was just never observed. Add a
NULL check around these free(9).
Reported-by: syzbot+2e79dd9db712d3c5ade9@syzkaller.appspotmail.com
OK mvs@
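
A minimal userland sketch of the NULL check, under stated assumptions:
struct toy_sockaddr, struct toy_rtentry and toy_free_gateway() are
hypothetical, TOY_ROUNDUP is an assumed rounding macro and not copied
from route.c, and plain free(3) stands in for the sized kernel free(9).
The point is only that the size expression dereferences rt_gateway, so
the guard has to come first.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definition: round up to a multiple of sizeof(long). */
#define TOY_ROUNDUP(a) \
	((a) > 0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : sizeof(long))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_sockaddr {
	unsigned char	sa_len;
	unsigned char	sa_family;
	char		sa_data[14];
};

struct toy_rtentry {
	struct toy_sockaddr	*rt_gateway;	/* NULL if allocation failed */
};

/*
 * Release the gateway sockaddr.  Returns the rounded size that would
 * be passed to a sized free, or 0 if there was nothing to release.
 */
static size_t
toy_free_gateway(struct toy_rtentry *rt)
{
	size_t len = 0;

	assert(rt != NULL);
	if (rt->rt_gateway != NULL) {
		/* Only safe to read sa_len after the NULL check. */
		len = TOY_ROUNDUP(rt->rt_gateway->sa_len);
		free(rt->rt_gateway);
		rt->rt_gateway = NULL;
	}
	return len;
}
```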
|
|
In rtalloc() and rtalloc_mpath() declare the parameter dst as const
sockaddr. This makes MP safe route lookup easier as the destination
address is definitely not modified during the operation. Array
rti_info, the central data structure with addresses for route
matching, contains constant sockaddr now.
OK mvs@ dlg@
|
|
the rtable which should be serialised to ensure they're consistent.
unfortunately, rt_setgate is called from the network stack while it's
only holding shared NET_LOCK.
this uses the [X] protections as described in route.h to serialise the
changes, and reworks the code to try and keep enough stuff linked up
properly during the changes that it will still work if another cpu is
still using the rtentry structs while they still have shared net lock.
tested by and ok bluhm@
|
|
For implementing MP safe route lookup, it helps to know which
function parameters are constant. Add some const declarations, so
that the compiler guarantees that sockaddr dst parameter of
rtable_match() does not change.
OK dlg@
|
|
ok bluhm@
|
|
Route timers and route labels are protected by their corresponding mutexes.
`ifa' uses reference counting for protection. rt_mpls_clear() can be called
without a lock because this is the last reference of `rt'.
ok bluhm@ kn@
|
|
No functional changes.
|
|
time kernel and net locks are held in various combinations to protect it.
We don't want to take the kernel lock in all these places. Netlock also can't
be used because rtfree(9) which calls rtlabel_unref() has unknown
netlock state within.
This new `rtlabel_mtx' mutex(9) protects `rt_labels' list and `label'
entry dereference. Since we don't export 'rt_label' structure, keep this
lock private to net/route.c. For this reason rtlabel_id2name() now
copies label string to externally passed buffer instead of returning
address of `rt_labels' list data. This is how rtlabel_id2sa()
already works.
ok bluhm@
|
|
Obsolete since last year's r1.411 "Rework the rttimer code."
OK claudio
|
|
|
|
rwlock(9) acquisition.
Reported-by: syzbot+fbe3acb4886adeef31e0@syzkaller.appspotmail.com
|
|
serialize arpcache() and arpresolve(). In fact, net stack already has
sleep points, so the rwlock(9) is better here because we avoid
intersection with the rest of the kernel locked paths. This new lock is
also intended to protect the route layer instead of the netlock.
Hrvoje Popovski tested this diff and found no visible performance
impact.
ok bluhm@
|
|
There was a crash due to use after free of the ifa although it is
ref counted. As ifa_refcnt was a simple integer increment, there
may be a path where multiple CPUs access it concurrently. So change
to struct refcnt which is MP safe and provides dt(4) leak debugging.
Link level address for IPsec enc(4) and various MPLS interfaces is
special. There ifa is part of struct sc. Use refcount anyway and
add a panic to detect use after free.
bug report stsp@; OK mvs@
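
The difference between a plain integer counter and an MP safe one can
be sketched with C11 atomics. This is an illustration only, not the
kernel's struct refcnt: struct toy_ifaddr and the toy_ifa_*() helpers
are hypothetical, and assert(3) stands in for the panic mentioned
above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an interface address with a refcount. */
struct toy_ifaddr {
	atomic_uint	ifa_refs;
};

/* A new object starts with one reference held by its creator. */
static void
toy_ifa_init(struct toy_ifaddr *ifa)
{
	atomic_init(&ifa->ifa_refs, 1);
}

/* Take an additional reference; the read-modify-write is atomic. */
static void
toy_ifa_take(struct toy_ifaddr *ifa)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&ifa->ifa_refs, 1,
	    memory_order_relaxed);
	/* Taking a reference on an already released object is a bug. */
	assert(old > 0);
	(void)old;
}

/*
 * Drop a reference.  Returns 1 if this was the last one, in which
 * case the caller is responsible for freeing the object.
 */
static int
toy_ifa_rele(struct toy_ifaddr *ifa)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&ifa->ifa_refs, 1,
	    memory_order_acq_rel);
	assert(old > 0);
	return old == 1;
}
```

With a non-atomic `ifa->ifa_refs++`, two CPUs could read the same old
value and both write back old + 1, losing one reference and leading to
an early free.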
|
|
accessible from ddb. Implement "show all routes" to print routing
tables, and "show route 0xfffffd807e9b0000" for a single route
entry. Note that the rtable id is not part of a route entry, so
it makes no sense to print it there.
OK deraadt@
|
|
operations.
OK mvs@
|
|
use a per rttimer struct timeout. On enqueue the struct rttimer belongs
to the timeout. If the route is removed before the timer fires,
clean up based on the timeout_del() return value. If the timeout is currently
running, just clear the rtt_rt pointer and let the timeout handle the
cleanup. This should hopefully fix the icmp_pmtu_timeout crashes reported
by some people.
OK bluhm@
|
|
allocate them.
Currently there are 6 rttimer_queues and not many more will follow. So
change rt_timer_queue_create() to rt_timer_queue_init() which now takes
a struct rttimer_queue * as argument which will be initialized.
Since this changes the global vars from pointer to struct, adjust other
callers as well.
OK bluhm@
|
|
All users use the same callback per queue so that makes sense.
Also replace rt_timer_queue_destroy() with rt_timer_queue_flush().
OK bluhm@
|
|
The callback only needs to know the rtableid; all the other info from
struct rtableid is not needed.
Also change the default rttimer callback to only delete routes that are
RTF_HOST and RTF_DYNAMIC. This way 2 of the ICMP handlers can use NULL
as the callback.
OK bluhm@
|
|
mutex and move the rttimer entries into a temporary list. Then the
callback and pool put can be called later without holding the mutex.
tested by Hrvoje Popovski; OK claudio@
|
|
|
|
for timeout, add sysctl bounds checking between 0 and max int, and
use time_t for absolute times.
Some code assumes that the route timeout queue can be NULL and at
some places this was checked. Better make sure that all queues
always exist. The pool_get for struct rttimer_queue is only called
from initialization and from syscall, so PR_WAITOK is possible.
Keep the special hack when ip_mtudisc is set to 0. Destroy the
queue and generate an empty one.
If redirect timeout is 0, it should not time out. Check the value
in IPv6 to make the behavior like IPv4.
Sysctl net.inet6.icmp6.redirtimeout had no effect as the queue
timeout was not modified. Make icmp6_sysctl() look like icmp_sysctl().
OK claudio@
|
|
runs without kernel lock, use IPL_MPFLOOR protection for its pools.
OK mvs@ claudio@
|
|
call rt_timer_init() from rtable_init().
OK mvs@ claudio@
|
|
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@
|
|
correct equality check.
Found by and OK jsg@
|
|
ok jmc@ reads ok tb@
|
|
are constant. Having more const makes MP review easier. More
pointers are mapped read-only in the kernel image.
OK deraadt@ mvs@
|
|
ok gnezdo@ semarie@ mpi@
|
|
Based/previous work on an idea from deraadt@
Input from claudio@, djm@, deraadt@, sthen@
OK deraadt@
|
|
messages, and save the route flags before deleting the route. For L2
route entries, the RTF_LLINFO flag is cleared during deletion, so saving
the flags beforehand means they're correct in the routing socket message.
ok mpi@
|
|
Those are for the gateway sockaddrs which get allocated in rt_setgate()
with the same ROUNDUP(sa_len) approach.
mpi already added sizes for a few rt_gateway sockaddrs in two commits;
these are the last ones in route.c, leaving only ifafree() behind.
OK mpi
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
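
The split-read problem and a lockless read loop can be sketched in
userland C11. This is a generic generation-counter (seqlock style)
illustration with a single writer, not the kernel's timehands code:
struct toy_timehands and the toy_*() functions are hypothetical, and
the 64-bit value is deliberately stored as two 32-bit halves to mimic
hardware without atomic 64-bit reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical time source; generation 0 means "update in progress". */
struct toy_timehands {
	atomic_uint	th_generation;
	atomic_uint	th_lo;		/* low 32 bits of the time */
	atomic_uint	th_hi;		/* high 32 bits of the time */
};

static void
toy_timehands_init(struct toy_timehands *th)
{
	atomic_init(&th->th_generation, 1);
	atomic_init(&th->th_lo, 0);
	atomic_init(&th->th_hi, 0);
}

/* Single writer: invalidate, update both halves, publish new generation. */
static void
toy_settime(struct toy_timehands *th, uint64_t now)
{
	unsigned int gen;

	gen = atomic_load_explicit(&th->th_generation, memory_order_relaxed);
	atomic_store_explicit(&th->th_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->th_lo, (uint32_t)now, memory_order_relaxed);
	atomic_store_explicit(&th->th_hi, (uint32_t)(now >> 32),
	    memory_order_relaxed);
	if (++gen == 0)		/* skip the reserved value on wraparound */
		gen = 1;
	atomic_store_explicit(&th->th_generation, gen, memory_order_release);
}

/* Reader: retry until both halves were read under one stable generation. */
static uint64_t
toy_gettime(struct toy_timehands *th)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load_explicit(&th->th_generation,
		    memory_order_acquire);
		lo = atomic_load_explicit(&th->th_lo, memory_order_relaxed);
		hi = atomic_load_explicit(&th->th_hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed));

	return ((uint64_t)hi << 32) | lo;
}
```

The retry loop is where the extra cost on 32-bit platforms comes from
in a scheme like this: each read needs two generation loads and fences
around the two half-word loads instead of two plain register reads.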
|
|
ignored. Initialize 'error' to 0.
CID 1483380
ok mpi@
|
|
address as the one trying to be inserted.
Such an entry must stay in the table as long as its parent route exists. If
a code path tries to re-insert a route with the same destination address
on the same interface it is a bug.
Avoid the "route contains no arp information" problem reported by sthen@
and Laurent Salle.
ok claudio@
|
|
|
|
each dereference. r1.275 added a check at the top of the function,
with an immediate "return (-1)" if src == NULL. This makes the
repeated checks in the body superfluous.
CID 1452932.
ok millert@ mpi@
|
|
returning a (possibly uninitialized) value.
CID 1483466.
ok millert@
|
|
The routing labels have nothing to do with rdomains and routing tables.
Remove the unneeded rdomain check. With this, rtlabel on interfaces works again.
OK kn@
|