Age | Commit message (Collapse) | Author |
|
Doing KERNEL_LOCK() just before NET_LOCK() does not make sense.
Net lock is a rwlock that releases kernel lock during sleep. To
avoid an unnecessary release and take kernel lock cycle, move
KERNEL_LOCK() after NET_LOCK().
There is no lock order reversal deadlock issue. Both locks are
used in any order thoughout the kernel. As NET_LOCK() releases the
kernel lock when it cannot take the lock immediately and has to
sleep, we always end in the order kernel lock before net lock after
sleeping.
OK sashan@
|
|
Replace hand-rolled reference counting with refcnt_init(9) and hook it up
with a new dt(4) probe.
OK bluhm mvs
|
|
in6.c already has the privilege check as early as possible, make in.c match.
For unprivileged IPv4 ioctl calls with invalid args, this changes errno from
E* to EPERM.
OK bluhm
|
|
|
|
|
|
All cases do the same check as first step, so merge it before the switch
and before grapping exclusive locks.
OK mvs
|
|
Just like in6_ioctl_get(), read ioctls are safe with the shared net lock to
protect interface addresses and flags.
OK mvs
|
|
in{,6}_addmulti(). Since kernel lock is no more taken while following
setsockopt() path, it should be taken in this place. Corresponding
in{,6}_delmulti() already acquire kernel lock around (*if_ioctl)().
Problem reported and diff tested by weerd@
ok kn@ bluhm@
|
|
so->so_state is already read without kernel lock inside soo_ioctl()
which calls pru_control() aka in6_control() and in_control().
OK mvs
|
|
Naming the list like the struct itself makes for awful grepping.
Call the global variable "ifnetlist" from now on.
There used to be kvm(3) consumers in base picking up this symbol, but those
have long been converted to other interfaces.
A few potential ports users remain, same deal as sys/net/if_var.h r1.116
"Remove struct ifnet's unused if_switchport member": they get bumped.
Previous users pointed out by deraadt
OK bluhm
|
|
There was a crash due to use after free of the ifa although it is
ref counted. As ifa_refcnt was a simple integer increment, there
may be a path where multiple CPUs access it concurrently. So change
to struct refcnt which is MP safe and provides dt(4) leak debugging.
Link level address for IPsec enc(4) and various MPLS interfaces is
special. There ifa is part of struct sc. Use refcount anyway and
add a panic to detect use after free.
bug report stsp@; OK mvs@
|
|
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@
|
|
We already allow 240/4 in and out so lets allow it through as well.
One of many steps to make 240/4 useable.
Diff by Seth David Schoen (schoen at loyalty.org)
OK bluhm@ djm@
|
|
igmp groups may join while sleeping in interface destruction. In
this case if_get() in igmp_joingroup() fails and rti_fill() is not
called. Then inm->inm_rti may be NULL. This is the condition when
syzkaller crashes in igmp_leavegroup().
Pass the ifp the current CPU is already holding down to igmp_joingroup()
and igmp_leavegroup() to avoid half constructed igmp groups. Calling
if_get() in caller and callee makes no sense anyway.
Reported-by: syzbot+146823a676b7bea83649@syzkaller.appspotmail.com
OK denis@
|
|
for malloc(9) to make the system call reliable.
OK mvs@
|
|
ok gnezdo@ semarie@ mpi@
|
|
Since our last concurrency mistake only ioctl(2) ans sysctl(2) code path
take the reader lock. This is mostly for documentation purpose as long as
the softnet thread is converted back to use a read lock.
dlg@ said that comments should be good enough.
ok sashan@
|
|
made from socket close path. Most device drivers are not MP-safe yet,
and the closing of AF_INET and AF_INET6 sockets is no longer under the
kernel lock.
This fixes a panic seen by jcs@.
OK mpi@
|
|
Same fix as for the IPv6 case. Fixes a regression in ports/net/openvpn
spotted by landry@, ok bluhm@
|
|
netmask in the kernel.
OK visa@
|
|
this follows what's been done for detach and link state hooks, and
makes handling of hooks generally more robust.
address hooks are a bit different to detach/link state hooks in
that there's only a few things that register hooks (carp, pf, vxlan),
but a lot of places to run the hooks (lots of ipv4 and ipv6 address
configuration).
an address hook cookie was in struct pfi_kif, which is part of the
pf abi. rather than break pfctl -sI, this maintains the void * used
for the cookie and uses it to store a task, which is then used as
intended with the new api.
|
|
SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFDSTADDR, SIOCGIFBRDADDR,
SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFDSTADDR, and SIOCSIFBRDADDR.
Name in_ioctl_set_ifaddr() consistently. Use in_sa2sin() to validate
inet address. Combine if_addrlist loops and add comment. Although
netmask is not a inet address, length must be valid.
Reported-by: syzbot+5fc6da002fc4e8d994be@syzkaller.appspotmail.com
OK visa@
|
|
Fix the SIOCAIFADDR and SIOCDIFADDR ioctl(2) by implementing
in_sa2sin() to validate inet address family and address length.
OK visa@
|
|
ifconfig set/unset it.
ok deraadt@ kmos@
|
|
this allows mpls interfaces (mpe, mpw) to pass the rdomain they
wish the local label to be in, rather than have it implicitly forced
to 0 by these functions. right now they'll pass 0, but it will soon
be possible to have them rx packets in other rdomains.
previously the functions used ifp->if_rdomain for the rdomain.
everything other than mpls still passes ifp->if_rdomain.
ok mpi@
|
|
MPLS interfaces (ab)use rt_ifa_add for adding the local MPLS label
that they listen on for incoming packets, while every other use of
rt_ifa_add is for adding addresses on local interfaces. MPLS does
this cos the addresses involved are in basically the same shape as
ones used for setting up local addresses.
It is appropriate for interfaces to want RTF_MPATH on local addresses,
but in the MPLS case it means you can have multiple local things
listening on the same label, which doesn't actually work. mpe in
particular keeps track of in use labels to it can handle collisions,
however, mpw does not. It is currently possible to have multiple
mpw interfaces on the same local label, and sharing the same label
as mpe or possible normal forwarding labels.
Moving the RTF_MPATH flag out of rt_ifa_add means all the callers
that still want it need to pass it themselves. The mpe and mpw
callers are left alone without the flag, and will now get EEXIST
from rt_ifa_add when a label is already in use.
ok (and a huge amount of patience and help) mpi@
claudio@ is ok with the idea, but saw a much much earlier solution
to the problem
|
|
magic constant to panic() calls.
ok benno@ henning@ tb@
|
|
now unused 'ifra' from in_ioctl().
Discussed with mpi and visa
|
|
to its own function and merge the two switches in in_ioctl_change_ifaddr().
Finally: each ioctl has its own case and privilege check.
ok visa
As an aside, an audit of the ports tree has shown that we should continue
to support the legacy ioctls SIOCSIF{,BRD,DST}ADDR, SIOCSIFNETMASK despite
the fact that they have been deprecated for the better part of two decades
and FreeBSD dropped support 7 years ago. Too many ports still rely on them.
Thanks to sthen and visa for their help with that.
|
|
|
|
Some more code shuffling to get rid of one switch in in_ioctl().
This way there is one case for each of SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK, starting with a privilege check before any global
data is modified.
ok visa
|
|
|
|
and in_ioctl_change_ifaddr(). This way there is one case per ioctl
starting with a privilege check before any global data is modified.
The code paths are now straightforward. Some code duplication between
SIOCSIFADDR and SIOCAIFADDR, but that can be addressed later.
tested by hrvoje
ok visa
|
|
handling of SIOCAIFADDR, SIOCDIFADDR, SIOCSIFADDR into a separate
function, analogously to what was done in in6_ioctl().
tested by hrvoje
ok visa
|
|
two big switches in this function. Error out early in the default case
without grabbing the NET_LOCK() and move SIOCSIFNETMASK a bit up. This
will reduce the noise in an upcoming diff.
ok visa
|
|
that only needs the read lock.
ok visa, mpi
|
|
protected: mrt_ioctl() and in_ioctl(). The former has no other callers
and only needs a read lock. The latter will need refactoring to reduce
the lock's scope further. In a first step, establish a single exit point
and protect most of the function body with the NET_LOCK() while removing
the NET_LOCK() from a handful of callers.
suggested by & ok mpi, ok visa
|
|
Found the hard way.
|
|
For the PRU_CONTROL bit the NET_LOCK surrounds in[6]_control() and
on the ENOTSUPP case we guard the driver if_ioctl functions.
OK mpi@
|
|
This needs to go back to the drawing board.
|
|
|
|
as loopback interfaces for each rdomain (including lo0). This is done when
the interface is brought up. This is now also done by default (either on
attach of lo0 or when creating the rdomain).
OK mpi@
|
|
Instead return EOPNOTSUPP and call it from ifioctl(). This will help
getting per-driver ioctl routines outside of need the NET_LOCK().
While here always return ENXIO when ``ifp'' is NULL.
ok visa@, florian@
|
|
if_attach() enforces it is properly defined.
|
|
ok florian@, claudio@, bluhm@
|
|
ok florian@, claudio@, visa@, bluhm@
|
|
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@
|
|
currently protected by the NET_LOCK().
They are not accessed in the hot path, so protecting them with a
mutex could be an option. However since we're now going to run
with a NET_LOCK() for some time, assert that it is held.
IPsec is not yet ready to run without KERNEL_LOCK(), so assert it
is held, even in the forwarding path.
Tested by sthen@, ok visa@, claudio@, bluhm@
|
|
ok visa@
|
|
inline function instead of casting it to sockaddr. While there,
use inline instead of __inline for all these conversions. Some
struct sockaddr casts can be avoided completely.
OK dhill@ mpi@
|