Age | Commit message (Collapse) | Author |
|
(reading back the relocation).
It doesn't add any real security and when we actually need to map the
buffer on demand to read/write it makes things cripplingly slow. The
correct way to make this utterly incorruptible is a radeon-kms-like
command checker to the command streams. This is on my todo list.
Thanks to drahn@ for additional testing.
|
|
this driver to work on machines with low kva and large apertures.
tested by myself and drahn@
|
|
things that there really isn't a decent api for elsewhere.
Since on recent intel IGPs the gtt aperture is too big (256meg is not
uncommon) to be mapped on a kva-constrained arch like i386, introduce an agp
mapping api that does things depending on arch.
On amd64 which can afford the space (and will use the direct mapping
again soon) just do bus_space_map() on init, then parcel things out
using bus_space_subregion(), thus avoiding map/unmap overhead on every
call (this is how inteldrm does things right now).
On i386, we do bus_space_map() and bus_space_unmap() as appropriate. Linux
has some tricks here involving ``atomic'' maps that are on only one cpu
and that you may not sleep with to avoid the ipi overhead for tlb
flushing. For now we don't go down that route but it is being
considered.
I am also considering if it is worth abstracting this a little more,
improving the api and making it a general MD interface.
Tested by myself on i386 and amd64 and by drahn@ (who has one of the
machines with an aperture that is too big) on i386.
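
To make the trade-off concrete, here is a rough userland sketch of the two strategies. It is a toy model, not the real api: every toy_* name is made up for illustration, a calloc()ed buffer stands in for the aperture, and the counters only model the cost of mapping, not the actual kva use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: none of these names exist in the tree. */
enum toy_strategy { TOY_PREMAP, TOY_ON_DEMAND };

struct toy_aperture {
	unsigned char *backing;		/* stands in for the gtt aperture */
	size_t size;
	enum toy_strategy strategy;
	int map_calls;			/* expensive map operations so far */
	int unmap_calls;		/* expensive unmap operations so far */
};

int
toy_aperture_init(struct toy_aperture *ap, size_t size, enum toy_strategy s)
{
	assert(ap != NULL);
	ap->backing = calloc(1, size);
	if (ap->backing == NULL)
		return (-1);
	ap->size = size;
	ap->strategy = s;
	/* amd64-like: pay for one big mapping up front. */
	ap->map_calls = (s == TOY_PREMAP) ? 1 : 0;
	ap->unmap_calls = 0;
	return (0);
}

/* Returns a pointer to [off, off + len), or NULL if out of range. */
unsigned char *
toy_map(struct toy_aperture *ap, size_t off, size_t len)
{
	if (off > ap->size || len > ap->size - off)
		return (NULL);
	/* i386-like: every call pays for a fresh mapping. */
	if (ap->strategy == TOY_ON_DEMAND)
		ap->map_calls++;
	return (ap->backing + off);
}

void
toy_unmap(struct toy_aperture *ap, unsigned char *va)
{
	(void)va;
	/* The premapped case hands out subregions, so nothing to undo. */
	if (ap->strategy == TOY_ON_DEMAND)
		ap->unmap_calls++;
}

void
toy_aperture_fini(struct toy_aperture *ap)
{
	free(ap->backing);
	ap->backing = NULL;
}
```

Callers use the same toy_map()/toy_unmap() pair either way; only the bookkeeping behind them changes with the strategy, which is the point of hiding the arch difference behind one api.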
|
|
Tested by Jan Stary; thanks!
|
|
Tested by Jan Stary, thanks!
|
|
If userland asks to allocate an object large enough that two that size
could not fit around the pinned objects, disallow it with EFBIG.
This prevents mmaping objects that big and copying between them from
putting the machine into infinite thrashing. With a patch to the ddx (on
my git branch) that allocates a non-accelerated pixmap when it gets that
return code, matthieu@'s huge test image works happily, where before it
DOSed the kernel.
The correct fix would be to fall back to mmaping the backing pages for
objects that big (radeondrm will need such ability anyway). This however
is a lot more complicated and I am still working out how to do it
correctly hence this commit for now.
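
A minimal sketch of the size check described above, under simplifying assumptions: toy_check_object_size() and its parameters are hypothetical, and the pinned objects are treated as one lump of bytes rather than as holes scattered through the aperture, so fragmentation is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the driver's code. Returns 0 if two objects
 * of `size` bytes can sit in the unpinned part of the aperture at the
 * same time, EFBIG otherwise.
 */
int
toy_check_object_size(size_t size, size_t aperture, size_t pinned)
{
	size_t avail;

	assert(pinned <= aperture);
	avail = aperture - pinned;
	/* Compare against half the space so that size * 2 cannot overflow. */
	if (size > avail / 2)
		return (EFBIG);
	return (0);
}
```

With a 256-byte toy aperture and 56 bytes pinned, a 100-byte object is accepted and a 101-byte one is refused.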
|
|
We need a proper MI api for doing this (one which will fall back to
mtrrs if PAT is not available would be best), but for now this allows
inteldrm to use PAT if available. Big fat XXX mentioning the need for a
real api.
ok kettenis@, tedu@
|
|
ok damien@
|
|
tsleep'ing (for example waiting for the firmware to become alive)
in iwn_init.
I believe this might fix a crash reported by dhill@
This is a temporary fix until I find something better that I will
apply to my other drivers that can tsleep in if_init (wpi, run etc...)
|
|
alternatives in a same image.
|
|
places in the tree need to be touched to update the object
initialisation with respect to that.
So, make a function (uvm_initobj) that takes the refcount, object and
pager ops and does this initialisation for us. This should save on
maintenance in the future.
looked good to fgs@. Tedu complained about the British spelling but OKed
it anyway.
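
For illustration, the shape of such an initialisation helper might look like the sketch below. The struct layouts and the toy_* names are assumptions made up for this example; the real uvm object has more fields and the real function may order its arguments differently.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real uvm structures are richer than this. */
struct toy_pagerops {
	const char *name;
};

struct toy_object {
	const struct toy_pagerops *pgops;	/* pager operations */
	int refs;				/* reference count */
	int npages;				/* resident pages */
};

/* One place that knows how a fresh object has to be set up. */
void
toy_initobj(struct toy_object *obj, const struct toy_pagerops *pgops, int refs)
{
	assert(obj != NULL);
	obj->pgops = pgops;
	obj->refs = refs;
	obj->npages = 0;
}
```

If a field is later added to the object, only the helper needs to learn about it rather than every caller.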
|
|
enabled (it is not enabled yet.)
|
|
actually do so).
|
|
- pmap_kremove takes a va and a size, not a va range (unlike pmap_remove,
that gratuitous difference is nothing if not annoying).
- fix a memory leak of the bit 17 bitstring.
- fix the offset calculation when iterating through the dma segments.
Tested by Brandon Mercer, his machine now seems to be rock solid.
Remember kids, if a code path has not been tested fully, it does not work!
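
The first item is easy to get wrong, so here is a toy illustration of the two calling conventions. The toy_* functions are hypothetical and only count pages; they do not touch any page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u

/* Range style, like pmap_remove: operates on [sva, eva). */
size_t
toy_remove_range(uintptr_t sva, uintptr_t eva)
{
	assert(eva >= sva);
	return ((eva - sva) / TOY_PAGE_SIZE);
}

/* Size style, like pmap_kremove: operates on [va, va + len). */
size_t
toy_kremove(uintptr_t va, size_t len)
{
	(void)va;
	return (len / TOY_PAGE_SIZE);
}
```

Passing an end address where a length is expected still compiles, but for a two page mapping at 0x10000 the size-style call would then cover 0x12000 bytes (18 pages) instead of 0x2000 bytes (2 pages).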
|
|
This was causing swizzling on intel IGDs that do bit 17 swizzling when not
needed. Thanks to Brandon Mercer for testing.
|
|
pointed out by Clang static analyser.
|
|
the one changed.
|
|
while (condition) {
do_stuff()
increment_condition /* this was missing */
}
To a for loop like it always should have been. I have no idea what I was
smoking when I wrote this function.
Fixes the crash on hardware that does bit 17 swizzling (turns out the
three I know of are all 945s) as soon as we first unbind an object.
Thank you very much to Brandon Mercer for actually managing to get me a
crash dump so I could debug this, and also for testing the fix.
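
As a generic illustration of why the for loop is the safer form (this is not the function that was fixed, and toy_count_flagged() is a made-up name), the increment sits in the loop header, where it cannot be forgotten:

```c
#include <assert.h>
#include <stddef.h>

/* Counts the non-zero entries in flags[0..n). */
size_t
toy_count_flagged(const unsigned char *flags, size_t n)
{
	size_t i, hits = 0;

	assert(flags != NULL || n == 0);
	/* The increment lives in the loop header, not in the body. */
	for (i = 0; i < n; i++) {
		if (flags[i])
			hits++;
	}
	return (hits);
}
```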
|
|
libdrm bug recently.
Correct to what was intended.
|
|
accessed by the gpu or needing a flush). Since this implies that the object is
wanted, emit the flush then to save time.
Makes things a lot smoother than before in some GL applications, since
before we were claiming that objects needing a flush were unbusy so the
next map stalled the gpu waiting on a flush.
From Daniel Vetter on intel-gfx.
|
|
conflicts.
|
|
If we just need read access to some data that has been accessed by the gpu,
only sleep until the end of the gpu's last write (which we track). So
instead of stalling the gpu until the last time accessed, both can read
at the same time (which is allowed and coherent as long as the right
invalidation happens).
Since we check offsets from userland before we exec a batchbuffer, this
helps 965 (with lots of read only relocations in the render path) quite
a lot.
|
|
Before, as well as being kinda nasty, there was a very definite race: if
the last reference to an object was removed by uvm (a map going away),
then the free path happened unlocked, which could cause all kinds of
havoc.
In order to deal with this, move to fine-grained locking. Since uvm
object locks are spinlocks, and we need to sleep in operations that will
wait on the gpu, provide a DRM_BUSY flag that is set on a locked object
that then allows us to unlock and sleep (this is similar to several
things done in uvm on pages and some object types).
The rwlock stays around to ensure that execbuffer can have access to the
whole gtt, so ioctls that bind to the gtt need a read lock, and
execbuffer gets a write lock. Otherwise most ioctls just need to busy the
object that they operate on. Lists also have their own locks.
Some cleanup could be done to make this a little prettier, but it is
much more correct than previously.
Tested very very vigorously on 855 (x40) and 965 (x61s), this found numerous
bugs. Also, I can no longer crash the kernel at will.
A bunch of asserts hidden under DRMLOCKDEBUG have been left in the code for
debugging purposes.
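
A rough sketch of the busy-flag pattern, with the spinlock and the actual sleep/wakeup left out so that only the flag handling is shown. TOY_WANTED and the function names are assumptions for the example, modelled on the way uvm marks pages, and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUSY	0x1	/* someone holds the object */
#define TOY_WANTED	0x2	/* someone is waiting for it (assumed flag) */

struct toy_obj {
	unsigned int flags;
};

/*
 * Would be called with the object's spinlock held. Returns 1 if the
 * caller now owns the object, or 0 if the caller has to drop the
 * spinlock, sleep and retry.
 */
int
toy_hold(struct toy_obj *obj)
{
	assert(obj != NULL);
	if (obj->flags & TOY_BUSY) {
		obj->flags |= TOY_WANTED;
		return (0);
	}
	obj->flags |= TOY_BUSY;
	return (1);
}

/* Releases the object. Returns 1 if a sleeper should be woken up. */
int
toy_unhold(struct toy_obj *obj)
{
	int wake;

	assert(obj != NULL && (obj->flags & TOY_BUSY));
	wake = (obj->flags & TOY_WANTED) != 0;
	obj->flags &= ~(TOY_BUSY | TOY_WANTED);
	return (wake);
}
```

Once the flag is set the owner may drop the spinlock and sleep on the gpu, since the flag rather than the lock is what keeps everyone else off the object.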
|
|
these maps tend to be fairly long lived so it buys us nothing other than
code complexity.
|
|
Since this means the necessary gtt alignment may change. Nothing did
this already, so all it does is allow the code to be simpler.
idea from Daniel Vetter.
|
|
interrupt handler. So the locking and spl manipulation can simply go
away.
ok deraadt@, oga@
|
|
that kills gtt mappings.
In both of these cases we want all writes to hit the bus before we do
whatever we're about to do.
Doesn't solve any problems that I know of but it may help.
|
|
When we disable tiling (for example whenever we free an object to our
userland cache), we stall the gpu so that we can get rid of the fence
register covering its bit of the gtt.
Instead, mark it as invalid and then free it on next use, leading to
less of a gpu stall if any. Leads to some slight performance improvement
on 8xx, 91x and 94x chipsets which are fence constrained.
|
|
tag and compare the individual components.
|
|
VGA device can be active, and is responsible for routing IO to the active VGA
device. Processes can use the new PCIOC_GETVGA and PCIOC_SETVGA ioctls
to manipulate the VGA arbiter.
ok deraadt@, oga@
|
|
ok deraadt@
|
|
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt
|
|
found by clang static analyser
|
|
found by clang static analyser.
|
|
Due to the messy context setup code this was breaking ipv6 forwarding
when ipv4 offloading was enabled. All checksum offloading remains
disabled for now.
Debugged with and ok claudio@
|
|
we shouldn't be touching the config space at all here.
Found by the clang static analyser.
ok deraadt
|
|
index
- we only care about unsolicited responses from the codec we're using
- no need to enable unsolicited responses until we know which codec
we're using
fixes crash reported by jacekm@
|
|
The spec says this bit should always be set. It can help resolve
hardware deadlocks where a unit downstream of the VS is waiting for more
input, the VS has one vertex queued up but not dispatched because it
hopes to get one more vertex so it can dispatch a 2x4 block, and software
isn't handing any more vertices due to waiting on rendering.
|