Age | Commit message (Collapse) | Author |
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Should match the functionality of the earlier generations, but untuned.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Confusing gcc with different flags for supposedly inlined functions is
not a good idea.
References: https://bugs.freedesktop.org/show_bug.cgi?id=62198
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The idea is to implement more fine-grained checks as we may want
different heuristics for desktops with GT1s than for mobile GT2s, etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Having reduced all the vb code for these generations to the same set of
routines, we can refactor them into a single set of functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is to reduce the preference of rendering to a SHM pixmap - only
if it is already active, will we consider continuing to use it on the
GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
References: https://bugs.freedesktop.org/show_bug.cgi?id=56591
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
And assert that the box is valid when migrating.
References: https://bugs.freedesktop.org/show_bug.cgi?id=56591
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Be careful when cutting and pasting assertions!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
References: https://bugs.freedesktop.org/show_bug.cgi?id=55700
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When computing the active region with of a composite operation with
unknown extents we try to simply use the whole Drawable. However, this
needs to be clipped otherwise it may trigger assertion failure with an
offscreen pixmap.
References: https://bugs.freedesktop.org/show_bug.cgi?id=55164
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The only real user now has its own heuristics, so convert the remaining
users over to !is_gpu().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Bring the code uptodate with both kernel interface changes and internal
adjustments following the creation of CPU buffers with set-cacheing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Even if the pixmap is entirely damaged on the CPU, we still may be in
the process of transferring it and so cause an unwanted stall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The unwanted ';' caused is_cpu() to always return false if a GPU bo was
attached. Not necessary a bad thing, just misses the potential
optimisation where having chosen to prefer to use the CPU path we then
have to migrate to the GPU even though the bo is undamaged or idle.
Spotted-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is cheaply spot a simple copy operation that can be performed
on the CPU without having to load both parties onto the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Take in account busyness of the damaged GPU bo for considering placement
of the subsequent operations. In particular, note that is_cpu is only
used for when we feel like the following operation would be better on
the CPU and just want to confirm that doing so will not stall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The upload proxy is a fake buffer that we do not want to render to as
then the damage tracking become extremely confused and the buffer it
self is not optimised for persistent rendering. We assert that we do not
use it as a render target, and this patch adds the check so that we
avoid treating the proxy as a valid target when choosing the render
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The shaders treat colours as an argb value, however the clear color is
stored in the pixmap's native format (a8, r5g6b5, x8r8g8b8 etc). So
before using the value of the clear color as a solid we need to convert
it into the a8r8g8b8 format.
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48204
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since the removal of the ability to create a backing pixmap after the
creation of its parent, it no longer becomes practical to attempt
rendering with the GPU to unattached pixmaps. So having made the
decision never to render to that pixmap, perform the test explicitly
along the render paths.
This fixes a segmentation fault introduced in 8a303f195 (sna: Remove
existing damage before overwriting with a composite op) which assumed
the existence of a backing pixmap along a render path.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47700
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As if we try to perform the operation with outstanding operations on the
source pixmaps, we will stall waiting for them to complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As the GATT is irrespective of actual RAM size, we need to be careful
not to be too generous when allocating GPU bo and their shadows. So
first of all we limit default render targets to those small enough to
fit comfortably in RAM alongside others, and secondly we try to only
keep a single copy of large objects in memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the source is not attached to a buffer (be it a GPU bo or a CPU bo),
a temporary upload buffer would be required and so it is not worth
forcing the target to the destination in that case (should the target
not be on the GPU already).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Initially, the batch->mode was only set upon an actual mode switch,
batch submission would not reset the mode. However, to facilitate fast
ring switching with semaphores, reseting the mode upon batch submission
is desired which means that if we submit the batch in the middle of an
operation we must redeclare its mode before continuing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In the common case, we expect a very small number of vertices which will
fit into the batch along with the commands. However, in full flow we
overflow the on-stack buffer and likely several continuation buffers.
Streaming those straight into the GTT seems like a good idea, with the
usual caveats over aperture pressure. (Since these are linear we could
use snoopable bo for the architectures that support such for vertex
buffers and if we had kernel support.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This includes the condition where the pixmap is too large, as well as
being too small, to be allocatable on the GPU. It is only a hint set
during creation, and may be overridden if required.
This fixes the regression in ocitysmap which decided to render glyphs
into a GPU mask for a destination that does not fit into the aperture.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Avoid the function call overhead by inspecting the low bit to see if it
is all-damaged already.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Verified on real hw, this undocumented (at least in the bspec before me)
bug truly exists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
damage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Even if we don't know the extents of the render operation, if the entire
pixmap is damaged we can still reduce the damage tracking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A missing check before emitting a dword into the batch opened up the
possibility of overflowing the batch and corrupting our state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Replace the growing bitfield with an enum marking where it was last
used.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to avoid having to perform a copy of the cacheable buffer into
GPU space, we can map a bo as cacheable and write directly to its
contents. This is only a win on systems that can avoid the clflush, and
also we have to go to greater measures to avoid unnecessary
serialisation upon that CPU bo. Sadly, we do not yet go to enough length
to avoid negatively impacting ShmPutImage, but that does not appear to
be a artefact of stalling upon a CPU buffer.
Note, LLC is a SandyBridge feature enabled by default in kernel 3.1 and
later. In time, we should be able to expose similar support for
snoopable buffers for other generations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Try to avoid a few more unnecessary context switches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Performance of this lazy interface looks inconclusive:
Speedups
========
xlib swfdec-giant-steps 1063.56 -> 710.68: 1.50x speedup
xlib firefox-asteroids 3612.55 -> 3012.58: 1.20x speedup
xlib firefox-canvas-alpha 15837.62 -> 13442.98: 1.18x speedup
xlib ocitysmap 1106.35 -> 970.66: 1.14x speedup
xlib firefox-canvas 33140.27) -> 30616.08: 1.08x speedup
xlib poppler 629.97 -> 585.95: 1.08x speedup
xlib firefox-talos-gfx 2754.37 -> 2562.00: 1.08x speedup
Slowdowns
=========
xlib gvim 1363.16 -> 1439.64: 1.06x slowdown
xlib midori-zoomed 758.48 -> 904.37: 1.19x slowdown
xlib firefox-fishbowl 22068.29 -> 26547.84: 1.20x slowdown
xlib firefox-planet-gnome 2995.96 -> 4231.44: 1.41x slowdown
It remains off and a curiosity for the time being.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This is to workaround a ping-pong issue involving small icons. The
horrible sequence of operations appears to use a tiled FillRect to copy
from the scanout onto to a temporary pixmap, which causes us to
readback from the scanout. We are destined to hit the fallback path there
anyway until we implement stippling...
References: https://bugs.freedesktop.org/show_bug.cgi?id=41718
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|