Age | Commit message | Author |
|
This helps SNB on cairo-traces that utilize lots of temporary uploads
(rasterised sources and masks for instance), but comes at a cost of
regressing others...
In order to counter the regression from increasing the GTT cache size,
the CPU/GTT vma cache are split and accounted separately.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
After reducing the used size in the partial buffer, we need to re-sort
the list so that it stays in decreasing order of available space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
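
A minimal sketch of the re-sort described above, assuming the partial buffers sit in a plain array ordered by decreasing free space. The `partial_buf` struct and `resort_after_shrink` helper are hypothetical names for illustration, not the driver's own types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a partially filled upload buffer. */
struct partial_buf {
    int id;
    size_t avail;   /* bytes still free in this buffer */
};

/*
 * After bufs[i].avail has shrunk, slide the entry towards the tail until
 * the array is once more sorted by decreasing avail.
 * Returns the entry's new index.
 */
static size_t resort_after_shrink(struct partial_buf *bufs, size_t n, size_t i)
{
    assert(i < n);

    struct partial_buf moved = bufs[i];
    while (i + 1 < n && bufs[i + 1].avail > moved.avail) {
        bufs[i] = bufs[i + 1];   /* pull the roomier buffer forward */
        i++;
    }
    bufs[i] = moved;
    return i;
}
```

Only the entry that changed can be out of place, so one pass towards the tail is enough; a full sort is not needed.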
|
|
Allow SandyBridge to specialise its clear routine to reduce the number
of ring switches. It may be interesting to specialise the clear routines
even further and use the special render clear commands...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Optimistically we would replace the GPU damage with the new set of
trapezoids. However, if any partial damage remains, then the next
operation, which is often to composite another layer of trapezoids (for
complex clipmasks) using IN, will stall.
This fixes a regression in firefox-fishbowl (and lesser regressions
throughout the cairo-traces).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Continuing the tuning for sna_copy_boxes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The first, and likely only, goal is to support SHMPixmap efficiently
(and without compromising SHMImage!), which we want to preserve as vmaps
and never create a GPU bo for. For all other use cases, we will want to
create snoopable CPU bo à la the LLC buffers on SandyBridge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We trade off the extra copy in the hope that, as we haven't used the GPU
bo before, we won't need it again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
And update the check for reusing the blit!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Once the buffer is destroyed, it may be reallocated with a new pitch. We
could track handle and pitch, but it is easier to simply restart the
blit after the buffer is freed.
References: https://bugs.freedesktop.org/show_bug.cgi?id=44277
References: https://bugs.freedesktop.org/show_bug.cgi?id=44555
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
One restriction common to all generations is that samplers access pairs
of rows and so we need to pad the buffer to accommodate access to that
second row. Do so unconditionally along paths that may be used by the
render pipeline.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
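
A minimal sketch of the padding described above, assuming that rounding the row count up to an even number is sufficient to cover the sampler's access to the second row of a pair. The helper names `padded_rows` and `padded_size` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Round the number of rows up to the next multiple of two. */
static unsigned padded_rows(unsigned height)
{
    return (height + 1u) & ~1u;
}

/* Bytes to allocate for a buffer of the given pitch (bytes per row). */
static size_t padded_size(unsigned pitch, unsigned height)
{
    assert(pitch > 0);
    return (size_t)pitch * padded_rows(height);
}
```

A buffer with an odd height gains one extra row; an even height is left unchanged.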
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Similar to the action taken in move-to-gpu, so that we forgo the
overhead of damage tracking when the initial act of creation is on the
render paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
By inlining the swizzling of the alpha-channel we can support BLT copies
from an alpha-less pixmap to an alpha-destination.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
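
As a CPU-side illustration of the idea, the sketch below copies 32-bit pixels while forcing the alpha byte to opaque. It assumes an x8r8g8b8 source and an a8r8g8b8 destination; the function name is hypothetical and this is not the BLT command sequence the driver emits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n pixels from an alpha-less source to an alpha destination.
 * The unused top byte of each source pixel is replaced by 0xff so the
 * destination reads as fully opaque.
 */
static void copy_xrgb_to_argb(const uint32_t *src, uint32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

Folding the alpha fixup into the copy avoids a separate pass over the destination.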
|
|
References: https://bugs.freedesktop.org/show_bug.cgi?id=44504
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Strange as it may seem... But the principle of doing less work with
greater locality should help everywhere; it is just not as noticeable
when real work is performed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When the bo is already completely damaged on the CPU, all we need to do
is to sync with the CPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Faking it for the render upload simply isn't good enough, since we need
the correct drawrect.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
With no CPU damage to upload, we know that there is no reason not to use
the GPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we spot that the region is wholly contained within the CPU damage
initially, we can conclude that it is not in the GPU damage without
reduction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
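
A minimal sketch of such an early check, assuming CPU and GPU damage never overlap and that damage is kept as a simple array of boxes. The `box` struct and both helpers are hypothetical; the test is sufficient but not necessary, since a region spanning several damage boxes is reported as not contained:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Axis-aligned box; x2 and y2 are exclusive. */
struct box { int x1, y1, x2, y2; };

static bool box_contains(const struct box *outer, const struct box *inner)
{
    return inner->x1 >= outer->x1 && inner->y1 >= outer->y1 &&
           inner->x2 <= outer->x2 && inner->y2 <= outer->y2;
}

/*
 * Returns true when the region lies inside a single CPU damage box.
 * Under the disjointness assumption the caller may then skip any
 * inspection of the GPU damage.
 */
static bool region_in_cpu_damage(const struct box *cpu, size_t n,
                                 const struct box *region)
{
    assert(cpu != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (box_contains(&cpu[i], region))
            return true;
    return false;
}
```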
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Check if the source and mask are identical pictures and just copy the
source channel to the mask in that case.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For the common case (at least with llc bo) where we are immediately
using an uploaded image from its linear buffer, check upfront before
computing the sampled region for transfer to the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For the common case of glyphs, the pixmap is entirely on the GPU which
can be quickly tested before performing the more complex transformations
to determine how much pixel data we need to upload.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Without xserver support for notification of when scratch pixmaps are
reused, we simply cannot attach our privates to them lest we cause
corruption with SHM pixmaps.
This is a recent regression that traces back to an old, old xserver issue.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44503
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
To aid debugging in conjunction with compositors and their crazy
offsets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Beware the NULL pointer and early dereference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order for the entire PutImage to be performed inplace, we need to
maintain the tendency to keep doing inplace operations. This hint is
provided by tracking whether or not the last operation used the GTT
mapping. However, that hint was not being provided by zpixmap_blt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the operation does not replace existing CPU damage, we are likely to
want to reuse the pixmap again on the CPU, so avoid mixing CPU/GPU
operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
On systems that incur painful overhead for ring switches, it is usually
better to create a large buffer and perform a sparse copy on the same
ring than create a compact buffer and use the BLT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This reverts 281425551bdab7eb38ae167a3205b14ae3599c49 as it was causing
insufferable lag in firefox.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we decide to defer the upload for this instance of the source pixmap,
mark it so. Then if we do use it again we will upload it to a GPU bo and
hopefully reuse those pixels.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the pixmap is entirely within the current CPU damage, we can forgo
reducing either the GPU or CPU damage when checking whether we need to
upload dirty pixels for a source texture.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we think that the operation is better performed on the CPU, avoid the
overhead of manipulating our privates.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As demonstrated with oversized glyphs and a chain of catastrophe, when
attaching our private to a pixmap after creation we need to mark the
entire CPU pixmap as dirty, as we never tracked exactly which bits were
dirtied.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We suspect that glyphs, even large ones, will be reused and so the
deferred upload is counterproductive. Upload them immediately and mark
them as special creatures for later debugging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we explicitly create CPU bo when wanted, we no longer desire to
spontaneously create vmaps for simply uploading to the GPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We need to be careful to copy the boxes in a strict LIFO order so as
to avoid overwriting the last boxes when reusing the array allocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
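
The hazard can be illustrated with an in-place copy where the destination overlaps the source at a higher index. The sketch below is an assumption about the general pattern, not the driver's code; the `box` struct and `shift_boxes_up` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

struct box { int x1, y1, x2, y2; };

/*
 * Move boxes[0..count) to boxes[shift..shift+count) within one array.
 * The caller must guarantee room for count + shift entries.
 * Copying last-to-first means each source box is read before any write
 * can land on it; a first-to-last loop would clobber later boxes.
 */
static void shift_boxes_up(struct box *boxes, size_t count, size_t shift)
{
    assert(boxes != NULL);

    for (size_t i = count; i-- > 0; )
        boxes[i + shift] = boxes[i];
}
```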
|