Age | Commit message | Author |
|
This seems to help with small slow caches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Keeping a set of pinned batches in userspace is considerably faster as
we can avoid the blit overhead. However, combining the two approaches
yields even greater performance, as fast as without either workaround,
and yet stable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This is to make it easier to extend in future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Ease debugging by allowing all acceleration or render acceleration to be
disabled through AccelMethod:
Option "AccelMethod" "off" -> disable all acceleration
Option "AccelMethod" "blt" -> disable render acceleration (only use BLT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
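
A minimal sketch of how the two option values listed above might be mapped
onto acceleration levels. The enum, the function name parse_accel_method, the
case-sensitive comparison and the "anything else means full acceleration"
fallback are all assumptions made for illustration; none of them is taken
from the driver source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical acceleration levels, for illustration only. */
enum accel_level {
    ACCEL_NONE, /* "off": all acceleration disabled */
    ACCEL_BLT,  /* "blt": render acceleration disabled, BLT only */
    ACCEL_FULL  /* assumed default for any other value */
};

/* Map the AccelMethod option string onto an acceleration level. */
static enum accel_level parse_accel_method(const char *value)
{
    if (value == NULL)
        return ACCEL_FULL;
    if (strcmp(value, "off") == 0)
        return ACCEL_NONE;
    if (strcmp(value, "blt") == 0)
        return ACCEL_BLT;
    return ACCEL_FULL;
}
```

With this sketch, parse_accel_method("blt") yields ACCEL_BLT, and an unset
option falls through to the assumed default.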
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we will undoubtedly flush and sync upon the SHM request very shortly
afterwards, we only want to use the GPU for the SHM upload iff it is
currently busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Split the decision between the case where it is imperative to use the
BLT to avoid TLB misses and the second case where it is merely
preferable to switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Track the most recent ring each bo is executed on, and prefer to keep it
on that ring for the next operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
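
The idea described above can be sketched as follows. The struct layout, the
ring names and both helper functions are hypothetical; the sketch only
illustrates "remember the last ring, prefer it next time" and is not the
driver's actual bookkeeping.

```c
/* Hypothetical ring identifiers; RING_NONE means "never executed". */
enum ring { RING_NONE = -1, RING_RENDER = 0, RING_BLT = 1 };

/* Hypothetical buffer object carrying only the field relevant here. */
struct bo {
    enum ring last_ring; /* ring this bo was most recently executed on */
};

/* Record the ring on which the bo has just been executed. */
static void bo_mark_executed(struct bo *bo, enum ring ring)
{
    bo->last_ring = ring;
}

/*
 * Choose the ring for the next operation: stay on the previous ring if
 * there is one, otherwise use the caller's fallback.
 */
static enum ring bo_choose_ring(const struct bo *bo, enum ring fallback)
{
    return bo->last_ring != RING_NONE ? bo->last_ring : fallback;
}
```

Keeping a bo on one ring avoids the synchronisation that is otherwise needed
when consecutive operations on the same buffer move between rings.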
|
|
Oops, we never managed to reuse the cached location of the target
surface as we entered it into the cache with the wrong key.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
cell_list_alloc() is only called from one place, and the compiler should
already be inlining it - but does not appear to be. Hint harder.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is to reduce the preference for rendering to a SHM pixmap: only
if it is already active will we consider continuing to use it on the
GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
...rather than force the exchange.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In case anyone ever wants to disable the default.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
So that we can prevent feeding back a stale bo when the DRI2 client
tries to swap an old buffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57212
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Further restrict the amount of fenced bo we try to fit into the batch to
make it easier for the kernel to accommodate the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It is always done at the beginning of vertex emission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In case we hit a path where we avoid reusing the source for the mask and
leave is_affine unset for a solid mask.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Makes this 855gm much happier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Must remember, it's octal not decimal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
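
For readers unfamiliar with the pitfall: in C, an integer literal with a
leading 0 is octal. The message does not say which constant was affected, so
the value below is purely an illustrative assumption.

```c
#include <stdio.h>

/* A leading 0 makes a C integer literal octal. */
enum {
    MODE_OCTAL   = 0644, /* octal: 6*64 + 4*8 + 4 = 420 in decimal */
    MODE_DECIMAL = 644   /* decimal 644, a different value entirely */
};

/* Print both constants in decimal to make the difference visible. */
static void print_modes(void)
{
    printf("0644 = %d, 644 = %d\n", MODE_OCTAL, MODE_DECIMAL);
}
```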
|
|
As we may write preparatory instructions into the batch before checking
for a flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Simplify the later checks by always populating the lists with a single,
albeit unpinned, bo in the case we fail to create pinned batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=26345
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the output is unscaled, then we do not require pixel interpolation
(and planar formats are exactly subsampled).
References: https://bugs.freedesktop.org/show_bug.cgi?id=58185
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes the flickering seen in the fishtank demo, for example.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Might be worth caching the last-known-value so we can skip the query for
an old swap request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The DRI2 protocol is inherently racy. Fortuitously, this can be swept
under the carpet by forcing the serialisation between the DRI2 clients
by using a blit for the SwapBuffers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=58005
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As Jesse pointed out, it is legal for the client to request that the
flip be some frame in the future even with no divisor.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If divisor is 0 but the current MSC is behind the target, we shouldn't
schedule a flip (which will occur at the next vblank) or we'll end up
displaying it early and returning the wrong timestamp.
Preserve the optimization though by allowing us to schedule a flip if
both the divisor is 0 and the current MSC is equal to or ahead of the
target; this avoids a round trip through the kernel.
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
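
The condition described above can be written as a small predicate. The
function name and parameter types are hypothetical, MSC wraparound is
ignored, and the sketch assumes that every other case falls back to waiting
for a vblank event.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a flip may be queued straight away: there is no
 * divisor and the current MSC has already reached the target. If the
 * current MSC is still behind the target, flipping now would display
 * the frame early and report the wrong timestamp.
 */
static bool can_flip_immediately(uint64_t current_msc,
                                 uint64_t target_msc,
                                 uint64_t divisor)
{
    return divisor == 0 && current_msc >= target_msc;
}
```

For example, with a divisor of 0, a current MSC of 99 and a target of 100,
the predicate is false and the flip has to be deferred.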
|
|
This can happen naturally for 3-pipe config on Ivybridge or if the
outputs are rearranged whilst we slept. Instead of failing to change the
display on the VT, install at least a fb on the CompatOutput so that
hopefully the DE can take over, or give some control to the user.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oops, I thought the 'busy' bit was now used and apparently forgot it is
used to control the periodic flushing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A compromise between not flushing quickly enough and flushing too often,
hopefully.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we submit a batch early (for example if the GPU is idle), then submit
whatever else the client drew immediately upon completion of its
blockhandler. This is required to prevent flashing due to visible delay
between the clear at the start of the cycle and then the overdraw later.
References: https://bugs.freedesktop.org/show_bug.cgi?id=51718
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56825
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Further experimentation...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The aim is to improve GPU concurrency by keeping it busy. The possible
complication is that we incur more overhead due to small batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|