Age | Commit message | Author |
|
As it is advisable to combine the synchronous rendering debug option
with other debugging options, it is more convenient to make it into a
configure option: --enable-debug=sync
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This is handy for the case where the batch triggers a GPU hang rather
than being rejected by the kernel.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We can replace the custom heuristics for PutImage by applying them to
the common path, where hopefully they are equally valid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As these do not flush the active state if we have read-read mappings, we
need to be careful with our asserts concerning the busy flag.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we may fail the size check with an empty batch and a pair of large
bo, we need to check before submitting that batch in order to not run
afoul of our internal sanity checks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
So that in the case where we are driving multiple independent screens
each having their own device, we do not share the global reserved
request in the event of an allocation failure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In the case where the kernel is inserting semaphores to serialise work
between rings, we want to only delay the surface that is coming from the
other ring and not interfere with work already queued.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The asm() prevents SNA from compiling on ia64.
Fixes https://bugs.gentoo.org/show_bug.cgi?id=448570
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The exported function is not used, so mark it static and strengthen the
assertions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Avoid having to walk the full relocation array for the few entries that
need to be updated for the batch buffer offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
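
One way to picture the idea above is a small side list that remembers which relocation entries refer to the batch itself, so only those are patched at submit time. This is a minimal sketch under stated assumptions: the struct layout, the fixed array limits and the names `batch_add_reloc` and `batch_fixup_self` are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RELOCS 64 /* hypothetical limit */
#define MAX_SELF    8 /* hypothetical limit */

/* Simplified stand-in for a relocation entry. */
struct reloc {
	uint32_t target_handle;
	uint64_t delta;
};

struct batch {
	struct reloc relocs[MAX_RELOCS];
	int nreloc;
	int self[MAX_SELF]; /* indices of relocs that point into the batch itself */
	int nself;
};

/* Append a relocation; remember its index if it targets the batch buffer.
 * Returns the index of the new entry, or -1 if either array is full. */
static int batch_add_reloc(struct batch *b, uint32_t handle,
			   uint64_t delta, int is_self)
{
	if (b->nreloc == MAX_RELOCS)
		return -1;
	if (is_self) {
		if (b->nself == MAX_SELF)
			return -1;
		b->self[b->nself++] = b->nreloc;
	}
	b->relocs[b->nreloc].target_handle = handle;
	b->relocs[b->nreloc].delta = delta;
	return b->nreloc++;
}

/* Patch only the remembered entries instead of walking every relocation. */
static void batch_fixup_self(struct batch *b, uint32_t batch_handle,
			     uint64_t batch_offset)
{
	int i;

	for (i = 0; i < b->nself; i++) {
		struct reloc *r = &b->relocs[b->self[i]];

		r->target_handle = batch_handle;
		r->delta += batch_offset;
	}
}
```

The cost of the fixup then scales with the number of self-referencing entries rather than with the total number of relocations.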
|
|
Having reduced all the vb code for these generations to the same set of
routines, we can refactor them into a single set of functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We only use a single cache for very large buffers, so we need to be
careful that we set the tiling on them. Moreover, we need to take extra
care when allocating large CPU bo from that cache to be sure that they
are untiled and the flags are true.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Now that the feature has been committed upstream, we can rely on the
runtime detection.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This seems to help with small slow caches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Keeping a set of pinned batches in userspace is considerably faster as
we can avoid the blit overhead. However, combining the two approaches
yields even greater performance, as fast as without either workaround, and yet
stable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This is to make it easier to extend in future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Track the most recent ring each bo is executed on, and prefer to keep it
on that ring for the next operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
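
The ring preference described above could look roughly like the following. It is only an illustrative sketch: the `ring` enum, the `bo` struct and both function names are hypothetical, and the real driver tracks considerably more state per buffer.

```c
#include <assert.h>

/* Hypothetical ring identifiers; RING_NONE means "never executed". */
enum ring {
	RING_NONE = 0,
	RING_RENDER,
	RING_BLT
};

/* Simplified buffer object that remembers the last ring it ran on. */
struct bo {
	enum ring last_ring;
};

/* Prefer the ring the bo last executed on; otherwise use the caller's choice. */
static enum ring choose_ring(const struct bo *bo, enum ring wanted)
{
	return bo->last_ring != RING_NONE ? bo->last_ring : wanted;
}

/* Record the ring once the operation has been queued. */
static void mark_executed(struct bo *bo, enum ring ring)
{
	bo->last_ring = ring;
}
```

Keeping a buffer on the ring it last used avoids handing it from one ring to another between consecutive operations.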
|
|
Further restrict the amount of fenced bo we try to fit into the batch to
make it easier for the kernel to accommodate the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Makes this 855gm much happier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Must remember, it's octal not decimal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
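
For reference, a leading zero is what makes an integer literal octal in C, so dropping or adding it silently changes the value. The commit does not say which constant was affected; the function names and the permission-style value below are hypothetical and chosen only to illustrate the pitfall.

```c
#include <assert.h>

/* Octal literal: 6*64 + 4*8 + 4 = 420 in decimal. */
static unsigned int example_octal(void)
{
	return 0644;
}

/* Decimal literal with the same digits: 644 in decimal is 01204 in octal. */
static unsigned int example_decimal(void)
{
	return 644;
}
```

The two functions return different bit patterns even though the digits look the same at a glance.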
|
|
Simplify the later checks by always populating the lists with a single,
albeit unpinned, bo in the case we fail to create pinned batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=26345
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oops, I thought the 'busy' bit was now unused and apparently forgot it is
used to control the periodic flushing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A compromise between not flushing quickly enough and flushing too often,
hopefully.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we submit a batch early (for example if the GPU is idle), then submit
whatever else the client drew immediately upon completion of its
blockhandler. This is required to prevent flashing due to visible delay
between the clear at the start of the cycle and then the overdraw later.
References: https://bugs.freedesktop.org/show_bug.cgi?id=51718
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Further experimentation...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The aim is to improve GPU concurrency by keeping it busy. The possible
complication is that we incur more overhead due to small batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Previously, before every operation we would look to see if the GPU was
idle and we were running under a DRI compositor. If the GPU was idle, we
would flush the batch in the hope that we reduce the cost of the context
switch and copy from the compositor (by completing the work earlier).
However, we would complete the work far too early and as a result
would need to flush the batch before every single operation resulting in
extra overhead and reduced performance. For example, the gtkperf
circles benchmark under gnome-shell/compiz would be 2x slower on
Ivybridge.
Reported-by: Michael Larabel <michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|