Age | Commit message | Author |
|
The basis for the constraints is what we can map into the aperture for
direct writing with the CPU, so use the size of the mappable region
rather than the size of the whole GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
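A minimal sketch of the idea, assuming a hypothetical `struct gtt_info` and helper names that are not the driver's code: the upper bound on an object we intend to write directly with the CPU is derived from the mappable aperture, never from the total GTT. The quarter-of-the-aperture headroom is an illustrative assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the GTT; field names are illustrative only. */
struct gtt_info {
	uint64_t total_size;    /* size of the whole GTT */
	uint64_t mappable_size; /* CPU-visible part (the aperture) */
};

/* Hypothetical limit: allow one object at most a quarter of the
 * mappable aperture. The basis is mappable_size, not total_size,
 * because a direct CPU write needs the object mapped through the
 * aperture. */
static uint64_t max_cpu_write_size(const struct gtt_info *gtt)
{
	return gtt->mappable_size / 4;
}

static bool fits_for_cpu_write(const struct gtt_info *gtt, uint64_t size)
{
	return size <= max_cpu_write_size(gtt);
}
```

With a 2 GiB GTT but only a 256 MiB aperture, a 128 MiB object is rejected here even though it would fit comfortably in the GTT.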
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to capture and reuse all io buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Daniel claims that this will be faster, or at least will be once he has
completed rewriting pwrite!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Covering the whole pixmap means the sample spans both the full width and
the full height, not just one of them!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
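The slip described above is the familiar `||` versus `&&` mix-up. A sketch with a hypothetical `struct box` and function names (not the driver's code) showing the buggy and the fixed test side by side:

```c
#include <stdbool.h>

/* Hypothetical box type: [x1, x2) x [y1, y2) in pixmap coordinates. */
struct box {
	int x1, y1, x2, y2;
};

/* Buggy form, for comparison: accepts a sample spanning only the full
 * width OR only the full height. */
static bool covers_whole_pixmap_buggy(const struct box *b, int width, int height)
{
	return (b->x1 <= 0 && b->x2 >= width) ||
	       (b->y1 <= 0 && b->y2 >= height);
}

/* Fixed form: the sample must span both dimensions. */
static bool covers_whole_pixmap(const struct box *b, int width, int height)
{
	return b->x1 <= 0 && b->x2 >= width &&
	       b->y1 <= 0 && b->y2 >= height;
}
```

A full-width strip 10 pixels tall passes the buggy test but not the fixed one.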
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we change contexts, then we will submit the batch, obsoleting the
earlier resource checks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
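A sketch of the ordering this implies, assuming a hypothetical `struct batch` and helpers rather than the driver's own: switch context first, and only then check resources, since the switch may already have submitted (and emptied) the batch.

```c
#include <stdbool.h>

/* Hypothetical ring modes and batch state, for illustration only. */
enum ring_mode { MODE_NONE, MODE_RENDER, MODE_BLT };

struct batch {
	enum ring_mode mode;
	int used;      /* dwords already emitted into the batch */
	int capacity;  /* total dwords available */
	int submits;   /* number of submissions so far */
};

static void batch_submit(struct batch *b)
{
	b->used = 0;
	b->submits++;
}

/* Changing mode submits any pending work for the old mode. */
static void batch_set_mode(struct batch *b, enum ring_mode mode)
{
	if (b->mode != mode && b->used)
		batch_submit(b);
	b->mode = mode;
}

static bool batch_has_space(const struct batch *b, int need)
{
	return b->used + need <= b->capacity;
}

/* Switch first, then check: a check made before the switch would
 * describe a batch that has since been submitted. */
static bool batch_prepare(struct batch *b, enum ring_mode mode, int need)
{
	batch_set_mode(b, mode);
	if (!batch_has_space(b, need))
		batch_submit(b);
	return batch_has_space(b, need);
}
```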
|
|
Benchmarking on the current code base says this is now a win. This
reverses older benchmarks, so expect further tuning.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reduce the calls to compute the surface size down to one.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
i.e. only force the BLT if using the sampler is going to be incredibly
slow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal of the optimisation is to discard the GPU bo early, so we
can skip the extra damage reduction if there is no GPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Once again, experiment with smaller untiled buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We want to avoid reducing the tiling mode (when reusing an active
untiled buffer in preference to creating a new one) for a wide buffer
when doing so would force a TLB miss on each sample.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
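A sketch of such a check, assuming hypothetical names and a one-page-per-row threshold that is not taken from the driver: with a linear layout, every row of a buffer whose pitch is at least a page lies on a different page, so walking down a column touches a new page (and likely misses the TLB) on every sample.

```c
#include <stdbool.h>

/* Hypothetical tiling modes and page size, for illustration. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };
#define SKETCH_PAGE_SIZE 4096

/* May an active untiled buffer stand in for a requested tiled one?
 * Only if rows are narrower than a page; the threshold is an
 * assumption, not the driver's heuristic. */
static bool may_reuse_untiled(enum tiling requested, int pitch_bytes)
{
	if (requested == TILING_NONE)
		return true; /* no tiling was asked for, nothing is reduced */
	return pitch_bytes < SKETCH_PAGE_SIZE;
}
```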
|
|
As we handle tiled spans indirectly, we need to avoid applying the
drawable offsets twice (once in the mi layer generating the spans, and
then once more in the tiled rect renderer).
Reported-by: Ulrich Müller <ulm@gentoo.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43245
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
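A sketch of where the offset belongs, using hypothetical types and function names (not the driver's): the mi layer adds the drawable origin when it generates a span, so the indirect tiled-rect path must treat the span as already offset.

```c
/* Hypothetical types for illustration. */
struct drawable { int x, y; };   /* drawable origin within its pixmap */
struct span { int x, y, width; };
struct box { int x1, y1, x2, y2; };

/* The mi layer converts drawable-relative coordinates into pixmap
 * coordinates when it generates spans. */
static struct span mi_make_span(const struct drawable *d, int x, int y, int width)
{
	struct span s = { x + d->x, y + d->y, width };
	return s;
}

/* The tiled-rect renderer receives those spans indirectly, so it uses
 * them as-is; adding d->x/d->y again here would be the double offset. */
static struct box span_to_box(const struct span *s)
{
	struct box b = { s->x, s->y, s->x + s->width, s->y + 1 };
	return b;
}
```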
|
|
The cost of the TLB miss on every sample far outweighs the impact of the
context (and ring) switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
There are many operations, usually the core drawing acceleration, where
the BLT is much preferable to using the CPU. However, the BLT is
limited to X-tiling, so if we encounter a Y-tiled pixmap target we
need to recreate it as X-tiled before proceeding. Hopefully, the pixmap
is then kept around and rendered to multiple times to amortize the
cost of the copy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
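A sketch of the bookkeeping, with hypothetical types and names and the actual copy stubbed out: a Y-tiled target is recreated once as X-tiled, and later BLT operations on the same pixmap pay nothing extra, which is where the amortization comes from.

```c
#include <stdbool.h>

/* Hypothetical tiling modes and pixmap state, for illustration. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

struct pixmap {
	enum tiling tiling;
	int copies; /* how many times the bo had to be recreated */
};

/* Per the message, the BLT cannot use Y-tiling. */
static bool blt_supports(enum tiling t)
{
	return t != TILING_Y;
}

/* Recreate a Y-tiled target as X-tiled before using the BLT. */
static void ensure_blt_target(struct pixmap *p)
{
	if (blt_supports(p->tiling))
		return;
	/* ...allocate an X-tiled bo, copy the contents, swap it in... */
	p->tiling = TILING_X;
	p->copies++;
}
```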
|
|
Y-tiling is slightly faster for RENDER operations, so attempt to
allocate source-only pixmaps using this tiling mode. Actually using
Y-tiling is a delicate balance because it then prevents the use of the
BLT. For instance, enabling Y-tiling by default gives a 30% performance
improvement on the fish-demo (compositing benchmark) at 2560x1440 on
Ironlake but regresses tiger-demo (spans benchmark) by 2x.
So experiment with this compromise and allow for changing the default
tiling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
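A sketch of the compromise, assuming a hypothetical `default_source_tiling` tunable (the real option name and mechanism are not given in the message): source-only pixmaps take the tunable default, while anything that may become a BLT target stays X-tiled.

```c
/* Hypothetical tiling modes and usage hints, for illustration. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };
enum pixmap_usage { USAGE_SOURCE_ONLY, USAGE_TARGET };

/* Hypothetical tunable standing in for "the default tiling". */
static enum tiling default_source_tiling = TILING_Y;

/* Source-only pixmaps get the default (Y, for faster RENDER sampling);
 * potential BLT targets keep X-tiling. */
static enum tiling choose_tiling(enum pixmap_usage usage)
{
	return usage == USAGE_SOURCE_ONLY ? default_source_tiling : TILING_X;
}
```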
|
|
Take advantage of a couple of new instructions introduced with Cantiga
to reduce the instruction count inside the shaders and improve
performance by around 10% in the fish-demo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is to keep running until the tick after everything stops,
irrespective of forced flushes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
And so hopefully make it clearer. In the process we restore the flushing
behaviour for UXA back to before the glamor intervention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reported-by: Roman Jarosz <kedgedev@gmail.com>
Reported-by: da_fox@mad.scientist.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43134
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to work around a bug in the shaders on gen4, we need to flush
the pipeline after every rectangle. The recently introduced fill-one
mechanism for gen4 missed this vital step, triggering a random hang with
an otherwise sane batchbuffer (the missing flush is hard to spot!).
Fixes regression from 86f99379ee5 (sna/gen4: Add fill-one).
Reported-by: Albert Damen <albrt@gmx.net>
Reported-by: Fryderyk Dziarmagowski <fdziarmagowski@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43083
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
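One way to make that flush hard to forget, sketched with placeholder command values and hypothetical helpers (not real gen4 opcodes, and not necessarily how the fix was written): route every rectangle, including the fill-one path, through a single emitter that always appends the flush.

```c
#include <stdint.h>

/* Placeholder command values; these are NOT real gen4 opcodes. */
#define CMD_RECT  0xA0000000u
#define CMD_FLUSH 0xF0000000u

struct batch {
	uint32_t cmd[64];
	int n;
};

static void emit(struct batch *b, uint32_t v)
{
	if (b->n < 64)
		b->cmd[b->n++] = v;
}

/* Single helper for every rectangle: the workaround needs a pipeline
 * flush after each one, so it lives here and nowhere else. */
static void gen4_emit_rect(struct batch *b, int16_t x, int16_t y, int16_t w, int16_t h)
{
	emit(b, CMD_RECT);
	emit(b, (uint32_t)(uint16_t)y << 16 | (uint16_t)x);
	emit(b, (uint32_t)(uint16_t)h << 16 | (uint16_t)w);
	emit(b, CMD_FLUSH);
}

/* The fill-one path reuses the helper rather than open-coding the rect. */
static void gen4_fill_one(struct batch *b, int16_t x, int16_t y, int16_t w, int16_t h)
{
	gen4_emit_rect(b, x, y, w, h);
}
```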
|
|
Unroll the byte reversal as we know the row length is word aligned.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
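A sketch of the unrolled loop, assuming "byte reversal" means reversing the bit order within each byte (as when converting 1bpp bitmap data between bit orders). The shift-and-mask reversal and the function names are illustrative, not necessarily the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within one byte (MSB-first <-> LSB-first). */
static uint8_t byte_reverse(uint8_t b)
{
	b = (uint8_t)((b & 0xF0) >> 4 | (b & 0x0F) << 4);
	b = (uint8_t)((b & 0xCC) >> 2 | (b & 0x33) << 2);
	b = (uint8_t)((b & 0xAA) >> 1 | (b & 0x55) << 1);
	return b;
}

/* Convert one row. Because the row length is a multiple of 4 bytes,
 * each iteration handles a whole word and no tail loop is needed. */
static void reverse_row(uint8_t *dst, const uint8_t *src, size_t len)
{
	assert(len % 4 == 0);
	for (size_t i = 0; i < len; i += 4) {
		dst[i + 0] = byte_reverse(src[i + 0]);
		dst[i + 1] = byte_reverse(src[i + 1]);
		dst[i + 2] = byte_reverse(src[i + 2]);
		dst[i + 3] = byte_reverse(src[i + 3]);
	}
}
```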
|
|
Fixes x11perf -wdcircle100 -time 1 -repeat 1 -rop GXxor
Reported-by: Fryderyk Dziarmagowski <fdziarmagowski@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43084
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This commit only enables glamor for two functions,
uxa_fill_spans and uxa_poly_fill_rects.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Added a configuration option, --enable-glamor, to control
whether to use glamor. Added a new file, intel_glamor.c, to
wrap the glamor EGL API for the intel driver's use.
This commit doesn't really change the driver's control path.
It just adds the necessary files for glamor and changes some
configuration.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Integrate glamor acceleration into UXA framework. Add
necessary flushing at the following points:
1. Flush UXA batch buffer before call into glamor.
2. Flush GL operations after return from a glamor function.
3. Wherever we need to flush the UXA batch buffer, we also
need to flush GL operations, for example in
intel_flush_callback and a couple of places in intel_display.c.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
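The three flush points can be sketched as an ordering rule. Every name below is hypothetical (an event log stands in for real UXA and GL flushes) and is not the driver's API.

```c
/* Hypothetical event log used to show the ordering. */
enum event { EV_UXA_FLUSH, EV_GLAMOR_OP, EV_GL_FLUSH };

struct ctx {
	enum event log[16];
	int n;
};

static void record(struct ctx *c, enum event e)
{
	if (c->n < 16)
		c->log[c->n++] = e;
}

static void uxa_batch_flush(struct ctx *c) { record(c, EV_UXA_FLUSH); }
static void gl_flush(struct ctx *c) { record(c, EV_GL_FLUSH); }

/* Points 1 and 2: flush UXA before entering glamor, flush GL after. */
static void call_glamor(struct ctx *c)
{
	uxa_batch_flush(c);
	record(c, EV_GLAMOR_OP);
	gl_flush(c);
}

/* Point 3: wherever the UXA batch is flushed (e.g. a flush callback),
 * pending GL work is flushed as well. */
static void flush_callback(struct ctx *c)
{
	uxa_batch_flush(c);
	gl_flush(c);
}
```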
|
|
This reverts commit 212fa9868767637e8f430485eeb522c99e63fd16.
The underlying register programming for eDP is now believed to be fixed
as of linux-3.1.
References: https://bugs.freedesktop.org/show_bug.cgi?id=38012
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41070
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|