Age | Commit message (Collapse) | Author |
|
|
|
This paranoid check is deceased; pining for the fjords.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
If the destination cannot fit into the 3D pipeline when we need to
composite, we fallback to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
For x11perf this boosts glyph rendering on PineView, from 38kglyphs/s to
480kglyphs/s. Just a little shy of the native performance of 601kglyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
On my PineView box these represent ~5% overhead on x11perf text:
Before:
16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513fcc94e063dfd68d2f63b6bab7fae47d
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyph/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More warning is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
The PRM (Vol 1, p32) specifies that the URB_FENCE command must not cross
a cache-line boundary (64-bytes) in order to workaround a silicon issue.
Ensure that it does not by inserting an alignment point before the atomic
section.
This is a slightly too large hammer, but the easiest method to work with
the current BEGIN_BATCH/ADVANCE_BATCH protections.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Beware the potential buffer overflow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This should restore the previous level of synchronisation between
textures and pixmaps, but *does not* guarantee that a texture will be
flushed before use. tfp should be fixed so that the ddx can submit the
batch if required to flush the pixmap.
A side-effect of this patch is to rename intel_batch_flush() to
intel_batch_submit() to reduce the confusion of executing a batch buffer
with that of emitting a MI_FLUSH.
Should fix the remaining rendering corruption involving tfp [inc compiz]:
Bug 25431 [i915 bisected] piglit/texturing_tfp regressed
http://bugs.freedesktop.org/show_bug.cgi?id=25431
Bug 25481 Wrong cursor format and cursor blink rate with compiz enabled
http://bugs.freedesktop.org/show_bug.cgi?id=25481
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
There is only a single caller that wishes to forcibly append a flush
into the batch: intel_sync(). So move the logic there.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Ensure that the render caches and texture caches are appropriately
flushed when switching a pixmap from a target to a source.
This should fix bug 24315,
[855GM] Rendering corruption in text (usually)
https://bugs.freedesktop.org/show_bug.cgi?id=24315
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to detect when we require cache flushes we need to track which
domains the pixmap currently belongs to. So to do so we create a device
private structure to hold the extra information and hook it up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since batch buffers are rarely emitted by themselves but as part of a
sequence of state and vertices, the whole sequence is emitted atomically.
Here we just enforce that batches are marked as being part of an atomic
sequence as appropriate.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It's poor style, and has confused new developers.
|
|
Make the following options available via xorg.conf:
Section "Driver"
Option "DebugFlushBatches" "1" # Flush the batch buffer after every
# single operation;
Option "DebugFlushCaches" "1" # Include a MI_FLUSH at the end of every
# batch buffer to force data to be
# flushed out of cache and into memory
# before the completion of the batch.
Option "DebugWait" "1" # Wait for the completion of every batch buffer
# before continuing, i.e. perform synchronous
# rendering.
EndSection
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|
|
Also, start settling on the cairo naming for things: source, mask, and dest.
|
|
|
|
This is the beginning of the campaign to remove some of the absurd use of
Hungarian in the driver. Not that I don't like Hungarian, but I don't need
to know that pI830 is a pPointer.
|
|
We've talked about doing this since the start of the project, putting it off
until "some convenient time". Just after removing a third of the driver seems
like a convenient time, when backporting's probably not happening much anyway.
|
|
At this point, the only remaining feature regressions should be the lack of
overlay support (about to land), and the need to update the XVMC code to work
in the presence of KMS.
Acked-by: Keith Packard <keithp@keithp.com> (in principle)
Acked-by: Carl Worth <cworth@cworth.org> (in principle)
|
|
Currently when asked to composite using a gradient source or mask, we
fallback to using fbComposite(). This has the side-effect of causing a
readback on the destination surface, stalling the GPU pipeline. Instead,
like uxa_trapezoids(), we can use pixman to fill a scratch pixmap and then
copy that to an offscreen pixmap for use with uxa_composite().
Speedups on i915:
firefox-talos-svg: 710378.14 -> 549262.96: 1.29x speedup
No slowdowns.
Thanks to Søeren Sandmann Pedersen for spotting the missing
ValidatePicture().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|
|
This patch enables 2D composite on IGDNG. IGDNG requires
new compiled shader programs for Gen5 and some command changes.
The most notable is the layout of vertex element has changed,
but we tried to keep it as origin to not change shader programs.
Also vertex buffer state requires end address of vertex buffer
instead of origin max index.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
This reverts commit 4653a7db622ad54a3182d93c81331765d930db34.
Eric was getting a little too ambitious about our brave, new world.
We do still want the driver to work with old, non-GEM kernels
after all.
|
|
|
|
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
[anholt: fixed up unneeded != NULL checks]
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
This removes it from a callsite where it would have just resulted in a
fatalerror.
|
|
This cleans up findstatic.pl output for the i830+ code, which resulted in
removing some code. The only odd part of this commit is the
if (0) i830_sdvo_dump() in i830_sdvo.c -- it tells the compiler that the code
is used, without using it since we want the code around while debugging.
It's also in a likely place to ask for the dump, so I think it's OK.
|
|
It's been broken for years now, and KMS offers a much better chance of getting
this working sensibly without making a mess of the 2D driver.
|
|
This fixes huge memory leak at each VT switch (about 600 BOs + 6MB
of RSS of Xserver).
|
|
They're tiny so it shouldn't have been a problem, but play it safe. This is
another <5% loss on top of the previously reported value, bringing the whole
series to about 8%.
|
|
This eliminates the pinned memory allocation for 965 render state.
|
|
|
|
This is a first step in a series of changes to avoid requiring a pinned object,
which gets in the way of doing non-root KMS. This change appears to result in
about a 2-6% loss in x11perf -aa10text, which better algorithms in libdrm could
make up for (it hasn't really had to deal with code this bad before).
|
|
|
|
This improves performance by avoiding repeated map/unmap cycles, which are
a bit expensive on my machine with lock debugging on in the kernel. It could
do much better if we did more than 18 or so floats at a time.
|
|
The require_space had failed since it only checked for the space required
by the batch emits in the function itself, but not in the
i965_emit_composite_state() that it called (the state we were concerned about
having set up for that 12 * 4 dwords to follow!). This is replaced by
intel_batch_start_atomic(), which will catch such mistakes in the future.
|
|
|
|
Previously it wouldn't count the pixmaps that were about to be used, which
is pretty much the only purpose of having the pain around. This also
eliminates the check_twice confusion with emit_batch_header_for_composite().
|
|
|
|
|
|
We want the objects to be created once per prepare/done both for efficiency and
so we can handle aperture checking better.
|