Age | Commit message (Collapse) | Author |
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes regression from
commit 195a51353c3af7bd253227da5f759f06cea01f73 [2.21.8]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 9 19:13:46 2013 +0100
sna/video: Convert to a pure Xv backend
Reported-by: Edward Sheldrake <ejsheldrake@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65479
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This is to enable feature work which requires access to Client state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This reverts commit ddd75d6539dcf692cb76747cd63d1f301180f18a.
This is a WIP patch, not ready for upstream. The danger of mixing topic
branches.
|
|
|
|
Each of the overlay, sprite and textured video can support XvMC
passthrough, so we need to setup an XvMC adaptor for each of our Xv
adaptors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When applying pan and zoom to a mismatched video, it would inevitably
miscompute the origin and scale factors.
Reported-by: Matti Hamalainen <ccr@tnsp.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61610
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Under certain circumstances, XvScreenInit can indeed fail, so do not
bother with creatin XvMC (as it triggers internal assertions if it
cannot find our adaptor amongst Xv's).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The total frame size is less than 3 times the subsampled chroma planes
due to the additional alignment bytes.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1104180
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We repeatedly set the alignment value on the first port, rather than
once for each.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
References: https://bugs.freedesktop.org/show_bug.cgi?id=47597
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Simply ignore the cropping and copy the whole plane rather than
complicate the computation of the packed destination pixels.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oh fun. Textured video expects the source content to be relative to the
origin, whereas overlay video expects the source at the origin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Yikes, setting image.x2 == image.x1 meant no data was copied whilst the
video was clipped.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fortunately nobody had yet noticed that all videos were assumed to play
with a matching src/dst origin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
So that we can use the same teardown path as normal scanouts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It was only ever used in conjunction with HAS_DEBUG_FULL. For debug
purposes it is as easy to redefine DBG locally. By simplifying the DBG
macro we can create it consistently and so reduce the number of compiler
warnings.
Long term, this has to be dynamic. Sigh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
With the introduction of the third pipe on IvyBridge it is possible to
encounter situations where the combination of the three monitors exceed
the limits of the scanout engine and so prevent them being used at their
native resolutions. (It is conceivable to hit similar issues on earlier
generation, especially gen2/3.) One workaround, this patch, is to extend
the RandR shadow support to break the extended framebuffer into per-crtc
pixmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
So that we may benefit from the caching of buffers and the automatic
selection of the preferred upload method.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Unifies available options for both UXA and SNA drivers, and
moves them into a common header file, intel_opts.h.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
|
|
Based on the work by Jesse Barnes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we wish to immediate map the vertices buffers, it is beneficial to
search the linear cache for an existing mapping to reuse first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For a singular region, we want to use a value for nboxes of 0 not 1,
fortunately if you pass in a box, it ignores the value of nboxes.
RegionInit() is a most peculiar API!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Such large bo place extreme stress on the system, for example trying to
mmap a 1GiB into the CPU domain currently fails due to a kernel bug. :(
So if you can avoid the swap thrashing during the upload, the ddx can now
handle 16k x 16k images on gen4+ on the GPU. That is fine until you want
two such images...
The real complication comes in uploading (and downloading) from such
large textures as they are too large for a single operation with
automatic detiling via either the BLT or the RENDER ring. We could do
manual tiling/switching or, as this patch does, tile the transfer in
chunks small enough to fit into either pipeline.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Avoid the function call overhead by inspecting the low bit to see if it
is all-damaged already.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A VMA cache appears unavoidable thanks to compiz and an excrutiatingly
slow GTT pagefault, though it does look like it will be ineffectual
during everyday usage. Compiz (and presumably other compositing
managers) appears to be undoing all the pagefault minimisation as
demonstrated on gen5 with large XPutImage. It also appears the CPU to
memory bandwidth ratio plays a crucial role in determining whether
going straight to GTT or through the CPU cache is a win - so no trivial
heuristic.
x11perf -putimage10 -putimage500 on i5-2467m:
Before:
bare: 1150,000 2,410
compiz: 438,000 2,670
After:
bare: 1190,000 2,730
compiz: 437,000 2,690
UXA:
bare: 658,000 2,670
compiz: 389,000 2,520
On i3-330m
Before:
bare: 537,000 1,080
compiz: 263,000 398
After:
bare: 606,000 1,360
compiz: 203,000 985
UXA:
bare: 294,000 1,070
compiz: 197,000 821
On pnv:
Before:
bare: 179,000 213
compiz: 106,000 123
After:
bare: 181,000 246
compiz: 103,000 197
UXA:
bare: 114,000 312
compiz: 75,700 191
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Don't just deference any old random pointer, use the one we actually
mapped in the first place!
Reported-by: Matti Hamalainen <ccr@tnsp.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42880
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
So pack all the relevant details into the same structure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42412
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Didn't spot anything that might have led to a genuine bug, but this
should help improve the signal-to-noise ratio of warnings in the future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Remove the PCI ID device checks by using the simpler check on the
generation id for errata pertaining to 830/845.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Lots of scary warnings found by valgrind.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The premise is that switching between rings (i.e. the BLT and
RENDER rings) on SandyBridge imposes a large latency overhead whilst
rendering. The cause is that in order to switch rings, we need to split
the batch earlier than is desired and to add serialisation between the
rings. Both of which incur large overhead.
By switching to using a pure 3D blit engine (ok, not so pure as the BLT
engine still has uses for the core drawing model which can not be easily
represented without a combinatorial explosion of shaders) we can take
advantage of additional efficiencies, such as relative relocations, that
have been incorporated into recent hardware advances. However, even
older hardware performs better from avoiding the implicit context
switches and from the batching efficiency of the 3D pipeline...
But this is X, and PolyGlyphBlt still exists and remains in use. So for
the operations that are not worth accelerating in hardware, we introduce a
shadow buffer mechanism through out and reintroduce pixmap migration.
Doing this efficiently is the cornerstone of ensuring that we do exploit
the increased potential of recent hardware for running old applications and
environments (i.e. so that the latest and greatest chip is actually faster
than gen2!)
For the curious, sna is SandyBridge's New Acceleration. If you are
running older chipsets and welcome the performance increase offered by
this patch, then you may choose to call it Snazzy instead.
Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup
Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown
[Note the performance regression for ocitysmap comes from that we now
attempt to support rendering to and (more importantly) from large
surfaces. By enabling such operations is the only way to one day be
faster than purely using the CPU, in the meantime we suffer regression
due to the increased migration and aperture thrashing. The other couple
of regressions will be eliminated with improved span and shader support,
now that the framework for such is in place.]
The performance increase for Cairo completely overlooks the other
critical aspects of the architecture:
World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
144.3 -> 146.4 [drm-intel-next]
x11perf (gen6);
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s
And we do not have a good measure for how much improvement the reworking
of the fallback paths give, except that xterm is now over 4x faster...
PS: This depends upon the Xorg patchset "Remove the cacheing of the last
scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by
the dix to implement SHM operations, used by chromium and gtk+ pixbufs.
PPS: ./configure --enable-sna
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|