path: root/src/sna/sna_video_textured.c
Age | Commit message | Author
2012-12-21 | sna/video: Initialise alignment for video ports > 0 | Chris Wilson
We repeatedly set the alignment value on the first port, rather than once for each.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
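The driver code itself is not shown in this log, so the following is only a minimal sketch of the class of bug the message describes: a loop that runs once per port but always writes to the first element. The `struct video_port` type and both function names are hypothetical and are not taken from `sna_video_textured.c`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-port private structure. */
struct video_port {
    int alignment;
};

/* Buggy variant: the loop iterates n times, but every iteration
 * dereferences the base pointer, so only ports[0] is ever written. */
void init_ports_buggy(struct video_port *ports, size_t n, int alignment)
{
    for (size_t i = 0; i < n; i++)
        ports->alignment = alignment;
}

/* Fixed variant: index by i so that each port is initialised once. */
void init_ports_fixed(struct video_port *ports, size_t n, int alignment)
{
    for (size_t i = 0; i < n; i++)
        ports[i].alignment = alignment;
}
```

With the buggy variant, ports 1 and above keep whatever value they had before (zero in a zero-initialised array), which matches the symptom in the subject line: only video ports > 0 are affected.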
2012-12-18 | sna/video: Fix presentation of cropped sprites | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-18 | sna/video: Amalgamate the computation of source vs dest offsets | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-18 | sna/video: Fix adjustment of drawable vs source origin wrt to clip | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-18 | sna/xvmc: Clean up to avoid crash'n'burn | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-17 | sna/video: Pass along the video source offset | Chris Wilson
Fortunately nobody had yet noticed that all videos were assumed to play with a matching src/dst origin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-08 | sna: Flush upon change of target if GPU is idle | Chris Wilson
The aim is to improve GPU concurrency by keeping it busy. The possible complication is that we incur more overhead due to small batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-30 | Convert generation counter to octal | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-17 | sna: Use the secure batches to program scanline waits on gen6+ | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-26 | sna/video: Protect against attempting to use TexturedVideo whilst wedged | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14 | Drop some unused includes | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14 | sna: Aim for consistency and use stdbool except for core X APIs | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-09 | sna: Simplify the DBG incarnation | Chris Wilson
It was only ever used in conjunction with HAS_DEBUG_FULL. For debug purposes it is as easy to redefine DBG locally. By simplifying the DBG macro we can create it consistently and so reduce the number of compiler warnings. Long term, this has to be dynamic. Sigh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 | sna: extend RandR to support super sized monitor configurations | Chris Wilson
With the introduction of the third pipe on IvyBridge it is possible to encounter situations where the combination of the three monitors exceeds the limits of the scanout engine and so prevents them being used at their native resolutions. (It is conceivable to hit similar issues on earlier generations, especially gen2/3.) One workaround, this patch, is to extend the RandR shadow support to break the extended framebuffer into per-crtc pixmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-02 | sna: NameForAtom may return NULL | Chris Wilson
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-05-28 | sna: Use magic upload buffers for video textures | Chris Wilson
So that we may benefit from the caching of buffers and the automatic selection of the preferred upload method.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-05-26 | sna: Fix typo for debug compilation | Chris Wilson
s/ctrc/crtc/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-05-25 | sna: Trust the crtc-is-bound determination after modeset and hotplug | Chris Wilson
As these should be the only time that they change and we now have the checks in place, we can drop the workaround of doing the check just before emitting the wait.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-04-06 | sna/video: Only wait upon the scanout pixmap | Chris Wilson
Caught by the addition of the assertion.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=47597
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-30 | sna: Minimise the risk of hotplug hangs by checking fb before vsync | Chris Wilson
Every time we issue an MI_WAIT_FOR_EVENT on a scan-line from userspace we run the risk of that pipe being disabled before we submit a batch. As the pipe is then disabled or configured differently, we encounter an indefinite wait and trigger a GPU hang. To minimise the risk of a hotplug event being detected and submitting a vsynced batch prior to noticing the removal of the pipe, perform an explicit query of the current CRTC and delete the wait if we spot that our framebuffer is no longer attached. This is about as good as we can achieve without extra help from the kernel.
Reported-by: Francis Leblanc <Francis.Leblanc-Lebeau@verint.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45413 (and others)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-02 | sna: Pass usage hint for creating linear buffers | Chris Wilson
As we wish to immediately map the vertex buffers, it is beneficial to search the linear cache for an existing mapping to reuse first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 | sna: Allow ridiculously large bo, up to half the total GATT | Chris Wilson
Such large bo place extreme stress on the system, for example trying to mmap a 1GiB bo into the CPU domain currently fails due to a kernel bug. :( So if you can avoid the swap thrashing during the upload, the ddx can now handle 16k x 16k images on gen4+ on the GPU. That is fine until you want two such images...

The real complication comes in uploading to (and downloading from) such large textures as they are too large for a single operation with automatic detiling via either the BLT or the RENDER ring. We could do manual tiling/switching or, as this patch does, tile the transfer in chunks small enough to fit into either pipeline.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
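The chunked transfer described in this message can be illustrated with a generic tiling loop. This is a sketch under stated assumptions, not the driver's code: `for_each_tile`, the `tile_fn` callback type and the single `max_tile` limit are all hypothetical, and the real per-pipeline limits are not given in this log.

```c
#include <stddef.h>

/* Hypothetical callback invoked once per chunk of the transfer. */
typedef void (*tile_fn)(int x, int y, int w, int h, void *data);

/* Walk a width x height surface in chunks of at most max_tile pixels
 * per side, clamping the last row and column to the surface edge.
 * Returns the number of chunks visited (0 for degenerate input). */
int for_each_tile(int width, int height, int max_tile,
                  tile_fn fn, void *data)
{
    int count = 0;

    if (width <= 0 || height <= 0 || max_tile <= 0)
        return 0;

    for (int y = 0; y < height; y += max_tile) {
        int h = height - y < max_tile ? height - y : max_tile;

        for (int x = 0; x < width; x += max_tile) {
            int w = width - x < max_tile ? width - x : max_tile;

            if (fn)
                fn(x, y, w, h, data); /* e.g. copy this sub-rectangle */
            count++;
        }
    }
    return count;
}
```

As an illustration only, a 16384 x 16384 image with an assumed per-operation limit of 8192 pixels per side would be split into four chunks.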
2012-01-27 | sna/video: Ensure the video pixmap is on the GPU | Chris Wilson
The presumption that the pixmap is the scanout and so will always be pinned is false if there is a shadow or under a compositor. In those cases, the pixmap may be idle and so the GPU bo reaped. This was compounded by the video path not marking the pixmap as busy. So whilst watching a video under xfce4 with compositing enabled (has to be a non-GL compositor) the video would suddenly stall.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 | sna/video: Add some DBG messages to track the error paths | Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 | sna/video: Add some more DBG breadcrumbs to the textured PutImage | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 | sna/video: Simplify the gen2/915gm check | Chris Wilson
And make the later check in put image match.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 | sna/video: Increase the level of paranoia | Chris Wilson
In how many different ways can we check that the scanout is allocated before we start decoding video?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-15 | sna/gen3: Check for upload failure of video bo | Chris Wilson
And propagate that failure back to the client.
Reported-by: Paul Neumann <paul104x@yahoo.de>
References: https://bugs.freedesktop.org/show_bug.cgi?id=43716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-13 | sna/video: Simplify check for 915G[M] which is simply gen==30 | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-13 | sna/video: Constify a couple of attribute arrays | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-09 | sna/video: Pass texture video limits to the client | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-09 | sna/video: Use the normal bo cache for texture video streams | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-09 | sna/video: Pass cropped source dimensions along with frame data | Chris Wilson
So pack all the relevant details into the same structure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-30 | sna: Don't mess with NDEBUG | Chris Wilson
This is set in configure and redefining it later inside the C files just leads to trouble and broken compilation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-17 | sna: perform a warnings reduction pass | Chris Wilson
Didn't spot anything that might have led to a genuine bug, but this should help improve the signal-to-noise ratio of warnings in the future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-13 | sna/video: Stop advertising unsupported Xv attributes | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 | sna/video: Defend against PutImage to a broken screen | Chris Wilson
Similar to the previous commit, check that the Screen Pixmap is bound to a bo before proceeding. [Note that in this case, the absence of the bo would have been picked up much later after doing all of the setup...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-27 | sna: Disable XVideo using the TexturedAdapter if the GPU is wedged | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-12 | sna/dri: Fix tripple-buffering for vblank_mode=0 | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 | sna: Introduce a new acceleration model. | Chris Wilson
The premise is that switching between rings (i.e. the BLT and RENDER rings) on SandyBridge imposes a large latency overhead whilst rendering. The cause is that in order to switch rings, we need to split the batch earlier than is desired and to add serialisation between the rings, both of which incur large overhead. By switching to using a pure 3D blit engine (ok, not so pure as the BLT engine still has uses for the core drawing model which cannot be easily represented without a combinatorial explosion of shaders) we can take advantage of additional efficiencies, such as relative relocations, that have been incorporated into recent hardware advances. However, even older hardware performs better from avoiding the implicit context switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for the operations that are not worth accelerating in hardware, we introduce a shadow buffer mechanism throughout and reintroduce pixmap migration. Doing this efficiently is the cornerstone of ensuring that we do exploit the increased potential of recent hardware for running old applications and environments (i.e. so that the latest and greatest chip is actually faster than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are running older chipsets and welcome the performance increase offered by this patch, then you may choose to call it Snazzy instead.

Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup

Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown

[Note the performance regression for ocitysmap comes from our now attempting to support rendering to and (more importantly) from large surfaces. Enabling such operations is the only way to one day be faster than purely using the CPU; in the meantime we suffer a regression due to the increased migration and aperture thrashing. The other couple of regressions will be eliminated with improved span and shader support, now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other critical aspects of the architecture:

World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
gen6 (1366x768): 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6):
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by the dix to implement SHM operations, used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
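The speedup and slowdown factors quoted in this commit's benchmark listing appear to be the ratio of the first number on each side of the arrow, with the larger value as the numerator. The sketch below rests on that assumption; the log does not define the columns, and `change_factor` is a hypothetical helper, not part of any benchmarking tool.

```c
/* Given the leading "before" and "after" timings from one benchmark
 * row, return the change as a factor >= 1.0 and report its direction
 * through *is_speedup (1 when the new timing is no slower, else 0).
 * Both timings are assumed to be positive. */
double change_factor(double before, double after, int *is_speedup)
{
    *is_speedup = after <= before;
    return *is_speedup ? before / after : after / before;
}
```

Under that reading, the gen3 firefox-fishtank row gives 1203584.56 / 85561.71, which rounds to the quoted 14.07x speedup, and the gen6 ocitysmap row gives 2365.74 / 1401.61, which rounds to the quoted 1.69x slowdown.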