path: root/src/sna/sna_damage.h
Age | Commit message | Author
2013-02-10 | sna: Backport to squeeze - Xorg-1.6, pixman-0.16, libdrm-2.4.21 | Chris Wilson
The principal change is to switch to the old Privates API and undo the Region renames. The downside is that this ignores the critical bugfixes made to the xserver since xorg-1.6, but I assume that whoever wants to run the latest hardware on the old xservers is also backporting those stability fixes...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-14 | sna: Apply PutImage optimisations to move-to-cpu | Chris Wilson
We can replace the custom heuristics for PutImage by applying them to the common path, where hopefully they are equally valid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14 | sna: Aim for consistency and use stdbool except for core X APIs | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-09 | sna: Simplify the DBG incarnation | Chris Wilson
It was only ever used in conjunction with HAS_DEBUG_FULL. For debug purposes it is as easy to redefine DBG locally. By simplifying the DBG macro we can create it consistently and so reduce the number of compiler warnings. Long term, this has to be dynamic. Sigh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 | sna: Validate cpu/gpu damage never overlaps | Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=50477
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
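A minimal sketch of the invariant this commit validates, assuming damage is held as plain lists of half-open boxes. The driver's own damage structures are more involved, and the names `struct box`, `damage_disjoint` and `assert_damage_valid` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical box type modelled on X's BoxRec: [x1, x2) x [y1, y2). */
struct box { short x1, y1, x2, y2; };

/* Two half-open boxes overlap iff their extents intersect on both axes. */
bool box_overlap(const struct box *a, const struct box *b)
{
	return a->x1 < b->x2 && b->x1 < a->x2 &&
	       a->y1 < b->y2 && b->y1 < a->y2;
}

/* True if no CPU-damaged box intersects any GPU-damaged box. */
bool damage_disjoint(const struct box *cpu, size_t ncpu,
		     const struct box *gpu, size_t ngpu)
{
	for (size_t i = 0; i < ncpu; i++)
		for (size_t j = 0; j < ngpu; j++)
			if (box_overlap(&cpu[i], &gpu[j]))
				return false;
	return true;
}

/* Debug-only validation: the assert compiles away under NDEBUG. */
void assert_damage_valid(const struct box *cpu, size_t ncpu,
			 const struct box *gpu, size_t ngpu)
{
	assert(damage_disjoint(cpu, ncpu, gpu, ngpu));
	(void)cpu; (void)ncpu; (void)gpu; (void)ngpu;
}
```

The quadratic scan is only acceptable because it runs in debug builds; touching edges (one box ending at x = 10, the next starting there) do not count as overlap.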
2012-05-03 | sna: Fix offset for combining damage | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-05-03 | sna: Avoid reducing damage for synchronisation | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 | sna: Use a proxy rather than a temporary bo for too-tall but thin targets | Chris Wilson
If the render target is thin enough to fit within the 3D pipeline, but is too tall, we can fudge the address of the origin and the coordinates to fit within the constraints of the pipeline.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
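The proxy trick can be sketched as pointing a view at a row offset inside the tall buffer and shifting the y coordinates by the same amount. This assumes the width has already been checked against the pipeline, that the starting row must be a multiple of a hypothetical `row_align` (standing in for tiling constraints), and that `max_height` is the pipeline's height limit; `struct proxy` and `make_tall_proxy` are illustrative names, not the driver's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a proxy view into a taller buffer object. */
struct proxy {
	uint32_t offset; /* byte offset of the proxy origin within the bo */
	int dy;          /* amount to subtract from y coordinates */
	int height;      /* rows visible through the proxy */
};

/*
 * Build a proxy covering rows [y1, y2) of a surface whose rows are
 * `pitch` bytes apart. The start row is rounded down to a multiple of
 * row_align (an assumed constraint). Returns 0 on success, or -1 if the
 * span still exceeds max_height after alignment.
 */
int make_tall_proxy(int y1, int y2, uint32_t pitch, int row_align,
		    int max_height, struct proxy *out)
{
	assert(y1 >= 0 && y1 < y2 && row_align > 0);
	int start = y1 - y1 % row_align;

	if (y2 - start > max_height)
		return -1;

	out->offset = (uint32_t)start * pitch;
	out->dy = start;
	out->height = y2 - start;
	return 0;
}
```

A draw at row y in the full surface is then issued at row `y - p.dy` against a base address advanced by `p.offset`, which keeps every coordinate within the pipeline limit without copying into a temporary bo.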
2012-01-24 | sna/gen3: Apply damage to video pixmap | Chris Wilson
Reported-by: Paul Neumann <paul104x@yahoo.de>
References: https://bugs.freedesktop.org/show_bug.cgi?id=44504
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-23 | sna: Assert that the subtract operation does reduce an all-damaged | Chris Wilson
Somewhere somewhen it appears that I am discarding the all-damaged flag on the pointer. The only possibility I can see is for a no-op subtraction, so put an assert there just in case the impossible is happening.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 | sna: Store damage-all in the low bit of the damage pointer | Chris Wilson
Avoid the function call overhead by inspecting the low bit to see if it is all-damaged already.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
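Storing the all-damaged flag in the low bit works because the damage object is at least 2-byte aligned, so bit 0 of a valid pointer is always zero. A minimal sketch with hypothetical helper names and a stand-in `struct damage`; the tagged pointer must be stripped before it is dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the real damage record; only its alignment matters. */
struct damage { int nboxes; };

/* Tagging bit 0 is only safe if real pointers never have it set. */
_Static_assert(_Alignof(struct damage) >= 2, "need a free low bit");

/* Hypothetical: test the all-damaged tag with no function-call overhead. */
static inline bool damage_is_all(const struct damage *p)
{
	return ((uintptr_t)p & 1) != 0;
}

/* Hypothetical: set the tag, returning the tagged pointer. */
static inline struct damage *damage_mark_all(struct damage *p)
{
	return (struct damage *)((uintptr_t)p | 1);
}

/* Hypothetical: strip the tag to recover a dereferenceable pointer. */
static inline struct damage *damage_ptr(struct damage *p)
{
	return (struct damage *)((uintptr_t)p & ~(uintptr_t)1);
}
```

A caller can then short-circuit further bookkeeping with a single bit test such as `if (damage_is_all(p)) return;` rather than calling into the damage code.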
2012-01-03 | sna: Use a cheaper no-reduction damage check for simply discarding further damage | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-03 | sna/damage: Mark the box as packed so that the embedded_box is aligned correctly | Chris Wilson
valgrind was complaining about an overlapping memcpy on a 64-bit platform as gcc padded the sna_damage_box to 28 bytes...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
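The effect of `packed` can be illustrated with a hypothetical chunk header whose boxes are stored directly after it, at `hdr + 1`. On a typical 64-bit ABI the unpacked header carries tail padding, so `sizeof` (and therefore where `hdr + 1` lands) exceeds the sum of its fields; packing removes that padding, and also drops the struct's alignment requirement to 1. The layout below is illustrative only and is not the real `sna_damage_box`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box record: four shorts, 2-byte alignment. */
struct box { short x1, y1, x2, y2; };

/* Same fields; the first may be padded at the tail, the second may not. */
struct chunk_padded { void *next; int size; };
struct chunk_packed { void *next; int size; } __attribute__((packed));

/* Boxes embedded after a header start sizeof(header) bytes in. */
size_t first_box_offset_padded(void) { return sizeof(struct chunk_padded); }
size_t first_box_offset_packed(void) { return sizeof(struct chunk_packed); }

/* Address of the first embedded box in a packed chunk. */
struct box *embedded_box(struct chunk_packed *c)
{
	assert(c != NULL);
	return (struct box *)(c + 1);
}
```

On x86-64 with gcc these offsets are 16 and 12 bytes respectively; code that computes the embedded box location one way and sizes copies the other way is exactly the kind of mismatch valgrind reports.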
2011-12-25 | sna: Tweak damage not to reduce if it will not affect the outcome of reducing to all | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-24 | sna: Remove the independent tracking of elts from boxes | Chris Wilson
Following the switch to a global mode for damage, the elts array became redundant and all that is required is the list of boxes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 | sna: Defer allocation of memory for larger pixmap until first use | Chris Wilson
In the happy scenario where the pixmap only resides upon the GPU we can forgo the CPU allocation entirely. The goal is to reduce the number of needless mmaps performed by the system memory allocator and reduce overall memory consumption.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
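Deferring the CPU allocation amounts to a lazily initialised pointer. A minimal sketch with a hypothetical `struct pixmap`; the driver's real pixmap private and allocator differ, and plain `malloc` stands in for whatever allocation it actually performs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pixmap with a lazily allocated CPU copy. */
struct pixmap {
	size_t size;   /* bytes needed for the CPU copy */
	void *cpu_ptr; /* NULL until the CPU first needs the pixels */
};

/*
 * Return the CPU buffer, allocating it on first use (NULL on failure).
 * A pixmap that only ever lives on the GPU never calls this, so it
 * never pays for the allocation.
 */
void *pixmap_cpu_ptr(struct pixmap *p)
{
	assert(p->size > 0);
	if (p->cpu_ptr == NULL)
		p->cpu_ptr = malloc(p->size);
	return p->cpu_ptr;
}

/* Release the CPU copy, if one was ever allocated. */
void pixmap_fini(struct pixmap *p)
{
	free(p->cpu_ptr);
	p->cpu_ptr = NULL;
}
```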
2011-11-14 | sna/damage: Always reduce damage for testing PIXMAN_REGION_OUT | Chris Wilson
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=42414
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-14 | sna: Check whether damage can be reduced to all-damage on moving to GPU | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-12 | sna/damage: Reduce the damage for evaluating sna_damage_is_all | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-21 | sna: Fast path unclipped points | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-21 | sna: Fast path for unclipped rectangles | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-21 | sna/damage: Only track the mode globally | Chris Wilson
As damage accumulation is handled modally, we do not need to track the mode per elt and so attempt to simplify the code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-20 | sna: Actually apply the composite offset for the self-copy | Chris Wilson
I translated the region to copy by the composite pixmap offset, but then failed to use the translated region for the actual copy command (using the original boxes instead). Fix that mistake by avoiding the temporary region entirely and applying the translation in place. We also have to be careful in the case of copying between two composited windows that have different offsets into the same screen pixmap.
This fixes the regression introduced with a3466c8b69af (sna/accel: Implement a simpler path for CopyArea between the same pixmaps).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
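Applying the translation in place means the boxes handed to the copy are, by construction, the translated ones, so there is no second copy of the coordinates to get out of sync. A minimal sketch over a hypothetical `struct box` array; the driver works on its own region and box types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box type modelled on X's BoxRec. */
struct box { short x1, y1, x2, y2; };

/* Translate n boxes in place by (dx, dy); no temporary region is built. */
void boxes_translate(struct box *b, size_t n, int dx, int dy)
{
	assert(b != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		b[i].x1 += dx;
		b[i].x2 += dx;
		b[i].y1 += dy;
		b[i].y2 += dy;
	}
}
```

For a copy between two composited windows that share one screen pixmap, the source and destination would each need their own offset applied, since the two windows sit at different positions within it.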
2011-07-13 | sna/damage: Avoid testing against a completely damaged region | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-07 | sna: Add some more debug commentary to render picture source migration | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 | sna: Introduce a new acceleration model. | Chris Wilson
The premise is that switching between rings (i.e. the BLT and RENDER rings) on SandyBridge imposes a large latency overhead whilst rendering. The cause is that in order to switch rings, we need to split the batch earlier than is desired and to add serialisation between the rings, both of which incur a large overhead.

By switching to using a pure 3D blit engine (ok, not so pure, as the BLT engine still has uses for the core drawing model, which cannot easily be represented without a combinatorial explosion of shaders) we can take advantage of additional efficiencies, such as relative relocations, that have been incorporated into recent hardware advances. Even older hardware, however, benefits from avoiding the implicit context switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for the operations that are not worth accelerating in hardware, we introduce a shadow buffer mechanism throughout and reintroduce pixmap migration. Doing this efficiently is the cornerstone of ensuring that we do exploit the increased potential of recent hardware for running old applications and environments (i.e. so that the latest and greatest chip is actually faster than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are running older chipsets and welcome the performance increase offered by this patch, then you may choose to call it Snazzy instead.
Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup

Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown

[Note: the performance regression for ocitysmap comes from the fact that we now attempt to support rendering to and (more importantly) from large surfaces. Enabling such operations is the only way to one day be faster than purely using the CPU; in the meantime we suffer a regression due to the increased migration and aperture thrashing. The other couple of regressions will be eliminated with improved span and shader support, now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other critical aspects of the architecture:

World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6):
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last scratch PixmapRec" for correct invalidation of scratch Pixmaps (used by the dix to implement SHM operations, which are used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
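The speedup column in the tables above can be reproduced from the first figure on each side of the `->`: speedup is before/after and slowdown is after/before, e.g. 1203584.56 / 85561.71 gives about 14.07 for gen3 firefox-fishtank, and 2144.60 / 2030.25 gives about 1.06 for gen6 xfce4-terminal-a1. What the parenthesised pair reports (it looks like a second statistic plus a spread percentage) and the units of the timings are assumptions, not stated in the log. A small helper for checking the arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* before/after: how many times faster the new timing is. */
double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* after/before: how many times slower the new timing is. */
double slowdown(double before, double after)
{
	assert(before > 0.0);
	return after / before;
}
```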