path: root/src/sna/sna_render_inline.h
Age | Commit message | Author
2013-02-08 | sna/gen4: Split the have_render flag in separate prefer_gpu hints | Chris Wilson
The idea is to implement more fine-grained checks, as we may want different heuristics for desktops with GT1s than for mobile GT2s, etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16 | sna: Revert use of a separate CAN_CREATE_SMALL flag | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13 | sna: Relax limitation on not mapping GPU bo with shadow pointers | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-02 | sna: Fast path inplace addition of solid trapezoids | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-20 | sna/gen4+: Amalgamate all the gen4-7 vertex buffer emission | Chris Wilson
Having reduced all the vb code for these generations to the same set of routines, we can refactor them into a single set of functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16 | sna: Include shm hint in render placement | Chris Wilson
The goal is to reduce the preference for rendering to a SHM pixmap: only if it is already active will we consider continuing to use it on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-14 | sna/gen2+: Experiment with not forcing migration to GPU after CPU rasterisation | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-01 | sna: Try to reduce ping-pong migration for intermixed render/legacy code paths | Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=56591 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-31 | sna: Clamp the drawable box to prevent int16 overflow | Chris Wilson
And assert that the box is valid when migrating. References: https://bugs.freedesktop.org/show_bug.cgi?id=56591 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
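The clamping idea in the 2012-10-31 entry can be illustrated with a minimal sketch. It assumes a box of four int16 coordinates built from an int16 origin and a uint16 size, as in core X; `struct box16`, `clamp_to_int16` and `drawable_box` are hypothetical names for illustration, not the driver's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a box with four int16 coordinates. */
struct box16 {
    int16_t x1, y1, x2, y2;
};

/* Saturate an int to the int16 range instead of letting it wrap. */
static int16_t clamp_to_int16(int v)
{
    if (v < INT16_MIN)
        return INT16_MIN;
    if (v > INT16_MAX)
        return INT16_MAX;
    return (int16_t)v;
}

/* Compute the far edges in int arithmetic, then clamp each coordinate. */
static struct box16 drawable_box(int16_t x, int16_t y,
                                 uint16_t width, uint16_t height)
{
    struct box16 b;

    b.x1 = x;
    b.y1 = y;
    b.x2 = clamp_to_int16((int)x + width);
    b.y2 = clamp_to_int16((int)y + height);

    /* The box must never be inverted after clamping. */
    assert(b.x2 >= b.x1 && b.y2 >= b.y1);
    return b;
}
```

Without the clamp, an origin of 30000 and a width of 10000 would wrap to a negative x2 when stored in an int16 and produce an inverted box.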
2012-10-07 | sna/gen2: Compile fix | Chris Wilson
Be careful when cutting and pasting assertions! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-07 | sna/gen2: Add a couple of assertions to track down a batch overflow | Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=55700 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-20 | sna/gen3+: Trim the target extents to the CompositeClip | Chris Wilson
When computing the active region of a composite operation with unknown extents, we try to simply use the whole Drawable. However, this needs to be clipped, otherwise it may trigger an assertion failure with an offscreen pixmap. References: https://bugs.freedesktop.org/show_bug.cgi?id=55164 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-20 | sna: Remove confusing is_cpu() | Chris Wilson
The only real user now has its own heuristics, so convert the remaining users over to !is_gpu(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-19 | sna: Tweak is_cpu/is_gpu heuristics | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-21 | sna: Refresh experimental userptr vmap support | Chris Wilson
Bring the code up to date with both kernel interface changes and internal adjustments following the creation of CPU buffers with set-caching. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14 | sna: Make sure we check for a busy CPU bo before declaring is-cpu | Chris Wilson
Even if the pixmap is entirely damaged on the CPU, we may still be in the process of transferring it and so cause an unwanted stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14 | sna: Aim for consistency and use stdbool except for core X APIs | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-27 | sna: Remove a trailing ';' | Chris Wilson
The unwanted ';' caused is_cpu() to always return false if a GPU bo was attached. Not necessarily a bad thing, it just misses the potential optimisation where, having chosen to prefer to use the CPU path, we then have to migrate to the GPU even though the bo is undamaged or idle. Spotted-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
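The class of bug fixed by the 2012-06-27 entry can be reproduced in a few lines. This is a simplified sketch, not the driver's actual is_cpu(): `struct pixmap_state` and both functions are hypothetical, and the condition is only assumed to resemble the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pixmap state; not the driver's real structure. */
struct pixmap_state {
    const void *gpu_bo;  /* non-NULL when a GPU buffer is attached */
    bool gpu_bo_busy;    /* the GPU still has work queued on it */
};

/* Buggy form: the ';' after the condition ends the if statement, so the
 * return below it runs whenever a GPU bo is attached. */
static bool is_cpu_buggy(const struct pixmap_state *p)
{
    assert(p != NULL);
    if (p->gpu_bo == NULL)
        return true;
    if (p->gpu_bo_busy);
        return false;
    return true;
}

/* Fixed form: only a busy GPU bo rules out the CPU path. */
static bool is_cpu_fixed(const struct pixmap_state *p)
{
    assert(p != NULL);
    if (p->gpu_bo == NULL)
        return true;
    if (p->gpu_bo_busy)
        return false;
    return true;
}
```

With an attached but idle GPU bo, the buggy version reports false while the fixed version reports true. Compilers can flag the empty statement (for example GCC's `-Wempty-body`), which is one reason to keep such warnings enabled.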
2012-06-17 | sna: Further refine choice of placement when uploading source data. | Chris Wilson
The goal is to cheaply spot a simple copy operation that can be performed on the CPU without having to load both parties onto the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 | sna: Tweak placement of operations | Chris Wilson
Take into account the busyness of the damaged GPU bo when considering placement of the subsequent operations. In particular, note that is_cpu is only used when we feel that the following operation would be better on the CPU and just want to confirm that doing so will not stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-04-19 | sna: Don't consider upload proxies as being on the GPU for render targets | Chris Wilson
The upload proxy is a fake buffer that we do not want to render to, as then the damage tracking becomes extremely confused and the buffer itself is not optimised for persistent rendering. We assert that we do not use it as a render target, and this patch adds the check so that we avoid treating the proxy as a valid target when choosing the render path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-04-03 | sna/gen3: Convert the clear-color from picture->format to a8r8g8b8 | Chris Wilson
The shaders treat colours as an argb value, however the clear color is stored in the pixmap's native format (a8, r5g6b5, x8r8g8b8 etc). So before using the value of the clear color as a solid we need to convert it into the a8r8g8b8 format. Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48204 Reported-by: Paul Neumann <paul104x@yahoo.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47308 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
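As a rough illustration of the conversion described in the 2012-04-03 entry, the sketch below expands three of the native formats it mentions into a8r8g8b8. The function names are hypothetical and the bit-replication rounding is an assumption. The driver's own helper may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 5- or 6-bit channel to 8 bits by replicating its high bits,
 * so that the maximum input maps to 0xff. */
static uint32_t expand_5(uint32_t v) { return (v << 3) | (v >> 2); }
static uint32_t expand_6(uint32_t v) { return (v << 2) | (v >> 4); }

/* r5g6b5 has no alpha channel, so the result is fully opaque. */
static uint32_t r5g6b5_to_a8r8g8b8(uint16_t pixel)
{
    uint32_t r = expand_5((pixel >> 11) & 0x1f);
    uint32_t g = expand_6((pixel >> 5) & 0x3f);
    uint32_t b = expand_5(pixel & 0x1f);

    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* a8 carries only alpha, so the colour channels are zero. */
static uint32_t a8_to_a8r8g8b8(uint8_t alpha)
{
    return (uint32_t)alpha << 24;
}

/* x8r8g8b8 ignores its top byte, so treat it as opaque. */
static uint32_t x8r8g8b8_to_a8r8g8b8(uint32_t pixel)
{
    return pixel | 0xff000000u;
}
```

Passing the raw r5g6b5 value 0xf800 (pure red) straight to a shader that expects argb would give a dim, mostly transparent colour. The converted value is 0xffff0000.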
2012-03-22 | sna: Force fallbacks if the destination is unattached | Chris Wilson
Since the removal of the ability to create a backing pixmap after the creation of its parent, it is no longer practical to attempt rendering with the GPU to unattached pixmaps. So having made the decision never to render to that pixmap, perform the test explicitly along the render paths. This fixes a segmentation fault introduced in 8a303f195 (sna: Remove existing damage before overwriting with a composite op) which assumed the existence of a backing pixmap along a render path. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47700 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-13 | sna: Prefer to render very thin trapezoids inplace | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-08 | sna/gen2+: Prefer not to fallback if the source is busy | Chris Wilson
If we try to perform the operation on the CPU while there are outstanding operations on the source pixmaps, we will stall waiting for them to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-22 | sna/blt: Avoid clobbering the composite state if we fail to setup the BLT | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 | sna: Track large objects and limit prefer-gpu hint to small objects | Chris Wilson
As the size of the GATT is independent of the actual RAM size, we need to be careful not to be too generous when allocating GPU bo and their shadows. So first of all we limit default render targets to those small enough to fit comfortably in RAM alongside others, and secondly we try to only keep a single copy of large objects in memory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-26 | sna/gen2+: Include being unattached in the list of source fallbacks | Chris Wilson
If the source is not attached to a buffer (be it a GPU bo or a CPU bo), a temporary upload buffer would be required and so it is not worth forcing the target to the destination in that case (should the target not be on the GPU already). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 | sna: Ensure that the batch mode is always declared before emitting dwords | Chris Wilson
Initially, the batch->mode was only set upon an actual mode switch, and batch submission would not reset the mode. However, to facilitate fast ring switching with semaphores, resetting the mode upon batch submission is desired, which means that if we submit the batch in the middle of an operation we must redeclare its mode before continuing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 | sna: Upload continuation vertices into mmapped buffers | Chris Wilson
In the common case, we expect a very small number of vertices which will fit into the batch along with the commands. However, in full flow we overflow the on-stack buffer and likely several continuation buffers. Streaming those straight into the GTT seems like a good idea, with the usual caveats over aperture pressure. (Since these are linear we could use snoopable bo for the architectures that support such for vertex buffers and if we had kernel support.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 | sna: Hint whether we prefer to use the GPU for a pixmap | Chris Wilson
This includes the condition where the pixmap is too large, as well as being too small, to be allocatable on the GPU. It is only a hint set during creation, and may be overridden if required. This fixes the regression in ocitysmap which decided to render glyphs into a GPU mask for a destination that does not fit into the aperture. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 | sna: Store damage-all in the low bit of the damage pointer | Chris Wilson
Avoid the function call overhead by inspecting the low bit to see if it is all-damaged already. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
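The low-bit trick from the 2012-01-12 entry is a form of pointer tagging. The sketch below shows the general technique only: `struct damage` is a placeholder, and `DAMAGE_ALL_BIT`, `damage_mark_all`, `damage_is_all` and `damage_ptr` are hypothetical names rather than the driver's macros. It relies on the pointed-to structure being at least 2-byte aligned, so that bit 0 of a valid pointer is always zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the real damage structure; the int member gives it
 * an alignment of more than one byte. */
struct damage {
    int n_boxes;
};

#define DAMAGE_ALL_BIT ((uintptr_t)1)

/* Tag the pointer to record that the whole pixmap is damaged. */
static inline struct damage *damage_mark_all(struct damage *d)
{
    assert(((uintptr_t)d & DAMAGE_ALL_BIT) == 0);
    return (struct damage *)((uintptr_t)d | DAMAGE_ALL_BIT);
}

/* Inline test of the tag bit: no function call, no dereference. */
static inline bool damage_is_all(const struct damage *d)
{
    return ((uintptr_t)d & DAMAGE_ALL_BIT) != 0;
}

/* Strip the tag before dereferencing. */
static inline struct damage *damage_ptr(struct damage *d)
{
    return (struct damage *)((uintptr_t)d & ~DAMAGE_ALL_BIT);
}
```

The cost of this design is that every dereference must go through the untagging helper. Dereferencing a tagged pointer directly is a bug.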
2012-01-04 | sna: Limit batch to a single page on 865g | Chris Wilson
Verified on real hw, this undocumented (at least in the bspec before me) bug truly exists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-03 | sna: Use a cheaper no-reduction damage check for simply discarding further damage | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-28 | sna: Refactor common code for testing gpu busyness of a pixmap | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-22 | sna: discard damage-all even for width|height==0 operations | Chris Wilson
Even if we don't know the extents of the render operation, if the entire pixmap is damaged we can still reduce the damage tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-19 | sna/gen[23]: We need to check the batch before doing an inline flush | Chris Wilson
A missing check before emitting a dword into the batch opened up the possibility of overflowing the batch and corrupting our state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
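The kind of check the 2011-12-19 entry refers to can be sketched with a toy batch buffer. Everything here is hypothetical (`struct batch`, `BATCH_DWORDS`, `batch_submit`, `batch_emit`). The real driver tracks relocations and reserved space as well, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_DWORDS 1024 /* hypothetical capacity in dwords */

/* Toy command batch: a fixed array plus a fill counter. */
struct batch {
    uint32_t dwords[BATCH_DWORDS];
    unsigned int used;
    unsigned int submits; /* how many times the batch was flushed */
};

/* Stand-in for handing the batch to the kernel and starting afresh. */
static void batch_submit(struct batch *b)
{
    b->used = 0;
    b->submits++;
}

static bool batch_has_space(const struct batch *b, unsigned int n)
{
    return b->used + n <= BATCH_DWORDS;
}

/* Check for room before writing, flushing first if the batch is full. */
static void batch_emit(struct batch *b, uint32_t dword)
{
    if (!batch_has_space(b, 1))
        batch_submit(b);

    assert(b->used < BATCH_DWORDS);
    b->dwords[b->used++] = dword;
}
```

Omitting the `batch_has_space()` test would let the write land one element past the array, which is the overflow the commit message describes.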
2011-12-17 | sna: Simplify write domain tracking | Chris Wilson
Replace the growing bitfield with an enum marking where it was last used. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17 | sna: Map the upload buffer using an LLC bo | Chris Wilson
In order to avoid having to perform a copy of the cacheable buffer into GPU space, we can map a bo as cacheable and write directly to its contents. This is only a win on systems that can avoid the clflush, and also we have to go to greater measures to avoid unnecessary serialisation upon that CPU bo. Sadly, we do not yet go to sufficient lengths to avoid negatively impacting ShmPutImage, but that does not appear to be an artefact of stalling upon a CPU buffer. Note, LLC is a SandyBridge feature enabled by default in kernel 3.1 and later. In time, we should be able to expose similar support for snoopable buffers for other generations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 | sna: Be more pessimistic with CPU sources | Chris Wilson
Try to avoid a few more unnecessary context switches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 | sna/trapezoids: Try to render traps onto a8 destinations in place | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-13 | sna/composite: Attempt to reduce the damage is the operation is contained | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-07 | sna: Expand multiplies of two 16-bit values to a full 32-bit range | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
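The 2011-11-07 entry has no body, so the following is only a guess at the pattern it addresses. In C, two uint16_t operands are promoted to int before multiplication, so on a platform with 32-bit int the product 65535 * 65535 overflows a signed int, which is undefined behaviour. Converting one operand to uint32_t first keeps the multiply in unsigned 32-bit arithmetic. `area_u16` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 16-bit sizes without overflowing the promoted int type:
 * the cast forces the multiplication to be done in uint32_t. */
static uint32_t area_u16(uint16_t width, uint16_t height)
{
    return (uint32_t)width * height;
}
```

The largest possible product, 65535 * 65535 = 4294836225, still fits in a uint32_t, so the widened multiply cannot overflow.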
2011-11-04 | sna: Add some asserts to detect buffer overflow. | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-04 | sna/gen6: Poor man's spans layered on top of the exisiting composite | Chris Wilson
Performance of this lazy interface looks inconclusive:

Speedups
========
xlib swfdec-giant-steps 1063.56 -> 710.68: 1.50x speedup
xlib firefox-asteroids 3612.55 -> 3012.58: 1.20x speedup
xlib firefox-canvas-alpha 15837.62 -> 13442.98: 1.18x speedup
xlib ocitysmap 1106.35 -> 970.66: 1.14x speedup
xlib firefox-canvas 33140.27 -> 30616.08: 1.08x speedup
xlib poppler 629.97 -> 585.95: 1.08x speedup
xlib firefox-talos-gfx 2754.37 -> 2562.00: 1.08x speedup

Slowdowns
=========
xlib gvim 1363.16 -> 1439.64: 1.06x slowdown
xlib midori-zoomed 758.48 -> 904.37: 1.19x slowdown
xlib firefox-fishbowl 22068.29 -> 26547.84: 1.20x slowdown
xlib firefox-planet-gnome 2995.96 -> 4231.44: 1.41x slowdown

It remains off and a curiosity for the time being. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-19 | sna: Enlarge the minimum pixmap size to migrate for Render | Chris Wilson
This is to work around a ping-pong issue involving small icons. The horrible sequence of operations appears to use a tiled FillRect to copy from the scanout onto a temporary pixmap, which causes us to read back from the scanout. We are destined to hit the fallback path there anyway until we implement stippling... References: https://bugs.freedesktop.org/show_bug.cgi?id=41718 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-17 | sna: perform a warnings reduction pass | Chris Wilson
Didn't spot anything that might have led to a genuine bug, but this should help improve the signal-to-noise ratio of warnings in the future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 | sna: Also allow BLT copies to discard the alpha channel | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 | sna: Introduce a new acceleration model. | Chris Wilson
The premise is that switching between rings (i.e. the BLT and RENDER rings) on SandyBridge imposes a large latency overhead whilst rendering. The cause is that in order to switch rings, we need to split the batch earlier than is desired and to add serialisation between the rings, both of which incur a large overhead.

By switching to using a pure 3D blit engine (ok, not so pure, as the BLT engine still has uses for the core drawing model which cannot be easily represented without a combinatorial explosion of shaders) we can take advantage of additional efficiencies, such as relative relocations, that have been incorporated into recent hardware advances. However, even older hardware benefits from avoiding the implicit context switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for the operations that are not worth accelerating in hardware, we introduce a shadow buffer mechanism throughout and reintroduce pixmap migration. Doing this efficiently is the cornerstone of ensuring that we do exploit the increased potential of recent hardware for running old applications and environments (i.e. so that the latest and greatest chip is actually faster than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are running older chipsets and welcome the performance increase offered by this patch, then you may choose to call it Snazzy instead.

Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup

Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown

[Note the performance regression for ocitysmap comes from the fact that we now attempt to support rendering to and (more importantly) from large surfaces. Enabling such operations is the only way to one day be faster than purely using the CPU; in the meantime we suffer a regression due to the increased migration and aperture thrashing. The other couple of regressions will be eliminated with improved span and shader support, now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other critical aspects of the architecture:

World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6):
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by the dix to implement SHM operations, used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>