path: root/src/sna/sna_render_inline.h
Age  Commit message  Author
2014-03-10sna: Pass render hints for migration based on source locationChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-12-11sna/gen8: Initial backend for BroadwellChris Wilson
Should match the functionality of the earlier generations, but untuned. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-19sna: Treat a source with a CPU bo as being attached.Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-12sna/gen4: Tweak compilation flags to avoid mixed settings across functionsChris Wilson
Confusing gcc with different flags for supposedly inlined functions is not a good idea. References: https://bugs.freedesktop.org/show_bug.cgi?id=62198 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-08sna/gen4: Split the have_render flag in separate prefer_gpu hintsChris Wilson
The idea is to implement more fine-grained checks as we may want different heuristics for desktops with GT1s than for mobile GT2s, etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16sna: Revert use of a separate CAN_CREATE_SMALL flagChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13sna: Relax limitation on not mapping GPU bo with shadow pointersChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-02sna: Fast path inplace addition of solid trapezoidsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-20sna/gen4+: Amalgamate all the gen4-7 vertex buffer emissionChris Wilson
Having reduced all the vb code for these generations to the same set of routines, we can refactor them into a single set of functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna: Include shm hint in render placementChris Wilson
The goal is to reduce the preference of rendering to a SHM pixmap - only if it is already active, will we consider continuing to use it on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-14sna/gen2+: Experiment with not forcing migration to GPU after CPU rasterisationChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-01sna: Try to reduce ping-pong migration for intermixed render/legacy code pathsChris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=56591 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-31sna: Clamp the drawable box to prevent int16 overflowChris Wilson
And assert that the box is valid when migrating. References: https://bugs.freedesktop.org/show_bug.cgi?id=56591 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
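The clamping described in this entry can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct box16`, `clamp_int16` and `make_box` are hypothetical names, and the inputs are assumed to be small enough that `x + w` does not itself overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical box type with 16-bit coordinates, modelled on an X box. */
struct box16 { int16_t x1, y1, x2, y2; };

/* Saturate a 32-bit value into the int16 range. */
static int16_t clamp_int16(int32_t v)
{
	if (v < INT16_MIN)
		return INT16_MIN;
	if (v > INT16_MAX)
		return INT16_MAX;
	return (int16_t)v;
}

/* Compute origin + size in 32 bits, then clamp each edge to int16. */
static struct box16 make_box(int32_t x, int32_t y, int32_t w, int32_t h)
{
	struct box16 b;

	b.x1 = clamp_int16(x);
	b.y1 = clamp_int16(y);
	b.x2 = clamp_int16(x + w);
	b.y2 = clamp_int16(y + h);

	/* Validity check, assuming non-negative w and h. */
	assert(b.x1 <= b.x2 && b.y1 <= b.y2);
	return b;
}
```

Doing the addition in a wider type before clamping is what prevents the wrap-around; adding two int16 values and storing the result straight back into an int16 would silently truncate.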
2012-10-07sna/gen2: Compile fixChris Wilson
Be careful when cutting and pasting assertions! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-07sna/gen2: Add a couple of assertions to track down a batch overflowChris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=55700 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-20sna/gen3+: Trim the target extents to the CompositeClipChris Wilson
When computing the active region of a composite operation with unknown extents, we simply try to use the whole Drawable. However, this needs to be clipped, otherwise it may trigger an assertion failure with an offscreen pixmap. References: https://bugs.freedesktop.org/show_bug.cgi?id=55164 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-20sna: Remove confusing is_cpu()Chris Wilson
The only real user now has its own heuristics, so convert the remaining users over to !is_gpu(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-19sna: Tweak is_cpu/is_gpu heuristicsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-21sna: Refresh experimental userptr vmap supportChris Wilson
Bring the code up to date with both kernel interface changes and internal adjustments following the creation of CPU buffers with set-caching. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14sna: Make sure we check for a busy CPU bo before declaring is-cpuChris Wilson
Even if the pixmap is entirely damaged on the CPU, we still may be in the process of transferring it and so cause an unwanted stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-07-14sna: Aim for consistency and use stdbool except for core X APIsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-27sna: Remove a trailing ';'Chris Wilson
The unwanted ';' caused is_cpu() to always return false if a GPU bo was attached. Not necessarily a bad thing, it just misses the potential optimisation where, having chosen to prefer the CPU path, we then have to migrate to the GPU even though the bo is undamaged or idle. Spotted-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17sna: Further refine choice of placement when uploading source data.Chris Wilson
The goal is to cheaply spot a simple copy operation that can be performed on the CPU without having to load both parties onto the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17sna: Tweak placement of operationsChris Wilson
Take into account the busyness of the damaged GPU bo when considering placement of the subsequent operations. In particular, note that is_cpu is only used when we feel that the following operation would be better on the CPU and just want to confirm that doing so will not stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-04-19sna: Don't consider upload proxies as being on the GPU for render targetsChris Wilson
The upload proxy is a fake buffer that we do not want to render to, as the damage tracking then becomes extremely confused and the buffer itself is not optimised for persistent rendering. We assert that we do not use it as a render target, and this patch adds the check so that we avoid treating the proxy as a valid target when choosing the render path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-04-03sna/gen3: Convert the clear-color from picture->format to a8r8g8b8Chris Wilson
The shaders treat colours as an argb value, however the clear color is stored in the pixmap's native format (a8, r5g6b5, x8r8g8b8 etc). So before using the value of the clear color as a solid we need to convert it into the a8r8g8b8 format. Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48204 Reported-by: Paul Neumann <paul104x@yahoo.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47308 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
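The conversion this entry describes can be sketched as below. This is an illustration only: the `enum fmt` tags and `color_to_argb` are hypothetical stand-ins for the Render picture formats and the driver's own helper, and only the three native formats named in the message are handled.

```c
#include <stdint.h>

/* Hypothetical format tags standing in for the Render picture formats. */
enum fmt { FMT_A8, FMT_R5G6B5, FMT_X8R8G8B8, FMT_A8R8G8B8 };

/* Expand a 5- or 6-bit channel to 8 bits by replicating its high bits. */
static uint32_t expand5(uint32_t v) { return (v << 3) | (v >> 2); }
static uint32_t expand6(uint32_t v) { return (v << 2) | (v >> 4); }

/* Convert a pixel stored in its native format into a8r8g8b8. */
static uint32_t color_to_argb(uint32_t pixel, enum fmt format)
{
	switch (format) {
	case FMT_A8:
		/* Alpha only: colour channels are zero. */
		return (pixel & 0xff) << 24;
	case FMT_R5G6B5: {
		uint32_t r = expand5((pixel >> 11) & 0x1f);
		uint32_t g = expand6((pixel >> 5) & 0x3f);
		uint32_t b = expand5(pixel & 0x1f);
		/* No alpha channel: treat as opaque. */
		return 0xff000000u | (r << 16) | (g << 8) | b;
	}
	case FMT_X8R8G8B8:
		/* Unused top byte: force opaque alpha. */
		return 0xff000000u | (pixel & 0x00ffffffu);
	case FMT_A8R8G8B8:
	default:
		return pixel;
	}
}
```

Replicating the high bits when widening a channel maps the maximum 5- or 6-bit value to exactly 0xff, which a plain left shift would not.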
2012-03-22sna: Force fallbacks if the destination is unattachedChris Wilson
Since the removal of the ability to create a backing pixmap after the creation of its parent, it no longer becomes practical to attempt rendering with the GPU to unattached pixmaps. So having made the decision never to render to that pixmap, perform the test explicitly along the render paths. This fixes a segmentation fault introduced in 8a303f195 (sna: Remove existing damage before overwriting with a composite op) which assumed the existence of a backing pixmap along a render path. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47700 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-13sna: Prefer to render very thin trapezoids inplaceChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-03-08sna/gen2+: Prefer not to fallback if the source is busyChris Wilson
If we try to perform the operation while operations are outstanding on the source pixmaps, we will stall waiting for them to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-22sna/blt: Avoid clobbering the composite state if we fail to setup the BLTChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30sna: Track large objects and limit prefer-gpu hint to small objectsChris Wilson
As the GATT is irrespective of actual RAM size, we need to be careful not to be too generous when allocating GPU bo and their shadows. So first of all we limit default render targets to those small enough to fit comfortably in RAM alongside others, and secondly we try to only keep a single copy of large objects in memory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-26sna/gen2+: Include being unattached in the list of source fallbacksChris Wilson
If the source is not attached to a buffer (be it a GPU bo or a CPU bo), a temporary upload buffer would be required and so it is not worth forcing the target to the destination in that case (should the target not be on the GPU already). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14sna: Ensure that the batch mode is always declared before emitting dwordsChris Wilson
Initially, the batch->mode was only set upon an actual mode switch; batch submission would not reset the mode. However, to facilitate fast ring switching with semaphores, resetting the mode upon batch submission is desired, which means that if we submit the batch in the middle of an operation we must redeclare its mode before continuing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14sna: Upload continuation vertices into mmapped buffersChris Wilson
In the common case, we expect a very small number of vertices which will fit into the batch along with the commands. However, in full flow we overflow the on-stack buffer and likely several continuation buffers. Streaming those straight into the GTT seems like a good idea, with the usual caveats over aperture pressure. (Since these are linear we could use snoopable bo for the architectures that support such for vertex buffers and if we had kernel support.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14sna: Hint whether we prefer to use the GPU for a pixmapChris Wilson
This includes the condition where the pixmap is too large, as well as being too small, to be allocatable on the GPU. It is only a hint set during creation, and may be overridden if required. This fixes the regression in ocitysmap which decided to render glyphs into a GPU mask for a destination that does not fit into the aperture. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12sna: Store damage-all in the low bit of the damage pointerChris Wilson
Avoid the function call overhead by inspecting the low bit to see if it is all-damaged already. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
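The low-bit trick mentioned here is a form of pointer tagging. A minimal sketch follows, assuming the damage structure is at least 2-byte aligned so that bit 0 of its address is always free; `struct damage` and the helper names are placeholders for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the real damage-tracking structure. */
struct damage { int dummy; };

#define DAMAGE_ALL_BIT ((uintptr_t)1)

/* Cheap inline test: no function call, just inspect bit 0 of the pointer. */
static inline bool damage_is_all(const struct damage *tagged)
{
	return ((uintptr_t)tagged & DAMAGE_ALL_BIT) != 0;
}

/* Return the pointer with the "all damaged" flag set in its low bit. */
static inline struct damage *damage_mark_all(struct damage *d)
{
	/* Relies on the alignment assumption stated above. */
	assert(((uintptr_t)d & DAMAGE_ALL_BIT) == 0);
	return (struct damage *)((uintptr_t)d | DAMAGE_ALL_BIT);
}

/* Strip the flag to recover a pointer that is safe to dereference. */
static inline struct damage *damage_ptr(const struct damage *tagged)
{
	return (struct damage *)((uintptr_t)tagged & ~DAMAGE_ALL_BIT);
}
```

The cost of this design is that every dereference must go through the untagging helper; the benefit is that the common "is everything damaged?" query needs no memory access at all.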
2012-01-04sna: Limit batch to a single page on 865gChris Wilson
Verified on real hw, this undocumented (at least in the bspec before me) bug truly exists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-03sna: Use a cheaper no-reduction damage check for simply discarding further damageChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-28sna: Refactor common code for testing gpu busyness of a pixmapChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-22sna: discard damage-all even for width|height==0 operationsChris Wilson
Even if we don't know the extents of the render operation, if the entire pixmap is damaged we can still reduce the damage tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-19sna/gen[23]: We need to check the batch before doing an inline flushChris Wilson
A missing check before emitting a dword into the batch opened up the possibility of overflowing the batch and corrupting our state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17sna: Simplify write domain trackingChris Wilson
Replace the growing bitfield with an enum marking where it was last used. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17sna: Map the upload buffer using an LLC boChris Wilson
In order to avoid having to perform a copy of the cacheable buffer into GPU space, we can map a bo as cacheable and write directly to its contents. This is only a win on systems that can avoid the clflush, and also we have to go to greater measures to avoid unnecessary serialisation upon that CPU bo. Sadly, we do not yet go to enough lengths to avoid negatively impacting ShmPutImage, but that does not appear to be an artefact of stalling upon a CPU buffer. Note, LLC is a SandyBridge feature enabled by default in kernel 3.1 and later. In time, we should be able to expose similar support for snoopable buffers for other generations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10sna: Be more pessimistic with CPU sourcesChris Wilson
Try to avoid a few more unnecessary context switches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10sna/trapezoids: Try to render traps onto a8 destinations in placeChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-13sna/composite: Attempt to reduce the damage if the operation is containedChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-07sna: Expand multiplies of two 16-bit values to a full 32-bit rangeChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
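The commit message does not say which expression was affected, so the following is only a general illustration of the technique named in the subject. In C, two `uint16_t` operands are promoted to signed `int` before multiplying, so a product above `INT_MAX` overflows; widening one operand first keeps the multiply in unsigned 32-bit arithmetic. `pixel_count` is a hypothetical helper, and a platform with 32-bit `int` is assumed.

```c
#include <stdint.h>

/* Hypothetical helper: number of pixels in a width x height area. */
static uint32_t pixel_count(uint16_t width, uint16_t height)
{
	/*
	 * Cast one operand to uint32_t before multiplying so the whole
	 * product is computed in unsigned 32 bits rather than in the
	 * signed int that uint16_t operands would be promoted to.
	 */
	return (uint32_t)width * height;
}
```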
2011-11-04sna: Add some asserts to detect buffer overflow.Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-04sna/gen6: Poor man's spans layered on top of the existing compositeChris Wilson
Performance of this lazy interface looks inconclusive:

Speedups
========
xlib swfdec-giant-steps 1063.56 -> 710.68: 1.50x speedup
xlib firefox-asteroids 3612.55 -> 3012.58: 1.20x speedup
xlib firefox-canvas-alpha 15837.62 -> 13442.98: 1.18x speedup
xlib ocitysmap 1106.35 -> 970.66: 1.14x speedup
xlib firefox-canvas 33140.27 -> 30616.08: 1.08x speedup
xlib poppler 629.97 -> 585.95: 1.08x speedup
xlib firefox-talos-gfx 2754.37 -> 2562.00: 1.08x speedup

Slowdowns
=========
xlib gvim 1363.16 -> 1439.64: 1.06x slowdown
xlib midori-zoomed 758.48 -> 904.37: 1.19x slowdown
xlib firefox-fishbowl 22068.29 -> 26547.84: 1.20x slowdown
xlib firefox-planet-gnome 2995.96 -> 4231.44: 1.41x slowdown

It remains off and a curiosity for the time being.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-19sna: Enlarge the minimum pixmap size to migrate for RenderChris Wilson
This is to work around a ping-pong issue involving small icons. The horrible sequence of operations appears to use a tiled FillRect to copy from the scanout onto a temporary pixmap, which causes us to read back from the scanout. We are destined to hit the fallback path there anyway until we implement stippling... References: https://bugs.freedesktop.org/show_bug.cgi?id=41718 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>