path: root/src/sna/gen6_render.c
Age | Commit message | Author
2012-01-06 | sna: Only force a pipeline flush for a change of destination, not sources | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-06 | sna/gen6: Reuse current no-blending setup for PictOpClear | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-06 | sna/gen6: Tidy emission of CC state (blending) | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-05 | sna/gen6: Only force BLT if the src and dst overlap for self-copy | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-05 | sna/gen6: Enable reuse of source Picture | Chris Wilson
Check if the source and mask are identical pictures and just copy the source channel to the mask in that case.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
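The identical-picture check in this entry can be sketched in C. This is a minimal illustration, not the driver's code: `struct picture`, `struct channel`, `prepare_channel()` and the `uploads` counter are hypothetical stand-ins, and "identical" is assumed to mean the very same Picture pointer (and therefore the same format).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a Render picture and a prepared sampler
 * channel; the real driver structures carry far more state. */
struct picture {
	const void *pixels;
	int format;
};

struct channel {
	const struct picture *pict;
};

static int uploads; /* counts simulated uploads, for illustration only */

/* Hypothetical: stands in for uploading the picture into a GPU bo. */
static void prepare_channel(struct channel *ch, const struct picture *p)
{
	ch->pict = p;
	uploads++;
}

/* Prepare the source and (optional) mask channels. When the mask is the
 * very same picture as the source, copy the source channel instead of
 * uploading the pixel data a second time. */
static void prepare_src_mask(struct channel *src, struct channel *mask,
			     const struct picture *src_pict,
			     const struct picture *mask_pict)
{
	prepare_channel(src, src_pict);
	if (mask_pict == NULL)
		memset(mask, 0, sizeof(*mask));
	else if (mask_pict == src_pict)
		*mask = *src;
	else
		prepare_channel(mask, mask_pict);
}
```

Two distinct pictures with equal contents are still uploaded twice here; only pointer identity is detected.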
2011-12-23 | sna: Pass usage-hint to move-to-gpu | Chris Wilson
When simply creating a source GPU bo, it is preferable not to mark it as all-damaged.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-23 | sna/gen2+: Reuse source channel for mask where possible | Chris Wilson
GTK+ has a clever trick for premultiplying its images by loading the same pixel data into both the source and mask, and then performing the composite. This causes us to upload the same pixel data twice!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-22 | sna: flatten source alphamaps | Chris Wilson
Replace the source picture+alpha with a bo that contains the RGB channels from source and A from the alpha map.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
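A CPU-side sketch of the flattening step, assuming tightly packed a8r8g8b8 source pixels and an a8 alpha map of the same size with no origin offset. `flatten_alphamap()` is a hypothetical helper; the driver's own path, formats and placement of this work may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the RGB channels of each a8r8g8b8 source pixel with the A value
 * from an a8 alpha map of the same dimensions (tightly packed, no origin
 * offset assumed), producing a single self-contained a8r8g8b8 image. */
static void flatten_alphamap(uint32_t *dst, const uint32_t *src,
			     const uint8_t *alpha, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = (src[i] & 0x00ffffffu) | ((uint32_t)alpha[i] << 24);
}
```

The resulting image can then be sampled as an ordinary source with no alpha map attached.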
2011-12-22 | sna/gen2+: Prefer to use the CPU if we have a source alphamap and CPU pictures | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-20 | sna: Implement extended fallback handling for src == dst copies | Chris Wilson
Only marginally better than falling all the way back to the CPU is performing a double copy to work around the overlapping copy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
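The double copy can be illustrated on the CPU, assuming a linear 32-bit surface with the stride given in pixels: copy the source box to a temporary, then from the temporary to the destination, so neither pass reads pixels that have already been overwritten. `self_copy_box()` is hypothetical; in the driver the temporary would presumably be a GPU bo and each pass a blit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a w x h box within one surface when the source and destination
 * boxes may overlap. Returns 0 on success, -1 if the temporary cannot be
 * allocated. */
static int self_copy_box(uint32_t *pixels, int stride,
			 int src_x, int src_y, int dst_x, int dst_y,
			 int w, int h)
{
	if (w <= 0 || h <= 0)
		return 0; /* nothing to copy */

	uint32_t *tmp = malloc((size_t)w * (size_t)h * sizeof(*tmp));
	if (tmp == NULL)
		return -1;

	/* Pass 1: source box -> temporary (cannot overlap). */
	for (int y = 0; y < h; y++)
		memcpy(tmp + (size_t)y * w,
		       pixels + (size_t)(src_y + y) * stride + src_x,
		       (size_t)w * sizeof(*tmp));

	/* Pass 2: temporary -> destination box. */
	for (int y = 0; y < h; y++)
		memcpy(pixels + (size_t)(dst_y + y) * stride + dst_x,
		       tmp + (size_t)y * w,
		       (size_t)w * sizeof(*tmp));

	free(tmp);
	return 0;
}
```

The cost is twice the memory traffic plus a temporary allocation, which is why it only beats a full CPU fallback by a small margin.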
2011-12-18 | sna/gen5: Tidy checking against hardcoded maximum 3D size | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-18 | sna/gen[67]: check for context switch after preparing source | Chris Wilson
If we used the BLT to prepare the source, see if we can continue the operation on the BLT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-18 | sna/gen2+: If we use the BLT to prepare the target, try using BLT for op | Chris Wilson
If we incurred a context switch to the BLT in order to prepare the target (uploading damage for instance), we should recheck whether we can continue the operation on the BLT rather than force a switch back to RENDER.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-18 | sna/gen5+: First try a blt composite if the source/dest are too large | Chris Wilson
If we will need to extract either the source or the destination, we should see if we can do the entire operation on the BLT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17 | sna: Simplify write domain tracking | Chris Wilson
Replace the growing bitfield with an enum marking where it was last used.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
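A sketch of tracking only the last domain in which a bo was used, as opposed to accumulating one bit per domain. The four domain names, `struct bo` and both helpers are hypothetical, and the rule that any change of domain requires synchronisation is a deliberate simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the single domain in which a bo was last used. */
enum bo_domain {
	DOMAIN_NONE, /* not yet used */
	DOMAIN_CPU,  /* last accessed through a CPU mapping */
	DOMAIN_GTT,  /* last accessed through a GTT mapping */
	DOMAIN_GPU,  /* last used by the GPU */
};

struct bo {
	enum bo_domain domain;
};

/* Simplified rule: moving to a different domain than the last one
 * requires synchronisation; staying in the same domain does not. */
static bool bo_needs_sync(const struct bo *bo, enum bo_domain target)
{
	return bo->domain != DOMAIN_NONE && bo->domain != target;
}

static void bo_mark_used(struct bo *bo, enum bo_domain domain)
{
	bo->domain = domain; /* overwrite, rather than OR in another bit */
}
```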
2011-12-17 | sna: Map the upload buffer using an LLC bo | Chris Wilson
In order to avoid having to copy the cacheable buffer into GPU space, we can map a bo as cacheable and write directly to its contents. This is only a win on systems that can avoid the clflush, and we also have to go to greater lengths to avoid unnecessary serialisation upon that CPU bo. Sadly, we do not yet go far enough to avoid negatively impacting ShmPutImage, but that does not appear to be an artefact of stalling upon a CPU buffer.
Note: LLC is a SandyBridge feature, enabled by default in kernel 3.1 and later. In time, we should be able to expose similar support for snoopable buffers on other generations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17 | sna/gen4+: disable the blend unit for PictOpSrc | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17 | src/gen4+: Add support for depth 15 render copies/fills | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 | sna/gen6: Only use CPU bo for a render target if untiled | Chris Wilson
For large render targets, we prefer to use tiled bo in order to avoid severe performance degradation. However, if we don't have a GPU bo but do have a CPU bo and the operation would be untiled, then simply use the CPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 | sna/gen6: Tidy the usage of the max pipeline size | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-08 | sna/gen6: Reduce dst readbacks for unsupported sources | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-29 | sna/gen5: Handle cpu-bo for render targets | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-29 | sna/gen6: Set the batch mode prior to checking limits and flushing | Chris Wilson
If we change contexts, then we will submit the batch, obsoleting the earlier resource checks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-27 | sna/gen6+: Only use BLT if the untiled bo will cause per-pixel TLB misses | Chris Wilson
That is, only force the BLT if using the sampler is going to be incredibly slow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
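One plausible reading of "per-pixel TLB misses" is an untiled bo whose pitch is at least one 4 KiB page, so that every row lands on its own page. The sketch below encodes that assumption; the page-size constant, `enum tiling`, `struct bo`, `untiled_tlb_miss()` and `prefer_blt()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

enum tiling { TILING_NONE, TILING_X, TILING_Y }; /* hypothetical */

struct bo {
	enum tiling tiling;
	uint32_t pitch; /* bytes per row */
};

/* In a linear layout with a pitch of at least one page, vertically
 * adjacent pixels sit on different pages, so a sampler stepping between
 * rows can take a TLB miss on almost every access. */
static bool untiled_tlb_miss(const struct bo *bo)
{
	return bo->tiling == TILING_NONE && bo->pitch >= ASSUMED_PAGE_SIZE;
}

/* Force the BLT only when sampling either bo would be pathologically slow. */
static bool prefer_blt(const struct bo *src, const struct bo *dst)
{
	return untiled_tlb_miss(src) || untiled_tlb_miss(dst);
}
```

Narrow untiled surfaces keep several rows per page, so under this assumption they stay on RENDER.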
2011-11-25 | sna/gen5+: Prefer using the BLT when either src or dst is untiled | Chris Wilson
The cost of the TLB miss on every sample far outweighs the impact of the context (and ring) switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-14 | sna: tidy assignment of composite damage | Chris Wilson
Make sure that the damage is always set, even if only to NULL, so that we are safe if in future the operation state is not initially cleared.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-13 | sna/composite: Attempt to reduce the damage if the operation is contained | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-12 | sna/damage: Reduce the damage for evaluating sna_damage_is_all | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-05 | sna: Do the supported PictOp check first | Chris Wilson
There is no point even attempting a BLT operation if we know that it is an unusual render operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-04 | sna/gen6: Enable spans interface for boxes | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-04 | sna/gen6: Poor man's spans layered on top of the existing composite | Chris Wilson
Performance of this lazy interface looks inconclusive:
Speedups
========
xlib swfdec-giant-steps 1063.56 -> 710.68: 1.50x speedup
xlib firefox-asteroids 3612.55 -> 3012.58: 1.20x speedup
xlib firefox-canvas-alpha 15837.62 -> 13442.98: 1.18x speedup
xlib ocitysmap 1106.35 -> 970.66: 1.14x speedup
xlib firefox-canvas 33140.27 -> 30616.08: 1.08x speedup
xlib poppler 629.97 -> 585.95: 1.08x speedup
xlib firefox-talos-gfx 2754.37 -> 2562.00: 1.08x speedup
Slowdowns
=========
xlib gvim 1363.16 -> 1439.64: 1.06x slowdown
xlib midori-zoomed 758.48 -> 904.37: 1.19x slowdown
xlib firefox-fishbowl 22068.29 -> 26547.84: 1.20x slowdown
xlib firefox-planet-gnome 2995.96 -> 4231.44: 1.41x slowdown
It remains off and a curiosity for the time being.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-03 | sna: Support binding of a bo for multiple formats | Chris Wilson
Applications may use the same pixmap with multiple formats within the same operation. For instance, you can premultiply and composite a normal pixmap in this manner. However, as we reused the sampler binding locations of the source (without an alpha channel) for the mask, we failed to read and multiply by the alpha channel, causing it to remain black instead of transparent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40926
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
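A sketch of keying surface bindings on the (bo, format) pair rather than on the bo alone, so that one pixmap sampled as an xRGB source and as an ARGB mask gets two distinct entries. The table layout, `MAX_BINDINGS`, `bind_surface()` and the numeric format codes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BINDINGS 16 /* hypothetical table size */

struct binding {
	uint32_t handle; /* bo handle */
	uint32_t format; /* surface format the bo is bound with */
};

struct binding_table {
	struct binding entry[MAX_BINDINGS];
	int count;
};

/* Return the binding-table slot for (handle, format), adding it if absent.
 * Matching on the format as well as the bo keeps an alpha-less view of a
 * pixmap from being reused where an alpha channel is required.
 * Returns -1 if the table is full. */
static int bind_surface(struct binding_table *t,
			uint32_t handle, uint32_t format)
{
	for (int i = 0; i < t->count; i++)
		if (t->entry[i].handle == handle &&
		    t->entry[i].format == format)
			return i;

	if (t->count == MAX_BINDINGS)
		return -1;

	t->entry[t->count].handle = handle;
	t->entry[t->count].format = format;
	return t->count++;
}
```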
2011-10-30 | sna: Don't mess with NDEBUG | Chris Wilson
This is set in configure and redefining it later inside the C files just leads to trouble and broken compilation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-30 | Revert "sna: Don't flush the render caches if in the process of writing again" | Chris Wilson
This reverts commit 15266e1b9500f6b348661c60d1982bde911f2d0e.
KDE relies upon the ability to render into a sampler and then render upon itself. Not the first sign of madness... Will have to find another way of winning back the compwinwin performance.
2011-10-30 | sna/composite: Fix incorrect operator reduction for RenderFillRectangles | Chris Wilson
As exemplified by KDE (using Kate) on gen3, it would attempt to render a large set of boxes using OVER and a transparent colour. As gen3 copied across some of the BLT assumptions, it was incorrectly reducing that to a CLEAR and thus rendering incorrectly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
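For a solid fill, OVER with a fully transparent premultiplied colour leaves the destination unchanged (dst = src + (1 - src.alpha) * dst = dst), so the correct reduction is a no-op rather than CLEAR. A minimal sketch, assuming a valid premultiplied ARGB32 colour; the operator enum and `reduce_solid_op()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of Render operators, for illustration only. */
enum op { OP_CLEAR, OP_SRC, OP_OVER, OP_NOOP };

/* Reduce a solid-colour fill to a cheaper equivalent operator.
 * *pixel is a premultiplied a8r8g8b8 colour and may be rewritten. */
static enum op reduce_solid_op(enum op op, uint32_t *pixel)
{
	uint8_t alpha = (uint8_t)(*pixel >> 24);

	switch (op) {
	case OP_CLEAR:
		*pixel = 0;
		return OP_SRC; /* CLEAR == SRC with transparent black */
	case OP_OVER:
		if (alpha == 0xff)
			return OP_SRC;  /* opaque source: OVER == SRC */
		if (alpha == 0)
			return OP_NOOP; /* transparent source: dst unchanged, not CLEAR */
		return OP_OVER;
	default:
		return op;
	}
}
```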
2011-10-29 | sna/genX: Improve reduction of Render operator to BLT alu | Chris Wilson
This appeared to introduce a visual glitch into the xfce4 selection box, on gen6 at least.
References: https://bugs.freedesktop.org/show_bug.cgi?id=42367
Reported-by: Paul Neumann <paul104x@yahoo.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-28 | sna: Don't flush the render caches if in the process of writing again | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-27 | sna: Fix debug compilation, again. | Chris Wilson
2011-10-26 | sna: Convert diagonal zero-width lines into blits | Chris Wilson
Although this is slower than falling back to swrast for x11perf (up to 4x slower on SNB), it is still faster than performing that rasterisation through a WC mapping, and much faster in ordinary usage because it avoids the readback hit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
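One way to turn a zero-width line into blits is to walk it with a Bresenham-style stepper and emit one box per run of pixels on the same scanline; each box could then be issued as a BLT fill. This sketch handles only x-major lines, uses a textbook Bresenham rather than the exact X11 thin-line rules, and `Box`/`line_to_boxes()` are hypothetical (the layout merely mirrors the X server's BoxRec).

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the layout of the X server's BoxRec: covers [x1,x2) x [y1,y2). */
typedef struct {
	short x1, y1, x2, y2;
} Box;

/* Convert an x-major zero-width line (|dx| >= |dy|) into boxes, one per run
 * of pixels sharing a scanline, using a textbook Bresenham walk. Both
 * endpoints are drawn. Returns the number of boxes written. */
static int line_to_boxes(int x0, int y0, int x1, int y1,
			 Box *boxes, int max_boxes)
{
	if (x1 < x0) { /* always walk left to right */
		int t = x0; x0 = x1; x1 = t;
		t = y0; y0 = y1; y1 = t;
	}

	int dx = x1 - x0;
	int dy = abs(y1 - y0);
	int sy = y1 < y0 ? -1 : 1;
	assert(dx >= dy); /* y-major lines are not handled in this sketch */

	int d = 2 * dy - dx; /* decision variable */
	int y = y0, run_start = x0, n = 0;

	for (int x = x0; x <= x1; x++) {
		int step = d > 0; /* does y advance after pixel (x, y)? */

		/* Close the current run at the end of a scanline or the line. */
		if (step || x == x1) {
			if (n == max_boxes)
				break;
			boxes[n].x1 = (short)run_start;
			boxes[n].x2 = (short)(x + 1);
			boxes[n].y1 = (short)y;
			boxes[n].y2 = (short)(y + 1);
			n++;
			run_start = x + 1;
		}
		if (step) {
			y += sy;
			d -= 2 * dx;
		}
		d += 2 * dy;
	}
	return n;
}
```

A shallow line collapses into a few long boxes, while a 45-degree diagonal degenerates into one 1x1 box per pixel, which is where a blit-based path pays the most per pixel.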
2011-10-21 | sna: Remove the memset(0) of the fill op | Chris Wilson
The backends are all expected to initialise the state required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-21 | sna: Pass a BoxRec to the fill op | Chris Wilson
For many of the core drawing routines, passing a BoxRec for the fill is more convenient since they already have one generated by the clip intersection.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-18 | sna/gen6: Don't modify composite arguments whilst checking for BLT | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-18 | sna/gen6: Precompute floats_per_rect | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-18 | sna/gen6: Try continuing with the BLT if the last batch was also BLT | Chris Wilson
In the vain hope of reducing switching between rings and the stalls such switches introduce between batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-18 | sna/gen6: Micro-optimise gen6_rectangle_begin | Chris Wilson
We can only emit state between primitives, ergo we need only check for state updates if we've finished the vbo or are starting a new operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-11 | sna/gen6: Add render support for fill-one-box | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-11 | sna: Support a fast composite hook for solitary boxes | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-25 | sna/kgem: Check all operation bo in a single amalgamation | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-17 | sna: perform a warnings reduction pass | Chris Wilson
Didn't spot anything that might have led to a genuine bug, but this should help improve the signal-to-noise ratio of warnings in the future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-12 | sna/gen6: Prefer RENDER for copies as it compacts better | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>