path: root/src/sna/gen7_render.c
2013-09-27sna/gen4+: Handle very large copies more gracefullyChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-09-25sna/gen6+: Fallback to BLT composite if fallback is forcedChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-09-18intel: Compile fixes for base install of SLED11.sp3Chris Wilson
Highlights of that distribution include xorg-xserver-1.6.5, kernel 3.0.76 and gcc-4.3. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-09-08sna/gen7: Prefer the BLT for gt1 systemsChris Wilson
On gt1, the BCS is faster than the RCS for all equivalent operations, unlike gt2+ where the RCS is faster (but at greater power draw). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-09-06sna/gen6+: Switch to using the BLT more often when off ACChris Wilson
The BLT is more power-efficient for the operations it can handle, so use it when possible (following the usual caveats) if we know we only have battery power. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-09-05sna/gen6+: Don't request extra caching for use-once upload buffersChris Wilson
As we only use these buffers once, we should not benefit from requesting them to be moved into L3/LLC cache - over and above the default recommendations we make when creating the buffer. Indeed, this may even lead to artefacts if we fail to invalidate those other caches when reusing the buffers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-08-28sna/gen6+: Don't force a ring switch for likely TLB misses if already busyChris Wilson
If the target is already on the render ring, don't force the switch away. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-08-28sna/gen6+: Improve ring stickyness for BLT composite opsChris Wilson
Rearrange the tests so that we check both src/dst for which rings they are currently on. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-08-25sna/gen7: Combine a couple of pipe-flushesChris Wilson
Reduce the number of pipe-controls we emit by combining one of the frequent flushes with a stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-08-23sna/gen7: Prefer the render ring for more operationsChris Wilson
As we get more well-endowed GPUs with ever more execution units, it becomes advantageous to do even basic copies through the render ring. However, the extra performance comes at a cost - higher power usage. To mitigate this, we apply a heuristic of only allowing a switch over to the render ring if the render ring is already active with an early request (in addition to the usual stall avoidance and general performance heuristics). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
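The heuristic described in this message can be sketched roughly as follows. This is an illustration only, not the driver's code: `struct gpu_state`, its fields and `choose_copy_ring()` are hypothetical names, and the real driver applies additional stall-avoidance and performance checks.

```c
#include <stdbool.h>

enum ring { RING_BLT, RING_RENDER };

/* Hypothetical snapshot of driver state; not the driver's real structure. */
struct gpu_state {
	enum ring current;   /* ring used by the batch being built */
	bool can_switch;     /* ring switching is currently permitted */
	bool render_busy;    /* render ring already has an earlier request */
};

/*
 * Pick a ring for a basic copy. The render ring is only chosen when it is
 * already in use, so an idle render ring is never woken up for a copy.
 */
static enum ring choose_copy_ring(const struct gpu_state *s)
{
	if (s->current == RING_RENDER)
		return RING_RENDER;
	if (s->can_switch && s->render_busy)
		return RING_RENDER;
	return RING_BLT;
}
```

With the render ring idle, the sketch keeps copies on the BLT, which is the power-saving behaviour the message aims for.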
2013-08-18sna/gen6+: Tweak semaphore avoidance for composite operationsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-08-11sna/gen7: Refine ring selectionChris Wilson
Don't force us to select BLT too early if we allow ring switching. As the RENDER ring benefits from more caching over time (e.g. HSW:GT3e) it becomes much more preferable to use it over the BLT. Since we already have the logic to decide if ring switching is possible/preferred, relax the initial checks on where the current activity is to allow switching between batches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-30sna: Allow CPU access to scanouts if WT cachedChris Wilson
On Iris, we may store the framebuffer in the eLLC/LLC and mark it as being Write-Through cached. This means that we can treat it as being cached for read accesses (either by the GPU or CPU), but must be careful to still not write directly to the scanout with the CPU (only the GPU writes are cached and coherent with the display). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-30sna/gen7: Set appropriate constants for Haswell GT3Chris Wilson
GT3 has twice the number of cores and URB as GT2, and so we can use more threads and URB entries. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-28sna/gen7: Prefer GPU spans for Baytrail as wellChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-28sna/gen7: Use ivb/byt/hsw shorthand for generation checkingChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-28sna/gen7: Rename Valleyview to BaytrailChris Wilson
The codename changed midcycle - along more rational lines (all the chips within the platform are now part of the Baytrail family rather than different codenames for each). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-28sna/gen7: Set appropriate thread counts for Valleyview^BaytrailChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-28intel: Remove some unused macrosChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-22sna: Fix DBG compilationChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-07-13sna: Skip copying to the intermediate target if we will completely overwrite itChris Wilson
Occasionally when forced to use an intermediate destination surface, we know that we will completely overwrite the contents of the surface and so we can forgo the initial copy from the target. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-28sna/gen2+: Consider precision in render operation placementChris Wilson
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66297 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-26sna/gen4+: Fix determination of intermediate extentsChris Wilson
Complete logic fail for finding the bounding box of the boxes to be copied. Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66168 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
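For reference, a minimal sketch of finding the bounding box of a set of boxes. `struct box` is a hypothetical stand-in for the X server's box type, and `boxes_extents()` is an illustrative name, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the X server's box type. */
struct box {
	int16_t x1, y1, x2, y2;
};

/* Return the smallest box that encloses all n input boxes (n must be > 0). */
static struct box boxes_extents(const struct box *b, int n)
{
	struct box e;
	int i;

	assert(n > 0);
	e = b[0];
	for (i = 1; i < n; i++) {
		if (b[i].x1 < e.x1)
			e.x1 = b[i].x1;
		if (b[i].y1 < e.y1)
			e.y1 = b[i].y1;
		if (b[i].x2 > e.x2)
			e.x2 = b[i].x2;
		if (b[i].y2 > e.y2)
			e.y2 = b[i].y2;
	}
	return e;
}
```

Each edge is tracked independently: the minimum of the top-left coordinates and the maximum of the bottom-right coordinates.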
2013-06-14sna/gen3+: Remove redundant clearing of clear hint in video playbackChris Wilson
The clear hint is correctly updated when performing the move-to-gpu and so it is being superfluously repeated by the callers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-13sna/gen7: Set sampler swizzle for video sourcesChris Wilson
Otherwise the sampler on Haswell will just read all zeros when trying to playback a video. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65699 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-13sna/gen2+: Promote a conditional dirty into an assertionChris Wilson
If the target bo is not bound when we start to emit the composite state for the operation, we are screwed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-06sna: Fix format specifier for mismatching int/long in DBGChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-06-05sna: Include the GT details in the backend name for a chipsetChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-05-28sna: Make the backend identifier more informativeChris Wilson
This is useful, for example, with the multiple gen7 variants. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-05-09sna/gen7: Add DBG for channel setup for render sourceChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-04-10sna/gen7: Cache our kernels in L3Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-28sna/gen7: Refine is_gt2() for Haswell versus IvybridgeChris Wilson
The two similar chipsets do not use the same PCI-ID encoding schema. Fixes regression from commit 235a3981ea9759317b392302a2b2b8f4fafab410 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Mar 26 20:37:14 2013 +0000 sna/gen7: Use GT2 values for GT2 variants Reported-by: zaverel@free.fr Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-27sna/gen7: Resist the temptation to overprogram the number of PS threads for HSWChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-27sna/gen7: Fix MOCS for HaswellChris Wilson
The memory attributes changed slightly, and in particular there is now an explicit uncached setting - which of course happened to be the value currently selected. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-27sna/gen7: Restore render acceleration for VLV power-on boardChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-27sna/gen7: Prefer spans for GT2 desktop variantsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-27sna/gen7: Use GT2 values for GT2 variantsChris Wilson
The check was only testing for GT2+ and excluding the normal GT2 devices. See also commit ce9f0448367ea6a90490a28150bfdc0a76500129 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Feb 8 16:01:54 2013 +0000 sna/gen6: Use GT2 settings for both GT2 and GT2+ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-03-14sna/gen5+: Add missing float casts in computation of scaled src offsetsChris Wilson
Without the casts, the division ends up as 0 rather than the fractional offset into the texture. The casts were missed in the claimed fix: commit 89038ddb96aabc4bc1f04402b2aca0ce546e8bf3 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Feb 28 14:35:54 2013 +0000 sna/video: Correct scaling of source offsets Reported-by: Roman Elshin <roman.elshin@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62343 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
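The class of bug described here can be reproduced in a few lines. The function names below are hypothetical and only illustrate the effect of the missing cast; they are not taken from the driver.

```c
#include <assert.h>

/* Integer division: any offset smaller than the width truncates to 0. */
static float src_offset_buggy(int x, int width)
{
	assert(width != 0);
	return x / width;
}

/* Casting one operand first makes the division happen in floating point. */
static float src_offset_fixed(int x, int width)
{
	assert(width != 0);
	return (float)x / width;
}
```

For `x = 16` and `width = 64`, the first variant yields 0 while the second yields the intended fraction 0.25.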
2013-02-28sna/video: Correct scaling of source offsetsChris Wilson
When applying pan and zoom to a mismatched video, it would inevitably miscompute the origin and scale factors. Reported-by: Matti Hamalainen <ccr@tnsp.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61610 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-26sna: Reverse inverted assertionsChris Wilson
Oops, the assertions that we had sufficient free space were inverted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-26sna/gen4+: Begin specialising vertex programs for ISAChris Wilson
Allow use of advanced ISA when available by detecting support at runtime. This initial work just uses GCC to emit varying ISA, future work could use hand written code for these hot spots. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
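One way to let GCC emit the same routine for different ISAs and pick one at runtime is sketched below. It assumes a GCC-compatible compiler on x86 and uses the `target` function attribute with `__builtin_cpu_supports()`; the function names are hypothetical, and the driver's own detection and dispatch may well differ.

```c
#include <stddef.h>

typedef void (*emit_fn)(float *dst, const float *src, size_t n);

/* Baseline version, compiled for the default target. */
static void emit_generic(float *dst, const float *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i] * 2.0f;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX2 instructions for this copy. */
__attribute__((target("avx2")))
static void emit_avx2(float *dst, const float *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i] * 2.0f;
}
#endif

/* Choose an implementation once, based on what the running CPU supports. */
static emit_fn choose_emit(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		return emit_avx2;
#endif
	return emit_generic;
}
```

The caller stores the returned pointer and invokes it in the hot path, so the feature check is paid only once.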
2013-02-25sna/gen3+: Restart vertex space checks after lock contentionChris Wilson
If we end up contending for the vertex lock, we need to double check there is sufficient vertex space left for us. Bugzilla: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1124576 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-25sna/gen3+: Assert that nbox is not 0Chris Wilson
Various assertions to track down a potential programming error. References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1124576 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-22sna/gen7: Skip CLEAR_PARAMS for the null depthbufferChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-22sna/gen7: Only a pipeline stall is required for the CA passChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-02-08sna/gen4: Split the have_render flag in separate prefer_gpu hintsChris Wilson
The idea is to implement more fine-grained checks as we may want different heuristics for desktops with GT1s than for mobile GT2s, etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-31sna/gen3+: Flush vertex threads before touching global stateChris Wilson
We need to be careful not just when finishing the current vbo to synchronize with the sharing threads, but also before we emit the batch state that no other thread will try and do the same. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-29sna/gen3+: Fix a DBG for composite_boxes()Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-29sna: Add GT1/GT2 thread counts for HaswellChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27sna: Replace the forced vertex finish with just a waitChris Wilson
When completing a batch mid-operation, we need to wait upon the other threads to complete their writes so that memory is coherent before submitting the work to the GPU. This was achieved by forcing the finish, but all that is needed from that is the wait, which makes the handling of threads much more explicit and removes the unnecessary vbo refresh. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>