path: root/src/sna/kgem.c
Age, Commit message, Author
2013-01-20sna: Make DEBUG_SYNC a configure optionChris Wilson
As it is advisable to combine the synchronous rendering debug option with other debugging options, it is more convenient to make it into a configure option: --enable-debug=sync Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-20sna: Apply DEBUG_SYNC prior to emitting error reportChris Wilson
This is handy for the case where the batch triggers a GPU hang rather than being rejected by the kernel. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16sna: Correct DBG to refer to the actual tiling mode forcedChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16sna: Discard the batch if we are discarding the only buffer in itChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16sna: Fix computation of large object sizes to prevent overflowChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16sna: Revert use of a separate CAN_CREATE_SMALL flagChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-14sna: Apply PutImage optimisations to move-to-cpuChris Wilson
We can replace the custom heuristics for PutImage by applying them to the common path, where hopefully they are equally valid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13sna: Allow creation of a CPU map for pixmaps if neededChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13sna: Relax limitation on not mapping GPU bo with shadow pointersChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13sna: Correct a few assertions after enabling read-only mappingsChris Wilson
As these do not flush the active state if we have read-read mappings, we need to be careful with our asserts concerning the busy flag. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-12sna: Experiment with a CPU mapping for certain fallbacksChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-12sna: Tweak max object sizes to take account of aperture restrictionsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-12sna: After a size check, double check the batch before flushingChris Wilson
As we may fail the size check with an empty batch and a pair of large bo, we need to check before submitting that batch in order to not run afoul of our internal sanity checks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-06sna: Try to create userptr with the unsync'ed flag set firstChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-04sna: Clear up the caches after handling a request allocation failureChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-04sna: Embed the pre-allocation of the static request into the deviceChris Wilson
So that in the case where we are driving multiple independent screens, each having their own device, we do not share the global reserved request in the event of an allocation failure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-04sna: Flush the batch prior to referencing work from another ringChris Wilson
In the case where the kernel is inserting semaphores to serialise work between rings, we want to only delay the surface that is coming from the other ring and not interfere with work already queued. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-03sna: Convert allocation request from bytes to num_pages when shrinkingChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-03sna: Add a pair of asserts to validate fls()/cache_bucket()Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-03sna: Also recognise __i386__ for fls asmChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-03sna: Fix off-by-one in C version of flsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-02sna: Rewrite __fls without dependence upon x86 assemblyMatt Turner
The asm() prevents SNA from compiling on ia64. Fixes https://bugs.gentoo.org/show_bug.cgi?id=448570
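A minimal sketch of what an assembly-free "find last set" helper can look like. The name portable_fls and its convention (zero-based index of the highest set bit, with 0 returned for an input of 0) are assumptions made for illustration; the actual kgem.c implementation and its return convention are not shown in this log.

```c
/* Hypothetical helper: index (counting from 0) of the most significant
 * set bit in x, computed with shifts only so that it builds on any
 * architecture. Returns 0 when x is 0 (an assumed convention). */
static inline int portable_fls(unsigned long x)
{
	int n = 0;

	/* Each shift that leaves a non-zero value means a higher bit exists. */
	while (x >>= 1)
		n++;

	return n;
}
```

With this convention portable_fls(1) is 0 and portable_fls(4096) is 12. A one-based convention would be an equally valid choice, which is the kind of mismatch that can lead to an off-by-one.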
2013-01-02sna: Fast path inplace addition of solid trapezoidsChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-31sna/dri: Fix triple buffering to not penalise missed framesChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-29sna: Allow a flush to occur before batching a flush-boChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-28sna: Mark kgem_bo_retire() as staticChris Wilson
The exported function is not used, so mark it static and strengthen the assertions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-26sna: Explicitly track self-relocation entriesChris Wilson
Avoid having to walk the full relocation array for the few entries that need to be updated for the batch buffer offset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
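The idea can be sketched as follows: record the index of each self-relocation at the time it is added, so that only those entries are patched once the batch buffer's handle and offset are known. All names here (struct batch, batch_add_reloc, batch_fixup_self, the array limits) are hypothetical and are not taken from kgem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RELOCS      64	/* illustrative limits only */
#define MAX_SELF_RELOCS 8

struct reloc {
	uint32_t target_handle;
	uint64_t presumed_offset;
};

struct batch {
	struct reloc relocs[MAX_RELOCS];
	size_t nreloc;
	uint16_t self_reloc[MAX_SELF_RELOCS];	/* indices into relocs[] */
	size_t nself;
};

/* Append a relocation; remember its index if it points at the batch itself.
 * Returns the index of the new entry, or -1 if either array is full. */
static int batch_add_reloc(struct batch *b, uint32_t handle, int is_self)
{
	if (b->nreloc == MAX_RELOCS)
		return -1;
	if (is_self) {
		if (b->nself == MAX_SELF_RELOCS)
			return -1;
		b->self_reloc[b->nself++] = (uint16_t)b->nreloc;
	}
	b->relocs[b->nreloc].target_handle = handle;
	b->relocs[b->nreloc].presumed_offset = 0;
	return (int)b->nreloc++;
}

/* Patch only the tracked entries instead of scanning every relocation. */
static void batch_fixup_self(struct batch *b, uint32_t handle, uint64_t offset)
{
	for (size_t i = 0; i < b->nself; i++) {
		assert(b->self_reloc[i] < b->nreloc);
		b->relocs[b->self_reloc[i]].target_handle = handle;
		b->relocs[b->self_reloc[i]].presumed_offset = offset;
	}
}
```

The fix-up cost then scales with the number of self-relocations rather than with the total number of relocations in the batch.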
2012-12-20sna/gen4+: Amalgamate all the gen4-7 vertex buffer emissionChris Wilson
Having reduced all the vb code for these generations to the same set of routines, we can refactor them into a single set of functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-20sna: Ignore throttling during vertex closeChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-17sna: Untangle the confusion of caching large LLC boChris Wilson
We only use a single cache for very large buffers, so we need to be careful that we set the tiling on them. Moreover, we need to take extra care when allocating large CPU bo from that cache to be sure that they are untiled and the flags are true. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-17sna: Promote pinned-batches to run-time detectionChris Wilson
Now that the feature has been committed upstream, we can rely on the runtime detection. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-17sna: Limit the default upload buffer size to half the cpu cacheChris Wilson
This seems to help with small slow caches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna: Enable support for opting out of the kernel CS workaroundChris Wilson
Keeping a set of pinned batches in userspace is considerably faster as we can avoid the blit overhead. However, combining the two approaches yields even greater performance, as fast as without either w/a, and yet stable. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna: Try to reuse pinned batches by inspecting the kernel busy statusChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna: Precompute the base set of batch-flagsChris Wilson
This is to make it easier to extend in future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna: Only flush at the low fence wm if idleChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16sna/gen6+: Keep the bo on its current ringChris Wilson
Track the most recent ring each bo is executed on, and prefer to keep it on that ring for the next operation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-14sna: Reduce fence watermarksChris Wilson
Further restrict the amount of fenced bo we try to fit into the batch to make it easier for the kernel to accommodate the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-13sna/gen2: Align surface sizes to an even tileChris Wilson
Makes this 855gm much happier. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-13sna: Fix typo for 830/845 genChris Wilson
Must remember, it's octal not decimal. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
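To illustrate the class of mistake, here is a small sketch that assumes a generation number written as an octal literal, with the major generation in the upper digit and the minor revision in the lowest digit (for example 021 for "gen 2.1"). The helper names and the encoding details are assumptions for illustration; the exact line changed by this commit is not shown in the log.

```c
/* Hypothetical octal encoding: one octal digit (3 bits) for the minor
 * revision, the remaining bits for the major generation. */
enum { GEN_MINOR_BITS = 3, GEN_MINOR_MASK = 07 };

static int gen_major(int gen) { return gen >> GEN_MINOR_BITS; }
static int gen_minor(int gen) { return gen & GEN_MINOR_MASK; }

/* True if gen is no newer than limit under the encoding above. */
static int gen_at_most(int gen, int limit) { return gen <= limit; }
```

In C a leading zero makes a literal octal, so 020 has the value 16 while 20 is decimal twenty (024 in octal, i.e. "gen 2.4" under this encoding). Writing gen_at_most(021, 20) therefore accepts a gen 2.1 device that gen_at_most(021, 020) would reject.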
2012-12-12sna: Improve the initialisation failure path for pinned batchesChris Wilson
Simplify the later checks by always populating the lists with a single, albeit unpinned, bo in the case we fail to create pinned batches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-12sna: Fix the error path in kgem_init_pinned_batches() to use the right iterChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-12sna: Pin some batches to avoid CS incoherence on 830/845Chris Wilson
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=26345 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-10sna: Avoid reusing the same 'busy' bit for two different meanings.Chris Wilson
Oops, I thought the 'busy' bit was now unused and apparently forgot it is used to control the periodic flushing... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-10sna: Compromise and only flush a split batch if writing to scanoutChris Wilson
A compromise between not flushing quickly enough and flushing too often, hopefully. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-10sna: Immediately flush a split batchChris Wilson
If we submit a batch early (for example if the GPU is idle), then submit whatever else the client drew immediately upon completion of its blockhandler. This is required to prevent flashing due to visible delay between the clear at the start of the cycle and then the overdraw later. References: https://bugs.freedesktop.org/show_bug.cgi?id=51718 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-09sna: Replace remaining kgem_is_idle() with kgem_ring_is_idle()Chris Wilson
Further experimentation... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-08sna: Flush upon change of target if GPU is idleChris Wilson
The aim is to improve GPU concurrency by keeping it busy. The possible complication is that we incur more overhead due to small batches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07sna: Only inspect the target ring for busynessChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07sna: Only flush before adding fresh surfaces to the batchChris Wilson
Previously, before every operation we would look to see if the GPU was idle and we were running under a DRI compositor. If the GPU was idle, we would flush the batch in the hope that we reduce the cost of the context switch and copy from the compositor (by completing the work earlier). However, we would complete the work far too early and as a result would need to flush the batch before every single operation, resulting in extra overhead and reduced performance. For example, the gtkperf circles benchmark under gnome-shell/compiz would be 2x slower on Ivybridge. Reported-by: Michael Larabel <michael@phoronix.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>