path: root/src/sna/kgem.h
Age  Commit message  Author
2013-02-07  sna: Also assert that the GPU is not wedged before continuing a batch  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-31  sna: Make sure the needs_flush is always accompanied by a tracking request  Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-20  sna/gen7: Place the vsync commands in the same cacheline  Chris Wilson
Do as told; both the LRI and WAIT_FOR_EVENT need to be in the same cacheline for an unspecified reason. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-16  sna: Revert use of a separate CAN_CREATE_SMALL flag  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-13  sna: Relax limitation on not mapping GPU bo with shadow pointers  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-12  sna: Experiment with a CPU mapping for certain fallbacks  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-11  sna: Reorder struct kgem_bo to move related data into the same cacheline  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-10  sna: Allow CPU bo to copy to GPU bo if the device is idle.  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-04  sna: Embed the pre-allocation of the static request into the device  Chris Wilson
So that in the case where we are driving multiple independent screens, each having their own device, we do not share the global reserved request in the event of an allocation failure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-04  sna: Flush the batch prior to referencing work from another ring  Chris Wilson
In the case where the kernel is inserting semaphores to serialise work between rings, we want to only delay the surface that is coming from the other ring and not interfere with work already queued. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-02  sna/gen6+: Fine tune placement of DRI copies  Chris Wilson
Avoid offsetting the overhead of the render copy only to be penalised by the overhead of the semaphore. So compromise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-31  sna/dri: Fix triple buffering to not penalise missed frames  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-29  sna: Allow a flush to occur before batching a flush-bo  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-28  sna: Mark kgem_bo_retire() as static  Chris Wilson
The exported function is not used, so mark it static and strengthen the assertions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-26  sna: Explicitly track self-relocation entries  Chris Wilson
Avoid having to walk the full relocation array for the few entries that need to be updated for the batch buffer offset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
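A minimal sketch of the idea described in this commit message, not the driver's actual code: the index of each self-relocation is recorded as it is emitted, so that fixing up the batch buffer offset touches only those entries instead of walking the whole relocation array. The names `struct batch`, `add_reloc`, and `fixup_self_relocs`, the fixed capacities, and the use of handle 0 to mean "the batch itself" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RELOC      64  /* hypothetical capacity of the relocation array */
#define MAX_SELF_RELOC 8   /* hypothetical capacity of the self-reloc index list */

struct reloc {
    uint32_t target_handle;   /* 0 stands in for "the batch itself" in this sketch */
    uint64_t delta;           /* offset within the target buffer */
    uint64_t presumed_offset; /* where we expect the target to be placed */
};

struct batch {
    struct reloc reloc[MAX_RELOC];
    size_t nreloc;
    uint16_t self_reloc[MAX_SELF_RELOC]; /* indices into reloc[] */
    size_t nself;
};

/* Append a relocation; remember its index if it points back at the batch.
 * Returns 0 on success, -1 if either array is full. */
static int add_reloc(struct batch *b, uint32_t handle, uint64_t delta)
{
    assert(b != NULL);
    if (b->nreloc == MAX_RELOC)
        return -1;
    if (handle == 0) {
        if (b->nself == MAX_SELF_RELOC)
            return -1;
        b->self_reloc[b->nself++] = (uint16_t)b->nreloc;
    }
    b->reloc[b->nreloc].target_handle = handle;
    b->reloc[b->nreloc].delta = delta;
    b->reloc[b->nreloc].presumed_offset = 0;
    b->nreloc++;
    return 0;
}

/* Once the batch's own handle and offset are known, patch only the
 * recorded self-relocations rather than scanning every entry. */
static void fixup_self_relocs(struct batch *b, uint32_t batch_handle,
                              uint64_t batch_offset)
{
    assert(b != NULL);
    for (size_t i = 0; i < b->nself; i++) {
        struct reloc *r = &b->reloc[b->self_reloc[i]];
        r->target_handle = batch_handle;
        r->presumed_offset = batch_offset;
    }
}
```

The fix-up cost then scales with the number of self-relocations, which the commit message describes as few, rather than with the total number of relocations in the batch.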
2012-12-20  sna/gen4+: Amalgamate all the gen4-7 vertex buffer emission  Chris Wilson
Having reduced all the vb code for these generations to the same set of routines, we can refactor them into a single set of functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-17  sna: Refine check for an unset context switch  Chris Wilson
So it appears that we end up performing a context switch on an empty batch that already has a mode. This is caught later, too late, by assertions. However, we can change the guards slightly to prevent those assertions without altering the code too greatly. And I can then think about how to detect where we are setting a mode on the batch but doing no work, which is likely masking a bigger bug. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16  sna: Enable support for opting out of the kernel CS workaround  Chris Wilson
Keeping a set of pinned batches in userspace is considerably faster as we can avoid the blit overhead. However, combining the two approaches yields even greater performance, as fast as without either w/a, and yet stable. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16  sna: Precompute the base set of batch-flags  Chris Wilson
This is to make it easier to extend in future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-16  sna/gen6+: Keep the bo on its current ring  Chris Wilson
Track the most recent ring each bo is executed on, and prefer to keep it on that ring for the next operation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-13  sna: Only flush the batch after an actual relocation  Chris Wilson
As we may write preparatory instructions into the batch before checking for a flush. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-12  sna: Pin some batches to avoid CS incoherence on 830/845  Chris Wilson
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=26345 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-10  sna: Avoid reusing the same 'busy' bit for two different meanings.  Chris Wilson
Oops, I thought the 'busy' bit was now unused and apparently forgot it is used to control the periodic flushing... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-09  sna: Replace remaining kgem_is_idle() with kgem_ring_is_idle()  Chris Wilson
Further experimentation... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-08  sna: Flush upon change of target if GPU is idle  Chris Wilson
The aim is to improve GPU concurrency by keeping it busy. The possible complication is that we incur more overhead due to small batches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07  sna: Convert the ring from BLT/3D to the internal index for kgem_ring_is_idle()  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07  sna: Only inspect the target ring for busyness  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07  sna: Only flush before adding fresh surfaces to the batch  Chris Wilson
Previously, before every operation we would look to see if the GPU was idle and we were running under a DRI compositor. If the GPU was idle, we would flush the batch in the hope that we reduce the cost of the context switch and copy from the compositor (by completing the work earlier). However, we would complete the work far too early and, as a result, would need to flush the batch before every single operation, resulting in extra overhead and reduced performance. For example, the gtkperf circles benchmark under gnome-shell/compiz would be 2x slower on Ivybridge. Reported-by: Michael Larabel <michael@phoronix.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-07  sna: Mark proxies as dirty on first relocation  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-12-05  sna: Assume that future hardware only gets more flexible  Chris Wilson
E.g. that BLT can always write to cacheable memory, inflexible fences are a thing of the past, etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-30  Convert generation counter to octal  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-21  sna: Remove the kgem_bo_is_mappable refcnt assertion from freed paths  Chris Wilson
A few callers of kgem_bo_is_mappable operate on freed bo, and so need to avoid the assert(bo->refcnt). References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-21  sna: Add a few refcnt assertions  Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-08  sna: Experiment with using reloc.handle as an index into the execbuffer  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-11-08  sna: Support a fast no relocation changed path  Chris Wilson
x11perf -copywinwin10 on gm45 with c2d L9400: before: 553,000 op/s after: 565,000 op/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-31  sna: Preserve mode if flushing before a scanline wait  Chris Wilson
Reported-by: Jiri Slaby <jirislaby@gmail.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=47597 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-23  sna: Beware 16-bit overflow when computing sample areas  Chris Wilson
Reported-by: Ognian Tenchev <drJeckyll@Jeckyll.net> References: https://bugs.freedesktop.org/show_bug.cgi?id=56324 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
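As a hedged illustration of the general hazard named in this commit title (not the actual fix, which is not shown in this log): when coordinates are stored as 16-bit values, an edge computed as start + size can exceed the 16-bit range, so the sum should be formed in a wider type and checked before narrowing. The helper name `extent_fits_int16` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the far edge (start + size) of a sample
 * extent in 32 bits, and only store it back into an int16_t if it fits.
 * Returns false and leaves *end untouched when the result would overflow. */
static bool extent_fits_int16(int16_t start, uint16_t size, int16_t *end)
{
    /* At most 32767 + 65535, which always fits in int32_t. */
    int32_t wide = (int32_t)start + (int32_t)size;

    assert(end != NULL);
    if (wide > INT16_MAX)
        return false;

    *end = (int16_t)wide;
    return true;
}
```

A caller that gets `false` back would need to clip the area or take a fallback path; which of those the driver does is not stated in the commit message.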
2012-10-17  sna: Enable support for SECURE batch buffers  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-16  sna: Drop fake tiled CPU mapping  Chris Wilson
The only path where this is correct already handles it as the special case that it is; everywhere else it is just nonsense. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-07  sna: Check that for batch overflows after advancing a BLT  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-10-04  sna/gen2: Prevent using the GTT maps with I915_TILING_Y on 855gm  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-21  sna: Use inplace X tiling for LLC uploads  Chris Wilson
Based on a suggestion by Chad Versace (taken from a patch for mesa). This allows for a faster upload of pixel data through a ShmImage, or for complete replacement of a GPU bo. Using a modified version of x11perf to upload to a pixmap rather than scanout on an IVB i7-3720qm:
Before:
  40000000 trep @ 0.0007 msec (1410000.0/sec): ShmPutImage 10x10 square
   4000000 trep @ 0.0110 msec (  90700.0/sec): ShmPutImage 100x100 square
    160000 trep @ 0.1689 msec (   5920.0/sec): ShmPutImage 500x500 square
After:
  40000000 trep @ 0.0007 msec (1450000.0/sec): ShmPutImage 10x10 square
   6000000 trep @ 0.0061 msec ( 164000.0/sec): ShmPutImage 100x100 square
    400000 trep @ 0.1126 msec (   8880.0/sec): ShmPutImage 500x500 square
However, the real takeaway from this is that the overheads for ShmPutImage are substantial, only hitting around 70% expected efficiency, and overshadowed by PutImage, which for reference is
  60000000 trep @ 0.0006 msec (1800000.0/sec): PutImage 10x10 square
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-18  sna/gen7: Add some ring switching sanity checks  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-12  sna: Keep a very small, short-lived cache of large buffers  Chris Wilson
As we now regularly retire and so discard the temporary large buffers, we find them in short supply and ourselves wasting lots of time creating and destroying the transient buffers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-12  sna: Flush after operating on large buffers  Chris Wilson
As we know that such operations are likely to be slow and consume precious GTT space, mark them as candidates for flushing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-06  sna: Apply the minimum 256 pitch to CREATE_USAGE_SHARED pixmaps as well  Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-04  sna: Fixup CREATE_USAGE_SHARED  Chris Wilson
The DRI2 code tries to create pixmaps with non-zero width/height, whoops. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-09-04  sna: Port prime interfacing  Chris Wilson
Preliminary prime support. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-28  sna: Propagate the request to flush rather than directly submit the batch  Chris Wilson
The subtlety is that we need to reset the mode correctly after submitting the batch, which was not handled by kgem_flush(). If we fail to set the appropriate mode then the next operation will be on a random ring, which can prove fatal with SandyBridge+. Reported-by: Reinis Danne <reinis.danne@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-08-27  sna: Track outstanding requests per-ring  Chris Wilson
In order to properly track when the GPU is idle, we need to account for the completion order that may differ on architectures like SandyBridge with multiple mostly independent rings. Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54127 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
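The reasoning in this commit message can be sketched roughly as follows, under stated assumptions: because rings may complete work in a different order, outstanding requests are counted per ring, and the device counts as idle only when every ring has drained. The type `struct request_tracker`, the two-ring enum, and the function names are hypothetical simplifications; the driver's real bookkeeping is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-ring model (render and blitter). */
enum ring { RING_RENDER, RING_BLT, NUM_RINGS };

struct request_tracker {
    unsigned outstanding[NUM_RINGS]; /* submitted but not yet retired */
};

static void tracker_submit(struct request_tracker *t, enum ring r)
{
    assert(t != NULL && r < NUM_RINGS);
    t->outstanding[r]++;
}

static void tracker_retire(struct request_tracker *t, enum ring r)
{
    assert(t != NULL && r < NUM_RINGS);
    assert(t->outstanding[r] > 0);
    t->outstanding[r]--;
}

/* A ring is idle when it has no outstanding requests of its own,
 * regardless of what the other rings are doing. */
static bool tracker_ring_is_idle(const struct request_tracker *t, enum ring r)
{
    assert(t != NULL && r < NUM_RINGS);
    return t->outstanding[r] == 0;
}

/* The device as a whole is idle only when every ring is idle. */
static bool tracker_device_is_idle(const struct request_tracker *t)
{
    assert(t != NULL);
    for (size_t r = 0; r < NUM_RINGS; r++)
        if (t->outstanding[r] != 0)
            return false;
    return true;
}
```

With a single shared counter or list, a later request finishing first on one ring could make the other ring look idle while it is still busy; keeping the state per ring avoids that confusion.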