path: root/src/i965_render.c
Age | Commit message | Author
2011-04-07 | Revert "i965: Convert to relative relocations for state" | Chris Wilson
This reverts commit d2106384be6f9df498392127c3ff64d0a2b17457. Breaks compiz (but not mutter/gnome-shell) on gen6. Not sure if this is some deep interaction issue with multiple clients sharing the GPU or just with compiz, but for now we have to revert and suffer the inane performance hit. It looks suspiciously like another deferred damage issue... Bugzilla: 51a27e88b073cff229fff4362cb6ac22835c4044 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-07 | i965: Avoid transform overheads for vertex emit where possible | Chris Wilson
Minor improvement as the bottlenecks lie elsewhere. But it was annoying me. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-07 | i965: Refactor to use constant sampler_state offsets | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-04 | i965: Reset vertex_id after every batch | Chris Wilson
So that we always remember to re-emit the initial vertex elements state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-04 | i965: Always update last_floats_per_vertex | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-04 | Take advantage of the kernel flush for dirty bo in the busy ioctl | Chris Wilson
Rather than just creating and submitting a batch that simply contains a flush in order to periodically ensure that rendering reaches the scanout, we can simply ask the kernel whether the scanout is busy. The kernel will then submit a flush on our behalf if it is dirty, which takes advantage of the kernel's dirty state tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-04 | i965: segregate each vertex element into its own buffer | Chris Wilson
Reduce the number of relocations emitted by only emitting one relocation per vertex element per vertex buffer. References: https://bugs.freedesktop.org/show_bug.cgi?id=35733 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-04-04 | i965: Convert to relative relocations for state | Chris Wilson
References: https://bugs.freedesktop.org/show_bug.cgi?id=35733 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-17 | Fix IGD and IGDNG constants to be comprehensible | Adam Jackson
Since, with GPU-on-package, it's hard to talk about a model number for a specific chipset like 855GM, just use the platform names. Signed-off-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-04 | i965: Fix off-by-one in assert | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-07 | Revert "i965: The RenderCache flush after every glyph is required for compiz" | Chris Wilson
This reverts commit 03e8351179b1c25d219842ef3e01ee8e176f594f. * sigh. This was only meant to be a temporary debugging hack, not for public consumption (or embarrassment). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-07 | i965: Make sure we mark reused render targets as dirty | Chris Wilson
... or else we may forget to flush them again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-07 | i965: The RenderCache flush after every glyph is required for compiz | Chris Wilson
... now who can explain why. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 | snb: Only emit CC and DepthStencil bos once per batch | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 | snb: Restore drawrect, we need the implicit flush | Chris Wilson
Something is wrong: we should be tracking when to invalidate the caches as appropriate, yet I cannot find the missing flush to replace the implicit one of DRAW_RECTANGLE. Fixes cacomposite. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 | snb: Cache pixmap binding locations | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 | snb: Cache state between composite ops | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 | snb: Emit more invariants only once | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 | i965: Also flush the vertex buffer when restarting the array. | Chris Wilson
A corollary to filling one vertex array and beginning a new one is remembering to emit the old one before overwriting... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 | i965: Check for potential vertex array overflow every time | Chris Wilson
There was a reason why we need to check at the start of every composite operation to see if we have enough space in the array to fit the vertices, which I promptly forgot when moving the code around to make it look pretty. * sigh. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
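The two vertex-array fixes above (check for space at the start of every composite, and emit the old array before starting a new one) can be sketched roughly as follows. This is an illustration only, not the driver's code: `struct vertex_array`, `VERTEX_ARRAY_FLOATS`, `vertex_array_reserve()` and the `emitted` counter are hypothetical names, and the counter merely stands in for handing the full array to the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity; the real driver sizes its array differently. */
#define VERTEX_ARRAY_FLOATS 1024

struct vertex_array {
    float data[VERTEX_ARRAY_FLOATS];
    size_t used;      /* floats written so far */
    unsigned emitted; /* stand-in for "old array handed to the GPU" */
};

/*
 * Call at the start of every composite operation. If the pending
 * vertices would overflow the array, emit the old contents first and
 * restart from the beginning.
 * Returns 1 if the array was restarted, 0 otherwise.
 */
static int vertex_array_reserve(struct vertex_array *va, size_t floats_needed)
{
    assert(floats_needed <= VERTEX_ARRAY_FLOATS);

    if (va->used + floats_needed <= VERTEX_ARRAY_FLOATS)
        return 0;

    va->emitted++; /* flush the old contents before overwriting them */
    va->used = 0;
    return 1;
}
```

The point of the sketch is the ordering: the space check runs on every operation, and the restart path always emits before it resets the write position.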
2010-12-03 | i965: Amalgamate surface binding tables | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-03 | i965: Upload an entire vbo in a single pwrite, rather than per-rectangle | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-03 | i965: Use reciprocal scale factors to avoid the divide per-vertex-element | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
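The reciprocal-scale idea can be illustrated with a small sketch: compute 1/width and 1/height once per operation, then normalise each vertex coordinate with a multiply rather than a divide. The names `struct tex_scale`, `tex_scale_init()` and `tex_scale_apply()` are hypothetical and are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical holder for the precomputed reciprocals. */
struct tex_scale {
    float inv_w;
    float inv_h;
};

/* One pair of divides per composite operation... */
static struct tex_scale tex_scale_init(int width, int height)
{
    struct tex_scale s;

    assert(width > 0 && height > 0);
    s.inv_w = 1.0f / (float)width;
    s.inv_h = 1.0f / (float)height;
    return s;
}

/* ...and only multiplies for each vertex element. */
static void tex_scale_apply(const struct tex_scale *s,
                            float x, float y, float *u, float *v)
{
    *u = x * s->inv_w;
    *v = y * s->inv_h;
}
```

For a 4x8 texture, the pixel coordinate (2, 2) maps to the normalised coordinate (0.5, 0.25).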
2010-11-09 | i915: Disable maximum state addresses | Chris Wilson
As the kernel controls the relocation of state buffers, we should not hard code the maximum permissible value for them. Fixes an eventual hang with full-gtt. Reported-by: Peter Clifton <pcjc2@cam.ac.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-02 | render: acceleration for composite on Sandybridge | Xiang, Haihao
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
2010-11-02 | render: set the surface state base address | Xiang, Haihao
It is the same as commit 73d4c7d7 Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
2010-10-07 | Include a chipset generation number to clarify device specific paths. | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-22 | Make driver compile for 1.6 Xserver series again. | Matthias Hopf
Signed-off-by: Matthias Hopf <mhopf@suse.de>
2010-06-25 | Rename common infrastructure to the intel namespace. | Chris Wilson
After splitting out the i810 driver into its own legacy directory, we can identify the common routines not as i830 but as intel. This clarifies the code which *is* i830 specific. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-25 | i810: Move into a legacy directory. | Chris Wilson
The driver is still built but is no longer under active development so move it and supporting files to a new directory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-21 | i965: Compile fix. | Chris Wilson
Oops, I spent more time discussing these flushing bugs than I spent paying attention to what I was actually doing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-21 | i965: Mark the render target as dirty within composite_setup() | Chris Wilson
The key difference between i965 and earlier generations is that the surfaces are passed to the samplers through an indirect table, so the batch and render target were not being marked dirty by the relocation (since the relocation only happens within prepare_composite(), which may have been in another batch). Simply call intel_pixmap_mark_dirty() when binding the sampler table into the batch to ensure that the dirty state is tracked appropriately. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-21 | Emit the flush after a potential draw from the BlockHandler. | Chris Wilson
As the batch submit may not trigger further drawing through flushing the vertices, pass the requirement to emit the flush down to the submission routine so that the flush can be appended after the final commands. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-14 | i965: Sanity check ComponentAlpha status in prepare_composite | Chris Wilson
Fixes: Bug 28446 - Garbled Font with Mathematica 7 https://bugs.freedesktop.org/show_bug.cgi?id=28446
Rewriting the glyphs to render to the destination directly and removing the more expensive multiple invocations of CompositePicture per picture was a great performance boost -- except that it needs special handling in the backend in order to not fall back. Having done so for i915, I neglected to ensure the sanity checking in i965_prepare_composite() was sufficient. As it turns out, it was not, and so we misrendered CA-glyphs when rendering directly to the destination. This causes us to fall back properly, but is a performance regression as we no longer try the 2-pass magic helper before resorting to s/w. At the moment, I'd rather live with the temporary regression and fix i965 to do the same magic as i915, as it is critical to fixing the severe performance issues currently crippling i965, and as I believe that this regression only affects the minority of applications (incorrect, as it turns out, as the glyphs are overlapping) rendering directly to the destination. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-09 | Revert "xp:trapezoids" | Chris Wilson
This reverts commit f429fb9d872950705e11171d0e7407fb7673c786. An experimental patch I forgot was on my main branch as I was bugfixing. ARGH!
2010-06-08 | xp:trapezoids | Chris Wilson
2010-05-26 | i965: Remove ATOMIC_BATCH. | Chris Wilson
This paranoid check is deceased; pining for the fjords. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 | Add a workaround for Ironlake errata relating to disabling the clipper. | Eric Anholt
2010-05-24 | uxa: Use temporary dest when target is too large for compositor | Chris Wilson
If the destination cannot fit into the 3D pipeline when we need to composite, we fall back to doing the operation on the CPU. This is very slow, and quite easy to trigger on i915 by plugging in an external display. An alternative is to extract the extents of the operation from the destination using the blitter, which can usually handle much larger operations. This gives us a temporary target that can fit into the 3D pipeline and thus be accelerated, before copying back into the larger real destination. For x11perf this boosts glyph rendering on PineView from 38kglyphs/s to 480kglyphs/s, just a little shy of the native performance of 601kglyphs/s. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
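The decision described in this commit can be sketched as a small path selector. Everything named here is hypothetical (`struct box`, `enum composite_path`, `choose_composite_path()`, and the `max_3d` limit passed by the caller); the commit message also does not say what happens when the operation itself exceeds the 3D limit, so the final CPU fallback branch is an assumption.

```c
#include <assert.h>

/* Extents of one composite operation; x2/y2 are exclusive. */
struct box {
    int x1, y1, x2, y2;
};

enum composite_path {
    COMPOSITE_DIRECT,        /* destination fits the 3D pipeline */
    COMPOSITE_VIA_TEMPORARY, /* blit extents out, composite, blit back */
    COMPOSITE_FALLBACK       /* assumed: CPU path if even the extents are too big */
};

static enum composite_path
choose_composite_path(int dst_w, int dst_h, const struct box *op, int max_3d)
{
    assert(op->x2 > op->x1 && op->y2 > op->y1);

    if (dst_w <= max_3d && dst_h <= max_3d)
        return COMPOSITE_DIRECT;

    /* The destination is too large, but the operation may still fit. */
    if (op->x2 - op->x1 <= max_3d && op->y2 - op->y1 <= max_3d)
        return COMPOSITE_VIA_TEMPORARY;

    return COMPOSITE_FALLBACK;
}
```

With an illustrative limit of 2048, a 100x50 glyph run on a 4096x1200 destination would take the temporary-target path instead of the CPU fallback.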
2010-05-24 | Kill paranoid assertions on every write into the batchbuffer. | Chris Wilson
On my PineView box these represent ~5% overhead on x11perf text:
Before:
  16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
  12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
  16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
  16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 | i965: Add texformats mapping for additional pixman formats | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 | uxa: Rearrange checking and preparing of composite textures. | Chris Wilson
x11perf regression caused by 2D driver https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by commit a7b800513fcc94e063dfd68d2f63b6bab7fae47d uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the composite can in fact be accelerated, we perform expensive surplus operations. This is particularly noticeable for ComponentAlpha text, such as rgb10text. The solution here is to move the check_composite() call ahead of acquiring the sources, and to extract the subregion only if the render path cannot actually handle the texture.
Performance (on PineView):
  a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
  a7b800513:  aa=65700 glyphs/s, rgb=13200 glyphs/s
  now:        aa=66800 glyphs/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and dixPrivate lookups. Hmm. More worrying is the extremely low performance, however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-02-23 | Add initial defines and probing for Sandybridge | Eric Anholt
2010-01-08 | i965: Ensure that URB_FENCE is aligned to 64-bytes | Chris Wilson
The PRM (Vol 1, p32) specifies that the URB_FENCE command must not cross a cache-line boundary (64-bytes) in order to work around a silicon issue. Ensure that it does not by inserting an alignment point before the atomic section. This is a slightly too large hammer, but the easiest method to work with the current BEGIN_BATCH/ADVANCE_BATCH protections. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
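The cache-line constraint amounts to simple offset arithmetic, sketched below. The helper names `crosses_cacheline()` and `pad_to_cacheline()` are hypothetical, and byte offsets into the batch are assumed; the driver's actual alignment code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE 64u

/* Would a command of `len` bytes starting at `offset` straddle a boundary? */
static int crosses_cacheline(uint32_t offset, uint32_t len)
{
    assert(len > 0 && len <= CACHELINE);
    return (offset / CACHELINE) != ((offset + len - 1) / CACHELINE);
}

/*
 * The "large hammer": bytes of padding needed to start the section on
 * a fresh cache line, whether or not the command would have crossed.
 */
static uint32_t pad_to_cacheline(uint32_t offset)
{
    return (CACHELINE - (offset % CACHELINE)) % CACHELINE;
}
```

For example, a 12-byte command (a length chosen purely for illustration) at offset 56 would cross the boundary at 64, and 8 bytes of padding would move it onto the next line; at offset 52 it would not cross, yet unconditional alignment still pads, which is the sense in which the hammer is slightly too large.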
2009-12-08 | i965: Only use the affine kernels if both src and mask are affine | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-08 | i965: Set src_filter before testing. | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-08 | i965: Maximum number of vertices per composite is 24, not 18 | Chris Wilson
Beware the potential buffer overflow. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
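The commit message does not explain where 18 and 24 come from. One plausible reading, offered here purely as an assumption, is that they count floats per composite rectangle: three vertices, each carrying a 2-float destination position plus source and mask texture coordinates that take 2 floats each when affine and 3 when projective. The helper names below are hypothetical.

```c
#include <assert.h>

/* Assumption: one composite rectangle is drawn from three vertices. */
#define VERTICES_PER_RECT 3

/* Floats for one vertex: destination (x, y) plus texture coordinates. */
static int floats_per_vertex(int has_mask, int is_affine)
{
    int coords = is_affine ? 2 : 3; /* (u, v) or (u, v, w) */

    return 2 + coords + (has_mask ? coords : 0);
}

static int floats_per_composite(int has_mask, int is_affine)
{
    return VERTICES_PER_RECT * floats_per_vertex(has_mask, is_affine);
}
```

Under this reading, sizing the buffer for the affine masked case (18) would overflow as soon as a projective source and mask are used (24).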
2009-12-07 | batch: Ensure we send a MI_FLUSH in the block handler for TFP | Chris Wilson
This should restore the previous level of synchronisation between textures and pixmaps, but *does not* guarantee that a texture will be flushed before use. tfp should be fixed so that the ddx can submit the batch if required to flush the pixmap. A side-effect of this patch is to rename intel_batch_flush() to intel_batch_submit() to reduce the confusion of executing a batch buffer with that of emitting a MI_FLUSH.
Should fix the remaining rendering corruption involving tfp [inc compiz]:
  Bug 25431 [i915 bisected] piglit/texturing_tfp regressed http://bugs.freedesktop.org/show_bug.cgi?id=25431
  Bug 25481 Wrong cursor format and cursor blink rate with compiz enabled http://bugs.freedesktop.org/show_bug.cgi?id=25481
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-02 | Remove flush parameter from intel_batch_flush() | Chris Wilson
There is only a single caller that wishes to forcibly append a flush into the batch: intel_sync(). So move the logic there. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-29 | batch: Emit a 'pipelined' flush when using a dirty source. | Chris Wilson
Ensure that the render caches and texture caches are appropriately flushed when switching a pixmap from a target to a source. This should fix bug 24315, [855GM] Rendering corruption in text (usually) https://bugs.freedesktop.org/show_bug.cgi?id=24315 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>