path: root/src/i965_render.c
Age | Commit message | Author
2010-06-08 | xp:trapezoids | Chris Wilson
2010-05-26 | i965: Remove ATOMIC_BATCH. | Chris Wilson
This paranoid check is deceased; pining for the fjords.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 | Add a workaround for Ironlake errata relating to disabling the clipper. | Eric Anholt
2010-05-24 | uxa: Use temporary dest when target is too large for compositor | Chris Wilson
If the destination cannot fit into the 3D pipeline when we need to composite, we fall back to doing the operation on the CPU. This is very slow, and quite easy to trigger on i915 by plugging in an external display. An alternative is to extract the extents of the operation from the destination using the blitter, which can usually handle much larger operations. This gives us a temporary target that can fit into the 3D pipeline and thus be accelerated, before copying back into the larger real destination.
For x11perf this boosts glyph rendering on PineView from 38kglyphs/s to 480kglyphs/s, just a little shy of the native performance of 601kglyphs/s.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
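The decision described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the types, the function names and the max_3d parameter are all hypothetical, and the 2048 limit used when exercising it is only an example value.

```c
#include <assert.h>

/* Hypothetical types and names for illustration; not the driver's API. */
struct sketch_box { int x, y, width, height; };

enum sketch_path {
	SKETCH_DIRECT,    /* destination fits the 3D pipeline as-is */
	SKETCH_TEMPORARY, /* blit extents to a temporary, composite, copy back */
	SKETCH_CPU        /* even the extents are too large: CPU fallback */
};

static int sketch_fits(int width, int height, int max_3d)
{
	return width <= max_3d && height <= max_3d;
}

/* Pick a path from the destination size and the extents of the operation. */
static enum sketch_path sketch_choose_path(int dst_width, int dst_height,
					   struct sketch_box op, int max_3d)
{
	assert(op.width > 0 && op.height > 0);
	if (sketch_fits(dst_width, dst_height, max_3d))
		return SKETCH_DIRECT;
	if (sketch_fits(op.width, op.height, max_3d))
		return SKETCH_TEMPORARY;
	return SKETCH_CPU;
}

/* Translate a destination coordinate into the temporary's coordinate space. */
static int sketch_to_temporary(int dst_coord, int op_origin)
{
	return dst_coord - op_origin;
}
```

With an assumed limit of 2048, a 100x20 operation on a 3200x1200 destination would take the temporary path, and its coordinates would be rebased so the operation's origin becomes (0, 0) in the temporary.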
2010-05-24 | Kill paranoid assertions on every write into the batchbuffer. | Chris Wilson
On my PineView box these represent ~5% overhead on x11perf text:
Before:
16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 | i965: Add texformats mapping for additional pixman formats | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 | uxa: Rearrange checking and preparing of composite textures. | Chris Wilson
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by commit a7b800513fcc94e063dfd68d2f63b6bab7fae47d
uxa: Extract sub-region from in-memory buffers.
The issue is that, because we extract the region before checking whether the composite can in fact be accelerated, we perform expensive surplus operations. This is particularly noticeable for ComponentAlpha text, such as rgb10text. The solution here is to move check_composite() ahead of acquiring the sources, and to extract the subregion only if the render path cannot actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyphs/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and dixPrivate lookups. Hmm. More worrying is the extremely low performance; however, the results are consistent, so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-02-23 | Add initial defines and probing for Sandybridge | Eric Anholt
2010-01-08 | i965: Ensure that URB_FENCE is aligned to 64-bytes | Chris Wilson
The PRM (Vol 1, p32) specifies that the URB_FENCE command must not cross a cache-line boundary (64 bytes) in order to work around a silicon issue. Ensure that it does not by inserting an alignment point before the atomic section. This is a slightly too large hammer, but the easiest method to work with the current BEGIN_BATCH/ADVANCE_BATCH protections.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
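The padding arithmetic behind the commit above can be illustrated with a small sketch. The function names are hypothetical and this is not the driver's BEGIN_BATCH machinery; the 3-dword command length used when exercising it is an assumption made for illustration. The first helper models the "large hammer" (always align to the next 64-byte boundary), the second a finer variant that pads only when the command would straddle a boundary.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE_BYTES 64u /* boundary named in the commit message */
#define SKETCH_DWORD_BYTES 4u

/* The "large hammer": padding dwords needed to reach the next 64-byte boundary. */
static size_t sketch_pad_to_cache_line(size_t used_bytes)
{
	assert(used_bytes % SKETCH_DWORD_BYTES == 0);
	size_t rem = used_bytes % SKETCH_CACHE_LINE_BYTES;
	return rem ? (SKETCH_CACHE_LINE_BYTES - rem) / SKETCH_DWORD_BYTES : 0;
}

/* A finer alternative: pad only when the command would straddle a boundary. */
static size_t sketch_pad_if_crossing(size_t used_bytes, size_t cmd_dwords)
{
	size_t cmd_bytes = cmd_dwords * SKETCH_DWORD_BYTES;
	size_t room = SKETCH_CACHE_LINE_BYTES - used_bytes % SKETCH_CACHE_LINE_BYTES;
	assert(cmd_bytes <= SKETCH_CACHE_LINE_BYTES);
	return cmd_bytes <= room ? 0 : sketch_pad_to_cache_line(used_bytes);
}
```

For a batch that already holds 56 bytes, both helpers ask for 2 padding dwords before an assumed 3-dword command; at 52 bytes the command still fits in the current line, so only the large hammer pads.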
2009-12-08 | i965: Only use the affine kernels if both src and mask are affine | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-08 | i965: Set src_filter before testing. | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-08 | i965: Maximum number of vertices per composite is 24, not 18 | Chris Wilson
Beware the potential buffer overflow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
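One plausible reading of the 24 versus 18 figures is a count of floats in the vertex buffer for a single composite rectangle. The sketch below works through that arithmetic under assumed conditions that the log does not state: three vertices per rectangle, two destination coordinates per vertex, and two (affine) or three (projective) texture coordinates each for source and mask. All names are hypothetical.

```c
#include <assert.h>

enum { SKETCH_VERTICES_PER_RECT = 3 }; /* assumption: rectangle sent as 3 vertices */

/* Floats in one vertex: dest (x, y) plus source and optional mask coordinates. */
static int sketch_floats_per_vertex(int has_mask, int is_affine)
{
	int tex_coords = is_affine ? 2 : 3; /* (u, v) or (u, v, w) */
	return 2 + tex_coords + (has_mask ? tex_coords : 0);
}

/* Floats needed in the vertex buffer for one composite rectangle. */
static int sketch_floats_per_composite(int has_mask, int is_affine)
{
	return SKETCH_VERTICES_PER_RECT *
	       sketch_floats_per_vertex(has_mask, is_affine);
}
```

Under these assumptions the affine case with a mask needs 3 * (2 + 2 + 2) = 18 floats, while the projective case needs 3 * (2 + 3 + 3) = 24, so a buffer sized for 18 would overflow in the latter.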
2009-12-07 | batch: Ensure we send a MI_FLUSH in the block handler for TFP | Chris Wilson
This should restore the previous level of synchronisation between textures and pixmaps, but *does not* guarantee that a texture will be flushed before use. tfp should be fixed so that the ddx can submit the batch if required to flush the pixmap.
A side-effect of this patch is to rename intel_batch_flush() to intel_batch_submit() to reduce the confusion of executing a batch buffer with that of emitting a MI_FLUSH.
Should fix the remaining rendering corruption involving tfp [inc compiz]:
Bug 25431 [i915 bisected] piglit/texturing_tfp regressed
http://bugs.freedesktop.org/show_bug.cgi?id=25431
Bug 25481 Wrong cursor format and cursor blink rate with compiz enabled
http://bugs.freedesktop.org/show_bug.cgi?id=25481
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-02 | Remove flush parameter from intel_batch_flush() | Chris Wilson
There is only a single caller that wishes to forcibly append a flush into the batch: intel_sync(). So move the logic there.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-29 | batch: Emit a 'pipelined' flush when using a dirty source. | Chris Wilson
Ensure that the render caches and texture caches are appropriately flushed when switching a pixmap from a target to a source.
This should fix bug 24315, [855GM] Rendering corruption in text (usually)
https://bugs.freedesktop.org/show_bug.cgi?id=24315
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-29 | batch: Track pixmap domains. | Chris Wilson
In order to detect when we require cache flushes, we need to track which domains the pixmap currently belongs to. To do so, we create a device private structure to hold the extra information and hook it up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
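The idea in these two commits (record which cache may hold unflushed writes for a pixmap, then flush when that pixmap becomes a source) can be modelled with a very small sketch. The structure and function names are hypothetical and much simpler than whatever the driver's private structure actually holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain flags; the driver's real bookkeeping may differ. */
enum sketch_domain {
	SKETCH_DOMAIN_NONE = 0,
	SKETCH_DOMAIN_RENDER = 1 << 0
};

/* Per-pixmap private data: which cache may still hold unflushed writes. */
struct sketch_pixmap_priv {
	unsigned write_domain;
};

/* Rendering to the pixmap leaves it dirty in the render cache. */
static void sketch_mark_target(struct sketch_pixmap_priv *priv)
{
	priv->write_domain = SKETCH_DOMAIN_RENDER;
}

/* Returns true when a flush must be emitted before sampling from the pixmap. */
static bool sketch_use_as_source(struct sketch_pixmap_priv *priv)
{
	bool need_flush = priv->write_domain != SKETCH_DOMAIN_NONE;
	priv->write_domain = SKETCH_DOMAIN_NONE; /* clean once the flush is queued */
	return need_flush;
}
```

A pixmap that was never rendered to needs no flush; one that was just used as a target reports a flush exactly once when it is next used as a source.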
2009-11-10 | Check that batch buffers are atomic. | Chris Wilson
Since batch buffers are rarely emitted by themselves but as part of a sequence of state and vertices, the whole sequence is emitted atomically. Here we just enforce that batches are marked as being part of an atomic sequence as appropriate.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-05 | Remove flow-control macros for fallbacks in the 2D driver. | Eric Anholt
It's poor style, and has confused new developers.
2009-10-14 | conf: Add debugging flush options | Chris Wilson
Make the following options available via xorg.conf:
Section "Driver"
    Option "DebugFlushBatches" "1" # Flush the batch buffer after every
                                   # single operation;
    Option "DebugFlushCaches" "1"  # Include a MI_FLUSH at the end of every
                                   # batch buffer to force data to be
                                   # flushed out of cache and into memory
                                   # before the completion of the batch.
    Option "DebugWait" "1"         # Wait for the completion of every batch buffer
                                   # before continuing, i.e. perform synchronous
                                   # rendering.
EndSection
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-10-08 | Call pPixmaps plain old pixmaps. | Eric Anholt
2009-10-08 | de-pCamelHungarian the Render pictures and pixmaps. | Eric Anholt
2009-10-08 | Share several render fields between render implementations. | Eric Anholt
Also, start settling on the cairo naming for things: source, mask, and dest.
2009-10-08 | Rename the xf86 screen private from pScrn to scrn. | Eric Anholt
2009-10-08 | Rename the screen private from I830Ptr pI830 to intel_screen_private *intel. | Eric Anholt
This is the beginning of the campaign to remove some of the absurd use of Hungarian in the driver. Not that I don't like Hungarian, but I don't need to know that pI830 is a pPointer.
2009-10-06 | Move to kernel coding style. | Eric Anholt
We've talked about doing this since the start of the project, putting it off until "some convenient time". Just after removing a third of the driver seems like a convenient time, when backporting's probably not happening much anyway.
2009-10-06 | Remove UMS support. | Eric Anholt
At this point, the only remaining feature regressions should be the lack of overlay support (about to land), and the need to update the XVMC code to work in the presence of KMS.
Acked-by: Keith Packard <keithp@keithp.com> (in principle)
Acked-by: Carl Worth <cworth@cworth.org> (in principle)
2009-09-14 | Avoid fallbacks for compositing gradient patterns | Chris Wilson
Currently, when asked to composite using a gradient source or mask, we fall back to using fbComposite(). This has the side-effect of causing a readback on the destination surface, stalling the GPU pipeline. Instead, like uxa_trapezoids(), we can use pixman to fill a scratch pixmap and then copy that to an offscreen pixmap for use with uxa_composite().
Speedups on i915:
firefox-talos-svg: 710378.14 -> 549262.96: 1.29x speedup
No slowdowns.
Thanks to Søren Sandmann Pedersen for spotting the missing ValidatePicture().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-08-07 | Align tiled pixmap height so we don't address beyond the end of our buffers. | Eric Anholt
2009-07-09 | Remove bad comment about 3DSTATE_DRAWING_RECTANGLE size. | Eric Anholt
2009-06-30 | Enable 2D composite on IGDNG | Xiang Haihao
This patch enables 2D composite on IGDNG. IGDNG requires newly compiled shader programs for Gen5 and some command changes. The most notable change is that the layout of the vertex element has changed, but we tried to keep it as in the original so as not to change the shader programs. Also, the vertex buffer state requires the end address of the vertex buffer instead of the original max index.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2009-05-26 | Revert "Rely on BO pixmaps being present in acceleration paths." | Carl Worth
This reverts commit 4653a7db622ad54a3182d93c81331765d930db34. Eric was getting a little too ambitious about our brave, new world. We do still want the driver to work with old, non-GEM kernels after all.
2009-04-27 | Rely on BO pixmaps being present in acceleration paths. | Eric Anholt
2009-04-21 | Unreference allocated bos in i965 render error paths | Zdenek Kabelac
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-04-21 | Fix leak of some 965 render state on VT switch. | Zdenek Kabelac
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
[anholt: fixed up unneeded != NULL checks]
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-04-21 | Replace a bunch of #ifdef debug flushing/syncing with a single function. | Eric Anholt
This removes it from a callsite where it would have just resulted in a fatalerror.
2009-04-21 | Staticize a bunch of functions and variables in the driver. | Eric Anholt
This cleans up findstatic.pl output for the i830+ code, which resulted in removing some code. The only odd part of this commit is the if (0) i830_sdvo_dump() in i830_sdvo.c -- it tells the compiler that the code is used, without using it since we want the code around while debugging. It's also in a likely place to ask for the dump, so I think it's OK.
2009-03-06 | intel: Nuke shared-entity support (zaphod mode). | Eric Anholt
It's been broken for years now, and KMS offers a much better chance of getting this working sensibly without making a mess of the 2D driver.
2009-03-04 | Fix serious memory leak at Enter/LeaveVT | Lukas Hejtmanek
This fixes a huge memory leak at each VT switch (about 600 BOs + 6MB of RSS of the X server).
2009-01-27 | Don't forget the new state bos in check_aperture. | Eric Anholt
They're tiny so it shouldn't have been a problem, but play it safe. This is another <5% loss on top of the previously reported value, bringing the whole series to about 8%.
2009-01-21 | Move i965 render sampler state to BOs. | Eric Anholt
This eliminates the pinned memory allocation for 965 render state.
2009-01-21 | Move i965 render kernels to BOs. | Eric Anholt
2009-01-21 | Move 965 render unit state to BOs. | Eric Anholt
This is a first step in a series of changes to avoid requiring a pinned object, which gets in the way of doing non-root KMS. This change appears to result in about a 2-6% loss in x11perf -aa10text, which better algorithms in libdrm could make up for (it hasn't really had to deal with code this bad before).
2009-01-21 | Remove 965 render wm scratch space, which was just unused. | Eric Anholt
2009-01-20 | Use drm_intel_bo_subdata to put render vb data in. | Eric Anholt
This improves performance by avoiding repeated map/unmap cycles, which are a bit expensive on my machine with lock debugging on in the kernel. It could do much better if we did more than 18 or so floats at a time.
2009-01-20 | Move i965 render vb setup to use time, and decouple state emit from it. | Eric Anholt
The require_space had failed since it only checked for the space required by the batch emits in the function itself, but not in the i965_emit_composite_state() that it called (the state we were concerned about having set up for that 12 * 4 dwords to follow!). This is replaced by intel_batch_start_atomic(), which will catch such mistakes in the future.
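The difference between a per-function space check and an atomic reservation can be shown with a toy model. This is only a sketch of the idea named in the commit above: the struct, the function names and the sizes are hypothetical, and it is not the implementation of intel_batch_start_atomic(). The whole sequence is reserved up front, the batch is submitted first if the reservation would not fit, and any emit that exceeds the reservation is reported as a mistake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy batch buffer; all sizes are in dwords. */
struct sketch_batch {
	size_t used;       /* dwords already emitted */
	size_t size;       /* capacity of the buffer */
	size_t atomic_end; /* end of the current reservation */
	bool in_atomic;
	unsigned submits;  /* how many times the batch was submitted */
};

static void sketch_submit(struct sketch_batch *b)
{
	b->used = 0;
	b->submits++;
}

/* Reserve room for a whole state + vertex sequence before emitting any of it. */
static void sketch_start_atomic(struct sketch_batch *b, size_t dwords)
{
	assert(!b->in_atomic);
	assert(dwords <= b->size);
	if (b->used + dwords > b->size)
		sketch_submit(b);
	b->atomic_end = b->used + dwords;
	b->in_atomic = true;
}

/* Emit dwords; fails if the caller under-estimated its reservation. */
static bool sketch_emit(struct sketch_batch *b, size_t dwords)
{
	size_t limit = b->in_atomic ? b->atomic_end : b->size;
	if (b->used + dwords > limit)
		return false;
	b->used += dwords;
	return true;
}

static void sketch_end_atomic(struct sketch_batch *b)
{
	assert(b->in_atomic);
	b->in_atomic = false;
}
```

In this model, a callee that emits more than its caller reserved fails immediately, rather than silently splitting the sequence across two batches.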
2009-01-20 | Move i965 render transform setup from emit_composite_state to prepare_composite. | Eric Anholt
2009-01-20 | i965: Pull check_aperture out to a separate function and make it dtrt. | Eric Anholt
Previously it wouldn't count the pixmaps that were about to be used, which is pretty much the only purpose of having the pain around. This also eliminates the check_twice confusion with emit_batch_header_for_composite().
2009-01-20 | Move filter computation from emit_batch_header to prepare_composite. | Eric Anholt
2009-01-20 | Use intel_emit_reloc from video to prettify 965 render bind_bo setup. | Eric Anholt
2009-01-20 | Move i965 render state bo setup back to prepare_composite. | Eric Anholt
We want the objects to be created once per prepare/done both for efficiency and so we can handle aperture checking better.