path: root/src/sna/sna_display.c
Age | Commit message | Author
2012-01-05 | sna: Only force a batch continuation if the scanout is written to | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-31 | sna: Make sure the shadow pixmap is suitable for scanout | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-17 | sna: Simplify write domain tracking | Chris Wilson
Replace the growing bitfield with an enum marking where it was last used.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
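To illustrate the idea in the commit above, here is a minimal sketch of tracking the last write domain with a single enum rather than a set of flag bits. The names `write_domain`, `buffer` and `buffer_set_domain` are hypothetical and are not taken from the driver; the real definitions are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical domains: exactly one of these describes the last writer. */
enum write_domain {
    DOMAIN_NONE = 0,
    DOMAIN_CPU,
    DOMAIN_GTT,
    DOMAIN_GPU,
};

struct buffer {
    enum write_domain domain; /* where the buffer was last written */
};

/*
 * Record a write from `next`. Returns true if the previous writer was a
 * different domain, i.e. the caller would have to synchronise first.
 */
static bool buffer_set_domain(struct buffer *bo, enum write_domain next)
{
    assert(bo != NULL);
    bool needs_sync = bo->domain != DOMAIN_NONE && bo->domain != next;
    bo->domain = next;
    return needs_sync;
}
```

With an enum there is only ever one "last writer" to compare against, whereas a bitfield needs a new bit, and new combinations to reason about, for every domain that is added.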
2011-12-17 | sna: Map the upload buffer using an LLC bo | Chris Wilson
In order to avoid having to perform a copy of the cacheable buffer into GPU space, we can map a bo as cacheable and write directly to its contents. This is only a win on systems that can avoid the clflush, and also we have to go to greater measures to avoid unnecessary serialisation upon that CPU bo. Sadly, we do not yet go to sufficient lengths to avoid negatively impacting ShmPutImage, but that does not appear to be an artefact of stalling upon a CPU buffer.
Note, LLC is a SandyBridge feature enabled by default in kernel 3.1 and later. In time, we should be able to expose similar support for snoopable buffers for other generations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-08 | xf86-video-intel: change order of DPMS operations | Simon Que
The operations when setting DPMS on should be performed in the opposite order to those performed when setting DPMS off. This is because of potentially conflicting effects:

~ drmModeConnectorSetProperty() enables/disables the backlight driver. Some backlight drivers, such as intel_backlight, set the backlight to 0 when disabled and to max when enabled.
~ intel_output_dpms_backlight() saves the backlight value when turning DPMS off and restores it when turning DPMS on.

Here's the current order of operations:

xset dpms force off (backlight is nonzero)
    drmModeConnectorSetProperty(DPMSModeOff)
        kernel: disable backlight, backlight=0
    intel_output_dpms_backlight(DPMSModeOff)
        save backlight value (0)  <-- it has been set to 0 by kernel
        set backlight to 0
xset dpms force on
    drmModeConnectorSetProperty(DPMSModeOn)
        kernel: enable backlight, backlight=max
    intel_output_dpms_backlight(DPMSModeOn)
        set backlight to saved value (0)

The correct way to do this would be to reverse the operations during xset dpms force off:

    intel_output_dpms_backlight(DPMSModeOff)
        save backlight value (nonzero)
        set backlight to 0
    drmModeConnectorSetProperty(DPMSModeOff)
        kernel: disable backlight, backlight=0

This restores the saved nonzero backlight value during the force on.
Signed-off-by: Simon Que <sque@chromium.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
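The ordering problem described in this commit can be reproduced with a small stand-alone model. This is only a sketch: `kernel_set_dpms` and `driver_dpms_backlight` are hypothetical stand-ins for the kernel side effect and for the driver's save/restore step, and `BACKLIGHT_MAX` is an arbitrary value chosen for the example.

```c
#include <assert.h>

#define BACKLIGHT_MAX 100 /* arbitrary maximum for this model */

struct panel {
    int backlight; /* current hardware level */
    int saved;     /* level remembered by the driver across DPMS off */
};

/* Stand-in for the kernel: disabling zeroes the backlight, enabling maxes it. */
static void kernel_set_dpms(struct panel *p, int on)
{
    p->backlight = on ? BACKLIGHT_MAX : 0;
}

/* Stand-in for the driver: save and zero on off, restore on on. */
static void driver_dpms_backlight(struct panel *p, int on)
{
    if (on) {
        p->backlight = p->saved;
    } else {
        p->saved = p->backlight;
        p->backlight = 0;
    }
}

/* Old order: the kernel zeroes the backlight before the driver saves it. */
static int cycle_old_order(int initial)
{
    struct panel p = { initial, 0 };
    assert(initial >= 0 && initial <= BACKLIGHT_MAX);
    kernel_set_dpms(&p, 0);
    driver_dpms_backlight(&p, 0);
    kernel_set_dpms(&p, 1);
    driver_dpms_backlight(&p, 1);
    return p.backlight;
}

/* Fixed order: off is the reverse of on, so the driver saves first. */
static int cycle_fixed_order(int initial)
{
    struct panel p = { initial, 0 };
    assert(initial >= 0 && initial <= BACKLIGHT_MAX);
    driver_dpms_backlight(&p, 0);
    kernel_set_dpms(&p, 0);
    kernel_set_dpms(&p, 1);
    driver_dpms_backlight(&p, 1);
    return p.backlight;
}
```

In this model, starting from a backlight level of 60, `cycle_old_order(60)` ends at 0 while `cycle_fixed_order(60)` ends at 60, which matches the behaviour the commit message describes.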
2011-11-16 | sna: Reduce and clarify dependencies | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-16 | sna: And keep unity happy | Chris Wilson
Rewrite the DRI layer to avoid the various bugs and shortcomings of the Xserver and interfacing with mesa.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38732
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39044
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-08 | sna: Begin hooking up valgrind/memcheck | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-05 | sna: s/flush/vblank/ fixes for DBG() | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-04 | sna: Run the deferred flush at vrefresh | Chris Wilson
This helps to reduce the perceived jerkiness of the redraw.
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42413
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-31 | sna: Apply the GPU damage for clipped PolyFillRectangles | Chris Wilson
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42425
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-10-31 | sna: Set the flush interval based on output vrefresh | Chris Wilson
Rather than a blanket 25Hz, use twice the vblank interval to hopefully avoid bad values.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
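As a rough illustration of the arithmetic, the interval can be derived from the refresh rate as below. This is a sketch only: the function name `flush_interval_ms`, the millisecond unit, the rounding, and the fallback to the old 25Hz value for unusable rates are all assumptions made for the example, not details from the commit.

```c
#include <assert.h>

#define FALLBACK_INTERVAL_MS 40 /* 25Hz, the old fixed value (assumed fallback) */

/*
 * Return a flush interval in milliseconds equal to twice the vblank
 * period of the given refresh rate, rounded to the nearest millisecond.
 * A zero, negative or NaN rate falls back to the fixed interval.
 */
static unsigned int flush_interval_ms(double vrefresh_hz)
{
    if (!(vrefresh_hz > 0.0))
        return FALLBACK_INTERVAL_MS;
    return (unsigned int)(2.0 * 1000.0 / vrefresh_hz + 0.5);
}
```

For a 60Hz output this gives 33ms, and for a 50Hz output 40ms, so the flush period follows the display rather than being fixed.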
2011-09-30 | sna/gen6: Fix offset of Scan-Line-Compare register | Chris Wilson
Reported-by: Frank Mariak <fmariak@macrosystem.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-17 | sna: perform a warnings reduction pass | Chris Wilson
Didn't spot anything that might have led to a genuine bug, but this should help improve the signal-to-noise ratio of warnings in the future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-25 | sna/display: Destroy shadow data | Chris Wilson
Under certain circumstances the shadow can be destroyed after being allocated but before being created. The pixmap is a NULL pointer at that time, but we know that its value should be data, so just use the data pointer instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-26 | sna: Revert enabling scan-line wait on SNB | Chris Wilson
Hanging the machine does indeed prevent video tearing. Just not quite what the user expected...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39497
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-20 | sna: Enable gen6 scan-line waiting | Chris Wilson
The code was ready and waiting, just forgot to turn it on.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-07 | sna: Take advantage of the needs_flush tracking on the front buffer | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-25 | sna/display: Protect against drmModeGetCrtc returning NULL | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 | sna: Wrap the fbcon in a scratch pixmap for render-copy across depth changes | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 | sna: Support depth-30 and some more logging to show the depth | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 | sna: Clip the fbcon to the frontbuffer | Chris Wilson
...both to correct the placement of the fbcon into the smaller scanout and to ensure that we correctly clip the boxes to be copied.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-23 | sna: Debug compile fix, and some extra comments | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-20 | sna: Don't perform a GPU copy of the scanout if it is wedged. | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-17 | sna/display: After copying the fbcon, tell the server that we have a background | Chris Wilson
... so that the core knows to skip the clear.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-17 | sna/display: Apply damage for the fbcon copy | Chris Wilson
... so that any immediate shadow usage will read back the fbcon contents.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-17 | sna: Copy the fbcon contents onto the front buffer upon X startup | Chris Wilson
This patch has been carried by the distributions ever since they started doing graphical boot splashes. Time to integrate it and give it some TLC.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-17 | sna/display: Remove the caching of the drmModeCrtc | Chris Wilson
We only use it for the id. Everything else stored on it, like the buffer_id, is not permanent and we need to query the current status as required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-14 | sna: Split zaphod across the crtcs | Chris Wilson
Since we have no global resource allocator for zaphod mode (that is what RandR-1.4 solves), we have to further constrain zaphod mode to only use one crtc per screen. This also means that you must match the output restrictions within the Screen definitions, noting that the crtc pipe id corresponds with the screen number.
Reported-by: Phillip Haddad <phillip.haddad@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-14 | sna: Compile fix for debugging enabled | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-13 | sna: Invalidate the mode if the front pixmap was swapped whilst blanked | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-13 | sna/dri: Accurately track front and pending front for async flips | Chris Wilson
By not tracking the front buffer correctly, i.e. performing the exchange on every swap, GL_FRONT was no longer pointing at the updated buffer and neither was the root pixmap. So both X and GL would read the wrong buffer whilst the flip was pending.
The other issue was that we would feed the old front buffer back to the application as a future back buffer (due to buffer caching) and so the kernel would duly insert a WAIT_EVENT for the pending flip to complete before allowing rendering to affect it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-12 | sna: Just do a pointer exchange when flipping with no scanout | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-12 | sna: Check that the scanout is still attached before waiting for scanline | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-12 | sna/dri: Fix triple-buffering for vblank_mode=0 | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-10 | sna: Use the ScreenPixmap->serialNumber as a generation count | Chris Wilson
DRI2 presumes that the pixmap->serialNumber can be used as a unique id. If it changes, DRI2 revokes *all* the buffers, as it presumes that a new pixmap has been attached to the window, for example after a reconfiguration event (resizing of a window, or a mode switch). However, as we updated the root pixmap upon a pageflip, we were triggering revocations every time, causing further revocations and massive aperture thrashing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-10 | sna: Use temporary for storing the current crtc box when computing best crtc | Chris Wilson
... as the caller may be reusing an input parameter for the result.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
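The aliasing hazard behind this fix can be shown with a small stand-alone sketch. The types and functions here (`struct box`, `box_overlap_area`, `best_crtc`) are hypothetical and simplified, not the driver's own code. The point is that the result is accumulated in a local temporary and only written to the output pointer at the end, so the caller may safely pass the same box as both input and output.

```c
#include <assert.h>
#include <stddef.h>

struct box { int x1, y1, x2, y2; }; /* x2 and y2 are exclusive */

/* Area of the intersection of two boxes, or 0 if they do not overlap. */
static int box_overlap_area(const struct box *a, const struct box *b)
{
    int x1 = a->x1 > b->x1 ? a->x1 : b->x1;
    int y1 = a->y1 > b->y1 ? a->y1 : b->y1;
    int x2 = a->x2 < b->x2 ? a->x2 : b->x2;
    int y2 = a->y2 < b->y2 ? a->y2 : b->y2;
    if (x2 <= x1 || y2 <= y1)
        return 0;
    return (x2 - x1) * (y2 - y1);
}

/*
 * Pick the crtc whose box covers the largest part of `want`.
 * Returns its index (or -1 if none overlaps) and stores its box in
 * `result`. `result` may alias `want`: it is not written until the
 * loop has finished reading `want`.
 */
static int best_crtc(const struct box *want, const struct box *crtcs,
                     int count, struct box *result)
{
    struct box best = { 0, 0, 0, 0 }; /* temporary, never the caller's box */
    int best_area = 0, best_index = -1;

    assert(want != NULL && crtcs != NULL && result != NULL);
    for (int i = 0; i < count; i++) {
        int area = box_overlap_area(want, &crtcs[i]);
        if (area > best_area) {
            best = crtcs[i];
            best_area = area;
            best_index = i;
        }
    }
    *result = best;
    return best_index;
}
```

Had the loop written each candidate straight into `result`, a call such as `best_crtc(&b, crtcs, n, &b)` would overwrite the box being tested after the first match and compare later crtcs against the wrong rectangle.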
2011-06-10 | sna: Remember to decouple the fb on closing | Chris Wilson
... so that we actually attach a new one after regen!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-08 | sna/dri: valgrindify | Chris Wilson
Lots of scary warnings found by valgrind.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-07 | sna: Add zaphod support | Chris Wilson
Zaphod support is a rudimentary method for creating an Xserver with multiple screens from a single device. The Device is instantiated, with a duplication of its resources, as many times as required up to a maximum of the number of its outputs, and each instance is attached to a Screen and added to the ServerLayout. A Device can be bound to a selection of outputs using a comma separated list of RandR names.

Note: in general, this is not the preferred solution! It will be superseded by per-crtc-pixmaps in RandR-1.4.

For example, the following xorg.conf fragment creates an Xserver with two screens, one attached to the LVDS panel on the laptop, and the other to any external output:

    Section "Device"
        Identifier "Intel0"
        Driver "intel"
        BusID "PCI:0:2:0"
        Option "ZaphodHeads" "LVDS1"
        Screen 0
    EndSection

    Section "Device"
        Identifier "Intel1"
        Driver "intel"
        BusID "PCI:0:2:0"
        Option "ZaphodHeads" "DVI1,VGA1"
        Screen 1
    EndSection

    Section "Screen"
        Identifier "Screen0"
        Device "Intel0"
    EndSection

    Section "Screen"
        Identifier "Screen1"
        Device "Intel1"
    EndSection

    Section "ServerLayout"
        Identifier "default"
        Screen "Screen0"
        Screen "Screen1"
    EndSection

Based on a patch by Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
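Binding a Device to outputs by a comma separated list of RandR names amounts to a membership test on that list. The following is a minimal sketch of such a test; the function name `zaphod_heads_contains` is hypothetical, and exact, case-sensitive matching with no whitespace trimming is an assumption of this example rather than documented driver behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `output` (e.g. "VGA1") appears as a whole entry in
 * the comma separated list `heads` (e.g. "DVI1,VGA1").
 */
static bool zaphod_heads_contains(const char *heads, const char *output)
{
    assert(heads != NULL && output != NULL);

    size_t len = strlen(output);
    const char *s = heads;

    while (*s) {
        const char *end = strchr(s, ',');
        size_t n = end ? (size_t)(end - s) : strlen(s);

        /* Compare whole entries only, so "VGA" does not match "VGA1". */
        if (n == len && memcmp(s, output, n) == 0)
            return true;
        if (!end)
            break;
        s = end + 1;
    }
    return false;
}
```

With the second Device section of the fragment above, `zaphod_heads_contains("DVI1,VGA1", "VGA1")` is true and `zaphod_heads_contains("DVI1,VGA1", "LVDS1")` is false, so the LVDS panel would be left to the first Device.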
2011-06-04 | sna: Introduce a new acceleration model. | Chris Wilson
The premise is that switching between rings (i.e. the BLT and RENDER rings) on SandyBridge imposes a large latency overhead whilst rendering. The cause is that in order to switch rings, we need to split the batch earlier than is desired and to add serialisation between the rings. Both of which incur large overhead.

By switching to using a pure 3D blit engine (ok, not so pure as the BLT engine still has uses for the core drawing model which can not be easily represented without a combinatorial explosion of shaders) we can take advantage of additional efficiencies, such as relative relocations, that have been incorporated into recent hardware advances. However, even older hardware performs better from avoiding the implicit context switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for the operations that are not worth accelerating in hardware, we introduce a shadow buffer mechanism throughout and reintroduce pixmap migration. Doing this efficiently is the cornerstone of ensuring that we do exploit the increased potential of recent hardware for running old applications and environments (i.e. so that the latest and greatest chip is actually faster than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are running older chipsets and welcome the performance increase offered by this patch, then you may choose to call it Snazzy instead.

Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup

Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown

[Note the performance regression for ocitysmap comes from the fact that we now attempt to support rendering to and (more importantly) from large surfaces. Enabling such operations is the only way to one day be faster than purely using the CPU; in the meantime we suffer a regression due to the increased migration and aperture thrashing. The other couple of regressions will be eliminated with improved span and shader support, now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other critical aspects of the architecture:

World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6);
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by the dix to implement SHM operations, used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>