author:    Chris Wilson <chris@chris-wilson.co.uk>  2011-04-08 07:17:14 +0100
committer: Chris Wilson <chris@chris-wilson.co.uk>  2011-06-04 09:19:46 +0100
commit:    bcef98af561939aa48d9236b2dfa2c5626adf4cb (patch)
tree:      9d05558947a97595a6fdece968b50eeae45bbfb1 /src/sna/kgem.c
parent:    340cfb7f5271fd1df4c8948e5c9336f5b69a6e6c (diff)
sna: Introduce a new acceleration model.
The premise is that switching between rings (i.e. the BLT and
RENDER rings) on SandyBridge imposes a large latency overhead whilst
rendering. The cause is that in order to switch rings, we need to split
the batch earlier than is desired and to add serialisation between the
rings, both of which incur a large overhead.
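To make the cost concrete, here is a minimal sketch of the idea (the
helper below is illustrative only; kgem.c does track the current ring in
kgem->ring and submit via _kgem_submit(), but this exact function is not
part of the patch):

	/* Sketch: crossing from one ring to the other forces the
	 * half-filled batch to be submitted early, and the kernel must
	 * then serialise the BLT ring against the RENDER ring. */
	static void sketch_switch_ring(struct kgem *kgem, int ring)
	{
		if (kgem->ring == ring)
			return;			/* same ring: no cost */

		if (kgem->nbatch)		/* early, undersized batch */
			_kgem_submit(kgem);
		kgem->ring = ring;		/* implicit serialisation point */
	}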
By switching to a pure 3D engine for blits (ok, not so pure, as the BLT
engine still has uses for the core drawing model which cannot be easily
represented without a combinatorial explosion of shaders) we can take
advantage of additional efficiencies, such as relative relocations, that
have been incorporated into recent hardware. However, even
older hardware benefits from avoiding the implicit context
switches and from the batching efficiency of the 3D pipeline...
But this is X, and PolyGlyphBlt still exists and remains in use. So for
the operations that are not worth accelerating in hardware, we introduce a
shadow buffer mechanism throughout and reintroduce pixmap migration.
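The policy can be pictured as a per-operation choice between the GPU bo
and its CPU shadow; a rough sketch under assumed names (sna_pixmap,
gpu_bo, cpu_damage and MIGRATION_THRESHOLD are stand-ins here, not the
exact fields or heuristic used by the patch):

	/* Sketch: accelerate only when the pixmap is already on the GPU
	 * or the operation is costly enough to amortise a migration;
	 * otherwise draw into the CPU shadow and damage the GPU copy. */
	static bool sketch_use_gpu(struct sna_pixmap *priv, int op_cost)
	{
		if (priv->gpu_bo && !priv->cpu_damage)
			return true;		/* already resident on the GPU */
		if (op_cost < MIGRATION_THRESHOLD)
			return false;		/* cheaper to draw with the CPU */
		return true;			/* worth migrating to the GPU */
	}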
Doing this efficiently is the cornerstone of ensuring that we do exploit
the increased potential of recent hardware for running old applications and
environments (i.e. so that the latest and greatest chip is actually faster
than gen2!)
For the curious, sna is SandyBridge's New Acceleration. If you are
running older chipsets and welcome the performance increase offered by
this patch, then you may choose to call it Snazzy instead.
Speedups
========
gen3 firefox-fishtank 1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
gen5 grads-heat-map 3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%): 9.66x speedup
gen3 xfce4-terminal-a1 4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%): 8.29x speedup
gen4 grads-heat-map 2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%): 7.05x speedup
gen3 grads-heat-map 1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%): 4.83x speedup
gen3 swfdec-youtube 3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%): 4.31x speedup
gen6 grads-heat-map 742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%): 4.30x speedup
gen3 firefox-talos-svg 71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%): 3.27x speedup
gen5 gvim 8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%): 3.11x speedup
gen6 poppler 3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%): 3.10x speedup
gen6 gnome-terminal-vim 9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%): 2.63x speedup
gen5 midori-zoomed 9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%): 2.60x speedup
gen5 gnome-terminal-vim 38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%): 2.56x speedup
gen5 poppler 13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%): 2.44x speedup
gen5 swfdec-giant-steps 8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%): 2.32x speedup
gen5 xfce4-terminal-a1 18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%): 2.27x speedup
gen5 firefox-fishtank 88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%): 2.27x speedup
gen3 midori-zoomed 2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%): 2.16x speedup
gen6 gvim 2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%): 2.09x speedup
gen5 firefox-planet-gnome 40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%): 2.06x speedup
gen5 gnome-system-monitor 10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%): 2.01x speedup
gen3 poppler 2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%): 2.00x speedup
gen6 firefox-talos-gfx 7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%): 1.89x speedup
gen5 evolution 8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%): 1.81x speedup
gen3 evolution 1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%): 1.68x speedup
gen3 gnome-terminal-vim 4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%): 1.58x speedup
gen5 swfdec-youtube 5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%): 1.53x speedup
gen4 poppler 7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%): 1.46x speedup
gen4 gnome-terminal-vim 21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%): 1.45x speedup
gen5 firefox-talos-svg 99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%): 1.41x speedup
gen4 firefox-planet-gnome 28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%): 1.41x speedup
gen5 firefox-talos-gfx 93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%): 1.37x speedup
gen4 evolution 6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%): 1.35x speedup
gen3 swfdec-giant-steps 2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%): 1.34x speedup
gen4 gvim 4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%): 1.33x speedup
gen6 evolution 1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%): 1.30x speedup
gen6 firefox-planet-gnome 4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%): 1.21x speedup
gen3 firefox-talos-gfx 6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%): 1.19x speedup
gen4 midori-zoomed 4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%): 1.18x speedup
gen6 swfdec-giant-steps 1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%): 1.17x speedup
gen4 firefox-talos-gfx 80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%): 1.16x speedup
gen6 midori-zoomed 1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%): 1.10x speedup
Slowdowns
=========
gen6 xfce4-terminal-a1 2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%): 1.06x slowdown
gen4 swfdec-youtube 3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%): 1.07x slowdown
gen4 firefox-talos-svg 66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%): 1.08x slowdown
gen4 gnome-system-monitor 5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%): 1.18x slowdown
gen3 ocitysmap 3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%): 1.24x slowdown
gen4 ocitysmap 3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%): 1.43x slowdown
gen5 ocitysmap 4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%): 1.43x slowdown
gen6 ocitysmap 1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%): 1.69x slowdown
[Note the performance regression for ocitysmap comes from the fact that
we now attempt to support rendering to and (more importantly) from large
surfaces. Enabling such operations is the only way to one day be
faster than purely using the CPU; in the meantime we suffer a regression
due to the increased migration and aperture thrashing. The other couple
of regressions will be eliminated with improved span and shader support,
now that the framework for such is in place.]
The Cairo performance figures above completely overlook the other
critical aspects of the architecture:
World of Padman:
gen3 (800x600): 57.5 -> 96.2
gen4 (800x600): 47.8 -> 74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]
x11perf (gen6):
aa10text: 3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s
And we do not have a good measure of how much improvement the reworking
of the fallback paths gives, except that xterm is now over 4x faster...
PS: This depends upon the Xorg patchset "Remove the cacheing of the last
scratch PixmapRec" for correct invalidation of scratch Pixmaps (used by
the dix to implement SHM operations, used by chromium and gtk+ pixbufs).
PPS: ./configure --enable-sna
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Diffstat (limited to 'src/sna/kgem.c')
-rw-r--r--   src/sna/kgem.c   1775
1 files changed, 1775 insertions, 0 deletions
diff --git a/src/sna/kgem.c b/src/sna/kgem.c
new file mode 100644
index 00000000..0dee6e55
--- /dev/null
+++ b/src/sna/kgem.c
@@ -0,0 +1,1775 @@
+/*
+ * Copyright (c) 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include "sna.h"
+#include "sna_reg.h"
+
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <errno.h>
+#include <fcntl.h>
+
+static inline void list_move(struct list *list, struct list *head)
+{
+	__list_del(list->prev, list->next);
+	list_add(list, head);
+}
+
+static inline void list_replace(struct list *old,
+				struct list *new)
+{
+	new->next = old->next;
+	new->next->prev = new;
+	new->prev = old->prev;
+	new->prev->next = new;
+}
+
+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+#define list_for_each(pos, head) \
+	for (pos = (head)->next; pos != (head); pos = pos->next)
+
+
+#define DBG_NO_HW 0
+#define DBG_NO_VMAP 0
+#define DBG_NO_RELAXED_FENCING 0
+#define DBG_DUMP 0
+
+#if DEBUG_KGEM
+#undef DBG
+#define DBG(x) ErrorF x
+#else
+#define NDEBUG 1
+#endif
+
+#define PAGE_SIZE 4096
+
+struct kgem_partial_bo {
+	struct kgem_bo base;
+	uint32_t used, alloc;
+	uint32_t need_io : 1;
+	uint32_t write : 1;
+};
+
+static struct drm_i915_gem_exec_object2 _kgem_dummy_exec;
+
+static int gem_set_tiling(int fd, uint32_t handle, int tiling, int stride)
+{
+	struct drm_i915_gem_set_tiling set_tiling;
+	int ret;
+
+	do {
+		set_tiling.handle = handle;
+		set_tiling.tiling_mode = tiling;
+		set_tiling.stride = stride;
+
+		ret = ioctl(fd, DRM_IOCTL_I915_GEM_SET_TILING, &set_tiling);
+	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));
+	return set_tiling.tiling_mode;
+}
+
+static void *gem_mmap(int fd, uint32_t handle, int size, int prot)
+{
+	struct drm_i915_gem_mmap_gtt mmap_arg;
+	void *ptr;
+
+	DBG(("%s(handle=%d, size=%d, prot=%s)\n", __FUNCTION__,
+	     handle, size, prot & PROT_WRITE ? "read/write" : "read-only"));
+
+	mmap_arg.handle = handle;
+	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &mmap_arg)) {
+		assert(0);
+		return NULL;
+	}
+
+	ptr = mmap(0, size, prot, MAP_SHARED, fd, mmap_arg.offset);
+	if (ptr == MAP_FAILED) {
+		assert(0);
+		ptr = NULL;
+	}
+
+	return ptr;
+}
+
+static int gem_write(int fd, uint32_t handle,
+		     int offset, int length,
+		     const void *src)
+{
+	struct drm_i915_gem_pwrite pwrite;
+
+	DBG(("%s(handle=%d, offset=%d, len=%d)\n", __FUNCTION__,
+	     handle, offset, length));
+
+	pwrite.handle = handle;
+	pwrite.offset = offset;
+	pwrite.size = length;
+	pwrite.data_ptr = (uintptr_t)src;
+	return drmIoctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite);
+}
+
+static int gem_read(int fd, uint32_t handle, const void *dst, int length)
+{
+	struct drm_i915_gem_pread pread;
+
+	DBG(("%s(handle=%d, len=%d)\n", __FUNCTION__,
+	     handle, length));
+
+	pread.handle = handle;
+	pread.offset = 0;
+	pread.size = length;
+	pread.data_ptr = (uintptr_t)dst;
+	return drmIoctl(fd, DRM_IOCTL_I915_GEM_PREAD, &pread);
+}
+
+Bool kgem_bo_write(struct kgem *kgem, struct kgem_bo *bo,
+		   const void *data, int length)
+{
+	if (gem_write(kgem->fd, bo->handle, 0, length, data))
+		return FALSE;
+
+	_kgem_retire(kgem);
+	return TRUE;
+}
+
+static uint32_t gem_create(int fd, int size)
+{
+	struct drm_i915_gem_create create;
+
+#if DEBUG_KGEM
+	assert((size & (PAGE_SIZE-1)) == 0);
+#endif
+
+	create.handle = 0;
+	create.size = size;
+	(void)drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);
+
+	return create.handle;
+}
+
+static bool
+kgem_busy(struct kgem *kgem, int handle)
+{
+	struct drm_i915_gem_busy busy;
+
+	busy.handle = handle;
+	busy.busy = !kgem->wedged;
+	(void)drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_BUSY, &busy);
+
+	return busy.busy;
+}
+
+static bool
+gem_madvise(int fd, uint32_t handle, uint32_t state)
+{
+	struct drm_i915_gem_madvise madv;
+	int ret;
+
+	madv.handle = handle;
+	madv.madv = state;
+	madv.retained = 1;
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
+	assert(ret == 0);
+
+	return madv.retained;
+	(void)ret;
+}
+
+static void gem_close(int fd, uint32_t handle)
+{
+	struct drm_gem_close close;
+
+	close.handle = handle;
+	(void)drmIoctl(fd, DRM_IOCTL_GEM_CLOSE, &close);
+}
+
+static struct kgem_bo *__kgem_bo_init(struct kgem_bo *bo,
+				      int handle, int size)
+{
+	memset(bo, 0, sizeof(*bo));
+
+	bo->refcnt = 1;
+	bo->handle = handle;
+	bo->aperture_size = bo->size = size;
+	bo->reusable = true;
+	bo->cpu_read = true;
+	bo->cpu_write = true;
+	list_init(&bo->request);
+	list_init(&bo->list);
+
+	return bo;
+}
+
+static struct kgem_bo *__kgem_bo_alloc(int handle, int size)
+{
+	struct kgem_bo *bo;
+
+	bo = malloc(sizeof(*bo));
+	if (bo == NULL)
+		return NULL;
+
+	return __kgem_bo_init(bo, handle, size);
+}
+
+static struct kgem_request *__kgem_request_alloc(void)
+{
+	struct kgem_request *rq;
+
+	rq = malloc(sizeof(*rq));
+	assert(rq);
+	if (rq == NULL)
+		return rq;
+
+	list_init(&rq->buffers);
+
+	return rq;
+}
+
+static inline unsigned long __fls(unsigned long word)
+{
+	asm("bsr %1,%0"
+	    : "=r" (word)
+	    : "rm" (word));
+	return word;
+}
+
+static struct list *inactive(struct kgem *kgem,
+			     int size)
+{
+	uint32_t order = __fls(size / PAGE_SIZE);
+	if (order >= ARRAY_SIZE(kgem->inactive))
+		order = ARRAY_SIZE(kgem->inactive)-1;
+	return &kgem->inactive[order];
+}
+
+void kgem_init(struct kgem *kgem, int fd, int gen)
+{
+	drm_i915_getparam_t gp;
+	struct drm_i915_gem_get_aperture aperture;
+	int i;
+
+	kgem->fd = fd;
+	kgem->gen = gen;
+	kgem->wedged = drmCommandNone(kgem->fd, DRM_I915_GEM_THROTTLE) == -EIO;
+	kgem->wedged |= DBG_NO_HW;
+
+	kgem->ring = kgem->mode = KGEM_NONE;
+	kgem->flush = 0;
+
+	kgem->nbatch = 0;
+	kgem->nreloc = 0;
+	kgem->nexec = 0;
+	kgem->surface = ARRAY_SIZE(kgem->batch);
+	list_init(&kgem->partial);
+	list_init(&kgem->requests);
+	list_init(&kgem->active);
+	for (i = 0; i < ARRAY_SIZE(kgem->inactive); i++)
+		list_init(&kgem->inactive[i]);
+
+	kgem->next_request = __kgem_request_alloc();
+
+	kgem->has_vmap = 0;
+#if defined(USE_VMAP) && defined(I915_PARAM_HAS_VMAP)
+	if (!DBG_NO_VMAP) {
+		drm_i915_getparam_t gp;
+
+		gp.param = I915_PARAM_HAS_VMAP;
+		gp.value = &i;
+		kgem->has_vmap =
+			drmIoctl(kgem->fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 &&
+			i > 0;
+	}
+#endif
+	DBG(("%s: using vmap=%d\n", __FUNCTION__, kgem->has_vmap));
+
+	if (gen < 40) {
+		kgem->has_relaxed_fencing = 0;
+		if (!DBG_NO_RELAXED_FENCING) {
+			drm_i915_getparam_t gp;
+
+			gp.param = I915_PARAM_HAS_RELAXED_FENCING;
+			gp.value = &i;
+			if (drmIoctl(kgem->fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0) {
+				if (gen < 33)
+					kgem->has_relaxed_fencing = i >= 2;
+				else
+					kgem->has_relaxed_fencing = i > 0;
+			}
+		}
+	} else
+		kgem->has_relaxed_fencing = 1;
+	DBG(("%s: has relaxed fencing=%d\n", __FUNCTION__,
+	     kgem->has_relaxed_fencing));
+
+	aperture.aper_available_size = 64*1024*1024;
+	(void)drmIoctl(fd, DRM_IOCTL_I915_GEM_GET_APERTURE, &aperture);
+
+	kgem->aperture_high = aperture.aper_available_size * 3/4;
+	kgem->aperture_low = aperture.aper_available_size * 1/4;
+	kgem->aperture = 0;
+	DBG(("%s: aperture low=%d, high=%d\n", __FUNCTION__,
+	     kgem->aperture_low, kgem->aperture_high));
+
+	i = 8;
+	gp.param = I915_PARAM_NUM_FENCES_AVAIL;
+	gp.value = &i;
+	(void)drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	kgem->fence_max = i - 2;
+
+	DBG(("%s: max fences=%d\n", __FUNCTION__, kgem->fence_max));
+}
+
+/* XXX hopefully a good approximation */
+static uint32_t kgem_get_unique_id(struct kgem *kgem)
+{
+	uint32_t id;
+	id = ++kgem->unique_id;
+	if (id == 0)
+		id = ++kgem->unique_id;
+	return id;
+}
+
+static uint32_t kgem_surface_size(struct kgem *kgem,
+				  uint32_t width,
+				  uint32_t height,
+				  uint32_t bpp,
+				  uint32_t tiling,
+				  uint32_t *pitch)
+{
+	uint32_t tile_width, tile_height;
+	uint32_t size;
+
+	if (kgem->gen == 2) {
+		if (tiling) {
+			tile_width = 512;
+			tile_height = 16;
+		} else {
+			tile_width = 64;
+			tile_height = 2;
+		}
+	} else switch (tiling) {
+	default:
+	case I915_TILING_NONE:
+		tile_width = 64;
+		tile_height = 2;
+		break;
+	case I915_TILING_X:
+		tile_width = 512;
+		tile_height = 8;
+		break;
+	case I915_TILING_Y:
+		tile_width = 128;
+		tile_height = 32;
+		break;
+	}
+
+	*pitch = ALIGN(width * bpp / 8, tile_width);
+	if (kgem->gen < 40 && tiling != I915_TILING_NONE) {
+		if (*pitch > 8192)
+			return 0;
+		for (size = tile_width; size < *pitch; size <<= 1)
+			;
+		*pitch = size;
+	}
+
+	size = *pitch * ALIGN(height, tile_height);
+	if (kgem->has_relaxed_fencing || tiling == I915_TILING_NONE)
+		return ALIGN(size, PAGE_SIZE);
+
+	/* We need to allocate a pot fence region for a tiled buffer. */
+	if (kgem->gen < 30)
+		tile_width = 512 * 1024;
+	else
+		tile_width = 1024 * 1024;
+	while (tile_width < size)
+		tile_width *= 2;
+	return tile_width;
+}
+
+static uint32_t kgem_aligned_height(uint32_t height, uint32_t tiling)
+{
+	uint32_t tile_height;
+
+	switch (tiling) {
+	default:
+	case I915_TILING_NONE:
+		tile_height = 2;
+		break;
+	case I915_TILING_X:
+		tile_height = 8;
+		break;
+	case I915_TILING_Y:
+		tile_height = 32;
+		break;
+	}
+
+	return ALIGN(height, tile_height);
+}
+
+static struct drm_i915_gem_exec_object2 *
+kgem_add_handle(struct kgem *kgem, struct kgem_bo *bo)
+{
+	struct drm_i915_gem_exec_object2 *exec;
+
+	assert(kgem->nexec < ARRAY_SIZE(kgem->exec));
+	exec = memset(&kgem->exec[kgem->nexec++], 0, sizeof(*exec));
+	exec->handle = bo->handle;
+	exec->offset = bo->presumed_offset;
+
+	kgem->aperture += bo->aperture_size;
+
+	return exec;
+}
+
+void _kgem_add_bo(struct kgem *kgem, struct kgem_bo *bo)
+{
+	bo->exec = kgem_add_handle(kgem, bo);
+	bo->rq = kgem->next_request;
+	list_move(&bo->request, &kgem->next_request->buffers);
+	kgem->flush |= bo->flush;
+}
+
+static uint32_t kgem_end_batch(struct kgem *kgem)
+{
+	kgem->batch[kgem->nbatch++] = MI_BATCH_BUFFER_END;
+	if (kgem->nbatch & 1)
+		kgem->batch[kgem->nbatch++] = MI_NOOP;
+
+	return kgem->nbatch;
+}
+
+static void kgem_fixup_self_relocs(struct kgem *kgem, struct kgem_bo *bo)
+{
+	int n;
+
+	for (n = 0; n < kgem->nreloc; n++) {
+		if (kgem->reloc[n].target_handle == 0) {
+			kgem->reloc[n].target_handle = bo->handle;
+			kgem->batch[kgem->reloc[n].offset/sizeof(kgem->batch[0])] =
+				kgem->reloc[n].delta + bo->presumed_offset;
+		}
+	}
+}
+
+static void __kgem_bo_destroy(struct kgem *kgem, struct kgem_bo *bo)
+{
+	assert(list_is_empty(&bo->list));
+	assert(bo->refcnt == 0);
+
+	bo->src_bound = bo->dst_bound = 0;
+
+	if (!bo->reusable)
+		goto destroy;
+
+	if (!bo->deleted && !bo->exec) {
+		if (!gem_madvise(kgem->fd, bo->handle, I915_MADV_DONTNEED)) {
+			kgem->need_purge = 1;
+			goto destroy;
+		}
+
+		bo->deleted = 1;
+	}
+
+	list_move(&bo->list, (bo->rq || bo->needs_flush) ?
+		  &kgem->active : inactive(kgem, bo->size));
+	return;
+
+destroy:
+	if (!bo->exec) {
+		list_del(&bo->request);
+		gem_close(kgem->fd, bo->handle);
+		free(bo);
+	}
+}
+
+static void kgem_bo_unref(struct kgem *kgem, struct kgem_bo *bo)
+{
+	if (--bo->refcnt == 0)
+		__kgem_bo_destroy(kgem, bo);
+}
+
+void _kgem_retire(struct kgem *kgem)
+{
+	struct kgem_bo *bo, *next;
+
+	list_for_each_entry_safe(bo, next, &kgem->active, list) {
+		if (bo->rq == NULL && !kgem_busy(kgem, bo->handle)) {
+			assert(bo->needs_flush);
+			assert(bo->deleted);
+			bo->needs_flush = 0;
+			list_move(&bo->list, inactive(kgem, bo->size));
+		}
+	}
+
+	while (!list_is_empty(&kgem->requests)) {
+		struct kgem_request *rq;
+
+		rq = list_first_entry(&kgem->requests,
+				      struct kgem_request,
+				      list);
+		if (kgem_busy(kgem, rq->bo->handle))
+			break;
+
+		while (!list_is_empty(&rq->buffers)) {
+			bo = list_first_entry(&rq->buffers,
+					      struct kgem_bo,
+					      request);
+			list_del(&bo->request);
+			bo->rq = NULL;
+			bo->gpu = false;
+
+			if (bo->refcnt == 0 && !bo->needs_flush) {
+				assert(bo->deleted);
+				if (bo->reusable) {
+					list_move(&bo->list,
+						  inactive(kgem, bo->size));
+				} else {
+					gem_close(kgem->fd, bo->handle);
+					free(bo);
+				}
+			}
+		}
+
+		rq->bo->refcnt--;
+		assert(rq->bo->refcnt == 0);
+		if (gem_madvise(kgem->fd, rq->bo->handle, I915_MADV_DONTNEED)) {
+			rq->bo->deleted = 1;
+			list_move(&rq->bo->list,
+				  inactive(kgem, rq->bo->size));
+		} else {
+			kgem->need_purge = 1;
+			gem_close(kgem->fd, rq->bo->handle);
+			free(rq->bo);
+		}
+
+		list_del(&rq->list);
+		free(rq);
+	}
+
+	kgem->retire = 0;
+}
+
+static void kgem_commit(struct kgem *kgem)
+{
+	struct kgem_request *rq = kgem->next_request;
+	struct kgem_bo *bo, *next;
+
+	list_for_each_entry_safe(bo, next, &rq->buffers, request) {
+		bo->src_bound = bo->dst_bound = 0;
+		bo->presumed_offset = bo->exec->offset;
+		bo->exec = NULL;
+		bo->dirty = false;
+		bo->gpu = true;
+		bo->cpu_read = false;
+		bo->cpu_write = false;
+
+		if (!bo->refcnt) {
+			if (!bo->reusable) {
+destroy:
+				list_del(&bo->list);
+				list_del(&bo->request);
+				gem_close(kgem->fd, bo->handle);
+				free(bo);
+				continue;
+			}
+			if (!bo->deleted) {
+				if (!gem_madvise(kgem->fd, bo->handle,
+						 I915_MADV_DONTNEED)) {
+					kgem->need_purge = 1;
+					goto destroy;
+				}
+				bo->deleted = 1;
+			}
+		}
+	}
+
+	list_add_tail(&rq->list, &kgem->requests);
+	kgem->next_request = __kgem_request_alloc();
+}
+
+static void kgem_close_list(struct kgem *kgem, struct list *head)
+{
+	while (!list_is_empty(head)) {
+		struct kgem_bo *bo;
+
+		bo = list_first_entry(head, struct kgem_bo, list);
+		gem_close(kgem->fd, bo->handle);
+		list_del(&bo->list);
+		list_del(&bo->request);
+		free(bo);
+	}
+}
+
+static void kgem_close_inactive(struct kgem *kgem)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(kgem->inactive); i++)
+		kgem_close_list(kgem, &kgem->inactive[i]);
+}
+
+static void kgem_finish_partials(struct kgem *kgem)
+{
+	struct kgem_partial_bo *bo, *next;
+
+	list_for_each_entry_safe(bo, next, &kgem->partial, base.list) {
+		if (!bo->base.exec)
+			continue;
+
+		if (bo->write && bo->need_io) {
+			DBG(("%s: handle=%d, uploading %d/%d\n",
+			     __FUNCTION__, bo->base.handle, bo->used, bo->alloc));
+			gem_write(kgem->fd, bo->base.handle,
+				  0, bo->used, bo+1);
+			bo->need_io = 0;
+		}
+
+		list_del(&bo->base.list);
+		kgem_bo_unref(kgem, &bo->base);
+	}
+}
+
+static void kgem_cleanup(struct kgem *kgem)
+{
+	while (!list_is_empty(&kgem->partial)) {
+		struct kgem_bo *bo;
+
+		bo = list_first_entry(&kgem->partial,
+				      struct kgem_bo,
+				      list);
+		list_del(&bo->list);
+		kgem_bo_unref(kgem, bo);
+	}
+
+	while (!list_is_empty(&kgem->requests)) {
+		struct kgem_request *rq;
+
+		rq = list_first_entry(&kgem->requests,
+				      struct kgem_request,
+				      list);
+		while (!list_is_empty(&rq->buffers)) {
+			struct kgem_bo *bo;
+
+			bo = list_first_entry(&rq->buffers,
+					      struct kgem_bo,
+					      request);
+			list_del(&bo->request);
+			bo->rq = NULL;
+			bo->gpu = false;
+			if (bo->refcnt == 0) {
+				list_del(&bo->list);
+				gem_close(kgem->fd, bo->handle);
+				free(bo);
+			}
+		}
+
+		list_del(&rq->list);
+		free(rq);
+	}
+
+	kgem_close_inactive(kgem);
+}
+
+static int kgem_batch_write(struct kgem *kgem, uint32_t handle)
+{
+	int ret;
+
+	/* If there is no surface data, just upload the batch */
+	if (kgem->surface == ARRAY_SIZE(kgem->batch))
+		return gem_write(kgem->fd, handle,
+				 0, sizeof(uint32_t)*kgem->nbatch,
+				 kgem->batch);
+
+	/* Are the batch pages conjoint with the surface pages? */
+	if (kgem->surface < kgem->nbatch + PAGE_SIZE/4)
+		return gem_write(kgem->fd, handle,
+				 0, sizeof(kgem->batch),
+				 kgem->batch);
+
+	/* Disjoint surface/batch, upload separately */
+	ret = gem_write(kgem->fd, handle,
+			0, sizeof(uint32_t)*kgem->nbatch,
+			kgem->batch);
+	if (ret)
+		return ret;
+
+	return gem_write(kgem->fd, handle,
+			 sizeof(uint32_t)*kgem->surface,
+			 sizeof(kgem->batch) - sizeof(uint32_t)*kgem->surface,
+			 kgem->batch + kgem->surface);
+}
+
+void _kgem_submit(struct kgem *kgem)
+{
+	struct kgem_request *rq;
+	uint32_t batch_end;
+	int size;
+
+	assert(kgem->nbatch);
+	assert(kgem->nbatch <= KGEM_BATCH_SIZE(kgem));
+
+	sna_kgem_context_switch(kgem, KGEM_NONE);
+
+	batch_end = kgem_end_batch(kgem);
+	sna_kgem_flush(kgem);
+
+	DBG(("batch[%d/%d]: %d %d %d, nreloc=%d, nexec=%d, nfence=%d, aperture=%d\n",
+	     kgem->mode, kgem->ring, batch_end, kgem->nbatch, kgem->surface,
+	     kgem->nreloc, kgem->nexec, kgem->nfence, kgem->aperture));
+
+	assert(kgem->nbatch <= ARRAY_SIZE(kgem->batch));
+	assert(kgem->nreloc <= ARRAY_SIZE(kgem->reloc));
+	assert(kgem->nexec < ARRAY_SIZE(kgem->exec));
+	assert(kgem->nfence <= kgem->fence_max);
+#if DEBUG_BATCH
+	__kgem_batch_debug(kgem, batch_end);
+#endif
+
+	rq = kgem->next_request;
+	if (kgem->surface != ARRAY_SIZE(kgem->batch))
+		size = sizeof(kgem->batch);
+	else
+		size = kgem->nbatch * sizeof(kgem->batch[0]);
+	rq->bo = kgem_create_linear(kgem, size);
+	if (rq->bo) {
+		uint32_t handle = rq->bo->handle;
+		int i;
+
+		i = kgem->nexec++;
+		kgem->exec[i].handle = handle;
+		kgem->exec[i].relocation_count = kgem->nreloc;
+		kgem->exec[i].relocs_ptr = (uintptr_t)kgem->reloc;
+		kgem->exec[i].alignment = 0;
+		kgem->exec[i].offset = 0;
+		kgem->exec[i].flags = 0;
+		kgem->exec[i].rsvd1 = 0;
+		kgem->exec[i].rsvd2 = 0;
+
+		rq->bo->exec = &kgem->exec[i];
+		list_add(&rq->bo->request, &rq->buffers);
+
+		kgem_fixup_self_relocs(kgem, rq->bo);
+		kgem_finish_partials(kgem);
+
+		if (kgem_batch_write(kgem, handle) == 0) {
+			struct drm_i915_gem_execbuffer2 execbuf;
+			int ret;
+
+			execbuf.buffers_ptr = (uintptr_t)kgem->exec;
+			execbuf.buffer_count = kgem->nexec;
+			execbuf.batch_start_offset = 0;
+			execbuf.batch_len = batch_end*4;
+			execbuf.cliprects_ptr = 0;
+			execbuf.num_cliprects = 0;
+			execbuf.DR1 = 0;
+			execbuf.DR4 = 0;
+			execbuf.flags = kgem->ring;
+			execbuf.rsvd1 = 0;
+			execbuf.rsvd2 = 0;
+
+			if (DBG_DUMP) {
+				int fd = open("/tmp/i915-batchbuffers.dump",
+					      O_WRONLY | O_CREAT | O_APPEND,
+					      0666);
+				if (fd != -1) {
+					ret = write(fd, kgem->batch, batch_end*4);
+					fd = close(fd);
+				}
+			}
+
+			ret = drmIoctl(kgem->fd,
+				       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+				       &execbuf);
+			while (ret == -1 && errno == EBUSY) {
+				drmCommandNone(kgem->fd, DRM_I915_GEM_THROTTLE);
+				ret = drmIoctl(kgem->fd,
+					       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+					       &execbuf);
+			}
+			if (ret == -1 && errno == EIO) {
+				DBG(("%s: GPU hang detected\n", __FUNCTION__));
+				kgem->wedged = 1;
+				ret = 0;
+			}
+#if DEBUG_KGEM
+			if (ret < 0) {
+				int i;
+				ErrorF("batch (end=%d, size=%d) submit failed: %d\n",
+				       batch_end, size, errno);
+
+				i = open("/tmp/batchbuffer", O_WRONLY | O_CREAT | O_APPEND, 0666);
+				if (i != -1) {
+					ret = write(i, kgem->batch, batch_end*4);
+					close(i);
+				}
+
+				for (i = 0; i < kgem->nexec; i++) {
+					struct kgem_request *rq = kgem->next_request;
+					struct kgem_bo *bo, *found = NULL;
+
+					list_for_each_entry(bo, &rq->buffers, request) {
+						if (bo->handle == kgem->exec[i].handle) {
+							found = bo;
+							break;
+						}
+					}
+					ErrorF("exec[%d] = handle:%d, presumed offset: %x, size: %d, tiling %d, fenced %d, deleted %d\n",
+					       i,
+					       kgem->exec[i].handle,
+					       (int)kgem->exec[i].offset,
+					       found ? found->size : 0,
+					       found ? found->tiling : 0,
+					       (int)(kgem->exec[i].flags & EXEC_OBJECT_NEEDS_FENCE),
+					       found ? found->deleted : 1);
+				}
+				for (i = 0; i < kgem->nreloc; i++) {
+					ErrorF("reloc[%d] = pos:%d, target:%d, delta:%d, read:%x, write:%x, offset:%x\n",
+					       i,
+					       (int)kgem->reloc[i].offset,
+					       kgem->reloc[i].target_handle,
+					       kgem->reloc[i].delta,
+					       kgem->reloc[i].read_domains,
+					       kgem->reloc[i].write_domain,
+					       (int)kgem->reloc[i].presumed_offset);
+				}
+				abort();
+			}
+#endif
+			assert(ret == 0);
+
+			if (DEBUG_FLUSH_SYNC) {
+				struct drm_i915_gem_set_domain set_domain;
+				int ret;
+
+				set_domain.handle = handle;
+				set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+				set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+
+				ret = drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain);
+				if (ret == -1) {
+					DBG(("%s: sync: GPU hang detected\n", __FUNCTION__));
+					kgem->wedged = 1;
+				}
+			}
+		}
+	}
+
+	kgem_commit(kgem);
+	if (kgem->wedged)
+		kgem_cleanup(kgem);
+
+	kgem->nfence = 0;
+	kgem->nexec = 0;
+	kgem->nreloc = 0;
+	kgem->aperture = 0;
+	kgem->nbatch = 0;
+	kgem->surface = ARRAY_SIZE(kgem->batch);
+	kgem->mode = KGEM_NONE;
+	kgem->flush = 0;
+
+	kgem->retire = 1;
+
+	sna_kgem_reset(kgem);
+}
+
+void kgem_throttle(struct kgem *kgem)
+{
+	kgem->wedged |= drmCommandNone(kgem->fd, DRM_I915_GEM_THROTTLE) == -EIO;
+}
+
+bool kgem_needs_expire(struct kgem *kgem)
+{
+	int i;
+
+	if (!list_is_empty(&kgem->active))
+		return true;
+
+	for (i = 0; i < ARRAY_SIZE(kgem->inactive); i++) {
+		if (!list_is_empty(&kgem->inactive[i]))
+			return true;
+	}
+
+	return false;
+}
+
+bool kgem_expire_cache(struct kgem *kgem)
+{
+	time_t now, expire;
+	struct kgem_bo *bo;
+	unsigned int size = 0, count = 0;
+	bool idle;
+	int i;
+
+	_kgem_retire(kgem);
+	if (kgem->wedged)
+		kgem_cleanup(kgem);
+
+	time(&now);
+	expire = 0;
+
+	idle = true;
+	for (i = 0; i < ARRAY_SIZE(kgem->inactive); i++) {
+		idle &= list_is_empty(&kgem->inactive[i]);
+		list_for_each_entry(bo, &kgem->inactive[i], list) {
+			assert(bo->deleted);
+			if (bo->delta) {
+				expire = now - 5;
+				break;
+			}
+
+			bo->delta = now;
+		}
+	}
+	if (!kgem->need_purge) {
+		if (idle)
+			return false;
+		if (expire == 0)
+			return true;
+	}
+
+	idle = true;
+	for (i = 0; i < ARRAY_SIZE(kgem->inactive); i++) {
+		while (!list_is_empty(&kgem->inactive[i])) {
+			bo = list_last_entry(&kgem->inactive[i],
+					     struct kgem_bo, list);
+
+			if (!gem_madvise(kgem->fd, bo->handle,
+					 I915_MADV_DONTNEED)) {
+				if (bo->delta > expire) {
+					idle = false;
+					break;
+				}
+			}
+
+			count++;
+			size += bo->size;
+
+			gem_close(kgem->fd, bo->handle);
+			list_del(&bo->list);
+			free(bo);
+		}
+	}
+
+	DBG(("%s: purge? %d -- expired %d objects, %d bytes\n", __FUNCTION__, kgem->need_purge, count, size));
+
+	kgem->need_purge = false;
+	return idle;
+	(void)count;
+	(void)size;
+}
+
+static struct kgem_bo *
+search_linear_cache(struct kgem *kgem, int size, bool active)
+{
+	struct kgem_bo *bo, *next;
+	struct list *cache;
+
+	if (!active) {
+		cache = inactive(kgem, size);
+		kgem_retire(kgem);
+	} else
+		cache = &kgem->active;
+
+	list_for_each_entry_safe(bo, next, cache, list) {
+		if (size > bo->size)
+			continue;
+
+		if (active && bo->tiling != I915_TILING_NONE)
+			continue;
+
+		list_del(&bo->list);
+
+		if (bo->deleted) {
+			if (!gem_madvise(kgem->fd, bo->handle,
+					 I915_MADV_WILLNEED)) {
+				kgem->need_purge = 1;
+				goto next_bo;
+			}
+
+			bo->deleted = 0;
+		}
+
+		if (I915_TILING_NONE != bo->tiling &&
+		    gem_set_tiling(kgem->fd, bo->handle,
+				   I915_TILING_NONE, 0) != I915_TILING_NONE)
+			goto next_bo;
+
+		bo->tiling = I915_TILING_NONE;
+		bo->pitch = 0;
+		bo->delta = 0;
+		bo->aperture_size = bo->size;
+		DBG((" %s: found handle=%d (size=%d) in linear %s cache\n",
+		     __FUNCTION__, bo->handle, bo->size,
+		     active ? "active" : "inactive"));
+		assert(bo->refcnt == 0);
+		assert(bo->reusable);
+		return bo;
+next_bo:
+		list_del(&bo->request);
+		gem_close(kgem->fd, bo->handle);
+		free(bo);
+	}
+
+	return NULL;
+}
+
+struct kgem_bo *kgem_create_for_name(struct kgem *kgem, uint32_t name)
+{
+	struct drm_gem_open open_arg;
+
+	DBG(("%s(name=%d)\n", __FUNCTION__, name));
+
+	memset(&open_arg, 0, sizeof(open_arg));
+	open_arg.name = name;
+	if (drmIoctl(kgem->fd, DRM_IOCTL_GEM_OPEN, &open_arg))
+		return NULL;
+
+	DBG(("%s: new handle=%d\n", __FUNCTION__, open_arg.handle));
+	return __kgem_bo_alloc(open_arg.handle, 0);
+}
+
+struct kgem_bo *kgem_create_linear(struct kgem *kgem, int size)
+{
+	struct kgem_bo *bo;
+	uint32_t handle;
+
+	DBG(("%s(%d)\n", __FUNCTION__, size));
+
+	size = ALIGN(size, PAGE_SIZE);
+	bo = search_linear_cache(kgem, size, false);
+	if (bo)
+		return kgem_bo_reference(bo);
+
+	handle = gem_create(kgem->fd, size);
+	if (handle == 0)
+		return NULL;
+
+	DBG(("%s: new handle=%d\n", __FUNCTION__, handle));
+	return __kgem_bo_alloc(handle, size);
+}
+
+int kgem_choose_tiling(struct kgem *kgem, int tiling, int width, int height, int bpp)
+{
+	if (kgem->gen < 40) {
+		if (tiling) {
+			if (width * bpp > 8192 * 8) {
+				DBG(("%s: pitch too large for tliing [%d]\n",
+				     __FUNCTION__, width*bpp/8));
+				return I915_TILING_NONE;
+			}
+
+			if (width > 2048 || height > 2048) {
+				DBG(("%s: large buffer (%dx%d), forcing TILING_X\n",
+				     __FUNCTION__, width, height));
+				return -I915_TILING_X;
+			}
+		}
+	} else {
+		if (width*bpp > (MAXSHORT-512) * 8) {
+			DBG(("%s: large pitch [%d], forcing TILING_X\n",
+			     __FUNCTION__, width*bpp/8));
+			return -I915_TILING_X;
+		}
+
+		if (tiling && (width > 8192 || height > 8192)) {
+			DBG(("%s: large tiled buffer [%dx%d], forcing TILING_X\n",
+			     __FUNCTION__, width, height));
+			return -I915_TILING_X;
+		}
+	}
+
+	if (tiling == I915_TILING_Y && height < 16) {
+		DBG(("%s: too short [%d] for TILING_Y\n",
+		     __FUNCTION__, height));
+		tiling = I915_TILING_X;
+	}
+	if (tiling == I915_TILING_X && height < 4) {
+		DBG(("%s: too short [%d] for TILING_X\n",
+		     __FUNCTION__, height));
+		tiling = I915_TILING_NONE;
+	}
+
+	if (tiling == I915_TILING_X && width * bpp < 512/2) {
+		DBG(("%s: too thin [%d] for TILING_X\n",
+		     __FUNCTION__, width));
+		tiling = I915_TILING_NONE;
+	}
+	if (tiling == I915_TILING_Y && width * bpp < 32/2) {
+		DBG(("%s: too thin [%d] for TILING_Y\n",
+		     __FUNCTION__, width));
+		tiling = I915_TILING_NONE;
+	}
+
+	DBG(("%s: %dx%d -> %d\n", __FUNCTION__, width, height, tiling));
+	return tiling;
+}
+
+static bool _kgem_can_create_2d(struct kgem *kgem,
+				int width, int height, int bpp, int tiling)
+{
+	uint32_t pitch, size;
+
+	if (bpp < 8)
+		return false;
+
+	size = kgem_surface_size(kgem, width, height, bpp, tiling, &pitch);
+	if (size == 0 || size > kgem->aperture_low)
+		size = kgem_surface_size(kgem, width, height, bpp, I915_TILING_NONE, &pitch);
+	return size > 0 && size <= kgem->aperture_low;
+}
+
+#if DEBUG_KGEM
+bool kgem_can_create_2d(struct kgem *kgem,
+			int width, int height, int bpp, int tiling)
+{
+	bool ret = _kgem_can_create_2d(kgem, width, height, bpp, tiling);
+	DBG(("%s(%dx%d, bpp=%d, tiling=%d) = %d\n", __FUNCTION__,
+	     width, height, bpp, tiling, ret));
+	return ret;
+}
+#else
+bool kgem_can_create_2d(struct kgem *kgem,
+			int width, int height, int bpp, int tiling)
+{
+	return _kgem_can_create_2d(kgem, width, height, bpp, tiling);
+}
+#endif
+
+static int kgem_bo_aperture_size(struct kgem *kgem, struct kgem_bo *bo)
+{
+	int size;
+
+	if (kgem->gen >= 40 || bo->tiling == I915_TILING_NONE) {
+		size = bo->size;
+	} else {
+		if (kgem->gen < 30)
+			size = 512 * 1024;
+		else
+			size = 1024 * 1024;
+		while (size < bo->size)
+			size *= 2;
+	}
+	return size;
+}
+
+struct kgem_bo *kgem_create_2d(struct kgem *kgem,
+			       int width,
+			       int height,
+			       int bpp,
+			       int tiling,
+			       uint32_t flags)
+{
+	struct list *cache;
+	struct kgem_bo *bo, *next;
+	uint32_t pitch, tiled_height[3], size;
+	uint32_t handle;
+	int exact = flags & CREATE_EXACT;
+	int search;
+	int i;
+
+	if (tiling < 0)
+		tiling = -tiling, exact = 1;
+
+	DBG(("%s(%dx%d, bpp=%d, tiling=%d, exact=%d, inactive=%d)\n", __FUNCTION__,
+	     width, height, bpp, tiling, !!exact, !!(flags & CREATE_INACTIVE)));
+
+	assert(_kgem_can_create_2d(kgem, width, height, bpp, tiling));
+	size = kgem_surface_size(kgem, width, height, bpp, tiling, &pitch);
+	assert(size && size <= kgem->aperture_low);
+	if (flags & CREATE_INACTIVE)
+		goto skip_active_search;
+
+	for (i = 0; i <= I915_TILING_Y; i++)
+		tiled_height[i] = kgem_aligned_height(height, i);
+
+	search = 0;
+	/* Best active match first */
+	list_for_each_entry_safe(bo, next, &kgem->active, list) {
+		uint32_t s;
+
+		search++;
+
+		if (exact) {
+			if (bo->tiling != tiling)
+				continue;
+		} else {
+			if (bo->tiling > tiling)
+				continue;
+		}
+
+		if (bo->tiling) {
+			if (bo->pitch < pitch) {
+				DBG(("tiled and pitch too small: tiling=%d, (want %d), pitch=%d, need %d\n",
+				     bo->tiling, tiling,
+				     bo->pitch, pitch));
+				continue;
+			}
+		} else
+			bo->pitch = pitch;
+
+		s = bo->pitch * tiled_height[bo->tiling];
+		if (s > bo->size) {
+			DBG(("size too small: %d < %d\n",
+			     bo->size, s));
+			continue;
+		}
+
+		list_del(&bo->list);
+
+		if (bo->deleted) {
+			if (!gem_madvise(kgem->fd, bo->handle,
+					 I915_MADV_WILLNEED)) {
+				kgem->need_purge = 1;
+				gem_close(kgem->fd, bo->handle);
+				list_del(&bo->request);
+				free(bo);
+				continue;
+			}
+
+			bo->deleted = 0;
+		}
+
+		bo->unique_id = kgem_get_unique_id(kgem);
+		bo->delta = 0;
+		bo->aperture_size = kgem_bo_aperture_size(kgem, bo);
+		DBG((" from active: pitch=%d, tiling=%d, handle=%d, id=%d\n",
+		     bo->pitch, bo->tiling, bo->handle, bo->unique_id));
+		assert(bo->refcnt == 0);
+		assert(bo->reusable);
+		return kgem_bo_reference(bo);
+	}
+
+	DBG(("searched %d active, no match\n", search));
+
+skip_active_search:
+	/* Now just look for a close match and prefer any currently active */
+	cache = inactive(kgem, size);
+	list_for_each_entry_safe(bo, next, cache, list) {
+		if (size > bo->size) {
+			DBG(("inactive too small: %d < %d\n",
+			     bo->size, size));
+			continue;
+		}
+
+		if (bo->tiling != tiling ||
+		    (tiling != I915_TILING_NONE && bo->pitch != pitch)) {
+			if (tiling != gem_set_tiling(kgem->fd,
+						     bo->handle,
+						     tiling, pitch))
+				goto next_bo;
+		}
+
+		bo->pitch = pitch;
+		bo->tiling = tiling;
+
+		list_del(&bo->list);
+
+		if (bo->deleted) {
+			if (!gem_madvise(kgem->fd, bo->handle,
+					 I915_MADV_WILLNEED)) {
+				kgem->need_purge = 1;
+				goto next_bo;
+			}
+
+			bo->deleted = 0;
+		}
+
+		bo->delta = 0;
+		bo->unique_id = kgem_get_unique_id(kgem);
+		bo->aperture_size = kgem_bo_aperture_size(kgem, bo);
+		assert(bo->pitch);
+		DBG((" from inactive: pitch=%d, tiling=%d: handle=%d, id=%d\n",
+		     bo->pitch, bo->tiling, bo->handle, bo->unique_id));
+		assert(bo->refcnt == 0);
+		assert(bo->reusable);
+		return kgem_bo_reference(bo);
+
+next_bo:
+		gem_close(kgem->fd, bo->handle);
+		list_del(&bo->request);
+		free(bo);
+		continue;
+	}
+
+	handle = gem_create(kgem->fd, size);
+	if (handle == 0)
+		return NULL;
+
+	bo = __kgem_bo_alloc(handle, size);
+	if (!bo) {
+		gem_close(kgem->fd, handle);
+		return NULL;
+	}
+
+	bo->unique_id = kgem_get_unique_id(kgem);
+	bo->pitch = pitch;
+	if (tiling != I915_TILING_NONE)
+		bo->tiling = gem_set_tiling(kgem->fd, handle, tiling, pitch);
+	bo->aperture_size = kgem_bo_aperture_size(kgem, bo);
+
+	DBG((" new pitch=%d, tiling=%d, handle=%d, id=%d\n",
+	     bo->pitch, bo->tiling, bo->handle, bo->unique_id));
+	return bo;
+}
+
+void _kgem_bo_destroy(struct kgem *kgem, struct kgem_bo *bo)
+{
+	if (bo->proxy) {
+		kgem_bo_unref(kgem, bo->proxy);
+		list_del(&bo->request);
+		free(bo);
+		return;
+	}
+
+	__kgem_bo_destroy(kgem, bo);
+}
+
+void __kgem_flush(struct kgem *kgem, struct kgem_bo *bo)
+{
+	/* The kernel will emit a flush *and* update its own flushing lists. */
+	kgem_busy(kgem, bo->handle);
+}
+
+bool kgem_check_bo(struct kgem *kgem, struct kgem_bo *bo)
+{
+	if (bo == NULL)
+		return true;
+
+	if (bo->exec)
+		return true;
+
+	if (kgem->aperture > kgem->aperture_low)
+		return false;
+
+	if (bo->size + kgem->aperture > kgem->aperture_high)
+		return false;
+
+	if (kgem->nexec == KGEM_EXEC_SIZE(kgem))
+		return false;
+
+	return true;
+}
+
+bool kgem_check_bo_fenced(struct kgem *kgem, ...)
+{
+	va_list ap;
+	struct kgem_bo *bo;
+	int num_fence = 0;
+	int num_exec = 0;
+	int size = 0;
+
+	if (kgem->aperture > kgem->aperture_low)
+		return false;
+
+	va_start(ap, kgem);
+	while ((bo = va_arg(ap, struct kgem_bo *))) {
+		if (bo->exec) {
+			if (kgem->gen >= 40 || bo->tiling == I915_TILING_NONE)
+				continue;
+
+			if ((bo->exec->flags & EXEC_OBJECT_NEEDS_FENCE) == 0)
+				num_fence++;
+
+			continue;
+		}
+
+		size += bo->size;
+		num_exec++;
+		if (kgem->gen < 40 && bo->tiling)
+			num_fence++;
+	}
+	va_end(ap);
+
+	if (size + kgem->aperture > kgem->aperture_high)
+		return false;
+
+	if (kgem->nexec + num_exec >= KGEM_EXEC_SIZE(kgem))
+		return false;
+
+	if (kgem->nfence + num_fence >= kgem->fence_max)
+		return false;
+
+	return true;
+}
+
+uint32_t kgem_add_reloc(struct kgem *kgem,
+			uint32_t pos,
+			struct kgem_bo *bo,
+			uint32_t read_write_domain,
+			uint32_t delta)
+{
+	int index;
+
+	index = kgem->nreloc++;
+	assert(index < ARRAY_SIZE(kgem->reloc));
+	kgem->reloc[index].offset = pos * sizeof(kgem->batch[0]);
+	if (bo) {
+		assert(!bo->deleted);
+
+		delta += bo->delta;
+		if (bo->proxy) {
+			/* need to release the cache upon batch submit */
+			list_move(&bo->request, &kgem->next_request->buffers);
+			bo->exec = &_kgem_dummy_exec;
+			bo = bo->proxy;
+		}
+
+		assert(!bo->deleted);
+
+		if (bo->exec == NULL) {
+			_kgem_add_bo(kgem, bo);
+			if (bo->needs_flush &&
+			    (read_write_domain >> 16) != I915_GEM_DOMAIN_RENDER)
+				bo->needs_flush = false;
+		}
+
+		if (read_write_domain & KGEM_RELOC_FENCED && kgem->gen < 40) {
+			if (bo->tiling &&
+			    (bo->exec->flags & EXEC_OBJECT_NEEDS_FENCE) == 0) {
+				assert(kgem->nfence < kgem->fence_max);
+				kgem->nfence++;
+			}
+			bo->exec->flags |= EXEC_OBJECT_NEEDS_FENCE;
+		}
+
+		kgem->reloc[index].delta = delta;
+		kgem->reloc[index].target_handle = bo->handle;
+		kgem->reloc[index].presumed_offset = bo->presumed_offset;
+
+		if (read_write_domain & 0x7fff)
+			bo->needs_flush = bo->dirty = true;
+
+		delta += bo->presumed_offset;
+	} else {
+		kgem->reloc[index].delta = delta;
+		kgem->reloc[index].target_handle = 0;
+		kgem->reloc[index].presumed_offset = 0;
+	}
+	kgem->reloc[index].read_domains = read_write_domain >> 16;
+	kgem->reloc[index].write_domain = read_write_domain & 0x7fff;
+
+	return delta;
+}
+
+void *kgem_bo_map(struct kgem *kgem, struct kgem_bo *bo, int prot)
+{
+	return gem_mmap(kgem->fd, bo->handle, bo->size, prot);
+}
+
+uint32_t kgem_bo_flink(struct kgem *kgem, struct kgem_bo *bo)
+{
+	struct drm_gem_flink flink;
+	int ret;
+
+	memset(&flink, 0, sizeof(flink));
+	flink.handle = bo->handle;
+	ret = drmIoctl(kgem->fd, DRM_IOCTL_GEM_FLINK, &flink);
+	if (ret)
+		return 0;
+
+	bo->reusable = false;
+	return flink.name;
+}
+
+#if defined(USE_VMAP) && defined(I915_PARAM_HAS_VMAP)
+static uint32_t gem_vmap(int fd, void *ptr, int size, int read_only)
+{
+	struct drm_i915_gem_vmap vmap;
+
+	vmap.user_ptr = (uintptr_t)ptr;
+	vmap.user_size = size;
+	vmap.flags = 0;
+	if (read_only)
+		vmap.flags |= I915_VMAP_READ_ONLY;
+
+	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VMAP, &vmap))
+		return 0;
+
+	return vmap.handle;
+}
+
+struct kgem_bo *kgem_create_map(struct kgem *kgem,
+				void *ptr, uint32_t size,
+				bool read_only)
+{
+	struct kgem_bo *bo;
+	uint32_t handle;
+
+	if (!kgem->has_vmap)
+		return NULL;
+
+	handle = gem_vmap(kgem->fd, ptr, size, read_only);
+	if (handle == 0)
+		return NULL;
+
+	bo = __kgem_bo_alloc(handle, size);
+	if (bo == NULL) {
+		gem_close(kgem->fd, handle);
+		return NULL;
+	}
+
+	bo->reusable = false;
+	bo->sync = true;
+	DBG(("%s(ptr=%p, size=%d, read_only=%d) => handle=%d\n",
+	     __FUNCTION__, ptr, size, read_only, handle));
+	return bo;
+}
+#else
+static uint32_t gem_vmap(int fd, void *ptr, int size, int read_only)
+{
+	return 0;
+}
+
+struct kgem_bo *kgem_create_map(struct kgem *kgem,
+				void *ptr, uint32_t size,
+				bool read_only)
+{
+	return NULL;
+}
+#endif
+
+void kgem_bo_sync(struct kgem *kgem, struct kgem_bo *bo, bool for_write)
+{
+	struct drm_i915_gem_set_domain set_domain;
+
+	kgem_bo_submit(kgem, bo);
+	if (for_write ? bo->cpu_write : bo->cpu_read)
+		return;
+
+	set_domain.handle = bo->handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+	set_domain.write_domain = for_write ? I915_GEM_DOMAIN_CPU : 0;
+
+	drmIoctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain);
+	_kgem_retire(kgem);
+	bo->cpu_read = true;
+	if (for_write)
+		bo->cpu_write = true;
+}
+
+void kgem_clear_dirty(struct kgem *kgem)
+{
+	struct kgem_request *rq = kgem->next_request;
+	struct kgem_bo *bo;
+
+	list_for_each_entry(bo, &rq->buffers, request)
+		bo->dirty = false;
+}
+
+/* Flush the contents of the RenderCache and invalidate the TextureCache */
+void kgem_emit_flush(struct kgem *kgem)
+{
+	if (kgem->nbatch == 0)
+		return;
+
+	if (!kgem_check_batch(kgem, 4)) {
+		_kgem_submit(kgem);
+		return;
+	}
+
+	DBG(("%s()\n", __FUNCTION__));
+
+	if (kgem->ring == KGEM_BLT) {
+		kgem->batch[kgem->nbatch++] = MI_FLUSH_DW | 2;
+		kgem->batch[kgem->nbatch++] = 0;
+		kgem->batch[kgem->nbatch++] = 0;
+		kgem->batch[kgem->nbatch++] = 0;
+	} else if (kgem->gen >= 50 && 0) {
+		kgem->batch[kgem->nbatch++] = PIPE_CONTROL | 2;
+		kgem->batch[kgem->nbatch++] =
+			PIPE_CONTROL_WC_FLUSH |
+			PIPE_CONTROL_TC_FLUSH |
+			PIPE_CONTROL_NOWRITE;
+		kgem->batch[kgem->nbatch++] = 0;
+		kgem->batch[kgem->nbatch++] = 0;
+	} else {
+		if ((kgem->batch[kgem->nbatch-1] & (0xff<<23)) == MI_FLUSH)
+			kgem->nbatch--;
+		kgem->batch[kgem->nbatch++] = MI_FLUSH | MI_INVALIDATE_MAP_CACHE;
+	}
+
+	kgem_clear_dirty(kgem);
+}
+
+struct kgem_bo *kgem_create_proxy(struct kgem_bo *target,
+				  int offset, int length)
+{
+	struct kgem_bo *bo;
+
+	assert(target->proxy == NULL);
+
+	bo = __kgem_bo_alloc(target->handle, length);
+	if (bo == NULL)
+		return NULL;
+
+	bo->reusable = false;
+	bo->proxy = kgem_bo_reference(target);
+	bo->delta = offset;
+	return bo;
+}
+
+struct kgem_bo *kgem_create_buffer(struct kgem *kgem,
+				   uint32_t size, uint32_t flags,
+				   void **ret)
+{
+	struct kgem_partial_bo *bo;
+	bool write = !!(flags & KGEM_BUFFER_WRITE);
+	int offset = 0;
+
+	DBG(("%s: size=%d, flags=%x\n", __FUNCTION__, size, flags));
+
+	list_for_each_entry(bo, &kgem->partial, base.list) {
+		if (bo->write != write)
+			continue;
+		if (bo->used + size < bo->alloc) {
+			DBG(("%s: reusing partial buffer? used=%d, total=%d\n",
+			     __FUNCTION__, bo->used, bo->alloc));
+			offset = bo->used;
+			bo->used += size;
+			break;
+		}
+	}
+
+	if (offset == 0) {
+		uint32_t handle;
+		int alloc;
+
+		alloc = (flags & KGEM_BUFFER_LAST) ? 4096 : 32 * 1024;
+		alloc = ALIGN(size, alloc);
+
+		bo = malloc(sizeof(*bo) + alloc);
+		if (bo == NULL)
+			return NULL;
+
+		handle = 0;
+		if (kgem->has_vmap)
+			handle = gem_vmap(kgem->fd, bo+1, alloc, write);
+		if (handle == 0) {
+			struct kgem_bo *old;
+
+			old = NULL;
+			if (!write)
+				old = search_linear_cache(kgem, alloc, true);
+			if (old == NULL)
+				old = search_linear_cache(kgem, alloc, false);
+			if (old) {
+				memcpy(&bo->base, old, sizeof(*old));
+				if (old->rq)
+					list_replace(&old->request,
+						     &bo->base.request);
+				else
+					list_init(&bo->base.request);
+				free(old);
+				bo->base.refcnt = 1;
+			} else {
+				if (!__kgem_bo_init(&bo->base,
+						    gem_create(kgem->fd, alloc),
+						    alloc)) {
+					free(bo);
+					return NULL;
+				}
+			}
+			bo->need_io = true;
+		} else {
+			__kgem_bo_init(&bo->base, handle, alloc);
+			bo->base.reusable = false;
+			bo->base.sync = true;
+			bo->need_io = 0;
+		}
+
+		bo->alloc = alloc;
+		bo->used = size;
+		bo->write = write;
+
+		list_add(&bo->base.list, &kgem->partial);
+		DBG(("%s(size=%d) new handle=%d\n",
+		     __FUNCTION__, alloc, bo->base.handle));
+	}
+
+	*ret = (char *)(bo+1) + offset;
+	return kgem_create_proxy(&bo->base, offset, size);
+}
+
+struct kgem_bo *kgem_upload_source_image(struct kgem *kgem,
+					 const void *data,
+					 int x, int y,
+					 int width, int height,
+					 int stride, int bpp)
+{
+	int dst_stride = ALIGN(width * bpp, 32) >> 3;
+	int size = dst_stride * height;
+	struct kgem_bo *bo;
+	void *dst;
+
+	DBG(("%s : (%d, %d), (%d, %d), stride=%d, bpp=%d\n",
+	     __FUNCTION__, x, y, width, height, stride, bpp));
+
+	bo = kgem_create_buffer(kgem, size, KGEM_BUFFER_WRITE, &dst);
+	if (bo == NULL)
+		return NULL;
+
+	memcpy_blt(data, dst, bpp,
+		   stride, dst_stride,
+		   x, y,
+		   0, 0,
+		   width, height);
+
+	bo->pitch = dst_stride;
+	return bo;
+}
+
+void kgem_buffer_sync(struct kgem *kgem, struct kgem_bo *_bo)
+{
+	struct kgem_partial_bo *bo;
+
+	if (_bo->proxy)
+		_bo = _bo->proxy;
+
+	bo = (struct kgem_partial_bo *)_bo;
+
+	DBG(("%s(need_io=%s, sync=%d)\n", __FUNCTION__,
+	     bo->need_io ? bo->write ? "write" : "read" : "none",
+	     bo->base.sync));
+
+	if (bo->need_io) {
+		if (bo->write)
+			gem_write(kgem->fd, bo->base.handle,
+				  0, bo->used, bo+1);
+		else
+			gem_read(kgem->fd, bo->base.handle, bo+1, bo->used);
+		_kgem_retire(kgem);
+		bo->need_io = 0;
+	}
+
+	if (bo->base.sync)
+		kgem_bo_sync(kgem, &bo->base, bo->write);
+}