Age | Commit message | Author |
|
This cleans up findstatic.pl output for the i830+ code, which resulted in
removing some code. The only odd part of this commit is the
if (0) i830_sdvo_dump() in i830_sdvo.c -- it tells the compiler that the code
is used, without using it since we want the code around while debugging.
It's also in a likely place to ask for the dump, so I think it's OK.
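
The `if (0)` idiom described above can be sketched as follows. This is a minimal illustration, not the driver's code: `debug_dump_state()` and `program_output()` are hypothetical stand-ins for i830_sdvo_dump() and its caller. The call under `if (0)` is never executed, but it counts as a reference, so tools that report unused static functions stay quiet.

```c
#include <stdio.h>

/* Counts real invocations of the dump routine, for illustration only. */
static int dump_calls = 0;

/* Hypothetical stand-in for a debug-only dump routine. */
static void debug_dump_state(int reg)
{
    dump_calls++;
    printf("reg = 0x%08x\n", (unsigned int)reg);
}

/* Hypothetical caller sitting at a spot where a dump would be useful. */
static int program_output(int reg)
{
    if (0)
        debug_dump_state(reg);  /* never runs, but keeps the function referenced */

    return reg | 1;
}
```

Changing the `0` to `1` while debugging turns the dump on without touching anything else.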
|
|
It's been broken for years now, and KMS offers a much better chance of getting
this working sensibly without making a mess of the 2D driver.
|
|
This fixes a huge memory leak at each VT switch (about 600 BOs + 6MB
of X server RSS).
|
|
They're tiny so it shouldn't have been a problem, but play it safe. This is
another <5% loss on top of the previously reported value, bringing the whole
series to about 8%.
|
|
This eliminates the pinned memory allocation for 965 render state.
|
|
|
|
This is a first step in a series of changes to avoid requiring a pinned object,
which gets in the way of doing non-root KMS. This change appears to result in
about a 2-6% loss in x11perf -aa10text, which better algorithms in libdrm could
make up for (it hasn't really had to deal with code this bad before).
|
|
|
|
This improves performance by avoiding repeated map/unmap cycles, which are
a bit expensive on my machine with lock debugging on in the kernel. It could
do much better if we did more than 18 or so floats at a time.
|
|
The require_space had failed since it only checked for the space required
by the batch emits in the function itself, but not in the
i965_emit_composite_state() that it called (the state we were concerned about
having set up for that 12 * 4 dwords to follow!). This is replaced by
intel_batch_start_atomic(), which will catch such mistakes in the future.
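
The difference between a per-function space check and an atomic section can be sketched roughly as below. Everything here is hypothetical (`struct batch`, `batch_start_atomic()`, `batch_emit()`, the 64-dword size); it only illustrates the idea that the reservation covers every emit until the section ends, including emits made by callees, and that exceeding it is detected rather than silently splitting the sequence. The sketch records the error in a flag where a real driver would more likely assert.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_DWORDS 64          /* hypothetical batch size */

struct batch {
    unsigned int data[BATCH_DWORDS];
    size_t used;                 /* dwords emitted so far */
    size_t atomic_limit;         /* 'used' may not pass this while atomic */
    bool in_atomic;
    bool overflowed;             /* set when an atomic reservation was too small */
    unsigned int flushes;
};

static void batch_flush(struct batch *b)
{
    b->used = 0;
    b->flushes++;
}

/* Reserve 'dwords' of contiguous space; flush first if they would not fit. */
static void batch_start_atomic(struct batch *b, size_t dwords)
{
    if (b->used + dwords > BATCH_DWORDS)
        batch_flush(b);
    b->in_atomic = true;
    b->atomic_limit = b->used + dwords;
}

static void batch_emit(struct batch *b, unsigned int dword)
{
    if (b->in_atomic) {
        /* No flush is allowed here: running out means the caller
         * under-counted, possibly by forgetting a callee's emits. */
        if (b->used >= b->atomic_limit || b->used >= BATCH_DWORDS) {
            b->overflowed = true;
            return;
        }
    } else if (b->used == BATCH_DWORDS) {
        batch_flush(b);
    }
    b->data[b->used++] = dword;
}

static void batch_end_atomic(struct batch *b)
{
    b->in_atomic = false;
}
```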
|
|
|
|
Previously it wouldn't count the pixmaps that were about to be used, which
is pretty much the only purpose of having the pain around. This also
eliminates the check_twice confusion with emit_batch_header_for_composite().
|
|
|
|
|
|
We want the objects to be created once per prepare/done both for efficiency and
so we can handle aperture checking better.
|
|
These two paths allocate a number of objects directly.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
With batch flush notify, the vertex buffer will be unreferenced,
so don't count it in the later aperture check. Also add an
uninitialized vertex buffer check in batch flush notify.
|
|
Which is just being tidy. We initially were looking at this code
path due to a report of a crash on server shutdown which started
after this unreference call was added. Setting this to NULL
apparently didn't avoid the crash, but it's a good thing to do
regardless.
|
|
This avoids mapping a buffer object which is being referenced
by a batch that has already been flushed, (which is a terribly
expensive operation).
On my machine this brings the performance of x11perf -aa10text
from 85k back to 150k, (where it was before a recent kernel
upgrade). Also, before this patch, when I used my X server
actively performance would drop as low as 15k---hopefully that
bug is gone with this change.
|
|
The call into intel_batch_flush() will invalidate the pI830->batch_bo
stored in bo_table[0]. Fix it by re-reading the refreshed value.
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
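
A rough sketch of the stale-pointer pattern this fix addresses, under simplifying assumptions: `struct driver`, `flush_batch()`, `fits()` and the size-based aperture limit are hypothetical stand-ins, not the driver's or libdrm's API. The point is only that a pointer copied into the table before a flush must be copied again afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

#define APERTURE_LIMIT 1024      /* hypothetical limit, in arbitrary units */

struct bo { size_t size; };

struct driver {
    struct bo *batch_bo;         /* current batch buffer; replaced on flush */
    struct bo fresh;             /* hypothetical empty batch handed out by a flush */
};

/* Hypothetical aperture check: do all listed objects fit together? */
static bool fits(struct bo *const table[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += table[i]->size;
    return total <= APERTURE_LIMIT;
}

/* Flushing replaces the batch buffer, so old copies of batch_bo go stale. */
static void flush_batch(struct driver *d)
{
    d->fresh.size = 0;
    d->batch_bo = &d->fresh;
}

static bool check_with_flush(struct driver *d, struct bo *table[], size_t n)
{
    table[0] = d->batch_bo;
    if (fits(table, n))
        return true;

    flush_batch(d);
    table[0] = d->batch_bo;      /* re-read: the earlier copy is now invalid */
    return fits(table, n);
}
```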
|
|
These are identical, so using a single define is much clearer.
It also applies the fixes to GM45, which the original define
missed.
|
|
This required reordering the relocation emits for surface/binding table
so that we didn't add new relocations to things that had already been
relocated at (the check_aperture requirement).
|
|
|
|
Instead of having a static array for these and doing an ugly sync
every time we recycle the array, we now simply allocate short-lived
buffer objects for this dynamic state. The dri layer, in turn, can
take care of efficiently reusing objects as necessary.
On a GM965 this change was tested to improve the performance of
x11perf -aa10text from roughly 120000 to 154000 glyphs/sec.
|
|
This avoids leaking one buffer object.
|
|
We don't actually plan to put any other data in this structure, so it
doesn't make sense to have a generic name, (since we'll only be using
it for our vertex buffer).
|
|
This function is the new name for _allocate_dynamic_state now that
it also emits everything to the batch necessary for setting up a
composite operation. This happens in prepare_composite() every
time and in composite() whenever our vertex buffer fills up.
It's not yet strictly necessary to be redoing this setup in
composite() but it will be soon when the setup starts referring
to buffer objects for surface state and binding table. This
move prepares for that.
|
|
This begins the process of separating the dynamic data from the
static data, (still to move are the surface state and binding
table objects). The new dynamic_state is stored in a buffer
object, so this patch restores the buffer-object-for-vertex-buffer
functionality originally in commit 1abf4d3a7a and later reverted
in 5c9a62a29f.
A notable difference is that this time we actually do use
check_aperture_space to ensure things will fit, (assuming
there's a non-empty implementation under that).
|
|
This doesn't make any difference, but it's cleaner to have
each function follow the same idiom for obtaining these pointers.
|
|
More cleanup here, and again no functional change.
|
|
This follows naturally from the structure rename.
Also we make things less muddled by having this function
actually accept a pointer to a gen4_static_state_t rather
than a gen4_state_t, (and then fetching the desired pointer
out from that).
Again, no intended change in functionality here.
|
|
It doesn't contain only static data yet, but it will soon, so
this renaming prepares for that. Also, this helps make things
more clear between gen4_render_state_t and gen4_state_t which
were muddled before, (particularly because the corresponding
identifiers were render_state and card_state). The card_state
identifier is now known as static_state which should be less
confusing.
This change is strictly search-and-replace with no functional
changes.
|
|
It's very convenient that the hardware supports this non-default
mode since it's exactly what is specified by the Render extension.
This provides a more efficient means of fixing bug #16820:
[EXA] Composition result in black for areas outside of source-surface bo
https://bugs.freedesktop.org/show_bug.cgi?id=16820
without the software fallback we had in the earlier fix,
(commit 76c9ece36e6400fd10f364ee330faea470e2da64 ).
|
|
This is consistent with the documentation, (and just plain makes
more sense).
|
|
This reverts commit 76c9ece36e6400fd10f364ee330faea470e2da64.
We've learned a new technique that should let us avoid this fallback
to software. See following commit.
|
|
We wish it wouldn't, but the hardware ignores the alpha in the
BorderColor we set when the source picture format has no alpha
in it, (and it uses alpha of 1.0 where we want 0.0). For now,
fall back for these cases. This gives a correct result, but
obviously is not as fast as we would like.
This fixes bug #16820:
[EXA] Composition result in black for areas outside of source-surface bounds
https://bugs.freedesktop.org/show_bug.cgi?id=16820
|
|
Eric informed me that the repeat field exists only for backwards
compatibility with old drivers that weren't prepared for values
other than 0 or 1 here. Since we are, we can just ignore that
field and examine only repeatType. So the code's a (tiny) bit
simpler this way.
|
|
It's quite simple to support these modes---we simply need to
turn on the support for them in the hardware.
These changes have been verified with the extend-pad and
extend-reflect tests in cairo's test suite. However, this
currently requires using a custom-modified version of cairo.
The issue is that released versions of cairo, (and even
cairo master so far), don't pass RepeatPad and RepeatReflect
to Render, (due to various bugs and workarounds in cairo
and pixman). I do plan to fix those issues in cairo, so that
in a future release of cairo, (1.8.2 perhaps?), the cairo
test suite will usefully test these new repeat modes in our
driver.
|
|
The existing switch statement was switching on the Boolean
repeat field rather than the correct repeatType field. This
had not caused any problem before as only two possible repeat
values were supported (RepeatNone = 0 and RepeatNormal = 1)
so they were always the same as the repeat field.
Soon, however, we'll be supporting more repeat types, so we'll
need to switch on the correct value.
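
To illustrate why the Boolean field stops being enough, here is a small sketch. The Repeat* values are those defined by the Render protocol; `struct picture` is a simplified stand-in for the server's picture structure, and the `WRAP_*` names and `wrap_mode_for()` are hypothetical, not the hardware's or the driver's identifiers.

```c
/* Repeat values as defined by the Render protocol. */
enum { RepeatNone = 0, RepeatNormal = 1, RepeatPad = 2, RepeatReflect = 3 };

/* Hypothetical sampler wrap modes. */
enum wrap_mode { WRAP_CLAMP_BORDER, WRAP_REPEAT, WRAP_CLAMP, WRAP_MIRROR };

/* Simplified stand-in for the picture structure. */
struct picture {
    unsigned int repeat : 1;      /* legacy Boolean: any repeat at all? */
    unsigned int repeatType : 2;  /* which of the four repeat modes */
};

/* Switch on repeatType; the 1-bit 'repeat' field cannot tell
 * RepeatNormal, RepeatPad and RepeatReflect apart. */
static int wrap_mode_for(const struct picture *pict)
{
    switch (pict->repeatType) {
    case RepeatNone:    return WRAP_CLAMP_BORDER;
    case RepeatNormal:  return WRAP_REPEAT;
    case RepeatPad:     return WRAP_CLAMP;
    case RepeatReflect: return WRAP_MIRROR;
    default:            return -1;  /* unsupported: caller should fall back */
    }
}
```

With a picture set to RepeatPad, the `repeat` field is 1, which a switch on `repeat` would read as RepeatNormal.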
|
|
We'll probably end up doing this differently, but avoid this path for now.
|
|
Otherwise just use the GTT address.
|
|
|
|
This reverts commit 1abf4d3a7a203ff5d6e5ceda29573e7fd69ddf8e.
Conflicts:
src/i965_render.c - flushing was removed, keep it that way
|
|
ssh://git.freedesktop.org/git/xorg/driver/xf86-video-intel into drm-gem
|
|
This improves 'x11perf -aa10text' performance from ~144k to ~169k
|
|
|
|
|
|
This allows us to only call i830WaitSync once every 128 calls to composite
rather than on every call. However, we do need to also call MI_FLUSH to
avoid the vertex cache getting in our way, (since our "separate" buffers
are all allocated as one contiguous chunk).
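
The scheme above might look roughly like the following sketch. The names (`struct vb_ring`, `vb_ring_acquire()`) and the counters standing in for i830WaitSync and MI_FLUSH are hypothetical; the real driver's bookkeeping may differ. The idea is that an expensive sync is needed only when the ring of vertex buffer slots wraps around, while a cheaper flush is still issued for every composite.

```c
#define VB_COUNT 128             /* number of vertex buffer slots in the ring */

struct vb_ring {
    unsigned int next;           /* next free slot, 0..VB_COUNT */
    unsigned int syncs;          /* stand-in for calls to i830WaitSync */
    unsigned int flushes;        /* stand-in for MI_FLUSH emits */
};

/* Hand out the next slot, syncing only when every slot has been used. */
static unsigned int vb_ring_acquire(struct vb_ring *r)
{
    if (r->next == VB_COUNT) {
        r->syncs++;              /* wait for the hardware before reusing slot 0 */
        r->next = 0;
    }
    r->flushes++;                /* per-composite flush, keeps the vertex cache honest */
    return r->next++;
}
```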
|
|
Using more than one (in the future) will allow for doing less frequent calls
to i830WaitSync.
|
|
This is in preparation for having larger (or multiple) vertex buffers
in the future.
|