Age | Commit message | Author |
|
Presumably this only matters for i686 because amd64 implies sse2, but:
BUILDSTDERR: In file included from gen4_vertex.c:34:
BUILDSTDERR: gen4_vertex.c: In function 'emit_vertex':
BUILDSTDERR: sna_render_inline.h:40:26: error: inlining failed in call to always_inline 'vertex_emit_2s': target specific option mismatch
BUILDSTDERR: static force_inline void vertex_emit_2s(struct sna *sna, int16_t x, int16_t y)
BUILDSTDERR: ^~~~~~~~~~~~~~
BUILDSTDERR: gen4_vertex.c:308:25: note: called from here
BUILDSTDERR: #define OUT_VERTEX(x,y) vertex_emit_2s(sna, x,y) /* XXX assert(!too_large(x, y)); */
BUILDSTDERR: ^~~~~~~~~~~~~~~~~~~~~~~~
BUILDSTDERR: gen4_vertex.c:360:2: note: in expansion of macro 'OUT_VERTEX'
BUILDSTDERR: OUT_VERTEX(dstX, dstY);
BUILDSTDERR: ^~~~~~~~~~
The bug here appears to be that emit_vertex() is declared 'sse2' but
vertex_emit_2s is merely always_inline. gcc8 decides that since you said
always_inline you need to have explicitly cloned it for every
permutation of targets. Merely saying inline seems to do the job of
cloning vertex_emit_2s as much as necessary.
So to reiterate: if you say always-inline, it won't, but if you just say
maybe inline, it will. Thanks gcc, that's helpful.
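A minimal sketch of the situation described above, not the driver's actual code: a caller built with a per-function target attribute calls a small helper that is declared plain `inline` rather than always_inline, so the compiler is free to inline or clone it as needed. The names `sse2_target`, `emit_2s` and `emit_vertex_sketch` are hypothetical, and the attribute is only applied on x86 builds with a GCC-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical macro: request SSE2 code generation for one function only. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define sse2_target __attribute__((target("sse2")))
#else
#define sse2_target
#endif

/* Plain 'inline': the compiler may inline this into callers with any target. */
static inline void emit_2s(int16_t *out, int16_t x, int16_t y)
{
	out[0] = x;
	out[1] = y;
}

/* Caller compiled for SSE2; it calls the helper that has no target attribute. */
sse2_target void emit_vertex_sketch(int16_t *out, int16_t x, int16_t y)
{
	emit_2s(out, x, y);
}
```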
|
|
Trying to unify all the target attributes to chase down:
blt.c: In function ‘memcpy_from_tiled_x__swizzle_0__sse2’:
blt.c:345:1: error: inlining failed in call to always_inline
‘memcpy_sse64xN’: target specific option mismatch
memcpy_sse64xN(uint8_t *dst, const uint8_t *src, int bytes)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Telling the compiler the known alignment should improve the memcpy
operation, but only has a small impact today (a few bytes/instructions
per function).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
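One way to pass a known alignment to the compiler is GCC's `__builtin_assume_aligned`. The sketch below is an illustration under assumptions, not the code from this commit: the helper name `copy64_aligned_dst`, the 64-byte block size and the 16-byte alignment are all hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: copy one 64-byte block. The caller promises that
 * dst is 16-byte aligned, which lets the compiler pick aligned stores.
 */
static inline void copy64_aligned_dst(void *dst, const void *src)
{
#if defined(__GNUC__)
	dst = __builtin_assume_aligned(dst, 16);
#endif
	memcpy(dst, src, 64);
}
```

Passing a pointer that does not meet the promised alignment is undefined behaviour, so the hint belongs only where the alignment is guaranteed by construction.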
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Using __packed__ as shorthand for __attribute__((__packed__)) confuses
clang. (I guess it expands the inner (__packed__), which gcc skips.) As
clang also uses packed in its builtins, we have to find a compromise,
and so tightly_packed wins for being a more verbose description without
the dangerous leading underscores.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
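A sketch of what such a macro might look like, assuming a GCC-compatible compiler. Only the name tightly_packed comes from the message above; the two structs are hypothetical and exist to show the effect on layout.

```c
#include <stdint.h>

/* A name without leading underscores, so it cannot collide with the
 * identifier used inside the attribute itself. */
#define tightly_packed __attribute__((__packed__))

/* Hypothetical structs for comparison. */
struct loose_pair {
	uint8_t tag;
	uint32_t value;	/* normally preceded by padding */
};

struct tight_pair {
	uint8_t tag;
	uint32_t value;	/* no padding: follows tag immediately */
} tightly_packed;
```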
|
|
Also written by Mark Kettenis and reported by Sedat Dilek.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we need optimal copy code for the general case, where unlike
swizzling the run lengths are not known beforehand, we need to call the
arch-specific routines from glibc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Thomas Jones reported that the build was failing with gcc-4.5 due to the
memcpy routines requesting an unsupported optimisation mode (-Ofast) and
supplied this patch to enable -Ofast only for gcc-4.6+.
Reported-by: Thomas Jones <thomas.jones@utoronto.ca>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
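A version gate of this kind could be sketched as follows. The macro name `fast_memcpy` and the helper `copy_bytes` are hypothetical, and the extra clang exclusion is an assumption on my part (clang does not implement the optimize attribute); the actual patch may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Request -Ofast per function only where the compiler understands it:
 * GCC 4.6 or newer. Elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
#define fast_memcpy __attribute__((optimize("Ofast")))
#else
#define fast_memcpy
#endif

/* Hypothetical copy routine carrying the per-function optimisation hint. */
static fast_memcpy void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
	while (n--)
		*dst++ = *src++;
}
```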
|
|
Always enable gcc to fully optimize the core memcpy routines (provided
that optimisations are not entirely disabled, for instance for
debugging).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This reduces the number of loops and restarts required in the kernel.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Confusing gcc with different flags for supposedly inlined functions is
not a good idea.
References: https://bugs.freedesktop.org/show_bug.cgi?id=62198
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
And fixup a basic error in the process.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Merely hinting that it was preferred by using sse+387 was not enough
for GCC to emit the faster SSE2 code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
gcc-4.4.5 (on squeeze) triggers an ICE when using target(sse2).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Otherwise we seem to confuse the poor little compiler. This should also
make it easier to use CPP to turn off blocks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow use of advanced ISA when available by detecting support at
runtime. This initial work just uses GCC to emit varying ISA; future
work could use hand-written code for these hot spots.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
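As an illustration of runtime dispatch, and not the detection code this commit adds, the sketch below picks between two compiler-generated variants of the same routine using GCC's `__builtin_cpu_supports`. All names (`copy_fn`, `copy_generic`, `copy_sse2`, `select_copy`) are hypothetical, and the SSE2 path is only compiled on x86 with a GCC-compatible compiler.

```c
#include <stddef.h>
#include <string.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant, built with the default target options. */
static void copy_generic(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Same source, but the compiler may emit SSE2 instructions here. */
__attribute__((target("sse2")))
static void copy_sse2(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
#endif

/* Choose a variant once, based on what the running CPU reports. */
static copy_fn select_copy(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse2"))
		return copy_sse2;
#endif
	return copy_generic;
}
```

A caller would typically store the result of `select_copy()` in a function pointer at start-up and call through it in the hot path.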
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A 'const' function is only allowed to use its parameters and is not
allowed to access global memory - that includes not being allowed to
dereference its arguments...
Thanks to Jiri Slaby for spotting my mistake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
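The distinction can be sketched with GCC's two function attributes. Both functions below are hypothetical examples and are not taken from the driver: one reads memory through a pointer argument and so can only be `pure`, the other depends solely on the value of its argument and so may be `const`.

```c
#include <stddef.h>

/* pure: no side effects, but it reads memory through its argument,
 * so it must not be marked const. */
__attribute__((pure))
static int sum_bytes(const unsigned char *p, size_t n)
{
	int sum = 0;
	while (n--)
		sum += *p++;
	return sum;
}

/* const: the result depends only on the argument value itself. */
__attribute__((const))
static int clamp_to_byte(int v)
{
	return v < 0 ? 0 : v > 255 ? 255 : v;
}
```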
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we now defer the allocation of pixel data until first use, it can
fail in the middle of a rendering routine. In order to avoid passing a
NULL pointer into the fallback routines, we need to propagate the
failure from the malloc and suppress the operation, discarding it,
which is less than ideal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
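The pattern described here, deferred allocation whose failure is reported to the caller so the operation can be dropped, might be sketched like this. The struct and both functions are hypothetical stand-ins, not the driver's own types or routines.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pixmap whose backing store is allocated on first use. */
struct pixmap_sketch {
	void *pixels;
	size_t size;
};

/* Allocate lazily; report failure instead of leaving callers with NULL. */
static bool ensure_pixels(struct pixmap_sketch *p)
{
	if (p->pixels == NULL)
		p->pixels = malloc(p->size);
	return p->pixels != NULL;
}

/* Fallback-style operation: skipped entirely if the allocation failed. */
static bool fill_sketch(struct pixmap_sketch *p, int value)
{
	if (!ensure_pixels(p))
		return false;	/* discard the operation rather than touch NULL */
	memset(p->pixels, value, p->size);
	return true;
}
```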
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|