Merge Mesa 17.2.8

author: Jonathan Gray <jsg@cvs.openbsd.org> 2017-12-31 07:12:27 +0000
committer: Jonathan Gray <jsg@cvs.openbsd.org> 2017-12-31 07:12:27 +0000
commit: 051645c92924bf915d82bf219f2ed67309b5577a
tree: 4aae126dd8e5a18c6a9926a5468d1561e6038a07 /lib/mesa/src/gallium/docs/source
parent: 2dae6fe6f74cf7fb9fd65285302c0331d9786b00
6 files changed, 565 insertions, 134 deletions
diff --git a/lib/mesa/src/gallium/docs/source/conf.py b/lib/mesa/src/gallium/docs/source/conf.py
index 5e8173d86..c6039fbe8 100644
--- a/lib/mesa/src/gallium/docs/source/conf.py
+++ b/lib/mesa/src/gallium/docs/source/conf.py
@@ -22,7 +22,7 @@ sys.path.append(os.path.abspath('exts'))
 
 # Add any Sphinx extension module names here, as strings. They can be extensions
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
-extensions = ['sphinx.ext.pngmath', 'sphinx.ext.graphviz', 'formatting']
+extensions = ['sphinx.ext.imgmath', 'sphinx.ext.graphviz', 'formatting']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
diff --git a/lib/mesa/src/gallium/docs/source/context.rst b/lib/mesa/src/gallium/docs/source/context.rst
index e190cefc8..a46131c31 100644
--- a/lib/mesa/src/gallium/docs/source/context.rst
+++ b/lib/mesa/src/gallium/docs/source/context.rst
@@ -53,8 +53,6 @@ buffers, surfaces) are bound to the driver.
 
 * ``set_vertex_buffers``
 
-* ``set_index_buffer``
-
 
 Non-CSO State
 ^^^^^^^^^^^^^
@@ -91,14 +89,24 @@ objects. They all follow simple, one-method binding calls, e.g.
   blits. (Blits have their own way to pass the requisite rectangles
   in.)
 * ``set_tess_state`` configures the default tessellation parameters:
+
   * ``default_outer_level`` is the default value for the outer tessellation
     levels. This corresponds to GL's ``PATCH_DEFAULT_OUTER_LEVEL``.
   * ``default_inner_level`` is the default value for the inner tessellation
     levels. This corresponds to GL's ``PATCH_DEFAULT_INNER_LEVEL``.
+
 * ``set_debug_callback`` sets the callback to be used for reporting
   various debug messages, eventually reported via KHR_debug and
   similar mechanisms.
 
+Samplers
+^^^^^^^^
+
+pipe_sampler_state objects control how textures are sampled (coordinate
+wrap modes, interpolation modes, etc).  Note that samplers are not used
+for texture buffer objects.  That is, pipe_context::bind_sampler_views()
+will not bind a sampler if the corresponding sampler view refers to a
+PIPE_BUFFER resource.
 
 Sampler Views
 ^^^^^^^^^^^^^
@@ -252,6 +260,29 @@ multi-byte element value starting at offset bytes from resource start, going
 for size bytes. It is guaranteed that size % clear_value_size == 0.
 
 
+Uploading
+^^^^^^^^^
+
+For simple single-use uploads, use ``pipe_context::stream_uploader`` or
+``pipe_context::const_uploader``. The latter should be used for uploading
+constants, while the former should be used for uploading everything else.
+PIPE_USAGE_STREAM is implied in both cases, so don't use the uploaders
+for static allocations.
+
+Usage:
+
+Call u_upload_alloc or u_upload_data as many times as you want. After you are
+done, call u_upload_unmap. If the driver doesn't support persistent mappings,
+u_upload_unmap makes sure the previously mapped memory is unmapped.
+
+Gotchas:
+- Always fill the memory immediately after u_upload_alloc. Any following call
+to u_upload_alloc and u_upload_data can unmap memory returned by previous
+u_upload_alloc.
+- Don't interleave calls using stream_uploader and const_uploader. If you use
+one of them, do the upload, unmap, and only then can you use the other one.
+
+
 Drawing
 ^^^^^^^
 
@@ -265,8 +296,8 @@ the mode of the primitive and the vertices to be fetched, in the range between
 Every instance with instanceID in the range between ``start_instance`` and
 ``start_instance``+``instance_count``-1, inclusive, will be drawn.
 
-If there is an index buffer bound, and ``indexed`` field is true, all vertex
-indices will be looked up in the index buffer.
+If  ``index_size`` != 0, all vertex indices will be looked up from the index
+buffer.
 
 In indexed draw, ``min_index`` and ``max_index`` respectively provide a lower
 and upper bound of the indices contained in the index buffer inside the range
@@ -578,7 +609,8 @@ texture_barrier
 %%%%%%%%%%%%%%%
 
 This function flushes all pending writes to the currently-set surfaces and
-invalidates all read caches of the currently-set samplers.
+invalidates all read caches of the currently-set samplers. This can be used
+for both regular textures as well as for framebuffers read via FBFETCH.
 
 
 
@@ -592,6 +624,31 @@ are set.
 
 
 
+.. _resource_commit:
+
+resource_commit
+%%%%%%%%%%%%%%%
+
+This function changes the commit state of a part of a sparse resource. Sparse
+resources are created by setting the ``PIPE_RESOURCE_FLAG_SPARSE`` flag when
+calling ``resource_create``. Initially, sparse resources only reserve a virtual
+memory region that is not backed by memory (i.e., it is uncommitted). The
+``resource_commit`` function can be called to commit or uncommit parts (or all)
+of a resource. The driver manages the underlying backing memory.
+
+The contents of newly committed memory regions are undefined. Calling this
+function to commit an already committed memory region is allowed and leaves its
+content unchanged. Similarly, calling this function to uncommit an already
+uncommitted memory region is allowed.
+
+For buffers, the given box must be aligned to multiples of
+``PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE``. As an exception to this rule, if the size
+of the buffer is not a multiple of the page size, changing the commit state of
+the last (partial) page requires a box that ends at the end of the buffer
+(i.e., box->x + box->width == buffer->width0).
+
+
+
 .. _pipe_transfer:
 
 PIPE_TRANSFER
@@ -707,3 +764,46 @@ notifications are single-shot, i.e. subsequent calls to
   since the last call or since the last notification by callback.
 * ``set_device_reset_callback`` sets a callback which will be called when
   a device reset is detected. The callback is only called synchronously.
+
+Bindless
+^^^^^^^^
+
+If PIPE_CAP_BINDLESS_TEXTURE is TRUE, the following ``pipe_context`` functions
+are used to create/delete bindless handles, and to make them resident in the
+current context when they are going to be used by shaders.
+
+* ``create_texture_handle`` creates a 64-bit unsigned integer texture handle
+  that is going to be directly used in shaders.
+* ``delete_texture_handle`` deletes a 64-bit unsigned integer texture handle.
+* ``make_texture_handle_resident`` makes a 64-bit unsigned texture handle
+  resident in the current context to be accessible by shaders for texture
+  mapping.
+* ``create_image_handle`` creates a 64-bit unsigned integer image handle that
+  is going to be directly used in shaders.
+* ``delete_image_handle`` deletes a 64-bit unsigned integer image handle.
+* ``make_image_handle_resident`` makes a 64-bit unsigned integer image handle
+  resident in the current context to be accessible by shaders for image loads,
+  stores and atomic operations.
+
+Using several contexts
+----------------------
+
+Several contexts from the same screen can be used at the same time. Objects
+created on one context cannot be used in another context, but the objects
+created by the screen methods can be used by all contexts.
+
+Transfers
+^^^^^^^^^
+A transfer on one context is not expected to synchronize properly with
+rendering on other contexts, thus only areas not yet used for rendering should
+be locked.
+
+A flush is required after transfer_unmap to expect other contexts to see the
+uploaded data, unless:
+
+* Using persistent mapping. Associated with coherent mapping, unmapping the
+  resource is also not required to use it in other contexts. Without coherent
+  mapping, memory_barrier(PIPE_BARRIER_MAPPED_BUFFER) should be called on the
+  context that has mapped the resource. No flush is required.
+
+* Mapping the resource with PIPE_TRANSFER_MAP_DIRECTLY.
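
Note on the resource_commit hunk in context.rst above: the alignment rule for sparse buffers can be sketched as a small validity check. This is an illustrative model only, not Mesa code. The function name and parameters are hypothetical, and it assumes that "aligned" means both the start and the end of the box fall on page boundaries, with the documented exception for a box that ends exactly at the end of the buffer.

```python
def commit_box_is_valid(box_x, box_width, buffer_size, page_size):
    """Return True if a buffer commit box satisfies the documented rule.

    box_x, box_width: byte offset and length of the box (box->x, box->width).
    buffer_size: total size of the buffer in bytes (buffer->width0).
    page_size: value reported for PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE.
    """
    if page_size <= 0:
        raise ValueError("sparse buffers are not supported (page size is 0)")
    end = box_x + box_width
    # The box must lie inside the buffer and must not be empty.
    if box_x < 0 or box_width <= 0 or end > buffer_size:
        return False
    # The start of the box must always sit on a page boundary.
    if box_x % page_size != 0:
        return False
    # The end is either page aligned, or it is the end of the buffer
    # (the exception for a trailing partial page).
    return end % page_size == 0 or end == buffer_size
```

With a 64 KB page and a 200000-byte buffer, a box covering the trailing partial page is accepted only when it runs to byte 200000.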
diff --git a/lib/mesa/src/gallium/docs/source/drivers/openswr.rst b/lib/mesa/src/gallium/docs/source/drivers/openswr.rst
index 84aa51f5d..e254d7bce 100644
--- a/lib/mesa/src/gallium/docs/source/drivers/openswr.rst
+++ b/lib/mesa/src/gallium/docs/source/drivers/openswr.rst
@@ -7,7 +7,7 @@ geometry heavy workloads there is a considerable speedup over llvmpipe,
 which is to be expected as the geometry frontend of llvmpipe is single
 threaded.
 
-This rasterizer is x86 specific and requires AVX or AVX2.  The driver
+This rasterizer is x86 specific and requires AVX or above.  The driver
 fits into the gallium framework, and reuses gallivm for doing the TGSI
 to vectorized llvm-IR conversion of the shader kernels.
 
diff --git a/lib/mesa/src/gallium/docs/source/drivers/openswr/usage.rst b/lib/mesa/src/gallium/docs/source/drivers/openswr/usage.rst
index e55b4211a..61c30c27c 100644
--- a/lib/mesa/src/gallium/docs/source/drivers/openswr/usage.rst
+++ b/lib/mesa/src/gallium/docs/source/drivers/openswr/usage.rst
@@ -4,8 +4,9 @@ Usage
 Requirements
 ^^^^^^^^^^^^
 
-* An x86 processor with AVX or AVX2
-* LLVM version 3.6 or later
+* An x86 processor with AVX or above
+* LLVM version 3.9 or later
+* C++14 capable compiler
 
 Building
 ^^^^^^^^
@@ -18,13 +19,18 @@ configure time, for example: ::
 Using
 ^^^^^
 
-On Linux, building will create a drop-in alternative for libGL.so into::
+On Linux, building with autotools will create a drop-in alternative
+for libGL.so into::
 
   lib/gallium/libGL.so
+  lib/gallium/libswrAVX.so
+  lib/gallium/libswrAVX2.so
 
-or::
+Alternatively, building with SCons will produce::
 
-  build/foo/gallium/targets/libgl-xlib/libGL.so
+  build/linux-x86_64/gallium/targets/libgl-xlib/libGL.so
+  build/linux-x86_64/gallium/drivers/swr/libswrAVX.so
+  build/linux-x86_64/gallium/drivers/swr/libswrAVX2.so
 
 To use it set the LD_LIBRARY_PATH environment variable accordingly.
 
diff --git a/lib/mesa/src/gallium/docs/source/screen.rst b/lib/mesa/src/gallium/docs/source/screen.rst
index d79e75e21..32da22885 100644
--- a/lib/mesa/src/gallium/docs/source/screen.rst
+++ b/lib/mesa/src/gallium/docs/source/screen.rst
@@ -115,10 +115,6 @@ The integer capabilities:
   aligned to 4.  If false, there are no restrictions on src_offset.
 * ``PIPE_CAP_COMPUTE``: Whether the implementation supports the
   compute entry points defined in pipe_context and pipe_screen.
-* ``PIPE_CAP_USER_INDEX_BUFFERS``: Whether user index buffers are supported.
-  If not, the state tracker must upload all indices which are not in hw
-  resources.  If user-space buffers are supported, the driver must also still
-  accept HW resource buffers.
 * ``PIPE_CAP_USER_CONSTANT_BUFFERS``: Whether user-space constant buffers
   are supported.  If not, the state tracker must put constants into HW
   resources/buffers.  If user-space constant buffers are supported, the
@@ -221,7 +217,7 @@ The integer capabilities:
   pipe_draw_info::indirect_stride and ::indirect_count
 * ``PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS``: Whether the driver supports
   taking the number of indirect draws from a separate parameter
-  buffer, see pipe_draw_info::indirect_params.
+  buffer, see pipe_draw_indirect_info::indirect_draw_count.
 * ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports
   the FINE versions of DDX/DDY.
 * ``PIPE_CAP_VENDOR_ID``: The vendor ID of the underlying hardware. If it's
@@ -361,6 +357,45 @@ The integer capabilities:
   equal interpolation qualifiers.
   Components may overlap, notably when the gaps in an array of dvec3 are
   filled in.
+* ``PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS``: Whether interleaved stream
+  output mode is able to interleave across buffers. This is required for
+  ARB_transform_feedback3.
+* ``PIPE_CAP_TGSI_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read
+  from the output file.
+* ``PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY``: Tell the GLSL compiler to use
+  the minimum amount of optimizations just to be able to do all the linking
+  and lowering.
+* ``PIPE_CAP_TGSI_FS_FBFETCH``: Whether a fragment shader can use the FBFETCH
+  opcode to retrieve the current value in the framebuffer.
+* ``PIPE_CAP_TGSI_MUL_ZERO_WINS``: Whether TGSI shaders support the
+  ``TGSI_PROPERTY_MUL_ZERO_WINS`` shader property.
+* ``PIPE_CAP_DOUBLES``: Whether double precision floating-point operations
+  are supported.
+* ``PIPE_CAP_INT64``: Whether 64-bit integer operations are supported.
+* ``PIPE_CAP_INT64_DIVMOD``: Whether 64-bit integer division/modulo
+  operations are supported.
+* ``PIPE_CAP_TGSI_TEX_TXF_LZ``: Whether TEX_LZ and TXF_LZ opcodes are
+  supported.
+* ``PIPE_CAP_TGSI_CLOCK``: Whether the CLOCK opcode is supported.
+* ``PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE``: Whether the
+  PIPE_POLYGON_MODE_FILL_RECTANGLE mode is supported for
+  ``pipe_rasterizer_state::fill_front`` and
+  ``pipe_rasterizer_state::fill_back``.
+* ``PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE``: The page size of sparse buffers in
+  bytes, or 0 if sparse buffers are not supported. The page size must be at
+  most 64KB.
+* ``PIPE_CAP_TGSI_BALLOT``: Whether the BALLOT and READ_* opcodes as well as
+  the SUBGROUP_* semantics are supported.
+* ``PIPE_CAP_TGSI_TES_LAYER_VIEWPORT``: Whether ``TGSI_SEMANTIC_LAYER`` and
+  ``TGSI_SEMANTIC_VIEWPORT_INDEX`` are supported as tessellation evaluation
+  shader outputs.
+* ``PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX``: Whether a buffer with just
+  PIPE_BIND_CONSTANT_BUFFER can be legally passed to set_vertex_buffers.
+* ``PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION``: As the name says.
+* ``PIPE_CAP_POST_DEPTH_COVERAGE``: whether
+  ``TGSI_PROPERTY_FS_POST_DEPTH_COVERAGE`` is supported.
+* ``PIPE_CAP_BINDLESS_TEXTURE``: Whether bindless texture operations are
+  supported.
 
 
 .. _pipe_capf:
@@ -419,7 +454,6 @@ file is still supported. In that case, the constbuf index is assumed
 to be 0.
 
 * ``PIPE_SHADER_CAP_MAX_TEMPS``: The maximum number of temporary registers.
-* ``PIPE_SHADER_CAP_MAX_PREDS``: The maximum number of predicate registers.
 * ``PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED``: Whether the continue opcode is supported.
 * ``PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR``: Whether indirect addressing
   of the input file is supported.
@@ -439,8 +473,6 @@ to be 0.
   program.  It should be one of the ``pipe_shader_ir`` enum values.
 * ``PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS``: The maximum number of texture
   sampler views. Must not be lower than PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS.
-* ``PIPE_SHADER_CAP_DOUBLES``: Whether double precision floating-point
-  operations are supported.
 * ``PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED``: Whether double precision rounding
   is supported. If it is, DTRUNC/DCEIL/DFLR/DROUND opcodes may be used.
 * ``PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED``: Whether DFRACEXP and
@@ -460,6 +492,13 @@ to be 0.
 * ``PIPE_SHADER_CAP_SUPPORTED_IRS``: Supported representations of the
   program.  It should be a mask of ``pipe_shader_ir`` bits.
 * ``PIPE_SHADER_CAP_MAX_SHADER_IMAGES``: Maximum number of image units.
+* ``PIPE_SHADER_CAP_LOWER_IF_THRESHOLD``: IF and ELSE branches with a lower
+  cost than this value should be lowered by the state tracker for better
+  performance. This is a tunable for the GLSL compiler and the behavior is
+  specific to the compiler.
+* ``PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS``: Whether the merge registers
+  TGSI pass is skipped. This might reduce code size and register pressure if
+  the underlying driver has a real backend compiler.
 
 
 .. _pipe_compute_cap:
@@ -587,17 +626,26 @@ get_name
 
 Returns an identifying name for the screen.
 
+The returned string should remain valid and immutable for the lifetime of
+pipe_screen.
+
 get_vendor
 ^^^^^^^^^^
 
 Returns the screen vendor.
 
+The returned string should remain valid and immutable for the lifetime of
+pipe_screen.
+
 get_device_vendor
 ^^^^^^^^^^^^^^^^^
 
 Returns the actual vendor of the device driving the screen
 (as opposed to the driver vendor).
 
+The returned string should remain valid and immutable for the lifetime of
+pipe_screen.
+
 .. _get_param:
 
 get_param
@@ -639,8 +687,6 @@ the maximum allowed legal value is 32.
 
 **bindings** is a bitmask of :ref:`PIPE_BIND` flags.
 
-**geom_flags** is a bitmask of PIPE_TEXTURE_GEOM_x flags.
-
 Returns TRUE if all usages can be satisfied.
 
 
@@ -693,6 +739,20 @@ which isn't multisampled.
 
 
 
+resource_changed
+^^^^^^^^^^^^^^^^
+
+Mark a resource as changed so derived internal resources will be recreated
+on next use.
+
+When importing external images that can't be directly used as texture sampler
+source, internal copies may have to be created that the hardware can sample
+from. When those resources are reimported, the image data may have changed, and
+the previously derived internal resources must be invalidated to avoid sampling
+from old copies.
+
+
+
 resource_destroy
 ^^^^^^^^^^^^^^^^
 
@@ -728,3 +788,23 @@ query group at the specified **index** is returned in **info**.
 The function returns non-zero on success.
 The driver-specific query group is described with the
 pipe_driver_query_group_info structure.
+
+
+
+get_disk_shader_cache
+^^^^^^^^^^^^^^^^^^^^^
+
+Returns a pointer to a driver-specific on-disk shader cache. If the driver
+failed to create the cache or does not support an on-disk shader cache NULL is
+returned. The callback itself may also be NULL if the driver doesn't support
+an on-disk shader cache.
+
+
+Thread safety
+-------------
+
+Screen methods are required to be thread safe. While gallium rendering
+contexts are not required to be thread safe, it is required to be safe to use
+different contexts created with the same screen in different threads without
+locks. It is also required to be safe using screen methods in a thread, while
+using one of its contexts in another (without locks).
diff --git a/lib/mesa/src/gallium/docs/source/tgsi.rst b/lib/mesa/src/gallium/docs/source/tgsi.rst
index 5068285aa..0dd2ac024 100644
--- a/lib/mesa/src/gallium/docs/source/tgsi.rst
+++ b/lib/mesa/src/gallium/docs/source/tgsi.rst
@@ -26,11 +26,19 @@ each of the components of *dst*. When this happens, the result is said to be
 Modifiers
 ^^^^^^^^^^^^^^^
 
-TGSI supports modifiers on inputs (as well as saturate modifier on instructions).
+TGSI supports modifiers on inputs (as well as saturate and precise modifier
+on instructions).
 
-For inputs which have a floating point type, both absolute value and negation
-modifiers are supported (with absolute value being applied first).
-TGSI_OPCODE_MOV is considered to have float input type for applying modifiers.
+For arithmetic instruction having a precise modifier certain optimizations
+which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
+optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
+MAD instruction.
+
+For inputs which have a floating point type, both absolute value and
+negation modifiers are supported (with absolute value being applied
+first).  The only source of TGSI_OPCODE_MOV and the second and third
+sources of TGSI_OPCODE_UCMP are considered to have float type for
+applying modifiers.
 
 For inputs which have signed or unsigned type only the negate modifier is
 supported.
@@ -235,6 +243,9 @@ This instruction replicates its result.
 
 .. opcode:: MAD - Multiply And Add
 
+Perform a * b + c. The implementation is free to decide whether there is an
+intermediate rounding step or not.
+
 .. math::
 
   dst.x = src0.x \times src1.x + src2.x
@@ -246,19 +257,6 @@ This instruction replicates its result.
   dst.w = src0.w \times src1.w + src2.w
 
 
-.. opcode:: SUB - Subtract
-
-.. math::
-
-  dst.x = src0.x - src1.x
-
-  dst.y = src0.y - src1.y
-
-  dst.z = src0.z - src1.z
-
-  dst.w = src0.w - src1.w
-
-
 .. opcode:: LRP - Linear Interpolate
 
 .. math::
@@ -313,19 +311,6 @@ Perform a * b + c with no intermediate rounding step.
   dst.w = src.w - \lfloor src.w\rfloor
 
 
-.. opcode:: CLAMP - Clamp
-
-.. math::
-
-  dst.x = clamp(src0.x, src1.x, src2.x)
-
-  dst.y = clamp(src0.y, src1.y, src2.y)
-
-  dst.z = clamp(src0.z, src1.z, src2.z)
-
-  dst.w = clamp(src0.w, src1.w, src2.w)
-
-
 .. opcode:: FLR - Floor
 
 .. math::
@@ -391,19 +376,6 @@ This instruction replicates its result.
   dst.w = 1
 
 
-.. opcode:: ABS - Absolute
-
-.. math::
-
-  dst.x = |src.x|
-
-  dst.y = |src.y|
-
-  dst.z = |src.z|
-
-  dst.w = |src.w|
-
-
 .. opcode:: DPH - Homogeneous Dot Product
 
 This instruction replicates its result.
@@ -794,6 +766,29 @@ This instruction replicates its result.
   dst = src0.x \times src1.x + src0.y \times src1.y
 
 
+.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
+
+  This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
+  pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
+  There is no way to override those two in shaders.
+
+.. math::
+
+  coord.x = src0.x
+
+  coord.y = src0.y
+
+  coord.z = src0.z
+
+  coord.w = none
+
+  lod = 0
+
+  unit = src1
+
+  dst = texture\_sample(unit, coord, lod)
+
+
 .. opcode:: TXL - Texture Lookup With explicit LOD
 
   for cube map array textures, the explicit lod value
@@ -953,14 +948,23 @@ XXX doesn't look like most of the opcodes really belong here.
 .. opcode:: TXF - Texel Fetch
 
   As per NV_gpu_shader4, extract a single texel from a specified texture
-  image. The source sampler may not be a CUBE or SHADOW.  src 0 is a
+  image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
+  SHADOW.  src 0 is a
   four-component signed integer vector used to identify the single texel
   accessed. 3 components + level.  Just like texture instructions, an optional
   offset vector is provided, which is subject to various driver restrictions
-  (regarding range, source of offsets).
+  (regarding range, source of offsets). This instruction ignores the sampler
+  state.
+
   TXF(uint_vec coord, int_vec offset).
 
 
+.. opcode:: TXF_LZ - Texel Fetch
+
+  This is the same as TXF with level = 0. Like TXF, it obeys
+  pipe_sampler_view::u.tex.first_level.
+
+
 .. opcode:: TXQ - Texture Size Query
 
   As per NV_gpu_program4, retrieve the dimensions of the texture depending on
@@ -988,7 +992,9 @@ XXX doesn't look like most of the opcodes really belong here.
 .. opcode:: TXQS - Texture Samples Query
 
   This retrieves the number of samples in the texture, and stores it
-  into the x component. The other components are undefined.
+  into the x component as an unsigned integer. The other components are
+  undefined.  If the texture is not multisampled, this function returns
+  (1, undef, undef, undef).
 
 .. math::
 
@@ -1044,6 +1050,20 @@ XXX doesn't look like most of the opcodes really belong here.
 
    dst.xy = lodq(uint, coord);
 
+.. opcode:: CLOCK - retrieve the current shader time
+
+   Invoking this instruction multiple times in the same shader should
+   cause monotonically increasing values to be returned. The values
+   are implicitly 64-bit, so if fewer than 64 bits of precision are
+   available, to provide expected wraparound semantics, the value
+   should be shifted up so that the most significant bit of the time
+   is the most significant bit of the 64-bit value.
+
+.. math::
+
+   dst.xy = clock()
+
+
 Integer ISA
 ^^^^^^^^^^^^^^^^^^^^^^^^
 These opcodes are used for integer operations.
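
Note on the CLOCK hunk above: the requirement to shift a narrow counter up to the top of the 64-bit value exists so that unsigned 64-bit subtraction still yields the right elapsed time after the counter wraps. A minimal sketch of that reasoning follows. The helper names and the 48-bit counter width in the usage note are assumptions made for illustration and do not describe any particular driver.

```python
MASK64 = (1 << 64) - 1


def clock_to_64(counter, counter_bits):
    """Left-align a counter_bits-wide hardware counter in a 64-bit value."""
    if not 0 < counter_bits <= 64:
        raise ValueError("counter_bits must be in 1..64")
    counter &= (1 << counter_bits) - 1
    # The counter's most significant bit becomes bit 63 of the result.
    return (counter << (64 - counter_bits)) & MASK64


def elapsed_ticks(start64, end64, counter_bits):
    """Ticks between two CLOCK results, correct across one wraparound."""
    # Modulo-2**64 subtraction cancels the wrap; shift back to get ticks.
    return ((end64 - start64) & MASK64) >> (64 - counter_bits)
```

For a hypothetical 48-bit counter that reads 2**48 - 5 and later 3 (after wrapping), `elapsed_ticks(clock_to_64(2**48 - 5, 48), clock_to_64(3, 48), 48)` evaluates to 8.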
@@ -1196,13 +1216,13 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 
 .. math::
 
-  dst.x = src0.x \ src1.x
+  dst.x = \frac{src0.x}{src1.x}
 
-  dst.y = src0.y \ src1.y
+  dst.y = \frac{src0.y}{src1.y}
 
-  dst.z = src0.z \ src1.z
+  dst.z = \frac{src0.z}{src1.z}
 
-  dst.w = src0.w \ src1.w
+  dst.w = \frac{src0.w}{src1.w}
 
 
 .. opcode:: UDIV - Unsigned Integer Division
@@ -1211,13 +1231,13 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 
 .. math::
 
-  dst.x = src0.x \ src1.x
+  dst.x = \frac{src0.x}{src1.x}
 
-  dst.y = src0.y \ src1.y
+  dst.y = \frac{src0.y}{src1.y}
 
-  dst.z = src0.z \ src1.z
+  dst.z = \frac{src0.z}{src1.z}
 
-  dst.w = src0.w \ src1.w
+  dst.w = \frac{src0.w}{src1.w}
 
 
 .. opcode:: UMOD - Unsigned Integer Remainder
@@ -1226,13 +1246,13 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 
 .. math::
 
-  dst.x = src0.x \ src1.x
+  dst.x = src0.x \bmod src1.x
 
-  dst.y = src0.y \ src1.y
+  dst.y = src0.y \bmod src1.y
 
-  dst.z = src0.z \ src1.z
+  dst.z = src0.z \bmod src1.z
 
-  dst.w = src0.w \ src1.w
+  dst.w = src0.w \bmod src1.w
 
 
 .. opcode:: NOT - Bitwise Not
@@ -1583,48 +1603,43 @@ These opcodes are used for bit-level manipulation of integers.
 
 .. opcode:: IBFE - Signed Bitfield Extract
 
-  See SM5 instruction of the same name. Extracts a set of bits from the input,
-  and sign-extends them if the high bit of the extracted window is set.
+  Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
+  sign-extends them if the high bit of the extracted window is set.
 
   Pseudocode::
 
     def ibfe(value, offset, bits):
-      offset = offset & 0x1f
-      bits = bits & 0x1f
+      if offset < 0 or bits < 0 or offset + bits > 32:
+        return undefined
       if bits == 0: return 0
       # Note: >> sign-extends
-      if width + offset < 32:
-        return (value << (32 - offset - bits)) >> (32 - bits)
-      else:
-        return value >> offset
+      return (value << (32 - offset - bits)) >> (32 - bits)
 
 .. opcode:: UBFE - Unsigned Bitfield Extract
 
-  See SM5 instruction of the same name. Extracts a set of bits from the input,
-  without any sign-extension.
+  Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
+  any sign-extension.
 
   Pseudocode::
 
     def ubfe(value, offset, bits):
-      offset = offset & 0x1f
-      bits = bits & 0x1f
+      if offset < 0 or bits < 0 or offset + bits > 32:
+        return undefined
       if bits == 0: return 0
       # Note: >> does not sign-extend
-      if width + offset < 32:
-        return (value << (32 - offset - bits)) >> (32 - bits)
-      else:
-        return value >> offset
+      return (value << (32 - offset - bits)) >> (32 - bits)
 
 .. opcode:: BFI - Bitfield Insert
 
-  See SM5 instruction of the same name. Replaces a bit region of 'base' with
-  the low bits of 'insert'.
+  Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
+  of 'insert'.
 
   Pseudocode::
 
     def bfi(base, insert, offset, bits):
-      offset = offset & 0x1f
-      bits = bits & 0x1f
+      if offset < 0 or bits < 0 or offset + bits > 32:
+        return undefined
+      # << defined such that mask == ~0 when bits == 32, offset == 0
       mask = ((1 << bits) - 1) << offset
       return ((insert << offset) & mask) | (base & ~mask)
 
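
Note on the IBFE/UBFE/BFI hunk above: the revised pseudocode relies on 32-bit shift behaviour that plain Python integers do not have, so a runnable version has to mask to 32 bits explicitly. The sketch below is one way to do that. Raising ValueError for the "undefined" cases is an assumption made for the example, since the documentation leaves the result unspecified, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _check(offset, bits):
    # The documented result is undefined here; the sketch rejects the input.
    if offset < 0 or bits < 0 or offset + bits > 32:
        raise ValueError("undefined: offset/bits outside the 32-bit range")


def _to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def ubfe(value, offset, bits):
    """Unsigned bitfield extract: no sign extension."""
    _check(offset, bits)
    if bits == 0:
        return 0
    shifted = (value << (32 - offset - bits)) & MASK32
    return shifted >> (32 - bits)


def ibfe(value, offset, bits):
    """Signed bitfield extract: sign-extends from the top extracted bit."""
    _check(offset, bits)
    if bits == 0:
        return 0
    shifted = _to_signed32(value << (32 - offset - bits))
    return shifted >> (32 - bits)  # Python >> is arithmetic on negative ints


def bfi(base, insert, offset, bits):
    """Bitfield insert: replace bits of base with the low bits of insert."""
    _check(offset, bits)
    mask = (((1 << bits) - 1) << offset) & MASK32  # all ones when bits == 32
    return ((insert << offset) & mask) | (base & ~mask & MASK32)
```

Extracting the 4-bit field at offset 4 from 0xF0 gives 0xF with `ubfe` and -1 with `ibfe`, which is the only difference between the two opcodes.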
@@ -1847,7 +1862,10 @@ two-component vectors with doubled precision in each component.
 
 .. opcode:: DABS - Absolute
 
+.. math::
+
   dst.xy = |src0.xy|
+
   dst.zw = |src0.zw|
 
 .. opcode:: DADD - Add
@@ -2010,6 +2028,15 @@ Perform a * b + c with no intermediate rounding step.
   dst.zw = src0.zw \times src1.zw + src2.zw
 
 
+.. opcode:: DDIV - Divide
+
+.. math::
+
+  dst.xy = \frac{src0.xy}{src1.xy}
+
+  dst.zw = \frac{src0.zw}{src1.zw}
+
+
 .. opcode:: DRCP - Reciprocal
 
 .. math::
@@ -2090,7 +2117,10 @@ two-component vectors with 64-bits in each component.
 
 .. opcode:: I64ABS - 64-bit Integer Absolute Value
 
+.. math::
+
   dst.xy = |src0.xy|
+
   dst.zw = |src0.zw|
 
 .. opcode:: I64NEG - 64-bit Integer Negate
@@ -2100,6 +2130,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = -src.xy
+
   dst.zw = -src.zw
 
 .. opcode:: I64SSG - 64-bit Integer Set Sign
@@ -2107,6 +2138,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
+
   dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
 
 .. opcode:: U64ADD - 64-bit Integer Add
@@ -2114,6 +2146,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy + src1.xy
+
   dst.zw = src0.zw + src1.zw
 
 .. opcode:: U64MUL - 64-bit Integer Multiply
@@ -2121,6 +2154,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy * src1.xy
+
   dst.zw = src0.zw * src1.zw
 
 .. opcode:: U64SEQ - 64-bit Integer Set on Equal
@@ -2128,6 +2162,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy == src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw == src1.zw ? \sim 0 : 0
 
 .. opcode:: U64SNE - 64-bit Integer Set on Not Equal
@@ -2135,6 +2170,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy != src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw != src1.zw ? \sim 0 : 0
 
 .. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
@@ -2142,6 +2178,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy < src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw < src1.zw ? \sim 0 : 0
 
 .. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
@@ -2149,6 +2186,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
 
 .. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
@@ -2156,6 +2194,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy < src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw < src1.zw ? \sim 0 : 0
 
 .. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
@@ -2163,6 +2202,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
+
   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
 
 .. opcode:: I64MIN - Minimum of 64-bit Signed Integers
@@ -2170,6 +2210,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = min(src0.xy, src1.xy)
+
   dst.zw = min(src0.zw, src1.zw)
 
 .. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
@@ -2177,6 +2218,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = min(src0.xy, src1.xy)
+
   dst.zw = min(src0.zw, src1.zw)
 
 .. opcode:: I64MAX - Maximum of 64-bit Signed Integers
@@ -2184,6 +2226,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = max(src0.xy, src1.xy)
+
   dst.zw = max(src0.zw, src1.zw)
 
 .. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
@@ -2191,6 +2234,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = max(src0.xy, src1.xy)
+
   dst.zw = max(src0.zw, src1.zw)
 
 .. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
@@ -2200,6 +2244,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy << (0x3f \& src1.x)
+
   dst.zw = src0.zw << (0x3f \& src1.y)
 
 .. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
@@ -2209,6 +2254,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy >> (0x3f \& src1.x)
+
   dst.zw = src0.zw >> (0x3f \& src1.y)
 
 .. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
@@ -2218,27 +2264,31 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
+
   dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
 
 .. opcode:: I64DIV - 64-bit Signed Integer Division
 
 .. math::
 
-  dst.xy = src0.xy \ src1.xy
-  dst.zw = src0.zw \ src1.zw
+  dst.xy = \frac{src0.xy}{src1.xy}
+
+  dst.zw = \frac{src0.zw}{src1.zw}
 
 .. opcode:: U64DIV - 64-bit Unsigned Integer Division
 
 .. math::
 
-  dst.xy = src0.xy \ src1.xy
-  dst.zw = src0.zw \ src1.zw
+  dst.xy = \frac{src0.xy}{src1.xy}
+
+  dst.zw = \frac{src0.zw}{src1.zw}
 
 .. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
 
 .. math::
 
   dst.xy = src0.xy \bmod src1.xy
+
   dst.zw = src0.zw \bmod src1.zw
 
 .. opcode:: I64MOD - 64-bit Signed Integer Remainder
@@ -2246,6 +2296,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
   dst.xy = src0.xy \bmod src1.xy
+
   dst.zw = src0.zw \bmod src1.zw
 
 .. opcode:: F2U64 - Float to 64-bit Unsigned Int
@@ -2253,6 +2304,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (uint64_t) src0.x
+
    dst.zw = (uint64_t) src0.y
 
 .. opcode:: F2I64 - Float to 64-bit Int
@@ -2260,6 +2312,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (int64_t) src0.x
+
    dst.zw = (int64_t) src0.y
 
 .. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
@@ -2269,6 +2322,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (uint64_t) src0.x
+
    dst.zw = (uint64_t) src0.y
 
 .. opcode:: I2I64 - Signed Integer to 64-bit Integer
@@ -2278,6 +2332,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (int64_t) src0.x
+
    dst.zw = (int64_t) src0.y
 
 .. opcode:: D2U64 - Double to 64-bit Unsigned Int
@@ -2285,6 +2340,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (uint64_t) src0.xy
+
    dst.zw = (uint64_t) src0.zw
 
 .. opcode:: D2I64 - Double to 64-bit Int
@@ -2292,6 +2348,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (int64_t) src0.xy
+
    dst.zw = (int64_t) src0.zw
 
 .. opcode:: U642F - 64-bit unsigned integer to float
@@ -2299,6 +2356,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.x = (float) src0.xy
+
    dst.y = (float) src0.zw
 
 .. opcode:: I642F - 64-bit Int to Float
@@ -2306,6 +2364,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.x = (float) src0.xy
+
    dst.y = (float) src0.zw
 
 .. opcode:: U642D - 64-bit unsigned integer to double
@@ -2313,6 +2372,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (double) src0.xy
+
    dst.zw = (double) src0.zw
 
 .. opcode:: I642D - 64-bit Int to double
@@ -2320,6 +2380,7 @@ two-component vectors with 64-bits in each component.
 .. math::
 
    dst.xy = (double) src0.xy
+
    dst.zw = (double) src0.zw
 
 .. _samplingopcodes:
@@ -2489,21 +2550,54 @@ after lookup.
 
 .. opcode:: SAMPLE_POS
 
-  Query the position of a given sample.  dst receives float4 (x, y, 0, 0)
-  indicated where the sample is located. If the resource is not a multi-sample
-  resource and not a render target, the result is 0.
+  Query the position of a sample in the given resource or render target
+  when per-sample fragment shading is in effect.
+
+  Syntax: ``SAMPLE_POS dst, source, sample_index``
+
+  dst receives float4 (x, y, undef, undef) indicating where the sample is
+  located. Sample locations are in the range [0, 1] where 0.5 is the center
+  of the fragment.
+
+  source is either a sampler view (to indicate a shader resource) or temp
+  register (to indicate the render target).  The source register may have
+  an optional swizzle to apply to the returned result.
+
+  sample_index is an integer scalar indicating which sample position is to
+  be queried.
+
+  If per-sample shading is not in effect or the source resource or render
+  target is not multisampled, the result is (0.5, 0.5, undef, undef).
+
+  NOTE: no driver has implemented this opcode yet (and no state tracker
+  emits it).  This information is subject to change.
 
 .. opcode:: SAMPLE_INFO
 
-  dst receives number of samples in x.  If the resource is not a multi-sample
-  resource and not a render target, the result is 0.
+  Query the number of samples in a multisampled resource or render target.
 
+  Syntax: ``SAMPLE_INFO dst, source``
+
+  dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
+  resource or the render target.
+
+  source is either a sampler view (to indicate a shader resource) or temp
+  register (to indicate the render target).  The source register may have
+  an optional swizzle to apply to the returned result.
+
+  If per-sample shading is not in effect or the source resource or render
+  target is not multisampled, the result is (1, 0, 0, 0).
+
+  NOTE: no driver has implemented this opcode yet (and no state tracker
+  emits it).  This information is subject to change.
 
 .. _resourceopcodes:
 
 Resource Access Opcodes
 ^^^^^^^^^^^^^^^^^^^^^^^
 
+For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
+
 .. opcode:: LOAD - Fetch data from a shader buffer or image
 
                Syntax: ``LOAD dst, resource, address``
@@ -2566,6 +2660,19 @@ Resource Access Opcodes
   image, while .w will contain the number of samples for multi-sampled
   images.
 
+.. opcode:: FBFETCH - Load data from framebuffer
+
+  Syntax: ``FBFETCH dst, output``
+
+  Example: ``FBFETCH TEMP[0], OUT[0]``
+
+  This is only valid on ``COLOR`` semantic outputs. Returns the color
+  of the current position in the framebuffer from before this fragment
+  shader invocation. May return the same value from multiple calls for
+  a particular output within a single invocation. Note that the result may
+  be undefined if a fragment is drawn multiple times without a blend
+  barrier in between.
+
 
 .. _threadsyncopcodes:
 
@@ -2642,8 +2749,8 @@ These opcodes provide atomic variants of some common arithmetic and
 logical operations.  In this context atomicity means that another
 concurrent memory access operation that affects the same memory
 location is guaranteed to be performed strictly before or after the
-entire execution of the atomic operation. The resource may be a buffer
-or an image. In the case of an image, the offset works the same as for
+entire execution of the atomic operation. The resource may be a BUFFER,
+IMAGE, or MEMORY.  In the case of an image, the offset works the same as for
 ``LOAD`` and ``STORE``, specified above. These atomic operations may
 only be used with 32-bit integer image formats.
 
@@ -2797,22 +2904,68 @@ only be used with 32-bit integer image formats.
   resource[offset] = (dst_x > src_x ? dst_x : src_x)
 
 
-.. _voteopcodes:
+.. _interlaneopcodes:
+
+Inter-lane opcodes
+^^^^^^^^^^^^^^^^^^
+
+These opcodes reduce the given value across the shader invocations
+running in the current SIMD group. Every thread in the subgroup will receive
+the same result. The BALLOT operations accept a single-channel argument that
+is treated as a boolean and produce a 64-bit value.
+
+.. opcode:: VOTE_ANY - Value is set in any of the active invocations
+
+  Syntax: ``VOTE_ANY dst, value``
+
+  Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
+
+
+.. opcode:: VOTE_ALL - Value is set in all of the active invocations
+
+  Syntax: ``VOTE_ALL dst, value``
+
+  Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
+
+
+.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
+
+  Syntax: ``VOTE_EQ dst, value``
+
+  Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
+
 
-Vote opcodes
-^^^^^^^^^^^^
+.. opcode:: BALLOT - Lanemask of whether the value is set in each active
+            invocation
 
-These opcodes compare the given value across the shader invocations
-running in the current SIMD group. The details of exactly which
-invocations get compared are implementation-defined, and it would be a
-correct implementation to only ever consider the current thread's
-value. (i.e. SIMD group of 1). The argument is treated as a boolean.
+  Syntax: ``BALLOT dst, value``
 
-.. opcode:: VOTE_ANY - Value is set in any of the current invocations
+  Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
 
-.. opcode:: VOTE_ALL - Value is set in all of the current invocations
+  When the argument is a constant true, this produces a bitmask of active
+  invocations. In fragment shaders, this can include helper invocations
+  (invocations whose outputs and writes to memory are discarded, but which
+  are used to compute derivatives).
 
-.. opcode:: VOTE_EQ - Value is the same in all of the current invocations
+
+.. opcode:: READ_FIRST - Broadcast the value from the first active
+            invocation to all active lanes
+
+  Syntax: ``READ_FIRST dst, value``
+
+  Example: ``READ_FIRST TEMP[0], TEMP[1]``
+
+
+.. opcode:: READ_INVOC - Retrieve the value from the given invocation
+            (need not be uniform)
+
+  Syntax: ``READ_INVOC dst, value, invocation``
+
+  Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
+
+  invocation.x controls the invocation number to read from for all channels.
+  The invocation number must be the same across all active invocations in a
+  sub-group; otherwise, the results are undefined.
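
The inter-lane opcodes added in this hunk can be modelled on the CPU by treating a subgroup as a list of per-lane values plus a list of active lane indices. The sketch below is illustrative only. The function names are hypothetical, and it makes two assumptions the text does not state: VOTE_EQ compares the boolean interpretation of the value, and the "first" active invocation is the one with the lowest lane index.

```python
def vote_any(values, active):
    """True if the value is set in any active lane."""
    return any(bool(values[lane]) for lane in active)


def vote_all(values, active):
    """True if the value is set in every active lane."""
    return all(bool(values[lane]) for lane in active)


def vote_eq(values, active):
    """True if all active lanes agree (assumption: compared as booleans)."""
    return len({bool(values[lane]) for lane in active}) <= 1


def ballot(values, active):
    """Lanemask with bit N set when lane N is active and its value is set."""
    mask = 0
    for lane in active:
        if values[lane]:
            mask |= 1 << lane
    return mask


def read_first(values, active):
    """Value of the first active lane (assumption: lowest index).

    `active` must not be empty.
    """
    return values[min(active)]
```

Passing a constant true value to ``ballot`` returns the mask of active lanes, which matches the BALLOT description above.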
 
 
 Explanation of symbols used
@@ -3102,6 +3255,11 @@ For geometry shaders, this semantic label indicates that an output
 contains the index of the viewport (and scissor) to use.
 This is an integer value, and only the X component is used.
 
+If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
+supported, then this semantic label can also be used in vertex or
+tessellation evaluation shaders, respectively. Only the value written in the
+last vertex processing stage is used.
+
 
 TGSI_SEMANTIC_LAYER
 """""""""""""""""""
@@ -3111,6 +3269,11 @@ contains the layer value to use for the color and depth/stencil surfaces.
 This is an integer value, and only the X component is used.
 (Also known as rendertarget array index.)
 
+If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
+supported, then this semantic label can also be used in vertex or
+tessellation evaluation shaders, respectively. Only the value written in the
+last vertex processing stage is used.
+
 
 TGSI_SEMANTIC_CULLDIST
 """"""""""""""""""""""
@@ -3164,22 +3327,33 @@ TGSI_SEMANTIC_SAMPLEID
 """"""""""""""""""""""
 
 For fragment shaders, this semantic label indicates that a system value
-contains the current sample id (i.e. gl_SampleID).
-This is an integer value, and only the X component is used.
+contains the current sample id (i.e. gl_SampleID) as an unsigned int.
+Only the X component is used.  If per-sample shading is not enabled,
+the result is (0, undef, undef, undef).
 
 TGSI_SEMANTIC_SAMPLEPOS
 """""""""""""""""""""""
 
-For fragment shaders, this semantic label indicates that a system value
-contains the current sample's position (i.e. gl_SamplePosition). Only the X
-and Y values are used.
+For fragment shaders, this semantic label indicates that a system
+value contains the current sample's position as float4(x, y, undef, undef)
+in the render target (i.e. gl_SamplePosition) when per-sample shading
+is in effect.  Position values are in the range [0, 1] where 0.5 is
+the center of the fragment.
 
 TGSI_SEMANTIC_SAMPLEMASK
 """"""""""""""""""""""""
 
-For fragment shaders, this semantic label indicates that an output contains
-the sample mask used to disable further sample processing
-(i.e. gl_SampleMask). Only the X value is used, up to 32x MS.
+For fragment shaders, this semantic label can be applied to either a
+shader system value input or output.
+
+For a system value, the sample mask indicates the set of samples covered by
+the current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
+
+For an output, the sample mask is used to disable further sample processing.
+
+For both, the register type is uint[4] but only the X component is used
+(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
+to 32x MSAA is supported).
 
 TGSI_SEMANTIC_INVOCATIONID
 """"""""""""""""""""""""""
@@ -3322,6 +3496,57 @@ For compute shaders, this semantic indicates the (x, y, z) coordinates of the
 current thread inside of the block.
 
 
+TGSI_SEMANTIC_SUBGROUP_SIZE
+"""""""""""""""""""""""""""
+
+This semantic indicates the subgroup size for the current invocation. This is
+an integer of at most 64, as it indicates the width of lanemasks. It does not
+depend on the number of invocations that are active.
+
+
+TGSI_SEMANTIC_SUBGROUP_INVOCATION
+"""""""""""""""""""""""""""""""""
+
+The index of the current invocation within its subgroup.
+
+
+TGSI_SEMANTIC_SUBGROUP_EQ_MASK
+""""""""""""""""""""""""""""""
+
+A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
+``1 << subgroup_invocation`` in arbitrary precision arithmetic.
+
+
+TGSI_SEMANTIC_SUBGROUP_GE_MASK
+""""""""""""""""""""""""""""""
+
+A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
+``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
+in arbitrary precision arithmetic.
+
+
+TGSI_SEMANTIC_SUBGROUP_GT_MASK
+""""""""""""""""""""""""""""""
+
+A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
+``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
+in arbitrary precision arithmetic.
+
+
+TGSI_SEMANTIC_SUBGROUP_LE_MASK
+""""""""""""""""""""""""""""""
+
+A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
+``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
+
+
+TGSI_SEMANTIC_SUBGROUP_LT_MASK
+""""""""""""""""""""""""""""""
+
+A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
+``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
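
The five subgroup mask formulas above can be checked against each other with a short Python sketch. The function name and the dictionary keys are hypothetical. Python integers are arbitrary precision, which matches the "arbitrary precision arithmetic" wording. Note that the LT mask formula ``(1 << subgroup_invocation) - 1`` selects the bits below the invocation index.

```python
def subgroup_masks(subgroup_size, subgroup_invocation):
    """Return the EQ/GE/GT/LE/LT lanemasks for one invocation."""
    n, i = subgroup_size, subgroup_invocation
    if not 0 <= i < n <= 64:
        raise ValueError("expected 0 <= invocation < size <= 64")
    return {
        "eq": 1 << i,
        "ge": ((1 << (n - i)) - 1) << i,
        "gt": ((1 << (n - i - 1)) - 1) << (i + 1),
        "le": (1 << (i + 1)) - 1,
        "lt": (1 << i) - 1,
    }
```

For a subgroup of size 8 and invocation 3, the GE and LT masks partition the eight lanes, and so do the GT and LE masks.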
+
+
 Declaration Interpolate
 ^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -3507,12 +3732,12 @@ If set to a non-zero value, this turns on point mode for the tessellator,
 which means that points will be generated instead of primitives.
 
 NUM_CLIPDIST_ENABLED
-""""""""""""""""
+""""""""""""""""""""
 
 How many clip distance scalar outputs are enabled.
 
 NUM_CULLDIST_ENABLED
-""""""""""""""""
+""""""""""""""""""""
 
 How many cull distance scalar outputs are enabled.
 
@@ -3530,13 +3755,33 @@ Which shader stage will MOST LIKELY follow after this shader when the shader
 is bound. This is only a hint to the driver and doesn't have to be precise.
 Only set for VS and TES.
 
-TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
-"""""""""""""""""""""""""""""""""""""""""""""""""""
+CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
+"""""""""""""""""""""""""""""""""""""
 
 Threads per block in each dimension, if known at compile time. If the block size
 is known all three should be at least 1. If it is unknown they should all be set
 to 0 or not set.
 
+MUL_ZERO_WINS
+"""""""""""""
+
+The MUL TGSI operation (FP32 multiplication) will return 0 if either
+operand is equal to 0. That means that 0 * Inf = 0. This
+should be set the same way for an entire pipeline. Note that this
+applies not only to the literal MUL TGSI opcode, but all FP32
+multiplications implied by other operations, such as MAD, FMA, DP2,
+DP3, DP4, DPH, DST, LOG, LRP, XPD, and possibly others. If there is a
+mismatch between shaders, then it is unspecified whether this behavior
+will be enabled.
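
To make the MUL_ZERO_WINS rule concrete, the sketch below contrasts it with ordinary IEEE multiplication, where 0 * Inf is NaN. The function name is hypothetical, Python floats are 64-bit rather than FP32, and returning positive zero is an assumption, since the text does not say which sign the zero result carries.

```python
import math


def mul_zero_wins(a, b):
    """Multiply, but let a zero operand force a zero result."""
    if a == 0.0 or b == 0.0:
        return 0.0  # assumption: sign of the zero is not specified
    return a * b
```

With this rule ``mul_zero_wins(0.0, math.inf)`` is 0.0, whereas the plain product ``0.0 * math.inf`` is NaN.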
+
+FS_POST_DEPTH_COVERAGE
+""""""""""""""""""""""
+
+When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
+that have failed the depth/stencil tests. This is only valid when
+FS_EARLY_DEPTH_STENCIL is also specified.
+
+
 Texture Sampling and Texture Formats
 ------------------------------------