It's no longer used outside anv_batch_chain, so we certainly don't need
to be exporting it. Inside anv_batch_chain it's only used twice, and it
can be replaced by a single line, so there's really no point.
Previously, our location handling was focused on either no location
(usually implicit 0) or a builtin. Unfortunately, if you gave it a
location, it would blow it away and just not care. This worked fine with
crucible and our meta shaders but didn't work with the CTS. The new code
uses the "data.explicit_location" field to denote that it has a "final"
location (usually from a builtin) and, otherwise, the location is
considered to be relative to the base for that shader stage.
This allows us to allocate from either side of the block pool in a
consistent way. If you use the previous block_pool_alloc function, you
will get offsets from the start of the pool as normal. If you use the new
block_pool_alloc_back function, you will get a negative index that
corresponds to something in the "back" of the pool.
This has the unfortunate side-effect of making it so that we can't have a
block pool bigger than 1GB. However, that's unlikely to happen and, for
the sake of bi-directional block pools, we need negative offsets.
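A rough sketch of the resulting convention (assuming, hypothetically,
that the pool's map pointer sits at its center so one expression serves
both directions):

static void *
block_pool_map(struct anv_block_pool *pool, int32_t offset)
{
   /* Front allocations yield offset >= 0, back allocations offset < 0;
    * both resolve relative to the center of the pool. */
   return pool->map + offset;
}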
We don't have any locking issues yet because we use the pool size itself as
a mutex in block_pool_alloc to guarantee that only one thread is resizing
at a time. However, we are about to add support for growing the block pool
at both ends. This introduces two potential races:
1) You could have two block_pool_alloc() calls that both try to grow the
block pool, one from each end.
2) The relocation handling code will now have to think about not only the
bo that we use for the block pool but also the offset from the start of
that bo to the center of the block pool. It's possible that the block
pool growing code could race with the relocation handling code and get
a bo and offset out of sync.
Grabbing the device mutex solves both of these problems. Thanks to (2), we
can't really do anything more granular.
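A minimal sketch of the locking, assuming the pool keeps a pointer back
to its device:

static void
block_pool_grow(struct anv_block_pool *pool)
{
   /* Serializes growth from both ends and keeps the relocation code's
    * view of the (bo, center offset) pair consistent. */
   pthread_mutex_lock(&pool->device->mutex);
   /* ... remap the bo and update the center offset ... */
   pthread_mutex_unlock(&pool->device->mutex);
}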
Partially implement the below functions for 3D images:
vkCmdCopyBufferToImage
vkCmdCopyImageToBuffer
vkCmdCopyImage
vkCmdBlitImage
Not all features work, and there is much room for performance
improvement. Beware that vkCmdCopyImage and vkCmdBlitImage are untested.
Crucible proves that vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer
work, though.
Supported:
- copy regions with z offset
Unsupported:
- copy regions with extent.depth > 1
Crucible test results on master@d452d2b are:
pass: func.miptree.r8g8b8a8-unorm.*.view-3d.*
pass: func.miptree.d32-sfloat.*.view-3d.*
fail: func.miptree.s8-uint.*.view-3d.*
The field's meaning depends on SURFACE_STATE::SurfaceType.
Make that correlation explicit by switching on VkImageType.
For good measure, add some PRM quotes too.
Calling vkCreateImage() with VK_IMAGE_TYPE_3D now succeeds and computes
the surface layout correctly. However, 3D images do not yet work for
many other Vulkan entrypoints.
Previously, we simply had a big blob of stuff for "driver constants". Now,
we have a very specific data structure that contains the driver constants
that we care about.
There were merge conflicts in spirv.h that got missed because they were in
a comment and so it still compiled. This gets rid of them and we should be
on-par with upstream spirv->nir.
Pulling in libwayland causes undefined symbols in applications that are
linked against vulkan alone. Ideally, we would like to dlopen a platform
support library or something like that. For now, this works and should get
crucible running again.
We do this for two reasons: First, because it allows us to simplify WSI and
compiling in/out support for a particular platform is as simple as calling
or not calling the platform-specific init function. Second, the
implementation gives us a place for a given chunk of the WSI to stash
stuff in the instance.
Unfortunately, this is a very large commit and removes the old LunarG WSI
extension. This is because there are a couple of entrypoints that have the
same name between the two extensions so implementing them both is
impractical.
Support is still incomplete, but this is enough to get vkcube up and going
again.
Now that we don't compile GLSL, we can roll back a few more hacks and
unexport some things from the backend compiler.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Since we switched away from calling brwCreateContext() there's a bit of
hacky support we can now delete. This reduces our diff to upstream master.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We used to always just do a one-level fallback from genX_* to anv_*
entry points. That worked for gen7 and gen8 where all entry points were
either different or could be made anv_* entry points (e.g.
anv_CreateDynamicViewportState). We're about to add gen9 and now need to
be able to fall back to gen8 entry points for most things.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll change the dispatch mechanism again in a later commit. Stop using
the driver_layer function pointers and just use the public entry points.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Print out file and line number and translate the error code to the
symbolic name.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Create R8_UINT VkAttachmentView and VkImageView for the stencil data.
This fixes a crash, but the pixels in the destination image are still
incorrect. They are not properly tiled.
Fixes crashes in Crucible tests func.miptree.s8-uint.aspect-stencil.* as
of crucible-7471449. Test results improve 'lost' -> 'fail'.
In Vulkan, VertexId and InstanceId will be zero-based and new intrinsics,
VertexIndex and InstanceIndex, will be added for non-zero-based. See
also Khronos bug #14255.
From now on, the majority of SPIR-V improvements should happen on the spirv
branch which will also be public. It will be frequently merged into the
vulkan driver.
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node. Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node. By just resetting to after_instr, we
avoid all these problems.
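A sketch of the idea (this matches the builder insert; exact call sites
may differ):

static inline void
nir_builder_instr_insert(nir_builder *b, nir_instr *instr)
{
   nir_instr_insert(b->cursor, instr);

   /* Whatever kind of cursor we started with, always advance to just
    * after the new instruction so the next insert lands after it. */
   b->cursor = nir_after_instr(instr);
}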
I've been promised in a bug that this will be fixed in a future version of
the header. However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
We don't actually use it to create all the instructions but we do use it
for insertion always. This should make things far more consistent for
implementing extended instructions.
We do control-flow handling as a two-step process. The first step is to
walk the instructions list and record various information about blocks and
functions. This is where the actual nir_function_overload objects get
created. We also record the start/stop instruction for each block. Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
Previously, they were hidden behind a #ifdef __cplusplus so C wouldn't find
them. This commit simply moves the #ifdef and adds #ifdefs around
constructors.
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value. The
caller is then responsible for setting the various fields.
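A hypothetical sketch of the pattern (names assumed):

static struct vtn_value *
vtn_push_value(struct vtn_builder *b, uint32_t value_id,
               enum vtn_value_type value_type)
{
   struct vtn_value *val = &b->values[value_id];

   assert(value_id < b->value_id_bound);
   assert(val->value_type == vtn_value_type_invalid);

   val->value_type = value_type;
   return val;   /* the caller fills in the remaining fields */
}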
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
As Glenn did for finalize_loop, we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier problems we've seen with
shaders going off the end of memory.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.
See commit 3b7f683f.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.
This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/
v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
surface state open coded because it seems to go better with the existing code.
v3:
Use an inline function only in gen8_emit_texture_surface_state() (Matt).
Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The glsl->tgsi converter does some temporary register reduction
however in profiling shader-db this shows up quite highly,
so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of looking this up lots, let's just cache it in the instruction
translation up front. I just noticed this function was high in a profile
of shader-db on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said properties in
the selector as read-only within r600_shader_from_tgsi().
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Note that 'geometry shader properties' should be carried in the
selector state rather than the shader state in any case.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.
The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.
With that fixed, we can calculate interference more accurately.
total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs: 346282 -> 342880 (-0.98%)
helped: 1552
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
equals() returns false for registers with different types; using it
isn't appropriate to determine whether an instruction is overwriting a
register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed when debugging things that led to the next patch.
On G45 (and presumably ILK) this helps register coalescing:
total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Zero sized uniforms can exist in the list, but they don't get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them. This was previously fixed in:
commit: 781dc7c0e1.
However,
commit: 259f7291de
removed the fix.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should returned. None of the other binding point
queries check for deletion in another context.
Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as texture queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Stencil buffers have strange pitch. The PRM says:
The pitch must be set to 2x the value computed based on width,
as the stencil buffer is stored with two rows interleaved.
anv_cmd_buffer_clear_attachments() skipped the clear renderpass if no
color attachments needed to be cleared, even if a depth attachment
needed to be cleared.
In anv_depth_stencil_view, replace the members
bo
depth_offset
depth_stride
depth_format
depth_qpitch
stencil_offset
stencil_stride
stencil_qpitch
with the single member
const struct anv_image *image
The removed members duplicated data in anv_image::depth_surface and
anv_image::stencil_surface.
The format of the view itself and of the view's image may differ.
Moreover, if the view's format has no depth aspect but the image's
format does, we must not program the depth buffer. Ditto for stencil.
Shaders that contain instruction data after an instruction with EOP
could end up having that data parsed as an instruction, leading to
various crashes and asserts in SB, as it gets very confused if it sees,
for instance, a loop start instruction jumping off to some random point.
Add a couple of asserts, and print EOP bit if set in old asm printer.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.
zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
The NIR cursor API is exactly what we want for the builder's insertion
point. This simplifies the API, the implementation, and is actually
more flexible as well.
This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.
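In code terms, setting the insertion point is now a single assignment,
e.g. (a sketch):

nir_builder b;
nir_builder_init(&b, impl);
b.cursor = nir_after_cf_node(&loop->cf_node);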
v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point. It then reworks the existing API to
simply be a wrapper around that for compatibility.
This largely involves moving the existing code into a new function.
Suggested by Connor Abbott.
v2: Make the legacy functions static inline in nir.h (requested by
Connor Abbott).
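For instance, the legacy before/after entry points reduce to wrappers
like:

static inline void
nir_instr_insert_before(nir_instr *instr, nir_instr *before)
{
   nir_instr_insert(nir_before_instr(instr), before);
}

static inline void
nir_instr_insert_after(nir_instr *instr, nir_instr *after)
{
   nir_instr_insert(nir_after_instr(instr), after);
}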
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
We want to use this for normal instruction insertion too, not just
control flow. Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.
Previously, we only checked this when the new instruction being inserted
was a jump. This is a red herring - inserting *any* kind of instruction
after a jump is illegal.
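A sketch of the tightened check (the wrapper name is hypothetical):

static void
add_instr_after(nir_instr *prev, nir_instr *instr)
{
   /* Jumps must be the last instruction in their block, so nothing of
    * any type may be inserted after one. */
   assert(prev->type != nir_instr_type_jump);

   exec_node_insert_after(&prev->node, &instr->node);
}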
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices. But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.
Before, we failed an assertion:
state_tracker/st_glsl_to_tgsi.cpp:4900:
Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.
This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things. In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().
This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.
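For illustration (flag names from the driver, control flow simplified):

const char *debug = getenv("INTEL_DEBUG");           /* "vec4vs" */
if (strstr(debug, "vec4vs")) flags |= DEBUG_VEC4VS;  /* matches */
if (strstr(debug, "vs"))     flags |= DEBUG_VS;      /* also matches! */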
We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
v2: use ARB_texture_multisample enable bit
Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.
v2: reintroduce brw.
don't include functional changes.
don't adjust function parameters or create a new function.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.
Because the original definition was bytes per block divided by block
width, all references to the mipmap width must be divided by the block
width. This keeps the address calculation formulas consistent. For
example, the units for miptree_level x_offset and miptree total_width
have changed from pixels to blocks.
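A sketch of the new convention (variable names hypothetical):

/* cpp now means bytes per block for compressed formats, so widths
 * entering the address math are first converted from pixels to
 * blocks. */
unsigned width_blocks = ALIGN_NPOT(width_px, block_width) / block_width;
unsigned row_pitch = width_blocks * cpp;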
v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into separate commit.
simplify cpp assignment in copy_image_with_blitter().
update miptree width and offset variables in: intel_miptree_copy_slice(),
intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.
v2: add newlines to separate declarations, statements, and comments.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.
v2: handle texture arrays and LDR-only systems.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.
v2: add greater than 0 assertion (Anuj).
convert the macro to a static inline function.
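A sketch of what the helper looks like after v2 (valid for any
alignment > 0, power of two or not):

static inline uintptr_t
ALIGN_NPOT(uintptr_t value, int32_t alignment)
{
   assert(alignment > 0);
   return ((value + alignment - 1) / alignment) * alignment;
}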
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
The comments for ALIGN and ROUND_DOWN_TO both require that the
alignment value passed into the macro be a power of two. Using software
assertions verifies this to be the case.
v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indentation (Brian).
v4: add greater than zero requirement (Anuj).
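A sketch of the resulting helper (per v2/v4):

static inline uintptr_t
ALIGN(uintptr_t value, int32_t alignment)
{
   assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~((uintptr_t)alignment - 1);
}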
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.
ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.
v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This is necessary to initialize the gl_texture_image struct.
From the KHR_texture_compression_astc_ldr spec:
"Added to Section 3.8.6, Compressed Texture Images
Add the tokens specified above to Table 3.16, Compressed Internal Formats.
In all cases, the base internal format will be RGBA. The encoding allows
images to be encoded with fewer channels, but this is always presented as
RGBA to the sampler."
v2. use _mesa_is_astc_format().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
The ASTC spec was revised as follows:
Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
commands accepting ASTC format tokens in the New Tokens section [...].
Support only exists in the HDR submode:
Add a second new column "3D Tex." which is empty for all non-ASTC
formats. If only the LDR profile is supported by the implementation,
this column is also empty for all ASTC formats. If both the LDR and HDR
profiles are supported only, this column is checked for all ASTC
formats.
LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.
v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.
v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
v2: correct the spelling of the sRGB variants.
remove spaces around "=" when setting the enum value.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().
v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
only used for generating mipmaps - something ASTC formats do not support
due to lack of online compression.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]
This may print too much information that we might not understand yet,
but some of the bits are very useful.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This makes writing a good IB parser a lot easier.
It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Instruction encoding isn't needed in Mesa.
The border color address registers were duplicated.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
v2: lots of improvements
This is like identity or trace, but simpler. It doesn't wrap most states.
Run with:
GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.
If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
/home/$username/dd_dumps/$processname_$pid_$index.
Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.
You can also do:
GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.
Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.
v2: drop pipe-loader changes, drop SConscript
rename dd.h -> dd_pipe.h
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.
v2: remove extra is_gles31() checks (suggested by Ilia)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE is supported by the
glGetProgramiv function.
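The query then works as the spec describes, e.g. for a linked compute
program "prog":

GLint local_size[3];
glGetProgramiv(prog, GL_COMPUTE_WORK_GROUP_SIZE, local_size);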
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
According to the OpenGL ES 3.1 specification chapter 17,
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
are available for glGetIntegeri_v.
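For example:

GLint max_count_x;
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 0, &max_count_x);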
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date: Fri Jul 31 10:26:36 2015 -0700
mesa/formats: remove compressed formats from matching function
caused a regression in my CTS testing; this looks like a clear
thinko.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
gen8_graphics_pipeline_create had a bunch of stuff in it that's already set
up by anv_pipeline_init. The duplication was causing double-initialization
of a state stream and made valgrind very angry.
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
getting rid of a misguided comment in the process (implement with table...)
- float division (replaced with a mul by 2 raised to the negated exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that we keep the double math for now (float x + 0.5), as the
results may otherwise differ.
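As an illustration of the exponent manipulation (a sketch assuming
IEEE-754 single precision, not the exact code):

/* Build 2^n directly from the float bit pattern instead of calling
 * exp2f(); valid for normal results, i.e. -126 <= n <= 127. */
static inline float
exp2_int(int n)
{
   union { float f; uint32_t u; } v;
   v.u = (uint32_t)(n + 127) << 23;   /* biased exponent, zero mantissa */
   return v.f;
}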
Acked-by: Marek Olšák <marek.olsak@amd.com>
GetTexImage can read to stencil8 but only from
stencil or depth-stencil textures.
This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This field in surface state is a bool; WriteOnlyCache is an enum from
GEN8.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Enables _mesa_target_can_be_compressed to return the appropriate GL
error depending on its inputs. Use the parameter to return the
appropriate GL error for ETC2 formats on GLES3.
Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.
v2. Since the exhaustive switch is removed, add a gtest to ensure
all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
fix formatting and misc improvements (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhaustive switch for
_mesa_uncompressed_format_to_type_and_comps().
v2. indentation and conditional inclusion fixes (Chad).
allow tests to continue running if any format fails
and display the failing format name.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.
Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
You wouldn't think the TileWalk mode matters when TiledSurface is
false. However, it has to be TILEWALK_XMAJOR. Make it so.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assigning locations of pull constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect. This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform. This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate. This should allow us
to more easily make decisions about what to push.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path. In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load. Previously, it was completely flattened by
nir_lower_io which made things difficult. This adds more information to
the intrinsic to make this easier for us.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.
In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play. The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.
This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer. This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.
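A sketch of the interface (exact signature assumed):

void nir_lower_io(nir_shader *shader,
                  int (*type_size)(const struct glsl_type *type));

so that e.g. the scalar backend can pass its own type_size function.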
v2 (Jason Ekstrand):
- One side-effect of passing in a function pointer is that nir_lower_io is
now aware of and properly allocates space for image uniforms, allowing
us to drop hacks in the backend
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.
Should avoid i965 backend assertions in future patches Jason and I are
working on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The commit:
commit b49371b8ed
Author: Connor Abbott <cwabbott0@gmail.com>
AuthorDate: Tue Jul 21 19:54:18 2015 -0700
nir: move control flow modification to its own file
split out some control flow related APIs into a separate header, but did
not update drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The commit:
commit 8e0d4ef341
Author: Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Thu Aug 6 18:18:40 2015 -0700
nir: Delete the nir_function_impl::start_block field.
removed the start_block field without fixing up drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Connor introduced this helper recently; we should use it here too.
I had to move the function earlier in the file for it to be available.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
With all the previous commits in place, we can now drop in support for
multiple platforms. First up is gen7 (Ivybridge).
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This is a more logical place for it, between geometry front end state
and pixel backend state.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This commit moves all occurrences of gen8 specific state into a gen8
substruct. This clearly identifies the state as gen8 specific and
prepares for adding gen7 state structs. In the process we also rename
the field names to exactly match the command or state packet name,
without the 3DSTATE prefix, eg:
3DSTATE_VF -> gen8.vf
3DSTATE_WM_DEPTH_STENCIL -> gen8.wm_depth_stencil
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The clear pipeline didn't have a vertex shader and relied on the clear
shader being hardcoded by the compiler to accept one attribute. This
necessitated a few special cases in the 3DSTATE_VS setup. Instead,
always provide a vertex shader, even if we disable VS dispatch.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Many of these fields aren't used for buffer surfaces, so leave them
out for brevity.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This adds VALIGN_2 and VALIGN_4 defines for IVB and HSW
RENDER_SURFACE_STATE.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
I'd prefer to move anv_CreateAttachmentView() as well, but it's a little
too much generic code to just duplicate for each gen. For now, we'll
add an anv_color_attachment_view_init() to dispatch to the gen specific
implementation, which we then call from anv_CreateAttachmentView().
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll probably want to move some code back into a shared init function,
but this gets one GEN8 surface state initialization out of anv_image.c.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We need this for generating surface state on the fly for dynamic buffer
views.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We're going to have to do this differently for earlier gens, so let's
do it in place only.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Since the extra dword in MI_BATCH_BUFFER_START added in gen8 is at the
end of the struct, we can emit the gen8 packet on all gens as long as we
set the instruction length correctly.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We used to use a manual GEN8_MI_BATCH_BUFFER_START_pack() call, but this
refactors the code to use anv_batch_emit().
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll organize gen specific code in three files per gen: pipeline,
cmd_buffer and state, eg:
gen8_cmd_buffer.c
gen8_pipeline.c
gen8_state.c
where gen8_cmd_buffer.c holds all vkCmd* entry points, gen8_pipeline.c
all gen specific code related to pipeline building and remaining state
code (sampler, surface state, dynamic state) in gen8_state.c.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These will help us do a number of things, including:
- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:
if (foo) {
...
}
if (!foo) {
...
}
with:
if (foo) {
...
} else {
...
}
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a helper that will be shared between the new control flow
insertion and modification code.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().
In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and,
later in the series, nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurrence, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accommodate this, and add the missing
support for removing phi node sources.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't need to store a pointer to it
explicitly. Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.
Thanks to Connor Abbott for suggesting this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.
v2. include compressed format cases to avoid gcc warnings (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of
commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Wed Apr 22 11:43:50 2015 -0700
i965/state: Emit pipeline select when changing pipelines
However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.
Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This reverts commit 567394112d.
It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.
v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.
v2: Also implement for BDW+
v3 [by Ben]: Remove 10.5 tag. Too late.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
During vkCreateRenderPass, count the number of clear ops and store them
in new members of anv_render_pass:
uint32_t num_color_clear_attachments
bool has_depth_clear_attachment
bool has_stencil_clear_attachment
Caching these 8 bytes (including padding) reduces the number of times
that anv_cmd_buffer_clear_attachments needs to loop over the pass's
attachments.
In struct anv_saved_state, each member's type was a pointer to an Anvil
struct and each member's name was prefixed with "old" except cb_state,
which was a Vulkan handle whose name lacked "old".
The source image's format was incorrectly used for both the source view
and destination view. For vkCmdCopyImage to correctly translate formats,
the destination view's format must be that of the destination image.
Change type of anv_render_pass_attachment::format from VkFormat to const
struct anv_format*. This eliminates the repetitive lookups into the
VkFormat -> anv_format table when looping over attachments during
anv_cmd_buffer_clear_attachments().
Stop creating a temporary VkImageCreateInfo with an overridden
format=VK_FORMAT_S8_UINT. Instead, just pass the format override
directly to anv_image_make_surface().
Stencil formats are often a special case. To reduce the number of lookups
into the VkFormat-to-anv_format translation table when working with
stencil, expose the table's entry for VK_FORMAT_S8_UINT as global
variable anv_format_s8_uint.
Change type of anv_surface_view::format from VkFormat to const struct
anv_format*. This reduces the number of lookups in the VkFormat ->
anv_format table.
This moves the translation of VkFormat to anv_format from
anv_fill_buffer_surface_state() to its caller.
A prep commit to reduce more VkFormat -> anv_format translations.
Store the original VkFormat as anv_format::vk_format. This will be used
to reduce format indirection, such as lookups into the VkFormat ->
anv_format translation table.
We need to make sure we use the VkImage infrastructure for creating
dmabuf images.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We already query the device in various ways here and we can just also
get the aperture size. This avoids keeping an extra drm fd open
during the life time of the driver.
Also, we need to use explicit 64 bit types for the aperture size, not
size_t.
This lets us hit an assert if we exceed the block pool size instead of
GPU hanging.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The anv_block_pool data structure suffered from the exact same race as the
state pool. Namely, that the uniqueness of the blocks handed out depends
on the next_block value increasing monotonically. However, this invariant
did not hold thanks to our block "return" concept.
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything. In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus. If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:
A: Get an offset (next == end)
B: Get an offset (next > end)
A: Resize the pool (now next < end by a lot)
C: Get an offset (next < end)
B: Return the over-allocated offset
D: Get an offset
in which case D will get the same offset as C. The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly and threads that over-allocate don't return anything and
just futex-wait. Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.
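A rough sketch of the resulting allocation loop (the field and helper names
here are assumptions, and futex_wait()/futex_wake() stand in for thin
wrappers around the futex syscall):

   for (;;) {
      uint32_t offset = __sync_fetch_and_add(&pool->next, pool->state_size);
      if (offset + pool->state_size <= pool->end)
         return offset;                     /* fast path */
      if (offset == pool->end) {
         /* This thread won the race to grow the pool: grab a fresh block,
          * take its first state, publish the new next/end values, and wake
          * everyone who over-allocated. */
         uint32_t block = anv_block_pool_alloc(pool->block_pool);
         pool->next = block + pool->state_size;
         pool->end = block + pool->block_size;
         futex_wake(&pool->end, INT_MAX);
         return block;
      }
      /* Over-allocated: don't "return" the bogus offset; just wait for the
       * winner to update end, then retry. */
      futex_wait(&pool->end, pool->end);
   }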
We have pools, so we should be using them. Also, I think this will help
keep valgrind from getting confused when we end up fighting with system
allocations such as those from malloc/free and mmap/munmap.
Figuring out whether or not to do a copy requires knowing the length of the
final batch_bo. This gets set by anv_batch_bo_finish so we have to do it
afterwards. Not sure how this was even working before.
Previously, the command buffer implementation was split between
anv_cmd_buffer.c and anv_cmd_emit.c. However, this naming convention was
confusing because none of the Vulkan entrypoints for anv_cmd_buffer were
actually in anv_cmd_buffer.c. This changes it so that anv_cmd_buffer.c is
what you think it is and the internals are in anv_batch_chain.c.
Previously, the caller of emit_state_base_address was doing this. However,
putting it directly in emit_state_base_address means that we'll never
forget the flush at the cost of one PIPE_CONTROL at the top every batch
(that should do nothing since the kernel just flushed for us).
This is more generic and doesn't imply that it emits MI_BATCH_BUFFER_END.
While we're at it, we'll move NOOP adding from bo_finish to
end_batch_buffer.
Instead of walking the list of batch and surface buffers, we simply keep
track of all known batch and surface buffers as we build the command
buffer. Then we use this new list to construct the validate list.
The algorithm we used previously required us to call add_bo in a particular
order to guarantee that we got the initial batch buffer as the last element
in the validate list. The new algorithm does a recursive walk over the
buffers and then re-orders the list. This should be much more robust as we
start to add circular dependencies in the relocations.
Before, we were doing this thing where we had one big relocation list for
the whole command buffer and each subbuffer took a chunk out of it. Now,
we store the actual relocation list in the anv_batch_bo. This comes at the
cost of more small allocations but makes a lot of things simpler.
Previously anv_batch.relocs was an actual relocation list. However, this
is limiting if the implementation of the batch wants to change the
relocation list as the batch progresses.
This update fixes cases where a 48-bit address field was split into
two parts:
   __gen_address_type MemoryAddress;
   uint32_t MemoryAddressHigh;
which caused this pack code to be generated:
   dw[1] =
      __gen_combine_address(data, &dw[1], values->MemoryAddress, dw1);
   dw[2] =
      __gen_field(values->MemoryAddressHigh, 0, 15) |
      0;
which breaks for addresses above 4G.
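With the fix, the two halves are handled as one 64-bit write, so the
generated code looks more like this (a sketch; the exact emitted names
differ per command):

   const uint64_t addr =
      __gen_combine_address(data, &dw[1], values->MemoryAddress, 0);
   dw[1] = addr;
   dw[2] = addr >> 32;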
This update also fixes arrays of structs in commands and structs, for
example, we now have:
   struct GEN8_BLEND_STATE_ENTRY Entry[8];
and the pack functions now write all dwords in the packet, making
valgrind happy.
Finally, we would try to pack 64 bits of blend state into a uint32_t -
that's also fixed now.
Previously, we were crawling through the anv_cmd_buffer data structure to
pull out batch buffers and things. This meant that every time something in
anv_cmd_buffer changed, we broke aub dumping. However, aub dumping should
just dump the stuff the kernel knows about so we really don't need to be
crawling driver internals.
This used to happen magically in cmd_buffer_new_surface_state_bo. However,
according to Ken, STATE_BASE_ADDRESS is very gen-specific so we really
shouldn't have it in the generic data-structure code.
The CTS passes in NULL names right now. It's not too hard to support that
as just "main". With this, and a patch to vulkancts, we now pass all 6
tests.
Jason started the task by creating anv_cmd_buffer.c and anv_cmd_emit.c.
This patch finishes the task by renaming all other files except
gen*_pack.h and glsl_scraper.py.
This completes the FINISHME to trim unneeded data from anv_buffer_view.
A VkExtent3D doesn't make sense for a VkBufferView.
So remove the member anv_surface_view::extent, and push it up to the two
objects that actually need it, anv_image_view and anv_attachment_view.
This removes nearly all the remaining raw Anvil<->Vulkan casts from the
C source files. (File compiler.cpp still contains many raw casts, and
I plan on ignoring that).
As far as I can tell, the only remaining raw casts are:
anv_attachment_view -> anv_depth_stencil_view
anv_attachment_view -> anv_color_attachment_view
The GLSL layer above is still hacky, so we're really just moving the
hack into GLSL-to-NIR. I'd rather not go all the way and make GLSL
support the Vulkan binding model too, since presumably we'll be
switching to SPIR-V exclusively, and so working on proper GLSL support
will be a waste of time. For now, doing this keeps it working as we add
SPIR-V->NIR support though.
The Vulkan spec does not specify that the free function provided to
CreateInstance must handle NULL properly so we do it in the wrapper. If
this ever changes in the spec, we can delete the extra 2 lines.
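A minimal sketch of that wrapper (the function and member names here are
illustrative, not the driver's actual ones):

   static void
   anv_instance_free(struct anv_instance *instance, void *ptr)
   {
      if (ptr == NULL)
         return;  /* the spec doesn't promise the client's free handles NULL */
      instance->pfnFree(instance->pAllocUserData, ptr);
   }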
While updating vkDestroyObject, I discovered that vkDestroyPass reliably
crashes. That hasn't been an issue yet, though, because it is never
called.
In vkCreateRenderPass:
- Don't allocate empty attachment arrays.
- Ensure that pointers to empty attachment arrays are NULL.
- Store VkRenderPassCreateInfo::subpassCount as
anv_render_pass::subpass_count.
In vkDestroyRenderPass:
- Fix loop bounds: s/attachment_count/subpass_count/
- Don't call anv_device_free on null pointers.
vkDestroyColorAttachmentView
vkDestroyDepthStencilView
These functions are not in the 0.132 header, but adding them will help
us attain the type-safety API updates more quickly.
Oops. My recent commits added new destructors, but forgot to teach
vkDestroyObject about them. They are:
vkDestroyFence
vkDestroyEvent
vkDestroySemaphore
vkDestroyQueryPool
vkDestroyBuffer
This means that now the internal version of glslangValidator is
required. This includes some changes due to the sampler/texture rework,
but doesn't actually enable anything more yet. We also don't yet handle
UBOs correctly, and don't handle matrix stride or row-major/column-major
layouts yet.
During anv_physical_device_init(), we opened the DRM device to do some
queries, then promptly closed it. Now we keep it open for the lifetime
of the anv_physical_device so that we can query it some more during
vkGetPhysicalDevice*Properties() [which will happen in follow-up
commits].
Because in a follow-up patch I need to do some non-trivial teardown on
anv_physical_device. Currently, however, anv_physical_device_finish() is
a no-op that's just called in the right place.
Also, rename function fill_physical_device -> anv_physical_device_init
for symmetry.
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 75784243df)
The Vulkan spec says that pPhysicalDeviceCount is an out parameter if
pPhysicalDevices is NULL; otherwise it's an inout parameter.
Mesa incorrectly treated it unconditionally as an inout parameter, which
could have led to reading uninitialized data.
Don't enumerate devices in vkCreateInstance(). That's where global,
device-independent initialization should happen. Move device enumeration
to the more logical location, vkEnumeratePhysicalDevices().
Function fill_physical_device() has a 'path' parameter, and struct
anv_physical_device has a 'path' member. Sometimes these are used;
sometimes hardcoded "/dev/dri/renderD128" is used instead.
Be consistent. Hardcode "/dev/dri/renderD128" in exactly one location,
during initialization of the physical device.
Before values are pushed or annotated with a name, decoration, etc.,
they need to have an invalid type, NULL name, NULL decoration, etc.
ralloc zeroes everything by accident, so this wasn't an issue in
practice, but we should be explicitly zeroing it.
Unfortunately, this requires some non-trivial changes to the driver. Now
that the primitive restart index isn't given explicitly by the client, we
always use ~0 for everything like D3D does. Unfortunately, our hardware is
awesome and a 32-bit version of ~0 doesn't match any 16-bit values. This
means, we have to set it to either UINT16_MAX or UINT32_MAX depending on
the size of the index type. Since we get the index type from
CmdBindIndexBuffer and the rest of the VF packet from the pipeline, we need
to lazy-emit the VF packet.
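The lazily-emitted packet then just derives the cut index from the bound
index type, roughly like so (a sketch; field names approximate the
generated gen8 headers):

   uint32_t restart_index =
      cmd_buffer->index_type == VK_INDEX_TYPE_UINT16 ? UINT16_MAX : UINT32_MAX;
   anv_batch_emit(&cmd_buffer->batch, GEN8_3DSTATE_VF,
                  .IndexedDrawCutIndexEnable = pipeline->primitive_restart,
                  .CutIndex = restart_index);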
This was missing and was causing the driver to not work with
execlists. Presumably we get a different initial hw context with
execlists enabled, that has sample mask 0 initially.
Set this to 0xffff for now. When we add MS support, we need to take the
value from VkPipelineMsStateCreateInfo::sampleMask.
The new headers use stdbool for enable/disable fields which
implicitly converts expressions like (flags & 8) to 0 or 1.
Also handles MBO (must-be-one) fields by setting them to one,
corrects a bspec typo (_3DPRIM_LISTSTRIP_ADJ -> LINESTRIP) and
makes a few enum values less clashy.
Convert the table from the direct mapping
   VkImageViewType -> SurfaceType
into a mapping to an info struct
   VkImageViewType -> struct anv_image_view_info
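So the table becomes, roughly (a sketch; the struct currently carries just
the surface type, leaving room for more per-view-type info later):

   static const struct anv_image_view_info
   anv_image_view_info_table[] = {
      [VK_IMAGE_VIEW_TYPE_1D]   = { .surface_type = SURFTYPE_1D },
      [VK_IMAGE_VIEW_TYPE_2D]   = { .surface_type = SURFTYPE_2D },
      [VK_IMAGE_VIEW_TYPE_3D]   = { .surface_type = SURFTYPE_3D },
      [VK_IMAGE_VIEW_TYPE_CUBE] = { .surface_type = SURFTYPE_CUBE },
   };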
This accounts for a number of differences between the generated headers and
the hand-written header. Not all reformatting is done in this commit but
it does make the headers much more diffable.
In theory, no functional change.
Structs in the old version were specified as
   typedef struct VkSomeThing_
   {
      type field; // comment
   } VkSomeThing;
However, in the generated headers, you have
   typedef struct {
      type field;
   } VkSomeThing;
This commit also removes some unneeded whitespace.
We should be asserting that the parent decoration didn't hand us
a member if the child decoration did, but different child decorations
may obviously have different members.
Since SSA values now have their own types, it's more convenient to make
'type' only used when we want to look up an actual SPIR-V type, since
we're going to change its type soon to support various decorations that
are handled at the SPIR-V -> NIR level.
Previously, the caller of vtn_handle_type had to handle actually inserting
the type. However, this didn't really work if the type was decorated in
any way.
The way the code currently works is that anv_format::cpp is the cpp of
anv_format::surface_format.
Kristian and I disagree about how the code *should* work. Despite that,
I think it's in our discussion's best interest to document how the code
*currently* works. That should eliminate confusion.
If and when the code begins to work differently, then we'll update the
anv_format comments.
This prepares for upcoming miptree support.
anv_surface is a proxy for color surfaces, depth surfaces, and stencil
surfaces. Embed two instances of anv_surface into anv_image: the
primary surface (color or depth), and an optional stencil surface.
All function signatures that matched this pattern,
   old: f(const VkImageCreateInfo *, const struct anv_image_create_info *)
were rewritten as
   new: f(const struct anv_image_create_info *)
The format table defined cpp = 0 for stencil-only formats. The real cpp
is 1.
When code begins to lie, especially about stencil buffers, code becomes
increasingly fragile as time progresses, and the damage becomes
increasingly hard to undo. (For precedent, see the painful history of
stencil buffer cpp in the git log for gen6 and gen7 in the i965 driver).
Let's undo the stencil buffer cpp lie now to avoid future pain.
In the format table, set cpp = 1 for VK_FORMAT_S8; replace checks for
cpp == 0; and delete all comments about the hack.
From my experience with intel_mipmap_tree.c, I learned that for structs
like anv_image and intel_mipmap_tree, which have sprawling
multi-function construction codepaths, it's easy to mistakenly use
uninitialized struct members during construction.
Let's eliminate the risk of using uninitialized anv_image members during
construction. Fill the struct at the function bottom instead of
piecemeal throughout the constructor.
anv_format::surface_format was incorrect for Vulkan depth formats.
For example, the format table mapped
   VK_FORMAT_D24_UNORM -> .surface_format = D24_UNORM_X8_UINT
   VK_FORMAT_D32_FLOAT -> .surface_format = D32_FLOAT
but should have mapped
   VK_FORMAT_D24_UNORM -> .surface_format = R24_UNORM_X8_TYPELESS
   VK_FORMAT_D32_FLOAT -> .surface_format = R32_FLOAT
The Crucible test func.depthstencil.basic passed despite the bug, but
only because it did not attempt to texture from the depth surface.
The core problem is that RENDER_SURFACE_STATE.SurfaceFormat and
3DSTATE_DEPTH_BUFFER.SurfaceFormat are distinct types. Considering them
as enum spaces, the two enum spaces have incompatible collisions.
Fix this by adding a new field 'depth_format' to struct anv_format.
Refer to brw_surface_formats.c:translate_tex_format() for precedent.
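Sketched against the table's entry syntax, the corrected mappings look
something like this (abbreviated; the real entries carry more fields):

   [VK_FORMAT_D24_UNORM] = { .surface_format = R24_UNORM_X8_TYPELESS,
                             .depth_format   = D24_UNORM_X8_UINT, ... },
   [VK_FORMAT_D32_FLOAT] = { .surface_format = R32_FLOAT,
                             .depth_format   = D32_FLOAT, ... },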
I misinterpreted anv_format::format as a VkFormat. Instead, it is
a hardware surface format (RENDER_SURFACE_STATE.SurfaceFormat). Rename
the field to 'surface_format' to make it unambiguous.
The brw_create_nir function takes a GLSL or ARB shader and turns it into a
NIR shader. The guts of the optimization and lowering code is now split
into a new brw_process_shader function.
When we're done emitting the code for a loop, we need to visit the new
break block, which is the merge block of the current loop, rather than
the old merge block, which is the merge block of the loop containing the
one we just emitted code for.
We compute the right mask and thread width max parameters as part of
pipeline creation and set them accordingly at vkCmdDispatch() and
vkCmdDispatchIndirect() time. These parameters depend only on the local
group size and the dispatch width of the program so we can figure this
out at pipeline create time.
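The computation itself is small; a sketch (the prog_data field names are
assumptions; DIV_ROUND_UP is Mesa's usual rounding macro):

   const uint32_t group_size = prog_data->local_size[0] *
                               prog_data->local_size[1] *
                               prog_data->local_size[2];
   const uint32_t simd_size = prog_data->simd_size;  /* dispatch width */
   pipeline->cs_thread_width_max = DIV_ROUND_UP(group_size, simd_size);
   const uint32_t remainder = group_size & (simd_size - 1);
   pipeline->cs_right_mask = remainder ? ~0u >> (32 - remainder)
                                       : ~0u >> (32 - simd_size);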
This reverts commits
   e17ed04 * vk/image: Don't double-allocate stencil buffers
   1ee2d1c * vk/image: Teach anv_image_choose_tile_mode about WMAJOR
and instead adds a comment to describe the subtlety of how we create
images for stencil only formats.
The check in batch_bo_finish should catch any undefined values in the batch
but isn't that great for debugging. The checks in the various emit
functions will help get better granularity.
- Reduce the number of table lookups in anv_image_create from 4 to 1.
- Add field for surface alignment.
- Shorten field names tile_width, tile_height -> width, height.
This avoids the full brw context initialization and just sets up context
constants, initializes extensions and sets a few driver vfuncs for the
front-end GLSL compiler.
Dynamic color/blend state can be NULL in case we're not rendering to
color targets (only output to depth and/or stencil). Initialize
cmd_buffer->cb_state to NULL so we can reliably detect whether it's been
set or not.
GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
instructions in affected programs: 1514770 -> 1454047 (-4.01%)
helped: 5813
HURT: 1120
The gained programs are ARB vertex programs that were previously going
through the vec4 backend. Now that we have prog_to_nir, ARB vertex
programs can go through the scalar backend so they show up as "gained" in
the shader-db results.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This supports the three Vulkan border color types for float color
formats. The support for integer formats is a little trickier, as we
don't know the format of the texture at this time.
According to the bspec, you're supposed to emit a PIPE_CONTROL with a CS
stall and a render target flush prior to changing STATE_BASE_ADDRESS. A
little experimentation, however, shows that this is not enough. It also
appears as if you have to flush the texture cache after changing base
address or things won't propagate properly.
This commit allows for us to create a whole new surface state buffer when
the old one runs out of room. We simply re-emit the state base address for
the new state, re-emit binding tables, and keep going.
Before, we were emitting surface states up-front when binding tables were
updated. Now, we wait to emit the surface states until we emit the binding
table. This makes meta simpler and should make it easier to deal with
swapping out the surface state buffer.
We need to make sure we use the right index into dynamic offset
array. Dynamic descriptors can be present or not in different stages and
to get the right offset, we need to compute the index at
vkCreateDescriptorSetLayout time.
We do this by creating a surface state on the fly that incorporates the
dynamic offset. This patch also refactor the descriptor set layout
constructor a bit to be less clever with switch statement fall
through. Instead of duplicating the subtle code to update the sampler
and surface slot map, we just use two switch statements.
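A sketch of how the dynamic offset indices might be assigned during
vkCreateDescriptorSetLayout (field names loosely follow the pre-1.0 Vulkan
header and are assumptions):

   uint32_t dynamic_offset_count = 0;
   for (uint32_t b = 0; b < pCreateInfo->count; b++) {
      const VkDescriptorSetLayoutBinding *binding = &pCreateInfo->pBinding[b];
      switch (binding->descriptorType) {
      case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
      case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC:
         /* Each dynamic descriptor gets a stable index into the
          * dynamic offset array. */
         layout->binding[b].dynamic_offset_index = dynamic_offset_count;
         dynamic_offset_count += binding->arraySize;
         break;
      default:
         layout->binding[b].dynamic_offset_index = -1;
         break;
      }
   }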
We rely on these being initialized to NULL so meta can reliably detect
whether or not they've been set. ds_state is also allowed to not be
present so we need a well-defined value for that.
This mega-commit primarily does two things. The first is to turn anv_batch
into a better abstraction of a batch. Instead of actually having a BO, it
now has a few pointers to some piece of memory that are used to add data to
the "batch". If it gets to the end, there is a function pointer that it
can call to attempt to grow the batch.
The second change is to start using chained batch buffers. When the end of
the current batch BO is reached, it automatically creates a new one and
inserts an MI_BATCH_BUFFER_START command to chain to it. In this way, our
batch buffers are effectively infinite in length.
This also drops the shared create_surface_state helper and moves filling
out SURFACE_STATE directly into anv_image_view_init() and
anv_color_attachment_view_init().
If a meta operation is called before the pipeline is set, this can cause
uses of undefined values. They *should* be harmless, but we might as well
shut up valgrind on this one too.
Previously, we just blasted out whatever VB's we had marked as "dirty"
regardless of which ones were used by the pipeline. Given that the stride
of the VB is embedded in the pipeline this can cause problems. One problem
is if the pipeline doesn't use the given VB binding we emit a bogus stride.
Another problem is that we weren't properly resetting the dirty bits when
the pipeline changed.
Previously, we waited until later and did a pass through the used surfaces
and did the relocations then. This led to doing double relocations, which
caused us to get bogus surface offsets.
Since the binding table pointer is only 16 bits, we can only have 64kb
of binding table state allocated at any given time. With a block size of
1kb, that amounts to just 64 command buffers, which is not enough.
This commit adds support for the LunarG GLSL back-door as well as detecting
regular GLSL and SPIR-V. The SPIR-V path doesn't exist yet, so that will
cause an assert-fail.
The *_init functions work basically the same as the Vulkan entrypoints
except that they act on an already-created view and take an optional
command buffer argument. If a command buffer is given, the surface state is
allocated out of the command buffer's state stream.
This reverts commit b6ab076d6b.
It turns out setting USE_MEMFD to 0 is really bad because it means we can't
resize the pool. Besides, valgrind SVN handles memfd so we really don't
need this fallback for valgrind anymore.
The binding table pointers packet only allows for a 16-bit binding table
address so all binding tables have to be in the first 64 KB of the surface
state BO. We solve this by adding a slave block pool that pulls off the
first 64 KB worth of blocks and reserves them for binding tables.
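Conceptually, the reservation at pool-creation time looks like this
(a sketch; push_block() and the pool names are hypothetical):

   /* 3DSTATE_BINDING_TABLE_POINTERS_* offsets are only 16 bits, so every
    * binding table must land below 64 KB in the surface state BO. */
   for (uint32_t offset = 0; offset < 64 * 1024; offset += block_size) {
      uint32_t block = anv_block_pool_alloc(&device->surface_block_pool);
      assert(block < 64 * 1024);
      push_block(&device->binding_table_block_pool, block);
   }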
Clear inherits the render targets from the current render pass. This
means we need to fill out the binding table after switching to meta
bindings. However, meta copies etc happen outside a render pass and
break when we try to fill in the render targets. This change fills the
render targets only for meta clear.
We leave the block pool untracked so that reads/writes to freed blocks
will get caught and do the tracking at the state pool/stream level. We
have to do a few extra gymnastics for streams because valgrind works in
terms of pointers and we work in terms of separate map and offset.
Fortunately, the users of the state pool and stream should always be using
the map pointer provided in the anv_state structure. We just have to
track, per block, the map that was used when we initially got the block.
Then we can make sure we always use that map and valgrind should stay
happy.
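With the saved per-block map, the tracking reduces to the standard client
requests from valgrind/valgrind.h, roughly (a sketch):

   #include <valgrind/valgrind.h>

   /* When a state is carved out of a block: */
   void *map = block_map + state.offset;  /* the map saved for this block */
   VALGRIND_MALLOCLIKE_BLOCK(map, state.alloc_size, 0, false);

   /* When the state goes back to the pool: */
   VALGRIND_FREELIKE_BLOCK(map, 0);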
We now always copy the entire struct unless pData is NULL and
unconditionally write back the struct size. It's not clear this is
useful if the structs may grow over time, but it seems to be the
expected behaviour for now.
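Each such entrypoint then follows the same shape (a sketch with
hypothetical names):

   VkResult anv_GetSomeInfo(VkSomeObject obj, size_t *pDataSize, void *pData)
   {
      struct anv_some_info info;
      fill_some_info(obj, &info);            /* hypothetical helper */
      if (pData != NULL)
         memcpy(pData, &info, sizeof(info)); /* always the whole struct */
      *pDataSize = sizeof(info);             /* unconditionally written back */
      return VK_SUCCESS;
   }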
I've been promised in a bug that this will be fixed in a future version of
the header. However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
We don't actually use it to create all the instructions but we do always
use it for insertion. This should make things far more consistent for
implementing extended instructions.
We do control-flow handling as a two-step process. The first step is to
walk the instructions list and record various information about blocks and
functions. This is where the actual nir_function_overload objects get
created. We also record the start/stop instruction for each block. Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value. The
caller is then responsible for setting the various fields.
If we don't do this then recursive meta is completely broken. What happens
is that the outer meta call may change the bindings pointer and the inner
meta call will change it again and, when it exits, set it back to the
default. However, the outer meta call may be relying on it being left
alone so it uses the non-meta descriptor sets instead of its own.
Previously, the GLSL_VK_SHADER macro didn't work if the shader contained
commas outside of parentheses due to the way the C preprocessor works.
This commit fixes this by making it variadic again and doing it correctly
this time.
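The variadic form lets the preprocessor swallow top-level commas in the
GLSL body; the macro's shape is roughly this (a sketch, with a
hypothetical helper):

   #define GLSL_VK_SHADER(device, stage, ...)                          \
      __glsl_vk_shader((device), VK_SHADER_STAGE_##stage,              \
                       "#version 430\n" #__VA_ARGS__)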
This changes the way descriptor sets and layouts work so that we fill
out binding table contents at the time we bind descriptor sets. We
manipulate the binding table contents and sampler state in a shadow-copy
in anv_cmd_buffer. At draw time, we allocate the actual binding table
and sampler state and flush the anv_cmd_buffer copies.
This new utility, glsl_scraper.py, scrapes C files for instances of the
GLSL_VK_SHADER macro, pulls out the shader source, and compiles it to
SPIR-V. The compilation is done using glslangValidator. The result is then
placed into another C file as arrays of dwords that can be easily handed
to a Vulkan driver.
This way we can pass in a vertex shader and yet have the pipeline emit an
empty 3DSTATE_VS packet. We need this for meta because we need to trick
the compiler into not deleting our inputs but at the same time disable the
VS so that we can use a rectlist. This should go away once we actually get
SPIR-V.
This is rather crude but it at least makes sure that all the render targets
get flushed at the end of the pass. We probably actually want to do
something based on image layout transitions, but this will work for now.
This is by no means a complete solution to the binding table problems.
However, it does make texturing actually work. Before, we were texturing
from the render target since they were both starting at 0.
We don't need to point back to the memory object the bo came from.
Pointing directly to a bo lets us bind images and buffers to other
bos - like our allocator bos.