docs: add relnotes for 19.0.3

Bump version for 19.0.3
radeonsi: use CP DMA for the null const buffer clear on CIK
2019-04-24 10:39:04 -07:00 · 2019-04-24 10:36:18 -07:00 · 2019-04-24 08:50:50 -07:00 · 2019-04-24 08:50:34 -07:00 · 2019-04-23 09:18:20 -07:00 · 2019-04-23 09:17:58 -07:00
58 changed files with 670 additions and 157 deletions
--- a/2
+++ b/2
@@ -1 +1 @@
-19.0.1
+19.0.3
--- a/bin/install_megadrivers.py
+++ b/bin/install_megadrivers.py
@@ -49,7 +49,6 @@ def main():
        if os.path.lexists(to):
            os.unlink(to)
        os.makedirs(to)
-    shutil.copy(args.megadriver, master)

    for driver in args.drivers:
        abs_driver = os.path.join(to, driver)
@@ -71,7 +70,14 @@ def main():
                name, ext = os.path.splitext(name)
        finally:
            os.chdir(ret)
+
+    # Remove meson-created master .so and symlinks
    os.unlink(master)
+    name, ext = os.path.splitext(master)
+    while ext != '.so':
+        if os.path.lexists(name):
+            os.unlink(name)
+        name, ext = os.path.splitext(name)


 if __name__ == '__main__':
--- a/docs/relnotes/19.0.1.html
+++ b/docs/relnotes/19.0.1.html
@@ -31,7 +31,8 @@ Compatibility contexts may report a lower version depending on each driver.

 <h2>SHA256 checksums</h2>
 <pre>
-  TBD
+f1dd1980ed628edea3935eed7974fbc5d8353e9578c562728b880d63ac613dbd  mesa-19.0.1.tar.gz
+6884163c0ea9e4c98378ab8fecd72fe7b5f437713a14471beda378df247999d4  mesa-19.0.1.tar.xz
 </pre>


--- a/docs/relnotes/19.0.2.html
+++ b/docs/relnotes/19.0.2.html
@@ -0,0 +1,122 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html lang="en">
+<head>
+  <meta http-equiv="content-type" content="text/html; charset=utf-8">
+  <title>Mesa Release Notes</title>
+  <link rel="stylesheet" type="text/css" href="../mesa.css">
+</head>
+<body>
+
+<div class="header">
+  <h1>The Mesa 3D Graphics Library</h1>
+</div>
+
+<iframe src="../contents.html"></iframe>
+<div class="content">
+
+<h1>Mesa 19.0.2 Release Notes / April 10, 2019</h1>
+
+<p>
+Mesa 19.0.2 is a bug fix release which fixes bugs found since the 19.0.1 release.
+</p>
+<p>
+Mesa 19.0.2 implements the OpenGL 4.5 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.5.  OpenGL
+4.5 is <strong>only</strong> available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+</p>
+
+
+<h2>SHA256 checksums</h2>
+<pre>
+SHA256: eb972fc11d4e1261d34ec0b91a701f158d4870c0428fb108353ae7eab64b1118  mesa-19.0.2.tar.gz
+SHA256: 1a2edc3ce56906a676c91e6851298db45903df1f5cb9827395a922c1452db802  mesa-19.0.2.tar.xz
+</pre>
+
+
+<h2>New features</h2>
+
+
+<h2>Bug fixes</h2>
+
+
+<ul>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108766">Bug 108766</a> - Mesa built with meson has RPATH entries</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109648">Bug 109648</a> - AMD Raven hang during va-api decoding</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110257">Bug 110257</a> - Major artifacts in mpeg2 vaapi hw decoding</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110259">Bug 110259</a> - radv: Sampling depth-stencil image in GENERAL layout returns nothing but zero (regression, bisected)</li>
+
+</ul>
+
+<h2>Changes</h2>
+
+
+<p>Boyuan Zhang (1):</p>
+<ul>
+  <li>st/va: reverse qt matrix back to its original order</li>
+</ul>
+
+<p>Caio Marcelo de Oliveira Filho (1):</p>
+<ul>
+  <li>nir: Take if_uses into account when repairing SSA</li>
+</ul>
+
+<p>Dylan Baker (2):</p>
+<ul>
+  <li>docs: Add SHA256 sums for mesa 19.0.1</li>
+  <li>VERSION: bump version for 19.0.2</li>
+</ul>
+
+<p>Eric Anholt (3):</p>
+<ul>
+  <li>dri3: Return the current swap interval from glXGetSwapIntervalMESA().</li>
+  <li>v3d: Bump the maximum texture size to 4k for V3D 4.x.</li>
+  <li>v3d: Don't try to use the TFU blit path if a scissor is enabled.</li>
+</ul>
+
+<p>Eric Engestrom (1):</p>
+<ul>
+  <li>meson: strip rpath from megadrivers</li>
+</ul>
+
+<p>Jason Ekstrand (1):</p>
+<ul>
+  <li>Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"</li>
+</ul>
+
+<p>Karol Herbst (1):</p>
+<ul>
+  <li>nir/print: fix printing the image_array intrinsic index</li>
+</ul>
+
+<p>Leo Liu (2):</p>
+<ul>
+  <li>radeon/vcn: add H.264 constrained baseline support</li>
+  <li>radeon/vcn/vp9: search the render target from the whole list</li>
+</ul>
+
+<p>Lionel Landwerlin (1):</p>
+<ul>
+  <li>intel: add dependency on genxml generated files</li>
+</ul>
+
+<p>Marek Olšák (1):</p>
+<ul>
+  <li>radeonsi: fix assertion failure by using the correct type</li>
+</ul>
+
+<p>Samuel Pitoiset (2):</p>
+<ul>
+  <li>radv: skip updating depth/color metadata for conditional rendering</li>
+  <li>radv: do not always initialize HTILE in compressed state</li>
+</ul>
+
+</div>
+</body>
+</html>
--- a/docs/relnotes/19.0.3.html
+++ b/docs/relnotes/19.0.3.html
@@ -0,0 +1,147 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html lang="en">
+<head>
+  <meta http-equiv="content-type" content="text/html; charset=utf-8">
+  <title>Mesa Release Notes</title>
+  <link rel="stylesheet" type="text/css" href="../mesa.css">
+</head>
+<body>
+
+<div class="header">
+  <h1>The Mesa 3D Graphics Library</h1>
+</div>
+
+<iframe src="../contents.html"></iframe>
+<div class="content">
+
+<h1>Mesa 19.0.3 Release Notes / April 24, 2019</h1>
+
+<p>
+Mesa 19.0.3 is a bug fix release which fixes bugs found since the l9.0.2 release.
+</p>
+<p>
+Mesa 19.0.3 implements the OpenGL 4.5 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.5.  OpenGL
+4.5 is <strong>only</strong> available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+</p>
+
+
+<h2>SHA256 checksums</h2>
+<pre>
+TBD
+</pre>
+
+
+<h2>New features</h2>
+
+<p>N/A</p>
+
+<h2>Bug fixes</h2>
+
+<ul>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108879">Bug 108879</a> - [CIK] [regression] All opencl apps hangs indefinitely in si_create_context</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110201">Bug 110201</a> - [ivb] mesa 19.0.0 breaks rendering in kitty</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110356">Bug 110356</a> - install_megadrivers.py creates new dangling symlink [bisected]</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110441">Bug 110441</a> - [llvmpipe] complex-loop-analysis-bug regression</li>
+
+</ul>
+
+<h2>Changes</h2>
+
+<p>Andres Gomez (1):</p>
+<ul>
+  <li>glsl/linker: location aliasing requires types to have the same width</li>
+</ul>
+
+<p>Bas Nieuwenhuizen (1):</p>
+<ul>
+  <li>ac: Move has_local_buffers disable to radeonsi.</li>
+</ul>
+
+<p>Chia-I Wu (1):</p>
+<ul>
+  <li>virgl: fix fence fd version check</li>
+</ul>
+
+<p>Danylo Piliaiev (1):</p>
+<ul>
+  <li>intel/compiler: Do not reswizzle dst if instruction writes to flag register</li>
+</ul>
+
+<p>Dylan Baker (2):</p>
+<ul>
+  <li>docs: Add sha256 sums for 19.0.2</li>
+  <li>Bump version for 19.0.3</li>
+</ul>
+
+<p>Eric Anholt (1):</p>
+<ul>
+  <li>nir: Fix deref offset calculation for structs.</li>
+</ul>
+
+<p>Eric Engestrom (1):</p>
+<ul>
+  <li>meson: remove meson-created megadrivers symlinks</li>
+</ul>
+
+<p>Jason Ekstrand (2):</p>
+<ul>
+  <li>anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7</li>
+  <li>anv: Add a #define for the max binding table size</li>
+</ul>
+
+<p>Juan A. Suarez Romero (1):</p>
+<ul>
+  <li>meson: Add dependency on genxml to anvil genfiles</li>
+</ul>
+
+<p>Kenneth Graunke (2):</p>
+<ul>
+  <li>glsl: Set location on structure-split sampler uniform variables</li>
+  <li>Revert "glsl: Set location on structure-split sampler uniform variables"</li>
+</ul>
+
+<p>Lionel Landwerlin (2):</p>
+<ul>
+  <li>anv: fix uninitialized pthread cond clock domain</li>
+  <li>intel/devinfo: fix missing num_thread_per_eu on ICL</li>
+</ul>
+
+<p>Lubomir Rintel (2):</p>
+<ul>
+  <li>gallivm: guess CPU features also on ARM</li>
+  <li>gallivm: disable NEON instructions if they are not supported</li>
+</ul>
+
+<p>Marek Olšák (1):</p>
+<ul>
+  <li>radeonsi: use CP DMA for the null const buffer clear on CIK</li>
+</ul>
+
+<p>Rhys Perry (1):</p>
+<ul>
+  <li>nir,ac/nir: fix cube_face_coord</li>
+</ul>
+
+<p>Roland Scheidegger (1):</p>
+<ul>
+  <li>gallivm: fix bogus assert in get_indirect_index</li>
+</ul>
+
+<p>Samuel Pitoiset (2):</p>
+<ul>
+  <li>ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+</li>
+  <li>radv: do not load vertex attributes that are not provided by the pipeline</li>
+</ul>
+
+
+</div>
+</body>
+</html>
--- a/src/amd/common/ac_gpu_info.c
+++ b/src/amd/common/ac_gpu_info.c
@@ -367,9 +367,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
 	info->has_syncobj_wait_for_submit = info->has_syncobj && info->drm_minor >= 20;
 	info->has_fence_to_handle = info->has_syncobj && info->drm_minor >= 21;
 	info->has_ctx_priority = info->drm_minor >= 22;
-	/* TODO: Enable this once the kernel handles it efficiently. */
-	info->has_local_buffers = info->drm_minor >= 20 &&
-				  !info->has_dedicated_vram;
+	info->has_local_buffers = info->drm_minor >= 20;
 	info->kernel_flushes_hdp_before_ib = true;
 	info->htile_cmask_support_1d_tiling = true;
 	info->si_TA_CS_BC_BASE_ADDR_allowed = true;
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1019,10 +1019,17 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
 		LLVMValueRef in[3];
 		for (unsigned chan = 0; chan < 3; chan++)
 			in[chan] = ac_llvm_extract_elem(&ctx->ac, src[0], chan);
-		results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
+		results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
 						ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
-		results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
+		results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
 						ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
+		LLVMValueRef ma = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubema",
+						     ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
+		results[0] = ac_build_fdiv(&ctx->ac, results[0], ma);
+		results[1] = ac_build_fdiv(&ctx->ac, results[1], ma);
+		LLVMValueRef offset = LLVMConstReal(ctx->ac.f32, 0.5);
+		results[0] = LLVMBuildFAdd(ctx->ac.builder, results[0], offset, "");
+		results[1] = LLVMBuildFAdd(ctx->ac.builder, results[1], offset, "");
 		result = ac_build_gather_values(&ctx->ac, results, 2);
 		break;
 	}
@@ -2532,7 +2539,10 @@ static LLVMValueRef visit_image_atomic(struct ac_nir_context *ctx,
 		params[param_count++] = LLVMBuildExtractElement(ctx->ac.builder, get_src(ctx, instr->src[1]),
 								ctx->ac.i32_0, ""); /* vindex */
 		params[param_count++] = ctx->ac.i32_0; /* voffset */
-		if (HAVE_LLVM >= 0x800) {
+		if (HAVE_LLVM >= 0x900) {
+			/* XXX: The new raw/struct atomic intrinsics are buggy
+			 * with LLVM 8, see r358579.
+			 */
 			params[param_count++] = ctx->ac.i32_0; /* soffset */
 			params[param_count++] = ctx->ac.i32_0;  /* slc */

--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -1258,7 +1258,7 @@ radv_set_ds_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
 	if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
 		++reg_count;

-	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, 0));
+	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, cmd_buffer->state.predicating));
 	radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
 			S_370_WR_CONFIRM(1) |
 			S_370_ENGINE_SEL(V_370_PFP));
@@ -1282,7 +1282,7 @@ radv_set_tc_compat_zrange_metadata(struct radv_cmd_buffer *cmd_buffer,
 	uint64_t va = radv_buffer_get_va(image->bo);
 	va += image->offset + image->tc_compat_zrange_offset;

-	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
+	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, cmd_buffer->state.predicating));
 	radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
 			S_370_WR_CONFIRM(1) |
 			S_370_ENGINE_SEL(V_370_PFP));
@@ -1476,7 +1476,7 @@ radv_set_color_clear_metadata(struct radv_cmd_buffer *cmd_buffer,

 	assert(radv_image_has_cmask(image) || radv_image_has_dcc(image));

-	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, 0));
+	radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, cmd_buffer->state.predicating));
 	radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
 			S_370_WR_CONFIRM(1) |
 			S_370_ENGINE_SEL(V_370_PFP));
@@ -4407,8 +4407,14 @@ static void radv_handle_depth_image_transition(struct radv_cmd_buffer *cmd_buffe
 		return;

 	if (src_layout == VK_IMAGE_LAYOUT_UNDEFINED) {
-		/* TODO: merge with the clear if applicable */
-		radv_initialize_htile(cmd_buffer, image, range, 0);
+		uint32_t clear_value = vk_format_is_stencil(image->vk_format) ? 0xfffff30f : 0xfffc000f;
+
+		if (radv_layout_is_htile_compressed(image, dst_layout,
+						    dst_queue_mask)) {
+			clear_value = 0;
+		}
+
+		radv_initialize_htile(cmd_buffer, image, range, clear_value);
 	} else if (!radv_layout_is_htile_compressed(image, src_layout, src_queue_mask) &&
 	           radv_layout_is_htile_compressed(image, dst_layout, dst_queue_mask)) {
 		uint32_t clear_value = vk_format_is_stencil(image->vk_format) ? 0xfffff30f : 0xfffc000f;
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -48,7 +48,6 @@
 #include "util/build_id.h"
 #include "util/debug.h"
 #include "util/mesa-sha1.h"
-#include "compiler/glsl_types.h"

 static int
 radv_device_get_cache_uuid(enum radeon_family family, void *uuid)
@@ -611,7 +610,6 @@ void radv_DestroyInstance(

 	VG(VALGRIND_DESTROY_MEMPOOL(instance));

-	_mesa_glsl_release_types();
 	_mesa_locale_fini();

 	vk_debug_report_instance_destroy(&instance->debug_report_callbacks);
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -2027,10 +2027,32 @@ handle_vs_input_decl(struct radv_shader_context *ctx,

 		t_list = ac_build_load_to_sgpr(&ctx->ac, t_list_ptr, t_offset);

-		input = ac_build_buffer_load_format(&ctx->ac, t_list,
-						    buffer_index,
-						    ctx->ac.i32_0,
-						    num_channels, false, true);
+		if (ctx->options->key.vs.vertex_attribute_provided & (1u << attrib_index)) {
+			input = ac_build_buffer_load_format(&ctx->ac, t_list,
+							    buffer_index,
+							    ctx->ac.i32_0,
+							    num_channels, false, true);
+		} else {
+			/* Per the Vulkan spec, it's invalid to consume vertex
+			 * attributes that are not provided by the pipeline but
+			 * some (invalid) apps appear to do that. Fill the
+			 * input array with (eg. (0, 0, 0, 1)) to workaround
+			 * the problem and to avoid possible GPU hangs.
+			 */
+			LLVMValueRef chan[4];
+
+			/* The input_usage mask might be 0 if input variables
+			 * are not removed by the compiler.
+			 */
+			num_channels = CLAMP(num_channels, 1, 4);
+
+			for (unsigned i = 0; i < num_channels; i++) {
+				chan[i] = i == 3 ? ctx->ac.f32_1 : ctx->ac.f32_0;
+				chan[i] = ac_to_float(&ctx->ac, chan[i]);
+			}
+
+			input = ac_build_gather_values(&ctx->ac, chan, num_channels);
+		}

 		input = ac_build_expand_to_vec4(&ctx->ac, input, num_channels);

--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1922,6 +1922,8 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
 			}
 			key.vertex_alpha_adjust |= adjust << (2 * location);
 		}
+
+		key.vertex_attribute_provided |= 1 << location;
 	}

 	if (pCreateInfo->pTessellationState)
@@ -1950,6 +1952,7 @@ radv_fill_shader_keys(struct radv_shader_variant_key *keys,
 {
 	keys[MESA_SHADER_VERTEX].vs.instance_rate_inputs = key->instance_rate_inputs;
 	keys[MESA_SHADER_VERTEX].vs.alpha_adjust = key->vertex_alpha_adjust;
+	keys[MESA_SHADER_VERTEX].vs.vertex_attribute_provided = key->vertex_attribute_provided;
 	for (unsigned i = 0; i < MAX_VERTEX_ATTRIBS; ++i)
 		keys[MESA_SHADER_VERTEX].vs.instance_rate_divisors[i] = key->instance_rate_divisors[i];

--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -365,6 +365,7 @@ struct radv_pipeline_cache {
 struct radv_pipeline_key {
 	uint32_t instance_rate_inputs;
 	uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];
+	uint32_t vertex_attribute_provided;
 	uint64_t vertex_alpha_adjust;
 	unsigned tess_input_vertices;
 	uint32_t col_format;
--- a/src/amd/vulkan/radv_shader.h
+++ b/src/amd/vulkan/radv_shader.h
@@ -66,6 +66,9 @@ struct radv_vs_variant_key {
 	uint32_t instance_rate_inputs;
 	uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];

+	/* Mask of vertex attributes that are provided by the pipeline. */
+	uint32_t vertex_attribute_provided;
+
 	/* For 2_10_10_10 formats the alpha is handled as unsigned by pre-vega HW.
 	 * so we may need to fix it up. */
 	uint64_t alpha_adjust;
--- a/src/broadcom/cle/v3d_packet_v33.xml
+++ b/src/broadcom/cle/v3d_packet_v33.xml
@@ -820,8 +820,8 @@

  <packet code="120" name="Tile Binning Mode Cfg" min_ver="41">

-    <field name="Height (in pixels)" size="12" start="48" type="uint" minus_one="true"/>
-    <field name="Width (in pixels)" size="12" start="32" type="uint" minus_one="true"/>
+    <field name="Height (in pixels)" size="16" start="48" type="uint" minus_one="true"/>
+    <field name="Width (in pixels)" size="16" start="32" type="uint" minus_one="true"/>

    <field name="Double-buffer in non-ms mode" size="1" start="15" type="bool"/>
    <field name="Multisample Mode (4x)" size="1" start="14" type="bool"/>
--- a/src/broadcom/common/v3d_limits.h
+++ b/src/broadcom/common/v3d_limits.h
@@ -32,7 +32,8 @@
 */
 #define V3D_MAX_TEXTURE_SAMPLERS 16

-#define V3D_MAX_MIP_LEVELS 12
+/* The HW can do 16384 (15), but we run into hangs when we expose that. */
+#define V3D_MAX_MIP_LEVELS 13

 #define V3D_MAX_SAMPLES 4

--- a/src/compiler/glsl/gl_nir_lower_samplers_as_deref.c
+++ b/src/compiler/glsl/gl_nir_lower_samplers_as_deref.c
@@ -167,6 +167,14 @@ lower_deref(nir_builder *b, struct lower_samplers_as_deref_state *state,
   } else {
      var = nir_variable_create(state->shader, nir_var_uniform, type, name);
      var->data.binding = binding;
+
+      /* Don't set var->data.location.  The old structure location could be
+       * used to index into gl_uniform_storage, assuming the full structure
+       * was walked in order.  With the new split variables, this invariant
+       * no longer holds and there's no meaningful way to start from a base
+       * location and access a particular array element.  Just leave it 0.
+       */
+
      _mesa_hash_table_insert_pre_hashed(state->remap_table, hash, name, var);
   }

--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -424,28 +424,14 @@ compute_variable_location_slot(ir_variable *var, gl_shader_stage stage)

 struct explicit_location_info {
   ir_variable *var;
-   unsigned numerical_type;
+   bool base_type_is_integer;
+   unsigned base_type_bit_size;
   unsigned interpolation;
   bool centroid;
   bool sample;
   bool patch;
 };

-static inline unsigned
-get_numerical_type(const glsl_type *type)
-{
-   /* From the OpenGL 4.6 spec, section 4.4.1 Input Layout Qualifiers, Page 68,
-    * (Location aliasing):
-    *
-    *    "Further, when location aliasing, the aliases sharing the location
-    *     must have the same underlying numerical type  (floating-point or
-    *     integer)
-    */
-   if (type->is_float() || type->is_double())
-      return GLSL_TYPE_FLOAT;
-   return GLSL_TYPE_INT;
-}
-
 static bool
 check_location_aliasing(struct explicit_location_info explicit_locations[][4],
                        ir_variable *var,
@@ -461,14 +447,23 @@ check_location_aliasing(struct explicit_location_info explicit_locations[][4],
                        gl_shader_stage stage)
 {
   unsigned last_comp;
-   if (type->without_array()->is_record()) {
-      /* The component qualifier can't be used on structs so just treat
-       * all component slots as used.
+   unsigned base_type_bit_size;
+   const glsl_type *type_without_array = type->without_array();
+   const bool base_type_is_integer =
+      glsl_base_type_is_integer(type_without_array->base_type);
+   const bool is_struct = type_without_array->is_record();
+   if (is_struct) {
+      /* structs don't have a defined underlying base type so just treat all
+       * component slots as used and set the bit size to 0. If there is
+       * location aliasing, we'll fail anyway later.
       */
      last_comp = 4;
+      base_type_bit_size = 0;
   } else {
-      unsigned dmul = type->without_array()->is_64bit() ? 2 : 1;
-      last_comp = component + type->without_array()->vector_elements * dmul;
+      unsigned dmul = type_without_array->is_64bit() ? 2 : 1;
+      last_comp = component + type_without_array->vector_elements * dmul;
+      base_type_bit_size =
+         glsl_base_type_get_bit_size(type_without_array->base_type);
   }

   while (location < location_limit) {
@@ -478,8 +473,22 @@ check_location_aliasing(struct explicit_location_info explicit_locations[][4],
            &explicit_locations[location][comp];

         if (info->var) {
-            /* Component aliasing is not alloed */
-            if (comp >= component && comp < last_comp) {
+            if (info->var->type->without_array()->is_record() || is_struct) {
+               /* Structs cannot share location since they are incompatible
+                * with any other underlying numerical type.
+                */
+               linker_error(prog,
+                            "%s shader has multiple %sputs sharing the "
+                            "same location that don't have the same "
+                            "underlying numerical type. Struct variable '%s', "
+                            "location %u\n",
+                            _mesa_shader_stage_to_string(stage),
+                            var->data.mode == ir_var_shader_in ? "in" : "out",
+                            is_struct ? var->name : info->var->name,
+                            location);
+               return false;
+            } else if (comp >= component && comp < last_comp) {
+               /* Component aliasing is not allowed */
               linker_error(prog,
                            "%s shader has multiple %sputs explicitly "
                            "assigned to location %d and component %d\n",
@@ -488,27 +497,52 @@ check_location_aliasing(struct explicit_location_info explicit_locations[][4],
                            location, comp);
               return false;
            } else {
-               /* For all other used components we need to have matching
-                * types, interpolation and auxiliary storage
+               /* From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout
+                * Qualifiers, Page 67, (Location aliasing):
+                *
+                *   " Further, when location aliasing, the aliases sharing the
+                *     location must have the same underlying numerical type
+                *     and bit width (floating-point or integer, 32-bit versus
+                *     64-bit, etc.) and the same auxiliary storage and
+                *     interpolation qualification."
                */
-               if (info->numerical_type !=
-                   get_numerical_type(type->without_array())) {
+
+               /* If the underlying numerical type isn't integer, implicitly
+                * it will be float or else we would have failed by now.
+                */
+               if (info->base_type_is_integer != base_type_is_integer) {
                  linker_error(prog,
-                               "Varyings sharing the same location must "
-                               "have the same underlying numerical type. "
-                               "Location %u component %u\n",
-                               location, comp);
+                               "%s shader has multiple %sputs sharing the "
+                               "same location that don't have the same "
+                               "underlying numerical type. Location %u "
+                               "component %u.\n",
+                               _mesa_shader_stage_to_string(stage),
+                               var->data.mode == ir_var_shader_in ?
+                               "in" : "out", location, comp);
+                  return false;
+               }
+
+               if (info->base_type_bit_size != base_type_bit_size) {
+                  linker_error(prog,
+                               "%s shader has multiple %sputs sharing the "
+                               "same location that don't have the same "
+                               "underlying numerical bit size. Location %u "
+                               "component %u.\n",
+                               _mesa_shader_stage_to_string(stage),
+                               var->data.mode == ir_var_shader_in ?
+                               "in" : "out", location, comp);
                  return false;
               }

               if (info->interpolation != interpolation) {
                  linker_error(prog,
-                               "%s shader has multiple %sputs at explicit "
-                               "location %u with different interpolation "
-                               "settings\n",
+                               "%s shader has multiple %sputs sharing the "
+                               "same location that don't have the same "
+                               "interpolation qualification. Location %u "
+                               "component %u.\n",
                               _mesa_shader_stage_to_string(stage),
                               var->data.mode == ir_var_shader_in ?
-                               "in" : "out", location);
+                               "in" : "out", location, comp);
                  return false;
               }

@@ -516,17 +550,20 @@ check_location_aliasing(struct explicit_location_info explicit_locations[][4],
                   info->sample != sample ||
                   info->patch != patch) {
                  linker_error(prog,
-                               "%s shader has multiple %sputs at explicit "
-                               "location %u with different aux storage\n",
+                               "%s shader has multiple %sputs sharing the "
+                               "same location that don't have the same "
+                               "auxiliary storage qualification. Location %u "
+                               "component %u.\n",
                               _mesa_shader_stage_to_string(stage),
                               var->data.mode == ir_var_shader_in ?
-                               "in" : "out", location);
+                               "in" : "out", location, comp);
                  return false;
               }
            }
         } else if (comp >= component && comp < last_comp) {
            info->var = var;
-            info->numerical_type = get_numerical_type(type->without_array());
+            info->base_type_is_integer = base_type_is_integer;
+            info->base_type_bit_size = base_type_bit_size;
            info->interpolation = interpolation;
            info->centroid = centroid;
            info->sample = sample;
--- a/src/compiler/glsl_types.h
+++ b/src/compiler/glsl_types.h
@@ -31,6 +31,7 @@
 #include "shader_enums.h"
 #include "blob.h"
 #include "c11/threads.h"
+#include "util/macros.h"

 #ifdef __cplusplus
 #include "main/config.h"
@@ -114,6 +115,42 @@ static inline bool glsl_base_type_is_integer(enum glsl_base_type type)
          type == GLSL_TYPE_IMAGE;
 }

+static inline unsigned int
+glsl_base_type_get_bit_size(const enum glsl_base_type base_type)
+{
+   switch (base_type) {
+   case GLSL_TYPE_BOOL:
+      return 1;
+
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_FLOAT: /* TODO handle mediump */
+   case GLSL_TYPE_SUBROUTINE:
+      return 32;
+
+   case GLSL_TYPE_FLOAT16:
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
+      return 16;
+
+   case GLSL_TYPE_UINT8:
+   case GLSL_TYPE_INT8:
+      return 8;
+
+   case GLSL_TYPE_DOUBLE:
+   case GLSL_TYPE_INT64:
+   case GLSL_TYPE_UINT64:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_SAMPLER:
+      return 64;
+
+   default:
+      unreachable("unknown base type");
+   }
+
+   return 0;
+}
+
 enum glsl_sampler_dim {
   GLSL_SAMPLER_DIM_1D = 0,
   GLSL_SAMPLER_DIM_2D,
--- a/src/compiler/nir/nir_deref.c
+++ b/src/compiler/nir/nir_deref.c
@@ -215,7 +215,7 @@ nir_build_deref_offset(nir_builder *b, nir_deref_instr *deref,
         unsigned field_offset =
            struct_type_get_field_offset(parent->type, size_align,
                                         (*p)->strct.index);
-         nir_iadd(b, offset, nir_imm_int(b, field_offset));
+         offset = nir_iadd(b, offset, nir_imm_int(b, field_offset));
      } else {
         unreachable("Unsupported deref type");
      }
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -404,12 +404,21 @@ dst.x = dst.y = 0.0;
 float absX = fabs(src0.x);
 float absY = fabs(src0.y);
 float absZ = fabs(src0.z);
-if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = -src0.z; }
-if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = src0.z; }
-if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.z; dst.y = src0.x; }
-if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = -src0.z; dst.y = src0.x; }
-if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = src0.x; }
-if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = -src0.x; }
+
+float ma = 0.0;
+if (absX >= absY && absX >= absZ) { ma = 2 * src0.x; }
+if (absY >= absX && absY >= absZ) { ma = 2 * src0.y; }
+if (absZ >= absX && absZ >= absY) { ma = 2 * src0.z; }
+
+if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.z; dst.y = -src0.y; }
+if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = src0.z; dst.y = -src0.y; }
+if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = src0.z; }
+if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = -src0.z; }
+if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = src0.x; dst.y = -src0.y; }
+if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.x; dst.y = -src0.y; }
+
+dst.x = dst.x / ma + 0.5;
+dst.y = dst.y / ma + 0.5;
 """)

 unop_horiz("cube_face_index", 1, tfloat32, 3, tfloat32, """
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -812,8 +812,8 @@ print_intrinsic_instr(nir_intrinsic_instr *instr, print_state *state)
         assert(dim < ARRAY_SIZE(dim_name) && dim_name[dim]);
         fprintf(fp, " image_dim=%s", dim_name[dim]);
      } else if (idx == NIR_INTRINSIC_IMAGE_ARRAY) {
-         bool array = nir_intrinsic_image_dim(instr);
-         fprintf(fp, " image_dim=%s", array ? "true" : "false");
+         bool array = nir_intrinsic_image_array(instr);
+         fprintf(fp, " image_array=%s", array ? "true" : "false");
      } else if (idx == NIR_INTRINSIC_DESC_TYPE) {
         VkDescriptorType desc_type = nir_intrinsic_desc_type(instr);
         fprintf(fp, " desc_type=%s", vulkan_descriptor_type_name(desc_type));
--- a/src/compiler/nir/nir_repair_ssa.c
+++ b/src/compiler/nir/nir_repair_ssa.c
@@ -77,6 +77,15 @@ repair_ssa_def(nir_ssa_def *def, void *void_state)
      }
   }

+   nir_foreach_if_use(src, def) {
+      nir_block *block_before_if =
+         nir_cf_node_as_block(nir_cf_node_prev(&src->parent_if->cf_node));
+      if (!nir_block_dominates(def->parent_instr->block, block_before_if)) {
+         is_valid = false;
+         break;
+      }
+   }
+
   if (is_valid)
      return true;

@@ -98,6 +107,15 @@ repair_ssa_def(nir_ssa_def *def, void *void_state)
      }
   }

+   nir_foreach_if_use_safe(src, def) {
+      nir_block *block_before_if =
+         nir_cf_node_as_block(nir_cf_node_prev(&src->parent_if->cf_node));
+      if (!nir_block_dominates(def->parent_instr->block, block_before_if)) {
+         nir_if_rewrite_condition(src->parent_if, nir_src_for_ssa(
+            nir_phi_builder_value_get_block_def(val, block_before_if)));
+      }
+   }
+
   return true;
 }

--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -97,37 +97,7 @@ unsigned glsl_atomic_size(const struct glsl_type *type);
 static inline unsigned
 glsl_get_bit_size(const struct glsl_type *type)
 {
-   switch (glsl_get_base_type(type)) {
-   case GLSL_TYPE_BOOL:
-      return 1;
-
-   case GLSL_TYPE_INT:
-   case GLSL_TYPE_UINT:
-   case GLSL_TYPE_FLOAT: /* TODO handle mediump */
-   case GLSL_TYPE_SUBROUTINE:
-      return 32;
-
-   case GLSL_TYPE_FLOAT16:
-   case GLSL_TYPE_UINT16:
-   case GLSL_TYPE_INT16:
-      return 16;
-
-   case GLSL_TYPE_UINT8:
-   case GLSL_TYPE_INT8:
-      return 8;
-
-   case GLSL_TYPE_DOUBLE:
-   case GLSL_TYPE_INT64:
-   case GLSL_TYPE_UINT64:
-   case GLSL_TYPE_IMAGE:
-   case GLSL_TYPE_SAMPLER:
-      return 64;
-
-   default:
-      unreachable("unknown base type");
-   }
-
-   return 0;
+   return glsl_base_type_get_bit_size(glsl_get_base_type(type));
 }

 bool glsl_type_is_16bit(const struct glsl_type *type);
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -556,11 +556,11 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,

   llvm::SmallVector<std::string, 16> MAttrs;

-#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-#if HAVE_LLVM >= 0x0400
-   /* llvm-3.7+ implements sys::getHostCPUFeatures for x86,
-    * which allows us to enable/disable code generation based
-    * on the results of cpuid.
+#if HAVE_LLVM >= 0x0400 && (defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || defined(PIPE_ARCH_ARM))
+   /* llvm-3.3+ implements sys::getHostCPUFeatures for Arm
+    * and llvm-3.7+ for x86, which allows us to enable/disable
+    * code generation based on the results of cpuid on these
+    * architectures.
    */
   llvm::StringMap<bool> features;
   llvm::sys::getHostCPUFeatures(features);
@@ -570,7 +570,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
        ++f) {
      MAttrs.push_back(((*f).second ? "+" : "-") + (*f).first().str());
   }
-#else
+#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
   /*
    * We need to unset attributes because sometimes LLVM mistakenly assumes
    * certain features are present given the processor name.
@@ -625,6 +625,12 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
   MAttrs.push_back("-avx512vl");
 #endif
 #endif
+#if defined(PIPE_ARCH_ARM)
+   if (!util_cpu_caps.has_neon) {
+      MAttrs.push_back("-neon");
+      MAttrs.push_back("-crypto");
+      MAttrs.push_back("-vfp2");
+   }
 #endif

 #if defined(PIPE_ARCH_PPC)
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1108,7 +1108,7 @@ get_indirect_index(struct lp_build_tgsi_soa_context *bld,
    * larger than the declared size but smaller than the buffer size.
    */
   if (reg_file != TGSI_FILE_CONSTANT) {
-      assert(index_limit > 0);
+      assert(index_limit >= 0);
      max_index = lp_build_const_int_vec(bld->bld_base.base.gallivm,
                                         uint_bld->type, index_limit);

--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -64,6 +64,7 @@ static rvcn_dec_message_avc_t get_h264_msg(struct radeon_decoder *dec,
 	memset(&result, 0, sizeof(result));
 	switch (pic->base.profile) {
 	case PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE:
+	case PIPE_VIDEO_PROFILE_MPEG4_AVC_CONSTRAINED_BASELINE:
 		result.profile = RDECODE_H264_PROFILE_BASELINE;
 		break;

@@ -490,7 +491,7 @@ static rvcn_dec_message_vp9_t get_vp9_msg(struct radeon_decoder *dec,

 	assert(dec->base.max_references + 1 <= 16);

-	for (i = 0 ; i < dec->base.max_references + 1 ; ++i) {
+	for (i = 0 ; i < 16 ; ++i) {
 		if (dec->render_pic_list[i] && dec->render_pic_list[i] == target) {
 			result.curr_pic_idx =
 				(uintptr_t)vl_video_buffer_get_associated_data(target, &dec->base);
--- a/src/gallium/drivers/radeonsi/si_clear.c
+++ b/src/gallium/drivers/radeonsi/si_clear.c
@@ -272,7 +272,7 @@ void vi_dcc_clear_level(struct si_context *sctx,
 	}

 	si_clear_buffer(sctx, dcc_buffer, dcc_offset, clear_size,
-			&clear_value, 4, SI_COHERENCY_CB_META);
+			&clear_value, 4, SI_COHERENCY_CB_META, false);
 }

 /* Set the same micro tile mode as the destination of the last MSAA resolve.
@@ -505,7 +505,7 @@ static void si_do_fast_color_clear(struct si_context *sctx,
 				uint32_t clear_value = 0xCCCCCCCC;
 				si_clear_buffer(sctx, &tex->cmask_buffer->b.b,
 						tex->cmask_offset, tex->surface.cmask_size,
-						&clear_value, 4, SI_COHERENCY_CB_META);
+						&clear_value, 4, SI_COHERENCY_CB_META, false);
 				fmask_decompress_needed = true;
 			}

@@ -533,7 +533,7 @@ static void si_do_fast_color_clear(struct si_context *sctx,
 			uint32_t clear_value = 0;
 			si_clear_buffer(sctx, &tex->cmask_buffer->b.b,
 					tex->cmask_offset, tex->surface.cmask_size,
-					&clear_value, 4, SI_COHERENCY_CB_META);
+					&clear_value, 4, SI_COHERENCY_CB_META, false);
 			eliminate_needed = true;
 		}

--- a/src/gallium/drivers/radeonsi/si_compute_blit.c
+++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -177,7 +177,8 @@ static void si_compute_do_clear_or_copy(struct si_context *sctx,

 void si_clear_buffer(struct si_context *sctx, struct pipe_resource *dst,
 		     uint64_t offset, uint64_t size, uint32_t *clear_value,
-		     uint32_t clear_value_size, enum si_coherency coher)
+		     uint32_t clear_value_size, enum si_coherency coher,
+		     bool force_cpdma)
 {
 	if (!size)
 		return;
@@ -241,7 +242,8 @@ void si_clear_buffer(struct si_context *sctx, struct pipe_resource *dst,
 		 * about buffer placements.
 		 */
 		if (clear_value_size > 4 ||
-		    (clear_value_size == 4 &&
+		    (!force_cpdma &&
+		     clear_value_size == 4 &&
 		     offset % 4 == 0 &&
 		     (size > 32*1024 || sctx->chip_class <= VI))) {
 			si_compute_do_clear_or_copy(sctx, dst, offset, NULL, 0,
@@ -282,7 +284,7 @@ static void si_pipe_clear_buffer(struct pipe_context *ctx,
 		coher = SI_COHERENCY_SHADER;

 	si_clear_buffer((struct si_context*)ctx, dst, offset, size, (uint32_t*)clear_value,
-			clear_value_size, coher);
+			clear_value_size, coher, false);
 }

 void si_copy_buffer(struct si_context *sctx,
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -609,11 +609,14 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
 	si_begin_new_gfx_cs(sctx);

 	if (sctx->chip_class == CIK) {
-		/* Clear the NULL constant buffer, because loads should return zeros. */
+		/* Clear the NULL constant buffer, because loads should return zeros.
+		 * Note that this forces CP DMA to be used, because clover deadlocks
+		 * for some reason when the compute codepath is used.
+		 */
 		uint32_t clear_value = 0;
 		si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0,
 				sctx->null_const_buf.buffer->width0,
-				&clear_value, 4, SI_COHERENCY_SHADER);
+				&clear_value, 4, SI_COHERENCY_SHADER, true);
 	}
 	return &sctx->b;
 fail:
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -1168,7 +1168,8 @@ unsigned si_get_flush_flags(struct si_context *sctx, enum si_coherency coher,
 			    enum si_cache_policy cache_policy);
 void si_clear_buffer(struct si_context *sctx, struct pipe_resource *dst,
 		     uint64_t offset, uint64_t size, uint32_t *clear_value,
-		     uint32_t clear_value_size, enum si_coherency coher);
+		     uint32_t clear_value_size, enum si_coherency coher,
+		     bool force_cpdma);
 void si_copy_buffer(struct si_context *sctx,
 		    struct pipe_resource *dst, struct pipe_resource *src,
 		    uint64_t dst_offset, uint64_t src_offset, unsigned size);
--- a/src/gallium/drivers/radeonsi/si_state_viewport.c
+++ b/src/gallium/drivers/radeonsi/si_state_viewport.c
@@ -186,7 +186,7 @@ static void si_emit_guardband(struct si_context *ctx)
 		ctx->chip_class >= VI ? 16 : MAX2(ctx->screen->se_tile_repeat, 16);

 	/* Indexed by quantization modes */
-	static unsigned max_viewport_size[] = {65535, 16383, 4095};
+	static int max_viewport_size[] = {65535, 16383, 4095};

 	/* Ensure that the whole viewport stays representable in
 	 * absolute coordinates.
--- a/src/gallium/drivers/radeonsi/si_test_dma.c
+++ b/src/gallium/drivers/radeonsi/si_test_dma.c
@@ -309,7 +309,7 @@ void si_test_dma(struct si_screen *sscreen)
 		/* clear dst pixels */
 		uint32_t zero = 0;
 		si_clear_buffer(sctx, dst, 0, sdst->surface.surf_size, &zero, 4,
-		                SI_COHERENCY_SHADER);
+		                SI_COHERENCY_SHADER, false);
 		memset(dst_cpu.ptr, 0, dst_cpu.layer_stride * tdst.array_size);

 		/* preparation */
--- a/src/gallium/drivers/v3d/v3d_blit.c
+++ b/src/gallium/drivers/v3d/v3d_blit.c
@@ -491,7 +491,8 @@ v3d_tfu_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
        if ((info->mask & PIPE_MASK_RGBA) == 0)
                return false;

-        if (info->dst.box.x != 0 ||
+        if (info->scissor_enable ||
+            info->dst.box.x != 0 ||
            info->dst.box.y != 0 ||
            info->dst.box.width != dst_width ||
            info->dst.box.height != dst_height ||
--- a/src/gallium/drivers/v3d/v3d_screen.c
+++ b/src/gallium/drivers/v3d/v3d_screen.c
@@ -185,7 +185,10 @@ v3d_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
        case PIPE_CAP_MAX_TEXTURE_2D_LEVELS:
        case PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS:
        case PIPE_CAP_MAX_TEXTURE_3D_LEVELS:
-                return V3D_MAX_MIP_LEVELS;
+                if (screen->devinfo.ver < 40)
+                        return 12;
+                else
+                        return V3D_MAX_MIP_LEVELS;
        case PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS:
                return 2048;

--- a/src/gallium/drivers/v3d/v3dx_draw.c
+++ b/src/gallium/drivers/v3d/v3dx_draw.c
@@ -55,7 +55,28 @@ v3d_start_draw(struct v3d_context *v3d)
        job->submit.bcl_start = job->bcl.bo->offset;
        v3d_job_add_bo(job, job->bcl.bo);

-        job->tile_alloc = v3d_bo_alloc(v3d->screen, 1024 * 1024, "tile_alloc");
+        /* The PTB will request the tile alloc initial size per tile at start
+         * of tile binning.
+         */
+        uint32_t tile_alloc_size = (job->draw_tiles_x *
+                                    job->draw_tiles_y) * 64;
+        /* The PTB allocates in aligned 4k chunks after the initial setup. */
+        tile_alloc_size = align(tile_alloc_size, 4096);
+
+        /* Include the first two chunk allocations that the PTB does so that
+         * we definitely clear the OOM condition before triggering one (the HW
+         * won't trigger OOM during the first allocations).
+         */
+        tile_alloc_size += 8192;
+
+        /* For performance, allocate some extra initial memory after the PTB's
+         * minimal allocations, so that we hopefully don't have to block the
+         * GPU on the kernel handling an OOM signal.
+         */
+        tile_alloc_size += 512 * 1024;
+
+        job->tile_alloc = v3d_bo_alloc(v3d->screen, tile_alloc_size,
+                                       "tile_alloc");
        uint32_t tsda_per_tile_size = v3d->screen->devinfo.ver >= 40 ? 256 : 64;
        job->tile_state = v3d_bo_alloc(v3d->screen,
                                       job->draw_tiles_y *
--- a/src/gallium/drivers/v3d/v3dx_state.c
+++ b/src/gallium/drivers/v3d/v3dx_state.c
@@ -846,6 +846,9 @@ v3d_setup_texture_shader_state(struct V3DX(TEXTURE_SHADER_STATE) *tex,
            prsc->target == PIPE_TEXTURE_1D_ARRAY) {
                tex->image_height = tex->image_width >> 14;
        }
+
+        tex->image_width &= (1 << 14) - 1;
+        tex->image_height &= (1 << 14) - 1;
 #endif

        if (prsc->target == PIPE_TEXTURE_3D) {
--- a/src/gallium/state_trackers/va/picture_mpeg12.c
+++ b/src/gallium/state_trackers/va/picture_mpeg12.c
@@ -27,6 +27,19 @@

 #include "va_private.h"

+const int reverse_inverse_zscan[] =
+{
+   /* Reverse inverse z scan pattern */
+    0,  2,  3,  9, 10, 20, 21, 35,
+    1,  4,  8, 11, 19, 22, 34, 36,
+    5,  7, 12, 18, 23, 33, 37, 48,
+    6, 13, 17, 24, 32, 38, 47, 49,
+   14, 16, 25, 31, 39, 46, 50, 57,
+   15, 26, 30, 40, 45, 51, 56, 58,
+   27, 29, 41, 44, 52, 55, 59, 62,
+   28, 42, 43, 53, 54, 60, 61, 63,
+};
+
 void vlVaHandlePictureParameterBufferMPEG12(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf)
 {
   VAPictureParameterBufferMPEG2 *mpeg2 = buf->data;
@@ -66,16 +79,29 @@ void vlVaHandlePictureParameterBufferMPEG12(vlVaDriver *drv, vlVaContext *contex
 void vlVaHandleIQMatrixBufferMPEG12(vlVaContext *context, vlVaBuffer *buf)
 {
   VAIQMatrixBufferMPEG2 *mpeg2 = buf->data;
+   static uint8_t temp_intra_matrix[64];
+   static uint8_t temp_nonintra_matrix[64];

   assert(buf->size >= sizeof(VAIQMatrixBufferMPEG2) && buf->num_elements == 1);
-   if (mpeg2->load_intra_quantiser_matrix)
-      context->desc.mpeg12.intra_matrix = mpeg2->intra_quantiser_matrix;
-   else
+   if (mpeg2->load_intra_quantiser_matrix) {
+      /* The quantiser matrix that VAAPI provides has been applied
+         with inverse z-scan. However, what we expect in MPEG2
+         picture description is the original order. Therefore,
+         we need to reverse it back to its original order.
+      */
+      for (int i = 0; i < 64; i++)
+         temp_intra_matrix[i] =
+            mpeg2->intra_quantiser_matrix[reverse_inverse_zscan[i]];
+      context->desc.mpeg12.intra_matrix = temp_intra_matrix;
+   } else
      context->desc.mpeg12.intra_matrix = NULL;

-   if (mpeg2->load_non_intra_quantiser_matrix)
-      context->desc.mpeg12.non_intra_matrix = mpeg2->non_intra_quantiser_matrix;
-   else
+   if (mpeg2->load_non_intra_quantiser_matrix) {
+      for (int i = 0; i < 64; i++)
+         temp_nonintra_matrix[i] =
+            mpeg2->non_intra_quantiser_matrix[reverse_inverse_zscan[i]];
+      context->desc.mpeg12.non_intra_matrix = temp_nonintra_matrix;
+   } else
      context->desc.mpeg12.non_intra_matrix = NULL;
 }

--- a/src/gallium/targets/dri/meson.build
+++ b/src/gallium/targets/dri/meson.build
@@ -60,6 +60,9 @@ libgallium_dri = shared_library(
    driver_tegra, driver_i915, driver_svga, driver_virgl,
    driver_swr,
  ],
+  # Will be deleted during installation, see install_megadrivers.py
+  install : true,
+  install_dir : dri_drivers_path,
 )

 foreach d : [[with_gallium_kmsro, 'pl111_dri.so'],
--- a/src/gallium/targets/va/meson.build
+++ b/src/gallium/targets/va/meson.build
@@ -49,6 +49,7 @@ libva_gallium = shared_library(
    dep_libdrm, dep_thread, driver_r600, driver_radeonsi, driver_nouveau,
  ],
  link_depends : va_link_depends,
+  # Will be deleted during installation, see install_megadrivers.py
  install : true,
  install_dir : va_drivers_path,
 )
--- a/src/gallium/targets/vdpau/meson.build
+++ b/src/gallium/targets/vdpau/meson.build
@@ -55,6 +55,9 @@ libvdpau_gallium = shared_library(
  ],
  link_depends : vdpau_link_depends,
  soversion : '@0@.@1@.0'.format(VDPAU_MAJOR, VDPAU_MINOR),
+  # Will be deleted during installation, see install_megadrivers.py
+  install : true,
+  install_dir : vdpau_drivers_path,
 )
 foreach d : [[with_gallium_r300, 'r300'],
             [with_gallium_r600, 'r600'],
--- a/src/gallium/targets/xvmc/meson.build
+++ b/src/gallium/targets/xvmc/meson.build
@@ -47,6 +47,9 @@ libxvmc_gallium = shared_library(
  ],
  dependencies : [dep_thread, driver_r600, driver_nouveau],
  link_depends : xvmc_link_depends,
+  # Will be deleted during installation, see install_megadrivers.py
+  install : true,
+  install_dir : xvmc_drivers_path,
 )

 foreach d : [[with_gallium_r600, 'r600'], [with_gallium_nouveau, 'nouveau']]
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -92,6 +92,10 @@ static bool do_winsys_init(struct amdgpu_winsys *ws,
   if (!ac_query_gpu_info(fd, ws->dev, &ws->info, &ws->amdinfo))
      goto fail;

+   /* TODO: Enable this once the kernel handles it efficiently. */
+   if (ws->info.has_dedicated_vram)
+      ws->info.has_local_buffers = false;
+
   handle_env_var_force_family(ws);

   ws->addrlib = amdgpu_addr_create(&ws->info, &ws->amdinfo, &ws->info.max_alignment);
--- a/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
+++ b/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
@@ -46,7 +46,7 @@


 #define VIRGL_DRM_VERSION(major, minor) ((major) << 16 | (minor))
-#define VIRGL_DRM_VERSION_FENCE_FD      VIRGL_DRM_VERSION(1, 0)
+#define VIRGL_DRM_VERSION_FENCE_FD      VIRGL_DRM_VERSION(0, 1)


 static inline boolean can_cache_resource(struct virgl_hw_res *res)
@@ -870,7 +870,7 @@ static int virgl_drm_get_version(int fd)
 	else if (version->version_major != 0)
 		ret = -EINVAL;
 	else
-		ret = version->version_minor;
+		ret = VIRGL_DRM_VERSION(0, version->version_minor);

 	drmFreeVersion(version);

--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -642,7 +642,6 @@ dri3_set_swap_interval(__GLXDRIdrawable *pdraw, int interval)
      break;
   }

-   priv->swap_interval = interval;
   loader_dri3_set_swap_interval(&priv->loader_drawable, interval);

   return 0;
@@ -659,7 +658,7 @@ dri3_get_swap_interval(__GLXDRIdrawable *pdraw)

   struct dri3_drawable *priv =  (struct dri3_drawable *) pdraw;

-  return priv->swap_interval;
+  return priv->loader_drawable.swap_interval;
 }

 static void
--- a/src/glx/dri3_priv.h
+++ b/src/glx/dri3_priv.h
@@ -117,7 +117,6 @@ struct dri3_context
 struct dri3_drawable {
   __GLXDRIdrawable base;
   struct loader_dri3_drawable loader_drawable;
-   int swap_interval;

   /* LIBGL_SHOW_FPS support */
   uint64_t previous_ust;
--- a/src/intel/blorp/meson.build
+++ b/src/intel/blorp/meson.build
@@ -33,5 +33,5 @@ libblorp = static_library(
  files_libblorp,
  include_directories : [inc_common, inc_intel],
  c_args : [c_vis_args, no_override_init_args],
-  dependencies : idep_nir_headers,
+  dependencies : [idep_nir_headers, idep_genxml],
 )
--- a/src/intel/common/meson.build
+++ b/src/intel/common/meson.build
@@ -43,5 +43,5 @@ libintel_common = static_library(
  include_directories : [inc_common, inc_intel],
  c_args : [c_vis_args, no_override_init_args],
  link_with : [libisl],
-  dependencies : [dep_expat, dep_libdrm, dep_thread],
+  dependencies : [dep_expat, dep_libdrm, dep_thread, idep_genxml],
 )
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -1160,6 +1160,12 @@ vec4_instruction::can_reswizzle(const struct gen_device_info *devinfo,
   if (devinfo->gen == 6 && is_math() && swizzle != BRW_SWIZZLE_XYZW)
      return false;

+   /* If we write to the flag register changing the swizzle would change
+    * what channels are written to the flag register.
+    */
+   if (writes_flag())
+      return false;
+
   /* We can't swizzle implicit accumulator access.  We'd have to
    * reswizzle the producer of the accumulator value in addition
    * to the consumer (i.e. both MUL and MACH).  Just skip this.
--- a/src/intel/dev/gen_device_info.c
+++ b/src/intel/dev/gen_device_info.c
@@ -414,6 +414,7 @@ static const struct gen_device_info gen_device_info_hsw_gt3 = {
   .has_64bit_types = true,                         \
   .supports_simd16_3src = true,                    \
   .has_surface_tile_offset = true,                 \
+   .num_thread_per_eu = 7,                          \
   .max_vs_threads = 504,                           \
   .max_tcs_threads = 504,                          \
   .max_tes_threads = 504,                          \
@@ -427,7 +428,6 @@ static const struct gen_device_info gen_device_info_bdw_gt1 = {
   .num_slices = 1,
   .num_subslices = { 2, },
   .num_eu_per_subslice = 8,
-   .num_thread_per_eu = 7,
   .l3_banks = 2,
   .max_cs_threads = 42,
   .urb = {
@@ -452,7 +452,6 @@ static const struct gen_device_info gen_device_info_bdw_gt2 = {
   .num_slices = 1,
   .num_subslices = { 3, },
   .num_eu_per_subslice = 8,
-   .num_thread_per_eu = 7,
   .l3_banks = 4,
   .max_cs_threads = 56,
   .urb = {
@@ -477,7 +476,6 @@ static const struct gen_device_info gen_device_info_bdw_gt3 = {
   .num_slices = 2,
   .num_subslices = { 3, 3, },
   .num_eu_per_subslice = 8,
-   .num_thread_per_eu = 7,
   .l3_banks = 8,
   .max_cs_threads = 56,
   .urb = {
@@ -503,7 +501,6 @@ static const struct gen_device_info gen_device_info_chv = {
   .num_slices = 1,
   .num_subslices = { 2, },
   .num_eu_per_subslice = 8,
-   .num_thread_per_eu = 7,
   .l3_banks = 2,
   .max_vs_threads = 80,
   .max_tcs_threads = 80,
@@ -609,8 +606,7 @@ static const struct gen_device_info gen_device_info_chv = {
 #define GEN9_FEATURES                               \
   GEN8_FEATURES,                                   \
   GEN9_HW_INFO,                                    \
-   .has_sample_with_hiz = true,                     \
-   .num_thread_per_eu = 7
+   .has_sample_with_hiz = true

 static const struct gen_device_info gen_device_info_skl_gt1 = {
   GEN9_FEATURES, .gt = 1,
--- a/src/intel/genxml/meson.build
+++ b/src/intel/genxml/meson.build
@@ -57,3 +57,5 @@ foreach f : gen_xml_files
    capture : true,
  )
 endforeach
+
+idep_genxml = declare_dependency(sources : [gen_xml_pack, genX_bits_h, genX_xml_h])
--- a/src/intel/meson.build
+++ b/src/intel/meson.build
@@ -21,9 +21,9 @@
 c_sse2_args = ['-msse2', '-mstackrealign']
 inc_intel = include_directories('.')

+subdir('genxml')
 subdir('blorp')
 subdir('dev')
-subdir('genxml')
 subdir('isl')
 subdir('common')
 subdir('compiler')
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -73,10 +73,10 @@ void anv_GetDescriptorSetLayoutSupport(

   bool supported = true;
   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
-      /* Our maximum binding table size is 250 and we need to reserve 8 for
-       * render targets.  240 is a nice round number.
+      /* Our maximum binding table size is 240 and we need to reserve 8 for
+       * render targets.
       */
-      if (surface_count[s] >= 240)
+      if (surface_count[s] >= MAX_BINDING_TABLE_SIZE - MAX_RTS)
         supported = false;
   }

--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -41,7 +41,6 @@
 #include "git_sha1.h"
 #include "vk_util.h"
 #include "common/gen_defines.h"
-#include "compiler/glsl_types.h"

 #include "genxml/gen7_pack.h"

@@ -704,7 +703,6 @@ void anv_DestroyInstance(

   vk_debug_report_instance_destroy(&instance->debug_report_callbacks);

-   _mesa_glsl_release_types();
   _mesa_locale_fini();

   vk_free(&instance->alloc, instance);
@@ -1031,7 +1029,7 @@ void anv_GetPhysicalDeviceProperties(
      .maxPerStageDescriptorSampledImages       = max_samplers,
      .maxPerStageDescriptorStorageImages       = max_images,
      .maxPerStageDescriptorInputAttachments    = 64,
-      .maxPerStageResources                     = 250,
+      .maxPerStageResources                     = MAX_BINDING_TABLE_SIZE - MAX_RTS,
      .maxDescriptorSetSamplers                 = 6 * max_samplers, /* number of stages * maxPerStageDescriptorSamplers */
      .maxDescriptorSetUniformBuffers           = 6 * 64,           /* number of stages * maxPerStageDescriptorUniformBuffers */
      .maxDescriptorSetUniformBuffersDynamic    = MAX_DYNAMIC_BUFFERS / 2,
@@ -1868,7 +1866,7 @@ VkResult anv_CreateDevice(
      result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
      goto fail_mutex;
   }
-   if (pthread_cond_init(&device->queue_submit, NULL) != 0) {
+   if (pthread_cond_init(&device->queue_submit, &condattr) != 0) {
      pthread_condattr_destroy(&condattr);
      result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
      goto fail_mutex;
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -163,6 +163,18 @@ struct gen_l3_config;
 #define MAX_GEN8_IMAGES 8
 #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */

+/* From the Skylake PRM Vol. 7 "Binding Table Surface State Model":
+ *
+ *    "The surface state model is used when a Binding Table Index (specified
+ *    in the message descriptor) of less than 240 is specified. In this model,
+ *    the Binding Table Index is used to index into the binding table, and the
+ *    binding table entry contains a pointer to the SURFACE_STATE."
+ *
+ * Binding table values above 240 are used for various things in the hardware
+ * such as stateless, stateless with incoherent cache, SLM, and bindless.
+ */
+#define MAX_BINDING_TABLE_SIZE 240
+
 /* The kernel relocation API has a limitation of a 32-bit delta value
 * applied to the address before it is written which, in spite of it being
 * unsigned, is treated as signed .  Because of the way that this maps to
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -2087,9 +2087,29 @@ compute_pipeline_create(
      vfe.URBEntryAllocationSize = GEN_GEN <= 7 ? 0 : 2;
      vfe.CURBEAllocationSize    = vfe_curbe_allocation;

-      vfe.PerThreadScratchSpace = get_scratch_space(cs_bin);
-      vfe.ScratchSpaceBasePointer =
-         get_scratch_address(pipeline, MESA_SHADER_COMPUTE, cs_bin);
+      if (cs_bin->prog_data->total_scratch) {
+         if (GEN_GEN >= 8) {
+            /* Broadwell's Per Thread Scratch Space is in the range [0, 11]
+             * where 0 = 1k, 1 = 2k, 2 = 4k, ..., 11 = 2M.
+             */
+            vfe.PerThreadScratchSpace =
+               ffs(cs_bin->prog_data->total_scratch) - 11;
+         } else if (GEN_IS_HASWELL) {
+            /* Haswell's Per Thread Scratch Space is in the range [0, 10]
+             * where 0 = 2k, 1 = 4k, 2 = 8k, ..., 10 = 2M.
+             */
+            vfe.PerThreadScratchSpace =
+               ffs(cs_bin->prog_data->total_scratch) - 12;
+         } else {
+            /* IVB and BYT use the range [0, 11] to mean [1kB, 12kB]
+             * where 0 = 1kB, 1 = 2kB, 2 = 3kB, ..., 11 = 12kB.
+             */
+            vfe.PerThreadScratchSpace =
+               cs_bin->prog_data->total_scratch / 1024 - 1;
+         }
+         vfe.ScratchSpaceBasePointer =
+            get_scratch_address(pipeline, MESA_SHADER_COMPUTE, cs_bin);
+      }
   }

   struct GENX(INTERFACE_DESCRIPTOR_DATA) desc = {
--- a/src/intel/vulkan/meson.build
+++ b/src/intel/vulkan/meson.build
@@ -105,7 +105,7 @@ foreach g : [['70', ['gen7_cmd_buffer.c']], ['75', ['gen7_cmd_buffer.c']],
      c_vis_args, no_override_init_args, c_sse2_args,
      '-DGEN_VERSIONx10=@0@'.format(_gen),
    ],
-    dependencies : [dep_libdrm, dep_valgrind, idep_nir_headers],
+    dependencies : [dep_libdrm, dep_valgrind, idep_nir_headers, idep_genxml],
  )
 endforeach

@@ -203,7 +203,7 @@ libvulkan_intel = shared_library(
    libvulkan_util, libvulkan_wsi, libmesa_util,
  ],
  dependencies : [
-    dep_thread, dep_dl, dep_m, anv_deps, idep_nir,
+    dep_thread, dep_dl, dep_m, anv_deps, idep_nir, idep_genxml,
  ],
  c_args : anv_flags,
  link_args : ['-Wl,--build-id=sha1', ld_args_bsymbolic, ld_args_gc_sections],
--- a/src/mesa/drivers/dri/i965/meson.build
+++ b/src/mesa/drivers/dri/i965/meson.build
@@ -187,7 +187,7 @@ libi965 = static_library(
    i965_gen_libs, libintel_common, libintel_dev, libisl, libintel_compiler,
    libblorp
  ],
-  dependencies : [dep_libdrm, dep_valgrind, idep_nir_headers],
+  dependencies : [dep_libdrm, dep_valgrind, idep_nir_headers, idep_genxml],
 )

 dri_drivers += libi965
--- a/src/mesa/drivers/dri/meson.build
+++ b/src/mesa/drivers/dri/meson.build
@@ -54,6 +54,9 @@ if dri_drivers != []
      dep_selinux, dep_libdrm, dep_expat, dep_m, dep_thread, dep_dl, idep_nir,
    ],
    link_args : [ld_args_build_id, ld_args_bsymbolic, ld_args_gc_sections],
+    # Will be deleted during installation, see install_megadrivers.py
+    install : true,
+    install_dir : dri_drivers_path,
  )

  meson.add_install_script(
Author	SHA1	Message	Date
Dylan Baker	c8cdee5dc3	docs: add relnotes for 19.0.3	2019-04-24 10:39:04 -07:00
Dylan Baker	5cb685a3b8	Bump version for 19.0.3	2019-04-24 10:36:18 -07:00
Marek Olšák	44ddb884c8	radeonsi: use CP DMA for the null const buffer clear on CIK This is a workaround for a thread deadlock that I have no idea why it occurs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879 Fixes: `9b331e462e` Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit `b58e5fb6f3`)	2019-04-24 08:50:50 -07:00
Samuel Pitoiset	ba1bf6c3ea	radv: do not load vertex attributes that are not provided by the pipeline Per the Vulkan spec this is definitely invalid but X4 Foundations does that and it ends up by hanging the GPU. Found while enabling validation layers with the game. The issue will be reported to the developers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 381e38aaaa47c5aa38bc4f504b325fb68b7caea8)	2019-04-24 08:50:34 -07:00
Kenneth Graunke	f223fb98e9	Revert "glsl: Set location on structure-split sampler uniform variables" This reverts commit `9e0c744f07`, which regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out that the newly produced location is meaningless and impossible to consume by drivers that want to look at gl_uniform_storage, so it's probably better to leave it unset (0) than a number that looks usable. Leave a tombstone^Wcomment to discourage the next person from making the obvious looking fix. See the next commit for a longer description of the problem. This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct on i965, which was originally fixed by the revert. The next commit will fix it again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `47303b466c`)	2019-04-23 09:18:20 -07:00
Lubomir Rintel	91671ec1f4	gallivm: disable NEON instructions if they are not supported The LLVM project made some questionable decisions about defaults for armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell platforms). On top of that, getHostCPUFeatures() doesn't disable missing machine attributes. Finally, -neon alone is not sufficient to disable emmision of NEON instructions. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> (cherry picked from commit `e983a975c6`)	2019-04-23 09:17:58 -07:00
Lubomir Rintel	b509068164	gallivm: guess CPU features also on ARM getHostCPUFeatures() is also available on ARM, for even longer time than for x86. Use it -- it potentially enables instructions that may speed things up. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518 Reviewed-by: Matt Turner <mattst88@gmail.com> (cherry picked from commit `bc6bfc861f`)	2019-04-23 09:17:51 -07:00
Jason Ekstrand	2397f5d99d	anv: Add a #define for the max binding table size This also fixes a bug where we mis-calculate maximum binding table sizes and may return true in vkGetDescriptorSetLayoutSupport even for sets too large to fit in a binding table. Fixes: `ddc4069122` "anv: Implement VK_KHR_maintenance3" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit `a5a0dc08f1`)	2019-04-22 09:06:44 -07:00
Lionel Landwerlin	ac1ffeab1d	intel/devinfo: fix missing num_thread_per_eu on ICL There was an assumption that num_thread_per_eu would be set in the Gen8 features. Since this is mostly the same of all gen8->11 (except GEN9_LP that overwrites it) let's just factor it out. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Anuj Phogat anuj.phogat@gmail.com (cherry picked from commit `773e6aa9fd`)	2019-04-22 09:06:38 -07:00
Eric Anholt	229c4abde3	nir: Fix deref offset calculation for structs. We were calcuating the offset for the field within the struct, and just dropping it on the floor. Fixes a regression in KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment and a few of its friends since the scratch lowering commit. Fixes: `e8e159e9df` ("nir/deref: Add helpers for getting offsets") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `9ac5ec2f90`)	2019-04-22 09:06:32 -07:00
Samuel Pitoiset	b5ea4378c3	ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+ They are buggy with LLVM 8 because they weren't marked as source of divergence, see r358579. Fixes: `dd0172e865` ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")" Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit `31164cf5f7`)	2019-04-22 09:06:26 -07:00
Lionel Landwerlin	23abb7d310	anv: fix uninitialized pthread cond clock domain Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `843775bab7` ("anv: Rework fences") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `dfd79079da`)	2019-04-18 16:18:46 -07:00
Juan A. Suarez Romero	32e08b2397	meson: Add dependency on genxml to anvil genfiles This fixes a race condition where anv_gen_files are executed before genxml files, which causes a build failure v2: add dependency on idep_genxml (Lionel) Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (cherry picked from commit `b74e605cf4`)	2019-04-17 12:47:45 -07:00
Danylo Piliaiev	bde36e0736	intel/compiler: Do not reswizzle dst if instruction writes to flag register If we write to the flag register changing the swizzle would change what channels are written to the flag register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201 Fixes: `4cd1a0be` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: <ian.d.romanick@intel.com> (cherry picked from commit `04508f57d1`)	2019-04-16 09:43:38 -07:00
Chia-I Wu	3400359432	virgl: fix fence fd version check Fixes: `d1a1c21e76` ("virgl: native fence fd support") Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `c45c889f95`)	2019-04-16 09:43:32 -07:00
Roland Scheidegger	ce4b6974cd	gallivm: fix bogus assert in get_indirect_index 0 is a valid value as max index, and the code handles it fine. This isn't commonly seen, as it will only happen with array declarations of size 1. Fixes piglit tests/shaders/complex-loop-analysis-bug.shader_test Fixes: `a3c898dc97` "gallivm: fix improper clamping of vertex index when fetching gs inputs" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110441 Reviewed-by: Brian Paul <brianp@vmware.com> (cherry picked from commit `88e0bbf24a`)	2019-04-16 09:43:25 -07:00
Bas Nieuwenhuizen	0ffd4c744d	ac: Move has_local_buffers disable to radeonsi. In radv we had a separate flag to actually use it + an env option to experimentally use it. The common code setting has_local_buffers to false of course broke that experimental option. Also the "enable on APU" did not make sense for RADV as it is still disabled by default. Fixes: `b21a4efb55` "radv/winsys: allow local BOs on APUs" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit `af9534b9f3`)	2019-04-16 09:43:18 -07:00
Rhys Perry	77dbb70e5c	nir,ac/nir: fix cube_face_coord Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: `a1a2a8dfda` ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit `8671cfe2a2`)	2019-04-16 09:43:10 -07:00
Andres Gomez	332da02f27	glsl/linker: location aliasing requires types to have the same width From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers, Page 67, (Location aliasing): " Further, when location aliasing, the aliases sharing the location must have the same underlying numerical type and bit width (floating-point or integer, 32-bit versus 64-bit, etc.) and the same auxiliary storage and interpolation qualification." Additionally, we have improved the linker error descriptions. Specifically, when taking structs into account we were producing a linker error because we assumed that all components in each location were used and that would cause component aliasing. This is not accurate of the actual problem. Now, the failure specifies that the underlying numerical type incompatibility is the cause for the failure. Fixes the following piglit test: tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test v2: - Do not assert if we see invalid numerical types. These come straight from shader code, so we should produce linker errors if shaders attempt to do location aliasing on variables that are not numerical such as records. - While we are at it, improve error reporting for the case of numerical type mismatch to include the shader stage. v3: - Allow location aliasing of images and samplers. If we get these it means bindless support is active and they should be handled as 64-bit integers (Ilia) - Make sure we produce link errors for any non-numerical type for which we attempt location aliasing, not just structs. v4: - Rebased with minor fixes (Andres). - Added fixing tag to the commit log (Andres). v5: - Remove the helper function and check individually for the underlying numerical type and bit width (Timothy). - Implicitly, assume that any non-treated type which is checked for its underlying numerical type is either integer or float and has a defined bit width (Timothy). - Implicitly, assume that structs are the only non-treated non-numerical type (Timothy). - Improve the linker error descriptions and commit log (Andres). Fixes: `13652e7516` ("glsl/linker: Fix type checks for location aliasing") Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit `75a3dd97aa`) [Andres Gomez: is_record() instead of is_struct() and brought glsl_base_type_get_bit_size] Signed-off-by: Andres Gomez <agomez@igalia.com>	2019-04-12 17:18:40 -07:00
Kenneth Graunke	2e63686268	glsl: Set location on structure-split sampler uniform variables gl_nir_lower_samplers_as_deref splits structure uniform variables, creating new variables for individual fields. As part of that, it calculates a new location. It then never set this on the new variables. Thanks to Michael Fiano for finding this bug. Fixes crashes on i965 with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct test, which was reduced from the failing case in Michael's app. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit `9e0c744f07`)	2019-04-12 13:36:49 -07:00
Jason Ekstrand	f9eaa873cf	anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7 We were always programming it with the Broadwell convention which is too large by a factor of two on Haswell and just plain wrong on IVB and BYT. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `7eaaff18cb`)	2019-04-12 13:36:49 -07:00
Eric Engestrom	aacefed521	meson: remove meson-created megadrivers symlinks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110356 Fixes: `aa7afe324c` "meson: strip rpath from megadrivers" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> (cherry picked from commit `c77acc3ceb`)	2019-04-12 13:36:49 -07:00
Dylan Baker	d41acb4c9e	docs: Add sha256 sums for 19.0.2	2019-04-10 20:40:42 -07:00
Dylan Baker	2964ee3ad0	docs: Add release notes for 19.0.2	2019-04-10 20:34:09 -07:00
Dylan Baker	349759165c	VERSION: bump version for 19.0.2	2019-04-10 20:30:30 -07:00
Boyuan Zhang	20db3b0e46	st/va: reverse qt matrix back to its original order The quantiser matrix that VAAPI provides has been applied with inverse z-scan. However, what we expect in MPEG2 picture description is the original order. Therefore, we need to reverse it back to its original order. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110257 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d507bcdcf2`)	2019-04-09 08:36:40 -07:00
Lionel Landwerlin	57b7dbbb21	intel: add dependency on genxml generated files Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `48e48b8560`) Conflicts resolved by Dylan Conflicts: src/gallium/drivers/iris/meson.build	2019-04-09 08:35:49 -07:00
Caio Marcelo de Oliveira Filho	b493686860	nir: Take if_uses into account when repairing SSA If a def is used as an condition before its definition, we should also consider this a case to repair. When repairing, make sure we rewrite any if conditions too. Found in while inspecting a SPIR-V conversion from a 'continue block' that contains a conditional branch. We pull the continue block up to the beggining of the loop, and the condition in the branch ends up defined afterwards. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `364212f1ed` "nir: Add a pass to repair SSA form" (cherry picked from commit `c037dbb0ef`)	2019-04-08 09:30:03 -07:00
Eric Anholt	73bc3248f4	v3d: Don't try to use the TFU blit path if a scissor is enabled. We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface. Fixes: `976ea90bdc` ("v3d: Add support for using the TFU to do some blits.") Fixes piglit fbo-scissor-blit. (cherry picked from commit `4c70f276bc`)	2019-04-05 09:08:03 -07:00
Eric Anholt	d1f4c96919	v3d: Bump the maximum texture size to 4k for V3D 4.x. 4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `62360e92ec`)	2019-04-05 09:07:57 -07:00
Eric Anholt	b7769cdfb7	dri3: Return the current swap interval from glXGetSwapIntervalMESA(). We were caching only the value set with glXSwapIntervalSGI(), missing out on the default setting of the swap interval by the loader. This fixes glxgears's warning about being vblank synchronized by default. Fixes: `9777c4234b` ("loader: drop the [gs]et_swap_interval callbacks") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `edc7deec42`)	2019-04-02 09:14:20 -07:00
Marek Olšák	e46e3bfd13	radeonsi: fix assertion failure by using the correct type src/gallium/drivers/radeonsi/si_state_viewport.c:196: si_emit_guardband: Assertion `vp_as_scissor.maxx <= max_viewport_size[vp_as_scissor.quant_mode] && vp_as_scissor.maxy <= max_viewport_size[vp_as_scissor.quant_mode]' failed. The comparison was unsigned, so negative maxx or maxy would fail. Fixes: `3c540e0a74` "radeonsi: Fix guardband computation for large render targets" (cherry picked from commit `3ad2a9b3fa`)	2019-04-01 09:47:45 -07:00
Leo Liu	a4d5161d42	radeon/vcn/vp9: search the render target from the whole list The number of render targets could be more than max of references, so we search the full list of the render pictures for the current render target index https://bugs.freedesktop.org/show_bug.cgi?id=109648 Signed-off-by: Leo Liu <leo.liu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Acked-by: James Zhu<James.Zhu@amd.com> Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit `d4e0fbc92f`)	2019-04-01 09:47:39 -07:00
Eric Engestrom	a1c30b8b78	meson: strip rpath from megadrivers More specifically, use the library file that has been post-processed by Meson when creating the hardlinks. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108766 Fixes: `3218056e0e` "meson: Build i965 and dri stack" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (cherry picked from commit `aa7afe324c`)	2019-04-01 09:47:34 -07:00
Karol Herbst	9987a3d448	nir/print: fix printing the image_array intrinsic index Fixes: `0de003be03` ("nir: Add handle/index-based image intrinsics") Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `6ffc72472c`)	2019-03-29 08:32:00 -07:00
Samuel Pitoiset	891c4ff633	radv: do not always initialize HTILE in compressed state Especially when performing a transtion from UNDEFINED->GENERAL, the driver shouldn't initialize HTILE metadata in compressed state because it doesn't decompress when the src layout is GENERAL. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110259 Fixes: `3a2e93147f` ("radv: always initialize HTILE when the src layout is UNDEFINED") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit `62a9d757e6`)	2019-03-29 08:31:53 -07:00
Samuel Pitoiset	a175dffe84	radv: skip updating depth/color metadata for conditional rendering I don't think we should update metadata when conditional rendering is enabled. For some reasons, some CTS breaks only on SI. This fixes the following CTS on SI: dEQP-VK.conditional_rendering.draw_clear.clear.depth.* Cc: 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit `6596eb2b30`)	2019-03-28 12:14:46 -07:00
Leo Liu	29bfb1af10	radeon/vcn: add H.264 constrained baseline support VCN supports this profile as well as UVD, so add it Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> CC: <mesa-stable@lists.freedesktop.org> (cherry picked from commit `f8ef8b56a6`)	2019-03-28 12:14:39 -07:00
Jason Ekstrand	dc6f00d53e	Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir" This reverts commit `4e1bbb000c`. It turns out that some DXVK apps due to some implementation detail of DXVK or other create and destroy instances in an interleaved way. Freeing the glsl_type memory without being a bit more careful causes use-after-free issues. Looks like we need to try again. (cherry picked from commit `ce47999cee`)	2019-03-27 11:49:05 -07:00
Dylan Baker	ba3eb3c938	docs: Add SHA256 sums for mesa 19.0.1	2019-03-27 10:10:37 -07:00
@@ -1 +1 @@
 .0.1
 .0.3