Compare commits
61 Commits
mesa-11.1.
...
mesa-11.1.
Author | SHA1 | Date | |
---|---|---|---|
| 5a616125ac | ||
| a8b2698494 | ||
| 7753691f1a | ||
| ce914d941d | ||
| 300f807649 | ||
| 61a275b789 | ||
| 0f3892ed9d | ||
| 3d942ee4e5 | ||
| efdf841238 | ||
| 5913a8c9ec | ||
| 3c9e76fc24 | ||
| 67b1e7b947 | ||
| 0586c5844f | ||
| 7d226ee279 | ||
| 36ff210d0e | ||
| 017f4755fd | ||
| 61cb4db868 | ||
| 34785fb7b9 | ||
| 22d6bf5078 | ||
| 9908d19699 | ||
| a500109aad | ||
| 0e78a67709 | ||
| 4bb16d712a | ||
| 950e9886d0 | ||
| dff89432d8 | ||
| 499d409a20 | ||
| a16f5195ef | ||
| f65b790089 | ||
| aa19234943 | ||
| 66f216d8ce | ||
| 4d34038ae5 | ||
| 781a68555d | ||
| e0b11bcc87 | ||
| 359679cb33 | ||
| fcf6091521 | ||
| aa5082b135 | ||
| 1df00e17d3 | ||
| ad3df9d168 | ||
| e4cf550501 | ||
| ecf2885d7f | ||
| fc59ca4064 | ||
| 396fbdc721 | ||
| 50ac2100df | ||
| 08cf0f8529 | ||
| ba51596b1d | ||
| 3d13bb8851 | ||
| 3bf2c6b96a | ||
| 5ab1bb4bec | ||
| c5ca18ec2f | ||
| f6cca7a0c9 | ||
| ae649bf1ad | ||
| 132303cfe4 | ||
| 9df2431194 | ||
| dd409e2a41 | ||
| 38c770ec29 | ||
| d8450616d9 | ||
| c9fe9e4b42 | ||
| 693e938321 | ||
| 7a0661839b | ||
| 4c234d183b | ||
| b37189523e | ||
@@ -1,2 +0,0 @@
|
||||
# The introduced definitions are not used/implemented by mesa
|
||||
1d5b88e33b07bc26d612720e6cb197a6917ba75f gles2: Update gl2ext.h to revision: 32120
|
@@ -238,6 +238,12 @@ for details.
|
||||
</ul>
|
||||
|
||||
|
||||
<h3>VA-API state tracker environment variables</h3>
|
||||
<ul>
|
||||
<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.
|
||||
</ul>
|
||||
|
||||
|
||||
<p>
|
||||
Other Gallium drivers have their own environment variables. These may change
|
||||
frequently so the source code should be consulted for details.
|
||||
|
@@ -14,7 +14,7 @@
|
||||
<iframe src="../contents.html"></iframe>
|
||||
<div class="content">
|
||||
|
||||
<h1>Mesa 11.1.0 Release Notes / TBD</h1>
|
||||
<h1>Mesa 11.1.0 Release Notes / 15 December 2015</h1>
|
||||
|
||||
<p>
|
||||
Mesa 11.1.0 is a new development release.
|
||||
@@ -84,11 +84,196 @@ Note: some of the new features are only available with certain drivers.
|
||||
|
||||
<h2>Bug fixes</h2>
|
||||
|
||||
TBD.
|
||||
<p>This list is likely incomplete.</p>
|
||||
|
||||
<ul>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=28130">Bug 28130</a> - vbo: premature flushing breaks GL_LINE_LOOP</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=49779">Bug 49779</a> - Extra line segments in GL_LINE_LOOP</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors "work"</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80821">Bug 80821</a> - When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81174">Bug 81174</a> - Gallium: GL_LINE_LOOP broken with more than 512 points</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83508">Bug 83508</a> - [UBO] Assertion for array of blocks</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw@entry=0x7fffd4097a08, fb=fb@entry=0x7fffd40fa900, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86469">Bug 86469</a> - Unreal Engine demo doesn't run</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89014">Bug 89014</a> - PIPE_QUERY_GPU_FINISHED is not acting as expected on SI</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90175">Bug 90175</a> - [hsw bisected][PATCH] atomic counters doesn't work for a binding point different to zero</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90631">Bug 90631</a> - Compilation failure for fragment shader with many branches on Sandy Bridge</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is > 32k</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91785">Bug 91785</a> - make check DispatchSanity_test.GLES31 regression</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image & DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91898">Bug 91898</a> - src/util/mesa-sha1.c:250:25: fatal error: openssl/sha.h: No such file or directory</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail with kernel upgrade</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91930">Bug 91930</a> - Program with GtkGLArea widget does not redraw</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91985">Bug 91985</a> - [regression, bisected] FTBFS with commit f9caabe8f1: R600_UCP_CONST_BUFFER is undefined</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92033">Bug 92033</a> - [SNB,regression,dEQP,bisected] functional.shaders.random tests regressed</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92052">Bug 92052</a> - nir/nir_builder.h:79: error: expected primary-expression before ‘.’ token</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92054">Bug 92054</a> - make check gbm-symbols-check regression</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92066">Bug 92066</a> - [ILK,G45,regression] New assertion on BRW_MAX_MRF breaks ilk and g45</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92122">Bug 92122</a> - [bisected, cts] Regression with Assault Android Cactus</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92124">Bug 92124</a> - shader_query.cpp:841:34: error: ‘strndup’ was not declared in this scope</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92183">Bug 92183</a> - linker.cpp:3187:46: error: ‘strtok_r’ was not declared in this scope</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92221">Bug 92221</a> - Unintended code changes in _mesa_base_tex_format commit</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92621">Bug 92621</a> - [G965 ILK G45] Regression: 24 piglit regressions in glsl-1.10</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92639">Bug 92639</a> - [Regression bisected] Ogles1conform mustpass.c fail</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92641">Bug 92641</a> - [SKL BSW] [Regression] Ogles1conform userclip.c fail</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92645">Bug 92645</a> - kodi vdpau interop fails since mesa,meta: move gl_texture_object::TargetIndex initializations</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92705">Bug 92705</a> - [clover] fail to build with llvm-svn/clang-svn 3.8</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - "LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main" when starting race in stuntrally</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92744">Bug 92744</a> - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92770">Bug 92770</a> - [SNB, regression, dEQP] deqp-gles3.functional.shaders.discard.dynamic_loop_texture</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92824">Bug 92824</a> - [regression, bisected] `make check` dispatch-sanity broken by GL_EXT_buffer_storage</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92859">Bug 92859</a> - [regression, bisected] validate_intrinsic_instr: Assertion triggered</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92985">Bug 92985</a> - Mac OS X build error "ar: no archive members specified"</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93015">Bug 93015</a> - Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of ‘pipe_virgl_create_screen’ follows non-static declaration</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
|
||||
|
||||
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
|
||||
|
||||
</ul>
|
||||
|
||||
|
||||
<h2>Changes</h2>
|
||||
|
||||
TBD.
|
||||
<li>MPEG4 decoding has been disabled by default in the VAAPI driver</li>
|
||||
|
||||
</div>
|
||||
</body>
|
||||
|
File diff suppressed because it is too large
@@ -575,8 +575,8 @@ CodeEmitterGK110::emitIMUL(const Instruction *i)
|
||||
if (isLIMM(i->src(1), TYPE_S32)) {
|
||||
emitForm_L(i, 0x280, 2, Modifier(0));
|
||||
|
||||
assert(i->subOp != NV50_IR_SUBOP_MUL_HIGH);
|
||||
|
||||
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
|
||||
code[1] |= 1 << 24;
|
||||
if (i->sType == TYPE_S32)
|
||||
code[1] |= 3 << 25;
|
||||
} else {
|
||||
@@ -695,14 +695,9 @@ CodeEmitterGK110::emitIMAD(const Instruction *i)
|
||||
if (i->sType == TYPE_S32)
|
||||
code[1] |= (1 << 19) | (1 << 24);
|
||||
|
||||
if (code[0] & 0x1) {
|
||||
assert(!i->subOp);
|
||||
SAT_(39);
|
||||
} else {
|
||||
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
|
||||
code[1] |= 1 << 25;
|
||||
SAT_(35);
|
||||
}
|
||||
if (i->subOp == NV50_IR_SUBOP_MUL_HIGH)
|
||||
code[1] |= 1 << 25;
|
||||
SAT_(35);
|
||||
}
|
||||
|
||||
void
|
||||
|
@@ -202,7 +202,8 @@ NV50LegalizePostRA::visit(Function *fn)
    Program *prog = fn->getProgram();

    r63 = new_LValue(fn, FILE_GPR);
-   if (prog->maxGPR < 63)
+   // GPR units on nv50 are in half-regs
+   if (prog->maxGPR < 126)
       r63->reg.data.id = 63;
    else
       r63->reg.data.id = 127;
|
@@ -686,7 +686,7 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
       i->tex.s = 0x1f;
       i->setIndirectR(hnd);
       i->setIndirectS(NULL);
-   } else if (i->tex.r == i->tex.s) {
+   } else if (i->tex.r == i->tex.s || i->op == OP_TXF) {
       i->tex.r += prog->driver->io.texBindBase / 4;
       i->tex.s = 0; // only a single cX[] value possible here
    } else {
|
@@ -598,6 +598,106 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* execute a single slot ALU calculation */
|
||||
static int single_alu_op2(struct r600_shader_ctx *ctx, int op,
|
||||
int dst_sel, int dst_chan,
|
||||
int src0_sel, unsigned src0_chan_val,
|
||||
int src1_sel, unsigned src1_chan_val)
|
||||
{
|
||||
struct r600_bytecode_alu alu;
|
||||
int r, i;
|
||||
|
||||
if (ctx->bc->chip_class == CAYMAN && op == ALU_OP2_MULLO_INT) {
|
||||
for (i = 0; i < 4; i++) {
|
||||
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
|
||||
alu.op = op;
|
||||
alu.src[0].sel = src0_sel;
|
||||
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[0].value = src0_chan_val;
|
||||
else
|
||||
alu.src[0].chan = src0_chan_val;
|
||||
alu.src[1].sel = src1_sel;
|
||||
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[1].value = src1_chan_val;
|
||||
else
|
||||
alu.src[1].chan = src1_chan_val;
|
||||
alu.dst.sel = dst_sel;
|
||||
alu.dst.chan = i;
|
||||
alu.dst.write = i == dst_chan;
|
||||
alu.last = (i == 3);
|
||||
r = r600_bytecode_add_alu(ctx->bc, &alu);
|
||||
if (r)
|
||||
return r;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
|
||||
alu.op = op;
|
||||
alu.src[0].sel = src0_sel;
|
||||
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[0].value = src0_chan_val;
|
||||
else
|
||||
alu.src[0].chan = src0_chan_val;
|
||||
alu.src[1].sel = src1_sel;
|
||||
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[1].value = src1_chan_val;
|
||||
else
|
||||
alu.src[1].chan = src1_chan_val;
|
||||
alu.dst.sel = dst_sel;
|
||||
alu.dst.chan = dst_chan;
|
||||
alu.dst.write = 1;
|
||||
alu.last = 1;
|
||||
r = r600_bytecode_add_alu(ctx->bc, &alu);
|
||||
if (r)
|
||||
return r;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* execute a single slot ALU calculation */
|
||||
static int single_alu_op3(struct r600_shader_ctx *ctx, int op,
|
||||
int dst_sel, int dst_chan,
|
||||
int src0_sel, unsigned src0_chan_val,
|
||||
int src1_sel, unsigned src1_chan_val,
|
||||
int src2_sel, unsigned src2_chan_val)
|
||||
{
|
||||
struct r600_bytecode_alu alu;
|
||||
int r;
|
||||
|
||||
/* validate this for other ops */
|
||||
assert(op == ALU_OP3_MULADD_UINT24);
|
||||
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
|
||||
alu.op = op;
|
||||
alu.src[0].sel = src0_sel;
|
||||
if (src0_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[0].value = src0_chan_val;
|
||||
else
|
||||
alu.src[0].chan = src0_chan_val;
|
||||
alu.src[1].sel = src1_sel;
|
||||
if (src1_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[1].value = src1_chan_val;
|
||||
else
|
||||
alu.src[1].chan = src1_chan_val;
|
||||
alu.src[2].sel = src2_sel;
|
||||
if (src2_sel == V_SQ_ALU_SRC_LITERAL)
|
||||
alu.src[2].value = src2_chan_val;
|
||||
else
|
||||
alu.src[2].chan = src2_chan_val;
|
||||
alu.dst.sel = dst_sel;
|
||||
alu.dst.chan = dst_chan;
|
||||
alu.is_op3 = 1;
|
||||
alu.last = 1;
|
||||
r = r600_bytecode_add_alu(ctx->bc, &alu);
|
||||
if (r)
|
||||
return r;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int get_address_file_reg(struct r600_shader_ctx *ctx, int index)
|
||||
{
|
||||
return index > 0 ? ctx->bc->index_reg[index - 1] : ctx->bc->ar_reg;
|
||||
}
|
||||
|
||||
static int vs_add_primid_output(struct r600_shader_ctx *ctx, int prim_id_sid)
|
||||
{
|
||||
int i;
|
||||
@@ -1129,6 +1229,7 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
|
||||
unsigned vtx_id = src->Dimension.Index;
|
||||
int offset_reg = vtx_id / 3;
|
||||
int offset_chan = vtx_id % 3;
|
||||
int t2 = 0;
|
||||
|
||||
/* offsets of per-vertex data in ESGS ring are passed to GS in R0.x, R0.y,
|
||||
* R0.w, R1.x, R1.y, R1.z (it seems R0.z is used for PrimitiveID) */
|
||||
@@ -1136,13 +1237,24 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
|
||||
if (offset_reg == 0 && offset_chan == 2)
|
||||
offset_chan = 3;
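The comment and the two index computations above can be read as a small lookup: vertex `vtx_id` finds its ESGS ring offset in register `vtx_id / 3`, channel `vtx_id % 3`, except that channel z of R0 is skipped in favour of w. A minimal sketch of that mapping follows. `gs_offset_slot` and `gs_vertex_offset_slot` are hypothetical names, not r600 driver symbols, and the `vtx_id < 6` bound is an assumption taken from the six slots the comment lists.

```c
#include <assert.h>

/* Hypothetical type: which GPR and channel hold a vertex's ring offset. */
struct gs_offset_slot {
    int reg;   /* 0 for R0, 1 for R1 */
    int chan;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Hypothetical helper mirroring the index arithmetic in fetch_gs_input. */
static struct gs_offset_slot gs_vertex_offset_slot(unsigned vtx_id)
{
    struct gs_offset_slot slot;

    /* Assumption: only the six slots named in the source comment are valid. */
    assert(vtx_id < 6);

    slot.reg = (int)(vtx_id / 3);
    slot.chan = (int)(vtx_id % 3);

    /* R0.z is skipped (the source comment suggests it carries PrimitiveID),
     * so the third offset is read from R0.w instead. */
    if (slot.reg == 0 && slot.chan == 2)
        slot.chan = 3;

    return slot;
}
```

With this mapping, vertex 2 resolves to R0.w and vertex 5 to R1.z, which matches the order given in the comment.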
|
||||
|
||||
if (src->Dimension.Indirect || src->Register.Indirect)
|
||||
t2 = r600_get_temp(ctx);
|
||||
|
||||
if (src->Dimension.Indirect) {
|
||||
int treg[3];
|
||||
int t2;
|
||||
struct r600_bytecode_alu alu;
|
||||
int r, i;
|
||||
|
||||
/* you have got to be shitting me -
|
||||
unsigned addr_reg;
|
||||
addr_reg = get_address_file_reg(ctx, src->DimIndirect.Index);
|
||||
if (src->DimIndirect.Index > 0) {
|
||||
r = single_alu_op2(ctx, ALU_OP1_MOV,
|
||||
ctx->bc->ar_reg, 0,
|
||||
addr_reg, 0,
|
||||
0, 0);
|
||||
if (r)
|
||||
return r;
|
||||
}
|
||||
/*
|
||||
we have to put the R0.x/y/w into Rt.x Rt+1.x Rt+2.x then index reg from Rt.
|
||||
at least this is what fglrx seems to do. */
|
||||
for (i = 0; i < 3; i++) {
|
||||
@@ -1150,7 +1262,6 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
|
||||
}
|
||||
r600_add_gpr_array(ctx->shader, treg[0], 3, 0x0F);
|
||||
|
||||
t2 = r600_get_temp(ctx);
|
||||
for (i = 0; i < 3; i++) {
|
||||
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
|
||||
alu.op = ALU_OP1_MOV;
|
||||
@@ -1175,8 +1286,33 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
|
||||
if (r)
|
||||
return r;
|
||||
offset_reg = t2;
|
||||
offset_chan = 0;
|
||||
}
|
||||
|
||||
if (src->Register.Indirect) {
|
||||
int addr_reg;
|
||||
unsigned first = ctx->info.input_array_first[src->Indirect.ArrayID];
|
||||
|
||||
addr_reg = get_address_file_reg(ctx, src->Indirect.Index);
|
||||
|
||||
/* pull the value from index_reg */
|
||||
r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
|
||||
t2, 1,
|
||||
addr_reg, 0,
|
||||
V_SQ_ALU_SRC_LITERAL, first);
|
||||
if (r)
|
||||
return r;
|
||||
r = single_alu_op3(ctx, ALU_OP3_MULADD_UINT24,
|
||||
t2, 0,
|
||||
t2, 1,
|
||||
V_SQ_ALU_SRC_LITERAL, 4,
|
||||
offset_reg, offset_chan);
|
||||
if (r)
|
||||
return r;
|
||||
offset_reg = t2;
|
||||
offset_chan = 0;
|
||||
index = src->Register.Index - first;
|
||||
}
|
||||
|
||||
memset(&vtx, 0, sizeof(vtx));
|
||||
vtx.buffer_id = R600_GS_RING_CONST_BUFFER;
|
||||
@@ -1222,6 +1358,7 @@ static int tgsi_split_gs_inputs(struct r600_shader_ctx *ctx)
|
||||
|
||||
fetch_gs_input(ctx, src, treg);
|
||||
ctx->src[i].sel = treg;
|
||||
ctx->src[i].rel = 0;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
@@ -1498,7 +1635,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
|
||||
*last_exp_pos = NULL, *last_exp_param = NULL;
|
||||
int i, j, next_clip_pos = 61, next_param = 0;
|
||||
int ring;
|
||||
|
||||
bool only_ring_0 = true;
|
||||
cshader = calloc(1, sizeof(struct r600_pipe_shader));
|
||||
if (!cshader)
|
||||
return 0;
|
||||
@@ -1570,6 +1707,8 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
|
||||
for (i = 0; i < so->num_outputs; i++) {
|
||||
if (so->output[i].stream == ring) {
|
||||
enabled = true;
|
||||
if (ring > 0)
|
||||
only_ring_0 = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -1604,7 +1743,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
|
||||
cf_jump = ctx.bc->cf_last;
|
||||
|
||||
if (enabled)
|
||||
emit_streamout(&ctx, so, ring, &cshader->shader.ring_item_sizes[ring]);
|
||||
emit_streamout(&ctx, so, only_ring_0 ? -1 : ring, &cshader->shader.ring_item_sizes[ring]);
|
||||
cshader->shader.ring_item_sizes[ring] = ocnt * 16;
|
||||
}
|
||||
|
||||
@@ -7185,7 +7324,7 @@ static int tgsi_eg_arl(struct r600_shader_ctx *ctx)
|
||||
struct r600_bytecode_alu alu;
|
||||
int r;
|
||||
int i, lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
|
||||
unsigned reg = inst->Dst[0].Register.Index > 0 ? ctx->bc->index_reg[inst->Dst[0].Register.Index - 1] : ctx->bc->ar_reg;
|
||||
unsigned reg = get_address_file_reg(ctx, inst->Dst[0].Register.Index);
|
||||
|
||||
assert(inst->Dst[0].Register.Index < 3);
|
||||
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
|
||||
|
@@ -239,8 +239,8 @@ bool r600_common_context_init(struct r600_common_context *rctx,
    rctx->family = rscreen->family;
    rctx->chip_class = rscreen->chip_class;

-   if (rscreen->family == CHIP_HAWAII)
-       rctx->max_db = 16;
+   if (rscreen->chip_class >= CIK)
+       rctx->max_db = MAX2(8, rscreen->info.r600_num_backends);
    else if (rscreen->chip_class >= EVERGREEN)
        rctx->max_db = 8;
    else
|
@@ -489,6 +489,10 @@ static void vi_texture_alloc_dcc_separate(struct r600_common_screen *rscreen,
    if (rscreen->debug_flags & DBG_NO_DCC)
        return;

+   /* TODO: DCC is broken on Stoney */
+   if (rscreen->family == CHIP_STONEY)
+       return;
+
    rtex->dcc_buffer = (struct r600_resource *)
        r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM,
            PIPE_USAGE_DEFAULT, rtex->surface.dcc_size, rtex->surface.dcc_alignment);
|
@@ -632,7 +632,7 @@ void si_check_vm_faults(struct si_context *sctx)
    /* Use conservative timeout 800ms, after which we won't wait any
     * longer and assume the GPU is hung.
     */
-   screen->fence_finish(screen, sctx->last_gfx_fence, 800*1000*1000);
+   sctx->b.ws->fence_wait(sctx->b.ws, sctx->last_gfx_fence, 800*1000*1000);

    if (!si_vm_fault_occured(sctx, &addr))
        return;
|
@@ -594,6 +594,14 @@ static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
|
||||
lp_build_const_int32(gallivm, swizzle));
|
||||
|
||||
value = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
|
||||
if (type == TGSI_TYPE_DOUBLE) {
|
||||
LLVMValueRef value2;
|
||||
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
|
||||
lp_build_const_int32(gallivm, swizzle + 1));
|
||||
value2 = build_indexed_load(si_shader_ctx, si_shader_ctx->lds, dw_addr);
|
||||
return radeon_llvm_emit_fetch_double(bld_base, value, value2);
|
||||
}
|
||||
|
||||
return LLVMBuildBitCast(gallivm->builder, value,
|
||||
tgsi2llvmtype(bld_base, type), "");
|
||||
}
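For `TGSI_TYPE_DOUBLE` the added branch loads a second dword at `swizzle + 1` and hands both values to `radeon_llvm_emit_fetch_double`. The sketch below shows the general idea of assembling a 64-bit double from two 32-bit halves in plain C. It assumes the first dword is the low half and that `double` is IEEE-754 binary64. `combine_dwords_to_double` is a hypothetical name, and this is not the LLVM IR the driver actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a double from its low and high 32-bit halves. */
static double combine_dwords_to_double(uint32_t lo, uint32_t hi)
{
    /* Assumption: lo carries bits 0..31 and hi carries bits 32..63. */
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double result;

    /* The bit copy below is only meaningful for a 64-bit double. */
    assert(sizeof(result) == sizeof(bits));

    /* memcpy reinterprets the bits without violating aliasing rules. */
    memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Under those assumptions, a high half of `0x3FF00000` with a zero low half yields 1.0.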
|
||||
@@ -733,6 +741,7 @@ static LLVMValueRef fetch_input_gs(
|
||||
unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
|
||||
unsigned semantic_index = info->input_semantic_index[reg->Register.Index];
|
||||
unsigned param;
|
||||
LLVMValueRef value;
|
||||
|
||||
if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
|
||||
return get_primitive_id(bld_base, swizzle);
|
||||
@@ -774,11 +783,22 @@ static LLVMValueRef fetch_input_gs(
|
||||
args[7] = uint->zero; /* SLC */
|
||||
args[8] = uint->zero; /* TFE */
|
||||
|
||||
value = lp_build_intrinsic(gallivm->builder,
|
||||
"llvm.SI.buffer.load.dword.i32.i32",
|
||||
i32, args, 9,
|
||||
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
|
||||
if (type == TGSI_TYPE_DOUBLE) {
|
||||
LLVMValueRef value2;
|
||||
args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle + 1) * 256);
|
||||
value2 = lp_build_intrinsic(gallivm->builder,
|
||||
"llvm.SI.buffer.load.dword.i32.i32",
|
||||
i32, args, 9,
|
||||
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute);
|
||||
return radeon_llvm_emit_fetch_double(bld_base,
|
||||
value, value2);
|
||||
}
|
||||
return LLVMBuildBitCast(gallivm->builder,
|
||||
lp_build_intrinsic(gallivm->builder,
|
||||
"llvm.SI.buffer.load.dword.i32.i32",
|
||||
i32, args, 9,
|
||||
LLVMReadOnlyAttribute | LLVMNoUnwindAttribute),
|
||||
value,
|
||||
tgsi2llvmtype(bld_base, type), "");
|
||||
}
|
||||
|
||||
|
@@ -21,6 +21,7 @@ C_SOURCES := \
|
||||
vc4_job.c \
|
||||
vc4_nir_lower_blend.c \
|
||||
vc4_nir_lower_io.c \
|
||||
vc4_nir_lower_txf_ms.c \
|
||||
vc4_opt_algebraic.c \
|
||||
vc4_opt_constant_folding.c \
|
||||
vc4_opt_copy_propagation.c \
|
||||
|
@@ -121,6 +121,11 @@ enum vc4_packet {
|
||||
#define VC4_PACKET_TILE_COORDINATES_SIZE 3
|
||||
#define VC4_PACKET_GEM_HANDLES_SIZE 9
|
||||
|
||||
/* Number of multisamples supported. */
|
||||
#define VC4_MAX_SAMPLES 4
|
||||
/* Size of a full resolution color or Z tile buffer load/store. */
|
||||
#define VC4_TILE_BUFFER_SIZE (64 * 64 * 4)
|
||||
|
||||
#define VC4_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))
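`VC4_MASK(high, low)` builds a mask with bits `low` through `high` set. A function-style sketch of the same arithmetic, using hypothetical names, makes the result easier to check by hand. Unlike the macro, it works on an unsigned type and special-cases a full 32-bit field, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering bits low..high inclusive. */
static uint32_t field_mask(unsigned high, unsigned low)
{
    unsigned width;
    uint32_t ones;

    assert(low <= high && high <= 31);

    width = high - low + 1;
    /* A shift count of 32 is undefined, so handle the full-width case. */
    ones = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return ones << low;
}

/* Hypothetical helper: shift value into the field, rejecting values
 * that do not fit in the field's width. */
static uint32_t field_set(uint32_t value, unsigned high, unsigned low)
{
    uint32_t mask = field_mask(high, low);

    assert((value & ~(mask >> low)) == 0);
    return value << low;
}
```

For example, bits 7 through 4 give the mask `0xF0`, and storing the value 5 in that field produces `0x50`.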
|
||||
/* Using the GNU statement expression extension */
|
||||
#define VC4_SET_FIELD(value, field) \
|
||||
@@ -151,6 +156,16 @@ enum vc4_packet {
|
||||
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
|
||||
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
|
||||
|
||||
/** @{
|
||||
*
|
||||
* low bits of VC4_PACKET_STORE_FULL_RES_TILE_BUFFER and
|
||||
* VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER.
|
||||
*/
|
||||
#define VC4_LOADSTORE_FULL_RES_EOF (1 << 3)
|
||||
#define VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL (1 << 2)
|
||||
#define VC4_LOADSTORE_FULL_RES_DISABLE_ZS (1 << 1)
|
||||
#define VC4_LOADSTORE_FULL_RES_DISABLE_COLOR (1 << 0)
|
||||
|
||||
/** @{
|
||||
*
|
||||
* byte 2 of VC4_PACKET_STORE_TILE_BUFFER_GENERAL and
|
||||
|
@@ -36,9 +36,11 @@
|
||||
|
||||
struct vc4_rcl_setup {
|
||||
struct drm_gem_cma_object *color_read;
|
||||
struct drm_gem_cma_object *color_ms_write;
|
||||
struct drm_gem_cma_object *color_write;
|
||||
struct drm_gem_cma_object *zs_read;
|
||||
struct drm_gem_cma_object *zs_write;
|
||||
struct drm_gem_cma_object *msaa_color_write;
|
||||
struct drm_gem_cma_object *msaa_zs_write;
|
||||
|
||||
struct drm_gem_cma_object *rcl;
|
||||
u32 next_offset;
|
||||
@@ -62,7 +64,6 @@ static inline void rcl_u32(struct vc4_rcl_setup *setup, u32 val)
|
||||
setup->next_offset += 4;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Emits a no-op STORE_TILE_BUFFER_GENERAL.
|
||||
*
|
||||
@@ -81,6 +82,22 @@ static void vc4_store_before_load(struct vc4_rcl_setup *setup)
|
||||
rcl_u32(setup, 0); /* no address, since we're in None mode */
|
||||
}
|
||||
|
||||
/*
|
||||
* Calculates the physical address of the start of a tile in a RCL surface.
|
||||
*
|
||||
* Unlike the other load/store packets,
|
||||
* VC4_PACKET_LOAD/STORE_FULL_RES_TILE_BUFFER don't look at the tile
|
||||
* coordinates packet, and instead just store to the address given.
|
||||
*/
|
||||
static uint32_t vc4_full_res_offset(struct vc4_exec_info *exec,
|
||||
struct drm_gem_cma_object *bo,
|
||||
struct drm_vc4_submit_rcl_surface *surf,
|
||||
uint8_t x, uint8_t y)
|
||||
{
|
||||
return bo->paddr + surf->offset + VC4_TILE_BUFFER_SIZE *
|
||||
(DIV_ROUND_UP(exec->args->width, 32) * y + x);
|
||||
}
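Review note: the address arithmetic in `vc4_full_res_offset()` can be reproduced in a standalone sketch for reading purposes. The names `full_res_offset`, `div_round_up` and `TILE_BUFFER_SIZE` are hypothetical stand-ins and not part of the driver. The `64 * 64 * 4` tile size and the 32-pixel tile width are taken from the hunks above. Everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VC4_TILE_BUFFER_SIZE from the diff: one full-resolution tile. */
#define TILE_BUFFER_SIZE (64u * 64u * 4u)

/* Round-up integer division, standing in for the kernel's DIV_ROUND_UP. */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
        assert(d != 0);
        return (n + d - 1) / d;
}

/*
 * Hypothetical stand-in for vc4_full_res_offset(): tiles are laid out
 * row-major, and one row holds DIV_ROUND_UP(width, 32) tiles.
 */
static uint32_t full_res_offset(uint32_t paddr, uint32_t surf_offset,
                                uint32_t width, uint8_t x, uint8_t y)
{
        uint32_t stride = div_round_up(width, 32);

        return paddr + surf_offset + TILE_BUFFER_SIZE * (stride * y + x);
}
```

The callers in this diff OR the `VC4_LOADSTORE_FULL_RES_*` flag bits into the low bits of the returned address. That is presumably why the MSAA setup path rejects surface offsets that are not 16-byte aligned.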
|
||||
|
||||
/*
|
||||
* Emits a PACKET_TILE_COORDINATES if one isn't already pending.
|
||||
*
|
||||
@@ -108,22 +125,41 @@ static void emit_tile(struct vc4_exec_info *exec,
|
||||
* may be outstanding at a time.
|
||||
*/
|
||||
if (setup->color_read) {
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
|
||||
rcl_u16(setup, args->color_read.bits);
|
||||
rcl_u32(setup,
|
||||
setup->color_read->paddr + args->color_read.offset);
|
||||
if (args->color_read.flags &
|
||||
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
|
||||
rcl_u32(setup,
|
||||
vc4_full_res_offset(exec, setup->color_read,
|
||||
&args->color_read, x, y) |
|
||||
VC4_LOADSTORE_FULL_RES_DISABLE_ZS);
|
||||
} else {
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
|
||||
rcl_u16(setup, args->color_read.bits);
|
||||
rcl_u32(setup, setup->color_read->paddr +
|
||||
args->color_read.offset);
|
||||
}
|
||||
}
|
||||
|
||||
if (setup->zs_read) {
|
||||
if (setup->color_read) {
|
||||
/* Exec previous load. */
|
||||
vc4_tile_coordinates(setup, x, y);
|
||||
vc4_store_before_load(setup);
|
||||
}
|
||||
if (args->zs_read.flags &
|
||||
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER);
|
||||
rcl_u32(setup,
|
||||
vc4_full_res_offset(exec, setup->zs_read,
|
||||
&args->zs_read, x, y) |
|
||||
VC4_LOADSTORE_FULL_RES_DISABLE_COLOR);
|
||||
} else {
|
||||
if (setup->color_read) {
|
||||
/* Exec previous load. */
|
||||
vc4_tile_coordinates(setup, x, y);
|
||||
vc4_store_before_load(setup);
|
||||
}
|
||||
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
|
||||
rcl_u16(setup, args->zs_read.bits);
|
||||
rcl_u32(setup, setup->zs_read->paddr + args->zs_read.offset);
|
||||
rcl_u8(setup, VC4_PACKET_LOAD_TILE_BUFFER_GENERAL);
|
||||
rcl_u16(setup, args->zs_read.bits);
|
||||
rcl_u32(setup, setup->zs_read->paddr +
|
||||
args->zs_read.offset);
|
||||
}
|
||||
}
|
||||
|
||||
/* Clipping depends on tile coordinates having been
|
||||
@@ -144,20 +180,60 @@ static void emit_tile(struct vc4_exec_info *exec,
|
||||
(y * exec->bin_tiles_x + x) * 32));
|
||||
}
|
||||
|
||||
if (setup->msaa_color_write) {
|
||||
bool last_tile_write = (!setup->msaa_zs_write &&
|
||||
!setup->zs_write &&
|
||||
!setup->color_write);
|
||||
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_ZS;
|
||||
|
||||
if (!last_tile_write)
|
||||
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
|
||||
else if (last)
|
||||
bits |= VC4_LOADSTORE_FULL_RES_EOF;
|
||||
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
|
||||
rcl_u32(setup,
|
||||
vc4_full_res_offset(exec, setup->msaa_color_write,
|
||||
&args->msaa_color_write, x, y) |
|
||||
bits);
|
||||
}
|
||||
|
||||
if (setup->msaa_zs_write) {
|
||||
bool last_tile_write = (!setup->zs_write &&
|
||||
!setup->color_write);
|
||||
uint32_t bits = VC4_LOADSTORE_FULL_RES_DISABLE_COLOR;
|
||||
|
||||
if (setup->msaa_color_write)
|
||||
vc4_tile_coordinates(setup, x, y);
|
||||
if (!last_tile_write)
|
||||
bits |= VC4_LOADSTORE_FULL_RES_DISABLE_CLEAR_ALL;
|
||||
else if (last)
|
||||
bits |= VC4_LOADSTORE_FULL_RES_EOF;
|
||||
rcl_u8(setup, VC4_PACKET_STORE_FULL_RES_TILE_BUFFER);
|
||||
rcl_u32(setup,
|
||||
vc4_full_res_offset(exec, setup->msaa_zs_write,
|
||||
&args->msaa_zs_write, x, y) |
|
||||
bits);
|
||||
}
|
||||
|
||||
if (setup->zs_write) {
|
||||
bool last_tile_write = !setup->color_write;
|
||||
|
||||
if (setup->msaa_color_write || setup->msaa_zs_write)
|
||||
vc4_tile_coordinates(setup, x, y);
|
||||
|
||||
rcl_u8(setup, VC4_PACKET_STORE_TILE_BUFFER_GENERAL);
|
||||
rcl_u16(setup, args->zs_write.bits |
|
||||
(setup->color_ms_write ?
|
||||
VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR : 0));
|
||||
(last_tile_write ?
|
||||
0 : VC4_STORE_TILE_BUFFER_DISABLE_COLOR_CLEAR));
|
||||
rcl_u32(setup,
|
||||
(setup->zs_write->paddr + args->zs_write.offset) |
|
||||
((last && !setup->color_ms_write) ?
|
||||
((last && last_tile_write) ?
|
||||
VC4_LOADSTORE_TILE_BUFFER_EOF : 0));
|
||||
}
|
||||
|
||||
if (setup->color_ms_write) {
|
||||
if (setup->zs_write) {
|
||||
/* Reset after previous store */
|
||||
if (setup->color_write) {
|
||||
if (setup->msaa_color_write || setup->msaa_zs_write ||
|
||||
setup->zs_write) {
|
||||
vc4_tile_coordinates(setup, x, y);
|
||||
}
|
||||
|
||||
@@ -192,14 +268,26 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
|
||||
}
|
||||
|
||||
if (setup->color_read) {
|
||||
loop_body_size += (VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE);
|
||||
if (args->color_read.flags &
|
||||
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
|
||||
} else {
|
||||
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
|
||||
}
|
||||
}
|
||||
if (setup->zs_read) {
|
||||
if (setup->color_read) {
|
||||
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
|
||||
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
|
||||
if (args->zs_read.flags &
|
||||
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
loop_body_size += VC4_PACKET_LOAD_FULL_RES_TILE_BUFFER_SIZE;
|
||||
} else {
|
||||
if (setup->color_read &&
|
||||
!(args->color_read.flags &
|
||||
VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES)) {
|
||||
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
|
||||
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
|
||||
}
|
||||
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
|
||||
}
|
||||
loop_body_size += VC4_PACKET_LOAD_TILE_BUFFER_GENERAL_SIZE;
|
||||
}
|
||||
|
||||
if (has_bin) {
|
||||
@@ -207,13 +295,23 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
|
||||
loop_body_size += VC4_PACKET_BRANCH_TO_SUB_LIST_SIZE;
|
||||
}
|
||||
|
||||
if (setup->msaa_color_write)
|
||||
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
|
||||
if (setup->msaa_zs_write)
|
||||
loop_body_size += VC4_PACKET_STORE_FULL_RES_TILE_BUFFER_SIZE;
|
||||
|
||||
if (setup->zs_write)
|
||||
loop_body_size += VC4_PACKET_STORE_TILE_BUFFER_GENERAL_SIZE;
|
||||
if (setup->color_ms_write) {
|
||||
if (setup->zs_write)
|
||||
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE;
|
||||
if (setup->color_write)
|
||||
loop_body_size += VC4_PACKET_STORE_MS_TILE_BUFFER_SIZE;
|
||||
}
|
||||
|
||||
/* We need a VC4_PACKET_TILE_COORDINATES in between each store. */
|
||||
loop_body_size += VC4_PACKET_TILE_COORDINATES_SIZE *
|
||||
((setup->msaa_color_write != NULL) +
|
||||
(setup->msaa_zs_write != NULL) +
|
||||
(setup->color_write != NULL) +
|
||||
(setup->zs_write != NULL) - 1);
|
||||
|
||||
size += xtiles * ytiles * loop_body_size;
|
||||
|
||||
setup->rcl = drm_gem_cma_create(dev, size);
|
||||
@@ -224,13 +322,12 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
|
||||
|
||||
rcl_u8(setup, VC4_PACKET_TILE_RENDERING_MODE_CONFIG);
|
||||
rcl_u32(setup,
|
||||
(setup->color_ms_write ?
|
||||
(setup->color_ms_write->paddr +
|
||||
args->color_ms_write.offset) :
|
||||
(setup->color_write ? (setup->color_write->paddr +
|
||||
args->color_write.offset) :
|
||||
0));
|
||||
rcl_u16(setup, args->width);
|
||||
rcl_u16(setup, args->height);
|
||||
rcl_u16(setup, args->color_ms_write.bits);
|
||||
rcl_u16(setup, args->color_write.bits);
|
||||
|
||||
/* The tile buffer gets cleared when the previous tile is stored. If
|
||||
* the clear values changed between frames, then the tile buffer has
|
||||
@@ -255,6 +352,7 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
|
||||
for (x = min_x_tile; x <= max_x_tile; x++) {
|
||||
bool first = (x == min_x_tile && y == min_y_tile);
|
||||
bool last = (x == max_x_tile && y == max_y_tile);
|
||||
|
||||
emit_tile(exec, setup, x, y, first, last);
|
||||
}
|
||||
}
|
||||
@@ -266,6 +364,56 @@ static int vc4_create_rcl_bo(struct drm_device *dev, struct vc4_exec_info *exec,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vc4_full_res_bounds_check(struct vc4_exec_info *exec,
|
||||
struct drm_gem_cma_object *obj,
|
||||
struct drm_vc4_submit_rcl_surface *surf)
|
||||
{
|
||||
struct drm_vc4_submit_cl *args = exec->args;
|
||||
u32 render_tiles_stride = DIV_ROUND_UP(exec->args->width, 32);
|
||||
|
||||
if (surf->offset > obj->base.size) {
|
||||
DRM_ERROR("surface offset %d > BO size %zd\n",
|
||||
surf->offset, obj->base.size);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if ((obj->base.size - surf->offset) / VC4_TILE_BUFFER_SIZE <
|
||||
render_tiles_stride * args->max_y_tile + args->max_x_tile) {
|
||||
DRM_ERROR("MSAA tile %d, %d out of bounds "
|
||||
"(bo size %zd, offset %d).\n",
|
||||
args->max_x_tile, args->max_y_tile,
|
||||
obj->base.size,
|
||||
surf->offset);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
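Review note: a minimal sketch of the comparison performed by `vc4_full_res_bounds_check()`, written as a pure function so it can be exercised outside the kernel. The name `full_res_in_bounds` and its flat parameter list are hypothetical. The comparison itself is copied from the hunk above, with the sense inverted so that `true` means "accepted".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors VC4_TILE_BUFFER_SIZE from the diff. */
#define TILE_BUFFER_SIZE (64u * 64u * 4u)

/* Hypothetical helper: returns true where the diff's check would pass. */
static bool full_res_in_bounds(size_t bo_size, uint32_t surf_offset,
                               uint32_t width,
                               uint32_t max_x_tile, uint32_t max_y_tile)
{
        /* Tiles per row, as in DIV_ROUND_UP(width, 32). */
        uint32_t stride = (width + 31) / 32;

        if (surf_offset > bo_size)
                return false;

        /* Dividing the remaining bytes avoids a multiply that could wrap. */
        return (bo_size - surf_offset) / TILE_BUFFER_SIZE >=
               (size_t)stride * max_y_tile + max_x_tile;
}
```

As written, the comparison accepts a buffer that holds exactly as many whole tiles as the highest tile index, which is one tile fewer than that index would seem to need. Whether this is intended is not clear from the diff.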
|
||||
|
||||
static int vc4_rcl_msaa_surface_setup(struct vc4_exec_info *exec,
|
||||
struct drm_gem_cma_object **obj,
|
||||
struct drm_vc4_submit_rcl_surface *surf)
|
||||
{
|
||||
if (surf->flags != 0 || surf->bits != 0) {
|
||||
DRM_ERROR("MSAA surface had nonzero flags/bits\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (surf->hindex == ~0)
|
||||
return 0;
|
||||
|
||||
*obj = vc4_use_bo(exec, surf->hindex);
|
||||
if (!*obj)
|
||||
return -EINVAL;
|
||||
|
||||
if (surf->offset & 0xf) {
|
||||
DRM_ERROR("MSAA write must be 16b aligned.\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
return vc4_full_res_bounds_check(exec, *obj, surf);
|
||||
}
|
||||
|
||||
static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
|
||||
struct drm_gem_cma_object **obj,
|
||||
struct drm_vc4_submit_rcl_surface *surf)
|
||||
@@ -277,9 +425,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
|
||||
uint8_t format = VC4_GET_FIELD(surf->bits,
|
||||
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
|
||||
int cpp;
|
||||
int ret;
|
||||
|
||||
if (surf->pad != 0) {
|
||||
DRM_ERROR("Padding unset\n");
|
||||
if (surf->flags & ~VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
DRM_ERROR("Extra flags set\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
@@ -290,6 +439,25 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
|
||||
if (!*obj)
|
||||
return -EINVAL;
|
||||
|
||||
if (surf->flags & VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) {
|
||||
if (surf == &exec->args->zs_write) {
|
||||
DRM_ERROR("general zs write may not be a full-res.\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (surf->bits != 0) {
|
||||
DRM_ERROR("load/store general bits set with "
|
||||
"full res load/store.\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
ret = vc4_full_res_bounds_check(exec, *obj, surf);
|
||||
if (!ret)
|
||||
return ret;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (surf->bits & ~(VC4_LOADSTORE_TILE_BUFFER_TILING_MASK |
|
||||
VC4_LOADSTORE_TILE_BUFFER_BUFFER_MASK |
|
||||
VC4_LOADSTORE_TILE_BUFFER_FORMAT_MASK)) {
|
||||
@@ -341,9 +509,10 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec,
|
||||
}
|
||||
|
||||
static int
|
||||
vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec,
|
||||
struct drm_gem_cma_object **obj,
|
||||
struct drm_vc4_submit_rcl_surface *surf)
|
||||
vc4_rcl_render_config_surface_setup(struct vc4_exec_info *exec,
|
||||
struct vc4_rcl_setup *setup,
|
||||
struct drm_gem_cma_object **obj,
|
||||
struct drm_vc4_submit_rcl_surface *surf)
|
||||
{
|
||||
uint8_t tiling = VC4_GET_FIELD(surf->bits,
|
||||
VC4_RENDER_CONFIG_MEMORY_FORMAT);
|
||||
@@ -351,13 +520,15 @@ vc4_rcl_ms_surface_setup(struct vc4_exec_info *exec,
|
||||
VC4_RENDER_CONFIG_FORMAT);
|
||||
int cpp;
|
||||
|
||||
if (surf->pad != 0) {
|
||||
DRM_ERROR("Padding unset\n");
|
||||
if (surf->flags != 0) {
|
||||
DRM_ERROR("No flags supported on render config.\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (surf->bits & ~(VC4_RENDER_CONFIG_MEMORY_FORMAT_MASK |
|
||||
VC4_RENDER_CONFIG_FORMAT_MASK)) {
|
||||
VC4_RENDER_CONFIG_FORMAT_MASK |
|
||||
VC4_RENDER_CONFIG_MS_MODE_4X |
|
||||
VC4_RENDER_CONFIG_DECIMATE_MODE_4X)) {
|
||||
DRM_ERROR("Unknown bits in render config: 0x%04x\n",
|
||||
surf->bits);
|
||||
return -EINVAL;
|
||||
@@ -414,18 +585,20 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
|
||||
if (has_bin &&
|
||||
(args->max_x_tile > exec->bin_tiles_x ||
|
||||
args->max_y_tile > exec->bin_tiles_y)) {
|
||||
DRM_ERROR("Render tiles (%d,%d) outside of bin config (%d,%d)\n",
|
||||
DRM_ERROR("Render tiles (%d,%d) outside of bin config "
|
||||
"(%d,%d)\n",
|
||||
args->max_x_tile, args->max_y_tile,
|
||||
exec->bin_tiles_x, exec->bin_tiles_y);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read);
|
||||
ret = vc4_rcl_render_config_surface_setup(exec, &setup,
|
||||
&setup.color_write,
|
||||
&args->color_write);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = vc4_rcl_ms_surface_setup(exec, &setup.color_ms_write,
|
||||
&args->color_ms_write);
|
||||
ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
@@ -437,10 +610,21 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec)
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_color_write,
|
||||
&args->msaa_color_write);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = vc4_rcl_msaa_surface_setup(exec, &setup.msaa_zs_write,
|
||||
&args->msaa_zs_write);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
/* We shouldn't even have the job submitted to us if there's no
|
||||
* surface to write out.
|
||||
*/
|
||||
if (!setup.color_ms_write && !setup.zs_write) {
|
||||
if (!setup.color_write && !setup.zs_write &&
|
||||
!setup.msaa_color_write && !setup.msaa_zs_write) {
|
||||
DRM_ERROR("RCL requires color or Z/S write\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
@@ -47,7 +47,6 @@
|
||||
void *validated, \
|
||||
void *untrusted
|
||||
|
||||
|
||||
/** Return the width in pixels of a 64-byte microtile. */
|
||||
static uint32_t
|
||||
utile_width(int cpp)
|
||||
@@ -191,7 +190,7 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
|
||||
|
||||
if (size + offset < size ||
|
||||
size + offset > fbo->base.size) {
|
||||
DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %d)\n",
|
||||
DRM_ERROR("Overflow in %dx%d (%dx%d) fbo size (%d + %d > %zd)\n",
|
||||
width, height,
|
||||
aligned_width, aligned_height,
|
||||
size, offset, fbo->base.size);
|
||||
@@ -201,7 +200,6 @@ vc4_check_tex_size(struct vc4_exec_info *exec, struct drm_gem_cma_object *fbo,
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static int
|
||||
validate_flush(VALIDATE_ARGS)
|
||||
{
|
||||
@@ -270,7 +268,7 @@ validate_indexed_prim_list(VALIDATE_ARGS)
|
||||
|
||||
if (offset > ib->base.size ||
|
||||
(ib->base.size - offset) / index_size < length) {
|
||||
DRM_ERROR("IB access overflow (%d + %d*%d > %d)\n",
|
||||
DRM_ERROR("IB access overflow (%d + %d*%d > %zd)\n",
|
||||
offset, length, index_size, ib->base.size);
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -361,9 +359,8 @@ validate_tile_binning_config(VALIDATE_ARGS)
|
||||
}
|
||||
|
||||
if (flags & (VC4_BIN_CONFIG_DB_NON_MS |
|
||||
VC4_BIN_CONFIG_TILE_BUFFER_64BIT |
|
||||
VC4_BIN_CONFIG_MS_MODE_4X)) {
|
||||
DRM_ERROR("unsupported bining config flags 0x%02x\n", flags);
|
||||
VC4_BIN_CONFIG_TILE_BUFFER_64BIT)) {
|
||||
DRM_ERROR("unsupported binning config flags 0x%02x\n", flags);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
@@ -424,8 +421,8 @@ validate_gem_handles(VALIDATE_ARGS)
|
||||
return 0;
|
||||
}
|
||||
|
||||
#define VC4_DEFINE_PACKET(packet, name, func) \
|
||||
[packet] = { packet ## _SIZE, name, func }
|
||||
#define VC4_DEFINE_PACKET(packet, func) \
|
||||
[packet] = { packet ## _SIZE, #packet, func }
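Review note: the new two-argument `VC4_DEFINE_PACKET` derives the packet name from the enum identifier with the preprocessor's `#` operator, instead of taking a hand-written string. The sketch below shows the same pattern in isolation. Every `demo_`/`DEMO_` identifier is hypothetical, and the struct layout is assumed from the fields the diff uses (`len`, `name`, `func`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet IDs and sizes, standing in for enum vc4_packet. */
enum demo_packet { DEMO_PACKET_HALT = 0, DEMO_PACKET_NOP = 1 };
#define DEMO_PACKET_HALT_SIZE 1
#define DEMO_PACKET_NOP_SIZE 1

/*
 * Same shape as the new VC4_DEFINE_PACKET: the designated initializer
 * indexes the table by packet ID, ## pastes the _SIZE suffix, and
 * #packet turns the identifier into the name string.
 */
#define DEMO_DEFINE_PACKET(packet, func) \
        [packet] = { packet ## _SIZE, #packet, func }

struct demo_cmd_info {
        uint16_t len;
        const char *name;
        int (*func)(void *validated, void *untrusted);
};

static const struct demo_cmd_info demo_cmd_info[] = {
        DEMO_DEFINE_PACKET(DEMO_PACKET_HALT, NULL),
        DEMO_DEFINE_PACKET(DEMO_PACKET_NOP, NULL),
};
```

With this pattern, the name that appears in error messages is the identifier itself (here `"DEMO_PACKET_NOP"`), so it cannot drift out of sync with the enum.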
|
||||
|
||||
static const struct cmd_info {
|
||||
uint16_t len;
|
||||
@@ -433,42 +430,42 @@ static const struct cmd_info {
|
||||
int (*func)(struct vc4_exec_info *exec, void *validated,
|
||||
void *untrusted);
|
||||
} cmd_info[] = {
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_HALT, "halt", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_NOP, "nop", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, "flush", validate_flush),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, "flush all state", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING, "start tile binning", validate_start_tile_binning),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE, "increment semaphore", validate_increment_semaphore),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_HALT, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_NOP, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH, validate_flush),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLUSH_ALL, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_START_TILE_BINNING,
|
||||
validate_start_tile_binning),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_INCREMENT_SEMAPHORE,
|
||||
validate_increment_semaphore),
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE, "Indexed Primitive List", validate_indexed_prim_list),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_INDEXED_PRIMITIVE,
|
||||
validate_indexed_prim_list),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE,
|
||||
validate_gl_array_primitive),
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_ARRAY_PRIMITIVE, "Vertex Array Primitives", validate_gl_array_primitive),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, NULL),
|
||||
|
||||
/* This is only used by clipped primitives (packets 48 and 49), which
|
||||
* we don't support parsing yet.
|
||||
*/
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_PRIMITIVE_LIST_FORMAT, "primitive list format", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, validate_gl_shader_state),
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GL_SHADER_STATE, "GL Shader State", validate_gl_shader_state),
|
||||
/* We don't support validating NV shader states. */
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, "configuration bits", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, "flat shade flags", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, "point size", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, "line width", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, "RHT X boundary", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, "Depth Offset", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, "Clip Window", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, "Viewport Offset", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, "Clipper XY Scaling", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CONFIGURATION_BITS, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_FLAT_SHADE_FLAGS, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_POINT_SIZE, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_LINE_WIDTH, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_RHT_X_BOUNDARY, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_DEPTH_OFFSET, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIP_WINDOW, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_VIEWPORT_OFFSET, NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_XY_SCALING, NULL),
|
||||
/* Note: The docs say this was also 105, but it was 106 in the
|
||||
* initial userland code drop.
|
||||
*/
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, "Clipper Z Scale and Offset", NULL),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_CLIPPER_Z_SCALING, NULL),
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG, "tile binning configuration", validate_tile_binning_config),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_TILE_BINNING_MODE_CONFIG,
|
||||
validate_tile_binning_config),
|
||||
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, "GEM handles", validate_gem_handles),
|
||||
VC4_DEFINE_PACKET(VC4_PACKET_GEM_HANDLES, validate_gem_handles),
|
||||
};
|
||||
|
||||
int
|
||||
@@ -500,11 +497,6 @@ vc4_validate_bin_cl(struct drm_device *dev,
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
#if 0
|
||||
DRM_INFO("0x%08x: packet %d (%s) size %d processing...\n",
|
||||
src_offset, cmd, info->name, info->len);
|
||||
#endif
|
||||
|
||||
if (src_offset + info->len > len) {
|
||||
DRM_ERROR("0x%08x: packet %d (%s) length 0x%08x "
|
||||
"exceeds bounds (0x%08x)\n",
|
||||
@@ -519,8 +511,7 @@ vc4_validate_bin_cl(struct drm_device *dev,
|
||||
if (info->func && info->func(exec,
|
||||
dst_pkt + 1,
|
||||
src_pkt + 1)) {
|
||||
DRM_ERROR("0x%08x: packet %d (%s) failed to "
|
||||
"validate\n",
|
||||
DRM_ERROR("0x%08x: packet %d (%s) failed to validate\n",
|
||||
src_offset, cmd, info->name);
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -588,12 +579,14 @@ reloc_tex(struct vc4_exec_info *exec,
|
||||
|
||||
if (sample->is_direct) {
|
||||
uint32_t remaining_size = tex->base.size - p0;
|
||||
|
||||
if (p0 > tex->base.size - 4) {
|
||||
DRM_ERROR("UBO offset greater than UBO size\n");
|
||||
goto fail;
|
||||
}
|
||||
if (p1 > remaining_size - 4) {
|
||||
DRM_ERROR("UBO clamp would allow reads outside of UBO\n");
|
||||
DRM_ERROR("UBO clamp would allow reads "
|
||||
"outside of UBO\n");
|
||||
goto fail;
|
||||
}
|
||||
*validated_p0 = tex->paddr + p0;
|
||||
@@ -866,7 +859,7 @@ validate_gl_shader_rec(struct drm_device *dev,
|
||||
|
||||
if (vbo->base.size < offset ||
|
||||
vbo->base.size - offset < attr_size) {
|
||||
DRM_ERROR("BO offset overflow (%d + %d > %d)\n",
|
||||
DRM_ERROR("BO offset overflow (%d + %d > %zd)\n",
|
||||
offset, attr_size, vbo->base.size);
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -875,7 +868,8 @@ validate_gl_shader_rec(struct drm_device *dev,
|
||||
max_index = ((vbo->base.size - offset - attr_size) /
|
||||
stride);
|
||||
if (state->max_index > max_index) {
|
||||
DRM_ERROR("primitives use index %d out of supplied %d\n",
|
||||
DRM_ERROR("primitives use index %d out of "
|
||||
"supplied %d\n",
|
||||
state->max_index, max_index);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
@@ -24,24 +24,16 @@
|
||||
/**
|
||||
* DOC: Shader validator for VC4.
|
||||
*
|
||||
* The VC4 has no IOMMU between it and system memory. So, a user with access
|
||||
* to execute shaders could escalate privilege by overwriting system memory
|
||||
* (using the VPM write address register in the general-purpose DMA mode) or
|
||||
* reading system memory it shouldn't (reading it as a texture, or uniform
|
||||
* data, or vertex data).
|
||||
* The VC4 has no IOMMU between it and system memory, so a user with
|
||||
* access to execute shaders could escalate privilege by overwriting
|
||||
* system memory (using the VPM write address register in the
|
||||
* general-purpose DMA mode) or reading system memory it shouldn't
|
||||
* (reading it as a texture, or uniform data, or vertex data).
|
||||
*
|
||||
* This walks over a shader starting from some offset within a BO, ensuring
|
||||
* that its accesses are appropriately bounded, and recording how many texture
|
||||
* accesses are made and where so that we can do relocations for them in the
|
||||
* This walks over a shader BO, ensuring that its accesses are
|
||||
* appropriately bounded, and recording how many texture accesses are
|
||||
* made and where so that we can do relocations for them in the
|
||||
* uniform stream.
|
||||
*
|
||||
* The kernel API has shaders stored in user-mapped BOs. The BOs will be
|
||||
* forcibly unmapped from the process before validation, and any cache of
|
||||
* validated state will be flushed if the mapping is faulted back in.
|
||||
*
|
||||
* Storing the shaders in BOs means that the validation process will be slow
|
||||
* due to uncached reads, but since shaders are long-lived and shader BOs are
|
||||
* never actually modified, this shouldn't be a problem.
|
||||
*/
|
||||
|
||||
#include "vc4_drv.h"
|
||||
@@ -71,7 +63,6 @@ waddr_to_live_reg_index(uint32_t waddr, bool is_b)
|
||||
else
|
||||
return waddr;
|
||||
} else if (waddr <= QPU_W_ACC3) {
|
||||
|
||||
return 64 + waddr - QPU_W_ACC0;
|
||||
} else {
|
||||
return ~0;
|
||||
@@ -86,15 +77,14 @@ raddr_add_a_to_live_reg_index(uint64_t inst)
|
||||
uint32_t raddr_a = QPU_GET_FIELD(inst, QPU_RADDR_A);
|
||||
uint32_t raddr_b = QPU_GET_FIELD(inst, QPU_RADDR_B);
|
||||
|
||||
if (add_a == QPU_MUX_A) {
|
||||
if (add_a == QPU_MUX_A)
|
||||
return raddr_a;
|
||||
} else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM) {
|
||||
else if (add_a == QPU_MUX_B && sig != QPU_SIG_SMALL_IMM)
|
||||
return 32 + raddr_b;
|
||||
} else if (add_a <= QPU_MUX_R3) {
|
||||
else if (add_a <= QPU_MUX_R3)
|
||||
return 64 + add_a;
|
||||
} else {
|
||||
else
|
||||
return ~0;
|
||||
}
|
||||
}
|
||||
|
||||
static bool
|
||||
@@ -112,9 +102,9 @@ is_tmu_write(uint32_t waddr)
|
||||
}
|
||||
|
||||
static bool
|
||||
record_validated_texture_sample(struct vc4_validated_shader_info *validated_shader,
|
||||
struct vc4_shader_validation_state *validation_state,
|
||||
int tmu)
|
||||
record_texture_sample(struct vc4_validated_shader_info *validated_shader,
|
||||
struct vc4_shader_validation_state *validation_state,
|
||||
int tmu)
|
||||
{
|
||||
uint32_t s = validated_shader->num_texture_samples;
|
||||
int i;
|
||||
@@ -227,8 +217,8 @@ check_tmu_write(uint64_t inst,
|
||||
validated_shader->uniforms_size += 4;
|
||||
|
||||
if (submit) {
|
||||
if (!record_validated_texture_sample(validated_shader,
|
||||
validation_state, tmu)) {
|
||||
if (!record_texture_sample(validated_shader,
|
||||
validation_state, tmu)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
@@ -239,10 +229,10 @@ check_tmu_write(uint64_t inst,
|
||||
}
|
||||
|
||||
static bool
|
||||
check_register_write(uint64_t inst,
|
||||
struct vc4_validated_shader_info *validated_shader,
|
||||
struct vc4_shader_validation_state *validation_state,
|
||||
bool is_mul)
|
||||
check_reg_write(uint64_t inst,
|
||||
struct vc4_validated_shader_info *validated_shader,
|
||||
struct vc4_shader_validation_state *validation_state,
|
||||
bool is_mul)
|
||||
{
|
||||
uint32_t waddr = (is_mul ?
|
||||
QPU_GET_FIELD(inst, QPU_WADDR_MUL) :
|
||||
@@ -298,7 +288,7 @@ check_register_write(uint64_t inst,
|
||||
return true;
|
||||
|
||||
case QPU_W_TLB_STENCIL_SETUP:
|
||||
return true;
|
||||
return true;
|
||||
}
|
||||
|
||||
return true;
|
||||
@@ -361,7 +351,7 @@ track_live_clamps(uint64_t inst,
|
||||
}
|
||||
|
||||
validation_state->live_max_clamp_regs[lri_add] = true;
|
||||
} if (op_add == QPU_A_MIN) {
|
||||
} else if (op_add == QPU_A_MIN) {
|
||||
/* Track live clamps of a value clamped to a minimum of 0 and
|
||||
* a maximum of some uniform's offset.
|
||||
*/
|
||||
@@ -393,8 +383,10 @@ check_instruction_writes(uint64_t inst,
|
||||
return false;
|
||||
}
|
||||
|
||||
ok = (check_register_write(inst, validated_shader, validation_state, false) &&
|
||||
check_register_write(inst, validated_shader, validation_state, true));
|
||||
ok = (check_reg_write(inst, validated_shader, validation_state,
|
||||
false) &&
|
||||
check_reg_write(inst, validated_shader, validation_state,
|
||||
true));
|
||||
|
||||
track_live_clamps(inst, validated_shader, validation_state);
|
||||
|
||||
@@ -442,7 +434,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
|
||||
shader = shader_obj->vaddr;
|
||||
max_ip = shader_obj->base.size / sizeof(uint64_t);
|
||||
|
||||
validated_shader = kcalloc(sizeof(*validated_shader), 1, GFP_KERNEL);
|
||||
validated_shader = kcalloc(1, sizeof(*validated_shader), GFP_KERNEL);
|
||||
if (!validated_shader)
|
||||
return NULL;
|
||||
|
||||
@@ -498,7 +490,7 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
|
||||
|
||||
if (ip == max_ip) {
|
||||
DRM_ERROR("shader failed to terminate before "
|
||||
"shader BO end at %d\n",
|
||||
"shader BO end at %zd\n",
|
||||
shader_obj->base.size);
|
||||
goto fail;
|
||||
}
|
||||
@@ -514,6 +506,9 @@ vc4_validate_shader(struct drm_gem_cma_object *shader_obj)
|
||||
return validated_shader;
|
||||
|
||||
fail:
|
||||
kfree(validated_shader);
|
||||
if (validated_shader) {
|
||||
kfree(validated_shader->texture_samples);
|
||||
kfree(validated_shader);
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
@@ -41,24 +41,53 @@ vc4_get_blit_surface(struct pipe_context *pctx,
|
||||
return pctx->create_surface(pctx, prsc, &tmpl);
|
||||
}
|
||||
|
||||
static bool
|
||||
is_tile_unaligned(unsigned size, unsigned tile_size)
|
||||
{
|
||||
return size & (tile_size - 1);
|
||||
}
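Review note: `is_tile_unaligned()` relies on the tile size being a power of two, which holds for the 32 and 64 values used in `vc4_tile_blit()`. The sketch below restates the helper with that precondition made explicit and adds a per-axis version of the eligibility test used further down in this hunk. The names `tile_unaligned` and `box_axis_ok` are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of is_tile_unaligned() with its precondition. */
static bool tile_unaligned(unsigned size, unsigned tile_size)
{
        /* The mask trick is only valid for power-of-two tile sizes. */
        assert(tile_size != 0 && (tile_size & (tile_size - 1)) == 0);
        return (size & (tile_size - 1)) != 0;
}

/*
 * One axis of the blit fast-path test: the box must start on a tile
 * boundary, and a partial trailing tile is only allowed when the box
 * runs to the edge of the surface.
 */
static bool box_axis_ok(unsigned start, unsigned extent,
                        unsigned tile_size, unsigned surface_extent)
{
        if (tile_unaligned(start, tile_size))
                return false;
        if (tile_unaligned(extent, tile_size) &&
            start + extent != surface_extent)
                return false;
        return true;
}
```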
|
||||
|
||||
static bool
|
||||
vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
|
||||
{
|
||||
struct vc4_context *vc4 = vc4_context(pctx);
|
||||
bool old_msaa = vc4->msaa;
|
||||
int old_tile_width = vc4->tile_width;
|
||||
int old_tile_height = vc4->tile_height;
|
||||
bool msaa = (info->src.resource->nr_samples ||
|
||||
info->dst.resource->nr_samples);
|
||||
int tile_width = msaa ? 32 : 64;
|
||||
int tile_height = msaa ? 32 : 64;
|
||||
|
||||
if (util_format_is_depth_or_stencil(info->dst.resource->format))
|
||||
return false;
|
||||
|
||||
if (info->scissor_enable)
|
||||
return false;
|
||||
|
||||
if ((info->mask & PIPE_MASK_RGBA) == 0)
|
||||
return false;
|
||||
|
||||
if (info->dst.box.x != 0 || info->dst.box.y != 0 ||
|
||||
info->src.box.x != 0 || info->src.box.y != 0 ||
|
||||
if (info->dst.box.x != info->src.box.x ||
|
||||
info->dst.box.y != info->src.box.y ||
|
||||
info->dst.box.width != info->src.box.width ||
|
||||
info->dst.box.height != info->src.box.height) {
|
||||
return false;
|
||||
}
|
||||
|
||||
int dst_surface_width = u_minify(info->dst.resource->width0,
|
||||
info->dst.level);
|
||||
int dst_surface_height = u_minify(info->dst.resource->height0,
|
||||
info->dst.level);
|
||||
if (is_tile_unaligned(info->dst.box.x, tile_width) ||
|
||||
is_tile_unaligned(info->dst.box.y, tile_height) ||
|
||||
(is_tile_unaligned(info->dst.box.width, tile_width) &&
|
||||
info->dst.box.x + info->dst.box.width != dst_surface_width) ||
|
||||
(is_tile_unaligned(info->dst.box.height, tile_height) &&
|
||||
info->dst.box.y + info->dst.box.height != dst_surface_height)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
if (info->dst.resource->format != info->src.resource->format)
|
||||
return false;
|
||||
|
||||
@@ -70,18 +99,32 @@ vc4_tile_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
|
||||
vc4_get_blit_surface(pctx, info->src.resource, info->src.level);
|
||||
|
||||
pipe_surface_reference(&vc4->color_read, src_surf);
|
||||
pipe_surface_reference(&vc4->color_write, dst_surf);
|
||||
pipe_surface_reference(&vc4->color_write,
|
||||
dst_surf->texture->nr_samples ? NULL : dst_surf);
|
||||
pipe_surface_reference(&vc4->msaa_color_write,
|
||||
dst_surf->texture->nr_samples ? dst_surf : NULL);
|
||||
pipe_surface_reference(&vc4->zs_read, NULL);
|
||||
pipe_surface_reference(&vc4->zs_write, NULL);
|
||||
vc4->draw_min_x = 0;
|
||||
vc4->draw_min_y = 0;
|
||||
vc4->draw_max_x = dst_surf->width;
|
||||
vc4->draw_max_y = dst_surf->height;
|
||||
pipe_surface_reference(&vc4->msaa_zs_write, NULL);
|
||||
|
||||
vc4->draw_min_x = info->dst.box.x;
|
||||
vc4->draw_min_y = info->dst.box.y;
|
||||
vc4->draw_max_x = info->dst.box.x + info->dst.box.width;
|
||||
vc4->draw_max_y = info->dst.box.y + info->dst.box.height;
|
||||
vc4->draw_width = dst_surf->width;
|
||||
vc4->draw_height = dst_surf->height;
|
||||
|
||||
vc4->tile_width = tile_width;
|
||||
vc4->tile_height = tile_height;
|
||||
vc4->msaa = msaa;
|
||||
vc4->needs_flush = true;
|
||||
|
||||
vc4_job_submit(vc4);
|
||||
|
||||
vc4->msaa = old_msaa;
|
||||
vc4->tile_width = old_tile_width;
|
||||
vc4->tile_height = old_tile_height;
|
||||
|
||||
pipe_surface_reference(&dst_surf, NULL);
|
||||
pipe_surface_reference(&src_surf, NULL);
|
||||
|
||||
@@ -131,14 +174,6 @@ vc4_blit(struct pipe_context *pctx, const struct pipe_blit_info *blit_info)
|
||||
{
|
||||
struct pipe_blit_info info = *blit_info;
|
||||
|
||||
if (info.src.resource->nr_samples > 1 &&
|
||||
info.dst.resource->nr_samples <= 1 &&
|
||||
!util_format_is_depth_or_stencil(info.src.resource->format) &&
|
||||
!util_format_is_pure_integer(info.src.resource->format)) {
|
||||
fprintf(stderr, "color resolve unimplemented\n");
|
||||
return;
|
||||
}
|
||||
|
||||
if (vc4_tile_blit(pctx, blit_info))
|
||||
return;
|
||||
|
||||
|
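The tile-alignment condition that vc4_tile_blit() applies to the destination box can be restated as standalone C. This is only a sketch: the body of is_tile_unaligned() does not appear in this diff, so the version below assumes it tests whether a coordinate is a multiple of the tile size, and blit_box_is_tile_aligned() with its parameters is a hypothetical wrapper used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed helper: true when coord is not a multiple of tile_size. */
static bool
is_tile_unaligned(uint32_t coord, uint32_t tile_size)
{
        assert(tile_size != 0);
        return (coord % tile_size) != 0;
}

/* Hypothetical wrapper mirroring the condition in vc4_tile_blit():
 * the box origin must be tile-aligned, and the width/height may be
 * unaligned only when the box reaches the edge of the surface.
 */
static bool
blit_box_is_tile_aligned(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                         uint32_t surf_w, uint32_t surf_h, bool msaa)
{
        /* Tile sizes as chosen in the diff: 32x32 with MSAA, else 64x64. */
        uint32_t tile_w = msaa ? 32 : 64;
        uint32_t tile_h = msaa ? 32 : 64;

        if (is_tile_unaligned(x, tile_w) || is_tile_unaligned(y, tile_h))
                return false;
        if (is_tile_unaligned(w, tile_w) && x + w != surf_w)
                return false;
        if (is_tile_unaligned(h, tile_h) && y + h != surf_h)
                return false;
        return true;
}
```

A 100x100 box at the origin of a 100x100 surface passes even though 100 is not a multiple of 64, because the partial tiles sit on the surface edge; the same box on a 200x200 surface is rejected.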
@@ -67,8 +67,16 @@ vc4_flush(struct pipe_context *pctx)
|
||||
cl_u8(&bcl, VC4_PACKET_FLUSH);
|
||||
cl_end(&vc4->bcl, bcl);
|
||||
|
||||
vc4->msaa = false;
|
||||
if (cbuf && (vc4->resolve & PIPE_CLEAR_COLOR0)) {
|
||||
pipe_surface_reference(&vc4->color_write, cbuf);
|
||||
pipe_surface_reference(&vc4->color_write,
|
||||
cbuf->texture->nr_samples ? NULL : cbuf);
|
||||
pipe_surface_reference(&vc4->msaa_color_write,
|
||||
cbuf->texture->nr_samples ? cbuf : NULL);
|
||||
|
||||
if (cbuf->texture->nr_samples)
|
||||
vc4->msaa = true;
|
||||
|
||||
if (!(vc4->cleared & PIPE_CLEAR_COLOR0)) {
|
||||
pipe_surface_reference(&vc4->color_read, cbuf);
|
||||
} else {
|
||||
@@ -78,11 +86,21 @@ vc4_flush(struct pipe_context *pctx)
|
||||
} else {
|
||||
pipe_surface_reference(&vc4->color_write, NULL);
|
||||
pipe_surface_reference(&vc4->color_read, NULL);
|
||||
pipe_surface_reference(&vc4->msaa_color_write, NULL);
|
||||
}
|
||||
|
||||
if (vc4->framebuffer.zsbuf &&
|
||||
(vc4->resolve & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
|
||||
pipe_surface_reference(&vc4->zs_write, zsbuf);
|
||||
pipe_surface_reference(&vc4->zs_write,
|
||||
zsbuf->texture->nr_samples ?
|
||||
NULL : zsbuf);
|
||||
pipe_surface_reference(&vc4->msaa_zs_write,
|
||||
zsbuf->texture->nr_samples ?
|
||||
zsbuf : NULL);
|
||||
|
||||
if (zsbuf->texture->nr_samples)
|
||||
vc4->msaa = true;
|
||||
|
||||
if (!(vc4->cleared & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL))) {
|
||||
pipe_surface_reference(&vc4->zs_read, zsbuf);
|
||||
} else {
|
||||
@@ -91,6 +109,7 @@ vc4_flush(struct pipe_context *pctx)
|
||||
} else {
|
||||
pipe_surface_reference(&vc4->zs_write, NULL);
|
||||
pipe_surface_reference(&vc4->zs_read, NULL);
|
||||
pipe_surface_reference(&vc4->msaa_zs_write, NULL);
|
||||
}
|
||||
|
||||
vc4_job_submit(vc4);
|
||||
@@ -245,6 +264,8 @@ vc4_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
|
||||
|
||||
vc4_debug |= saved_shaderdb_flag;
|
||||
|
||||
vc4->sample_mask = (1 << VC4_MAX_SAMPLES) - 1;
|
||||
|
||||
return &vc4->base;
|
||||
|
||||
fail:
|
||||
|
@@ -206,6 +206,8 @@ struct vc4_context {
         struct pipe_surface *color_write;
         struct pipe_surface *zs_read;
         struct pipe_surface *zs_write;
+        struct pipe_surface *msaa_color_write;
+        struct pipe_surface *msaa_zs_write;
         /** @} */
         /** @{
          * Bounding box of the scissor across all queued drawing.
@@ -224,6 +226,15 @@ struct vc4_context {
         uint32_t draw_width;
         uint32_t draw_height;
         /** @} */
+        /** @{ Tile information, depending on MSAA and float color buffer. */
+        uint32_t draw_tiles_x; /** @< Number of tiles wide for framebuffer. */
+        uint32_t draw_tiles_y; /** @< Number of tiles high for framebuffer. */
+
+        uint32_t tile_width; /** @< Width of a tile. */
+        uint32_t tile_height; /** @< Height of a tile. */
+        /** Whether the current rendering is in a 4X MSAA tile buffer. */
+        bool msaa;
+        /** @} */
 
         struct util_slab_mempool transfer_pool;
         struct blitter_context *blitter;
|
@@ -68,21 +68,17 @@ vc4_start_draw(struct vc4_context *vc4)
|
||||
|
||||
vc4_get_draw_cl_space(vc4);
|
||||
|
||||
uint32_t width = vc4->framebuffer.width;
|
||||
uint32_t height = vc4->framebuffer.height;
|
||||
uint32_t tilew = align(width, 64) / 64;
|
||||
uint32_t tileh = align(height, 64) / 64;
|
||||
struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
|
||||
|
||||
// Tile state data is 48 bytes per tile, I think it can be thrown away
|
||||
// as soon as binning is finished.
|
||||
cl_u8(&bcl, VC4_PACKET_TILE_BINNING_MODE_CONFIG);
|
||||
cl_u32(&bcl, 0); /* tile alloc addr, filled by kernel */
|
||||
cl_u32(&bcl, 0); /* tile alloc size, filled by kernel */
|
||||
cl_u32(&bcl, 0); /* tile state addr, filled by kernel */
|
||||
cl_u8(&bcl, tilew);
|
||||
cl_u8(&bcl, tileh);
|
||||
cl_u8(&bcl, 0); /* flags, filled by kernel. */
|
||||
cl_u8(&bcl, vc4->draw_tiles_x);
|
||||
cl_u8(&bcl, vc4->draw_tiles_y);
|
||||
/* Other flags are filled by kernel. */
|
||||
cl_u8(&bcl, vc4->msaa ? VC4_BIN_CONFIG_MS_MODE_4X : 0);
|
||||
|
||||
/* START_TILE_BINNING resets the statechange counters in the hardware,
|
||||
* which are what is used when a primitive is binned to a tile to
|
||||
@@ -102,8 +98,8 @@ vc4_start_draw(struct vc4_context *vc4)
|
||||
|
||||
vc4->needs_flush = true;
|
||||
vc4->draw_calls_queued++;
|
||||
vc4->draw_width = width;
|
||||
vc4->draw_height = height;
|
||||
vc4->draw_width = vc4->framebuffer.width;
|
||||
vc4->draw_height = vc4->framebuffer.height;
|
||||
|
||||
cl_end(&vc4->bcl, bcl);
|
||||
}
|
||||
|
@@ -44,10 +44,13 @@ struct drm_vc4_submit_rcl_surface {
|
||||
uint32_t hindex; /* Handle index, or ~0 if not present. */
|
||||
uint32_t offset; /* Offset to start of buffer. */
|
||||
/*
|
||||
* Bits for either render config (color_ms_write) or load/store packet.
|
||||
* Bits for either render config (color_write) or load/store packet.
|
||||
* Bits should all be 0 for MSAA load/stores.
|
||||
*/
|
||||
uint16_t bits;
|
||||
uint16_t pad;
|
||||
|
||||
#define VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES (1 << 0)
|
||||
uint16_t flags;
|
||||
};
|
||||
|
||||
/**
|
||||
@@ -126,9 +129,11 @@ struct drm_vc4_submit_cl {
|
||||
uint8_t max_x_tile;
|
||||
uint8_t max_y_tile;
|
||||
struct drm_vc4_submit_rcl_surface color_read;
|
||||
struct drm_vc4_submit_rcl_surface color_ms_write;
|
||||
struct drm_vc4_submit_rcl_surface color_write;
|
||||
struct drm_vc4_submit_rcl_surface zs_read;
|
||||
struct drm_vc4_submit_rcl_surface zs_write;
|
||||
struct drm_vc4_submit_rcl_surface msaa_color_write;
|
||||
struct drm_vc4_submit_rcl_surface msaa_zs_write;
|
||||
uint32_t clear_color[2];
|
||||
uint32_t clear_z;
|
||||
uint8_t clear_s;
|
||||
|
@@ -29,17 +29,35 @@ vc4_emit_state(struct pipe_context *pctx)
|
||||
struct vc4_context *vc4 = vc4_context(pctx);
|
||||
|
||||
struct vc4_cl_out *bcl = cl_start(&vc4->bcl);
|
||||
if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT)) {
|
||||
if (vc4->dirty & (VC4_DIRTY_SCISSOR | VC4_DIRTY_VIEWPORT |
|
||||
VC4_DIRTY_RASTERIZER)) {
|
||||
float *vpscale = vc4->viewport.scale;
|
||||
float *vptranslate = vc4->viewport.translate;
|
||||
float vp_minx = -fabsf(vpscale[0]) + vptranslate[0];
|
||||
float vp_maxx = fabsf(vpscale[0]) + vptranslate[0];
|
||||
float vp_miny = -fabsf(vpscale[1]) + vptranslate[1];
|
||||
float vp_maxy = fabsf(vpscale[1]) + vptranslate[1];
|
||||
uint32_t minx = MAX2(vc4->scissor.minx, vp_minx);
|
||||
uint32_t miny = MAX2(vc4->scissor.miny, vp_miny);
|
||||
uint32_t maxx = MIN2(vc4->scissor.maxx, vp_maxx);
|
||||
uint32_t maxy = MIN2(vc4->scissor.maxy, vp_maxy);
|
||||
|
||||
/* Clip to the scissor if it's enabled, but still clip to the
|
||||
* drawable regardless since that controls where the binner
|
||||
* tries to put things.
|
||||
*
|
||||
* Additionally, always clip the rendering to the viewport,
|
||||
* since the hardware does guardband clipping, meaning
|
||||
* primitives would rasterize outside of the view volume.
|
||||
*/
|
||||
uint32_t minx, miny, maxx, maxy;
|
||||
if (!vc4->rasterizer->base.scissor) {
|
||||
minx = MAX2(vp_minx, 0);
|
||||
miny = MAX2(vp_miny, 0);
|
||||
maxx = MIN2(vp_maxx, vc4->draw_width);
|
||||
maxy = MIN2(vp_maxy, vc4->draw_height);
|
||||
} else {
|
||||
minx = MAX2(vp_minx, vc4->scissor.minx);
|
||||
miny = MAX2(vp_miny, vc4->scissor.miny);
|
||||
maxx = MIN2(vp_maxx, vc4->scissor.maxx);
|
||||
maxy = MIN2(vp_maxy, vc4->scissor.maxy);
|
||||
}
|
||||
|
||||
cl_u8(&bcl, VC4_PACKET_CLIP_WINDOW);
|
||||
cl_u16(&bcl, minx);
|
||||
@@ -54,6 +72,20 @@ vc4_emit_state(struct pipe_context *pctx)
|
||||
}
|
||||
|
||||
if (vc4->dirty & (VC4_DIRTY_RASTERIZER | VC4_DIRTY_ZSA)) {
|
||||
uint8_t ez_enable_mask_out = ~0;
|
||||
|
||||
/* HW-2905: If the RCL ends up doing a full-res load when
|
||||
* multisampling, then early Z tracking may end up with values
|
||||
* from the previous tile due to a HW bug. Disable it to
|
||||
* avoid that.
|
||||
*
|
||||
* We should be able to skip this when the Z is cleared, but I
|
||||
* was seeing bad rendering on glxgears -samples 4 even in
|
||||
* that case.
|
||||
*/
|
||||
if (vc4->msaa)
|
||||
ez_enable_mask_out &= ~VC4_CONFIG_BITS_EARLY_Z;
|
||||
|
||||
cl_u8(&bcl, VC4_PACKET_CONFIGURATION_BITS);
|
||||
cl_u8(&bcl,
|
||||
vc4->rasterizer->config_bits[0] |
|
||||
@@ -62,8 +94,8 @@ vc4_emit_state(struct pipe_context *pctx)
|
||||
vc4->rasterizer->config_bits[1] |
|
||||
vc4->zsa->config_bits[1]);
|
||||
cl_u8(&bcl,
|
||||
vc4->rasterizer->config_bits[2] |
|
||||
vc4->zsa->config_bits[2]);
|
||||
(vc4->rasterizer->config_bits[2] |
|
||||
vc4->zsa->config_bits[2]) & ez_enable_mask_out);
|
||||
}
|
||||
|
||||
if (vc4->dirty & VC4_DIRTY_RASTERIZER) {
|
||||
|
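The clip-window selection added to vc4_emit_state() (always clip to the viewport, then to either the scissor or the drawable) can be sketched outside the driver as follows. The struct clip_window type and the compute_clip_window() function are hypothetical names, and the precondition on the viewport maximum is an assumption added so that the float-to-unsigned conversion stays well defined in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical container for the four CLIP_WINDOW values. */
struct clip_window {
        uint32_t minx, miny, maxx, maxy;
};

static float
absf(float v)
{
        return v < 0.0f ? -v : v;
}

static struct clip_window
compute_clip_window(const float vpscale[2], const float vptranslate[2],
                    bool scissor_enabled, struct clip_window scissor,
                    uint32_t draw_width, uint32_t draw_height)
{
        /* Viewport extents, computed as in the diff. */
        float vp_minx = -absf(vpscale[0]) + vptranslate[0];
        float vp_maxx = absf(vpscale[0]) + vptranslate[0];
        float vp_miny = -absf(vpscale[1]) + vptranslate[1];
        float vp_maxy = absf(vpscale[1]) + vptranslate[1];

        /* Limit is the scissor when enabled, otherwise the drawable. */
        struct clip_window limit = { 0, 0, draw_width, draw_height };
        if (scissor_enabled)
                limit = scissor;

        /* Sketch-only assumption: the viewport is not entirely negative. */
        assert(vp_maxx >= 0.0f && vp_maxy >= 0.0f);

        struct clip_window out;
        out.minx = (uint32_t)MAX2(vp_minx, (float)limit.minx);
        out.miny = (uint32_t)MAX2(vp_miny, (float)limit.miny);
        out.maxx = (uint32_t)MIN2(vp_maxx, (float)limit.maxx);
        out.maxy = (uint32_t)MIN2(vp_maxy, (float)limit.maxy);
        return out;
}
```

With a 100x100 viewport at the origin and an 80x120 drawable, the unscissored window is clipped to 80 wide (drawable) and 100 high (viewport).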
@@ -89,31 +89,37 @@ vc4_submit_setup_rcl_surface(struct vc4_context *vc4,
|
||||
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
|
||||
submit_surf->offset = surf->offset;
|
||||
|
||||
if (is_depth) {
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS,
|
||||
VC4_LOADSTORE_TILE_BUFFER_BUFFER);
|
||||
if (psurf->texture->nr_samples == 0) {
|
||||
if (is_depth) {
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_ZS,
|
||||
VC4_LOADSTORE_TILE_BUFFER_BUFFER);
|
||||
|
||||
} else {
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR,
|
||||
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
|
||||
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
|
||||
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
|
||||
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
|
||||
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
|
||||
}
|
||||
submit_surf->bits |=
|
||||
VC4_SET_FIELD(surf->tiling,
|
||||
VC4_LOADSTORE_TILE_BUFFER_TILING);
|
||||
} else {
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(VC4_LOADSTORE_TILE_BUFFER_COLOR,
|
||||
VC4_LOADSTORE_TILE_BUFFER_BUFFER) |
|
||||
VC4_SET_FIELD(vc4_rt_format_is_565(psurf->format) ?
|
||||
VC4_LOADSTORE_TILE_BUFFER_BGR565 :
|
||||
VC4_LOADSTORE_TILE_BUFFER_RGBA8888,
|
||||
VC4_LOADSTORE_TILE_BUFFER_FORMAT);
|
||||
assert(!is_write);
|
||||
submit_surf->flags |= VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES;
|
||||
}
|
||||
submit_surf->bits |=
|
||||
VC4_SET_FIELD(surf->tiling, VC4_LOADSTORE_TILE_BUFFER_TILING);
|
||||
|
||||
if (is_write)
|
||||
rsc->writes++;
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4,
|
||||
struct drm_vc4_submit_rcl_surface *submit_surf,
|
||||
struct pipe_surface *psurf)
|
||||
vc4_submit_setup_rcl_render_config_surface(struct vc4_context *vc4,
|
||||
struct drm_vc4_submit_rcl_surface *submit_surf,
|
||||
struct pipe_surface *psurf)
|
||||
{
|
||||
struct vc4_surface *surf = vc4_surface(psurf);
|
||||
|
||||
@@ -126,16 +132,38 @@ vc4_submit_setup_ms_rcl_surface(struct vc4_context *vc4,
|
||||
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
|
||||
submit_surf->offset = surf->offset;
|
||||
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ?
|
||||
VC4_RENDER_CONFIG_FORMAT_BGR565 :
|
||||
VC4_RENDER_CONFIG_FORMAT_RGBA8888,
|
||||
VC4_RENDER_CONFIG_FORMAT) |
|
||||
VC4_SET_FIELD(surf->tiling, VC4_RENDER_CONFIG_MEMORY_FORMAT);
|
||||
if (psurf->texture->nr_samples == 0) {
|
||||
submit_surf->bits =
|
||||
VC4_SET_FIELD(vc4_rt_format_is_565(surf->base.format) ?
|
||||
VC4_RENDER_CONFIG_FORMAT_BGR565 :
|
||||
VC4_RENDER_CONFIG_FORMAT_RGBA8888,
|
||||
VC4_RENDER_CONFIG_FORMAT) |
|
||||
VC4_SET_FIELD(surf->tiling,
|
||||
VC4_RENDER_CONFIG_MEMORY_FORMAT);
|
||||
}
|
||||
|
||||
rsc->writes++;
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_submit_setup_rcl_msaa_surface(struct vc4_context *vc4,
|
||||
struct drm_vc4_submit_rcl_surface *submit_surf,
|
||||
struct pipe_surface *psurf)
|
||||
{
|
||||
struct vc4_surface *surf = vc4_surface(psurf);
|
||||
|
||||
if (!surf) {
|
||||
submit_surf->hindex = ~0;
|
||||
return;
|
||||
}
|
||||
|
||||
struct vc4_resource *rsc = vc4_resource(psurf->texture);
|
||||
submit_surf->hindex = vc4_gem_hindex(vc4, rsc->bo);
|
||||
submit_surf->offset = surf->offset;
|
||||
submit_surf->bits = 0;
|
||||
rsc->writes++;
|
||||
}
|
||||
|
||||
/**
|
||||
* Submits the job to the kernel and then reinitializes it.
|
||||
*/
|
||||
@@ -150,18 +178,35 @@ vc4_job_submit(struct vc4_context *vc4)
|
||||
struct drm_vc4_submit_cl submit;
|
||||
memset(&submit, 0, sizeof(submit));
|
||||
|
||||
cl_ensure_space(&vc4->bo_handles, 4 * sizeof(uint32_t));
|
||||
cl_ensure_space(&vc4->bo_pointers, 4 * sizeof(struct vc4_bo *));
|
||||
cl_ensure_space(&vc4->bo_handles, 6 * sizeof(uint32_t));
|
||||
cl_ensure_space(&vc4->bo_pointers, 6 * sizeof(struct vc4_bo *));
|
||||
|
||||
vc4_submit_setup_rcl_surface(vc4, &submit.color_read,
|
||||
vc4->color_read, false, false);
|
||||
vc4_submit_setup_ms_rcl_surface(vc4, &submit.color_ms_write,
|
||||
vc4->color_write);
|
||||
vc4_submit_setup_rcl_render_config_surface(vc4, &submit.color_write,
|
||||
vc4->color_write);
|
||||
vc4_submit_setup_rcl_surface(vc4, &submit.zs_read,
|
||||
vc4->zs_read, true, false);
|
||||
vc4_submit_setup_rcl_surface(vc4, &submit.zs_write,
|
||||
vc4->zs_write, true, true);
|
||||
|
||||
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_color_write,
|
||||
vc4->msaa_color_write);
|
||||
vc4_submit_setup_rcl_msaa_surface(vc4, &submit.msaa_zs_write,
|
||||
vc4->msaa_zs_write);
|
||||
|
||||
if (vc4->msaa) {
|
||||
/* This bit controls how many pixels the general
|
||||
* (i.e. subsampled) loads/stores are iterating over
|
||||
* (multisample loads replicate out to the other samples).
|
||||
*/
|
||||
submit.color_write.bits |= VC4_RENDER_CONFIG_MS_MODE_4X;
|
||||
/* Controls whether color_write's
|
||||
* VC4_PACKET_STORE_MS_TILE_BUFFER does 4x decimation
|
||||
*/
|
||||
submit.color_write.bits |= VC4_RENDER_CONFIG_DECIMATE_MODE_4X;
|
||||
}
|
||||
|
||||
submit.bo_handles = (uintptr_t)vc4->bo_handles.base;
|
||||
submit.bo_handle_count = cl_offset(&vc4->bo_handles) / 4;
|
||||
submit.bin_cl = (uintptr_t)vc4->bcl.base;
|
||||
@@ -173,10 +218,10 @@ vc4_job_submit(struct vc4_context *vc4)
|
||||
submit.uniforms_size = cl_offset(&vc4->uniforms);
|
||||
|
||||
assert(vc4->draw_min_x != ~0 && vc4->draw_min_y != ~0);
|
||||
submit.min_x_tile = vc4->draw_min_x / 64;
|
||||
submit.min_y_tile = vc4->draw_min_y / 64;
|
||||
submit.max_x_tile = (vc4->draw_max_x - 1) / 64;
|
||||
submit.max_y_tile = (vc4->draw_max_y - 1) / 64;
|
||||
submit.min_x_tile = vc4->draw_min_x / vc4->tile_width;
|
||||
submit.min_y_tile = vc4->draw_min_y / vc4->tile_height;
|
||||
submit.max_x_tile = (vc4->draw_max_x - 1) / vc4->tile_width;
|
||||
submit.max_y_tile = (vc4->draw_max_y - 1) / vc4->tile_height;
|
||||
submit.width = vc4->draw_width;
|
||||
submit.height = vc4->draw_height;
|
||||
if (vc4->cleared) {
|
||||
|
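The change from a fixed divisor of 64 to vc4->tile_width and vc4->tile_height in vc4_job_submit() matters because MSAA halves the tile dimensions. A minimal sketch of the tile-range computation, assuming only the two tile sizes that vc4_tile_blit() picks in this diff (64 without MSAA, 32 with it); struct tile_bounds and compute_tile_bounds() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the min/max tile indices of a job. */
struct tile_bounds {
        uint8_t min_x, min_y, max_x, max_y;
};

static struct tile_bounds
compute_tile_bounds(uint32_t draw_min_x, uint32_t draw_min_y,
                    uint32_t draw_max_x, uint32_t draw_max_y,
                    bool msaa)
{
        uint32_t tile_w = msaa ? 32 : 64;
        uint32_t tile_h = msaa ? 32 : 64;

        /* draw_max_* are exclusive, so they must be non-zero. */
        assert(draw_max_x > 0 && draw_max_y > 0);

        struct tile_bounds t;
        t.min_x = (uint8_t)(draw_min_x / tile_w);
        t.min_y = (uint8_t)(draw_min_y / tile_h);
        /* Subtract 1 to get the tile holding the last covered pixel. */
        t.max_x = (uint8_t)((draw_max_x - 1) / tile_w);
        t.max_y = (uint8_t)((draw_max_y - 1) / tile_h);
        return t;
}
```

For a 100x100 draw area the last tile index is 1 without MSAA and 3 with it, which is why dividing by a constant 64 would under-count tiles for multisampled jobs.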
@@ -29,6 +29,10 @@
|
||||
* from the tile buffer after having waited for the scoreboard (which is
|
||||
* handled by vc4_qpu_emit.c), then do math using your output color and that
|
||||
* destination value, and update the output color appropriately.
|
||||
*
|
||||
* Once this pass is done, the color write will either have one component (for
|
||||
* single sample) with packed argb8888, or 4 components with the per-sample
|
||||
* argb8888 result.
|
||||
*/
|
||||
|
||||
/**
|
||||
@@ -40,15 +44,23 @@
|
||||
#include "glsl/nir/nir_builder.h"
|
||||
#include "vc4_context.h"
|
||||
|
||||
static bool
|
||||
blend_depends_on_dst_color(struct vc4_compile *c)
|
||||
{
|
||||
return (c->fs_key->blend.blend_enable ||
|
||||
c->fs_key->blend.colormask != 0xf ||
|
||||
c->fs_key->logicop_func != PIPE_LOGICOP_COPY);
|
||||
}
|
||||
|
||||
/** Emits a load of the previous fragment color from the tile buffer. */
|
||||
static nir_ssa_def *
|
||||
vc4_nir_get_dst_color(nir_builder *b)
|
||||
vc4_nir_get_dst_color(nir_builder *b, int sample)
|
||||
{
|
||||
nir_intrinsic_instr *load =
|
||||
nir_intrinsic_instr_create(b->shader,
|
||||
nir_intrinsic_load_input);
|
||||
load->num_components = 1;
|
||||
load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT;
|
||||
load->const_index[0] = VC4_NIR_TLB_COLOR_READ_INPUT + sample;
|
||||
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
|
||||
nir_builder_instr_insert(b, &load->instr);
|
||||
return &load->dest.ssa;
|
||||
@@ -496,23 +508,26 @@ vc4_nir_swizzle_and_pack(struct vc4_compile *c, nir_builder *b,
|
||||
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
|
||||
nir_intrinsic_instr *intr)
|
||||
static nir_ssa_def *
|
||||
vc4_nir_blend_pipeline(struct vc4_compile *c, nir_builder *b, nir_ssa_def *src,
|
||||
int sample)
|
||||
{
|
||||
enum pipe_format color_format = c->fs_key->color_format;
|
||||
const uint8_t *format_swiz = vc4_get_format_swizzle(color_format);
|
||||
bool srgb = util_format_is_srgb(color_format);
|
||||
|
||||
/* Pull out the float src/dst color components. */
|
||||
nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b);
|
||||
nir_ssa_def *packed_dst_color = vc4_nir_get_dst_color(b, sample);
|
||||
nir_ssa_def *dst_vec4 = nir_unpack_unorm_4x8(b, packed_dst_color);
|
||||
nir_ssa_def *src_color[4], *unpacked_dst_color[4];
|
||||
for (unsigned i = 0; i < 4; i++) {
|
||||
src_color[i] = nir_swizzle(b, intr->src[0].ssa, &i, 1, false);
|
||||
unpacked_dst_color[i] = nir_swizzle(b, dst_vec4, &i, 1, false);
|
||||
src_color[i] = nir_channel(b, src, i);
|
||||
unpacked_dst_color[i] = nir_channel(b, dst_vec4, i);
|
||||
}
|
||||
|
||||
if (c->fs_key->sample_alpha_to_one && c->fs_key->msaa)
|
||||
src_color[3] = nir_imm_float(b, 1.0);
|
||||
|
||||
vc4_nir_emit_alpha_test_discard(c, b, src_color[3]);
|
||||
|
||||
nir_ssa_def *packed_color;
|
||||
@@ -560,16 +575,100 @@ vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
|
||||
colormask &= ~(0xff << (i * 8));
|
||||
}
|
||||
}
|
||||
packed_color = nir_ior(b,
|
||||
nir_iand(b, packed_color,
|
||||
nir_imm_int(b, colormask)),
|
||||
nir_iand(b, packed_dst_color,
|
||||
nir_imm_int(b, ~colormask)));
|
||||
|
||||
/* Turn the old vec4 output into a store of the packed color. */
|
||||
nir_instr_rewrite_src(&intr->instr, &intr->src[0],
|
||||
nir_src_for_ssa(packed_color));
|
||||
return nir_ior(b,
|
||||
nir_iand(b, packed_color,
|
||||
nir_imm_int(b, colormask)),
|
||||
nir_iand(b, packed_dst_color,
|
||||
nir_imm_int(b, ~colormask)));
|
||||
}
|
||||
|
||||
static int
|
||||
vc4_nir_next_output_driver_location(nir_shader *s)
|
||||
{
|
||||
int maxloc = -1;
|
||||
|
||||
nir_foreach_variable(var, &s->outputs)
|
||||
maxloc = MAX2(maxloc, (int)var->data.driver_location);
|
||||
|
||||
return maxloc + 1;
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_nir_store_sample_mask(struct vc4_compile *c, nir_builder *b,
|
||||
nir_ssa_def *val)
|
||||
{
|
||||
nir_variable *sample_mask = nir_variable_create(c->s, nir_var_shader_out,
|
||||
glsl_uint_type(),
|
||||
"sample_mask");
|
||||
sample_mask->data.driver_location =
|
||||
vc4_nir_next_output_driver_location(c->s);
|
||||
sample_mask->data.location = FRAG_RESULT_SAMPLE_MASK;
|
||||
|
||||
nir_intrinsic_instr *intr =
|
||||
nir_intrinsic_instr_create(c->s, nir_intrinsic_store_output);
|
||||
intr->num_components = 1;
|
||||
intr->const_index[0] = sample_mask->data.driver_location;
|
||||
|
||||
intr->src[0] = nir_src_for_ssa(val);
|
||||
nir_builder_instr_insert(b, &intr->instr);
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_nir_lower_blend_instr(struct vc4_compile *c, nir_builder *b,
|
||||
nir_intrinsic_instr *intr)
|
||||
{
|
||||
nir_ssa_def *frag_color = intr->src[0].ssa;
|
||||
|
||||
if (c->fs_key->sample_coverage) {
|
||||
nir_intrinsic_instr *load =
|
||||
nir_intrinsic_instr_create(b->shader,
|
||||
nir_intrinsic_load_sample_mask_in);
|
||||
load->num_components = 1;
|
||||
nir_ssa_dest_init(&load->instr, &load->dest, 1, NULL);
|
||||
nir_builder_instr_insert(b, &load->instr);
|
||||
|
||||
nir_ssa_def *bitmask = &load->dest.ssa;
|
||||
|
||||
vc4_nir_store_sample_mask(c, b, bitmask);
|
||||
} else if (c->fs_key->sample_alpha_to_coverage) {
|
||||
nir_ssa_def *a = nir_channel(b, frag_color, 3);
|
||||
|
||||
/* XXX: We should do a nice dither based on the fragment
|
||||
* coordinate, instead.
|
||||
*/
|
||||
nir_ssa_def *num_samples = nir_imm_float(b, VC4_MAX_SAMPLES);
|
||||
nir_ssa_def *num_bits = nir_f2i(b, nir_fmul(b, a, num_samples));
|
||||
nir_ssa_def *bitmask = nir_isub(b,
|
||||
nir_ishl(b,
|
||||
nir_imm_int(b, 1),
|
||||
num_bits),
|
||||
nir_imm_int(b, 1));
|
||||
vc4_nir_store_sample_mask(c, b, bitmask);
|
||||
}
|
||||
|
||||
/* The TLB color read returns each sample in turn, so if our blending
|
||||
* depends on the destination color, we're going to have to run the
|
||||
* blending function separately for each destination sample value, and
|
||||
* then output the per-sample color using TLB_COLOR_MS.
|
||||
*/
|
||||
nir_ssa_def *blend_output;
|
||||
if (c->fs_key->msaa && blend_depends_on_dst_color(c)) {
|
||||
c->msaa_per_sample_output = true;
|
||||
|
||||
nir_ssa_def *samples[4];
|
||||
for (int i = 0; i < VC4_MAX_SAMPLES; i++)
|
||||
samples[i] = vc4_nir_blend_pipeline(c, b, frag_color, i);
|
||||
blend_output = nir_vec4(b,
|
||||
samples[0], samples[1],
|
||||
samples[2], samples[3]);
|
||||
} else {
|
||||
blend_output = vc4_nir_blend_pipeline(c, b, frag_color, 0);
|
||||
}
|
||||
|
||||
nir_instr_rewrite_src(&intr->instr, &intr->src[0],
|
||||
nir_src_for_ssa(blend_output));
|
||||
intr->num_components = blend_output->num_components;
|
||||
}
|
||||
|
||||
static bool
|
||||
@@ -577,7 +676,7 @@ vc4_nir_lower_blend_block(nir_block *block, void *state)
|
||||
{
|
||||
struct vc4_compile *c = state;
|
||||
|
||||
nir_foreach_instr(block, instr) {
|
||||
nir_foreach_instr_safe(block, instr) {
|
||||
if (instr->type != nir_instr_type_intrinsic)
|
||||
continue;
|
||||
nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
|
||||
|
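Two pieces of integer arithmetic in the blend lowering above are easy to check in plain C: the alpha-to-coverage bitmask and the colormask merge of the packed source and destination colors. The sketch below assumes VC4_MAX_SAMPLES is 4 (consistent with the samples[4] array and the "4X MSAA" comments in this diff); MAX_SAMPLES, alpha_to_coverage_mask() and apply_colormask() are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VC4_MAX_SAMPLES, assumed to be 4. */
#define MAX_SAMPLES 4

/* Mirrors the NIR sequence f2i(a * num_samples), then (1 << bits) - 1.
 * Alpha is assumed to be in [0, 1] so the shift count stays in range.
 */
static uint32_t
alpha_to_coverage_mask(float alpha)
{
        assert(alpha >= 0.0f && alpha <= 1.0f);
        int num_bits = (int)(alpha * MAX_SAMPLES); /* truncates toward 0 */
        return (1u << num_bits) - 1u;
}

/* Mirrors the final nir_ior()/nir_iand() pair: channels enabled in
 * colormask come from the blended color, the rest keep the destination.
 */
static uint32_t
apply_colormask(uint32_t packed_color, uint32_t packed_dst_color,
                uint32_t colormask)
{
        return (packed_color & colormask) | (packed_dst_color & ~colormask);
}
```

As the XXX comment in the diff notes, this coverage mask is not dithered: every fragment with the same alpha enables the same low-order samples.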
@@ -84,7 +84,7 @@ vc4_nir_unpack_16u(nir_builder *b, nir_ssa_def *src, unsigned chan)
 static nir_ssa_def *
 vc4_nir_unpack_8f(nir_builder *b, nir_ssa_def *src, unsigned chan)
 {
-        return nir_swizzle(b, nir_unpack_unorm_4x8(b, src), &chan, 1, false);
+        return nir_channel(b, nir_unpack_unorm_4x8(b, src), chan);
 }
 
 static nir_ssa_def *
@@ -226,7 +226,9 @@ vc4_nir_lower_fs_input(struct vc4_compile *c, nir_builder *b,
|
||||
{
|
||||
b->cursor = nir_before_instr(&intr->instr);
|
||||
|
||||
if (intr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) {
|
||||
if (intr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT &&
|
||||
intr->const_index[0] < (VC4_NIR_TLB_COLOR_READ_INPUT +
|
||||
VC4_MAX_SAMPLES)) {
|
||||
/* This doesn't need any lowering. */
|
||||
return;
|
||||
}
|
||||
@@ -309,7 +311,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
|
||||
/* Color output is lowered by vc4_nir_lower_blend(). */
|
||||
if (c->stage == QSTAGE_FRAG &&
|
||||
(output_var->data.location == FRAG_RESULT_COLOR ||
|
||||
output_var->data.location == FRAG_RESULT_DATA0)) {
|
||||
output_var->data.location == FRAG_RESULT_DATA0 ||
|
||||
output_var->data.location == FRAG_RESULT_SAMPLE_MASK)) {
|
||||
intr->const_index[0] *= 4;
|
||||
return;
|
||||
}
|
||||
@@ -326,9 +329,8 @@ vc4_nir_lower_output(struct vc4_compile *c, nir_builder *b,
|
||||
intr_comp->const_index[0] = intr->const_index[0] * 4 + i;
|
||||
|
||||
assert(intr->src[0].is_ssa);
|
||||
intr_comp->src[0] = nir_src_for_ssa(nir_swizzle(b,
|
||||
intr->src[0].ssa,
|
||||
&i, 1, false));
|
||||
intr_comp->src[0] =
|
||||
nir_src_for_ssa(nir_channel(b, intr->src[0].ssa, i));
|
||||
nir_builder_instr_insert(b, &intr_comp->instr);
|
||||
}
|
||||
|
||||
|
172
src/gallium/drivers/vc4/vc4_nir_lower_txf_ms.c
Normal file
@@ -0,0 +1,172 @@
|
||||
/*
|
||||
* Copyright © 2015 Broadcom
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice (including the next
|
||||
* paragraph) shall be included in all copies or substantial portions of the
|
||||
* Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "vc4_qir.h"
|
||||
#include "kernel/vc4_packet.h"
|
||||
#include "tgsi/tgsi_info.h"
|
||||
#include "glsl/nir/nir_builder.h"
|
||||
|
||||
/** @file vc4_nir_lower_txf_ms.c
|
||||
* Walks the NIR generated by TGSI-to-NIR to lower its nir_texop_txf_ms
|
||||
* coordinates to do the math necessary and use a plain nir_texop_txf instead.
|
||||
*
|
||||
* MSAA textures are laid out as 32x32-aligned blocks of RGBA8888 or Z24S8.
|
||||
* We can't load them through the normal sampler path because of the lack of
|
||||
* linear support in the hardware. So, we treat MSAA textures as a giant UBO
|
||||
* and do the math in the shader.
|
||||
*/
|
||||
|
||||
static void
|
||||
vc4_nir_lower_txf_ms_instr(struct vc4_compile *c, nir_builder *b,
|
||||
nir_tex_instr *txf_ms)
|
||||
{
|
||||
if (txf_ms->op != nir_texop_txf_ms)
|
||||
return;
|
||||
|
||||
b->cursor = nir_before_instr(&txf_ms->instr);
|
||||
|
||||
nir_tex_instr *txf = nir_tex_instr_create(c->s, 1);
|
||||
txf->op = nir_texop_txf;
|
||||
txf->sampler = txf_ms->sampler;
|
||||
txf->sampler_index = txf_ms->sampler_index;
|
||||
txf->coord_components = txf_ms->coord_components;
|
||||
txf->is_shadow = txf_ms->is_shadow;
|
||||
txf->is_new_style_shadow = txf_ms->is_new_style_shadow;
|
||||
|
||||
nir_ssa_def *coord = NULL, *sample_index = NULL;
|
||||
for (int i = 0; i < txf_ms->num_srcs; i++) {
|
||||
assert(txf_ms->src[i].src.is_ssa);
|
||||
|
||||
switch (txf_ms->src[i].src_type) {
|
||||
case nir_tex_src_coord:
|
||||
coord = txf_ms->src[i].src.ssa;
|
||||
break;
|
||||
case nir_tex_src_ms_index:
|
||||
sample_index = txf_ms->src[i].src.ssa;
|
||||
break;
|
||||
default:
|
||||
unreachable("Unknown txf_ms src\n");
|
||||
}
|
||||
}
|
||||
assert(coord);
|
||||
assert(sample_index);
|
||||
|
||||
nir_ssa_def *x = nir_channel(b, coord, 0);
|
||||
nir_ssa_def *y = nir_channel(b, coord, 1);
|
||||
|
||||
uint32_t tile_w = 32;
|
||||
uint32_t tile_h = 32;
|
||||
uint32_t tile_w_shift = 5;
|
||||
uint32_t tile_h_shift = 5;
|
||||
uint32_t tile_size = (tile_h * tile_w *
|
||||
VC4_MAX_SAMPLES * sizeof(uint32_t));
|
||||
unsigned unit = txf_ms->sampler_index;
|
||||
uint32_t w = align(c->key->tex[unit].msaa_width, tile_w);
|
||||
uint32_t w_tiles = w / tile_w;
|
||||
|
||||
nir_ssa_def *x_tile = nir_ushr(b, x, nir_imm_int(b, tile_w_shift));
|
||||
nir_ssa_def *y_tile = nir_ushr(b, y, nir_imm_int(b, tile_h_shift));
|
||||
nir_ssa_def *tile_addr = nir_iadd(b,
|
||||
nir_imul(b, x_tile,
|
||||
nir_imm_int(b, tile_size)),
|
||||
nir_imul(b, y_tile,
|
||||
nir_imm_int(b, (w_tiles *
|
||||
tile_size))));
|
||||
nir_ssa_def *x_subspan = nir_iand(b, x,
|
||||
nir_imm_int(b, (tile_w - 1) & ~1));
|
||||
nir_ssa_def *y_subspan = nir_iand(b, y,
|
||||
nir_imm_int(b, (tile_h - 1) & ~1));
|
||||
nir_ssa_def *subspan_addr = nir_iadd(b,
|
||||
nir_imul(b, x_subspan,
|
||||
nir_imm_int(b, 2 * VC4_MAX_SAMPLES * sizeof(uint32_t))),
|
||||
nir_imul(b, y_subspan,
|
||||
nir_imm_int(b,
|
||||
tile_w *
|
||||
VC4_MAX_SAMPLES *
|
||||
sizeof(uint32_t))));
|
||||
|
||||
nir_ssa_def *pixel_addr = nir_ior(b,
|
||||
nir_iand(b,
|
||||
nir_ishl(b, x,
|
||||
nir_imm_int(b, 2)),
|
||||
nir_imm_int(b, (1 << 2))),
|
||||
nir_iand(b,
|
||||
nir_ishl(b, y,
|
||||
nir_imm_int(b, 3)),
|
||||
nir_imm_int(b, (1 << 3))));
|
||||
|
||||
nir_ssa_def *sample_addr = nir_ishl(b, sample_index, nir_imm_int(b, 4));
|
||||
|
||||
nir_ssa_def *addr = nir_iadd(b,
|
||||
nir_ior(b, sample_addr, pixel_addr),
|
||||
nir_iadd(b, subspan_addr, tile_addr));
|
||||
|
||||
txf->src[0].src_type = nir_tex_src_coord;
|
||||
txf->src[0].src = nir_src_for_ssa(nir_vec2(b, addr, nir_imm_int(b, 0)));
|
||||
nir_ssa_dest_init(&txf->instr, &txf->dest, 4, NULL);
|
||||
nir_builder_instr_insert(b, &txf->instr);
|
||||
nir_ssa_def_rewrite_uses(&txf_ms->dest.ssa,
|
||||
nir_src_for_ssa(&txf->dest.ssa));
|
||||
nir_instr_remove(&txf_ms->instr);
|
||||
}
|
||||
|
||||
static bool
|
||||
vc4_nir_lower_txf_ms_block(nir_block *block, void *arg)
|
||||
{
|
||||
struct vc4_compile *c = arg;
|
||||
nir_function_impl *impl =
|
||||
nir_cf_node_get_function(&block->cf_node);
|
||||
|
||||
nir_builder b;
|
||||
nir_builder_init(&b, impl);
|
||||
|
||||
nir_foreach_instr_safe(block, instr) {
|
||||
if (instr->type == nir_instr_type_tex) {
|
||||
vc4_nir_lower_txf_ms_instr(c, &b,
|
||||
nir_instr_as_tex(instr));
|
||||
}
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool
|
||||
vc4_nir_lower_txf_ms_impl(struct vc4_compile *c, nir_function_impl *impl)
|
||||
{
|
||||
nir_foreach_block(impl, vc4_nir_lower_txf_ms_block, c);
|
||||
|
||||
nir_metadata_preserve(impl,
|
||||
nir_metadata_block_index |
|
||||
nir_metadata_dominance);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
void
|
||||
vc4_nir_lower_txf_ms(struct vc4_compile *c)
|
||||
{
|
||||
nir_foreach_overload(c->s, overload) {
|
||||
if (overload->impl)
|
||||
vc4_nir_lower_txf_ms_impl(c, overload->impl);
|
||||
}
|
||||
}
|
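The address math that vc4_nir_lower_txf_ms_instr() builds out of NIR instructions is easier to follow as ordinary integer code. The function below is a plain-C restatement of the same expression, assuming VC4_MAX_SAMPLES is 4; msaa_texel_offset() and align_u32() are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VC4_MAX_SAMPLES, assumed to be 4. */
#define MAX_SAMPLES 4

/* Rounds v up to a multiple of a, where a is a power of two. */
static uint32_t
align_u32(uint32_t v, uint32_t a)
{
        return (v + a - 1) & ~(a - 1);
}

/* Byte offset of one sample of pixel (x, y) in an MSAA surface laid
 * out as 32x32-pixel tiles, following the lowering pass step by step.
 */
static uint32_t
msaa_texel_offset(uint32_t x, uint32_t y, uint32_t sample,
                  uint32_t msaa_width)
{
        const uint32_t tile_w = 32, tile_h = 32;
        const uint32_t tile_size =
                tile_h * tile_w * MAX_SAMPLES * sizeof(uint32_t);
        uint32_t w_tiles = align_u32(msaa_width, tile_w) / tile_w;

        assert(sample < MAX_SAMPLES);

        /* Which tile, in row-major tile order. */
        uint32_t tile_addr = (x >> 5) * tile_size +
                             (y >> 5) * (w_tiles * tile_size);

        /* Which 2x2 subspan within the tile (low bit of x/y dropped). */
        uint32_t x_subspan = x & ((tile_w - 1) & ~1u);
        uint32_t y_subspan = y & ((tile_h - 1) & ~1u);
        uint32_t subspan_addr =
                x_subspan * (2 * MAX_SAMPLES * sizeof(uint32_t)) +
                y_subspan * (tile_w * MAX_SAMPLES * sizeof(uint32_t));

        /* Which pixel within the subspan, then which sample. */
        uint32_t pixel_addr = ((x << 2) & (1u << 2)) | ((y << 3) & (1u << 3));
        uint32_t sample_addr = sample << 4;

        return (sample_addr | pixel_addr) + (subspan_addr + tile_addr);
}
```

Within one 2x2 subspan the low four bits select the pixel (bits 2 and 3) and the next two bits select the sample, so the four samples of a pixel are 16 bytes apart.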
@@ -94,7 +94,12 @@ static void
 replace_with_mov(struct vc4_compile *c, struct qinst *inst, struct qreg arg)
 {
         dump_from(c, inst);
-        inst->op = QOP_MOV;
+        if (qir_is_mul(inst))
+                inst->op = QOP_MMOV;
+        else if (qir_is_float_input(inst))
+                inst->op = QOP_FMOV;
+        else
+                inst->op = QOP_MOV;
         inst->src[0] = arg;
         inst->src[1] = c->undef;
         dump_to(c, inst);
@@ -181,6 +186,7 @@ qir_opt_algebraic(struct vc4_compile *c)
                 case QOP_SUB:
                         if (is_zero(c, inst->src[1])) {
                                 replace_with_mov(c, inst, inst->src[0]);
+                                progress = true;
                         }
                         break;
 
|
@@ -294,6 +294,76 @@ ntq_umul(struct vc4_compile *c, struct qreg src0, struct qreg src1)
|
||||
qir_uniform_ui(c, 24)));
|
||||
}
|
||||
|
||||
static struct qreg
|
||||
ntq_scale_depth_texture(struct vc4_compile *c, struct qreg src)
|
||||
{
|
||||
struct qreg depthf = qir_ITOF(c, qir_SHR(c, src,
|
||||
qir_uniform_ui(c, 8)));
|
||||
return qir_FMUL(c, depthf, qir_uniform_f(c, 1.0f/0xffffff));
|
||||
}
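The depth scaling in `ntq_scale_depth_texture()` can be checked on the CPU. The following is a minimal sketch, not driver code: `scale_depth_texel` is a hypothetical name, and it assumes the texel carries depth in its upper 24 bits, which is what the shift right by 8 implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of ntq_scale_depth_texture(): drop the low
 * 8 bits, convert the remaining 24-bit value to float, and scale it by
 * 1/0xffffff so the result lands in [0, 1].
 */
static float
scale_depth_texel(uint32_t texel)
{
        uint32_t depth24 = texel >> 8;

        /* After the shift only 24 bits can be set, so the conversion to
         * float is exact.
         */
        assert(depth24 <= 0xffffff);
        return (float)depth24 * (1.0f / 0xffffff);
}
```

The low byte never contributes to the result, so `0x000000ff` scales to exactly `0.0f`, while `0xffffff00` scales to approximately `1.0f`.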
|
||||
|
||||
/**
|
||||
* Emits a lowered TXF_MS from an MSAA texture.
|
||||
*
|
||||
* The addressing math has been lowered in NIR, and now we just need to read
|
||||
* it like a UBO.
|
||||
*/
|
||||
static void
|
||||
ntq_emit_txf(struct vc4_compile *c, nir_tex_instr *instr)
|
||||
{
|
||||
uint32_t tile_width = 32;
|
||||
uint32_t tile_height = 32;
|
||||
uint32_t tile_size = (tile_height * tile_width *
|
||||
VC4_MAX_SAMPLES * sizeof(uint32_t));
|
||||
|
||||
unsigned unit = instr->sampler_index;
|
||||
uint32_t w = align(c->key->tex[unit].msaa_width, tile_width);
|
||||
uint32_t w_tiles = w / tile_width;
|
||||
uint32_t h = align(c->key->tex[unit].msaa_height, tile_height);
|
||||
uint32_t h_tiles = h / tile_height;
|
||||
uint32_t size = w_tiles * h_tiles * tile_size;
|
||||
|
||||
struct qreg addr;
|
||||
assert(instr->num_srcs == 1);
|
||||
assert(instr->src[0].src_type == nir_tex_src_coord);
|
||||
addr = ntq_get_src(c, instr->src[0].src, 0);
|
||||
|
||||
/* Perform the clamping required by kernel validation. */
|
||||
addr = qir_MAX(c, addr, qir_uniform_ui(c, 0));
|
||||
addr = qir_MIN(c, addr, qir_uniform_ui(c, size - 4));
|
||||
|
||||
qir_TEX_DIRECT(c, addr, qir_uniform(c, QUNIFORM_TEXTURE_MSAA_ADDR, unit));
|
||||
|
||||
struct qreg tex = qir_TEX_RESULT(c);
|
||||
c->num_texture_samples++;
|
||||
|
||||
struct qreg texture_output[4];
|
||||
enum pipe_format format = c->key->tex[unit].format;
|
||||
if (util_format_is_depth_or_stencil(format)) {
|
||||
struct qreg scaled = ntq_scale_depth_texture(c, tex);
|
||||
for (int i = 0; i < 4; i++)
|
||||
texture_output[i] = scaled;
|
||||
} else {
|
||||
struct qreg tex_result_unpacked[4];
|
||||
for (int i = 0; i < 4; i++)
|
||||
tex_result_unpacked[i] = qir_UNPACK_8_F(c, tex, i);
|
||||
|
||||
const uint8_t *format_swiz =
|
||||
vc4_get_format_swizzle(c->key->tex[unit].format);
|
||||
for (int i = 0; i < 4; i++) {
|
||||
texture_output[i] =
|
||||
get_swizzled_channel(c, tex_result_unpacked,
|
||||
format_swiz[i]);
|
||||
}
|
||||
}
|
||||
|
||||
struct qreg *dest = ntq_get_dest(c, &instr->dest);
|
||||
for (int i = 0; i < 4; i++) {
|
||||
dest[i] = get_swizzled_channel(c, texture_output,
|
||||
c->key->tex[unit].swizzle[i]);
|
||||
}
|
||||
}
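The size and clamp arithmetic at the top of `ntq_emit_txf()` can be reproduced in isolation. This is a sketch under stated assumptions: `MAX_SAMPLES` stands in for `VC4_MAX_SAMPLES` and is assumed to be 4, the address comparison is assumed to be signed, and the helper names (`align_pot`, `msaa_image_size`, `clamp_fetch_addr`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W      32u
#define TILE_H      32u
#define MAX_SAMPLES 4u  /* assumption: stands in for VC4_MAX_SAMPLES */

/* Round v up to a multiple of a power-of-two alignment. */
static uint32_t
align_pot(uint32_t v, uint32_t a)
{
        assert(a != 0 && (a & (a - 1)) == 0);
        return (v + a - 1) & ~(a - 1);
}

/* Size in bytes of the raw MSAA image: whole 32x32 tiles, MAX_SAMPLES
 * 32-bit words per pixel.
 */
static uint32_t
msaa_image_size(uint32_t width, uint32_t height)
{
        uint32_t tile_size = TILE_W * TILE_H * MAX_SAMPLES *
                             (uint32_t)sizeof(uint32_t);
        uint32_t w_tiles = align_pot(width, TILE_W) / TILE_W;
        uint32_t h_tiles = align_pot(height, TILE_H) / TILE_H;

        return w_tiles * h_tiles * tile_size;
}

/* Clamp a byte address to [0, size - 4], the last whole 32-bit word. */
static uint32_t
clamp_fetch_addr(int32_t addr, uint32_t size)
{
        assert(size >= 4);
        if (addr < 0)
                return 0;
        if ((uint32_t)addr > size - 4)
                return size - 4;
        return (uint32_t)addr;
}
```

For a 100x50 texture the image rounds up to 4x2 tiles of 16384 bytes each, so `msaa_image_size(100, 50)` is 131072 and the highest address the clamp lets through is 131068.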
|
||||
|
||||
static void
|
||||
ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
|
||||
{
|
||||
@@ -301,6 +371,11 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
|
||||
bool is_txb = false, is_txl = false, has_proj = false;
|
||||
unsigned unit = instr->sampler_index;
|
||||
|
||||
if (instr->op == nir_texop_txf) {
|
||||
ntq_emit_txf(c, instr);
|
||||
return;
|
||||
}
|
||||
|
||||
for (unsigned i = 0; i < instr->num_srcs; i++) {
|
||||
switch (instr->src[i].src_type) {
|
||||
case nir_tex_src_coord:
|
||||
@@ -396,11 +471,7 @@ ntq_emit_tex(struct vc4_compile *c, nir_tex_instr *instr)
|
||||
|
||||
struct qreg unpacked[4];
|
||||
if (util_format_is_depth_or_stencil(format)) {
|
||||
struct qreg depthf = qir_ITOF(c, qir_SHR(c, tex,
|
||||
qir_uniform_ui(c, 8)));
|
||||
struct qreg normalized = qir_FMUL(c, depthf,
|
||||
qir_uniform_f(c, 1.0f/0xffffff));
|
||||
|
||||
struct qreg normalized = ntq_scale_depth_texture(c, tex);
|
||||
struct qreg depth_output;
|
||||
|
||||
struct qreg one = qir_uniform_f(c, 1.0f);
|
||||
@@ -1109,6 +1180,10 @@ emit_frag_end(struct vc4_compile *c)
|
||||
}
|
||||
}
|
||||
|
||||
if (c->output_sample_mask_index != -1) {
|
||||
qir_MS_MASK(c, c->outputs[c->output_sample_mask_index]);
|
||||
}
|
||||
|
||||
if (c->fs_key->depth_enabled) {
|
||||
struct qreg z;
|
||||
if (c->output_position_index != -1) {
|
||||
@@ -1120,7 +1195,12 @@ emit_frag_end(struct vc4_compile *c)
                 qir_TLB_Z_WRITE(c, z);
         }
 
-        qir_TLB_COLOR_WRITE(c, color);
+        if (!c->msaa_per_sample_output) {
+                qir_TLB_COLOR_WRITE(c, color);
+        } else {
+                for (int i = 0; i < VC4_MAX_SAMPLES; i++)
+                        qir_TLB_COLOR_WRITE_MS(c, c->sample_colors[i]);
+        }
 }
 
 static void
@@ -1171,7 +1251,7 @@ emit_point_size_write(struct vc4_compile *c)
         struct qreg point_size;
 
         if (c->output_point_size_index != -1)
-                point_size = c->outputs[c->output_point_size_index + 3];
+                point_size = c->outputs[c->output_point_size_index];
         else
                 point_size = qir_uniform_f(c, 1.0);
 
@@ -1359,6 +1439,9 @@ ntq_setup_outputs(struct vc4_compile *c)
|
||||
case FRAG_RESULT_DEPTH:
|
||||
c->output_position_index = loc;
|
||||
break;
|
||||
case FRAG_RESULT_SAMPLE_MASK:
|
||||
c->output_sample_mask_index = loc;
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
switch (var->data.location) {
|
||||
@@ -1462,20 +1545,48 @@ ntq_emit_intrinsic(struct vc4_compile *c, nir_intrinsic_instr *instr)
                                 instr->const_index[0]);
                 break;
 
+        case nir_intrinsic_load_sample_mask_in:
+                *dest = qir_uniform(c, QUNIFORM_SAMPLE_MASK, 0);
+                break;
+
         case nir_intrinsic_load_input:
                 assert(instr->num_components == 1);
-                if (instr->const_index[0] == VC4_NIR_TLB_COLOR_READ_INPUT) {
-                        *dest = qir_TLB_COLOR_READ(c);
+                if (instr->const_index[0] >= VC4_NIR_TLB_COLOR_READ_INPUT) {
+                        /* Reads of the per-sample color need to be done in
+                         * order.
+                         */
+                        int sample_index = (instr->const_index[0] -
+                                            VC4_NIR_TLB_COLOR_READ_INPUT);
+                        for (int i = 0; i <= sample_index; i++) {
+                                if (c->color_reads[i].file == QFILE_NULL) {
+                                        c->color_reads[i] =
+                                                qir_TLB_COLOR_READ(c);
+                                }
+                        }
+                        *dest = c->color_reads[sample_index];
                 } else {
                         *dest = c->inputs[instr->const_index[0]];
                 }
                 break;
 
         case nir_intrinsic_store_output:
-                assert(instr->num_components == 1);
-                c->outputs[instr->const_index[0]] =
-                        qir_MOV(c, ntq_get_src(c, instr->src[0], 0));
-                c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1);
+                /* MSAA color outputs are the only case where we have an
+                 * output that's not lowered to being a store of a single 32
+                 * bit value.
+                 */
+                if (c->stage == QSTAGE_FRAG && instr->num_components == 4) {
+                        assert(instr->const_index[0] == c->output_color_index);
+                        for (int i = 0; i < 4; i++) {
+                                c->sample_colors[i] =
+                                        qir_MOV(c, ntq_get_src(c, instr->src[0],
+                                                               i));
+                        }
+                } else {
+                        assert(instr->num_components == 1);
+                        c->outputs[instr->const_index[0]] =
+                                qir_MOV(c, ntq_get_src(c, instr->src[0], 0));
+                        c->num_outputs = MAX2(c->num_outputs, instr->const_index[0] + 1);
+                }
                 break;
 
         case nir_intrinsic_discard:
@@ -1672,6 +1783,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage,
|
||||
nir_lower_clip_vs(c->s, c->key->ucp_enables);
|
||||
|
||||
vc4_nir_lower_io(c);
|
||||
vc4_nir_lower_txf_ms(c);
|
||||
nir_lower_idiv(c->s);
|
||||
nir_lower_load_const_to_scalar(c->s);
|
||||
|
||||
@@ -1907,12 +2019,19 @@ vc4_setup_shared_key(struct vc4_context *vc4, struct vc4_key *key,
                 struct pipe_sampler_state *sampler_state =
                         texstate->samplers[i];
 
-                if (sampler) {
-                        key->tex[i].format = sampler->format;
-                        key->tex[i].swizzle[0] = sampler->swizzle_r;
-                        key->tex[i].swizzle[1] = sampler->swizzle_g;
-                        key->tex[i].swizzle[2] = sampler->swizzle_b;
-                        key->tex[i].swizzle[3] = sampler->swizzle_a;
+                if (!sampler)
+                        continue;
+
+                key->tex[i].format = sampler->format;
+                key->tex[i].swizzle[0] = sampler->swizzle_r;
+                key->tex[i].swizzle[1] = sampler->swizzle_g;
+                key->tex[i].swizzle[2] = sampler->swizzle_b;
+                key->tex[i].swizzle[3] = sampler->swizzle_a;
+
+                if (sampler->texture->nr_samples) {
+                        key->tex[i].msaa_width = sampler->texture->width0;
+                        key->tex[i].msaa_height = sampler->texture->height0;
+                } else if (sampler){
                         key->tex[i].compare_mode = sampler_state->compare_mode;
                         key->tex[i].compare_func = sampler_state->compare_func;
                         key->tex[i].wrap_s = sampler_state->wrap_s;
@@ -1952,6 +2071,11 @@ vc4_update_compiled_fs(struct vc4_context *vc4, uint8_t prim_mode)
|
||||
} else {
|
||||
key->logicop_func = PIPE_LOGICOP_COPY;
|
||||
}
|
||||
key->msaa = vc4->rasterizer->base.multisample;
|
||||
key->sample_coverage = (vc4->rasterizer->base.multisample &&
|
||||
vc4->sample_mask != (1 << VC4_MAX_SAMPLES) - 1);
|
||||
key->sample_alpha_to_coverage = vc4->blend->alpha_to_coverage;
|
||||
key->sample_alpha_to_one = vc4->blend->alpha_to_one;
|
||||
if (vc4->framebuffer.cbufs[0])
|
||||
key->color_format = vc4->framebuffer.cbufs[0]->format;
|
||||
|
||||
|
@@ -86,7 +86,9 @@ static const struct qir_op_info qir_op_info[] = {
|
||||
[QOP_TLB_STENCIL_SETUP] = { "tlb_stencil_setup", 0, 1, true },
|
||||
[QOP_TLB_Z_WRITE] = { "tlb_z", 0, 1, true },
|
||||
[QOP_TLB_COLOR_WRITE] = { "tlb_color", 0, 1, true },
|
||||
[QOP_TLB_COLOR_WRITE_MS] = { "tlb_color_ms", 0, 1, true },
|
||||
[QOP_TLB_COLOR_READ] = { "tlb_color_read", 1, 0 },
|
||||
[QOP_MS_MASK] = { "ms_mask", 0, 1, true },
|
||||
[QOP_VARY_ADD_C] = { "vary_add_c", 1, 1 },
|
||||
|
||||
[QOP_FRAG_X] = { "frag_x", 1, 0 },
|
||||
@@ -399,6 +401,7 @@ qir_compile_init(void)
|
||||
c->output_position_index = -1;
|
||||
c->output_color_index = -1;
|
||||
c->output_point_size_index = -1;
|
||||
c->output_sample_mask_index = -1;
|
||||
|
||||
c->def_ht = _mesa_hash_table_create(c, _mesa_hash_pointer,
|
||||
_mesa_key_pointer_equal);
|
||||
@@ -420,13 +423,19 @@ qir_remove_instruction(struct vc4_compile *c, struct qinst *qinst)
 struct qreg
 qir_follow_movs(struct vc4_compile *c, struct qreg reg)
 {
+        int pack = reg.pack;
+
         while (reg.file == QFILE_TEMP &&
                c->defs[reg.index] &&
-               c->defs[reg.index]->op == QOP_MOV &&
-               !c->defs[reg.index]->dst.pack) {
+               (c->defs[reg.index]->op == QOP_MOV ||
+                c->defs[reg.index]->op == QOP_FMOV ||
+                c->defs[reg.index]->op == QOP_MMOV)&&
+               !c->defs[reg.index]->dst.pack &&
+               !c->defs[reg.index]->src[0].pack) {
                 reg = c->defs[reg.index]->src[0];
         }
 
+        reg.pack = pack;
         return reg;
 }
 
||||
|
@@ -38,6 +38,7 @@
|
||||
|
||||
#include "vc4_screen.h"
|
||||
#include "vc4_qpu_defines.h"
|
||||
#include "kernel/vc4_packet.h"
|
||||
#include "pipe/p_state.h"
|
||||
|
||||
struct nir_builder;
|
||||
@@ -121,7 +122,9 @@ enum qop {
|
||||
QOP_TLB_STENCIL_SETUP,
|
||||
QOP_TLB_Z_WRITE,
|
||||
QOP_TLB_COLOR_WRITE,
|
||||
QOP_TLB_COLOR_WRITE_MS,
|
||||
QOP_TLB_COLOR_READ,
|
||||
QOP_MS_MASK,
|
||||
QOP_VARY_ADD_C,
|
||||
|
||||
QOP_FRAG_X,
|
||||
@@ -230,6 +233,8 @@ enum quniform_contents {
|
||||
/** A reference to a texture config parameter 2 cubemap stride uniform */
|
||||
QUNIFORM_TEXTURE_CONFIG_P2,
|
||||
|
||||
QUNIFORM_TEXTURE_MSAA_ADDR,
|
||||
|
||||
QUNIFORM_UBO_ADDR,
|
||||
|
||||
QUNIFORM_TEXRECT_SCALE_X,
|
||||
@@ -247,6 +252,7 @@ enum quniform_contents {
|
||||
QUNIFORM_STENCIL,
|
||||
|
||||
QUNIFORM_ALPHA_REF,
|
||||
QUNIFORM_SAMPLE_MASK,
|
||||
};
|
||||
|
||||
struct vc4_varying_slot {
|
||||
@@ -283,11 +289,18 @@ struct vc4_key {
|
||||
struct vc4_uncompiled_shader *shader_state;
|
||||
struct {
|
||||
enum pipe_format format;
|
||||
unsigned compare_mode:1;
|
||||
unsigned compare_func:3;
|
||||
unsigned wrap_s:3;
|
||||
unsigned wrap_t:3;
|
||||
uint8_t swizzle[4];
|
||||
union {
|
||||
struct {
|
||||
unsigned compare_mode:1;
|
||||
unsigned compare_func:3;
|
||||
unsigned wrap_s:3;
|
||||
unsigned wrap_t:3;
|
||||
};
|
||||
struct {
|
||||
uint16_t msaa_width, msaa_height;
|
||||
};
|
||||
};
|
||||
} tex[VC4_MAX_TEXTURE_SAMPLERS];
|
||||
uint8_t ucp_enables;
|
||||
};
|
||||
@@ -304,6 +317,10 @@ struct vc4_fs_key {
|
||||
bool alpha_test;
|
||||
bool point_coord_upper_left;
|
||||
bool light_twoside;
|
||||
bool msaa;
|
||||
bool sample_coverage;
|
||||
bool sample_alpha_to_coverage;
|
||||
bool sample_alpha_to_one;
|
||||
uint8_t alpha_test_func;
|
||||
uint8_t logicop_func;
|
||||
uint32_t point_sprite_mask;
|
||||
@@ -348,6 +365,9 @@ struct vc4_compile {
|
||||
*/
|
||||
struct qreg *inputs;
|
||||
struct qreg *outputs;
|
||||
bool msaa_per_sample_output;
|
||||
struct qreg color_reads[VC4_MAX_SAMPLES];
|
||||
struct qreg sample_colors[VC4_MAX_SAMPLES];
|
||||
uint32_t inputs_array_size;
|
||||
uint32_t outputs_array_size;
|
||||
uint32_t uniforms_array_size;
|
||||
@@ -396,6 +416,7 @@ struct vc4_compile {
|
||||
uint32_t output_position_index;
|
||||
uint32_t output_color_index;
|
||||
uint32_t output_point_size_index;
|
||||
uint32_t output_sample_mask_index;
|
||||
|
||||
struct qreg undef;
|
||||
enum qstage stage;
|
||||
@@ -418,6 +439,8 @@ struct vc4_compile {
|
||||
*/
|
||||
#define VC4_NIR_TLB_COLOR_READ_INPUT 2000000000
|
||||
|
||||
#define VC4_NIR_MS_MASK_OUTPUT 2000000000
|
||||
|
||||
/* Special offset for nir_load_uniform values to get a QUNIFORM_*
|
||||
* state-dependent value.
|
||||
*/
|
||||
@@ -476,6 +499,7 @@ nir_ssa_def *vc4_nir_get_state_uniform(struct nir_builder *b,
|
||||
enum quniform_contents contents);
|
||||
nir_ssa_def *vc4_nir_get_swizzled_channel(struct nir_builder *b,
|
||||
nir_ssa_def **srcs, int swiz);
|
||||
void vc4_nir_lower_txf_ms(struct vc4_compile *c);
|
||||
void qir_lower_uniforms(struct vc4_compile *c);
|
||||
|
||||
void qpu_schedule_instructions(struct vc4_compile *c);
|
||||
@@ -616,9 +640,11 @@ QIR_ALU0(FRAG_REV_FLAG)
|
||||
QIR_ALU0(TEX_RESULT)
|
||||
QIR_ALU0(TLB_COLOR_READ)
|
||||
QIR_NODST_1(TLB_COLOR_WRITE)
|
||||
QIR_NODST_1(TLB_COLOR_WRITE_MS)
|
||||
QIR_NODST_1(TLB_Z_WRITE)
|
||||
QIR_NODST_1(TLB_DISCARD_SETUP)
|
||||
QIR_NODST_1(TLB_STENCIL_SETUP)
|
||||
QIR_NODST_1(MS_MASK)
|
||||
|
||||
static inline struct qreg
|
||||
qir_UNPACK_8_F(struct vc4_compile *c, struct qreg src, int i)
|
||||
|
@@ -116,6 +116,17 @@ qpu_tlbc()
|
||||
return r;
|
||||
}
|
||||
|
||||
static inline struct qpu_reg
|
||||
qpu_tlbc_ms()
|
||||
{
|
||||
struct qpu_reg r = {
|
||||
QPU_MUX_A,
|
||||
QPU_W_TLB_COLOR_MS,
|
||||
};
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static inline struct qpu_reg qpu_r0(void) { return qpu_rn(0); }
|
||||
static inline struct qpu_reg qpu_r1(void) { return qpu_rn(1); }
|
||||
static inline struct qpu_reg qpu_r2(void) { return qpu_rn(2); }
|
||||
|
@@ -387,6 +387,14 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
|
||||
qpu_rb(QPU_R_MS_REV_FLAGS)));
|
||||
break;
|
||||
|
||||
case QOP_MS_MASK:
|
||||
src[1] = qpu_ra(QPU_R_MS_REV_FLAGS);
|
||||
fixup_raddr_conflict(c, dst, &src[0], &src[1],
|
||||
qinst, &unpack);
|
||||
queue(c, qpu_a_AND(qpu_ra(QPU_W_MS_FLAGS),
|
||||
src[0], src[1]) | unpack);
|
||||
break;
|
||||
|
||||
case QOP_FRAG_Z:
|
||||
case QOP_FRAG_W:
|
||||
/* QOP_FRAG_Z/W don't emit instructions, just allocate
|
||||
@@ -430,6 +438,13 @@ vc4_generate_code(struct vc4_context *vc4, struct vc4_compile *c)
|
||||
}
|
||||
break;
|
||||
|
||||
case QOP_TLB_COLOR_WRITE_MS:
|
||||
queue(c, qpu_a_MOV(qpu_tlbc_ms(), src[0]));
|
||||
if (discard) {
|
||||
set_last_cond_add(c, QPU_COND_ZS);
|
||||
}
|
||||
break;
|
||||
|
||||
case QOP_VARY_ADD_C:
|
||||
queue(c, qpu_a_FADD(dst, src[0], qpu_r5()) | unpack);
|
||||
break;
|
||||
|
@@ -295,6 +295,10 @@ process_waddr_deps(struct schedule_state *state, struct schedule_node *n,
|
||||
add_write_dep(state, &state->last_tlb, n);
|
||||
break;
|
||||
|
||||
case QPU_W_MS_FLAGS:
|
||||
add_write_dep(state, &state->last_tlb, n);
|
||||
break;
|
||||
|
||||
case QPU_W_NOP:
|
||||
break;
|
||||
|
||||
|
@@ -22,6 +22,7 @@
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "util/u_blit.h"
|
||||
#include "util/u_memory.h"
|
||||
#include "util/u_format.h"
|
||||
#include "util/u_inlines.h"
|
||||
@@ -72,11 +73,18 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
|
||||
{
|
||||
struct vc4_context *vc4 = vc4_context(pctx);
|
||||
struct vc4_transfer *trans = vc4_transfer(ptrans);
|
||||
struct pipe_resource *prsc = ptrans->resource;
|
||||
struct vc4_resource *rsc = vc4_resource(prsc);
|
||||
struct vc4_resource_slice *slice = &rsc->slices[ptrans->level];
|
||||
|
||||
if (trans->map) {
|
||||
struct vc4_resource *rsc;
|
||||
struct vc4_resource_slice *slice;
|
||||
if (trans->ss_resource) {
|
||||
rsc = vc4_resource(trans->ss_resource);
|
||||
slice = &rsc->slices[0];
|
||||
} else {
|
||||
rsc = vc4_resource(ptrans->resource);
|
||||
slice = &rsc->slices[ptrans->level];
|
||||
}
|
||||
|
||||
if (ptrans->usage & PIPE_TRANSFER_WRITE) {
|
||||
vc4_store_tiled_image(rsc->bo->map + slice->offset +
|
||||
ptrans->box.z * rsc->cube_map_stride,
|
||||
@@ -88,10 +96,52 @@ vc4_resource_transfer_unmap(struct pipe_context *pctx,
|
||||
free(trans->map);
|
||||
}
|
||||
|
||||
if (trans->ss_resource && (ptrans->usage & PIPE_TRANSFER_WRITE)) {
|
||||
struct pipe_blit_info blit;
|
||||
memset(&blit, 0, sizeof(blit));
|
||||
|
||||
blit.src.resource = trans->ss_resource;
|
||||
blit.src.format = trans->ss_resource->format;
|
||||
blit.src.box.width = trans->ss_box.width;
|
||||
blit.src.box.height = trans->ss_box.height;
|
||||
blit.src.box.depth = 1;
|
||||
|
||||
blit.dst.resource = ptrans->resource;
|
||||
blit.dst.format = ptrans->resource->format;
|
||||
blit.dst.level = ptrans->level;
|
||||
blit.dst.box = trans->ss_box;
|
||||
|
||||
blit.mask = util_format_get_mask(ptrans->resource->format);
|
||||
blit.filter = PIPE_TEX_FILTER_NEAREST;
|
||||
|
||||
pctx->blit(pctx, &blit);
|
||||
vc4_flush(pctx);
|
||||
|
||||
pipe_resource_reference(&trans->ss_resource, NULL);
|
||||
}
|
||||
|
||||
pipe_resource_reference(&ptrans->resource, NULL);
|
||||
util_slab_free(&vc4->transfer_pool, ptrans);
|
||||
}
|
||||
|
||||
static struct pipe_resource *
|
||||
vc4_get_temp_resource(struct pipe_context *pctx,
|
||||
struct pipe_resource *prsc,
|
||||
const struct pipe_box *box)
|
||||
{
|
||||
struct pipe_resource temp_setup;
|
||||
|
||||
memset(&temp_setup, 0, sizeof(temp_setup));
|
||||
temp_setup.target = prsc->target;
|
||||
temp_setup.format = prsc->format;
|
||||
temp_setup.width0 = box->width;
|
||||
temp_setup.height0 = box->height;
|
||||
temp_setup.depth0 = 1;
|
||||
temp_setup.array_size = 1;
|
||||
|
||||
return pctx->screen->resource_create(pctx->screen, &temp_setup);
|
||||
}
|
||||
|
||||
static void *
|
||||
vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
struct pipe_resource *prsc,
|
||||
@@ -101,7 +151,6 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
{
|
||||
struct vc4_context *vc4 = vc4_context(pctx);
|
||||
struct vc4_resource *rsc = vc4_resource(prsc);
|
||||
struct vc4_resource_slice *slice = &rsc->slices[level];
|
||||
struct vc4_transfer *trans;
|
||||
struct pipe_transfer *ptrans;
|
||||
enum pipe_format format = prsc->format;
|
||||
@@ -155,6 +204,50 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
ptrans->usage = usage;
|
||||
ptrans->box = *box;
|
||||
|
||||
/* If the resource is multisampled, we need to resolve to single
|
||||
* sample. This seems like it should be handled at a higher layer.
|
||||
*/
|
||||
if (prsc->nr_samples) {
|
||||
trans->ss_resource = vc4_get_temp_resource(pctx, prsc, box);
|
||||
if (!trans->ss_resource)
|
||||
goto fail;
|
||||
assert(!trans->ss_resource->nr_samples);
|
||||
|
||||
/* The ptrans->box gets modified for tile alignment, so save
|
||||
* the original box for unmap time.
|
||||
*/
|
||||
trans->ss_box = *box;
|
||||
|
||||
if (usage & PIPE_TRANSFER_READ) {
|
||||
struct pipe_blit_info blit;
|
||||
memset(&blit, 0, sizeof(blit));
|
||||
|
||||
blit.src.resource = ptrans->resource;
|
||||
blit.src.format = ptrans->resource->format;
|
||||
blit.src.level = ptrans->level;
|
||||
blit.src.box = trans->ss_box;
|
||||
|
||||
blit.dst.resource = trans->ss_resource;
|
||||
blit.dst.format = trans->ss_resource->format;
|
||||
blit.dst.box.width = trans->ss_box.width;
|
||||
blit.dst.box.height = trans->ss_box.height;
|
||||
blit.dst.box.depth = 1;
|
||||
|
||||
blit.mask = util_format_get_mask(prsc->format);
|
||||
blit.filter = PIPE_TEX_FILTER_NEAREST;
|
||||
|
||||
pctx->blit(pctx, &blit);
|
||||
vc4_flush(pctx);
|
||||
}
|
||||
|
||||
/* The rest of the mapping process should use our temporary. */
|
||||
prsc = trans->ss_resource;
|
||||
rsc = vc4_resource(prsc);
|
||||
ptrans->box.x = 0;
|
||||
ptrans->box.y = 0;
|
||||
ptrans->box.z = 0;
|
||||
}
|
||||
|
||||
/* Note that the current kernel implementation is synchronous, so no
|
||||
* need to do syncing stuff here yet.
|
||||
*/
|
||||
@@ -170,6 +263,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
|
||||
*pptrans = ptrans;
|
||||
|
||||
struct vc4_resource_slice *slice = &rsc->slices[level];
|
||||
if (rsc->tiled) {
|
||||
uint32_t utile_w = vc4_utile_width(rsc->cpp);
|
||||
uint32_t utile_h = vc4_utile_height(rsc->cpp);
|
||||
@@ -203,7 +297,7 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
ptrans->box.height != orig_height) {
|
||||
vc4_load_tiled_image(trans->map, ptrans->stride,
|
||||
buf + slice->offset +
|
||||
box->z * rsc->cube_map_stride,
|
||||
ptrans->box.z * rsc->cube_map_stride,
|
||||
slice->stride,
|
||||
slice->tiling, rsc->cpp,
|
||||
&ptrans->box);
|
||||
@@ -216,9 +310,9 @@ vc4_resource_transfer_map(struct pipe_context *pctx,
|
||||
ptrans->layer_stride = ptrans->stride;
|
||||
|
||||
return buf + slice->offset +
|
||||
box->y / util_format_get_blockheight(format) * ptrans->stride +
|
||||
box->x / util_format_get_blockwidth(format) * rsc->cpp +
|
||||
box->z * rsc->cube_map_stride;
|
||||
ptrans->box.y / util_format_get_blockheight(format) * ptrans->stride +
|
||||
ptrans->box.x / util_format_get_blockwidth(format) * rsc->cpp +
|
||||
ptrans->box.z * rsc->cube_map_stride;
|
||||
}
|
||||
|
||||
|
||||
@@ -283,7 +377,13 @@ vc4_setup_slices(struct vc4_resource *rsc)
|
||||
|
||||
if (!rsc->tiled) {
|
||||
slice->tiling = VC4_TILING_FORMAT_LINEAR;
|
||||
level_width = align(level_width, utile_w);
|
||||
if (prsc->nr_samples) {
|
||||
/* MSAA (4x) surfaces are stored as raw tile buffer contents. */
|
||||
level_width = align(level_width, 32);
|
||||
level_height = align(level_height, 32);
|
||||
} else {
|
||||
level_width = align(level_width, utile_w);
|
||||
}
|
||||
} else {
|
||||
if (vc4_size_is_lt(level_width, level_height,
|
||||
rsc->cpp)) {
|
||||
@@ -300,7 +400,8 @@ vc4_setup_slices(struct vc4_resource *rsc)
|
||||
}
|
||||
|
||||
slice->offset = offset;
|
||||
slice->stride = level_width * rsc->cpp;
|
||||
slice->stride = (level_width * rsc->cpp *
|
||||
MAX2(prsc->nr_samples, 1));
|
||||
slice->size = level_height * slice->stride;
|
||||
|
||||
offset += slice->size;
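Taken together, the two `vc4_setup_slices()` hunks above change how a linear slice is sized when the resource is multisampled. The sketch below restates that arithmetic outside the driver; `linear_slice_layout`, `struct slice_layout` and `align_pot` are hypothetical names, and the values passed for `cpp` and `utile_w` in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct slice_layout {
        uint32_t stride; /* bytes per row of pixels */
        uint32_t size;   /* bytes for the whole level */
};

/* Round v up to a multiple of a power-of-two alignment. */
static uint32_t
align_pot(uint32_t v, uint32_t a)
{
        assert(a != 0 && (a & (a - 1)) == 0);
        return (v + a - 1) & ~(a - 1);
}

/* Linear (untiled) slice sizing: MSAA levels are padded to 32x32 tiles and
 * every pixel stores nr_samples values; single-sampled levels only pad the
 * width to the utile width.
 */
static struct slice_layout
linear_slice_layout(uint32_t width, uint32_t height, uint32_t cpp,
                    uint32_t nr_samples, uint32_t utile_w)
{
        struct slice_layout l;

        if (nr_samples) {
                width = align_pot(width, 32);
                height = align_pot(height, 32);
        } else {
                width = align_pot(width, utile_w);
        }

        l.stride = width * cpp * (nr_samples ? nr_samples : 1);
        l.size = height * l.stride;
        return l;
}
```

With `cpp = 4` and four samples, a 100x50 level pads to 128x64, giving a stride of 2048 bytes and a size of 131072 bytes. That matches the buffer size `ntq_emit_txf()` computes for the same texture dimensions.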
|
||||
@@ -357,7 +458,10 @@ vc4_resource_setup(struct pipe_screen *pscreen,
|
||||
prsc->screen = pscreen;
|
||||
|
||||
rsc->base.vtbl = &vc4_resource_vtbl;
|
||||
rsc->cpp = util_format_get_blocksize(tmpl->format);
|
||||
if (prsc->nr_samples == 0)
|
||||
rsc->cpp = util_format_get_blocksize(tmpl->format);
|
||||
else
|
||||
rsc->cpp = sizeof(uint32_t);
|
||||
|
||||
assert(rsc->cpp);
|
||||
|
||||
@@ -371,8 +475,12 @@ get_resource_texture_format(struct pipe_resource *prsc)
|
||||
uint8_t format = vc4_get_tex_format(prsc->format);
|
||||
|
||||
if (!rsc->tiled) {
|
||||
assert(format == VC4_TEXTURE_TYPE_RGBA8888);
|
||||
return VC4_TEXTURE_TYPE_RGBA32R;
|
||||
if (prsc->nr_samples) {
|
||||
return ~0;
|
||||
} else {
|
||||
assert(format == VC4_TEXTURE_TYPE_RGBA8888);
|
||||
return VC4_TEXTURE_TYPE_RGBA32R;
|
||||
}
|
||||
}
|
||||
|
||||
return format;
|
||||
@@ -389,6 +497,7 @@ vc4_resource_create(struct pipe_screen *pscreen,
|
||||
* communicate metadata about tiling currently.
|
||||
*/
|
||||
if (tmpl->target == PIPE_BUFFER ||
|
||||
tmpl->nr_samples ||
|
||||
(tmpl->bind & (PIPE_BIND_SCANOUT |
|
||||
PIPE_BIND_LINEAR |
|
||||
PIPE_BIND_SHARED |
|
||||
@@ -492,13 +601,9 @@ vc4_surface_destroy(struct pipe_context *pctx, struct pipe_surface *psurf)
|
||||
FREE(psurf);
|
||||
}
|
||||
|
||||
/** Debug routine to dump the contents of an 8888 surface to the console */
|
||||
void
|
||||
vc4_dump_surface(struct pipe_surface *psurf)
|
||||
static void
|
||||
vc4_dump_surface_non_msaa(struct pipe_surface *psurf)
|
||||
{
|
||||
if (!psurf)
|
||||
return;
|
||||
|
||||
struct pipe_resource *prsc = psurf->texture;
|
||||
struct vc4_resource *rsc = vc4_resource(prsc);
|
||||
uint32_t *map = vc4_bo_map(rsc->bo);
|
||||
@@ -592,6 +697,147 @@ vc4_dump_surface(struct pipe_surface *psurf)
|
||||
}
|
||||
}
|
||||
|
||||
static uint32_t
|
||||
vc4_surface_msaa_get_sample(struct pipe_surface *psurf,
|
||||
uint32_t x, uint32_t y, uint32_t sample)
|
||||
{
|
||||
struct pipe_resource *prsc = psurf->texture;
|
||||
struct vc4_resource *rsc = vc4_resource(prsc);
|
||||
uint32_t tile_w = 32, tile_h = 32;
|
||||
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, 32);
|
||||
|
||||
uint32_t tile_x = x / tile_w;
|
||||
uint32_t tile_y = y / tile_h;
|
||||
uint32_t *tile = (vc4_bo_map(rsc->bo) +
|
||||
VC4_TILE_BUFFER_SIZE * (tile_y * tiles_w + tile_x));
|
||||
uint32_t subtile_x = x % tile_w;
|
||||
uint32_t subtile_y = y % tile_h;
|
||||
|
||||
uint32_t quad_samples = VC4_MAX_SAMPLES * 4;
|
||||
uint32_t tile_stride = quad_samples * tile_w / 2;
|
||||
|
||||
return *((uint32_t *)tile +
|
||||
(subtile_y >> 1) * tile_stride +
|
||||
(subtile_x >> 1) * quad_samples +
|
||||
((subtile_y & 1) << 1) +
|
||||
(subtile_x & 1) +
|
||||
sample);
|
||||
}
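The word offset that `vc4_surface_msaa_get_sample()` computes within one tile can be exercised on its own. This sketch mirrors the helper literally; `msaa_word_index` is a hypothetical name and `MAX_SAMPLES` is assumed to equal `VC4_MAX_SAMPLES` (4).

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W      32u
#define TILE_H      32u
#define MAX_SAMPLES 4u  /* assumption: stands in for VC4_MAX_SAMPLES */

/* Index, in 32-bit words from the start of the tile, of one sample of the
 * pixel at (x, y). Pixels are grouped in 2x2 quads; each quad occupies
 * MAX_SAMPLES * 4 words and a row of quads occupies tile_stride words.
 */
static uint32_t
msaa_word_index(uint32_t x, uint32_t y, uint32_t sample)
{
        uint32_t subtile_x = x % TILE_W;
        uint32_t subtile_y = y % TILE_H;
        uint32_t quad_samples = MAX_SAMPLES * 4;          /* 16 words */
        uint32_t tile_stride = quad_samples * TILE_W / 2; /* 256 words */

        assert(sample < MAX_SAMPLES);
        return (subtile_y >> 1) * tile_stride +
               (subtile_x >> 1) * quad_samples +
               ((subtile_y & 1) << 1) +
               (subtile_x & 1) +
               sample;
}
```

As written, the helper adds `sample` unscaled, so pixel (1, 0) sample 0 and pixel (0, 0) sample 1 map to the same word. The NIR lowering earlier in this diff instead shifts the sample index left by 4 (16 bytes, or four words). The sketch follows the debug helper and does not try to settle which layout is intended.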
|
||||
|
||||
static void
|
||||
vc4_dump_surface_msaa_char(struct pipe_surface *psurf,
|
||||
uint32_t start_x, uint32_t start_y,
|
||||
uint32_t w, uint32_t h)
|
||||
{
|
||||
bool all_same_color = true;
|
||||
uint32_t all_pix = 0;
|
||||
|
||||
for (int y = start_y; y < start_y + h; y++) {
|
||||
for (int x = start_x; x < start_x + w; x++) {
|
||||
for (int s = 0; s < VC4_MAX_SAMPLES; s++) {
|
||||
uint32_t pix = vc4_surface_msaa_get_sample(psurf,
|
||||
x, y,
|
||||
s);
|
||||
if (x == start_x && y == start_y)
|
||||
all_pix = pix;
|
||||
else if (all_pix != pix)
|
||||
all_same_color = false;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (all_same_color) {
|
||||
static const struct {
|
||||
uint32_t val;
|
||||
const char *c;
|
||||
} named_colors[] = {
|
||||
{ 0xff000000, "█" },
|
||||
{ 0x00000000, "█" },
|
||||
{ 0xffff0000, "r" },
|
||||
{ 0xff00ff00, "g" },
|
||||
{ 0xff0000ff, "b" },
|
||||
{ 0xffffffff, "w" },
|
||||
};
|
||||
int i;
|
||||
for (i = 0; i < ARRAY_SIZE(named_colors); i++) {
|
||||
if (named_colors[i].val == all_pix) {
|
||||
fprintf(stderr, "%s",
|
||||
named_colors[i].c);
|
||||
return;
|
||||
}
|
||||
}
|
||||
fprintf(stderr, "x");
|
||||
} else {
|
||||
fprintf(stderr, ".");
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_dump_surface_msaa(struct pipe_surface *psurf)
|
||||
{
|
||||
uint32_t tile_w = 32, tile_h = 32;
|
||||
uint32_t tiles_w = DIV_ROUND_UP(psurf->width, tile_w);
|
||||
uint32_t tiles_h = DIV_ROUND_UP(psurf->height, tile_h);
|
||||
uint32_t char_w = 140, char_h = 60;
|
||||
uint32_t char_w_per_tile = char_w / tiles_w - 1;
|
||||
uint32_t char_h_per_tile = char_h / tiles_h - 1;
|
||||
uint32_t found_colors[10];
|
||||
uint32_t num_found_colors = 0;
|
||||
|
||||
fprintf(stderr, "Surface: %dx%d (%dx MSAA)\n",
|
||||
psurf->width, psurf->height, psurf->texture->nr_samples);
|
||||
|
||||
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
|
||||
fprintf(stderr, "-");
|
||||
fprintf(stderr, "\n");
|
||||
|
||||
for (int ty = 0; ty < psurf->height; ty += tile_h) {
|
||||
for (int y = 0; y < char_h_per_tile; y++) {
|
||||
|
||||
for (int tx = 0; tx < psurf->width; tx += tile_w) {
|
||||
for (int x = 0; x < char_w_per_tile; x++) {
|
||||
uint32_t bx1 = (x * tile_w /
|
||||
char_w_per_tile);
|
||||
uint32_t bx2 = ((x + 1) * tile_w /
|
||||
char_w_per_tile);
|
||||
uint32_t by1 = (y * tile_h /
|
||||
char_h_per_tile);
|
||||
uint32_t by2 = ((y + 1) * tile_h /
|
||||
char_h_per_tile);
|
||||
|
||||
vc4_dump_surface_msaa_char(psurf,
|
||||
tx + bx1,
|
||||
ty + by1,
|
||||
bx2 - bx1,
|
||||
by2 - by1);
|
||||
}
|
||||
fprintf(stderr, "|");
|
||||
}
|
||||
fprintf(stderr, "\n");
|
||||
}
|
||||
|
||||
for (int x = 0; x < (char_w_per_tile + 1) * tiles_w; x++)
|
||||
fprintf(stderr, "-");
|
||||
fprintf(stderr, "\n");
|
||||
}
|
||||
|
||||
for (int i = 0; i < num_found_colors; i++) {
|
||||
fprintf(stderr, "color %d: 0x%08x\n", i, found_colors[i]);
|
||||
}
|
||||
}
|
||||
|
||||
/** Debug routine to dump the contents of an 8888 surface to the console */
|
||||
void
|
||||
vc4_dump_surface(struct pipe_surface *psurf)
|
||||
{
|
||||
if (!psurf)
|
||||
return;
|
||||
|
||||
if (psurf->texture->nr_samples)
|
||||
vc4_dump_surface_msaa(psurf);
|
||||
else
|
||||
vc4_dump_surface_non_msaa(psurf);
|
||||
}
|
||||
|
||||
static void
|
||||
vc4_flush_resource(struct pipe_context *pctx, struct pipe_resource *resource)
|
||||
{
|
||||
|
@@ -32,6 +32,9 @@
|
||||
struct vc4_transfer {
|
||||
struct pipe_transfer base;
|
||||
void *map;
|
||||
|
||||
struct pipe_resource *ss_resource;
|
||||
struct pipe_box ss_box;
|
||||
};
|
||||
|
||||
struct vc4_resource_slice {
|
||||
|
@@ -95,6 +95,7 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
|
||||
case PIPE_CAP_BLEND_EQUATION_SEPARATE:
|
||||
case PIPE_CAP_TWO_SIDED_STENCIL:
|
||||
case PIPE_CAP_USER_INDEX_BUFFERS:
|
||||
case PIPE_CAP_TEXTURE_MULTISAMPLE:
|
||||
return 1;
|
||||
|
||||
/* lying for GL 2.0 */
|
||||
@@ -140,7 +141,6 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
|
||||
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
|
||||
case PIPE_CAP_CONDITIONAL_RENDER:
|
||||
case PIPE_CAP_PRIMITIVE_RESTART:
|
||||
case PIPE_CAP_TEXTURE_MULTISAMPLE:
|
||||
case PIPE_CAP_TEXTURE_BARRIER:
|
||||
case PIPE_CAP_SM3:
|
||||
case PIPE_CAP_INDEP_BLEND_ENABLE:
|
||||
@@ -358,7 +358,6 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
|
||||
unsigned retval = 0;
|
||||
|
||||
if ((target >= PIPE_MAX_TEXTURE_TYPES) ||
|
||||
(sample_count > 1) ||
|
||||
!util_format_is_supported(format, usage)) {
|
||||
return FALSE;
|
||||
}
|
||||
@@ -417,11 +416,13 @@ vc4_screen_is_format_supported(struct pipe_screen *pscreen,
|
||||
}
|
||||
|
||||
if ((usage & PIPE_BIND_RENDER_TARGET) &&
|
||||
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
|
||||
vc4_rt_format_supported(format)) {
|
||||
retval |= PIPE_BIND_RENDER_TARGET;
|
||||
}
|
||||
|
||||
if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
|
||||
(sample_count == 0 || sample_count == VC4_MAX_SAMPLES) &&
|
||||
(vc4_tex_format_supported(format))) {
|
||||
retval |= PIPE_BIND_SAMPLER_VIEW;
|
||||
}
|
||||
|
@@ -65,7 +65,7 @@ struct drm_device {
 };
 
 struct drm_gem_object {
-        uint32_t size;
+        size_t size;
         struct drm_device *dev;
 };
 
||||
|
@@ -79,7 +79,7 @@ static void
|
||||
vc4_set_sample_mask(struct pipe_context *pctx, unsigned sample_mask)
|
||||
{
|
||||
struct vc4_context *vc4 = vc4_context(pctx);
|
||||
vc4->sample_mask = (uint16_t)sample_mask;
|
||||
vc4->sample_mask = sample_mask & ((1 << VC4_MAX_SAMPLES) - 1);
|
||||
vc4->dirty |= VC4_DIRTY_SAMPLE_MASK;
|
||||
}
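This hunk swaps the `(uint16_t)` cast for a mask to the low `VC4_MAX_SAMPLES` bits. One plausible reason is the `sample_coverage` key set in `vc4_update_compiled_fs()`, which compares the stored mask against `(1 << VC4_MAX_SAMPLES) - 1`. The sketch below shows that interaction; it assumes `VC4_MAX_SAMPLES` is 4, and `store_sample_mask` and `needs_sample_coverage` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SAMPLES 4  /* assumption: stands in for VC4_MAX_SAMPLES */
#define ALL_SAMPLES ((1u << MAX_SAMPLES) - 1)

/* New behaviour: keep only the bits that name real samples. */
static uint32_t
store_sample_mask(unsigned sample_mask)
{
        return sample_mask & ALL_SAMPLES;
}

/* Mirrors the key->sample_coverage expression: coverage handling is only
 * needed when multisampling is on and some sample is masked off.
 */
static bool
needs_sample_coverage(bool multisample, uint32_t stored_mask)
{
        assert((stored_mask & ~ALL_SAMPLES) == 0);
        return multisample && stored_mask != ALL_SAMPLES;
}
```

A caller that passes `~0u` ends up with a stored mask of `0xf`, so no coverage variant is requested. Under the old cast the stored value would have been `0xffff`, which never equals `0xf`.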
|
||||
|
||||
@@ -121,6 +121,9 @@ vc4_create_rasterizer_state(struct pipe_context *pctx,
|
||||
so->offset_factor = float_to_187_half(cso->offset_scale);
|
||||
}
|
||||
|
||||
if (cso->multisample)
|
||||
so->config_bits[0] |= VC4_CONFIG_BITS_RASTERIZER_OVERSAMPLE_4X;
|
||||
|
||||
return so;
|
||||
}
|
||||
|
||||
@@ -457,6 +460,22 @@ vc4_set_framebuffer_state(struct pipe_context *pctx,
|
||||
rsc->cpp);
|
||||
}
|
||||
|
||||
vc4->msaa = false;
|
||||
if (cso->cbufs[0])
|
||||
vc4->msaa = cso->cbufs[0]->texture->nr_samples != 0;
|
||||
else if (cso->zsbuf)
|
||||
vc4->msaa = cso->zsbuf->texture->nr_samples != 0;
|
||||
|
||||
if (vc4->msaa) {
|
||||
vc4->tile_width = 32;
|
||||
vc4->tile_height = 32;
|
||||
} else {
|
||||
vc4->tile_width = 64;
|
||||
vc4->tile_height = 64;
|
||||
}
|
||||
vc4->draw_tiles_x = DIV_ROUND_UP(cso->width, vc4->tile_width);
|
||||
vc4->draw_tiles_y = DIV_ROUND_UP(cso->height, vc4->tile_height);
|
||||
|
||||
vc4->dirty |= VC4_DIRTY_FRAMEBUFFER;
|
||||
}
|
||||
|
||||
|
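The vc4 state hunks above clamp the sample mask to the supported sample count and halve the tile dimensions when MSAA is active. A minimal sketch of that arithmetic follows. It assumes a maximum of 4 samples (suggested by the `OVERSAMPLE_4X` config bit, but not confirmed by this diff), and all `sketch_`/`SKETCH_` names are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 samples, inferred from the OVERSAMPLE_4X bit in the hunk. */
#define SKETCH_MAX_SAMPLES 4
#define SKETCH_DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Keep only the bits that correspond to real samples. */
static uint32_t sketch_clamp_sample_mask(uint32_t sample_mask)
{
    return sample_mask & ((1u << SKETCH_MAX_SAMPLES) - 1);
}

/* Tile edge in pixels: 32 with MSAA, 64 without (values from the hunk). */
static uint32_t sketch_tile_size(int msaa)
{
    return msaa ? 32 : 64;
}

/* Number of tiles needed to cover `extent` pixels along one axis. */
static uint32_t sketch_tile_count(uint32_t extent, int msaa)
{
    uint32_t tile = sketch_tile_size(msaa);
    assert(tile > 0);
    return SKETCH_DIV_ROUND_UP(extent, tile);
}
```

Under these assumptions a 1920x1080 framebuffer needs 30x17 tiles without MSAA and 60x34 tiles with it, because partially covered tiles are rounded up.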
@@ -71,6 +71,18 @@ write_texture_p2(struct vc4_context *vc4,
|
||||
VC4_SET_FIELD((data >> 16) & 1, VC4_TEX_P2_BSLOD));
|
||||
}
|
||||
|
||||
static void
|
||||
write_texture_msaa_addr(struct vc4_context *vc4,
|
||||
struct vc4_cl_out **uniforms,
|
||||
struct vc4_texture_stateobj *texstate,
|
||||
uint32_t unit)
|
||||
{
|
||||
struct pipe_sampler_view *texture = texstate->textures[unit];
|
||||
struct vc4_resource *rsc = vc4_resource(texture->texture);
|
||||
|
||||
cl_aligned_reloc(vc4, &vc4->uniforms, uniforms, rsc->bo, 0);
|
||||
}
|
||||
|
||||
|
||||
#define SWIZ(x,y,z,w) { \
|
||||
UTIL_FORMAT_SWIZZLE_##x, \
|
||||
@@ -244,6 +256,11 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
|
||||
cl_aligned_reloc(vc4, &vc4->uniforms, &uniforms, ubo, 0);
|
||||
break;
|
||||
|
||||
case QUNIFORM_TEXTURE_MSAA_ADDR:
|
||||
write_texture_msaa_addr(vc4, &uniforms,
|
||||
texstate, uinfo->data[i]);
|
||||
break;
|
||||
|
||||
case QUNIFORM_TEXTURE_BORDER_COLOR:
|
||||
write_texture_border_color(vc4, &uniforms,
|
||||
texstate, uinfo->data[i]);
|
||||
@@ -303,6 +320,10 @@ vc4_write_uniforms(struct vc4_context *vc4, struct vc4_compiled_shader *shader,
|
||||
cl_aligned_f(&uniforms,
|
||||
vc4->zsa->base.alpha.ref_value);
|
||||
break;
|
||||
|
||||
case QUNIFORM_SAMPLE_MASK:
|
||||
cl_aligned_u32(&uniforms, vc4->sample_mask);
|
||||
break;
|
||||
}
|
||||
#if 0
|
||||
uint32_t written_val = *((uint32_t *)uniforms - 1);
|
||||
@@ -345,6 +366,7 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
|
||||
case QUNIFORM_TEXTURE_CONFIG_P1:
|
||||
case QUNIFORM_TEXTURE_CONFIG_P2:
|
||||
case QUNIFORM_TEXTURE_BORDER_COLOR:
|
||||
case QUNIFORM_TEXTURE_MSAA_ADDR:
|
||||
case QUNIFORM_TEXRECT_SCALE_X:
|
||||
case QUNIFORM_TEXRECT_SCALE_Y:
|
||||
dirty |= VC4_DIRTY_TEXSTATE;
|
||||
@@ -363,6 +385,10 @@ vc4_set_shader_uniform_dirty_flags(struct vc4_compiled_shader *shader)
|
||||
case QUNIFORM_ALPHA_REF:
|
||||
dirty |= VC4_DIRTY_ZSA;
|
||||
break;
|
||||
|
||||
case QUNIFORM_SAMPLE_MASK:
|
||||
dirty |= VC4_DIRTY_SAMPLE_MASK;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
|
@@ -28,10 +28,14 @@
|
||||
|
||||
#include "pipe/p_screen.h"
|
||||
|
||||
#include "util/u_video.h"
|
||||
|
||||
#include "vl/vl_winsys.h"
|
||||
|
||||
#include "va_private.h"
|
||||
|
||||
DEBUG_GET_ONCE_BOOL_OPTION(mpeg4, "VAAPI_MPEG4_ENABLED", false)
|
||||
|
||||
VAStatus
|
||||
vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles)
|
||||
{
|
||||
@@ -45,12 +49,16 @@ vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_
|
||||
*num_profiles = 0;
|
||||
|
||||
pscreen = VL_VA_PSCREEN(ctx);
|
||||
for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p)
|
||||
for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_HEVC_MAIN_444; ++p) {
|
||||
if (u_reduce_video_profile(p) == PIPE_VIDEO_FORMAT_MPEG4 && !debug_get_option_mpeg4())
|
||||
continue;
|
||||
|
||||
if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) {
|
||||
vap = PipeToProfile(p);
|
||||
if (vap != VAProfileNone)
|
||||
profile_list[(*num_profiles)++] = vap;
|
||||
}
|
||||
}
|
||||
|
||||
/* Support postprocessing through vl_compositor */
|
||||
profile_list[(*num_profiles)++] = VAProfileNone;
|
||||
|
@@ -1737,7 +1737,7 @@ ast_function_expression::handle_method(exec_list *instructions,
|
||||
result = new(ctx) ir_constant(op->type->array_size());
|
||||
}
|
||||
} else if (op->type->is_vector()) {
|
||||
if (state->ARB_shading_language_420pack_enable) {
|
||||
if (state->has_420pack()) {
|
||||
/* .length() returns int. */
|
||||
result = new(ctx) ir_constant((int) op->type->vector_elements);
|
||||
} else {
|
||||
@@ -1746,7 +1746,7 @@ ast_function_expression::handle_method(exec_list *instructions,
|
||||
goto fail;
|
||||
}
|
||||
} else if (op->type->is_matrix()) {
|
||||
if (state->ARB_shading_language_420pack_enable) {
|
||||
if (state->has_420pack()) {
|
||||
/* .length() returns int. */
|
||||
result = new(ctx) ir_constant((int) op->type->matrix_columns);
|
||||
} else {
|
||||
@@ -2075,7 +2075,7 @@ ast_aggregate_initializer::hir(exec_list *instructions,
|
||||
}
|
||||
const glsl_type *const constructor_type = this->constructor_type;
|
||||
|
||||
if (!state->ARB_shading_language_420pack_enable) {
|
||||
if (!state->has_420pack()) {
|
||||
_mesa_glsl_error(&loc, state, "C-style initialization requires the "
|
||||
"GL_ARB_shading_language_420pack extension");
|
||||
return ir_rvalue::error_value(ctx);
|
||||
|
@@ -2649,7 +2649,9 @@ apply_explicit_binding(struct _mesa_glsl_parse_state *state,
|
||||
|
||||
return;
|
||||
}
|
||||
} else if (state->is_version(420, 310) && base_type->is_image()) {
|
||||
} else if ((state->is_version(420, 310) ||
|
||||
state->ARB_shading_language_420pack_enable) &&
|
||||
base_type->is_image()) {
|
||||
assert(ctx->Const.MaxImageUnits <= MAX_IMAGE_UNITS);
|
||||
if (max_index >= ctx->Const.MaxImageUnits) {
|
||||
_mesa_glsl_error(loc, state, "Image binding %d exceeds the "
|
||||
@@ -3736,7 +3738,7 @@ process_initializer(ir_variable *var, ast_declaration *decl,
|
||||
* expressions. Const-qualified global variables must still be
|
||||
* initialized with constant expressions.
|
||||
*/
|
||||
if (!state->ARB_shading_language_420pack_enable
|
||||
if (!state->has_420pack()
|
||||
|| state->current_function == NULL) {
|
||||
_mesa_glsl_error(& initializer_loc, state,
|
||||
"initializer of %s variable `%s' must be a "
|
||||
@@ -5365,7 +5367,7 @@ ast_jump_statement::hir(exec_list *instructions,
|
||||
if (state->current_function->return_type != ret_type) {
|
||||
YYLTYPE loc = this->get_location();
|
||||
|
||||
if (state->ARB_shading_language_420pack_enable) {
|
||||
if (state->has_420pack()) {
|
||||
if (!apply_implicit_conversion(state->current_function->return_type,
|
||||
ret, state)) {
|
||||
_mesa_glsl_error(& loc, state,
|
||||
|
@@ -948,7 +948,7 @@ parameter_qualifier:
|
||||
if (($1.flags.q.in || $1.flags.q.out) && ($2.flags.q.in || $2.flags.q.out))
|
||||
_mesa_glsl_error(&@1, state, "duplicate in/out/inout qualifier");
|
||||
|
||||
if (!state->has_420pack() && $2.flags.q.constant)
|
||||
if (!state->has_420pack_or_es31() && $2.flags.q.constant)
|
||||
_mesa_glsl_error(&@1, state, "in/out/inout must come after const "
|
||||
"or precise");
|
||||
|
||||
@@ -960,7 +960,7 @@ parameter_qualifier:
|
||||
if ($2.precision != ast_precision_none)
|
||||
_mesa_glsl_error(&@1, state, "duplicate precision qualifier");
|
||||
|
||||
if (!(state->has_420pack() || state->is_version(420, 310)) &&
|
||||
if (!state->has_420pack_or_es31() &&
|
||||
$2.flags.i != 0)
|
||||
_mesa_glsl_error(&@1, state, "precision qualifiers must come last");
|
||||
|
||||
@@ -1482,7 +1482,7 @@ layout_qualifier_id:
|
||||
$$.index = $3;
|
||||
}
|
||||
|
||||
if ((state->has_420pack() ||
|
||||
if ((state->has_420pack_or_es31() ||
|
||||
state->has_atomic_counters() ||
|
||||
state->has_shader_storage_buffer_objects()) &&
|
||||
match_layout_qualifier("binding", $1, state) == 0) {
|
||||
@@ -1714,7 +1714,7 @@ type_qualifier:
|
||||
if ($2.flags.q.invariant)
|
||||
_mesa_glsl_error(&@1, state, "duplicate \"invariant\" qualifier");
|
||||
|
||||
if (!state->has_420pack() && $2.flags.q.precise)
|
||||
if (!state->has_420pack_or_es31() && $2.flags.q.precise)
|
||||
_mesa_glsl_error(&@1, state,
|
||||
"\"invariant\" must come after \"precise\"");
|
||||
|
||||
@@ -1747,7 +1747,7 @@ type_qualifier:
|
||||
if ($2.has_interpolation())
|
||||
_mesa_glsl_error(&@1, state, "duplicate interpolation qualifier");
|
||||
|
||||
if (!state->has_420pack() &&
|
||||
if (!state->has_420pack_or_es31() &&
|
||||
($2.flags.q.precise || $2.flags.q.invariant)) {
|
||||
_mesa_glsl_error(&@1, state, "interpolation qualifiers must come "
|
||||
"after \"precise\" or \"invariant\"");
|
||||
@@ -1767,7 +1767,7 @@ type_qualifier:
|
||||
* precise qualifiers since these are useful in ARB_separate_shader_objects.
|
||||
* There is no clear spec guidance on this either.
|
||||
*/
|
||||
if (!state->has_420pack() && $2.has_layout())
|
||||
if (!state->has_420pack_or_es31() && $2.has_layout())
|
||||
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
|
||||
|
||||
$$ = $1;
|
||||
@@ -1785,7 +1785,7 @@ type_qualifier:
|
||||
"duplicate auxiliary storage qualifier (centroid or sample)");
|
||||
}
|
||||
|
||||
if (!state->has_420pack() &&
|
||||
if (!state->has_420pack_or_es31() &&
|
||||
($2.flags.q.precise || $2.flags.q.invariant ||
|
||||
$2.has_interpolation() || $2.has_layout())) {
|
||||
_mesa_glsl_error(&@1, state, "auxiliary storage qualifiers must come "
|
||||
@@ -1803,7 +1803,7 @@ type_qualifier:
|
||||
if ($2.has_storage())
|
||||
_mesa_glsl_error(&@1, state, "duplicate storage qualifier");
|
||||
|
||||
if (!state->has_420pack() &&
|
||||
if (!state->has_420pack_or_es31() &&
|
||||
($2.flags.q.precise || $2.flags.q.invariant || $2.has_interpolation() ||
|
||||
$2.has_layout() || $2.has_auxiliary_storage())) {
|
||||
_mesa_glsl_error(&@1, state, "storage qualifiers must come after "
|
||||
@@ -1819,7 +1819,7 @@ type_qualifier:
|
||||
if ($2.precision != ast_precision_none)
|
||||
_mesa_glsl_error(&@1, state, "duplicate precision qualifier");
|
||||
|
||||
if (!(state->has_420pack() || state->is_version(420, 310)) &&
|
||||
if (!(state->has_420pack_or_es31()) &&
|
||||
$2.flags.i != 0)
|
||||
_mesa_glsl_error(&@1, state, "precision qualifiers must come last");
|
||||
|
||||
@@ -2575,7 +2575,7 @@ interface_block:
|
||||
{
|
||||
ast_interface_block *block = (ast_interface_block *) $2;
|
||||
|
||||
if (!state->has_420pack() && block->layout.has_layout() &&
|
||||
if (!state->has_420pack_or_es31() && block->layout.has_layout() &&
|
||||
!block->layout.is_default_qualifier) {
|
||||
_mesa_glsl_error(&@1, state, "duplicate layout(...) qualifiers");
|
||||
YYERROR;
|
||||
|
@@ -255,6 +255,11 @@ struct _mesa_glsl_parse_state {
|
||||
return ARB_shading_language_420pack_enable || is_version(420, 0);
|
||||
}
|
||||
|
||||
bool has_420pack_or_es31() const
|
||||
{
|
||||
return ARB_shading_language_420pack_enable || is_version(420, 310);
|
||||
}
|
||||
|
||||
bool has_compute_shader() const
|
||||
{
|
||||
return ARB_compute_shader_enable || is_version(430, 310);
|
||||
|
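The difference between `has_420pack()` and the new `has_420pack_or_es31()` only shows up for ES shaders. The sketch below models it in plain C. It assumes `is_version(gl, es)` picks the ES threshold for ES shaders and treats a threshold of 0 as "never satisfied"; that behaviour is not shown in this diff, and the struct and `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for _mesa_glsl_parse_state. */
struct sketch_state {
    bool es_shader;
    unsigned language_version;
    bool ext_420pack_enable;
};

static struct sketch_state sketch_make(bool es, unsigned version, bool ext)
{
    struct sketch_state s = { es, version, ext };
    assert(version > 0);
    return s;
}

/* Assumed semantics: a required version of 0 means "not available". */
static bool sketch_is_version(struct sketch_state s, unsigned gl, unsigned es)
{
    unsigned required = s.es_shader ? es : gl;
    return required != 0 && s.language_version >= required;
}

static bool sketch_has_420pack(struct sketch_state s)
{
    return s.ext_420pack_enable || sketch_is_version(s, 420, 0);
}

static bool sketch_has_420pack_or_es31(struct sketch_state s)
{
    return s.ext_420pack_enable || sketch_is_version(s, 420, 310);
}
```

With this model a GLSL ES 3.10 shader fails `sketch_has_420pack()` but passes `sketch_has_420pack_or_es31()`, which matches the parser checks that were switched to the new helper.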
@@ -57,8 +57,7 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
|
||||
expr->primary_expression.identifier);
|
||||
}
|
||||
} else if (op->type->is_vector() ||
|
||||
(state->ARB_shading_language_420pack_enable &&
|
||||
op->type->is_scalar())) {
|
||||
(state->has_420pack() && op->type->is_scalar())) {
|
||||
ir_swizzle *swiz = ir_swizzle::create(op,
|
||||
expr->primary_expression.identifier,
|
||||
op->type->vector_elements);
|
||||
|
@@ -1669,6 +1669,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name,
|
||||
this->data.pixel_center_integer = false;
|
||||
this->data.depth_layout = ir_depth_layout_none;
|
||||
this->data.used = false;
|
||||
this->data.always_active_io = false;
|
||||
this->data.read_only = false;
|
||||
this->data.centroid = false;
|
||||
this->data.sample = false;
|
||||
|
@@ -658,6 +658,13 @@ public:
|
||||
*/
|
||||
unsigned assigned:1;
|
||||
|
||||
/**
|
||||
* When separate shader programs are enabled, only input/outputs between
|
||||
* the stages of a multi-stage separate program can be safely removed
|
||||
* from the shader interface. Other inputs/outputs must remain active.
|
||||
*/
|
||||
unsigned always_active_io:1;
|
||||
|
||||
/**
|
||||
* Enum indicating how the variable was declared. See
|
||||
* ir_var_declaration_type.
|
||||
|
@@ -766,7 +766,7 @@ public:
|
||||
gl_shader_stage consumer_stage);
|
||||
~varying_matches();
|
||||
void record(ir_variable *producer_var, ir_variable *consumer_var);
|
||||
unsigned assign_locations(uint64_t reserved_slots);
|
||||
unsigned assign_locations(uint64_t reserved_slots, bool separate_shader);
|
||||
void store_locations() const;
|
||||
|
||||
private:
|
||||
@@ -986,11 +986,36 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var)
|
||||
* passed to varying_matches::record().
|
||||
*/
|
||||
unsigned
|
||||
varying_matches::assign_locations(uint64_t reserved_slots)
|
||||
varying_matches::assign_locations(uint64_t reserved_slots, bool separate_shader)
|
||||
{
|
||||
/* Sort varying matches into an order that makes them easy to pack. */
|
||||
qsort(this->matches, this->num_matches, sizeof(*this->matches),
|
||||
&varying_matches::match_comparator);
|
||||
/* We disable varying sorting for separate shader programs for the
|
||||
* following reasons:
|
||||
*
|
||||
* 1/ All programs must sort the code in the same order to guarantee the
|
||||
* interface matching. However varying_matches::record() will change the
|
||||
* interpolation qualifier of some stages.
|
||||
*
|
||||
* 2/ GLSL version 4.50 removes the matching constraint on the interpolation
|
||||
* qualifier.
|
||||
*
|
||||
* From Section 4.5 (Interpolation Qualifiers) of the GLSL 4.40 spec:
|
||||
*
|
||||
* "The type and presence of interpolation qualifiers of variables with
|
||||
* the same name declared in all linked shaders for the same cross-stage
|
||||
* interface must match, otherwise the link command will fail.
|
||||
*
|
||||
* When comparing an output from one stage to an input of a subsequent
|
||||
* stage, the input and output don't match if their interpolation
|
||||
* qualifiers (or lack thereof) are not the same."
|
||||
*
|
||||
* "It is a link-time error if, within the same stage, the interpolation
|
||||
* qualifiers of variables of the same name do not match."
|
||||
*/
|
||||
if (!separate_shader) {
|
||||
/* Sort varying matches into an order that makes them easy to pack. */
|
||||
qsort(this->matches, this->num_matches, sizeof(*this->matches),
|
||||
&varying_matches::match_comparator);
|
||||
}
|
||||
|
||||
unsigned generic_location = 0;
|
||||
unsigned generic_patch_location = MAX_VARYING*4;
|
||||
@@ -1590,7 +1615,8 @@ assign_varying_locations(struct gl_context *ctx,
|
||||
reserved_varying_slot(producer, ir_var_shader_out) |
|
||||
reserved_varying_slot(consumer, ir_var_shader_in);
|
||||
|
||||
const unsigned slots_used = matches.assign_locations(reserved_slots);
|
||||
const unsigned slots_used = matches.assign_locations(reserved_slots,
|
||||
prog->SeparateShader);
|
||||
matches.store_locations();
|
||||
|
||||
for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
|
||||
|
@@ -3940,6 +3940,77 @@ split_ubos_and_ssbos(void *mem_ctx,
|
||||
assert(*num_ubos + *num_ssbos == num_blocks);
|
||||
}
|
||||
|
||||
static void
|
||||
set_always_active_io(exec_list *ir, ir_variable_mode io_mode)
|
||||
{
|
||||
assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
|
||||
|
||||
foreach_in_list(ir_instruction, node, ir) {
|
||||
ir_variable *const var = node->as_variable();
|
||||
|
||||
if (var == NULL || var->data.mode != io_mode)
|
||||
continue;
|
||||
|
||||
/* Don't set always active on builtins that haven't been redeclared */
|
||||
if (var->data.how_declared == ir_var_declared_implicitly)
|
||||
continue;
|
||||
|
||||
var->data.always_active_io = true;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* When separate shader programs are enabled, only input/outputs between
|
||||
* the stages of a multi-stage separate program can be safely removed
|
||||
* from the shader interface. Other inputs/outputs must remain active.
|
||||
*/
|
||||
static void
|
||||
disable_varying_optimizations_for_sso(struct gl_shader_program *prog)
|
||||
{
|
||||
unsigned first, last;
|
||||
assert(prog->SeparateShader);
|
||||
|
||||
first = MESA_SHADER_STAGES;
|
||||
last = 0;
|
||||
|
||||
/* Determine first and last stage. Excluding the compute stage */
|
||||
for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
|
||||
if (!prog->_LinkedShaders[i])
|
||||
continue;
|
||||
if (first == MESA_SHADER_STAGES)
|
||||
first = i;
|
||||
last = i;
|
||||
}
|
||||
|
||||
if (first == MESA_SHADER_STAGES)
|
||||
return;
|
||||
|
||||
for (unsigned stage = 0; stage < MESA_SHADER_STAGES; stage++) {
|
||||
gl_shader *sh = prog->_LinkedShaders[stage];
|
||||
if (!sh)
|
||||
continue;
|
||||
|
||||
if (first == last) {
|
||||
/* For a single shader program only allow inputs to the vertex shader
|
||||
* and outputs from the fragment shader to be removed.
|
||||
*/
|
||||
if (stage != MESA_SHADER_VERTEX)
|
||||
set_always_active_io(sh->ir, ir_var_shader_in);
|
||||
if (stage != MESA_SHADER_FRAGMENT)
|
||||
set_always_active_io(sh->ir, ir_var_shader_out);
|
||||
} else {
|
||||
/* For multi-stage separate shader programs only allow inputs and
|
||||
* outputs between the shader stages to be removed as well as inputs
|
||||
* to the vertex shader and outputs from the fragment shader.
|
||||
*/
|
||||
if (stage == first && stage != MESA_SHADER_VERTEX)
|
||||
set_always_active_io(sh->ir, ir_var_shader_in);
|
||||
else if (stage == last && stage != MESA_SHADER_FRAGMENT)
|
||||
set_always_active_io(sh->ir, ir_var_shader_out);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
|
||||
{
|
||||
@@ -4199,6 +4270,9 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
|
||||
}
|
||||
}
|
||||
|
||||
if (prog->SeparateShader)
|
||||
disable_varying_optimizations_for_sso(prog);
|
||||
|
||||
if (!interstage_cross_validate_uniform_blocks(prog))
|
||||
goto done;
|
||||
|
||||
|
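The stage selection in `disable_varying_optimizations_for_sso()` can be read as a small decision table. The sketch below restates it as a pure function that returns which interface directions are forced active for one stage. The enum values and `sketch_`/`SKETCH_` names are hypothetical; only the branching mirrors the hunk.

```c
#include <assert.h>

/* Hypothetical stage numbering, in pipeline order. */
enum sketch_stage {
    SKETCH_VERTEX, SKETCH_TESS_CTRL, SKETCH_TESS_EVAL,
    SKETCH_GEOMETRY, SKETCH_FRAGMENT
};

enum { SKETCH_KEEP_IN = 1, SKETCH_KEEP_OUT = 2 };

/* Which of a stage's inputs/outputs must stay active in a separable
 * program whose linked stages span [first, last]. */
static unsigned sketch_forced_active(int first, int last, int stage)
{
    unsigned keep = 0;
    assert(first <= stage && stage <= last);

    if (first == last) {
        /* Single-stage program: both sides face another program, except
         * vertex inputs and fragment outputs. */
        if (stage != SKETCH_VERTEX)
            keep |= SKETCH_KEEP_IN;
        if (stage != SKETCH_FRAGMENT)
            keep |= SKETCH_KEEP_OUT;
    } else {
        /* Multi-stage program: only the outer edges face another program. */
        if (stage == first && stage != SKETCH_VERTEX)
            keep |= SKETCH_KEEP_IN;
        else if (stage == last && stage != SKETCH_FRAGMENT)
            keep |= SKETCH_KEEP_OUT;
    }
    return keep;
}
```

For example, a geometry-only program keeps both its inputs and outputs active, while in a vertex plus geometry program only the geometry outputs are pinned.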
@@ -187,6 +187,7 @@ flatten_named_interface_blocks_declarations::run(exec_list *instructions)
|
||||
new_var->data.sample = iface_t->fields.structure[i].sample;
|
||||
new_var->data.patch = iface_t->fields.structure[i].patch;
|
||||
new_var->data.stream = var->data.stream;
|
||||
new_var->data.how_declared = var->data.how_declared;
|
||||
|
||||
new_var->init_interface_type(iface_t);
|
||||
hash_table_insert(interface_namespace, new_var,
|
||||
|
@@ -75,6 +75,20 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned)
|
||||
|| !entry->declaration)
|
||||
continue;
|
||||
|
||||
/* Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.5
|
||||
* (Core Profile) spec says:
|
||||
*
|
||||
* "With separable program objects, interfaces between shader
|
||||
* stages may involve the outputs from one program object and the
|
||||
* inputs from a second program object. For such interfaces, it is
|
||||
* not possible to detect mismatches at link time, because the
|
||||
* programs are linked separately. When each such program is
|
||||
* linked, all inputs or outputs interfacing with another program
|
||||
* stage are treated as active."
|
||||
*/
|
||||
if (entry->var->data.always_active_io)
|
||||
continue;
|
||||
|
||||
if (!entry->assign_list.is_empty()) {
|
||||
/* Remove all the dead assignments to the variable we found.
|
||||
* Don't do so if it's a shader or function output, though.
|
||||
|
@@ -196,6 +196,24 @@ intel_update_state(struct gl_context * ctx, GLuint new_state)
|
||||
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
|
||||
}
|
||||
|
||||
/* Resolve color for each active shader image. */
|
||||
for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
|
||||
const struct gl_shader *shader = ctx->_Shader->CurrentProgram[i] ?
|
||||
ctx->_Shader->CurrentProgram[i]->_LinkedShaders[i] : NULL;
|
||||
|
||||
if (unlikely(shader && shader->NumImages)) {
|
||||
for (unsigned j = 0; j < shader->NumImages; j++) {
|
||||
struct gl_image_unit *u = &ctx->ImageUnits[shader->ImageUnits[j]];
|
||||
tex_obj = intel_texture_object(u->TexObj);
|
||||
|
||||
if (tex_obj && tex_obj->mt) {
|
||||
intel_miptree_resolve_color(brw, tex_obj->mt);
|
||||
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
_mesa_lock_context_textures(ctx);
|
||||
}
|
||||
|
||||
|
@@ -1434,14 +1434,12 @@ void brw_create_constant_surface(struct brw_context *brw,
|
||||
drm_intel_bo *bo,
|
||||
uint32_t offset,
|
||||
uint32_t size,
|
||||
uint32_t *out_offset,
|
||||
bool dword_pitch);
|
||||
uint32_t *out_offset);
|
||||
void brw_create_buffer_surface(struct brw_context *brw,
|
||||
drm_intel_bo *bo,
|
||||
uint32_t offset,
|
||||
uint32_t size,
|
||||
uint32_t *out_offset,
|
||||
bool dword_pitch);
|
||||
uint32_t *out_offset);
|
||||
void brw_update_buffer_texture_surface(struct gl_context *ctx,
|
||||
unsigned unit,
|
||||
uint32_t *surf_offset);
|
||||
@@ -1453,8 +1451,7 @@ brw_update_sol_surface(struct brw_context *brw,
|
||||
void brw_upload_ubo_surfaces(struct brw_context *brw,
|
||||
struct gl_shader *shader,
|
||||
struct brw_stage_state *stage_state,
|
||||
struct brw_stage_prog_data *prog_data,
|
||||
bool dword_pitch);
|
||||
struct brw_stage_prog_data *prog_data);
|
||||
void brw_upload_abo_surfaces(struct brw_context *brw,
|
||||
struct gl_shader *shader,
|
||||
struct brw_stage_state *stage_state,
|
||||
|
@@ -186,7 +186,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
|
||||
* the redundant ones.
|
||||
*/
|
||||
fs_reg vec4_offset = vgrf(glsl_type::int_type);
|
||||
bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~3));
|
||||
bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~0xf));
|
||||
|
||||
int scale = 1;
|
||||
if (devinfo->gen == 4 && bld.dispatch_width() == 8) {
|
||||
@@ -218,7 +218,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
|
||||
inst->mlen = 1 + bld.dispatch_width() / 8;
|
||||
}
|
||||
|
||||
bld.MOV(dst, offset(vec4_result, bld, (const_offset & 3) * scale));
|
||||
bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1999,10 +1999,12 @@ fs_visitor::demote_pull_constants()
|
||||
|
||||
/* Generate a pull load into dst. */
|
||||
if (inst->src[i].reladdr) {
|
||||
fs_reg indirect = ibld.vgrf(BRW_REGISTER_TYPE_D);
|
||||
ibld.MUL(indirect, *inst->src[i].reladdr, brw_imm_d(4));
|
||||
VARYING_PULL_CONSTANT_LOAD(ibld, dst,
|
||||
brw_imm_ud(index),
|
||||
*inst->src[i].reladdr,
|
||||
pull_index);
|
||||
indirect,
|
||||
pull_index * 4);
|
||||
inst->src[i].reladdr = NULL;
|
||||
inst->src[i].stride = 1;
|
||||
} else {
|
||||
@@ -3038,13 +3040,11 @@ fs_visitor::lower_uniform_pull_constant_loads()
|
||||
continue;
|
||||
|
||||
if (devinfo->gen >= 7) {
|
||||
/* The offset arg before was a vec4-aligned byte offset. We need to
|
||||
* turn it into a dword offset.
|
||||
*/
|
||||
/* The offset arg is a vec4-aligned immediate byte offset. */
|
||||
fs_reg const_offset_reg = inst->src[1];
|
||||
assert(const_offset_reg.file == IMM &&
|
||||
const_offset_reg.type == BRW_REGISTER_TYPE_UD);
|
||||
const_offset_reg.ud /= 4;
|
||||
assert(const_offset_reg.ud % 16 == 0);
|
||||
|
||||
fs_reg payload, offset;
|
||||
if (devinfo->gen >= 9) {
|
||||
|
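The `VARYING_PULL_CONSTANT_LOAD` hunks above switch `const_offset` from dword units to byte units: the load address is rounded down to a 16-byte (vec4) boundary, and the remaining bytes select one of the four dwords in the fetched vec4. A minimal sketch of that split, with hypothetical `sketch_` names:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the vec4 that contains `byte_offset`. */
static uint32_t sketch_vec4_base(uint32_t byte_offset)
{
    return byte_offset & ~0xfu;
}

/* Index (0..3) of the dword inside that vec4. */
static uint32_t sketch_vec4_component(uint32_t byte_offset)
{
    /* The hunk assumes dword-aligned offsets; this check is an addition. */
    assert((byte_offset & 3u) == 0);
    return (byte_offset & 0xfu) / 4;
}
```

A byte offset of 36 therefore loads the vec4 at byte 32 and reads its component 1, where the old dword-based code would have used `const_offset & ~3` and `const_offset & 3` on the value 9.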
@@ -1101,28 +1101,6 @@ fs_visitor::nir_emit_undef(const fs_builder &bld, nir_ssa_undef_instr *instr)
|
||||
instr->def.num_components);
|
||||
}
|
||||
|
||||
static fs_reg
|
||||
fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
|
||||
unsigned base_offset, nir_src *indirect)
|
||||
{
|
||||
fs_reg reg;
|
||||
|
||||
assert(!nir_reg->is_global);
|
||||
|
||||
reg = v->nir_locals[nir_reg->index];
|
||||
|
||||
reg = offset(reg, v->bld, base_offset * nir_reg->num_components);
|
||||
if (indirect) {
|
||||
int multiplier = nir_reg->num_components * (v->dispatch_width / 8);
|
||||
|
||||
reg.reladdr = new(v->mem_ctx) fs_reg(v->vgrf(glsl_type::int_type));
|
||||
v->bld.MUL(*reg.reladdr, v->get_nir_src(*indirect),
|
||||
brw_imm_d(multiplier));
|
||||
}
|
||||
|
||||
return reg;
|
||||
}
|
||||
|
||||
fs_reg
|
||||
fs_visitor::get_nir_src(nir_src src)
|
||||
{
|
||||
@@ -1130,8 +1108,10 @@ fs_visitor::get_nir_src(nir_src src)
|
||||
if (src.is_ssa) {
|
||||
reg = nir_ssa_values[src.ssa->index];
|
||||
} else {
|
||||
reg = fs_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset,
|
||||
src.reg.indirect);
|
||||
/* We don't handle indirects on locals */
|
||||
assert(src.reg.indirect == NULL);
|
||||
reg = offset(nir_locals[src.reg.reg->index], bld,
|
||||
src.reg.base_offset * src.reg.reg->num_components);
|
||||
}
|
||||
|
||||
/* to avoid floating-point denorm flushing problems, set the type by
|
||||
@@ -1148,10 +1128,12 @@ fs_visitor::get_nir_dest(nir_dest dest)
|
||||
nir_ssa_values[dest.ssa.index] = bld.vgrf(BRW_REGISTER_TYPE_F,
|
||||
dest.ssa.num_components);
|
||||
return nir_ssa_values[dest.ssa.index];
|
||||
} else {
|
||||
/* We don't handle indirects on locals */
|
||||
assert(dest.reg.indirect == NULL);
|
||||
return offset(nir_locals[dest.reg.reg->index], bld,
|
||||
dest.reg.base_offset * dest.reg.reg->num_components);
|
||||
}
|
||||
|
||||
return fs_reg_for_nir_reg(this, dest.reg.reg, dest.reg.base_offset,
|
||||
dest.reg.indirect);
|
||||
}
|
||||
|
||||
fs_reg
|
||||
@@ -2368,16 +2350,13 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
|
||||
}
|
||||
|
||||
if (has_indirect) {
|
||||
/* Turn the byte offset into a dword offset. */
|
||||
fs_reg base_offset = vgrf(glsl_type::int_type);
|
||||
bld.SHR(base_offset, retype(get_nir_src(instr->src[1]),
|
||||
BRW_REGISTER_TYPE_D),
|
||||
brw_imm_d(2));
|
||||
fs_reg base_offset = retype(get_nir_src(instr->src[1]),
|
||||
BRW_REGISTER_TYPE_D);
|
||||
|
||||
unsigned vec4_offset = instr->const_index[0] / 4;
|
||||
unsigned vec4_offset = instr->const_index[0];
|
||||
for (int i = 0; i < instr->num_components; i++)
|
||||
VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
|
||||
base_offset, vec4_offset + i);
|
||||
base_offset, vec4_offset + i * 4);
|
||||
} else {
|
||||
fs_reg packed_consts = vgrf(glsl_type::float_type);
|
||||
packed_consts.type = dest.type;
|
||||
@@ -2450,7 +2429,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
|
||||
}
|
||||
|
||||
case nir_intrinsic_load_input_indirect:
|
||||
has_indirect = true;
|
||||
unreachable("Not allowed");
|
||||
/* fallthrough */
|
||||
case nir_intrinsic_load_input: {
|
||||
unsigned index = 0;
|
||||
@@ -2462,8 +2441,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
|
||||
src = offset(retype(nir_inputs, dest.type), bld,
|
||||
instr->const_index[0] + index);
|
||||
}
|
||||
if (has_indirect)
|
||||
src.reladdr = new(mem_ctx) fs_reg(get_nir_src(instr->src[0]));
|
||||
index++;
|
||||
|
||||
bld.MOV(dest, src);
|
||||
@@ -2536,7 +2513,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
|
||||
}
|
||||
|
||||
case nir_intrinsic_store_output_indirect:
|
||||
has_indirect = true;
|
||||
unreachable("Not allowed");
|
||||
/* fallthrough */
|
||||
case nir_intrinsic_store_output: {
|
||||
fs_reg src = get_nir_src(instr->src[0]);
|
||||
@@ -2544,8 +2521,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
|
||||
for (unsigned j = 0; j < instr->num_components; j++) {
|
||||
fs_reg new_dest = offset(retype(nir_outputs, src.type), bld,
|
||||
instr->const_index[0] + index);
|
||||
if (has_indirect)
|
||||
src.reladdr = new(mem_ctx) fs_reg(get_nir_src(instr->src[1]));
|
||||
index++;
|
||||
bld.MOV(new_dest, src);
|
||||
src = offset(src, bld, 1);
|
||||
|
@@ -48,11 +48,10 @@ brw_upload_gs_pull_constants(struct brw_context *brw)
|
||||
|
||||
/* BRW_NEW_GS_PROG_DATA */
|
||||
const struct brw_vue_prog_data *prog_data = &brw->gs.prog_data->base;
|
||||
const bool dword_pitch = prog_data->dispatch_mode == DISPATCH_MODE_SIMD8;
|
||||
|
||||
/* _NEW_PROGRAM_CONSTANTS */
|
||||
brw_upload_pull_constants(brw, BRW_NEW_GS_CONSTBUF, &gp->program.Base,
|
||||
stage_state, &prog_data->base, dword_pitch);
|
||||
stage_state, &prog_data->base);
|
||||
}
|
||||
|
||||
const struct brw_tracked_state brw_gs_pull_constants = {
|
||||
@@ -79,10 +78,9 @@ brw_upload_gs_ubo_surfaces(struct brw_context *brw)
|
||||
|
||||
/* BRW_NEW_GS_PROG_DATA */
|
||||
struct brw_vue_prog_data *prog_data = &brw->gs.prog_data->base;
|
||||
bool dword_pitch = prog_data->dispatch_mode == DISPATCH_MODE_SIMD8;
|
||||
|
||||
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_GEOMETRY],
|
||||
&brw->gs.base, &prog_data->base, dword_pitch);
|
||||
&brw->gs.base, &prog_data->base);
|
||||
}
|
||||
|
||||
const struct brw_tracked_state brw_gs_ubo_surfaces = {
|
||||
|
@@ -494,7 +494,6 @@ fast_clear_attachments(struct brw_context *brw,
|
||||
struct rect fast_clear_rect)
|
||||
{
|
||||
assert(brw->gen >= 9);
|
||||
struct gl_context *ctx = &brw->ctx;
|
||||
|
||||
brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
|
||||
|
||||
@@ -511,7 +510,7 @@ fast_clear_attachments(struct brw_context *brw,
|
||||
|
||||
_mesa_meta_drawbuffers_from_bitfield(1 << index);
|
||||
|
||||
brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
|
||||
brw_draw_rectlist(brw, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
|
||||
|
||||
/* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
|
||||
* resolve them eventually.
|
||||
|
@@ -357,8 +357,7 @@ brw_upload_pull_constants(struct brw_context *brw,
|
||||
GLbitfield64 brw_new_constbuf,
|
||||
const struct gl_program *prog,
|
||||
struct brw_stage_state *stage_state,
|
||||
const struct brw_stage_prog_data *prog_data,
|
||||
bool dword_pitch);
|
||||
const struct brw_stage_prog_data *prog_data);
|
||||
|
||||
/* gen7_vs_state.c */
|
||||
void
|
||||
|
@@ -901,8 +901,21 @@ generate_pull_constant_load(struct brw_codegen *p,
|
||||
|
||||
gen6_resolve_implied_move(p, &header, inst->base_mrf);
|
||||
|
||||
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1), BRW_REGISTER_TYPE_D),
|
||||
offset);
|
||||
if (devinfo->gen >= 6) {
|
||||
if (offset.file == BRW_IMMEDIATE_VALUE) {
|
||||
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1),
|
||||
BRW_REGISTER_TYPE_D),
|
||||
brw_imm_d(offset.ud >> 4));
|
||||
} else {
|
||||
brw_SHR(p, retype(brw_message_reg(inst->base_mrf + 1),
|
||||
BRW_REGISTER_TYPE_D),
|
||||
offset, brw_imm_d(4));
|
||||
}
|
||||
} else {
|
||||
brw_MOV(p, retype(brw_message_reg(inst->base_mrf + 1),
|
||||
BRW_REGISTER_TYPE_D),
|
||||
offset);
|
||||
}
|
||||
|
||||
uint32_t msg_type;
|
||||
|
||||
|
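After this change the vec4 backend appears to carry pull-constant offsets in bytes everywhere and to convert to vec4 (16-byte) units only when the message header is written on gen6 and later. The sketch below restates that conversion as a pure function; the `sketch_` name and the plain integer `gen` parameter are hypothetical simplifications of the generator code.

```c
#include <assert.h>
#include <stdint.h>

/* Value written to the message header for a given byte offset.
 * Gen6+ headers take vec4 (16-byte) units; older ones take bytes. */
static uint32_t sketch_header_offset(int gen, uint32_t byte_offset)
{
    /* Offsets are expected to be vec4-aligned; this check is an addition. */
    assert(byte_offset % 16 == 0);
    return gen >= 6 ? byte_offset >> 4 : byte_offset;
}
```

This is why `get_pull_constant_offset()` can now multiply `reg_offset` by 16 unconditionally: the per-generation scaling has moved into the generator.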
@@ -787,11 +787,9 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
|
||||
src_reg offset;
|
||||
|
||||
if (!has_indirect) {
|
||||
offset = brw_imm_ud(const_offset / 16);
|
||||
offset = brw_imm_ud(const_offset & ~15);
|
||||
} else {
|
||||
offset = src_reg(this, glsl_type::uint_type);
|
||||
emit(SHR(dst_reg(offset), get_nir_src(instr->src[1], nir_type_int, 1),
|
||||
brw_imm_ud(4u)));
|
||||
offset = get_nir_src(instr->src[1], nir_type_int, 1);
|
||||
}
|
||||
|
||||
src_reg packed_consts = src_reg(this, glsl_type::vec4_type);
|
||||
|
@@ -1550,23 +1550,16 @@ vec4_visitor::get_pull_constant_offset(bblock_t * block, vec4_instruction *inst,

emit_before(block, inst, ADD(dst_reg(index), *reladdr,
brw_imm_d(reg_offset)));

/* Pre-gen6, the message header uses byte offsets instead of vec4
* (16-byte) offset units.
*/
if (devinfo->gen < 6) {
emit_before(block, inst, MUL(dst_reg(index), index, brw_imm_d(16)));
}
emit_before(block, inst, MUL(dst_reg(index), index, brw_imm_d(16)));

return index;
} else if (devinfo->gen >= 8) {
/* Store the offset in a GRF so we can send-from-GRF. */
src_reg offset = src_reg(this, glsl_type::int_type);
emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset)));
emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16)));
return offset;
} else {
int message_header_scale = devinfo->gen < 6 ? 16 : 1;
return brw_imm_d(reg_offset * message_header_scale);
return brw_imm_d(reg_offset * 16);
}
}

@@ -53,8 +53,7 @@ brw_upload_pull_constants(struct brw_context *brw,
GLbitfield64 brw_new_constbuf,
const struct gl_program *prog,
struct brw_stage_state *stage_state,
const struct brw_stage_prog_data *prog_data,
bool dword_pitch)
const struct brw_stage_prog_data *prog_data)
{
unsigned i;
uint32_t surf_index = prog_data->binding_table.pull_constants_start;
@@ -94,8 +93,7 @@ brw_upload_pull_constants(struct brw_context *brw,
}

brw_create_constant_surface(brw, const_bo, const_offset, size,
&stage_state->surf_offset[surf_index],
dword_pitch);
&stage_state->surf_offset[surf_index]);
drm_intel_bo_unreference(const_bo);

brw->ctx.NewDriverState |= brw_new_constbuf;
@@ -112,7 +110,6 @@ static void
brw_upload_vs_pull_constants(struct brw_context *brw)
{
struct brw_stage_state *stage_state = &brw->vs.base;
bool dword_pitch;

/* BRW_NEW_VERTEX_PROGRAM */
struct brw_vertex_program *vp =
@@ -121,11 +118,9 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
/* BRW_NEW_VS_PROG_DATA */
const struct brw_stage_prog_data *prog_data = &brw->vs.prog_data->base.base;

dword_pitch = brw->vs.prog_data->base.dispatch_mode == DISPATCH_MODE_SIMD8;

/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_VS_CONSTBUF, &vp->program.Base,
stage_state, prog_data, dword_pitch);
stage_state, prog_data);
}

const struct brw_tracked_state brw_vs_pull_constants = {
@@ -145,16 +140,13 @@ brw_upload_vs_ubo_surfaces(struct brw_context *brw)
/* _NEW_PROGRAM */
struct gl_shader_program *prog =
ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
bool dword_pitch;

if (!prog)
return;

/* BRW_NEW_VS_PROG_DATA */
dword_pitch = brw->vs.prog_data->base.dispatch_mode == DISPATCH_MODE_SIMD8;
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_VERTEX],
&brw->vs.base, &brw->vs.prog_data->base.base,
dword_pitch);
&brw->vs.base, &brw->vs.prog_data->base.base);
}

const struct brw_tracked_state brw_vs_ubo_surfaces = {
@@ -400,15 +400,11 @@ brw_create_constant_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch)
uint32_t *out_offset)
{
uint32_t stride = dword_pitch ? 4 : 16;
uint32_t elements = ALIGN(size, stride) / stride;

brw->vtbl.emit_buffer_surface_state(brw, out_offset, bo, offset,
BRW_SURFACEFORMAT_R32G32B32A32_FLOAT,
elements, stride, false);
size, 1, false);
}

/**
@@ -421,8 +417,7 @@ brw_create_buffer_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
uint32_t *out_offset,
bool dword_pitch)
uint32_t *out_offset)
{
/* Use a raw surface so we can reuse existing untyped read/write/atomic
* messages. We need these specifically for the fragment shader since they
@@ -537,7 +532,7 @@ brw_upload_wm_pull_constants(struct brw_context *brw)

/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &fp->program.Base,
stage_state, prog_data, true);
stage_state, prog_data);
}

const struct brw_tracked_state brw_wm_pull_constants = {
@@ -918,8 +913,7 @@ void
brw_upload_ubo_surfaces(struct brw_context *brw,
struct gl_shader *shader,
struct brw_stage_state *stage_state,
struct brw_stage_prog_data *prog_data,
bool dword_pitch)
struct brw_stage_prog_data *prog_data)
{
struct gl_context *ctx = &brw->ctx;

@@ -944,8 +938,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
binding->BufferObject->Size - binding->Offset);
brw_create_constant_surface(brw, bo, binding->Offset,
binding->BufferObject->Size - binding->Offset,
&ubo_surf_offsets[i],
dword_pitch);
&ubo_surf_offsets[i]);
}
}

@@ -967,8 +960,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
binding->BufferObject->Size - binding->Offset);
brw_create_buffer_surface(brw, bo, binding->Offset,
binding->BufferObject->Size - binding->Offset,
&ssbo_surf_offsets[i],
dword_pitch);
&ssbo_surf_offsets[i]);
}
}

@@ -988,7 +980,7 @@ brw_upload_wm_ubo_surfaces(struct brw_context *brw)

/* BRW_NEW_FS_PROG_DATA */
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_FRAGMENT],
&brw->wm.base, &brw->wm.prog_data->base, true);
&brw->wm.base, &brw->wm.prog_data->base);
}

const struct brw_tracked_state brw_wm_ubo_surfaces = {
@@ -1014,7 +1006,7 @@ brw_upload_cs_ubo_surfaces(struct brw_context *brw)

/* BRW_NEW_CS_PROG_DATA */
brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_COMPUTE],
&brw->cs.base, &brw->cs.prog_data->base, true);
&brw->cs.base, &brw->cs.prog_data->base);
}

const struct brw_tracked_state brw_cs_ubo_surfaces = {
@@ -304,7 +304,7 @@ brw_upload_cs_pull_constants(struct brw_context *brw)

/* _NEW_PROGRAM_CONSTANTS */
brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &cp->program.Base,
stage_state, prog_data, true);
stage_state, prog_data);
}

const struct brw_tracked_state brw_cs_pull_constants = {
@@ -898,6 +898,21 @@ _mesa_validate_program_pipeline(struct gl_context* ctx,
if (!_mesa_sampler_uniforms_pipeline_are_valid(pipe))
goto err;

/* Validate inputs against outputs, this cannot be done during linking
* since programs have been linked separately from each other.
*
* From OpenGL 4.5 Core spec:
* "Separable program objects may have validation failures that cannot be
* detected without the complete program pipeline. Mismatched interfaces,
* improper usage of program objects together, and the same
* state-dependent failures can result in validation errors for such
* program objects."
*
* OpenGL ES 3.1 specification has the same text.
*/
if (!_mesa_validate_pipeline_io(pipe))
goto err;

pipe->Validated = GL_TRUE;
return GL_TRUE;

@@ -928,23 +943,11 @@ _mesa_ValidateProgramPipeline(GLuint pipeline)
return;
}

_mesa_validate_program_pipeline(ctx, pipe,
(ctx->_Shader->Name == pipe->Name));

/* Validate inputs against outputs, this cannot be done during linking
* since programs have been linked separately from each other.
*
* From OpenGL 4.5 Core spec:
* "Separable program objects may have validation failures that cannot be
* detected without the complete program pipeline. Mismatched interfaces,
* improper usage of program objects together, and the same
* state-dependent failures can result in validation errors for such
* program objects."
*
* OpenGL ES 3.1 specification has the same text.
/* ValidateProgramPipeline should not throw errors when pipeline validation
* fails and should instead only update the validation status. We pass
* false for IsBound to avoid an error being thrown.
*/
if (!_mesa_validate_pipeline_io(pipe))
pipe->Validated = GL_FALSE;
_mesa_validate_program_pipeline(ctx, pipe, false);
}

void GLAPIENTRY
@@ -758,6 +758,10 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg,
return;
}
}
/* We need to reset the validate flag on changes to samplers in case
* two different sampler types are set to the same texture unit.
*/
ctx->_Shader->Validated = GL_FALSE;
}

if (uni->type->is_image()) {