* Swap !IsFree() for IsMapped()
IsFree only checks if the VMAType == Free. As is, that means ClampRangeSize will include memory that is Reserved or PoolReserved, and neither of those types are GPU mapped.
This fixes this bug, may help with some non-GPU memory asserts.
* Apply ClampRangeSize to vertex buffers
Helps with some cases encountered by UE and Minecraft.
* Allow vector and scalar offset in buffer address arg to
LoadBuffer/StoreBuffer
* remove is_ring check
* fix atomics and update pattern matching for tess factor stores
* remove old asserts about soffset
* small fixes
* copyright
* Handle sgpr initialization for 2 special hull shader values, including tess factor buffer offset
* Qt: Add FSR settings to settings GUI
* Move FSR settings from Imgui.ini to main config
* move passing fsr settings to presenter constuctor
* cleanup: use struct instead of function call
* cleanup: make variable names consistent with others
* Update fsr settings real-time in qt, save button in Imgui
* Linux build fix, missing running game check
* syntax fix
* Change gamerunning checks to if (presenter)
Layer must be output from the final pre-rasterization stage, so when tessellation is
being used the vertex shader cannot set it. We could pass it through to the tessellation
shaders in some way, but unless it becomes an issue that's more effort than it's worth.
* vk_pipeline_cache: Cleanup graphics key refresh
* position: Don't assert on None mapping
Also check outputs in runtime info so shader is recompiled if they change
* video_core: support for RT layer outputs
- support for RT layer outputs
- refactor for handling of export attributes
- move output->attribute mapping to a separate header
* export: Rework render target exports
- Centralize all code related to MRT exports into a single function to make it easier to follow
- Apply swizzle to output RGBA colors instead of the render target channel.
This fixes swizzles on formats with < 4 channels
For example with render target format R8_UNORM and COMP_SWAP ALT_REV the previous code would output
frag_color.a = color.r;
instead of
frag_color.r = color.a;
which would result in incorrect output in some cases
* vk_pipeline_cache: Apply swizzle to write masks
---------
Co-authored-by: polyproxy <47796739+polybiusproxy@users.noreply.github.com>
* ir: Perform degamma in shader when sampler sets force_degamma
* specialization: Add srgb if image is sampled
Might fix cases where sampler force_degamma is used with srgb image
* video_core: Rework detiling
* video_core: Support tiling and macrotile detiling
* clang format
* image_info: Cleanups
* resource: Revert some changes
* texture_cache: Fix small error
* image_info: Set depth flag on depth promote
* buffer_cache: Remove level check
* tile_manager: Handle case of staging buffer causing flush
* image_info: Add 2D thick array mode
* image_info: Add slices to mip size
* tile_manager: Set bank swizzle
* buffer_cache: Support image copies from DmaData
* vk_rasterizer: Accelerate trivial render target copies with compute
Before tiling PR compute image copies were done with the following sequence
vkCmdCopyImageToBuffer (in SynchronizeBufferFromImage) -> vkCmdDispatch (copy) -> vkCmdCopyBufferToImage (in RefreshImage)
With the tiling PR it added extra tiling/detiling steps
vkCmdCopyImageToBuffer -> vkCmdDispatch (tiling) -> vkCmdDispatch (copy) -> vkCmdDispatch (detiling) -> vkCmdCopyBufferToImage
This is quite a bit of overhead for a simple image copy. This commit tries to detect trivial image copies i.e cs shaders that copy the full source image to all of the destination.
So now all this sequence is just a vkCmdCopyImage. How much it triggers depends on the guest
* texture_cache: Fix build
* image: Copy all subresources with buffer too
* shader_recompiler: Remove remnants of old discard
Also constant propagate conditional discard if condition is constant
* resource_tracking_pass: Rework sharp tracking for robustness
* resource_tracking_pass: Add source dominance analysis
When reachability is not enough to prune source list, check if a source dominates all other sources
* resource_tracking_pass: Fix immediate check
How did this work before
* resource_tracking_pass: Remove unused template type
* readlane_elimination_pass: Don't add phi when all args are the same
New sharp tracking exposed some bad sources coming on sampler sharps with aniso disable pattern that also were part of readlane pattern, fix tracking by removing the unnecessary phis inbetween
* resource_tracking_pass: Allow phi in disable aniso pattern
* resource_tracking_pass: Handle not valid buffer sharp and more phi in aniso pattern
* implement loads/store instructions for types smaller than dwords
* initialize s16/s8 types
* set profile for int8/16/64
* also need to zero extend u8/u16 to u32 result
* document unrelated bugs with atomic fmin/max
* remove profile checks and simple emit for added opcodes
---------
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* resource_tracking: Mark image as written when its used with atomics
* texture_cache: Remove meta registered flag
Mostly useless and it is possible for images to switch metas
* vk_rasterizer: Use xor as heuristic for HTILE clear
* renderer_vulkan: Respect provoking vertex setting
* renderer_vulkan: Handle rasterization discard
* renderer_vulkan: Implement logic ops
* renderer_vulkan: Properly implement depth clamp and clip
* renderer_vulkan: Handle line width
* Fix build
* vk_pipeline_cache: Don't check depth clamp without a depth buffer
* liverpool: Fix line control offset
* vk_pipeline_cache: Don't run search if depth clamp is disabled
* vk_pipeline_cache: Allow using viewport range when it's more restrictive then depth clamp
* liverpool: Disable depth clip when near and far planes have different setting
* vk_graphics_pipeline: Move warning to pipeline
* vk_pipeline_cache: Revert viewport check and remove log
* vk_graphics_pipeline: Enable depth clamp when depth clip is disabled and extension is not supported
Without the depth clip extension depth clipping is controlled by depth clamping
* shader_recompiler: Replace buffer pulling with attribute divisor for instance step rates
* flatten_extended_userdata: Remove special step rate buffer handling
* Review comments
* spirv_emit_context: Name all instance rate attribs properly
* spirv: Merge ReadConstBuffer again
template function only has 1 user now
* attribute: Add missing attributes
* translate: Reimplement step rate instance id
* Resolve validation warnings
* shader_recompiler: Separate vertex inputs from LS stage, cleanup tess
* buffer_cache: Handle inline data to flexible memory
* control_flow: Fix single instruction scopes edge case
Fixes the following pattern
v_cmpx_gt_u32 cond
buffer_store_dword value
.LABEL:
Before
buffer[index] = value;
After
if (cond)
{
buffer[index] = value;
}
* vector_memory: Handle soffset when offen is false
When offen is not used we can substitute the offset argument with soffset and have it handled correctly
* scalar_alu: Handle sharp moves with S_MOV_B64
This fixes unable to track sharp errors when this pattern is used in a shader
* emulator: Add log
* video_core: Bump binary info search range and buffer num
* buffer_cache: Bring back upload batching and temporary buffer
Because that PR fused the write and read protections under a single function call, it was a requirement to move the actual memory copy part inside the lambda to perform it before the read protection kicks in. However on certain large data transfers it had potential for data corruption. If, for example, an upload had two copies, a 400MB and a 300MB one, the first one would fit in the staging buffer, very likely with an induced stall. However the second one wouldn't have space to fit alongside the other data, but it's also small enough for the buffer to fit it, so the staging buffer would cause a flush and wait to copy it, overwriting the previous transfer.
To address this the upload function has been reworked to allow for batching like previously but with the new locking behavior. Also the condition to use temporary buffers has been expanded to also include cases when staging buffer will stall, which should increase performance a little in some cases.
* buffer_cache: Move buffer barriers and copy outside of lock range
* texture_cache: Async download of GPU modified linear images
* liverpool: Back to less submits
* texture_cache: Don't download depth images
* config: Add option for linear image readback