* presenter: render the game inside an ImGui window
* presenter: render the previous frame again to keep the renderer running
* swapchain: fix swapchain image view format not being converted to unorm
* devtools: fix frame graph timing
* shader_recompiler: Account for instruction array flag in image type.
* shader_recompiler: Check da flag for all mimg instructions.
* shader_recompiler: Convert cube images into 2D arrays.
* shader_recompiler: Move image resource functions into sharp type.
* shader_recompiler: Use native AMD cube instructions when possible.
* specialization: Fix buffer storage mistake.
* video_core: Implement conversion for uncommon/unsupported number formats.
* shader_recompiler: Reinterpret image sample output as well.
* liverpool_to_vk: Remove mappings for remapped number formats.
These were poorly supported by drivers anyway.
* resource_tracking_pass: Fix image write swizzle mistake.
* amdgpu: Add missing specialization and move format mapping data to types
* reinterpret: Fix U/SToF input type.
* texture_cache: Stricter barriers on image upload
* buffer_cache: Stricter barrier for vkCmdUpdateBuffer
* vk_rasterizer: Barrier also normal buffers and make it apply to all stages
* texture_cache: Minor barrier cleanup
* Batch image and buffer barriers in a single command
* clang format
* Speed up LiverpoolToVK::SurfaceFormat
In Bloodborne this shows up as the function with the highest cumulative "exclusive time". This is true both in scenes that perform poorly and in scenes that perform well.
I took (approximately) 10 s samples using an 8 kHz sampling profiler.
In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level):
- Reduced total time from 757.34ms to 82.61ms (out of ~10000ms).
- Reduced average frame times by 2ms (though according to the graph, the gap may be as big as 9ms every N frames).
In the Hunter's Dream (in the spawn position):
- Reduced the total time from 486.50ms to 53.83ms (out of ~10000ms).
- Average frame times appear to be roughly the same.
These are profiles of the change vs the version currently in the main branch. The change also improves things in the `threading` branch, possibly by even more, but I didn't keep track of my measurements as carefully in that branch. I believe this change will still be useful even when that branch is stabilized and merged.
There may be other rendering bottlenecks on this branch that keep this code off the critical path in places like the Hunter's Dream, where performance isn't currently as constrained. That might explain why the reduction in call times isn't resulting in a higher frame rate.
* Implement SurfaceFormat with derived lookup table instead of switch
* Clang format fixes
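A minimal sketch of the "derived lookup table instead of switch" idea, not the actual shadPS4 code: the enums `DataFormat`, `NumberFormat`, and `HostFormat` and every function name below are hypothetical stand-ins (the real function maps guest surface formats to Vulkan formats). The switch is kept as the reference mapping, and the table is derived from it once at compile time so the hot path becomes a single array index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily reduced stand-ins for the guest and host format enums.
enum class DataFormat : std::uint8_t { Invalid, Format8, Format8_8_8_8, Count };
enum class NumberFormat : std::uint8_t { Unorm, Uint, Count };
enum class HostFormat : std::uint8_t { Undefined, R8Unorm, R8Uint, Rgba8Unorm, Rgba8Uint };

// Reference mapping written as a switch (the slow path being replaced).
constexpr HostFormat SurfaceFormatSwitch(DataFormat data, NumberFormat num) {
    switch (data) {
    case DataFormat::Format8:
        return num == NumberFormat::Unorm ? HostFormat::R8Unorm : HostFormat::R8Uint;
    case DataFormat::Format8_8_8_8:
        return num == NumberFormat::Unorm ? HostFormat::Rgba8Unorm : HostFormat::Rgba8Uint;
    default:
        return HostFormat::Undefined;
    }
}

constexpr std::size_t kNumData = static_cast<std::size_t>(DataFormat::Count);
constexpr std::size_t kNumNumber = static_cast<std::size_t>(NumberFormat::Count);

// Flatten the (data format, number format) pair into one table index.
constexpr std::size_t TableIndex(DataFormat data, NumberFormat num) {
    return static_cast<std::size_t>(data) * kNumNumber + static_cast<std::size_t>(num);
}

// Derive the table from the switch, so the two can never disagree.
constexpr std::array<HostFormat, kNumData * kNumNumber> BuildTable() {
    std::array<HostFormat, kNumData * kNumNumber> table{};
    for (std::size_t d = 0; d < kNumData; ++d) {
        for (std::size_t n = 0; n < kNumNumber; ++n) {
            const auto data = static_cast<DataFormat>(d);
            const auto num = static_cast<NumberFormat>(n);
            table[TableIndex(data, num)] = SurfaceFormatSwitch(data, num);
        }
    }
    return table;
}

inline constexpr auto kSurfaceFormatTable = BuildTable();

// Hot path: one multiply-add and one load instead of a branchy switch.
constexpr HostFormat SurfaceFormat(DataFormat data, NumberFormat num) {
    return kSurfaceFormatTable[TableIndex(data, num)];
}

// Sanity check that the table agrees with the switch for every valid pair.
inline void CheckTableMatchesSwitch() {
    for (std::size_t d = 0; d < kNumData; ++d) {
        for (std::size_t n = 0; n < kNumNumber; ++n) {
            const auto data = static_cast<DataFormat>(d);
            const auto num = static_cast<NumberFormat>(n);
            assert(SurfaceFormat(data, num) == SurfaceFormatSwitch(data, num));
        }
    }
}
```

Deriving the table from the existing mapping keeps a single source of truth; whether the real change builds its table this way is an assumption of this sketch.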
* shader_recompiler: Add swizzle support for unsupported formats.
* renderer_vulkan: Rework MRT swizzles and add unsupported format swizzle support.
* shader_recompiler: Clean up swizzle handling and handle ImageRead storage swizzle.
* shader_recompiler: Fix type errors
* liverpool_to_vk: Remove redundant clear color swizzles.
* shader_recompiler: Reduce CompositeConstruct to constants where possible.
* shader_recompiler: Fix ImageRead/Write and StoreBufferFormatF32 types.
* amdgpu: Add a few more unsupported format remaps.
* devtools: fix popen in non-windows environment
* devtools: fix frame crash assertion when hidden
* devtools: add search to shader list
* devtools: add copy name to shader list
* devtools: frame dump: search by shader name
* Clear RT if FCE was invoked before any draws
Co-authored-by: psucien <bad_cast@protonmail.com>
* address review comments
---------
Co-authored-by: psucien <bad_cast@protonmail.com>
* ir: Add heuristic based LDS barrier pass
* Attempts to insert barriers after zero-depth divergent conditional blocks in shaders that use shared memory
* lds_barriers: Limit to nvidia
* Intel has historically had problems with cs barriers, will debug another time
* coroutine code prettification
* asc queues submission refactoring
* better asc ring context handling
* final touches and review notes
* even more simplification for context saving
* shader_recompiler: Tessellation WIP
* fix compiler errors after merge
DONT MERGE set log file to /dev/null
DONT MERGE linux pthread bb fix
save work
DONT MERGE dump ir
save more work
fix mistake with ES shader
skip list
add input patch control points dynamic state
random stuff
* WIP Tessellation partial implementation. Squash commits
* test: make local/tcs use attr arrays
* attr arrays in TCS/TES
* dont define empty attr arrays
* switch to special opcodes for tess tcs/tes reads and tcs writes
* impl tcs/tes read attr insts
* rebase fix
* save some work
* save work probably broken and slow
* put Vertex LogicalStage after TCS and TES to fix bindings
* more refactors
* refactor pattern matching and optimize modulos (disabled)
* enable modulo opt
* copyright
* rebase fixes
* remove some prints
* remove some stuff
* Add TCS/TES support for shader patching and use LogicalStage
* refactor and handle wider DS instructions
* get rid of GetAttributes for special tess constants reads. Immediately replace some upon seeing readconstbuffer. Gets rid of some extra passes over IR
* stop relying on GNMX HsConstants struct. Change runtime_info.hs_info and some regs
* delete some more stuff
* update comments for current implementation
* some cleanup
* uint error
* more cleanup
* remove patch control points dynamic state (because runtime_info already depends on it)
* fix potential problem with determining passthrough
---------
Co-authored-by: IndecisiveTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* texture_cache: Improve support for stencil reads
* libraries: Suppress some spammy logs
* core: Support loading font libraries
* texture_cache: Remove assert
* page_manager: Enable userfaultfd by default
* Much faster than page faults and causes fewer problems
* shader_recompiler: Add texel buffer multiplier
* Fixes format mismatch assert when vsharp stride is multiple of format stride
* shader_recompiler: Specialize UBOs on size
* Some games perform manual vertex pulling and thus bind read-only buffers of varying size. We only recompile when the vsharp size is larger than the size in the shader; in the opposite case it's not needed
* clang format
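The size-specialization rule described above can be sketched roughly as follows. The struct and function names are hypothetical and only illustrate the comparison; they are not the recompiler's actual types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the UBO size a shader variant was compiled for.
struct UboSpecialization {
    std::uint32_t compiled_size; // size in bytes baked into the compiled shader
};

// A variant compiled for a larger (or equal) buffer can serve a smaller bound
// buffer, so a recompile is only needed when the bound vsharp is larger.
constexpr bool NeedsRecompile(const UboSpecialization& spec, std::uint32_t vsharp_size) {
    return vsharp_size > spec.compiled_size;
}

// Returns the specialization to use for this bind, growing it only if required.
constexpr UboSpecialization Specialize(UboSpecialization spec, std::uint32_t vsharp_size) {
    if (NeedsRecompile(spec, vsharp_size)) {
        spec.compiled_size = vsharp_size;
    }
    return spec;
}
```

With this rule a game that binds buffers of 64, 32, and 128 bytes in turn would trigger one recompile (for 128), not one per distinct size.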
* shader_recompiler: Read image format info directly from sharps instead of storing in shader info.
* renderer_vulkan: Parse fetch shader per-pipeline
* Few minor fixes.
* shader_recompiler: Specialize on vertex attribute number types.
* shader_recompiler: Move GetDrawOffsets to fetch shader
* core: Split error codes into separate files
* Reduces build times and is cleaner
* core: Bring structs and enums to codebase style
* core: More style changes
* Fixed false-positive image reuploads
* Fixed userfaultfd path, removed dead code, simplified calculations
* oopsie
* track potentially dirty images and hash them
* untrack only first page of the image in case of head access
* rebase, initialize hash, fix bounds check
* include image tail in the calculations
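A rough sketch of how hashing can filter out false-positive reuploads: an image flagged as potentially dirty is only reuploaded if its guest bytes actually hash to a different value. Everything here is an assumption for illustration, including the `TrackedImage` struct and the choice of FNV-1a, which is used only because it is short to write and may well differ from the hash the texture cache uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// FNV-1a 64-bit parameters (stand-in hash for this sketch).
constexpr std::uint64_t kFnvOffset = 0xcbf29ce484222325ULL;
constexpr std::uint64_t kFnvPrime = 0x100000001b3ULL;

inline std::uint64_t HashBytes(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t h = kFnvOffset;
    for (const std::uint8_t b : bytes) {
        h ^= b;
        h *= kFnvPrime;
    }
    return h;
}

// Hypothetical per-image tracking state.
struct TrackedImage {
    std::uint64_t hash = 0;   // hash of the bytes last uploaded
    bool maybe_dirty = false; // set when a write touched the image's pages
};

// Returns true only if the guest data really changed and must be reuploaded.
inline bool ResolveDirty(TrackedImage& image, const std::vector<std::uint8_t>& guest_bytes) {
    if (!image.maybe_dirty) {
        return false;
    }
    image.maybe_dirty = false;
    const std::uint64_t new_hash = HashBytes(guest_bytes);
    if (new_hash == image.hash) {
        return false; // write hit the page but left the image contents unchanged
    }
    image.hash = new_hash;
    return true;
}
```

Hashing costs a pass over the image memory, so it only pays off when it avoids a more expensive reupload.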
* video_core: texture_cache: interface refactor and better overlap handling
* resources binding moved into vk_rasterizer
* remove `virtual` flag leftover
* libkernel: Cleanup some function places
* kernel: Refactor thread functions
* kernel: It builds
* kernel: Fix a bunch of bugs, kernel thread heap
* kernel: File cleanup pt1
* File cleanup pt2
* File cleanup pt3
* File cleanup pt4
* kernel: Add missing funcs
* kernel: Add basic exceptions for linux
* gnmdriver: Add workload functions
* kernel: Fix new pthreads code on macOS. (#1441)
* kernel: Downgrade edeadlk to log
* gnmdriver: Add sceGnmSubmitCommandBuffersForWorkload
* exception: Add context register population for macOS. (#1444)
* kernel: Pthread rewrite touchups for Windows
* kernel: Multiplatform thread implementation
* mutex: Remove spamming log
* pthread_spec: Make assert into a log
* pthread_spec: Zero initialize array
* Attempt to fix non-Windows builds
* hotfix: change incorrect NID for scePthreadAttrSetaffinity
* scePthreadAttrSetaffinity implementation
* Attempt to fix Linux
* windows: Address a bunch of address space problems
* address_space: Fix unmap of region surrounded by placeholders
* libs: Reduce logging
* pthread: Implement condvar with waitable atomics and sleepqueue
* sleepq: Separate and make faster
* time: Remove delay execution
* Causes high CPU usage in Touhou Luna Nights
* kernel: Cleanup files again
* pthread: Add missing include
* semaphore: Use binary_semaphore instead of condvar
* Seems more reliable
* libraries/sysmodule: log module on `sceSysmoduleIsLoaded`
* libraries/kernel: implement `scePthreadSetPrio`
---------
Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
Co-authored-by: Daniel R. <47796739+polybiusproxy@users.noreply.github.com>
Games can strip the first shader instruction (meant for debugging), which we rely on for obtaining shader information (e.g. LittleBigPlanet 3). For this reason, we search from the start of the code until we reach the shader binary info.
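A minimal sketch of such a scan, assuming the binary info block can be recognized by a fixed byte signature. The `OrbShdr` marker and the function name `FindBinaryInfo` are assumptions of this example, and the real search may step through the code in a different granularity.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed marker that starts the shader binary info block.
constexpr std::array<std::uint8_t, 7> kSignature = {'O', 'r', 'b', 'S', 'h', 'd', 'r'};

// Scans forward from the start of the shader code and returns the byte offset
// of the first signature match, or std::nullopt if the marker is absent.
inline std::optional<std::size_t> FindBinaryInfo(const std::vector<std::uint8_t>& code) {
    const auto it =
        std::search(code.begin(), code.end(), kSignature.begin(), kSignature.end());
    if (it == code.end()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(it - code.begin());
}
```

Searching for the marker itself removes the dependency on the first instruction, which is what makes the lookup work for shaders where that instruction was stripped.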
* `RendererVulkan` -> `Presenter`
* support for Video Out gamma setting
* sRGB hack removed
* added post process pass to presenter
* splash functionality restored
* Implement shader resource tables
* fix after rebase + squash
* address some review comments
* fix pipeline_common
* cleanup debug stuff
* switch to using single codegenerator
* devtools: fix showing entire depth instead of bits
* devtools: show button for stage instead of menu bar
- fix batch view dockspace not rendering when window collapsed
* devtools: removed useless "Batch" collapse & don't collapse last batch
* devtools: refactor DrawRow to templating
* devtools: reg popup size adjusted to the content
* devtools: better window names
* devtools: regview layout compacted
* devtools: option to show collapsed frame dump
keep most popups open when selection changes
better popup window positioning
* devtools: show compute shader regs
* devtools: tips popup
* devtools: pm4 - show markers
* SaveDataDialogLib: fix compile with mingw
* devtools: pm4 - show program state
* devtools: pm4 - show program disassembly
* devtools: pm4 - show frame regs
* devtools: pm4 - show color buffer info as popup
add UX improvements for opening new windows with shift+click
better window titles
* imgui: skip all textures to avoid hanging with crash diagnostic enabled
not sure why this happens :c
* devtools: pm4 - show reg depth buffer
* Devtools: Pause system
* Devtools: pm4 viewer
- new menu bar
- refactored video_info layer
- dump & inspect pm4 packets
- removed dumpPM4 config
- renamed System to DebugState
- add docking space
- simple video info constrained to window size
* Devtools: pm4 viewer - add combo to select the queue
* Devtools: pm4 viewer - add hex editor
* Devtools: pm4 viewer - dump current cmd
* add monospaced font to devtools
* Devtools: pm4 viewer - use spec op name
avoid some allocations
* shader_recompiler: Define fragment output type based on number format.
* shader_recompiler: Fix GetAttribute SPIR-V output type.
* shader_recompiler: Don't bitcast on SetAttribute unless integer target.
* vulkan: Fix some extension support related validation errors.
* vulkan: Fix validation error on zero-size buffer.
* vulkan: Fix primitive list restart validation error.
* Use filesystem::path whenever possible, remove fs::path::string
* My hatred for Windows grows with every passing day
* More Qt stuff
* custom u8string formatter for fmt library
* Use u8string for imgui
* Fix toml errors hopefully
* Fix not printing issue
* Oh and on SDL
* I hate Windows even more today
* fix toml reading utf-8 paths
also small fix for fmt::UTF
* Formatting
* Fix QT path to run games
* Fix path logging in save data
* Fix trophy path handling
* Update game_list_frame.cpp
fixed snd0path
* Update main_window.cpp
fix snd0path
* Update main_window.cpp
* paths finally fixed
* git info in WIP versions title
---------
Co-authored-by: Vinicius Rangel <me@viniciusrangel.dev>
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* Always present acquired swapchain images
Always present acquired swapchain images in order to be able to acquire them again.
fix #865
* Recreate swapchain if window is resized
* Respect aspect ratio when blitting to frame
* Make SDL window resizable
* clang-format
* designator order (building with gcc)
Fix /shadPS4/src/video_core/renderer_vulkan/vk_instance.cpp:314:9: error: designator order for field ‘vk::PhysicalDeviceVulkan12Features::samplerMirrorClampToEdge’ does not match declaration order in ‘vk::PhysicalDeviceVulkan12Features’
* Clear frame before blitting
* clang-format
* Revert "designator order (building with gcc)"
There is already a PR open for this.
This reverts commit 7f8ccf4b1e.
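For the "Respect aspect ratio when blitting to frame" item above, the destination rectangle can be computed along these lines. This is a sketch with hypothetical names (`Rect`, `FitKeepingAspect`), not the presenter's actual code; it uses integer cross-multiplication to compare aspect ratios without floating point.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical destination rectangle inside the window/swapchain image.
struct Rect {
    std::int32_t x;
    std::int32_t y;
    std::uint32_t width;
    std::uint32_t height;
};

// Fits a src_w x src_h frame into a dst_w x dst_h target, keeping the source
// aspect ratio and centering the result (letterbox or pillarbox).
inline Rect FitKeepingAspect(std::uint32_t src_w, std::uint32_t src_h,
                             std::uint32_t dst_w, std::uint32_t dst_h) {
    assert(src_w > 0 && src_h > 0);
    // src_w / src_h compared with dst_w / dst_h via cross-multiplication.
    const std::uint64_t lhs = std::uint64_t{src_w} * dst_h;
    const std::uint64_t rhs = std::uint64_t{dst_w} * src_h;
    std::uint32_t w = dst_w;
    std::uint32_t h = dst_h;
    if (lhs > rhs) {
        // Source is wider than the target: use full width, shrink the height.
        h = static_cast<std::uint32_t>(rhs / src_w);
    } else if (lhs < rhs) {
        // Source is narrower than the target: use full height, shrink the width.
        w = static_cast<std::uint32_t>(lhs / src_h);
    }
    return Rect{static_cast<std::int32_t>((dst_w - w) / 2),
                static_cast<std::int32_t>((dst_h - h) / 2), w, h};
}
```

Because the blit no longer covers the whole target, the uncovered bars would show stale contents unless the frame is cleared first, which is consistent with the "Clear frame before blitting" item.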