| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Move time services to new IPC and add debug printing
|
| |
| |
| |
| | |
Add some fixes/improvements to usage with the new IPC
|
|\ \
| |/
|/| |
Simplify VkResult lookup
|
| | |
|
| | |
|
|/ |
|
|\
| |
| | |
renderer_vulkan: Introduce separate cmd buffer for uploads
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
video_core: Move vma implementation to library
|
|\
| |
| | |
OpenGL: Make use of persistent buffer maps in buffer cache
|
| |
| |
| |
| |
| |
| | |
Persistent buffer maps were already used by the texture cache, this extends their usage for the buffer cache.
In my testing, using the memory maps for uploads was slower than the existing "ImmediateUpload" path, so the memory map usage is limited to downloads for the time being.
|
| | |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
video_core: Implement maxwell3d draw texture method
|
| | |
|
|\ \
| | |
| | | |
vulkan: implement 'turbo mode' clock booster
|
| |/ |
|
|/ |
|
| |
|
|
|
|
|
| |
Co-authored-by: goldenx86 <goldenx86@users.noreply.github.com>
Co-authored-by: BreadFish64 <breadfish64@users.noreply.github.com>
|
|\
| |
| | |
video_core: Implement maxwell3d draw manager and split draw logic
|
| | |
|
|/ |
|
|\
| |
| | |
externals: update dynarmic, SDL2
|
| | |
|
|\ \
| |/
|/| |
video_core: add null backend
|
| | |
|
|/ |
|
| |
|
| |
|
|
|
|
| |
These are already explicitly or implicitly set in src/CMakeLists.txt
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[REUSE] is a specification that aims at making file copyright
information consistent, so that it can be both human and machine
readable. It basically requires that all files have a header containing
copyright and licensing information. When this isn't possible, like
when dealing with binary assets, generated files or embedded third-party
dependencies, it is permitted to insert copyright information in the
`.reuse/dep5` file.
Oh, and it also requires that all the licenses used in the project are
present in the `LICENSES` folder, that's why the diff is so huge.
This can be done automatically with `reuse download --all`.
The `reuse` tool also contains a handy subcommand that analyzes the
project and tells whether or not the project is (still) compliant,
`reuse lint`.
Following REUSE has a few advantages over the current approach:
- Copyright information is easy to access for users / downstream
- Files like `dist/license.md` do not need to exist anymore, as
`.reuse/dep5` is used instead
- `reuse lint` makes it easy to ensure that copyright information of
files like binary assets / images is always accurate and up to date
To add copyright information of files that didn't have it I looked up
who committed what and when, for each file. As yuzu contributors do not
have to sign a CLA or similar I couldn't assume that copyright ownership
was of the "yuzu Emulator Project", so I used the name and/or email of
the commit author instead.
[REUSE]: https://reuse.software
Follow-up to 01cf05bc75b1e47beb08937439f3ed9339e7b254
|
|
|
|
| |
Now that the entire project is free of variable shadowing, we can enforce this as a compile time error to prevent any further introduction of this logic bug.
|
| |
|
|
|
|
| |
... to fix build on Flatpak (and self-builds)
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Adds {h264_,vp9_}{nvdec,vdpau} hwaccels.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* nvdec: VA-API
* Verify formatting
* Forgot a semicolon for Windows
* Clarify comment about AV_PIX_FMT_NV12
* Fix assert log spam from missing negation
* vic: Remove forgotten debug code
* Address lioncash's review
* Mention VA-API is Intel/AMD
* Address v1993's review
* Hopefully fix CMakeLists style this time
* vic: Improve cache locality
* vic: Fix off-by-one error
* codec: Async
* codec: Forgot the GetValue()
* nvdec: Address ameerj's review
* codec: Fallback to CPU without VA-API support
* cmake: Address lat9nq's review
* cmake: Make VA-API optional
* vaapi: Multiple GPU
* Apply suggestions from code review
Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
* nvdec: Address ameerj's review
* codec: Use anonymous instead of static
* nvdec: Remove enum and fix memory leak
* nvdec: Address ameerj's review
* codec: Remove preparation for threading
Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use VK_KHR_pipeline_executable_properties when enabled and available to
log statistics about the pipeline cache in a game.
For example, this is on Turing GPUs when generating a pipeline cache
from Super Smash Bros. Ultimate:
Average pipeline statistics
==========================================
Code size: 6433.167
Register count: 32.939
More advanced results could be presented, at the moment it's just an
average of all 3D and compute pipelines.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Move code to separate files to be able to reuse it from OpenGL. This
greatly simplifies the pipeline cache logic on Vulkan.
Transform feedback state is not yet abstracted and it's still
intrusively stored inside vk_pipeline_cache. It will be moved when
needed on OpenGL.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.
Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler
shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth
thread_worker: Include condition_variable
Don't use list initializers in control flow
Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Enforce implicit integer casts to a smaller type as errors.
|
|
|
|
|
|
|
| |
Users may want to fall back to the CPU ASTC texture decoder due to hangs
and crashes that may be caused by keeping the GPU under compute heavy
loads for extended periods of time. This is especially the case in games
such as Astral Chain which make extensive use of ASTC textures.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
|
|\
| |
| | |
cmake: FFmpeg linking rework
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Also renames related CMake variables to match both the Find*FFmpeg* and
variables defined within the file. Fixes odd errors produced by the old
FindFFmpeg.
Citra's FindFFmpeg is slightly modified here: adds Citra's copyright at
the beginning, renames FFmpeg_INCLUDES to FFmpeg_INCLUDE_DIR, disables a
few components in _FFmpeg_ALL_COMPONENTS, and adds the missing avutil
component to the comment above.
|
| |
| |
| |
| |
| |
| |
| |
| | |
For Linux, instructs CMake to use the FFmpeg submodule in externals.
This is HEAVILY based on our usage of the late Unicorn. Minimal change
to MSVC as it uses the yuzu-emu/ext-windows-bin. MinGW now targets the
same ext-windows-bin libraries as MSVC for FFmpeg. Adds FFMPEG_LIBRARIES
to WIN32 and simplifies video_core/CMakeLists.txt a bit.
|
|/
|
|
| |
moron.h & morton.cpp are not used anywhere and are just empty files
|
|
|
|
|
| |
Fix "message(ERROR ..." to "message(FATAL_ERROR ..." to properly stop
cmake when Nsight Aftermath can't be configured.
|
|\
| |
| | |
buffer_cache/buffer_base: Add a range tracking buffer container and tests
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It keeps track of the modified CPU and GPU ranges on a CPU page
granularity, notifying the given rasterizer about state changes
in the tracking behavior of the buffer.
Use a small vector optimization to store buffers smaller than 256 KiB
locally instead of using free store memory allocations.
|
| |
| |
| |
| | |
Allow using the abstraction from the OpenGL backend.
|
|/
|
|
| |
These flags are already defined in src/cmake.
|
| |
|
| |
|
|\
| |
| | |
vulkan_common: Move reusable Vulkan abstractions to a separate directory
|
| |
| |
| |
| |
| |
| | |
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
|
| |
| |
| |
| |
| |
| | |
Simplify Vulkan's backend initialization code by moving it to a separate
file, allowing us to initialize a Vulkan instance from different
backends.
|
| |
| |
| |
| | |
Allows sharing Vulkan wrapper code between different rendering backends.
|
| |
| |
| |
| |
| | |
Allows us to initialize a Vulkan dynamic library from different backends
without duplicating code.
|
|\ \
| |/
|/| |
Service threads
|
| |
| |
| |
| | |
- We must always use a GPU thread now, even with synchronous GPU.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
|
|/ |
|
|\
| |
| | |
video_core: Enforce C4715 (not all control paths return a value)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.
This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning signalling an error and stopping the build from shipping.
|
|/
|
|
|
| |
Removes the unnecesary burden of maintaining separate #ifdef paths and
allows us sharing generic Vulkan code across APIs.
|
|
|
|
|
| |
Cleans out the rest of the occurrences of variable shadowing and makes
any further occurrences of shadowing compiler errors.
|
|\
| |
| | |
video_core: Enforce -Werror=type-limits
|
| |
| |
| |
| | |
Silences one warning and avoids introducing more in the future.
|
|/
|
|
| |
Silence three warnings and make them errors to avoid introducing more in the future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
|
|
|
|
|
|
|
|
| |
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.
While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow use to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.
Taking advantange of this, we can have a monolithically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code and the free status of
resources should have less false negatives.
To workaround bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
|
|
|
|
| |
This forces us to fix all -Wswitch warnings in video_core.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the necessary CMake code to copy the contents in a string source
shader (GLSL or GLASM) to a header file then consumed by video_core
files.
This allows editting GLSL in its own files without having to maintain
them in source files.
For now, only OpenGL presentation shaders are moved, but we can add
GLASM presentation shaders and static SPIR-V generation through
glslangValidator in the future.
|
| |
|
|
|
|
|
|
|
|
| |
Add a flat table to test if it's legal to create a texture view between
two formats or copy betweem them.
This table is based on ARB_copy_image and ARB_texture_view. Copies are
more permissive than views.
|
| |
|
|\
| |
| | |
gl_arb_decompiler: Implement an assembly shader decompiler
|
| |
| |
| |
| |
| |
| | |
Emit code compatible with NV_gpu_program5.
This should emit code compatible with Fermi, but it wasn't tested on
that architecture. Pascal has some issues not present on Turing GPUs.
|
| |
| |
| |
| |
| | |
The rasterizer cache is no longer used. Each cache has its own generic
implementation optimized for the cached data.
|
|/
|
|
|
|
|
|
|
| |
Implement a generic shader cache for fast lookups and invalidations.
Invalidations are cheap but expensive when a shader is invalidated.
Use two mutexes instead of one to avoid locking invalidations for
lookups and vice versa. When a shader has to be removed, lookups are
locked as expected.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Drop the std::list hack to allocate memory indefinitely.
Instead use a custom allocator that keeps references valid until
destruction. This allocates fixed chunks of memory and puts pointers in
a free list. When an allocation is no longer used put it back to the
free list, this doesn't heap allocate because std::vector doesn't change
the capacity. If the free list is empty, allocate a new chunk.
|
|\
| |
| | |
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
|
| | |
|
|/
|
|
|
|
|
|
| |
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
|
|\
| |
| | |
Introduce Predictive Flushing and Improve ASYNC GPU
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds optional support for Nsight Aftermath. It is enabled through
ENABLE_NSIGHT_AFTERMATH in cmake. A path to the SDK has to be provided
by the environment variable NSIGHT_AFTERMATH_SDK.
Nsight Aftermath allows an application to generate "minidumps" of the
GPU state when a device loss happens. By analysing these on Nsight we
can know what a game was doing and why it triggered a device loss.
The dump is generated inside %APPDATA%\yuzu\log\gpucrash and this
directory is deleted every time a new instance is initialized with
Nsight enabled.
To enable it on yuzu there has a to be a driver and device capable of
running Nsight Aftermath on Vulkan. That means only Turing based GPUs
on the latest stable driver, beta drivers won't work for now.
It is manually enabled in Configuration>Debug>Enable Graphics Debugging
because when using all debugging capabilities there is a runtime cost.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reversed look up table extracted from
https://gist.github.com/rygorous/2203834#file-gistfile1-cpp-L41-L62
that is used in
https://github.com/devkitPro/deko3d/blob/04d4e9e587fa3dc5447b43d273bc45f440226e41/source/maxwell/tsc_generate.cpp#L38
Games usually bind 0xFD expecting a float texture border of 1.0f.
The conversion previous to this commit was multiplying the uint8 sRGB
texture border color by 255. This is close to 1.0f but when that
difference matters, some graphical glitches appear.
This look up table is manually changed in the edges, clamping towards
0.0f and 1.0f.
While we are at it, move this logic to its own translation unit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The intention behind a Vulkan wrapper is to drop Vulkan-Hpp.
The issues with Vulkan-Hpp are:
- Regular breaks of the API.
- Copy constructors that do the same as the aggregates (fixed recently)
- External dynamic dispatch that is hard to remove
- Alias KHR handles with non-KHR handles making it impossible to use
smart handles on Vulkan 1.0 instances with extensions that were included
on Vulkan 1.1.
- Dynamic dispatchers silently change size depending on preprocessor
definitions. Different files will have different dispatch definitions,
generating all kinds of hard to debug memory issues.
In other words, Vulkan-Hpp is not "production ready" for our needs and
this wrapper aims to replace it without losing RAII and exception
safety.
|
| |
|
| |
|
|
|
|
|
| |
Instead of pre-specializing shaders and then post-specializing them,
drop the later and only "specialize" the shader while decoding it.
|
| |
|
|
|
|
| |
Add support for render targets and viewports.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Abstract the current OpenGL implementation into the VideoCommon
namespace and reimplement it on top of that. Doing this avoids repeating
code and logic in the Vulkan implementation.
|
|
|
|
| |
Implements GL_SAMPLES_PASSED by waiting immediately for queries.
|
|\
| |
| | |
yuzu: Implement Vulkan frontend
|
| |
| |
| |
| |
| | |
Adds a Qt and SDL2 frontend for Vulkan. It also finishes the missing
bits on Vulkan initialization.
|
|/ |
|
|
|
|
|
| |
This abstraction takes care of presenting accelerated and
non-accelerated or "framebuffer" images to the Vulkan swapchain.
|
|
|
|
|
|
| |
This abstraction is Vulkan's equivalent to OpenGL's rasterizer. It takes
care of joining all parts of the backend and rendering accordingly on
demand.
|
| |
|
|
|
|
|
| |
It currently ignores PBO linearizations since these should be dropped as
soon as possible on OpenGL.
|
|
|
|
|
|
|
|
|
|
|
| |
This currently only supports quad arrays and u8 indices.
In the future we can remove quad arrays with a table written from the
CPU, but this was used to bootstrap the other passes helpers and it
was left in the code.
The blob code is generated from the "shaders/" directory. Read the
instructions there to know how to generate the SPIR-V.
|
| |
|
|
|
|
|
|
|
|
|
| |
This abstractio represents the state of the 3D engine at a given draw.
Instead of changing individual bits of the pipeline how it's done in
APIs like D3D11, OpenGL and NVN; on Vulkan we are forced to put
everything together into a single, immutable object.
It takes advantage of the few dynamic states Vulkan offers.
|
|
|
|
| |
This abstraction represents a Vulkan compute pipeline.
|
|
|
|
|
| |
This function allows us to share code between compute and graphics
pipelines compilation.
|
| |
|
|
|
|
|
| |
The renderpass cache is used to avoid creating renderpasses on each
draw. The hashed structure is not currently optimized.
|
|
|
|
|
|
|
|
|
| |
The update descriptor is used to store in flat memory a large chunk of
staging data used to update descriptor sets through templates. It
provides a push interface to easily insert descriptors following the
current pipeline. The order used in the descriptor update template has
to be implicitly followed. We can catch bugs here using validation
layers.
|
|\
| |
| | |
vk_descriptor_pool: Initial implementation
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Create a large descriptor pool where we allocate all our descriptors
from. It has to be wide enough to support any pipeline, hence its large
numbers.
If the descritor pool is filled, we allocate more memory at that moment.
This way we can take advantage of permissive drivers like Nvidia's that
allocate more descriptors than what the spec requires.
|
|/
|
|
|
| |
This was carried from Citra and wasn't really used on yuzu. It also adds
some runtime overhead. This commit removes it from yuzu's codebase.
|
|\
| |
| | |
vk_image: Add an image object abstraction
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This object's job is to contain an image and manage its transitions.
Since Nvidia hardware doesn't know what a transition is but Vulkan
requires them anyway, we have to state track image subresources
individually.
To avoid the overhead of tracking each subresource in images with many
subresources (think of cubemap arrays with several mipmaps), this commit
tracks when subresources have diverged. As long as this doesn't happen
we can check the state of the first subresource (that will be shared
with all subresources) and update accordingly.
Image transitions are deferred to the scheduler command buffer.
|
|/
|
|
|
|
|
| |
The job of this abstraction is to provide staging buffers for temporary
operations. Think of image uploads or buffer uploads to device memory.
It automatically deletes unused buffers.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The intention behind this hasheable structure is to describe the state
of fixed function pipeline state that gets compiled to a single graphics
pipeline state object. This is all dynamic state in OpenGL but Vulkan
wants it in an immutable state, even if hardware can edit it freely.
In this commit the structure is defined in an optimized state (it uses
booleans, has paddings and many data entries that can be packed to
single integers). This is intentional as an initial implementation that
is easier to debug, implement and review. It will be optimized in later
stages, or it might change if Vulkan gets more dynamic states.
|
| |
|
|
|
|
|
|
| |
Use a large flat array to look up texture formats. This allows us to
properly implement formats with different component types. It should
also be faster.
|
|
|
| |
Enable sign conversion warnings but don't treat them as errors.
|
| |
|
|
|
|
|
|
| |
Add an intermediary class that implements common functions across GPU
accelerated rasterizers. This avoids code repetition on different
backends.
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Implement a New LLE Buffer Cache
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
|
|\
| |
| | |
buffer_cache: Implement a generic buffer cache and its OpenGL backend
|
| |
| |
| |
| |
| |
| | |
Implements a templated class with a similar approach to our current
generic texture cache. It is designed to be compatible with Vulkan and
OpenGL,
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
video_core: Drop OpenGL core in favor of OpenGL compatibility
|
| | |
|
| |
| |
| |
| |
| |
| | |
Analysis passes do not have a good reason to depend on shader_ir.h to
work on top of nodes. This splits node-related declarations to their own
file and leaves the IR in shader_ir.h
|
|/
|
|
|
|
|
|
|
| |
Instead of having a vector of unique_ptr stored in a vector and
returning star pointers to this, use shared_ptr. While changing
initialization code, move it to a separate file when possible.
This is a first step to allow code analysis and node generation beyond
the ShaderIR class.
|
|\
| |
| | |
Corrections and Implementation on GPU Engines
|
| | |
|
|\ \
| |/
|/| |
gl_shader_decompiler: Disable variable AOFFI on unsupported devices
|
| | |
|
|\ \
| | |
| | | |
gl_sampler_cache: Port sampler cache to OpenGL
|
| | | |
|
| | | |
|
|\ \ \
| | | |
| | | | |
vk_shader_decompiler: Implement a SPIR-V decompiler
|
| | | | |
|
| | |/
| |/|
| | |
| | | |
sirit is a runtime assembler for SPIR-V
|
|\ \ \
| |/ /
|/| | |
video_core: Implement API agnostic view based texture cache
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Implements an API agnostic texture view based texture cache. Classes
defined here are intended to be inherited by the API implementation and
used in API-specific code.
This implementation exposes protected virtual functions to be called
from the implementer.
Before executing any surface copies methods (defined in API-specific code)
it tries to detect if the overlapping surface is a superset and if it
is, it creates a view. Views are references of a subset of a surface, it
can be a superset view (the same as referencing the whole texture).
Current code manages 1D, 1D array, 2D, 2D array, cube maps and cube map
arrays with layer and mipmap level views. Texture 3D slices views are
not implemented.
If the view attempt fails, the fast path is invoked with the overlapping
textures (defined in the implementer). If that one fails (returning
nullptr) it will flush and reload the texture.
|
|\ \
| | |
| | | |
Better LZ4 compression utilization for the disk based shader cache and the yuzu build system
|
| |/ |
|
|/ |
|
| |
|
|\
| |
| | |
shader_ir: Remove "extras" from the MetaTexture
|
| | |
|
|\ \
| | |
| | | |
maxwell_to_vk: Initial implementation
|
| | | |
|
|\ \ \
| | | |
| | | | |
Asynchronous GPU command processing
|
| | | | |
|
| |/ / |
|
|\ \ \
| |/ /
|/| | |
gl_rasterizer_cache: Move format conversion functions to their own file
|
| |/ |
|
| |
| |
| |
| |
| | |
This buffer cache is just like OpenGL's buffer cache with some minor
style changes. It uses VKStreamBuffer.
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
|
|
|
|
|
| |
A memory manager object handles the memory allocations for a device. It
allocates chunks of Vulkan memory objects and then suballocates.
|
|
|
|
|
| |
VKResource is an interface that gets signaled by a fence when it is free
to be reused.
|
|\
| |
| | |
vulkan: Add dependencies and device abstraction
|
| |
| |
| |
| |
| |
| |
| | |
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
|
| |
| |
| |
| |
| |
| |
| | |
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).
|
|/
|
|
|
|
|
|
|
|
| |
When I originally added the compute assert I used the wrong
documentation. This addresses that.
The dispatch register was tested with homebrew against hardware and is
triggered by some games (e.g. Super Mario Odyssey). What exactly is
missing to get a valid program bound by this engine requires more
investigation.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
- More accurate impl., fixes Undertale (among other games).
|
| |
|
|
|
|
| |
Ensures that destruction will always do the right thing in any context.
|
|
|
|
|
| |
Those implementations are quite costly, so there is no need to inline them to the caller.
Ressource deletion is often a performance bug, so in this way, we support to add breakpoints to them.
|
| |
|
| |
|
| |
|
|\
| |
| | |
Implemented (Partialy) Shader Header
|
| | |
|
|/
|
|
| |
This engine writes data from a FIFO register into the configured address.
|
|
|
|
|
| |
Without this, the header file won't show up by default within IDEs such
as Visual Studio.
|
|
|
|
|
|
|
|
|
| |
The idea of this cache is to avoid redundant uploads. So we are going
to cache the uploaded buffers within the stream_buffer and just reuse
the old pointers.
The next step is to implement a VBO cache on GPU memory, but for now,
I want to check the overhead of the cache management. Fetching the
buffer over PCI-E should be quite fast.
|
| |
|
| |
|
| |
|
|
|
|
| |
Only tiled->linear and linear->tiled copies that aren't offsetted are supported for now. Queries are not supported. Swizzled copies are not supported.
|
| |
|
| |
|
|
|
|
|
|
| |
The Ryujinx macro interpreter and envydis were used as reference.
Macros are programs that are uploaded by the games during boot and can later be called by writing to their method id in a GPU command buffer.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
This should reduce recompile times when editing the Maxwell3D register structure.
|
|
|
|
| |
Also moved the GPU MemoryManager class to video_core since it makes more sense for it to be there.
|
| |
|
|
|
|
|
| |
Removes the need to store to separate SRC and HEADER variables, and then
construct the target in most cases.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The geometry pipeline manages data transfer between VS, GS and primitive assembler. It has known four modes:
- no GS mode: sends VS output directly to the primitive assembler (what citra currently does)
- GS mode 0: sends VS output to GS input registers, and sends GS output to primitive assembler
- GS mode 1: sends VS output to GS uniform registers, and sends GS output to primitive assembler. It also takes an index from the index buffer at the beginning of each primitive for determine the primitive size.
- GS mode 2: similar to mode 1, but doesn't take the index and uses a fixed primitive size.
hwtest shows that immediate mode also supports GS (at least for mode 0), so the geometry pipeline gets refactored into its own class for supporting both drawing mode.
In the immediate mode, some games don't set the pipeline registers to a valid value until the first attribute input, so a geometry pipeline reset flag is set in `pipeline.vs_default_attributes_setup.index` trigger, and the actual pipeline reconfigure is triggered in the first attribute input.
In the normal drawing mode with index buffer, the vertex cache is a little bit modified to support the geometry pipeline. Instead of OutputVertex, it now holds AttributeBuffer, which is the input to the geometry pipeline. The AttributeBuffer->OutputVertex conversion is done inside the pipeline vertex handler. The actual hardware vertex cache is believed to be implemented in a similar way (because this is the only way that makes sense).
Both geometry pipeline and GS unit rely on states preservation across drawing call, so they are put into the global state. In the future, the other three vertex shader units should be also placed in the global state, and a scheduler should be implemented on top of the four units. Note that the current gs_unit already allows running VS on it in the future.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Modules didn't correctly define their dependencies before, which relied
on the frontends implicitly including every module for linking to
succeed.
Also changed every target_link_libraries call to specify visibility of
dependencies to avoid leaking definitions to dependents when not
necessary.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Preparation for a similar concept to Dolphin or PPSSPP. These can be JIT-ed and cached.
|
| |
|
| |
|
|
|
|
|
|
| |
This removes explicit checks sprinkled all over the codebase to instead
just have the SW rasterizer expose an implementation with no-ops for
most operations.
|
| |
|
|
|
|
|
|
| |
The main advantage of switching to glad from glLoadGen is that, apart
from being actively maintained, it supports a customizable entrypoint
loader function, which makes it possible to also support OpenGL ES.
|
| |
|
| |
|
|
|
|
|
| |
- Config: Add an option for selecting to use shader JIT or interpreter.
- Qt: Add a menu option for enabling/disabling the shader JIT.
|
| |
|
| |
|
|
|
|
| |
- Also renames "vertex_shader.*" to "shader_interpreter.*"
|
|
|
|
|
| |
The functions are so simple that having them separate only bloats the
code and hinders optimization.
|
| |
|
|
|
|
| |
The file only contained vector manipulation code, and such widely-useable code doesn't belong in video_core.
|
| |
|
| |
|
|
|
|
|
|
| |
- Centralizes color format encode/decode functions.
- Fixes endianness issues.
- Implements remaining framebuffer formats in the debugger.
|
|
|
|
|
|
|
|
| |
Several cleanups to the buildsystem:
- Do better factoring of common libs between platforms.
- Add support to building on Windows.
- Remove Qt4 support.
- Re-sort file lists and add missing headers.
|
|
|
|
|
|
|
|
|
| |
This should fix the GL loading errors that occur in some drivers due to
the use of deprecated functions by GLEW. Side benefits are more accurate
auto-completion (deprecated function and symbols don't exist) and faster
pointer loading (less entrypoints to load). In addition it removes an
external library depency, simplifying the build system a bit and
eliminating one set of binary libraries for Windows.
|
|
|
|
|
|
| |
Screen contents are now displayed using textured quads. This can be updated to expose an FBO once an OpenGL backend for when Pica rendering is being worked on. That FBO's texture can then be applied to the quads.
Previously, FBO blitting was used in order to display screen contents, which did not work on OS X. The new textured quad approach is less of a compatibility risk.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
I wrote most of this for ppsspp, so I hold full copyright over it.
In addition to the original release in ppsspp, this provides functionality to easily extend e.g. two-dimensional vectors to three-dimensional vectors.
|
|
|
|
| |
Changes for clarity of comments, removed redundant compiler flags.
|
| |
|
| |
|
| |
|
|
|