Vulkan Renderer
A high-performance rendering engine built with the Vulkan API, featuring PBR materials, advanced shadow techniques, and a custom shader pipeline.
Overview
This Vulkan-based rendering engine is designed for high-performance real-time graphics applications. It provides a flexible and extensible architecture that supports modern rendering techniques while maintaining excellent performance across different hardware configurations.
Implements a complete PBR pipeline with metallic-roughness workflow, image-based lighting, and energy conservation.
Supports cascaded shadow mapping, percentage-closer filtering, and variance shadow maps for high-quality shadows.
Flexible shader system with hot-reloading, permutation management, and automatic resource binding.
Utilizes multiple CPU cores for command buffer generation, resource uploads, and scene management.
Last Updated
3 months ago
Implementation
Architecture
The renderer is built on a modular architecture with clear separation between the Vulkan abstraction layer, resource management, and rendering systems. This design allows for easy extension and maintenance while providing a clean API for client applications.
Vulkan Integration
The Vulkan API is wrapped in a thin abstraction layer that simplifies common operations while maintaining full access to Vulkan's features. This includes utilities for device selection, queue management, synchronization, and resource creation.
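As an illustration of what a device-selection utility in such a layer might do, here is a minimal sketch that ranks candidate GPUs. It is not the renderer's actual code: `DeviceCandidate`, `supportsAll`, and `pickDevice` are hypothetical names, the struct is a plain stand-in for data normally queried through `vkGetPhysicalDeviceProperties` and `vkEnumerateDeviceExtensionProperties`, and the scoring weights are arbitrary assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one physical device. A real renderer would
// fill this from the Vulkan query functions; no Vulkan headers are used here.
struct DeviceCandidate {
    std::string name;
    bool isDiscrete = false;
    std::uint64_t deviceLocalBytes = 0;
    std::vector<std::string> extensions;
};

// True if the candidate advertises every required extension.
bool supportsAll(const DeviceCandidate& device,
                 const std::vector<std::string>& required) {
    return std::all_of(required.begin(), required.end(),
                       [&](const std::string& ext) {
                           return std::find(device.extensions.begin(),
                                            device.extensions.end(),
                                            ext) != device.extensions.end();
                       });
}

// Returns the index of the highest-scoring suitable device, or nullopt if no
// device supports all required extensions.
std::optional<std::size_t> pickDevice(const std::vector<DeviceCandidate>& devices,
                                      const std::vector<std::string>& required) {
    std::optional<std::size_t> best;
    std::uint64_t bestScore = 0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (!supportsAll(devices[i], required)) continue;
        // Assumed heuristic: device-local memory in MiB, plus a large bonus
        // for discrete GPUs so they outrank integrated ones.
        std::uint64_t score = devices[i].deviceLocalBytes >> 20;
        if (devices[i].isDiscrete) score += 1000000;
        if (!best || score > bestScore) {
            best = i;
            bestScore = score;
        }
    }
    return best;
}
```

Keeping the ranking separate from the Vulkan queries makes the policy easy to unit-test without a GPU present.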
Resource Management
Resources like textures, buffers, and pipelines are managed through a centralized system that handles allocation, deallocation, and state tracking. This includes a custom memory allocator that efficiently manages Vulkan memory heaps and a descriptor set manager for optimal binding.
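To make the idea of a custom allocator concrete, the following sketch sub-allocates offsets out of one large memory block using a first-fit free list with merging of adjacent free ranges. This is one common strategy, not necessarily the one used here; `BlockSubAllocator` and its methods are hypothetical names, and the actual `vkAllocateMemory` and bind calls are omitted so that only the offset bookkeeping is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical sub-allocator for a single large device-memory block.
// It tracks offsets only; no Vulkan calls are made.
class BlockSubAllocator {
public:
    explicit BlockSubAllocator(std::uint64_t size) { m_free[0] = size; }

    // First-fit allocation. `alignment` must be a power of two.
    std::optional<std::uint64_t> allocate(std::uint64_t size,
                                          std::uint64_t alignment) {
        assert(size > 0 && alignment != 0 && (alignment & (alignment - 1)) == 0);
        for (auto it = m_free.begin(); it != m_free.end(); ++it) {
            const std::uint64_t start = it->first;
            const std::uint64_t end = start + it->second;
            const std::uint64_t aligned = (start + alignment - 1) & ~(alignment - 1);
            if (aligned >= end || end - aligned < size) continue;
            m_free.erase(it);
            // Keep the alignment padding and the unused tail as free ranges.
            if (aligned > start) m_free[start] = aligned - start;
            if (aligned + size < end) m_free[aligned + size] = end - (aligned + size);
            m_used[aligned] = size;
            return aligned;
        }
        return std::nullopt;  // No free range is large enough.
    }

    // Returns a range to the free list and merges it with adjacent free ranges.
    void release(std::uint64_t offset) {
        auto used = m_used.find(offset);
        assert(used != m_used.end());
        std::uint64_t start = offset;
        std::uint64_t size = used->second;
        m_used.erase(used);

        auto next = m_free.lower_bound(start);
        if (next != m_free.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == start) {  // Merge with left neighbor.
                start = prev->first;
                size += prev->second;
                m_free.erase(prev);
            }
        }
        if (next != m_free.end() && next->first == start + size) {  // Merge right.
            size += next->second;
            m_free.erase(next);
        }
        m_free[start] = size;
    }

    std::uint64_t freeBytes() const {
        std::uint64_t total = 0;
        for (const auto& range : m_free) total += range.second;
        return total;
    }

private:
    std::map<std::uint64_t, std::uint64_t> m_free;  // offset -> size
    std::map<std::uint64_t, std::uint64_t> m_used;  // offset -> size
};
```

Allocating many small resources from a few large blocks this way avoids hitting the driver's limit on the number of separate memory allocations.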
Rendering Pipeline
The rendering pipeline supports both forward and deferred rendering paths with a flexible material system. Post-processing effects are implemented using a composable graph-based approach that allows for easy addition and configuration of effects.
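One way a graph-based post-processing chain can be ordered is a topological sort over the images each pass reads and writes. The sketch below assumes that model; `PostPass` and `schedule` are hypothetical names, and images that no pass produces (such as the scene color target) are treated as external inputs.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical description of one post-processing pass.
struct PostPass {
    std::string name;
    std::vector<std::string> reads;  // names of images consumed
    std::string writes;              // name of the image produced
};

// Returns pass names in an order where every pass runs after the producers of
// its inputs (Kahn's algorithm). Throws std::runtime_error on a cycle.
std::vector<std::string> schedule(const std::vector<PostPass>& passes) {
    std::unordered_map<std::string, std::size_t> producer;
    for (std::size_t i = 0; i < passes.size(); ++i) producer[passes[i].writes] = i;

    std::vector<std::vector<std::size_t>> consumers(passes.size());
    std::vector<std::size_t> pending(passes.size(), 0);
    for (std::size_t i = 0; i < passes.size(); ++i) {
        for (const std::string& input : passes[i].reads) {
            auto it = producer.find(input);
            if (it == producer.end()) continue;  // external input, no dependency
            consumers[it->second].push_back(i);
            ++pending[i];
        }
    }

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        if (pending[i] == 0) ready.push(i);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::size_t current = ready.front();
        ready.pop();
        order.push_back(passes[current].name);
        for (std::size_t consumer : consumers[current]) {
            if (--pending[consumer] == 0) ready.push(consumer);
        }
    }
    if (order.size() != passes.size()) {
        throw std::runtime_error("post-processing graph contains a cycle");
    }
    return order;
}
```

With this structure, adding an effect only means declaring what it reads and writes; the execution order follows from the declarations.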
Challenges & Solutions
Challenge
Vulkan's explicit synchronization model requires careful management of resource dependencies and command execution order. A missing barrier can introduce data hazards, while redundant barriers stall the GPU unnecessarily.
Solution
Developed a dependency tracking system that analyzes resource access patterns and automatically inserts the minimal set of barriers required for correct execution.
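The core idea can be sketched with a tracker that remembers the last access to each resource and emits a barrier only when a hazard exists. This is a deliberately reduced model under stated assumptions: `Access`, `Barrier`, and `BarrierTracker` are hypothetical names, and a real tracker would also record pipeline stages and image layouts before translating each entry into a `vkCmdPipelineBarrier` call.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class Access { None, Read, Write };

// Hypothetical record of one required barrier.
struct Barrier {
    std::uint32_t resource;
    Access before;
    Access after;
};

class BarrierTracker {
public:
    // Records an access and returns true if a barrier had to be emitted.
    // Hazards: any access after a write (RAW, WAW) and a write after a read
    // (WAR). Read-after-read and first use need no barrier.
    bool use(std::uint32_t resource, Access access) {
        Access& last = m_last[resource];  // value-initialized to Access::None
        const bool hazard =
            last == Access::Write ||
            (last == Access::Read && access == Access::Write);
        if (hazard) m_barriers.push_back({resource, last, access});
        last = access;
        return hazard;
    }

    const std::vector<Barrier>& barriers() const { return m_barriers; }

private:
    std::unordered_map<std::uint32_t, Access> m_last;
    std::vector<Barrier> m_barriers;
};
```

Because consecutive reads are skipped, a texture sampled by several passes in a row costs one barrier after it is written rather than one per pass.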
Challenge
Supporting various material features and rendering techniques led to a combinatorial explosion of shader variants, which increased compilation time and binary size.
Solution
Implemented a shader permutation system with runtime code generation and caching. Only the combinations actually used are compiled, and common code is shared between variants.
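A minimal sketch of compile-on-demand permutations might key each variant by a feature bitmask and cache the result. The feature flags, the `#define` names, and the `PermutationCache` class are all hypothetical, the compiled output is represented by a `std::string` rather than a SPIR-V binary, and the sketch assumes one cache per shader source.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical feature flags; each bit toggles one #define in the shader.
enum ShaderFeature : std::uint32_t {
    kNormalMap = 1u << 0,
    kAlphaTest = 1u << 1,
    kSkinning  = 1u << 2,
};

// Generates the preprocessor preamble for a feature combination.
std::string buildDefines(std::uint32_t features) {
    std::string out;
    if (features & kNormalMap) out += "#define USE_NORMAL_MAP 1\n";
    if (features & kAlphaTest) out += "#define USE_ALPHA_TEST 1\n";
    if (features & kSkinning)  out += "#define USE_SKINNING 1\n";
    return out;
}

// Caches one compiled variant per feature mask for a single shader source.
class PermutationCache {
public:
    // Stand-in for a real compiler invocation (assumption).
    using Compiler = std::function<std::string(const std::string& source)>;

    explicit PermutationCache(Compiler compile) : m_compile(std::move(compile)) {}

    // Compiles the variant on first request and reuses it afterwards.
    const std::string& get(std::uint32_t features, const std::string& body) {
        auto it = m_variants.find(features);
        if (it == m_variants.end()) {
            it = m_variants.emplace(features,
                                    m_compile(buildDefines(features) + body)).first;
        }
        return it->second;
    }

    std::size_t size() const { return m_variants.size(); }

private:
    Compiler m_compile;
    std::unordered_map<std::uint32_t, std::string> m_variants;
};
```

With three flags there are eight possible variants, but the cache only ever holds the masks that materials actually request.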
Challenge
Ensuring consistent behavior across different GPU vendors and driver versions was challenging due to varying interpretations of the Vulkan specification and driver bugs.
Solution
Created a comprehensive validation suite that tests all renderer features across different hardware configurations. Implemented workarounds for known driver issues that are conditionally enabled based on vendor and driver version detection.
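Conditional workarounds of this kind can be driven by a small rule table matched against the vendor ID and driver version reported in `VkPhysicalDeviceProperties`. The vendor IDs below are the standard PCI IDs; everything else, including the workaround flags, the version ranges, and the `selectWorkarounds` function, is a hypothetical placeholder. Note that the encoding of `driverVersion` is vendor-specific, so the sketch assumes raw values are only compared within a single vendor.

```cpp
#include <cstdint>
#include <vector>

// PCI vendor IDs as reported in VkPhysicalDeviceProperties::vendorID.
constexpr std::uint32_t kVendorAMD    = 0x1002;
constexpr std::uint32_t kVendorNVIDIA = 0x10DE;
constexpr std::uint32_t kVendorIntel  = 0x8086;

// Hypothetical workaround flags, invented for illustration only.
enum Workaround : std::uint32_t {
    kDisableAsyncCompute   = 1u << 0,
    kForceFullBarriers     = 1u << 1,
    kAvoidPushDescriptors  = 1u << 2,
};

// One rule: enable `flags` for a vendor within an inclusive driver range.
struct WorkaroundRule {
    std::uint32_t vendorId;
    std::uint32_t minDriver;
    std::uint32_t maxDriver;
    std::uint32_t flags;
};

// Returns the union of flags from every rule matching the given device.
std::uint32_t selectWorkarounds(const std::vector<WorkaroundRule>& rules,
                                std::uint32_t vendorId,
                                std::uint32_t driverVersion) {
    std::uint32_t enabled = 0;
    for (const WorkaroundRule& rule : rules) {
        if (rule.vendorId == vendorId &&
            driverVersion >= rule.minDriver &&
            driverVersion <= rule.maxDriver) {
            enabled |= rule.flags;
        }
    }
    return enabled;
}
```

Keeping the rules in data rather than scattered `if` statements makes it straightforward to retire a workaround once a fixed driver version is known.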
Performance
Classic Sponza scene with 262K triangles, PBR materials, and dynamic lighting
Dense forest environment with 1.2M triangles and volumetric lighting
Urban environment with 3.5M triangles, reflections, and global illumination
- Hierarchical depth buffer for early fragment culling
- Asynchronous compute for post-processing effects
- Mesh clustering for efficient GPU culling
- Texture streaming with priority-based loading
- Shader hot reloading for rapid iteration
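As an example of one item from the list above, priority-based texture streaming can be sketched as a max-heap of load requests drained under a per-frame budget. `StreamRequest`, `ByPriority`, and `StreamingQueue` are hypothetical names, and how the priority value is derived (for instance from screen coverage or distance) is left as an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request to load one mip level of a texture.
struct StreamRequest {
    std::string texture;
    std::uint32_t mip;
    float priority;  // higher means more urgent (derivation is an assumption)
};

// Orders the heap so the highest-priority request is on top.
struct ByPriority {
    bool operator()(const StreamRequest& a, const StreamRequest& b) const {
        return a.priority < b.priority;
    }
};

class StreamingQueue {
public:
    void request(StreamRequest r) { m_queue.push(std::move(r)); }

    // Pops up to `budget` requests, most urgent first. The rest stay queued
    // for a later frame.
    std::vector<StreamRequest> nextBatch(std::size_t budget) {
        std::vector<StreamRequest> batch;
        while (!m_queue.empty() && batch.size() < budget) {
            batch.push_back(m_queue.top());
            m_queue.pop();
        }
        return batch;
    }

    std::size_t pending() const { return m_queue.size(); }

private:
    std::priority_queue<StreamRequest, std::vector<StreamRequest>, ByPriority> m_queue;
};
```

Capping the batch size bounds the upload work done per frame, so a burst of requests degrades texture quality briefly instead of causing a frame-time spike.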
Code Snippets
    VulkanDevice::VulkanDevice(const DeviceCreateInfo& createInfo) {
        // Select physical device
        m_physicalDevice = selectPhysicalDevice(createInfo.instance, createInfo.requiredExtensions);

        // Query queue family properties
        uint32_t queueFamilyCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(m_physicalDevice, &queueFamilyCount, nullptr);
        std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
        vkGetPhysicalDeviceQueueFamilyProperties(m_physicalDevice, &queueFamilyCount, queueFamilies.data());

        // Find queue families that support graphics, compute, and transfer operations
        QueueFamilyIndices indices = findQueueFamilies(queueFamilies, createInfo.surface);

        // Create logical device with requested queues and extensions.
        // The set removes duplicates when one family serves several roles.
        std::vector<VkDeviceQueueCreateInfo> queueCreateInfos;
        std::set<uint32_t> uniqueQueueFamilies = {
            indices.graphicsFamily.value(),
            indices.computeFamily.value(),
            indices.transferFamily.value()
        };

        float queuePriority = 1.0f;
        for (uint32_t queueFamily : uniqueQueueFamilies) {
            VkDeviceQueueCreateInfo queueCreateInfo{};
            queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
            queueCreateInfo.queueFamilyIndex = queueFamily;
            queueCreateInfo.queueCount = 1;
            queueCreateInfo.pQueuePriorities = &queuePriority;
            queueCreateInfos.push_back(queueCreateInfo);
        }

        VkDeviceCreateInfo deviceCreateInfo{};
        deviceCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        deviceCreateInfo.pQueueCreateInfos = queueCreateInfos.data();
        deviceCreateInfo.queueCreateInfoCount = static_cast<uint32_t>(queueCreateInfos.size());
        deviceCreateInfo.pEnabledFeatures = &createInfo.enabledFeatures;
        deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(createInfo.requiredExtensions.size());
        deviceCreateInfo.ppEnabledExtensionNames = createInfo.requiredExtensions.data();

        // Create the logical device
        VK_CHECK(vkCreateDevice(m_physicalDevice, &deviceCreateInfo, nullptr, &m_device));

        // Get queue handles
        vkGetDeviceQueue(m_device, indices.graphicsFamily.value(), 0, &m_graphicsQueue);
        vkGetDeviceQueue(m_device, indices.computeFamily.value(), 0, &m_computeQueue);
        vkGetDeviceQueue(m_device, indices.transferFamily.value(), 0, &m_transferQueue);

        // Initialize memory allocator
        initializeMemoryAllocator();
    }
Related Projects
Ray Tracing Framework
A real-time ray tracing framework with BVH acceleration structures.