Note: If you want to look at this project's code, as well as the REAMDE which details the optimizations, you can find that here. This blog post moreso covers the process that I went through while working on this project. You could think of this as a post-mortem report, but I view it also as a guide for how to get more out of your CPU from your C++ program.
Extra thanks to Mr. Shriley for giving this post a proof read.
Right when I was fresh out of college, I was in the depth of my "Nim binge". I was looking to try a second attempt at writing a ray tracer after my so-so attempt back in a Global Illumination class. After a quick search on Amazon for "ray tracing" I found the Peter Shirley "Ray Tracing in one Weekend", "... The Next Week", and "... The Rest of your Life" mini books. At $3 a pop I thought it was a fair thing to take a look at. As an exercise to better learn the Nim language, I went through these books but used Nim instead of C++. Looking back at my first review of the book series, I feel as if I sounded a little harsh, but I really did have a pleasant time. I had some high hopes that my Nim version was going to perform faster than the book's code though it didn't. In fact, book no. 2 was much more woefully slow than the reference C++.
Now throughout the past 4-ish years, I've been seeing pictures from this book pop up here and there. Especially book 1's final scene. These books are now free to read online. I kind of now know what it feels like to purchase a game at release, only to see it go free-to-play a short while later. I think it's good that this introductory resource is now available to all. The HTML format is much better than the Kindle eBook in my opinion.
With the popularity of ray tracing exploding recently (thanks to hardware acceleration) I've only run across this book even more! A few months back I was itching to start a CG project. So I thought to myself "Why don't I revisit those ray tracing books, but this time do it in C++ 17. And try to optimize it as much as possible? Let's see if I can beat the book this time!" I chose this because I have been a little lax on learning the new(ish) C++17 features. I also wanted to see how far I could push a CPU bound renderer.
Here were some goals & restraints:
Write modern, clean, standard C++ 17 code
- Needs to compile on Windows, Mac & Linux, under GCC & Clang
Should be as vanilla as possible
- Two exceptions are single-header/static libraries (e.g PCG32), and one Boost library. Single header libs typically are pure C++ themselves and Boost is a defacto standard library for C++ anyways
Give the code a nice, cleaner project architecture
- The books' original project structure is kinda messy to be honest
- I still have the keep the general architecture of the ray tracing operations itself, but I'm free to rename and re-organize things as I see fit
Have it perform better than the books' implementation
- But add compilation (or runtime flags) to compare the book's methods with my own
Add some extra features to the ray tracer
- Be able to reproduce every scene in the book, and deterministically
- Mutli-threading provided by
- I wasn't allowed to add any new rendering techniques that were beyond the scope of the book. E.g. No adaptive sampling. Threading is allowed since I can turn it off, as to compare the performance of my code vs. the books'. It's not really fair to compare Adaptive sampling vs. No adaptive sampling.
Books 1 & 2: Ray Tracing in One Weekend, and The Next Week
Setting out, it was pretty simple what I would do here. Read a section of the book, copy over the code, see if it worked, then continue on if so. While going through each section I would try to consider if there was a more performant way that the code could be written. Sometimes this would involve simply reordering lines of code, so that the compiler could do auto-vectorization. Other times, I would ponder if there was a more efficient algorithm.
A simple to follow example here would be the alternative
*Rect::hit() methods (take
for reference, the Book's code has this structure:
- Do Math (part A)
- Branch if A's math is bad (by doing math to check if so)
- Do more math (part B)
- Branch if B's math is bad
- Do even more math (part C)
- Store results (part C) in variables
If you want to speed up your program, one of the best ways to do this is reducing the number of branches. Try to put
similar sections together. My code has the following structure for the
- Do Math (parts A, B, & C together)
- Branch if math is bad (either A or B)
- Store the computed math (from C) if it's good
Compilers are pretty good at recognizing parts of your code that could benefit from auto vectorization. But putting all of the math operations together in one section gives the compiler better hints on how to solve these tasks much more efficiently. Reducing the possibilities of branches also helps as well.
Another great example of this comes from the
The books' solution is chock-full with branches. The method I used (not 100% my own creation) eliminates the vast
majority of the branching and keeps similar computations close together so that auto-vectorization can be achieved.
If you think you have something that's faster, the best way is to prove it is by measuring. And the best way to test this is by setting up a long render (e.g. 5 minutes). Don't forget to run it a few times, in order to make sure the renders complete within the same general time frame (with five minutes, it's okay to be off by a second or two). After that, you swap your changes and see if it shaves off a significant portion; which must be consistent through multiple runs.
Sometimes performance boosts from these ways could be quite significant (e.g. 8-15%), other times, they could be really-really tiny (e.g. 1-2%). For example, if you shave 10 seconds off of a 5 minute render time, that's only 3%. It can be a little difficult to conclude if a change truly saves on rendering time. So then that usually involves doing renders that would normally take upwards of 30 minutes, only to see if you still get that 3% render time improvement. You need to make sure that your computer is not running any other processes at the time too.
And another important method of testing is to also verify any code changes on different hardware too. For example, sometimes on a Gen 7 Intel chip I would get a 30% speedup! But then on Gen 9 it was only 10% (still good). Then on a Gen 10 would maybe give me only mere 2%; I'd still take that.
I had a few optimizations that were in the ~1% area. These are the hardest to prove if there was any actual change on the rendering performance or not. This is where things start to get into the microbenching realm. Iit gets much more difficult to measure accurately. Environmental conditions can even start to affect measurements. I'm not talking about what operating system you're running on, but the actual temperature of your hardware. This page gives good detail on the relationship between heat and speed. Another way to test any micro optimizations is by taking the 1% changes and trying them out together. See if the sum of their parts makes a significant boost.
While running after all of these little improvements, I was reminded of Nicholas Omrod's 2016 CppCon presentation about
small string optimizations at Facebook. After a lot of work, they were able to get a custom
implementation that was 1% more efficient. For them, that can be a very big deal. Though to your average company, that
might not be so enthralling to spend time on. I can't remember the exact words, but some other wisdom was given in that
talk: "For every small change we make, it adds up; and eventually, we make a big leap."
A very important tool that I cannot forget to mention is Matt Godbolt's Compiler Explorer. Those of you in C++ circles have definitely seen this before. For those of you outside of them, this tool lets you look at the generated assembly code for any given C/C++ snippet. With this, you can see if any C++ code rewriting/reordering would generate more efficient CPU code. The compiler explorer can also help you search for micro optimizations. Which as stated before, can be a little hard to measure with purely time lapping alone. I used the compiler explorer to see if there was a way to rewrite code that would reduce branching, use vectorized instructions or even reduce the amount of generated assembly.
I do want to note that in general reducing the amount of instructions a program has to run through doesn't always mean that it will run faster. For example, take a loop that has 100 iterations. If it were to be unrolled by the compiler, it would generate more assembly in the final executable. That unrolled loop will run faster since the loop no longer needs to check 100 times if the iteration is done. This is why we always measure our code changes!
One of the other key problems here was ensuring that my renders were always deterministic. Meaning, given the same inputs (resolution, samples-per-pixel, scene setup, etc.), the output render should be exactly the same. If I re-rendered with more or less cores, it should be the same as well.
The RNG controls where a Ray is shot. When the ray hits an object it could be bounced into millions of directions. Maybe 1/2 those possibilities will send the ray into the sky (where next to no objects are), and the other half could send it into a hall of mirrors filled with diamonds (an unlimited no. of bounces). A small tweak in the RNG could bias it (ever so slightly) into one of those areas more than the other. And if the hall of mirrors scene was set up by another RNG, any changes to that will also change the scene quite a bit, thus also changing the render time.
For example, the final scene of book 2 had three components that rely on the RNG. The floor (a bunch of boxes of
varying heights), a "cloud" of spheres, and the BVH node structure. I tested out an optimization for the
Box object that required the RNG. Rendering the cornell box example was about 6% faster. But when rendering out the
aforementioned final scene it was 15% slower... I noticed that all of the floor boxes and "sphere cloud" were
placed differently with the optimization on/off. At first I thought that couldn't be the issue. But when I used two
separate RNGs (one for controlling the layout of the scene, the other for the Box optimization). Not only did I get
back my original scene layout, I also got that perf boost I saw from the Cornell Box scene.
Let's take two different renders of that final scene, but for the first image, I set the RNG to be "
for the second it's "
0123456789". These were rendered a few times over (to get a good average). The above
rendered in an average of 973.0 seconds. The lower took an average of 1021.1 seconds. While that not seem like much, changing the RNG's
seed made it render 5% slower!
I tried to make it when toggling on/off my optimizations, the resulting images would be the same. But there are some cases in which this ideal was bent a little. To be more specific, I'm talking about the trig approximations. If you're making a flight control system or a spacecraft, you want to be damn sure that all of your mathematical formulas are correct; but when it comes to graphics, we can fudge things if they fool the user. A.k.a the "eh... looks good enough" guideline.
Another good example here is that of the approximations for
atan2(). For texturing spheres, the
difference is barely noticeable, but the speed boost was impactful. It's very unlikely that without a comparison that
flips between the two images quickly, no one would notice the difference! Though if we were to have a much higher
detailed texture, and be zoomed in much closer to any of the trouble points (e.g having only the UK & Ireland in
view), it's more likely a viewer might see something odd.
While the approximation optimization doesn't produce the exact same image. I guarantee you if you showed one of these renders to a person for a minute, told them to look away, then showed them the other, they would tell you it's the exact same picture. If you can get a faster render and don't need it to be mathematically accurate, approximations are great!
Not all attempts at trying to squeeze more performance were successful. I'm sure a lot of us have heard about the
famous fast inverse square root trick that was used in Quake.
I was wondering if there was something similar for computing the non-inverse version,
std::sqrt(). The best resource
that I found on the subject was this.
After exhausting all of the methods presented, they either produced a bad image, or were actually slower than
Revision 1 (or as it's tagged in the repo,
r1) was where most of the work was done in this project. There were other
possibilities I wanted to explore, but didn't have the time initially, so I delegated these to later releases. They
aren't as grand as this initial one, but each of them has their own notes of significance.
While I was initially working on the
Box object, I couldn't help but think that using six rectangles objects stored
HittableList wasn't the most efficient way of rendering such an object. My initial optimization was to use a
BVHNode instead (which also required an RNG). While that led to a reduction in rendering time, I felt that this
could be pushed further. Looking at the
hit() functions for each constituent rectangle, It seemed they could be put
together in one grander function. This would have some benefits:
- Reduced memory overhead of creating seven extra objects. Which also means less memory traversing (or pointer chasing)
- Don't need to traverse a list (or tree) to find out what hit
- The code to check for hits looks like it could be easily auto-vectorized and have reduced branching
I don't want to bore you with the gory details (
you can see them here). This alternative
Box::hit() function, it's quite SIMD friendly. From some of
my measuring, this method was about 40% faster to render than the books' method!
At this point, I was starting to exhaust most of the "under the hood" optimizations that I thought could make an impact. Two more I explored this time around were "Deep Copy Per Thread" and "BVH Tree as a List".
Talking about that first one, this optimization was only available because my implementation allowed for rendering with multiple cores (the books' code does not). The scene to render is stored as a tree structure, filled with shared pointers to other shared pointers to even more shared pointers. This can be very slow if we're only reading data from the tree; which is what happens during the rendering process. My hypothesis was "For each thread I render with, if I make a local copy of the scene tree to that thread, the render will finish faster".
I added an extra method to each object/material/texture called
deep_copy(), which would well, produce a deep copy of
the object and its children. This was quite a bit of a tedious task. But when, for example, doing a render with 4x
cores. Having "copy per thread" turned on, it would render the scene 20-30% faster! I'll admit I'm not 100%
sure why this was so beneficial.
I posed the question to one of Reddit's C++ communities, but I have yet to be given a satisfactory answer.
"BVH Tree as a List" was more of a complex experiment. While it was slightly more performant, it did not
yield the results that I hoped for. The
BVHNode class is nothing more than a simple object that may contain either
another hittabale object, or two child
BVHNodes. These are all stored with shared pointers. I was concerned that
(reference counted) pointer chasing and fragmented (dynamic) memory might not be too efficient.
My thought was "If I take all of the AABB's for each node, and store them linearly in an array (i.e. list), but in
such a way they can be traversed as a tree, this would allow for faster traversal". The hope was that it would be
more memory/cache friendly to check all of the AABBs, rather than testing a chain of BVHNodes. The speedup was quite
piddly; I measured about 1-2%. The code is much more convoluted than the standard
BVHNode. If you wish to read it,
it's here (don't forget to check out
the header file too!)
At this point, I thought I had hit a limit on what I could change without breaking the architecture. I was looking to work on the implementation for book 3, but I decided it might be best to take a little break.
As I mentioned before, this mini-book series has exploded in popularity. Reading Peter Shirley's Twitter, I saw him retweeting images of a project called RayRender; a ray tracer for the R programming language that's useful for data-viz. This ray tracing program was actually based off of these mini-books. After that, I subscribed to Tyler Morgan-Wall's Twitter. In part, watching his progress made me interested in revisiting these books.
In a Christmas Eve tweet, he said that he was able to give RayRender a 20% performance boost. My curiosity was piqued and I started to scour through his recent commits.
HitRecord class, he simply changed a shared pointer over to being a raw pointer. That was all.
and its material pointer member are used a lot during the rendering process. It really makes no sense for them to be
shared pointers at all. This little change netted me a 10% - 30% perf. boost! This one I'm a little upset about not
Book 3: Ray Tracing the Rest of Your Life
Before working on
r2 I tried to make an attempt at book 3. But while
working through its initial chapters, I soon realized it was impossible to make sure I could render any older scenes.
This was because the core logic of the main rendering function was changing quite a bit from chapter to chapter.
But in the interest of completeness (and that I exhausted all other possible optimizations I could think of), I set out
to finish the series. It's in a separate branch called
book3. It can't render any of the older scenes from books 1
There is nothing special about this revision. It's nothing more than book 3 alone. It only supports four scenes; the Cornell Box box with various configurations.
While I was working on it, I did encounter a "fun" rendering bug that was an absolute pain to figure out. I forgot to set an initial value for a variable. Take this as a good lesson on why you should always assign an initial value to anything.
While going through Book 3, I couldn't help but notice that during the rendering stage, we allocate dynamic memory and pass it around with shared pointers; this is an absolute speed killer. This was being done for the PDFs. Taking a stern look at the code, it looked like the PDFs could be allocated as stack memory instead.
Part of the issue is that inside some of the objects'
hit() functions, it could generate a PDF subclass of any time.
But then that function had to return the PDF as a pointer to a base class. Then later on, the PDF would be evaluated
with virtual functions;
So I thought "Wouldn't it be possible to pass around PDFs using a variant?" One of the rules for variants is
that they must be allocated on the stack. This solves the issue of dynamic memory (and usage of shared pointers). Then
when we need to evaluate the PDF, the variant can tell us exactly which specific PDF to use, and thus the appropriate
PDFVariant was born. Any of the existing PDF subclasses can be put into it.
The code for this is on another separate branch called
book3.PDF_pointer_alternative. This also breaks the
architecture a little.
MixturePDF was a little bit of an issue since it originally required two shared pointers to
PDFVariant for those pointers doesn't not work, so I needed to use raw pointers to PDFs instead.
It was a really great experience to re-explore this book series, as well as Ray Tracing. There are other optimizations I think that could push the performance much further, but these all would require breaking architecture more than I already have. Just some ideas:
- Remove all uses of shared pointers and use raw ones instead
- Incorporate libraries like Halide so some parts could be run on the GPU (breaks my "CPU-only" rule though)
- Incorporate other sampling methods; e.g. blue-noise or sobol
- See if rendering could be performed "breath first" instead of "depth first"
When I first went through the book series four years ago, there were bits of errata here and there. I made sure to email Mr. Shirley whatever I found. I think all of them have been cleaned up. But since this book series is now freely available online and a community project, some more have been introduced; I recall finding more in book 3 than others.
There are some other things I find a little unsatisfactory too:
- Having to throw away all of the other scenes from books 1 & 2 to do book 3. It would be fun to revisit those former scenes with PDF based rendering
- Rotations are only done along the Y axis, and there is no way to change the point an object is rotated about. Though, anyone who wants to add this for the X & Z axis should be able to easily do so. Maybe in a future revision of this book having the rotation method use quaternions instead
- The Motion Blur effect feels wrong. Only spheres can be motion blurred. And for the feature, we had to give Rays a sense of time
But keep in mind the ray tracer that is built more on the educational side rather than being more "real world application" focused. It serves the purpose of teaching well. I still recommend that anyone who is interested in computer graphics give this book a read through.
There are other parts of CG programming I want to explore; I think it's a good time to move on.
Let me start with a bit of a narrative first:
Around a year ago, I released a C#/.NET Core library called Bassoon. I was looking for a cross platform (Windows, OS X, and Linux) audio` playback library for C#, but I couldn’t find one that was suitable. So I did what any normal software developer would do: make your own. Instead of going full C# with it, I opted to take some off the shelf C libraries and use P/Invoke to chat with them. It uses libsndfile for decoding audio formats (sans MP3, but that might change soon). And PortAudio for making speakers make noise.
If you look at the repo’s README’s
section, you might notice that I’m not telling anyone one to do a
sudo apt install libsndfile libportaudio (or some other package
manager command for another OS). I’m not the biggest fan of baked
dev environments. They can be a pain to reproduce for others. I
like to have my dependencies per project instead of being installed
system wide if I can help it.
The only downside is that you need to
then create some (semi-) automated way for others to set up a dev
environment for the project. E.g. all that “Download package from
nonsense. At first, I tried to make a simple bash script, but that
got kinda ugly pretty quickly. I’m not the best shell programmer
nor am I too fond of the syntax. There was a consideration for
Python too, but I assumed that it could get a bit long and verbose.
I found out about CMake’s
feature and set off to make one
surely disgusting CMakeLists.txt file. After a lot of
pain and anguish, I got it to work cross platform and generate all of
the native DLLs that I desired. Some things that stick out in my
- having to also run the
autogen.shin some cases
- needing to rename DLLs on Windows
- finding the correct
./configureoptions to use on OS X
These all reduced the elegance/simplicity. But hey, it works!... Until about a month ago…
While it was still good on Linux to set up a clean build environment. After some updates on OS X, it stopped building. Same for Windows/MSYS2 as well. This has happened before for me with MSYS2 updates (on other projects) and I waslooking for an alternative solution.
C++ specific package managers are a tad bit of a new thing. I remember hearing about Conan and vcpkg when they first dropped. After doing a little research, I opted to use the Microsoft made option. While it was yet another piece of software to install, it seemed quite straightforward and easy to set up. PortAudio and libsndfile was in the repo as well. After testing it could build those libraries for all three platforms (which it did), I was sold on using it instead. There were a few caveats, but well worth it for my situation:
- Dynamic libraries were automatically built on Windows, but I needed to specify 64 bit. It was building 32 bit by default
- For Linux and OS X, static libraries are built by default. If you want the dynamic ones all you have to do is something called overlaying tripplets
The generated file names of the
DLLs were not always what I needed them to be. For example, in my
C# code I have
[DllImport(“sndfile”)]to make a P/Invoked function. On Windows, the DLL name must be
sndfile.dll, Mac OS is
libsndfile.dylib, finally Linux is
libsndfile.so. On Windows I get
libsndfile-1.dllbuilt by default. Linux nets me
libsndfile-shared.so. For these a simple file renaming works. OS X is a bit of a different story:
You see, every operating
system has their own personality quirks. The Apple one is no
exception. When I tried renaming
dotnet run crashed saying it couldn’t find
the library. I know that I had all of the path & file locations
correct, as the previous CMake built libraries worked. I was kinda
trying another run I got a little hint.
being loaded and then unloaded almost as soon as it was
dyld: loaded: /Users/ben/Desktop/Bassoon/third_party/lib//libsndfile.dylib dyld: unloaded: /Users/ben/Desktop/Bassoon/third_party/lib//libsndfile.dylib
It also should be loading up
libvorbis.dylib, etc. but that wasn’t happening. Looking at the
vcpkg generated libs, running
otool -L (OS X’s version of
I got the reason why things weren’t the way I expected:
$ otool -L * libFLAC.dylib: @rpath/libFLAC.dylib (compatibility version 0.0.0, current version 0.0.0) @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) libogg.dylib: @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) libsndfile.dylib: @rpath/libsndfile-shared.1.dylib (compatibility version 1.0.0, current version 1.0.29) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) @rpath/libvorbisfile.3.3.7.dylib (compatibility version 3.3.7, current version 0.0.0) @rpath/libvorbis.0.4.8.dylib (compatibility version 0.4.8, current version 0.0.0) @rpath/libvorbisenc.2.0.11.dylib (compatibility version 2.0.11, current version 0.0.0) @rpath/libFLAC.dylib (compatibility version 0.0.0, current version 0.0.0) libvorbis.dylib: @rpath/libvorbis.0.4.8.dylib (compatibility version 0.4.8, current version 0.0.0) @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) libvorbisenc.dylib: @rpath/libvorbisenc.2.0.11.dylib (compatibility version 2.0.11, current version 0.0.0) @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) @rpath/libvorbis.0.4.8.dylib (compatibility version 0.4.8, current version 0.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1) libvorbisfile.dylib: @rpath/libvorbisfile.3.3.7.dylib (compatibility version 3.3.7, current version 0.0.0) @rpath/libogg.0.dylib (compatibility version 0.0.0, current version 0.8.4) @rpath/libvorbis.0.4.8.dylib (compatibility version 0.4.8, current version 0.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
From this, I was able to identify two problems:
- The “id” of a dylib didn’t
match it’s filename. E.g.
libvorbis.dylib’s id was set to
- The dylibs were looking for
non-existent dylibs. E.g.
libvorbisenc.dylibwas looking for
As to why this
wasn’t happening with the previously CMake build native libs, it’s
because they were configured/compiled with
vcpkg, I wasn’t able to set this when building
OS X toolkit does have a utility to fix the rpaths;
install_name_tool -id "@rpath/<dylib_file>" <dylib_file>is used to set the id we want
install_name_tool -change "@rpath/<bad_dylib_path>" "@rpath/<good_dylib_path>" <dylib_file>can fix an incorrect rpath
Since I wanted the setup process to be
fire and forget I still needed to write a script to automate all of
this. At first I considered bash again, but then I thought “I
don’t want to force someone to install the entire MSYS2 ecosystem
for Windows. What else can I use?...” Python came to mind.
Any developer is bound to have Python on their machine. I know
that’s what I tried to avoid in the first place, but looking at the
built in libraries for Python 3.x (.e.g
pathlib, etc) it was a better choice IMO. I also like the syntax
more; I’ll always trade simple and easy to understand code any
day over something that’s complex and shorter. For an example,
here is how I have the dylibs for OS X fixed up:
To run this 3rd party dependency setup script, all you need to do is set an environment variable telling it where vcpkg is installed and then it will take care of the rest!
Now that all of the native library dependencies have been automated away, the next challenge was packaging them for NuGet. Before, I told my users to “clone the repo and run the CMake setup command yourself”. That wasn’t good for many reasons. A big one being that no one could easily make their own program using Bassoon and easily distribute it. I know that I needed to also have the native libs also put inside the NuGet package, but what to do…
If you search for “nuget packaging
native libraries” into Goggle you get a slew of results telling
you what to do; all of it can seem overwhelming from a quick glance.
“Do I use
dotnet pack or
nuget pack? Do I need to
make a separate
.nuspec file? But wait,
dotnet pack does that
for me already… What is a
.targets file? What is a
.props file? How many of those do I need? What is this whole
native/libs/* tree structure? Oh man, all that XML looks
complicated and scary. I have no idea what I’m reading.”
Throwing in cross platform native libraries adds a whole other level
of trouble too. Most tutorials are only written for Windows and for
use within Visual Studio. Not my situation, which was all three
major platforms. Even peeking into other cross platform projects
to see how they did it, makes it look even more confusing. Too many
configuration files to make sense of.
Then I found NativeLibraryManager.
It has a much more simpler method to solve this
problem: embed your native libraries inside of your generated .NET
DLL and extract them at runtime. I don’t want to copy what it says
in it’s README, so go read that. But I’ll summarize that I only
had to add one line for each native library to the
embedding). Then for extracting at runtime, a little bit of code.
For people who want to use
directly, they only need to call the function
before doing anything else. And as for the nature of Bassoon’s
initialization, they don’t have to do anything!
I cannot thank @olegtarasov enough for creating this. I’m a programmer. I like to write code; not configuration and settings files.
At the time of writing,
libsndfileSharp package is partially broken for OS X due to a bug
in NativeLibraryManager. But
a ticket has been filed explaining what’s wrong and
most likely what needs to be fixed. It should be good soon :P
If anyone wants to help out with Basson (e.g. adding multi-channel support) or the lower level libraries (adding more bindings to libsndfile and PortAudio), I do all of the development over on GitLab.
I’d like to mention that I’m a little less employed than I would like to be; I need a job. My strongest skills are in C/C++, C#/.NET, Python, Qt, OpenGL, Computer Graphics, game technologies, and low level hardware optimizations. I currently live in the Boston area, so I’m looking for something around there. Or a company that lets me work remotely is fine too. I’m also open to part time, contract, and contract-to-hire situations. If you send me an email about your open positions, I’ll respond with a full resume and portfolio if I’m interested.
Please; I’m the sole provider for a cat who’s love is motivated by food. Kibble ain’t free.
If you want to go ahead and skip to the game I made, it's over here.
About 4+ years ago I heard about a new(-ish) game engine called Godot. I thought that it was kinda neat to have another open source one around, but I didn't think too much of it. In the past few years I'd hear about it again from time to time (e.g. when it gained C# support, making it a Unity contender). I was kind of interested in making something with it, but at the time I had no ideas.
Recently, I thought "It has sure been a while since I worked on a personal (technical) project. Not to mention a video game. I'm kinda itching to try out that there Godot thingy...". So about two-ish months ago, I decided to build something with this engine. Thinking about what could be small, short, but good enough to get my feet wet, I settled on reimplementing my Linux Game Jam 2017 entry Pucker Up.
Lemme take a brief aside to tell you about Pucker Up. As stated before, it was for a jam. My first Jam in fact. At that time, I was a bit more into tinkering with the Nim language. I kinda wanted to be a bit more HARDCORE™ with my approach in the jam. Luckily the only restriction was "Make a game that runs on Linux". No theme whatsoever; quite nice. We had 72 hours to finish and submit it. Then it would be live streamed by the jam creator.
I originally planned out a much more complex game (i.e. what was a tower defence). Then being HARDCORE™ I set out to grab some GLFW bindings, write my own game engine/framework, graphics/shaders, physics, etc. At the end of the first day, I realized how much of a difficult decision that I had made. All I got done were the initial windowing and input management, being able to draw flat coloured debug circles, and circle intersection algorithms; it was absolutely piddly. Realizing the pickle I had put myself into, I reevaluated what I could do with the toolkit I had made from scratch. I threw out my 99% of my original idea. Thinking instead about some sort of arcade-like game. The result was a sort of Pong with the goal in the center, where you had to keep the puck out of it. The QWOP control scheme happened by accident (I swear). Turned out it was kind of fun.
After the Jam was over, leveraging Nim's compile to JS feature, I was actually able to make a web browser playable version of the game in a short amount of time. I didn't have to force users to download a sketchy executable which was nice. That took me about two weeks since I needed to also add some extra Nim to JS/HTML5 bindings and work out a few kinks and bug or two. But it actually was quite simple. (Speaking about Nim, it also has some Godot bindings too.)
I've been looking at that JS/HTML5 version of Pucker Up for the two-ish years, discovered some bugs here and there, I thought it would be best to give it a little refresh. So instead of trying to wracking my brain to think up a new game, I settled on renewing something old I had.
Back to Godot-land. What I would say that originally drew me to the engine is it seemed like a nice professional project that was very liberal with it's licensing and it is openly developed. Linux being a first class citizen for the project is very sweet too. I tried out the Unreal engine on Linux and wasn't too happy with it. I've also had some serious issues with playing Unity made games on Linux.
I'm a person who has probably made more game engines than games. I don't know why this has been the case for me, but it just has. Maybe it's that feeling of being closer to what's going on in the whole program, or rather knowing 100% how something was made. Looking at the source for Godot (and the docs, which has A+ tutorials), I appreciate how easy and hackable this engine is. And to boot, the community is quite friendly.
The tutorial section of the Docs are quite good, but the API docs don't feel like they are fully here right now. For example if you look at much of Microsoft's C# docs, many methods usually have an accompanying example with them. This isn't always the case with Godot. For instance, to spruce up Pucker Up, I wanted to add some directional sound. Doing some googling I was led to the docs for AudioEffectPanner. Looking through, it's super sparse, and doesn't have a simple example of how it can be used. Not nice.
The main draw of using any engine is "Look at all the stuff we provide for you." When I started make games, it mostly was only a set of APIs. Tooling was something you had to do on your own. Godot provides a pretty nice editor (UI, level, animation, etc...), but it does take some learning.
I'm also a pretty big fan of Animation (go look through some of my other posts to see). The builtin framework for Animation that Godot provides I think is nice, but the editor isn't the most intuitive. I've used programs such as Flash (R.I.P.), Moho, Clip Studio Paint, and even Unity. They were always pretty easy to get started with. In Godot, I had some trouble figuring out how to key properties. I didn't know what kind of interpolation I was initially using. And the curves editor was difficult when it came to zooming it's viewport(.e.g I needed to work on a value in the range of [0.0, 1.0], it was a bit of a struggle). One of the other things that drove me nuts: If you didn't reset the playback head to `0` before running your game, the animation would start where the head was left in the editor. I can see how this is handy for Animations that are longer (.e.g 5+ seconds). Though if you notice in video games, many actions/effects are on the quick side (e.g. 1/4 of a second). When you're doing this, you tend to what to see the whole thing. I will admit that my digital animation experience it a bit lacking (I think I've spend more hours with a pencil and paper than with a Wacom tablet), but some stuff didn't feel that natural. I also ran into a bug: when tabbing through options to adjust properties, sometimes the editor would freeze. Not fun.
Godot also has a minimal UI framework is built in. Adding custom skinning can be quite the hassle though. A CSS like way to skin the UI would be wonderful (which is that the Qt and Gtk frameworks already do). This might be a time sink for the engine (and would add much extra complexity) for what is only a minor feature. I can dream though...
After about two-ish months of work, I had a more sophisticated version of Pucker Up ready. I had some extra animations, more sound variation, smoother movement, improved score reporting; I could go on for a while. Without Godot, these would have taken much longer. There was one last hurdle to overcome: Exporting to HTML5. I was hoping for this to be a few clicks and done, but it wasn't quite that easy. Retrieving the HTML5 export was simple enough. IIRC, there was a one-click download button. Export prep was a breeze too. The issue arose when I then went to run the game in my browser. When I loaded up the
game.html file in my browser, the scaling and placement of my assets were not where I expected them to be. Even across browsers (and different machines) it all appeared vastly different. I got some of my other friends to help me test this out. I did file a ticket on the Godot issue tracker about my problem. I also asked the Godot Reddit community for their experiences with targeting HTML5. From there, someone was able to suggest I tinker with the "Stretch" settings for the project. Voila! It gave me the result that I wanted and order was fully restored. This was quite the frustrating experience and I think it could be remedied by mentioning these "Stretch" settings in the "Exporting for the Web" doc page.
I've also noticed that the performance of Pucker Up is much smoother in Chrome(ium) than in Firefox. That isn't good. The later browser has some semi-choppy movement of the puck (and high speeds), and the sound effects (such as the bounces) weren't playing at the exact moment that they should. They were off by a few milliseconds. While this doesn't grandly impact the game (as it's still playable), I don't like having to add a "Plays slightly better on Chrome based browsers." footnote to my game page.
All in all, it may seem that I'm be a little extra critical of Godot here, but in earnest it's been a very pleasant experience (re)making Pucker Up with it. With were it stands right now, things can only get better with the framework as time goes on. I'm looking forward to the next game jam I'll enter because I'm sure enough to use this tool. Or maybe I go on with a more grand idea. Godot only knows. :P
You can find the Godot version of Pucker Up over here. Please enjoy.
In other news, about a month ago I started a daily Japanese practice blog (毎日日本語練習ブログ). I call it 日本語ベン強 (nihongo-benkyou). It's a pun; you won't get it unless you know some of the language. Since February of 2018, I've started taking Japanese classes. Foreign languages have always been an affinity of mine, and I was looking for a new hobby that's not related to tech. I created the blog so I could get some practice writing Japanese. Even if it's not the most correct. So far it's been fun.
I created this since I wanted to be able to do audio playback in C# on Linux, Windows, and OS X. While there were some packages available on NuGet, they were not preferable to me, so I took the time to make my own. It's hobbled together using P/Invoke bindings to `libsndfile` and PortAudio. At the moment it does not support MP3 decoding (though that is planned), which is one of the main drawbacks. And you also need to build the native dependencies yourself, but a CMake file is provided to handle that. In the future, I hope to also add some more features such as audio recording and some minor effects. So far I am happy with it.