Performance Evaluation with PBRT-v3: Windows, Linux, WSL, Linux on a VM with Windows as Host, and Linux on a VM with Linux as Host

Jose Pablo
Sep 21, 2022 · 9 min read
Render from https://twitter.com/ECycles1/status/1534421276709769218

Summary

This article presents a performance evaluation of rendering 4 different scenes with PBRT-v3 under 5 different operating system configurations: Linux and Windows running directly on the hardware, Windows Subsystem for Linux (WSL), and Linux running on virtual machines with Windows and with Linux as hosts. The measurements show that Linux running on a virtual machine with Windows as host is the worst configuration; the other 4 are better options, though they are not statistically different from one another.

PBRT

PBRT focuses exclusively on photorealistic rendering, which can be defined variously as the task of generating images that are indistinguishable from those that a camera would capture in a photograph, or as the task of generating images that evoke the same response from a human observer as looking at the actual scene. Photorealism gives a reasonably well-defined metric for evaluating the quality of a rendering system's output [7].

PBRT is based on the ray-tracing algorithm. Ray-tracing algorithms on computers follow the path of infinitesimal rays of light through the scene until they intersect a surface. This approach gives a simple method for finding the first visible object as seen from any particular position and direction and is the basis for many rendering algorithms [7].
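
To make this concrete, here is a toy sketch of a ray-sphere intersection test, written in R (the same language used later for the statistical analysis). PBRT itself is implemented in C++ and is far more general; this is only an illustration of the idea of following a ray until it hits a surface, and the function name is hypothetical:

    # Intersect a ray (origin o, unit direction d) with a sphere of radius r
    # centred at ctr; return the distance to the nearest hit, or NA on a miss.
    ray_sphere_hit <- function(o, d, ctr, r) {
      oc   <- o - ctr
      b    <- 2 * sum(oc * d)      # linear coefficient of the quadratic in t
      cst  <- sum(oc * oc) - r^2   # constant coefficient
      disc <- b^2 - 4 * cst        # discriminant (the leading coefficient is 1)
      if (disc < 0) return(NA)     # no real root: the ray misses the sphere
      t <- (-b - sqrt(disc)) / 2   # nearest of the two intersection distances
      if (t > 0) t else NA         # only hits in front of the origin count
    }

    # A ray from the origin along +z hits a unit sphere centred at z = 5
    # at distance 4 (its near surface):
    ray_sphere_hit(c(0, 0, 0), c(0, 0, 1), c(0, 0, 5), 1)  # 4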

Experiment design

The experiment consists of rendering 4 different scenes from PBRT's official repository under each of the OS configurations. The scenes are: a fairly complex outdoor scene with many plants and trees, named ecosys (Fig. 1); a room with teapots rendered with Metropolis light transport, named room-mlt (Fig. 2); a simple scene with a teapot illuminated by a disk area light, named teapot-area-light (Fig. 3); and an unusual and intricate form on a glossy plate, named yeahright (Fig. 4), which is the most time-consuming scene to render.

Fig. 1. Fairly complex outdoor scene with many plants and trees, illuminated by an environment map from PBRT official repository.
Fig. 2. Room with teapots scene, rendered with Metropolis light transport from PBRT official repository.
Fig. 3. Simple scene with a teapot illuminated by a disk area light source from PBRT official repository.
Fig. 4. An unusual and intricate form on a glossy plate from PBRT official repository.

Each scene is rendered with every combination of two render options available in PBRT:

  1. Accelerator: an aggregate data structure for efficiently finding ray-shape intersections. PBRT uses data structures that help reduce the O(n) complexity of testing a ray for intersection against all n objects in a scene. Most rays will intersect only a few primitives and miss the others by a large distance; if an intersection acceleration technique can reject whole groups of primitives at once, there is a substantial performance improvement compared to simply testing each ray against each primitive in turn [7]. All the scenes are rendered using both bvh and kdtree as accelerators.
  2. Integrator: the implementation of the light transport algorithm that computes the radiance arriving at the film plane from surfaces and participating media in the scene [7]. All the scenes are rendered using both path and bdpt as integrators.

To handle the combination of these options a 2^k factorial design is applied. In a factorial design, all possible combinations of the levels of the factors are investigated in each replication; this provides the smallest number of runs in which k two-level factors can be studied in a complete factorial design. In this research we have k = 2 factors, giving 2^2 = 4 treatment combinations, as displayed in Table I.
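
As a quick illustration (a sketch for this article, not the study's actual code), the four treatment combinations can be enumerated in R with expand.grid:

    configs <- expand.grid(accelerator = c("bvh", "kdtree"),
                           integrator  = c("path", "bdpt"),
                           stringsAsFactors = FALSE)
    configs
    #   accelerator integrator
    # 1         bvh       path
    # 2      kdtree       path
    # 3         bvh       bdpt
    # 4      kdtree       bdpt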

To analyze the data we use a multi-factor ANOVA (analysis of variance) over the OS, scene, and configuration factors with their corresponding levels, as described in Table II. This is done using R.
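
A minimal sketch of that analysis in R follows; the data frame and its column names are assumptions for illustration, since the article's own scripts are not shown:

    # df is assumed to hold one row per run: the measured rendering time
    # plus the three factors under study.
    df$os     <- factor(df$os)      # 5 levels: the OS configurations
    df$scene  <- factor(df$scene)   # 4 levels: the scenes
    df$config <- factor(df$config)  # 4 levels: accelerator/integrator pairs

    # Multi-factor ANOVA with all interaction terms:
    model <- aov(time ~ os * scene * config, data = df)
    summary(model)                  # the p-value column shown in Fig. 5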

Methodology

As can be seen in Table II, the experiments in this research are executed on 5 different OSs; here are the details of each:

  • Windows: Windows 11 Pro, version 21H2, and build 22000.675.
  • Linux: Ubuntu 20.04.4 LTS, release 20.04.
  • WSL: running Ubuntu 20.04.4 LTS, release 20.04.
  • Linux-W-VM: a VirtualBox 6.1.34 r150636 (Qt5.12.8) VM with Windows (the build above) as host, running Linux (the release above) as guest OS, with 2 processors and 8192 MB of RAM.
  • Linux-L-VM: a VirtualBox 6.1.34 r150636 (Qt5.12.8) VM with Linux (the release above) as host, running the same Linux release as guest OS, with 2 processors and 8192 MB of RAM.

All the previous OSs were installed on the same computer; the hardware specifications are summarized in Table III.

PBRT-v3 was compiled in release mode for each of the OSs (the debug build of PBRT is very slow compared to the release build), and for each of the different combinations of the config factor, so in the end there are 4 different executable files per OS.

The order in which the scenarios were executed across the OSs is: Windows, Linux, WSL, Linux-W-VM, and Linux-L-VM. Every scenario was executed 5 times per OS, and the execution order was randomized (a script was written to run them); in total 400 scenarios were run. Energy-saving features were turned off. After the OS started up, the script to run the experiments was executed with only OS background services running, and we avoided interacting with the computer as much as possible.
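
A hedged sketch of what such a run script could look like in R is shown below; the executable names, paths, and timing mechanism are assumptions, not the study's actual script:

    scenes  <- c("ecosys", "room-mlt", "teapot-area-light", "yeahright")
    configs <- c("bvh-path", "kdtree-path", "bvh-bdpt", "kdtree-bdpt")

    # 4 scenes x 4 configs x 5 repetitions = 80 runs on this OS,
    # executed in a randomized order.
    runs <- expand.grid(scene = scenes, config = configs, rep = 1:5,
                        stringsAsFactors = FALSE)
    runs <- runs[sample(nrow(runs)), ]

    runs$time <- apply(runs, 1, function(r) {
      cmd <- sprintf("./pbrt-%s scenes/%s.pbrt", r[["config"]], r[["scene"]])
      unname(system.time(system(cmd))["elapsed"])   # wall-clock seconds
    })
    write.csv(runs, "times.csv", row.names = FALSE)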

Results

In this section we present the results obtained across the iterations of the experiment.

The first step is to compute the ANOVA; the results are displayed in Fig. 5. The p-value column suggests that there is interaction among all the factors. Later in this article we perform pairwise t-tests to confirm or reject this partial result.

Fig. 5. Results obtained after running the ANOVA.

The Levene test (a test of the equality of variances) returned 0.351, which means the hypothesis of equal variances is not rejected and the data can be treated as homoscedastic.
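
For reference, such a test can be run in R with the car package (a sketch; the article does not say which implementation was used):

    library(car)   # provides leveneTest()
    # Equality of variances across all factor-level combinations:
    leveneTest(time ~ os * scene * config, data = df)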

Figure 6 contains 3 different plots: the first shows the time consumed on each OS; the second shows rendering time by PBRT configuration (the config factor); and the third shows the rendering time of each scene per OS.

Fig. 6. Time vs OS, vs configuration, and vs scene.

Table IV summarizes the results of a pairwise t-test, and Table V shows the statistical differences between the OSs when rendering the yeahright scene. These tables, combined with the plots in Fig. 6, are the main evidence for the results of this experiment, discussed in the next section.
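
These comparisons can be reproduced in R along the following lines (a sketch; the p-value adjustment method is an assumption, as the article does not state it):

    # Table IV: pairwise t-tests between OSs over all runs.
    pairwise.t.test(df$time, df$os, p.adjust.method = "holm")

    # Table V: the same comparison restricted to the yeahright scene.
    yr <- subset(df, scene == "yeahright")
    pairwise.t.test(yr$time, yr$os, p.adjust.method = "holm")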

Discussion

As we can see in the first plot of Fig. 6, the OS that appears statistically different from the others is Linux running in a VM with Windows 11 as host (Linux-W-VM). This is confirmed by the pairwise t-test results in Table IV, where the only values smaller than 0.05 are those involving Linux-W-VM.

It may not be clear from the plots in Fig. 6 whether there is a statistically significant difference between Linux, WSL, Windows, and Linux-L-VM (Linux running in a VM with Linux as host); for this we have the pairwise t-test results in Table IV, which tell us that they are not different.

Linux performed better than the other OSs, followed by WSL, although the difference is not large enough to call them the best choices. The most time-consuming render is the yeahright image and the least time-consuming is teapot-area-light; ecosys and room-mlt behave similarly. Even though the difference among Linux, WSL, Windows, and Linux-L-VM is not significant, Linux was the fastest to render all the images under all 4 configurations, while Linux-W-VM was the worst in every scenario. This is a very interesting situation, since the best and the worst are the same OS; the difference is that one runs natively and the other runs on a VM with Windows as host.

In general terms there is no statistical difference between Linux, Windows, WSL, and Linux-L-VM. However, Table V shows that for the yeahright scene Linux is statistically different from Windows and Linux-L-VM (and from Linux-W-VM, but we already knew that), which makes Linux the best choice for rendering that scene. What distinguishes yeahright from the other scenes is that it is the most time-consuming image to render, which may matter in future scenarios where the scenes to render are complex and considerably time-consuming.

Conclusions and future work

There is plenty of guesswork around the performance of different OSs, and the best way to settle these doubts is to apply the scientific method and analyze the data with the appropriate statistical techniques.

In this research we performed a performance evaluation of different OSs rendering images using PBRT-v3. The ANOVA and pairwise t-tests show that there is no statistical difference between Linux, WSL, Windows, and Linux running on a VM with Linux as host; the only one that differs, for the worse, is Linux running on a VM with Windows as host. However, Linux was the best at rendering the most time-consuming image (yeahright) compared with the other 4 options, which suggests that Linux might be the best choice when very complex, time-consuming scenes have to be rendered.

Surprisingly, the best and the worst are the same OS; the difference is that one runs directly on the hardware while the other runs on a VM with Windows as host. The reasons for this behaviour are unknown; however, it is not possible to conclude that running on a VM by itself degrades performance, since Linux-L-VM is in the group with good performance. This raises some questions:

  • Is Windows a bad host for running VMs?
  • Is Linux a better host for running VMs?
  • Is there a performance impact when the host and guest OSs are the same?

These questions can motivate future investigations, along with deeper research into the finding that Linux is the best choice when the scene is very time-consuming: can we generalize this behaviour? Can we expect to see something similar in the future?

A study in which all the images are complex and take a long time to render could help clarify these doubts and find the answers to these questions.

References

[1] Josip Balen, Krešimir Vdovjak, and Goran Martinović. Performance evaluation of Windows virtual machines on a Linux host. Automatika: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, 61(3):425–435, 2020.

[2] David Bretthauer. Open source software: A history. Published Works, UConn Library, 2001.

[3] M. G. D'Elia and V. Paciello. Performance evaluation of LabVIEW on Linux Ubuntu and Windows XP operating systems. In 2011 19th Telecommunications Forum (TELFOR) Proceedings of Papers, pages 1494–1498, 2011.

[4] Einar Krogh. An introduction to windows operating system. Einar Krogh & bookbon.com, Norway, 2017.

[5] Nathan Lewis, Andrew Case, Aisha Ali-Gombe, and Golden G. Richard. Memory forensics and the Windows Subsystem for Linux. Digital Investigation, 26:S3–S11, 2018.

[6] Tim Newman. Performance comparison. Linux Journal, 1999(67es):7–es, 1999.

[7] Matt Pharr, Wenzel Jakob, and Greg Humphreys. Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2016.

[8] Sasko Ristov and Marjan Gusev. Performance vs cost for windows and linux platforms in windows azure cloud. In 2013 IEEE 2nd International Conference on Cloud Networking (CloudNet), pages 214–218, 2013.

[9] William Stallings. The windows operating system. Operating Systems: Internals and Design Principles, 2005.
