diff --git a/assets/state-of-the-art/3d-interaction/meshlab.png b/assets/state-of-the-art/3d-interaction/meshlab.png new file mode 100644 index 0000000..791d7fc Binary files /dev/null and b/assets/state-of-the-art/3d-interaction/meshlab.png differ diff --git a/src/foreword/3d-model.tex b/src/foreword/3d-model.tex index 2961c95..796d45f 100644 --- a/src/foreword/3d-model.tex +++ b/src/foreword/3d-model.tex @@ -6,15 +6,15 @@ We also give insights about interaction and streaming by comparing the 3D case t \section{What is a 3D model?\label{f:3d}} \subsection{3D data} -Most classical 3D models are set of mesh and textures, that can potentially be arranged in a scene graph. +Most classical 3D models are sets of meshes and textures, which can potentially be arranged in a scene graph. Such a model can typically contain the following: \begin{itemize} - \item \textbf{Vertices} are simply 3D points; - \item \textbf{Faces} are polygons defined from vertices (most of the time, they are triangles); - \item \textbf{Textures} are images that can be used for painting faces, to add visual richness; - \item \textbf{Texture coordinates} are information added to a face, describing how the texture should be painted over faces; - \item \textbf{Normals} are 3D vectors that can give information about light behaviour on a face. + \item \textbf{vertices}, which are simply 3D points; + \item \textbf{faces}, which are polygons defined from vertices (most of the time, they are triangles); + \item \textbf{textures}, which are images that can be used to paint faces in order to add visual richness; + \item \textbf{texture coordinates}, which are information added to a face, describing how the texture should be painted over faces; + \item \textbf{normals}, which are 3D vectors that can give information about light behaviour on a face. \end{itemize} The Wavefront OBJ is one of the most popular format that describes all these elements in text format. 
@@ -100,7 +100,7 @@ A typical 3D renderer follows Algorithm~\ref{f:renderer}. \caption{A rendering algorithm\label{f:renderer}} \end{algorithm} -The first task the renderer needs to perform is sending the data to the GPU\@: this is done in the loading loop at the beginning. +The first task the renderer needs to perform is sending the data to the GPU\@: this is done in the loading loop during an initialisation step. This step can be slow, but it is generally acceptable since it only occurs once at the beginning of the program. Then, the renderer starts the rendering loop: at each frame, it renders the whole scene: for each object, it binds the corresponding material to the GPU and then renders the object. During the rendering loop, there are two things to consider regarding performances: @@ -109,12 +109,12 @@ During the rendering loop, there are two things to consider regarding performanc \item the more objects in the scene, the more overhead cause by the CPU/GPU communication at each step of the loop. \end{itemize} -The way the loop works forces objects with different textures to be rendered separately. +The way the loop works forces objects with different materials to be rendered separately. An efficient renderer keeps the number of objects in a scene low to avoid introducing overhead. -However, an important feature of 3D engine regarding performance is frustum culling. +However, an important feature of 3D engines regarding performance is frustum culling. The frustum is the viewing volume of the camera. -Frustum culling consists in avoiding rendering objects that are outside the viewing volume of the camera. +Frustum culling consists in skipping objects that are outside the viewing volume of the camera in the rendering loop. Algorithm~\ref{f:frustum-culling} is a variation of Algorithm~\ref{f:renderer} with frustum culling. 
\begin{algorithm}[th] diff --git a/src/foreword/implementation.tex b/src/foreword/implementation.tex index cb34573..e0a64d2 100644 --- a/src/foreword/implementation.tex +++ b/src/foreword/implementation.tex @@ -16,16 +16,15 @@ However, WebGL is very low level and it can be really painful to write code, eve For example, \href{https://www.tutorialspoint.com/webgl/webgl_drawing_a_triangle.htm}{this tutorial}'s code contains 121 lines of javascript, 46 being code (not comments or empty lines) to render a simple, non-textured triangle. For this reason, it seems unreasonable to build a system like the one we are describing in raw WebGL\@. There are many libraires that wrap WebGL code and that help people building 3D interfaces, and \href{https://threejs.org}{THREE.js} is probably one of the most popular. - THREE.js acts as a 3D engine built on WebGL\@. It provides classes to deal with everything we need: \begin{itemize} \item the \textbf{Renderer} class contains all the WebGL code needed to render a scene on the web page; \item the \textbf{Object} class contains all the boilerplate needed to manage the tree structure of the content, it contains a transform (translation and rotation) and it can have children that are other objects; \item the \textbf{Scene} class is the root object, it contains all of the objects we want to render and it is passed as argument to the render function; - \item the \textbf{Geometry} and \textbf{BufferGeometry} classes are the classes that hold the vertices buffers, we will discuss it more in Section~\ref{f:geometries}; - \item the \textbf{Material} class is the class that holds the properties used to render geometry (the most important information being the texture), there are many classes derived from Material, and the developer can choose what material he wants for its objects; - \item the \textbf{Mesh} class is the class that links the geometry and the material, it derives the Object class and can thus be added to a scene and renderer. 
+ \item the \textbf{Geometry} and \textbf{BufferGeometry} classes are the classes that hold the vertex buffers, we will discuss them further in Section~\ref{f:geometries}; + \item the \textbf{Material} class is the class that holds the properties used to render geometry (the most important information being the texture), there are many classes derived from Material, and the developer can choose which material they want for their objects; + \item the \textbf{Mesh} class is the class that links the geometry and the material, it derives from the Object class and can thus be added to a scene and rendered. \end{itemize} A snippet of the basic usage of these classes is given in Listing~\ref{f:three-hello-world}. @@ -48,19 +47,19 @@ There are two most important geometry classes in THREE.js: \subsection{Rust} -\subsubsection{Borrow checker} - In this section, we explain the specificities of Rust and why it is a great language for writing efficient native software safely. +\subsubsection{Borrow checker} + Rust is a system programming language focused on safety. It is made to be efficient (and effectively has performances comparable to C or C++) but with some extra features. C++ users might see it as a language like C++ but that forbids undefined behaviours.\footnote{in Rust, when you need to execute code that might lead to undefined behaviours, you need to put it inside an \texttt{unsafe} block. Many operations will not be available outside an \texttt{unsafe} block (e.g., dereferencing a pointer, or mutating a static variable). The idea is that you can use \texttt{unsafe} blocks when you require it, but you should avoid it as much as possible and when you do it, you must be particularly careful.} The most powerful concept from Rust is \emph{ownership}. Basically, every value has a variable that we call its \emph{owner}. To be able to use a value, you must either be its owner or borrow it. 
-There are two types of borrow, the immutable borrow and the mutable borrow (people from C++ can see them as having a reference to a variable). -The compiler comes with the \emph{borrow checker} which makes sure you only use variables that you are allowed to. -For example, the owner can only use the value if it is not being borrowed, and it is only possible to either borrow mutably a value once, or immutably borrow a value as many times as you want. +There are two types of borrow, the immutable borrow and the mutable borrow (roughly equivalent to references in C++). +The compiler comes with the \emph{borrow checker}, which makes sure you only use variables that you are allowed to use. +For example, the owner can only use the value if it is not being borrowed, and at any given time, a value can either be mutably borrowed once, or immutably borrowed many times. At first, the borrow checker seems particularly efficient to detect bugs in concurrent software, but in fact, it is also decisive in non concurrent code. Consider the piece of C++ code in Listings~\ref{f:undefined-behaviour-cpp} and~\ref{f:undefined-behaviour-cpp-it}. @@ -105,6 +104,7 @@ The equivalent code in Rust is in Listings~\ref{f:undefined-behaviour-rs} and~\r What happens is that the iterator needs to borrow the vector. Since it is borrowed, it can no longer be borrowed as mutable since mutating it could invalidate the other borrowers. And effectively, the borrow checker will crash the compiler with the error in Listing~\ref{f:undefined-behaviour-rs-error}. 
+ \begin{figure}[ht] \lstinputlisting[ language=XML, @@ -112,8 +112,9 @@ And effectively, the borrow checker will crash the compiler with the error in Li label={f:undefined-behaviour-rs-error} ]{assets/dash-3d-implementation/undefined-behaviour-error.txt} \end{figure} + This example is one of the many examples of how powerful the borrow checker is: in Rust code, there can be no dangling reference, and all the segmentation faults coming from them are detected by the compiler. -The borrow checker may seem like an enemy to newcomers because it often rejects code that seem correct, but once you get used to it, you understand what is the problem with the code and either fix the problem easily, or realise that the whole architecture is wrong and understand why. +The borrow checker may seem like an enemy to newcomers because it often rejects code that seems correct, but once they get used to it, they understand what the problem with their code is and either fix it easily, or realise that the whole architecture is wrong and understand why. It is probably for those reasons that Rust is the \emph{most loved programming language} according to the Stack Overflow Developer Survey in~\citeyear{so-survey-2016,so-survey-2017,so-survey-2018} and~\citeyear{so-survey-2019}. diff --git a/src/foreword/video-vs-3d.tex b/src/foreword/video-vs-3d.tex index 455b36e..52b8289 100644 --- a/src/foreword/video-vs-3d.tex +++ b/src/foreword/video-vs-3d.tex @@ -3,7 +3,7 @@ \section{Similarities and differences between video and 3D\label{i:video-vs-3d}} Contrary to what one might think, the video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded. 
-Analyzing the similarities and the differences between the video and the 3D scenarios as well as having knowledge about video streaming literature are the key to developing an efficient 3D streaming system. +Analysing similarities and differences between the video and the 3D scenarios as well as having knowledge about video streaming literature are key to developing an efficient 3D streaming system. \subsection{Chunks of data} @@ -46,7 +46,7 @@ In video, those media are mostly images, sounds, and eventually subtitles, where In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly. In video streaming, most of the data (in terms of bytes) is used for images. -Thus, the most important thing a video streaming system should do is to optimize images streaming. +Thus, the most important thing a video streaming system should do is to optimise image streaming. That is why, on a video on Youtube for example, there may be 6 resolutions for images (144p, 240p, 320p, 480p, 720p and 1080p) but only 2 resolutions for sound. This is one of the main differences between video and 3D streaming: in a 3D scene, geometry and texture sizes are approximately the same, and leveraging between those two types of content is a key problem. 
@@ -68,7 +68,7 @@ Even though these interactions seem easy to handle, giving the best possible exp \item To navigate to another time in the video, the user can: \begin{itemize} - \item click the timeline of the video where they wants; + \item click the timeline of the video where they want; \item press the left arrow key to move 5 seconds backwards; \item press the right arrow key to move 5 seconds forwards; \item press the \texttt{J} key to move 10 seconds backwards; @@ -307,6 +307,13 @@ This is typically the case of the video game \emph{nolimits 2: roller coaster si Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the position of the camera, and 2 being the angle (assuming the up vector is unchangeable, some interfaces might allow that, giving a sixth degree of freedom). The most common controls are the trackball controls where the user rotate the object like a ball \href{https://threejs.org/examples/?q=controls\#misc_controls_trackball}{(live example here)} and the orbit controls, which behave like the trackball controls but preserving the up vector \href{https://threejs.org/examples/?q=controls\#misc_controls_orbit}{(live example here)}. These types of controls are notably used on the popular mesh editor \href{http://www.meshlab.net/}{MeshLab} and \href{https://sketchfab.com/}{SketchFab}, the YouTube for 3D models. + +\begin{figure}[th] + \centering + \includegraphics[width=0.7\textwidth]{assets/state-of-the-art/3d-interaction/meshlab.png} + \caption{Screenshot of MeshLab} +\end{figure} + Another popular way of controlling a free camera in a virtual environment is the first person controls \href{https://threejs.org/examples/?q=controls\#misc_controls_pointerlock}{(live example here)}. These controls are typically used in shooting video games, the mouse rotates the camera and the keyboard is used to translate it. 
diff --git a/src/introduction/challenges.tex b/src/introduction/challenges.tex index 1438225..fb05a42 100644 --- a/src/introduction/challenges.tex +++ b/src/introduction/challenges.tex @@ -6,11 +6,11 @@ The objective of our work is to design a system that allows a user to access rem A 3D streaming client has lots of tasks to accomplish: \begin{itemize} - \item render a scene; \item decide what part of the model to download next; \item download the next part; \item parse the downloaded content; \item add the parsed result to the scene; + \item render the scene; \item manage the interaction with the user. \end{itemize} diff --git a/src/introduction/main.tex b/src/introduction/main.tex index 1724e95..8192be6 100644 --- a/src/introduction/main.tex +++ b/src/introduction/main.tex @@ -3,7 +3,7 @@ \fresh{} During the last years, 3D acquisition and modeling techniques have made tremendous progress. -Recent software use 2D images from photographs to reconstruct 3D data, e.g. \href{https://alicevision.org/\#meshroom}{Meshroom} is free and open source software that got almost \numprint{200000} downloads on \href{https://www.fosshub.com/Meshroom.html}{fosshub} that use \emph{structure-from-motion} and \emph{multi-view-stereo} to infer a 3D model. +Recent software uses 2D images from cameras to reconstruct 3D data, e.g. \href{https://alicevision.org/\#meshroom}{Meshroom} is free and open source software that got almost \numprint{200000} downloads on \href{https://www.fosshub.com/Meshroom.html}{fosshub}, and uses \emph{structure-from-motion} and \emph{multi-view-stereo} to infer a 3D model. There are more and more devices that are specifically built to harvest 3D data: some still very expensive and provide precise information such as LIDAR (Light Detection And Ranging, as in RADAR but with light instead of radio waves), while some cheaper devices can obtain coarse data such as the Kinect. Thanks to these techniques, more and more 3D data become available. 
These models have potential for multiple purposes, for example, they can be printed, which can reduce the production cost of some pieces of hardware or enable the creation of new objects, but most uses are based on visualisation. @@ -26,7 +26,7 @@ For example, they can be used for augmented reality, to provide user with feedba In most 3D visualisation systems, the 3D data is stored on a server and needs to be transmitted to a terminal before the user can visualise it. The improvements in the acquisition setups we described lead to an increasing quality of the 3D models, thus an increasing size in bytes as well. -Simply downloading 3D content and waiting until the content is fully downloaded to let the user visualise it is no longer a satisfactory solution, so adaptive streaming is needed. +Simply downloading 3D content and waiting until it is fully downloaded to let the user visualise it is no longer a satisfactory solution, so adaptive streaming is needed. In this thesis, we propose a full framework for navigation and streaming of large 3D scenes, such as districts or whole cities. % With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale. diff --git a/src/introduction/outline.tex b/src/introduction/outline.tex index bf31421..e79e4bb 100644 --- a/src/introduction/outline.tex +++ b/src/introduction/outline.tex @@ -9,8 +9,8 @@ Then it reviews the different 3D streaming approaches. The last section of this chapter focuses on 3D interaction. Then, in Chapter~\ref{bi}, we present our first contribution: an in-depth analysis of the impact of the UI on navigation and streaming in a 3D scene. -We first develop a basic interface for navigating in 3D and thus, we introduce 3D objects called \emph{bookmarks} that help users navigating in the scene. 
-We then present a user study that we conducted on 50 people that shows that bookmarks ease user navigation as bookmark improves performance at tasks such as finding objects. +We first develop a basic interface for navigating in 3D and then introduce 3D objects called \emph{bookmarks} that help users navigate in the scene. +We then present a user study that we conducted on 50 people that shows that bookmarks ease user navigation: they improve performance at tasks such as finding objects. % Then, we setup a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time. We analyse how the presence of bookmarks impacts the streaming: we propose and evaluate streaming policies based on pre-computations relying on bookmarks and that measurably increase the quality of experience. @@ -18,10 +18,10 @@ In Chapter~\ref{d3}, we present the most important contribution of this thesis: DASH-3D is an adaptation of DASH (Dynamic Adaptive Streaming over HTTP): the video streaming standard, to 3D streaming. We first describe how we adapt the concepts of DASH to 3D content, including the segmentation of content. We then define utility metrics that associate score to each chunk depending on the user's position. -Then, we present a client and various streaming policies based on our utilities that can benefit from the DASH format. +Then, we present a client and various streaming policies based on our utilities that can benefit from DASH format. We finally evaluate the different parameters of our client. In Chapter~\ref{sb}, we present our last contribution: the integration of the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D. -We first develop an interface that allows desktop as well as mobile devices to navigate in a 3D scene being streamed, and that introduces a new style of bookmarks. 
-We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient pre-computations that can enhance the streaming. +We first develop an interface that allows desktop as well as mobile devices to navigate streamed 3D scenes, and that introduces a new style of bookmarks. +We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient pre-computations that enhance the streaming. Finally, we present a user study that provides us with traces on which we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.