This commit is contained in:
Thomas Forgione 2020-02-11 13:22:55 +01:00
parent 6d1ef9fdd9
commit 24fd2390ac
10 changed files with 57 additions and 57 deletions


@ -17,10 +17,10 @@ I also want to thank Praveen: working with him was a pleasure.
I would also like to thank the big ones (whom I forgot to thank during the defense, \emph{oopsies}), Sylvie, Jean-Denis, Simone, Yvain, Pierre.
They, as well as my advisors, not only helped me during my PhD; they were also my teachers back in engineering school and are a great part of the reason why I enjoyed being in school and being a PhD student.
Then, I also want to thank Sylvie and Muriel, for the administrative parts of the PhD, which can often be painful.
% Potes n7
I would also like to thank the colleagues from when I was in engineering school, since they contributed to the skills that I used during this PhD\@: Alexandre, Killian, David, Maxence, Martin, Maxime, Korantin, Marion, Amandine, Émilie.
% Famille
Finally, I want to thank my brother, my sister and my parents, for the support and guidance.


@ -4,9 +4,9 @@ In this thesis, we attempted to answer four main problems: \textbf{the content p
To answer those problems, we presented three main contributions.
\paragraph{}
Our first contribution analyzes the links between the streaming policy and the user's interaction.
We set up a basic system allowing navigation in a 3D scene (represented as a textured mesh) with the content being streamed through the network from a remote server.
We developed a navigation aid in the form of \textbf{3D bookmarks}, and we conducted a user study to analyze its impact on navigation and streaming.
On the one hand, consistent with the state of the art, we observed that navigation aids \textbf{help people navigate in a scene}, since they perform tasks faster and more easily.
On the other hand, we showed that benefiting from bookmarks in 3D navigation comes at the cost of a negative impact on the quality of service (QoS): since users navigate faster, they require more data during the same time span.
However, we also showed that this cost is not inevitable: using the prior knowledge we have about bookmarks, we are able to \textbf{precompute an optimal data ordering offline} so that the QoS increases when users click on bookmarks.
@ -30,8 +30,8 @@ Finally, we brought back the \textbf{3D navigation bookmark within our DASH-3D f
We developed interfaces that allow navigating in 3D scenes for both \textbf{desktop and mobile devices} and we reintroduced bookmarks in these interfaces.
The setup of our first contribution considered only geometry, triangle by triangle, which made precomputations and ordering straightforward.
Moreover, as the server knew exactly what the client needed, it could create chunks adapted to the client's requirements.
In DASH-3D, the data are structured a priori (offline), so that chunks are grouped independently of a client's need.
We therefore focused on precomputing an optimized order for chunks from each bookmark, and altered the streaming policies from our second contribution to switch to this optimized order when a user clicks a bookmark.
Evaluations showed that the QoS is positively impacted by those policies.
A demo paper was published at the conference ACMMM in 2019~\citep{dash-3d-bookmarks-demo} showing the interfaces for desktop and mobile clients with bookmarks, but without the streaming aspect.
A journal paper will be submitted shortly to showcase this third contribution.


@ -6,14 +6,14 @@ In this section, we shall detail three major perspectives for future work.
\subsection{Semantic information}
In this thesis, no attention has been given to semantics.
Our content preparation considers only spatial information, so our adaptation sets and segments may separate data that could be grouped semantically.
Having semantic information could help us derive a better structure for our content: we know for example that displaying half a building leads to poor quality of experience.
In order to account for semantics beyond partitioning, we could also adapt the utilities we have defined for our segments: semantically significant data can be weighted as more important than the rest in our utilities.
\subsection{Compression / multi-resolution for geometry}
In this thesis, we considered different resolutions for textures, but we have not investigated geometry compression nor multi-resolution.
Geometry data are transmitted as OBJ files (mostly consisting in ASCII encoded numbers), which is terrible for transmission.
Compression would reduce the size of the geometry files, thus increasing the quality of experience.
Supporting multi-resolution geometry would improve it even more, even if performing multi-resolution on a large and heterogeneous scene is difficult.
To this day, little attention has been given to multi-resolution compression for textured geometry~\citep{maglo20153d}, and existing work focuses on single 3D objects.


@ -1,27 +1,27 @@
A 3D streaming system is a system that progressively collects 3D data.
The previous chapter voluntarily remained vague about what \emph{3D data} actually are.
This chapter presents in detail the 3D data we consider and how they are rendered.
We also give insights about interaction and streaming by comparing the 3D setting to the video one.
\section{What is a 3D model?\label{f:3d}}
\subsection{3D data}
The 3D models we are interested in are sets of textured meshes, which can potentially be arranged in a scene graph.
Such models can typically contain the following:
\begin{itemize}
\item \textbf{Vertices}, which are 3D points,
\item \textbf{Faces}, which are polygons defined from vertices (most of the time, they are triangles),
\item \textbf{Textures}, which are images that can be used to paint faces in order to add visual richness,
\item \textbf{Texture coordinates}, which are information added to a face, describing how the texture should be painted over it,
\item \textbf{Normals}, which are 3D vectors that can give information about light behaviour on a face.
\end{itemize}
The Wavefront OBJ format describes all these elements as text.
A 3D model encoded in the OBJ format typically consists of two files: the material file (\texttt{.mtl}) and the object file (\texttt{.obj}).
\paragraph{}
The material file declares all the materials that the object file will reference.
A material consists of a name and other photometric properties, such as ambient, diffuse and specular colors, as well as texture maps, which are images that are painted on faces.
Each face corresponds to a material.
A simple material file is visible on Snippet~\ref{i:mtl}.
@ -33,7 +33,7 @@ These elements are numbered starting from 1.
Faces are declared by using the indices of these elements. A face is a polygon with an arbitrary number of vertices and can be declared in several ways:
\begin{itemize}
\item \texttt{f 1 2 3} defines a triangle face that joins the first, the second and the third declared vertices;
\item \texttt{f 1/1 2/3 3/4} defines a similar triangle but with texture coordinates: the first texture coordinate is associated with the first vertex, the third with the second vertex, and the fourth with the third vertex;
\item \texttt{f 1//1 2//3 3//4} defines a similar triangle but referencing normals instead of texture coordinates;
\item \texttt{f 1/1/1 2/3/3 3/4/4} defines a triangle with both texture coordinates and normals.
@ -98,13 +98,13 @@ A typical 3D renderer follows Algorithm~\ref{f:renderer}.
\caption{A rendering algorithm\label{f:renderer}}
\end{algorithm}
The first task the renderer needs to perform is sending the data to the GPU\@: this is done in the loading loop during an initialization step.
This step can be slow, but it is generally acceptable since it only occurs once at the beginning of the program.
Then, the renderer starts the rendering loop: at each frame, it renders the whole scene, binding each object's material to the GPU and then drawing the object.
During the rendering loop, there are two things to consider regarding performance:
\begin{itemize}
\item the more faces in a geometry, the slower the \texttt{draw} call;
\item the more objects in the scene, the more overhead caused by the CPU/GPU communication at each step of the loop.
\end{itemize}
The way the loop works forces objects with different materials to be rendered separately.
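The loop just described can be sketched in a few lines of Rust. This is a toy illustration, not the thesis's renderer: \texttt{Object}, \texttt{bind\_material} and \texttt{draw} are hypothetical stand-ins for real GPU state and calls, and we merely count invocations.

```rust
// Toy sketch of the rendering loop described above (hypothetical names;
// bind_material() and draw() stand in for real GPU calls).
struct Object {
    material_id: u32,
    face_count: u32,
}

fn bind_material(id: u32, binds: &mut u32) {
    // A real renderer would upload material state (texture, colors) here.
    let _ = id;
    *binds += 1;
}

fn draw(obj: &Object, draws: &mut u32) {
    // A real renderer would issue a GPU draw call over obj.face_count faces.
    let _ = obj.face_count;
    *draws += 1;
}

/// Renders one frame; returns (material binds, draw calls) for inspection.
fn render_frame(scene: &[Object]) -> (u32, u32) {
    let (mut binds, mut draws) = (0, 0);
    // One material bind and one draw call per object: objects with
    // different materials cannot share a draw call.
    for obj in scene {
        bind_material(obj.material_id, &mut binds);
        draw(obj, &mut draws);
    }
    (binds, draws)
}

fn main() {
    let scene = vec![
        Object { material_id: 0, face_count: 100 },
        Object { material_id: 1, face_count: 2500 },
        Object { material_id: 0, face_count: 40 },
    ];
    let (binds, draws) = render_frame(&scene);
    println!("{} binds, {} draws", binds, draws); // 3 binds, 3 draws
}
```

Counting binds makes the overhead visible: even though two objects share material 0, the per-object loop still binds it twice, which is why merging objects by material is a common optimization.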


@ -22,8 +22,8 @@ It provides classes to deal with everything we need:
\item the \textbf{Renderer} class contains all the WebGL code needed to render a scene on the web page;
\item the \textbf{Object} class contains all the boilerplate needed to manage the tree structure of the content; it contains a transform (translation and rotation) and it can have children that are other objects;
\item the \textbf{Scene} class is the root object; it contains all of the objects we want to render and it is passed as an argument to the render function;
\item the \textbf{Geometry} and \textbf{BufferGeometry} classes hold the vertex buffers; we discuss them further in the next paragraph;
\item the \textbf{Material} class holds the properties used to render geometry (the most important being the texture); there are many classes derived from Material, and developers can choose which material they want for their objects;
\item the \textbf{Mesh} class links the geometry and the material; it derives from the Object class and can thus be added to a scene and rendered.
\end{itemize}
A snippet of the basic usage of these classes is given in Snippet~\ref{f:three-hello-world}.
@ -53,13 +53,13 @@ In this section, we explain the specificities of Rust and why it is an adequate
Rust is a system programming language focused on safety.
It is made to be efficient (and effectively has performance comparable to C\footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust.html}} or C++\footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust-gpp.html}}) but with some extra features.
C++ users might see it as a language like C++ but that forbids undefined behaviours.\footnote{In Rust, when you need to execute code that might lead to undefined behaviour, you must put it inside an \texttt{unsafe} block. Many operations are not available outside an \texttt{unsafe} block (e.g., dereferencing a raw pointer, or mutating a static variable). The idea is that you can use \texttt{unsafe} blocks when you require them, but you should avoid them as much as possible, and when you do use them, you must be particularly careful.}
The most powerful concept from Rust is \emph{ownership}.
Basically, every value has a variable that we call its \emph{owner}.
To be able to use a value, you must either be its owner or borrow it.
There are two types of borrow, the immutable borrow and the mutable borrow (roughly equivalent to references in C++).
The compiler comes with the \emph{borrow checker} which makes sure you only use variables that you are allowed to use.
For example, the owner can only use the value if it is not being borrowed, and it is only possible to either mutably borrow a value once, or immutably borrow a value many times.
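As a small standalone illustration (not one of the snippets from this chapter), the following compiles because the two immutable borrows are no longer used by the time the mutable borrow happens; reordering the lines so that \texttt{push} occurs while \texttt{first} is still alive would be rejected by the borrow checker.

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of simultaneous immutable borrows is allowed.
    let first = &v[0];
    let last = &v[v.len() - 1];
    println!("{} {}", first, last); // 1 3

    // The immutable borrows are not used past this point, so a mutable
    // borrow is now permitted. The borrow checker would reject `v.push(4)`
    // if `first` or `last` were still used afterwards.
    v.push(4);
    println!("{:?}", v); // [1, 2, 3, 4]
}
```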
At first, the borrow checker seems particularly efficient at detecting bugs in concurrent software, but in fact, it is also decisive in non-concurrent code.
Consider the piece of C++ code in Snippets~\ref{f:undefined-behaviour-cpp} and~\ref{f:undefined-behaviour-cpp-it}.
@ -144,7 +144,7 @@ Its objectives are:
\subsubsection{Conclusion}
In our work, many tasks will consist of 3D content analysis, reorganization, rendering and evaluation.
Many of these tasks require long computations, lasting from hours to entire days.
To perform them, we need a programming language with good performance.
In addition, the extra features that Rust provides tremendously ease development, which is why we use Rust for all tasks that do not require a web interface.


@ -1,19 +1,19 @@
\section{Similarities and differences between video and 3D\label{i:video-vs-3d}}
The video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing similarities and differences between the video and the 3D scenarios as well as having knowledge about video streaming literature are the key to developing an efficient 3D streaming system.
\subsection{Chunks of data}
In order to be able to perform streaming, data need to be segmented so that a client can request chunks of data and display them to the user while requesting another chunk.
In video streaming, data chunks typically consist of a few seconds of video.
In mesh streaming, some progressive mesh approaches encode a base mesh that contains low resolution geometry and textures and different chunks that increase the resolution of the base mesh.
Otherwise, a mesh can also be segmented by separating geometry and textures, creating some chunks that contain faces of the model and others that contain textures.
\subsection{Data persistence}
One of the main differences between video and 3D streaming is data persistence.
In video streaming, only one chunk of video is required at a time.
Of course, most video streaming services prefetch some future chunks, and keep in cache some previous ones, but a minimal system could work without latency and keep in memory only two chunks: the current one and the next one.
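Such a minimal two-chunk buffer can be sketched as follows; this is a toy model under our own naming (\texttt{MinimalBuffer} and its methods are hypothetical), with chunks reduced to their indices.

```rust
// Toy model of a minimal video buffer: only the chunk being played and the
// prefetched next one are kept in memory (chunks reduced to their indices).
struct MinimalBuffer {
    current: usize,
    next: Option<usize>,
}

impl MinimalBuffer {
    fn new() -> Self {
        // Start playing chunk 0 while chunk 1 is being prefetched.
        MinimalBuffer { current: 0, next: Some(1) }
    }

    /// Called when playback finishes the current chunk: the prefetched chunk
    /// becomes current (the old one is dropped) and the following chunk is
    /// requested.
    fn advance(&mut self) {
        if let Some(n) = self.next {
            self.current = n;        // previous chunk leaves memory here
            self.next = Some(n + 1); // request the next chunk
        }
    }
}

fn main() {
    let mut buf = MinimalBuffer::new();
    buf.advance();
    buf.advance();
    println!("playing chunk {}", buf.current); // playing chunk 2
}
```

The contrast with 3D is immediate: here, dropping the previous chunk is always safe because the user can never see it again, an assumption that no longer holds when the user can turn the camera back toward content already displayed.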
Already a few problems appear here regarding 3D streaming:
@ -26,7 +26,7 @@ Already a few problems appear here regarding 3D streaming:
All major video streaming platforms support multi-resolution streaming.
This means that a client can choose the quality at which it requests the content.
It can be chosen directly by the user or automatically determined by analyzing the available resources (size of the screen, download bandwidth, device performance).
\begin{figure}[th]
\centering
@ -39,18 +39,18 @@ Such strategies are reviewed in Section~\ref{sote:3d-streaming}.
\subsection{Media types}
Just like a video, a 3D scene is composed of different media types.
In video, those media are mostly images, sounds, and subtitles, whereas in 3D, they are geometry and textures.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.
In video streaming, most of the data (in terms of bytes) are used for images.
Thus, the most important thing a video streaming system should do is optimize image streaming.
That is why, on a YouTube video for example, there may be 6 available qualities for images (144p, 240p, 320p, 480p, 720p and 1080p) but only 2 qualities for sound.
This is one of the main differences between video and 3D streaming: in a 3D setting, the ratio between geometry and texture varies from one scene to another, and balancing those two types of content is a key problem.
\subsection{Interaction}
The way of interacting with content is another important difference between video and 3D.
In a video interface, there is only one degree of freedom: time.
The only things a user can do are let the video play, pause, resume, or jump to another time in the video.
There are also controls for other options that are described \href{https://web.archive.org/web/20191014131350/https://support.google.com/youtube/answer/7631406?hl=en}{on this help page}.
@ -287,10 +287,10 @@ When it comes to 3D, there are many approaches to manage user interaction.
Some interfaces mimic the video scenario, where the only variable is the time and the camera follows a predetermined path on which the user has no control.
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add 2 degrees of freedom to the timeline: the user does not control the camera's position but can control the angle. This mimics the 360 video scenario.
This is typically the case of the video game \href{http://nolimitscoaster.com/}{\emph{nolimits 2: roller coaster simulator}}, which works with VR devices (Oculus Rift, HTC Vive, etc.) where the only interaction available to the user is turning the head.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the camera's position, and 2 being the angles (assuming the up vector is fixed; some interfaces allow changing it, giving a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball \href{https://threejs.org/examples/?q=controls\#misc_controls_trackball}{(live example here)}, and the orbit controls, which behave like the trackball controls but preserve the up vector \href{https://threejs.org/examples/?q=controls\#misc_controls_orbit}{(live example here)}.
These types of controls are notably used on the popular mesh editor \href{http://www.meshlab.net/}{MeshLab} and on \href{https://sketchfab.com/}{SketchFab}, the YouTube for 3D models.
@ -301,14 +301,14 @@ These types of controls are notably used on the popular mesh editor \href{http:/
\end{figure}
Another popular way of controlling a free camera in a virtual environment is the first person controls \href{https://threejs.org/examples/?q=controls\#misc_controls_pointerlock}{(live example here)}.
These controls are typically used in shooting video games, the mouse rotates the camera and the keyboard translates it.
\subsection{Relationship between interface, interaction and streaming}
In both video and 3D systems, streaming affects interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, they might start moving around on the timeline, but if they see that the streaming is just enough not to stall, they might prefer not to interact and just watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or get frustrated and leave the video.
The same types of behaviour occur in 3D streaming: if a user is somewhere in a scene and sees more data appearing, they might wait until enough data have arrived, but if they see that nothing happens, they might leave to look for data somewhere else.
Those examples show how streaming can affect interaction, but interaction also affects streaming.
In a video streaming scenario, if a user is watching peacefully without interacting, the system just has to request the next chunks of video and display them.
@ -317,7 +317,7 @@ Just like in the video setup, the way a user navigates in a networked virtual en
Moving slowly allows the system to collect and display data to the user, whereas moving frenetically puts more pressure on the streaming: the data that the system requested may be obsolete when the response arrives.
Moreover, the interface and the way elements are displayed to the user also impact their behaviour.
A streaming system can use this effect to enhance the quality of experience by providing feedback on the streaming to the user via the interface.
For example, on YouTube, the buffered portion of the video is displayed in light grey on the timeline, whereas the portion that remains to be downloaded is displayed in dark grey.
A user is more likely to click on the light grey part of the timeline than on the dark grey part, preventing the streaming from stalling.


@ -5,7 +5,7 @@ Recent software uses 2D images from cameras to reconstruct 3D data, e.g. \href{h
More and more devices are specifically built to harvest 3D data: for example, a LIDAR (Light Detection And Ranging) sensor can compute 3D distances by measuring the time of flight of light. The recent research interest in autonomous vehicles has allowed more companies to develop cheaper LIDARs, which increases the potential for new 3D content creation.
Thanks to these techniques, more and more 3D data become available.
These models have potential for multiple purposes: for example, they can be printed, which can reduce the production cost of some pieces of hardware or enable the creation of new objects, but most uses are based on visualization.
For example, they can be used for augmented reality, to provide users with feedback that can help workers with complex tasks, but also for fashion (for example, \href{https://www.fittingbox.com}{Fittingbox} is a company that develops software to virtually try on glasses, as in Figure~\ref{i:fittingbox}).
\begin{figure}[ht]
\centering
@ -24,7 +24,7 @@ For example, they can be used for augmented reality, to provide user with feedba
\caption{Sketchfab interface\label{i:sketchfab}}
\end{figure}
In most 3D visualization systems, the 3D data are stored on a server and need to be transmitted to a terminal before the user can visualize them.
The improvements in the acquisition setups we described lead to an increasing quality of the 3D models, and thus to an increasing size in bytes as well.
Simply downloading 3D content and waiting until it is fully downloaded to let the user visualize it is no longer a satisfactory solution, so adaptive streaming is needed.
In this thesis, we propose a full framework for navigation and streaming of large 3D scenes, such as districts or whole cities.


@ -1,7 +1,7 @@
\section{Thesis outline}
First, in Chapter~\ref{f}, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: video and 3D share many features, and analyzing the video setting gives inspiration for building a 3D streaming system.
In Chapter~\ref{sote}, we present a review of the state of the art in multimedia interaction and streaming.
This chapter starts with an analysis of the video streaming standards.
@ -12,16 +12,16 @@ Then, in Chapter~\ref{bi}, we present our first contribution: an in-depth analys
We first develop a basic interface for navigating in 3D and then introduce 3D objects called \emph{bookmarks} that help users navigate in the scene.
We then present a user study that we conducted on 50 people, which shows that bookmarks ease user navigation: they improve performance at tasks such as finding objects.
% Then, we setup a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
We analyze how the presence of bookmarks impacts the streaming: we propose and evaluate streaming policies based on precomputations relying on bookmarks and that measurably increase the quality of experience.
In Chapter~\ref{d3}, we present the most important contribution of this thesis: DASH-3D.
DASH-3D is an adaptation of DASH (Dynamic Adaptive Streaming over HTTP), the video streaming standard, to 3D streaming.
We first describe how we adapt the concepts of DASH to 3D content, including the segmentation of content.
We then define utility metrics that rate each chunk depending on the user's position.
Then, we present a client and various streaming policies based on our utilities, which can benefit from the DASH format.
We finally evaluate the different parameters of our client.
In Chapter~\ref{sb}, we present our last contribution: the integration of the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
We first develop an interface that allows desktop as well as mobile devices to navigate in streamed 3D scenes, and that introduces a new style of bookmarks.
We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient precomputations that enhance the streaming.
Finally, we present a user study that provides us with traces on which we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.


@ -157,16 +157,16 @@ By benefiting from the video compression techniques, the authors are able to rea
\subsection{Geometry and textures}
As discussed in Chapter~\ref{f:3d}, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, one must handle the competition between geometry and textures, and the system needs to address this trade-off.
Balancing between streaming of geometry and texture data is addressed by~\citep{batex3},~\citep{visual-quality-assessment}, and~\citep{mesh-texture-multiplexing}.
Their approaches combine the distortion caused by having lower-resolution meshes and textures into a single view-independent metric.
\citep{progressive-compression-textured-meshes} also deals with the geometry / texture compromise.
This work designs a cost-driven framework for 3D data compression, both in terms of geometry and textures.
The authors generate an atlas for textures that enables efficient compression and multiresolution scheme.
The authors generate an atlas for textures that enables efficient compression and a multi-resolution scheme.
All four works considered a single mesh, and have constraints on the types of meshes that they are able to compress.
Since the 3D scenes we are interested in in our work consists in a soup of textured polygon, those constraints are not satisfied and we cannot use those techniques.
Since the 3D scenes we are interested in consist of soups of textured polygons, those constraints are not satisfied and we cannot use those techniques.
% All four works considered a single, manifold textured mesh model with progressive meshes, and are not applicable in our work since we deal with large and potentially non-manifold scenes.
@ -212,13 +212,13 @@ In~\citeyear{3d-tiles-10x}, 3D Tiles streaming system was improved by preloading
\citep{zampoglou} is another example of a streaming framework: it is the first paper that proposes to use DASH to stream 3D content.
In their work, the authors describe a system that allows users to access 3D content at multiple resolutions.
They organize the content, following DASH terminology, into periods, adaptation sets, representations.
They organize the content, following DASH terminology, into periods, adaptation sets, representations and segments.
Their first adaptation set encodes the tree structure of the scene graph.
Each further adaptation set contains both geometry and texture information and is available at different resolutions defined in a corresponding representation.
To avoid requests that would take too long and thus introduce latency, the representations are split into segments.
The authors discuss the optimal number of polygons that should be stored in a single segment.
On the one hand, using segments containing very few faces will induce many HTTP requests from the client, and will lead to poor streaming efficiency.
On the other hand, if segments contain too many faces, the time to load the segment will be long and the system loses adaptability.
On the other hand, if segments contain too many faces, the time to load the segment is long and the system loses adaptability.
Their approach works well for several objects, but does not handle view-dependent streaming, which is desirable in the use case of large NVEs\@.
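The segment-size trade-off discussed above can be illustrated with a back-of-the-envelope model, assuming a fixed per-request round-trip overhead (all numbers below are hypothetical, for illustration only):

```python
# Hypothetical model of total download time as a function of segment size.
# Each HTTP request pays a fixed round-trip overhead; larger segments
# amortize that overhead, but reduce how quickly the client can adapt.

def total_time(n_faces, faces_per_segment, bytes_per_face=32,
               rtt=0.05, bandwidth=1e6):
    """Seconds to fetch a mesh split into fixed-size segments."""
    n_segments = -(-n_faces // faces_per_segment)  # ceiling division
    transfer = n_faces * bytes_per_face / bandwidth
    return n_segments * rtt + transfer

# Many tiny segments: request overhead dominates.
slow = total_time(100_000, faces_per_segment=100)
# Fewer, larger segments: overhead is amortized, at the cost of adaptability.
fast = total_time(100_000, faces_per_segment=10_000)
```

Under these assumptions the 100-face segmentation spends far more time on request overhead than on the transfer itself, which is the inefficiency the authors point out.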
% \subsection{Prefetching in NVE}


@ -16,13 +16,13 @@ This type of network architecture is called CDN (Content Delivery Network) and i
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed
standard for adaptively streaming video on the web \citep{dash-std-full}, made to be simple, scalable and inter-operable.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content to downloaded, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD\@.
DASH describes guidelines to prepare and structure video content, in order to allow great adaptability of the streaming without requiring any server-side computation. The client should be able to make good decisions on which part of the content to download, based only on an estimation of the network constraints and on the information provided in a descriptive file: the MPD\@.
\subsubsection{DASH structure}
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has four layers: the periods, the adaptation sets, the representations, and the segments.
A MPD has a hierarchical structure, meaning it has multiple periods, and each period can have multiple adaptation sets, each adaptation set can have multiple representation, and each representation can have multiple segments.
An MPD has a hierarchical structure, meaning it has multiple periods, each period can have multiple adaptation sets, each adaptation set can have multiple representations, and each representation can have multiple segments.
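This four-layer hierarchy can be sketched as nested containers (a minimal illustration only; real MPDs carry many more attributes, such as codecs, timing and mime types):

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of the four-layer MPD hierarchy:
# MPD -> Period -> AdaptationSet -> Representation -> Segment.

@dataclass
class Segment:
    url: str

@dataclass
class Representation:
    bandwidth: int  # bits per second required to play this representation
    segments: List[Segment] = field(default_factory=list)

@dataclass
class AdaptationSet:
    representations: List[Representation] = field(default_factory=list)

@dataclass
class Period:
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)

@dataclass
class MPD:
    periods: List[Period] = field(default_factory=list)

# A one-period, one-quality presentation with a single segment.
mpd = MPD(periods=[Period(adaptation_sets=[AdaptationSet(
    representations=[Representation(bandwidth=500_000,
                                    segments=[Segment(url="seg-1.m4s")])])])])
```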
\paragraph{Periods.}
Periods are used to delimit content depending on time.
@ -50,9 +50,9 @@ If a user wants to seek somewhere else in the video, only one segment of data is
\subsubsection{Content preparation and server}
Encoding a video in the DASH format consists of partitioning the content into periods, adaptation sets, representations and segments as explained above, and generating a Media Presentation Description (MPD) file which describes this organization.
Once the data is prepared, it can simply be hosted on a static HTTP server which does no computation other than serving files when it receives requests.
Once the data are prepared, they can simply be hosted on a static HTTP server which does no computation other than serving files when it receives requests.
All the intelligence and the decision making are moved to the client side.
This is one of the DASH strengths: no powerful server is required, and since static HTTP server are stable and efficient, all DASH clients can benefit from it.
This is one of DASH's strengths: no powerful server is required, and since static HTTP servers are mature and efficient, all DASH clients can benefit from them.
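One common form of this client-side decision making can be sketched as follows: given an estimate of the available bandwidth, the client picks the best representation that fits. This is an illustrative heuristic, not a prescribed DASH algorithm; the safety margin is an assumption of the sketch.

```python
def choose_representation(bitrates, estimated_bandwidth, safety=0.8):
    """Pick the highest bitrate not exceeding a fraction of the estimate.

    bitrates: available representation bitrates in bits/s, in any order.
    safety: keep a margin below the estimate to absorb throughput variation.
    Falls back to the lowest representation if none fits.
    """
    budget = estimated_bandwidth * safety
    fitting = [b for b in sorted(bitrates) if b <= budget]
    return fitting[-1] if fitting else min(bitrates)
```

With representations at 500 kbps, 1 Mbps and 3 Mbps and a 2 Mbps estimate, the 80% margin leaves a 1.6 Mbps budget, so the client selects the 1 Mbps representation.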
\subsubsection{Client side adaptation}
@ -73,7 +73,7 @@ That way, a client can choose to download either the low resolution of the whole
\end{figure}
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the tile.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile describing the position of the tile in relation to the full video.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile, relative to the full video.
An example of such a property is given in Snippet~\ref{sota:srd-xml}.
\begin{figure}[th]
@ -98,7 +98,7 @@ An example of such a property is given in Snippet~\ref{sota:srd-xml}.
]{assets/state-of-the-art/video/srd.xml}
\end{figure}
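With the position and size declared for each tile, a client can keep only the tiles that overlap its current viewport. The following is a simplified sketch of such a visibility test; the tuple layout is an assumption made for the example, not part of the SRD scheme:

```python
def tile_visible(tile, viewport):
    """Axis-aligned overlap test between a tile and the viewport.

    Both arguments are (x, y, width, height) tuples expressed in the
    full-video coordinate space declared by the SRD supplemental property.
    """
    tx, ty, tw, th = tile
    vx, vy, vw, vh = viewport
    return tx < vx + vw and vx < tx + tw and ty < vy + vh and vy < ty + th

# A 2x2 tiling of a 1920x1080 video; the viewport covers a top-left area,
# so only the top-left tile needs to be downloaded at high quality.
tiles = [(0, 0, 960, 540), (960, 0, 960, 540),
         (0, 540, 960, 540), (960, 540, 960, 540)]
visible = [t for t in tiles if tile_visible(t, (100, 100, 800, 400))]
```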
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed. While Figure~\ref{sota:srd-png} illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven particularly useful in the context of 360 video streaming (see for example \citep{ozcinar2017viewport}).
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed. While Figure~\ref{sota:srd-png} illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven to be particularly useful in the context of 360 video streaming (see for example \citep{ozcinar2017viewport}).
This is especially interesting in the context of 3D streaming, since we find the same pattern of a user viewing only a part of the content.
% \subsection{Prefetching in video streaming}