Last proofread

Thomas Forgione 2020-02-08 18:31:59 +01:00
parent b5ffd3408e
commit 5b0c6dddfe
24 changed files with 128 additions and 140 deletions


@@ -16,11 +16,11 @@ This work has been published at the ACM MMSys conference in 2016~\citep{bookmark
\paragraph{}
After studying the interactive aspect of 3D navigation, we proposed a contribution focusing on the content preparation and the streaming policies of such a system.
The objective of this contribution was to introduce a system able to perform \textbf{scalable, view-dependent 3D streaming}.
This new framework brings many improvements upon the basic system described in our first contribution: support for texture, externalisation of necessary computations from the server to the clients, support for multi-resolution textures, rendering performances considerations.
This new framework brought many improvements upon the basic system described in our first contribution: support for textures, externalisation of the necessary computations from the server to the clients, support for multi-resolution textures, and rendering performance considerations.
We drew a lot of inspiration from DASH, a video streaming standard chosen for its scalability and its adaptability.
We exploit the fact that DASH is made to be content agnostic to fit 3D content into its structure.
Following the path set by DASH-SRD, we propose to tile 3D content using a tree and encode this partition into a description file (MPD) to allow view-dependent streaming, without the need for computation on the server side.
On the client side, we implement loading policies that optimize a utility metric estimating how much geometry and texture segments contribute to the visual rendering of the scene at a particular viewpoint.
We exploited the fact that DASH is made to be content agnostic to fit 3D content into its structure.
Following the path set by DASH-SRD, we proposed to tile 3D content using a tree and encode this partition into a description file (MPD) to allow view-dependent streaming, without the need for computation on the server side.
On the client side, we implemented loading policies that optimize a utility metric estimating how much geometry and texture segments contribute to the visual rendering of the scene at a particular viewpoint.
We thoroughly tested our solutions by running simulations with different parameter values, as well as different loading policies, to propose an efficient framework that we name DASH-3D.
This work has been published as a full paper at the ACMMM conference in 2018~\citep{dash-3d}.
A demonstration paper on the DASH-3D implementation was also published~\citep{dash-3d-demo}.
@@ -31,6 +31,7 @@ We developed interfaces that allow navigating in 3D scenes for both \textbf{desk
The setup of our first contribution considered only geometry, triangle by triangle, which made precomputations and ordering straightforward.
Moreover, as the server knew the client's needs exactly, it could create chunks adapted to the client's requirements.
In DASH-3D, the data is structured a priori (offline), so that chunks are grouped independently of any particular client's needs.
We therefore focus on precomputing an optimal order for chunks from each bookmark, and, alter the streaming policies from our second contribution to switch to this optimal order when a user clicks a bookmark.
Simulations show that the QoS is positively impacted by those policies.
A demo paper was published at the conference ACMMM in 2019~\citep{dash-3d-bookmarks-demo} showing the interfaces for desktop and mobile clients with bookmarks, but without the streaming aspect. A journal paper will be submitted shortly to value this third contribution.
We therefore focused on precomputing an optimal order for chunks from each bookmark, and altered the streaming policies from our second contribution to switch to this optimized order when a user clicks a bookmark.
Simulations showed that the QoS is positively impacted by these policies.
A demo paper was published at the ACMMM conference in 2019~\citep{dash-3d-bookmarks-demo} showing the interfaces for desktop and mobile clients with bookmarks, but without the streaming aspect.
A journal paper presenting this third contribution will be submitted shortly.


@@ -10,12 +10,12 @@ Our content preparation considers only spatial information both for 3D content a
Having semantic information could help us derive a better structure for our content: we know, for example, that displaying half a building leads to a low quality of experience.
In order to account for semantics beyond partitioning, we could also adapt the utilities we have defined for our segments: semantically significant data can be considered more important than other data by taking this into account in our utility measures.
\subsection{Compression / multi resolution for geometry}
\subsection{Compression / multi-resolution for geometry}
In this thesis, we considered different resolutions for textures, but we did not investigate geometry compression or multi-resolution geometry.
Geometry data is transmitted as OBJ files (mostly consisting of ASCII-encoded numbers), which is very inefficient for transmission.
Compression would reduce the size of the geometry files, thus increasing the quality of experience.
Supporting multi resolution geometry would improve it even more, even if performing multi-resolution on a large and heterogeneous scene is difficult.
Supporting multi-resolution geometry would improve it even more, even if performing multi-resolution on a large and heterogeneous scene is difficult.
To this day, little attention has been paid to multi-resolution compression for textured geometry~\citep{maglo20153d}, and the existing work focuses on individual 3D objects.
Once again, semantic information could be a great help in this regard.
Other compression schemes are also interesting for our framework: \citet{demir2016proceduralization} describe an algorithm to proceduralize architectural models, and the same authors propose a semi-automatic method in \citet{demir2018guided} to give some control to model editors.


@@ -8,7 +8,7 @@ A camera path generated by a particular user is a set of viewpoint $v(t_i)$ inde
All DASH clients are built from the same basic building blocks, as shown in Figure~\ref{d3:dash-scheme}:
\begin{itemize}
\item the \emph{access client}, which is the module that deals with making HTTP requests and receiving responses;
\item the \emph{segment parsers}, which decodes the data downloaded by the access client, whether it be materials, geometry or textures;
\item the \emph{segment parsers}, which decode the data downloaded by the access client, whether it be materials, geometry or textures;
\item the \emph{control engine}, which analyses the bandwidth to dynamically adapt to it;
\item the \emph{media engine}, which renders the multimedia content and the user interface to the screen.
\end{itemize}
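To make the interplay between these modules concrete, here is a minimal JavaScript sketch of one client step; the four module objects and their method names are illustrative assumptions, not the actual implementation:
\begin{lstlisting}
// Hypothetical wiring of the four building blocks; all names are illustrative.
async function clientStep(viewpoint) {
  const segment = controlEngine.chooseNextSegment(viewpoint); // adaptation logic
  const bytes = await accessClient.download(segment.url);     // HTTP request
  const parsed = segmentParsers.parse(segment, bytes);        // decode the data
  mediaEngine.update(parsed);                                 // render the content
}
\end{lstlisting}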
@@ -75,7 +75,7 @@ All DASH clients are built from the same basic bricks, as shown in Figure~\ref{d
\caption{DASH client-server architecture\label{d3:dash-scheme}}
\end{figure}
The DASH client first downloads the MPD file to get the material (.mtl) file containing information about all the geometry and textures available for the entire 3D model.
The DASH client first downloads the MPD file to get the material file containing information about all the geometry and textures available for the entire 3D model.
At time instance $t_i$, the DASH client decides to download the appropriate segments containing the geometry and the texture to generate the viewpoint $v(t_{i+1})$ for the time instance $t_{i+1}$.
Starting from $t_1$, the camera continuously follows a camera path $C=\{v(t_i), t_i \in [t_1,t_{end}]\}$, along which downloading opportunities are strategically exploited to sequentially query the most useful segments.
@@ -93,7 +93,7 @@ The utility is a function of a segment, either geometry or texture, and the curr
\subsubsection{Offline parameters}
Let us first detail all the parameters available from the offline/static preparation of the 3D NVE\@.
These parameters are stored in the MPD file.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}_{3D}(s^G)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}\left(s^G\right)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
Note that the texture segments have similar information, but computed at \textit{navigation time} $t_i$.
The second piece of information stored in the MPD for all segments, geometry and texture alike, is the size of the segment (in kB).
@@ -118,7 +118,7 @@ Utility for texture segments follows from the geometric utility.
The utility of a geometric segment $s^G$ for a viewpoint $v(t_i)$ is:
\begin{equation*}
\mathcal{U} \Big(s^G,v(t_i) \Big) = \frac{\mathcal{A}_{3D}(s^G)}{\mathcal{D}{\left(v{(t_i)},AS^G\right)}^2}
\mathcal{U} \Big(s^G,v(t_i) \Big) = \frac{\mathcal{A}(s^G)}{\mathcal{D}{\left(v{(t_i)},AS^G\right)}^2}
\end{equation*}
where $AS^G$ is the adaptation set containing $s^G$.
@@ -127,11 +127,11 @@ That way, we favor segments with big faces that are close to the camera.
\subsubsection{Utility for texture segments}
For a texture $T$ stored in a segment $s^T$, the triangles in $\Delta(T)$ are stored in arbitrary geometry segments, that is, they do not have spatial coherence.
Thus, for each $k^{th}$ downloaded geometry segment $s_k^G$, and total downloaded segment $K$ at time $t_i$, we collect the triangles of $\Delta(T, t_i)$ in $s^G_k$, and compute the ratio of $\mathcal{A}_{3D}(s_k^G)$ covered by these triangles.
Thus, for each $k^{th}$ downloaded geometry segment $s_k^G$, where $K$ denotes the set of geometry segments downloaded at time $t_i$, we collect the triangles of $\Delta(T, t_i)$ in $s^G_k$, and compute the ratio of $\mathcal{A}(s_k^G)$ covered by these triangles.
So, we define the utility:
\begin{equation*}
\mathcal{U}\Big( s^T,v(t_i) \Big)
= psnr(s^T) \sum_{k\in K}\frac{\mathcal{A}_{3D}( s_k^G\cap \Delta(T,t_i))}{\mathcal{A}_{3D}(s_k^G)} \mathcal{U}\Big( s_k^G,v(t_i) \Big)
= psnr(s^T) \sum_{k\in K}\frac{\mathcal{A}( s_k^G\cap \Delta(T,t_i))}{\mathcal{A}(s_k^G)} \mathcal{U}\Big( s_k^G,v(t_i) \Big)
\end{equation*}
where we sum over all geometry segments received before time $t_i$ that intersect $\Delta(T,t_i)$ and whose adaptation set is in the frustum.
This formula defines the utility of a texture segment by computing the linear combination of the utility of the geometry segments that use this texture, weighted by the proportion of area covered by the texture in the segment.
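As an illustration, both utilities translate directly into code. The following JavaScript sketch assumes segment objects that carry the MPD metadata (the 3D area as \texttt{area3D}, the PSNR as \texttt{psnr}) and bounding-box centers; these names are assumptions for this example, not the actual implementation:
\begin{lstlisting}
// Distance from the camera position to the bounding-box center of the
// adaptation set (the denominator of the geometry utility).
function distance(viewpoint, box) {
  const dx = viewpoint.x - box.cx, dy = viewpoint.y - box.cy, dz = viewpoint.z - box.cz;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Utility of a geometry segment: 3D area over squared distance.
function geometryUtility(segment, viewpoint) {
  const d = distance(viewpoint, segment.adaptationSet.boundingBox);
  return segment.area3D / (d * d);
}

// Utility of a texture segment: PSNR-weighted sum over the geometry
// segments downloaded so far, weighted by the covered area ratio.
function textureUtility(texture, downloadedGeometry, viewpoint) {
  let sum = 0;
  for (const g of downloadedGeometry) {
    const covered = g.areaCoveredBy(texture); // triangles of g painted with the texture
    sum += (covered / g.area3D) * geometryUtility(g, viewpoint);
  }
  return texture.psnr * sum;
}
\end{lstlisting}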
@@ -145,7 +145,7 @@ Having defined a utility on both geometry and texture segments, the client uses
Along the camera path $C=\{v(t_i)\}$, viewpoints are indexed by a continuous time interval $t_i \in [t_1,t_{end}]$.
By contrast, the DASH adaptation logic proceeds sequentially along a discrete timeline.
The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$ to download and will be followed by subsequent decisions at $t_2, t_3, \dots$.
While selecting $s_i^*$, the i-th best segment to request, the adaptation logic compromises between geometry, texture, and the available \texttt{representations} given the current bandwidth, camera dynamics, and the previously described utility scores.
While selecting $s_i^*$, the $i^{th}$ best segment to request, the adaptation logic compromises between geometry, texture, and the available representations given the current bandwidth, camera dynamics, and the previously described utility scores.
The difference between $t_{i+1}$ and $t_{i}$ is the $s_i^*$ delivery delay.
It varies with the segment size and network conditions.
Algorithm~\ref{d3:next-segment} details how our DASH client makes decisions.
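For illustration, here is a JavaScript sketch of the simplest (greedy) variant of such a decision, picking the segment that maximizes predicted utility at arrival per unit of predicted delivery delay; \texttt{utility} and \texttt{predictViewpoint} are assumed helpers:
\begin{lstlisting}
// Greedy decision sketch: all names are illustrative assumptions.
function nextSegment(candidates, viewpoint, bandwidth, rtt) {
  let best = null;
  let bestScore = -Infinity;
  for (const s of candidates) {
    const delay = s.byteSize / bandwidth + rtt;    // predicted delivery delay
    const v = predictViewpoint(viewpoint, delay);  // where the camera will be
    const score = utility(s, v) / delay;           // utility at arrival per second
    if (score > bestScore) { bestScore = score; best = s; }
  }
  return best;
}
\end{lstlisting}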
@@ -209,11 +209,11 @@ s^{\texttt{GREEDY}}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \
\subsection{JavaScript client\label{d3:js-implementation}}
In order to be able to evaluate our system, we need to collect traces and perform analyses on them.
Since our scene is large, and since the system we are describing allows navigating in a streaming scene, we developed a JavaScript Web client that implements our utility metrics and policies.
Since our scene is large, and since the system we are describing allows navigating in a streaming scene, we developed a JavaScript web client that implements our utility metrics and policies.
\subsubsection{Media engine}
Performance of our system is a key aspect in our work; as such, we can not use the default geometries described in Section~\ref{f:geometries} because of its poor performance, and we instead use buffer geometries.
Performance of our system is a key aspect of our work; as such, we cannot use the default geometries described in Section~\ref{f:geometries} because of their poor performance, and we instead use buffer geometries.
In our system, however, changes to the 3D content always take the same form: we only add faces and textures to the model.
We therefore implemented a class that derives from \texttt{BufferGeometry} for convenience.
\begin{itemize}
@@ -223,11 +223,11 @@ We therefore implemented a class that derives \texttt{BufferGeometry}, for more
\item It also keeps track of what part of the buffers has been transmitted to the GPU\@: THREE.js allows us to set the range of the buffer that we want to update, and we are able to update only what is necessary.
\end{itemize}
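A minimal sketch of such a class follows; the names and buffer capacity are illustrative, and depending on the THREE.js version, \texttt{setAttribute} may be called \texttt{addAttribute}:
\begin{lstlisting}
// Growable geometry: preallocate buffers, upload only the appended range.
class StreamedGeometry extends THREE.BufferGeometry {
  constructor(capacity) {
    super();
    this.count = 0; // number of vertices currently used
    this.positions = new THREE.BufferAttribute(new Float32Array(capacity * 3), 3);
    this.setAttribute('position', this.positions);
    this.setDrawRange(0, 0); // draw nothing until vertices arrive
  }
  addVertex(x, y, z) {
    this.positions.set([x, y, z], this.count * 3);
    // Restrict the GPU upload to the part of the buffer that changed.
    this.positions.updateRange = { offset: this.count * 3, count: 3 };
    this.positions.needsUpdate = true;
    this.count += 1;
    this.setDrawRange(0, this.count);
  }
}
\end{lstlisting}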
\paragraph{Our 3D model class.\label{d3:model-class}}
\subsection{Our 3D model class\label{d3:model-class}}
As explained in the previous subsections, a geometry and a material are bound together in a mesh.
This means that we are forced to have as many meshes as there are materials in our model.
To make this easy to manage, we implemented a \textbf{Model} class that holds both geometry and textures.
We can add vertices, faces, and materials to this model, and it internally manages with the right geometries, materials and meshes.
We can add vertices, faces, and materials to this model, and it internally manages the right geometries, materials and meshes.
In order to avoid having many models that share the same material (which would harm performance), it automatically merges faces that share the same material in the same buffer geometry, as shown in Figure~\ref{d3:render-structure}.
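A sketch of this merging logic, where \texttt{StreamedGeometry} stands for the growable geometry sketched above and \texttt{MAX\_VERTICES} is an illustrative capacity:
\begin{lstlisting}
// One buffer geometry (and thus one mesh) per material.
class Model {
  constructor() {
    this.parts = new Map(); // material name -> { geometry, mesh }
  }
  addFace(materialName, material, face) {
    if (!this.parts.has(materialName)) {
      const geometry = new StreamedGeometry(MAX_VERTICES);
      this.parts.set(materialName, { geometry, mesh: new THREE.Mesh(geometry, material) });
    }
    // Faces sharing a material are merged into the same buffer geometry.
    const { geometry } = this.parts.get(materialName);
    for (const [x, y, z] of face.vertices) geometry.addVertex(x, y, z);
  }
}
\end{lstlisting}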
\begin{figure}[ht]
@@ -347,7 +347,7 @@ Since our system has many tasks to perform, it is natural to use workers to mana
However, what a worker can do is very limited, since it cannot access the variables of the main script.
Because of this, we are forced to run the renderer on the main script, where it can access the HTML page, and we move all the other tasks (i.e.\ the access client, the control engine and the segment parsers) to the worker.
Since the main script is the only thread communicating with the GPU, it will still have to update the model with the parsed content it receives from the worker.
We do not use web workers to improve the framerate of the system, but to reduce the latency that occurs when receiving a new segment, which can be frustrating since in a single thread scenario, each time a segment is received, the interface freezes for around half a second.
We do not use web workers to improve the framerate of the system, but rather to reduce the latency that occurs when receiving a new segment, which can be frustrating in a single-threaded scenario, since each time a segment is received, the interface would freeze for around half a second.
A sequence diagram of what happens when downloading, parsing and rendering content is shown in Figure~\ref{d3:sequence}.
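A sketch of this split between the main script and the worker; the file names, message shapes and helper functions are assumptions:
\begin{lstlisting}
// main.js: only the renderer lives here; it applies parsed segments.
const worker = new Worker('dash-worker.js');
worker.onmessage = (event) => model.update(event.data); // parsed, GPU-ready data
worker.postMessage({ type: 'start', mpd: 'scene/mpd.xml' });

// dash-worker.js: access client, control engine and parsers run here.
onmessage = async (event) => {
  if (event.data.type !== 'start') return;
  const response = await fetch(nextSegmentUrl()); // chosen by the control engine
  postMessage(parseSegment(await response.arrayBuffer()));
};
\end{lstlisting}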
\begin{figure}[ht]
@@ -471,6 +471,6 @@ In order to be able to run simulations, we develop the bricks of the DASH client
\item the \textbf{simulator} takes a user trace as a parameter; it then replays the trace using specific parameters of the access client and outputs a file containing the history of the simulation (which files have been downloaded, and when);
\item the \textbf{renderer} takes the user trace as well as the history generated by the simulator as parameters, and renders images that correspond to what would have been seen.
\end{itemize}
When simulating experiments, we will run the simulator on many traces that we collected during user-studies, and we will then run the renderer program according to the traces to generate images corresponding to the simulation.
When simulating experiments, we run the simulator on many traces that we collected during user studies, and we then run the renderer program according to the traces to generate images corresponding to the simulation.
We are then able to compute PSNR between those frames and the ground truth frames.
Doing so guarantees that our simulator is not affected by the performance of our renderer.
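For reference, a sketch of the PSNR computation between two frames, assuming two 8-bit pixel arrays of equal length:
\begin{lstlisting}
// PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit channels.
function psnr(frame, groundTruth) {
  let mse = 0;
  for (let i = 0; i < frame.length; i++) {
    const d = frame[i] - groundTruth[i];
    mse += d * d;
  }
  mse /= frame.length;
  return 10 * Math.log10((255 * 255) / mse);
}
\end{lstlisting}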


@@ -1,6 +1,6 @@
\section{Content preparation\label{d3:dash-3d}}
In this section, we describe how we pre-process and store the 3D data of the NVE, consisting of a polygon soup, textures, and material information into a DASH-compliant Media Presentation Description (MPD) file.
In this section, we describe how we preprocess and store the 3D data of the NVE, consisting of a polygon soup, textures, and material information, into a DASH-compliant Media Presentation Description (MPD) file.
In our work, we use the \texttt{obj} file format for the polygons, \texttt{png} for textures, and \texttt{mtl} format for material information.
The process, however, applies to other formats as well.
@@ -8,12 +8,12 @@ The process, however, applies to other formats as well.
In DASH, the information about content storage and characteristics, such as location, resolution, or size, is extracted from an MPD file by the client.
The client relies only on this information to decide which chunk to request and at which quality level.
The MPD file is an XML file that is organized into different sections hierarchically.
The \texttt{period} element is a top-level element, which for the case of video, indicates the start time and length of a video chapter.
This element does not apply to NVE, and we use a single \texttt{period} for the whole scene, as the scene is static.
Each \texttt{period} element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
The period element is a top-level element which, in the case of video, indicates the start time and length of a video chapter.
This element does not apply to an NVE, and we use a single period for the whole scene, as the scene is static.
Each period element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
We utilize adaptation sets to organize a 3D scene's material, geometry, and texture.
The piece of software that does the preprocessing of the model consists in file manipulation and is written in Rust as well.
The piece of software that preprocesses the model consists of file manipulations and is also written in Rust.
It successively preprocesses the geometry and then the textures.
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs} which works like a stack:
\begin{itemize}
@@ -33,12 +33,12 @@ A face belongs to a cell if its barycenter falls inside the corresponding boundi
Each cell corresponds to an adaptation set.
Thus, geometry information is spread on adaptation sets based on spatial coherence, allowing the client to download the relevant faces selectively.
A cell is relevant if it intersects the frustum of the client's current viewpoint. Figure~\ref{d3:big-picture} shows the relevant cells in green.
As our 3D content, a virtual environment, is biased to spread along the horizontal plane, we alternate between splitting between the two horizontal directions.
As our 3D content, a virtual environment, is biased to spread along the horizontal plane, we split the bounding box alternately along the two horizontal directions.
We create a separate adaptation set for large faces (e.g., the sky or ground) because they are essential to the 3D model and do not fit into cells.
We consider a face to be large if its area in 3D is more than $a+3\sigma$, where $a$ and $\sigma$ are the average and the standard deviation of the 3D areas of the faces, respectively.
In our example, this selects the 5 largest faces, which represent $15\%$ of the total face area.
We thus obtain a decomposition of the NVE into adaptation sets that partitions the geometry of the scene into a small adaptation set containing the larger faces of the model, and smaller adaptation sets containing the remaining faces.
We thus obtain a decomposition of the NVE into adaptation sets that partitions the geometry of the scene into one adaptation set containing the larger faces of the model, and smaller adaptation sets containing the remaining faces.
We store the spatial location of each adaptation set, characterized by the coordinates of its bounding box, in the MPD file as the supplementary property of the adaptation set in the form of ``\textit{$x_{\min}$, width, $y_{\min}$, height, $z_{\min}$, depth}'' (as shown in Snippet~\ref{d3:mpd}).
This information is used by the client to implement view-dependent streaming (Section~\ref{d3:dash-client}).
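As an illustration, here is how the large-face selection can be sketched (in JavaScript for consistency with the other sketches, although the actual preprocessing is written in Rust; \texttt{area3D} is an assumed per-face attribute):
\begin{lstlisting}
// Faces above mean + 3 * standard deviation of the 3D areas are "large"
// and go to a dedicated adaptation set instead of the k-d tree cells.
function splitLargeFaces(faces) {
  const areas = faces.map((f) => f.area3D);
  const mean = areas.reduce((a, b) => a + b, 0) / areas.length;
  const variance = areas.reduce((a, b) => a + (b - mean) ** 2, 0) / areas.length;
  const threshold = mean + 3 * Math.sqrt(variance);
  return {
    large: faces.filter((f) => f.area3D > threshold),
    regular: faces.filter((f) => f.area3D <= threshold),
  };
}
\end{lstlisting}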
@@ -51,10 +51,10 @@ The client can use this attribute to render a face for which the corresponding t
\subsubsection{Material management}
The material \texttt{.mtl} file is a text file that describes all materials used in the \texttt{.obj} files for the entire 3D model.
The material (MTL) file is a text file that describes all materials used in the OBJ files for the entire 3D model.
A material has a name, properties such as specular parameters, and, most importantly, a path to a texture file.
The \texttt{.mtl} file maps each face of the \texttt{.obj} to a material.
As the \texttt{.mtl} file is a different type of media than geometry and texture, we define a particular adaptation set for this file, with a single representation.
The MTL file maps each face of the OBJ to a material.
As the MTL file is a different type of media than geometry and texture, we define a particular adaptation set for this file, with a single representation.
\subsection{Representations}\label{d3:representation}
Each adaptation set can contain one or more representations of the geometry or texture data, at different levels of detail (e.g., a different number of faces).
@@ -82,7 +82,7 @@ Figure~\ref{d3:textures} illustrates the use of the textures against the renderi
\subsection{Segments}
To allow random access to the content within an adaptation set storing geometry data, we group the faces into segments.
Each segment is then stored as a \texttt{.obj} file which can be individually requested by the client.
Each segment is then stored as an OBJ file which can be individually requested by the client.
For geometry, we partition the faces in an adaptation set into sets of $N_s$ faces, by first sorting the faces by their area in 3D space in descending order, and then placing each successive $N_s$ faces into a segment.
Thus, the first segment contains the biggest faces and the last one the smallest.
In addition to the selected faces, a segment stores all face vertices and attributes so that each segment is independent.
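A sketch of this segmentation, again in JavaScript for consistency although the actual preprocessing is written in Rust:
\begin{lstlisting}
// Sort faces by descending 3D area, then cut into chunks of N_s faces:
// the first segment gets the biggest faces, the last one the smallest.
function buildSegments(faces, ns) {
  const sorted = [...faces].sort((a, b) => b.area3D - a.area3D);
  const segments = [];
  for (let i = 0; i < sorted.length; i += ns) {
    segments.push(sorted.slice(i, i + ns));
  }
  return segments;
}
\end{lstlisting}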
@@ -110,5 +110,5 @@ For textures, each representation contains a single segment.
]{assets/dash-3d/geometry-as.xml}
\end{figure}
Now that 3D data is partitioned and that the MPD file is generated, we see in the next section how the client uses the MPD to request the appropriate data chunks
Now that the 3D data is partitioned and that the MPD file is generated, we see in the next section how the client uses the MPD to request the appropriate data chunks.


@@ -9,7 +9,7 @@ We use a city model of the Marina Bay area in Singapore in our experiments.
The model came in 3DS Max format and was converted into Wavefront OBJ format before the processing described in Section~\ref{d3:dash-3d}.
The converted model has 387,551 vertices and 552,118 faces.
Table~\ref{d3:size} gives some general information about the model and Figure~\ref{d3:heterogeneity} illustrates the heterogeneity of our model (wireframe rendering is used to illustrate the heterogeneity of the geometry complexity).
We partition the geometry into a k-$d$ tree until the leafs have less than 10000 faces, which gives us 64 adaptation sets, plus one containing the large faces.
We partition the geometry into a $k$-d tree until the leaves have fewer than 10000 faces, which gives us 64 adaptation sets, plus one containing the large faces.
\begin{figure}[th]
\centering
@@ -83,8 +83,8 @@ We present experiments to validate our implementation choices at every step of o
We replay the user-generated camera paths with various bandwidth conditions while varying key components of our system.
Table~\ref{d3:experiments} sums up all the components we varied in our experiments.
We compare the impact of two space-partitioning trees, a $k$-d tree and an Octree, on content preparation.
We also try several utility metrics for geometry segments: an offline one, which assigns to each geometry segment $s^G$ the cumulated 3D area of its belonging faces $\mathcal{A}_{3D}(s^G)$; an online one, which assigns to each geometry segment the inverse of its distance to the camera position; and finally our proposed method, as described in Section~\ref{d3:utility} ($\mathcal{A}_{3D}(s^G)/ \mathcal{D}{(v{(t_i)},AS^G)}^2$).
We compare the impact of two space-partitioning trees, a $k$-d tree and an octree, on content preparation.
We also try several utility metrics for geometry segments: an offline one, which assigns to each geometry segment $s^G$ the cumulated 3D area of its belonging faces $\mathcal{A}(s^G)$; an online one, which assigns to each geometry segment the inverse of its distance to the camera position; and finally our proposed method, as described in Section~\ref{d3:utility} ($\mathcal{A}(s^G)/ \mathcal{D}{(v{(t_i)},AS^G)}^2$).
We consider two streaming policies to be applied by the client, proposed in Section~\ref{d3:dash-client}.
The greedy strategy determines, at each decision time, the segment that maximizes its predicted utility at arrival divided by its predicted delivery delay, which corresponds to equation (\ref{d3:greedy}).
The second streaming policy that we run is the one we proposed in equation (\ref{d3:smart}).
@@ -130,9 +130,9 @@ Finally, we try several bandwidth parameters to study how our system can adapt t
\end{figure}
Figure~\ref{d3:preparation} shows how the space partition can affect the rendering quality.
We use our proposed utility metrics (see Section~\ref{d3:utility}) and streaming policy from Equation (\ref{d3:smart}), on content divided into adaptation sets obtained either using a $k$-d tree or an Octree and run experiments on all camera paths at 5 Mbps.
The octree partitions content into non-homogeneous adaptation sets; as a result, some adaptation sets may contain smaller segments, which contain both important (large) and non-important polygons. For the $k$-d tree, we create cells containing the same number of faces $N_a$ (here, we take $N_a=10k$).
Figure~\ref{d3:preparation} shows that the system seems to be slightly less efficient with an Octree than with a $k$-d tree based partition, but this result is not significant.
We use our proposed utility metrics (see Section~\ref{d3:utility}) and streaming policy from equation (\ref{d3:smart}), on content divided into adaptation sets obtained either using a $k$-d tree or an octree and run experiments on all camera paths at 5 Mbps.
The octree partitions content into non-homogeneous adaptation sets; as a result, some adaptation sets may contain smaller segments, which contain both important (large) and unimportant polygons. For the $k$-d tree, we create cells containing the same number of faces $N_a$ (here, we take $N_a=10000$).
Figure~\ref{d3:preparation} shows that the system seems to be slightly less efficient with an octree than with a $k$-d tree based partition, but this result is not significant.
For the remaining experiments, partitioning is based on a $k$-d tree.
\begin{figure}[th]
@@ -194,9 +194,9 @@ The PSNR significantly improves when the 3D area of faces is considered for crea
We also compared the greedy vs.\ proposed streaming policy (as shown in Figure~\ref{d3:greedy-weakness}) for limited bandwidth (5 Mbps).
The proposed scheme outperforms the greedy scheme during the first 30 seconds and does a better job overall.
Table~\ref{d3:greedy-vs-proposed} shows the average PSNR of the proposed method and the greedy method for different downloading bandwidths.
In the first 30 sec, since there are relatively few 3D contents downloaded, making a better decision at what to download matters more: we observe during that time that the proposed method leads to 1 --- 1.9 dB better in quality terms of PSNR compared to Greedy.
In the first 30 seconds, since relatively little 3D content has been downloaded, making better decisions about what to download matters more: we observe during that time that the proposed method yields a PSNR 1 to 1.9 dB higher than the greedy method.
Table~\ref{d3:percentages} shows the distribution of texture resolutions that are downloaded by greedy and our Proposed scheme, at different bandwidths.
Table~\ref{d3:percentages} shows the distribution of texture resolutions that are downloaded by greedy and our proposed scheme, at different bandwidths.
Resolution 5 is the highest and 1 is the lowest.
The table shows a weakness of the greedy policy: the distribution of downloaded textures does not adapt to the bandwidth.
In contrast, our proposed streaming policy adapts to an increasing bandwidth by downloading higher resolution textures (13.9\% at 10 Mbps, vs. 0.3\% at 2.5 Mbps).


@@ -3,7 +3,7 @@
In this chapter, we take a small step back from interaction and propose a system with simple interactions that nevertheless addresses most of the open problems mentioned in Section~\ref{i:challenges}.
We take inspiration from video streaming: building on the similarities between video streaming and 3D streaming (seen in~\ref{i:video-vs-3d}), we benefit from the efficiency of DASH (seen in~\ref{sote:dash}) for streaming 3D content.
DASH is based on content preparation and structuring, which not only helps the streaming policies but also leads to a scalable and efficient system, since it moves the load entirely from the server to the clients.
A DASH client downloads the structure of the content, and then, depending on its needs independently of the server, decides what to download.
A DASH client downloads the structure of the content, and then, depending on its needs and independently of the server, decides what to download.
In this chapter, we show how to mimic DASH video with 3D streaming, and we develop a system that keeps DASH benefits.
Section~\ref{d3:dash-3d} describes our content preparation and metadata, and all the preprocessing that is done to our model to allow efficient streaming.


@@ -17,7 +17,7 @@ In this chapter, we present the most important contribution of this thesis: adap
First, we show how to prepare 3D data into a format that complies with DASH data organisation, and we store enough metadata to enable a client to perform efficient streaming.
The data preparation consists of partitioning the scene into spatially coherent cells and segmenting each cell into chunks with a fixed number of faces, which are sorted by area so that faces of different levels of detail are not grouped together.
We also export each texture at different resolution.
We also export each texture at different resolutions.
We encode the metadata that describes the data organisation into a 3D version of the Media Presentation Description (MPD) that DASH uses for video.
All this prepared content is then stored on a simple static HTTP server: clients can request the content without any need for computation on the server side, allowing a server to support an arbitrary number of clients.
% Namely, we store in the metadata the coordinates of the cells of the $k$-d tree, the areas of geometry chunks, and the average colors of textures.


@@ -6,8 +6,8 @@ We also give insights about interaction and streaming by comparing the 3D case t
\section{What is a 3D model?\label{f:3d}}
\subsection{3D data}
Most classical 3D models are sets of meshes and textures, which can potentially be arranged in a scene graph.
Such a model can typically contain the following:
The 3D models we are interested in are sets of meshes and textures, which can potentially be arranged in a scene graph.
Such models can typically contain the following:
\begin{itemize}
\item \textbf{Vertices}, which are 3D points,
@@ -33,7 +33,7 @@ These elements are numbered starting from 1.
Faces are declared by using the indices of these elements. A face is a polygon with an arbitrary number of vertices and can be declared in multiple ways:
\begin{itemize}
\item \texttt{f 1 2 3} defines a triangle face that joins the first, the second and the third vertex declared;
\item \texttt{f 1 2 3} defines a triangle face that joins the first, the second and the third declared vertex;
\item \texttt{f 1/1 2/3 3/4} defines a similar triangle but with texture coordinates: the first texture coordinate is associated with the first vertex, the third texture coordinate with the second vertex, and the fourth texture coordinate with the third vertex;
\item \texttt{f 1//1 2//3 3//4} defines a similar triangle but referencing normals instead of texture coordinates;
\item \texttt{f 1/1/1 2/3/3 3/4/4} defines a triangle with both texture coordinates and normals.
@@ -68,7 +68,6 @@ An example of object file is visible on Snippet~\ref{i:obj}.
\subsection{Rendering a 3D model\label{i:rendering}}
A typical 3D renderer follows Algorithm~\ref{f:renderer}.
\begin{algorithm}[th]
\SetKwData{Material}{material}
\SetKwData{Object}{object}
@@ -99,7 +98,6 @@ A typical 3D renderer follows Algorithm~\ref{f:renderer}.
\caption{A rendering algorithm\label{f:renderer}}
\end{algorithm}
The first task the renderer needs to perform is sending the data to the GPU\@: this is done in the loading loop during an initialisation step.
This step can be slow, but it is generally acceptable since it only occurs once at the beginning of the program.
Then, the renderer starts the rendering loop: at each frame, it renders the whole scene; for each object, it binds the corresponding material to the GPU and then renders the object.
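A sketch of such a loop; the \texttt{gpu} abstraction and its methods are illustrative assumptions:
\begin{lstlisting}
// Loading loop: upload all objects once, then render every frame.
function run(scene, gpu) {
  for (const object of scene.objects) gpu.upload(object); // slow, done once
  const frame = () => {
    gpu.clear();
    for (const object of scene.objects) {
      gpu.bindMaterial(object.material); // set uniforms, bind textures
      gpu.draw(object);
    }
    requestAnimationFrame(frame);
  };
  requestAnimationFrame(frame);
}
\end{lstlisting}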


@@ -1,11 +1,11 @@
\section{Implementation details}
During this thesis, a lot of software has been developed, and for this software to be successful and efficient, we chose the appropriate languages.
During this thesis, a lot of software has been developed, and for this software to be successful and efficient, we chose appropriate languages.
When it comes to 3D streaming systems, we need two kinds of software.
\begin{itemize}
\item \textbf{Interactive applications} which can run on as many devices as possible so we can easily conduct user studies. For this context, we chose the \textbf{JavaScript language}.% , since it can run on many devices and it has great support for WebGL\@.
\item \textbf{Native applications} which can run fast on desktop devices, in order to run simulations and evaluate our ideas. For this context, we chose the \textbf{Rust} language.% , which is a somewhat recent language that provides both the efficiency of C and C++ and the safety of functional languages.
\item \textbf{Interactive applications} which can run on as many devices as possible so we can easily conduct user studies. For this context, we chose the \textbf{JavaScript} language.% , since it can run on many devices and it has great support for WebGL\@.
\item \textbf{Native applications} which can run fast on desktop devices, in order to prepare data, run simulations and evaluate our ideas. For this context, we chose the \textbf{Rust} language.% , which is a somewhat recent language that provides both the efficiency of C and C++ and the safety of functional languages.
\end{itemize}
\subsection{JavaScript}
@@ -80,7 +80,7 @@ Consider the piece of C++ code in Snippets~\ref{f:undefined-behaviour-cpp} and~\
\end{figure}
This loop should run endlessly because the vector grows in size as we add elements in the loop.
But the most important thing here is that since we add elements to the vector, it will eventually need to be reallocated, and that reallocation will invalidate the iterator, meaning that the following iterator will provoke an undefined behaviour.
But the most important thing here is that since we add elements to the vector, it will eventually need to be reallocated, and that reallocation will invalidate the iterator, meaning that the following iteration will provoke undefined behaviour.
The equivalent code in Rust is in Snippets~\ref{f:undefined-behaviour-rs} and~\ref{f:undefined-behaviour-rs-it}.
\begin{figure}[ht]
@@ -102,7 +102,7 @@ The equivalent code in Rust is in Snippets~\ref{f:undefined-behaviour-rs} and~\r
\end{minipage}
\end{figure}
What happens is that the iterator needs to borrow the vector.
Since it is borrowed, it can no longer be borrowed as mutable since mutating it could invalidate the other borrowers.
Because it is borrowed, it can no longer be borrowed as mutable since mutating it could invalidate the other borrowers.
And effectively, the borrow checker makes compilation fail with the error in Snippet~\ref{f:undefined-behaviour-rs-error}.
\begin{figure}[ht]
@@ -114,7 +114,7 @@ And effectively, the borrow checker will crash the compiler with the error in Sn
\end{figure}
This is one of many examples of how powerful the borrow checker is: in Rust code, there can be no dangling references, and all the segmentation faults they would cause are caught by the compiler.
The borrow checker may seem like an enemy to newcomers because it often rejects code that seem correct, but once one gets used to it, they understand what is the problem with their code and either fix the problem easily, or realise that the whole architecture is wrong and understand why.
The borrow checker may seem like an enemy to newcomers because it often rejects code that seems correct, but once they get used to it, they understand what the problem with their code is and either fix it easily, or realise that the whole architecture is wrong and understand why.
It is probably for those reasons that Rust is the \emph{most loved programming language} according to the Stack Overflow Developer Survey in~\citeyear{so-survey-2016}, \citeyear{so-survey-2017}, \citeyear{so-survey-2018} and~\citeyear{so-survey-2019}.


@@ -46,7 +46,7 @@ In both cases, an algorithm for content streaming has to acknowledge those diffe
In video streaming, most of the data (in terms of bytes) is used for images.
Thus, the most important thing a video streaming system should do is to optimise images streaming.
That is why, for a YouTube video for example, there may be 6 available qualities for images (144p, 240p, 320p, 480p, 720p and 1080p) but only 2 qualities for sound.
This is one of the main differences between video and 3D streaming: in a 3D scene, geometry and texture sizes are approximately the same, and leveraging between those two types of content is a key problem.
This is one of the main differences between video and 3D streaming: in a 3D setting, the ratio between geometry and texture varies from one scene to another, and leveraging between those two types of content is a key problem.
\subsection{Interaction}
@@ -85,6 +85,7 @@ There are also controls for other options that are described \href{https://web.a
% \item \texttt{I} activates the mini-player (allowing to search for other videos while keeping the current video playing in the bottom right corner).
% \end{itemize}
All the keyboard shortcuts are summed up in Figure~\ref{i:youtube-keyboard}.
Those interactions are different if the user is using a mobile device.
\newcommand{\relativeseekcontrol}{LightBlue}
\newcommand{\absoluteseekcontrol}{LemonChiffon}
@@ -281,26 +282,15 @@ All the keyboard shortcuts are summed up in Figure~\ref{i:youtube-keyboard}.
\caption{Youtube shortcuts (white keys are unused)\label{i:youtube-keyboard}}
\end{figure}
Those interactions are different if the user is using a mobile device.
\begin{itemize}
\item To pause a video, the user touches the screen once to make the timeline and the buttons appear and once on the pause button at the center of the screen.
\item To resume a video, the user touches the play button at the center of the screen.
\item To navigate to another moment of the video, the user can:
\begin{itemize}
\item double touch the left of the screen to move 5 seconds backwards;
\item double touch the right of the screen to move 5 seconds forwards.
\end{itemize}
\end{itemize}
When it comes to 3D, there are many approaches to managing user interaction.
Some interfaces mimic the video scenario, where the only variable is the time and the camera follows a predetermined path on which the user has no control.
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add 2 degrees of freedom to the timeline: the user does not control the position of the camera but can control the angle. This mimics the scenario of 360 videos.
This is typically the case of the video game \href{http://nolimitscoaster.com/}{\emph{nolimits 2: roller coaster simulator}} which works with VR devices (oculus rift, HTC vive, etc.) where the only interaction the user has is turning their head.
This is typically the case of the video game \href{http://nolimitscoaster.com/}{\emph{NoLimits 2: Roller Coaster Simulator}}, which works with VR devices (Oculus Rift, HTC Vive, etc.) where the only interaction available to the user is turning the head.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the position of the camera, and 2 being the angle (assuming the up vector is unchangeable, some interfaces might allow that, giving a sixth degree of freedom).
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the position of the camera, and 2 being the angles (assuming the up vector is unchangeable, some interfaces might allow that, giving a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball \href{https://threejs.org/examples/?q=controls\#misc_controls_trackball}{(live example here)}, and the orbit controls, which behave like the trackball controls but preserve the up vector \href{https://threejs.org/examples/?q=controls\#misc_controls_orbit}{(live example here)}.
These types of controls are notably used by the popular mesh editor \href{http://www.meshlab.net/}{MeshLab} and by \href{https://sketchfab.com/}{SketchFab}, the YouTube of 3D models.


@@ -1,4 +1,4 @@
\section{DASH 3D client}\label{fr:dashclientspec}
\section{DASH-3D client}\label{fr:dashclientspec}
In this section, we detail a DASH NVE client that takes advantage of the 3D content preparation.
The DASH client starts by downloading the MPD, before starting to download segments. When a segment arrives, the client decides on the next segment to download, so that it can be displayed when it arrives.
@@ -7,23 +7,23 @@ We consider a virtual camera that continuously follows a path $C=\{
\subsection{Segment utility}\label{fr:utility}
Unlike video streaming, where the size (in bytes) of each segment is correlated with the quality of the received video, for 3D content, the size of a piece of content is not necessarily correlated with its contribution to the rendering quality. A large polygon with a great visual impact takes up roughly as many bytes as a small one. Moreover, the visual impact depends on the viewpoint --- a large, distant polygon will not contribute as much as a small polygon closer to the user. It is therefore important for a DASH-NVE client to estimate what we call the \emph{utility of a segment}, so as to make sound downloading decisions.
Unlike the video transmission case, where the size (in bytes) of each segment is correlated with the quality of the received video, for 3D content, the size of a piece of content is not necessarily correlated with its contribution to the rendering quality. A large polygon with a great visual impact takes up roughly as many bytes as a small one. Moreover, the visual impact depends on the viewpoint --- a large, distant polygon will not contribute as much as a small polygon closer to the user. It is therefore important for a DASH NVE client to estimate what we call the \emph{utility of a segment}, so as to make sound downloading decisions.
The utility is a function of a segment, whether geometry or texture, and of the current viewpoint (camera position and gaze direction), and is therefore computed dynamically by the client from the metadata of the MPD\@.
\paragraph{Static parameters}
First, we detail the parameters computed during content preparation and stored in the MPD\@.
\subsubsection{Static parameters}
First, we detail the parameters computed during content preparation and recorded in the MPD\@.
First of all, for each geometry segment $s^G$, we compute an area $\mathcal{A}(s^G)$, equal to the sum of the areas of the polygons of the segment.
Then, for each texture segment $s^T$, the MPD records the \emph{MSE} (mean squared error) between the current image and the image at the highest available resolution.
Finally, for all segments, we store the file size (in bytes). Indeed, the geometry segments have a similar number of faces, so their sizes are roughly the same. As for the textures, their sizes are generally much smaller than those of the geometry segments, but also highly variable, since between two resolutions, the number of pixels is multiplied by 4.
Finally, for all segments, we record the file size (in bytes). Indeed, the geometry segments have a similar number of faces, so their sizes are roughly the same. As for the textures, their sizes are generally much smaller than those of the geometry segments, but also highly variable, since between two resolutions, the number of pixels is multiplied by 4.
\paragraph{Dynamic parameters}
In addition to the static parameters stored in the MPD for each segment, viewpoint-dependent parameters are computed during navigation. First, an area measure is computed for the texture segments. Since a texture is painted onto a set of polygons, the area of the texture is taken to be the sum of the areas of these polygons. We could compute this information statically and store it in the MPD, but computing it dynamically allows us to take into account only the polygons already received by the client. For a texture $T$, we denote the set of polygons painted with this texture by $\Delta(s^T) = \Delta(T)$ (which depends only on the texture $T$ and is therefore constant for every representation of the texture). At each time $t_i$, a subset of $\Delta(T)$ has been downloaded; we denote it $\Delta(T, t_i)$.
\subsubsection{Dynamic parameters}
In addition to the static parameters recorded in the MPD for each segment, viewpoint-dependent parameters are computed during navigation. First, an area measure is computed for the texture segments. Since a texture is painted onto a set of polygons, the area of the texture is taken to be the sum of the areas of these polygons. We could compute this information statically and record it in the MPD, but computing it dynamically allows us to take into account only the polygons already received by the client. For a texture $T$, we denote the set of polygons painted with this texture by $\Delta(s^T) = \Delta(T)$ (which depends only on the texture $T$ and is therefore constant for every representation of the texture). At each time $t_i$, a subset of $\Delta(T)$ has been downloaded; we denote it $\Delta(T, t_i)$.
Moreover, each geometry segment belongs to an \emph{adaptation set} $AS^G$ whose bounding-box coordinates are stored in the MPD\@. Given the bounding box $\mathcal{BB}(AS^G)$ and the viewpoint $v(t_i)$ at time $t_i$, the client computes the distance $\mathcal{D}(v(t_i), AS^G)$ to $\mathcal{BB}(AS^G)$ as the distance from the center of $\mathcal{BB}(AS^G)$ to the principal point of the camera.
Moreover, each geometry segment belongs to an \emph{adaptation set} $AS^G$ whose bounding-box coordinates are recorded in the MPD\@. Given the bounding box $\mathcal{BB}(AS^G)$ and the viewpoint $v(t_i)$ at time $t_i$, the client computes the distance $\mathcal{D}(v(t_i), AS^G)$ to $\mathcal{BB}(AS^G)$ as the distance from the center of $\mathcal{BB}(AS^G)$ to the principal point of the camera.
\paragraph{Utility of geometry segments}
\subsubsection{Utility of geometry segments}
We now have all the parameters needed to derive a utility measure for a geometry segment. The utility of texture segments is derived from the utilities of geometry segments.
The utility of a geometry segment $s^G$ for a viewpoint $v(t_i)$ is
@@ -34,10 +34,10 @@ where $AS^G$ is the \emph{adaptation set} containing $s^G$.
Concretely, the utility of a segment is proportional to the area covered by this segment, and inversely proportional to the square of the distance between the camera and the bounding box of its \emph{adaptation set}. In this way, we favor segments containing large faces that are close to the camera.
\paragraph{Utility of texture segments}
\subsubsection{Utility of texture segments}
For a texture $T$, the polygons of $\Delta(T)$ may lie in several geometry segments. Thus, for each already downloaded geometry segment $s_k^G \in K$, we collect the polygons of $\Delta(T, t_i)$ in $s^G_k$, and thereby compute the ratio of $\mathcal{A}(s_k^G)$ occupied by $T$. We therefore define the utility, denoted $\mathcal{U}\Big( s^T,v(t_i) \Big)$, by
$$psnr(s^T) \sum_{k\in K}\frac{\mathcal{A}_{3D}( s_k^G\cap \Delta(T,t_i))}{\mathcal{A}_{3D}(s_k^G)} \mathcal{U}\big( s_k^G,v(t_i) \big)$$
$$psnr(s^T) \sum_{k\in K}\frac{\mathcal{A}( s_k^G\cap \Delta(T,t_i))}{\mathcal{A}(s_k^G)} \mathcal{U}\big( s_k^G,v(t_i) \big)$$
Concretely, this formula defines the utility of a texture segment as the linear combination of the utilities of the geometry segments that use this texture, weighted by the proportion of the segment's area occupied by the texture.
We then compute a PSNR from the mean squared error stored in the MPD, which we denote $psnr(s^T)$, so as to give a greater utility to higher-resolution textures.
@@ -46,8 +46,9 @@ The client can thus use the utilities defined on the geometry segments
\subsection{DASH adaptation logic}\label{fr:dashadaptation}
Along the camera path $C=\{v(t_i)\}$, the viewpoints are indexed by a continuous time interval $t_i \in [t_1, t_{final}]$. By contrast, the DASH adaptation logic proceeds sequentially along a discrete series of instants. The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$, and will be followed by the subsequent decisions at times $t_2, t_3, \dots$. To select the segment $s_i^*$, the client must find a compromise between the geometry and the different texture resolutions, taking into account the bandwidth, the camera movements, and the utilities of the segments. The difference between $t_{i+1}$ and $t_i$ corresponds to the time $s_i^*$ will take to arrive. This duration can vary depending on the size of the segment and the network conditions. Algorithm~\ref{fr:nextsegment} explains how our DASH client makes its decisions.
Along the camera path $C=\{v(t_i)\}$, the viewpoints are indexed by a continuous time interval $t_i \in [t_1, t_{final}]$. By contrast, the DASH adaptation logic proceeds sequentially along a discrete series of instants. The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$, and will be followed by the subsequent decisions at times $t_2, t_3, \dots$. To select the segment $s_i^*$, the client must find a compromise between the geometry and the different texture resolutions, taking into account the bandwidth, the camera movements, and the utilities of the segments. The difference between $t_{i+1}$ and $t_i$ corresponds to the time $s_i^*$ will take to be downloaded. This duration can vary depending on the size of the segment and the network conditions. Algorithm~\ref{fr:nextsegment} explains how our DASH client makes its decisions.
\renewcommand{\algorithmcfname}{algorithm}
\begin{algorithm}[th]
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
@@ -79,7 +80,7 @@ Along the camera path $C=\{v(t_i)\}$, the viewpoints are indexed by
{\caption{Selecting the next segment\label{fr:nextsegment}}}
\end{algorithm}
The most naive way to sequentially optimize $\mathcal{U}$ is to restrict the decision to the current viewpoint $v(t_i)$. In that case, the best segment $s$ to download is the one that maximizes $\mathcal{U}(s, v(t_i))$, simply to obtain a better rendering at the current viewpoint $v(t_i)$. Because of transmission delays, this segment will only arrive at time $t_{i+1}=t_{i+1}(s)$, which depends on the network conditions and the size of the segment \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{eq2}\end{equation*}
The most naive way to sequentially optimize $\mathcal{U}$ is to restrict the decision to the current viewpoint $v(t_i)$. In that case, the best segment $s$ to download is the one that maximizes $\mathcal{U}(s, v(t_i))$, simply to obtain a better rendering at the current viewpoint $v(t_i)$. Because of transmission delays, this segment will only arrive at time $t_{i+1}=t_{i+1}(s)$, which depends on the network conditions and the size of the segment \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{taille}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{eq2}\end{equation*}
Consequently, the segment that is most useful from $v(t_i)$ at time $t_i$ may be less useful when it arrives, at time $t_{i+1}$.


@@ -1,42 +1,42 @@
\section{Formatting an NVE for DASH}\label{fr:dash3d}
In this section, we describe how we prepare and store the 3D data of our NVE (Networked Virtual Environment) in a format compatible with DASH\@.
In this section, we describe how we prepare and record the 3D data of our NVE (Networked Virtual Environment) in a format compatible with DASH\@.
In our work, we use the Wavefront OBJ format for the polygons and PNG for the textures. However, the process also applies to other formats.
\subsection{The MPD}
In DASH, information such as the URLs of the files, their resolution or their size is extracted by the client from a file called the \emph{Media Presentation Description, MPD}. The client relies only on this information to decide which file to download and at which resolution level.
In DASH, information such as the URLs of the files, their resolution or their size is extracted by the client from a file called the \emph{Media Presentation Description}, MPD\@. The client relies only on this information to decide which file to download and at which resolution level.
The MPD is an XML file organized hierarchically into different sections.
The \emph{periods} are the first level; in the case of video, they indicate the start and the duration of a chapter. This element does not apply in the case of an NVE, so we use a single \emph{period} that contains the whole scene, since the scene is static.
Each \emph{period} contains one or more \emph{adaptation sets}, which describe alternative versions, formats and types of content. We use the \emph{adaptation sets} to organize the geometry and the textures of the scene.
\subsection{\emph{Adaptation sets}}
When a user navigates freely in an NVE, the field of view at any given moment contains only a limited part of the scene. In the same way that DASH video partitions a video into temporal blocks, we partition the polygons into spatial blocks, so that our client can download only the blocks it needs.
\subsubsection{Geometry management\label{fr:geometry}}
We use a space-partitioning tree to organise the faces into cells.
A face belongs to a cell if its barycentre lies inside the corresponding bounding box.
Each cell belongs to an \emph{adaptation set}.
Geometric information is thus spread across the \emph{adaptation sets} according to its spatial coherence, letting the client select the relevant faces to download. A cell is relevant if its intersection with the user's field of view is non-empty. In Figure~\ref{fr:bigpic}, the relevant cells are shown in green.
Since our 3D scene mostly extends along a horizontal plane, we split the model alternately along the two directions of this plane.
We create a separate \emph{adaptation set} for large faces (for example, the sky or the ground), since they are essential to the 3D model and do not fit in the cells. We consider a face to be large if its area exceeds $a+3\sigma$, where $a$ and $\sigma$ are respectively the mean and the standard deviation of the face areas. In our example, this corresponds to the 5 largest faces, which represent $15\%$ of the total area. We thus obtain a decomposition of the NVE into \emph{adaptation sets} that partitions the scene geometry into one \emph{adaptation set} containing the largest faces, and others containing the remaining faces.
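These two rules can be sketched in Rust as follows; the types are hypothetical and the actual partitioning code may differ.
\begin{lstlisting}
struct Bbox {
    min: [f64; 3],
    max: [f64; 3],
}

/// A face belongs to a cell when its barycentre lies inside the
/// cell's bounding box.
fn belongs_to(face_vertices: &[[f64; 3]], cell: &Bbox) -> bool {
    let n = face_vertices.len() as f64;
    let mut barycentre = [0.0f64; 3];
    for v in face_vertices {
        for i in 0..3 {
            barycentre[i] += v[i] / n;
        }
    }
    (0..3).all(|i| cell.min[i] <= barycentre[i] && barycentre[i] < cell.max[i])
}

/// Flag faces whose area exceeds a + 3 * sigma as "large".
fn large_faces(areas: &[f64]) -> Vec<bool> {
    let n = areas.len() as f64;
    let mean = areas.iter().sum::<f64>() / n;
    let variance = areas.iter().map(|a| (a - mean).powi(2)).sum::<f64>() / n;
    let threshold = mean + 3.0 * variance.sqrt();
    areas.iter().map(|&a| a > threshold).collect()
}
\end{lstlisting}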
We store the position of each \emph{adaptation set}, characterised by the coordinates of its bounding box, in the MPD as an additional property of the \emph{adaptation set}, in the form ``\textit{$x_{min}$, width, $y_{min}$, height, $z_{min}$, depth}'' (as shown in Listing~\ref{fr:geometry-as-example}). The client uses this information to implement view-dependent streaming (Section~\ref{fr:dashclientspec}).
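On the client side, reading this property back could look like the following sketch; the exact attribute handling in our implementation may differ.
\begin{lstlisting}
/// Parse the supplemental property
/// "x_min, width, y_min, height, z_min, depth" of an adaptation set.
fn parse_bounding_box(prop: &str) -> Option<([f64; 3], [f64; 3])> {
    let values = prop
        .split(',')
        .map(|v| v.trim().parse().ok())
        .collect::<Option<Vec<f64>>>()?;
    if values.len() != 6 {
        return None;
    }
    let min = [values[0], values[2], values[4]];
    let size = [values[1], values[3], values[5]];
    Some((min, size))
}
\end{lstlisting}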
\subsubsection{Texture management}
Alongside the geometry data, we manage textures using separate \emph{adaptation sets}, independent of the geometry ones. Each texture file corresponds to a different \emph{adaptation set}, with different \emph{representations} (see Section~\ref{fr:representation}) providing the images at different resolutions. To each texture \emph{adaptation set} we add an attribute describing the average colour of the texture. The client can use this attribute to render a face whose texture has not yet been downloaded with a natural, uniform colour (see Figure~\ref{fr:textures}).
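The average colour can be computed offline during content preparation; the following is a minimal sketch over a tightly packed RGB buffer (the buffer layout is an assumption; our pipeline decodes PNG files).
\begin{lstlisting}
/// Average colour of a texture, used by the client as a placeholder
/// before the texture itself arrives.
fn average_color(rgb: &[u8]) -> [u8; 3] {
    assert!(rgb.len() >= 3 && rgb.len() % 3 == 0);
    let n = (rgb.len() / 3) as u64;
    let mut sum = [0u64; 3];
    for px in rgb.chunks_exact(3) {
        for (s, &c) in sum.iter_mut().zip(px) {
            *s += c as u64;
        }
    }
    [(sum[0] / n) as u8, (sum[1] / n) as u8, (sum[2] / n) as u8]
}
\end{lstlisting}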
\subsection{Representations\label{fr:representation}}
Each \emph{adaptation set} can contain one or more \emph{representations} of the geometry or of the textures, at different levels of detail (for example, with a different number of faces).
For geometry, the resolution is heterogeneous, and applying a multi-resolution representation is cumbersome: face areas range from $0.01$ to more than $10k$, without even counting the extreme faces.
For textured scenes, heterogeneous data is common, since information can be stored either as geometry or as texture. Managing a geometry--texture trade-off is therefore more adaptable than managing a multi-resolution combination.
For each texture, we generate successive resolutions by halving the width and height, stopping when the image becomes smaller than $64 \times 64$.
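Under one reading of this stopping rule, the resolution levels can be generated as follows (an illustrative sketch).
\begin{lstlisting}
/// Successive texture resolutions: halve width and height while the
/// next level stays at least 64x64.
fn resolution_levels(mut w: u32, mut h: u32) -> Vec<(u32, u32)> {
    let mut levels = vec![(w, h)];
    while w / 2 >= 64 && h / 2 >= 64 {
        w /= 2;
        h /= 2;
        levels.push((w, h));
    }
    levels
}
\end{lstlisting}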
Figure~\ref{fr:textures} compares rendering with textures against rendering with a single colour per face.
@ -54,7 +54,7 @@ La Figure~\ref{fr:textures} montre l'utilisation des textures comparée à l'aff
\end{figure}
\subsection{Segments}
To allow random access to the content within a geometry \emph{adaptation set}, we group faces into \emph{segments}. Each \emph{segment} is then encoded as an OBJ file, which the client can download individually. We partition the faces of an \emph{adaptation set} into sets of $N_s$ faces, by sorting the faces by decreasing area and then placing each run of $N_s$ consecutive faces into a \emph{segment}. The first \emph{segment} thus contains the largest faces, and the last one the smallest. For textures, each \emph{representation} contains a single \emph{segment}.
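This grouping can be sketched as follows, with a hypothetical \texttt{Face} type.
\begin{lstlisting}
struct Face {
    area: f64,
    // vertex indices, material, etc. would go here
}

/// Sort faces by decreasing area, then pack runs of n_s faces
/// into segments: the first segment gets the largest faces.
fn build_segments(mut faces: Vec<Face>, n_s: usize) -> Vec<Vec<Face>> {
    faces.sort_by(|a, b| b.area.partial_cmp(&a.area).unwrap());
    let mut segments = Vec::new();
    let mut it = faces.into_iter().peekable();
    while it.peek().is_some() {
        segments.push(it.by_ref().take(n_s).collect());
    }
    segments
}
\end{lstlisting}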
\begin{figure}
\lstset{

View File

@ -4,28 +4,28 @@ Nous décrivons maintenant notre installation et les données que nous utilisons
\subsection{Experimental setup}
\subsubsection{Model}
In our experiments, we use a model of the Marina Bay district in Singapore. The model contains 387,551 vertices and 552,118 faces. The geometry takes up 62 MB and the textures 167 MB. We partition the geometry in a $k$-d tree until the leaves contain fewer than 10,000 faces, which yields 64 \emph{adaptation sets}, plus one for the large faces.
\subsubsection{User navigation}
To evaluate our system, we collected realistic user traces that we can replay.
We presented our web interface to six users; the model loaded progressively while the user was free to navigate. The available interactions are inspired by first-person video games (the keyboard to move and the mouse to turn). We asked the users to navigate and explore the scene until they felt they had visited its most important regions. We then asked them to record a path that would give a good presentation of the scene to a user discovering it.
Every 100 ms, the camera position and angle are recorded into an array that is then exported as JSON\@. The saved traces let us replay each recording and run the simulations and evaluations of our system. We collected 13 recordings this way.
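A trace sample can be sketched as follows, assuming the serde and serde_json crates; the field names are an assumption about the JSON layout, not the actual schema.
\begin{lstlisting}
use serde::{Deserialize, Serialize};

/// One sample, captured every 100 ms.
#[derive(Serialize, Deserialize)]
struct CameraSample {
    t: f64,             // timestamp, in seconds
    position: [f64; 3], // camera position (x, y, z)
    angle: [f64; 2],    // camera orientation (yaw, pitch)
}

fn export_trace(samples: &[CameraSample]) -> serde_json::Result<String> {
    serde_json::to_string_pretty(samples)
}
\end{lstlisting}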
\subsubsection{Network configuration}
We tested our implementation at three bandwidths of 2.5 Mbps, 5 Mbps, and 10 Mbps, with a round-trip time of 76 ms, following the parameters of~\citep{dash-network-profiles}. The values remain constant for the whole session, so that performance variations can be analysed as a function of bandwidth.
In our experiments, we create a virtual camera that follows a recording, and our system downloads segments in real time according to Algorithm~\ref{fr:nextsegment}. We record in a JSON file the times at which segments are downloaded. This way, when evaluating the system, we avoid wasting the time and resources needed to actually download the segments and to store the information required to plot the curves presented in the next sections.
\subsubsection{Hardware and software}
The experiments were run on an Acer Aspire V3, with an Intel Core i7 3632QM processor and an NVIDIA GeForce GT 740M graphics card. The DASH client is written in Rust, and uses Glium for rendering and reqwest for downloading segments.
\subsubsection{Metrics}
To measure rendering quality objectively, we use PSNR\@. As ground truth, we use the scene rendered a posteriori along the same paths with all the geometry and textures downloaded. In our case, a pixel error can only occur when a face is missing, or when a texture is missing or available only at too low a resolution.
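For reference, a minimal PSNR computation over two 8-bit frames of identical size, assuming tightly packed buffers.
\begin{lstlisting}
/// PSNR in dB between a rendered frame and the ground truth.
fn psnr(rendered: &[u8], ground_truth: &[u8]) -> f64 {
    assert_eq!(rendered.len(), ground_truth.len());
    let mse = rendered
        .iter()
        .zip(ground_truth)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum::<f64>()
        / rendered.len() as f64;
    // 255 is the peak value of an 8-bit channel; mse == 0 means
    // identical frames (infinite PSNR).
    20.0 * 255.0_f64.log10() - 10.0 * mse.log10()
}
\end{lstlisting}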
\subsubsection{Experiments}
We present experiments that validate our implementation choices at each stage of our system. We replay the paths created by the users under different bandwidth conditions, while varying the key components of our system.
We consider two loading policies for our client, both proposed in Section~\ref{fr:dashclientspec}. The greedy policy selects, at each decision, the segment that maximises its predicted utility at arrival time, which corresponds to equation~(\ref{fr:greedy}). The second loading policy we test is the one proposed in equation~(\ref{fr:smart}). We also analysed the impact of grouping faces into geometry segments according to their area.
@ -73,7 +73,7 @@ Enfin, nous testons différents paramètres de débit pour étudier comment notr
\caption{Impact of the utility of geometry segments on rendering, at a bandwidth of 5 Mbps.\label{fr:utility-curve}}
\end{figure}
Figure~\ref{fr:utility-curve} shows how the utility metric can exploit both static and dynamic parameters. The experiments use a $k$-d tree and the proposed loading policy, over all paths. We observe that a purely static utility metric yields poor PSNR values\@. A purely dynamic utility does slightly better, notably thanks to the elimination of the parts outside the field of view, but the combined version described in Section~\ref{fr:utility} gives the best results.
\begin{figure}[h]
\centering
@ -100,7 +100,7 @@ La Figure~\ref{fr:utility-curve} montre comment la métrique d'utilité peut exp
Figure~\ref{fr:sorting} shows the impact of arranging polygons into segments according to their area. The PSNR clearly increases considerably when face area is taken into account during segment creation. Since all segments have the same size (in bytes), sorting the faces by area before packing them into segments introduces an asymmetry in the distribution of areas. This asymmetry enables the client to make meaningful decisions (downloading the segments with the highest utility first) and can make a large difference in rendering quality.
We also compared the greedy approach and the proposed one (see Figure~\ref{fr:greedyweakness}) at a limited bandwidth (5 Mbps). The proposed method is better over the first 30 seconds and does better on average. Table~\ref{fr:greedyVsproposed} shows the average PSNR of both methods at different bandwidths. The first 30 seconds are when decisions are crucial, since they correspond to the moments when little content has been downloaded. We observe that our method improves rendering quality by 1 to 1.9 dB over the greedy approach.
Table~\ref{fr:perc} shows the distribution of textures downloaded by the two approaches, at different bandwidths. Resolution 5 is the most detailed, and resolution 1 the coarsest. This table highlights a weakness of the greedy policy: when the bandwidth increases, the distribution of downloaded resolutions stays roughly the same. In contrast, our policy adapts by downloading higher resolutions when the bandwidth is better (13.9\% at 10 Mbps against 0.3\% at 2.5 Mbps). In fact, an interesting property of the proposed policy is that it adapts the geometry--texture trade-off to the bandwidth. Textures account for 57.3\% of the downloaded bytes at 2.5 Mbps, and 70.2\% at 10 Mbps. In other words, our system tends to favour geometry when the bandwidth is low, and to favour textures as the bandwidth increases.

View File

@ -1,7 +1,7 @@
This chapter presents the main contribution of this thesis in French, for non-English-speaking readers.
It is an adaptation of the DASH standard (\emph{Dynamic Adaptive Streaming over HTTP}) for video transmission to the transmission of 3D content.
DASH proposes a way of preparing and organising content that enables the design of loading policies.
A DASH client is a client that downloads the description of the content organisation (an XML file called the \emph{Media Presentation Description}, MPD) and decides, according to its needs, what should be downloaded.
Since these decisions are taken independently of the server, the server performs no computation, which makes the solution scalable.
In this chapter, we show how we imitate DASH video for 3D transmission, and we develop a system that inherits the advantages of DASH.

View File

@ -1,7 +1,7 @@
\section{Thesis outline}
First, in Chapter~\ref{f}, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: video and 3D share many features, and analysing the video setting gives inspiration for building a 3D streaming system.
In Chapter~\ref{sote}, we present a review of the state of the art in multimedia interaction and streaming.
This chapter starts with an analysis of the video streaming standards.
@ -12,7 +12,7 @@ Then, in Chapter~\ref{bi}, we present our first contribution: an in-depth analys
We first develop a basic interface for navigating in 3D, and then we introduce 3D objects called \emph{bookmarks} that help users navigate the scene.
We then present a user study that we conducted on 50 people which shows that bookmarks ease user navigation: they improve performance at tasks such as finding objects.
% Then, we setup a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
We analyse how the presence of bookmarks impacts the streaming: we propose and evaluate streaming policies based on precomputations relying on bookmarks and that measurably increase the quality of experience.
In Chapter~\ref{d3}, we present the most important contribution of this thesis: DASH-3D.
DASH-3D is an adaptation of DASH (Dynamic Adaptive Streaming over HTTP), the video streaming standard, to 3D streaming.
@ -23,5 +23,5 @@ We finally evaluate the different parameters of our client.
In Chapter~\ref{sb}, we present our last contribution: the integration of the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
We first develop an interface that allows desktop as well as mobile devices to navigate streamed 3D scenes, and that introduces a new style of bookmarks.
We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient, and we propose more efficient precomputations that enhance the streaming.
Finally, we present a user study that provides us with traces on which we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.

View File

@ -5,14 +5,13 @@ First, we want to measure the impact of 3D bookmarks on navigation within an NVE
Second, we want to collect traces from the users so that we can replay them for reproducible experiments for comparing streaming strategies in Section~\ref{bi:system}.
\subsection{Our NVE\label{bi:our-nve}}
To ease the deployment of our experiments to users in distributed locations on a crowdsourcing platform, we implement a simple web-based NVE client using THREE.js\footnote{http://threejs.org}.
The NVE server is implemented with node.js\footnote{http://nodejs.org}.
The NVE server streams a 3D scene to the client; the client renders the scene as the 3D content is received.
The user can navigate within the NVE in the following way: the camera can be translated using the arrow keys along four directions (forward, backward, to the left, and to the right).
Alternatively, the keys W, A, S and D can also be used for the same actions.
This choice was inspired by 3D video games, which often use these keys in conjunction with the mouse to move an avatar.
The virtual camera can rotate in four different directions using the keys I, K, J and L.
The user can also rotate the camera by dragging the mouse in the desired direction.
Finally, following the UI of popular 3D games, we also give users the possibility to lock their pointer and use their mouse as a virtual camera.
@ -40,7 +39,7 @@ The bottom of the arrow turns towards the current position, to better visualize
Bookmarks allow the user to achieve a large movement within the 3D environment using a single action (a mouse click).
Since bookmarks are part of the scene, they are visible only when not hidden by other objects from the scene.
We chose size and colors that are salient enough to be easily seen, but not too large to limit the occlusion of regions within the scene.
When reaching the bookmark, the corresponding arrow or viewport is not visible anymore, and subsequently will appear in a different color, to indicate that it has been clicked (similar to web links).
\subsection{User study\label{bi:user-study}}
@ -49,7 +48,7 @@ We now describe in details our experimental setup and the user study that we con
\subsubsection{Models}
We use four 3D scenes (one for the tutorial and three for the actual experiments) which represent recreated scenes from a famous video game.
Those models are light (a few thousand triangles per model) and are sent before the experiment starts.
We keep the models small so that users can perform the task with acceptable latency from any country using a decent internet connection.
Our NVE does not stream the 3D content for these experiments, in order to avoid unreliable conditions caused by the network bandwidth variation, which might affect how the users interact.
\subsubsection{Task design}
@ -205,4 +204,3 @@ Figure~\ref{bi:triangles-curve} shows a CDF of the percentage of 3D mesh triangl
As expected, the fact that the users can browse the scene significantly quicker with bookmarks reflects on the demand on the 3D content.
Users need more triangles more quickly, which either leads to more demand on network bandwidth, or if the bandwidth is kept constant, leads to fewer objects being displayed.
In the next section, we introduce experiments based on our user study traces that show how the rendering is affected by the presence of bookmarks and how to improve it.
We found no significant correlation between task performance and the users' age or their skill at video games.

View File

@ -4,7 +4,7 @@ In this chapter, we have described an interface that allows a user to navigate i
We identified and addressed the problems linked to the dynamics of both the user behaviour and the 3D content.
\begin{itemize}
\item Navigating in a 3D scene can be complex, due to the many degrees of freedom, and tweaking the interface can increase the user's quality of experience.
\item Adding bookmarks to the interface increases the quality of experience of the users and lets them visit more data in the same amount of time.
\item This increase in speed of navigation has a negative impact on the quality of service of the system.
\item Having bookmarks in the scene biases the users' navigation and makes it more predictable: it is possible to link data utility to bookmarks in order to benefit from this predictability.
@ -15,5 +15,5 @@ The server knows all the data and simply determines what the client needs, it pr
Thus, the server has to keep track of what the client already has (which will eat memory) and has to compute what should be sent next (which will eat CPU). The scalability of such a server is therefore nonexistent.
Furthermore, we only considered geometry streaming: materials and textures are downloaded before the streaming starts, which causes high latency at the start of the streaming and harms the quality of experience.
After learning these lessons, we describe, in the next chapter, what can be done in order to alleviate these issues.
We show how the standard for video streaming, DASH, teaches us to prepare 3D content in order to remove all server-side computations, to elaborate efficient streaming policies, and to support both geometry and texture chunks.

View File

@ -35,7 +35,7 @@ We conduct a within-subject user-study on 51 participants, where each user start
We show that not only the presence of bookmarks causes a faster task completion, but also that it allows users to see a larger part of the scene during the same time span.
However, in a streaming scenario, this phenomenon leads to higher network requirements to maintain the same quality of service.
In the last part of this chapter, we simulate a streaming setup and we show that knowing the positions of the bookmarks beforehand allows us to precompute information that we can reuse during streaming to compensate for the harm caused by the faster navigation with bookmarks.
\newpage

View File

@ -3,8 +3,8 @@
\subsection{3D model streaming}
In this section, we describe our implementation of a 3D model streaming policy in our simulation.
A summary of the streaming policies we designed is given in Table~\ref{bi:streaming-policies}.
Note that the policy is different from the one we used for the crowdsourcing experiments.
Recall that in the crowdsourcing experiments, we load all the 3D content before the participants begin to navigate to remove bias due to different network conditions.
Here, we implemented a streaming version, which we expect an actual NVE will use.
@ -16,7 +16,7 @@ To increase the size of the model, while keeping the same 3D scene, we subdivide
We do this to simulate a reasonable use case with large 3D scenes.
Table~\ref{bi:modelsize} shows that materials and textures account for at most $3.6\%$ of the geometry, which justifies this choice.
When a client starts loading the web page containing the 3D model, the server first sends the list of materials and the texture files.
Then, the server periodically sends a fixed-size chunk that indiscriminately encapsulates vertices, texture coordinates, or faces.
A \textit{vertex} is coded with three floats and an integer (the $x$, $y$, and $z$ coordinates and the index of the vertex), a \textit{texture coordinate} with two floats and an integer (the $x$ and $y$ coordinates on the image and the index of the texture coordinate), and a \textit{face} with eight integers (the index of each vertex, the index of each texture coordinate, the index of the face, and the number of the corresponding material).
Consequently, given the JavaScript implementation of integers and floats, we approximate each vertex and each texture coordinate to take up 32 bytes, while each face takes up 96 bytes.
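This byte accounting can be sketched as follows; the types are illustrative, not the actual server code.
\begin{lstlisting}
/// Approximate on-wire sizes described above.
enum Element {
    Vertex,            // three floats + one integer
    TextureCoordinate, // two floats + one integer
    Face,              // eight integers
}

fn approx_bytes(e: &Element) -> usize {
    match e {
        // Both are approximated to 32 bytes in our accounting.
        Element::Vertex | Element::TextureCoordinate => 32,
        Element::Face => 96,
    }
}

/// Faces fitting in a fixed-size chunk: a hypothetical 4096-byte
/// chunk would hold 42 faces.
fn faces_per_chunk(chunk_bytes: usize) -> usize {
    chunk_bytes / approx_bytes(&Element::Face)
}
\end{lstlisting}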
@ -76,7 +76,7 @@ We want to exploit bookmarks to improve the user's quality of experience. For th
\subsubsection{Visibility determination for 3D bookmarks}
A bookmarked viewpoint is more likely to be accessed, compared to other arbitrary viewpoints in the 3D scene.
We exploit this fact to perform some precomputation on the 3D content visible from the bookmarked viewpoint.
Recall that \textsf{culling} does not consider occlusion of the faces.
Furthermore, it prioritizes the faces according to distance from the camera, and does not consider the actual contribution of the faces to the rendered 2D images.
@ -84,11 +84,11 @@ Ideally, we should prioritize the faces that occupy a bigger area in the 2D rend
Computing this, however, requires rendering the scene at the server, and measuring the area of each face.
It is not scalable to compute this for every viewpoint requested by the client.
However, we can prerender the bookmarked viewpoints, since the number of bookmarks is limited, their viewpoints are known in advance, and they are likely to be accessed.
For each bookmark, we render the scene offline using a single color per triangle.
Once rendered, we scan the output image to find the visible triangles (based on the color) and sort them by decreasing projected area.
This technique is also used by~\citep{view-dependent-progressive-mesh}.
Thus, when the user clicks on a 3D bookmark, this precomputed list of faces is used by the server, and only visible faces are sent in decreasing order of contributions to the rendered image.
For the three scenes that we used in the experiment, we can reduce the number of triangles sent by 60\% (over all bookmarks).
This reduction is as high as 85.7\% for one particular bookmark (from 26,886 culled triangles to 3,853 culled and visible triangles).
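This offline pass can be sketched as follows, assuming that each triangle's identifier is encoded in the RGB colour of a tightly packed framebuffer.
\begin{lstlisting}
use std::collections::HashMap;

/// Count pixels per triangle colour: the pixel count is the projected
/// area, and triangles absent from the map are occluded or off-screen.
/// (In practice, the background colour should be skipped.)
fn visible_triangles_by_area(framebuffer: &[u8]) -> Vec<(u32, usize)> {
    let mut areas: HashMap<u32, usize> = HashMap::new();
    for px in framebuffer.chunks_exact(3) {
        // Decode the triangle id from the flat RGB colour.
        let id = (px[0] as u32) << 16 | (px[1] as u32) << 8 | px[2] as u32;
        *areas.entry(id).or_insert(0) += 1;
    }
    let mut list: Vec<(u32, usize)> = areas.into_iter().collect();
    // Largest projected area first: these faces are sent first.
    list.sort_by(|a, b| b.1.cmp(&a.1));
    list
}
\end{lstlisting}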
@ -317,7 +317,7 @@ Combining both strategies (\textsf{V-PP+FD}) leads to the best quality.
At 1 Mbps bandwidth, \textsf{V-PP} penalizes the quality, as the curve \textsf{V-PP+FD} leads to a lower quality image than \textsf{V-FD} alone.
This effect is even stronger when the bandwidth is set to 2 Mbps (Figure~\ref{bi:2MB}).
Both streaming strategies based on the precomputation of the ordering improve the image quality.
We see here that \textsf{V-FD} has a greater impact than \textsf{V-PP}. Here, \textsf{V-PP} may prefetch content that eventually may not be used, whereas \textsf{V-FD} only sends relevant 3D content (knowing which bookmark has just been clicked).
We present only the results after the first click.

View File

@ -1,7 +1,7 @@
\section{3D streaming\label{sote:3d-streaming}}
In this thesis, we focus on the objective of delivering large, massive 3D scenes over the network.
While 3D streaming is not the most popular research field, there has been special attention around 3D content compression, in particular progressive compression, which can be considered a premise of 3D streaming.
In the next sections, we review the 3D streaming related work, from 3D compression and structuring to 3D interaction.
\subsection{Compression and structuring}
@ -98,7 +98,7 @@ It is notably used in 3DHOP (3D Heritage Online Presenter, \citep{3dhop}), a fra
Each of these approaches defines its own compression and coding for a single mesh.
However, users are often interested in scenes that contain multiple meshes, and the need to structure content emerged.
To answer those issues, the Khronos group proposed a generic format called glTF (GL Transmission Format,~\citep{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.\
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualisation and which we address in our work.
@ -138,7 +138,7 @@ Level of details have been initially used for efficient 3D rendering~\citep{lod}
When the change from one level of detail to another is abrupt, it can create visual discomfort for the user.
This is called the \emph{popping effect}, and level of details have the advantage of enabling techniques, such as geomorphing~\citep{hoppe-lod}, to transition smoothly from one level of detail to another.
Level of details have then been used for 3D streaming.
For example, \citep{streaming-hlod} propose an out-of-core viewer for remote model visualisation, adapting hierarchical level of details~\citep{hlod} to the context of 3D streaming.
Level of details can also be used to perform viewpoint-dependent streaming, as in \citep{view-dependent-lod}.
\subsection{Texture streaming}
@ -154,7 +154,6 @@ Since 3D data can contain many textures, \citep{simon2019streaming} propose a wa
Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimise dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, the authors are able to reach a better rate-distortion ratio than WebP, the new standard for texture transmission, and JPEG.
However, the geometry--texture compromise is not the focus of that paper.
\subsection{Geometry and textures}

View File

@ -1,4 +1,4 @@
In this chapter, we review the part of the state of the art on multimedia streaming and interaction that is relevant for this thesis.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a substantial body of work on video streaming, we start this chapter with a review of that domain, with a particular focus on the DASH standard.
Then, we proceed with presenting topics related to 3D streaming, including compression and structuring, the geometry--texture compromise, and viewpoint-dependent streaming.
Finally, we end this chapter by reviewing the related work regarding 3D navigation and interfaces.

View File

@ -1,6 +1,6 @@
\section{Video\label{sote:vide}}
Accessing a remote video through the web has been a widely studied problem since the 1990s.
The Real-time Transport Protocol (RTP,~\citep{rtp-std}) was an early attempt to formalize audio and video streaming.
The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client.
% While this protocol can be useful in particular scenarios, such as video-conferencing, it is a stateful protocol: the server keeps track of every user along the streaming session.
@ -15,8 +15,8 @@ This type of network architecture is called CDN (Content Delivery Network) and i
\subsection{DASH\@: the standard for video streaming\label{sote:dash}}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed
standard for adaptively streaming video on the web \citep{dash-std-full}, made to be simple, scalable, and interoperable.
DASH describes guidelines to prepare and structure video content so as to allow great adaptability of the streaming without requiring any server-side computation. The client should be able to make good decisions on which part of the content to download, based only on an estimation of the network constraints and on the information provided in a descriptive file: the MPD\@.
\subsubsection{DASH structure}
@ -45,7 +45,7 @@ A representation of the images of a chapter of a movie is still a long video, an
Segments are used to prevent this issue.
They typically encode files that contain two to ten seconds of video, and give the software a greater ability to dynamically adapt to the system.
If a user wants to seek somewhere else in the video, only one segment of data is potentially lost, and only one segment of data needs to be downloaded for the playback to resume. The impact of the segment duration has been investigated in many works, including \citep{sideris2015mpeg, stohr2017sweet}. For example, \citep{stohr2017sweet} discuss how the segment duration affects the streaming: short segments lower the initial delay and provide the best stalling quality of experience, but make the total downloading time of the video longer because of overhead.
\subsubsection{Content preparation and server}
@ -58,7 +58,7 @@ This is one of the DASH strengths: no powerful server is required, and since sta
\subsubsection{Client side adaptation}
A client typically starts by downloading the MPD file, and then proceeds on downloading segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth and throughput, for example), buffer states, or segment sizes, to derive a decision on which segments should be downloaded next. Most industrial actors have their own adaptation logic, and many more have been proposed in the literature. A thorough review is beyond the scope of this state of the art, but examples include \citep{chiariotti2016online}, who formulate the problem in a reinforcement learning framework, \citep{yadav2017quetra}, who use queuing theory, or \citep{huang2019hindsight}, who use a formulation derived from the knapsack problem.
\subsection{DASH-SRD}
Now widely adopted in the context of video streaming, DASH has been adapted to various other contexts.

View File

@ -62,7 +62,7 @@ Since our scene is static, a user knows that a changing object is not part of th
The other bookmark parameters remain unchanged since Chapter~\ref{bi}: to avoid users losing context, clicking on a bookmark triggers an automatic, smooth camera displacement that ends at the bookmarked camera position.
We also display a thumbnail of the bookmark's viewpoint when the mouse hovers over a bookmark. Such a thumbnail is displayed in Figure~\ref{sb:desktop}.
Note that since there is no mouse, and thus no pointer, on mobile devices, thumbnails are not used in the mobile setting.
\begin{algorithm}[th]
@ -125,7 +125,7 @@ Note that since on mobile, there is no mouse and thus no pointer, thus thumbnail
\subsection{Segment utility at bookmarked viewpoints\label{sb:utility}}
Introducing bookmarks is a way of making users' navigation more predictable.
Indeed, since they are emphasized and, in a way, recommended viewpoints, bookmarks are more likely to be visited by a significant portion of users than any other viewpoint in the scene.
As such, bookmarks can be used as a way to optimize streaming by downloading segments in an optimal, precomputed order.
More specifically, segment utility as introduced in Section~\ref{d3:utility} is only an approximation of the segment's true contribution to the current viewpoint rendering.
When bookmarks are defined, it is possible to obtain a better measure of segment utility by performing an offline rendering at each bookmark's viewpoint.
@ -189,7 +189,8 @@ Note that this curve is averaged over all the 9 bookmarks of the scene. These bo
\subsection{MPD modification}
We now present how to include bookmarks information in the Media Presentation Description (MPD) file.
Bookmarks are fully defined by a position, a direction, and the additional content needed to properly render and use a bookmark in a system.
This additional data consists of two files: a thumbnail of the point of view at the bookmark, and a JSON file giving the optimal segment order for this viewpoint, as computed by Algorithm~\ref{sb:algo-optimal-order}.
For this reason, for each bookmark, we create a separate adaptation set in the MPD\@.
The bookmarked viewpoint information is stored as a supplemental property.
Bookmark adaptation sets contain only one representation, composed of two segments: the thumbnail used as a preview for the desktop interface, and the JSON file.

View File

@ -183,7 +183,7 @@ These curves isolate the effect of our optimized policy, and shows the differenc
Figures~\ref{sb:psnr-third-experiment} and~\ref{sb:psnr-third-experiment-after-click} represent the same curves on the third experiment (free navigation).
On average, the difference in terms of PSNR is less obvious, and both strategies seem to perform the same way, at least over the first 50 seconds of the experiment. The optimized policy performs slightly better than the greedy policy towards the end, which can be correlated with a peak in bookmark use occurring around the $50^{th}$ second.
Figure~\ref{sb:psnr-third-experiment-after-click} also shows an interesting effect: the optimized policy still performs markedly better after a click on a bookmark, but the two curves converge to the same PSNR value after 9 seconds. This is largely task-dependent: users are encouraged to observe the scene in experiment 2, while they are encouraged to visit as much of the scene as possible in experiment 3. On average, users therefore tend to spend less time at a bookmarked point of view in the third experiment than in the second.