More writing, citations, fix references

This commit is contained in:
Thomas Forgione 2019-09-13 12:01:43 +02:00
parent e198861812
commit 4641129216
No known key found for this signature in database
GPG Key ID: BFD17A2D71B3B5E7
15 changed files with 281 additions and 116 deletions

View File

@@ -22,7 +22,6 @@ screen: pdf
# Build PDF version of the thesis manuscript.
pdf: $(rootfile)
make clean
$(call latexmk)
# Watch and automatically recompile when a file changes.

Binary file not shown (added, 1.3 MiB)

Binary file not shown (added, 46 KiB)

View File

@@ -0,0 +1,17 @@
<Period>
<AdaptationSet>
<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" value="0,0,0,5760,3240,5760,3240"/>
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
<Representation id="1" width="3840" height="2160">
<BaseURL>full.mp4</BaseURL>
</Representation>
</AdaptationSet>
<AdaptationSet>
<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" value="0,1920,1080,1920,1080,5760,3240"/>
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="supplementary"/>
<Representation id="2" width="1920" height="1080">
<BaseURL>part.mp4</BaseURL>
</Representation>
</AdaptationSet>
</Period>
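The `value` attribute of an SRD `SupplementalProperty` packs seven comma-separated integers: source id, object x, object y, object width, object height, total width, total height. A minimal sketch of how a client might parse these from a Period element (the function names are illustrative, not part of the MPD specification):

```python
import xml.etree.ElementTree as ET

SRD_SCHEME = "urn:mpeg:dash:srd:2014"

def parse_srd(value):
    """Split an SRD value string into its seven integer fields."""
    source_id, x, y, w, h, total_w, total_h = (int(v) for v in value.split(","))
    return {"source_id": source_id, "x": x, "y": y, "w": w, "h": h,
            "total_w": total_w, "total_h": total_h}

def srd_of_adaptation_sets(mpd_xml):
    """Yield the parsed SRD of each AdaptationSet found in the XML."""
    root = ET.fromstring(mpd_xml)
    for aset in root.iter("AdaptationSet"):
        for prop in aset.iter("SupplementalProperty"):
            if prop.get("schemeIdUri") == SRD_SCHEME:
                yield parse_srd(prop.get("value"))
```

On the example above, the second adaptation set would parse as a 1920 × 1080 region at offset (1920, 1080) inside a 5760 × 3240 reference space.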

src/bib.bib Normal file · 98 lines
View File

@@ -0,0 +1,98 @@
@inproceedings{dash-srd,
title={MPEG DASH SRD: spatial relationship description},
author={Niamut, Omar A and Thomas, Emmanuel and D'Acunto, Lucia and Concolato, Cyril and Denoual, Franck and Lim, Seong Yong},
booktitle={Proceedings of the 7th International Conference on Multimedia Systems},
pages={5},
year={2016},
organization={ACM}
}
@inproceedings{dash-std,
author = {Stockhammer, Thomas},
title = {Dynamic Adaptive Streaming over {HTTP}: Standards and Design Principles},
booktitle = {Proceedings of the Second Annual ACM Conference on Multimedia Systems},
series = {MMSys '11},
year = {2011},
isbn = {978-1-4503-0518-1},
location = {San Jose, CA, USA},
pages = {133--144},
numpages = {12},
url = {http://doi.acm.org/10.1145/1943552.1943572},
doi = {10.1145/1943552.1943572},
acmid = {1943572},
month = {Feb},
publisher = {ACM},
address ={San Jose, CA, USA},
keywords = {3gpp, mobile video, standards, streaming, video},
}
@article{dash-std-2,
author = {Sodagar, Iraj},
doi = {10.1109/MMUL.2011.71},
issn = {1070-986X},
journal = {IEEE Multimedia},
month = {apr},
number = {4},
pages = {62--67},
title = {{The MPEG-DASH Standard for Multimedia Streaming Over the Internet}},
url = {http://ieeexplore.ieee.org/document/6077864/},
volume = {18},
year = {2011}
}
@techreport{dash-std-full,
type={Standard},
key={ISO/IEC 23009-1:2014},
month={may},
year={2014},
title={{Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats}}
}
@misc{dash-network-profiles,
  author = {{DASH Industry Forum}},
  title = {Guidelines for implementation: {DASH-AVC/264} test cases and vectors},
  year = {2014},
  howpublished = {http://dashif.org/guidelines/}
}
@inproceedings{bookmarks-impact,
author = {Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent},
title = {Impact of 3D Bookmarks on Navigation and Streaming in a Networked Virtual Environment},
booktitle = {Proceedings of the 7th International Conference on Multimedia Systems},
series = {MMSys '16},
year = {2016},
isbn = {978-1-4503-4297-1},
location = {Klagenfurt, Austria},
pages = {9:1--9:10},
articleno = {9},
numpages = {10},
url = {http://doi.acm.org/10.1145/2910017.2910607},
doi = {10.1145/2910017.2910607},
acmid = {2910607},
publisher = {ACM},
address = {Klagenfurt, Austria},
keywords = {3D bookmarks, 3D navigation aid, 3D streaming, networked virtual environment, prefetching},
month = {May},
}
@inproceedings{dash-3d,
author = {Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent and Yadav, Praveen Kumar},
title = {DASH for 3D Networked Virtual Environment},
year = {2018},
location = {Seoul, South Korea},
address = {Seoul, South Korea},
month = {October},
doi = {10.1145/3240508.3240701},
isbn = {978-1-4503-5665-7/18/10},
booktitle = {2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea}
}
@inproceedings{view-dependent-progressive-mesh,
title={Receiver-driven view-dependent streaming of progressive mesh},
author={Cheng, Wei and Ooi, Wei Tsang},
booktitle={Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video},
pages={9--14},
year={2008},
organization={ACM}
}

View File

@@ -1,5 +1,5 @@
\copied{}
\section{Client}\label{sec:dashclientspec}
\section{Client\label{d3:dash-client}}
In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
@@ -11,7 +11,7 @@ At time instance $t_i$, the DASH client decides to download the appropriate segm
Starting from $t_1$, the camera continuously follows a camera path $C=\{v(t_i), t_i \in [t_1,t_{end}]\}$, along which downloading opportunities are strategically exploited to sequentially query the most useful segments.
\subsection{Segment Utility}\label{subsec:utility}
\subsection{Segment Utility\label{d3:utility}}
Unlike video streaming, where the bitrate of each segment correlates with the quality of the video received, for 3D content, the size (in bytes) of the content does not necessarily correlate well with its contribution to visual quality.
A large polygon with huge visual impact takes the same number of bytes as a tiny polygon.
@@ -31,7 +31,7 @@ Indeed, geometry segments have close to a similar number of faces; their size is
For texture segments, the size is usually much smaller than the geometry segments but also varies a lot, as between two successive resolutions the number of pixels is divided by 4.
Finally, for each texture segment $s^{T}$, the MPD stores the \textit{MSE} (mean square error) of the image at this resolution, relative to the highest resolution (by default, triangles are filled with their average color).
Offline parameters are stored in the MPD as shown in Listing~\ref{listing:MPD}.
Offline parameters are stored in the MPD as shown in Listing~\ref{d3:mpd}.
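The per-resolution MSE above can be computed offline by upscaling each lower resolution back to the reference resolution and comparing pixel by pixel. A minimal sketch, assuming grayscale images stored as 2D lists and nearest-neighbour upscaling (both choices are illustrative simplifications):

```python
def mse(reference, candidate):
    """Mean square error between two equally-sized grayscale images
    given as 2D lists of pixel values."""
    h, w = len(reference), len(reference[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            d = reference[y][x] - candidate[y][x]
            total += d * d
    return total / (w * h)

def upscale_nearest(img, factor):
    """Nearest-neighbour upscaling, so a low-resolution texture can be
    compared against the full-resolution reference."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]
```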
\subsubsection{Online parameters}
In addition to the offline parameters stored in the MPD file for each segment, view-dependent parameters are computed at navigation time.
@@ -73,7 +73,7 @@ We do this to acknowledge the fact that a texture at a greater resolution has a
The equivalent term for geometry is 1 (and does not appear).
Having defined a utility on both geometry and texture segments, the client uses it next for its streaming strategy.
\subsection{DASH Adaptation Logic}\label{subsec:dashadaptation}
\subsection{DASH Adaptation Logic\label{d3:dash-adaptation}}
Along the camera path $C=\{v(t_i)\}$, viewpoints are indexed by a continuous time interval $t_i \in [t_1,t_{end}]$.
Contrastingly, the DASH adaptation logic proceeds sequentially along a discrete time line.
@@ -81,7 +81,7 @@ The first request \texttt{(HTTP request)} made by the DASH client at time $t_1$
While selecting $s_i^*$, the $i$-th best segment to request, the adaptation logic compromises between geometry, texture, and the available \texttt{representations} given the current bandwidth, camera dynamics, and the previously described utility scores.
The difference between $t_{i+1}$ and $t_{i}$ is the $s_i^*$ delivery delay.
It varies with the segment size and network conditions.
Algorithm~\ref{algorithm:nextsegment} details how our DASH client makes decisions.
Algorithm~\ref{d3:next-segment} details how our DASH client makes decisions.
@@ -99,18 +99,18 @@ Algorithm~\ref{algorithm:nextsegment} details how our DASH client makes decision
{- Optimize a criterion $\Omega$ based on $\mathcal{U}$ values and well chosen viewpoint $v(t_i)$ to select the next segment to query }
{\begin{equation*} s^{*}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \mathcal{FC}} \Omega_{\theta_i} \Big(\mathcal{U}\left(s,v(t_i)\right)\Big) \label{eq1}\end{equation*} \\
{\begin{equation*} s^{*}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \mathcal{FC}} \Omega_{\theta_i} \Big(\mathcal{U}\left(s,v(t_i)\right)\Big) \label{d3:eq1}\end{equation*} \\
given parameters $\theta_i$ that gathers both online parameters $(i,t_i,v(t_i),\widehat{BW_i}, \widehat{\tau_i}, \mathcal{B}_i)$ and offline metadata\;}
{- Update the buffer $\mathcal{B}_{i+1}$ for the next decision: $s^{*}_i$ and lowest \texttt{representations} of $s^{*}_i$ are considered downloaded\;}
{- \Return{segment $s^{*}_i$, buffer $\mathcal{B}_{i+1}$}\;}
{\caption{Algorithm to identify the next segment to query\label{algorithm:nextsegment}}}
{\caption{Algorithm to identify the next segment to query\label{d3:next-segment}}}
\end{algorithm}
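One decision step of this algorithm can be sketched in Python as follows; `in_frustum` and `utility` stand in for the frustum-culling test and the utility function defined above, and the buffer update is simplified to a set (a sketch, not the thesis implementation):

```python
def next_segment(candidates, buffered, in_frustum, utility, viewpoint):
    """Pick the segment maximizing utility among segments that are
    inside the view frustum and not yet buffered (one decision step)."""
    pool = [s for s in candidates if s not in buffered and in_frustum(s)]
    if not pool:
        return None, buffered
    best = max(pool, key=lambda s: utility(s, viewpoint))
    # Update the buffer for the next decision: best is considered downloaded.
    return best, buffered | {best}
```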
The most naive way to sequentially optimize the $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
In that case, the best segment $s$ to request would be the one maximizing $\mathcal{U}(s, v(t_i))$ to simply make a better rendering from the current viewpoint $v(t_i)$.
Due to transmission delay, however, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, depending on the segment size and network conditions: \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{eq2}\end{equation*}
Due to transmission delay, however, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, depending on the segment size and network conditions: \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{d3:eq2}\end{equation*}
In consequence, the most useful segment from $v(t_i)$ at decision time $t_i$ might be less useful at delivery time from $v(t_{i+1})$.
@@ -119,16 +119,16 @@ With a temporal horizon $\chi$, we can optimize the cumulated $\mathcal{U}$ over
\begin{equation}
s^*_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \mathcal{FC} } \int_{t_{i+1}(s)}^{t_i+\chi} \mathcal{U}(s,\hat{v}(t_i)) dt
\label{eq:smart}
\label{d3:smart}
\end{equation}
In our experiments, we typically use $\chi=2s$ and estimate the integral in (\ref{eq:smart}) by a Riemann sum where the $[t_{i+1}(s), t_i+\chi]$ interval is divided into 4 subintervals of equal size.
In our experiments, we typically use $\chi=2s$ and estimate the integral in (\ref{d3:smart}) by a Riemann sum where the $[t_{i+1}(s), t_i+\chi]$ interval is divided into 4 subintervals of equal size.
For each subinterval extremity, an order 1 predictor $\hat{v}(t_i)$ linearly estimates the viewpoint based on $v(t_i)$ and speed estimation (discrete derivative at $t_i$).
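Under these assumptions (horizon chi = 2 s, four equal subintervals, order-1 linear viewpoint predictor), the estimate can be sketched as below; the segment size, bandwidth, latency, and utility function are illustrative placeholders:

```python
def delivery_time(t_i, size, bandwidth, latency):
    """Estimated arrival time of a segment: t_i + size/BW + tau."""
    return t_i + size / bandwidth + latency

def predict_viewpoint(v, speed, t, t_i):
    """Order-1 (linear) viewpoint predictor from position and speed."""
    return tuple(p + s * (t - t_i) for p, s in zip(v, speed))

def cumulated_utility(segment, t_i, chi, v, speed, utility,
                      size, bandwidth, latency, subintervals=4):
    """Riemann-sum estimate of the utility integral over
    [t_{i+1}(s), t_i + chi]."""
    t_start = delivery_time(t_i, size, bandwidth, latency)
    t_end = t_i + chi
    if t_start >= t_end:
        return 0.0  # segment would arrive after the horizon
    dt = (t_end - t_start) / subintervals
    total = 0.0
    for k in range(subintervals):
        t = t_start + k * dt
        total += utility(segment, predict_viewpoint(v, speed, t, t_i)) * dt
    return total
```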
We also tested an alternative greedy heuristic that selects the segment optimizing the utility variation during download (between $t_i$ and $t_{i+1}$):
\begin{equation}
s^{\texttt{GREEDY}}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \mathcal{FC}} \frac{\mathcal{U}\Big(s,\hat{v}(t_{i+1}(s))\Big)}{t_{i+1}(s) - t_i}
\label{eq:greedy}
\label{d3:greedy}
\end{equation}
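The greedy criterion reduces to dividing each candidate's predicted utility at arrival by its predicted delivery delay; a minimal sketch, with both passed in as hypothetical callables:

```python
def greedy_pick(candidates, utility_at_arrival, delay):
    """Greedy policy: maximize predicted utility at arrival time
    divided by predicted delivery delay."""
    return max(candidates, key=lambda s: utility_at_arrival(s) / delay(s))
```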

View File

@@ -1,5 +1,5 @@
\copied{}
\section{Content preparation}\label{sec:dash3d}
\section{Content preparation\label{d3:dash-3d}}
In this section, we describe how we preprocess and store the 3D data of the NVE, consisting of a polygon soup, textures, and material information into a DASH-compliant Media Presentation Description (MPD) file.
In our work, we use the \texttt{obj} file format for the polygons, \texttt{png} for textures, and \texttt{mtl} format for material information.
@@ -18,12 +18,12 @@ We utilize adaptation sets to organize a 3D scene's material, geometry, and text
When the user navigates freely within an NVE, the frustum at a given time almost always contains a limited part of the 3D scene.
Similar to how DASH for video streaming partitions a video clip into temporal chunks, we segment the polygons into spatial chunks, such that the DASH client can request only the relevant chunks.
\subsubsection{Geometry Management}\label{sec:geometry}
\subsubsection{Geometry Management\label{d3:geometry}}
We use a space partitioning tree to organize the faces into cells.
A face belongs to a cell if its barycenter falls inside the corresponding bounding box.
Each cell corresponds to an adaptation set.
Thus, geometry information is spread on adaptation sets based on spatial coherence, allowing the client to download the relevant faces selectively.
A cell is relevant if it intersects the frustum of the client's current viewpoint. Figure~\ref{fig:bigpic} shows the relevant cells in blue.
A cell is relevant if it intersects the frustum of the client's current viewpoint. Figure~\ref{d3:big-picture} shows the relevant cells in blue.
As our 3D content, a virtual environment, tends to spread mostly along the horizontal plane, we alternate splits between the two horizontal directions.
We create a separate adaptation set for large faces (e.g., the sky or ground) because they are essential to the 3D model and do not fit into cells.
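The partitioning described above can be sketched as follows: faces are assigned by barycenter, and cells are split alternately along the two horizontal axes (x and z here, assuming y is up). The median split, the face-count threshold, and the representation of faces as vertex lists are illustrative simplifications:

```python
def barycenter(face):
    """Barycenter of a face given as a list of (x, y, z) vertices."""
    n = len(face)
    return tuple(sum(v[i] for v in face) / n for i in range(3))

def kd_partition(faces, max_faces, axis=0):
    """Recursively split a list of faces at the median barycenter,
    alternating between the two horizontal axes (0 = x, 2 = z)."""
    if len(faces) <= max_faces:
        return [faces]  # one leaf cell = one adaptation set
    faces = sorted(faces, key=lambda f: barycenter(f)[axis])
    mid = len(faces) // 2
    next_axis = 2 if axis == 0 else 0
    return (kd_partition(faces[:mid], max_faces, next_axis)
            + kd_partition(faces[mid:], max_faces, next_axis))
```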
@@ -31,14 +31,14 @@ We consider a face to be large if its area in 3D is more than $a+3\sigma$, where
In our example, it selects the 5 largest faces that represent $15\%$ of the total face area.
We thus obtain a decomposition of the NVE into adaptation sets that partitions the geometry of the scene into a small adaptation set containing the larger faces of the model, and smaller adaptation sets containing the remaining faces.
We store the spatial location of each adaptation set, characterized by the coordinates of its bounding box, in the MPD file as the supplementary property of the adaptation set in the form of ``\textit{$x_{\min}$, width, $y_{\min}$, height, $z_{\min}$, depth}'' (as shown in Listing~\ref{listing:MPD}).
This information is used by the client to implement view-dependent streaming (Section~\ref{sec:dashclientspec}).
We store the spatial location of each adaptation set, characterized by the coordinates of its bounding box, in the MPD file as the supplementary property of the adaptation set in the form of ``\textit{$x_{\min}$, width, $y_{\min}$, height, $z_{\min}$, depth}'' (as shown in Listing~\ref{d3:mpd}).
This information is used by the client to implement view-dependent streaming (Section~\ref{d3:dash-client}).
\subsubsection{Texture Management}
As with geometry data, we handle textures using adaptation sets but separate from geometry.
Each texture file is contained in a different adaptation set, with multiple representations providing different image resolutions (see Section~\ref{sec:representation}).
Each texture file is contained in a different adaptation set, with multiple representations providing different image resolutions (see Section~\ref{d3:representation}).
We add an attribute to each adaptation set that contains texture, describing the average color of the texture.
The client can use this attribute to render a face for which the corresponding texture has not been loaded yet, so that most objects appear, at least, with a uniform natural color (see Figure~\ref{fig:textures}).
The client can use this attribute to render a face for which the corresponding texture has not been loaded yet, so that most objects appear, at least, with a uniform natural color (see Figure~\ref{d3:textures}).
\subsubsection{Material Management}
@@ -47,7 +47,7 @@ A material has a name, properties such as specular parameters, and, most importa
The \texttt{.mtl} file maps each face of the \texttt{.obj} to a material.
As the \texttt{.mtl} file is a different type of media than geometry and texture, we define a particular adaptation set for this file, with a single representation.
\subsection{Representations}\label{sec:representation}
\subsection{Representations}\label{d3:representation}
Each adaptation set can contain one or more representations of the geometry or texture data, at different levels of detail (e.g., a different number of faces).
For geometry, the resolution (i.e., 3D areas of faces) is heterogeneous, thus applying a sensible multi-resolution representation is cumbersome: the 3D area of faces varies from $0.01$ to more than $10K$, disregarding the outliers.
For textured scenes, it is common to have such heterogeneous geometry size since information can be stored either in geometry or texture.
@@ -56,7 +56,7 @@ Moreover, as our faces are partitioned into independent cells, multi-resolution
For an adaptation set containing texture, each representation contains a single segment where the image file is stored at the chosen resolution.
In our example, from the full-size image, we generate successive resolutions by dividing both height and width by 2, stopping when the image size is less or equal to $64\times 64$.
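This resolution ladder amounts to a simple halving loop; a sketch (the 64 × 64 floor matches the text, and image decoding itself is omitted):

```python
def resolution_ladder(width, height, floor=64):
    """Successive texture resolutions: halve both dimensions,
    stopping once the image fits within floor x floor."""
    ladder = [(width, height)]
    while width > floor or height > floor:
        width, height = max(1, width // 2), max(1, height // 2)
        ladder.append((width, height))
    return ladder
```

For a 3840 × 2160 texture this yields seven representations, from the full size down to 60 × 33.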
Figure~\ref{fig:textures} illustrates the use of the textures against the rendering using a single, average color per face.
Figure~\ref{d3:textures} illustrates the use of the textures against the rendering using a single, average color per face.
\begin{figure}[th]
\centering
@@ -68,7 +68,7 @@ Figure~\ref{fig:textures} illustrates the use of the textures against the render
\includegraphics[width=1\textwidth]{assets/dash-3d/average-color/no-res.png}
\caption{With average colors}
\end{subfigure}
\caption{Rendering of the model with different styles of textures\label{fig:textures}}
\caption{Rendering of the model with different styles of textures\label{d3:textures}}
\end{figure}
\subsection{Segments}
@@ -83,7 +83,7 @@ For textures, each representation contains a single segment.
\lstinputlisting[%
language=XML,
caption={MPD description of a geometry adaptation set, and a texture adaptation set.},
label=listing:MPD,
label=d3:mpd,
emph={%
MPD,
Period,

View File

@@ -1,5 +1,5 @@
\copied{}
\section{Evaluation}\label{sec:eval}
\section{Evaluation\label{d3:evaluation}}
We now describe our setup and the data we use in our experiments. We present an evaluation of our system and a comparison of the impact of the design choices we introduced in the previous sections.
@@ -7,9 +7,9 @@ We now describe our setup and the data we use in our experiments. We present an
\subsubsection{Model}
We use a city model of the Marina Bay area in Singapore in our experiments.
The model came in 3DS Max format and was converted into Wavefront OBJ format before the processing described in Section~\ref{sec:dash3d}.
The model came in 3DS Max format and was converted into Wavefront OBJ format before the processing described in Section~\ref{d3:dash-3d}.
The converted model has 387,551 vertices and 552,118 faces.
Table~\ref{table:size} gives some general information about the model.
Table~\ref{d3:size} gives some general information about the model.
We partition the geometry into a $k$-d tree until the leaves have fewer than 10,000 faces, which gives us 64 adaptation sets, plus one containing the large faces.
\begin{table}[th]
@@ -24,7 +24,7 @@ We partition the geometry into a k-$d$ tree until the leafs have less than 10000
Textures (low res) & 11 MB \\
\bottomrule
\end{tabular}
\caption{Sizes of the different files of the model\label{table:size}}
\caption{Sizes of the different files of the model\label{d3:size}}
\end{table}
\subsubsection{User Navigations}
@@ -39,10 +39,10 @@ The recorded camera trace allows us to replay each camera path to perform our si
We collected 13 camera paths this way.
\subsubsection{Network Setup}
We tested our implementation under three network bandwidths of 2.5 Mbps, 5 Mbps, and 10 Mbps with an RTT of 38 ms, following the settings from DASH-IF~\cite{DASH_NETWORK_PROFILE}.
We tested our implementation under three network bandwidths of 2.5 Mbps, 5 Mbps, and 10 Mbps with an RTT of 38 ms, following the settings from DASH-IF~\cite{dash-network-profiles}.
The values are kept constant during the entire client session to analyze the difference in magnitude of performance by increasing the bandwidth.
In our experiments, we set up a virtual camera that moves along a navigation path, and our access engine downloads segments in real time according to Algorithm~\ref{algorithm:nextsegment}.
In our experiments, we set up a virtual camera that moves along a navigation path, and our access engine downloads segments in real time according to Algorithm~\ref{d3:next-segment}.
We log in a JSON file the time when a segment is requested and when it is received.
By doing so, we avoid wasting time and resources to evaluate our system while downloading segments and store all the information necessary to plot the figures introduced in the subsequent sections.
@@ -60,12 +60,12 @@ We do not have pixel error due to compression.
We present experiments to validate our implementation choices at every step of our system.
We replay the user-generated camera paths with various bandwidth conditions while varying key components of our system.
Table~\ref{table:experiments} sums up all the components we varied in our experiments.
Table~\ref{d3:experiments} sums up all the components we varied in our experiments.
We compare the impact of two space-partitioning trees, a $k$-d tree and an Octree, on content preparation.
We also try several utility metrics for geometry segments: an offline one, which assigns to each geometry segment $s^G$ the cumulative 3D area of the faces it contains, $\mathcal{A}_{3D}(s^G)$; an online one, which assigns to each geometry segment the inverse of its distance to the camera position; and finally our proposed method, as described in Section~\ref{subsec:utility} ($\mathcal{A}_{3D}(s^G)/ \mathcal{D}{(v{(t_i)},AS^G)}^2$).
We consider two streaming policies to be applied by the client, proposed in Section~\ref{sec:dashclientspec}.
The greedy strategy determines, at each decision time, the segment that maximizes its predicted utility at arrival divided by its predicted delivery delay, which corresponds to equation (\ref{eq:greedy}).
The second streaming policy that we run is the one we proposed in equation (\ref{eq:smart}).
We also try several utility metrics for geometry segments: an offline one, which assigns to each geometry segment $s^G$ the cumulative 3D area of the faces it contains, $\mathcal{A}_{3D}(s^G)$; an online one, which assigns to each geometry segment the inverse of its distance to the camera position; and finally our proposed method, as described in Section~\ref{d3:utility} ($\mathcal{A}_{3D}(s^G)/ \mathcal{D}{(v{(t_i)},AS^G)}^2$).
We consider two streaming policies to be applied by the client, proposed in Section~\ref{d3:dash-client}.
The greedy strategy determines, at each decision time, the segment that maximizes its predicted utility at arrival divided by its predicted delivery delay, which corresponds to equation (\ref{d3:greedy}).
The second streaming policy that we run is the one we proposed in equation (\ref{d3:smart}).
We have also analyzed the effect of grouping the faces in geometry segments of an adaptation set based on their 3D area.
Finally, we try several bandwidth parameters to study how our system can adapt to varying network conditions.
@@ -80,7 +80,7 @@ Finally, we try several bandwidth parameters to study how our system can adapt t
Grouping of Segments & Sorted based on area, Unsorted\\
Bandwidth & 2.5 Mbps, 5 Mbps, 10 Mbps \\\bottomrule
\end{tabular}
\caption{Different parameters in our experiments\label{table:experiments}}
\caption{Different parameters in our experiments\label{d3:experiments}}
\end{table}
\subsection{Experimental Results}
@@ -104,13 +104,13 @@ Finally, we try several bandwidth parameters to study how our system can adapt t
\addlegendentry{\scriptsize octree}
\end{axis}
\end{tikzpicture}
\caption{Impact of the space-partitioning tree on the rendering quality with a 5Mbps bandwidth.\label{fig:preparation}}
\caption{Impact of the space-partitioning tree on the rendering quality with a 5Mbps bandwidth.\label{d3:preparation}}
\end{figure}
Figure~\ref{fig:preparation} shows how the space partition can affect the rendering quality.
We use our proposed utility metrics (see Section~\ref{subsec:utility}) and streaming policy from Equation (\ref{eq:smart}), on content divided into adaptation sets obtained either using a $k$-d tree or an Octree and run experiments on all camera paths at 5 Mbps.
Figure~\ref{d3:preparation} shows how the space partition can affect the rendering quality.
We use our proposed utility metrics (see Section~\ref{d3:utility}) and streaming policy from Equation (\ref{d3:smart}), on content divided into adaptation sets obtained either using a $k$-d tree or an Octree and run experiments on all camera paths at 5 Mbps.
The octree partitions content into non-homogeneous adaptation sets; as a result, some adaptation sets may contain smaller segments, which contain both important (large) and non-important polygons. For the $k$-d tree, we create cells containing the same number of faces $N_a$ (here, we take $N_a=10k$).
Figure~\ref{fig:preparation} shows that the system seems to be slightly less efficient with an Octree than with a $k$-d tree based partition, but this result is not significant.
Figure~\ref{d3:preparation} shows that the system seems to be slightly less efficient with an Octree than with a $k$-d tree based partition, but this result is not significant.
For the remaining experiments, partitioning is based on a $k$-d tree.
\begin{figure}[th]
@@ -135,13 +135,13 @@ For the remaining experiments, partitioning is based on a $k$-d tree.
\addlegendentry{\scriptsize Offline only}
\end{axis}
\end{tikzpicture}
\caption{Impact of the segment utility metric on the rendering quality with a 5Mbps bandwidth.\label{fig:utility}}
\caption{Impact of the segment utility metric on the rendering quality with a 5Mbps bandwidth.\label{d3:utility-impact}}
\end{figure}
Figure~\ref{fig:utility} displays how a utility metric should take advantage of both offline and online features.
Figure~\ref{d3:utility-impact} displays how a utility metric should take advantage of both offline and online features.
The experiments consider $k$-d tree cell for adaptation sets and the proposed streaming policy, on all camera paths.
We observe that a purely offline utility metric leads to poor PSNR results.
An online-only utility improves the results, as it takes the user viewing frustum into consideration, but still, the proposed utility (in Section~\ref{subsec:utility}) performs better.
An online-only utility improves the results, as it takes the user viewing frustum into consideration, but still, the proposed utility (in Section~\ref{d3:utility}) performs better.
\begin{figure}[th]
\centering
@@ -163,18 +163,18 @@ An online-only utility improves the results, as it takes the user viewing frustu
\addlegendentry{\scriptsize Without sorting the faces}
\end{axis}
\end{tikzpicture}
\caption{Impact of creating the segments of an adaptation set based on decreasing 3D area of faces with a 5Mbps bandwidth.}\label{fig:sorting}
\caption{Impact of creating the segments of an adaptation set based on decreasing 3D area of faces with a 5Mbps bandwidth.\label{d3:sorting}}
\end{figure}
Figure~\ref{fig:sorting} shows the effect of grouping the segments in an adaptation set based on their area in 3D.
Figure~\ref{d3:sorting} shows the effect of grouping the segments in an adaptation set based on their area in 3D.
Clearly, the PSNR significantly improves when the 3D area of faces is considered when creating the segments. Since all segments are of the same size, sorting the faces by area before grouping them into segments leads to a skewed distribution of segment utility. This skewness means that the client's decision to download the segments with the largest utility first makes a bigger difference in quality.
We also compared the greedy vs.\ proposed streaming policy (as shown in Figure~\ref{fig:greedyweakness}) for limited bandwidth (5 Mbps).
We also compared the greedy vs.\ proposed streaming policy (as shown in Figure~\ref{d3:greedy-weakness}) for limited bandwidth (5 Mbps).
The proposed scheme outperforms the greedy during the first 30s and does a better job overall.
Table~\ref{table:greedyVsproposed} shows the average PSNR for the proposed method and the greedy method for different download bandwidths.
Table~\ref{d3:greedy-vs-proposed} shows the average PSNR for the proposed method and the greedy method for different download bandwidths.
In the first 30 seconds, since relatively little 3D content has been downloaded, making a better decision about what to download matters more: we observe during that time that the proposed method yields a quality 1 to 1.9 dB higher in terms of PSNR compared to the greedy method.
Table~\ref{table:perc} shows the distribution of texture resolutions that are downloaded by greedy and our Proposed scheme, at different bandwidths.
Table~\ref{d3:percentages} shows the distribution of texture resolutions that are downloaded by greedy and our Proposed scheme, at different bandwidths.
Resolution 5 is the highest and 1 is the lowest.
The table clearly shows a weakness of the greedy policy: as the bandwidth increases, the distribution of downloaded texture resolutions stays more or less the same.
In contrast, our proposed streaming policy adapts to an increasing bandwidth by downloading higher resolution textures (13.9\% at 10 Mbps, vs. 0.3\% at 2.5 Mbps).
@@ -201,7 +201,7 @@ In other words, our system tends to favor geometry segments when the bandwidth i
\addlegendentry{\scriptsize Greedy}
\end{axis}
\end{tikzpicture}
\caption{Impact of the streaming policy (greedy vs.\ proposed) with a 5 Mbps bandwidth.}\label{fig:greedyweakness}
\caption{Impact of the streaming policy (greedy vs.\ proposed) with a 5 Mbps bandwidth.}\label{d3:greedy-weakness}
\end{figure}
\begin{table}[th]
@@ -216,7 +216,7 @@ In other words, our system tends to favor geometry segments when the bandwidth i
Proposed & 16.3 & 20.4 & 23.2 & & 23.8 & 28.2 & 31.1 \\
\bottomrule
\end{tabular}
\caption{Average PSNR, Greedy vs. Proposed\label{table:greedyVsproposed}}
\caption{Average PSNR, Greedy vs. Proposed\label{d3:greedy-vs-proposed}}
\end{table}
\begin{table}[th]
@@ -232,6 +232,6 @@ In other words, our system tends to favor geometry segments when the bandwidth i
4 & 14.6\% vs 18.4\% & 14.4\% vs 25.2\% & 14.2\% vs 24.1\% \\
5 & 11.4\% vs 0.3\% & 11.1\% vs 5.9\% & 11.5\% vs 13.9\% \\\bottomrule
\end{tabular}
\caption{Percentages of downloaded bytes for textures from each resolution, for the greedy streaming policy (left) and for our proposed scheme (right)\label{table:perc}}
\caption{Percentages of downloaded bytes for textures from each resolution, for the greedy streaming policy (left) and for our proposed scheme (right)\label{d3:percentages}}
\end{table}

View File

@@ -1,3 +1,2 @@
\subsection{On our way to DASH-3D}
DASH is designed to be format agnostic, and even though it is almost exclusively applied to video streaming nowadays, we believe it is also suitable for 3D streaming. Even though periods are not of much use for a scene that does not evolve over time, adaptation sets allow us to separate our content between geometry and textures, and give answers to the questions raised in the conclusion of the previous chapter.
DASH is designed to be format agnostic, and even though it is almost exclusively applied to video streaming nowadays, we believe it is also suitable for 3D streaming.
Even though periods are not of much use for a scene that does not evolve over time, adaptation sets allow us to separate our content between geometry and textures, and give answers to the questions raised in the conclusion of the previous chapter.

View File

@@ -1,5 +1,16 @@
\chapter{DASH-3D}
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{assets/dash-3d/bigpicture.png}
\caption{%
A subdivided 3D scene with a viewport, with regions delimited with
red edges. In white, the regions that are outside the field of view
of the camera; in green, the regions inside the field of view of the
camera.\label{d3:big-picture}
}
\end{figure}
\input{dash-3d/introduction}
\resetstyle{}

View File

@ -23,5 +23,8 @@ Text that was freshly written will be in this color
\tableofcontents
\input{plan}
\bibliographystyle{plain}
\bibliography{src/bib.bib}
\end{document}


@ -1,6 +1,6 @@
\copied{}
\section{Impact of 3D Bookmarks on Navigation}\label{sec:3dnavigation}
\section{Impact of 3D Bookmarks on Navigation\label{bi:3dnavigation}}
We now describe an experiment that we conducted on 51 participants, with two goals in mind.
First, we want to measure the impact of 3D bookmarks on navigation within an NVE\@.
@ -41,7 +41,7 @@ The interface also includes a button to reset the camera back to the starting po
\includegraphics[width=\textwidth]{assets/preliminary-work/bookmarks/arrow-bookmark.png}
\caption{A coin is hidden behind the curtain\newline}
\end{subfigure}
\caption{3D bookmarks propose to move to a new viewpoint; when the user clicks on the bookmark, his viewpoint moves to the indicated viewpoint.}\label{fig:bookmark}
\caption{3D bookmarks propose to move to a new viewpoint; when the user clicks on the bookmark, his viewpoint moves to the indicated viewpoint.\label{bi:bookmark}}
\end{figure}
@ -49,15 +49,15 @@ The interface also includes a button to reset the camera back to the starting po
Our NVE supports 3D bookmarks.
A 3D bookmark, or bookmark for short, is simply a fixed camera location (in 3D space), a view direction, and a focal length.
Bookmarks visible from the user's current viewpoint are shown as 3D objects in the scene.
Figure~\ref{fig:bookmark} depicts some bookmarks from our NVE\@.
Figure~\ref{bi:bookmark} depicts some bookmarks from our NVE\@.
The user can click on a bookmark object to automatically move and align its viewpoint to that of the bookmark.
The movement follows a Hermite curve joining the current viewpoint to the viewpoint of the bookmark.
The tangent of the curve is the view direction.
The user can hover the mouse pointer over a bookmark object to see a thumbnail view of the 3D scene as seen from the bookmark.
(Figure~\ref{fig:bookmark}, bottom left).
(Figure~\ref{bi:bookmark}, bottom left).
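To make the transition concrete, the fly-to along a cubic Hermite curve can be sketched as follows (the function name and the tangent scaling are illustrative assumptions, not the thesis implementation):

```python
def hermite_camera(p0, d0, p1, d1, t):
    """Cubic Hermite point at t in [0, 1] between camera positions p0 and p1,
    with tangents d0 and d1 (the view directions, suitably scaled)."""
    # Hermite basis functions
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    # Blend positions and tangents component-wise
    return tuple(h00 * a + h10 * b + h01 * c + h11 * d
                 for a, b, c, d in zip(p0, d0, p1, d1))
```

At $t=0$ the camera is at the current position moving along its view direction; at $t=1$ it reaches the bookmark aligned with the bookmark's view direction.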
In our work, we consider two different possibilities for displaying bookmarks: viewports (Figure~\ref{fig:bookmark} top left) and arrows (Figure~\ref{fig:bookmark} top right).
In our work, we consider two different possibilities for displaying bookmarks: viewports (Figure~\ref{bi:bookmark} top left) and arrows (Figure~\ref{bi:bookmark} top right).
A viewport is displayed as a pyramid where the top corresponds to the optical center of its viewpoint and the base corresponds to its image plane.
The arrows are view dependent.
The bottom of the arrow turns towards the current position, to better visualize the relative position of the bookmark.
@ -67,7 +67,7 @@ Since bookmarks are part of the scene, they are visible only when not hidden by
We chose size and colors that are salient enough to be easily seen, but not too large to limit the occlusion of regions within the scene.
When reaching the bookmark, the corresponding arrow or viewport is not visible anymore, and subsequently will appear in a different color, to indicate that it has been clicked (similar to Web links).
\subsection{User Study}\label{sec:userstudy}
\subsection{User Study\label{bi:user-study}}
We now describe in details our experimental setup and the user study that we conducted on 3D navigation.
@ -103,7 +103,7 @@ Alternatively, this button may appear one minute after the sixth coin was found.
This means that a user is authorized to move on without completing the task, in order to avoid potential frustration caused by not finding the remaining two coins.
After completing the three tasks, the participants have to answer a set of questions about their experience with the bookmarks (we refer to the bookmarks as \textit{recommendations} in the experiments).
Table~\ref{t:questions} shows the list of questions.
Table~\ref{bi:questions} shows the list of questions.
\begin{table}[th]
\centering
@ -120,20 +120,20 @@ Table~\ref{t:questions} shows the list of questions.
7 & Did you enjoy this? & 36 Yes, 3 No\\
\bottomrule
\end{tabular}
\caption{List of questions in the questionnaire and summary of answers. Questions 1 and 2 have a 99\% confidence interval.}\label{t:questions}
\caption{List of questions in the questionnaire and summary of answers. Questions 1 and 2 have a 99\% confidence interval.\label{bi:questions}}
\end{table}
\textbf{Participants}.
The participants were recruited on microworkers.com, a crowdsourcing website.
There were 51 participants (36 men and 15 women), aged 30.44 years on average.
\subsection{Experimental Results}\label{sec:qoeresults}
\subsection{Experimental Results\label{bi:qoe-results}}
We now present the results from our user study, focusing on whether bookmarks help users navigating the 3D scene.
\subsubsection{Questionnaire}
We had 51 responses to the Questionnaire.
The answers are summarized in Table~\ref{t:questions}.
The answers are summarized in Table~\ref{bi:questions}.
Note that not all questions were answered by all participants.
The participants seem to find the task to be of average difficulty (3.04/5) when they have no bookmarks to help their navigation.
@ -158,10 +158,10 @@ In addition, users seem to have a preference for \Arrows{} against \Viewports{}
\Viewports{} & 51 & 7.51 & 30 & 2 min 16 s \\
\bottomrule
\end{tabular}
\caption{Analysis of the sessions length and users success by type of bookmarks}\label{tab:sessions}
\caption{Analysis of the sessions length and users success by type of bookmarks}\label{bi:sessions}
\end{table}
Table~\ref{tab:sessions} shows basic statistics on task completion given the type of bookmarks that were provided to the participants.
Table~\ref{bi:sessions} shows basic statistics on task completion given the type of bookmarks that were provided to the participants.
First, we can see that without bookmarks, only slightly more than a third of the users are able to complete the task, i.e.\ find all 8 coins.
On average, these users find just above 7 coins, and spend 4 minutes and 16 seconds doing so.
@ -171,7 +171,7 @@ Although \Viewports{} seem to help users a little bit more in completing the tas
The difference between an interface with bookmarks and without bookmarks, however, is very clear.
Users tend to complete the task more efficiently using bookmarks: more users actually finish the task, and it takes them half the time to do so.
We computed 99\% confidence intervals on the results introduced in Table~\ref{tab:sessions}.
We computed 99\% confidence intervals on the results introduced in Table~\ref{bi:sessions}.
We found that the difference in the mean number of coins collected with and without bookmarks is not large enough to be statistically significant: we would need more experiments to reach significance.
The mean time spent on the task however is statistically significant.
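As an illustration of the kind of test used here, a 99\% confidence interval for a difference of means under a normal approximation can be computed as follows (the numbers below are hypothetical and not the study's actual data):

```python
import math

def ci99_mean_diff(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """99% confidence interval for mean_a - mean_b (normal approximation,
    z = 2.576 for the 99% level)."""
    se = math.sqrt(std_a**2 / n_a + std_b**2 / n_b)  # standard error of the difference
    diff = mean_a - mean_b
    return diff - 2.576 * se, diff + 2.576 * se

# Hypothetical task times in seconds (means, stds, sample sizes):
low, high = ci99_mean_diff(256, 60, 18, 140, 50, 33)
```

If the interval excludes 0, the difference is significant at the 99\% level.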
@ -186,10 +186,10 @@ The mean time spent on the task however is statistically significant.
\Viewports{} & 546.96 & 332.72 & 61 \% \\
\bottomrule
\end{tabular}
\caption{Analysis of the length of the paths by type of bookmarks}\label{tab:paths-length}
\caption{Analysis of the length of the paths by type of bookmarks}\label{bi:paths-length}
\end{table}
Table~\ref{tab:paths-length} presents the length of the paths traveled by users in the scenes.
Table~\ref{bi:paths-length} presents the length of the paths traveled by users in the scenes.
Although users tend to spend less time on the tasks when they have bookmarks, they travel pretty much the same distance as without bookmarks.
As a consequence, they visit the scene faster on average with bookmarks than without.
The table shows that this higher speed is due to the bookmarks, as more than 60\% of the distance traveled by users with bookmarks happens when users click on bookmarks and fly to the destination.
@ -223,10 +223,10 @@ We can say that bookmarks have a positive effect on navigation within the 3D sce
\end{axis}
\end{tikzpicture}
\caption{Comparison of the triangles queried after a certain time}\label{fig:triangles-curve}
\caption{Comparison of the triangles queried after a certain time}\label{bi:triangles-curve}
\end{figure}
Figure~\ref{fig:triangles-curve} shows a CDF of the percentage of 3D mesh triangles in the scene that have been queried by users after a certain time. We plotted this same curve for users with and without bookmarks.
Figure~\ref{bi:triangles-curve} shows a CDF of the percentage of 3D mesh triangles in the scene that have been queried by users after a certain time. We plotted this same curve for users with and without bookmarks.
As expected, the fact that the users can browse the scene significantly quicker with bookmarks reflects on the demand on the 3D content.
Users need more triangles more quickly, which either leads to more demand on network bandwidth or, if the bandwidth is kept constant, to fewer objects being displayed.
In the next section, we introduce experiments based on our user study traces that show how the rendering is affected by the presence of bookmarks and how to improve it.


@ -1,6 +1,6 @@
\copied{}
\section{Impact of 3D Bookmarks on Streaming}\label{s:system}
\section{Impact of 3D Bookmarks on Streaming\label{bi:system}}
\subsection{3D Model Streaming}
@ -15,7 +15,7 @@ The geometry consists of (i) a list of vertices and (ii) a list of faces, and th
In the crowdsourcing experiment, we keep the model small since the goal is to study the user interaction.
To increase the size of the model, while keeping the same 3D scene, we subdivide each triangle three times, successively, thereby multiplying the total number of triangles in the scene by 64.
We do this to simulate a reasonable use case with large 3D scenes.
Table~\ref{tab:modelsize} shows that material and texture amount at most for $3.6\%$ of the geometry, which justifies this choice.
Table~\ref{bi:modelsize} shows that material and texture amount to at most $3.6\%$ of the size of the geometry, which justifies this choice.
When a client starts loading the Web page containing the 3D model, the server first sends the list of materials and the texture files.
Then, the server periodically sends a fixed size chunk that indifferently encapsulates vertices, texture coordinates, or faces.
@ -33,7 +33,7 @@ Consequently, given the Javascript implementation of integers and floats, we app
Scene 3 & 16 KB & 92 KB & 5.85 MB \\
\bottomrule
\end{tabular}
\caption{Respective sizes of materials, textures (images) and geometries for the three scenes used in the user study.}\label{tab:modelsize}
\caption{Respective sizes of materials, textures (images) and geometries for the three scenes used in the user study.}\label{bi:modelsize}
\end{table}
During playback, the client periodically (every 200 ms in our implementation) sends to the server its current position and camera orientation.
@ -43,7 +43,7 @@ The server then sorts the filtered faces according to their distance to the came
Finally, the server incrementally fills in chunks with these ordered faces.
If a face depends on a vertex or a texture coordinate that has not yet been sent, the vertex or the texture coordinate is added to the chunk as well.
When the chunk is full, the server sends it.
Both client and server algorithms are detailed in algorithms~\ref{streaming-algorithm-client} and~\ref{streaming-algorithm-server}.
Both client and server algorithms are detailed in algorithms~\ref{bi:streaming-algorithm-client} and~\ref{bi:streaming-algorithm-server}.
The chunk size is set according to the bandwidth limit of the server.
Note that the server may send faces that are occluded and not visible to the client, since determining visibility requires additional computation.
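A minimal sketch of this chunk-filling step, under the assumptions that faces are triples of vertex indices and that sizes are counted in abstract units (all names are illustrative):

```python
def fill_chunk(sorted_faces, vertices, sent_vertices, budget,
               face_size=1, vertex_size=1):
    """Greedily pack distance-sorted faces (and their not-yet-sent vertices)
    into one chunk, stopping when the size budget is reached."""
    chunk, used = [], 0
    for face in sorted_faces:
        # Dependencies: vertices of this face the client does not have yet
        deps = [v for v in face if v not in sent_vertices]
        cost = face_size + vertex_size * len(deps)
        if used + cost > budget:
            break
        for v in deps:
            chunk.append(('vertex', v, vertices[v]))
            sent_vertices.add(v)
        chunk.append(('face', face))
        used += cost
    return chunk
```

With a budget of 5 units, one triangle and its three vertices fit, but a second triangle needing one more vertex does not.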
@ -55,7 +55,7 @@ Note that the server may send faces that are occluded and not visible to the cli
Compute the rendering and evaluate the quality\;
Send the position of the camera to the server\;
}
\caption{Client side algorithm\label{bi:streaming-algorithm-client}}
\caption{Client slide algorithm\label{bi:streaming-algorithm-client}}
\end{algorithm}
\begin{algorithm}[th]
@ -64,14 +64,14 @@ Note that the server may send faces that are occluded and not visible to the cli
Compute the list of triangles to send and sort them\;
Send a chunk of a certain amount of triangles\;
}
\caption{Server side algorithm\label{streaming-algorithm-server}}
\caption{Server side algorithm\label{bi:streaming-algorithm-server}}
\end{algorithm}
In the following, we shall denote this streaming policy \textsf{culling}; in Figures~\ref{fig:click-1250} and~\ref{fig:click-625} streaming using \textsf{culling} only is denoted \textsf{C-only}.
In the following, we shall denote this streaming policy \textsf{culling}; in Figures~\ref{bi:click-1250} and~\ref{bi:click-625} streaming using \textsf{culling} only is denoted \textsf{C-only}.
\subsection{3D Bookmarks}
We have seen (Figure~\ref{fig:triangles-curve}) that navigation with bookmarks is more demanding on the bandwidth.
We have seen (Figure~\ref{bi:triangles-curve}) that navigation with bookmarks is more demanding on the bandwidth.
We want to exploit bookmarks to improve the user's quality of experience. For this purpose, we propose two streaming policies based on offline computation of the relevance of 3D content to bookmarked viewpoints.
\subsubsection{Visibility Determination for 3D Bookmarks}
@ -88,13 +88,13 @@ It is not scalable to compute this for every viewpoint requested by the client.
However, we can pre-render the bookmarked viewpoints, since the number of bookmarks is limited, their viewpoints are known in advance, and they are likely to be accessed.
For each bookmark, we render offline the scene using a single color per triangle.
Once rendered, we scan the output image to find the visible triangles (based on the color) and sort them by decreasing projected area.
This technique is also used by~\cite{chengwei}.
This technique is also used by~\cite{view-dependent-progressive-mesh}.
Thus, when the user clicks on a 3D bookmark, this pre-computed list of faces is used by the server, and only visible faces are sent in decreasing order of contributions to the rendered image.
For the three scenes that we used in the experiment, we can reduce the number of triangles sent by 60\% (over all bookmarks).
This reduction is as high as 85.7\% for one particular bookmark (from 26,886 culled triangles to 3,853 culled and visible triangles).
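The offline visibility scan described above can be sketched as follows, assuming the scene was rendered into a 2D array of per-pixel triangle ids (`None` for background); the representation is an assumption for illustration:

```python
from collections import Counter

def visible_triangles(id_image):
    """Scan a rendered id-image and return the visible triangle ids,
    sorted by decreasing projected area (pixel count)."""
    counts = Counter(pid for row in id_image
                     for pid in row if pid is not None)
    return [tid for tid, _ in counts.most_common()]
```

Triangles absent from the image are culled entirely; the rest are sent in decreasing order of contribution to the rendered image.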
To illustrate the impact of sorting by projected area of faces, Figure~\ref{fig:sortedtri} shows the quality improvement gained by sending the precomputed visible triangles prioritized by projected areas, compared to using culling only prioritized by distance.
To illustrate the impact of sorting by projected area of faces, Figure~\ref{bi:sorted-tri} shows the quality improvement gained by sending the precomputed visible triangles prioritized by projected areas, compared to using culling only prioritized by distance.
The curve shows the average quality over all bookmarks over all scenes, for a given number of triangles received.
The quality is measured by the ratio of correctly rendered pixels, comparing the fully and correctly rendered image (when all 3D content is available) and the rendered image (when content is partially available).
We sample one pixel every 100 rows and every 100 columns to compute this value.
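This sampled pixel-ratio metric can be sketched as follows (images are assumed to be 2D arrays of pixel values; the default stride matches the 100-pixel sampling above):

```python
def rendered_quality(reference, partial, stride=100):
    """Ratio of sampled pixels in the partial rendering that match the
    fully rendered reference image."""
    total = correct = 0
    for y in range(0, len(reference), stride):
        for x in range(0, len(reference[0]), stride):
            total += 1
            if reference[y][x] == partial[y][x]:
                correct += 1
    return correct / total
```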
@ -126,7 +126,7 @@ In what follows, we will refer to this streaming policy as \textsf{visible}.
\end{axis}
\end{tikzpicture}
\caption{Comparison of rendered image quality (average on all bookmarks and starting position): the triangles are sorted offline (dotted curve), or sorted online by distance to the viewpoint (solid curve).}\label{fig:sortedtri}
\caption{Comparison of rendered image quality (average on all bookmarks and starting position): the triangles are sorted offline (dotted curve), or sorted online by distance to the viewpoint (solid curve).}\label{bi:sorted-tri}
\end{figure}
\subsubsection{Prefetching by Predicting the Next Bookmark Clicked}
@ -153,10 +153,10 @@ It is thus natural to try to prefetch the 3D content of the bookmarks.
\draw (-40pt,5.5) node[rotate=90] {Next recommendation clicked};
\draw[step=1.0,black,thin,dashed] (0,0) grid (11,11);
\end{tikzpicture}
\caption{Probability distribution of `next clicked bookmark' for Scene 1 (computed from the 33 users with bookmarks). Numbering corresponds to 0 for initial viewport and 11 bookmarks; the size of the disk at $(i,j)$ is proportional to the probability of clicking bookmark $j$ after $i$.}\label{fig:mat1}
\caption{Probability distribution of `next clicked bookmark' for Scene 1 (computed from the 33 users with bookmarks). Numbering corresponds to 0 for initial viewport and 11 bookmarks; the size of the disk at $(i,j)$ is proportional to the probability of clicking bookmark $j$ after $i$.\label{bi:mat1}}
\end{figure}
Figure~\ref{fig:mat1} shows the probability of visiting a bookmark (vertical axis) given that another bookmark has been visited (horizontal axis).
Figure~\ref{bi:mat1} shows the probability of visiting a bookmark (vertical axis) given that another bookmark has been visited (horizontal axis).
This figure shows that users tend to follow similar paths when consuming bookmarks.
Thus, we hypothesize that prefetching along those paths would lead to better image quality and lower discovery latency.
@ -164,7 +164,7 @@ We use the following prefetching policy in this paper.
We divide each chunk sent by the server into two parts.
The first part is used to fetch the content from the current viewpoint, using the \textsf{culling} streaming policy.
The second part is used to prefetch content from the bookmarks, according to their likelihood of being clicked next.
We use the probabilities displayed in Figure~\ref{fig:mat1} to determine the size of each part.
We use the probabilities displayed in Figure~\ref{bi:mat1} to determine the size of each part.
Each bookmark $B$ has a probability $p(B|B_{prev})$ of being clicked next, considering that $B_{prev}$ was the last clicked bookmark.
We assign to each bookmark $p(B|B_{prev})/2$ of the chunk to prefetch the corresponding data.
We use the \textsf{visible} policy to determine which data should be sent for a bookmark.
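This chunk division can be sketched as follows (names are illustrative; `probs` maps each bookmark to $p(B|B_{prev})$, summing to at most 1):

```python
def split_chunk(budget, probs):
    """Half the chunk serves the current viewpoint (culling policy);
    each bookmark B gets p(B|B_prev)/2 of the chunk for prefetching."""
    alloc = {'culling': budget / 2}
    for bookmark, p in probs.items():
        alloc[bookmark] = budget * p / 2
    return alloc
```

When the probabilities sum to 1, the whole budget is used: half for the current viewport, half spread over the candidate bookmarks.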
@ -183,13 +183,13 @@ We denote this combination as \textsf{V-PP}, for Prefetching based on Prediction
\draw [fill=LightGreen] (7,0) rectangle (10,1);
\node at (8.5,0.5) {$B_k$};
\end{tikzpicture}
\caption{Example of how a chunk can be divided into fetching what is needed to display the current viewport (culling), and prefetching three recommendations according to their probability of being visited next.}\label{fig:prefetchedchunk}
\caption{Example of how a chunk can be divided into fetching what is needed to display the current viewport (culling), and prefetching three recommendations according to their probability of being visited next.\label{bi:prefetched-chunk}}
\end{figure}
\subsubsection{Fetching Destination Bookmark}
An alternative method to benefit from precomputing the visible triangles at the bookmark is to fetch 3D content during the ``fly-to'' transition to reach the destination.
Indeed, as specified in Section~\ref{sec:3dnavigation}, moving to a bookmarked viewpoint is not instantaneous, but rather takes a small amount of time to smoothly move the user camera from its initial position towards the bookmark.
Indeed, as specified in Section~\ref{bi:3dnavigation}, moving to a bookmarked viewpoint is not instantaneous, but rather takes a small amount of time to smoothly move the user camera from its initial position towards the bookmark.
This transition usually takes from 1 to 2 seconds, depending on how far the current user camera position is from the bookmark.
When the user clicks on the bookmark, the client fetches the visible vertices from the destination viewpoint, with all the available bandwidth.
@ -231,11 +231,11 @@ The first point we are interested in is which streaming policy leads to the lowe
\addlegendentry{V-PP+FD}
\end{axis}
\end{tikzpicture}
\caption{Average percentage of the image pixels that are correctly rendered against time, for all users with bookmarks, and using a bandwidth (BW) of 1 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.}\label{fig:click-1250}
\caption{Average percentage of the image pixels that are correctly rendered against time, for all users with bookmarks, and using a bandwidth (BW) of 1 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.\label{bi:click-1250}}
\end{figure}
Figure~\ref{fig:click-1250} compares the quality of the view of a user after his/her first click on a bookmark.
The ratio of pixels correctly displayed is computed in the client algorithm, see also algorithm~\ref{streaming-algorithm-client}.
Figure~\ref{bi:click-1250} compares the quality of the view of a user after his/her first click on a bookmark.
The ratio of pixels correctly displayed is computed in the client algorithm, see also algorithm~\ref{bi:streaming-algorithm-client}.
In this figure we use a bandwidth of 1 Mbps.
The solid curve corresponds to the \textsf{culling} policy.
Clicking on a bookmark generates a user path with less spatial locality, causing a large drop in visual quality that is only compensated after 4 seconds.
@ -272,7 +272,7 @@ More quantitatively, with a $1$ Mbps bandwidth, 3 seconds are necessary after th
\addlegendentry{V-PP+FD}
\end{axis}
\end{tikzpicture}
\caption{Average percentage of the image pixels that are correctly rendered against time --for all users with bookmarks, and using a bandwidth (BW) of 0.5 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.}\label{fig:click-625}
\caption{Average percentage of the image pixels that are correctly rendered against time --for all users with bookmarks, and using a bandwidth (BW) of 0.5 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.\label{bi:click-625}}
\end{figure}
\begin{figure}[th]
@ -297,15 +297,15 @@ More quantitatively, with a $1$ Mbps bandwidth, 3 seconds are necessary after th
\addlegendentry{V-PP-FD}
\end{axis}
\end{tikzpicture}
\caption{Same curve as Figures~\ref{fig:click-1250} and~\ref{fig:click-625}, for comparing streaming policies \textsf{V-FD} alone and \textsf{V-PP+FD}. BW=2Mbps}\label{fig:2MB}
\caption{Same curve as Figures~\ref{bi:click-1250} and~\ref{bi:click-625}, for comparing streaming policies \textsf{V-FD} alone and \textsf{V-PP+FD}. BW=2Mbps\label{bi:2MB}}
\end{figure}
Figure~\ref{fig:click-625} showed the results of the same experiment with 0.5 Mbps bandwidth. Here, it takes 4 to 5 seconds to recover $85\%$ of the pixels with \textsf{culling} and \textsf{V-PP}, against 1.5 second for recovering $90\%$ with \textsf{V-FD}.
Figure~\ref{bi:click-625} shows the results of the same experiment with a 0.5 Mbps bandwidth. Here, it takes 4 to 5 seconds to recover $85\%$ of the pixels with \textsf{culling} and \textsf{V-PP}, against 1.5 seconds to recover $90\%$ with \textsf{V-FD}.
Combining both strategies (\textsf{V-PP+FD}) leads to the best quality.
At 1 Mbps bandwidth, \textsf{V-PP} penalizes the quality, as the curve \textsf{V-PP+FD} leads to a lower quality image than \textsf{V-FD} alone.
This effect is even stronger when the bandwidth is set to 2 Mbps (Figure~\ref{fig:2MB}).
This effect is even stronger when the bandwidth is set to 2 Mbps (Figure~\ref{bi:2MB}).
Both streaming strategies based on the pre-computation of the ordering improve the image quality.
We see here that \textsf{V-FD} has a greater impact than \textsf{V-PP}: \textsf{V-PP} may prefetch content that is eventually not used, whereas \textsf{V-FD} only sends relevant 3D content (knowing which bookmark has just been clicked).


@ -2,8 +2,8 @@
\section{Video scenario}
Despite what one may think, the video streaming scenario and the 3D streaming one share many similarities: at a higher level of abstraction, they are both interfaces that allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and the differences between the video and the 3D scenarios as well as having knowledge about video streaming litterature are key to developing an efficient 3D streaming system.
Despite what one may think, the video streaming scenario and the 3D streaming one share many similarities: at a higher level of abstraction, they are both systems that allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and the differences between the video and the 3D scenarios, as well as drawing on the video streaming literature, is key to developing an efficient 3D streaming system.
\subsection{Similarities and differences between video and 3D}
@ -91,29 +91,31 @@ Finally, most of the other interfaces will give at least 5 degrees of freedom to
\subsection{DASH\@: the standard for video streaming}
\copied{}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH~\cite{stockhammer2011dynamic,Sodagar2011}, is now a widely deployed
standard for streaming adaptive video content on the Web~\cite{dashstandard}, made to be simple and scalable.
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH~\cite{dash-std,dash-std-2}, is now a widely deployed
standard for streaming adaptive video content on the Web~\cite{dash-std-full}, made to be simple and scalable.
\fresh{}
DASH is based on a clever way of structuring the content that allows great adaptability during streaming without requiring any server-side computation.
\subsubsection{DASH structure}
All those pieces are structured in a Media Presentation Description (MPD) file, written in the XML format.
This file has four layers: the periods, the adaptation sets, the representations and the segments.
Each period can have many adaptation sets, each adaptation set can have many representations, and each representation can have many segments.
\subsubsection{Periods}
\paragraph{Periods.}
Periods are used to delimit content in time. They can be used to delimit chapters, or to insert advertisements at the beginning, during, or at the end of a video.
\subsubsection{Adaptation sets}
\paragraph{Adaptation sets.}
Adaptation sets are used to delimit content depending on the format.
Each adaptation set has a mime-type, and all the representations and segments that it contains share this mime-type.
For videos, most of the time, each period has at least one adaptation set containing the images, and one adaptation set containing the sound.
\subsubsection{Representations}
\paragraph{Representations.}
The representation level is the level DASH uses to offer the same content at different resolutions.
For example, an adaptation set containing images has a representation for each available resolution (480p, 720p, 1080p, etc.).
This allows a user to choose a representation and to change it during the video; most importantly, since the software is able to estimate its downloading speed based on the time it took to download data in the past, it can pick the optimal resolution: the highest one that arrives in time to avoid stalling.
\subsubsection{Segments}
\paragraph{Segments.}
Until this level of the MPD, content can still be long.
For example, a representation of the images of a chapter of a movie can be heavy and take long to download.
However, downloading heavy files is not suitable for streaming, because it prevents dynamic adaptation: if the user requests a change of resolution, the system would either have to wait until the file is fully downloaded, or cancel the request, wasting all the progress already made.
@ -126,4 +128,40 @@ Once a video is encoded in DASH format, once the files have been structured and
All the intelligence and the decision making is moved to the client side.
A client typically starts by downloading the MPD file, and then proceeds to download segments of the different adaptation sets that it needs, estimating its own downloading speed and deciding by itself whether or not to change representation.
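A minimal sketch of such a client-side decision, assuming a throughput estimate is available (the 0.8 safety margin is an illustrative choice, not part of the standard):

```python
def choose_representation(bitrates, throughput_estimate, safety=0.8):
    """Pick the highest representation bitrate that fits within a safety
    fraction of the estimated throughput; fall back to the lowest one."""
    feasible = [b for b in bitrates if b <= safety * throughput_estimate]
    return max(feasible) if feasible else min(bitrates)
```

Called before each segment download, this keeps the resolution as high as possible while avoiding stalls when the throughput estimate drops.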
\subsection{DASH-SRD}
DASH-SRD (Spatial Relationship Description,~\cite{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions and tiling the highest resolutions; that way, a client can choose to download either the low resolution of the whole video, or higher resolutions of a subpart of the video (see Figure~\ref{sota:srd-png}).
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined to give the client information about the tile.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile in relation to the full video. An example of such a property is given in Listing~\ref{sota:srd-xml}.
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
This is especially interesting in the context of 3D streaming since we have this same pattern of a user viewing only a part of a content.
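A sketch of parsing the SRD `value` attribute, whose comma-separated fields are the source id, the tile position and size, and the total video size:

```python
def parse_srd(value):
    """Parse an SRD 'value' string into its named fields:
    source_id, object_x, object_y, object_width, object_height,
    total_width, total_height."""
    fields = [int(v) for v in value.split(',')]
    keys = ('source_id', 'object_x', 'object_y',
            'object_width', 'object_height',
            'total_width', 'total_height')
    return dict(zip(keys, fields))
```

For instance, a tile at $(1920, 1080)$ of size $1920\times1080$ inside a $5760\times3240$ video is described by the string `"0,1920,1080,1920,1080,5760,3240"`.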
\begin{figure}[th]
\centering
\includegraphics[width=\textwidth]{assets/state-of-the-art/video/srd.png}
\caption{DASH-SRD~\cite{dash-srd}\label{sota:srd-png}}
\end{figure}
\begin{figure}[th]
\lstinputlisting[%
language=XML,
caption={MPD of a video encoded using DASH-SRD},
label=sota:srd-xml,
emph={%
MPD,
Period,
AdaptationSet,
Representation,
BaseURL,
SegmentBase,
Initialization,
Role,
SupplementalProperty,
SegmentList,
SegmentURL,
Viewpoint
}
]{assets/state-of-the-art/video/srd.xml}
\end{figure}


@ -1,7 +1,7 @@
\copied{}
\section{Adding bookmarks into DASH NVE framework}\label{sec:bookmarks}
\section{Adding bookmarks into DASH NVE framework\label{sb:bookmarks}}
In this section, we explain how to include a new interaction in the system described in Section~\ref{sec:dash3d}.
In this section, we explain how to include a new interaction in the system described in the previous chapter.
\subsection{Interaction --- Visual}
@ -22,12 +22,12 @@ In order to avoid users to lose context, clicking on a bookmark triggers an auto
We implement an additional interaction that displays a preview of the bookmark's viewpoint while it is hovered by the user's mouse.
A small thumbnail of the viewport is displayed below the bookmark.
\subsection{Segments utility at bookmarked viewpoint}\label{sec:utility}
\subsection{Segments utility at bookmarked viewpoint\label{sb:utility}}
Introducing bookmarks is a way to make users' navigation more predictable.
Indeed, since they are emphasized and, in a way, recommended viewpoints, bookmarks are more likely to be visited by a significant portion of users than any other viewpoint in the scene.
As such, bookmarks can be used as a way to optimize streaming by downloading segments in an optimal, pre-computed order.
More specifically, segment utility as introduced in Section~\ref{sec:dash3d} is only an approximation of the segment's true contribution to the current viewpoint rendering.
More specifically, segment utility as introduced in Section~\ref{d3:utility} is only an approximation of the segment's true contribution to the current viewpoint rendering.
When bookmarks are defined, it is possible to obtain a perfect measure of segment utility by performing an offline rendering at each bookmark's viewpoint.
Then, by simply counting the number of pixels that are rendered using each segment, we can rank the segments by order of importance in the rendering.
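This pixel-counting step can be sketched as follows, assuming the offline rendering produced a 2D pixel-to-segment map (`None` where no segment is rendered); the representation is illustrative:

```python
from collections import Counter

def segment_utilities(seg_map):
    """From a pixel-to-segment map, return the true utility of each
    segment: the fraction of rendered pixels it covers."""
    pixels = [s for row in seg_map for s in row if s is not None]
    counts = Counter(pixels)
    return {s: c / len(pixels) for s, c in counts.items()}
```

Sorting segments by decreasing utility gives the optimal download order for the bookmarked viewpoint.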
@ -56,22 +56,22 @@ This utility definition is the same for geometry and texture segments, which all
\end{axis}
\end{tikzpicture}
\caption{Impact of using the precomputed information of bookmarks to select segments to download\label{sb:precomputation}}
\end{figure}
\begin{figure}[th]
\includegraphics[width=0.49\columnwidth]{assets/system-bookmarks/bookmark/ground-truth.png}
\includegraphics[width=0.49\columnwidth]{assets/system-bookmarks/bookmark/geometry.png}
\caption{A bookmarked viewpoint (left), and a pixel to geometry segment map (right)\label{sb:bookmarks-utility}}
\end{figure}
Figure~\ref{sb:bookmarks-utility} depicts a ``pixel to geometry segment'' map: all pixels of the same color in the right image display an element of the same geometry segment.
We render such maps offline, for each bookmark, and use them to compute the true utility $\mathcal{U}^*(s)$ of segment $s$.
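The pixel-counting step can be sketched as follows. This is a minimal Python illustration, not our actual implementation: the representation of the map as a 2D array of segment identifiers (with \texttt{None} for background pixels) and the function name are hypothetical.

```python
from collections import Counter

def true_utility(segment_map):
    """Compute the true utility U*(s) of each segment at a bookmarked
    viewpoint, as the fraction of rendered pixels covered by s.

    segment_map: 2D list of segment ids, one entry per pixel of the
    offline rendering (None for background pixels)."""
    counts = Counter(s for row in segment_map
                     for s in row if s is not None)
    total = sum(counts.values())
    # Normalizing by the total pixel count ranks segments by their
    # exact contribution to the bookmarked viewpoint rendering.
    return {seg: n / total for seg, n in counts.items()}
```

Sorting the segments by decreasing \texttt{true\_utility} value directly yields the precomputed download order used at the bookmark.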
\subsection{MPD modification}
We now present how to introduce bookmarks information in the Media Presentation Description (MPD) file, to be used in a DASH framework.
Bookmarks are fully defined by a viewport description, and the additional content needed to properly render and use a bookmark consists of three images: a thumbnail of the point of view at the bookmark, along with two ``pixel to segment'' maps (see Figure~\ref{sb:bookmarks-utility}, right image).
For this reason, we create a separate adaptation set in the MPD\@.
The bookmarked viewport information is stored as a supplemental property.
The bookmarks adaptation set only contains one representation, composed of three segments corresponding to the three images described earlier.
\lstinputlisting[%
language=XML,
caption={MPD description of a geometry adaptation set, and a texture adaptation set.},
label=sb:bookmark-as,
emph={%
MPD,
Period,
]{assets/system-bookmarks/bookmark-as.xml}
\end{figure}
An example of a bookmark adaptation set is depicted on Listing~\ref{sb:bookmark-as}.
The first three values in the supplemental property are the camera position coordinates, and the last three values are the target point coordinates.
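A client reading this property could split the value as sketched below (a Python sketch with a hypothetical helper name; the comma-separated six-value format follows the description above):

```python
def parse_bookmark_viewport(value):
    """Split a bookmark SupplementalProperty value into the camera
    position and the target point (three coordinates each)."""
    coords = [float(x) for x in value.split(",")]
    if len(coords) != 6:
        raise ValueError("expected 6 comma-separated coordinates")
    return coords[:3], coords[3:]
```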
\subsection{System-aware bookmarks}
The information we include in the MPD to optimize streaming at bookmarked viewpoints can also be used to keep users informed of the state of the system.
Indeed, displaying a thumbnail of what can be seen from a bookmark might fool users into thinking that all necessary segments visible from the bookmarked viewpoint have already been downloaded.
If this is not the case, the users' Quality of Experience would suffer.
In order to give users a sense of the amount of information readily available at a given bookmarked viewpoint, we use the pixel to segment maps described in Section~\ref{sb:utility} to create a mask of segment availability.
Since we know which segments have been downloaded at any given time, we know which pixels in the thumbnail accurately depict what the user will see when clicking on the bookmark.
We thus render the thumbnail with the mask of already downloaded segments superimposed over it.
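Computing this mask amounts to intersecting the pixel to segment map with the set of downloaded segments, as in the following sketch (Python; data structures and names are hypothetical, reusing the 2D segment map from Section~\ref{sb:utility}):

```python
def availability_mask(segment_map, downloaded):
    """Return a boolean mask over the thumbnail: True where the pixel
    is rendered from an already downloaded segment, False where the
    segment is still missing (or the pixel is background)."""
    return [[s is not None and s in downloaded for s in row]
            for row in segment_map]
```

The thumbnail is then drawn with the False pixels greyed out, so users see at a glance how much of the bookmarked viewpoint is readily available.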
\subsection{Loader modifications}
We build on the loader introduced in Algorithm~\ref{d3:next-segment} to implement a client adaptation logic.
We include a bookmark adaptation logic such that (i) when a bookmark is hovered for the first time, the corresponding images (see Listing~\ref{sb:bookmark-as}) are downloaded, and (ii) when a bookmark is clicked, we switch from utility $\mathcal{U}$ to true utility $\mathcal{U}^*$ to determine which segments to download next.
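The switch between the two utility measures can be sketched as follows (a minimal Python sketch of the greedy selection step, not our actual loader; segment identifiers and the dict-based scores are hypothetical):

```python
def next_segment(candidates, buffered, bookmark_clicked,
                 utility, true_utility):
    """Greedy choice of the next segment to request: rank remaining
    segments by the precomputed true utility U* once a bookmark has
    been clicked, and by the online estimate U otherwise.

    utility / true_utility: dicts mapping segment id -> score."""
    score = true_utility if bookmark_clicked else utility
    remaining = [s for s in candidates if s not in buffered]
    if not remaining:
        return None
    # Unranked segments default to 0, so precomputed segments win.
    return max(remaining, key=lambda s: score.get(s, 0.0))
```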
\begin{algorithm}[th]
\SetKwInOut{Input}{input}
{- Optimize a criterion $\Omega$ based on $\mathcal{U}$ values and well chosen viewpoint $v(t_i)$ to select the next segment to query }
{\begin{equation*}
s^{*}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \mathcal{FC}} \Omega_{\theta_i} \Big(\mathcal{U}(s,v(t_i))\Big) \label{sb:eq1}
\end{equation*} \\
given parameters $\theta_i$ that gather both online parameters $(i,t_i,v(t_i),\widehat{BW_i}, \widehat{\tau_i}, \mathcal{B}_i)$ and offline metadata;}
{- Update the buffer $\mathcal{B}_{i+1}$ for the next decision: $s^{*}_i$ and lowest \texttt{representations} of $s^{*}_i$ are considered downloaded\;}
{- \Return{segment $s^{*}_i$, buffer $\mathcal{B}_{i+1}$}\;}
{\caption{Algorithm to identify the next segment to query\label{sb:next-segment}}}
\end{algorithm}
\todo[inline]{to be modified to include bookmarks}