\copied{}
\section{Client}\label{sec:dashclientspec}
In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
The generated MPD file describes the content organization so that the client gets all the necessary information to make educated decisions and query the 3D content it needs according to the available resources and current viewpoint.
A camera path generated by a particular user is a set of viewpoints $v(t_i)$ indexed by a continuous time interval $t_i \in [t_1,t_{end}]$.
The DASH client first downloads the MPD file to get the material (.mtl) file containing information about all the geometry and textures available for the entire 3D model.
At time instance $t_i$, the DASH client decides which segments of geometry and texture to download in order to generate the viewpoint $v(t_{i+1})$ at time instance $t_{i+1}$.
Starting from $t_1$, the camera continuously follows a camera path $C=\{v(t_i), t_i \in [t_1,t_{end}]\}$, along which downloading opportunities are strategically exploited to sequentially query the most useful segments.
\subsection{Segment Utility}\label{subsec:utility}
Unlike video streaming, where the bitrate of each segment correlates with the quality of the video received, for 3D content, the size (in bytes) of the content does not necessarily correlate well to its contribution to visual quality.
A large polygon with huge visual impact takes the same number of bytes as a tiny polygon.
Further, the visual impact is \textit{view dependent} --- a large object that is far away or out of view does not contribute to the visual quality as much as a smaller object that is closer to the user.
As such, it is important for a DASH-based NVE client to estimate the usefulness of a given segment to download, so that it can make good decisions about what to download.
We call this usefulness the \textit{utility} of the segment.
The utility is a function of a segment, either geometry or texture, and the current viewpoint (camera location, view angle, and look-at point), and is therefore dynamically computed online by the client from parameters in the MPD file.
\subsubsection{Offline parameters}
Let us first detail all the parameters available from the offline/static preparation of the 3D NVE\@.
These parameters are stored in the MPD file.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}_{3D}(s^G)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
Note that the texture segments have similar information, but computed at \textit{navigation time} $t_i$.
The second piece of information stored in the MPD, for both geometry and texture segments, is the size of the segment (in kB).
Geometry segments contain approximately the same number of faces, so their sizes are almost uniform.
Texture segments are usually much smaller than geometry segments, but their sizes vary considerably, since the number of pixels is divided by 4 between two successive resolutions.
Finally, for each texture segment $s^{T}$, the MPD stores the resolution and the \textit{MSE} (mean squared error) of the image relative to the highest resolution (by default, triangles are filled with the average color of their texture).
Offline parameters are stored in the MPD as shown in Listing~\ref{listing:MPD}.
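As an illustration, here is a minimal Python sketch of how $\mathcal{A}_{3D}(s^G)$ can be computed as the segments are created; the function names and the vertex/face layout are assumptions made for the example only.
\begin{lstlisting}[language=Python]
import numpy as np

def triangle_area_3d(a, b, c):
    # Area of a 3D triangle: half the norm of the cross product of two edges.
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a))

def segment_area_3d(vertices, faces):
    # A_3D(s^G): sum of the 3D areas of all triangles in a geometry segment.
    # vertices: (n, 3) float array; faces: iterable of (i, j, k) index triples.
    return sum(triangle_area_3d(vertices[i], vertices[j], vertices[k])
               for (i, j, k) in faces)
\end{lstlisting}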
\subsubsection{Online parameters}
In addition to the offline parameters stored in the MPD file for each segment, view-dependent parameters are computed at navigation time.
First, a measure of 3D area is computed for texture segments.
Since a texture is mapped onto a set of triangles, we account for the 3D area of all these triangles.
We could consider such a measure offline (attached to the adaptation set containing the texture), but we prefer to account only for the triangles that have already been downloaded by the client.
We denote by $\Delta(T)$ the set of triangles colored by a texture $T$; it depends only on $T$ and is the same for every representation/segment $s^T$ in this texture adaptation set, so we write $\Delta(s^T)=\Delta(T)$.
At each time $t_i$, a subset of $\Delta(T)$ has been downloaded; we denote it $\Delta(T, t_i)$.
Moreover, each geometry segment belongs to a geometry adaptation set $AS^G$ whose bounding box coordinates are stored in the MPD\@.
Given the coordinates of the bounding box $\mathcal{BB}(AS^G)$ and the viewpoint $v(t_i)$ at time $t_i$, the client computes the distance $\mathcal{D}(v(t_i),AS^G)$ as the distance from the center of $\mathcal{BB}(AS^G)$ to the principal point of the camera given in $v(t_i)$.
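A minimal sketch of this distance computation, assuming the bounding box is given by its two opposite corners as stored in the MPD (names are illustrative):
\begin{lstlisting}[language=Python]
import numpy as np

def distance_to_adaptation_set(camera_position, bb_min, bb_max):
    # D(v(t_i), AS^G): distance from the camera's principal point to the
    # center of the bounding box BB(AS^G) of the adaptation set.
    center = 0.5 * (np.asarray(bb_min) + np.asarray(bb_max))
    return float(np.linalg.norm(center - np.asarray(camera_position)))
\end{lstlisting}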
\subsubsection{Utility for geometry segments}
We now have all parameters to derive a utility measure of a geometry segment.
Utility for texture segments follows from the geometric utility.
The utility of a geometric segment $s^G$ for a viewpoint $v(t_i)$ is:
\begin{equation*}
\mathcal{U} \Big(s^G,v(t_i) \Big) = \frac{\mathcal{A}_{3D}(s^G)}{\mathcal{D}{\left(v{(t_i)},AS^G\right)}^2}
\end{equation*}
where $AS^G$ is the adaptation set containing $s^G$.
Basically, the utility of a segment is proportional to the area that its faces cover, and inversely proportional to the square of the distance between the camera and the center of the bounding box of the adaptation set containing the segment.
That way, we favor segments with large faces that are close to the camera.
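The formula translates directly into code; a sketch reusing the helper above, assuming each segment object carries its $\mathcal{A}_{3D}$ value and the bounding box corners of its adaptation set (both read from the MPD):
\begin{lstlisting}[language=Python]
def utility_geometry(segment, camera_position):
    # U(s^G, v(t_i)) = A_3D(s^G) / D(v(t_i), AS^G)^2
    d = distance_to_adaptation_set(camera_position,
                                   segment.bb_min, segment.bb_max)
    return segment.area_3d / (d * d)
\end{lstlisting}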
\subsubsection{Utility for texture segments}
For a texture $T$ stored in a segment $s^T$, the triangles in $\Delta(T)$ are stored in arbitrary geometry segments, that is, they do not have spatial coherence.
Thus, for each downloaded geometry segment $s_k^G$, where $k$ ranges over the set $K$ of geometry segments downloaded at time $t_i$, we collect the triangles of $\Delta(T, t_i)$ lying in $s^G_k$ and compute the fraction of $\mathcal{A}_{3D}(s_k^G)$ that they cover.
So, we define the utility:
\begin{equation*}
\mathcal{U}\Big( s^T,v(t_i) \Big)
= psnr(s^T) \sum_{k\in K}\frac{\mathcal{A}_{3D}( s_k^G\cap \Delta(T,t_i))}{\mathcal{A}_{3D}(s_k^G)} \mathcal{U}\Big( s_k^G,v(t_i) \Big)
\end{equation*}
where we sum over all geometry segments received before time $t_i$ that intersect $\Delta(T,t_i)$ and whose adaptation set is in the frustum.
This formula defines the utility of a texture segment by computing the linear combination of the utility of the geometry segments that use this texture, weighted by the proportion of area covered by the texture in the segment.
We compute the PSNR from the MSE stored in the MPD and denote it $psnr(s^T)$.
This accounts for the fact that a texture at a higher resolution has a higher utility than the same texture at a lower resolution.
The equivalent term for geometry segments is 1 (and therefore does not appear).
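A sketch of this computation, where \texttt{covered\_area\_3d} is a hypothetical helper returning $\mathcal{A}_{3D}(s_k^G\cap \Delta(T,t_i))$ and the segment attributes are assumptions for the example:
\begin{lstlisting}[language=Python]
import math

def psnr_from_mse(mse, peak=255.0):
    # PSNR derived from the MSE stored in the MPD.
    return 10.0 * math.log10(peak * peak / mse)

def utility_texture(tex_segment, camera_position, downloaded_geometry, t_i):
    # U(s^T, v(t_i)): PSNR-weighted sum of the utilities of the downloaded
    # geometry segments (in the frustum), each weighted by the fraction of
    # its 3D area covered by triangles of Delta(T, t_i).
    total = 0.0
    for geo in downloaded_geometry:
        covered = covered_area_3d(geo, tex_segment, t_i)
        total += (covered / geo.area_3d) * utility_geometry(geo, camera_position)
    return psnr_from_mse(tex_segment.mse) * total
\end{lstlisting}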
Having defined a utility on both geometry and texture segments, the client uses it next for its streaming strategy.
\subsection{DASH Adaptation Logic}\label{subsec:dashadaptation}
Along the camera path $C=\{v(t_i)\}$, viewpoints are indexed by a continuous time interval $t_i \in [t_1,t_{end}]$.
In contrast, the DASH adaptation logic proceeds sequentially along a discrete timeline.
The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$ to download, and is followed by subsequent decisions at $t_2, t_3, \dots$.
While selecting $s_i^*$, the $i$-th best segment to request, the adaptation logic trades off between geometry, texture, and the available \texttt{representations}, given the current bandwidth, camera dynamics, and the previously described utility scores.
The difference between $t_{i+1}$ and $t_{i}$ is the delivery delay of $s_i^*$; it varies with the segment size and network conditions.
Algorithm~\ref{algorithm:nextsegment} details how our DASH client makes decisions.
\begin{algorithm}[th]
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
\Input{Current index $i$, time $t_i$, viewpoint $v(t_i)$, buffer of already downloaded \texttt{segments} $\mathcal{B}_i$, MPD}
\Output{Next segment $s^{*}_i$ to request, updated buffer $\mathcal{B}_{i+1}$}
\SetAlgoLined{}
{- Estimate the bandwidth $\widehat{BW_i}$ and RTT $\widehat{\tau_i}$ \;}
{- Among all \texttt{segments} not yet downloaded, $s \in \mathcal{S} \backslash \mathcal{B}_i$, keep the ones inside the upcoming viewing frustums $\mathcal{FC}=\mathbb{FC}(\hat{v}(t)), t\in [t_i, t_i+\chi]$, thanks to a viewpoint predictor $t \rightarrow \hat{v}(t)$, a temporal horizon $\chi$, and a frustum culling operator $\mathbb{FC}$\;}
{- Optimize a criterion $\Omega$ based on $\mathcal{U}$ values and a well-chosen viewpoint $v(t_i)$ to select the next segment to query}
{\begin{equation*} s^{*}_i= \argmax{s \in (\mathcal{S} \backslash \mathcal{B}_i) \cap \mathcal{FC}} \Omega_{\theta_i} \Big(\mathcal{U}\left(s,v(t_i)\right)\Big) \label{eq1}\end{equation*} \\
given parameters $\theta_i$ that gather both online parameters $(i,t_i,v(t_i),\widehat{BW_i}, \widehat{\tau_i}, \mathcal{B}_i)$ and offline metadata\;}
{- Update the buffer $\mathcal{B}_{i+1}$ for the next decision: $s^{*}_i$ and the lowest \texttt{representations} of $s^{*}_i$ are considered downloaded\;}
{- \Return{segment $s^{*}_i$, buffer $\mathcal{B}_{i+1}$}\;}
{\caption{Algorithm to identify the next segment to query\label{algorithm:nextsegment}}}
\end{algorithm}
The most naive way to sequentially optimize $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
In that case, the best segment $s$ to request would simply be the one maximizing $\mathcal{U}(s, v(t_i))$, improving the rendering from the current viewpoint $v(t_i)$.
Due to transmission delay, however, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, which depends on the segment size and network conditions: \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{eq2}\end{equation*}
In consequence, the most useful segment from $v(t_i)$ at decision time $t_i$ might be less useful at delivery time from $v(t_{i+1})$.
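This delivery time estimate is straightforward to compute from the MPD metadata and the network estimates; a one-line sketch (assuming \texttt{bw} in bytes per second and \texttt{rtt} in seconds):
\begin{lstlisting}[language=Python]
def estimated_delivery_time(segment, t_i, bw, rtt):
    # t_{i+1}(s) = t_i + size(s) / BW_hat + tau_hat
    return t_i + segment.size_bytes / bw + rtt
\end{lstlisting}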
A better solution is to download a segment that is expected to be the most useful in the future.
With a temporal horizon $\chi$, we can optimize the cumulated $\mathcal{U}$ over $[t_{i+1}(s), t_i+\chi]$:
\begin{equation}
s^*_i= \argmax{s \in (\mathcal{S} \backslash \mathcal{B}_i) \cap \mathcal{FC} } \int_{t_{i+1}(s)}^{t_i+\chi} \mathcal{U}(s,\hat{v}(t))\, dt
\label{eq:smart}
\end{equation}
In our experiments, we typically use $\chi=2$~s and estimate the integral in (\ref{eq:smart}) by a Riemann sum, dividing the interval $[t_{i+1}(s), t_i+\chi]$ into 4 subintervals of equal size.
At each subinterval extremity, a first-order predictor $\hat{v}(t)$ linearly estimates the viewpoint based on $v(t_i)$ and a speed estimate (discrete derivative at $t_i$).
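The following sketch shows one way to implement this estimate, sampling the predicted viewpoint at the subinterval extremities and combining the samples with trapezoidal weights; \texttt{utility} is assumed to dispatch to the geometry or texture utility and \texttt{predict\_viewpoint} to be the first-order predictor:
\begin{lstlisting}[language=Python]
def cumulated_utility(segment, t_i, t_delivery, chi, predict_viewpoint, n_sub=4):
    # Riemann-sum estimate of the integral of U(s, v_hat(t))
    # over [t_delivery, t_i + chi], with n_sub equal subintervals.
    dt = (t_i + chi - t_delivery) / n_sub
    samples = [utility(segment, predict_viewpoint(t_delivery + k * dt))
               for k in range(n_sub + 1)]
    return dt * (0.5 * samples[0] + sum(samples[1:-1]) + 0.5 * samples[-1])
\end{lstlisting}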
We also tested an alternative greedy heuristic that selects the segment maximizing the ratio of predicted utility at delivery time to download duration (between $t_i$ and $t_{i+1}$):
\begin{equation}
s^{\texttt{GREEDY}}_i= \argmax{s \in (\mathcal{S} \backslash \mathcal{B}_i) \cap \mathcal{FC}} \frac{\mathcal{U}\Big(s,\hat{v}(t_{i+1}(s))\Big)}{t_{i+1}(s) - t_i}
\label{eq:greedy}
\end{equation}
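A sketch of the greedy selection, where \texttt{candidates} contains the segments that are neither downloaded nor culled (i.e., $s \in (\mathcal{S} \backslash \mathcal{B}_i) \cap \mathcal{FC}$), reusing the helpers assumed above:
\begin{lstlisting}[language=Python]
def next_segment_greedy(candidates, t_i, bw, rtt, predict_viewpoint):
    # Maximize predicted utility at delivery time divided by the
    # download duration t_{i+1}(s) - t_i.
    def score(s):
        t_delivery = estimated_delivery_time(s, t_i, bw, rtt)
        return utility(s, predict_viewpoint(t_delivery)) / (t_delivery - t_i)
    return max(candidates, key=score)
\end{lstlisting}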