In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
The generated MPD file describes the content organization so that the client gets all the necessary information to make educated decisions and query the 3D content it needs according to the available resources and current viewpoint.
A camera path generated by a particular user is a set of viewpoints $v(t_i)$ indexed by time $t_i$ in the continuous interval $[t_1,t_{end}]$.
The DASH client first downloads the MPD file, through which it retrieves the material (.mtl) file containing information about all the geometry and textures available for the entire 3D model.
At time instance $t_i$, the DASH client decides to download the appropriate segments, containing the geometry and the texture needed to render the viewpoint $v(t_{i+1})$ at time instance $t_{i+1}$.
Starting from $t_1$, the camera continuously follows a camera path $C=\{v(t_i), t_i \in[t_1,t_{end}]\}$, along which downloading opportunities are strategically exploited to sequentially query the most useful segments.
Unlike in video streaming, where the bitrate of each segment correlates with the quality of the received video, the size (in bytes) of 3D content does not necessarily correlate well with its contribution to visual quality.
A large polygon with huge visual impact takes the same number of bytes as a tiny polygon.
Further, the visual impact is \textit{view dependent} --- a large object that is far away or out of view does not contribute to the visual quality as much as a smaller object that is closer to the user.
As such, it is important for a DASH-based NVE client to estimate the usefulness of a given segment to download, so that it can make good decisions about what to download.
We call this usefulness the \textit{utility} of the segment.
The utility is a function of a segment, either geometry or texture, and the current viewpoint (camera location, view angle, and look-at point), and is therefore dynamically computed online by the client from parameters in the MPD file.
\subsubsection{Offline parameters}
Let us first detail all the parameters available from the offline/static preparation of the 3D NVE\@.
These parameters are stored in the MPD file.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}_{3D}(s^G)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
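As an illustration, $\mathcal{A}_{3D}(s^G)$ can be obtained by summing, over the triangles of the segment, half the norm of the cross product of two of their edges. The Python sketch below assumes that a segment is given as a vertex list plus a list of index triples. The names \texttt{triangle\_area} and \texttt{segment\_area\_3d} are hypothetical and do not come from our implementation.

```python
import math

Vec3 = tuple[float, float, float]


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def segment_area_3d(vertices: list[Vec3], faces: list[tuple[int, int, int]]) -> float:
    """Hypothetical helper computing A_3D(s^G), the summed area of the segment's triangles."""
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


# Usage: a unit square split into two triangles has a total area of 1.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(segment_area_3d(square, [(0, 1, 2), (0, 2, 3)]))  # 1.0
```

Since this value only depends on the segment content, it can be computed once when the segments are created and written to the MPD.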
The second piece of information stored in the MPD for every segment, geometry and texture alike, is its size (in kB).
Since geometry segments contain roughly the same number of faces, their sizes are almost uniform.
Texture segments are usually much smaller than geometry segments, but their sizes vary a lot, since the number of pixels is divided by 4 between two successive resolutions.
Finally, for each texture segment $s^{T}$, the MPD stores the \textit{MSE} (mean square error) of the image at this resolution, relative to the highest resolution (by default, triangles are filled with the average color of their texture).
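One possible way to compute these MSE values offline is sketched below in Python. It rests on several assumptions made for illustration only:

\begin{itemize}
\item images are grayscale and stored as lists of rows, with power-of-two dimensions;
\item a lower resolution is obtained by averaging $2\times2$ blocks;
\item the lower resolution is upsampled back to full size by nearest neighbour before comparison.
\end{itemize}

The helper names are also hypothetical. Downsampling a $2^n\times2^n$ texture $n$ times leaves a single pixel, which corresponds to filling triangles with the texture's average color.

```python
Image = list[list[float]]  # grayscale image, one list per row


def downsample2(img: Image) -> Image:
    """Halve both dimensions by averaging 2x2 blocks (dimensions assumed even)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def upsample_nearest(img: Image, factor: int) -> Image:
    """Enlarge by an integer factor by repeating each pixel (nearest neighbour)."""
    return [
        [img[y // factor][x // factor] for x in range(len(img[0]) * factor)]
        for y in range(len(img) * factor)
    ]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of identical size."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n


def mse_at_level(full: Image, level: int) -> float:
    """MSE of the texture downsampled `level` times, relative to the full resolution."""
    low = full
    for _ in range(level):
        low = downsample2(low)
    return mse(full, upsample_nearest(low, 2 ** level))
```

For instance, the $2\times2$ texture \texttt{[[0, 2], [2, 0]]} has an MSE of 0 at level 0 and of 1 at level 1, where it collapses to its average value.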
In addition to the offline parameters stored in the MPD file for each segment, view-dependent parameters are computed at navigation time.
First, a measure of 3D area is computed for texture segments.
As a texture maps on a set of triangles, we account for the area in 3D of all these triangles.
We could use such an offline measure (attached to the adaptation set containing the texture), but we prefer to only account for the triangles that have already been downloaded by the client.
We denote by $\Delta(s^T)=\Delta(T)$ the set of triangles colored by a texture $T$; it depends only on $T$ and is the same for any representation/segment $s^T$ in this texture adaptation set.
At each time $t_i$, a subset of $\Delta(T)$ has been downloaded; we denote it $\Delta(T, t_i)$.
Moreover, each geometry segment belongs to a geometry adaptation set $AS^G$ whose bounding box coordinates are stored in the MPD\@.
Given the coordinates of the bounding box $\mathcal{BB}(AS^G)$ and the viewpoint $v(t_i)$ at time $t_i$, the client computes the distance $\mathcal{D}(v(t_i),AS^G)$ of the bounding box $\mathcal{BB}(AS^G)$ as the distance from the center of $\mathcal{BB}(AS^G)$ to the principal point of the camera, given in $v(t_i)$.
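The distance computation takes only a few lines. In this Python sketch, the bounding box is assumed to be stored as its two extreme corners. The camera position given as a plain point stands for the point taken from $v(t_i)$. The \texttt{BoundingBox} class and the \texttt{bbox\_distance} function are hypothetical names.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned bounding box of an adaptation set, as read from the MPD."""
    min_corner: Point
    max_corner: Point

    def center(self) -> Point:
        return tuple((lo + hi) / 2 for lo, hi in zip(self.min_corner, self.max_corner))


def bbox_distance(camera_position: Point, bbox: BoundingBox) -> float:
    """D(v(t_i), AS^G): Euclidean distance from the box center to the camera point."""
    return math.dist(camera_position, bbox.center())
```

For example, a box spanning $(0,0,0)$ to $(2,2,2)$ seen from $(1,1,4)$ is at distance 3.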
\subsubsection{Utility for geometry segments}
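We now have all parameters to derive a utility measure of a geometry segment $s^G$:
\begin{equation*}
  \mathcal{U}\big(s^G, v(t_i)\big) = \frac{\mathcal{A}_{3D}(s^G)}{\mathcal{D}\big(v(t_i), AS^G\big)^2},
\end{equation*}
where $AS^G$ is the adaptation set containing $s^G$.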
In other words, the utility of a segment is proportional to the area covered by its faces, and inversely proportional to the square of the distance between the camera and the center of the bounding box of the adaptation set containing the segment.
That way, we favor segments with big faces that are close to the camera.
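Computed online, the geometry utility is a simple ratio. The Python sketch below follows the description above: utility proportional to the 3D area, inversely proportional to the squared distance. The small lower bound on the distance is our own assumption. It avoids a division by zero when the camera sits exactly at the center of a bounding box. The function name is hypothetical.

```python
def geometry_utility(area_3d: float, distance: float, min_distance: float = 1e-6) -> float:
    """U(s^G, v(t_i)) = A_3D(s^G) / D(v(t_i), AS^G)^2.

    `min_distance` is an illustrative safeguard, not part of the original definition.
    """
    d = max(distance, min_distance)
    return area_3d / (d * d)


# A segment twice as far away needs four times the area to be as useful.
assert geometry_utility(4.0, 2.0) == geometry_utility(1.0, 1.0)
```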
\subsubsection{Utility for texture segments}
For a texture $T$ stored in a segment $s^T$, the triangles in $\Delta(T)$ are stored in arbitrary geometry segments, that is, they do not have spatial coherence.
Thus, for each of the $K$ geometry segments $s_k^G$ downloaded by time $t_i$, we collect the triangles of $\Delta(T, t_i)$ in $s^G_k$ and compute the ratio of $\mathcal{A}_{3D}(s_k^G)$ covered by these triangles:
\begin{equation*}
  \mathcal{U}\big(s^T, v(t_i)\big) = \sum_{k} \mathcal{U}\big(s_k^G, v(t_i)\big)\, \frac{\mathcal{A}_{3D}\big(s_k^G \cap \Delta(T, t_i)\big)}{\mathcal{A}_{3D}(s_k^G)},
\end{equation*}
where we sum over all geometry segments received before time $t_i$ that intersect $\Delta(T,t_i)$ and whose adaptation set is in the frustum.
This formula defines the utility of a texture segment by computing the linear combination of the utility of the geometry segments that use this texture, weighted by the proportion of area covered by the texture in the segment.
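In code, this is a weighted sum over the downloaded geometry segments. The Python sketch below assumes the client already knows three values for each downloaded geometry segment whose adaptation set is in the frustum: its utility, its 3D area, and the 3D area of its triangles that belong to $\Delta(T,t_i)$. The \texttt{GeometryContribution} record is a hypothetical structure. The sketch only implements the area-weighted combination described above, and leaves out the PSNR term discussed next.

```python
from dataclasses import dataclass


@dataclass
class GeometryContribution:
    """Data for one downloaded geometry segment s_k^G (hypothetical record)."""
    utility: float        # U(s_k^G, v(t_i))
    area: float           # A_3D(s_k^G)
    textured_area: float  # 3D area of the triangles of s_k^G that belong to Delta(T, t_i)


def texture_utility(contributions: list[GeometryContribution]) -> float:
    """Sum of geometry utilities, weighted by the fraction of each segment covered by T."""
    return sum(
        c.utility * c.textured_area / c.area
        for c in contributions
        if c.textured_area > 0 and c.area > 0  # skip segments that do not use T
    )
```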
We compute the PSNR by using the MSE in the MPD and denote it $psnr(s^T)$.
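For reference, the PSNR follows from the MSE through the usual definition $10\log_{10}(\mathit{MAX}^2/\mathit{MSE})$. The sketch below assumes 8-bit channels ($\mathit{MAX}=255$). Returning infinity for a zero MSE, which happens when the highest resolution is compared with itself, is a convention chosen for illustration.

```python
import math


def psnr(mse: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, computed from a mean squared error."""
    if mse == 0:
        return math.inf  # identical images, e.g. the highest resolution itself
    return 10 * math.log10(peak * peak / mse)
```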
Along the camera path $C=\{v(t_i)\}$, viewpoints are indexed by continuous time $t_i \in[t_1,t_{end}]$.
In contrast, the DASH adaptation logic proceeds sequentially along a discrete timeline.
The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$ to download, and is followed by subsequent decisions at $t_2, t_3, \dots$.
While selecting $s_i^*$, the $i$-th best segment to request, the adaptation logic balances geometry, texture, and the available \texttt{representations}, given the current bandwidth, the camera dynamics, and the previously described utility scores.
The difference between $t_{i+1}$ and $t_{i}$ is the delivery delay of $s_i^*$, which varies with the segment size and network conditions.
\begin{algorithm}[ht]
\SetAlgoLined{}
\Input{Current index $i$, time $t_i$, viewpoint $v(t_i)$, buffer of already downloaded \texttt{segments} $\mathcal{B}_i$, MPD}
\Output{Next segment $s^{*}_i$ to request, updated buffer $\mathcal{B}_{i+1}$}
{- Estimate the bandwidth $\widehat{BW_i}$ and RTT $\widehat{\tau_i}$\;}
{- Among all \texttt{segments} that are not already downloaded $s \in\mathcal{S}\backslash\mathcal{B}_i$, keep the ones inside the upcoming viewing frustums $\mathcal{FC}=\bigcup_{t\in[t_i, t_i+\chi]}\mathbb{FC}(\widehat{v}(t))$, thanks to a viewpoint predictor $t \mapsto \widehat{v}(t)$, a temporal horizon $\chi$ and a frustum culling operator $\mathbb{FC}$\;}
{- Optimize a criterion $\Omega$ based on the $\mathcal{U}$ values at well-chosen viewpoints to select the next segment to query, given parameters $\theta_i$ that gather both online parameters $(i,t_i,v(t_i),\widehat{BW_i}, \widehat{\tau_i}, \mathcal{B}_i)$ and offline metadata\;}
{- Update the buffer $\mathcal{B}_{i+1}$ for the next decision: $s^{*}_i$ and the lower \texttt{representations} of $s^{*}_i$ are considered downloaded\;}
\end{algorithm}
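A single iteration of this adaptation logic can be sketched in Python as follows. Three pieces are passed in as callables, since their definitions depend on the policy:

\begin{itemize}
\item the frustum test;
\item the criterion $\Omega$;
\item the function listing lower representations.
\end{itemize}

Bandwidth and RTT estimation are assumed to happen inside the criterion. All names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional

Segment = Hashable


def decide(
    segments: Iterable[Segment],
    buffer: frozenset,
    in_upcoming_frustums: Callable[[Segment], bool],
    criterion: Callable[[Segment], float],
    lower_representations: Callable[[Segment], Iterable[Segment]],
) -> tuple[Optional[Segment], frozenset]:
    """Return the next segment s*_i to request and the updated buffer B_{i+1}."""
    # Keep segments that are not downloaded yet and fall in the predicted frustums.
    candidates = [s for s in segments if s not in buffer and in_upcoming_frustums(s)]
    if not candidates:
        return None, buffer
    best = max(candidates, key=criterion)  # optimize the criterion Omega
    # s*_i and its lower representations are considered downloaded from now on.
    return best, buffer | {best} | frozenset(lower_representations(best))
```

Calling \texttt{decide} repeatedly, feeding back the returned buffer, reproduces the sequence of decisions at $t_1, t_2, \dots$.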
The most naive way to sequentially optimize the utility $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
In that case, the best segment $s$ to request would be the one maximizing $\mathcal{U}(s, v(t_i))$, simply to improve the rendering from the current viewpoint $v(t_i)$.
However, due to transmission delay, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, which depends on the segment size and network conditions:
\begin{equation}
  t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{d3:eq2}
\end{equation}
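The naive policy and the delivery-time estimate can be written as below. This Python sketch assumes sizes in bytes and bandwidth in bytes per second, and uses a hypothetical \texttt{Candidate} record. It illustrates that the segment chosen for $v(t_i)$ only arrives at $t_{i+1}(s)$, when the viewpoint may already have changed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    size: float     # bytes
    utility: float  # U(s, v(t_i)) for the current viewpoint


def delivery_time(t_i: float, size: float, bandwidth: float, rtt: float) -> float:
    """t_{i+1}(s) = t_i + size(s) / BW_i + tau_i, with estimated bandwidth and RTT."""
    return t_i + size / bandwidth + rtt


def naive_choice(candidates: list[Candidate]) -> Candidate:
    """Naive policy: maximize the utility for the current viewpoint only."""
    return max(candidates, key=lambda c: c.utility)
```

With a 1\,MB segment, an estimated 2\,MB/s bandwidth and a 50\,ms RTT, a request issued at $t_i = 1$\,s is delivered around $t_{i+1} = 1.55$\,s.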
In our experiments, we typically use $\chi=2$\,s and estimate the integral in~(\ref{d3:smart}) by a Riemann sum, dividing the interval $[t_{i+1}(s), t_i+\chi]$ into four subintervals of equal size.
At each subinterval endpoint $t$, a first-order predictor $\hat{v}(t)$ linearly extrapolates the viewpoint from $v(t_i)$ and an estimate of the camera speed (discrete derivative at $t_i$).
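This estimate can be sketched in Python as follows, under three simplifying assumptions:

\begin{itemize}
\item the viewpoint is reduced to a camera position;
\item the predicted utility is evaluated at the right endpoint of each subinterval (which endpoint the original uses is an assumption here);
\item the speed comes from the last two sampled positions.
\end{itemize}

\texttt{predict\_position} and \texttt{integrated\_utility} are hypothetical names.

```python
from typing import Callable

Point = tuple[float, float, float]


def predict_position(p_now: Point, p_prev: Point, dt: float, t_now: float, t: float) -> Point:
    """First-order predictor: extrapolate linearly using the discrete-derivative speed."""
    speed = tuple((a - b) / dt for a, b in zip(p_now, p_prev))
    return tuple(a + (t - t_now) * s for a, s in zip(p_now, speed))


def integrated_utility(
    utility: Callable[[Point], float],
    predictor: Callable[[float], Point],
    t_start: float,
    t_end: float,
    n: int = 4,
) -> float:
    """Riemann sum of utility(predicted viewpoint) over [t_start, t_end] with n subintervals."""
    if t_end <= t_start:
        return 0.0  # the segment would arrive after the temporal horizon
    h = (t_end - t_start) / n
    return h * sum(utility(predictor(t_start + k * h)) for k in range(1, n + 1))
```

In the setting above, \texttt{t\_start} would be $t_{i+1}(s)$ and \texttt{t\_end} would be $t_i+\chi$. \texttt{predictor} would wrap \texttt{predict\_position} with the current and previous camera positions.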
We also tested an alternative greedy heuristic selecting the segment that optimizes a utility variation during downloading (between $t_i$ and $t_{i+1}$):
In order to be able to evaluate our system, we need to collect traces and perform analyses on them.
Since our scene is large, and since our system allows navigating a scene while it is being streamed, we developed a web client that implements our utility metrics and policies.
\subsubsection{Media engine}
Since the performance of our system matters in this work, we cannot use the regular geometries described in Section~\ref{f:geometries}.
However, in our system, the 3D content always changes in the same way: we only add faces and textures to the model.
Therefore, we wrote a class that derives from \texttt{BufferGeometry} and is tailored to this use case:
\begin{itemize}
\item It has a constructor that takes the number of faces as a parameter: it allocates all the memory needed for the buffers up front, so that they never have to be reallocated, which would be inefficient.
\item It keeps track of the number of faces it is currently holding: it can then avoid rendering faces that have not been filled and knows where to put new faces.
\item It provides a method that adds a face to the geometry.
\item It also keeps track of which part of the buffers has been transmitted to the GPU\@: THREE.js lets us set the range of the buffer to update, so we only update what is necessary.
\end{itemize}
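The bookkeeping performed by this class does not depend on THREE.js and can be summarized by the Python sketch below, which only stores vertex positions. In our JavaScript class, these buffers are those of the \texttt{BufferGeometry}. The pending range corresponds to the range we ask THREE.js to update. The class and method names here are hypothetical.

```python
from array import array

FLOATS_PER_FACE = 9  # 3 vertices x (x, y, z)


class GrowingGeometry:
    """Preallocated, append-only triangle buffer (illustrative sketch)."""

    def __init__(self, max_faces: int):
        # Allocate everything once so that adding faces never reallocates.
        self.capacity = max_faces
        self.positions = array("f", [0.0]) * (FLOATS_PER_FACE * max_faces)
        self.face_count = 0      # faces currently stored; only these are drawn
        self.uploaded_faces = 0  # faces already transmitted to the GPU

    def add_face(self, a, b, c) -> None:
        """Append a triangle given as three (x, y, z) vertices."""
        if self.face_count >= self.capacity:
            raise OverflowError("geometry buffer is full")
        start = FLOATS_PER_FACE * self.face_count
        self.positions[start:start + FLOATS_PER_FACE] = array("f", [*a, *b, *c])
        self.face_count += 1

    def pending_range(self) -> tuple[int, int]:
        """(offset, count), in floats, of the data not yet sent to the GPU."""
        offset = FLOATS_PER_FACE * self.uploaded_faces
        return offset, FLOATS_PER_FACE * self.face_count - offset

    def mark_uploaded(self) -> None:
        self.uploaded_faces = self.face_count
```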
\paragraph{Our 3D model class.\label{d3:model-class}}
As said in the previous subsections, a geometry and a material are bound together in a mesh.
This means that we are forced to have as many meshes as there are materials in our model.
To make this easy to manage, we made a \textbf{Model} class, that holds everything we need.
We can add vertices, faces, and materials to this model, and it will internally deal with the right geometries, materials and meshes.
To avoid having many meshes with the same material, which would harm performance, it automatically merges faces that share the same material into the same buffer geometry, as shown in Figure~\ref{d3:render-structure}.
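The grouping performed by the \textbf{Model} class can be illustrated with the following Python sketch. Each material maps to a single list of faces, standing for one buffer geometry and thus one mesh. The method names are hypothetical, and the real class manages THREE.js objects instead of plain lists.

```python
from collections import defaultdict


class Model:
    """Groups faces by material so that each material yields exactly one mesh."""

    def __init__(self):
        self.vertices = []
        self.materials = {}
        # One face list per material, standing for one buffer geometry / mesh.
        self.faces_by_material = defaultdict(list)

    def add_vertex(self, x: float, y: float, z: float) -> int:
        self.vertices.append((x, y, z))
        return len(self.vertices) - 1

    def add_material(self, name: str, properties: dict) -> None:
        self.materials[name] = properties

    def add_face(self, material: str, i: int, j: int, k: int) -> None:
        # Faces sharing a material are merged into the same geometry.
        self.faces_by_material[material].append((i, j, k))

    def mesh_count(self) -> int:
        return len(self.faces_by_material)
```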
\begin{figure}[ht]
\centering
\caption{Reordering of the content on the renderer\label{d3:render-structure}}
\end{figure}
\subsubsection{Access client}
In order to implement our DASH-3D client, we need to implement the access client, which is responsible for deciding what to download and for downloading it.
To do so, we use the strategy pattern, as shown in Figure~\ref{d3:dash-loader}.
We have a base class named \texttt{LoadingPolicy} that contains attributes and functions keeping track of what has been downloaded, which derived classes can use to make smart decisions; it exposes a function named \texttt{nextSegment} that takes two arguments:
\begin{itemize}
\item the MPD, so that a strategy can know all the metadata of the segments before making its decision;
\item the camera, because the best segment depends on the position of the camera.
\end{itemize}
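The strategy pattern can be sketched in Python as follows. Several elements are hypothetical simplifications:

\begin{itemize}
\item the dictionary-based MPD;
\item the \texttt{utility} callable;
\item \texttt{mark\_downloaded};
\item the \texttt{Greedy} subclass, which only picks the best segment for the current camera.
\end{itemize}

Only the \texttt{nextSegment(mpd, camera)} interface, here \texttt{next\_segment}, comes from our design.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class LoadingPolicy(ABC):
    """Base strategy: keeps track of downloads; subclasses choose the next segment."""

    def __init__(self):
        self.downloaded: set[str] = set()

    def mark_downloaded(self, segment_id: str) -> None:
        self.downloaded.add(segment_id)

    @abstractmethod
    def next_segment(self, mpd: dict, camera: Any) -> Optional[str]:
        """Return the id of the next segment to request, or None if nothing is left."""


class Greedy(LoadingPolicy):
    """Picks the not-yet-downloaded segment with the highest utility for `camera`."""

    def __init__(self, utility: Callable[[str, Any], float]):
        super().__init__()
        self.utility = utility

    def next_segment(self, mpd: dict, camera: Any) -> Optional[str]:
        remaining = [s for s in mpd["segments"] if s not in self.downloaded]
        return max(remaining, key=lambda s: self.utility(s, camera), default=None)
```

Other policies would only override \texttt{next\_segment}, so the loader can switch strategies without any other change.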
The greedy, greedy predictive and proposed policies from the previous chapter are all classes that derive from \texttt{LoadingPolicy}.
Then, the main class responsible for the loading of segments is the \texttt{DashLoader} class.
It uses \texttt{XMLHttpRequest}s, which are the usual way of making HTTP requests in JavaScript, and it calls the corresponding parser on the results of those requests.
The \texttt{DashLoader} class accepts as parameter a function that will be called each time some data has been downloaded and parsed: this data can contain vertices, texture coordinates, normals, materials or textures, and they can all be added to the \texttt{Model} class that we described in Section~\ref{d3:model-class}.
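The control flow of \texttt{DashLoader} can be summarized with the Python sketch below:

\begin{itemize}
\item the network call is injected as a \texttt{fetch} function, standing in for \texttt{XMLHttpRequest};
\item parsers are looked up by segment kind;
\item every parsed result is handed to the \texttt{on\_data} callback, which in our client would add the content to the \textbf{Model}.
\end{itemize}

All names and the segment dictionary layout are hypothetical.

```python
from typing import Any, Callable, Optional, Protocol


class Policy(Protocol):
    def next_segment(self, mpd: dict, camera: Any) -> Optional[dict]: ...


class DashLoader:
    """Asks the policy for a segment, downloads it, parses it, and reports the result."""

    def __init__(
        self,
        policy: Policy,
        fetch: Callable[[str], bytes],
        parsers: dict[str, Callable[[bytes], Any]],
        on_data: Callable[[Any], None],
    ):
        self.policy = policy
        self.fetch = fetch
        self.parsers = parsers
        self.on_data = on_data

    def load_next_segment(self, mpd: dict, camera: Any) -> bool:
        """Load one segment; return False when the policy has nothing left to request."""
        segment = self.policy.next_segment(mpd, camera)
        if segment is None:
            return False
        raw = self.fetch(segment["url"])
        parsed = self.parsers[segment["kind"]](raw)  # e.g. a geometry or texture parser
        self.on_data(parsed)
        return True
```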
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=0.65]
\draw (0, 0) rectangle (5, -4);
\draw (0, -1) -- (5, -1);
\node at (2.5, -0.5) {DashClient};
\node[right] at (0, -1.5) {\scriptsize loadNextSegment()};
\draw (5, -2) -- (8, -2);
\draw (8, 0) rectangle (14, -4);
\draw (8, -1) -- (14, -1);
\node at (11, -0.5) {LoadingPolicy};
\node[right] at (8, -1.5) {\scriptsize nextSegment(mpd, camera)};
\draw (1, -6) rectangle (7, -10);
\draw (1, -7) -- (7, -7);
\node at (4, -6.5) {Greedy};
\node[right] at (1, -7.5) {\scriptsize nextSegment(mpd, camera)};
\draw (8, -6) rectangle (14, -10);
\draw (8, -7) -- (14, -7);
\node at (11, -6.5) {GreedyPredictive};
\node[right] at (8, -7.5) {\scriptsize nextSegment(mpd, camera)};
\draw (15, -6) rectangle (21, -10);
\draw (15, -7) -- (21, -7);
\node at (18, -6.5) {Proposed};
\node[right] at (15, -7.5) {\scriptsize nextSegment(mpd, camera)};
\end{tikzpicture}
\caption{Class diagram of our DASH client\label{d3:dash-loader}}
\end{figure}
\subsubsection{Performance}
In JavaScript, there is no way of doing parallel computing without using \emph{web workers}.
A web worker is a script in JavaScript that runs in the background, on a separate thread and that can communicate with the main script by sending and receiving messages.
Since our system has many tasks to do, it seems natural to use workers to manage the streaming without impacting the framerate of the renderer.
However, what a worker can do is very limited, since it cannot access the variables of the main script.
Because of this, we have to run the renderer in the main script, where it can access the HTML page, and we move all the other tasks (the access client, the control engine and the segment parsers) to the worker.
Since the main script is the one communicating with the GPU, it still has to update the model with the parsed content it receives from the worker.
Using the worker does not improve the framerate much, but it reduces the latency that occurs when a new segment is received: in a single-threaded setup, the interface freezes for around half a second each time a segment is received, which can be very frustrating.
A sequence diagram of what happens when downloading, parsing and rendering content is shown in Figure~\ref{d3:sequence}.
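The division of work between the main script and the worker relies on message passing only. As a language-neutral illustration, the sketch below uses Python threads and queues rather than web workers, with hypothetical names. A worker thread parses raw segments and posts the results back, while the main thread, standing for the renderer, consumes them in order. Error handling is omitted.

```python
import queue
import threading
from typing import Any, Callable, Optional


def parser_worker(inbox: queue.Queue, outbox: queue.Queue, parse: Callable[[bytes], Any]) -> None:
    """Worker loop: parse each raw segment it receives and post the result back."""
    while True:
        raw: Optional[bytes] = inbox.get()
        if raw is None:  # sentinel message: no more segments to parse
            break
        outbox.put(parse(raw))


def run_pipeline(raw_segments: list[bytes], parse: Callable[[bytes], Any]) -> list[Any]:
    """Main-thread side: send segments to the worker and collect the parsed results."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=parser_worker, args=(inbox, outbox, parse), daemon=True)
    worker.start()
    for raw in raw_segments:
        inbox.put(raw)  # analogous to posting a message to the worker
    results = [outbox.get() for _ in raw_segments]  # in the client: update the model here
    inbox.put(None)
    worker.join()
    return results
```

In the browser, the main script would not block on the results as this sketch does. It would handle each incoming message between frames.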
However, a web client is not sufficient to analyse our streaming policies: many tasks are performed (such as rendering, and managing the interaction) and all this overhead pollutes the analysis of our policies.
This is why we also implemented a client in Rust, for simulation, so we can gather precise simulated data.
Our requirements here are quite different from the ones we had to deal with in our JavaScript implementation.
In this setup, we want to build a system that is as close as possible to our theoretical concepts.
Therefore, we do not have a full client in Rust (i.e.\ an application that would take the URL of an MPD file and let the user navigate in the scene while it is being downloaded).
In order to run simulations, we develop the building blocks of the DASH client separately, keeping the access client and the media engine totally separated:
\begin{itemize}
\item the \textbf{simulator} takes a user trace as a parameter, it then replays the trace using specific parameters of the access client and outputs a file containing the history of the simulation (what files have been downloaded, and when);
\item the \textbf{renderer} takes the user trace as well as the history generated by the simulator as parameters, and renders images that correspond to what would have been seen.
\end{itemize}
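The contract between the two programs is the history file. The following Python sketch assumes a constant bandwidth and RTT for simplicity, and all its names are hypothetical. The simulator replays a trace of timestamped viewpoints and records when each segment would arrive. The renderer side then recovers the set of segments available at any frame time.

```python
import bisect
from typing import Callable, Hashable, Optional

Trace = list[tuple[float, object]]  # (time, viewpoint), sorted by time


def viewpoint_at(trace: Trace, t: float) -> object:
    """Latest recorded viewpoint at or before time t."""
    times = [time for time, _ in trace]
    return trace[max(bisect.bisect_right(times, t) - 1, 0)][1]


def simulate(
    trace: Trace,
    choose: Callable[[object, set], Optional[Hashable]],
    size: Callable[[Hashable], float],
    bandwidth: float,
    rtt: float,
) -> list[tuple[float, Hashable]]:
    """Replay a trace; return the history as (arrival time, segment) pairs."""
    t, end = trace[0][0], trace[-1][0]
    downloaded: set = set()
    history = []
    while t < end:
        segment = choose(viewpoint_at(trace, t), downloaded)
        if segment is None:
            break
        t += size(segment) / bandwidth + rtt  # downloads happen one after the other
        downloaded.add(segment)
        history.append((t, segment))
    return history


def available_at(history: list[tuple[float, Hashable]], t: float) -> set:
    """Renderer side: segments that have arrived by frame time t."""
    return {segment for arrival, segment in history if arrival <= t}
```

Because the renderer only reads the history, rendering cost cannot influence the download timings computed by the simulator.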
When simulating experiments, we run the simulator on the many traces that we collected during user studies, and then run the renderer on the resulting histories to generate the images corresponding to the simulation.
We are then able to compute the PSNR between those frames and the ground truth frames.
This separation guarantees that our simulator is not affected by the performance of our renderer.