In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
The generated MPD file describes the content organization so that the client gets all the necessary information to make educated decisions and query the 3D content it needs according to the available resources and current viewpoint.
A camera path generated by a particular user is a set of viewpoints $v(t_i)$ indexed by times $t_i$ in a continuous interval $[t_1,t_{end}]$.
The DASH client first downloads the MPD file to get the material (.mtl) file containing information about all the geometry and textures available for the entire 3D model.
At time instant $t_i$, the DASH client chooses which segments to download, namely those containing the geometry and the texture needed to generate the viewpoint $v(t_{i+1})$ at time instant $t_{i+1}$.
Starting from $t_1$, the camera continuously follows a camera path $C=\{v(t_i), t_i \in[t_1,t_{end}]\}$, along which downloading opportunities are strategically exploited to sequentially query the most useful segments.
Unlike video streaming, where the bitrate of each segment correlates with the quality of the video received, for 3D content, the size (in bytes) of the content does not necessarily correlate well with its contribution to visual quality.
A large polygon with huge visual impact takes the same number of bytes as a tiny polygon.
Further, the visual impact is \textit{view dependent} --- a large object that is far away or out of view does not contribute to the visual quality as much as a smaller object that is closer to the user.
As such, it is important for a DASH-based NVE client to estimate the usefulness of a given segment to download, so that it can make good decisions about what to download.
We call this usefulness the \textit{utility} of the segment.
The utility is a function of a segment, either geometry or texture, and the current viewpoint (camera location, view angle, and look-at point), and is therefore dynamically computed online by the client from parameters in the MPD file.
\subsubsection{Offline parameters}
Let us first detail all the parameters available from the offline/static preparation of the 3D NVE\@.
These parameters are stored in the MPD file.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}_{3D}(s^G)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
The second piece of information, stored in the MPD for all segments (both geometry and texture), is the size of the segment (in kB).
Geometry segments contain a similar number of faces, so their size is almost uniform.
Texture segments are usually much smaller than geometry segments, but their size varies a lot, as the number of pixels is divided by 4 between two successive resolutions.
Finally, for each texture segment $s^{T}$, the MPD stores the \textit{MSE} (mean square error) of the image at that resolution, relative to the highest resolution (by default, triangles are filled with the average color of the texture).
In addition to the offline parameters stored in the MPD file for each segment, view-dependent parameters are computed at navigation time.
First, a measure of 3D area is computed for texture segments.
Since a texture maps onto a set of triangles, we account for the 3D area of all these triangles.
We could compute such a measure offline (and attach it to the adaptation set containing the texture), but we prefer to account only for the triangles that have already been downloaded by the client.
We denote by $\Delta(s^T)=\Delta(T)$ the set of triangles colored by a texture $T$ (it depends only on $T$ and is the same for any representation/segment $s^T$ in this texture adaptation set).
At each time $t_i$, a subset of $\Delta(T)$ has been downloaded; we denote it $\Delta(T, t_i)$.
Moreover, each geometry segment belongs to a geometry adaptation set $AS^G$ whose bounding box coordinates are stored in the MPD\@.
Given the coordinates of the bounding box $\mathcal{BB}(AS^G)$ and the viewpoint $v(t_i)$ at time $t_i$, the client computes the distance $\mathcal{D}(v(t_i),AS^G)$ as the distance from the center of $\mathcal{BB}(AS^G)$ to the principal point of the camera given in $v(t_i)$.
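The distance computation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the bounding box is assumed to be axis-aligned and given by its minimum and maximum corners, and the principal point of the camera is approximated by a single 3D position.

```python
import math


def bounding_box_center(bb_min, bb_max):
    """Center of an axis-aligned bounding box given by two opposite corners."""
    return tuple((lo + hi) / 2.0 for lo, hi in zip(bb_min, bb_max))


def distance_to_adaptation_set(camera_position, bb_min, bb_max):
    """Euclidean distance from the camera to the center of the bounding box.

    `camera_position` stands in for the principal point of the camera
    (an assumption made for this sketch).
    """
    return math.dist(camera_position, bounding_box_center(bb_min, bb_max))
```

For instance, a camera at the origin and a box spanning $(2,0,0)$ to $(4,0,0)$ give a distance of 3.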
\subsubsection{Utility for geometry segments}
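We now have all the parameters needed to derive a utility measure for a geometry segment:
\begin{equation*}
\mathcal{U}\left(s^G,v(t_i)\right) = \frac{\mathcal{A}_{3D}(s^G)}{\mathcal{D}\left(v(t_i),AS^G\right)^2}
\end{equation*}
where $AS^G$ is the adaptation set containing $s^G$.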
Basically, the utility of a segment is proportional to the area that its faces cover, and inversely proportional to the square of the distance between the camera and the center of the bounding box of the adaptation set containing the segment.
That way, we favor segments with big faces that are close to the camera.
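As a minimal sketch of this measure (area divided by squared distance), assuming the 3D area comes from the MPD and the distance has already been computed by the client, one could write the following. The function name and the handling of a zero distance are assumptions of this sketch.

```python
def geometry_utility(area_3d, distance):
    """Utility of a geometry segment: 3D area over squared camera distance."""
    if distance <= 0:
        # Assumption of this sketch: a camera located exactly at the center
        # of the bounding box is treated as an error rather than as an
        # infinite utility.
        raise ValueError("distance must be positive")
    return area_3d / (distance * distance)


def rank_geometry_segments(segments):
    """Sort (name, area_3d, distance) tuples by decreasing utility."""
    return sorted(segments, key=lambda s: geometry_utility(s[1], s[2]), reverse=True)
```

With this ranking, a small segment that is close to the camera can come before a larger one that is far away, which is the behavior described above.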
\subsubsection{Utility for texture segments}
For a texture $T$ stored in a segment $s^T$, the triangles in $\Delta(T)$ are stored in arbitrary geometry segments, that is, they do not have spatial coherence.
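Thus, for each downloaded geometry segment $s_k^G$, $k \in \{1,\dots,K\}$, where $K$ is the number of geometry segments downloaded at time $t_i$, we collect the triangles of $\Delta(T, t_i)$ in $s^G_k$, and compute the ratio of $\mathcal{A}_{3D}(s_k^G)$ covered by these triangles:
\begin{equation*}
\mathcal{U}\left(s^T,v(t_i)\right) = psnr(s^T) \sum_{k}\frac{\mathcal{A}_{3D}\left(s_k^G\cap \Delta(T,t_i)\right)}{\mathcal{A}_{3D}(s_k^G)}\,\mathcal{U}\left(s_k^G,v(t_i)\right)
\end{equation*}
where we sum over all geometry segments received before time $t_i$ that intersect $\Delta(T,t_i)$ and whose adaptation set lies in the view frustum.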
This formula defines the utility of a texture segment by computing the linear combination of the utility of the geometry segments that use this texture, weighted by the proportion of area covered by the texture in the segment.
We compute the PSNR by using the MSE in the MPD and denote it $psnr(s^T)$.
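The following Python sketch illustrates how a client might combine these quantities. Two points are assumptions of this sketch rather than statements from the text: the PSNR is derived from the MSE with the usual formula for 8-bit images ($10\log_{10}(255^2/\mathit{MSE})$), and it is used as a multiplicative factor on the weighted sum of geometry utilities. All function names are hypothetical.

```python
import math


def psnr_from_mse(mse, max_value=255.0):
    """PSNR in dB from a mean square error, assuming 8-bit color channels."""
    if mse <= 0:
        # The reference resolution has a zero MSE; how to score it is left
        # open in this sketch.
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value * max_value / mse)


def texture_utility(psnr, contributions):
    """Weighted sum of geometry utilities, scaled by the texture PSNR.

    `contributions` holds one (covered_area, segment_area, geometry_utility)
    tuple per downloaded geometry segment that uses the texture and whose
    adaptation set is in the frustum.
    """
    weighted = sum(
        covered / area * utility
        for covered, area, utility in contributions
        if area > 0
    )
    return psnr * weighted
```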
The first HTTP request made by the DASH client at time $t_1$ selects the most useful segment $s_1^*$ to download and will be followed by subsequent decisions at $t_2, t_3, \dots$.
While selecting $s_i^*$, the $i$-th best segment to request, the adaptation logic compromises between geometry, texture, and the available \texttt{representations} given the current bandwidth, camera dynamics, and the previously described utility scores.
The difference between $t_{i+1}$ and $t_{i}$ is the delivery delay of $s_i^*$.
It varies with the segment size and network conditions.
The most naive way to sequentially optimize $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
In that case, the best segment $s$ to request would be the one maximizing $\mathcal{U}(s, v(t_i))$ to simply make a better rendering from the current viewpoint $v(t_i)$.
Due to transmission delay, however, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, which depends on the segment size and network conditions:
\begin{equation*}
t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{d3:eq2}
\end{equation*}
where $\widehat{BW_i}$ and $\widehat{\tau_i}$ denote the bandwidth and latency estimated at time $t_i$.
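As a small worked example of this delivery-time estimate, the sketch below evaluates the formula for a given segment. The function name and the choice of units (bytes, bytes per second, seconds) are assumptions made for illustration.

```python
def estimated_delivery_time(t_i, size_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimated arrival time of a segment requested at time t_i."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth estimate must be positive")
    return t_i + size_bytes / bandwidth_bytes_per_s + latency_s
```

For instance, a 500\,kB segment requested at $t_i = 10$\,s with an estimated bandwidth of 1\,MB/s and an estimated latency of 50\,ms would be expected at about $10.55$\,s.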
In our experiments, we typically use $\chi=2\,\mathrm{s}$ and estimate the integral in (\ref{d3:smart}) by a Riemann sum in which the interval $[t_{i+1}(s), t_i+\chi]$ is divided into 4 subintervals of equal size.
At each subinterval endpoint, a first-order predictor $\hat{v}(t_i)$ linearly estimates the viewpoint from $v(t_i)$ and a speed estimate (the discrete derivative at $t_i$).
We also tested an alternative greedy heuristic selecting the segment that optimizes a utility variation during downloading (between $t_i$ and $t_{i+1}$):
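The numerical scheme described here can be sketched as follows in Python. This is a hedged illustration: the integrand is passed in as an arbitrary callable, the right endpoint of each subinterval is used (the text does not say which endpoint is chosen), and the predictor only extrapolates a position, not a full viewpoint. All names are hypothetical.

```python
def predict_position(position, velocity, t_i, t):
    """First-order (linear) prediction of the camera position at time t."""
    return tuple(p + v * (t - t_i) for p, v in zip(position, velocity))


def integrated_utility(utility_at, t_start, t_end, subintervals=4):
    """Riemann sum of utility_at(t) over [t_start, t_end].

    Uses the right endpoint of each of the equal-size subintervals
    (an assumption of this sketch).
    """
    if t_end <= t_start:
        # The segment would arrive after the prediction window closes.
        return 0.0
    dt = (t_end - t_start) / subintervals
    return dt * sum(utility_at(t_start + (k + 1) * dt) for k in range(subintervals))
```

In use, \texttt{utility\_at} would evaluate the utility of the candidate segment from the predicted viewpoint, with \texttt{t\_start} set to the estimated delivery time and \texttt{t\_end} to $t_i+\chi$.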
Since our scene is large, and since the system we are describing allows navigating in a streaming scene, we developed a JavaScript Web client that implements our utility metrics and policies.
Performance is a key aspect of our work; as such, we cannot use the default geometries described in Section~\ref{f:geometries} because of their poor performance, and we instead use buffer geometries.
\begin{itemize}
\item It has a constructor that takes as parameter the number of faces: it allocates all the memory needed for our buffers so we do not have to reallocate it later (which would be inefficient).
\item It keeps track of the number of faces it is currently holding: it can then avoid rendering faces that have not been filled and knows where to add new faces.
\item It provides a method to add a new polygon to the geometry.
\item It also keeps track of what part of the buffers has been transmitted to the GPU\@: THREE.js allows us to set the range of the buffer that we want to update, and we are able to update only what is necessary.
\end{itemize}
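The bookkeeping listed above can be illustrated with a language-agnostic sketch, written here in Python rather than JavaScript. The class \texttt{FaceBuffer} and its methods are hypothetical and do not correspond to the THREE.js API; the sketch only shows preallocation, face counting, and tracking of the range that still has to be sent to the GPU.

```python
from array import array


class FaceBuffer:
    """Preallocated vertex buffer for triangles (hypothetical sketch)."""

    FLOATS_PER_FACE = 9  # 3 vertices x 3 coordinates

    def __init__(self, max_faces):
        # Allocate everything once so that no reallocation is needed later.
        self.max_faces = max_faces
        self.positions = array("f", [0.0]) * (self.FLOATS_PER_FACE * max_faces)
        self.face_count = 0      # faces currently filled
        self.uploaded_faces = 0  # faces already transmitted to the GPU

    def add_face(self, a, b, c):
        """Append one triangle given its three (x, y, z) vertices."""
        if self.face_count >= self.max_faces:
            raise IndexError("buffer is full")
        start = self.face_count * self.FLOATS_PER_FACE
        self.positions[start:start + self.FLOATS_PER_FACE] = array("f", [*a, *b, *c])
        self.face_count += 1

    def pending_update_range(self):
        """(offset, count), in floats, of the data not yet uploaded."""
        offset = self.uploaded_faces * self.FLOATS_PER_FACE
        count = (self.face_count - self.uploaded_faces) * self.FLOATS_PER_FACE
        return offset, count

    def mark_uploaded(self):
        """Record that everything filled so far has been sent to the GPU."""
        self.uploaded_faces = self.face_count
```

A renderer would draw only the first \texttt{face\_count} faces and upload only the range returned by \texttt{pending\_update\_range}.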
To make this easy to manage, we implemented a \textbf{Model} class that holds both geometry and textures.
We can add vertices, faces, and materials to this model, and it internally manages the corresponding geometries, materials, and meshes.
In order to avoid having many models that share the same material (which would harm performance), it automatically merges faces that share the same material in the same buffer geometry, as shown in Figure~\ref{d3:render-structure}.
To implement our view-dependent DASH-3D client, we first need the access client, which is responsible for deciding what to download and for downloading it.
To do so, we use the strategy pattern illustrated in Figure~\ref{d3:dash-loader}.
We maintain a base class named \texttt{LoadingPolicy} that contains some attributes and functions to keep track of what has been downloaded.
This class exposes a function named \texttt{nextSegment} that takes two arguments:
Then, the main class responsible for the loading of segments is the \texttt{DashLoader} class.
It uses \texttt{XMLHttpRequest}s, which are the usual way of making HTTP requests in JavaScript, and it calls the corresponding parser on the results of those requests.
The \texttt{DashLoader} class accepts as parameter a function that will be called each time some data has been downloaded and parsed: this data can contain vertices, texture coordinates, normals, materials or textures, and they can all be added to the \texttt{Model} class that we described in Section~\ref{d3:model-class}.
\caption{Class diagram of our DASH client\label{d3:dash-loader}}
\end{figure}
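The strategy pattern described above can be sketched as follows, in Python rather than JavaScript. Everything in this sketch is illustrative: the two arguments of \texttt{next\_segment} (a list of candidate segments and a camera position), the \texttt{NearestFirstPolicy} subclass, and the \texttt{fetch} callable that replaces the HTTP request and parsing steps are all assumptions, not the actual interface of our client.

```python
from abc import ABC, abstractmethod


class LoadingPolicy(ABC):
    """Base policy: remembers what has been downloaded."""

    def __init__(self):
        self.downloaded = set()

    def mark_downloaded(self, url):
        self.downloaded.add(url)

    @abstractmethod
    def next_segment(self, segments, camera):
        """Return the next segment to request, or None if nothing is left."""


class NearestFirstPolicy(LoadingPolicy):
    """Hypothetical policy: request the closest remaining segment first."""

    def next_segment(self, segments, camera):
        remaining = [s for s in segments if s["url"] not in self.downloaded]
        if not remaining:
            return None
        return min(
            remaining,
            key=lambda s: sum((c - p) ** 2 for c, p in zip(s["center"], camera)),
        )


class DashLoader:
    """Asks the policy what to load, fetches it, and forwards parsed data."""

    def __init__(self, policy, fetch, on_data):
        self.policy = policy
        self.fetch = fetch      # stands in for the HTTP request and the parser
        self.on_data = on_data  # callback invoked with the parsed content

    def load_next(self, segments, camera):
        segment = self.policy.next_segment(segments, camera)
        if segment is None:
            return False
        data = self.fetch(segment["url"])
        self.policy.mark_downloaded(segment["url"])
        self.on_data(data)
        return True
```

Swapping the policy object changes the download order without touching the loader, which is the point of the pattern.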
\subsubsection{Performance}
In JavaScript, there is no way of doing parallel computing without using \emph{web workers}.
A web worker is a JavaScript script that runs in the background on a separate thread and communicates with the main script by sending and receiving messages.
Since a worker cannot access the DOM, we are forced to run the renderer in the main script, where it can access the HTML page, and we move all the other tasks (i.e., the access client, the control engine, and the segment parsers) to the worker.
Since the main script is the only thread communicating with the GPU, it will still have to update the model with the parsed content it receives from the worker.
Using a worker does not improve the framerate of the system much, but it reduces the latency that occurs when a new segment is received. This latency can be very frustrating: in a single-threaded scenario, the interface freezes for around half a second each time a segment is received.
However, a web client is not sufficient to analyse our streaming policies: many tasks are performed (such as rendering, and managing the interaction) and all this overhead pollutes the analysis of our policies.
This is why we also implemented a client in Rust, for simulation, so we can gather precise simulated data.
Our requirements are quite different from the ones we had to deal with in our JavaScript implementation.
In this setup, we want to build a system that is the closest to our theoretical concepts.
Therefore, we do not have a full client in Rust (meaning an application to which you would give the URL to an MPD file and that would allow you to navigate in the scene while it is being downloaded).
In order to run simulations, we develop the building blocks of the DASH client separately, so that the access client and the media engine are totally isolated:
\begin{itemize}
\item the \textbf{simulator} takes a user trace as a parameter; it then replays the trace using specific parameters of the access client and outputs a file containing the history of the simulation (which files have been downloaded, and when);
\item the \textbf{renderer} takes the user trace as well as the history generated by the simulator as parameters, and renders images that correspond to what would have been seen.
\end{itemize}
When simulating experiments, we will run the simulator on many traces that we collected during user-studies, and we will then run the renderer program according to the traces to generate images corresponding to the simulation.
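To make the role of the simulator concrete, here is a minimal Python sketch of a trace replay that produces a download history. It is not our Rust implementation: the trace format (a list of timestamped positions), the constant bandwidth, and the \texttt{choose} callable standing in for the access client are all simplifying assumptions.

```python
def viewpoint_at(trace, t):
    """Latest recorded viewpoint at or before time t (trace sorted by time)."""
    current = trace[0][1]
    for time, position in trace:
        if time > t:
            break
        current = position
    return current


def simulate(trace, segments, choose, bandwidth, latency=0.0):
    """Replay a user trace and return the download history.

    trace     -- list of (time, position) samples, sorted by time
    segments  -- dict mapping a segment name to its size in bytes
    choose    -- policy: (remaining segments, viewpoint) -> segment name
    bandwidth -- constant bandwidth in bytes per second (an assumption)
    Returns a list of (segment name, request time, delivery time).
    """
    history = []
    remaining = dict(segments)
    t, t_end = trace[0][0], trace[-1][0]
    while remaining and t <= t_end:
        name = choose(remaining, viewpoint_at(trace, t))
        t_done = t + remaining.pop(name) / bandwidth + latency
        history.append((name, t, t_done))
        t = t_done
    return history
```

The returned history plays the role of the file that the renderer later reads, together with the trace, to reconstruct what the user would have seen.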