This commit is contained in:
2019-11-27 22:37:49 +01:00
parent 2146a28ddd
commit de4c390f70
13 changed files with 55 additions and 52 deletions

View File

@@ -1,6 +1,6 @@
\section{Client\label{d3:dash-client}}
In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
In this section, we specify a DASH NVE client which exploits the preparation of the 3D content in an NVE for streaming.
The generated MPD file describes the content organization so that the client gets all the necessary information to make educated decisions and query the 3D content it needs according to the available resources and current viewpoint.
A camera path generated by a particular user is a set of viewpoints $v(t_i)$ indexed by a continuous time parameter $t_i \in [t_1,t_{end}]$.
@@ -96,8 +96,8 @@ These parameters are stored in the MPD file.
First, for each geometry segment $s^G$ there is a predetermined 3D area $\mathcal{A}_{3D}(s^G)$, equal to the sum of all triangle areas in this segment (in 3D); it is computed as the segments are created.
Note that the texture segments have similar information, but computed at \textit{navigation time} $t_i$.
The second piece of information stored in the MPD, for all segments (geometry and texture), is the size of the segment (in kB).
Indeed, geometry segments have close to a similar number of faces; their size is almost uniform.
For texture segments, the size is usually much smaller than the geometry segments but also varies a lot, as between two successive resolutions the number of pixels is divided by 4.
\update{Indeed, geometry segments have a similar number of faces; their size is almost uniform.
For texture segments, the size is usually much smaller than the geometry segments but also varies a lot, as between two successive resolutions the number of pixels is divided by 4.}{}
Finally, for each texture segment $s^{T}$, the MPD stores the \textit{MSE} (mean square error) of the image at each resolution, relative to the highest resolution (by default, triangles are filled with their average color).
Offline parameters are stored in the MPD as shown in Listing~\ref{d3:mpd}.
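As a rough client-side sketch of the per-segment metadata described above (the field names here are hypothetical; the actual attribute names live in the MPD XML):

```javascript
// Hypothetical field names, for illustration only. Geometry segments carry
// their precomputed 3D area and size; texture segments carry their size,
// resolution level, and MSE relative to the highest resolution.
function geometryMeta(area3D, sizeKB) {
  return { kind: 'geometry', area3D, sizeKB };
}
function textureMeta(sizeKB, resolution, mse) {
  return { kind: 'texture', sizeKB, resolution, mse };
}
```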
@@ -162,6 +162,7 @@ Algorithm~\ref{d3:next-segment} details how our DASH client makes decisions.
\SetKwData{Bw}{bw\_estimation}
\SetKwData{Rtt}{rtt\_estimation}
\SetKwData{Segment}{best\_segment}
\SetKwData{CurrentSegment}{segment}
\SetKwData{Candidates}{candidates}
\SetKwData{AllSegments}{all\_segments}
\SetKwData{DownloadedSegments}{downloaded\_segments}
@@ -171,20 +172,20 @@ Algorithm~\ref{d3:next-segment} details how our DASH client makes decisions.
\SetKwFunction{EstimateNetwork}{estimate\_network\_parameters}
\SetKwFunction{Append}{append}
\Input{Current index $i$, time $t_i$, viewpoint $v(t_i)$, buffer of already downloaded \texttt{segments} $\mathcal{B}_i$, MPD}
\Input{Current index $i$, time $t_i$, viewpoint $v(t_i)$, buffer of already downloaded \texttt{segments} $\mathcal{B}_i$, MPD, utility metric $\mathcal{U}$, streaming policy $\Omega$}
\Output{Next segment $s^{*}_i$ to request, updated buffer $\mathcal{B}_{i+1}$}
\BlankLine{}
(\Bw, \Rtt) \leftarrow{} \EstimateNetwork{}\;
\BlankLine{}
\Candidates\leftarrow{} \AllSegments\newline\makebox[1cm]{}.\Filter{$\Segment\rightarrow\Segment\notin\DownloadedSegments$}\newline\makebox[1cm]{}.\Filter{$\Segment\rightarrow\Segment\in\Frustum$}\;
\Candidates\leftarrow{} \AllSegments\newline\makebox[1cm]{}.\Filter{$\CurrentSegment\rightarrow\CurrentSegment\notin\DownloadedSegments$}\newline\makebox[1cm]{}.\Filter{$\CurrentSegment\rightarrow\CurrentSegment\in\Frustum$}\;
\BlankLine{}
\Segment\leftarrow{} \Argmax{\Candidates, \Segment\rightarrow{} $\Omega\left(\mathcal{U}(\Segment)\right)$}\;
\Segment\leftarrow{} \Argmax{\Candidates, \CurrentSegment\rightarrow{} $\Omega\left(\mathcal{U},\CurrentSegment\right)$}\;
\DownloadedSegments.\Append{\Segment}\;
{\caption{Algorithm to identify the next segment to query\label{d3:next-segment}}}
\end{algorithm}
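The decision loop of the algorithm above can be sketched in JavaScript; `inFrustum` and `omega` are placeholders standing in for the frustum test and the streaming policy $\Omega$ applied to the utility $\mathcal{U}$:

```javascript
// Sketch of the next-segment decision: filter out segments already
// downloaded or outside the frustum, then pick the candidate that
// maximizes the policy score omega(s).
function nextSegment(allSegments, downloaded, inFrustum, omega) {
  const candidates = allSegments
    .filter(s => !downloaded.has(s.id))   // segment not in downloaded_segments
    .filter(s => inFrustum(s));           // segment in frustum
  let best = null;
  for (const s of candidates) {
    if (best === null || omega(s) > omega(best)) best = s;
  }
  if (best !== null) downloaded.add(best.id);
  return best;
}
```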
The most naive way to sequentially optimize the $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
A naive way to sequentially optimize the utility $\mathcal{U}$ is to limit the decision-making to the current viewpoint $v(t_i)$.
In that case, the best segment $s$ to request would be the one maximizing $\mathcal{U}(s, v(t_i))$ to simply make a better rendering from the current viewpoint $v(t_i)$.
Due to transmission delay, however, this segment will only be delivered at time $t_{i+1}=t_{i+1}(s)$, which depends on the segment size and network conditions: \begin{equation*} t_{i+1}(s)=t_i+\frac{\mathtt{size}(s)}{\widehat{BW_i}} + \widehat{\tau_i}\label{d3:eq2}\end{equation*}
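The delivery-time estimate is a direct computation; in this sketch, $\widehat{BW_i}$ and $\widehat{\tau_i}$ are the client's bandwidth and round-trip-time estimates:

```javascript
// t_{i+1}(s) = t_i + size(s) / BW_i + tau_i, with BW and tau estimated by
// the client. Units must agree: here size in bits and bandwidth in bit/s.
function deliveryTime(tI, sizeBits, bwEstimate, rttEstimate) {
  return tI + sizeBits / bwEstimate + rttEstimate;
}
```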
@@ -342,13 +343,13 @@ The \texttt{DashLoader} class accepts as parameter a function that will be calle
\subsubsection{Performance}
In JavaScript, there is no way of doing parallel computing without using \emph{web workers}.
\update{In JavaScript, there is no way of doing parallel computing without using \emph{web workers}.}{JavaScript requires the use of \emph{web workers} to perform parallel computing.}
A web worker is a JavaScript script that runs in the background on a separate thread and can communicate with the main script by sending and receiving messages.
Since our system has many tasks to perform, it is natural to use workers to manage the streaming without impacting the framerate of the renderer.
However, what a worker can do is very limited, since it cannot access the variables of the main script.
Because of this, we are forced to run the renderer on the main script, where it can access the HTML page, and we move all the other tasks (i.e. the access client, the control engine and the segment parsers) to the worker.
Because of this, we are forced to run the renderer on the main script, where it can access the HTML page, and we move all the other tasks (i.e.\ the access client, the control engine and the segment parsers) to the worker.
Since the main script is the only thread communicating with the GPU, it will still have to update the model with the parsed content it receives from the worker.
Using a worker does not so much improve the framerate of the system, but it reduces the latency that occurs when receiving a new segment, which can be very frustrating since in a single thread scenario, each time a segment is received, the interface freezes for around half a second.
\update{Using a worker does not so much improve the framerate of the system, but it reduces}{We do not use web workers to improve the framerate of the system, but to reduce} the latency that occurs when receiving a new segment, which can be frustrating since in a single thread scenario, each time a segment is received, the interface freezes for around half a second.
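The split between worker and main script can be sketched with two small helpers; the message shapes below are assumptions for illustration, not the actual protocol of the implementation:

```javascript
// The worker downloads and parses segments; the main script only receives
// parsed content and uploads it to the GPU. Message shapes are hypothetical.
function requestMessage(segmentId, viewpoint) {
  return { type: 'request', segmentId, viewpoint };
}
// Called on the main script for each message posted back by the worker.
function onWorkerMessage(scene, message) {
  if (message.type === 'parsed') scene.push(message.payload);
  return scene;
}
```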
A sequence diagram of what happens when downloading, parsing and rendering content is shown in Figure~\ref{d3:sequence}.
\begin{figure}[ht]


@@ -4,7 +4,7 @@ Our work in this chapter started with the question: can DASH be used for NVE\@?
The answer is \emph{yes}.
In answering this question, we contributed by showing how to organize a polygon soup and its textures into a DASH-compliant format that (i) includes a minimal amount of metadata that is useful for the client, and (ii) organizes the data to allow the client to get the most useful content first.
We further show that the data organization and its description with metadata (precomputed offline) are sufficient to design and build a DASH client that is adaptive --- it selectively downloads segments within its view, makes intelligent decisions about what to download, and balances between geometry and texture while adapting to network bandwidth.
This way, our system addresses the open problems we mentioned in~\ref{i:challenges}.
This way, our system addresses the open problems we mentioned in Chapter~\ref{i:challenges}.
\begin{itemize}
\item \textbf{It prepares and structures the content in a way that enables streaming}: all this preparation is precomputed, and all the content is structured according to the DASH framework, geometry but also materials and textures. Furthermore, textures are prepared in a multi-resolution manner, and even though multi-resolution geometry is not discussed here, the difficulty of integrating it in this system seems moderate: we could encode levels of detail in different representations and define a utility metric for each representation, and the system should adapt naturally.


@@ -5,17 +5,17 @@ In our work, we use the \texttt{obj} file format for the polygons, \texttt{png}
The process, however, applies to other formats as well.
\subsection{The MPD File}
In DASH, the information about content storage and characteristics, such as location, resolution, or size, are extracted from an MPD file by the client.
The client relies only on these information to decide which chunk to request and at which quality level.
In DASH, the information about content storage and characteristics, such as location, resolution, or size, is extracted from an MPD file by the client.
The client relies only on this information to decide which chunk to request and at which quality level.
The MPD file is an XML file that is organized into different sections hierarchically.
The \texttt{period} element is a top-level element, which, in the case of video, indicates the start time and length of a video chapter.
This element does not apply to NVE, and we use a single \texttt{period} for the whole scene, as the scene is static.
Each \texttt{period} element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
We utilize adaptation sets to organize a 3D scene's material, geometry, and texture.
The piece of software that does the preprocessing of the model mostly consists in file manipulation and is written is Rust as well.
The piece of software that does the preprocessing of the model consists of file manipulation and is written in Rust as well.
It successively preprocesses the geometry and then the textures.
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs} that works like a stack:
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs} which works like a stack:
\begin{itemize}
\item a structure is created on the root of the MPD file;
\item the \texttt{start\_element} method creates a new child in the XML file;
@@ -33,7 +33,7 @@ A face belongs to a cell if its barycenter falls inside the corresponding boundi
Each cell corresponds to an adaptation set.
Thus, geometry information is spread on adaptation sets based on spatial coherence, allowing the client to download the relevant faces selectively.
A cell is relevant if it intersects the frustum of the client's current viewpoint. Figure~\ref{d3:big-picture} shows the relevant cells in green.
As our 3D content, a virtual environment, is biased to spread the most along the horizontal plane, we alternate between splitting between the two horizontal directions.
As our 3D content, a virtual environment, is biased to spread along the horizontal plane, we alternate splits between the two horizontal directions.
We create a separate adaptation set for large faces (e.g., the sky or ground) because they are essential to the 3D model and do not fit into cells.
We consider a face to be large if its area in 3D is more than $a+3\sigma$, where $a$ and $\sigma$ are respectively the average and the standard deviation of the 3D areas of the faces.
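The $a+3\sigma$ criterion is straightforward to compute; a sketch:

```javascript
// A face is "large" when its 3D area exceeds mean + 3 * stddev of all face
// areas (population standard deviation over the model's faces).
function largeFaceThreshold(areas) {
  const n = areas.length;
  const mean = areas.reduce((acc, x) => acc + x, 0) / n;
  const variance = areas.reduce((acc, x) => acc + (x - mean) ** 2, 0) / n;
  return mean + 3 * Math.sqrt(variance);
}
```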
@@ -82,7 +82,7 @@ Figure~\ref{d3:textures} illustrates the use of the textures against the renderi
\subsection{Segments}
To allow random access to the content within an adaptation set storing geometry data, we group the faces into segments.
Each segment is then stored as a \texttt{.obj} file that can be individually requested by the client.
Each segment is then stored as a \texttt{.obj} file which can be individually requested by the client.
For geometry, we partition the faces in an adaptation set into sets of $N_s$ faces, by first sorting the faces by their area in 3D space in descending order, and then placing each successive $N_s$ faces into a segment.
Thus, the first segment contains the biggest faces and the last one the smallest.
In addition to the selected faces, a segment stores all face vertices and attributes so that each segment is independent.
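This grouping can be sketched as follows, assuming each face carries its precomputed 3D area:

```javascript
// Sort faces by 3D area, descending, then pack successive runs of nS faces
// into segments: the first segment gets the biggest faces, the last the
// smallest.
function buildSegments(faces, nS) {
  const sorted = [...faces].sort((a, b) => b.area - a.area);
  const segments = [];
  for (let i = 0; i < sorted.length; i += nS) {
    segments.push(sorted.slice(i, i + nS));
  }
  return segments;
}
```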


@@ -27,13 +27,13 @@ We partition the geometry into a k-$d$ tree until the leaves have fewer than 10000
\end{table}
\subsubsection{User Navigations}
To evaluate our system, we collected realistic user navigation traces that we can replay in our experiments.
To evaluate our system, we collected realistic user navigation traces which we can replay in our experiments.
We presented six users with a web interface, on which the model was loaded progressively while the user interacted with it.
The available interactions were inspired by traditional first-person interactions in video games, i.e., W, A, S, and D keys to translate the camera, and mouse to rotate the camera.
We asked users to browse and explore the scene until they felt they had visited all important regions.
We then asked them to produce camera navigation paths that would best present the 3D scene to a user that would discover it.
To record a path, the users first place their camera at their preferred starting point, then click on a button to start recording.
Every 100ms, the position, viewing angle of the camera and look-at point are saved into an array that will then be exported into JSON format.
Every 100~ms, the camera's position, viewing angle, and look-at point are saved into an array which will then be exported into JSON format.
The recorded camera trace allows us to replay each camera path to perform our simulations and evaluate our system.
We collected 13 camera paths this way.
@@ -166,7 +166,7 @@ An online-only utility improves the results, as it takes the user viewing frustu
\end{figure}
Figure~\ref{d3:sorting} shows the effect of grouping the segments in an adaptation set based on their area in 3D.
Clearly, the PSNR significantly improves when the 3D area of faces is considered for creating the segments. Since all segments are of the same size, sorting the faces by area before grouping them into segments leads to a skew distribution of how useful the segments are. This skewness means that the decision that the client makes (to download those with the largest utility first) can make a bigger difference in the quality.
The PSNR significantly improves when the 3D area of faces is considered for creating the segments. Since all segments are of the same size, sorting the faces by area before grouping them into segments leads to a skewed distribution of how useful the segments are. This skewness means that the decision that the client makes (to download those with the largest utility first) can make a bigger difference in the quality.
We also compared the greedy vs.\ proposed streaming policy (as shown in Figure~\ref{d3:greedy-weakness}) for limited bandwidth (5 Mbps).
The proposed scheme outperforms the greedy policy during the first 30~s and does a better job overall.
@@ -175,7 +175,7 @@ In the first 30 sec, since there are relatively few 3D contents downloaded, maki
Table~\ref{d3:percentages} shows the distribution of texture resolutions that are downloaded by greedy and our Proposed scheme, at different bandwidths.
Resolution 5 is the highest and 1 is the lowest.
The table clearly shows a weakness of the greedy policy: as the bandwidth increases, the distribution of downloaded textures resolution stays more or less the same.
The table shows a weakness of the greedy policy: \update{as the bandwidth increases, the distribution of downloaded textures resolution stays more or less the same.}{the distribution of downloaded textures does not adapt to the bandwidth.}
In contrast, our proposed streaming policy adapts to an increasing bandwidth by downloading higher resolution textures (13.9\% at 10 Mbps, vs. 0.3\% at 2.5 Mbps).
In fact, an interesting feature of our proposed streaming policy is that it adapts the geometry-texture compromise to the bandwidth. The textures represent 57.3\% of the total amount of downloaded bytes at 2.5 Mbps, and 70.2\% at 10 Mbps.
In other words, our system tends to favor geometry segments when the bandwidth is low, and favor texture segments when the bandwidth increases.


@@ -3,7 +3,7 @@
In this chapter, we take a small step back from interaction and propose a system with simple interactions that nevertheless addresses most of the open problems mentioned in Section~\ref{i:challenges}.
We take inspiration from video streaming: building on the similarities between video streaming and 3D streaming (seen in Section~\ref{i:video-vs-3d}), we benefit from the efficiency of DASH (seen in Section~\ref{sote:dash}) for streaming 3D content.
DASH is based on content preparation and structuring, which not only helps the streaming policies but also leads to a scalable and efficient system, since it moves the load completely from the server to the clients.
A DASH client is simply a client that downloads the structure of the content, and then, depending on its needs independently of the server, decides what to download.
A DASH client downloads the structure of the content, and then, depending on its needs independently of the server, decides what to download.
In this chapter, we show how to mimic DASH video with 3D streaming, and we develop a system that keeps DASH benefits.
Section~\ref{d3:dash-3d} describes our content preparation and metadata, and all the preprocessing that is done to our model to allow efficient streaming.