proof read

Thomas Forgione 2019-09-25 11:51:07 +02:00
parent c491eb308b
commit 47b0ddf540
13 changed files with 359 additions and 158 deletions


@ -20,7 +20,7 @@ anchorcolor = blue]{hyperref}
\usepackage{todonotes}
\usepackage{booktabs}
\usepackage{etoc, blindtext}
\usepackage{fontspec,fontawesome}
\usepackage{setspace}
\onehalfspacing{}
@ -29,6 +29,23 @@ anchorcolor = blue]{hyperref}
\newcommand{\tikzwidth}{0.95\columnwidth}
\newcommand{\tikzheight}{0.7\columnwidth}
\tikzset{
double arrow/.style args={#1 colored by #2 and #3}{
-stealth,line width=#1,#2, % first arrow
postaction={draw,-stealth,#3,line width=(#1)/3,
shorten <=(#1)/3,shorten >=2*(#1)/3}, % second arrow
}
}
\tikzset{
double ended double arrow/.style args={#1 colored by #2 and #3}{
stealth-stealth,line width=#1,#2, % first arrow
postaction={draw,stealth-stealth,#3,line width=(#1)/3,
shorten <=(#1)/3+1.5,shorten >=2*(#1)/3}, % second arrow
}
}
\addtokomafont{chapterprefix}{\raggedleft}
\addtokomafont{chapter}{\fontsize{30}{38}\selectfont}
\addtokomafont{section}{\huge}


@ -1,34 +1,19 @@
\fresh{}
\section{Introduction}
In the previous chapter, we discussed the theoretical aspects of 3D streaming based on DASH\@.
In the previous chapter, we discussed the theoretical aspects of DASH-based 3D streaming\@.
We showed different ways of structuring and downloading content, and we evaluated the parameters.
In this chapter, we detail every aspect of the implementation of the DASH-3D client, from the way segments are downloaded to how they are rendered.
Implementation was noticeably absent from the previous chapter, and this is why, in this chapter, we detail every aspect of the implementation of the DASH-3D client, from the way segments are downloaded to how they are rendered.
All DASH clients are built from the same basic bricks, as shown in Figure~\ref{d3i:dash-scheme}:
\begin{itemize}
\item the \emph{access client}, which is the part that deals with making requests and receiving responses;
\item the \emph{segment parser}, which decodes the data downloaded by the access client;
\item the \emph{access client}, which is the module that deals with making requests and receiving responses;
\item the \emph{segment parsers}, which decode the data downloaded by the access client, whether it be materials, geometry or textures;
\item the \emph{control engine}, which analyses the bandwidth to dynamically adapt to it;
\item the \emph{media engine}, which renders the multimedia content to the screen and the user interface.
\item the \emph{media engine}, which renders the multimedia content and the user interface to the screen.
\end{itemize}
\tikzset{
double arrow/.style args={#1 colored by #2 and #3}{
-stealth,line width=#1,#2, % first arrow
postaction={draw,-stealth,#3,line width=(#1)/3,
shorten <=(#1)/3,shorten >=2*(#1)/3}, % second arrow
}
}
\tikzset{
double ended double arrow/.style args={#1 colored by #2 and #3}{
stealth-stealth,line width=#1,#2, % first arrow
postaction={draw,stealth-stealth,#3,line width=(#1)/3,
shorten <=(#1)/3+1.5,shorten >=2*(#1)/3}, % second arrow
}
}
\begin{figure}[ht]
\centering
\begin{tikzpicture}
@ -88,13 +73,13 @@ All DASH clients are built from the same basic bricks, as shown in Figure~\ref{d
\draw[double arrow=5pt colored by RoyalBlue and white] (1.625, 4.5) -- (1.625, 5.75);
\end{tikzpicture}
\caption{Scheme of a server and a DASH client\label{d3i:dash-scheme}}
\caption{DASH client-server architecture\label{d3i:dash-scheme}}
\end{figure}
We want to have two implementations of such a client:
We want to have two implementations of such a client.
\begin{itemize}
\item \textbf{one in JavaScript}, so we can easily have demos and conduct user-studies with real users trying the real interface on desktop or mobile devices (this implementation is detailed in Section~\ref{d3i:js-implementation});
\item \textbf{one in Rust}, so we can easily run simulations with maximum performance to be able to compare different setups or parameters with more precision (this implementation is detailed in Section~\ref{d3i:rust-implementation}).
\item \textbf{We need a client available on the web}, so we can easily have demos and conduct user-studies with real users trying the real interface on desktop or mobile devices (this implementation is detailed in Section~\ref{d3i:js-implementation}). This client is written in JavaScript.
\item \textbf{We need a client that compiles to native code}, so we can easily run simulations with maximum performance to be able to compare different setups or parameters with more precision (this implementation is detailed in Section~\ref{d3i:rust-implementation}). This client is written in Rust.
\end{itemize}
Our implementation also contains the software that preprocesses the model and structures it in a DASH manner.


@ -2,21 +2,21 @@
\copied{}
Our work in this chapter started with the question: can DASH be used for NVE\@?
The answer is \textit{yes}.
The answer is \emph{yes}.
In answering this question, we contributed by showing how to organize a polygon soup and its textures into a DASH-compliant format that (i) includes a minimal amount of metadata that is useful for the client, (ii) organizes the data to allow the client to get the most useful content first.
We further show that this metadata, precomputed offline, is sufficient to design and build a DASH client that is adaptive: it can selectively download segments within its view and make intelligent decisions about what to download, balancing between geometry and texture while adapting to network bandwidth.
\fresh{}
Exploiting DASH's concepts to design 3D streaming systems allows us to tackle some of the issues that were raised in the previous chapter.
This way, our system answers, at least partially, all the open problems we mentioned in Section~\ref{i:challenges}.
\begin{itemize}
\item \textbf{It has built-in support for materials and textures}: we use a DASH adaptation set for the materials, and the average color of each texture is given in the MPD, meaning that a client is not forced to render everything in white while it does not have the textures for the materials.
\item \textbf{It doesn't require any computation on the server side}: the only computation required is preprocessing the model and creating metadata to allow a client to make smart decisions. Once those precomputations are done, the artifacts can be deployed to a static server like Apache or nginx and all the computational load is moved to the client, making this solution scalable.
\item \textbf{It has support for multi-resolution}: in our implementation, we use multi-resolution textures, and even though multi-resolution geometry is not implemented yet, the challenge here lies more on the compression side than on the streaming side. Once a portion of geometry is encoded into different levels of detail, we just have to create representations and segments for those levels and define their corresponding utility.
\item \textbf{Performance has been taken into consideration}, and even though the many textures and the heterogeneity of our model prevent us from reaching optimal performance, we still manage to restructure the content on the client side to have a decent framerate.
\item \textbf{It prepares and structures the content in a way that enables streaming}: all this preparation is precomputed, and all the content is structured, even materials and textures. Furthermore, textures are prepared in a multi-resolution manner, and even though multi-resolution geometry is not discussed here, the difficulty of integrating it in this system seems moderate: we could encode levels of detail in different representations and define a utility metric for each representation, and the system should adapt naturally.
\item \textbf{We are able to estimate the utility of each segment} by exploiting all the metadata given in the MPD and by analysing the camera parameters of the user.
\item \textbf{We proposed a few streaming policies}, from the easiest to implement to the more complex, that are able to exploit the utility metrics we defined in order to guess the best decision.
\item \textbf{The implementation is efficient}: the content preparation allows a client to get all the information it needs from metadata and the server has nothing else to do than to serve files. Special attention has been granted to the client's performance, and explanations are given in Chapter~\ref{d3i}.
\end{itemize}
However, the work described in this chapter does not take any quality of experience aspects into account.
We designed a 3D streaming system, but did not consider interaction at all, even though we acknowledged it is a critical aspect for 3D streaming in Chapter~\ref{bi}.
However, the work described in this chapter does not take any quality of experience metrics into account.
We designed a 3D streaming system, but we kept the interaction system as simple as possible.
Dealing with interaction while dealing with all of the other problems we try to solve seems hard, and we believe keeping the interaction simple was a necessary step to build a solid 3D streaming system.
Now that we have this system, we are able to work again on the interaction problem and our work and conclusions are given in Chapter~\ref{sb}.
% We believe our proposed DASH for NVE is flexible enough for the community to start the simplicity and ease of deployment of DASH for NVE and to start investigating different streaming strategies to improve the quality of experience of NVE users.


@ -1,10 +1,12 @@
\fresh{}
\section{Introduction}
In Section~\ref{i:video-vs-3d}, we presented the similarities and differences between video and 3D.
We highlighted the fact that knowledge about video streaming is helpful to design a 3D streaming system.
We also presented the main concepts of DASH (Dynamic Adaptive Streaming over HTTP) in Section~\ref{sote:dash}.
DASH is made to be content agnostic, and even though it is almost exclusively applied to video streaming nowadays, we believe it is still suitable for 3D streaming.
In this chapter, we show our work on adapting DASH for 3D streaming.
In this chapter, we take a little step back from interaction and propose a system with very basic interaction but that answers most of the open problems mentioned in Section~\ref{i:challenges}.
We take massive inspiration from video streaming, since we have seen in Section~\ref{i:video-vs-3d} how closely related video streaming and 3D streaming are, and in Section~\ref{sote:dash} how efficient DASH, the standard for video streaming, is.
DASH is based on content preparation and structuring, which helps not only the streaming policies that rely on it but also the performance of the system, since it completely removes the load on the server side.
A DASH client is simply a client that downloads the structure of the content and then, depending on its needs, decides by itself what to download.
In this chapter, we show how, by mimicking DASH with 3D streaming, we develop a system that keeps those benefits.
Section~\ref{d3:dash-3d} describes our content preparation, and all the preprocessing that is done to our model to allow efficient streaming.
Section~\ref{d3:dash-client} gives possible implementations of clients that exploit the content structure.
Section~\ref{d3:evaluation} evaluates the impact of the different parameters that appear both in the content preparation and the clients.


@ -3,14 +3,22 @@
Before talking about 3D streaming, we need to define what a 3D model is and how it is rendered.
\subsection{Content of a 3D model}
A 3D model consists in a set of data.
A 3D model consists of 3D points (that are called \emph{vertices}), texture coordinates, normals, faces, materials and textures.
The Wavefront OBJ is probably the best to give an introduction to 3D models since it describes all these elements.
\begin{itemize}
\item \textbf{Vertices} are simply 3D points;
\item \textbf{Faces} are polygons defined from vertices (most of the time, they are triangles);
\item \textbf{Textures} are images that can be applied to faces;
\item \textbf{Texture coordinates} are information added to a face to describe how the texture should be applied on a face;
\item \textbf{Normals} are 3D vectors that can give information about light behaviour on a face.
\end{itemize}
The Wavefront OBJ is probably the best format to give an example of a 3D model since it describes all these elements in text format.
A 3D model encoded in the OBJ format typically consists of two files: the materials file (\texttt{.mtl}) and the object file (\texttt{.obj}).
\paragraph{}
The materials file declares all the materials that the object file will reference.
Each material has a name, ambient, diffuse and specular colors, as well as texture maps.
Each material has a name, and can have photometric properties such as ambient, diffuse and specular colors, as well as texture maps.
A simple material file is shown in Listing~\ref{i:mtl}.
\paragraph{}
@ -26,7 +34,7 @@ Faces are declared by using the indices of these elements. A face is a polygon w
\item \texttt{f 1/1/1 2/3/3 3/4/4} defines a triangle with both texture coordinates and normals.
\end{itemize}
It can include materials from a material file (\texttt{mtllib path.mtl}) and use apply it to faces.
It can include materials from a material file (\texttt{mtllib path.mtl}) and apply the materials that it declares to faces.
A material is applied by using the \texttt{usemtl} keyword, followed by the name of the material to use.
The faces declared after a \texttt{usemtl} are painted using the material in question.
An example of an object file is shown in Listing~\ref{i:obj}.
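
The face syntax described above can be illustrated with a minimal sketch of a parser for a single \texttt{f} line. The function name \texttt{parse\_face} is hypothetical and this is not the parser used in our implementation; it only shows how the \texttt{vertex/texture/normal} index triples can be read, with missing indices reported as \texttt{None}.

```python
def parse_face(line):
    """Parse one OBJ face declaration into (vertex, texcoord, normal) triples.

    Indices are kept 1-based, exactly as written in the file.
    Missing texture coordinate or normal indices are returned as None.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "f":
        raise ValueError("not a face declaration: %r" % line)
    corners = []
    for token in tokens[1:]:
        parts = token.split("/")
        # Pad "1" or "1/2" to three fields so every corner has the same shape.
        parts += [""] * (3 - len(parts))
        corners.append(tuple(int(p) if p else None for p in parts[:3]))
    return corners
```

For instance, \texttt{parse\_face("f 1/1/1 2/3/3 3/4/4")} yields one triple per corner of the triangle.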
@ -67,14 +75,14 @@ To understand how performance is impacted by the structure of the model, we need
However, due to the way the materials and textures work, we are forced to call \texttt{glDrawArrays} at least as many times as there are materials in the model.
Minimizing the number of materials used in a 3D model is thus critical for rendering performance.
However, frustum culling can be used to improve the performance of rendering.
Another way to improve the performance of rendering is \textbf{frustum culling}.
Frustum culling is a technique that consists in avoiding drawing objects that are not in the field of view of the user's camera.
Frustum culling is efficient when there are many objects in a scene since it gives potential for skips.
It is particularly efficient when there are many objects in a scene since it gives potential for skips.
These two aspects are somewhat contradictory, and to get the best performance for 3D rendering, one must ensure that:
\begin{itemize}
\item as few materials as possible are used, and most objects that share materials are drawn together in a single \texttt{glDrawArrays} call;
\item objects are not all drawn together and regrouped by location to keep the frustum culling efficient.
\item objects are not all drawn together, but grouped by location, to keep the frustum culling efficient.
\end{itemize}
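
The trade-off above can be sketched in a few lines. This is only an illustration under stated assumptions, not the renderer's actual code: each object is assumed to carry a bounding sphere, the frustum is assumed to be given as planes with unit normals pointing inwards, and the names \texttt{sphere\_in\_frustum} and \texttt{plan\_draw\_calls} are hypothetical. Objects that fail the frustum test are skipped, and the remaining ones are grouped by material so that each material costs one draw call.

```python
from collections import defaultdict


def sphere_in_frustum(center, radius, planes):
    """Return True if a bounding sphere may be visible.

    Each plane is ((nx, ny, nz), d) with a unit normal pointing inside the
    frustum, so a point p lies on the inner side when dot(n, p) + d >= 0.
    The test is conservative: it never rejects a visible sphere.
    """
    for (nx, ny, nz), d in planes:
        signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_distance < -radius:
            return False  # entirely behind one plane: the object can be skipped
    return True


def plan_draw_calls(objects, planes):
    """Group visible objects by material: one draw call per material.

    Each object is a (name, material, center, radius) tuple.
    """
    calls = defaultdict(list)
    for name, material, center, radius in objects:
        if sphere_in_frustum(center, radius, planes):
            calls[material].append(name)
    return dict(calls)
```

With many small objects, more of them can be culled, but objects sharing a material are then less likely to be merged, which is the tension described above.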


@ -1,6 +1,6 @@
\fresh{}
\section{Open problems}
\section{Open problems\label{i:challenges}}
The objective of our work is to design a system that allows a user to access remote 3D content and that guarantees good quality of service and quality of experience.
A 3D streaming client has lots of tasks to accomplish:
@ -13,25 +13,15 @@ A 3D streaming client has lots of tasks to accomplish:
\item manage the interaction with the user.
\end{itemize}
To add to the challenge, we hope to have a client that would be portable between devices, and that would work easily on both desktop and mobile setups.
We have to set up a client-server architecture while respecting constraints required for performant software:
\begin{itemize}
\item \textbf{for the server}, since a server must serve many clients, and a solution that requires even a little computation on the server will scale poorly;
\item \textbf{for the client}, since the end user will use their own device, whether it be an old computer or a mobile device, and has many other things to do than simply downloading content.
\end{itemize}
This opens multiple problems that we need to take care of.
\subsection{Content preparation}
Any preprocessing that can be done on our 3D data gives us an advantage since it consists in computations that will not be needed live, neither for the server nor for the client.
Furthermore, for streaming, data needs to be split into chunks that are requested separately, so preparing those chunks in advance also gives us an advantage during the streaming.
Any preprocessing that can be done on our 3D data gives us a strategic advantage since it consists in computations that will not be needed live, neither for the server nor for the client.
Furthermore, for streaming, data needs to be split into chunks that are requested separately, so preparing those chunks in advance can also help the streaming.
\subsection{Chunk utility}
Once our content is prepared and split in chunks, we need to be able to rate those chunks depending to the user position.
A chunk that contains data in the field of view of the user must have a higher score that a chunk outside of it; a chunk that is close to the camera must have a higher score that a chunk far away from the camera, etc\ldots
Once our content is prepared and split into chunks, we need to be able to rate those chunks depending on the user's position.
A chunk that contains data in the field of view of the user should have a higher score than a chunk outside of it; a chunk that is close to the camera should have a higher score than a chunk far away from the camera, etc\ldots
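
As a rough illustration of these two requirements, the following sketch rates a chunk from the camera position and viewing direction. The function \texttt{chunk\_utility}, the view cone used as a stand-in for the field of view, and the $0.1$ penalty for chunks outside of it are all assumptions made for this example; they are not the utility metrics defined later in this thesis.

```python
import math


def chunk_utility(chunk_center, camera_position, camera_direction, half_fov_deg=45.0):
    """Hypothetical score: higher when the chunk is in view and close to the camera."""
    offset = [c - p for c, p in zip(chunk_center, camera_position)]
    distance = math.sqrt(sum(o * o for o in offset))
    direction_norm = math.sqrt(sum(d * d for d in camera_direction))
    if direction_norm == 0.0:
        raise ValueError("camera_direction must be a non-zero vector")
    if distance == 0.0:
        return 1.0  # the camera sits on the chunk: treat it as fully useful
    # Cosine of the angle between the viewing direction and the chunk.
    cos_angle = sum(o * d for o, d in zip(offset, camera_direction)) / (distance * direction_norm)
    in_view = cos_angle >= math.cos(math.radians(half_fov_deg))
    visibility = 1.0 if in_view else 0.1  # assumed penalty outside the view cone
    return visibility / (1.0 + distance)
```

At equal distance, a chunk in the view cone scores ten times higher than one behind the camera, and among visible chunks the score decreases with distance.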
\subsection{Streaming policies}
Rating the chunks is not enough, there are other parameters that need to be taken into account.
@ -40,12 +30,18 @@ The user interaction can also be taken into account: if we are able to perdict t
This is why we need to define streaming policies that rely on chunk utilities to determine the optimal segments to download.
\subsection{Evaluation}
All the problems mentioned earlier yield many ideas and solutions that have different parameters.
We need to compare the different options we have for each of the previous problems, and evaluate their impact in terms of Quality of Service and Quality of Experience.
All the problems mentioned earlier yield many ideas and solutions with different parameters.
We need to compare the different options we have for each of the previous problems, and evaluate their impact in terms of quality of service and quality of experience.
\subsection{Implementation}
Implementation is a key problem for our system.
We need to write software that answers the problems mentioned earlier (content preparation, chunk utility, streaming policies) and to develop a client that performs 3D rendering in an efficient manner in order to leave resources to the other tasks that are needed.
Furthermore, since we want to be able to evaluate our systems, user studies are required and using web technologies are what make this possible.
We have to set up a client-server architecture that answers the problems mentioned earlier (content preparation, chunk utility, streaming policies).
This implementation must respect constraints required for performant software:
\begin{itemize}
\item \textbf{for the server}, since a server must serve many clients, and a solution that requires even a low computational load on the server will scale poorly;
\item \textbf{for the client}, since the end user will use their own device, whether it be an old computer or a mobile device, and the implementation must be efficient enough to leave resources (such as CPU or memory) for the other tasks it has to accomplish.
\end{itemize}
Furthermore, since we want to be able to evaluate our systems, user studies are required and using web technologies is a way to simplify this task.
Therefore, part of our software needs to be runnable from a web browser.


@ -1,26 +1,27 @@
\section{Thesis outline}
First, in Chapter~\ref{sote}, we present a review of the state of the art on the fields that are interesting for us.
This chapter starts with video streaming.
Then it reviews the different manner of performing 3D streaming.
First, in Chapter~\ref{sote}, we present a review of the state of the art on the fields that we are interested in.
This chapter starts by analysing the standards of video streaming.
Then it reviews the different manners of performing 3D streaming.
The last section of this chapter focuses on 3D interaction.
Then, in Chapter~\ref{bi}, we analyse the impact of the UI on navigation and streaming in a 3D scene.
We first present a user study that we conducted on 50 people that shows that adding 3D objects behaving as navigation aid can have great impact on how easy it is for a user to perform tasks such as finding objects.
We then set up a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
Finally, we analyse how the presence of navigation aid objects impacts the streaming, and we propose and evaluate a few techniques that rely on these objects and that can increase the quality of experience.
We first develop a basic interface for navigating in 3D and we introduce 3D objects called \emph{bookmarks} that help users navigate in the scene.
We then present a user study that we conducted on 50 people that shows that bookmarks have a great impact on how easy it is for a user to perform tasks such as finding objects.
Then, we set up a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
Finally, we analyse how the presence of bookmarks impacts the streaming, and we propose and evaluate a few streaming policies that rely on precomputations that can be made thanks to bookmarks and that can increase the quality of experience.
In Chapter~\ref{d3}, we develop the most important contribution of this thesis: DASH-3D.
DASH-3D is an adaptation of the DASH standard for video streaming.
We first describe how we adapt the concepts of DASH to 3D content.
We then present a client that can benefit for the DASH format.
DASH-3D is an adaptation of the DASH standard for 3D streaming.
We first describe how we adapt the concepts of DASH to 3D content, including the segmentation of content in \emph{segments}.
We then define utility metrics that associate a score with each segment depending on the camera's position.
Then, we present a client and various streaming policies based on our utilities that can benefit from the DASH format.
We finally evaluate the different parameters of our client.
In Chapter~\ref{d3i}, we explain the whole implementation of DASH-3D.
Implementing DASH-3D required a lot of effort, and since both user studies and simulations are required, we describe the two clients we implemented: one client using web technologies to enable easy user studies and one client native that allows us to run efficient simulations and precisely compare the impact of the parameters of our system.
Implementing DASH-3D required a lot of effort, and since both user studies and simulations are required, we describe the two clients we implemented: one client using web technologies to enable easy user studies and one client that is compiled to native code and that allows us to run efficient simulations and precisely compare the impact of the parameters of our system.
In Chapter~\ref{sb}, we integrate back the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
We first propose a new style of navigation aid, and we then explain how simply reusing the ideas from Chapter~\ref{bi} is not sufficient.
We then explain a more efficient way of applying those ideas.
Finally, we present a user study that provides us with traces on which we can perform simulations.
We evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.
We first develop an interface that allows desktop as well as mobile devices to navigate in a 3D scene being streamed, and that introduces a new style of bookmarks.
We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient precomputations that can enhance the streaming.
Finally, we present a user study that provides us with traces on which we can perform simulations, and we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.\todo{maybe only qos here}


@ -9,14 +9,18 @@ Analyzing the similarities and the differences between the video and the 3D scen
One of the main differences between video and 3D streaming is the persistence of data.
In video streaming, only one second of video is required at a time.
Of course, most video streaming services prefetch some future chunks, and keep some previous ones in cache.
Of course, most video streaming services prefetch some future chunks, and keep some previous ones in cache, but a minimal system could work without latency and keep in memory only two chunks: the current one and the next one.
In 3D streaming, each chunk is part of a scene, and not only many chunks are required to perform a satisfying rendering for the user, but it is impossible to know in advance what chunks are necessary to perform a rendering.
In 3D streaming, each chunk is part of a scene, and already a few problems appear here:
\begin{itemize}
\item depending on the user's field of view, many chunks may be required to perform a single rendering;
\item chunks do not become obsolete the way they do in video: a user navigating in a 3D scene may come back to the same spot after some time, or see the same objects from elsewhere in the scene.
\end{itemize}
\subsection{Multiresolution}
All the major video streaming platforms support multiresolution streaming.
This means that a client can choose the resolution at which the user requests the content.
This means that a client can choose the resolution at which it requests the content.
It can be chosen directly by the user or automatically determined by analysing the available resources (size of the screen, download bandwidth, device performance, etc\ldots).
\begin{figure}[th]
@ -35,20 +39,20 @@ In both cases, an algorithm for content streaming has to acknowledge those diffe
In video streaming, most of the data (in terms of bytes) is used for images.
Thus, the most important thing a video streaming system should do is optimize the image streaming.
That's why, on a video on YouTube for example, there may be 6 resolutions for images (144p, 240p, 360p, 480p, 720p and 1080p) but having only 2 resolutions for sound.
That's why, on a video on YouTube for example, there may be 6 resolutions for images (144p, 240p, 360p, 480p, 720p and 1080p) but only 2 resolutions for sound.
This is one of the main differences between video and 3D streaming: in a 3D scene, the geometry and the textures are approximately the same size, and work to improve the streaming needs to be performed on both.
\subsection{Chunks of data}
In order to be able to perform streaming, data needs to be segmented in order for a client to be able to request chunks of data and display it to the user while requesting another chunk.
In order to be able to perform streaming, data needs to be segmented so that a client can request chunks of data and display it to the user while requesting another chunk.
In video streaming, data chunks typically consist in a few seconds of video.
In mesh streaming, data can either be segmented by grouping faces into chunks, with a certain number of faces per chunk, or, in the case of progressive meshes, be segmented into a base mesh and different chunks encoding the data needed to increase the resolution of the previous level of detail.
In mesh streaming, data can either be segmented by grouping faces into chunks, with a certain number of faces per chunk, or, in the case of progressive meshes, be segmented into a chunk containing the base mesh and different chunks encoding the data needed to increase the resolution of the previous level of detail.
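
The first option, a fixed number of faces per chunk, can be sketched as follows. The helper \texttt{split\_faces\_into\_chunks} is a hypothetical name used only for illustration; it assumes the faces are already available as a list, in the order in which they should be streamed.

```python
def split_faces_into_chunks(faces, faces_per_chunk):
    """Split a list of faces into consecutive chunks of at most faces_per_chunk faces."""
    if faces_per_chunk <= 0:
        raise ValueError("faces_per_chunk must be positive")
    return [
        faces[start:start + faces_per_chunk]
        for start in range(0, len(faces), faces_per_chunk)
    ]
```

Only the last chunk may contain fewer faces than requested, and a client can render each chunk as soon as it is received while requesting the next one.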
\subsection{Interaction}
The way of interacting with the content is probably the most important difference between video and 3D.
In a video interface, there is only one degree of freedom: the time.
The only things a user can do is watch the video (without interacting), pause or resume it, or jump to another moment in the video.
The only thing a user can do is let the video play itself, pause or resume it, or jump to another moment in the video.
Even though these interactions seem easy to handle, giving the best possible experience to the user is already challenging. For example, to perform these few actions, YouTube gives the user multiple options.
\begin{itemize}
@ -72,6 +76,192 @@ Even though these interactions seem easy to handle, giving the best possible exp
\end{itemize}
There are even ways of controlling the other options, for example, \texttt{F} puts the player in fullscreen mode, up and down arrows change the sound volume, \texttt{M} mutes the sound and \texttt{C} activates the subtitles.
All the interactions are summed up in Figure~\ref{i:youtube-keyboard}.
\newcommand{\relativeseekcontrol}{LightBlue}
\newcommand{\absoluteseekcontrol}{LemonChiffon}
\newcommand{\playpausecontrol}{Pink}
\newcommand{\othercontrol}{PalePaleGreen}
\newcommand{\keystrokescale}{0.6}
\newcommand{\tuxlogo}{\FA\symbol{"F17C}}
\newcommand{\keystrokemargin}{0.1}
\newcommand{\keystroke}[5]{%
\draw[%
fill=white,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
\node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
}
\newcommand{\keystrokebg}[6]{%
\draw[%
fill=#6,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
\node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
}
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=\keystrokescale, every node/.style={scale=\keystrokescale}]
% Escape key
\keystroke{0}{1}{-0.75}{0}{ESC};
% F1 - F4
\begin{scope}[shift={(1.5, 0)}]
\foreach \key/\offset in {F1/1,F2/2,F3/3,F4/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% F5 - F8
\begin{scope}[shift={(6,0)}]
\foreach \key/\offset in {F5/1,F6/2,F7/3,F8/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% F9 - F12
\begin{scope}[shift={(10.5,0)}]
\foreach \key/\offset in {F9/1,F10/2,F11/3,F12/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% Number rows
\foreach \key/\offset in {`/0,-/11,=/12,\textbackslash/13}
\keystroke{\offset}{1+\offset}{-1.75}{-1}{\key};
\foreach \key/\offset in {1/1,2/2,3/3,4/4,5/5,6/6,7/7,8/8,9/9,0/10}
\keystrokebg{\offset}{1+\offset}{-1.75}{-1}{\key}{\absoluteseekcontrol};
% Delete char
\keystroke{14}{15.5}{-1.75}{-1}{DEL};
% Tab char
\keystroke{0}{1.5}{-2.5}{-1.75}{Tab};
% First alphabetic row
\begin{scope}[shift={(1.5,0)}]
\foreach \key/\offset in {Q/0,W/1,E/2,R/3,T/4,Y/5,U/6,I/7,O/8,P/9,[/10,]/11}
\keystroke{\offset}{1+\offset}{-2.5}{-1.75}{\key};
\end{scope}
% Caps lock
\keystroke{0}{1.75}{-3.25}{-2.5}{Caps};
% Second alphabetic row
\begin{scope}[shift={(1.75,0)}]
\foreach \key/\offset in {A/0,S/1,D/2,G/4,H/5,;/9,'/10}
\keystroke{\offset}{1+\offset}{-3.25}{-2.5}{\key};
\keystrokebg{3}{4}{-3.25}{-2.5}{F}{\othercontrol}
\keystrokebg{6}{7}{-3.25}{-2.5}{J}{\relativeseekcontrol};
\keystrokebg{7}{8}{-3.25}{-2.5}{K}{\playpausecontrol};
\keystrokebg{8}{9}{-3.25}{-2.5}{L}{\relativeseekcontrol};
\end{scope}
% Enter key
\draw[%
fill=white,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (13.6, -1.85) -- (15.4, -1.85) -- (15.4, -3.15) -- (12.85, -3.15) -- (12.85, -2.6) -- (13.6, -2.6) -- cycle;
\node[right] at(12.85, -2.875) {Enter $\hookleftarrow$};
% Left shift key
\keystroke{0}{2.25}{-4}{-3.25}{$\Uparrow$ Shift};
% Third alphabetic row
\begin{scope}[shift={(2.25,0)}]
\foreach \key/\offset in {Z/0,X/1,V/3,B/4,N/5, /7,./8,\slash/9}
\keystroke{\offset}{1+\offset}{-4}{-3.25}{\key};
\keystrokebg{2}{3}{-4}{-3.25}{C}{\othercontrol};
\keystrokebg{6}{7}{-4}{-3.25}{M}{\othercontrol};
\end{scope}
% Right shift key
\keystroke{12.25}{15.5}{-4}{-3.25}{$\Uparrow$ Shift};
% Last keyboard row
\keystroke{0}{1.25}{-4.75}{-4}{Ctrl};
\keystroke{1.25}{2.5}{-4.75}{-4}{\tuxlogo};
\keystroke{2.5}{3.75}{-4.75}{-4}{Alt};
\keystrokebg{3.75}{9.75}{-4.75}{-4}{}{\playpausecontrol};
\keystroke{9.75}{11}{-4.75}{-4}{Alt};
\keystroke{11}{12.25}{-4.75}{-4}{\tuxlogo};
\keystroke{12.25}{13.5}{-4.75}{-4}{};
\keystroke{13.5}{15.5}{-4.75}{-4}{Ctrl};
% Arrow keys
\keystrokebg{16}{17}{-4.75}{-4}{$\leftarrow$}{\relativeseekcontrol};
\keystrokebg{17}{18}{-4.75}{-4}{$\downarrow$}{\othercontrol};
\keystrokebg{18}{19}{-4.75}{-4}{$\rightarrow$}{\relativeseekcontrol};
\keystrokebg{17}{18}{-4}{-3.25}{$\uparrow$}{\othercontrol};
% Numpad
\keystroke{19.5}{20.5}{-1.75}{-1}{Lock};
\keystroke{20.5}{21.5}{-1.75}{-1}{/};
\keystroke{21.5}{22.5}{-1.75}{-1}{*};
\keystroke{22.5}{23.5}{-1.75}{-1}{-};
\keystrokebg{19.5}{20.5}{-2.5}{-1.75}{7}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-2.5}{-1.75}{8}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-2.5}{-1.75}{9}{\absoluteseekcontrol};
\keystrokebg{19.5}{20.5}{-3.25}{-2.5}{4}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-3.25}{-2.5}{5}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-3.25}{-2.5}{6}{\absoluteseekcontrol};
\keystrokebg{19.5}{20.5}{-4}{-3.25}{1}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-4}{-3.25}{2}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-4}{-3.25}{3}{\absoluteseekcontrol};
\keystrokebg{19.5}{21.5}{-4.75}{-4}{0}{\absoluteseekcontrol};
\keystroke{21.5}{22.5}{-4.75}{-4}{.};
\keystroke{22.5}{23.5}{-3.25}{-1.75}{+};
\keystroke{22.5}{23.5}{-4.75}{-3.25}{$\hookleftarrow$};
\end{tikzpicture}
\vspace{0.5cm}
% Legend
\begin{tikzpicture}[scale=\keystrokescale]
\keystrokebg{7}{8}{-6}{-5}{}{\absoluteseekcontrol};
\node[right=0.2cm] at (7.5, -5.55) {Absolute seek keys};
\keystrokebg{7}{8}{-7}{-6}{}{\relativeseekcontrol};
\node[right=0.2cm] at (7.5, -6.55) {Relative seek keys};
\keystrokebg{7}{8}{-8}{-7}{}{\playpausecontrol};
\node[right=0.2cm] at (7.5, -7.55) {Play or pause keys};
\keystrokebg{7}{8}{-9}{-8}{}{\othercontrol};
\node[right=0.2cm] at (7.5, -8.55) {Other keys};
\end{tikzpicture}
\caption{YouTube keyboard shortcuts\label{i:youtube-keyboard}}
\end{figure}
Those interactions are different if the user is using a mobile device.
\begin{itemize}
@ -84,26 +274,26 @@ Those interactions are different if the user is using a mobile device.
\end{itemize}
\end{itemize}
When interacting with a 3D model, there are many approches.
Some interfaces mimic the video scenario, where the only variable is the time and the user has no control on the camera.
These interfaces are not interactive, and can be frustrating to the user if he does not feel free.
When it comes to 3D, there are many approaches to manage user interaction.
Some interfaces mimic the video scenario, where the only variable is the time and the camera follows a predetermined path on which the user has no control.
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add 2 degrees of freedom to the previous one: the user does not control the position of the camera but he can control the angle. This mimics the scenario of the 360 video.
Some other interfaces add 2 degrees of freedom to the previous one: the user does not control the position of the camera but he can control the angle. This mimics the scenario of the 360 video.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the position of the camera, and 2 being the angle (assuming the up vector is fixed; some interfaces allow changing it, which gives a sixth degree of freedom).
\subsection{Relationship between interface, interaction and streaming}
In both video and 3D streaming, streaming affects the interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, he might start moving around on the timeline, but if he sees that the streaming is just enough to not stall, he would most likely stay peaceful and just watch the video.
In both video and 3D systems, streaming affects the interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, he might start moving around on the timeline, but if he sees that the streaming is just enough to not stall, he might prefer to stay put and simply watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or totally give up and leave the video.
The same types of behaviour occur in 3D streaming, if a user is somewhere in a scene, and sees more data appearing, he might wait until enough data has arrived, but if he sees nothing happens, he would most likely leave to look for data somewhere else.
The same types of behaviour occur in 3D streaming: if a user is somewhere in a scene, and sees more data appearing, he might wait until enough data has arrived, but if he sees that nothing happens, he might leave to look for data somewhere else.
Those examples show how streaming can affect the interaction, but the interaction also affect the streaming.
Those examples show how streaming can affect the interaction, but the interaction also affects the streaming.
In a video streaming scenario, if a user is watching peacefully without interacting, the system just has to request the next chunks of video and display them.
However, if a user starts seeking at a different time of the streaming, the streaming would most likely stall until the system is able to gather the data it needs to resume the video.
Just like in the video setup, the way a user navigates in a networked virtual environment affects the streaming.
Moving slowly allows the system to collect and display data to the user, whereas moving frenetically puts more pressure on the streaming: the data that the system requested might be obsolete when the response arrives.
Moving slowly allows the system to collect and display data to the user, whereas moving frenetically puts more pressure on the streaming: the data that the system requested may be obsolete when the response arrives.
Moreover, the interface and the way elements are displayed to the user also impact his behaviour.
A streaming system can use this effect to its users' benefit by providing feedback on the streaming to the user via the interface.
@ -113,15 +303,11 @@ A user is more likely to click on the light grey part of the timeline that on th
\begin{figure}[th]
\centering
\begin{tikzpicture}
\node (S) at (0, 0) [minimum width=1.5cm,minimum height=0.5cm] {};
\node (I) at (-1.5, -1.5) [minimum width=1.5cm,minimum height=0.5cm] {};
\node (U) at (1.5, -1.5) [minimum width=1.5cm,minimum height=0.5cm] {};
\node at (S) {Streaming};
\node at (I) {Interface};
\node at (U) {User};
\draw[{Latex[length=3mm]}-{Latex[length=3mm]}] (S) -- (I);
\draw[{Latex[length=3mm]}-{Latex[length=3mm]}] (S) -- (U);
\draw[{Latex[length=3mm]}-{Latex[length=3mm]}] (I) -- (U);
\node (S) at (0, 0) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Streaming};
\node (I) at (-2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Interface};
\node (U) at (2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {User};
\draw[double ended double arrow=5pt colored by black and white] (S) -- (I);
\draw[double ended double arrow=5pt colored by black and white] (S) -- (U);
\draw[double arrow=5pt colored by black and white] (I) -- (U);
\end{tikzpicture}
\end{figure}

View File

@ -7,17 +7,16 @@ It allowed us to understand the problems linked to the dynamicity of both the us
\begin{itemize}
\item Navigating in a 3D scene can be complex, due to the many degrees of freedom, and tweaking the interface can increase the user's Quality of Experience.
\item The tweaks operated on the interface may have a drawback on the streaming aspect of the system.
\item Depending on how the interface is tweaked, the behaviour of the users may change and heuristics can be determined to benefit from this.
\item Adding bookmarks to the interface can have a negative impact on the quality of service of the system.
\item Having bookmarks in the scene biases the users' navigation and makes it more predictable: it is then possible to precompute data from bookmarks in order to benefit from this predictability.
\end{itemize}
However, the system described in this chapter has some drawbacks:
However, the system described in this chapter has some drawbacks and fails to answer some of the problems we mentioned in Section~\ref{i:challenges}.
\begin{itemize}
\item \textbf{It doesn't support materials and textures}: these elements are downloaded at the beginning of the interaction, and since they can have a massive size, this solution is not satisfactory for a system streaming an NVE\@.
\item \textbf{It still requires a heavy load on the server side}: even though the server is not performing online rendering of the scene, it still has to perform frustum and backface culling to find the faces to send to the client, and it also has to keep track of what each client has already downloaded, and what remains to be downloaded.
\item \textbf{No multi-resolution techniques are used}: in modern 3D streaming, mutli-resolution is a must-have. It prevents the user from waiting until all the data is arrived while still having a global, lower-resolution view of the content he's trying to access.
\item \textbf{The performance of the rendering has not been taken into account}: of course, a system for navigating in 3D scenes must have a sufficient framerate to guarantee a good Quality of Experience for users, and this chapter does not tackle at any point the difficulty to have many tasks to do at the same time (downloading data, uploading the OpenGL buffers, managing the user interaction, rendering the scene, etc\ldots).
\item \textbf{The content preparation and chunk utility is almost nonexistent}: the server knows all the data and simply determines what the client needs; it prepares the content and builds chunks on the fly. Furthermore, it has no support for materials or textures: in our setup, they are downloaded before the streaming starts.
\item \textbf{The streaming policy is basic}: the server traverses all the polygons and determines which polygons should be sent to the client.
\item \textbf{The implementation has sloppy\note{maybe \emph{sloppy} is too strong} performance}: since the content preparation and the streaming policy are performed on the fly, the server has to keep track of what the client already has (which consumes memory) and has to compute what should be sent next (which consumes CPU). Such a server therefore cannot scale. Moreover, no client performance has been taken into account since the client used in the user study did not have to perform streaming.
\end{itemize}
After learning these lessons, we show, in the next chapter, what can be done to alleviate these issues.

View File

@ -1,18 +1,18 @@
\copied{}
\section{Introduction}
With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.
For instance, Gaillard et al.~\cite{urban-data-visualisation} reported that the 3D scene for the city of Lyon takes more than 30 GB of data.
It has become impractical to download the whole 3D scene before the user begins to navigate in the scene.
A more common approach is to stream the required 3D content (models and textures) on demand, as the user moves around the scene.
Downloading the required 3D content the moment the user demands it, however, leads to ``popping effect'' where 3D objects materialize suddenly in the view of the user, due to the latency between requesting for and receiving the 3D content from the server~\cite{visibility-determination}.
Such latency can be quite high --- Varvello et al.\ reported a median of about 30 seconds for all 3D data in an avatar's surrounding to be loaded in high density Second Life regions under their experimental network conditions, due to a bottleneck at the server~\cite{second-life}.
For a smoother user experience, NVE typically prefetch 3D content, so that a 3D object is readily available for rendering when the object falls into the view of the user.
Efficient prefetching, however, requires the client or the server to predict where the user would navigate to in the future and retrieve the corresponding 3D content before the user reaches there.
In a typical scenario, users navigate along a continuous path in a NVE, leading to a significant overlap between the 3D content visible from the user's known current position and possible next positions (i.e., \textit{spatial data locality}).
Furthermore, there is a significant overlap between the 3D content visible from the current point in time to the next point in time (i.e., \textit{temporal data locality}).
Both forms of locality lead to content overlaps, thus making a correct prediction easier and a wrong prediction less costly. 3D content overlaps are particularly common in a NVE with open space, such as a 3D archaeological site or a 3D city.
% With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.
% For instance, Gaillard et al.~\cite{urban-data-visualisation} reported that the 3D scene for the city of Lyon takes more than 30 GB of data.
% It has become impractical to download the whole 3D scene before the user begins to navigate in the scene.
% A more common approach is to stream the required 3D content (models and textures) on demand, as the user moves around the scene.
% Downloading the required 3D content the moment the user demands it, however, leads to ``popping effect'' where 3D objects materialize suddenly in the view of the user, due to the latency between requesting for and receiving the 3D content from the server~\cite{visibility-determination}.
% Such latency can be quite high --- Varvello et al.\ reported a median of about 30 seconds for all 3D data in an avatar's surrounding to be loaded in high density Second Life regions under their experimental network conditions, due to a bottleneck at the server~\cite{second-life}.
%
% For a smoother user experience, NVE typically prefetch 3D content, so that a 3D object is readily available for rendering when the object falls into the view of the user.
% Efficient prefetching, however, requires the client or the server to predict where the user would navigate to in the future and retrieve the corresponding 3D content before the user reaches there.
% In a typical scenario, users navigate along a continuous path in a NVE, leading to a significant overlap between the 3D content visible from the user's known current position and possible next positions (i.e., \textit{spatial data locality}).
% Furthermore, there is a significant overlap between the 3D content visible from the current point in time to the next point in time (i.e., \textit{temporal data locality}).
% Both forms of locality lead to content overlaps, thus making a correct prediction easier and a wrong prediction less costly. 3D content overlaps are particularly common in a NVE with open space, such as a 3D archaeological site or a 3D city.
Navigating in an NVE with a large virtual space (usually through a 2D interface) is sometimes cumbersome.
In particular, a user may have difficulties reaching the right place to find information.
@ -27,7 +27,7 @@ In the worst case, the 3D objects corresponding to the current and destination v
Such movement to a bookmark may lead to a \textit{discovery latency}~\cite{second-life}, in which users have to wait for the 3D content for the new viewpoint to be loaded and displayed.
An analogy for this situation, in the context of video streaming, is seeking into a segment of video that has not been prefetched yet.
In this paper, we explore the impact of bookmarks on NVE navigation and streaming, and make several contributions.
In this chapter, we explore the impact of bookmarks on NVE navigation and streaming, and make several contributions.
First, we conducted a crowdsourcing experiment where 51 participants navigated in 3 virtual scenes to complete a task.
This experiment serves two purposes: (i) it validates our intuition that bookmarks significantly reduce the number of interactions and navigation time (on average the time needed to complete the task
for users with bookmarks is half the time for users without bookmarks); (ii) it produces a set of user interaction traces that we use for subsequent simulation experiments.

View File

@ -160,7 +160,7 @@ Figure~\ref{bi:mat1} shows the probability of visiting a bookmark (vertical axis
This figure shows that users tend to follow similar paths when consuming bookmarks.
Thus, we hypothesize that prefetching along those paths would lead to better image quality and lower discovery latency.
We use the following prefetching policy in this paper.
The policy used is the following.
We divide each chunk sent by the server into two parts.
The first part is used to fetch the content from the current viewpoint, using the \textsf{culling} streaming policy.
The second part is used to prefetch content from the bookmarks, according to their likelihood of being clicked next.
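As an illustration, one possible reading of this split can be written down as follows.
The symbols are assumptions introduced for this sketch only: $S$ is the size of a chunk, $\alpha \in [0, 1]$ is the fraction of the chunk reserved for the current viewpoint, and $p_i$ is the estimated probability that bookmark $i$ is clicked next, with $\sum_i p_i = 1$.
A proportional allocation would then give
\begin{equation*}
    S_{\text{culling}} = \alpha S,
    \qquad
    S_i = (1 - \alpha) \, S \, p_i ,
\end{equation*}
where $S_i$ is the part of the chunk dedicated to bookmark $i$.
For instance, with $S = 100$ faces, $\alpha = 0.5$ and three bookmarks with $p = (0.6, 0.3, 0.1)$, the chunk would contain $50$ faces for the current viewpoint and $30$, $15$ and $5$ faces for the three bookmarks respectively.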

View File

@ -1,14 +1,11 @@
\fresh{}
\section{3D Streaming}
\subsection{Progressive meshes}
It is not possible to speak about 3D streaming without speaking about progressive meshes.
Progressive meshes were introduced by Hughes Hoppe in 1996~\cite{progressive-meshes} and allow transmitting a mesh by send first a low resolution mesh, called \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
Progressive meshes were introduced by~\citet{progressive-meshes} and allow transmitting a mesh by sending a low resolution mesh first, called \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
To do so, an algorithm, called \emph{decimation algorithm}, removes vertices and faces by merging vertices (Figure~\ref{sote:progressive-scheme}).
Each time two vertices are merged, vertices and faces are removed from the original mesh, and the resolution of the model decreases a little.
When the model is light enough, it is encoded as is, and the operations needed to recover the initial resolution of the model are encoded as well.
Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
\begin{figure}[ht]
\centering
@ -65,18 +62,22 @@ This process reduces the time a user has to wait before seeing something, and in
\caption{Vertex split and edge collapse\label{sote:progressive-scheme}}
\end{figure}
Every time two vertices are merged, vertices and faces are removed from the original mesh, and the resolution of the model decreases a little.
When the model is light enough, it is encoded as is, and the operations needed to recover the initial resolution of the model are encoded as well.
Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
\subsection{glTF}
In a recent standardization effort, the Khronos group has proposed a generic format called glTF (GL Transmission Format~\cite{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It can contain a scene tree with cameras, meshes, buffers, materials, textures, animations and skinning information.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming.
However, glTF in itself does not address the problem of view-dependent 3D streaming which is required for large scene remote visualisation.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming which is required for large scene remote visualisation.
\subsection{3D Tiles}
3D Tiles is a specification for visualizing massive 3D geospatial data developed by Cesium and built on glTF\@.
\todo{add stuff here}
\copied{}
\subsection{Prefetching in NVE}

View File

@ -5,19 +5,20 @@
\subsection{DASH\@: the standard for video streaming\label{sote:dash}}
\copied{}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH~\cite{dash-std,dash-std-2}, is now a widely deployed
standard for streaming adaptive video content on the Web~\cite{dash-std-full}, made to be simple and scalable.
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH (\citet{dash-std,dash-std-2}), is now a widely deployed
standard for streaming adaptive video content on the Web (\citet{dash-std-full}), made to be simple and scalable.
\fresh{}
DASH is based on a clever way of structuring the content that allows a great adaptability during the streaming without requiring any server side computation.
\subsubsection{DASH structure}
All those pieces are structured in a Media Persentation Description (MPD) file, written in the XML format.
This file has 4 layers, the periods, the adaptation sets, the representations and the segments.
Each period can have many adaptation sets, each adaptation set can have many representation, and each representation can have many segments.
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has 4 layers: the periods, the adaptation sets, the representations and the segments.
An MPD is structured like a tree, meaning that each period can have many adaptation sets, each adaptation set can have many representations, and each representation can have many segments.
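To make this tree structure concrete, a minimal MPD skeleton could look as follows.
This is only an illustrative sketch: the identifiers, file names and attribute values are hypothetical, and most of the attributes that a real MPD would carry are omitted.
\begin{lstlisting}[language=XML]
<MPD>
  <!-- Layer 1: a period delimits content in time -->
  <Period id="chapter-1">
    <!-- Layer 2: an adaptation set delimits content by format -->
    <AdaptationSet mimeType="video/mp4">
      <!-- Layer 3: one representation per level of resolution -->
      <Representation id="720p" bandwidth="3000000" width="1280" height="720">
        <!-- Layer 4: short segments that can be requested independently -->
        <SegmentList duration="1">
          <SegmentURL media="chapter-1-720p-0001.mp4"/>
          <SegmentURL media="chapter-1-720p-0002.mp4"/>
        </SegmentList>
      </Representation>
      <Representation id="1080p" bandwidth="6000000" width="1920" height="1080">
        <SegmentList duration="1">
          <SegmentURL media="chapter-1-1080p-0001.mp4"/>
          <SegmentURL media="chapter-1-1080p-0002.mp4"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
\end{lstlisting}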
\paragraph{Periods.}
Periods are used to delimit content depending on the time. It can be used to delimit chapters, or to add advertisements that occur at the beginning, during or at the end of a video.
Periods are used to delimit content depending on the time.
They can be used to delimit chapters, or to add advertisements that occur at the beginning, during or at the end of a video.
\paragraph{Adaptation sets.}
Adaptation sets are used to delimit content depending on the format.
@ -27,37 +28,39 @@ In videos, most of the time, each period has at least one adaptation set contain
\paragraph{Representations.}
The representation level is the level DASH uses to offer the same content at different levels of resolution.
For example, an adaptation set containing images has a representation for each available resolution (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose its representation and change it during the video, but most importantly, since the software is able to estimate its downloading speed based on the time it took to download data in the past, it is able to find the optimal resolution, being the highest resolution that arrives on time to avoid stalling.
This allows a user to choose a representation and change it during the video, but most importantly, since the software is able to estimate its download speed based on the time it took to download data in the past, it is able to find the optimal resolution, that is, the highest resolution that the client can request without stalling.
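A simple way to formalise this choice, given here as a sketch rather than as the rule mandated by the standard, is the following.
The notation is introduced for this example only: $s_k$ is the size of the $k$-th of the last $n$ downloaded segments, $t_k$ is the time it took to download it, and $b_r$ is the bitrate announced for representation $r$.
The client estimates its throughput as
\begin{equation*}
    \hat{B} = \frac{\sum_{k=1}^{n} s_k}{\sum_{k=1}^{n} t_k},
\end{equation*}
and requests the representation
\begin{equation*}
    r^\star = \operatorname*{arg\,max}_{r} \left\{ b_r \mid b_r \leq \hat{B} \right\}.
\end{equation*}
For instance, if the last segment weighed $500$~kB and was downloaded in $1$~s, then $\hat{B} = 4$~Mbit/s; with representations available at $1.5$, $3$ and $6$~Mbit/s, the client would request the one at $3$~Mbit/s.
Actual clients typically smooth this estimate or also take the level of their playback buffer into account.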
\paragraph{Segments.}
Until this level of the MPD, content can be long.
For example, a representation of images of a chapter of a movie can be heavy and long to download.
However, downloading heavy files is not suitable for streaming because it prevents the dynamicity of it: if the user requests to change the level of resolution of a video, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
Until this level in the MPD, content has been divided but it is still far from being sufficiently divided to be streamed efficiently.
In fact, a representation of the images of a chapter of a movie is still a long video, and keeping such a big file is not possible since downloading heavy files is not suitable for streaming.
Heavy files prevent the dynamicity of streaming: if the user requests to change the level of resolution of a video, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
Segments are used to prevent this behaviour. They typically encode files that last approximately one second of video, and give the software a great ability to dynamically adapt to the system. If a user wants to seek somewhere else in the video, only one second of data can be lost, and only one second of data has to be downloaded for the playback to resume.
Segments are used to prevent this behaviour.
They typically encode files that last approximately one second of video, and give the software a great ability to dynamically adapt to the system.
If a user wants to seek somewhere else in the video, only one second of data can be lost, and only one second of data has to be downloaded for the playback to resume.
\subsubsection{Client side computation}
Once a video is encoded in DASH format, once the files have been structured and the MPD has been generated, they can simply be put on a static HTTP server that does no computation other than serving files when it receives requests.
Once a video has been encoded in DASH format, the files have been structured, and the MPD has been generated, all this data can simply be put on a static HTTP server that does no computation other than serving files when it receives requests.
All the intelligence and the decision making is moved to the client side.
A client typically starts by downloading the MPD file, and then proceeds to download segments of the different adaptation sets that it needs, estimating its download speed and deciding by itself whether it needs to change representation or not.
\subsection{DASH-SRD}
DASH-SRD (Spatial Relationship Description,~\cite{dash-srd}) is a feature that extends the DASH standard to allow stream only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions, that way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video (see Figure~\ref{sota:srd-png}).
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the file.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile describing the position of the tile in relation to the full video. An example of such a property is given in Listing~\ref{sota:srd-xml}.
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
This is especially interesting in the context of 3D streaming since we have this same pattern of a user viewing only a part of a content.
DASH-SRD (Spatial Relationship Description,~\cite{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure~\ref{sota:srd-png}.
That way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video.
\begin{figure}[th]
\centering
\includegraphics[width=\textwidth]{assets/state-of-the-art/video/srd.png}
\includegraphics[width=0.6\textwidth]{assets/state-of-the-art/video/srd.png}
\caption{DASH-SRD~\cite{dash-srd}\label{sota:srd-png}}
\end{figure}
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the file.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile describing the position of the tile in relation to the full video.
An example of such a property is given in Listing~\ref{sota:srd-xml}.
\begin{figure}[th]
\lstinputlisting[%
language=XML,
@ -80,6 +83,9 @@ This is especially interesting in the context of 3D streaming since we have this
]{assets/state-of-the-art/video/srd.xml}
\end{figure}
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
This is especially interesting in the context of 3D streaming since we have this same pattern of a user viewing only part of the content.
\subsection{Prefetching in video streaming}
\copied{}