This commit is contained in:
Thomas Forgione 2019-09-26 15:11:12 +02:00
parent a500d1e3cc
commit 98248fe751
No known key found for this signature in database
GPG Key ID: 203DAEA747F48F41
7 changed files with 32 additions and 424 deletions

View File

@ -1,88 +0,0 @@
\section{What is a 3D model?}
Before talking about 3D streaming, we need to define what is a 3D model and how it is rendered.
\subsection{Content of a 3D model}
A 3D model consists of several types of data:
\begin{itemize}
\item \textbf{Vertices} are simply 3D points;
\item \textbf{Faces} are polygons defined from vertices (most of the time, they are triangles);
\item \textbf{Textures} are images that can be applied to faces;
\item \textbf{Texture coordinates} are information added to a face to describe how the texture should be mapped onto it;
\item \textbf{Normals} are 3D vectors that can give information about light behaviour on a face.
\end{itemize}
The Wavefront OBJ format is probably the best format for giving an example of a 3D model, since it describes all these elements in plain text.
A 3D model encoded in the OBJ format typically consists of two files: the materials file (\texttt{.mtl}) and the object file (\texttt{.obj}).
\paragraph{}
The materials file declares all the materials that the object file will reference.
Each material has a name, and can have photometric properties such as ambient, diffuse and specular colors, as well as texture maps.
A simple material file is visible in Listing~\ref{i:mtl}.
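As an illustration, a single material with a diffuse texture map could be declared as follows (this is only a sketch; the material and texture names are hypothetical):
\begin{lstlisting}
newmtl brick
Ka 1.0 1.0 1.0
Kd 0.8 0.8 0.8
Ks 0.0 0.0 0.0
map_Kd brick.png
\end{lstlisting}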
\paragraph{}
The object file declares the 3D content of the objects.
It declares vertices, texture coordinates and normals from coordinates (e.g.\ \texttt{v 1.0 2.0 3.0} for a vertex, \texttt{vt 1.0 2.0} for a texture coordinate, \texttt{vn 1.0 2.0 3.0} for a normal).
These elements are numbered starting from 1.
Faces are declared by using the indices of these elements. A face is a polygon with any number of vertices and can be declared in several ways:
\begin{itemize}
\item \texttt{f 1 2 3} defines a triangle face that joins the first, the second and the third vertex declared;
\item \texttt{f 1/1 2/3 3/4} defines a similar triangle but with texture coordinates: the first texture coordinate is associated with the first vertex, the third with the second vertex, and the fourth with the third vertex;
\item \texttt{f 1//1 2//3 3//4} defines a similar triangle but using normals instead of texture coordinates;
\item \texttt{f 1/1/1 2/3/3 3/4/4} defines a triangle with both texture coordinates and normals.
\end{itemize}
An object file can include materials from a materials file (\texttt{mtllib path.mtl}) and apply the materials it declares to faces.
A material is applied by using the \texttt{usemtl} keyword, followed by the name of the material to use.
The faces declared after a \texttt{usemtl} are painted using the material in question.
An example of an object file is visible in Listing~\ref{i:obj}.
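As a minimal sketch, an object file fragment combining these elements (and using the hypothetical material above) could look like this:
\begin{lstlisting}
mtllib materials.mtl
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
vt 0.0 0.0
vt 1.0 0.0
vt 0.0 1.0
vn 0.0 0.0 1.0
usemtl brick
f 1/1/1 2/2/1 3/3/1
\end{lstlisting}
This fragment declares a single textured triangle whose three vertices share the same normal.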
\begin{figure}[th]
\centering
\begin{subfigure}[b]{0.4\textwidth}
\lstinputlisting[
language=XML,
caption={An object file describing a cube},
label=i:obj,
]{assets/introduction/cube.obj}
\end{subfigure}\quad%
\begin{subfigure}[b]{0.4\textwidth}
\lstinputlisting[
language=XML,
caption={A material file describing a material},
label=i:mtl,
]{assets/introduction/materials.mtl}
\vspace{0.2cm}
\includegraphics[width=\textwidth]{assets/introduction/cube.png}
\caption*{A rendering of the cube}
\end{subfigure}
\caption{The OBJ representation of a cube and its render\label{i:cube}}
\end{figure}
\subsection{Rendering a 3D model\label{i:rendering}}
To be able to render a model, the data (vertices, texture coordinates, normals, faces and textures) must first be sent to the GPU; only then can the rendering be done.
Then, to render a 3D model, the objects of the model are traversed, the materials and textures are bound to the target on which the rendering will be done, and the \texttt{glDrawArrays} or \texttt{glDrawElements} function is called.
To understand how performance is impacted by the structure of the model, we need to realize two things:
\begin{itemize}
\item calling \texttt{glDrawArrays} many times on small arrays is considerably slower than calling it once on a large array;
\item calling \texttt{glDrawArrays} on a small array is faster than calling it on a large array.
\end{itemize}
However, due to the way materials and textures work, we are forced to call \texttt{glDrawArrays} at least as many times as there are materials in the model.
Minimizing the number of materials used in a 3D model is thus critical for rendering performance.
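To make this concrete, a simplified rendering loop could look like the sketch below, assuming the model has already been uploaded to the GPU and its faces grouped by material; the \texttt{MaterialGroup} structure and the \texttt{setMaterialUniforms} helper are hypothetical.
\begin{lstlisting}[language=C]
/* Sketch: one draw call per group of faces sharing a material. */
for (int i = 0; i < model->group_count; i++) {
    const MaterialGroup *group = &model->groups[i];
    glBindTexture(GL_TEXTURE_2D, group->texture_id); /* bind the group's texture */
    setMaterialUniforms(group->material);            /* hypothetical: ambient, diffuse, specular */
    glDrawArrays(GL_TRIANGLES, group->first_vertex, group->vertex_count);
}
\end{lstlisting}
The number of iterations, and thus of draw calls, grows with the number of materials, which is why merging materials helps rendering performance.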
Another way to improve the performance of rendering is \textbf{frustum culling}.
Frustum culling is a technique that consists in not drawing objects that fall outside the field of view of the user's camera.
It is particularly efficient when there are many objects in a scene, since many draw calls can then potentially be skipped.
These two aspects are somewhat contradictory and, to obtain the best 3D rendering performance, one must ensure that:
\begin{itemize}
\item as few materials as possible are used, and objects that share a material are drawn together in a single \texttt{glDrawArrays} call;
\item objects are not all drawn together, but are grouped depending on their location, to keep frustum culling efficient (a possible compromise is sketched after this list).
\end{itemize}
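A possible compromise between these two constraints, sketched below under the assumption that objects are grouped into spatial cells and, within each cell, batched by material (the \texttt{Cell} and \texttt{Batch} structures and the \texttt{intersectsFrustum} test are hypothetical), is to perform the frustum test per cell and issue one draw call per material batch of the visible cells.
\begin{lstlisting}[language=C]
/* Sketch: frustum culling per spatial cell, one draw call per material batch. */
for (int c = 0; c < scene->cell_count; c++) {
    const Cell *cell = &scene->cells[c];
    if (!intersectsFrustum(&camera->frustum, &cell->bounding_box))
        continue; /* skip cells outside the field of view */
    for (int b = 0; b < cell->batch_count; b++) {
        const Batch *batch = &cell->batches[b];
        glBindTexture(GL_TEXTURE_2D, batch->texture_id);
        glDrawArrays(GL_TRIANGLES, batch->first_vertex, batch->vertex_count);
    }
}
\end{lstlisting}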

View File

@ -2,7 +2,7 @@
\section{Open problems\label{i:challenges}}
The objective of our work is to design a system that allows a user to access remote 3D content that guarantees good quality of service and quality of experience.
The objective of our work is to design a system that allows a user to access remote 3D content and that guarantees good quality of service and quality of experience.
A 3D streaming client has many tasks to accomplish:
\begin{itemize}
@ -18,6 +18,8 @@ This opens multiple problems that we need to take care of.
\subsection{Content preparation}
Any preprocessing that can be done on our 3D data gives us a strategic advantage, since it consists of computations that will not need to be done live, neither by the server nor by the client.
Furthermore, for streaming, data needs to be split into chunks that are requested separately, so preparing those chunks in advance can also help the streaming.
Before streaming content, it needs to be prepared.
This includes but is not limited to compression and segmentation.
\subsection{Chunk utility}
Once our content is prepared and split in chunks, we need to be able to rate those chunks depending on the user's position.

View File

@ -1,9 +1,20 @@
\chapter{Introduction\label{i}}
\input{introduction/3d-model}
\resetstyle{}
\copied{}
With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.
For instance, Gaillard et al.~\cite{urban-data-visualisation} reported that the 3D scene for the city of Lyon takes more than 30 GB of data.
It has become impractical to download the whole 3D scene before the user begins to navigate in the scene.
A more common approach is to stream the required 3D content (models and textures) on demand, as the user moves around the scene.
Downloading the required 3D content the moment the user demands it, however, leads to a ``popping effect'' where 3D objects materialize suddenly in the view of the user, due to the latency between requesting and receiving the 3D content from the server~\cite{visibility-determination}.
Such latency can be quite high --- Varvello et al.\ reported a median of about 30 seconds for all 3D data in an avatar's surrounding to be loaded in high density Second Life regions under their experimental network conditions, due to a bottleneck at the server~\cite{second-life}.
For a smoother user experience, NVE typically prefetch 3D content, so that a 3D object is readily available for rendering when the object falls into the view of the user.
Efficient prefetching, however, requires the client or the server to predict where the user would navigate to in the future and retrieve the corresponding 3D content before the user reaches there.
In a typical scenario, users navigate along a continuous path in a NVE, leading to a significant overlap between the 3D content visible from the user's known current position and possible next positions (i.e., \textit{spatial data locality}).
Furthermore, there is a significant overlap between the 3D content visible from the current point in time to the next point in time (i.e., \textit{temporal data locality}).
Both forms of locality lead to content overlaps, thus making a correct prediction easier and a wrong prediction less costly. 3D content overlaps are particularly common in a NVE with open space, such as a 3D archaeological site or a 3D city.
\input{introduction/video-vs-3d}
\resetstyle{}
\input{introduction/challenges}
@ -11,3 +22,4 @@
\input{introduction/outline}
\resetstyle{}

View File

@ -1,27 +1,30 @@
\section{Thesis outline}
First, in Chapter~\ref{sote}, we present a review of the state of the art on the fields that we are interested in.
First, in Chapter~\ref{f}, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: surprisingly, video and 3D share many problems, and analysing them gives inspiration for building a 3D streaming system.
In Chapter~\ref{sote}, we present a review of the state of the art on the fields that we are interested in.
This chapter starts by analysing the standards of video streaming.
Then it reviews the different ways of performing 3D streaming.
The last section of this chapter focuses on 3D interaction.
Then, in Chapter~\ref{bi}, we analyse the impact of the UI on navigation and streaming in a 3D scene.
Then, in Chapter~\ref{bi}, we present our first contribution: an in-depth analysis of the impact of the UI on navigation and streaming in a 3D scene.
We first develop a basic interface for navigating in 3D and we introduce 3D objects called \emph{bookmarks} that help users navigate in the scene.
We then present a user study that we conducted with 50 participants, which shows that bookmarks have a significant impact on how easily users can perform tasks such as finding objects.
Then, we set up a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
Finally, we analyse how the presence of bookmarks impacts the streaming, and we propose and evaluate a few streaming policies that rely on precomputations that can be made thanks to bookmarks and that can increase the quality of experience.
In Chapter~\ref{d3}, we develop the most important contribution of this thesis: DASH-3D.
In Chapter~\ref{d3}, we present the most important contribution of this thesis: DASH-3D.
DASH-3D is an adaptation of the DASH standard for 3D streaming.
We first describe how we adapt the concepts of DASH to 3D content, including the segmentation of content in \emph{segments}.
We then define utility metrics that associate a score to each segment depending on the camera's position.
Then, we present a client and various streaming policies based on our utilities that can benefit from the DASH format.
We finally evaluate the different parameters of our client.
In Chapter~\ref{d3i}, we explain the whole implementation of DASH-3D.
In Chapter~\ref{d3i}, we explain the whole implementation of DASH-3D.\todo{going to change}
Implementing DASH-3D required a lot of effort, and since both user studies and simulations are required, we describe the two clients we implemented: one client using web technologies to enable easy user studies, and one client compiled to native code that allows us to run efficient simulations and precisely compare the impact of the parameters of our system.
In Chapter~\ref{sb}, we integrate back the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
In Chapter~\ref{sb}, we present our last contribution: the integration of the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
We first develop an interface that allows navigating in a 3D scene being streamed, on desktop as well as on mobile devices, and that introduces a new style of bookmarks.
We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient precomputations that can enhance the streaming.
Finally, we present a user study that provides us with traces on which we can perform simulations, and we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.\todo{maybe only qos here}

View File

@ -1,313 +0,0 @@
\fresh{}
\section{Similarities and differences between video and 3D\label{i:video-vs-3d}}
Despite what one may think, the video streaming scenario and the 3D streaming one share many similarities: at a higher level of abstraction, they are both systems that allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and the differences between the video and the 3D scenarios, as well as having knowledge of the video streaming literature, is key to developing an efficient 3D streaming system.
\subsection{Data persistence}
One of the main differences between video and 3D streaming is the persistence of data.
In video streaming, only one second of video is required at a time.
Of course, most video streaming services prefetch some future chunks and keep some previous ones in cache, but a minimal system could work, assuming no latency, by keeping only two chunks in memory: the current one and the next one.
In 3D streaming, each chunk is part of a scene, and already a few problems appear here:
\begin{itemize}
\item depending on the user's field of view, many chunks may be required to perform a single rendering;
\item chunks do not become obsolete the way they do in video: a user navigating in a 3D scene may come back to the same spot after some time, or see the same objects from elsewhere in the scene.
\end{itemize}
\subsection{Multiresolution}
All the major video streaming platforms support multiresolution streaming.
This means that a client can choose the resolution at which it requests the content.
The resolution can be chosen directly by the user or automatically determined by analysing the available resources (size of the screen, downloading bandwidth, device performance, etc.).
\begin{figure}[th]
\centering
\includegraphics[width=\textwidth]{assets/introduction/youtube-multiresolution.png}
\caption{The different resolutions available for a Youtube video}
\end{figure}
In the same way, recent works in 3D streaming have proposed many ways of streaming 3D models progressively, allowing the user to see a low-resolution model without having to wait, and to interact with the model while the details are being downloaded.
\subsection{Media types}
Just like a video, a 3D scene is composed of different types of media.
In video, those media typically are images, sounds, and possibly subtitles, whereas in 3D, those media typically are geometry and textures.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.
In video streaming, most of the data (in terms of bytes) is used for images.
Thus, the most important thing a video streaming system should do is optimize the image streaming.
That is why, on a YouTube video for example, there may be 6 resolutions for images (144p, 240p, 320p, 480p, 720p and 1080p) but only 2 resolutions for sound.
This is one of the main differences between video and 3D streaming: in a 3D scene, the geometry and the textures have approximately the same size, and work to improve the streaming needs to be done on both.
\subsection{Chunks of data}
In order to perform streaming, data needs to be segmented so that a client can request chunks of data and display them to the user while requesting other chunks.
In video streaming, data chunks typically consist of a few seconds of video.
In mesh streaming, the data can either be segmented into chunks containing a certain number of faces each, or, in the case of progressive meshes, into a chunk containing the base mesh and subsequent chunks encoding the data needed to increase the resolution of the previous level of detail.
\subsection{Interaction}
The way of interacting with the content is probably the most important difference between video and 3D.
In a video interface, there is only one degree of freedom: the time.
The only thing a user can do is let the video play itself, pause or resume it, or jump to another moment in the video.
Even though these interactions seem easy to handle, giving the best possible experience to the user is already challenging. For example, to perform these few actions, Youtube gives the user multiple options.
\begin{itemize}
\item To pause or resume a video, the user can:
\begin{itemize}
\item click the video;
\item press the \texttt{K} key;
\item press the space key if the video is focused by the browser.
\end{itemize}
\item To navigate to another moment of the video, the user can:
\begin{itemize}
\item click the timeline of the video where he wants;
\item press the left arrow key to move 5 seconds backwards;
\item press the right arrow key to move 5 seconds forwards;
\item press the \texttt{J} key to move 10 seconds backwards;
\item press the \texttt{L} key to move 10 seconds forwards;
\item press one of the number keys (on the first row of the keyboard, below the function keys) to move to the corresponding decile of the video.
\end{itemize}
\end{itemize}
There are even ways of controlling the other options: for example, \texttt{F} puts the player in fullscreen mode, the up and down arrows change the sound volume, \texttt{M} mutes the sound and \texttt{C} activates the subtitles.
All these interactions are summed up in Figure~\ref{i:youtube-keyboard}.
\newcommand{\relativeseekcontrol}{LightBlue}
\newcommand{\absoluteseekcontrol}{LemonChiffon}
\newcommand{\playpausecontrol}{Pink}
\newcommand{\othercontrol}{PalePaleGreen}
\newcommand{\keystrokescale}{0.6}
\newcommand{\tuxlogo}{\FA\symbol{"F17C}}
\newcommand{\keystrokemargin}{0.1}
\newcommand{\keystroke}[5]{%
\draw[%
fill=white,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
\node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
}
\newcommand{\keystrokebg}[6]{%
\draw[%
fill=#6,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
\node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
}
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=\keystrokescale, every node/.style={scale=\keystrokescale}]
% Escape key
\keystroke{0}{1}{-0.75}{0}{ESC};
% F1 - F4
\begin{scope}[shift={(1.5, 0)}]
\foreach \key/\offset in {F1/1,F2/2,F3/3,F4/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% F5 - F8
\begin{scope}[shift={(6,0)}]
\foreach \key/\offset in {F5/1,F6/2,F7/3,F8/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% F9 - F12
\begin{scope}[shift={(10.5,0)}]
\foreach \key/\offset in {F9/1,F10/2,F11/3,F12/4}
\keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
\end{scope}
% Number rows
\foreach \key/\offset in {`/0,-/11,=/12,\textbackslash/13}
\keystroke{\offset}{1+\offset}{-1.75}{-1}{\key};
\foreach \key/\offset in {1/1,2/2,3/3,4/4,5/5,6/6,7/7,8/8,9/9,0/10}
\keystrokebg{\offset}{1+\offset}{-1.75}{-1}{\key}{\absoluteseekcontrol};
% Delete char
\keystroke{14}{15.5}{-1.75}{-1}{DEL};
% Tab char
\keystroke{0}{1.5}{-2.5}{-1.75}{Tab};
% First alphabetic row
\begin{scope}[shift={(1.5,0)}]
\foreach \key/\offset in {Q/0,W/1,E/2,R/3,T/4,Y/5,U/6,I/7,O/8,P/9,[/10,]/11}
\keystroke{\offset}{1+\offset}{-2.5}{-1.75}{\key};
\end{scope}
% Caps lock
\keystroke{0}{1.75}{-3.25}{-2.5}{Caps};
% Second alphabetic row
\begin{scope}[shift={(1.75,0)}]
\foreach \key/\offset in {A/0,S/1,D/2,G/4,H/5,;/9,'/10}
\keystroke{\offset}{1+\offset}{-3.25}{-2.5}{\key};
\keystrokebg{3}{4}{-3.25}{-2.5}{F}{\othercontrol}
\keystrokebg{6}{7}{-3.25}{-2.5}{J}{\relativeseekcontrol};
\keystrokebg{7}{8}{-3.25}{-2.5}{K}{\playpausecontrol};
\keystrokebg{8}{9}{-3.25}{-2.5}{L}{\relativeseekcontrol};
\end{scope}
% Enter key
\draw[%
fill=white,
drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
rounded corners=2pt,
inner sep=1pt,
line width=0.5pt,
font=\scriptsize\sffamily,
minimum width=0.1cm,
minimum height=0.1cm,
] (13.6, -1.85) -- (15.4, -1.85) -- (15.4, -3.15) -- (12.85, -3.15) -- (12.85, -2.6) -- (13.6, -2.6) -- cycle;
\node[right] at(12.85, -2.875) {Enter $\hookleftarrow$};
% Left shift key
\keystroke{0}{2.25}{-4}{-3.25}{$\Uparrow$ Shift};
% Third alphabetic row
\begin{scope}[shift={(2.25,0)}]
\foreach \key/\offset in {Z/0,X/1,V/3,B/4,N/5,{,}/7,./8,\slash/9}
\keystroke{\offset}{1+\offset}{-4}{-3.25}{\key};
\keystrokebg{2}{3}{-4}{-3.25}{C}{\othercontrol};
\keystrokebg{6}{7}{-4}{-3.25}{M}{\othercontrol};
\end{scope}
% Right shift key
\keystroke{12.25}{15.5}{-4}{-3.25}{$\Uparrow$ Shift};
% Last keyboard row
\keystroke{0}{1.25}{-4.75}{-4}{Ctrl};
\keystroke{1.25}{2.5}{-4.75}{-4}{\tuxlogo};
\keystroke{2.5}{3.75}{-4.75}{-4}{Alt};
\keystrokebg{3.75}{9.75}{-4.75}{-4}{}{\playpausecontrol};
\keystroke{9.75}{11}{-4.75}{-4}{Alt};
\keystroke{11}{12.25}{-4.75}{-4}{\tuxlogo};
\keystroke{12.25}{13.5}{-4.75}{-4}{}
\keystroke{13.5}{15.5}{-4.75}{-4}{Ctrl};
% Arrow keys
\keystrokebg{16}{17}{-4.75}{-4}{$\leftarrow$}{\relativeseekcontrol};
\keystrokebg{17}{18}{-4.75}{-4}{$\downarrow$}{\othercontrol};
\keystrokebg{18}{19}{-4.75}{-4}{$\rightarrow$}{\relativeseekcontrol};
\keystrokebg{17}{18}{-4}{-3.25}{$\uparrow$}{\othercontrol};
% Numpad
\keystroke{19.5}{20.5}{-1.75}{-1}{Lock};
\keystroke{20.5}{21.5}{-1.75}{-1}{/};
\keystroke{21.5}{22.5}{-1.75}{-1}{*};
\keystroke{22.5}{23.5}{-1.75}{-1}{-};
\keystrokebg{19.5}{20.5}{-2.5}{-1.75}{7}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-2.5}{-1.75}{8}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-2.5}{-1.75}{9}{\absoluteseekcontrol};
\keystrokebg{19.5}{20.5}{-3.25}{-2.5}{4}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-3.25}{-2.5}{5}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-3.25}{-2.5}{6}{\absoluteseekcontrol};
\keystrokebg{19.5}{20.5}{-4}{-3.25}{1}{\absoluteseekcontrol};
\keystrokebg{20.5}{21.5}{-4}{-3.25}{2}{\absoluteseekcontrol};
\keystrokebg{21.5}{22.5}{-4}{-3.25}{3}{\absoluteseekcontrol};
\keystrokebg{19.5}{21.5}{-4.75}{-4}{0}{\absoluteseekcontrol};
\keystroke{21.5}{22.5}{-4.75}{-4}{.};
\keystroke{22.5}{23.5}{-3.25}{-1.75}{+};
\keystroke{22.5}{23.5}{-4.75}{-3.25}{$\hookleftarrow$};
\end{tikzpicture}
\vspace{0.5cm}
% Legend
\begin{tikzpicture}[scale=\keystrokescale]
\keystrokebg{7}{8}{-6}{-5}{}{\absoluteseekcontrol};
\node[right=0.2cm] at (7.5, -5.55) {Absolute seek keys};
\keystrokebg{7}{8}{-7}{-6}{}{\relativeseekcontrol};
\node[right=0.2cm] at (7.5, -6.55) {Relative seek keys};
\keystrokebg{7}{8}{-8}{-7}{}{\playpausecontrol};
\node[right=0.2cm] at (7.5, -7.55) {Play or pause keys};
\keystrokebg{7}{8}{-9}{-8}{}{\othercontrol};
\node[right=0.2cm] at (7.5, -8.55) {Other keys};
\end{tikzpicture}
\caption{Youtube shortcuts\label{i:youtube-keyboard}}
\end{figure}
Those interactions are different if the user is using a mobile device.
\begin{itemize}
\item To pause a video, the user must touch the screen once to make the HUD appear and once on the pause button at the center of the screen.
\item To resume a video, the user must touch the play button at the center of the screen.
\item To navigate to another moment of the video, the user can:
\begin{itemize}
\item double touch the left of the screen to move 5 seconds backwards;
\item double touch the right of the screen to move 5 seconds forwards.
\end{itemize}
\end{itemize}
When it comes to 3D, there are many approaches to manage user interaction.
Some interfaces mimic the video scenario, where the only variable is the time and the camera follows a predetermined path over which the user has no control.
These interfaces are not interactive, and can be frustrating to the user, who might feel constrained.
Some other interfaces add two degrees of freedom to the previous one: the user does not control the position of the camera, but he can control the angle. This mimics the scenario of 360 video.
Finally, most other interfaces give at least five degrees of freedom to the user: three for the coordinates of the camera position, and two for the camera angle (assuming the up vector cannot change; some interfaces allow changing it, giving a sixth degree of freedom).
\subsection{Relationship between interface, interaction and streaming}
In both video and 3D systems, streaming affects the interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, he might start moving around on the timeline, but if he sees that the streaming is barely keeping up, he might prefer to stay still and simply watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or give up and leave the video entirely.
The same types of behaviour occur in 3D streaming: if a user is somewhere in a scene and sees more data appearing, he might wait until enough data has arrived, but if he sees that nothing happens, he might leave to look for data somewhere else.
Those examples show how streaming can affect the interaction, but the interaction also affects the streaming.
In a video streaming scenario, if a user is watching peacefully without interacting, the system just has to request the next chunks of video and display them.
However, if the user seeks to a different time in the video, the streaming will most likely stall until the system is able to gather the data it needs to resume the video.
Just like in the video setup, the way a user navigates in a networked virtual environment affects the streaming.
Moving slowly allows the system to collect and display data to the user, whereas moving frenetically puts more pressure on the streaming: the data that the system requested may be obsolete when the response arrives.
Moreover, the interface and the way elements are displayed to the user also impact his behaviour.
A streaming system can use this effect to its users' benefit by providing feedback on the streaming to the user via the interface.
For example, on YouTube, the buffered portion of the video is displayed in light grey on the timeline, whereas the portion that remains to be downloaded is displayed in dark grey.
A user is more likely to click on the light grey part of the timeline than on the dark grey part, preventing the streaming from stalling.
\begin{figure}[th]
\centering
\begin{tikzpicture}
\node (S) at (0, 0) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Streaming};
\node (I) at (-2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Interface};
\node (U) at (2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {User};
\draw[double ended double arrow=5pt colored by black and white] (S) -- (I);
\draw[double ended double arrow=5pt colored by black and white] (S) -- (U);
\draw[double arrow=5pt colored by black and white] (I) -- (U);
\end{tikzpicture}
\end{figure}

View File

@ -1,8 +1,13 @@
\mainmatter{}
\frontmatter{}
\input{introduction/main}
\resetstyle{}
\mainmatter{}
\input{foreword/main}
\resetstyle{}
\input{state-of-the-art/main}
\resetstyle{}

View File

@ -1,19 +1,6 @@
\copied{}
\section{Introduction}
% With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.
% For instance, Gaillard et al.~\cite{urban-data-visualisation} reported that the 3D scene for the city of Lyon takes more than 30 GB of data.
% It has become impractical to download the whole 3D scene before the user begins to navigate in the scene.
% A more common approach is to stream the required 3D content (models and textures) on demand, as the user moves around the scene.
% Downloading the required 3D content the moment the user demands it, however, leads to ``popping effect'' where 3D objects materialize suddenly in the view of the user, due to the latency between requesting for and receiving the 3D content from the server~\cite{visibility-determination}.
% Such latency can be quite high --- Varvello et al.\ reported a median of about 30 seconds for all 3D data in an avatar's surrounding to be loaded in high density Second Life regions under their experimental network conditions, due to a bottleneck at the server~\cite{second-life}.
%
% For a smoother user experience, NVE typically prefetch 3D content, so that a 3D object is readily available for rendering when the object falls into the view of the user.
% Efficient prefetching, however, requires the client or the server to predict where the user would navigate to in the future and retrieve the corresponding 3D content before the user reaches there.
% In a typical scenario, users navigate along a continuous path in a NVE, leading to a significant overlap between the 3D content visible from the user's known current position and possible next positions (i.e., \textit{spatial data locality}).
% Furthermore, there is a significant overlap between the 3D content visible from the current point in time to the next point in time (i.e., \textit{temporal data locality}).
% Both forms of locality lead to content overlaps, thus making a correct prediction easier and a wrong prediction less costly. 3D content overlaps are particularly common in a NVE with open space, such as a 3D archaeological site or a 3D city.
Navigating in an NVE with a large virtual space (most of the time through a 2D interface) is sometimes cumbersome.
In particular, a user may have difficulties reaching the right place to find information.
The content provider of the NVE may want to highlight certain interesting features for the users to view and experience, such as a vantage point in a city, an excavation at an archaeological site, or an exhibit in a museum.