A 3D streaming system is a system that dynamically collects 3D data.
The previous chapter voluntarily remained vague about what \emph{3D data} actually is.
This chapter presents in detail the 3D data we consider and how it is rendered.
We also give insights about interaction and streaming by comparing the 3D case to the video one.

\section{What is a 3D model?}

\subsection{3D data}

Most classical 3D models are sets of meshes and textures that can potentially be arranged in a scene graph.
Such a model can typically contain the following:

\begin{itemize}
\item \textbf{Vertices} are simply 3D points;
\item \textbf{Faces} are polygons defined from vertices, most often triangles;
\item \textbf{Textures} are images that can be applied to faces;
\item \textbf{Texture coordinates} are 2D coordinates that associate a vertex with a position in a texture;
\item \textbf{Normals} are 3D vectors that can give information about light behaviour on a face.
\end{itemize}

The Wavefront OBJ is one of the most popular formats; it describes all these elements in text format.
A 3D model encoded in the OBJ format typically consists of two files: the materials file (\texttt{.mtl}) and the object file (\texttt{.obj}).

\paragraph{}
The materials file declares all the materials that the object file will reference.
A material consists of a name and other photometric properties such as ambient, diffuse and specular colors, as well as texture maps.
Each face corresponds to a material, and a renderer can use the material's information to render the faces.
A simple material file is shown in Listing~\ref{i:mtl}.

\paragraph{}
The object file declares the 3D content of the objects.
It declares vertices, texture coordinates and normals from coordinates (e.g.\ \texttt{v 1.0 2.0 3.0} for a vertex, \texttt{vt 1.0 2.0} for a texture coordinate, \texttt{vn 1.0 2.0 3.0} for a normal).
These elements are numbered starting from 1.
Faces are declared by using the indices of these elements. A face is a polygon with an arbitrary number of vertices, and it can be declared in several ways:

\begin{itemize}
\item \texttt{f 1 2 3} defines a triangle face that joins the first, the second and the third declared vertices;
\item \texttt{f 1/1 2/3 3/4} defines a similar triangle but with texture coordinates: the first texture coordinate is associated with the first vertex, the third with the second vertex, and the fourth with the third vertex;
\item \texttt{f 1//1 2//3 3//4} defines a similar triangle but referencing normals instead of texture coordinates;
\item \texttt{f 1/1/1 2/3/3 3/4/4} defines a triangle with both texture coordinates and normals.
\end{itemize}

An example of object file is shown in Listing~\ref{i:obj}.
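Since a face element can mix up to three kinds of 1-based indices, parsing it is a common source of off-by-one errors. A minimal sketch of a face parser covering the four syntaxes above (a hypothetical helper, not part of the thesis code):

```python
def parse_face(line):
    """Parse an OBJ face line into (vertex, texcoord, normal) index triples.

    Handles the four syntaxes: "v", "v/vt", "v//vn" and "v/vt/vn".
    Indices are kept 1-based, as in the OBJ format; missing fields are None.
    """
    assert line.startswith("f ")
    triples = []
    for element in line.split()[1:]:
        parts = element.split("/")
        v = int(parts[0])
        vt = int(parts[1]) if len(parts) > 1 and parts[1] else None
        vn = int(parts[2]) if len(parts) > 2 and parts[2] else None
        triples.append((v, vt, vn))
    return triples
```

For instance, `parse_face("f 1//1 2//3 3//4")` yields triples whose texture-coordinate slot is `None`, matching the empty field between the slashes.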
A typical 3D renderer follows Algorithm~\ref{f:renderer}.

\begin{algorithm}[th]
\SetKwData{Material}{material}
\SetKwData{Object}{object}
\SetKwData{Geometry}{geometry}
\SetKwData{Materials}{all\_materials}
\SetKwData{Scene}{scene}
\SetKwData{True}{true}
\SetKwFunction{LoadGeometry}{load\_geometry}
\SetKwFunction{LoadMaterial}{load\_material}
\SetKwFunction{BindMaterial}{bind\_material}
\SetKwFunction{Draw}{draw}

\tcc{Initialization}
\For{$\Object\in\Scene$}{%
\LoadGeometry{\Object.\Geometry}\;
\LoadMaterial{\Object.\Material}\;
}
\BlankLine%
\BlankLine%
\tcc{Render loop}
\While{\True}{%
\For{$\Object\in\Scene$}{%
\BindMaterial{\Object.\Material}\;
\Draw{\Object.\Geometry}\;
}
}

The first task the renderer needs to perform is sending the data to the GPU\@: this is done in the loading loop at the beginning.
This step can be slow, but it is generally acceptable since it only occurs once, at the beginning of the program.
Then, the renderer starts the rendering loop: at each frame, it renders the whole scene, i.e.\ for each object, it binds the corresponding material to the GPU and then renders the object.
During the rendering loop, there are two things to consider regarding performance:
\begin{itemize}
\item obviously, the more faces a geometry contains, the slower the \texttt{draw} call is;
\item the more objects in the scene, the more overhead caused by CPU/GPU communication at each iteration of the loop.
\end{itemize}

The way the loop works forces objects with different materials to be rendered separately.
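The control flow of Algorithm~\ref{f:renderer} can be sketched in executable form, with the GPU calls mocked out (the function names mirror the pseudocode; they are not a real graphics API):

```python
# Mocked "GPU" that records every call, so the control flow can run anywhere.
calls = []

def load_geometry(g): calls.append(("load_geometry", g))
def load_material(m): calls.append(("load_material", m))
def bind_material(m): calls.append(("bind_material", m))
def draw(g): calls.append(("draw", g))

def render(scene, frames=1):
    # Initialization: send every object's data to the GPU once.
    for obj in scene:
        load_geometry(obj["geometry"])
        load_material(obj["material"])
    # Render loop: each frame, bind each object's material, then draw it.
    for _ in range(frames):
        for obj in scene:
            bind_material(obj["material"])
            draw(obj["geometry"])
```

The trace makes the overhead visible: a scene with $n$ objects costs $2n$ GPU calls per frame, regardless of how few faces each object holds.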

Algorithm~\ref{f:frustum-culling} is a variation of Algorithm~\ref{f:renderer} with frustum culling.
\end{algorithm}

A renderer that uses a single object avoids the overhead, but fails to benefit from frustum culling.
An optimized renderer needs to find a compromise between too fine a partition of the scene, which introduces overhead, and too coarse a partition, which leads to useless rendering.
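Frustum culling boils down to a per-object visibility test. A hedged sketch of such a test, approximating the frustum by a view cone around the camera direction (an illustrative simplification, not the exact test of Algorithm~\ref{f:frustum-culling}):

```python
import math

def in_frustum(center, cam_pos, cam_dir, half_angle):
    """Return True if a point lies inside the camera's view cone.

    cam_dir is assumed to be a unit vector; half_angle is the cone's
    half-aperture in radians. Objects would typically be tested via the
    center of their bounding sphere.
    """
    to_obj = tuple(c - p for c, p in zip(center, cam_pos))
    norm = math.sqrt(sum(x * x for x in to_obj))
    if norm == 0.0:
        return True  # the camera is exactly at the object's center
    cos_angle = sum(d * t for d, t in zip(cam_dir, to_obj)) / norm
    return cos_angle >= math.cos(half_angle)
```

An object spread across the whole scene always passes this test, which is exactly why a single-object scene defeats culling.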

\fresh{}
\section{Implementation details}

During this thesis, a lot of software has been developed, and for this software to be successful and efficient, we took care of choosing the appropriate languages.
When it comes to 3D streaming systems, there are two kinds of software that we need.
\begin{itemize}
\item \textbf{Interactive applications} that can run on as many devices as possible, whether desktop or mobile, in order to conduct user studies. For this context, we chose the \textbf{JavaScript language}, since it can run on many devices and it has great support for WebGL\@.

\section{Similarities and differences between video and 3D\label{i:video-vs-3d}}

Contrary to what one might think, the video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and differences between the video and 3D scenarios, together with knowledge of the video streaming literature, is key to developing an efficient 3D streaming system.

\subsection{Chunks of data}

In order to be able to perform streaming, data needs to be segmented so that a client can request chunks of data and display them to the user while requesting another chunk.
In video streaming, data chunks typically consist of a few seconds of video.
In mesh streaming, some progressive mesh approaches encode a base mesh that contains low resolution geometry and textures, and separate chunks that increase the resolution of the base mesh.
Alternatively, a mesh can also be segmented by separating geometry and textures, creating chunks that contain either some faces of the model or some textures.
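As a toy illustration of such segmentation (the chunk structure is an assumption for illustration, not an actual 3D streaming format), a mesh's face list can be cut into fixed-size chunks that a client would request one by one:

```python
def segment_faces(faces, chunk_size):
    """Group faces into consecutive chunks of at most chunk_size faces.

    Each chunk can then be requested, parsed and rendered independently,
    which is the property streaming needs.
    """
    return [faces[i:i + chunk_size] for i in range(0, len(faces), chunk_size)]
```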
\subsection{Data persistence}

\caption{The different resolutions available for a Youtube video}
\end{figure}

Similarly, recent works in 3D streaming have proposed different ways to progressively stream 3D models, displaying a low resolution to the user without latency, and supporting interaction with the model while the details are being downloaded.
Such strategies are reviewed in Section~\ref{sote:3d-streaming}.
\subsection{Media types}

In video, those media typically are images, sounds, and possibly subtitles.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.

In video streaming, most of the data (in terms of bytes) is used for images.
Thus, the most important thing a video streaming system should do is to optimize image streaming.
That is why a Youtube video, for example, may offer six resolutions for images (144p, 240p, 360p, 480p, 720p and 1080p) but only two resolutions for sound.
This is one of the main differences between video and 3D streaming: in a 3D scene, geometry and texture sizes are approximately the same, and balancing between those two types of content is a key problem.
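As an illustration of resolution switching, an adaptive video client picks the highest resolution whose bitrate fits the measured bandwidth. A toy sketch, with made-up bitrate values (not Youtube's actual encoding ladder):

```python
# Illustrative bitrates per resolution, in kbit/s (assumed values).
BITRATES_KBPS = {"144p": 100, "240p": 250, "360p": 500,
                 "480p": 1000, "720p": 2500, "1080p": 4500}

def choose_resolution(bandwidth_kbps):
    """Pick the highest-bitrate resolution that fits the bandwidth."""
    affordable = [(b, r) for r, b in BITRATES_KBPS.items() if b <= bandwidth_kbps]
    # Fall back to the lowest resolution when even it does not fit.
    return max(affordable)[1] if affordable else "144p"
```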
\subsection{Interaction}

The way of interacting with the content is probably the most important difference between video and 3D.
In a video interface, there is only one degree of freedom: time.
The only things a user can do are let the video play, pause or resume it, or jump to another time in the video.
Even though these interactions seem easy to handle, giving the best possible experience to the user is already challenging. For example, to perform these few actions, Youtube provides the user with multiple options.

\begin{itemize}
\item To pause or resume the video, the user can:
\begin{itemize}
\item click the video;
\item press the \texttt{K} key;
\item press the space key.
\end{itemize}

\item To navigate to another time in the video, the user can:
\begin{itemize}
\item click the timeline of the video wherever they want;
\item press the left arrow key to move 5 seconds backwards;
\item press the right arrow key to move 5 seconds forwards;
\item press the \texttt{J} key to move 10 seconds backwards;
\item press the \texttt{L} key to move 10 seconds forwards;
\item press one of the number keys (on the first row of the keyboard, below the function keys, or on the numpad) to jump to the corresponding decile of the video.
\end{itemize}

\end{itemize}
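The seek shortcuts above can be sketched as a small dispatch function (an illustrative model of the player, not Youtube's implementation):

```python
def handle_key(key, current_time, duration):
    """Return the new playback time after a seek shortcut, or None."""
    if key == "ArrowLeft":
        return max(0, current_time - 5)
    if key == "ArrowRight":
        return min(duration, current_time + 5)
    if key == "j":
        return max(0, current_time - 10)
    if key == "l":
        return min(duration, current_time + 10)
    if key.isdigit():
        return duration * int(key) / 10  # jump to a decile of the video
    return None  # not a seek shortcut
```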

There are also controls for other options: for example, \texttt{F} puts the player in fullscreen mode, the up and down arrows change the sound volume, \texttt{M} mutes the sound and \texttt{C} activates the subtitles.
All the interactions are summed up in Figure~\ref{i:youtube-keyboard}.

\newcommand{\relativeseekcontrol}{LightBlue}

Those interactions are different if the user is using a mobile device.

\begin{itemize}
\item To pause a video, the user touches the screen once to make the timeline and the buttons appear, then touches the pause button at the center of the screen.
\item To resume a video, the user touches the play button at the center of the screen.
\item To navigate to another moment of the video, the user can:
\begin{itemize}
\item double touch the left of the screen to move 5 seconds backwards;

When it comes to 3D, there are many approaches to manage user interaction.
Some interfaces mimic the video scenario, where the only variable is the time and the camera follows a predetermined path on which the user has no control.
These interfaces are not interactive, and can be frustrating to the user, who might feel constrained.

Some other interfaces add two degrees of freedom to the timeline: the user does not control the position of the camera, but they can control the angle. This mimics the 360 video scenario.
This is typically the case of the video game \emph{NoLimits 2: Roller Coaster Simulator}, which works with VR devices (Oculus Rift, HTC Vive, etc.)\ where the only interaction the user has is turning their head.

Finally, most of the other interfaces give at least five degrees of freedom to the user: three being the coordinates of the position of the camera, and two being the angles (assuming the up vector is unchangeable; some interfaces might allow changing it, giving a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball \href{https://threejs.org/examples/?q=controls\#misc_controls_trackball}{(live example here)}, and the orbit controls, which behave like the trackball controls but preserve the up vector \href{https://threejs.org/examples/?q=controls\#misc_controls_orbit}{(live example here)}.

\subsection{Relationship between interface, interaction and streaming}

In both video and 3D systems, streaming affects interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, they might start moving around on the timeline, but if they see that the streaming is just enough not to stall, they might prefer not to interact and just watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or get frustrated and leave the video.
The same types of behaviour occur in 3D streaming: if a user is somewhere in a scene and sees more data appearing, they might wait until enough data has arrived, but if they see that nothing happens, they might leave to look for data somewhere else.

Those examples show how streaming can affect interaction, but interaction also affects streaming.
In a video streaming scenario, if a user is watching peacefully without interacting, the system just has to request the next chunks of video and display them.
However, if a user seeks to a different time in the video, the streaming will most likely stall until the system is able to gather the data it needs to resume the video.
Just like in the video setup, the way a user navigates in a networked virtual environment affects the streaming.

\section{Open problems\label{i:challenges}}

The objective of our work is to design a system that allows a user to access remote 3D content and that guarantees both good quality of service and good quality of experience.
A 3D streaming client has many tasks to accomplish:

\begin{itemize}
\item render a scene;
\item find out what part of the model to download next;
\item download the next part;
\item parse the downloaded content;
\item add the parsed result to the scene;
\item manage the interaction with the user.
\end{itemize}

This opens multiple problems that we need to take care of.

\paragraph{Content preparation.}
Before streaming content, it needs to be prepared.
The segmentation of the content into chunks is particularly important for streaming, since it allows transmitting only a portion of the data to the client.
The client can then render the partial model consisting of the downloaded content before downloading more chunks.
Content preparation also includes compression.
One of the questions this thesis has to answer is: \emph{what is the best way to prepare 3D content so that a client can progressively download and render the 3D model?}
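As a purely illustrative example of lossy geometry compression (not a codec used in this thesis), vertex coordinates can be quantized to fixed-width integers within the model's bounding range; the bit width trades size for precision:

```python
def quantize(vertices, bits=10):
    """Map float coordinates to integers in [0, 2^bits - 1].

    Assumes a non-degenerate model (not all coordinates equal).
    Returns the quantized vertices plus the offset and scale needed to decode.
    """
    lo = min(v for vertex in vertices for v in vertex)
    hi = max(v for vertex in vertices for v in vertex)
    scale = ((1 << bits) - 1) / (hi - lo)
    quantized = [tuple(round((v - lo) * scale) for v in vertex) for vertex in vertices]
    return quantized, lo, scale

def dequantize(quantized, lo, scale):
    """Recover approximate float coordinates from quantized vertices."""
    return [tuple(q / scale + lo for q in vertex) for vertex in quantized]
```

With 10 bits, each coordinate fits in about a third of the space of a 32-bit float, at the cost of a bounded rounding error.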
\paragraph{Streaming policies.}
|
\paragraph{Streaming policies.}
|
||||||
Once our content is prepared and split in chunks, a client needs to determine which chunks it needs to download.
|
Once our content is prepared and split in chunks, a client needs to determine which chunks should be downloaded first.
|
||||||
A chunk that contains data in the field of view of the user is more relevant than a chunk outside of it; a chunk that is close to the camera is more relevant than a chunk far away from the camera, etc.
|
A chunk that contains data in the field of view of the user is more relevant than a chunk that is not inside; a chunk that is close to the camera is more relevant than a chunk far away from the camera, etc.
|
||||||
This should also include other contextual parameters, such as the size of a chunk, the bandwidth, the user's behaviour, etc.
|
This should also include other contextual parameters, such as the size of a chunk, the bandwidth, the user's behaviour, etc.
|
||||||
The most important question we have to answer is \emph{how to determine which chunks need to be downloaded depending on the chunks themselves and the user's interactions?}
|
The most important questions we have to answer are: \emph{how to estimate a chunk utility, and how to determine which chunks need to be downloaded depending on the chunks themselves and the user's interactions?}
|
||||||
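As a toy illustration of such a policy (the `Chunk` fields, the scoring formula, and the greedy selection below are all hypothetical, not the policies developed in this thesis), a utility score could favour visible, nearby, cheap-to-download chunks:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """A hypothetical downloadable unit of 3D content."""
    distance: float   # distance from the camera, in scene units
    in_frustum: bool  # does the chunk intersect the field of view?
    size_bytes: int   # download cost of the chunk

def utility(chunk: Chunk) -> float:
    """Toy scoring: visible chunks only, closer is better,
    normalised by the download cost of the chunk."""
    if not chunk.in_frustum:
        return 0.0
    return 1.0 / ((1.0 + chunk.distance) * chunk.size_bytes)

def next_chunks(chunks: list, budget_bytes: int) -> list:
    """Greedily pick the highest-utility chunks fitting a byte budget."""
    selected = []
    for c in sorted(chunks, key=utility, reverse=True):
        if utility(c) > 0.0 and c.size_bytes <= budget_bytes:
            selected.append(c)
            budget_bytes -= c.size_bytes
    return selected
```

A real policy would also account for bandwidth estimates and predicted user movement, which is precisely what makes the question non-trivial.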

\paragraph{Evaluation.}
In such systems, the two most important criteria for evaluation are quality of service and quality of experience.
The quality of service is a network-centric metric, which considers values such as throughput.
The quality of experience is a user-centric metric, and can only be measured by asking how users feel about a system.
To be able to know which streaming policies are best, one needs to know \emph{how to compare streaming policies and evaluate the impact of their parameters in terms of quality of service and quality of experience?}

\paragraph{Implementation.}
The objective of our work is to set up a client-server architecture that answers the above problems: content preparation, chunk utility, streaming policies.
In this regard, we have to find out \emph{how do we build this architecture while keeping the computational load low on the server, so that it scales up, and on the client, so that it has enough resources to perform the tasks described above?}

% This implementation must respect constraints required for performant software:
%

In recent years, 3D acquisition and modeling techniques have progressed considerably.
Recent software such as \href{https://alicevision.org/\#meshroom}{Meshroom} uses \emph{structure from motion} and \emph{multi view stereo} to infer a 3D model from a set of photographs.
More and more devices are specifically built to obtain 3D data: some are expensive and provide very precise information, such as LIDAR (Light Detection And Ranging, as in RADAR but with light instead of radio waves), while cheaper devices such as the Kinect obtain coarser data.
Thanks to these techniques, more and more 3D data become available.
These models serve multiple purposes: they can be 3D printed, which can reduce the production cost of some pieces of hardware or enable the creation of new objects, but most uses consist in visualisation.
For example, they can be used in augmented reality, to provide users with feedback that helps workers with complex tasks, but also in fashion (\emph{Fitting Box}, for instance, is a company that develops software to virtually try glasses).
3D acquisition and visualisation are also useful for preserving cultural heritage (software such as Google Heritage or 3DHop are examples) or for letting users navigate in a city (as in Google Earth or Google Maps in 3D).
\href{https://sketchfab.com}{Sketchfab} is an example of a website allowing users to share their 3D models and visualise models from other users.
In most 3D visualisation systems, the 3D data is stored on a server and needs to be transmitted to a terminal before the user can visualise it.
The improvements in the acquisition setups we described lead to an increasing quality of the 3D models, and thus to an increasing size in bytes as well.
Simply downloading 3D content and waiting until it is fully downloaded to let the user visualise it is no longer a satisfactory solution, so streaming needs to be performed.
In this thesis, we propose a full framework for the navigation and the streaming of large 3D scenes, such as districts or whole cities.

% With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.
% For instance,~\cite{urban-data-visualisation} reported that the 3D scene for the city of Lyon takes more than 30 GB of data.

\section{Thesis outline}

First, in Chapter~\ref{f}, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: surprisingly, video and 3D share many features, and analysing the video setting gives inspiration for building a 3D streaming system.

In Chapter~\ref{sote}, we present a review of the state of the art in multimedia interaction and streaming.
This chapter starts with an analysis of the video streaming standards.
Then it reviews the different 3D streaming approaches.
The last section of this chapter focuses on 3D interaction.

Then, in Chapter~\ref{bi}, we present our first contribution: an in-depth analysis of the impact of the UI on navigation and streaming in a 3D scene.
We first develop a basic interface for navigating in 3D, and we introduce 3D objects called \emph{bookmarks} that help users navigate in the scene.
We then present a user study, conducted on 50 participants, which shows that bookmarks ease user navigation: they improve performance at tasks such as finding objects.
% Then, we setup a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
We analyse how the presence of bookmarks impacts the streaming: we propose and evaluate streaming policies that rely on bookmark-based pre-computations and measurably increase the quality of experience.

In Chapter~\ref{d3}, we present the most important contribution of this thesis: DASH-3D.
DASH-3D is an adaptation of DASH (Dynamic Adaptive Streaming over HTTP), the video streaming standard, to 3D streaming.
We first describe how we adapt the concepts of DASH to 3D content, including the segmentation of content.
We then define utility metrics that associate a score to each chunk depending on the user's position.
Then, we present a client and various streaming policies based on our utilities that can benefit from the DASH format.
We finally evaluate the different parameters of our client.

\fresh{}
\section{3D Streaming\label{sote:3d-streaming}}

\subsection{Compression and structuring}