proof read

2019-09-25 11:51:07 +02:00
parent c491eb308b
commit 47b0ddf540
13 changed files with 359 additions and 158 deletions


@@ -1,14 +1,11 @@
\fresh{}
\section{3D Streaming}
\subsection{Progressive meshes}
It is not possible to speak about 3D streaming without speaking about progressive meshes.
Progressive meshes were introduced by Hughes Hoppe in 1996~\cite{progressive-meshes} and allow transmitting a mesh by send first a low resolution mesh, called \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
Progressive meshes were introduced by~\citet{progressive-meshes} and allow transmitting a mesh by first sending a low-resolution mesh, called the \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
To do so, an algorithm, called a \emph{decimation algorithm}, removes vertices and faces by merging vertices (Figure~\ref{sote:progressive-scheme}).
Each time two vertices are merged, vertices and faces are removed from the original mesh, and the resolution of the model decreases a little.
When the model is light enough, it is encoded as is, and the operations needed to recover the initial resolution of the model are encoded as well.
Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
\begin{figure}[ht]
\centering
@@ -65,18 +62,22 @@ This process reduces the time a user has to wait before seeing something, and in
\caption{Vertex split and edge collapse\label{sote:progressive-scheme}}
\end{figure}
Every time two vertices are merged, vertices and faces are removed from the original mesh, and the resolution of the model decreases a little.
When the model is light enough, it is encoded as is, and the operations needed to recover the initial resolution of the model are encoded as well.
Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
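The decimation and refinement steps described above can be sketched as follows.
This is a minimal illustration that only tracks the connectivity (the faces) of a hypothetical toy mesh: the function names \texttt{collapse} and \texttt{split} and the record layout are assumptions made for this example, and an actual progressive mesh encoding also stores vertex positions and attributes.

\begin{lstlisting}[language=Python]
def collapse(faces, u, v):
    """Edge collapse: merge vertex v into vertex u.

    Returns the coarser face list and the record needed to undo the merge.
    """
    removed = [f for f in faces if u in f and v in f]      # faces that degenerate
    rewired = [f for f in faces if v in f and u not in f]  # faces that switch from v to u
    kept = [f for f in faces if v not in f]
    moved = [tuple(u if i == v else i for i in f) for f in rewired]
    return kept + moved, {"u": u, "v": v, "removed": removed, "rewired": rewired}


def split(faces, record):
    """Vertex split: undo the edge collapse described by record."""
    u, v = record["u"], record["v"]
    moved = {tuple(u if i == v else i for i in f) for f in record["rewired"]}
    kept = [f for f in faces if f not in moved]
    return kept + record["rewired"] + record["removed"]


# Hypothetical toy mesh: a fan of six triangles around vertex 0.
original_faces = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5), (0, 5, 6), (0, 6, 1)]

# Server side: decimate the mesh, remembering how to undo each merge.
base_faces, records = original_faces, []
for u, v in [(2, 3), (5, 6)]:
    base_faces, record = collapse(base_faces, u, v)
    records.append(record)

# Client side: start from the base mesh and replay the records in reverse order.
refined = base_faces
for record in reversed(records):
    refined = split(refined, record)
\end{lstlisting}

In this sketch, the base mesh and the list of records play the roles of the low resolution model and of the detail information: the client can display \texttt{base\_faces} immediately, and each record it receives restores one merged vertex.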
\subsection{glTF}
In a recent standardization effort, the Khronos Group has proposed a generic format called glTF (GL Transmission Format~\cite{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It can contain a scene tree with cameras, meshes, buffers, materials, textures, animations and skinning information.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming.
However, glTF in itself does not address the problem of view-dependent 3D streaming which is required for large scene remote visualisation.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for the remote visualisation of large scenes.
\subsection{3D Tiles}
3D Tiles is a specification for visualizing massive 3D geospatial data developed by Cesium and built on glTF\@.
\todo{add stuff here}
\copied{}
\subsection{Prefetching in NVE}


@@ -5,19 +5,20 @@
\subsection{DASH\@: the standard for video streaming\label{sote:dash}}
\copied{}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH~\cite{dash-std,dash-std-2}, is now a widely deployed
standard for streaming adaptive video content on the Web~\cite{dash-std-full}, made to be simple and scalable.
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH (\citet{dash-std,dash-std-2}), is now a widely deployed
standard for streaming adaptive video content on the Web (\citet{dash-std-full}), made to be simple and scalable.
\fresh{}
DASH is based on a clever way of structuring the content that allows great adaptability during the streaming without requiring any server side computation.
\subsubsection{DASH structure}
All those pieces are structured in a Media Presentation Description (MPD) file, written in the XML format.
This file has 4 layers, the periods, the adaptation sets, the representations and the segments.
Each period can have many adaptation sets, each adaptation set can have many representations, and each representation can have many segments.
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has four layers: the periods, the adaptation sets, the representations and the segments.
An MPD behaves like a tree structure, meaning that each period can have many adaptation sets, each adaptation set can have many representations, and each representation can have many segments.
\paragraph{Periods.}
Periods are used to delimit content depending on the time. It can be used to delimit chapters, or to add advertisements that occur at the beginning, during or at the end of a video.
Periods are used to delimit content depending on the time.
They can be used to delimit chapters, or to add advertisements that occur at the beginning, during or at the end of a video.
\paragraph{Adaptation sets.}
Adaptation sets are used to delimit content depending on the format.
@@ -27,37 +28,39 @@ In videos, most of the time, each period has at least one adaptation set contain
\paragraph{Representations.}
The representation level is the level DASH uses to offer the same content at different levels of resolution.
For example, an adaptation set containing images has a representation for each available resolution (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose its representation and change it during the video, but most importantly, since the software is able to estimate its downloading speed based on the time it took to download data in the past, it is able to find the optimal resolution, being the highest resolution that arrives on time to avoid stalling.
This allows a user to choose a representation and change it during the video, but most importantly, since the software is able to estimate its download speed based on the time it took to download data in the past, it is able to find the optimal resolution, that is, the highest resolution that the client can request without stalling.
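The choice of a representation from past download times can be sketched as follows.
This is only one possible throughput-based heuristic, given under our own assumptions: the function names, the bitrates and the safety margin of 0.8 are hypothetical values chosen for the example.

\begin{lstlisting}[language=Python]
def estimate_throughput(history):
    """Estimate the download speed, in bits per second.

    history is a list of (bytes_downloaded, seconds_taken) pairs,
    one pair per segment downloaded in the past.
    """
    total_bytes = sum(size for size, _ in history)
    total_time = sum(duration for _, duration in history)
    return 8 * total_bytes / total_time


def choose_representation(representations, throughput, safety=0.8):
    """Return the name of the highest bitrate that fits in the estimated throughput.

    representations maps a name to a bitrate in bits per second.
    The safety factor (an assumption) keeps a margin against estimation errors.
    """
    budget = safety * throughput
    affordable = {name: rate for name, rate in representations.items() if rate <= budget}
    if not affordable:
        # Nothing fits: fall back to the lightest representation.
        return min(representations, key=representations.get)
    return max(affordable, key=affordable.get)


# Hypothetical bitrates for three representations of the same content.
representations = {"480p": 1_500_000, "720p": 3_000_000, "1080p": 6_000_000}

# Two past segments: 500 kB and 750 kB, each downloaded in one second.
history = [(500_000, 1.0), (750_000, 1.0)]
throughput = estimate_throughput(history)
choice = choose_representation(representations, throughput)
\end{lstlisting}

With these numbers, the estimated throughput is 5~Mbit/s and the budget is 4~Mbit/s, so the sketch selects the 720p representation; with a much slower connection, it falls back to 480p.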
\paragraph{Segments.}
Until this level of the MPD, content can be long.
For example, a representation of images of a chapter of a movie can be heavy and long to download.
However, downloading heavy files is not suitable for streaming because it prevents the dynamicity of it: if the user requests to change the level of resolution of a video, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
Up to this level of the MPD, the content has been divided, but not yet finely enough to be streamed efficiently.
A representation of the images of a chapter of a movie is still a long video, and downloading such a heavy file is not suitable for streaming.
Indeed, heavy files prevent streaming from being dynamic: if the user requests a change of resolution, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
Segments are used to prevent this behaviour. They typically encode files that last approximately one second of video, and give the software a great ability to dynamically adapt to the system. If a user wants to seek somewhere else in the video, only one second of data can be lost, and only one second of data has to be downloaded for the playback to resume.
Segments are used to prevent this behaviour.
They typically encode files that last approximately one second of video, and give the software a great ability to dynamically adapt to the system.
If a user wants to seek somewhere else in the video, only one second of data can be lost, and only one second of data has to be downloaded for the playback to resume.
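The four layers described above can be illustrated by walking through a small MPD.
The document below is a deliberately simplified, hypothetical example (a real MPD carries more attributes, and the identifiers and file names are assumptions); it is parsed here with the \texttt{xml.etree.ElementTree} module of the Python standard library.

\begin{lstlisting}[language=Python]
import xml.etree.ElementTree as ET

# Simplified, hypothetical MPD: one period, one adaptation set,
# two representations, two segments per representation.
MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period id="chapter-1">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="480p" bandwidth="1500000">
        <SegmentList>
          <SegmentURL media="video-480p-1.m4s"/>
          <SegmentURL media="video-480p-2.m4s"/>
        </SegmentList>
      </Representation>
      <Representation id="720p" bandwidth="3000000">
        <SegmentList>
          <SegmentURL media="video-720p-1.m4s"/>
          <SegmentURL media="video-720p-2.m4s"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def list_segments(root):
    """Walk the tree: periods, adaptation sets, representations, segments."""
    rows = []
    for period in root.findall("mpd:Period", NS):
        for adaptation_set in period.findall("mpd:AdaptationSet", NS):
            for representation in adaptation_set.findall("mpd:Representation", NS):
                for segment in representation.findall("mpd:SegmentList/mpd:SegmentURL", NS):
                    rows.append((
                        period.get("id"),
                        adaptation_set.get("mimeType"),
                        representation.get("id"),
                        segment.get("media"),
                    ))
    return rows


segments = list_segments(ET.fromstring(MPD.strip()))
\end{lstlisting}

Each tuple in \texttt{segments} identifies one small file that a client can request independently, which is what makes switching representation or seeking cheap.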
\subsubsection{Client side computation}
Once a video is encoded in DASH format, once the files have been structured and the MPD has been generated, they can simply be put on a static HTTP server that does no computation other than serving files when it receives requests.
Once a video has been encoded in DASH format, that is, once the files have been structured and the MPD has been generated, all this data can simply be put on a static HTTP server that does no computation other than serving files when it receives requests.
All the intelligence and the decision making is moved to the client side.
A client typically starts by downloading the MPD file, and then proceeds to download segments of the adaptation sets that it needs, estimating its download speed itself and deciding by itself whether it needs to change representation.
\subsection{DASH-SRD}
DASH-SRD (Spatial Relationship Description,~\cite{dash-srd}) is a feature that extends the DASH standard to allow stream only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions, that way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video (see Figure~\ref{sota:srd-png}).
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the file.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile describing the position of the tile in relation to the full video. An example of such a property is given in Listing~\ref{sota:srd-xml}.
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
This is especially interesting in the context of 3D streaming since we have this same pattern of a user viewing only a part of a content.
DASH-SRD (Spatial Relationship Description,~\cite{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure~\ref{sota:srd-png}.
That way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video.
\begin{figure}[th]
\centering
\includegraphics[width=\textwidth]{assets/state-of-the-art/video/srd.png}
\includegraphics[width=0.6\textwidth]{assets/state-of-the-art/video/srd.png}
\caption{DASH-SRD~\cite{dash-srd}\label{sota:srd-png}}
\end{figure}
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the file.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile describing the position of the tile in relation to the full video.
An example of such a property is given in Listing~\ref{sota:srd-xml}.
\begin{figure}[th]
\lstinputlisting[%
language=XML,
@@ -80,6 +83,9 @@ This is especially interesting in the context of 3D streaming since we have this
]{assets/state-of-the-art/video/srd.xml}
\end{figure}
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
This is especially interesting in the context of 3D streaming, since we have the same pattern of a user viewing only part of the content.
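The way a client can use the position and size of each tile to avoid downloading invisible content can be sketched as follows.
This is a minimal illustration under stated assumptions: we assume that the value of the supplemental property is a comma-separated list whose first five fields are a source identifier followed by $x$, $y$, width and height, and the helper names and the tiling are hypothetical.

\begin{lstlisting}[language=Python]
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def parse_srd(value):
    """Extract the tile rectangle from an SRD value string.

    Assumed field order: source id, x, y, width, height, then optional fields.
    """
    fields = [int(field) for field in value.split(",")]
    _source_id, x, y, width, height = fields[:5]
    return Rect(x, y, width, height)


def intersects(tile, view):
    """True if the tile and the viewport overlap on both axes."""
    return (tile.x < view.x + view.width and view.x < tile.x + tile.width
            and tile.y < view.y + view.height and view.y < tile.y + tile.height)


# Hypothetical 2x2 tiling of a 1920x1080 video.
srd_values = [
    "0,0,0,960,540,1920,1080",
    "0,960,0,960,540,1920,1080",
    "0,0,540,960,540,1920,1080",
    "0,960,540,960,540,1920,1080",
]
tiles = [parse_srd(value) for value in srd_values]

# The user looks at a region that straddles the two top tiles.
viewport = Rect(800, 100, 400, 300)
needed = [index for index, tile in enumerate(tiles) if intersects(tile, viewport)]
\end{lstlisting}

Here only the two top tiles overlap the viewport, so the client can request them in high resolution and skip the two bottom tiles.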
\subsection{Prefetching in video streaming}
\copied{}