This commit is contained in:
2019-11-27 22:37:49 +01:00
parent 2146a28ddd
commit de4c390f70
13 changed files with 55 additions and 52 deletions

View File

@@ -1,8 +1,8 @@
\section{3D Bookmarks and Navigation Aids}
The only use for 3D streaming is to allow users interacting with the content while it is being downloaded.
One of the uses for 3D streaming is to allow users interacting with the content while it is being downloaded.
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.
Controlling the viewpoint in 3D (6 DOFs) with 2D devices is not only inherently challenging but also strongly task-dependent. In their review,~\citep{interaction-3d-environment} distinguish between several types of camera movements: general movements for exploration (e.g., navigation with no explicit target), targeted movements (e.g., searching and/or examining a model in detail), specified trajectory (e.g., a cinematographic camera path), etc.
Controlling the viewpoint in 3D (6 DOFs) with 2D devices is not only inherently challenging but also strongly task-dependent. In their review,~\citep{interaction-3d-environment} distinguish between several types of camera movements: general movements for exploration (e.g., navigation with no explicit target), targeted movements (e.g., searching and/or examining a model in detail), specified trajectory (e.g., a cinematographic camera path).
For each type of movement, specialized 3D interaction techniques can be designed.
In most cases, rotating, panning, and zooming movements are required, and users are consequently forced to switch back and forth among several navigation modes, leading to interactions that are too complicated overall for a layperson.
Navigation aids and smart widgets are required and subject to research efforts both in 3D companies (see \url{sketchfab.com}, \url{cl3ver.com} among others) and in academia, as reported below.
@@ -34,7 +34,7 @@ Such viewpoints can be either static, or dynamically adapted:~\citep{dual-mode-u
Early 3D VRML environments \citep{browsing-3d-bookmarks} offer 3D bookmarks with animated transitions between bookmarked views.
These transitions prevent disorientation since users see how they got there.
Hyperlinks can also ease rapid movements between distant viewpoints and naturally support non-linear and non-continuous access to 3D content.
Navigating with 3D hyperlinks is potentially faster, but is likely to cause disorientation, as shown by the work of~\citep{ve-hyperlinks}.
Navigating with 3D hyperlinks is faster due to the instant motion, but can cause disorientation, as shown by the work of~\citep{ve-hyperlinks}.
\citep{linking-behavior-ve} examine explicit landmark links as well as implicit avatar-chosen links in Second Life.
These authors point out that linking is appreciated by users and that easing linking would likely result in a richer user experience.
\citep{dual-mode-ui} developed the Dual-Mode User Interface (DMUI) that coordinates and links hypertext to 3D graphics in order to access information in a 3D space.

View File

@@ -101,7 +101,7 @@ However, users are often interested in scenes that contain multiple meshes, and
To answer those issues, the Khronos group proposed a generic format called glTF (GL Transmission Format,~\citep{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated model, etc.\
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming which is required for large scene remote visualisation and that we address in our work.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualisation and which we address in our work.
% Zampoglou

View File

@@ -3,19 +3,19 @@
Accessing a remote video through the Web has been a widely studied problem since the 1990s. The Real-time Transport Protocol (RTP,~\cite{rtp-std}) has been an early attempt
to formalize audio and video streaming. The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client. While this protocol can be useful in particular scenarii, such as video-conferencing, it can not realistically scale to modern video streaming platforms such as Youtube or Netflix, which must serve millions of simultaneous clients.
Because of this limitation, and while the increasing network capabilities made video streaming a more and more common practice, a new trend emerged during the 2000s. Building on the democratization of HTTP servers, many industrial actors (Apple, Microsoft, Adobe, etc.) developed HTTP streaming systems to deliver multimedia content over the network. In an effort to bring interoperability between all different actors, the MPEG group launched an initiative which eventually became a standard known as DASH, Dynamic Adaptive Streaming over HTTP.
Because of this limitation, and while the increasing network capabilities made video streaming a more and more common practice, a new trend emerged during the 2000s. Building on the democratization of HTTP servers, many industrial actors (Apple, Microsoft, Adobe, etc.) developed HTTP streaming systems to deliver multimedia content over the network. In an effort to bring interoperability between all different actors, the MPEG group launched an initiative, which eventually became a standard known as DASH, Dynamic Adaptive Streaming over HTTP\@.
\subsection{DASH\@: the standard for video streaming\label{sote:dash}}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed
standard for adaptively streaming video on the Web \citep{dash-std-full}, made to be simple, scalable and inter-operable.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content should be downloaded, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content should be downloaded, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD\@.
\subsubsection{DASH structure}
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has 4 layers: the periods, the adaptation sets, the representations and the segments.
A MPD has a hierarchical structure, meaning that it has multiple periods, and that each period can have multiple adaptation sets, that each adaptation set can have multiple representation, and that each representation can have multiple segments.
This file has 4 layers: the periods, the adaptation sets, the representations, and the segments.
A MPD has a hierarchical structure, meaning it has multiple periods, and each period can have multiple adaptation sets, each adaptation set can have multiple representation, and each representation can have multiple segments.
\paragraph{Periods.}
Periods are used to delimit content depending on time.
@@ -28,13 +28,13 @@ In videos, most of the time, each period has at least one adaptation set contain
It may also have an adaptation set for subtitles.
\paragraph{Representations.}
The representation level is the level DASH uses to offer the same content at different levels of resolution.
For example, an adaptation set containing images has a representation for each available resolution (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose its representation and change it during the video, but most importantly, since the software is able to estimate its downloading speed based on the time it took to download data in the past, it is able to find the optimal resolution, being the highest resolution that the client can request without stalling.
The representation level is the level DASH uses to offer the same content at different levels of quality.
For example, an adaptation set containing images has a representation for each available quality (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose its representation and change it during the video, but most importantly, since the software is able to estimate its downloading speed based on the time it took to download data in the past, it is able to find the optimal representation, being the highest quality that the client can request without stalling.
\paragraph{Segments.}
Until this level in the MPD, content has been divided but it is still far from being sufficiently divided to be streamed efficiently.
In fact, a representation of the images of a chapter of a movie is still a long video, and keeping such a big file is not possible since heavy files prevent streaming adaptability: if the user requests to change the level of resolution of a video, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
A representation of the images of a chapter of a movie is still a long video, and keeping such a big file is not possible since heavy files prevent streaming adaptability: if the user requests to change the quality of a video, the system would either have to wait until the file is totally downloaded, or cancel the request, making all the progress done unusable.
Segments are used to prevent this issue.
They typically encode files that contain two to ten seconds of video, and give the software a greater ability to dynamically adapt to the system.
@@ -42,8 +42,8 @@ If a user wants to seek somewhere else in the video, only one segment of data is
\subsubsection{Content preparation and server}
Encoding a video in DASH format consists in partitioning the content into periods, adaptation sets, representations and segments as explained above, and generating a Media Presentation Description file (MPD) that describes this organisation.
Once the data is prepared, it can simply be hosted on a static HTTP server that does no computation other than serving files when it receives requests.
Encoding a video in DASH format consists in partitioning the content into periods, adaptation sets, representations and segments as explained above, and generating a Media Presentation Description file (MPD) which describes this organisation.
Once the data is prepared, it can simply be hosted on a static HTTP server which does no computation other than serving files when it receives requests.
All the intelligence and the decision making is moved to the client side.
This is one of the DASH strengths: no powerful server is required, and since static HTTP server are stable and efficient, all DASH clients can benefit from it.
@@ -51,7 +51,7 @@ This is one of the DASH strengths: no powerful server is required, and since sta
\subsubsection{Client side adaptation}
A client typically starts by downloading the MPD file, and then proceeds on downloading segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput, for example), buffer states or segments size to derive a decision on which segments should be downloaded next. Most of the industrial actors have of course their own adaptation logic, and many more have been proposed in the literature. A thorough review is beyond the scope of this state-of-the-art, but examples include \citep{chiariotti2016online} who formulate the problem in a reinforcement learning framework, \citep{yadav2017quetra} who formulate the problem using Queuing theory, or \citep{huang2019hindsight} who use a formulation derived from the Knapsack problem.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput, for example), buffer states or segments size to derive a decision on which segments should be downloaded next. Most of the industrial actors have their own adaptation logic, and many more have been proposed in the literature. A thorough review is beyond the scope of this state-of-the-art, but examples include \citep{chiariotti2016online} who formulate the problem in a reinforcement learning framework, \citep{yadav2017quetra} who formulate the problem using Queuing theory, or \citep{huang2019hindsight} who use a formulation derived from the Knapsack problem.
\subsection{DASH-SRD}
Being now widely adopted in the context of video streaming, DASH has been adapted to various other contexts.