From 0780f056b7fe71d10a8d8120602500ad2c1ed913 Mon Sep 17 00:00:00 2001
From: Thomas Forgione
Date: Wed, 16 Oct 2019 16:11:52 +0200
Subject: [PATCH] This is shit

---
 src/bib.bib                           | 10 ++++++
 src/state-of-the-art/3d-streaming.tex | 48 +++++++++++++++------------
 src/state-of-the-art/video.tex        | 15 ++++++---
 3 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/src/bib.bib b/src/bib.bib
index 88ab5ce..5ef734d 100644
--- a/src/bib.bib
+++ b/src/bib.bib
@@ -658,3 +658,13 @@
   year={2013},
   organization={Wiley Online Library}
 }
+
+
+@inproceedings{streaming-compressed-webgl,
+  title={Streaming compressed 3D data on the web using JavaScript and WebGL},
+  author={Lavou{\'e}, Guillaume and Chevalier, Laurent and Dupont, Florent},
+  booktitle={Proceedings of the 18th International Conference on 3D Web Technology},
+  pages={19--27},
+  year={2013},
+  organization={ACM}
+}
diff --git a/src/state-of-the-art/3d-streaming.tex b/src/state-of-the-art/3d-streaming.tex
index 7c77c48..95d2499 100644
--- a/src/state-of-the-art/3d-streaming.tex
+++ b/src/state-of-the-art/3d-streaming.tex
@@ -3,8 +3,8 @@
 
 \subsection{Compression and structuring}
 
-The most popular compression model for 3D is progressive meshes: they were introduced in~\citep{progressive-meshes} and allow transmitting a mesh by sending a low resolution mesh first, called \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
-To do so, an algorithm, called \emph{decimation algorithm} removes vertices and faces by merging vertices (Figure~\ref{sote:progressive-scheme}).
+The most popular compression model for 3D is the progressive mesh: introduced in~\citep{progressive-meshes}, it allows progressive transmission of a mesh by first sending a low-resolution version, called the \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
+To do so, a \emph{decimation algorithm} starts from the original mesh and iteratively removes vertices and faces by merging vertices through the so-called \emph{edge collapse} operation (Figure~\ref{sote:progressive-scheme}).
 
 \begin{figure}[ht]
     \centering
@@ -62,25 +62,25 @@ \end{figure}
 
 Every time two vertices are merged, vertices and faces are removed from the original mesh, and the resolution of the model decreases a little.
-When the model is light enough, it is encoded as is, and the operations needed to recover the initial resolution of the model are encoded as well.
-Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
+After content preparation, the mesh consists of a base mesh and a sequence of partially ordered \emph{vertex split} operations, the inverses of the recorded edge collapses.
+Thus, a client can start by downloading the base mesh, display it to the user, and keep downloading and displaying details as they arrive.
 This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
+\citep{streaming-compressed-webgl} develop a dedicated progressive compression algorithm with fast decoding, so that it remains usable in web clients.
+With the same objective, \citep{pop-buffer} proposes the pop buffer, a progressive compression method based on quantization that allows efficient decoding.
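+
+To make this principle concrete, the following sketch shows the client-side refinement loop in Python; the structure of the refinement records is hypothetical and does not correspond to any specific implementation from the cited works:
+
+\begin{verbatim}
+# Illustrative sketch of progressive mesh refinement (not the actual
+# algorithm of the cited papers). The client first receives the base
+# mesh, then a stream of vertex splits inverting the recorded edge
+# collapses, each one adding detail back to the model.
+class ProgressiveMesh:
+    def __init__(self, base_vertices, base_faces):
+        self.vertices = list(base_vertices)  # low-resolution base mesh
+        self.faces = list(base_faces)
+
+    def apply_vertex_split(self, split):
+        # 'split' is a hypothetical record carrying the new vertex and
+        # the faces to add around it.
+        self.vertices.append(split["new_vertex"])
+        self.faces.extend(split["new_faces"])
+
+def refine(mesh, splits):
+    # The model is displayable after each record: its resolution keeps
+    # increasing while the download is still in progress.
+    for split in splits:
+        mesh.apply_vertex_split(split)
+\end{verbatim}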
+
 Following this, many approaches use multi-triangulation, which creates mesh fragments at different levels of resolution and encodes the dependencies between fragments in a directed acyclic graph.
 \citep{batched-multi-triangulation} proposes a GPU-optimized version of multi-triangulation that pushes its performance to real time.
 It is notably used in 3DHOP (3D Heritage Online Presenter, \citep{3dhop}), a framework for easily building web interfaces that present 3D models to users, in the context of cultural heritage.
 
-Some other approaches use voxels in order to progressively stream 3D models.
-It is the case of~\citep{pop-buffer}, which proposes the pop buffer, a progressive compression method that allows efficient decoding which is useful, in particular for mobile devices.
-
-More recently, to answer the need for a standard format for 3D data, the Khronos group has proposed a generic format called glTF (GL Transmission Format,~\citep{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated model, etc.
+More recently, to answer the need for a standard format for 3D data, the Khronos group proposed a generic format called glTF (GL Transmission Format,~\citep{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.\
 glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
-It can contain a scene tree with cameras, meshes, buffers, materials, textures, animations an skinning information.
+It contains a scene tree with cameras, meshes, buffers, materials, textures, animations, and skinning information.
 Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for the remote visualisation of large scenes.
 
 % Zampoglou
-\citep{zampoglou} are the first to propose DASH to stream 3D content.
+\citep{zampoglou} are the first to propose using DASH to stream 3D content.
 In their work, the authors describe a system that allows users to access 3D content at multiple resolutions.
 They organize the content, following DASH terminology, into periods, adaptation sets, and representations.
 Their first adaptation set encodes the tree structure of the scene graph.
@@ -91,18 +91,6 @@ On the one hand, using segments containing very few faces will induce many HTTP requests.
 On the other hand, if segments contain too many faces, the time to load a segment will be long and the system loses adaptability.
 This approach works well for several objects, but does not handle view-dependent streaming, which is desirable in the use case of large NVEs\@.
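+
+To make this segment-size tradeoff concrete, here is a minimal sketch in Python (the numbers are illustrative and not taken from~\citep{zampoglou}) that partitions the faces of a mesh into fixed-size segments; the segment size directly controls the number of HTTP requests and the granularity of adaptation:
+
+\begin{verbatim}
+# Illustrative sketch: partitioning mesh faces into fixed-size
+# segments for DASH-style delivery.
+def make_segments(faces, faces_per_segment):
+    return [faces[i:i + faces_per_segment]
+            for i in range(0, len(faces), faces_per_segment)]
+
+faces = list(range(120000))          # hypothetical mesh, 120k faces
+small = make_segments(faces, 100)    # 1200 segments: many HTTP requests
+large = make_segments(faces, 60000)  # 2 segments: long loads, poor adaptability
+\end{verbatim}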
 
-\subsection{Viewpoint dependency}
-
-3D Tiles \citep{3d-tiles} is a specification for visualizing massive 3D geospatial data developed by Cesium and built on top of glTF\@.
-Their main goal is to display 3D objects on top of regular maps, and the data they use is quite different from ours: while they have nice and regular polygons with all the semantic they need, we only work on a polygon soup with textures.
-Their use case is also different from ours, while their interface allows a user to have a top vision of a city, we want our users to move inside a city.
-
-Another way to implement viewpoint dependency is to access the content near the user's camera.
-This approach, implemented in Second Life and several other NVEs (e.g.,~\citep{peer-texture-streaming}), only depends on the location of the avatar, not on its viewing direction.
-It exploits spatial locality and works well for any continuous movement of the user, including turning.
-Once the set of objects that are likely to be accessed by the user is determined, the next question is in what order should these objects be retrieved.
-A simple approach is to retrieve the objects based on distance: the spatial distance from the user's virtual location and rotational distance from the user's view.
-
 \subsection{Geometry and textures}
 
 As discussed in Chapter~\ref{f:3d}, meshes consist of two main types of data: geometry and textures.
@@ -116,6 +104,22 @@ Their approach is to combine the distortion caused by having lower resolution me
 This work designs a cost-driven framework for 3D data compression, both in terms of geometry and textures.
 This framework generates an atlas for textures that enables efficient compression and a multiresolution scheme.
 
+\subsection{Viewpoint dependency}
+
+In the case of large-scene 3D streaming, viewpoint-dependent streaming is a must-have: a user only sees a small portion of the scene at any given time, and a system that does not adapt its streaming to the user's point of view is bound to offer a poor quality of experience.
+
+A simple way to implement viewpoint dependency is to access the content near the user's camera.
+This approach, implemented in Second Life and several other NVEs (e.g.,~\citep{peer-texture-streaming}), only depends on the location of the avatar, not on its viewing direction.
+It exploits spatial locality and works well for any continuous movement of the user, including turning.
+Once the set of objects that are likely to be accessed by the user is determined, the next question is in what order these objects should be retrieved.
+A simple approach is to retrieve the objects based on distance: the spatial distance from the user's virtual location, and the rotational distance from the user's viewing direction.
+
+More recently, Google integrated the Google Earth 3D module into Google Maps.
+Users can now open Google Maps and click the 3D button, which tilts the camera away from the vertical point of view.
+Even though there are no associated publications, the interface seems to perform view-dependent streaming: low-resolution data at the center of the view is downloaded right away, and then data that is farther away, or of higher resolution, is downloaded.
+
+In the same vein, Cesium developed 3D Tiles \citep{3d-tiles}, a specification for visualizing massive 3D geospatial data, built on top of glTF\@.
+Their main goal is to display 3D objects on top of regular maps.
 
 % \copied{}
 
 % \subsection{Prefetching in NVE}
diff --git a/src/state-of-the-art/video.tex b/src/state-of-the-art/video.tex
index cfca40e..58505a2 100644
--- a/src/state-of-the-art/video.tex
+++ b/src/state-of-the-art/video.tex
@@ -8,7 +8,7 @@
 
 Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed standard for streaming adaptive video content on the Web \citep{dash-std-full}, designed to be simple and scalable.
 \fresh{}
-DASH is based on a clever way of structuring the content that allows a great adaptability during the streaming without requiring any server side computation.
+DASH is based on a clever way of preparing and structuring a video that allows great adaptability during streaming without requiring any server-side computation.
 
 \subsubsection{DASH structure}
 
@@ -39,15 +39,20 @@ Segments are used to prevent this issue.
 They typically encode files that contain approximately one second of video, and give the software a great ability to adapt dynamically to changing conditions.
 If a user wants to seek somewhere else in the video, at most one second of data is lost, and at most one second of data has to be downloaded for playback to resume.
 
+\subsubsection{Content preparation and server}
+
+Encoding a video in DASH format consists of partitioning the content into periods, adaptation sets, representations, and segments, as explained above, and generating a Media Presentation Description (MPD) file that describes this organisation.
+Once the data is prepared, it can simply be hosted on a static HTTP server that does no computation other than serving files when it receives requests.
+All the intelligence and decision making is moved to the client side.
+This is one of the strengths of DASH\@: no powerful server is required, and since static HTTP servers have been studied since the beginning of the internet, they are stable and efficient, and all DASH clients can benefit from them.
+
 \subsubsection{Client side computation}
 
-Once a video is encoded in DASH format, all the files have been structured and the MPD has been generated, all this data can simply be hosted on a static HTTP server that does no computation other than serving files when it receives requests.
-All the intelligence and the decision making is moved to the client side.
 A client typically starts by downloading the MPD file, and then proceeds to download segments from the different adaptation sets it needs, estimating its own download speed and deciding by itself whether or not to switch representations.
-This is one of the strengths of DASH\@: no powerful server is required, and since static HTTP server are studied since the beginning of the internet, they are optimized well and all DASH clients can benefit from it.
 
 \subsection{DASH-SRD}
-
+DASH has already been extended within the setting of video streaming.
 DASH-SRD (Spatial Relationship Description,~\citep{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
 It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure~\ref{sota:srd-png}.
 That way, a client can choose to download either the low-resolution version of the whole video or higher-resolution versions of a spatial subpart of it.
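+
+To illustrate how a client could exploit this tiling, here is a minimal sketch in Python (the grid and viewport parameters are hypothetical) that selects the high-resolution tiles intersecting the current viewport, the rest of the frame being covered by the low-resolution version:
+
+\begin{verbatim}
+# Illustrative sketch: selecting the DASH-SRD tiles that intersect the
+# viewport. Each tile is a rectangle (x, y, w, h) in the reference
+# space of the full video, as described by its SRD descriptor.
+def overlaps(a, b):
+    ax, ay, aw, ah = a
+    bx, by, bw, bh = b
+    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
+
+def tiles_to_fetch(grid_w, grid_h, tile_w, tile_h, viewport):
+    tiles = []
+    for i in range(grid_w):
+        for j in range(grid_h):
+            tile = (i * tile_w, j * tile_h, tile_w, tile_h)
+            if overlaps(tile, viewport):
+                tiles.append((i, j))  # fetch this tile in high resolution
+    return tiles
+
+# A 4x4 tiling of a 3840x2160 video with a viewport near the top-left
+# corner: only tiles (0, 0), (0, 1), (1, 0) and (1, 1) are fetched in
+# high resolution.
+print(tiles_to_fetch(4, 4, 960, 540, (100, 100, 1600, 900)))
+\end{verbatim}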