Thomas Forgione 2019-10-18 15:24:29 +02:00
parent 31f227b913
commit 04ab09b4fb
No known key found for this signature in database
GPG Key ID: 203DAEA747F48F41
8 changed files with 64 additions and 27 deletions


View File

@ -801,3 +801,14 @@
year={2015},
organization={ACM}
}
@article{maglo20153d,
title={{3D} mesh compression: Survey, comparisons, and emerging trends},
author={Maglo, Adrien and Lavou{\'e}, Guillaume and Dupont, Florent and Hudelot, C{\'e}line},
journal={ACM Computing Surveys (CSUR)},
volume={47},
number={3},
pages={44},
year={2015},
publisher={ACM}
}

View File

@ -33,8 +33,6 @@ anchorcolor = blue]{hyperref}
\setitemize{noitemsep,topsep=4pt,parsep=4pt,partopsep=0pt}
\pagestyle{scrheadings}
\refoot[]{Version built on \today{} at \currenttime{}}
\lofoot[]{Version built on \today{} at \currenttime{}}
\usepackage{tikz}
\usetikzlibrary{shadows}

View File

@ -10,17 +10,29 @@ In most cases, rotating, panning, and zooming movements are required, and users
Navigation aids and smart widgets are required and subject to research efforts both in 3D companies (see \url{sketchfab.com}, \url{cl3ver.com} among others) and in academia, as reported below.
Translating and rotating the camera can be simply specified by a \textit{lookat} point.
This is often known as point-of-interest (POI) movement (or \textit{go-to}, \textit{fly-to} interactions)~\citep{controlled-movement-virtual-3d}.
Given such a point, the camera automatically moves from its current position to a new position that looks at the POI\@.
One key issue of these techniques is to correctly orient the camera at destination.
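A point-of-interest movement can be sketched as a timed interpolation toward the destination that keeps the camera aimed at the POI throughout, which makes the orientation at destination correct by construction. The following is a hypothetical Python sketch, not the implementation of any cited system:

```python
import math

def lerp(a, b, t):
    """Linear interpolation between two 3D points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def fly_to(cam_pos, dest_pos, poi, steps=60):
    """Yield (position, forward) frames for a go-to movement.

    The camera eases from cam_pos to dest_pos while always facing the
    POI. Hypothetical sketch: frame count and easing are assumptions."""
    for i in range(1, steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep easing: no abrupt start/stop
        pos = lerp(cam_pos, dest_pos, t)
        forward = normalize(tuple(p - c for p, c in zip(poi, pos)))
        yield pos, forward
```

Each yielded frame can drive one render pass; at the last frame the camera sits at the destination and looks straight at the POI.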
In Unicam \citep{two-pointer-input}, the so-called click-to-focus strategy automatically chooses the destination viewpoint depending on 3D orientations around the contact point.
The recent Drag'n Go interaction \citep{drag-n-go} also hits a destination point while offering control on speed and position along the camera path.
This 3D interaction is designed in screen space (it is typically a mouse-based camera control), where the cursor's movements are mapped to camera movements following the same direction as the on-screen optical flow.
\begin{figure}[ht]
\centering
\includegraphics[width=0.7\textwidth]{assets/state-of-the-art/3d-interaction/dragngo.png}
\caption{Screenshot of the drag'n go interface~\citep{drag-n-go} (the percentage widget is for illustration)}
\end{figure}
Some 3D browsers provide a viewpoint menu offering a choice of viewpoints \citep{visual-perception-3d,showmotion}.
Authors of 3D scenes can place several viewpoints (typically for each POI) in order to allow easy navigation for users, who can then easily navigate from viewpoint to viewpoint just by selecting a menu item.
Such viewpoints can be either static or dynamically adapted:~\citep{dual-mode-ui} report that users clearly prefer navigating in 3D using a menu with animated viewpoints rather than static ones.
\begin{figure}[ht]
\centering
\includegraphics[width=0.5\textwidth]{assets/state-of-the-art/3d-interaction/burtnyk.png}
\caption{Screenshot of an interface with menu for navigation~\citep{showmotion}}
\end{figure}
Early 3D VRML environments \citep{browsing-3d-bookmarks} offer 3D bookmarks with animated transitions between bookmarked views.
These transitions prevent disorientation since users see how they got there.
Hyperlinks can also ease rapid movements between distant viewpoints and naturally support non-linear and non-continuous access to 3D content.
@ -29,7 +41,13 @@ Navigating with 3D hyperlinks is potentially faster, but is likely to cause diso
These authors point out that linking is appreciated by users and that easing linking would likely result in a richer user experience.
\citep{dual-mode-ui} developed the Dual-Mode User Interface (DMUI) that coordinates and links hypertext to 3D graphics in order to access information in a 3D space.
\begin{figure}[ht]
\centering
\includegraphics[width=1\textwidth]{assets/state-of-the-art/3d-interaction/dmui.png}
\caption{The two modes of DMUI~\citep{dual-mode-ui}}
\end{figure}
The use of in-scene 3D navigation widgets can also facilitate 3D navigation tasks.
\citep{navigation-aid-multi-floor} propose and evaluate 2D and 3D maps as navigation aids for complex virtual buildings and find that the 2D navigation aid outperforms the 3D one for searching tasks.
The ViewCube widget \citep{viewcube} serves as a proxy for the 3D scene and offers viewpoint switching between 26 views while clearly indicating associated 3D orientations.
Interactive 3D arrows that point to objects of interest have also been proposed as navigation aids in~\citep{location-pointing-navigation-aid,location-pointing-effect}: when clicked, the arrows transfer the viewpoint to the destination through a simulated walk or a faster flight.

View File

@ -7,8 +7,19 @@ In the next sections, we review the 3D streaming related work, from 3D compressi
\subsection{Compression and structuring}
According to \citep{maglo20153d}, mesh compression can be divided into four categories:
\begin{itemize}
\item single-rate mesh compression, seeking to reduce the size of a mesh;
\item progressive mesh compression, encoding meshes in many levels of resolution that can be downloaded and rendered one after the other;
\item random accessible mesh compression, where parts of the models can be decoded in any order;
\item mesh sequence compression, compressing mesh animations.
\end{itemize}
Since our objective is to stream static 3D scenes, single-rate mesh compression and mesh sequence compression are less relevant to our work.
This section thus focuses on progressive meshes and random accessible mesh compression.
Progressive meshes were introduced in~\citep{progressive-meshes} and allow a progressive transmission of a mesh by sending a low resolution mesh first, called \emph{base mesh}, and then transmitting detail information that a client can use to increase the resolution.
To do so, an algorithm, called the \emph{decimation algorithm}, starts from the original full resolution mesh and iteratively removes vertices and faces by merging vertices through the so-called \emph{edge collapse} operation (Figure~\ref{sote:progressive-scheme}).
\begin{figure}[ht]
\centering
@ -65,10 +76,10 @@ To do so, an algorithm, called \emph{decimation algorithm}, starts from the orig
\caption{Vertex split and edge collapse\label{sote:progressive-scheme}}
\end{figure}
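The edge collapse step can be sketched as follows, assuming a minimal mesh representation (a vertex dictionary and a triangle list); a real decimation algorithm would also choose which edge to collapse (e.g.\ by quadric error) and record the inverse vertex split for progressive transmission:

```python
def edge_collapse(vertices, faces, u, v):
    """Collapse edge (u, v): merge vertex v into vertex u.

    Minimal illustrative sketch of one decimation step."""
    new_faces = []
    for f in faces:
        g = tuple(u if i == v else i for i in f)
        if len(set(g)) == 3:  # drop triangles that became degenerate
            new_faces.append(g)
    new_vertices = {i: p for i, p in vertices.items() if i != v}
    return new_vertices, new_faces

# A tetrahedron: 4 vertices, 4 triangular faces.
verts = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
verts2, tris2 = edge_collapse(verts, tris, 0, 1)
# One vertex and the two faces incident to edge (0, 1) are gone.
```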
Every time two vertices are merged, a vertex and two faces are removed from the original mesh, decreasing the resolution of the model.
After content preparation, the mesh consists in a base mesh and a sequence of partially ordered edge split operations.
Thus, a client can start by downloading the base mesh, display it to the user, and keep downloading and displaying refinement operations (vertex splits) as time goes by.
This process reduces the time a user has to wait before seeing something, thus increasing the quality of experience.
These methods have been extensively researched \citep{bayazit20093,mamou2010shape}, but very few of them can handle meshes with attributes, such as texture coordinates.
@ -76,7 +87,7 @@ These methods have been vastly researched \citep{bayazit20093,mamou2010shape}, b
With the same objective, \citep{pop-buffer} proposes pop buffer, a progressive compression method based on quantization that allows efficient decoding.
Following this, many approaches use multi triangulation, which creates mesh fragments at different levels of resolution and encodes the dependencies between fragments in a directed acyclic graph.
In \citep{batched-multi-triangulation}, the authors propose Nexus: a GPU-optimized version of multi triangulation whose performance makes real-time rendering possible.
It is notably used in 3DHOP (3D Heritage Online Presenter, \citep{3dhop}), a framework to easily build web interfaces to present 3D models to users in the context of cultural heritage.
Each of these approaches defines its own compression and coding for a single mesh.
@ -84,12 +95,12 @@ However, users are frequently interested in scenes that contain many meshes, and
To answer those issues, the Khronos group proposed a generic format called glTF (GL Transmission Format,~\citep{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.\
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It contains a scene graph with cameras, meshes, buffers, materials, textures, animations, and skinning information.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualisation and which we address in our work.
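A glTF asset is essentially such a JSON scene graph. The following minimal sketch uses field names from the glTF 2.0 specification; a real asset would also need accessors, buffer views, and binary buffers holding the actual geometry:

```python
import json

# Minimal glTF-style document (sketch): a scene graph with one node
# holding a mesh and a child node holding a camera.
gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [
        {"mesh": 0, "children": [1]},
        {"camera": 0},
    ],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "cameras": [{"type": "perspective",
                 "perspective": {"yfov": 1.0, "znear": 0.01}}],
}
document = json.dumps(gltf, indent=2)
```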
% Zampoglou
\citep{zampoglou} is the first paper that proposes to use DASH to stream 3D content.
In their work, the authors describe a system that allows users to access 3D content at multiple resolutions.
They organize the content, following DASH terminology, into periods, adaptation sets, and representations.
Their first adaptation set codes the tree structure of the scene graph.
@ -98,7 +109,7 @@ To avoid requests that would take too long and thus introduce latency, the repre
The authors discuss the optimal number of polygons that should be stored in a single segment.
On the one hand, using segments containing very few faces will induce many HTTP requests from the client, and will lead to poor streaming efficiency.
On the other hand, if segments contain too many faces, the time to load the segment will be long and the system loses adaptability.
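This compromise can be made concrete with a back-of-the-envelope cost model; the numbers below are illustrative assumptions, not taken from the paper:

```python
def total_fetch_time(total_faces, faces_per_segment,
                     rtt=0.05, bandwidth=1e6, bytes_per_face=32):
    """Rough cost model for the segment-size compromise.

    Each HTTP request pays one round trip (rtt, in seconds) on top of
    the transfer time, so tiny segments multiply latency costs while
    huge segments delay the first usable data. All constants here are
    illustrative assumptions."""
    segments = -(-total_faces // faces_per_segment)  # ceiling division
    transfer = total_faces * bytes_per_face / bandwidth
    return segments * rtt + transfer

# Request latency dominates when segments are tiny:
small = total_fetch_time(100_000, 100)      # 1000 requests
large = total_fetch_time(100_000, 10_000)   # 10 requests
```

Under this model the transfer time is fixed, so the segment count alone drives the latency overhead, which is why a middle ground must be found.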
Their approach works well for several objects, but does not handle view-dependent streaming, which is desirable in the use case of large NVEs\@.
\subsection{Viewpoint dependency}
@ -108,7 +119,7 @@ This means that the progressive compression must allow a decoder to choose what
This is typically called \emph{random accessible mesh compression}.
\citep{maglo2013pomar} is one such example of random accessible progressive mesh compression.
In the case of streaming a large 3D scene, viewpoint-dependent streaming is a must-have: a user only sees a small portion of the scene at any given time, and a system that does not adapt its streaming to the user's point of view is bound to provide a poor quality of experience.
A simple way to implement viewpoint dependency is to access the content near the user's camera.
This approach, implemented in Second Life and several other NVEs (e.g.,~\citep{peer-texture-streaming}), only depends on the location of the avatar, not on its viewing direction.
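Such a location-only policy can be sketched as a sort on Euclidean distance; this is a hypothetical helper, not the actual code of any cited system:

```python
import math

def fetch_order(avatar_pos, objects):
    """Order scene objects by distance to the avatar.

    Mimics a location-only policy: the viewing direction is ignored,
    so an object behind the avatar ranks as high as a visible one."""
    return sorted(objects, key=lambda o: math.dist(avatar_pos, o["center"]))
```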
@ -118,28 +129,28 @@ A simple approach is to retrieve the objects based on distance: the spatial dist
More recently, Google integrated Google Earth 3D module into Google Maps.
Users are now able to go to Google Maps and click the 3D button, which shifts the camera away from the top-down view.
Even though there are no associated publications, it seems that the interface performs view-dependent streaming: low resolution data around the center of the viewpoint is downloaded right away, and then data that is farther away, or of higher resolution, is downloaded in a second stage.
The choice of nearby content can be based on an a priori, discretized, partitioned version of the environment; for example, \citep{3d-tiles} developed 3D Tiles, a specification by Cesium for visualizing massive 3D geospatial data, built on top of glTF\@.
Their main goal is to display 3D objects on top of regular maps.
\subsection{Geometry and textures}
As discussed in Chapter~\ref{f:3d}, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, a system must therefore handle the compromise between geometry and textures, deciding how to distribute bandwidth between the two.
Balancing between streaming of geometry and texture data is addressed by~\citep{batex3},~\citep{visual-quality-assessment}, and~\citep{mesh-texture-multiplexing}.
Their approaches combine the distortion caused by having lower resolution meshes and textures into a single view independent metric.
\citep{progressive-compression-textured-meshes} also deals with the geometry / texture compromise.
This work designs a cost driven framework for 3D data compression, both in terms of geometry and textures.
This framework generates an atlas for textures that enables efficient compression and multiresolution scheme.
All four works consider a single, manifold textured mesh model with progressive meshes, and are not applicable in our work, since we deal with large scenes containing self-intersections and T-vertices.
Regarding texture streaming, \citep{simon2019streaming} propose a way to stream a set of textures by encoding the textures into a video.
Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimise dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, they reach a better rate-distortion ratio than both WebP, the new standard for texture transmission, and JPEG.
However, the geometry / texture compromise is not the point of that paper.
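The tile-ordering step can be approximated with a greedy nearest-neighbour pass; this is a sketch assuming some tile dissimilarity function, and the cited work may order tiles differently:

```python
def order_tiles(tiles, dissimilarity):
    """Greedily chain tiles so that consecutive tiles are similar,
    which helps the video codec's inter-frame prediction."""
    remaining = list(tiles)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda t: dissimilarity(ordered[-1], t))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```

With tiles stood in for by numbers and absolute difference as the dissimilarity, the pass chains each tile to its closest remaining neighbour.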
% \copied{}
% \subsection{Prefetching in NVE}
% The general prefetching problem can be described as follows: what are the data most likely to be accessed by the user in the near future, and in what order do we download the data?

View File

@ -2,5 +2,4 @@
In this chapter, we review part of the state of the art on multimedia streaming and interaction.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a substantial body of work on video streaming, we start this chapter with a review of that domain, with a particular focus on the DASH standard.
Then, we proceed with presenting various topics related to 3D streaming, including compression, geometry and texture compromise, and viewpoint dependent streaming.
Finally, we end this chapter by reviewing the related work regarding 3D navigation and interfaces.