Removed fresh copied and resetstyle

Thomas Forgione 2019-10-20 21:56:02 +02:00
parent 92029f43c7
commit b717777da1
31 changed files with 17 additions and 126 deletions

View File

@ -9,10 +9,6 @@
\newcommand{\argmax}[1]{\underset{#1}{\mathrm{argmax}\ }}
\newcommand{\copied}{\color{blue}}
\newcommand{\fresh}{\color{black}}
\newcommand{\resetstyle}{\color{black}}
\let\rawhref\href%
\renewcommand{\href}[2]{\rawhref{#1}{#2}\footnote{\url{#1}}}

View File

@ -1,4 +1,3 @@
\fresh{}
\section{Future work}
Successfully adapting the DASH framework to 3D content is a significant step that naturally opens many exciting perspectives.
@ -16,8 +15,8 @@ In order to account for semantic besides partitioning, we could also adapt the u
In this thesis, we considered different resolutions for textures, but we have not investigated geometry compression nor multi-resolution.
Geometry data is transmitted as OBJ files (mostly consisting of ASCII-encoded numbers), which is inefficient for transmission. Compression would reduce the size of the geometry files, thus increasing the quality of experience.
Supporting multi-resolution geometry would improve it even more, even though performing multi-resolution on a large and heterogeneous scene is difficult.
To this day, only a few works have considered multi-resolution for textured geometry~\citep{maglo20153d}, and their focus has been on 3D objects.
Once again, semantic information could be a great help in this regard.
To this day, only a few works have considered multi-resolution for textured geometry~\citep{maglo20153d}, and their focus has been on 3D objects.
Once again, semantic information could be a great help in this regard.
% we have no doubt that semantic information can help this task.
\subsection{Performance optimization}

View File

@ -1,7 +1,5 @@
\chapter{Conclusion}
\input{conclusion/contributions}
\resetstyle{}
\input{conclusion/future-work}
\resetstyle{}

View File

@ -1,4 +1,3 @@
\copied{}
\section{Client\label{d3:dash-client}}
In this section, we specify a DASH NVE client that exploits the preparation of the 3D content in an NVE for streaming.
@ -6,7 +5,6 @@ In this section, we specify a DASH NVE client that exploits the preparation of t
The generated MPD file describes the content organization so that the client gets all the necessary information to make educated decisions and query the 3D content it needs according to the available resources and current viewpoint.
A camera path generated by a particular user is a set of viewpoints $v(t_i)$ indexed by a continuous time interval $t_i \in [t_1,t_{end}]$.
\fresh{}
All DASH clients are built from the same basic bricks, as shown in Figure~\ref{d3:dash-scheme}:
\begin{itemize}
\item the \emph{access client}, which is the module that deals with making HTTP requests and receiving responses;
@ -77,7 +75,6 @@ All DASH clients are built from the same basic bricks, as shown in Figure~\ref{d
\caption{DASH client-server architecture\label{d3:dash-scheme}}
\end{figure}
\copied{}
The DASH client first downloads the MPD file to get the material (.mtl) file containing information about all the geometry and textures available for the entire 3D model.
At time instance $t_i$, the DASH client decides to download the appropriate segments containing the geometry and the texture to generate the viewpoint $v(t_{i+1})$ for the time instance $t_{i+1}$.
@ -210,8 +207,6 @@ s^{\texttt{GREEDY}}_i= \argmax{s \in \mathcal{S} \backslash \mathcal{B}_i \cap \
\label{d3:greedy}
\end{equation}
\fresh{}
\subsection{JavaScript client\label{d3:js-implementation}}
In order to be able to evaluate our system, we need to collect traces and perform analyses on them.

View File

@ -1,11 +1,9 @@
\section{Conclusion\label{d3:conclusion}}
\copied{}
Our work in this chapter started with the question: can DASH be used for NVE\@?
The answer is \emph{yes}.
In answering this question, we contributed by showing how to organize a polygon soup and its textures into a DASH-compliant format that (i) includes a minimal amount of metadata that is useful for the client, (ii) organizes the data to allow the client to get the most useful content first.
We further show that the data organisation and its description with metadata (precomputed offline) is sufficient to design and build a DASH client that is adaptive --- it selectively downloads segments within its view, makes intelligent decisions about what to download, balances between geometry and texture while adapting to network bandwidth.
\fresh{}
This way, our system addresses the open problems we mentioned in~\ref{i:challenges}.
\begin{itemize}

View File

@ -1,4 +1,3 @@
\copied{}
\section{Content preparation\label{d3:dash-3d}}
In this section, we describe how we pre-process and store the 3D data of the NVE, consisting of a polygon soup, textures, and material information into a DASH-compliant Media Presentation Description (MPD) file.
@ -14,7 +13,6 @@ This element does not apply to NVE, and we use a single \texttt{period} for the
Each \texttt{period} element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
We utilize adaptation sets to organize a 3D scene's material, geometry, and texture.
\fresh{}
The piece of software that does the preprocessing of the model mostly consists of file manipulation and is also written in Rust.
It successively preprocesses the geometry and then the textures.
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs} that works like a stack:
@ -25,8 +23,6 @@ The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{x
\end{itemize}
This structure is passed along with our geometry and texture preprocessors that can add elements to the XML file as they are generating the corresponding data chunks.
\copied{}
\subsection{Adaptation Sets}
When the user navigates freely within an NVE, the frustum at a given time almost always contains a limited part of the 3D scene.
Similar to how DASH for video streaming partitions a video clip into temporal chunks, we segment the polygons into spatial chunks, such that the DASH client can request only the relevant chunks.
@ -114,6 +110,5 @@ For textures, each representation contains a single segment.
]{assets/dash-3d/geometry-as.xml}
\end{figure}
\fresh{}
Now that the 3D data is partitioned and the MPD file is generated, we see in the next section how the client uses the MPD to request the appropriate data chunks.

View File

@ -1,4 +1,3 @@
\copied{}
\section{Evaluation\label{d3:evaluation}}
We now describe our setup and the data we use in our experiments. We present an evaluation of our system and a comparison of the impact of the design choices we introduced in the previous sections.

View File

@ -1,4 +1,3 @@
\fresh{}
\section{Introduction}
In this chapter, we take a little step back from interaction and propose a system with simple interactions that nevertheless addresses most of the open problems mentioned in Section~\ref{i:challenges}.

View File

@ -30,16 +30,8 @@ We finally evaluate these system parameters under different bandwidth setups and
\newpage
\input{dash-3d/introduction}
\resetstyle{}
\input{dash-3d/content-preparation}
\resetstyle{}
\input{dash-3d/client}
\resetstyle{}
\input{dash-3d/evaluation}
\resetstyle{}
\input{dash-3d/conclusion}
\resetstyle{}

View File

@ -1,4 +1,3 @@
\fresh{}
\section{Implementation details}
During this thesis, a lot of software has been developed, and for this software to be successful and efficient, we chose the appropriate languages.

View File

@ -1,10 +1,6 @@
\chapter{Foreword\label{f}}
\input{foreword/3d-model}
\resetstyle{}
\input{foreword/video-vs-3d}
\resetstyle{}
\input{foreword/implementation}
\resetstyle{}

View File

@ -1,5 +1,3 @@
\fresh{}
\section{Similarities and differences between video and 3D\label{i:video-vs-3d}}
Contrary to what one might think, the video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.

View File

@ -1,5 +1,3 @@
\fresh{}
\section{Open problems\label{i:challenges}}
The objective of our work is to design a system that allows a user to access remote 3D content and that guarantees both good quality of service and good quality of experience.

View File

@ -1,7 +1,5 @@
\chapter{Introduction\label{i}}
\fresh{}
During the last years, 3D acquisition and modeling techniques have made tremendous progress.
Recent software uses 2D images from cameras to reconstruct 3D data: for example, \href{https://alicevision.org/\#meshroom}{Meshroom}, a free and open source software with almost \numprint{200000} downloads on \href{https://www.fosshub.com/Meshroom.html}{fosshub}, uses \emph{structure-from-motion} and \emph{multi-view-stereo} to infer a 3D model.
There are more and more devices that are specifically built to harvest 3D data: some are still very expensive and provide precise information, such as LIDAR (Light Detection And Ranging, as in RADAR but with light instead of radio waves), while some cheaper devices, such as the Kinect, can obtain coarse data.
@ -42,13 +40,8 @@ In this thesis, we propose a full framework for navigation and streaming of larg
% Furthermore, there is a significant overlap between the 3D content visible from the current point in time to the next point in time (i.e., \textit{temporal data locality}).
% Both forms of locality lead to content overlaps, thus making a correct prediction easier and a wrong prediction less costly. 3D content overlaps are particularly common in a NVE with open space, such as a 3D archaeological site or a 3D city.
\resetstyle{}
\newpage
\input{introduction/challenges}
\resetstyle{}
\input{introduction/outline}
\resetstyle{}

View File

@ -45,4 +45,3 @@
\input{abstracts/main}
\end{document}

View File

@ -1,26 +1,11 @@
\frontmatter{}
\input{introduction/main}
\resetstyle{}
\mainmatter{}
\input{foreword/main}
\resetstyle{}
\input{state-of-the-art/main}
\resetstyle{}
\input{preliminary-work/main}
\resetstyle{}
\input{dash-3d/main}
\resetstyle{}
\input{system-bookmarks/main}
\resetstyle{}
\backmatter{}
\input{conclusion/main}

View File

@ -1,5 +1,3 @@
\copied{}
\section{Impact of 3D Bookmarks on Navigation\label{bi:3dnavigation}}
We now describe an experiment that we conducted on 51 participants, with two goals in mind.

View File

@ -1,5 +1,3 @@
\fresh{}
\section{Conclusion\label{bi:conclusion}}
In this chapter, we have described an interface that allows a user to navigate in a scene that is being streamed.

View File

@ -1,4 +1,3 @@
\copied{}
\section{Introduction}
Navigating in NVE with a large virtual space (most times through a 2D interface) is sometimes cumbersome.

View File

@ -3,8 +3,6 @@
\minitoc{}
\newpage
\fresh{}
\begin{figure}[th]
\centering
\begin{subfigure}[b]{0.45\textwidth}
@ -46,14 +44,7 @@ In the last part of this chapter, we simulate a streaming setup and we show that
\newcommand{\Arrows}{\textsf{Ar\xspace}}
\input{preliminary-work/intro}
\resetstyle{}
\input{preliminary-work/bookmarks-impact}
\resetstyle{}
\input{preliminary-work/streaming}
\resetstyle{}
\input{preliminary-work/conclusion}
\resetstyle{}

View File

@ -1,5 +1,3 @@
\copied{}
\section{Impact of 3D Bookmarks on Streaming\label{bi:system}}
\subsection{3D Model Streaming}

View File

@ -1,8 +1,6 @@
\section{3D Bookmarks and Navigation Aids}
\fresh{}
The only use for 3D streaming is to allow users to interact with the content while it is being downloaded.
\copied%
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.
Controlling the viewpoint in 3D (6 DOFs) with 2D devices is not only inherently challenging but also strongly task-dependent. In their review,~\citep{interaction-3d-environment} distinguish between several types of camera movements: general movements for exploration (e.g., navigation with no explicit target), targeted movements (e.g., searching and/or examining a model in detail), specified trajectory (e.g., a cinematographic camera path), etc.
For each type of movement, specialized 3D interaction techniques can be designed.

View File

@ -1,4 +1,3 @@
\fresh{}
\section{3D Streaming\label{sote:3d-streaming}}
In this thesis, our objective is to stream 3D scenes.
@ -165,7 +164,6 @@ Those tiles are then ordered to minimise dissimilarities between consecutive til
By benefiting from the video compression techniques, they are able to reach a better rate-distortion ratio than webp, which is the new standard for texture transmission, and jpeg.
However, the geometry / texture compromise is not the point of that paper.
% \copied{}
% \subsection{Prefetching in NVE}
% The general prefetching problem can be described as follows: what are the data most likely to be accessed by the user in the near future, and in what order do we download the data?
%

View File

@ -1,4 +1,3 @@
\fresh{}
In this chapter, we review a part of the state of the art on Multimedia streaming and interaction.
As discussed in the previous chapter, video and 3D share many similarities and since there is already a very important body of work on video streaming, we start this chapter with a review of this domain with a particular focus on the DASH standard.
Then, we proceed with presenting various topics related to 3D streaming, including compression, geometry and texture compromise, and viewpoint dependent streaming.

View File

@ -1,13 +1,7 @@
\chapter{Related work\label{sote}}
\input{state-of-the-art/intro}
\resetstyle{}
\input{state-of-the-art/video}
\resetstyle{}
\input{state-of-the-art/3d-streaming}
\resetstyle{}
\input{state-of-the-art/3d-interaction}
\resetstyle{}

View File

@ -1,9 +1,7 @@
\fresh{}
\section{Video\label{sote:vide}}
Accessing a remote video through the Web has been a widely studied problem since the 1990s. The Real-time Transport Protocol (RTP, \cite{rtp-std}) has been an early attempt
to formalize audio and video streaming. The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client. While this protocol can be useful in particular scenarios, such as video-conferencing, it cannot realistically scale to modern video streaming platforms such as YouTube or Netflix, which must serve millions of simultaneous clients.
Accessing a remote video through the Web has been a widely studied problem since the 1990s. The Real-time Transport Protocol (RTP,~\cite{rtp-std}) has been an early attempt
to formalize audio and video streaming. The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client. While this protocol can be useful in particular scenarios, such as video-conferencing, it cannot realistically scale to modern video streaming platforms such as YouTube or Netflix, which must serve millions of simultaneous clients.
Because of this limitation, and while the increasing network capabilities made video streaming a more and more common practice, a new trend emerged during the 2000s. Building on the democratization of HTTP servers, many industrial actors (Apple, Microsoft, Adobe, etc.) developed HTTP streaming systems to deliver multimedia content over the network. In an effort to bring interoperability between all different actors, the MPEG group launched an initiative which eventually became a standard known as DASH, Dynamic Adaptive Streaming over HTTP.
@ -11,7 +9,7 @@ Because of this limitation, and while the increasing network capabilities made v
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed
standard for adaptively streaming video on the Web \citep{dash-std-full}, made to be simple, scalable and inter-operable.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content should be downloaded, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content should be downloaded, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
\subsubsection{DASH structure}
@ -52,12 +50,12 @@ This is one of the DASH strengths: no powerful server is required, and since sta
\subsubsection{Client side adaptation}
A client typically starts by downloading the MPD file, and then proceeds on downloading segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput, for example), buffer states or segments size to derive a decision on which segments should be downloaded next. Most of the industrial actors have of course their own adaptation logic, and many more have been proposed in the literature. A thorough review is beyond the scope of this state-of-the-art, but examples include \citep{chiariotti2016online} who formulate the problem in a reinforcement learning framework, \citep{yadav2017quetra} who formulate the problem using Queuing theory, or \citep{huang2019hindsight} who use a formulation derived from the Knapsack problem.
A client typically starts by downloading the MPD file, and then proceeds on downloading segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput, for example), buffer states or segments size to derive a decision on which segments should be downloaded next. Most of the industrial actors have of course their own adaptation logic, and many more have been proposed in the literature. A thorough review is beyond the scope of this state-of-the-art, but examples include \citep{chiariotti2016online} who formulate the problem in a reinforcement learning framework, \citep{yadav2017quetra} who formulate the problem using Queuing theory, or \citep{huang2019hindsight} who use a formulation derived from the Knapsack problem.
\subsection{DASH-SRD}
Being now widely adopted in the context of video streaming, DASH has been adapted to various other contexts.
DASH-SRD (Spatial Relationship Description,~\citep{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
DASH-SRD (Spatial Relationship Description,~\citep{dash-srd}) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure~\ref{sota:srd-png}.
That way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video.
@ -93,11 +91,10 @@ An example of such a property is given in Listing~\ref{sota:srd-xml}.
]{assets/state-of-the-art/video/srd.xml}
\end{figure}
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed. While Figure~\ref{sota:srd-png} illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven particularly useful in the context of 360 video streaming (see for example \citep{ozcinar2017viewport}).
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed. While Figure~\ref{sota:srd-png} illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven particularly useful in the context of 360 video streaming (see for example \citep{ozcinar2017viewport}).
This is especially interesting in the context of 3D streaming, since we have the same pattern of a user viewing only a part of the content.
% \subsection{Prefetching in video streaming}
% \copied{}
%
% We briefly survey other research on prefetching that focuses on non-continuous interaction in other types of media.
%

View File

@ -1,5 +1,3 @@
\fresh{}
\section{Desktop and mobile interactions}\label{sb:interaction}
\subsection{Desktop interaction}
@ -19,7 +17,6 @@ A screenshot of this interface is displayed in Figure~\ref{sb:desktop}.
\subsection{Mobile interaction}
\copied{}
Mobile interactions are more complex because the user does not have the keyboard and mouse to interact with.
However, there are some other sensors on most mobile devices that can help interaction.
One useful sensor for 3D interaction on mobile devices is definitely the gyroscope.
@ -33,7 +30,7 @@ For this reason, we display a small joystick on the bottom-left corner of the sc
\item moving the joystick down makes the camera move backwards;
\item moving the joystick on the sides makes the camera move sideways.
\end{itemize}
A screenshot of this interface is displayed in Figure~\ref{sb:mobile}. The virtual joystick is rendered as a black circle inside a larger semi-transparent white circle. The black circle can be moved up, down, and sideways to define the direction in which the camera is translated.
A screenshot of this interface is displayed in Figure~\ref{sb:mobile}. The virtual joystick is rendered as a black circle inside a larger semi-transparent white circle. The black circle can be moved up, down, and sideways to define the direction in which the camera is translated.
\begin{figure}[ht]
@ -43,8 +40,7 @@ A screenshot of this interface is displayed in Figure~\ref{sb:mobile}. The virtu
\end{figure}
\section{Adding bookmarks into DASH NVE framework\label{sb:bookmarks}}
\fresh{}
While the previously defined interactions allow users to navigate freely throughout the scene, controlling such a high number of degrees of freedom can feel overwhelming to some users. That is why we introduce bookmarks, i.e. widgets that help the users reach a distant part of the scene using only a single, simple, interaction.
While the previously defined interactions allow users to navigate freely throughout the scene, controlling such a high number of degrees of freedom can feel overwhelming to some users. That is why we introduce bookmarks, i.e. widgets that help the users reach a distant part of the scene using only a single, simple, interaction.
\subsection{Bookmark interaction and visual aspect}
@ -127,7 +123,6 @@ Note that since on mobile, there is no mouse and thus no pointer, thumbnails are
\subsection{Segments utility at bookmarked viewpoint\label{sb:utility}}
\copied{}
Introducing bookmarks is a way to make user navigation more predictable.
Indeed, since they are emphasized and, in a way, recommended viewpoints, bookmarks are more likely to be visited by a significant portion of users than any other viewpoint on the scene.
As such, bookmarks can be used as a way to optimize streaming by downloading segments in an optimal, pre-computed order.
@ -137,7 +132,6 @@ When bookmarks are defined, it is possible to obtain a better measure of segment
% Then, by simply counting the number of pixels that are rendered using each segment, we can rank the segments by order of importance in the rendering.
We define $\mathcal{U}^{*} (s,B_i)$ as being the optimized utility of a segment $s$ in a viewpoint defined at bookmark $B_i$.
\fresh{}
In order to compute the optimized utility of a segment, we developed Algorithm~\ref{sb:algo-optimal-order}, that sorts segments according to their optimized utility.
This algorithm takes as input the considered viewpoint, the ground truth rendering from this viewpoint and the set of segments (both geometry and texture) to sort.
Starting from an empty model, each segment from the set of candidates is independently added to the scene, and the PSNR between the corresponding render and the ground truth render is computed.
@ -192,8 +186,6 @@ Note that this curve is averaged over all the 9 bookmarks of the scene. These bo
\subsection{MPD modification}
\copied{}
We now present how to include bookmarks information in the Media Presentation Description (MPD) file.
Bookmarks are fully defined by a position and a direction; the additional content needed to properly render and use a bookmark in a system consists of two files: a thumbnail of the point of view at the bookmark, along with the JSON file giving the optimal segment order for this viewpoint, as computed by Algorithm~\ref{sb:algo-optimal-order}.
For this reason, for each bookmark, we create a separate adaptation set in the MPD\@.

View File

@ -1,4 +1,3 @@
\fresh{}
\section{Conclusion}
In this chapter, our objective was to propose a mobile interface for DASH-3D and to integrate back the interaction aspects that we developed in Chapter~\ref{bi}.
%We have seen that doing so is not trivial, and many improvements have been made.

View File

@ -1,9 +1,7 @@
\fresh{}
\section{Introduction}
In Chapter~\ref{bi}, we described how it is possible to modify a user interface to ease user navigation in a 3D scene, and how the system can benefit from it.
In Chapter~\ref{d3}, we presented the DASH-3D streaming system, which does not depend on the interface nor on the user interaction.
In Chapter~\ref{d3}, we presented the DASH-3D streaming system, which does not depend on the interface nor on the user interaction.
In this chapter, we will analyze how the user interaction can impact the performance of DASH-3D.
In order to do so, we followed these two steps:

View File

@ -19,13 +19,7 @@ We then present a user study on 18 participants, that evaluate how users perceiv
\newpage
\input{system-bookmarks/introduction}
\resetstyle{}
\input{system-bookmarks/bookmark}
\resetstyle{}
\input{system-bookmarks/user-study}
\resetstyle{}
\input{system-bookmarks/conclusion}
\resetstyle{}

View File

@ -1,4 +1,3 @@
\fresh{}
\section{Evaluation}\label{sb:evaluation}
\subsection{Preliminary user study}