Thierry's fix

This commit is contained in:
Thomas Forgione 2019-10-11 11:06:22 +02:00
parent beee676839
commit bd7bc84eee
No known key found for this signature in database
GPG Key ID: 203DAEA747F48F41
19 changed files with 36 additions and 38 deletions

@ -13,7 +13,7 @@ Secondly, we propose an adaptation of Dynamic Adaptive Streaming over HTTP (DASH
To do so, we cut the scene into a k-d tree where each cell corresponds to a DASH adaptation set.
Each cell is further divided into segments of a fixed number of faces, grouping together faces of similar areas.
Each texture is stored in its own adaptation set, and multiple representations are available for different resolutions of the textures.
All the metadata (the cells of the k-d tree, the resolutions of the textures, etc...) is encoded in an XML file.
All the metadata (the cells of the k-d tree, the resolutions of the textures, etc.) is encoded in an XML file.
We then propose a client capable of evaluating the usefulness of each chunk of data, and a few streaming policies that decide which chunks to download.
Finally, we investigate the case of 3D streaming and navigation on mobile devices.

@ -13,7 +13,7 @@ In a second step, we adapt Dynamic Adaptive Streaming over HTTP (DASH),
To do so, we split the scene into a k-d tree where each cell corresponds to a DASH adaptation set.
Each cell is further divided into segments of a certain number of faces, in which faces of similar areas are grouped together.
Each texture is stored in its own adaptation set, and several representations are available for each texture resolution.
All the metadata (the cells of the k-d tree, the texture resolutions, etc...) is encoded in an XML file.
All the metadata (the cells of the k-d tree, the texture resolutions, etc.) is encoded in an XML file.
We then propose a client able to evaluate the usefulness of each chunk, and a few streaming policies that decide which chunks to download.
Finally, we consider the case of 3D streaming and navigation on mobile devices.

@ -5,7 +5,7 @@ In this thesis, we have presented three main contributions.
\paragraph{}
First, we set up a basic system allowing 3D navigation and 3D content streaming.
We developed a navigation aid in the form of 3D bookmarks, and we conducted a user study to analyse its impact on navigation and streaming.
On one hand, we concluded that navigation aid help people navigating in a scene, they can perform tasks faster and more easily.
On the one hand, we concluded that navigation aids help people navigate in a scene: they can perform tasks faster and more easily.
On the other hand, we showed that this help in 3D navigation comes at the cost of a negative impact on the quality of service: since users navigate faster, they require more data to perform in the same way.
However, we also showed that this cost is not inevitable.
Thanks to the prior knowledge we have about bookmarks, we can precompute data offline and use it when users click on bookmarks to improve the quality of service.

@ -1,17 +1,17 @@
\fresh{}
\section{Future work}
After all this, there is still plenty of avenues for improving our streaming system.
Beyond these contributions, we see three major perspectives for future work.
\subsection{Semantic information}
In all of this work, no attention has been paid to semantics.
Our content preparation does not take semantics into account, and our adaptation sets and segments will inevitably split data that could be grouped semantically.
Having semantic support can help us have a better structure for our content: we know that displaying half a building will lead to low quality of experience.
Having semantic support can help us build a better structure for our content: we know that displaying half a building will lead to a low quality of experience.
Moreover, semantic data could also change the utilities we have been defining for our segments: some data can be marked as more important than others, and this can be taken into account in our utilities.
\subsection{Compression / multi-resolution for geometry}
In all of this work, no attention has been paid to geometry compression or multi-resolution.
Geometry data is transmitted as OBJ files which is the worst possible format for transmission, and compression could drastically increase the quality of experience.
Geometry data is transmitted as OBJ files, which mostly consist of ASCII-encoded numbers: a format that is terrible for transmission, and compression could drastically increase the quality of experience.
Supporting multi-resolution geometry would be even better, and even if building a multi-resolution representation of a large and heterogeneous scene like ours is difficult, we have no doubt that semantic information can help with this task.
\subsection{Performance optimization}

@ -220,7 +220,7 @@ Since our scene is large, and since the system we are describing allows navigati
\subsubsection{Media engine}
Of course, in this work, we are concerned about the performance of our system, and we will not be able to use the normal geometries described in Section~\ref{f:geometries}.
However, the way our system works, the way changes happen to the 3D content is always the same: we only add faces and textures to the model.
However, in our system, changes to the 3D content always happen in the same way: we only add faces and textures to the model.
Therefore, we made a class that derives from BufferGeometry and that is more convenient for our use case (a sketch is given after the list below).
\begin{itemize}
\item It has a constructor that takes the number of faces as a parameter: it allocates all the memory needed for our buffers so that we do not have to reallocate it later, which would be inefficient.
@ -230,8 +230,8 @@ Therefore, we made a class that derives BufferGeometry, and that makes it more c
\end{itemize}
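As an illustration, a minimal JavaScript sketch of such a class could look as follows. This is a simplified reconstruction under our own naming (\texttt{PreallocatedGeometry}, \texttt{addFace}), not the actual implementation, and it assumes a recent THREE.js API (\texttt{setAttribute}, \texttt{setDrawRange}).
\begin{verbatim}
import * as THREE from 'three';

// Simplified sketch: a geometry that preallocates its buffers for a known
// number of faces and only ever grows by appending faces.
class PreallocatedGeometry extends THREE.BufferGeometry {
  constructor(faceCount) {
    super();
    // 3 vertices per face, 3 coordinates (resp. 2 UV components) per vertex.
    this.positions = new THREE.BufferAttribute(new Float32Array(faceCount * 9), 3);
    this.uvs = new THREE.BufferAttribute(new Float32Array(faceCount * 6), 2);
    this.setAttribute('position', this.positions);
    this.setAttribute('uv', this.uvs);
    this.faceCount = 0;
    this.setDrawRange(0, 0); // nothing to draw yet
  }

  // Appends one face; faceVertices is three [x, y, z] triples and
  // faceUvs is three [u, v] pairs.
  addFace(faceVertices, faceUvs) {
    for (let i = 0; i < 3; i++) {
      this.positions.setXYZ(this.faceCount * 3 + i, ...faceVertices[i]);
      this.uvs.setXY(this.faceCount * 3 + i, ...faceUvs[i]);
    }
    this.faceCount++;
    this.setDrawRange(0, this.faceCount * 3); // draw only the filled part
    this.positions.needsUpdate = true;
    this.uvs.needsUpdate = true;
  }
}
\end{verbatim}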
\paragraph{Our 3D model class.\label{d3:model-class}}
As said in the previous subsections, a geometry and a material a bound together in a mesh.
This means that we are forced to have has many meshes as there are materials in our model.
As said in the previous subsections, a geometry and a material are bound together in a mesh.
This means that we are forced to have as many meshes as there are materials in our model.
To make this easy to manage, we made a \textbf{Model} class that holds everything we need.
We can add vertices, faces, and materials to this model, and it will internally deal with the right geometries, materials and meshes.
In order to avoid having many meshes with the same material, which would harm performance, the class automatically merges faces that share the same material into the same buffer geometry, as shown in Figure~\ref{d3:render-structure}.
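The per-material merging itself can be sketched as follows (again with hypothetical names; \texttt{PreallocatedGeometry} refers to the sketch above, and \texttt{reservedFaces} is an assumed size hint, not a parameter of the actual code):
\begin{verbatim}
// Sketch of the per-material merging: one geometry and one mesh per material.
class Model extends THREE.Group {
  constructor() {
    super();
    this.parts = new Map(); // material name -> { geometry, mesh }
  }

  // reservedFaces is a size hint used to preallocate the geometry buffers.
  addFace(faceVertices, faceUvs, materialName, material, reservedFaces) {
    let part = this.parts.get(materialName);
    if (part === undefined) {
      const geometry = new PreallocatedGeometry(reservedFaces);
      const mesh = new THREE.Mesh(geometry, material);
      part = { geometry, mesh };
      this.parts.set(materialName, part);
      this.add(mesh); // one mesh per material in the THREE.Group
    }
    part.geometry.addFace(faceVertices, faceUvs);
  }
}
\end{verbatim}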
@ -473,7 +473,7 @@ In this setup, we want to build a system that is the closest to our theoretical
Therefore, we do not have a full client in Rust (meaning an application to which you would give the URL to an MPD file and that would allow you to navigate in the scene while it is being downloaded).
In order to be able to run simulations, we develop the building blocks of the DASH client separately: the access client and the media engine are kept completely separate:
\begin{itemize}
\item the \textbf{simulator} takes a user trace as a parameter, it then replays the trace using specific parameters of the access client and outputs a file containing the history of the simulation (what files have been downloaded, and when);
\item the \textbf{simulator} takes a user trace as a parameter; it replays the trace using specific parameters of the access client and outputs a file containing the history of the simulation (which files have been downloaded, and when);
\item the \textbf{renderer} takes the user trace as well as the history generated by the simulator as parameters, and renders images that correspond to what would have been seen.
\end{itemize}
When simulating experiments, we will run the simulator on many traces that we collected during user studies, and we will then run the renderer on the resulting histories to generate images corresponding to the simulation.

@ -17,7 +17,7 @@ We utilize adaptation sets to organize a 3D scene's material, geometry, and text
\fresh{}
The piece of software that preprocesses the model mostly consists of file manipulation and is written in Rust as well.
It preprocesses the geometry first, and then the textures.
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs}, and that works like a stack:
The MPD is generated by a library named \href{https://github.com/netvl/xml-rs}{xml-rs} that works like a stack:
\begin{itemize}
\item a structure is created on the root of the MPD file;
\item the \texttt{start\_element} method creates a new child in the XML file;

@ -7,10 +7,8 @@
\centering
\includegraphics[width=\textwidth]{assets/dash-3d/bigpicture.png}
\caption{%
A subdivided 3D scene with a viewport, with regions delimited with
red edges. In white, the regions that are outside the field of view
of the camera; in green, the regions inside the field of view of the
camera.\label{d3:big-picture}
A subdivided 3D scene with a viewport and regions delimited with red edges.
In white, the regions that are outside the field of view of the camera; in green, the regions inside the field of view of the camera.\label{d3:big-picture}
}
\end{figure}

@ -118,7 +118,7 @@ The borrow checker may seem like an enemy to newcomers because it often rejects
Even better, Rust comes with great tooling.
\begin{itemize}
\item \href{https://github.com/rust-lang/rust}{\textbf{\texttt{rustc}}} is the Rust compiler. It is very comfortable to use thanks to the clear error messages it displays.
\item \href{https://github.com/rust-lang/cargo}{\textbf{\texttt{cargo}}} is the official Rust's project and package manager. It manages compilation, dependencies, documentation, tests, etc\ldots
\item \href{https://github.com/rust-lang/cargo}{\textbf{\texttt{cargo}}} is Rust's official project and package manager. It manages compilation, dependencies, documentation, tests, etc.
\item \href{https://github.com/racer-rust/racer}{\textbf{\texttt{racer}}}, \href{https://github.com/rust-lang/rls}{\textbf{\texttt{rls}} (Rust Language Server)} and \href{https://github.com/rust-analyzer/rust-analyzer}{\textbf{\texttt{rust-analyzer}}} are tools that manage automatic compilation to display errors in code editors, as well as provide semantic code completion.
\item \href{https://github.com/rust-lang/rustfmt}{\textbf{\texttt{rustfmt}}} automatically formats code.
\item \href{https://github.com/rust-lang/rust-clippy}{\textbf{\texttt{clippy}}} is a linter that detects unidiomatic code and suggests modifications.

@ -28,7 +28,7 @@ In 3D streaming, each chunk is part of a scene, and already a few problems appea
All major video streaming platforms support multi-resolution streaming.
This means that a client can choose the resolution at which it requests the content.
It can be chosen directly by the user or automatically determined by analysing the available resources (size of the screen, downloading bandwidth, device performances, etc\ldots)
It can be chosen directly by the user or automatically determined by analysing the available resources (screen size, download bandwidth, device performance, etc.).
\begin{figure}[th]
\centering
@ -280,7 +280,7 @@ Some interfaces mimic the video scenario, where the only variable is the time an
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add two degrees of freedom to the previous one: the user does not control the position of the camera but they can control the viewing angle. This mimics the scenario of a 360 video.
This is typically the case of the video game \emph{nolimits 2: roller coaster simulator} which works with VR devices (oculus rift, HTC vive, etc\ldots) where the only interaction the user has is turning the head.
This is typically the case of the video game \emph{NoLimits 2: Roller Coaster Simulator}, which works with VR devices (Oculus Rift, HTC Vive, etc.) where the only interaction the user has is turning their head.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 for the coordinates of the camera position, and 2 for the viewing angle (assuming the up vector is fixed; some interfaces allow changing it, which gives a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball \href{https://threejs.org/examples/?q=controls\#misc_controls_trackball}{(live example here)}, and the orbit controls, which behave like the trackball controls but preserve the up vector \href{https://threejs.org/examples/?q=controls\#misc_controls_orbit}{(live example here)}.
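For reference, plugging such controls into a THREE.js application only takes a few lines; the sketch below uses the \texttt{OrbitControls} class shipped with the THREE.js examples (import paths may vary across THREE.js versions):
\begin{verbatim}
import * as THREE from 'three';
import { OrbitControls } from 'three/examples/jsm/controls/OrbitControls.js';

const scene = new THREE.Scene();
const renderer = new THREE.WebGLRenderer();
const camera = new THREE.PerspectiveCamera(75, 16 / 9, 0.1, 1000);
camera.position.set(0, 2, 5);

// Orbit controls: rotate around a target point while preserving the up vector.
const controls = new OrbitControls(camera, renderer.domElement);
controls.target.set(0, 0, 0);

function animate() {
  requestAnimationFrame(animate);
  controls.update(); // apply the latest user input to the camera
  renderer.render(scene, camera);
}
animate();
\end{verbatim}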

@ -25,8 +25,8 @@ One of the questions this thesis has to answer is \emph{``what is the best way t
\paragraph{Streaming policies.}
Once our content is prepared and split into chunks, a client must determine which chunks to download.
A chunk that contains data in the field of view of the user is more relevant than a chunk outside of it; a chunk that is close to the camera is more relevant than a chunk far away from the camera, etc\ldots.
This should also include other contextual parameters, such as the size of a chunk, the bandwidth, the user's behaviour, etc\ldots.
A chunk that contains data in the field of view of the user is more relevant than a chunk outside of it; a chunk that is close to the camera is more relevant than a chunk far away from the camera, etc.
This should also include other contextual parameters, such as the size of a chunk, the bandwidth, the user's behaviour, etc.
The most important question we have to answer is \emph{how to determine which chunks need to be downloaded depending on the chunks themselves and the user's interactions?}
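As a purely illustrative sketch (this is not the utility metric defined later in this thesis), such a relevance score could combine a frustum test, the distance from the camera and the size of the chunk; \texttt{chunk.boundingBox} and \texttt{chunk.sizeInBytes} are assumed fields, and the frustum API is that of recent THREE.js versions.
\begin{verbatim}
// Illustrative relevance score: chunks inside the view frustum, close to
// the camera and cheap to download get a higher utility.
function chunkUtility(chunk, camera) {
  const frustum = new THREE.Frustum().setFromProjectionMatrix(
    new THREE.Matrix4().multiplyMatrices(
      camera.projectionMatrix, camera.matrixWorldInverse));

  const inView = frustum.intersectsBox(chunk.boundingBox) ? 1.0 : 0.1;
  const distance = chunk.boundingBox.distanceToPoint(camera.position);

  // Penalize distant chunks and large downloads (size in bytes).
  return inView / ((1 + distance) * chunk.sizeInBytes);
}
\end{verbatim}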
\paragraph{Evaluation.}

@ -4,7 +4,7 @@
We now describe an experiment that we conducted on 51 participants, with two goals in mind.
First, we want to measure the impact of 3D bookmarks on navigation within an NVE\@.
Second, we want to collect traces from the users so that we can replay them for reproducible experiments for comparing streaming strategies in Section 4.
Second, we want to collect traces from the users so that we can replay them for reproducible experiments for comparing streaming strategies in Section~\ref{bi:system}.
\subsection{Our NVE\label{bi:our-nve}}
To ease the deployment of our experiments to users in distributed locations on a crowdsourcing platform, we implement a simple Web-based NVE client using THREE.js\footnote{http://threejs.org}.
@ -86,7 +86,7 @@ In order to avoid any bias due to the coins position, we predefined 50 possible
Participants are first presented with an initial screen to collect some preliminary information: age, gender, the last time they played 3D video games, and self-rated 3D gaming skills. We ask those questions because we believe that someone who is used to playing 3D video games should browse the scene more easily, and thus may not need to use our bookmarks.
Then, the participants go through a tutorial to learn how the UI works, and how to complete the task.
The different interactions (keyboard navigation, mouse navigation, bookmarks interaction) are progressively introduced to participants, and the tutorial finishes once the participant completes an easy version of the task.
The different interactions (keyboard navigation, mouse navigation, bookmarks interaction) are progressively introduced to participants, and the tutorial ends once the participant completes an easy version of the task.
The tutorial is always performed on the same scene.
Then, each participant has to complete the task three times.

@ -3,18 +3,18 @@
\section{Conclusion\label{bi:conclusion}}
In this chapter, we have described a basic interface that allows a user to navigate in a scene that is being streamed.
It allowed us to understand the problems linked to the dynamicity of both the user behaviour and the 3D content:
It allowed us to understand the problems linked to the dynamics of both the user behaviour and the 3D content:
\begin{itemize}
\item Navigating in a 3D scene can be complex, due to the many degrees of freedom, and tweaking the interface can increase the user's Quality of Experience.
\item Adding bookmarks to the interface can have a negative impact on the quality of service of the system.
\item Having bookmarks in the scene biases the users nagivation and make them more predictible: it is then possible to precompute data from bookmarks in order to benefit from this predictability.
\item Having bookmarks in the scene biases the users' navigation and makes it more predictable: it is then possible to precompute data from bookmarks in order to benefit from this predictability.
\end{itemize}
However, the system described in this chapter has some drawbacks and fails to address some of the problems we mentioned in Section~\ref{i:challenges}.
\begin{itemize}
\item \textbf{The content preparation and chunk utility is almost inexistant}: the server knows all the data and simply determines what the client needs, he prepares the content and builds chunk on the go. Furthermore, it has no support for material or textures: in our setup, they are downloaded before the streaming starts.
\item \textbf{The content preparation and chunk utility are almost non-existent}: the server knows all the data and simply determines what the client needs; it prepares the content and builds chunks on the fly. Furthermore, it has no support for materials or textures: in our setup, they are downloaded before the streaming starts.
\item \textbf{The streaming policy is basic}: the server traverses all the polygons and determines which polygons should be sent to the client.
\item \textbf{The implementation has poor performance}: since the content preparation and the streaming policy are computed on the fly, the server has to keep track of what each client already has (which consumes memory) and to compute what should be sent next (which consumes CPU). Such a server therefore cannot scale. Moreover, client performance has not been taken into account, since the client used in the user study did not have to perform streaming.
\end{itemize}

@ -9,9 +9,9 @@ Bookmarks are virtual objects that represent a point of view, that are rendered
In this chapter, we present our first contribution: an analysis of the impact of bookmarks on navigation and streaming.
We implement a simple 3D navigation interface that we augment with 3D bookmarks.
When the user's cursor hovers over a bookmark, a preview of the point of view is displayed to the user, and when the user clicks a bookmark, the camera smoothly moves from its initial position to the bookmarked point of view.
We conduct a within-subject user-study on 51 participants, where each user starts with a tutorial, and then tries successively to perform a task with and without bookmarks.
We show that not only the presence bookmarks causes a faster task completion, but also that it allows users to see a larger part of the scene during the same time span.
When the user's cursor hovers over a bookmark, a preview of the point of view is displayed to the user, and when the user clicks on a bookmark, the camera smoothly moves from its current position to the bookmarked point of view.
We conduct a within-subject user-study on 51 participants, where each user starts with a tutorial to get used to the 3D navigation controls, and then tries successively to perform a task with and without bookmarks.
We show not only that the presence of bookmarks leads to faster task completion, but also that it allows users to see a larger part of the scene in the same time span.
However, in a streaming scenario, this phenomenon leads to higher network requirements to maintain the same quality of service.
In the last part of this chapter, we simulate streaming by replaying the traces collected during the user study, and we show that knowing the positions of the bookmarks beforehand allows us to pre-compute information that we can reuse during streaming to compensate for the harm caused by the faster navigation with bookmarks.

@ -66,7 +66,7 @@ When the model is light enough, it is encoded as is, and the operations needed t
Thus, a client can start by downloading the low resolution model, display it to the user, and keep downloading and displaying details as time goes by.
This process reduces the time a user has to wait before seeing something, and increases the quality of experience.
More recently, to answer the need for a standard format for 3D data, the Khronos group has proposed a generic format called glTF (GL Transmission Format,~\cite{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated model, etc\ldots
More recently, to answer the need for a standard format for 3D data, the Khronos Group has proposed a generic format called glTF (GL Transmission Format,~\cite{gltf}) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It can contain a scene tree with cameras, meshes, buffers, materials, textures, animations and skinning information.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for remote visualisation of large scenes.
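For illustration, loading a glTF asset in a THREE.js client is a one-liner thanks to the \texttt{GLTFLoader} shipped with the THREE.js examples (the asset URL below is a placeholder):
\begin{verbatim}
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';

const scene = new THREE.Scene();
const loader = new GLTFLoader();

// Download and parse the glTF file, then add its scene graph to ours.
loader.load('assets/model.gltf', (gltf) => {
  scene.add(gltf.scene);
}, undefined, (error) => {
  console.error('Could not load the glTF asset:', error);
});
\end{verbatim}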

@ -28,7 +28,7 @@ It may also have an adaptation set for subtitles.
\paragraph{Representations.}
Representations are the level at which DASH offers the same content at different resolutions.
For example, an adaptation set containing images has a representation for each available resolution (it might be 480p, 720p, 1080p, etc\ldots).
For example, an adaptation set containing images has a representation for each available resolution (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose a representation and to change it during playback; most importantly, since the client software can estimate its download speed based on the time it took to download data in the past, it can find the optimal resolution, that is, the highest resolution that the client can request without stalling.
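A minimal sketch of this adaptation logic (with a hypothetical data layout, not the code of an actual DASH player) simply picks the highest bitrate that the estimated bandwidth can sustain:
\begin{verbatim}
// Pick the highest-bitrate representation that the estimated bandwidth can
// sustain; representations look like { resolution: '1080p', bandwidth: 5e6 }.
function chooseRepresentation(representations, estimatedBandwidth) {
  const affordable = representations
    .filter((r) => r.bandwidth <= estimatedBandwidth)
    .sort((a, b) => b.bandwidth - a.bandwidth);
  // Fall back to the lowest representation if none fits the bandwidth.
  return affordable[0] ||
    representations.reduce((a, b) => (a.bandwidth < b.bandwidth ? a : b));
}
\end{verbatim}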
\paragraph{Segments.}

@ -148,7 +148,7 @@ Sorting all the segments from the model would be an excessively time consuming c
To speed up this algorithm, we only sort the 200 best segments, and we choose these segments among a filtered set of candidates.
To find those candidates, we reuse the ideas developed in Chapter~\ref{bi}.
We render the ``pixel to geometry segment'' and ``pixel to texture'' maps, as shown in Figure~\ref{sb:bookmarks-utility}.
These renderings allow us to know what geometry segment and what texture correspond to each pixel, and filter out useless candidates.
These renderings allow us to know which geometry segment and which texture correspond to each pixel, and filter out useless candidates.
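The candidate filtering can be sketched as follows (hypothetical helper, simplified with respect to the actual implementation): the ``pixel to geometry segment'' map is read back as an array where each entry holds the identifier of the segment visible at that pixel, segments are ranked by the number of pixels they cover, and only the most visible ones are kept.
\begin{verbatim}
// Count how many pixels each geometry segment covers in the ID map, then
// keep only the most visible segments as candidates for sorting.
function filterCandidates(segmentIdMap, maxCandidates = 200) {
  const pixelCount = new Map(); // segment id -> number of covered pixels
  for (const segmentId of segmentIdMap) {
    if (segmentId === 0) continue; // 0 marks background pixels
    pixelCount.set(segmentId, (pixelCount.get(segmentId) || 0) + 1);
  }
  return [...pixelCount.entries()]
    .sort((a, b) => b[1] - a[1]) // most visible segments first
    .slice(0, maxCandidates)
    .map(([segmentId]) => segmentId);
}
\end{verbatim}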
\begin{figure}[th]
\centering

@ -4,8 +4,8 @@
In Chapter~\ref{bi}, we described how it is possible to modify a user interface to ease user navigation in a 3D scene, and how the system can benefit from it.
In Chapter~\ref{d3}, we presented a streaming system that takes neither the interface nor the user interaction into account.
Hence, it is natural study how the user interaction can impact performances of DASH-3D.
In order to do so, followed these two steps:
Hence, it is natural to study how the user interaction can impact the performance of DASH-3D.
In order to do so, we followed these two steps:
\begin{itemize}
\item we design an interface that allows navigating in a 3D scene on both desktop and mobile devices;

@ -7,7 +7,7 @@ Nowadays, smartphones are more and more powerful, and people slowly move their a
This is why we decided to port our interface to mobile devices.
Desktop devices and mobile devices are very different.
There are many differences in terms of performance: desktop devices tend to be much more powerful and to have more memory and a better network connection than mobile devices.
Also, the interaction is not comparable in any way: the desktop mostly uses keyboard and mouse, whereas most of the mobile devices only have a touchscreen, as well as many sensors (accelerometer, gyroscope, GPS, etc\ldots).
Also, the interaction is very different: the desktop mostly relies on keyboard and mouse, whereas most mobile devices have only a touchscreen, along with various sensors (accelerometer, gyroscope, GPS, etc.).
This is why porting our DASH-3D client to mobile is not an easy task.
To do so, we add widgets to the screen to support touch interaction: a virtual joystick is displayed, and the user can drag it to translate the camera instead of using the W, A, S and D keys as on a computer.
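The joystick-to-camera mapping can be sketched as follows (simplified, with our own names; \texttt{dx} and \texttt{dy} are the joystick displacements normalised to $[-1, 1]$):
\begin{verbatim}
// Translate the camera according to the virtual joystick state.
// dx and dy are the joystick displacements in [-1, 1]; dt is in seconds.
function applyJoystick(camera, dx, dy, dt, speed = 5.0) {
  const forward = new THREE.Vector3();
  camera.getWorldDirection(forward); // where the camera currently looks
  forward.y = 0;                     // keep the motion on the horizontal plane
  forward.normalize();

  const right = new THREE.Vector3()
    .crossVectors(forward, camera.up).normalize();

  // Pushing the joystick up moves forward, pushing it to the right strafes.
  camera.position.addScaledVector(forward, -dy * speed * dt);
  camera.position.addScaledVector(right, dx * speed * dt);
}
\end{verbatim}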

@ -5,7 +5,7 @@
Before conducting the user study on mobile devices, we designed a user study for desktop devices.
This experiment was conducted on a little more than a dozen people, with the model described in the previous chapter.
Bookmarks were positioned from user-generated panoramic picture available on Google Maps, and the task consisted in retrieving spots on the 3D model from a picture: users were presented with an image coming from Google Street View and user had to find the corresponding spot in the 3D model.
Bookmarks were positioned from user-generated panoramic pictures available on Google Maps, and the task consisted in retrieving spots on the 3D model from a picture: users were presented with an image coming from Google Street View and had to find the corresponding spot in the 3D model.
Because the task was hard and our users were familiar with 3D navigation, they preferred navigating slowly in the scene and did not use bookmarks as much as they did during the experiment we ran in Chapter~\ref{bi}.
@ -49,7 +49,7 @@ The order of those two sessions is randomized to avoid biases.
% Since we know that the difference between our streaming policies is subtle, we designed a task a little more complex in order to highlight the differences so that the user can see it.
Since the behaviours of our streaming policies only differ when the user clicks a bookmark, we designed a task where the users have to perform a guided tour of the scene, where each bookmark is a step of the tour.
The user starts in the scene, and one of the bookmarks is blinking.
The user has to click the bookmark, and wait a little when he arrives at the destination.
The user has to touch the bookmark, and wait a little once they arrive at the destination.
Once some data has been downloaded and the user is satisfied with what they see, they can look for the next blinking bookmark.
This setup is repeated for each streaming policy, and after the two sessions, the users have to answer a questionnaire asking the question \emph{In what session did you find the streaming the smoothest?}
The questionnaire also has a text field for users to explain their answer if they wish.
@ -91,7 +91,7 @@ We could argue that they do not like the bookmarks because they make the task to
\subsubsection{Qualitative results --- Streaming}
Among the 18 participants of this user study, 10 confirmed that they preferred the optimized policy, 4 preferred the greedy policy, and 4 did not perceive the difference.
Another interesting fact is that on the last part of the experiment (the free navigation) the average number of clicks on bookmarks is 3 for users having the greedy policy and 5.3 for users having the optimized policy.
Another interesting fact is that on the last part of the experiment (the free navigation), the average number of clicks on bookmarks is 3 for users having the greedy policy, and 5.3 for users having the optimized policy.
Even though statistical significance is not reached, this result seems to indicate that a policy optimized for bookmarks could lead users to click more on bookmarks.
\subsubsection{Quantitative results}
@ -100,7 +100,7 @@ By collecting all the traces during the experiments, we are able to replay the r
Figure~\ref{sb:psnr-second-experiment} shows the average PSNR that user got while navigating during the second experiment (bookmark path).
Below the PSNR curve is a curve that shows how many users were moving to or staying at a bookmark position.
As we can see, the two policies perform in the same way in the beginning, when few users are moving to a bookmark.
However, when they start clicking on bookmarks, the gap grows and our optimized policy perform better.
However, when they start clicking on bookmarks, the gap grows and our optimized policy performs better.
Figure~\ref{sb:psnr-second-experiment-after-click} shows the PSNR after a click on a bookmark.
To compute these curves, we isolated the ten seconds following each click on a bookmark, and we averaged them all.
These curves isolate the effect of our optimized policy, and show the difference a user can feel when clicking on a bookmark.
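The averaging itself is straightforward; below is a sketch (with assumed data layouts: per-frame PSNR samples with timestamps, and click times in seconds) of how such a curve can be computed.
\begin{verbatim}
// Average the PSNR over the ten seconds following each bookmark click.
// psnr is an array of { time, value } samples; clicks is an array of
// click timestamps; all times are in seconds.
function psnrAfterClicks(psnr, clicks, windowLength = 10, step = 0.1) {
  const curve = [];
  for (let t = 0; t < windowLength; t += step) {
    const values = clicks.map((click) => {
      // Take the last PSNR sample recorded before time click + t.
      const sample = psnr.filter((s) => s.time <= click + t).pop();
      return sample ? sample.value : 0;
    });
    curve.push(values.reduce((a, b) => a + b, 0) / values.length);
  }
  return curve; // one averaged PSNR value per time step after a click
}
\end{verbatim}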