This commit is contained in:
Thomas Forgione 2019-10-16 17:01:56 +02:00
parent 8b9303b1a5
commit 8e686dc040
No known key found for this signature in database
GPG Key ID: 203DAEA747F48F41
8 changed files with 40 additions and 42 deletions


@@ -1,6 +1,6 @@
A 3D streaming system is a system that dynamically collects 3D data.
The previous chapter intentionally remained vague about what \emph{3D data} actually is.
This chapter presents in detail the 3D data we consider and how it is rendered.
We also give insights about interaction and streaming by comparing the 3D case to the video one.
\section{What is a 3D model?\label{f:3d}}
@@ -13,7 +13,7 @@ Such a model can typically contain the following:
\item \textbf{Vertices} are simply 3D points;
\item \textbf{Faces} are polygons defined from vertices (most of the time, they are triangles);
\item \textbf{Textures} are images that can be used for painting faces, to add visual richness;
\item \textbf{Texture coordinates} are information added to a face, describing how the texture should be painted over it;
\item \textbf{Normals} are 3D vectors that can give information about light behaviour on a face.
\end{itemize}
@@ -23,7 +23,7 @@ A 3D model encoded in the OBJ format typically consists in two files: the materi
\paragraph{}
The materials file declares all the materials that the object file will reference.
A material consists of a name and other photometric properties, such as ambient, diffuse and specular colors, as well as texture maps.
Each face corresponds to a material, and a renderer can use the material's information to render the faces.
A simple material file is shown in Listing~\ref{i:mtl}.
\paragraph{}


@@ -1,10 +1,10 @@
\fresh{}
\section{Implementation details}
During this thesis, a lot of software has been developed, and for this software to be successful and efficient, we carefully chose the appropriate languages.
When it comes to 3D streaming systems, there are two kinds of software that we need.
\begin{itemize}
\item \textbf{Interactive applications} that can run on as many devices as possible, whether desktop or mobile, in order to conduct user studies. For this context, we chose the \textbf{JavaScript language}, since it can run on many devices and it has great support for WebGL\@.
\item \textbf{Native applications} that can run fast on desktop devices, in order to run simulations and evaluate our ideas. For this context, we chose the \textbf{Rust} language, which is a somewhat recent language that provides both the efficiency of C and C++ and the safety of functional languages.
\end{itemize}
@@ -21,9 +21,9 @@ THREE.js acts as a 3D engine built on WebGL\@.
It provides classes to deal with everything we need:
\begin{itemize}
\item the \textbf{Renderer} class contains all the WebGL code needed to render a scene on the web page;
\item the \textbf{Object} class contains all the boilerplate needed to manage the tree structure of the content: it contains a transform (translation and rotation) and it can have children that are other objects;
\item the \textbf{Scene} class is the root object, it contains all of the objects we want to render and it is passed as argument to the render function;
\item the \textbf{Geometry} and \textbf{BufferGeometry} classes are the classes that hold the vertex buffers; we discuss them more in Section~\ref{f:geometries};
\item the \textbf{Material} class is the class that holds the properties used to render geometry (the most important being the texture); there are many classes derived from Material, and developers can choose which material they want for their objects;
\item the \textbf{Mesh} class is the class that links the geometry and the material; it derives from the Object class and can thus be added to a scene and rendered.
\end{itemize}
@@ -41,7 +41,7 @@ A snippet of the basic usage of these classes is given in Listing~\ref{f:three-h
Geometries are the classes that hold the vertices, texture coordinates, normals and faces.
The two most important geometry classes in THREE.js are:
\begin{itemize}
\item the \textbf{Geometry} class, which is made to be developer friendly and allows easy editing, but can suffer from performance issues;
\item the \textbf{BufferGeometry} class, which is harder to use for a developer, but allows better performance since the developer controls how data is transmitted to the GPU\@.
\end{itemize}
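The performance gap between the two classes comes down to data layout: a developer-friendly geometry stores vertices as convenient objects, while a buffer geometry keeps flat arrays that can be transmitted to the GPU directly. As a rough sketch (in Rust, the language we use for our native tools; the names and types are illustrative, not the actual THREE.js internals):

```rust
// Developer-friendly layout: one struct per vertex, easy to edit.
struct Vertex {
    position: [f32; 3],
    normal: [f32; 3],
    uv: [f32; 2],
}

// GPU-friendly layout: a single flat buffer of floats, in the spirit of
// what BufferGeometry holds.
fn flatten(vertices: &[Vertex]) -> Vec<f32> {
    let mut buffer = Vec::with_capacity(vertices.len() * 8);
    for v in vertices {
        buffer.extend_from_slice(&v.position);
        buffer.extend_from_slice(&v.normal);
        buffer.extend_from_slice(&v.uv);
    }
    buffer
}

fn main() {
    let vertices = vec![Vertex {
        position: [0.0, 1.0, 2.0],
        normal: [0.0, 0.0, 1.0],
        uv: [0.5, 0.5],
    }];
    // 8 floats per vertex: 3 for position, 3 for normal, 2 for uv.
    assert_eq!(flatten(&vertices).len(), 8);
}
```

Editing a single vertex in the flat layout means recomputing offsets by hand, which is why the convenient layout exists at all; the flat layout only pays off when uploading to the GPU.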
@@ -52,12 +52,12 @@ In this section, we explain the specificities of Rust and why it is a great lang
\subsubsection{Specificity of Rust}
Rust is a systems programming language focused on safety.
It is made to be efficient (and effectively has performance comparable to C or C++) but with some extra features.
C++ users might see it as a language like C++ but that forbids undefined behaviours.\footnote{In Rust, when you need to execute code that might lead to undefined behaviours, you must put it inside an \texttt{unsafe} block. Many operations are not available outside an \texttt{unsafe} block (e.g., dereferencing a raw pointer, or mutating a static variable). The idea is that you can use \texttt{unsafe} blocks when you require them, but you should avoid them as much as possible, and when you do use them, you must be particularly careful.}
The most powerful concept from Rust is \emph{ownership}.
Basically, every value has a variable that we call its \emph{owner}.
To be able to use a value, you must either be its owner or borrow it.
There are two types of borrow: the immutable borrow and the mutable borrow (C++ users can think of them as a const reference and a non-const reference to a variable).
The compiler comes with the \emph{borrow checker} which makes sure you only use variables that you are allowed to.
For example, the owner can only use the value while it is not being borrowed, and a value can either be mutably borrowed once, or immutably borrowed as many times as you want.
At first, the borrow checker seems particularly efficient at detecting bugs in concurrent software, but in fact, it is also decisive in non-concurrent code.
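As a minimal sketch (hypothetical code, not taken from our software), the borrow rules above look like this in practice:

```rust
// Taking an immutable borrow of the vector: any number of these may coexist.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let mut values = vec![1, 2, 3];

    // Several simultaneous immutable borrows are allowed.
    let a = sum(&values);
    let b = sum(&values);
    assert_eq!(a + b, 12);

    // A mutable borrow is fine here because no other borrow is still alive.
    values.push(4);
    assert_eq!(sum(&values), 10);

    // The borrow checker would reject this instead:
    // let r = &values;     // immutable borrow starts...
    // values.push(5);      // ...error: cannot borrow `values` as mutable
    // println!("{:?}", r); //    because it is also borrowed as immutable
}
```

The commented-out lines are exactly the kind of iterator-invalidation pattern that compiles silently in C++ (as in the listings referenced below) but is a compile-time error in Rust.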
@@ -74,7 +74,7 @@ Consider the piece of C++ code in Listings~\ref{f:undefined-behaviour-cpp} and~\
\lstinputlisting[
language=c++,
label={f:undefined-behaviour-cpp-it},
caption={Undefined behaviour with iterator syntax}
]{assets/dash-3d-implementation/undefined-behaviour-it.cpp}
\end{minipage}
\end{figure}


@@ -36,13 +36,13 @@ It can be chosen directly by the user or automatically determined by analysing t
\caption{The different resolutions available for a YouTube video}
\end{figure}
Similarly, recent work in 3D streaming has proposed different ways to progressively stream 3D models, displaying a low resolution to the user without latency, and supporting interaction with the model while details are being downloaded.
Such strategies are reviewed in Section~\ref{sote:3d-streaming}.
\subsection{Media types}
Just like a video, a 3D scene is composed of different types of media.
In video, those media are mostly images, sounds, and possibly subtitles, whereas in 3D, they are typically geometry and textures.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.
In video streaming, most of the data (in terms of bytes) is used for images.
@@ -52,9 +52,9 @@ This is one of the main differences between video and 3D streaming: in a 3D scen
\subsection{Interaction}
The way users interact with the content is probably the most important difference between video and 3D.
In a video interface, there is only one degree of freedom: time.
The only things a user can do are letting the video play, pausing, resuming, or jumping to another time in the video.
Even though these interactions seem easy to handle, giving the best possible experience to the user is already challenging. For example, to perform these few actions, YouTube provides the user with multiple options.
\begin{itemize}
@@ -301,7 +301,7 @@ When it comes to 3D, there are many approaches to manage user interaction.
Some interfaces mimic the video scenario, where the only variable is time and the camera follows a predetermined path over which the user has no control.
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add 2 degrees of freedom to the timeline: the user does not control the position of the camera but can control the angle. This mimics the scenario of a 360 video.
This is typically the case of the video game \emph{NoLimits 2: Roller Coaster Simulator}, which works with VR devices (Oculus Rift, HTC Vive, etc.), where the only interaction the user has is turning their head.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the position of the camera, and 2 being the angle (assuming the up vector is fixed; some interfaces allow changing it, giving a sixth degree of freedom).


@@ -7,39 +7,39 @@ A 3D streaming client has lots of tasks to accomplish:
\begin{itemize}
\item render a scene;
\item decide what part of the model to download next;
\item download the next part;
\item parse the downloaded content;
\item add the parsed result to the scene;
\item manage the interaction with the user.
\end{itemize}
This opens multiple problems that need to be considered and that will be studied in this thesis.
\paragraph{Content preparation.}
% Any preprocessing that can be done on our 3D data gives us a strategic advantage since it consists in computations that will not be needed live, neither for the server nor for the client.
% Furthermore, for streaming, data needs to be split into chunks that are requested separately, so preparing those chunks in advance can also help the streaming.
Before streaming content, it needs to be prepared.
The segmentation of the content into chunks is particularly important for streaming since it allows transmitting only a portion of the data to the client.
Since a partial model consists of the content downloaded so far, it can be rendered while more chunks are being downloaded.
Content preparation also includes compression.
One of the questions this thesis has to answer is: \emph{what is the best way to prepare 3D content so that a streaming client can progressively download and render the 3D model?}
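To make the idea of segmentation concrete, here is a minimal sketch in Rust that groups faces into the cells of a uniform spatial grid according to their centroid; the uniform grid and the cell size are illustrative assumptions, not the actual preparation scheme used in this thesis:

```rust
use std::collections::HashMap;

type Point = [f32; 3];

// Centroid of a triangle: average of its three vertices.
fn centroid(face: &[Point; 3]) -> Point {
    let mut c = [0.0; 3];
    for p in face {
        for i in 0..3 {
            c[i] += p[i] / 3.0;
        }
    }
    c
}

// Assign each face (by index) to the grid cell containing its centroid.
// Each cell would then become one downloadable chunk.
fn chunk_faces(faces: &[[Point; 3]], cell_size: f32) -> HashMap<(i32, i32, i32), Vec<usize>> {
    let mut chunks: HashMap<(i32, i32, i32), Vec<usize>> = HashMap::new();
    for (index, face) in faces.iter().enumerate() {
        let c = centroid(face);
        let cell = (
            (c[0] / cell_size).floor() as i32,
            (c[1] / cell_size).floor() as i32,
            (c[2] / cell_size).floor() as i32,
        );
        chunks.entry(cell).or_default().push(index);
    }
    chunks
}

fn main() {
    let faces = [
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[10.0, 0.0, 0.0], [11.0, 0.0, 0.0], [10.0, 1.0, 0.0]],
    ];
    // With 5-unit cells, the two faces land in different chunks.
    assert_eq!(chunk_faces(&faces, 5.0).len(), 2);
}
```

A real scheme must also decide how to handle faces spanning several cells and how to attach textures and materials to each chunk, which is part of what the content preparation question covers.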
\paragraph{Streaming policies.}
Once our content is prepared and split into chunks, a client needs to determine which chunks should be downloaded first.
A chunk that contains data in the field of view of the user is more relevant than one that is outside of it; a chunk that is close to the camera is more relevant than one far away from the camera, etc.
This should also include other contextual parameters, such as the size of a chunk, the bandwidth, the user's behaviour, etc.
The most important questions we have to answer are: \emph{how to estimate the utility of a chunk, and how to determine which chunks need to be downloaded depending on the chunks themselves and the user's interactions?}
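As a purely hypothetical illustration of such a utility, one could divide the surface area of a chunk by its squared distance to the camera, and zero it outside the field of view; this is a sketch for intuition, not the policy derived later in this thesis:

```rust
// Hypothetical chunk utility: area over squared distance to the camera,
// zeroed when the chunk is outside the view frustum. Illustrative only.
struct Chunk {
    center: [f32; 3],
    area: f32,
}

fn distance(a: [f32; 3], b: [f32; 3]) -> f32 {
    (0..3).map(|i| (a[i] - b[i]).powi(2)).sum::<f32>().sqrt()
}

fn utility(chunk: &Chunk, camera: [f32; 3], in_frustum: bool) -> f32 {
    if !in_frustum {
        return 0.0;
    }
    chunk.area / distance(chunk.center, camera).powi(2).max(1e-6)
}

fn main() {
    let near = Chunk { center: [1.0, 0.0, 0.0], area: 4.0 };
    let far = Chunk { center: [10.0, 0.0, 0.0], area: 4.0 };
    let camera = [0.0, 0.0, 0.0];
    // A nearby chunk is more useful than a distant one with the same area.
    assert!(utility(&near, camera, true) > utility(&far, camera, true));
    // A chunk outside the field of view gets zero utility.
    assert_eq!(utility(&far, camera, false), 0.0);
}
```

A streaming policy would then repeatedly download the chunk with the highest utility, re-evaluating utilities as the user moves; the contextual parameters mentioned above (chunk size, bandwidth, user behaviour) would enter the formula as additional terms.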
\paragraph{Evaluation.}
In such systems, the two most important criteria for evaluation are quality of service and quality of experience.
The quality of service is a network-centric metric, which considers values such as throughput and measures how well the content is served to the client.
The quality of experience is a user-centric metric: it relies on user perception and can only be measured by asking how users feel about a system.
To be able to know which streaming policies are best, one needs to know \emph{how to compare streaming policies and evaluate the impact of their parameters in terms of quality of service and quality of experience?}
\paragraph{Implementation.}
The objective of our work is to set up a client-server architecture that addresses the above problems: content preparation, chunk utility, and streaming policies.
In this regard, we have to find out \emph{how to build this architecture so that it keeps a low computational load on the server (so that it scales up) and on the client (so that it has enough resources to perform the tasks described above)?}
% This implementation must respect constraints required for performant software:
%


@@ -2,17 +2,17 @@
\fresh{}
During the last years, 3D acquisition and modeling techniques have made tremendous progress.
Recent software such as \href{https://alicevision.org/\#meshroom}{Meshroom} uses \emph{structure from motion} and \emph{multi-view stereo} to infer a 3D model from a set of photographs.
More and more devices are specifically built to harvest 3D data: some are still very expensive but provide very precise information, such as LIDAR (Light Detection And Ranging, as in RADAR but with light instead of radio waves), while cheaper devices such as the Kinect can obtain coarser data.
Thanks to these techniques, more and more 3D data become available.
These models have potential for multiple purposes: for example, they can be 3D printed, which can reduce the production cost of some pieces of hardware or enable the creation of new objects, but most uses are based on visualisation.
For example, they can be used for augmented reality, to provide users with feedback that can help workers with complex tasks, but also for fashion (for example, \emph{Fitting Box} is a company that develops software to virtually try on glasses).
3D acquisition and visualisation are also useful to preserve cultural heritage, with software such as Google Heritage or 3DHop, or to let users navigate in a city (as in Google Earth or Google Maps in 3D).
\href{https://sketchfab.com}{Sketchfab} is an example of a website allowing users to share their 3D models and visualise models from other users.
In most 3D visualisation systems, the 3D data is stored on a server and needs to be transmitted to a terminal before the user can visualise it.
The improvements in the acquisition setups we described lead to an increasing quality of the 3D models, and thus to an increasing size in bytes as well.
Simply downloading 3D content and waiting until it is fully downloaded before letting the user visualise it is no longer a satisfactory solution, so adaptive streaming is needed.
In this thesis, we propose a full framework for the navigation and the streaming of large 3D scenes, such as districts or whole cities.
% With the progress in data acquisition and modeling techniques, networked virtual environments, or NVE, are increasing in scale.


@@ -3,7 +3,7 @@
First, in Chapter~\ref{f}, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: surprisingly, video and 3D share many features, and analysing the video setting gives inspiration for building a 3D streaming system.
In Chapter~\ref{sote}, we present a review of the state of the art in multimedia interaction and streaming.
This chapter starts with an analysis of the video streaming standards.
Then it reviews the different 3D streaming approaches.
The last section of this chapter focuses on 3D interaction.
@@ -24,4 +24,4 @@ We finally evaluate the different parameters of our client.
In Chapter~\ref{sb}, we present our last contribution: the integration of the interaction ideas that we developed in Chapter~\ref{bi} into DASH-3D.
We first develop an interface that allows desktop as well as mobile devices to navigate in a 3D scene being streamed, and that introduces a new style of bookmarks.
We then explain why simply applying the ideas developed in Chapter~\ref{bi} is not sufficient and we propose more efficient pre-computations that can enhance the streaming.
Finally, we present a user study that provides us with traces on which we evaluate the impact of our extension of DASH-3D on the quality of service and on the quality of experience.


@@ -4,24 +4,22 @@
\subsection{DASH\@: the standard for video streaming\label{sote:dash}}
\copied{}
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH \citep{dash-std,dash-std-2}, is now a widely deployed
standard for streaming adaptive video content on the Web \citep{dash-std-full}, made to be simple and scalable.
\fresh{}
DASH is based on a clever way of preparing and structuring a video in order to allow great adaptability of the streaming without requiring any server-side computation.
\subsubsection{DASH structure}
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has 4 layers: the periods, the adaptation sets, the representations and the segments.
An MPD has a tree structure: it can contain multiple periods, each period can have multiple adaptation sets, each adaptation set can have multiple representations, and each representation can have multiple segments.
\paragraph{Periods.}
Periods are used to delimit content depending on time.
They can be used to delimit chapters, or to add advertisements that occur at the beginning, during, or at the end of a video.
\paragraph{Adaptation sets.}
Adaptation sets are used to delimit content according to the format.
Each adaptation set has a mime-type, and all the representations and segments that it contains share this mime-type.
In videos, most of the time, each period has at least one adaptation set containing the images, and one adaptation set containing the sound.
It may also have an adaptation set for subtitles.
@@ -33,18 +31,18 @@ This allows a user to choose its representation and change it during the video,
\paragraph{Segments.}
Until this level in the MPD, content has been divided but it is still far from being sufficiently divided to be streamed efficiently.
In fact, a representation of the images of a chapter of a movie is still a long video, and keeping such a big file is not possible since heavy files prevent streaming adaptability: if the user requests a change of resolution, the system would either have to wait until the file is totally downloaded, or cancel the request, wasting all the progress made.
Segments are used to prevent this issue.
They typically encode files that contain approximately one second of video, and give the software a greater ability to dynamically adapt to the system.
If a user seeks elsewhere in the video, at most one second of data is lost, and only one second of data needs to be downloaded for playback to resume.
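The four layers described above can be sketched in a simplified, hypothetical MPD file (element names follow the standard, but the attributes and values are abridged for illustration, not a complete valid MPD):

```xml
<!-- Simplified MPD sketch: one period, two adaptation sets (video and
     audio), two video representations, each split into short segments. -->
<MPD>
  <Period id="chapter-1">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="720p" bandwidth="2000000">
        <SegmentList>
          <SegmentURL media="seg-720p-0001.mp4"/>
          <SegmentURL media="seg-720p-0002.mp4"/>
        </SegmentList>
      </Representation>
      <Representation id="360p" bandwidth="500000">
        <!-- segments omitted -->
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4">
      <!-- audio representations omitted -->
    </AdaptationSet>
  </Period>
</MPD>
```

A client reads this file once, then switches between the 720p and 360p representations segment by segment, which is exactly what makes the adaptation dynamic without any server-side logic.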
\subsubsection{Content preparation and server}
Encoding a video in DASH format consists in partitioning the content into periods, adaptation sets, representations and segments as explained above, and generating a Media Presentation Description file (MPD) that describes this organisation.
Once the data is prepared, it can simply be hosted on a static HTTP server that does no computation other than serving files when it receives requests.
All the intelligence and the decision making is moved to the client side.
This is one of the strengths of DASH\@: no powerful server is required, and since static HTTP servers are stable and efficient, all DASH clients can benefit from them.
\subsubsection{Client side computation}


@@ -35,8 +35,8 @@
\color{white}\textbf{Gilles GESQUIÈRE}, rapporteur\\
\color{white}\textbf{Wei Tsang OOI}, examiner\\
\color{white}\textbf{Vincent CHARVILLAT}, thesis supervisor\\
\color{white}\textbf{Axel CARLIER}, thesis co-supervisor\\
\color{white}\textbf{Géraldine MORIN}, thesis co-supervisor
};
\node at (current page.south)[