Massive cleaning

This commit is contained in:
Thomas Forgione 2023-05-02 17:57:14 +02:00
parent 019d9b7704
commit eb3afbee34
22 changed files with 213 additions and 216 deletions

11
abstracts/fourth.typ Normal file
View File

@ -0,0 +1,11 @@
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Abstract]
#set text(size: 8pt)
#include "en.typ"
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Résumé]
#set text(size: 8pt)
#include "fr.typ"

View File

@ -1 +1,21 @@
#pagebreak()
#pagebreak()
#h(1em) *Titre :* Transmission Adaptative de Modèles 3D Massifs
*Résumé :*
#include "fr.typ"
#pagebreak()
#pagebreak()
#h(1em) *Title:* Dynamic Adaptive 3D Streaming over HTTP
*Abstract:*
#include "en.typ"
// Acknowledgments
#pagebreak()

View File

@ -1,4 +1,4 @@
= Acknowledgments
#heading(level: 2, numbering: none)[Acknowledgments]
// Directeurs de thèse / WTO
First of all, I would like to thank my advisors, Vincent CHARVILLAT, Axel CARLIER, and Géraldine MORIN for luring me into doing a PhD (which was a lot of work), for the support, and for the fun (and beers). // TODO \footnote{drink responsibly}

View File

@ -1,21 +0,0 @@
// Chapter management
#let chapter(title, count: true) = {
if count {
counter("chapter").step()
}
align(right, {
v(100pt)
if (count) {
text(size: 50pt)[Chapter ]
text(size: 150pt, fill: rgb(173, 216, 230), counter("chapter").display())
linebreak()
v(50pt)
}
text(size: 40pt, title)
})
if count {
pagebreak()
}
counter(heading).update(0)
}

30
cover.typ Normal file
View File

@ -0,0 +1,30 @@
#set page(background: image("assets/background.png", width: 100%))
#set text(fill: white)
#align(center + bottom)[
#rect(width: 120%, fill: rgb(0, 0, 0))[
#pad(5pt, text(weight: "bold", size: 20pt)[Dynamic Adaptive 3D Streaming over HTTP])
#text(weight: "bold")[For the University of Toulouse PhD granted by the INP Toulouse]\
#text(weight: "bold")[Presented and defended on Friday 29th November, 2019 by Thomas Forgione]
*Gilles GESQUIÈRE*, president\
*Sidonie CHRISTOPHE*, reviewer \
*Gwendal SIMON*, reviewer\
*Maarten WIJNANTS*, examiner\
*Wei Tsang OOI*, examiner\
*Vincent CHARVILLAT*, thesis supervisor\
*Axel CARLIER*, thesis co-supervisor\
*Géraldine MORIN*, thesis co-supervisor
#set text(size: 10pt)
#align(left, [*Doctoral school and field*: EDMITT: École Doctorale de Mathématiques, Informatiques et Télécommunications de
Toulouse\
*Field*: Computer science and telecommunication\
*Research unit*: IRIT (5505)\
*Thesis supervisors*: Vincent CHARVILLAT, Axel CARLIER and Géraldine MORIN\
*Reviewers*: Sidonie CHRISTOPHE and Gwendal SIMON
])
]
]

View File

@ -3,7 +3,7 @@ The previous chapter voluntarily remained vague about what \emph{3D data} actual
This chapter presents in detail the 3D data we consider and how they are rendered.
We also give insights about interaction and streaming by comparing the 3D setting to the video one.
= What is a 3D model?
== What is a 3D model?
The 3D models we are interested in are sets of textured meshes, which can potentially be arranged in a scene graph.
Such models can typically contain the following:
@ -76,7 +76,7 @@ An example of an object file is visible in @cube.
caption: [The OBJ representation of a cube and its render]
)<cube>
== Rendering a 3D model
=== Rendering a 3D model
A typical 3D renderer follows Algorithm X. // TODO
// \begin{algorithm}[th]
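
Since the algorithm itself is still a TODO at this commit, here is a minimal sketch of what a typical rasterization renderer does each frame (our own illustration; the types and helpers are placeholders, not a real graphics API and not the thesis's algorithm):

```rust
// Minimal sketch of a per-frame rasterization loop; `Scene`, `Mesh`,
// `Material` and the helper functions are placeholders.
struct Mesh;
struct Material;
struct Object { mesh: Mesh, material: Material }
struct Scene { objects: Vec<Object> }

fn clear_buffers() { /* clear the color and depth buffers */ }
fn bind_material(_material: &Material) { /* upload textures and shader uniforms */ }
fn draw(_mesh: &Mesh) { /* issue the GPU draw call for the mesh */ }

fn render_frame(scene: &Scene) {
    clear_buffers();
    // Draw every object; the depth buffer resolves occlusions.
    for object in &scene.objects {
        bind_material(&object.material);
        draw(&object.mesh);
    }
}
```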

View File

@ -1,4 +1,4 @@
= Implementation details
== Implementation details
During this thesis, we developed a lot of software, and for this software to be successful and efficient, we chose appropriate languages.
When it comes to 3D streaming systems, we need two kinds of software.
@ -6,9 +6,9 @@ When it comes to 3D streaming systems, we need two kind of software.
- *Interactive applications* which can run on as many devices as possible so we can easily conduct user studies. For this context, we chose the *JavaScript* language, since it can run on many devices and it has great support for WebGL.
- *Native applications* which can run fast on desktop devices, in order to prepare data, run simulations and evaluate our ideas. For this context, we chose the *Rust* language, which is a somewhat recent language that provides both the efficiency of C and C++ and the safety of functional languages.
== JavaScript
=== JavaScript
#heading(level: 3, numbering: none)[THREE.js]
#heading(level: 4, numbering: none)[THREE.js]
On the web browser, it is now possible to perform 3D rendering by using WebGL.
However, WebGL is very low level and it can be painful to write code, even to render a simple triangle.
@ -37,7 +37,7 @@ A snippet of the basic usage of these classes is given in @three-hello-world.
caption: [A THREE.js _hello world_]
)<three-hello-world>
#heading(level: 3, numbering: none)[Geometries]
#heading(level: 4, numbering: none)[Geometries]
Geometries are the classes that hold the vertices, texture coordinates, normals and faces.
THREE.js proposes two classes for handling geometries:
@ -45,11 +45,11 @@ THREE.js proposes two classes for handling geometries:
- the *BufferGeometry* class, which is harder to use for a developer, but allows better performance since the developer controls how data is transmitted to the GPU.
== Rust
=== Rust
In this section, we explain the specifics of Rust and why it is an adequate language for writing efficient native software safely.
#heading(level: 3, numbering: none)[Borrow checker]
#heading(level: 4, numbering: none)[Borrow checker]
Rust is a system programming language focused on safety.
It is made to be efficient (and indeed has performance comparable to C // TODO \footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust.html}} or C++\footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust-gpp.html}})
@ -142,7 +142,7 @@ The borrow checker may seem like an enemy to newcomers because it often rejects
It is probably for those reasons that Rust is the _most loved programming language_ according to the Stack Overflow
Developer Survey // TODO in~\citeyear{so-survey-2016}, \citeyear{so-survey-2017}, \citeyear{so-survey-2018} and~\citeyear{so-survey-2019}.
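
As a hedged illustration (our own example, not from the thesis), this is the kind of code the borrow checker rejects; the snippet does not compile, by design:

```rust
fn main() {
    let mut values = vec![1, 2, 3];
    let first = &values[0]; // immutable borrow of `values` starts here
    values.push(4);         // error[E0502]: cannot borrow `values` as mutable
                            // because it is also borrowed as immutable
    println!("{}", first);  // the immutable borrow is still alive here
}
```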
#heading(level: 3, numbering: none)[Tooling]
#heading(level: 4, numbering: none)[Tooling]
Moreover, Rust comes with many programs that help developers.
- #link("https://github.com/rust-lang/rust")[*`rustc`*] is the Rust compiler. It is comfortable to use thanks to its clear and precise error messages.
@ -152,7 +152,7 @@ Moreover, Rust comes with many programs that help developers.
- #link("https://github.com/rust-lang/rustfmt")[*`rustfmt`*] auto formats code.
- #link("https://github.com/rust-lang/rust-clippy")[*`clippy`*] is a linter that detects unidiomatic code and suggests modifications.
#heading(level: 3, numbering: none)[Glium]
#heading(level: 4, numbering: none)[Glium]
When we need to perform rendering for 3D content analysis or for evaluation, we use the #link("https://github.com/glium/glium")[*`glium`*] library.
Glium has many advantages over using raw OpenGL calls.
@ -163,7 +163,7 @@ Its objectives are:
- to be fast: the produced binaries use optimized OpenGL function calls;
- to be compatible: glium seeks to support the latest versions of OpenGL functions and falls back to older functions if the most recent ones are not supported on the device.
#heading(level: 3, numbering: none)[Conclusion]
#heading(level: 4, numbering: none)[Conclusion]
In our work, many tasks will consist of 3D content analysis, reorganization, rendering and evaluation.
Many of these tasks require long computations, lasting from hours to entire days.

View File

@ -1,7 +1,6 @@
#import "../chapter.typ"
#chapter.chapter[Foreword]
= Foreword<f>
#include "3d-model.typ"
#include "video-vs-3d.typ"
#include "implementation.typ"

View File

@ -1,16 +1,16 @@
= Similarities and differences between video and 3D
== Similarities and differences between video and 3D
The video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and differences between the video and 3D scenarios, together with knowledge of the video streaming literature, is key to developing an efficient 3D streaming system.
== Chunks of data
=== Chunks of data
In order to perform streaming, data need to be segmented so that a client can request chunks of data and display them to the user while requesting other chunks.
In video streaming, data chunks typically consist of a few seconds of video.
In mesh streaming, some progressive mesh approaches encode a base mesh that contains low-resolution geometry and textures, together with refinement chunks that increase the resolution of the base mesh.
Alternatively, a mesh can be segmented by separating geometry and textures, creating chunks that contain some faces of the model and other chunks that contain textures.
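
As a rough illustration (our own sketch, with made-up names), the chunks such a system manipulates could be modeled as follows:

```rust
/// Hypothetical chunk types for a 3D streaming client (names are ours,
/// for illustration only).
enum Chunk {
    /// A low-resolution base mesh, downloaded first.
    BaseMesh(Vec<u8>),
    /// A refinement increasing the resolution of the base mesh.
    Refinement { level: u32, data: Vec<u8> },
    /// A subset of the faces of a segmented model.
    Faces { data: Vec<u8> },
    /// A texture, possibly available at several resolutions.
    Texture { resolution: u32, data: Vec<u8> },
}
```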
== Data persistence
=== Data persistence
One of the main differences between video and 3D streaming is data persistence.
In video streaming, only one chunk of video is required at a time.
@ -20,7 +20,7 @@ Already a few problems appear here regarding 3D streaming:
- depending on the user's field of view, many chunks may be required to perform a single rendering;
- chunks do not become obsolete the way they do in video: a user navigating in a 3D scene may come back to the same spot after some time, or see the same objects but from elsewhere in the scene.
== Multiple representations
=== Multiple representations
All major video streaming platforms support multi-resolution streaming.
This means that a client can choose the quality at which it requests the content.
@ -34,7 +34,7 @@ It can be chosen directly by the user or automatically determined by analyzing t
Similarly, recent works in 3D streaming have proposed different ways to progressively stream 3D models, displaying a low-quality version of the model to the user without latency, and supporting interaction with the model while details are being downloaded.
Such strategies are reviewed in Section X. // TODO
== Media types
=== Media types
Just like a video, a 3D scene is composed of different media types.
In video, those media are mostly images, sounds, and subtitles, whereas in 3D, they are geometry and textures.
@ -45,7 +45,7 @@ Thus, the most important thing a video streaming system should do is to optimize
That is why, for a video on YouTube for example, there may be 6 available qualities for images (144p, 240p, 320p, 480p, 720p and 1080p) but only 2 qualities for sound.
This is one of the main differences between video and 3D streaming: in a 3D setting, the ratio between geometry and texture varies from one scene to another, and balancing between those two types of content is a key problem.
== Interaction
=== Interaction
The way users interact with content is another important difference between video and 3D.
In a video interface, there is only one degree of freedom: time.
@ -304,7 +304,7 @@ These types of controls are notably used on the popular mesh editor #link("http:
Another popular way of controlling a free camera in a virtual environment is the first person controls #link("https://threejs.org/examples/?q=controls#misc_controls_pointerlock")[(live example here)].
These controls are typically used in shooting video games: the mouse rotates the camera and the keyboard translates it.
== Relationship between interface, interaction and streaming
=== Relationship between interface, interaction and streaming
In both video and 3D systems, streaming affects interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, they might start moving around on the timeline, but if they see that the streaming is barely keeping up, they might prefer not to interact and just watch the video.

View File

@ -1,4 +1,4 @@
= Open problems
== Open problems
The objective of our work is to design a system which allows a user to access remote 3D content.
A 3D streaming client has many tasks to accomplish:

View File

@ -1,6 +1,8 @@
#import "../chapter.typ"
#heading(level: 1, numbering: none)[Introduction]
#chapter.chapter(count: false)[Introduction]
#set heading(numbering: (..nums) => {
nums.pos().slice(1).join(".")
})
Over the last few years, 3D acquisition and modeling techniques have made tremendous progress.
Recent software uses 2D images from cameras to reconstruct 3D data, e.g.
@ -11,14 +13,11 @@ These models have potential for multiple purposes, for example, they can be prin
For example, they can be used for augmented reality, to provide users with feedback that can help workers
with complex tasks, but also for fashion (for example, #link("https://www.fittingbox.com")[Fittingbox] is a company that develops software to virtually try glasses, as in @fittingbox).
#v(50pt)
#figure(
image("../assets/introduction/fittingbox.png", width: 45%),
caption: [My face with augmented glasses]
)<fittingbox>
#pagebreak()
3D acquisition and visualization are also useful for preserving cultural heritage, as in software such as Google Heritage or 3DHop, or for letting users navigate in a city (as in Google Earth or Google Maps in 3D).
#link("https://sketchfab.com")[Sketchfab] (see @sketchfab) is an example of a website allowing users to share their 3D models and visualize models from other users.

View File

@ -1,9 +1,9 @@
= Thesis outline
== Thesis outline
First, in Chapter X, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
First, in @f, we give some preliminary information required to understand the types of objects we are manipulating in this thesis.
We then proceed to compare 3D and video content: video and 3D share many features, and analyzing the video setting gives inspiration for building a 3D streaming system.
In Chapter X, we present a review of the state of the art in multimedia interaction and streaming.
In @rw, we present a review of the state of the art in multimedia interaction and streaming.
This chapter starts with an analysis of the video streaming standards.
Then it reviews the different 3D streaming approaches.
The last section of this chapter focuses on 3D interaction.
@ -11,7 +11,6 @@ The last section of this chapter focuses on 3D interaction.
Then, in Chapter X, we present our first contribution: an in-depth analysis of the impact of the UI on navigation and streaming in a 3D scene.
We first develop a basic interface for navigating in 3D and then introduce 3D objects called _bookmarks_ that help users navigate the scene.
We then present a user study that we conducted on 51 people, which shows that bookmarks ease user navigation: they improve performance at tasks such as finding objects.
% Then, we setup a basic 3D streaming system that allows us to replay the traces collected during the user study and simulate 3D streaming at the same time.
We analyze how the presence of bookmarks impacts the streaming: we propose and evaluate streaming policies that rely on bookmark-based precomputations and measurably increase the quality of experience.
In Chapter X, we present the most important contribution of this thesis: DASH-3D.

124
main.typ
View File

@ -1,116 +1,14 @@
#set page(paper: "a4")
#import "template.typ"
#show link: content => {
set text(fill: blue)
content
}
#show: doc => template.phd(doc)
#show cite: content => {
set text(fill: blue)
content
}
// Code formatting
#show raw.where(block: true): it => {
set par(justify: false)
let split = it.text.split("\n")
let len = split.len()
grid(
columns: (100%, 100%),
column-gutter: -100%,
block(width: 100%, inset: 1em, for (i, line) in split.enumerate() {
if i != len - 1 {
box(width: 0pt, align(right, str(i + 1) + h(2em)))
hide(line)
linebreak()
}
}),
block(radius: 1em, fill: luma(246), width: 100%, inset: 1em, it),
)
}
#show heading: content => {
content
v(1em)
}
#show figure: content => {
content
v(1em)
}
// First page
#set page(background: image("assets/background.png", width: 100%))
#set text(fill: white)
#align(center + bottom)[
#rect(width: 120%, fill: rgb(0, 0, 0))[
#pad(5pt, text(weight: "bold", size: 20pt)[Dynamic Adaptive 3D Streaming over HTTP])
#text(weight: "bold")[For the University of Toulouse PhD granted by the INP Toulouse]\
#text(weight: "bold")[Presented and defended on Friday 29th November, 2019 by Thomas Forgione]
*Gilles GESQUIÈRE*, president\
*Sidonie CHRISTOPHE*, reviewer \
*Gwendal SIMON*, reviewer\
*Maarten WIJNANTS*, examiner\
*Wei Tsang OOI*, examiner\
*Vincent CHARVILLAT*, thesis supervisor\
*Axel CARLIER*, thesis co-supervisor\
*Géraldine MORIN*, thesis co-supervisor
#set text(size: 10pt)
#align(left, [*Doctoral school and field*: EDMITT: École Doctorale de Mathématiques, Informatiques et Télécommunications de
Toulouse\
*Field*: Computer science and telecommunication\
*Research unit*: IRIT (5505)\
*Thesis supervisors*: Vincent CHARVILLAT, Axel CARLIER and Géraldine MORIN\
*Reviewers*: Sidonie CHRISTOPHE and Gwendal SIMON
])
]
]
#set text(fill: black)
#set par(first-line-indent: 1em, justify: true, leading: 1em)
// Abstracts
#pagebreak()
#set page(background: none)
#pagebreak()
#h(1em) *Titre :* Transmission Adaptative de Modèles 3D Massifs
*Résumé :*
#include "abstracts/fr.typ"
#pagebreak()
#pagebreak()
#set page(background: none)
#h(1em) *Title:* Dynamic Adaptive 3D Streaming over HTTP
*Abstract:*
#include "abstracts/en.typ"
// Acknowledgments
#pagebreak()
#include "cover.typ"
#include "abstracts/main.typ"
#include "acknowledgments.typ"
// Content of the thesis
#pagebreak()
#set text(size: 11pt)
#set heading(numbering: "1.1")
#include "introduction/main.typ"
#set heading(numbering: (..nums) =>
counter("chapter").display() + "." + nums
.pos()
.map(str)
.join(".")
)
#pagebreak()
#include "foreword/main.typ"
@ -120,21 +18,9 @@
#pagebreak()
#include "preliminary-work/main.typ"
// Bibliography
#pagebreak()
#bibliography("bib.bib", style: "chicago-author-date")
#pagebreak()
#include "abstracts/fourth.typ"
// Abstracts
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Abstract]
#set text(size: 8pt)
#include "abstracts/en.typ"
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Résumé]
#set text(size: 8pt)
#include "abstracts/fr.typ"

View File

@ -1,10 +1,10 @@
= Impact of 3D bookmarks on navigation
== Impact of 3D bookmarks on navigation
We now describe an experiment that we conducted on 51 participants, with two goals in mind.
First, we want to measure the impact of 3D bookmarks on navigation within an NVE\@.
Second, we want to collect traces from the users so that we can replay them in reproducible experiments comparing streaming strategies in Section~\ref{bi:system}.
== Our NVE
=== Our NVE
To ease the deployment of our experiments to users in distributed locations on a crowdsourcing platform, we implement a simple web-based NVE client using THREE.js// \footnote{http://threejs.org}.
The NVE server is implemented with node.js. // \footnote{http://nodejs.org}.
The NVE server streams a 3D scene to the client; the client renders the scene as the 3D content is received.
@ -19,7 +19,7 @@ The mouse movement controls the camera rotation.
The user can always choose to lock the pointer, or unlock it using the escape key.
The interface also includes a button to reset the camera back to the starting position in the scene.
== 3D bookmarks
=== 3D bookmarks
Our NVE supports 3D bookmarks.
A 3D bookmark, or bookmark for short, is simply a fixed camera location (in 3D space), a view direction, and a focal length.
Bookmarks visible from the user's current viewpoint are shown as 3D objects in the scene.
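
A minimal sketch of the data a bookmark carries (field names are ours, for illustration):

```rust
/// A 3D bookmark: a fixed camera position, a view direction, and a
/// focal length (a sketch; the thesis does not show this structure).
struct Bookmark {
    position: [f32; 3],  // fixed camera location in world space
    direction: [f32; 3], // view direction (unit vector)
    focal: f32,          // focal length of the bookmarked camera
    visited: bool,       // clicked bookmarks are drawn in another color
}
```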
@ -40,22 +40,22 @@ Since bookmarks are part of the scene, they are visible only when not hidden by
We chose sizes and colors that are salient enough to be easily seen, but small enough to limit the occlusion of regions within the scene.
When reaching the bookmark, the corresponding arrow or viewport is not visible anymore, and subsequently will appear in a different color, to indicate that it has been clicked (similar to web links).
== User study
=== User study
We now describe in detail our experimental setup and the user study that we conducted on 3D navigation.
#heading(level: 3, numbering: none)[Models]
#heading(level: 4, numbering: none)[Models]
We use four 3D scenes (one for the tutorial and three for the actual experiments) which represent recreated scenes from a famous video game.
Those models are light (a few thousand triangles per model) and are sent before the experiment starts.
We keep the models small so that users can perform the task with acceptable latency from any country using a decent internet connection.
Our NVE does not stream the 3D content for these experiments, in order to avoid unreliable conditions caused by the network bandwidth variation, which might affect how the users interact.
#heading(level: 3, numbering: none)[Task design]
#heading(level: 4, numbering: none)[Task design]
Since we are interested in studying how efficiently users navigate in the 3D scene, we ask our participants to complete a task which forces them to visit, at least partially, various regions in the scene.
To this end, we hide a set of 8 coins in the scene: participants are asked to collect the coins by clicking on them.
In order to avoid any bias due to the coins' positions, we predefined 50 possible coin locations per scene, and randomly select 8 out of these 50 positions each time a new participant starts the experiment.
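
A minimal sketch of this sampling, assuming the `rand` crate (the thesis does not specify the implementation):

```rust
use rand::seq::SliceRandom;

/// Draw 8 coin positions out of the 50 predefined locations,
/// without replacement (a sketch, not the actual experiment code).
fn pick_coins(locations: &[(f32, f32, f32)]) -> Vec<(f32, f32, f32)> {
    let mut rng = rand::thread_rng();
    locations.choose_multiple(&mut rng, 8).cloned().collect()
}
```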
#heading(level: 3, numbering: none)[Experiment]
#heading(level: 4, numbering: none)[Experiment]
Participants are first presented with an initial screen to collect some preliminary information: age, gender, the last time they played 3D video games, and self-rated 3D gaming skills. We ask those questions because we believe that someone who is used to playing 3D video games should browse the scene more easily, and thus, may not need to use our bookmarks.
Then, the participants go through a tutorial to learn how the UI works, and how to complete the task.
@ -94,15 +94,15 @@ After completing the three tasks, the participants have to answer a set of quest
caption: [List of questions in the questionnaire and summary of answers. Questions 1 and 2 have a 99% confidence interval.],
)<bi:questions>
#heading(level: 3, numbering: none)[Participants]
#heading(level: 4, numbering: none)[Participants]
The participants were recruited on microworkers.com, a crowdsourcing website.
There were 51 participants (36 men and 15 women), who were on average 30.44 years old.
== Experimental results
=== Experimental results
We now present the results from our user study, focusing on whether bookmarks help users navigate the 3D scene.
#heading(level: 3, numbering: none)[Questionnaire]
#heading(level: 4, numbering: none)[Questionnaire]
We had 51 responses to the questionnaire.
The answers are summarized in Table~\ref{bi:questions}.
Note that not all questions were answered by all participants.
@ -116,7 +116,7 @@ This is slightly in contradiction with our setup; even if coins may appear in so
The strongest result is that almost all users (49 out of 51) find bookmarks to be helpful.
In addition, users seem to have a preference for \Arrows{} over \Viewports{} (32 against 7).
#heading(level: 3, numbering: none)[Analysis of interactions]
#heading(level: 4, numbering: none)[Analysis of interactions]
#figure(
table(
@ -159,7 +159,7 @@ Although users tend to spend less time on the tasks when they do not have bookma
As a consequence, they visit the scene faster on average with bookmarks than without.
The table shows that this higher speed is due to the bookmarks, as more than 60\% of the distance traveled by users with bookmarks happens when users click on bookmarks and fly to the destination.
#heading(level: 3, numbering: none)[Discussion]
#heading(level: 4, numbering: none)[Discussion]
In the previous paragraphs, we have shown how bookmarks are well perceived by users (looking at the questionnaire answers).
We also showed that users tend to be more efficient in completing the task when they have bookmarks than when they do not.

View File

@ -1,4 +1,4 @@
= Introduction
== Introduction
Navigating in an NVE with a large virtual space (most of the time through a 2D interface) is sometimes cumbersome.
In particular, a user may have difficulties reaching the right place to find information.

View File

@ -1,6 +1,4 @@
#import "../chapter.typ"
#chapter.chapter[Bookmarks, navigation and streaming]
= Bookmarks, navigation and streaming
#figure(
grid(

View File

@ -1,6 +1,6 @@
= Impact of 3D bookmarks on streaming
== Impact of 3D bookmarks on streaming
== 3D model streaming
=== 3D model streaming
In this section, we describe our implementation of a 3D model streaming policy in our simulation.
A summary of the streaming policies we designed is given in Table X.
@ -66,12 +66,12 @@ Note that the server may send faces that are occluded and not visible to the cli
In the following, we shall denote this streaming policy \textsf{culling}; in Figures~\ref{bi:click-1250} and~\ref{bi:click-625} streaming using \textsf{culling} only is denoted \textsf{C-only}.
== 3D bookmarks
=== 3D bookmarks
We have seen (Figure~\ref{bi:triangles-curve}) that navigation with bookmarks is more demanding in terms of bandwidth.
We want to exploit bookmarks to improve the user's quality of experience. For this purpose, we propose two streaming policies based on offline computation of the relevance of 3D content to bookmarked viewpoints.
#heading(level: 3, numbering: none)[Visibility determination for 3D bookmarks]
#heading(level: 4, numbering: none)[Visibility determination for 3D bookmarks]
A bookmarked viewpoint is more likely to be accessed than an arbitrary viewpoint in the 3D scene.
We exploit this fact to perform some precomputation on the 3D content visible from the bookmarked viewpoint.
@ -126,7 +126,7 @@ In what follows, we will refer to this streaming policy as \textsf{visible}.
// \caption{Comparison of rendered image quality (average on all bookmarks and starting position): the triangles are sorted offline (green curve), or sorted online by distance to the viewpoint (blue curve).}\label{bi:sorted-tri}
// \end{figure}
#heading(level: 3, numbering: none)[Prefetching by predicting the next bookmark clicked]
#heading(level: 4, numbering: none)[Prefetching by predicting the next bookmark clicked]
We can now use the precomputed, visibility-based streaming of 3D content for the bookmarks to reduce the amount of traffic needed.
Next, we propose to prefetch the 3D content from the bookmarks.
@ -183,7 +183,7 @@ We denote this combination as \textsf{V-PP}, for Prefetching based on Prediction
// \caption{Example of how a chunk can be divided into fetching what is needed to display the current viewport (culling), and prefetching three recommendations according to their probability of being visited next.\label{bi:prefetched-chunk}}
// \end{figure}
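
As an illustration of the idea (our own sketch, not the exact policy), a chunk's byte budget could be split between culling and the candidate bookmarks proportionally to their predicted probabilities:

```rust
/// Split a chunk budget (in bytes) between culling and prefetching,
/// proportionally to the predicted probability of each candidate
/// bookmark (a sketch of the idea behind V-PP; assumes a non-empty
/// probability list with a positive sum).
fn split_budget(total: u64, culling_share: f64, probabilities: &[f64]) -> (u64, Vec<u64>) {
    let culling = (total as f64 * culling_share) as u64;
    let prefetch = total - culling;
    let sum: f64 = probabilities.iter().sum();
    let shares = probabilities
        .iter()
        .map(|p| (prefetch as f64 * p / sum) as u64)
        .collect();
    (culling, shares)
}
```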
#heading(level: 3, numbering: none)[Fetching destination bookmark]
#heading(level: 4, numbering: none)[Fetching destination bookmark]
An alternative way to benefit from precomputing the visible triangles at a bookmark is to fetch 3D content during the "fly-to" transition to the destination.
Indeed, as specified in Section~\ref{bi:3dnavigation}, moving to a bookmarked viewpoint is not instantaneous, but rather takes a small amount of time to smoothly move the user camera from its initial position towards the bookmark.
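
A minimal sketch of such a transition, assuming linear interpolation (the thesis does not specify the easing):

```rust
/// Camera position during a "fly-to" transition: `t` grows from 0.0
/// (start) to 1.0 (bookmark reached). While t < 1.0, the client can
/// already fetch content visible from `end` (the idea behind V-FD).
fn fly_to(start: [f32; 3], end: [f32; 3], t: f32) -> [f32; 3] {
    [
        start[0] + t * (end[0] - start[0]),
        start[1] + t * (end[1] - start[1]),
        start[2] + t * (end[2] - start[2]),
    ]
}
```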
@ -209,7 +209,7 @@ We call this method \textsf{V-FD}, since we are Fetching the 3D data from the De
// \caption{Summary of the streaming policies\label{bi:streaming-policies}}
// \end{table}
== Comparing streaming policies
=== Comparing streaming policies
In order to determine which policy to use, we replay the traces from the user study while simulating different streaming policies.
The first point we are interested in is which streaming policy leads to the lowest discovery latency and the best image

View File

@ -1,4 +1,4 @@
= 3D bookmarks and navigation aids
== 3D bookmarks and navigation aids
One of the uses of 3D streaming is to allow users to interact with the content while it is being downloaded.
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.

View File

@ -1,10 +1,10 @@
= 3D streaming
== 3D streaming
In this thesis, we focus on the objective of delivering massive 3D scenes over the network.
While 3D streaming is not the most popular research field, special attention has been paid to 3D content compression, in particular progressive compression, which can be considered a premise of 3D streaming.
In the next sections, we review the related work on 3D streaming, from compression and structuring to interaction.
== Compression and structuring
=== Compression and structuring
According to #cite("maglo20153d"), mesh compression can be divided into four categories:
- single-rate mesh compression, seeking to reduce the size of a mesh;
@ -100,7 +100,7 @@ glTF is based on a JSON file, which encodes the structure of a scene of 3D objec
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualization and which we address in our work.
== Viewpoint dependency
=== Viewpoint dependency
3D streaming means that content is downloaded while the user is interacting with the 3D object.
In terms of quality of experience, it is desirable that the downloaded content falls into the user's field of view.
@ -135,7 +135,7 @@ Levels of detail have then been used for 3D streaming.
For example, #cite("streaming-hlod") proposes an out-of-core viewer for remote model visualization based on adapting hierarchical levels of detail #cite("hlod") to the context of 3D streaming.
Levels of detail can also be used to perform viewpoint-dependent streaming, as in #cite("view-dependent-lod").
== Texture streaming
=== Texture streaming
In order to increase texture rendering speed, a common technique is _mipmapping_.
It consists of generating progressively lower resolutions of an initial texture.
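
For instance, the number of levels in such a pyramid, halving each dimension down to 1x1, can be computed as follows (a standard computation, shown here as a sketch):

```rust
/// Number of mipmap levels for a texture with nonzero dimensions,
/// halving each dimension until reaching 1x1: floor(log2(max(w, h))) + 1.
fn mip_levels(width: u32, height: u32) -> u32 {
    32 - width.max(height).leading_zeros()
}

// e.g. a 1024x512 texture has 11 levels: 1024, 512, ..., 2, 1.
```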
@ -149,7 +149,7 @@ Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimize dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, the authors are able to reach a better rate-distortion ratio than webp (the new standard for texture transmission) and jpeg.
== Geometry and textures
=== Geometry and textures
As discussed in Chapter~\ref{f:3d}, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, one must handle the competition between geometry and textures, and the system needs to address this trade-off.
@ -164,7 +164,7 @@ Since the 3D scenes we consider in our work consist of soups of textured
% All four works considered a single, manifold textured mesh model with progressive meshes, and are not applicable in our work since we deal with large and potentially non-manifold scenes.
== Streaming in game engines
=== Streaming in game engines
In traditional video games, including online games, there is no requirement for 3D data streaming.
Video games either come on a physical medium (CD, DVD, Blu-ray) or require downloading the game itself, which includes the 3D data, before letting the user play.
@ -175,7 +175,7 @@ Some other online games, such as #link("https://secondlife.com")[Second Life], r
In such scenarios, 3D streaming is appropriate and this is why the idea of streaming 3D content for video games has been investigated.
For example, #cite("game-on-demand") proposes an online game engine based on geometry streaming that addresses the challenge of streaming 3D content while synchronizing the different players.
== NVE streaming frameworks
=== NVE streaming frameworks
An example of an NVE streaming framework is 3D Tiles #cite("3d-tiles"), a specification for visualizing massive 3D geospatial data developed by Cesium and built on top of glTF.
Its main goal is to display 3D objects on top of regular maps, and its visualization consists of a top-down view, whereas we seek to let users freely navigate in our scenes, whether by flying over the scene or moving along the roads.

View File

@ -1,6 +1,4 @@
#import "../chapter.typ"
#chapter.chapter[Related work]
= Related work<rw>
In this chapter, we review the part of the state of the art on multimedia streaming and interaction that is relevant for this thesis.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a very large body of work on video streaming, we start this chapter with a review of this domain, with a particular focus on the DASH standard.

View File

@ -1,4 +1,4 @@
= Video
== Video
Accessing a remote video through the web has been a widely studied problem since the 1990s.
The Real-time Transport Protocol (RTP, #cite("rtp-std")) was an early attempt to formalize audio and video streaming.
@ -11,34 +11,34 @@ While RTP is stateful (that is to say, it requires keeping track of every user a
Furthermore, an HTTP server can easily be replicated at different geographical locations, allowing users to fetch data from the closest server.
This type of network architecture is called a CDN (Content Delivery Network) and increases the speed of HTTP requests, making HTTP-based multimedia streaming more efficient.
== DASH: the standard for video streaming
=== DASH: the standard for video streaming
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH #cite("dash-std", "dash-std-2"), is now a widely deployed
standard for adaptively streaming video on the web #cite("dash-std-full"), made to be simple, scalable and interoperable.
DASH describes guidelines to prepare and structure video content so as to allow great streaming adaptability without requiring any server-side computation. The client should be able to make good decisions about what content to download, based only on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
#heading(level: 3, numbering: none)[DASH structure]
#heading(level: 4, numbering: none)[DASH structure]
All the content structure is described in a Media Presentation Description (MPD) file, written in XML.
This file has four layers: periods, adaptation sets, representations, and segments.
An MPD has a hierarchical structure: it can have multiple periods, each period can have multiple adaptation sets, each adaptation set can have multiple representations, and each representation can have multiple segments.
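
This hierarchy can be summed up with a hedged sketch (these types are ours, for illustration; they are not part of the DASH standard):

```rust
/// A rough model of the MPD hierarchy (illustrative only).
struct Mpd { periods: Vec<Period> }
struct Period { adaptation_sets: Vec<AdaptationSet> }
struct AdaptationSet { mime_type: String, representations: Vec<Representation> }
struct Representation { bandwidth: u64, segments: Vec<Segment> }
struct Segment { url: String }
```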
#heading(level: 4, numbering: none)[Periods]
#heading(level: 5, numbering: none)[Periods]
Periods are used to delimit content in time.
They can be used to delimit chapters, or to add advertisements that occur at the beginning, during, or at the end of a video.
#heading(level: 4, numbering: none)[Adaptation sets]
#heading(level: 5, numbering: none)[Adaptation sets]
Adaptation sets are used to delimit content according to the format.
Each adaptation set has a mime-type, and all the representations and segments that it contains share this mime-type.
In videos, most of the time, each period has at least one adaptation set containing the images, and one adaptation set containing the sound.
It may also have an adaptation set for subtitles.
#heading(level: 4, numbering: none)[Representations]
#heading(level: 5, numbering: none)[Representations]
The representation level is the level DASH uses to offer the same content at different levels of quality.
For example, an adaptation set containing images has a representation for each available quality (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose a representation and change it during the video; most importantly, since the software can estimate its download speed based on the time it took to download data in the past, it can find the optimal representation: the highest quality that the client can request without stalling.
#heading(level: 4, numbering: none)[Segments]
#heading(level: 5, numbering: none)[Segments]
Until this level in the MPD, content has been divided but it is still far from being sufficiently divided to be streamed efficiently.
A representation of the images of a chapter of a movie is still a long video, and keeping such a big file is not possible since large files prevent streaming adaptability: if the user requests a change of quality, the system would either have to wait until the file is fully downloaded, or cancel the request, wasting all the progress already made.
@ -48,7 +48,7 @@ If a user wants to seek somewhere else in the video, only one segment of data is
of data needs to be downloaded for the playback to resume. The impact of the segment duration has been investigated in many works, including #cite("sideris2015mpeg", "stohr2017sweet").
For example, #cite("stohr2017sweet") discuss how the segment duration affects the streaming: short segments lower the initial delay and provide the best stalling quality of experience, but make the total downloading time of the video longer because of overhead.
#heading(level: 3, numbering: none)[Content preparation and server]
#heading(level: 4, numbering: none)[Content preparation and server]
Encoding a video in DASH format consists in partitioning the content into periods, adaptation sets, representations and segments as explained above, and generating a Media Presentation Description file (MPD) which describes this organization.
Once the data are prepared, they can simply be hosted on a static HTTP server which does no computation other than serving files when it receives requests.
@ -56,14 +56,14 @@ All the intelligence and the decision making is moved to the client side.
This is one of DASH's strengths: no powerful server is required, and since static HTTP servers are mature and efficient, all DASH clients can benefit from them.
#heading(level: 3, numbering: none)[Client side adaptation]
#heading(level: 4, numbering: none)[Client side adaptation]
A client typically starts by downloading the MPD file, and then proceeds to download segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specifics of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput), buffer state or segment sizes, to derive a decision on which segments should be downloaded next. Most industrial actors have their own
adaptation logic, and many more have been proposed in the literature.
A thorough review is beyond the scope of this state-of-the-art, but examples include #cite("chiariotti2016online") who formulate the problem in a reinforcement learning framework, #cite("yadav2017quetra") who formulate the problem using queuing theory, or #cite("huang2019hindsight") who use a formulation derived from the knapsack problem.
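
As a hedged sketch of the simplest form of adaptation logic (a pure rate-based rule of our own, not any particular proposal from the literature):

```rust
/// Pick the highest representation whose bitrate fits the estimated
/// throughput, with a 20% safety margin (a rate-based sketch; real
/// adaptation logics also use buffer state, segment sizes, etc.).
fn choose_representation(bitrates_bps: &[u64], throughput_bps: u64) -> usize {
    let budget = throughput_bps * 8 / 10; // keep a 20% safety margin
    bitrates_bps
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b <= budget)
        .map(|(i, _)| i)
        .last()       // assumes `bitrates_bps` is sorted ascending
        .unwrap_or(0) // fall back to the lowest quality
}
```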
== DASH-SRD
=== DASH-SRD
Now widely adopted for video streaming, DASH has been extended to various other contexts.
DASH-SRD (Spatial Relationship Description, #cite("dash-srd")) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure \ref{sota:srd-png}.

78
template.typ Normal file
View File

@ -0,0 +1,78 @@
#let phd(doc) = {
set page(paper: "a4")
set par(first-line-indent: 1em, justify: true, leading: 1em)
// Code formatting
show raw.where(block: true): it => {
set par(justify: false)
let split = it.text.split("\n")
let len = split.len()
grid(
columns: (100%, 100%),
column-gutter: -100%,
block(width: 100%, inset: 1em, for (i, line) in split.enumerate() {
if i != len - 1 {
box(width: 0pt, align(right, str(i + 1) + h(2em)))
hide(line)
linebreak()
}
}),
block(radius: 1em, fill: luma(246), width: 100%, inset: 1em, it),
)
}
show heading: content => {
content
v(1em)
}
show figure: content => {
content
v(1em)
}
show link: content => {
set text(fill: blue)
content
}
show cite: content => {
set text(fill: blue)
content
}
show ref: content => {
set text(fill: blue)
content
}
set heading(supplement: (..nums) =>
if (nums.pos().len() == 1) {
[Chapter]
} else {
[Section]
}
)
set heading(numbering: "1.1")
show heading.where(level: 1): it => {
align(right, {
v(100pt)
if it.numbering != none {
text(size: 50pt)[Chapter ]
text(counter(heading).display(), size: 150pt, fill: rgb(173, 216, 230))
}
v(50pt)
text(it.body, size: 40pt)
if it.numbering != none {
pagebreak()
} else {
v(40pt)
}
})
}
doc
}