Massive cleaning

This commit is contained in:
2023-05-02 17:57:14 +02:00
parent 019d9b7704
commit eb3afbee34
22 changed files with 213 additions and 216 deletions
+1 -1
@@ -1,4 +1,4 @@
= 3D bookmarks and navigation aids
== 3D bookmarks and navigation aids
One of the uses of 3D streaming is to let users interact with the content while it is being downloaded.
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.
+7 -7
@@ -1,10 +1,10 @@
= 3D streaming
== 3D streaming
In this thesis, we focus on the objective of delivering massive 3D scenes over the network.
While 3D streaming is not the most popular research field, there has been special attention to 3D content compression, in particular progressive compression, which can be considered a premise of 3D streaming.
In the next sections, we review the related work on 3D streaming, from 3D compression and structuring to 3D interaction.
== Compression and structuring
=== Compression and structuring
According to #cite("maglo20153d"), mesh compression can be divided into four categories:
- single-rate mesh compression, seeking to reduce the size of a mesh;
@@ -100,7 +100,7 @@ glTF is based on a JSON file, which encodes the structure of a scene of 3D objec
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualization and which we address in our work.
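To make the glTF scene-graph structure described above concrete, here is a minimal sketch in Python of a hypothetical glTF-style asset (the node and buffer names are illustrative, not taken from a real model):

```python
import json

# A minimal, hypothetical glTF-style asset: a JSON document whose scene
# graph references cameras, nodes and meshes by index, with raw geometry
# stored in external binary buffers.
gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0, 1]}],
    "nodes": [
        {"name": "building", "mesh": 0},
        {"name": "viewpoint", "camera": 0},
    ],
    "cameras": [{"type": "perspective"}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "buffers": [{"uri": "building.bin", "byteLength": 1024}],
}

# Scene-graph indices must point to existing nodes.
assert all(i < len(gltf["nodes"]) for s in gltf["scenes"] for i in s["nodes"])

document = json.dumps(gltf)
```

Because the whole structure is plain JSON with index-based references, a client can parse the scene graph first and fetch the (typically much larger) binary buffers separately.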
== Viewpoint dependency
=== Viewpoint dependency
3D streaming means that content is downloaded while the user is interacting with the 3D object.
In terms of quality of experience, it is desirable that the downloaded content falls into the user's field of view.
@@ -135,7 +135,7 @@ Level of details have then been used for 3D streaming.
For example, #cite("streaming-hlod") propose an out-of-core viewer for remote model visualization, built by adapting hierarchical levels of detail #cite("hlod") to the context of 3D streaming.
Levels of detail can also be used to perform viewpoint-dependent streaming, as in #cite("view-dependent-lod").
== Texture streaming
=== Texture streaming
A common technique for increasing texture rendering speed is _mipmapping_.
It consists of generating progressively lower-resolution versions of an initial texture.
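The resulting chain of resolutions can be sketched as follows, halving each dimension until a 1×1 texture is reached:

```python
def mip_chain(width, height):
    """Return the (width, height) of each successive mipmap level,
    halving both dimensions until reaching a 1x1 texture."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(width // 2, 1), max(height // 2, 1)
        levels.append((width, height))
    return levels

# A 1024x1024 texture yields 11 levels, from 1024x1024 down to 1x1.
chain = mip_chain(1024, 1024)
```

Since each level holds a quarter of the pixels of the previous one, the full chain adds roughly one third to the original texture's memory footprint.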
@@ -149,7 +149,7 @@ Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimize dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, the authors are able to reach a better rate-distortion ratio than WebP, the new standard for texture transmission, and JPEG.
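The paper's exact ordering procedure is not detailed here; a greedy nearest-neighbour sketch, with a toy mean-absolute-difference dissimilarity standing in for the real measure, illustrates the idea of chaining similar tiles so the video codec can exploit inter-frame redundancy:

```python
def order_tiles(tiles):
    """Greedily order tiles so that consecutive tiles are similar.
    `tiles` is a list of equal-length pixel lists, a toy stand-in
    for fixed-size texture tiles."""
    def dissimilarity(a, b):
        # Hypothetical measure: mean absolute pixel difference.
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    remaining = list(range(len(tiles)))
    order = [remaining.pop(0)]          # start from the first tile
    while remaining:
        last = tiles[order[-1]]
        nxt = min(remaining, key=lambda i: dissimilarity(last, tiles[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Dark, bright, dark, bright tiles get reordered into a smooth gradient.
tiles = [[0, 0], [9, 9], [1, 1], [8, 8]]
order = order_tiles(tiles)
```

The smoother the sequence of tiles, the smaller the residuals the video encoder has to store between consecutive frames.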
== Geometry and textures
=== Geometry and textures
As discussed in Chapter~\ref{f:3d}, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, one must handle the competition between geometry and textures for the available bandwidth, and the system needs to address this compromise.
@@ -164,7 +164,7 @@ Since the 3D scenes we are interested in in our work consist in soups of texture
% All four works considered a single, manifold textured mesh model with progressive meshes, and are not applicable in our work since we deal with large and potentially non-manifold scenes.
== Streaming in game engines
=== Streaming in game engines
In traditional video games, including online games, there is no requirement for 3D data streaming.
Video games either come with a physical support (CD, DVD, Blu-ray) or require downloading the game itself, which includes the 3D data, before letting the user play.
@@ -175,7 +175,7 @@ Some other online games, such as #link("https://secondlife.com")[Second Life], r
In such scenarios, 3D streaming is appropriate and this is why the idea of streaming 3D content for video games has been investigated.
For example, #cite("game-on-demand") proposes an online game engine based on geometry streaming, which addresses the challenge of streaming 3D content while synchronizing the different players.
== NVE streaming frameworks
=== NVE streaming frameworks
An example of NVE streaming framework is 3D Tiles #cite("3d-tiles"), which is a specification for visualizing massive 3D geospatial data developed by Cesium and built on top of glTF.
Their main goal is to display 3D objects on top of regular maps, and their visualization consists of a top-down view, whereas we seek to let users navigate freely in our scenes, whether flying over the scene or moving along the roads.
+1 -3
@@ -1,6 +1,4 @@
#import "../chapter.typ"
#chapter.chapter[Related work]
= Related work<rw>
In this chapter, we review the part of the state of the art on multimedia streaming and interaction that is relevant for this thesis.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a substantial body of work on video streaming, we start this chapter with a review of that domain, with a particular focus on the DASH standard.
+10 -10
@@ -1,4 +1,4 @@
= Video
== Video
Accessing a remote video through the web has been a widely studied problem since the 1990s.
The Real-time Transport Protocol (RTP, #cite("rtp-std")) was an early attempt to formalize audio and video streaming.
@@ -11,34 +11,34 @@ While RTP is stateful (that is to say, it requires keeping track of every user a
Furthermore, an HTTP server can easily be replicated at different geographical locations, allowing users to fetch data from the closest server.
This type of network architecture is called a CDN (Content Delivery Network); it increases the speed of HTTP requests, making HTTP-based multimedia streaming more efficient.
== DASH: the standard for video streaming
=== DASH: the standard for video streaming
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH #cite("dash-std", "dash-std-2") is now a widely deployed
standard for adaptively streaming video on the web #cite("dash-std-full"), made to be simple, scalable and interoperable.
DASH describes guidelines to prepare and structure video content so as to allow great adaptability of the streaming without requiring any server-side computation. The client should be able to make good decisions about which parts of the content to download, based only on an estimate of the network constraints and on the information provided in a descriptive file: the MPD.
#heading(level: 3, numbering: none)[DASH structure]
#heading(level: 4, numbering: none)[DASH structure]
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has four layers: the periods, the adaptation sets, the representations, and the segments.
An MPD has a hierarchical structure: it can contain multiple periods, each period can have multiple adaptation sets, each adaptation set can have multiple representations, and each representation can have multiple segments.
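This four-layer hierarchy can be sketched as a toy in-memory model; the real MPD is XML, and the attribute names and segment file names below are illustrative only:

```python
# Toy model of the MPD hierarchy: periods contain adaptation sets,
# which contain representations, which contain segments.
mpd = {
    "periods": [{
        "adaptation_sets": [{
            "mime_type": "video/mp4",
            "representations": [
                {"quality": "720p",  "bandwidth": 2_000_000,
                 "segments": ["seg1.m4s", "seg2.m4s"]},
                {"quality": "1080p", "bandwidth": 4_000_000,
                 "segments": ["seg1.m4s", "seg2.m4s"]},
            ],
        }],
    }],
}

# Walking the hierarchy: every segment belongs to exactly one
# representation, inside one adaptation set, inside one period.
n_segments = sum(
    len(rep["segments"])
    for period in mpd["periods"]
    for aset in period["adaptation_sets"]
    for rep in aset["representations"]
)
```

A client walks this tree top-down: it picks a period, then an adaptation set per media type, then one representation, and finally requests that representation's segments one by one.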
#heading(level: 4, numbering: none)[Periods]
#heading(level: 5, numbering: none)[Periods]
Periods are used to delimit content in time.
They can be used to delimit chapters, or to add advertisements at the beginning, during, or at the end of a video.
#heading(level: 4, numbering: none)[Adaptation sets]
#heading(level: 5, numbering: none)[Adaptation sets]
Adaptation sets are used to delimit content according to the format.
Each adaptation set has a MIME type, and all the representations and segments that it contains share this MIME type.
In videos, most of the time, each period has at least one adaptation set containing the images, and one adaptation set containing the sound.
It may also have an adaptation set for subtitles.
#heading(level: 4, numbering: none)[Representations]
#heading(level: 5, numbering: none)[Representations]
The representation level is the level at which DASH offers the same content at different levels of quality.
For example, an adaptation set containing images has a representation for each available quality (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose a representation and to change it during playback. Most importantly, since the software can estimate its download speed based on the time past data took to download, it can find the optimal representation: the highest quality the client can request without stalling.
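This selection rule can be sketched in a few lines; the bitrate values are hypothetical, and real clients refine this with safety margins and buffer state:

```python
def best_representation(bandwidths, estimated_throughput):
    """Pick the highest-bitrate representation that the estimated
    throughput can sustain; fall back to the lowest one otherwise."""
    feasible = [b for b in bandwidths if b <= estimated_throughput]
    return max(feasible) if feasible else min(bandwidths)

# Available representation bitrates in bits per second
# (e.g. 480p, 720p, 1080p).
available = [1_000_000, 2_000_000, 4_000_000]
choice = best_representation(available, estimated_throughput=2_500_000)
```

With an estimated throughput of 2.5 Mbps, the client settles on the 2 Mbps representation: the 4 Mbps one would eventually drain the buffer and stall playback.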
#heading(level: 4, numbering: none)[Segments]
#heading(level: 5, numbering: none)[Segments]
Up to this level of the MPD, content has been divided, but not yet finely enough to be streamed efficiently.
A representation of the images of a chapter of a movie is still a long video, and keeping such a big file prevents streaming adaptability: if the user requests a change of quality, the system must either wait until the file is fully downloaded, or cancel the request and lose all the progress made.
@@ -48,7 +48,7 @@ If a user wants to seek somewhere else in the video, only one segment of data is
of data needs to be downloaded for the playback to resume. The impact of the segment duration has been investigated in many works, including #cite("sideris2015mpeg", "stohr2017sweet").
For example, #cite("stohr2017sweet") discuss how the segment duration affects streaming: short segments lower the initial delay and provide the best stalling quality of experience, but increase the total download time of the video because of overhead.
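A rough back-of-the-envelope model makes this trade-off visible; the bitrate, throughput and per-request overhead values below are illustrative assumptions, not figures from the cited studies:

```python
import math

def total_download_time(video_seconds, segment_seconds,
                        bitrate, throughput, per_request_overhead):
    """Rough model of the segment-duration trade-off: each segment
    costs a fixed per-request overhead (round trip, headers), so
    shorter segments mean more requests and a longer total download."""
    n_segments = math.ceil(video_seconds / segment_seconds)
    transfer = video_seconds * bitrate / throughput
    return transfer + n_segments * per_request_overhead

# A 10-minute, 2 Mbps video over a 4 Mbps link, 100 ms overhead/request.
short_seg = total_download_time(600, 2, bitrate=2e6, throughput=4e6,
                                per_request_overhead=0.1)
long_seg = total_download_time(600, 10, bitrate=2e6, throughput=4e6,
                               per_request_overhead=0.1)
```

Under these assumptions, 2-second segments require 300 requests instead of 60, so the shorter segmentation finishes noticeably later even though it reacts faster to quality changes.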
#heading(level: 3, numbering: none)[Content preparation and server]
#heading(level: 4, numbering: none)[Content preparation and server]
Encoding a video in the DASH format consists of partitioning the content into periods, adaptation sets, representations and segments, as explained above, and generating a Media Presentation Description (MPD) file that describes this organization.
Once the data are prepared, they can simply be hosted on a static HTTP server which does no computation other than serving files when it receives requests.
@@ -56,14 +56,14 @@ All the intelligence and the decision making is moved to the client side.
This is one of DASH's strengths: no powerful server is required, and since static HTTP servers are mature and efficient, all DASH clients can benefit from them.
#heading(level: 3, numbering: none)[Client side adaptation]
#heading(level: 4, numbering: none)[Client side adaptation]
A client typically starts by downloading the MPD file, and then proceeds to download segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any DASH client implementation is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput), buffer state or segment sizes, to decide which segments should be downloaded next. Most industrial actors have their own adaptation logic, and many more have been proposed in the literature.
A thorough review is beyond the scope of this state of the art, but examples include #cite("chiariotti2016online"), who formulate the problem in a reinforcement learning framework, #cite("yadav2017quetra"), who use queuing theory, and #cite("huang2019hindsight"), who use a formulation derived from the knapsack problem.
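One common ingredient of such adaptation logics is a smoothed throughput estimate; a minimal sketch, assuming a simple exponentially weighted moving average (the smoothing factor and sample values are arbitrary):

```python
def update_throughput(estimate, sample, alpha=0.8):
    """Exponentially weighted moving average of per-segment throughput
    measurements: reacts to new samples while smoothing out noise."""
    return alpha * estimate + (1 - alpha) * sample

estimate = 3_000_000          # initial guess, in bits per second
# Throughput measured after downloading each of three segments.
for sample in (4_000_000, 1_000_000, 2_000_000):
    estimate = update_throughput(estimate, sample)
```

The smoothed estimate then feeds the representation choice for the next segment; more elaborate logics, such as those cited above, replace this heuristic with learned or model-based policies.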
== DASH-SRD
=== DASH-SRD
Now widely adopted in the context of video streaming, DASH has been adapted to various other contexts.
DASH-SRD (Spatial Relationship Description, #cite("dash-srd")) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding the video at multiple resolutions and tiling the highest resolutions, as shown in Figure \ref{sota:srd-png}.
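Each tile is then described by its position within the full frame; a sketch enumerating tile descriptors in the spirit of SRD's coordinate scheme (the field names mirror SRD's `object_x`/`object_y`/`total_width`/`total_height` parameters, but this is an illustration, not the actual XML syntax):

```python
def srd_tiles(grid_w, grid_h):
    """Enumerate SRD-style tile descriptors for a grid_w x grid_h
    tiling: each tile carries its grid position and the total grid
    size, so a client can fetch only the tiles in its viewport."""
    return [
        {"object_x": x, "object_y": y,
         "total_width": grid_w, "total_height": grid_h}
        for y in range(grid_h) for x in range(grid_w)
    ]

tiles = srd_tiles(3, 3)     # a 3x3 tiling of the highest resolution
```

A client zoomed into one corner of the video can then request the low-resolution full frame plus only the few high-resolution tiles covering its viewport.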