This commit is contained in:
Thomas Forgione 2023-05-11 11:59:37 +02:00
parent fcc78d0c77
commit 082267411e
12 changed files with 196 additions and 15 deletions

View File

@@ -0,0 +1,111 @@
== Content preparation<d3:dash-3d>
In this section, we describe how we preprocess the 3D data of the NVE, consisting of a polygon soup, textures, and material information, and how we describe its organization in a DASH-compliant Media Presentation Description (MPD) file.
In our work, we use the `obj` file format for polygons, the `png` format for textures, and the `mtl` format for material information.
The process, however, applies to other formats as well.
=== The MPD File
In DASH, the information about content storage and characteristics, such as location, resolution, or size, is extracted from an MPD file by the client.
The client relies only on this information to decide which chunk to request and at which quality level.
The MPD file is an XML file that is organized into different sections hierarchically.
The period element is a top-level element which, in the case of video, indicates the start time and length of a video chapter.
This notion does not apply to an NVE: since the scene is static, we use a single period for the whole scene.
Each period element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
We utilize adaptation sets to organize a 3D scene's material, geometry, and texture.
The preprocessing software consists of file manipulation and is written in Rust.
It first preprocesses the geometry, and then the textures.
The MPD is generated by a library named #link("https://github.com/netvl/xml-rs")[xml-rs], which works like a stack:
- a structure is created on the root of the MPD file;
- the `start_element` method creates a new child in the XML file;
- the `end_element` method ends the current child and pops the stack.
This structure is passed along to our geometry and texture preprocessors, which add elements to the XML file as they generate the corresponding data chunks.
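For illustration, here is a minimal sketch of how such a stack-like writer can be driven with the xml-rs crate; the element names are placeholders, and the real preprocessor wraps these calls behind its own `start_element`/`end_element` helpers.
```rust
use xml::writer::{EmitterConfig, XmlEvent};

fn main() -> std::io::Result<()> {
    let file = std::fs::File::create("manifest.mpd")?;
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer(file);

    // Pushing an element onto the stack opens a new child...
    writer.write(XmlEvent::start_element("MPD")).unwrap();
    writer.write(XmlEvent::start_element("AdaptationSet")).unwrap();
    // ...where a preprocessor would emit its segments and properties.
    // Popping closes the most recently opened element.
    writer.write(XmlEvent::end_element()).unwrap(); // </AdaptationSet>
    writer.write(XmlEvent::end_element()).unwrap(); // </MPD>
    Ok(())
}
```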
=== Adaptation sets
When the user navigates freely within an NVE, the frustum at any given time almost always covers only a limited part of the 3D scene.
Similar to how DASH for video streaming partitions a video clip into temporal chunks, we segment the polygons into spatial chunks, such that the DASH client can request only the relevant chunks.
==== Geometry management<d3:geometry>
We use a space partitioning tree to organize the faces into cells.
A face belongs to a cell if its barycenter falls inside the corresponding bounding box.
Each cell corresponds to an adaptation set.
Thus, geometry information is spread across adaptation sets based on spatial coherence, allowing the client to download the relevant faces selectively.
A cell is relevant if it intersects the frustum of the client's current viewpoint.
@d3:big-picture shows the relevant cells in green.
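The relevance test can be sketched as the classic box-against-frustum check below; the `Plane` representation and the six-plane frustum are illustrative assumptions, not the exact client code.
```rust
// A frustum plane, with the normal pointing towards the inside.
struct Plane { normal: [f64; 3], offset: f64 }

/// Sketch: a cell (axis-aligned bounding box) is relevant unless it lies
/// entirely on the outer side of one of the six frustum planes.
fn is_relevant(frustum: &[Plane; 6], min: [f64; 3], max: [f64; 3]) -> bool {
    frustum.iter().all(|p| {
        // Pick the box corner furthest along the plane normal...
        let corner: Vec<f64> = (0..3)
            .map(|i| if p.normal[i] >= 0.0 { max[i] } else { min[i] })
            .collect();
        // ...and keep the cell if that corner is on the inner side.
        (0..3).map(|i| p.normal[i] * corner[i]).sum::<f64>() + p.offset >= 0.0
    })
}
```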
As our 3D content, a virtual environment, mostly spreads along the horizontal plane, we split the bounding box alternately along the two horizontal directions.
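A sketch of this alternating split is given below; the recursion depth limit, the per-cell face threshold, and the `Face` type are illustrative assumptions.
```rust
struct Face { barycenter: [f64; 3] }

/// Sketch: split a bounding box alternately along x (even depths) and
/// z (odd depths); y is never split since the scene is mostly horizontal.
fn split(faces: Vec<Face>, min: [f64; 3], max: [f64; 3], depth: usize,
         cells: &mut Vec<Vec<Face>>) {
    const MAX_FACES: usize = 10_000; // assumed threshold per cell
    if faces.len() <= MAX_FACES || depth > 20 {
        cells.push(faces); // this cell becomes one adaptation set
        return;
    }
    let axis = if depth % 2 == 0 { 0 } else { 2 };
    let mid = (min[axis] + max[axis]) / 2.0;
    // A face belongs to the half whose box contains its barycenter.
    let (left, right): (Vec<_>, Vec<_>) =
        faces.into_iter().partition(|f| f.barycenter[axis] < mid);
    let (mut lmax, mut rmin) = (max, min);
    lmax[axis] = mid;
    rmin[axis] = mid;
    split(left, min, lmax, depth + 1, cells);
    split(right, rmin, max, depth + 1, cells);
}
```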
We create a separate adaptation set for large faces (e.g., the sky or ground) because they are essential to the 3D model and do not fit into cells.
We consider a face to be large if its 3D area is more than $a+3sigma$, where $a$ and $sigma$ are the average and the standard deviation of the 3D areas of the faces, respectively.
In our example, this criterion selects the 5 largest faces, which represent 15% of the total face area.
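The corresponding filter is a short computation, sketched below under the assumption that the face areas are already available and non-empty.
```rust
/// Sketch: threshold above which a face is considered "large",
/// i.e. the mean area plus three standard deviations.
fn large_face_threshold(areas: &[f64]) -> f64 {
    let n = areas.len() as f64;
    let mean = areas.iter().sum::<f64>() / n;
    let variance = areas.iter().map(|a| (a - mean).powi(2)).sum::<f64>() / n;
    mean + 3.0 * variance.sqrt()
}
```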
We thus obtain a decomposition of the NVE into adaptation sets that partitions the geometry of the scene: one adaptation set contains the large faces of the model, and the spatial adaptation sets contain the remaining faces.
We store the spatial location of each adaptation set, characterized by the coordinates of its bounding box, in the MPD file as a supplementary property of the adaptation set, in the form "_$x_"min"$, width, $y_"min"$, height, $z_"min"$, depth_" (as shown in @d3:mpd).
The client uses this information to implement view-dependent streaming (see @d3:dash-client[Section]).
==== Texture management
As with geometry data, we handle textures using adaptation sets, but separately from geometry.
Each texture file is contained in a different adaptation set, with multiple representations providing different image resolutions (see @d3:representation[Section]).
We add an attribute to each adaptation set containing a texture, describing the average color of that texture.
The client can use this attribute to render a face for which the corresponding texture has not been loaded yet, so that most objects appear, at least, with a uniform natural color (see @d3:textures).
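A minimal sketch of this computation, assuming the `image` crate is used for decoding (an assumption; any PNG decoder works):
```rust
/// Sketch: average RGB color of a texture, stored as an MPD attribute.
fn average_color(path: &str) -> [u8; 3] {
    let img = image::open(path).expect("cannot open texture").to_rgb8();
    let pixel_count = (img.width() as u64) * (img.height() as u64);
    let mut sum = [0u64; 3];
    for pixel in img.pixels() {
        for channel in 0..3 {
            sum[channel] += pixel.0[channel] as u64;
        }
    }
    [0, 1, 2].map(|c| (sum[c] / pixel_count) as u8)
}
```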
==== Material management
The material (MTL) file is a text file that describes all materials used in the OBJ files for the entire 3D model.
A material has a name, properties such as specular parameters, and, most importantly, a path to a texture file.
Each face in the OBJ file is mapped to one of the materials defined in the MTL file.
As the MTL file is a different type of media than geometry and texture, we define a particular adaptation set for this file, with a single representation.
=== Representations<d3:representation>
Each adaptation set can contain one or more representations of the geometry or texture data, at different levels of detail (e.g., a different number of faces).
For geometry, the resolution (i.e., the 3D area of faces) is heterogeneous, so applying a sensible multi-resolution representation is cumbersome: disregarding outliers, the 3D area of faces still ranges from $0.01$ to more than $10^4$.
For textured scenes, such heterogeneous face areas are common, since detail can be carried either by the geometry or by the texture.
Thus, adapting the streaming compromise between geometry and texture is more effective than separately managing a multi-resolution geometry.
Moreover, as our faces are partitioned into independent cells, a multi-resolution representation would raise difficult stitching issues, such as topological gaps between cells.
For an adaptation set containing texture, each representation contains a single segment where the image file is stored at the chosen resolution.
In our example, starting from the full-size image, we generate successive resolutions by dividing both height and width by 2, stopping when the image is at most $64 times 64$ pixels.
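For instance, the successive resolutions can be generated as in the sketch below (again assuming the `image` crate; file names are illustrative):
```rust
use image::GenericImageView; // brings width()/height() into scope

/// Sketch: save the texture at full size, then repeatedly halve both
/// dimensions until the image fits in 64x64.
fn generate_resolutions(path: &str) {
    let mut img = image::open(path).expect("cannot open texture");
    let mut level = 0;
    loop {
        img.save(format!("texture-{level}.png")).expect("cannot save");
        if img.width() <= 64 && img.height() <= 64 {
            break;
        }
        img = img.resize(
            img.width() / 2,
            img.height() / 2,
            image::imageops::FilterType::Triangle,
        );
        level += 1;
    }
}
```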
@d3:textures illustrates the use of the textures against the rendering using a single, average color per face.
#figure(
grid(
columns: (1fr, 0.2fr, 1fr),
figure(
image("../assets/dash-3d/average-color/full-res.png", width: 100%),
caption: [With full resolution textures]
),
[],
figure(
image("../assets/dash-3d/average-color/no-res.png", width: 100%),
caption: [With average colors]
)
),
caption: [Rendering of the model with different styles of textures]
)<d3:textures>
=== Segments
To allow random access to the content within an adaptation set storing geometry data, we group the faces into segments.
Each segment is then stored as an OBJ file which can be individually requested by the client.
For geometry, we partition the faces in an adaptation set into sets of $N_s$ faces by first sorting the faces by their 3D area in descending order, and then placing each successive group of $N_s$ faces into a segment.
Thus, the first segment contains the biggest faces and the last one the smallest.
In addition to the selected faces, a segment stores all face vertices and attributes so that each segment is independent.
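A sketch of this partitioning, with a hypothetical `Face` type carrying its precomputed 3D area, and assuming $N_s > 0$:
```rust
#[derive(Clone)]
struct Face { area: f64 /* plus vertex indices, attributes, ... */ }

/// Sketch: sort faces by decreasing area, then cut the list into chunks
/// of `ns` faces; each chunk is written out as one independent OBJ segment.
fn into_segments(mut faces: Vec<Face>, ns: usize) -> Vec<Vec<Face>> {
    faces.sort_by(|a, b| b.area.partial_cmp(&a.area).unwrap());
    faces.chunks(ns).map(|chunk| chunk.to_vec()).collect()
}
```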
For textures, each representation contains a single segment.
#figure(
align(left,
raw(
read("../assets/dash-3d/geometry-as.xml"),
block: true,
lang: "xml",
),
),
caption: [MPD description of a geometry adaptation set, and a texture adaptation set.]
)<d3:mpd>
Now that the 3D data is partitioned and the MPD file generated, the next section shows how the client uses the MPD to request the appropriate data chunks.

dash-3d/introduction.typ Normal file (+12 lines)
View File

@@ -0,0 +1,12 @@
== Introduction
In this chapter, we take a little step back from interaction and propose a system with simple interactions that nevertheless addresses most of the open problems mentioned in @i:challenges[Section].
We take inspiration from video streaming: building on the similarities between video streaming and 3D streaming (seen in @i:video-vs-3d[Section]), we benefit from the efficiency of DASH (seen in @rw:dash[Section]) for streaming 3D content.
DASH is based on content preparation and structuring, which not only helps the streaming policies but also leads to a scalable and efficient system, since it moves the load entirely from the server to the clients.
A DASH client downloads the structure of the content, and then, depending on its needs and independently of the server, decides what to download.
In this chapter, we show how to mimic DASH-based video streaming for 3D content, and we develop a system that retains the benefits of DASH.
@d3:dash-3d[Section] describes our content preparation and metadata, and all the preprocessing that is done to our model to allow efficient streaming.
@d3:dash-client[Section] gives possible implementations of clients that exploit the content structure.
@d3:evaluation[Section] evaluates the impact of the different parameters that appear in both the content preparation and the client.
Finally, @d3:conclusion[Section] sums up our work and explains how it tackles the challenges raised in the conclusion of the previous chapter.

dash-3d/main.typ Normal file (+35 lines)
View File

@@ -0,0 +1,35 @@
#import "../template.typ"
#template.beforeChapter()
= DASH-3D<d3>
#template.afterNumberedChapter()
#figure(
image("../assets/dash-3d/bigpicture.png", width: 100%),
caption: [A subdivided 3D scene with a viewport and regions delimited with red edges. In white, the regions that are outside the field of view of the camera; in green, the regions inside the field of view of the camera.]
)<d3:big-picture>
Dynamic Adaptive Streaming over HTTP (DASH) is now a widely deployed standard for video streaming, and even though video streaming and 3D streaming are different problems, many of DASH's features can inspire us for 3D streaming.
In this chapter, we present the most important contribution of this thesis: adapting DASH to 3D streaming.
First, we show how to prepare 3D data into a format that complies with DASH data organization, and we store enough metadata to enable a client to perform efficient streaming.
The data preparation consists of partitioning the scene into spatially coherent cells and segmenting each cell into chunks with a fixed number of faces, which are sorted by area so that faces with very different levels of detail are not grouped together.
We also export each texture at different resolutions.
We encode the metadata that describes the data organization into a 3D version of the Media Presentation Description (MPD) that DASH uses for video.
All this prepared content is then stored on a simple static HTTP server: clients can request the content without any need for computation on the server side, allowing a server to support an arbitrary number of clients.
We then propose DASH-3D clients that are viewpoint aware: they perform frustum culling to eliminate cells outside the viewing volume of the camera (as shown in @d3:big-picture).
We define utility metrics to give a score to each chunk of data, be it geometry or texture, based on offline information that is given in the MPD, and online information that the client is able to compute, such as view parameters, user interaction or bandwidth measurements.
We also define streaming policies that rely on those utilities in order for the client to determine which chunks need to be downloaded.
We finally evaluate these system parameters under different bandwidth setups and compare our streaming policies.
#pagebreak()
#include("introduction.typ")
#include("content-preparation.typ")
// Chapters still to be ported from LaTeX:
// \input{dash-3d/client}
// \input{dash-3d/evaluation}
// \input{dash-3d/conclusion}

View File

@@ -1,4 +1,8 @@
#import "../template.typ"
#template.beforeChapter()
= Foreword<f>
#template.afterNumberedChapter()
#include "3d-model.typ"
#include "video-vs-3d.typ"

View File

@@ -1,4 +1,4 @@
== Similarities and differences between video and 3D
== Similarities and differences between video and 3D<i:video-vs-3d>
The video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and differences between the video and 3D scenarios, together with knowledge of the video streaming literature, is the key to developing an efficient 3D streaming system.

View File

@@ -1,4 +1,4 @@
== Open problems
== Open problems<i:challenges>
#set heading(numbering: none, outlined: false)

View File

@@ -1,3 +1,6 @@
#import "../template.typ"
#template.beforeChapter()
#heading(level: 1, numbering: none)[Introduction]
#set heading(numbering: (..nums) => {

View File

@@ -16,18 +16,16 @@
#outline(indent: true, depth: 3)
]
#pagebreak()
#include "introduction/main.typ"
#pagebreak()
#include "foreword/main.typ"
#pagebreak()
#include "related-work/main.typ"
#pagebreak()
#include "preliminary-work/main.typ"
#include "dash-3d/main.typ"
#pagebreak()
#bibliography("bib.bib", style: "chicago-author-date")

View File

@@ -1,4 +1,8 @@
#import "../template.typ"
#template.beforeChapter()
= Bookmarks, navigation and streaming<bi>
#template.afterNumberedChapter()
#figure(
grid(

View File

@@ -1,4 +1,8 @@
#import "../template.typ"
#template.beforeChapter()
= Related work<rw>
#template.afterNumberedChapter()
In this chapter, we review the parts of the state of the art on multimedia streaming and interaction that are relevant to this thesis.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a substantial body of work on video streaming, we start this chapter with a review of that domain, with a particular focus on the DASH standard.

View File

@@ -11,7 +11,7 @@ While RTP is stateful (that is to say, it requires keeping track of every user a
Furthermore, an HTTP server can easily be replicated at different geographical locations, allowing users to fetch data from the closest server.
This type of network architecture is called CDN (Content Delivery Network) and increases the speed of HTTP requests, making HTTP based multimedia streaming more efficient.
=== DASH: the standard for video streaming
=== DASH: the standard for video streaming<rw:dash>
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH #cite("dash-std", "dash-std-2") is now a widely deployed
standard for adaptively streaming video on the web #cite("dash-std-full"), made to be simple, scalable and inter-operable.

View File

@@ -60,14 +60,12 @@
if nums.pos().len() >= 4 {
none
} else {
nums.pos().map(str).join(".") + " "
nums.pos().map(str).join(".")
}
})
show heading.where(level: 1): it => {
set text(size: 11pt, weight: "regular")
align(right, {
pagebreak();
v(100pt)
if it.numbering != none {
text(size: 50pt, weight: "bold")[Chapter ]
@@ -76,11 +74,7 @@
}
v(50pt)
text(it.body, size: 40pt, weight: "bold")
if it.numbering != none {
pagebreak()
} else {
v(40pt)
}
v(40pt)
})
}
@@ -89,3 +83,19 @@
doc
}
#let beforeChapter = () => {
// Start each chapter on an odd (right-hand) page; `pagebreak(to: "odd")`
// expresses directly what the previous locate-based hack approximated
// (assuming a Typst version where `pagebreak` supports the `to` argument).
pagebreak(to: "odd")
}
#let afterNumberedChapter = () => {
pagebreak()
}