\section{Content preparation\label{d3:dash-3d}}
In this section, we describe how we pre-process the 3D data of the NVE (a polygon soup, textures, and material information) and organize it into a DASH-compliant Media Presentation Description (MPD) file.
In our work, we use the \texttt{obj} file format for the polygons, \texttt{png} for textures, and \texttt{mtl} format for material information.
The process, however, applies to other formats as well.
\subsection{The MPD File}
In DASH, the information about content storage and characteristics, such as location, resolution, or size, is extracted from an MPD file by the client.
The client relies only on this information to decide which chunk to request and at which quality level.
The MPD file is an XML file that is organized into different sections hierarchically.
The \texttt{period} element is a top-level element which, in the case of video, indicates the start time and duration of a video chapter.
This notion does not apply to an NVE, so we use a single \texttt{period} for the whole scene, as the scene is static.
Each \texttt{period} element contains one or more adaptation sets, which describe the alternate versions, formats, and types of media.
We utilize adaptation sets to organize a 3D scene's material, geometry, and texture.
The software that preprocesses the model mostly consists of file manipulation and is also written in Rust.
It first preprocesses the geometry and then the textures.
The MPD is generated with a library named \href{https://github.com/netvl/xml-rs}{xml-rs}, which works like a stack:
\begin{itemize}
\item a structure is created on the root of the MPD file;
\item the \texttt{start\_element} method creates a new child in the XML file;
\item the \texttt{end\_element} method ends the current child and pops the stack.
\end{itemize}
This structure is passed to our geometry and texture preprocessors, which add elements to the XML file as they generate the corresponding data chunks.
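As an illustration, a minimal sketch of how such an MPD skeleton might be emitted with xml-rs follows (attributes, namespaces, and error handling are simplified; the element names mirror the MPD hierarchy described above):
\begin{lstlisting}
use std::fs::File;
use xml::writer::{EmitterConfig, XmlEvent};

fn main() -> std::io::Result<()> {
    let file = File::create("scene.mpd")?;
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer(file);
    // Root of the MPD file.
    writer.write(XmlEvent::start_element("MPD")).unwrap();
    // A single period, since the scene is static.
    writer.write(XmlEvent::start_element("Period")).unwrap();
    // The geometry and texture preprocessors push their adaptation
    // sets here, between start_element and end_element calls.
    writer.write(XmlEvent::end_element()).unwrap(); // closes Period
    writer.write(XmlEvent::end_element()).unwrap(); // closes MPD
    Ok(())
}
\end{lstlisting}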
\subsection{Adaptation Sets}
When the user navigates freely within an NVE, the viewing frustum at any given time almost always contains only a limited part of the 3D scene.
Similar to how DASH for video streaming partitions a video clip into temporal chunks, we segment the polygons into spatial chunks, such that the DASH client can request only the relevant chunks.
\subsubsection{Geometry Management\label{d3:geometry}}
We use a space partitioning tree to organize the faces into cells.
A face belongs to a cell if its barycenter falls inside the corresponding bounding box.
Each cell corresponds to an adaptation set.
Thus, geometry information is spread on adaptation sets based on spatial coherence, allowing the client to download the relevant faces selectively.
A cell is relevant if it intersects the frustum of the client's current viewpoint. Figure~\ref{d3:big-picture} shows the relevant cells in green.
As our 3D content, a virtual environment, tends to spread mostly along the horizontal plane, we alternate the splitting direction between the two horizontal axes.
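For illustration, a simplified version of this partitioning step could look as follows, assuming that the $y$ axis is vertical; the \texttt{Face} type and the maximum number of faces per cell are hypothetical:
\begin{lstlisting}
// Hypothetical face type: only the barycenter is needed to assign a
// face to a cell.
struct Face {
    barycenter: [f64; 3],
    // vertex indices, texture coordinates, material, ...
}

enum Tree {
    Leaf(Vec<Face>),            // a cell, i.e., an adaptation set
    Node(Box<Tree>, Box<Tree>), // a split along a horizontal axis
}

// Recursively split the faces at the median barycenter coordinate,
// alternating between the x axis (0) and the z axis (2).
fn partition(faces: Vec<Face>, axis: usize, max_faces: usize) -> Tree {
    if faces.len() <= max_faces {
        return Tree::Leaf(faces);
    }
    let mut coords: Vec<f64> =
        faces.iter().map(|f| f.barycenter[axis]).collect();
    coords.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = coords[coords.len() / 2];
    let (left, right): (Vec<Face>, Vec<Face>) =
        faces.into_iter().partition(|f| f.barycenter[axis] < median);
    if left.is_empty() || right.is_empty() {
        // Degenerate split: keep all the faces in a single cell.
        let mut all = left;
        all.extend(right);
        return Tree::Leaf(all);
    }
    let next_axis = if axis == 0 { 2 } else { 0 };
    Tree::Node(
        Box::new(partition(left, next_axis, max_faces)),
        Box::new(partition(right, next_axis, max_faces)),
    )
}
\end{lstlisting}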
We create a separate adaptation set for large faces (e.g., the sky or ground) because they are essential to the 3D model and do not fit into cells.
We consider a face to be large if its 3D area is greater than $a+3\sigma$, where $a$ and $\sigma$ are respectively the average and the standard deviation of the 3D areas of the faces.
In our example, this criterion selects the 5 largest faces, which represent $15\%$ of the total face area.
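A minimal sketch of this criterion, operating on precomputed 3D face areas:
\begin{lstlisting}
// Return the indices of the faces whose 3D area exceeds a + 3*sigma,
// where a and sigma are the mean and standard deviation of all areas.
// `areas[i]` is assumed to hold the precomputed 3D area of face i.
fn large_face_indices(areas: &[f64]) -> Vec<usize> {
    let n = areas.len() as f64;
    let mean = areas.iter().sum::<f64>() / n;
    let variance =
        areas.iter().map(|a| (a - mean).powi(2)).sum::<f64>() / n;
    let threshold = mean + 3.0 * variance.sqrt();
    (0..areas.len()).filter(|&i| areas[i] > threshold).collect()
}
\end{lstlisting}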
We thus obtain a decomposition of the NVE into adaptation sets that partitions the geometry of the scene: one adaptation set contains the few large faces of the model, while the cells contain the remaining faces.
We store the spatial location of each adaptation set, characterized by the coordinates of its bounding box, in the MPD file as the supplementary property of the adaptation set in the form of ``\textit{$x_{\min}$, width, $y_{\min}$, height, $z_{\min}$, depth}'' (as shown in Listing~\ref{d3:mpd}).
The client uses this information to implement view-dependent streaming (Section~\ref{d3:dash-client}).
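For instance, the client can recover the bounding box of an adaptation set by parsing this property value; the following is a sketch, not the actual client implementation:
\begin{lstlisting}
// Parse a supplemental property value of the form
// "x_min, width, y_min, height, z_min, depth" into a bounding box,
// returned as its minimum corner and its dimensions.
fn parse_bounding_box(value: &str) -> Option<([f64; 3], [f64; 3])> {
    let v: Vec<f64> = value
        .split(',')
        .map(|s| s.trim().parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if v.len() != 6 {
        return None;
    }
    Some(([v[0], v[2], v[4]], [v[1], v[3], v[5]]))
}
\end{lstlisting}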
\subsubsection{Texture Management}
As with geometry data, we manage textures through adaptation sets, kept separate from those used for geometry.
Each texture file is contained in a different adaptation set, with multiple representations providing different image resolutions (see Section~\ref{d3:representation}).
We add an attribute to each adaptation set containing a texture that describes the average color of that texture.
The client can use this attribute to render a face for which the corresponding texture has not been loaded yet, so that most objects appear, at least, with a uniform natural color (see Figure~\ref{d3:textures}).
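A sketch of how this average color could be computed during preprocessing, here with the \texttt{image} crate (not necessarily the crate used in our implementation; the name and encoding of the MPD attribute are not shown):
\begin{lstlisting}
use image::GenericImageView;

// Compute the average color of a texture, to be stored in the MPD so
// that the client can shade faces whose texture is not yet available.
fn average_color(path: &str) -> image::ImageResult<[u8; 3]> {
    let img = image::open(path)?;
    let (w, h) = img.dimensions();
    let mut sum = [0u64; 3];
    for (_, _, pixel) in img.pixels() {
        sum[0] += pixel[0] as u64;
        sum[1] += pixel[1] as u64;
        sum[2] += pixel[2] as u64;
    }
    let count = (w as u64) * (h as u64);
    Ok([
        (sum[0] / count) as u8,
        (sum[1] / count) as u8,
        (sum[2] / count) as u8,
    ])
}
\end{lstlisting}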
\subsubsection{Material Management}
The material \texttt{.mtl} file is a text file that describes all materials used in the \texttt{.obj} files for the entire 3D model.
A material has a name, properties such as specular parameters, and, most importantly, a path to a texture file.
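For illustration, a material entry in such a file typically looks like this (the values are arbitrary):
\begin{lstlisting}
newmtl stone_wall
Ka 0.2 0.2 0.2         # ambient color
Kd 0.8 0.8 0.8         # diffuse color
Ks 0.5 0.5 0.5         # specular color
Ns 96.0                # specular exponent
map_Kd stone_wall.png  # path to the texture file
\end{lstlisting}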
Each face of the \texttt{.obj} file references a material defined in the \texttt{.mtl} file.
As the \texttt{.mtl} file is a different type of media than geometry and texture, we define a particular adaptation set for this file, with a single representation.
\subsection{Representations}\label{d3:representation}
Each adaptation set can contain one or more representations of the geometry or texture data, at different levels of detail (e.g., a different number of faces).
For geometry, the resolution (i.e., the 3D area of the faces) is heterogeneous, so applying a sensible multi-resolution representation is cumbersome: disregarding outliers, the 3D area of the faces ranges from $0.01$ to more than $10^4$.
Such heterogeneous face sizes are common in textured scenes, since visual detail can be carried either by the geometry or by the textures.
Thus, adapting the compromise between streaming geometry and streaming textures provides more flexibility than handling multi-resolution geometry separately.
Moreover, as our faces are partitioned into independent cells, multi-resolution would cause difficult stitching issues such as topological gaps between the cells.
For an adaptation set containing texture, each representation contains a single segment where the image file is stored at the chosen resolution.
In our example, starting from the full-size image, we generate successive resolutions by dividing both height and width by 2, stopping when the image size is less than or equal to $64\times 64$ pixels.
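A sketch of this step, here with the \texttt{image} crate (not necessarily the crate used in our implementation; the output file naming is illustrative):
\begin{lstlisting}
use image::{imageops::FilterType, GenericImageView};

// Generate the successive resolutions of a texture by halving both
// dimensions until the image is at most 64x64 pixels. Each saved
// file becomes one representation of the texture's adaptation set.
fn generate_resolutions(path: &str) -> image::ImageResult<()> {
    let mut img = image::open(path)?;
    let mut level = 0;
    loop {
        img.save(format!("{}.{}.png", path, level))?;
        let (w, h) = img.dimensions();
        if w <= 64 && h <= 64 {
            break;
        }
        img = img.resize_exact(w / 2, h / 2, FilterType::Triangle);
        level += 1;
    }
    Ok(())
}
\end{lstlisting}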
Figure~\ref{d3:textures} illustrates the use of the textures against the rendering using a single, average color per face.
\begin{figure}[th]
\centering
\begin{subfigure}[b]{0.45\textwidth}
\includegraphics[width=1\textwidth]{assets/dash-3d/average-color/full-res.png}
\caption{With full resolution textures}
\end{subfigure}
\begin{subfigure}[b]{0.45\textwidth}
\includegraphics[width=1\textwidth]{assets/dash-3d/average-color/no-res.png}
\caption{With average colors}
\end{subfigure}
\caption{Rendering of the model with different styles of textures\label{d3:textures}}
\end{figure}
\subsection{Segments}
To allow random access to the content within an adaptation set storing geometry data, we group the faces into segments.
Each segment is then stored as a \texttt{.obj} file which can be individually requested by the client.
For geometry, we partition the faces of an adaptation set into groups of $N_s$ faces, by first sorting the faces by their 3D area in descending order and then placing each successive group of $N_s$ faces into a segment.
Thus, the first segment contains the biggest faces and the last one the smallest.
In addition to the selected faces, a segment stores all face vertices and attributes so that each segment is independent.
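A sketch of this grouping, assuming the 3D area of each face has been precomputed (writing each group to its \texttt{.obj} file, together with the vertices and attributes it needs, is omitted):
\begin{lstlisting}
struct Face {
    area: f64, // precomputed 3D area
    // vertex indices, texture coordinates, material, ...
}

// Group the faces of an adaptation set into segments of `ns` faces,
// from the largest 3D area down to the smallest.
fn into_segments(mut faces: Vec<Face>, ns: usize) -> Vec<Vec<Face>> {
    faces.sort_by(|a, b| b.area.partial_cmp(&a.area).unwrap());
    let mut segments = Vec::new();
    while !faces.is_empty() {
        let rest = faces.split_off(ns.min(faces.len()));
        segments.push(faces);
        faces = rest;
    }
    segments
}
\end{lstlisting}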
For textures, each representation contains a single segment.
\begin{figure}[th]
\lstinputlisting[%
language=XML,
caption={MPD description of a geometry adaptation set, and a texture adaptation set.},
label=d3:mpd,
emph={%
MPD,
Period,
AdaptationSet,
Representation,
BaseURL,
SegmentBase,
Initialization,
Role,
SupplementalProperty,
SegmentList,
SegmentURL,
Viewpoint
}
]{assets/dash-3d/geometry-as.xml}
\end{figure}
Now that the 3D data is partitioned and the MPD file is generated, we describe in the next section how the client uses the MPD to request the appropriate data chunks.