
\section{Impact of 3D bookmarks on streaming\label{bi:system}}

\subsection{3D model streaming}

In this section, we describe our implementation of a 3D model streaming policy in our simulation.
A summary of the streaming policies we designed is given in Table~\ref{bi:streaming-policies}.
Note that the policy is different from the one we used for the crowdsourcing experiments.
Recall that in the crowdsourcing experiments, we load all the 3D content before the participants begin to navigate to remove bias due to different network conditions.
Here, we implemented a streaming version, which we expect an actual NVE will use.

The 3D content we used consists of textured meshes, coded in the \texttt{obj} file format.
As such, the data we used in our experiments are made of several components.
The geometry consists of (i) a list of vertices and (ii) a list of faces, and the texture consists of (i) a list of materials, (ii) a list of texture coordinates, and (iii) a set of texture images.
In the crowdsourcing experiment, we keep the model small since the goal is to study the user interaction.
To increase the size of the model, while keeping the same 3D scene, we subdivide each triangle three times, successively, thereby multiplying the total number of triangles in the scene by 64.
We do this to simulate a reasonable use case with large 3D scenes.
Table~\ref{bi:modelsize} shows that materials and textures amount to at most $3.6\%$ of the size of the geometry, which justifies this choice.
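
As a quick check of these figures (a sketch under two assumptions of ours: each subdivision step splits a triangle into four, which is what the factor of 64 implies, and $1$~MB $= 1000$~KB), three successive subdivisions give
\[
    4^3 = 64,
\]
and the largest ratio in Table~\ref{bi:modelsize} is reached for Scene~2:
\[
    \frac{302~\mathrm{KB} + 8~\mathrm{KB}}{8.54~\mathrm{MB}} = \frac{310}{8540} \approx 3.6\%.
\]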

When a client starts loading the web page containing the 3D model, the server first sends the list of materials and the texture files.
Then, the server periodically sends a fixed-size chunk that can contain any mix of vertices, texture coordinates, and faces.
A \textit{vertex} is coded with three floats and an integer ($x$, $y$, and $z$ coordinates and the index of the vertex), a \textit{texture coordinate} with two floats and an integer (the $x$ and $y$ coordinates on the image and the index of the texture coordinate), and a face with eight integers (the index of each vertex, the index of each texture coordinate, the index of the face and the number of the corresponding material).
Consequently, given the JavaScript implementation of integers and floats, we approximate each vertex and each texture coordinate as taking up 32 bytes, and each face as taking up 96 bytes.

\begin{table}[th]
    \centering
    \begin{tabular}{lccc}
        \toprule
        \textbf{Scene} & \textbf{Material} & \textbf{Images}  & \textbf{Geometry} \\
        \midrule
        Scene 1 & 8 KB & 72 KB & 8.48 MB \\
        Scene 2 & 302 KB & 8 KB & 8.54 MB \\
        Scene 3 & 16 KB & 92 KB & 5.85 MB \\
        \bottomrule
    \end{tabular}
    \caption{Respective sizes of materials, textures (images) and geometries for the three scenes used in the user study.}\label{bi:modelsize}
\end{table}

During playback, the client periodically (every 200 ms in our implementation) sends to the server its current position and camera orientation.
The server computes a sorted list of relevant faces: first, it performs frustum culling to compute the list of faces that intersect the client's viewing frustum.
Then, it performs backface culling to discard the faces whose normals point in the same direction as the client's camera orientation.
The server then sorts the filtered faces according to their distance to the camera.
Finally, the server incrementally fills in chunks with these ordered faces.
If a face depends on a vertex or a texture coordinate that has not yet been sent, the vertex or the texture coordinate is added to the chunk as well.
When the chunk is full, the server sends it.
Both the client and server algorithms are detailed in Algorithms~\ref{bi:streaming-algorithm-client} and~\ref{bi:streaming-algorithm-server}.
The chunk size is set according to the bandwidth limit of the server.
Note that the server may send faces that are occluded and not visible to the client, since determining visibility requires additional computation.
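
To make the server-side selection concrete, it can be written as follows. This is only a sketch: the symbols are our own notation, and measuring the distance of a face from its centroid, closest faces first, is an assumption, since the description above does not say which point of the face is used. Let $\mathbf{p}$ be the camera position and $\mathbf{d}$ its viewing direction. A face $f$ with normal $\mathbf{n}_f$ and centroid $\mathbf{c}_f$ that passes frustum culling is discarded by backface culling when
\[
    \mathbf{n}_f \cdot \mathbf{d} > 0,
\]
and the remaining faces are sent in increasing order of $\| \mathbf{c}_f - \mathbf{p} \|$.

With the sizes approximated above, a face whose vertices and texture coordinates have all been sent already costs $96$ bytes, while a face that brings three new vertices and three new texture coordinates costs
\[
    96 + 3 \times 32 + 3 \times 32 = 288~\mathrm{bytes},
\]
so a chunk of $S$ bytes carries between $\lfloor S / 288 \rfloor$ and $\lfloor S / 96 \rfloor$ faces.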

\begin{algorithm}[th]
    \While{streaming is not finished}{%
        Receive chunk from the server\;
        Add the faces from the chunk to the model\;
        Update the camera (by 200ms)\;
        Compute the rendering and evaluate the quality\;
        Send the position of the camera to the server\;
    }
    \caption{Client side algorithm\label{bi:streaming-algorithm-client}}
\end{algorithm}

\begin{algorithm}[th]
    \While{streaming is not finished}{%
        Receive position of the camera from the client\;
        Compute the list of triangles to send and sort them\;
        Send a chunk of a certain amount of triangles\;
    }
    \caption{Server side algorithm\label{bi:streaming-algorithm-server}}
\end{algorithm}

In the following, we shall denote this streaming policy \textsf{culling}; in Figures~\ref{bi:click-1250} and~\ref{bi:click-625} streaming using \textsf{culling} only is denoted \textsf{C-only}.

\subsection{3D bookmarks}

We have seen (Figure~\ref{bi:triangles-curve}) that navigation with bookmarks is more demanding on the bandwidth.
We want to exploit bookmarks to improve the user's quality of experience. For this purpose, we propose two streaming policies based on offline computation of the relevance of 3D content to bookmarked viewpoints.

\subsubsection{Visibility determination for 3D bookmarks}

A bookmarked viewpoint is more likely to be accessed than other, arbitrary viewpoints in the 3D scene.
We exploit this fact to perform some precomputation on the 3D content visible from the bookmarked viewpoint.

Recall that \textsf{culling} does not consider occlusion of the faces.
Furthermore, it prioritizes the faces according to distance from the camera, and does not consider the actual contribution of the faces to the rendered 2D images.
Ideally, we should prioritize the faces that occupy a bigger area in the 2D rendered images.
Computing this, however, requires rendering the scene at the server, and measuring the area of each face.
It is not scalable to compute this for every viewpoint requested by the client.

However, we can prerender the bookmarked viewpoints, since the number of bookmarks is limited, their viewpoints are known in advance, and they are likely to be accessed.
For each bookmark, we render the scene offline, using a single color per triangle.
Once rendered, we scan the output image to find the visible triangles (based on the color) and sort them by decreasing projected area.
This technique is also used by~\citet{view-dependent-progressive-mesh}.
Thus, when the user clicks on a 3D bookmark, this precomputed list of faces is used by the server, and only visible faces are sent in decreasing order of contributions to the rendered image.
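
The offline step can be sketched as pseudocode. This is only an illustration under assumptions of ours: the text above does not specify how a face is mapped to a color (we assume that the face index is encoded as a unique color, with a reserved background color) or how ties between faces of equal area are broken.

\begin{algorithm}[th]
    \ForEach{bookmark $b$}{%
        Render the scene from the viewpoint of $b$, without lighting or textures, drawing each face $f$ with a unique color that encodes its index\;
        Set a pixel counter $a_f$ to $0$ for every face $f$\;
        \ForEach{pixel of the rendered image that is not background}{%
            Decode the face index $f$ from the pixel color and increment $a_f$\;
        }
        Keep the faces with $a_f > 0$ and sort them by decreasing $a_f$\;
        Store the sorted list of faces for $b$\;
    }
    \caption{Possible sketch of the offline visibility precomputation for bookmarks (illustrative, not the exact implementation)}
\end{algorithm}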

For the three scenes that we used in the experiment, we can reduce the number of triangles sent by 60\% (over all bookmarks).
This reduction is as high as 85.7\% for one particular bookmark (from 26,886 culled triangles to 3,853 culled and visible triangles).
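
For this bookmark, the reduction follows directly from the two triangle counts:
\[
    1 - \frac{3853}{26886} \approx 1 - 0.143 = 0.857.
\]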

To illustrate the impact of sorting by projected area of faces, Figure~\ref{bi:sorted-tri} shows the quality improvement gained by sending the precomputed visible triangles prioritized by projected areas, compared to using culling only prioritized by distance.
The curve shows the average quality over all bookmarks over all scenes, for a given number of triangles received.
The quality is measured by the ratio of correctly rendered pixels, comparing the fully and correctly rendered image (when all 3D content is available) and the rendered image (when content is partially available).
We sample one pixel every 100 rows and every 100 columns to compute this value.
The figure shows that, to obtain 90\% of correctly displayed samples, we require 1904 triangles instead of 5752 triangles, that is, about one third as many (a saving of about two thirds).
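
In symbols (our own notation, introduced only for illustration), let $P$ be the set of sampled pixels, $I^{\star}$ the image rendered with all the 3D content, and $I_n$ the image rendered with the first $n$ triangles received. Assuming that a sample counts as correct when its color is identical in both images, the quality reads
\[
    Q(n) = \frac{1}{|P|} \sum_{x \in P} \mathbf{1}\left[ I_n(x) = I^{\star}(x) \right],
\]
where $\mathbf{1}[\cdot]$ equals $1$ when the condition holds and $0$ otherwise. For the $90\%$ threshold, the ratio between the two triangle counts is
\[
    \frac{1904}{5752} \approx 0.33.
\]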

In what follows, we will refer to this streaming policy as \textsf{visible}.

\begin{figure}[th]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
            xlabel=Number of Triangles Received,
            ylabel=Quality of rendering,
            no markers,
            width=\tikzwidth,
            height=\tikzheight,
            cycle list name=mystyle,
            legend pos=south east,
            xmin=0,
            xmax=21000,
            ymin=0,
            ymax=1.1
        ]

            \addplot table [y=y1, x=x]{assets/preliminary-work/cdf.dat};
            \addlegendentry{Culling}
            \addplot table [y=y2, x=x]{assets/preliminary-work/cdf.dat};
            \addlegendentry{Precomputation}

        \end{axis}
    \end{tikzpicture}
    \caption{Comparison of rendered image quality (average on all bookmarks and starting position): the triangles are sorted offline (green curve), or sorted online by distance to the viewpoint (blue curve).}\label{bi:sorted-tri}
\end{figure}

\subsubsection{Prefetching by predicting the next bookmark clicked}

We can now use the precomputed, visibility-based streaming of 3D content for the bookmarks to reduce the amount of traffic needed.
Next, we propose to prefetch the 3D content from the bookmarks.
Any efficient prefetching policy needs to accurately predict users' actions.

As shown, users tend to visit the bookmarked viewpoints more often than others, except the initial viewpoint.
It is thus natural to try to prefetch the 3D content of the bookmarks.

\begin{figure}[th]
    \centering
    \DTLloaddb[noheader=false]{mat1}{assets/preliminary-work/click-probability.dat}
    \begin{tikzpicture}[scale=0.75]
        \DTLforeach*{mat1}{\x=x, \y=y, \r=r, \g=g}{%
            \draw[fill=Grey] (\x,\y) circle (\r);
        }
        \foreach \x in {0,...,11}
        \draw (\x, -10pt) node[anchor=north] {\x};
        \foreach \y in {0,...,11}
        \draw (-10pt, \y) node[anchor=east] {\y};
        \draw (5.5, -40pt) node {Previous recommendation clicked};
        \draw (-40pt,5.5) node[rotate=90] {Next recommendation clicked};
        \draw[step=1.0,black,thin,dashed] (0,0) grid (11,11);
    \end{tikzpicture}
    \caption{Probability distribution of `next clicked bookmark' for Scene 1 (computed from the 33 users with bookmarks). Numbering corresponds to 0 for initial viewport and 11 bookmarks; the size of the disk at $(i,j)$ is proportional to the probability of clicking bookmark $j$ after $i$.\label{bi:mat1}}
\end{figure}

Figure~\ref{bi:mat1} shows the probability of visiting a bookmark (vertical axis) given that another bookmark has been visited (horizontal axis).
This figure shows that users tend to follow similar paths when consuming bookmarks.
Thus, we hypothesize that prefetching along those paths would lead to better image quality and lower discovery latency.
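
One natural way to obtain such probabilities from the traces is to count transitions; we state it as an assumption, since the exact estimator is not detailed here. If $n_{i,j}$ denotes the number of times a user clicked bookmark $j$ right after $i$ (with $0$ standing for the initial viewport), then, for every $i$ with at least one recorded transition,
\[
    \hat{p}(j \mid i) = \frac{n_{i,j}}{\sum_{k} n_{i,k}},
\]
so that the disks of each column $i$ of Figure~\ref{bi:mat1} represent probabilities that sum to one.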

The policy used is the following.
We divide each chunk sent by the server into two parts.
The first part is used to fetch the content from the current viewpoint, using the \textsf{culling} streaming policy.
The second part is used to prefetch content from the bookmarks, according to their likelihood of being clicked next.
We use the probabilities displayed in Figure~\ref{bi:mat1} to determine the size of each part.
Each bookmark $B$ has a probability $p(B \mid B_{\mathrm{prev}})$ of being clicked next, considering that $B_{\mathrm{prev}}$ was the last clicked bookmark.
We assign to each bookmark a certain portion of the chunk to prefetch the corresponding data proportionally to the probability of it being clicked.
We use the \textsf{visible} policy to determine which data should be sent for a bookmark.
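
As an illustration, the split can be written as follows. The notation is ours: $S$ is the chunk size and $\alpha \in [0, 1]$ is the fraction of the chunk reserved for the current viewpoint; the formula does not prescribe how $\alpha$ is chosen. A bookmark $B$ then receives
\[
    s(B) = (1 - \alpha) \, S \, p(B \mid B_{\mathrm{prev}}),
\]
and, since the probabilities sum to one over all bookmarks, $\alpha S + \sum_{B} s(B) = S$. With the proportions drawn in Figure~\ref{bi:prefetched-chunk}, we would have $\alpha = 0.5$ and probabilities $0.3$, $0.1$, and $0.6$ for $B_i$, $B_j$, and $B_k$, giving respectively $50\%$, $15\%$, $5\%$, and $30\%$ of the chunk.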

We denote this combination as \textsf{V-PP}, for Prefetching based on Prediction using the \textsf{visible} policy.

\begin{figure}[th]
    \centering
    \begin{tikzpicture}
        \draw [fill=LightCoral] (0,0) rectangle (5,1);
        \node at (2.5,0.5) {Frustum / backface culling};
        \draw [fill=Khaki] (5,0) rectangle (6.5,1);
        \node at (5.75,0.5) {$B_i$};
        \draw [fill=SandyBrown] (6.5,0) rectangle (7,1);
        \node at (6.75,0.5) {$B_j$};
        \draw [fill=LightGreen] (7,0) rectangle (10,1);
        \node at (8.5,0.5) {$B_k$};
    \end{tikzpicture}
    \caption{Example of how a chunk can be divided into fetching what is needed to display the current viewport (culling), and prefetching three recommendations according to their probability of being visited next.\label{bi:prefetched-chunk}}
\end{figure}

\subsubsection{Fetching destination bookmark}

An alternative way to benefit from the precomputed visible triangles at the bookmark is to fetch 3D content during the ``fly-to'' transition to the destination.
Indeed, as specified in Section~\ref{bi:3dnavigation}, moving to a bookmarked viewpoint is not instantaneous, but rather takes a small amount of time to smoothly move the user camera from its initial position towards the bookmark.
This transition usually takes from 1 to 2 seconds, depending on how far the current user camera position is from the bookmark.

When the user clicks on the bookmark, the client fetches the faces visible from the destination viewpoint, using all the available bandwidth.
So, during the transition time, the server no longer applies \textsf{culling}; instead, the whole chunk is used for fetching according to the \textsf{visible} policy.
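
To give an order of magnitude, consider the following back-of-the-envelope estimate. It assumes that $1$~Mbps $= 10^6$ bits per second, ignores any protocol overhead, and reuses the approximate sizes given earlier: $96$ bytes per face, plus $32$ bytes for each of its (at most three) vertices and (at most three) texture coordinates that have not been sent yet, that is, at most $288$ bytes. A transition of duration $T$ at bandwidth $W$ lets the server send $W T / 8$ bytes. For $W = 1$~Mbps and $T = 1$~s, this gives
\[
    \frac{10^6 \times 1}{8} = 125\,000~\mathrm{bytes},
\]
which corresponds to between $\lfloor 125\,000 / 288 \rfloor = 434$ and $\lfloor 125\,000 / 96 \rfloor = 1302$ faces. A transition of $2$ seconds doubles the number of bytes available.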

The immediate drawback of this policy is that, on the way to the bookmark, the user's perception of the scene is degraded because data is lacking for the viewpoints along the transition.
On the bright side, no bandwidth is spent prefetching bookmarks that will never be visited, because we fetch only once we are sure that the user has clicked on a bookmark.
This way, when the user is not clicking on bookmarks, we can use the entire bandwidth for the current viewpoint and get as many triangles as possible to improve it.
We call this method \textsf{V-FD}, since we are Fetching the 3D data from the Destination using the \textsf{visible} policy.

\begin{table}
    \centering
    \begin{tabular}{ccccc}
        \toprule
        & \textsf{Visible} & \textsf{V-FD} & \textsf{V-PP} & \textsf{V-PP+FD} \\ \midrule
        \textbf{Frustum culling}    &\cmark&\cmark&\cmark&\cmark\\
        \textbf{Fetch destination}  &\xmark&\cmark&\xmark&\cmark\\
        \textbf{Prefetch predicted} &\xmark&\xmark&\cmark&\cmark\\\bottomrule
    \end{tabular}
    \caption{Summary of the streaming policies\label{bi:streaming-policies}}
\end{table}

\subsection{Comparing streaming policies}

In order to determine which policy to use, we replay the traces from the user study while simulating different streaming policies.
The first point we are interested in is which streaming policy leads to the lowest discovery latency and the best image quality for the user: \textsf{culling} (no prefetching), \textsf{V-PP} (prefetching based on the probability of accessing bookmarks), \textsf{V-FD} (no prefetching, but fetching the destination during the fly-to transition), or the combination of \textsf{V-PP} and \textsf{V-FD} (\textsf{V-PP+FD}).

\begin{figure}[th]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                xlabel=Time (in s),
                ylabel=Quality of rendering,
                no markers,
                cycle list name=mystyle,
                width=\tikzwidth,
                height=\tikzheight,
                legend pos=south east,
                ymin=0.6,
                ymax=1.01,
                xmin=0,
                xmax=8.1
            ]
            \addplot table [y=y1, x=x]{assets/preliminary-work/evaluation/click-curves-local-1250.dat};
            \addlegendentry{C only}
            \addplot table [y=y3, x=x]{assets/preliminary-work/evaluation/click-curves-local-1250.dat};
            \addlegendentry{V-PP}
            \addplot table [y=y2, x=x]{assets/preliminary-work/evaluation/click-curves-local-1250.dat};
            \addlegendentry{V-FD}
            \addplot table [y=y4, x=x]{assets/preliminary-work/evaluation/click-curves-local-1250.dat};
            \addlegendentry{V-PP+FD}
        \end{axis}
    \end{tikzpicture}
    \caption{Average percentage of the image pixels that are correctly rendered against time, for all users with bookmarks, and using a bandwidth (BW) of 1 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.\label{bi:click-1250}}
\end{figure}

Figure~\ref{bi:click-1250} compares the quality of the view of a user after their first click on a bookmark.
The ratio of pixels correctly displayed is computed in the client algorithm (see Algorithm~\ref{bi:streaming-algorithm-client}).
In this figure we use a bandwidth of 1 Mbps.
The blue curve corresponds to the \textsf{culling} policy.
Clicking on a bookmark generates a user path with less spatial locality, causing a large drop in visual quality that is only compensated after 4 seconds.
During the first second, the camera moves from the current viewport to the bookmarked viewport.

When the data has been prefetched according to the probability of the bookmark being clicked, the drop in quality is less visible (\textsf{V-PP} curve).
In contrast, when we benefit from the precomputation of visible triangles and from the ordering of the important triangles in a bookmark (\textsf{V-FD}), the drop in quality is still there, but it is very short (approximately four times shorter than for \textsf{culling}).
This drop in quality happens during the transition along the path.
More quantitatively, with a $1$ Mbps bandwidth, 3 seconds are necessary after the click to recover $90\%$ of correct pixels.

\begin{figure}[th]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                xlabel=Time (in s),
                ylabel=Quality of rendering,
                no markers,
                cycle list name=mystyle,
                width=\tikzwidth,
                height=\tikzheight,
                legend pos=south east,
                ymin=0.6,
                ymax=1.01,
                xmin=0,
                xmax=8.1
            ]
            \addplot table [y=y1, x=x]{assets/preliminary-work/evaluation/click-curves-local-625.dat};
            \addlegendentry{C only}
            \addplot table [y=y3, x=x]{assets/preliminary-work/evaluation/click-curves-local-625.dat};
            \addlegendentry{V-PP}
            \addplot table [y=y2, x=x]{assets/preliminary-work/evaluation/click-curves-local-625.dat};
            \addlegendentry{V-FD}
            \addplot table [y=y4, x=x]{assets/preliminary-work/evaluation/click-curves-local-625.dat};
            \addlegendentry{V-PP+FD}
        \end{axis}
    \end{tikzpicture}
    \caption{Average percentage of the image pixels that are correctly rendered against time, for all users with bookmarks, and using a bandwidth (BW) of 0.5 Mbps. The origin, $t=0$, is the time of the first click on a bookmark. Each curve corresponds to a streaming policy.\label{bi:click-625}}
\end{figure}

\begin{figure}[th]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                xlabel=Time (in s),
                ylabel=Quality of rendering,
                no markers,
                cycle list name=mystyle,
                width=\tikzwidth,
                height=\tikzheight,
                legend pos=south east,
                ymin=0.6,
                ymax=1.01,
                xmin=0,
                xmax=4.5
            ]
            \addplot table [y=y1, x=x]{assets/preliminary-work/evaluation/click-curves-local-2500.dat};
            \addlegendentry{V-FD}
            \addplot table [y=y2, x=x]{assets/preliminary-work/evaluation/click-curves-local-2500.dat};
            \addlegendentry{V-PP+FD}
        \end{axis}
    \end{tikzpicture}
    \caption{Same curves as in Figures~\ref{bi:click-1250} and~\ref{bi:click-625}, comparing the streaming policies \textsf{V-FD} alone and \textsf{V-PP+FD}, with a bandwidth (BW) of 2 Mbps.\label{bi:2MB}}
\end{figure}


Figure~\ref{bi:click-625} shows the results of the same experiment with a bandwidth of 0.5 Mbps.
Here, it takes 4 to 5 seconds to recover $85\%$ of the pixels with \textsf{culling} and \textsf{V-PP}, against 1.5 seconds to recover $90\%$ with \textsf{V-FD}.
Combining both strategies (\textsf{V-PP+FD}) leads to the best quality.

At 1 Mbps bandwidth, \textsf{V-PP} penalizes the quality, as the \textsf{V-PP+FD} curve shows a lower image quality than \textsf{V-FD} alone.
This effect is even stronger when the bandwidth is set to 2 Mbps (Figure~\ref{bi:2MB}).
Both streaming strategies based on the precomputation of the ordering improve the image quality.
We see here that \textsf{V-FD} has a greater impact than \textsf{V-PP}.
Indeed, \textsf{V-PP} may prefetch content that is eventually not used, whereas \textsf{V-FD} only sends relevant 3D content (knowing which bookmark has just been clicked).

We present only the results after the first click.
For subsequent clicks, we found that other factors came into play and thus, it is hard to analyze the impact of the various streaming policies.
For instance, a user may revisit a previously visited bookmark, or the bookmarks may overlap.
If the users click on a subsequent bookmark after a long period, then more content would have been fetched for this user, making comparisons difficult.

To summarize, bookmarked viewpoints are frequently visited, and exploiting this fact to precompute their visible faces and sort them by projected area can significantly improve image quality after a user interaction (clicking on a bookmark).
This alone can lead to 60\% fewer triangles being sent, with 1/3 of the triangles sufficient to ensure 90\% of pixels correctly rendered, compared to frustum/backface culling alone.
If we fetch these precomputed faces of the destination viewpoint immediately after the click, during the ``fly-to'' transition, then we can already significantly improve the quality without any prefetching.
Prefetching helps when the bandwidth is low and fewer triangles can be downloaded during this transition.
The network conditions play a minimal role in this key message: bookmarking allows precomputation of an ordered list of visible faces, and this holds regardless of the underlying network condition (except for uninteresting extreme cases, such as negligible or abundant bandwidth).