== Similarities and differences between video and 3D
The video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and differences between the video and 3D scenarios, together with knowledge of the video streaming literature, is key to developing an efficient 3D streaming system.
=== Chunks of data
To make streaming possible, data need to be segmented so that a client can request a chunk of data and display it to the user while requesting another chunk.
In video streaming, data chunks typically consist of a few seconds of video.
In mesh streaming, some progressive mesh approaches encode a base mesh that contains low resolution geometry and textures and different chunks that increase the resolution of the base mesh.
Otherwise, a mesh can also be segmented by separating geometry and textures, creating chunks that contain some faces of the model, or some other chunks containing textures.
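
The two kinds of segmentation described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names, the fixed chunk length and the fixed number of faces per chunk are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Face = Tuple[int, int, int]  # indices of the three vertices of a triangle


def video_chunks(duration: float, chunk_length: float) -> List[Tuple[float, float]]:
    """Split a video of `duration` seconds into (start, end) time ranges."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_length, duration)
        chunks.append((start, end))
        start = end
    return chunks


def face_chunks(faces: Sequence[Face], faces_per_chunk: int) -> List[List[Face]]:
    """Split the faces of a mesh into groups of at most `faces_per_chunk` faces."""
    if faces_per_chunk <= 0:
        raise ValueError("faces_per_chunk must be positive")
    return [
        list(faces[i:i + faces_per_chunk])
        for i in range(0, len(faces), faces_per_chunk)
    ]
```

In the video case the chunks are ordered by time, whereas the face chunks have no intrinsic order: deciding which one to request first is left to the streaming policy.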
=== Data persistence
One of the main differences between video and 3D streaming is data persistence.
In video streaming, only one chunk of video is required at a time.
Of course, most video streaming services prefetch some future chunks and keep some previous ones in cache, but a minimal system with no latency could work while keeping only two chunks in memory: the current one and the next one.
Already a few problems appear here regarding 3D streaming:
- depending on the user's field of view, many chunks may be required to perform a single rendering;
- chunks do not become obsolete the way they do in video: a user navigating in a 3D scene may come back to the same spot after some time, or see the same objects from elsewhere in the scene.
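
The contrast can be illustrated with two small containers. This is only a sketch under simplifying assumptions: the class names are hypothetical, video chunks are identified by their index, and 3D chunks by an arbitrary string.

```python
from typing import Dict, Set


class VideoBuffer:
    """Minimal video buffer: keeps only the current chunk and the next one."""

    def __init__(self) -> None:
        self.chunks: Dict[int, bytes] = {}

    def store(self, index: int, data: bytes) -> None:
        self.chunks[index] = data

    def advance(self, current: int) -> None:
        # Every chunk other than the current and the next one is dropped.
        self.chunks = {
            i: d for i, d in self.chunks.items() if i in (current, current + 1)
        }


class SceneCache:
    """3D cache: keeps every downloaded chunk, since it may be seen again."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def store(self, chunk_id: str, data: bytes) -> None:
        self.chunks[chunk_id] = data

    def missing(self, visible: Set[str]) -> Set[str]:
        # A single rendering may need many chunks; only absent ones are requested.
        return {chunk_id for chunk_id in visible if chunk_id not in self.chunks}
```

The video buffer has a bounded size by construction, while the 3D cache grows as the user explores the scene.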
=== Multiple representations
All major video streaming platforms support multi-resolution streaming.
This means that a client can choose the quality at which it requests the content.
It can be chosen directly by the user or determined automatically by analyzing the available resources (screen size, download bandwidth, device performance).
#figure(
image("../assets/introduction/youtube-multiresolution.png", width: 80%),
caption: [The different qualities available for a YouTube video],
)
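
As an illustration of automatic quality selection, the sketch below picks the highest resolution that fits both the screen and the measured bandwidth. The bitrate table, the function name and the safety margin are hypothetical values chosen for the example; they do not describe the behaviour of any actual platform.

```python
from typing import Dict

# Hypothetical bitrates (in kbit/s) for each vertical resolution.
BITRATES_KBPS: Dict[int, int] = {
    144: 100,
    240: 300,
    360: 700,
    480: 1200,
    720: 2500,
    1080: 5000,
}


def choose_quality(bandwidth_kbps: float, screen_height: int, safety: float = 0.8) -> int:
    """Return the highest resolution that fits the screen and the bandwidth.

    Only a fraction `safety` of the measured bandwidth is used, to leave
    some margin for variations. Falls back to the lowest resolution.
    """
    candidates = [
        height
        for height, rate in BITRATES_KBPS.items()
        if rate <= safety * bandwidth_kbps and height <= screen_height
    ]
    return max(candidates) if candidates else min(BITRATES_KBPS)
```

For instance, with 4000 kbit/s of bandwidth and a 1080-pixel-high screen, this sketch selects 720p, since 1080p would exceed the 3200 kbit/s budget.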
Similarly, recent work in 3D streaming has proposed different ways to progressively stream 3D models, displaying a low quality version of the model to the user without latency, and supporting interaction with the model while details are being downloaded.
Such strategies are reviewed in Section X. // TODO
=== Media types
Just like a video, a 3D scene is composed of different media types.
In video, those media are mostly images, sounds, and subtitles, whereas in 3D, those media are geometry or textures.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.
In video streaming, most of the data (in terms of bytes) are used for images.
Thus, the most important thing a video streaming system should do is to optimize image streaming.
That is why a video on YouTube, for example, may offer 6 qualities for images (144p, 240p, 360p, 480p, 720p and 1080p) but only 2 qualities for sound.
This is one of the main differences between video and 3D streaming: in a 3D setting, the ratio between geometry and texture varies from one scene to another, and balancing those two types of content is a key problem.
=== Interaction
The way of interacting with content is another important difference between video and 3D.
In a video interface, there is only one degree of freedom: time.
The only things a user can do are to let the video play, pause it, resume it, or jump to another time in the video.
There are also controls for other options that are described
#link("https://web.archive.org/web/20191014131350/https://support.google.com/youtube/answer/7631406?hl=en")[on this help page].
// For example, to perform these few actions, Youtube provides the user with multiple options.
// \begin{itemize}
//
// \item To pause or resume a video, the user can:
// \begin{itemize}
// \item click the video;
// \item press the \texttt{K} key;
// \item press the space key.
// \end{itemize}
//
// \item To navigate to another time in the video, the user can:
// \begin{itemize}
// \item click the timeline of the video where they want;
// \item press the left arrow key to move 5 seconds backwards;
// \item press the right arrow key to move 5 seconds forwards;
// \item press the \texttt{J} key to move 10 seconds backwards;
// \item press the \texttt{L} key to move 10 seconds forwards;
// \item press one of the number key (on the first row of the keyboard, below the function keys, or on the numpad) to move the corresponding tenth of the video;
// \item press the home key to go the beginning of the video, or the end key to go to the end.
// \end{itemize}
//
// \end{itemize}
// \begin{itemize}
// \item up and down arrows change the sound volume;
// \item \texttt{M} mutes the sound;
// \item \texttt{C} activates the subtitles;
// \item \texttt{F} puts the player in fullscreen mode;
// \item \texttt{T} activates the theater mode (where the video occupies the total width of the screen, instead of occupying two thirds of the screen, the last third being advertising or recommendations);
// \item \texttt{I} activates the mini-player (allowing to search for other videos while keeping the current video playing in the bottom right corner).
// \end{itemize}
//
All the keyboard shortcuts are summed up in Figure X. // TODO
Those interactions are different if the user is using a mobile device.
// \newcommand{\relativeseekcontrol}{LightBlue}
// \newcommand{\absoluteseekcontrol}{LemonChiffon}
// \newcommand{\playpausecontrol}{Pink}
// \newcommand{\othercontrol}{PalePaleGreen}
//
// \newcommand{\keystrokescale}{0.625}
// \newcommand{\tuxlogo}{\FA\symbol{"F17C}}
// \newcommand{\keystrokemargin}{0.1}
// \newcommand{\keystroke}[5]{%
// \draw[%
// fill=white,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
// \node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
// }
// \newcommand{\keystrokebg}[6]{%
// \draw[%
// fill=#6,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
// \node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
// }
//
// \begin{figure}[ht]
// \centering
// \begin{tikzpicture}[scale=\keystrokescale, every node/.style={scale=\keystrokescale}]
// % Escape key
// \keystroke{0}{1}{-0.75}{0}{ESC};
//
// % F1 - F4
// \begin{scope}[shift={(1.5, 0)}]
// \foreach \key/\offset in {F1/1,F2/2,F3/3,F4/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % F5 - F8
// \begin{scope}[shift={(6,0)}]
// \foreach \key/\offset in {F5/1,F6/2,F7/3,F8/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % F9 - F12
// \begin{scope}[shift={(10.5,0)}]
// \foreach \key/\offset in {F9/1,F10/2,F11/3,F12/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % Number rows
// \foreach \key/\offset in {`/0,-/11,=/12,\textbackslash/13}
// \keystroke{\offset}{1+\offset}{-1.75}{-1}{\key};
//
// \foreach \key/\offset in {1/1,2/2,3/3,4/4,5/5,6/6,7/7,8/8,0/9,0/10}
// \keystrokebg{\offset}{1+\offset}{-1.75}{-1}{\key}{\absoluteseekcontrol};
//
// % Delete char
// \keystroke{14}{15.5}{-1.75}{-1}{DEL};
//
// % Tab char
// \keystroke{0}{1.5}{-2.5}{-1.75}{Tab};
//
// % First alphabetic row
// \begin{scope}[shift={(1.5,0)}]
// \foreach \key/\offset in {Q/0,W/1,E/2,R/3,Y/5,U/6,O/8,P/9,[/10,]/11}
// \keystroke{\offset}{1+\offset}{-2.5}{-1.75}{\key};
//
// \keystrokebg{4}{5}{-2.5}{-1.75}{T}{\othercontrol};
// \keystrokebg{7}{8}{-2.5}{-1.75}{I}{\othercontrol};
// \end{scope}
//
// % Caps lock
// \keystroke{0}{1.75}{-3.25}{-2.5}{Caps};
//
// % Second alphabetic row
// \begin{scope}[shift={(1.75,0)}]
// \foreach \key/\offset in {A/0,S/1,D/2,G/4,H/5,;/9,'/10}
// \keystroke{\offset}{1+\offset}{-3.25}{-2.5}{\key};
//
// \keystrokebg{3}{4}{-3.25}{-2.5}{F}{\othercontrol}
//
// \keystrokebg{6}{7}{-3.25}{-2.5}{J}{\relativeseekcontrol};
// \keystrokebg{7}{8}{-3.25}{-2.5}{K}{\playpausecontrol};
// \keystrokebg{8}{9}{-3.25}{-2.5}{L}{\relativeseekcontrol};
// \end{scope}
//
// % Enter key
// \draw[%
// fill=white,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (13.6, -1.85) -- (15.4, -1.85) -- (15.4, -3.15) -- (12.85, -3.15) -- (12.85, -2.6) -- (13.6, -2.6) -- cycle;
// \node[right] at(12.85, -2.875) {Enter $\hookleftarrow$};
//
// % Left shift key
// \keystroke{0}{2.25}{-4}{-3.25}{$\Uparrow$ Shift};
//
// % Third alphabetic row
// \begin{scope}[shift={(2.25,0)}]
// \foreach \key/\offset in {Z/0,X/1,V/3,B/4,N/5, /7,./8,\slash/9}
// \keystroke{\offset}{1+\offset}{-4}{-3.25}{\key};
// \keystrokebg{2}{3}{-4}{-3.25}{C}{\othercontrol};
// \keystrokebg{6}{7}{-4}{-3.25}{M}{\othercontrol};
// \end{scope}
//
// % Right shift key
// \keystroke{12.25}{15.5}{-4}{-3.25}{$\Uparrow$ Shift};
//
// % Last keyboard row
// \keystroke{0}{1.25}{-4.75}{-4}{Ctrl};
// \keystroke{1.25}{2.5}{-4.75}{-4}{\tuxlogo};
// \keystroke{2.5}{3.75}{-4.75}{-4}{Alt};
// \keystrokebg{3.75}{9.75}{-4.75}{-4}{}{\playpausecontrol};
// \keystroke{9.75}{11}{-4.75}{-4}{Alt};
// \keystroke{11}{12.25}{-4.75}{-4}{\tuxlogo};
// \keystroke{12.25}{13.5}{-4.75}{-4}{}
// \keystroke{13.5}{15.5}{-4.75}{-4}{Ctrl};
//
// % Arrow keys
// \keystrokebg{16}{17}{-4.75}{-4}{$\leftarrow$}{\relativeseekcontrol};
// \keystrokebg{17}{18}{-4.75}{-4}{$\downarrow$}{\othercontrol};
// \keystrokebg{18}{19}{-4.75}{-4}{$\rightarrow$}{\relativeseekcontrol};
// \keystrokebg{17}{18}{-4}{-3.25}{$\uparrow$}{\othercontrol};
//
// % Control keys
// \keystroke{16}{17}{-1.75}{-1}{\tiny Inser};
// \keystrokebg{17}{18}{-1.75}{-1}{\tiny Home}{\absoluteseekcontrol};
// \keystroke{18}{19}{-1.75}{-1}{\tiny PgUp};
//
// \keystroke{16}{17}{-2.5}{-1.75}{\tiny Del};
// \keystrokebg{17}{18}{-2.5}{-1.75}{\tiny End}{\absoluteseekcontrol};
// \keystroke{18}{19}{-2.5}{-1.75}{\tiny PgDown};
//
// % Numpad
// \keystroke{19.5}{20.5}{-1.75}{-1}{Lock};
// \keystroke{20.5}{21.5}{-1.75}{-1}{/};
// \keystroke{21.5}{22.5}{-1.75}{-1}{*};
// \keystroke{22.5}{23.5}{-1.75}{-1}{-};
//
// \keystrokebg{19.5}{20.5}{-2.5}{-1.75}{7}{\absoluteseekcontrol};
// \keystrokebg{20.5}{21.5}{-2.5}{-1.75}{8}{\absoluteseekcontrol};
// \keystrokebg{21.5}{22.5}{-2.5}{-1.75}{9}{\absoluteseekcontrol};
//
// \keystrokebg{19.5}{20.5}{-3.25}{-2.5}{4}{\absoluteseekcontrol};
// \keystrokebg{20.5}{21.5}{-3.25}{-2.5}{5}{\absoluteseekcontrol};
// \keystrokebg{21.5}{22.5}{-3.25}{-2.5}{6}{\absoluteseekcontrol};
//
// \keystrokebg{19.5}{20.5}{-4}{-3.25}{1}{\absoluteseekcontrol};
// \keystrokebg{20.5}{21.5}{-4}{-3.25}{2}{\absoluteseekcontrol};
// \keystrokebg{21.5}{22.5}{-4}{-3.25}{3}{\absoluteseekcontrol};
//
// \keystrokebg{19.5}{21.5}{-4.75}{-4}{0}{\absoluteseekcontrol};
// \keystroke{21.5}{22.5}{-4.75}{-4}{.};
//
// \keystroke{22.5}{23.5}{-3.25}{-1.75}{+};
// \keystroke{22.5}{23.5}{-4.75}{-3.25}{$\hookleftarrow$};
// \end{tikzpicture}
//
// \vspace{0.5cm}
//
// % Legend
// \begin{tikzpicture}[scale=\keystrokescale]
//
// \keystrokebg{0}{1}{0}{1}{}{\absoluteseekcontrol};
// \node[right=0.3cm] at (0.5, 0.5) {\small Absolute seek keys};
//
// \keystrokebg{6}{7}{0}{1}{}{\relativeseekcontrol};
// \node[right=0.3cm] at (6.5, 0.5) {\small Relative seek keys};
//
// \keystrokebg{12}{13}{0}{1}{}{\playpausecontrol};
// \node[right=0.3cm] at (12.5, 0.5) {\small Play or pause keys};
//
// \keystrokebg{18}{19}{0}{1}{}{\othercontrol};
// \node[right=0.3cm] at (18.5, 0.5) {\small Other shortcuts};
//
// \end{tikzpicture}
//
// \caption{Youtube shortcuts (white keys are unused)\label{i:youtube-keyboard}}
// \end{figure}
When it comes to 3D, there are many approaches to managing user interaction.
Some interfaces mimic the video scenario, where the only variable is time and the camera follows a predetermined path over which the user has no control.
These interfaces are not interactive, and can be frustrating to the user who might feel constrained.
Some other interfaces add 2 degrees of freedom to the timeline: the user does not control the camera's position but can control the angle. This mimics the 360° video scenario.
This is typically the case of the video game #link("http://nolimitscoaster.com/")[_nolimits 2: roller coaster simulator_], which works with VR devices (Oculus Rift, HTC Vive, etc.) and where the only interaction available to the user is turning their head.
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the camera's position, and 2 being the angles (assuming the up vector is fixed; some interfaces allow changing it, which gives a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball
#link("https://threejs.org/examples/?q=controls#misc_controls_trackball")[(live example here)], and the orbit controls, which behave like the trackball controls but preserve the up vector #link("https://threejs.org/examples/?q=controls#misc_controls_orbit")[(live example here)].
These types of controls are notably used in the popular mesh editor #link("http://www.meshlab.net/")[MeshLab] and on
#link("https://sketchfab.com/")[SketchFab], the YouTube for 3D models.
#figure(
image("../assets/related-work/3d-interaction/meshlab.png", width: 80%),
caption: [Screenshot of MeshLab],
)
Another popular way of controlling a free camera in a virtual environment is the first person controls #link("https://threejs.org/examples/?q=controls#misc_controls_pointerlock")[(live example here)].
These controls are typically used in shooting video games: the mouse rotates the camera and the keyboard translates it.
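
A camera with 5 degrees of freedom driven by first person controls can be sketched as below. This is a simplified model, not the implementation of any library: the class and method names are hypothetical, the up vector is assumed to be the fixed `y` axis, and a camera with zero angles is assumed to look along the negative `z` axis.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    # 3 degrees of freedom: the position of the camera.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # 2 degrees of freedom: the angles, in radians.
    yaw: float = 0.0    # rotation around the vertical axis
    pitch: float = 0.0  # elevation above the horizontal plane

    def direction(self) -> Vec3:
        """Unit vector along which the camera looks."""
        return (
            math.cos(self.pitch) * math.sin(self.yaw),
            math.sin(self.pitch),
            -math.cos(self.pitch) * math.cos(self.yaw),
        )

    def rotate(self, d_yaw: float, d_pitch: float) -> None:
        """Mouse movement: update the angles, keeping the pitch off the poles."""
        limit = math.pi / 2 - 1e-3
        self.yaw += d_yaw
        self.pitch = max(-limit, min(limit, self.pitch + d_pitch))

    def move_forward(self, distance: float) -> None:
        """Keyboard movement: translate along the viewing direction."""
        dx, dy, dz = self.direction()
        self.x += distance * dx
        self.y += distance * dy
        self.z += distance * dz
```

Clamping the pitch keeps the viewing direction from aligning with the up vector, which is what a fixed up vector requires.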
=== Relationship between interface, interaction and streaming
In both video and 3D systems, streaming affects interaction.
For example, in a video streaming scenario, if a user sees that the video is fully loaded, they might start moving around on the timeline, but if they see that the streaming is just enough to not stall, they might prefer not interacting and just watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or get frustrated and leave the video.
The same types of behaviour occur in 3D streaming: if a user is somewhere in a scene, and sees more data appearing, they might wait until enough data have arrived, but if they see nothing happens, they might leave to look for data somewhere else.
Those examples show how streaming can affect interaction, but interaction also affects streaming.
In a video streaming scenario, if a user is watching peacefully without interacting, the system just has to request the next chunks of video and display them.
However, if a user seeks to a different time in the video, the streaming will most likely stall until the system is able to gather the data it needs to resume the video.
Just like in the video setup, the way a user navigates in a networked virtual environment affects the streaming.
Moving slowly allows the system to collect and display data to the user, whereas moving frenetically puts more pressure on the streaming: the data that the system requested may be obsolete when the response arrives.
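
The effect of the user's speed can be made concrete with a rough sketch. It assumes, for the sake of the example, that a chunk is useful only if its center lies within a given view distance of the camera, and that the user moves at constant velocity during the round trip; the function name and parameters are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def useful_on_arrival(
    chunk_center: Vec3,
    position: Vec3,
    velocity: Vec3,
    round_trip_time: float,
    view_distance: float,
) -> bool:
    """Tell whether a chunk requested now is still in range when it arrives."""
    # Position of the user when the response arrives, at constant velocity.
    arrival = tuple(p + v * round_trip_time for p, v in zip(position, velocity))
    return math.dist(chunk_center, arrival) <= view_distance
```

With a chunk 8 units away, a view distance of 10 units and a round trip time of 0.5 s, a user moving at 1 unit per second still needs the chunk when it arrives, whereas a user moving sideways at 50 units per second has already left it behind.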
Moreover, the interface and the way elements are displayed to the user also impact their behaviour.
A streaming system can use this effect to enhance the quality of experience by providing feedback on the streaming to the user via the interface.
For example, on YouTube, the buffered portion of the video is displayed in light grey on the timeline, whereas the portion that remains to be downloaded is displayed in dark grey.
A user is more likely to click on the light grey part of the timeline than on the dark grey part, preventing the streaming from stalling.
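
This kind of feedback can be sketched with a textual timeline. The representation of the buffer as a list of time ranges and the function names are assumptions made for the example, not a description of an actual player.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (start, end) of a buffered portion, in seconds


def is_buffered(time: float, buffered: List[Range]) -> bool:
    """Tell whether seeking to `time` can resume without stalling."""
    return any(start <= time < end for start, end in buffered)


def timeline(duration: float, buffered: List[Range], cells: int) -> str:
    """Draw the timeline with '=' for buffered cells and '-' for the others.

    Each cell is classified according to the time at its middle.
    """
    cell = duration / cells
    return "".join(
        "=" if is_buffered((i + 0.5) * cell, buffered) else "-"
        for i in range(cells)
    )
```

For a 100-second video whose first 40 seconds are buffered, `timeline(100, [(0, 40)], 10)` gives `====------`: seeking inside the first four cells does not stall, which is the information the two shades of grey convey to the user.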
// \begin{figure}[th]
// \centering
// \begin{tikzpicture}
// \node (S) at (0, 0) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Streaming};
// \node (I) at (-2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Interface};
// \node (U) at (2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {User};
// \draw[double ended double arrow=5pt colored by black and white] (S) -- (I);
// \draw[double ended double arrow=5pt colored by black and white] (S) -- (U);
// \draw[double arrow=5pt colored by black and white] (I) -- (U);
// \end{tikzpicture}
// \end{figure}