A 3D streaming system is a system that progressively collects 3D data.
The previous chapter deliberately remained vague about what _3D data_ actually are.
This chapter presents in detail the 3D data we consider and how they are rendered.
We also give insights about interaction and streaming by comparing the 3D setting to the video one.

= What is a 3D model?

The 3D models we are interested in are sets of textured meshes, which can potentially be arranged in a scene graph.
Such models typically contain the following:

- *Vertices*, which are 3D points;
- *Faces*, which are polygons defined from vertices (most of the time, triangles);
- *Textures*, which are images that can be painted on faces to add visual richness;
- *Texture coordinates*, which are attributes of a face describing how its texture should be mapped onto it;
- *Normals*, which are 3D vectors that describe how light behaves on a face.

The Wavefront OBJ format describes all these elements in plain text.
A 3D model encoded in the OBJ format typically consists of two files: the material file (`.mtl`) and the object file (`.obj`).

The material file declares all the materials that the object file will reference.
A material consists of a name and photometric properties, such as ambient, diffuse and specular colors, as well as texture maps, which are images that are painted on faces.
Each face corresponds to a material.
A simple material file is visible on Snippet X. // TODO

The object file declares the 3D content of the objects.
It declares vertices, texture coordinates and normals from coordinates (e.g. `v 1.0 2.0 3.0` for a vertex, `vt 1.0 2.0` for a texture coordinate, `vn 1.0 2.0 3.0` for a normal).
These elements are numbered starting from 1.
Faces are declared by using the indices of these elements. A face is a polygon with an arbitrary number of vertices and can be declared in several ways:

- `f 1 2 3` defines a triangle face joining the first, second and third declared vertices;
- `f 1/1 2/3 3/4` defines a similar triangle but with texture coordinates: the first texture coordinate is associated with the first vertex, the third with the second vertex, and the fourth with the third vertex;
- `f 1//1 2//3 3//4` defines a similar triangle but references normals instead of texture coordinates;
- `f 1/1/1 2/3/3 3/4/4` defines a triangle with both texture coordinates and normals.
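The indexing scheme above can be sketched with a small helper. `parse_face_token` is a hypothetical function written for illustration, not part of any OBJ library: it splits one face token into its optional vertex, texture coordinate and normal indices.

```rust
/// Splits one OBJ face token such as "1", "1/2", "1//3" or "1/2/3"
/// into its (vertex, texture coordinate, normal) indices.
/// An empty field (as in "1//3") yields None.
fn parse_face_token(token: &str) -> (Option<u32>, Option<u32>, Option<u32>) {
    let mut parts = token.split('/').map(|s| s.parse().ok());
    (
        parts.next().flatten(),
        parts.next().flatten(),
        parts.next().flatten(),
    )
}
```

Remember that OBJ indices start at 1, so a renderer must subtract one before indexing its own buffers.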

An object file can include materials from a material file (`mtllib path.mtl`) and apply the materials that it declares to faces.
A material is applied by using the `usemtl` keyword, followed by the name of the material to use.
The faces declared after a `usemtl` statement are painted using the material in question.
An example of an object file is visible on Snippet X. // TODO

// \begin{figure}[th]
// \centering
// \begin{subfigure}[c]{0.4\textwidth}
// \lstinputlisting[
// language=XML,
// caption={An object file describing a cube},
// label=i:obj,
// ]{assets/introduction/cube.obj}
// \end{subfigure}\quad%
// \begin{subfigure}[c]{0.4\textwidth}
// \lstinputlisting[
// language=XML,
// caption={A material file describing a material},
// label=i:mtl,
// ]{assets/introduction/materials.mtl}
// \includegraphics[width=\textwidth]{assets/introduction/cube.png}
// \captionof{figure}{A rendering of the cube}
// \end{subfigure}
// \caption{The OBJ representation of a cube and its render\label{i:cube}}
// \end{figure}

== Rendering a 3D model

A typical 3D renderer follows Algorithm X. // TODO
// \begin{algorithm}[th]
// \SetKwData{Material}{material}
// \SetKwData{Object}{object}
// \SetKwData{Geometry}{geometry}
// \SetKwData{Materials}{all\_materials}
// \SetKwData{Object}{object}
// \SetKwData{Scene}{scene}
// \SetKwData{True}{true}
// \SetKwFunction{LoadGeometry}{load\_geometry}
// \SetKwFunction{LoadMaterial}{load\_material}
// \SetKwFunction{BindMaterial}{bind\_material}
// \SetKwFunction{Draw}{draw}
//
// \tcc{Initialization}
// \For{$\Object\in\Scene$}{%
// \LoadGeometry{\Object.\Geometry}\;
// \LoadMaterial{\Object.\Material}\;
// }
// \BlankLine%
// \BlankLine%
// \tcc{Render loop}
// \While{\True}{%
// \For{$\Object\in\Scene$}{%
// \BindMaterial{\Object.\Material}\;
// \Draw{\Object.\Geometry}\;
// }
// }
//
// \caption{A rendering algorithm\label{f:renderer}}
// \end{algorithm}
//
The first task the renderer needs to perform is sending the data to the GPU: this is done during an initialization step, in the loading loop.
This step can be slow, but it is generally acceptable since it only occurs once, at the beginning of the program.
Then, the renderer starts the rendering loop: at each frame, it renders the whole scene; for each object, it binds the corresponding material to the GPU and then draws the object.
During the rendering loop, there are two things to consider regarding performance:

- the more faces in a geometry, the slower the `draw` call;
- the more objects in the scene, the more overhead caused by CPU/GPU communication at each iteration of the loop.

The way the loop works forces objects with different materials to be rendered separately.
An efficient renderer keeps the number of objects in a scene low to avoid introducing overhead.
However, an important performance feature of 3D engines is frustum culling.
The frustum is the viewing volume of the camera.
Frustum culling consists in skipping, in the rendering loop, the objects that lie outside the viewing volume of the camera.
Algorithm X is a variation of Algorithm Y with frustum culling. // TODO

// \begin{algorithm}[th]
// \SetKwData{Texture}{texture}
// \SetKwData{Object}{object}
// \SetKwData{Geometry}{geometry}
// \SetKwData{Textures}{all\_textures}
// \SetKwData{Object}{object}
// \SetKwData{Scene}{scene}
// \SetKwData{True}{true}
// \SetKwData{CameraFrustum}{camera\_frustum}
// \SetKwFunction{LoadGeometry}{load\_geometry}
// \SetKwFunction{LoadTexture}{load\_texture}
// \SetKwFunction{BindTexture}{bind\_texture}
// \SetKwFunction{Draw}{draw}
//
// \tcc{Initialization}
// \For{$\Object\in\Scene$}{%
// \LoadGeometry{\Object.\Geometry}\;
// \LoadTexture{\Object.\Texture}\;
// }
// \BlankLine%
// \BlankLine%
// \tcc{Render loop}
// \While{\True}{%
// \For{$\Object\in\Scene$}{%
// \If{$\Object\cap\CameraFrustum\neq\emptyset$}{%
// \BindTexture{\Object.\Texture}\;
// \Draw{\Object.\Geometry}\;
// }
// }
// }
//
// \caption{A rendering algorithm with frustum culling\label{f:frustum-culling}}
// \end{algorithm}

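The culling test itself can be sketched as follows. This is a minimal sketch under simplifying assumptions: each object is bounded by a sphere, and the frustum is given as six inward-facing planes in Hessian normal form; the names `Plane` and `intersects_frustum` are ours, for illustration.

```rust
/// A plane n·x + d = 0 whose normal points towards the inside of the frustum.
struct Plane {
    normal: [f32; 3],
    d: f32,
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Conservative sphere-frustum test: the sphere is kept unless it lies
/// entirely on the outer side of one of the six planes.
fn intersects_frustum(center: [f32; 3], radius: f32, frustum: &[Plane; 6]) -> bool {
    frustum
        .iter()
        .all(|plane| dot(plane.normal, center) + plane.d >= -radius)
}
```

An object whose bounding sphere fails this test can safely be skipped in the render loop; objects that pass are drawn, possibly unnecessarily, which is why the test is conservative.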
A renderer that uses a single object avoids the overhead, but fails to benefit from frustum culling.
An optimized renderer needs to find a compromise between a partition of the scene that is too fine, which introduces overhead, and one that is too coarse, which leads to useless rendering. Such a renderer:

- ensures that objects do not spread across the whole scene, since that would make frustum culling useless, and keeps enough objects to avoid rendering the whole scene at each frame;
- but avoids having too many objects, so as not to suffer from the overhead.

= Implementation details

During this thesis, a lot of software was developed, and for this software to be successful and efficient, we chose appropriate languages.
When it comes to 3D streaming systems, we need two kinds of software.

- *Interactive applications*, which can run on as many devices as possible so we can easily conduct user studies. In this context, we chose *JavaScript*, since it runs on many devices and has great support for WebGL.
- *Native applications*, which can run fast on desktop devices, in order to prepare data, run simulations and evaluate our ideas. In this context, we chose *Rust*, a somewhat recent language that provides both the efficiency of C and C++ and the safety of functional languages.

== JavaScript

=== THREE.js

On the web browser, it is now possible to perform 3D rendering by using WebGL.
However, WebGL is very low level, and writing code with it can be painful, even to render a simple triangle.
For example, #link("https://www.tutorialspoint.com/webgl/webgl_drawing_a_triangle.htm")[this tutorial]'s code contains 121 lines of JavaScript, 46 of which are code (not comments or empty lines), to render a simple, non-textured triangle.
For this reason, it seems unreasonable to build a system like the one we are describing in raw WebGL.
There are many libraries that wrap WebGL code and help people build 3D interfaces, and
#link("https://threejs.org")[THREE.js] is a very popular one (56617 stars on GitHub, making it the 35th most starred repository on GitHub as of November 26th, 2019). // TODO footnote \url{https://web.archive.org/web/20191126151645/https://gitstar-ranking.com/mrdoob/three.js}
THREE.js acts as a 3D engine built on WebGL.
It provides classes to deal with everything we need:
- the *Renderer* class contains all the WebGL code needed to render a scene on the web page;
- the *Object* class contains all the boilerplate needed to manage the tree structure of the content; it holds a transform (translation and rotation) and can have children that are other objects;
- the *Scene* class is the root object; it contains all the objects we want to render and is passed as an argument to the render function;
- the *Geometry* and *BufferGeometry* classes hold the vertex buffers; we discuss them further in the next paragraph;
- the *Material* class holds the properties used to render a geometry (the most important one being the texture); many classes derive from Material, and the developer can choose which material they want for their objects;
- the *Mesh* class links a geometry and a material; it derives from the Object class and can thus be added to a scene and rendered.
A snippet of the basic usage of these classes is given in @three-hello-world.

#figure(
align(left,
```javascript
// Computes the aspect ratio of the window.
let aspectRatio = window.innerWidth / window.innerHeight;

// Creates a camera and sets its parameters and position.
let camera = new THREE.PerspectiveCamera(70, aspectRatio, 0.01, 10);
camera.position.z = 1;

// Creates the scene that contains our objects.
let scene = new THREE.Scene();

// Creates a geometry (vertices and faces) corresponding to a cube.
let geometry = new THREE.BoxGeometry(0.2, 0.2, 0.2);

// Creates a material that paints the faces depending on their normal.
let material = new THREE.MeshNormalMaterial();

// Creates a mesh that associates the geometry with the material.
let mesh = new THREE.Mesh(geometry, material);

// Adds the mesh to the scene.
scene.add(mesh);

// Creates the renderer and appends its canvas to the DOM.
let renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Renders the scene with the camera.
renderer.render(scene, camera);
```
),
caption: [A THREE.js _hello world_]
)<three-hello-world>

=== Geometries

Geometries are the classes that hold the vertices, texture coordinates, normals and faces.
THREE.js proposes two classes for handling geometries:
- the *Geometry* class, which is made to be developer friendly and allows easy editing, but can suffer from performance issues;
- the *BufferGeometry* class, which is harder to use for a developer, but allows better performance since the developer controls how data is transmitted to the GPU.

== Rust

In this section, we explain the specificities of Rust and why it is an adequate language for writing efficient native software safely.

=== Borrow checker

Rust is a system programming language focused on safety.
It is made to be efficient (and effectively has performance comparable to C // TODO \footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust.html}} or C++\footnote{\url{https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust-gpp.html}})
but with some extra features.
C++ users might see it as a language like C++ but that forbids undefined behaviours. // TODO \footnote{in Rust, when you need to execute code that might lead to undefined behaviours, you need to put it inside an \texttt{unsafe} block. Many operations are not available outside an \texttt{unsafe} block (e.g., dereferencing a pointer, or mutating a static variable). The idea is that you can use \texttt{unsafe} blocks when you require it, but you should avoid it as much as possible and when you do it, you must be particularly careful.}
The most powerful concept of Rust is _ownership_.
Basically, every value has a variable that we call its _owner_.
To be able to use a value, you must either be its owner or borrow it.
There are two types of borrow: the immutable borrow and the mutable borrow (roughly equivalent to references in C++).
The compiler comes with the _borrow checker_, which makes sure you only use variables that you are allowed to use.
For example, the owner can only use the value while it is not being borrowed, and a value can either be mutably borrowed once, or immutably borrowed many times.
At first, the borrow checker seems particularly efficient at detecting bugs in concurrent software, but in fact, it is also decisive in non-concurrent code.
Consider the pieces of C++ code in @undefined-behaviour-cpp and @undefined-behaviour-it-cpp.

#figure(
align(left,
```cpp
auto vec = std::vector<int> {1, 2, 3};
for (auto value: vec)
    vec.push_back(value);
```
),
caption: [Undefined behaviour with for each syntax],
)<undefined-behaviour-cpp>

#figure(
align(left,
```cpp
auto vec = std::vector<int> {1, 2, 3};
for (auto it = std::begin(vec); it < std::end(vec); it++)
    vec.push_back(*it);
```
),
caption: [Undefined behaviour with iterator syntax],
)<undefined-behaviour-it-cpp>

This loop should go on endlessly, because the vector grows as we add elements in the loop.
But the most important thing here is that since we add elements to the vector, it will eventually need to be reallocated, and that reallocation will invalidate the iterator, meaning that the next iteration will trigger undefined behaviour.
The equivalent code in Rust is in @undefined-behaviour-rs and @undefined-behaviour-it-rs.

#columns(2, gutter: 11pt)[
#v(0.8cm)
#figure(
align(left,
```rust
let mut vec = vec![1, 2, 3];
for value in &vec {
    vec.push(*value);
}
```
),
caption: [Rust version of @undefined-behaviour-cpp],
)<undefined-behaviour-rs>

#colbreak()

#figure(
align(left,
```rust
let mut vec = vec![1, 2, 3];
let mut iter = vec.iter();
loop {
    match iter.next() {
        Some(x) => vec.push(*x),
        None => break,
    }
}
```
),
caption: [Rust version of @undefined-behaviour-it-cpp],
)<undefined-behaviour-it-rs>
]

What happens is that the iterator needs to borrow the vector.
Because it is borrowed, the vector can no longer be borrowed as mutable, since mutating it could invalidate the other borrowers.
And effectively, the borrow checker makes the compilation fail with the error in Snippet X. // TODO

#figure(
align(left,
```
error[E0502]: cannot borrow `vec` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:9
  |
3 | for value in &vec {
  |              ----
  |              |
  |              immutable borrow occurs here
  |              immutable borrow later used here
4 |     vec.push(*value);
  |     ^^^^^^^^^^^^^^^^ mutable borrow occurs here
```
),
caption: [Error given by the compiler on @undefined-behaviour-rs],
)

This example is one of many that show how powerful the borrow checker is: in Rust code, there can be no dangling reference, and all the segmentation faults that come from them are caught at compile time.
The borrow checker may seem like an enemy to newcomers, because it often rejects code that seems correct, but once they get used to it, they understand what the problem with their code is, and either fix it easily, or realize that the whole architecture is wrong and understand why.

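As a minimal sketch of one idiomatic fix for the rejected loop: end the immutable borrow before mutating, here by iterating over a clone (the function name `duplicate` is ours, for illustration).

```rust
/// Appends a copy of every element of the vector to itself.
fn duplicate(vec: &mut Vec<i32>) {
    // Cloning ends the immutable borrow before the mutation starts,
    // so the borrow checker accepts this version.
    let snapshot = vec.clone();
    for value in snapshot {
        vec.push(value);
    }
}
```

Whether to clone, to collect the needed values first, or to restructure the code entirely depends on the situation; the point is that the compiler forces the decision to be made explicitly.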
It is probably for those reasons that Rust is the _most loved programming language_ according to the Stack Overflow Developer Survey // TODO in~\citeyear{so-survey-2016}, \citeyear{so-survey-2017}, \citeyear{so-survey-2018} and~\citeyear{so-survey-2019}.

=== Tooling

Moreover, Rust comes with many programs that help developers.
- #link("https://github.com/rust-lang/rust")[*`rustc`*] is the Rust compiler. It is comfortable to use due to the clarity and precision of its error messages.
- #link("https://github.com/rust-lang/cargo")[*`cargo`*] is Rust's official project and package manager. It manages compilation, dependencies, documentation, tests, etc.
- #link("https://github.com/racer-rust/racer")[*`racer`*], #link("https://github.com/rust-lang/rls")[*`rls`*] (Rust Language Server) and #link("https://github.com/rust-analyzer/rust-analyzer")[*`rust-analyzer`*] are programs that manage automatic compilation to display errors in code editors, as well as providing semantic code completion.
- #link("https://github.com/rust-lang/rustfmt")[*`rustfmt`*] formats code automatically.
- #link("https://github.com/rust-lang/rust-clippy")[*`clippy`*] is a linter that detects unidiomatic code and suggests modifications.

=== Glium

When we need to perform rendering for 3D content analysis or for evaluation, we use the #link("https://github.com/glium/glium")[`glium`] library.
Glium has many advantages over raw OpenGL calls.
Its objectives are:

- to be easy to use: it exposes functions that are higher level than raw OpenGL calls, but still low level enough to leave the developer free;
- to be safe: debugging OpenGL code can be a nightmare, and glium does its best to use the borrow checker to its advantage to avoid OpenGL bugs;
- to be fast: the produced binaries use optimized OpenGL function calls;
- to be compatible: glium seeks to support the latest versions of the OpenGL functions and falls back to older ones if the most recent are not supported on the device.

=== Conclusion

In our work, many tasks consist in 3D content analysis, reorganization, rendering and evaluation.
Many of these tasks require long computations, lasting from hours to entire days.
To perform them, we need a programming language with good performance.
In addition, the extra features that Rust provides ease development tremendously, and this is why we use Rust for all tasks that do not require a web interface.

#include "3d-model.typ"
#include "video-vs-3d.typ"
#include "implementation.typ"

= Similarities and differences between video and 3D

The video streaming setting and the 3D streaming setting share many similarities: at a higher level of abstraction, both systems allow a user to access remote content without having to wait until everything is loaded.
Analyzing the similarities and differences between the video and 3D scenarios, as well as knowing the video streaming literature, is key to developing an efficient 3D streaming system.

== Chunks of data

In order to perform streaming, data need to be segmented so that a client can request chunks of data and display them to the user while requesting other chunks.
In video streaming, a data chunk typically consists of a few seconds of video.
In mesh streaming, some progressive mesh approaches encode a base mesh that contains low-resolution geometry and textures, and different chunks that increase the resolution of the base mesh.
Otherwise, a mesh can also be segmented by separating geometry and textures, creating chunks that contain some faces of the model, and other chunks containing textures.

== Data persistence

One of the main differences between video and 3D streaming is data persistence.
In video streaming, only one chunk of video is required at a time.
Of course, most video streaming services prefetch some future chunks and keep some previous ones in cache, but a minimal system could work without latency while keeping only two chunks in memory: the current one and the next one.

A few problems already appear here regarding 3D streaming:
- depending on the user's field of view, many chunks may be required to perform a single rendering;
- chunks do not become obsolete the way they do in video: a user navigating in a 3D scene may come back to the same spot after some time, or see the same objects from elsewhere in the scene.

== Multiple representations

All major video streaming platforms support multi-resolution streaming.
This means that a client can choose the quality at which it requests the content.
It can be chosen directly by the user or automatically determined by analyzing the available resources (size of the screen, download bandwidth, device performance).

#figure(
image("../assets/introduction/youtube-multiresolution.png", width: 80%),
caption: [The different qualities available for a Youtube video],
)

Similarly, recent works in 3D streaming have proposed different ways to progressively stream 3D models, displaying a low-quality version of the model to the user without latency, and supporting interaction with the model while details are being downloaded.
Such strategies are reviewed in Section X. // TODO

== Media types

Just like a video, a 3D scene is composed of different media types.
In video, those media are mostly images, sounds and subtitles, whereas in 3D, they are geometry and textures.
In both cases, an algorithm for content streaming has to acknowledge those different media types and manage them correctly.

In video streaming, most of the data (in terms of bytes) is used for images.
Thus, the most important thing a video streaming system should do is optimize image streaming.
That is why, for a Youtube video for example, there may be six available qualities for images (144p, 240p, 320p, 480p, 720p and 1080p) but only two qualities for sound.
This is one of the main differences between video and 3D streaming: in a 3D setting, the ratio between geometry and textures varies from one scene to another, and balancing between those two types of content is a key problem.

== Interaction
|
||||
|
||||
The ways of interacting with content is another important difference between video and 3D.
|
||||
In a video interface, there is only one degree of freedom: time.
|
||||
The only things a user can do is letting the video play, pausing, resuming, or jumping to another time in the video.
|
||||
There are also controls for other options that are described
|
||||
#link("https://web.archive.org/web/20191014131350/https://support.google.com/youtube/answer/7631406?hl=en")[on this help page].
|
||||
|
||||
// For example, to perform these few actions, Youtube provides the user with multiple options.
// \begin{itemize}
//
// \item To pause or resume a video, the user can:
// \begin{itemize}
// \item click the video;
// \item press the \texttt{K} key;
// \item press the space key.
// \end{itemize}
//
// \item To navigate to another time in the video, the user can:
// \begin{itemize}
// \item click the timeline of the video where they want;
// \item press the left arrow key to move 5 seconds backwards;
// \item press the right arrow key to move 5 seconds forwards;
// \item press the \texttt{J} key to move 10 seconds backwards;
// \item press the \texttt{L} key to move 10 seconds forwards;
// \item press one of the number key (on the first row of the keyboard, below the function keys, or on the numpad) to move the corresponding tenth of the video;
// \item press the home key to go the beginning of the video, or the end key to go to the end.
// \end{itemize}
//
// \end{itemize}
// \begin{itemize}
// \item up and down arrows change the sound volume;
// \item \texttt{M} mutes the sound;
// \item \texttt{C} activates the subtitles;
// \item \texttt{F} puts the player in fullscreen mode;
// \item \texttt{T} activates the theater mode (where the video occupies the total width of the screen, instead of occupying two thirds of the screen, the last third being advertising or recommendations);
// \item \texttt{I} activates the mini-player (allowing to search for other videos while keeping the current video playing in the bottom right corner).
// \end{itemize}
//
All the keyboard shortcuts are summed up in Figure X. // TODO
Those interactions are different if the user is using a mobile device.

// \newcommand{\relativeseekcontrol}{LightBlue}
// \newcommand{\absoluteseekcontrol}{LemonChiffon}
// \newcommand{\playpausecontrol}{Pink}
// \newcommand{\othercontrol}{PalePaleGreen}
//
// \newcommand{\keystrokescale}{0.625}
// \newcommand{\tuxlogo}{\FA\symbol{"F17C}}
// \newcommand{\keystrokemargin}{0.1}
// \newcommand{\keystroke}[5]{%
// \draw[%
// fill=white,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
// \node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
// }
// \newcommand{\keystrokebg}[6]{%
// \draw[%
// fill=#6,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (#1+\keystrokemargin, #3+\keystrokemargin) rectangle (#2-\keystrokemargin, #4-\keystrokemargin);
// \node[align=center] at ({(#1+#2)/2}, {(#3+#4)/2}) {#5\strut};
// }
//
// \begin{figure}[ht]
// \centering
// \begin{tikzpicture}[scale=\keystrokescale, every node/.style={scale=\keystrokescale}]
// % Escape key
// \keystroke{0}{1}{-0.75}{0}{ESC};
//
// % F1 - F4
// \begin{scope}[shift={(1.5, 0)}]
// \foreach \key/\offset in {F1/1,F2/2,F3/3,F4/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % F5 - F8
// \begin{scope}[shift={(6,0)}]
// \foreach \key/\offset in {F5/1,F6/2,F7/3,F8/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % F9 - F12
// \begin{scope}[shift={(10.5,0)}]
// \foreach \key/\offset in {F9/1,F10/2,F11/3,F12/4}
// \keystroke{\offset}{1+\offset}{-0.75}{0}{\key};
// \end{scope}
//
// % Number rows
// \foreach \key/\offset in {`/0,-/11,=/12,\textbackslash/13}
// \keystroke{\offset}{1+\offset}{-1.75}{-1}{\key};
//
// \foreach \key/\offset in {1/1,2/2,3/3,4/4,5/5,6/6,7/7,8/8,0/9,0/10}
// \keystrokebg{\offset}{1+\offset}{-1.75}{-1}{\key}{\absoluteseekcontrol};
//
// % Delete char
// \keystroke{14}{15.5}{-1.75}{-1}{DEL};
//
// % Tab char
// \keystroke{0}{1.5}{-2.5}{-1.75}{Tab};
//
// % First alphabetic row
// \begin{scope}[shift={(1.5,0)}]
// \foreach \key/\offset in {Q/0,W/1,E/2,R/3,Y/5,U/6,O/8,P/9,[/10,]/11}
// \keystroke{\offset}{1+\offset}{-2.5}{-1.75}{\key};
//
// \keystrokebg{4}{5}{-2.5}{-1.75}{T}{\othercontrol};
// \keystrokebg{7}{8}{-2.5}{-1.75}{I}{\othercontrol};
// \end{scope}
//
// % Caps lock
// \keystroke{0}{1.75}{-3.25}{-2.5}{Caps};
//
// % Second alphabetic row
// \begin{scope}[shift={(1.75,0)}]
// \foreach \key/\offset in {A/0,S/1,D/2,G/4,H/5,;/9,'/10}
// \keystroke{\offset}{1+\offset}{-3.25}{-2.5}{\key};
//
// \keystrokebg{3}{4}{-3.25}{-2.5}{F}{\othercontrol}
//
// \keystrokebg{6}{7}{-3.25}{-2.5}{J}{\relativeseekcontrol};
// \keystrokebg{7}{8}{-3.25}{-2.5}{K}{\playpausecontrol};
// \keystrokebg{8}{9}{-3.25}{-2.5}{L}{\relativeseekcontrol};
// \end{scope}
//
// % Enter key
// \draw[%
// fill=white,
// drop shadow={shadow xshift=0.25ex,shadow yshift=-0.25ex,fill=black,opacity=0.75},
// rounded corners=2pt,
// inner sep=1pt,
// line width=0.5pt,
// font=\scriptsize\sffamily,
// minimum width=0.1cm,
// minimum height=0.1cm,
// ] (13.6, -1.85) -- (15.4, -1.85) -- (15.4, -3.15) -- (12.85, -3.15) -- (12.85, -2.6) -- (13.6, -2.6) -- cycle;
// \node[right] at(12.85, -2.875) {Enter $\hookleftarrow$};
//
// % Left shift key
// \keystroke{0}{2.25}{-4}{-3.25}{$\Uparrow$ Shift};
//
// % Third alphabetic row
// \begin{scope}[shift={(2.25,0)}]
// \foreach \key/\offset in {Z/0,X/1,V/3,B/4,N/5, /7,./8,\slash/9}
// \keystroke{\offset}{1+\offset}{-4}{-3.25}{\key};
// \keystrokebg{2}{3}{-4}{-3.25}{C}{\othercontrol};
// \keystrokebg{6}{7}{-4}{-3.25}{M}{\othercontrol};
// \end{scope}
//
// % Right shift key
// \keystroke{12.25}{15.5}{-4}{-3.25}{$\Uparrow$ Shift};
//
// % Last keyboard row
// \keystroke{0}{1.25}{-4.75}{-4}{Ctrl};
// \keystroke{1.25}{2.5}{-4.75}{-4}{\tuxlogo};
// \keystroke{2.5}{3.75}{-4.75}{-4}{Alt};
// \keystrokebg{3.75}{9.75}{-4.75}{-4}{}{\playpausecontrol};
// \keystroke{9.75}{11}{-4.75}{-4}{Alt};
// \keystroke{11}{12.25}{-4.75}{-4}{\tuxlogo};
// \keystroke{12.25}{13.5}{-4.75}{-4}{}
// \keystroke{13.5}{15.5}{-4.75}{-4}{Ctrl};
//
// % Arrow keys
// \keystrokebg{16}{17}{-4.75}{-4}{$\leftarrow$}{\relativeseekcontrol};
// \keystrokebg{17}{18}{-4.75}{-4}{$\downarrow$}{\othercontrol};
// \keystrokebg{18}{19}{-4.75}{-4}{$\rightarrow$}{\relativeseekcontrol};
// \keystrokebg{17}{18}{-4}{-3.25}{$\uparrow$}{\othercontrol};
//
// % Control keys
// \keystroke{16}{17}{-1.75}{-1}{\tiny Inser};
// \keystrokebg{17}{18}{-1.75}{-1}{\tiny Home}{\absoluteseekcontrol};
// \keystroke{18}{19}{-1.75}{-1}{\tiny PgUp};
//
// \keystroke{16}{17}{-2.5}{-1.75}{\tiny Del};
// \keystrokebg{17}{18}{-2.5}{-1.75}{\tiny End}{\absoluteseekcontrol};
// \keystroke{18}{19}{-2.5}{-1.75}{\tiny PgDown};
//
// % Numpad
// \keystroke{19.5}{20.5}{-1.75}{-1}{Lock};
// \keystroke{20.5}{21.5}{-1.75}{-1}{/};
// \keystroke{21.5}{22.5}{-1.75}{-1}{*};
// \keystroke{22.5}{23.5}{-1.75}{-1}{-};
//
// \keystrokebg{19.5}{20.5}{-2.5}{-1.75}{7}{\absoluteseekcontrol};
// \keystrokebg{20.5}{21.5}{-2.5}{-1.75}{8}{\absoluteseekcontrol};
// \keystrokebg{21.5}{22.5}{-2.5}{-1.75}{9}{\absoluteseekcontrol};
//
// \keystrokebg{19.5}{20.5}{-3.25}{-2.5}{4}{\absoluteseekcontrol};
|
||||
// \keystrokebg{20.5}{21.5}{-3.25}{-2.5}{5}{\absoluteseekcontrol};
|
||||
// \keystrokebg{21.5}{22.5}{-3.25}{-2.5}{6}{\absoluteseekcontrol};
|
||||
//
|
||||
// \keystrokebg{19.5}{20.5}{-4}{-3.25}{1}{\absoluteseekcontrol};
|
||||
// \keystrokebg{20.5}{21.5}{-4}{-3.25}{2}{\absoluteseekcontrol};
|
||||
// \keystrokebg{21.5}{22.5}{-4}{-3.25}{3}{\absoluteseekcontrol};
|
||||
//
|
||||
// \keystrokebg{19.5}{21.5}{-4.75}{-4}{0}{\absoluteseekcontrol};
|
||||
// \keystroke{21.5}{22.5}{-4.75}{-4}{.};
|
||||
//
|
||||
// \keystroke{22.5}{23.5}{-3.25}{-1.75}{+};
|
||||
// \keystroke{22.5}{23.5}{-4.75}{-3.25}{$\hookleftarrow$};
|
||||
// \end{tikzpicture}
|
||||
//
|
||||
// \vspace{0.5cm}
|
||||
//
|
||||
// % Legend
|
||||
// \begin{tikzpicture}[scale=\keystrokescale]
|
||||
//
|
||||
// \keystrokebg{0}{1}{0}{1}{}{\absoluteseekcontrol};
|
||||
// \node[right=0.3cm] at (0.5, 0.5) {\small Absolute seek keys};
|
||||
//
|
||||
// \keystrokebg{6}{7}{0}{1}{}{\relativeseekcontrol};
|
||||
// \node[right=0.3cm] at (6.5, 0.5) {\small Relative seek keys};
|
||||
//
|
||||
// \keystrokebg{12}{13}{0}{1}{}{\playpausecontrol};
|
||||
// \node[right=0.3cm] at (12.5, 0.5) {\small Play or pause keys};
|
||||
//
|
||||
// \keystrokebg{18}{19}{0}{1}{}{\othercontrol};
|
||||
// \node[right=0.3cm] at (18.5, 0.5) {\small Other shortcuts};
|
||||
//
|
||||
// \end{tikzpicture}
|
||||
//
|
||||
// \caption{Youtube shortcuts (white keys are unused)\label{i:youtube-keyboard}}
|
||||
// \end{figure}
|
||||
|
||||
|
||||
When it comes to 3D, there are many approaches to managing user interaction.
Some interfaces mimic the video scenario: the only variable is time, and the camera follows a predetermined path over which the user has no control.
Such interfaces are not interactive and can be frustrating for users, who may feel constrained.

Other interfaces add two degrees of freedom to the timeline: the user does not control the camera's position but can control its orientation. This mimics the 360° video scenario.
This is typically the case in the video game #link("http://nolimitscoaster.com/")[_NoLimits 2: Roller Coaster Simulator_], which supports VR devices (Oculus Rift, HTC Vive, etc.) and where the only interaction available to the user is turning their head.

Finally, most other interfaces give the user at least five degrees of freedom: three for the coordinates of the camera's position, and two for its orientation angles (assuming the up vector is fixed; some interfaces let the camera roll, adding a sixth degree of freedom).
The most common controls are the trackball controls, where the user rotates the object like a ball #link("https://threejs.org/examples/?q=controls\#misc_controls_trackball")[(live example here)], and the orbit controls, which behave like the trackball controls but preserve the up vector #link("https://threejs.org/examples/?q=controls\#misc_controls_orbit")[(live example here)].
These types of controls are notably used in the popular mesh editor #link("http://www.meshlab.net/")[MeshLab] and on #link("https://sketchfab.com/")[SketchFab], the YouTube of 3D models.
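To make the geometry of orbit controls concrete, here is a minimal, library-independent sketch (the class and its conventions are ours, not three.js's): the camera lives on a sphere around a target point, mouse drags update two spherical angles, and clamping the polar angle is what keeps the up vector from flipping.

```typescript
// Minimal orbit-controls sketch: the camera stays on a sphere around a
// target point; dragging the mouse changes two angles (theta around the
// up axis, phi from the pole), and the camera is never rolled.
type Vec3 = { x: number; y: number; z: number };

class OrbitCamera {
  constructor(
    public target: Vec3 = { x: 0, y: 0, z: 0 },
    public radius = 10,
    public theta = 0,         // azimuth, in radians
    public phi = Math.PI / 2, // polar angle, in radians
  ) {}

  // Mouse drag: update the two angles, clamping phi so the camera
  // never crosses the pole (this is what preserves the up vector).
  rotate(dTheta: number, dPhi: number): void {
    this.theta += dTheta;
    const eps = 0.01;
    this.phi = Math.min(Math.PI - eps, Math.max(eps, this.phi + dPhi));
  }

  // Spherical to Cartesian coordinates, relative to the target.
  position(): Vec3 {
    return {
      x: this.target.x + this.radius * Math.sin(this.phi) * Math.sin(this.theta),
      y: this.target.y + this.radius * Math.cos(this.phi),
      z: this.target.z + this.radius * Math.sin(this.phi) * Math.cos(this.theta),
    };
  }
}
```

Trackball controls differ only in that the drag updates a full rotation (e.g. a quaternion) instead of two clamped angles, so the object can also be rolled.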
#figure(
  image("../assets/state-of-the-art/3d-interaction/meshlab.png", width: 80%),
  caption: [Screenshot of MeshLab],
)

Another popular way of controlling a free camera in a virtual environment is the first-person controls #link("https://threejs.org/examples/?q=controls\#misc_controls_pointerlock")[(live example here)].
These controls are typically used in first-person shooter games: the mouse rotates the camera and the keyboard translates it.
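A sketch of such controls, under the same illustrative conventions as above (no real engine API is assumed): the mouse sets yaw and pitch, and the keyboard translates the camera in the horizontal plane along the current viewing direction.

```typescript
// First-person controls sketch: the mouse sets yaw and pitch, the
// keyboard moves the camera forward/backward and sideways.
type Vec3 = { x: number; y: number; z: number };

class FirstPersonCamera {
  position: Vec3 = { x: 0, y: 1.7, z: 0 }; // illustrative eye height
  yaw = 0;   // radians; 0 means looking down -z
  pitch = 0; // radians, clamped to avoid flipping over

  look(dYaw: number, dPitch: number): void {
    this.yaw += dYaw;
    const limit = Math.PI / 2 - 0.01;
    this.pitch = Math.min(limit, Math.max(-limit, this.pitch + dPitch));
  }

  // forward/strafe are signed distances; movement ignores pitch so
  // that walking keeps the camera at a constant height.
  move(forward: number, strafe: number): void {
    this.position.x += -forward * Math.sin(this.yaw) + strafe * Math.cos(this.yaw);
    this.position.z += -forward * Math.cos(this.yaw) - strafe * Math.sin(this.yaw);
  }
}
```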
== Relationship between interface, interaction and streaming
In both video and 3D systems, streaming affects interaction.
For example, in a video streaming scenario, a user who sees that the video is fully buffered may start moving around on the timeline, whereas a user who sees that the buffering barely keeps ahead of playback may prefer not to interact and simply watch the video.
If the streaming stalls for too long, the user might seek somewhere else hoping for the video to resume, or get frustrated and leave.
The same types of behaviour occur in 3D streaming: a user who sees more data appearing around them might wait until enough data has arrived, but if nothing seems to happen, they might leave to look for data somewhere else.
Those examples show how streaming can affect interaction, but interaction also affects streaming.
In a video streaming scenario, if the user watches peacefully without interacting, the system simply requests the next chunks of video and displays them.
However, if the user seeks to a different point in the video, the streaming will most likely stall until the system has gathered the data it needs to resume playback.
Just like in the video setting, the way a user navigates in a networked virtual environment affects the streaming.
Moving slowly gives the system time to collect and display data, whereas moving frenetically puts more pressure on the streaming: the data the system requested may already be obsolete by the time the response arrives.
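The obsolescence problem can be sketched as follows (the chunk structure, names, and distance threshold are illustrative assumptions, not taken from any real system): when a response arrives, the client checks whether the camera is still close enough to the chunk for it to be worth displaying.

```typescript
// Sketch of the "obsolete response" check: each chunk of 3D data was
// requested for the camera position at request time; by the time the
// response arrives, the user may have moved away.
type Vec3 = { x: number; y: number; z: number };

const dist = (a: Vec3, b: Vec3): number =>
  Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

interface ChunkResponse {
  id: string;
  center: Vec3; // where the chunk's geometry lives in the scene
}

// Keep a response only if the camera is still close enough to the
// chunk; otherwise the bandwidth spent on it was wasted.
function stillRelevant(
  chunk: ChunkResponse,
  camera: Vec3,
  maxDistance: number,
): boolean {
  return dist(chunk.center, camera) <= maxDistance;
}
```

A real client would test visibility against the viewing frustum rather than a plain distance, but the principle is the same: fast navigation increases the fraction of responses that fail this test.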
Moreover, the interface and the way elements are displayed to the user also impact their behaviour.
A streaming system can use this effect to enhance the quality of experience by providing feedback on the streaming to the user via the interface.
For example, on YouTube, the buffered portion of the video is displayed in light grey on the timeline, whereas the portion that remains to be downloaded is displayed in dark grey.
Users are more likely to click on the light grey part of the timeline than on the dark grey part, which helps prevent the streaming from stalling.
// \begin{figure}[th]
// \centering
// \begin{tikzpicture}
// \node (S) at (0, 0) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Streaming};
// \node (I) at (-2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {Interface};
// \node (U) at (2, -3) [draw, rectangle, minimum width=2cm,minimum height=1cm] {User};
// \draw[double ended double arrow=5pt colored by black and white] (S) -- (I);
// \draw[double ended double arrow=5pt colored by black and white] (S) -- (U);
// \draw[double arrow=5pt colored by black and white] (I) -- (U);
// \end{tikzpicture}
// \end{figure}