This commit is contained in:
Thomas Forgione 2024-03-17 18:04:44 +01:00
parent c0155a1c39
commit 8b3a0cb91e
5 changed files with 52 additions and 50 deletions

View File

@ -131,6 +131,7 @@ And effectively, the borrow checker will crash the compiler with the error in @u
raw(
read("../assets/dash-3d-implementation/undefined-behaviour-error.txt"),
block: true,
lang: "ansi"
),
),
caption: [Error given by the compiler on @undefined-behaviour-rs],

View File

@ -7,11 +7,11 @@ To allow users to easily find these interesting locations within the NVE, _3D bo
A bookmark is simply a 3D virtual camera (with position and camera parameters) predefined by the content provider, and can be presented to users in different ways, including as a text link (URL), a thumbnail image, or a 3D object embedded within the NVE itself.
When users click on a bookmark, NVEs commonly provide a "fly-to" animation to transition the camera from the current
viewpoint to the destination @controlled-movement-virtual-3d @browsing-3d-bookmarks to help orient the users within the 3D space. // TODO double cite
Clicking on a bookmark to fly to another viewpoint leads to reduced data locality.
The 3D content at the bookmarked destination viewpoint may overlap less with the current viewpoint.
In the worst case, the 3D objects corresponding to the current and destination viewpoints can be completely disjoint.
Such movement to a bookmark may lead to a _discovery latency_ @second-life, in which users have to wait for the 3D content for the new viewpoint to be loaded and displayed.
An analogy for this situation, in the context of video streaming, is seeking into a segment of video that has not been prefetched yet.
In this chapter, we explore the impact of bookmarks on NVE navigation and streaming, and make several contributions.

View File

@ -2,47 +2,48 @@
One of the uses for 3D streaming is to allow users to interact with the content while it is being downloaded.
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.
Controlling the viewpoint in 3D (6 DOFs) with 2D devices is not only inherently challenging but also strongly task-dependent. In their review, @interaction-3d-environment distinguish between several types of camera movements: general movements for exploration (e.g., navigation with no explicit target), targeted movements (e.g., searching and/or examining a model in detail), and specified trajectories (e.g., a cinematographic camera path).
For each type of movement, specialized 3D interaction techniques can be designed.
In most cases, rotating, panning, and zooming movements are required, and users are consequently forced to switch back and forth among several navigation modes, leading to interactions that are too complicated overall for a layperson.
Navigation aids and smart widgets are required, and they are the subject of research efforts both in 3D companies (see #link("https://sketchfab.com")[sketchfab.com] and #link("https://cl3ver.com")[cl3ver.com], among others) and in academia, as reported below.
Translating and rotating the camera can be simply specified by a _lookat_ point.
This is often known as point-of-interest (POI) movement (or _go-to_, _fly-to_ interactions) @controlled-movement-virtual-3d.
Given such a point, the camera automatically moves from its current position to a new position that looks at the POI.
One key issue of these techniques is to correctly orient the camera at destination.
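To make the point-of-interest idea concrete, the following sketch (an illustration only, not the actual technique of @controlled-movement-virtual-3d; the easing function and the fixed look-at target are assumptions) interpolates the camera position toward the destination while keeping the camera aimed at the POI:

```rust
// Minimal fly-to sketch: the camera position is eased from `start` to `end`
// while the viewing direction always points at the point of interest (POI).

#[derive(Clone, Copy, Debug)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    fn lerp(self, other: Vec3, t: f64) -> Vec3 {
        Vec3 {
            x: self.x + (other.x - self.x) * t,
            y: self.y + (other.y - self.y) * t,
            z: self.z + (other.z - self.z) * t,
        }
    }
}

/// Smoothstep easing so the camera accelerates and decelerates gently.
fn ease(t: f64) -> f64 { t * t * (3.0 - 2.0 * t) }

/// Returns the camera position and look-at target at time `t` in [0, 1].
fn fly_to(start: Vec3, end: Vec3, poi: Vec3, t: f64) -> (Vec3, Vec3) {
    (start.lerp(end, ease(t.clamp(0.0, 1.0))), poi)
}

fn main() {
    let start = Vec3 { x: 0.0, y: 2.0, z: 10.0 };
    let end = Vec3 { x: 4.0, y: 2.0, z: 3.0 };
    let poi = Vec3 { x: 4.0, y: 1.0, z: 0.0 };
    for step in 0..=4 {
        let t = step as f64 / 4.0;
        let (pos, target) = fly_to(start, end, poi, t);
        println!("t={t:.2} position={pos:?} looking at {target:?}");
    }
}
```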
In Unicam @two-pointer-input, the so-called click-to-focus strategy automatically chooses the destination viewpoint depending on 3D orientations around the contact point.
The more recent Drag'n Go interaction @drag-n-go also reaches a destination point while offering control over speed and position along the camera path.
This 3D interaction is designed in the screen space (it is typically a mouse-based camera control), where the cursor's movements are mapped to camera movements following the same direction as the on-screen optical flow.
#figure(
image("../assets/related-work/3d-interaction/dragngo.png", width: 70%),
caption: [Screenshot of the Drag'n Go interface @drag-n-go (the percentage widget is for illustration)]
)
Some 3D browsers provide a viewpoint menu offering a choice of viewpoints @visual-perception-3d @showmotion. // TODO double cite
Authors of 3D scenes can place several viewpoints (typically for each POI) in order to allow easy navigation for users, who can then easily navigate from viewpoint to viewpoint just by selecting a menu item.
Such viewpoints can be either static, or dynamically adapted: @dual-mode-ui report that users clearly prefer navigating in 3D using a menu with animated viewpoints than with static ones.
#figure(
image("../assets/related-work/3d-interaction/burtnyk.png", width: 70%),
caption: [Screenshot of an interface with menu for navigation @showmotion]
)
Early 3D VRML environments @browsing-3d-bookmarks offer 3D bookmarks with animated transitions between bookmarked views.
These transitions prevent disorientation since users see how they got there.
Hyperlinks can also ease rapid movements between distant viewpoints and naturally support non-linear and non-continuous access to 3D content.
Navigating with 3D hyperlinks is faster due to the instant motion, but can cause disorientation, as shown by the work of @ve-hyperlinks.
@linking-behavior-ve examine explicit landmark links as well as implicit avatar-chosen links in Second Life.
These authors point out that linking is appreciated by users and that easing linking would likely result in a richer user experience.
#cite("dual-mode-ui") developed the Dual-Mode User Interface (DMUI) that coordinates and links hypertext to 3D graphics in order to access information in a 3D space.
@dual-mode-ui developed the Dual-Mode User Interface (DMUI) that coordinates and links hypertext to 3D graphics in order to access information in a 3D space.
#figure(
image("../assets/related-work/3d-interaction/dmui.png", width: 100%),
caption: [The two modes of DMUI @dual-mode-ui]
)
The use of in-scene 3D navigation widgets can also facilitate 3D navigation tasks.
#cite("navigation-aid-multi-floor") propose and evaluate 2D and 3D maps as navigation aids for complex virtual buildings and find that the 2D navigation aid outperforms the 3D one for searching tasks.
The ViewCube widget #cite("viewcube") serves as a proxy for the 3D scene and offers viewpoint switching between 26 views while clearly indicating associated 3D orientations.
Interactive 3D arrows that point to objects of interest have also been proposed as navigation aids in #cite("location-pointing-navigation-aid", "location-pointing-effect"): when clicked, the arrows transfer the viewpoint to the destination through a simulated walk or a faster flight.
@navigation-aid-multi-floor propose and evaluate 2D and 3D maps as navigation aids for complex virtual buildings and find that the 2D navigation aid outperforms the 3D one for searching tasks.
The ViewCube widget @viewcube serves as a proxy for the 3D scene and offers viewpoint switching between 26 views while clearly indicating associated 3D orientations.
Interactive 3D arrows that point to objects of interest have also been proposed as navigation aids in @location-pointing-navigation-aid @location-pointing-effect: when clicked, the arrows transfer the viewpoint to the destination through a simulated walk or a faster flight. // TODO double cite

View File

@ -6,7 +6,7 @@ In the next sections, we review the 3D streaming related work, from 3D compressi
=== Compression and structuring
According to @maglo20153d, mesh compression can be divided into four categories:
- single-rate mesh compression, seeking to reduce the size of a mesh;
- progressive mesh compression, encoding meshes in many levels of resolution that can be downloaded and rendered one after the other;
- random accessible mesh compression, where different parts of the models can be decoded in an arbitrary order;
@ -15,7 +15,7 @@ According to #cite("maglo20153d"), mesh compression can be divided into four cat
Since our objective is to stream 3D static scenes, single-rate mesh and mesh sequence compressions are less interesting for us.
This section thus focuses on progressive meshes and random accessible mesh compression.
Progressive meshes were introduced in @progressive-meshes and allow a progressive transmission of a mesh by sending a low resolution mesh first, called _base mesh_, and then transmitting detail information that a client can use to increase the resolution.
To do so, a _decimation algorithm_ starts from the original full-resolution mesh and iteratively removes vertices and faces by merging vertices through the so-called _edge collapse_ operation (Figure X).
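To give an intuition of the data structure involved (a simplified sketch, not the actual encoding of @progressive-meshes), a progressive mesh can be represented as a base mesh plus an ordered list of vertex splits, each one undoing one edge collapse performed by the decimation algorithm:

```rust
// Simplified sketch of progressive-mesh refinement: a base mesh plus an
// ordered list of vertex splits. Each split re-creates a vertex that was
// merged during decimation and re-attaches the faces that referenced it.

struct Mesh {
    vertices: Vec<[f32; 3]>,
    faces: Vec<[usize; 3]>,
}

/// One refinement record: undoes a single edge collapse.
struct VertexSplit {
    /// Vertex that absorbed the collapsed one.
    parent: usize,
    /// Position of the vertex to re-create.
    position: [f32; 3],
    /// Faces (as vertex index triples) to add once the new vertex exists;
    /// by convention they may already reference its future index.
    new_faces: Vec<[usize; 3]>,
    /// Corners of existing faces that must point to the new vertex again.
    corners_to_rewire: Vec<(usize, usize)>, // (face index, corner 0..3)
}

impl Mesh {
    /// Applies one vertex split, increasing the resolution of the mesh.
    fn apply_split(&mut self, split: &VertexSplit) {
        let new_vertex = self.vertices.len();
        self.vertices.push(split.position);
        for &(face, corner) in &split.corners_to_rewire {
            debug_assert_eq!(self.faces[face][corner], split.parent);
            self.faces[face][corner] = new_vertex;
        }
        self.faces.extend_from_slice(&split.new_faces);
    }
}

fn main() {
    // Base mesh: a single triangle.
    let mut mesh = Mesh {
        vertices: vec![[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        faces: vec![[0, 1, 2]],
    };
    // One split: vertex 3 appears next to vertex 1 and adds a face.
    let split = VertexSplit {
        parent: 1,
        position: [0.5, 0.5, 0.0],
        new_faces: vec![[1, 3, 0]],
        corners_to_rewire: vec![(0, 1)],
    };
    mesh.apply_split(&split);
    println!("{} vertices, {} faces", mesh.vertices.len(), mesh.faces.len());
}
```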
// \begin{figure}[ht]
@ -83,17 +83,17 @@ This process reduces the time a user has to wait before seeing a downloaded 3D o
caption: [Four levels of resolution of a mesh]
)
#cite("streaming-compressed-webgl") develop a dedicated progressive compression algorithm based on iterative decimation, for efficient decoding, in order to be usable on web clients.
With the same objective, #cite("pop-buffer") proposes pop buffer, a progressive compression method based on quantization that allows efficient decoding.
@streaming-compressed-webgl develop a dedicated progressive compression algorithm based on iterative decimation, for efficient decoding, in order to be usable on web clients.
With the same objective, @pop-buffer proposes pop buffer, a progressive compression method based on quantization that allows efficient decoding.
Following these, many approaches use multi triangulation, which creates mesh fragments at different levels of resolution and encodes the dependencies between fragments in a directed acyclic graph.
In #cite("batched-multi-triangulation"), the authors propose Nexus: a GPU optimized version of multi triangulation that pushes its performances to make real time rendering possible.
It is notably used in 3DHOP (3D Heritage Online Presenter, #cite("3dhop")), a framework to easily build web interfaces to present 3D objects to users in the context of cultural heritage.
In @batched-multi-triangulation, the authors propose Nexus: a GPU-optimized version of multi triangulation that pushes its performance to make real-time rendering possible.
It is notably used in 3DHOP (3D Heritage Online Presenter, @3dhop), a framework to easily build web interfaces to present 3D objects to users in the context of cultural heritage.
Each of these approaches defines its own compression and coding for a single mesh.
However, users are often interested in scenes that contain multiple meshes, and the need to structure content emerged.
To answer those issues, the Khronos group proposed a generic format called glTF (GL Transmission Format, @gltf) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualization and which we address in our work.
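To illustrate the kind of structure glTF describes (a simplified, hypothetical subset written for illustration, not the actual glTF schema), the scene graph boils down to index-based arrays that reference each other:

```rust
// Simplified sketch of the index-based scene graph encoded by a glTF JSON
// file: top-level arrays whose entries reference each other by position.
// Illustrative subset only; the real glTF schema is richer.

struct Scene { nodes: Vec<usize> }          // root node indices

struct Node {
    mesh: Option<usize>,                    // index into `meshes`
    camera: Option<usize>,                  // index into `cameras`
    children: Vec<usize>,                   // indices into `nodes`
    translation: [f32; 3],
}

struct Primitive { material: Option<usize>, position_accessor: usize }
struct Mesh { primitives: Vec<Primitive> }
struct Material { base_color_texture: Option<usize> }
struct Camera { yfov: f32, znear: f32 }
struct Buffer { uri: String, byte_length: usize }

struct Gltf {
    scenes: Vec<Scene>,
    nodes: Vec<Node>,
    meshes: Vec<Mesh>,
    materials: Vec<Material>,
    cameras: Vec<Camera>,
    buffers: Vec<Buffer>,
}

fn main() {
    // One scene with one node referencing mesh 0, whose data lives in buffer 0.
    let asset = Gltf {
        scenes: vec![Scene { nodes: vec![0] }],
        nodes: vec![Node { mesh: Some(0), camera: None, children: vec![], translation: [0.0; 3] }],
        meshes: vec![Mesh { primitives: vec![Primitive { material: Some(0), position_accessor: 0 }] }],
        materials: vec![Material { base_color_texture: None }],
        cameras: vec![],
        buffers: vec![Buffer { uri: "model.bin".into(), byte_length: 1024 }],
    };
    println!("{} node(s) in scene 0", asset.scenes[0].nodes.len());
}
```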
@ -104,14 +104,14 @@ Although relevant for compression, transmission and in particular streaming, thi
In terms of quality of experience, it is desirable that the downloaded content falls into the user's field of view.
This means that the progressive compression must encode spatial information in order to allow the decoder to determine content adapted to its viewpoint.
This is typically called _random accessible mesh compression_.
#cite("maglo2013pomar") is such an example of random accessible progressive mesh compression.
#cite("cheng2008receiver") proposes a receiver driven way of achieving viewpoint dependency with progressive mesh: the client starts by downloading the base mesh, and from then is able to estimate the importance of the different vertex splits, in order to choose which ones to download.
@maglo2013pomar is one such example of random accessible progressive mesh compression.
@cheng2008receiver proposes a receiver-driven way of achieving viewpoint dependency with progressive meshes: the client starts by downloading the base mesh, and from then on is able to estimate the importance of the different vertex splits, in order to choose which ones to download.
Doing so drastically reduces the server computational load, since it only has to send data, and improves the scalability of this framework.
In the case of streaming a large 3D scene, view-dependent streaming is fundamental: a user will only be seeing one small portion of the scene at any given time, and a system that does not adapt its streaming to the user's point of view is bound to induce a low quality of experience.
A simple way to implement viewpoint dependency is to request the content that is spatially close to the user's camera.
This approach, implemented in Second Life and several other NVEs (e.g., @peer-texture-streaming), only depends on the location of the avatar, not on its viewing direction.
It exploits spatial coherence and works well for any continuous movement of the user, including turning.
Once the set of objects that are likely to be accessed by the user is determined, the next question is in what order should these objects be retrieved.
A simple approach is to retrieve the objects based on distance: the spatial distance from the user's virtual location and rotational distance from the user's view.
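A minimal sketch of such a distance-based ordering is given below; it is only an illustration, and the relative weight of the rotational term is an arbitrary assumption:

```rust
// Sketch: rank candidate objects by combining their spatial distance to the
// camera with the angular (rotational) distance to the viewing direction.

#[derive(Debug)]
struct Object { id: u32, center: [f64; 3] }

fn norm(v: [f64; 3]) -> f64 { (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt() }

fn cost(camera: [f64; 3], view_dir: [f64; 3], object: &Object) -> f64 {
    let to_obj = [
        object.center[0] - camera[0],
        object.center[1] - camera[1],
        object.center[2] - camera[2],
    ];
    let spatial = norm(to_obj);
    // Angle between the viewing direction and the direction to the object.
    let dot = (to_obj[0] * view_dir[0] + to_obj[1] * view_dir[1] + to_obj[2] * view_dir[2])
        / (spatial * norm(view_dir)).max(1e-9);
    let rotational = dot.clamp(-1.0, 1.0).acos();
    spatial + 5.0 * rotational // arbitrary weight on the angular term
}

fn main() {
    let camera = [0.0, 0.0, 0.0];
    let view_dir = [0.0, 0.0, -1.0];
    let mut objects = vec![
        Object { id: 1, center: [0.0, 0.0, -10.0] }, // in front, far
        Object { id: 2, center: [2.0, 0.0, -2.0] },  // in front, close
        Object { id: 3, center: [0.0, 0.0, 5.0] },   // behind, close
    ];
    objects.sort_by(|a, b| {
        cost(camera, view_dir, a).partial_cmp(&cost(camera, view_dir, b)).unwrap()
    });
    for o in &objects { println!("request object {}", o.id); }
}
```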
@ -126,12 +126,12 @@ Even though there are no associated publications to support this assertion, it s
)<rw:google-maps>
Other approaches use levels of detail.
Levels of detail were initially used for efficient 3D rendering @lod.
When the change from one level of detail to another is abrupt, it can create visual discomfort for the user.
This is called the _popping effect_, and levels of detail have the advantage of enabling techniques, such as geomorphing @hoppe-lod, to transition smoothly from one level of detail to another.
Levels of detail have since been used for 3D streaming.
For example, #cite("streaming-hlod") propose an out-of-core viewer for remote model visualization based by adapting hierarchical level of details #cite("hlod") to the context of 3D streaming.
Level of details can also be used to perform viewpoint dependant streaming, such as #cite("view-dependent-lod").
For example, @streaming-hlod propose an out-of-core viewer for remote model visualization based by adapting hierarchical level of details @hlod to the context of 3D streaming.
Level of details can also be used to perform viewpoint dependant streaming, such as @view-dependent-lod.
=== Texture streaming
@ -140,9 +140,9 @@ It consists in generating progressively lower resolutions of an initial texture.
Lower resolutions of the textures are used for polygons which are far away from the camera, and higher resolutions for polygons closer to the camera.
Not only does this reduce the time needed to render the polygons, but it can also reduce the aliasing effect.
Using these lower resolutions can be especially interesting for streaming.
#cite("mipmap-streaming") proposes the PTM format which encode the mipmap levels of a texture that can be downloaded progressively, so that a lower resolution can be shown to the user while the higher resolutions are being downloaded.
@mipmap-streaming proposes the PTM format which encode the mipmap levels of a texture that can be downloaded progressively, so that a lower resolution can be shown to the user while the higher resolutions are being downloaded.
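As a rough illustration of why mipmaps fit streaming well (a sketch under simple assumptions, not the PTM format itself), a client can derive the mipmap level it needs from the projected size of a polygon and request only the missing, finer levels:

```rust
// Sketch: choose which mipmap level a polygon needs from its projected size
// on screen, then request the missing levels from coarsest to finest.
// Level i of an n x n texture has (n >> i) texels per side, the usual mipmap
// convention; the screen-space heuristic is an illustrative assumption.

/// Finest level needed so that one texel covers at least one pixel.
fn required_level(texture_size: u32, projected_pixels: u32) -> u32 {
    let mut level = 0;
    while (texture_size >> (level + 1)) >= projected_pixels.max(1) {
        level += 1;
    }
    level
}

fn main() {
    let texture_size = 1024;  // full-resolution texture is 1024x1024
    let downloaded_up_to = 6; // only coarse levels 6..=10 are available so far

    for &projected in &[900u32, 200, 40] {
        let needed = required_level(texture_size, projected);
        if needed >= downloaded_up_to {
            println!("{projected}px polygon: level {needed} already available");
        } else {
            // Request the missing finer levels progressively, coarsest first.
            let plan: Vec<u32> = (needed..downloaded_up_to).rev().collect();
            println!("{projected}px polygon: needs level {needed}, request levels {plan:?}");
        }
    }
}
```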
Since 3D data can contain many textures, @simon2019streaming propose a way to stream a set of textures by encoding them into a video.
Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimize dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, the authors are able to reach a better rate-distortion ratio than WebP, which is the new standard for texture transmission, and JPEG.
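The ordering step can be illustrated with a simple greedy pass; the dissimilarity measure and the greedy strategy below are assumptions made for illustration, not necessarily those of @simon2019streaming:

```rust
// Sketch: greedily order texture tiles so that consecutive tiles are as
// similar as possible, which helps the subsequent video encoder.
// The dissimilarity measure (sum of absolute differences) is an assumption.

type Tile = Vec<u8>; // fixed-size tile, e.g. 64x64 grayscale pixels

fn dissimilarity(a: &Tile, b: &Tile) -> u64 {
    a.iter().zip(b).map(|(x, y)| (*x as i64 - *y as i64).unsigned_abs()).sum()
}

/// Greedy nearest-neighbour ordering, starting from the first tile.
fn order_tiles(tiles: &[Tile]) -> Vec<usize> {
    assert!(!tiles.is_empty());
    let mut remaining: Vec<usize> = (1..tiles.len()).collect();
    let mut order = vec![0];
    while !remaining.is_empty() {
        let last = *order.last().unwrap();
        let (pos, _) = remaining
            .iter()
            .enumerate()
            .min_by_key(|(_, &idx)| dissimilarity(&tiles[last], &tiles[idx]))
            .unwrap();
        order.push(remaining.remove(pos));
    }
    order
}

fn main() {
    // Three tiny "tiles": a dark one, a bright one, and a mid-gray one.
    let tiles = vec![vec![10u8; 16], vec![200u8; 16], vec![100u8; 16]];
    println!("encode tiles in order {:?}", order_tiles(&tiles));
}
```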
@ -152,9 +152,9 @@ By benefiting from the video compression techniques, the authors are able to rea
As discussed in @f:3d, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, geometry and textures compete for the available bandwidth, and the system needs to balance this compromise.
Balancing between streaming of geometry and texture data is addressed by @batex3, @visual-quality-assessment, and @mesh-texture-multiplexing.
Their approaches combine the distortion caused by having lower-resolution meshes and textures into a single view-independent metric.
#cite("progressive-compression-textured-meshes") also deals with the geometry / texture compromise.
@progressive-compression-textured-meshes also deals with the geometry / texture compromise.
This work designs a cost-driven framework for 3D data compression, both in terms of geometry and textures.
The authors generate an atlas for textures that enables efficient compression and a multi-resolution scheme.
All four works consider a single mesh and have constraints on the types of meshes that they are able to compress.
@ -169,11 +169,11 @@ This is why optimized engines for video games use techniques that are reused for
Some other online games, such as #link("https://secondlife.com")[Second Life], rely on user-generated data, and thus are forced to send data from users to others.
In such scenarios, 3D streaming is appropriate and this is why the idea of streaming 3D content for video games has been investigated.
For example, #cite("game-on-demand") proposes an online game engine based on geometry streaming, that addresses the challenge of streaming 3D content at the same time as synchronization of the different players.
For example, @game-on-demand proposes an online game engine based on geometry streaming, that addresses the challenge of streaming 3D content at the same time as synchronization of the different players.
=== NVE streaming frameworks
An example of NVE streaming framework is 3D Tiles @3d-tiles, which is a specification for visualizing massive 3D geospatial data developed by Cesium and built on top of glTF.
Their main goal is to display 3D objects on top of regular maps, and their visualization consists of a top-down view, whereas we seek to let users freely navigate in our scenes, whether it be flying over the scene or moving along the roads.
#figure(
@ -202,7 +202,7 @@ It started with a regular octree, but has then been improved to a $k$-d tree (se
In~\citeyear{3d-tiles-10x}, the 3D Tiles streaming system was improved by preloading the data at the camera's next position when known in advance (with ideas that are similar to those we discuss and implement in Chapter~\ref{bi}, published in~\citeyear{bookmarks-impact}) and by ordering tile requests depending on the user's position (with ideas that are similar to those we discuss and implement in Chapter~\ref{d3}, published in~\citeyear{dash-3d}).
#cite("zampoglou") is another example of a streaming framework: it is the first paper that proposes to use DASH to stream 3D content.
@zampoglou is another example of a streaming framework: it is the first paper that proposes to use DASH to stream 3D content.
In their work, the authors describe a system that allows users to access 3D content at multiple resolutions.
They organize the content, following DASH terminology, into periods, adaptation sets, representations and segments.
Their first adaptation set codes the tree structure of the scene graph.

View File

@ -1,7 +1,7 @@
== Video
Accessing a remote video through the web has been a widely studied problem since the 1990s.
The Real-time Transport Protocol (RTP, @rtp-std) was an early attempt to formalize audio and video streaming.
The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client.
In the following years, HTTP servers have become ubiquitous, and many industrial actors (Apple, Microsoft, Adobe, etc.) developed HTTP streaming systems to deliver multimedia content over the network.
@ -13,8 +13,8 @@ This type of network architecture is called CDN (Content Delivery Network) and i
=== DASH: the standard for video streaming<rw:dash>
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH @dash-std @dash-std-2, is now a widely deployed
standard for adaptively streaming video on the web @dash-std-full, made to be simple, scalable and inter-operable.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server-side computation. The client should be able to make good decisions on what part of the content to download, based only on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
#heading(level: 4, numbering: none)[DASH structure]
@ -45,8 +45,8 @@ A representation of the images of a chapter of a movie is still a long video, an
Segments are used to prevent this issue.
They typically encode files that contain two to ten seconds of video, and give the software a greater ability to dynamically adapt to the system.
If a user wants to seek somewhere else in the video, only one segment of data is potentially lost, and only one segment
of data needs to be downloaded for the playback to resume. The impact of the segment duration has been investigated in many works, including @sideris2015mpeg @stohr2017sweet.
For example, @stohr2017sweet discuss how the segment duration affects the streaming: short segments lower the initial delay and provide the best stalling quality of experience, but make the total downloading time of the video longer because of overhead.
#heading(level: 4, numbering: none)[Content preparation and server]
@ -59,19 +59,19 @@ This is one of the DASH strengths: no powerful server is required, and since sta
#heading(level: 4, numbering: none)[Client side adaptation]
A client typically starts by downloading the MPD file, and then proceeds to download segments from the different adaptation sets. While the standard describes well how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any implementation of a DASH client is called the adaptation logic. This component takes into account a set of parameters, such as network conditions (bandwidth, throughput, for example), buffer states or segment sizes to derive a decision on which segments should be downloaded next.
Most of the industrial actors have their own adaptation logic, and many more have been proposed in the literature.
A thorough review is beyond the scope of this state-of-the-art, but examples include @chiariotti2016online who formulate the problem in a reinforcement learning framework, @yadav2017quetra who formulate the problem using queuing theory, or @huang2019hindsight who use a formulation derived from the knapsack problem.
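As a minimal illustration of what an adaptation logic does (a deliberately naive, rate-based sketch; real players combine many more signals), the client can keep a smoothed throughput estimate and pick, for the next segment, the highest bitrate that fits under it:

```rust
// Sketch of a naive rate-based adaptation logic: keep a smoothed throughput
// estimate and pick the highest representation whose bitrate fits under it,
// with a safety margin. Real DASH clients use richer logic (buffer level,
// segment sizes, ...); this is only an illustration.

struct AdaptationLogic {
    throughput_bps: f64, // smoothed estimate
}

impl AdaptationLogic {
    /// Updates the estimate after a segment of `bytes` downloaded in `seconds`.
    fn on_segment_downloaded(&mut self, bytes: u64, seconds: f64) {
        let measured = bytes as f64 * 8.0 / seconds;
        // Exponentially weighted moving average.
        self.throughput_bps = 0.8 * self.throughput_bps + 0.2 * measured;
    }

    /// Picks the index of the representation to request next.
    fn choose(&self, bitrates_bps: &[f64]) -> usize {
        let budget = 0.8 * self.throughput_bps; // safety margin
        bitrates_bps
            .iter()
            .enumerate()
            .filter(|(_, &b)| b <= budget)
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap_or(0) // nothing fits: fall back to the lowest bitrate
    }
}

fn main() {
    let bitrates = [500_000.0, 1_500_000.0, 4_000_000.0, 8_000_000.0];
    let mut logic = AdaptationLogic { throughput_bps: 2_000_000.0 };
    logic.on_segment_downloaded(1_200_000, 2.0); // 4.8 Mbps measured
    let next = logic.choose(&bitrates);
    println!("next segment: representation {next} ({} bps)", bitrates[next]);
}
```

Buffer-based or learning-based logics, such as those cited above, replace the `choose` heuristic while keeping the same overall loop.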
=== DASH-SRD
Being now widely adopted in the context of video streaming, DASH has been adapted to various other contexts.
DASH-SRD (Spatial Relationship Description, @dash-srd) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding a video at multiple resolutions, and tiling the highest resolutions as shown in Figure \ref{sota:srd-png}.
That way, a client can choose to download either the low resolution of the whole video or higher resolutions of a subpart of the video.
#figure(
image("../assets/related-work/video/srd.png", width: 60%),
caption: [DASH-SRD @dash-srd],
)
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the tile.
@ -90,6 +90,6 @@ An example of such a property is given in @rw:srd-xml.
)<rw:srd-xml>
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
While Figure \ref{sota:srd-png} illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven to be particularly useful in the context of 360 video streaming (see for example @ozcinar2017viewport).
This is especially interesting in the context of 3D streaming since we have this same pattern of a user viewing only a part of a content.
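To make the view-dependent aspect concrete, a client-side sketch (illustrative; the tile grid and the viewport model are assumptions) requests in high resolution only the tiles that intersect the currently displayed region, and falls back on the low-resolution layer elsewhere:

```rust
// Sketch: given a tiled high-resolution video (as described by DASH-SRD)
// and the rectangle currently displayed, request only the intersecting
// tiles in high resolution and rely on the low-resolution layer elsewhere.

#[derive(Clone, Copy, Debug)]
struct Rect { x: u32, y: u32, w: u32, h: u32 }

fn intersects(a: Rect, b: Rect) -> bool {
    a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h
}

/// Returns the (row, column) indices of tiles overlapping the viewport.
fn visible_tiles(grid: (u32, u32), tile: (u32, u32), viewport: Rect) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    for row in 0..grid.0 {
        for col in 0..grid.1 {
            let r = Rect { x: col * tile.0, y: row * tile.1, w: tile.0, h: tile.1 };
            if intersects(r, viewport) {
                out.push((row, col));
            }
        }
    }
    out
}

fn main() {
    // A 3x3 grid of 640x360 tiles (1920x1080 total), viewer zoomed on the center.
    let tiles = visible_tiles((3, 3), (640, 360), Rect { x: 700, y: 300, w: 500, h: 400 });
    println!("request low-resolution layer + high-resolution tiles {:?}", tiles);
}
```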