This commit is contained in:
Thomas Forgione 2023-04-22 17:26:54 +02:00
parent 056fcfacd1
commit 77dfebde7d
23 changed files with 1366 additions and 5 deletions

11
abstracts/simple-en.typ Normal file

@ -0,0 +1,11 @@
More and more 3D models are made available online, and web browsers now have full support for 3D visualization:
this thesis focuses on the streaming of, and interaction with, remote 3D virtual environments, and describes three major contributions.
First, we propose an interface for 3D navigation with bookmarks: small virtual objects added to the scene that the user can click to move towards a recommended location.
We describe a user study in which we analyse the impact of bookmarks on navigation and streaming, and we propose a way to improve streaming based on the bookmarks.
Second, we propose an adaptation of DASH, the video streaming standard, to 3D streaming.
We structure the 3D data and textures into chunks, and we propose a client and several streaming policies that benefit from this structure.
Finally, we integrate our 3D version of DASH and the bookmarks in an interface for mobile devices, and we describe another study in which participants tried this interface.
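The abstract mentions chunk-based streaming policies that decide what to download next. As a purely illustrative sketch (not the thesis implementation; `Chunk`, `utility`, and `next_chunk` are hypothetical names), a greedy policy can rank pending chunks by estimated usefulness per byte, favoring content near the viewpoint:

```python
# Hypothetical sketch of a greedy chunk-selection streaming policy:
# rank pending chunks (geometry segments, texture levels) by an
# estimated utility divided by their download cost in bytes.
from dataclasses import dataclass
import math


@dataclass
class Chunk:
    name: str                          # e.g. a geometry segment or texture level
    center: tuple[float, float, float] # center of the chunk's bounding volume
    size_bytes: int                    # download cost


def utility(chunk: Chunk, camera: tuple[float, float, float]) -> float:
    # Assumption: usefulness decays with distance from the camera.
    return 1.0 / (1.0 + math.dist(chunk.center, camera))


def next_chunk(pending: list[Chunk], camera: tuple[float, float, float]) -> Chunk:
    # Greedy choice: best utility per byte downloaded.
    return max(pending, key=lambda c: utility(c, camera) / c.size_bytes)


pending = [
    Chunk("geometry-0", (0.0, 0.0, 0.0), 200_000),
    Chunk("texture-0-low", (0.0, 0.0, 0.0), 20_000),
    Chunk("geometry-far", (50.0, 0.0, 0.0), 200_000),
]
best = next_chunk(pending, camera=(1.0, 0.0, 0.0))
# Picks the cheap nearby texture before the distant geometry.
```

A real policy would also account for screen-space error, already-downloaded resolution levels, and bandwidth estimates, but the utility-per-cost ranking above captures the core idea.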

9
abstracts/simple-fr.typ Normal file

@ -0,0 +1,9 @@
Dans un contexte de démocratisation du nombre de modèles 3D et de leur accès, nous nous intéressons dans cette thèse à la transmission d'environnements virtuels 3D distants, à travers trois contributions majeures.
Tout d'abord, nous proposons une interface de navigation 3D avec des signets, de petits objets virtuels ajoutés à la scène sur lesquels l'utilisateur peut cliquer pour se déplacer vers un emplacement recommandé.
Nous proposons un moyen d'améliorer la transmission en fonction des signets et évaluons leur impact au travers d'une étude utilisateur.
Ensuite, nous proposons une adaptation de DASH, le standard de la transmission vidéo, à la transmission 3D.
Nous structurons les données 3D et les textures en segments, et nous proposons un client et des politiques de chargement qui bénéficient de cette structure.
Enfin, nous intégrons notre version 3D de DASH et les signets dans une interface pour appareils mobiles, et nous décrivons une autre étude où les participants ont essayé cette interface.


@ -14,4 +14,3 @@
</Representation>
</AdaptationSet>
</Period>

938
bib.bib Normal file

@ -0,0 +1,938 @@
@inproceedings{dash-srd,
title={MPEG DASH SRD: spatial relationship description},
author={Niamut, Omar A and Thomas, Emmanuel and D'Acunto, Lucia and Concolato, Cyril and Denoual, Franck and Lim, Seong Yong},
booktitle={Proceedings of the 7th International Conference on Multimedia Systems},
pages={5},
year={2016},
organization={ACM}
}
@inproceedings{dash-std,
author = {Stockhammer, Thomas},
title = {Dynamic Adaptive Streaming over {HTTP}: Standards and Design Principles},
booktitle = {Proceedings of the Second Annual ACM Conference on Multimedia Systems},
series = {MMSys '11},
year = {2011},
isbn = {978-1-4503-0518-1},
location = {San Jose, CA, USA},
pages = {133--144},
numpages = {12},
url = {http://doi.acm.org/10.1145/1943552.1943572},
doi = {10.1145/1943552.1943572},
acmid = {1943572},
month = {Feb},
publisher = {ACM},
address ={San Jose, CA, USA},
keywords = {3gpp, mobile video, standards, streaming, video},
}
@article{dash-std-2,
author = {Sodagar, Iraj},
doi = {10.1109/MMUL.2011.71},
issn = {1070-986X},
journal = {IEEE Multimedia},
month = {apr},
number = {4},
pages = {62--67},
title = {{The MPEG-DASH Standard for Multimedia Streaming Over the Internet}},
url = {http://ieeexplore.ieee.org/document/6077864/},
volume = {18},
year = {2011}
}
@techreport{rtp-std,
author={Schulzrinne, H. and Casner, S. and Frederick, R. and Jacobson, V.},
type={Standard},
key={RFC 1889},
month={january},
year={1996},
title={{RTP: A Transport Protocol for Real-Time Applications}}
}
@techreport{dash-std-full,
institution={ISO/IEC},
type={Standard},
key={ISO/IEC 23009-1:2014},
month={may},
year={2014},
title={{Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats}}
}
@misc{dash-network-profiles,
author = {{DASH Industry Forum}},
title = {Guidelines for Implementation: {DASH-AVC/264} Test Cases and Vectors},
year = {2014},
howpublished = {\url{http://dashif.org/guidelines/}}
}
@inproceedings{bookmarks-impact,
author = {Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent},
title = {Impact of 3D Bookmarks on Navigation and Streaming in a Networked Virtual Environment},
booktitle = {Proceedings of the 7th International Conference on Multimedia Systems},
series = {MMSys '16},
year = {2016},
isbn = {978-1-4503-4297-1},
location = {Klagenfurt, Austria},
pages = {9:1--9:10},
articleno = {9},
numpages = {10},
url = {http://doi.acm.org/10.1145/2910017.2910607},
doi = {10.1145/2910017.2910607},
acmid = {2910607},
publisher = {ACM},
address = {Klagenfurt, Austria},
keywords = {3D bookmarks, 3D navigation aid, 3D streaming, networked virtual environment, prefetching},
month = {May},
}
@inproceedings{dash-3d,
author = {Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent and Yadav, Praveen Kumar},
title = {DASH for 3D Networked Virtual Environment},
year = {2018},
location = {Seoul, South Korea},
address = {Seoul, South Korea},
month = {October},
doi = {10.1145/3240508.3240701},
isbn = {978-1-4503-5665-7/18/10},
booktitle = {2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea}
}
@inproceedings{dash-3d-demo,
title={An Implementation of a DASH Client for Browsing Networked Virtual Environment},
author={Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent and Yadav, Praveen Kumar},
year={2018},
location = {Seoul, South Korea},
address = {Seoul, South Korea},
month = {October},
doi = {10.1145/3240508.3241398},
isbn = {978-1-4503-5665-7},
booktitle = {2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea}
}
@inproceedings{dash-3d-bookmarks-demo,
title = {Using 3D Bookmarks for Desktop and Mobile DASH-3D Clients},
author = {Forgione, Thomas and Carlier, Axel and Morin, G{\'e}raldine and Ooi, Wei Tsang and Charvillat, Vincent},
year= {2019},
location = {Nice, France},
month = {October},
booktitle = {2019 ACM Multimedia Conference (MM '19), October 21--27, 2019, Nice, France}
}
@inproceedings{view-dependent-progressive-mesh,
title={Receiver-driven view-dependent streaming of progressive mesh},
author={Cheng, Wei and Ooi, Wei Tsang},
booktitle={Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video},
pages={9--14},
year={2008},
organization={ACM}
}
@misc{gltf,
title={{glTF}: The Runtime Asset Format for {WebGL}, {OpenGL ES}, and {OpenGL}},
author={Robinet, Fabrice and Cozzi, Patrick},
year={2013}
}
@inproceedings{progressive-meshes,
title={Progressive meshes},
author={Hoppe, Hugues},
booktitle={Proceedings of the 23rd annual conference on Computer graphics and interactive techniques},
pages={99--108},
year={1996},
organization={ACM}
}
@inproceedings{video-bookmarks,
title={A Video Timeline with Bookmarks and Prefetch State for Faster Video Browsing},
author={Carlier, Axel and Charvillat, Vincent and Ooi, Wei Tsang},
booktitle={Proceedings of the 23rd Annual ACM Conference on Multimedia Conference},
pages={967--970},
year={2015},
month=oct,
address={Brisbane, Australia},
organization={ACM}
}
@inproceedings{zoomable-video,
author = {Carlier, Axel and Ravindra, Guntur and Ooi, Wei Tsang},
title = {Towards Characterizing Users' Interaction with Zoomable Video},
booktitle = {Proceedings of the 2010 ACM Workshop on Social, Adaptive and Personalized Multimedia Interaction and Access},
series = {SAPMIA '10},
year = {2010},
address = {Firenze, Italy},
pages = {21--24},
}
@article{user-generated-videos,
title={Watching user generated videos with prefetching},
author={Khemmarat, Samamon and Zhou, Renjie and Krishnappa, Dilip Kumar and Gao, Lixin and Zink, Michael},
journal={Signal Processing: Image Communication},
volume={27},
number={4},
pages={343--359},
year={2012},
publisher={Elsevier}
}
@inproceedings{video-navigation-mpd,
title={Optimizing hypervideo navigation using a {Markov} decision process approach},
author={Grigoras, Romulus and Charvillat, Vincent and Douze, Matthijs},
booktitle={Proceedings of the 10th ACM International Conference on Multimedia},
pages={39--48},
year={2002},
address={Juan les Pins, France},
}
@inproceedings{joserlin,
author = {Zhao, Zhen Wei and Ooi, Wei Tsang},
title = {Joserlin: Joint Request and Service Scheduling for Peer-to-peer Non-linear Media Access},
booktitle = {Proceedings of the 21st ACM International Conference on Multimedia},
series = {MM '13},
year = {2013},
month=oct,
address = {Barcelona, Spain},
pages = {303--312},
}
@article{survey-caching-prefetching,
title={A survey of Web caching and prefetching},
author={Ali, Waleed and Shamsuddin, Siti Mariyam and Ismail, Abdul Samad},
journal={International Journal of Advances in Soft Computing and its Application},
volume={3},
number={1},
pages={18--44},
year={2011},
}
@inproceedings{peer-texture-streaming,
title={Peer-assisted texture streaming in metaverses},
author={Liang, Ke and Zimmermann, Roger and Ooi, Wei Tsang},
booktitle={Proceedings of the 19th ACM International Conference on Multimedia},
pages={203--212},
year={2011},
address={Scottsdale, AZ},
organization={ACM}
}
@inproceedings{motion-prediction,
author = {Chan, Addison and Lau, Rynson W. H. and Ng, Beatrice},
title = {A Hybrid Motion Prediction Method for Caching and Prefetching in Distributed Virtual Environments},
booktitle = {Proceedings of the ACM Symposium on Virtual Reality Software and Technology},
series = {VRST '01},
year = {2001},
isbn = {1-58113-427-4},
address = {Banff, Alberta, Canada},
pages = {135--142},
numpages = {8},
url = {http://doi.acm.org/10.1145/505008.505035},
doi = {10.1145/505008.505035},
acmid = {505035},
publisher = {ACM},
keywords = {3D navigation, caching, distributed virtual environments, motion prediction, prefetching, virtual walkthrough},
}
@article{walkthrough-ve,
title={A data management scheme for effective walkthrough in large-scale virtual environments},
author={Li, Tsai-Yen and Hsu, Wen-Hsiang},
journal={The Visual Computer},
volume={20},
number={10},
pages={624--634},
year={2004},
publisher={Springer}
}
@article{cyberwalk,
title={{CyberWalk}: a {Web}-based distributed virtual walkthrough environment},
author={Chim, Jimmy and Lau, Rynson WH and Leong, Hong Va and Si, Antonio},
journal={Multimedia, IEEE Transactions on},
volume={5},
number={4},
pages={503--515},
year={2003},
publisher={IEEE}
}
@article{prefetching-walkthrough-latency,
author = {Hung, Shao-Shin and Liu, Damon Shing-Min},
title = {Using Prefetching to Improve Walkthrough Latency: Research Articles},
journal = {Comput. Animat. Virtual Worlds},
volume = {17},
number = {3-4},
month = jul,
year = {2006},
issn = {1546-4261},
pages = {469--478},
numpages = {10},
url = {http://dx.doi.org/10.1002/cav.v17:3/4},
doi = {10.1002/cav.v17:3/4},
acmid = {1144489},
publisher = {John Wiley and Sons Ltd.},
address = {Chichester, UK},
keywords = {clustering, latency, mining, pattern growth, prefetching, walkthrough},
}
@inproceedings{caching-prefetching-dve,
title={Scalable data management using user-based caching and prefetching in distributed virtual environments},
author={Park, Sungju and Lee, Dongman and Lim, Mingyu and Yu, Chansu},
booktitle={Proceedings of the ACM Symposium on Virtual Reality Software and Technology},
pages={121--126},
year={2001},
month=nov,
address={Banff, Canada}
}
@article{learning-user-access-patterns,
author={Zhong Zhou and Ke Chen and Jingchang Zhang},
journal={IEEE Transactions on Multimedia},
title={Efficient {3-D} Scene Prefetching From Learning User Access Patterns},
year={2015},
volume={17},
number={7},
pages={1081-1095},
doi={10.1109/TMM.2015.2430817},
ISSN={1520-9210},
month={July},
}
@inproceedings{remote-rendering-streaming,
title={Prediction-based prefetching for remote rendering streaming in mobile virtual environments},
author={Lazem, Shaimaa and Elteir, Marwa and Abdel-Hamid, Ayman and Gracanin, Denis},
booktitle={Signal Processing and Information Technology, 2007 IEEE International Symposium on},
pages={760--765},
year={2007},
organization={IEEE}
}
@inproceedings{prefetching-remote-walkthroughs,
title={Prefetching policies for remote walkthroughs},
author={Zach, Christopher and Karner, Konrad},
booktitle={Proceedings of the 10th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2002)},
pages={153--159},
year={2002},
}
@article{cache-remote-visualization,
author = {Robert Sisneros and Chad Jones and Jian Huang and Jinzhu Gao and Byung-Hoon Park and Nagiza Samatova},
title = {A Multi-Level Cache Model for Run-Time Optimization of Remote Visualization},
journal ={IEEE Transactions on Visualization and Computer Graphics},
volume = {13},
number = {5},
issn = {1077-2626},
year = {2007},
pages = {991-1003},
}
@article{interaction-3d-environment,
author = {Jacek Jankowski and Martin Hachet},
title = {Advances in Interaction with 3D Environments},
journal = {Comput. Graph. Forum},
volume = {34},
number = {1},
pages = {152--190},
year = {2015},
}
@inproceedings{controlled-movement-virtual-3d,
title={Rapid controlled movement through a virtual 3D workspace},
author={Mackinlay, Jock D and Card, Stuart K and Robertson, George G},
booktitle={ACM SIGGRAPH Computer Graphics},
volume={24},
number={4},
pages={171--176},
year={1990},
organization={ACM}
}
@inproceedings{two-pointer-input,
title={Two pointer input for 3D interaction},
author={Zeleznik, Robert C and Forsberg, Andrew S and Strauss, Paul S},
booktitle={Proceedings of the 1997 symposium on Interactive 3D graphics},
pages={115--120},
year={1997},
organization={ACM}
}
@inproceedings{drag-n-go,
title={Drag'n Go: Simple and fast navigation in virtual environment},
author={Moerman, Cl{\'e}ment and Marchal, Damien and Grisoni, Laurent},
booktitle={3D User Interfaces (3DUI), 2012 IEEE Symposium on},
pages={15--18},
year={2012},
organization={IEEE}
}
@article{visual-perception-3d,
title={The visual perception of 3D shape},
author={Todd, James T},
journal={Trends in cognitive sciences},
volume={8},
number={3},
pages={115--121},
year={2004},
publisher={Elsevier}
}
@inproceedings{showmotion,
title={ShowMotion: camera motion based 3D design review},
author={Burtnyk, Nicolas and Khan, Azam and Fitzmaurice, George and Kurtenbach, Gordon},
booktitle={Proceedings of the 2006 symposium on Interactive 3D graphics and games},
pages={167--174},
year={2006},
organization={ACM}
}
@inproceedings{dual-mode-ui,
title={A dual-mode user interface for accessing 3D content on the world wide web},
author={Jankowski, Jacek and Decker, Stefan},
booktitle={Proceedings of the 21st international conference on World Wide Web},
pages={1047--1056},
year={2012},
organization={ACM}
}
@inproceedings{browsing-3d-bookmarks,
author = {Serge Rezzonico and Daniel Thalmann},
title = {Browsing {3D} bookmarks in {BED}},
address = {San Francisco, California, USA},
booktitle = {Proceedings of WebNet 96 - World Conference of the Web Society},
month = {October},
year = {1996},
}
@article{ve-hyperlinks,
title={The effects of hyperlinks on navigation in virtual environments},
author={Ruddle, Roy A and Howes, Andrew and Payne, Stephen J and Jones, Dylan M},
journal={International Journal of Human-Computer Studies},
volume={53},
number={4},
pages={551--581},
year={2000},
publisher={Elsevier}
}
@inproceedings{linking-behavior-ve,
title={Linking behavior in a virtual world environment},
author={Eno, Joshua and Gauch, Susan and Thompson, Craig W},
booktitle={Proceedings of the 15th International Conference on Web 3D Technology},
pages={157--164},
year={2010},
organization={ACM}
}
@inproceedings{navigation-aid-multi-floor,
title={Navigation aids for multi-floor virtual buildings: A comparative evaluation of two approaches},
author={Chittaro, Luca and Venkataraman, Subramanian},
booktitle={Proceedings of the ACM symposium on Virtual Reality Software and Technology},
pages={227--235},
year={2006},
organization={ACM}
}
@inproceedings{viewcube,
author = {Khan, Azam and Mordatch, Igor and Fitzmaurice, George and Matejka, Justin and Kurtenbach, Gordon},
title = {ViewCube: A 3D Orientation Indicator and Controller},
booktitle = {Proceedings of the 2008 Symposium on Interactive 3D Graphics and Games},
series = {I3D '08},
year = {2008},
pages = {17--25},
numpages = {9},
publisher = {ACM},
}
@inproceedings{location-pointing-navigation-aid,
author = {Luca Chittaro and Stefano Burigat},
title = {3D location-pointing as a navigation aid in Virtual Environments},
booktitle = {Proceedings of the working conference on Advanced visual interfaces, {AVI} 2004, Gallipoli, Italy, May 25-28, 2004},
pages = {267--274},
year = {2004},
}
@article{location-pointing-effect,
author = {Stefano Burigat and Luca Chittaro},
title = {Navigation in 3D virtual environments: Effects of user experience and location-pointing navigation aids},
journal = {International Journal of Man-Machine Studies},
volume = {65},
number = {11},
pages = {945--958},
year = {2007},
}
@misc{so-survey-2016,
title = {Stack Overflow Developer Survey Results},
year = {2016},
howpublished = {\url{https://insights.stackoverflow.com/survey/2016#technology-most-loved-dreaded-and-wanted}}
}
@misc{so-survey-2017,
title = {Stack Overflow Developer Survey Results},
year = {2017},
howpublished = {\url{https://insights.stackoverflow.com/survey/2017#technology-most-loved-dreaded-and-wanted}}
}
@misc{so-survey-2018,
title = {Stack Overflow Developer Survey Results},
year = {2018},
howpublished = {\url{https://insights.stackoverflow.com/survey/2018#technology-most-loved-dreaded-and-wanted}}
}
@misc{so-survey-2019,
title = {Stack Overflow Developer Survey Results},
year = {2019},
howpublished = {\url{https://insights.stackoverflow.com/survey/2019#technology-most-loved-dreaded-and-wanted}}
}
@misc{3d-tiles-10x,
title = {Up to 10x Faster 3D Tiles Streaming},
author = {{Cesium}},
year = {2019},
howpublished = {\url{https://cesium.com/blog/2019/05/07/faster-3d-tiles/}}
}
@inproceedings{urban-data-visualisation,
author = {Gaillard, J{\'e}r{\'e}my and Vienne, Alexandre and Baume, R{\'e}mi and Pedrinis, Fr{\'e}d{\'e}ric and Peytavie, Adrien and Gesqui\`{e}re, Gilles},
title = {Urban Data Visualisation in a Web Browser},
booktitle = {Proceedings of the 20th International Conference on 3D Web Technology},
series = {Web3D '15},
year = {2015},
isbn = {978-1-4503-3647-5},
address = {Heraklion, Crete, Greece},
pages = {81--88},
numpages = {8},
url = {http://doi.acm.org/10.1145/2775292.2775302},
doi = {10.1145/2775292.2775302},
acmid = {2775302},
publisher = {ACM},
keywords = {3D virtual city, WebGL, spatial information, standards},
}
@article{visibility-determination,
title={Quantitative analysis of visibility determinations for networked virtual environments},
author={Seo, Beomjoo and Zimmermann, Roger},
journal={Journal of Visual Communication and Image Representation},
volume={23},
number={5},
pages={705--718},
year={2012},
publisher={Elsevier}
}
@article{second-life,
title={Exploring {S}econd {L}ife},
author={Varvello, Matteo and Ferrari, Stefano and Biersack, Ernst and Diot, Christophe},
journal={IEEE/ACM Transactions on Networking (TON)},
volume={19},
number={1},
pages={80--91},
year={2011},
}
@inproceedings{3d-tiles,
author = {Schilling, Arne and Bolling, Jannes and Nagel, Claus},
title = {Using glTF for Streaming CityGML 3D City Models},
booktitle = {Proceedings of the 21st International Conference on Web3D Technology},
series = {Web3D '16},
year = {2016},
isbn = {978-1-4503-4428-9},
location = {Anaheim, California},
pages = {109--116},
numpages = {8},
url = {http://doi.acm.org/10.1145/2945292.2945312},
doi = {10.1145/2945292.2945312},
acmid = {2945312},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {3D city models, CityGML, WebGL, browser integration, geographic information system, web streaming},
}
@inproceedings{youtube-dash-analysis,
title={{YouTube's} DASH implementation analysis},
author={A{\~n}orga, Javier and Arrizabalaga, Saioa and Sedano, Beatriz and Alonso-Arce, Maykel and Mendizabal, Jaizki},
booktitle={19th International Conference on Circuits, Systems, Communications and Computers (CSCC)},
pages={61--66},
year={2015}
}
@inproceedings{batched-multi-triangulation,
title={Batched multi triangulation},
author={Cignoni, Paolo and Ganovelli, Fabio and Gobbetti, Enrico and Marton, Fabio and Ponchio, Federico and Scopigno, Roberto},
booktitle={VIS 05. IEEE Visualization, 2005.},
pages={207--214},
year={2005},
organization={IEEE}
}
@article{3dhop,
title={3DHOP: 3D heritage online presenter},
author={Potenziani, Marco and Callieri, Marco and Dellepiane, Matteo and Corsini, Massimiliano and Ponchio, Federico and Scopigno, Roberto},
journal={Computers \& Graphics},
volume={52},
pages={129--141},
year={2015},
publisher={Elsevier}
}
@inproceedings{progressive-compression-textured-meshes,
title={Cost-driven framework for progressive compression of textured meshes},
author={Portaneri, C{\'e}dric and Alliez, Pierre and Hemmer, Michael and Birklein, Lukas and Schoemer, Elmar},
booktitle={Proceedings of the 10th ACM Multimedia Systems Conference},
pages={175--188},
year={2019},
organization={ACM}
}
@article{zampoglou,
title={Adaptive streaming of complex Web 3D scenes based on the MPEG-DASH standard},
author={Zampoglou, Markos and Kapetanakis, Kostas and Stamoulias, Andreas and Malamos, Athanasios G and Panagiotakis, Spyros},
journal={Multimedia Tools and Applications},
volume={77},
number={1},
pages={125--148},
year={2018},
publisher={Springer}
}
@article{batex3,
title={Batex3: Bit allocation for progressive transmission of textured 3-d models},
author={Tian, Dihong and AlRegib, Ghassan},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={18},
number={1},
pages={23--35},
year={2008},
publisher={IEEE}
}
@article{visual-quality-assessment,
title={Subjective and objective visual quality assessment of textured 3D meshes},
author={Guo, Jinjiang and Vidal, Vincent and Cheng, Irene and Basu, Anup and Baskurt, Atilla and Lavoue, Guillaume},
journal={ACM Transactions on Applied Perception (TAP)},
volume={14},
number={2},
pages={11},
year={2017},
publisher={ACM}
}
@inproceedings{mesh-texture-multiplexing,
author = {Yang, Sheng and Lee, Chao-Hua and Kuo, C.-C. Jay},
title = {Optimized Mesh and Texture Multiplexing for Progressive Textured Model Transmission},
booktitle = {Proceedings of the 12th Annual ACM International Conference on Multimedia},
series = {MULTIMEDIA '04},
month = {Oct},
year = {2004},
isbn = {1-58113-893-8},
location = {New York, NY, USA},
pages = {676--683},
numpages = {8},
url = {http://doi.acm.org/10.1145/1027527.1027683},
doi = {10.1145/1027527.1027683},
acmid = {1027683},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {mesh-texture multiplexing, progressive transmission, rate-distortion surface}
}
@inproceedings{x3dom-scalable,
address = {Los Angeles, California},
author = {Behr, J. and Jung, Y. and Keil, J. and Drevensek, T. and Zoellner, M. and Eschler, P. and Fellner, D.},
booktitle = {Proceedings of the 15th International Conference on Web 3D Technology - Web3D '10},
location = {Los Angeles, California},
doi = {10.1145/1836049.1836077},
isbn = {9781450302098},
pages = {185-194},
publisher = {ACM},
title = {{A scalable architecture for the HTML5/X3D integration model X3DOM}},
url = {http://portal.acm.org/citation.cfm?doid=1836049.1836077},
month = {Jul},
year = {2010}
}
@inproceedings{pop-buffer,
title={The pop buffer: Rapid progressive clustering by geometry quantization},
author={Limper, Max and Jung, Yvonne and Behr, Johannes and Alexa, Marc},
booktitle={Computer Graphics Forum},
volume={32},
number={7},
pages={197--206},
year={2013},
organization={Wiley Online Library}
}
@inproceedings{streaming-compressed-webgl,
title={Streaming compressed 3D data on the web using JavaScript and WebGL},
author={Lavou{\'e}, Guillaume and Chevalier, Laurent and Dupont, Florent},
booktitle={Proceedings of the 18th international conference on 3D web technology},
pages={19--27},
year={2013},
organization={ACM}
}
@inproceedings{sideris2015mpeg,
title={MPEG-DASH users' QoE: The segment duration effect},
author={Sideris, Anargyros and Markakis, E and Zotos, Nikos and Pallis, Evangelos and Skianis, Charalabos},
booktitle={2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX)},
pages={1--6},
year={2015},
organization={IEEE}
}
@inproceedings{stohr2017sweet,
title={Where are the sweet spots?: A systematic approach to reproducible dash player comparisons},
author={Stohr, Denny and Fr{\"o}mmgen, Alexander and Rizk, Amr and Zink, Michael and Steinmetz, Ralf and Effelsberg, Wolfgang},
booktitle={Proceedings of the 25th ACM international conference on Multimedia},
pages={1113--1121},
year={2017},
organization={ACM}
}
@inproceedings{chiariotti2016online,
title={Online learning adaptation strategy for DASH clients},
author={Chiariotti, Federico and D'Aronco, Stefano and Toni, Laura and Frossard, Pascal},
booktitle={Proceedings of the 7th International Conference on Multimedia Systems},
pages={8},
year={2016},
organization={ACM}
}
@inproceedings{yadav2017quetra,
title={Quetra: A queuing theory approach to dash rate adaptation},
author={Yadav, Praveen Kumar and Shafiei, Arash and Ooi, Wei Tsang},
booktitle={Proceedings of the 25th ACM international conference on Multimedia},
pages={1130--1138},
year={2017},
organization={ACM}
}
@inproceedings{huang2019hindsight,
title={Hindsight: evaluate video bitrate adaptation at scale},
author={Huang, Te-Yuan and Ekanadham, Chaitanya and Berglund, Andrew J and Li, Zhi},
booktitle={Proceedings of the 10th ACM Multimedia Systems Conference},
pages={86--97},
year={2019},
organization={ACM}
}
@inproceedings{ozcinar2017viewport,
title={Viewport-aware adaptive 360 video streaming using tiles for virtual reality},
author={Ozcinar, Cagri and De Abreu, Ana and Smolic, Aljosa},
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
pages={2174--2178},
year={2017},
organization={IEEE}
}
@inproceedings{simon2019streaming,
title={Streaming a Sequence of Textures for Adaptive 3D Scene Delivery},
author={Simon, Gwendal and Petrangeli, Stefano and Carr, Nathan and Swaminathan, Viswanathan},
booktitle={2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR)},
pages={1159--1160},
year={2019},
organization={IEEE}
}
@article{maglo2013pomar,
title={POMAR: Compression of progressive oriented meshes accessible randomly},
author={Maglo, Adrien and Grimstead, Ian and Hudelot, C{\'e}line},
journal={Computers \& Graphics},
volume={37},
number={6},
pages={743--752},
year={2013},
publisher={Elsevier}
}
@article{bayazit20093,
title={3-D mesh geometry compression with set partitioning in the spectral domain},
author={Bayazit, Ulug and Konur, Umut and Ates, Hasan Fehmi},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={20},
number={2},
pages={179--188},
year={2009},
publisher={IEEE}
}
@inproceedings{mamou2010shape,
title={Shape approximation for efficient progressive mesh compression},
author={Mamou, Khaled and Dehais, Christophe and Chaieb, Faten and Ghorbel, Faouzi},
booktitle={2010 IEEE International Conference on Image Processing},
pages={3425--3428},
year={2010},
organization={IEEE}
}
@inproceedings{isenburg2006streaming,
title={Streaming compression of tetrahedral volume meshes},
author={Isenburg, Martin and Lindstrom, Peter and Gumhold, Stefan and Shewchuk, Jonathan},
booktitle={Proceedings of Graphics Interface 2006},
pages={115--121},
year={2006},
organization={Canadian Information Processing Society}
}
@article{courbet2010streaming,
title={Streaming compression of hexahedral meshes},
author={Courbet, Clement and Isenburg, Martin},
journal={The Visual Computer},
volume={26},
number={6-8},
pages={1113--1122},
year={2010},
publisher={Springer}
}
@inproceedings{gaillard2015urban,
title={Urban data visualisation in a web browser},
author={Gaillard, J{\'e}r{\'e}my and Vienne, Alexandre and Baume, R{\'e}mi and Pedrinis, Fr{\'e}d{\'e}ric and Peytavie, Adrien and Gesqui{\`e}re, Gilles},
booktitle={Proceedings of the 20th International Conference on 3D Web Technology},
pages={81--88},
year={2015},
organization={ACM}
}
@article{maglo20153d,
title={{3D} mesh compression: Survey, comparisons, and emerging trends},
author={Maglo, Adrien and Lavou{\'e}, Guillaume and Dupont, Florent and Hudelot, C{\'e}line},
journal={ACM Computing Surveys (CSUR)},
volume={47},
number={3},
pages={44},
year={2015},
publisher={ACM}
}
@inproceedings{portaneri2019cost,
title={Cost-driven framework for progressive compression of textured meshes},
author={Portaneri, C{\'e}dric and Alliez, Pierre and Hemmer, Michael and Birklein, Lukas and Schoemer, Elmar},
booktitle={Proceedings of the 10th ACM Multimedia Systems Conference},
pages={175--188},
year={2019},
organization={ACM}
}
@inproceedings{cheng2008receiver,
title={Receiver-driven view-dependent streaming of progressive mesh},
author={Cheng, Wei and Ooi, Wei Tsang},
booktitle={Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video},
pages={9--14},
year={2008},
organization={ACM}
}
@inproceedings{demir2016proceduralization,
title={Proceduralization for editing 3d architectural models},
author={Demir, Ilke and Aliaga, Daniel G and Benes, Bedrich},
booktitle={2016 Fourth International Conference on 3D Vision (3DV)},
pages={194--202},
year={2016},
organization={IEEE}
}
@article{demir2018guided,
title={Guided proceduralization: Optimizing geometry processing and grammar extraction for architectural models},
author={Demir, Ilke and Aliaga, Daniel G},
journal={Computers \& Graphics},
volume={74},
pages={257--267},
year={2018},
publisher={Elsevier}
}
@inproceedings{du2011tilt,
title={Tilt \& touch: Mobile phone for 3D interaction},
author={Du, Yuan and Ren, Haoyi and Pan, Gang and Li, Shijian},
booktitle={Proceedings of the 13th international conference on Ubiquitous computing},
pages={485--486},
year={2011},
organization={ACM}
}
@article{streaming-hlod,
title={Streaming HLODs: an out-of-core viewer for network visualization of huge polygon models},
author={Guthe, Michael and Klein, Reinhard},
journal={Computers \& Graphics},
volume={28},
number={1},
pages={43--50},
year={2004},
publisher={Elsevier}
}
@inproceedings{hlod,
title={HLODs for faster display of large static and dynamic environments},
author={Erikson, Carl and Manocha, Dinesh and Baxter III, William V},
booktitle={Proceedings of the 2001 symposium on Interactive 3D graphics},
pages={111--120},
year={2001},
organization={ACM}
}
@techreport{lod,
title={Real-time, continuous level of detail rendering of height fields},
author={Lindstrom, Peter and Koller, David and Ribarsky, William and Hodges, Larry F and Faust, Nick L and Turner, Gregory},
year={1996},
institution={Georgia Institute of Technology}
}
@article{game-on-demand,
title={Game-on-demand: An online game engine based on geometry streaming},
author={Li, Frederick WB and Lau, Rynson WH and Kilis, Danny and Li, Lewis WF},
journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
volume={7},
number={3},
pages={19},
year={2011},
publisher={ACM}
}
@inproceedings{hoppe-lod,
title={Smooth view-dependent level-of-detail control and its application to terrain rendering},
author={Hoppe, Hugues},
booktitle={Proceedings Visualization'98 (Cat. No. 98CB36276)},
pages={35--42},
year={1998},
organization={IEEE}
}
@inproceedings{view-dependent-lod,
title={Streaming transmission of point-sampled geometry based on view-dependent level-of-detail},
author={Meng, Fang and Zha, Hongbin},
booktitle={Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings.},
pages={466--473},
year={2003},
organization={IEEE}
}
@inproceedings{mipmap-streaming,
title={Remote rendering of massively textured 3D scenes through progressive texture maps},
author={Marvie, Jean Eudes and Bouatouch, Kadi},
booktitle={The 3rd IASTED conference on Visualisation, Imaging and Image Processing},
volume={2},
pages={756--761},
year={2003}
}


@ -17,4 +17,5 @@
if count {
pagebreak()
}
counter(heading).update(0)
}


@ -292,16 +292,16 @@ This is typically the case of the video game #link("http://nolimitscoaster.com/"
Finally, most of the other interfaces give at least 5 degrees of freedom to the user: 3 being the coordinates of the camera's position, and 2 being the angles (assuming the up vector is unchangeable, some interfaces might allow that, giving a sixth degree of freedom).
The most common controls are the trackball controls where the user rotates the object like a ball
#link("https://threejs.org/examples/?q=controls\#misc_controls_trackball")[(live example here)] and the orbit controls, which behave like the trackball controls but preserving the up vector #link("https://threejs.org/examples/?q=controls\#misc_controls_orbit")[(live example here)].
#link("https://threejs.org/examples/?q=controls#misc_controls_trackball")[(live example here)] and the orbit controls, which behave like the trackball controls but preserving the up vector #link("https://threejs.org/examples/?q=controls#misc_controls_orbit")[(live example here)].
These types of controls are notably used on the popular mesh editor #link("http://www.meshlab.net/")[MeshLab] and
#link("https://sketchfab.com/")[SketchFab], the YouTube for 3D models.
#figure(
image("../assets/state-of-the-art/3d-interaction/meshlab.png", width: 80%),
image("../assets/related-work/3d-interaction/meshlab.png", width: 80%),
caption: [Screenshot of MeshLab],
)
Another popular way of controlling a free camera in a virtual environment is the first person controls #link("https://threejs.org/examples/?q=controls\#misc_controls_pointerlock")[(live example here)].
Another popular way of controlling a free camera in a virtual environment is the first person controls #link("https://threejs.org/examples/?q=controls#misc_controls_pointerlock")[(live example here)].
These controls are typically used in shooting video games, the mouse rotates the camera and the keyboard translates it.
== Relationship between interface, interaction and streaming


@ -4,7 +4,7 @@
During the last years, 3D acquisition and modeling techniques have made tremendous progress.
Recent software uses 2D images from cameras to reconstruct 3D data, e.g.
#link("https://alicevision.org/\#meshroom")[Meshroom] is a free and open source software which got almost 200.000 downloads on #link("https://www.fosshub.com/Meshroom.html")[fosshub], which use _structure-from-motion_ and _multi-view-stereo_ to infer a 3D model.
#link("https://alicevision.org/#meshroom")[Meshroom] is a free and open source software which got almost 200.000 downloads on #link("https://www.fosshub.com/Meshroom.html")[fosshub], which use _structure-from-motion_ and _multi-view-stereo_ to infer a 3D model.
More and more devices are specifically built to harvest 3D data: for example, LIDAR (Light Detection And Ranging) can compute 3D distances by measuring time of flight of light. The recent research interest for autonomous vehicles allowed more companies to develop cheaper LIDARs, which increase the potential for new 3D content creation.
Thanks to these techniques, more and more 3D data become available.
These models have potential for multiple purposes, for example, they can be printed, which can reduce the production cost of some pieces of hardware or enable the creation of new objects, but most uses are based on visualization.


@ -5,6 +5,11 @@
content
}
#show cite: content => {
set text(fill: blue)
content
}
// Code formatting
#show raw.where(block: true): it => {
set par(justify: false)
@ -95,6 +100,7 @@
// Content of the thesis
#pagebreak()
#set text(size: 11pt)
#set heading(numbering: "1.1")
#include "introduction/main.typ"
@ -107,3 +113,26 @@
#pagebreak()
#include "foreword/main.typ"
#pagebreak()
#include "related-work/main.typ"
// Bibliography
#pagebreak()
#bibliography("bib.bib", style: "chicago-author-date")
#pagebreak()
// Abstracts
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Abstract]
#set text(size: 8pt)
#include "abstracts/en.typ"
#set text(size: 11pt)
#heading(level: 4, numbering: none)[Résumé]
#set text(size: 8pt)
#include "abstracts/fr.typ"


@ -0,0 +1,48 @@
= 3D bookmarks and navigation aids
One of the uses for 3D streaming is to allow users to interact with the content while it is being downloaded.
However, devising an ergonomic technique for browsing 3D environments through a 2D interface is difficult.
Controlling the viewpoint in 3D (6 DOFs) with 2D devices is not only inherently challenging but also strongly task-dependent. In their review, #cite("interaction-3d-environment") distinguish between several types of camera movements: general movements for exploration (e.g., navigation with no explicit target), targeted movements (e.g., searching and/or examining a model in detail), specified trajectory (e.g., a cinematographic camera path).
For each type of movement, specialized 3D interaction techniques can be designed.
In most cases, rotating, panning, and zooming movements are required, and users are consequently forced to switch back and forth among several navigation modes, leading to interactions that are too complicated overall for a layperson.
Navigation aids and smart widgets are required and subject to research efforts both in 3D companies (see #link("https://sketchfab.com")[sketchfab.com], #link("https://cl3ver.com")[cl3ver.com] among others) and in academia, as reported below.
Translating and rotating the camera can be simply specified by a _lookat_ point.
This is often known as point-of-interest (POI) movement (or _go-to_, _fly-to_ interactions) #cite("controlled-movement-virtual-3d").
Given such a point, the camera automatically moves from its current position to a new position that looks at the POI.
One key issue of these techniques is to correctly orient the camera at destination.
In Unicam #cite("two-pointer-input"), the so-called click-to-focus strategy automatically chooses the destination viewpoint depending on 3D orientations around the contact point.
The more recent Drag'n Go interaction #cite("drag-n-go") also hits a destination point while offering control on speed and position along the camera path.
This 3D interaction is designed in the screen space (it is typically a mouse-based camera control), where cursor movements are mapped to camera movements following the same direction as the on-screen optical flow.
#figure(
image("../assets/related-work/3d-interaction/dragngo.png", width: 70%),
caption: [Screenshot of the drag'n go interface #cite("drag-n-go") (the percentage widget is for illustration)]
)
Some 3D browsers provide a viewpoint menu offering a choice of viewpoints #cite("visual-perception-3d", "showmotion").
Authors of 3D scenes can place several viewpoints (typically for each POI) in order to allow easy navigation for users, who can then easily navigate from viewpoint to viewpoint just by selecting a menu item.
Such viewpoints can be either static, or dynamically adapted: #cite("dual-mode-ui") report that users clearly prefer navigating in 3D using a menu with animated viewpoints rather than static ones.
#figure(
image("../assets/related-work/3d-interaction/burtnyk.png", width: 70%),
caption: [Screenshot of an interface with menu for navigation #cite("showmotion")]
)
Early 3D VRML environments #cite("browsing-3d-bookmarks") offer 3D bookmarks with animated transitions between bookmarked views.
These transitions prevent disorientation since users see how they got there.
Hyperlinks can also ease rapid movements between distant viewpoints and naturally support non-linear and non-continuous access to 3D content.
Navigating with 3D hyperlinks is faster due to the instant motion, but can cause disorientation, as shown by the work of #cite("ve-hyperlinks").
#cite("linking-behavior-ve") examine explicit landmark links as well as implicit avatar-chosen links in Second Life.
These authors point out that linking is appreciated by users and that easing linking would likely result in a richer user experience.
#cite("dual-mode-ui") developed the Dual-Mode User Interface (DMUI) that coordinates and links hypertext to 3D graphics in order to access information in a 3D space.
#figure(
image("../assets/related-work/3d-interaction/dmui.png", width: 100%),
caption: [The two modes of DMUI #cite("dual-mode-ui")]
)
The use of in-scene 3D navigation widgets can also facilitate 3D navigation tasks.
#cite("navigation-aid-multi-floor") propose and evaluate 2D and 3D maps as navigation aids for complex virtual buildings and find that the 2D navigation aid outperforms the 3D one for searching tasks.
The ViewCube widget #cite("viewcube") serves as a proxy for the 3D scene and offers viewpoint switching between 26 views while clearly indicating associated 3D orientations.
Interactive 3D arrows that point to objects of interest have also been proposed as navigation aids in #cite("location-pointing-navigation-aid", "location-pointing-effect"): when clicked, the arrows transfer the viewpoint to the destination through a simulated walk or a faster flight.


@ -0,0 +1,219 @@
= 3D streaming
In this thesis, we focus on the objective of delivering large, massive 3D scenes over the network.
While 3D streaming is not the most popular research field, special attention has been paid to 3D content compression, in particular progressive compression, which can be considered a premise for 3D streaming.
In the next sections, we review the related work on 3D streaming, from 3D compression and structuring to 3D interaction.
== Compression and structuring
According to #cite("maglo20153d"), mesh compression can be divided into four categories:
- single-rate mesh compression, seeking to reduce the size of a mesh;
- progressive mesh compression, encoding meshes in many levels of resolution that can be downloaded and rendered one after the other;
- random accessible mesh compression, where different parts of the models can be decoded in an arbitrary order;
- mesh sequence compression, compressing mesh animations.
Since our objective is to stream 3D static scenes, single-rate mesh and mesh sequence compressions are less interesting for us.
This section thus focuses on progressive meshes and random accessible mesh compression.
Progressive meshes were introduced in #cite("progressive-meshes") and allow a progressive transmission of a mesh by sending a low resolution mesh first, called _base mesh_, and then transmitting detail information that a client can use to increase the resolution.
To do so, an algorithm, called _decimation algorithm_, starts from the original full resolution mesh and iteratively removes vertices and faces by merging vertices through the so-called _edge collapse_ operation (Figure X).
// \begin{figure}[ht]
// \centering
// \begin{tikzpicture}[scale=2]
// \node (Top1) at (0.5, 1) {};
// \node (A) at (0, 0.8) {};
// \node (B) at (1, 0.9) {};
// \node (C) at (1.2, 0) {};
// \node (D) at (0.9, -0.8) {};
// \node (E) at (0.2, -0.9) {};
// \node (F) at (-0.2, 0) {};
// \node (G) at (0.5, 0.5) {};
// \node (H) at (0.6, -0.5) {};
// \node (Bottom1) at (0.5, -1) {};
//
// \node (Top2) at (3.5, 1) {};
// \node (A2) at (3, 0.8) {};
// \node (B2) at (4, 0.9) {};
// \node (C2) at (4.2, 0) {};
// \node (D2) at (3.9, -0.8) {};
// \node (E2) at (3.2, -0.9) {};
// \node (F2) at (2.8, 0) {};
// \node (G2) at (3.55, 0) {};
// \node (Bottom2) at (3.5, -1) {};
//
// \draw (A.center) -- (B.center) -- (C.center) -- (D.center) -- (E.center) -- (F.center) -- (A.center);
// \draw (A.center) -- (G.center);
// \draw (B.center) -- (G.center);
// \draw (C.center) -- (G.center);
// \draw (F.center) -- (G.center);
// \draw (C.center) -- (H.center);
// \draw (F.center) -- (H.center);
// \draw (E.center) -- (H.center);
// \draw (D.center) -- (H.center);
// \draw[color=red, line width=1mm] (G.center) -- (H.center);
//
// \draw (A2.center) -- (B2.center) -- (C2.center) -- (D2.center) -- (E2.center) -- (F2.center) -- (A2.center);
// \draw (A2.center) -- (G2.center);
// \draw (B2.center) -- (G2.center);
// \draw (C2.center) -- (G2.center);
// \draw (F2.center) -- (G2.center);
// \draw (E2.center) -- (G2.center);
// \draw (D2.center) -- (G2.center);
// \node at (G2) [circle,fill=red,inner sep=2pt]{};
//
// \draw[-{Latex[length=3mm]}] (Top1) to [out=30, in=150] (Top2);
// \draw[-{Latex[length=3mm]}] (Bottom2) to [out=-150, in=-30] (Bottom1);
//
// \node at (2, 1.75) {Edge collapse};
// \node at (2, -1.75) {Vertex split};
//
//
// \end{tikzpicture}
// \caption{Vertex split and edge collapse\label{sote:progressive-scheme}}
// \end{figure}
Every time two vertices are merged, a vertex and two faces are removed from the original mesh, decreasing the model resolution.
At the end of this content preparation phase, the mesh has been reorganized into a base mesh and a sequence of partially ordered vertex split operations.
Thus, a client can start by downloading the base mesh, display it to the user, and keep downloading refinement operations (vertex splits) and display details as time goes by.
This process reduces the time a user has to wait before seeing a downloaded 3D object, and thus increases the quality of experience.
#figure(
image("../assets/related-work/3d-streaming/progressivemesh.png", width: 100%),
caption: [Four levels of resolution of a mesh]
)
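
As an illustration, the client-side refinement loop can be sketched as follows. This is a toy sketch with hypothetical record types: real progressive-mesh codecs encode vertex splits much more compactly and also rewrite the connectivity of the faces surrounding the split vertex.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    vertices: list  # (x, y, z) positions
    faces: list     # (i, j, k) vertex-index triples

@dataclass
class VertexSplit:
    """One refinement record: the inverse of an edge collapse."""
    v: int               # index of the vertex to split (unused in this toy
                         # version; real codecs rewrite connectivity around it)
    new_position: tuple  # position of the vertex created by the split
    new_faces: list      # index triples to add around the new vertex

def apply_split(mesh, split):
    """Refine the mesh in place: one vertex split adds 1 vertex and 2 faces."""
    mesh.vertices.append(split.new_position)
    mesh.faces.extend(split.new_faces)

# The client first downloads and displays the base mesh...
base = Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)], faces=[(0, 1, 2)])
# ...then keeps downloading refinement records and applying them.
splits = [VertexSplit(v=0, new_position=(0.5, 0.5, 0.0),
                      new_faces=[(0, 1, 3), (0, 3, 2)])]
for s in splits:
    apply_split(base, s)  # re-render after each split for progressive display
```

Each applied record undoes one edge collapse, so resolution grows monotonically as records arrive.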
// These methods have been vastly researched #cite("bayazit20093", "mamou2010shape"), but very few of them can handle meshes with attributes, such as texture coordinates.
#cite("streaming-compressed-webgl") develop a dedicated progressive compression algorithm based on iterative decimation, for efficient decoding, in order to be usable on web clients.
With the same objective, #cite("pop-buffer") proposes pop buffer, a progressive compression method based on quantization that allows efficient decoding.
Following these, many approaches use multi triangulation, which creates mesh fragments at different levels of resolution and encodes the dependencies between fragments in a directed acyclic graph.
In #cite("batched-multi-triangulation"), the authors propose Nexus: a GPU optimized version of multi triangulation that pushes its performances to make real time rendering possible.
It is notably used in 3DHOP (3D Heritage Online Presenter, #cite("3dhop")), a framework to easily build web interfaces to present 3D objects to users in the context of cultural heritage.
Each of these approaches defines its own compression and coding for a single mesh.
However, users are often interested in scenes that contain multiple meshes, and the need to structure content emerged.
To address those issues, the Khronos group proposed a generic format called glTF (GL Transmission Format, #cite("gltf")) to handle all types of 3D content representations: point clouds, meshes, animated models, etc.
glTF is based on a JSON file, which encodes the structure of a scene of 3D objects.
It contains a scene graph with cameras, meshes, buffers, materials, textures and animations.
Although relevant for compression, transmission and in particular streaming, this standard does not yet consider view-dependent streaming, which is required for large scene remote visualization and which we address in our work.
== Viewpoint dependency
3D streaming means that content is downloaded while the user is interacting with the 3D object.
In terms of quality of experience, it is desirable that the downloaded content falls into the user's field of view.
This means that the progressive compression must encode spatial information in order to allow the decoder to determine content adapted to its viewpoint.
This is typically called _random accessible mesh compression_.
#cite("maglo2013pomar") is such an example of random accessible progressive mesh compression.
#cite("cheng2008receiver") propose a receiver-driven way of achieving viewpoint dependency with progressive meshes: the client starts by downloading the base mesh, and from then on is able to estimate the importance of the different vertex splits, in order to choose which ones to download.
Doing so drastically reduces the server's computational load, since it only has to send data, and improves the scalability of this framework.
In the case of streaming a large 3D scene, view-dependent streaming is fundamental: a user will only be seeing one small portion of the scene at each time, and a system that does not adapt its streaming to the user's point of view is bound to induce a low quality of experience.
A simple way to implement viewpoint dependency is to request the content that is spatially close to the user's camera.
This approach, implemented in Second Life and several other NVEs (e.g., #cite("peer-texture-streaming")), only depends on the location of the avatar, not on its viewing direction.
It exploits spatial coherence and works well for any continuous movement of the user, including turning.
Once the set of objects that are likely to be accessed by the user is determined, the next question is in what order these objects should be retrieved.
A simple approach is to retrieve the objects based on distance: the spatial distance from the user's virtual location and rotational distance from the user's view.
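
A minimal sketch of such an ordering could combine the two distances into a single cost; the weighting between spatial distance and viewing angle below is an arbitrary, hypothetical choice.

```python
import math

def retrieval_order(objects, cam_pos, cam_dir):
    """Rank objects for download: closer first, penalizing those that lie
    away from the viewing direction (cam_dir is assumed to be a unit vector)."""
    def cost(obj):
        to_obj = [o - c for o, c in zip(obj["position"], cam_pos)]
        dist = math.sqrt(sum(d * d for d in to_obj))
        if dist == 0.0:
            return 0.0
        # angle between the viewing direction and the direction to the object
        cos_a = sum(d * v for d, v in zip(to_obj, cam_dir)) / dist
        angle = math.acos(max(-1.0, min(1.0, cos_a)))
        return dist + 10.0 * angle  # hypothetical weight between the two terms
    return sorted(objects, key=cost)
```

With this cost, an object straight ahead ranks before a closer object behind the camera, reflecting that the field of view matters more than raw distance.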
More recently, Google integrated the Google Earth 3D module into Google Maps (@rw:google-maps).
Users can now go to Google Maps and click the 3D button, which shifts the camera from the aerial view.
Even though there are no associated publications to support this assertion, it seems clear that the streaming is view-dependent: low-resolution data around the center of the viewpoint is downloaded first, and closer objects are downloaded at a higher resolution than distant ones.
#figure(
image("../assets/related-work/3d-streaming/googlemaps.png", width: 80%),
caption: [Screenshot of the 3D interface of Google Maps]
)<rw:google-maps>
Other approaches use levels of detail.
Levels of detail were initially used for efficient 3D rendering #cite("lod").
An abrupt change from one level of detail to another can create visual discomfort for the user: this is called the _popping effect_.
Levels of detail have the advantage of enabling techniques, such as geomorphing #cite("hoppe-lod"), to transition smoothly from one level of detail to another.
Levels of detail have since been used for 3D streaming.
For example, #cite("streaming-hlod") propose an out-of-core viewer for remote model visualization, built by adapting hierarchical levels of detail #cite("hlod") to the context of 3D streaming.
Levels of detail can also be used to perform viewpoint-dependent streaming, as in #cite("view-dependent-lod").
== Texture streaming
In order to increase the texture rendering speed, a common technique is the _mipmapping_ technique.
It consists in generating progressively lower resolutions of an initial texture.
Lower resolutions of the textures are used for polygons which are far away from the camera, and higher resolutions for polygons closer to the camera.
Not only does this reduce the time needed to render the polygons, it also reduces the aliasing effect.
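
For intuition, a mipmap chain for a square power-of-two texture can be computed by repeatedly averaging 2×2 blocks of texels (a grayscale toy example; GPUs build and sample these levels natively).

```python
def mipmap_levels(texture):
    """Build the mipmap chain of a 2^n x 2^n grayscale texture by averaging
    2x2 blocks until the 1x1 level is reached."""
    levels = [texture]
    while len(levels[-1]) > 1:
        prev, size = levels[-1], len(levels[-1]) // 2
        levels.append([[(prev[2*y][2*x] + prev[2*y][2*x + 1] +
                         prev[2*y + 1][2*x] + prev[2*y + 1][2*x + 1]) / 4
                        for x in range(size)] for y in range(size)])
    return levels

# 4x4 checkerboard: the chain holds the 4x4, 2x2 and 1x1 levels.
chain = mipmap_levels([[0, 0, 255, 255],
                       [0, 0, 255, 255],
                       [255, 255, 0, 0],
                       [255, 255, 0, 0]])
```

Distant polygons then sample the coarse levels, while nearby polygons sample the full-resolution one.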
Using these lower resolutions can be especially interesting for streaming.
#cite("mipmap-streaming") proposes the PTM format, which encodes the mipmap levels of a texture so that they can be downloaded progressively: a lower resolution can be shown to the user while the higher resolutions are being downloaded.
Since 3D data can contain many textures, #cite("simon2019streaming") propose a way to stream a set of textures by encoding them into a video.
Each texture is segmented into tiles of a fixed size.
Those tiles are then ordered to minimize dissimilarities between consecutive tiles, and encoded as a video.
By benefiting from video compression techniques, the authors are able to reach a better rate-distortion ratio than WebP, the new standard for texture transmission, and JPEG.
== Geometry and textures
As discussed in the foreword, most 3D scenes consist of two main types of data: geometry and textures.
When addressing 3D streaming, one must handle the competition between geometry and textures for the available bandwidth, and the system needs to address this compromise.
Balancing between streaming of geometry and texture data is addressed by #cite("batex3"), #cite("visual-quality-assessment"), and #cite("mesh-texture-multiplexing").
Their approaches combine the distortion caused by having lower resolution meshes and textures into a single view-independent metric.
#cite("progressive-compression-textured-meshes") also deals with the geometry / texture compromise.
This work designs a cost driven framework for 3D data compression, both in terms of geometry and textures.
The authors generate an atlas for textures that enables efficient compression and multi-resolution scheme.
All four works considered a single mesh, and have constraints on the types of meshes that they are able to compress.
Since the 3D scenes we are interested in consist of soups of textured polygons, those constraints are not satisfied and we cannot use those techniques.
// All four works considered a single, manifold textured mesh model with progressive meshes, and are not applicable in our work since we deal with large and potentially non-manifold scenes.
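
To make the compromise concrete, a streaming scheduler can score candidate geometry and texture chunks by estimated quality gain per byte and greedily fetch the best one. This is only an illustrative heuristic with hypothetical field names, not the metric used by the works cited above.

```python
def next_chunk(candidates, budget):
    """Pick the chunk (geometry or texture) with the best quality-gain per
    byte among those fitting the remaining byte budget."""
    affordable = [c for c in candidates if c["bytes"] <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda c: c["gain"] / c["bytes"])

chunk = next_chunk(
    [{"kind": "geometry", "gain": 3.0, "bytes": 1000},
     {"kind": "texture",  "gain": 2.0, "bytes": 400}],
    budget=2000)
# Here the texture wins: 2.0 / 400 > 3.0 / 1000.
```

Repeating this choice after every download naturally interleaves geometry and texture traffic according to which currently improves the rendering the most.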
== Streaming in game engines
In traditional video games, including online games, there is no requirement for 3D data streaming.
Video games either come with a physical support (CD, DVD, Blu-Ray) or they require the downloading of the game itself, which includes the 3D data, before letting the user play.
However, transferring data from the disk to the memory is already a form of streaming.
This is why optimized video game engines use techniques that are also used for streaming, such as levels of detail, to reduce the detail of objects far away from the point of view and save resources to enhance the level of detail of closer objects.
Some other online games, such as #link("https://secondlife.com")[Second Life], rely on user-generated data, and thus are forced to send data from some users to others.
In such scenarios, 3D streaming is appropriate and this is why the idea of streaming 3D content for video games has been investigated.
For example, #cite("game-on-demand") proposes an online game engine based on geometry streaming, that addresses the challenge of streaming 3D content at the same time as synchronization of the different players.
== NVE streaming frameworks
An example of NVE streaming framework is 3D Tiles #cite("3d-tiles"), which is a specification for visualizing massive 3D geospatial data developed by Cesium and built on top of glTF.
Their main goal is to display 3D objects on top of regular maps, and their visualization consists of a top-down view, whereas we seek to let users freely navigate in our scenes, whether it be flying over the scene or moving along the roads.
#figure(
image("../assets/related-work/3d-streaming/3dtiles.png", width: 80%),
caption: [Screenshot of 3D Tiles interface]
)
3D Tiles, as its name suggests, is based on a spatial partitioning of the scene.
It started with a regular octree, but was later improved to a $k$-d tree (see the two figures below).
#grid(
columns:(1fr, 0.1fr, 1fr),
figure(
image("../assets/related-work/3d-streaming/3d-tiles-octree.png", width: 100%),
caption: [With regular octree (depth 4)]
),
[],
figure(
image("../assets/related-work/3d-streaming/3d-tiles-kd-tree.png", width: 100%),
caption: [With $k$-d tree (depth 4)]
)
)
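
The advantage of the $k$-d tree is that its splitting planes follow the median of the content instead of fixed positions, which keeps the amount of data per tile balanced. A minimal median-cut sketch (hypothetical names and a toy dictionary structure):

```python
def kd_partition(points, depth=0, max_depth=4):
    """Recursively split a 3D point set by median cut, cycling through the
    x, y and z axes; leaves hold roughly equally-sized subsets."""
    if depth == max_depth or len(points) <= 1:
        return {"leaf": points}
    axis = depth % 3
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis,
            "split": pts[mid][axis],          # splitting-plane coordinate
            "left": kd_partition(pts[:mid], depth + 1, max_depth),
            "right": kd_partition(pts[mid:], depth + 1, max_depth)}
```

A regular octree would instead always cut each cell at its center, regardless of where the geometry actually lies.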
The 3D Tiles streaming system was later improved #cite("3d-tiles-10x") by preloading the data at the camera's next position when it is known in advance (with ideas that are similar to those we discuss and implement in Chapter @bi #cite("bookmarks-impact")) and by ordering tile requests depending on the user's position (with ideas that are similar to those we discuss and implement in Chapter @d3 #cite("dash-3d")).
#cite("zampoglou") is another example of a streaming framework: it is the first paper that proposes to use DASH to stream 3D content.
In their work, the authors describe a system that allows users to access 3D content at multiple resolutions.
They organize the content, following DASH terminology, into periods, adaptation sets, representations and segments.
Their first adaptation set codes the tree structure of the scene graph.
Each further adaptation set contains both geometry and texture information and is available at different resolutions, each defined in a corresponding representation.
To avoid requests that would take too long and thus introduce latency, the representations are split into segments.
The authors discuss the optimal number of polygons that should be stored in a single segment.
On the one hand, using segments containing very few faces will induce many HTTP requests from the client, and will lead to poor streaming efficiency.
On the other hand, if segments contain too many faces, the time to load the segment is long and the system loses adaptability.
Their approach works well for several objects, but does not handle view-dependent streaming, which is desirable in the use case of large NVEs.

related-work/main.typ Normal file

@ -0,0 +1,12 @@
#import "../chapter.typ"
#chapter.chapter[Related work]
In this chapter, we review the part of the state of the art on multimedia streaming and interaction that is relevant for this thesis.
As discussed in the previous chapter, video and 3D share many similarities, and since there is already a substantial body of work on video streaming, we start this chapter with a review of this domain, with a particular focus on the DASH standard.
Then, we proceed with presenting topics related to 3D streaming, including compression and structuring, the compromise between geometry and textures, and viewpoint-dependent streaming.
Finally, we end this chapter by reviewing the related work regarding 3D navigation and interfaces.
#include "video.typ"
#include "3d-streaming.typ"
#include "3d-interaction.typ"

related-work/video.typ Normal file

@ -0,0 +1,95 @@
= Video
Accessing a remote video through the web has been a widely studied problem since the 1990s.
The Real-time Transport Protocol (RTP, #cite("rtp-std")) was an early attempt to formalize audio and video streaming.
The protocol allowed data to be transferred unilaterally from a server to a client, and required the server to handle a separate session for each client.
In the following years, HTTP servers have become ubiquitous, and many industrial actors (Apple, Microsoft, Adobe, etc.) developed HTTP streaming systems to deliver multimedia content over the network.
In an effort to bring interoperability between all different actors, the MPEG group launched an initiative, which eventually became a standard known as DASH, Dynamic Adaptive Streaming over HTTP.
Using HTTP for multimedia streaming has many advantages over RTP.
While RTP is stateful (that is to say, it requires keeping track of every user throughout the streaming session), HTTP is stateless, and thus more efficient.
Furthermore, an HTTP server can easily be replicated at different geographical locations, allowing users to fetch data from the closest server.
This type of network architecture is called CDN (Content Delivery Network) and increases the speed of HTTP requests, making HTTP based multimedia streaming more efficient.
== DASH: the standard for video streaming
Dynamic Adaptive Streaming over HTTP (DASH), or MPEG-DASH #cite("dash-std", "dash-std-2") is now a widely deployed
standard for adaptively streaming video on the web #cite("dash-std-full"), made to be simple, scalable and interoperable.
DASH describes guidelines to prepare and structure video content, in order to allow a great adaptability of the streaming without requiring any server side computation. The client should be able to make good decisions on what part of the content to download, only based on an estimation of the network constraints and on the information provided in a descriptive file: the MPD.
#heading(level: 3, numbering: none)[DASH structure]
All the content structure is described in a Media Presentation Description (MPD) file, written in the XML format.
This file has 4 layers: the periods, the adaptation sets, the representations, and the segments.
An MPD has a hierarchical structure: it can have multiple periods, each period can have multiple adaptation sets, each adaptation set can have multiple representations, and each representation can have multiple segments.
#heading(level: 4, numbering: none)[Periods]
Periods are used to delimit content depending on time.
They can be used to delimit chapters, or to add advertisements that occur at the beginning, during, or at the end of a video.
#heading(level: 4, numbering: none)[Adaptation sets]
Adaptation sets are used to delimit content according to the format.
Each adaptation set has a mime-type, and all the representations and segments that it contains share this mime-type.
In videos, most of the time, each period has at least one adaptation set containing the images, and one adaptation set containing the sound.
It may also have an adaptation set for subtitles.
#heading(level: 4, numbering: none)[Representations]
The representation level is the level DASH uses to offer the same content at different levels of quality.
For example, an adaptation set containing images has a representation for each available quality (it might be 480p, 720p, 1080p, etc.).
This allows a user to choose a representation and to change it during the video, but most importantly, since the client can estimate its downloading speed based on the time it took to download data in the past, it can find the optimal representation: the highest quality that it can request without stalling.
#heading(level: 4, numbering: none)[Segments]
Until this level of the MPD, content has been divided, but it is still far from being split finely enough to be streamed efficiently.
A representation of the images of a chapter of a movie is still a long video, and serving such a big file is not practical since it prevents streaming adaptability: if the user requests a change of quality, the system would either have to wait until the file is fully downloaded, or cancel the request, wasting all the progress made.
Segments address this issue.
They typically encode two to ten seconds of video, and give the client a much finer granularity at which to adapt.
If a user seeks elsewhere in the video, at most one segment of data is lost, and only one segment of data needs to be downloaded for playback to resume.
The impact of the segment duration has been investigated in many works, including #cite("sideris2015mpeg", "stohr2017sweet").
For example, #cite("stohr2017sweet") discuss how the segment duration affects streaming: short segments lower the initial delay and give the best quality of experience regarding stalls, but increase the total download time of the video because of per-segment overhead.
#heading(level: 3, numbering: none)[Content preparation and server]
Encoding a video in the DASH format consists in partitioning the content into periods, adaptation sets, representations and segments as explained above, and in generating an MPD file that describes this organization.
Once the data are prepared, they can simply be hosted on a static HTTP server, which does no computation other than serving files when it receives requests.
All the intelligence and decision making is moved to the client side.
This is one of DASH's strengths: no powerful server is required, and since static HTTP servers are mature and efficient, all DASH clients benefit from this.
#heading(level: 3, numbering: none)[Client side adaptation]
A client typically starts by downloading the MPD file, and then downloads segments from the different adaptation sets. While the standard precisely describes how to structure content on the server side, the client may be freely implemented to take into account the specificities of a given application.
The most important part of any DASH client implementation is its adaptation logic. This component takes a set of parameters into account, such as network conditions (bandwidth, throughput), buffer state, or segment sizes, to decide which segments should be downloaded next. Most industrial actors have their own adaptation logic, and many more have been proposed in the literature.
A thorough review is beyond the scope of this state of the art, but examples include #cite("chiariotti2016online"), who formulate the problem in a reinforcement learning framework, #cite("yadav2017quetra"), who use queuing theory, or #cite("huang2019hindsight"), who use a formulation derived from the knapsack problem.
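As a minimal example of the kind of signal such a logic consumes, a client can estimate its throughput with an exponentially weighted moving average of per-segment download rates (a naive sketch; the adaptation logics cited above are far more elaborate):

```python
class ThroughputEstimator:
    """Exponentially weighted moving average (EWMA) of segment throughput.
    alpha controls how fast the estimate reacts to new measurements."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.estimate = None  # bits per second

    def update(self, segment_bytes, download_seconds):
        sample = 8 * segment_bytes / download_seconds
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate

est = ThroughputEstimator()
est.update(1_000_000, 2.0)   # 4 Mbit/s sample
est.update(500_000, 2.0)     # 2 Mbit/s sample
print(round(est.estimate))   # 3400000
```

The smoothing prevents a single slow segment from triggering an immediate, possibly unnecessary, drop in quality.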
== DASH-SRD
Now widely adopted for video streaming, DASH has been adapted to various other contexts.
DASH-SRD (Spatial Relationship Description, #cite("dash-srd")) is a feature that extends the DASH standard to allow streaming only a spatial subpart of a video to a device.
It works by encoding the video at multiple resolutions, and by tiling the highest resolutions as shown in @sota:srd-png.
That way, a client can choose to download either the low resolution of the whole video, or higher resolutions of a subpart of the video.
#figure(
image("../assets/related-work/video/srd.png", width: 60%),
caption: [DASH-SRD #cite("dash-srd")],
)<sota:srd-png>
For each tile of the video, an adaptation set is declared in the MPD, and a supplemental property is defined in order to give the client information about the tile.
This supplemental property contains many elements, but the most important ones are the position ($x$ and $y$) and the size (width and height) of the tile relative to the full video.
An example of such a property is given in @rw:srd-xml.
#figure(
align(left,
raw(
read("../assets/related-work/video/srd.xml"),
block: true,
lang: "xml",
),
),
caption: [MPD of a video encoded using DASH-SRD]
)<rw:srd-xml>
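To illustrate how a client consumes this information, the SRD property value (a comma-separated list of integers under the `urn:mpeg:dash:srd:2014` scheme) can be parsed as follows. This is a sketch assuming the common seven-field form (source id, tile position, tile size, total size); real MPDs may use variants of this layout:

```python
def parse_srd(value):
    """Parse an SRD supplemental property value of the form
    "source_id,x,y,w,h,total_w,total_h" into a dictionary.
    Assumes the seven-field layout for simplicity."""
    fields = [int(f) for f in value.split(",")]
    keys = ["source_id", "x", "y", "w", "h", "total_w", "total_h"]
    return dict(zip(keys, fields))

# Hypothetical top-right tile of a 3840x2160 video split into four tiles.
tile = parse_srd("0,1920,0,1920,1080,3840,2160")
print(tile["x"], tile["y"], tile["w"], tile["h"])  # 1920 0 1920 1080
```

From these fields, the client knows where each downloaded tile must be placed relative to the full-frame, low-resolution layer.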
Essentially, this feature is a way of achieving view-dependent streaming, since the client only displays a part of the video and can avoid downloading content that will not be displayed.
While @sota:srd-png illustrates how DASH-SRD can be used in the context of zoomable video streaming, the ideas developed in DASH-SRD have proven particularly useful in the context of 360° video streaming (see for example #cite("ozcinar2017viewport")).
This is especially interesting in the context of 3D streaming, since the same pattern arises: a user views only a part of the content at a time.