Books by Cha Zhang
320,95 kr. Light field rendering is one of the most representative image-based rendering (IBR) techniques, which generate novel virtual views from images instead of 3D models. Light field capture and rendering can be viewed as sampling the light rays in space and interpolating them to form novel views. As a result, the light field can be studied as a high-dimensional signal sampling problem, which has attracted considerable research interest and become a convergence point between computer graphics, signal processing, and even computer vision. This lecture focuses on two questions about light field sampling: how many images are needed for a light field, and, if that number is limited, where they should be captured. The book can be divided into three parts. First, we give a complete analysis of uniform sampling of IBR data. By introducing the surface plenoptic function, we are able to analyze the Fourier spectrum of non-Lambertian and occluded scenes. Given the spectrum, we also apply the generalized sampling theorem to the IBR data, which results in better rendering quality than rectangular sampling for complex scenes. This uniform sampling analysis provides general guidelines on how the images in IBR should be taken. For instance, it shows that non-Lambertian and occluded scenes often require a higher sampling rate. Next, we describe a very general sampling framework named freeform sampling. Freeform sampling handles three kinds of problems: sample reduction, finding the minimum sampling rate that meets an error requirement, and minimizing the reconstruction error given a fixed number of samples. When the values of the function to be reconstructed are unknown, freeform sampling becomes active sampling. Active sampling algorithms are developed for the light field and show better results than the traditional uniform sampling approach.
Third, we present a self-reconfigurable camera array that we developed, which features a very efficient algorithm for real-time rendering and the ability to reconfigure the cameras automatically to improve the rendering quality. Both are based on active sampling. Our camera array is able to render dynamic scenes interactively at high quality. To the best of our knowledge, it is the first camera array that can reconfigure the camera positions automatically.
- Book
388,95 kr. Face detection, because of its vast array of applications, is one of the most active research areas in computer vision. In this book, we review various approaches to face detection developed in the past decade, with more emphasis on boosting-based learning algorithms. We then present a series of algorithms that are empowered by the statistical view of boosting and the concept of multiple instance learning. We start by describing a boosting learning framework that is capable of handling billions of training examples. It differs from traditional bootstrapping schemes in that no intermediate thresholds need to be set during training, yet the total number of negative examples used for feature selection remains constant and focused on the poorly performing ones. A multiple instance pruning scheme is then adopted to set the intermediate thresholds after boosting learning. This algorithm generates detectors that are both fast and accurate. We then present two multiple instance learning schemes for face detection: multiple instance learning boosting (MILBoost) and winner-take-all multiple category boosting (WTA-McBoost). MILBoost addresses the uncertainty in accurately pinpointing the location of the object being detected, while WTA-McBoost addresses the uncertainty in determining the most appropriate subcategory label for multiview object detection. Both schemes can resolve the ambiguity of the labeling process and reduce outliers during training, which leads to improved detector performance. In many applications, a detector trained with generic data sets may not perform optimally in a new environment. We propose detector adaptation as a promising solution to this problem. We present an adaptation scheme based on the Taylor expansion of the boosting learning objective function, and we propose storing the second-order statistics of the generic training data for future adaptation. We show that with a small amount of labeled data from the new environment, the detector's performance can be greatly improved.
We also present two interesting applications where boosting learning was applied successfully. The first application is face verification for filtering and ranking image/video search results on celebrities. We present boosted multi-task learning (MTL), yet another boosting learning algorithm that extends MILBoost with a graphical model. Since the number of training images available for each celebrity may be limited, learning individual classifiers for each person may cause overfitting. MTL jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The second application addresses the need for speaker detection in conference rooms. The goal is to find who is speaking, given a microphone array and a panoramic video of the room. We show that by combining audio and visual features in a boosting framework, we can determine the speaker's position very accurately. Finally, we offer our thoughts on future directions for face detection.
Table of Contents: A Brief Survey of the Face Detection Literature / Cascade-based Real-Time Face Detection / Multiple Instance Learning for Face Detection / Detector Adaptation / Other Applications / Conclusions and Future Work
- Book