Jan 27th, 2014 by Federico Lois

Hi there!  We’re pleased to announce the release of our first Computer Vision Dossier.

We suggest bookmarking each dossier so you can return to it whenever you need essential documents, papers and articles on the technologies that will shape the future of the digital world.

Enjoy! And while you’re at it, give your knowledge a boost.



Rapid Object Detection Using a Boosted Cascade of Simple Features (2001)

Paul Viola and Michael Jones
Why is this here? This paper is one of the most cited works in the face recognition literature, and not without reason. The methodology of chaining weak classifiers into a cascade underpins many of the most successful advances in the field. From a practical point of view, the ideas presented in this paper have very important implications for the design and implementation of classification algorithms.
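To make the chaining idea concrete, here is a toy sketch (my own, not the paper's code) of the attentional cascade: each stage is a boosted sum of weak classifiers, and a window is rejected as soon as any stage's score falls below its threshold. The feature function and the numbers are made up for illustration.

```python
def cascade_classify(window, stages):
    """stages: list of (weak_classifiers, stage_threshold) pairs, where each
    weak classifier is a (feature_fn, polarity, theta, alpha) tuple."""
    for weak_classifiers, stage_threshold in stages:
        # Boosted score: each weak classifier votes with weight alpha.
        score = sum(
            alpha if polarity * feature_fn(window) < polarity * theta else 0.0
            for feature_fn, polarity, theta, alpha in weak_classifiers
        )
        if score < stage_threshold:
            return False  # early rejection: most windows exit here cheaply
    return True  # survived every stage: report a detection

# Toy usage: one stage with one weak classifier thresholding mean intensity.
mean_intensity = lambda w: sum(w) / len(w)
stages = [([(mean_intensity, 1, 0.5, 1.0)], 0.5)]
```

The practical payoff is exactly what the paper exploits: the overwhelming majority of windows are discarded by the first, cheapest stages, so expensive features are only ever computed on promising candidates.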


Random Forests (2001)

Leo Breiman
Why is this here? Along with the Viola-Jones approach, the random-forest algorithms spearheaded by Leo Breiman have very important implications for practical object recognition tasks. They are designed to handle high-variance data and regression tasks. One of their most important features is that random forests can be trained very quickly, which makes them ideal for iterating on the design of a classification pipeline. The faster you can iterate, the faster you can accept or reject hypotheses. Need I say more?
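To see why iteration is so cheap, here is a deliberately tiny forest (my own sketch, not Breiman's exact algorithm) showing the two randomization ingredients: bootstrap sampling of the training set and a random feature choice per tree. Real forests grow deep trees and pick a random feature subset at every split; depth-1 stumps keep the sketch short.

```python
import random
from collections import Counter

def train_stump(sample, feature):
    """Fit a depth-1 tree: split at the feature's mean and predict the
    majority label on each side of the split."""
    threshold = sum(x[feature] for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x[feature] <= threshold] or [0]
    right = [y for x, y in sample if x[feature] > threshold] or [0]
    left_label = Counter(left).most_common(1)[0][0]
    right_label = Counter(right).most_common(1)[0][0]
    return feature, threshold, left_label, right_label

def train_forest(data, n_trees=25, rng=random.Random(0)):
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # bootstrap resample
        feature = rng.randrange(len(data[0][0]))    # random feature per tree
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, x):
    """Majority vote over all trees."""
    votes = [ll if x[f] <= t else rl for f, t, ll, rl in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy 2-class data: label 1 when x[1] > x[0], label 0 otherwise.
data = [([0, 1], 1), ([1, 0], 0), ([0, 2], 1), ([2, 0], 0),
        ([1, 3], 1), ([3, 1], 0)]
forest = train_forest(data)
```

Each individual stump is a weak, high-bias learner; averaging many of them over different bootstrap samples is what drives the variance down.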


Study and Comparison of Various Image Edge Detection Techniques (2009)

Raman Maini and Dr. Himanshu Aggarwal
Why is this here? Sooner or later, most feature descriptors end up relying on the morphological features of the data. Even deep learning approaches learn to detect edges without requiring user supervision. So understanding edge detection techniques is a must for anyone who truly takes computer vision seriously.
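As a concrete example of the techniques the survey compares, here is a minimal Sobel detector written with plain lists for clarity; production code would use NumPy or OpenCV instead.

```python
# Sobel kernels approximating the horizontal and vertical image derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def convolve3x3(img, kernel, r, c):
    return sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def sobel_magnitude(img):
    """Gradient magnitude |G| = sqrt(Gx^2 + Gy^2) at interior pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = convolve3x3(img, SOBEL_X, r, c)
            gy = convolve3x3(img, SOBEL_Y, r, c)
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: the response concentrates along the boundary.
img = [[0, 0, 255, 255]] * 4
mag = sobel_magnitude(img)
```

Thresholding `mag` yields a binary edge map; the survey's other operators (Prewitt, Roberts, Canny, LoG) differ mainly in the kernels used and in how the gradient is post-processed.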

Feature Descriptors


fastHOG – A Real-Time GPU Implementation of HOG (2009)

Victor A. Prisacariu and Ian Reid
Why is this here? Along with SIFT and SURF, Histograms of Oriented Gradients (HOG) are among the most successful descriptors used in world-class object detectors. They have proven very effective for pedestrian and face detection; however, their computational cost is very high. A real-time GPU implementation of HOG is essential to streamline the detection pipeline.
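The core of an (unnormalized) HOG cell can be sketched in a few lines: compute per-pixel gradients, then accumulate their magnitudes into orientation bins. Real implementations, including fastHOG, add block normalization and interpolation on top of this.

```python
import math

def hog_cell(img, bins=9):
    """Orientation histogram over the interior pixels of one cell.
    Angles are folded into [0, 180) as in the usual HOG formulation."""
    hist = [0.0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            # Central-difference gradients.
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / 180.0 * bins) % bins] += magnitude
    return hist

# A vertical step edge puts all of its weight into the 0-degree bin.
img = [[0, 0, 1, 1]] * 4
cell = hog_cell(img)
```

It is also easy to see from this sketch why the descriptor maps well to GPUs: every pixel's gradient and bin assignment is independent, and only the final accumulation needs coordination.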


The Blessing of Dimensionality: High-Dimensional Feature and its Efficient Compression for Face Verification (2013)

Dong Chen, Xudong Cao, Fang Wen and Jian Sun
Why is this here? The curse of dimensionality plagues every single machine learning algorithm. In the last couple of years, algorithm designers have been looking for ways to embrace high-dimensional data, which often degrades the ability to make accurate predictions. As far as I know, this is the first method that actually accomplishes this in a computationally reasonable way while remaining general enough to be used in other domains.
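The paper's compression pipeline is more elaborate than plain PCA, but a PCA projection (one of its ingredients, sketched here with NumPy's SVD) shows the basic idea of squeezing a huge descriptor down to its informative components.

```python
import numpy as np

def pca_compress(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                          # center the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # shape (n_samples, k)

rng = np.random.default_rng(0)
# 50 samples of a 1000-dim feature that really lives on a 5-dim subspace,
# plus a little noise -- the "blessing" scenario the paper exploits.
latent = rng.normal(size=(50, 5))
X = latent @ rng.normal(size=(5, 1000)) + 0.01 * rng.normal(size=(50, 1000))
Z = pca_compress(X, 5)
```

The point of the paper is that extracting a very high-dimensional feature first and compressing afterwards beats engineering a small feature directly, because the compression step can discard exactly the directions that carry no identity information.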

Deep Learning


ImageNet Classification with Deep Convolutional Neural Networks (2012)

Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton
Why is this here? Deep convolutional networks are inspired by the arrangement of cells within the visual cortex. Roughly 10 years before deep learning became trendy, convolutional neural networks were the first success story of deep neural networks. Since Hinton's work on deep belief networks, deep learning approaches have taken the image pattern recognition field by storm. Deep learning algorithms are the current state of the art in computer vision, and a sure bet, given that big names in the industry like Google, Yahoo and Facebook are on a buying spree for startups that showcase this kind of technology.
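At its smallest, a convolutional layer just slides a learned filter over the image and applies a nonlinearity; AlexNet stacks many such layers, plus pooling and fully connected ones. The filter below is hand-picked for illustration, not a learned one.

```python
def conv2d_relu(img, kernel):
    """Valid (no padding) 2D convolution of one filter, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            s = sum(kernel[i][j] * img[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))   # ReLU, the nonlinearity AlexNet used
        out.append(row)
    return out

# A 2x2 "vertical edge" filter: responds where intensity jumps rightwards.
edge_filter = [[-1, 1], [-1, 1]]
feature_map = conv2d_relu([[0, 0, 9], [0, 0, 9], [0, 0, 9]], edge_filter)
```

The point of training is that filters like `edge_filter` are not designed by hand: the network learns a whole bank of them, and the early layers reliably converge to edge- and blob-like detectors on their own.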


Learning to Align from Scratch (2012)

Gary B. Huang, Marwan A. Mattar, Honglak Lee and Erik Learned-Miller
Why is this here? This is a follow-up paper. The original paper, “Unsupervised Joint Alignment of Complex Images”, used an EM algorithm to achieve great alignment results. At a glance, the algorithm looks pretty CPU-intensive, but on careful consideration its GPU implementation is very efficient. The practical issue with the original algorithm is that it learns from what it sees: if you provide only cars, it will align cars pretty well. However, with imperfect detectors, you end up in a Catch-22 situation.
In this paper, the addition of an unsupervised feature learning step based on RBMs (restricted Boltzmann machines), with topographic filters for the penalty calculation, helps in that regard. Given that the core of the algorithm hasn't changed much and RBMs can be implemented very efficiently on the GPU, the practical implications of this paper should not be underestimated.
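For the curious, the RBM feature-extraction step amounts to a sigmoid of a linear map, which is exactly the kind of operation GPUs excel at. The weights below are random stand-ins, not filters learned as in the paper.

```python
import math
import random

def rbm_hidden_probs(v, W, b):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [1.0 / (1.0 + math.exp(-(b[j] + sum(v[i] * W[i][j]
                                               for i in range(len(v))))))
            for j in range(len(b))]

rng = random.Random(0)
n_visible, n_hidden = 6, 3
# Random stand-in weights; training (e.g. contrastive divergence) would
# shape these into the edge-like filters the paper visualizes.
W = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_hidden
probs = rbm_hidden_probs([1, 0, 1, 0, 1, 0], W, b)
```

These hidden-unit probabilities are the learned features that replace raw pixel intensities in the congealing step, which is what lets the aligner generalize beyond the object class it was tuned on.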


Previous Dossier #2: Machine Learning Series – Introduction and a bit more

  • Federico Lois

    Technological Research and Development - Founder

    Specialized in integration architectures and application design for technology organizations. His taste for challenges has led him down paths such as the development of 3D engines, image analysis using graphics hardware, and "beWeeVee", a framework for building co-operative applications. A fan of
