Recently, approaches that use spatial-temporal features to build Bag-of-Words (BoWs) models have achieved great success due to their simplicity and effectiveness. However, they still have difficulty distinguishing between actions with high inter-class ambiguity. The main reason is that they describe actions as an orderless bag of features and ignore the spatial and temporal structure of the visual words.

To improve classification performance, we present a novel approach called sequential Bag-of-Words. It captures the temporal sequential structure by segmenting the entire action into sub-actions. In addition, it gives more weight to the distinguishing parts of an action by classifying the sub-actions separately; the sub-action results are then used to vote for the final result. Extensive experiments are conducted on challenging datasets and real scenes to evaluate our method.
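
The segment, classify and vote pipeline described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the equal-length temporal segmentation, the nearest-prototype classifier with L1 distance, the plain majority vote, and all function and parameter names (`segment_histograms`, `classify_segment`, `classify_action`, `segment_prototypes`) are assumptions made only for illustration.

```python
from collections import Counter


def segment_histograms(features, num_segments, vocab_size, duration):
    """Build one bag-of-words histogram per temporal segment.

    features: iterable of (time, word_id) pairs, one per quantized local feature.
    Assumption: sub-actions are equal-length slices of the action's duration.
    """
    hists = [[0] * vocab_size for _ in range(num_segments)]
    for t, word in features:
        # Clamp so that a feature at t == duration lands in the last segment.
        seg = min(int(t / duration * num_segments), num_segments - 1)
        hists[seg][word] += 1
    return hists


def l1_distance(a, b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_segment(hist, prototypes):
    """Label of the nearest prototype histogram (hypothetical stand-in classifier).

    prototypes: dict mapping class label -> prototype histogram for this segment.
    """
    return min(prototypes, key=lambda label: l1_distance(hist, prototypes[label]))


def classify_action(features, segment_prototypes, vocab_size, duration):
    """Classify each sub-action separately, then take a majority vote."""
    num_segments = len(segment_prototypes)
    hists = segment_histograms(features, num_segments, vocab_size, duration)
    votes = [
        classify_segment(hist, prototypes)
        for hist, prototypes in zip(hists, segment_prototypes)
    ]
    return Counter(votes).most_common(1)[0][0]
```

In this sketch, two classes whose first sub-action looks identical can still be separated when their later sub-actions differ, because each segment casts its own vote. An orderless histogram over the whole action would lose the information about which words occur in which segment.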

Concretely, we compare our results with several state-of-the-art classification approaches and confirm the advantages of our approach in distinguishing similar actions. Results show that our approach is robust and outperforms most existing BoWs-based classification approaches, especially on complex datasets with interactive activities, cluttered backgrounds and inter-class action ambiguities.

Hong Liu received the Ph.D. in computer science and technology from Tsinghua University, China. He is also the Director of the Open Lab on Human Robot Interaction, PKU. His research fields include computer vision and robotics, image processing, and pattern recognition.

Hao Tang received the B.S. in computer science and technology from Peking University, China. His current research interests are image classification, hand gesture recognition, gender recognition, image retrieval, action recognition and deep learning.

Wei Xiao received his Ph.D. in computer science and technology from Tsinghua University, China. His research interests include computer vision and HRI.

Ziyi Guo received her B.S. in computer science and technology from Peking University, China. Her research interests include interactive media technologies, interaction design and human action recognition. Her main research interest is human action recognition.

Yuan Gao received the B.S. in computer science and technology from Peking University, China. He then obtained the M.S. in computer science and technology from Peking University, China. His research interests include object detection, 3D reconstruction, facial expression and gender recognition.

