Autism Spectrum Disorder Detection Dataset


We present a new video dataset consisting of clips of reach-to-grasp actions performed by children with Autism Spectrum Disorders (ASD) and IQ-matched typically developing (TD) children. Children in both groups were asked to grasp a bottle in order to perform one of four subsequent actions (placing, pouring, passing to pour, and passing to place). Motivated by recent studies in psychology and neuroscience, we attempt to classify whether an action is performed by a TD or an ASD child by processing only the part of the video that records the grasping gesture. In this setting the only exploitable information is conveyed by the kinematics, since the surrounding context is entirely uninformative. For a detailed description of the problem and the dataset, please refer to the paper.



Twenty children with ASD (18 males) without accompanying intellectual impairment and twenty typically developing children (TD group; 16 males) were recruited from the Child Neuropsychiatry Unit of the IRCCS Giannina Gaslini Hospital and primary schools in Genova.

Children were seated on a height-adjustable chair, with their right elbow and wrist resting on a table (height = 64 cm; length = 100 cm; width = 60 cm). A plastic bottle filled with water (base diameter = 5 cm; height = 18 cm; weight = 225 g) was positioned on the table at a distance of 44 cm from the child's midline. Children were instructed to reach towards and grasp the bottle in order to place it into a box (grasp-to-place), to pour some water into a glass (grasp-to-pour), or to pass it to a co-actor (grasp-to-pass), who would then either place the bottle into the box (pass-to-place) or pour some water (pass-to-pour). Children performed a series of 12 consecutive grasps for each of the four conditions, for a total of 48 movements per child. On each trial, children were asked to perform the movement at a natural speed after an auditory tone.
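
The trial structure described above can be enumerated as a quick sanity check. This is only a minimal sketch and not part of the dataset tooling: the condition labels, their order, and the `build_trial_schedule` helper are hypothetical names chosen for illustration, and it assumes each child completes every trial.

```python
from itertools import product

# Hypothetical labels for the four conditions described above;
# the order in which conditions were run is an assumption.
CONDITIONS = ("placing", "pouring", "pass-to-place", "pass-to-pour")
GRASPS_PER_CONDITION = 12  # consecutive grasps per condition, per the protocol


def build_trial_schedule(conditions=CONDITIONS, repetitions=GRASPS_PER_CONDITION):
    """Return (condition, repetition_index) pairs for one child, blocked by condition."""
    return [(c, r) for c, r in product(conditions, range(1, repetitions + 1))]


schedule = build_trial_schedule()
print(len(schedule))  # 4 conditions x 12 grasps = 48
```

Blocking the repetitions by condition mirrors the "12 consecutive grasps for each condition" wording; a randomized order would need a different construction.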

Movements were filmed from a lateral viewpoint using a Vicon VUE video camera (resolution: 1280 x 720 pixels, 100 frames/sec). Each video sequence is trimmed exactly at the instant when the hand grasps the bottle, and everything after that instant is removed. The resulting clips are very short, with an average length of 83 frames. After discarding corrupted acquisitions, the final dataset comprises 1837 video sequences.
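
To put these figures in perspective, the arithmetic can be worked through in a few lines. This is a rough sketch under one stated assumption: that all 40 children completed all 48 trials before corrupted clips were discarded, which the description does not state explicitly. The variable and function names are illustrative only.

```python
FRAME_RATE_HZ = 100      # camera frame rate (frames/sec)
MEAN_CLIP_FRAMES = 83    # average clip length in frames
CHILDREN = 20 + 20       # ASD group + TD group
TRIALS_PER_CHILD = 48    # 4 conditions x 12 grasps
RETAINED_CLIPS = 1837    # clips kept in the final dataset


def frames_to_seconds(n_frames, frame_rate_hz=FRAME_RATE_HZ):
    """Convert a frame count to a duration in seconds."""
    return n_frames / frame_rate_hz


mean_duration_s = frames_to_seconds(MEAN_CLIP_FRAMES)

# Assumption: every child completed every trial, giving an upper bound
# on the number of acquisitions before corrupted ones were removed.
planned_clips = CHILDREN * TRIALS_PER_CHILD
missing_clips = planned_clips - RETAINED_CLIPS

print(mean_duration_s)  # 0.83
print(planned_clips)    # 1920
print(missing_clips)    # 83
```

Under that assumption, the average clip lasts less than one second, and roughly 4% of the planned acquisitions are absent from the final dataset.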


How to get the dataset

To obtain this dataset, we ask you to complete, sign and return the form below. After that, we will send you the credentials to download it. Note that the dataset is available only for research purposes.


author = {Andrea Zunino and Pietro Morerio and Andrea Cavallo and Caterina Ansuini and Jessica Podda and Francesca Battaglia and Edvige Veneselli and Cristina Becchio and Vittorio Murino},
title = {Video Gesture Analysis for Autism Spectrum Disorder Detection},
booktitle = {International Conference on Pattern Recognition (ICPR)},
year = {2018},



Code is available on GitHub. For code-related questions, please browse existing issues or open a new one on the GitHub page.





  • A. Zunino, P. Morerio, A. Cavallo, C. Ansuini, J. Podda, F. Battaglia, E. Veneselli, C. Becchio, V. Murino
    "Video Gesture Analysis for Autism Spectrum Disorder Detection"
    In International Conference on Pattern Recognition (ICPR), 2018