Intention from Motion Dataset

IfM problem

We present a dataset for a challenging problem in the computer vision community, namely Intention from Motion (IfM). The problem is to predict human intentions, where an intention is defined as the overarching goal embedded in an action sequence. In our setting, the only exploitable information is conveyed by the kinematics, since the surrounding context is totally uninformative. We analyse the movement of an apparently unrelated action, the same for all intentions but actually embedding the intention from the very beginning, in order to capture the subtle motion patterns that anticipate different future actions. For a detailed description of the problem and the dataset, please refer to the paper cited below.



Seventeen naive volunteers were seated beside a 110 × 100 cm table, resting their elbow, wrist and hand on a fixed, tape-marked starting position. A glass bottle was placed on the table at a distance of about 46 cm, and participants were asked to grasp it in order to perform one of the following four intentions:

  1. Pouring some water into a small glass (diameter 5 cm; height 8.5 cm) positioned to the left of the bottle, 25 cm away from it.
  2. Passing the bottle to a co-experimenter seated opposite, across the table.
  3. Drinking some water from the bottle.
  4. Placing the bottle in a 17 × 17 × 12.5 cm cardboard box positioned on the same table, 25 cm away.

After a practice session, each subject performed 20 trials per intention. Trials judged imprecise were removed entirely, and the final dataset includes 1098 trials (253 for pouring, 262 for passing, 300 for drinking and 283 for placing). For each execution, both 3D and video data were collected. 3D marker trajectories and video sequences were recorded from the moment the hand left its stable starting position until it reached the object. Both are trimmed exactly at the instant the hand grasps the bottle, and the subsequent part of the movement is discarded.
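The trial counts above can be checked with a few lines of arithmetic. This is a minimal sketch assuming that all 17 subjects completed the full 20 trials per intention before cleaning, as the protocol states; it reports how many trials were retained, the class balance, and the accuracy of always predicting the most frequent intention, which is a useful chance-level reference for any classifier trained on this data.

```python
# Per-intention trial counts as reported for the final dataset.
counts = {"pouring": 253, "passing": 262, "drinking": 300, "placing": 283}

subjects, trials_per_intention, n_intentions = 17, 20, 4
recorded = subjects * trials_per_intention * n_intentions  # trials before cleaning (assumes no missing sessions)
kept = sum(counts.values())                                # trials after removing imprecise ones

print(f"recorded={recorded} kept={kept} retained={kept / recorded:.1%}")
for name, n in counts.items():
    print(f"{name:>8}: {n:4d} ({n / kept:.1%})")

# Accuracy obtained by always predicting the most frequent intention.
majority = max(counts.values()) / kept
print(f"majority-class baseline: {majority:.1%}")
```

With these figures it reports 1360 recorded trials, 1098 kept (80.7%), and a majority-class baseline of 27.3% (drinking), so the four classes are close to balanced.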

Since the same reach-to-grasp action is performed under all four intentions, the goal is to predict the intention from the kinematics of that action alone.
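As a minimal sketch of the task setup, assuming a hypothetical in-memory layout (the released file format is not described here), each trial can be held as an array of shape `(T, 20, 3)`, one 3D position per frame for each of the 20 hand markers, paired with one of the four intention labels. `trim_at_grasp` mirrors the trimming rule described above by keeping frames only up to the grasp, so a classifier never sees the intention-specific part of the movement. The names `Intention`, `Trial` and `trim_at_grasp`, the label ordering, and the assumption that the grasp frame is already known are illustrative, not part of the dataset.

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np


class Intention(Enum):
    # Hypothetical label encoding; the ordering is arbitrary.
    POURING = 0
    PASSING = 1
    DRINKING = 2
    PLACING = 3


@dataclass
class Trial:
    # Hypothetical layout: markers[t, m, :] is the (x, y, z) position of marker m at frame t.
    markers: np.ndarray  # shape (T, 20, 3)
    intention: Intention


def trim_at_grasp(markers: np.ndarray, grasp_frame: int) -> np.ndarray:
    """Keep frames from movement onset up to and including the grasp frame.

    grasp_frame is assumed to be given (e.g. annotated); how the grasp
    instant is detected is not specified here.
    """
    if not 0 <= grasp_frame < markers.shape[0]:
        raise ValueError("grasp_frame out of range")
    return markers[: grasp_frame + 1]


# Usage on a synthetic stand-in for one recording (150 frames, 20 markers).
rng = np.random.default_rng(0)
raw = rng.normal(size=(150, 20, 3))
trial = Trial(trim_at_grasp(raw, grasp_frame=99), Intention.DRINKING)
print(trial.markers.shape)  # (100, 20, 3)
```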

3D marker

3D kinematic data. A near-infrared 100 Hz VICON system was used to track hand kinematics. Nine cameras were placed in the experimental room, and each participant's right hand was fitted with 20 lightweight retro-reflective hemispheric markers. After data collection, each trial was individually inspected for correct marker identification and then passed through a low-pass Butterworth filter with a 6 Hz cutoff. Overall, each trial is represented as a set of 3D points describing the trajectory of every marker during the execution phase.
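The filtering step can be sketched with SciPy. The 100 Hz sampling rate and the 6 Hz cutoff come from the description above; the filter order (4) and the zero-phase forward-backward application via `filtfilt` are assumptions, since neither is reported, and the `(T, 20, 3)` array layout and the name `lowpass_markers` are hypothetical. The synthetic signal at the end only checks that 20 Hz jitter is removed while a slow 1 Hz movement passes through.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0    # VICON sampling rate stated above
CUTOFF_HZ = 6.0  # low-pass cutoff stated above
ORDER = 4        # assumption: the filter order is not reported


def lowpass_markers(markers: np.ndarray, order: int = ORDER) -> np.ndarray:
    """Butterworth low-pass filter applied along the time axis.

    markers: array of shape (T, n_markers, 3) sampled at FS_HZ.
    filtfilt runs the filter forward and backward, giving zero phase lag
    (the zero-phase choice is an assumption, not stated in the source).
    """
    b, a = butter(order, CUTOFF_HZ, btype="low", fs=FS_HZ)
    return filtfilt(b, a, markers, axis=0)


# Synthetic check: a slow 1 Hz movement plus 20 Hz jitter on every coordinate.
t = np.arange(200) / FS_HZ
slow = np.sin(2 * np.pi * 1.0 * t)
jitter = 0.2 * np.sin(2 * np.pi * 20.0 * t)
markers = np.tile((slow + jitter)[:, None, None], (1, 20, 3))

smoothed = lowpass_markers(markers)
print(smoothed.shape)  # (200, 20, 3)
print(np.abs(smoothed[50:150, 0, 0] - slow[50:150]).max())  # residual away from the edges
```

Running the filter forward and backward doubles the effective order and removes phase lag, which keeps the filtered trajectory aligned in time with the grasp instant; a single causal pass (`scipy.signal.lfilter`) would delay it.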

2D video

2D video sequences. Movements were filmed from a lateral viewpoint using a fixed digital video camera (Sony Handycam 3-D) placed about 120 cm from the hand's starting position. The camera's line of sight was perpendicular to the agent's midline, ensuring that the hand and the bottle were fully visible from the beginning to the end of the movement. The camera was positioned so that neither the glass (Pouring), nor the co-experimenter (Passing), nor the box (Placing) appeared in the frame.


The complete dataset (3D + 2D) can be downloaded here.



  • A. Zunino, J. Cavazza, A. Koul, A. Cavallo, C. Becchio, V. Murino
    "Intention from Motion"
    arXiv preprint arXiv:1605.09526, 2016