Benchmarks
The following key performance indicators will be measured in the ACAT project. The full specifications are shown in Deliverable 5.6: Performance index evaluation.
End-User-Oriented Benchmarks
Key Performance Indicator 1.1(a) - Setup time for execution of a known task
Description: Total setup time for an instruction with an existing ADT. This is the variation with the least complexity during the execution.
Measurement: Feed the ACAT system a new instruction and measure the setup time until the task is ready for execution (measured in seconds).
Evaluation: This KPI was evaluated extensively during the M24 demo using the instruction “Take rotor cap from conveyor and put it on the fixture”. This instruction is similar to the stored instruction “Pick up rotor cap from conveyor and place it on a fixture”, since the main, primary and secondary objects are the same. The system therefore uses a stored ADT in which all details of the task were already taught, including the action chunks and the visual and grasp poses of the rotor cap, the conveyor and the fixture. Thus, the setup time is minimal and the user is able to set up the system and start the execution in only 5 seconds.
The execution part takes only 20 seconds as this video indicates. 
Specifically, for the evaluation of this KPI an experienced robot operator performed the task 100 times. The deviation in the setup time was ±2 seconds and in the execution time ±3 seconds.
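The measurement procedure above can be sketched as a small timing helper. This is only an illustration: `prepare_task` is a hypothetical callable standing in for the ACAT setup step, the sample timings are placeholders rather than measured data, and reading the reported "±" figure as the largest absolute deviation from the mean is an assumption.

```python
import statistics
import time


def measure_setup_time(prepare_task):
    """Return the seconds elapsed while the hypothetical `prepare_task` runs."""
    start = time.perf_counter()
    prepare_task()
    return time.perf_counter() - start


def summarize(times_s):
    """Return (mean, largest absolute deviation from the mean) in seconds."""
    mean = statistics.mean(times_s)
    max_dev = max(abs(t - mean) for t in times_s)
    return mean, max_dev


# Placeholder timings in seconds, for illustration only (not ACAT measurements).
sample_times = [4.0, 5.0, 6.0, 5.0]
mean, max_dev = summarize(sample_times)
print(f"setup time: {mean:.1f} ± {max_dev:.1f} s")
```

In a real evaluation run, `measure_setup_time` would be called once per repetition and the collected values passed to `summarize`.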
Key Performance Indicator 1.1(b) - Setup time for execution of a semi-known task with unreliable pose inputs
Description: Total setup time for an instruction with an existing ADT but with re-calculation of the poses for some of the objects.
Measurement: Feed the ACAT system a new instruction and measure the setup time until the task is ready for execution (measured in seconds).
Evaluation: This KPI was evaluated using the instruction “Take rotor cap from conveyor and put it on the fixture”. In this case the system is able to re-use the main, primary and secondary objects of the instruction and the related action chunks. However, the operator can choose to re-calculate the object pose and grasp poses in order to verify their applicability. This re-calculation involves a motion and grasp planner, so the movement of the robot is slower. In addition, the vision system requires extra time to perform another pose estimation of the object, and the operator needs to teach a new place location.
Therefore, the setup time is increased to 111 seconds as the video indicates. 
The task was performed 100 times. The deviation in the setup time was ±10 seconds due to the delays in estimating the new visual and grasp poses of the object and in specifying a new place location. The execution time was 25±5 seconds.
Key Performance Indicator 1.1(c) - Setup time for execution of a semi-known task with unreliable grasp inputs
Description: Total setup time for an instruction with an existing ADT but with re-calculation of the grasp poses for some of the objects.
Measurement: Feed the ACAT system a new instruction and measure the setup time until the task is ready for execution (measured in seconds).
Evaluation: This KPI was evaluated using the instruction “Take rotor cap from table and put it on the fixture”. In this case the system is able to identify that the main and secondary objects of the instruction are the same and pair an existing ADT from the instruction “pick up rotor cap from conveyor and place it on the fixture”. However, because the primary object is different, the existing visual and grasp poses were unreliable. As a result, the vision system requires extra time to perform another pose and grasp estimation of the object located in the new primary object and the operator needs to teach a new place location after that. 
Therefore, the setup required 126 seconds as this video indicates. 
The task was performed another 100 times. A small number of setup efforts required 10 seconds less, depending on the outcome of the motion and grasp planner and the availability of a new feasible trajectory. However, there were cases where the setup of the task required 160 to 180 seconds due to the delays of the extra calculations. Regardless of the delays in the setup, the execution time of the task was 40±5 seconds.
Key Performance Indicator 1.1(d) - Setup time for execution of a semi-known task with similar model information
Description: Total setup time for an instruction where only the model similarity is used from the ADTs.
Measurement: Feed the ACAT system a new instruction and measure the setup time until the task is ready for execution (measured in seconds).
Evaluation: This KPI was evaluated using the instruction “Take metal bottle from conveyor and put it on the table”. In this case the system is able to identify that both the main and secondary objects of the instruction are different from the ones in existing ADTs. As a result, the existing visual and grasp poses were unreliable. As in KPI 1.1(c), the vision system requires extra time to perform another pose estimation of the object and the operator needs to teach a new place location. 
Therefore, the setup required 141 seconds as the video shows.
The task was performed another 100 times. Similarly, a small number of setup efforts required 15 seconds less, depending on the outcome of the motion and grasp planner. However, there were cases where the setup of the task required up to 2 minutes more in total due to the delays of the extra calculations and the extra restrictions introduced by the motion planner. The execution time of the task was 50±10 seconds.
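For a side-by-side view of the four KPI 1.1 variants, the nominal figures reported above can be tabulated as in the following sketch. It assumes the "±" values for execution are spreads around the stated nominal execution times; the extra-setup and combined totals are derived here for comparison only and are not reported by the project.

```python
# Nominal (setup_s, execution_s) pairs as reported for each KPI 1.1 variant.
REPORTED = {
    "1.1(a)": (5, 20),
    "1.1(b)": (111, 25),
    "1.1(c)": (126, 40),
    "1.1(d)": (141, 50),
}


def compare_to_baseline(reported, baseline="1.1(a)"):
    """Return {kpi: (extra_setup_s, total_s)} relative to the baseline variant."""
    base_setup, _ = reported[baseline]
    return {
        kpi: (setup - base_setup, setup + execution)
        for kpi, (setup, execution) in reported.items()
    }


for kpi, (extra, total) in compare_to_baseline(REPORTED).items():
    print(f"KPI {kpi}: +{extra} s setup vs. known task, {total} s setup + execution")
```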
Key Performance Indicator 1.2 - Robustness during setup
Description: Robustness of the system when processing a new instruction sheet.
Measurement: Feed the ACAT system multiple new (unknown) instruction sheets and measure the percentage of successfully established task sequences. A task sequence is successful if it achieves the specified goal of the task (end state).
Evaluation: This KPI was evaluated during the setup of the various instantiations of the instruction sheet presented before. As a new instruction we count the cases where any of the main, primary or secondary objects differed from the ones in the stored ADTs. Therefore, the user needed to respond to many pop-up messages on the monitor and decide how to proceed with the setup of the task sequence. In the cases where all the parameters of an instruction were unknown, the operator could become confused and make mistakes. Hence, out of a total of 300 trials, the operator set up the task sequence successfully 95% of the time.
Key Performance Indicator 1.3 - Robustness during execution
Description: Robustness of the execution phase.
Measurement: During multiple executions of a task instantiated from an instruction sheet, measure the successful completions of the task.
Evaluation: During our participation in the IROS exhibition we had the opportunity to set up the system and execute the same task for a prolonged period, covering a major part of the exhibition. Over a total of 5 days we executed the same task continuously for almost 4 hours per day. Naturally, due to overload of the systems on the fair premises, there were hardware and system issues that caused breakdowns of the workstation (e.g., loss of power, loss of internet connection). However, these problems are not related to the robustness of the ACAT system. The average execution time for one repetition of the task was almost 60 seconds, as can be seen in the video (https://goo.gl/e1mg5I). As a result, we can state that the robustness of the ACAT system is close to 99%, since it executed the task successfully in about 1190 out of 1200 repetitions.
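The figures in this evaluation can be cross-checked with a short calculation. The sketch below treats the approximate values from the text (5 days, 4 hours per day, 60 seconds per repetition) as exact, which is a simplifying assumption; the function names are illustrative only.

```python
def expected_repetitions(days, hours_per_day, seconds_per_repetition):
    """Number of repetitions that fit into the total running time."""
    return days * hours_per_day * 3600 / seconds_per_repetition


def success_rate(successes, attempts):
    """Percentage of attempts that completed successfully."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


print(expected_repetitions(5, 4, 60))       # 1200.0
print(round(success_rate(1190, 1200), 1))   # 99.2
```

The 1200 repetitions quoted in the text are therefore consistent with the stated running time, and 1190 successes correspond to roughly 99.2%.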
Key Performance Indicator 1.4 - Cycle time during execution
Description: The cycle time of the task during execution.
Measurement: During multiple executions of a task instantiated from an instruction sheet, measure the mean cycle time. This could be compared to other task instantiation methods.
Evaluation: In ACAT we define the mean cycle time as the combined time needed for specification, teaching and execution of a task. Since we use an ADT translator, we are able to re-use data from stored ADTs and produce the specifications for the list of required skills. If the task is new, the operator has to teach some parameters of the skills. Considering the new vision modules and the force-based adaptation methods we have integrated in the ACAT platform, a large portion of the cycle time is spent in communication between systems. As a result, the mean cycle time in ACAT is 240 seconds when the main, primary and secondary objects are known and 4 skills are generated for an advanced pick and place task. By comparison, with the task instantiation method used in the project TAPAS (where the main integrator partner, AAU, was using the previous generation of Little Helper), which relies on manual selection and specification of skills, the mean cycle time was 191 seconds for a simple pick and place task with two skills and 400 seconds for an advanced pick and place task with 5 skills. The ACAT system thus completes a four-skill advanced task in considerably less time than the manual method needed for a five-skill advanced task, so we conclude that we benefit significantly from the application of the ACAT system.
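Because the compared tasks use different numbers of skills, one way to put the reported cycle times on a common footing is to normalize them per skill. This normalization is an assumption made here for illustration and is not part of the project's evaluation method; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleTime:
    label: str
    mean_cycle_s: float  # mean cycle time in seconds, as reported
    skills: int          # number of skills in the task

    @property
    def seconds_per_skill(self):
        return self.mean_cycle_s / self.skills


RESULTS = [
    CycleTime("ACAT, advanced pick and place", 240, 4),
    CycleTime("TAPAS, simple pick and place", 191, 2),
    CycleTime("TAPAS, advanced pick and place", 400, 5),
]

for r in RESULTS:
    print(f"{r.label}: {r.seconds_per_skill:.1f} s per skill")
```

Under this reading, ACAT needs 60.0 seconds per skill, against 95.5 and 80.0 seconds per skill for the two TAPAS tasks.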
Key Performance Indicator 1.5 – Training time required
Description: Operators with different levels of robot expertise will try to set up the robot for the Rotor Cap Collection benchmark. Non-robot experts, robot experts unfamiliar with ACAT and robot experts within the ACAT consortium will be asked to set up the robot after a brief training session.
Measurement: The training time elapsed for the setup of a known task will be measured in seconds.
Evaluation: As we can see in the analysis of the questionnaire (see Appendix), the participants had different levels of expertise, ranging from fundamental awareness to recognized authority. All of them received the same introduction to the system at the beginning, which lasted 04:30 minutes. Naturally, the time required to complete the same task varied with the level of robot expertise. The shortest time to complete the task was 223 seconds (03:43 mins) by a user with expert knowledge and the longest was 448 seconds (07:28 mins) by a user with basic knowledge of robotics. The average time for all participants was 313 seconds (05:13 mins).
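Since the KPI is specified in seconds while the results are also quoted in minutes, the conversions above can be checked with a small helper. The function name is illustrative only.

```python
def format_mmss(total_seconds):
    """Format a duration given in whole seconds as MM:SS."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"


for label, seconds in [("shortest", 223), ("longest", 448), ("average", 313)]:
    print(f"{label}: {seconds} s = {format_mmss(seconds)}")
```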
Key Performance Indicator 1.6 – Demonstration efficiency
Description: How many times the operators asked for help or guidelines during the setup of the robot after the training.
Measurement: Ratio of successful setup efforts and number of extra guidelines required.
Evaluation: All the test users were allowed to ask for help while programming the task. On average, the participants asked for extra guidance 1.8 times during the programming of the task. As a ratio, we can state that the demonstration was 67% efficient, since all participants managed to set up the task successfully but still asked for some extra guidance.
Key Performance Indicator 1.7 – Robot expertise (user friendliness)
Description: The combination of the training time, the setup time and the demonstration efficiency defines the degree of robot expertise required from the operator in order to use the demonstrator successfully.
Measurement: Measured as a total indicator of user-friendliness of the robot.
Evaluation: Results from the questionnaire reveal that the participants generally found the approach quite intuitive, especially the ability to write an instruction and thereby accelerate the setup of the task, along with the easy-to-use GUI. The vast majority of users were satisfied with the ease of completing the test task, giving a grade of 6.5 on a scale from 1 to 7. Moreover, they found the amount of time needed to complete the task satisfactory, since 75% of them rated it 7. Overall, they were also very satisfied with the support information shown on the screen during the execution, since 92% of them graded it 6.5 out of 7. According to the general comments, some minor details caused confusion during the setup, but this did not prevent the participants from concluding that the system is quite intuitive and that they clearly prefer to program it through text instructions instead of manually selecting the necessary skills.
Language-Oriented Benchmarks
Key Performance Indicator 2.1 – Linguistic action ontology
Description: Number of action verbs in the ontology and number of synsets.
Measurement: Determined by the number of action verbs and synsets available in the process memory by the end of the project.
Evaluation: The ontology contains 322 action verbs, organized into 189 synsets.
Key Performance Indicator 2.2 – Object categories
Description: Number of object categories saved in the process memory.
Measurement: Determined by the number of object categories in the linguistic object ontology and the number of associated object images/models available to aid recognition and robotic manipulation of those objects.
Evaluation: The ontology contains 304 object categories (synsets), of which 66 have associated images/models.
Key Performance Indicator 2.3 – Number of action grounding instances
Description: Number of robot execution/control instances stored in the process memory.
Measurement: Determined by the number of robot execution/control instances stored in the process memory by the end of the project.
Evaluation: 53 ADTs
Key Performance Indicator 2.4 – Action categories
Description: Number of action categories saved in the process memory.
Measurement: Determined by the number of action categories available in the process memory by the end of the project.
Evaluation: 24 action categories are defined in the ontology.
Basic Research-Oriented Benchmarks
Key Performance Indicator 4.1 - Causal relations correctly understood
Key Performance Indicator 4.2 - Vague quantities: Inferring vaguely formulated quantities
Key Performance Indicator 4.3 - Missing objects: Inferring missing objects and roles
Key Performance Indicator 4.4 - Disambiguation: Inferring correct meanings of ambiguous words

 

