Visual identification of the target port (GT-denied policy)

Hi guys,

I may be missing something, but neither the documentation, the forum, nor the issues on GitHub seem to answer this simple question: How is our GT-denied policy supposed to visually identify the target port?

What we know:

  1. Ground truth gives access to the tf between gripper tip and target port because the target port’s name in the task description has a leading target_ prefix. A blind cheater can always insert the cable.

  2. The task description for GT-denied evaluation has the target identified as a string name (per documentation).

  3. There can be multiple ports of the same type and only one is a target (per documentation).

  4. The target port will always be visible to the cameras (per documentation).

The question above can be broken down into a set of sub-questions, of which the following are a sample:

  1. The target is always visible:

    1. To which camera(s)?

    2. Are the confounders also visible?

  2. Is the disambiguation between target and confounders geometric? Is the target “more” visible (occupies a larger part of the visual field, or is more centrally situated) than the other same-type ports?

  3. Is the disambiguation spatial? Is the target “closer” to the gripper tip?

  4. Is the disambiguation in the simulator environment? Is the name of the target somehow associated with the URDF/SDF object in the simulation?

This seems to me to be a make-or-break issue, if one is to create a dataset that is not “out-of-band”, so to speak.

Please, advise or comment!

Cheers and good luck!

The perception part to identify target ports is part of this challenge! Anyone correct me if I’m wrong but I’m pretty sure it is :slightly_smiling_face:

Only if there are no confounder (“false”) targets of the same type. Generating intent from vision with no criteria should be out-of-scope for this (phase of the) competition. For example, if you have 5 cables of the same plug type and a row of ports for them, how do you identify which one goes where just by looking at them? My two cents.

Yeah. Though multiple NIC cards may be spawned… so if only one target port of the right type is in sight, the camera will have to be pretty close to not see the others.

I agree with you that it would make sense, though perhaps it’s also just part of the challenge :sweat_smile: I think we won’t get a definite answer to this though.

They are clearly numbered; if your solution can perceive the orientation and position of the cards and ports on the board, you can infer the port number from the qualification phase description.
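To make that concrete, here is a minimal sketch of the idea in Python. It is only an illustration under assumptions that are mine, not from the documentation: that the target name ends in an integer index (the name `sfp_port_1` below is hypothetical), that the ports on a card are numbered in order along one axis of the card, and that your perception stack already gives you the port positions and the card pose in a common frame. The function names and the `first_index` parameter are made up for this example.

```python
import re

import numpy as np


def port_index_from_name(name: str) -> int:
    """Extract the trailing integer from a port name (assumed naming scheme)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no trailing index in port name: {name!r}")
    return int(match.group(1))


def select_target_port(port_positions, card_origin, card_axis, target_name, first_index=0):
    """Return the index (into port_positions) of the detection matching target_name.

    port_positions: (N, 3) detected port centers, all in one common frame.
    card_origin:    (3,) perceived origin of the card in that same frame.
    card_axis:      (3,) direction along which port numbers are assumed to increase.
    first_index:    number given to the first port (0 or 1, depending on the naming).
    """
    positions = np.asarray(port_positions, dtype=float)
    axis = np.asarray(card_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)

    # Project each detected port onto the card axis and sort along it.
    offsets = (positions - np.asarray(card_origin, dtype=float)) @ axis
    order = np.argsort(offsets)

    # The rank along the axis is assumed to correspond to the port number.
    rank = port_index_from_name(target_name) - first_index
    if not 0 <= rank < len(order):
        raise IndexError(f"port number {rank + first_index} not among {len(order)} detections")
    return int(order[rank])


# Three detections in arbitrary order along the x axis of the card.
detections = [[0.03, 0.0, 0.0], [0.01, 0.0, 0.0], [0.02, 0.0, 0.0]]
print(select_target_port(detections, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], "sfp_port_1"))  # prints 2
```

This only works if every port on the card is detected. If some are occluded, the rank along the axis no longer matches the port number, and you would have to compare against the known port spacing instead.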
