Representation of proprioception observation space and action space (joint position / velocity, cartesian position / velocity) #302
Hi Haoming,

We don't actually do any action or state space unification. The actions and state are simply zero-padded to the maximum size in any dataset, which is 32 dimensions. As you pointed out, our internal data all uses joint positions, whereas OXE mostly uses end-effector position, and DROID uses joint velocity. Does this affect performance? Maybe, maybe not... we're still figuring that out!
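For concreteness, a minimal sketch of the zero-padding described above. The constant value of 32 matches the number quoted in the reply; the function name and everything else here is illustrative, not the repo's actual code:

```python
import numpy as np

MAX_ACTION_DIM = 32  # maximum action/state dimensionality across all datasets

def pad_to_max_dim(vec: np.ndarray, max_dim: int = MAX_ACTION_DIM) -> np.ndarray:
    """Zero-pad a per-timestep action or state vector to a fixed width.

    No semantic unification is performed: a 7-D joint-position action and a
    7-D end-effector action both land in the first 7 slots, and the model is
    left to handle the differing conventions across datasets.
    """
    padded = np.zeros(max_dim, dtype=vec.dtype)
    padded[: vec.shape[-1]] = vec
    return padded

# e.g., a 7-DoF joint-position action -> 32-D padded action
action = np.random.randn(7).astype(np.float32)
padded_action = pad_to_max_dim(action)
assert padded_action.shape == (32,)
```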
Hi @kvablack, could you tell us how the conversion is done at the end from the "rot6d" representation to an actual, proper rotation representation (removing possible model noise), either a quaternion or an SO(3) matrix? There seem to be several possible approaches, and this seems important for execution. For reference, a ChatGPT summary of the options:

When mapping a 6D vector (two 3D vectors) to a quaternion, several methods have been proposed, including standard Gram–Schmidt orthogonalization followed by a rotation-matrix-to-quaternion conversion, SVD-based orthogonalization, direct least-squares formulations, and modified orthogonalization techniques.

Which method is best? For most machine learning applications, especially when integrated into an end-to-end differentiable pipeline, standard Gram–Schmidt orthogonalization followed by a rotation-matrix-to-quaternion conversion is considered the best trade-off. SVD-based methods or direct least-squares formulations may offer marginal improvements in robustness under heavy noise, but they tend to be more computationally expensive and more complex to differentiate. Modified orthogonalization techniques can improve stability further, but are usually not necessary if the standard method is implemented with proper normalization. In summary, normalization is essential in all of these methods to ensure that the resulting rotation matrix (and thus the quaternion) is valid; the standard Gram–Schmidt method is typically preferred due to its simplicity, efficiency, and good empirical performance on neural network predictions.
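A minimal sketch of the standard Gram–Schmidt route described above, using NumPy and SciPy. This illustrates the generic technique, not necessarily how the repo actually performs the conversion; treating the two 3-D vectors as the first two columns of the rotation matrix, and SciPy's (x, y, z, w) quaternion ordering, are assumptions here:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rot6d_to_matrix(rot6d: np.ndarray) -> np.ndarray:
    """Project a (possibly noisy) 6D rotation prediction onto SO(3).

    Gram-Schmidt: normalize the first 3-vector, subtract its component from
    the second and normalize, then take the cross product for the third
    column. The normalization steps are what remove the model noise.
    """
    a1, a2 = rot6d[:3], rot6d[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=-1)  # vectors become matrix columns

def rot6d_to_quat(rot6d: np.ndarray) -> np.ndarray:
    """6D -> unit quaternion (x, y, z, w) via the orthogonalized matrix."""
    return Rotation.from_matrix(rot6d_to_matrix(rot6d)).as_quat()

# Round-trip check: build a 6D vector from a known rotation and recover it.
R_true = Rotation.random().as_matrix()
rot6d = R_true[:, :2].T.reshape(6)  # first two columns as the 6D representation
assert np.allclose(rot6d_to_matrix(rot6d), R_true, atol=1e-6)
```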
Hi,
Thanks for your excellent work! I have some questions about the representation of proprioception observation space and action space during training.
As far as I know, the datasets in OXE Magic Soup all use cartesian position (i.e., delta end-effector pose) as the action space. If the model is trained with joint positions as its action space, how do you convert these cartesian actions into joint positions?

Similarly, the proprioceptive state of some datasets in OXE Magic Soup is also represented only as the cartesian pose of the end effector in the robot's base frame. How do you convert it into joint positions? (As far as I know, an end-effector pose can be converted to joint positions using inverse kinematics, but this process often has multiple solutions; see the toy sketch below.)

In the code you provided, I found that DROID uses joint velocity to represent actions. How is this reconciled with the other action representations?
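To illustrate the multiple-solution point: even a planar 2-link arm has two IK branches (elbow-up / elbow-down) for most reachable targets, so any pose-to-joint conversion has to pick a branch, e.g., the one closest to the current or previous joint configuration. A toy sketch; the 2-link arm and the nearest-seed selection rule are illustrative assumptions, not something the dataset pipelines are claimed to do:

```python
import numpy as np

def two_link_ik(x: float, y: float, l1: float = 1.0, l2: float = 1.0):
    """Return both IK solutions (elbow-down, elbow-up) for a planar 2-link arm."""
    d2 = x * x + y * y
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # cos(q2), law of cosines
    if abs(c2) > 1:
        raise ValueError("target out of reach")
    solutions = []
    for sign in (+1.0, -1.0):  # the two elbow configurations
        q2 = sign * np.arccos(c2)
        q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
        solutions.append(np.array([q1, q2]))
    return solutions

def nearest_solution(solutions, seed):
    """Disambiguate by picking the branch closest to a seed configuration."""
    return min(solutions, key=lambda q: np.linalg.norm(q - seed))

sols = two_link_ik(1.2, 0.5)
q = nearest_solution(sols, seed=np.array([0.0, 0.5]))  # e.g., previous joint state
```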