Localization Pattern
Although a visual pattern may consist of any set of symbols (such as micro-dots at various offsets), binary pixel blocks were the obvious choice for this particular application, primarily because they can be sampled from very low resolution images, which in turn significantly reduces real-time processing requirements.
For a pattern to be usable for positioning, the minimum requirement is that every local window of the pattern must be unique; otherwise the mapping from local pattern to position is not one-to-one, leading to ambiguities.
In the 1D case, a valid positioning sequence can be defined as any string in which every substring of length n is unique. The longest such strings for a given code length n are called De Bruijn Sequences: cyclic sequences of length 2^n that contain every possible n-bit code as a substring exactly once. There are various methods to construct De Bruijn Sequences that will not be detailed here. The significant takeaway is that 2^n is the theoretical maximum number of positions that can be decoded from an n-bit code.
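For concreteness, one standard construction (the Lyndon-word concatenation, or FKM, algorithm) can be sketched in a few lines; the function name `de_bruijn` and the restriction to a binary alphabet are choices made for this sketch only:

```python
def de_bruijn(n: int) -> str:
    """Generate a binary De Bruijn sequence of length 2**n.

    Uses the standard Lyndon-word concatenation (FKM) algorithm;
    every n-bit window of the (cyclic) result is unique.
    """
    sequence = []
    a = [0] * (2 * n)

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, sequence))
```

For example, `de_bruijn(3)` yields `00010111`, a cyclic sequence of length 2^3 = 8 in which every 3-bit code appears exactly once.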
A 2D generalization also exists, called the De Bruijn Torus. A binary torus for n x n windows covers 2^(n^2) cells and densely contains every n x n binary matrix exactly once. It would seem that this makes the ideal basis for a 2D positioning pattern, but in practice there are no known algorithms to efficiently decode them. Another shortcoming is that because the pattern is maximally dense, it has no buffer to detect or correct decoding errors: any single bit flip maps to another valid position.
The structure of the 2D pattern is primarily inspired by a method by Bruckstein et al., where the Outer Product of two 1D sequences is taken using XOR as the composition operator. For example:
| ⊕ | 1 | 0 | 0 | 1 |
|---|---|---|---|---|
| **1** | 0 | 1 | 1 | 0 |
| **0** | 1 | 0 | 0 | 1 |
| **1** | 0 | 1 | 1 | 0 |
| **0** | 1 | 0 | 0 | 1 |
This mapping provides a simple and efficient way to convert between 1D sequences and 2D patterns while preserving the uniqueness property necessary for localization. Aside from the native performance of XOR operations, a massive benefit is that decoding a 2D pattern reduces to a separate 1D problem for each axis, each of which is much simpler. With this scheme, 2^(2n) positions can be decoded from an (n x n)-bit block. This is far less than the theoretical maximum density, but it allows for high levels of error correction via redundancy across all rows and columns. However, since the XOR operator is only invertible up to a parity, the 2D-to-1D conversion leaves an ambiguity as to whether the resulting 1D codes are bitwise inverted. This is addressed later by extending the uniqueness property of the source sequence to cover bitwise inversions as well.
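The construction shown in the table can be sketched in one line of Python (the function name `xor_outer` is chosen for this sketch, not taken from the implementation):

```python
def xor_outer(row_seq: list, col_seq: list) -> list:
    """Compose two binary 1D sequences into a 2D block using the
    XOR outer product: block[i][j] = row_seq[i] XOR col_seq[j]."""
    return [[r ^ c for c in col_seq] for r in row_seq]
```

Note that inverting both input sequences produces the identical block, since (1 ^ r) ^ (1 ^ c) = r ^ c; this is exactly the parity ambiguity described above.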
To obtain 1D codes from a 2D block generated as above, it is sufficient to take any one row and any one column as the solution directly (up to an unknown parity). In practice, however, the 2D block will be corrupted by errors, and it is desirable to account for all rows and columns to produce the most likely output.
One mathematically sound method for Matrix Factorization of the 2D block is a Rank-1 Truncated Singular Value Decomposition. This can be performed on a binary matrix by mapping {0, 1} under XOR homomorphically to {+1, -1} under multiplication. It can be thought of as inverting the outer product operation while minimizing least-squares error. Although this approach has the desired flavor, it is not the one used, because it is hard to justify a dependency on a heavy linear algebra library in an implementation intended for micro-controllers.
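Although this SVD route is not the one used, it can be illustrated in a few lines, assuming NumPy is available (`factor_block` is a hypothetical name for this sketch):

```python
import numpy as np

def factor_block(block):
    """Rank-1 factorization of a binary XOR outer-product block.

    Maps {0, 1} under XOR to {+1, -1} under multiplication via
    b -> 1 - 2b, takes the leading singular vectors, and maps back.
    The result is only defined up to a simultaneous bitwise
    inversion of both codes (the parity ambiguity).
    """
    signed = 1.0 - 2.0 * np.asarray(block, dtype=float)
    u, _, vt = np.linalg.svd(signed)
    row_code = ((1 - np.sign(u[:, 0])) // 2).astype(int)
    col_code = ((1 - np.sign(vt[0])) // 2).astype(int)
    return row_code, col_code
```

Because the leading singular vectors minimize squared reconstruction error, isolated bit flips in the block are voted out across the redundant rows and columns.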
The method used in the current implementation is instead based on binary vector clustering. Borrowing machine learning terminology, the dataset (i.e. the 2D matrix/block) consists of a set of samples (i.e. rows), each containing a set of binary features (i.e. columns). The goal is to assign each sample to one of two clusters such that the total assignment error is minimized. The binary assignment of each row then maps directly to the factored column code as output. To obtain the factored row code, the process is simply repeated on the transposed input.
The clustering algorithm is based on the fast and simple K-Means method. Instead of Euclidean distance, Hamming Distance is used, since bitwise operations on binary vectors are more efficient. Another optimization is that the two cluster centroids are assumed to be bitwise complements of each other, which halves the number of distance computations and clusters to keep track of.
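A minimal sketch of this clustering step, with rows stored as integer bitmasks (the names `popcount` and `factor_rows` are chosen for this sketch, not taken from the implementation):

```python
def popcount(x: int) -> int:
    """Number of set bits, i.e. Hamming weight."""
    return bin(x).count("1")

def factor_rows(rows: list, width: int, iters: int = 8):
    """Cluster binary row vectors (stored as ints) into two clusters
    whose centroids are bitwise complements of each other.

    Returns (assignments, centroid): assignments[i] is 0 if row i is
    closer to the centroid, 1 if closer to its complement. The
    assignments form one factored code and the centroid bits the
    other, up to the usual parity ambiguity.
    """
    mask = (1 << width) - 1
    centroid = rows[0]  # arbitrary initialization
    for _ in range(iters):
        # Assign each row to whichever of {centroid, ~centroid} is
        # closer in Hamming distance (distance to the complement is
        # simply width minus the distance to the centroid).
        assign = [1 if 2 * popcount(r ^ centroid) > width else 0 for r in rows]
        # Fold complement-cluster rows back onto the centroid side,
        # then update the centroid by a per-bit majority vote.
        folded = [r ^ mask if a else r for r, a in zip(rows, assign)]
        new_centroid = 0
        for bit in range(width):
            if 2 * sum((f >> bit) & 1 for f in folded) > len(folded):
                new_centroid |= 1 << bit
        if new_centroid == centroid:
            break
        centroid = new_centroid
    return assign, centroid
```

Because the two centroids are complements, a single stored centroid and a single distance computation per row suffice, and the per-bit majority vote lets the redundant rows out-vote isolated bit errors.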