Intrinsic calibration defines how a depth camera such as the Kinect converts two-dimensional depth images into three-dimensional geometry. Without intrinsic calibration, the surfaces reconstructed by depth cameras would not be a 1:1 match to their real counterparts. While each Kinect is individually calibrated at the factory and stores the resulting calibration data in its firmware, that calibration is not particularly accurate. Most importantly, a factory-calibrated Kinect will reconstruct a flat plane as a gently curved bowl. In the AR Sandbox, this would lead to elevation contour lines that are not completely flat when viewed from the side, or water that might appear to flow uphill.

The custom intrinsic calibration procedure in the Kinect package is rather complex, but it can correct for almost all of these problems.

In technical terms, the result of intrinsic calibration is a 4×4 homogeneous projection matrix that transforms depth image pixels together with their depth values, (px, py, d, 1), into 3D positions (wx, wy, wz, w), where wx stands for the product of the homogeneous weight w and the coordinate x, and likewise for wy and wz. (These 4-vectors are homogeneous points; to convert them to regular affine points, the first three components are divided by the fourth, which is then dropped. So the affine counterparts of the two given vectors would be (px, py, d) and (x, y, z), respectively.)
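
As a rough illustration of the transformation described above, the following Python sketch multiplies a 4×4 matrix by the homogeneous pixel (px, py, d, 1) and then divides by the fourth component. The function name `unproject` and the matrix `EXAMPLE_MATRIX` are made up for this example; the matrix entries are placeholders chosen for easy arithmetic, not real calibration data, and this is not the Kinect package's own code.

```python
def unproject(matrix, px, py, d):
    """Map a depth pixel to a 3D point with a 4x4 homogeneous matrix.

    `matrix` is a list of four rows with four numbers each (row-major).
    Returns the affine point (x, y, z).
    """
    h = (px, py, d, 1.0)  # homogeneous depth pixel
    # Matrix-vector product yields the homogeneous point (wx, wy, wz, w).
    wx, wy, wz, w = (sum(row[i] * h[i] for i in range(4)) for row in matrix)
    if w == 0.0:
        raise ValueError("pixel maps to a point at infinity (w == 0)")
    # Divide the first three components by the fourth, then drop it.
    return (wx / w, wy / w, wz / w)


# Hypothetical placeholder matrix, NOT a real Kinect calibration.
EXAMPLE_MATRIX = [
    [1.0, 0.0, 0.0, -320.0],
    [0.0, 1.0, 0.0, -240.0],
    [0.0, 0.0, 0.0, 100.0],
    [0.0, 0.0, -1.0, 1004.0],
]

if __name__ == "__main__":
    # Here (wx, wy, wz, w) = (10, 10, 100, 4).
    print(unproject(EXAMPLE_MATRIX, 330, 250, 1000))  # (2.5, 2.5, 25.0)
```

Because the last row of the placeholder matrix depends on d, the weight w changes with depth, which is what makes the mapping projective rather than merely affine.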

The depth conversion formula, which is part of the intrinsic 4×4 matrix, describes how a depth value d as reported by the Kinect is converted into a metric distance z in centimeters. Concretely, the conversion is z = A / (B − d), where A and B are device-dependent constants determined during calibration. A is usually around 32000, and B is around 1090.
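
The formula can be tried out with a few lines of Python. This is a minimal sketch that uses the approximate typical values quoted above (A = 32000, B = 1090) as defaults; a real device would use its own calibrated constants. The names `raw_depth_to_cm` and `depth_rows` are invented for this example. The second function shows one possible way the formula could sit inside a 4×4 homogeneous matrix, which is an assumption about layout and not necessarily how the Kinect package arranges its matrix.

```python
A_TYPICAL = 32000.0  # approximate typical value; device-dependent
B_TYPICAL = 1090.0   # approximate typical value; device-dependent


def raw_depth_to_cm(d, a=A_TYPICAL, b=B_TYPICAL):
    """Convert a depth value d into a distance z = a / (b - d) in centimeters."""
    if d >= b:
        # The denominator would be zero or negative, so no valid distance exists.
        raise ValueError("depth value must be smaller than B")
    return a / (b - d)


def depth_rows(a, b):
    """Return two matrix rows that encode z = a / (b - d) (assumed layout).

    Applied to (px, py, d, 1), the first row yields wz = a and the second
    yields w = b - d, so the homogeneous divide gives z = a / (b - d).
    """
    return [0.0, 0.0, 0.0, a], [0.0, 0.0, -1.0, b]


if __name__ == "__main__":
    print(raw_depth_to_cm(690))  # 32000 / 400 = 80.0
```

Note how strongly nonlinear the mapping is: with these constants a depth value of 690 corresponds to 80 cm, while 890 already corresponds to 160 cm, so equal steps in d cover larger and larger distances as d approaches B.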