This project implements a multi-step pipeline for extracting boundary depth from images. The process involves estimating a depth map from a monocular image, back-projecting this image into a 3D triangular mesh, and then extracting planar surfaces to approximate boundary conditions. This approach is inspired by the methodology described in the paper "LOOSECONTROL: Lifting ControlNet for Generalized Depth Conditioning."
Before running the script, create and activate a Python virtual environment in the project directory, then install the dependencies:
pip install -r requirements.txt
- Depth Map Estimation: The first step estimates the depth map of the input image with a monocular depth estimator. In the provided code, this is done by the `DepthEstimator` class, which uses the `DPTForDepthEstimation` model from the transformers library. The input image is preprocessed and passed through the model to obtain a depth map.
- 3D Triangular Mesh Back-Projection: Once the depth map is obtained, the image is back-projected into a 3D triangular mesh in world space. This converts the depth map into a set of 3D points that represent the scene geometry. In the provided code, this step is performed by the `createObj` method of the `BoundaryDepthExtractor` class, which generates a 3D object file (model.obj) from the depth map.
- Vertical Plane Extraction: For efficiency during training, the code considers only vertical planes. This reduces the 3D boundary extraction problem to a simpler 2D problem. The `verticalPlaneExtraction` method of the `BoundaryDepthExtractor` class is responsible for this step, although its implementation is not provided in the code snippet.
- Orthographic Projection: The 3D mesh of the scene is projected onto a horizontal plane using orthographic projection. This projection makes it possible to delineate the 2D boundary that encloses the scene. The `orthogonalProjection` method of the `BoundaryDepthExtractor` class performs this step by projecting the 3D points onto the horizontal plane.
- 2D Boundary Delineation: After projection, the 2D boundary that encloses the scene is delineated. This is done by computing the convex hull of the projected points, which represents the outer boundary of the scene. The `boundaryDelineation` method of the `BoundaryDepthExtractor` class performs this step.
- Polygon Approximation: The 2D boundary is then approximated with a polygon to simplify the representation. The approximation uses the Douglas-Peucker algorithm, which reduces the number of points in the boundary while preserving its overall shape. The `polygonApproximation` method of the `BoundaryDepthExtractor` class performs this step.
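The last three steps (orthographic projection, convex hull, Douglas-Peucker simplification) can be illustrated with a minimal pure-Python sketch. This is not the repository's code: the function names `project_to_ground`, `convex_hull`, `douglas_peucker`, and `simplify_ring` are hypothetical, the choice of y as the up axis is an assumption, and the actual `BoundaryDepthExtractor` methods may rely on a library instead.

```python
from math import hypot


def project_to_ground(points_3d, up_axis=1):
    """Orthographic projection onto the horizontal plane: drop the up coordinate.

    Assumes y (index 1) is the up axis; this is an assumption, not a fact
    about the repository.
    """
    return [tuple(c for i, c in enumerate(p) if i != up_axis) for p in points_3d]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(polyline, epsilon):
    """Recursively keep only points farther than epsilon from the current chord."""
    if len(polyline) < 3:
        return list(polyline)
    first, last = polyline[0], polyline[-1]
    dists = [point_segment_distance(p, first, last) for p in polyline[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        return [first, last]
    left = douglas_peucker(polyline[: idx + 1], epsilon)
    right = douglas_peucker(polyline[idx:], epsilon)
    return left[:-1] + right


def simplify_ring(ring, epsilon):
    """Simplify a closed ring: close it, simplify, drop the repeated point."""
    closed = list(ring) + [ring[0]]
    return douglas_peucker(closed, epsilon)[:-1]


# Toy scene: four room corners, one slight bulge on a wall, one interior point.
points_3d = [
    (0.0, 0.0, 0.0), (2.0, 0.0, -0.05), (4.0, 2.5, 0.0),
    (4.0, 0.0, 3.0), (0.0, 2.5, 3.0), (2.0, 1.0, 1.5),
]
footprint = project_to_ground(points_3d)   # (x, z) pairs
hull = convex_hull(footprint)              # 5 vertices, bulge included
polygon = simplify_ring(hull, epsilon=0.1)
print(polygon)  # [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
```

The tolerance `epsilon` controls how aggressively the boundary is simplified: here the 0.05 bulge falls below the 0.1 threshold and is removed, leaving a four-sided polygon.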
{
  "camera": {
    "extrinsic": [
      [1.0, 0.0, 0.0, 0.03777360018751525],
      [-0.0, -1.0, -0.0, 0.0074896122114004515],
      [-0.0, -0.0, -1.0, 0.9023693128733252],
      [0.0, 0.0, 0.0, 1.0]
    ],
    "intrinsic": {
      "width": 1242,
      "height": 822,
      "fx": 711.8728819108087,
      "fy": 711.8728819108087,
      "cx": 620.5,
      "cy": 410.5,
      "intrinsic_matrix": [
        [711.8728819108087, 0.0, 620.5],
        [0.0, 711.8728819108087, 410.5],
        [0.0, 0.0, 1.0]
      ]
    }
  },
  "vertices": [
    [[0.461700439453125, -0.45669275522232056]],
    [[-0.5396929979324341, -0.4580613672733307]],
    [[-0.5212900042533875, 0.35700321197509766]],
    [[0.43428346514701843, 0.36788055300712585]]
  ]
}
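The JSON above appears to hold the camera parameters together with the vertices of the approximated polygon. As a hedged sketch of how such a file might be consumed, the snippet below parses an abbreviated copy of it, rebuilds the 3x3 intrinsic matrix from `fx`, `fy`, `cx`, and `cy`, flattens the nested `vertices` entries into plain 2D points, and back-projects a pixel with the standard pinhole model. The `back_project` helper is hypothetical, works in camera coordinates only, and is not necessarily how `createObj` builds its mesh.

```python
import json

# Abbreviated copy of the structure shown above (extrinsic omitted for brevity).
sample = """
{
  "camera": {"intrinsic": {"width": 1242, "height": 822,
                           "fx": 711.8728819108087, "fy": 711.8728819108087,
                           "cx": 620.5, "cy": 410.5}},
  "vertices": [[[0.461700439453125, -0.45669275522232056]],
               [[-0.5396929979324341, -0.4580613672733307]],
               [[-0.5212900042533875, 0.35700321197509766]],
               [[0.43428346514701843, 0.36788055300712585]]]
}
"""

data = json.loads(sample)
intr = data["camera"]["intrinsic"]

# Pinhole intrinsic matrix assembled from the scalar fields.
K = [
    [intr["fx"], 0.0, intr["cx"]],
    [0.0, intr["fy"], intr["cy"]],
    [0.0, 0.0, 1.0],
]

# Each vertex is wrapped in an extra one-element list; unwrap to (a, b) pairs.
polygon = [tuple(wrapper[0]) for wrapper in data["vertices"]]


def back_project(u, v, depth):
    """Pinhole back-projection of pixel (u, v) at the given depth.

    Hypothetical helper; returns a point in camera coordinates.
    """
    x = (u - intr["cx"]) * depth / intr["fx"]
    y = (v - intr["cy"]) * depth / intr["fy"]
    return (x, y, depth)


print(len(polygon), "polygon vertices")
print(back_project(intr["cx"], intr["cy"], 2.0))  # principal point lies on the optical axis
```

In this sample the principal point sits at the image centre, since `cx` equals (width - 1) / 2 and `cy` equals (height - 1) / 2.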