Object annotation in images (bounding box)


I have been working with JOGL for a month. My current task is to render an image and mark all objects in it. The pictures show a street environment with different objects (cars, lines). In a later step all objects should be marked, but for debugging purposes the only marked object is a blue car (2D for now, later 3D with texture).

In the following picture you can see that creating the picture already works, but the marking of objects has its problems.

In this case I think a proper bounding box can still be calculated, but in the next image, in which I moved the camera only a few steps forward, it is much worse.

To obtain the bounding box of the car, I have an object called hitbox, which holds the corner points of that object. These 3D corner points are mapped to 2D image points with the gluProject function. For the first image I got these 4 mapped points: { x1: 849 y1: 314 ; x2: 7310 y2: 4637 ; x3: 1874 y3: 3849 ; x4: 676 y4: 313 }. For the second image I got these 4 points, which explain why the bounding box is wrong: { x1: 861 y1: 322 ; x2: -12840 y2: -8833 ; x3: -4760 y3: -15638 ; x4: 679 y4: 321 }

Points 2 and 3 of the first image have very large coordinates, which is expected because they lie outside the image, to the lower left. In the second image, however, these outer points show something like an overflow, which leads to the bad bounding box.

I hope my problem is clear and someone can help me.

best wishes

P.S.: Please forgive my bad English.

Re: Object annotation in images (bounding box)


Guessing what is wrong without looking at your source code is too difficult in my humble opinion.
Julien Gouesse

Re: Object annotation in images (bounding box)

Thank you for the response. Of course I can post some code:

These two are the main functions that compute the rectangle of an object in the image.

/**
 * @param gl The current GL context, needed here to transform the bounding points into image points.
 * @param activeCamera The current camera that sees the scene and holds the projection matrix and model view matrix. It
 * provides the method to transform image points.
 * @param objectsInView A list of objects that were detected in a previous step.
 */
public void annotate(GL2 gl, Scene3DCamera activeCamera, List<ISimulatorObject> objectsInView) {
        for (ISimulatorObject object : objectsInView) {
                if (object instanceof CarSimulator) {
                        //|| object instanceof Car || object instanceof Person) {
                        // rotate the hitbox points into the pose of the object TODO should already be done.
                        List<Vector3D> boundingPoints = addPose(object.getHitbox().toPolygon().getPoints(), object.getPose());

                        // add the upper points of a 3D object
                        if (object.getHitbox().is3D()) {
                                double z = object.getHitbox().getLength();
                                int size = boundingPoints.size();
                                for (int i = 0; i < size; i++) {
                                        Vector3D v = boundingPoints.get(i);
                                        boundingPoints.add(new Vector3D(v.getX(), v.getY(), v.getZ() + z));
                                }
                        }
                        // TODO
                        // objectPointsBehindCamera(activeCamera, boundingPoints);
                        // show me the points of an object
                        boundingPoints.forEach(v -> GL3DUtils.drawPoint(gl, v, 0.02, Color.GREEN));
                        List<Vector3D> imagePoints = activeCamera.transformToImagePoint(gl, boundingPoints);

                        // TODO this should be done by a thread pool
                        rectangleObject(activeCamera, object, imagePoints);
                }
        }

        objects.forEach(p -> System.out.println(p)); // print each object and rectangle that was added
        System.out.println("ObjectCount: " + objects.size());
}

/**
 * @param activeCamera The current camera that sees the scene.
 * @param object One of the objects in the scene. An object can be 2D (4 imagePoints) or 3D (8 imagePoints).
 * @param imagePoints The transformed bounding points of the object. Only the x and y coordinates are relevant.
 */
private void rectangleObject(Scene3DCamera activeCamera, ISimulatorObject object, List<Vector3D> imagePoints) {
        int width = activeCamera.getViewPortWidth();
        int height = activeCamera.getViewPortHeight();

        int xMin = width - 1;
        int yMin = height - 1;

        int xMax = 0;
        int yMax = 0;

        for (Vector3D ip : imagePoints) {
                int x = (int) ip.getX();
                int y = (int) ip.getY();

                System.out.println("x: " + x + " y:" + y);

                // clamp the point to the viewport
                if (x < 0) {
                        x = 0;
                } else if (x > width - 1) {
                        x = width - 1;
                }

                if (y < 0) {
                        y = 0;
                } else if (y > height - 1) {
                        y = height - 1;
                }

                if (xMin > x)
                        xMin = x;
                if (yMin > y)
                        yMin = y;
                if (xMax < x)
                        xMax = x;
                if (yMax < y)
                        yMax = y;
        }
        System.out.println("xMin: " + xMin + " yMin:" + yMin);
        System.out.println("xMax: " + xMax + " yMax:" + yMax);

        Vector2D point = new Vector2D(xMin, yMin);
        Vector2D size = new Vector2D(xMax - xMin, yMax - yMin);
        if (size.getX() > 2 && size.getY() > 2) // skip objects that are too small
                addObject(object, point, size);
}

The next two methods belong to Scene3DCamera. The first is called before any calculations are executed, to obtain the new matrices and the frustum of the camera.

/**
 * @param gl The current GL render context. Needed to get all 3 matrices (projection, model view and viewport).
 * These matrices are needed to determine the frustum of the current view. They are also used to transform 3D world
 * points to image points.
 */
public void update(GL2 gl) {
        gl.glGetFloatv(GL2.GL_PROJECTION_MATRIX, projection, 0);
        gl.glGetFloatv(GL2.GL_MODELVIEW_MATRIX, modelView, 0);
        gl.glGetIntegerv(GL2.GL_VIEWPORT, viewport, 0);

        frustum.updateByPMV(Mat4Util.matMulf(projection, modelView), 0);
        frustumReady = true;
}

And the last function, which is called in the annotate method to transform the bounding points of an object to image points.

/**
 * @param gl The current GL context. Needed here to create a GLU object, which provides the gluProject function.
 * This function performs the calculation that maps a 3D world point to a 2D image point.
 * @param worldPoints A list of points that belong to an object.
 * @return List of image points.
 */
public List<Vector3D> transformToImagePoint(GL2 gl, List<Vector3D> worldPoints) {
        update(gl); // not needed, because it has already been done when this method is called

        GLU glu = GLU.createGLU(gl);
        int height = viewport[3];

        List<Vector3D> imagePoints = new ArrayList<>();

        worldPoints.forEach(p -> {
                float[] winPos = new float[3];
                glu.gluProject((float) p.getX(), (float) p.getY(), (float) p.getZ(), modelView, 0, projection, 0, viewport, 0, winPos, 0);
                // flip y so that the origin is the top-left corner of the image
                imagePoints.add(new Vector3D(winPos[0], height - winPos[1], winPos[2]));
        });
        return imagePoints;
}

My academic advisor has pointed out that the points of an object that cause the wrong rectangle lie behind the camera...
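A minimal sketch of how such points could be detected, assuming a standard perspective projection (glFrustum/gluPerspective style, where the clip-space w equals the negated eye-space z) and matrices in OpenGL's column-major layout, as glGetFloatv returns them. gluProject divides by the clip-space w; for a point behind the camera w is negative, so the division mirrors the result, and close to w = 0 the values become very large. The class BehindCameraCheck and its methods are hypothetical helpers, not part of JOGL or of the code above.

```java
/** Hypothetical helper, not part of JOGL or of the posted code. */
final class BehindCameraCheck {

    /** Multiplies a column-major 4x4 matrix (OpenGL layout) with the column vector v. */
    static float[] mulMatVec(float[] m, float[] v) {
        float[] r = new float[4];
        for (int row = 0; row < 4; row++) {
            r[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
        }
        return r;
    }

    /** Returns clip coordinates (x, y, z, w) = projection * modelView * (x, y, z, 1). */
    static float[] toClip(float[] modelView, float[] projection, float x, float y, float z) {
        float[] eye = mulMatVec(modelView, new float[] {x, y, z, 1f});
        return mulMatVec(projection, eye);
    }

    /** Assumes a perspective projection: a positive clip w means the point is in front of the camera. */
    static boolean isInFront(float[] clip) {
        return clip[3] > 0f;
    }
}
```

With the modelView and projection arrays filled by update(gl), each bounding point could be passed through toClip first, and only points for which isInFront returns true would be handed to gluProject.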

Maybe I can remove the wrong points and determine the rectangle from the rest?! I have to try it, or do you know a better approach?
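Simply dropping the points behind the camera can make the rectangle too small, because the visible part of an edge that crosses the camera plane is lost. A possible alternative, sketched here under assumptions, is to clip every edge of the hitbox against the plane w = W_EPS in clip space and to build the rectangle from the surviving and the newly created points. The class NearClipBounds, the constant W_EPS, the edge index list and the viewport at origin (0, 0) are all assumptions for illustration and not part of JOGL or of the code above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: bounding rectangle of a wireframe clipped against w >= W_EPS. */
final class NearClipBounds {

    /** Small positive w that keeps the perspective divide well defined (assumed value). */
    static final float W_EPS = 1e-4f;

    /** Clips the segment a-b (clip coordinates x, y, z, w) and adds the visible points to out. */
    static void clipEdge(float[] a, float[] b, List<float[]> out) {
        boolean aIn = a[3] >= W_EPS;
        boolean bIn = b[3] >= W_EPS;
        if (aIn) out.add(a);
        if (bIn) out.add(b);
        if (aIn != bIn) {
            // the edge crosses the plane: interpolate the intersection point
            float t = (W_EPS - a[3]) / (b[3] - a[3]);
            float[] p = new float[4];
            for (int i = 0; i < 4; i++) p[i] = a[i] + t * (b[i] - a[i]);
            out.add(p);
        }
    }

    /** Returns {xMin, yMin, xMax, yMax} in pixels (origin top-left), or null if nothing is in front. */
    static int[] boundingRect(List<float[]> clipPoints, int[][] edges, int width, int height) {
        List<float[]> kept = new ArrayList<>();
        for (int[] e : edges) clipEdge(clipPoints.get(e[0]), clipPoints.get(e[1]), kept);
        if (kept.isEmpty()) return null;

        int xMin = width - 1, yMin = height - 1, xMax = 0, yMax = 0;
        for (float[] c : kept) {
            // perspective divide, then viewport mapping with flipped y
            float wx = (c[0] / c[3] * 0.5f + 0.5f) * width;
            float wy = (1f - (c[1] / c[3] * 0.5f + 0.5f)) * height;
            // clamp to the viewport, as rectangleObject does
            int x = (int) Math.max(0f, Math.min(width - 1, wx));
            int y = (int) Math.max(0f, Math.min(height - 1, wy));
            xMin = Math.min(xMin, x);
            yMin = Math.min(yMin, y);
            xMax = Math.max(xMax, x);
            yMax = Math.max(yMax, y);
        }
        return new int[] {xMin, yMin, xMax, yMax};
    }
}
```

The clip coordinates would come from multiplying each bounding point by the model view and projection matrices instead of calling gluProject. Clamping to the viewport afterwards is a simplification: it can overestimate the rectangle compared with clipping against all frustum planes.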

best wishes