Parsing Images of Architectural Scenes, ICCV 2007

Alex Berg, Floraine Grabler, Jitendra Malik


We address image parsing in the setting of architectural scenes. Our goal is to parse an image into regions of various types such as sky, foliage, buildings, and street. Furthermore, we parse the building regions at a finer level of detail, identifying the positions of windows, doors, and rooflines, the colors of walls, and the spatial extent of particular buildings. Recognizing these individual elements is often impossible without the context provided by the initial parsing of the image. For instance, a roofline is only defined in relation to the building below and the sky above. Our approach is driven by recognition of generic classes of visual appearance, such as foliage. The generic recognition results bootstrap an image-specific model that provides refined estimates to use for matting, segmentation, and more detailed parsing.
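The bootstrapping idea can be illustrated with a minimal sketch. It is not the method from the paper: it assumes a generic classifier has already produced per-pixel class scores, and it swaps in a simple diagonal-Gaussian color model per class as the image-specific model. The function name `bootstrap_color_models`, the `confidence` threshold, and the array layouts are all hypothetical.

```python
import numpy as np


def bootstrap_color_models(pixels, generic_probs, confidence=0.8):
    """Relabel pixels using color models fitted to this image only.

    pixels:        (N, 3) float array of pixel colors.
    generic_probs: (N, K) float array of class scores from a generic
                   (image-independent) classifier. Assumed input.
    confidence:    hypothetical threshold for trusting a generic label.
    Returns an (N,) int array of refined class labels.
    """
    n_pixels, n_classes = generic_probs.shape
    generic_labels = generic_probs.argmax(axis=1)
    confident = generic_probs.max(axis=1) >= confidence

    # Log-likelihood of every pixel under each class's color model.
    log_lik = np.full((n_pixels, n_classes), -np.inf)
    for k in range(n_classes):
        # Seed the image-specific model with confidently labeled pixels.
        seeds = pixels[confident & (generic_labels == k)]
        if len(seeds) < 2:
            continue  # too few seeds to fit a model for this class
        mean = seeds.mean(axis=0)
        var = seeds.var(axis=0) + 1e-6  # small floor avoids division by zero
        log_lik[:, k] = -0.5 * (
            (pixels - mean) ** 2 / var + np.log(2 * np.pi * var)
        ).sum(axis=1)

    # If no class could be modeled, fall back to the generic labels.
    if not np.isfinite(log_lik).any():
        return generic_labels
    return log_lik.argmax(axis=1)
```

In this sketch, pixels that the generic classifier was unsure about are assigned to whichever class best explains their color in this particular image, which is the sense in which the generic results bootstrap a refined estimate.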

Top: We begin by parsing the original image into five visual categories (sky, building, foliage, street, and sky-mixed). Bottom: We then perform a detailed parse to compute the roofline, the building and roof boundaries, and the windows. In addition, we estimate color models for the walls and the roof of the building.
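As a rough illustration of why the roofline depends on the coarse parse, the sketch below reads a roofline off a per-pixel label map by finding, in each image column, the first building pixel that lies directly beneath a sky pixel. This is an assumed simplification rather than the procedure used in the paper; the label ids `SKY` and `BUILDING` and the function `roofline_from_labels` are hypothetical.

```python
import numpy as np

SKY, BUILDING = 0, 1  # hypothetical label ids for the coarse parse


def roofline_from_labels(labels):
    """Estimate a roofline from a coarse label map.

    labels: (H, W) int array of per-pixel category ids, row 0 at the top.
    Returns a length-W int array giving, for each column, the row of the
    topmost building pixel that sits directly below a sky pixel, or -1
    if the column has no such sky-to-building transition.
    """
    above_is_sky = labels[:-1] == SKY
    below_is_building = labels[1:] == BUILDING
    transition = above_is_sky & below_is_building  # shape (H - 1, W)

    has_roof = transition.any(axis=0)
    # argmax returns the first True row; +1 moves to the building pixel.
    first_row = transition.argmax(axis=0) + 1
    return np.where(has_roof, first_row, -1)
```

Without the sky and building labels there is nothing for the transition test to act on, which mirrors the point that the roofline is defined only relative to the regions around it.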

Research Paper: [PDF]