Understanding Formal Arrow-Connected Diagrams and Free-form Sketches

Bresler, Martin

Document type

Doctoral thesis

Author

Bresler, Martin

Supervisors

Hlaváč, Václav

Průša, Daniel

Field of study

Artificial Intelligence and Biocybernetics

Study programme

Electrical Engineering and Information Technology

Degree-granting institution

Czech Technical University in Prague. Faculty of Electrical Engineering. Department of Cybernetics.

Abstract

Drawing has been a natural way for humans to express their thoughts and ideas since ancient times. People have become used to creating and understanding illustrations. Many well-defined formal notations have evolved, including technical drawings, musical scores, and various diagrams. On the other hand, people often use free-form sketches or ad hoc intuitive notations. Electronic devices equipped with a touch screen are part of our daily lives and make it easier to create and share drawings. Naturally, methods for automatic recognition and understanding of drawings are increasingly desired. Recognition is much easier, and more desirable, in the case of formal drawings consisting of well-defined entities. A computer system can work further with such a recognized drawing: beautify the drawing, rearrange a diagram, perform some actions according to a diagram, or manufacture something according to a technical drawing. On the other hand, recognition is extremely difficult and potentially unnecessary in the case of free-form sketches. It is often enough to provide the user with tools for easier creation and manipulation of sketches. The reason is that these sketches are meant to be understood by humans only, and the computer system serves for their creation, storage, and sharing. We can thus classify drawing into formal drawings and free-form sketching. This thesis deals with two tasks: recognition of formally defined diagrams and segmentation of objects of interest in free-form sketches. These two apparently different topics are strongly related. We assume that the drawing is created on an electronic device and thus it is in the form of a sequence of strokes rather than a raster image. We propose a recognition framework for arrow-connected diagrams. We introduce a model for recognition by selection of symbol candidates, based on evaluation of relations between candidates using a set of predicates.
It is suitable for simpler structures, in which the relations are explicitly given by symbols, arrows in the case of diagrams. Knowledge of a specific diagram domain is used. The two domains are flowcharts and finite automata. We created a benchmark database of diagrams from these two domains. Although the individual pipeline steps are tailored for these, the system can be readily adapted for other domains. The recognition pipeline consists of the following major steps: text/non-text separation, symbol segmentation, symbol classification, and structural analysis. We performed a comparison with state-of-the-art methods for recognition of flowcharts and finite automata and verified that our approach outperforms them. We also analysed our system thoroughly and identified the most frequent causes of recognition failures. The situation is more complicated in the case of free-form sketching, where the user can draw and write anything freely. We cannot expect any particular structure. We can often find a combination of pictorial drawing with more structured figures and text in the form of labels. The classical segmentation by recognition cannot be used here. In some cases, the understanding of a sketch might be difficult even for humans. Specifically, the identification of an individual object in a drawing may differ from user to user and from case to case. Recent research showed that single-linkage agglomerative clustering of strokes with a trainable distance function can be used for segmentation of objects from predefined symbol classes in formal drawings. The proper distance function can be learned from annotated data. This requires a lot of data and time. We propose an approach combining several pre-trained distance functions for particular structures to segment objects in free-form sketches. We show that the desired segmentation can be achieved in many cases by combining distance functions trained for very general object types like rows, columns, words, or compact images.
We also show that the best combination of distance functions can be found from very limited data in real time. We propose a segmentation approach that estimates the optimal combination of clustering distance functions from an initial selection of one object. It results in segmentation of objects that have similar characteristics to the initial one. Based on this approach, we designed a selection tool with additional functionality that allows the user to select and manipulate the segmented objects seamlessly. The method is suitable for fast rearrangement of sketches during collaborative content creation (brainstorming).
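
The idea of segmenting strokes by single-linkage clustering under a combination of distance functions can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state how the distance functions are trained or combined, so the weighted sum, the two hand-written distances (`row_distance`, `column_distance`), the threshold, and all other names below are assumptions chosen for illustration only. Strokes are assumed to be lists of (x, y) points.

```python
from itertools import combinations


def bbox(stroke):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a stroke given as (x, y) points."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def gap(lo_a, hi_a, lo_b, hi_b):
    """Gap between two intervals; 0 when they overlap."""
    return max(0.0, max(lo_a, lo_b) - min(hi_a, hi_b))


def row_distance(a, b):
    # Hypothetical stand-in for a distance trained on rows:
    # horizontal gaps are cheap, vertical gaps are expensive.
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    return gap(ax0, ax1, bx0, bx1) + 5.0 * gap(ay0, ay1, by0, by1)


def column_distance(a, b):
    # Hypothetical stand-in for a distance trained on columns:
    # vertical gaps are cheap, horizontal gaps are expensive.
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    return 5.0 * gap(ax0, ax1, bx0, bx1) + gap(ay0, ay1, by0, by1)


def combined_distance(a, b, weighted):
    """Weighted sum of distance functions (the combination rule is an assumption)."""
    return sum(w * d(a, b) for d, w in weighted)


def single_linkage(strokes, distance, threshold):
    """Single-linkage clustering cut at `threshold`.

    Cutting a single-linkage dendrogram at a threshold yields the connected
    components of the graph that links every pair with distance <= threshold,
    so a union-find structure is enough.
    """
    parent = list(range(len(strokes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(strokes)), 2):
        if distance(strokes[i], strokes[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(strokes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Three strokes side by side and one stroke below the first.
strokes = [
    [(0, 0), (1, 1)],
    [(2, 0), (3, 1)],
    [(4, 0), (5, 1)],
    [(0, 3), (1, 4)],
]

row_weights = [(row_distance, 1.0), (column_distance, 0.0)]
rows = single_linkage(
    strokes, lambda a, b: combined_distance(a, b, row_weights), threshold=1.5
)

column_weights = [(row_distance, 0.0), (column_distance, 1.0)]
columns = single_linkage(
    strokes, lambda a, b: combined_distance(a, b, column_weights), threshold=2.5
)
```

Changing only the weights switches the grouping of the same strokes from a row-like to a column-like segmentation. In the approach described in the abstract, the combination is instead estimated from an initial selection of one object; that estimation step is not shown here.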

The following license files are associated with this record:

Original license