A modular algorithm for generating images of full-text documents is presented. These images can be used to train, test, and evaluate Optical Character Recognition (OCR) models.
The algorithm is modular: individual parts can be changed and tweaked to generate the desired images. A method for obtaining paper background images from already digitized documents is described. For this, a novel approach based on a Variational Autoencoder (VAE) was used to train a generative model.
This model makes it possible to generate background images similar to the training ones on the fly. The module for printing the text uses large text corpora, a font, and suitable positional and brightness character noise to obtain believable results for natural-looking aged documents.
Several page layouts are supported. The system generates a detailed, structured annotation of the synthesized image. Tesseract OCR is used to compare real-world images with the generated ones. The recognition rates are very similar, indicating the realistic appearance of the synthetic images; moreover, the errors made by the OCR system in both cases are very similar. A fully convolutional encoder-decoder neural network for semantic segmentation of individual characters was trained on the generated images, reaching a recognition accuracy of 99.28% on a test set of synthetic documents.
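The per-character positional and brightness noise applied by the printing module can be sketched as follows. This is a minimal illustration in Python; the field names (`glyph`, `x`, `y`, `brightness`) and parameter ranges are assumptions for the sketch, not the authors' API:

```python
import random

def jitter_characters(chars, max_shift=1, max_brightness_delta=20, seed=0):
    """Apply per-character positional and brightness noise, in the spirit
    of the printing module described above. `chars` is a list of dicts
    with ideal baseline positions `x`, `y` and a grey level `brightness`
    (0-255). All names and ranges here are illustrative placeholders."""
    rng = random.Random(seed)
    noisy = []
    for c in chars:
        noisy.append({
            "glyph": c["glyph"],
            # small random offset from the ideal baseline position
            "x": c["x"] + rng.randint(-max_shift, max_shift),
            "y": c["y"] + rng.randint(-max_shift, max_shift),
            # brightness perturbation, clamped to the valid 0-255 range
            "brightness": max(0, min(255,
                c["brightness"] + rng.randint(-max_brightness_delta,
                                              max_brightness_delta))),
        })
    return noisy
```

A fixed seed keeps the noise reproducible across runs, which is useful when the synthesized image must match its structured annotation exactly.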
The paper is devoted to speech recognition technology developed at the Artificial Intelligence Institute (Donetsk, Ukraine). It is based on the following main stages: segmentation using a digital analogue of total variation; creation of a diphone database; and DTW-based recognition of words using diphone templates. The technology can be used for large-vocabulary speech recognition as well as for the development of text editors with voice input.
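The DTW matching step at the core of the word recognition stage can be sketched as the textbook dynamic-programming recurrence. This is illustrative only: the abstract does not specify the institute's diphone features or local cost, so scalar sequences and an absolute-difference cost are assumed:

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two feature
    sequences, the kind of matching used to compare an utterance
    against diphone templates. A textbook sketch, not the
    institute's implementation."""
    n, m = len(a), len(b)
    INF = float("inf")
    # d[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Recognition then amounts to picking the template with the smallest DTW distance to the input; in practice the scalars would be vectors of acoustic features with a vector-norm local cost.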
The article addresses the problem of verifying oil spills on the water surfaces of rivers, seas, and oceans using deep learning methods applied to optical aerial photographs obtained from cameras of unmanned aerial vehicles. The specificity of this problem is the presence on water surfaces of areas visually similar to oil spills, caused by blooms of specific algae, by substances that do not cause environmental damage (for example, palm oil), or by glare during shooting (so-called look-alikes). Many studies in this area are based on the analysis of synthetic aperture radar (SAR) images, which do not provide accurate classification and segmentation. Follow-up verification helps reduce environmental and property damage, and oil spill size monitoring is used to make further response decisions. A new approach to the verification of optical images as a binary classification problem based on a Siamese network is proposed, in which a fragment of the original image is repeatedly compared with representative examples from the class of marine oil slicks. The Siamese network is based on the lightweight VGG16 network. When the threshold value of the output function is exceeded, a decision is made about the presence of an oil spill. To train the networks, we collected and labeled our own dataset from open Internet resources. A significant problem is the class imbalance in the dataset, which required augmentation methods based not only on geometric and color manipulations but also on a Generative Adversarial Network (GAN). Experiments have shown that the classification accuracy for oil spills and look-alikes on the test set reaches 0.91 and 0.834, respectively. An additional problem of accurate semantic segmentation of an oil spill is then solved using convolutional neural networks (CNNs) of the encoder-decoder type. Three deep network architectures, U-Net, SegNet, and Poly-YOLOv3, were explored for segmentation.
The Poly-YOLOv3 network demonstrated the best results, reaching an accuracy of 0.97 with an average image processing time of 385 s on the Google Colab web service. A database was also designed to store both original and verified images with problem areas.
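The threshold decision in the Siamese verification scheme can be sketched as follows. This is a toy illustration: the embeddings and the 0.8 threshold are placeholders, and the actual system computes embeddings with a VGG16-based Siamese branch rather than taking precomputed vectors:

```python
import math

def verify_oil_spill(query_vec, reference_vecs, threshold=0.8):
    """Compare the embedding of a query image fragment against
    embeddings of representative oil-slick examples and report a
    spill if the best similarity exceeds the threshold, mirroring
    the repeated-comparison scheme described above. The cosine
    similarity and the 0.8 threshold are illustrative assumptions."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv)
    # take the best match over all representative examples
    best = max(cosine(query_vec, r) for r in reference_vecs)
    return best > threshold
```

Comparing against several representatives, rather than a single prototype, makes the decision robust to the visual variability of slicks that motivates the Siamese formulation.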
The article presents the application of a statistical analysis algorithm to multi-temporal multispectral aerial photography data to identify areas of historical anthropogenic impact on the natural environment. The investigated site is located on the outskirts of the urban-type settlement of Znamenka (Znamensky District, Tambov Region) in a forest-steppe zone with typical chernozem soils, where arable lands were located in the second half of the 19th and early 20th centuries. Vegetation grown as a result of secondary succession in abandoned areas can serve as an indicator of historical anthropogenic impact; such vegetation differs from the surrounding natural environment in type, age, and growth density. Thus, the problem of detecting the boundaries of anthropogenic impact on multispectral images reduces to a vegetation classification problem. The initial data were the results of multi-temporal multispectral imaging in the green (Green), red (Red), red-edge (RedEdge), and near-infrared (NIR) spectral ranges. The first stage of the algorithm is the calculation of Haralick texture features on the multispectral images; the second stage reduces the number of features by principal component analysis; the third stage segments the images based on the obtained features using the k-means method. The effectiveness of the proposed algorithm is shown by comparing the segmentation results with reference data from historical cartographic materials. The study of multi-temporal multispectral images makes it possible to characterize more fully the dynamics of phytomass growth in different periods of the growing season. Therefore, the obtained segmentation result reflects not only the configuration of areas of anthropogenically transformed natural environment but also the pattern of overgrowth of abandoned arable land.
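The last two stages of the three-stage pipeline (feature reduction by PCA, then k-means segmentation) can be sketched schematically. In this illustrative sketch the input matrix stands in for per-pixel Haralick texture features, and the deterministic farthest-point initialization is an assumption for reproducibility, not the authors' implementation:

```python
import numpy as np

def pca_kmeans_segment(features, n_components=2, k=2, n_iter=20):
    """Stages 2-3 of the pipeline above: PCA feature reduction
    followed by k-means segmentation. `features` is an (n_pixels,
    n_features) matrix standing in for Haralick texture features;
    all parameters here are illustrative."""
    X = np.asarray(features, dtype=float)
    # stage 2: PCA via SVD of the centered feature matrix
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_components].T
    # stage 3: k-means with deterministic farthest-point initialization
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers],
                      axis=0)
        centers.append(Z[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each pixel to its nearest center, then recompute means
        labels = np.argmin(
            ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2),
            axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

The resulting label map, reshaped back to the image grid, gives the segment boundaries that are then compared against the historical cartographic reference data.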
Descriptions and results of a series of computational experiments devoted to the analysis of the lag effect in chaotic processes are presented. The article continues the research reported in [1]. The essential difference from that work is the rejection of segmentation of the studied process change region, which allows more flexible tuning of the system for analyzing the lag effect of chaotic dynamics. The earlier conclusions about the existence of a lag effect in smoothed dynamics are confirmed. The possibility of constructing an effective control strategy on the basis of these conclusions requires additional research into the dynamic properties of the trend.
The fundamental problem of the existence of an inertia effect in quasi-chaotic processes is considered on the basis of a series of computational experiments. Long intervals of observations of currency-instrument quotations on the electronic Forex market are used as the data set. Segmentation of the observed dynamics is applied to support visualization. It is established that the hypothesis of the existence of an inertia effect is confirmed only for the smoothed process.
A method for accelerating hierarchical image segmentation algorithms is proposed. The method is applicable when the functional of the decision rule does not require recalculation of segment features at each iteration.
We consider the MALDI imaging segmentation problem and propose an approach based on graphical models (an LDA model and Markov random fields). We consider several modifications of the proposed approach, compare it with previously known methods, and highlight several advantages of the proposed approach.
The paper presents a survey of assistive smart spaces and ambient assisted living environments, together with the design of a multimodal assistive system for a smart living environment. The system consists of two software complexes. The first provides video signal processing and surveillance for detecting and tracking a user as well as analyzing his or her activity. The second provides audio signal processing for automatic recognition of speech messages and non-speech acoustic events. The developed automatic speech recognition system is multilingual and is able to recognize words in both English and Russian. In the experiments, 2811 wave files with speech commands and simulated acoustic events were recorded in total. The recognition rates for speech commands and non-speech acoustic events were 96.5% and 93.8%, respectively.