Generating images from textual input is a major open problem in computer vision. Solving it would support many applications and would also benefit related deep learning techniques. Variational Autoencoders (VAEs) address the generation problem by formulating it as a probabilistic graphical model and maximizing a lower bound on the data likelihood. Autoregressive models such as PixelRNN use neural networks to model the conditional distribution of the pixel space, and they have also been used to generate synthetic images.
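As a brief formalization of the two model families mentioned above, the following sketch uses notation that is an assumption introduced here, not taken from the original text: x is an image, z a latent variable, q_φ(z | x) an approximate posterior, p(z) a prior, and x_i the i-th pixel of an n × n image in a fixed ordering. Under these assumptions, the lower bound maximized by a VAE and the factorization used by autoregressive models such as PixelRNN are commonly written as:

```latex
\[
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
- D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
\]
\[
p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
\]
```

The first expression is a bound on the data log-likelihood rather than the likelihood itself. The second is an exact factorization, but it implies that pixels are generated sequentially.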
More recently, Generative Adversarial Networks (GANs) have produced promising results and generated sharp images. However, GANs are hard to train, and training becomes unstable when generating high-resolution images such as 256 × 256. Much research has addressed this issue, and many techniques have been proposed to stabilize training and produce high-resolution images. Other methods have been applied to improve image quality. Conditional image generation models condition the sampling process on variables such as class labels. There is also a large body of work that generates images conditioned on other images. Software that automatically edits photographs has emerged from this line of work. However, these applications have limitations: they still cannot produce high-resolution images, and they cannot correct large defects in an image. Low resolution therefore remains a major obstacle when large defects need to be corrected.
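For reference, the adversarial training and the class-label conditioning described above are often summarized by the following objectives. This is a sketch of the commonly used formulation, not necessarily the exact one used by any method discussed here. The symbols are assumptions introduced for illustration: G is the generator, D the discriminator, z a noise vector drawn from a prior p_z, and y a conditioning variable such as a class label.

```latex
\[
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
\]
\[
\min_G \max_D \;
\mathbb{E}_{(x, y) \sim p_{\mathrm{data}}}\big[\log D(x \mid y)\big]
+ \mathbb{E}_{z \sim p_z,\; y \sim p_{\mathrm{data}}}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
\]
```

The first line is the unconditional objective. The second conditions both networks on y. Because G and D are optimized against each other, the training signal for one depends on the current state of the other, which is one intuition for why training can become unstable.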
Several methods have been proposed to generate images from textual input, and a range of algorithms and ideas have been applied to produce high-quality results. Reed et al. used PixelCNN to generate images from text together with object location information. PixelCNN stands for pixel convolutional neural network. It is an autoregressive model that generates an image one pixel at a time, and its major drawback is the long computing time. This cost severely limits its use in practical applications.
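To illustrate why the computing time of a pixel-by-pixel model grows quickly, here is a minimal sketch of sequential sampling. It is not PixelCNN: `toy_conditional` is a hypothetical stand-in for a network forward pass, and the binary image and neighbour rule are assumptions chosen only to keep the example small.

```python
import random


def toy_conditional(image, row, col):
    """Hypothetical stand-in for one model forward pass.

    Returns P(pixel = 1 | pixels generated so far). A real model would run a
    masked convolutional network; this toy rule only looks at the left and
    upper neighbours.
    """
    left = image[row][col - 1] if col > 0 else 0
    up = image[row - 1][col] if row > 0 else 0
    return 0.2 + 0.3 * left + 0.3 * up


def sample_image(height, width, rng):
    """Sample a binary image in raster order, one pixel per model call."""
    image = [[0] * width for _ in range(height)]
    n_calls = 0
    for row in range(height):
        for col in range(width):
            # Each pixel depends on earlier pixels, so calls cannot be batched.
            p_one = toy_conditional(image, row, col)
            n_calls += 1
            image[row][col] = 1 if rng.random() < p_one else 0
    return image, n_calls


image, n_calls = sample_image(8, 8, random.Random(0))
print(n_calls)  # 64: one sequential call per pixel
```

Under the same scheme, a 256 × 256 image would need at least 65,536 sequential evaluations, which gives a sense of the cost noted above.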
AlignDRAW, developed by Mansimov et al., is another model, which estimates the alignment between the text and the generating canvas. The caption is first encoded with a bidirectional RNN. A generative RNN then combines a latent sample with a dynamic representation of the caption to update a canvas matrix, from which the final image is produced. Nguyen et al. used approximate Langevin sampling to generate images conditioned on textual input. However, this sampling approach is not straightforward: it is time-consuming and requires many iterations of optimization. Reed et al. generated images of flowers and birds from textual input at a resolution of 64 × 64. They later extended this work to generate images of up to 128 × 128 by adding annotations of object locations. Generating high-quality images with generative adversarial networks still faces many problems, the main one being the difficulty of modeling the images. Generative parametric models have been used to produce high-quality images. One such approach breaks the generation process into stages, using a series of convolutional networks within a Laplacian pyramid framework.
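The coarse-to-fine structure of a Laplacian pyramid can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any cited work: it uses 2 × 2 block averaging and nearest-neighbour upsampling in place of the Gaussian filtering usually used, and it computes each residual from a known image, whereas a generative model would have a network produce the residual at each stage. All function names are hypothetical.

```python
import numpy as np


def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumes even sizes)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def build_laplacian_pyramid(img, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarsest_image]."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # The residual holds the detail lost when going to the coarser level.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Coarse to fine: upsample, then add back the residual of each stage."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current


rng = np.random.default_rng(0)
image = rng.random((256, 256))
pyramid = build_laplacian_pyramid(image, levels=3)
shapes = [level.shape for level in pyramid]
print(shapes)  # [(256, 256), (128, 128), (64, 64), (32, 32)]
restored = reconstruct(pyramid)
print(np.allclose(restored, image))  # True
```

Each stage only has to account for the detail missing at its own resolution, which is the property that makes a staged generation scheme attractive.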
In the Laplacian pyramid approach, generating a high-quality 256 × 256 image is divided among several networks, one for each level of the pyramid. A generative adversarial network is used at every stage. Before moving to the next stage, the image generated so far is used as the conditioning input for that stage. Refining the image at each stage in this way increases the quality of the final result. The energy-based generative adversarial network (EBGAN) has also been used to generate high-quality images. It treats the discriminator as an energy function that assigns low energy to regions close to the data manifold and high energy to points far from it. This model mainly helps stabilize training, and the more stable training has produced effective results. Training has nonetheless remained a major problem in generating the required images.
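The low-energy and high-energy idea above is often expressed with a margin-based loss. The sketch below shows one commonly cited form; the text does not state the exact loss, so the margin formulation, the function names, and the example energy values are all assumptions made for illustration. Energies are plain non-negative numbers here, whereas in practice they would be produced by a network.

```python
def discriminator_loss(energy_real, energy_fake, margin):
    """Lower the energy of real samples; raise the energy of generated
    samples, but only until it reaches the margin (hinge term)."""
    return energy_real + max(0.0, margin - energy_fake)


def generator_loss(energy_fake):
    """The generator is rewarded for samples that receive low energy."""
    return energy_fake


margin = 1.0  # hypothetical margin value

# Generated sample still has energy below the margin: the hinge is active.
loss_active = discriminator_loss(0.25, 0.5, margin)
# Generated sample already has energy above the margin: the hinge is zero.
loss_inactive = discriminator_loss(0.25, 1.5, margin)

print(loss_active)    # 0.75
print(loss_inactive)  # 0.25
print(generator_loss(0.5))  # 0.5
```

Because the hinge term vanishes once generated samples are pushed past the margin, the discriminator stops penalizing them further, which is one reason this kind of loss is associated with more stable training.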