Document mosaicing

Document mosaicing is a process that stitches multiple, overlapping images of a document together to produce one large, high resolution composite. The document is slid under a stationary, over-the-desk camera by hand until all parts of the document are snapshotted by the camera’s field of view. As the document slid under the camera, all motion of the document is coarsely tracked by the vision system. The document is periodically snapshotted such that the successive snapshots are overlap by about 50%. The system then finds the overlapped pairs and stitches them together repeatedly until all pairs are stitched together as one piece of document.
The document mosaicing can be divided into four main processes.

Tracking
Feature detecting
Correspondences establishing
Images mosaicing.
Tracking (simple correlation process)

In this process, the motion of the document slid under the camera is coarsely tracked by the system. Tracking is performed by a process called simple correlation process. In the first frame of snapshots, a small patch is extracted from the center of the image as a correlation template as shown in Figure 1. The correlation process is performed in the four times size of the patch area of the next frame. The motion of the paper is indicated by the peak in the correlation function. The peak in the correlation function indicates the motion of the paper. The template is resampled from this frame and the tracking continues until the template reaches the edge of the document. After the template reaches the edge of the document, another snapshot is taken and the tracking process performs repeatedly until the whole document is imaged. The snapshots are stored in an ordered list to facilitate pairing the overlapped images in later processes.

Feature detecting for efficient matching

Feature detection is the process of finding the transformation that aligns one image with another. There are two main approaches for feature detection.

Feature-based approach : Motion parameters are estimated from point correspondences. This approach is suitable for the case that there is plenty supply of stable and detectable features.
Featureless approach : When the motion between the two images is small, the motion parameters are estimated using optical flow. On the other hand, when the motion between the two images is large, the motion parameters are estimated using generalised cross-correlation. However, this approach requires a computationally expensive resources.

Each image is segmented into a hierarchy of columns, lines, and words to match the organised sets of features across images. Skew angle estimation and columns, lines and words finding are the examples of feature detection operations.

Skew angle estimation

Firstly, the angle that the rows of text make with the image raster lines is estimated. It is assumed to lie in the range of ±20°. A small patch of text in the image is selected randomly and then rotated in the range of ±20° until the variance of the pixel intensities of the patch summed along the raster lines is maximised. See Figure 2.
To ensure that the found skew angle is accurate, the document mosaic system performs calculation at many image patches and derive the final estimation by finding the average of the individual angles weighted by the variance of the pixel intensities of each patch.

Columns, lines and words finding

In this operation, the de-skewed document is intuitively segmented into a hierarchy of columns, lines and words. The sensitivity to illumination and page coloration of the de-skewed document can be removed by applying a Sobel operator to the de-skewed image and thresholding the output to obtain the binary gradient, de-skewed image. See Figure 3.
The operation can be roughly separated into 3 steps : column segmentation, line segmentation and word segmentation.

Columns are easily segmented from the binary gradient, de-skewed images by summing pixels vertically as shown in Figure 4.
Baselines of each row are segmented in the same way as the column segmentation process but horizontally.
Finally, individual words are segmented by applying the vertical process at each segmented row.

These segmentations are important because the document mosaic is created by matching the lower right corners of words in overlapping images pair. Moreover, the segmentation operation can organize the list of images in the context of a hierarchy of rows and column reliably.
The segmentation operation involves a considerable amount of summing in the binary gradient, de-skewed images, which done by construct a matrix of partial sums whose elements are given by
The matrix of partial sums is calculated in one pass through the binary gradient, de-skewed image.

Correspondences establishing

The two images are now organized in hierarchy of linked lists in following structure :

image=list of columns
row=list of words
column=list of row
word=length

At the bottom of the structure, the length of each word is recorded for establishing correspondence between two images to reduce to search only the corresponding structures for the groups of words with the matching lengths.

Seed match finding

A seed match finding is done by comparing each row in image1 with each row in image2. The two rows are then compared to each other by every word. If the length of the two words and their immediate neighbours agree with each other within a predefined tolerance threshold, then they are assumed to match. The row of each image is assumed a match if there are three or more word matches between the two rows. The seed match finding operation is terminated when two pairs of consecutive row match are found.

Match list building

After finishing a seed match finding operation, the next process is to build the match list to generate the correspondences points of the two images. The process is done by searching the matching pairs of rows away from the seed row.

Images mosaicing

Given the list of corresponding points of the two images, finding the transformation of the overlapping portion of the images is the next process. Assuming a pinhole camera model, the transformation between pixels of image 1 and pixels of image 2 is demonstrated by a plane-to-plane projectivity.
The parameters of the projectivity is found from four pairs of matching points. RANSAC regression technique is used to reject outlying matches and estimate the projectivity from the remaining good matches.
The projectivity is fine-tuned using correlation at the corners of the overlapping portion to obtain four correspondences to sub-pixel accuracy. Therefore, image1 is then transformed into image2’s coordinate system using Eq.1. The typical result of the process is shown in Figure 5.

Many images coping

Finally, the whole page composition is built up by mapping all the images into the coordinate system of an “anchor” image, which is normally the one nearest the page center. The transformations to the anchor frame are calculated by concatenating the pair-wise transformations found earlier. The raw document mosaic is shown in Figure 6.
However, there might be a problem of non-consecutive images that are overlap. This problem can be solved by performing Hierarchical sub-mosaics. As shown in Figure 7, image1 and image2 are registered, as are image3 and image4, creating two sub-mosaics. These two sub-mosaics are later stitched together in another mosaicing process.

Applied areas

There are various areas that the technique of document mosaicing can be applied to such as :

Text segmentation of images of documents
Document Recognition
Interaction with paper on the digital desk
Video mosaics for virtual environments
Image registration techniques
Relevant research papers
D.G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Boston, 1985.
Camera-Based Document Image Mosaicing.. Image, 1.
Sato, T., Ikeda, S., Kanbara, M., Iketani, A., Nakajima, N., Yokoya, N., & Yamada, K.. Mosaic A Journal For The Interdisciplinary Study Of Literature.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...