This repo has two parts. The first part is written in Python and implements a simple binarization algorithm.

The second part uses an improved, contrast-maximizing version of Niblack/Sauvola et al.'s method to binarize document images. It can also run the classical Niblack and Sauvola et al. methods. Details can be found in the ICPR 2002 paper.


Simple Binarization Algorithm

The first algorithm is simple: convert the color picture to 'L' (8-bit grayscale) mode, then compare each pixel value against a threshold of 127. Pixels above the threshold become white (255) and all others become black (0). The main code is:

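import sys
from PIL import Image

# Expect exactly two arguments: the input and output image paths.
if len(sys.argv) != 3:
    print('Input format error!')
    sys.exit(1)

inFile = sys.argv[1]
outFile = sys.argv[2]

# Open the image and convert it to 'L' (8-bit grayscale) mode.
im = Image.open(inFile).convert('L')

# Pixels brighter than the threshold become white, the rest black.
for i in range(im.size[0]):
    for j in range(im.size[1]):
        if im.getpixel((i, j)) > 127:
            im.putpixel((i, j), 255)
        else:
            im.putpixel((i, j), 0)

# Write the binarized image to the output path.
im.save(outFile)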

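To run this code, pass the source and result paths after the script name (shown here as the placeholder <script>.py):

python <script>.py sample.jpg res.jpg

Here sample.jpg is the source picture and res.jpg is the binarized result.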
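
The per-pixel loop above can be slow on large pictures. As a possible alternative, the same thresholding can be expressed as a 256-entry lookup table applied in one call. The sketch below assumes Pillow's Image.point method; build_threshold_table and binarize_file are hypothetical helper names introduced here for illustration and are not part of the repo.

```python
def build_threshold_table(threshold=127):
    """Return a 256-entry lookup table: values above threshold map to 255, others to 0."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize_file(in_path, out_path, threshold=127):
    """Binarize the image at in_path and save it to out_path (hypothetical helper)."""
    # Imported here so build_threshold_table stays usable without Pillow installed.
    from PIL import Image

    # Convert to 'L' (8-bit grayscale), then map every pixel through the table.
    im = Image.open(in_path).convert('L')
    im.point(build_threshold_table(threshold)).save(out_path)
```

Calling binarize_file('sample.jpg', 'res.jpg') should apply the same rule as the loop, since both map values above 127 to 255 and everything else to 0.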

Here is an example:


Figure 1: Original picture


Figure 2: Result of the simple algorithm

Improved Contrast Maximization Algorithms

For the source code, please refer to the GitHub repo 0x333333/binarizewolfjolion.

The repo contains detailed information on how to compile and use the program.
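
For orientation, the three methods differ mainly in how they derive a local threshold from the mean m and standard deviation s of the gray levels in a window around each pixel. The sketch below shows the threshold formulas as they are commonly stated in the literature; it is not the repo's implementation. The function names and the default values of k and R are assumptions chosen for illustration.

```python
def niblack_threshold(m, s, k=-0.2):
    """Niblack (1986): shift the local mean by k standard deviations.

    k = -0.2 is an assumed, commonly quoted default.
    """
    return m + k * s


def sauvola_threshold(m, s, k=0.5, R=128.0):
    """Sauvola et al. (1997): scale the local mean by the normalized deviation.

    R is the assumed dynamic range of the standard deviation (128 for 8-bit images).
    """
    return m * (1.0 + k * (s / R - 1.0))


def wolf_threshold(m, s, min_gray, max_s, k=0.5):
    """Wolf et al. (2001): normalize contrast using whole-image statistics.

    min_gray is the minimum gray level of the image and max_s is the largest
    window standard deviation found in the image (assumed to be > 0).
    """
    return m + k * (s / max_s - 1.0) * (m - min_gray)


# Example: a fairly flat window (low s) in an image whose darkest pixel is 10.
# The Wolf threshold is pulled from the local mean toward the image minimum.
print(wolf_threshold(m=100.0, s=5.0, min_gray=10.0, max_s=20.0))
```

In each case a pixel is classified by comparing its gray level with the threshold computed for its window. The Wolf variant depends on two image-wide quantities, so it needs one pass over the image to collect them before thresholding.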

Here are three examples:


Figure 3: Original picture


Figure 4: Wolf et al. (2001) algorithm, which needs black text on a white background


Figure 5: Sauvola et al. (1997) algorithm, which needs black text on a white background


Figure 6: Niblack (1986) algorithm, which needs white text on a black background