| American Journal of Engineering Research (AJER) | 2018              |
|-------------------------------------------------|-------------------|
| e-ISSN: 2320-0847                               | p-ISSN: 2320-0936 |
| Volume-7, Issue-11                              | pp-257-262        |
| www.ajer.org                                    |                   |
| Research Paper                                  | Open Access       |

# A Hardware Implementation of Hybrid Image Compression

Padmavati S<sup>1</sup>, Vaibhav A Meshram<sup>2</sup>

<sup>1</sup>(Dept of ECE, Jain University, Bengaluru, India) <sup>2</sup>(Dept of ECE, Dayananda Sagar University, Bengaluru, India) Corresponding Author: Padmavati S

**ABSTRACT**: Digital image processing applies computational methods to digital images. Image transforms, which apply mathematical operations to an image, allow images to be represented in various forms. Available transform methods include the Discrete Wavelet Transform (DWT), the Discrete Cosine Transform (DCT), and the Karhunen–Loeve Transform (KLT). These transforms find application in compression, enhancement, feature extraction, and pattern recognition. DWT is a wavelet transformation from the spatial to the frequency domain in which the wavelets are discretely sampled. Its main advantages are that the image can be represented in multi-resolution form, that it provides a high compression ratio while avoiding blocking artefacts, and that it provides better localization in both the spatial and frequency domains. However, there is a great need to improve throughput and overall design cycle time. In this paper, a new hybrid architecture combining the DWT and DCT methods is proposed for image compression. The architecture is designed in Verilog HDL, synthesized using Xilinx ISE 14.2, and targeted on a Spartan 6 FPGA board. The standard Lena image [512x512] is used as a test image. The encoding time of standalone DWT is compared with that of the proposed hybrid DWT-DCT method. The results show that the proposed hybrid [DWT-DCT] image compression method encodes faster than the standalone DWT method.

**KEYWORDS**: Image processing, image compression, DWT, DCT, FPGA implementation, distributed arithmetic.

\_\_\_\_\_

Date of Submission: 15-11-2018

Date of Acceptance: 29-11-2018

\_\_\_\_\_

## I. INTRODUCTION

Digital image processing takes an image as input and produces a processed image as output. The function that the digital system applies to the input image to produce the output image is called an image transformation. Image transformations apply mathematical functions to a signal or image and find application in fields such as image enhancement, image compression, image analysis, and image filtering. The aim of a transformation is to extract more information from the image, typically by converting it from the spatial/time domain to the frequency domain. The significance of image transformation is that the critical components of an image can be isolated and the image can be stored in a compact form, which allows efficient transmission with less bandwidth. Image transformations [4] are classified into orthogonal sinusoidal functions, non-sinusoidal orthogonal functions, directional transformations, and functions based on the statistics of the input signal. Examples of orthogonal sinusoidal functions are the Discrete Fourier Transform, the Discrete Cosine Transform, and the Discrete Sine Transform. Examples of non-sinusoidal orthogonal functions are the Haar, Slant, Walsh, and Hadamard transforms; of these, the Haar transform is also the simplest wavelet transform. Examples of directional transformations are the Ridgelet, Hough, and Contourlet transforms. Examples of functions based on the statistics of the input signal are the KL transform and Singular Value Decomposition.

In digital signal processing, computations usually take the form of a sum of products (dot product or inner product). By careful architectural design, a design engineer [15] can reduce the gate count of the arithmetic unit by 50% to 80%. A Distributed Arithmetic (DA) architecture can achieve a reduction in gate count of up to 80%. DA can also be used to obtain high speed by employing more arithmetic operations in parallel. DA finds application in filters such as the biquadratic digital filter.

The rest of the paper is organized as follows: Section 2 describes the basics of DWT and DCT. Section 3 outlines the fundamentals of Distributed Arithmetic. Section 4 presents the hybrid DWT-DCT architecture. Section 5 discusses the results of the design. Section 6 provides the conclusion.

## II. DWT AND DCT

The wavelet transform [5] is similar to the Fourier transform but uses different basis functions. The Fourier transform decomposes the signal into sine and cosine functions, whereas the wavelet transform decomposes it into scaled and shifted copies of a wavelet function, which may be real or complex valued. The fundamental principle of wavelet transforms is that the basis functions are obtained by changing the time extension of the wavelet but not its shape. The wavelet transform is generally expressed as in equation (1).

$$F(a,b) = \int_{-\infty}^{\infty} f(x) \psi^*_{(a,b)}(x) dx \qquad (1)$$

where \* denotes the complex conjugate and $\psi$ is the wavelet function, with $a$ and $b$ conventionally denoting its scale and translation. This function can be chosen arbitrarily provided that it obeys certain rules. Level 1 wavelet decomposition is shown in Fig. 1.

Fig. 1. Level 1 wavelet decomposition

An image is represented by a two-dimensional array of coefficients, each representing the brightness level at that point. Viewed as a whole, it is difficult to differentiate between more important and less important coefficients. Natural images usually have smooth colour variations, with fine details represented as sharp edges between the smooth regions. In technical terms, the smooth variations in colour are the low frequency components and the sharp variations are the high frequency components.

The low frequency components (smooth variations) form the base of an image, and the high frequency components (the edges, which give the details) add to them to refine the image. The smooth variations and the details of an image can be separated in many ways. One such method is decomposition of the image using a Discrete Wavelet Transform (DWT).

DWT [11] is an implementation of the wavelet transform using a discrete set of wavelet scales and translations that obey certain defined rules. DWT is an orthogonal wavelet transform. It provides sufficient information for both analysis and synthesis of the original signal while significantly reducing the computation time. DWT is easy to implement and is suitable for linear signal processing and multi-resolution analysis [1]. It provides a better compression approximation and is image independent.

### A. Steps in processing DWT Image

- A low pass and a high pass filter are selected such that each covers exactly half of the frequency range. This filter pair is called the Analysis Filter pair.
- First, the Low Pass Filter (LPF) is applied to each row of data, generating the low frequency components of the row. Since the LPF is a half band filter, the output contains frequencies only in the first half of the original frequency range. By Shannon's sampling theorem, the output can therefore be subsampled by two, so that it contains only half the original number of samples.
- Next, the high pass filter is applied to the same row of data, and the high pass components are similarly separated and placed beside the low pass components. This procedure is repeated for all rows.
- The filtering is then done for each column of the intermediate data. The resulting two-dimensional array of coefficients contains four bands of data, labelled LL (low-low), HL (high-low), LH (low-high) and HH (high-high).
- The LL band can be decomposed again in the same manner, producing further sub-bands. This can be repeated up to any level, resulting in a pyramidal decomposition as shown in Fig. 2. Fig. 3 shows the 512x512 Lena image and its one level decomposition.
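The steps above can be sketched in Python as follows. This is a minimal illustration and not the Verilog design described in this paper: it assumes the Haar analysis filter pair (the filters actually used are not stated here), and the names `analyze_1d` and `dwt2_level`, as well as the placement of the HL and LH quadrants, are assumptions made for the example.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter tap (assumed analysis filter pair)

def analyze_1d(x):
    """Low pass and high pass filter one row or column, subsampling by two."""
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * S for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)]
    return low + high  # high pass half is placed beside the low pass half

def transpose(m):
    return [list(col) for col in zip(*m)]

def dwt2_level(img):
    """One decomposition level: filter every row, then every column."""
    rows_done = [analyze_1d(r) for r in img]
    cols_done = transpose([analyze_1d(c) for c in transpose(rows_done)])
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [r[:w] for r in cols_done[:h]]
    hl = [r[w:] for r in cols_done[:h]]  # high pass along rows, low pass along columns
    lh = [r[:w] for r in cols_done[h:]]  # low pass along rows, high pass along columns
    hh = [r[w:] for r in cols_done[h:]]
    return ll, hl, lh, hh

# A perfectly smooth (constant) image has no detail, so HL, LH and HH are all zero.
img = [[10, 10, 10, 10] for _ in range(4)]
ll, hl, lh, hh = dwt2_level(img)
```

Calling `dwt2_level` again on the returned `ll` band gives the next level of the pyramid; for a 512x512 input, each band of the first level is 256x256.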

Fig. 2. Level decomposition

Fig. 3. Original image and its one level decomposition

We consider an N $\times$ N image as a two-dimensional pixel array I with N rows and N columns. The rows are enumerated from top to bottom and the columns from left to right. Indexing starts at zero, so the largest index is N - 1. The image pixel at row i and column j is denoted by I[i, j].

### B. The Inverse DWT of an image

Just as the forward transform divides the image data into classes of differing importance, the inverse transform reassembles these classes into a reconstructed image. Here too, a pair of high pass and low pass filters is used; this pair is called the Synthesis Filter pair.

The filtering procedure is the reverse of the forward transform. It starts from the topmost level, applies the filters column-wise first and then row-wise, and proceeds to the next level until the first level is reached.
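The reverse ordering can be illustrated with a small round trip. The sketch below again assumes the Haar filter pair, which is not stated in this paper, and all function names are hypothetical; it only shows that applying the synthesis pair column-wise and then row-wise undoes a row-wise then column-wise analysis.

```python
import math

S = 1 / math.sqrt(2)  # Haar tap; assumed filter pair, not taken from the paper

def analyze_1d(x):
    """Analysis pair: low pass half followed by high pass half."""
    half = len(x) // 2
    return ([(x[2 * k] + x[2 * k + 1]) * S for k in range(half)] +
            [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)])

def synthesize_1d(c):
    """Synthesis pair: rebuild two samples from each low/high coefficient pair."""
    half = len(c) // 2
    out = []
    for k in range(half):
        out.append((c[k] + c[half + k]) * S)
        out.append((c[k] - c[half + k]) * S)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_level(img):
    rows = [analyze_1d(r) for r in img]                       # rows first
    return transpose([analyze_1d(c) for c in transpose(rows)])  # then columns

def inverse_level(coeffs):
    cols = transpose([synthesize_1d(c) for c in transpose(coeffs)])  # columns first
    return [synthesize_1d(r) for r in cols]                          # then rows

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
restored = inverse_level(forward_level(img))  # equals img up to rounding error
```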

### C. Image Compression using DCT

In the proposed method, DCT is used to compress the image. DCT converts a signal into elementary frequency components and is widely used in image compression. It separates the image into parts (spectral sub-bands) of differing importance with respect to the image's visual quality. DCT is applied to every non-overlapping block of the image and is described by equation (2):

$$F(m,n) = \frac{2}{L} C_m C_n \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} f(i,j) \cos\left(\frac{(2i+1)m\pi}{2L}\right) \cos\left(\frac{(2j+1)n\pi}{2L}\right) \qquad (2)$$

where $m, n = 0, 1, \ldots, L-1$ and

$$C_k = \begin{cases} 1/\sqrt{2}, & k = 0\\ 1, & k \neq 0 \end{cases}$$

DCT reduces the entropy of images, and the reduction becomes more pronounced as the number of retained coefficients is decreased. DCT provides excellent energy compaction for correlated images, concentrating the energy in the low frequency regions. It also provides near-optimal decorrelation for such images. Hence, the uncorrelated transform coefficients can be encoded independently without compromising coding efficiency [15], and some of the high frequency information can be discarded without significant quality degradation. Quantization further reduces the entropy, that is, the average number of bits per pixel. DCT thus exploits interpixel redundancies to provide excellent decorrelation for natural images.

Symmetry is an extremely useful property of DCT: it implies that the transformation matrix can be precomputed offline and then applied to the image, which improves computational efficiency. The orthogonality of DCT reduces the pre-computation complexity.
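As an illustration of equation (2) and of the precomputed transformation matrix, the following Python sketch builds the $L \times L$ DCT matrix once and applies it to a block as $F = T f T^{T}$. The function names are hypothetical and the code is a plain software reference, not the hardware datapath of the proposed design.

```python
import math

def dct_matrix(L):
    """Precompute T with T[m][i] = sqrt(2/L) * C_m * cos((2i+1) m pi / (2L))."""
    t = []
    for m in range(L):
        c = 1 / math.sqrt(2) if m == 0 else 1.0
        t.append([math.sqrt(2 / L) * c *
                  math.cos((2 * i + 1) * m * math.pi / (2 * L))
                  for i in range(L)])
    return t

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block, t):
    """2-D DCT of one block; T f T^T expands to the double sum of equation (2)."""
    return matmul(matmul(t, block), transpose(t))

T = dct_matrix(8)                       # computed once, reused for every block
block = [[100] * 8 for _ in range(8)]   # perfectly flat 8x8 block
F = dct2(block, T)                      # all energy lands in F[0][0] (about 800)
```

Because `T` is orthogonal, `matmul(T, transpose(T))` is the identity matrix up to rounding, so the inverse transform needs no separate matrix: it is `transpose(T)` applied on the left and `T` on the right.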

## III. FUNDAMENTALS OF DISTRIBUTED ARITHMETIC (DA)

Distributed Arithmetic is a technique for computing sums of products, or Multiply and Accumulate (MAC) operations, that saves the resources used for MAC implementation. On a Field Programmable Gate Array (FPGA), it is implemented using the Look-Up Tables (LUTs) of the device. The DA algorithm [12] exploits the Xilinx FPGA look-up table architecture [6] and is capable of producing very efficient filter designs. DA algorithms are derived in a simple manner, use simple Boolean algebraic expressions, and find widespread application. The main advantage of using DA on an FPGA is high-speed computation with high precision. Area savings of up to 80% can be obtained in DSP hardware designs.

The mathematical expression of DA is given in equation (3).

$$Y(n) = \sum_{k=1}^{K} A_k x_k(n) \qquad (3)$$

where $Y(n)$ is the output response at time $n$, $x_k(n)$ is the $k$-th input variable at time $n$, and $A_k$ is a constant weighting coefficient.

The following points characterize the application of DA:

- The DA technique works on a bit-serial principle.
- DA is fast when the number of elements is equal to the word size.
- DA utilizes the lookup tables of the FPGA.
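The points above can be made concrete with a small software model of equation (3). This is a sketch under simplifying assumptions that are not taken from this paper: the inputs are unsigned integers of a fixed word size, the coefficients are integers, and the names `build_lut` and `da_dot` are hypothetical. The multiplications are replaced by a table lookup plus a shift-and-add per input bit.

```python
def build_lut(coeffs):
    """Entry at address a holds the sum of the coefficients whose address bit is 1."""
    k = len(coeffs)
    return [sum(coeffs[i] for i in range(k) if (addr >> i) & 1)
            for addr in range(1 << k)]

def da_dot(coeffs, inputs, bits):
    """Compute sum(A_k * x_k) for unsigned `bits`-wide inputs, one bit plane per step."""
    lut = build_lut(coeffs)              # fixed once, since the A_k are constants
    acc = 0
    for b in range(bits):                # bit-serial: least significant bit first
        addr = 0
        for i, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << i  # gather bit b of every input word
        acc += lut[addr] << b            # shift and accumulate, no multiplier used
    return acc

coeffs = [3, -5, 7, 2]
inputs = [12, 200, 33, 255]
result = da_dot(coeffs, inputs, bits=8)  # same value as sum(a * x for a, x in zip(...))
```

In this model the loop runs once per input bit regardless of the number of terms $K$, whereas a single multiply-accumulate unit needs one step per term; this is one way to read the remark that DA is fast when the number of elements equals the word size.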

## IV. TOP LEVEL ARCHITECTURE OF PROPOSED METHOD DWT-DCT

The Discrete Wavelet Transform [2, 3] and the Discrete Cosine Transform have gained a reputation as very effective signal analysis tools for many practical applications. The proposed method uses a polyphase structure for the filter implementation, based on the Distributed Arithmetic technique. The DA implementation relies on lookup tables, which are popular in FPGA [8] implementations. The design has been targeted on a Xilinx Spartan 6 FPGA board.

The proposed algorithm is as follows:

- Initially, the input image [Lena 512x512] is read.
- A three-level DWT is applied to the input image.
- A DCT of size 8x8 is applied to the DWT coefficients obtained from the third stage of DWT.
- Thus the final hybrid compressed image is obtained.
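The algorithm above can be sketched end to end in Python. Several details are assumptions made only for this example and are not stated in the paper: the Haar filter is used for the DWT, only the LL band is carried from one level to the next and passed to the DCT, the DCT is applied in non-overlapping 8x8 blocks, and a small synthetic image stands in for Lena. Quantization and entropy coding are omitted.

```python
import math

S = 1 / math.sqrt(2)

def transpose(m):
    return [list(c) for c in zip(*m)]

def haar_ll(img):
    """One Haar DWT level, keeping only the LL band (assumption for this sketch)."""
    rows = [[(r[2 * k] + r[2 * k + 1]) * S for k in range(len(r) // 2)] for r in img]
    cols = [[(c[2 * k] + c[2 * k + 1]) * S for k in range(len(c) // 2)]
            for c in transpose(rows)]
    return transpose(cols)

def dct_matrix(n):
    return [[math.sqrt(2 / n) * (S if m == 0 else 1.0) *
             math.cos((2 * i + 1) * m * math.pi / (2 * n)) for i in range(n)]
            for m in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hybrid_dwt_dct(img, levels=3, block=8):
    """Three-level DWT followed by a block DCT of the third-level coefficients."""
    ll = img
    for _ in range(levels):
        ll = haar_ll(ll)
    t = dct_matrix(block)
    tt = transpose(t)
    out = [[0.0] * len(ll[0]) for _ in ll]
    for r0 in range(0, len(ll), block):
        for c0 in range(0, len(ll[0]), block):
            tile = [row[c0:c0 + block] for row in ll[r0:r0 + block]]
            coeffs = matmul(matmul(t, tile), tt)
            for i in range(block):
                out[r0 + i][c0:c0 + block] = coeffs[i]
    return out

# Synthetic 64x64 stand-in image; its third-level LL band is a single 8x8 block.
image = [[x + y for x in range(64)] for y in range(64)]
result = hybrid_dwt_dct(image)
```

With a 512x512 input, the third-level LL band is 64x64, so the block loop would process 64 blocks of 8x8 coefficients.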

Fig. 4 shows the top level architecture of the proposed DWT-DCT method. Fig. 5 shows the parallel DA architecture.

Fig. 4. Proposed top level architecture

Fig. 5. Parallel DA Architecture

The memory blocks consist of dual-port RAM units that store the partial and final transformed samples so that two inputs can be accessed in each computation. For this purpose, the memory unit is divided into two blocks: one stores the even samples and the other stores the odd samples. The control unit generates the addresses and manages the write enable signals for the even and odd sample blocks. The Register Transfer Level (RTL) schematic of the proposed architecture is shown in Fig.6.
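The even/odd split of the sample memory matches the polyphase form of a filter followed by subsampling by two. The Python sketch below is an illustration only; the function names and the zero padding at the borders are assumptions, and it does not model the dual-port RAM or the control unit. It checks that filtering from separate even and odd sample arrays gives the same output as filtering the full signal and discarding every second sample.

```python
def direct_decimated(h, x):
    """Reference: full convolution of x with h, keeping every second output."""
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0
    n_out = (len(x) + len(h)) // 2
    return [sum(h[k] * sample(2 * n - k) for k in range(len(h)))
            for n in range(n_out)]

def polyphase_decimated(h, x):
    """Same outputs computed from separate even and odd sample memories."""
    even_mem, odd_mem = x[0::2], x[1::2]   # the two sample blocks
    h_even, h_odd = h[0::2], h[1::2]       # matching phases of the filter
    def get(mem, i):
        return mem[i] if 0 <= i < len(mem) else 0
    n_out = (len(x) + len(h)) // 2
    return [sum(c * get(even_mem, n - k) for k, c in enumerate(h_even)) +
            sum(c * get(odd_mem, n - k - 1) for k, c in enumerate(h_odd))
            for n in range(n_out)]

h = [1, 2, 3, 4]
x = [5, -1, 2, 7, 0, 3, 8, 1]
same = polyphase_decimated(h, x) == direct_decimated(h, x)
```

Each output needs one read from the even block and one from the odd block per tap pair, which is consistent with the dual input access described above, and no discarded samples are ever computed.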



Fig.6. Register Transfer Level (RTL) Schematic

## V. RESULTS AND DISCUSSIONS

The standard Lena image [512x512] is used as a test image. The proposed hybrid architecture is designed in Verilog HDL, simulated using the Xilinx ISE 14.2 simulator, and targeted on a Spartan 6 FPGA board.

### A. Simulated waveform of filtered values

[Simulation waveform of the `dwt_test_32` test bench. Signals shown: `i`, `valid_in`, `clk`, `rst`, `LL_sample_out`, `LH_sample_out`, `HL_sample_out`, `HH_sample_out`, `valid_out`, `vcount`, `count`.]

Fig.7. Simulation Wave Form of DWT Top Module

The simulation waveform of the DWT top module is shown in Fig. 7. The three-level DWT output of the proposed method is shown in Fig. 8.



Fig.8. Three Level DWT Output of Proposed Method for Lena image

### B. Device Utilization Summary and Timing Summary

Table I. Device Utilization Summary of DWT Method

| Resource                          | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice LUTs              | 2758 | 218600    | 1%          |
| Number with an unused Flip Flop   | 1822 | 2920      | 62%         |
| Number with an unused LUT         | 162  | 2920      | 5%          |
| Number of fully used LUT-FF pairs | 936  | 2920      | 32%         |
| Number of bonded IOBs             | 84   | 362       | 23%         |
| Number of Block RAM/FIFO          | 112  | 545       | 20%         |
| Number of BUFG/BUFGCTRLs          | 1    | 32        | 3%          |

Table II. Timing Summary of DWT Method

| Parameter                                | Value       |
|------------------------------------------|-------------|
| Minimum period                           | 3.192 ns    |
| Maximum frequency                        | 313.303 MHz |
| Minimum input arrival time before clock  | 1.366 ns    |
| Maximum output required time after clock | 0.754 ns    |

Table III. Timing Summary of Proposed Method

| Parameter                                | Value       |
|------------------------------------------|-------------|
| Minimum period                           | 2.447 ns    |
| Maximum frequency                        | 408.672 MHz |
| Minimum input arrival time before clock  | 1.237 ns    |
| Maximum output required time after clock | 0.670 ns    |

Table IV. Device Utilization Summary of Proposed Method

| Resource                          | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice LUTs              | 1093 | 708480    | 1%          |
| Number with an unused Flip Flop   | 1727 | 2504      | 68%         |
| Number with an unused LUT         | 228  | 2504      | 9%          |
| Number of fully used LUT-FF pairs | 549  | 2504      | 21%         |
| Number of bonded IOBs             | 84   | 720       | 11%         |
| Number of Block RAM/FIFO          | 28   | 912       | 3%          |
| Number of BUFG/BUFGCTRLs          | 1    | 32        | 3%          |

Table V. Comparison of Encoding Time

| Method                    | Encoding Time |
|---------------------------|---------------|
| DWT                       | 0.754 ns      |
| Proposed Method [DWT-DCT] | 0.670 ns      |
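The size of the improvement can be read off the tables with a short calculation. The Python sketch below only restates the values reported in Tables II, III and V (the encoding times in Table V coincide with the maximum output required time after clock in the two timing summaries); the helper names are hypothetical.

```python
# Values copied from Tables II, III and V (all in nanoseconds).
dwt = {"min_period_ns": 3.192, "encoding_ns": 0.754}
hybrid = {"min_period_ns": 2.447, "encoding_ns": 0.670}

def reduction_percent(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before

def max_frequency_mhz(period_ns):
    """Clock frequency implied by a minimum period given in nanoseconds."""
    return 1000.0 / period_ns

encoding_gain = reduction_percent(dwt["encoding_ns"], hybrid["encoding_ns"])
period_gain = reduction_percent(dwt["min_period_ns"], hybrid["min_period_ns"])
```

On these figures the encoding time is reduced by roughly 11% and the minimum clock period by roughly 23%. The frequencies returned by `max_frequency_mhz` agree with the reported 313.303 MHz and 408.672 MHz to within the rounding of the periods.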

## VI. CONCLUSION

In this paper, a new hybrid [DWT-DCT] architecture has been designed for image compression. The design was tested on the standard Lena image of dimension 512x512. The proposed method is designed in Verilog HDL and synthesized up to the gate level using Xilinx ISE 14.2. The proposed architecture utilizes fewer hardware resources and reduces the implementation cost. The results show that encoding with the hybrid [DWT-DCT] method is faster than with the standalone DWT method.

## REFERENCES

- [1]. Mallat, S. G.: A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, (1989).
- [2]. Grzeszczak, A., Mandal, M. K., Panchanathan, S.: VLSI implementation of discrete wavelet transform, IEEE Transactions on VLSI Systems, (1996).
- [3]. Chengi, Xiong, Jinwen, Tian, Jian, Liu.: A fast VLSI architecture for two dimensional discrete wavelet transform based on lifting scheme, IEEE Trans., pp. 1661-1664, (2004).
- [4]. Jong, woog. Kim., Jong, wha. Chong.: A fast parallel VLSI architecture for lifting based 2D discrete wavelet transform, IEEE Trans., pp. 1258-1261, (2004).
- [5]. Benkrid, A., Benkrid, K., Crookes, D.: Design and implementation of a generic 2D orthogonal discrete wavelet transform on FPGA, IEEE Symp. on Field-Programmable Custom Computing Machines, (2003).
- [6]. Szi-Wen, Chen., Yuan, Ho. Chen.: Hardware design and implementation of a wavelet de-noising procedure for medical signal preprocessing, Sensors, (2015).
- [7]. Wang, X. Y., Zhang, D. D.: Discrete wavelet transform-based simple range classification strategies for fractal image coding, Nonlinear Dyn, 75, 3, pp. 439–448, (2014).
- [8]. Kekre, H. B., Tanuja, K. S., Sudeep, D. T.: Inception of hybrid wavelet transform using two orthogonal transforms and its use for image compression, International Journal of Computer Science and Information Security, (2011).
- [9]. Kumar, B. B. S., Satyanarayana, P. S.: Image analysis using biorthogonal wavelet, International Journal of Innovative Research and Development, (2013).
- [10]. Praisline, Jasmi. R., Perumal, B., Rajasekaran, P. M.: Comparison of image compression techniques using Huffman coding, DWT and Fractal algorithm, IEEE Proc. of International Conference on Computer Communication and Informatics, (2015).
- [11]. Mario, Mastriani.: Denoising and compression in wavelet domain via projection onto approximation coefficients, arXiv preprint arXiv:1608.00265, (2016).
- [12]. Alekseev, V., Kaliakin, Ivan.: Exploring the sampling rate for discrete wavelet transform implementation, IEEE Proc. of International Siberian Conference on Control and Communications, (2016).
- [13]. Chetan, Deepak Sharma.: Fractal image compression using quadtree decomposition and DWT, International Journal of Scientific Engineering and Research, (2015).
- [14]. Chandan Singh Rawat, Sukadev Meher.: A hybrid image compression scheme using DCT and fractal image compression, The International Arab Journal of Information Technology, (2015).
- [15]. Yung Gi, Wu.: Medical image compression by sampling DCT coefficients, IEEE Xplore, (2002).

Padmavati S, Vaibhav A Meshram, "A Hardware Implementation of Hybrid Image Compression," American Journal of Engineering Research (AJER), vol. 7, no. 11, 2018, pp. 257-262.
