Real-Time Document Scanning and Rectification with OpenCV

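Contents

- Introduction
- I. System Overview
- II. Core Code Walkthrough
  - 1. Importing the Required Libraries
  - 2. Helper Function
  - 3. Corner-Ordering Function
  - 4. Perspective Transform Function
  - 5. Main Program Flow
- III. Complete Code
- IV. Conclusion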

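Introduction

At work and in study we often need to digitize paper documents. A hand-held photo of a page is usually tilted and distorted by perspective, which makes it harder to use afterwards. This article shows how to build a real-time document scanning and rectification system with Python and OpenCV. The system detects the document's outline in the camera feed and straightens it with a perspective transform.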

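I. System Overview

The system does the following:

- Captures frames from the camera in real time
- Detects edges and finds contours
- Identifies the document contour
- Rectifies the document with a perspective transform
- Binarizes the result to improve readability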

II. Core Code Walkthrough

1. Importing the Required Libraries

import numpy as np
import cv2

NumPy handles the numerical work and OpenCV handles the image processing.

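2. Helper Function

We first define a small image display function that is handy for debugging:

def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)  # wait 10 ms so the window can refresh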

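3. Corner-Ordering Function

The order_points function arranges the four detected document corners in a fixed order (top-left, top-right, bottom-right, bottom-left):

def order_points(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]  # bottom-right: largest x + y
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right: smallest y - x
    rect[3] = pts[np.argmax(diff)]  # bottom-left: largest y - x
    return rect

The function takes four 2D points and returns them ordered as top-left, top-right, bottom-right, bottom-left. This matters for document scanning and image rectification because the perspective transform is only correct if each source corner is paired with the right target corner.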

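Detailed Breakdown

(1) Ordering logic

Image coordinates have their origin at the top-left corner, with x increasing to the right and y increasing downward.

Top-left (rect[0]): the point with the smallest x + y

Both x and y are small at the top-left corner, so their sum is the smallest.

Bottom-right (rect[2]): the point with the largest x + y

Both x and y are large at the bottom-right corner, so their sum is the largest.

Top-right (rect[1]): the point with the smallest y - x

At the top-right corner y is relatively small and x is relatively large, so y - x is the smallest.

Bottom-left (rect[3]): the point with the largest y - x

At the bottom-left corner y is relatively large and x is relatively small, so y - x is the largest.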

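(2) Example

Suppose we have four points:

A(10, 20) # top-left
B(50, 20) # top-right
C(50, 60) # bottom-right
D(10, 60) # bottom-left

The computation:

x + y values: [30, 70, 110, 70]

Smallest, 30 → A (top-left)
Largest, 110 → C (bottom-right)

y - x values: [10, -30, 10, 50]

Smallest, -30 → B (top-right)
Largest, 50 → D (bottom-left)

Final order: [A, B, C, D], that is, [top-left, top-right, bottom-right, bottom-left]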
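The worked example can be reproduced in code. The following is a minimal sketch that feeds the same four points to order_points in a deliberately shuffled order (the shuffle is an assumption made for illustration) and shows that the function restores the expected order.

```python
import numpy as np

def order_points(pts):
    # Same logic as the article's function.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)             # x + y for each point
    rect[0] = pts[np.argmin(s)]     # top-left
    rect[2] = pts[np.argmax(s)]     # bottom-right
    diff = np.diff(pts, axis=1)     # y - x for each point
    rect[1] = pts[np.argmin(diff)]  # top-right
    rect[3] = pts[np.argmax(diff)]  # bottom-left
    return rect

# Points C, A, D, B from the example, in shuffled order.
shuffled = np.array([[50, 60], [10, 20], [10, 60], [50, 20]])
ordered = order_points(shuffled)
print(ordered.tolist())
# [[10.0, 20.0], [50.0, 20.0], [50.0, 60.0], [10.0, 60.0]]
```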

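(3) Why this works

The method relies on simple geometric properties of points in image coordinates:

At the top-left corner both x and y are small
At the bottom-right corner both x and y are large
At the top-right corner x is large and y is small
At the bottom-left corner x is small and y is large

Simple additions and subtractions are therefore enough to tell the corners apart, with no heavier geometric computation, as long as the document is roughly upright in the frame. If the page is rotated close to 45 degrees, the sums or differences can tie and the ordering becomes unreliable.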
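The sum and difference heuristic has a limit that is worth seeing once. The sketch below uses a hypothetical diamond-shaped quadrilateral (a square rotated by 45 degrees, chosen for illustration) in which two corners tie on x + y and two tie on y - x. Because argmin and argmax return the first match, one corner is picked twice and another is lost.

```python
import numpy as np

def order_points(pts):
    # Same logic as the article's function.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

# Hypothetical diamond: top, right, bottom, left vertices.
diamond = np.array([[50, 0], [100, 50], [50, 100], [0, 50]])
rect = order_points(diamond)

distinct = len(np.unique(rect, axis=0))
print(rect.tolist())
print("distinct corners:", distinct)  # 3, not 4: the point (0, 50) is missing
```

A caller that needs to handle strongly rotated pages could check for four distinct corners before calling the perspective transform and skip the frame otherwise.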

4. Perspective Transform Function

The four_point_transform function implements the core of the rectification:

def four_point_transform(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Compute the width and height of the output image
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Define the corner coordinates of the target image
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transform matrix and apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    return warped

This function performs a perspective transformation: it maps an arbitrary quadrilateral region of the image onto a rectangle, which removes the perspective distortion.

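Detailed Breakdown

Input parameters

def four_point_transform(image, pts):

image: the source image
pts: an array of four points describing the quadrilateral to transform

Ordering the corner points

rect = order_points(pts)
(tl, tr, br, bl) = rect  # unpack into top-left, top-right, bottom-right, bottom-left

The order_points function described earlier puts the four points in a fixed order.

Computing the output width

widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))  # length of the bottom edge
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))  # length of the top edge
maxWidth = max(int(widthA), int(widthB))  # the larger one becomes the output width

The lengths of the bottom and top edges are computed, and the longer one is used as the output width.

Computing the output height

heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))  # length of the right edge
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))  # length of the left edge
maxHeight = max(int(heightA), int(heightB))  # the larger one becomes the output height

The lengths of the right and left edges are computed, and the longer one is used as the output height.

Defining the target rectangle

dst = np.array([
    [0, 0],                          # top-left
    [maxWidth - 1, 0],               # top-right
    [maxWidth - 1, maxHeight - 1],   # bottom-right
    [0, maxHeight - 1]               # bottom-left
], dtype="float32")

These are the corner coordinates of the rectified rectangle, an axis-aligned rectangle starting at (0, 0).

Computing and applying the perspective transform matrix

M = cv2.getPerspectiveTransform(rect, dst)  # compute the transform matrix
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))  # apply it

getPerspectiveTransform: computes the 3×3 matrix that maps the source quadrilateral onto the target rectangle
warpPerspective: applies that matrix to the source image

Return value

return warped

The function returns the rectified rectangular image.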
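To make the 3×3 matrix less of a black box, here is a NumPy-only sketch of how such a matrix can be derived from four point pairs. It fixes the bottom-right matrix entry to 1 and solves the resulting 8×8 linear system. This is an illustration of the underlying math, not OpenCV's actual implementation; the names perspective_matrix and apply_matrix and the sample coordinates are hypothetical.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping src[i] to dst[i]."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1), rearranged likewise
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_matrix(M, pt):
    """Map one (x, y) point through M using homogeneous coordinates."""
    x, y, w = M @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical source quadrilateral (tl, tr, br, bl) and target rectangle.
src = np.array([[30, 40], [220, 20], [250, 300], [10, 280]], dtype=np.float64)
dst = np.array([[0, 0], [199, 0], [199, 279], [0, 279]], dtype=np.float64)

M = perspective_matrix(src, dst)
mapped = np.array([apply_matrix(M, p) for p in src])
print(np.round(mapped, 3))  # each source corner lands on its target corner
```

The division by w in apply_matrix is what distinguishes a perspective transform from an affine one: the scale factor varies across the image.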

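Perspective Transform Illustrated

Quadrilateral in the source image      Rectangle after the transform

   tl------------tr                    (0,0)---------(maxWidth-1,0)
     \          /                        |                 |
      \        /                         |                 |
       bl----br                  (0,maxHeight-1)--(maxWidth-1,maxHeight-1)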

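Why compute the width and height this way?

Why take the maximum:

With perspective distortion, the two opposite edges of the quadrilateral can have different lengths
Taking the larger value means the longer edge is not shrunk in the output, so no detail is lost there; the shorter edge is stretched instead

Why subtract 1:

Image coordinates start at 0, so in an image of width maxWidth the largest x coordinate is maxWidth - 1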
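The size computation is easy to check on concrete numbers. The sketch below uses a hypothetical trapezoid (the corner coordinates are made up for illustration) and np.linalg.norm in place of the explicit square-root formula; both compute the same Euclidean distance.

```python
import numpy as np

# Hypothetical corners, already ordered tl, tr, br, bl.
tl, tr, br, bl = np.array(
    [[30, 40], [220, 20], [250, 300], [10, 280]], dtype="float32"
)

width_bottom = np.linalg.norm(br - bl)   # about 240.8
width_top = np.linalg.norm(tr - tl)      # about 191.0
height_right = np.linalg.norm(tr - br)   # about 281.6
height_left = np.linalg.norm(tl - bl)    # about 240.8

max_width = max(int(width_bottom), int(width_top))
max_height = max(int(height_right), int(height_left))

# Target corners run from 0 to size - 1 because pixel indices start at 0.
dst = np.array(
    [[0, 0], [max_width - 1, 0], [max_width - 1, max_height - 1], [0, max_height - 1]],
    dtype="float32",
)
print(max_width, max_height)  # 240 281
```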

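5. Main Program Flow

The main program implements the full real-time detection and rectification flow. The snippets from image preprocessing onward run inside the while loop.

Initializing the camera

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()

Real-time processing loop

while True:
    flag = 0
    ret, image = cap.read()
    if not ret:
        print("Cannot read a frame from the camera")
        break
    orig = image.copy()  # copy only after the read is known to have succeeded

Image preprocessing

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian blur to reduce noise
    edged = cv2.Canny(gray, 75, 200)  # Canny edge detection

Contour detection and filtering

    cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]  # keep the 3 largest contours by area

    for c in cnts:
        peri = cv2.arcLength(c, True)  # contour perimeter
        approx = cv2.approxPolyDP(c, 0.05 * peri, True)  # polygon approximation
        area = cv2.contourArea(approx)

        # Keep a contour only if it is a quadrilateral with a large enough area
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            break

Indexing the findContours result with [-2] selects the contour list whether the installed OpenCV version returns two values or three.

Rectifying and displaying the document

    if flag == 1:
        # Draw the contour
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)

        # Perspective transform
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))

        # Binarization with Otsu's automatic threshold
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]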

III. Complete Code

# Import the required packages
import numpy as np
import cv2

def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)  # wait 10 ms so the window can refresh

def order_points(pts):
    # pts holds four (x, y) points
    rect = np.zeros((4, 2), dtype="float32")  # stores the ordered points
    # Indices 0, 1, 2, 3 are top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1)  # row-wise sum: x + y
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)  # row-wise difference: y - x
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image, pts):
    # Order the input points
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Compute the output width and height
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Corner positions after the transform
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]], dtype="float32")

    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # Return the rectified image
    return warped

# Open the input; make sure the camera is available
cap = cv2.VideoCapture(0)
if not cap.isOpened():  # failed to open
    print("Cannot open camera")
    exit()

while True:
    flag = 0  # marks whether a document was detected in this frame
    ret, image = cap.read()  # ret is True if the frame was read correctly
    if not ret:  # read failed, leave the loop
        print("Cannot read a frame from the camera")
        break
    orig = image.copy()  # copy only after the read is known to have succeeded
    cv_show("image", image)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Preprocessing
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian blur
    edged = cv2.Canny(gray, 75, 200)
    cv_show("1", edged)

    # Contour detection
    cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]
    image_contours = cv2.drawContours(image, cnts, -1, (0, 255, 0), 2)
    cv_show("image_contours", image_contours)
    # Walk through the contours
    for c in cnts:
        # Approximate the contour
        peri = cv2.arcLength(c, True)  # contour perimeter
        # c is the input point set
        # epsilon (0.05 * peri) is the maximum distance from the original
        # contour to its approximation; it is an accuracy parameter
        # True means the contour is closed
        approx = cv2.approxPolyDP(c, 0.05 * peri, True)  # contour approximation
        area = cv2.contourArea(approx)
        # Take the contour when it has 4 points and is large enough
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri, area)
            print("Document detected")
            break
    if flag == 1:
        # Show the result
        # print("STEP 2: get the contour")
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
        cv_show("image", image_contours)
        # Perspective transform
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))
        cv_show("warped", warped)
        # Binarization
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        # ref = cv2.threshold(warped, 220, 255, cv2.THRESH_BINARY)[1]
        ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        cv_show("ref", ref)
    # Press q to quit (added so that the cleanup below is reachable)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()  # release the capture device
cv2.destroyAllWindows()  # close the image windows

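IV. Conclusion

This article presented a real-time document scanning and rectification system built on OpenCV. It combines edge detection, contour analysis and a perspective transform to detect and straighten a document automatically, which makes everyday document digitization quicker.

The complete code is given above, and readers can modify and extend it to suit their needs. OpenCV's image processing functions, combined with Python's concise syntax, keep a practical system like this short and straightforward to build.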
