Elegantly converting PDFs to images with Python

Author: qikqiak

I've collected a lot of good documents over time, but they are not very convenient to read when I need them. I have to find the file first, and on mobile I often have to download a plug-in just to open it. The biggest problem is that the material is hard to organize and share. Wouldn't these problems go away if we could convert the PDFs to web pages? The pages could then be shared directly.

A Python package is used here to process the PDF file. For convenience and speed, I convert each page directly into an image, so there is no need to identify every element on the page.


The core code is simple: read the file in, wrap it in the package's document object, and use the API to get the binary data of each page. Once you have that binary data, image processing is easy, and a separate image package handles it here.
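The package names did not survive in the text above, so the following is only a minimal sketch of the single-page step under my own assumptions: pypdf to pull one page out as standalone PDF bytes, and Wand (which needs ImageMagick and Ghostscript installed) to rasterize it. The helper names `extract_page_pdf` and `pdf_page_to_image` are hypothetical, and the packages the author actually used may differ.

```python
import io


def extract_page_pdf(reader, page_index):
    """Return page `page_index` of a pypdf reader as standalone PDF bytes."""
    from pypdf import PdfWriter  # assumed package, imported lazily

    writer = PdfWriter()
    writer.add_page(reader.pages[page_index])
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def pdf_page_to_image(pdf_bytes, page_index, resolution=150, fmt="png"):
    """Render one PDF page to image bytes (hypothetical helper)."""
    if page_index < 0:
        raise ValueError("page_index must be non-negative")

    from pypdf import PdfReader   # assumed package
    from wand.image import Image  # assumed package; needs ImageMagick + Ghostscript

    reader = PdfReader(io.BytesIO(pdf_bytes))
    page_pdf = extract_page_pdf(reader, page_index)
    # Wand rasterizes the single-page PDF at the requested resolution (DPI).
    with Image(blob=page_pdf, resolution=resolution) as img:
        img.format = fmt
        return img.make_blob()
```

With both packages installed, calling `pdf_page_to_image(data, 0)` on the bytes of a PDF file should give back the first page as PNG bytes, ready to be written to disk or served from a web page.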

Note that PDF files are generally large, so converting an entire file in one go can run out of memory. Here the whole PDF is loaded into memory once, up front, so that it does not have to be reloaded every time a page is read.
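As a rough illustration of "load once, read many times", the sketch below reads the file into memory a single time and hands out fresh in-memory streams over the same bytes. The class name `PdfSource` and the `max_bytes` guard are my own assumptions for the example, not part of the original code.

```python
import io
import os


class PdfSource:
    """Holds the bytes of one PDF file, read from disk exactly once."""

    def __init__(self, path, max_bytes=200 * 1024 * 1024):
        size = os.path.getsize(path)
        # Refuse very large files up front instead of running out of memory later.
        if size > max_bytes:
            raise ValueError(f"{path} is {size} bytes, limit is {max_bytes}")
        with open(path, "rb") as f:
            self.data = f.read()

    def stream(self):
        """Return a new seekable stream over the cached bytes (no disk access)."""
        return io.BytesIO(self.data)
```

Each page conversion can then call `source.stream()` instead of reopening the file, and the size limit gives an early, explicit failure for files that are too big to hold comfortably in memory.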

Batch processing

That completes the conversion of a single page. Converting the entire file is then very simple: get the total number of pages and loop over them. Because the conversion is time-consuming, you can use asynchronous processing to speed it up, as long as you watch out for memory leaks.
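The tools the author had in mind for asynchronous processing are not named in the text, so here is one possible sketch using only the standard library's `concurrent.futures`. The function `convert_all_pages` and its `render_page(pdf_bytes, page_index)` callback are hypothetical; the callback stands in for whatever single-page conversion you use.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all_pages(pdf_bytes, page_count, render_page, max_workers=4):
    """Yield (page_index, image_bytes) for every page, in page order.

    `render_page(pdf_bytes, page_index)` is a caller-supplied function that
    converts a single page and returns the image bytes.
    """
    # A small, fixed pool bounds how many pages are rendered at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda i: render_page(pdf_bytes, i), range(page_count))
        # map() returns results in submission order, so indexes line up.
        for index, image in enumerate(results):
            yield index, image
```

Because the function is a generator, the caller can write each image to disk as it arrives rather than collecting every page in a list. A thread pool is just one choice here; for CPU-bound rendering a process pool may work better, in which case `render_page` would have to be a picklable top-level function.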

The core code is on GitHub. When I have time, I plan to turn it into a public conversion service that anyone can use.

Update: I took some time and turned it into a standalone service. Feel free to try it out!

Title image: pexels, licensed by CC0.
