Source: https://blog.qikqiak.com/post/python-convert-pdf-images/
I have collected a lot of good documents over time, but they are not convenient to read: you have to find the file first, and on mobile you often need to download a plug-in to open it. The biggest problem is that the material is hard to organize and share. Wouldn't converting a PDF to a web page solve these problems? The page could also be shared directly.
I use a Python package to process the PDF file. For convenience and speed, I convert each page directly into an image, so there is no need to identify every element on the page.
The core code is simple: read the file, load it into a reader object, and then use the API to get the binary data of each page. Once you have the binary data, image processing is easy; here I use an image-processing package for that step.
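The steps above can be sketched roughly as follows. This assumes pypdf for reading the PDF and Wand (an ImageMagick binding) for the image step; both packages are assumptions of this sketch, not a statement of what the original code uses, and `image_path_for` and `convert_page` are hypothetical names.

```python
import io
import os


def image_path_for(pdf_path, page_index, ext="png"):
    """Build an output name such as 'report1.png' for page 0 of 'report.pdf'."""
    stem, _ = os.path.splitext(pdf_path)
    return f"{stem}{page_index + 1}.{ext}"


def convert_page(reader, pdf_path, page_index, resolution=120):
    """Render one page of an already opened pypdf reader to a PNG file."""
    # Third-party imports stay local so the helper above works without them.
    # Both packages are assumptions of this sketch.
    from pypdf import PdfWriter
    from wand.color import Color
    from wand.image import Image

    # Copy the single page into its own in-memory PDF.
    writer = PdfWriter()
    writer.add_page(reader.pages[page_index])
    buffer = io.BytesIO()
    writer.write(buffer)

    # Rasterize the one-page PDF and flatten transparency onto white.
    out_path = image_path_for(pdf_path, page_index)
    with Image(blob=buffer.getvalue(), resolution=resolution) as img:
        img.format = "png"
        img.background_color = Color("white")
        img.alpha_channel = "remove"
        img.save(filename=out_path)
    return out_path
```

With `reader = pypdf.PdfReader("report.pdf")`, calling `convert_page(reader, "report.pdf", 0)` should write the first page next to the source file. Note that ImageMagick relies on Ghostscript to read PDF data, so that needs to be installed as well.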
Note that PDF files are usually large, so converting an entire file in one go risks running out of memory. Here the PDF is kept in memory after it is loaded the first time, so it does not have to be reloaded on every page read.
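One way to keep the file in memory after the first load is sketched below, using only the standard library. `load_pdf_bytes` and `open_pdf_stream` are hypothetical names, and caching the raw bytes (rather than a parsed reader object) is a choice made for this sketch.

```python
import functools
import io


@functools.lru_cache(maxsize=4)
def load_pdf_bytes(path):
    """Read the file from disk once; later calls reuse the cached bytes."""
    with open(path, "rb") as fh:
        return fh.read()


def open_pdf_stream(path):
    """Return a fresh seekable stream over the cached bytes for a PDF parser."""
    return io.BytesIO(load_pdf_bytes(path))
```

The small `maxsize` bounds how many files stay cached at once, which matters when the files are large. Calling `load_pdf_bytes.cache_clear()` releases them explicitly.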
That completes the conversion of a single page. Converting the whole file is then simple: get the total number of pages and loop over them. Because the conversion is time-consuming, asynchronous processing can speed it up, for example by pairing the conversion with an asynchronous task tool, while watching out for memory leaks.
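The page loop could look something like this. The sketch uses the standard library's `concurrent.futures` as a stand-in for whatever asynchronous tool is preferred; `convert_all_pages` and its `convert_one` callback are hypothetical names, and `convert_one` is assumed to take a zero-based page index and return the path of the image it wrote.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all_pages(page_count, convert_one, max_workers=4):
    """Call convert_one(page_index) for every page; results are in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order; a worker's exception is re-raised
        # when its result is reached.
        return list(pool.map(convert_one, range(page_count)))
```

Capping `max_workers` limits how many pages are being rendered at the same time, which helps keep memory use in check. Whether threads or processes are faster depends on the renderer; `ProcessPoolExecutor` offers the same `map` interface, but the callback must then be picklable.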
The core code is on GitHub. When I have time, I plan to turn it into a public conversion service that anyone can use.
Took some time and made it into a standalone service: https://pdfh5.com Feel free to try it out!
Title image: Pexels, licensed under CC0.