Another of the less visible, but still cool, features in ColdFusion 9 is the set of enhancements we’ve made to <cfpdf>. We’ve added the ability to:
- Add headers/footers to existing PDFs
- Create PDF packages
- Selectively optimize PDF size
- Extract text from PDFs
- Extract images from PDFs
- Create high-quality thumbnails
Of these features, my personal favorites are optimization and extraction.
PDFs can do a lot. Consequently, PDF size can swell due to the presence of extra information, metadata, and embedded files. The optimize feature lets you remove specific types of extras to selectively reduce the size of your PDF, while retaining the features that you need. When you use action="optimize", a set of options lets you choose which extras to strip.
Code looks like this:
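As a minimal sketch of an optimize call: the file names here are hypothetical, and the attributes shown are only a subset of the no* switches I believe <cfpdf> documents for ColdFusion 9, so check the tag reference for the full list and exact spellings.

```cfml
<!--- Hypothetical file names; adjust to your own paths. --->
<cfset sourcePdf = expandPath("./report.pdf")>
<cfset optimizedPdf = expandPath("./report-optimized.pdf")>

<!--- Each no* attribute strips one kind of extra.
      Leave out (or set to false) anything you want to keep. --->
<cfpdf action="optimize"
       source="#sourcePdf#"
       destination="#optimizedPdf#"
       overwrite="true"
       noattachments="true"
       nobookmarks="true"
       nocomments="true"
       nofonts="true"
       nolinks="true"
       nometadata="true"
       nothumbnails="true">

<!--- Compare file sizes before and after to see what the options saved. --->
<cfset before = getFileInfo(sourcePdf)>
<cfset after = getFileInfo(optimizedPdf)>
<cfoutput>#before.size# bytes before, #after.size# bytes after</cfoutput>
```

How much you save depends on what the PDF actually contains; a file with no attachments or bookmarks gains nothing from removing them.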
As you can see, the code is pretty straightforward. I’ve seen reductions of 65-75% on PDF size when using all options.
Yes, you can get at the text or embedded images of a PDF with ColdFusion 9.
Here’s the code to get at the text of a PDF:
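A minimal sketch, assuming a hypothetical report.pdf next to the template and a result variable I have named pdfText:

```cfml
<!--- Hypothetical path; point this at a real PDF. --->
<cfset sourcePdf = expandPath("./report.pdf")>

<!--- With name set, the extracted text goes into a variable
      instead of a file. type="xml" asks for the XML form. --->
<cfpdf action="extracttext"
       source="#sourcePdf#"
       type="xml"
       name="pdfText">

<!--- Dump the result to inspect its structure. --->
<cfdump var="#pdfText#">
```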
That code will extract the text of a PDF to XML. The structure divides the content into pages, so you can quickly get at content on particular pages, etc.
You have a few options that I’m not showing though. You can get the content as just a string. You can selectively get page numbers. You can even get XY coordinates for all of the words in the document.
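Those options can be sketched roughly as follows. The attribute names (type, pages, addquads) are the ones I understand the ColdFusion 9 tag reference to list for extracttext; the path and variable names are hypothetical, and it is worth confirming the exact behavior of addquads against the documentation.

```cfml
<!--- Hypothetical path. --->
<cfset sourcePdf = expandPath("./report.pdf")>

<!--- Plain string output, limited to the first two pages. --->
<cfpdf action="extracttext"
       source="#sourcePdf#"
       type="string"
       pages="1-2"
       name="firstPagesText">

<!--- XML output with position data requested for the extracted text. --->
<cfpdf action="extracttext"
       source="#sourcePdf#"
       type="xml"
       addquads="true"
       name="textWithPositions">

<cfdump var="#firstPagesText#">
<cfdump var="#textWithPositions#">
```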
Getting images is similar; you plug in a PDF, and send the images to a directory:
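A minimal sketch, assuming a hypothetical report.pdf and an images directory beside the template:

```cfml
<!--- Hypothetical paths. --->
<cfset sourcePdf = expandPath("./report.pdf")>
<cfset imageDir = expandPath("./images")>

<!--- Make sure the target directory exists before extracting into it. --->
<cfif not directoryExists(imageDir)>
    <cfdirectory action="create" directory="#imageDir#">
</cfif>

<!--- Write every image found in the PDF into the directory. --->
<cfpdf action="extractimage"
       source="#sourcePdf#"
       destination="#imageDir#"
       overwrite="true">
```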
You also have options to prefix the image file names and to pick the image format.
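As a sketch of those two options, assuming the imageprefix and format attributes as I recall them from the tag reference, with a hypothetical prefix, an existing images directory, and a hypothetical source file:

```cfml
<!--- Hypothetical paths; the images directory is assumed to exist. --->
<cfset sourcePdf = expandPath("./report.pdf")>
<cfset imageDir = expandPath("./images")>

<!--- Save the images as PNG files whose names start with "report". --->
<cfpdf action="extractimage"
       source="#sourcePdf#"
       destination="#imageDir#"
       format="png"
       imageprefix="report"
       overwrite="true">
```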
As you can see, the engineers added some cool functionality here.