Converting PDF to HTML

Earlier today, I saw the question "Can imagemagick convert pdf to html?" on IRC. In response, I present the world's worst PDF->HTML converter:

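#!/bin/sh
# Base name of the input file, without the .pdf extension
BN=$(basename $1 .pdf)
# PDF -> PostScript (writes $BN.ps in the current directory)
pdf2ps $1
# PostScript -> one PPM image per page, 800 pixels wide
pstopnm -xsize 800 $BN.ps
# Start the HTML document
cat > $BN.html <<EOF
<html>
<head>
<title>$1</title>
</head>
<body>
EOF
# Convert each page to PNG and embed it inline as a base64 data: URI
for file in $BN*.ppm; do
	out=$(basename $file .ppm).png
	convert $file $out
	echo "<img src='data:image/png;base64,$(base64 $out)'>" >> $BN.html
done
# Close the HTML document
cat >> $BN.html <<EOF
</body>
</html>
EOF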

Comments

Updated version

Posted at 2007-10-25 19:37:59 UTC by "Fred Emmott"

An updated version that actually produces readable text is at http://files.fredemmott.co.uk/pdf2html.sh

Quotes

Posted at 2007-10-26 07:37:03 UTC by "Anonymous"

You're missing quotes around the filenames. Try passing a file with spaces in its name to your script to see what I mean. Also, pdftohtml has been around for ages, so your script is not really the first to do that...
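
For anyone wondering what the quoting problem looks like in practice, here is a minimal sketch. It does not run any of the PDF tools; the filename "my report.pdf" and the count_args helper are made up purely for illustration.

```sh
#!/bin/sh
# Hypothetical filename containing a space (illustration only).
f="my report.pdf"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo $#; }

# Unquoted: the shell splits the value on whitespace into separate arguments.
unquoted=$(count_args $f)

# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$f")

# With quotes, basename sees the whole filename and strips only the suffix.
bn=$(basename "$f" .pdf)

echo "unquoted args: $unquoted, quoted args: $quoted, basename: $bn"
```

Applied to the script in the post, the same fix would mean writing "$1", "$BN.ps", "$file" and "$out" wherever those variables are passed to a command.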

Re: Quotes

Posted at 2007-10-26 15:10:43 UTC by "Fred Emmott"

It's not meant to be of practical use. It's a joke; there's a reason I labelled it "the world's worst".
