Entering the E-book age, kicking and screaming

So after nearly a decade of giving away PDFs of my first two books, I’ve decided to sell them as ebooks in different formats.

The technical hassles in so doing are bigger than they should be, although most of the problems are perhaps more in my head than in the format-conversion technology.

Mainly, I’m trying to convert PDF versions of my book to MS Word .doc format.

Any help in making me un-stupid in this process would be much appreciated.


So I have three books out: Acts of the Apostles, Cheap Complex Devices, and The Pains. I make Acts and CCD available in PDF; The Pains in HTML. These books are released for free download under a Creative Commons license. (There are other versions out on the net; people other than me have done format conversions & make them available for free. I kinda consider that a violation of the “No Derivative” clause of the license, but not everybody agrees with me. That’s a topic for another day.)

With the help of some friends, I made a Kindle version of The Pains some while ago, and it sells on Amazon.com. There are some minor formatting glitches in it, but nothing terrible.


ANYWAY, after spending way too much time investigating this whole “e-book” topic and learning actually very little other than that the age of the e-book has arrived at last, this time fer sher, I decided to convert my first (& best selling of the three) book, Acts of the Apostles. The site Smashwords provides a free service to self-publishing dudes like me. Give them your book in .doc format & they put it through their “meatgrinder” translator program and spit out a bunch of different versions of the book, including EPUB and Mobipocket (Kindle).

So I’m going to do this Smashwords thing, create the different versions, list my books with the Smashwords premium service, and also offer the differently-formatted versions on my site for $5 or so. I’ll put the Mobipocket version on Amazon also.

Then, after seeing how that goes, I’ll explore other avenues, like the iPhone App Store, etc.

So good, at least I’ve made a decision.

Let teh stupid begin: Creating an MS Word version of Acts of the Apostles

So I need an MS Word version of Acts. My first attempt was to take the output of a freeware program that does just that — converts PDF to Word. The output was not bad, but it contained literally hundreds of small formatting glitches. Places where the font size (or font itself) changed for no apparent reason, unreadable characters, places where font scale & compression changed, images that just got dropped on the floor, etc.

So, after spending about a day fooling around trying to clean that up (I am a very unskilled user of Word) I decided to try another approach.

My pal Gary, who helps me with formatting my books, has the book in InDesign source format; these are the sources that were used to produce the PDF. So why not simply use InDesign to emit .doc instead of PDF?

So Gary did that for me, and the result was OK, but the InDesign-generated version had formatting errors too, just different ones from the freeware PDF-to-Word version. For example, consider page numbers and headers. The Word version from InDesign contains intelligence about these things, but in the PDF-to-Word version they’re just images. So, since I don’t want headers or page numbers in my source for Smashwords, it should be easier to strip them out in Word using the InDesign version. But the other formatting issues in the InDesign version were worse.

After spending about a day fooling around with the InDesign version, I came to the conclusion that the first source was actually easier to work with than the second. So that’s what I’m in the middle of doing now, cleaning up the source generated by the PDF-to-Word program.

Is it incompetence or merely stupidity?

After having spent a lot of time worrying about things like how to handle headers, page numbers, images, and so forth, I finally got around to reading the Smashwords Style Guide, which very clearly explains that you have to get rid of headers & footers & page numbers (I could have figured that out if I had spent more than about 3 seconds thinking about it), and you also have to get rid of all pictures unless you absolutely need them.

(If you’re wondering why I didn’t read the Style Guide before I started the whole undertaking, it’s because I’m an idiot and I don’t have a brain in my head. I trust that answers the question.)

So now what I want to do is get rid of

— page breaks
— headers (left hand headers say “Acts of the Apostles”, right hand says the name of the book section, of which there are seven, for example “Angel” “Small Miracles” “Conversion” “A Certain Centurian”)
— page numbers
— little glyphs that I use to demarcate sections within chapters.

Can anybody tell me how to:

— use Word’s search & replace to get rid of page breaks?
— use Word’s search & replace to get rid of headers? (Remember, in this version they’re just text, not headers.)
— get rid of all hard line feeds (carriage returns)?
— etc?
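
One way to approach the first few items outside of Word is sketched below. It is only a sketch, and it assumes the document has been saved as plain text, that page breaks come through as form feed characters, and that each running header or page number sits alone on its own line. The function name and the header set are made up for illustration; only the header names quoted above are filled in, so the rest of the seven section names would have to be added by hand.

```python
import re

# Running headers named in the post. The book has seven sections, so the
# remaining section names would need to be added to this set.
RUNNING_HEADERS = {
    "Acts of the Apostles",
    "Angel",
    "Small Miracles",
    "Conversion",
    "A Certain Centurian",
}

# A line that is nothing but a short run of digits is treated as a page number.
PAGE_NUMBER = re.compile(r"\d{1,4}")


def strip_page_furniture(text, headers=RUNNING_HEADERS):
    """Drop page breaks, running-header lines, and bare page-number lines."""
    # Assumption: page breaks appear as form feed characters in the export.
    text = text.replace("\f", "\n")
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headers:
            continue  # running header on a line of its own
        if PAGE_NUMBER.fullmatch(stripped):
            continue  # bare page number
        kept.append(line)
    return "\n".join(kept)
```

A caveat: a legitimate section title page that says only “Angel” would be removed too, so the output still needs a read-through, and any line of story text that happens to be a bare number would also vanish.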

Or maybe I should just brute force it?

Maybe I should just take a Word source (either one?), save the damn thing as text-only, no markup, and use BBEdit to clean up everything as text, and then bring that back into Word and reformat with typefaces and paragraph attributes?
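
For the text-only route, the hard line feeds could be handled with a few lines of script rather than by hand. The sketch below assumes that paragraphs in the exported text are separated by at least one blank line and that every other line break is a hard return to be removed; if the export does not follow that pattern, the paragraph detection would need a different rule. The function name is hypothetical.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines so that each paragraph becomes a single line."""
    # Normalize Windows (CR LF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to one space.
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(joined)
```

This does not rejoin words that were hyphenated across a line break, and it flattens any intentional line breaks, such as verse, so those would still need attention after the text goes back into Word.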

Any suggestions welcome. If there are any programming maestros out there with nothing better to do, let me know & I’ll send you the doc versions.

One Comment

  1. John, good luck on the formatting. In the current state of affairs it’s impossible, or nearly so. I punted. I keep masters of my teaching course work in .txt format. I then apply the formatting as a pre-press effort. I find Sphinx helps a lot in doing so.

    I save no more labor doing it that way. But it does reduce my frustration factor, since I don’t have to pick through a post-conversion search-and-destroy session on formatting codes.
