PDF -> Kindle

edited September 2010 in Story Games
The ePub thread got me thinking. Is there an easy way to convert existing PDFs into a file format readable by the Kindle? This would make my RPG sessions a lot smoother.

Comments

  • calibre can convert from PDF to EPUB
  • edited September 2010
    ...badly.

    Just like it will convert PDF to Mobi (for Kindle). Badly.

    If a PDF is all single column and uses styling and graphics sparingly, it will just be converted in a lossy manner. If it has a heavily styled layout, uses one of the many layout/graphics tricks, or has lots of illustrations and/or page borders and backgrounds, it will be converted into something unusable.
  • PDF doesn't think in terms of flowing text. It thinks about Where Things Are On The Page.

    That, I think, will make it nearly impossible to convert PDF to ePub well.
  • I'm looking at getting an e-reader, and my research has led me to believe that Kindle can natively read PDFs. Am I wrong?
  • No, you're not wrong: it does read PDFs, but don't expect to read a US Letter (or A4) PDF on a Kindle 3. A Kindle DX Graphite (almost 10") might be enough, or might not.

    A 6" device won't even let you read digest-sized games *comfortably*. I read HWCTLH on a similar one, but it wasn't comfortable.
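
    To put rough numbers on the screen-size problem, here is a back-of-the-envelope estimate under stated assumptions: a 6" screen with a 3:4 aspect ratio (like the Kindle 3's 600×800 panel) measures about 3.6" × 4.8"; the page is shrunk to fit the screen with no margin cropping; body text is set at 10 pt (an assumed size, not measured from any particular game); US Letter is 8.5" × 11" and digest size is taken as 5.5" × 8.5". The scale factor s is the smaller of the width ratio and the height ratio.

```latex
\begin{aligned}
s_{\text{Letter}} &= \min\!\left(\tfrac{3.6}{8.5},\ \tfrac{4.8}{11}\right) \approx \min(0.4235,\ 0.4364) = 0.4235
  &&\Rightarrow\ 10\,\text{pt} \times 0.4235 \approx 4.2\,\text{pt} \\
s_{\text{digest}} &= \min\!\left(\tfrac{3.6}{5.5},\ \tfrac{4.8}{8.5}\right) \approx \min(0.6545,\ 0.5647) = 0.5647
  &&\Rightarrow\ 10\,\text{pt} \times 0.5647 \approx 5.6\,\text{pt}
\end{aligned}
```

    On these assumptions, 10-point text ends up at roughly 4 to 6 points on screen, which fits the experience described in the post.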
  • As much as I loathe the iPad, I think for now it has the advantage in terms of reading PDFs of games.
  • What I've been thinking: you need flowing text, not PDF-style positioning. How much work would it be to make an ePub/Mobi file completely from scratch? Going through the text and copy/pasting chapters or some such? Is there an editor for making ePub/Mobi files, or would it be sufficient to just make a plain text file? And, expanding on that, is there an application that would do that automagically, just pulling the text out of the entire PDF?
  • edited September 2010
    Problem is, many PDFs (the ones that give you problems with the conversion) have text that is not sequential (and often even titles that are entirely graphical), so what you get is a pile of garbage with words and letters mangled or missing.

    (Just do "Select All -> Copy" in Adobe Reader and paste it into WordPad/Notepad... this is not much different from what calibre does, btw.)

    --edit--

    Also, you lose all styling (bold/italic/colors), context, visual cues, title sizes and so on. It's an unholy mess: calibre does a lot of more complex tricks than that, trying to infer structure and replicate some of the styles, and, as I said, the conversion still kinda sucks.
  • Posted By: wundergeek: "As much as I loathe the iPad, I think for now it has the advantage in terms of reading PDFs of games."
    'Advantage' is probably too modest a term.

    Gaming PDFs on the iPad are pretty amazing. I'm as hardcore a book guy as they come (it's how I make my living, hence my personal loathing for All Things Amazon), but the iPad is the first thing I've ever seen that has made me consider actually buying PDFs and forgoing hardcopy on selected games now and then.

    -Jim C.
  • Posted By: Graham: "PDF doesn't think in terms of flowing text. It thinks about Where Things Are On The Page.

    That, I think, will make it nearly impossible to convert PDF to ePub well."
    Yeah, that was my fear.
  • If you want to get an idea of how PDFs think of text, cut and paste a page into a word processor.
  • Yeah, I've done that; it's pretty awful. I had hoped that a software solution might do a better job.
  • Here is an article that you may find useful: http://angelb.wordpress.com/2009/07/29/getting-your-pdf-to-a-mobi-or-prc-file-to-view-on-kindle-dx/
  • edited October 2010
    Once upon a time, I wrote something in Python (or maybe Perl, I don't remember) that scraped some Exalted PDFs to extract and format information therein in a different way. I assume that the library I used to do so was fairly illustrative of the internal structure of the PDF data, which is a lot about "this block of text (usually a word, but sometimes just single letters) goes here".

    If you've not seen an Exalted book, it uses a fairly typical two-column style. Since most conversion software doesn't know this, it makes a guess and exports by scanning linearly across the page. At best, what this gives you is: line 1 of column 1, line 1 of column 2, line 2 of column 1, line 2 of column 2, etc. Usually, though, the frame boxes of the various words overlap in weird ways, so the order tends to get even more jumbled. The conclusion here is that it is very hard to make a generic tool to export PDFs.
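
    To make the interleaving concrete, here is a toy sketch in Python. It assumes the extractor hands back one (x, y, text) tuple per word, with y growing down the page, and that the gutter sits at the page midpoint; the word list and PAGE_WIDTH are made-up values for illustration, not data from a real Exalted book or any particular PDF library.

```python
from typing import List, Tuple

# Hypothetical word boxes from one page: (x, y, text), y increasing downward.
Word = Tuple[float, float, str]

PAGE_WIDTH = 612.0  # US Letter width in PDF points (assumed page size)

words: List[Word] = [
    (72, 100, "Left-1"), (320, 100, "Right-1"),
    (72, 114, "Left-2"), (320, 114, "Right-2"),
]

def naive_order(ws: List[Word]) -> List[str]:
    """Scan linearly across the page: top to bottom, then left to right."""
    return [t for x, y, t in sorted(ws, key=lambda w: (w[1], w[0]))]

def two_column_order(ws: List[Word], gutter: float = PAGE_WIDTH / 2) -> List[str]:
    """Read the whole left column first, then the whole right column."""
    return [t for x, y, t in sorted(ws, key=lambda w: (w[0] >= gutter, w[1], w[0]))]

print(naive_order(words))       # ['Left-1', 'Right-1', 'Left-2', 'Right-2']
print(two_column_order(words))  # ['Left-1', 'Left-2', 'Right-1', 'Right-2']
```

    The column-aware version only works because it knows there are two columns and where the gutter is, which is exactly the layout knowledge a generic converter lacks.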

    It is, however, possible to tune the export to the layout of a specific PDF. The API I was using let you do things like grab data from regions within a file. So, tuning the code to grab stuff from a region around column one (tuned by hand) and then another region around column 2 wound up working reasonably well. Naturally, the locations of the regions are slightly different between even and odd pages, and sometimes bounding boxes of certain words would overlap the other column invisibly, which would hose things.
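
    A minimal sketch of that region approach in Python, with made-up data: word boxes are assumed to arrive as ((x0, y0, x1, y1), text) pairs in PDF points with y growing downward, and the REGIONS coordinates are hypothetical hand-tuned values, not the ones the poster used. The idea taken from the post is one region per column, with separate regions for odd and even pages.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
Word = Tuple[Box, str]

# Hypothetical hand-tuned column regions; odd and even pages sit slightly
# differently on the page, so each parity gets its own pair of regions.
REGIONS: Dict[str, List[Box]] = {
    "odd":  [(54, 60, 300, 740), (312, 60, 558, 740)],
    "even": [(48, 60, 294, 740), (306, 60, 552, 740)],
}

def inside(word: Box, region: Box) -> bool:
    """True if the word box lies entirely within the region."""
    return (word[0] >= region[0] and word[1] >= region[1]
            and word[2] <= region[2] and word[3] <= region[3])

def extract_page(page_number: int, words: List[Word]) -> List[str]:
    """Collect text column by column, using the regions for this page's parity."""
    parity = "odd" if page_number % 2 else "even"
    out: List[str] = []
    for region in REGIONS[parity]:
        in_col = [w for w in words if inside(w[0], region)]
        in_col.sort(key=lambda w: (w[0][1], w[0][0]))  # top-to-bottom, then left-to-right
        out.extend(text for _, text in in_col)
    return out

page_words: List[Word] = [
    ((320, 100, 360, 110), "B"),
    ((60, 100, 100, 110), "A"),
    ((60, 120, 100, 130), "A2"),
]
print(extract_page(1, page_words))  # ['A', 'A2', 'B']
```

    Note that a word near the page edge, say at x0 = 50, falls inside the even-page left column but outside the odd-page one, which is the kind of off-by-a-few-points drift that forces per-parity (and per-book) tuning.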

    You'd think that, within a game line, layout would be similar enough that code customized to scrape, say, one Exalted book would work on other Exalted books. This is mostly true, but manual tuning was sometimes needed when moving to a different book, which had slightly different column alignments and text overflow.

    Oh, also... the algorithm used to locate text blocks matters. For example, do you say "give me every text box that overlaps this region at all" or "give me every text box entirely contained in this region" or "give me every text box with a center inside this region". Some of these worked better in one book, but not another, and vice versa. I wound up having to override the PDF API to get some of what I needed.
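
    The three rules from the post can be written as small predicates. This Python sketch assumes boxes are (x0, y0, x1, y1) tuples in PDF points; the column and the "straddling" word are hypothetical values chosen to show the rules disagreeing on a word whose box pokes past the column edge.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def overlaps(word: Box, region: Box) -> bool:
    """Any overlap at all (boxes that merely touch do not count)."""
    return (word[0] < region[2] and word[2] > region[0]
            and word[1] < region[3] and word[3] > region[1])

def contained(word: Box, region: Box) -> bool:
    """The word box lies entirely inside the region."""
    return (word[0] >= region[0] and word[1] >= region[1]
            and word[2] <= region[2] and word[3] <= region[3])

def center_inside(word: Box, region: Box) -> bool:
    """The centre point of the word box lies inside the region."""
    cx = (word[0] + word[2]) / 2
    cy = (word[1] + word[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]

RULES: Dict[str, Callable[[Box, Box], bool]] = {
    "overlaps": overlaps,
    "contained": contained,
    "center_inside": center_inside,
}

left_column: Box = (54, 60, 300, 740)
# A word whose box extends 10 points past the column edge into the gutter.
straddler: Box = (270, 100, 310, 110)

for name, rule in RULES.items():
    print(name, rule(straddler, left_column))
# prints, one per line: overlaps True, contained False, center_inside True
```

    "Contained" drops the straddling word entirely, while "overlaps" would also grab words from the other column whose boxes bleed into this one; "center inside" is a compromise, but none of the three is right for every book.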

    In the end, though, doing all these extracts, including writing the code and manually correcting stuff that the scraper just couldn't get right, took way less time than it would have to hand-transcribe the data I was after. I'm not sure what the scraper's accuracy rate was, but I'd guess that, once tuned manually, it was about 98% or so.

    I can post the code if anyone cares, but it's pretty bad Gorn technology (i.e. "push it until it moves").
  • Posted By: Graham: "PDF doesn't think in terms of flowing text. It thinks about Where Things Are On The Page.

    That, I think, will make it nearly impossible to convert PDF to ePub well."
    I think that depends on the PDF in question. If it's mostly text, the way Palladium books are, it really wouldn't be any worse off.
  • Has anyone tried a Nook? They support PDFs natively, but I see differing opinions about how well this works. My recent gorging on epimas PDFs is feeding my Nook lust, but if it's not going to work...
  • If you have a Kindle, you can send a PDF to Amazon via your Kindle's email address, and Amazon will convert it into a Kindle file.

    HOWEVER... it's not particularly smart about this. I used the service to convert "The Mountain Witch," and what happened is that all of the page footnotes got embedded as if they were regular text.

    So, imagine reading a book where a sentence breaks across two pages and there's a footnote at the bottom of the first page. In this case, the footnote becomes the next line of text, and when the footnote ends, the sentence (originally on the next page) continues from where it left off.
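
    Here is a toy Python reproduction of that effect, with made-up text: if lines are simply concatenated in page order, the footnote lands in the middle of the sentence. The second function is one possible workaround, assuming each line comes with a y position and that footnotes always sit below a fixed, hand-picked threshold (footnote_top is a hypothetical parameter). It is not a description of what Amazon's conversion service does internally.

```python
from typing import List, Tuple

Line = Tuple[float, str]  # (y position in points from the top of the page, text)

# Hypothetical two-page extract: a sentence runs across the page break and
# page 1 has a footnote near the bottom margin.
pages: List[List[Line]] = [
    [(100, "The climb begins at dawn and"), (740, "1. See the appendix.")],
    [(80, "ends at the summit."), (120, "Next paragraph.")],
]

def naive_flow(pages: List[List[Line]]) -> str:
    """Concatenate every line in page order, footnotes included."""
    return " ".join(text for page in pages for _, text in page)

def flow_without_footnotes(pages: List[List[Line]],
                           footnote_top: float = 700) -> Tuple[str, List[str]]:
    """Keep body text flowing across pages; collect lines below footnote_top separately."""
    body: List[str] = []
    notes: List[str] = []
    for page in pages:
        for y, text in page:
            (notes if y >= footnote_top else body).append(text)
    return " ".join(body), notes

print(naive_flow(pages))
# The climb begins at dawn and 1. See the appendix. ends at the summit. Next paragraph.
print(flow_without_footnotes(pages))
# ('The climb begins at dawn and ends at the summit. Next paragraph.', ['1. See the appendix.'])
```

    A fixed threshold is fragile (footnote areas vary in height from page to page), so in practice it would need the same kind of per-book tuning as the column regions in the scraping post.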
  • @somelady I picked up a Nook on Black Friday and loaded some indie game PDFs on it. Mixed success. Mouse Guard looks keen. Inspectres is whack because the sidebar info is thrown in as page text. I dunno, it's a neat idea, but so iffy I wouldn't hang my hat on enjoying the translation more than half the time.
  • edited December 2010
    Posted By: somelady: "Has anyone tried a Nook? They support PDFs natively, but I see differing opinions about how well this works. My recent gorging on epimas PDFs is feeding my Nook lust but if it's not going to work..."
    I'd considered a Nook, as it would be really handy to have a dice-rolling app running on the screen at the bottom, but it uses the older-generation E-Ink screen, which isn't as easy on the eyes as the new one. The new Nook Color uses an LCD, which defeats the purpose of an E-Reader.

    New Kindle supports PDF, with some limited reflow.

    Incidentally, my girlfriend just got a Sony PRS-650 so I'll be trying out some RPG PDFs on it.
  • Pretty much a total fail. I had everything from the reader freezing entirely and needing a forced restart (apparently that's a problem on other readers as well) to, with Diaspora at least, nothing crashing but the book still being a pain to get through. There might be the odd game PDF that works with it, but picking a game simply because it works on an E-Reader doesn't seem like the best approach right now. Anything with an SRD should be formattable as EPUB or something else with reflowable text and hyperlinks, but I'm pretty certain that, for now, E-Readers are still mainly for novels.
  • +1 to reading game (and other) PDFs on an iPad. Works great. The best app for this, that I've seen, is GoodReader, which has a couple of killer features including letting you crop the pages so you get the maximum possible text size. If you do have to zoom in, it's also fairly good about scrolling through two-column layouts: it understands that if you page down from the bottom of the first column it should jump back up and to the right.
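
    The column-jumping behaviour described above amounts to a simple reading order over the zoomed page. Here is a small Python sketch of it, assuming the zoomed view shows a fixed-height window and the page is split into evenly spaced columns; the function name and numbers are hypothetical, and this is not how GoodReader itself is implemented.

```python
from typing import List, Tuple

def viewport_stops(page_height: float, viewport_height: float,
                   columns: int) -> List[Tuple[int, float]]:
    """Return (column_index, top_y) stops in column-by-column reading order.

    Paging down walks to the bottom of one column; the next stop jumps back
    to the top of the following column instead of continuing down the page.
    """
    stops: List[Tuple[int, float]] = []
    for column in range(columns):
        top = 0.0
        while top < page_height:
            stops.append((column, top))
            top += viewport_height
    return stops

print(viewport_stops(792, 400, 2))
# [(0, 0.0), (0, 400.0), (1, 0.0), (1, 400.0)]
```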

    Also, storing all your PDFs on Dropbox is awesome. They're synced between your computers, available via the web, and easily accessible from iOS and Android devices.

    (I don't get why anyone would consider an iPad more "evil" than a Kindle, but that's off-topic.)
  • The problem with an iPad is it's LCD instead of EPD. That's not good for your eyes, so reading something for any length of time becomes just as much trouble as doing so on a computer screen.