WikiEbook

Perhaps the opposite of a WikiBook. The idea is to write the book in a Wiki (maybe a public WikiLog, maybe a Private Wiki), then produce a "packaged" EBook file (for Marketing purposes).

Reader experience:

  • Each Wiki page becomes the equivalent of a Chapter.
  • If one Chapter refers to another, that's a nice intra-ebook link. (In practice below, I created a linear Thin Book "trunk" of roughly a dozen chapters.)
  • There are additional supporting pages, probably provided alphabetically, which are also linked from Chapter text. (In practice below, there were ~100 of these.)
  • Links to non-packaged Wiki pages, like links to 3rd party sites, are just left as regular web links which the EBook reader will probably pass to the device Web Browser (if the device isn't Off-Line).

I applied this process in 2013 with Hack Your Life With A Private Wiki Notebook Getting Things Done And Other Systems:

  • update the WikiGraph database
  • start with the sequential list of Chapter pages
  • run a script which assembles a list of all other pages linked from one of those chapter pages
  • manually review that list, pick a subset to include as Supplemental chapters
  • run a scraper against the wiki which
    • grabs each wiki page HTML to be included in the WikiEbook
    • strips off headers/footers
    • gives each a label (<A NAME>)
    • converts links for those chapters to link to the labels
    • assembles all the chapters in order into a single big HTML page
    • hmm, actually some of this isn't necessary - you can have each chapter be its own file. For inter-chapter links do you just use relative references? Probably need to play with Sigil or equivalent a bit to see what's up.
      • grr Sigil requires Mac OS X 10.7 and I'm on 10.6.7.
      • update, summarizing below: OK to keep each chapter in its own file, use relative reference links between in-book pages.
  • then manually jump through appropriate hoops to convert that HTML into the chosen EBook format
  • my Python scripts are at https://github.com/BillSeitz/wiki_ebook_maker.py
  • I left all the content here on-line so it's openly available, though that irks some purchasers of the ebook asset.
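
A minimal sketch of the "assemble a list of all other pages linked from the chapter pages" step. Note this swaps in a different technique: it parses already-scraped chapter HTML rather than querying the WikiGraph database, and it is not the code in the repo above. The names `RelativeLinkCollector` and `linked_pages` are made up for illustration, and it assumes in-wiki links are relative hrefs.

```python
from html.parser import HTMLParser


class RelativeLinkCollector(HTMLParser):
    """Collect the targets of relative (in-wiki) links from one page's HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: anything with a scheme is off-wiki; '#...' is a same-page jump.
        if href and "://" not in href and not href.startswith(("#", "mailto:")):
            self.links.append(href.split("#")[0])


def linked_pages(chapter_html_by_name):
    """Return sorted page names linked from the chapters, excluding the chapters themselves."""
    chapters = set(chapter_html_by_name)
    found = set()
    for html in chapter_html_by_name.values():
        collector = RelativeLinkCollector()
        collector.feed(html)
        found.update(collector.links)
    return sorted(found - chapters)
```

The returned list is only the candidate set; picking the subset to include as Supplemental chapters stays a manual step.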

Production notes

Nov14'2013 start producing a packaged draft.

Nov15 write code to scrape all chapters.

Nov15 cleaning pages

  • put prefix and suffix in code
  • save Weekly Log page as full page/folder to see all the included files - 26 of them, ~6 are CSS - seems like main are 4 CSS screen, common, print, projection which are all included in <link> tags in HTML (also msie.css is linked in a comment-tag)
  • are there different classes for hrefs depending on type of link? Yes!
    • in-wiki existing-page - no meta
    • in-wiki no-page-yet - <a class="nonexistent" href="PrincipleCenteredLiving">
    • InterWiki - <a class="interwiki" href="http.... ">
    • generic off-site http - <a class="http" href="http....">
    • note that all the in-wiki links are ''relative'' - no http in them at all!
    • I'll want to add differentiation between in-wiki links that are included in the book vs those that are only online in WikiLog
    • then I'll want to have separate styles for each of these
    • actually going to want to ''remove'' "nonexistent" links, and combine all the not-in-book cases to a single style.
  • I think it's time to read some tutorials on making EPub with CaLibre now.
  • finish code that cleans headers/footers.
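
A rough sketch of telling the link types apart by the class attribute, as listed above. It assumes the only classes that matter are "nonexistent", "interwiki" and "http", and that a link with none of them is an in-wiki existing page. The function names are hypothetical, and a regex is used only because the scraped markup is regular enough; a real HTML parser would be safer.

```python
import re

A_TAG = re.compile(r'<a\b([^>]*)>', re.IGNORECASE)
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def link_type(a_tag_attrs):
    """Classify one <a> tag from its attribute string."""
    match = CLASS_ATTR.search(a_tag_attrs)
    cls = match.group(1) if match else ""
    if cls in ("nonexistent", "interwiki", "http"):
        return cls
    return "in-wiki"  # assumption: no marker class means an existing wiki page


def count_link_types(html):
    """Count how many links of each type a page contains."""
    counts = {}
    for match in A_TAG.finditer(html):
        attrs = match.group(1)
        if "href=" not in attrs:
            continue  # skip <a name=...> labels, which are not links
        kind = link_type(attrs)
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```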

Nov16 - try CaLibre creation before getting around to link-cleaning.

  • make Table Of Contents file http://manual.calibre-ebook.com/faq.html#how-do-i-convert-a-collection-of-html-files-in-a-specific-order
  • duh, just Add that Table Of Contents file - links get followed from there.
  • then Convert to EPub
  • view in CaLibre
    • each chapter is prefixed with a blank page and then a title page! But maybe that's calibre and not inherent in the file.
  • view in FBReader
    • no blank pages, but get title-only page for each chapter
    • some chapters appear twice - ah, that's because of the h2 detection I put in, combined with having done that in some of the pages to make nicer-punctuated titles.
      • plan: turn off the h2 detection and see what happens
    • ugh ULs are horrible - in most cases get text on separate line from the bullet!
      • looking at my HTML input, I see MoinMoin puts a <p> tag after the <li>, so I'm going to have to have some code to get rid of those. Detail looks like <li class="gap"><p class="line891"> (never any </p> tags anywhere!)
        • am I also going to get rid of all the useless spans?
  • Use Text Wrangler to remove the <li><p> combo, remove the structure-detection bit, re-generate.
    • somewhat better
      • realize the break-after-bullet problem now only happens when the text immediately starts with a linked word! (whether WikiWord or a naked URL)
        • ah, I think there are multiple line-number classes in the <li><p> case, so need a more-generic regex to replace. Done.

      • still have chapter-title-only page, but no blank pages. (OK, not an improvement)
  • tweak settings again. Also set option to edit ToC after it's generated.
    • looking in FBReader again
      • the bullets are with the text now!
      • but realize that they don't wrap properly! They wrap out to the far left, even further left than the bullet itself! (which is inset because of a general indenting that seems to be happening).
    • have to decide what to do with Economic Transition page - can't put it in the next/prev flow (in the WikiLog). Should I move it to alpha position, and maybe just put a little more of that text into the body of the Why Hack Your Page page?
    • also have to check those next-page refs, as something smells off (esp Why Getting Things Done)
    • next - view in MoonReader
      • bullets don't hang-out right, but since they're flush-left they look better than in FBReader.
      • automatically get "page down to next...." at the end of each chapter
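
A sketch of the "more-generic regex" for the bullet problem noted above: drop the opening <p ...> that MoinMoin puts right after each <li ...>, whatever line-number class it carries. This assumes (as observed) that there are never any closing </p> tags to clean up; the function name is made up.

```python
import re

# An <li ...> tag followed (optionally after whitespace) by an opening <p ...> tag.
LI_THEN_P = re.compile(r'(<li\b[^>]*>)\s*<p\b[^>]*>', re.IGNORECASE)


def strip_li_paragraphs(html):
    """Remove the <p ...> that immediately follows each <li ...>, keeping the <li>."""
    return LI_THEN_P.sub(r'\1', html)
```

Because the pattern ignores the attributes entirely, it handles both the plain and class="gap" list items and any line-number class.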

Nov17 - trying some new tweaks

  • meta - contemplating manually tweaking zip file - but then what happens with epub? Or maybe I unzip the epub, tweak, then just zip (instead of using CaLibre again for that part)? Also, using Alexis Ohanian's book as sample?
  • change ToC index.html to split up into 3 groups with bullet lists
  • change HTML-to-ZIP conversion plugin to do breadth-first link follow, which affects the ToC.
  • turned off structure-detection pagebreaks and level-1-ToC XPath filter
  • have to remember each time to allow 150 items in ToC
  • still ugly in FBReader - so I changed standard styles in FBReader, now not so bad! (Got rid of first-paragraph-indent. Tightened normal line, increased space between paragraphs.)
  • bullets still wrap wrong, but not as obvious
    • strangely, prefs formats don't have List as a choice to customize!
  • most page-breaks gone now (except clean break between chapters which is good)
    • but big mess before "how will keeping a notebook..."
    • also think part of problem is just the various extra intro lines I put in many chapters in the wiki - might just need to remove those by hand before doing CaLibre.
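
A small sketch of generating the Table Of Contents index.html as grouped bullet lists, since CaLibre follows links from that file in order. The group headings and filenames in the usage are placeholders, not the real ones, and `build_toc` is a hypothetical helper.

```python
from html import escape


def build_toc(groups):
    """Build index.html text from [(group heading, [(filename, title), ...]), ...]."""
    parts = ["<html><head><title>Table Of Contents</title></head><body>"]
    for heading, pages in groups:
        parts.append(f"<h3>{escape(heading)}</h3>")
        parts.append("<ul>")
        for filename, title in pages:
            parts.append(
                f'<li><a href="{escape(filename, quote=True)}">{escape(title)}</a></li>'
            )
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Usage would be something like `build_toc([("Chapters", chapter_pages), ("Supporting pages", alpha_pages)])`, with the result written to index.html before adding it to CaLibre.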

Nov18 - start to think about Private WikiNotebookBook Cover (done Dec05)

  • then Private WikiNotebookSupportItems (KDP info saved Dec05)

Dec06 - comparing unzipped EPub files

  • unzip Without Their Permission to look at contents
    • stylesheet.css is 743 lines - grr
    • it's linked from each content HTML page like online
  • unzip old draft of Private WikiNotebook
    • CaLibre-created .zip just includes zipped HTML pages, no other content
    • copy epub to zip and unzip - get more meat
      • stylesheet.css - 113 lines - most styles are named like 'calibre3'
      • pagestyles.css - just has 5pt top/bottom margins
      • HTML pages use those class=calibre3 styles, also still have all the span lines
      • CaLibre created those redundant title entries - ''derp it's because I tagged the head as header instead of head so they generated a fresh head while leaving my header bits inside the body! Code fixed, see if that does it.''
      • class="nonexistent" gone from the hrefs - that's ok, they'll be removed anyway. But not good for distinguishing in-book from on-line hrefs...
      • hmm I wonder what happens if stick the MoinMoin stylesheets in the folder of scraped pages? (even though won't be linking to them?)

Dec06 - next steps

  • more rewording
  • make index.txt include mapping of Wiki Name to pretty titles (CollApse to Collapse, Why Getting Things Done to Why Getting Things Done?)
    • hah can also use this to map new intro_page variable to "Introduction"
    • can generate with code from local scrapings already
    • then can kill code that reads title from scraped HTML
  • rescrape
    • snapshot old stuff first
    • bring over cover, index, etc.
    • change first page name to "Introduction", update index.html
    • incl copies of MoinMoin stylesheets
    • strip pointless starting lines from key pages (title restatements), make leading h2 cleaner.
  • include "Online version of page" link at bottom of every page!
  • include "Related pages" list of backlinks at bottom of every (non-key) page?
  • do href adjusting

Dec09 - going to do a temporary hack before doing the above (just to get a sense of progress from having a cover and discovering some dumb things I had done)

  • change all the header tags to head tags
  • insert cover and Table Of Content links at front of index.html, change name for first page to "Introduction"
  • use Kindle cover image
  • save common.css and screen.css in source folder, even though nothing links to them
  • manually hack some of the key pages to dump redundant H2 lines. Also check bottom of page for Next-ish link - fix to add .html piece
  • go grep across all pages to make NoteBook and Private Wiki links point to .html
  • go through intro page and make pretty much all the links use .html
  • then re-create book with CaLibre. Let's review that process
    • launch; select existing book, "Remove book"
    • "Add book", pick index.html
    • "Edit metadata" - title, author, series, cover
    • "Convert book"
      • metadata - should be fine
      • Look And Feel - fine
      • Heuristic - none
      • Page Setup - fine
      • Structure Detection - leave "detect chapters" as-is (has value), blank "insert page breaks before..." (make sure not H2)
      • Table Of Contents - leave all the checkboxes turned off; incr "Number of links" to 200; leave "Level1" value; ''turn on manually fine-tune'' (make sure not H2)
      • EPub output - leave checkboxes blank
      • hit OK
      • review ToC - remove dupe entries
  • results? (on tablet in MoonReader)
    • cover image shows up at top and bottom of Table Of Contents
    • Table Of Contents splits some chapters because they use H2 for sub-sections, which they probably shouldn't.
      • plan - going to change code that puts H2 at top to use H1 which is kinda more logical
    • but overall so much happier!

Dec10 - working on link-convert code.

Dec12 - want to "convert" first content page into "Introduction" - what does this mean, and how do I do it?

  • listing in HTML Table Of Contents
  • listing in popup metadata Table Of Contents
    • probably changing title (and h1) tag inside that page will drive that
  • links from other pages?
    • not many left, after stripping out those top lines that provide context online
    • makes sense to leave the original/full label, not change to "Intro".
  • conclusion on method
    • changing mapping in index.txt, which drives that page's title and h1
    • change index.html
    • no other changes needed!

Revisiting link types - how distinguish for reader?

  • ''Why'' distinguish? Mainly to encourage clicking on in-book links
  • types that matter (having eliminated 'nonexistent' cases)
    • in-book - make green
    • links to WikiLog, including Twin Page-s - make blue
    • other online - blue, with icon (use same external-link-ltr-icon.png as in Wikipedia)
      • if that doesn't work make red (seems extreme, but want to stick with high-contrast colors)

Make stylesheet to handle these cases, plus other things - merge of MoinMoin common.css and screen.css with just a subset. Call stylesheet.css. Lots of time tweaking to get vertical spacing (esp around bullet lists) to something I like.

Add Twin Page link at bottom of every page

Link conversion code done!
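
A sketch of what the link conversion amounts to, under the rules worked out above: in-book links get ".html" and their own class, "nonexistent" links are removed (keeping the words), other wiki pages point back at the online WikiLog, and off-site links get a single style. The class names "inbook", "wikilog" and "external", the `wiki_base` parameter, and the function itself are all assumptions for illustration, not the actual code.

```python
import re

LINK = re.compile(r'<a\b([^>]*)>(.*?)</a>', re.IGNORECASE | re.DOTALL)
HREF = re.compile(r'href="([^"]*)"')
CLASS = re.compile(r'class="([^"]*)"')


def convert_links(html, in_book, wiki_base):
    """Rewrite links; in_book is a set of page names, wiki_base the online wiki URL prefix."""

    def fix(match):
        attrs, text = match.group(1), match.group(2)
        href_match = HREF.search(attrs)
        if not href_match:
            return match.group(0)  # <a name=...> labels stay untouched
        href = href_match.group(1)
        class_match = CLASS.search(attrs)
        cls = class_match.group(1) if class_match else ""
        if cls == "nonexistent":
            return text  # drop the link, keep the words
        if cls in ("http", "interwiki") or "://" in href:
            return f'<a class="external" href="{href}">{text}</a>'
        page, sep, fragment = href.partition("#")
        if page in in_book:
            return f'<a class="inbook" href="{page}.html{sep}{fragment}">{text}</a>'
        return f'<a class="wikilog" href="{wiki_base}{href}">{text}</a>'

    return LINK.sub(fix, html)
```

The stylesheet can then color or underline the three classes differently.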

Add pages for section breaks - better UX for pop-up Table Of Contents and for browsing sequentially through chapters.

Include "Related pages" list of backlinks at bottom of every (non-key) page? No, probably not.

Try CaLibre again

  • follow same process as above (Dec09)
  • results in FBReader - a mess
    • pop-up Table Of Contents missed everything - need to figure that out
    • seems like no cover image
    • green vs blue links are opposite!
  • right in CaLibre it looks nice! (bullets, colors), but no pop-up Table Of Contents either
    • realizing I shouldn't have to Remove the previous book just to rebuild the Table Of Contents - just going through the Convert process again should be sufficient...
  • ah, key is, when the review-Table Of Contents comes up during the conversion process, first remove all the dumb stuff, then click the button to generate the ToC "from files" and everything will come in great!
  • looks good in CaLibre now
  • looks good in FBReader on Mac, except blue/green reversed, and bullet points don't outdent.
  • MoonReader - all links blue; bullets don't outdent
    • should be able to use CSS?!?!?
  • FBReader on Tablet???

Try EPub Validator (and HTML Validator for more info in some cases). Make some manual changes

  • doesn't like <em> spanning an <ol>
  • doesn't seem to like <ol type="0">, so drop type
  • doesn't like colons in h3 id attributes (which MoinMoin puts in automatically)
  • then rebuild - now validates
  • doesn't look any better in Mac-FBReader
    • check Prefs - don't see anything about whether to use CSS or not, but see that "internal links" and "external links" have color settings for blue and green. So I switch them. Which is great for me but irrelevant for anyone else.
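
Two of the validator fixes above (colons in id attributes, the type attribute on <ol>) were done by hand, but could be scripted. A sketch, assuming that replacing ":" with "-" is an acceptable substitute and that nothing links to those ids by fragment (any such href would need the same replacement); `clean_for_validator` is a made-up name.

```python
import re

ID_ATTR = re.compile(r'\bid="([^"]*)"')
OL_TYPE = re.compile(r'(<ol\b[^>]*?)\s+type="[^"]*"', re.IGNORECASE)


def clean_for_validator(html):
    """Replace colons inside id attributes and drop the type attribute from <ol> tags."""
    html = ID_ATTR.sub(lambda m: 'id="%s"' % m.group(1).replace(":", "-"), html)
    return OL_TYPE.sub(r'\1', html)
```

The <em>-spanning-an-<ol> case is left out since it needs a judgment call about where the emphasis should really go.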

Try uploading EPub to Kindle publisher (Dec13)

  • it says they accept EPub uploads
  • accepted!
  • grr found 3 spelling errors - fix source - ''will rebuild epub later, just want to Preview for now''
  • trying the online Previewer.
    • Colors are good, bullet outdents are good. No icons for other-online links.
    • hmm, don't see anything resembling a Table Of Contents here...
    • grr, trying "paperwhite" I realize that of course the EInk devices won't show my link colors anyway!!!!
  • download Previewer app

Next steps

  • tweak stylesheet to use dot-vs-dash-vs-solid underlines for different link types
  • rebuild EBook to use that, plus fix that old mis-spelling
  • use CaLibre to build both EPub and mobi for Kindle
    • hmm, now (after making EPub, when making MOBI) don't have option to manually review/change Table Of Contents!
  • but Kindle Previewer recognizes both a Table Of Contents and NCX now!
    • alas, in EInk version don't get different underline types.

Rebuilding book after lots of content changes, based on feedback from friend. (Dec23)

  • EPub looks fine again/still, Table Of Contents works nicely
  • But now MOBI file has bad ToC/NCX. Dangit.
  • Tweaked structure detection to only use H1, and regenerated. Now good.

To see the results, buy Hack Your Life With A Private Wiki Notebook Getting Things Done And Other Systems.

