Inside-Out Markov Chain

Idea for hybrid of ELIZA-bot and Markov Chain for CoachBot.

  • instead of merging the code, just be redundant. Reply with 2 lines (flow sketched after this list):
    • statement relating last-user-statement to corpus
      • but doesn't make sense to trigger this off the first word of the user-statement. So find the keyword. Or more precisely, the keyword that ELIZA picks. (Actually, it's usually a key phrase.)
      • then build a sentence by working backward and forward (inside-out) from that keyword with a Markov Chain.
    • then ELIZA mirror-question
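
A minimal sketch of that two-line reply flow, where eliza_keyword and eliza_response are hypothetical stand-ins for the real ELIZA pieces (markov_sentence is fleshed out further down in these notes):

    def coachbot_reply(user_statement):
        # 1. let ELIZA pick its key phrase from the user's statement
        #    (eliza_keyword is a hypothetical stand-in for that step)
        seed = eliza_keyword(user_statement)
        # 2. grow a corpus-flavored sentence outward from that seed
        #    (markov_sentence: see the inside-out generator sketched below)
        corpus_line = markov_sentence(seed)
        # 3. follow with the usual ELIZA mirror-question
        mirror = eliza_response(user_statement)
        return corpus_line + "\n" + mirror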

Nov18'2017: code for Markov

  • built markov_rev at same time as markov
    • only working with chain_length=1
  • new generate_sentence1(word) builds sentence from inside out (sketched after this list)
  • Nov21: finish function, tweak to handle multi-word seed.
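
Roughly what those pieces might look like with chain_length=1; the period-based stopping rules are my assumption about where sentences begin and end:

    import random
    from collections import defaultdict

    def build_chains(words):
        markov = defaultdict(list)      # word -> words that follow it
        markov_rev = defaultdict(list)  # word -> words that precede it
        for prev, nxt in zip(words, words[1:]):
            markov[prev].append(nxt)
            markov_rev[nxt].append(prev)
        return markov, markov_rev

    def generate_sentence1(seed, markov, markov_rev, max_words=30):
        sentence = [seed]
        # walk backward from the seed toward a sentence start
        while sentence[0] in markov_rev and len(sentence) < max_words:
            prev = random.choice(markov_rev[sentence[0]])
            if prev.endswith('.'):       # that word ends the previous sentence
                break
            sentence.insert(0, prev)
        # walk forward from the seed toward a sentence end
        while sentence[-1] in markov and len(sentence) < max_words:
            nxt = random.choice(markov[sentence[-1]])
            sentence.append(nxt)
            if nxt.endswith('.'):        # reached a sentence-ending word
                break
        return ' '.join(sentence)

The Nov21 multi-word-seed tweak would amount to starting sentence as seed.split() and walking backward from its first word, forward from its last.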

Nov20: start to build corpus

  • take entire PrivateWiki contents, cat together into 1 file
  • remove empty lines, leading spaces and *, and URLs
  • sort lines, showing lots of other crazy cases to clean up with regex (rough pass sketched after this list)...
  • Nov21: mostly cleaned up.
    • Definitely some issues, but good enough to generate some interesting stuff.
    • Strangely, have a fair number of multi-sentence paragraphs, which means I'd expect to see some weird period-ending words in the middle of generated phrases, but it's happened only a couple of times.
    • Of course it's still just a Markov Chain so it's 80% nonsense. But it's meant almost like a Free-Association engine, which makes sense if the corpus is your own thoughts, but probably not super effective if it's someone else's.
    • Also realizing that because my PrivateWiki has lots of lists and cross-references, it generates some useless short phrases. Probably need to clean out the corpus. On the other hand, as I browse it, I see some interesting pithy little phrases, so I'm not sure... Hmm, maybe I'll output 3 sentences for each ELIZA interaction...
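
A rough pass at the cleanup above (the file location and exact patterns are illustrative; the real cleanup took several regex rounds):

    import glob
    import re

    lines = []
    for path in glob.glob('privatewiki/*.txt'):  # hypothetical corpus location
        with open(path) as f:
            lines.extend(f.read().splitlines())

    cleaned = []
    for line in lines:
        line = re.sub(r'https?://\S+', '', line)  # drop URLs
        line = re.sub(r'^[\s*]+', '', line)       # drop leading spaces and *
        if line.strip():                          # drop empty lines
            cleaned.append(line)

    cleaned.sort()  # sorting surfaces the remaining junk to clean by hand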

Next: find place in ELIZA to branch out to this.

  • Also have to decide how to save the markov and markov_rev dictionaries so they can be used repeatedly. Just pickle (sketched after this list), or treat like a server? Or maybe just initialize from inside the ELIZA daemon itself, so they're tied to that session length. Yeah, just start with that.
  • using the old Joe Strout eliza.py code, not the pyAIML stuff.
  • Nov27: have ELIZA calling the Markov Chain bits. (Setting the corpus is inside eliza.py.)
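
For reference, the pickle route would be about this simple if per-session rebuilding ever gets too slow (file name is illustrative):

    import pickle

    # save once after building the chains
    with open('chains.pkl', 'wb') as f:
        pickle.dump((markov, markov_rev), f)

    # load in a later session
    with open('chains.pkl', 'rb') as f:
        markov, markov_rev = pickle.load(f)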

Next: deal with details

  • no-match case: when input word doesn't exist in corpus at all
  • case-sensitivity: make everything lower-case before adding to markov?
  • Nov27: did both, but that lower-case is kinda bad because it flattens any Smashed Together Words.
    • probably don't do lower() if word is CamelCase (see the sketch after this list)...
  • different corpus (or additional?) - at bottom of IrcBot it says I already did some conversion of The Obstacle Is The Way so I should go find that! (There are some alternatives noted there, too.)
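
Roughly what those two fixes amount to; the CamelCase regex and the None-as-no-match convention are my assumptions:

    import re

    def normalize(word):
        # leave Smashed Together Words (WikiWords) alone; lower-case the rest
        if re.match(r'^([A-Z][a-z0-9]+){2,}$', word):
            return word
        return word.lower()

    def markov_sentence(seed, markov, markov_rev):
        seed = normalize(seed)
        if seed not in markov and seed not in markov_rev:
            return None  # no-match: caller falls back to a plain ELIZA reply
        return generate_sentence1(seed, markov, markov_rev)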

(Distracted by Replika but hit annoyance wall with OpenSSL yet again, so dropping that to come back here for some actual progress, even if toward a local-optimum...)

Mar19'2018: found Obstacle Is The Way file, so will stick with that. First try swapping, then try dupe system.

Mar22: Swapped in 'Obstacle'. Getting much longer response sentences now.

Next step: merge the 2 corpus files.

  • though I think there's some benefit to having separate/coherent voices
  • use both separately for breadth of inspiration?
  • randomly pick 1 to keep per session?
  • randomly pick 1 per cycle?
  • use keyword in user input to pick?
  • use keyword in first/second user input to pick, save that for the session? (a couple of these options are sketched below)
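
A couple of those options, sketched on top of build_chains from earlier (corpus file names are illustrative):

    import random

    corpora = {
        'privatewiki': open('privatewiki_corpus.txt').read().split(),
        'obstacle': open('obstacle_corpus.txt').read().split(),
    }
    chains = {name: build_chains(words) for name, words in corpora.items()}

    # option: randomly pick one corpus and keep it for the whole session
    session_corpus = random.choice(list(chains))

    # option: per cycle, prefer whichever corpus actually contains the keyword
    def pick_corpus(seed):
        hits = [name for name, (fwd, rev) in chains.items() if seed in fwd]
        return random.choice(hits) if hits else session_corpus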

Mar23 different spin...

  • this recent stuff is "interesting" but doesn't smell like it moves things (me) forward.
  • I go back to some fake "scripts" I wrote in the past.
    • Lots of it was tied to time windows, Habit nudging, ToDoList
    • And lots was tied to simple word triggering
      • Which might work better for me than for most people, because my PrivateWiki writing tends to move me toward talking in WikiWords. But most other people are repetitive, too.
  • For the former I could envision a very complicated system: reinvent cron, etc.
    • Or maybe I could keep it relatively simple: a YAML file for each dataset
    • Though that model's more interrupt-driven, which means I'd need a TextMessaging interface to a server (OK in the short term if it just runs on my laptop).
  • So, for the latter, should I just do that?
    • Am I going to re-invent AIML?
    • I could make a little dict to map keywords to response phrases.
    • Or map keywords to PrivateWiki page URLs? That only works if my client supports links.
    • Map keywords to PrivateWiki page-names, actually dump contents into screen? (simplest version sketched below)
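
The simplest version of that last option (keywords, page names, and paths here are all illustrative):

    # map trigger keywords to PrivateWiki page names (illustrative entries)
    keyword_pages = {
        'habit': 'HabitFormation',
        'focus': 'AttentionManagement',
    }

    def keyword_reply(user_statement, wiki_dir='privatewiki'):
        for word in user_statement.split():
            page = keyword_pages.get(word.lower())
            if page:
                # dump the page contents straight into the screen
                with open(f'{wiki_dir}/{page}.txt') as f:
                    return f.read()
        return None  # no trigger; fall back to the ELIZA/Markov reply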
