The Great Story Games Archive

Hi friends,

I've archived the entirety of Story Games discussions and comments on my GitHub account, here:

https://github.com/jeffschecter/storygames

I've got a script that lets me quickly update the archive with recent threads and comments. The crawl script is in the repo, so you can create your own version of the archive if you like.

At the moment it's just raw HTML, with some of the boilerplate removed. I might do some postprocessing later to clean it up and make it more legible, either as raw text files, or as HTML with some of the formatting added back in.
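
As a rough idea of what the raw-text cleanup could look like, here is a minimal sketch using only Python's standard library. The class name, the function name, and the list of block-level tags are assumptions made for illustration; none of this is taken from the crawl script in the repo.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment, dropping the tags."""

    # Tags that start a new line in the output (an assumed, incomplete list).
    BLOCK_TAGS = {"p", "div", "br", "li", "blockquote"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Trim each line and drop the empty ones.
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the plain text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling html_to_text on the contents of one archived file would give a plain-text version of that thread, without any of the original formatting.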

If someone wanted to, they could potentially use this as a base to take over the storygames domain name, and provide a static archive website that would correctly resolve external links to discussions and threads.

Comments

  • Well done! You're a hero.
  • Yes, incredible. Kudos!

  • If someone wanted to, they could potentially use this as a base to take over the storygames domain name, and provide a static archive website that would correctly resolve external links to discussions and threads.
    Is anyone willing to do this? What kind of tools and/or funding would be necessary?

    I wonder how much support there would be if we were to crowdfund such an effort...
  • Haha you wouldn't need crowdfunding. Just, idunno, some web dev knowledge and like a day or three of hacking. It's lightweight enough to live for free forever on Google's cloud hosting free tier.

    I'd volunteer, but I'm about to have a kid in a few weeks, sooooo free time might be short.
  • Kudos to @Jeph !

    But I think you hit a storage limit of some kind? When I click on the storygames folder up there at GH, I'm getting this message: "Sorry, we had to truncate this directory to 1,000 files. 20,323 entries were omitted from the list."

    Looks like we (you) need to make 21 different folders?
  • Nah, that's just github's user interface. All the files are there, they just don't wanna display a webpage with 21k links on it.

    For instance: https://github.com/jeffschecter/storygames/blob/master/storygames/21000.html

    You can download everything to your local computer if you like. Opening the thread's html files in your browser will give you some semblance of formatting, at least compared to viewing a raw HTML blob as plaintext.

    This repository is meant to be an archive of the raw data, not the presentation layer.
  • doomsday averted♥♥♥
  • Thank you, Jeph.

    (and congratulations!)
  • Thanks @jeph! Much appreciated that you stepped up here.
  • Doomsday averted, but until someone turns it into a website, we're only halfway there! We'd still lose all the links (if I understand correctly).
  • Well even the fact that I can cut & paste my old writings from out there and paste it into the Great American Novel Houserule doc♥♥♥
    is a great relief. And now maybe we can dream even bigger; if there aren't legal hurdles to putting that stuff on a new Vanilla instance for example idk
  • Happy wishes @Jeph.
  • Awesome.
  • I've been continuing to update this every so often. Most recent version is up-to-date as of about an hour ago.
  • Jeph best human
  • Indeed! Thank you, Jeph.
  • edited June 29
    Ok, so now that Jeph has done all this work to create (and update!) an archive of Story Games, what's the next step to making it usable for people?

    Is it already, as is, or does someone else now need to step up to format it into searchable threads or something like that?

    For instance:

    Challenge One

    Let's imagine that it's September 2019, and S-G no longer exists. I want to find my old thread, which had something about "monsterhearts" and something about "missing MC procedures", or maybe "missing MC advice" in the title. How can I retrieve that information?

    Let's see if it's doable or not. Can someone (other than Jeph) go and find that thread? (And then tell us how you did it.)

    Challenge Two

    I have a link to a thread I really enjoyed, and would like to reference. Here is the link:

    http://www.story-games.com/forums/discussion/12372

    Again, can you, dear reader, go and find that thread in the archive? (And then tell us how you did it.)
  • Yeah, it's possible.

    Challenge 1:

    1. (Make a github account and) log into github
    2. Go to Advanced Search: https://github.com/search/advanced
    3. Type in the search string: Monsterhearts Paul_T missing MC
    4. For "In these repositories", put: jeffschecter/storygames
    5. Click search
    6. It comes up with no results. But this is because it isn't showing results in code. Click "Code" on the left side of the screen.
    7. There are 74 results displayed. We could scroll through and look at them all, but let's try Ctrl+F and type "missing"
    8. Not too far down, I see this result:
    //story-games.com/forums/discussion/17660/monsterhearts-vs-apocalypse-world-missing-mc-advice-or-procedures" rel
    It is coming from the file 17660.html
    9. Now I'll click on that filename... it takes me through to this file: https://github.com/jeffschecter/storygames/blob/569af8ab3d3dd4b9db82363fe6dd0756e8289070/storygames/17660.html
    10. Copy that URL, and switch over to this site: http://htmlpreview.github.io/
    11. Enter the URL there, and click Preview.
    12. Voila.

    Challenge 2:
    Advanced search within the same repo, but this time the search string is filename:12372

    But seriously, it would be much better to get it set up on a website somewhere. And even better still as Jeph says if the story-games URL could redirect to the archive, so that internal links would work and so on...
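
    For Challenge 2, the same lookup can be done offline against a local clone, since the file links quoted in this thread (17660.html, 21000.html) suggest each thread is stored as its discussion number plus ".html". Here is a minimal sketch of that mapping; the function name and the repo_root default are assumptions, and the naming rule is inferred from those two links rather than confirmed.

```python
import re
from pathlib import Path

# Matches the numeric discussion ID in an old forum link, with or without
# the trailing title slug.
DISCUSSION_RE = re.compile(r"story-games\.com/forums/discussion/(\d+)")


def archive_path(url, repo_root="storygames"):
    """Return the path of the archived thread inside a local clone.

    repo_root is the assumed location of the cloned repository. The inner
    "storygames" folder and the "<id>.html" naming follow the file links
    quoted in this thread.
    """
    match = DISCUSSION_RE.search(url)
    if match is None:
        raise ValueError(f"not a discussion link: {url}")
    return Path(repo_root) / "storygames" / f"{match.group(1)}.html"
```

    A website serving the archive would need essentially the same rule to make old links resolve.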
  • Hey, that’s a great start! Thank you.
  • This tells me that in my archived posts, I'd better not hide the links under descriptive names, and should instead leave them as the long URL strings they are.
  • I don't think that's right, @DeReel. Any method of searching will have access to the raw HTML that underlies your post.
  • I'll take it for granted and save me that much work rewriting the links.
  • Another practical question:

    I often find myself searching for a thread by a particular author, and the list of my own threads is really helpful in that regard.

    So, here is the next challenge:

    Can you, dear reader, go into this archive, and pull out all the threads started by a particular author? How do you do it?
  • All threads containing posts by Paul:

    https://github.com/jeffschecter/storygames/search?q="profile/Paul_T"

    Threads started by Paul might be a bit harder. Someone could build an index on top of the data to make that kind of searching easier, if they wanted.
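
    The same "profile/Paul_T" search can be run on a downloaded copy without GitHub. This is a minimal sketch, assuming the thread files sit in one folder as <number>.html; the function name is made up for illustration. Like the GitHub search, it finds threads that link to the profile anywhere (posts and possibly mentions), not threads the person started.

```python
import re
from pathlib import Path


def threads_mentioning(user, archive_dir):
    """Return discussion IDs (as strings) whose HTML links to the user's profile."""
    # The lookahead stops "Paul_T" from also matching "Paul_Taylor".
    pattern = re.compile(r"profile/" + re.escape(user) + r"(?!\w)")
    hits = []
    for path in Path(archive_dir).glob("*.html"):
        if not path.stem.isdigit():
            continue  # skip anything that is not a numbered thread file
        html = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(html):
            hits.append(path.stem)
    # Sort numerically so "9999" comes before "10000".
    return sorted(hits, key=int)
```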
  • Oh, yeah, I'm definitely interested in threads *started* by certain people. It's been an invaluable function for me to use here (if for nothing else, it gives me a way to find an old thread of mine easily, even if I don't remember the title, which often happens). Unless a particular poster only ever posts in their own threads, just finding threads they've posted in isn't terribly helpful.

    Any other ideas? It would be a shame if it wasn't possible at all.
  • I've been fooling around with this a little. It's not obvious! Searching is hard, and often requires many tries. Trying to follow a link can be particularly daunting.

    I'm moderately computer-savvy. I'd imagine that for someone in the IT world this is a piece of cake, but a "regular" person would be completely lost and unable to access any of this.

    Our time is running out soon - just over two weeks away, I think.

    Is anyone else willing to step up and to help arrange this archive in some more functional way?

    It's asking a lot. But think of the accolades and the eternal gratitude of all the people around the internet following Story Games links from every other blog and email and place! I mean, wow! Think of it.
  • Out of curiosity, I started at least saving all the discussion numbers for threads I've started (something that will be pretty much impossible once this place shuts, it seems). Working manually, it looks like it will take me about 3-4 hours just to save the URLs of all of my own discussions and my bookmarked discussions. Yikes!
  • Time isn't running out! These forums will be around for a year in read-only mode after they close down for new posts next month. At some point during that time, someone will write an index.
  • edited July 27
    Searching through GitHub is not a long-term solution. Manually archiving hundreds of threads is obviously not practical either, and should not be necessary.

    Here is how I imagine one would go about it.
    - Download all the pages from Jeph's archive
    - Analyze the structure of the HTML of a page
    - Write a script in e.g. Python that can parse the HTML for a page and determine the author, thread title, date, and URL
    - Run this script automatically on all the saved pages, thereby creating a giant table of metadata for all the threads
    - Save the giant table of threads to some convenient format such as CSV (comma-separated values) or JSON (JavaScript Object Notation)
    - Write another script that can go through the giant table and produce a nice index in HTML format. You could do this a few times and make a list by thread title, a list by thread author, alphabetical list over several pages for faster loading, etc.
    - Upload the pages and index to a site together somewhere
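
    The parsing and table-building steps above might be sketched roughly as follows. This is not based on the real page structure, which would still need to be analyzed. It assumes each file is named <id>.html, that the thread's own link appears in the form quoted earlier in this thread (//story-games.com/forums/discussion/17660/monsterhearts-vs-...), and, as a guess, that the first "profile/NAME" link in a file belongs to the thread starter. All function and column names are hypothetical, and the date is left out because its format is unknown.

```python
import csv
import re
from pathlib import Path

# Link shape taken from the example quoted earlier in this thread.
THREAD_LINK = re.compile(r"story-games\.com/forums/discussion/(\d+)/([^\"'/?#\s]+)")
# Assumption: the first profile link in a file is the thread starter's.
PROFILE_LINK = re.compile(r"profile/([^\"'/?#\s]+)")

FIELDS = ["id", "title_slug", "author", "url"]


def thread_metadata(path):
    """Extract one row of metadata from a single archived thread file."""
    html = path.read_text(encoding="utf-8", errors="replace")
    thread_id = path.stem
    slug = ""
    # A page may link to other threads, so only accept the link whose ID
    # matches this file's own number.
    for match in THREAD_LINK.finditer(html):
        if match.group(1) == thread_id:
            slug = match.group(2)
            break
    author = PROFILE_LINK.search(html)
    return {
        "id": thread_id,
        "title_slug": slug,
        "author": author.group(1) if author else "",
        "url": f"http://www.story-games.com/forums/discussion/{thread_id}",
    }


def build_table(archive_dir, out_csv):
    """Write one CSV row per thread file and return the rows."""
    paths = [p for p in Path(archive_dir).glob("*.html") if p.stem.isdigit()]
    rows = [thread_metadata(p) for p in paths]
    rows.sort(key=lambda row: int(row["id"]))
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

    If the "first profile link" guess turns out to be wrong for the real pages, only thread_metadata would need to change.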

    This type of scripting honestly is not as hard as understanding continental philosophy. It is 100% feasible and could be completed from Jeph's archive even if Story-Games.com were deleted forever tomorrow. It would just take a bit of work to work out the details in practice.

    I would offer to do it myself except I think this is exactly the kind of thing I tend to say I will do and then never get around to. Someone who is good at web development could probably do it very quickly; for me it would take a bit more time because I am more of a video game programmer.

    After the site is frozen I imagine we can still talk about the problem in the future e.g. on Fictioneers. The final index would have to be generated after the site becomes read-only so that it is not missing the last few days of activity.
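
    Regenerating the index after the freeze should be cheap if it is built from the metadata table rather than by hand. Here is a minimal sketch of the "list by thread author" variant, assuming a CSV with columns named id, title_slug and author (hypothetical names) and thread pages saved next to the index as <id>.html.

```python
import csv
import html
from collections import defaultdict


def index_by_author(csv_path):
    """Render an HTML list of threads grouped by author from a metadata CSV."""
    threads = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            threads[row["author"]].append(row)

    out = ["<ul>"]
    # Authors are listed alphabetically, ignoring case.
    for author in sorted(threads, key=str.lower):
        out.append(f"<li>{html.escape(author)}<ul>")
        for row in threads[author]:
            link = f'{row["id"]}.html'
            # Fall back to the number when no title slug was found.
            title = row["title_slug"] or row["id"]
            out.append(f'<li><a href="{html.escape(link)}">{html.escape(title)}</a></li>')
        out.append("</ul></li>")
    out.append("</ul>")
    return "\n".join(out)
```

    A list by title or a paginated alphabetical list would be the same loop with a different grouping key.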
  • Ok, I've been watching this conversation waiting for someone else to step up, because I felt that me offering to house this content at Fictioneers would seem presumptuous or opportunistic or something.

    But if we don't get any other takers, I can do this:
    Download all the pages from Jeph's archive
    Prepare them in a Feed format for Drupal
    Create another forum for SG (right now there's "Game Stuff" and "Web Stuff")
    Import the Feed into that forum

    So that's content. The Domain Name is another thing. If someone who has control of the DNS Record wants to point the name at Fictioneers.net once all the above is done or at whatever shutoff point is determined for this site here, I could write redirects in .htaccess so that the old URLs would still work.

  • Tod,

    I can only say that I would support your initiative! That would be incredible.
  • This is great.
  • As is AsIf's plan
  • @AsIf I wouldn't call that "presumptuous", I'd call that extremely useful and generally a good thing. ^_^
  • edited July 28
    Hey there: I think I've mentioned this before, but here's my thoughts:

    1) I support only long-term archiving of this content.
    2) It's not okay to stand up a new forum with this content.
    3) I feel uncertain about, but not 100% opposed to, non-interactive websites hosting this content, but am dubious of ones that don't feel geared towards long-term archiving/durability.

    Here are some things which feel okay:
    1) github as a repository for the html (it's not the greatest, because github accounts/organizational structures aren't very good bulwarks against deletion, but you take what you can get).

    note, @jeph, I think the size of the entire archive is under 1GB, so you could with some effort take your repo and make it work in github pages.

    2) The internet archive seems like the clear place for this. Maybe the strategy I described seemed too opaque. Here's all you need:

    https://github.com/buren/wayback_archiver

    You pass it a list of urls like so:
    WaybackArchiver.archive(%w[example.com www.example.com], strategy: :urls)

    and it'll be in archive.org. So really, all you need to do is generate that list of urls, and you've got maybe a 10-line program to archive this forever.
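
    Generating that list of URLs from a local copy of the archive could look something like this in Python. It is a sketch under assumptions: the thread files are named <id>.html, the base URL matches the discussion link format used elsewhere in this thread, and the function name is made up. If long threads are split across several pages, this would only list the first page of each.

```python
from pathlib import Path


def discussion_urls(archive_dir, base="http://www.story-games.com/forums/discussion/"):
    """Return one forum URL per numbered thread file, in numeric order."""
    ids = sorted(
        int(path.stem)
        for path in Path(archive_dir).glob("*.html")
        if path.stem.isdigit()
    )
    return [f"{base}{thread_id}" for thread_id in ids]
```

    Writing "\n".join(discussion_urls("storygames")) to a text file would give a list that can be fed to whichever archiving tool is used.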

    It feels like 2 satisfies very thoroughly the "nothing will disappear into the ether".
    I feel like adding 1 to this satisfies the "slightly more accessible, but also reasonably durable".

    If 2 feels difficult, I'm happy to help with it. @Jeph, if you need any help with converting what you've got to github pages, let me know.
  • See, y'all, that's what I meant by "presumptuous."

  • @James_Stuart,

    I understand your desire to control the fate of this forum and all these discussions, but it would be an incredible shame if all this information wasn't accessible in the future, and just disappeared. Right now, Tod is the only one offering to do something about that.

    Would you still be unhappy if it was included as a non-interactive "Story Games Archive" area of the Fictioneers site? Would that be a good compromise?
  • I think we've got two angles listed above which will keep SG content from disappearing. I'll get in touch with Jeph to make sure at least one happens.
  • I need to be able to access my own old writings pretty much. Uh... because I kind of put a lot into here
  • Yeah, same here. And so have many other people over the years, which is an incredible resource. Pretty much every time I run a game, there's a relevant old thread on this forum that comes to my rescue or improves my game.
  • edited August 8
    Here's an idea. Take the OP from all your most precious threads and post them on Fictioneers.net. Then edit the bottom of the post to add a link to the thread in the Github repository that Jeph made. Then you will always be able to find it easily.

    If after some time you realize that there is a thread you forgot, you can still search Github and then copy the OP over to Fictioneers.net with a link to the Github thread. It will always be there.

    Basically, you would be making a landing page on Fictioneers for a valuable thread in Github.

    Tod's other thread made me think of this.
  • That’s a good idea. The volume is daunting, though. (Like I said, I started doing this, and had to stop several hours later, having barely made a dent.)
  • @Paul_T My approach to that problem would be very pragmatic. I would just post OPs that I needed or was interested in at the moment (i.e. now) like the Spicey Dice threads, and then I would build out from there with OPs that are cross-referenced in the OP or split from the OP.

    If you try to index your whole corpus you will be defeated by the sheer magnitude. Just try doing it little by little.

    Story-Games will still be here for a year, so there is time; Github will likely be here forever. There's really no rush.
  • @Vivificient @Paul_T Python's NLTK is really your answer here. I can't believe I didn't think about it before (Vivificient's post above made me think about it now). NLTK is a set of python tools for working with text, especially text corpora. It isn't hard to learn.
  • edited August 9
    NLTK = natural language toolkit. You'd have to learn to write some simple scripts, but it is so easy that a moderately computer-literate person should have no problems. NLTK was really designed for non-computer-geek language scientists and humanities scholars to work with large text corpora (like the one you are trying to parse, i.e. SG.)
  • You don't even need NLTK. You just need to build these indices:

    A mapping from each post ID to the thread ID in which it appears.

    A mapping from each word that appears in the story games corpus to the set of post IDs in which it appears. (Though you could use NLTK here for lemmatization, instead of string literals, which could be nice.)

    Then a little program that, given a search term, looks it up in the word-to-post index, and thence the post-to-thread index.
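
    A minimal sketch of those two indices and the lookup, in plain Python. It assumes the posts have already been pulled out of the HTML as (post ID, thread ID, text) tuples, which is the part that depends on the real page structure. The function names are made up, and words are matched as lowercase string literals with no lemmatization.

```python
import re
from collections import defaultdict

# A "word" here is a run of lowercase letters, digits, or apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def build_indices(posts):
    """Build both indices from an iterable of (post_id, thread_id, text)."""
    post_to_thread = {}
    word_to_posts = defaultdict(set)
    for post_id, thread_id, text in posts:
        post_to_thread[post_id] = thread_id
        for word in WORD.findall(text.lower()):
            word_to_posts[word].add(post_id)
    return post_to_thread, word_to_posts


def search(term, post_to_thread, word_to_posts):
    """Return sorted IDs of threads with a post containing every word in term."""
    words = WORD.findall(term.lower())
    if not words:
        return []
    # Posts that contain all of the words, then the threads they belong to.
    post_ids = set.intersection(*(word_to_posts.get(w, set()) for w in words))
    return sorted({post_to_thread[p] for p in post_ids})
```

    Requiring every word to appear in the same post is a design choice; matching across a whole thread would need a word-to-thread index instead.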
  • This is an incredible effort. Thank you so much, Jeph!