Alicornutopia

Posted: **Fri Jan 29, 2016 10:05 am**

Ah, right, sorry, I don't think I was clear enough. Most of the HTML that people have written by hand is probably fine.

Long bits of text on how the thing scrapes the indexes:
-------------------------------
In the Incandescence index, it's a bit weird that there are <br /> tags inside some tags and outside others (like here - why isn't that last one just before the <ol>?), but the format is readily readable. Look for an ordered list (1., 2., 3.), then look for all the list elements inside that. In each list element, get all the text that comes before a new list (e.g. "Chamomile"). After that, go through each of the elements of the sub-list (e.g. "Picknicking"), get the URL, get the text, and then add that to the list of chapters.

The Effulgence format is largely similar; the difference is that the HTML on that page is written with an ending tag for each <li> element, whereas the Incandescence HTML isn't. Since Dreamwidth messes up unclosed list elements a little, on the Incandescence index I just look for any list element that has another list inside it and take that to be the section (the "Chamomile" thing), then I look for any list element inside that. The sub-lists have list elements that show up properly, but the main list seems to be a bit funny with nesting. I'll try Adelene's idea of stripping "</li>" tags to see if that helps, since it sounds like it probably will. On the Effulgence index they all seem to be explicitly closed, so Dreamwidth doesn't do anything stupid and I can go through it as I described above. If people are writing new indexes, it'd probably be nice to explicitly close the list elements just in case, since it definitely seems like Dreamwidth is being stupid with that.
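
To make the nesting approach concrete, here is a minimal sketch in Python using only the standard library's html.parser. It is not the actual scraper: the class and function names are made up for illustration, and it assumes the <ol>/<ul> tags themselves are balanced. It only looks at list depth and ignores "</li>" entirely, so closed and unclosed list elements come out the same.

```python
from html.parser import HTMLParser


class NestedListParser(HTMLParser):
    """Collect (section, title, url) triples from a two-level list index.

    Hypothetical helper: only list depth and <li> start tags are used to
    find boundaries, so </li> tags, present or missing, make no difference.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many lists we are currently inside
        self.section = ""    # text of the current top-level list element
        self.href = None     # URL of the link being read, if any
        self.title = ""      # text of the link being read
        self.chapters = []   # (section, title, url) in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.depth += 1
        elif tag == "li" and self.depth == 1:
            self.section = ""  # a new section starts; forget the old name
        elif tag == "a" and self.depth == 2:
            self.href = dict(attrs).get("href")
            self.title = ""

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            self.depth = max(0, self.depth - 1)
        elif tag == "a" and self.href is not None:
            self.chapters.append(
                (self.section.strip(), self.title.strip(), self.href))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.title += data      # text inside a chapter link
        elif self.depth == 1:
            self.section += data    # text directly inside a top-level <li>


def parse_index(html):
    parser = NestedListParser()
    parser.feed(html)
    parser.close()
    return parser.chapters
```

The extra text around each link (symbellas, "(extras)") is skipped here to keep the sketch short.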

For the sandbox, I'm currently going through everything in a really horrible way. I remove all "" tags, since they make it harder to parse. I go through each top-level element in the entry and check whether it contains the text "SHORT FORM SANDBOXES" or "MULTI-THREAD PLOTS" or whatever, updating which section the script is on. Then I find every link: each link starts a new chapter, so I set the URL, add the text of any other elements (like "(with kappa)"), and wait until I find another link before I process that chapter. You may have noticed the error with only saving a chapter when I find a new link: it fails to add the last chapter in a list. I fixed this by manually getting it to check whether it needs to save a chapter when it changes major group (e.g. SHORT FORM SANDBOXES). If, instead, each "Milliways Meetings" link (and accompanying text) were inside a <li> tag in a <ul>, that'd make it a lot easier to work out what text goes with which link, and to ensure I go through them all properly. It would probably take some effort to convert, though; if it were to be done, I'd probably do it myself, since it's mainly just for the EPUB thing. The current layout probably looks better, too, so people are probably more attached to it.
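
The link-by-link approach and the "last chapter" fix can be sketched roughly like this in Python. It is an illustration, not the real code: the token shape, the function name and the dictionary keys are all assumptions, and the heading set only contains the two headings named above.

```python
# Section headings named in the post; the real list is likely longer.
SECTION_HEADINGS = {"SHORT FORM SANDBOXES", "MULTI-THREAD PLOTS"}


def collect_chapters(tokens):
    """Group a flat stream of link/text tokens into chapters.

    Each token is ("link", url, title) or ("text", string); this token
    shape is an assumption made for the sketch.
    """
    chapters = []
    section = None
    current = None  # chapter being built, saved when the next one starts

    def flush():
        nonlocal current
        if current is not None:
            chapters.append(current)
            current = None

    for token in tokens:
        if token[0] == "link":
            flush()  # a new link means the previous chapter is complete
            current = {"section": section, "url": token[1],
                       "title": token[2], "extras": []}
        elif token[1].strip() in SECTION_HEADINGS:
            flush()  # save the last chapter of the previous section
            section = token[1].strip()
        elif current is not None and token[1].strip():
            current["extras"].append(token[1].strip())

    flush()  # save the final chapter, which no later link would trigger
    return chapters
```

Without the two extra flush() calls, the last chapter before each heading and the very last chapter in the entry would be dropped, which is the bug described above.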

Pixiethreads gets processed in the same way as Effulgence. "Glowfic" has no neat, textual index, so I go through the monthly archives of posts and add them through that. Marri's index gets processed in a similar way to Effulgence's, but the headings are more like the example I gave, rather than being parts of a list themselves (as in Effulgence and Incandescence). Radon-absinthe is done in a similar way to Effulgence, but the sections (parts of the main list) don't actually have their own names, so I've done away with the part that processes them and just use an incrementing counter. Peterverse's index has been specifically coded in (it's rather different from the others, and it has a few of the weird "<br />" tags hidden inside tags at some points and outside at others). Maggie's index has also been specifically coded in, but it's rather easy to process, so it's not too bad (though it'd be kinda nicer if the indexes were more consistent).

-------------------------------

I'm not saying every index has to be identical, just that it'd be nice if they were a bit more similar structurally, or they conformed to a couple or a few templates, so I can just make the script use template X to process Effulgence, Incandescence, Pixiethreads, ..., (numbered sections) and template Y to process Sandbox and Maggie's stuff (not numbered sections), or something like that.

Double spaces should be fine. If they're double spaces as in double line-breaking, that's also fine. It'd be nice if line-breaks were put together, outside of tags that style the text (bold, italics and so on), but inside (at the very end of) all "<li>" tags (you apparently shouldn't have "<br />" tags directly inside "<ul>" or "<ol>" tags). As far as I can tell, paragraphs are totally ignored everywhere, since people don't put these links in paragraphs. I'll try ignoring all "</li>" tags and see if that fixes how it processes the stuff that Dreamwidth messes up, but if anyone's making a new index, it'd probably be nice if you explicitly closed list elements. If you're consistently bolding and underlining, or italicising, or whatever, it'd be nice if you could keep the nesting order consistent (if you put the bold tag outside the underline tag somewhere, can you please not put the underline tag outside the bold tag later on?). Not randomly styling things that don't show up as text (like line-breaks) would be nice (e.g. don't wrap a lone line-break in a styling tag; remove the styling tag's opening and closing parts if you ever see this, please). Keeping line-break tags outside of this styling wherever possible, so they're as high-level as possible, would also be nice (e.g. close the styling tag around "section1" before the line-break that precedes "blah", rather than after it).

Again, I'd be willing to make the HTML of people's indexes more consistent if they'd be happy for me to do so (I'd look at the page, copy the HTML, change it, then send it to you so you can hopefully update the post with the new layout / small changes).

Posted: **Fri Jan 29, 2016 10:36 am**

I actually have a Racket program I use to generate my list elements for indexing, so all the indexes that I maintain should be pretty similar to each other. In case anybody finds it useful:

Code:

#lang racket
;; Print one index entry: a list item linking to url, labelled with title.
(define (li url title)
  (displayln (string-append "<li> <a href=\"" url "\">" title "</a> +</li>")))

(My own version has a bunch of extra shit to deal with symbellas, but those are only relevant to Effulgence. Technically this is untested code because I edited out the symbella part after I pasted it into this post, but I'm pretty sure it works.)

Posted: **Fri Jan 29, 2016 2:25 pm**

I've now condensed most of the scraping into a few templates:

The Effulgence template
Each and every group of chapters (e.g. "1. make a wish") is numbered on Dreamwidth, and each and every chapter within it is numbered too (e.g. "1. ✴ he couldn't have imagined"). This makes it relatively easy to just pick up all the chapter URLs, along with the names and extra text, and the section titles.

Code:

<ol>
  <li>Section1
    <ol>
      <li>(symbellas) <a href="chapter1url">Chapter 1</a> (extras)</li>
      <li><a href="chapter2url">Chapter 2</a></li>
    </ol>
  </li>
  <li>Section2
    <ol>
      <li>etc</li>
    </ol>
  </li>
</ol>

(Looks like this)
Also used in Incandescence, Pixiethreads and Radon Absinthe

The Marri Template
Each collection of chapters has a name in a "strong" or "b" tag, and potentially some extra text (e.g. "Lumos (with Alicorn)"), and the chapters within each collection are either numbered or bullet-pointed (e.g. "1. back to school shopping"). Probably good for lists of sandboxes by a single author, both one-shots and multi-thread stories; you probably want to put the one-shots at the end in an unordered list (<ul>), and the multi-thread stories in ordered lists (<ol>).

Code:

<strong>Section title</strong> (extra text)
<ol>
  <li><a href="chapter1url">Ordered Chapter 1</a> (extras)</li>
  <li><a href="chapter2url">Ordered Chapter 2</a></li>
</ol>
<strong>Section 2</strong> (extra text)
<ul>
  <li><a href="chapter-a-url">Unordered Chapter</a> (extras)</li>
  <li><a href="chapter-b-url">Another unordered chapter</a></li>
</ul>

(Looks like this)
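
For anyone curious how a layout like this can be read, here is a rough Python sketch using the standard library's html.parser. The names are hypothetical and it isn't the scraper's real code; it assumes headings sit outside any list and that every chapter is a link inside an <ol> or <ul>.

```python
from html.parser import HTMLParser


class HeadedListParser(HTMLParser):
    """Collect (section, ordered, title, url) from headings followed by lists."""

    def __init__(self):
        super().__init__()
        self.section = ""     # text of the latest <strong>/<b> heading
        self.in_heading = False
        self.lists = []       # stack of open list tags ("ol" or "ul")
        self.href = None      # URL of the link being read, if any
        self.title = ""
        self.chapters = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.lists.append(tag)
        elif tag in ("strong", "b") and not self.lists:
            self.in_heading, self.section = True, ""
        elif tag == "a" and self.lists:
            self.href, self.title = dict(attrs).get("href"), ""

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.lists:
            self.lists.pop()
        elif tag in ("strong", "b"):
            self.in_heading = False
        elif tag == "a" and self.href is not None:
            # Ordered lists are numbered threads; unordered are one-shots.
            ordered = bool(self.lists) and self.lists[-1] == "ol"
            self.chapters.append(
                (self.section.strip(), ordered, self.title.strip(), self.href))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.title += data
        elif self.in_heading:
            self.section += data


def parse_headed_index(html):
    parser = HeadedListParser()
    parser.feed(html)
    parser.close()
    return parser.chapters
```

The "(extra text)" after a heading and the "(extras)" after a link are ignored in this sketch.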

The Peterverse template
For when you kinda want the Marri template, but you also want to be able to group the collections a second time. There are collections of collections of chapters: perhaps you organise your threads (collections of chapters) by what type of story they are (one-shot or multi-thread), or maybe you want to organise by character. The title of each collection of collections is in a "b" or "strong" tag (with optional extras afterwards), then there's an optional title for a collection of chapters inside "em" or "i" tags, and then you have either ordered or unordered lists.

Code:

Starred things are my favourites. Plussed things are yet to finish.<br /><br />
<strong>One-shots</strong> (in the glowfic community)<br />
<ul>
  <li><a href="chapter-a-url">Miscellaneous thing A</a></li>
  <li><a href="chapter-b-url">Miscellaneous thing B</a> (with someone random)</li>
</ul>
<em>With Specific Person 1</em><br />
<ul>
  <li><a href="chapter-c-url">Thing C</a></li>
  <li><a href="chapter-d-url">Thing D</a> +</li>
</ul>
<em>With Person 2</em><br />
<ul>
  <li><a href="chapter-e-url">Thing E</a></li>
</ul>
<br />
<strong>Multi-thread stories</strong> (somewhere)<br />
<em>Multi-thread name 1</em><br />
<ol>
  <li><a href="chapter-1-1-url">Thread 1</a></li>
  <li><a href="chapter-1-2-url">Thread 2</a> + (starring person 3)</li>
</ol>
<em>Multi-thread thing 2</em><br />
<ol>
  <li><a href="chapter-2-1-url">Thread 1</a> *</li>
</ol>

(Looks like this)
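
As a rough illustration of the two heading levels, here is a Python sketch using the standard library's html.parser. All names are made up for the example and it is only one possible way to read this layout: a "strong"/"b" heading starts a new group and clears the collection name, an "em"/"i" heading sets the collection name, and every link inside a list becomes a chapter.

```python
from html.parser import HTMLParser


class GroupedIndexParser(HTMLParser):
    """Collect (group, collection, title, url) from a two-level heading layout."""

    def __init__(self):
        super().__init__()
        self.group = ""        # text of the latest <strong>/<b> heading
        self.collection = ""   # text of the latest <em>/<i> heading, may be empty
        self.capture = None    # "group" or "collection" while inside a heading
        self.list_depth = 0
        self.href = None
        self.title = ""
        self.chapters = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.list_depth += 1
        elif self.list_depth == 0 and tag in ("strong", "b"):
            # A new group also resets the optional collection title.
            self.capture, self.group, self.collection = "group", "", ""
        elif self.list_depth == 0 and tag in ("em", "i"):
            self.capture, self.collection = "collection", ""
        elif self.list_depth > 0 and tag == "a":
            self.href, self.title = dict(attrs).get("href"), ""

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            self.list_depth = max(0, self.list_depth - 1)
        elif tag in ("strong", "b", "em", "i"):
            self.capture = None
        elif tag == "a" and self.href is not None:
            self.chapters.append((self.group.strip(), self.collection.strip(),
                                  self.title.strip(), self.href))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.title += data
        elif self.capture == "group":
            self.group += data
        elif self.capture == "collection":
            self.collection += data


def parse_grouped_index(html):
    parser = GroupedIndexParser()
    parser.feed(html)
    parser.close()
    return parser.chapters
```

Lists that come straight after a "strong" heading, with no "em" title in between, get an empty collection name.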

Etc
There's also the "glowfic community template", which goes through the months and just gets everything. I also still have stuff dedicated to the sandbox layout and Maggie's current layout, so if you guys don't mind terribly conforming to one of the above templates, that'd be cool, but if you don't, well, provided you don't change the format overmuch, it should still work.
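
Put together, the choice of template per index could be written down as a simple lookup. This is only a sketch in Python: the table name, the function and the template labels are invented for illustration, and the assignments just restate what is described in this post.

```python
# Which parsing approach each index uses, as described in the post.
# The string labels are hypothetical identifiers, not real module names.
TEMPLATE_FOR_INDEX = {
    "Effulgence": "effulgence",
    "Incandescence": "effulgence",
    "Pixiethreads": "effulgence",
    "Radon Absinthe": "effulgence",
    "Marri": "marri",
    "Peterverse": "peterverse",
    "Glowfic": "glowfic-community",  # walks the monthly archives
    "Sandbox": "sandbox",            # still has dedicated code
    "Maggie": "maggie",              # still has dedicated code
}


def template_for(index_name):
    """Return the template label for an index, or fail loudly if unknown."""
    try:
        return TEMPLATE_FOR_INDEX[index_name]
    except KeyError:
        raise ValueError(f"no template registered for {index_name!r}") from None
```

A new index that follows one of the shared templates would then only need one more line in the table.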

I was thinking the sandbox would look something like this (with this snippet of code as an example):

Code:

Starred things are favourites. Plussed things are yet to finish.<br /><br />
<strong><u>SHORT FORM SANDBOXES</u></strong><br />
<em>Milliways Meetings</em><br />
<ul>
  <li><a href="http://sun-guided.dreamwidth.org/267.html">Aether & Tinuben</a> + (with Liz)</li>
  <li><a href="http://panfandomsandbox.dreamwidth.org/115923.html?thread=9180371#cmt9180371">Aegis & Charles</a> + (with Fi)</li>
  <li><a href="http://panfandomsandbox.dreamwidth.org/116465.html?thread=9252849#cmt9252849">Kerron & Bryce</a> (with Adiva)</li>
  <li>...</li>
  <li><a href="http://alicornutopia.dreamwidth.org/9596.html?thread=4213628&style=site#cmt4213628">Earth, Lightning, and Ace</a> * (with kappa)</li>
  <li><a href="http://royal-obligation.dreamwidth.org/1634.html?style=site">Shell Bell and Edarial</a> * (with Aestrix)</li>
  <li>...</li>
</ul>
<em><a href="http://elcenia.com/">Elcenia</a> Stuff</em><br />
<ul>
  <li><a href="http://alicornutopia.dreamwidth.org/1921.html?thread=174977#cmt174977">Adarin in summoning circle</a> + (with Aestrix)</li>
  <li>...</li>
</ul>
...<br />
<strong><u>MULTI-THREAD PLOTS</u></strong><br />
<em>Amounts Of Dragon</em> (with Kappa)<br />
<ol>
  <li><a href="http://alicornutopia.dreamwidth.org/12291.html?style=site">This One Is Safe</a></li>
  <li><a href="http://alicornutopia.dreamwidth.org/281.html?thread=30489#cmt30489">Kaylo & Lazarus</a></li>
  <li>...</li>
</ol>
...<br />
<strong><u>IN WHICH I GUEST STAR</u></strong><br />
<ul>
  <li><a href="http://poll-the-stars.dreamwidth.org/1357.html?style=site">Avet and Mial</a> (takes Amounts of Dragon as continuity through This One Is Safe) (with kappa and Aestrix) (as Elcenians)</li>
  <li><a href="http://manyworlds.boards.net/thread/80/backstage-leafy-glowfic-index">Leafy Glowfic</a> (with kappa and others), various threads (as Ivan and Demon Cam)</li>
  <li><a href="http://radon-absinthe.dreamwidth.org/295.html">Radon Absinthe</a> (with kappa and andaisq) beginning at "onwards to adventure!" (as Narnians)</li>
  <li><a href="http://glowfic.dreamwidth.org/24433.html">Witch's Cape, Not Hat</a> (with Nemo and Rockeye) (as Dinah Alcott)</li>
</ul>
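
With each chapter in its own <li>, the text after the link becomes easy to pick apart. Here is a small Python sketch of that step; the function name and the returned keys are assumptions for illustration. It treats a leading "*" as a favourite, a leading "+" as yet to finish, and anything in parentheses as a note.

```python
import re


def parse_extras(extras):
    """Split the text after a chapter link into flags and parenthesised notes."""
    text = extras.strip()
    favourite = unfinished = False
    # Leading "*" and "+" markers may appear in either order.
    while text[:1] in ("*", "+"):
        if text[0] == "*":
            favourite = True
        else:
            unfinished = True
        text = text[1:].lstrip()
    # Each "(...)" group becomes one note; nested parentheses are not handled.
    notes = re.findall(r"\(([^()]*)\)", text)
    return {"favourite": favourite, "unfinished": unfinished, "notes": notes}
```

Text outside the parentheses, such as ", various threads" in the guest-star list, is dropped by this sketch and would need its own handling.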

I know it'd be a lot of work to convert it, but I could definitely do it for you. I don't know if you'd be really opposed to the change, though, so I haven't started doing that yet. If you like it, I'll go ahead, and if you don't, I can just leave the code as is and hopefully you won't change it around too much in the future. I'm afraid if you do keep it with the current layout, I'm probably going to keep the code ignoring which ones are your favourites, since that'd be quite a bit of work to add; I might revisit it in the future if you do stick with it, though.

Sorry for all the ridiculously long posts, people, and sorry for all the complicated things and the effort I'm potentially asking you to put in. Honestly, if you like the new proposed layouts and want me to convert your index and add it to the epub generator / scraper, I'd be happy to, and if you want to stay how you are, that's okay, but it might take me a while to get around to supporting your index / any new changes you make to it.

Posted: **Wed Mar 30, 2016 8:47 am**

So can I get that rose moiety or something? People seem to have forgotten about this

Posted: **Wed Mar 30, 2016 8:58 am**

Do you have an RGB value in mind?

Posted: **Wed Mar 30, 2016 9:17 am**

Rose, like so, #FF91BF or rgb(255, 145, 191)

Posted: **Wed Mar 30, 2016 9:29 am**

██ Timepoof: Rose
██ AndaisQ: Pink
██ Pedro: Salmon
██ Marri: Red
██ Throne3d: Carmine

yep, that looks all right to me.

Posted: **Wed Mar 30, 2016 10:27 am**

I have attempted a rainbow solely for my own amusement. Not sold on all the placements, but it is close enough to please me. Also, man, really no one wants yellow xD

██ Timepoof: Rose
██ AndaisQ: Pink
██ Pedro: Salmon
██ Marri: Red
██ Throne3d: Carmine
██ Anthusiasm: Orange
██ Nemo: Gold
██ Kuuskytkolme: Lime
██ Alicorn: Green
██ Teceler: Forest
██ Kel: Cyan
██ Lambda: Teal
██ Aestrix: Blue
██ Eva: Midnight
██ Anya: Lavender
██ Link: Violet
██ Kappa: Purple
██ ErinFlight: Copper
██ PlainDealingVillain: Tan
██ Adelene: Brown
██ MaggieoftheOwls: White
██ CuriousDiscoverer: Silver
██ Benedict: Grey
██ Rockeye: Black

Posted: **Wed Mar 30, 2016 10:44 am**

I've been rainbowing it like this:
██ timepoof: Rose
██ AndaisQ: Pink
██ Pedro: Salmon
██ Marri: Red
██ Throne3d: Carmine
██ Adelene: Brown
██ PlainDealingVillain: Tan
██ ErinFlight: Copper
██ Anthusiasm: Orange
██ Nemo: Gold
██ Kuuskytkolme: Lime
██ Teceler: Forest
██ Alicorn: Green
██ Lambda: Teal
██ Kel: Cyan
██ Aestrix: Blue
██ Eva: Midnight
██ Anya: Lavender
██ Kappa: Purple
██ Link: Violet
██ MaggieoftheOwls: White
██ CuriousDiscoverer: Silver
██ Benedict: Grey
██ Rockeye: Black

The colors are actually being used now and yellow is hard to see. I could give it a black background like the white, though. *shrug*

Posted: **Wed Mar 30, 2016 10:46 am**

Yellow doesn't show up on text without being unreadable and ugly. As this prompted me to go find a color for myself, i know x.x

Speaking of... Can i grab "venom green" (728C00)? It might sit a little close though. Backup is "Army brown" (827B60), which looks similar. Not sure how to make the color show up in text; my attempt failed somehow.

Index of Glowfics
