Index of Glowfics

Throne3d
Posts: 1282
Joined: Sat Oct 10, 2015 1:11 pm
Pronouns: He/him/his
Location: United Kingdom

Re: Index of Glowfics

Post by Throne3d »

Marri wrote:I have been waiting for you to get it into a state you're happy with, but then I'm totally down to convert it into a Rails-happy version I can use on my site.
Oh, right. Well, I've been pretty happy with it for a few weeks now, I think; it's mainly just been finding new indexes to add, where I add a small-ish bit of code at a single point so it can handle that index too. If you set up a Rails version, I should probably be able to add new indexes to that if I come across any more.

I don't think I've changed much of the actual structure of the epubs in a while, and that wouldn't affect you anyway. It downloads the pages fine, it gets the content from them fine, it can be extended to pull more data from them if you need it (shouldn't be too difficult), and I'm only adding new indexes when I find them, so if I find more you can always use those later to import stuff.

Basically, I think it's already in a state where I'm pretty much happy with it!
MaggieoftheOwls
Posts: 733
Joined: Sun Apr 05, 2015 7:39 pm
Pronouns: she/her/hers

Re: Index of Glowfics

Post by MaggieoftheOwls »

Oh, that reminds me, I do have an index now. https://maggie-of-the-owls.dreamwidth.org/454.html
Throne3d
Posts: 1282
Joined: Sat Oct 10, 2015 1:11 pm
Pronouns: He/him/his
Location: United Kingdom

Re: Index of Glowfics

Post by Throne3d »

MaggieoftheOwls wrote:Oh, that reminds me, I do have an index now. https://maggie-of-the-owls.dreamwidth.org/454.html
Cool! I'll add a scraper for that.

Um, I know it's a bit to ask, but is it possible that people could keep their indexes in a similar format? Perrrhaps also keep the HTML consistent (like, if you're gonna do a linebreak (<br />), could you keep it outside any <em> or <b> or <u> or whatever tags, unless you seriously need a multi-line underlined thing?). It'd be nice if sections were put into either ordered lists like:

Code: Select all

<ol><li>Entry1</li><li>Entry2</li></ol>
Or unordered lists like:

Code: Select all

<ul><li>My thing</li><li>A thing I did with A</li><li>Another thing</li></ul>
With the section name juuust above it (either as a regular string, since I can find that, or in a nice tag, like bolded or something, and if you want "extras" (e.g. "Blah (with ABC)"), make the extras also tagged, maybe italicised). Like this:

Code: Select all

Hi! This is my index.<br /><strong>Section 1</strong> <em>(with Name1)</em><ol><li><a href="URL1">Part 1</a> of section 1.</li><li><a href="URL2">Part 2</a> of section 1.</li></ol><br /><br /><strong>Other stories</strong><ul><li><a href="URL3">A story</a></li><li><a href="URL4">Another story</a></li></ul><br /><br />I hope you like them!
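To show why that layout is easy to scrape, here is a rough sketch of a parser for it. It is not the actual scraper: it uses Python's standard-library html.parser rather than BeautifulSoup so it runs with no extra installs, and the names IndexParser and parse_index are made up for this example. It assumes every section name sits in a <strong> tag just before its list.

```python
from html.parser import HTMLParser


class IndexParser(HTMLParser):
    """Collects (section, title, url) triples from an index laid out as
    <strong>Section</strong> followed by an <ol> or <ul> of links.
    Hypothetical helper for illustration only."""

    def __init__(self):
        super().__init__()
        self.entries = []        # finished (section, title, url) triples
        self.section = None      # text of the most recent <strong>
        self._in_strong = False
        self._href = None        # href of the <a> we are inside, if any
        self._text = []          # text pieces of the current <strong> or <a>

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
            self._text = []
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that belongs to a section name or a link title.
        if self._in_strong or self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "strong" and self._in_strong:
            self.section = "".join(self._text).strip()
            self._in_strong = False
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.entries.append((self.section, title, self._href))
            self._href = None


def parse_index(html):
    """Return every link in the index along with the section it sits under."""
    parser = IndexParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Each link comes back paired with whichever <strong> heading most recently preceded it, which is all the ebook generator needs to group threads into sections. The italicised "extras" are ignored here; picking them up would mean tracking <em> the same way as <strong>.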
Like, the Effulgence index is good for the most part, since I can just go "look for all links" (the thread URLs), then I can go "look for all text in that numbered point" (the thread names), then I can go "look for the bit of text in the numbered point outside that" (the section names), and then just move on. Other things are not so great (and I'm not trying to name and shame here, seriously! I enjoy your content, and I get that not everyone gets HTML and so on, and that it's effort to maintain it and everything, so if you really want I can just do it for you, send you it, and hopefully you can just maintain the same format later), but... when you've got:

Code: Select all

<strong><u>Thing</u></strong><em><br>Section</em> (extra stuff)<br>1. <a href="about:blank">Link1</a><br>3. <a href="about:blank">Thing 2</a><u><br></u>
It has a random underlined linebreak, and the linebreaks are sometimes inside the section titles, and there are a couple of weird characters, and the numbers in the lists are written in, rather than being automatically done by ordered lists. I have quite a bit of code dedicated to working around the different formats people use.
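As an illustration of the kind of workaround involved (a sketch, not the code actually used), a regex pre-pass can pull linebreaks out of inline tags and drop hand-typed numbers before any real parsing happens. The function names are invented for this example, and it assumes the inline tags carry no attributes.

```python
import re

# Inline formatting tags that a linebreak should never sit inside.
INLINE = r"(?:em|strong|b|i|u)"


def move_breaks_out(html):
    """Move <br> tags sitting at the edge of an inline tag to just outside it."""
    prev = None
    while prev != html:  # repeat so nested inline tags are handled too
        prev = html
        # <u><br></u>  ->  <br>   (inline tag wrapping only a linebreak)
        html = re.sub(rf"<({INLINE})>\s*(<br\s*/?>)\s*</\1>", r"\2", html,
                      flags=re.IGNORECASE)
        # <em><br>Section</em>  ->  <br><em>Section</em>
        html = re.sub(rf"(<{INLINE}>)\s*(<br\s*/?>)", r"\2\1", html,
                      flags=re.IGNORECASE)
        # <em>Section<br></em>  ->  <em>Section</em><br>
        html = re.sub(rf"(<br\s*/?>)\s*(</{INLINE}>)", r"\2\1", html,
                      flags=re.IGNORECASE)
    return html


def strip_manual_numbers(html):
    """Drop hand-typed '1. ' prefixes that sit between a <br> and a link."""
    return re.sub(r"(<br\s*/?>)\s*\d+\.\s*(?=<a\b)", r"\1", html,
                  flags=re.IGNORECASE)
```

Run on the snippet above, this leaves the section title and links intact with every linebreak sitting between elements instead of inside them. Regexes like these are brittle, which is rather the point: each new quirk needs another special case.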

I mean, I can get around it, and I have, and I suppose it's effort if you guys actually don't care whether I do or don't generate ebooks using your indexes, so maybe you don't want to, but if you couuuld, because you're making a new index or something, that'd be great. If it's a lot of work, just tell me, and I'll do it for you, so you can then just copy-and-paste and try to keep it in the same format later? Sorry. :\
Alicorn
Site Admin
Posts: 4226
Joined: Fri Mar 21, 2014 4:44 pm
Pronouns: She/her/hers
Location: The Belltower

Re: Index of Glowfics

Post by Alicorn »

I don't mean to be doing any horrid formatting in mine, but I'm very attached to putting double spaces between sentences so DW might be interpreting badly...
DanielH
Posts: 3745
Joined: Tue Apr 01, 2014 1:50 pm
Pronouns: he/him/his

Re: Index of Glowfics

Post by DanielH »

I don’t think it’s the double spaces.

Part of the problem seems to be the order of the tags. Try to make sure to put <u>, <em>, etc. on the line with the stuff that should be formatted. However, I’m pretty sure part of the problem is that the Dreamwidth auto-formatter has a lot of problems, and those can’t be fixed unless you want to manage linebreaks and stuff manually.

For example, in Incandescence it nests each item of each section’s list inside the next, and each section inside the next. Ordinarily you should have

Code: Select all

<ol>
    <li>[link 1]</li>
    <li>[link 2]</li>
    <li>[link 3]</li>
    <li>[link 4]</li>
    <li>[link 5]</li>
</ol>
Instead, it gives

Code: Select all

<ol>
    <li>[link 1]
    <li>[link 2]
    <li>[link 3]
    <li>[link 4]
    <li>[link 5]
    </li></li></li></li></li>
</ol>
I think you are using the auto-formatter instead of hand-writing the HTML, and there is just no reason Dreamwidth should try that no matter what you’re doing. I was trying to ask you to fix this in the EPUB thread, but before I could clearly communicate the problem and how to fix it, Throne3d came along with something that parsed it anyway.

I think the conclusion to draw is that the auto-formatter is bad. It’s still a nice feature and I would not blame anybody for using it, but it makes things harder on the people who want to actually read and parse the HTML.
Ezra
Posts: 944
Joined: Tue Mar 25, 2014 11:15 am
Pronouns: he/him/his

Re: Index of Glowfics

Post by Ezra »

Sometimes people write old-school HTML, where the close tags for things like "<p>" and "<li>" are inferred from context, never written out. It's been less fashionable since xhtml came on the scene, but I'm pretty sure it's still legal in HTML5.

I have to parse that kind of HTML for making the Elcenia print editions, certainly.
DanielH
Posts: 3745
Joined: Tue Apr 01, 2014 1:50 pm
Pronouns: he/him/his

Re: Index of Glowfics

Post by DanielH »

Ah, that makes sense. And then I guess Dreamwidth tries and fails to sensibly add close tags. Because the close tags are there, and they mess up the BeautifulSoup library when it tries to parse the HTML. I bet it could handle it if the close tags were not there at all.
Alicorn
Site Admin
Posts: 4226
Joined: Fri Mar 21, 2014 4:44 pm
Pronouns: She/her/hers
Location: The Belltower

Re: Index of Glowfics

Post by Alicorn »

I handwrite my HTML, but I don't close tags that the thing should be able to figure out its own self.
Adelene
Posts: 678
Joined: Fri Mar 21, 2014 5:18 pm
Pronouns: they

Re: Index of Glowfics

Post by Adelene »

DanielH wrote:Ah, that makes sense. And then I guess Dreamwidth tries and fails to sensibly add close tags. Because the close tags are there, and they mess up the BeautifulSoup library when it tries to parse the HTML. I bet it could handle it if the close tags were not there at all.
What happens if you just strip out all the </p> and </li> tags from everything, properly formatted or not, before you do any other parsing?
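One way to try that idea without depending on BeautifulSoup is sketched below. It uses only Python's standard library; the names strip_optional_close_tags, ListItems and list_items are made up for this example, and it assumes flat (non-nested) lists, so a genuinely nested list would get flattened.

```python
import re
from html.parser import HTMLParser


def strip_optional_close_tags(html):
    """Remove every </p> and </li>, whether it was well placed or not."""
    return re.sub(r"</(?:p|li)\s*>", "", html, flags=re.IGNORECASE)


class ListItems(HTMLParser):
    """Collects the text of each <li>, treating the next <li> (or the end
    of the list) as the end of the previous item."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text pieces of the open item, or None

    def finish_item(self):
        if self._current is not None:
            # Collapse runs of whitespace so stray newlines do not matter.
            self.items.append(" ".join("".join(self._current).split()))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.finish_item()   # a new <li> implicitly closes the old one
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            self.finish_item()   # the end of the list closes the last item

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def list_items(html):
    """Return the text of each list item after discarding </p> and </li>."""
    parser = ListItems()
    parser.feed(strip_optional_close_tags(html))
    parser.close()
    parser.finish_item()  # in case the list itself was never closed
    return parser.items
```

With the close tags gone, the tidy list and the one with the pile of </li> tags at the end reduce to the same input, so both yield the same items. Whether BeautifulSoup behaves as well on the stripped HTML would still need checking.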
Utility Admin
DanielH
Posts: 3745
Joined: Tue Apr 01, 2014 1:50 pm
Pronouns: he/him/his

Re: Index of Glowfics

Post by DanielH »

I don’t know; before I tried to get the parser to work Throne3d came along with a working one. I expect BeautifulSoup would handle it correctly, but I haven’t really used the package much.

If you handwrite the HTML including the line breaks, then I think some of what Throne3d requested boils down to putting the <br />s outside of the <u>s and <em>s.