Cool! I'll add a scraper for that.
Um, I know it's a bit much to ask, but could people keep their indexes in a similar format? Perhaps also keep the HTML consistent: if you're going to add a line break (<br />), could you keep it outside any <em>, <b>, <u> or similar tags, unless you seriously need a multi-line underlined thing? It'd be nice if sections were put into either ordered lists like:
Code: Select all
<ol><li>Entry1</li><li>Entry2</li></ol>
Or unordered lists like:
Code: Select all
<ul><li>My thing</li><li>A thing I did with A</li><li>Another thing</li></ul>
With the section name just above it, either as plain text (I can find that) or in a nice tag, like bold or something. If you want "extras" (e.g. "Blah (with ABC)"), tag the extras too, maybe in italics. Like this:
Code: Select all
Hi! This is my index.<br /><strong>Section 1</strong> <em>(with Name1)</em><ol><li><a href="URL1">Part 1</a> of section 1.</li><li><a href="URL2">Part 2</a> of section 1.</li></ol><br /><br /><strong>Other stories</strong><ul><li><a href="URL3">A story</a></li><li><a href="URL4">Another story</a></li></ul><br /><br />I hope you like them!
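For illustration, here is a minimal sketch of how an index in that layout could be read with Python's standard `html.parser`. It is not the actual scraper, just an example under the assumptions above: section names sit in `<strong>`, extras in `<em>`, and entries are links inside `<ol>` or `<ul>`. The names `IndexParser`, `parse_index` and `SAMPLE` are made up for this example.

```python
from html.parser import HTMLParser


class IndexParser(HTMLParser):
    """Collects (section, extra, url, title) tuples from an index post."""

    def __init__(self):
        super().__init__()
        self.entries = []      # (section, extra, url, title)
        self.section = None    # text of the most recent <strong>
        self.extra = None      # text of the <em> after the section name
        self.list_depth = 0    # > 0 while inside <ol> or <ul>
        self.capture = None    # "section", "extra", "title", or None
        self.href = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.list_depth += 1
        elif tag == "strong" and self.list_depth == 0:
            # A new section starts; forget the previous section's extras.
            self.capture, self.text, self.extra = "section", [], None
        elif tag == "em" and self.list_depth == 0:
            self.capture, self.text = "extra", []
        elif tag == "a" and self.list_depth > 0:
            self.href = dict(attrs).get("href")
            self.capture, self.text = "title", []

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        text = "".join(self.text).strip()
        if tag in ("ol", "ul"):
            self.list_depth = max(0, self.list_depth - 1)
        elif tag == "strong" and self.capture == "section":
            self.section, self.capture = text, None
        elif tag == "em" and self.capture == "extra":
            self.extra, self.capture = text, None
        elif tag == "a" and self.capture == "title":
            self.entries.append((self.section, self.extra, self.href, text))
            self.capture = None


def parse_index(html):
    """Returns the list of (section, extra, url, title) tuples found in html."""
    parser = IndexParser()
    parser.feed(html)
    parser.close()
    return parser.entries


# Placeholder index in the suggested layout (URL1..URL4 are not real links).
SAMPLE = (
    'Hi! This is my index.<br /><strong>Section 1</strong> <em>(with Name1)</em>'
    '<ol><li><a href="URL1">Part 1</a> of section 1.</li>'
    '<li><a href="URL2">Part 2</a> of section 1.</li></ol><br /><br />'
    '<strong>Other stories</strong><ul><li><a href="URL3">A story</a></li>'
    '<li><a href="URL4">Another story</a></li></ul><br /><br />I hope you like them!'
)

if __name__ == "__main__":
    for entry in parse_index(SAMPLE):
        print(entry)
```

Because the structure is carried by the tags, the sketch needs no special cases: one pass gives the section, the extras, the URL and the title for every entry.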
Like, the Effulgence index is good for the most part, since I can just go "look for all links" (the thread URLs), then "look for all text in that numbered point" (the thread names), then "look for the bit of text in the numbered point outside that" (the section names), and then just move on. Other things are not so great. I'm not trying to name and shame here, seriously! I enjoy your content, I get that not everyone gets HTML, and I know it's effort to maintain an index, so if you really want, I can just redo it for you and send it over, and hopefully you can keep the same format later. But... when you've got:
Code: Select all
<strong><u>Thing</u></strong><em><br>Section</em> (extra stuff)<br>1. <a href="about:blank">Link1</a><br>3. <a href="about:blank">Thing 2</a><u><br></u>
It has a random underlined linebreak, the linebreaks are sometimes inside the section titles, there are a couple of weird characters, and the list numbers are typed in by hand rather than generated automatically by an ordered list. I have quite a bit of code dedicated to working around the different formats people use.
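To give an idea of what such a workaround involves, here is a rough sketch (again, not the real scraper code) that handles the hand-numbered example above. It assumes every `<br>` ends a line no matter which tag it sits in, that a line containing links is a list entry, and that the last non-empty line without links is the section name. `LineSplitter`, `parse_messy_index` and `MESSY` are hypothetical names.

```python
from html.parser import HTMLParser


class LineSplitter(HTMLParser):
    """Flattens HTML into lines, treating every <br> as a line break."""

    def __init__(self):
        super().__init__()
        self.lines = [{"text": "", "links": []}]
        self.href = None       # href of the <a> currently open, if any
        self.link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # Start a new line, even if the <br> is wrapped in <u>, <em>, etc.
            self.lines.append({"text": "", "links": []})
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_data(self, data):
        self.lines[-1]["text"] += data
        if self.href is not None:
            self.link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            title = "".join(self.link_text).strip()
            self.lines[-1]["links"].append((self.href, title))
            self.href = None


def parse_messy_index(html):
    """Returns (section, url, title) tuples; hand-typed numbers are ignored."""
    splitter = LineSplitter()
    splitter.feed(html)
    splitter.close()
    entries, section = [], None
    for line in splitter.lines:
        text = line["text"].strip()
        if line["links"]:
            # The typed "1." sits outside the <a>, so the link text is clean.
            for url, title in line["links"]:
                entries.append((section, url, title))
        elif text:
            # Assumption: a text-only line is a heading for what follows.
            section = text
    return entries


# The awkward example from the post, unchanged.
MESSY = (
    '<strong><u>Thing</u></strong><em><br>Section</em> (extra stuff)<br>'
    '1. <a href="about:blank">Link1</a><br>'
    '3. <a href="about:blank">Thing 2</a><u><br></u>'
)

if __name__ == "__main__":
    for entry in parse_messy_index(MESSY):
        print(entry)
```

Note the guesswork: the sketch cannot tell that "Thing" is the index title and "Section" the section name, or that "(extra stuff)" is an extra, so it keeps "Section (extra stuff)" as one string. Consistent tags would remove the guessing.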
I mean, I can get around it, and I have. I get that it's extra effort, and maybe you guys don't care whether I generate ebooks from your indexes, so maybe you don't want to. But if you could, say because you're making a new index anyway, that'd be great. If it's a lot of work, just tell me and I'll do it for you, so you can just copy-and-paste it and try to keep it in the same format later? Sorry. :\