
Re: EPUB and MOBI of Effulgence

Posted: Fri Oct 16, 2015 8:13 pm
by DanielH
I have not run it in a while (I’ll get to it tomorrow, since I finally have free time and I left it in a bit of a mess), so I don’t know whether get_images has broken any further. The last time I ran it, it failed to get all the images because some of them are now missing, but I thought it gracefully continued from there and got the rest of them.

I’m definitely planning for Incandescence ebooks and others, but I don’t know how long that will take. Ideally, Incandescence will take 5 minutes plus the re-downloading time, but everything else will take longer, and I don’t know if that ideal will actually pan out. The script gathers the list of required images from the links in the story, so there shouldn’t be much I need to change. I’ll look into that after I’ve fixed some other things about the code tomorrow.

When it’s complaining about missing timestamps, it means that the site does not tell it about the timestamps in a way it understands. When a browser (or this) grabs a website, the server sends other information (headers) before it sends the content. One of those is the "Last-Modified" header, so the browser knows how old the relevant page/image/whatever is. In this case, Dreamwidth does not send a Last-Modified header for pages (although it does for images), which is where the warning comes from. This has no practical effect on running the program, except that the program will download more than it needs to, making it run slower.
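
To make the Last-Modified point concrete, here is a minimal sketch of the kind of check a downloader can do. It is not the actual script's code: the function name `needs_download`, the plain-dict `headers` argument, and the `local_mtime` parameter are all made up for illustration. The idea is only that a missing or unreadable header forces a fresh download, which is why pages without the header get re-fetched every run.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_download(headers, local_mtime):
    """Decide whether a cached copy should be fetched again.

    headers: plain dict of response headers (hypothetical; real HTTP
        libraries usually give a case-insensitive mapping instead).
    local_mtime: timezone-aware datetime of the cached copy, or None
        if nothing is cached yet.
    """
    if local_mtime is None:
        return True  # nothing cached, so we have to download

    raw = headers.get("Last-Modified")
    if raw is None:
        # No timestamp from the server: we cannot tell whether the cached
        # copy is stale, so the safe choice is to download again.
        return True

    try:
        remote = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return True  # unparseable timestamp, treat it like a missing one

    if remote.tzinfo is None:
        # Some header values parse to a naive datetime; assume UTC.
        remote = remote.replace(tzinfo=timezone.utc)

    return remote > local_mtime
```

With a cached copy from noon on 16 October 2015, a header of "Thu, 15 Oct 2015 08:00:00 GMT" means the copy can be kept, while an empty header dict means it gets downloaded again.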

Re: EPUB and MOBI of Effulgence

Posted: Fri Oct 16, 2015 8:34 pm
by Alicorn
I really want Incandescence versions. They won't even need updating.

Re: EPUB and MOBI of Effulgence

Posted: Sat Oct 17, 2015 7:35 am
by Throne3d
Perhaps it typically does just fail gracefully, but maybe that's only on a 404 error? With this, I can't connect to heritage.nv.gov at all, and it just times out. The problem is, instead of then continuing, it retries, over and over. It got up to the 20th retry before I stopped it, and it seems to give it a lot longer than Chrome typically does before it considers it to have timed out (after just testing it, it seems to time out after ~2 minutes), so that took upwards of like 20 minutes.
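
For what it's worth, the behaviour I was expecting is roughly the sketch below: give up on one image after a few timeouts and move on to the next. This is only an illustration, not how the script works. `fetch_with_retries`, `fetch_all`, and the `fetch` callable are hypothetical names, and it assumes `fetch` raises `TimeoutError` when a connection times out.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, delay=0.0):
    """Call fetch(url) at most max_attempts times.

    Returns whatever fetch returns, or None if every attempt timed out.
    `fetch` is a hypothetical callable assumed to raise TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TimeoutError:
            # Pause before the next attempt, but not after the last one.
            if attempt < max_attempts and delay:
                time.sleep(delay)
    return None


def fetch_all(fetch, urls, max_attempts=3):
    """Fetch every URL, skipping the ones that keep timing out."""
    results, skipped = {}, []
    for url in urls:
        data = fetch_with_retries(fetch, url, max_attempts)
        if data is None:
            skipped.append(url)  # record it and carry on with the rest
        else:
            results[url] = data
    return results, skipped
```

The list of skipped URLs could then be printed at the end, so one unreachable host costs a few timeouts instead of an open-ended retry loop.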

Oh - is it using a program which automatically checks the last modified headers? (I don't know if wget does that by default.) I didn't realise that, but yeah, that'd make sense. If only Dreamwidth generated that. :P

Well, cool! Thanks. It's just, it's a lot easier to read stuff as an epub on my phone, in one cohesive chunk, rather than having to load each page (and each thread, too) separately (and then flat mode not working too well with multiple threads).

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 11:55 am
by DanielH
I am working on newer ebooks. The Incandescence version will take longer than the ideal time of five minutes, for two reasons. First, the current script does not handle communities, so many of the chapters are attributed to alicornutopia instead of their actual authors. Second, the ToC is poorly formatted. I’m guessing that DW has some bugs in turning entered HTML into officially valid HTML, with the effect that it was valid when entered and DW made it invalid; it added </li> tags in the wrong places and added extraneous <br/>s all over the place.

The ebook will go faster if I don’t need to manually fix the ToC, but if Alicorn doesn’t want to do it then I will be able to. Either way it should be this weekend or next, depending on RL scheduling and how difficult it is to fix the community issue.

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 1:03 pm
by Alicorn
What exactly would I need to do to fix the ToC?

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 2:21 pm
by DanielH
I’m not sure what you’re seeing when you look at it, so I’m not sure I can answer. Now that I think about it some more, it’s almost certainly easier for me to fix it on my end because I don’t need to deal with Dreamwidth maybe still having the bug.

In more detail, the problem is that the ToC code looks like the below (reformatted):

Code:

<ol><br />
    <li><i>Chamomile</i><br />
        <ol><br />
            <li><a href="http://alicornutopia.dreamwidth.org/2222.html?style=site">Picknicking</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/3549.html?style=site">Thingamajig</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/3834.html?style=site">Familial</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/4027.html?style=site">Clannish</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/4266.html?style=site">Finances</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/4547.html?style=site">Planetary</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/4698.html?style=site">Birthday</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/4881.html?style=site">Properly</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/5324.html?style=site">Birds</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/5513.html?style=site">Months</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/5784.html?style=site">Anniversary</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/6266.html?style=site">Stash</a><br />
            <li><a href="http://alicornutopia.dreamwidth.org/6993.html?style=site">Population</a><br />
            </li></li></li></li></li></li></li></li></li></li></li></li></li>
        </ol>
        ...
    <li><i>Sorcery</i> ∞<br />
        <ol><br />
            <li><a href="http://ertia.dreamwidth.org/415.html?style=site">Supreme</a><br /></li>
        </ol>
    <br /><br />
    </li></li></li></li></li></li></li></li></li></li></li></li></li></li></li></li></li></li></li>
</ol>

There are two problems here. First, the <br />s which are outside of <li> tags are illegal (the others are extraneous but not harmful). Second, the </li>s are in the wrong place: list items in the same list should not be nested but should follow each other. Chrome (and presumably FF and other browsers) can read this correctly, but BeautifulSoup (used by the ToC parsing script) cannot.

I suspect that you didn’t type the </li>s but Dreamwidth added them incorrectly; the version without any </li>s would be valid HTML and the browsers would put them in the correct places. I’m also guessing that you use whatever editing option Dreamwidth has to make whitespace matter, and it put in <br />s where you had linebreaks without realizing that some of them did not belong.

If my guesses are correct, then it would be difficult for you to fix this unless Dreamwidth has fixed the <li> placement bug. I, on the other hand, can just add a preprocessor to strip all </li>s and all <br />s and it should then parse correctly.
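
A minimal sketch of that preprocessor, under some assumptions: the names `strip_stray_tags`, `LinkCollector`, and `toc_links` are invented for this example, and the link extraction uses the standard library's `html.parser` rather than BeautifulSoup purely to keep the snippet self-contained. It only shows that once the </li>s and <br />s are stripped, the chapter titles and URLs can still be read out in order.

```python
import re
from html.parser import HTMLParser

# Matches </li> and any <br>, <br/> or <br /> variant, ignoring case.
_STRAY_TAGS = re.compile(r"</li\s*>|<br\s*/?>", re.IGNORECASE)


def strip_stray_tags(html):
    """Remove every </li> and <br /> so only opening <li> tags remain."""
    return _STRAY_TAGS.sub("", html)


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def toc_links(html):
    """Clean the ToC markup, then return its chapter links."""
    parser = LinkCollector()
    parser.feed(strip_stray_tags(html))
    parser.close()
    return parser.links
```

Run on the mangled markup above, `toc_links` should give pairs such as ("Picknicking", "http://alicornutopia.dreamwidth.org/2222.html?style=site"). Recovering which chapter belongs to which story would still need the list nesting, which this sketch does not attempt.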

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 2:35 pm
by Alicorn
The way I wrote it looks like:

Code:

<ol>
<li><i>Chamomile</i>
<ol>
<li><a href="http://alicornutopia.dreamwidth.org/2222.html?style=site">Picknicking</a>
<li><a href="http://alicornutopia.dreamwidth.org/3549.html?style=site">Thingamajig</a>
<li><a href="http://alicornutopia.dreamwidth.org/3834.html?style=site">Familial</a>
<li><a href="http://alicornutopia.dreamwidth.org/4027.html?style=site">Clannish</a>
<li><a href="http://alicornutopia.dreamwidth.org/4266.html?style=site">Finances</a>
<li><a href="http://alicornutopia.dreamwidth.org/4547.html?style=site">Planetary</a>
<li><a href="http://alicornutopia.dreamwidth.org/4698.html?style=site">Birthday</a>
<li><a href="http://alicornutopia.dreamwidth.org/4881.html?style=site">Properly</a>
<li><a href="http://alicornutopia.dreamwidth.org/5324.html?style=site">Birds</a>
<li><a href="http://alicornutopia.dreamwidth.org/5513.html?style=site">Months</a>
<li><a href="http://alicornutopia.dreamwidth.org/5784.html?style=site">Anniversary</a>
<li><a href="http://alicornutopia.dreamwidth.org/6266.html?style=site">Stash</a>
<li><a href="http://alicornutopia.dreamwidth.org/6993.html?style=site">Population</a>
</ol>

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 3:01 pm
by DanielH
Which is perfectly valid HTML that Dreamwidth mangled into invalid XHTML. You can temporarily turn off the excess <br />s, but I don't think you can turn off the misplaced </li>s without manually putting them in where they belong, which would be tedious.

Re: EPUB and MOBI of Effulgence

Posted: Thu Oct 22, 2015 4:50 pm
by DanielH
I can’t reproduce the nv.gov error because the site’s back up.

I will check over the files I just generated for Effulgence and see if they’re good, then upload them if so.

EDIT: They are not good; the last several versions seem to be missing some of the symbella font code or something. I’ll try to fix that, but if I fail I’ll upload these versions tonight unless I find bigger problems.

EDIT 2: I have no clue why the font stuff doesn’t work in the epub, but it’s not worth investigating further. I’ll upload the new version tomorrow when my computer stops misbehaving.

Re: EPUB and MOBI of Effulgence

Posted: Fri Oct 23, 2015 10:52 pm
by DanielH
The files are uploaded to the same place.