
Project Spelunker


_NOPE_


PK, are the parsed files being reported properly? The last couple of runs I've done have said "Now processing file # 0 of 1." or "#0 of 3". It seems to continue as expected and creates the .warcskip file(s).

 

CHEATER CHEATER PUMPKIN EATER!

 

MUgEwc4.png

 

Heh, that's fine, it all goes to the cause. And we're in this for the long haul. I just wonder what it would take to get others to help, because really, the program takes VERY little resources, and runs in the background, and doesn't really interfere with anything (unless they come across an error).

 

Speaking of which, I'll get that patched up shortly. Also, with the Viewer now, you don't really have to report the files that you processed. Hell, I probably should do away with creating those WARCSKIP files too; since I'm also using the database check, they're kind of redundant...

If it's removing the files anyway, it probably doesn't need the .warcskip files. But I throw them back into the archive folder and use them as placeholders for what I've already parsed.
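For anyone curious how the placeholder idea could work, here is a minimal sketch. It is not the parser's actual code: only the `.warcskip` suffix comes from this thread, and the marker naming and the helper names (`marker_for`, `pending_files`, `mark_done`) are assumptions made up for illustration.

```python
from pathlib import Path

SKIP_SUFFIX = ".warcskip"  # suffix from the thread; the marker layout below is assumed


def marker_for(archive_file: Path) -> Path:
    """Return the placeholder path that marks a file as already parsed."""
    return archive_file.with_name(archive_file.name + SKIP_SUFFIX)


def pending_files(archive_dir: Path) -> list[Path]:
    """List files in archive_dir that have no placeholder next to them."""
    candidates = [
        p for p in sorted(archive_dir.iterdir())
        if p.is_file() and p.suffix != SKIP_SUFFIX
    ]
    return [p for p in candidates if not marker_for(p).exists()]


def mark_done(archive_file: Path) -> None:
    """Create an empty placeholder once a file has been parsed."""
    marker_for(archive_file).touch()
```

Because the placeholder lives next to the data file, it still works if the data file is deleted and later extracted again, which is the situation described above.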

Dislike certain sounds? Silence/Modify specific sounds. Looking for modified whole powerset sfx?

Check out Michiyo's modder or Solerverse's thread.  Got a punny character? You should share it.


I was all set to help until I realized that the file is 219 GB, and I would barely have room to store it compressed, much less unzip it...

"The opposite of a fact is falsehood, but the opposite of one profound truth may very well be another profound truth." - Niels Bohr

 

Global Handle: @JusticeBeliever ... Home servers on Live: Guardian ... Playing on: Everlasting


Hmmm... parser is hung up on a file, not progressing and no error.

 

"boards.cityofheroes.com-threads-range-15130-20120904-150600, line 29072 of 313737"

 

Pulled the other files out and reran the same file. Got the following error:

9gyYXTD.jpg

 


Going to bed - torrent is 99.9% downloaded - let it sit over night.

 

Get it going tomorrow.

** Asus TUF x670E Gaming, Ryzen 7950x, AIO Corsair H150i Elite, TridentZ 192GB DDR5 6400, Sapphire 7900XTX, 48" 4K Samsung 3d & 56" 4k UHD, NVME Sabrent Rocket 2TB, MP600 Pro 8tb, MP700 2 TB. HDD Seagate 12TB **


** Corsair Voyager a1600 **


And when it reached 99% it reported that it was stalled. Although it also reported that 219 GB was downloaded, which was the size of the file. I think it was just a reporting issue and that it actually completed, as the archive folder reported the same size.



My 99.9% (and stalled at this point) downloaded 219 GB tar file is definitely not extracting properly  :(



No problem with disk space, but 7zip only reports 123 files (which is ~1.5 GB). Tried bash tar as well, no change.

 

But, I'm processing the files it will give me and that's working.
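A quick way to check how much of a tar actually made it is to count its members and compare against a known-good count. The sketch below uses Python's standard `tarfile` module; the function name `count_tar_members` is made up for illustration. Note that a truncated archive does not always raise an error: reading can simply stop early, so the count is the more reliable signal.

```python
import tarfile
from typing import Optional


def count_tar_members(tar_path: str) -> tuple[int, Optional[str]]:
    """Count regular files in a tar; return (count, error message or None)."""
    count = 0
    try:
        # "r:*" lets tarfile detect plain, gzip, bz2 or xz compression itself.
        with tarfile.open(tar_path, mode="r:*") as archive:
            for member in archive:
                if member.isfile():
                    count += 1
    except (tarfile.ReadError, EOFError) as exc:
        # Some kinds of truncation surface here; others just end the loop early.
        return count, str(exc)
    return count, None
```

If the count comes back far below what a complete copy contains, the download is incomplete even when the reported size looks right.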

Excelsior Global Channel - for your server wide chat and forming TFs, Trials, Radios, Farms, whatever you want to do - /chan_join Excelsior today!


I'm still getting a real low download speed. I think there is a small bit that is only available from a really slow source.

 

Maybe I'll try the direct download option.


Hmmm... parser is hung up on a file, not progressing and no error.

 

"boards.cityofheroes.com-threads-range-15130-20120904-150600, line 29072 of 313737"

 

Pulled the other files out and reran the same file. Got the following error:

9gyYXTD.jpg

 

Yeah, this is what I'm talking about when I talk about old, junk data. It's apparently just an HTML page with no content whatsoever, weird:

 

YWUtpTT.png

 

I'm processing the rest of that file now, so don't worry about that one.

I'm out.

Going to bed - torrent is 99.9% downloaded - let it sit over night.

 

Get it going tomorrow.

Yeah, it looks like the Internet Archive sat on this for a while and then stuck it at the bottom of its resource pile, since it's been around so long (and probably nobody cared about it until now!). Thanks for your efforts!

I'm out.

And when it reached 99% it reported that it was stalled. Although it also reported that 219 GB was downloaded, which was the size of the file. I think it was just a reporting issue and that it actually completed, as the archive folder reported the same size.



My 99.9% (and stalled at this point) downloaded 219 GB tar file is definitely not extracting properly  :(



No problem with disk space, but 7zip only reports 123 files (which is ~1.5 GB). Tried bash tar as well, no change.

 

But, I'm processing the files it will give me and that's working.

 

Thanks for your efforts! You can try the direct download option if you want. I had both running, and the direct download completed faster, so I cancelled the torrent. Maybe the torrent is in fact missing a bit? Ugh.

I'm out.

I just published a new version. Let me know if that auto-stops your processing after the next file processes like I intend it to.

 

What I'm doing now is, if I run across an error in parsing (like, for example, a User Page like Red's error), I'm just throwing it into the "Pages" bucket. Then, sometime later, we can look at the Pages bucket and see if there's anything more we can do to salvage that data. If anyone has any better ideas than that, let me know.

 

I kind of expected this to just be the first round of parsing anyways, where we got the lump done, and then there'd be additional parsing as people request more and more specific queries, which would necessitate additional "cross-reference" tables, which would necessitate additional parsing for that data.
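The "park it in Pages and revisit later" approach can be sketched roughly as follows. This is not the parser's real code: only the "Pages" bucket name comes from this post, while `parse_or_park`, the `parsers` mapping and the stored fields are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict

FALLBACK_BUCKET = "Pages"  # bucket name from the thread


def parse_or_park(
    kind: str,
    raw: str,
    parsers: Dict[str, Callable[[str], dict]],
    buckets: Dict[str, list],
) -> bool:
    """Parse raw as kind; on any failure keep the raw text in the fallback bucket."""
    try:
        parsed = parsers[kind](raw)
    except Exception as exc:  # broad on purpose: nothing should be lost on error
        buckets[FALLBACK_BUCKET].append(
            {"kind": kind, "raw": raw, "error": repr(exc)}
        )
        return False
    buckets[kind].append(parsed)
    return True
```

Keeping the original text and the error alongside each parked record means a later pass can retry with a better parser without going back to the archive.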

I'm out.

By the way, I just figured out that since I didn't make it a single-instance application, you can have multiple instances running and process files faster:

 

bhnGTpN.png

 

You don't HAVE to of course... just saying....

I'm out.

OK boss  :o

 

ZF9pbhV.png

 

I stopped adding because with the small number of files I have right now it started trying to process the same one.

 

Also, started the web dl. Hopefully I get the full archive this time.


  • City Council

OK boss  :o

 

ZF9pbhV.png

 

I stopped adding because with the small number of files I have right now it started trying to process the same one.

 

Also, started the web dl. Hopefully I get the full archive this time.

 

Hopefully you have better luck with this than I have had. I had the same issue with the torrent where the last 0.2% was completely unavailable, leading to only a small percentage of recoverable files (from the tar). The direct download has now failed at 137/219GB 4 times in a row after hours of trying, so it appears there's an issue with the direct download as well.

 

If anyone manages to get the full archive and wants to upload or seed it I can contribute more than just the 1,700 or so files that were recoverable.

Cipher

City Council

 

If you need help, please submit a support request here or use /petition in-game.

 

Got time to spare? Want to see Homecoming thrive? Consider volunteering as a Game Master!


I've got the full archive; it succeeded for me last night. I'm in the process of decompressing it again. I planned to unzip it out of ALL of the different formats it's in, then rezip it into a single format, for convenience and to reduce the number of steps. I suppose I could host it myself. I've got infinite space on my shared host, though the speed might be questionable.

 

18869 files, for reference.

 

hTs30Q8.png

I'm out.

If you could set up a torrent that we know has the good data, after downloading it I'd be able to leave it seeding for a good while.

 

The time to download is less of a problem if we know the end result will be good.
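One way to know the end result is good is for whoever holds the complete archive to publish a checksum that everyone else compares against after downloading. A minimal sketch with Python's standard `hashlib` follows; the function name `sha256_of` and the choice of SHA-256 are assumptions, not something agreed in this thread.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so a 219 GB archive never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

If the hex string matches the published one, the copy is byte-for-byte identical, regardless of what the download client reported.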


As soon as I get the whole thing unzipped the 12 times necessary into a flat folder, and then rezipped the ONE time using 7z (proven to be the highest compression rate), then I'll set up a new torrent.

 

I can even keep it running on my PC, for whatever that's worth. Maybe I could learn how to set it up on my server as a tracker, might have to investigate that...

I'm out.

OK boss  :o

 

ZF9pbhV.png

 

I stopped adding because with the small number of files I have right now it started trying to process the same one.

 

Also, started the web dl. Hopefully I get the full archive this time.

 

You can get around this by placing chunks of files in different folders and pointing each instance of the parser to different folders.
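The chunking can be scripted rather than done by hand. The following is a rough sketch only: the `chunk_NN` folder names and the function `split_into_folders` are made up for illustration, and it moves files, so try it on a copy first.

```python
import shutil
from pathlib import Path


def split_into_folders(archive_dir: Path, instances: int) -> list[Path]:
    """Deal the files in archive_dir round-robin into one subfolder per instance."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    files = sorted(p for p in archive_dir.iterdir() if p.is_file())
    folders = [archive_dir / f"chunk_{i:02d}" for i in range(instances)]
    for folder in folders:
        folder.mkdir(exist_ok=True)
    for index, path in enumerate(files):
        # File 0 goes to chunk_00, file 1 to chunk_01, and so on, wrapping around.
        shutil.move(str(path), str(folders[index % instances] / path.name))
    return folders
```

Each parser instance is then pointed at its own `chunk_NN` folder, so no two instances can pick the same file.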


Splitting the files off manually is a really good suggestion and file clash is mostly an issue because of my small set of files it can select randomly from.

 

Ultimately, I'd love the parser to handle multiple runs itself. Some option to just tell it to run 10 instances of itself and get on with it. Use my excessive modern computing resources!


Sorry, is this too awkward? What would work better? I'm basically just trying to randomize things so everyone doesn't end up processing the same files and wasting processing time.

 

I suppose we ( I ) could stick to a strictly numerical filename process to avoid stepping on each other's toes. My apologies if I started a poor trend. <.<
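An alternative to picking files at random is to give every volunteer a fixed, non-overlapping share based on a hash of the file name. This is a different technique from what the parser does today, offered only as a sketch: `owner_of`, `my_share` and the worker numbering are assumptions, and it requires everyone to agree on the total number of workers.

```python
import zlib
from typing import Iterable, List


def owner_of(filename: str, workers: int) -> int:
    """Map a file name to a worker number in range(workers), the same on every machine."""
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    return zlib.crc32(filename.encode("utf-8")) % workers


def my_share(filenames: Iterable[str], worker_id: int, workers: int) -> List[str]:
    """Return the files that belong to worker_id, in sorted order."""
    return [n for n in sorted(filenames) if owner_of(n, workers) == worker_id]
```

With this scheme nobody processes the same file twice, but a volunteer who drops out leaves their share untouched, so the database check would still be needed to spot gaps.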

 
