Project Spelunker


_NOPE_

I get an error when unpacking these four files. I had a fifth one that gave errors, but it unpacked fine when I did it individually. These continued with the error. Something about data beyond the package?

 

boards.cityofheroes.com-threads-range-12708-20120905-012201.warc.gz

boards.cityofheroes.com-threads-range-13957-20120905-104405.warc.gz

boards.cityofheroes.com-threads-range-17737-20120905-012913.warc.gz

boards.cityofheroes.com-threads-range-24525-20120905-105800.warc.gz

 

I've isolated them into an error folder for now, and I think the rest can be deleted now that I've decompressed them in lieu of running your script.

OG Server: Pinnacle  <||>  Current Primary Server: Torchbearer  ||  Also found on the others if desired


Installing CoX:  Windows  ||  MacOS  ||  MacOS for M1  <||>  Migrating Data from an Older Installation


Clubs: Mid's Hero Designer  ||  PC Builders  ||  HC Wiki  ||  Jerk Hackers


Old Forums  <||>  Titan Network  <||>  Heroica! (by @Shenanigunner)

 


IDK why, but I was expecting a UI of sorts instead of just a cmd window. So I just pointed it at the folder where all the files were and left it be. How long will That take? :p


IDK why, but I was expecting a UI of sorts instead of just a cmd window. So I just pointed it at the folder where all the files were and left it be. How long will That take? :p

 

UIs can take a bit of time and effort to code. Knight is writing the program for the project.

 

Back when I used to write code - I am a lot out of date - I wouldn't bother with a front end for something like this either.

 

I am at 99% on the new torrent. Should flip from peer to seeder shortly.

 

The data folder size has dropped a bit -- good news, right?

** Asus TUF x670E Gaming, Ryzen 7950x, AIO Corsair H150i Elite, TridentZ 192GB DDR5 6400, Sapphire 7900XTX, 48" 4K Samsung 3d & 56" 4k UHD, NVME Sabrent Rocket 2TB, MP600 Pro 8tb, MP700 2 TB. HDD Seagate 12TB **


** Corsair Voyager a1600 **


I am on the road, so not really able to contribute further until I get home sometime tomorrow.

 

I haven't been processing, because I've been letting my computer sit so it can use all of its resources for:

 

  • Downloading a fresh copy of the archive to work with
  • Unzipping that fresh copy the two times necessary to get to the WARCs
  • Rezipping those WARCs to one flat file at the highest possible compression
  • Creating the torrent file from that 7z file
  • Uploading that torrent file to my site
  • Uploading the actual 7z file to my site
  • Starting an upload of all the raw WARCs, uncompressed, to my site, in case people want to download individual ones to parse (I'm not going to zip each one, one at a time!)
  • Bugfixing the parser

 

So, that's what I've been up to, oh, and family stuff too... and marketeering for the first time in my spare time.

 

After these WARCs are all uploaded, then I'll start processing again.

I'm out.

So the original direct DL finished sometime before work today. I then had WinZip unzip it all. I got home to find that I only needed to unzip a 2nd time? I'll reread the instructions, but I thought you said there was like 7 layers of unzipping. Unless that was just the Torrent to keep it thin?

 

Also, getting errors as 7Zip unpacks the current group (2nd layer). I've an hour left to unpack this group and then I'll check things again.

 

Yeah, my instructions were a bit of hyperbole. I just don't know why people bother to archive already archived files... it does nothing for them! The errors are normal, you can disregard them.
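
For anyone who would rather double-check one of the flagged files than take that on faith, here is a minimal Python sketch. It assumes the warning comes from the .warc.gz being several gzip members written back to back (a common layout for WARC files), which some unpackers report as extra data after the first member; that cause is a guess, not something confirmed in this thread. Python's gzip module reads through every member, so if the function runs to the end without raising, the file decompressed cleanly. The function name is hypothetical.

```python
import gzip


def count_decompressed_bytes(path, chunk_size=1 << 20):
    """Read a .gz file to the end and return the total decompressed size.

    Raises an exception (e.g. OSError or EOFError) if the file is corrupt
    or truncated; returns normally for a clean multi-member gzip file.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Pointing it at one of the four files above and getting a byte count back (rather than an exception) would suggest the data itself is fine.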

I'm out.

IDK why, but I was expecting a UI of sorts instead of just a cmd window. So I just pointed it at the folder where all the files were and left it be. How long will That take? :p

The Parser is just that, a parser. There's no UI because there's nothing for you to do in the program: just let it run and do its thing, it works automatically. If you want to stop it at any time, just close the window, and then you can restart it again at any time.

 

The VIEWER is probably what you want, if you want to actually browse what people have parsed so far, but it's kind of crap right now, until I have time to make the link between the UserIDs and the ThreadIDs, that's the major missing piece right now, and I'll get around to parsing that when I have more time... it's the only thing I never have enough of... time.

I'm out.

The VIEWER is probably what you want, if you want to actually browse what people have parsed so far, but it's kind of crap right now, until I have time to make the link between the UserIDs and the ThreadIDs, that's the major missing piece right now, and I'll get around to parsing that when I have more time... it's the only thing I never have enough of... time.

 

Yeah, I'm really not sure what I was even looking for. I'm still not sure what the parser is doing per se, but it seems the 700+ GB has gone down a bit, so I'm guessing it's stripping stuff away, reporting whatever you're looking for, and then deleting the processed file. I tried running the app twice in the same folder, but that gave an error, which likely comes down to it not wanting to run a 2nd instance. So instead I went and changed the app's priority to Realtime. That added about 20% to the memory usage, a tad bit of CPU time, and sped up the scrolling a bit.

 

Any general guess as to how long this will take to process now that there's like 6 of us and somebody's already munched a ton? Overall it looks like we've knocked out like <3k files of <19k.


At this rate, maybe a month? If more people join in, sooner?

 

Yes, as I said earlier in the thread, every time the program is done with a file, it deletes it, then queries the database to check if any MORE files have also been processed in that timeframe, and deletes them too.
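
As a rough illustration of that cleanup step, here is a minimal Python sketch. It is not the Parser's actual code: `fetch_processed_names` is a hypothetical stand-in for the database query, and both function names are made up for the example.

```python
from pathlib import Path


def delete_processed(folder, processed_names):
    """Delete every file in `folder` whose name is in `processed_names`.

    Returns the sorted names of the files that were actually removed.
    """
    removed = []
    for path in Path(folder).iterdir():
        if path.is_file() and path.name in processed_names:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


def finish_file(path, fetch_processed_names):
    """Delete the file just parsed, then remove files others finished meanwhile.

    `fetch_processed_names` is a hypothetical callable that returns the names
    the shared database already lists as processed.
    """
    path = Path(path)
    path.unlink()
    return delete_processed(path.parent, set(fetch_processed_names()))
```

This would also explain why the data folder shrinks faster than one machine could parse: each pass removes files that other contributors have already finished.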

I'm out.

Also, you can run the app multiple times at the same time, but you'll want to wait until the first time it deletes all known already processed files and starts processing its own first file. Then it's safe to start a new instance.

I'm out.

I have not been able to get more than 20 copies running. It varies, but somewhere around the 18-20 mark I hit a wall of "can't connect to the server 3 times" errors.

 

Don't know if my router, Charter, or something else is limiting the number of connections.

 

Otherwise I would get 100-200+ copies going before bed. It's kind of a pain to type my name and choose the file path each time, though, when launching that many copies. Could it read that info from a text file? And could we set a move-to folder, so that instead of stopping it just keeps going after posting the alert?

 

I get that any changes take away from your free time, and we are all very appreciative of the work you are doing... Just some thoughts if you have time.


Another random thought --- if the error messages were written to a text file (date/time - message), always appending to the same file, it would make posting them quicker and easier. Though at some point I should pick up Snagit or some such for home - I use it enough at work.


I have no idea why I didn't think of these ideas. Thanks Zep! Okay, well I shouldn't save the directory, because for all I know users might want to run it multiple times in multiple different directories. But I can save the username. So I am now. I'm also making an "Errors" subdirectory underneath the chosen directory once the first error is generated, then making an Errors.log file under there, and if the error is related to a file, it'll move that file automatically to the errors subdirectory.
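
For readers curious what that could look like, here is a minimal Python sketch of the three pieces described: a saved username, an appended Errors.log, and moving the offending file into the Errors subdirectory. The "Errors" folder and "Errors.log" names come from the post; the function names, the settings file, and the "date/time - message" line format are assumptions for illustration, not the Parser's real code.

```python
import shutil
from datetime import datetime
from pathlib import Path


def load_username(settings_file):
    """Return the saved username, or None if nothing has been saved yet."""
    path = Path(settings_file)
    if path.exists():
        name = path.read_text(encoding="utf-8").strip()
        return name or None
    return None


def save_username(settings_file, name):
    """Store the username so it does not have to be typed on every launch."""
    Path(settings_file).write_text(name.strip(), encoding="utf-8")


def record_error(work_dir, message, bad_file=None):
    """Append a timestamped line to Errors/Errors.log under `work_dir`.

    If `bad_file` is given and exists, it is moved into the Errors folder.
    The folder is only created when the first error is recorded.
    """
    errors_dir = Path(work_dir) / "Errors"
    errors_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(errors_dir / "Errors.log", "a", encoding="utf-8") as log:
        log.write(f"{stamp} - {message}\n")
    if bad_file is not None:
        bad_file = Path(bad_file)
        if bad_file.exists():
            shutil.move(str(bad_file), str(errors_dir / bad_file.name))
    return errors_dir
```

Building the destination with `errors_dir / bad_file.name` lets the path library supply the separator, which avoids hand-concatenating folder and file names.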

 

trsfj3d.png

 

21AeMnu.png

 

Also, what I've found with those timeout errors is that my server has a limited number of total allowable connections, so that kind of limits the speed of this project. So when you get those timeout errors (which will now just be in the console instead of a popup), you'll just need to move those files back into the main directory when you're done with all the other files and try again.
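
If moving those files back by hand gets tedious, something like this minimal Python sketch could do it. It assumes the files sit in an "Errors" subdirectory of the working folder and end in .warc.gz; the function name is hypothetical, and this is not part of the Parser.

```python
import shutil
from pathlib import Path


def requeue_error_files(work_dir, pattern="*.warc.gz"):
    """Move matching files from work_dir/Errors back into work_dir.

    Files whose name already exists in work_dir are left where they are.
    Returns the names of the files that were moved.
    """
    work_dir = Path(work_dir)
    errors_dir = work_dir / "Errors"
    moved = []
    if not errors_dir.is_dir():
        return moved
    for path in sorted(errors_dir.glob(pattern)):
        target = work_dir / path.name
        if target.exists():
            continue  # never overwrite a file already in the main folder
        shutil.move(str(path), str(target))
        moved.append(path.name)
    return moved
```

The Errors.log file is left untouched because it does not match the pattern.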

 

VY4gVDJ.png

 

New version of the Parser is now available to download. Use the link in the OP.

 

I'm out.

So......

 

How's the units-worked count looking? :)


ok there's a lot of posts already and I've not read everything in detail, so apologies if I'm repeating a question.

 

what is the status of the Euro forums in this project? For many years the game suffered from a dev-inflicted trans-atlantic divide and we were very much the poor relations for a very long time. I hope if you're going to bring back the US forums, you'd also consider bringing the EU forums too, though I'd understand if it took a bit longer.

 

 

 

 

A friend asked me if I knew any Motown puns. I told him I know two or three, maybe four, tops!

 

You can see for yourself if you run the Viewer program and click the Contributors window:

 

4GHaPuV.png

 

los3Hmo.gif

 

13 steps was too much for me. I'll try again.


ok there's a lot of posts already and I've not read everything in detail, so apologies if I'm repeating a question.

 

what is the status of the Euro forums in this project? For many years the game suffered from a dev-inflicted trans-atlantic divide and we were very much the poor relations for a very long time. I hope if you're going to bring back the US forums, you'd also consider bringing the EU forums too, though I'd understand if it took a bit longer.

 

Part of the reason we are able to bring back the US forums is someone knew where we could download a final copy of it.

 

Which raises the question: does anyone know where we can find the EU forums for download?


ok there's a lot of posts already and I've not read everything in detail, so apologies if I'm repeating a question.

 

what is the status of the Euro forums in this project? For many years the game suffered from a dev-inflicted trans-atlantic divide and we were very much the poor relations for a very long time. I hope if you're going to bring back the US forums, you'd also consider bringing the EU forums too, though I'd understand if it took a bit longer.

 

This is only moving as fast as it is because someone at the Internet Archive had the foresight to preserve the forums in September of 2012 before they disappeared. I still have a spider going for the Wayback Machine for s***s and giggles, but that still isn't even CLOSE to done.

 

I don't know if anybody did that preservation for the EU forums. If you can find me an archive, I can parse it. Do you know the old URL? I can check the Wayback Machine, it might be easier with the EU because I'm assuming there'd be less data.

I'm out.

I've just pushed out a critical update for the Parser, please redownload it as soon as you read this. Here are the details:

  • Added new UserInThread table to my schema, and created "UserInThreadExists" and "AddUserInThread" methods, so that we can make an index table linking UserIDs to the ThreadIDs of the threads that they have posted in.  This will let the magic happen that will allow you to search for all threads that a person has posted in. Later on, I'll make a mini-app that queries all of the existing data, and corrects the lack of these records for all previously parsed data.
  • BugFix - The parser, when coming across an error, was renaming the file to "Error + FileName" instead of moving it to the error directory... because I forgot to add a backslash into my concatenation - d'oh! If you have any files in your folder that start with the word "Error", you'll need to rename them to take the "Error" portion off, sorry!
  • BugFix - .WARCSKIP files should no longer be created. I thought I'd already taken that part of the code out, but it looks like I just took out the part that checks for them.
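
To make the UserInThread idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name and the two method names mirror the ones above; the column types, the composite primary key, the `threads_for_user` helper, and the choice of SQLite are all assumptions, since the real schema and database are not shown in this thread.

```python
import sqlite3


def create_schema(conn):
    """Create the index table linking users to threads they posted in."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS UserInThread (
               UserID   INTEGER NOT NULL,
               ThreadID INTEGER NOT NULL,
               PRIMARY KEY (UserID, ThreadID)
           )"""
    )


def user_in_thread_exists(conn, user_id, thread_id):
    """Return True if this user/thread pair is already recorded."""
    row = conn.execute(
        "SELECT 1 FROM UserInThread WHERE UserID = ? AND ThreadID = ?",
        (user_id, thread_id),
    ).fetchone()
    return row is not None


def add_user_in_thread(conn, user_id, thread_id):
    """Record the pair if it is new; return True when a row was inserted."""
    if user_in_thread_exists(conn, user_id, thread_id):
        return False
    conn.execute(
        "INSERT INTO UserInThread (UserID, ThreadID) VALUES (?, ?)",
        (user_id, thread_id),
    )
    return True


def threads_for_user(conn, user_id):
    """Return every ThreadID the user has posted in, in ascending order."""
    rows = conn.execute(
        "SELECT ThreadID FROM UserInThread WHERE UserID = ? ORDER BY ThreadID",
        (user_id,),
    )
    return [row[0] for row in rows]
```

With one row per (user, thread) pair, "all threads this person posted in" becomes a single indexed lookup instead of a scan over every post.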

I'm out.

Got home to find the parser had been updated, which caused a version failure error, so both threads were shut down. I cloned the archive to a second drive and I'm now running one thread per drive since they're mechanical.


That's awesome, thank you for your help. And yes, the version check is intentional. I wanted to make sure people weren't parsing with a bad parser, so on every loop, I'm having the parser check the database to ensure that you have the latest "good" version of the Parser. Let me know how it goes!
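
A per-loop version check along those lines might look like this minimal Python sketch. It assumes dotted numeric version strings and treats "at least the latest good version" as passing; `fetch_latest_good_version` and `process` are hypothetical callables standing in for the database query and the parsing step, and the real Parser may compare versions differently.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_current(local_version, fetch_latest_good_version):
    """Return True if the local version is at least the latest good one."""
    latest = parse_version(fetch_latest_good_version())
    return parse_version(local_version) >= latest


def run_loop(files, local_version, fetch_latest_good_version, process):
    """Process files one by one, re-checking the version before each file.

    Stops as soon as the check fails and returns the files that were done.
    """
    done = []
    for name in files:
        if not is_current(local_version, fetch_latest_good_version):
            break
        process(name)
        done.append(name)
    return done
```

Checking before every file, rather than once at startup, is what makes an already-running copy stop shortly after a new release goes out.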

I'm out.

Updated to the new version and running again. Also, now seeding out the torrent.

 

Sorry to say I'll be AFK from the project from Thu for a week, going away on holiday, but I'm sure there'll be files left for me when I'm back!

Excelsior Global Channel - for your server wide chat and forming TFs, Trials, Radios, Farms, whatever you want to do - /chan_join Excelsior today!
