
Project Spelunker


_NOPE_


Just now, The Philotic Knight said:

You.... didn't want to know how the process ended? 

It's not that, and I commend you for the work you're doing - but that wall of text could be easily condensed for the readers of this forum so as not to take up at least a solid 20 seconds of scrolling. 


4 minutes ago, Glacier Peak said:

It's not that, and I commend you for the work you're doing - but that wall of text could be easily condensed for the readers of this forum so as not to take up at least a solid 20 seconds of scrolling. 

Ah, sorry, I hadn't thought about that. I almost always browse on a desktop PC, where you can easily click and drag the scroll bar, so I hadn't considered mobile users. I've moved it into a file attachment.

I'm out.

I just pushed out a software update to the processor, please get the latest version for maximum efficiency. Here's the update notes:

 

[Screenshot: the updated processor window]

 

  1. Version number label added
  2. Added the number of files that have been uploaded and are waiting to be processed. I don't know why I forgot about this number, but it's there now.
  3. Added a "Verbose Logging Enabled" checkbox, which is disabled by default. Previously, "verbose logging" was always enabled, and I noticed that refreshing the screen for EVERY single file action was making the program take FOREVER. By disabling this, the program processes each file at LEAST four times faster. But, you can still enable it and process slower, if you want to have a log of every single thing that the program is doing. Just know that it will slow things down considerably!
  4. You can see here that I'm no longer showing individual files being added to the zip file and being deleted from the file system. That's been pushed to the "verbose logging" option.
  5. I've moved the "popup progress bars" 5 pixels to the right of and below the main window, and did a little "trick": I force the progress window to be "always on top" for a moment, then immediately turn "always on top" off, so that it appears "in front of" the main window. Previously, it often ended up hidden behind the form.
  6. The "files processing" number now includes the file the current program is processing. Previously, it only counted OTHER people who were processing files; now it counts everyone, including the currently running program.
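Item 3's fix, emitting per-file log lines only when they are asked for, maps onto an ordinary log-level check. The processor appears to be a .NET Windows Forms app (going by the MainForm.cs / BtnProcess_Click frames in the stack traces quoted later in this thread), so this Python sketch is only an analogue; the logger name and both functions are hypothetical.

```python
import logging

def make_logger(verbose: bool) -> logging.Logger:
    """Return the processor's logger; per-file DEBUG lines only pass when verbose is on."""
    logger = logging.getLogger("warc_processor")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger

def zip_extracted(files: list[str], logger: logging.Logger) -> None:
    for name in files:
        # Per-file detail: with verbose off, the level check discards this
        # before any formatting or handler (e.g. a UI log box) is touched.
        logger.debug("Added %s to zip", name)
    # One summary line per batch is always emitted.
    logger.info("Zipped %d file(s)", len(files))
```

The point of the design is that the expensive part (refreshing a display for every message) never runs for the suppressed lines, which is consistent with the speed-up described in item 3.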

Please let me know if you have any comments, questions, or concerns about these changes, or the program in general.


@The Philotic Knight I would love to help but we are currently experiencing issues with our service provider which will likely result in one or more router resets.

 


Dislike certain sounds? Silence/Modify specific sounds. Looking for modified whole powerset sfx?

Check out Michiyo's modder or Solerverse's thread.  Got a punny character? You should share it.


Question(s) for you @The Philotic Knight - 

 

I've had a few errors resulting in files 'stranded' in the temp/COH_Forums_Output directory.  I have two 'batches' of files saved into folders from this occurring.  Do you need them?  Or will things on your side notice those files did not complete and put that file back in line to be reprocessed automatically?

 

One of these is from me accidentally killing my own internet connection during processing (don't ask, I had not had my coffee yet). So that one I caused.

 

The other one was from your program, after running fine for a bit, deciding to stop running and delete itself (leaving everything in the program directory but the .exe). That happened twice, both times with the new version 1.1 (which, by the way, is lightning fast compared to the previous version - thanks!). At first I thought maybe some anti-virus program had shut it down and quarantined it; however, that is not the case. My second guess is that since I was running v1.0 and then switched to v1.1 without a restart to clear everything, something there caused a hiccup (something stranded or lurking in memory or elsewhere). That all occurred on my PC earlier. I have since switched to my laptop and had no problems yet (except one strange error about file size, which I was able to remedy just by closing the program and starting it up again).

 

So far it seems to be running better on my laptop (it's an Alienware beast) so no worries.  But I didn't want to just leave those files stranded without asking you if you still need them.  I have them in two folders approximately 45 and 68 MB.

 

Lastly, holy granola, some of these files are HUGE! I'm currently processing one that is 1.25 GB and appears to have well over 10,000 files in it. Yikes! How big are those forum posts? lol

Want to see my current list of characters?  Want to know more about me than you ever wanted to know?

Wish Granted!   Check out the 'About Me' in my profile:   KauaiJim - Homecoming (homecomingservers.com)

 


@KauaiJim you can just delete those files. This is how the process works:

 

  1. I have an "input" folder that contains all of the WARCS that need to be processed. The program pulls from there.
  2. When someone successfully downloads a WARC, the server copy is moved to a "processing" folder.
  3. When someone successfully fully processes and uploads back a zipped up version of the extracted files, only THEN does my program move the "processing" file over to the final "processed" folder.

So, any jobs that get "stuck" due to one issue or another? The WARC file will just end up "stuck" in the "processing" folder and won't be picked up by anyone else. I will manually move those files back to the input folder when all of the other files are done processing, but I want to keep them there for now, just in case there are errors within the WARC files that I need to investigate. Make sense?
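The three-folder hand-off described above works like a simple work queue: moving a file out of "input" is the "claim", so nobody else can pick it up, and only a finished upload moves it on to "processed". A minimal sketch, assuming the folders are local directories (the real program does this over FTP) and using hypothetical function names:

```python
import os
from pathlib import Path

def claim(root: Path, name: str) -> bool:
    """Move input/<name> to processing/<name>. Returns False if someone else got it first."""
    try:
        # A rename on one filesystem is atomic, so only one worker can win;
        # the loser finds the source already gone.
        os.replace(root / "input" / name, root / "processing" / name)
    except FileNotFoundError:
        return False
    return True

def complete(root: Path, name: str) -> None:
    """Call only after the zipped output has been uploaded: processing/ -> processed/."""
    os.replace(root / "processing" / name, root / "processed" / name)

def requeue_stuck(root: Path) -> list[str]:
    """Manual clean-up: send anything still sitting in processing/ back to input/."""
    moved = []
    for p in sorted((root / "processing").iterdir()):
        os.replace(p, root / "input" / p.name)
        moved.append(p.name)
    return moved
```

A crashed or disconnected client simply never calls `complete`, which is why its file stays parked in "processing" until someone requeues it by hand.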

 

And yes, some of those files are MASSIVE. But you should note that this is from the Internet Archive "leeching" the forums for ALL data. Not just the posts, but also all images, etc. Most of the large files that I found were actually animated gifs that people were using for their avatars... I think @Acemace holds the record so far.


Thank you!  That's what I was hoping.  Especially since I did not make note of the file numbers.  So far so good.  Working on that monster now - looks to probably have over 15,000 files!  (still extracting).

 

Yeah, those darn folks with their animated gifs!  [looks quietly at his pretty twisting cube profile pic, lol]


@The Philotic Knight I have been getting a lot of these stack trace errors just after download on both my PC and laptop.  Files still process fine IF they can get past this.

 

There was an error in the application. Please copy and paste the following text in a message to the program author:
System.Exception: Error while processing file 'boards.cityofheroes.com-threads-range-13818-20120905-094804.warc' ---> System.Exception: Failed to move file to process from '/input/boards.cityofheroes.com-threads-range-13818-20120905-094804.warc' to '/processing/boards.cityofheroes.com-threads-range-13818-20120905-094804.warc'.
   at COH_WARC_Processor.MainForm.BtnProcess_Click(Object sender, EventArgs e) in C:\Projects\COH WARC Handler\WHB WARC Processor\MainForm.cs:line 84
   --- End of inner exception stack trace ---

 

I'm wondering if maybe your program isn't allowing for enough time or retries before kicking back this error?  If things are getting loaded down on your side, which makes sense with everyone else pinging you AND you moving files into the input directory, then maybe the application on my side is getting ahead of it?

 

Any ideas?

I can just keep restarting each one as it errors (approx. every third or fourth download), but it would be nice to be able to leave these unattended for a few minutes at a time. Thanks!


*twitches*

A little more to the right next time you decide to poke, @The Philotic Knight.


OG Server: Pinnacle  <||>  Current Primary Server: Torchbearer  ||  Also found on the others if desired  <||> Generally Inactive


Installing CoX:  Windows  ||  MacOS  ||  MacOS for M1  <||>  Migrating Data from an Older Installation


Clubs: Mid's Hero Designer  ||  PC Builders  ||  HC Wiki  ||  Jerk Hackers


Old Forums  <||>  Titan Network  <||>  Heroica! (by @Shenanigunner)

 


55 minutes ago, KauaiJim said:

@The Philotic Knight I have been getting a lot of these stack trace errors just after download on both my PC and laptop.  Files still process fine IF they can get past this.

 


I'll try to change that to start the next loop instead of throwing an error. It'll have to be tomorrow though, I have to put my kids to bed.
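A sketch of that change, folding in the retry idea from the report above: try the claim a few times with a short back-off, and if it still fails, record the file as skipped and move on to the next one instead of stopping the whole run. `try_claim` and `handle` are hypothetical stand-ins for the program's move and process steps, and the retry count and delays are arbitrary.

```python
import time
from typing import Callable, Iterable

def process_all(names: Iterable[str],
                try_claim: Callable[[str], bool],
                handle: Callable[[str], None],
                retries: int = 3,
                delay: float = 2.0) -> list[str]:
    """Claim and handle each file; a claim that keeps failing is skipped, not fatal."""
    skipped = []
    for name in names:
        for attempt in range(retries):
            if try_claim(name):
                handle(name)
                break
            if attempt < retries - 1:
                # Back off a little longer each time before retrying this file.
                time.sleep(delay * (attempt + 1))
        else:
            # Every attempt failed (e.g. another client moved it first): go to the next file.
            skipped.append(name)
    return skipped
```

Returning the skipped names keeps them visible in a log without interrupting an unattended run.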


Possibly silly questions...

  • How many instances should we attempt?
  • What's the expected bandwidth usage?
  • How bad is it if I have to just quit the app, processing wise?


I'm on a Ryzen 3900x with 64gb now -- based on my prior Ryzen 1800x, could probably get 30-50 copies of the program running -- can you handle that level of bandwidth?

** Asus TUF x670E Gaming, Ryzen 7950x, AIO Corsair H150i Elite, TridentZ 192GB DDR5 6400, Sapphire 7900XTX, 48" 4K Samsung 3d & 56" 4k UHD, NVME Sabrent Rocket 2TB, MP600 Pro 8tb, MP700 2 TB. HDD Seagate 12TB **


** Corsair Voyager a1600 **


12 minutes ago, Zep said:

I'm on a Ryzen 3900x with 64gb now -- based on my prior Ryzen 1800x, could probably get 30-50 copies of the program running -- can you handle that level of bandwidth?

 

I've had to restart a few of them, but I'm running 10 at the moment on my 3770k. It seems to be mostly CPU-intensive, as I'm pegged right now. RAM is sitting at ~8 GB (Win7 x64, 32 GB RAM). Upload speed is <200 Kbps; download is ~20 Mbps. I'll check on it here and there, but I'm doing the trophy roundup in ME1 (LE) right now.


Okay, let's put this on hold for the moment, I just received word that the amount of data is overwhelming my shared host. 😮 So, yeah... I'm going to recode the program to point to a different server that can hold that capacity. In the meantime, let's just put a pin in this. I'm going to delete all of the "input", and "processed" files to get the host off my back, but I will download the "output" work that you all have done so far, so that that isn't lost.

 

Please stand by.


That would explain the lack of file messages. :p

 

Also, got this error periodically. I'm guessing Host issue.

 

There was an error in the application. Please copy and paste the following text in a message to the program author:
System.Exception: Error while processing file 'boards.cityofheroes.com-threads-range-18729-20120904-224603.warc' ---> System.Net.WebException: The remote server returned an error: (550) File unavailable (e.g., file not found, no access).
   at System.Net.FtpWebRequest.SyncRequestCallback(Object obj)
   at System.Net.FtpWebRequest.RequestCallback(Object obj)
   at System.Net.CommandStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()
   at System.IO.Stream.Dispose()
   at System.Net.ConnectionPool.Destroy(PooledStream pooledStream)
   at System.Net.ConnectionPool.PutConnection(PooledStream pooledStream, Object owningObject, Int32 creationTimeout, Boolean canReuse)
   at System.Net.FtpWebRequest.FinishRequestStage(RequestStage stage)
   at System.Net.FtpWebRequest.GetResponse()
   at COH_WARC_Processor.FTPclient.GetStringResponse(FtpWebRequest ftp) in C:\Projects\COH WARC Handler\WHB WARC Processor\FTP Client.cs:line 742
   at COH_WARC_Processor.FTPclient.GetFileSize(String filename) in C:\Projects\COH WARC Handler\WHB WARC Processor\FTP Client.cs:line 497
   at COH_WARC_Processor.FTPclient.Download(String sourceFilename, FileInfo targetFI, Boolean PermitOverwrite) in C:\Projects\COH WARC Handler\WHB WARC Processor\FTP Client.cs:line 320
   at COH_WARC_Processor.FTPclient.Download(String sourceFilename, String localFilename, Boolean PermitOverwrite) in C:\Projects\COH WARC Handler\WHB WARC Processor\FTP Client.cs:line 242
   at COH_WARC_Processor.MainForm.BtnProcess_Click(Object sender, EventArgs e) in C:\Projects\COH WARC Handler\WHB WARC Processor\MainForm.cs:line 76
   --- End of inner exception stack trace ---
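The 550 here comes out of `GetFileSize`, i.e. a size lookup made just before the download. One plausible cause (not confirmed in the thread) is that the file left "input" between the listing and the lookup, whether claimed by another client or removed during the host clean-up mentioned above. Either way, a client can treat that reply as "skip this file" rather than crash. Sketched with Python's `ftplib` as a stand-in for the program's .NET FTP client; the function name is hypothetical.

```python
import ftplib
from typing import Optional

def remote_size_or_none(ftp: ftplib.FTP, name: str) -> Optional[int]:
    """Return the remote file size, or None if the server reports it unavailable (550)."""
    try:
        # FTP.size() sends SIZE; a 5xx reply is raised as ftplib.error_perm.
        return ftp.size(name)
    except ftplib.error_perm as exc:
        # 550 = requested action not taken: file unavailable (not found, no access).
        if str(exc).startswith("550"):
            return None
        raise  # other permanent errors (e.g. 530 not logged in) still surface
```

Only 550 is swallowed; anything else still raises, so genuine login or permission problems aren't hidden.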

 


We are back up! New server without the space constraints, though time will tell if it'll be able to handle the strain of a large number processing at once. To connect, you must go back to the site and download version 1.2 that I just uploaded. It points to the new server, and also has a bugfix for that "can't download file" error that just makes it go to the next iteration instead of failing. Now, to ensure that we're not double-processing those files that have already processed, I've added a new check at the beginning of each loop, that checks for already processed zips and compares them to WARCs in the input directory. This adds a few seconds to every iteration, but it's necessary for now until I can get all of the files copied over into the input directory.
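The temporary check described above boils down to a set difference between what's in "input" and what has already come back as a zip. A sketch, assuming each output zip is named after its WARC with only the extension swapped (a hypothetical naming rule; the thread doesn't say):

```python
from pathlib import PurePosixPath

def still_to_process(input_warcs: list[str], processed_zips: list[str]) -> list[str]:
    """Drop any WARC whose zipped output already exists, matching on the file stem."""
    # .stem strips only the last suffix, so dotted host names stay intact.
    done = {PurePosixPath(z).stem for z in processed_zips}
    return [w for w in input_warcs if PurePosixPath(w).stem not in done]
```

Building a set of finished stems keeps the comparison itself cheap; most of the extra few seconds per iteration is likely spent listing the two directories on the server.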

 

Once that happens (I suspect sometime tomorrow morning), I'll release version 1.3 with that check taken out. But feel free to process as you wish now, it appears to be working - I tested it on my work PC and my home PC (PCs located at two separate locations).


Version 1.3 is now up on my server, if you want to get that, it has only two changes from 1.2:

 

  1. Removed the "input" check, to speed things up, since everything is uploaded and where it needs to be, and
  2. Fixed a slight bug with the calculations of the number of files that need to be uploaded, to prevent it from being a negative number.

So, um... SLIGHT bugfix. I know I posted 1.3 just two hours ago, but I was checking out the output files so far, and the actual files are all there and great! Thank you!

 

But I didn't realize that Ionic's Zip functionality by default stored the PATH information to the files as well. So.... I suggest downloading 1.4 right now and using that instead, because I fixed it. Unless you WANT me to know where your temp path folder and computer login username is:

[Screenshot: zip entries stored with the uploader's full temp-folder path, including the login username]

 

I have a little script that I just wrote that I've tested, that will go through all of the output files, and if the ZipEntry filename isn't in the root path, it'll snip off the rest of that path until it's in the root, so no problems going forward or with the existing data. The future files will all now look like this, as they are supposed to:

[Screenshot: zip entries stored at the root, as intended]
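The program uses Ionic's DotNetZip; in Python's standard `zipfile` module the equivalent knob is the `arcname` argument to `ZipFile.write`. The sketch below shows both the fix for new files and a clean-up pass in the spirit of the script described above. The function names are hypothetical, and the clean-up assumes bare file names are unique within each zip (two entries with the same name in different folders would collide).

```python
import os
import zipfile

def zip_flat(paths: list[str], zip_path: str) -> None:
    """Store each file under its bare name, so no local temp path or username leaks."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))

def flatten_existing(src_zip: str, dst_zip: str) -> None:
    """Rewrite a zip, snipping each entry name down to its last path component."""
    with zipfile.ZipFile(src_zip) as src, \
         zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue  # directory entries carry no data once paths are gone
            # Accept either separator, then keep only the final component.
            name = info.filename.replace("\\", "/").rstrip("/").split("/")[-1]
            # Note: a repeated name would be written twice (zipfile warns on duplicates).
            dst.writestr(name, src.read(info))
```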

 


In other news, I'm at about my wit's end with my current web host. Look at this crap...

 

Webhost

Quote

Hello,

During routine file system maintenance it has come to our attention that your account contains large amount of data and after an initial review, the content does not appear to be a part of your website. Shared hosting space may only be used for active email, website files and content, as outlined in the Acceptable Use Policies for a shared account as a backup/storage device.

These are the files in question:

D:\InetPub\vhosts\cityofplayers.com - 312 GB

Please respond immediately indicating your timeline for removal of these files. Failure to reply within 7 days may result in limits on your account which may interfere with site functionality or termination of your account for violation of our terms of service.

At this time we have imposed a quota on the account to prevent it from growing any further. This does not keep your website from being accessible, but the ability to create files or folders on this account will cease. You may see some errors if your website relies on using session files or errors related to caching mechanisms. Automated backup plugins may also fail while the quota is in place.

If you require assistance in deleting large files and directories, let us know what files are safe to remove. Please note that if we are asked to remove content, the change is permanent, and no data recovery options are available

 

Me

Quote
Okay, what is an acceptable total file size for the shared server to contain, until I can upgrade to the dedicated server?
 
I'll remove the appropriate files as soon as I can, but I need to know what level I need to get down to.

 

Webhost

Quote

Hello Westley,

Thank you for contacting us.


There is no set size limit on how much space can be used by an account, but you can not use your account as storage or archival means to store files, which from appearance and the behavior, you are doing with these files. 

If the files continue to increase as they have, we will be forced to remove them as they are causing stability concerns with the server.

As they are against the AUP of our shared servers they must be removed from the server soon, as within the next few hours as they are starting to affect other customers on the server.

If you have any additional questions or concerns, please do not hesitate to contact us.

 

Me

Quote
I'm sorry, but I'm confused. You say that there is no file limit, and at the same time you say that I'm reaching a file limit that doesn't exist?
 
I'm currently in the process of trying to remove as many files as I can right now that are not being currently processed. Please stand by. I will provide further updates when I've removed all files that I don't believe are absolutely necessary for the current running processes.

 

Webhost

Quote

There is no set limit to files that are directly related to your site being hosted and that do not violate our AUP.  As these files violate our AUP, and were starting to affect the server stability, they need to be removed.

As long as the files are not causing issues to the server and they are directly related to your hosting and do not violate our AUP there is no hard limit on the size.

Space on our shared servers is not unlimited, but unmetered, as in we do not charge you for the amount of space used on the server.

Thank you for freeing up space on the server.
 

 

Me

Quote

Question relating to this issue. These are WARC files that were causing the initial issue. I'm running a project to restore an archived version of an old internet forum. While I can understand not wanting just a bunch of archive and zip files laying around, my final purpose for this project is to make all of these old web pages viewable and searchable to the world. So my question to you is... would this be acceptable under the AUP? The final format of these files won't be archive files, they will be HTML pages, and images that are referenced on those HTML pages, just like any other website. So would these files being hosted on the shared server be acceptable, even if it's a LARGE number of them, because they are in fact actual website content?

 

Webhost

Quote

Hello,

Files used for your website can be on the server, but if you are among the highest disk users if the server's disk space becomes critical again, we will contact you requesting you reduce usage wherever you can.

If the WARC files need to stay on the server, please show us how they are used to display content. They are archive files, so it is unclear why they need to stay on the server versus being uploaded, processed, and then removed.

 

Me

Quote

I wasn't talking about the WARC files, they won't be coming back to the server. My project is to extract from those WARC files all of their contents... the HTML, gif, jpg, Javascript files, etc. So they will not be archive files that I push back up to the server, they will be actual web content.

 
So, from what you are saying, on the one hand it sounds like those would be acceptable because they'll be web content and not archive files. On the other hand, there are MANY gigabytes of them to upload, so it may overtax the server.
 
So I suppose I must press again: what is an "acceptable" number of megabytes or gigabytes of disk space usage for a single account, and what is an unacceptable level?
 
Because I need to know what level to keep my server down to, so that this doesn't happen again. I can't accept "there is no level, just don't do a lot" as an answer. I need an actual number to go by to know when exactly to stop.

 

It's like talking to a wall... ARRRRRGGGGHHHH!!!!



I'll just keep grinding out files until you let me know there's a time out (to find another host, lol).  😁

 

Also, not worried about the path (and user name).  But thanks for letting us know!  👍 


10 hours ago, The Philotic Knight said:

But I didn't realize that Ionic's Zip functionality by default stored the PATH information to the files as well. So.... I suggest downloading 1.4 right now and using that instead, because I fixed it. Unless you WANT me to know where your temp path folder and computer login username is:

 

At this point, between the two of us that's not a big deal, but it's smart of you to both identify and correct this type of privacy-related issue (while also pointing it out). I have some window shopping to do on AMZ for a bit, so I won't melt your server until later tonight. :p


8 Instance Server Melting Power Activate!

 

Because I'm running an 8-thread CPU.

