Gobbledygook Posted July 7, 2019
Maybe, but it looks like it's something that almost nobody wants, if you look at the lack of Patreon response. Homecoming wants to remain hands off, and.... I also did post on the Titan Network forums about two days ago, and got something like 43 views, no response. Not very encouraging. https://www.cohtitan.com/forum/index.php?topic=13560.0
It looks like the only ones that want this are those that have posted here and already contributed processing time. It's a bit disheartening. I plan on doing the Patreon thing, but it will have to wait a couple of days.
Zep Posted July 8, 2019
Wanting access and having money are two separate things. I can contribute a little... just waiting to see how the wind is blowing, so to speak. I am much better at a one-time fee than an ongoing one. What would be the possibility of importing all the forum data into a blank site? Or, once processed, could we make it torrent downloadable? Of course online is best; I just want to think through every option.
** Asus TUF x670E Gaming, Ryzen 7950x, AIO Corsair H150i Elite, TridentZ 192GB DDR5 6400, Sapphire 7900XTX, 48" 4K Samsung 3d & 56" 4k UHD, NVME Sabrent Rocket 2TB, MP600 Pro 8tb, MP700 2 TB. HDD Seagate 12TB ** ** Corsair Voyager a1600 **
_NOPE_ (Author) Posted July 8, 2019
Hmmm... you gave me an idea... I've got to spend some time thinking about it, but... it's an idea. I'll let you know when I've had more time to process.
I'm out.
WanderingAries Posted July 8, 2019
In the meantime, couldn't we just keep working on the crunching? Albeit at a slower pace than you desired?
OG Server: Pinnacle <||> Current Primary Server: Torchbearer || Also found on the others if desired <||> Generally Inactive Installing CoX: Windows || MacOS || MacOS for M1 <||> Migrating Data from an Older Installation Clubs: Mid's Hero Designer || PC Builders || HC Wiki || Jerk Hackers Old Forums <||> Titan Network <||> Heroica! (by @Shenanigunner)
_NOPE_ (Author) Posted July 9, 2019
No, we really can't. The database size has gotten so massive for my host now that every attempt to add a new record fails due to timeout. We're stuck without a new host. But I do have another idea, though it'll probably take me a few weeks to put together.
_NOPE_ (Author) Posted July 24, 2019
So, a little tiny update here... remember my spider that I had running? Well... I never stopped it. It just finished. Here's a preview:
My plan now is to upload these to my website, then get Google to index them, and perhaps add a header page to start from, with a Google search bar at the top and the results shown in the bottom pane. It might be easier to manage than a fully indexed database, since it shunts that work and data storage off to Google while I just host the files. (By the way, if they turn out to be good files, I'll zip them up and provide them to anyone, so that anyone can host a "mirror" of the old CoH forums.)
This is what I meant when I said I was looking at alternate paths, since it looks like the Patreon is going to be a bust, and neither Homecoming nor the Titan Network seems interested in hosting.
Oubliette_Red Posted July 24, 2019
That's great news, PK. I was willing to support the Patreon to the max for a few months if it looked like we'd reach our goals. Sadly, that wasn't the case, and since I'm currently looking for work I had to cancel.
Dislike certain sounds? Silence/Modify specific sounds. Looking for modified whole powerset sfx? Check out Michiyo's modder or Solerverse's thread. Got a punny character? You should share it.
Zep Posted July 25, 2019
Good job!
_NOPE_ (Author) Posted July 26, 2019
FYI, I'm still working on a new parser in case the spider's outputs end up being crap that Google can't index. Here's a preview of what I have so far. It seems to be working, but I haven't opened up the files to test out the internal links. I'm trying to rewrite the internal links to point to files within the same directory so that, when all is said and done, these files could sit on someone's hard drive, and they could click around on a locally stored copy of the forums instead of one hosted on a website if they wanted:
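The link rewriting described in this post could look roughly like the sketch below. It is only an illustration under stated assumptions, not the parser from the thread: `local_name` is a hypothetical sanitizer, the sample URL is made up, the host name is the one given later in the thread, and a regular expression stands in for a real HTML parser, which would be more robust against unusual markup.

```python
import re
from urllib.parse import urlsplit

# Host of the old forums, as named later in the thread.
FORUM_HOST = "boards.cityofheroes.com"

# Matches double-quoted href attributes only (a simplification).
HREF = re.compile(r'href="([^"]+)"')


def local_name(url: str) -> str:
    """Hypothetical sanitizer: turn a URL into a flat, filesystem-safe name."""
    parts = urlsplit(url)
    raw = parts.path.strip("/")
    if parts.query:
        raw += "_" + parts.query
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw)
    return safe or "index.html"


def rewrite_links(html: str) -> str:
    """Point links to the forum host at files in the same directory."""

    def swap(match):
        url = match.group(1)
        if urlsplit(url).netloc.lower() == FORUM_HOST:
            return 'href="%s"' % local_name(url)
        return match.group(0)  # leave external links untouched

    return HREF.sub(swap, html)


# Made-up sample page with one internal and one external link.
page = (
    '<a href="http://boards.cityofheroes.com/showthread.php?t=123">thread</a> '
    '<a href="http://example.com/a">elsewhere</a>'
)
rewritten = rewrite_links(page)
```

Because every rewritten link is a bare file name, the pages only work if they all sit in one directory, which matches the goal stated in the post. The sketch does not add file extensions, so a real version would also have to make sure the names open as HTML in a browser.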
Zep Posted July 26, 2019
2 hours ago, The Philotic Knight said: FYI, I'm still working on a new parser in case the spider's outputs end up being crap that Google can't index, here's a preview of what I have so far, it seems to be working, but I haven't opened up the files to test out the internal links. I'm trying to rewrite the internal links to point to files within the same directory so that in the end, when all is said and done, these files could sit on someone's hard drive, and they could be clicking around on a locally stored copy of the forums instead of one hosted on a website if they wanted:
As compressible as the data is, that could work (at least for me) too. Use box and/or a torrent to share it out with indexing.
_NOPE_ (Author) Posted July 31, 2019
Update: http://www.cityofplayers.com/2019/07/31/patreon-failure-and-cancellation/
_NOPE_ (Author) Posted August 2, 2019
New update: I'm now working on a generic "WARC Handler" library/DLL to use in a new version of the Project Spelunker Parser. What I was doing was WAY too custom and confusing to work with, to be honest. Why am I doing this? A few reasons:
Apparently nobody has EVER written a .NET WARC DLL... I have no idea why; I can find libraries in Java and Python, but nothing for the .NET platform, and I think one should exist. So I'll make that part and release it to the world, if anyone wants to use it.
It will simplify my future code for Project Spelunker. I know it's going to be a one-time process, but I can imagine its application in future projects.
It's a learning exercise for me to practice implementation by taking an ISO standard, reading it, attempting to comprehend it, and creating an implementation of it in the .NET environment.
It's a fun challenge!
After I have this written, it should be MUCH easier to work with my new "WarcRecord" object to process the files and spit them out onto the users' drives, so that we can continue down the path of getting this project done via web indexing rather than the database method that I was attempting before. This will (hopefully) mean that I won't have to shell out $120 a month out of pocket for a beefier server. Just let Google do the work.
Now, why am I doing this? Because frankly, I don't trust my spider to have done a good enough job. I trust the Internet Archive's spider more than I trust my own.
So, long story short (too late!), the project is still being worked on, albeit slowly.
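For readers unfamiliar with the format, here is a minimal sketch of what reading WARC records involves, written in Python rather than .NET. It is not the library from the post: the `WarcRecord` class below only borrows the name mentioned there, its fields are assumptions, and the sample bytes are made up. It relies on the basic record layout of a version line, named header fields, a blank line, and a content block whose size is given by `Content-Length`.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, Iterator


@dataclass
class WarcRecord:
    """Assumed shape for a parsed record; the real class may differ."""
    version: str
    headers: Dict[str, str]
    content: bytes


def read_records(stream: BinaryIO) -> Iterator[WarcRecord]:
    """Yield records from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        version = line.strip().decode("utf-8")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The content block is exactly Content-Length bytes long.
        content = stream.read(int(headers["Content-Length"]))
        yield WarcRecord(version, headers, content)


# Made-up two-record sample for illustration.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://boards.cityofheroes.com/index.php\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
    b"WARC/1.0\r\n"
    b"WARC-Type: metadata\r\n"
    b"Content-Length: 2\r\n"
    b"\r\n"
    b"hi"
    b"\r\n\r\n"
)
records = list(read_records(io.BytesIO(SAMPLE)))
```

The sketch skips several things a real handler would need: compressed input, folded header lines, case-insensitive header names, and records that lack `Content-Length`.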
WanderingAries Posted August 3, 2019
Possibly dumb question, so you're still talking about a local app that talks to a server OR a local app that later sends you results data?
_NOPE_ (Author) Posted August 3, 2019
The way I'm working it now, the program will "extract" the original directory structure and files from the WARC files, and recreate them on your hard drive. And then, instead of sending that as data to a database, it'll send the files to my server via FTP.
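A rough sketch of the "recreate the directory structure" step follows. It assumes each extracted record comes with the URL it was captured from and a byte payload; the function names, the sample URL, and the choice to fold the query string into the file name are all illustrative assumptions, not details from the thread.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def target_path(root: Path, uri: str) -> Path:
    """Map a captured URL onto a path under root (hypothetical scheme)."""
    parts = urlsplit(uri)
    # Drop empty and dot segments so the result cannot escape root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")  # assumed name for directory URLs
    if parts.query:
        # Keep the query so pages that differ only by query do not collide.
        segments[-1] += "_" + re.sub(r"[^A-Za-z0-9._-]+", "_", parts.query)
    return root.joinpath(parts.netloc, *segments)


def write_record(root: Path, uri: str, payload: bytes) -> Path:
    """Write one payload to disk, creating parent folders as needed."""
    path = target_path(root, uri)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


# Demo with a made-up URL, written into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_record(
        Path(tmp),
        "http://boards.cityofheroes.com/forums/showthread.php?t=123",
        b"<html></html>",
    )
    relative = saved.relative_to(tmp).as_posix()
    roundtrip = saved.read_bytes()
```

A host name that includes a port, or path segments with characters the local filesystem rejects, would need extra handling that this sketch leaves out.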
_NOPE_ (Author) Posted May 6, 2021
Progress is slow and steady, but this is just phase 1:
Here are some interesting things that I've found so far:
With a special appearance by @Healix and @Hyperstrike:
_NOPE_ (Author) Posted May 7, 2021
So, I'm about 20% done processing. I'd say the initial processing should be done in about a week, assuming nothing interrupts the process:
And with that 20% processed, this is how many files I have extracted from the WARCs so far (with my own WARC parser that I built from the ground up):
So, with 20% being 579,873 files taking up 39.745 GB, I estimate that the final total will be about 2,899,365 files taking up roughly 198.7 GB of space. Now, that's just raw, unprocessed files, written directly from a bytestream. But they'll be out there in file format, and I'll stick them in a web directory and make it searchable by Google, so that's something. The unfortunate thing is that since I started this in Debug mode, if anything "breaks", I have to start over. But I'm going to take that risk. It's been a few years, what's another couple of weeks?
The second step will be to ATTEMPT to replace all of the internal URLs (those pointing to boards.cityofheroes.com) with relative path URLs that link correctly to the appropriate internal documents. Getting the correct filename to replace each one with should be easy, because I had already written a "GenerateSanitizedFileName" method that turns all of the returned URIs into the files' final filenames, so I can just pass the references to that. The trick will be for me to learn enough about the HTML Agility Pack to make the swap and replace happen, and I'm not sure how much processing time that whole process will take, given the number of files. While I consider Step 1 reasonable enough to just run on my PC, for Step 2 I might have to make a "mini-app" and actually try to resurrect "Project Spelunker" and ask others for their assistance with processing these files. We'll see how that goes.
The third step will be to attempt to correct any encoding errors. I've already noticed that a few of the image files won't work because of corruption (most work, though), and a few of the HTML files appear to have a couple of extra digits at the very start of them. Not sure what that is, maybe a "checksum" value or something, but I'd like to strip those off if I can figure it out.
We'll have to wait and see how this all pans out. Time will tell.
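The size estimate in this post is a straight linear extrapolation, which can be written out as a small sketch. The function name is made up; the inputs are the figures quoted in the post.

```python
def extrapolate(files_so_far: int, gb_so_far: float, fraction_done: float):
    """Scale the totals so far up to 100%, assuming the rest looks the same."""
    scale = 1 / fraction_done
    return round(files_so_far * scale), gb_so_far * scale


# Figures quoted in the post: 579,873 files and 39.745 GB at about 20%.
files_total, gb_total = extrapolate(579_873, 39.745, 0.20)
```

Scaling by five gives about 2.9 million files and just under 199 GB. The estimate only holds if the unprocessed files resemble the processed ones; a later post in the thread notes that the file list had been sorted from smallest to largest, which is exactly the case where this kind of projection runs low.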
Grouchybeast Posted May 8, 2021
This is amazing! Fingers crossed for a successful run.
Reunion player, ex-Defiant. AE SFMA: Zombie Ninja Pirates! (#18051) Regeneratio delenda est!
_NOPE_ (Author) Posted May 11, 2021
FYI, still chugging along. Not quite halfway there, but I would have expected any major errors to have happened by now:
Now, cross my fingers that some app on my computer doesn't decide to perform an unattended "update and restart" without my permission!
WanderingAries Posted May 12, 2021
12 hours ago, The Philotic Knight said: FYI, still chugging along. Not quite halfway there, but I would have expected any major errors to have happened by now: Now, cross my fingers that some app on my computer doesn't decide to perform an unattended "update and restart" without my permission!
Set the process to realtime? Manually pause system updates for like a week? Bribe it with candy?
_NOPE_ (Author) Posted May 14, 2021
So.... uh.... slight update. Remember how it looked like I was almost half done? Yeah, it started slowing down... A LOT. I couldn't figure out why, but then I remembered that I had pre-sorted the file list from smallest to largest. So.... we're probably not even 25% done yet.... sorry. 😪
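The effect described here is easy to reproduce: when the inputs are sorted from smallest to largest, progress counted in files runs far ahead of progress counted in bytes. The sketch below uses made-up sizes purely to show the gap; it says nothing about the real file list.

```python
def progress(sizes, files_done):
    """Return (fraction of files done, fraction of bytes done)."""
    by_count = files_done / len(sizes)
    by_bytes = sum(sizes[:files_done]) / sum(sizes)
    return by_count, by_bytes


# Hypothetical input sizes in MB, sorted smallest to largest.
sizes = sorted([1, 1, 2, 4, 8, 16, 32, 64])
by_count, by_bytes = progress(sizes, files_done=4)
```

With these numbers, half of the files are done but only 6.25% of the data has been read. Tracking bytes processed instead of files processed would give a steadier estimate, as would processing the files in their original or a shuffled order.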
WanderingAries Posted May 14, 2021
That's ok, historically, computers can't estimate time properly either. 😉
_NOPE_ (Author) Posted May 17, 2021
So.... good news/bad news. The bad news is, yes, my system restarted for an automatic update. The good news is, a whole lot of files got extracted before that, so while I reconsider my code and recode it to make it a more sustainable process, I'm also in the process of copying what I HAVE extracted so far over to http://oldcohforums.cityofplayers.com/
So now you can at least get a "taste" of what I've "scavenged" so far. There's THOUSANDS of files, and frankly, my computer may not have the storage capacity to hold them all and keep processing in the same directory without breaking... so, I'm reconsidering how I process these. I may have to make a subfolder for each one of the WARC files, so that everything's not just in one GIANT folder... I'll have to think on this for a bit.
I've also got to figure out if Google can just start indexing these things, or if I have to do anything special to make it index them. Otherwise, it's just a bunch of loose files sitting in a directory, and finding the content that you want would be like a needle in a haystack!
More to come... Soon™.
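The "one subfolder per WARC file" idea could be as simple as deriving the folder name from the archive's file name. This is a sketch under that assumption; the function name and the sample archive name are hypothetical.

```python
from pathlib import Path


def folder_for(root: Path, warc_path: str) -> Path:
    """Return the output folder for one WARC file, named after the archive."""
    name = Path(warc_path).name
    # Strip the archive suffix so the folder name is just the base name.
    for suffix in (".warc.gz", ".warc"):
        if name.lower().endswith(suffix):
            name = name[: -len(suffix)]
            break
    return root / name


# Hypothetical archive name, used only for illustration.
out_dir = folder_for(Path("extracted"), "archives/coh-forums-00012.warc.gz")
```

One trade-off to keep in mind: once pages land in different folders, links rewritten to bare file names in the same directory no longer resolve, so the link-rewriting step would need to know which folder each target ended up in.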
WanderingAries Posted May 17, 2021
If you're looking for a little extra storage... O>:p
_NOPE_ (Author) Posted May 18, 2021
I might have to consider something like that, @WanderingAries; my system seems to be super slow for what I'm trying to accomplish. However, I think it may also be more efficient to extract the files, transfer them to the server, and then delete the original source files when there's a successful transfer. That way, there are fewer files just sitting around doing nothing. I may also "crowdsource" this like I tried to do last time with the database solution that failed.
By the way, in the meantime, I'm still uploading what I have so far, over a million files and still copying:
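The "transfer, then delete on success" flow might be sketched as follows. The upload itself is passed in as a callable so the example stays self-contained; in practice it could wrap an FTP upload such as `ftplib.FTP.storbinary`, but that wiring, the function names, and the simulated failure are all assumptions for illustration. The sketch also assumes the files being deleted are the local copies that were just transferred.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def transfer_and_clean(files: Iterable[Path], send: Callable[[Path], None]) -> List[Path]:
    """Send each file; delete it locally only if the send did not raise."""
    failed: List[Path] = []
    for path in files:
        try:
            send(path)
        except Exception:
            failed.append(path)  # keep the local copy for a later retry
        else:
            path.unlink()  # transfer succeeded, free the disk space
    return failed


def flaky_send(path: Path) -> None:
    """Stand-in for a real upload; fails for one file to show the retry list."""
    if path.name == "b.html":
        raise ConnectionError("simulated upload failure")


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.html"), Path(tmp, "b.html")
    a.write_text("a")
    b.write_text("b")
    failed = transfer_and_clean([a, b], flaky_send)
    failed_names = [p.name for p in failed]
    a_exists, b_exists = a.exists(), b.exists()
```

Deleting only after a confirmed transfer means an interrupted run, like the forced restart mentioned earlier in the thread, loses at most the file in flight, and the returned list says what to retry.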