Anyone know how to read/see posts in the old forums?

_NOPE_ · June 19, 2019

Aaaaaannnnnndddd... we're processing!

Not sure how GOOD the data is yet though... I'm going to run it for a bit, and then dump it and inspect it... then adjust accordingly. But, it's a start.

Healix · June 19, 2019

Jeneki · June 20, 2019

It's worth mentioning that there was at least one forum purge over the years. I remember it quite well as it deleted several excellent humor posts from the Champion subforum. If someone can't find the old post they are looking for, you might have to look at an older snapshot of the forum than just before shutdown.

Zep · June 20, 2019

Looks like you have things well under control.

I was also thinking you could just box/google drive/whatever me a zip file to uncompressed and execute a file -- if it would be worth the time/hassle.

_NOPE_ · June 20, 2019

Zep · June 20, 2019

YEAH!

Zep · June 22, 2019

Inquiry:

When your glorious work is done how will we access the data?

I would likely start by wanting to find my own posts - in many cases that would lead to wanting to read whole threads and lead to old guides etc.

_NOPE_ · June 24, 2019

Inquiry:

When your glorious work is done how will we access the data?

I would likely start by wanting to find my own posts - in many cases that would lead to wanting to read whole threads and lead to old guides etc.

Sorry, I was on a family trip over the weekend.

I'm currently planning on a web interface similar to the Dev Tracker, but with a "Search Page". Now, just to let you know, I've stopped my parsing process for two reasons:

[*]I just wanted some sample data so that I can start constructing the front end and make sure that all looks good.

[*]I realized that I forgot about thread titles in my schema.... probably doesn't do much good to have a bunch of threads that are just known by their ThreadID. That'd be like having to navigate the internet by going to http://192.168.1.1 instead of http://www.google.com - namely, it would SUCK. So I need to parse the Titles and get them into a unique column so that they are displayable/searchable as well.

No ETA at this time. Trust me, this is a LOT of work! And I have to balance doing this against work/family/Playing CoH for Sanity.

I appreciate your patience in advance. We'll get there, but it might just take a while.

_NOPE_ · June 24, 2019

You know what? I just changed my mind, sorta. I'm going to start it off as a desktop application, because that's where I'm comfortable and I know I can crank that out WAY quicker. And then after I have that working, I might turn it into a web site... because there's a WHOLE lot I still have to learn about designing modern websites using code dynamically. I BARELY got the Dev Tracker working, to be honest, and I still haven't spent the time to figure out how to NOT make the thing load as one giant web page that takes forever to load.

So, yeah... desktop app is where I'm going to start.

Zep · June 24, 2019

I will never rush you on this. I hope no one else would. I do get curious and like to help.

Thank you again for doing this as it is a large amount of work.

_NOPE_ · June 24, 2019

Not to get you TOO excited... but I'm beta-testing the basic interface now, and it seems functional. The resulting screens look like crap, but it's a start:

chigiabelo · June 24, 2019

That looks very nice, PK! Good job!

_NOPE_ · June 24, 2019

Search by UserName (full or partial), is now a thing:

Zep · June 24, 2019

You are awesome.....

Find many for _Zep_ ? Just curious :) **NO RUSH**

I am a slow leveler and have a lot of alts, be a while before I can use any of my old builds anyways -- prob need to do some adapting to newer conditions anyways.

I am hoping to find one or more of my origin stories more so than builds - again no rush. :)

I truly appreciate the time and work you are putting into this. I tried going through the Wayback Machine and found a total of (1) of my old posts before giving up.

Thanks!!!!!

You should get merits for this - HINT HINT HINT to the Dev's.

_NOPE_ · June 24, 2019

Honestly, I haven't been looking for specific usernames, just test data to play with. Now that I think the system is MOSTLY "good enough", I have to change my parser to start uploading to my server instead of to my personal PC.

Then, once the data is all on my server, I can modify my front-end app to point to that server instead of my PC, and I'll be ready for initial release.

When I release this, I'm going to be making a read-only user account for my SQL database, and release the source code for both the parser and the front end, so that anyone with sufficient programming skills can improve on what I started, if they want to go to the Internet Archive and download their own copy of the CoH forums archive. The parsing has been ROUGH, there's a LOT of "junk" and corruption in a lot of the files that the Wayback machine stored... I don't see any way around that, TBH...

Obitus · June 24, 2019

This is great work. Thanks PK.

I have a handful of my old posts saved via bookmark through the archived site, but it's extremely difficult to find specific posts/threads using that method.

_NOPE_ · June 24, 2019

This is great work. Thanks PK.

I have a handful of my old posts saved via bookmark through the archived site, but it's extremely difficult to find specific posts/threads using that method.

Speaking of the Wayback Machine... if for some reason the parsed HTML that's on my servers is bad, I've added a button that lets you go right to the Wayback Machine page for the specific Page/Post/Thread you're looking at in the program:

_NOPE_ · June 25, 2019

Status update: I've rewritten the parser to send data to my remote MySQL database instead of my local MSSQL database, and it's now starting to populate the ACTUAL tables that will be used by the front end:

And this is still just from processing file #1 of.... 18,869. :o

chigiabelo · June 25, 2019

How big is the MSSQL database, file size?

_NOPE_ · June 25, 2019

How big is the MSSQL database, file size?

Oh, I never ran the full parser on all of the files, I stopped the parser after I had a good enough sample size to work with... so I stopped at like file 26 or something like that. So, I have no clue how big this thing is gonna get.

_NOPE_ · June 25, 2019

Made the changes to the reader to read from the remote MySQL database, looks like it's working! Sorry, no _Zep_ user yet, but then again, I'm only through parsing file #3 of 18,869:

Widower · June 25, 2019

You should get merits for this - HINT HINT HINT to the Dev's.

I can't give him in-game merits, but I can give him Widower's Medal for Exceptional Coolness.

_NOPE_ · June 25, 2019

FYI, I'm going to try to post a daily update on how the data upsert process is going. Here's what we've got so far, so you can see it'll be a while, but hey, at least I have the files themselves, they aren't going anywhere (just on my PC), so the only thing slowing things right now is my mandatory database checks to ensure that we're NOT adding the same data into the database twice... because for some reason this archive contains a LOT of duplicated data....

_NOPE_ · June 25, 2019

I'd also stopped and restarted the process, and didn't realize that it was restarting from SCRATCH every time, so I'm now making a "skip" file for every file that I process - if, when I come across a file, I find its "skip" brother/sister, I'm skipping processing the file. A poor man's resume "tool", but it lets me deal with issues with power/internet outages going forward.

chigiabelo · June 25, 2019

... the only thing slowing things right now is my mandatory database checks to ensure that we're NOT adding the same data into the database twice... because for some reason this archive contains a LOT of duplicated data...

I've had a similar challenge importing Azure billing information into a SQL server because we import the current month daily and there's usually some slight variances from the last few days, so we reimport from the beginning of the month to the current date, which can be hundreds of megabytes by a certain time of the month due to our size.

The method I have found that works faster than checking if a record already exists is to set a SQL constraint on the table based on the fields that need to be unique and then a try-catch on the code that is doing the importing. If a duplicate exists, the SQL server fails the attempt to import that one record, the app catches the exception and goes on to the next item.

Sign In

Anyone know how to read/see posts in the old forums?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in