Okay, so I had a few minutes today to look at the files. They look like straight-up flat text files, each record starting with a header that looks something like this:
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:68cc19fb-7c62-4272-bc46-cba8947d160d>
WARC-Warcinfo-ID: <urn:uuid:0e8969e5-67a8-4bb8-b607-e79a4836ebbb>
WARC-Concurrent-To: <urn:uuid:16c1fe48-6d6e-43be-8b13-87be55268e5e>
WARC-Target-URI: http://boards.cityofheroes.com/showthread.php?p=3389674
WARC-Date: 2012-09-05T03:16:33Z
WARC-IP-Address: 64.25.35.208
WARC-Block-Digest: sha1:LXWNSCSQQ556BB2MHJKI4RM4QS7NZQWA
WARC-Payload-Digest: sha1:VMESDRD3S6VLSNDJG76JRJAGAJW3K5H2
Content-Type: application/http;msgtype=response
Content-Length: 159891
HTTP/1.1 200 OK
Date: Wed, 05 Sep 2012 03:16:33 GMT
Set-Cookie: bbsessionhash=ff91b849f3c8e8e8e9441213ec5aa751; path=/; HttpOnly
Set-Cookie: bblastvisit=1346814993; expires=Thu, 05-Sep-2013 03:16:33 GMT; path=/
Set-Cookie: bblastactivity=0; expires=Thu, 05-Sep-2013 03:16:33 GMT; path=/
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-UA-Compatible: IE=7
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: bbreferrerid=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bbuserid=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bbpassword=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bblastvisit=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bblastactivity=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bbthreadedmode=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bbsessionhash=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bbstyleid=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Set-Cookie: bblanguageid=deleted; expires=Tue, 06-Sep-2011 03:16:32 GMT; path=/
Keep-Alive: timeout=2, max=44
Connection: Keep-Alive
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: TS32e4e1=af9d022137a22c65678290c519cf5e4595d529eff757377d5046c3ca; Path=/
Transfer-Encoding: chunked
74
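And then after that cryptic number 74, you get the actual CONTENT in raw text format. (The 74 looks like a random two-character string, and my first guess was some sort of checksum, but it's actually a chunk size in hexadecimal. The "Transfer-Encoding: chunked" header means the body was sent in pieces, each one prefixed with its length, and 74 hex is 116 bytes. So more of these size lines will be scattered through the content and need to be stripped out.) Usually the content is an entire HTML page. Sometimes it's the gobbledygook of the raw code for an image, for example. I don't think we want or care about those images, right? Or is that a bad assumption to make?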
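To make the record layout concrete, here is a rough Python sketch of pulling one response record apart. It is a sketch under assumptions, not a finished parser: it assumes a single, uncompressed record already cut out of the file, with CRLF line endings and a blank line between the WARC header, the HTTP header, and the body. The function names (parse_headers, dechunk, parse_response_record) are made up for illustration. A dedicated WARC library would be more robust for the real files.

```python
def parse_headers(block: bytes) -> dict:
    """Turn 'Name: value' lines into a dict keyed by lowercase name.

    The first line (WARC version or HTTP status line) is skipped.
    Repeated headers such as Set-Cookie keep only the last value.
    """
    headers = {}
    for line in block.decode("iso-8859-1").splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def dechunk(body: bytes) -> bytes:
    """Undo HTTP chunked transfer encoding.

    Each chunk is a hexadecimal size line, CRLF, that many bytes, CRLF.
    A size of 0 marks the end of the body.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            break  # truncated input: return what we have so far
        size_field = body[pos:eol].split(b";")[0].strip().decode("ascii")
        size = int(size_field, 16)  # e.g. "74" -> 116
        if size == 0:
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk
    return bytes(out)


def parse_response_record(record: bytes):
    """Split one record into (warc_headers, http_headers, payload_bytes)."""
    warc_block, _, rest = record.partition(b"\r\n\r\n")
    http_block, _, body = rest.partition(b"\r\n\r\n")
    warc = parse_headers(warc_block)
    http = parse_headers(http_block)
    if http.get("transfer-encoding", "").lower() == "chunked":
        body = dechunk(body)
    return warc, http, body
```

With something like this, the "warc-target-uri" entry gives the page URL, and the HTTP "content-type" entry tells you whether the payload is HTML or an image you might want to skip.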
I can parse the text, scrape the actual webpages for the HTML, and shove it into a database (with the unique URL as the identifier) so that it's searchable, either by text or by the URL string. Then, after you search, you can be presented with a page like my Dev Tracker, where I "recreate" the HTML in a sub-window. The caveat is that it will most likely be without ANY images, since the images the pages reference were NOT stored in the WARC files, it seems. The image URLs will probably point to broken images. But all of the HTML should still be good, so the words will be readable and have pretty much the same formatting you would have seen on the original forums.
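Here is a minimal sketch of the database idea, using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table name (pages), the column names, and the helper functions are hypothetical, and a real version would probably want a proper full-text index rather than LIKE.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags (script/style contents included)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace so the page text is searchable."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces can split a word that has a tag in the middle.
    return " ".join(" ".join(parser.parts).split())


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) pages table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, html TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return db


def store_page(db: sqlite3.Connection, url: str, html: str) -> None:
    """Save the raw HTML (for redisplay) plus its plain text (for search)."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, html, text) VALUES (?, ?, ?)",
        (url, html, html_to_text(html)),
    )
    db.commit()


def search_text(db: sqlite3.Connection, phrase: str) -> list:
    """URLs of pages whose text contains phrase (% and _ act as wildcards)."""
    rows = db.execute(
        "SELECT url FROM pages WHERE text LIKE ? ORDER BY url",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


def search_url(db: sqlite3.Connection, fragment: str) -> list:
    """URLs that contain fragment."""
    rows = db.execute(
        "SELECT url FROM pages WHERE url LIKE ? ORDER BY url",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]
```

Keeping the raw HTML next to the stripped text means the search runs against the words only, while the original markup is still there to recreate the page in the sub-window.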
What say you?