City of Data v2.0

BillyMailman · February 2, 2022

The threat sets don't appear to be findable at all. See, e.g., https://cod.uberguy.net/html/boostset.html?set=mocking_beratement

They're showing in the list (https://cod.uberguy.net/html/boostset-groups.html?group=threat_duration) and the set bonus finder, and the Boosts themselves show up (https://cod.uberguy.net/html/power.html?power=boosts.attuned_mocking_beratement_a.attuned_mocking_beratement_a), but the pages for the sets aren't working. Seems to be all four of the Threat sets.

UberGuy · February 4, 2022

Hm, just noticed these posts.

Ugh, the threat set one is, once again, a bug caused by the name change from "Taunt Duration" to "Threat Duration". It's very likely an easy fix, but I won't get to it tonight, as it's already well past my bedtime here. Look for it tomorrow afternoon.

The Kuji-In Rin issue is basically the same issue with a different name. I'm pretty sure it's not handling "Jump & Sprint" sets. Every power with a leaping or running component shows no enhancements because of this. How has no one else noticed that?

Edit: Couldn't get to it today, but tomorrow for sure.

Edited February 4, 2022 by UberGuy

UberGuy · February 6, 2022

OK, I didn't get to it Saturday, but I did get to it today.

I hadn't realized Page 3 added a "Running & Sprints" set category. (That name seems super redundant to me, but 🤷‍♂️.) There is actually still one plain "Running" set, just as there is still one plain "Leaping" set, so powers need to accept both the plain versions and the "& Sprints" versions.

I've fixed this.

Also, I had not updated "Enhance Taunt" to "Enhance Threat Duration" in the code that builds enhancement icons when displaying set icons, which was why the site was saying "not found". That's a generic default message for "I could not load this data", which was happening because of an exception being hit when not finding "Enhance Threat Duration".

Because I have to remember to update the code to deal with name changes / additions like these, errors like this are likely to happen again in the future. There's no good reason for this to so thoroughly break functionality when it happens. I updated the page logic to display "unknown" icons instead of barfing. Adding the logic to display something special was easy, but I had to add a bit of logic to the game data extract code to explicitly generate these icons, since this red question mark icon doesn't exist in the game files. (The borders and backgrounds do, however.)

They look like this for "generic" boost types and set types, respectively.

Likewise, the only thing the boostset info pages use the boostset category name for is to look up the background color for the icon. I've changed those pages to simply ignore rendering the background if the category name isn't recognized. This looks weird, but prevents the pages from completely breaking due to unrecognized names the way they did until now.
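For the curious, the "fall back instead of barfing" idea can be sketched roughly like this in Python. This is only an illustration of the pattern, not CoD's actual page code; the table contents, file names, color values and function names are all made up for the example.

```python
from typing import Optional

# Hypothetical lookup tables; the real site data and names will differ.
BOOST_ICONS = {
    "Enhance Damage": "damage.png",
    "Enhance Threat Duration": "threat_duration.png",
}
CATEGORY_BACKGROUNDS = {
    "Threat Duration": "#a04040",
    "Running & Sprints": "#4060a0",
}
UNKNOWN_ICON = "unknown.png"  # stand-in for the generated red question mark icon


def icon_for_boost(boost_type: str) -> str:
    """Return the icon for a boost type, or the 'unknown' icon if unrecognized."""
    return BOOST_ICONS.get(boost_type, UNKNOWN_ICON)


def background_for_category(category: str) -> Optional[str]:
    """Return a background color, or None so the caller can skip the background."""
    return CATEGORY_BACKGROUNDS.get(category)
```

The point of the design is that an unrecognized name produces a visibly odd page rather than an exception that takes the whole page down.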

Edited February 6, 2022 by UberGuy

BillyMailman · February 7, 2022

So, an SG-mate was looking at how well Savage Leap procs (very!), and I noticed something odd. The power says it builds one stack of Blood Frenzy, two if you're more than 20' away, and three if you're at least 40' away. But looking at the below, each of the three stacks is exclusive with the others.

Turns out, from looking at the raw data, that the second and third ones are exclusive, but also directly give two or three copies at once. I have no damn clue why they didn't just have each of the three give one stack, and the distance criteria overlap, but I ain't on the powers team, so not my place to judge.

Anyway, in terms of CoD, I think that this, Blood Thirst (the Build Up replacement in Savage Melee), Tidal Power (the Aim variant in Water Blast) and others like them make a strong case that the display should show the count of stacks of a power being granted. It might only be needed if it's 2+ stacks, to keep from adding unnecessary noise to other powers? Dunno, design's not really my thing.

UberGuy · February 7, 2022

1 hour ago, BillyMailman said:

Turns out, from looking at the raw data, that the second and third ones are exclusive, but also directly give two or three copies at once. I have no damn clue why they didn't just have each of the three give one stack, and the distance criteria overlap, but I ain't on the powers team, so not my place to judge.

Anyway, in terms of CoD, I think that this, Blood Thirst (the Build Up replacement in Savage Melee), Tidal Power (the Aim variant in Water Blast) and others like them make a strong case that the display should show the count of stacks of a power being granted. It might only be needed if it's 2+ stacks, to keep from adding unnecessary noise to other powers? Dunno, design's not really my thing.

Yeah, that should be doable. It took a bit for me to understand what you meant, but looking at the raw data, I get it. I honestly don't think I realized this was a thing anywhere, as I don't much play either of those two powersets. I think your suggestion of displaying it only if it's more than one copy makes sense.
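A rough sketch of that display rule, assuming a made-up function name and output wording (this is not the code that ended up in CoD):

```python
def describe_grant(power_name: str, stacks: int) -> str:
    """Describe a granted power, showing the stack count only when it is 2+."""
    if stacks < 1:
        raise ValueError("stacks must be at least 1")
    if stacks == 1:
        # Single grants stay uncluttered, which covers the vast majority of powers.
        return f"Grants {power_name}"
    return f"Grants {power_name} x{stacks}"
```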

It constantly breaks my head how much information is packed into powers definitions, and it's only likely to grow with time as our devs come up with new ideas. It's quite the (ongoing) challenge to display it all in a way that's reasonably compact. It always makes me appreciate what the game itself is dealing with, parsing all this stuff on the fly (in a much more compact format to be sure) to know what to do with our powers. When you think about how many patches, auras and other long-lasting effects can be in play at something like a raid or iTrial, plus all the players and critters activating click powers, it's kind of amazing to consider what all the engine is up to.

UberGuy · February 13, 2022

On 2/7/2022 at 2:41 PM, UberGuy said:

Yeah, that should be doable.

This is done.

UberGuy · February 18, 2022

I thought you folks might enjoy some details about how CoD is hosted. It lives in AWS (Amazon Web Services), basically using just two services there.

The first is AWS S3 (Simple Storage Service), which (for the non-techies) is basically a place to store files. Not quite the same as the file system on your computer, but kind of similar, functionally. This is where all the files for the CoD site go. This includes all the data files for powers, critters, etc.

The next is called CloudFront (CF), which is basically AWS's "content delivery network" or CDN. A CDN is something you can put between your web server and your end users. The CDN keeps copies of web files geographically closer to the end users, which reduces the time it takes them to download the files if they're far away from your web server. Basically a CDN is a big, geographically distributed file cache for the web. CDNs can do (a lot) more than this, but that's their main function.

CloudFront can serve files out of S3, and that's CoD in a nutshell. I put files in my S3 "bucket", point CF at the same bucket, set up DNS to point cod.uberguy.net at my CloudFront distro, and voila: CoD lives. (AWS provides DNS services too, but uberguy.net isn't there. I had that hosted somewhere else already.)

Ever wonder what it costs to run CoD, in terms of actual bills?

AWS billing is suuuuper complicated, but I'm not doing much, so it's simple for me.

S3 charges by the amount of data stored and the number of uploads and downloads you make to it. AWS is scaled for giant companies like Netflix - I pay fractions of a penny for the data CoD uses. Thankfully I don't have to pay for files that CF reads from S3 to then serve to the web, or if I do, it's so small it doesn't even show in my bills. To limit the number of files I have to upload to S3 (which mostly saves me time), I have software that only uploads files that changed locally. So if I change something in the data extract code I run on my home computer, and it only changes 1000 CoD data files, only those 1000 files will be uploaded to S3 (out of ~50,000 total).

CloudFront charges for a bunch of things, but the two main ones I see in my bill are data downloaded by y'all and the number of "SSL terminations" - basically the number of times you folks' browsers establish an SSL connection to the site. Interestingly, the SSL thing was always larger for CoD, partly because I pay more for people connecting from Europe. It probably makes sense - I set the site up to cache intelligently in our browsers, but it's also set to check and see if CF has newer files semi-often. Once you've seen a file, you probably don't download it again for days unless I updated it, but if you load it several times a day, you'll reconnect to make sure there isn't a new copy.
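To make the "only upload what changed" idea concrete, here is a minimal Python sketch. It is not the software I mentioned above, just one common way to do it: hash every file, compare against a manifest saved by the previous run, and return the files that differ. The manifest file, its JSON format and the function names are assumptions for the example, and the actual upload step is left out.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Hash a file's contents so changes can be detected between builds."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, manifest_path: Path) -> List[Path]:
    """Return files under root whose contents differ from the last recorded run.

    The manifest is a JSON file mapping relative paths to digests. It is
    rewritten on every call, so the next run compares against this one.
    Keep the manifest outside of root, or it will be treated as site data.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        new[rel] = file_digest(path)
        if old.get(rel) != new[rel]:
            changed.append(path)  # new file, or contents changed
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Only the paths returned by `changed_files` would then need to be handed to whatever does the uploading.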

This setup is pretty cheap for me to run. Until recently, it was in the low US dollar per month range. Recently, it's started costing way less than a dollar. Sometimes pennies a month.

It turns out that at the end of last year (2021), AWS greatly reduced the cost of CloudFront and also took a chunk out of S3's pricing. A lot of AWS' services have a "free tier", which is basically a "freemium" style system, where you pay nothing if you use only certain features within certain limits. AWS massively extended the free tier thresholds for CloudFront, basically causing all my CF costs to disappear. Like, literally. They went to 0. My S3 costs are usually like 10 cents a month. They went to a whopping 63 cents in December when I uploaded 10s of thousands of files for Page 3 updates.

CoD takes time to keep up to date and/or add features to, but at least it really doesn't cost anything to host. I'm not close to broke, but even if I was, I think I could afford this!

As an aside, I can only host CoD this way because it was designed to run completely in browsers. CoD has no server-side logic, no database, etc. A site like this forum can't be hosted fully the way CoD is, because it needs to run server-side software and store dynamically changing data (posts, user accounts, etc.). I'm sure these forums cost the Homecoming team way more than CoD cost me even before AWS' prices crashed.

Bionic_Flea · February 18, 2022

Thanks, Uber. If you ever need a buck or two to keep it running, let me know.

BillyMailman · February 18, 2022

And if you ever need a hand, let me know. I don't think you're using languages I've used much, but I do meet the basic requirements of both knowing how to code and how the game works, so should be able to fake it.

UberGuy · February 20, 2022

So a long-ago requested feature is now live.

This lets you download a zip archive containing the "raw" JSON representing the data that comes out of the Rust library, before I do any additional massaging of it as used in the CoD site. Right now this file is about 83MB.

This is really only helpful for either tool makers, like folks working on something like Mid's, or for the terminally curious. The data in the archive is pretty close to what Ruby's powers project code would emit in "raw data" mode. My code has diverged from Ruby's, but the basics are still very close.
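If you want to poke at the archive from a script, something like the following Python sketch is one way to start. It assumes only that the zip contains `.json` files; the internal folder layout and the member names are not described here, so treat them as unknowns.

```python
import json
import zipfile


def iter_json_members(archive):
    """Yield (member_name, parsed_json) for every .json file in a zip archive.

    `archive` may be a file path or an open binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                with zf.open(name) as fh:
                    yield name, json.load(fh)
```

Because this is a generator, members are parsed one at a time instead of loading the whole archive into memory at once.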

I took some time today to make sure the Rust side doesn't export things that are not "actual" powers data info. This means book-keeping fields added to powers/powersets/powercats that are used while doing things like building power/AT relationships. They aren't actually used any other way, so they always should have been filtered out anyway.
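The real filtering happens on the Rust side, but the idea is simple enough to sketch in Python for illustration. The field names below are invented placeholders, not the actual book-keeping fields.

```python
# Hypothetical names for internal fields; the real ones live in the Rust code.
BOOKKEEPING_FIELDS = {"_visited", "_archetype_links"}


def strip_bookkeeping(record: dict) -> dict:
    """Return a copy of record without internal book-keeping fields, recursively."""
    cleaned = {}
    for key, value in record.items():
        if key in BOOKKEEPING_FIELDS:
            continue  # internal-only data never reaches the export
        if isinstance(value, dict):
            value = strip_bookkeeping(value)
        elif isinstance(value, list):
            value = [strip_bookkeeping(v) if isinstance(v, dict) else v for v in value]
        cleaned[key] = value
    return cleaned
```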

Right now there's no "data map" explaining what all the fields are. That might take a while to produce, but the good news is that the Rust code's own comments can be used to automatically generate most of it. I plan to work on this later this week, and to add it to either the site help, in the archive, or both.

~~PS: If you switch to the Alpha data, you'll get that server's data. The file name is the same either way, currently (I may change that), so be sure to rename it if you download both.~~ (Fixed. See my next post below.)

Edited February 21, 2022 by UberGuy

Troo · February 21, 2022

On 2/18/2022 at 6:56 AM, Bionic_Flea said:

Thanks, Uber. If you ever need a buck or two to keep it running, let me know.

Felis Noctu · February 21, 2022

5 hours ago, UberGuy said:

Right now there's no "data map" explaining what all the fields are. That might take a while to produce, but the good news is that the Rust code's own comments can be used to automatically generate most of it. I plan to work on this later this week, and to add it to either the site help, in the archive, or both.

Haha, the Mids team definitely has experience processing poorly-/unlabeled blobs of data, I can tell you that much! We've pulled our data from several different methods and formats at this point.

This is incredibly useful, thank you for providing this!

And like Flea said, please let us know if you're looking for help to keep the lights on. I'm sure we can scrounge something together.

Edited February 21, 2022 by Felis Noctu

UberGuy · February 21, 2022

Thanks for the offers of assistance, gang. I think if I need help with paying for this I'm in deep trouble and would probably need help with other things first, so I am not expecting it, but I still appreciate it.

I just uploaded a couple of minor tweaks.

The raw data archive filename is now dependent on the data set viewed on the current page. For "live" data it's called "raw_data_homecoming.zip" and for "alpha" data, it's "raw_data_cryptic.zip". Those names align with the "revision" name up in the top right of the site ("Homecoming_xxx" for live and "Cryptic_xxx" for alpha), because those are the "internal" data set names I produce.

I discovered that, at least on some browsers, the vertical scrollbar could flicker on and off rapidly during dynamic page changes, like the main "hamburger" menu fly-out when clicked, or while the "busy" icon was spinning. This only happens at some window sizes, so I'm guessing some people never see this, even on browsers subject to it. Anyway, there is a scrollbar style that avoids this effect, and I added it to the site's styling. The vertical scrollbar is now an "overlay" scrollbar, meaning it appears on top of content when it's visible, rather than taking up some of the space on the side of the page.

UberGuy · February 21, 2022

Working on a data map makes clear I don't have explanations for all the struct fields. I'll need to go through and add (and in some cases, dig up) explanations for them. It's not a ton, but it's more than I want to leave unexplained, if I can avoid it.

Edit: OK, this is done. All the fields for all the Rust data structures (what gets turned into the "raw data") have doc comments that explain them to the best of my knowledge. Now I just need to turn that into something I can shove into the site and archives.

Edited February 21, 2022 by UberGuy

Bill Z Bubba · February 22, 2022

9 hours ago, UberGuy said:

Edit: OK, this is done.

You're awesome, man. Thanks for all this work.

UberGuy · February 28, 2022

Not much new progress yet, but letting y'all know I haven't re-vanished. Still plenty of real-life stuff keeping my hobby coding time low for now.

UberGuy · March 21, 2022

Time for Uber's forum post book of the month.

Still not much progress on the data map. I hope to work on it later this week. But I did make some small, backend-only changes as an experiment.

Building all the CoD site data for one "version" of the game (Alpha or Live) takes a little over 8 minutes on the hardware I usually use. That's extracting everything, icons, data files, etc. but not including uploading it to where it's hosted, which can take a couple of minutes but is usually much faster because only changed files need to be uploaded.

When I build both Live and Alpha site data together, I build them both at the same time by running two copies of the program in parallel, one handling each build, so the time is about the same whether I build one or both.
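Running two copies side by side is about as simple as parallelism gets. A minimal Python sketch of the idea, using stand-in commands rather than the real extract program (whose command line isn't shown here):

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all start immediately
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in commands; a real build would invoke the extract program instead.
    builds = [
        [sys.executable, "-c", "print('building live data')"],
        [sys.executable, "-c", "print('building alpha data')"],
    ]
    print(run_in_parallel(builds))
```

Since each build is its own process, the total wall-clock time is roughly that of the slower build, as long as the machine has the cores and memory for both.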

As a reminder for context, CoD's data is pulled out of the game files using Rust code derived from that written by @RubyRed. This does all the low-level work of actually turning the very raw *.bin file data into data structures it can then pass around, manipulate, link together, etc. The Rust code handles this plus stitches powers into powersets and powersets into power categories ("powercats"), as well as things like figuring out which ATs use a power, plus a few more things.

At that point, the Rust code disgorges the extracted data into Python. This transfer is about as efficient as it can be, as the Rust code is loaded into Python (and invoked) as though it were native Python code (just much faster). A final transformation of that data is done in actual Python code, turning all the raw data from Rust into dedicated Python objects (so there's an object type for a Powerset, a Power, an Attribmod, etc.), then slicing and dicing those into the JSON output used by the CoD website. I originally did it this way because it was easier for me to code the transformation in Python. That's because I'm very experienced with Python, and because Rust's design, which is meant to protect programmers from common coding/design mistakes, makes it hard(er) to do certain things that are easy in other languages. Or at least harder to think about doing them in new ways.
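As a toy illustration of that last stage (raw data in, Python objects, JSON out), here is a minimal sketch. The class names, field names and the shape of the raw dict are all invented for the example; the real structures have far more fields.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class AttribMod:
    attrib: str
    scale: float


@dataclass
class Power:
    name: str
    attrib_mods: List[AttribMod] = field(default_factory=list)


def power_from_raw(raw: dict) -> Power:
    """Turn one raw dict (a stand-in for the Rust output) into Python objects."""
    mods = [AttribMod(m["attrib"], float(m["scale"])) for m in raw.get("attrib_mods", [])]
    return Power(raw["name"], mods)


def power_to_site_json(power: Power) -> str:
    """Serialize a Power into compact JSON for a web page to load."""
    return json.dumps(asdict(power), separators=(",", ":"), sort_keys=True)
```

Note how many short-lived objects even this tiny version creates per power: the raw dicts, the dataclass instances, and the intermediate dict built by `asdict`.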

Anyway, the downside is that the bits done in Python are much slower than the stuff done in Rust. But there's a version of Python, called "PyPy", that's often way faster than the standard version. I wondered if PyPy could speed things up for me, at least a bit.

A speed improvement of 3-6 times is considered typical for PyPy. PyPy is pretty darn compatible with regular Python, but the way it runs the code is very different from the standard Python interpreter. These differences aren't easy to summarize for non-programmers, but the main thing they do for you is make the code just plain faster. Regular Python supports multi-threaded programs, but the interpreter design hamstrings the ability of multiple threads to do really heavy duty work in parallel. PyPy has the same limitation, so the speedup comes from how it runs each thread's code, not from running more threads at once.

My CoD Python code isn't multithreaded, and it really would not be easy to make it so, but I wondered if PyPy could still shave off some execution time if I just dropped my code on it mostly as-is.

So I spent some time today figuring out how to get the Rust/Python library to play nice with PyPy. This mostly involved upgrading a few things in the project and then fixing a few things that this broke. Getting everything working with PyPy actually wasn't very hard, but I did spend a couple of hours stumbling over a couple of issues that were actually quite simple in the end but just weren't documented well.

So what happened?

The PyPy version actually took about 30 seconds longer to run. Oops.

Why? Well, based on watching the system while it ran, and knowing a little about how PyPy works, I think the problem is that what my program is doing is kind of a bad deal for PyPy. To explain why, I have to go a little into the territory I said would be hard to summarize. Programming runtimes need to deal with things, like variables or data structures, that a program was using but is now done with. If they don't, the program will just fill up memory with unused junk until it runs out of places to put things. There are, broadly, three ways to handle this problem:

1. Make the programmer responsible for cleaning up after themselves. You ask for it, you have to explicitly clean it up when you're done with it. This is, except in the case of extremely low-level, high performance code, not that common in modern languages. (Note: CoH's client and engine code, written in C, works this way.) The historic problem with this approach is that it is easy to get wrong and create not just memory leaks, but situations where you try to use memory that's not yours to use any more. (This is the source of a whole range of security vulnerabilities in the real world.)
2. The language or runtime automatically cleans up things that are not used any more as soon as this becomes "obvious". There are a lot of ways compilers or runtimes can tell when a variable (and/or what it pointed to) is no longer used, and whatever it pointed to can then be cleaned up on the spot. For the most part, this is the approach taken by regular Python. It's convenient and relatively easy to design, but does have some performance implications and sometimes needs some help from a garbage collector anyway (see next point) so it's not always preferred.
3. Create a "garbage collection" mechanism. Basically, these languages figure out when something is unused after the fact, sometimes using very complicated analysis of what's in memory. Stuff that's been marked "old enough" is slated for cleanup by a reaper that runs in its own thread(s), so that, in theory, cleanup of unused stuff doesn't steal any time from the main program. This is the approach taken by PyPy's version of Python, as well as well-known languages like Java, C# and Go.
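The second and third approaches can both be observed in regular Python (CPython), which cleans up most objects the moment their last reference goes away but needs its garbage collector for objects that refer to each other in a cycle. The sketch below is a small demonstration of that, using a made-up `Node` class; the behavior described in the comments is specific to CPython and should not be expected on PyPy.

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node (used to build a cycle)."""

    def __init__(self, name):
        self.name = name
        self.other = None


def refcount_demo():
    """On CPython, dropping the last reference frees the object on the spot."""
    obj = Node("solo")
    ref = weakref.ref(obj)  # lets us observe the object without keeping it alive
    del obj
    return ref() is None


def cycle_demo():
    """Two objects pointing at each other survive until the collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from interfering with the demo
    try:
        a, b = Node("a"), Node("b")
        a.other, b.other = b, a
        ref = weakref.ref(a)
        del a, b
        alive_before = ref() is not None
        gc.collect()  # explicit collection finds and frees the cycle
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(refcount_demo())  # True on CPython
    print(cycle_demo())     # (True, False) on CPython
```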

It turns out that PyPy's "garbage collector" expects most things a program uses to stick around for a while. A program that creates tons of transient objects and then rapidly discards them (which CoD's Python code definitely does) breaks PyPy's tuning assumptions, leading to a buildup of junk that isn't cleared out as fast as it could be. I think PyPy actually would have run a little faster than my code, except near the end it used a ton more memory than the regular Python version. This caused the VM it ran in to use swap memory, slowing down its performance.

There's another reason, though. I already do most of the heavy lifting in my code in other, faster languages, rather than pure Python. I already talked about how a lot of the CoH data parsing is done in Rust. I also use a very fast Python library written in C++ to write out the JSON data. The three things the program really does are load the data, turn it into Python objects, and save those objects as JSON. The first and third things probably can't get much faster. That leaves turning the data into Python objects, and this is the place where tons of temporary objects get created and discarded.

I could speed up my extract code by stripping out the step that turns the raw data spit out by Rust into Python objects. That step is not strictly needed to emit the CoD data JSON. If that was removed, the code would basically be a variation on Ruby's tool. I suspect it would shave a good 3-4 minutes off the run time. But the Python transformation step gives me something that I can dump out and easily load back into an interactive Python interpreter later, and this is much easier (and faster) to work with than the final JSON. I use this setup all the time to find answers to questions for things you can't find on CoD itself, or would be very hard to generate. A semi-common example is to spit out a report like "what powers use attribmod X?" Adding 3-4 mins to my build time whenever there's a new CoH release is worth it if it does something that saves me tons of time every time I want to access the data.
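A report like "what powers use attribmod X?" boils down to building an index from attribute name to power names. Here is a minimal Python sketch of that, assuming the powers are available as simple (name, attribute list) pairs; the real objects are much richer, and the function name is invented.

```python
from collections import defaultdict


def index_by_attrib(powers):
    """Map each attribute name to the sorted list of powers that modify it.

    `powers` is an iterable of (power_name, [attrib_name, ...]) pairs.
    """
    index = defaultdict(set)
    for power_name, attribs in powers:
        for attrib in attribs:
            index[attrib].add(power_name)
    return {attrib: sorted(names) for attrib, names in index.items()}
```

With the objects already loaded in an interactive session, a question like this becomes a one-line dictionary lookup instead of a crawl through tens of thousands of JSON files.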

This just shows that what's "best" (let alone "fastest") in a given scenario is highly dependent on use cases.

Bionic_Flea · March 21, 2022

What if you reverse the polarity of the blilateral kelilactirals then route it through the Heisenberg compensators? That just might work!

Troo · March 21, 2022

and a sprinkle of whimsy..

UberGuy · March 23, 2022

On 3/21/2022 at 6:05 PM, Bionic_Flea said:

What if you reverse the polarity of the blilateral kelilactirals then route it through the Heisenberg compensators? That just might work!

Only if I can get it to 1.21 gigawatts first. Or maybe 88 miles per hour. Or both...

BillyMailman · April 3, 2022

Any chance we can get an update to the data soon? I kinda wanna see exactly what they've done with the April Fools invasion enemies' powers.

UberGuy · April 4, 2022

I didn't expect these enemies to have new powers, or indeed to be new entities at all. I thought they pre-existed from last year.

I'll see if I can find anything new tomorrow.

UberGuy · April 5, 2022

Ended up running the extract last night and uploaded the data this morning.

BillyMailman · April 6, 2022

Well, I ain't done this in a while, so... time for a feature request!

It's not super-useful, but can Set Costume be changed to display the costume name?

UberGuy · April 6, 2022

Yep. Looks like it might not be trivial to add, as that attribmod has no special handling currently, and the attribmod handling code is a giant cascading if/else tree that I usually break when I touch, but it should definitely be doable and might not be hard.
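For what it's worth, the usual alternative to a giant if/else tree is a lookup table of small handler functions. The sketch below shows that pattern in Python purely as an illustration; it is not how the CoD code is written, and the attribute key, the `costume_name` field and the output wording are all assumptions.

```python
def describe_default(mod: dict) -> str:
    """Fallback description for attribmods with no special handling."""
    return f"{mod['attrib']} ({mod.get('scale', 1.0):+.2f})"


def describe_set_costume(mod: dict) -> str:
    """Special handling: show which costume is applied, if the data names one."""
    costume = mod.get("costume_name") or "unknown costume"
    return f"Set Costume: {costume}"


# Attribute name -> handler. Adding special handling means adding one entry here.
HANDLERS = {
    "Set_Costume": describe_set_costume,
}


def describe_attribmod(mod: dict) -> str:
    """Dispatch to a special handler when one exists, else use the default."""
    handler = HANDLERS.get(mod["attrib"], describe_default)
    return handler(mod)
```

The appeal is that a new special case can't break the existing ones, since each handler is isolated from the others.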

Edit: Definitely not too hard.

Edited April 6, 2022 by UberGuy
