Number Six

City Council
  • Posts: 716
  • Days Won: 15

Posts posted by Number Six

  1. Cross-posting this from the patch notes thread in case it gets buried there. Some technical details about what was done in the latest patch if anyone is interested:

     

    I'm curious, how does one change the graphics calculations from the GPU to the CPU?

     

    To clarify on this a little, this patch note is about the PhysX effects for particles, debris, etc. When it was originally introduced, it required an Ageia hardware card, or there was a software mode which had reduced effects. Later, NVidia bought Ageia and ported the drivers to use CUDA on NVidia GPUs for hardware acceleration.

     

    At some point, Paragon Studios updated the PhysX software from 2.4 to 2.8. Something broke when they did this, and since then the effects were still present, but didn't work right. Most of the small objects just disappeared immediately instead of hanging around, usually by sinking through the floor, though in some cases they would float upwards if your framerate was higher than expected. It also failed to detect newer NVidia cards as usable for hardware acceleration.

     

    Recently there was a bunch of attention paid to this subsystem and someone made a change to the Issue 24 source to import a newer version of the driver, which was said to "re-enable" these broken effects. I haven't been able to independently verify this, and looking at that particular change it appears to be a minor version update - from 2.8.4.5 to 2.8.4.6. That's a minor bugfix release and based on the patch notes for that version it's not clear to me how that could successfully enable true hardware support again. We decided not to implement that particular change as it would cause conflicts with other servers using the same base game files, specifically the PhysX DLLs.

     

    Since we were looking at it anyway, one of our developers decided to dive in and see if they could find what originally broke these effects. It turns out that some relatively minor tweaks to the way geometry is "cooked" for the PhysX software were all it took to prevent objects from falling through the world and make this feature behave the way it's supposed to. We then wrote a separate fix to handle framerates higher than 60fps.
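
A common way to keep a physics simulation from misbehaving at high framerates is a fixed-timestep accumulator. The sketch below is a generic illustration of that technique, not the fix that was actually written for the game; the 60 Hz step, the `MAX_STEPS_PER_FRAME` cap, and the `Debris` and `FixedStepSimulator` classes are all assumptions made up for the example.

```python
from dataclasses import dataclass

PHYSICS_DT = 1.0 / 60.0      # assumed fixed simulation step (60 Hz)
MAX_STEPS_PER_FRAME = 8      # assumed cap so one long frame cannot stall the loop
GRAVITY = -9.81              # m/s^2


@dataclass
class Debris:
    """Hypothetical debris particle; only vertical motion is modelled."""
    height: float
    velocity: float = 0.0

    def step(self, dt: float) -> None:
        # Semi-implicit Euler: update velocity first, then position.
        self.velocity += GRAVITY * dt
        self.height += self.velocity * dt
        if self.height < 0.0:
            # Rest on the floor instead of sinking through it.
            self.height = 0.0
            self.velocity = 0.0


class FixedStepSimulator:
    def __init__(self, body: Debris) -> None:
        self.body = body
        self.accumulator = 0.0

    def advance(self, frame_dt: float) -> int:
        """Consume one rendered frame's worth of time; return physics steps taken."""
        self.accumulator += frame_dt
        steps = 0
        while self.accumulator >= PHYSICS_DT and steps < MAX_STEPS_PER_FRAME:
            self.body.step(PHYSICS_DT)
            self.accumulator -= PHYSICS_DT
            steps += 1
        return steps


def simulate(fps: int, seconds: float) -> float:
    """Run the simulation at a given render rate and return the final height."""
    sim = FixedStepSimulator(Debris(height=10.0))
    for _ in range(int(fps * seconds)):
        sim.advance(1.0 / fps)
    return sim.body.height
```

Because the simulation only ever advances in `PHYSICS_DT` increments, `simulate(60, 0.5)` and `simulate(240, 0.5)` end up at nearly the same height; rendering faster just means some frames take zero physics steps.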

     

    Along the way, we also deprecated the "HardwareOnly" flag, which is what the game used on FX in order to spawn extra objects when Ageia/NVidia hardware is present. Full PhysX support is now enabled for both hardware and software simulation modes.

     

    The best part of this is that even with this relatively ancient version of the PhysX software, the physics simulation happens in a separate thread from the rest of the game. That means if you have a multicore machine, you can easily run it at the highest settings and one of the spare CPU cores that the game isn't using (it normally can use 2 at most) does all of the math for it, effectively filling the same role that the add-on card used to. That way you get the full experience even if you don't have an NVidia GPU to do true hardware offload. Even if you do, using otherwise idle CPU capacity for this rather than adding to the GPU's already busy load may be a net win.
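
To picture the arrangement described above, here is a minimal sketch of a simulation stepping on its own thread while the main loop does other work. It only illustrates the general pattern; the function names and the queue-based hand-off are assumptions, not how the game engine or PhysX is actually wired up. Note also that CPython threads do not run pure-Python code in parallel, so this shows the structure rather than the speedup a native engine gets.

```python
import queue
import threading

GRAVITY = -9.81  # m/s^2


def physics_worker(commands: queue.Queue, results: queue.Queue) -> None:
    """Runs on its own thread and steps a toy simulation whenever asked."""
    height, velocity = 10.0, 0.0
    while True:
        dt = commands.get()
        if dt is None:          # sentinel: shut the worker down
            break
        velocity += GRAVITY * dt
        height = max(0.0, height + velocity * dt)
        results.put(height)


def run_frames(frame_count: int, dt: float = 1.0 / 60.0) -> list:
    """Main loop: request a physics step, do other work, then collect the result."""
    commands: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    worker = threading.Thread(
        target=physics_worker, args=(commands, results), daemon=True
    )
    worker.start()

    heights = []
    for _ in range(frame_count):
        commands.put(dt)                # kick off physics for this frame
        # ... game logic and rendering would run here on the main thread ...
        heights.append(results.get())   # sync point: wait for the step to finish
    commands.put(None)
    worker.join()
    return heights
```

The main thread never computes any physics itself; it only hands over a timestep and picks up the result, which is the role the add-on card used to play.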

     

    So if you have a CPU with 4+ cores, I highly recommend going into graphics options and cranking "Physics Quality" up to Very High. Ignore the warning about no PPU being present - it no longer really applies to modern hardware and will be removed in a future patch.

  2. I'm curious, how does one change the graphics calculations from the GPU to the CPU?

     

    To clarify on this a little, this patch note is about the PhysX effects for particles, debris, etc. When it was originally introduced, it required an Ageia hardware card, or there was a software mode which had reduced effects. Later, NVidia bought Ageia and ported the drivers to use CUDA on NVidia GPUs for hardware acceleration.

     

    At some point, Paragon Studios updated the PhysX software from 2.4 to 2.8. Something broke when they did this, and since then the effects were still present, but didn't work right. Most of the small objects just disappeared immediately instead of hanging around, usually by sinking through the floor, though in some cases they would float upwards if your framerate was higher than expected. It also failed to detect newer NVidia cards as usable for hardware acceleration.

     

    Recently there was a bunch of attention paid to this subsystem and someone made a change to the Issue 24 source to import a newer version of the driver, which was said to "re-enable" these broken effects. I haven't been able to independently verify this, and looking at that particular change it appears to be a minor version update - from 2.8.4.5 to 2.8.4.6. That's a minor bugfix release and based on the patch notes for that version it's not clear to me how that could successfully enable true hardware support again. We decided not to implement that particular change as it would cause conflicts with other servers using the same base game files, specifically the PhysX DLLs.

     

    Since we were looking at it anyway, one of our developers decided to dive in and see if they could find what originally broke these effects. It turns out that some relatively minor tweaks to the way geometry is "cooked" for the PhysX software were all it took to prevent objects from falling through the world and make this feature behave the way it's supposed to. We then wrote a separate fix to handle framerates higher than 60fps.

     

    Along the way, we also deprecated the "HardwareOnly" flag, which is what the game used on FX in order to spawn extra objects when Ageia/NVidia hardware is present. Full PhysX support is now enabled for both hardware and software simulation modes.

     

    The best part of this is that even with this relatively ancient version of the PhysX software, the physics simulation happens in a separate thread from the rest of the game. That means if you have a multicore machine, you can easily run it at the highest settings and one of the spare CPU cores that the game isn't using (it normally can use 2 at most) does all of the math for it, effectively filling the same role that the add-on card used to. That way you get the full experience even if you don't have an NVidia GPU to do true hardware offload. Even if you do, using otherwise idle CPU capacity for this rather than adding to the GPU's already busy load may be a net win.

     

    So if you have a CPU with 4+ cores, I highly recommend going into graphics options and cranking "Physics Quality" up to Very High. Ignore the warning about no PPU being present - it no longer really applies to modern hardware and will be removed in a future patch.

  3. Actually using a replica would indeed generate more costs, but as I alluded to, I'd think you'd want a replica anyway. The alternative is all sorts of potential hilarity when you lose a drive on your db server and need to do a raid rebuild, or a rollback if the db eats its fingers. If the goal is stability, that should be pretty high on your list regardless of any export functionality.

     

    We do have a replica, but it's using log shipping since we're running the cheap version of SQL server, so it's not online and available for queries. The replica is inside the secure part of the network in case we need to go active/active and run some shards there, which I believe was done this morning as the primary SQL box got maxed out a few times last night.

     

    If/when we switch to pg that should open up some more options here. The port is almost complete (everything except accountserver & auth is done) and just needs testing under real load conditions.

     

    Burdensome is substantially better than nonexistent. Don't let perfect be the enemy of good enough. It's perfectly possible to throw together something that gives us safety now and then still build those other, cooler things later.

     

    The team is quite pragmatic about it and time-to-production is definitely a factor. None of the options being considered are ones that would be prohibitively difficult or time consuming to implement.

     

    Given how the centralized control model worked out for me in 2012, I tend to think that doing it "right" in this case is at least throwing together something simple so your players can manage their own character data before worrying about whether you can get more than 2000 players on a shard, but you are, of course, entitled to your own priorities.

     

    I appreciate that. FWIW, the super-high priority list contains issues that are affecting people daily. Excelsior frequently has more than 2,000 connected -- it was nearly 2,500 last night when it started queuing people and it was running really close to the instability line. It's the same issue that took down Torchbearer yesterday -- at around 2300 online with multiple MSRs going, dbserver couldn't keep up and the queue of deferred updates to send to the SQL server grew to over 60,000 before it became unresponsive. That caused many to get disconnected and stuck in the dreaded "character is still logging out" state. There's also a problem somewhere in the netlink code that can hard crash an entire shard if global services (chatserver / accountserver) go offline unexpectedly, but the bug only seems to show up when there are lots of people connected and getting a good crash dump is proving tricky.
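
The deferred-update backlog is the usual arithmetic of a queue whose producers outpace its consumer. The sketch below models that with constant rates; the rates in the usage note are hypothetical numbers chosen only to make the math easy, not measurements from any shard.

```python
from typing import Optional


def backlog_after(seconds: float, arrivals_per_sec: float,
                  drained_per_sec: float, start: float = 0.0) -> float:
    """Queue depth after `seconds`, assuming constant arrival and drain rates."""
    growth = arrivals_per_sec - drained_per_sec
    # A queue cannot go below empty, so clamp at zero when draining wins.
    return max(0.0, start + growth * seconds)


def seconds_until(limit: float, arrivals_per_sec: float,
                  drained_per_sec: float, start: float = 0.0) -> Optional[float]:
    """Time until the backlog reaches `limit`, or None if it never grows."""
    growth = arrivals_per_sec - drained_per_sec
    if growth <= 0:
        return None
    return max(0.0, (limit - start) / growth)
```

With made-up rates of 1,500 updates per second arriving and 500 per second written out, `seconds_until(60_000, 1500, 500)` gives 60 seconds, which shows how quickly a backlog of that size can build once the drain rate falls behind.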

     

    All those are issues that we're expecting to take a week or two to solve at most - not months. The dbserver changes in particular are what the temporary test server is for. That build also contains the pgsql port, but since it replaces the entire sqlconn layer to make it modular, it needs testing even with MSSQL.

     

    There are offsite backups "just in case", and while not a whole lot of people have asked for individual character-level backups, it's something that's high up on my own priority list once the immediate issues are solved, as I know it would make people feel safer after what happened in 2012.

  4. I have one more question, in regards to the client-side crashes that occur; given that CoH is 32-bit, is it possible that the game runs into an out of memory event from hitting the 32-bit memory cap? Is the executable Large Address Aware, and if not, could the crashes be alleviated, or even eliminated, by making it so?

     

    It's something that can happen, yes. The client is already Large Address Aware and was for several years while the game was live, so it can use up to the full 4GB on 64-bit systems.
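
For anyone who wants to check this on an executable themselves, the Large Address Aware setting is a single bit (`IMAGE_FILE_LARGE_ADDRESS_AWARE`, 0x0020) in the `Characteristics` field of the PE/COFF file header. A minimal sketch that reads it from raw bytes follows; the function name is made up for the example.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image's header has the Large Address Aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its final 2-byte field (offset 18 within the header).
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

To use it on a real file, pass in the file's contents, for example `is_large_address_aware(Path(exe_path).read_bytes())`, where `exe_path` is a placeholder for wherever the client executable lives.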

     

    Base editing is the quickest way to run out of address space. There are definitely some memory leaks in the editor somewhere that haven't been isolated.

     

    A 64-bit client would be nice, but IMO is a long way off. There is a ton of dodgy pointer aliasing that would need to be inspected and potentially fixed.

  5. Huh? Relational databases are really good at pulling small amounts of data out of a bunch of tables using indices. I can't imagine what you mean by that being a brute force method. It's literally the primary job of an RDBMS. If you were feeling really iffy about it, just make a downstream replica of your db server and have the queries run there. I'm assuming you guys have one of those anyway, unless you like fun large scale outages and rollbacks because your db server fell over.

     

    I did say it was the short version. Based on reading the COH discord I'm not convinced that giving more detail won't still be twisted to fit whatever narrative, but in case people are interested I'll go ahead and elaborate.

     

    dbquery is the "brute force" method because it's the most direct method and the one that works now to produce the character in the text container format. But dbquery doesn't touch SQL directly, so indices and such don't matter. Instead it opens a connection to the dbserver just like a map starting up would do and asks for it from there.

     

    In testing we found that dbquery can also cause the dbserver to hiccup when it's run, for some as-yet-unknown reason. That's really bad, since every map in a shard depends on dbserver being responsive not only to load/save characters but also to relay chat, mission updates, teams, moving between zones, etc. The dbservers are already struggling to handle more than 2,000 players online due to some design issues unrelated to the backend database, and a hiccup at the wrong time is catastrophic for a lot of people.

     

    So that leaves either writing a tool to extract the data from SQL directly and format it the same way, or coming up with some other method. For security reasons the web tier shouldn't be allowed to touch the game database directly, and an extra replica for that would incur more license costs - at least until the pgsql conversion is done.

     

    There are also logistical issues, and we've discussed the best way to handle them. The website is the most obvious route. However, it's based on old forum software and at the risk of spoiling something that shouldn't be public yet, there are plans to replace it with something more modern for account management and other functions. It doesn't make a lot of sense to write something for the current site that will just have to be rewritten in a month or two. Web access may not be the best way to go for other reasons as well. On such an alt-heavy game, it could become quite burdensome to manually download your characters whenever you want to make a backup copy.

     

    One other approach we're considering (and the one I'm personally advocating for) is to modify the protocol so that the game client receives a complete copy of the character on logout and automatically saves it in a backup folder. This would be opt-in to avoid privacy issues if you're playing on somebody else's computer, but once enabled would allow players the peace of mind of knowing they always have a current backup. This also shifts the burden of producing the character dump to the mapserver, which already has a full copy anyway so there is zero additional load on the dbserver.
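
On the client side, the save-on-logout step could be as small as the sketch below. This is purely illustrative: the function name, folder layout, timestamped file naming, and the `enabled` opt-in flag are assumptions, and nothing here reflects an actual client implementation or file format.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_character_backup(backup_dir: str, character_name: str,
                          character_text: str, enabled: bool,
                          now: Optional[datetime] = None) -> Optional[Path]:
    """Write one timestamped copy of a character; do nothing unless opted in."""
    if not enabled:
        return None  # opt-in: never write to disk on a machine that hasn't asked

    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Keep only characters that are safe in file names on common filesystems.
    safe_name = "".join(
        c if c.isalnum() or c in "-_" else "_" for c in character_name
    )

    folder = Path(backup_dir) / safe_name
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{safe_name}-{stamp}.txt"

    # Write to a temporary file first, then rename, so a crash mid-write
    # cannot leave a truncated backup under the final name.
    tmp = target.with_suffix(".tmp")
    tmp.write_text(character_text, encoding="utf-8")
    tmp.replace(target)
    return target
```

Keeping one file per logout rather than overwriting a single file means an earlier copy is still there if the latest one turns out to be bad.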

     

    Finally there is talk of adding a cryptographic signature so that the character export can be verified as authentic. This would be useful both in a worst-case-scenario recovery contingency, as well as for server operators who want to allow automated imports of verified characters. This would be coupled with adding a character UUID to prevent duplication.
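
A rough sketch of how a signed export plus a UUID could fit together is shown below. It uses HMAC-SHA256 from the Python standard library as a stand-in: that is a shared-secret scheme, whereas letting other server operators verify exports would more likely call for an asymmetric signature (Ed25519, for example), which the standard library does not provide. The field names, the `ImportRegistry` class, and the payload layout are all hypothetical.

```python
import hashlib
import hmac
import uuid
from typing import Optional


def _payload(character_id: str, character_text: str) -> bytes:
    # Bind the UUID to the character data so neither can be swapped out.
    return f"{character_id}\n{character_text}".encode("utf-8")


def sign_export(character_text: str, key: bytes,
                character_id: Optional[str] = None) -> dict:
    """Produce an export record carrying a UUID and an authentication tag."""
    character_id = character_id or str(uuid.uuid4())
    tag = hmac.new(key, _payload(character_id, character_text),
                   hashlib.sha256).hexdigest()
    return {"uuid": character_id, "character": character_text, "signature": tag}


def verify_export(export: dict, key: bytes) -> bool:
    """Check that the record was produced with `key` and has not been altered."""
    expected = hmac.new(key, _payload(export["uuid"], export["character"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, export["signature"])


class ImportRegistry:
    """Accept each verified character UUID at most once."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.seen = set()

    def try_import(self, export: dict) -> bool:
        if not verify_export(export, self.key):
            return False          # tampered with, or signed by someone else
        if export["uuid"] in self.seen:
            return False          # same character already imported
        self.seen.add(export["uuid"])
        return True
```

The signature covers authenticity and the UUID covers duplication: an edited export fails `verify_export`, and a second import of the same untouched export is rejected by the registry.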

     

    Right now the server team's development priorities are focused almost entirely on stability. There is an extensive rewrite of parts of dbserver ongoing to address its design issues preventing smooth scaling above 2000 players per shard. There is also a plan for a potential PostgreSQL migration, as its design meshes well with the update-heavy workload and it could reduce licensing costs in the long term. The powers and frontend/UI people are working on different things in their areas of expertise of course.

     

    TL;DR: Character exports are something we think is a good idea, and something we're planning to look at once we're comfortable with the stability of the servers. But we're not going to slap something together and throw thousands of players at it without being confident it will work - we'll do it right.

  6. When is Homecoming going to support character to text file exports? There's no way I can possibly convince myself to play on a server where I'm a C&D away from losing all my characters again.

     

    That's something we've looked at and had internal discussions on. The short answer is that it's something we would like to be able to do; there are just some technical details that need to be worked out to be able to do it at scale. The brute force method (using dbquery) causes lag for other players when you use it on a server that's hosting more than a few hundred online.

  7. I'm fine with the naming stuff, and with the multi-boxing stuff in general, with the exception of the 1 mastermind limit. Dual boxing masterminds was kind of my thing on live, and I was really looking forward to doing it again.

     

    The initial policy is based mostly on the "common knowledge" that pet classes take more resources - the reason the Paragon devs gave for making Masterminds and Controllers available to subscription accounts and higher freemium tiers only.

     

    We need to do some testing to see how much load the pets really cause and if that was overstated. Without that evidence in hand yet, based solely on what I know about engine performance, I think it's likely that this restriction will be lifted at some point.

  8. On the other hand, now that the game code is in the hands of the community, it would be possible for someone on the coding team to simply add the functionality that HeroStats provided into the game client itself. The issue there, of course, is that you need someone on the current development team to focus on doing that, and they've got a lot of other things on their plates!

     

    Some sort of client API for addons like this is something that's on my radar. If you could design any kind of information gathering interface you wanted, what kind of things would you like to see in it and what would be easy to connect to?
