Pillars of Eternity II - Performance Fix Mod

Arkadia7 · July 16, 2018

Stingray said:
Kind of impressive that, as an RPG player, you've only managed to play one game that used Unity? Are you sure?

My personal list would include… Shadowrun Returns, Shadowrun Dragonfall, Shadowrun Hong Kong, Might & Magic X, Wasteland 2, Torment: Tides of Numenera, PoE 1, Tyranny. Also tried Shroud of the Avatar. M&M X performance seemed a little meh, but as a turn-based grid-movement game, who cares really. No complaints about the rest.

Ok, I do have both Shadowrun Dragonfall and Torment: Tides of Numenera, but I haven't really played them, in terms of playing for real. I tried out Dragonfall for around 20 minutes, according to Steam, and have about an hour in T:TON. I do intend to play a full game of Tides one day sooner or later; I just haven't gotten around to it yet.

Anyway, yeah, I forgot that a couple of those games also use the Unity engine. I know Tides looks gorgeous to me though, so I'm not surprised it's Unity.

joxer · July 16, 2018

Couchpotato said:
Darth Tagnan said:

Anyone tried this?


Nope, as I haven't played PoE 2 yet, but @joxer or @Silver might answer that.

I play games on an i5-based machine with a single GPU and didn't notice any performance problems in Eternity 2 that would need patching.

How does the old rule go? If it isn't broken...

Darth Tagnan · July 16, 2018

I tried this and it actually works - and it has options to allow for G-Sync as well, which Unity games apparently don't normally use.

I never had much in the way of FPS issues - but I did have quite a lot of stuttering - especially in areas like Neketaka. These are now almost entirely eliminated.

Also, the G-Sync fix makes scrolling around the map silky smooth.

Very nice.

That said, it's kinda weird that I have to use a third party overlay to get basic functionality working - as well as improved performance.

Morrandir · July 16, 2018

I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?

Darth Tagnan · July 16, 2018

Morrandir said:
I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?

No, it doesn't. Yes, it's a little annoying - but that's part of the game that we're not likely to see a solution for.

Couchpotato · July 16, 2018

Morrandir said:
I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?

Nothing will, not even installing it on an SSD; all that does is make the areas load a little faster. Like I said above, it's another flaw in how Unity handles loading new areas.

Especially in RPGs like PoE & Tyranny. :disappointed:

Kaldolek · July 16, 2018

Kaldaien here: Have I mentioned I hate juggling accounts? I never have access to them when most of these sites force me to change my password constantly. So I really do prefer it if you keep any discussion with me to the Steam forums.

In any case, allow me to try and address some of the things discussed so far as best I can.

Let me start by pointing out your understanding of SMT leaves a little bit to be desired. I am not criticizing you, just letting you know that if you view physical and logical cores as having the same set of scheduling challenges, then you already killed your software's performance. You would not be alone though, many games perform worse with SMT enabled because the developers do not fully appreciate how it works.

Here is an idealized workload, if I magically had 16 full processor cores that could retire completed instructions in parallel without needing to borrow resources from the paired logical core:

Call this (i), and don't ever do this in the real world:

Code:

A B C D E F AA BB CC DD EE FF AAA BBB CCC DDD
0 1 2 3 4 5 6  7  8  9  10 11 12  13  14  15

That does not actually work for two reasons:
- The queue these threads are getting their data from has to acquire/release locks constantly.
- The logical processor pair {0,1} does not contain two fully independent execution pipelines.

Let's try something smarter now.

(ii): An SMT-aware design. Instead of 16 workers we have 8, and we allow the scheduler to move threads between two SMT-friendly cores whenever permissible.

Code:

CPU {0,1}   share data and instruction caches, and have the same locality to near/far memory.
CPU {2,3}   same deal
CPU {4,5}   …
CPU {6,7}   …
CPU {8,9}   …
CPU {10,11} …
CPU {12,13} …
CPU {14,15} …

We're now moving things in a more sensible direction by reducing the number of parallel queues, but ideally we need to pipeline these things to make up for the various sources of contention caused by going parallel.
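Layout (ii) can be sketched in a few lines of Python. This is a minimal sketch, not how Unity or the mod computes anything: it assumes logical CPUs 2k and 2k+1 are SMT siblings of one physical core, which is common with 2-way SMT on Windows but not guaranteed (on Linux, `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` reports the real pairing, and Windows exposes it through `GetLogicalProcessorInformationEx`). The helper names `smt_pairs` and `workers_for` are hypothetical.

```python
def smt_pairs(logical_cpus: int, ways: int = 2) -> list:
    """Group logical CPU ids into SMT sibling sets.

    ASSUMPTION: siblings are numbered adjacently ({0,1}, {2,3}, ...).
    Real code should query the OS topology instead of relying on this.
    """
    if logical_cpus % ways:
        raise ValueError("logical CPU count must be a multiple of the SMT width")
    return [tuple(range(base, base + ways)) for base in range(0, logical_cpus, ways)]


def workers_for(logical_cpus: int, ways: int = 2) -> int:
    """One worker per physical core; the scheduler may run it on either sibling."""
    return len(smt_pairs(logical_cpus, ways))


pairs = smt_pairs(16)
print(pairs[:2], "...", pairs[-1])  # [(0, 1), (2, 3)] ... (14, 15)
print(workers_for(16))              # 8
```

The design choice is simply "workers = physical cores", leaving the second sibling of each pair free to soak up stalls from whatever else the OS schedules there.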

(iii): A 4-thread pool with a 4-job-deep queue, locking illustrated:

Code:

      *
    * D
  * C D
* B C D
A B C D
A B C *     ….
A B * -
A * - -  (Queue 0 has fetched 4 jobs and started on the first one; the lock has been released and queue 1 is now fetching the next 4 jobs)
* - - -  (Queue 0 is fetching 4 jobs and holds the lock; 1, 2 and 3 must wait)
0 1 2 3
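A Python sketch of the batching idea in (iii), under the assumption that a batch of up to 4 jobs is fetched under a single lock hold and the work itself runs outside the lock. `BatchedJobQueue` and `run_pool` are illustrative names, not Unity APIs, and because of CPython's GIL the sketch shows the locking pattern, not a speedup. The `fetches` counter makes the point: 16 jobs cost 4 lock holds with `batch=4`, versus 16 with `batch=1` as in layout (i).

```python
import threading
from collections import deque


class BatchedJobQueue:
    """Shared job queue where each fetch takes up to `batch` jobs under one lock."""

    def __init__(self, jobs, batch: int):
        self._jobs = deque(jobs)
        self._batch = batch
        self._lock = threading.Lock()
        self.fetches = 0  # lock holds that actually returned work

    def fetch(self) -> list:
        with self._lock:  # other workers wait here, as in the diagram
            take = min(self._batch, len(self._jobs))
            got = [self._jobs.popleft() for _ in range(take)]
            if got:
                self.fetches += 1
            return got


def run_pool(jobs, workers: int, batch: int):
    queue = BatchedJobQueue(jobs, batch)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            chunk = queue.fetch()
            if not chunk:
                return  # queue drained
            done = [job * job for job in chunk]  # the "work", done outside the lock
            with results_lock:  # also taken once per batch, not once per job
                results.extend(done)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), queue.fetches


squares, fetches = run_pool(range(16), workers=4, batch=4)
print(fetches)  # 4 (with batch=1 it would be 16)
```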

This is a more detailed discussion than the topic deserves here, but I was completely taken aback when it was suggested I do not understand multi-threaded execution. Unity doesn't understand it; I understand it just fine.

Creating a very wide and shallow distribution of jobs in a threadpool is not ideal in an SMT-based system and only serves to hurt the performance of the other tasks the game needs to continue doing.

You can generally improve throughput of the system as a whole if you do not create a massive threadpool that pre-empts more important tasks (thread priority is missing from most of Unity).

Latency to retire the same number of jobs increases, but you will not interrupt the threads that are constantly buffering audio or delivering graphics commands to the driver.

That problem, the render thread being interrupted and starved of CPU time, is exactly why performance is in the toilet in this game and why the driver can't keep my GPU load above 25%.

You can solve this any number of ways, and if I had a better working overview of Unity's actual jobs, I might instead opt to tweak the priority scheduling rather than constrict the parallel work queues.
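To make the starvation argument concrete, here is a toy fair-share model in Python. It is an assumption-laden simplification (Windows actually schedules by priority with time slices, and real workers also block): every thread is always runnable, equal-priority threads split the cores evenly, and a boosted render thread always gets a core. The 48-worker figure is borrowed from later in this thread for illustration, and `render_cpu_share` is a hypothetical helper.

```python
def render_cpu_share(cores: int, workers: int, render_boosted: bool = False) -> float:
    """Fraction of one core the render thread gets under a toy scheduler.

    ASSUMPTIONS: every thread is always runnable, equal-priority threads
    split the cores evenly, and a boosted render thread always wins a core.
    """
    if render_boosted:
        return 1.0
    runnable = workers + 1  # the worker pool plus the render thread
    return min(1.0, cores / runnable)


print(f"{render_cpu_share(8, 48):.3f}")              # 0.163: 8 cores shared by 49 threads
print(render_cpu_share(8, 4))                        # 1.0: fewer runnable threads than cores
print(render_cpu_share(8, 48, render_boosted=True))  # 1.0: priority wins instead
```

Under these assumptions, shrinking the pool and raising the render thread's priority reach the same result, which matches the two options described above.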

I did adequate profiling with my own custom tools to conclude that using fewer threads in the pool is simple and effective, and that Unity reacts to being told there are fewer CPU cores in a way that fits the discussion above.

Lucky Day · July 16, 2018

Welcome

Most of us don't interact with each other on the Steam forums; we have arguably been around longer, as our previous incarnation was RPGDot.

Thanks for your detailed analysis. From my brief read of it, I will refrain from asking whether the problem is that waiters are sitting around doing nothing, because it looks like there's more going on and you may just be giving a hypothetical.

But it does remind me of my brief internship at LLNL, where they explained that going from 108,000 processors to the 219,000-processor machine wasn't so simple, because any small lag can become an exponential problem.

I don't remember whether Unity is open source, or whether just anyone can implement solutions for these kinds of issues.

Kaldolek · July 17, 2018

Lucky Day said:
Welcome
Most of us don't interact with each other on the Steam forums; we have arguably been around longer, as our previous incarnation was RPGDot.

Fair enough. I actually have an account here with my namesake but I'm apparently really bad at keeping track of this stuff.

It's probably best not to try and talk to people on Steam anyway; it can lead to angry arguments for no apparent reason.

Lucky Day said:
Thanks for your detailed analysis. From my brief read of it, I will refrain from asking whether the problem is that waiters are sitting around doing nothing, because it looks like there's more going on and you may just be giving a hypothetical.

I went overboard with my explanation for sure, but I was seeing a lot of confusion around one topic in particular (HyperThreading / SMT).

I just wanted to make sure we were on the same page here.

The actual cause of the performance problems is well understood by multiple parties now, and comes down to the graphics thread being overpowered by a huge pool of worker threads running at the same priority that prevent it from getting CPU time.

https://imgur.com/a/Cv0TzRs (29 FPS when nobody questions Unity's weird design, vs. 98 FPS after)

It is beyond weird for a commercially licensed engine to make this mistake, but it is a quirk entirely caused by the basket-case Unity engine. Unity is always doing crazy things that leave me scratching my head.

I have outlined a couple of possible paths to get to an official fix and have been impressed with how professional Obsidian has been. An official fix should be doable by them and Unity really needs a good slap on the wrist from a high profile developer.

Lucky Day said:
But it does remind me of my brief internship at LLNL, where they explained that going from 108,000 processors to the 219,000-processor machine wasn't so simple, because any small lag can become an exponential problem.

I don't remember whether Unity is open source, or whether just anyone can implement solutions for these kinds of issues.

Unity is a victim of its own widespread success. It does very little that doesn't apply universally to game consoles, mobile phones and PCs. If any of these devices has special design considerations, Unity's going to casually gloss right over them.

But that's sometimes desirable.

Stingray · July 17, 2018

Kaldolek said:
Let me start by pointing out your understanding of SMT leaves a little bit to be desired. I am not criticizing you, just letting you know that if you view physical and logical cores as having the same set of scheduling challenges, then you already killed your software's performance. You would not be alone though, many games perform worse with SMT enabled because the developers do not fully appreciate how it works.

Sorry, from your post here I still don't buy that you fully understand what HT (SMT, if you want to call it that) is all about. Yes, of course each physical core doesn't have multiple independent pipelines for HT purposes, but you still get more work done in the same period of time by scheduling 2 full-time threads on that physical core, if it has 2 logical cores. That's the whole point of it. The amount of work done on each of the 2 logical cores will be less per period of time (compared to running just one thread on that physical core), but the total work done at the end of any given period will be more. Which of those makes more sense for your purposes depends on how you've engineered everything. And yes, as you posted, you might need to worry about data locality etc., but the OS makes its best effort to handle that for you, same as it does in NUMA situations.

Also, this has very little (probably nothing) to do with the PoE2/Unity issue anyway. That should be kinda obvious since people have found that they get better performance when the thread count is below even the number of physical cores they have, much less including the logical cores. So there's really just something fundamentally broken with the threaded work that Unity is doing.

Kaldolek · July 17, 2018

Stingray said:
Sorry, from your post here I still don't buy that you fully understand what HT (SMT, if you want to call it that) is all about. Yes, of course each physical core doesn't have multiple independent pipelines for HT purposes, but you still get more work done in the same period of time by scheduling 2 full-time threads on that physical core, if it has 2 logical cores. That's the whole point of it. The amount of work done on each of the 2 logical cores will be less per period of time (compared to running just one thread on that physical core), but the total work done at the end of any given period will be more. Which of those makes more sense for your purposes depends on how you've engineered everything. And yes, as you posted, you might need to worry about data locality etc., but the OS makes its best effort to handle that for you, same as it does in NUMA situations.

What makes you think that a pool of threads that spends most of its measured CPU time going into and out of kernel-mode to acquire the same lock has any runnable threads that can be switched to if you have more threads than locks?

We're in agreement that SMT is designed to use otherwise unused processor resources, but I don't understand why you think two related tasks, both waiting for a job to open up in the same pool of jobs, are even remotely viable candidates for SMT.

Unless they're using a spinlock (tsk, tsk), the scheduler's going to see that the entire job pool is blocked because something is either adding a new job or removing one, and it can't activate any of the threads in the pool at this point. So, what happens? Faced with no runnable threads in your pool that aren't already running, it switches to a completely unrelated task. You added an excessive number of threads to your pool but got nothing but increased locking overhead to show for it.
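The difference between spinning and blocking shows up in a small Python sketch (the function names are hypothetical, and the GIL means it demonstrates behavior rather than real cost). The spinning thread stays runnable and keeps looping until the flag flips; the blocked thread is not runnable at all until the event is set, which is what lets the scheduler switch to something unrelated.

```python
import threading
import time


def spin_wait(flag: dict, out: dict) -> None:
    """Busy-wait: the thread stays runnable and burns CPU until the flag flips."""
    iterations = 0
    while not flag["ready"]:
        iterations += 1
    out["spin_iterations"] = iterations


def blocking_wait(event: threading.Event, out: dict) -> None:
    """Blocking wait: the thread sleeps inside the OS until the event is set."""
    event.wait()
    out["block_wakeups"] = 1


flag = {"ready": False}
event = threading.Event()
out = {}
threads = [
    threading.Thread(target=spin_wait, args=(flag, out)),
    threading.Thread(target=blocking_wait, args=(event, out)),
]
for t in threads:
    t.start()

time.sleep(0.05)  # the next job shows up 50 ms later
flag["ready"] = True
event.set()
for t in threads:
    t.join()

print(out["block_wakeups"])        # 1
print(out["spin_iterations"] > 1)  # True: the spinner looped the whole time
```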

I feel like a complete ass right now arguing with you over this, but you're not entirely understanding some really key concepts.

Queue up 16 lines at your bank the next time you're there. Tell 8 tellers to go to lunch. If you really do this, you will find a bunch of confused customers who don't know who gets the next teller whenever one customer finishes. That about sums up what happens in a game engine as well, even with SMT. You increase thread synchronization overhead and don't finish any task quicker than you would have otherwise.

I've been doing this a long time and there aren't any engines off the top of my head that will take every available logical processor and lump them all into a single job pool like this. In practice, SMT always favors an engine that has broken stuff into distinct tasks and keeps the number of threads well below … this.

Is this making any sense or am I just annoying a bunch of people for no reason?

I'll definitely stop if I'm doing the latter.

Stingray · July 17, 2018

I don't know the first thing about what PoE2/Unity does with its various threads, so I'll defer to you on all that, and it wasn't what I had a beef with to begin with.

Kaldolek said:
Queue up 16 lines at your bank the next time you're there. Tell 8 tellers to go to lunch. If you really do this, you will find a bunch of confused customers who don't know who gets the next teller whenever one customer finishes. That about sums up what happens in a game engine as well, even with SMT. You increase thread synchronization overhead and don't finish any task quicker than you would have otherwise.

I still feel like you don't understand how HT works. While you are right about overhead, it doesn't overcome the benefits you can still gain from HT.

A better analogy for HT would be this:
- 16 lines at the bank
- 8 tellers handle 2 lines apiece. They can switch between their two lines while waiting on a customer to fill out paperwork, or waiting on approval for a transaction, etc.

Now obviously, this setup isn't going to get as much total work done overall as if you had 16 tellers, each handling 1 line. But you're still getting a little more work done than if you had only 8 lines and each teller handling 1 person. That's HT.

And I can only assume this must be the part you're missing… The reason it can achieve this is that it can put instructions from 2 threads into its pipeline at the same time, so in a way, contrary to what you said somewhere (here or Steam), it does, in fact, allow a physical core to execute 2 threads "simultaneously": they are both mixed into the pipeline at the same time. Assuming you call that simultaneous, anyway; I don't know why you wouldn't. When not using HT, you normally have a lot of empty slots in your pipeline, which is why HT allows more overall work to be done: when you are running 2 threads simultaneously in the pipeline, you end up with fewer empty slots.
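The empty-slot argument can be put as a toy model in Python. The assumptions are deliberately crude and do not describe any real CPU: one issue slot per cycle, each thread independently has an instruction ready with probability `p_ready`, and the core issues from whichever thread is ready. Under those assumptions, total throughput goes up with two threads while each thread runs slower; and a thread that is never ready (for example, parked on a lock) adds nothing. `issue_rates` is a hypothetical helper.

```python
def issue_rates(p_ready: float) -> dict:
    """Toy single-issue core: fraction of cycles that issue an instruction.

    ASSUMPTIONS (illustrative only): one issue slot per cycle, independent
    stalls, no competition for caches or execution units between threads.
    """
    single = p_ready                    # one thread: slot used only when it is ready
    smt_total = 1 - (1 - p_ready) ** 2  # two threads: slot idle only if both stall
    return {
        "single_thread": single,
        "smt_total": smt_total,
        "smt_per_thread": smt_total / 2,
    }


print(issue_rates(0.5))
# {'single_thread': 0.5, 'smt_total': 0.75, 'smt_per_thread': 0.375}
```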

With what you are saying, I guess I don't even understand what you think the actual point of HT is?

Kaldolek said:
Is this making any sense or am I just annoying a bunch of people for no reason? I'll definitely stop if I'm doing the latter.

Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.

Lucky Day · July 17, 2018

Stingray said:
Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.

I'm enjoying this.

Can a Unity game be optimized to do this? What you guys are saying is it's clearly not happening with PoE2 in general, but is it a fault of the Unity engine?

This is the kind of reason not everyone just does development for multicore. The Von Neumann architecture is so much easier to work with and never has race conditions.

On the "waiting in line" analogy - are they doing something while waiting in line or are they stuck until the teller unlocks their kiosk?

Stingray · July 18, 2018

Lucky Day said:
Can a Unity game be optimized to do this? What you guys are saying is it's clearly not happening with PoE2 in general, but is it a fault of the Unity engine?

The problem seems to be that PoE2/Unity performance actually gets worse with more threads (beyond a certain number), even when you have a full physical core to run each thread on, without even bringing HT (HyperThreading) into the equation. If that's true (and it sounds like it is) then someone really screwed up - that's ridiculous. If they can't make efficient use of more threads beyond a certain number, obviously they shouldn't have started more. Would have to get that right before you could even talk about getting anything out of HT.

On the "waiting in line" analogy - are they doing something while waiting in line or are they stuck until the teller unlocks their kiosk?

HT runs 2 threads simultaneously, mixing instructions from both threads into the core's pipeline. I suppose the equivalent in my bank analogy would be that there's constantly some level of activity going on with the person at the front of both of the teller's lines, but they're both moving along slower than a single person would, if said teller had devoted his full attention to one.

But that's just looking at a small piece of the puzzle. In a modern OS, the OS's scheduler is going to be swapping around what threads/processes the cores are working on, on a quite frequent basis. Your system has dozens or hundreds of threads running, and everything needs a share of time eventually. Bank analogy doesn't work at all once you zoom out to that level.

Kaldolek · July 18, 2018

Stingray said:
I don't know the first thing about what PoE2/Unity does with its various threads, so I'll defer to you on all that, and it wasn't what I had a beef with to begin with.

I still feel like you don't understand how HT works. While you are right about overhead, it doesn't overcome the benefits you can still gain from HT.

A better analogy for HT would be this:
- 16 lines at the bank
- 8 tellers handle 2 lines apiece. They can switch between their two lines while waiting on a customer to fill out paperwork, or waiting on approval for a transaction, etc.

Now obviously, this setup isn't going to get as much total work done overall as if you had 16 tellers, each handling 1 line. But you're still getting a little more work done than if you had only 8 lines and each teller handling 1 person. That's HT.

And I can only assume this must be the part you're missing… The reason it can achieve this is that it can put instructions from 2 threads into its pipeline at the same time, so in a way, contrary to what you said somewhere (here or Steam), it does, in fact, allow a physical core to execute 2 threads "simultaneously": they are both mixed into the pipeline at the same time. Assuming you call that simultaneous, anyway; I don't know why you wouldn't. When not using HT, you normally have a lot of empty slots in your pipeline, which is why HT allows more overall work to be done: when you are running 2 threads simultaneously in the pipeline, you end up with fewer empty slots.

With what you are saying, I guess I don't even understand what you think the actual point of HT is?

Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.

Yes, I have no idea where you ever got the idea that coarse-grained concurrency problems involving lock contention are what SMT is designed for. Go ahead and spin up 16 threads that spend almost all of their runtime acquiring and releasing a mutex.

SMT only works when you have threads that can actually run. It solves pipeline blips; you need either two unrelated tasks that do not compete for the same resources, or a much smarter way of pulling jobs off the queue and pipelining them to get a thread and its logical sibling working together. It's not going to rescue a thread that is making a real mess synchronizing itself with other threads.

You want fewer threads that do this, never more. That's why 1. you don't just double the size of every thread pool in your system when you discover it's running on an SMT system, and 2. I reduced the number of threads and got a 300% performance boost. I don't really know what you think SMT does, but this ain't the proper use case, as I plainly told you to begin with.
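The claim that extra lock-bound threads buy nothing can be written as a simple serialization bound, sketched here in Python with hypothetical helper names. It assumes every job spends a fixed fraction of its time inside one shared lock, and it ignores the cost of handing the lock from thread to thread, which only makes things worse in practice.

```python
def max_useful_threads(lock_fraction: float) -> float:
    """If each job spends `lock_fraction` of its time inside one shared lock,
    the lock is busy 100% of the time once 1 / lock_fraction jobs are running."""
    if not 0 < lock_fraction <= 1:
        raise ValueError("lock_fraction must be in (0, 1]")
    return 1.0 / lock_fraction


def effective_parallelism(threads: int, lock_fraction: float) -> float:
    """Threads beyond the bound just queue on the lock."""
    return float(min(threads, max_useful_threads(lock_fraction)))


print(effective_parallelism(2, 0.25))   # 2.0
print(effective_parallelism(4, 0.25))   # 4.0
print(effective_parallelism(16, 0.25))  # 4.0: the extra threads add only contention
```

Whether those extra threads sit on logical siblings or on separate physical cores does not change the bound, since the lock, not the pipeline, is the bottleneck.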

Moreover, spawning upwards of 48 threads is going to also cause massive contention for heap allocation in a lot of DLLs that happen to be loaded into your software. You cannot assume that these DLLs all use a private heap. There are numerous reasons you don't just jack thread count sky high when you can't even accomplish rudimentary concurrency without starving threads needed to give the GPU commands.

I don't know how C#'s memory allocator behaves these days, but it wasn't particularly good at compartmentalizing allocations per-thread 15 years ago, which is something you're going to need if you want to start quadrupling your thread count for no apparent reason. You need a whole bunch of address space spread as far apart as possible or these threads will destroy any hope of finishing very simple tasks (such as string manipulation) in a predictable amount of time. It's not a pretty subject and one game developers should stay far away from. The kind of allocator you need for massive concurrency also wastes massive amounts of memory.
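As an analogy only (this is neither how the C# runtime nor the CRT heap works), here is a Python sketch of compartmentalizing allocations per thread: each thread reuses scratch buffers from its own `threading.local` free list instead of going through a shared, locked allocator. `PerThreadArena` is a hypothetical name, and the memory cost mentioned above shows up as every thread holding its own idle buffers.

```python
import threading


class PerThreadArena:
    """Hypothetical per-thread buffer pool built on threading.local.

    Each thread reuses buffers from its own free list, so acquire/release
    never touch a lock shared with other threads.
    """

    def __init__(self, buffer_size: int):
        self._local = threading.local()
        self._buffer_size = buffer_size

    def _free_list(self) -> list:
        if not hasattr(self._local, "free"):
            self._local.free = []  # created lazily, once per thread
        return self._local.free

    def acquire(self) -> bytearray:
        free = self._free_list()
        return free.pop() if free else bytearray(self._buffer_size)

    def release(self, buf: bytearray) -> None:
        self._free_list().append(buf)


arena = PerThreadArena(4096)
seen = {}


def work(name: str) -> None:
    first = arena.acquire()
    arena.release(first)
    second = arena.acquire()  # comes back from this thread's own free list
    seen[name] = (first is second, second)
    arena.release(second)


threads = [threading.Thread(target=work, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen["a"][0], seen["b"][0])    # True True: each thread reused its own buffer
print(seen["a"][1] is seen["b"][1])  # False: the threads never shared one
```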

Stingray · July 18, 2018

Kaldolek said:
Yes, I have no idea where you ever got the idea that coarse-grained concurrency problems involving lock contention are what SMT is designed for. Go ahead and spin up 16 threads that spend almost all of their runtime acquiring and releasing a mutex.

That's probably because I never had that idea, and never said anything of the sort. Go back and check, if you want. What initially caused you to freak out and register an account on this forum was that I commented here on how you don't understand HT, when you said (on Steam) that a single physical core can't do work on 2 threads "simultaneously" with HT. That's just wrong. Properly designed software most definitely can, and does. If you just refuse to believe that, then I guess we're done here, because I don't really have any more to say and I doubt anyone here cares.

Kaldolek said:
2. I reduced the number of threads and got a 300% performance boost

Yeah, and the performance improvements you got obviously have nothing to do with HT because you are getting further performance improvements even when reducing the number of threads down below the number of physical cores on the system, much less the total number of logical cores. I don't understand how this isn't obvious to you, but you go off on some HT/SMT rant on Steam anyway.

you · July 18, 2018

I wanted to correct something here. First, I have a great deal of understanding of multi-threaded software, as I have been writing it for 20+ years. Fundamentally, I have to ask: what is Pillars doing that would require keeping 16 (for example) threads constantly busy? The last piece of software I worked on had quite a bit of complexity, and while we routinely allocated 320 threads in our pool (though by design some ran at higher priority than others), we rarely hit a starvation issue. Some of our threads were blocked on I/O, some handled CPU-intensive tasks and others sat idle waiting for work. Given the speed of modern CPUs, the locking mechanism required for the semaphore that implements the thread pool should cost well below 0.1% of the total computational cost. In a micro sense, compared to regular computation, locks are very expensive, especially given their impact on a processor pipeline (prefetch et al.).

I just don't buy that Pillars' computational requirements are what keep those threads busy. I'm more inclined to believe that either something is spinning or they have a polling design rather than an event-based design, which causes the threads to do endless work. An example would be an NPC that sits there constantly checking for an encounter, rather than triggering the encounter on a specific event (preferred for performance) or using a 'tick' to update the state of the system (probably good enough for a game, but not very multi-tasking friendly if the ticks are small). I can speculate all day long about why 16 threads would result in starvation, but to be honest, even that shouldn't occur if they have equal priority. Modern CPUs are FAST, and the game (or game engine) would have to be horribly unoptimized to create an issue here.
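The polling-versus-event distinction above can be sketched in Python with hypothetical names (`Zone`, `polling_npc`). The polling NPC performs a check on every tick whether or not anything happened; the event-based NPC runs only when the zone reports that the player entered.

```python
class Zone:
    """Hypothetical encounter zone that notifies subscribers when entered."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def enter(self, who):
        for callback in self._subscribers:
            callback(who)


def polling_npc(player_positions, trigger_pos) -> int:
    """Check the player's position every tick; return how many checks ran."""
    checks = 0
    for pos in player_positions:
        checks += 1
        if pos == trigger_pos:
            break
    return checks


# Polling: one check per tick until the player finally reaches position 9.
print(polling_npc(range(10), trigger_pos=9))  # 10

# Event-based: the NPC's handler runs exactly once, when the zone fires.
zone = Zone()
encounters = []
zone.subscribe(encounters.append)
zone.enter("player")
print(encounters)  # ['player']
```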
-
Anyway, I believe you that reducing the thread count improves performance, but I don't believe the explanation as to why (or, more precisely, while I might agree there is starvation, I think its actual cause is not well understood).
--
Btw, did you (or anyone who has played Pillars 2) actually confirm that the threads are consuming 100% of available CPU across all processors? An alternative explanation could be that, due to bad locking, the game reaches a state where some threads are blocked for a long period of time, which makes it appear to be a starvation issue (well, it is sort of a starvation issue, but not one caused by competing computation). In the software I wrote we used locks liberally, but the time a lock was held (except when waiting for new work) was typically sub-millisecond.
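One way to answer the saturation question for a single process is to compare CPU time with wall-clock time. This Python sketch uses `time.process_time()` (CPU time of the whole process, summed over all its threads) and `time.perf_counter()`: a ratio near 0 means the process was mostly blocked, while a ratio near N means roughly N cores were busy. For a game you would want per-thread numbers from a profiler, so treat this as an illustration of the measurement, not a PoE2 diagnostic; `cpu_utilization`, `blocked` and `busy` are hypothetical names.

```python
import time


def cpu_utilization(workload) -> float:
    """Process CPU time divided by wall-clock time while `workload` runs."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def blocked():
    time.sleep(0.2)  # stands in for a thread stuck waiting on a lock


def busy():
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass  # stands in for a thread that really is computing (or spinning)


print(f"blocked: {cpu_utilization(blocked):.2f}")  # close to 0
print(f"busy:    {cpu_utilization(busy):.2f}")     # close to 1 on an idle machine
```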

I'll get around to playing Pillars 2 in the fall and will at least make an attempt then to check whether the program is actually saturating the CPU. While many games I play have saturated the GPU, very few use much CPU time, with strategy games generally using the most.


Kaldolek · July 18, 2018

you said:
I wanted to correct something here. First, I have a great deal of understanding of multi-threaded software, as I have been writing it for 20+ years. Fundamentally, I have to ask: what is Pillars doing that would require keeping 16 (for example) threads constantly busy? The last piece of software I worked on had quite a bit of complexity, and while we routinely allocated 320 threads in our pool (though by design some ran at higher priority than others), we rarely hit a starvation issue. Some of our threads were blocked on I/O, some handled CPU-intensive tasks and others sat idle waiting for work. Given the speed of modern CPUs, the locking mechanism required for the semaphore that implements the thread pool should cost well below 0.1% of the total computational cost. In a micro sense, compared to regular computation, locks are very expensive, especially given their impact on a processor pipeline (prefetch et al.). I just don't buy that Pillars' computational requirements are what keep those threads busy. I'm more inclined to believe that either something is spinning or they have a polling design rather than an event-based design, which causes the threads to do endless work. An example would be an NPC that sits there constantly checking for an encounter, rather than triggering the encounter on a specific event (preferred for performance) or using a 'tick' to update the state of the system (probably good enough for a game, but not very multi-tasking friendly if the ticks are small). I can speculate all day long about why 16 threads would result in starvation, but to be honest, even that shouldn't occur if they have equal priority. Modern CPUs are FAST, and the game (or game engine) would have to be horribly unoptimized to create an issue here.
-
Anyway, I believe you that reducing the thread count improves performance, but I don't believe the explanation as to why (or, more precisely, while I might agree there is starvation, I think its actual cause is not well understood).

C# in general is horribly slow. This game crosses C++ and C# boundaries frequently because the real engine underneath all the C# fluff is written in a different language. So you get all the joy of having one language marshal memory layout from one ABI to the other over and over any time you do anything non-trivial.

Unity's been working that way for a long time, though; it's not as efficient as it could be, but it's also not the end of the world.

However, one thing Unity does that very few other engines do, is run its message loop for Win32 on a separate thread from the one that handles the D3D11 swapchain. This is the reason Unity cannot safely enable Fullscreen Exclusive mode by default. You can force it on, but it's likely you will deadlock if you Alt+Tab out because DXGI doesn't particularly like what Unity does.

As if that weren't bad enough, the message pump in some of these games is known to make calls to Sleep (…) rather than MsgWaitFor…. (…). When that alone does not deadlock your software, it will cause window events to pile up and then you sometimes get a flood of messages that have to be handled before the next frame is drawn and you can't meet that deadline -- so you hitch.
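The effect of a sleeping message pump can be shown with a deterministic Python simulation. It does not call `Sleep (…)` or any Win32 function; the arrival rate, 10 ms wake interval, 2 ms handling cost and 7 ms frame budget are made-up numbers for illustration, and the function names are hypothetical. A pump that sleeps and then drains handles messages in bursts, and a big enough burst blows the frame budget; a pump that wakes per message handles them one at a time.

```python
def poll_batches(arrivals_ms, interval_ms):
    """Sleep-then-drain pump: wakes every interval_ms and handles everything
    that arrived before the wake-up. Returns the size of each burst."""
    pending = sorted(arrivals_ms)
    batches, wake = [], interval_ms
    while pending:
        ready = [t for t in pending if t < wake]  # a prefix, since pending is sorted
        if ready:
            batches.append(len(ready))
            pending = pending[len(ready):]
        wake += interval_ms
    return batches


def wait_batches(arrivals_ms):
    """Wait-for-message pump: wakes once per message, as it arrives."""
    return [1 for _ in arrivals_ms]


def hitches(batches, cost_ms, budget_ms):
    """A burst hitches when handling it takes longer than the time left in the frame."""
    return [n * cost_ms > budget_ms for n in batches]


arrivals = list(range(0, 30, 3))  # one message every 3 ms: 0, 3, ..., 27
bursts = poll_batches(arrivals, 10)
print(bursts)                     # [4, 3, 3]
print(hitches(bursts, 2, 7))      # [True, False, False]
print(hitches(wait_batches(arrivals), 2, 7).count(True))  # 0
```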

Unity is already doing really unsafe things with its threads before you go and add way more of them than the software can keep running. It's why the swapchain thread just up and dies for several frames unless you cut down the number of workers.

These things just don't ever stop with Unity. About the only thing I've found with Unity that doesn't need fixing is its input management. Unreal is terrible at that but good at just about everything else.

Darth Tagnan · July 18, 2018

This thread certainly took an unpredictable turn. How refreshing.

Kaldolek · July 18, 2018

Stingray said:
That's probably because I never had that idea, and never said anything of the sort. Go back and check, if you want. What initially caused you to freak out and register an account on this forum was that I commented here on how you don't understand HT, when you said (on Steam) that a single physical core can't do work on 2 threads "simultaneously" with HT. That's just wrong. Properly designed software most definitely can, and does. If you just refuse to believe that, then I guess we're done here, because I don't really have any more to say and I doubt anyone here cares.

Yeah, and the performance improvements you got obviously have nothing to do with HT because you are getting further performance improvements even when reducing the number of threads down below the number of physical cores on the system, much less the total number of logical cores. I don't understand how this isn't obvious to you, but you go off on some HT/SMT rant on Steam anyway.

No, I can say with a great degree of certainty that it is caused by SMT. I think you're just bitter at this point because you didn't read what I wrote initially and now you have no way out that doesn't make you look foolish.

If you turn SMT off, the number of logical processors is cut in half. Various task pools shrink in size and the engine stops overestimating its ability to run stuff in parallel.

I've dealt with many engines that are actually profiled for concurrency and don't keep bloating the number of threads they want to schedule beyond their means. About the only saving grace here would be that the way they count processors stops working after 32. So 10 years from now, when we all have 64-core systems, the engine's going to stop spawning threads after 32. You'll have hundreds of threads allocated, but it'll at least top out there.

Other engines would have stopped at about 8.
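A minimal sketch of the kind of pool-sizing rule being contrasted here, assuming (hypothetically) that a couple of logical CPUs are reserved for the render, audio and main threads and that the pool is clamped to a fixed cap. Neither `worker_count` nor its defaults come from Unity. `os.cpu_count()` reports logical CPUs, so turning SMT off halves its input, as described above.

```python
import os


def worker_count(logical_cpus: int, reserved: int = 2, cap: int = 8) -> int:
    """Hypothetical pool-sizing rule: leave `reserved` logical CPUs for the
    render/audio/main threads and never exceed `cap` workers."""
    return max(1, min(cap, logical_cpus - reserved))


print(worker_count(16))                   # 8: capped
print(worker_count(8))                    # 6: e.g. the same CPU with SMT off
print(worker_count(64, cap=32))           # 32: a cap of 32 still allows a lot
print(worker_count(os.cpu_count() or 1))  # depends on this machine
```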
