
Pillars of Eternity II - Performance Fix Mod

July 16th, 2018, 05:41
Originally Posted by Stingray View Post
Kind of impressive that, as an RPG player, you've only managed to play one game that used Unity? Are you sure?

My personal list would include… Shadowrun Returns, Shadowrun Dragonfall, Shadowrun Hong Kong, Might & Magic X, Wasteland 2, Torment: Tides of Numenera, PoE 1, Tyranny. Also tried Shroud of the Avatar. M&M X performance seemed a little meh, but as a turn-based grid-movement game, who cares really. No complaints about the rest.
Ok, I do have both Shadowrun DragonFall and Torment: Tides of Numenera, but I haven't really played them, in terms of playing for real. I tried out Dragonfall for around 20 minutes, according to steam, and have about an hour in T:TON. I do intend on playing a full game of Tides one day sooner or later, just have not got around to it yet.

Anyway, yeah, I forgot that a couple of those games also use the Unity engine. Tides looks gorgeous to me, though, so I'm not surprised it's Unity.
--
To check out my games library, and see what recent games I'm playing, visit my steam profile! -- http://steamcommunity.com/profiles/76561197982351404

Arkadia7

Keeper of the Watch
Original Sin 2 Donor

#21

Join Date: Oct 2009
Location: Pacific NorthWest, USA!
Posts: 1,190
Mentioned: 5 Post(s)


July 16th, 2018, 09:06
Originally Posted by Couchpotato View Post
Originally Posted by Darth Tagnan View Post
Anyone tried this?
Nope as I haven't played PoE 2 yet, but @joxer or @Silver might answer that.
I play games on an i5-based machine with a single GPU and didn't notice any performance problems in Eternity2 that would need patching.

How does the old rule go? What isn't broken…
--
Toka Koka

joxer

The Smoker
Original Sin 1 & 2 Donor

#22

Join Date: Apr 2009
Posts: 18,658
Mentioned: 84 Post(s)


July 16th, 2018, 09:08
I tried this and it actually works - and it has options to allow for G-Sync as well, which Unity games apparently don't normally use.

I never had much in the way of FPS issues - but I did have quite a lot of stuttering - especially in areas like Neketaka. These are now almost entirely eliminated.

Also, the G-Sync fix makes scrolling around the map silky smooth.

Very nice.

That said, it's kinda weird that I have to use a third party overlay to get basic functionality working - as well as improved performance.

Darth Tagnan

SasqWatch

#23

Join Date: Jun 2018
Posts: 1,620
Mentioned: 17 Post(s)

July 16th, 2018, 10:11
I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?
--
We don't stop playing because we grow old; we grow old because we stop playing.
- George Bernard Shaw


Currently playing: -

Morrandir

SasqWatch
RPGWatch Donor
Original Sin 2 Donor

#24

Join Date: May 2013
Location: Germany
Posts: 2,882
Mentioned: 4 Post(s)


July 16th, 2018, 10:36
Originally Posted by Morrandir View Post
I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?
No, it doesn't. Yes, it's a little annoying - but that's part of the game that we're not likely to see a solution for.

Darth Tagnan

SasqWatch

#25

Join Date: Jun 2018
Posts: 1,620
Mentioned: 17 Post(s)

July 16th, 2018, 10:36
Originally Posted by Morrandir View Post
I guess the mod doesn't decrease the (not that long but still quite annoying) loading times between areas?
Nothing will, not even installing it on an SSD, as all that does is make them load a little faster. Like I said above, it's another flaw in how Unity handles loading of new areas.

Especially in RPGs like PoE & Tyranny.
--
"One Vision. One Purpose. Peace Through Power."

Check out my RPG News Thread usually updated daily.

Couchpotato

In the Name of Kane!

#26

Join Date: Oct 2010
Location: New England
Posts: 18,026
Mentioned: 13 Post(s)

July 16th, 2018, 22:56
Kaldaien here: Have I mentioned I hate juggling accounts? I never have access to them when most of these sites force me to change my password constantly. So I really do prefer it if you keep any discussion with me to the Steam forums.

In any case, allow me to try and address some of the things discussed so far as best I can.


Let me start by pointing out your understanding of SMT leaves a little bit to be desired. I am not criticizing you, just letting you know that if you view physical and logical cores as having the same set of scheduling challenges, then you already killed your software's performance. You would not be alone though, many games perform worse with SMT enabled because the developers do not fully appreciate how it works.


Here is an idealized workload if I magically had 16 full processor cores that could retire completed instructions in parallel without needing to borrow resources from the paired logical core:

Call this: (i), and don't ever do this in the real-world:
Code:
A B C D E F AA BB CC DD EE FF AAA BBB CCC DDD
0 1 2 3 4 5 6  7  8  9  10 11 12  13  14  15
That does not actually work for two reasons:
  1. The queue these threads are getting their data from has to acquire/release locks constantly
  2. The logical processor pair {0,1} does not contain two fully independent execution pipelines.


Let's try something smarter now….

(ii): This is an SMT-aware design, instead of 16 workers we have 8 and allow the scheduler to move threads between two SMT-friendly cores whenever permissible.

Code:
CPU {0,1}   share data, instruction cache, and have the same locality to near/far memory.
CPU {2,3}   same deal
CPU {4,5}   …
CPU {6,7}   …
CPU {8,9}   …
CPU {10,11} …
CPU {12,13} …
CPU {14,15} …

We're now moving things in a more sensible direction by reducing the number of parallel queues, but ideally we need to pipeline these things to make up for various sources of contention caused by going parallel.
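
A rough Python sketch of the sizing rule in (ii), for anyone who wants to play with it. It assumes SMT siblings are numbered consecutively ({0,1}, {2,3}, …) as in the table above, which is not guaranteed on every OS or CPU, and `smt_pairs` and `worker_count` are names invented for this illustration.

```python
import os


def smt_pairs(logical_cpus, threads_per_core=2):
    """Group logical CPU indices into SMT sibling sets.

    Assumption: siblings are numbered consecutively ({0,1}, {2,3}, ...),
    as in the table above. Real topologies can differ, so query the OS
    for the actual layout before relying on this.
    """
    return [
        tuple(range(first, min(first + threads_per_core, logical_cpus)))
        for first in range(0, logical_cpus, threads_per_core)
    ]


def worker_count(logical_cpus, threads_per_core=2):
    """One worker per sibling set instead of one per logical CPU."""
    return len(smt_pairs(logical_cpus, threads_per_core))


print(smt_pairs(16))                       # sibling sets for 16 logical CPUs
print(worker_count(16))                    # 8 workers, as in design (ii)
print(worker_count(os.cpu_count() or 1))   # same rule applied to this machine
```

The scheduler is still free to move each worker between the two siblings of its pair; the sketch only decides how many workers exist.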

(iii): A 4 thread pool - 4 job deep queue with locking illustrated:
Code:
      *
    * D
  * C D
* B C D
A B C D
A B C *     ….
A B * -
A * - -  (Queue 0 has 4 jobs fetched and has started on the first job, the lock has been released and queue 1 is now fetching the next 4 jobs)
* - - -  (Queue 0 is fetching 4 jobs, holds a lock and 1,2,3 must wait)
0 1 2 3
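
The same idea as a small Python sketch: each worker takes the lock once, grabs a batch of jobs, releases the lock, and only then does the work. This is an illustration of batched fetching in general, not Unity's job system; `run_pool` and the squaring "job" are made up.

```python
import threading
from collections import deque


def run_pool(num_jobs, num_workers=4, batch=4):
    """Drain a shared queue; return (jobs_done, non_empty_lock_fetches)."""
    jobs = deque(range(num_jobs))
    lock = threading.Lock()
    done = []          # list.append is thread-safe in CPython
    fetches = [0]      # only modified while holding the lock

    def worker():
        while True:
            with lock:                      # fetching: other workers wait
                grabbed = [jobs.popleft() for _ in range(min(batch, len(jobs)))]
                if grabbed:
                    fetches[0] += 1
            if not grabbed:
                return                      # queue is empty, worker exits
            for job in grabbed:             # working: no lock held
                done.append(job * job)      # stand-in for real work

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(done), fetches[0]


print(run_pool(64, batch=1))   # one lock round-trip per job
print(run_pool(64, batch=4))   # one lock round-trip per four jobs
```

With a batch of 4 the lock is taken a quarter as often for the same 64 jobs, which is the point of making the queue deeper rather than wider.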
This is a more detailed discussion than the topic ever deserves here, but I was completely taken aback when it was suggested I do not understand multi-threaded execution. Unity doesn't understand; I understand just fine.

Creating a very wide and shallow distribution of jobs in a threadpool is not ideal in an SMT-based system and only serves to hurt the performance of the other tasks the game needs to continue doing.

You can generally improve throughput of the system as a whole if you do not create a massive threadpool that pre-empts more important tasks (thread priority is missing from most of Unity).

Latency to retire the same number of jobs increases, but you will not interrupt the threads that are constantly buffering audio or delivering graphics commands to the driver.

  • That problem of the render thread being interrupted and starved of CPU time is emphatically why performance is in the toilet in this game and the driver can't keep my GPU load above 25%.

You can solve this any number of ways and if I had a better working overview of Unity's actual jobs I might instead opt to tweak the priority scheduling rather than constricting parallel work queues.

I did adequate profiling with my own custom tools to come to the conclusion that fewer threads in the pool is simple and effective and that Unity reacts to being told there are fewer CPU cores in a way favorable to the discussion above.

Kaldolek

Traveler

#27

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)

July 16th, 2018, 23:21
Welcome

Most of us don't take part in the Steam forums - we have arguably been around longer, as our previous incarnation was RPGDot.

Thanks for your detailed analysis. In my brief read of it, I will refrain from asking if the problem is that waiters are sitting around doing nothing because it looks like there's more going on and it may be you are just giving a hypothetical.

But it does remind me of my brief internship at LLNL, where they explained that going from the 108,000-processor machine to the 219,000-processor machine wasn't so simple because any small lag can become an exponential problem.

I don't remember if Unity is open source and if just anyone can implement solutions for these kinds of issues.
--
Developer of The Wizard's Grave Android game. Discussion Thread:
http://www.rpgwatch.com/forums/showthread.php?t=22520

Lucky Day

Daywatch

#28

Join Date: Oct 2006
Location: The Uncanny Valley
Posts: 4,535
Mentioned: 4 Post(s)


July 17th, 2018, 00:54
Originally Posted by Lucky Day View Post
Welcome
Most of us don't take part in the Steam forums - we have arguably been around longer, as our previous incarnation was RPGDot.
Fair enough. I actually have an account here with my namesake but I'm apparently really bad at keeping track of this stuff.

It's probably best not to try and talk to people on Steam anyway; it can lead to angry arguments for no apparent reason.



Originally Posted by Lucky Day View Post
Thanks for your detailed analysis. In my brief read of it, I will refrain from asking if the problem is that waiters are sitting around doing nothing because it looks like there's more going on and it may be you are just giving a hypothetical.
I went overboard with my explanation for sure, but I was seeing a lot of confusion around one topic in particular (HyperThreading / SMT).

I just wanted to make sure we were on the same page here.

The actual cause of the performance problems is well understood by multiple parties now, and comes down to the graphics thread being overpowered by a huge pool of worker threads running at the same priority that prevent it from getting CPU time.
  • https://imgur.com/a/Cv0TzRs (29 FPS when nobody questions Unity's weird design, vs. 98 FPS after)
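
For a feel of the numbers, here is a crude back-of-the-envelope model in Python. It assumes an idealized fair scheduler where every thread is always runnable and all run at the same priority, which real schedulers and real job pools are not, and the CPU and worker counts are invented for illustration.

```python
def render_thread_share(logical_cpus, worker_threads):
    """Fraction of one CPU the render thread gets under an idealized
    fair scheduler where every thread is always runnable and all run
    at the same priority. Real schedulers are far more complicated."""
    runnable = worker_threads + 1          # workers plus the render thread
    return min(1.0, logical_cpus / runnable)


for workers in (4, 8, 16, 48):
    share = render_thread_share(logical_cpus=8, worker_threads=workers)
    print(f"{workers:2d} workers -> render thread gets {share:.0%} of a CPU")
```

Once the pool outnumbers the CPUs, every extra worker at equal priority shrinks the slice left for the thread feeding the GPU.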

It is beyond weird for a commercially licensed engine to make this mistake, but it is a quirk entirely caused by the basketcase Unity engine. Unity is always doing crazy things that leave me scratching my head.

I have outlined a couple of possible paths to get to an official fix and have been impressed with how professional Obsidian has been. An official fix should be doable by them and Unity really needs a good slap on the wrist from a high profile developer.


Originally Posted by Lucky Day View Post
But it does remind me of my brief internship at LLNL, where they explained that going from the 108,000-processor machine to the 219,000-processor machine wasn't so simple because any small lag can become an exponential problem.

I don't remember if Unity is open source and if just anyone can implement solutions for these kinds of issues.
Unity is a victim of its own widespread success. It does very little that doesn't apply universally to game consoles, mobile phones and PC. If any of these devices has special design considerations, Unity's going to casually gloss right over them. But that's sometimes desirable.

Kaldolek

Traveler

#29

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)


July 17th, 2018, 01:58
Originally Posted by Kaldolek View Post
Let me start by pointing out your understanding of SMT leaves a little bit to be desired. I am not criticizing you, just letting you know that if you view physical and logical cores as having the same set of scheduling challenges, then you already killed your software's performance. You would not be alone though, many games perform worse with SMT enabled because the developers do not fully appreciate how it works.
Sorry, from your post here I still don't buy that you fully understand what HT (SMT if you want to call it that) is all about. Yes, of course each physical core doesn't have multiple independent pipelines for HT purposes, but you still get more work done in the same period of time by scheduling 2 fulltime threads on that physical core, if it's got 2 logical cores. That's the whole point of it. The amount of work you get done on each of the 2 logical cores will be less, per period of time (compared to just running one thread on that physical core), but the total work done at the end of any given period of time will be more. Which one of those makes more sense for your purposes depends on how you've engineered everything. And yes, as you posted you might need to worry about data locality etc but the OS does its best effort to handle that for you, same as it does in NUMA situations.

Also, this has very little (probably nothing) to do with the PoE2/Unity issue anyway. That should be kinda obvious since people have found that they get better performance when the thread count is below even the number of physical cores they have, much less including the logical cores. So there's really just something fundamentally broken with the threaded work that Unity is doing.

Stingray

SasqWatch
Original Sin 1 & 2 Donor

#30

Join Date: Sep 2007
Posts: 1,590
Mentioned: 2 Post(s)


July 17th, 2018, 04:40
Originally Posted by Stingray View Post
Sorry, from your post here I still don't buy that you fully understand what HT (SMT if you want to call it that) is all about. Yes, of course each physical core doesn't have multiple independent pipelines for HT purposes, but you still get more work done in the same period of time by scheduling 2 fulltime threads on that physical core, if it's got 2 logical cores. That's the whole point of it. The amount of work you get done on each of the 2 logical cores will be less, per period of time (compared to just running one thread on that physical core), but the total work done at the end of any given period of time will be more. Which one of those makes more sense for your purposes depends on how you've engineered everything. And yes, as you posted you might need to worry about data locality etc but the OS does its best effort to handle that for you, same as it does in NUMA situations.
What makes you think that a pool of threads that spends most of its measured CPU time going into and out of kernel-mode to acquire the same lock has any runnable threads that can be switched to if you have more threads than locks?

We're in agreement that SMT is designed to use otherwise unused processor resources, but I don't understand why you think two related tasks, both waiting for a job to open from the same pool of jobs are even remotely viable candidates for SMT?

Unless they're using a spinlock (tsk, tsk), the scheduler's going to see that the entire job pool is blocked because something is either adding a new job or removing one, and the scheduler can't activate any of the threads in the pool at this point. So, what happens? Faced with no runnable threads in your pool that aren't already running, it switches to a completely unrelated task. You added an excessive number of threads to your pool but got nothing but increased locking overhead to show for it.
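
A tiny Python sketch of that point: however many threads you add, only one of them is ever inside a section guarded by a single lock. `max_threads_inside_lock` is a made-up helper, and Python threads are only a stand-in for native ones here.

```python
import threading


def max_threads_inside_lock(num_threads, rounds=200):
    """Return the highest number of threads ever inside the critical section."""
    lock = threading.Lock()
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        for _ in range(rounds):
            with lock:               # every worker needs the same lock
                inside += 1
                peak = max(peak, inside)
                inside -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(max_threads_inside_lock(4))    # 1
print(max_threads_inside_lock(16))   # still 1
```

Going from 4 to 16 threads changes how many are waiting, not how many are working.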

I feel like a complete ass right now arguing with you over this, but you're not entirely understanding some really key concepts.

Queue up 16 lines at your bank the next time you're there. Tell 8 tellers to go to lunch. If you really do this, you will find a bunch of confused customers who don't know who gets the next teller whenever one customer finishes. That about sums up what happens in a game engine as well, even with SMT. You increase thread sync. overhead and don't finish any task quicker than you would have otherwise.

I've been doing this a long time and there aren't any engines off the top of my head that will take every available logical processor and lump them all into a single job pool like this. In practice, SMT always favors an engine that has broken stuff into distinct tasks and keeps the number of threads well below … this.

Is this making any sense or am I just annoying a bunch of people for no reason? I'll definitely stop if I'm doing the latter.

Kaldolek

Traveler

#31

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)

July 17th, 2018, 04:58
I don't know the first thing about what PoE2/Unity does with its various threads, so I'll defer to you on all that, and it wasn't what I had a beef with to begin with.

Originally Posted by Kaldolek View Post
Queue up 16 lines at your bank the next time you're there. Tell 8 tellers to go to lunch. If you really do this, you will find a bunch of confused customers who don't know who gets the next teller whenever one customer finishes. That about sums up what happens in a game engine as well, even with SMT. You increase thread sync. overhead and don't finish any task quicker than you would have otherwise.
I still feel like you don't understand how HT works. While you are right about overhead, it doesn't overcome the benefits you can still gain from HT.

A better analogy for HT would be this:
- 16 lines at the bank
- 8 tellers handle 2 lines apiece. They can switch between their two lines while waiting on a customer to fill out paperwork, or waiting on approval for a transaction, etc.

Now obviously, this setup isn't going to get as much total work done overall as if you had 16 tellers, each handling 1 line. But you're still getting a little more work done than if you had only 8 lines and each teller handling 1 person. That's HT.

And I can only assume this must be the part you're missing… The reason it can achieve this is because it can put instructions from 2 threads into its pipeline at the same time, so in a way, contrary to what you said somewhere (here or Steam), it does, in fact, allow a physical core to execute 2 threads "simultaneously", they are both mixed into the pipeline simultaneously. Assuming you call that simultaneously anyway - I don't know why you wouldn't. When not using HT, you normally have a lot of empty slots in your pipeline, which is why HT allows more overall work to be done - when you are running 2 threads simultaneously in the pipeline, you end up with less empty slots.
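
Here's a toy Python model of those "empty slots", just to put numbers on it. It assumes a core that can issue at most `WIDTH` instructions per cycle and two threads with made-up per-cycle demand; real pipelines share far more than issue slots, so treat it as a cartoon.

```python
def slots_used(width, demand_a, demand_b=None):
    """Count issue slots filled over a run of cycles.

    demand_x[i] is how many instructions thread x has ready in cycle i.
    Toy model: a core can issue at most `width` instructions per cycle,
    and with SMT the second thread fills slots the first leaves empty.
    """
    if demand_b is None:
        demand_b = [0] * len(demand_a)
    return sum(min(width, a + b) for a, b in zip(demand_a, demand_b))


WIDTH = 4
thread_a = [4, 1, 0, 3, 0, 2, 4, 1]   # made-up demand; zeros are stalls
thread_b = [2, 3, 4, 0, 3, 1, 1, 4]

alone_a = slots_used(WIDTH, thread_a)             # 15 slots in 8 cycles
alone_b = slots_used(WIDTH, thread_b)             # 18 slots in 8 cycles
together = slots_used(WIDTH, thread_a, thread_b)  # 29 slots in 8 cycles

print(alone_a, alone_b, together)
```

In the same 8 cycles the shared core retires more than either thread alone (29 vs. 15 or 18), but less than the 33 you'd get from two full cores.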

With what you are saying, I guess I don't even understand what you think the actual point of HT is?

Originally Posted by Kaldolek View Post
Is this making any sense or am I just annoying a bunch of people for no reason? I'll definitely stop if I'm doing the latter.
Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.
Last edited by Stingray; July 17th, 2018 at 05:13.

Stingray

SasqWatch
Original Sin 1 & 2 Donor

#32

Join Date: Sep 2007
Posts: 1,590
Mentioned: 2 Post(s)


July 17th, 2018, 18:45
Originally Posted by Stingray View Post
Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.
I'm enjoying this.

Can a Unity game be optimized to do this? What you guys are saying is it's clearly not happening with PoE2 in general, but is it a fault of the Unity engine?

This is the kind of reason everyone doesn't just do development for multicores. The Von Neumann architecture is so much easier to work with and never has race conditions.

On the "waiting in line" analogy - are they doing something while waiting in line or are they stuck until the teller unlocks their kiosk?
--
Developer of The Wizard's Grave Android game. Discussion Thread:
http://www.rpgwatch.com/forums/showthread.php?t=22520

Lucky Day

Daywatch

#33

Join Date: Oct 2006
Location: The Uncanny Valley
Posts: 4,535
Mentioned: 4 Post(s)


July 18th, 2018, 00:19
Originally Posted by Lucky Day View Post
Can a Unity game be optimized to do this? What you guys are saying is it's clearly not happening with PoE2 in general, but is it a fault of the Unity engine?
The problem seems to be that PoE2/Unity performance actually gets worse with more threads (beyond a certain number), even when you have a full physical core to run each thread on, without even bringing HT (HyperThreading) into the equation. If that's true (and it sounds like it is) then someone really screwed up - that's ridiculous. If they can't make efficient use of more threads beyond a certain number, obviously they shouldn't have started more. Would have to get that right before you could even talk about getting anything out of HT.
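
To show how "more threads" can end up slower, here's a toy cost model in Python. The three constants are invented, and the assumption that synchronization overhead grows linearly with pool size is mine, not a measurement of PoE2 or Unity.

```python
def frame_time(threads, serial=2.0, parallel=12.0, overhead_per_thread=0.25):
    """Toy cost model in milliseconds; all three constants are invented.

    serial              work that cannot be split up
    parallel            work divided evenly across the pool
    overhead_per_thread synchronization cost that grows with pool size
    """
    return serial + parallel / threads + overhead_per_thread * threads


best = min(range(1, 33), key=frame_time)
for n in (1, 2, 4, best, 16, 32):
    print(f"{n:2d} threads -> {frame_time(n):5.2f} ms")
```

With these made-up numbers the sweet spot is 7 threads, and 16 or 32 threads are slower than that, which is the shape of the behaviour people are reporting.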

Originally Posted by Lucky Day View Post
On the "waiting in line" analogy - are they doing something while waiting in line or are they stuck until the teller unlocks their kiosk?
HT runs 2 threads simultaneously, mixing instructions from both threads into the core's pipeline. I suppose the equivalent in my bank analogy would be that there's constantly some level of activity going on with the person at the front of both of the teller's lines, but they're both moving along slower than a single person would, if said teller had devoted his full attention to one.

But that's just looking at a small piece of the puzzle. In a modern OS, the OS's scheduler is going to be swapping around what threads/processes the cores are working on, on a quite frequent basis. Your system has dozens or hundreds of threads running, and everything needs a share of time eventually. Bank analogy doesn't work at all once you zoom out to that level.

Stingray

SasqWatch
Original Sin 1 & 2 Donor

#34

Join Date: Sep 2007
Posts: 1,590
Mentioned: 2 Post(s)


July 18th, 2018, 20:06
Originally Posted by Stingray View Post
I don't know the first thing about what PoE2/Unity does with its various threads, so I'll defer to you on all that, and it wasn't what I had a beef with to begin with.


I still feel like you don't understand how HT works. While you are right about overhead, it doesn't overcome the benefits you can still gain from HT.

A better analogy for HT would be this:
- 16 lines at the bank
- 8 tellers handle 2 lines apiece. They can switch between their two lines while waiting on a customer to fill out paperwork, or waiting on approval for a transaction, etc.

Now obviously, this setup isn't going to get as much total work done overall as if you had 16 tellers, each handling 1 line. But you're still getting a little more work done than if you had only 8 lines and each teller handling 1 person. That's HT.

And I can only assume this must be the part you're missing… The reason it can achieve this is because it can put instructions from 2 threads into its pipeline at the same time, so in a way, contrary to what you said somewhere (here or Steam), it does, in fact, allow a physical core to execute 2 threads "simultaneously", they are both mixed into the pipeline simultaneously. Assuming you call that simultaneously anyway - I don't know why you wouldn't. When not using HT, you normally have a lot of empty slots in your pipeline, which is why HT allows more overall work to be done - when you are running 2 threads simultaneously in the pipeline, you end up with less empty slots.

With what you are saying, I guess I don't even understand what you think the actual point of HT is?


Well it's sort of off-topic, so yeah we probably shouldn't go on too much longer.
Yes, I have no idea where you ever got the idea that coarse-grained concurrency problems involving lock contention are what SMT is designed for. Go ahead and spin up 16 threads that spend almost all of their runtime acquiring and releasing a mutex.

SMT only works when you have threads that can actually run. It solves pipeline blips, you need either two unrelated tasks that do not compete for the same resources or you need a much smarter way of pulling jobs off the queue and pipelining them to get a thread and its logical sibling working together. It's not going to solve a thread that is making a real mess synchronizing itself with other threads.

You want less of a thread that does this, never more. That's why 1. you don't just double the size of every thread pool in your system when you discover it's running on an SMT system and 2. I reduced the number of threads and got a 300% performance boost. I don't really know what you think SMT does, but this ain't the proper use-case as I plainly told you to begin with.

Moreover, spawning upwards of 48 threads is going to also cause massive contention for heap allocation in a lot of DLLs that happen to be loaded into your software. You cannot assume that these DLLs all use a private heap. There are numerous reasons you don't just jack thread count sky high when you can't even accomplish rudimentary concurrency without starving threads needed to give the GPU commands.

I don't know how C#'s memory allocator behaves these days, but it wasn't particularly good at compartmentalizing allocations per-thread 15 years ago, which is something you're going to need if you want to start quadrupling your thread count for no apparent reason. You need a whole bunch of address space spread as far apart as possible or these threads will destroy any hope of finishing very simple tasks (such as string manipulation) in a predictable amount of time. It's not a pretty subject and one game developers should stay far away from. The kind of allocator you need for massive concurrency also wastes massive amounts of memory.
Last edited by Kaldolek; July 18th, 2018 at 20:19.

Kaldolek

Traveler

#35

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)


July 18th, 2018, 20:23
Originally Posted by Kaldolek View Post
Yes, I have no idea where you ever got the idea that coarse-grained concurrency problems involving lock contention are what SMT is designed for. Go ahead and spin up 16 threads that spend almost all of their runtime acquiring and releasing a mutex.
That's probably because I never had that idea, and never said anything of the sort. Go back and check, if you want. What initially caused you to freak and register an account on this forum, was that I commented here on how you don't understand HT, when you said (on Steam) that a single physical core can't do work on 2 threads "simultaneously" with HT. That's just wrong. Properly designed software most definitely can, and does. If you just refuse to believe that, then I guess we're done here because I don't really have anymore to say and I doubt anyone here cares.

Originally Posted by Kaldolek View Post
2. I reduced the number of threads and got a 300% performance boost
Yeah, and the performance improvements you got obviously have nothing to do with HT because you are getting further performance improvements even when reducing the number of threads down below the number of physical cores on the system, much less the total number of logical cores. I don't understand how this isn't obvious to you, but you go off on some HT/SMT rant on Steam anyway.

Stingray

SasqWatch
Original Sin 1 & 2 Donor

#36

Join Date: Sep 2007
Posts: 1,590
Mentioned: 2 Post(s)


July 18th, 2018, 20:57
I wanted to correct something here. First, I have a great deal of understanding of multi-threaded software, as I have been writing it for 20+ years. Fundamentally, I have to ask: what is Pillars doing that would require keeping 16 (for example) threads constantly busy? The last piece of software I worked on had quite a bit of complexity, and while we routinely allocated 320 threads in our pool (though by design some ran at higher priority than others), we rarely ran into a starvation issue. Some of our threads were blocked on I/O, some handled CPU-intensive tasks, and others sat idle waiting for work.

Given the speed of modern CPUs, the locking mechanism required for the semaphore needed to implement the thread pool should cost well below 0.1% of the total computational cost. In a micro sense, compared to regular computation, locks are very expensive, especially given the impact on the processor pipeline (prefetch et al.). I just don't buy that Pillars' computational requirements are causing those threads to be kept busy.

I'm more inclined to believe that either something is spinning or they have a polling design rather than an event-based design, which causes the threads to do infinite work. An example would be an NPC that sits there and constantly checks for an encounter, rather than either triggering the NPC encounter when a specific event occurs (preferred for performance) or using a 'tick' to update the state of the system (probably good enough for a game, but not very multi-tasking friendly if the 'ticks' are small). I can speculate all day long about why 16 threads would result in starvation, but to be honest even that shouldn't occur if they have equal priority. Modern CPUs are FAST, and the game (or game engine) would have to be horribly unoptimized to create an issue here.
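
To illustrate the polling vs. event-based difference, here's a minimal Python sketch. `polling_checks` and `EventBus` are hypothetical helpers invented for this post, and I have no idea whether Pillars actually does either.

```python
def polling_checks(ticks, encounter_tick):
    """NPC checks for the encounter on every tick until it happens."""
    checks = 0
    for tick in range(ticks):
        checks += 1
        if tick == encounter_tick:
            break
    return checks


class EventBus:
    """Minimal publish/subscribe helper (hypothetical, for illustration)."""

    def __init__(self):
        self.handlers = {}

    def subscribe(self, name, handler):
        self.handlers.setdefault(name, []).append(handler)

    def publish(self, name):
        for handler in self.handlers.get(name, []):
            handler()


calls = []
bus = EventBus()
bus.subscribe("encounter", lambda: calls.append("npc reacts"))
bus.publish("encounter")          # the NPC runs once, when it matters

print(polling_checks(ticks=10_000, encounter_tick=6_000))  # 6001 checks
print(len(calls))                                          # 1 handler call
```

Multiply the polling number by every NPC on the map and every frame, and a pool of threads can stay busy without accomplishing much.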
-
Anyway, I believe you that reducing the threads improves performance, but I don't believe the explanation as to why (or, more precisely, while I might agree there is starvation, I think the actual cause of that starvation is not well understood).
--
Btw, did you (or anyone who has played Pillars 2) actually confirm that the threads are consuming 100% of available CPU across all processors? An alternative explanation could be that, due to bad locking, the game reaches a state where some threads are blocked for a long period of time, which makes it appear to be a starvation issue (well, it is sort of a starvation issue, but it isn't due to competing computation). In the software I wrote we used locks liberally, but the time a lock was held ('cept when waiting for new work) was typically sub-millisecond.
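
The distinction I mean, blocked vs. actually burning CPU, can be sketched in Python by comparing CPU time to wall-clock time. This is only an illustration of the idea: `time.process_time()` counts CPU for the whole process, and for a real game you would look at per-thread numbers in a profiler instead. `blocked` and `busy` are stand-ins I made up.

```python
import time


def cpu_to_wall_ratio(fn):
    """Run fn and return (CPU seconds used by this process) / (wall seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def blocked():
    time.sleep(0.2)          # stands in for a thread waiting on a lock


def busy():
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass                 # stands in for a spinning thread


print(f"blocked: {cpu_to_wall_ratio(blocked):.2f}")  # close to 0
print(f"busy:    {cpu_to_wall_ratio(busy):.2f}")     # close to 1 on an idle machine
```

If the worker threads look like `blocked`, the problem is locking; if they look like `busy`, something is spinning or genuinely computing.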

I'll get around to playing pillars 2 in the fall and at least make an attempt then to check if the program is actually saturating the cpu. While many games I play have saturated the gpu very few use much cpu time - with strategy games generally using the most.

Originally Posted by Kaldolek View Post
Kaldaien here: Have I mentioned I hate juggling accounts? I never have access to them when most of these sites force me to change my password constantly. So I really do prefer it if you keep any discussion with me to the Steam forums.
Last edited by you; July 18th, 2018 at 21:12.

you

Lazy_dog
RPGWatch Donor
Original Sin 2 Donor

#37

Join Date: Oct 2006
Location: usa - boston
Posts: 5,251
Mentioned: 23 Post(s)


July 18th, 2018, 21:16
Originally Posted by you View Post
I wanted to correct something here. First, I have a great deal of understanding of multi-threaded software, as I have been writing it for 20+ years. Fundamentally, I have to ask: what is Pillars doing that would require keeping 16 (for example) threads constantly busy? The last piece of software I worked on had quite a bit of complexity, and while we routinely allocated 320 threads in our pool (though by design some ran at higher priority than others), we rarely ran into a starvation issue. Some of our threads were blocked on I/O, some handled CPU-intensive tasks, and others sat idle waiting for work. Given the speed of modern CPUs, the locking mechanism required for the semaphore needed to implement the thread pool should cost well below 0.1% of the total computational cost. In a micro sense, compared to regular computation, locks are very expensive, especially given the impact on the processor pipeline (prefetch et al.). I just don't buy that Pillars' computational requirements are causing those threads to be kept busy. I'm more inclined to believe that either something is spinning or they have a polling design rather than an event-based design, which causes the threads to do infinite work. An example would be an NPC that sits there and constantly checks for an encounter, rather than either triggering the NPC encounter when a specific event occurs (preferred for performance) or using a 'tick' to update the state of the system (probably good enough for a game, but not very multi-tasking friendly if the 'ticks' are small). I can speculate all day long about why 16 threads would result in starvation, but to be honest even that shouldn't occur if they have equal priority. Modern CPUs are FAST, and the game (or game engine) would have to be horribly unoptimized to create an issue here.
-
Anyway I believe you that reducing the threads improve performance but I don't believe the explanation as to why (er more precisely while I might agree there is starvation - I think the actual cause of that starvation is not well understood).
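To make the polling-versus-event distinction concrete, here is a minimal sketch. Every name in it (PollingNpc, EventNpc, simulate) is made up for illustration; nothing here is taken from Pillars or from Unity, and it only counts how often each design does work.

```python
# Hypothetical illustration only: these classes do not come from the game or the engine.

class PollingNpc:
    """Checks for an encounter on every tick, whether or not anything changed."""

    def __init__(self):
        self.checks = 0
        self.triggered = False

    def tick(self, player_in_range):
        self.checks += 1
        if player_in_range:
            self.triggered = True


class EventNpc:
    """Does no work until it is told that the player entered range."""

    def __init__(self):
        self.checks = 0
        self.triggered = False

    def on_player_entered(self):
        self.checks += 1
        self.triggered = True


def simulate(ticks=1000, enter_at=900):
    """Run both designs over the same timeline and return (polling_checks, event_checks)."""
    polling, event = PollingNpc(), EventNpc()
    for t in range(ticks):
        polling.tick(player_in_range=(t >= enter_at))
        if t == enter_at:
            event.on_player_entered()
    return polling.checks, event.checks
```

With the defaults, the polling NPC does work on all 1000 ticks while the event-driven one does work once. Multiply that by every NPC on a map and it is easy to see how a polling design could keep worker threads busy without much real computation behind it.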
C# in general is horribly slow. This game crosses C++ and C# boundaries frequently because the real engine underneath all the C# fluff is written in a different language. So you get all the joy of having one language marshal memory layout from one ABI to the other over and over any time you do anything non-trivial.

Unity's been working that way for a long time though, so it's not as efficient as it could be, but it's also not the end of the world.

However, one thing Unity does that very few other engines do, is run its message loop for Win32 on a separate thread from the one that handles the D3D11 swapchain. This is the reason Unity cannot safely enable Fullscreen Exclusive mode by default. You can force it on, but it's likely you will deadlock if you Alt+Tab out because DXGI doesn't particularly like what Unity does.

As if that weren't bad enough, the message pump in some of these games is known to make calls to Sleep (…) rather than MsgWaitFor…. (…). When that alone does not deadlock your software, it will cause window events to pile up and then you sometimes get a flood of messages that have to be handled before the next frame is drawn and you can't meet that deadline -- so you hitch.
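A toy model of why a sleeping pump floods, for anyone who wants to see the numbers. This is a simulation under assumptions, not Unity's or Win32's actual behavior: message arrival times are given in milliseconds, the sleeping pump wakes on a fixed interval and drains whatever piled up, and the blocking pump wakes once per arrival time. The function names are hypothetical.

```python
from collections import Counter


def sleeping_pump(arrivals_ms, sleep_ms):
    """Wake every sleep_ms and drain the backlog; return the size of each non-empty batch."""
    if sleep_ms <= 0:
        raise ValueError("sleep_ms must be positive")
    pending = sorted(arrivals_ms)
    batches = []
    i = 0
    now = 0
    while i < len(pending):
        now += sleep_ms  # the pump is asleep and sees nothing until this moment
        drained = 0
        while i < len(pending) and pending[i] <= now:
            i += 1
            drained += 1
        if drained:
            batches.append(drained)
    return batches


def blocking_pump(arrivals_ms):
    """Wake as soon as something arrives; only messages with the same timestamp batch together."""
    counts = Counter(arrivals_ms)
    return [counts[t] for t in sorted(counts)]
```

For ten messages arriving at 1 ms to 10 ms plus one at 40 ms, `sleeping_pump(..., 16)` handles a batch of ten in one go and then a batch of one, while `blocking_pump(...)` never handles more than one at a time. The batch of ten is the flood that has to be cleared before the next frame can be drawn.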

Unity is already doing really unsafe things with its threads before you go and add way more of them than the software can keep running. It's why the swapchain thread just up and dies for several frames unless you cut down the number of workers.

These things just don't ever stop with Unity. About the only thing I've found with Unity that doesn't need fixing is its input management. Unreal is terrible at that but good at just about everything else.

Kaldolek

Traveler

#38

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)

Default 

July 18th, 2018, 21:20
This thread certainly took an unpredictable turn. How refreshing.

Darth Tagnan

SasqWatch

#39

Join Date: Jun 2018
Posts: 1,620
Mentioned: 17 Post(s)

Default 

July 18th, 2018, 21:29
Originally Posted by Stingray View Post
That's probably because I never had that idea, and never said anything of the sort. Go back and check, if you want. What initially caused you to freak and register an account on this forum was that I commented here on how you don't understand HT, when you said (on Steam) that a single physical core can't do work on 2 threads "simultaneously" with HT. That's just wrong. Properly designed software most definitely can, and does. If you just refuse to believe that, then I guess we're done here, because I don't really have any more to say and I doubt anyone here cares.


Yeah, and the performance improvements you got obviously have nothing to do with HT because you are getting further performance improvements even when reducing the number of threads down below the number of physical cores on the system, much less the total number of logical cores. I don't understand how this isn't obvious to you, but you go off on some HT/SMT rant on Steam anyway.
No, I can say with a great degree of certainty that it is caused by SMT. I think you're just bitter at this point because you didn't read what I wrote initially and now you have no way out that doesn't make you look foolish.

If you turn SMT off, the number of logical processors is cut in half. Various task pools shrink in size and the engine stops overestimating its ability to run stuff in parallel.

I've dealt with many engines that are actually profiled for concurrency and don't keep bloating the number of threads they want to schedule beyond their means. About the only saving grace here is that the way they count the number of processors stops working after 32. So 10 years from now, when we all have 64-core systems, the engine's going to stop spawning threads after 32. You'll have hundreds of threads allocated, but it'll at least top out there.

Other engines would have stopped at about 8.
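As a rough sketch of what "stopping at about 8" could look like: the function below is hypothetical, assumes 2-way SMT (two logical processors per physical core), and uses 8 as the cap only because that is the figure mentioned above. It is not how Unity or any particular engine sizes its pools.

```python
def worker_count(logical_processors, smt_enabled, cap=8):
    """Pick a worker count from physical cores, clamped to a fixed cap.

    Assumes 2-way SMT when smt_enabled is True; the default cap of 8 is an
    illustrative value, not a number taken from any engine.
    """
    threads_per_core = 2 if smt_enabled else 1
    physical_cores = max(1, logical_processors // threads_per_core)
    return max(1, min(physical_cores, cap))
```

On an 8-core/16-thread CPU this gives 8 workers with SMT on and still 8 with SMT off, and a 64-core system tops out at 8 as well. An engine that sizes its pools straight off the logical processor count would instead ask for 16 on that first machine, and turning SMT off halves that to 8.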

Kaldolek

Traveler

#40

Join Date: Jul 2018
Posts: 7
Mentioned: 0 Post(s)