Stats stats stats and stats ! NC hardmode version 2

Discussion in 'PlanetSide 2 Gameplay Discussion' started by CoreCombat, Oct 22, 2013.

  1. Xae

  2. Aegie

    Yeah, I knew that and I'm sorry, that was not directed at you personally- I'm just sick of all of this really.
    • Up x 1
  3. Aegie

    If KPU was a bad metric for comparing weapons then it is a remarkable fact that weapons that have carbon copy stats perform so equally according to KPU.
  4. CDN_Wolvie

    Thanks. I did as you asked: I took a look at the KPU of the -P NS weapons, and I'm not seeing a reasonable explanation of why KPU is an insufficient metric.

    Is this meant to suggest that KPU is insufficient because the KPU of the -P variant differs from that of the other NS weapon even though there is no weapon stat difference? I don't think this shows what you think it shows. Since they have essentially the same weapon stats, their values should probably be totaled together to show the true average for the entire game.

    For example: NS-11P and NS-11A can reasonably have their total kills and unique users added together, so the NS-11A(P) KPU is (6924 + 24942) / (721 + 1972) = 31866 / 2693 = 11.83

    Correct me if I am wrong, but if you pull from the total aggregate of an object's use in the user base, you get a more accurate picture. I am sure Aegie will correct me if my understanding of this in statistics is inaccurate.
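
    The pooling in the NS-11A/NS-11P example above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and assigning 6924 kills / 721 users to the NS-11P (and the rest to the NS-11A) is an assumption based on the order the numbers were written in.

```python
def pooled_kpu(groups):
    """Pooled kills-per-user for a list of (kills, unique_users) pairs.

    Totals are summed first and divided once, so each variant is
    weighted by how many unique users it has.
    """
    total_kills = sum(kills for kills, _ in groups)
    total_users = sum(users for _, users in groups)
    return total_kills / total_users


# Assumed mapping of the quoted totals to the two variants.
ns11p = (6924, 721)
ns11a = (24942, 1972)

print(round(pooled_kpu([ns11p, ns11a]), 2))  # 11.83
```

    Note this is not the same as averaging the two per-variant KPUs, which would give the smaller -P group the same weight as the much larger base group.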
  5. Xae

    Except they don't.
    Harasser Kobalt: 60% Variance
    100mm HE: 31% Variance
    MBT Halberd: 27% Variance
    NS15-M/MP: 24% Variance
    NS15-C/CP: 37% Variance


    Hilariously by your KPU Metric: The Phoenix is the best ESRL.
  6. Xae

    Don't add them together.
    The guns are identical, yet have vastly different KPU.

    If KPU was a measure of balance they would be extremely similar.
  7. CDN_Wolvie

    No, that isn't what it is showing. It's showing that a subset of the total users of that object's weapon stats has a variance in possible outcomes. The larger the number of users you pull from (in the case of adding them together, the total), the closer the result gets to the true average.

    http://en.wikipedia.org/wiki/Law_of_large_numbers

    So the smaller the -P subset of unique users for an NS weapon is, the more likely you are to see extreme deviations from the KPU of the larger group of users of that NS weapon. This is to be expected and does not invalidate the use of KPU to understand the average kill performance of a weapon for all the users of that weapon in any of the possible PS2 scenarios a player finds themselves in (variance in ranges of engagements, shooting first or second, outnumbered or zerging, etc).
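
    A quick simulation makes the same point. It is a sketch under made-up assumptions: every user's kill count is drawn from the same exponential distribution with a mean of 12 (a hypothetical choice, not real PS2 data), and the only thing that changes is how many users are in the group.

```python
import random
import statistics


def kpu_spread(n_users, trials=200, seed=0):
    """Standard deviation of the group KPU across repeated random groups.

    Every simulated user comes from the same distribution, so any
    difference in spread is caused by group size alone.
    """
    rng = random.Random(seed)
    kpus = []
    for _ in range(trials):
        # Hypothetical per-user kills: exponential with mean 12.
        kills = [rng.expovariate(1 / 12.0) for _ in range(n_users)]
        kpus.append(sum(kills) / n_users)
    return statistics.pstdev(kpus)


small_group = kpu_spread(50)
large_group = kpu_spread(2000)
print(small_group, large_group)
```

    The small group's KPU should swing far more from trial to trial than the large group's, even though the "weapon" is identical in both cases.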
    • Up x 1
  8. Aegie

    Link to those stats?

    I see TR Kobalt MBT: 2.83; VS: 3.16; NC: 3.31; Remarkably close and probably very similar to the within-weapon variance over time within any of the factions.

    TR Halberd MBT: 4.16; VS: 5.73; NC: 5.10; A bit of a difference to be sure but still nothing extreme or very dissimilar to the within-weapon variance over time within any of the factions.

    TR Ranger MBT: 1.23; VS: 1.20; NC: 1.38; practically identical and well within the within-weapon variance over time within any of the factions.

    TR Walker MBT: 1.47; VS: 1.67; NC: 1.50; dang, again that is remarkably close and well within the within-weapon variance over time.

    The same is true for shotguns, the same is true for sniper rifles, the same is true for scout rifles...

    What is particularly interesting here is that VS suffer a 25m/s loss in velocity within shotguns for the no drop and they do not suffer any penalty for no drop in the scout rifles. I will give you one guess who excels with scout rifles and the bonus question will be whether or not the amount they excel in this class is stable across the weapons.

    According to kills I would put my money on the Phoenix being the best ESRL. No question about it- Phoenix front loads a lot of damage, comes more or less without warning, and can target any object. Why is that difficult to accept? Phoenix may not be the greatest at defending against vehicles but it sure is a lot more likely to get you the kill than a Striker. How does this not serve as support for the metric and instead counters it? What, were you expecting me to try and defend the Phoenix for some reason and make some BS claim about how it is trash at killing people despite evidence to the contrary?

    Is KPU perfect? Absolutely not. Is it worthless? Nope. It is further evidence that converges with all kinds of other evidence and together all the evidence paints a very clear picture.
    • Up x 3
  9. Herby20

    No, they shouldn't. The differences on the carbines make them excel in different scenarios. The ACX-11 is clearly designed for engagements at longer range. The Jaguar and Serpent are not. In fact, they are the complete opposite; they are designed for close range engagements.
    • Up x 2
  10. Aegie

    You pretty much nailed it in your previous response- part of the issue is that there will indeed be variance within-weapon across a number of different areas.

    Moreover, finding that there is one or even a few cases where there is what appears to be significant variance outside of what is expected is, actually, not unexpected at all.

    So, for instance, when we do significance testing and we say "there is a 95% probability that" this is saying that there is a 5% probability that we could see the results we are seeing even if the relationship does not hold in the population. In certain cases, we will do an ANOVA (analysis of variance). If we have more than two groups that we are comparing and we want to test for significant difference in, say, height among people with blue eyes, brown eyes, and green eyes, then the issue arises in that we have an omnibus F test that tells us whether any significant differences are likely to exist somewhere in these three groups. In order to know where this difference lies (i.e. between green and blue, green and brown, etc.) we perform what are called "post hoc" comparison of means. This is a fancy way of saying we will perform a regular t-test (a test of mean difference between two groups) several times.

    Problem is, since this is one sample and we need to run more than one significance test, we need to adjust our significance tests to ensure that the overall probability of making a type I error (accepting a false hypothesis, i.e. a false positive) is still only 5%. The simplest way we do this is the "Bonferroni" method, which is essentially just saying that if you want the overall probability of making a type I error to be at or below 5% then you take this p value (.05) and divide by the number of tests you must perform (i.e. .05/x = y, where x is the number of tests and y is the resultant threshold each test's p value must fall below to achieve significance). The reason we do this is that we know there is always a possibility that the hypothesis may be correct despite our data not finding evidence of a difference, and vice versa. In science we try to stack the deck very strongly against type I errors (accepting a false hypothesis) and accept a greater risk of type II errors (rejecting a correct hypothesis).
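
    As a minimal sketch of the .05/x = y adjustment described above (the function names and the three p-values are made up for illustration, standing in for the three eye-color comparisons):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant once the threshold is adjusted."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values for blue-vs-brown, blue-vs-green, brown-vs-green.
p_values = [0.010, 0.030, 0.200]
print(bonferroni_threshold(0.05, len(p_values)))  # about 0.0167
print(significant_after_bonferroni(p_values))     # [True, False, False]
```

    The second comparison (p = .03) would pass an unadjusted .05 cutoff but fails once the cutoff is split across three tests.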

    For example, when we say "95% likelihood" what we really mean is that were we to take 100 other independent samples conducted identically within the population then we would expect to find this result (find a mean value within our confidence interval, etc.) in 95 out of 100 of those cases. Ergo, the more tests of the sample we conduct, the more likely we are to have one of our tests be one of these outliers. This is similar to the Rosencrantz and Guildenstern problem, where it is entirely possible to flip a coin an infinite number of times and have that coin land on heads every single time without violating any of the laws of probability.
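
    To put a rough number on "the more tests we conduct, the more likely one is an outlier": if each test has a 5% chance of a fluke result and the tests are independent (an assumption that real post hoc comparisons only approximately satisfy), the chance of at least one fluke grows quickly with the number of tests. The function name here is made up for illustration.

```python
def chance_of_any_fluke(alpha, n_tests):
    """Probability that at least one of n independent tests gives a fluke result."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20):
    print(n, round(chance_of_any_fluke(0.05, n), 3))
```

    With 20 such tests the chance of at least one fluke is well over half, which is why a single odd comparison in a long list is not surprising by itself.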

    All this really means is that in any case where you are collecting data there is always some possibility, however small, that a very real and perhaps even strong effect exists and yet you do not see mathematical evidence in the data.

    TLDR- One example where there appears to be a large discrepancy without accounting for within-weapon variance over time hardly discredits an entire metric or set of results. This is akin to having the "perfect" measure, finding one place where the NC is not underperforming, and then claiming "see here, the NC are not underperforming in this one place so clearly the hypothesis is invalidated".
    • Up x 1
  11. Aegie

    Yeah, pretty much this- for instance, just for giggles, look at the API data from this forum. Carbines, go find the GD-7F.

    Now, the range of KPU across these 7 days goes from 9.1 to 13.2- this is greater than the example that Xae is pointing out with the -P weapon variant. This is a big reason why you aggregate (simple average) across the days because it accounts for this within-weapon variance.
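
    A rough check of that comparison, using only numbers already quoted in this thread. It assumes the -P example in question is the NS-11 pair and that 6924 kills / 721 users belong to the NS-11P; both are assumptions, and the variant totals are not daily figures, so treat this as a ballpark comparison only.

```python
# Per-variant KPU from the totals quoted earlier in the thread
# (mapping of totals to variants is assumed).
ns11p_kpu = 6924 / 721
ns11a_kpu = 24942 / 1972

# Gap between the two "identical" variants, absolute and relative.
variant_gap = abs(ns11a_kpu - ns11p_kpu)
variant_gap_pct = variant_gap / min(ns11a_kpu, ns11p_kpu) * 100

# Day-to-day range quoted for the GD-7F.
gd7f_low, gd7f_high = 9.1, 13.2
daily_range = gd7f_high - gd7f_low
daily_range_pct = daily_range / gd7f_low * 100

print(round(variant_gap, 2), round(daily_range, 2))
print(round(variant_gap_pct), round(daily_range_pct))
```

    On these figures the gap between the two variants is smaller than the swing a single weapon shows from one day to the next.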

    This data is organized by day and includes two weeks- if you take a nice long look at the data you can probably pick out after a little while what days in the set are weekdays and what days are weekends. Does this mean that the weapons are better on weekends than weekdays? Of course not. Does this mean that KPU is worthless? Of course not, and actually this lends some validity to the measure because it is able to show us relatively clearly when people are playing for longer durations. The issue, as always with data analysis, is that Xae is correct to a certain extent in that we must understand the limitations we are dealing with and be comfortable with the fact that there is no such thing as "perfect" data. However, in understanding data analysis we are also able to not throw the baby out with the bathwater just because certain things may initially confound us. Like I said, KPU is not perfect, but just because there is a fairly regular increase in KPU during the weekends does not invalidate the metric because this increase tends to occur across the board (thanks to the law of large numbers). While the law of large numbers in no way prevents outliers, we are able to handle them and there are various ways that we have to address these issues.

    Bottom line- no amount of evidence, however perfect or imperfect, will ever be enough to sway someone's belief if they do not have an open mind. I mean, people still doubt evolution and think dinosaur bones were put here by the devil.
    • Up x 3
  12. Aegie

    Right, and the "close range" NC carbine still falls behind both the Jaguar and the Serpent by around 22%.

    You're also saying that it is perfectly acceptable for NC traits to result in garbage performance.

    Right, CQC weapons deserve to be better weapons and certain factions deserve to have the best of them.
    • Up x 2
  13. Xae

    721 Uniques on NS-11P
    1000 samples is large enough for a poll of the United States (population 300 million).


    That sample is large enough to prove the point. You are attempting to obfuscate by using terms you don't understand.
  14. DeadliestMoon

    People, stop worrying about the stats, it's not your job to balance the game. You are not devs!
  15. MurderBunneh

    After playing on nearly every server as NC I can tell you that Waterson holds the best and most organized, scary as that may sound to you.

    We don't have many high br players and our leaders have quit or switched factions. The NC is in complete disarray atm and it is because of balance. Not pop balance but almost exclusively tied to bad ES weapons balance.

    Here place a score by each ES system and tell me where they are in the faction pecking order.

    1. MAX: Most situational and least effective. We have basically the same TTK at close range now and are completely worthless at range.

    2. Reaver: Biggest hit box and hardest to fly for new players. Without new players, no new pilots.

    3. ACX-11: Terrible. It is supposed to be a mid-to-long-range gun but gets gutted by velocity and mag size. Compare it to the other ES carbines. They are not PENALIZED for trying to do what the guns are meant for.

    4. Phoenix: Situational, but I love it. But shooting long shots drops DPS to a level of non-threat to all but the injured. Our "long" shots can only be 300m, but still we are penalized for taking them with awful damage output. Not to mention leaving ourselves DEFENSEless against the growing number of LOLNINJA500 pro sub/infil players. Also the Striker is FAR more effective at clearing the skies, and the ground to a lesser extent. Air and armor superiority are the deciding factor in big battles and smaller ones too.

    5. DMR: See ACX-11.

    6. ES secondary weapons: Do I need to go here?
    • Up x 3
  16. Aegie

    No, the sample is not large enough to prove the point.

    If I were to flip a coin from now until the day I die and it were to land on heads every single time it would still not invalidate the use of statistics.

    Moreover, you are forgetting that there is a distinct difference between these two groups- the people that have the Platinum version are much more likely to consist of people who are far more dedicated to the game because at least some of them performed extra actions to get this special item. I would put money on the people who play with the Platinum version spending, on average, more time in game than others.

    You are also forgetting that the difference in performance between the -P variant weapon is smaller than the range of KPU obtained from, say, the GD-7F. Hence why we have more power when we aggregate these values over time because that helps adjust for day to day variance.

    Furthermore, if the metric was worthless then the probability that we would see such similar results across factions within weapons that are carbon copies is far more astounding than finding a few outliers here and there. In order to lend any credence to this objection you must present a meaningful argument as to why, randomly, these other weapons would produce such startlingly similar results across so many people and so much time.

    What you are attempting to do here is akin to finding someone who smoked 2 packs of cigarettes a day from age 10 to 100 and never got lung cancer and then claiming that this one person is proof that there is not a link between smoking and cancer.
    • Up x 2
  17. Chewy102

    Why are you trying to compare vehicle weapons to infantry weapons?

    With vehicles TR have MUCH better options for AV and AI. The Marauder makes a Kobalt look like a baby toy on a ******* Harasser. And the Vulcan is arguably better than a Halberd for MBTs depending on the loadout. Then you compare HE rounds: HE got kicked in the nuts and is still better in TR tanks from having 2 shots and about a 27-28% shorter reload.

    Also vehicle weapons are affected FAR more by infantry in the area than infantry weapons are by vehicles in the area. Lock-ons, C4, dumbfires, and AV turrets ruin the data because each faction's vehicles do not have to deal with the same AV weapons. Then you have the vehicles' anti-infantry abilities to deal with. Some vehicles are better than others no matter what weapons they have.

    If you want to get into a stats debate then you need to bring better data. For one, you don't bother to post a source; for another, you try to compare 4 weapon types (Harasser weapon, MBT main weapon, MBT 2nd weapon, and infantry weapons) over classes that might as well be their own species.

    I'm not the biggest fan of using a single data type but at least we are giving real data and not pulling **** out our *****. Evolve a bit and bring your A game.
  18. Aegie

    I suppose we should just join all the other people who make purely anecdotal arguments with no tangible evidence?

    Or have faith in the ability for the people in charge to make a great game given that they have such a stellar history of making excellent decisions?

    This sure reminds me a lot of the conversations that took place over the ridiculous initial flinch values.
    • Up x 1
  19. Aegie

    As it currently stands, the NC trait is high damage per shot kneecapped by all the other stats to result in generally worse performance nearly across the board when compared to the other factions.

    Saddest part, I have news for you, it has been this way pretty much since release.
    • Up x 2
  20. SquattingPig

    What happened to the thread from yesterday about NC repelling good players? That thread made a far more subtle and, IMO, correct argument than what's being discussed here.