4. Simple Skewness

Getting Started with Competitive Programming

(an NPTEL course)

Module 4. Simple Skewness

This lecture is based on the problem called “Simple Skewness” (8VC Venture Cup 2016 - Elimination Round, Codeforces; Problem E), and you can find a link to it here.

In this problem we are trying to find a subset which maximizes the difference between the mean and the median! You can find the explanation that this lecture is based on over at Petr’s blog.

We claim that for arrays of even size with non-negative skewness, you can remove the larger of the two middle elements to ensure that you get to an array with an odd number elements whose skewness is no less than in the original array. This can be shown with the following computation, which I skipped in class.

Recall that the simple skewness of a set of numbers is defined as the difference between the mean and the median of .

Claim. Let be a set of integers with non-negative simple-skewness, and let be the upper middle element, i.e, the -th element if the elements of are written in sorted order. We claim that the simple skewness of the set is at least the simple skewness of the original set .

Proof (version 1; from this comment).
Proof (version 2, in the search of a more visual/intuitive approach).

Remark. Do we really need the assumption that the simple skewness of is non-negative to begin with?

It turns out that yes, we do! Without this assumption the statement may not be true, here's a counter example shown by Harshil Mittal: and . Then but . However, it also turns out that a weaker version of the claim holds without the assumption: while possibly not the upper middle element, there always exists some element whose removal preserves (i.e, does not decrease) the simple skewness of the original set.

The argument for this can be made by an elegant application of the probabilistic method, as discovered by Sagar Kale.

The calculation is reproduced below, originally articulated on Twitter.

Let be the given collection of numbers in nondecreasing order; note that is even.

Let and Then the skewness is

Remove an element at random, then:

image

which is the original mean.

Also, since whenever the removed random element is from the first elements, the new median is and otherwise, we have that:

image

which is the original median. Since the skewness stays the same in expectation, there exists some element whose removal causes new skewness to be at least as much as the old one.

If you have a visual proof for the original claim, please leave it in a comment below!

The code discussed in the video can be found here.

super-embed:<div id="hyvor-talk-view"></div><script async defer type="text/javascript" src="//talk.hyvor.com/web-api/embed.js"></script>