Finding the median in big data

The file Integers holds 100,000 positive integers. We need to find the median, that is, the integer ranked 50,000th when the values are sorted in descending order.

Solution:

   A                                                    B                    C     D
1  =file("Integers")                                    =A1.cursor@b()       >a=0
2  =B1.groups@n(~.Integer\1000000+1:ID;count(~):Count)  =A2.sum(Count)
3  =A2.select@1((a=a+~.Count,a>B2/2))
4  =A1.cursor@b()
5  =A4.select(~.Integer\1000000+1==A3.ID)               =A5.sortx(-Integer)
6  =B5.select@1(#==ceil(a-B2/2))                        =A6.fetch()

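The first traversal counts the integers in each range of 1,000,000 values, then determines which range contains the median and the median's rank within that range. Because dividing the data into ranges produces only a small number of groups, the groups function can do the grouping and aggregation in memory. Since a holds the cumulative count up to and including the selected range, B2-a integers lie in higher ranges, so the median's rank within the range, in descending order, is a-B2/2. The second traversal retrieves only the integers in that range, sorts them, and picks the median. The data never has to be loaded all at once, so the method works for any amount of data.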
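For readers who do not use the grid, the same two-pass idea can be sketched in Python. This is only an illustration under stated assumptions, not a translation of the SPL functions: the name bucket_median and the open_stream parameter (a callable that returns a fresh iterator over the integers, standing in for reopening the cursor) are hypothetical, and the chosen range is assumed to be small enough to sort in memory.

```python
import math
from collections import Counter
from typing import Callable, Iterable

BUCKET_WIDTH = 1_000_000  # same range width as in the grid


def bucket_median(open_stream: Callable[[], Iterable[int]],
                  width: int = BUCKET_WIDTH) -> int:
    """Return the value ranked ceil(n/2) in descending order, using two passes.

    open_stream is a hypothetical stand-in for reopening the cursor: each call
    must return a fresh iterator over the same integers.
    """
    # Pass 1: count the values in each range; only the counters stay in memory.
    counts = Counter(v // width for v in open_stream())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data")

    # Walk the ranges in ascending order until the running count passes total/2.
    running = 0
    for bucket in sorted(counts):
        running += counts[bucket]
        if running > total / 2:
            break

    # total - running values lie in higher ranges, so the median's rank inside
    # the chosen range, counted in descending order, is running - total/2.
    rank = math.ceil(running - total / 2)

    # Pass 2: keep only the chosen range, sort it in descending order, pick the rank.
    selected = sorted((v for v in open_stream() if v // width == bucket),
                      reverse=True)
    return selected[rank - 1]


if __name__ == "__main__":
    sample = [5, 3_000_001, 1_500_000, 2_000_000, 7, 999_999]
    print(bucket_median(lambda: iter(sample)))  # prints 1500000
```

In practice open_stream would reread the file lazily rather than wrap a list. The range width trades memory in the first pass (number of counters) against memory in the second pass (size of the range to sort).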

B6 finally returns the median, as shown below: