Class Histogram<K extends Comparable>

  • All Implemented Interfaces:
    Serializable

    public final class Histogram<K extends Comparable>
    extends Object
    implements Serializable
    Class for computing and accessing histogram-type data. Bins are stored internally in a sorted Map so that keys can be iterated in order.
    See Also:
    Serialized Form
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  Histogram.Bin<K extends Comparable>
      Represents a bin in the Histogram.
    • Constructor Summary

      Constructors 
      Constructor Description
      Histogram()
      Constructs a new Histogram with default bin and value labels.
      Histogram​(Histogram<K> in)
      Copy constructor for a histogram.
      Histogram​(String binLabel, String valueLabel)
      Constructs a new Histogram with supplied bin and value labels.
      Histogram​(String binLabel, String valueLabel, Comparator<? super K> comparator)
      Constructor that takes labels for the bin and values and a comparator to sort the bins.
      Histogram​(Comparator<? super K> comparator)
      Constructs a new Histogram that'll use the supplied comparator to sort keys.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void addHistogram​(Histogram<K> addHistogram)
      Mutable method that allows the addition of a Histogram into the current one.
      Comparator<? super K> comparator()
      Returns the comparator used to order the keys in this histogram, or null if this histogram uses the natural ordering of its keys.
      boolean containsKey​(K key)
      Return whether this histogram contains the given key.
      Histogram<K> divideByHistogram​(Histogram<K> divisorHistogram)
      Immutable method that divides the current Histogram by an input Histogram and generates a new one. Throws an exception if the bins don't match up exactly.
      boolean equals​(Object o)
      Checks that the labels and values in the two histograms are identical.
      double estimateSdViaMad()
      Returns a value that is intended to estimate the standard deviation of the distribution, if the distribution is essentially normal, by using the median absolute deviation to remove the effect of erroneous massive outliers.
      Histogram.Bin<K> get​(K key)
      Retrieves the bin associated with the given key.
      String getBinLabel()  
      double getCount()  
      double getCumulativeProbability​(double v)
      Returns the cumulative probability of observing a value <= v when sampling the distribution represented by this histogram.
      double getGeometricMean()
      Gets the geometric mean of the distribution.
      double getMax()
      Returns the key with the highest count.
      double getMean()
      Assuming that the key type for the histogram is a Number type, returns the mean of all the items added to the histogram.
      double getMeanBinSize()
      Calculates the mean bin size.
      double getMedian()  
      double getMedianAbsoluteDeviation()
      Gets the median absolute deviation of the distribution.
      double getMedianBinSize()
      Calculates the median bin size.
      double getMin()
      Returns the key with the lowest count.
      double getMode()
      Returns id of the Bin that's the mode of the distribution (i.e. the largest bin).
      double getPercentile​(double percentile)
      Gets the bin in which the given percentile falls.
      double getStandardDeviation()  
      double getStandardDeviationBinSize​(double mean)
      Calculates the standard deviation of the bin size.
      double getSum()
      Returns the sum of the products of the histogram bin ids and the number of entries in each bin.
      double getSumOfValues()
      Returns the sum of the number of entries in each bin.
      String getValueLabel()  
      int hashCode()  
      void increment​(K id)
      Increments the value in the designated bin by 1.
      void increment​(K id, double increment)
      Increments the value in the designated bin by the supplied increment.
      boolean isEmpty()
      Returns true if this histogram has no data in it, false otherwise.
      Set<K> keySet()
      Returns the set of keys for this histogram.
      void prefillBins​(K... ids)
      Prefill the histogram with the supplied set of bins.
      void setBinLabel​(String binLabel)  
      void setValueLabel​(String valueLabel)  
      int size()
      Returns the size of this histogram.
      String toString()  
      void trimByTailLimit​(int tailLimit)
      Trims the histogram when the bins in the tail of the distribution contain fewer than mode/tailLimit items.
      void trimByWidth​(int width)
      Trims the histogram so that only bins <= width are kept.
      Collection<Histogram.Bin<K>> values()
      Returns a Collection view of the values contained in this histogram.
    • Constructor Detail

      • Histogram

        public Histogram()
        Constructs a new Histogram with default bin and value labels.
      • Histogram

        public Histogram​(String binLabel,
                         String valueLabel)
        Constructs a new Histogram with supplied bin and value labels.
      • Histogram

        public Histogram​(Comparator<? super K> comparator)
        Constructs a new Histogram that'll use the supplied comparator to sort keys.
      • Histogram

        public Histogram​(String binLabel,
                         String valueLabel,
                         Comparator<? super K> comparator)
        Constructor that takes labels for the bin and values and a comparator to sort the bins.
      • Histogram

        public Histogram​(Histogram<K> in)
        Copy constructor for a histogram.
    • Method Detail

      • prefillBins

        public void prefillBins​(K... ids)
        Prefill the histogram with the supplied set of bins.
      • increment

        public void increment​(K id)
        Increments the value in the designated bin by 1.
      • increment

        public void increment​(K id,
                              double increment)
        Increments the value in the designated bin by the supplied increment.
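        The two increment overloads can be pictured as an "insert or add" on a sorted map. The following is a minimal sketch, not this class's actual code: IncrementSketch and its TreeMap<K, Double> field are hypothetical stand-ins for the histogram and its internal storage.

```java
import java.util.TreeMap;

// Hypothetical stand-in for the histogram: a sorted map from bin id to count.
final class IncrementSketch<K extends Comparable<K>> {
    private final TreeMap<K, Double> bins = new TreeMap<>();

    // Adds 1 to the bin, creating the bin if it does not exist yet.
    void increment(K id) {
        increment(id, 1d);
    }

    // Adds the supplied amount to the bin, creating the bin if it does not exist yet.
    void increment(K id, double amount) {
        bins.merge(id, amount, Double::sum);
    }

    // Current value of a bin, or 0 if the bin has never been touched.
    double valueOf(K id) {
        return bins.getOrDefault(id, 0d);
    }

    // Keys come back in sorted order because the backing map is a TreeMap.
    Iterable<K> keys() {
        return bins.keySet();
    }

    public static void main(String[] args) {
        IncrementSketch<Integer> h = new IncrementSketch<>();
        h.increment(30);
        h.increment(10, 2.5);
        h.increment(30);
        System.out.println(h.keys());      // [10, 30]
        System.out.println(h.valueOf(30)); // 2.0
    }
}
```

        Because the value is a double, fractional increments accumulate just like whole counts.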
      • getBinLabel

        public String getBinLabel()
      • setBinLabel

        public void setBinLabel​(String binLabel)
      • getValueLabel

        public String getValueLabel()
      • setValueLabel

        public void setValueLabel​(String valueLabel)
      • equals

        public boolean equals​(Object o)
        Checks that the labels and values in the two histograms are identical.
        Overrides:
        equals in class Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
      • getMean

        public double getMean()
        Assuming that the key type for the histogram is a Number type, returns the mean of all the items added to the histogram.
      • getSum

        public double getSum()
        Returns the sum of the products of the histogram bin ids and the number of entries in each bin. Note: This is only supported if this histogram stores instances of Number.
      • getSumOfValues

        public double getSumOfValues()
        Returns the sum of the number of entries in each bin.
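        The difference between getSum and getSumOfValues is easy to blur, so here is a small sketch of the two quantities on a plain sorted map. SumSketch and its method names are hypothetical, and deriving the mean as their ratio is an assumption based on the descriptions above rather than a statement about this class's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: bins are modelled as a map from numeric bin id to count.
final class SumSketch {
    // Sum of (bin id * entries in that bin), i.e. the total of all observations.
    static double sum(Map<Integer, Double> bins) {
        double total = 0;
        for (Map.Entry<Integer, Double> bin : bins.entrySet()) {
            total += bin.getKey() * bin.getValue();
        }
        return total;
    }

    // Sum of the entries in each bin, i.e. the number of observations.
    static double sumOfValues(Map<Integer, Double> bins) {
        double total = 0;
        for (double count : bins.values()) {
            total += count;
        }
        return total;
    }

    // Mean of the observations: total of observations over number of observations.
    static double mean(Map<Integer, Double> bins) {
        return sum(bins) / sumOfValues(bins);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> bins = new TreeMap<>();
        bins.put(1, 2d); // the value 1 was seen twice
        bins.put(2, 3d); // the value 2 was seen three times
        bins.put(5, 1d); // the value 5 was seen once
        System.out.println(sum(bins));         // 13.0
        System.out.println(sumOfValues(bins)); // 6.0
        System.out.println(mean(bins));
    }
}
```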
      • getStandardDeviation

        public double getStandardDeviation()
      • getMeanBinSize

        public double getMeanBinSize()
        Calculates the mean bin size.
      • size

        public int size()
        Returns the size of this histogram.
      • comparator

        public Comparator<? super K> comparator()
        Returns the comparator used to order the keys in this histogram, or null if this histogram uses the natural ordering of its keys.
        Returns:
        the comparator used to order the keys in this histogram, or null if this histogram uses the natural ordering of its keys
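        The "null means natural ordering" contract is the same one java.util.TreeMap uses. The sketch below shows that behaviour on a TreeMap directly; ComparatorSketch and newBins are hypothetical names, and treating a TreeMap as the backing store is an assumption drawn from the class description.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical helper showing the comparator contract on a plain TreeMap.
final class ComparatorSketch {
    // A null comparator makes TreeMap fall back to the keys' natural ordering.
    static TreeMap<String, Double> newBins(Comparator<? super String> comparator) {
        return new TreeMap<>(comparator);
    }

    public static void main(String[] args) {
        TreeMap<String, Double> natural = newBins(null);
        TreeMap<String, Double> reversed = newBins(Comparator.<String>reverseOrder());
        for (String key : new String[] {"a", "c", "b"}) {
            natural.put(key, 1d);
            reversed.put(key, 1d);
        }
        System.out.println(natural.comparator()); // null
        System.out.println(natural.keySet());     // [a, b, c]
        System.out.println(reversed.keySet());    // [c, b, a]
    }
}
```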
      • getMedianBinSize

        public double getMedianBinSize()
        Calculates the median bin size.
      • values

        public Collection<Histogram.Bin<K>> values()
        Returns a Collection view of the values contained in this histogram. The collection's iterator returns the values in ascending order of the corresponding keys.
      • getStandardDeviationBinSize

        public double getStandardDeviationBinSize​(double mean)
        Calculates the standard deviation of the bin size.
      • getPercentile

        public double getPercentile​(double percentile)
        Gets the bin in which the given percentile falls. Should only be called on histograms with non-negative values and a positive sum of values.
        Parameters:
        percentile - a value between 0 and 1
        Returns:
        the bin value in which the percentile falls
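        One way to find "the bin in which the percentile falls" is to walk the bins in key order and stop at the first bin whose running share of the total reaches the requested fraction. The sketch below does exactly that. PercentileSketch is a hypothetical name, and both the walk itself and the rejection of 0 and 1 are assumptions, since the text above only says the argument lies between 0 and 1.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: bins are modelled as a sorted map from bin id to count.
final class PercentileSketch {
    static double percentile(TreeMap<Integer, Double> bins, double p) {
        // Assumption: the end points 0 and 1 are rejected.
        if (p <= 0 || p >= 1) {
            throw new IllegalArgumentException("p must be strictly between 0 and 1");
        }
        double total = 0;
        for (double count : bins.values()) {
            total += count;
        }
        // Walk bins in ascending key order, accumulating their share of the total.
        double soFar = 0;
        for (Map.Entry<Integer, Double> bin : bins.entrySet()) {
            soFar += bin.getValue();
            if (soFar / total >= p) {
                return bin.getKey();
            }
        }
        throw new IllegalStateException("no bin reached percentile " + p);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> bins = new TreeMap<>();
        bins.put(10, 1d);
        bins.put(20, 2d);
        bins.put(30, 1d);
        System.out.println(percentile(bins, 0.5)); // 20.0
        System.out.println(percentile(bins, 0.9)); // 30.0
    }
}
```

        The walk relies on non-negative counts and a positive total, which matches the precondition stated for getPercentile.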
      • getCumulativeProbability

        public double getCumulativeProbability​(double v)
        Returns the cumulative probability of observing a value <= v when sampling the distribution represented by this histogram.
        Throws:
        UnsupportedOperationException - if this histogram does not store instances of Number
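        The cumulative probability of a value <= v is the share of all entries that sit in bins whose key is at most v. A minimal sketch under that reading follows; CumulativeSketch is a hypothetical name and the map stands in for the histogram's bins.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: bins are modelled as a sorted map from numeric bin id to count.
final class CumulativeSketch {
    static double cumulativeProbability(TreeMap<Integer, Double> bins, double v) {
        double total = 0;
        double atOrBelow = 0;
        for (Map.Entry<Integer, Double> bin : bins.entrySet()) {
            total += bin.getValue();
            if (bin.getKey() <= v) {
                atOrBelow += bin.getValue();
            }
        }
        // An empty map yields 0/0 = NaN in this sketch.
        return atOrBelow / total;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> bins = new TreeMap<>();
        bins.put(10, 1d);
        bins.put(20, 2d);
        bins.put(30, 1d);
        System.out.println(cumulativeProbability(bins, 20)); // 0.75
        System.out.println(cumulativeProbability(bins, 5));  // 0.0
    }
}
```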
      • getMedian

        public double getMedian()
      • getMedianAbsoluteDeviation

        public double getMedianAbsoluteDeviation()
        Gets the median absolute deviation of the distribution.
      • estimateSdViaMad

        public double estimateSdViaMad()
        Returns a value that is intended to estimate the standard deviation of the distribution, if the distribution is essentially normal, by using the median absolute deviation to remove the effect of erroneous massive outliers.
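        For normally distributed data the standard deviation is commonly estimated as about 1.4826 times the median absolute deviation (MAD). The sketch below works on a raw array of observations to keep the arithmetic visible. MadSketch is a hypothetical name, the textbook median is used, and whether this class applies exactly the constant 1.4826 is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper operating on raw observations rather than on bins.
final class MadSketch {
    // Textbook median: middle element, or the average of the two middle elements.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    // Median of the absolute deviations from the median.
    static double mad(double[] xs) {
        double m = median(xs);
        double[] deviations = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            deviations[i] = Math.abs(xs[i] - m);
        }
        return median(deviations);
    }

    // Assumed scaling: 1.4826 is the usual consistency factor for normal data.
    static double estimateSd(double[] xs) {
        return 1.4826 * mad(xs);
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4, 1000}; // one massive outlier
        System.out.println(mad(xs));        // 1.0
        System.out.println(estimateSd(xs)); // 1.4826
    }
}
```

        The outlier at 1000 barely moves the estimate, whereas it would dominate an ordinary standard deviation.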
      • getMode

        public double getMode()
        Returns id of the Bin that's the mode of the distribution (i.e. the largest bin).
        Throws:
        UnsupportedOperationException - if this histogram does not store instances of Number
      • getMin

        public double getMin()
        Returns the key with the lowest count.
        Throws:
        UnsupportedOperationException - if this histogram does not store instances of Number
      • getMax

        public double getMax()
        Returns the key with the highest count.
        Throws:
        UnsupportedOperationException - if this histogram does not store instances of Number
      • getCount

        public double getCount()
      • getGeometricMean

        public double getGeometricMean()
        Gets the geometric mean of the distribution.
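        For binned data, a geometric mean can be computed as the exponential of the count-weighted average of the logarithms of the bin ids. The sketch below illustrates that formula; GeometricMeanSketch is a hypothetical name, and the weighting by bin value is an assumption about how the distribution is interpreted.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: bins are modelled as a map from positive numeric bin id to count.
final class GeometricMeanSketch {
    static double geometricMean(Map<Integer, Double> bins) {
        double logSum = 0;
        double count = 0;
        for (Map.Entry<Integer, Double> bin : bins.entrySet()) {
            // Each bin contributes ln(id) once per entry it holds.
            logSum += bin.getValue() * Math.log(bin.getKey());
            count += bin.getValue();
        }
        return Math.exp(logSum / count);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> bins = new TreeMap<>();
        bins.put(2, 1d);
        bins.put(8, 1d);
        // Geometric mean of 2 and 8 is sqrt(16), so this prints a value close to 4.
        System.out.println(geometricMean(bins));
    }
}
```

        Bin ids of zero or below have no real logarithm, so the result is only meaningful for positive keys.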
      • trimByTailLimit

        public void trimByTailLimit​(int tailLimit)
        Trims the histogram when the bins in the tail of the distribution contain fewer than mode/tailLimit items.
      • isEmpty

        public boolean isEmpty()
        Returns true if this histogram has no data in it, false otherwise.
      • trimByWidth

        public void trimByWidth​(int width)
        Trims the histogram so that only bins <= width are kept.
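        Keeping only bins <= width amounts to dropping every key strictly greater than width. On a sorted map this is a one-liner, sketched below; TrimSketch is a hypothetical name and the TreeMap stands in for the histogram's bins.

```java
import java.util.TreeMap;

// Hypothetical helper: bins are modelled as a sorted map from numeric bin id to count.
final class TrimSketch {
    static void trimByWidth(TreeMap<Integer, Double> bins, int width) {
        // tailMap(width, false) is a live view of the keys strictly greater than width,
        // so clearing the view removes those bins from the backing map.
        bins.tailMap(width, false).clear();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Double> bins = new TreeMap<>();
        bins.put(1, 4d);
        bins.put(5, 2d);
        bins.put(10, 1d);
        bins.put(20, 1d);
        trimByWidth(bins, 10);
        System.out.println(bins.keySet()); // [1, 5, 10]
    }
}
```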
      • divideByHistogram

        public Histogram<K> divideByHistogram​(Histogram<K> divisorHistogram)
        Immutable method that divides the current Histogram by an input Histogram and generates a new one. Throws an exception if the bins don't match up exactly.
        Parameters:
        divisorHistogram - the histogram to divide this histogram by
        Returns:
        a new Histogram containing the result of the division
        IllegalArgumentException - if the keySet of this histogram is not equal to the keySet of the given divisorHistogram
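        The contract described here (identical key sets required, a new histogram returned) can be sketched on plain maps as follows. DivideSketch is a hypothetical name, and the bin-by-bin division of values is an assumption about what "divides" means for this method.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: each histogram is modelled as a sorted map from key to value.
final class DivideSketch {
    static TreeMap<String, Double> divide(TreeMap<String, Double> dividend,
                                          TreeMap<String, Double> divisor) {
        // The bins must match exactly, otherwise the division is rejected.
        if (!dividend.keySet().equals(divisor.keySet())) {
            throw new IllegalArgumentException("histograms have non-identical bins");
        }
        // The result goes into a fresh map; neither input is modified.
        TreeMap<String, Double> quotient = new TreeMap<>();
        for (Map.Entry<String, Double> bin : dividend.entrySet()) {
            quotient.put(bin.getKey(), bin.getValue() / divisor.get(bin.getKey()));
        }
        return quotient;
    }

    public static void main(String[] args) {
        TreeMap<String, Double> dividend = new TreeMap<>();
        dividend.put("a", 6d);
        dividend.put("b", 9d);
        TreeMap<String, Double> divisor = new TreeMap<>();
        divisor.put("a", 2d);
        divisor.put("b", 3d);
        System.out.println(divide(dividend, divisor)); // {a=3.0, b=3.0}
    }
}
```

        In this sketch a divisor bin holding 0 produces Infinity or NaN under ordinary double arithmetic rather than an exception.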
      • addHistogram

        public void addHistogram​(Histogram<K> addHistogram)
        Mutable method that allows the addition of a Histogram into the current one.
        Parameters:
        addHistogram - the histogram to add into this one
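        Unlike a division that returns a new object, this method changes the receiver. A minimal sketch of such an in-place merge on plain maps is shown below; AddSketch and addInto are hypothetical names, and summing values bin by bin is an assumption about what "addition" means here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: each histogram is modelled as a sorted map from key to value.
final class AddSketch {
    // Adds every bin of 'other' into 'target', creating bins that target lacks.
    static void addInto(TreeMap<String, Double> target, Map<String, Double> other) {
        for (Map.Entry<String, Double> bin : other.entrySet()) {
            target.merge(bin.getKey(), bin.getValue(), Double::sum);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Double> target = new TreeMap<>();
        target.put("a", 1d);
        target.put("b", 2d);
        TreeMap<String, Double> other = new TreeMap<>();
        other.put("b", 3d);
        other.put("c", 4d);
        addInto(target, other);
        System.out.println(target); // {a=1.0, b=5.0, c=4.0}
    }
}
```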
      • get

        public Histogram.Bin<K> get​(K key)
        Retrieves the bin associated with the given key.
      • keySet

        public Set<K> keySet()
        Returns the set of keys for this histogram.
      • containsKey

        public boolean containsKey​(K key)
        Return whether this histogram contains the given key.