4.1

4.1.1 Data and its Representation

  1. Bar Graph

    [Figure: bar graph]

  2. Pie Chart

    [Figure: pie chart]

  3. Cross Table

    [Figure: cross table]

    ↓↓↓ A cross table can be converted to a bar chart ↓↓↓

    [Figure: bar chart built from the cross table]

  4. (Relative) Frequency Table (for categorical data)

    A frequency table is sometimes called a frequency distribution.

    Note: the left table is a frequency table; the right table is a relative frequency table.

    [Figure: frequency table (left) and relative frequency table (right)]

  5. (Relative) Frequency Table (for numerical data)

    Note: Round up: 9.2 → 10

    [Figure: frequency table for numerical data]
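The two kinds of frequency table above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own method: the function names and the sample data are hypothetical, and the class-width rule (range divided by the number of classes, rounded up, as in 9.2 → 10) is one assumed reading of the "round up" note.

```python
import math
from collections import Counter


def frequency_table(values):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}


def class_frequencies(values, num_classes):
    """Count numerical values in equal-width classes [lower, upper)."""
    low, high = min(values), max(values)
    # Assumption: class width = range / number of classes, rounded up
    # to a whole number (for example 9.2 -> 10).
    width = max(1, math.ceil((high - low) / num_classes))
    counts = Counter(
        # Clamp the index so the maximum always lands in the last class,
        # even when the range divides evenly into the classes.
        min(int((v - low) // width), num_classes - 1)
        for v in values
    )
    return [
        (low + k * width, low + (k + 1) * width, counts[k])
        for k in range(num_classes)
    ]


# Hypothetical data, made up for illustration only.
colors = ["red", "blue", "red", "green", "blue", "red"]
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 98]

cat_table = frequency_table(colors)
num_table = class_frequencies(scores, 5)

for category, (count, rel) in cat_table.items():
    print(f"{category:<6} {count:>2} {rel:.3f}")
for lower, upper, count in num_table:
    print(f"[{lower}, {upper}) {count:>2} {count / len(scores):.2f}")
```

With the sample scores the range is 46, so five classes give a raw width of 9.2, which the sketch rounds up to 10. The relative frequencies in each table sum to 1.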

Example

  1. Histogram

    [Figure: histogram]

    Note:

    1. There are no gaps between the bars.
    2. All bars have the same width.
    3. The number of observations (the count) can be displayed above each bar.
    4. The intervals correspond to the classes in a frequency distribution (the x-extent of a bar is the range of its class, in the unit of measurement).
    5. The height of each bar is proportional to the number of observations in its interval (the y-value of a bar is the count in that range).
  2. Ogive

    An ogive is a line that connects points giving the cumulative percent of observations below the upper limit of each interval in a cumulative frequency distribution.

    [Figure: ogive]

  3. Stem and Leaf Plot

    [Figure: stem-and-leaf plot and its source data]

  4. Time Series Plot

    A time series plot shows how the values of a variable change over time.

    [Figure: time series plot]

  5. Scatterplot

    [Figure: scatterplot]

    Three important features of a scatterplot:

    1. direction:

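      Negative: $y$ tends to decrease as $x$ increases.

      [Figure: scatterplot with a negative direction]

      Positive: $y$ tends to increase as $x$ increases.

      [Figure: scatterplot with a positive direction]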

    2. form:

      Linear or nonlinear (curved, bending all over the place…)

    3. strength:

      The strength of a linear relationship can be measured mathematically.
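Returning to the ogive defined in item 2 above, its points can be computed directly from a frequency distribution. The sketch below is illustrative only: `ogive_points` is a hypothetical helper, and the upper limits and frequencies are made-up values.

```python
from itertools import accumulate


def ogive_points(upper_limits, frequencies):
    """Return (upper limit, cumulative percent) pairs for an ogive."""
    total = sum(frequencies)
    # Running total of observations up to and including each class.
    cumulative = accumulate(frequencies)
    return [(u, 100 * c / total) for u, c in zip(upper_limits, cumulative)]


# Hypothetical classes with upper limits 62, 72, ... and their counts.
points = ogive_points([62, 72, 82, 92, 102], [2, 2, 3, 2, 1])
for upper, percent in points:
    print(f"below {upper}: {percent:.0f}%")
```

Connecting these points with line segments gives the ogive. The last point always sits at 100 percent because every observation lies below the final upper limit.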

Measuring the Linear Strength of a Scatterplot

Let $(x_i,y_i)$ for $i=1,2,\dots,n$ be the corresponding values of two quantitative variables. Then the correlation $r$ is defined as

$$ r=\frac{\sum^n_{i=1}(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum^n_{i=1}(x_i-\bar x)^2\,\sum^n_{i=1}(y_i-\bar y)^2}} $$

where $\bar x=\frac{1}{n}\sum^n_{i=1}x_i$ and $\bar y=\frac{1}{n}\sum^n_{i=1}y_i$ are the means of $x$ and $y$ respectively.

Note:

  1. $r$ measures only linear association, so it is meaningful only if there is a linear relationship between the variables.
  2. The variables must be quantitative.
  3. Outliers can strongly affect the correlation.

Remarks:

  1. Correlation $r$ has no units.
  2. $\left | r \right |\le 1$. When $|r| =1$, the points lie exactly on a straight line.
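The definition of $r$ can be checked numerically with a short Python sketch. It assumes the standard form of the correlation, in which the denominator is the square root of the product of the two sums of squared deviations. The function name `correlation` and the sample data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Correlation r of paired quantitative values xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations for x and for y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: points lying exactly on a line.
xs = [1, 2, 3, 4]
r_up = correlation(xs, [2, 4, 6, 8])    # rising line
r_down = correlation(xs, [8, 6, 4, 2])  # falling line
# Changing the units of x (for example metres to centimetres) leaves r unchanged.
r_scaled = correlation([100 * x for x in xs], [2, 4, 6, 8])
# A single outlier added to the rising line changes r noticeably.
r_outlier = correlation(xs + [5], [2, 4, 6, 8, 0])
print(r_up, r_down, r_scaled, r_outlier)
```

The first two results illustrate the remark that $|r|=1$ when the points lie exactly on a straight line, with the sign giving the direction. The rescaled case illustrates that $r$ has no units, and the last case shows how strongly one outlier can affect the correlation.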

4.1.2 Four Scales of Measurements