# Big Data & Analytics Chapter 3 Quiz Answers – Data Analysis

1. Which statement is an accurate description of discrete variables?

• Discrete variables are qualitative and consist of two or more categories in which order matters.
• They are quantitative with a continuous range of values.
• They are quantitative with a finite set of values.
• They are qualitative and consist of two or more categories of values in which order does not matter.

Explanation: Variables are either categorical or numerical. Numerical variables are quantitative and are either continuous or discrete. Discrete variables take values from a finite set, such as the number of users on the network.

2. Which term is used to describe the difference between the highest and the lowest values for a variable?

• range
• mode
• variance
• standard deviation
• skew

Explanation: The range is the difference between the highest and lowest values for a variable. Knowing the range gives someone looking at the data a basic idea of whether the data makes sense.
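As a quick illustration of the definition, the range can be computed from the maximum and minimum of a sample. This is a minimal sketch; the list `users_per_day` is hypothetical data invented for the example.

```python
# Hypothetical sample: number of users on a network over five days.
users_per_day = [12, 7, 3, 15, 9]

# Range = highest value minus lowest value.
value_range = max(users_per_day) - min(users_per_day)
print(value_range)  # 15 - 3 = 12
```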

3. What are two data structures in pandas? (Choose two.)

• text
• series
• image
• number
• dataframe

Explanation: Pandas data structures include the series structure and the dataframe structure. Text, image, and number are possible values stored in a data structure.
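The two structures can be sketched as follows, assuming pandas is installed. The labels and values are hypothetical sample data.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
users = pd.Series([12, 7, 3], index=["mon", "tue", "wed"], name="users")

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({"day": ["mon", "tue", "wed"], "users": [12, 7, 3]})

print(users)
print(df)
```

Selecting a single column from the dataframe, for example `df["users"]`, returns a series.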

4. In a graphical representation of the distribution for a discrete variable, what do the x-axis and y-axis represent? (Choose two.)

• skew
• variable
• probability
• range
• standard deviation

Explanation: A distribution can be graphed in two dimensions using two characteristics: the variable and the probability. Typically the variable is plotted on the x-axis and the probability on the y-axis.
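One way to see the two axes is to build the distribution by hand. The sketch below uses hypothetical observations and estimates each probability as a relative frequency: the dictionary keys are the x-axis values and the dictionary values are the y-axis values.

```python
from collections import Counter

# Hypothetical observations of a discrete variable (for example, die rolls).
observations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

counts = Counter(observations)
total = len(observations)

# Map each value of the variable (x-axis) to its relative frequency (y-axis).
distribution = {value: counts[value] / total for value in sorted(counts)}

for value, probability in distribution.items():
    print(value, probability)
```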

5. Which term is used to describe the right and left ends of a distribution graph?

• tail
• end
• peak
• axis

Explanation: On a distribution curve, the right and left ends of the graphed curve are known as the tails.

6. Which file type is most similar to a dataframe in pandas?

• a PDF file
• a word processing file
• a multilayer image file
• a spreadsheet

Explanation: A dataframe in pandas is like a spreadsheet with rows and columns.

7. A data analyst studies two variables over a period. Data set A is (100, 90, 81, 73) and data set B is (3500, 3150, 2835, 2552). The Pearson r value between these two variables is 0.99. Which statement describes the correlation between them?

• It indicates no relationship.
• It is a weak negative correlation.
• It is a strong positive correlation.
• It is a strong negative correlation.

Explanation: The Pearson r is a quantity that indicates the correlation between two variables. It is expressed as a value between -1 and 1. Positive values express a positive relationship between the changes in two variables. Negative values express an inverse relationship. The magnitude of the value, whether positive or negative, indicates the degree of correlation. In other words, the closer the value is to 1 or -1, the stronger the relationship. A value of 0 indicates no relationship.
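The coefficient for the two data sets in the question can be checked with a short calculation. This is a minimal sketch using the standard formula (covariance divided by the product of the standard deviations); the helper name `pearson_r` is chosen for the example and is not from the course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

data_a = [100, 90, 81, 73]
data_b = [3500, 3150, 2835, 2552]

r = pearson_r(data_a, data_b)
print(r)
```

Both data sets fall by roughly 10% per step, so the result should land very close to 1, which is consistent with a strong positive correlation.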

8. Match the variable with the description.

Explanation: The correct answer is:

• ordinal → qualitative values in order of ranking
• nominal → qualitative values based on identity of the object
• continuous → quantitative values with infinite range
• discrete → quantitative and finite set of values

9. What is a measurement of central tendency that is also known as the average?

• mode
• mean
• median
• Pearson r

Explanation: There are three common measures of central tendency: mean, median, and mode. Each expresses a value that is close to the center of the distribution of a data set. The mean is also known as the average, which is the sum of all values divided by the number of values in the data set.
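The three measures can be compared on a small sample with Python's standard `statistics` module. The list `values` is hypothetical data chosen so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical sample of five values.
values = [2, 3, 3, 5, 7]

mean = statistics.mean(values)      # (2 + 3 + 3 + 5 + 7) / 5 = 4
median = statistics.median(values)  # middle value of the sorted list = 3
mode = statistics.mode(values)      # most frequent value = 3

print(mean, median, mode)
```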

10. What category of variables includes nominal or ordinal variables?

• numerical
• categorical
• ratio
• interval

Explanation: Variables are classified as either categorical or numerical. Categorical variables include nominal and ordinal variables.

11. What relationship is an example of causation?

• a relationship between sensors and actuators in IoT
• a relationship in which two things change at a similar rate
• a relationship between a computer operating system and one or more programming languages
• a relationship when a change in one thing directly results in something else changing

Explanation: Causation describes a relationship in which one thing changes, or is created, directly because of something else. It does not describe how similar two rates of change are. Sensors and actuators in IoT are not directly related to each other. A controller might receive the signal from a sensor, make a decision, and instruct an actuator to act if necessary. Also, operating systems and programming languages are not dependent on each other. An operating system can be written in different programming languages.

12. What is a purpose of using the head Linux command?

• to determine the size of the file
• to display the permissions on the file
• to verify that the file is available and preview the first few lines
• to determine whether the file contains enough code to be executable

Explanation: The head Linux command reads the first few lines of a file and writes them to the standard output (by default, the display screen). This command is useful to verify the existence and the content of a file before it is imported into pandas.
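The same kind of preview can be approximated in Python before loading a file. This is a rough sketch, not a replacement for `head`: the helper `preview` and the sample CSV content are hypothetical, and the temporary file exists only so the example runs on its own.

```python
import os
import tempfile
from itertools import islice

def preview(path, n=10):
    """Return the first n lines of a text file, similar to `head -n`."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]

# Write a small hypothetical CSV so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("day,users\nmon,12\ntue,7\nwed,3\n")
    sample_path = tmp.name

first_lines = preview(sample_path, n=2)
print(first_lines)  # ['day,users', 'mon,12']

os.remove(sample_path)  # clean up the temporary file
```

If the file does not exist, `open` raises `FileNotFoundError`, which serves the same verification purpose as running `head` and seeing an error.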

13. Match the variable with the type of value.

Explanation: The correct answer is:

• nominal → eye color
• ordinal → student class rank
• continuous → sales volume
• discrete → number of users

14. What term describes the key characteristics that are observed or measured as part of an experiment or analysis?

• variable
• data set
• statistics
• population

Explanation: Variables are the key characteristics that are measured or observed as part of an experiment or an analysis.

15. Which term is used to describe a graph of distribution where the peak is left of center or right of center?

• skewed
• normal
• bimodal
• symmetric

Explanation: When a graph of distribution has the peak left or right of center, this is known as skew.
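Skew can also be checked numerically. The sketch below computes the third standardized moment, one common skewness coefficient; the helper name `skewness` and the sample are hypothetical. A positive result indicates a longer right tail (peak left of center), and a negative result indicates a longer left tail.

```python
def skewness(values):
    """Return the third standardized moment (population skewness)."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical sample: most values are small, a few are large.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 13]

# Hypothetical sample that is symmetric around its center.
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive: long right tail
print(skewness(symmetric))     # approximately zero
```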

16. A data analyst performs a correlation analysis between two quantities. The result of the analysis is an r value of 0. What does this mean?

• The analysis failed.
• The two variables have no relationship.
• Neither variable changed during the period.
• When one variable increases its value, the other variable reduces its value.

Explanation: The commonly used correlation coefficient, Pearson r (or r value), is a quantity that is expressed as a value between -1 and 1. Positive values indicate a positive relationship between the changes in two quantities. Negative values indicate an inverse relationship. The magnitude of the value, whether positive or negative, indicates the degree of correlation. The closer the value is to 1 or -1, the stronger the relationship. A value of 0 indicates no relationship.