Data Analytics Essentials Course Final Exam Answers

1. Match the task and purpose to the appropriate Big Data analytics method.

Explanation: Place the options in the following order:

analyzing customer subscription cancellations and correlating them with customer comments and ratings to determine the most common reasons why users cancel subscriptions diagnostic analysis
analyzing customer ordering history to find popular products within last 6 months descriptive analytics
analyzing historical data of temperature to prepare latest weather forecast predictive analytics
reviewing sales data of top 5 competitors to determine the marketing strategy of own product for next quarter prescriptive analytics

2. Match the type of data analysis with the question it can answer.

Explanation: Place the options in the following order:

Predictive What might happen in the future?
Prescriptive What should be done next?
Diagnostic Why did something happen?
Descriptive What happened?

3. Which statement is true concerning data analytics?

  • Descriptive analytics uses a feedback system to track the outcome of actions taken.
  • Prescriptive analytics uses historical data to provide regular reports on events that have already happened.
  • Descriptive analytics attempts to predict what will happen next.
  • Predictive analytics uses simulation models and forecasting to suggest what could happen.

Explanation: Descriptive analytics uses historical data to provide regular reports on events that have already happened. Prescriptive analytics uses a feedback system to track the outcome of actions taken. Predictive analytics uses data and statistical techniques to predict future trends. All three types of analytics (Prescriptive, Predictive, and Descriptive) are used in Big Data analysis.

4. A data analyst uses Excel and Tableau to identify patterns and correlations in a dataset to draw conclusions. Which phase of the analytic process is the analyst currently working on?

  • presenting the results
  • analyzing the data
  • investigating the data
  • preparing the data

Explanation: During the analyzing data phase of an analytic project, an analyst looks for patterns and correlations in the data set to draw conclusions.

5. Refer to the exhibit. A data analyst wants to create a formula in Microsoft Excel that will automatically calculate the net revenue from the sales of dining sets. Which formula inserted in cell F2 will accomplish this?

  • =(C2*(B2-D2)*E2)
  • =B2*C2-D2*E2
  • =B2-D2+C2*E2
  • =(B2*C2)-(D2*C2)*E2

Explanation: The correct formula is =(C2*(B2-D2)*E2) and will yield a net loss in revenue amounting to $580.10.

6. Which type of variable is used for qualitative values such as gender or eye color?

  • ordinal
  • discrete
  • continuous
  • nominal

Explanation: Variables are either categorical or numerical. Categorical variables are qualitative and are either nominal or ordinal. Nominal variables are used for values that are based on the identity of the object such as eye color or gender.

7. What are three resources provided by the Kaggle web site? (Choose three.)

  • API for major social media networks
  • publicly available datasets
  • data analysis integration with Excel
  • code
  • data analysis integration with Tableau
  • community competitions

Explanation: Kaggle offers many resources for new and advanced data scientists. These include publicly available data sets, code, community, inspiration, competitions, and courses.

8. Match the variable with the description.

Explanation: Place the options in the following order:

interval quantitative values from a given range
discrete quantitative value from a finite set of values
ordinal qualitative values in which order matters
ratio quantitative values that can specify if a value for a variable exists

9. Match the ETL process with its description.

Explanation: Place the options in the following order:

transform is the process of conditioning data into a usable form, such as removing duplicate data, providing missing data, and correcting any errors in the data.
load is the process of transferring data to a database
extract is the process of gathering the data to be analyzed

10. Which statement is true about the ELT and ETL process?

  • In the ELT process, the transform step occurs on the stored data as it is used.
  • In the ELT process, the load step occurs after the transform step.
  • In the ETL process, the extract step occurs after the load step.
  • In the ETL process, the load step occurs before the transform step.

Explanation: In the ETL process (Extract, Transform and Load), data is first extracted, then transformed and then loaded into the database. In the ELT process ((Extract, Load and Transform) the load and transform steps are reversed. ELT enables raw data to skip the transformation step and go straight to storage in an unstructured form. Transformation then occurs on the stored data as it is used.

11. An analyst in an online order company is researching products that online customers spend the most time browsing for on the website but do not buy. The result of the analysis will enable the company to release quick discount notices on the website to encourage customers to buy those products. What relevant data is required to do the research?

  • customer order history
  • inventory of products
  • brands of products
  • streaming customer-viewing data

Explanation: The role of time plays an important role in modern data analytics. Businesses rely on real-time data to make quick decisions that will hold the greatest benefits to them. In this case, the customer viewing data can be analyzed against a policy to release discount notices in a timely fashion in order to encourage interested customers to make a purchase.

12. Which two tasks are performed as part of the transform step of the ETL data process? (Choose two.)

  • removing the blank records
  • standardizing formats such as date, time, and location
  • communicating results to decision-makers
  • collecting and recording the data
  • transferring the data into a database

Explanation: The process of transforming the data includes tasks such as:

  • Combining and sorting data sources
  • Locating and removing duplicate data
  • Providing data for missing values
  • Rejecting or correcting erroneous data

13. Match the Microsoft Excel function with its description.

Explanation: Place the options in the following order:

LEN it is used to return the number of characters in a cell
COUNTA it is used to return the number of blank or non-blank cells in a range
CONCATENATE it is used to combine the data from one or more columns into a single column
SUMIF it is used to create a sum of values based on multiple criteria
INDEX it is used to return a value or the reference to a value from within a table or range
UPPER it is used to convert text to all capital letters (UPPERCASE)

14. Which option in Microsoft Excel represents a formula with an absolute reference?

  • =($A1-$B1)
  • =(B$1-B$3)
  • =($A$1-$B$1))
  • =(B$=(C$1$-F$1$)

Explanation: Dollar symbols in a cell reference indicate to Microsoft Excel to treat the cell reference as absolute and to always refer to the value of the cell regardless of whether it is moved or where the formula is located. An absolute reference is designated in a formula by the addition of a dollar sign ($) before the column and row. As an absolute reference was used to refer to Cell B1, the formula will automatically update to wherever the contents of this cell is moved.

15. A data analyst needs to organize sales data for analysis. Which Excel function can order the data by date sold, with the most recent sales listed first?

  • Conditional Formatting
  • Sort & Filter
  • Text to Columns
  • Find & Select

Explanation: The Sort & Filter tool in Excel allows an analyst to sort the contents of a column in either ascending order so that dates are earliest to most recent or descending order so that dates are most recent to earliest.

16. How can data analysts use the Conditional Formatting tool in Excel to aid in data analysis of bike sales records?

  • to show only sales from the United States
  • to highlight values in the revenue column that are above average
  • to separate the contents of the product description column into separate columns
  • to list sales order numbers from lowest to highest

Explanation: By using conditional formatting, a data analyst can highlight cells that contain certain values or that meet certain criteria.

17. A learner is analyzing a large volume of data in a Microsoft Excel spreadsheet and wishes to find duplicate data values. The data has been organized in a table where each row has different but related forms of data in each column. Which Microsoft Excel function can be used to do this?

  • ISNA
  • SUM
  • IF

Explanation: VLOOKUP is a very powerful data analysis tool in Microsoft Excel that is used to find information in a large spreadsheet including duplicate values. VLOOKUP is a vertical lookup function, so the data needs to be organized in a table where each row has different but related forms of data in each column.
The IF function makes logical comparisons.
The ISNA looks for cells with #N/A.
The SUM function is used to add the values in cells selected.

18. A data analyst wants to compare the average life expectancy and GDP for forty countries. Which type of visual representation would best suit this task?

  • pie chart
  • scatter plots
  • bar chart
  • line graph

Explanation: A scatter plot is a type of data visualization that shows the relationship between different variables. This data is shown by placing various data points between an x- and y-axis. Scatter plots are very popular for correlation visualizations, or when you want to show the distribution of a large number of data points. Scatter plots are also useful for demonstrating clustering or identifying outliers in the data.

19. The figure contains a section of an Excel spreadsheet. Cell C2 contains the formula “=VLOOKUP(B1,$A$2:$A$10,1,FALSE)” When cell C2 is clicked “#N/A” is displayed.

Refer to the exhibit. A learner is analyzing a data spreadsheet in Microsoft Excel and notices a formula function “=VLOOKUP(B1,$A$2:$A$10,1,FALSE)”. When clicking on cell C2, the value displayed in the field in cell C2 is “#N/A”. Why is this value displayed?

  • The formula is entered in the wrong cell.
  • The cell range is incorrect.
  • No value is entered in cell C3.
  • No duplicate value for the value entered in cell B2 is found.

Explanation: VLOOKUP can also be used to help with data cleaning by finding duplicates. With VLOOKUP you can compare two columns (or lists) and find duplicate values. The formula is written in cell C2 as =VLOOKUP(B1,$A$2:$A$10,1,FALSE). This formula compares the value in B1 with the range A2:A10. If there is no duplicate then an error is displayed as “#N/A”. If a duplicate was found it would display the name of the duplicate.

20. What is a statistical analysis result that a descriptive statistical analysis will not provide?

  • calculation of how widely dispersed the data is
  • visualization of particular data trends occurring
  • calculation of the smallest or largest value
  • predictions made about other data sets that are not in the population

Explanation: Descriptive statistics are used to describe or summarize data in ways that are meaningful and useful. They describe the current or historical state of the observed population but do not allow for comparison of groups, conclusions to be drawn, or predictions to be made about other data sets that are not in the population.

21. Refer to the exhibit. Which option will correctly display the user names and email addresses of users in this table?

  • SELECT User
    WHERE user_id, user_name, user_email
  • SELECT User
    FROM user_id, user_name, user_email
  • SELECT user_id, user_name, user_email
    FROM User
  • SELECT user_id, user_name, user_email
    WHERE User

Explanation: The SELECT command is used to request the specified fields using commas to separate fields.
The FROM command returns the selection from the table name.

Therefore, the correct answer is:
SELECT user_id, user_name, user_email
FROM User 

22. Drag the SQL Data Manipulation Language (DML) statements to the correct description.

Explanation: Place the options in the following order:

ORDER BY used to structure the resulting query in ascending (ASC) order by default
WHERE is a conditional statement used to filter your returned results
FROM returns the selection from the table name specified
SELECT returns the fields specified

23. Refer to the exhibit. A portion of the Movie table from the Movies database is shown. A data analyst is writing a query that will return the title and release date of movies released after 2000. The analyst further needs the list ordered by the release date. Which query will return the required results?

  • SELECT Movie
    FROM Title, Release_date
    WHERE Release_date > ‘2000-12-31’
    ORDER BY Release-date
  • SELECT Title, Release_date
    FROM Release_date > ‘2000-12-31’
    WHERE Movie
    ORDER BY Release-date
  • SELECT Title, Release_date
    FROM Movie
    WHERE Release_date > ‘2000-12-31’
    ORDER BY Release-date
  • SELECT Release_date > ‘2000-12-31’
    FROM Title, Release_date
    WHERE Movie
    ORDER BY Release-date

Explanation: The correct query is:
SELECT Title, Release_date
FROM Movie
WHERE Release_date > ‘2000-12-31’
ORDER BY Release-date

The SELECT command indicates the columns
The FROM command indicates the table
The WHERE command indicates the criteria
The ORDER command indicates that result is ordered by release date

24. Match the type of database to the best description.

Explanation: Place the options in the following order:

relational database it requires a SQL to store, manipulate, and retrieve data
spreadsheet it stores data in a sheet using a tabular format of columns, rows, and cells
flat file database it stores records in a single file with no hierarchical structure

25. What is the purpose of the DML language commands in SQL?

  • to manage updates to the database storage
  • to create, modify or delete objects in the database schema
  • to select, update, insert or delete records and values from a table
  • to change rights and permissions to data in the database

Explanation: The Data Manipulation Language stores, modifies, retrieves, and deletes data in a database table. DML includes the most common SQL commands, including SELECT, INSERT, UPDATE and DELETE.

26. What does a field refer to in database schema terminology?

  • a specific attribute of an individual record
  • a unique record within the database
  • a collection of attributes of an individual item
  • a collection data in rows and columns

Explanation: A field within the database schema refers to a specific attribute of an individual record. Fields are named by column headings. An individual movie title in the Title column would be a field in the movie table.

27. Refer to the exhibit. A data analyst writes a SQL query to extract information from multiple tables in the Movies database. To complete the ON command, what needs to be entered in place of the question mark in this query?

SELECT m.Title, m.Date, r.AuthorName, r.Score, r.Comment, r.Date
FROM Review AS r
JOIN Movie AS m
ON m.Title = ?;
  • r.Date;
  • r.Comment;
  • r.Score;
  • r.MovieTitle;
  • r.ID;

Explanation: To combine two tables, a SQL JOIN can use columns containing the same data type and information in both tables. The Title column in the Movie table includes the movie’s name, and the MovieTitle column in the Review table contains the same information. Even though the Date column appears in both tables, it does not have the same information and, therefore, cannot be used in the JOIN.

28. Refer to the exhibit. Match the JOIN operation with the proper description.

Explanation: Place the options in the following order:

LEFT JOIN Returns all records from Table 1 and only the matched records from Table 2.
FULL JOIN Returns all records from Table 1 and Table 2.
RIGHT JOIN Returns all records from Table 2 and the matched records from Table 1.
INNER JOIN Returns only records where the selected fields have matching values in Table 1 and Table 2.

29. Which two are benefits of organizing information from multiple datasets into a dashboard? (Choose two)

  • The information is easily visualized.
  • The information is easily secured.
  • The information is easily analyzed.
  • The information is easily tracked.

Explanation: There are several benefits to using dashboards to present your data.

  • They give you access to data and reports from multiple data sources on a single screen simplifying data analysis.
  • The gamified and dynamic interface is easy to use and understand.
  • The information is easily visualized and analyzed.

30. What Tableau Public feature is used to consolidate multiple views and visualizations?

  • worksheets
  • objects
  • screens
  • dashboards

Explanation: In Tableau, using a dashboard is a way to display information that consolidates multiple views or visualizations.

31. Why would you use Tableau in your data analysis?

  • To perform mathematical calculations using formulas and built-in functions.
  • To display data in a tabular format.
  • To create visualizations.
  • To communicate with and query databases.

Explanation: Tableau is a business intelligence and data visualization tool. With Tableau you can create great and dynamic visualizations.

32. Match the data analysis tool with the use case.

Explanation: Place the options in the following order:

Excel generating computational solutions using formulas and built-in functions
SQL managing and querying relational databases
Tableau creating meaningful data visualizations and interactive dashboards
Kaggle competing in data science competitions

33. Which statement is true in respect of avoiding bias in data analytics?

  • don’t allow the data to confirm your conclusions
  • avoid selecting only data and methods that support your assumptions
  • use a small sample set that supports your hypothesis
  • focus on the larger patterns and trends and include outliers in the overall analysis

Explanation: Some ways to avoid bias in data analytics include:

  1. Be aware that bias exists. Record your assumptions and hypotheses before beginning your analysis. Avoid selecting only data and methods that support your assumptions.
  2. Validate your data sources and the methodology used to collect the data.
  3. Focus on the larger patterns and trends, remove outliers from the overall analysis in order to investigate them further.
  4. Review your methods and data with others in your team. They may be able to spot bias that you may have overlooked.
  5. Be open-minded and impartial in your analysis. Allow the data to inform your conclusions.

34. What are two methods that ensure confidentiality? (Choose two.)

  • authorization
  • encryption
  • availability
  • nonrepudiation
  • authentication
  • integrity

Explanation: Confidentiality means viewing of information only for those who need to know. This can be accomplished by encrypting data and authenticating users who request access.

35. A data analyst conducting a study stops collecting more data once the evidence starts to support the hypothesis. What type of bias has the analyst introduced into the findings?

  • confirmation
  • selection
  • interpretation
  • information

Explanation: Confirmation bias can occur when an analyst only collects or analyzes data that supports a particular hypothesis.

36. Which type of bias in data analysis can be caused by the influence of outliers?

  • interpretation
  • information
  • confirmation
  • selection

Explanation: Information bias can occur when outliers are present in the data and they are not dealt with appropriately. Outliers can skew the outcomes of analysis and distort the results.

37. Which three features should your data project portfolio contain to ensure that it is considered favorably by employers and recruiters? (Choose three.)

  • it contains many multi-colored charts and graphics that show the results of the data analysis
  • it uses a layout and presentation that showcases your web design and markup skills
  • there are links to all your projects on the homepage
  • each project has a clear outcome or conclusion
  • it is organized and easy to navigate
  • there are links to software, so non-technical reviewers can replicate your analysis

Explanation: A project portfolio should present those data projects you want to showcase in a well-organized and easy-to-navigate format. Non-technical reviewers will probably need to gain the skills or knowledge and the need to use software to replicate your analysis. A data project portfolio should focus on the processes and outcomes of your projects, not your web design and markup skills. Extensive use of multi-colored charts and graphics in your portfolio will detract from the content you want to showcase.

38. Why is the Jupyter Notebook tool useful when developing and testing data analysis software?

  • program output is displayed in the standalone Jupyter Notebook application
  • markdown text explains the operation of SQL queries and displays the results
  • when the Python code is compiled, each step is displayed
  • program execution output is displayed dynamically in real-time

Explanation: Python code is interpreted and not compiled. Jupyter Notebook runs code interactively within a web browser and not in a standalone application. SQL queries are applied to relational databases and not to Jupyter Notebooks. A Notebook’s displayed output results from executed code execution in real time.

39. Which data analytic tool can create interactive documents containing executable program code and markdown text?

  • Excel
  • Jupyter Notebooks
  • Kaggle
  • Tableau

Explanation: Jupyter notebook is a web-based interactive computing platform. It is a valuable tool for data analysts that provides a way to run code interactively within a web browser. It also includes markdown text to explain what the code is doing.

40. What are the three functions of programming languages optimized for data analysis? (Choose three)

  • virtualization of processing hardware
  • data communications across networks
  • statistical analysis
  • data visualization
  • data cleaning
  • fast response time to user input

Explanation: Network communication, hardware virtualization, and fast response times are not functions that programming languages optimized for data analysis are expected to perform.

41. Refer to the exhibit. What is the resulting output from the formula in cell D8?

  • 46
  • 12
  • 7.6
  • 6

Explanation: The COUNT function counts the number of cells containing numeric data in the formula range. In this example, the number of cells in the range D2 through D7 that contains data is 6.

42. Which Microsoft Excel formula would correctly multiply a value in cell A1 with a value in cell A19?

  • =A1*A19
  • =A1xA19
  • =MULTIPLY(A1:A19)
  • =xA1:A19

Explanation: The formula =A1*A19 takes the value in cell A1 and multiplies it with the value in cell A19. The symbol for multiplication in Microsoft Excel is the asterisk (*).

43. What are two plain-text file types that are compatible with numerous applications and use a standard method of representing data records? (Choose two.)

  • JSON
  • DOC
  • PDF
  • XLS
  • XML

Explanation: As data is collected from varying sources and in varying formats, it is beneficial to utilize specific file types that allow easy conversion and universal application support. CSV, JSON, and XML are plain text file types that allow for collecting and analyzing of data in a format that is easily compatible and applicable for analysis.

44. What is an outlier in data analytics?

  • it is a pointer to the sample data set
  • it is a point in inferential data that is numeric
  • it is a value or data point that varies significantly from others in the data set
  • it is a point in descriptive data that is historical

Explanation: An outlier is defined as a value or data point varying significantly from the others, either much smaller or much greater. Outliers can lead to anomalies in the results obtained, because they can lead to negative effects on the results of your analysis. Outliers have to be cleaned up before the data set can be used for effective analysis.

45. What is referenced by the FROM statement in an SQL query?

  • the table containing the data
  • the row where the data is located
  • the database to be queried
  • the column where the data is located

Explanation: The FROM statement in an SQL query specifies the table where data is stored.

46. ou are preparing a presentation that needs a visualization showing the relative levels of coffee production in the coffee regions of the world. Which type of visualization in Tableau is well suited for displaying this type of data?

  • bar chart
  • bubble map
  • heat map
  • area graph

Explanation: The bubble map is useful for comparing proportions over geographic areas. Circles or dots, which are proportional in size to its value in the dataset, are displayed over a designated geographic region.

47. What assurance does data integrity provide?

  • Data is recoverable.
  • Data is unaltered in transit.
  • Data access is restricted.
  • Data is available when requested.

Explanation: Data integrity ensures that data is unaltered in transit. With data integrity there is confidence that the data is accurate, consistent, and trustworthy. Cryptographic hashing functions are used to ensure data integrity.

48. Which feature of Python reduces the coding requirements when data tasks such as exploratory data analysis and machine learning are required?

  • The availability of code libraries such as NumPy and Scikit-learn.
  • Python code based on common English words.
  • It is both an interpreted and object-oriented language.
  • It is free and open source.
  • It uses dynamic data typing.

Explanation: The availability of code libraries such as NumPy and Scikit-learn reduces the coding effort when using Python programs to perform data tasks such as exploratory data analysis and machine learning. All the other options are general features of Python that are not necessarily related to data analysis.

49. What are two types of continuous variables? (Choose two.)

  • discrete
  • nominal
  • ordinal
  • ratio
  • interval

Explanation: Variables are either categorical or numerical. Categorical variables are qualitative and are either nominal or ordinal. Numerical variables are quantitative and are either continuous or discrete. Continuous variables are measured along a range of values and are either interval or ratio variables.

50. What is a pivot table in Microsoft Excel?

  • a table containing data that is organized vertically
  • a table containing data that is organized horizontally
  • a function to investigate formula errors and correct them
  • a function to automatically summarize, analyze, explore, and present data

Explanation: Pivot tables provide a way to automatically summarize, analyze, explore, and present data. Using the built-in tools you can identify trends, make comparisons between data items and create charts in different styles to visualize your data. Pivoting data can help you answer different questions and even experiment with your data to discover new trends and patterns.

51. Which action is taken during the data investigation step of the data analysis lifecycle?

  • Transform the data into a format appropriate for the analysis methods and tools.
  • Determine if there is enough data to complete the analysis.
  • Identify patterns, relationships, and correlations in the data.
  • Determine if the data is complete and relevant for the analysis.

Explanation: In the data investigation step of the lifecycle, data analysts will determine if the data set is complete and contains the relevant information needed for analysis.

Notify of

Inline Feedbacks
View all comments