1.3.2.6 Lab – Demonstrate Spreadsheet Limitations in Data Analysis Answers

1.3.2.6 Lab – Demonstrate Spreadsheet Limitations in Data Analysis (Instructor Version)

Objectives

Observe the limitations of a spreadsheet when it is used for data analysis.

  • Part 1: Explore and Download the Data
  • Part 2: Determine the Root Bridge

Background / Scenario

There are massive amounts of open data available on the Internet for you to download and use to perform analysis. In this lab, you will explore the limitations of spreadsheets in data analysis using an open dataset from the City of Seattle.

Required Resources

  • PC device with high-speed Internet access
  • Spreadsheet program, such as Microsoft Excel, Google Sheet, LibreOffice Calc, Apple Numbers, or OpenOffice Calc

Part 1: Explore and Download the Data

In this part, you will download the dataset Road Weather Information Stations from https://data.seattle.gov.

You will also explore some of the characteristics of this dataset.

a. Navigate to https://data.seattle.gov/Transportation/Road-Weather-Information-Stations/egc4-d24i?.

b. Click View Data.

This dataset lists the road and air temperatures from the sensors embedded on bridges and surface
streets within the city limits of Seattle. The data is updated every fifteen minutes.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 9

How many fields (columns) are in the dataset? How many records (rows)?
6(columns) and 100 rows__

What are the field names?
StationName,DataTime,Recordid, RoadSurfaceTemperature, AirTemperature__

c. Click Export. Click CSV to download this dataset as CSV. The download will take a few minutes. If you are unable to download the file, continue to the next part.

Note: If you were unable to completely download the file, Road_Weather_Information_Stations.csv, search for Road_Weather_Information_Stations.csv.part in your file system. It is mostly in your Download folder. Change the filename to Road_Weather_Information_Stations.csv.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 10

What is the size of the downloaded file? 2.0MB

Part 2: Explore the Limitations of a Spreadsheet

With the downloaded dataset, you will attempt to open it in the spreadsheet program of your choice. If you were unable to download the file, you can review the steps and explore the limitations of spreadsheets without opening the file in a spreadsheet program.

a. Open your spreadsheet program. Microsoft Excel is used in this example.

b. Import the downloaded csv file, Road_Weather_Information_Stations.csv. Click the Get External Data. Click From Text and choose Road_Weather_Information_Stations.csv.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 11

c. The Text Import Wizard displays. In step 1 of 3, click Next to continue.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 12

d. In step 2 of 3, select Comma as the delimiter and unselect Tab. Click Next to continue.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 13

e. In step 3 of 3, click Finish to continue.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 14

f. Click OK to import the data.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 15

What is the warning message you received?
The size of data is very large than one worksheet can fit it.

1.3.2.6 Lab - Demonstrate Spreadsheet Limitations in Data Analysis Answers 16

How will this limitation impede your analysis of the data?
Need many discrete worksheet to analysis a data and it’s difficult on big data.

g. Open a new spreadsheet. Try to reach the maximum number of rows in the spreadsheet. What is the maximum number of rows in the spreadsheet of your choice?
1048576 in excel 2019.

Compare the maximum number of rows in the spreadsheet to the number of records in the Road Weather Information Stations dataset from https://data.seattle.gov. Using this observation, explain the limitation of the spreadsheet.

Reflection

If the popular spreadsheet programs cannot handle larger data sets, what data analysis tools are available? Use the Internet to search for some possible tools.
Microsoft HDInsight: is a Spark and Hadoop service in the cloud. It provides big data cloud offerings in two categories, Standard and Premium. It provides an enterprise-scale cluster for the organization to run their big data workloads.

Spark: Apache Spark is a powerful open source big data analytics tool. It offers over 80 high-level operators that make it easy to build parallel apps. It is used at a wide range of organizations to process large datasets.

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x