184.108.40.206 Lab – Demonstrate Spreadsheet Limitations in Data Analysis (Instructor Version)
Observe the limitations of a spreadsheet when it is used for data analysis.
- Part 1: Explore and Download the Data
- Part 2: Determine the Root Bridge
Background / Scenario
There are massive amounts of open data available on the Internet for you to download and use to perform analysis. In this lab, you will explore the limitations of spreadsheets in data analysis using an open dataset from the City of Seattle.
- PC device with high-speed Internet access
- Spreadsheet program, such as Microsoft Excel, Google Sheet, LibreOffice Calc, Apple Numbers, or OpenOffice Calc
Part 1: Explore and Download the Data
In this part, you will download the dataset Road Weather Information Stations from https://data.seattle.gov.
You will also explore some of the characteristics of this dataset.
a. Navigate to https://data.seattle.gov/Transportation/Road-Weather-Information-Stations/egc4-d24i?.
b. Click View Data.
This dataset lists the road and air temperatures from the sensors embedded on bridges and surface
streets within the city limits of Seattle. The data is updated every fifteen minutes.
How many fields (columns) are in the dataset? How many records (rows)?
6(columns) and 100 rows__
What are the field names?
StationName,DataTime,Recordid, RoadSurfaceTemperature, AirTemperature__
c. Click Export. Click CSV to download this dataset as CSV. The download will take a few minutes. If you are unable to download the file, continue to the next part.
Note: If you were unable to completely download the file, Road_Weather_Information_Stations.csv, search for Road_Weather_Information_Stations.csv.part in your file system. It is mostly in your Download folder. Change the filename to Road_Weather_Information_Stations.csv.
What is the size of the downloaded file? 2.0MB
Part 2: Explore the Limitations of a Spreadsheet
With the downloaded dataset, you will attempt to open it in the spreadsheet program of your choice. If you were unable to download the file, you can review the steps and explore the limitations of spreadsheets without opening the file in a spreadsheet program.
a. Open your spreadsheet program. Microsoft Excel is used in this example.
b. Import the downloaded csv file, Road_Weather_Information_Stations.csv. Click the Get External Data. Click From Text and choose Road_Weather_Information_Stations.csv.
c. The Text Import Wizard displays. In step 1 of 3, click Next to continue.
d. In step 2 of 3, select Comma as the delimiter and unselect Tab. Click Next to continue.
e. In step 3 of 3, click Finish to continue.
f. Click OK to import the data.
What is the warning message you received?
The size of data is very large than one worksheet can fit it.
How will this limitation impede your analysis of the data?
Need many discrete worksheet to analysis a data and it’s difficult on big data.
g. Open a new spreadsheet. Try to reach the maximum number of rows in the spreadsheet. What is the maximum number of rows in the spreadsheet of your choice?
1048576 in excel 2019.
Compare the maximum number of rows in the spreadsheet to the number of records in the Road Weather Information Stations dataset from https://data.seattle.gov. Using this observation, explain the limitation of the spreadsheet.
If the popular spreadsheet programs cannot handle larger data sets, what data analysis tools are available? Use the Internet to search for some possible tools.
Microsoft HDInsight: is a Spark and Hadoop service in the cloud. It provides big data cloud offerings in two categories, Standard and Premium. It provides an enterprise-scale cluster for the organization to run their big data workloads.
Spark: Apache Spark is a powerful open source big data analytics tool. It offers over 80 high-level operators that make it easy to build parallel apps. It is used at a wide range of organizations to process large datasets.