Big Data & Analytics Chapter 1 Quiz Answers – Data and the Internet of Things

1. Which term describes the growth rate of data in the IoT?

  • linear
  • cyclical
  • uniform
  • exponential

Explanation: Current trends and forecasts all indicate that data today is growing exponentially.

2. What are two examples of unstructured data? (Choose two.)

  • video content
  • user account data
  • SQL queries
  • blog entry
  • customer account spreadsheet

Explanation: Unstructured data is raw data, data that is not organized in a predefined way. Examples of unstructured data would be contents of photos, audio, video, web pages, blogs, books, journals, and white papers.

3. What is a characteristic of open data?

  • data that lacks intellectual property restrictions
  • data that lacks predefined organization
  • data that does not need to be stored
  • data that does not generate new knowledge

Explanation: Open data is not protected by intellectual property restrictions and can be used and redistributed without legal, technical, or social restrictions.

4. What is Hadoop?

  • a method of preventing loops when analyzing Big Data
  • a groundbreaking method of moving large amounts of data through micro loops
  • a framework that allows distributed processing of data across clusters of computers
  • a method of sharing data across multiple companies using computing resources housed within each respective company

Explanation: Data management and analysis today are characterized by the use of flat file databases, relational database management system (RDBMS), and the Hadoop framework that allows distributed processing of data across clusters of computers using simple programming models.
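Hadoop's distributed processing follows the MapReduce programming model. A minimal pure-Python sketch of that model (outside Hadoop itself, which runs the same phases across a cluster): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key, then sum each group's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

# In Hadoop these records would be split across cluster nodes.
records = ["big data big clusters", "data across clusters"]
counts = reduce_phase(map_phase(records))
print(counts["big"], counts["data"], counts["clusters"])  # 2 2 2
```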

5. Which statement describes the paradigm that is promoted in the Cisco Fog Computing Model?

  • All data analysis and decision making should take place near the data source.
  • Some data analysis should take place at the edge of infrastructure rather than at a central location.
  • Data generated by edge devices should be sent to the nearest regional data analysis center for data aggregation.
  • Data collected at the edge of the infrastructure should be stored in a central data center for security and backup operations.

Explanation: The Cisco Fog Computing Model states that some of the analysis work should take place at the network edge instead of at a centralized location. Sensors and controllers at the edge can make smart decisions based on data collected locally, which facilitates faster response and action.

6. In the data analysis process, which sequence depicts the work flow suitable for data at rest?

  • act > analyze > store > notify
  • analyze > notify > act > store
  • notify > store > act > analyze
  • store > analyze > notify > act

Explanation: Data at rest is static data that is stored in a database first and then analyzed and interpreted. Data at rest follows the traditional analysis flow of store > analyze > notify > act. Once the data is analyzed, decision makers are notified and determine whether action is needed.
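The store > analyze > notify > act sequence can be sketched as four stages. This is an illustrative toy, assuming an in-memory list as the "database" and a made-up temperature threshold, not any specific product.

```python
database = []  # stands in for a real data store holding data at rest

def store(reading):
    database.append(reading)

def analyze():
    # Analysis happens only after the data is at rest in storage.
    return sum(database) / len(database)

def notify(average, threshold=30.0):
    # Decision makers are notified when the analysis crosses a threshold.
    return average > threshold

def act(alert):
    return "open cooling valve" if alert else "no action"

for temperature in (28.5, 31.2, 33.7):
    store(temperature)

result = act(notify(analyze()))
print(result)  # open cooling valve
```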

7. What has contributed to the exponential growth in data generation?

  • the increasing number of mobile devices
  • the increasing number of standalone devices
  • the increasing number of isolated software applications
  • the increasing number of physical installations for protecting environment facilities

Explanation: An increased number of sensors and other end devices as well as mobile devices are contributing to an exponential growth in data generation.

8. What is a characteristic of structured data?

  • Structured data is subject to intellectual property restrictions.
  • It has a predefined organization.
  • It is raw data.
  • It generates new knowledge.

Explanation: Structured data is data that is organized in a predefined way so that it can be entered, classified, and queried by a computer. Data found in databases and spreadsheets is an example of structured data.
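Because structured data has a predefined organization, it can be queried programmatically. A minimal sketch using a spreadsheet-like list of rows with fixed columns (the field names and values are invented for illustration):

```python
# Each row has the same predefined fields, as in a spreadsheet or database table.
accounts = [
    {"id": 1, "name": "Ana", "balance": 250.0},
    {"id": 2, "name": "Ben", "balance": 975.5},
    {"id": 3, "name": "Caro", "balance": 40.0},
]

# The fixed schema makes classifying and querying the rows straightforward.
high_balance = [row["name"] for row in accounts if row["balance"] > 100]
print(high_balance)  # ['Ana', 'Ben']
```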

9. What is true of Big Data in comparison to traditional data?

  • Traditional data is represented through binary strings, whereas Big Data is represented through hexadecimal strings.
  • Both types of data require the same hardware for processing and storage.
  • Big Data requires a different approach to analysis, computing, and storage mechanisms.
  • Big Data means that the data sets are being sent through the network in larger packets than the sets that contain legacy data.

Explanation: Scale defines the difference between Big Data and the data that existed before the term Big Data existed. Because of the increased volume and variety of data, Big Data requires a different approach to data analysis, computing, and storage, and different hardware and applications are required to handle the quantity of data produced. Both types, however, are still represented as binary strings, and there is no difference between the packets that carry Big Data and those that carry other data.

10. How do sensors relate to Big Data?

  • They are types of multimedia applications that are sources for Big Data.
  • They are devices that collectively generate large amounts of data.
  • They are devices that can only be used with static data.
  • They produce structured data.

Explanation: The use of sensors in IoT systems is growing exponentially. Each sensor has a multiplicative effect on the amount of data generated. Sensors are quickly becoming the greatest contributors toward Big Data.

11. Which two statements describe characteristics of data in motion? (Choose two.)

  • Its value changes over time.
  • It is stored at a central data center.
  • It requires real-time processing close to the source.
  • It is the data in RAM during a data analysis process.
  • It is stored in removable devices for easy transportation.

Explanation: Data in motion is distributed among different locations, needs to be analyzed close to the source, and has a value that changes dynamically over time.

12. Refer to the exhibit. To remain competitive, a company has progressed from on-premise technology to the cloud environment. What technology environment would a manager need to consider to accommodate long-term storage and immediate analysis of data in motion?

  • an analytic model
  • a hybrid model
  • the fog model
  • a cloud model will accomplish both requirements
  • on-premise clouds

Explanation: A manager should consider a hybrid option that includes cloud computing for long-term storage of data and fog computing for immediate access to streaming data. Immediate access to the data at the company edge allows rapid, timely analysis for time-sensitive applications.

13. When is data considered to be information?

  • when it is stored
  • when it is recorded
  • when it is processed and analyzed
  • when it is generated

Explanation: Data that has been processed, organized, analyzed, or presented in a meaningful way becomes information.
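The data-versus-information distinction can be sketched in code: raw readings are just data, while a processed, meaningful summary is information. The readings and the summary format are invented for illustration.

```python
# Raw data: individual sensor readings with no inherent meaning.
readings = [21.4, 21.9, 22.1, 23.0, 22.6]

# Processing and analysis turn the data into information a person can act on.
average = sum(readings) / len(readings)
trend = "rising" if readings[-1] > readings[0] else "falling"
information = f"average temperature {average:.1f} °C, trend {trend}"
print(information)  # average temperature 22.2 °C, trend rising
```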

14. Which characteristic of big data describes different types of datasets that include both structured and unstructured data?

  • velocity
  • volume
  • variety
  • veracity

Explanation: The characteristics of big data can be described in four Vs:

  • Volume – the amount of data being transported and stored
  • Velocity – the rate at which this data is generated
  • Variety – the different types of data both structured and unstructured: video, audio, text
  • Veracity – the process of preventing inaccurate data from spoiling the data sets

15. What is an example of data in motion?

  • recording road traffic volumes and patterns for future highway planning
  • medical information being transmitted from an ambulance to emergency department staff as a critically ill patient is being transported to the hospital
  • hourly weather information being collected in preparation for the next day's weather forecast for a specific location
  • collecting sales and transaction records in preparation for a monthly sales report from sales consultants as they travel between customers

Explanation: Data in motion is dynamic data that requires real-time processing before the data becomes obsolete. It represents the continuous interactions between people, processes, data, and things. In this example, the real-time medical information enables the emergency staff to be appropriately prepared before the patient arrives at the hospital.
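Real-time processing of data in motion can be sketched with a generator that evaluates each reading the moment it arrives, instead of storing it first. The telemetry values and the heart-rate threshold are invented for illustration.

```python
def vitals_stream():
    # Stands in for a live telemetry feed from the ambulance.
    yield from [72, 75, 132, 70]

def monitor(stream, limit=120):
    # Analyze each value as it arrives; act before anything is stored.
    alerts = []
    for heart_rate in stream:
        if heart_rate > limit:
            alerts.append(f"alert: heart rate {heart_rate}")
    return alerts

alerts = monitor(vitals_stream())
print(alerts)  # ['alert: heart rate 132']
```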

16. Which type of information is captured and stored as events happen?

  • critical
  • analytical
  • comparative
  • transactional

Explanation: The two primary types of business information useful to a company are transactional information and analytical information. Transactional information is captured and stored as events happen. Transactional information can be used to analyze daily sales reports and production schedules to determine how much inventory to carry. Analytical information supports managerial analysis tasks like determining whether the organization should build a new manufacturing plant or hire additional sales personnel.

17. Which method does openPDS use to protect user privacy of GPS records on a mobile device?

  • requiring authentication to be completed first
  • encrypting the communication from data to the app
  • providing answers to specific queries instead of raw data
  • removing identifiable personal information before sending data to the app

Explanation: Using the SafeAnswers framework, openPDS provides only answers to specific queries and no raw data is sent. The calculation for the answer is done within the personal data store (PDS) of the user.
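The SafeAnswers idea, answering a specific query locally instead of releasing raw records, can be sketched as follows. The function name, the record layout, and the "did the user visit this area?" query are hypothetical illustrations, not the openPDS API.

```python
# Raw GPS records never leave the personal data store (PDS).
gps_records = [
    {"lat": 40.7128, "lon": -74.0060},
    {"lat": 40.7484, "lon": -73.9857},
]

def answer_query(records, lat_min, lat_max, lon_min, lon_max):
    # The computation runs inside the PDS; only a yes/no answer is returned.
    return any(lat_min <= r["lat"] <= lat_max and lon_min <= r["lon"] <= lon_max
               for r in records)

# The app receives a single boolean, not the coordinates themselves.
visited = answer_query(gps_records, 40.70, 40.72, -74.02, -74.00)
print(visited)  # True
```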

18. What are three examples of a NoSQL database? (Choose three.)

  • Ceph
  • HDFS
  • Redis
  • MongoDB
  • GlusterFS
  • Apache Cassandra

Explanation: MongoDB, Apache Cassandra, and Redis are examples of a NoSQL database. The Hadoop Distributed File System (HDFS), Ceph, and GlusterFS are examples of distributed file systems (DFS).
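NoSQL document stores such as MongoDB keep schemaless documents under keys rather than rows in fixed tables. A minimal in-memory sketch of that idea (a toy class, not the MongoDB or Redis API):

```python
class DocumentStore:
    """Toy key/document store; real NoSQL systems add persistence,
    replication, and distribution across nodes."""
    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Documents need not share a schema, unlike rows in an RDBMS table.
        self._docs[doc_id] = document

    def find(self, **criteria):
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

store = DocumentStore()
store.insert("s1", {"type": "sensor", "unit": "C", "value": 21.5})
store.insert("s2", {"type": "sensor", "value": 19.0})  # different fields: fine
store.insert("u1", {"type": "user", "name": "Ana"})
print(len(store.find(type="sensor")))  # 2
```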

19. What is a purpose of applying a data anonymization process to data sets?

  • to compress the data sets
  • to reduce the size of the data sets
  • to remove identifiable personal information
  • to adjust the value length of certain data fields

Explanation: Data anonymization is a process of either encrypting or removing identifiable personal information from data sets to achieve privacy protection.
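A minimal sketch of anonymization by removing or hashing identifiable fields. The field names are invented for illustration, and real anonymization also has to consider re-identification risk across combined fields.

```python
import hashlib

def anonymize(record, drop=("name", "email"), pseudonymize=("user_id",)):
    # Remove directly identifying fields and replace stable IDs with hashes.
    clean = {k: v for k, v in record.items() if k not in drop}
    for field in pseudonymize:
        if field in clean:
            clean[field] = hashlib.sha256(str(clean[field]).encode()).hexdigest()[:12]
    return clean

record = {"user_id": 42, "name": "Ana", "email": "ana@example.com", "steps": 8450}
safe = anonymize(record)
print(sorted(safe))  # ['steps', 'user_id']
```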

20. A multi-campus school wants to perform analytics on classes held during the past 5 years. The school wants to know which classes filled up the quickest across all campuses and which classes filled up the quickest at each campus. The school also wants to know if there is a relationship between the number of passing students and the speed with which a class taught by a particular teacher fills. If the school could only choose one type of database to store the data on one server, which type would be best suited for this task?

  • flat
  • local
  • Hadoop
  • relational

Explanation: A relational database, even though it has multiple, connected tables, can reside on one server and would be best for this type of data. A local database is typically used to collect and store local data, for example, a database of all movies and music for a particular family. A flat database would most likely not be used in a multi-location school to store student data such as this. Hadoop is best to use when distributing processing power across server clusters.
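The school's cross-campus questions map naturally onto related tables joined by a key. A minimal sketch using Python's built-in sqlite3 module; the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in practice, one database on one server
conn.executescript("""
    CREATE TABLE campuses (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classes (
        id INTEGER PRIMARY KEY, campus_id INTEGER, title TEXT,
        days_to_fill INTEGER, passing_students INTEGER,
        FOREIGN KEY (campus_id) REFERENCES campuses (id)
    );
""")
conn.executemany("INSERT INTO campuses VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO classes VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "Biology", 3, 28), (2, 1, "History", 9, 22),
                  (3, 2, "Biology", 2, 30)])

# Fastest-filling class per campus: one query over the related tables.
rows = conn.execute("""
    SELECT c.name, cl.title, MIN(cl.days_to_fill)
    FROM classes AS cl JOIN campuses AS c ON cl.campus_id = c.id
    GROUP BY c.id
""").fetchall()
print(rows)  # [('North', 'Biology', 3), ('South', 'Biology', 2)]
```

Selecting `cl.title` alongside `MIN(...)` relies on SQLite's documented bare-column behavior, where the non-aggregated column comes from the row that produced the minimum.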

21. What are two key components in creating data analysis tools from scratch? (Choose two.)

  • coding
  • modeling
  • data sets
  • performance
  • program length

Explanation: Modeling and coding are the two key components in the process of creating data analysis tools from scratch. Modeling consists of deciding what to do with the data to achieve the desired results and conclusions. A well-developed model can be used to handle multiple types of data sets. The code is the program that implements the model and processes the data according to the model already developed. The length and performance are factors and features of a program.
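The model/code split can be sketched as follows: the model decides what to do with the data (here, flag values far from the mean; the rule and threshold are invented), and the code implements it so the same model handles multiple data sets.

```python
def outlier_model(values, tolerance=1.5):
    """Model: a value is an outlier if it is more than `tolerance`
    standard deviations from the mean. The code below implements it."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [v for v in values if std and abs(v - mean) > tolerance * std]

# The same model handles multiple types of data sets.
print(outlier_model([10, 11, 9, 10, 50]))       # temperature-like data -> [50]
print(outlier_model([1.2, 1.21, 1.19, 1.2]))    # voltage-like data -> []
```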

22. Which statement describes SQLite?

  • It is an example of a flat file database.
  • It is an embedded SQL database engine.
  • It is a free version of RDBMS suitable for enterprises.
  • It is a fully functional RDBMS for distributed data processing.

Explanation: SQLite is an embedded SQL database engine; it does not follow the traditional client/server model of a SQL relational database management system (RDBMS). SQLite reads and writes directly to ordinary disk files.
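"Embedded" means the database engine runs inside the application process and the whole database is an ordinary file on disk. A minimal sketch using Python's built-in sqlite3 module (the file name and table contents are invented for illustration):

```python
import os
import sqlite3
import tempfile

# The entire database is a single ordinary disk file; no server process runs.
path = os.path.join(tempfile.mkdtemp(), "quiz.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE answers (question INTEGER, choice TEXT)")
conn.execute("INSERT INTO answers VALUES (22, 'embedded SQL database engine')")
conn.commit()
conn.close()

# Reopening the file shows SQLite read and wrote it directly, with no client/server step.
conn = sqlite3.connect(path)
row = conn.execute("SELECT choice FROM answers WHERE question = 22").fetchone()
print(row[0])  # embedded SQL database engine
```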
