July 1, 2021 / Mwangi Ndonga

Some Big Data Lingo That You Should Know

Big Data is the term for data sets that are so large they can’t be analyzed or used in the traditional way. Imagine, for example, trying to work with a spreadsheet that has more than 100,000 rows and 40 columns, and you’ll begin to understand why special tools and techniques are necessary to manipulate Big Data. In addition to numbers and text, such data can encompass pixels, images, videos, or vectors. The challenges of Big Data include not only the volume of data, which is often measured in terabytes or more, but also its velocity, or the speed with which the data is generated and processed.

If a solution to your problem requires Big Data, you will need the assistance of your IT team. Below are explanations of some common database-related terms, as well as examples of my collaborations with data professionals, that can help prepare you to discuss Big Data with IT professionals. The list is not exhaustive.

A relational database organizes data into tables whose records are linked through predefined relationships. For a business, a relational database may include customer names, orders, addresses, phone numbers, total sales, product names, and UPCs. I have used relational databases that contain operational production information and readings from four-gas monitors to better understand how an increase in production volume affects exposures at the lower explosive limit. To understand the types of data in the relational database, I requested a preview of the data that would feed my exploratory analysis.
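To make "predefined relationships" and "a preview of the data" more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a real database. The table names, column names, and values are hypothetical and chosen only for illustration; they are not the schema described above.

```python
import sqlite3

# In-memory database used purely as a stand-in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relationships

# Two hypothetical tables: each gas reading belongs to one production shift.
conn.executescript("""
CREATE TABLE production_shift (
    shift_id       INTEGER PRIMARY KEY,
    shift_date     TEXT NOT NULL,
    units_produced INTEGER NOT NULL
);
CREATE TABLE gas_reading (
    reading_id INTEGER PRIMARY KEY,
    shift_id   INTEGER NOT NULL REFERENCES production_shift(shift_id),
    pct_lel    REAL NOT NULL
);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO production_shift VALUES (?, ?, ?)",
    [(1, "2021-06-01", 1200), (2, "2021-06-02", 1850)],
)
conn.executemany(
    "INSERT INTO gas_reading VALUES (?, ?, ?)",
    [(1, 1, 2.0), (2, 1, 3.0), (3, 2, 6.5)],
)


def preview(connection, table, n=5):
    """Return the column names and the first n rows of a table.

    The table name is inserted into the query text directly, so pass
    only table names you trust.
    """
    cursor = connection.execute(f"SELECT * FROM {table} LIMIT ?", (n,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


columns, rows = preview(conn, "gas_reading", n=2)
print(columns)
print(rows)
```

A preview like this shows the column names and a handful of rows, which is usually enough to judge what types of data are available before starting an analysis.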

A data warehouse is a type of data management system designed to enable and support business intelligence activities, especially analytics. Many organizations have data warehouses. If your company uses dashboards or other tools that are updated in real time, they are likely querying the data warehouse. In the example I gave above, the relational database I was previewing was the data warehouse.

You may be able to point tools like Power BI or Tableau to your data warehouse so that visualizations of data are always current. Sometimes, however, software platforms such as incident management systems are unable to integrate with other systems. If you can extract the information you need from such software and add it to the data warehouse, you can blend information to your liking.

A data mart is a subset of the data warehouse and is usually curated for a specific business line or function. If there are certain data, such as production rates or sales figures, that you use constantly, request that a data mart be constructed to suit your needs. Because a data mart holds only the data a particular group needs, it is typically faster and more cost-effective to work with than the full data warehouse. I was fortunate that the real-time data from the four-gas monitors was being downloaded to our EHS-specific data mart.

In a flat file, records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. Comma-separated values (CSV) files are examples of flat files. Such files have the advantage of simplicity of storage and ease of use.
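As a small illustration of how a flat file behaves, the sketch below reads CSV text with Python's standard csv module. The column names and values are invented for the example, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# Hypothetical CSV content; io.StringIO lets us treat a string like a file.
flat_file = io.StringIO(
    "reading_id,shift_date,pct_lel\n"
    "1,2021-06-01,2.0\n"
    "2,2021-06-01,3.0\n"
    "3,2021-06-02,6.5\n"
)

# Every record has the same fields, taken from the header row.
records = list(csv.DictReader(flat_file))

# A flat file carries no data types or relationships: every value is text,
# so numbers must be converted before they can be used in calculations.
max_lel = max(float(record["pct_lel"]) for record in records)

print(len(records), "records; highest reading:", max_lel)
```

The simplicity cuts both ways: the file is easy to store and open, but any typing, indexing, or linking to other data is left to whoever reads it.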

Structured Query Language (SQL) is the standard language used to query and manipulate data in relational databases. If you think you’ll be spending a lot of time sifting through databases, then reduce your reliance on the database team by learning SQL. SQL is intuitive and one of the easier languages to learn. Although I’ve learned basic SQL, I haven’t had the chance to query databases frequently. However, if you are familiar with R or Python, you may be able to connect to databases and use SQL within your development environment.
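For readers who work in Python, the following sketch shows one way SQL can be run from within a script. It uses the built-in sqlite3 module and a throwaway in-memory database with hypothetical tables and made-up values; connecting to a company database would require a different driver and credentials from your IT team.

```python
import sqlite3

# Throwaway in-memory database with hypothetical tables and made-up values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production_shift (shift_id INTEGER PRIMARY KEY, units_produced INTEGER);
CREATE TABLE gas_reading (shift_id INTEGER, pct_lel REAL);
INSERT INTO production_shift VALUES (1, 1200), (2, 1850);
INSERT INTO gas_reading VALUES (1, 2.0), (1, 3.0), (2, 6.0), (2, 7.0);
""")

# Join the two tables on their shared key, then average the readings per shift.
query = """
SELECT p.shift_id,
       p.units_produced,
       AVG(g.pct_lel) AS avg_pct_lel
FROM production_shift AS p
JOIN gas_reading AS g ON g.shift_id = p.shift_id
GROUP BY p.shift_id, p.units_produced
ORDER BY p.shift_id
"""

results = conn.execute(query).fetchall()
for shift_id, units_produced, avg_pct_lel in results:
    print(shift_id, units_produced, avg_pct_lel)
```

The query itself is ordinary SQL; Python only sends it to the database and receives the rows, which can then feed whatever analysis or plotting tools you already use.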

An application programming interface (API) enables companies to open their applications’ data and functionality to third-party developers, business partners, and their own internal departments. APIs are widely available and often simple to use; federal agencies like OSHA, EPA, and the National Oceanic and Atmospheric Administration (NOAA) have APIs. You can make API calls to many data sources to inform your decision-making. For example, you can place calls to the NOAA API to integrate climate and weather data with your own.
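The sketch below gives a rough idea of what placing an API call looks like in Python using only the standard library. The endpoint URL, parameter names, and token header reflect my understanding of NOAA's Climate Data Online web services and are assumptions to verify against NOAA's current documentation; the token and station identifier are placeholders, and the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint for NOAA Climate Data Online; confirm in NOAA's documentation.
BASE_URL = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"


def build_request(token, station_id, start_date, end_date):
    """Assemble (but do not send) a request for daily weather observations."""
    params = urlencode({
        "datasetid": "GHCND",      # assumed identifier for daily summaries
        "stationid": station_id,
        "startdate": start_date,
        "enddate": end_date,
        "limit": 25,
    })
    # The access token is assumed to travel in a request header named "token".
    return Request(f"{BASE_URL}?{params}", headers={"token": token})


def fetch_json(request):
    """Send the request and parse the JSON response (needs network access)."""
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholders: substitute your own token and a real station identifier.
request = build_request("YOUR_TOKEN", "GHCND:YOUR_STATION", "2021-06-01", "2021-06-07")
print(request.full_url)
# data = fetch_json(request)  # uncomment once real values are filled in
```

Whatever the data source, the pattern is similar: build a URL with the parameters you want, attach any required credentials, send the request, and parse the response into a form you can combine with your own data.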

If you’re curious about where to go next in your data journey, consider joining AIHA’s Big Data and Sensor Technology Content Advisory Group, Technology Initiatives Strategic Advisory Group, or Big Data Technical Framework Group. For a primer on Big Data, read “Predictive Purposes” in The Synergist.

Mwangi Ndonga, CIH, CSP, CHMM, is the senior health and safety hygienist at Ball Corporation and is a charter member of the AIHA Technology Initiatives Strategic Advisory Group.

