May 21, 2024

How Good Are Chatbots at Summarizing OEHS Regulations?

By Ed Rutkowski

Since ChatGPT was introduced in November 2022, occupational and environmental health and safety professionals have sought ways to use chatbots for OEHS purposes. A short “pop-up” session held May 20 at AIHA Connect 2024 described one potential practical application of the technology: to read and summarize OEHS regulations.

Benjamin Roberts, PhD, MPH, CIH, a supervising risk scientist at Benchmark Risk Group, began his presentation with an overview of how chatbots work. He explained that everything typed into a chatbot is translated into numbers, a process known as “tokenization.” This process allows the chatbot to perform nuanced analysis of words in context, with particular attention paid to adjacent words.
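To make the idea of translating text into numbers concrete, here is a toy sketch in Python. It is an illustration only, not the method any real chatbot uses: production systems split text into subword pieces rather than whole words, and the function names `build_vocab` and `encode` are hypothetical.

```python
def build_vocab(texts):
    """Assign each distinct lowercase word an integer ID in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unknown_id=-1):
    """Turn a string into a list of integer IDs; words not in the vocabulary get unknown_id."""
    return [vocab.get(word, unknown_id) for word in text.lower().split()]


# Hypothetical training text, borrowed from the kind of question the researchers asked.
vocab = build_vocab(["how long can employees be exposed to noise"])
ids = encode("How long can employees be exposed", vocab)
print(ids)
```

The model never sees the words themselves, only the sequence of IDs, which is why a word missing from the vocabulary (such as "silica" in this toy example) has to be handled separately.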

The engine behind a chatbot is what’s known as a “large language model,” or LLM. These models are trained on huge amounts of data. Through this iterative training, chatbots have “learned” to predict sequences of words, but when their predictions are wrong, it’s hard to understand why, Roberts said. He cautioned his audience not to become enchanted with the technology’s ability to produce seemingly intelligent text. “These models don’t know anything,” he said. “They’re not thinking.” He suggested that people consider chatbots as a kind of enhanced autocorrect.

He also warned that chatbots can be manipulated. “You can actually bully ChatGPT into giving you an answer it probably shouldn’t,” he said, citing instructions for making dangerous chemicals as an example.

To test chatbots’ usefulness for interpreting OEHS regulations, Roberts and other researchers uploaded PDF copies of OSHA’s noise and silica standards into several tools, including ChatGPT 3.5, which was the most recent iteration of the tool at the time. They also tested several versions of Llama, a family of language models created by Meta, the company that owns Facebook. The research team prepared questions about both standards, such as “How long can employees be exposed to 95 decibels of noise?” The chatbots’ answers to these questions were recorded, and four reviewers graded the answers on a four-point scale.

ChatGPT performed the best, with an average reviewer score of 3.15 on questions about the noise standard and 3.64 on questions about the silica standard. By comparison, the versions of Llama scored from 1.67 to 2.27 on noise questions and 2 to 2.86 on silica questions.

Still, the researchers noted some oddities in ChatGPT’s responses. In one instance, ChatGPT replied to the question about 95 decibels with a lengthy response that didn’t provide an answer. (Roberts noted that according to the OSHA noise standard, exposure to 95 dBA should not exceed four hours.)
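The four-hour figure can be checked with a short calculation. The sketch below assumes the permissible-duration formula that accompanies the tables in the OSHA noise standard, T = 8 / 2^((L - 90) / 5), which reflects a 90 dBA criterion level and a 5 dB exchange rate. The function name `permissible_hours` is hypothetical, and the code is an illustration, not a compliance tool.

```python
def permissible_hours(level_dba, criterion_dba=90.0, exchange_rate_db=5.0, criterion_hours=8.0):
    """Permissible daily exposure duration in hours at a given A-weighted sound level.

    Assumes the formula T = 8 / 2 ** ((L - 90) / 5): the allowed time halves
    for every 5 dB increase above the 90 dBA criterion level.
    """
    return criterion_hours / 2 ** ((level_dba - criterion_dba) / exchange_rate_db)


for level in (90, 95, 100):
    print(level, "dBA:", permissible_hours(level), "hours")
```

At 95 dBA the exponent is 1, so the eight-hour allowance is halved once, which matches the answer Roberts cited.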

The results of the testing illustrated two commonly reported problems with chatbots: they aren’t good at even basic math, and they often present answers with unwarranted certainty. “Even though they struggle with synthesis, they’re very confident that they’re right,” Roberts said. Someone unfamiliar with the regulations could be lulled into believing that the chatbot is answering a question accurately. Roberts also noted that remotely hosted chatbots seemed to do better overall than locally hosted ones, but sending OEHS data to remote chatbots raises privacy concerns.

Roberts suggested that large organizations may benefit from developing their own internal LLMs. For small and medium-sized businesses, the usefulness of LLMs will depend on their individual data needs.

While chatbots have the potential to be useful for specific OEHS applications, Roberts concluded that users need to proceed cautiously. “Don’t rely on these things to do your thinking for you,” he said.

Ed Rutkowski is editor in chief of The Synergist.
