Data Can Be A Double-edged Sword For Generative AI, Say Government Experts

Federal leaders say streamlining the vast amounts of existing and incoming data inherent to federal operations is central to agencies’ planned use cases for generative artificial intelligence technology, but proper data management is essential to prevent bias and inaccuracies.

“The primary benefit of using GenAI is the ability to analyze very large data sets in government operations and the ability to process that vast amount of data and derive insights,” Chakib Khraiby, chief data scientist at the Commerce Department’s National Technical Information Service, said during an ATARC panel on Thursday.

Internally, agencies are using AI for a variety of uses related to this issue. Conrad Bovell, branch director of the Department of Health and Human Services’ Cybersecurity Advisory and Strategy Division, said researchers at the National Institutes of Health have created an AI tool that uses clinical data to gauge whether a particular immunotherapy drug is effective in treating patients’ cancer.

Nathan Hotaling, a senior data scientist at NIH, proposed another in-house AI application that uses generative AI software to read unstructured data, such as notes stored in PDF documents, and convert it into searchable text and data.

Stephanie Wilson, a contracting officer in the Defense Department’s Chief Digital and Artificial Intelligence Office, said during the panel that the department is using generative AI to do similar work with unstructured data, ultimately aiming to reduce the administrative burden involved, particularly around contract documents, research and policy.

“I think that kind of impact is going to help everyone, no matter what field they’re in, do their job better, because before we had to rely on humans to get contextual information from unstructured data,” Hotaling says, “and now, with Gen AI, we can use that just like we use structured information.”

Terry Carpenter, chief information officer at the National Science Foundation, said the organization, like other agencies, is exploring deploying chatbots that leverage large-scale language models, but is focused on specific AI algorithms that use search and predictive analytics to support customer experiences.

Federal agencies are also interested in applying AI to cybersecurity, and Bovell said the double-edged sword AI brings to the field is that cybersecurity teams can use pattern-recognition algorithms to monitor the data coming into their networks while also fighting AI attacks used by cybercriminals and bad actors.

“AI models can identify patterns that indicate cyber threats such as malware, ransomware or anomalous network traffic, which may include data from existing traditional detection systems,” Bovell said. “Generative AI contributes to more sophisticated analysis and anomaly detection. [security information management] The system also gets stronger as it learns from past security data.”

Kreibi said dealing with some of the raw, unstructured data in government systems runs parallel to cybersecurity challenges.

“I think the challenges in cybersecurity are similar to the challenges we have in terms of data access and management,” he says. “When I talk to cybersecurity professionals, the first thing they complain about is they don’t have the time, they don’t have the bandwidth. That’s where I think generative AI can come in handy.” [generative AI systems] It actually helps with the tedious tasks and basically makes the threat more predictable. [and] You can actually work on them.”

As with other AI applications, these federal use cases hinge on how the algorithms process input data and what outputs they produce. As government agencies continue to deploy AI capabilities to speed and streamline government services, panelists discussed the need to address unresolved security concerns.

Kreibi noted that data contamination, illusions, and instant injections could produce harmful or biased AI output. Carpenter echoed that the lack of standardization of past government data is an open issue that could negatively impact the functioning of AI algorithms.

“Data has been a pain point for us for decades,” Carpenter said. “The work of properly tagging the data, understanding the data, and organizing the data has never been done. We need to do some of that.”

He added that at NSF, the first step is to develop a stronger technical workforce. The agency is helping develop a prompt engineering curriculum to build a holistic understanding of AI and machine learning systems. That includes training everyone in the current workforce on the fundamentals of AI.

“Building these capabilities requires different actors involved in different processes to execute those processes using different tools,” Carpenter says. “IT is not the only party. This further reduces the importance of IT and again elevates the importance of the mission owners, who know the data better than IT people and can interact with it on a regular and more frequent basis.”

Kleibi agreed, reiterating the well-known imperative that humans remain in the loop to ensure that federal AI solutions don’t become black box technologies.

“[In] “When it comes to developing GenAI, and AI in general, the solution has to be a collaborative effort,” Chraibi said. “Panelists spoke about the importance of data quality. It’s not enough to have data scientists alone to make sure the data is complete, consistent, unbiased, etc.; you need input from experts who work with that type of data.”