
Last week, a federal judge issued a sweeping 223-page opinion sharply criticizing the Department of Homeland Security for the way it conducted raids targeting undocumented immigrants in Chicago. While the ruling primarily focused on systemic misconduct and procedural failures, one particularly striking revelation appeared almost in passing. Tucked away in a footnote were two sentences disclosing that at least one law enforcement officer had used ChatGPT to draft an official use-of-force report, a document meant to provide an accurate, factual account of how force was applied against an individual.
The ruling, authored by U.S. District Judge Sara Ellis, took direct aim at the conduct of Immigration and Customs Enforcement (ICE) and other participating agencies during what they labeled “Operation Midway Blitz.” That operation resulted in the arrests of more than 3,300 people, with over 600 individuals ultimately placed into ICE custody. According to the court, the operation was marked by repeated confrontations with protesters, bystanders, and local residents, some of which escalated into violent encounters.
Under standard procedure, each of those incidents was supposed to be carefully documented through formal use-of-force reports. But Judge Ellis found significant discrepancies between what officers recorded on their body-worn cameras and what later appeared in the written reports. In many cases, the written accounts simply didn’t match the video evidence. As a result, Ellis concluded that the reports could not be trusted as accurate accounts of what actually occurred.
More troubling still was the revelation that at least one of those reports was not written by an officer at all. According to Judge Ellis’s footnote, body-camera footage showed that an agent “asked ChatGPT to compile a narrative for a report” using only a brief summary of an encounter along with several images. The officer then allegedly submitted the AI-generated output as an official report—despite the extremely limited information provided to the chatbot and the likelihood that it filled in missing details with assumptions.
Judge Ellis explicitly warned about the implications of such practices, writing:
“To the extent that agents use ChatGPT to create their use of force reports, this further undermines their credibility and may explain the inaccuracy of these reports when viewed in light of the [body-worn camera] footage.”
In other words, the judge suggested that the use of generative AI may not only compromise the reliability of individual reports but also help explain why so many official accounts conflicted with recorded evidence.
According to the Associated Press, it remains unclear whether the Department of Homeland Security has a formal, agency-wide policy governing the use of generative AI tools for writing law enforcement reports. However, experts note that relying on tools like ChatGPT in this context is deeply problematic. Generative AI systems are known to fill in gaps with fabricated or speculative information when they lack sufficient data—an especially dangerous flaw when accuracy and accountability are critical.
DHS does maintain a public-facing portal outlining its approach to artificial intelligence and has introduced an internal chatbot designed to help agents handle “day-to-day activities.” That internal system was reportedly developed after testing commercial chatbots, including ChatGPT. However, Judge Ellis’s footnote strongly suggests that the officer in question did not use a DHS-approved tool, but instead accessed ChatGPT directly, uploading images and minimal context to generate the report.
Given the stakes, it’s perhaps unsurprising that one expert told the Associated Press this situation represents the “worst-case scenario” for AI use in law enforcement—where unverified, AI-generated narratives are inserted directly into official records that can affect people’s rights, legal outcomes, and personal safety.
The revelation adds yet another layer of concern to an already damning ruling, raising serious questions about transparency, training, accountability, and the expanding role of generative AI in areas where factual accuracy is not just important, but essential.