Reporter's Notebook: Using machine learning to boost accountability reporting on the local level
Artificial intelligence techniques hold the promise of jumpstarting investigative reporting, including at small news organizations.
Last year, students at the Yale Law School housing clinic spotted a troubling pattern.
In dozens of foreclosure cases in New Haven County, homeowners were saddled with inflated fees, making it even harder to pay off their debts.
Those excessive fees came from state marshals, who deliver copies of official paperwork during the court process. Under state law, marshals are limited in how much they can charge. But in case after case, the Yale students discovered, fees were too high – sometimes by hundreds of dollars.
“For us, it was really just gut-wrenching,” former Yale student Nicole Cabanez told me, “because we didn’t know how long this had been going on.”
They suspected the problem was more widespread. But with thousands of foreclosures filed each year, and with the details buried in court records around the state, how could anyone investigate it?
Across a range of disciplines, artificial intelligence provides new ways to answer these kinds of questions.
In journalism, we’re in the early stages of experimenting with how to incorporate these emerging techniques into our work. Organizations such as The Associated Press use AI to automate repetitive tasks, such as writing articles about corporate earnings. Others use algorithms to help monitor social media and better understand their audiences.
An entirely new field of journalism focuses on algorithmic accountability, understanding how biases and other flaws are baked into new computer intelligence systems. In 2016, a ProPublica investigation called into question a risk assessment tool that was widely used in the criminal justice system, showing it was biased against Black defendants.
But as an industry, we’re still discovering how artificial intelligence techniques can jumpstart investigative reporting. At Connecticut Public, we recently made our first foray into the field, developing a rudimentary machine learning application to help us investigate court fees.
We gathered paperwork filed by state marshals in five years’ worth of foreclosure cases – more than 17,000 in total. We wanted to drill down on cases with inflated fees. But documenting the numbers in each was a significant technical challenge. Marshals don’t use a common template. The structure of each document was different; the wording varied, and the information showed up in different parts of the page.
Our solution was to run each record through a structured document analysis tool called PaddleOCR, developed by researchers at the Chinese software company Baidu. The tool reads text on the page, and also spits out numerical coordinates for the location of each block of text.
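For readers curious what that kind of output makes possible, here is a minimal sketch in Python. It does not call PaddleOCR itself; the `TextBlock` type, the sample page and the pixel coordinates are all made up for illustration, standing in for whatever text-plus-position records an OCR tool returns. It shows how every dollar figure on a page can be collected along with where it sits.

```python
import re
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical stand-in for one OCR result: text plus its position."""
    text: str
    x: float  # left edge of the block, in pixels
    y: float  # top edge of the block, in pixels


# Matches amounts such as "$85", "$85.00" or "$1,250.40".
DOLLAR_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def dollar_candidates(blocks):
    """Return (amount, block) pairs for every dollar figure in the blocks."""
    found = []
    for block in blocks:
        for match in DOLLAR_RE.finditer(block.text):
            whole = match.group(1).replace(",", "")
            cents = match.group(2) or "00"
            found.append((float(f"{whole}.{cents}"), block))
    return found


# Invented sample page; real marshal paperwork varies widely in layout.
page = [
    TextBlock("STATE MARSHAL'S RETURN", 120, 40),
    TextBlock("Service fee $40.00", 80, 610),
    TextBlock("Travel $12.50", 80, 640),
    TextBlock("Total $52.50", 80, 700),
]

for amount, block in dollar_candidates(page):
    print(amount, (block.x, block.y), block.text)
```

A page usually carries several dollar figures, which is why the position of each one matters: the text alone does not say which number is the fee.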
We used that information to train a machine learning model to find the dollar figure we wanted to record from each document, devising some simple characteristics to help the computer discern where to look.
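To make the idea concrete, here is a rough sketch of that kind of model, not the one we built. The three characteristics (how far down the page a figure sits, whether the word "total" appears beside it, and its size relative to the largest figure on the page) are assumptions chosen for illustration, as are the sample pages and the labels. The learning method is a deliberately simple nearest-centroid classifier, swapped in because it fits in a few lines.

```python
def features(cand, page_height, max_amount):
    """Describe one candidate dollar figure with three simple numbers."""
    return [
        cand["y"] / page_height,                          # how far down the page
        1.0 if "total" in cand["text"].lower() else 0.0,  # keyword in same block
        cand["amount"] / max_amount,                      # size vs. largest figure
    ]


def page_rows(cands, page_height):
    """Build one feature row per candidate on a single page."""
    max_amount = max(c["amount"] for c in cands)
    return [features(c, page_height, max_amount) for c in cands]


def centroid(rows):
    """Average each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(rows, labels):
    """Nearest-centroid model: the direction from the average 'other figure'
    to the average 'fee we want'. Scoring along it ranks candidates exactly
    as comparing their distances to the two centroids would."""
    pos = centroid([r for r, y in zip(rows, labels) if y == 1])
    neg = centroid([r for r, y in zip(rows, labels) if y == 0])
    return [p - n for p, n in zip(pos, neg)]


def pick_fee(cands, weights, page_height):
    """Return the candidate the model scores highest."""
    rows = page_rows(cands, page_height)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    return cands[scores.index(max(scores))]


PAGE_HEIGHT = 1000  # assumed page height in pixels

# Two invented, hand-labeled pages (label 1 marks the figure to record).
page_a = [
    {"text": "Service fee $40.00", "y": 610, "amount": 40.00},
    {"text": "Travel $12.50", "y": 640, "amount": 12.50},
    {"text": "Total $52.50", "y": 700, "amount": 52.50},
]
page_b = [
    {"text": "TOTAL FEES: $118.20", "y": 300, "amount": 118.20},
    {"text": "Copies $20.00", "y": 340, "amount": 20.00},
    {"text": "Service $98.20", "y": 380, "amount": 98.20},
]
rows = page_rows(page_a, PAGE_HEIGHT) + page_rows(page_b, PAGE_HEIGHT)
labels = [0, 0, 1, 1, 0, 0]
weights = train(rows, labels)

# A page the model has not seen.
new_page = [
    {"text": "Mileage $31.00", "y": 500, "amount": 31.00},
    {"text": "Total due $96.00", "y": 250, "amount": 96.00},
    {"text": "Service $65.00", "y": 520, "amount": 65.00},
]
best = pick_fee(new_page, weights, PAGE_HEIGHT)
print(best["text"])  # Total due $96.00
```

With only two training pages this proves nothing about accuracy. A real model needs many hand-labeled documents, and its picks still need spot checks against the original paperwork.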
Our model helped us to flag hundreds of cases with higher-than-average fees. We discovered widespread disparities in how much marshals charge, and numerous examples of inflated fees – more than $40,000 in total overcharges.
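Once each case has a fee attached, the flagging step can be very simple. The sketch below is one assumed way to do it, comparing each fee to the plain average across all cases; the case IDs and amounts are invented. A flag is only a lead: whether a fee is actually excessive depends on what state law allows for that particular service, which a reporter still has to check.

```python
from statistics import mean


def flag_above_average(fees):
    """Return the case IDs whose fee exceeds the mean fee, highest first."""
    average = mean(fees.values())
    flagged = [case for case, fee in fees.items() if fee > average]
    return sorted(flagged, key=lambda case: fees[case], reverse=True)


# Invented fees, keyed by a made-up case ID.
fees = {
    "case-001": 52.50,
    "case-002": 61.00,
    "case-003": 412.75,
    "case-004": 48.20,
    "case-005": 96.00,
    "case-006": 230.00,
}

for case in flag_above_average(fees):
    print(case, fees[case])
```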
The reporting that followed showed marshals lack meaningful oversight, and Connecticut has no mechanism in its foreclosure process to scrutinize excessive fees.
Homeowners are also unlikely to know they're being gouged. While reporting the story, I spoke with a couple living in a mobile home. They were fighting foreclosure of a municipal tax lien. Their debt increased by several thousand dollars after the case went to court. Those additional fees included an excessively high bill from the state marshal who served them paperwork.
I met with another woman in Milford who was struggling to find placement in an assisted living facility. Her home had entered foreclosure more than two years earlier because she owed back taxes. A family member sifted through a stack of paperwork by her bedside. He produced a bill showing the amount needed to keep her home. It had grown to more than $14,000, a sum that included an erroneous bill from the state marshal.
Machines won’t unearth these stories on their own. But as we found, new computational techniques promise to strengthen accountability reporting in exciting ways.