  • John Concklin

Data Integration

Updated: Jul 24, 2019

The anti-poverty industry is playing with half-century-old data tools. "Community data" is a buzz phrase that gets thrown around as if it's some kind of silver bullet. Anyone who works in the field knows what I'm talking about. I'm not knocking community data - it has a vital role, but at best it's only half the picture. The problem is that while data tools have evolved in the rest of the world, the anti-poverty industry has yet to embrace them (or in some cases flat-out refuses to).


Community data describes what has happened at a given point or during a certain range of time. It is useful in that regard only - describing the past. Unfortunately, anti-poverty organizations then use such data to predict what might happen. They make million-dollar decisions on it. Community organizations organize around those decisions. We all pat ourselves on the back about how "collaborative" we are. We re-measure using some variation of the same practice, just with fancier language. We find it describes something different. We repeat the entire cycle again. Nothing really gets better. The metaphorical house is built on a bad foundation, so we build another floor on top.


Ask yourself: have we solved poverty? Have we changed educational outcomes? Early childhood? Homelessness? Veterans? The answer, unfortunately, is not really.


One of the reasons is that we do not have the means to describe or predict, at scale, the experience of individuals inside the system - the people we are supposedly "fighting for."


Below is a very basic illustration of the problem.

John goes to four different organizations. Those organizations do their thing. They then each report "1 served/fed/employed/etc" to the funder. The funder, blindly, reports "4 served" to the community. The first problem is that "4 served" is patently false. This is how we see gaudy and almost unbelievable numbers out there. The second, and more pressing, problem is that no one can clearly describe John or his experience.
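The double counting can be shown in a few lines. This is only a sketch: the organization names and the `client_id` field are made up for illustration, and it assumes each visit can already be tied to a person, which is exactly what aggregated reports throw away.

```python
def count_served(visits):
    """Return (reported, unique) counts for a list of (organization, client_id) visits.

    'reported' mimics a funder summing each organization's own tally;
    'unique' counts each person once, however many doors they walked through.
    """
    reported = len(visits)
    unique = len({client_id for _, client_id in visits})
    return reported, unique


# Hypothetical data: John visits four different organizations.
visits = [
    ("Food Bank", "john"),
    ("Job Center", "john"),
    ("Shelter", "john"),
    ("Clinic", "john"),
]

reported, unique = count_served(visits)
print(f"Reported to the community: {reported} served")
print(f"People actually served: {unique}")
```

The gap between the two numbers only becomes visible when the records stay at the individual level.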


The retail industry has made untold wealth on predicting (accurately) what consumers will do. During my hiatus from music, I wondered aloud what would be possible if we modified those tools for something other than consumerism. Using the robust data practices that virtually everyone else relies on, could we develop a pathway out of poverty based on the experience of similar, preceding individuals and tailor it to the current individual?


The answer is yes. And even more exciting, it was cheaper and easier than the current method of data collection (and it still nets the old community data, just more accurately and cost-effectively). The best part - the on-the-ground service organizations barely had to change anything. In fact, they got a lot of time back to - you know - actually help people. The ones that had to change? The people and organizations with the power and money.


I will say up front that data privacy was and is a concern. Most of the time spent building the thing was figuring out the ethics of it. This will evolve and get more complicated. We had to use lawyers. It was, and is, worth it.


Here's what we did:


We realized that organizations were collecting all this individual-level data to send out the various reports that were required. For each report, they had to clarify language, manually make a calculation, and (in most cases) hand-key figures into any number of "reporting tools." They were effectively aggregating the information (which kept the data anonymous and therefore unactionable) and multiplying opportunities for human error. Instead of that, we simply asked organizations to export their database on a monthly basis and share that file via an encrypted file-sharing service. We recruited close to 20 organizations to follow this practice to see what was possible.


Once everyone's monthly data was in the same place, we used R and SQL to develop a set of proof points: 1) could we find unique individuals that went to more than one organization; 2) using the information from those separate organizations, could we paint a more specific picture of the individual; 3) could we find that person again if they showed up in the system in the future?


The answer to all three was yes.
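For readers who want something concrete, here is a rough Python sketch of the three proof points. It is not the R and SQL we actually wrote. The matching rule (normalized name plus date of birth), the field names, and the sample records are all assumptions for illustration, and real matching has to deal with typos, missing fields, and the privacy questions above.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical matching key: normalized name plus date of birth."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def link_records(records):
    """Group records from every organization's export by matching key."""
    people = defaultdict(list)
    for record in records:
        people[match_key(record)].append(record)
    return people


def multi_org_clients(people):
    """Proof point 1: individuals who appear at more than one organization."""
    return {
        key: recs
        for key, recs in people.items()
        if len({r["org"] for r in recs}) > 1
    }


def profile(recs):
    """Proof point 2: merge what separate organizations know into one picture."""
    merged = {}
    for r in sorted(recs, key=lambda r: r["month"]):
        # Later months overwrite earlier ones; empty fields never overwrite.
        merged.update(
            {k: v for k, v in r.items() if k not in ("org", "month") and v is not None}
        )
    merged["orgs"] = sorted({r["org"] for r in recs})
    return merged


def find_again(people, new_record):
    """Proof point 3: look up a newly arriving record among known individuals."""
    return people.get(match_key(new_record), [])


# Made-up monthly exports from three organizations.
records = [
    {"org": "Food Bank", "month": "2019-01", "name": "John  Doe",
     "dob": "1985-03-02", "zip": "00000", "employed": None},
    {"org": "Job Center", "month": "2019-02", "name": "john doe",
     "dob": "1985-03-02", "zip": None, "employed": False},
    {"org": "Shelter", "month": "2019-02", "name": "Jane Roe",
     "dob": "1990-07-15", "zip": "00000", "employed": None},
]

people = link_records(records)
shared = multi_org_clients(people)
john = profile(shared[("john doe", "1985-03-02")])
seen_before = find_again(people, {"name": "JOHN DOE", "dob": "1985-03-02"})
```

Here `john` combines the zip code from one organization with the employment status from another, which is the "more specific picture" in the second proof point.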


We found that duplication was around 11%-14% annually. We could see multiple data points on every individual. And we could see those data points change over time. It was simply miraculous.


From there, we commissioned a data company to build us a program that would automate everything for us so that all we had to do was visualize it. We purchased the rights to it. It was built, and it's still working. My plan was to give it away free to anyone who wanted it (then music called me back, so if you want it, you'll have to ask United Way for the tech - I'm happy to help you get started).


Unfortunately, I can't go into details about what we found in the data (see data privacy, and I err towards very prickly about it), but I can say that if someone were to ask me the typical profile of a service recipient, I could tell them the recipient's race, age, gender, and where they live. If someone were to ask me who makes it out of poverty and how long it takes, we have a testable theory on that as well. If someone were to ask who can't seem to make it and why, no matter how hard they try, I can tell them that too. And with that, paired with community data and case manager testimonial, we can understand the poverty experience at a level of detail never before possible.


Here's the endgame, though. Let's say ten John-like people (mid-30s, Asian, college educated, married with no children) were to go through the system and "get out of poverty." Then, John #11 shows up. We could give John #11 a science-based action plan that could provide a tailored map, estimated time, and signposts along the way.
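As a toy version of that idea, the sketch below looks up preceding people with the same profile who exited and takes the median of how long it took them. Everything in it is an assumption for illustration: the profile fields, the exact-match rule, the use of a median, and the numbers, none of which come from our data. A real model would need far more care and far more than ten people.

```python
from statistics import median


def similar_exits(history, newcomer, keys):
    """Preceding individuals who exited and share the newcomer's profile on `keys`."""
    return [
        person
        for person in history
        if person["exited"] and all(person[k] == newcomer[k] for k in keys)
    ]


def estimate_months(history, newcomer, keys):
    """Median months-to-exit among similar predecessors, or None if there are none."""
    matches = similar_exits(history, newcomer, keys)
    if not matches:
        return None
    return median(person["months_to_exit"] for person in matches)


# Made-up history; the fields are hypothetical stand-ins for a real profile.
history = [
    {"age_band": "30-39", "education": "college", "household": "married, no children",
     "exited": True, "months_to_exit": 10},
    {"age_band": "30-39", "education": "college", "household": "married, no children",
     "exited": True, "months_to_exit": 14},
    {"age_band": "30-39", "education": "college", "household": "married, no children",
     "exited": True, "months_to_exit": 18},
    {"age_band": "20-29", "education": "high school", "household": "single",
     "exited": True, "months_to_exit": 30},
    {"age_band": "30-39", "education": "college", "household": "married, no children",
     "exited": False, "months_to_exit": None},
]

profile_keys = ["age_band", "education", "household"]
newcomer = {"age_band": "30-39", "education": "college",
            "household": "married, no children"}

estimate = estimate_months(history, newcomer, profile_keys)
```

With this invented history, the estimate draws only on the three similar people who exited; the person still in the system and the one with a different profile are left out.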


We could, in effect, have a cure for poverty.
