Our AI analyst, Filly, is powerful, but it can’t read your mind. If your input data is rubbish, you’ll get high-resolution hallucinations in return.
We’ll be honest here: sometimes it’s us. After all, Filly is an AI assistant and, for now, AI assistants do occasionally hallucinate. We’re not pretending to be anything we’re not. We know the current limitations of AI models, but we do spend a lot of time working on our system prompt so that Filly can do its best to be a valuable member of your team. Just like humans though, it’ll sometimes get it wrong. But I digress; that’s not the point of this post (we’ll cover it soon though).
Like with all data analysis, whether it’s AI-driven or not, if your raw data is rubbish your outputs and analysis won’t be valuable. To the data nerds out there - you’ll know this as garbage in, garbage out. There are a hundred different ways we can contextualise this using analogies or metaphors, like “you are what you eat”: if your data is junk food, your analysis won’t thrive. Or my personal favourite: you can’t polish a turd. But at the end of the day, the most valuable thing we can do (for you, and for us - the more you learn about data, the easier our job is) is to teach you how to build and maintain good datasets.
As our customers connect more data sources to Filly, we’re seeing an increase in reported ‘errors’. When we dig into these, we find that people are connecting data sources they haven’t been able to analyse before, or are uploading spreadsheets containing heavily edited data and trying to join it to the original source. It doesn’t matter how good your prompt is or how comprehensive your context: if the input data doesn’t match what you’re saying, has gaps, or contains conflicting information, Filly is going to have a hard time giving you quality outputs. What you’ll likely get instead is a high-resolution (and very pretty) hallucination.
We know most non-profits have a Data Person. Not in the sense that they were actually hired to analyse data and help people make decisions with it. They’ve often happened upon the responsibility because of an innate curiosity, or it was thrust upon them as the only person in the team who could do a VLOOKUP. If you’re reading this, it’s probably you.
So, to help you do your unofficial data job better, here are six dimensions of data quality to keep in mind.
Accuracy refers to how closely your data reflects real-world truth. If the data you’re feeding Filly has been massaged, edited, or touched by human hands since it left the source, it’s probably no longer truly accurate. Most inaccuracy issues we encounter come from manual entry or manipulation: someone mistypes a donation amount, miskeys a date, or accidentally edits a cell while trying to get a pivot table to behave in Excel. That data gets loaded into Filament, and all of a sudden your average donation is $1,032. As nice as that would be, it’s probably not true. The second you move from raw, unedited source data to a manual spreadsheet, you’re introducing the risk of human error that can turn a factual report into a work of fiction.
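If you want a quick sanity check before uploading a spreadsheet, something like the rough Python sketch below can help. It flags amounts that are wildly larger than the typical (median) donation so a human can double-check them. The donation figures, the `flag_suspect_amounts` name, and the "20 times the median" threshold are all made up for illustration; this is not how Filament checks your data.

```python
from statistics import median


def flag_suspect_amounts(amounts, factor=20):
    """Return amounts more than `factor` times the median, for a human to review.

    `factor` is an arbitrary, illustrative threshold, not a recommendation.
    """
    typical = median(amounts)
    return [amount for amount in amounts if amount > factor * typical]


# Made-up donations; imagine 5000.0 was meant to be 50.00.
donations = [25.0, 40.0, 50.0, 30.0, 35.0, 5000.0]

suspects = flag_suspect_amounts(donations)
print(suspects)
```

A flagged value isn’t necessarily wrong (generous donors exist), so treat the result as a list of things to check against the source, not a list of things to delete.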
Completeness measures whether all the required data is present and whether there are any unintended gaps in your records. We know the way you collect and store information is constantly evolving. You might switch CRMs or start tracking new donor categories. Most of the time it’s impossible (or far too time-consuming) to backdate those changes across all records. When you’re setting up with Filament, it’s important to bring in as much historical data as possible. If you’re integrating your apps, we do this for you (and we’ll let you know if any data fails to load so you can fix it). But if you’ve moved from tracking information in a spreadsheet called DONATIONS_DATA_MASTER_V6_FINAL_2019.xls to Salesforce and you don’t upload that spreadsheet, Filly can’t magically fill in those gaps for you.
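To get a feel for where your gaps are, you could count how many records are missing each field you care about. Here’s a minimal sketch, assuming your records can be read as a list of dictionaries; the field names and sample rows are hypothetical.

```python
# Hypothetical list of fields every donation record should have.
REQUIRED_FIELDS = ["donor_id", "amount", "date"]


def missing_counts(records, required=REQUIRED_FIELDS):
    """Count how many records have each required field absent or blank."""
    counts = {field: 0 for field in required}
    for record in records:
        for field in required:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return counts


# Made-up rows: one has a blank amount, one has no date at all.
records = [
    {"donor_id": "D1", "amount": "50", "date": "2019-03-01"},
    {"donor_id": "D2", "amount": "", "date": "2019-03-04"},
    {"donor_id": "D3", "amount": "20"},
]

gaps = missing_counts(records)
print(gaps)
```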
Consistency ensures your data looks the same across different systems and that two pieces of information don’t contradict each other. If you are storing the same information in multiple places (and we hope you aren’t - peep Uniqueness for more on that), it’s vital that it stays uniform. This is where the wheels usually fall off for organizations using multiple tools. If one system uses a name while the other uses an email as the primary identifier, you need to ensure you can join the records and not overcount. If those systems are collecting and storing that information in different formats, things get even trickier. Consistency is about aligning your identifiers, naming conventions, and categories across every source. Not only does this stop you from accidentally overcounting your impact, it gives you the power to build a truly comprehensive profile of your supporters, events, and programs.
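As a small illustration of why formats matter, the sketch below counts supporters across two systems, first naively and then after tidying up the email addresses. The email lists and the `normalise_email` helper are invented for this example, and lowercasing plus trimming whitespace is just one simple normalisation you might choose.

```python
def normalise_email(raw):
    """Trim stray whitespace and lowercase, so the same address always matches."""
    return raw.strip().lower()


# Made-up exports from two hypothetical systems.
crm_emails = ["Sam@Example.org ", "jo@example.org"]
events_emails = ["sam@example.org", "JO@example.org", "alex@example.org"]

# Naive count: every differently-typed string looks like a new person.
naive_count = len(set(crm_emails + events_emails))

# Normalised count: identifiers are aligned before counting.
all_supporters = {normalise_email(e) for e in crm_emails + events_emails}

print(naive_count, len(all_supporters))
```

The gap between the two numbers is the overcount you’d report if the identifiers were never aligned.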
Timeliness refers to how current your data is and whether it is available when it’s actually needed. In the world we’re living in, things move so quickly that if you aren’t using real-time data for decision making, you’re already behind. Filament has more than 650 app integrations, and we’re adding more every week. If something is missing, just ask us! By integrating your apps directly and getting live data feeds, rather than sticking to your old monthly "export-to-CSV" workflow, it’s much easier for you to keep your finger on the pulse and report in real time (and your dashboards will update automatically as new data flows into Filament!).
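If you’re still working from exports, it’s worth knowing how old they are before you report on them. A tiny sketch, with made-up dates and an arbitrary seven-day limit (`MAX_AGE_DAYS` is our invention for the example, not a Filament setting):

```python
from datetime import date

MAX_AGE_DAYS = 7  # arbitrary, illustrative limit


def days_stale(last_updated, today):
    """Number of whole days between the last update and today."""
    return (today - last_updated).days


# Made-up dates: the last monthly CSV export versus the day of the board report.
last_export = date(2024, 3, 1)
report_day = date(2024, 3, 29)

age = days_stale(last_export, report_day)
is_stale = age > MAX_AGE_DAYS
print(age, is_stale)
```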
Validity, simply put, asks: is the information you’re uploading actually "data"? This is a big one for us because, under the hood, Filament is powered by SQL. SQL is a logic-based language that expects data to behave in a certain way and follow strict rules (you can read more on its statistical capabilities here). We’ve had spreadsheets loaded into Filament with "notes" shoved in fields meant for numbers, or custom metrics added with no consistent, documented methodology. Filly is smart, but it’s not a mind reader. If you aren’t using a consistent coding system (think binary yes/no, standardized category labels, or documented and re-creatable methodology to calculate custom metrics), there’s not a lot of meaningful analysis we can do with it. If you need help documenting methodology for your custom project metrics or impact rankings - reach out to the team.
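One way to catch "notes in a number field" before they cause trouble is to check that every value in a column actually parses as a number. This is a rough sketch with an invented column; the `invalid_amounts` helper is ours, and real data would likely need more rules (currency symbols, thousands separators, and so on).

```python
def invalid_amounts(values):
    """Return (row_number, value) pairs for entries that don't parse as numbers."""
    bad = []
    for row_number, value in enumerate(values, start=1):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((row_number, value))
    return bad


# Made-up "amount" column with a note and a blank cell mixed in.
amount_column = ["50", "120.50", "paid in cash - see Jan", "", "75"]

problems = invalid_amounts(amount_column)
print(problems)
```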
Uniqueness ensures there are no duplicate records and that each piece of data is only represented once. In an ideal world every record you store should represent a unique person, event, or data point. The reality is that most of the time, donors, organizations, or events are captured in multiple systems with varying pieces of data attached. This is complicated to unravel and stems from non-profits being chronically underserved by tech: your tools don’t talk to each other, so you’re forced to export data from five different places and try to wrangle it in Excel just to see your monthly impact. You might have done a great job engaging a donor across multiple channels, but they’ve used a different email each time. Suddenly, they appear once in one system but get double-counted in the other, and your records no longer represent unique people or events.
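To spot the "same donor, different email" problem, you could group records by a tidied-up name and look for names attached to more than one email. The sketch below uses invented records and a deliberately simple matching rule (lowercase the name and collapse extra spaces). Two different people can share a name, so treat the output as candidates for review, not confirmed duplicates.

```python
from collections import defaultdict


def possible_duplicates(records):
    """Group emails by normalised name; keep names linked to more than one email."""
    by_name = defaultdict(list)
    for record in records:
        # Lowercase and collapse repeated whitespace so small typing differences match.
        key = " ".join(record["name"].lower().split())
        by_name[key].append(record["email"])
    return {name: emails for name, emails in by_name.items() if len(set(emails)) > 1}


# Made-up donor records pulled from different hypothetical systems.
donors = [
    {"name": "Priya Nair", "email": "priya@example.org"},
    {"name": "priya  nair", "email": "p.nair@example.com"},
    {"name": "Tom Ellis", "email": "tom@example.org"},
]

candidates = possible_duplicates(donors)
print(candidates)
```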
When it comes to setting up your Filament account, preparation is key. If you take the time to understand your data sources, ensure records are linking correctly, and connect all your apps, Filly will do a really good job of filling the role of ‘Data Person’ so you can get back to doing the job you were actually hired for.