So I was recently working with a colleague and she sought my advice on using a workflow for a very large task; here is the score: This client (ACME to protect the innocent) has a HUGE amount of data, like many of you out there; it was the data equivalent of an episode of hoarders. There was a hiccup during a data import and a sales history record was populated with a customer number, but not the customer name; so essentially she needed to find a way to use an automated process to look at the record, use customer number to run a query and bring back the appropriate customer and put it in the ‘customer’ field on the sales history record. Sounds easy right? Not so much.
Firstly, there is no way to run a compare like she was suggesting in a workflow, so I was immediately thinking about a dialog where you can run a query. Once I started looking over her shoulder though, I realized that we had a bigger problem. Many of you may realize that if you have a lot of records, CRM will stop counting them at 5000. (You can surf beyond that point by leafing through the pages maximum of 250 at a time but if you’re just pulling up a list of records, it will only show you that you have 5000+.)
So my next suggestion is a trick I’ve used on client sites before: kick out a list of clients, with the client ID that’s listed at the sales history, then kick out the sales history in a separate spreadsheet and do a VLOOKUP, then BAM! Reimport the data. Well, when we tried to do that, the request with the server kept timing out (Meaning the list of records was truly HUGE!) What it took me a minute to realize was that we were really talking about an immense amount of records here and that caused the inability of the request to complete.
We even tried to use the above approach and just do one month of data at a time, but even with just one month of data, we were still running into the same issue with trying to pull down too much data. We would have basically had to do at least a year and a half’s worth of data by breaking it up into two week chunks based on created on date. Not fun.
So this problem had clearly been going on for a while at ACME. What I realized in the midst of all of this was that there was another way we could come at this problem. I mean we could have done something through our (custom) development team, but that seemed like overkill to me. I could have built the dialog/workflow but we’re stuck basically going through individual records though technically at a slightly faster pace (daunting when you’re over 50k records per month). How we settled on it was to save the advanced find as a view, use our ‘created on’ trick to break up the data into more manageable chunks of about a month with the ‘filter’ trick, and then render those in a chart. In this case I used client ID by client ID in a pie chart. This allowed me to see who was taking up the most real estate in the ‘sales history’ records and then fix all of those first; but even this wasn’t so simple. Even at a month we were north of 50,000 records. And many of you may know that when you’re talking that many records, a chart will not display summary information for datasets over 50,000 records. So we basically broke our sales history records up into chunks of just under a month, taking out the biggest offenders (since those are the easiest) with a mass edit, and then moving on to the next segment. This will allow my colleague to work with a slightly more manageable dataset, and once she is at a point where not dealing with an estimated 900,000 records we should be able to go back to go back to the second solution and use a VLOOKUP to populate data for the records that weren’t big enough to get fixed in the first pass.
Sure it may all sound easy now that I’ve laid out the solution for you, but I assure you it was a decided ‘aha moment’ when I realized we could chuck out the records based on that customer ID and then use more standard approaches to get the rest.
That was all rather stream of conscience, so if it didn’t make sense, or if you have your own data questions, feel free to hit me up in the comments.