table { border-collapse: collapse; width: 500px; margin: 40px auto; } td, th { border: solid 1px #cecece; padding: 10px; } This post on cleaning data is the second in a 3-part series on extracting, cleaning and enhancing data. Part 1 covered extracting data from a website.It's an inconvenient truth: most data you find on the web is messy and often needs to be thoroughly cleaned. At Silk.co we spend a good amount of time cleaning data and we figured we'd share some of the tricks we picked up along the way!Google Sheets and Excel tricksWe'll use Google Sheets in these examples, but Excel usually uses the same formula names (if you are using the English version).The Find & Replace command is very helpful when cleaning data. Use it to change anything that doesn't look right and follows a predictable pattern. Examples are weird notation symbols, an extra space at the end of each value, etc.Use the PROPER formula to capitalize each word in a string if needed. Example usage: =PROPER(A2).Use the CONCATENATE formula to combine the values of 2 columns into one column. Example usage: =CONCATENATE(A2," ",A3)Convert AM/PM times to HH:MM easily by using =TEXT(A2,"HH:MM").Use the SPLIT formula to split the contents of a cell into separate columns, based on a common symbol (a comma, for instance). Example usage: =split(A2,", ",false)Use Format -> Number to make sure your cells with numbers are formatted appropriately. Here you can check things like thousand separators, the way dates are formatted, currencies notations, and more.Taking care of multiple values: depending on your data visualization tool of choice, you might be able to visualize more than one value per data entry. For this to work, you need to separate the values with a character. A comma usually does the trick, but use another symbol if your values contain commas. Silk.co lets you split on any character on import.OpenRefine: hardcore cleaning timeOpenRefine is a Java powered data cleaning tool that you can run locally and that works within your web browser. When using spreadsheet formulas to clean data doesn't cut it, OpenRefine is your new best friend.We recently did a presentation at Growth Tribe where we demonstrated how to clean a dataset of job offers with OpenRefine.One column in the dataset looked like this, showing the yearly salary and equity together:Job Compensation₹1200k - ₹2200k · 0.0 - 0.25%$3k - $30k · 0.0 - 2.0%$50k - $75k · 0.0 - 1.0%$70k - $90k · No equity5.0 - 10.0%OpenRefine lets you use formulas to split the values into 2 columns, like this:Job CompensationEquity₹1200k - ₹2200k0.0 - 0.25%¥20k - ¥100k0.0 - 2.0%£1k - £2k0.0 - 1.0%₹600k - ₹1000kNo equity5.0 - 10.0%Here were the steps to follow:Pick Edit Column -> Add Column Based on This Column. Name the column "Equity".Use the following expression: value.split(' · ')[-1]. This takes all values that occur before the '·' character in the 'Job Compensation' column, and places them in a new column.Then on the 'Job Compensation' column, pick Edit Cells -> Transform. As the expression, use value.split(' · ')[0]. This removes everything before the '·' character.To clean up values without a currency symbol, pick Facet -> Text Facet. Select all cells with no currency symbol. Then pick Edit Cells -> Transform Use the expression: leave empty.Check out the Silk we made for the presentation to learn more about this particular example, and be sure to also go to OpenRefine's own website. This is just a small example of what is possible with OpenRefine.Learn moreA lot of the tips in this post are repurposed from silk-data-handbook.silk.co, a resource for data journalists who work with Silk.cocsvkit is a suite of command line utilities for converting to and working with CSV files. Not for the faint of heart, but very useful for manipulating large datasets, examining datasets, performing SQL like queries, joining multiple csv files, and much more.There are tons of other great resources on data cleaning and data journalism in general: check out Microsoft's cleaning guide, The School of Data, and Data + Design, to name a few.If you liked this post, follow our Twitter account to get updates on the next installment of this post and more data-greatness.