In this post I’m going to show you how to work with text files and turn them into something useable within Excel. Sometimes you may have a report that you want to work with that’s in text format instead of Excel. Plus, it’s designed to be a report with many pages with column headings on each page. Trying to work with this in Excel can be challenging. I’m going to show you how I’ve worked with these in the past.
I’m going to use some data I pulled from a government site. This file is a text file listing of weather stations. Here’s the file if you would like to follow along.
To open this file in Excel you will need to use the Open, Browse command then navigate to the file’s folder. In the lower right of the Browse box, click on the dropdown for file type. The default on mine says “All Excel Files”. Change this to all files. You should now be able to see the text file, click on that to open the file.
Excel will bring you into the Text Import Wizard. In the first step make sure the original data type is Fixed Width and leave the other options the same. Then click “Next >”. In step two we’ll be setting the places where Excel will start a new column. It’s best to use the scroll bar on the right to get down to the actual data you’re going to be working with. In this case move down until you can see “Alaska” at the far left and then data below it. At this point you would look at the data and put in column breaks to separate the various data fields. I put my breaks in at the following positions: 3, 20, 26, 32, 39, 47, 55, 62, 65, 68, 71, 74, 77, 79, and 81. Then click “Next >”. (If you do this for another spreadsheet, make sure to write down these numbers, because there are cases where you need to recreate what you did.) For this spreadsheet we don’t need to do anything for step 3, so you can click on “Finish”.
In the spreadsheet that opens scroll down to row 41. You will see Alaska spread over two columns followed by the data with column headings. To make this easier to work with let’s get rid of the rows above this that we don’t need. Delete rows 1 through 40. Now we have the data we want to work with. First of all, we’re going to put in the missing column headings in column O and P. In cell O2 type in “Priority” and in cell P2 type in “CNTRY CODE”.
Most people don’t know the two-letter code for all fifty states, much less the Canadian provinces and other countries that are use later. Let’s add a column where we can pull in the heading with that data into each row. First insert a column at the beginning of the spreadsheet. In cell A2 type in “Location”.
Now in cell A3 we need a formula that will return “Alaska” or whatever names are used later on. You can see in row one Alaska is spread between two columns. So, well concatenate (meaning put them together) them. In cell A3 put in the formula B1&C1. This will put the text from these two cells together. But you’ll notice that if you increase the width of column A, that there is also a 1 in that text. We don’t want that; it belongs with the date in the next couple of columns. Counting the number of spaces, we find that come before the number gives us 16. We want a formula that will concatenate cell B2 and first 16 characters of C2. The function LEFT can be used to give us the formula to do that. We’re also going to want to get rid of the extra spaces after the end of the name but before the number. The function TRIM will do this. Change the formula in cell A2 to TRIM(B1&LEFT(C1,16)). *[I’m not going to explain how the formula works as you can get that information from your Excel help.] That gives a result of “ALASKA”.
Now to extend this to cover the whole spreadsheet. We can’t just copy the formula down all the way to the end. That would only work for the first row of each location. We could use a formula to pick up the result from the cell above each cell (For example cell A4 would have a formula =A3). But that wouldn’t work when the location changes. What we really need is a formula that will either pick up the result from the cell above or if the location changes, use the formula we put into cell A3. If you’re thinking an IF function might work, you are correct. But first we need to figure out how to tell when the location changes.
Scroll down to the spot where Alaska changes to Alabama, row 211. This way we can see what happens when the location changes and figure out how to use that in our formula. I’m looking at the columns to the left of the location and date and seeing that there empty. Let’s use that. Return to the top. In cell A4 we’ll use the following formula: =IF(H2=””,TRIM(B2&LEFT(C2,16)),A3). This formula should accomplish what we wanted to do.
Unfortunately, upon examining the results, there are some stations where the location is blank. Investigating shows that the locations in the US, Canada and Mexico all have column headings while the others that are all further down in the spreadsheet do not. Looks like we need to adjust our formula. For locations that do not have a column heading we want to adjust the formulas to refer to a different row. Setting up a data filter on the spreadsheet, we can filter on one of the column heading titles and see that the last column heading is in row 4653. We can add another IF function to do this. So the new formula for cell A4 would be: IF(ROW()>4653,IF(H3="",TRIM(B3&LEFT(C3,16)),A3),IF(H2="",TRIM(B2&LEFT(C2,16)),A3)).
Copy this formula to the end of the data. You can page through the spreadsheet to verify the formula has worked. Use save as and save the spreadsheet as an excel file. Just in case you won’t have to reenter the formulas if there is a problem later. Also, if the file you worked with changes, such as a report that covers a different month, you can just copy the formulas rather than recreating everything.
Now we need to eliminate the rows that do not contain data. This is easy to do. Because we’re going to eliminate rows this will mess up the results of our formulas in column A. We can change those formulas to data, so we don’t have to worry about that problem. Highlight all the data in column A and copy it. Now while that data is still highlighted do a paste, special, values to change everything to data instead of formulas.
Next delete row 1. Now the column headings are the first row. The next step is going to involve sorting and to make sure we can get the data back in its original order easily we can add a column with the current order. In cell R1 type in “Row”. In cell R2 put in the formula =ROW(). Copy that formula to the end of the spreadsheet. Then copy and paste special values to change the formula to numbers.
Next click on Sort. In the popup box make sure the box next to “My data has headers” is checked. Then set the sort by to CNTRY CODE and sort A to Z. Moving to the end of the country codes you will see all the rows after that are the ones we want to eliminate. You can delete all these rows to the end of the spreadsheet. Then if the spreadsheet is resorted A to Z for column R we have all the data in its original order.
I wanted to show you this so you could get the general concepts. Sometimes data isn’t in an easy to use format. But if you can convert it to Excel you have something that is easier to work with.
Trees and Forest Challenge Answers
All requests that mention Trump: Find these by clicking on the drop down for Request Description and typing trump into the search box above the list at the bottom. The resulting filter should show 4 items. Tracking numbers FY19-25,31,37 and 45.
The quickest responses: Find these by first clicking on the drop down for Final Reply Date and in the list at the bottom uncheck (Blanks). This will remove all of the large negative number items from Days. Then click on the drop down for Days. In the list at the bottom uncheck Select All then click on zero to select that one. The resulting filter should show 4 items. Tracking numbers FY19-03, 57, 58, 59 and 82. My guess is that these took zero days because there was nothing to report for these items.
The final reply date entry error: This one was a little trickier to find. When you looked for the quickest responses you might have noticed that after eliminating items that didn’t have a response date there was still one negative item in Days. Selecting that pulls up Tracking Number FY19-60. The final reply date is 3/12/18 which is nearly a year prior to the date received. This probably had the year entered wrong and should have been 3/12/19.