In this post I’m going to show you how to work with text files and turn them into something useable within Excel. Sometimes you may have a report that you want to work with that’s in text format instead of Excel. Plus, it’s designed to be a report with many pages with column headings on each page. Trying to work with this in Excel can be challenging. I’m going to show you how I’ve worked with these in the past. I’m going to use some data I pulled from a government site. This file is a text file listing of weather stations. Here’s the file if you would like to follow along.
To open this file in Excel you will need to use the Open, Browse command then navigate to the file’s folder. In the lower right of the Browse box, click on the dropdown for file type. The default on mine says “All Excel Files”. Change this to all files. You should now be able to see the text file, click on that to open the file. Excel will bring you into the Text Import Wizard. In the first step make sure the original data type is Fixed Width and leave the other options the same. Then click “Next >”. In step two we’ll be setting the places where Excel will start a new column. It’s best to use the scroll bar on the right to get down to the actual data you’re going to be working with. In this case move down until you can see “Alaska” at the far left and then data below it. At this point you would look at the data and put in column breaks to separate the various data fields. I put my breaks in at the following positions: 3, 20, 26, 32, 39, 47, 55, 62, 65, 68, 71, 74, 77, 79, and 81. Then click “Next >”. (If you do this for another spreadsheet, make sure to write down these numbers, because there are cases where you need to recreate what you did.) For this spreadsheet we don’t need to do anything for step 3, so you can click on “Finish”. In the spreadsheet that opens scroll down to row 41. You will see Alaska spread over two columns followed by the data with column headings. To make this easier to work with let’s get rid of the rows above this that we don’t need. Delete rows 1 through 40. Now we have the data we want to work with. First of all, we’re going to put in the missing column headings in column O and P. In cell O2 type in “Priority” and in cell P2 type in “CNTRY CODE”. Most people don’t know the two-letter code for all fifty states, much less the Canadian provinces and other countries that are use later. Let’s add a column where we can pull in the heading with that data into each row. First insert a column at the beginning of the spreadsheet. In cell A2 type in “Location”. Now in cell A3 we need a formula that will return “Alaska” or whatever names are used later on. You can see in row one Alaska is spread between two columns. So, well concatenate (meaning put them together) them. In cell A3 put in the formula B1&C1. This will put the text from these two cells together. But you’ll notice that if you increase the width of column A, that there is also a 1 in that text. We don’t want that; it belongs with the date in the next couple of columns. Counting the number of spaces, we find that come before the number gives us 16. We want a formula that will concatenate cell B2 and first 16 characters of C2. The function LEFT can be used to give us the formula to do that. We’re also going to want to get rid of the extra spaces after the end of the name but before the number. The function TRIM will do this. Change the formula in cell A2 to TRIM(B1&LEFT(C1,16)). *[I’m not going to explain how the formula works as you can get that information from your Excel help.] That gives a result of “ALASKA”. Now to extend this to cover the whole spreadsheet. We can’t just copy the formula down all the way to the end. That would only work for the first row of each location. We could use a formula to pick up the result from the cell above each cell (For example cell A4 would have a formula =A3). But that wouldn’t work when the location changes. What we really need is a formula that will either pick up the result from the cell above or if the location changes, use the formula we put into cell A3. If you’re thinking an IF function might work, you are correct. But first we need to figure out how to tell when the location changes. Scroll down to the spot where Alaska changes to Alabama, row 211. This way we can see what happens when the location changes and figure out how to use that in our formula. I’m looking at the columns to the left of the location and date and seeing that there empty. Let’s use that. Return to the top. In cell A4 we’ll use the following formula: =IF(H2=””,TRIM(B2&LEFT(C2,16)),A3). This formula should accomplish what we wanted to do. Unfortunately, upon examining the results, there are some stations where the location is blank. Investigating shows that the locations in the US, Canada and Mexico all have column headings while the others that are all further down in the spreadsheet do not. Looks like we need to adjust our formula. For locations that do not have a column heading we want to adjust the formulas to refer to a different row. Setting up a data filter on the spreadsheet, we can filter on one of the column heading titles and see that the last column heading is in row 4653. We can add another IF function to do this. So the new formula for cell A4 would be: IF(ROW()>4653,IF(H3="",TRIM(B3&LEFT(C3,16)),A3),IF(H2="",TRIM(B2&LEFT(C2,16)),A3)). Copy this formula to the end of the data. You can page through the spreadsheet to verify the formula has worked. Use save as and save the spreadsheet as an excel file. Just in case you won’t have to reenter the formulas if there is a problem later. Also, if the file you worked with changes, such as a report that covers a different month, you can just copy the formulas rather than recreating everything. Now we need to eliminate the rows that do not contain data. This is easy to do. Because we’re going to eliminate rows this will mess up the results of our formulas in column A. We can change those formulas to data, so we don’t have to worry about that problem. Highlight all the data in column A and copy it. Now while that data is still highlighted do a paste, special, values to change everything to data instead of formulas. Next delete row 1. Now the column headings are the first row. The next step is going to involve sorting and to make sure we can get the data back in its original order easily we can add a column with the current order. In cell R1 type in “Row”. In cell R2 put in the formula =ROW(). Copy that formula to the end of the spreadsheet. Then copy and paste special values to change the formula to numbers. Next click on Sort. In the popup box make sure the box next to “My data has headers” is checked. Then set the sort by to CNTRY CODE and sort A to Z. Moving to the end of the country codes you will see all the rows after that are the ones we want to eliminate. You can delete all these rows to the end of the spreadsheet. Then if the spreadsheet is resorted A to Z for column R we have all the data in its original order. I wanted to show you this so you could get the general concepts. Sometimes data isn’t in an easy to use format. But if you can convert it to Excel you have something that is easier to work with. Trees and Forest Challenge AnswersAll requests that mention Trump: Find these by clicking on the drop down for Request Description and typing trump into the search box above the list at the bottom. The resulting filter should show 4 items. Tracking numbers FY19-25,31,37 and 45. The quickest responses: Find these by first clicking on the drop down for Final Reply Date and in the list at the bottom uncheck (Blanks). This will remove all of the large negative number items from Days. Then click on the drop down for Days. In the list at the bottom uncheck Select All then click on zero to select that one. The resulting filter should show 4 items. Tracking numbers FY19-03, 57, 58, 59 and 82. My guess is that these took zero days because there was nothing to report for these items. The final reply date entry error: This one was a little trickier to find. When you looked for the quickest responses you might have noticed that after eliminating items that didn’t have a response date there was still one negative item in Days. Selecting that pulls up Tracking Number FY19-60. The final reply date is 3/12/18 which is nearly a year prior to the date received. This probably had the year entered wrong and should have been 3/12/19.
1 Comment
[Note: Apologies if you visited on Wednesday looking for a post. I usually write in the evenings and we had a power outage last night. My posts will be up by Thursday mornings.] What’s that old saying about not being able to see the forest and the trees, or is it the trees and the forest? OK, Google tells me it’s “see the forest for the trees” with the first known use being in 1546. In this post I’d like to show you how to take a forest of Excel data and pull out the trees that interest you. You might have a large spreadsheet with a lot of data that you’ve summarized and graphed to show the forest. But sometimes you’ll need to look at specific trees where there’s a part of the forest that’s not in line with expectations. That’s where data filters can help you. First, we need a spreadsheet to work with. I like to work with real data; it’s a lot easier than trying to develop sample data. So, I have used the Advanced Search function in Google to find a spreadsheet that we can use for practice. I searched for Excel files that were published on websites the had a domain of “.gov”. I used the dot gov filter to make sure I was getting something in the public domain. I found a spreadsheet of freedom of information act (FOIA) requests to the US Trade Representative. You could do the same. Or you could just use my copy here:
Let’s open the file and take a look at it. We’ll work with just the first tab that is named Intake. Let’s set up the data filters so that we’re able to pull out just certain records to view. This is the easy part. First click on any cell within the data that you can see. Then click the filter icon. If you put this on your quick access toolbar (as I discussed here: https://www.tkanecpa.com/blog/excels-quick-access-toolbar) all you have to do is click on the icon that looks like a funnel (pictured on the left).Or if you haven’t set this up on your toolbar you can find it on the Data ribbon. Alternatively, because there’s usually multiple ways to do things in Excel, on your keyboard you can simultaneously press the CTRL, Shift and “L”. Excel automatically figures out the top row for your data and how and the data to include. Dropdown arrows appear on the headings in row 3. Now you have filters set up. Let’s see who’s been requesting data. Click on the dropdown arrow in column D, Requester Organization Name. You’ll see a menu followed by a list appear. We’ll try looking at just one item. In the list at the bottom click on the check box next to Select All. This will remove the checks from all the boxes. Now click on the check box next to the first item and we should see only the AFL-CIO’s requests. But that didn’t work. You will see the AFL-CIO data request, but there’s a purple row followed by all these other requests. What’s going on? Remember back when you clicked the filter icon and I said Excel would automatically figure out what data to include. It turns out that the method it uses to figure that is to assume that an empty row or column is the end of the data. So, if there’s an empty row in the data (even if it’s colored purple) it assumes the data ended with the row above that. In order to fix this, we’ll need to turn the filters off. Just do the same steps again that you used to turn to filters on and they will turn off. There are a couple ways to correct this. The first would be to just delete the blank row. Which in this case is fine, because looking at the data I can see that there is only the one blank row. That’s what I’m going to do as we work with this data. So, go to row 15 and remove it. (Right click on any purple cell and pick Delete from the menu that pops up. On the next menu pick Entire Row and then OK. The row is deleted.) You can now turn the filters back on. The other method is better if you’re dealing with a lot of data and don’t want to find all the empty rows or columns and delete them. You will need to choose all the data from the top to bottom and side to side. With this spreadsheet you would start at with the column headings at cell A3. From there highlight the range A3 through G122. With that area highlighted turn the filters on. The filters will now include all the data. Now filtering for just the AFL-CIO will show just their one request. You will also notice that there are a lot more Requesters to choose from when you look at the list. Now that we have our filters, let’s work with this data a little bit. Let’s say we want to look at how long it takes to get a response from a FOIA request and be able to filter that. We’ll start by setting up the formula to calculate how many days it takes. Let’s insert a column for that data. Move your cursor up to the top of the spreadsheet to the column headings and put it on the “G”. Right click your mouse and on the menu that pops up click insert. A new column appears, and it is already filtered. (We could have just put our formulas in column H, then we would have had to turn the filter off and on again to pick up the new column.) Move to cell G3 and give the column a title. I’m calling it “Days”. Now move down one cell to G4. We’ll just do a simple formula to give us the difference between the “Final Reply Date” and the “Date Received”. In cell G4 enter the formula =H4-B4. The date 12/19/1905 appears. This is due to the way Excel treats dates, if we convert this to a number it will give us the days between the two dates. I just click on the comma icon on my quick access toolbar to do this and it becomes 2,180.00. Now we can extend the formula to all the data. I do this by picking cell H4. It now has a border around it that includes a small filled in square in the lower right-hand corner. Holding the mouse over the border it will change to a black cross with arrowheads on each direction. Moving it over the small square the arrowheads will disappear. Double clicking at this point will extend the formula to the end of the data. We now have our forest of data, can we use filters to look at some trees? Now if you left click on the G in the column heading, you’ll see some stats – average, sum and count of the data in column G. But how can the average number of days be negative? Look at row 5, the AFL-CIO again. This request doesn’t have a final date so the amount in column G is a big negative number. We can correct that. Leaving column G highlighted, click on the dropdown for column H, then on the list at the bottom unclick the check mark in the box marked (Blanks). This will filter out the requests that have no final date. Now the average response time changes to 81.56 days.
Now, try clicking on the dropdown for column G. Instead of filtering on the list of numbers, let’s try something different. Just above the list you’ll see the menu choice “Number Filters”, click on that. You’ll see some choices that you can use to look for some data. Click on Top 10, then click “OK” on the box that appears. Only the ten requests that took the longest time to finalized will display and the average time for these is 460 days. Let’s try something else. Start with nothing filtered. So, click on the dropdowns for columns G & H then click on the “Clear Filter” menu choice. You’ll notice that when the data is filtered that the row numbers on the left of the spreadsheet change to blue. When your filters are cleared, they will change back to black. Now click on the dropdown for “Request Description”. Just above the list of items there is a search box. You can type terms into that box to use as a filter. Try typing NAFTA into that box. There should be three items that pop up. You can see from playing around with this that filters can be a powerful tool to use when looking at data. Now a challenge for you. Using this spreadsheet find the following: All requests that mention “Trump”. The five requests that received the quickest responses. The final reply date that was probably an entry error. I’ll post the answers in my next entry. In my career I’ve had to deal with a lot of numbers. I started my work career just as spreadsheets and the personal computer were beginning to be used. So, I remember doing things by hand that are now done with spreadsheets. I’ve been using Excel for over 20 years as it came to become the standard for all spreadsheets. You might call me an expert, but I know there’s still a lot that I don’t know about Excel. What I do know, I’ve picked up over time from seminars, books, magazines, internet searches and sometimes just clicking on things to see what happens. I’d like to share some of what I know. Today’s post will show you where to find some things that are hidden in Excel. I’ll also show you a tool to use to that will cut down on the amount of clicks and mousing to use Excel commands that you use the most. The first thing I’d like to share with you is using Excel’s quick access toolbar. Microsoft did a good job of grouping the various commands together by different categories in the ribbons. But when I first started using the ribbons, I found myself jumping between ribbons for the various commands. That led me to start using the quick access toolbar. Let me introduce you to what you can do with this. If you haven’t used your quick access toolbar, it probably looks something like this. It’s the part up in the green band. You may have clicked on the dropdown arrow. When you do that, you’ll see a list of different commands. Some will be checked, and some will not. The checked ones are the commands that show up on the toolbar. And that’s how I started using this. Then I found some additional things that turned it into a good tool for me. The first thing did was look at the dropdown menu. The very last item on the list lets you change the placement of the menu to above or below the ribbons. I changed mine to appear below. My reasoning was that if it’s truly quick access then I want to access it quicker. There ended up being a side benefit to being below also. If you can add more commands before you run out of space when it’s below. This is where the second from the last item on the dropdown menu comes into play. This is called “More Commands…” and it lets you add in more than just the ones listed in in the dropdown menu. When you click on this, you’ll get a box that looks like this. From here you’ll be able to modify your quick access toolbar. You can see from the menu at the side that this box can also be accessed from within Excel Options. As is frequently the case, there are multiple ways to do things within Excel to accomplish the same task. Another example of this would be at the bottom of the box towards the left side, there is another way to move your toolbar to below the ribbon. On the left side at the top there is a dropdown labeled “Choose commands from:”. The default is a list of popular commands. If the command you’re looking for isn’t in this list, you can click the dropdown and choose which ribbon tab the command belongs in and find it that way. When you click this dropdown, also notice there is a category for commands that aren’t on any ribbons. Hidden Excel. Once you’ve picked and arranged your commands you can click on OK and your toolbar will be updated. Then your Excel may look something like mine. Here’s what my quick access toolbar. In order here’s what’s on my quick access. Save Separator New File Open Save As (If you’re wondering why this isn’t put together with Save, I have a reason. If I’m working on a spreadsheet making modifications that I want to save but still retain the original file, the last thing I want to do is accidentally click on Save instead of Save As.) Separator Quick Print Print Preview Separator Undo Redo Separator Select Visible Cells (A command that doesn’t show up in the ribbons. It allows you to highlight a range of cells to copy that include hidden cells without having the hidden cells show up when you paste.) Copy Paste Paste Value Separator Borderd Fill Color Font Color Separator Comma Style Percent Style Decrease Decimal Increase Decimal Separator Autosum Evaluate Formula Spelling Separator Add or Remove Filters Subtotal Sort Ascending Sort Descending Sort Separator Freeze Panes Set Zoom to 100% Zoom to Selection Arrange All Separator Insert Page Break Camera Just above the OK button there is a button that lets you import and export your customizations. This is a way you can save any customizations you make. It’s helpful when you switch computers to be able to quickly set up your toolbar again. You can make a copy of the export file that is saved an import it on another computer. I have a file of my toolbar. I have placed it here if you want to download it to use for yourself or use as a base to modify. Just be aware that if you import this file if you already have any customizations of the toolbar or the ribbons, you will lose those.
Thanks for reading, Tim
Welcome. I’ll be using this space to blog about business from a financial viewpoint. My goal is to post every other week. But that will depend on my time available. For now, I’m going to shoot for every other Wednesday.
In part, how often I post is up to you. My intention is for this blog is going to be reader supported. If you find my posts to be of value, you can make a contribution. I’ll evaluate how much time I’ll allocate to writing this blog based on how much support is received. Click on the PayPal button below to contribute to this blog. Thanks in advance for your support. |