CertCities.com -- The Ultimate Site for Certified IT Professionals


 Inside the Kernel  
Emmett Dulaney


 Fun With 'sed' and 'awk'
Last time, Emmett walked you through stream editing using "sed." This time, "awk" gets a little more face time in a scripting exercise using ISBN numbers.
by Emmett Dulaney  
10/15/2007 -- Last month, I wrote that two of my favorite tools in the scripting toolbox are sed (the stream editor) and awk (the in-a-rush programming language). I also offered up a problem in need of a solution -- a database of books (named "books") resembling:

0743477103:Macbeth:Shakespeare, William
1578518520:The Innovator's Solution:Christensen, Clayton M.
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise

This year, there was a change to the ISBN numbering system, which uniquely identifies each book. Prior to the beginning of this year, ISBN numbers were 10 digits long and included an identifier for the publisher and a unique number for each book (as shown in the previous column). As of January 2007, ISBN numbers are 13 digits long for new books. Books published before January 1 of this year have both an old 10-digit and a new 13-digit number that can be used to identify them. To account for this, the database needs to stay exactly as it is, but with a new, fourth field added to the end of each entry holding the ISBN-13 number.

To come up with the ISBN-13 number for the existing entries in the database, start with "978," then append the first nine digits of the old ISBN number (the old 10th digit was a checksum and is dropped). The new 13th digit is a mathematical calculation (a newfangled "check digit") obtained by doing the following:

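1. Add all the odd-placed digits (1st, 3rd, 5th and so on, counting from the left of the 12-digit number) together.
2. Add all the even-placed digits together and multiply by 3.
3. Add the total of step #2 to the total of #1.
4. Find out what you need to add to round the number up to the nearest 10. This value becomes the 13th digit. (If the total is already a multiple of 10, nothing needs to be added, and the 13th digit is 0.)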
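Written as a single formula, the four steps look like this. The symbols d_1 through d_13 and S are my own labels for the digit positions (counted from the left) and the weighted total:

```latex
S = \sum_{k=1}^{6} d_{2k-1} + 3\sum_{k=1}^{6} d_{2k},
\qquad
d_{13} = \bigl(10 - (S \bmod 10)\bigr) \bmod 10
```

For the Macbeth example worked below, S = 35 + 66 = 101, so the 13th digit is (10 - 1) mod 10 = 9. The outer mod 10 is what keeps a total that is already a multiple of 10 from producing a "check digit" of 10.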

For example, consider the first entry in the database, whose 10-digit ISBN is 0743477103. Dropping the old check digit and adding the prefix gives 978074347710, and then:

1. 9+8+7+3+7+1=35
2. 7x3=21; 0x3=0; 4x3=12; 4x3=12; 7x3=21; 0x3=0; 21+0+12+12+21+0=66
3. 66+35=101
4. 110-101=9

The ISBN-13 thus becomes: 9780743477109. And the first line of the database should look like this:

0743477103:Macbeth:Shakespeare, William:9780743477109
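To double-check a single number by hand, here is a minimal bash sketch of the same calculation. The function name isbn10_to_13 is my own (hypothetical) choice, and it assumes the input is a clean 10-character ISBN with no hyphens or spaces:

```shell
# Hypothetical helper (bash only): convert one 10-digit ISBN to ISBN-13.
# Assumes a 10-character ISBN with no hyphens or spaces.
isbn10_to_13() {
  local core="978${1:0:9}"   # "978" + first nine digits; old check digit dropped
  local sum=0 i d
  for (( i = 0; i < 12; i++ )); do
    d=${core:i:1}
    if (( i % 2 == 0 )); then
      (( sum += d ))         # 1st, 3rd, 5th... digit: counted once
    else
      (( sum += 3 * d ))     # 2nd, 4th, 6th... digit: counted three times
    fi
  done
  # Round up to the next multiple of 10; the outer % 10 turns 10 into 0
  echo "${core}$(( (10 - sum % 10) % 10 ))"
}

isbn10_to_13 0743477103   # prints 9780743477109
```

It handles one ISBN per call, so it is a checking aid rather than a replacement for the file-based pipeline that follows.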

The example that follows accomplishes this goal. It's not the prettiest thing ever written (as I mentioned in the previous column, I'm a spaghetti coder at best), but it tackles the problem using awk and sed. I've also written the intermediate results to temporary files so you can examine the contents at each stage. Clean programming would avoid temporary files wherever possible, but that sometimes makes it difficult to follow the action. That said, here's one solution to the problem (of, I'm sure, dozens).

Step 1: Pull Out the ISBN
Given the current database, the first order of business is to pull out the existing ISBN -- the first nine digits only, since the 10th digit no longer matters -- and slap "978" onto the beginning. The nine digits we need are the first nine characters of each line, so they can be pulled out using the cut utility:

cut -c1-9 books

Because the calculation that comes later works on each digit individually, I chose to add a space after each digit so that awk can treat every digit as a separate field:

sed 's/[0-9]/& /g'

Now, it's time to add the new 978 prefix (spaced out the same way) to the beginning of each entry (the start of every line):

sed 's/^/9 7 8 /'

And, finally, I added an extra step to remove the trailing space that the first sed command leaves after the last digit, just to make the entry a bit cleaner:

sed 's/ $//'

The results were then written to a temporary file that we can check to make sure everything is working as it should. The full first step then becomes:

cut -c1-9 books | sed 's/[0-9]/& /g' | sed 's/^/9 7 8 /' | sed 's/ $//' > isbn2

Note that the sed operations can be combined in a script file to increase speed and decrease cycles, but I'm choosing here to walk through each operation step-by-step to show what's going on. I'm not worrying about creating script files for a one-time-only operation.
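If you do want the three sed edits in a single process, one option is to give sed several -e expressions, which it applies in order to each line. This is a sketch only; the function name step1 is hypothetical, and it reads the database on standard input:

```shell
# Hypothetical wrapper for Step 1: one cut, then one sed with three expressions
# (space out the digits, add the spaced-out 978 prefix, drop the trailing space).
step1() {
  cut -c1-9 |
    sed -e 's/[0-9]/& /g' \
        -e 's/^/9 7 8 /' \
        -e 's/ $//'
}
```

Running step1 < books > isbn2 should produce the same isbn2 file as the four-command pipeline. The same three expressions, one per line, could also go into a file run with sed -f, which is the script-file approach.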

The temporary file now contains:

9 7 8 0 7 4 3 4 7 7 1 0
9 7 8 1 5 7 8 5 1 8 5 2
9 7 8 0 3 2 1 3 4 9 9 4
9 7 8 1 5 8 7 0 5 2 4 1
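Because the awk step that follows depends on every line of isbn2 having exactly 12 space-separated fields, a quick sanity check can catch a malformed entry (a short ISBN, say) before it quietly produces a wrong check digit. This is an optional extra, not part of the procedure above, and check_fields is a hypothetical name:

```shell
# Hypothetical sanity check: report any line that does not have exactly 12 fields.
# Exits with status 1 if a bad line is found, 0 otherwise.
check_fields() {
  awk 'NF != 12 { print "line " NR ": " NF " fields"; bad = 1 }
       END { exit (bad ? 1 : 0) }' "$@"
}
```

On a sound file, check_fields isbn2 prints nothing and succeeds.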

Step 2: Calculate the 13th Digit
The first 12 digits of the ISBN number are now done. What's left is to use those 12 digits to compute the 13th. With a space between the digits, awk can now read each one as a separate field. The calculation takes several steps:

1. Add all the odd-placed digits together: x=$1+$3+$5+$7+$9+$11
2. Add all the even-placed digits together and multiply by 3: y=($2+$4+$6+$8+$10+$12)*3
3. Add the total of step #2 to the total of #1: x=x+y
4. Find out what you need to add to round the number up to the nearest 10: compute the total modulo 10, and then subtract that remainder from 10.

The following awk command gets almost everything in place except that final transformation:

awk '{ x=$1+$3+$5+$7+$9+$11 ; y=$2+$4+$6+$8+$10+$12 ; y=y*3 ; x=x+y ; y=x%10 ; print y }'

With this, everything is done (the sums, the multiplication and the modulo) except actually subtracting the result from 10. This is the hardest part. If the modulo is 7, naturally the check digit becomes 3. If, however, the modulo is 0, the check digit does not become 10 (10-0), but stays 0. The best solution I could come up with is sed's y (transform) command, which swaps each digit for its distance to 10; 0 and 5 are left out of the lists because they map to themselves:

sed 'y/12346789/98764321/'

Combining the two operations into one pipeline gives the complete second step:

awk '{ x=$1+$3+$5+$7+$9+$11 ; y=$2+$4+$6+$8+$10+$12 ; y=y*3 ; x=x+y ; y=x%10 ; print y }' | sed 'y/12346789/98764321/' > isbn3

The contents of the temporary file now are:

9
4
1
5
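An alternative to the y transform is to let awk do the subtraction itself. The expression (10 - m) % 10 gives 10 - m for a remainder m from 1 to 9 and 0 for a remainder of 0, the same mapping the sed command applies. A sketch, with check_digits as a hypothetical function name:

```shell
# Hypothetical arithmetic alternative to sed's y/12346789/98764321/ step.
# Reads lines of 12 space-separated digits and prints one check digit per line.
check_digits() {
  awk '{ x = $1 + $3 + $5 + $7 + $9 + $11
         y = ($2 + $4 + $6 + $8 + $10 + $12) * 3
         c = (10 - (x + y) % 10) % 10
         print c }' "$@"
}
```

Running check_digits isbn2 > isbn3 should yield the same 9, 4, 1 and 5 as the awk-plus-sed pipeline. Whether the outer modulo is clearer than the y trick is mostly a matter of taste.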

Step 3: Add the 13th Digit to the Other 12
We can now combine the two temporary files to get the correct 13-digit ISBN number. Just as cut was used earlier, paste can be used now to combine them. The default delimiter for paste is a tab, but that can be changed to anything with the -d option. I chose to use a space as the delimiter, and then strip out all the spaces with sed (remember, the isbn2 file has spaces between each digit so they could be read as fields):

paste -d" " isbn2 isbn3 | sed 's/ //g'

The only other thing to do is add a colon as the first character of each entry, which will make it easier to append this to the existing file. This is accomplished with:

sed 's/^/:/'

And the whole command becomes:

paste -d" " isbn2 isbn3 | sed 's/ //g' | sed 's/^/:/' > isbn4

The contents of the temporary file now are:

:9780743477109
:9781578518524
:9780321349941
:9781587052415

Step 4: Finish It All Up
The only operation that remains is appending each value in the temporary file to the matching line of the current database. In this case, I'll let paste use its default tab delimiter and then strip the tab out. Technically, a colon could be specified as the delimiter and the last part of Step 3 avoided, but I'd rather have my value complete there and be confident that I'm stripping out characters that don't belong (tabs) than run the risk of adding more than I should. The final command is:

paste books isbn4 | sed 's/\t//g' > newbooks

The final file looks like this:

0743477103:Macbeth:Shakespeare, William:9780743477109
1578518520:The Innovator's Solution:Christensen, Clayton M.:9781578518524
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik:9780321349941
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise:9781587052415
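One portability note: the \t in the final sed expression is understood by GNU sed, but POSIX does not define it, so other sed implementations may not treat it as a tab. One way to sidestep the issue, sketched here with a hypothetical function name, is to ask paste for an empty delimiter; in paste's -d list, \0 stands for an empty string, not a null character:

```shell
# Hypothetical helper: append each line of file $2 to the matching line of file $1
# with no delimiter at all, so there is no tab to strip afterward.
append_isbn() {
  paste -d '\0' "$1" "$2"
}
```

With this, append_isbn books isbn4 > newbooks should produce the same final file, relying on the leading colon that Step 3 already added to each entry in isbn4.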

Again, there are undoubtedly dozens of (cleaner) ways to accomplish this result, but it does give a quick-and-dirty illustration of how to use two tools -- sed and awk -- which can come in handy for one-time operations...or, for that matter, whenever you have better things to do than spend a long time writing a complex program to perform a simple task.
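For comparison, here is a sketch of one of those other ways: a single awk pass that reads the colon-separated database and appends the ISBN-13 as a fourth field, with no temporary files. The function name add_isbn13 is hypothetical, and it assumes the 10-digit ISBN is always the first field. The trade-off is that there are no intermediate files to inspect along the way:

```shell
# Hypothetical one-pass alternative: append the ISBN-13 as field 4.
# Assumes the 10-digit ISBN is always the first colon-separated field.
add_isbn13() {
  awk -F: -v OFS=: '{
    core = "978" substr($1, 1, 9)          # prefix + first nine ISBN-10 digits
    sum = 0
    for (i = 1; i <= 12; i++) {
      d = substr(core, i, 1) + 0
      sum += (i % 2 == 1 ? d : 3 * d)      # odd places count once, even places three times
    }
    check = (10 - sum % 10) % 10           # a remainder of 0 gives 0, not 10
    print $0, core check                   # OFS=":" puts the colon between them
  }' "$@"
}
```

Running add_isbn13 books > newbooks should give the same final file shown above.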


Emmett Dulaney is the author of several books on Linux, Unix and certification. He can be reached at .

 

