CertCities.com -- The Ultimate Site for Certified IT Professionals
 Inside the Kernel  
Emmett Dulaney


 Setting the Stage with Stream Editing
Emmett walks you through an exercise using the "sed" tool.
by Emmett Dulaney  
9/15/2007 -- Two of my favorite tools in the Unix/Linux toolbox are sed and awk. While sed is the "stream editor" and awk is a quick programming language, the truth of the matter is that they complement each other so well that I rarely use one without the other.

I recently had the occasion to work with two examples that were similar enough to require one or both of these tools, but were unrelated in scope. Both examples, however, show the beauty and power of what these tools can do. I'll be the first to admit that I'm not the cleanest programmer in the world -- in fact, the term "spaghetti code" was invented to describe my work -- but one of the advantages of these tools is that it's possible to accomplish what you need to without thinking too much about how the code looks.

I'll walk through the first example this month, then set the stage for the second example and discuss it next month. The following are sample lines of a colon-delimited employee database that includes five fields: unique ID number, name, department, phone number and address:

1218:Kris Cottrell:Marketing:219.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:219.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:219.555.5555:984 Bash Lane

This database has been in existence since the beginning of the company and has since grown to include everyone who now works, or has ever worked, for the company. Given that, a number of proprietary scripts read from the database and the company can't afford to be without it. The problem is that the telephone company has changed the 219 prefix to 260 and all entries in the database need to be changed.

This is precisely the task for which sed was created. As opposed to standard (interactive) editors, a stream editor works its way through a file and makes changes based on the rules it was given. The rule, in this case, is to change "219" to "260." It is not quite that simple, however; if you use the command

sed 's/219/260/'

the result won't be completely what you want (compare each line with the original):

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1260:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:26074 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

The changes in the first, fourth and fifth lines are correct, which produces only a 60 percent accuracy rate. In the second line, the first occurrence of "219" is changed to "260," and this appears in the employee ID number rather than in the phone number. If you wanted to change more than the very first occurrence in a line, you could slap a "g" (for global) into the command:

sed 's/219/260/g'

That is not what you want to do in this case, however; the employee ID number shouldn't change. Similarly, in the third line, no change at all should be made since the employee doesn't have this telephone prefix. Nevertheless, a change was erroneously made to the street address because it contains the value being searched for.
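To see how far the global flag overshoots, here is a quick sketch that runs it against the second and third sample records. The variable names sample and global_result are only for illustration.

```shell
# Two of the sample records from above, held in a variable for illustration.
sample='1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way'

# With "g", every occurrence of 219 on each line is replaced,
# including the ones in the ID and address fields.
global_result=$(printf '%s\n' "$sample" | sed 's/219/260/g')
printf '%s\n' "$global_result"
```

The first record comes back with its ID changed to 1260 and its address changed to 1260 Locust Avenue, and the second record still picks up the unwanted 26074.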

The first rule of using sed is to identify what makes the location of the string you're looking for unique. If the telephone prefix were encased in parentheses, it would be much easier to isolate. That's not the case in this database, though, and the task becomes a bit more complicated.
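As a hypothetical illustration of the parentheses case (the record below is invented and does not match the real database), the prefix could be targeted directly, because sed's default basic regular expressions treat unescaped parentheses as literal characters:

```shell
# Hypothetical record with the prefix wrapped in parentheses.
record='1219:Nate Eichhorn:Sales:(219) 555-5555:1219 Locust Avenue'

# In basic regular expressions (no -E option), ( and ) match themselves,
# so only the parenthesized prefix is replaced.
paren_result=$(printf '%s\n' "$record" | sed 's/(219)/(260)/')
printf '%s\n' "$paren_result"
```

Neither the ID number nor the street address is touched, since neither contains the parentheses.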

In this case, you could say that it must appear at the beginning of the field (denoted by a colon) and get a result which is much closer:

sed 's/:219/:260/'

Again, compare the changed values with the original:

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:26074 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

The accuracy has now increased to 80 percent, but there's still the problem of the third line. As the colon helped to identify the start of the string, it may be tempting to turn to the period to identify the end:

sed 's/:219./:260./'

But the result still isn't what you want. Notice the third line:

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:260.4 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

Since the period has the special meaning of "any single character," a match is found whether the 219 is followed by an actual period, a "7" or any other character. Whatever that character happens to be, it gets replaced with a period. There's no problem with the replacement side of things, but the search needs to be tweaked. By using the \ character, it's possible to override the special meaning of the period and specify that you are indeed looking for a period and not any single character:

sed 's/:219\./:260./'

The result becomes:

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

And the mission is accomplished.
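Since the phone number is always the fourth colon-delimited field, another way to pin down the location is to let awk restrict the substitution to that field. This is a sketch of an alternative, not the approach used above, and the variable names are assumed for illustration.

```shell
# Two of the sample records from above, held in a variable for illustration.
records='1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way'

# -F: splits each record on colons; OFS=":" puts the colons back when
# awk rebuilds the record after field 4 is modified.
# sub() replaces a leading "219." in field 4 only, then every record is printed.
awk_result=$(printf '%s\n' "$records" |
    awk -F: 'BEGIN { OFS = ":" } { sub(/^219\./, "260.", $4); print }')
printf '%s\n' "$awk_result"
```

Because the substitution never looks at the first or fifth field, the ID 1219 and the address 21974 Unix Way are left alone without any extra anchoring in the pattern.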

The second example involves a database of books that includes the ISBN numbers of each title. Prior to the beginning of this year, ISBN numbers were 10 digits and included an identifier for the publisher and a unique number for each book. As of January, ISBN numbers are now 13 digits long for new books. Old books (those published prior to the first of this year) have both the old 10-digit and a new 13-digit number. For this example, the existing 10-digit number will stay in the database and a new field will be added to the end of each entry holding the ISBN-13 number.

To come up with the ISBN-13 number for the existing entries in the database, you start with "978" then use the first nine digits of the old ISBN number. The 13th digit is a mathematical calculation (a "check digit") obtained by doing the following:

  1. Add all the odd-placed digits of the 12-digit number together.
  2. Multiply each of the even-placed digits by 3 and add the results together.
  3. Add the total of Step 2 to the total of Step 1.
  4. Find out what you need to add to round the number up to the nearest ten. This value becomes the 13th digit (if the total is already a multiple of ten, the digit is 0).

For example, consider the 10-digit ISBN of 0743477103. It first becomes 978074347710. Then:

  1. 9+8+7+3+7+1=35
  2. 7*3=21; 0*3=0; 4*3=12; 4*3=12; 7*3=21; 0*3=0; 21+0+12+12+21+0=66
  3. 66+35=101
  4. 110-101=9. The ISBN-13 thus becomes: 9780743477109
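The arithmetic above can be sketched as a small shell function. The name isbn13 is a hypothetical helper chosen for this illustration, and the sketch covers only the check-digit calculation, not the job of updating the database.

```shell
# isbn13 is a hypothetical helper name: it converts a 10-digit ISBN
# to its 13-digit form using the four steps described above.
isbn13() {
    # Prefix 978 and keep the first nine digits of the old ISBN.
    base="978$(printf '%s' "$1" | cut -c1-9)"
    sum=0
    pos=1
    while [ "$pos" -le 12 ]; do
        digit=$(printf '%s' "$base" | cut -c"$pos")
        if [ $((pos % 2)) -eq 1 ]; then
            sum=$((sum + digit))        # odd-placed digits count once
        else
            sum=$((sum + digit * 3))    # even-placed digits count three times
        fi
        pos=$((pos + 1))
    done
    # Amount needed to reach the next multiple of ten (0 if already there).
    check=$(( (10 - sum % 10) % 10 ))
    printf '%s%s\n' "$base" "$check"
}

isbn13 0743477103
```

For 0743477103 the running total is 101, so the function prints 9780743477109, matching the hand calculation.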

The beginning database resembles

0743477103:Macbeth:Shakespeare, William

And you want the resulting database to resemble

0743477103:Macbeth:Shakespeare, William:9780743477109

Next month, we'll look at how to create a script to generate this. In the meantime, feel free to play around with it and see if you can come up with your own way to do so. Hint: sed will not do it all, and is only one part of the toolbox.


Emmett Dulaney is the author of several books on Linux, Unix and certification.

 

