Silflay Hraka

5/23/2002




Why they do that.

They do that because it is their lot in life. You may as well ask Andrew Sullivan why he pisses people off.

:1,$s/^.*MP3 // and :1,$s/ .*:... /-/ are what is known as regular expressions. Put simply, they are patterns. Put simply, the sentence before this one is a lie. They contain regular expressions, but they also contain other stuff. I can get away with this because all the real technical people read Camworld, and only come round here at two-thirty in the morning, after all the bars close, looking to get laid. And no, they don't respect me in the morning. Hell, they don't respect me at two-thirty.

When I took this line;

AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan - Because I Got High.mp3,

and got this line;

4,975,989-AfroMan - Because I Got High.mp3,

it was because I specified a pattern that first matched AFROM~19 MP3, and then matched 08-23-01 11:07p. Once they got matched, the rest of the command deleted them. I could have just typed "AFROM~19 MP3" but that would have only worked once. The pattern has to be loose enough to find something on every line in the document, but tight enough so that it's not deleting the information I want to keep. With your kind permission, I'll parse the first one.

: actually isn't part of the "reg exp", as they call it in the biz..... Well, I think they do. I do, certainly, and the fact that long-haired Star Trek freaks in need of a bath look at me funny when I do doesn't mean a THING! I CAN BE GEEK CHIC TOO! I KNOW WHAT 42 MEANS, I CAN QUOTE MONTY BLOODY PYTHON, I ....

Ahem. : is actually a command in Unix's vi editor that allows you to issue commands against all or part of the document you are working on. And it's pronounced "vee-aye", not "six", as I know, to my great and enduring sorrow. Typing : allows me to proceed with the rest of the command.

1,$ - says that I am going to be searching through the entire document, from the first line (1), to the last line ($). Not actually part of the regular expression.

s - substitute. It says to replace the first thing in the pattern with the second thing. Also not actually part of the regular expression.

^.*MP3 - the first thing in the pattern. The / in front of it essentially says "here begins the first thing". The / after it says "Here endeth the first thing. Right after me begins the second thing". The pattern it is searching for says "Match anything that starts at the beginning of a line (^), runs through zero or more characters (.*) and ends in MP3 (MP3) followed by a space ( )". Actually the regular expression.
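For anyone who would rather poke at that pattern outside of vi, here is a minimal sketch in Python. Python is not what the post uses; it is just a convenient place to try things, and its re module happens to read ^, . and * the same way for this pattern. The variable names are made up for illustration.

```python
import re

# Sample line from the directory listing.
line = "AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan - Because I Got High.mp3"

# Same pattern as the vi command: start of line, any run of characters, "MP3", a space.
pattern = re.compile(r"^.*MP3 ")

# The text the pattern grabs from the sample line.
matched = pattern.match(line).group(0)
print(repr(matched))

# .* is also happy with an empty run, so a line that starts with "MP3 " still matches.
empty_run = pattern.match("MP3 leftovers").group(0)
print(repr(empty_run))
```

The second match is why ".*" is better read as "zero or more characters" than "one or more".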

/ - the second thing. Well, there is no thing there. The / means "Here endeth the second thing". Since there is nothing there, I've essentially deleted whatever fits the pattern. Again, not actually part of the regular expression.

You may ask, why isn't there just a delete command? There is. I didn't learn it until after I had fallen into the "replace with nothing" habit, so I hardly ever use it.

:1,$s/ .*:... /-/ is more of the same, other than the pattern it is looking for. It parses as "Match a space ( ), followed by zero or more characters (.*), followed by a colon (:), followed by exactly three characters and a space (... ), and replace all that with a dash (-)". On the Afroman line, that swallows the date and time, 08-23-01 11:07p, along with the spaces on either side.
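Both substitutions, chained in the same order, can be sketched in Python as a rough stand-in for the vi session. The function name clean_line is hypothetical, and count=1 is there to mimic vi's habit of replacing only the first match on each line unless told otherwise.

```python
import re

def clean_line(line):
    """Apply the two substitutions in the same order as the vi commands."""
    # :1,$s/^.*MP3 //   -> drop everything up to and including "MP3 "
    step1 = re.sub(r"^.*MP3 ", "", line, count=1)
    # :1,$s/ .*:... /-/ -> replace " <date> <time> " with a single dash
    step2 = re.sub(r" .*:... ", "-", step1, count=1)
    return step1, step2

line = "AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan - Because I Got High.mp3"
after_first, after_second = clean_line(line)
print(after_first)
print(after_second)
```

Because .* grabs as much as it can, a song title that itself contained a colon, three characters and a space would get swallowed too. None of the sample lines do, but it is worth checking the output before trusting it.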

Phear my L33t skillz.


Postscript: First time visitor to House Hraka? Wondering if everything we produce could possibly be as brilliant/stupid/evil/pedantic/insipid/inspired as the post you just read? Check out the Hraka Essentials, the (mostly) reader-selected guide to Hraka's best posts, and decide for yourself. Also, you're currently at the old site. Fresh Hraka is posted every day at our current location.

5/21/2002




Stroking data.

Well, massaging data, really. I'd rather just skip the massage and go directly to getting my data drunk, but that's no way to form a lasting relationship. If you recall, I claimed at the end of the last post that my data looked a lot like this;

THEWIG~1 MP3 1,867,904 04-06-02 12:10p The Wiggles - Teddy Bear Hug.mp3
THEWIG~2 MP3 1,966,080 04-06-02 4:42p The Wiggles - The Monkey Dance.mp3
THEWIG~3 MP3 1,370,112 04-06-02 1:24p The Wiggles - Rock A Bye Bear.mp3
THEWIG~5 MP3 2,327,220 04-06-02 5:14p The Wiggles - Uncle Noah's ark.mp3

It's a damned lie. That data is neat, clean and ready to commit. It exists, mind you, but it's embedded in much nastier data, stuff that looks so;

AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan - Because I Got High.mp3
ALIENA~1 MP3 8,372,224 09-01-01 8:58a Alien Ant Farm - Smooth criminal.mp3
IRISHR~1 MP3 3,406,064 09-05-01 9:33p Irish Rovers - Finnegans Wake.mp3
SKANDA~8 MP3 5,148,967 04-03-00 12:07a Skandalous All-Stars - Radio Free Europe.mp3
JOESTR~1 MP3 4,270,834 09-06-01 10:01p Joe Strummer & The Mescaleros - Sandpaper Blues.mp3
DUBLIN~2 MP3 4,082,695 09-05-01 9:38p Dubliners & Pogues - Whiskey In The Jar.mp3
AFRIKA~6 MP3 6,153,323 03-30-00 4:17p Afrika Bambaataa & Soul Sonic Force - Planet Rock.mp3

Before, everything was lined up neatly, ready for slotting into Access. Now it almost lines up, but I'm not playing horseshoes or hand-grenades. Close isn't good enough. Access will allow you to import data from a text file two different ways. You either specify the data in the file is of a fixed width or that it is delimited in some manner. Delimited means that some character, say Buddy Ebsen, pops up in between every piece of important data in a record. Each of these pieces is called a field. If the Afroman song above was delimited by Buddy Ebsen, it'd look like this;

AFROM~19 MP3Buddy Ebsen4,975,989Buddy Ebsen08-23-01 11:07pBuddy EbsenAfroManBuddy EbsenBecause I Got High.mp3
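What "delimited" buys you can be shown with a small Python sketch (Python being an illustration only, not something the post uses). Splitting the record on the delimiter hands back the fields; the names DELIMITER, record and fields are invented for the example.

```python
# The delimiter is whatever string separates the fields; here, per the example, Buddy Ebsen.
DELIMITER = "Buddy Ebsen"

# The delimited Afroman record from above, broken across two source lines for width.
record = (
    "AFROM~19 MP3Buddy Ebsen4,975,989Buddy Ebsen08-23-01 11:07p"
    "Buddy EbsenAfroManBuddy EbsenBecause I Got High.mp3"
)

# Splitting on the delimiter yields one list entry per field.
fields = record.split(DELIMITER)
for number, field in enumerate(fields, start=1):
    print(number, field)
```

This only works because Buddy Ebsen never turns up inside a field, which is the whole trick to picking a delimiter.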

I'd be happy with the size of the file, 4,975,989, the name of the artist, Afroman, and the song, Because I Got High.mp3. Since Buddy isn't here to assist me, I've got to figure out some way to put a delimiter into each of the 4000+ lines of this file. Let's examine Afroman more closely;

AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan - Because I Got High.mp3

My life is a little easier because I don't want all the fields. I need to make the line above look like this;

4,975,989-AfroMan - Because I Got High.mp3

Once it does, the dash is the delimiter, and I am re-goldenized.
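One wrinkle with a dash as the delimiter is that dashes also live inside some of the names (Skandalous All-Stars, for one). How Access copes with that isn't covered here, so the following is only a Python sketch of the string-splitting side of it, under the assumption that the first dash separates the size and that " - " separates artist from song. The helper split_record is hypothetical.

```python
def split_record(line):
    """Split 'size-artist - title' into three fields (hypothetical helper)."""
    # Only the first dash is the delimiter added by the substitution.
    size, rest = line.split("-", 1)
    # Artist and title are separated by " - "; split once, at the first occurrence.
    artist, title = rest.split(" - ", 1)
    return size, artist, title

afroman = split_record("4,975,989-AfroMan - Because I Got High.mp3")
skandalous = split_record("5,148,967-Skandalous All-Stars - Radio Free Europe.mp3")
print(afroman)
print(skandalous)
```

Splitting blindly on every dash would chop the Skandalous All-Stars line into four pieces instead of three.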

Once again, doing it by hand is not an elegant solution, so that's out. I do know of a way to do this, I think. And, just like yesterday, it can't be done in Windows. Or, if it can be done in Windows, I don't know how to do it. And I've tried, believe me.

The first thing I do is connect to one of the UNIX servers at work. UNIX can do anything. Once I'm there I cut and paste all 4000 or so records into an open file. I used UNIX's vi editor to create the file and I'll use some of the functionality it has to start replacing data en masse. If you'd like a small taste of what vi is like, create a file in Notepad without ever using the mouse to do anything. This includes opening and closing Notepad. However, I can replace a lot of stuff with just a few keystrokes. These, in fact;

:1,$s/^.*MP3 // and :1,$s/ .*:... /-/ give me this

4,975,989-AfroMan - Because I Got High.mp3

Woo-hoo!

Tomorrow, why they do that.



5/20/2002




So, this has been on my mind for a while. I've got like 4300 mp3 files, from cds I own, of course, and I want to list them in an Access database so when I...go out to buy more cds I'm not getting duplicate songs. Also, it'd be convenient to be able to sort them by size, so when I get ready to burn a cd, I don't end up with wasted space, or have to drop songs at the last minute.

The problem being that entering them by hand is incredibly labor intensive, and there's nothing in the Windows drop-downs that says anything like "copy directory listings to clipboard". Try to copy and paste the listings, and the bloody OS either refuses to do it, or attempts to insert links to 4000+ files into a Word doc.

If I can get the raw data into a text file, I'm golden. I spent years at net32.com putting the Word and Excel files typed up by the drunken monkeys who ran our vendors into Access, then moving them to the production db. Trouble is, up until now I've not been able to figure it out. There's probably a Windows tool somewhere that does it, but I've never found one at download.com. Likely there is a correct term for such a tool, but I don't know it, so my searches are in vain.

So, like the fellow says, I turned the desire over to my subconscious and let it stew there. Eventually I realized that ms-dos, being at least superficially like Unix, might support command line i/o redirection. Everyone more technical than me solved this problem in the second paragraph and left, so I'll explain what command line redirection is.

Essentially, Unix lets you take the output of a command, like "date" and mess with it, should you desire to do so. Here's what the "date" command and the output thereof looks like unmolested.

$ date

Mon May 20 20:32:36 EDT 2002


Now, should I for some reason not want the actual output to appear on my screen, I can tell it to go elsewhere. Here I tell the date to go to Hell.

$ date >> Hell
$

Note that what I get back is not the date, but a command prompt. The date is writhing in agony as gibbering demons, excuse me, daemons, poke it in the unmentionables with red-hot forks.

Well, no. What has actually happened is that the Unix shell has taken the output of the command and put it into a file called Hell. It doesn't matter if the file was there beforehand. If Unix looks around and doesn't see Hell, it creates it, and sticks the output from the date command into it. If Hell is already there, >> tacks the output onto the end of it; a single > would wipe out whatever was in there first.
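The create-or-append behaviour of >> can be sketched in Python, as an illustration rather than anything the shell actually does under the hood. The helper run_appending is hypothetical, and a tiny Python command stands in for date so the sketch runs on any OS.

```python
import os
import subprocess
import sys
import tempfile

def run_appending(command, path):
    """Run command and append its standard output to path, like `command >> path`."""
    # Mode "a" creates the file if it is missing and otherwise writes at the end.
    with open(path, "a") as target:
        subprocess.run(command, stdout=target, check=True)

# A stand-in for `date`: prints a fixed timestamp so the result is predictable.
command = [sys.executable, "-c", "print('Mon May 20 20:32:36 EDT 2002')"]

hell = os.path.join(tempfile.mkdtemp(), "Hell")
run_appending(command, hell)  # Hell did not exist, so it gets created
run_appending(command, hell)  # Hell exists now, so the output is appended

with open(hell) as handle:
    contents = handle.read().splitlines()
print(contents)
```

Running it twice leaves two dates in Hell, which is the difference between >> and a plain >.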

So what Sub has told me to do is to bring up a ms-dos window, go to the mp3 folder and run the "dir" command, like this;

C:\WINDOWS>cd ..

C:\>cd mp3s

C:\mp3s>dir >> temp1.txt

The only things I typed above were "cd ..", "cd mp3s" and "dir >> temp1.txt".

C:\WINDOWS>, C:\> and C:\mp3s> are ms-dos versions of the command prompt. They tell me where I am at any particular moment. Unix can be set to do the same thing.

I start out in the WINDOWS folder and change directory (cd) to the C: folder above it. (The two dots are just a command line convention that means "folder above this one". It's more correctly known as the parent folder. One dot means "this folder". No dots means "you forgot the rest of the command". Well, no it doesn't. In Unix it actually takes you back to your "home" directory; in ms-dos it just tells you where you are.) Then I cd into the mp3s directory and run the "dir" command (short for directory). This normally lists all of the files in a particular folder, but I've redirected ">>" them into a file I've called temp1.txt. I added the .txt so that I can go back to Windows and open it up with Word or Notepad.

It looks something like this;

THEWIG~1 MP3 1,867,904 04-06-02 12:10p The Wiggles - Teddy Bear Hug.mp3
THEWIG~2 MP3 1,966,080 04-06-02 4:42p The Wiggles - The Monkey Dance.mp3
THEWIG~3 MP3 1,370,112 04-06-02 1:24p The Wiggles - Rock A Bye Bear.mp3
THEWIG~5 MP3 2,327,220 04-06-02 5:14p The Wiggles - Uncle Noah's ark.mp3
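For comparison, the same "get the listing into text" job can be sketched in Python. This is a swapped-in technique, not what the post does: it reads the folder directly instead of redirecting dir. The function list_files, its "size-name" output format and the throwaway demo folder are all assumptions made for the example.

```python
import os
import tempfile

def list_files(folder, extension=".mp3"):
    """Return 'size-name' lines for matching files in folder, sorted by name."""
    lines = []
    for entry in sorted(os.scandir(folder), key=lambda e: e.name):
        if entry.is_file() and entry.name.lower().endswith(extension):
            size = entry.stat().st_size
            # The "," format spec adds thousands separators, like the dir output.
            lines.append(f"{size:,}-{entry.name}")
    return lines

# Demo on a throwaway folder with fake files, since C:\mp3s is not assumed to exist.
demo = tempfile.mkdtemp()
for name, size in [("AfroMan - Because I Got High.mp3", 1200), ("notes.txt", 5)]:
    with open(os.path.join(demo, name), "wb") as handle:
        handle.write(b"\0" * size)

listing = list_files(demo)
print(listing)
```

Writing those lines out to a text file would give roughly the cleaned-up form without the detour through vi, at the cost of needing Python on the machine.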

Woo-hoo!

Next, stroking data.

