It's late, I'm tired. Do my job for me.
Furiously Eclectic People (http://furiouslyeclectic.com/forum) > Toying with Sardonicism > Endless Blanket (/showthread.php?tid=402)
It's late, I'm tired. Do my job for me. - Oedipussy Rex - 10-08-2017

I have a text file. Let's say it's:

Code:
Amazing Spider-Man 01

It isn't, but let's say it is. What I want is a text file that reads:

Code:
Amazing Spider-Man 01-05, 07, 09

Leading 0s are optional, but exist in the original file. Also, spaces after commas aren't strictly necessary. I want to do this on a file over 10,000 lines long, so doing it by hand is out of the question.

RE: It's late, I'm tired. Do my job for me. - Kersus - 03-31-2018

Did you find a solution? https://www.stashmycomics.com

RE: It's late, I'm tired. Do my job for me. - Oedipussy Rex - 12-01-2024

Sneak peek

Code:
\(((19[0-9]{2}|20[0-1][0-9]|202[0-5])|(19[0-9]{2}|20[0-1][0-9]|202[0-5])(-|\,)[0-9]{2}|(19[0-9]{2}|20[0-1][0-9]|202[0-5])(-|\,)(19[0-9]{2}|20[0-1][0-9]|202[0-5]))\)

RE: It's late, I'm tired. Do my job for me. - Oedipussy Rex - 12-08-2024

Right. So here it is. Tailor-made for my specific needs. Kind of. Or not.

Quote:
Please correct the following errors before continuing:

Wall o' text:

Code:
#!/bin/bash

Let's see if I can attach ComicList. Nope. First it didn't like the type of file, so I added a .txt extension, and now it doesn't like the 1.2 MiB size. Anyway.

As you can see from the crunchbang, this is a bash script. Bash is a scripting language, not a programming programming language. Trying to use a scripting language as a programming language often leads to frustration, tears, and heartbreak. If you're lucky. If you aren't, well let's not get into it.

First issue: all variables are global in scope unless declared local. But even a local variable is visible to every function called from the one that declared it (bash scoping is dynamic), and to any subshell.

Second issue: Arrays in bash aren't arrays in the usual sense. They are sparse lists of strings. Of no fixed size. And the indices do not need to be sequential. The relatively new associative array (bash 4.0+) just adds more fun to the party.

Third issue: variables and indexed arrays do not need to be declared.
You want a new variable? Just assign a value to it. Frustrating when you can't figure out why echo "$Abbrevation" keeps coming up empty when you know for a fact that you made the Abbreviation="dh" assignment the line before.

Getting to the script. Yes, I use UpperCamelCase. Deal with it. A remnant from learning Pascal in college. I like to visually differentiate variables from system commands. I use Snake_Case for function names, where appropriate. These are terms I've only recently learned.

Nested functions: Only used as a reminder of where a sub-function is called. Functions, like variables, are global, limited only in that a function must either be defined in the text of the script before it is called or have already been instantiated.

test function:

Code:
#!/bin/bash

I only just, as in within the last 5 minutes, learned that it is possible to make function variables and nested functions truly local by running the function in a subshell -- Function() ( code; ) -- instead of in the current shell -- Function() { code; } -- but that's something I'd have to look into regarding passing arrays by "reference", and something that I probably wouldn't have used for this script because I'm using global variable side-effects that would set my programming instructors' teeth on edge.

Is_Issue_Number(): In bash, and presumably sh, zsh, etc. (fish uses $status instead), when a function exits, it stores the return value (whether the function ran correctly or not, 0 is true) in the ? variable, which is what 'if' goes by to determine true/false.

Code:
if Is_Issue_Number "$LastWord" Issue Prefix "$WorkingComic"; then ...

Whatever.

The idea of the script is that anything between parentheses at the end of the comic book string, of which there may be many or none, is unneeded. Except for the year. The year might be needed, so record the year. Issue numbers will be the last word (characters surrounded by white space) in the string, followed by parentheticals, or followed by one or two dashes ( -/-- ).
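For illustration only, the parsing just described might be sketched like this. Split_Comic_Line and its Title, Issue, and Year globals are made-up names, not taken from the actual script, and the sketch assumes the simple case: it peels parentheticals off the end, keeps a four-digit year if it sees one, and treats a numeric last word as the issue number. The dash handling is left out.

```bash
#!/bin/bash
# Hypothetical helper, NOT the script's Is_Issue_Number.
# Sets the globals Title, Issue and Year from one line of the comic list.
Split_Comic_Line() {
    local Line=$1 Inside Last
    # Regexes kept in variables so bash does not try to parse the parentheses.
    local ParenRe='^(.*[^[:space:]])[[:space:]]*\(([^()]*)\)[[:space:]]*$'
    local YearRe='^(19|20)[0-9]{2}$'
    Title="" Issue="" Year=""

    # Peel "(...)" groups off the end; remember one that looks like a year.
    while [[ $Line =~ $ParenRe ]]; do
        Line=${BASH_REMATCH[1]}
        Inside=${BASH_REMATCH[2]}
        [[ $Inside =~ $YearRe ]] && Year=$Inside
    done

    # A numeric last word is taken to be the issue number.
    Last=${Line##* }
    if [[ $Line == *" "* && $Last =~ ^[0-9]+$ ]]; then
        Issue=$Last
        Title=${Line% *}
    else
        Title=$Line
    fi
}

Split_Comic_Line "Aliens vs. Predator - Three World War 01 (2011) (scan)"
printf '%s|%s|%s\n' "$Title" "$Issue" "$Year"
# prints: Aliens vs. Predator - Three World War|01|2011
```

Setting globals rather than printing the result keeps with the side-effect style the script already uses.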
If you find an issue number and the current title (everything before the issue number) is the same as the last comic you checked, then record the issue number and get a new comic. If it's a different title, then print the old title, all the issue numbers, and the year for that title if there aren't two or more issue numbers. If you do not find an issue number, then this title is by default different from the previous title. Rinse, repeat.

Printing out issue numbers: assign the first element in the array to First and Last. Going through the array, if a number is equal to or one greater than Last, then set Last to it; otherwise print "$First-$Last," (or "$Last," if the two are equal) (Why $Last and not $First? One fewer character to type), assign this number to First and Last, and continue through the array. When at the end of the array, do one final printing for the title.

That's it. All the rest is taking care of exceptions to the rules. Example: sometimes the last number in a string isn't the issue number for the title. Like "Lovecraft Adaptations 02 - HPL 1920". The previous book is "Lovecraft Adaptations 01 - Beyond the Wall of Sleep", so obviously 1920 isn't an issue number. Or look at Batman. "Batman 910" isn't "Batman 910", it's "Batman v3 145". But you know I'm not going to use "Batman v3 145" because fuck you Warner Brothers, I'm going to use "Batman 910". But the cover reads, "Batman v3 145". So I record it as "Batman 910 - v3 145". That means when the script encounters something like "Batman 910 - v3 145" or "Lovecraft Adaptations 02 - HPL 1920" it needs to check if there is a number followed by a dash (or two) further left in the string. Good job, everyone. Well done. Great. But what about "Spider-Man 2099 - Dark Genesis 01"? Whoops. So now we need to check for exceptions to the exceptions.

Weakness of the script: Garbage in/garbage out. The script reads one line at a time looking neither forward nor backward in the input file.
If the file isn't well-sorted, the output is going to be a mess. Let's take a look at a couple Batman titles: Batman Family and Batman - Family. Using the sort command, we get

Batman Family 01
Batman - Family 01
Batman Family 02
Batman - Family 02
etc.

Using the -V switch results in

Batman Family 01
Batman Family 02
...
Batman - Family 01
Batman - Family 02
...

Good. That's what we want. But it does nothing for Batman and Batman 80-Page Giant:

Batman 043
Batman 047
Batman 80-Page Giant 01
Batman 80-Page Giant 02
Batman 80-Page Giant 03
Batman 80-Page Giant (2010)
Batman 80-Page Giant (2011)
Batman 081
Batman 085, etc.

Same with Alien and Alien 3; Batman Beyond and Batman Beyond 2.0; Harbinger Wars and Harbinger Wars 2; Marvel Zombies, Marvel Zombies 2, 3, 4, and 5.

But it's late again. On the 50,000+ line file, it takes almost 10 minutes. Did I mention that bash is slow as hell? Bash is slow as hell.

RE: It's late, I'm tired. Do my job for me. - Oedipussy Rex - 12-09-2024

Right. So I added

Code:
Prepare_Input_for_Processing() {

at the top of the functions, and

Code:
Prepare_Input_for_Processing

just before

Code:
while IFS= read -r Comic; do

as well as creating an array, ProblematicTitles, where each element is one of the problematic titles, a few of which are mentioned above. What Prepare_Input_for_Processing does is take all of the issues from ProblematicTitles and put them at the top of the ComicList text file so that they are no longer in the middle of the issues they were breaking up. Then I finish the script with a re-sorting of ComicList and OutputFile:

sort -o "$OutputFile" "$OutputFile"
sort -Vo "$ComicList" "$ComicList"

Sure, Prepare_Input_for_Processing is short enough to have been written inline, plus it's just a one-off, not really justifying placing it in a function, but it's cleaner.

Also, did a little more investigation into running functions in a subshell and there's no way I'm going to attempt that for this project.
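A quick sketch of the brace-versus-parenthesis difference, using invented names (Counter, Bump_In_Current_Shell, Bump_In_Subshell, and Next_Value are not from the script). It assumes nothing beyond plain bash:

```bash
#!/bin/bash
Counter=0

# Braces: the body runs in the current shell, so the global really changes.
Bump_In_Current_Shell() { Counter=$(( Counter + 1 )); }

# Parentheses: the body runs in a subshell, on a copy of every variable.
Bump_In_Subshell() ( Counter=$(( Counter + 1 )) )

Bump_In_Current_Shell
echo "after braces:      $Counter"   # 1

Bump_In_Subshell
echo "after parentheses: $Counter"   # still 1, the change died with the subshell

# One way to get a value back out: print it and capture the output.
Next_Value() ( echo $(( $1 + 1 )) )
Counter=$(Next_Value "$Counter")
echo "after capture:     $Counter"   # 2
```

Capturing output like this works for a single string; it does not obviously extend to handing back a whole array.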
From what I can tell, there is no passing by reference. Yes, the function will take the reference, and will throw errors if you use the same name for the variable, but the subshell works on copies of everything it is given. Any changes the function makes to any variable, whether global or passed by value or reference, are gone when the function exits. That is the entire purpose of subshells regarding global variables, but it completely defeats the purpose of passing by reference (which isn't a true pass by reference anyway). As far as I can tell, the only ways you can get values back are the return value (very limited: only the integers 0-255) and command substitution ($(...), which replaced the now-discouraged backtick `` thingy). I don't even want to start to figure out how to make that work with an associative array.

RE: It's late, I'm tired. Do my job for me. - Oedipussy Rex - 07-26-2025

Found another issue. What do these two sets have in common?

Batman - Earth One
Batman - Earth One v02
Batman - Earth One v03

Aliens vs. Predator - Three World War (scan)
Aliens vs. Predator - Three World War 01
Aliens vs. Predator - Three World War 02
Aliens vs. Predator - Three World War 03
Aliens vs. Predator - Three World War 04
Aliens vs. Predator - Three World War 05
Aliens vs. Predator - Three World War 06

They both cause

Code:
elif (( 10#${IssueList[$i]} == 10#$Last ||

to say "Up yours, a-hole," exit the while read loop, and end the script. The problem is $(( 10#$Last + 1 )), which is trying to add 1 to a non-numeric string.

I saw two solutions to this issue: Write a patch with a new array called ReallyProblematicTitles, which simply removes the really problematic titles, then adds them back in after all processing is done, OR don't print them to the list file like that. Guess which I chose. Batman - Earth One is now Batman - Earth One v01 and Aliens vs. Predator - Three World War (scan) is Aliens vs. Predator - Three World War (2011) (scan).
Okay, the third way would be to correct the code but, realistically, who does that?
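For the record, a rough sketch of what that third way could look like, assuming the crash comes from a non-numeric (or empty) value reaching the 10# arithmetic. Collapse_Issues is a hypothetical stand-in for the script's printing step, not the actual code: it checks that an entry is all digits before doing any math on it, and passes anything else through untouched.

```bash
#!/bin/bash
# Hypothetical stand-in for the printing step; all names are illustrative.
# Turns "01 02 03 04 05 07 09" into "01-05, 07, 09".
Collapse_Issues() {
    local First="" Last="" Item Joined
    local -a Out=()

    for Item in "$@" ""; do            # the trailing "" forces a final flush
        # Guard: only do 10# arithmetic when both sides are plain digits.
        if [[ $Item =~ ^[0-9]+$ && -n $Last ]] &&
           (( 10#$Item == 10#$Last || 10#$Item == 10#$Last + 1 )); then
            Last=$Item                 # same or next issue: extend the run
            continue
        fi

        # The run is broken: emit it, if there is one.
        if [[ -n $First ]]; then
            if (( 10#$First == 10#$Last )); then
                Out+=("$Last")
            else
                Out+=("$First-$Last")
            fi
        fi

        if [[ $Item =~ ^[0-9]+$ ]]; then
            First=$Item Last=$Item     # start a new run
        else
            First="" Last=""
            [[ -n $Item ]] && Out+=("$Item")   # non-numeric: pass through as-is
        fi
    done

    printf -v Joined '%s, ' "${Out[@]}"
    echo "${Joined%, }"
}

Collapse_Issues 01 02 03 04 05 07 09   # prints: 01-05, 07, 09
Collapse_Issues 01 02 v03 05           # prints: 01-02, v03, 05
```

Whether something like "v02" should be passed through, dropped, or treated as a volume marker is a decision for the real script; the sketch only shows that the arithmetic never sees it.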