How to read all lines of a file into a bash array

This blog post has received more hits than I had anticipated, enough that I decided to revise it to improve the quality of the code (which people appear to be using). The original code examples were written specifically to explain the effects of IFS on bash parsing; they were meant to illustrate a point, not to be used verbatim. They were also written for a proprietary embedded environment with limited shell capabilities, which is why they look archaic. The original post follows this update. Read it if you’re interested in IFS, bash word splitting, and line parsing. Click here for a thorough lesson about bash and using arrays in bash.

There are two primary ways that I typically read files into bash arrays:

Method 1: A while loop

The way I usually read files into an array is with a while loop because I nearly always need to parse the line(s) before populating the array. My typical pattern is:

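declare -a myarray
i=0
# IFS=$'\n' keeps leading/trailing spaces and tabs; -r keeps backslashes.
# The '|| [[ -n ... ]]' test keeps a last line that has no trailing newline.
while IFS=$'\n' read -r line_data || [[ -n ${line_data} ]]; do
    # Parse "${line_data}" to produce content
    # that will be stored in the array.
    # (Assume content is stored in a variable
    # named 'array_element'.)
    # ...
    myarray[i]="${array_element}" # Populate array.
    ((++i))
done < pathname_of_file_to_read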
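As a concrete instance of the pattern above, the sketch below fills in the parsing step with made-up rules: it drops ‘#’ comments, trims surrounding whitespace, and skips lines that end up empty. The function name parse_file_into_array, the global result array parsed, and the parsing rules are all hypothetical. It uses IFS= rather than IFS=$'\n'; for read -r on a single line, both keep leading and trailing spaces and tabs.

```bash
#!/bin/bash
# Hypothetical parsing rules: strip a trailing '#' comment, trim surrounding
# whitespace, and skip lines that end up empty.
parse_file_into_array() {
    local file=$1 line_data array_element
    parsed=()    # global result array (hypothetical name)
    # '|| [[ -n $line_data ]]' keeps a last line that has no trailing newline.
    while IFS= read -r line_data || [[ -n $line_data ]]; do
        array_element=${line_data%%#*}    # drop everything from the first '#'
        # Trim leading whitespace, then trailing whitespace.
        array_element=${array_element#"${array_element%%[![:space:]]*}"}
        array_element=${array_element%"${array_element##*[![:space:]]}"}
        if [[ -n $array_element ]]; then
            parsed+=("$array_element")
        fi
    done < "$file"
}

# Usage: parse_file_into_array pathname_of_file_to_read
```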

Here’s a trivial example:

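#!/bin/bash
declare -a myarray

# Load file into array.
i=0
while IFS=$'\n' read -r line_data || [[ -n ${line_data} ]]; do
    myarray[i]="${line_data}"
    ((++i))
done < ~/.bashrc

# Explicitly report array content.
# Pass each element as an argument to '%s', never as the format string,
# so '%' and '\' characters from the file are printed literally.
i=0
while (( ${#myarray[@]} > i )); do
    printf '%s\n' "${myarray[i++]}"
done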
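Another way to report the array, sketched below, is to loop over its indices with ${!myarray[@]}, which lists only the indices that are actually set, so it also works for sparse arrays. Each element is again passed as an argument to a fixed ‘%s’ format rather than used as the format string. The function name print_array and the sample contents are hypothetical.

```bash
#!/bin/bash
# Print each element of the global array 'myarray' with its index.
print_array() {
    local idx
    for idx in "${!myarray[@]}"; do
        # The element is an argument, so '%' and '\' are printed literally.
        printf '%s: %s\n' "$idx" "${myarray[idx]}"
    done
}

myarray=('100% done' 'C:\temp')
print_array
```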

Method 2: mapfile aka readarray

The most efficient (and simplest) way to read all lines of a file into an array is with the ‘readarray’ built-in bash command (also available as ‘mapfile’; both require bash 4.0 or later). I use this when I want the lines copied verbatim into the array, i.e. when I don’t need to parse the lines before placing them into the array. Typical usage is:

declare -a myarray
readarray myarray < file_pathname # Include newline.
readarray -t myarray < file_pathname # Exclude newline.
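Because readarray needs bash 4.0 or later (the stock /bin/bash on macOS is still 3.2), a script that must also run under bash 3 can fall back to a read loop. The sketch below is one way to do that; the function name load_lines and the global array lines are hypothetical. Both branches keep empty lines and a last line without a trailing newline.

```bash
#!/bin/bash
# Load every line of a file into the global array 'lines' (hypothetical name),
# without trailing newlines, on bash 3 or bash 4+.
load_lines() {
    local file=$1 line
    lines=()
    if (( BASH_VERSINFO[0] >= 4 )); then
        mapfile -t lines < "$file"
    else
        # bash 3 fallback: '|| [[ -n $line ]]' keeps an unterminated last line.
        while IFS= read -r line || [[ -n $line ]]; do
            lines+=("$line")
        done < "$file"
    fi
}
```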

Here’s a trivial example:

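#!/bin/bash
declare -a myarray

# Load file into array. -t strips each line's trailing newline; without it,
# the '\n' in the printf format below would double-space the output.
readarray -t myarray < ~/.bashrc

# Explicitly report array content.
i=0
while (( ${#myarray[@]} > i )); do
    printf '%s\n' "${myarray[i++]}"
done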
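To see what -t changes, the short sketch below reads the same two lines with and without it. Without -t each element keeps its trailing newline, which is why printing such elements with a ‘\n’ in the format produces double-spaced output.

```bash
#!/bin/bash
# Read the same input with and without -t and compare element 0.
readarray with_newline < <(printf 'x\ny\n')
readarray -t without_newline < <(printf 'x\ny\n')

printf 'without -t: element 0 has length %d\n' "${#with_newline[0]}"
printf 'with -t:    element 0 has length %d\n' "${#without_newline[0]}"
```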

There are several options for the readarray command. Type ‘man bash’ in your terminal and search for readarray by typing ‘/readarray’, or run ‘help readarray’ for a short summary.
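For example, -s discards a number of leading lines and -n limits how many lines are copied, which is handy for skipping a header. A minimal sketch with made-up input:

```bash
#!/bin/bash
# Skip the first line (-s 1), then copy at most two lines (-n 2),
# stripping trailing newlines (-t).
readarray -t -s 1 -n 2 rows < <(printf 'header\nr1\nr2\nr3\n')
printf '%s\n' "${rows[@]}"
```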


Original post

 

By default, bash breaks the result of an unquoted expansion (such as $var or $(command)) into chunks, or words, at white space characters, which include newline characters, tabs, and spaces. This action is called word splitting. You can control how bash breaks up text by setting the value of the bash built-in “IFS” variable (IFS is an acronym for Internal Field Separator). Bash uses each individual character in the IFS variable to decide where to break up text, rather than treating the whole value as one multi-character separator. So, for example, setting IFS to space, tab, and newline, i.e. ' \t\n', will cause bash to break up text wherever it finds any of those three characters (a run of them counts as a single break), not just where it finds the exact sequence space, tab, newline.
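The per-character rule can be checked with read -a, which splits a line using IFS and stores the fields in an array. Prefixing IFS=... to the read command changes IFS for that one command only. The sketch also shows the difference between whitespace and non-whitespace IFS characters: a run of whitespace separators counts as one break, while every non-whitespace separator marks a field boundary, so two adjacent colons produce an empty field.

```bash
#!/bin/bash
# Each character of IFS is a separator on its own: ':' and ',' both split.
IFS=':,' read -r -a parts <<< 'a:b,c'
printf '%d fields: %s\n' "${#parts[@]}" "${parts[*]}"

# A run of IFS whitespace is one separator...
IFS=' ' read -r -a ws_fields <<< 'a   b'
# ...but each non-whitespace IFS character separates, so '::' yields an empty field.
IFS=':' read -r -a colon_fields <<< 'a::b'
printf 'whitespace: %d fields, colons: %d fields\n' "${#ws_fields[@]}" "${#colon_fields[@]}"
```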

Writing IFS='\n' does not work: inside ordinary single quotes, \n is two characters, a backslash and an n. To get an actual newline character, use bash’s ANSI-C quoting, $'...', which interprets backslash escapes such as \n and \t. This syntax is not specific to built-in variables; it works in any assignment. Here is how to set IFS to the newline character, which causes bash to break up text only on line boundaries:

IFS=$'\n'
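A quick way to see what $'...' does is to compare string lengths: '\n' in ordinary single quotes is two characters, while $'\n' is a single newline character. printf’s %q format prints a value in a re-usable quoted form, which makes invisible characters such as the tab and newline in IFS visible.

```bash
#!/bin/bash
literal='\n'    # backslash followed by n: two characters
newline=$'\n'   # ANSI-C quoting: one newline character
printf 'literal: %d chars, newline: %d char\n' "${#literal}" "${#newline}"

# Show the current IFS with its invisible characters escaped.
printf 'IFS is %q\n' "$IFS"
```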

And here is a simple bash script that will load all lines from a file into a bash array and then print each line stored in the array:

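# Load text file lines into a bash array.
# Note: empty lines are dropped, because newline is an IFS whitespace character.
OLD_IFS=$IFS
IFS=$'\n'
set -f    # disable pathname expansion (globbing) of each line
lines_ary=( $(cat "./text_file.txt") )
set +f
IFS=$OLD_IFS

# Print each line in the array.
for (( idx = 0; idx < ${#lines_ary[@]}; idx++ )); do
    line="${lines_ary[idx]}"
    printf '%s\n' "${line}"
done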
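One side effect worth knowing about: newline is an IFS whitespace character, so a run of newlines counts as a single separator and empty lines vanish when a file is split this way. The sketch below, which uses a throwaway temporary file, compares the element count with mapfile (bash 4+), which keeps one element per line.

```bash
#!/bin/bash
tmpfile=$(mktemp)
printf 'first\n\nthird\n' > "$tmpfile"    # three lines, the middle one empty

# Split on newlines with IFS (globbing disabled).
OLD_IFS=$IFS
IFS=$'\n'
set -f
split_ary=( $(cat "$tmpfile") )
set +f
IFS=$OLD_IFS

# One element per line, empty lines included.
mapfile -t mapfile_ary < "$tmpfile"
rm -f "$tmpfile"

printf 'IFS splitting: %d elements\n' "${#split_ary[@]}"
printf 'mapfile:       %d elements\n' "${#mapfile_ary[@]}"
```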

While the code above works for simple files, storing a text file in a bash array is often unnecessary and not very efficient. Programmers new to bash often want to do this and aren’t aware that it isn’t necessary. An alternative solution is to simply process the lines on the fly so no array is required, like so:

# Process text file lines without storing them in an array.
OLD_IFS=$IFS
IFS=$'\n'
set -f    # disable pathname expansion (globbing) of each line
for line in $(cat "./text_file.txt"); do
    printf '%s\n' "${line}"
done
set +f
IFS=$OLD_IFS
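An unquoted $(cat ...) also undergoes pathname expansion after it is split, so a line such as *.txt is replaced by the names of matching files unless globbing is disabled with set -f. The sketch below demonstrates this in a throwaway temporary directory; all file names in it are made up.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
printf '%s\n' "$demo_dir/*.txt" > "$demo_dir/input"    # one line: a glob pattern

OLD_IFS=$IFS
IFS=$'\n'
globbed=()
for line in $(cat "$demo_dir/input"); do    # globbing on: the pattern expands
    globbed+=("$line")
done
set -f
verbatim=()
for line in $(cat "$demo_dir/input"); do    # globbing off: the line stays as-is
    verbatim+=("$line")
done
set +f
IFS=$OLD_IFS

printf 'with globbing:    %d element(s)\n' "${#globbed[@]}"
printf 'without globbing: %d element(s)\n' "${#verbatim[@]}"
rm -rf "$demo_dir"
```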

If you need to keep track of line numbers, just count lines as you parse them:

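# Process text file lines, numbering them as they are read.
# Note: empty lines are skipped, so this counts non-empty lines only.
OLD_IFS=$IFS
IFS=$'\n'
set -f    # disable pathname expansion (globbing) of each line
line_counter=0
for line in $(cat "./text_file.txt"); do
    ((++line_counter))
    printf '%d: %s\n' "${line_counter}" "${line}"
done
set +f
IFS=$OLD_IFS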
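A loop that splits $(cat ...) on newlines skips empty lines (newline is an IFS whitespace character), so its counter does not always match the file’s real line numbers. A read loop avoids that, because read keeps empty lines and never performs pathname expansion. In this sketch IFS= applies only to the read command, so there is nothing to save and restore. The function name number_lines is hypothetical.

```bash
#!/bin/bash
# Print each line of a file prefixed with its line number.
number_lines() {
    local line line_counter=0
    # IFS= applies only to read; -r keeps backslashes; the '||' test keeps
    # a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        ((++line_counter))
        printf '%d: %s\n' "$line_counter" "$line"
    done < "$1"
}

# Usage: number_lines ./text_file.txt
```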

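Be aware that changing IFS in the scripts shown above affects only the shell process that runs the script and any subshells it creates; subshells, such as ( ... ) and $( ... ), inherit all shell variables whether or not they are exported. It does not change IFS in the interactive shell that launched the script, unless you source the script instead of running it. Exporting IFS, like this:

IFS=$'\n'
export IFS

or:
export IFS=$'\n'

places it in the environment of child processes. However, bash does not import IFS from the environment: every new bash process starts with the default IFS (space, tab, newline). Exporting IFS therefore does not change word splitting in child bash scripts, and for the same reason setting IFS in /etc/environment has no effect on bash. A system-wide change (not recommended) would have to go in a startup file that the shell itself reads, such as /etc/profile for login shells, or whatever is appropriate for your system configuration.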
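The difference between a subshell and a child bash process can be checked directly. In the sketch below, both experiments run inside $( ... ), so the caller’s IFS is untouched; a child IFS length of 3 corresponds to the default space, tab, and newline.

```bash
#!/bin/bash
# A subshell inherits every shell variable, exported or not.
subshell_ifs=$( IFS=':'; ( printf '%s' "$IFS" ) )

# A new bash process ignores IFS from its environment and uses the default.
child_ifs_length=$( IFS=':'; export IFS; bash -c 'printf "%s" "${#IFS}"' )

printf 'subshell sees IFS=%s\n' "$subshell_ifs"
printf 'child bash IFS length: %s\n' "$child_ifs_length"
```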

References

http://mywiki.wooledge.org/BashFAQ/001
http://mywiki.wooledge.org/BashFAQ/005
http://tldp.org/LDP/abs/html/internalvariables.html
http://www.gnu.org/software/bash/manual/bashref.html#Word-Splitting

Disclaimer:

Please keep in mind that the references listed above know WAY MORE than me. As of this post I’ve only been bash scripting for about three months and I only do it on occasion – like maybe once every three weeks – to solve some IT or embedded development issue. My posts are only meant to provide quick [and sometimes dirty] solutions to others in situations similar to mine. If you really want to be good at bash scripting then spend some quality time with the official documentation or a good book.


  1. #1 by David on April 27, 2011 - 4:23 pm

    Thanks, that’s very cool! It can be used to prepend a FIL1 to FIL2 without an intermediary file:

    L="$( wc -l $FIL1 )" L=$[L-1] OLD_IFS=$IFS IFS=$'\n'
    A=($(cat "$FIL1")) IFS=$OLD_FS
    for n in `seq $L -1 0` ; do
    line="${A[$n]}"
    sed -i "1i$line" $FIL2
    done

    Cheers,
    Dave

    • #2 by lhunath on November 17, 2013 - 6:45 pm

      There are too many bugs in this code for me to go into, pretty much every line is buggy in some way.

      If you want to concatenate two files, the right way to do it is with `cat`:

      cat "$file1" "$file2" > "$file3"

      Also, your claim of “without an intermediate file” is false, you’re making LOADS of intermediate files, one for EACH LINE in FIL1, in fact. Since that’s what sed -i does.

  2. #3 by bleh on October 4, 2011 - 9:46 am

  3. #4 by guysoft on January 1, 2012 - 9:43 am

    Hey,
    Just so you know, it’s a pain to get this to work on Mac OS X because there is no seq there

    • #5 by lhunath on November 17, 2013 - 6:37 pm

      You shouldn’t be using seq anywhere. Ever. The above code is junk.

      In bash 4, do:

      readarray -t myarray < file

      In bash 3, do:

      i=0; while IFS= read -r myarray[i++]; do :; done < file

  4. #6 by Saf' on June 12, 2013 - 2:47 pm

    I LOVE YOU.

    • #7 by Saf' on June 12, 2013 - 2:47 pm

      For this :

      # Load text file lines into a bash array.
      OLD_IFS=$IFS
      IFS=$'\n'
      lines_ary=( $(cat "./text_file.txt") )
      IFS=$OLD_IFS

      # Print each line in the array.
      for idx in $(seq 0 $((${#lines_ary[@]} - 1))); do
      line="${lines_ary[$idx]}"
      printf "${line}\n"
      done

      yes i’m noob in bash.

      • #8 by lhunath on November 17, 2013 - 6:41 pm

        Don’t do that. You’re unwittingly pathname expanding all the lines in your file. The post is loaded with bugs.

        In bash 4, do:

        readarray -t myarray < file

        In bash 3, do:

        i=0; while IFS= read -r myarray[i++]; do :; done < file

        To iterate the array, you do:
        for line in "${myarray[@]}"; do printf '%s\n' "$line"; done

  5. #9 by lhunath on June 12, 2013 - 5:26 pm

    This is all bad and broken code. You’re poisoning all your readers.

    bash 4: readarray -t array < file
    bash 3: while IFS= read -r line; do array+=("$line"); done < file

    That's it. Delete all the other crap above, it will result in a huge range of bugs.

    • #10 by peniwize on June 12, 2013 - 7:06 pm

      This post originated from needing to explain how IFS impacts parsing to a few coworkers (back when I wrote it). The code in this article was not intended to be used verbatim in production solutions. It’s simply illustrative and intended to explain a concept to [C/C++] software engineers new to bash who are trying to learn how bash works – not necessarily the best/ideal way to use it. This is why I have the references and disclaimer at the end of the article. It’s a bit harsh for you to claim that I’m poisoning readers. Give people some credit. They can think for themselves.

      • #11 by lhunath on June 12, 2013 - 7:32 pm

        It’s not really harsh, it’s just true. Also, I’ve been an operator of the #bash freenode channel long enough to be able to tell you with full confidence that you can *not* give people enough credit to think their way out of the bugs in this code. Heck, just look at the comments above.

        I’m certain your post originated from a good cause, and had the best of intentions. But the fact of the matter remains: People who know nothing about wordsplitting, quoting, and arrays read your code and copy it verbatim. The biggest issue with that is that bash is so lax that it doesn’t tell you your code is horribly buggy until you are lucky enough to catch it suddenly misbehaving without causing *too* much damage, and at a time that you have the time to fix the code and aren’t pressing for an immediate deadline relying on code to just work.

        I recommend you update your post and re-iterate the points you hoped to make in a way that is correct.

        I find it slightly disheartening that you link to articles describing word-splitting but fail to have learned anything from them. Also, please don’t link to the ABS, the same argument applies to that guide. It *looks* advanced; but it’s filled with negligence and bugs; and poisons its readers just as much as this post: Readers that trust that the code they read is re-usable, while in fact it is dangerous to do so.

        I already gave you good code. Here’s some additional good references:
        http://mywiki.wooledge.org/Arguments
        http://mywiki.wooledge.org/Quotes
        http://mywiki.wooledge.org/BashGuide
        http://mywiki.wooledge.org/BashSheet

        As for IFS, I highly recommend you NEVER modify it in script-scope; ONLY scoped to a command (eg. IFS= read), then you don’t need to worry about changing default bash parsing behaviour and undoing your changes to IFS.

      • #12 by peniwize on June 13, 2013 - 1:51 am

        You make good points. I can’t argue the point about how people will interpret the article or what they’ll do with the code. I suspect you’re right – especially with your lengthy experience in IRC. I imagine you’ve seen just about everything. I’ll update the article sometime in the future when I have the time.

        I know my use of IFS seems bizarre and potentially buggy and I agree that it’s safest when used in the context of a command, such as read. That is almost exclusively how I use it. However, the abridged code in this article expected IFS to be changed and I expected that those reading this article would read the references and gain a deeper understanding. Part of the reason why I used IFS explicitly in the code above is to show that it can be done since so many people have documented stuff like: “while IFS=$'\n' read -r line; do …; done” One thing that wasn’t immediately obvious to my colleagues new to bash was that ‘read’ was the command, so I went a different route with the article. Perhaps it was a bad idea to post this code in the wild. I dunno. None of my colleagues were led astray by it. (Full disclosure: they’re all senior software engineers.)

        Please consider that this article was written so that I would not have to reexplain the same things to several people, not necessarily to teach the world. I put it on the Internet for convenience and future reference, not because I think I’m Mr. bash or because I have a strong need to try to educate the world about bash. As I said in the article, I’m no bash expert and I don’t claim to be. I’ve learned a tremendous amount since I originally wrote the article, and I’ve implemented some sophisticated bash scripts, but I still don’t claim to be an expert and don’t typically write large scale utilities in bash (e.g. more than a couple thousand lines). If you have more references that you would like posted, please reply again and I’ll make sure they get posted. Thanks for the four you provided.

      • #13 by lhunath on November 17, 2013 - 6:38 pm

        Seeing as you keep getting replies, it means people keep reading your crap and thinking it’s the way to do it. If you have any responsibility, fix your post or delete it.

  6. #14 by Tiamarchos on November 4, 2013 - 10:33 pm

    Thank you so much for this bit of code. I was looking for it for a week now. God bless you!

    • #15 by lhunath on November 17, 2013 - 6:40 pm

      You can also thank him for teaching you bugs.

      In bash 4, do:

      readarray -t myarray < file

      In bash 3, do:

      i=0; while IFS= read -r myarray[i++]; do :; done < file
