Archive for April, 2011

How to read all lines of a file into a bash array

This blog post has received more hits than I anticipated, enough that I decided to revise it to improve the quality of the code (that people appear to be using). The original code examples were written specifically to explain the effects of IFS on bash parsing; they were meant to illustrate a point rather than to be used directly. They were also created in a proprietary embedded environment with limited shell capabilities, which made them archaic. The original post follows this update. Read it if you’re interested in IFS and bash word splitting and line parsing. For a thorough lesson about bash and using arrays in bash, see the official bash documentation.

There are two primary ways that I typically read files into bash arrays:

Method 1: A while loop

The way I usually read files into an array is with a while loop because I nearly always need to parse the line(s) before populating the array. My typical pattern is:

declare -a myarray
let i=0
while IFS=$'\n' read -r line_data; do
    # Parse "${line_data}" to produce content
    # that will be stored in the array.
    # (Assume content is stored in a variable
    # named 'array_element'.)
    # ...
    myarray[i++]="${array_element}" # Populate array.
done < pathname_of_file_to_read

Here’s a trivial example:

declare -a myarray

# Load file into array.
let i=0
while IFS=$'\n' read -r line_data; do
    myarray[i++]="${line_data}"
done < ~/.bashrc

# Explicitly report array content.
let i=0
while (( ${#myarray[@]} > i )); do
    printf '%s\n' "${myarray[i++]}"
done
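Since the whole point of Method 1 is to parse each line before storing it, here is a slightly more realistic sketch; it assumes a standard /etc/passwd layout in which field three is the numeric UID:

```shell
# Parse /etc/passwd with the while-read pattern: split each
# record on ':' and keep only the login names of accounts
# with a UID below 1000 (typically system accounts).
declare -a sys_users
let i=0
while IFS=: read -r name _ uid _; do
    if (( uid < 1000 )); then
        sys_users[i++]="${name}"
    fi
done < /etc/passwd
printf '%s\n' "${sys_users[@]}"
```

Setting IFS only on the `read` command line keeps the change local to that one invocation, so the rest of the script sees the normal IFS.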

Method 2: mapfile aka readarray

The most efficient (and simplest) way to read all lines of a file into an array is with the ‘readarray’ built-in bash command (also available under the name ‘mapfile’). I use this when I want the lines to be copied verbatim into the array, which is useful when I don’t need to parse the lines before placing them into the array. Typical usage is:

declare -a myarray
readarray myarray < file_pathname # Include newline.
readarray -t myarray < file_pathname # Exclude newline.

Here’s a trivial example:

declare -a myarray

# Load file into array.
readarray myarray < ~/.bashrc

# Explicitly report array content.
let i=0
while (( ${#myarray[@]} > i )); do
    printf '%s' "${myarray[i++]}" # Lines already end with a newline.
done

There are several options for the readarray command. Type ‘man bash’ in your terminal and search for readarray by typing ‘/readarray’.
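For example, the -s and -n options can be combined with -t to read just a slice of a file; a quick sketch using a throwaway file in /tmp:

```shell
# -t strips the trailing newline from each line, -s 1 skips the
# first line, and -n 2 reads at most two lines into the array.
printf 'one\ntwo\nthree\nfour\n' > /tmp/readarray_demo.txt
readarray -t -s 1 -n 2 myarray < /tmp/readarray_demo.txt
printf '%s\n' "${myarray[@]}"    # two, three
rm -f /tmp/readarray_demo.txt
```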

Original post


By default, the bash shell breaks up text into chunks by separating words between white space characters, which includes new line characters, tabs, and spaces.  This action is called parsing.  You can control how bash breaks up text by setting the value of the bash built-in “IFS” variable (IFS is an acronym for Internal Field Separator).  Bash will use each individual character in the IFS variable to decide when to break up text, rather than using all of the characters as a whole.  So, for example, setting IFS to space, tab, and new line, i.e. ‘ \t\n’, will cause bash to break up text every time it finds any combination of those three characters – not just when it finds the one combination of space followed by tab followed by new line.
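A quick demonstration of the effect described above; the same string splits into a different number of words depending on IFS:

```shell
# The sample string contains a space, a tab, and a new line.
data=$'alpha beta\tgamma\ndelta'
IFS=$' \t\n'          # the default: split on space, tab, and new line
words=( $data )       # unquoted on purpose, so word splitting happens
echo "${#words[@]}"   # 4
IFS=$'\n'             # split on line boundaries only
lines=( $data )
echo "${#lines[@]}"   # 2
```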

Setting IFS to a value containing escape sequences (such as the new line character) requires ANSI-C quoting, which is a different syntax than a plain variable assignment: the string on the right hand side of the assignment must be prefixed with the ‘$‘ character, i.e. $'...'.  Here is how to set IFS to the new line character, which causes bash to break up text only on line boundaries:

IFS=$'\n'
And here is a simple bash script that will load all lines from a file into a bash array and then print each line stored in the array:

# Load text file lines into a bash array.
IFS=$'\n'
lines_ary=( $(cat "./text_file.txt") )

# Print each line in the array.
for idx in $(seq 0 $((${#lines_ary[@]} - 1))); do
    printf '%s\n' "${lines_ary[idx]}"
done

While the code above works fine, it is not very efficient to store a text file in a bash array.  Programmers new to bash often want to do this and aren’t aware that it isn’t necessary.  An alternative solution is to simply parse on the fly so no array is required, like so:

# Parse the text file lines on the fly; no array required.
IFS=$'\n'
for line in $(cat "./text_file.txt"); do
    printf '%s\n' "${line}"
done

If you need to keep track of line numbers, just count lines as you parse them:

# Parse the text file lines on the fly, counting lines as we go.
IFS=$'\n'
let line_counter=0
for line in $(cat "./text_file.txt"); do
    let line_counter=$((line_counter+1))
    printf '%s: %s\n' "${line_counter}" "${line}"
done

Be aware that changing IFS in the scripts shown above only affects IFS within the context of those scripts. If you want to change IFS in the context of your running bash shell and all sub-shells (and other child processes) that it spawns then you will need to export it like this:

export IFS

– or –
export IFS=$'\n'

And if you want this change to be system wide (not recommended) then you need to put this into /etc/environment or /etc/profile, or whatever is appropriate for your system configuration.



Please keep in mind that the references listed above know WAY MORE than me. As of this post I’ve only been bash scripting for about three months and I only do it on occasion – like maybe once every three weeks – to solve some IT or embedded development issue. My posts are only meant to provide quick [and sometimes dirty] solutions to others in situations similar to mine. If you really want to be good at bash scripting then spend some quality time with the official documentation or a good book.




How to fill unused drive space with zeros in Linux

Filling unused drive space with zeros is useful for cleaning a drive and for preparing a drive image for compression.  A drive whose unused space is filled with zeros will compress to a much smaller size than one that is not.  There are two methods typically used to fill a Linux drive with zeros:

A simple shell script

The following shell script simply creates a file and fills it with zeros until all drive space is exhausted, then it deletes the file.  This isn’t a perfect solution, but it’s better than nothing.

cat /dev/zero > /zero.fill
sleep 5
rm -f /zero.fill

The ‘zerofree’ utility

The zerofree utility, written by Ron Yorston, only works on devices containing an ext2 or ext3 file system. It seems to work fairly well, however I have crashed one system with it so be careful. You should probably clone the drive you plan to zerofree before doing so.



How to get the command line for any process in Linux

“/proc” file system

Every process has an associated sub-directory entry under “/proc” in the form “/proc/<PID>“, where <PID> is the process identifier.  The “ps” command can be used to obtain the process identifier for any process by executing:

# ps aux | grep -i process_name | grep -v grep

For example:

peniwize@host:~$ ps aux | grep -i bash | grep -v grep
peniwize 2255 0.0 0.1 7044 3596 pts/1 Ss 18:52 0:00 bash
peniwize 2347 0.9 0.1 7044 3704 pts/2 Ss 19:08 0:00 bash

The second column is the process identifier (PID).
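A quick way to convince yourself that column two is the PID is to filter the listing down to the current shell’s own PID; a sketch:

```shell
# Every shell knows its own PID ($$); look it up in the
# second column of the 'ps aux' listing.
found="$(ps aux | awk -v pid="$$" '$2 == pid { print $2 }')"
echo "$found"
```

In practice ‘pgrep process_name’ is a simpler way to look up PIDs by name than the ps/grep/grep pipeline.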


“/proc/<PID>/cmdline” file

Each process sub-directory also has a read-only “cmdline” file that contains the complete command line used to execute the process. The command line arguments are separated by nulls instead of white space, and the ‘cmdline’ file does not end with a new-line, so it is necessary to replace all nulls with spaces (or tabs, or whatever you want) in order to display them. This can be done with a simple sed filter (note the ‘g’ flag, which makes sed replace every null rather than just the first). Here are examples of the command line both raw and processed by sed:

peniwize@host:~$ cat /proc/self/cmdline && printf "\n"
cat/proc/self/cmdline
peniwize@host:~$ cat /proc/self/cmdline | sed 's/\x0/ /g' && printf "\n"
cat /proc/self/cmdline

Notice that there is no space (fourth character) between “cat” and “/proc…” in the output of the first command but there is in the output of the second. Note that most modern Linux kernels are configured to create a directory named “self” in the “/proc” file-system. “self” is an alias for the PID of the currently running process, i.e. a process can easily access “/proc/<PID>/…” by accessing “/proc/self/…”.
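An equivalent filter (a sketch, assuming a Linux /proc file-system) uses tr, which replaces every null in the stream in one pass:

```shell
# Replace each null ('\0') in /proc/self/cmdline with a space.
cmdline="$(tr '\0' ' ' < /proc/self/cmdline)"
echo "$cmdline"
```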

Simple bash script to display the command line for any process by PID or by name

# Argument 1 ($1) is the PID (or name) of the process whose command line should be shown.
if [[ "$1" =~ ^[0-9]+$ ]]; then
    PID="$1"
else
    PID="$(pidof -s "$1")" # -s: return at most one PID.
fi
if [[ -z "$PID" ]]; then
    printf "Unable to resolve process: '%s'!\n" "$1"
    exit 1
fi
cat "/proc/${PID}/cmdline" | sed 's/\x0/ /g' && printf "\n"


How to clone a drive over the network in Linux

In my embedded work I often need to clone disk drives, USB memory sticks, and miscellaneous other types of mass storage devices.  Here is a simple and easy way to clone/copy the primary hard disk of a source machine onto the primary hard disk of a destination without taking anything apart or using tertiary storage.  Of course, you may need to update drivers and system configuration on the destination machine after copying the source machine to it (if the hardware in each machine is not identical).

Assuming that your destination machine disk drive is as large as your source machine disk drive (or larger), you can copy the contents of the source machine drive directly to the target machine using the netcat (‘nc’) application. Since netcat uses the network, you may encounter problems if you’re running a firewall on either machine. If you are then either temporarily disable them or reconfigure them to allow traffic on whichever port you tell netcat to use. I typically use port 12345. For example, I needed to copy/clone the contents of the primary hard disk so I booted each machine with a live Linux CD and executed the following commands:

[Target Machine (note its IP address; referred to below as <target_ip>)]

# nc -l 12345 | gunzip - > /dev/sda

[Source Machine]

# cat /dev/sda | gzip - | nc <target_ip> 12345

And then I waited. This resulted in about 8MB/sec throughput (on average) over my 10MB/sec LAN, which is not bad. Saturation would be 10MB/sec so gzip was definitely improving performance since it was compressing at least 2MB/sec. It took a while to transfer, but I didn’t have to use a tertiary mass storage device and I didn’t have to worry about copying non-regular “files” such as device nodes and fifo’s/pipes, which would normally require the use of tar or rar. I just started the process and let it run while I did other work.  I suspect it ran for a couple of hours, but I wasn’t keeping track so I don’t know for sure.

This method of cloning a drive completely ignores drive geometry and file systems so it will not work on all machine configurations, but it’s likely to work for most and it’s an easy set and walk away type of solution to an otherwise complex problem.

After using this method to clone the source machine, I used the Linux LVM utilities to resize the partitions and file system on the destination machine so no space was wasted.  This can also be done to NTFS partitions using a Linux live CD.


How to see how much free memory is available in Linux

The amount of available memory is equal to the amount of free memory + the amount of cached memory. Keep in mind that RAM disks are stored in the cache and memory used by RAM disk content cannot be reclaimed by the system until that content is deleted from the RAM disk (maybe). The last documentation I read about RAM disks stated that they only grow. They never shrink. This may be different now in newer kernels.

The following command can be used to see how much RAM is available for use:

cat /proc/meminfo | \
awk '{ \
if ($0 ~ /^MemTotal:/) { TotalMemory = $2; } \
if ($0 ~ /^Cached:/) { CachedMemory = $2; } \
if ($0 ~ /^MemFree/) { FreeMemory = $2; } \
} \
END { print \
"Available memory = " (TotalMemory - CachedMemory) / 1024 \
"MB, Free memory = " (FreeMemory + CachedMemory) / 1024 "MB" \
}'

I’m not sure which value in the report is more accurate: “Available memory” or “Free memory”, but I think it’s the latter because it takes miscellaneous allocations into consideration. Here is an example of the output on my workstation, which contains 2GB of system RAM:
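As a cross-check (a sketch that assumes the procps ‘free’ utility is installed), the same /proc/meminfo counters can be read with free:

```shell
# 'free' reports the same counters as /proc/meminfo, in kB;
# column two is total memory and column four is free memory.
free | awk '/^Mem:/ { print "total:", $2, "kB  free:", $4, "kB" }'
```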

user@host:~$ echo $SHELL
/bin/bash

user@host:~$ cat /proc/meminfo | \
> awk '{ \
> if ($0 ~ /^MemTotal:/) { TotalMemory = $2; } \
> if ($0 ~ /^Cached:/) { CachedMemory = $2; } \
> if ($0 ~ /^MemFree/) { FreeMemory = $2; } \
> } \
> END { print \
> "Available memory = " (TotalMemory - CachedMemory) / 1024 \
> "MB, Free Memory = " (FreeMemory + CachedMemory) / 1024 "MB" \
> }'
Available memory = 1560.73MB, Free Memory = 1525.25MB

user@host:~$ cat /proc/meminfo
MemTotal:        2059248 kB
MemFree:         1105060 kB
Buffers:           84752 kB
Cached:           460936 kB
SwapCached:            0 kB
Active:           511048 kB
Inactive:         350320 kB
Active(anon):     315976 kB
Inactive(anon):     3380 kB
Active(file):     195072 kB
Inactive(file):   346940 kB
Unevictable:          16 kB
Mlocked:              16 kB
HighTotal:       1186160 kB
HighFree:         389832 kB
LowTotal:         873088 kB
LowFree:          715228 kB
SwapTotal:       1952764 kB
SwapFree:        1952764 kB
Dirty:               132 kB
Writeback:             0 kB
AnonPages:        315692 kB
Mapped:           103636 kB
Shmem:              3680 kB
Slab:              45576 kB
SReclaimable:      35152 kB
SUnreclaim:        10424 kB
KernelStack:        2304 kB
PageTables:         5476 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2982388 kB
Committed_AS:    1243316 kB
VmallocTotal:     122880 kB
VmallocUsed:       63084 kB
VmallocChunk:      40444 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       36856 kB
DirectMap4M:      872448 kB
