vpalos.com // milk, cookies, segfaults…

URI parsing using Bash built-in features

by Valeriu Paloş on February 3, 2010

A bit of background

A while ago I posted an article describing how one could parse complete URIs in Bash using the sed program. Since then, I have realized that there is a better way to do it, a much better way: via Bash built-in pattern matching.
Here are some benefits of this improvement:

  • It no longer executes external programs (i.e. sed) for pattern matching. This translates to higher speed and lower memory and CPU usage, which means that you could use this parser for much more intense URI crunching.
  • The new regular expressions are drastically simplified thanks to the ${BASH_REMATCH[*]} array, which can hold more than 9 matched sub-expressions, unlike sed, which can only work with single-digit back-references: \1-\9 (yuck!).
  • The parsing algorithm is contained in a single Bash function, so no external file is needed to hold the regular expressions. This also means, obviously, that the pattern file is no longer loaded from disk on every execution (so disk access is saved as well).
  • The generated variables are named identically to the first version, so you should be able to upgrade your scripts to this version with absolutely minimal effort.
  • [Edit]
    No eval instruction is needed (unlike in the first version), further improving performance.
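
To illustrate the kind of matching these points refer to, here is a minimal sketch of Bash's `[[ string =~ regex ]]` operator filling the BASH_REMATCH array. It is not the parser from the article: the function name, the variable names and the simplified expression are all assumptions made for this example, and it deliberately rejects URIs with a user:password part instead of handling them.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified sketch (NOT the article's parser).
# Splits scheme://host[:port][/path][?query][#fragment] using only Bash built-ins.
parse_uri_sketch() {
    # Groups: 1=scheme 2=host 4=port 5=path 7=query 9=fragment
    local re='^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/:?#]+)(:([0-9]+))?(/[^?#]*)?(\?([^#]*))?(#(.*))?$'

    # No external program is started; a failed match means an invalid URI.
    [[ $1 =~ $re ]] || return 1

    # Plain assignments from the match array, so no eval is required.
    uri_scheme=${BASH_REMATCH[1]}
    uri_host=${BASH_REMATCH[2]}
    uri_port=${BASH_REMATCH[4]}
    uri_path=${BASH_REMATCH[5]}
    uri_query=${BASH_REMATCH[7]}
    uri_fragment=${BASH_REMATCH[9]}
}

if parse_uri_sketch 'http://www.example.com:8080/dir1/file.php?x=1#top'; then
    echo "host=$uri_host port=$uri_port path=$uri_path"
else
    echo "Invalid URI!"
fi
```

Keeping the expression in a variable and expanding it unquoted on the right-hand side of `=~` is what makes Bash treat it as a regular expression rather than a literal string.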

(more →)

First impressions on Loomiere/Stream performance

by Valeriu Paloş on February 2, 2010

UPDATE: http://vpalos.com/1165/loomiere-2-0-1-beta-finally-out/

As promised, here are some of the first monitoring statistics of Loomiere/Stream in a production environment after moving away from psstream. Only one server is considered, a Quad-core Xeon with 8GB RAM (not that they are used anymore).

This shows the memory usage over one week (the switch was made on the 29th as is obvious).

(more →)

Loomiere/Stream – A high performance streaming server

by Valeriu Paloş on January 30, 2010

UPDATE: http://vpalos.com/1165/loomiere-2-0-1-beta-finally-out/

The Loomiere (0.2.1) code is now freely available under GPLv3.
Please see this post for an update.

Are you killing psstream?

Well, yes! I am sure that many of you already know about psstream (the PHP streaming extension I made a while back). Well, many things happened since then and I came to realize I could do better; a whole lot better actually. As of now the ‘psstream’ project is officially no longer developed (see below). It will remain on the website for some time to come for archiving purposes but that is it.

But wait, why did you do it?

For some time now I have been looking into improving the streaming mechanism for a large video-sharing project run by my company. PSStream was a first effort, and it did the job but soon ran into problems. None of our servers was able to properly stream more than 150 clients simultaneously and the resources were grossly wasted; hence, Loomiere/Stream. (more →)

Recursive chmod distinguishing files from folders

by Valeriu Paloş on December 16, 2009

Version 3

An even better method is:

find "$target" -type f -exec chmod -c "$mode_files" {} \; \
     -or -type d -exec chmod -c "$mode_dir" {} \;

A true one-liner! 😀
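
The one-liner above still starts one chmod process per file or directory. A possible variation, sketched here under assumptions, ends each -exec with `+` instead of `\;` so that find passes many paths to each chmod call. The function name deep_chmod_batched is hypothetical, and the -c flag is dropped because the output is not needed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (name is an assumption, not from the article).
# Params: target mode_dir [mode_files]
# If mode_files is missing, mode_dir is applied to files too.
deep_chmod_batched() {
    local target=$1 mode_dir=$2 mode_files=${3:-$2}

    # "{} +" makes find collect many paths and hand them to a single chmod,
    # instead of starting one chmod per path as "{} \;" does.
    find "$target" \( -type f -exec chmod "$mode_files" {} + \) \
                -o \( -type d -exec chmod "$mode_dir" {} + \)
}
```

It would be called as `deep_chmod_batched "$target" 0755 0644`. As with the original, the directory mode should keep the execute bit for the owner, otherwise find cannot descend into directories that were already changed.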

Version 2

A better method is this:

find "$target" -type f -exec chmod -c "$mode_files" {} \;
find "$target" -type d -exec chmod -c "$mode_dir" {} \;

This one can also be used from the command line.

Version 1

Many times I needed to apply certain permissions recursively on a given path but with different permissions on files than on directories (e.g. I want 0644 for files and 0744 for directories). This behaviour is not provided by the chmod tool so here is a simple and effective bash function to do just that:

# Recursively apply chmod to path.
# If mode_files is missing then apply mode_dir to files too.
# Params: target mode_dir [mode_files]
function deep_chmod() {
    function _walk() {
        # Split find's output on newlines only, so names with spaces survive.
        local F M IFS=$'\n'
        for F in $(find "$1"); do
            # Files get mode_files ($3), directories get mode_dir ($2).
            M="$3"; [[ $(file -b "$F") == "directory" ]] && M="$2"
            chmod -c "$M" "$F" > /dev/null
        done
    }
    if [[ $# -gt 2 ]]; then
        _walk "$1" "$2" "$3"
    else
        chmod -Rc "$2" "$1"
    fi
}

I’m looking for a way to improve on this since it is quite costly for large directories: for each file or directory at least two programs are executed (file and chmod) which is not very efficient! For now, it gets the job done.

Enjoy! :)

Bash URI parser using SED

by Valeriu Paloş on November 16, 2009

Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!

Here is a command-line (bash) script that uses sed to split the segments of a URI into usable variables. It also validates the given URI, since malformed strings produce the text “ERROR”, which can be handled accordingly:

# Assembling a sample URI (including an injection attack)
uri_2='?param=some_value&array[0]=123&param2=\`cat /etc/passwd\`'

# Parse URI
op=`echo "$uri" | sed -nrf "uri.sed"`

# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }

# Execute assignments
eval "$op"

# ...work with URI components...

Notice the "uri.sed" file given to sed?
(more →)

Recursive file/directory change-detection

by Valeriu Paloş on November 6, 2009


This article explores a way in which an approximate “fingerprint” of a file tree can be created! If all you want is to detect file changes, a much more appropriate method would be to use inotify/incron.

Version 2 (update)

Another, much faster method would be to use ls -lR to browse over the filesystem. On a newly installed Debian virtual machine (on Xen) hashing the entire filesystem (the root directory) took approximately 1.7 seconds. So, here it is:

ls -lR "$D" | sha1sum | sed 's/[ -]//g'

This method is sensitive to file name, size and modification time; usually that would be enough but if you need more control use…

Version 1

Detect when the contents of a file or directory ($D) change:

find "$D" | while read f; do stat -t "$f"; done | sha1sum | sed 's/[ -]//g'

This yields a hash of the current state of the file or directory which is extremely sensitive to even the most subtle changes (even a simple touch to any file/directory somewhere inside "$D" changes the generated hash).
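
Either pipeline can be wrapped in a small helper that remembers the previous hash and reports whether it changed. The sketch below is only an illustration: the function names and the idea of a state file are assumptions, and it uses the ls -lR pipeline from Version 2. The Version 1 pipeline could be swapped in, keeping in mind that the terse output of GNU stat also includes access times, so merely reading the tree can alter that hash.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names and the state file are assumptions).

# Print a fingerprint of the tree rooted at $1 (the ls -lR pipeline).
tree_fingerprint() {
    ls -lR "$1" | sha1sum | sed 's/[ -]//g'
}

# Succeed (status 0) if the tree at $1 differs from the fingerprint stored
# in file $2, or if nothing was stored yet. Always stores the new fingerprint.
# The state file must live outside the tree being watched.
tree_changed() {
    local new old=''
    new=$(tree_fingerprint "$1")
    [[ -f $2 ]] && old=$(<"$2")
    printf '%s\n' "$new" > "$2"
    [[ $new != "$old" ]]
}
```

A caller could then write `tree_changed "$D" "$state_file" && echo "something changed"`, where state_file is any writable path chosen by the caller.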

(more →)

Detect number of CPUs on a machine

by Valeriu Paloş on November 6, 2009

UPDATE: Steven pointed out (very nicely) that there’s no need for cat in this picture, grep would do just fine on its own. So, thanks Steven!

Detect how many CPU cores (as logical processors, so hyper-threaded cores count more than once) are present on the running machine:

grep -c processor /proc/cpuinfo

This can be very useful when writing multi-threaded programs to properly match the number of threads with the number of CPU cores.
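
The same idea carries over to shell scripts that run jobs in parallel. The sketch below is an assumption-laden illustration, not part of the original post: cpu_count is a hypothetical helper that anchors the pattern to the start of the line, falls back to getconf when /proc/cpuinfo gives nothing, and the result is passed to the -P option of xargs (a GNU/BSD extension) to run one worker process per CPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print the number of
# logical CPUs, falling back to 1 if nothing can be detected.
cpu_count() {
    local n
    # One "processor" line per logical CPU on typical Linux systems.
    n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    # Fallback for systems without a usable /proc/cpuinfo.
    (( n > 0 )) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    echo "${n:-1}"
}

# Run at most one job per CPU; each input line becomes one job.
printf '%s\n' a b c d | xargs -P "$(cpu_count)" -n 1 echo processed
```

On systems with GNU coreutils, the nproc command reports a similar number without parsing /proc/cpuinfo.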

(more →)