`sort` for Testers

Apr 25, 2025 (Apr 4, 2025)

Sometimes you're using sort. Sometimes you're testing it.

Machines love to iterate: if they expect to be walking through sorted data, they can be thrown by something out of place.

sort as a tool

sort sorts stuff, which seems handy on its own – but organising information opens other doors too.

  • comparing two sets of data is relatively trivial if they're sorted (in the same way), painful (and slow) if not.
  • finding unique values in data is trivial if the data is sorted, harder if not.
  • unique values can be aggregated into counts, sums and statistics
  • knowing the unique values lets one look at boundaries, and see outliers
  • putting ranges in order lets one see where change is smooth, and where it is lumpy
  • sorting can reveal ways that values across several fields go together.
  • It's easy to pick out not only the smallest and largest, but the several smallest or largest.
  • One can prioritise processing by oldest or newest, furthest or closest, largest or smallest, or by any expressible value. You might lump together similar things, or choose to work in a way that distributes them throughout a job.
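A few of the points above can be sketched in shell. This is a minimal illustration, not a recipe: the account IDs and numbers are made-up sample data, and comm is used here as one common way to compare sorted lists.

```shell
# Hypothetical sample data: two unsorted lists of account IDs.
old=$(printf '%s\n' acc3 acc1 acc2)
new=$(printf '%s\n' acc2 acc4 acc3)

tmp=$(mktemp -d)
# Sort both lists the same way (same locale) before comparing them.
printf '%s\n' "$old" | LC_ALL=C sort > "$tmp/old.sorted"
printf '%s\n' "$new" | LC_ALL=C sort > "$tmp/new.sorted"

# comm expects sorted input. -13 keeps lines found only in the second file,
# -23 keeps lines found only in the first.
added=$(LC_ALL=C comm -13 "$tmp/old.sorted" "$tmp/new.sorted")
removed=$(LC_ALL=C comm -23 "$tmp/old.sorted" "$tmp/new.sorted")
echo "added: $added"
echo "removed: $removed"

# The several largest, not just the largest: sort numerically, take the tail.
top3=$(printf '%s\n' 7 42 3 19 8 | sort -n | tail -n 3)
echo "three largest:"
echo "$top3"

rm -r "$tmp"
```

If the two lists were sorted under different locales, comm could report spurious differences, which is why both sorts and the comparison pin LC_ALL=C here.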

sort as a target

As a tester, you hardly ever test whether a sort sorts stuff; the algorithms are well understood and typically not implemented in code that one can change. Indeed, they're often deep enough that testers hardly think of them – imagine file search, databases or screen rendering.

You certainly need to test the choice of sort and whether the data suits it. You'll test the performance of sort, looking for inflexions that indicate system constraints or novel problems with data distribution. You'll test the knock-on effects of sort.

Sort organises stuff: organised stuff lets you analyse and aggregate.

Got a directory listing from ls -l and want the most recent? sort -k6M -k7n orders it by month, then by day. Want the largest? With human-readable sizes (ls -lh), sort -k5hr puts it first.
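Here is a sketch of those two commands run against a fixed, made-up listing, so the result does not depend on any real directory. It assumes the usual ls -l layout (size in field 5, month in field 6, day in field 7) and a sort that supports the -M and -h options, as GNU sort does.

```shell
# Hypothetical listing laid out like `ls -lh` output.
# Real listings can differ by platform and locale.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 u g 4.0K Mar  3 10:00 b.txt' \
  '-rw-r--r-- 1 u g 1.2M Jan 15 09:30 a.log' \
  '-rw-r--r-- 1 u g  512 Mar 12 08:00 c.csv')

# Month order first (-k6M), then day as a number (-k7n): most recent is last.
by_date=$(printf '%s\n' "$listing" | LC_ALL=C sort -k6M -k7n)
newest=$(printf '%s\n' "$by_date" | tail -n 1 | awk '{print $NF}')
echo "newest: $newest"

# Human-readable sizes (-h), reversed (-r): largest is first.
by_size=$(printf '%s\n' "$listing" | LC_ALL=C sort -k5hr)
largest=$(printf '%s\n' "$by_size" | head -n 1 | awk '{print $NF}')
echo "largest: $largest"
```

Note that the month-and-day sort ignores the year, so it only finds the most recent entry when everything in the listing falls within the same year.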

Got several logs to look through?
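One possible approach, sketched under assumptions: if each log is already in time order and every line starts with a sortable timestamp (ISO 8601 in this made-up example), sort -m merges the files into one timeline without re-sorting them. The file names and log lines are hypothetical.

```shell
tmp=$(mktemp -d)

# Two hypothetical logs, each already in time order.
printf '%s\n' \
  '2025-04-01T10:00:00 web start' \
  '2025-04-01T10:05:00 web ready' > "$tmp/web.log"
printf '%s\n' \
  '2025-04-01T10:02:00 db start' \
  '2025-04-01T10:07:00 db ready' > "$tmp/db.log"

# -m merges inputs that are already sorted; it does not sort them.
merged=$(LC_ALL=C sort -m "$tmp/web.log" "$tmp/db.log")
echo "$merged"

rm -r "$tmp"
```

If the timestamps are in a format that does not sort as text (day-first dates, say), the merge will quietly produce the wrong order.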

Want to see the unique accountIDs?

You can sort to see the unique values
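A minimal sketch, assuming log lines that end with an accountID=... field (the format and the values are invented for illustration): sort -u lists each value once, and sort piped into uniq -c counts them.

```shell
# Hypothetical log lines.
log=$(printf '%s\n' \
  'login accountID=1002' \
  'login accountID=1001' \
  'logout accountID=1002' \
  'login accountID=1003' \
  'login accountID=1002')

# Strip everything up to the ID, then keep one copy of each value.
ids=$(printf '%s\n' "$log" | sed 's/.*accountID=//' | LC_ALL=C sort -u)
echo "unique IDs:"
echo "$ids"

# uniq -c only counts adjacent duplicates, so sort first.
# A second, numeric, reversed sort puts the busiest account on top.
counts=$(printf '%s\n' "$log" | sed 's/.*accountID=//' | LC_ALL=C sort | uniq -c | sort -rn)
busiest=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $2}')
echo "busiest: $busiest"
```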

Handy to know

Sort doesn't necessarily expect the same things as you. It has options to help cope.

So in sort -k6M -k7n above,

Weirdnesses

Unix sort defaults to character sort – give it numbers 1-100, and 10 and 100 will follow 1, and 11 will be next. We've all seen this in address sorting. I saw a phone company bill calls wrongly because they pulled batches of files off the top of a list. That was fine when they had a few files, but when the switches wrote double- or triple-digit names, the picker left the older files in place – some until they were so old that the rater rejected the records.
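The difference is easy to see with a handful of numbers. This sketch pins the locale to C so the character order is predictable, and uses -n for the numeric comparison.

```shell
# Default: compare character by character, so "10" sorts before "2".
char_order=$(printf '%s\n' 1 2 10 11 100 | LC_ALL=C sort | paste -sd ' ' -)
echo "character sort: $char_order"   # 1 10 100 11 2

# -n: compare as numbers.
num_order=$(printf '%s\n' 1 2 10 11 100 | LC_ALL=C sort -n | paste -sd ' ' -)
echo "numeric sort:   $num_order"    # 1 2 10 11 100
```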

Where two things are sorted by the same method, they can be compared – but that doesn't mean that the sorted data is in the right order for use. I've seen an accounting system happily and correctly match two collections for years. When one of the collections needed to be used for something else, it was clear that it was in the wrong order – and was unusable in its

Sort doesn't sort in the order you expect. Apparently, if

the environment has LANG set to en_US.UTF-8... sort appears broken because case is folded and punctuation is ignored because ‘en_US.UTF-8’ specifies this behavior

Demonstration to follow.
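A minimal sketch of the effect, assuming a system where the en_US.UTF-8 locale may or may not be installed. Setting LC_ALL=C (which overrides LANG) gives plain byte order, with capitals ahead of lower case. The second sort only runs if the locale is available, and its output is left for you to compare rather than stated here, because it depends on the system's locale data.

```shell
words=$(printf '%s\n' b A a B)

# Byte order: all capitals come before all lower-case letters.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort | paste -sd ' ' -)
echo "C locale:     $c_order"   # A B a b

# Same data under en_US.UTF-8, if this machine has it.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
  en_order=$(printf '%s\n' "$words" | LC_ALL=en_US.UTF-8 sort | paste -sd ' ' -)
  echo "en_US.UTF-8:  $en_order"
fi
```

When a script depends on a particular order, setting LC_ALL explicitly on the sort command is one way to stop the result changing with the user's environment.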

sort -R doesn't shuffle the way you might expect: it sorts by a hash of each line, so identical lines always end up next to each other. The hash is chosen afresh on each run, so the order doesn't repeat, and I don't understand randomness clearly enough to demonstrate. I understand that you should use shuf instead – if you've got it.
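One property can be checked without any theory of randomness: because sort -R orders lines by hash, duplicate lines stay adjacent. This sketch assumes a sort that supports -R (GNU sort does) and uses made-up input. uniq only collapses adjacent duplicates, so it makes the grouping visible.

```shell
# Six lines, three distinct values, each appearing twice.
lines=$(printf '%s\n' a b c a b c)

# After sort -R the overall order varies from run to run, but equal lines sit
# together, so uniq always finds exactly three groups.
grouped=$(printf '%s\n' "$lines" | sort -R | uniq | wc -l | tr -d ' ')
echo "groups after sort -R: $grouped"

# shuf, where available, permutes lines without grouping duplicates,
# so the same check can report anything from 3 to 6 groups.
if command -v shuf >/dev/null 2>&1; then
  shuffled=$(printf '%s\n' "$lines" | shuf | uniq | wc -l | tr -d ' ')
  echo "groups after shuf:    $shuffled"
fi
```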

James Lyndsay

Getting better at software testing. Singing in Bulgarian. Staying in. Going out. Listening. Talking. Writing. Making.
