Thursday, March 18, 2021

Installing AWS CLI on Solaris 10

A (former) colleague needed to transfer an Oracle Data Pump file to Amazon S3 (so that Amazon RDS for Oracle could import it), but the origination server was running Oracle Solaris, and the customer needed the AWS CLI on that server. The AWS documentation has no installation steps for the AWS CLI on Solaris, so I decided to write up this short note.

The test system I used for validation was a SunFire V100 (a circa-2003 Sun SPARC machine with a single-core, single-socket UltraSPARC IIi clocked at 650MHz, probably slower than a Raspberry Pi). This test system was running Oracle Solaris 10 update 11 with the CPU OS Patchset 2018/01 Solaris 10 SPARC.

The output of uname -a is:

SunOS v100.local 5.10 Generic_150400-59 sun4u sparc SUNW,UltraAX-i2

The first required step is to install a later version of Python (even with the 2018/01 patchset, Solaris 10 comes with Python 2.6.4 which is far too old). Note that this Python package is from OpenCSW Solaris Packages, which is a third party. Enterprise customers would need to have their security teams validate whether OpenCSW is a permissible source of packages.

pkgadd -d

/opt/csw/bin/pkgutil -U

/opt/csw/bin/pkgutil -y -i python27

Once Python 2.7 is installed, you can download AWS CLI version 1. Note that AWS CLI support for Python 2.7 is ending very soon (mid-2021 at the time of this writing), so a specific AWS CLI version that still supports Python 2.7 must be used. Also note that we have to pass a particular switch to wget, because the version of wget that ships with Solaris 10 is very old and does not recognize the SSL certificate presented by Amazon S3.

wget --no-check-certificate ""


After downloading, unzip the bundle and install the AWS CLI, making sure the right version of Python is used, i.e. it must be ahead of the system Python in the path.

unzip awscli-bundle.zip

export PATH=/opt/csw/bin:$PATH

./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

After the installer completes, the AWS CLI is successfully installed, and you can configure credentials as appropriate.

Friday, October 25, 2019

Intel and AMD Processor Micro-Benchmarking

There are a large number of synthetic CPU benchmarks available - for example, GeekBench, JetStream, SPEC. The utility of these benchmarks for whole-system performance is debatable. Then we have benchmarks that attempt to measure whole-system performance; for example the time-honored Linux kernel compilation, and elaborate benchmarks such as SAP Sales and Distribution (SAP SD), otherwise known as the famous "SAPS rating."

Here I am attempting to measure some degree of whole-system performance by using ffmpeg to transcode Big Buck Bunny. This is a CPU-bound (more correctly, FPU-bound) benchmark with some memory and I/O load due to the very large size of the movie. I've used a statically-linked binary that is not specially optimized for particular processor features or GPUs (ffmpeg can greatly speed up transcoding on Nvidia GPUs).

Here are the necessary steps to replicate my results (these are for Linux; on MacOS, I used the ffmpeg distribution from brew but the steps are otherwise identical):


tar xf ffmpeg-release-amd64-static.tar.xz


for i in 1 2 3; do
  rm -f output.mp4
  time ffmpeg-*-amd64-static/ffmpeg -threads 2 -loglevel panic -i bbb_sunflower_1080p_60fps_normal.mp4 -vcodec h264 -acodec aac -strict -2 -crf 26 output.mp4 >>out.txt 2>&1
done

Note that we are limiting the number of threads that FFMPEG can use to 2, which allows it to use only 2 cores. On a 4-core (or more) machine, the encoding results are much better, but since many of my data points are from 2-core machines, we have to limit the number of threads to 2 in order to have an apples-to-apples comparison.

Note that on a 2-core hyper-threaded system, "in theory" 4 threads is ideal; however, hyper-threading is really only relevant for I/O-bound workloads, and since FFMPEG is CPU-bound, a thread limit of 2 is more appropriate.
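Each configuration was timed three times; the `real` times can be averaged from a timing log with awk. A sketch, using hypothetical sample timings (the exact log format depends on the shell's time builtin, so the file contents here are an assumption for illustration):

```shell
# Hypothetical sample of three `time` outputs (real wall-clock times)
cat > timings.txt <<'EOF'
real    14m4.332s
real    13m58.104s
real    14m10.220s
EOF

# Convert each "XmY.Zs" real time to seconds and print the average
awk '/^real/ {
  split($2, t, /[ms]/)   # t[1] = minutes, t[2] = seconds
  sum += t[1] * 60 + t[2]
  n++
}
END { printf "average: %.1f s\n", sum / n }' timings.txt
```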

We can see from this simple test that, for the MacOS trials:
  • there is a 37% performance improvement from Sandy Bridge to Broadwell (3 generations)
  • 17% improvement from Broadwell to Kaby Lake (2 generations)
Over 5 generations there is a cumulative improvement of 48%.

For the AWS M instance family:
  • 13% from Sandy Bridge (m1) to Ivy Bridge (m3) (1 generation)
  • 14% from Ivy Bridge (m3) to Broadwell (m4) (2 generations)
  • 13% from Broadwell (m4) to Skylake (m5) (1 generation)
Over 4 generations there is a cumulative improvement of 35%.

For the AWS C instance family:
  • 28% from Ivy Bridge EP to Haswell (1 generation)
  • 12% from Haswell to Skylake (2 generations)
Over 3 generations there is a cumulative improvement of 37% - but this is also partially due to differing clock speeds.
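The cumulative figures above compound multiplicatively as reductions in encoding time, rather than adding; the arithmetic can be checked with awk (the per-step percentages are taken directly from the lists above):

```shell
# Cumulative improvement = 1 - product of (1 - per-step time reduction)
awk 'BEGIN {
  printf "MacOS: %.0f%%\n", (1 - (1-0.37)*(1-0.17)) * 100
  printf "AWS M: %.0f%%\n", (1 - (1-0.13)*(1-0.14)*(1-0.13)) * 100
  printf "AWS C: %.0f%%\n", (1 - (1-0.28)*(1-0.12)) * 100
}'
```

This reproduces the 48%, 35%, and 37% cumulative figures.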

We normally would consider a benchmark such as SAPS to be a rigorous, whole-system benchmark because SAPS measures order line items per hour (an application metric) across infrastructure (CPU, memory, I/O), operating system, Java virtual machine, database, and ERP application. But it very much seems that SAPS is essentially a CPU benchmark.

Consider the following:
  • SAP certification #2015005 from 2015-03-10 (AWS c4.4xlarge, 8 cores / 16 threads) - 19,030 SAPS or 2,379 SAPS/core
  • SAP certification #2015006 from 2015-03-10 (AWS c4.8xlarge, 18 cores / 36 threads) - 37,950 SAPS or 2,108 SAPS/core
Here we observe almost linear scaling - as the number of cores/threads is increased from 8 to 18 (2.25X) the SAPS increases from 19,030 to 37,950 (1.99X).
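The per-core and scaling figures can be reproduced from the published SAPS numbers:

```shell
# Per-core SAPS and scaling ratios for the two C4 certifications
awk 'BEGIN {
  printf "c4.4xlarge: %.0f SAPS/core\n", 19030 / 8
  printf "c4.8xlarge: %.0f SAPS/core\n", 37950 / 18
  printf "cores: %.2fX, SAPS: %.2fX\n", 18 / 8, 37950 / 19030
}'
```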

If we consider the SAPS results for the previous-generation AWS C3 instance family:
  • SAP certification #2014041 from 2014-10-27 (AWS c3.8xlarge, 16 cores / 32 threads) - 31,830 SAPS or 1,989 SAPS/core
The C3 result is about 6% lower than the c4.8xlarge on a per-core basis. Recalling the naive Big Buck Bunny transcoding benchmark, the C4 is about 12% faster than the C3. Thus it appears that SAPS is not purely a CPU benchmark (nor should it be) but is strongly CPU-dominated: at least half of the SAPS improvement is directly attributable to CPU performance.

Naively concluding, there appears to be (on average) around a 10% performance improvement across Intel CPU generations (across tick and tock). At that rate, CPU performance doubles roughly every 7.3 years (87 months) - a far cry from Moore's Law, which optimistically predicted 18 months.
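The doubling-time arithmetic follows from compounding 10% per generation (assuming roughly one generation per year):

```shell
# Doubling time at 10% improvement per year: ln(2) / ln(1.1)
awk 'BEGIN {
  yrs = log(2) / log(1.1)
  printf "%.1f years (%.0f months)\n", yrs, yrs * 12
}'
```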

Thursday, December 13, 2018

AWS Z1D, M5A (AMD EPYC) Performance Comparison

This is the time-honored Linux kernel compilation benchmark.

In order to prepare a stock Amazon Linux 2 installation to compile the kernel, the following packages need to be installed:

sudo yum update
sudo yum install kernel-devel
sudo yum install ncurses-devel
sudo yum install bison
sudo yum install flex
sudo yum install openssl-devel

Preparation was simply extracting the 4.20-rc3 tarball, running "make menuconfig" and immediately saving the config (no changes), and then running "make -j 4".

This was on Amazon Linux 2, with a 40GB GP2 EBS block storage volume.

AMD EPYC m5a.xlarge:

time tar zxf linux-4.20-rc3.tar.gz

real 0m6.109s
user 0m5.928s
sys 0m2.838s

And the kernel build:

time make -j 4

real 18m31.421s
user 66m52.071s
sys 5m54.968s

Intel Xeon Platinum m5.xlarge:

time tar zxf linux-4.20-rc3.tar.gz 

real 0m4.693s
user 0m4.688s
sys 0m1.767s

the kernel build:

time make -j 4

real 14m4.332s
user 49m12.569s
sys 5m44.682s

So there we have it: on a single run of the kernel compilation, the Intel instance completed the build in about 24% less wall-clock time.

Update 13-Dec-2018. The new Z1D instance is supposed to be significantly faster than M5/C5/R5 due to sustained 4 GHz Turbo Boost on all cores.

time tar zxf linux-4.20-rc3.tar.gz 

real 0m3.648s
user 0m3.623s
sys 0m1.438s

z1d.xlarge (also 4 vCPU) is 22% faster than m5.xlarge at uncompressing the kernel.

the kernel build:

time make -j 4

real 11m25.560s
user 39m35.242s
sys 5m2.343s

The build is also 23% faster. So it looks like for general-purpose workloads (which are probably partly I/O-bound), the Z1D provides only about a 20% performance uplift.
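The comparison figures can be checked from the real times above (note the two numbers use slightly different framings: the build figure is a throughput gain, the tar figure is a reduction in wall-clock time):

```shell
# z1d.xlarge vs m5.xlarge, from the `time` outputs above (seconds)
awk 'BEGIN {
  m5_build  = 14*60 + 4.332    # m5 kernel build, real
  z1d_build = 11*60 + 25.560   # z1d kernel build, real
  printf "build speedup: %.0f%%\n", (m5_build / z1d_build - 1) * 100
  printf "tar time reduction: %.0f%%\n", (1 - 3.648 / 4.693) * 100
}'
```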

Wednesday, November 8, 2017

Getting Oracle Linux on Amazon EC2

Edit 08-Nov-2017:

As per Oracle's instructions, adding the GPG key is no longer needed. The bad news is that the procedure documented by Oracle throws a "Broken pipe" error and leaves the OS in an unusable state (all Yum configuration disappears).

Here is how to fix it (basically, just nohup the script).

Start with the "CentOS 6 (x86_64) - with Updates HVM" image on the AWS Marketplace. After logging in (as the centos user), run the following commands as root:

curl -O
nohup sh &

The script does some package updating that cuts the SSH connection to the EC2 instance; when the connection is cut, the script dies mid-run with a "Broken pipe" and leaves the OS in an unusable state. By using nohup we avoid the broken-pipe issue.

After the conversion script completes, run the following:

yum distro-sync

Step 4 will no longer be required, and you can proceed to install the Oracle software prerequisites.

Old Information Below:

If you feel the need to roll your own Oracle Linux install on Amazon EC2 (since Oracle no longer provides an officially-supported AMI, and you may not be too keen on using one of the community AMI's):

(1) launch an EC2 instance using the CentOS 6.5 AMI from the Marketplace (which is free), then log in as root (not ec2-user)

(2) import the Oracle GPG key
cd /etc/pki/rpm-gpg/
curl -O

(3) use the Oracle-specific rules to convert the CentOS 6 system to OL, from this document. Specifically, run the following commands as root:
curl -O

(4) Synchronize the yum repository
yum -y upgrade

Once the command completes, the end-user should have a fully-patched Oracle Linux 6.7.

If you intend to install Oracle Database 11gR2, you can apply all the necessary packages and kernel parameters with this command:
yum install oracle-rdbms-server-11gR2-preinstall -y

and if you will install Oracle Database 12c R1, use the following:
yum install oracle-rdbms-server-12cR1-preinstall -y