File Compression Guide

When sending files online, good compression is essential for saving bandwidth. While many people these days have quite fast download speeds, there are even more with speeds below 8 Mbps.

File compression consists of two parts: the archive format and the compression algorithm. Many archive formats support multiple compression algorithms; the most notable example is the .tar archive, where it’s common practice to append the compression type as a suffix, for example .tar.gz or .tar.bz2. Other formats like .zip, .rar and .7z typically specify a preferred compression method.
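
With GNU tar, for instance, the compression is chosen when the archive is created and reflected in the suffix. A quick sketch (the file names are just placeholders):

tar -czf backup.tar.gz documents/    # gzip compression -> .tar.gz
tar -cjf backup.tar.bz2 documents/   # bzip2 compression -> .tar.bz2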

For this article I’m going to be using 7-Zip, which offers a variety of compression algorithms and archive types; it’s also completely free and open source.

Testing

This test will be done on three different types of file: the first is the NVIDIA driver installer (361.91-desktop-win10-64bit-international-whql.exe), the second a PDF book and the third a large plain text file. This matters because the compression ratio depends on the file type; installers, for instance, are typically already compressed, so I expect minimal compression there.

File        Uncompressed Size
Installer   321 MB (337,507,360 bytes)
PDF Book    114 MB (120,225,893 bytes)
Text File   9.13 MB (9,584,473 bytes)

For the first benchmark I will compress each file with LZMA2 using the 7z archive format, which is the default and the recommended choice for 7-Zip. The other options are left at their defaults: compression level normal, dictionary size 16 MB, word size 32, solid block size 2 GB, 2 CPU threads.
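
If you prefer the command line, roughly the same settings can be expressed with 7-Zip’s 7z tool; I ran these tests from the GUI, so the file names here are just placeholders:

# Normal level, 16MB dictionary, word size 32, 2GB solid blocks, 2 threads
7z a -t7z -m0=lzma2 -mx=5 -md=16m -mfb=32 -ms=2g -mmt=2 installer.7z installer.exe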

File        Compressed Size   Compression Ratio   Compression Time
Installer   321 MB            100%                ~43 seconds
PDF         109 MB            95.6%               ~17 seconds
Text        1.40 MB           15.3%               ~4 seconds

As these results show, plain text has by far the best compression ratio, while the installer did not benefit at all; in some cases compression may even increase the size. The PDF saw a reasonable reduction, but this depends on how the PDF itself is compressed.

Now let’s try again but with the compression level set to ultra.

File        Compressed Size   Compression Ratio   Compression Time
Installer   n/a               n/a                 n/a (7-Zip froze)
PDF         107 MB            93.8%               ~26 seconds
Text        1.39 MB           15.2%               ~4 seconds

The results are rather interesting: the installer caused 7-Zip to freeze on ultra, so I was unable to see whether there was any compression. The PDF shows a reasonable gain at the cost of compression time, while the text file remains mostly the same.

Compression level isn’t the only thing you can tweak. Dictionary size can have a major effect on the compression ratio, but it also enormously increases the memory required for compression and decompression. The default 16 MB is rather conservative; ultra defaults to 64 MB, which is much better, and you can get a little more by increasing it further, though above 128 MB the gains are generally minimal.
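
On the command line, ultra with a larger dictionary looks something like this (a sketch; the archive and file names are made up, and compressing at this setting needs several gigabytes of RAM):

# Ultra level with a 128MB dictionary
7z a -t7z -m0=lzma2 -mx=9 -md=128m archive.7z files/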

This test is a little unrealistic, as you will often be compressing many files at once, so let’s try a mix of different file types with an uncompressed size of 132 MB.

Compression          Compressed Size   Compression Ratio   Compression Time
Default              117 MB            88.6%               ~9 seconds
Ultra                90 MB             68.18%              ~24 seconds
Ultra + 128MB dict   89.7 MB           67.95%              ~22 seconds

I was a little surprised that the larger dictionary size actually took less time. It really goes to show that, more than anything else, the types of files determine how far you can compress.

Conclusion

I was expecting more definitive results as to what is better, but as these tests show, it varies on a case-by-case basis. I would certainly recommend you stick to LZMA2, as benchmarks by many people have shown it to be the best in terms of compression ratio, memory use and, for the most part, compression time. Formats like .zip with Deflate (i.e. WinZip) should be avoided these days.

If you really need good compression, the only true way to get it is to test various settings on what you are trying to compress.

For things like video, audio and images, compression isn’t really the answer; using a different format or codec is the way to go, since general-purpose compression can only go so far.

Installing Gentoo Linux Tips

Gentoo is a very popular source-based distribution primarily intended for more experienced Linux users, although it’s really not as hard as people make it seem; you should certainly have experience using Linux and be comfortable using the terminal.

Since it would be a waste of time to repeat the excellent Gentoo handbook, I’m just going to cover the bits that may cause a first-time user trouble. It’s recommended that you do your first install in a virtual machine rather than on a physical machine, so you can get a feel for it.

Prerequisites

  • Around 40GB free disk space for a decent install
  • A reasonably fast CPU
  • Access to the Gentoo handbook throughout the install
  • 1GB of RAM or more

The fast CPU is so you can get it installed in a short amount of time; compiling is an intensive task that can take days on a slower machine. For comparison, a full desktop install took me around 40 hours on a 1.8 GHz Intel Celeron, so I recommend at least a dual-core processor. Of course, if you’re not in a hurry, that’s fine.

It’s important you do your research before proceeding with the install. Some things you really need to know are:

  • What hardware do you have? Use lspci and lsusb or other tools
  • Is your hardware supported by the kernel?

The easiest way to check this is to run a Linux distribution; the Gentoo desktop live CD will work fine for this. If all your hardware works then you’re good to go, although you should note down the loaded modules so you can trim your kernel to just what you need.

Finally it’s a good idea to note down your network configuration, particularly if you’re not used to setting up your network from the terminal.

Base System

For this install I’m going to assume you’re installing x86_64 (64-bit); most of this will apply to x86 (32-bit) as well.

First get the Gentoo minimal install CD from here. Once it’s downloaded you can burn it to a CD or, as I’d recommend, write it to a USB flash drive with UNetbootin, since the image is updated very frequently.

When booting you should be asked to select your keyboard layout; if for some reason you can’t select it, or need to change it later, use loadkeys.

Wireless

Wireless in general is a pain in the ass on Linux, in my opinion, mainly due to highly variable support; it’s often easier to buy a well-supported adapter than to try to get a poorly supported one working.

To connect to a WPA-PSK secured network, as most are these days, you need wpa_supplicant, which is included on the install CD. You first need to make a configuration file for your network:

wpa_passphrase [ssid] [passphrase] > /etc/wpa_supplicant.conf

The SSID is the ID of the network you wish to connect to, and the output is stored for later use. If you don’t know which network is yours, you can use either of the following commands to scan:

iwlist [interface] scan
iw dev [interface] scan

The interface is the name of your wireless interface, which should be displayed if you type iwconfig; if you don’t see anything, it usually means the driver is not loaded or not available.

Once you have your configuration file you can connect to the network with:

wpa_supplicant -i [interface] -c /etc/wpa_supplicant.conf -B -D [driver]

The -B option runs the wpa_supplicant daemon in the background, so you may want to omit it the first time you run it so you can check for errors. For the driver, wext and nl80211 are the most common; nl80211 is preferable if supported.

Once it’s connected, run the DHCP client daemon to auto-configure the network:

dhcpcd

If all goes well your wireless network should now be working.

Wired

A wired Ethernet connection will usually work right away without any configuration. If your network adapter appears when you type ifconfig, you’re generally good to go; run dhcpcd if needed, or perform a manual configuration (check the ifconfig man page for more info).
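
A minimal manual configuration looks something like this; the interface name, addresses and gateway are just example values, so substitute your own:

# Example static setup for eth0 (addresses are placeholders)
ifconfig eth0 192.168.1.50 netmask 255.255.255.0 up
route add default gw 192.168.1.1
echo "nameserver 192.168.1.1" > /etc/resolv.conf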

Setting up disks

The most basic partition scheme you can really go with is:

Partition    Usage       Size    Filesystem
/dev/sda1    BIOS Boot   2MB     none
/dev/sda2    Swap        4GB     swap
/dev/sda3    /boot       128MB   vfat
/dev/sda4    /           ~       ext4

I strongly recommend you carefully read up on the difference between MBR and GPT; if in doubt, go for GPT. The above partition scheme should work in either case.
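
If you go with GPT, creating the above layout with parted looks roughly like this; the device and exact boundaries are assumptions on my part, so double-check everything before writing to a real disk:

parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart grub 1MiB 3MiB                 # 2MB BIOS boot partition
parted -s /dev/sda set 1 bios_grub on
parted -s /dev/sda mkpart swap linux-swap 3MiB 4099MiB   # 4GB swap
parted -s /dev/sda mkpart boot fat32 4099MiB 4227MiB     # 128MB /boot
parted -s /dev/sda mkpart rootfs ext4 4227MiB 100%       # / takes the rest
mkswap /dev/sda2
mkfs.vfat /dev/sda3
mkfs.ext4 /dev/sda4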

If you’re dual-booting with Windows, I generally recommend putting your Linux install on a different disk; this helps to avoid any problems with the Windows bootloader.

Setting up compile and USE flags

This is one of the more important bits to get right. For compiler options you should not go over the top; -O2 -pipe -mtune=native is good enough 99.9% of the time (with 1GB of memory or less, do not use -pipe). For the USE flags you really need to think ahead about what you want your system to do; in particular, if you ever want to run 32-bit applications, add the multilib USE flag right away. Also make sure you set MAKEOPTS="-j9", as shown in the sketch below, since it speeds up compilation a huge amount; the number you use should be the total number of logical CPU cores plus 1.
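
As a rough sketch, the relevant parts of /etc/portage/make.conf would look something like this; the USE flags are examples only, so tailor them to your needs:

# /etc/portage/make.conf (excerpt)
CFLAGS="-O2 -pipe -mtune=native"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j9"      # logical CPU cores + 1
USE="multilib"      # example only; add the flags you actually need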

If you run into segmentation faults when compiling, as can happen if you don’t have enough memory, put the following in /etc/portage/make.conf:

FEATURES="keepwork"

Remove it when no longer needed.

When it comes to setting your profile, you should generally go for desktop; otherwise you’ll need to add a whole bunch of USE flags. Finally, when you emerge a package, always use --ask and look at the flags in blue; consider whether you may need the features those flags provide now or in the future.

Kernel Configuration

This is a really critical step; not getting it right can in certain cases cause serious issues (like forgetting wifi support), while other, minor problems can be fixed by reconfiguring the kernel.

Take a good amount of time to read through all the configuration options. Some may not make any sense, but in general you don’t have to worry too much, as the defaults are mostly sensible; if you have any doubts, use genkernel. You can always tweak things later.

The only thing I would always change is to increase the scrollback buffer, as the default is tiny in my opinion.

For driver support, save the configuration and open the .config file in nano; have a search through and you should find your needed drivers, if they’re available in the kernel.
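
You can also search from the shell; for example, to check whether a particular driver got enabled (ath9k here is just an illustration):

grep -i ath9k .config    # CONFIG_ATH9K=y (built in) or =m (module)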

Network Configuration

This is one place where the handbook failed me; in the end I had to put the commands needed to launch wpa_supplicant into a script in /etc/init.d. In any case this isn’t difficult to do (see the sketch below), but keep it in mind if you run into the same problem as me.
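
Heavily simplified, the script amounts to the two commands from earlier; the interface and driver here are placeholders, and a proper OpenRC init script would have more structure than this:

#!/bin/sh
# Sketch: bring up wifi at boot (adjust interface and driver)
wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf -B -D nl80211
dhcpcd wlan0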

If you’re using wifi, make sure you emerge these packages before you reboot, otherwise you are screwed:

  • net-wireless/wpa_supplicant
  • net-wireless/iw

After Installation

Once you’ve rebooted into your new Gentoo installation you can start installing more packages. For a source distribution, Gentoo is very easy to use; during the months I’ve been using it I’ve only had one minor package issue.

If you made a serious mistake during the installation, all is not lost: you can boot the install disk again, and once you’ve mounted the partitions and chrooted, you can fix whatever the problem is without doing a full reinstall.

If you do run into trouble, make sure you visit the Gentoo IRC channel, which has a lot of helpful people; or, if the problem is with a specific package, post on the forum.

Useful Free Windows Tools

There are lots of tools that greatly improve the usability of Windows but, for one reason or another, are often quite obscure.

Process Monitor

This is an extremely useful tool that lets you monitor file activity, process activity, registry activity, network activity and more in real time, so you can find out exactly what applications are up to.

You can get it here.

Process Explorer

This is a much more useful process manager than the one built into Windows. Aside from a wide range of resource monitors, it can show all the loaded DLLs and other open files, which can be very handy if you run into the common problem of being unable to delete a file because it’s open in another process. A lot more information is available as well, such as active network connections, threads and GPU usage.

You can get it here.

Visual Subst

This is a graphical interface for the subst command, which allows you to map folders to virtual drives. You can of course do this with the command itself, but the GUI is easier, particularly if you make regular changes.
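
Under the hood it amounts to the plain command; for example, mapping a projects folder to a virtual drive (the paths here are made up):

subst P: C:\Users\Me\Projects
rem To remove the mapping again:
subst P: /D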

You can get it here.

SuperF4

One of the most annoying aspects of Windows is how hard it is to kill some full-screen applications. On Linux you can almost always switch to a virtual terminal and kill the process there, but on Windows, unless you have a second monitor, you’re stuck.

This handy little tool runs taskkill /f on the active full-screen application when you hit Ctrl + Alt + F4; this is far more likely to work than the regular Alt + F4, which can be ignored by programs. It also has an xkill-like feature that lets you click on the window you wish to kill.

You can get it here.

f.lux

Pretty much one of my favourite applications for Windows: it adjusts the color temperature of the monitor at night to reduce eye strain and help you sleep better. After using it for some time I can definitely say it helps; it may seem strange at first, but your eyes quickly get used to the more orange color, to the point where you don’t even notice it.

You can get it here, it’s also available for Linux and more.

Updating Gnu GCC on Linux

GCC is a major part of any Linux system. It contains C, C++, Fortran and Java compilers (plus some extras, if enabled at build time), and it’s the only compiler recommended for building the GNU C library (glibc), which is required to make a Linux system work; other alternatives are available but not as common.

Most systems use an older compiler for stability reasons, since it has been significantly tested; however, it’s sometimes desirable to use a cutting-edge compiler for maximum performance. Generally, though, you should not replace your system compiler unless you’re happy to deal with any bugs that may appear; this is mainly a concern on a source-based system like Arch, Gentoo or BSD.

For this post I’m going to be installing GCC 5.3.0 on Xubuntu 15.10 x64 with the following libraries:

  • GMP 6.1.0
  • ISL 0.16
  • MPC 1.0.3
  • MPFR 3.1.3

You can use your system’s versions of these, or build them along with GCC, which is what I chose to do.

Prerequisites

The following tools must be on your system if you intend to follow this post:

  • Gnu Bash (Tested 4.3.42)
  • Gnu Make > 3.80 (Tested 4.0)
  • Gnu GCC > 4.9.x (Tested 5.2.1)
  • Gnu G++ > 4.9.x (Tested 5.2.1)
  • binutils > 2.25 (Tested 2.25.1)
  • awk (Tested mawk 1.3.3)
  • tar, gzip and bzip2 (for unpacking sources)
  • InfoZIP (Tested 3.0)
  • DejaGnu
  • TCL
  • Expect

On Xubuntu 15.10 x64 I only had to run the following:

sudo apt-get install g++ dejagnu

Setting up your build environment

For this I decided to use a separate partition mounted at /media/dev. Make a folder for your sources and one for the build; you need around 15GB of free space. This is what I ended up with:

/media/dev
    sources/
    build/gcc-build/
    tools/

In the sources folder I downloaded and unpacked GCC 5.3.0 and the support libraries listed above; building these is optional, but I recommend it over using the system ones.

If you decide to build the support libraries, you need to move them into the GCC source directory, or create symbolic links to them there, so it ends up like this:

/media/dev/sources/gcc-5.3.0
            gmp/
            isl/
            mpc/
            mpfr/
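
For example, from the unpacked source directories (the version numbers are the ones listed above; adjust them to match your downloads):

cd /media/dev/sources/gcc-5.3.0
ln -s ../gmp-6.1.0 gmp
ln -s ../isl-0.16 isl
ln -s ../mpc-1.0.3 mpc
ln -s ../mpfr-3.1.3 mpfr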

Once that is done, go to your build directory /media/dev/build/gcc-build; unlike many applications, you keep the build directory and the source completely separate.

Configuration

There are a lot of configuration options available, so I highly recommend you check out the official documentation and other sources (the official documentation is surprisingly sparse). This is what I used to configure:

../../sources/gcc-5.3.0/configure --prefix=/media/dev/tools --disable-nls --disable-multilib

Let’s take a look at each bit:

../../sources/gcc-5.3.0/configure
This is the path to the configure script; you could use an absolute path here instead.

--prefix=/media/dev/tools
The prefix is where the fully built compiler and related files will be installed when you run make install. Other good locations are /usr/local and /opt; do not put it in /usr unless you are completely sure, as you risk clobbering your system compiler. If left unset, it defaults to /usr/local.

--disable-nls
This disables native language support, which provides compiler messages in your native language; unless you have trouble with English or are building for distribution, you should turn NLS off.

--disable-multilib
Without this, GCC will be built to target both x86 and x64. For my system I have no interest in 32-bit, so it’s disabled; keep in mind you will need 32-bit versions of gcc and glibc installed in order to build multilib.

One thing you might want to add is --disable-werror, as during the build a warning may otherwise be treated as a fatal error; this is nothing to worry about, since you will be testing the compiler later anyway.

Once the configuration is complete we can proceed with the build.

Compiling

The next steps are really simple but rather time-consuming. For a complete build of GCC you should allow at least 4 hours on a fast system (I built on an Intel Core i7 4790 and it took quite a while); on a very slow system you might want to run it overnight.

As for why this takes so long, there are two reasons: first, GNU GCC is a complicated piece of software, and second, it has to be built at the bare minimum three times.

The first build, known as stage 1, uses your system compiler to build the new version of GCC; this is called bootstrapping. If all is well, it goes on to stage 2, where the compiler is built again using the version you just built; this ensures the compiler is properly optimized. A final stage 3 build is then done and compared against stage 2; this verifies that the compiler is stable.

One important warning should be given though: if you’re going from a rather old compiler straight to the latest, there is a very good chance the compile will fail, or other strange errors will appear in the new compiler. If this occurs you must use a version closer to the system compiler and build your way up to the latest; from a very old compiler this can take several steps, so always start with the latest compiler available for your system.

The binutils version isn’t as important; if you want, you can put it in the source tree and it will be built along with GCC.

To compile run the following:

make -j9 bootstrap

The -j9 option tells make to run 9 parallel jobs, which speeds up the process many times over, so make sure you include it. As a general rule, the number should be the total number of logical cores plus 1; for an i7 with 4 physical cores and hyper-threading, this comes to 9.
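
If you don’t want to count cores by hand, you can derive the number; this assumes nproc from GNU coreutils is available:

make -j$(($(nproc) + 1)) bootstrap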

If you don’t have much disk space, try the bootstrap-lean target instead; this will take longer but should use less disk space (I have no idea how much less).

BOOT_CFLAGS can be set to adjust how the compiler itself is built; the default is -g -O2. Feel free to adjust it, but be aware that if you go too far the build may break.

Do not run make without the bootstrap target; that’s only for when the system compiler and the one being built are the same version.

Testing

Once bootstrapping is complete, hopefully without errors, you can move on to the checking process. This takes a long time as well, but it’s really a bad idea to skip it unless you are repeating a build; let me say this again, it’s really a bad idea to skip it.

To run the tests:

make -j9 -k check

The -k option ensures it will not stop on any error. At the end you will get a report consisting of:

  • Tests performed
  • Tests that passed
  • Tests that passed unexpectedly
  • Tests that failed
  • Tests that failed unexpectedly

Only the last one really needs to worry you: if the number is low (preferably zero) you should be okay, but if it’s more than five you really need to stop and check the failures. It’s a good idea to report any failures to the gcc-testresults mailing list.

Installing

To install simply run:

sudo make install

Assuming you remembered to set your prefix, it should all end up in the right place. If you really want to install in /usr, it’s a good idea to set either --program-prefix= or --program-suffix= so the new compiler has a different name to your system compiler.

Go to the bin folder and run ./gcc -v; you should see something similar to this:

Using built-in specs.
COLLECT_GCC=./gcc
COLLECT_LTO_WRAPPER=/media/dev/tools/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../sources/gcc-5.3.0/configure --prefix=/media/dev/tools --disable-nls --disable-multilib : (reconfigured) ../../sources/gcc-5.3.0/configure --prefix=/media/dev/tools --disable-nls --disable-multilib
Thread model: posix
gcc version 5.3.0 (GCC)

Conclusion

It really isn’t that hard to update GCC, and it’s well worth it if you’re a programmer or just want to optimize your system as much as possible.

Installing Minecraft on Linux

Installing everyone’s favorite game (well, okay, not everyone’s) is fairly simple on Linux. As for why you’d want to install it on Linux: a lot of players find they get much better performance, especially when you start adding lots of mods.

Java

The first thing you need to do is make sure you have Java installed; this can be checked by typing:

java -version

You should get something like this:

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

Mojang recommend that you use the Oracle JVM rather than IcedTea, but I’ve not had any real trouble with IcedTea on the latest 1.8.9 version of Minecraft; perhaps slightly slower, but that’s all. If Java is not installed, do a search for how to install it on your particular Linux distribution; on Gentoo, simply emerge jre and then oracle-jre-bin.

Installing Minecraft

Go to minecraft.net and download the launcher. Once it’s done, type:

java -jar Minecraft.jar

Log in to your Minecraft account and install the latest, or your preferred, version of Minecraft. Once it’s done, launch it; if all is well it should run without problems. If you notice performance problems, try the Oracle JVM instead.

Some older Minecraft versions may benefit from updating the included LWJGL library, particularly if you get input issues.

Dwarf Fortress on Gentoo Linux

Recently I decided to get back into playing Dwarf Fortress which, if you don’t know it, is an extremely addictive game I highly recommend you check out.

However, I ran into problems right away trying to run it. The first is that Dwarf Fortress requires 32-bit libraries and I have a 64-bit system; this was easily fixed by adding a text file in /etc/portage/package.use/ with the full package name (e.g. media-libs/libsdl) and the abi_x86_32 flag, then running emerge again, which compiles the 32-bit version of the needed library.
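
The file itself is just one line per package; for example (the file name is arbitrary):

# /etc/portage/package.use/dwarf-fortress
media-libs/libsdl abi_x86_32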

The second problem I had not encountered before: /usr/lib32/libz.so is actually an ld script, and there is a bug where bash, being 64-bit, loads the 64-bit library instead right before execution, causing the game to fail to load. This is fixed by setting the LD_PRELOAD environment variable to /lib32/libz.so.1, so I put the following in the df shell script:

LD_PRELOAD=/lib32/libz.so.1
export LD_PRELOAD

P.S. The libz bug may affect other distributions as well; the fix is the same, just double-check that the path to libz is correct.

Browsing the Web Securely

Browsing the web is one of the most dangerous activities when it comes to keeping your computer secure; the vast majority of malware and worse infections come through web exploits. This article covers some of the best ways to improve your security.

The Web Browser

Using a modern web browser that is regularly updated is one of the most important things you can do. Mozilla Firefox and Google Chrome are two of the most popular, but there are plenty of others out there that are just as good, such as Opera, Vivaldi, Chromium and SeaMonkey.

Most of these have versions for mobile devices, although in my opinion it’s best to avoid doing anything important on a mobile device, particularly with Android.

Block Advertising

A large percentage of malware is delivered through online advertising, so it’s absolutely critical that you block it. Whitelisting certain websites is also a bad idea, since malicious ads can appear even on major websites like YouTube.

There are a variety of ad blockers available, some of the most common being Adblock Plus and uBlock Origin; personally I recommend the latter, as it uses fewer resources and allows no ads by default.

Another form of ad blocking (also used for other purposes) is a custom hosts file, which stops the computer from connecting to the listed sites; this is best used in combination with an ad blocker. One good hosts file can be found here, along with usage instructions.
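
A hosts file entry simply points a hostname at a non-routable address so the connection goes nowhere; for example (the domain is made up):

# /etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows
0.0.0.0 ads.example.com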

Browser Plugins

Plugins like Flash and Java are a big no if you’re looking for security; flaws in these can easily expose your system to serious infections. If you need to use them, make sure you always have the latest version and keep them disabled until needed.

Javascript

The majority of serious malware makes use of JavaScript, in combination with known web browser flaws, to gain unrestricted access to the system or mount some other kind of attack. Disabling JavaScript when visiting unknown sites is the best thing you can do; unfortunately, JavaScript is also used by almost all websites for interactive content.

One way to make this simpler is to use a browser extension such as NoScript, which blocks all scripts by default so you have to manually accept them. This is a little time-consuming, but it only needs to be done on your first visit to each website; in addition, it allows you more control over what the website can do.

Cross-site Scripting (XSS)

A cross-site script is a script that reads content from, or sends content to, another website; one simple example is loading an image hosted on another site. The problem is that, without proper care and design, it’s possible to exploit XSS to read private data or inject malicious code.

The risk of this cannot be emphasised enough; many major websites, including YouTube, Twitter and Facebook, have been attacked using XSS. The best way to protect yourself is to use a browser extension that blocks all cross-site requests by default, such as RequestPolicy.

Virtual Machine

Perhaps the only true way to ensure the security of your computer is to browse the web in a virtual machine. This is often time-consuming to set up, but it’s well worth the effort: you can be reasonably sure that even if you are infected, the infection will be contained to the virtual machine. A lesser kind of virtual machine is a sandbox, which basically creates an isolated container; this isn’t nearly as secure as a virtual machine but is much quicker to set up.

Secure Operating System

If you’re using Windows then you’re going to be at significantly higher risk of infection, simply due to the number of users. The quickest way to boost security is to switch to Linux, BSD or Mac OS X (if you can afford it); this is not for everyone, but it’s well worth giving a try, and these systems can also be used in a virtual machine.

Use Anti-virus Software

Having some anti-virus software installed is very important; it’s usually the final barrier stopping an infection, particularly as most now scan any changes made, so malware and other nasty stuff is caught before it can actually cause any problems. This comes at a small cost to system performance, but the loss is well worth it.

Anti-virus software should not be confused with anti-malware software: most anti-malware software deals with minor things such as adware and tracking cookies, which anti-virus software will often ignore, so it’s good to have both.

Password Security

In the event that a website you use is compromised (all too common these days), it’s important that you have a unique password for each site. These can be hard to remember, so a program like KeePass is extremely useful; it also allows the use of much longer passwords, helping to prevent dictionary and brute-force attacks.

Suspicious Sites

Always look at the URL before you click a link; unusual domains like .tk, and domains in countries like Russia and China (assuming you don’t live there), should be avoided.

If your browser has the option, or there is an extension available, you should disable automatic redirects; there have been many cases where a normal site has been hacked and changed to redirect you to an attack site.

Downloads

Another common technique to catch people out is the drive-by download, where a download randomly pops up when you reach an infected page (usually triggered by a script). Always check the download name, size and file extension; if you’re even the slightest bit concerned, scan it before opening. A final failsafe is to open any download in a virtual machine.

Another way to verify a download is to check its checksum, if one is available; any change to or corruption of the download will alter the resulting checksum. IgorWare Hasher is a free Windows tool you can use; Linux and BSD usually have something installed already.
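
On Linux it’s a one-liner; compare the output against the checksum published on the download page (the file name is a placeholder):

sha256sum download.iso
md5sum download.iso    # if only an MD5 checksum is published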

HTTPS

HTTPS encrypts the data sent and received by your web browser using SSL/TLS. Most websites support encryption, but not all have it enabled by default; always make sure the website is using HTTPS before sending sensitive data. This is usually indicated by a padlock icon near the address bar and the URL starting with https://.

A nice little browser extension is HTTPS Everywhere, which, among other features, forces the use of HTTPS where available.

Autofill

Most web browsers can remember your passwords to make things easier and quicker; however, this is a big security risk that is often targeted by malicious scripts and software, so it’s strongly recommended that you disable it.

Advanced Authentication

Many websites now offer more advanced authentication, such as verifying your email address or sending you an SMS message, rather than relying on a password alone. This can be a bit annoying, but for important accounts you should always enable it.

Conclusion

Good browsing security isn’t difficult; most of it comes down to common sense, but hopefully you have learned something of use from this article.