Compare files ignoring a field or column using Process Substitution

Let's say the data contains multiple fields (columns) separated by a space, a comma or some other delimiter, and we want to compare two files while ignoring a specific column. Let's divide the work into two small problems. The first is to ignore the given field/column.

If we simply want to ignore the first column, we can use either of the following cut constructs (--complement is a GNU cut extension; the second form works with any cut).

cut -d',' -f 1 --complement datafile
cut -d',' -f 2- datafile

To ignore any other column, awk is more general, because we can specify exactly which column to ignore, be it the first, the third or the last.

With the awk program saved in a file named ignoreField.awk, it can be used as follows:

awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile
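The ignoreField.awk script itself is not reproduced in this post. Below is a minimal sketch of what it could look like, assuming fields are numbered from 1, the delimiter passed with -F is a single literal character (so it can be reused as the output separator), and no field contains a quoted delimiter (a limitation it shares with cut). Accepting FieldToIgnore=last is a convenience added in this sketch, not something from the original. The snippet writes the script to ignoreField.awk so the commands in this post can use it:

```shell
#!/bin/sh
# Write a sketch of ignoreField.awk (hypothetical reconstruction, not the original script).
cat > ignoreField.awk <<'EOF'
# Print every field except the one numbered FieldToIgnore.
# Assumes FS is a single literal character, which is reused as the output separator.
# FieldToIgnore=last (an addition in this sketch) drops the last field of each line.
BEGIN { OFS = FS }
{
    skip = (FieldToIgnore == "last") ? NF : FieldToIgnore + 0
    line = ""
    sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == skip) continue
        line = line sep $i
        sep = OFS
    }
    print line
}
EOF

# Example: drop column 3 from a small CSV sample.
# Output:
# id,name,value
# 1,foo,42
printf 'id,name,ts,value\n1,foo,1700000000,42\n' |
    awk -F',' -v FieldToIgnore=3 -f ignoreField.awk
```

With FieldToIgnore=1 the sketch behaves like the cut examples; rebuilding the line field by field (instead of blanking $i) avoids leaving an empty column and a doubled delimiter behind.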

The next part is to diff the output after ignoring (read: removing) the column. This is where process substitution comes in handy. Here are two examples.

# ignore 1st column from two csv datafiles while comparing
diff -u <(cut -d, -f 2- datafile1) <(cut -d, -f 2- datafile2)
# ignore column 3 from two csv datafiles while comparing
diff -u <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile1) \
        <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile2)

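So instead of giving diff two real files, we give it two streams: the shell runs each command and hands diff a file name (such as /dev/fd/63) from which that command's output can be read. Note that process substitution is a bash/ksh/zsh feature and is not available in plain POSIX sh. The same technique can be used to pre-process files in other ways (e.g. to ignore comments or empty lines, or to compare two unsorted files).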
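Two such variants, plus their combination, as a sketch. It needs bash (or ksh/zsh) for process substitution; file1 and file2 are placeholder files created just for the demo, and treating lines that start with # as comments is an assumption about the data format:

```shell
#!/bin/bash
# Placeholder sample files: the same records, but file1 is in a different
# order and also contains a comment line and an empty line.
printf '# exported today\nb,2\n\na,1\n' > file1
printf 'a,1\nb,2\n' > file2

# Ignore comment lines (starting with #) and empty lines.
# Still reports a difference here, because the records are in a different order.
diff -u <(grep -v -e '^#' -e '^$' file1) <(grep -v -e '^#' -e '^$' file2) ||
    echo "-> differences found"

# Compare two unsorted files by sorting both sides first.
# Still reports a difference here, because of the comment and the empty line.
diff -u <(sort file1) <(sort file2) || echo "-> differences found"

# Combine both: drop comments and empty lines, then sort.
diff -u <(grep -v -e '^#' -e '^$' file1 | sort) \
        <(grep -v -e '^#' -e '^$' file2 | sort) && echo "-> files match"
```

Each pre-processing step is just a pipeline inside <( ), so the filters can be chained freely (for example, with the awk or cut commands above to also drop a column) without creating temporary files.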

See below for more information on Process Substitution.
http://www.tldp.org/LDP/abs/html/process-sub.html
http://wiki.bash-hackers.org/syntax/expansion/proc_subst

Application Logging Improvement – Part 3 Making it Readable

This is part three of my Application Logging Improvement Plan. So far I have discussed why logs should be machine-readable for application performance, management and monitoring. In this post I give an example of how to make the log readable to humans (that is, make it look the way everyone is used to seeing logs). I am going to use vim to view the log files, configured so that it knows how to handle the file (syntax highlighting etc.).

The first thing is to configure vim to recognize the format.

Application Logging Improvement – Part 2 Multithreading

Multithreading is becoming the norm. The obvious issue with logging is how to synchronize between threads. As discussed in the last post, Application Logging Improvement Plan – Part 1, we want to log as much as possible in a machine-readable format. That creates a problem when multiple threads try to log at the same time. Two possible implementations come to mind, but both are flawed.

  1. Synchronize between threads for logging – disk writes are already slow, and lock contention would only make it worse. This slows down the business logic and is a big no-no.
  2. Log without synchronizing – the business logic runs unhindered, but the logs get jumbled up because multiple threads write at the same time. This leaves the logs in the worst possible shape: unusable.

We can do better by combining the two approaches. We create a per-thread logging buffer (let's call it LogBuffer) into which each thread logs without any conflicts. When a certain threshold is reached, the threads synchronize and write their LogBuffer to disk (let's call this Flush).

Application Logging Improvement Plan – Part 1

People are divided on how to log, what to log and how much to log; it is a never-ending discussion. In addition, many open source logging libraries are available, not to mention many standards. I am not going to go into the details of what is available out there; use Google to pick your poison. What I am going to discuss here is what I think makes the most sense with the available technology.

Create Post Excerpt Intelligently in Jekyll

Meta descriptions and Open Graph descriptions are the short descriptions that search engines and Facebook display on result pages and in link previews. They give a brief introduction to the content and grab the reader's attention. So, when possible, use keywords in the meta description that briefly describe what the page is about. Limit the meta description to about 160 characters, which seems to be the norm for SEO nowadays.

Jekyll supports post.excerpt out of the box. It takes the value of excerpt from the Front Matter if present; otherwise it falls back to the post content from the beginning up to the first excerpt_separator (by default a blank line, so the excerpt is the first paragraph), or to the end of the post if there is no separator. Whatever separator you choose, define it as excerpt_separator in _config.yaml. With that done, you can use {% raw %}{{ post.excerpt | strip_html }}{% endraw %} wherever you need to show it (e.g. in the meta description or og:description). I use strip_html to remove any HTML, which would otherwise break the description tags.

I like to show a bigger teaser on my main page to catch the reader's attention, and a brief description on other pages (like search results and Category/Tag indexes). Here is how I do it.

  1. Define a meta-description variable in the Front Matter of the post that describes in a few words what the post is about. Then use {% raw %}{{ post.meta-description }}{% endraw %} when generating the meta tags in the HTML head section of the post and on the Category/Tag index pages. On the main page (index.html), use {% raw %}{{ post.excerpt }}{% endraw %} instead, which Jekyll generates from the Front Matter or excerpt_separator. I stopped defining excerpt in the Front Matter to avoid a namespace collision with Jekyll's own excerpt.

mimikatz : Export non-exportable Private certificate from Symantec PKI

Recently our organization started provisioning private certificates using the Symantec Managed PKI Service. It has a lot more appeal for IT admins because it removes all user intervention, which always creates support nightmares.

Previously I had direct access to the private key, so it was easy to export it to all my devices and use it for VPN and other secure services that need to verify that I am indeed who I say I am. Because the Symantec PKI client is not available for Linux, the switch broke VPN access from my Ubuntu system. Naturally I started to look for ways to export the key out of the Windows system. Here is what I did to get out of the bind.

How to export certificates

First I installed the Symantec PKI client on a Windows 7 system. That was a no-brainer because there was no other choice. I did not try Windows 8, so YMMV. The main issue was that the Windows certificate manager showed the private key as not exportable. Had it been exportable, my quest would have been over right there, but I had to take another step. Mimikatz was the answer: it marks the keys as exportable and can also export them. Note: the patching it does only lasts for the current session. Once you reboot the Windows system, you have to patch again using mimikatz. I used the latest version, which was 2.0 at the time of writing.

postfix : Configure outgoing relay server

Edit /etc/postfix/main.cf and set “relayhost” to the name of your outgoing (relaying) mail host. First make sure that the relay server accepts mail from your system.

For example, if the outgoing relay is mailhost.xyzserver.com, the relevant part of main.cf should look like the following.

# INTERNET OR INTRANET

# The relayhost parameter specifies the default host to send mail to
# when no entry is matched in the optional transport(5) table. When
# no relayhost is given, mail is routed directly to the destination.
#
# On an intranet, specify the organizational domain name. If your
# internal DNS uses no MX records, specify the name of the intranet
# gateway host instead.
#
# In the case of SMTP, specify a domain, host, host:port, [host]:port,
# [address] or [address]:port; the form [host] turns off MX lookups.
#
# If you're connected via UUCP, see also the default_transport parameter.
#
#relayhost = $mydomain
#relayhost = [gateway.my.domain]
#relayhost = [mailserver.isp.tld]
#relayhost = uucphost
#relayhost = [an.ip.add.ress]
relayhost = mailhost.xyzserver.com

After that, restart postfix (postfix reload is also enough to make it re-read main.cf).

service postfix restart

Find the process monopolizing the CPU without using “top”

Let's say you are on a system where top (or any similar tool) is not available. It sounds incomprehensible, but believe me: there are systems that have none of those great tools. So how do you find the process eating up the most CPU? The humble ps command provides pcpu, the percentage of CPU used by a process. Here is how.

ps -eo pcpu,pid,ruser,args | sort -k1,1 -rn | less

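This lists the processes in descending order of pcpu, with the pid, the ruser (real user) and the full command line (args) of each, so the process taking up the most CPU is at the top. Keep in mind that on Linux (procps) ps computes pcpu as the CPU time used divided by the time the process has been running, so it is an average over the process's lifetime rather than a current reading as in top. So there you have it.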
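Why sorting numerically matters: with plain sort -r the %CPU column is compared as text, so 9.5 ends up above 10.2. The first two commands below show the difference. The top_cpu function is a hypothetical helper, not a standard command; it prints only the N busiest processes. Giving every column an empty header (pcpu= and so on) makes ps omit the header line, so it cannot get mixed into the sorted output:

```shell
#!/bin/sh
# Text vs numeric reverse sort of CPU percentages.
printf '9.5\n10.2\n0.3\n' | sort -r    # text order: 9.5, 10.2, 0.3
printf '9.5\n10.2\n0.3\n' | sort -rn   # numeric order: 10.2, 9.5, 0.3

# Hypothetical helper: show the N processes with the highest %CPU (default 5).
# The trailing "=" on each column name gives it an empty header, so ps prints no header line.
top_cpu() {
    ps -eo pcpu=,pid=,ruser=,args= | sort -k1,1 -rn | head -n "${1:-5}"
}

# Usage: top_cpu 3
```

Restricting the key to -k1,1 keeps sort from falling through to the rest of the line when comparing, and head keeps the output short on systems without a pager.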

Setup WebEx on 64 bit Ubuntu 12.04 using 32 bit Oracle Java

WebEx does not work on 64-bit Ubuntu 12.04 with the default configuration, because it requires 32-bit Java. The WebEx control window launches, but desktop sharing, application sharing, the whiteboard etc. do not show up. I could neither see other people’s shared content nor share mine, even as the host of the meeting.

Starting Firefox from the command line in a terminal shows ELFCLASS32 errors from the WebEx shared objects, i.e. 32-bit libraries failing to load into a 64-bit process. So it was clear that WebEx would not work on a 64-bit system as is and would need 32-bit Java. Since I use a 64-bit system, I did not want to downgrade to a 32-bit installation just for the sake of WebEx.

In brief, these three steps cover the fix.

  1. Install 32-bit Oracle Java locally. Oracle Java is a must; OpenJDK will not cut it. Warning: because it is a local installation, you have to update it manually whenever a new Java release becomes available. Oracle has recently shipped many releases in quick succession to address major security issues, so this is a real concern.
  2. Install Firefox locally so it can be configured to use this 32-bit Java. Give it a separate profile and a different theme so that it does not conflict with the native Firefox and clearly stands out when both are running.
  3. (Optional) Add a shortcut in the Unity HUD for quick access.