Playing with Logstash and syslog-ng: forwarding syslog traffic from a log server
I, like most of you (hopefully), have a central syslog server. It’s truly invaluable whether you have 3 servers or 300 servers. The combination of technical curiosity and the firehose of messages from syslog has led me to Logstash. The docs for Logstash are good, but sparse, so here is how I got Logstash hooked up to syslog-ng without changing all of my source machines.
The general flow should be:
LOG SOURCE ----> SYSLOG SERVER ----> LOGSTASH
I’m not going to cover fetching Logstash, making sure you have Java installed, etc. If you need help with that, check out the Getting Started with Logstash guide.
How we look at environments…
Environments are a frequent source of confusion, and there is a lot of discussion about which environment should be used for what purpose. At Yale, this is especially prevalent because this place is so big and there are so many groups doing things their own way.
We have moved away from individual departments running their own IT to a Shared Services model and over the past 5 years, we have made great strides in standardization and automation. This has been helped along by configuration management, continuous integration and agil-ifying our development practices (I still wouldn’t call them Agile).
Without further ado, here are our current definitions:
Sandbox - An environment running on a developer’s laptop/workstation for writing code. The developer has root access and can do whatever they want. So far we are not providing these. I promise more to come on this topic later!
Development - The first experience in a “datacenter” environment. This is where datacenter kinks get worked out by the developer (load balancing, networks, shared storage, etc.). The developer has non-root access to this environment, but can deploy and/or restart the instance at will via Jenkins.
Test - This environment serves as a test of the migration into production. This environment should be as close to production as possible. Developers have access to logs here, but cannot deploy or restart instances (currently). Only -RELEASEd code should be deployed to Test. Load testing and user acceptance testing should happen in this environment. On occasion, a load test environment will be spun up just for that purpose, but it’s fairly rare.
Production - This is where your released, tested code goes. Duh!
It should be made clear that even though we have strict definitions for environments, there is no flexibility lost here. For the most part, a deploy to test is a change that happens within an hour or two of requesting it, and deploys to production can easily happen the same day they are requested. All deploys are done through Jenkins.
If you are interested in a little more on how we use Jenkins, check out my previous post.
How do you use environments?
JUC NYC 2012: Recap and Yale Build and Deployment with Jenkins
JUC NYC 2012 has come and gone.
It was a great day and it was awesome to get a chance to find out how other people are using Jenkins and to hear their experiences. I was really happy to catch up with folks that I knew and to meet a whole bunch of new people including the creator of Jenkins, Kohsuke.
I learned about some compelling new companies like XebiaLabs and CloudSmith (who is doing cool stuff with Puppet and Jenkins). I got some great new ideas from the folks at Sendgrid about how they use Vagrant, Chef and veewee, and explored how to make our Jenkins and Ruby on Rails stuff better by listening to Daniel from Art.sy.
Finally, I got the lowdown on some amazing new features coming out of CloudBees for Jenkins like High Availability and Pre-Tested Commits. They also announced BuildHive, free Jenkins for your projects on GitHub! Really cool stuff.
Towards the end of the day, I got a chance to take the stage myself and tell Yale’s story (I’ll update this with the video once it’s released):
Shout outs to the awesome folks at Cloudbees, and to Saleem from Thoughtworks who rescued me with a VGA dongle for my Mac.
Yale Jenkins Build and Deploy Presentation
Below is a 20 minute presentation I gave to the ITS Extended Leadership Group. This is a group of people who report directly to our CIO and down 2 or 3 levels, so it mainly consists of technical managers. The presentation focused on Yale’s adoption of Jenkins to improve both our build and deployment processes.
Chef Environments are Awesome
I’ve been running Chef on my production and non-production servers for quite a while now (since ~0.8 ish). Environments were introduced in 0.10.x and they, along with the rest of the Chef ecosystem, are pretty slick.
Disclaimer: this is just scratching the surface of the power and usefulness of Chef environments. To get a full rundown of how environments work, check out the Opscode docs on the subject.
My goal was to have apache configs built with different attributes (like startservers, minspareservers, maxspareservers, etc) for production instances and test instances. Of course, I could have built a prod_webserver role and a test_webserver role, assigning custom attributes to each, but this seemed anti-DRY and I didn’t like it. I also could have built logic into the template to decide how to populate different values based on other node attributes, but I’m of the mind that templates should have as little logic as is reasonable for your environment (yes, I’m being purposefully opaque).
F5 Persistence Viewer
We’ve been working on a persistence problem with our v11 BigIPs + Exchange 2010. During the process, I’ve been lacking a way to really view the persistence records, search them, order them, etc. Maybe it’s overkill, but I wrote a little Rails app called f5persist.
There are brief instructions in the README on how to use it. It uses the cajun-code fork of Caseproof’s datatables gem, which is built on DataTables.
Have fun and profit!
10 steps to Exchange 2010 Client Access Servers behind F5 BigIP LTM v11
We’ve been running Exchange 2010 behind a pair of 6400s running version 9.4.8 for about 6 months. Since we started, we have seen a significant imbalance in traffic to one node in the cluster. The hammered node will move to one of the others if it gets rebooted or connections are interrupted for some reason. We spent quite a lot of time trying to get this right, but ultimately decided it was no longer worth the effort or time since new LTMs with a new version were on the horizon.
Well, the time is now! Our 6900s running version 11.1.0HF1 are here and have been put into production.
A few notes:
- I’m only going to describe our configuration. There are a million and one ways to configure Exchange behind a load balancer. This is how we did it.
- I’m responsible for the F5s, but not Exchange. I’m no Exchange expert and I won’t be explaining any changes made on that end.
- This might be out of date when a new version of LTM is released or when a new version of the deployment guide is released.
- I generally use the CLI, but since the iApp stuff is GUI driven, these are mostly GUI based instructions. I’m not going to tell you how to navigate, just what to do.
- There are required changes to the authentication method for IMAP and POP in Exchange; these are in the deployment guide, and I leave them to you.
The relevant bits of our configuration:
- v11.1.0HF1
- Exchange 2010 with 4 CAS servers
- A single IP will be used for all services (https/imap/pop)
- We don’t use RPC Client Access
- We are deploying:
  - Outlook Web Access
  - Outlook Anywhere
  - Autodiscover
  - ActiveSync
  - IMAP4S
  - POP3S
- We did not enable TCP Request Queuing
- We primarily vend services to a LAN
- We are re-encrypting all of our services
- We did not enable analytics
- We do not expect more than 64,000 simultaneous connections
- Our servers do not default route through the BigIP
How to view your SSH Key’s Fingerprint
I (along with many others) received this from GitHub today.
A security vulnerability was recently discovered that made it possible for an attacker to add new SSH keys to arbitrary GitHub user
accounts. This would have provided an attacker with clone/pull access to repositories with read permissions, and clone/pull/push
access to repositories with write permissions. As of 5:53 PM UTC on Sunday, March 4th the vulnerability no longer exists.
While no known malicious activity has been reported, we are taking additional precautions by forcing an audit of all existing SSH keys.
# Required Action
Since you have one or more SSH keys associated with your GitHub account you must visit https://github.com/settings/ssh/audit
to approve each valid SSH key.
Until you have approved your SSH keys, you will be unable to clone/pull/push your
repositories over SSH.
# Status
We take security seriously and recognize this never should have happened. In addition to a full code audit, we have taken
the following measures to enhance the security of your account:
- We are forcing an audit of all existing SSH keys
- Adding a new SSH key will now prompt for your password
- We will now email you any time a new SSH key is added to your account
- You now have access to a log of account changes in your Account Settings page
Sincerely, The GitHub Team
--- https://github.com support@github.com
I had also long forgotten how to see the fingerprints of my SSH keys. It’s pretty simple:
ssh-keygen -l -f ~/.ssh/my_ssh_key
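If you have more than one key to audit, a loop saves some typing. Here is a minimal sketch, assuming bash and GNU coreutils (base64 -d, md5sum); list_fingerprints and md5_fingerprint are hypothetical helper names, not part of OpenSSH. The second helper computes the older colon-separated MD5 form by hand, which is just the MD5 digest of the base64-decoded key blob in the .pub file. Newer OpenSSH releases print SHA256 fingerprints by default; there, ssh-keygen -l -E md5 -f should give the MD5 form.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the fingerprint of every public key in a
# directory (defaults to ~/.ssh). Relies on ssh-keygen being installed.
list_fingerprints() {
  local dir="${1:-$HOME/.ssh}"
  local key
  for key in "$dir"/*.pub; do
    [ -e "$key" ] || continue   # glob matched nothing, so skip
    ssh-keygen -l -f "$key"
  done
}

# Hypothetical helper: compute the colon-separated MD5 fingerprint by hand.
# Assumes a standard "type base64-blob comment" public key file.
md5_fingerprint() {
  awk '{print $2}' "$1" \
    | base64 -d \
    | md5sum \
    | cut -d' ' -f1 \
    | sed 's/../&:/g; s/:$//'   # add a colon after each hex pair, drop the last
}
```

Run list_fingerprints with no arguments to check everything in ~/.ssh, or md5_fingerprint ~/.ssh/my_ssh_key.pub for a single key.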
Have fun!
Continued: Launching Amazon EC2 Instances with Knife and Chef
This is an update to my previous post here.
When I told a colleague what I was doing, he asked why I chose to use instance storage. I didn’t really know. According to him, bootstrapping instance storage is far more time consuming than using EBS volumes, and using EBS also provides other benefits (like easy migration between Dom0’s).
With that in mind, I changed my knife command from this
knife ec2 server create -I ami-41814f28 -d fedora13-gems -f m1.large -Z us-east-1a -x ec2-user -r [MY RUN LIST]
to this (the only change is the AMI ID, which now points at an EBS-backed image)
knife ec2 server create -I ami-1b814f72 -d fedora13-gems -f m1.large -Z us-east-1a -x ec2-user -r [MY RUN LIST]
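Since the two commands differ only in the AMI ID, one way to keep the rest from drifting is to wrap the command and pass the AMI in. This is a minimal sketch, assuming bash; knife_ec2_create and the DRY_RUN variable are hypothetical names, and the flags are copied as-is from the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: the AMI and run list are arguments, everything
# else is fixed to the values used in the knife commands above.
knife_ec2_create() {
  local ami="$1" run_list="$2"
  local cmd=(knife ec2 server create
             -I "$ami" -d fedora13-gems -f m1.large
             -Z us-east-1a -x ec2-user -r "$run_list")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"        # actually launch the instance
  fi
}
```

For example, DRY_RUN=1 knife_ec2_create ami-1b814f72 'role[base]' prints the full command without launching anything; role[base] is only a placeholder run list.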
Delivering favicon.ico for virtuals with no default pool
We support a large number of apps that live behind the same CNAME and port, but are vended from various backend servers and server technologies (tomcat, jboss, ruby, etc.). To simplify our environment, we push all of our logic up into the load balancer and don’t run a fat webserver like Apache httpd.
This works great and provides lots of flexibility for sending URIs around to wherever we want them. Unfortunately, one of our clients recently noticed that they were no longer receiving a favicon! THE HORROR!
We are an institution, so the number of favicons should be small, but we have a huge variety of apps, so pushing the icon into each app and making sure it stays up to date is a big deal.
Here’s how we solved the issue…