Why our solutions are different...

We architect and develop award-winning systems, such as VOD platforms and portals, which we also host and manage.

An Operations Management Technical Lead

We work very hard to ensure the availability of our clients' infrastructures and applications. Our engineers work in customer-aligned teams, so they have in-depth knowledge not just of the generic technologies but (most critically) of the platform-specific configurations that make each customer's infrastructure unique. We asked one of our Technical Leads to keep a diary of a typical day's activity to illustrate how we support our customers' services.

Hi, I'm Joe, here's my day …

5:45am. Proactive monitoring alerts me that my client's web site is breaching safe thresholds for network performance. The ioko Service Management Infrastructure (iSMI) has detected a new message in SolarWinds (which we use for network monitoring) and raised a ticket in the Service Desk. The ticket in turn creates an alert that notifies the on-call engineer with the content of the syslog message and the reference number of the service ticket. Today the on-call engineer is one of my colleagues, Ray, who is based in Los Angeles, which means it is evening for him. I have also been alerted, as I have overall technical responsibility for this platform and I like to see what is going on. A lookup then compares the Configuration Item (CI) of the component that created the alert with the service attributed to that CI. In this case, the linkage results in our Microsoft Application Support team being notified so they can take action before the incident affects service availability.
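As a rough illustration only (this is not the actual iSMI implementation), the CI-to-service-to-team lookup described above could be sketched in Python as follows. The CI names, service names, the second support team, the fallback team and the ticket reference are all hypothetical placeholders invented for the example.

```python
from dataclasses import dataclass

# Hypothetical CMDB extract: Configuration Item (CI) -> service it belongs to.
CI_TO_SERVICE = {
    "web-fe-01": "client-web-site",
    "sql-db-01": "client-database",
}

# Hypothetical routing table: service -> support team to notify.
SERVICE_TO_TEAM = {
    "client-web-site": "Microsoft Application Support",
    "client-database": "Database Support",
}

# Assumed fallback when a CI or service is not mapped.
DEFAULT_TEAM = "Service Desk"


@dataclass
class Alert:
    ci: str          # component that produced the monitoring message
    message: str     # content of the syslog message
    ticket_ref: str  # reference number of the service ticket


def route_alert(alert: Alert) -> str:
    """Return the team to notify, found via the CI's attributed service."""
    service = CI_TO_SERVICE.get(alert.ci)
    return SERVICE_TO_TEAM.get(service, DEFAULT_TEAM)


def notification_text(alert: Alert) -> str:
    """Build the notification: ticket reference, target team, syslog content."""
    return f"[{alert.ticket_ref}] {route_alert(alert)}: {alert.message}"


if __name__ == "__main__":
    alert = Alert("web-fe-01", "latency threshold breached", "INC-0001")
    print(notification_text(alert))
```

Keeping the two mappings separate mirrors the description: the CI identifies the service, and the service (not the individual component) decides which team is notified.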

Our iSMI system means all of this alerting and ticket raising is done automatically, so everyone who needs to know, knows immediately. The iSMI is unique to ioko: it is a specialized, integrated, ITIL-compliant service management system that combines many different technologies, with HP OpenView as the main "hub". The HP OpenView team rate our system as world-class, and we often host reference visits for their potential customers.

5:50am. Ray is already on the network and has started to investigate the alert. All ioko engineers can access the systems we manage across the world via ioko's SSL VPN portal. The portal uses two-factor authentication with RSA SecurID to authenticate against ioko's "jump off" terminal servers, which in turn provide secure access to the different systems, whether they are in our data centers or our customers'.

6:00am. Within 15 minutes, the issue has been resolved without an outage or any degradation in service quality that an end user would notice, which is good news!

9:00am. As I am the Technical Lead for the account I need to know and understand everything that's going on so Ray talks me through the details behind the incident. Whilst the system is now stable, Ray was unable to identify the root cause of the issue (the problem). Following standard procedure, I instruct another one of my engineers to conduct an investigation into the root cause of the incident which will include a monitoring review to ensure the current alerts are optimized in light of this last event.

9:10am. I am now on a conference call with our Service Delivery Manager for this client and a technical contact at the client themselves (not all our clients want this level of debrief and detail for non-site-down incidents, but this particular client does and we're happy to oblige). Our Service Delivery Manager chairs the call and briefs the client on the incident. We discuss the potential lines of investigation and any interim solutions that may be needed, as the incident was serious enough to cause a site-down incident if we hadn't responded as quickly as we did.

9:20am. Each client's managed service team conducts a raft of routine daily checks, which collectively cover the thousands of servers and other devices we manage around the world. An automated ticket is generated for each team in the Service Desk so we can maintain an audit trail of the actions taken and ensure the Configuration Management Database (CMDB) is updated with any configuration changes made in the ongoing fine-tuning of the systems we manage. Today's disk space report reveals that a SQL Server database LUN (Logical Unit Number) is at 88% utilization. We contact the client to highlight the condition because, although it is within its normal operating threshold by 2%, it will probably breach during the day given the rate at which this database has been growing recently.
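To make the arithmetic behind that judgment concrete, here is a minimal Python sketch. It assumes a 90% threshold (implied by being "within threshold by 2%" at 88%) and a simple linear growth model. The growth rate of 0.5 percentage points per hour and the 24-hour horizon are purely hypothetical figures, since the report's actual growth rate is not stated.

```python
from typing import Optional


def hours_until_breach(used_pct: float, threshold_pct: float,
                       growth_pct_per_hour: float) -> Optional[float]:
    """Estimate hours until utilization reaches the threshold.

    Assumes linear growth. Returns 0.0 if already at or over the
    threshold, and None if utilization is not growing.
    """
    if used_pct >= threshold_pct:
        return 0.0
    if growth_pct_per_hour <= 0:
        return None
    return (threshold_pct - used_pct) / growth_pct_per_hour


def should_warn_client(used_pct: float, threshold_pct: float,
                       growth_pct_per_hour: float,
                       horizon_hours: float = 24.0) -> bool:
    """Warn if the threshold is expected to be breached within the horizon."""
    eta = hours_until_breach(used_pct, threshold_pct, growth_pct_per_hour)
    return eta is not None and eta <= horizon_hours


if __name__ == "__main__":
    # 88% used, 90% threshold, hypothetical growth of 0.5 points per hour.
    print(hours_until_breach(88.0, 90.0, 0.5))
    print(should_warn_client(88.0, 90.0, 0.5))
```

With these assumed figures the LUN would cross the threshold in a matter of hours, which is the kind of case where contacting the client before any alert fires makes sense.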

9:30am. The team and I then deal with routine tickets such as fixing BizTalk suspended items, rectifying an AD (Active Directory) replication issue and debugging a SAN / HBA / Asigra backup integration issue.

There is plenty to do behind the scenes but this is usually quite invisible so I hope my little diary will help illuminate what we do and how we pass our time.

1:45pm. Ray, the engineer I was working with earlier, is coming to the end of his night shift and has completed his initial investigation of this morning's incident. He has identified a root cause fix, and a change has already been raised to implement it. As Technical Lead, I review the change plan and approve it before it goes to the customer for co-authorization.

3:42pm. A high-impact incident has just been raised. It's the middle of the Rugby World Cup and, for some reason, the Quova Geo-IP blocking software is preventing users based in the Channel Islands from accessing content. The client calls our Service Desk to report that their Channel Islands users are complaining via their forums site. A high-impact ticket is raised, which kicks off an end-to-end check of the VMS platform (VMS in this case means Video Management System). The VMS is a system that the client has built themselves; its job is to ingest video assets, encrypt them, generate licenses and deliver content to the client's Content Management System. The end-to-end check of the VMS is OK, and we contact the client to confirm that the solution is operating within expected parameters but that we are now digging deeper. Strictly speaking we don't need to do this, as the problem is outside the scope of what we manage, but it's not our philosophy to leave a client in the lurch. My engineer has his suspicions and enlists the help of one of our application developers. On further investigation, they discover a bug in the VMS's SOAP call to the license web service. This web service uses the Quova Geo-IP blocking solution, which is why Channel Islands IPs are unable to receive a license. The engineer feeds this back to the client, who is able to go back to their own development team and create a fix. Once the fix is available, and after testing through the RC and Stage environments, we will deploy a patch to the production environment and all will be well for the Channel Islands users.

4:00pm. The client calls the ioko Service Desk: a power outage has occurred at the client's site. The outage prevents content editors from accessing the Video Management System, and as a result they are unable to produce real-time content for the Rugby World Cup site. This is a critical issue for the client, as advertising revenue is generated directly through click-throughs on the .com site, and it's particularly urgent as it's an England match. I am able to use email and remote access to get the content into the environment as a temporary workaround.

5:00pm. Finally I get around to a request from one of our sales people out in the Far East. He has a prospective new client who wants to build a new mobile media delivery service, but in the meantime they are having some performance problems with their existing solution. I've had a brief phone call with him and he's sent me some emails and logs. I think I know what the problem is, but I am glad of the time zone difference as it gives me some extra time to go through the issue in more detail. After I have done this, I write an email back to our sales person outlining the potential causes and some checks that could be made to narrow them down, and I offer to help the client's engineering team do this if they don't have the expertise themselves.

6:00pm. Home time! It's been a busy day, though no more than usual. Tomorrow I am out on my client's site for meetings so I'll need to be up early (again). No two days are ever alike and this is what I love about working for ioko; it's a really challenging environment. Anyway, I am off now for a beer with my team. Tom just passed his CCIE (Cisco), which means it's his round (ha ha ha). I enjoyed writing this mini-diary; now I see where the time goes! I think I'll send it to my Mum and Dad. They're always asking what I do and I always mumble something about "erm, I fix computers" and they always say "You've been doing that since you were 10". And I still can't imagine wanting to do anything else ...

ioko story

Want to know a bit more about how ioko started and what sort of things matter to us?

recruitment

ioko is an enterprise system integrator with specialized expertise in the media and entertainment industry. We are defined by our genuine passion, commitment and excellence in delivering technology solutions.
