Wednesday, 12 June 2019

Just another acronym TBD

Working at scale I am constantly aware of how much we decide upfront. Before a piece of work gets anywhere near a team, a lot of time goes into looking at what it is, what will change and who will be involved. In some cases, whole designs are considered before a team even sees it.

On the face of it, there is good reason. It costs a lot of money to build things: better make sure it will give a good return. Things taking longer costs even more: better make sure we know what we are getting into. It takes a lot of people to build stuff: better make sure we know who is involved so we can make sure they can actually make it.

We lose something important in doing this - our competitive edge. Every week we spend understanding the risk and cost is another week our customers don't have our product and our competitors have a window of opportunity.

Working with teams, I am aware of how much we assume. We build architectures based on our understanding at the time, which often includes a lot of assumptions. I like assumptions because we can actually prove them out - but we usually don't.

We often build more than we actually need since we don't or can't prove out these assumptions. After a service has been running for a while, I have seen teams reduce the system in lots of different ways. This can sometimes be by removing caches or services that store data, or by using different scaling patterns that we discover over time.

We would benefit massively from building something small because we can see how it responds in the real world. We get feedback using monitoring and telemetry to understand what is going on and we can make better decisions on its design and architecture based on that information.

But we need to start somewhere...... so how about we just focus on getting this data in the easiest way we can.

Imagine we took our best guess at an architecture that would suit our intended audience and built the services and deployed them. We make sure we add no logic whatsoever, only the bare minimum to allow the system to interact and we focus only on the monitoring and telemetry.

We can then load test this in a live environment and we could call this a 'best case' system. Without the logic this is the fastest it could operate - anything we add will slow it down. See it as an extreme case, where we are looking at the skinniest skeleton we could possibly get away with.

We can load the system and see what happens. We could also introduce waits in areas we can anticipate more logic and see what happens under load. We can add more monitoring where we have poor visibility. We can stub 3rd parties and make them 'misbehave' to see what happens.
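As a rough illustration of the idea so far, here is a minimal Python sketch of such a skeleton. Everything in it is hypothetical: `Telemetry`, `stub_third_party`, `skeleton_handler` and `run_load` are made-up names, the telemetry is kept in memory rather than sent to a real monitoring tool, and the injected wait simply stands in for logic that does not exist yet.

```python
import random
import time
from collections import Counter


class Telemetry:
    """In-memory stand-in for real monitoring (an assumption for this sketch)."""

    def __init__(self):
        self.durations = []        # seconds taken by each request
        self.outcomes = Counter()  # how many requests ended "ok" or "degraded"

    def record(self, seconds, outcome):
        self.durations.append(seconds)
        self.outcomes[outcome] += 1

    def percentile(self, fraction):
        """Rough percentile: the sample sitting at `fraction` of the sorted list."""
        ordered = sorted(self.durations)
        index = min(len(ordered) - 1, int(fraction * len(ordered)))
        return ordered[index]


def stub_third_party(failure_rate, rng):
    """Stubbed 3rd party that can be told to misbehave some of the time."""
    if rng.random() < failure_rate:
        raise TimeoutError("stubbed 3rd party timed out")
    return "ok"


def skeleton_handler(telemetry, rng, injected_wait=0.0, failure_rate=0.0):
    """No business logic: wait, call the stub, record what happened."""
    start = time.perf_counter()
    time.sleep(injected_wait)  # placeholder for logic we expect to add later
    try:
        outcome = stub_third_party(failure_rate, rng)
    except TimeoutError:
        outcome = "degraded"
    telemetry.record(time.perf_counter() - start, outcome)
    return outcome


def run_load(requests, injected_wait=0.0, failure_rate=0.0, seed=1):
    """Fire a number of requests at the skeleton and return its telemetry."""
    telemetry = Telemetry()
    rng = random.Random(seed)
    for _ in range(requests):
        skeleton_handler(telemetry, rng, injected_wait, failure_rate)
    return telemetry


if __name__ == "__main__":
    baseline = run_load(200)
    with_wait = run_load(200, injected_wait=0.002, failure_rate=0.1)
    print("baseline p95:", baseline.percentile(0.95))
    print("with waits p95:", with_wait.percentile(0.95), dict(with_wait.outcomes))
```

Comparing the baseline run with the run that has waits and a misbehaving stub is the kind of 'best case' versus 'what if' comparison described above. A real version would load test deployed services rather than call a function in a loop.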

Since there is not much to this, we can quickly move things around, see what happens and fix the problems we can see - essentially we are searching for a baseline test that we are happy with before we add anything else. We can easily remove things that don't have a measurable impact in the scenarios we are testing.

Since we don't have any logic there is no need for unit tests, meaning changes can be quick. As it does not do anything and contains/accesses no data, it is benign in a live environment so should not constitute a security risk either.

When we do start to add logic, we have a baseline we can compare to and a suite of tests that we use to monitor KPIs that we can actually monitor from the beginning. We also have an architecture that is better than a guess - it's already got some data to support why this is the right place to start.

I call this Telemetry Biased Design but it just sounds like a cool way of making sure you are starting with just the right amount of architecture to solve the problem you have.

In full disclosure: at time of writing, I have never tried this. I am no longer an engineer and I work with smart people who get things done in their own way. It's just an idea.

Tuesday, 14 May 2019

The hidden features of Feature teams

A while back I used the metaphor of booking a meeting to describe some of the problems with planning and priorities in development teams which are looking after discrete areas of a large system, which are typically called platforms.

An alternative approach is to use feature teams, which allow us to comprise a team of everyone needed to deliver a feature. You can immediately see, we don't have as much co-ordination overhead because a single team can deliver a feature end to end. We just have to task a team to deliver something and they can, making it easier to prioritise.

There are several flavours of feature teams but it really boils down to 2 types - permanent or prioritised. Permanent is exactly that: the feature team is permanent and has a stable number of permanent people. Prioritised is a team formed to solve a particular problem, which is then disbanded after it has delivered the feature.

From my experience, it is when we look at feature teams over a longer term that we start to see some problems, and they are pretty difficult to solve.

In a prioritised team, the immediate problem is with how people work together. It is true we often see a bounce when we form teams as we are invigorated with a new challenge. We also know this does not last forever and we know teams go through a storming phase when they are new. It takes time to get used to working with new people before we can form ways of working and finally reach our collective potential.

Depending on the length of engagement of a feature we should be sensitive to this and what the people in that team are going through. They should be fully bought into joining and working in a feature team and ideally volunteered rather than were placed. This might be difficult to achieve in some organisations and we should expect some people will not be a good fit for prioritised feature teams.

It is a safe assumption that features cut across multiple systems so these people would be touching multiple systems in order to deliver the feature end to end. You can read of the many accounts of how this can be handled and it really comes down to disciplined engineers doing things 'the right way' for their context.

I know I have seen many of these in my career but I have seen more engineers who would also start to change systems outside of the feature requirements because they did not like them or were acting altruistically and leaving them 'a better place' (Boy Scout Principle). Getting the balance right is incredibly hard, which is why I think the success of changing ownership from discrete systems to features is more to do with people than process.

We have to remember that software engineering is incredibly subjective so there is no one way to solve problems and coming to a consensus across 10s of engineers is hard. It's even harder with 100s of engineers, no matter how well meaning they are.

So we can mitigate some of these problems with a permanent feature team, which we can help through the forming stage and stabilise only once. They can form their own arrangements and start building things pretty effectively, getting that balance of engineering practices through evolving agreements.

There are many upsides to feature teams from a development perspective as they allow people to have exposure to multiple systems and architectures and work on differing problems which can often become stale in platforms. The counter is that without ownership you need some pretty well developed disciplines in your engineering teams that will prevent systems rotting from being constantly adapted to meet the requirements of features.

You would also need to be much better at moving people around since features typically need different systems to be delivered. So the permanent team probably is not so permanent unless you expect the team to learn the systems as they go, which might be going against the improvements in delivery you expect from less coordination.

It's operationally where I see many cracks and I call this the '2am problem'. Put simply, who do we call at 2am if there is a problem? The prioritised feature team may not be around anymore and although that problem is solved with a permanent feature team it may not be clear which team needs to be involved since you would need deep system knowledge in order to link the fault with the feature that requires it.

Let's contrast this with a platform team for a second. If a system has a problem, there is a clear owner for that system and that is who we call. In a feature team world, multiple features may have altered a system and when a fault occurs it might be quite difficult to know which team has the knowledge to resolve the issue.

This actually looks worse over time. With a permanent feature team delivering multiple features touching many systems they are now having to support those features as an ongoing concern if we are to preserve the very sensible mentality that the teams that develop the systems should also monitor and maintain them. Think also what this looks like to new members of the team where they have to learn how multiple features work end to end, which could well be more difficult than learning how a discrete system works and how that fits in with other systems.

You should also expect that any sort of centralisation will also be impacted as you need to allow feature teams to find their own path. If you start to force centralisation on feature teams then you again suffer from coordination problems which will slow them down again. We should expect and maybe even embrace duplication of effort as the same problems are solved several times over. To be fair this is something we also struggle with in platform teams too but I would expect it to be more pronounced in feature teams since they work across multiple systems.

A coping mechanism would be a centralised support function that we hand over support to. This feels like we are crashing out of devops thinking entirely which we have seen to improve stability and time to recovery of systems since they emphasise ownership and responsibility. It also hides the true cost as we burn time in handovers, documentation and operational procedures which extend out of the development cycle - essentially, you pay for it in the long term (OPEX vs CAPEX)

This feels like I am not a fan of feature teams, which is not true. They solve many of the coordination problems we see with platform teams but they introduce other problems. This is what forms the majority of my learning with scaled systems - you cannot win. With every change, you introduce some problems which you need to be aware of. Ultimately we have to decide which problems we would prefer to solve.

Tuesday, 30 April 2019

Are we expecting too much?

Let's do some maths:

(67-x)/2 = y

So if I told you to substitute 'x' for your current age, what do you get for 'y'?

For me, at this current moment in time 'y' would be 12.5 since I am 42.

That is the number of positions and pay rises I would need to occupy me until my retirement age. This is based on a very reasonable expectation that I would be promoted every 2 years.

12.

Are there any career ladders that have 12 steps in them? Nope. Is the top one the 'Supreme Lead Senior Manager <whatever>'? There aren't even enough adjectives to describe these roles.

Also bear in mind I have already been doing this for 25 years so if you are just starting out you have an even bigger number.
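If you want to play with the numbers, the sum above fits in a few lines of Python. This is only a sketch: the function name is made up, and the retirement age of 67 and the promotion every 2 years are the same assumptions used in the formula.

```python
def promotions_until_retirement(age, retirement_age=67, years_per_promotion=2):
    """Number of promotions needed to fill the years left until retirement."""
    return (retirement_age - age) / years_per_promotion


if __name__ == "__main__":
    print(promotions_until_retirement(42))  # the age used in the post
    print(promotions_until_retirement(22))  # a hypothetical new starter
```

At 42 this gives 12.5. For someone starting out at 22 (an age picked purely for illustration) it gives 22.5.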

How are you going to fill the decades between when you reach that top role and your retirement? Does the tech industry even have a track record for this? How about another game.... name the oldest person doing your role in your organisation. Are they 60? How about 50? 40? 30?

It is probable that we will not be with our current company until we finish working. Looking across the tech sector, 3 years seems like a pretty good run. I have seen some data that suggests 2 years is more likely.

Why not be honest?

In your time with your current company, what do you want to achieve? What do you want to learn? Who do you want to be taught by? Who can you teach and what could you give back? What stories do you want to take with you? Which people will you want to keep in touch with? What great ideas will you emulate? What experiments will you convince people to run with? What improvements will you leave behind? What new habits will you form?

How long do you think that will take?

We all walk in to a company and totally ignore the end.

Expecting organisations to somehow create roles and an ever increasing pay packet is simply too much. They are often just a chapter in our total story - when we understand that our time together will end, we can choose to treat that time with the importance it deserves and make decisions that benefit both us and others.

People leaving us stronger, better, more confident is surely an outcome we should look for and even celebrate.

Clearleft had a lovely way of seeing this:
"Our passion for the digital community and our innate collective desire to make a meaningful contribution towards enabling design to thrive beyond our studio is something that continues to excite and motivate us. The success of ex-employees is one part of that which continues to make us all proud."
Let's start setting an expectation that the time we spend together will be full of mutual learning and value and when it ends we will both be better, wiser and still friends.

Saturday, 16 March 2019

What we can learn from booking a meeting

You have probably had to book a meeting. Tell me if this sounds familiar....

I open up the calendar, find my perfect slot and then start adding delegates.

That's when the problems start. Of my 10 people, I try to find a slot that matches everyone's availability.

Epic fail! As hard as I try, around the date I wanted there is no 'magic' slot where everyone is available.

I have an additional problem in that some people have decided to not allow anyone to read their calendar so I have no idea when would be convenient for them. Oh well, everyone likes a surprise I guess.

The only people I have success with are those people with gaps in their calendar. People who are back to back force me to look at different options.

I also notice that people who have shorter meetings are easier to fit things around. Especially if there are gaps on either side.

The only way I can guarantee everyone's availability is to move it way, way out into the future. For this particular meeting, that's really not an option.

This is where most people would probably just book their ideal slot, giving a compelling background and agenda and adding lots of bold and uppercase writing - basically saying "You need to attend this, move stuff around!". This strategy means you expect flexibility from your delegates.

Maybe you force people to attend. You email each one, explaining it's not optional and that they need to sort something out. I wish I were that important.

Neither of those is an option for me, so what else could I do?

I could book this out of hours..... they would be available then, right? Failing that, I could always book over lunchtime, since that is something people can be flexible with. I would of course be pure evil if I did this but as long as I provide biscuits, I might survive.

So let's look at what would make this successful:

1) Visibility - the more I can see, the better the chances of being able to find something that works for everyone.
2) Flexibility - You could see this from both parties. I need to be flexible with my dates and my delegates need to be flexible with theirs. I might find something that works for most people but some people will need to fall into line.
3) Slack - Without slots anywhere in a day, you will always have to be flexible. So slack helps. People who have planned their days from beginning to end can only disappoint others when they have to change their plans or not attend. Size is a big part of this - the shorter your meetings, the more options this creates when we also introduce slack using gaps in your calendar.
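The three points above can be shown with a toy calendar search. This is a minimal sketch in Python with made-up names and data: each calendar is a list of busy (start hour, end hour) pairs, a calendar nobody is allowed to read is `None`, and the working day is assumed to run from 9 to 17.

```python
def common_slots(calendars, length=1, day_start=9, day_end=17):
    """Return (start hours that suit every visible calendar, delegates we cannot see)."""
    hidden = [name for name, busy in calendars.items() if busy is None]
    visible = [busy for busy in calendars.values() if busy is not None]
    slots = []
    for start in range(day_start, day_end - length + 1):
        end = start + length
        # A clash is any busy period that overlaps the candidate slot.
        clashes = any(
            start < busy_end and busy_start < end
            for busy in visible
            for busy_start, busy_end in busy
        )
        if not clashes:
            slots.append(start)
    return slots, hidden


# Hypothetical delegates: one is back to back, so nothing can ever fit.
back_to_back = {"ann": [(9, 13), (13, 17)], "bob": [(10, 12)]}

# The same day with gaps (slack), plus one delegate with a hidden calendar.
with_slack = {
    "ann": [(9, 11), (12, 14), (15, 17)],
    "bob": [(9, 10), (13, 15)],
    "cat": None,
}

if __name__ == "__main__":
    print(common_slots(back_to_back))          # no slot at all
    print(common_slots(with_slack, length=1))  # a short meeting fits at 11
    print(common_slots(with_slack, length=2))  # a longer one does not
```

With these example calendars the 1 hour meeting fits at 11, the 2 hour meeting does not fit anywhere, and 'cat' is a surprise either way because we have no visibility of her day.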

In an organisation that uses a platform model, this metaphor is close to problems with planning at scale.

To build anything, you need to align your development with other areas of the business which have other stuff going on too. It is only together that you can get something to your customer.

Differing priorities for each platform required to build a new feature can reduce flexibility.

We are often so transfixed with utilisation of our development people that we fail to see the side effects of this local optimisation - in this case, it makes it even harder to get something in front of our customer. Without slack in our planning systems we offer no opportunities to adjust without making a bigger change somewhere else.

Following the metaphor, we often see long wait times as we have to wait for everyone to become available to even start. This increases lead times as it delays when we actually start the work.

The sizes of work also play a huge part. If batches of work take months then we have lengthy waits and it becomes even harder to align all the platforms that need to contribute. If the owners of those platforms are not flexible either, the problem gets even worse as we cannot negotiate.

We always have the option to stop in the middle of batches but we need to be aware of the waste this might create - the team will have to stop work, potentially having to maintain a branch until they can work on it again, as well as lose track of where they were in that work.

As much as we would like people to just switch around, the reality is this stuff is hard and a mental model of what we are doing takes a while to form. We are human, we forget. We also lose track of how important focus is to development teams - swapping and changing types of work does not help!

In a system with a platform setup, the only way to win at planning is to start to change people's behaviours so you have visibility, flexibility and slack in their local systems to allow better planning at an organisational level.

You could force alignment but be aware of the waste this will create in your system which will probably be hidden from sight.

Or you could simply delay work until you find the 'magic' slot :)

All of this is a fantastic argument for vertically aligned feature teams which promise to solve all of these problems. I will write about these soon....

Monday, 25 February 2019

Maturity measured by Retrospectives

Having built up a wide range of retrospectives I can't help but notice the changes teams go through as they mature in their practices.

If you think about it, if we are continually improving our delivery of software then it follows that our retrospectives will start to look and feel different as we focus on different problems.

1) Stop, I want to get off! - I think the first stage, after the euphoria and excitement of moving to an agile process, is to whinge. The complaints come thick and fast and it's usually everyone else's problem. It is cathartic but does not really move things forwards. Actions are probably difficult to get. The core problem is that the team don't want to own the process - they have been used to someone else owning it and have not accepted responsibility for it (or have not been allowed to).

2) Own it - Eventually, the team accept that although there are wider problems they own a lot more than they thought they did. I use the 'Perfect Sprint' retro to reset this - showing the team that their perfect sprint is largely in their control helps us get there. They start to come up with actions that they can do but these tend to be poorly managed. The team are still not clear that they own the process and it's still someone else's fault if it isn't working for them.

3) Going through the motions - It's easy coming up with ideas but putting them into action is another problem. The team essentially go through the motions, generate actions but don't have the guts or inclination to go through with them even if they are great ideas. If nothing is done about the actions then there is little point in the retro itself. The core problem is that carrying out actions requires a change in behaviour, specifically it means the team need to take responsibility for their process. They realise they own it but changing it might still seem scary or too much work.

4) This is boring :( - To me, this is a symptom of the process being OK and working for the team. The actions from the sprints might be predictable and may be difficult to implement since there is nothing much they want to improve when they look at the whole process. It feels boring to the team since everything is OK and the improvements are not immediately obvious or even with the process itself, which has been the primary focus up until now. Actions are pretty well managed but they might look and feel 'samey'.

5) Dive, Dive, Dive! - This is where we look at very specific areas rather than the process. Maybe this is a specific story or problem that the team have encountered. I have looked at metrics, team values, customers and suppliers and other focused topics. These focused retrospectives usually result in actions that are more difficult to put into action e.g. influencing other members of the organisation and usually take longer to carry out. The team will usually be good at keeping track of actions at this point and holding themselves to account.

6) Can we do that? - The team are creating actions and the scope of those actions is growing into a space where they are questioning what they can change. The team starts to ask to own more from its stakeholders and possibly changing the status quo.

7) Hands off - For me the last stage of this process is when the team own the retrospectives too. They see value in the ceremony and are disciplined in keeping track of and carrying out the actions they come up with. They challenge and support each other in carrying out actions and ask for help or advice if they need it.

This is not a sequential process and regressions are common. These are the stages I have seen and think about when working with a team - the goal being for them to own their process and its evolution whilst being honest enough to ask for help if they need it.

Thursday, 3 January 2019

Using extremes to help teams

This is a technique I have been using for a while and it has a wide range of applications. If you are familiar with affinity mapping, then this takes that and expands it for use in several different areas.

Let's start off with a simple application....

Story sizes have not been working for me for a while. I love the conversations which often uncovered something we had forgotten or someone knew that nobody else did. Unfortunately differing sizes were often a catalyst for these conversations.

In my previous blog about super fast sizing, we used a different way of sizing that I think gets you the best of both worlds.

How I use this in refinement is knowing where to put more effort. 5 days or less needs no more intervention. The team I work with now even have something called a 'fast track', which is 1 day turn around which they do not refine since it would cost more to talk about than do.

More than 2 sprints means we need to spend some time looking at how we could break it down. The ones in the middle we also try to break down but can also accept the increase in risk if we want to.
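The thresholds above are simple enough to write down as a rule. Here is a minimal sketch in Python. The function name is made up, and the sprint length of 10 working days is my assumption for the example, so swap in your own.

```python
def refinement_action(estimate_days, sprint_days=10):
    """Suggest how much refinement a story needs from a rough size in days."""
    if estimate_days <= 1:
        return "fast track: just do it, talking would cost more"
    if estimate_days <= 5:
        return "no more intervention needed"
    if estimate_days > 2 * sprint_days:
        return "spend time working out how to break it down"
    return "try to break it down, or accept the extra risk"


if __name__ == "__main__":
    for days in (1, 4, 12, 30):
        print(days, "->", refinement_action(days))
```

The point is not the code itself but that the extremes (very small and very large) are easy calls, which leaves the conversation for the stories in the middle.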

Asking these questions and gauging the response from the team allows you to invite conversations as a facilitator and help the team explore the scope of the story.

You can also use this for long term estimates. When I was asked about how long a piece of work could take I used extremes to give some idea - "4 months is too pessimistic but 2 months feels too optimistic but it will be somewhere between the two. With progress on this bit of work I would expect this to come in as we know more, which I can update you on as we progress". This was just enough to allow our stakeholder to plan.

I often use fist of five voting to get people's feedback on ceremonies. This again uses extremes and you can be playful to make it more fun e.g. ".... where no fingers is 'please, please never ever do this to me again' and 5 fingers is 'this is sooo cool, I think I might get in early just so we can sneak another one in before work'"

Recently, I have used extremes to challenge the status quo. When looking at staff retention, we can honestly ask what would happen if nobody left - is that ideal? This extreme stance makes us realise that the extreme is not ideal either so we can start to ask questions about what is the ideal. In terms of staff retention, we can be realistic about what to expect using the extremes to guide that thought process.

My favourite comes in exploring ideas. We had an example where we were talking about testing strategies and testing environments. We used an extreme to explore some new thinking rather than just what we had.

In this extreme, we asked what our testing strategy would look like if we only had our production environment and our local development environments - no dev or integration environments. We explored what branching and releases might look like, what testing we could do and where, and the risks we would face.

It's deceptively simple and you have probably been using it without realising in some areas. Enjoy!

Friday, 16 November 2018

Technical Debts and Loans

Yesterday I had one of those fantastic conversations where suddenly ideas crystallise and take a new unexpected form.

Talking with one of my team about technical debt we were musing on if that was the right word. There are so many ways technical debt could be created we were wondering if having a single word was helpful.

For example we may have done something that was absolutely the right thing to do at the time only for us to learn a better way of doing it in the future. We have debt to clear up but it was not done on purpose.

In another instance we may make a decision to do something in a less than ideal way to enable something else. A common example could be to meet a deadline or recoup some lost time.

This second one, my esteemed colleague suggested sounds more like a loan than a debt. We have traded something for something else we need right now and have the expectation that it will be paid back at some point.

This trade off is a decision and these are difficult to keep track of. Someone else might not have the context of this decision - with the code alone, it just looks like debt.

Let's take that loan metaphor a little further....

If we were going to take out a loan we would definitely have a record of it. It would contain a term and conditions along with an agreement of how much this will cost us at the end of the term.

We would also be aware that this would cost us more than it would have. We have traded getting it now for a higher cost, which we have decided is beneficial to us in the short term. This ensures we are happy with this cost of this service.

We would agree payment terms too, upfront, so we all know when the loan will be repaid in full.

The amount this will cost us depends on several factors which are linked to risk. Where the risk of non-repayment is low the cost of the loan is low and we have more flexibility on the length of the term. 

Where the risk is high, the cost of the loan increases and term typically shortens.

The purpose of the loan is also a factor. An investment which can cover the risk, like a house, will typically lower the risk, whilst something like a car which depreciates quickly will increase it.

Finally the person taking out the loan is considered. In the UK, we have a credit score system which scores your risk as an individual - where your track record on loans and repayments is taken into account. 

If you have a habit of not paying loans back, you can be sure you won't be considered a good risk for future loans.

At the extreme, where loans have not been paid after several requests and warnings, collectors will be employed to forcibly recoup the outstanding amount along with additional fees to cover the hassle.

So, let's apply this to some software development!

We have the option of delivering something a bit faster if we trade off some technical area. For the moment, let's assume our stakeholder has a good line of credit so we offer a 'technical loan agreement'.

We outline what we are trading off and what the implications will be in the future. We decide on the risk of this and let that inform the term of the repayment. This term is the maximum amount of time the loan can be left unpaid based on the risk it represents to us as a development team.

We all agree this is the right thing to do and we store the loan agreement as a document which is included in the source for the project concerned. It acts as a permanent record of that decision.

When it comes to prioritisation of work, the team will expect a slice of the throughput to address the debts, which is how we make the repayment. These are refined along with all other stories and we can use forecasting to make sure the delivery of them is inline with the terms of the loan agreement.

When the loan is repaid in full, the document is removed from source but the history can still be relied on if we need it in the future.

If multiple loans are being repaid, there may come a point where this becomes unaffordable for the stakeholder - the repayments for all the outstanding loans exceed the number of stories we are capable of delivering. The development team can refuse to give a loan until the situation improves e.g. the stakeholder could pay off the outstanding loans in full by giving all the available stories to the team.
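To make the agreement and the affordability check a little more concrete, here is a minimal sketch in Python. All of it is hypothetical: the field names, the idea of measuring repayments in stories per sprint and the example loans are my assumptions for illustration, not a format anyone has to follow.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TechnicalLoan:
    """One 'technical loan agreement'; every field name here is hypothetical."""
    trade_off: str           # what we gave up to deliver sooner
    risk: str                # e.g. "low" or "high", agreed with the stakeholder
    repay_by: date           # end of the agreed term
    stories_per_sprint: int  # slice of throughput reserved for repayment
    repaid: bool = False

    def is_overdue(self, today):
        """True when the term has passed and the loan is still outstanding."""
        return not self.repaid and today > self.repay_by


def repayments_due(loans):
    """Stories per sprint already promised to outstanding loans."""
    return sum(loan.stories_per_sprint for loan in loans if not loan.repaid)


def can_offer_loan(loans, new_loan, capacity_per_sprint):
    """Refuse a new loan if repayments would exceed what the team can deliver."""
    total = repayments_due(loans) + new_loan.stories_per_sprint
    return total <= capacity_per_sprint


if __name__ == "__main__":
    existing = [
        TechnicalLoan("skipped integration tests", "high", date(2019, 1, 31), 3),
        TechnicalLoan("hard-coded config", "low", date(2019, 6, 30), 2, repaid=True),
    ]
    request = TechnicalLoan("no caching layer", "low", date(2019, 3, 31), 2)
    print(can_offer_loan(existing, request, capacity_per_sprint=5))
    print([loan.trade_off for loan in existing if loan.is_overdue(date(2019, 2, 1))])
```

In practice the agreement would more likely live as a short document next to the code, as described above. The sketch only shows what information it would need to hold for the checks to be possible.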

Where the terms of a loan are not met, we have some options. 

In cases where not paying loans becomes a habit, our stakeholder's credit rating would also be impacted. We could only extend loans to low risk decisions, limiting the options for our stakeholders. We might even stop loans being offered entirely until the situation improves. The stakeholder may have to rebuild their credit rating with us before they get what they want.

If we have to forcibly collect on a loan (after having asked nicely, many times), we would take stories away from our stakeholders until the debt was paid in full, slowing delivery and probably causing some pain in the process. This could also affect our view of the stakeholder's credit rating since we had to intervene.

This may seem playful but it increases visibility of these decisions and gives you feedback which can help build better behaviours across the product and technical teams. 

I have written on this subject before, check out "Debts and Credits in your backlog".