The Product Backlog is a feature list. Or a list of User Stories if that's your approach. Either way, it is a simple list of things that are of value to a user - not technical tasks - and they are written in business language, so they can be prioritised by the Product Owner. There are no details about each feature until it is ready to be developed, just a basic description and maybe a few notes if applicable.
'POINTS MAKE SIZES'
Each item on the Product Backlog is given a points value to represent its size. Size is an intuitive mixture of effort and complexity. It's meant to represent 'how big it is'.
I like to use numbers from the Fibonacci sequence for the points values: 1, 2, 3, 5, 8, 13, where each number is the sum of the previous two. This builds a natural spread into the estimates. The bigger something is, the less precise its estimate can be, and this is reflected in the widening gaps between the numbers as they get bigger.
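The sketch below generates the points scale and the gaps between neighbouring values, so you can see the gaps widen as sizes grow. It assumes the scale starts at 1, 2 rather than the mathematical 0, 1, 1, and the function names are purely illustrative.

```python
from __future__ import annotations


def fibonacci_points(limit: int) -> list[int]:
    """Return the points scale 1, 2, 3, 5, 8, ... up to and including `limit`.

    Starts at 1, 2 (not the mathematical 0, 1, 1) so every value is a
    distinct, usable size.
    """
    scale = [1, 2]
    # Each new value is the sum of the previous two.
    while scale[-1] + scale[-2] <= limit:
        scale.append(scale[-1] + scale[-2])
    return [p for p in scale if p <= limit]


def gaps(scale: list[int]) -> list[int]:
    """Differences between neighbouring values on the scale."""
    return [b - a for a, b in zip(scale, scale[1:])]


if __name__ == "__main__":
    scale = fibonacci_points(13)
    print(scale)        # [1, 2, 3, 5, 8, 13]
    print(gaps(scale))  # [1, 1, 2, 3, 5]
```

A 1 and a 2 are one point apart, but an 8 and a 13 are five apart. That reflects how much less precise an estimate for a big item can be.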
Points are abstract numbers. They do not convert to a unit of time. They are simply a *relative* indication of size. In other words, a 2 is about twice the size of a 1, and a 5 is bigger than a 3 but smaller than an 8. Developers find it hard to estimate accurately in hours or days when they don't yet know the details of the requirements or what the solution involves. But it's much easier to compare the size of two features relative to each other.
ESTIMATE AS A TEAM
The points should be assigned to each backlog item as a team. Collective intelligence, or the wisdom of crowds, is an important way to apply several people's experience to an estimate. If you have a very big team, you can split into smaller groups to get through the items more quickly. But each estimating group should ideally include at least 3 people, so you don't just get two opposing opinions.
Planning Poker is a fun technique to facilitate rapid estimating as a team. The team discusses a feature verbally to understand more about what it entails and how it might be done. Each team member writes what they think its size is (in points) on a card. All team members reveal their cards at the same time. Differences in opinion are used to provoke further discussion. Maybe one person saw risks and complexity that others didn't. Maybe another person saw a simpler solution. The team re-votes until there is a consensus, then moves on to the next item.
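The sketch below models one Planning Poker item as a series of simultaneous reveals. It makes two assumptions. First, "consensus" means everyone showed the same card. Second, the people with the lowest and highest cards explain their reasoning before the re-vote, which is a common convention rather than something prescribed here. The names `poker_round` and `run_poker` are hypothetical.

```python
from __future__ import annotations


def poker_round(votes: dict[str, int]) -> tuple[bool, list[str], list[str]]:
    """Evaluate one simultaneous reveal of cards.

    Returns (consensus, lowest_voters, highest_voters). Consensus is assumed
    to mean that everyone showed the same card.
    """
    low, high = min(votes.values()), max(votes.values())
    if low == high:
        return True, [], []
    lowest = sorted(name for name, points in votes.items() if points == low)
    highest = sorted(name for name, points in votes.items() if points == high)
    return False, lowest, highest


def run_poker(rounds: list[dict[str, int]]) -> int | None:
    """Play successive rounds of votes until the team agrees on a size."""
    for votes in rounds:
        consensus, lowest, highest = poker_round(votes)
        if consensus:
            return next(iter(votes.values()))
        low, high = votes[lowest[0]], votes[highest[0]]
        # Assumed convention: the outliers explain their reasoning, then everyone re-votes.
        print(f"Discuss: {', '.join(lowest)} chose {low}, {', '.join(highest)} chose {high}")
    return None  # no consensus yet; keep discussing


if __name__ == "__main__":
    rounds = [
        {"Ann": 3, "Bob": 8, "Cat": 5},  # first reveal: opinions differ
        {"Ann": 5, "Bob": 5, "Cat": 5},  # re-vote after discussion
    ]
    print(run_poker(rounds))  # prints the discussion prompt, then 5
```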
DONE MEANS DONE
During the Sprint, or iteration, the team only counts something as Done when it is completely done, i.e. tested and signed off by the Product Owner. At that time, and only at that time, the team scores the points for the item.
The team shows its commitment and daily progress on a graph, so it is measurable and visible at a glance. This is called a Burndown Chart. The burndown starts at the total number of points committed to and shows it decreasing steadily to zero by the end of the Sprint. This is the target line. It also shows the actual number of points remaining each day, i.e. the points committed minus the sum of points for all items that are 100% done and signed off so far. The team plots this each day before their daily stand-up meeting. When the actual line is above the target line, the team is behind. When it's below, they're ahead.
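The sketch below computes the two burndown lines. It assumes a straight-line target from the committed total down to zero on the last day, and it takes a hypothetical list of the points that reached Done on each day. Only fully done, signed-off items count, so unfinished work never reduces the remaining total.

```python
from __future__ import annotations


def burndown(committed: int, sprint_days: int,
             done_by_day: list[int]) -> list[tuple[int, float, int, str]]:
    """Return (day, target, remaining, status) for each day recorded so far.

    done_by_day[i] is the points for items that became Done (tested and signed
    off) during day i + 1. Partly finished items score nothing.
    """
    rows = []
    remaining = committed
    for day, done_points in enumerate(done_by_day, start=1):
        remaining -= done_points
        # Target line: a straight line from `committed` down to 0 on the last day.
        target = committed - committed * day / sprint_days
        if remaining > target:
            status = "behind"
        elif remaining < target:
            status = "ahead"
        else:
            status = "on track"
        rows.append((day, target, remaining, status))
    return rows


if __name__ == "__main__":
    # Hypothetical Sprint: 20 points committed over 10 days, first 5 days recorded.
    for day, target, remaining, status in burndown(20, 10, [3, 0, 5, 2, 0]):
        print(f"Day {day}: target {target:g}, remaining {remaining} ({status})")
```

On day 2 nothing new was finished, so the actual line sits above the target and the team is behind. By day 5 it is back on the target line.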
At the end of the Sprint, the team's score is called their Velocity. The team tracks its Velocity over time, which lets it see whether it's improving. Of course, at some point it will stabilise if the team itself is stable. If it doesn't, that is an issue in itself. When Velocity is relatively stable - in my experience that will be after 3 or 4 Sprints - it can be reliably used to decide how much (i.e. how many points) the team should commit to in the next Sprint.
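One way to turn a Velocity history into a commitment is sketched below. It averages the last few Sprints and declines to suggest anything until there is enough history. The window of 3 Sprints and the use of a simple mean are assumptions, loosely matching the "3 or 4 Sprints" above. The function name is hypothetical.

```python
from __future__ import annotations

from statistics import mean


def next_commitment(velocities: list[int], window: int = 3) -> int | None:
    """Suggest next Sprint's commitment from the mean of the last `window` Velocities.

    Returns None until there are at least `window` completed Sprints, because
    Velocity needs a few Sprints to settle before it is a useful guide.
    """
    if len(velocities) < window:
        return None
    return round(mean(velocities[-window:]))


if __name__ == "__main__":
    history = [18, 24, 27, 26, 28]  # hypothetical Velocities, oldest first
    print(next_commitment(history))      # 27
    print(next_commitment(history[:2]))  # None
```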
RELIABILITY / PREDICTABILITY
As a result, the team can measure how reliable - or how predictable - they are. The metric for this is Velocity (points scored) as a percentage of points planned. As Velocity stabilises, the team's Reliability will get better, and the team will be better at predicting what they can deliver. Ironically, the team doesn't need to get better at estimating to get better at delivering on their commitments. Even if they are terrible at estimating, as long as they are consistently terrible, with this method they will still get better at predicting what they can deliver.
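The point about consistently poor estimates can be shown with a deliberately simple, hypothetical model. Picture a stable team whose estimates are always off by the same factor, so it scores the same number of points every Sprint. After the first Sprint, it commits to its previous Velocity. Reliability is computed as described above: Velocity as a percentage of points planned. The function names are illustrative.

```python
from __future__ import annotations


def reliability(velocity: int, planned: int) -> float:
    """Velocity (points scored) as a percentage of points planned."""
    if planned <= 0:
        raise ValueError("planned points must be positive")
    return 100.0 * velocity / planned


def simulate_consistent_team(first_commitment: int, velocity: int,
                             sprints: int) -> list[float]:
    """Hypothetical model of a stable team with consistently biased estimates.

    Because the bias never changes, the team scores the same `velocity` every
    Sprint. After the first Sprint it commits to its previous Velocity.
    """
    history = []
    commitment = first_commitment
    for _ in range(sprints):
        history.append(reliability(velocity, commitment))
        commitment = velocity  # plan the next Sprint from the track record
    return history


if __name__ == "__main__":
    # First guess of 40 points, but the team actually scores 20 each Sprint.
    print(simulate_consistent_team(40, 20, 3))  # [50.0, 100.0, 100.0]
```

The estimates never improve in this model, yet Reliability reaches 100% once the commitment is based on the team's track record.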
POINTS VERSUS TIME
One of the benefits of points is that they do not relate to time. Resist the temptation to convert them. If a team plans on 100 points and delivers 50, can you imagine telling your stakeholders that you are only planning future Sprints for half the team's time? If a team commits to 100 points and delivers 150, imagine telling the team you're now planning on them working 60 hours each per week. It just doesn't work. Points are not a measure of time. They are abstract, relative sizes, and a measure of how much can be delivered. That's why it works: the team can adjust its commitment based on what its track record shows it can usually deliver.
Velocity is not an absolute measure of a team's productivity. It does tell you whether a team is getting more or less productive over time. But you can't really use Velocity to compare the productivity of two teams, as their circumstances are different. Nor can you use it to determine whether a team's Velocity is as high as it should be. For that, you still need to use your judgement, based on previous experience and taking into account many subjective factors.
PLAYING THE SYSTEM
Using these two metrics, Velocity and Reliability, it's hard to cheat the system. If a team commits too low, it achieves high Reliability but its Velocity goes down. If a team commits too high, its Velocity goes up but its Reliability goes down. This is like the balanced scorecard concept: the metrics deliberately measure opposing things, so they can't easily be played.
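The sketch below shows the two metrics pulling against each other. It uses an assumed, simplified model: the team can finish at most a fixed number of points, and a commitment below that is exactly what gets delivered. The capacity of 40 points and the three commitments are hypothetical.

```python
from __future__ import annotations


def scorecard(commitment: int, capacity: int) -> tuple[int, float]:
    """Return (Velocity, Reliability %) for one Sprint under a simple model.

    Assumed model: the team can finish at most `capacity` points, and a
    commitment below capacity is exactly what gets delivered.
    """
    velocity = min(commitment, capacity)
    return velocity, 100.0 * velocity / commitment


if __name__ == "__main__":
    capacity = 40  # hypothetical points the team can really finish
    for commitment in (20, 40, 60):  # commit low, honestly, too high
        velocity, reliability = scorecard(commitment, capacity)
        print(f"commit {commitment}: Velocity {velocity}, Reliability {reliability:.0f}%")
```

Committing low keeps Reliability at 100% but halves Velocity. Committing high raises Velocity but drops Reliability to about 67%. Only an honest commitment scores well on both.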