Why does Radix use a UTXO model instead of a balance model?

Short answer - because if you store balances, you cannot do state sharding without making transactions non-atomic. A UTXO-style system of consumables allows you to state shard with zero destination overhead, whereas a balance model incurs significant destination overhead, making sharding inefficient and secure only through some form of centralised orchestration.

Long answer - to answer this question we need to first establish that sharding is necessary - generally speaking there are two constraints with a DLT:

1. Transaction throughput

2. Storage throughput

With a DAG your transaction throughput is higher than on a blockchain as the transport can be optimized - you are dealing with transactions individually rather than as a block. However, those transactions still need to be validated, and at around 2,000 transactions per second on a single shard (assuming standard servers) you will see the performance of the full nodes on the network start to drop significantly. Pruning the ledger does nothing to reduce this load, as this is a throughput constraint, not a storage constraint.

To scale beyond the 2,000 transactions per second mark you will therefore need to start splitting the throughput load; this requires sharding the network, splitting the work amongst the nodes, rather than requiring all the nodes to do all the same work.
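As a rough back-of-the-envelope sketch of what splitting the load means, the snippet below estimates how many shards a target throughput would need. It assumes the roughly 2,000 transactions per second per-shard figure quoted above, perfectly even load, and no cross-shard overhead. Those are simplifying assumptions, and the function name `shards_needed` is purely illustrative.

```python
import math

# Approximate per-shard ceiling quoted in the text (standard servers assumed).
PER_SHARD_TPS = 2_000


def shards_needed(target_tps, per_shard_tps=PER_SHARD_TPS):
    """Minimum shard count for a target load, assuming the load splits evenly
    and cross-shard traffic is free (an idealisation)."""
    return max(1, math.ceil(target_tps / per_shard_tps))


print(shards_needed(10_000))  # 5
```

In practice the cross-shard overhead is exactly what the rest of this answer is about, so treat this as a lower bound rather than a sizing rule.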

Now that you are splitting up the ledger into parts, you need a method of dealing with transactions between shards.

Using balances rather than UTXOs can certainly reduce the storage requirements of the ledger on a given shard, as you can prune the transaction history; however, one shard's output is no longer another shard's input without overhead. Since no transaction history is being stored, just balances, transactions between shards are not atomic, and this starts creating serious complexity. As a worked example:

There are four shards: A, B, C and D. I start with a balance of 50 on Shard A and send 10 to wallet X on Shard B.

My balance is now recorded as 40, and at some unknown time in the future wallet X will have a balance of 10 (let's disregard the mechanics of how wallet X becomes aware of the spend in Shard A).

However, I also send the same 10 to wallet Y on Shard C, and wallet Z on Shard D.

If I am only recording balances and not UTXOs, and only one of those transactions is valid, how do I void the updated balances of the wallets on the other two shards once the triple spend in Shard A is resolved?

The overhead I tried to avoid by making transactions non-atomic has simply moved to the double spend resolution process. I now have substantial inter-shard communication to perform to correct the double spend, because the output of one shard is not the input of another: we have no transaction history from which to construct a balance, just the balances themselves.
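The worked example can be sketched as a toy simulation. This is only an illustration of the balance-only scenario described above, not any real ledger's implementation; the class `BalanceShard`, the function `simulate_triple_spend` and the wallet name `"me"` are hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BalanceShard:
    """A shard that stores only balances, with no transaction history."""
    name: str
    balances: dict = field(default_factory=dict)

    def credit(self, wallet, amount):
        # The shard has nothing to check the credit against, so it just applies it.
        self.balances[wallet] = self.balances.get(wallet, 0) + amount


def simulate_triple_spend():
    a, b, c, d = (BalanceShard(n) for n in "ABCD")
    a.balances["me"] = 50

    # Shard A records the spend of 10 once: 50 -> 40.
    a.balances["me"] -= 10

    # The same 10 is announced to three destination shards.
    for shard, wallet in ((b, "X"), (c, "Y"), (d, "Z")):
        shard.credit(wallet, 10)

    # Total value now recorded across all four shards.
    return sum(sum(s.balances.values()) for s in (a, b, c, d))


print(simulate_triple_spend())  # 70, although only 50 ever existed
```

Each destination shard ends up holding a credit of 10 with nothing that links it back to a specific spend on Shard A, which is why undoing two of the three credits needs extra inter-shard communication.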

Now there is a coordinator requirement again - some sort of central authority with a guaranteed, up-to-date view of all shards, to make sure that any given spend only updates one of the balances on any shard.

Without that, without transaction history, and without atomicity, double spending between shards becomes trivial.
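For contrast, here is a minimal sketch of the same triple spend attempted against consumable (UTXO-style) outputs. It assumes I hold a single output worth 10, identified by a unique id, and that the first attempt to consume it wins; how a real network decides which conflicting spend wins is out of scope here. `Consumable`, `SourceShard` and the ids are hypothetical names for illustration, not Radix's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumable:
    """An unspent output: a unique id, an owner and an amount."""
    cid: str
    owner: str
    amount: int


class SourceShard:
    """Tracks which outputs on this shard are still unspent."""

    def __init__(self):
        self.unspent = {}

    def add(self, consumable):
        self.unspent[consumable.cid] = consumable

    def consume(self, cid, new_owner, new_cid):
        """Consume an output at most once, producing a new output for new_owner.

        Returns None if the output is unknown or has already been consumed.
        """
        old = self.unspent.pop(cid, None)
        if old is None:
            return None
        return Consumable(new_cid, new_owner, old.amount)


def attempt_triple_spend():
    shard_a = SourceShard()
    shard_a.add(Consumable("out-1", "me", 10))
    # Try to spend the same output to wallets X, Y and Z.
    return [shard_a.consume("out-1", w, f"out-{w}") for w in ("X", "Y", "Z")]


for result in attempt_triple_spend():
    print(result)
# Only the first attempt yields a new Consumable; the other two print None.
```

Because every new output refers back to the single output it consumed, a destination only has to check that one output, rather than rely on a coordinator with a view of every shard's balances.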