Sandy Bridge is impressive in its encode ability, but comparing it to video cards isn't all that valid (video cards aren't really dedicated video encoders). There aren't really $1000 cards any more, and the $100-200 cards actually do as well at encoding as the higher-end ones (and pack a lot of gaming power to boot). That could change as GPU compute capability (namely on the software side of things) improves: encoding could really make use of the highly parallel nature of GPUs, so it would likely scale with the hardware (whereas right now it doesn't scale much beyond the midrange video cards). Also, to be fair, the encode hardware on Sandy Bridge depends on software taking advantage of it as well (but having it on CPUs will definitely help its adoption). As portable hardware becomes more capable, it's also becoming a bit of a moot point, as there's less need for transcoding/encoding.
Optical out would be nice (I'd say "would have been" at this point). Every new computer does have a USB port completely capable of handling a DAC (the port can; it's the software that is the issue with 24/192). HDMI is common enough that multi-channel even has an easy way out (and I believe companies like TI are coming up with receiver chips that handle S/PDIF, TOSLINK, USB, and HDMI, so they'd take high-rate audio through all of them and we could be seeing DACs with HDMI ports). Optical and high-quality analog outputs are viewed as largely unnecessary extra costs on PC hardware, which is already a low-margin product.
I think the better thing would be an on-die dedicated audio processor (similar to the GPU; in fact they might as well tie it in with the GPU, and many SoCs already have dedicated audio parts, so this is actually fairly likely to happen anyway). That way they can keep software from mucking with the audio, and it would be powerful enough to improve audio in general: it could process multiple different audio streams without having to alter sampling rates, so that you can have a high-quality movie or music audio track playing, talk with someone on Skype or a "phone" call, watch some YouTube clip they linked, etc. Also, programs could just defer to it versus having to mess with making their own audio handling. Right now they sorta have that, but it's handled by Windows in software and it's not the greatest (it's weak; I think it forces resampling, or else you have to use exclusive modes and stuff like WASAPI, not to mention it's poor for games, and I doubt it's all that great at handling input like mics).
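To show what I mean by software mixing forcing resampling, here's a toy sketch. It is not how Windows actually does it; the function names are made up for illustration, and the linear interpolation is a deliberately crude stand-in for a real resampler. The point is just that to sum streams sample by sample, a shared mixer first has to bring everything to one common rate.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Crude linear-interpolation resampler (illustrative only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate      # position in the source stream
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def mix_shared(streams, mix_rate):
    """Mix (samples, rate) pairs at one common rate, clipping to [-1, 1]."""
    resampled = [resample_linear(s, r, mix_rate) for s, r in streams]
    length = max(len(x) for x in resampled)
    mixed = []
    for i in range(length):
        total = sum(x[i] for x in resampled if i < len(x))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# A 24 kHz "call" stream and a 48 kHz "music" stream mixed at 48 kHz.
call = ([0.0, 1.0], 24000)
music = ([0.5, 0.5, 0.5, 0.5], 48000)
mixed = mix_shared([call, music], 48000)
```

Only the stream that already matches the mix rate gets through untouched; everything else is altered on the way in, which is the part a dedicated processor could in principle handle better.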
Also, it could enable them to do away with the current system of audio, where placement is pre-encoded. Instead there'd be spatial data in the audio files, and the processor would automatically place audio according to how you have the output set up (how many speakers and how they're placed, which would also allow everything to be rendered binaurally for headphones). Eventually it could even enable you to do other things (imagine "changing your seat" so that you could sit closer to or farther from the stage). It could even pre-encode the audio for older devices.
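As a rough sketch of the "place audio according to your speaker setup" idea: the source carries a direction, and the renderer works out per-speaker gains for whatever layout it is told about. This assumes a flat 2D layout, speakers at distinct angles (in degrees) around the listener, and simple pairwise constant-power panning; `pan_gains` is a made-up name, and real object-based renderers do far more than this.

```python
import math


def pan_gains(source_az, speaker_az):
    """Return one gain per speaker for a source at source_az degrees.

    Assumption: speakers sit on a circle at distinct azimuths, and the
    source is panned between the two speakers on either side of it.
    """
    n = len(speaker_az)
    if n == 1:
        return [1.0]
    if len({a % 360.0 for a in speaker_az}) != n:
        raise ValueError("speaker azimuths must be distinct")
    # Speaker indices ordered by angle around the listener.
    order = sorted(range(n), key=lambda i: speaker_az[i] % 360.0)
    src = source_az % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        # Width of the arc from speaker a to speaker b, wrapping past 360.
        span = (speaker_az[b] - speaker_az[a]) % 360.0
        offset = (src - speaker_az[a]) % 360.0
        if offset <= span:
            frac = offset / span
            # Constant-power law: squared gains always sum to 1.
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            return gains
    return gains


# Same source direction, two different layouts.
stereo = pan_gains(0, [-30, 30])
quad = pan_gains(90, [45, 135, 225, 315])
```

The same file plays on stereo or quad just by passing a different list of angles, and "changing your seat" would amount to recomputing the source directions before panning.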
Improving the capability would lead to improved quality (both directly, and by people wanting better equipment to take advantage of it), and this way they could focus on the aspects they can handle explicitly while staying away from some of the more subjective aspects. They could add something like Audyssey, though, which would help you with setup and do some testing and optimizing for you.
If there were one computer audio thing I could ask for, though, it would be for them to improve communication between devices and implement something like EDID. This would help a lot for audio devices, as it wouldn't be dependent on the manufacturer writing proper drivers/software just to get the hardware to function like it's supposed to. It would likely help with clocking and other issues as well.
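To make the EDID-style idea concrete, here's a minimal sketch of what that handshake might look like for audio. Everything in it is hypothetical (`AudioCaps`, `negotiate`, and the fields are invented for illustration, not taken from any real spec): each device advertises what it supports, and the host picks the best common format without needing a vendor driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioCaps:
    """Hypothetical capability block a device would advertise."""
    sample_rates: frozenset   # supported rates in Hz
    bit_depths: frozenset     # supported bits per sample
    max_channels: int


def pick(wanted, common):
    """Prefer the wanted value, else the highest one below it, else the lowest."""
    if wanted in common:
        return wanted
    lower = [v for v in common if v < wanted]
    return max(lower) if lower else min(common)


def negotiate(source, sink, wanted_rate, wanted_bits, wanted_channels):
    """Return (rate, bits, channels) both ends support, or None if nothing matches."""
    rates = source.sample_rates & sink.sample_rates
    depths = source.bit_depths & sink.bit_depths
    if not rates or not depths:
        return None
    channels = min(wanted_channels, source.max_channels, sink.max_channels)
    return pick(wanted_rate, rates), pick(wanted_bits, depths), channels


host = AudioCaps(frozenset({44100, 48000, 96000, 192000}), frozenset({16, 24}), 8)
dac = AudioCaps(frozenset({44100, 48000, 96000}), frozenset({16, 24}), 2)

# The host asks for 24/192 in 5.1 and settles on what the DAC can actually do.
fmt = negotiate(host, dac, 192000, 24, 6)
```

Here the 24/192 request falls back to 96 kHz, 24-bit, 2 channels because that is the best the DAC advertises, which is the kind of decision that currently depends on the driver getting it right.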