Malazan Empire: 'Apple unveils suite of AI-voiced audiobooks' - Malazan Empire

'Apple unveils suite of AI-voiced audiobooks'

#1 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 01:41 PM

'Apple has quietly launched a catalogue of books narrated by artificial intelligence in a move that may mark the beginning of the end for human narrators. The strategy marks an attempt to upend the lucrative and fast-growing audiobook market[...]

Authors were told that Apple – which at the time was not named as the company behind the technology – would shoulder the costs of production and writers would receive royalties from sales.

While there is potential for backlash by professional voice actors, authors themselves are increasingly being asked to narrate their own books. There is a financial incentive for the writers, both in the upfront payments and the expanded availability of their work.

But producing an audiobook with a human voice can take weeks and can cost publishers thousands of dollars. The lure of AI promises to significantly cut the costs.'

Death of the narrator? Apple unveils suite of AI-voiced audiobooks

In the long run this is almost certainly a good thing. It won't end professional audiobook narration, and great performers will be able to train AI on their voices and performance styles, so many more people will have access to them at minimal cost.

OTOH when an audiobook performer sounds like they have no idea WTF they're saying it annoys me. The worst is when they don't even seem to have read ahead to understand the syntax or context and so get the flow and inflection very wrong. (For syntax at least AI might actually be an improvement over those people....)

While machine learning can, given enough examples, accurately predict how a great human narrator would be influenced by the contexts of words, it seems unlikely to have been implemented (yet...). It probably works more like the virtual singer AI I have rather than ChatGPT or diffusion-based AI art generators---trained on the singer's general singing style and the syntactic flow of singing in English, but not responding intelligently to the particular meaning of a phrase (which can be gotten around through a mix of pitch and timbre re-takes and manual editing---a time-consuming process that's unlikely to be applied to audiobooks, except perhaps for short fiction by authors who don't want to record their own voices).

There's the question of how much recognizable emotion should be audible (which in many cases risks misinterpretation, or imposing the performer's own interpretation at the expense of other possibilities).

Obviously many people like character voices. One benefit of AI will hopefully be better cross-gender voices (provided they're not stubborn about using the same voice for the whole thing) and accents. Parameters can be tweaked and randomized to create a better and wider variety of character or creature voices.

Personally I often like listening to authors perform their own works. Even though they probably aren't great at reproducing the way they'd like it to be heard, it's usually possible to get an idea of what they were going for, what sorts of cadences and voices they were imagining. Ideally they would work with audiobook performers to better realize that. AI could in principle help them do that, at much lower cost and with much greater control over vocal parameters---though in the near term that will be very time consuming and require substantial editing skill. Whereas this is being promoted as a time-saver....

AI voices may eventually be honed to create more ideal voices for each genre, audience, or (with enough relevant data) individual... and also optimize variation and novelty (with listeners having the option to switch voices or alter them---change the accent, be more this or less that, more emotional or less, etc.).

This post has been edited by Azath Vitr (D'ivers: 05 January 2023 - 01:42 PM

0

#2 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 01:58 PM

Of course it may have significant effects on the job market. It's possible that decreasing the number of low-profile jobs will substantially weaken the talent pool over time as fewer people go into doing it professionally. (I imagine that extended non-authorized performances are generally kept off of YouTube, TikTok, etc. because of copyright enforcement.)

OTOH having to compete with AI may improve the quality of human performance: if a performer sounds like they have no clue WTF they're saying and is too lazy to bother reading ahead a little or doing retakes then they deserve to be replaced by AI. In the long run this could be a bit problematic, but there will probably always (while there are still humans (who don't become transhuman cyborgs, at least)) be demand for actual human narration.

This post has been edited by Azath Vitr (D'ivers: 05 January 2023 - 01:58 PM

0

#3 User is offline   Tsundoku 

  • A what?
  • Group: Malaz Regular
  • Posts: 4,794
  • Joined: 06-January 03
  • Location:Maison de merde

Posted 05 January 2023 - 02:18 PM

I don't like audiobooks to begin with, but having one that sounds like my GPS sounds even worse. Hard pass.

It will take years - decades - before an AI can compete with a human. In lots of things, not just narration.
"Fortune favors the bold, though statistics favor the cautious." - Indomitable Courteous (Icy) Fist, The Palace Job - Patrick Weekes

"Well well well ... if it ain't The Invisible C**t." - Billy Butcher, The Boys

"I have strong views about not tempting providence and, as a wise man once said, the difference between luck and a wheelbarrow is, luck doesn’t work if you push it." - Colonel Orhan, Sixteen Ways to Defend a Walled City - KJ Parker
0

#4 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 02:24 PM

View PostTsundoku, on 05 January 2023 - 02:18 PM, said:

I don't like audiobooks to begin with, but having one that sounds like my GPS sounds even worse. Hard pass.

It will take years - decades - before an AI can compete with a human. In lots of things, not just narration.


These are trained on specific audiobook performers and replicate their performance styles for particular genres.

The technology is probably similar to Synth V (for singing); some of the voices, in some respects, are already better than most human amateurs IMO.

This post has been edited by Azath Vitr (D'ivers: 05 January 2023 - 02:25 PM

0

#5 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 02:40 PM

For example the English sections of this are fantastic (the French a bit less so, though it's not designed to do French---may want to skip to 15 seconds in for the start of the English):




(That voicebank is based on a performer singing in Mandarin Chinese, but they've developed cross-lingual Chinese to English synthesis that works very well.)



0

#6 User is offline   QuickTidal 

  • Frog
  • Group: Team Quick Ben
  • Posts: 21,339
  • Joined: 05-November 05
  • Location:Nowhere Specific
  • Interests:Nothing, just sitting. Quietly.

Posted 05 January 2023 - 02:50 PM

View PostAzath Vitr (D, on 05 January 2023 - 02:24 PM, said:

These are trained on specific audiobook performers and replicate their performance styles for particular genres.

The technology is probably similar to Synth V (for singing); some of the voices, in some respects, are already better than most human amateurs IMO.


Add in that lawmakers are saying that AI-generated shit can't be copyrighted...that's a precedent that would be hard to overcome. This means that those audiobooks would be open to theft and no one could do shit because the spoken/audio version of the book would not be copyrighted to the author.

None of this AI created shit is going to stick much.
"When the last tree has fallen, and the rivers are poisoned, you cannot eat money, oh no." ~Aurora

“Someone will always try to sell you despair, just so they don't feel alone.” ~Ursula Vernon
0

#7 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 02:55 PM

View PostQuickTidal, on 05 January 2023 - 02:50 PM, said:

Add in that lawmakers are saying that AI-generated shit can't be copyrighted...that's a precedent that would be hard to overcome. This means that those audiobooks would be open to theft and no one could do shit because the spoken/audio version of the book would not be copyrighted to the author.

None of this AI created shit is going to stick much.


The text of the book is copyrighted; just as people can't produce and distribute their own audiobooks without violating copyright, they can't share the AI-generated audiobook (without authorization) without violating copyright.
0

#8 User is offline   QuickTidal 

  • Frog
  • Group: Team Quick Ben
  • Posts: 21,339
  • Joined: 05-November 05
  • Location:Nowhere Specific
  • Interests:Nothing, just sitting. Quietly.

Posted 05 January 2023 - 03:02 PM

View PostAzath Vitr (D, on 05 January 2023 - 02:55 PM, said:

The text of the book is copyrighted; just as people can't produce and distribute their own audiobooks without violating copyright, they can't share the AI-generated audiobook (without authorization) without violating copyright.


Uh no.

Audiobooks are copyrighted separately from the written text.

The argument in court about the AI generated artwork being the copyright of the person who set the AI to make the art failed....this precedent would apply here.

Copyright law for books is way more specific than you think.

So when they go to copyright the audiobook, and have an AI-generated narration, the case for that copyright to be not-valid is HIGH. The copyright for an audiobook is FOR the audio of that written word, not the written word/story itself.
0

#9 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 04:04 PM

View PostQuickTidal, on 05 January 2023 - 03:02 PM, said:

Uh no.

Audiobooks are copyrighted separately from the written text.

The argument in court about the AI generated artwork being the copyright of the person who set the AI to make the art failed....this precedent would apply here.

Copyright law for books is way more specific than you think.

So when they go to copyright the audiobook, and have an AI-generated narration, the case for that copyright to be not-valid is HIGH. The copyright for an audiobook is FOR the audio of that written word, not the written word/story itself.



Assuming you don't mean that it's legal to distribute audiobooks produced without the authorization of rights holders, I guess what you're thinking is that the copyright on the words of the text only extends to whether an audiobook can legally be created, not whether a legally created but non-copyrightable audiobook could be copied or distributed? Whether redistributing a file counts as 'copying' or 'distributing', both are apparently covered by the author's copyright (on the words of the text) in the US: 'cover the right to reproduce literary work in various formats, whether the copies are in digital, paper, and/or audio formats, as well as distribution rights (which are separate from the right to make copies). It covers distributing the literary work in the US, in other countries, over the internet, and more.'

Copyright Law for Audiobook Production

Similarly, while musical recordings have their own copyright as sound recordings, they are also subject to a separate copyright for the composition.

This post has been edited by Azath Vitr (D'ivers: 05 January 2023 - 04:08 PM

0

#10 User is offline   QuickTidal 

  • Frog
  • Group: Team Quick Ben
  • Posts: 21,339
  • Joined: 05-November 05
  • Location:Nowhere Specific
  • Interests:Nothing, just sitting. Quietly.

Posted 05 January 2023 - 04:08 PM

I don't want to fight with you about another thing man....I'm just saying that this is not going to go how doomsayers seem to think and getting into the fucking weeds about it will help neither of our points of view.

The fact is that AI generated shit has had one kick at the can for "artist copyright" and it failed...as such that's what we have to go on currently. That's it.
0

#11 User is offline   Tiste Simeon 

  • Faith, Heavy Metal & Bacon
  • Group: Malaz Regular
  • Posts: 12,097
  • Joined: 08-October 04
  • Location:T'North

Posted 05 January 2023 - 06:02 PM

I knew Azath would be excited by this but then I assume he's looking forward to the day when the movie Surrogates comes true and there's nothing human needed just robots.
A Haunting Poem
I Scream
You Scream
We all Scream
For I Scream.
0

#12 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 06:12 PM

There are audio examples of Apple's first two AI voices here:

Digital narration for audiobooks - Apple Books for Authors
0

#13 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 05 January 2023 - 06:27 PM

View PostTiste Simeon, on 05 January 2023 - 06:02 PM, said:

I knew Azath would be excited by this but then I assume he's looking forward to the day when the movie Surrogates comes true and there's nothing human needed just robots.


Guess you assumed my posts were long-winded rationalizations of 'Team Skynet FTW!' and didn't bother skimming. TL;DR Mixed feelings about this in the short-term; unless the author has a really annoying voice I think I'd prefer books read by the author if budget doesn't allow for hiring an excellent performer. But the 'professionals' who sound like they have no idea what they're reading will hopefully be replaced (or inspired to improve...).

But yes, I would like the end of human labor being a practical 'necessity' in order to achieve what humans desire. Especially the de facto forced labor that the vast majority of people are still subject to. Likewise, authors who want to do an audiobook but lack the funds to hire a great professional and don't want to put in the time to make a shitty one themselves (especially if they have an unappealing voice and suck at translating what they hear in their heads into verbalization) should have this option.

This post has been edited by Azath Vitr (D'ivers: 05 January 2023 - 06:27 PM

0

#14 User is offline   Cause 

  • Elder God
  • Group: Malaz Regular
  • Posts: 5,800
  • Joined: 25-December 03
  • Location:NYC

Posted 05 January 2023 - 06:50 PM

This is only the beginning:

https://www.veritonevoice.com/

Companies are working on being able to deepfake celebrity voices and likenesses for sponsored content. Celebrities will cash in huge without having to do any work. It won't be the end of voice actors' careers, but it will mean the viable candidate pool will shrink from thousands to a few hand-picked elite who can be in everything for a fraction of the cost.

That said, one door closes and a new door opens. I imagine someone will now be paid a lower fee to go through books and highlight various sentences as requiring different emotional affects. Then again, AI could probably be trained to pick this up eventually too.

This post has been edited by Cause: 05 January 2023 - 06:52 PM

0

#15 User is offline   worry 

  • Master of the Deck
  • Group: Malaz Regular
  • Posts: 14,674
  • Joined: 24-February 10
  • Location:the buried west

Posted 05 January 2023 - 10:53 PM

They can finally fully automate new Simpsons episodes.
They came with white hands and left with red hands.
0

#16 User is offline   Macros 

  • D'ivers Fuckwits
  • Group: High House Mafia
  • Posts: 8,951
  • Joined: 28-January 08
  • Location:Ulster, disputed zone, British Empire.

Posted 05 January 2023 - 11:04 PM

So what it really means is more money funneling more easily to a few high profile celebs (on the deep fake licensing tech) and less money for the peasants.

Oh sorry, AI and computer replacement totally means an easier life for the people who lose their jobs, the big companies won't gobble up all the money and say fuck the poor
Wrong thread?
1

#17 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 11 January 2023 - 07:05 PM

Microsoft and Meta might be getting in on the action:

'Microsoft's new AI can simulate anyone's voice with 3 seconds of audio

Text-to-speech model can preserve speaker's emotional tone [...]

[...] trained [...] on an audio library, assembled by Meta, [...] 60,000 hours of English language speech from more than 7,000 speakers, mostly [... in] public domain audiobooks. [...] to generate a good result, the voice [...] must closely match a voice in the training data.'


Microsoft's new AI can simulate anyone's voice with 3 seconds of audio

... so it may work best for audiobooks. Or people who sound like audiobooks....

This post has been edited by Azath Vitr (D'ivers: 11 January 2023 - 07:05 PM

0

#18 User is offline   Chance 

  • Mortal Sword
  • Group: Malaz Regular
  • Posts: 1,065
  • Joined: 28-October 05
  • Location:Gothenburg, Sweden

Posted 12 January 2023 - 07:30 AM

View PostMacros, on 05 January 2023 - 11:04 PM, said:

So what it really means is more money funneling more easily to a few high profile celebs (on the deep fake licensing tech) and less money for the peasants.

Oh sorry, AI and computer replacement totally means an easier life for the people who lose their jobs, the big companies won't gobble up all the money and say fuck the poor
Wrong thread?


It's like everything else: obsolete jobs disappear and new ones appear; that's been the trend at least since the 19th century. Probably not that many fewer people involved really, just different people; at least that is my experience from working with robots most days. Might lose a job in this case as a voice actor and one in post/pre production but gain one in AI development and one in computer maintenance.

Still, I wonder if this will really catch on. The best performers add emotions, timings and stuff to an audiobook; my experience is that robots and programs aren't really any good for this kind of thing yet. :D

This post has been edited by Chance: 12 January 2023 - 07:34 AM

0

#19 User is offline   Maark Abbott 

  • Part Time Catgirl
  • Group: Malaz Regular
  • Posts: 4,263
  • Joined: 11-November 14
  • Location:Lether, apparently...
  • Interests:Redacted

Posted 12 January 2023 - 08:35 AM

Gross. AI is nothing but parrots, trained to imitate.
Debut novel 'Incarnate' now available on Kindle
0

#20 User is online   Azath Vitr (D'ivers 

  • Ascendant
  • Group: Malaz Regular
  • Posts: 3,249
  • Joined: 07-February 16

Posted 12 January 2023 - 12:29 PM

'What scares human narrators is that some of them are pretty good.

[... Apple's] A.I. narrators are simply terrible at fiction. The majority of these audiobooks are romances and thrillers. It's hard to imagine romance fans thrilling to dialogue from one of the genre's sexy alpha heroes when it's recited in the earnest female voice of "Madison," which seems by far to be the most popular of Apple's four options.'


[Solution is to have an idealized sexy voice (could eventually be of the listener's choosing, or optimized for the listener) substitute in those passages---but the AI would probably have to be paired with a second AI system to figure that out without a significant error rate. Mixing up the voices---for example female protagonist speaking with deep male voice or vice versa---would throw people out of the story. Publisher might expect the author to go through and indicate where the 'sexy' voice(s) should be....]


'Likewise, I listened to the in medias res opening scene of a thriller [....] "'Don't let go of me!' she shouted," recited Jackson with zombie-like placidity.'


[Similar issue. Shouldn't be difficult to associate 'shouted' with a standard 'shouting' delivery (likewise for other terms indicating a particular type of emotional delivery---though many could be conflated (crudely) with the same 'emotional/dramatic' treatment). Not likely to be as good as the human narrator on which it's based tailoring the 'shouting' to the text and context (until AI manages to simulate that too), but the narrator may still have a more appealing voice than the average audiobook performer.]
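
A rough sketch of the verb-to-delivery idea in Python, just to show how simple the crude version could be. The verb table, the style names, and the `tag_deliveries` helper are all made up for illustration; nothing here reflects how Apple's system actually works, and it only handles the plain pattern of a double-quoted line followed by a speaker word and a speech verb.

```python
import re

# Hypothetical mapping from speech verbs to delivery styles (illustrative only).
DELIVERY_BY_VERB = {
    "shouted": "shouting",
    "yelled": "shouting",
    "screamed": "shouting",
    "whispered": "whispering",
    "murmured": "whispering",
}

# Matches a double-quoted line followed by one speaker word and one verb,
# e.g. "Don't let go of me!" she shouted
TAG_PATTERN = re.compile(r'"([^"]+)"\s+(\w+)\s+(\w+)')


def tag_deliveries(text, default="neutral"):
    """Return (quoted line, delivery style) pairs for each tagged quote."""
    results = []
    for quote, _speaker, verb in TAG_PATTERN.findall(text):
        # Unknown verbs (e.g. "said") fall back to the default delivery.
        results.append((quote, DELIVERY_BY_VERB.get(verb.lower(), default)))
    return results
```

Called on a passage like `"Don't let go of me!" she shouted`, this would mark the line for a 'shouting' delivery, while anything tagged with an unlisted verb stays neutral. It does nothing about context, sarcasm, or who is speaking to whom, which is the harder part.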


'Another thing the A.I. narrators fail at [so far] is humor. [Aside from deadpan humor I'd imagine. ...] the line registers as nonsensical when read with the flat intonation of the AI narrator. Accents are another stumbling block. [...]

[...] even if a sophisticated A.I. someday emerges that can alter its voice depending on who's speaking, [...] "Context is everything, and we may make a different choice about the way that a sentence is delivered because of who is saying it to whom at any given time in the story." [...]

However, nonfiction is another story. [...] books like When Your Baby Won't Stop Crying [...] are ideal candidates for A.I. narration. Their audience is limited enough that an audiobook with a human narrator might not be feasible [...] "[...] backlist titles and nonfiction that nobody was going to put into audio anyway. Here is a tool that can make it accessible for people."'

Apple Just Rolled Out A.I. Audiobooks. What Scares Human Narrators Is That Some of Them Are Pretty Good.
0
