This is probably really complicated, maybe even impossible, to do, but what if there were a singing ability attached to an object? When you use the object (maybe a microphone or something), your keystrokes turn into sounds: the words still appear in a speech bubble, but everyone (or just a select few) hears a collection of sounds. There could be a limit on the number of keystrokes to determine a song's "length".
The sounds could be anything... a combination of musical notes, voices, barnyard animals, car horns, or other sounds from around the game.
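
Just to make the idea concrete, here is a rough Python sketch of how the keystroke-to-sound mapping might work. Everything in it is made up for illustration: the sound names, the 16-keystroke cap, and the function `keystrokes_to_song` are all assumptions, not anything from the actual game.

```python
# Hypothetical sketch: none of these names come from a real game API.
SOUND_PALETTE = ["note_c", "note_d", "note_e", "cow_moo", "car_horn", "voice_la"]
MAX_KEYSTROKES = 16  # assumed cap that sets the song "length"


def keystrokes_to_song(text, palette=SOUND_PALETTE, limit=MAX_KEYSTROKES):
    """Return (bubble_text, sounds): what the bubble shows and what listeners hear."""
    typed = text[:limit]  # keystrokes past the limit are dropped
    sounds = []
    for ch in typed:
        if ch.isspace():
            sounds.append("rest")  # spaces become pauses
        else:
            # The same key always maps to the same sound in the palette.
            sounds.append(palette[ord(ch.lower()) % len(palette)])
    return typed, sounds


bubble, sounds = keystrokes_to_song("la la la")
print(bubble)  # text shown in the speech bubble
print(sounds)  # sound names played to whoever is listening
```

Swapping the palette would give different "instruments" (all notes, all barnyard animals, and so on), and changing the limit would change how long a song can be.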