research
concept
Technology can inhibit emotional experience and social connection, which limits communication and discussion in public spaces. There is a demand for engaging, interactive content for public screens, so we present an emotional spoken word image retriever and visualiser, made with the goal of enhancing the emotional expression people have with a display. We employed a method of applied research, which involved system development, artistic practice and user studies. Through this journey we found that when affective visual feedback of their speech is displayed, people are more than willing to express themselves emotionally and, therefore, have more meaningful experiences with the interactive content than with what is currently available.
smart city screens
One of the foundations of a “Smart City” is an ecosystem that engages more actively with its citizens. We are reshaping our urban environments, and Smart Cities are becoming a reality. Experts in the field of public space are increasingly aware of the importance of spaces that invite people to meet each other and stay for discussion. Research is needed to ask which social and urban situations occur around a screen, and how these situations can be identified and characterised. By exploring and creating forms of interactive content that suit a specific context, we can reuse the content in different locations. Engagement, in this sense, is the relationship between people and their surroundings.
free speech
Speaker’s Corners in public spaces are meeting places where people share their views and practise free speech. However, with information and knowledge now shared through digital means, the Internet has become a popular soapbox. Although technology will not lead us to an information Utopia, free speech as an inherent right could be better practised. Free speech is under threat. Understanding how current cyber-law structures free speech allows us to anticipate and work towards the future of free speech rights, and to propose alternatives through other media. So what can we do to create new platforms, spaces, and open networks that continue to enable free speech in the future?
technical approach
Voice and speech recognition systems were not part of our daily personal use until recently. The tools are now accessible to a wider audience, and people are experimenting with the technology in many ways, but typically for commercial purposes and without affective feedback. There is a gap in experimentation with creative speech, storytelling, poetry, singing, and other forms of meaningful word-use with computer systems. We produced a system, kaleidOk, that demonstrates how to visually communicate emotions from spoken word, which encourages the user to be more expressive within the context of public screen-mediated interactions. Features of kaleidOk include computing emotion from speech, retrieving images based on emotional input, and displaying them artistically with vocal attributes. The design is based both on empirical premises and on our subjective criteria.
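The three features listed above can be illustrated with a minimal sketch. It is not kaleidOk's implementation: it assumes the speech has already been transcribed to text, it swaps in a simple word-lexicon lookup for emotion detection, and every name in it (`EMOTION_LEXICON`, `EMOTION_COLOURS`, `detect_emotion`, `build_visual_request`) is hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch); a real system
# would need a much fuller affective word list or a trained model.
EMOTION_LEXICON = {
    "happy": "joy", "love": "joy", "sunny": "joy",
    "sad": "sadness", "lonely": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger",
    "afraid": "fear", "scared": "fear",
}

# Hypothetical mapping from emotion label to a display colour (RGB).
EMOTION_COLOURS = {
    "joy": (255, 200, 0),
    "sadness": (40, 80, 160),
    "anger": (200, 30, 30),
    "fear": (90, 30, 120),
    "neutral": (128, 128, 128),
}


def detect_emotion(transcript: str) -> str:
    """Return the most frequent emotion label among the transcript's words."""
    words = (w.strip(".,!?;:\"'").lower() for w in transcript.split())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return "neutral"  # no emotional words were recognised
    return counts.most_common(1)[0][0]


def build_visual_request(transcript: str) -> dict:
    """Combine the detected emotion with an image query term and a colour."""
    emotion = detect_emotion(transcript)
    return {
        "emotion": emotion,
        "image_query": emotion,  # term that would be sent to an image search index
        "colour": EMOTION_COLOURS[emotion],
    }


if __name__ == "__main__":
    print(build_visual_request("I am so happy, I love this sunny day!"))
```

In this sketch the emotion label doubles as the image search term and selects a colour; a visualiser would then combine the retrieved images with vocal attributes, such as loudness or pitch, which this example leaves out.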
user response
User testing showed that the use case of a more intimate screen interaction experience was most effective, so kaleidOk will continue as a console-based setup in public spaces. kaleidOk was deployed in a wide variety of contexts, from hackathons to a café-cinema, a university, and public spaces. Through observation and interviews it became apparent that people are quite afraid of public speaking, of speaking to a computer in general, and of expressing their emotions in a public setting. However, people are genuinely interested in the potential future applications of creative technologies based on speech recognition. Based on user responses, kaleidOk is received as an innovative concept that is interesting, engaging and aesthetically pleasing, reflects the user's emotional input, and makes the user more aware of their current emotional state.
further work
In our further work, we aim to address aspects of emotion detection from the voice, and to incorporate recent developments in recognising emotion in speech. Firstly, we will employ an alternative speech-to-text service (we are currently experimenting with CMU Sphinx). It would be a great enhancement to use analysis of tonal features to contribute to the colour search and texture generation, so that not only speech but also singing and other vocal expressions can enhance the interaction and results. It is most desirable that we use an alternative image search index, one that can be accessed and contributed to by a local community audience. User studies will need to address other factors such as user experience design and interface design. We will also look for a research methodology that can test emotional accuracy and produce reproducible outcomes.