URL: https://ucrel.lancs.ac.uk/wmatrix/
WMATRIX CORPUS ANALYSIS AND COMPARISON TOOL

Wmatrix is a software tool for corpus analysis and comparison. It provides a web
interface to natural language processing tools such as the USAS and CLAWS corpus
annotation tools for English, plus (from Wmatrix6 onwards) the multilingual
semantic tagger PyMUSAS, and standard corpus linguistic methodologies such as
frequency lists and concordances. It extends the keywords method to key
grammatical categories and key semantic domains.

Wmatrix allows the user to run these tools via a web browser such as Chrome or
Firefox, and so will run on any computer (Mac, Windows, Linux) with a web
browser and a network connection. Wmatrix was initially developed by Paul Rayson
in the REVERE project, extended and applied to corpus linguistics during PhD
work and is still being updated regularly. Earlier versions were available for
Unix via terminal-based command line access (tmatrix) and Unix via Xwindows
(Xmatrix), but these only offered retrieval of text pre-annotated with USAS and
CLAWS.



Sections in this introduction to Wmatrix: screenshots, screencasts (short video
introductions), acknowledgements and references for Wmatrix, and example
applications and publications.

Tutorial for Wmatrix5: with step-by-step instructions using a case study on how
to compare Liberal Democrat and Labour Party Manifestos for the 2005 UK General
Election (updated November 2022). Further examples of the application to the
2010 general election manifestos can be seen on Paul's blog. The plain text
versions of the 2010 UK election manifestos can be downloaded for use in your
favourite text analysis software (with thanks to Martin Wynne for editing two of
the files). TEI encoded versions of the 2010 election manifestos are now
available (with thanks to Lou Burnard). A similar application has also been
carried out on the 2015, 2017 and 2019 General Election manifestos with
downloadable versions of the documents from seven main parties.

Tutorial for Wmatrix6: with step-by-step instructions. For comparability, this
uses the same case study on how to compare Liberal Democrat and Labour Party
Manifestos for the 2005 UK General Election (updated June 2023).

One version of Wmatrix is currently live for public use:



https://ucrel-wmatrix6.lancaster.ac.uk/
Wmatrix6 is open for invited beta testers
https://ucrel-wmatrix5.lancaster.ac.uk/
Wmatrix5 is expected to be available until April 2025
Wmatrix4 closed on 1st March 2023



Usernames for Wmatrix are free to members and alumni of Lancaster University for
non-commercial research. Please apply on Wmatrix5 using your Lancaster email
address or, if you are an alumnus and no longer have access to a Lancaster
address, please contact Paul Rayson. Accounts on Wmatrix5 are freely available for
UK government and academic researchers in countries on the OECD DAC list of ODA
recipients (https://www.oecd.org/), and these accounts will stay free beyond the
current one month trial period. Please apply on Wmatrix5 using your
organisational email address.

Usernames for non-commercial research and teaching: (e.g. by non-Lancaster
academics and students). A free one-month trial is available for individual
academic users; please apply on Wmatrix5 using your organisational email address
to set up a username and password. Once the one-month trial has expired,
usernames are available for £50 per username per year from the online secure
order page run by Lancaster University. Multiple usernames (or years) may be
purchased at a reduced cost e.g. for teaching purposes. Please contact Paul for
details. Further development, support, and external availability of Wmatrix
currently depends on licensing its use.



--------------------------------------------------------------------------------



INTRODUCTION TO WMATRIX


FOLDERS

Wmatrix users can upload their own corpus data to the system, so that it can be
automatically annotated and viewed within the web browser. Each file is stored
in a folder (equivalent to a folder in Windows or directory on Unix).

INPUT FORMAT GUIDELINES

The analysis may be improved with some pre-editing of the input text, although
pre-editing is not normally required. There are guidelines provided for texts to
be tagged by CLAWS. Most important is the replacement of less-than (<) and
greater-than (>) characters by the corresponding SGML entity references (&lt;)
and (&gt;) respectively. The text may contain well-formed HTML, SGML or XML
tags. If the text contains less-than or greater-than symbols in formulae, for
example, then CLAWS may mistake large quantities of the following text for SGML
tags, or fail to POS tag the file. The guidelines mention start and end text
markers, but these are not required since they are inserted for you by Wmatrix.
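
As a minimal sketch of the pre-editing step described above, the following
Python function replaces literal less-than and greater-than characters with
the entity references. It assumes the file is plain text with no markup worth
preserving; if the text contains genuine HTML, SGML or XML tags, those would
be escaped as well, so a more selective approach would be needed. The function
name is hypothetical and is not part of Wmatrix or CLAWS.

```python
def escape_angle_brackets(text: str) -> str:
    """Replace literal < and > with the entity references &lt; and &gt;.

    Assumption: the input is plain text. Any real tags in the text
    would also be escaped by this simple replacement.
    """
    return text.replace("<", "&lt;").replace(">", "&gt;")


sample = "We found that p < 0.05 and x > y in all cases."
print(escape_angle_brackets(sample))
```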



TAG WIZARD

Wmatrix users can upload their file and complete the automatic tagging process
by clicking on the tag wizard. Once the file has been uploaded to the web
server, it is POS tagged by CLAWS and semantically tagged by USAS. This process
can be carried out step by step starting with the 'load file without tagging'
option in the advanced interface. As a shortcut you can simply upload frequency
profiles if you have them. The format for a frequency list is a very simple two
column format with a total line at the head of the file. You can see an example
of this. The column widths are not significant.
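
For illustration only, a frequency list of this general shape could be
produced as sketched below. The text above states only that the format has two
columns and a total line at the head of the file; the column order (frequency
first, then item), the tab separator and the "Total" label are assumptions made
for this sketch and should be checked against the example file before
uploading anything.

```python
from collections import Counter


def frequency_profile(tokens):
    """Return lines for a two-column frequency list with a total line first.

    Assumed layout (not confirmed by the source): frequency, whitespace, item.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    lines = [f"{total}\tTotal"]  # hypothetical label for the total line
    # Sort by descending frequency, then alphabetically for stable output.
    for item, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{freq}\t{item}")
    return lines


tokens = "the cat sat on the mat".split()
print("\n".join(frequency_profile(tokens)))
```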


MY TAG WIZARD

My Tag Wizard is a variant of the tag wizard which allows you to override or
extend the system dictionaries for your own data. There are two main uses.
First, you can override the current most likely tag for any word or MWE. Second,
you can extend the dictionaries in terms of coverage of vocabulary and tagset.
For example, you can create a new tag by listing the words and MWEs that you
wish to be tagged with it.



VIEWING FOLDERS

By clicking on the folder name, the user can see its contents. Following the
application of the tag wizard, the folder contains the original text, POS and
semantically tagged versions of that text, and a set of frequency profiles.

SIMPLE AND ADVANCED INTERFACES

The user can toggle between simple and advanced interfaces in Wmatrix. The
advanced interface offers more options and more control over the data.



FREQUENCY PROFILES

From the folder view, the user can click on a frequency list to see the most
frequent items in their corpus. Frequency lists are available for words in the
simple interface, and in the advanced interface for POS tags and semantic tags.
The lists can be sorted alphabetically or by frequency.



CONCORDANCES

From the frequency list view, the user can click on 'concordance' and see
standard concordances. These can show the usual word based concordance as well
as all occurrences for words in one POS or semantic category.
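
The idea of a word-based concordance can be sketched in a few lines of Python.
This is a generic key-word-in-context (KWIC) illustration, not Wmatrix's own
code; the function name, the whitespace tokenisation and the context width are
assumptions chosen for brevity.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


tokens = "The cat sat on the mat because the mat was warm".split()
for left, word, right in kwic(tokens, "mat", width=2):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>15}  {word}  {right}")
```

A concordance over a POS or semantic category would follow the same pattern,
matching on the tag attached to each token rather than on the word form.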



KEY WORDS, KEY POS AND KEY DOMAINS: COMPARISON OF FREQUENCY LISTS

From the folder view, the user can click on compare frequency list to perform a
comparison of the frequency list for their corpus against another larger
normative corpus such as the BNC sampler, or against another of their own texts
(once that text has been loaded into Wmatrix). This comparison can be carried
out at the word level to see keywords, or at the POS (in the advanced
interface), or at the semantic level (to see key concepts or domains). The
log-likelihood statistic is employed by Wmatrix. For more details, see the
log-likelihood calculator. In the simple interface, word and tag clouds are
shown which visualise the more significant differences in the larger font sizes.
In the advanced interface more detailed frequency information is also displayed
in table form. The key comparison shows the most significant key items
towards the top of the list, since the result is sorted on the LL
(log-likelihood) field, which shows how significant the difference is. Look
only at items with a '+' code, since this shows overuse in your text as
compared to the standard English corpora. For statistical significance, look
at items with an LL value over about 7, since 6.63 is the cut-off for 99%
confidence of significance (p < 0.01, 1 d.f.).
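
The comparison described above can be sketched in Python using the commonly
used two-corpus formulation of the log-likelihood statistic, in which expected
frequencies are derived from the combined relative frequency of the item. That
Wmatrix computes the value in exactly this way is an assumption; the function
names and the example figures are invented for illustration.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood for one item across two corpora.

    freq1, freq2: observed frequency of the item in each corpus.
    size1, size2: total number of words in each corpus.
    """
    combined = freq1 + freq2
    expected1 = size1 * combined / (size1 + size2)
    expected2 = size2 * combined / (size1 + size2)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2 * ll


def overuse_sign(freq1, size1, freq2, size2):
    """'+' if the relative frequency is higher in corpus 1, otherwise '-'."""
    return "+" if freq1 / size1 > freq2 / size2 else "-"


# Invented example: 120 hits in a 10,000-word text vs 600 in 100,000 words.
ll = log_likelihood(120, 10_000, 600, 100_000)
print(round(ll, 2), overuse_sign(120, 10_000, 600, 100_000), ll > 6.63)
```

An item would be of interest under the guidance above when the sign is '+' and
the LL value exceeds the 6.63 cut-off.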



N-GRAMS AND C-GRAMS

Recurrent sequences of words are called n-grams in Wmatrix. These are similar to
clusters in WordSmith and lexical bundles in Biber's work. You can calculate
n-grams of length 2 to 5 for each text. Collapsed-grams (or c-grams) are a
merged version of these lists. They show you which 2-grams are subsets of
3-grams, which 3-grams are subsets of 4-grams, and so on. The resulting c-gram
list is a tree structure with the longest n-grams on the left and shortest
n-grams on the right.
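
A rough sketch of the two ideas above, counting n-grams and then finding which
shorter n-grams are contained in longer ones, is given below. It is not the
c-gram software used by Wmatrix; the function names, the toy sentence and the
minimum frequency of 2 for "recurrent" are assumptions for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous sequences of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def contained_in(shorter, longer):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


tokens = "at the end of the day at the end of the week".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Keep only recurrent 3-grams (threshold of 2 is an assumption).
recurrent_3 = {g: c for g, c in trigrams.items() if c >= 2}
for tri in recurrent_3:
    # List the 2-grams that are subsets of this 3-gram, longest on the left.
    subs = [" ".join(bi) for bi in bigrams if contained_in(bi, tri)]
    print(" ".join(tri), "<-", subs)
```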



COLLOCATIONS

Collocations in Wmatrix are pairs of words that occur together more often than
would be expected due to chance. There is a choice of 11 different statistics
that can be used to calculate the strength of association between the two words.
For further details about these statistics, see the following paper:

Piao, S. (2002) Word alignment in English-Chinese parallel corpora. Literary and
linguistic computing, 17 (2), 207-230. doi:10.1093/llc/17.2.207

The collocation feature was introduced in September 2009 and is currently in
beta testing.
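
To make the notion of association strength concrete, the sketch below scores
adjacent word pairs with pointwise mutual information (PMI). PMI is chosen
purely as a familiar example; the text above does not list the 11 statistics,
so whether PMI is among them is an assumption, as are the function name, the
restriction to adjacent pairs and the minimum pair count.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Pointwise mutual information for adjacent word pairs.

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), estimated from raw counts.
    """
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in pairs.items():
        if count < min_count:
            continue  # skip rare pairs, whose estimates are unreliable
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


tokens = "new york is big and new york is old".split()
for pair, score in sorted(pmi_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(" ".join(pair), round(score, 2))
```

A positive score indicates that the pair occurs together more often than the
individual word frequencies alone would predict.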


--------------------------------------------------------------------------------


SCREENCASTS:

This section shows short video introductions to the Wmatrix software. Further
videos will be appearing soon.




--------------------------------------------------------------------------------


ACKNOWLEDGEMENTS AND REFERENCES:

Wmatrix was initially developed within the REVERE project (REVerse Engineering
of Requirements) funded by the EPSRC, project number GR/MO4846.

Lancaster University Proof of concept funding in July 2006 provided support for
a new server and continued software development. In December 2006, further
interface design using XHTML/CSS was carried out by Andrew Foote (InfoLab21
Knowledge Business Centre) funded under support from the European Regional
Development Fund. Through a Lancaster University small grant (Towards an Online
Conceptual Database of the Latin Vulgate Bible) a 'reader' interface is being
developed for pre-tagged corpora.

Why the name, Wmatrix? Originally, I wrote a piece of software called Matrix
which presented tables of frequency information from corpora, hence the name is
partially derived from mathematical 'matrices'. This was Unix terminal based
using 'curses'. I then wrote an X-windows version with a graphical user
interface and named it Xmatrix. The web based version came next, hence Wmatrix.
I also have a Java API to the website called Jmatrix. There's a note in my PhD
saying that it has nothing to do with any films featuring Keanu Reeves, but if
you're a Doctor Who fan like me, you may recognise another meaning of the
Matrix.

The collocation feature in Wmatrix uses software derived from MLCT developed by
Scott Piao.

The C-grams feature uses software developed by Andrew Stone.

Thanks are due to Steve Wattam who ported the semantic tagger, frequency
profiling and concordance software to Linux from Solaris.

Please reference Wmatrix as one of the following:
Rayson, P. (2008). From key words to key semantic domains. International Journal
of Corpus Linguistics. 13:4 pp. 519-549. DOI: 10.1075/ijcl.13.4.06ray
Rayson, P. (2009) Wmatrix: a web-based corpus processing environment, Computing
Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/
Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic
analysis through corpus comparison. Ph.D. thesis, Lancaster University.
(abstract or full text)

All icons and emojis designed by OpenMoji - the open-source emoji and icon
project. License: CC BY-SA 4.0. SQLite icon provided by Wikimedia Commons.

--------------------------------------------------------------------------------


PUBLICATIONS AND APPLICATIONS USING WMATRIX:

Wmatrix has been applied to numerous issues including: systems engineering,
Aspect oriented requirements engineering, impact analysis of academic research,
Ontology learning, Frequency profile comparison of written and spoken English,
Political science research, Corpus stylistics, Training chatbots: comparison of
human-human and human-machine dialogues, Key word analysis, Key word-class
analysis for EAP, Key domain analysis, Phraseology, Comparison of political
party manifestos, Metaphors in political discourse, Analysis of online language,
Discourse analysis, e-learning materials development, modality, Computer content
analysis: analysis of interview transcripts and Entrepreneurship studies and
knowledge transfer.
      Abu Shawar, Bayan; Atwell, Eric. Using dialogue corpora to train a
      chatbot. In Archer, D, Rayson, P, Wilson, A & McEnery, T (editors)
      Proceedings of CL2003: International Conference on Corpus Linguistics, pp.
      681-690 Lancaster University. 2003.
 1.   Bandar Al-Hejin (2014) Covering Muslim women: Semantic macrostructures in
      BBC News. Discourse & Communication. doi: 10.1177/1750481314555262
 2.   Ang, P.S., Kock, Y.L. (2023). Contesting Views in the Representation of
      ICERD Ratification in English Language Newspapers. In: Rajandran, K., Lee,
      C. (eds) Discursive Approaches to Politics in Malaysia. Asia in
      Transition, vol 18. Springer, Singapore.
      https://doi.org/10.1007/978-981-19-5334-7_9
 3.   Archer, D., Culpeper, J. and Rayson, P. (2005) Love - a familiar or a
      devil? An exploration of key domains in Shakespeare's Comedies and
      Tragedies. Presented at the AHRC ICT Methods Network Expert Seminar on
      Linguistics. Lancaster University, 8 September 2005.
 4.   Archer, D. and Malory, B. (2017) Tracing facework over time using
      semi-automated methods. International Journal of Corpus Linguistics,
      Volume 22, Number 1, 2017, pp. 27-56. DOI: 10.1075/ijcl.22.1.02arc
 5.   Giuseppina Balossi (2014) A Corpus Linguistic Approach to Literary
      Language and Characterization Virginia Woolf's The Waves. Benjamins.
 6.   Balossi, G. (2020). Key Pronouns through Wmatrix in a Novel of Formation:
      Conrad's The Shadow-Line. Umanistica Digitale, 5(9), 79-96. DOI:
      10.6092/issn.2532-8816/10542
 7.   Beigman Klebanov, B., Diermeier, D., and Beigman, E. 2008. Automatic
      annotation of semantic fields for political science research. Journal of
      Language Technology and Politics 5(1):95-120.
      http://www.cs.huji.ac.il/~beata/publications.html
 8.   Bianchi F. (2016). "Subtitling Jane Austen: Pride & Prejudice by Joe
      Wright". In Colomba C. (ed.), Pride and Prejudice: A Bicentennial
      Bricolage, ALL, Forum, Udine, pp. 253-265.
 9.   Bianchi F. (2017). The social tricks of advertising. Discourse strategies
      of English-speaking tour operators on Facebook. Iperstoria 10, pp. 3-32.
      http://www.iperstoria.it/joomla/images/PDF/Numero_10/monografica_10/Bianchi_10_2017.pdf
 10.  Bianchi F. (2017). Strategie promozionali degli operatori del lusso in
      Facebook, Lingue e Linguaggi, 20, Special Issue, edited by M.G. Guido,
      "Strategie di comunicazione dei prodotti di lusso attraverso l'inglese
      come 'lingua franca' internazionale. Sostenibilita ed emozioni come leve
      strategiche per lo sviluppo del 'Made in Puglia'", pp. 239-271.
      http://siba-ese.unisalento.it/index.php/linguelinguaggi/article/view/17358/14849
 11.  Borza N. (2021) The Discursive Representation of Violence in the Context
      of the Migration Crisis in Europe: A CDA Case Study on the Discursive
      Support of Non-violence in the Media Reporting on the Chemnitz Events. In:
      Anesa P., Fragonara A. (eds) Discourse Processes between Reason and
      Emotion. Postdisciplinary Studies in Discourse. Palgrave Macmillan, Cham.
      DOI: 10.1007/978-3-030-70091-1_5
 12.  Breeze, R. (2018) Imagining the people in UKIP and Labour. In Hidalgo
      Tenorio, E., Benitez-Castro, M. A., de Cesare, F. (eds). Populist
      Discourse: A Methodological Synergy. London: Routledge (pp. 120-135).
 13.  Breeze, R. (2019) Emotion in politics: Affective-discursive practices in
      UKIP and Labour. Discourse & Society 30 (1), 24-43.
 14.  Calvo Maturana, Ma del Coral. 2012. Maternidad y voces poéticas en 'The
      Adoption Papers' de Jackie Kay: un estudio de estilistica de corpus.
      [Motherhood and poetic voices in 'The Adoption Papers' by Jackie Kay: a
      corpus stylistics study] PhD. Granada: Universidad de Granada.
 15.  Calzada Pérez, Maria. 2010. "Learning from Obama and Clinton: Using
      individuals' corpora in the language classroom". Moreno Jaen et al. (eds)
      Exploring New Paths in Language Pedagogy, London: Equinox. pp. 191-212.
 16.  Caimotto, M. Cristina (2020). Discourses of Cycling, Road Users and
      Sustainability: An Ecolinguistic Investigation. Palgrave Macmillan. (see
      chapter 5)
 17.  Capriello A, Mason P, Davis B, Crotts J. 2013. Farm tourism experiences in
      travel reviews. A cross-comparison of three alternative methods for data
      analysis. Journal of Business Research, 66: 778-785
 18.  Castaneda, A., & Lopez de D'Amico, R. 2012 PODER Y LENGUAJE EN BRUISED
      HIBISCUS, DE ELIZABETH NUNEZ: ANÁLISIS LITERARIO A TRAVÉS DE LA
      HERRAMIENTA INFORMÁTICA WMATRIX. [Power and Language in Elizabeth Nunez's
      Bruised Hibiscus: a literary analysis through the use of WMatrix] Tonos
      Digital [Online] 22:0. Available at
      http://www.tonosdigital.es/ojs/index.php/tonos/article/view/736/512
 19.  Castañeda, R. R. (2015). Land Acquisition and the Semantic Context of Land
      within the Normative Construction of "Modern Development". In E.
      Osabuohien (Ed.), Handbook of Research on In-Country Determinants and
      Implications of Foreign Land Acquisitions (pp. 63-82). Hershey, PA:
      Business Science Reference. doi: 10.4018/978-1-4666-7405-9.ch004
 20.  Chandra, Y. (2016) A rhetoric-orientation view of social entrepreneurship.
      Social Enterprise, 12:2, 161-200. doi: 10.1108/SEJ-02-2016-0003
 21.  Cheng, Le and Cheng Chen. (2019). The Construction of Relational Frame
      Model in Chinese President Xi Jinping's Foreign Visit Speeches, Text &
      Talk 2:149-170. https://doi.org/10.1515/text-2019-2022
 22.  Christos Charitonidis, Awais Rashid, Paul J. Taylor (2017) Predicting
      Collective Action from Micro-Blog Data. In J. Kawash et al. (eds.),
      Prediction and Inference from Social Networks and Social Media, Lecture
      Notes in Social Networks, DOI 10.1007/978-3-319-51049-1_7
 23.  Jonathan Charteris-Black and Clive Seale. (2010). Gender and the language
      of illness. Basingstoke: Palgrave Macmillan.
 24.  Charteris-Black, J., & Seale, C. (2013). Men and emotion talk: Evidence
      from the experience of illness. Gender And Language, 1(1). Retrieved 1
      May, 2013, from
      https://www.equinoxpub.com/journals/index.php/GL/article/view/17190
 25.  Chitchyan, R., Sampaio, A., Rashid, A. and Rayson, P. (2006). Evaluating
      EA-Miner: Are Early Aspect Mining Techniques Effective? In proceedings of
      Towards Evaluation of Aspect Mining (TEAM 2006). Workshop Co-located with
      ECOOP 2006, European Conference on Object-Oriented Programming, 20th
      edition, July 3-7, Nantes, France, pp. 5-8.
 26.  Da Silva AL, Dennick R. Corpus analysis of problem based learning
      transcripts: an exploratory study. Medical education. 2010;44(3):280-8.
 27.  Da Silva AL, Dennick R. 2009 CORPORA ANALYSIS OF PROBLEM-BASED LEARNING
      TRANSCRIPTS. In ASME Annual Scientific Meeting 2009. Edinburgh, UK
 28.  Da Silva AL, Dennick R. 2009 - PBL - "it's all talk". Corpora Analysis of
      Problem Based Learning transcripts. In Association for Medical Education
      in Europe (AMEE) conference 2009. Malaga, Spain
 29.  Da Silva AL, Dennick R. 2010 -Applying corpora research methods to the
      study of Language and Clinical Reasoning in a Problem Based Learning
      Curricula. In Promoting Excellence in Healthcare Educational Research - A
      Multiprofessional Conference. Law and Social Sciences Building University
      of Nottingham, Nottingham
 30.  Da Silva AL, Dennick R 2010 EVALUATING PROBLEM-BASED LEARNING TRANSCRIPTS
      USING CORPUS ANALYSIS: DO MEN AND MACHINES AGREE?. In 14th Ottawa
      Conference. Miami, Florida, US
 31.  Da Silva AL, Dennick R 2010 EVALUATING PROBLEM Corpus Analysis of
      Problem-Based Learning Transcripts: A new method to look into PBL. In
      Researching Medical Education. London, UK
 32.  Da Silva, Wharrad & Pitt., 2011. Interprofessional Learning Sets:
      Exploratory analysis of online students discussions (Poster). In NET
      Conference, 2011. Cambridge, UK.
 33.  Da Silva, & Pitt., 2011. More than words: Analysis of students'
      Interprofessional online discussions. In EIPEN 2011. Ghent, Belgium.
 34.  Davis B, Pope C, Mason P, Magwood G, Jenkins C. 2011. 'It's a wild thing,
      waiting to get me': Stance analysis of African Americans with diabetes.
      Diabetes Educator, 409-418
 35.  Davis B, Maclagan M. 2013. Talking with Maureen: Pauses, extenders, and
      formulaic language in small stories and canonical narratives by a woman
      with dementia. In R. Schrauf and N Mueller, eds. Dialogue and dementia:
      Cognitive and communicative engagement. NY: Psychology Press
 36.  Davis B, Mason P. 2013. Computer-aided identification of stance shifts and
      semantic themes in electronic discourse analysis. In H. Lim & F. Sudweeks,
      eds, Innovative Methods and Technologies for Electronic Discourse
      Analysis. Hershey: ICI.
 37.  Debras, C. and L'Hôte, E. (2015) Framing, metaphor and dialogue: A
      multimodal approach to party conference speeches. Metaphor and the Social
      World 5:2 (2015), 177-204. doi 10.1075/msw.5.2.01deb
 38.  Marilyn Deegan, Harold Short, Dawn Archer, Paul Baker, Tony McEnery, Paul
      Rayson (2004) Computational Linguistics Meets Metadata, or the Automatic
      Extraction of Key Words from Full Text Content. RLG Diginews, Vol. 8, No.
      2. ISSN 1093-5371.
 39.  Demjén, Z. (2011) The role of second person narration in representing
      mental states in Sylvia Plath's Smith Journal. Journal of Literary
      Semantics. 40(1), pp1-22.
 40.  Doherty, N., Lockett, N., Rayson, P. and Riley, S. (2006). Electronic-CRM:
      a simple sales tool or facilitator of relationship marketing? 29th
      Institute for Small Business & Entrepreneurship Conference. International
      Entrepreneurship - from local to global enterprise creation and
      development. 31 October - 2 November 2006, Cardiff-Caerdydd, UK.
 41.  Escobar, W. (2015). Language configurations in the spoken production of
      Colombian EFL university students. Colomb. Appl. Linguist. J., 17(1), pp.
      114-129 DOI: 10.14483/udistrital.jour.calj.2015.1.a08
 42.  FBI Law Enforcement Bulletin (July 2012) The Language of Psychopaths: New
      Findings and Implications for Law Enforcement. By Michael Woodworth,
      Ph.D.; Jeffrey Hancock, Ph.D.; Stephen Porter, Ph.D.; Robert Hare, Ph.D.;
      Matt Logan, Ph.D.; Mary Ellen O'Toole, Ph.D.; and Sharon Smith, Ph.D.
      https://leb.fbi.gov/articles/featured-articles/the-language-of-psychopaths-new-findings-and-implications-for-law-enforcement
 43.  Gabrielatos, C. and McEnery, T. (2005). Epistemic modality in MA
      dissertations. In. Fuertes Olivera, P.A. (ed.) Lengua y Sociedad:
      Investigaciones recientes en linguistica aplicada. Linguistica y Filologia
      no. 61. Valladolid: Universidad de Valladolid, pp. 311-331.
 44.  Gacitua, R., Sawyer, P., Rayson, P. (2008). A flexible framework to
      experiment with ontology learning techniques. In Knowledge-Based Systems,
      21, 3, April 2008, pp. 192-199. DOI: 10.1016/j.knosys.2007.11.009
 45.  Jeffrey T. Hancock, Michael T. Woodworth and Stephen Porter (2013) Hungry
      like the wolf: A word-pattern analysis of the language of psychopaths.
      Legal and Criminological Psychology. Volume 18, Issue 1, pages 102-114.
      http://dx.doi.org/10.1111/j.2044-8333.2011.02025.x
 46.  Hayes, N. and Poole, R. (2022) A diachronic corpus-assisted semantic
      domain analysis of US presidential debates. Corpora, 17, 3, December 2022,
      pp. 449-469. DOI: 10.3366/cor.2022.0266
 47.  He, J. (2019) Two-layer reading positions in comments on online news
      discourse about China. Discourse & Communication.
      https://doi.org/10.1177/1750481319856206
 48.  Hidalgo-Downing, Laura and Yasra Hanawi (2017) Bush's and Obama's
      addresses to the Arab World: recontextualizing stance in political
      discourse. In Karin Ajmer & Diana Lewis (eds.) The Yearbook of Corpus
      Linguistics and Pragmatics. Special Issue on 'Contrastive Analysis of
      Discourse -pragmatic Aspects of Linguistic Genres'.
 49.  Hidalgo-Downing, Laura (2014) The role of negative-modal synergies in
      Charles Darwin's The Origin of Species. In Geoff Thompson and Laura Alba
      Juez (eds.) Evaluation in Discourse. John Benjamins. Pps. 259-279.
 50.  Hidalgo-Tenorio E. (2009) The Metaphorical Construction of Ireland. In:
      Ahrens K. (eds) Politics, Gender and Conceptual Metaphors. Palgrave
      Macmillan, London. https://doi.org/10.1057/9780230245235_6
 51.  Yufang Ho. (2007) Investigating the key concept differences between the
      two editions of John Fowles's The Magus - a corpus semantic approach. The
      27th International Conference of the Poetics and Linguistics Association
      (PALA), Kansai Gaidai University, Hirakata, Osaka, Japan, 31 July - 4
      August 2007.
 52.  Hou, Z. (2019) Using semantic tagging to examine the American Dream and
      the Chinese Dream. Semiotica (227), pp. 145-168. DOI:
      10.1515/sem-2016-0116
 53.  Hu, C. (2015) Using Wmatrix to Explore Discourse of Economic Growth.
      English Language Teaching, Vol. 8, No. 9. DOI: 10.5539/elt.v8n9p146
 54.  Xin Huang (2003) A Computer-aided Diachronic Content Analysis of Twentieth
      Century Political Discourse in China. MA dissertation in Language Studies,
      Lancaster University.
 55.  Irwin, P.M. (2015). The development of resilience in two cohorts of older,
      single women, living on their own, in a small rural town in Australia.
      (Unpublished doctoral dissertation). University of Oxford, Oxford, UK.
 56.  Isaacs T, Murdoch J, Demjén Z, Stevenson F. (2020) Examining the language
      demands of informed consent documents in patient recruitment to cancer
      trials using tools from corpus and computational linguistics. Health.
      10.1177/1363459320963431
 57.  Jones, M., Rayson, P. and Leech, G. (2004) Key category analysis of a
      spoken corpus for EAP. Presented at The 2nd Inter-Varietal Applied Corpus
      Studies (IVACS) International Conference on "Analyzing Discourse in
      Context" The Graduate School of Education, Queen's University, Belfast,
      Northern Ireland, 25 - 26 June, 2004.
 58.  Kheovichai, B. (2015). Metaphorical scenarios in business science
      discourse. Iberica, 29, 155-178. Available from
      http://www.aelfe.org/documents/09_IBERICA_29.pdf
 59.  Emilie L'Hôte and Maarten Lemmens (2009) Reframing treason: metaphors of
      change and progress in new Labour discourse. CogniTextes, Volume 3,
      http://cognitextes.revues.org/index248.html
 60.  Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written
      and Spoken English: based on the British National Corpus. Longman, London.
      (see the companion website for more details)
 61.  Leech, G. (2013) Virginia Woolf meets Wmatrix. Etudes de Stylistique
      Anglaise No. 4, pp. 15-26.
 62.  Leedham, M., Lillis, T. & Twiner, A. (2020). Exploring the core
      'preoccupation' of social work writing: A corpus-assisted discourse study.
      Journal of Corpora and Discourse Studies. 3. Pp.1-26.
      https://jcads.cardiffuniversitypress.org/articles/abstract/26/
 63.  Leedham, M. (2020). 'Social workers dismissed concerns': A corpus-assisted
      discourse study of the portrayal of a profession in UK newspapers. In:
      Corpus Assisted Discourse Studies (CADS) Conference, 17-19 June 2020
      (online). University of Sussex.
 64.  Leedham, M.; Lillis, T. and Twiner, A. (2019). Exploring the core
      'preoccupation' of social work writing: A corpus-assisted discourse study.
      In: International Corpus Linguistics Conference, 23-26 July 2019. Cardiff
      University.
 65.  Lin, Y-L. (2015) Contrastive analysis of adolescent learner interlanguage
      in asynchronous online communication: A keyness approach. System. Volume
      55, December 2015, Pages 53-62. DOI: 10.1016/j.system.2015.08.011
 66.  Lin, Y-L. (2017) Keywords, semantic domains and intercultural competence
      in the British and Taiwanese Teenage Intercultural Communication Corpus.
      Corpora, Volume 12 Issue 2, Page 279-305. DOI: 10.3366/cor.2017.0119
 67.  López-Rodríguez, C. I. (2022). Emotion at the end of life: Semantic
      annotation and key domains in a pilot study audiovisual corpus. Lingua,
      277, 103401. DOI: 10.1016/j.lingua.2022.103401
 68.  Lord V, Davis B, Mason P. 2008. Stance-shifting in language used by sex
      offenders. Psychology, Crime & Law 14, 357-379.
 69.  MacArthur, F., Krennmayr, T. and Littlemore, J. (2015). How basic is
      UNDERSTANDING IS SEEING when reasoning about knowledge? Asymmetric uses of
      SIGHT metaphors in office hours' consultations in English as academic
      lingua franca. Metaphor and Symbol 30 (3): 184-217. DOI:
      10.1080/10926488.2015.1049507
 70.  Maclagan M, Davis B, Lunsford R. 2008. Fixed expressions, extenders and
      metonymy in the speech of people with Alzheimer's disease. In Phraseology:
      an interdisciplinary perspective, eds. S. Granger & F. Meunier. Amsterdam
      & NY: John Benjamins.
 71.  Patrick Maiwald (2011). Exploring a Corpus of George MacDonald's Fiction.
      North Wind: Journal of George MacDonald Studies 30: 50-84. Available here:
      http://www.snc.edu/english/documents/North_Wind/By_genre_or_topic/Language/Exploring_a_Corpus_of_George_MacDonald%27s_Fiction_-_Patrick_Maiwald.pdf
 72.  Markowitz DM, Hancock JT (2014) Linguistic Traces of a Scientific Fraud:
      The Case of Diederik Stapel. PLoS ONE 9(8): e105937.
      doi:10.1371/journal.pone.0105937
 73.  McIntyre, D. and Walker, B. (2010) 'How can corpora be used to explore the
      language of poetry and drama?' in McCarthy, M. and O'Keefe, A. (eds) The
      Routledge Handbook of Corpus Linguistics. Abingdon: Routledge.
 74.  Afida Mohamad Ali (2007). Semantic fields of problem in business English:
      Malaysian and British journalistic business texts. Corpora, 2, 2, pp.
      211-239.
 75.  Akira Murakami, Paul Thompson, Susan Hunston and Dominik Vajn (2017)
      ‘What is this corpus about?’: using topic modelling to explore a
      specialised corpus. Corpora, Volume 12 Issue 2, Page 243-277. DOI:
      10.3366/cor.2017.0118
 76.  Murphy, S. (2007). Now I am alone: A corpus stylistic approach to
      Shakespearian soliloquies. Papers from the Lancaster University
      Postgraduate Conference in Linguistics & Language Teaching, Vol. 1. Papers
      from LAEL PG 2006 Edited by Costas Gabrielatos, Richard Slessor & J.W.
      Unger.
 77.  Manvender Kaur Sarjit Singh, N. H. D. (2020). Automated Detecting of Key
      concept of Kurdish National Identity for Discourse-Historical Approach
      (DHA). International Journal of Advanced Science and Technology, 29(3s),
      404 - 418. Retrieved from
      http://sersc.org/journals/index.php/IJAST/article/view/5618
 78.  Nakano, T. and Koyama, Y. (2005). e-Learning Materials Development Based
      on Abstract Analysis Using Web Tools. Knowledge-Based Intelligent
      Information and Engineering Systems. 9th International Conference, KES
      2005, Melbourne, Australia, September 14-16, 2005, Proceedings, Part I,
      LNCS 3681, Springer, pp. 794-800. DOI 10.1007/11552413_113
 79.  Newman, J. and K. Geeraert. 2014. TIME in a semantically annotated corpus
      of Canadian English. In B. Lewandowska-Tomaszczyk and K. Kosecki (eds.),
      Time and Temporality in Language and Human Experience, pp. 241-262. Lodz
      Studies in Language 32. Frankfurt a. M.: Peter Lang.
 80.  O'Halloran, K.A. (2010) 'Critical reading of a text through its electronic
      supplement', Digital Culture and Education, 2(2): 210-229.
      http://www.digitalcultureandeducation.com/cms/wp-content/uploads/2011/06/DCE1022_ohalloran_2010.pdf
 81.  O'Halloran, K.A. (2011a) 'Limitations of the logico-rhetorical module:
      Inconsistency in argument, online discussion forums and Electronic
      Deconstruction', Discourse Studies, 13(6): 797-806.
 82.  O'Halloran, K.A. (2011b) 'Investigating Argumentation in Reading Groups:
      Combining Manual Qualitative Coding and Automated Corpus Analysis Tools',
      Applied Linguistics 32(2): 172-196.
 83.  O'Halloran, K. (2012) Deleuze, Guattari and the use of web-based corpora
      for facilitating critical analysis of public sphere arguments. Discourse,
      Context & Media. Volume 2, Issue 1, March 2013, Pages 40-51, ISSN
      2211-6958, DOI: 10.1016/j.dcm.2012.12.001.
 84.  O'Halloran, K.A. (2014) Deconstructing arguments via digital mining of
      online comments. Literary and Linguistic Computing, DOI:
      10.1093/llc/fqu034
 85.  O'Halloran, K.A. (2017) Posthumanism and Deconstructing Arguments: Corpora
      and Digitally-driven Critical Analysis, London: Routledge.
 86.  O'Halloran, K.A. (2019) A posthumanist pedagogy using digital text
      analysis to enhance critical thinking in higher education. Digital
      Scholarship in the Humanities. DOI: 10.1093/llc/fqz060
 87.  Vincent B.Y. Ooi, Peter K.W. Tan & Andy K.L. Chiang (2007) Analyzing
      personal weblogs in Singapore English: the Wmatrix approach. Studies in
      Variation, Contacts and Change in English. Volume 2. Research Unit for
      Variation, Contacts and Change in English (VARIENG), University of
      Helsinki. http://www.helsinki.fi/varieng/journal/volumes/02/ooi_et_al/
 88.  Vincent B.Y. Ooi (2008) The lexis of electronic gaming on the Web: a
      Sinclairian approach, International Journal of Lexicography, 21 (3),
      311-323. doi: 10.1093/ijl/ecn021
 89.  Parkinson, C. and Howorth, C. (2008) 'The language of social
      entrepreneurs', Entrepreneurship and Regional Development, 20(3): 285-309.
      http://dx.doi.org/10.1080/08985620701800507
 90.  Magali Paquot, Sylviane Granger, Paul Rayson and Cédrick Fairon (2004)
      Extraction of multi-word units from EFL and native English corpora: The
      phraseology of the verb 'make'. Presented at Europhras, European Society
      of Phraseology, 26-29 August 2004, Basel, Switzerland.
 91.  Pérez-Paredes, P. (2017). A Keyword Analysis of the 2015 UK Higher
      Education Green Paper and the Twitter Debate. In Power, persuasion and
      manipulation in specialised genres: providing keys to the rhetoric of
      professional communities. Bern: Peter Lang.
 92.  Pérez-Paredes, P. & Díez-Bedmar, B. (2018) Researching learner language
      through POS Keyword and syntactic complexity analyses. In S. Götz and J.
      Mukherjee (eds.) Learner Corpora and Language Teaching. Studies in Corpus
      Linguistics Series. Amsterdam: John Benjamins.
 93.  Potts, A. (2015). Filtering the Flood: Semantic Tagging as a Method of
      Identifying Salient Discourse Topics in a Large Corpus of Hurricane
      Katrina Reportage. In Paul Baker and Tony McEnery (eds.) Corpora and
      Discourse Studies, pp. 285-304. doi: 10.1057/9781137431738.0018
 94.  Potts, A. and Baker, P. (2012) Does semantic tagging identify cultural
      change in British and American English?, International Journal of Corpus
      Linguistics 17(3): 295-324.
 95.  Potts, A. and Kjær, A.L. (2015) Constructing Achievement in the
      International Criminal Tribunal for the Former Yugoslavia (ICTY): A
      Corpus-Based Critical Discourse Analysis. International Journal for the
      Semiotics of Law. doi: 10.1007/s11196-015-9440-y
 96.  Amanda Potts, Monika Bednarek, Helen Caple (2015) How can computer-based
      methods help researchers to investigate news values in large datasets? A
      corpus linguistic study of the construction of newsworthiness in the
      reporting on Hurricane Katrina. Discourse & Communication, Vol 9, Issue 2,
      pp. 149 - 172. DOI: 10.1177/1750481314568548
 97.  Paul Rayson (2004). Keywords are not enough. Invited talk for JAECS (Japan
      Association for English Corpus Studies) at Chuo University, Tokyo, Japan,
      27th November 2004. (slides)
 98.  Rayson, P. and Smith, N. (2006) The key domain method for the study of
      language varieties. The Third Inter-Varietal Applied Corpus Studies
      (IVACS) group International Conference on "LANGUAGE AT THE INTERFACE".
      University of Nottingham, UK, 23-24 June 2006.
 99.  Sawyer, P., Rayson, P. and Cosh, K. (2005) Shallow Knowledge as an Aid to
      Deep Understanding in Early Phase Requirements Engineering. IEEE
      Transactions on Software Engineering. Volume 31, number 11, November,
      2005, pp. 969 - 981. ISSN 0098-5589.
      doi: http://doi.ieeecomputersociety.org/10.1109/TSE.2005.129
 100. Sera, H. (2013). Depictions of emotions in Snow Country: A semantic
      analysis. Presented at PALA 2013, Heidelberg.
 101. Sera, H. (2012). Dickens' 'The Signal-Man' and Poe's 'The Fall of the
      House of Usher': How did they describe terror? Presented at PALA 2012,
      Malta.
 102. Shapero, J. J. (2011). The Language of Suicide Notes. Unpublished Thesis.
      The University of Birmingham. http://etheses.bham.ac.uk/1525/
 103. Shapero, J. J. & Blackwell, Susan A. (2012) "'There are letters for you
      all on the sideboard': what can linguists learn from multiple suicide-note
      writers?" p.225-244. In Samuel Tomblin, Nicci MacLeod, Rui Sousa-Silva and
      Malcolm Coulthard (Eds.) Proceedings of The International Association of
      Forensic Linguists' Tenth Biennial Conference. Centre for Forensic
      Linguistics, Aston University, U.K. [ISBN: 978 1 85449 432 0]
      www.forensiclinguistics.net/iafl-10-proceedings.pdf
 104. Song, Y., Lee, CC., Huang, Z. (2019). The news prism of nationalism versus
      globalism: How does the US, UK and Chinese elite press cover 'China's
      rise'?. Journalism, 1-20. First published online on May 8, 2019.
      https://doi.org/10.1177/1464884919847143
 105. Emily C. Soriano (2014) A corpus linguistics approach to exploring
      interpersonal processes in couple-focused therapy for problematic alcohol
      use. Thesis for Master of Experimental Psychology, University of Arizona.
 106. Emily C. Soriano, Kelly E. Rentscher, Michael J. Rohrbaugh and Matthias R.
      Mehl (2016) A Semantic Corpus Comparison Analysis of Couple-Focused
      Interventions for Problematic Alcohol Use. Clinical Psychology and
      Psychotherapy. DOI: 10.1002/cpp.2030
 107. M Stubbs (2014) Patterns of emotive lexis and discourse organization in
      short stories by James Joyce. In P Blumenthal et al eds. Les émotions dans
      le discours. Emotions in Discourse. Frankfurt/Main: Peter Lang. 237-53.
 108. Francois Taiani, Paul Grace, Geoff Coulson and Gordon Blair (2008) Past
      and future of reflective middleware: Towards a corpus-based impact
      analysis. The 7th Workshop On Adaptive And Reflective Middleware (ARM'08)
      December 1st 2008, Leuven, Belgium, collocated with Middleware 2008.
 109. Tan, Yesheng. A Corpus-based Cognitive Study of the "Rustic Literariness"
      of Translated Chinese Fiction: Focusing on Sinologist Translators' Works
      in the Last Four Decades. In Riccardo Moratto (ed.) Diverse Voices in
      Chinese Translation and Interpreting, Springer, 2021. DOI:
      10.1007/978-981-33-4283-5_6
 110. Trotta, J. (2019). What can a corpus tell us about apocalyptic/dystopian
      texts? In J. Trotta, Z. Filipovic, & H. Sadri (eds.), Broken mirrors:
      Representations of Apocalypses and Dystopias in Popular Culture. London:
      Routledge, pp. 179-201.
 111. Van de Putte, Thomas. (2017) European citizenship policy between building
      collectives and appealing to individuals: A study of person deixis.
      Flubacher, Mi-Chia; Diederich, Catherine; Dankel, Philip - Bulletin
      VALS-ASLA, 2016, vol. 104, p. 105-123. Swiss Association of Applied
      Linguistics (Vereinigung für angewandte Linguistik in der Schweiz
      VALS-ASLA) http://doc.rero.ch/record/289057
 112. Walker, B. (2010) Wmatrix, key-concepts and the narrators in Julian
      Barnes' Talking It Over. In Busse, B. and McIntyre, D. (eds.) Language and
      Style, pp. 364-387.
 113. Walker, B. (2012). Character and Characterisation in Julian Barnes'
      Talking It Over: A Corpus Stylistic Analysis. PhD Thesis, Lancaster
      University.
 114. Walkerdine, J. and Rayson, P. (2004) P2P-4-DL: Digital Library over
      Peer-to-Peer. In Caronni G., Weiler N., Shahmehri N. (eds.) Proceedings of
      Fourth IEEE International Conference on Peer-to-Peer Computing (P2P2004)
      25-27 August 2004, Zurich, Switzerland. IEEE Computer Society Press, pp.
      264-265. ISBN 0-7695-2156-8.
 115. Rebecca Willis (2017) Taming the Climate? Corpus analysis of politicians'
      speech on climate change, Environmental Politics, 26:2, 212-231. DOI:
      10.1080/09644016.2016.1274504
 116. Wong, I., Ou, J. and Wilson, A. (2021) Evolution of hoteliers'
      organizational crisis communication in the time of mega disruption.
      Tourism Management. DOI: 10.1016/j.tourman.2020.104257
 117. Xin, Jing and Matheson, Donald (2015) The Chinese writer as empty
      signifier: A corpus-based analysis of the English-language reporting of
      the 2012 Nobel Prize in Literature. Chinese Journal of Communication 8
      (3): 289-305. DOI: 10.1080/17544750.2015.1051070
 118. A number of papers were presented at the PALA 2007 conference (29-30 July
      2007, Kansai Gaidai University, Osaka, Japan) including those by Geoffrey
      Leech, Yu-fang Ho, Dan McIntyre, Haruko Sera and Brian Walker. Mick Short and
      Brian Walker also ran a Workshop: Using Wmatrix to compare scenes from
      Harold Pinter's Betrayal. See the book of abstracts on the conference
      website for more details.
 119. EPSRC InfoLab21 Knowledge Transfer Study Report and the ICT Knowledge
      Transfer Research Project