Can OpenOffice count words from the console?

Stack Overflow user
Asked 2013-02-28 11:37:54
Answers: 5, Views: 1.4K, Followers: 0, Votes: 5

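I have a small problem: I need to count words from the console for doc, docx, pptx, ppt, xls, xlsx, odt and pdf files. Please don't recommend | wc -w or grep: they only handle plain text or console output, and they only split on spaces, whereas Japanese, Chinese, Arabic, Hindi and Hebrew use different delimiters, so the word count comes out wrong. I tried counting with these commands: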

pdftotext file.pdf -| wc -w
/usr/local/bin/docx2txt.pl < file.docx | wc -w
/usr/local/bin/pptx2txt.pl < file.pptx | wc -w
antiword file.doc -| wc -w 
antiword file.word -| wc -w

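In some cases Microsoft Word and OpenOffice say 1000 words, while these counters return 10 or 300 words when the language is Japanese, Chinese, Hindi and so on. With ordinary (Latin) characters I have no problem: the biggest error in some cases is 3 characters fewer, which is OK.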
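To illustrate the discrepancy described above: whitespace-based counting treats an unbroken run of Japanese or Chinese text as a single word. The sketch below is only an illustration under stated assumptions, not what Word or OpenOffice do internally. It counts each character in a few common CJK Unicode ranges as one word, and every other whitespace-delimited run as one word. The function names and the chosen ranges are hypothetical.

```python
import re

# Assumed ranges: Hiragana, Katakana, CJK Unified Ideographs.
# Real word processors use more elaborate, locale-aware rules.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def whitespace_count(text: str) -> int:
    """Count words the way wc -w does: runs separated by whitespace."""
    return len(text.split())


def cjk_aware_count(text: str) -> int:
    """Count each CJK character as a word, plus the remaining whitespace-delimited runs."""
    cjk_chars = len(CJK_CHAR.findall(text))
    # Replace CJK characters with spaces so they also act as separators.
    remainder = CJK_CHAR.sub(" ", text)
    return cjk_chars + len(remainder.split())


sample = "東京は大きい都市です"
print(whitespace_count(sample))  # 1
print(cjk_aware_count(sample))   # 10
```

Arabic, Hindi and Hebrew are space-delimited in ordinary text, so this particular sketch leaves them to the whitespace rule; punctuation handling is deliberately left out.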

I tried converting with soffice / openoffice and then running wc -w, but I could not even get the conversion to work:

soffice --headless --nofirststartwizard --accept=socket,host=127.0.0.1,port=8100; --convert-to pdf some.pdf /var/www/domains/vocabridge.com/devel/temp_files/23/0/东京_1000_words_Docx.docx 

 openoffice.org  --headless  --convert-to  ........

openoffice.org3 --invisible 

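So if anyone knows a way to count words correctly, or to display document statistics, using OpenOffice, another tool, or the Linux console, please share it.

Thanks.

5 Answers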

Stack Overflow user

Accepted answer

Posted 2013-04-15 17:47:48

The answer I found was to create a service:

#!/bin/sh
#
# chkconfig: 345 99 01
#
# description: your script is a test service
#

# Poll the input directory once per second and convert every file to PDF.
(while sleep 1; do
  ls pathwithfiles/in | while read -r file; do
    libreoffice --headless --convert-to pdf "pathwithfiles/in/$file" --outdir pathwithfiles/out
    rm "pathwithfiles/in/$file"
  done
done) &
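
The conversion step in the service loop above can also be driven from a script. A minimal sketch in Python, assuming soffice is on the PATH and that converting straight to plain text with the txt:Text filter is acceptable for your documents (the answer itself goes through PDF instead). The function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_convert_command(src: str, outdir: str, target: str = "txt:Text") -> list:
    """Build the headless conversion command line (nothing is executed here)."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", outdir, src]


def convert(src: str, outdir: str, target: str = "txt:Text") -> Path:
    """Run the conversion and return the path of the expected output file."""
    subprocess.run(build_convert_command(src, outdir, target), check=True)
    # The output is named after the input file, with the new extension.
    extension = target.split(":", 1)[0]
    return Path(outdir) / (Path(src).stem + "." + extension)


print(build_convert_command("report.docx", "out"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or non-ASCII characters.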

Then the PHP script I needed did all the counting:

// Body of a function that receives $absolute_file_path and returns the count.
$ext = pathinfo($absolute_file_path, PATHINFO_EXTENSION);
if ($ext !== 'txt' && $ext !== 'pdf') {
    // Convert to PDF: drop the file into the service's input directory
    // and wait for the converted file to appear in the output directory.
    $tb = time() . mt_rand();
    $tempfile = 'locationofpdfs/in/' . $tb . '.' . $ext;
    copy($absolute_file_path, $tempfile);
    $absolute_file_path = 'locationofpdfs/out/' . $tb . '.pdf';
    $ext = 'pdf';
    while (!is_file($absolute_file_path)) sleep(1);
}
if ($ext !== 'txt') {
    // Convert to plain text with pdftotext
    $tempfile = tempnam(sys_get_temp_dir(), '');
    shell_exec('pdftotext ' . escapeshellarg($absolute_file_path) . ' ' . escapeshellarg($tempfile));
    $absolute_file_path = $tempfile;
    $ext = 'txt';
}
if ($ext === 'txt') {
    // Separators: whitespace and common punctuation
    $seq = '/[\s\.,;:!\? ]+/mu';
    $plain = file_get_contents($absolute_file_path);
    // Strip {{{...}}} blocks before counting
    $plain = preg_replace('#\{{{.*?\}}}#su', "", $plain);
    $str = preg_replace($seq, '', $plain);
    $chars = count(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));
    $words = count(preg_split($seq, $plain, -1, PREG_SPLIT_NO_EMPTY));
    if ($words === 0) return $chars;
    // Very long "words" suggest unsegmented text (e.g. CJK): count characters instead
    if ($chars / $words > 10) $words = $chars;
    return $words;
}
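
For readers who prefer Python, the counting heuristic at the end of the PHP code can be sketched roughly as follows. This is a loose port, not the author's code: the names are made up, and the separator set is assumed to be whitespace plus the same punctuation characters.

```python
import re

# Assumed equivalents of the PHP patterns above.
SEPARATORS = re.compile(r"[\s.,;:!?]+")
TEMPLATE_MARKUP = re.compile(r"\{\{\{.*?\}\}\}", re.DOTALL)


def count_words(plain: str) -> int:
    """Count words; fall back to characters when 'words' are implausibly long."""
    plain = TEMPLATE_MARKUP.sub("", plain)
    # Characters left once all separators are removed.
    chars = len(SEPARATORS.sub("", plain))
    # Non-empty pieces between separators.
    words = len([piece for piece in SEPARATORS.split(plain) if piece])
    if words == 0:
        return chars
    # An average above 10 characters per word suggests unsegmented text.
    if chars / words > 10:
        return chars
    return words


print(count_words("one two, three."))
print(count_words("東京は日本の首都であり大きい都市です"))
```

Note the threshold is on the average over the whole text, so a short unsegmented sentence of 10 characters or fewer, or a document mixing Latin and CJK text, can still be counted by words rather than by characters.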
Votes: 1

Stack Overflow user

Posted 2013-04-14 15:40:26

If you have Microsoft Word (and, of course, Windows), you can write a VBA macro, or, if you want to run directly from the command line, a VBScript script along these lines:

Set wordApp = CreateObject("Word.Application")
Set doc = ... ' open up a Word document using wordApp
docWordCount = doc.Words.Count
' Rinse and repeat...

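If you have OpenOffice.org/LibreOffice, you have similar (but more) options. If you want to stay inside the office application and run a macro, you can probably do that. I don't know Python well enough to tell you how to use it, but I can offer another option: create a StarBasic script to get the word count from the command line. Roughly, you could do the following: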

Votes: 2

Stack Overflow user

Posted 2013-03-08 21:36:24

I think this might achieve what you are after.

# Continuously updating word count
import unohelper, uno, os, time
from com.sun.star.i18n.WordType import WORD_COUNT
from com.sun.star.i18n import Boundary
from com.sun.star.lang import Locale
from com.sun.star.awt import XTopWindowListener

#socket = True
socket = False
localContext = uno.getComponentContext()

if socket:
    resolver = localContext.ServiceManager.createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', localContext)
    ctx = resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
else: ctx = localContext

smgr = ctx.ServiceManager
desktop = smgr.createInstanceWithContext('com.sun.star.frame.Desktop', ctx)

waittime = 1 # seconds

def getWordCountGoal():
    doc = XSCRIPTCONTEXT.getDocument()
    retval = 0

    # Only if the field exists
    if doc.getTextFieldMasters().hasByName('com.sun.star.text.FieldMaster.User.WordCountGoal'):
        # Get the field
        wordcountgoal = doc.getTextFieldMasters().getByName('com.sun.star.text.FieldMaster.User.WordCountGoal')
        retval = wordcountgoal.Content

    return retval

goal = getWordCountGoal()

def setWordCountGoal(goal):
    doc = XSCRIPTCONTEXT.getDocument()

    if doc.getTextFieldMasters().hasByName('com.sun.star.text.FieldMaster.User.WordCountGoal'):
        wordcountgoal = doc.getTextFieldMasters().getByName('com.sun.star.text.FieldMaster.User.WordCountGoal')
        wordcountgoal.Content = goal

    # Refresh the field if inserted in the document from Insert > Fields >
    # Other... > Variables > Userdefined fields
    doc.TextFields.refresh()

def printOut(txt):
    if socket: print(txt)
    else:
        model = desktop.getCurrentComponent()
        text = model.Text
        cursor = text.createTextCursorByRange(text.getEnd())
        text.insertString(cursor, txt + '\r', 0)

def hotCount(st):
    '''Counts the number of words in a string.

    ARGUMENTS:

    str st: count the number of words in this string

    RETURNS:

    int: the number of words in st'''
    startpos = 0
    nextwd = Boundary()
    lc = Locale()
    lc.Language = 'en'
    numwords = 1
    mystartpos = 1
    brk = smgr.createInstanceWithContext('com.sun.star.i18n.BreakIterator', ctx)
    nextwd = brk.nextWord(st, startpos, lc, WORD_COUNT)
    while nextwd.startPos != nextwd.endPos:
        numwords += 1
        nw = nextwd.startPos
        nextwd = brk.nextWord(st, nw, lc, WORD_COUNT)

    return numwords

def updateCount(wordCountModel, percentModel):
    '''Updates the GUI.
    Updates the word count and the percentage completed in the GUI. If some
    text of more than one word is selected (including in multiple selections by
    holding down the Ctrl/Cmd key), it updates the GUI based on the selection;
    if not, on the whole document.'''

    model = desktop.getCurrentComponent()
    try:
        if not model.supportsService('com.sun.star.text.TextDocument'):
            return
    except AttributeError: return

    sel = model.getCurrentSelection()
    try: selcount = sel.getCount()
    except AttributeError: return

    if selcount == 1 and sel.getByIndex(0).getString() == '':
        selcount = 0

    selwords = 0
    for nsel in range(selcount):
        thisrange = sel.getByIndex(nsel)
        atext = thisrange.getString()
        selwords += hotCount(atext)

    if selwords > 1: wc = selwords
    else:
        try: wc = model.WordCount
        except AttributeError: return
    wordCountModel.Label = str(wc)

    if goal != 0:
        pc_text =  100 * (wc / float(goal))
        #pc_text = '(%.2f percent)' % (100 * (wc / float(goal)))
        percentModel.ProgressValue = pc_text
    else:
        percentModel.ProgressValue = 0

# This is the user interface bit. It looks more or less like this:

###############################
# Word Count            _ o x #
###############################
#            _____            #
#     451 /  |500|            #
#            -----            #
# ___________________________ #
# |##############           | #
# --------------------------- #
###############################

# The boxed `500' is the text entry box.

class WindowClosingListener(unohelper.Base, XTopWindowListener):
    def __init__(self):
        global keepGoing

        keepGoing = True
    def windowClosing(self, e):
        global keepGoing

        keepGoing = False
        setWordCountGoal(goal)
        e.Source.setVisible(False)

def addControl(controlType, dlgModel, x, y, width, height, label, name = None):
    control = dlgModel.createInstance(controlType)
    control.PositionX = x
    control.PositionY = y
    control.Width = width
    control.Height = height
    if controlType == 'com.sun.star.awt.UnoControlFixedTextModel':
        control.Label = label
    elif controlType == 'com.sun.star.awt.UnoControlEditModel':
        control.Text = label
    elif controlType == 'com.sun.star.awt.UnoControlProgressBarModel':
        control.ProgressValue = label

    if name:
        control.Name = name
        dlgModel.insertByName(name, control)
    else:
        control.Name = 'unnamed'
        dlgModel.insertByName('unnamed', control)

    return control

def loopTheLoop(goalModel, wordCountModel, percentModel):
    global goal

    while keepGoing:
        try: goal = int(goalModel.Text)
        except: goal = 0
        updateCount(wordCountModel, percentModel)
        time.sleep(waittime)

if not socket:
    import threading
    class UpdaterThread(threading.Thread):
        def __init__(self, goalModel, wordCountModel, percentModel):
            threading.Thread.__init__(self)

            self.goalModel = goalModel
            self.wordCountModel = wordCountModel
            self.percentModel = percentModel

        def run(self):
            loopTheLoop(self.goalModel, self.wordCountModel, self.percentModel)

def wordCount(arg = None):
    '''Displays a continuously updating word count.'''
    dialogModel = smgr.createInstanceWithContext('com.sun.star.awt.UnoControlDialogModel', ctx)

    dialogModel.PositionX = XSCRIPTCONTEXT.getDocument().CurrentController.Frame.ContainerWindow.PosSize.Width / 2.2 - 105
    dialogModel.Width = 100
    dialogModel.Height = 30
    dialogModel.Title = 'Word Count'

    lblWc = addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 6, 2, 25, 14, '', 'lblWc')
    lblWc.Align = 2 # Align right
    addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 33, 2, 10, 14, ' / ')
    txtGoal = addControl('com.sun.star.awt.UnoControlEditModel', dialogModel, 45, 1, 25, 12, '', 'txtGoal')
    txtGoal.Text = goal

    #addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 6, 25, 50, 14, '(percent)', 'lblPercent')

    ProgressBar = addControl('com.sun.star.awt.UnoControlProgressBarModel', dialogModel, 6, 15, 88, 10,'' , 'lblPercent')
    ProgressBar.ProgressValueMin = 0
    ProgressBar.ProgressValueMax =100
    #ProgressBar.Border = 2
    #ProgressBar.BorderColor = 255
    #ProgressBar.FillColor = 255
    #ProgressBar.BackgroundColor = 255

    addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 124, 2, 12, 14, '', 'lblMinus')

    controlContainer = smgr.createInstanceWithContext('com.sun.star.awt.UnoControlDialog', ctx)
    controlContainer.setModel(dialogModel)

    controlContainer.addTopWindowListener(WindowClosingListener())
    controlContainer.setVisible(True)
    goalModel = controlContainer.getControl('txtGoal').getModel()
    wordCountModel = controlContainer.getControl('lblWc').getModel()
    percentModel = controlContainer.getControl('lblPercent').getModel()
    ProgressBar.ProgressValue = percentModel.ProgressValue

    if socket:
        loopTheLoop(goalModel, wordCountModel, percentModel)
    else:
        uthread = UpdaterThread(goalModel, wordCountModel, percentModel)
        uthread.start()

keepGoing = True
if socket:
    wordCount()
else:
    g_exportedScripts = wordCount,
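
The script above is a GUI macro that keeps a dialog updated. For a one-shot count from the console, the same UNO pieces it uses (the UnoUrlResolver, the Desktop service, and the document's WordCount property) could be combined roughly as below. This is a sketch under assumptions: an office instance is already listening, for example started with soffice --headless --accept="socket,host=localhost,port=2002;urp;", the file is a text document (only those expose WordCount), and the function names are hypothetical.

```python
def connect_desktop(host: str = "localhost", port: int = 2002):
    """Connect to a running office instance and return its Desktop service."""
    import uno  # shipped with the office's Python; imported lazily on purpose

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext" % (host, port))
    return ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)


def document_word_count(desktop, url: str) -> int:
    """Load a text document by URL, read its WordCount property, then close it."""
    doc = desktop.loadComponentFromURL(url, "_blank", 0, ())
    if doc is None:
        raise IOError("could not load " + url)
    try:
        return doc.WordCount
    finally:
        doc.close(True)


# Example use (needs a listening office instance; the path is a placeholder):
# desktop = connect_desktop()
# print(document_word_count(desktop, "file:///path/to/file.odt"))
```

Because the count comes from the office suite itself, it should match the figure shown in the Writer statistics dialog rather than a whitespace-based approximation.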

Link for more information:

https://superuser.com/questions/529446/running-word-count-in-openoffice-writer

I hope this helps. Tom

EDIT: then I found this:

http://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=22555

Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/15126983
