Objective-C / Cocoa Touch中的HTML字符解码

Stack Overflow user
Asked 2009-07-09 16:56:28
9 answers · 94.3K views · 0 followers · 103 votes

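First, I found this: Objective C HTML escape/unescape, but it doesn't work for me.

My encoded characters (they come from an RSS feed, btw) look like this: &#038;

I searched all over the net and found related discussions, but no fix for my particular encoding. I think they are called hexadecimal characters.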


9 Answers

Stack Overflow user

Posted 2010-05-16 19:02:34

Check out my NSString category for HTML. Here are the methods available:

- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
Votes: 163

Stack Overflow user

Posted 2011-03-02 13:38:19

Nobody seems to have mentioned one of the simplest options: Google Toolbox for Mac

(Despite the name, it works on iOS too.)

https://github.com/google/google-toolbox-for-mac/blob/master/Foundation/GTMNSString%2BHTML.h

/// Get a string where internal characters that are escaped for HTML are unescaped 
//
///  For example, '&amp;' becomes '&'
///  Handles &#32; and &#x32; cases as well
///
//  Returns:
//    Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;

I only had to include three files in my project: the header, the implementation, and GTMDefines.h.

Votes: 35

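Stack Overflow user

Posted 2009-07-09 17:19:35

I should post this on GitHub or something. This goes in a category on NSString, uses NSScanner for the implementation, and handles both hexadecimal and decimal numeric character entities as well as the usual symbolic ones.

It also handles malformed strings (an & followed by an invalid sequence of characters) relatively gracefully, which turned out to be crucial in my released app that uses this code.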

- (NSString *)stringByDecodingXMLEntities {
    NSUInteger myLength = [self length];
    NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;

    // Short-circuit if there are no ampersands.
    if (ampIndex == NSNotFound) {
        return self;
    }
    // Make result string with some extra capacity.
    NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];

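    // First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
    NSScanner *scanner = [NSScanner scannerWithString:self];
    // NSScanner skips whitespace and newlines by default, which would drop
    // the spaces that follow an entity (e.g. "a &amp; b"), so turn that off.
    [scanner setCharactersToBeSkipped:nil];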
    do {
        // Scan up to the next entity or the end of the string.
        NSString *nonEntityString;
        if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
            [result appendString:nonEntityString];
        }
        if ([scanner isAtEnd]) {
            goto finish;
        }
        // Scan either a HTML or numeric character entity reference.
        if ([scanner scanString:@"&amp;" intoString:NULL])
            [result appendString:@"&"];
        else if ([scanner scanString:@"&apos;" intoString:NULL])
            [result appendString:@"'"];
        else if ([scanner scanString:@"&quot;" intoString:NULL])
            [result appendString:@"\""];
        else if ([scanner scanString:@"&lt;" intoString:NULL])
            [result appendString:@"<"];
        else if ([scanner scanString:@"&gt;" intoString:NULL])
            [result appendString:@">"];
        else if ([scanner scanString:@"&#" intoString:NULL]) {
            BOOL gotNumber;
            unsigned charCode;
            NSString *xForHex = @"";

            // Is it hex or decimal?
            if ([scanner scanString:@"x" intoString:&xForHex]) {
                gotNumber = [scanner scanHexInt:&charCode];
            }
            else {
                gotNumber = [scanner scanInt:(int*)&charCode];
            }
            if (gotNumber) {
                [result appendFormat:@"%C", charCode];
            }
            else {
                NSString *unknownEntity = @"";
                [scanner scanUpToString:@";" intoString:&unknownEntity];
                [result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
                NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);
            }
            [scanner scanString:@";" intoString:NULL];
        }
        else {
            NSString *unknownEntity = @"";
            [scanner scanUpToString:@";" intoString:&unknownEntity];
            NSString *semicolon = @"";
            [scanner scanString:@";" intoString:&semicolon];
            [result appendFormat:@"%@%@", unknownEntity, semicolon];
            NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);
        }
    }
    while (![scanner isAtEnd]);

finish:
    return result;
}
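
To isolate the hex-versus-decimal step described above, here is a minimal sketch that decodes a single numeric character reference with NSScanner. The helper name DecodeNumericReference is hypothetical and not part of the answer's category. Unlike the `%C` format used above, it builds a surrogate pair for code points beyond U+FFFF, which is an addition of this sketch rather than something the original code does.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer above): decodes exactly one
// numeric character reference such as @"&#038;" or @"&#x26;".
// Returns nil when the input is not a well-formed numeric reference.
static NSString *DecodeNumericReference(NSString *reference) {
    NSScanner *scanner = [NSScanner scannerWithString:reference];
    // Do not let the scanner silently skip whitespace.
    [scanner setCharactersToBeSkipped:nil];

    if (![scanner scanString:@"&#" intoString:NULL]) {
        return nil;
    }

    unsigned int codePoint = 0;
    BOOL gotNumber = NO;
    if ([scanner scanString:@"x" intoString:NULL]) {
        // Hexadecimal form: &#x26;
        gotNumber = [scanner scanHexInt:&codePoint];
    } else {
        // Decimal form: &#38; or &#038; (leading zeros are not octal).
        int decimal = 0;
        gotNumber = [scanner scanInt:&decimal] && decimal >= 0;
        if (gotNumber) {
            codePoint = (unsigned int)decimal;
        }
    }

    // Require the terminating semicolon and nothing after it.
    if (!gotNumber ||
        ![scanner scanString:@";" intoString:NULL] ||
        ![scanner isAtEnd]) {
        return nil;
    }

    // Reject values outside Unicode and lone surrogates.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
        return nil;
    }

    if (codePoint <= 0xFFFF) {
        unichar unit = (unichar)codePoint;
        return [NSString stringWithCharacters:&unit length:1];
    }

    // Supplementary planes need a UTF-16 surrogate pair.
    unsigned int offset = codePoint - 0x10000;
    unichar pair[2] = {
        (unichar)(0xD800 + (offset >> 10)),
        (unichar)(0xDC00 + (offset & 0x3FF))
    };
    return [NSString stringWithCharacters:pair length:2];
}
```

With this sketch, both `DecodeNumericReference(@"&#038;")` and `DecodeNumericReference(@"&#x26;")` yield `&`, while a named entity such as `&amp;` yields nil because the helper handles numeric references only.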
Votes: 18
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/1105169